You should appreciate the infrequency of my blog posts

[Illustration: fish]

Today’s illustration doesn’t have anything to do with the topic below. I made it for a ten-minute talk I’ll give tomorrow at the local “Physics Slam.” You can see the program here. Short version: six physics faculty will have ten minutes each to explain something, and the audience votes on their favorite presentation. Apparently, when this was last done a few years ago, several hundred people came. We’ll see what happens this time! My title:

Why do bacteria care about physics?

At some point, I should practice…

Now on to today’s topic:

Everyone agrees that it’s impossible to keep up with the ever-expanding scientific literature. An interesting recent paper* takes a look at this phenomenon, verifying that the number of papers published every year is, indeed, growing exponentially:

* “Attention decay in science,” Pietro Della Briotta Parolo et al., http://arxiv.org/pdf/1503.01881v1.pdf

[Fig. 5 from Parolo et al.]

The authors look at what this means for scientific “memory.” In general, the rate at which a paper is cited by later papers decays over time (after an initial peak), as it is forgotten or as it gives rise to other works that are cited instead. One might guess that the growth in publication rate correlates with a larger decay rate for citations — we spend less time with the past as we’re swamped by new stuff. This is indeed what Parolo et al. find: a decay rate that has steadily grown over decades. This is unfortunate: by not considering papers of the more distant past, we risk needlessly re-discovering insights, and we disconnect ourselves from our fields’ pioneering perspectives.
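To make the decay-rate measurement concrete, here is a minimal sketch in Python. The citation counts are invented for illustration, and Parolo et al.’s actual methodology is more careful than a single exponential fit, but the basic operation looks like this:

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical citation counts for one paper, by year since publication
# (invented numbers; Parolo et al. work with real citation histories).
years = np.arange(1, 11)
cites = np.array([30, 42, 35, 28, 20, 15, 11, 8, 6, 5])

def exp_decay(t, a, rate):
    """Citations per year ~ a * exp(-rate * t)."""
    return a * np.exp(-rate * t)

# Fit only the post-peak portion, since citations typically rise for a
# year or two before decaying.
peak = np.argmax(cites)
params, _ = curve_fit(exp_decay, years[peak:], cites[peak:], p0=(50, 0.3))
print(f"fitted decay rate: {params[1]:.2f} per year")
```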

Returning to the overall number of papers: I wonder if this terrifying growth is driven primarily by an increase in the number of scientists or by an increase in papers written per person. I suspect the former. Even within the US, there are a lot more scientists than there used to be [e.g. this graph]. In the developing world this increase is far more dramatic (see e.g. here), as (presumably) it should be.

Unfortunately, I can’t find any data on the total number of scientists worldwide — at least not with just a few minutes of searching — or even the total number of Ph.D.’s awarded each year.

Looking around for any data that might help illuminate trends of population and paper production, I stumbled upon historical data for the American Physical Society (APS), namely the number of members in each year, since 1905 (http://www.aps.org/membership/statistics/upload/historical-counts-14.pdf). It’s not hard to tabulate the total number of papers published each year in the Physical Review journals — the publications of the APS. Looking at how each of these changes with time might give a rough sense of whether one tracks the other. Of course, there are a lot of problems with interpreting any correlation between these two things: APS members (like me) publish in all sorts of journals, not just APS ones; non-APS members publish in APS journals; etc. Still, let’s see what these two look like:

[Graph: APS membership and Physical Review papers per year]

Considering APS journals alone, the number of papers published each year is 10 times what it was a few decades ago! Within the microcosm of the APS, the number of papers being published has been growing at a far faster rate than the membership.
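For anyone curious how one extracts such growth rates, a small sketch follows. The numbers below are rough stand-ins (the real series come from the APS membership PDF and from tallying journal articles); an exponential trend is a straight line in log space, so a linear fit to the logarithm gives the rate:

```python
import numpy as np

# Illustrative stand-in numbers; the real series come from the APS
# historical-counts PDF and from tallying Physical Review articles.
years = np.array([1975, 1985, 1995, 2005, 2015])
members = np.array([28_000, 35_000, 41_000, 46_000, 51_000])
papers = np.array([4_000, 7_000, 12_000, 20_000, 40_000])

# An exponential trend is a straight line in log space, so the slope of
# a linear fit to log(counts) estimates the growth rate per year.
def growth_rate(t, counts):
    slope, _intercept = np.polyfit(t, np.log(counts), 1)
    return slope

print(f"membership growth: {growth_rate(years, members):.3f} / yr")
print(f"papers growth:     {growth_rate(years, papers):.3f} / yr")
# Doubling time is ln(2) / rate; very different rates mean very
# different doubling times for members and for papers.
```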

What does all this mean? I don’t really know. It’s impossible to do something about the general complaint that there are too many papers to read unless we have some deeper understanding of why we’re in this state. Lacking that, I suppose we’re just stuck reading papers as best we can, or feeling guilty for not reading…

T-minus 9 days for my graduate biophysics course

[Illustration: sea urchin]

Next term, I’ll be teaching a brand-new graduate biophysics course. (It’s the first graduate course I’ve taught in my eight years as a professor!) I’ve spent quite a while thinking about what should be in it and how the course should be structured. Here, I’ll just note my list of topics (below, with a few comments) and provide a link to the syllabus (here). Hopefully in weeks to come I’ll comment on how the course is going.

Topics

Introduction; Physics, statistics, and sight

What are the fundamental limits on vision, and how close does biology come to reaching them? (A brief look.)
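(A taste of the kind of statistics involved, with a sketch of my own rather than actual course material: photon arrivals are Poisson-distributed, so the relative noise in a photon count falls as one over the square root of the count, a hard limit on seeing dim sources.)

```python
import numpy as np

rng = np.random.default_rng(4)

# Photon arrivals are Poisson-distributed, so the relative fluctuation
# in a photon count falls as 1/sqrt(mean count): one fundamental limit
# on vision, whatever the detector (camera or rod cell).
for mean_photons in (10, 100, 10_000):
    counts = rng.poisson(mean_photons, size=100_000)
    print(mean_photons, counts.std() / counts.mean())  # ~ 1 / sqrt(mean)
```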

Components of biological systems

What are the components of biological systems? What are the length, time, and energy scales that we’ll care about? How can we organize a large list of “parts?”

Probability and heredity (a quick look)

We’ll review concepts in probability and statistical mechanics. We’ll discuss a classic example of how a quantitative understanding of probability revealed how inheritance and mutation are related.

Random Walks

We can make sense of a remarkable array of biophysical processes, from the diffusion of molecules to the swimming strategies of bacteria to the conformations of biomolecules, by understanding the properties of random walks.
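(A quick numerical illustration of the key property, my own sketch rather than course material: the root-mean-square displacement of a random walk grows as the square root of the number of steps.)

```python
import numpy as np

rng = np.random.default_rng(0)

# 10,000 one-dimensional random walks of N steps each, step size +/- 1.
N = 1000
endpoints = rng.choice([-1, 1], size=(10_000, N)).sum(axis=1)

# The hallmark of a random walk: root-mean-square displacement grows
# as sqrt(N), which is why diffusion is quick at cellular scales and
# hopeless at large ones.
print(np.sqrt(np.mean(endpoints**2)))  # ~ sqrt(1000) ~ 31.6
```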

Life at Low Reynolds Number

We’ll figure out why bacteria swim, and why they don’t swim like whales.
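(The relevant quantity is the Reynolds number, Re = vL/ν, the ratio of inertial to viscous effects. A back-of-the-envelope comparison, with rough order-of-magnitude values of my own choosing:)

```python
# Reynolds number Re = v * L / nu: speed times size over kinematic
# viscosity, comparing inertial to viscous effects. Rough,
# order-of-magnitude values:
nu_water = 1e-6  # m^2/s, kinematic viscosity of water

# A bacterium: ~1 micron long, swimming at ~30 microns/s
print("bacterium Re:", 30e-6 * 1e-6 / nu_water)  # ~3e-5: viscosity dominates

# A whale: ~10 m long, ~5 m/s
print("whale Re:", 5 * 10 / nu_water)  # ~5e7: inertia dominates
```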

Entropy, Energy, and Electrostatics

We’ll see how entropy governs electrostatics in water, the “melting” of DNA, phase transitions in membranes, and more.

Mechanics in the Cell

We’ll look more at the mechanical properties of DNA, membranes, and other cellular components, and also learn how we can measure them.

Circuits in the Cell

Cells sense their environment and perform computations using the data they collect. How can cells build switches, memory elements, and oscillators? What physical principles govern these circuits?
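(As a flavor of what such a circuit looks like mathematically, here is a sketch of a classic toggle-switch motif, two mutually repressing genes. This is a standard textbook construction with parameters of my own choosing, not necessarily a model we’ll use in class.)

```python
# A cartoon genetic toggle switch: two genes, each repressing the other.
#   du/dt = a / (1 + v^n) - u
#   dv/dt = a / (1 + u^n) - v
# (A standard textbook motif; parameter values are arbitrary choices.)
a, n, dt = 5.0, 2.0, 0.01
u, v = 2.0, 0.1  # start with gene u slightly ahead

for _ in range(int(50 / dt)):  # crude Euler integration to t = 50
    du = a / (1 + v**n) - u
    dv = a / (1 + u**n) - v
    u, v = u + du * dt, v + dv * dt

print(f"u = {u:.2f}, v = {v:.2f}")  # settles at (high, low): a stable memory state
```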

Multicellular organization and pattern formation

How does a collection of cells, in a developing embryo for example, organize itself into a robust three-dimensional structure? We’re beginning to understand how multicellular organisms harness small-scale physical processes, such as diffusion, and large-scale processes, such as folding and buckling, to generate form. We’ll take a brief look at this.

Cool things everyone should be aware of

We live in an age in which we can shine a laser at particular neurons in a live animal to stimulate them, paste genes into a wide array of organisms, and sequence a genome given only a single cell. It would be tragic to be ignorant of these sorts of almost magical things, and they contain some nice physics as well!

Comments

As you’ve probably concluded, this is too much for a ten-week course! I will cull things as we go along, based on student input. I definitely want to spend some time on biological circuits, though, which I’m increasingly interested in. I also want to dip into the final topic of “cool things” — I find it remarkable and sad that so many physicists are unaware of fantastic developments like optogenetics, CRISPR, and high-throughput sequencing. Students: prepare to be amazed.

My sea urchin illustration above has nothing to do with the course, but if you’d like a puzzle: figure out what’s severely wrong with this picture.

Mini-Geo-Engineering

[Photo: Biosphere 2]

I’m at a conference at Biosphere 2, the large ecological research facility in the Arizona desert that was originally launched as an attempt at creating a sealed, self-contained ecosystem.

It’s a surreal place — a collection of glass pyramids and domes housing miniature rain forests, deserts, an “ocean,” and a few other biomes — that’s now used for more “normal” research and education. I’m here not to join some futuristic commune (at least not yet), but as a participant in a fascinating conference organized by Research Corporation called “Molecules Come to Life” — basically, it gets a lot of people who are interested in complex living systems together to discuss big questions, think of new research directions, and launch new projects. It’s a fascinating and very impressive group. Interestingly, a huge fraction are physicists, either physicists in physics departments (like me) or people trained as physicists who are now in systems biology, bioengineering, microbiology, and other departments.

Do the conference topic and the venue have anything to do with one another? Explicitly, no. But in an indirect sense, both touch on issues of scale. A key issue in the study of all sorts of complex systems is how to relate phenomena across different extents of space and time. How can we connect the properties of molecules to the operation of a biological circuit? A circuit to a cell? A cell to an organism? Are there general principles — like those that tie the individually chaotic behaviors of atoms in a gas into robust many-particle properties like pressure and density — that lead to a deeper understanding? Would a piece of a complex system have the same behavior as the whole, or are collective properties scale-dependent?

The initial goal with Biosphere 2 was that these small-scale ecosystems under glass could function sustainably. This failed quite badly (at least at first — see Wikipedia for more details). As we learned on an excellent tour this afternoon, nearly all animals in the enclosure died, the food grown was so minimal that everyone was hungry all the time, and oxygen levels dropped from about 20% to 14% (at which point oxygen had to be pumped in). Walking around, the issue that kept coming to mind was: what is the scale of an ecosystem? Biosphere 2 is really not very big — it’s a few football fields in total area. Are the webs of interaction that can exist in an area this size sufficient to mimic a “real” rainforest, savannah, or other environment? Are they large enough to be stable, and not fluctuate wildly?

Perhaps these questions couldn’t have been answered without building the structure and trying the experiment. (Or perhaps they could.) It would be great to talk to the people behind the project — they were commune dwellers, not scientists — and see what thoughts, assessments, dreams, and predictions went into the planning of this impressive, but odd, place.

Some more photos:

[Five photos from around Biosphere 2]

What have I got in my pocket?

What makes a good exam question? Not surprisingly, I try to write exams that most students who are keeping up with the course should do well on — almost by definition, the exam should be evaluating what I’m teaching. But I also want the exam to reveal and assess different levels of understanding; it would be useless to have an exam that everyone aced, or that everyone failed. Also not surprisingly, I’m not perfect at coming up with questions that achieve these aims. For years, however, I’ve been using the data from the exam scores themselves to tell me about the exam. Here’s an illustration:

I recently gave a midterm exam in my Physics of Energy and the Environment course. It consisted of 26 multiple choice questions and 8 short answer questions. For the multiple choice questions, I can calculate (i) the fraction of students who got a question correct, and (ii) the correlation between student scores on that question and scores on the exam as a whole. The first number tells us how easy or hard the question is, and the second tells us how well the question discriminates among different levels of understanding. (It also tells us, roughly speaking, whether the question is assessing the same things that the exam as a whole is aiming for.) These are both standard things to look at; I’ll note for completeness that there is a large literature on the mechanics of testing, which I tend not to read and can’t adequately cite.
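Computing both numbers takes only a few lines. Here’s a sketch with a hypothetical 0/1 response matrix; I’m not claiming this is literally the script behind my plot (one can, for instance, debate whether to exclude each question from the total it’s correlated against, as done below):

```python
import numpy as np

# responses[i, j] = 1 if student i answered multiple-choice question j
# correctly, 0 otherwise. (Hypothetical random data to make the sketch
# self-contained; real exam data would go here.)
rng = np.random.default_rng(1)
responses = (rng.random((120, 26)) < 0.7).astype(float)

totals = responses.sum(axis=1)  # each student's overall score

for j in range(responses.shape[1]):
    frac_correct = responses[:, j].mean()
    # Correlate question j against the total *excluding* question j,
    # so a question doesn't get credit for correlating with itself.
    rest = totals - responses[:, j]
    r = np.corrcoef(responses[:, j], rest)[0, 1]
    print(f"Q{j + 1:2d}: fraction correct {frac_correct:.2f}, correlation {r:+.2f}")
```

Plotting the correlation against the fraction correct for each question gives exactly the kind of scatterplot shown below.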

Here’s the graph of correlation coefficient vs. fraction correct for each of the multiple choice questions from my exam:

[Graph: correlation coefficient vs. fraction correct for each multiple choice question]

We notice first of all a nice spread: there are questions in the lower right that lots of people get right. These don’t really help distinguish between students, but they probably make everyone feel better! The upper left shows questions that are more difficult, and that correlate strongly with overall performance. In the lower left are my mistakes (questions 6 and 15): questions that are difficult and that don’t correlate with overall performance. These might be unclear or irrelevant questions. Of course I didn’t intend them to be like this, and now after the fact I can discard them from my overall scoring. (Which, in fact, I do.)

I can also include the short answer questions, now plotting mean score rather than fraction correct (since the scoring isn’t binary for these). We see similar things — in general the correlation coefficients are higher, as we’d expect, since these short answer questions give more insights into how students are thinking.

[Graph: correlation coefficient vs. mean score for all questions, including short answer]

It’s fascinating, I think, to plot and ponder these data, and doing so serves the important goal of assessing whether my exam is really doing what I want. I’m rather happy to note that only a few of my questions fall into the lower-left corner of mediocrity. I was spurred to post this because we’re doing a somewhat similar exercise with my department’s Ph.D. qualifier exam. One might think, given the enormous effect of such an exam on students’ lives, and the fact that a building full of quantitative scientists creates it, that (i) we routinely analyze the exam’s properties, and (ii) it passes any metric of quality one could think of. Sadly, neither is the case. Only recently, thanks to a diligent colleague, do we have a similar analysis of response accuracy and question discrimination. Frighteningly, we have given exams in which a remarkable fraction of questions are poor discriminators, correlating weakly or even negatively with overall performance! I am cautiously optimistic that we will do something about this. Of course, it is very difficult to write good questions. But rather than telling ourselves we can do it flawlessly, we should let the results inform the process.

Modeling Life (a freshman seminar) — Part 2

[Illustration: fig, watercolor]

In Part 1, I described the motivations behind a “Freshman Interest Group” (FIG) seminar I taught last term, called “Modeling Life,” that explored how contemporary science can make sense of biology by way of physical and computational models. I also wrote about several of the topics explored in the class. Here, I’ll describe some of the assignments and projects, along with thoughts on whether the course succeeded in its aims and whether I’ll teach it again.

Assignments

Since the course was only a one-credit, one hour per week seminar, and was focused on awareness of what can and can’t be done with models rather than actually conveying skills in modeling, I kept the assignments minimal. Many weeks involved just writing a paragraph or two. For example, following the first class’ discussion of a paper modeling waves of jostling penguins (see Part 1), students had to “Think of at least one other system besides penguins (biological or not) that would be amenable to this sort of modeling of interactions, and describe what ingredients or rules you’d put into a model of it.” Students proposed various systems of interacting agents, nearly all involving animals, people, or cars. This led to a nice discussion of, for example, the field of traffic modeling, and to Itai Cohen’s group’s simulations of “Collective dynamics in mosh pits.”
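To give a flavor of how little is needed for a wave to emerge from nearest-neighbor rules, here is a toy sketch of my own (not the model from either the penguin or mosh-pit papers): agents in a line each step forward once the gap ahead of them opens up.

```python
import numpy as np

# A toy 1D "jostling" chain, loosely in the spirit of the penguin and
# mosh-pit papers but not either group's actual model: agents packed
# with unit spacing each step forward once the gap to the agent ahead
# exceeds a threshold. One step at the front propagates back as a wave.
n = 30
positions = np.arange(n, dtype=float)
threshold = 1.5

positions[-1] += 1.0  # the front agent steps forward, starting the wave

for t in range(100):
    moved = False
    for i in range(n - 1):  # everyone except the front agent
        if positions[i + 1] - positions[i] > threshold:
            positions[i] += 1.0  # take one step to close the gap
            moved = True
    if not moved:
        break

print(positions - np.arange(n))  # all 1.0: the step swept through the chain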

All FIGs are supposed to do something with the library, and so I came up with an assignment I’m quite fond of that explored the “demographics” of article authorships. The students picked one of two papers that we had mentioned in class and then looked “forward” and “backward” at some subset of its citations (e.g. via Web of Knowledge) and its references. The students picked at least two characteristics, like:

  • What departments the authors are from;
  • What countries the authors are from;
  • Whether the papers are about experiments, computation, or both (just determined from the abstract)

and described what they found about the collection of studies linked to the chosen article. (An extended version of this assignment was an option for the final project for the class.) Even more than I expected, students were surprised and interested to find things like the wide array of departments represented by the authors (biology, physics, computer science, various forms of engineering); the number of countries represented (with the very large US fraction being even larger among references than citations); and more. We spent a while discussing authorship — most students have a nineteenth-century notion of lone scientists writing single-author papers — and how the number of people in research groups varies between fields. I of course showed an example from high-energy physics; this one has over three hundred authors, which is fairly typical:

[Screenshot: the paper’s author list]

The full first page:

[Screenshot: the paper’s first page]

Final project

For a final project, students had a choice of either an expanded version of the ‘follow the literature’ assignment described above, or they could write simple computer programs that illustrated biased random walks (as in bacterial chemotaxis) or logistic growth (chaotic population dynamics). They could work in groups. About 2/3 chose the programming exercises. All of these went well — better than I expected in terms of both students’ interest in the project and their success in implementing them. (The students made use of the simple programming methods they were learning in the computer science class — I cringed to watch graphs being made by having a “turtle graphics” cursor trace out paths, and had flashbacks to seventh grade.)
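For reference, the biased-random-walk project really does fit in a few lines. A minimal sketch of the idea (not any student’s actual code, and in plain Python rather than turtle graphics):

```python
import numpy as np

rng = np.random.default_rng(2)

# A biased random walk in the spirit of bacterial chemotaxis (a toy
# version, not a full run-and-tumble model): each step is +1 or -1,
# with "up the gradient" (+1) slightly more likely than 1/2.
p_up = 0.55
n_steps = 10_000

steps = rng.choice([1, -1], size=n_steps, p=[p_up, 1 - p_up])
print("net displacement:", steps.sum())
print("expected drift:  ", (2 * p_up - 1) * n_steps)  # = 1000 steps
```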

Overall assessments

Did the course succeed? In some ways: yes. Students seemed very interested in the topics we explored, and most weeks we had quite good discussions. And it certainly was the case that the things we learned about were, to the students, completely new and far outside the scope of standard things they had previously encountered. If this were a “normal” course, I’d call it a success based on the level of engagement and interest we achieved. However, it was not a normal course, and there were three issues with it that dampen my enthusiasm for repeating it.

First, since I taught this concurrently with my Physics of Life course, a typical large, four-credit class, it added to my workload. Of course, I knew this going in. But, because I have far, far more things to do every week than there are hours in which to do them, I should really be subtracting from, rather than adding to, my list.

Second, a goal of the FIGs in general is that they’re social as well as academic experiences, and it’s apparent that I have neither the time nor the inclination to be very social. The high point of this aspect of the course was during the first few weeks, when I made sure to have coffee or lunch with all the students, in groups of 1-5. This was fun, and it was interesting to get some insights into their very different backgrounds, levels of comfort with the university, and experiences. Especially with respect to programming, the students ranged from ones who had never programmed anything prior to their concurrent computer science course to one who had held a job as a programmer. Aside from these chats, I did one social activity outside of class, a very short hike up Skinner Butte. (I had hoped for Spencer Butte, about an hour to a rocky summit with beautiful views, but the logistics of transportation foiled us.) A few students came, along with my kids; it was a nice walk on a sunny Sunday afternoon.

Third, the demographics of the FIG weren’t really what I was aiming for. The FIG connects my Physics of Life course with the introductory computer science class; students in the FIG are enrolled in both of these courses. The intended audience of the Physics of Life class is non-science-major undergraduates. Introductory computer science classes, at UO and elsewhere, are attracting sharply increasing numbers of students (see here) with a very wide range of interests. I was therefore hoping for the same diverse assortment of students in the FIG — people interested in majoring in history, or political science, or art, etc. Instead, eighteen out of twenty in the course were intended computer science majors! They were a great bunch, but they were not my target in terms of general education. One could argue that these students are precisely the ones we should be introducing to quantitative biology, since the field very much needs them. I would agree, and if I were part of a quantitative biology program I might agree that this is part of my job. But I’m not.

Overall, I don’t intend to teach the seminar again in the near future, though I could imagine happily revisiting it someday. In case anyone plans a similar course, hopefully the thoughts noted here are of some use — feel free to email me for more details. The topic of mathematical and physical modeling of biological systems is fascinating, and it is certainly one that more students, especially early in their undergraduate careers, should be exposed to!

Notice how I’m transferring knowledge, for free

I’ll finish a “real” post soon: Part 2 of my recap of a freshman seminar course on models. (I made the painting for it already!) But, since I’ve written about funding issues in science before (e.g. here), I can’t resist a small post on a proposed new NIH funding program.

There is, of course, a lot of concern about low levels of grant funding, overpopulation of scientists, etc. The last thing one would expect to read is a serious proposal, by the NIH, that it should fund more “emeritus” investigators (i.e. very senior people). But, here it is. The idea is that the program would help these researchers “transition out of a position that relies on funding from NIH research grants” and “facilitate the transfer of their work, knowledge and resources to junior colleagues.” I had to check my calendar to see if April 1 had come up without my noticing. I could point out that “transitioning” is easy to accomplish by not applying for grants, or by collaborating with other researchers. I could also point out that transferring knowledge is what one should be doing already, as a part of a university. But all these points and more are well made in scores of scathing comments at the NIH site. Even better is a brilliant takedown at the “Complex Roots” blog — I highly recommend it.

Modeling Life (a freshman seminar) — Part 1

[Illustration: fig, watercolor]

Last term I taught a small freshman seminar called “Modeling Life,” on ways of looking at biology through the lens of physical and computational models. It was part of the university’s “Freshman Interest Group” (FIG) program, in which one creates small seminars that connect two regular courses that each student in the FIG takes. This was the first time I’ve taught a FIG, and I proposed linking my Physics of Life course with the introduction to programming course in the computer science department. Both are “100-level” general education courses, intended to reach a wide range of students. The FIG was an interesting experiment, and since I haven’t documented it anywhere else, I’ll write a little bit about it here. Though I don’t think my approach was great, there is definitely a need for more venues that expose students, at an early stage, to the intersection of biology, physics, and computation, and that convey what concepts like “modeling” mean; perhaps some of the material here will be of use to a future course somewhere.

Motivations

The study of life is being revolutionized by the study of information. Thanks to DNA sequencing and other high-throughput ways of identifying biomolecules, we have troves of data on genomes, gene activity, protein interactions, and more. Thanks to advances in imaging, sensing, and tracking, we can acquire huge amounts of information about form and structure. In itself, all this data doesn’t provide any insights into how living things work. For this, we need models — simplified representations of reality that highlight key features — as well as tools that let us navigate through data. The goal of this course was to reveal the existence of these broad themes in contemporary science. It was a non-technical course, focused more on this landscape than on particular paths through it. I tried to use lots of examples related to bacteria, because of their importance, because of their utility in illustrating biophysical and “systems biological” concepts, and because of my own interests in bacterial communities (see also here).

Topics

“Dry” biology. The term began with some readings about the gut microbiota, as well as a short piece from Science on “Biology’s Dry Future” [1], describing how, via new computational methods, “researchers are making fundamental discoveries without ever filling a pipette, staining a cell or dissecting an animal.” The existence of this mode of research was completely alien to all the students, disjoint from any conception of biology they had gotten in high school, and they were surprised and excited by it.

The Physics of Penguins. To introduce the concept of models, we discussed a paper, “The origin of traveling waves in an emperor penguin huddle” [2], in which a group of physicists explain how waves of jostling motions propagate through a group of penguins. The paper illustrates the idea of creating simple, tractable models that capture the essence of a phenomenon, and helps set up the idea of agent-based simulations. I’m not actually very fond of the penguin paper: in the twenty-first century no one should be at all surprised that collective phenomena like waves can emerge from simple objects with nearest-neighbor interactions, and we shouldn’t need to run a computer simulation to realize this. Originally, I planned to discuss this and other criticisms, but in the end I abandoned that in the interests of time and simplicity.

Analytic and Numerical Approaches (and growth and disease). Thinking about agent-based models led us to models of diseases and epidemics and, more broadly, the distinction between analytic models and numerical simulations. We discussed the advantages and disadvantages of writing a simulation, and the “art” of figuring out which problems are amenable to analytic solution and which require brute force. I started with a simple example: figuring out the average of random numbers uniformly distributed between -10 and 10, which one could determine by simulation but which, we all realize, is trivial to figure out just by thinking. We then moved on to examples in which it’s less obvious that there are “clever” exact solutions, for example logistic growth: a population with a growth rate dependent on the present value of the population (giving exponential growth) and on some sort of constraint from the overall carrying capacity of the environment (giving a stable ceiling to the population). It’s easy to see how to simulate this; it’s not readily apparent that there’s a nice analytic solution for the population as a function of time. (A small sketch comparing the two approaches appears below.) This also let me discuss my lab’s research on microbial growth, which we returned to several times.

Noisy gene expression. The theme of simulating versus exactly knowing what form a model takes led to a discussion of a few pieces of a beautiful paper from Ido Golding & colleagues, “Real-Time Kinetics of Gene Activity in Individual Bacteria” [3], which illustrates both approaches. We talked about genes, and asked what gene expression ‘looks like’ at the level of the actual, physical molecules involved. How can the same genes lead to different outcomes, and how can we think about randomness and predictability in systems of genes? These questions could easily fill a whole term; we were very superficial. Still, the discussion achieved its aim of conveying that there’s a far greater depth to how genes act than is even hinted at in cartoons from introductory biology books, and that quantitative ideas about physical processes are helping us explore these areas.
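Here is the promised sketch of logistic growth done both ways: a brute-force numerical integration next to the exact solution N(t) = K / (1 + ((K - N0)/N0) e^(-rt)). (My own illustration, with arbitrary parameter values.)

```python
import numpy as np

# Logistic growth both ways: numerically step the ODE
#   dN/dt = r * N * (1 - N / K)
# and compare with the exact solution
#   N(t) = K / (1 + ((K - N0) / N0) * exp(-r * t)).
r, K, N0, dt = 1.0, 100.0, 5.0, 0.001

N = N0
for _ in range(int(10 / dt)):      # integrate out to t = 10
    N += r * N * (1 - N / K) * dt  # simple Euler step

exact = K / (1 + (K - N0) / N0 * np.exp(-r * 10.0))
print(f"simulated: {N:.3f}   analytic: {exact:.3f}")  # both ~ 99.9
```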
Bioinformatics. It is fascinating that, in an age when genes and DNA sequencing are referred to everywhere, most people have no idea of the computational challenges involved in figuring anything out from sequence-based data. To illustrate this, I sketched the basic idea behind a neat approach by Curtis Huttenhower and colleagues [4] to infer the genes contained in microbial communities given sparse information about what species are present. First, we discussed an analogy: suppose you knew how to say “seven” in several Indo-European languages, but not in Italian. How would you try to predict what “seven” in Italian would be? This was fun to discuss, and again, the existence of algorithmic challenges like this in biology was a complete surprise to everyone in the class.

Image analysis. Another key role for computation is in getting data in the first place (often very large amounts of data). This is something my lab deals with a lot, in the context of microscopy and imaging. We took everyone to my lab, looked with our microscopes at bacteria swimming, and ogled arrays of mirrors and lasers. We also discussed some basic themes of image analysis, starting by asking “what is a digital image?” (A few people could give a decent answer; most could not.) Given that an image is an array of numbers, how, for example, can we identify objects like cells? (A toy example appears below.)

Throughout several of the topics, we visited and revisited questions like: What is a model? What is modeling good for? I’ll leave it to the reader to supply answers.
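Returning to image analysis for a moment: the simplest possible answer to “how can we identify objects?” is to threshold the array and label connected regions. A toy sketch, with a synthetic “image” of my own invention standing in for a microscopy frame:

```python
import numpy as np
from scipy import ndimage

# "An image is an array of numbers": a tiny synthetic image with two
# bright square "cells" on a dim, noisy background.
rng = np.random.default_rng(3)
image = 0.1 * rng.random((20, 20))
image[3:7, 4:8] += 1.0      # "cell" 1
image[12:16, 11:15] += 1.0  # "cell" 2

# The simplest segmentation: threshold, then label connected regions.
mask = image > 0.5
labels, n_objects = ndimage.label(mask)
print("objects found:", n_objects)                         # 2
print("object areas:", ndimage.sum(mask, labels, [1, 2]))  # 16 pixels each
```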

Next time…

In this post, I haven’t said anything about who the students were, what projects and assignments we had, how the course went, and whether I’ll teach it again. I’ve already spent far too much time writing, though, so all this will have to wait until Part 2. Stay tuned!

Update: I’ve written Part 2.

[1] R. F. Service, “Biology’s Dry Future.” Science 342, 186–189 (2013). http://www.sciencemag.org/content/342/6155/186.summary

[2] R. C. Gerum et al., “The origin of traveling waves in an emperor penguin huddle.” New J. Phys. 15, 125022 (2013). http://iopscience.iop.org/1367-2630/15/12/125022/article; see also http://www.nature.com/nature/journal/v505/n7483/full/505265e.html

[3] I. Golding, J. Paulsson, S. M. Zawilski, E. C. Cox, “Real-Time Kinetics of Gene Activity in Individual Bacteria.” Cell 123, 1025–1036 (2005). http://www.cell.com/cell/abstract/S0092-8674%2805%2901037-8

[4] M. G. I. Langille et al., “Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences.” Nat. Biotechnol. 31, 814–821 (2013). http://www.nature.com/nbt/journal/v31/n9/full/nbt.2676.html