In memoriam: Steven Vogel


I was sad to learn that Steven Vogel passed away yesterday. He was a giant in the field of biomechanics, and his books on the subject are brilliant, fascinating, and fun. I’ve lost count of how many people I’ve run into who, like me, have found these books deeply inspirational. The first one I read was Life’s Devices: The Physical World of Animals and Plants, which remains a favorite, full of well-explained examples of how life is “engineered” — how the mechanics of fluid flows, forces on beams, velocities, and viscosities dictate and illuminate how living things work. Why can’t bacteria swim like dolphins? How do prairie dogs keep from suffocating in their burrows? Why do big animals need such thick bones? Vogel’s writings spanned a remarkably diverse set of subjects, from elephants to ants to fungal spores to plants, and they convey to the reader a deep sense of how physics and biology are intimately related. The books occupy a curious middle-ground between books for specialists and books for the non-scientist general reader; they are warm, conversational, and don’t require advanced knowledge of physics or biology, but they do contain “real” science, with equations when necessary.

From “Life’s Devices.” A prairie dog, and airflows generated in its burrow by the geometry of the tunnel entrances.

It’s very rare to find books that really change the way one looks at the world, but Vogel’s did just that, showing that woven amid the remarkable diversity exhibited by the living world run unifying threads of physical function. And just as we develop a deeper appreciation of the planets by understanding the simple laws that govern their motions, we gain a deeper appreciation of our fellow organisms by understanding the forces that guide them.

In my own work, I don’t study anything macroscopic. My lab looks a lot at larval zebrafish, a few millimeters long, but even here we focus on the microscopic bacteria within them. My group’s work on membranes is also very small-scale. Nonetheless, the perspective that we can gain insights into these systems by considering their material properties and spatial structure is central to our work, and to a large swathe of modern biophysics. It is, however, not a universal belief, and there’s a constant tension with the view, often implicit, that cataloging the pieces of living systems, especially the genes that “cause” various processes or the networks that link genes together, is equivalent to understanding life.

I’m happy that a few years ago I met Steven Vogel, at a conference on education at the interface of physics and biology. He was energetic and very friendly. We corresponded a bit by email afterwards; I was keen to get his thoughts on an article I was writing on the biophysics-for-non-science-majors course I had developed. (I’ve assigned several excerpts from his books when teaching the class.) His comments were warm and insightful. I’ve thought often of elaborating on materials I’ve written for the class to write a popular book on biophysics. I’ve also thought that if I were to do so, it would be great to get Professor Vogel’s comments — sadly, it is now too late for that. I do hope that someday I’ll write something substantial, and that it will have at least some of the spirit and charm of Vogel’s books.

Today’s illustration: a sea turtle I painted a few weeks ago. The entry for “sea turtle” in the index of Life’s Devices:

sea turtle. See turtle, sea, y’see.

The text explores the hull shapes of boats and buoyant animals, including baby sea turtles.

Learning about (machine) learning — part II

Mega Man X -- colored pencil, RP

In Part I, I wrote about how I started exploring the topic of machine learning, and we briefly looked at one of its main aims: automating the task of classifying objects based on their properties. Here, I’ll give an example of this in action, and also describe some general lessons I’ve drawn from this experience. The first part is probably not particularly interesting to most people, but it might help to make the ideas of Part I more concrete. The second part gets at the reasons I’ve found it rewarding to learn about machine learning, and why I think it’s a worthwhile activity for anyone in the sciences: the subject provides a neat framework for thinking about data, models, and what we can learn from both of them.

1 I’d recognize that clump anywhere

I thought I’d create a somewhat realistic but simple example of applying machine learning to images, to be less abstract than the last post’s schematic of pears and bananas. My lab works with bacteria a lot, and a very common task in microbiology is to grow bacteria on agar plates and count the colonies to quantify bacterial abundance. (In case you want to make plates with your kids, by the way, check out [1].) Here’s what a plate with colonies looks like:

The colonies are the little white circles. Identifying them by eye is very easy. It’s also quite easy to write a non-machine-learning program to select the colonies, defining a priori thresholds for intensity and shape that distinguish colonies quite accurately. (In fact, I’ve assigned this as an exercise in an informal image analysis course I’ve taught.) But, for kicks, let’s imagine we aren’t clever enough to think of a classification scheme ourselves. How could we use machine learning?

1.1 Manual training

We first need to identify objects — colonies and things that aren’t colonies, like streaks of glare. Let’s do this by simple intensity thresholding, considering every connected set of pixels that are above some intensity threshold as an object. (Actually, I first identify the circle of the petri dish, and apply high- and low-pass filters, but this isn’t very interesting. I’ll comment more on this later.)

Filtered (left) and filtered and thresholded (right) plate images.
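
For concreteness, here’s a rough sketch of what this segmentation step might look like. (My actual code for all of this is in MATLAB, as noted in footnote [2]; this Python / scikit-image version is just an illustration, and the file name and threshold choice are placeholders.)

```python
import numpy as np
from skimage import io, filters, measure

# Load the (already cropped and bandpass-filtered) plate image as grayscale.
# "plate.png" is a placeholder file name.
image = io.imread("plate.png", as_gray=True)

# Every connected set of pixels above an intensity threshold becomes an object.
# Otsu's method is one automatic way to pick the threshold.
threshold = filters.threshold_otsu(image)
binary = image > threshold
labels = measure.label(binary)

# Measure properties of each object, e.g. area and aspect ratio.
objects = []
for region in measure.regionprops(labels):
    aspect_ratio = region.major_axis_length / max(region.minor_axis_length, 1e-6)
    objects.append([region.area, aspect_ratio])
objects = np.array(objects)  # one row per object: [area, aspect ratio]
```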

Next we create our “training set” — manually identifying some objects as being colonies, and some as not being colonies. I picked about 30 of each. The non-colonies tend to be large and elongated, or very small specks:

Manually identified colonies (green) and not-colonies (red). Black objects have not been classified.

For each object, whether or not it’s in the training set, we can measure various properties: size, aspect ratio, intensity, orientation, etc. As in the pear and banana example in Part I, we want to create a boundary in the space of these parameters that separates colonies from not-colonies. What parameters should we consider? Let’s look at the area of each object relative to the median area of colonies, and the aspect ratio of each object, since these seem reasonable for distinguishing our targets. You might be aghast here — we’re having to be at least a little clever to think of parameters that are likely to be useful. What happened to letting the machine do things? I’ll return to this point later, also.

For colonies and not-colonies, what do these parameters look like? Let’s plot them — since I’ve chosen only two parameters, we can conveniently make a two-dimensional plot.

It does seem like we should be able to draw a boundary between these two classes. We’d like the optimal boundary, that maximizes the gap between the two classes, since this should give us the greatest accuracy in classifying future objects. Put differently, we not only want a boundary that separates colonies from non-colonies, but we want the thickest boundary such that colonies are on one side and non-colonies on the other. Technically, we want to maximize the “margin” between the two groups. A straight-line boundary is fairly straightforward to calculate, but it’s obvious that such a boundary won’t work here. Instead, we can try to transform the parameter space such that in the new space we can aim for a linear separation. One might imagine, perhaps, that instead of area and aspect ratio, the coordinates in the new space are area^3 and (aspect ratio)^2*area^4, for example. Remarkably, one doesn’t actually need to know the transformation from the normal parameter space; all we need is the inner product of vectors in this space.

This approach, of determining optimal boundaries in some sort of parameter space, is that of a support vector machine, one of the key methods of machine learning. The actual calculation of the “support vectors,” the data points that lie on the optimal margin between the two groups, is a neat exercise in linear algebra and numerical optimization. The support vectors for our “training set” of manually-curated groups in the bacteria plate image are indicated by the yellow circles above.
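
If you’re curious what the training step looks like in code, here’s a minimal sketch using scikit-learn’s support vector machine. (My actual analysis used MATLAB and LIBSVM, as noted in footnote [2]; the feature values, kernel choice, and cost parameter below are made-up placeholders, not the real training data.)

```python
import numpy as np
from sklearn.svm import SVC

# Stand-in training data: each row is [area / median colony area, aspect ratio].
X_train = np.array([[1.0, 1.1], [0.9, 1.3], [1.2, 1.2],    # colony-like objects
                    [0.05, 1.6], [4.0, 3.5], [6.0, 5.0]])  # not-colony-like objects
y_train = np.array([1, 1, 1, 0, 0, 0])  # 1 = colony, 0 = not-colony

# An RBF ("Gaussian") kernel implicitly transforms the parameter space;
# only inner products in the transformed space are ever needed.
# C is the cost assigned to training points that end up on the wrong side.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)

# The support vectors are the training points that sit on the margin.
print(clf.support_vectors_)
```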

There is, as one might guess, a great deal of flexibility in the choice of transformations. There is also freedom in setting the cost one assigns to objects of one class that lie in the territory of the other class. (In general it may be impossible to perfectly separate classes — imagine forcing a linear boundary on the training data above — so setting this cost is important.)

1.2 Does it work?

Now we’ve got a classifier — the “machine” has learned, from the data, a criterion for identifying colonies! I had to specify what parameters were relevant, and a few other things, but I never had to set anything about what values of these parameters differentiate colonies from non-colonies. We can now apply this classifier to new data, such as a completely new plate image, using the same support vectors we just learned. If all has gone well, the algorithm will do a decent job of identifying what is and isn’t a colony. Let’s see:

Left: a new plate image. Right: classification of objects. Blue = colonies; yellow = not-colonies.
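
In code, continuing the hypothetical scikit-learn sketch from above, classifying the new plate’s objects just means measuring the same two features and feeding them to the trained classifier (the numbers here are again placeholders):

```python
# Features of objects found in the new plate image, computed exactly as
# for the training image: [area / median colony area, aspect ratio].
X_new = np.array([[1.05, 1.2], [5.0, 4.0], [0.95, 1.0]])

predicted = clf.predict(X_new)  # 1 = colony, 0 = not-colony
print(predicted)
```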

Not perfect, but pretty good! We could improve this with a larger set of training data, and by considering more or different parameters (though this can be dangerous, as I’ll get to shortly). Here’s what the object features look like:

Again, it seems pretty good. There are probably a handful of mis-classified points. (I’m not going to bother actually figuring out the accuracy.)

So, there it is! Machine learning applied to bacterial colonies. If you look at the plates, you can see regions in which colonies have grown together, making two conjoined circles. We could go further and “learn” how to identify these pairs, again starting with a training set of manually identified objects. We could also iterate this process, finding errors in the machine classification and adding this to our training set. The possibilities are endless…

1.3 How sausage is made

Now let’s return to the several issues I glossed over. We first notice that we needed human input at several places, besides the creation of the training data set: identification of the plate, image filtering, choices of parameter transformations, etc. This seems rather non-machine-like. In principle, we could have learned all of these from the data as well: classifying pixels as belonging to plate or non-plate, examining a space of possible filtering and other image manipulations, etc. However, each of these would have a substantial “cost” — a vast amount of training data on which to learn the appropriate classification. If we’re Google and have billions of annotated images on hand, this works; if not, it’s neither feasible nor appealing. Recall that we started using machine learning to avoid having to be “clever” about analysis algorithms. In practice, there’s a continuum of tradeoffs between how clever we need to be and how much data we’ve got.

We should be very careful, however. In general, we’ve got a lot of degrees of freedom at our disposal, and it would be easy to dangerously delude ourselves about the accuracy of our machine learning if we did not account for this flexibility. We could try, for example, lots of different “kernels” for transforming our parameter spaces; we may find that one works well — is this just by chance, or is it a robust feature of the sort of data we are considering? It’s especially troublesome that in general, the task of learning takes place in high-dimensional parameter spaces, not just the two-parameter space I used for this example, making it more difficult to visually determine whether things “make sense.”

2 Learning about data

Was learning about machine learning worthwhile?

From a directly practical point of view: yes. As mentioned at the start of the last post, my lab is already using approaches like those sketched above to extract information from complex images, and there’s lots of room for improvement. Especially if I view machine learning as an enhancement of human-driven analysis rather than an expectation that one’s algorithms act autonomously to make inferences from data, I can imagine many applications in my work. It has been rewarding to realize the continuum noted above that exists between human insight / little data and automation / lots of data, and it’s been good to learn some computational tools to use for this automation [2].

But this adventure has also been worthwhile from a broader point of view. The subject of machine learning provides a useful framework for thinking about data and models. Those of us who have been schooled in quantitative data analysis learn a lot of good heuristics — having lots of parameters in a model is bad, for example. As Fermi famously said, quoting von Neumann, “With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.” More formally, we can think of model fitting as having the ultimate goal of minimizing the error between our model and future measurements (e.g. the second, “test” plate above), but with the constraint that all we have access to are our present measurements (the “training” plate). It is, of course, easy to overfit the training data, giving a model that fits it perfectly but that fails badly on the future tests. This is both because we may be simply fitting to noise in the training data, and because fitting models that are overly complex expands the ways in which we can miss the unknowable, “true” process that describes the data, even in the absence of noise.

Much of the machine learning course dealt with the statistical tools to understand and deal with these sorts of issues — regularization to “dampen” parameters, cross-validation to break up training data into pieces to test on, etc. None of this is shocking, but I had never explored it systematically.
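
As a small, hypothetical illustration of the cross-validation idea (not from the course or my own analysis; the data here are synthetic), scikit-learn will happily split labeled data into “folds,” train on all but one fold, and test on the held-out piece:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in data: two features per object, two classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.0, 0.2, size=(30, 2)),   # class 1 ("colonies")
               rng.normal(3.0, 1.0, size=(30, 2))])  # class 0 ("not-colonies")
y = np.array([1] * 30 + [0] * 30)

# 5-fold cross-validation: train on 4/5 of the data, test on the held-out 1/5,
# and rotate. Comparing the mean score across choices of the cost parameter C
# (or across kernels) is a guard against fooling ourselves with an overly
# flexible model that merely memorizes the training data.
for C in [0.1, 1.0, 10.0]:
    scores = cross_val_score(SVC(kernel="rbf", C=C), X, y, cv=5)
    print(C, round(scores.mean(), 3))
```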

What was shocking, though, was to learn a bit of the more abstract concepts underlying machine learning, such as how to assess whether it is possible for algorithms to classify data, and how this feeds into bounds on classification error (e.g. this). It’s fascinating. It’s also fairly recent, much of it dating back only a few decades. I generally think I’m pretty well read in a variety of areas, but I really was unaware that much of this existed! It’s a great feeling to uncover something new and unexpected. That in itself would have made the course worthwhile!

3 Learning about Mega Man

Continuing the illustration theme of Part I, I drew Mega Man (at the top of the post), which took far more time than I should admit to. K. did a quick sketch:

[1] The present issue of Cultures, from the American Society for Microbiology, is the “Kid’s Issue.” Click here for the PDF and here for the “Flipbook.” Pages 96-97 describe how to make your own gel in which to grow cultures.

[2] I’ve written everything I’ve done, both for the course and these examples, in MATLAB, using the LIBSVM library []. For one homework assignment, I wrote my own support vector machine algorithm, which made me realize how wonderfully fast LIBSVM is.

Learning about (machine) learning — Part I

Machine learning is everywhere these days, as we train computers to drive cars, play video games, and fold laundry. This intersects my lab’s research as well, which involves lots of computational image analysis (e.g.). Nearly everything my students and I do involves writing or applying particular algorithms to extract information from data. In the past two years or so, however, we’ve dipped our toes into some problems that may be better served by machine learning approaches. (I explain the distinction below.) “We” was really Matt Jemielita, a recently-graduated Ph.D. student (now off to Princeton), who applied basic machine learning methods to the classification of “bacteria” and “not-bacteria” in complex images.

Given all this — its relevance to the contemporary world and to my research — I thought I should dive more systematically into understanding what machine learning is and how to apply it. I’m certainly not an expert on the subject, but here’s an account of how I went about exploring it. This will be continued in Part II, which will go into why I’ve found the topic fascinating and (probably) useful as a framework for thinking about data. Also in Part II, I’ll give an example of machine learning applied to analyzing some “real” images. In Part I, I’ll mostly describe how I learned about learning, and all you’ll get as an example is a silly schematic illustration of identifying fruits.

1 Starting 34th grade

My usual approach when learning new topics is to read, especially from a textbook if the subject is a large one that I want to cover systematically. This time, however, I decided to follow a course, watching pre-recorded lectures on-line and doing all the homework assignments and exams. The class is “Learning from Data (CS156),” taught at Caltech by Professor Yaser Abu-Mostafa (see here for details). It’s a computer science course, intended for a mix of upper-level undergraduates and lower-level graduate students. All eighteen lectures are available via YouTube, and the course was explicitly designed to be made publicly accessible. I had read good things about the course on-line. I can’t really remember how I picked it over another popular machine learning course, Andrew Ng’s at Stanford, but I did notice that the videos of the Caltech course were aesthetically more pleasant. (The professor has very nice color palettes of shirts and jackets and ties. I briefly wondered if I should wear ties when lecturing in my own classes — but only very briefly.)

The course was excellent: clear, interesting, and well-organized. It’s well known that viewership of on-line courses and lectures drops precipitously as the course goes on, and this appears to be the case for this class as well, at least as measured by YouTube views of each of the lectures:

Views of “Learning from Data (CS156)” lectures on YouTube, as of Dec. 17, 2014. The spikes are the classes on neural networks and support vector machines — more on the latter later.

My own rate of progress was very non-uniform. I started the course during the 2014-15 Winter break, when I had relatively large amounts of time; I finished close to half the course in three weeks (plotted below). Then, when the academic term started, time became more scarce. When the Spring term started — and I was teaching a new biophysics graduate course — large blocks of time to spend on machine learning essentially disappeared. I finally watched lecture #18 in June, about four months after lecture #17! It was September before I finished the final exam. Still, I did it, and I managed to average about 90% correct on the homework assignments, which generally involved a good amount of programming. I scored 100% on the final exam.


Days on which I watched lectures 1-18. The dashed line indicates the start of the Winter 2015 term.

2 Active and Passive Learning

The lectures, as mentioned, were great — clear and focused, while also projecting a warm and enthusiastic attitude towards the subject and the students. It’s interesting, though, that they were vastly different in style from the classes I teach. They were purely lectures, without any “active learning” activities — no clicker questions, no interactive demonstrations, no discussions with one’s neighbors (which in my case would have involved me either pestering my kids or random people at a café). Though I’m a great fan of active learning, I have to say that this was wonderful. How do I reconcile these thoughts? It’s important to keep in mind that one of the main effects of active learning methods is student engagement — not just getting students interested in the topic, but getting them to retrospectively and introspectively think about what they’re learning and whether they understand it. However, one of the reasons adopting active learning methods when teaching seems, at first, odd is that many of us who have succeeded as academics are the sorts of people who independently do this sort of thinking. I watch the lectures; I take notes; I re-examine the notes and think about the logic of the material; I reconstruct the principles underlying homework questions as I work on them; etc. (Normally I might also think of questions to ask, but that’s not really feasible here.) With this approach, a “straight” lecture is not only fine, but it’s extremely efficient.

3 Machine Learning and Classification

So what exactly is machine learning? In essence, it’s the development of predictive models by a computer (“machine”) based on characteristics of data (“learning”), in contrast to models that exist as some fixed set of instructions. Very often, this is applied to problems of classifying data into categories; in machine learning, the goal is to not have an a priori model of what defines the category boundaries, but rather for the algorithm to itself learn, from the data, what classifiers are effective.

Here’s an example: suppose you had a bunch of pears and bananas and wanted to identify which is which from images. Your program can recognize the shape and color of a fruit. Imagine that for each of many fruits you were to plot the fruit’s “yellowness” (perhaps red/green in an RGB color space) and some measure of how symmetric it is about its long axis. In general, bananas are yellower and less symmetric than pears, so you’d expect a plot like this:

There’s a lot of variation in both sets of points. Some pears are yellower than others, and while nearly all bananas are curved, some views of them will appear more symmetric than others. Nonetheless, we can easily imagine drawing a curve on the plot that does a good job of separating the pears from the bananas, so that if we encounter a new fruit, we can see where in the partitioned landscape its symmetry and yellowness lie and decide from that what fruit it is.

The goal in machine learning is to have the computer, given “training” data of known pears and bananas, determine where this boundary should be. This is quite different from the usual approach one takes in analyzing data, which is more akin to figuring out ahead of time some model of banana and pear morphologies and appearances, and evaluating the observed image characteristics relative to this model. (To give a less convoluted example: imagine identifying circles in images by applying what one knows about geometry, for example that all points on the circle are equidistant from the center. A purely machine learning approach, in contrast, would consist of training an algorithm with lots of examples of circles and not-circles, and letting the boundary between these groups form wherever it forms.) Roughly speaking, the non-machine-learning approach is “better” if it’s feasible: one has an actual model for one’s data. However, there are countless cases for which it’s too complicated or too difficult to form a mathematical model of the data of interest, but for which abundant examples on which to “train” exist, and that’s where machine learning can shine.
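
For anyone who likes to see such things spelled out, here’s a toy sketch, in Python with scikit-learn, of the “determine the boundary from training data” idea; the yellowness and symmetry numbers are entirely made up:

```python
import numpy as np
from sklearn.svm import SVC

# Made-up training data: [yellowness, symmetry] for fruits of known identity.
bananas = np.array([[0.90, 0.20], [0.80, 0.30], [0.85, 0.25], [0.95, 0.40]])
pears = np.array([[0.50, 0.80], [0.60, 0.70], [0.40, 0.90], [0.55, 0.75]])
X = np.vstack([bananas, pears])
y = ["banana"] * len(bananas) + ["pear"] * len(pears)

# The classifier learns the boundary from the labeled examples; we never
# tell it which yellowness or symmetry values define a banana.
clf = SVC(kernel="linear").fit(X, y)
print(clf.predict([[0.92, 0.30], [0.45, 0.85]]))  # two new, unlabeled fruits
```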

Even in the contrived example above of bananas and pears, we can see from the graph that it’s not actually obvious how to draw the separator between the two fruits. Do we draw a line, or a curve? A gentle curve, which leaves some data points stranded in the wrong territory, or a convoluted curve, which gets the training data exactly “right,” but seems disturbingly complex for what should be a simple classification? Considering these dilemmas is central to the practice of machine learning. Since this post is getting long, I’ll save that for Part II, in which I’ll also show a “real” example of applying machine learning to a task of object classification in images. I’ll also try to describe why I’ve found exploring this topic worthwhile — beyond its practical utility, it provides a nice framework for thinking about data and models.

Today’s top-of-the-post illustration is Mega Man, by S. (age 6). Mega Man is a robot who looks like a boy. In the innumerable comic books my kids have read about him, I don’t think machine learning algorithms are discussed. I could be mistaken, however.

To be continued…

Berkeley astronomy news (rotten eggs part 2?)

I spent much of my undergraduate life in UC Berkeley’s Astronomy department. I was an astrophysics and physics double major for quite a while, and I spent countless hours working with our undergraduate-built rooftop radio telescope (shown), both helping build it and serving as a teaching assistant in the laboratory course we designed around it. (Our article on the rooftop telescope was my first published paper!) It’s especially disturbing, therefore, to read the recent news about a nine-year pattern of sexual harassment by Berkeley astronomy professor Geoff Marcy, as revealed in a recent investigation publicized by Buzzfeed. The behavior is bad enough, but what’s really dismaying, as Michael Eisen and others have pointed out, is that according to Berkeley’s statements there are no consequences, other than perhaps a firm talking-to, for repeatedly behaving in ways that are obviously wrong. “Obviously” is not open to much debate, and it seems that issues with Marcy’s behavior were pointed out to him many times over the past years — you can read the links for details. (New: a good summary from the Chronicle of Higher Education.) The university’s response is that there will be no leniency for future violations. Seriously.

One encouraging development, though, is that nearly all of the Astronomy department faculty have signed a letter stating that Marcy should no longer be part of the faculty. I’m happy, though certainly not surprised, to see that Carl Heiles, the main faculty supervisor of our undergraduate radio astronomy efforts, is a signatory. Working and studying in Berkeley’s astronomy department had an enormous influence on my development as a scientist, and I remember it very fondly as an environment that was stimulating and challenging as well as friendly and supportive. The faculty who worked with us — most notably Prof. Carl Heiles and Dr. Dick Treffers — were wonderful. I’ll also note, by the way, that two of the four student authors on the paper linked above are women — one of whom, unlike me, is now an Astronomy professor.

I hope that other students at Berkeley today and in years to come can have the same sorts of experiences that I did. I’m dismayed that Berkeley’s administration appears to support the activity of a faculty member who, it seems, has very different views on meaningful interactions with students. I wouldn’t advocate sacking him altogether (though I don’t think this is out of the question), but I would at least have hoped for a period of barring Marcy from supervision of and contact with students (with a corresponding loss of pay).

(Note: the title refers to my previous post, not other problems at Berkeley.)

Update: As I was inserting links before posting this, I learned that Geoff Marcy, a few hours ago, resigned from Berkeley’s astronomy department.

Seeing the smell of rotten eggs

I’m a bit behind in writing summaries of recently published papers from my group. Here’s one that’s a few months old — I’m spurred to write now since I just learned two days ago that it got onto the cover of JACS, the flagship journal of the American Chemical Society:

M. D. Hammers, M. J. Taormina, M. M. Cerda, L. A. Montoya, D. T. Seidenkranz, R. Parthasarathy, M. D. Pluth, “A Bright Fluorescent Probe for H2S Enables Analyte-Responsive, 3D Imaging in Live Zebrafish Using Light Sheet Fluorescence Microscopy.” J. Am. Chem. Soc. 137: 10216-10223 (2015), [link].

The cover image is the one on the left, above. (More on that in a moment.) The paper is primarily from the lab of my colleague, Mike Pluth, a remarkable organic chemist here at Oregon. Mike’s group devised a new fluorescent reporter of hydrogen sulfide (H2S) — i.e. it becomes fluorescent when it binds H2S. We’d probably all recognize H2S by its characteristic rotten egg smell. (Thankfully, my lab has never had enough of it around to notice!) There’s an increasing interest in detecting and studying hydrogen sulfide in living organisms, since it’s used by cells as a signaling molecule to regulate various physiological processes. It’s also produced by various bacterial species, and so could give insights into microbial activity.

Chemical reporters of H2S, however, have tended to be hard to use, highly toxic, or both. The Pluth lab’s new molecule is sensitive and looked like it would be amenable to use in live organisms. Since my lab does a lot of three-dimensional microscopy of larval zebrafish, we took on the task of imaging this reporter in vivo, seeing if we could detect its signal inside the larval gut. “We” is really Mike Taormina, a very skilled postdoc in my lab. This required, of course, getting the reporter molecules into the gut, which Mike did by the amazing method of microgavage — carefully inserting a fine capillary into the mouth of a larval fish, injecting the contents, and removing the capillary without damage to the fish. (A larval zebrafish is about 0.5 mm wide by a few mm long, so this procedure has to be done under a microscope.) We used light sheet fluorescence microscopy (which I’ve written about before) to image the reporter molecules, and to determine that they are properly localized in the gut. The optical sectioning capabilities of light sheet microscopy turn out to be very useful in distinguishing the gut reporter signal from the abundant background fluorescence of the zebrafish. We had hoped to detect intrinsic H2S, but the levels were insufficient for this study. Instead, we gavaged H2S donor molecules and detected their presence. This may seem a bit silly — detecting the very molecules we ourselves put in — but it allowed quantitative measures of sensitivity, and most importantly showed that all this could be done inside a live animal without any apparent toxicity.

In addition to showcasing the Pluth Lab’s remarkable chemical creations, the project ties into my lab’s interests in imaging not only physical processes and biological components of gut ecosystems, but also chemical activity.

After our manuscript was accepted by JACS, it was picked as an “Editor’s Choice,” and we were asked if we’d like to propose a cover image. I rather quickly painted this one as a possibility:

JACS suggested revising it, hence the version at the top right. It’s not great, but I rather like the fish. Usually what happens with cover art submissions is that they’re either accepted or rejected. This time, oddly, JACS was keen on having its own cover artist make a cover, which they did, but incorporating the fish from my submission as part of it. It’s a bit strange, and I have to say I’m not thrilled by the resulting cover (maybe just because I have a low tolerance for gradient shading). But still, it’s nice to have some publicity for our ability to see the smell of rotten eggs!

On the replication crisis in science and the twigs in my backyard

A long post, in which you’ll have to slog or scroll through several paragraphs to get to the real question: can we navigate using fallen sticks?

These days we seem to be inundated with deeply flawed scientific papers, often featuring shaky conclusions boldly drawn from noisy data, results that can’t be replicated, or both. I was reminded of this several times over the past few days: (i) A group published an impressive large-scale attempt to replicate the findings reported in 100 recent psychology studies, recovering the “significant” findings of the original papers only about a third of the time [1]. (ii) A colleague sent me a link to an appalling paper claiming to uncover epigenetic signatures of trauma among Holocaust survivors; it pins major conclusions on noisy data from small numbers of people, with the added benefit of lots of freedom in data analysis methods. Of course, it attracted the popular press. (iii) I learned from Andrew Gelman’s blog, where it was roundly criticized, of a silly study involving the discovery that “sadness impaired color perception along the blue-yellow color axis” (i.e. feeling “blue” alters your perception of the color blue). (The post is worth reading.)

Of course, doing science is extremely difficult, and it’s easy to make mistakes. (I’ve certainly made large ones, and will undoubtedly make more in the future.) What seems to characterize many of the sorts of studies exemplified above, though, is not technical errors or experimental mis-steps, but a more profound lack of understanding of what data are, and how we can gain insights from measurements.

Responding to a statement on Andrew Gelman’s blog, “Nowhere does [the author] consider [the possibility] that the original study was capitalizing on chance and in fact never represented any general pattern in any population,” I wrote:

I’m very often struck by this when reading terrible papers. … Don’t people realize that noise exists? After asking myself this a lot, I’ve concluded that the answer is no, at least at the intuitive level that is necessary to do meaningful science. This points to a failure in how we train students in the sciences. (Or at least, the not-very-quantitative sciences, which actually are quantitative, though students don’t want to hear that.)

If I measured the angle that ten twigs on the sidewalk make with North, plot this versus the length of the twigs, and fit a line to it, I wouldn’t get a slope of zero. This is obvious, but I increasingly suspect that it isn’t obvious to many people. What’s worse, if I have some “theory” of twig orientation versus length, and some freedom to pick how many twigs I examine, and some more freedom to prune (sorry) outliers, I’m pretty sure I can show that this slope is “significantly different” from zero. I suspect that most of the people we rail against in this blog have never done an exercise like this, and have also never done the sort of quantitative lab exercises that one does repeatedly in the “hard” sciences, and hence they never absorb an intuition for noise, sample sizes, etc. (Feel free to correct me if you disagree.) This “sense” should be a prerequisite for adopting any statistical toolkit. If it isn’t, delusion and nonsense are the result.

It occurred to me that it would be fun to actually try this! (The twig experiment, that is.) So my six-year-old son and I wandered the backyard and measured the length and orientation of twigs on the ground. I couldn’t really give a good answer to his question of why we were doing this; I said I wanted to make a graph, and since I’m always making graphs, this satisfied him. This was a nicely blind study — he selected the twigs, so we weren’t influenced by preconceptions of the results I might want to find. We investigated 10 sticks.

Here’s a typical photo:

This particular twig points about 70 degrees west of North (i.e. it lies along 110-290 degrees).

What’s the relationship between the orientation of a twig and its length? Here’s the graph, with all angles in the range [-90,90] degrees, with 0 being North:

The slope isn’t zero, but rather 1.5 ± 2.3 degrees/inch. (It’s almost unchanged with the longest stick removed, by the way.)

The choice of North as the reference angle is arbitrary — perhaps instead of asking if the shorter or longer sticks differentially prefer NW/SE vs NE/SW, as this analysis does, I should pick a different reference angle. Perhaps a 45 degree reference angle would be sensible, since N/S and E/W orientations are nicely mapped onto positive and negative orientation values. Or perhaps I should account for the 15 degree difference between magnetic and true North in Oregon. Let’s pick a -65 degree reference angle (i.e. measuring the twig orientation relative to a direction 65 degrees West of North). Here’s the graph:

Great! Now the slope is -7.0 ± 3.4 degrees/inch. The p-value* is 0.01.** I didn’t even have to eliminate data points, or collect more until the relationship became “significant.”

Clearly the data indicate a deep and previously undiscovered relationship between the length of twigs and the orientation they adopt relative to the geographic landscape, perhaps indicating a magnetic character to wood that couples to the local magnetic and gravitational fields. Or that’s all utter nonsense.

Having done this, I’m now even more convinced that analyzing “noise” is an entertaining thing to do — it would make a great exercise in a statistics class, coupled with an essay-type assignment examining its procedures and outcomes.

Today’s illustration (at the top of the post) isn’t mine; it’s by my 10-year-old, and it coincidentally shows the cardinal directions. (We’ve been playing around a bit with compass-and-ruler drawings.)

* I find it hard to understand how one makes a p-value for a linear regression slope. I did it by brute force, simulating data drawn from a null relationship between orientation and length and counting the fraction of instances with a slope greater than the observed value.
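
If anyone wants to try the brute-force approach themselves, a minimal sketch (with placeholder numbers rather than our actual twig measurements) might look like:

```python
import numpy as np

# Placeholder data: twig lengths (inches) and orientations (degrees) relative
# to the chosen reference direction. These are NOT the real measurements.
lengths = np.array([2.0, 3.5, 5.0, 6.0, 7.5, 9.0, 11.0, 13.0, 16.0, 24.0])
angles = np.array([10.0, -40.0, 30.0, -60.0, 5.0, -20.0, -55.0, -70.0, -35.0, -80.0])

observed_slope = np.polyfit(lengths, angles, 1)[0]  # degrees per inch

# Null hypothesis: orientation is unrelated to length, uniform in [-90, 90).
rng = np.random.default_rng(0)
n_sims = 10_000
null_slopes = np.array([
    np.polyfit(lengths, rng.uniform(-90, 90, size=len(lengths)), 1)[0]
    for _ in range(n_sims)
])

# One-sided p-value: fraction of null simulations with a slope at least as
# extreme (here, at least as negative) as the observed one.
p_value = np.mean(null_slopes <= observed_slope)
print(observed_slope, p_value)
```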

** The astute reader asks, “shouldn’t you apply some sort of multiple comparisons test?” Sure, but how many comparisons did I make?

[1] Open Science Collaboration, “Estimating the reproducibility of psychological science.” Science 349, aac4716 (2015).

Review times revisited

Two posts ago, I wondered about how long the average peer-review of a journal article takes to write. Most people I know reported “a few hours” as the average time, with the upper end of the range being a day or two. I emailed several journals — mostly ones that I’ve reviewed papers for or published in during the past year — asking whether they’ve collected data on how much time reviewers spend reviewing. Of eight journals, three replied. Of these, only one had actual data!

The Optical Society of America (which publishes Optics Express and other journals) very nicely wrote:

… there was a survey taken in 2010 of 800 responses from OSA authors. We have the following numbers that varied across the board:

2-5 hours at 37%

6-10 hours at 29%

11+ hours at 22%

I don’t know why the numbers don’t add up to 100%. Perhaps 12% were <2 hours? If so, the median time would be about 5 or 6 hours.

It would have been nice to get more data on this but perhaps, as a colleague of mine cynically noted, journals don’t want to know how much free labor they’re asking people to provide! (I don’t think this is really the case.)

Now I should get back to the review I’m presently working on — I’m at 3 hours so far, and I feel compelled to re-plot the authors’ data to clarify various issues… (They nicely provide it in table form, and I’m fond of making graphs…)

(Today’s illustration: ‘the external view of the left fore leg of the horse,’ which I sketched from a sketch in “Animal Painting and Anatomy” by W. Frank Calderon — an odd book, which apparently defines “Animal” as “horse, dog, cow, or sometimes lion.”)