These days we seem to be inundated with deeply flawed scientific papers, often featuring shaky conclusions boldly drawn from noisy data, results that can’t be replicated, or both. I was reminded of this several times over the past few days: (i) A group published an impressive large-scale attempt to replicate the findings reported in 100 recent psychology studies, recovering the “significant” findings of the original papers only about a third of the time. (ii) A colleague sent me a link to an appalling paper claiming to uncover epigenetic signatures of trauma among Holocaust survivors; it pins major conclusions on noisy data from small numbers of people, with the added benefit of lots of freedom in data analysis methods. Of course, it attracted the popular press. (iii) I learned from Andrew Gelman’s blog, where it was roundly criticized, of a silly study reporting that “sadness impaired color perception along the blue-yellow color axis” (i.e. feeling “blue” alters your perception of the color blue). (The post is worth reading.)
Of course, doing science is extremely difficult, and it’s easy to make mistakes. (I’ve certainly made large ones, and will undoubtedly make more in the future.) What seems to characterize many of the sorts of studies exemplified above, though, is not technical errors or experimental missteps, but a more profound lack of understanding of what data are, and of how we can gain insights from measurements.
Responding to a statement on Andrew Gelman’s blog, “Nowhere does [the author] consider [the possibility] that the original study was capitalizing on chance and in fact never represented any general pattern in any population,” I wrote:
I’m very often struck by this when reading terrible papers. … Don’t people realize that noise exists? After asking myself this a lot, I’ve concluded that the answer is no, at least at the intuitive level that is necessary to do meaningful science. This points to a failure in how we train students in the sciences. (Or at least, the not-very-quantitative sciences, which actually are quantitative, though students don’t want to hear that.)
If I measured the angle that ten twigs on the sidewalk make with North, plotted this versus the length of the twigs, and fit a line to it, I wouldn’t get a slope of zero. This is obvious, but I increasingly suspect that it isn’t obvious to many people. What’s worse, if I have some “theory” of twig orientation versus length, and some freedom to pick how many twigs I examine, and some more freedom to prune (sorry) outliers, I’m pretty sure I can show that this slope is “significantly different” from zero. I suspect that most of the people we rail against in this blog have never done an exercise like this, and have also never done the sort of quantitative lab exercises that one does repeatedly in the “hard” sciences, and hence they never absorb an intuition for noise, sample sizes, etc. (Feel free to correct me if you disagree.) This “sense” should be a prerequisite for adopting any statistical toolkit. If it isn’t, delusion and nonsense are the result.
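(If you’d rather not leave your chair, here’s a minimal Python sketch of the thought experiment. The “twigs” below are simulated, pure noise with no real relationship, not anything I actually measured.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up twig "measurements": ten lengths (cm) and ten orientations
# (degrees from North), drawn as pure noise with no real relationship.
n_twigs = 10
lengths = rng.uniform(5, 30, n_twigs)
angles = rng.uniform(-90, 90, n_twigs)

# Fit a line: orientation as a function of length.
slope, intercept = np.polyfit(lengths, angles, 1)
print(f"slope = {slope:.2f} degrees/cm")  # essentially never exactly zero
```

Run it with different seeds and the fitted slope jumps around, positive and negative, but it is essentially never zero.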
It occurred to me that it would be fun to actually try this! (The twig experiment, that is.) So my six-year-old son and I wandered the backyard and measured the length and orientation of twigs on the ground. I couldn’t really give a good answer to his question of why we were doing this; I said I wanted to make a graph, and since I’m always making graphs, this satisfied him. This was a nicely blind study — he selected the twigs, so we weren’t influenced by preconceptions of the results I might want to find. We investigated 10 sticks.
Here’s a typical photo:
What’s the relationship between the orientation of a twig and its length? Here’s the graph, with all angles mapped into the range [-90, 90] degrees, 0 being North:
The choice of North as the reference angle is arbitrary; perhaps, instead of asking whether the shorter or longer sticks differentially prefer NW/SE vs. NE/SW, as this analysis does, I should pick a different reference angle. Perhaps a 45-degree reference angle would be sensible, since N/S and E/W orientations are nicely mapped onto positive and negative orientation values. Or perhaps I should account for the 15-degree difference between magnetic and true North in Oregon. Let’s pick a -65-degree reference angle (i.e. measuring the twig orientation relative to a direction 65 degrees West of North). (I’ll illustrate this freedom with a bit of code after the graph.) Here’s the graph:
Clearly the data indicate a deep and previously undiscovered relationship between the length of twigs and the orientation they adopt relative to the geographic landscape, perhaps revealing a magnetic character to wood that couples to the local magnetic and gravitational fields. Or that’s all utter nonsense.
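If you want to play with the reference-angle freedom yourself, here’s a Python sketch, again with simulated twigs rather than my real measurements; the remap function and the particular numbers are just illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
lengths = rng.uniform(5, 30, 10)   # cm
angles = rng.uniform(-90, 90, 10)  # degrees from North

def remap(angles_deg, reference_deg):
    """Re-express orientations relative to a new reference direction,
    wrapped back into [-90, 90); a twig has no head or tail, so its
    orientation is only defined modulo 180 degrees."""
    return (angles_deg - reference_deg + 90) % 180 - 90

# The fitted slope depends on an arbitrary analysis choice.
for ref in (0, 45, -65):
    slope, _ = np.polyfit(lengths, remap(angles, ref), 1)
    print(f"reference = {ref:+d} deg: slope = {slope:+.2f} deg/cm")
```

The data never change; only my arbitrary choice of reference does, and the slope obligingly follows.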
Having done this, I’m now even more convinced that analyzing “noise” is an entertaining thing to do — it would make a great exercise in a statistics class, coupled with an essay-type assignment examining its procedures and outcomes.
Today’s illustration (at the top of the post) isn’t mine; it’s by my 10-year-old, and it coincidentally shows the cardinal directions. (We’ve been playing around a bit with compass-and-ruler drawings.)
* I find it hard to develop an intuition for how one constructs a p-value for a linear regression slope, so I computed mine by brute force: simulating many datasets drawn from a null relationship between orientation and length, and counting the fraction of instances with a slope greater than the observed value.
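For concreteness, here’s roughly what that brute-force calculation looks like in Python; the lengths and angles below are placeholders, not my actual twig measurements:

```python
import numpy as np

rng = np.random.default_rng(0)

def fitted_slope(x, y):
    return np.polyfit(x, y, 1)[0]

# Placeholder numbers standing in for the measured twigs.
lengths = np.array([6.0, 8.5, 11.0, 12.5, 14.0, 17.5, 19.0, 22.0, 25.5, 29.0])
angles = np.array([-40.0, 15.0, -70.0, 55.0, 5.0, -20.0, 80.0, -55.0, 30.0, 65.0])
observed = fitted_slope(lengths, angles)

# Null hypothesis: orientation is unrelated to length, uniform in [-90, 90).
null_slopes = np.array([
    fitted_slope(lengths, rng.uniform(-90, 90, lengths.size))
    for _ in range(10_000)
])

# Fraction of null datasets whose slope exceeds the observed one.
p = np.mean(null_slopes > observed)
print(f"observed slope = {observed:.2f} deg/cm, p = {p:.3f}")
```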
** The astute reader asks, “Shouldn’t you apply some sort of multiple comparisons correction?” Sure, but how many comparisons did I make?
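One way to see why that question has no clean answer: every arbitrary analysis choice, like the reference angle, is a potential comparison. Here’s a sketch (simulated data again, and an arbitrary 5-degree scan step of my own choosing) of scanning reference angles and keeping the most “significant” result:

```python
import numpy as np

rng = np.random.default_rng(1)

def remap(a, ref):
    return (a - ref + 90) % 180 - 90  # orientations live modulo 180 degrees

def slope_p(lengths, angles, n_sims=2000):
    """Brute-force one-sided p-value for the regression slope."""
    obs = np.polyfit(lengths, angles, 1)[0]
    null = np.array([
        np.polyfit(lengths, rng.uniform(-90, 90, lengths.size), 1)[0]
        for _ in range(n_sims)
    ])
    return np.mean(null > obs)

# Pure-noise twigs, then a scan over 36 candidate reference angles.
lengths = rng.uniform(5, 30, 10)
angles = rng.uniform(-90, 90, 10)
p_values = [slope_p(lengths, remap(angles, ref)) for ref in range(-90, 90, 5)]
print(f"best p-value over 36 reference angles: {min(p_values):.3f}")
```

Scan enough arbitrary choices and some p-value will look impressive, despite the “data” being nothing but noise.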
Open Science Collaboration, “Estimating the reproducibility of psychological science,” Science 349, aac4716 (2015). http://www.sciencemag.org/content/349/6251/aac4716