On the replication crisis in science and the twigs in my backyard

5 thoughts on “On the replication crisis in science and the twigs in my backyard”

  1. So do dogs not really orient to north to poop? It was just noise? Or do they survey twigs in their vicinity, find a slope based on their preconceived notion that twigs orient to north, and orient to the twigs after finding a collection that show some orientation towards north? Or do they listen to the C. elegans in the soil trying to orient to some cone projecting up and take an orthogonal of that?

  2. The thought of worms and your post on replication made me think of a paper published long ago that I have used in quarterly exams as an example (I can’t remember whether I’ve already gone on about this to you).

    The paper is: DAF-16 Target Genes That Control C. elegans Life-Span and Metabolism
    (http://www.ncbi.nlm.nih.gov/pubmed/12690206), published in Science.

    They look for DAF-16 (a transcription factor) binding sites in C. elegans, and then use the idea that orthologous genes in fly and C. briggsae that also have the motif are more likely to be real targets since those motifs have been retained over time.

    One section that seems problematic:
    “We surveyed 1 kb upstream of the predicted ATG of 17,085 C. elegans and 14,148 Drosophila genes and identified 947 C. elegans and 1760 Drosophila genes that contain at least one perfect-match consensus DAF-16 binding site within the 1-kb promoter region. We then compared these DAF-16 binding site–containing worm and fly genes with a list of 3283 C. elegans and Drosophila genes that are orthologous to each other, and identified 17 genes that are orthologous between Drosophila and C. elegans and bear a DAF-16 binding site within 1 kb of their start codons in both species (Table 1).”

    They found that 947/17,085 C. elegans genes had the site, and 1760/14,148 fly genes had the site. By chance, assuming the two species are independent and the orthologues carry the site at the genome-wide rate, you would expect 947/17,085 x 1760/14,148 ≈ 0.0069 of the genes to have it in both fly and worm, right? 0.0069 of 3283 orthologues ≈ 23, so they found fewer than you would expect by chance, and yet this cross-species screen is the basis for selecting the genes for further study and is treated as validation that these are true targets.
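The chance expectation above can be checked with a few lines of Python. This is only a sketch of the commenter's back-of-the-envelope argument: the counts are the ones quoted from the paper, while the independence of worm and fly hits, the use of genome-wide rates for the orthologues, and the Poisson approximation for the tail are all assumptions made here for illustration.

```python
import math

# Counts quoted from the paper's text above.
worm_hits, worm_genes = 947, 17_085
fly_hits, fly_genes = 1760, 14_148
orthologue_pairs = 3283
observed_both = 17

# Assumption: each orthologue carries the site at the genome-wide rate,
# and hits in worm and fly are independent of each other.
p_worm = worm_hits / worm_genes
p_fly = fly_hits / fly_genes
p_both = p_worm * p_fly
expected_both = p_both * orthologue_pairs


def poisson_cdf(k: int, mean: float) -> float:
    """P(X <= k) for a Poisson variable with the given mean."""
    return sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k + 1))


# Probability of seeing at least the observed count by chance alone
# (Poisson approximation to the number of chance overlaps).
p_at_least_observed = 1 - poisson_cdf(observed_both - 1, expected_both)

print(f"P(site | worm gene)  = {p_worm:.4f}")
print(f"P(site | fly gene)   = {p_fly:.4f}")
print(f"P(site in both)      = {p_both:.5f}")
print(f"expected by chance   = {expected_both:.1f} of {orthologue_pairs}")
print(f"P(X >= {observed_both}) by chance = {p_at_least_observed:.2f}")
```

If the tail probability is large, as it is when the observed count sits below the chance expectation, the overlap gives no evidence of enrichment.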

    Next, in Table 1, they showed that these 17 genes that had sites in fly and worm also had sites in briggsae. But the bottom of the table has this note:
    “† These binding sites contain one mismatch from the consensus that retains DAF-16 binding in vitro.”
    Now, the binding site TTGTTTAC and its reverse complement might be expected to appear about once every 15 kb given a GC content of 33%. Add the variant mismatch motifs they allowed (from their reference it looks like 6 alternate sequences, so 7 in total) and you would expect roughly one hit every 2 kb, so a large fraction of the 1 kb regions (somewhere between a third and a half) would contain a variant by chance. When they didn’t find a consensus or mismatch motif in the 1 kb region, they expanded the search to 2.7 kb and found motifs in all the targets. Again, with 7 motifs allowed and each occurring about every 15 kb, 2.7 kb is not a useful filter: 15 kb/7 = 2.14 kb, so most genes should have a motif within that window.
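The motif-frequency estimate can be sketched as follows. The independent-base model, the equal split of A/T and of G/C, the Poisson approximation for "at least one hit in a window", and the treatment of all 7 allowed motifs as equally frequent are assumptions for illustration; the function names are hypothetical and nothing here comes from the paper itself.

```python
import math


def motif_prob(motif: str, gc: float) -> float:
    """Chance of `motif` starting at a given position on one strand,
    assuming independent bases with A = T and G = C."""
    base_p = {"A": (1 - gc) / 2, "T": (1 - gc) / 2, "G": gc / 2, "C": gc / 2}
    return math.prod(base_p[b] for b in motif)


def revcomp(motif: str) -> str:
    """Reverse complement of a DNA motif."""
    return motif[::-1].translate(str.maketrans("ACGT", "TGCA"))


def p_at_least_one(window_kb: float, n_motifs: int, spacing_kb: float) -> float:
    """Poisson chance of at least one hit in a window, if each of n_motifs
    motifs occurs on average once every spacing_kb."""
    return 1 - math.exp(-n_motifs * window_kb / spacing_kb)


consensus = "TTGTTTAC"
per_bp = motif_prob(consensus, 0.33) + motif_prob(revcomp(consensus), 0.33)
consensus_spacing_kb = 1 / per_bp / 1000
print(f"consensus expected about every {consensus_spacing_kb:.1f} kb")

# Use the comment's round figure of 15 kb per motif and 7 allowed motifs.
for window_kb in (1.0, 2.7):
    p = p_at_least_one(window_kb, n_motifs=7, spacing_kb=15.0)
    print(f"P(at least one motif in {window_kb} kb) = {p:.2f}")
```

Under these assumptions the consensus spacing lands in the same ballpark as the 15 kb figure, and well over half of all 2.7 kb windows would contain an allowed motif by chance.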

    Later, they increase the search space for worm by 50% and for fly 5-fold, and increase the target set of genes by 66. Scaling the 1 kb hit rates with the size of the search window, the new probabilities would be 0.0825 x 0.62 ≈ 0.051, so 0.051 x 3283 ≈ 167 genes predicted by chance, so again they found fewer than expected by chance alone.
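A quick sketch of how the chance expectation grows with the wider search windows. It compares the simple linear scaling used in the comment with a saturating alternative in which hits arrive at a constant rate per kb. Both scalings, and the 1.5x and 5x factors taken from the comment, are assumptions here rather than anything stated in the paper.

```python
import math

# 1 kb hit rates from the counts quoted earlier in the comment.
p_worm_1kb = 947 / 17_085
p_fly_1kb = 1760 / 14_148
orthologue_pairs = 3283


def scale_linear(p: float, factor: float) -> float:
    """Hit probability grows in proportion to window size (capped at 1)."""
    return min(1.0, p * factor)


def scale_saturating(p: float, factor: float) -> float:
    """Constant hit rate per kb: P(no hit) is raised to the window factor."""
    return 1 - (1 - p) ** factor


expected = {}
for name, scale in (("linear", scale_linear), ("saturating", scale_saturating)):
    p_worm = scale(p_worm_1kb, 1.5)  # worm window enlarged by 50%
    p_fly = scale(p_fly_1kb, 5.0)    # fly window enlarged 5-fold
    expected[name] = p_worm * p_fly * orthologue_pairs
    print(f"{name:>10}: P(worm)={p_worm:.3f}  P(fly)={p_fly:.3f}  "
          f"expected by chance = {expected[name]:.0f}")
```

Either way the expected number of chance overlaps is well above a hundred, so the conclusion does not depend on which scaling is used.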

    So they chose a cross-species screen that apparently has no power to find true targets, mixed in degenerate motifs (which also have little power) when needed, and then expanded the search range without calculating how that would affect the number of targets expected by chance. And this became a Science paper!

    1. Thanks for writing all this — interesting! No, I don’t think you mentioned this paper before. In the first part, it seems to use the same avoidance of multiplying probabilities as the paper we were chatting about a few weeks ago:
      M. Kellis, B.W. Birren and E.S. Lander, Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae, Nature 2004.
      (noted after reading https://liorpachter.wordpress.com/2015/05/26/pachters-p-value-prize/). That’s the one in which 76/457 gene pairs show accelerated evolution in at least one member of the pair, but it is “striking” that only 4 show acceleration in both members, even though that low number is roughly what one would expect from independence, i.e. from squaring the per-gene rate implied by 76/457. (Or something like that; I’m not going to go back and read the exact discussion.)
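As a rough check of that "roughly what one would expect", here is a small sketch. It assumes that 76/457 counts pairs with at least one accelerated member (as described above) and that the two members of a pair accelerate independently at the same per-gene rate. Those assumptions are made here for illustration and are not taken from the Nature paper.

```python
import math

pairs = 457
pairs_with_any = 76   # pairs with acceleration in at least one member
observed_both = 4     # pairs with acceleration in both members

# Assumption: each gene accelerates independently with probability p, so
# P(at least one of two) = 1 - (1 - p)^2 = pairs_with_any / pairs.
frac_any = pairs_with_any / pairs
p_gene = 1 - math.sqrt(1 - frac_any)
expected_both = pairs * p_gene**2

# For comparison: squaring the pair-level fraction directly.
naive_both = pairs * frac_any**2

print(f"per-gene rate p               = {p_gene:.3f}")
print(f"expected pairs with both      = {expected_both:.1f}")
print(f"squaring 76/457 directly gives {naive_both:.1f}")
print(f"observed                      = {observed_both}")
```

Under the per-gene independence assumption the expected count comes out close to the observed 4, which is the sense in which the "striking" result is about what chance would give.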

      I just looked up the paper you wrote about; it’s been cited 478 times.
