Tuesday, February 16, 2016

A Cautionary Tale of Bioinformatics Tool Interpretation

Chromosome 2 Fusion Controversy
Abigail Kacpura
Marilyn Hayden
Harrison Kitts

We have read the articles posted by Dr. Jeffrey Tomkins that purport to refute the fusion of two ancestral chromosomes, which remain distinct in chimpanzees as chromosomes 2A and 2B, into human chromosome 2. We believe that his analysis of previous scientific findings is mistaken, and that his misconceptions stem from his religious presuppositions. In attempting to show that creation, not evolution, occurred, he has incorrectly used and interpreted the output of bioinformatics tools and ignored scientific evidence published by others in his field.
The following summarizes some of his key points:
  • Claim 1: He argues that the original research identifying a telomere-to-telomere fusion site on human chromosome 2 is false. He claims that a gene, DDX11L2, is present at the alleged fusion site. According to Tomkins, “Functional genes like DDX11L2 do not arise by the mythical fusing of telomeres. The alleged fusion site is not a degenerate fusion sequence but is and, since creation, has been a functional feature in an important gene. Because the DDX11L2 gene is encoded on the reverse-oriented strand, it is read in the reverse direction (see Exon 1 arrow). Thus, the alleged fusion sequence is not read in the forward orientation typically used in literature as evidence for a fusion—rather, it is read in the reverse direction and encodes a key regulatory switch.”
  • Claim 2: He additionally believes that there is a low frequency of telomere motif repeats in the fusion region, and that the sequence is so degenerate that it does not indicate derivation from a telomeric region. Tomkins and Bergman state, "In a 30 kb region surrounding the fusion site [of human chromosome 2], there exists a paucity of intact telomere motifs (forward and reverse) and very few of them are in tandem or in frame" [1].
  • Claim 3: Tomkins has also stated that the similarity between the human and chimpanzee genomes is about 70%, even though other scientists have found the similarity to be around 96%.

First, we investigated his claims using peer-reviewed scientific evidence and genome browsers.

  • Claim 1: Tomkins starts from reasonable assumptions, but his scientific arguments, which rest on his use of the NCBI and UCSC genome browsers, are incorrect. We attempted to recreate his figure using the same genome build (GRCh38/hg38) and the 798-base fusion site sequence via the UCSC Genome Browser and a BLAT search. The sequence that Tomkins obtained might not have been the correct one, because Fan et al. (2002) [2] note that the sequence was derived from contigs of the BAC (bacterial artificial chromosome) clone; see Figure 3. With that said, the BLAT search that Dr. Tomkins did was not executed properly. To replicate the true BLAT search, one would need to obtain these contigs, but currently only the entire sequence of the BAC clone is available, at http://www.ncbi.nlm.nih.gov/nuccore/6013067?report=fasta. Therefore, any claims that Dr. Tomkins makes using this figure are not valid.

    Figure 3.
  • Claim 2: First, we examined the 30 kb region surrounding the fusion site using the UCSC Table Browser. The location of this 30 kb region is 113494368-113669101 bp on chromosome 2. With the sequence in hand, we searched for the telomere motifs “TTAGGG” and “CCCTAA”. The sequence “TTAGGG” appears 46 times, and only once does it occur as a true tandem repeat with no interspersed nucleotides. In one other instance it sits next to another copy of the motif with a single nucleotide in between. For the second motif, Tomkins gives “CCTAAA”; however, the telomere repeat is actually “CCCTAA”. This motif is found 136 times in the 30 kb region, and in 17 instances it occurs as a repeat. The frequency of these motifs is low, but this confirms previous findings that the fusion region is highly degenerate [2]. It is highly likely that this degeneracy contributed to the likelihood of fusion. Blasco et al. [3] demonstrated that defective telomeres, those which lack detectable telomere repeats, are predisposed to chromosomal abnormalities, including end-to-end fusions. While Tomkins takes the degeneracy of the sequence in the fusion region as evidence against the chromosome 2 fusion, it actually points to a genomic situation that predisposes to the very end-to-end chromosomal fusion he seeks to disprove.
  • Claim 3: The evidence for the similarity between the chimpanzee and human genomes is clear-cut. A 2005 paper published in Genome Research concluded that “the difference between the two genomes is actually not ∼1%, but ∼4%—comprising ∼35 million single nucleotide differences and ∼90 Mb of insertions and deletions” [7]. This finding was important to the scientific community because it has helped scientists understand the human genome better, which in turn has benefited medicine and research. The National Human Genome Research Institute Director Francis S. Collins, M.D., Ph.D., has said that "As we build upon the foundation laid by the Human Genome Project, it's become clear that comparing the human genome with the genomes of other organisms is an enormously powerful tool for understanding our own biology." These are reputable scientists whose research is considered accurate because they document their processes closely and carefully. Tomkins looks to refute this evidence of evolution with claims that are outlandish at best.
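
The motif search described under Claim 2 can be sketched in a few lines of Python. This is a minimal illustration, not the exact procedure we ran: the function names `count_motif` and `tandem_runs` and the toy sequence are made up for the example, and the real input would be the sequence of the region exported from the UCSC Table Browser.

```python
import re

# The two telomere motifs discussed in the text (forward and reverse complement).
MOTIFS = ("TTAGGG", "CCCTAA")


def count_motif(seq, motif):
    """Count non-overlapping occurrences of `motif` in `seq` (case-insensitive)."""
    return len(re.findall(motif, seq.upper()))


def tandem_runs(seq, motif, min_copies=2):
    """Return (start, copies) for every run of at least `min_copies`
    back-to-back copies of `motif`, with no nucleotides in between."""
    pattern = re.compile(f"(?:{motif}){{{min_copies},}}")
    return [
        (m.start(), len(m.group()) // len(motif))
        for m in pattern.finditer(seq.upper())
    ]


# Hypothetical toy sequence: three tandem TTAGGG copies, a fourth copy
# separated by a single nucleotide, and two tandem CCCTAA copies.
toy = "ACGT" + "TTAGGG" * 3 + "A" + "TTAGGG" + "GGCC" + "CCCTAA" * 2 + "TT"

for motif in MOTIFS:
    print(motif, count_motif(toy, motif), tandem_runs(toy, motif))
```

Separating the total count from the tandem runs mirrors the distinction drawn above between how often a motif appears and how often it appears as a true repeat.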


Finally, we sought out critiques of his findings from other sources. The following represents a summary of some of the critiques and analysis of his work.

  • Claim 1: A Reddit user, euphmus, also addressed Tomkins’ failure to use the bioinformatics tools (the UCSC Genome Browser and NCBI) correctly. Using the NCBI Genome Data Browser to look at the genes located at the proposed fusion site of human chromosome 2 (2q13–2q14.1), euphmus found multiple genes other than DDX11L2 in the fusion location. This user used the HomoloGene function of NCBI, and the results can be found here: http://www.ncbi.nlm.nih.gov/homologene/?term=PAX8 and http://www.ncbi.nlm.nih.gov/homologene/90913
  • Claim 2: Reddit user Aceofspades25 [4] has addressed Tomkins’ claim that the low frequency of telomere motif repeats indicates that the fusion of two chimp chromosomes did not occur to produce human chromosome 2. These motifs are repeats of “TTAGGG” joined to a series of repeats of the motif “CCCTAA”. While the frequency of these repeated motifs may be lower than expected on initial examination, he believes that this is not surprising. He cites Carl Zimmer, a popular science writer and blogger who focuses on evolution and parasites. Zimmer has written several books on these topics and writes for The New York Times, Discover, and National Geographic. Zimmer states, “The ends of chromosomes are very vulnerable places. If they simply dangle loosely, DNA-cutting enzymes can nibble away at them, destroying the genes they encounter. The dangling end of one chromosome can also get attached to the dangling end of another, fusing chromosomes together. We are mostly protected from such changes thanks to special proteins called telomerases. They tack on little repeating bits of DNA, which form a loop–a telomere–so that chromosomes end as a hairpin curve, rather than dangling ends” [5]. The loop prevents chromosomes from undergoing fusion. With age, telomeres shorten and accumulate mutations. A high degree of mutation prevents formation of the protective loop, which is conducive to fusion. Figure 1 presents the sequence of the chromosome 2 fusion site and highlights the telomere motifs present in the region. Because this region is unstable, it is likely that mutation has occurred. Taking this into account, Figure 2 highlights both the intact telomere motifs and the telomere motifs that contain a single mutation. In this case, nearly the entire region is highlighted. Once mutation is accounted for, it is highly likely that this region came from a mutated telomere.
Figure 1. Chromosome 2 Fusion Site Sequence. Highlighted regions indicate intact telomere motifs (TTAGGG and CCCTAA).
Figure 2. Chromosome 2 Fusion Site. Highlighted regions indicate either telomere motifs (TTAGGG and CCCTAA) or telomere motifs with a single mutation.
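
The idea behind Figure 2, highlighting motifs that differ from a telomere motif by at most one base, can be approximated as below. This is only a sketch under our own assumptions: "a single mutation" is modeled as one substitution (Hamming distance 1, no insertions or deletions), and the names `near_motif_hits` and `covered_fraction` and the toy sequence are hypothetical.

```python
MOTIFS = ("TTAGGG", "CCCTAA")


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def near_motif_hits(seq, motifs=MOTIFS, max_mismatches=1):
    """Return (position, window, motif, mismatches) for every 6-base window
    that is within `max_mismatches` substitutions of one of the motifs."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):
        window = seq[i:i + 6]
        for motif in motifs:
            distance = hamming(window, motif)
            if distance <= max_mismatches:
                hits.append((i, window, motif, distance))
    return hits


def covered_fraction(seq, hits, width=6):
    """Fraction of the sequence that falls inside at least one hit."""
    covered = set()
    for position, *_ in hits:
        covered.update(range(position, position + width))
    return len(covered) / len(seq) if seq else 0.0


# Hypothetical toy sequence: one intact motif, one motif with a single
# substitution (TTGGGG), then unrelated bases.
toy = "TTAGGG" + "TTGGGG" + "ACGTAC"

exact = near_motif_hits(toy, max_mismatches=0)
fuzzy = near_motif_hits(toy, max_mismatches=1)
print(covered_fraction(toy, exact), covered_fraction(toy, fuzzy))
```

Comparing the covered fraction with and without the mismatch allowance gives a rough numerical counterpart to the visual difference between Figures 1 and 2.
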
  • Claim 3: Blogger Bill Needle has written about Tomkins’ claims regarding the “similarity between the human genome and the genome of chimpanzees” [6]. According to Tomkins, the human and chimpanzee genomes are only about 70% similar. However, Needle notes that other scientists have found a 96% to 98% similarity between humans and chimps. This would mean either that Tomkins fabricated the information, or that some error in his analysis led to a faulty conclusion. In Needle’s opinion, the incorrect analysis is due to “a bug in the analysis software” [6]. Apparently, when other people try to reproduce Dr. Tomkins’ results, they cannot. In addition, Needle claims that “Tomkins employed an ungapped parameter in the BLAST+ software he used in his analysis” [6]. According to a reader of Needle’s blog, someone who uses an ungapped parameter in BLAST+ will get faulty results because “The ungapped parameter determines whether to account for small indels in the comparison. If the ungapped parameter is used, and there is a putative single nucleotide insertion in one of the sequences, then the BLAST algorithm cannot continue the alignment” [6]. If someone like Dr. Tomkins is going to make claims about the similarity between the human and chimpanzee genomes, he should take the necessary precautions so that all analyses are scientifically valid. The fact that so many credentialed scientists have found results vastly different from Tomkins’ should lead people to question his results and the techniques used to obtain them.
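
The effect of ignoring indels, as described in the quoted comment, can be illustrated with a toy comparison. The sketch below is not BLAST and does not reproduce Tomkins' analysis: it compares a position-by-position (ungapped) identity with the identity from a simple Needleman-Wunsch global alignment that allows gaps. The scoring values, function names, and sequences are our own assumptions for illustration.

```python
def ungapped_identity(a, b):
    """Fraction of matching positions when the sequences are compared
    position by position, with no gaps allowed."""
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n


def gapped_identity(a, b, match=1, mismatch=-1, gap=-2):
    """Identity (matches / alignment columns) from a Needleman-Wunsch
    global alignment with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner, counting matches and columns.
    i, j = len(a), len(b)
    matches = columns = 0
    while i > 0 or j > 0:
        columns += 1
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return matches / columns


# Hypothetical sequences: the query is the reference with one inserted base.
reference = "ACGTTGCAGGCTAACGTCAG"
query = reference[:5] + "A" + reference[5:]

print(ungapped_identity(reference, query), gapped_identity(reference, query))
```

A single inserted base shifts every downstream position, so the ungapped comparison reports a low identity even though the two sequences are otherwise identical, while the gapped alignment absorbs the insertion as one gap.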

References

1. Tomkins J, Bergman J. The chromosome 2 fusion model of human evolution—part 2: re-analysis of the genomic data. Journal of Creation. 2011;25(2):111-7.
2. Fan Y, et al. Genomic structure and evolution of the ancestral chromosome fusion site in 2q13-2q14.1 and paralogous regions on other human chromosomes. Genome Res. 2002;12:1651-62.
3. Blasco MA, Lee HW, Hande MP, Samper E, Lansdorp PM, DePinho RA, Greider CW. Telomere shortening and tumor formation by mouse cells lacking telomerase RNA. Cell. 1997 Oct 3;91(1):25-34.
4. Reddit. Chromosome 2 fusion - A response to a question on biology stack exchange • /r/NaturalTheology [Internet]. 2014 [cited 15 February 2016]. Available from: https://www.reddit.com/r/NaturalTheology/comments/28v2ib/chromosome_2_fusion_a_response_to_a_question_on/?
5. The Loom. And Finally the Hounding Duck Can Rest - The Loom [Internet]. 2012 [cited 16 February 2016]. Available from: http://blogs.discovermagazine.com/loom/2012/07/23/and-finally-the-hounding-duck-can-rest/
6. Marmotism.blogspot.com. Marmotism: Jeffrey Tomkins' Fail! [Internet]. 2016 [cited 16 February 2016]. Available from: http://marmotism.blogspot.com/2015/08/jeffrey-tomkins-fail.html
7. Varki A, Altheide TK. Comparing the human and chimpanzee genomes: searching for needles in a haystack. Genome Res. 2005;15(12):1746-58. doi:10.1101/gr.3737405. Available from: http://genome.cshlp.org/citmgr?gca=genome%3B15%2F12%2F1746