By Julia Karow
Applied Biosystems’ SOLiD platform, using less sequence coverage, detected more types of genome variation in a HapMap sample than Illumina’s Genome Analyzer did in an earlier analysis of the same sample, although it found fewer variants overall, according to an ABI researcher. However, no formal comparison of the two studies, both of which used versions of the sequencing platforms that have since been improved, has been published to date.
Last month, a team led by researchers at Life Technologies’ ABI division published online in Genome Research a SOLiD-based genome sequence and analysis of an African man, HapMap sample NA18507 (see In Sequence 6/23/2009). The same sample was sequenced on the Illumina Genome Analyzer by a group led by Illumina scientists, who published their analysis in Nature last fall (see In Sequence 11/11/2008).
According to Kevin McKernan, senior director of SOLiD scientific operations at ABI and the corresponding author of the Genome Research paper, he and his team detected “more forms of variation” than the Illumina group at half the coverage, “perhaps not in number but in structural complexity.”
The Illumina team, using the Genome Analyzer I, sequenced the sample to about 40-fold average depth and reported approximately 4 million SNPs, 400,000 short insertions and deletions (indels) of up to 16 bases, and 5,704 structural variants ranging in size from 50 bases to more than 35 kilobases. They generated 35-base reads from libraries with 200-base and 2-kilobase inserts.
The ABI researchers, on the other hand, used the SOLiD 2.0 to sequence the sample to 18-fold haploid coverage and identified 3.87 million SNPs, 226,529 small intra-read indels, 5,590 large indels between mate-paired reads, 91 inversions, and 4 gene fusions. They produced 25-base and 50-base mate-paired reads with inserts of up to 3.5 kilobases, as well as single-end 50-base reads.
Two SOLiD reads were required to detect a SNP, whereas previous Illumina publications suggest that three or four Illumina reads are needed for the same purpose, McKernan said, attributing the difference to the SOLiD platform’s high accuracy.
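Why the minimum number of supporting reads matters when comparing runs at different depths can be illustrated with a simple idealized model. This is an assumption for illustration, not either team’s method. It treats read depth at a site as Poisson-distributed, assumes that at a heterozygous site about half of the reads carry the variant allele, and reads the article’s “two” versus “three or four” reads as the number of variant-supporting reads needed. The function names and scenarios below are hypothetical, and the model ignores sequencing errors, mapping bias and uneven coverage.

```python
import math


def prob_at_least(k: int, mean_depth: float) -> float:
    """Probability that a Poisson(mean_depth) read count is at least k."""
    # P(X >= k) = 1 - sum over i < k of e^(-lambda) * lambda^i / i!
    below = sum(
        math.exp(-mean_depth) * mean_depth**i / math.factorial(i) for i in range(k)
    )
    return 1.0 - below


def het_sensitivity(total_depth: float, min_variant_reads: int) -> float:
    """Idealized chance of seeing enough variant reads at a heterozygous site.

    Hypothetical model: on average half of the reads come from the variant
    allele; sequencing errors, mapping bias and uneven coverage are ignored.
    """
    return prob_at_least(min_variant_reads, total_depth / 2.0)


if __name__ == "__main__":
    # Depths and read thresholds echo the article; the model itself is a toy.
    scenarios = [
        ("18x, 2 variant reads required", 18.0, 2),
        ("18x, 4 variant reads required", 18.0, 4),
        ("40x, 4 variant reads required", 40.0, 4),
    ]
    for label, depth, k in scenarios:
        print(f"{label}: {het_sensitivity(depth, k):.4f}")
```

In this toy model, lowering the threshold from four reads to two raises the idealized heterozygous sensitivity at 18-fold depth from about 97.9 percent to about 99.9 percent, which is the kind of trade-off that lets a more accurate read support calling at lower coverage. Real coverage is far less uniform than a Poisson distribution, so actual differences would be larger and platform-specific.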
In addition, he said, having 50-base paired reads enabled his team to “fill in the blind spot of variation detection” (deletions 20 to 100 base pairs in length) by detecting split reads and contracted or expanded mate pairs. “This blind spot reduction is a huge step forward in next-gen tools starting to completely supplant old-generation features,” he told In Sequence by e-mail last week.
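The mate-pair half of that approach rests on a simple idea. When both ends of a pair are mapped to the reference, the distance between them should match the library’s insert size; a pair that spans noticeably more reference sequence than expected (“expanded”) hints at a deletion in the sample, and one that spans less (“contracted”) hints at an insertion. Below is a minimal sketch of that classification. It assumes a known mean and standard deviation for the library insert size and an arbitrary three-standard-deviation cutoff; the class, function names and numbers are hypothetical, this is not the ABI team’s pipeline, and split-read detection is not shown.

```python
from dataclasses import dataclass


@dataclass
class MappedPair:
    """Hypothetical record for a mate pair mapped to the reference."""

    name: str
    left_pos: int   # reference coordinate where the leftmost end starts
    right_end: int  # reference coordinate where the rightmost end stops

    @property
    def span(self) -> int:
        """Apparent insert size measured on the reference."""
        return self.right_end - self.left_pos


def classify_pair(pair: MappedPair, mean_insert: float, sd_insert: float,
                  n_sd: float = 3.0) -> str:
    """Label a pair as concordant, expanded or contracted."""
    low = mean_insert - n_sd * sd_insert
    high = mean_insert + n_sd * sd_insert
    if pair.span > high:
        # Ends land farther apart on the reference than the library allows,
        # consistent with sequence deleted from the sample.
        return "expanded (possible deletion)"
    if pair.span < low:
        # Ends land closer together, consistent with inserted sequence.
        return "contracted (possible insertion)"
    return "concordant"


if __name__ == "__main__":
    # Hypothetical library: 1,500-base mean insert, 100-base standard deviation.
    pairs = [
        MappedPair("pair1", 10_000, 11_480),
        MappedPair("pair2", 20_000, 22_100),
        MappedPair("pair3", 30_000, 31_050),
    ]
    for p in pairs:
        print(p.name, p.span, classify_pair(p, mean_insert=1500, sd_insert=100))
```

With a cutoff like this, a size change must exceed roughly three standard deviations of the insert-size distribution before a pair is flagged, which is one reason short events in the 20-to-100-base range are hard to see from pairs alone and why split reads help.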
He and his colleagues were also able to resolve haplotype phases from mate-pair data, “which has never been done genome-wide before on a next-generation [sequencing] platform,” he said.
Their analysis also found that “large inserts are especially valuable for structural variation discovery,” he said. Since about half of all breakpoints are thought to be in repeats, he explained, “longer inserts increase our ability to uniquely place one end of a pair in unique sequence.”
The ABI scientists did not include a direct comparison of their results with Illumina’s in their paper because, at the time of submission (Genome Research received the manuscript on Feb. 1), only the genotype locations from the Illumina study, not the genotype calls, were publicly available, according to McKernan.
He and his colleagues did, however, compare the indels found in the two studies, and found that the Illumina data did not show the same preference for even-sized indels longer than four bases that “SOLiD and most other technologies” have detected in the human genome, he said.
According to David Wheeler, an associate professor at Baylor College of Medicine who had not yet studied the two papers in detail, “the results should be fairly comparable, and it would be very interesting to see whether there are any biases in the two technologies.”
Wheeler was involved in the sequencing of Jim Watson’s genome with the Roche/454 platform and, more recently, in sequencing human cancer genomes using the SOLiD technology (see In Sequence 5/12/2009). “I would really expect the two analyses to overlap substantially, but it would be definitely a comparison that should be done,” he said.
Wheeler cautioned that both technologies have improved since the data for the two studies was produced, and that if Illumina repeated its study today with its longer reads, the results would probably be better.
Although the two papers are currently the only published studies of the same human genome sample sequenced independently on two next-generation platforms, other such comparisons have been carried out.
At the Biology of Genomes conference this spring, for example, Gonçalo Abecasis, a researcher at the University of Michigan and co-chair of the analysis group of the 1000 Genomes Project, said that one of the trio samples in a pilot study for the project had been sequenced independently to 30-fold coverage on both the Illumina and SOLiD platforms. The best results, he said, emerged when the two datasets were combined. “Each platform has different characteristics; none of them is uniformly better than the other,” Abecasis said at the time (see In Sequence 5/12/2009).
Illumina did not respond to a request for comment before deadline.