A team of researchers led by Life Technologies’ Applied Biosystems division has published an analysis of the genome sequence of a HapMap sample, the first published human genome sequenced using the SOLiD technology.
The study, which appeared online in Genome Research this week, is the second published analysis of the genome sequence of HapMap sample NA18507, a Yoruban man. Last year, a team led by Illumina published its study of the sample, sequenced on Illumina’s Genome Analyzer platform (see In Sequence 11/11/2008).
“Here, we demonstrate that SOLiD sequencing is capable of efficiently surveying single nucleotide polymorphisms and many forms of structural variation concurrently at relatively modest coverage levels,” the ABI team writes in the article. “The unprecedented clone coverage allows us to uncover a significantly larger number of structural variants in a size range not efficiently explored in previous studies, helping to complete the picture of functional variants in this genome.”
For their study, the ABI researchers used the SOLiD platform to generate almost 77 gigabases of mate-paired reads, as well as 10.5 gigabases of fragment reads 45-50 bases in length, that aligned to the human reference genome. Overall, those reads covered the haploid genome approximately 18-fold, while the clone coverage was about 300-fold.
They detected almost 3.9 million SNPs, 19 percent of which are novel; almost 227,000 intra-read indels; almost 5,600 indels between mate-paired reads; 91 inversions; and 4 gene fusions.
For comparison, the Illumina team generated 135 gigabases of paired-end reads with 200-base and 2-kilobase inserts for its study, covering the genome at an average depth of 40-fold.
The Illumina team identified approximately 4 million SNPs, 400,000 short indels, and about 5,700 structural variants.
Among the variants the ABI researchers identified in their study are “dozens of mutations previously described in OMIM and hundreds of non-synonymous single-nucleotide and structural variants in genes previously implicated in disease,” according to the article.
The results suggest that “it is important to consider structural variation in determining the potential disease alleles in a genome and population studies.”
The analyses “provide guidance for future exploration of human genetic variation with ultra-high throughput short-read sequencing technologies such as SOLiD and confirm that accuracy is an important factor that interplays with throughput in determining the cost-effectiveness of the new sequencing methods in whole human re-sequencing,” the authors state. “As with the initial sequencing of the human genome, it appears longer-range mate pairs continue to provide structure and phasing information of significant value to understand personal genomes.”
At the time of writing (the manuscript was received by Genome Research on Feb. 1), the data for the study could be generated in one or two 30- to 50-gigabase runs on a SOLiD instrument at an estimated reagent cost of less than $30,000, according to the paper. “The time to analyze such large data sets is not keeping pace with these increases in data generation and we anticipate much pioneering work ahead on whole-genome sequence analysis,” the authors note.