Researchers from the Wellcome Trust Sanger Institute in the UK reported last week in Nature that they have sequenced melanoma and lung cancer genomes using Illumina and SOLiD sequencing technology, respectively. In both cases, the techniques identified known cancer-causing mutations as well as novel ones. The researchers also found evidence of ultraviolet damage in the melanoma genome and damage from tobacco carcinogens in the lung cancer genome.
The two papers indicate that whole-genome short-read sequencing is becoming increasingly important for identifying cancer-causing genes, work that could eventually lead to better diagnosis and treatment, according to the authors.
The studies are part of a growing number of cancer genome sequencing projects, such as the Sanger Institute’s Cancer Genome Project and the National Cancer Institute’s Cancer Genome Atlas, many of them under the umbrella of the International Cancer Genome Consortium.
The papers “demarcate the new era from the old era,” Peter Campbell, a member of the Wellcome Trust Sanger group that conducted the research, told In Sequence. “They demonstrate quite convincingly that we can find mutations in all classes at a cost that’s rapidly decreasing and a time frame that’s rapidly decreasing.”
The researchers noted that at the time they did the experiments, the sequencing cost around $100,000 on each platform, including both the tumor and normal genomes. They estimated that if they were to do it again today, the price would be around $50,000 on each platform.
The researchers said they used both the Illumina and SOLiD technologies because they thought it was important to assess the two systems. They noted, however, that the platforms produced comparable results and that any differences between the technologies would not be relevant today because both platforms have since been improved significantly.
“Both were able to deliver high-quality cancer genome sequences in which we could get comprehensive catalogs of somatic mutations,” said Michael Stratton, who heads the Sanger Institute’s Cancer Genome Project and who led the research.
Campbell agreed and said that improving the algorithms would yield more benefits than using one sequencing technology over the other. “You need good informatics to take that data set and pull out what is [a] genuine mutation, what is sequencing error, and what is artifact,” he said.
In both studies, the researchers sequenced genomes from a tumor cell line and a normal cell line from the same patient and compared the two genomes to each other.
To sequence the lung cancer genome, the scientists used the SOLiD platform to generate 25-base-pair mate-pair shotgun sequences and achieved about 39-fold coverage of the tumor genome and 31-fold coverage of the normal genome.
In total, they detected 22,910 somatic substitutions and confirmed 65 indels, 58 genomic rearrangements, and 334 copy number segments.
Of the 29 known base substitutions, they found 22. They also tested 79 new coding substitutions and 354 randomly chosen genome-wide variants by capillary sequencing, confirming 97 percent and 94 percent, respectively. The same method confirmed 25 percent of indels.
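As a rough check on these figures, the detection rate for known substitutions and the approximate number of confirmed variants can be worked out from the reported counts. The sketch below is illustrative only: the function and variable names are hypothetical, and the confirmed counts are back-calculated from the rounded percentages quoted above, so they are estimates rather than numbers reported by the researchers.

```python
# Illustrative arithmetic on the validation figures quoted in the article.
# All names here are hypothetical; nothing below comes from the papers' own code.

def fraction(part: int, whole: int) -> float:
    """Return part/whole, guarding against a non-positive denominator."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole

# Known base substitutions: 22 of 29 were recovered by sequencing.
known_total = 29
known_found = 22
known_detection_rate = fraction(known_found, known_total)

# Capillary-sequencing validation: the article gives rounded percentages,
# so the confirmed counts below are estimates, not reported values.
coding_tested = 79
genome_wide_tested = 354
coding_confirmed_est = round(0.97 * coding_tested)
genome_wide_confirmed_est = round(0.94 * genome_wide_tested)

print(f"Known substitutions detected: {known_detection_rate:.1%}")
print(f"Coding substitutions confirmed (estimate): {coding_confirmed_est} of {coding_tested}")
print(f"Genome-wide variants confirmed (estimate): {genome_wide_confirmed_est} of {genome_wide_tested}")
```

Under these assumptions, the sequencing recovered roughly three-quarters of the known substitutions, and about 77 of the 79 coding substitutions and about 333 of the 354 genome-wide variants were confirmed.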
In addition, they detected mutational patterns previously associated with carcinogens in tobacco smoke. “The complicated mutational processes, all of which can be traced back to carcinogens, indicate that there is a cocktail of carcinogens that work together to produce the mutations that cause cancer,” said Stratton.
To sequence the genome from the malignant melanoma cell line, the scientists used the Illumina Genome Analyzer II and a paired-end sequencing strategy (see In Sequence 9/22/2009). They constructed short-insert libraries of 200 and 400 base pairs and mate-pair libraries of 2, 3, and 4 kilobases, generated reads 75 base pairs long, and achieved 40-fold coverage of the tumor genome and 32-fold coverage of the normal genome.