July 17, 2009
By Vivien Marx
Although rapid advances in sequencing technology are making it tough for bioinformatics developers to stay ahead of the curve, that isn’t stopping them from trying.
At a recent meeting on algorithms for short-read analysis, several speakers noted how much the field has changed in the past year. Dubbed “Short-SIG,” the special interest group session, held as part of the Intelligent Systems for Molecular Biology conference and European Conference on Computational Biology, highlighted new developments in SNP and structural variation discovery, RNA sequencing, metagenomics, assembly, and statistics.
Some topics that were “completely hot” last year, such as alignment and assembly, are “mostly solved” now, the University of Toronto’s Michael Brudno, a meeting organizer, told BioInform. In the case of short-read mapping, “in a year’s time, we went from having almost no tools out there to having 12 or 13, of which almost all are good or very good,” he said.
While last year’s Short-SIG presentations focused on “basic algorithms” for read mapping and assembly, there were more talks this year on polymorphism detection, said Jens Stoye from the University of Bielefeld, another meeting organizer. This is a sign that “the field is maturing, away from technical problems that need to be solved, towards the biological and medical applications,” he said.
Another change over the last 12 months is the increase in read length that second-generation sequencers can produce. “Short reads are no longer as short as they used to be,” Stoye said.
The Illumina Genome Analyzer, for example, which was generating reads on the order of 30 to 40 base pairs last summer, is expected to reach the 100-base pair mark by the end of 2009 — an improvement that may require next year’s meeting to be called “Mid-SIG,” Brudno said.
But while read length growth is a positive development for end-users, Brudno noted that many bioinformaticians who have been focused on developing short-read algorithms over the last year are finding that they “don’t just scale” to longer reads.
In the meantime, developers have been busy creating new tools for short-read analysis challenges. For example, several groups presented new algorithms for splice junction mapping, which “cannot be handled by ‘classical’ read mappers, where by ‘classical’ I mean those from 2008,” Stoye said.
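To see why a read that crosses a splice junction defeats a mapper that only looks for contiguous matches, consider the toy sketch below. It is not the algorithm used by any tool discussed at the meeting; the sequences, the function names, and the brute-force split search are all made up for illustration, and real mappers also handle mismatches, splice-site signals, and ambiguous placements.

```python
# Toy genome: two exons separated by an intron (all sequences are invented).
exon1 = "ATGGCCTTAGCA"
intron = "GTAAGTCCCTTTTCAG"
exon2 = "GGATTCACGTAA"
genome = exon1 + intron + exon2

# An RNA-seq read from the spliced transcript: end of exon 1 joined to start of exon 2.
read = exon1[-6:] + exon2[:6]


def map_contiguous(read, genome):
    """Return the 0-based start of an exact, ungapped match, or None."""
    pos = genome.find(read)
    return pos if pos != -1 else None


def map_spliced(read, genome, min_anchor=4):
    """Brute-force search for an exact two-piece match separated by a gap.

    Only the first genomic occurrence of each left piece is tried.
    Returns (left_start, gap_start, gap_end) or None.
    """
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        left_start = genome.find(left)
        if left_start == -1:
            continue
        left_end = left_start + split
        # Require at least one skipped base between the two pieces.
        right_start = genome.find(right, left_end + 1)
        if right_start != -1:
            return left_start, left_end, right_start
    return None


print(map_contiguous(read, genome))  # None: the read never appears unbroken in the genome
print(map_spliced(read, genome))     # (6, 12, 28): the gap spans the 16-base intron
```

The contiguous search fails because the intron sits between the two halves of the read in the genome, while the split search recovers both halves and reports the gap between them.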
One such tool is TopHat, a new splice junction mapping algorithm for use with RNA-seq, developed by Steven Salzberg’s group at the University of Maryland’s Center for Bioinformatics and Computational Biology. The scientists reported that it maps close to 2.2 million reads per CPU hour, a pace that will allow processing of an RNA-seq experiment in less than a day on a desktop computer.
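The reported rate translates into run time with simple division. The sketch below uses the 2.2 million reads per CPU hour quoted above; the experiment sizes are hypothetical, chosen only for illustration, and the calculation assumes a single core running at a constant rate.

```python
# Mapping rate quoted in the article.
READS_PER_CPU_HOUR = 2_200_000


def cpu_hours(total_reads, reads_per_cpu_hour=READS_PER_CPU_HOUR):
    """Return the CPU hours needed to map total_reads at a fixed rate."""
    return total_reads / reads_per_cpu_hour


# Hypothetical experiment sizes (not taken from the article).
for total_reads in (10_000_000, 50_000_000):
    print(f"{total_reads:,} reads: {cpu_hours(total_reads):.1f} CPU hours")

# Largest experiment that fits in 24 hours on one core at this rate.
print(f"{24 * READS_PER_CPU_HOUR:,} reads in 24 CPU hours")
```

At this rate, a single core would work through roughly 52.8 million reads in 24 hours, so experiments below that size fit the "less than a day" claim under these assumptions.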
Scientists from the University of Kentucky and the University of North Carolina at Chapel Hill presented another splice junction mapping tool called MapSplice, which they claim is faster than TopHat.