Sequencing Study Highlights Genetic Diversity in Southern Africa

By a GenomeWeb staff reporter

NEW YORK (GenomeWeb News) – An international research team reported in Nature today that it has characterized five human genomes from southern Africa, identifying millions of SNPs never before found in the human population.

The American, African, and Australian researchers sequenced the full genomes of two African individuals: a member of a hunter-gatherer population in the Kalahari Desert known as the Bushmen, San, or Khoisan, and a Bantu individual from South Africa, Nobel Peace Prize winner Archbishop Desmond Tutu. After sequencing the exomes of three other Khoisan men, the team compared all five genomes, identifying more than 1.3 million previously undetected SNPs.

During their subsequent analyses, they not only found genetic differences between southern African populations and populations from other parts of Africa and the world but also within the Khoisan population — findings that may eventually inform everything from studies of human population history and adaptations to agriculture to personalized medicine strategies in southern Africa.

“On average, there are more genetic differences between any two Bushmen in our study than between a European and an Asian,” co-lead author Stephan Schuster, a biochemistry and molecular biology researcher with Pennsylvania State University’s Center for Comparative Genomics and Bioinformatics, said in a statement.

Southern Africa is believed to be the source of modern humans and, consequently, is home to a great deal of human genetic diversity. But despite the decades-long effort to characterize the human genome and human population genetics, most studies have lacked representatives from this region, Schuster explained during a telephone briefing with reporters this morning.

In an effort to get a better sense of the genetic variation within humans, he and his team set out to characterize the genomes of individuals from the Khoisan population, thought to be the oldest modern human population. Schuster described the project at the American Society of Human Genetics meeting last fall, though this paper marks the first publication from the sequencing effort.

Archbishop Tutu, who has ancestry from Sotho-Tswana and Nguni language groups, which represent roughly 90 percent of southern Africans, also participated in the study. Tutu was a good candidate not only because of his ancestry but also because he is known to have survived polio, tuberculosis, and prostate cancer and because he is a voice for southern Africa and indigenous populations, senior author Vanessa Hayes, a cancer genetics researcher at the University of New South Wales, told reporters.

The four Khoisan men who participated in the study all came from different communities in Namibia’s Kalahari Desert. Each was the eldest member of his community.

The team sequenced the genome of a Khoisan man named !Gubi to 10.2 times coverage using paired-end sequencing with the Roche 454 GS FLX Titanium platform. !Gubi is believed to be around 86 years old and lives in the southern Kalahari on the Namibia-Botswana border. They also used a similar approach to sequence the genome of a second Khoisan man named G/aq’o, from a community in the northern Kalahari, to about two times coverage.

Meanwhile, the researchers sequenced Tutu’s genome using the Applied Biosystems SOLiD 3.0 platform, generating sequence covering the genome about 12.3 times.

For the exome sequencing portion of the study, the team captured protein-coding sequences for each of the five individuals with the NimbleGen 2.1 M array and sequenced them by Roche 454 Titanium sequencing.

The genomic and exomic sequences were verified using a range of approaches, including genotyping and whole-genome and exome sequencing with the Illumina platform, which was used to sequence !Gubi’s genome to 23.2 times coverage and Tutu’s genome to 7.2 times coverage.
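The coverage figures quoted above follow the usual definition of average coverage: total sequenced bases divided by genome length. The following is a minimal sketch of that arithmetic, not the study's own calculation. The genome length of roughly 3.1 billion bases is an assumed round figure, and the function names are hypothetical.

```python
# Assumed approximate haploid human genome length in bases.
# This is an illustrative round figure, not a number taken from the study.
GENOME_SIZE_BP = 3_100_000_000


def fold_coverage(total_bases: float, genome_size: int = GENOME_SIZE_BP) -> float:
    """Average number of sequenced bases per genome position."""
    return total_bases / genome_size


def bases_needed(target_coverage: float, genome_size: int = GENOME_SIZE_BP) -> float:
    """Total sequenced bases required to reach a given average coverage."""
    return target_coverage * genome_size


if __name__ == "__main__":
    # Coverage values as reported in the article.
    reported = [("!Gubi, 454", 10.2), ("Tutu, SOLiD", 12.3), ("!Gubi, Illumina", 23.2)]
    for label, coverage in reported:
        billions = bases_needed(coverage) / 1e9
        print(f"{label}: about {billions:.1f} billion bases for {coverage}x coverage")
```

Average coverage says nothing about how evenly reads are spread, so some positions will be covered far less than the quoted figure.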

When the team compared the genomes and exomes to version 18 of the human reference genome and eight previously sequenced personal genomes, they found 1.3 million previously undetected SNPs, including 13,146 new SNPs that alter the amino acid sequence of 7,720 genes.

!Gubi’s genome contained more SNPs than Tutu’s, though both contained more SNPs overall — and more novel SNPs — than any other individual genome sequenced so far.

And from their population-level analyses, the researchers detected at least as many genetic differences between the Khoisan and West African populations as between West African and European populations.

The team’s preliminary peek at the functional role of genes affected by new SNPs in the Khoisan population suggests that these variants tend to fall in genes involved in immune response, reproduction, and sensory perception.

“We believed that because of their extremely long lineage, their genome would be very different,” co-lead author Webb Miller, a researcher at Penn State’s Center for Comparative Genomics and Bioinformatics, told reporters. And, he said, the findings so far support that hypothesis.

This type of genetic diversity within the human genome is believed to have helped humans thrive over thousands of years, Schuster said, though he emphasized that modern human genomes from all around the world still share far more similarities than differences. “We are genetically one healthy species,” he said.

The team believes understanding human genetic diversity in southern Africa will likely be medically important, both for developing personalized medicine in this region and for identifying and understanding the roles of rare variants in human health and disease in general.

“Adding the described variants to current databases will facilitate the inclusion of southern Africans in medical researchers’ efforts, particularly when family and medical histories can be correlated with genome-wide data,” the researchers wrote.

The researchers have already started developing microarrays incorporating the newly identified southern African SNPs. For the next phase of the study, they plan to use these microarrays to genotype hundreds of individuals from southern Africa.

A Novel Method for SNP Identification

In an advance online publication of Genome Research, research out of the Scripps Translational Science Institute describes a novel method for SNP identification, “SNIP-Seq.” SNIP-Seq utilizes population sequence data to detect SNPs and assign genotypes to individuals. The team used data from a region on chromosome 9p21 of the human genome (sequenced in 48 individuals, with five sequenced in duplicate) and found that many of the novel SNPs identified by SNIP-Seq were validated by pooled sequencing data; they were also confirmed by Sanger sequencing. “Collectively, these results suggest that analysis of population sequencing data is a powerful approach for the accurate detection of SNPs and the assignment of genotypes to individual samples,” the team writes.

Also published in advance online this week, Wilfried Haerty and G. Brian Golding of McMaster University, Ontario, describe their discovery of genome-wide evidence for selection acting on single amino acid repeats. Haerty and Golding tested the effect of splicing on the structure of homopolymer sequences. The team discerned a “relationship between alternative splicing and homopolymer sequences with alternatively spliced genes being enriched in number and length of homopolymer sequences.” They also found lower codon density and longer homocodons, which they say suggests a balance connected with the pressures imposed by selection.

This week in Genome Research, researchers at Harvard and MIT propose an improved method for identifying gene interactions using high-dimensional single-cell morphological data from genetic screens, applied in a systematic computational model to RhoGAP/GTPase regulation in Drosophila melanogaster. The team writes that while their model appears to create only mediocre predictions, it represents a vast improvement from alternative methods. “This work demonstrates the fundamental fact that high-throughput morphological data can be used in a systematic, successful fashion to identify genetic interactions and, using additional elementary knowledge of network structure, to infer signaling relations,” they write.

In a methods paper, Andrew Young of the National Human Genome Research Institute and his colleagues describe a novel strategy for de novo genomic assemblies using short sequence reads and reduced representation libraries. Young et al. developed a method to partition the genome prior to assembly by using two independent restriction enzymes to create overlapping fragment libraries ― each containing a manageable subset of the genome. “Together, these libraries allow us to reassemble the entire genome without the need of a reference sequence,” the team writes. In a proof-of-concept study, the team applied their method in assembling the Drosophila genome, and when compared with the reference genome, they deemed their version significantly comparable.
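The partitioning idea described above can be illustrated with a toy in-silico digest. This is only a sketch under stated assumptions: the article does not say which enzymes Young et al. used, so the EcoRI (GAATTC) and HindIII (AAGCTT) recognition sites are stand-ins, the sequence is invented, and the sketch ignores the reverse strand and any size selection. The `digest` helper is hypothetical.

```python
from typing import List


def digest(sequence: str, site: str, cut_offset: int) -> List[str]:
    """Cut `sequence` at every occurrence of `site`, `cut_offset` bases into the site."""
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    # Drop empty fragments that arise when a site sits at the very start.
    return [fragment for fragment in fragments if fragment]


if __name__ == "__main__":
    # Invented toy sequence containing two GAATTC sites and one AAGCTT site.
    toy = "AAAAGAATTCCCCCAAGCTTGGGGGAATTCTTTT"
    library_a = digest(toy, "GAATTC", 1)  # assumed enzyme A: cuts G^AATTC
    library_b = digest(toy, "AAGCTT", 1)  # assumed enzyme B: cuts A^AGCTT
    print("Library A:", library_a)
    print("Library B:", library_b)
    # Fragments from library B span the cut points of library A,
    # which is what lets the two libraries be stitched together.
    print("B fragments spanning an A site:", [f for f in library_b if "GAATTC" in f])
```

Because each enzyme cuts at different positions, every boundary in one library falls inside a fragment of the other, so overlaps between the two sets can in principle link the pieces back into one sequence.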

The Technology Pilot

If you’ve been wondering how the 1,000 Genomes Project is doing, here’s an account from Dan Koboldt at his MassGenomics blog about last week’s meeting at Baylor where participants discussed the “Pilot 3” phase of the project. “Unlike pilots 1 and 2, which emphasized whole genome sequencing to low or high coverage, respectively, in Pilot 3, the exons of 1,000 genes (~1.5 Mbp total) were selectively targeted for sequencing by capture technologies,” Koboldt writes. The team is also checking data across platforms and pipelines. “Overall, the Pilot 3 variant calls are looking good – dbSNP concordances in the 70-80% range or higher, and transition/transversion ratios of about 3-3.50 – and consistent across 454 and Solexa data from multiple centers,” he writes.
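The two quality measures Koboldt mentions can be sketched in a few lines. This is an illustration rather than the project's pipeline: the function names are hypothetical, and "dbSNP concordance" is read here as the fraction of called variants already present in a known-variant set, which is an assumption about his usage.

```python
from typing import Iterable, Set, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """True for purine-to-purine or pyrimidine-to-pyrimidine substitutions."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def ts_tv_ratio(snps: Iterable[Tuple[str, str]]) -> float:
    """Transition/transversion ratio; assumes every (ref, alt) pair is a real substitution."""
    snps = list(snps)
    transitions = sum(1 for ref, alt in snps if is_transition(ref, alt))
    transversions = len(snps) - transitions
    return transitions / transversions if transversions else float("inf")


def known_fraction(called: Set[tuple], known: Set[tuple]) -> float:
    """Fraction of called variant positions that also appear in a known set."""
    return len(called & known) / len(called) if called else 0.0


if __name__ == "__main__":
    # Invented example calls: three transitions and one transversion.
    calls = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C")]
    print("Ts/Tv:", ts_tv_ratio(calls))
    called = {("chr1", 100), ("chr1", 200), ("chr2", 5), ("chr2", 9)}
    known = {("chr1", 100), ("chr2", 5), ("chr2", 9)}
    print("Fraction already known:", known_fraction(called, known))
```

Random errors produce transversions about twice as often as transitions, so a call set whose ratio drifts well below the expected range is one common hint of false positives.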

First genome-wide, single-base-resolution maps

Joe Ecker is senior author on a paper that provides the first genome-wide, single-base-resolution maps of methylated cytosines in a mammalian genome. Comparing human embryonic stem cells and fetal fibroblasts, they found “widespread differences” between the two. Almost one-quarter of all methylation in embryonic stem cells was in a non-CG context, suggesting that embryonic stem cells may use different methylation mechanisms to affect gene regulation, they write.
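For readers unfamiliar with the term, a cytosine's "context" is determined by the bases that follow it. The sketch below uses the conventional CG, CHG, and CHH labels (H meaning A, C, or T), which are standard in the field but not spelled out in the summary above. The function names and example sequences are hypothetical, and only the forward strand is considered.

```python
from typing import Iterable

H_BASES = set("ACT")  # "H" in IUPAC notation: any base except G


def cytosine_context(seq: str, pos: int) -> str:
    """Classify the forward-strand cytosine at `pos` as CG, CHG, CHH, or unknown."""
    seq = seq.upper()
    if seq[pos] != "C":
        raise ValueError("position is not a cytosine")
    following = seq[pos + 1 : pos + 3]
    if following[:1] == "G":
        return "CG"
    if len(following) == 2 and following[0] in H_BASES:
        if following[1] == "G":
            return "CHG"
        if following[1] in H_BASES:
            return "CHH"
    # Too close to the end of the sequence, or an ambiguous base such as N.
    return "unknown"


def non_cg_fraction(contexts: Iterable[str]) -> float:
    """Share of classified methylated cytosines that are not in a CG context."""
    classified = [c for c in contexts if c != "unknown"]
    if not classified:
        return 0.0
    return sum(1 for c in classified if c != "CG") / len(classified)


if __name__ == "__main__":
    print(cytosine_context("ACGT", 1), cytosine_context("ACAGT", 1), cytosine_context("ACATT", 1))
    print(non_cg_fraction(["CG", "CG", "CG", "CHH"]))
```

A non-CG fraction near 0.25 would correspond to the "almost one-quarter" reported for embryonic stem cells.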

Several opinion pieces appear this week. One checks in with experts and their concerns regarding the stimulus grants, another looks at the open-source Polymath Project, while a third by Cameron Neylon examines the potential of Google’s open-source collaboration tool, Google Wave.

A special insight section explores the changing landscape of neuroscience research. Says an editorial, “The experimental landscape has changed markedly over the past few years, given the technological advances in molecular genetics, optogenetics and functional imaging.” Articles cover molecular genetics and imaging technologies for circuit-based neuroanatomy, neuroscience and systems biology, and multimodal techniques for diagnosing Alzheimer’s disease.

Research led by Joel Levine, a neuroscientist from the University of Toronto, has determined that Drosophila melanogaster flies use a single chemical to communicate gender and sibling identity in order to pick the right sex partners. By inserting a transgene into the fly’s genome that killed cells that produced these special hydrocarbon signaling chemicals, they report that hydrocarbon-free male flies attempted copulating with each other, says a story at the BBC. Check out the accompanying video, too.

GWAS and Differences in DNA Between Tissues

Posted by Bob Grant
[Entry posted at 20th July 2009 04:52 PM GMT]
www.the-scientist.com/blog
Recent findings may spell trouble for genome-wide association studies based on DNA obtained through blood samples: Genetic material may vary between blood cells and other tissues in a single individual, a study in the July issue of Human Mutation reports.

The study “raises a very interesting question,” Howard Edenberg, director of the Indiana University School of Medicine’s center for medical genomics, told The Scientist. Many genome-wide association studies — especially studies on systemic diseases such as diabetes and atherosclerosis — depend solely upon DNA harvested from blood samples to identify genes associated with medical conditions. But this study “suggests that looking only at blood, you may miss some things.”

Searching for the genes behind a fatal condition called abdominal aortic aneurysm (AAA), researchers from McGill University in Montreal found that complementary DNA from diseased abdominal aortic tissue did not match genomic DNA from leukocytes in blood from the same patient. “We did not expect to find a difference in the tissue [genes] compared to the leukocyte [genes],” said endocrinologist Morris Schweitzer, who led the study.

Schweitzer and his team uncovered three single nucleotide polymorphisms (SNPs) in samples of diseased tissue from 31 AAA patients that were not present in matching blood samples. They also tested five aortic and blood samples from normal individuals and found the same discrepancy. Schweitzer said that the apparent genetic difference between different cells in the body may cast some doubt on genome-wide association studies that only use DNA from blood samples to infer disease states. “I think they may not be accurate because they might not reflect what’s in the tissue,” he said, adding that researchers should look upon such genetic results “very carefully and very trepidatiously.”

Edenberg, who was not involved with the study but who conducts genome-wide association studies to explore the genetic roots of alcoholism and bipolar disorder, said that while the findings are interesting, they are very preliminary. “If they’re correct about this, and there are these genomic differences between tissues and blood at certain alleles, then we’re missing some things,” he said. Edenberg explained that experimenters generally take into account that such studies are somewhat “underpowered” in terms of their ability to catch every genetic indicator of disease. Schweitzer’s results, he noted, may add another layer to this consideration, but do not suggest that genome-wide association studies would turn up false positives, or blood-based genes mistakenly attributed to a particular disease.

Sudha Seshadri, a Boston University neurologist who was not involved in the study, told The Scientist that though the McGill group’s results are important, they do not negate genome-wide association data that scientists have already gathered. “I don’t think [the study] says much about the usefulness or validity of genome-wide association studies as they are being done in cohorts around the world.” Genome-wide studies on diabetes, for example, have identified about 16 genes that are related (in varying degrees) to the disease, said Seshadri, who collaborates on the Framingham Heart Study, a six-decade longitudinal study on more than 5,000 people that has more recently included genomic data.

“I think I would have suggested a few more experiments, personally,” Edenberg added. In particular, he pointed to the fact that the McGill researchers were comparing complementary DNA from aortic tissue to genomic DNA from blood. “At the moment,” he said, the discrepancy “seems relatively compatible with RNA editing [rather] than with a genomic issue.” The study should have compared genomic DNA from the aortic tissues with the genomic blood DNA, and cDNA from both cell types, Edenberg said.

Schweitzer said his group is currently working on this experiment and “should have results probably in a couple of weeks.” He noted that differences between tissue and blood DNA may account for the relatively low levels of association turned up by most genome-wide association studies. Of all the genome-wide association studies that have been conducted, he said, “No one has really found that one miracle gene that really points to something.”

Seshadri, however, said it’s hasty to dismiss the value of such studies. “I think [the authors] make some provocative statements that express a viewpoint, but not a widely-accepted viewpoint,” she said. “It’s far too early in the process of genome-wide association studies to conclude that they have not been fruitful.”

What can DNA tell us? Place your bets now

* 08 July 2009 by Lewis Wolpert and Rupert Sheldrake
* Magazine issue 2716

From Newton to Hawking, scientists love wagers. Now Lewis Wolpert has bet Rupert Sheldrake a case of fine port that: “By 1 May 2029, given the genome of a fertilised egg of an animal or plant, we will be able to predict in at least one case all the details of the organism that develops from it, including any abnormalities.” If the outcome isn’t obvious, then the Royal Society will be asked to adjudicate. Watch this space…

Lewis Wolpert

I HAVE entered into this wager with Rupert Sheldrake because of my interest in the details of how embryos develop, and how our understanding of this process will progress. In my latest book, How We Live and Why We Die, I suggest that it will one day be possible to predict from an embryo’s genome how it will develop, and I believe it is possible for this to happen in the next 20 years.

I am, in fact, being a little over-keen because 40 years is a more likely time frame for such a breakthrough. Cells and embryos are extremely complicated: for their size, embryonic cells are the most complex structures in the universe.

Animals develop from a single cell, a fertilised egg, which divides to produce cells that will form the embryo. How that egg develops into an embryo and newborn animal is controlled by genes in the chromosomes. These genes are passive: they do nothing, just provide the code for proteins. It is proteins that determine how cells behave. While the DNA in every cell contains the code for all the proteins in all the cells, it is the particular proteins produced in particular cells that determine how those cells behave.

Every cell of the embryo contains many copies of several thousand different proteins. These proteins have a plethora of functions: acting as enzymes to break down and build other molecules, providing structures for the cell, interacting with each other, and many more. The complexity of the interactions between millions of molecules is amazing.

As the proteins determine how the cells behave, it is their activity that causes the embryo to develop. Underlying this process, though, are the genes, as they control which proteins are made – including some proteins that activate specific genes. It is essential that there is this control over which cells continue to divide, and of mechanisms to pattern the embryo so that different cells develop into different structures, such as the brain or limbs.

There is a huge incentive to understand these processes and so be able to work out the development of an embryo given only its genome. This ability could pave the way for regenerative medicine by allowing scientists to program stem cells to become structures that could replace damaged parts of the body.

To win the bet, we will have to be able to predict the behaviour of almost all the cells in the embryo. In a small worm, say the nematode Caenorhabditis elegans, there are 959 cells, making it the ideal model to solve this problem. It is a major challenge, but advances in cell biology, systems biology and computing will take us there.

Rupert Sheldrake

LEWIS WOLPERT’s faith in the predictive power of the genome is misplaced. Genes enable organisms to make proteins, but do not contain programs or blueprints, or explain the development of embryos.

The problems begin with proteins. Genes code for the linear sequences of amino acids in proteins, which then fold up into complex three-dimensional forms. Wolpert’s wager presupposes that the folding of proteins can be computed from first principles, given the sequence of amino acids specified by the genes. So far, this has proved impossible. As in all bottom-up calculations, there is a combinatorial explosion. For example, by random folding, the amino-acid chain of the enzyme ribonuclease, a small protein, could adopt more than 10^40 different shapes, which would take billions of years to explore. In fact, it folds into its habitual form in 2 minutes.
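The scale of this search problem can be checked with back-of-the-envelope arithmetic. The sketch below takes the number of shapes as ten to the fortieth power and assumes a sampling rate of ten trillion conformations per second; that rate is purely an illustrative assumption, not a figure from the essay, and the function name is hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # about 31.6 million seconds


def exhaustive_search_years(n_conformations: int, samples_per_second: int) -> float:
    """Years needed to visit every conformation once at a fixed sampling rate."""
    return n_conformations / samples_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    shapes = 10**40          # number of shapes, as cited in the essay
    assumed_rate = 10**13    # conformations per second (illustrative assumption)
    years = exhaustive_search_years(shapes, assumed_rate)
    print(f"Exhaustive search would take about {years:.2e} years")
    print(f"Observed folding time, for comparison: {2 * 60} seconds")
```

Under these assumptions the exhaustive search takes on the order of ten to the nineteenth years, so the gap between random search and the observed two minutes holds for any plausible sampling rate.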

Even if we could solve protein-folding, the next stage would be to predict the structure of cells on the basis of the interactions of millions of proteins and other molecules. This would unleash a far worse combinatorial explosion, with more possible arrangements than all the atoms in the universe.

Random molecular permutations simply cannot explain how organisms work. Instead, cells, tissues and organs develop in a modular manner, shaped by morphogenetic fields, first recognised by developmental biologists in the 1920s. Wolpert himself acknowledges the importance of such fields. Among biologists, he is best known for “positional information”, by which cells “know” where they are within the field of a developing organ, such as a limb. But he believes morphogenetic fields can be reduced to standard chemistry and physics. I disagree. I believe these fields have organising abilities, or systems properties, that involve new scientific principles.


UK Academy Offers Recommendations to ‘Fulfill Promise’ of GWAS

By a GenomeWeb staff reporter

July 09, 2009

NEW YORK (GenomeWeb News) – The UK’s Academy of Medical Sciences is calling for greater investment into genome-wide association studies and has offered several recommendations for ways to ensure that GWAS findings will ultimately benefit patients.

In a report released yesterday, AMS noted that “despite the many successes and exciting potential of GWA studies, there is considerable scope to further capitalize on the opportunities and secure real benefits for healthcare,” though it adds that “fulfilling this promise will take time.”

AMS is a network of biomedical scientists from commercial and academic organizations in the UK. The report’s recommendations are drawn from an October 2008 meeting on GWAS that was supported by GlaxoSmithKline.

A key challenge for GWAS, the report notes, is “moving from a statistical indication that a gene variant or region of DNA is involved in a disease, to locating and identifying causal variants and the associated biological pathways.” This challenge “can only be met by greater integration between three historically distinct approaches to disease causality: genetic mapping, epidemiology and studies of pathophysiological mechanisms.”

In order to move knowledge gained from GWAS closer to medical use, AMS said, there is a need to identify additional factors that contribute to genetic variance, including the role of SNPs, CNVs, and epigenetics.

This will require the collection of samples from diverse populations for multiple diseases that have “some commonality of clinical datasets, patient consent and data access arrangements,” the group added.

AMS also sees a case for integrated epigenetic GWA studies that would combine sequence-based quantitative genetics and epigenome dynamics. “Crucial to this goal” will be initiatives that develop high-resolution reference epigenome maps such as the Alliance for the Human Epigenome and Disease, the report said.

The group also noted that researchers must have access to high-quality data from prospective studies as well as access to population-based samples such as the UK Biobank and disease registries.

There should also be further investment in bioinformatics and statistical methods to interpret sequence data, and tools must be developed to assess gene-gene and gene-environment effects and clinical endpoints, AMS advised.

AMS also pointed to a need for studies of differences in gene expression using diverse tissue types, and it urged the development of improved in vivo and in vitro models for assessing human causal variants.

The aim is to begin to translate the “wave of genetic findings” on common diseases into improved diagnostics, preventions, and treatments.

GWA studies also could contribute more to understanding genetic variation if their results were transferred to other populations, if re-sequencing studies were conducted to find rare variants, and if there were more cohort studies, AMS advised.

“Continued investment will be needed to translate new knowledge into benefits for patients and to ensure that the UK maintains a leading international position in this exciting area,” John Bell, president of the AMS, said in a statement.
