Human and chimp genome sequencing

Can someone comment on this article, which claims that biologists have been misleading the masses when they compare the genomes of humans and chimps?

I can comment.

From your article:

Is 99% human-chimp genome similarity less impressive in light of the fact that domestic cats share 90% of their genes with humans and yeast share over 30% of their genes with us, etc.? What should we make of these various quantitative comparisons? In reality, it is difficult to make sense of these percentages without a uniform metric to reference. Unfortunately, the biological sciences do not provide one.

The public has not been misled. The fact is that chimps and humans share 99% of their genes. The rest is interpretation. The rest is extrapolation. The rest is philosophy.

As a biologist, I find it no surprise at all that we share a lot of our genome with cats, earthworms and even yeast. We do, presumably, because successful life on planet Earth requires particular biological processes, and those processes are shared. We live in a nitrogen-oxygen atmosphere, we have carbon-based energy sources, and we need similar enzymes to use oxygen and carbon, etc.

Beyond that, scientists have opinions they share, but no one is forcing you to believe a particular interpretation. I happen to believe in what makes the most sense; you can believe in what makes less sense if you prefer. Have there been misrepresentations? Absolutely, but that is often because the journalists relaying the information misunderstand it. One of the most misunderstood papers I've seen is "Sex Differences in Brain Gray and White Matter in Healthy Young Adults: Correlations with Cognitive Performance" by Gur et al. It has been repeatedly and widely misrepresented in the lay press, partly (or perhaps wholly) because it is not a particularly well-written paper, though the science in it is sound.

Don't assign blame for any belief to science just because you want it to say something different.

Well, Science is a funny thing compared to, say, the Koran, the Bible or any other religious text. As scientists learn more, the narrative in Science changes. For instance, in the 1800s there were ideas that there was free-flowing water on Mars: channels and ditches and such. In the 1960s, Mars was considered bone dry. No water ever. A desolate wasteland. In the 1990s, Mars was considered to have had water, but only in the distant past, 3 billion years ago, and no water now. Then in the 2010s it came to be considered to have subsurface water ice that may occasionally turn into free-flowing water under rare conditions. And worse yet, more recently it has been suggested that perhaps all those features of recent running water may have been caused by outgassing of CO2. There is conflict over basic interpretation.

Now, in that time, religious texts, while silent on the matter of Mars, have not changed much, if at all. Religious texts have stayed the same.

So has Science been misleading the masses on Mars? Does Mars have water? Yes or no? Science keeps flip-flopping.

In some ways, yes, Science has been misleading the public. It has not been on target with a singular message, unlike, say, a religious text, which has been delivering the same unchanging message for a thousand years or so.

So if you are looking for a consistent message, well, Science may not be the thing for you. Science tends to tell different stories as scientists learn more, learn new techniques, re-evaluate ideas and generally fight among themselves. The holy men of Science like to review and rewrite their holy text rather often, and to fight. The common man may find that dissatisfying and perhaps worrisome. If the holy men keep changing the holy text, how will the common people worship? Is Science real, given that it changes when the holy texts of many world religions have not changed in centuries or even millennia?

So, back to the article: the percentage similarity of the human to the chimp genome changes as techniques improve. Look at the actual age of the citations used in the article. They run from papers in the 1970s, before genome sequencing was available, when the idea of one gene-one protein was how Science thought DNA worked, and karyotyping and chromosome banding were about the only means of making large-scale chromosome comparisons, to papers from 2014, when we know that DNA does more than just encode proteins, and that noncoding DNA is not junk but can encode RNA with biochemical activity (gene regulation, transcription factor regulation, etc.). Whole human genomes can now be sequenced for less than USD 1,400, when the first one originally cost USD 2.7 billion.

This is nearly 50 years of change and development, going from a time before the internet, when calculators were serious business machines worth thousands of dollars, to a time when you can have a 1980s supercomputer as a wristwatch.

So yes, the message changes. Science progresses, and techniques that could only be imagined by previous generations of scientists have become routine. We have learned more: RNA isn't just a middleman between DNA and protein, and one gene can give rise to many proteins.

Have the masses been misled?

Scientists are telling the masses what is, to the best of their knowledge, accurate. If the current generation of scientists (2014 scientists) says the past generation (1970s scientists) had it wrong or not quite right, and if scientists right now disagree on the best metric, is that misleading?

A science book is not as unchanging as a religious text, and one of the worst offenders is biology. A biology textbook from the 1980s looks very different from one from the 2010s, even if they are from the same series. Is that misleading, or is that scientific advancement?

PS: Common genes and sequence identity are not the same thing, and there is a bit of confusion about this in the article. A cat and a human may share many, many genes, but those shared genes may have sequence differences between them. So you can say humans and cats share 90% (made-up example) of their genes (i.e. both cats and humans have the PTEN gene) but have only 70% (made-up example) sequence identity (i.e. there are several DNA differences between the human PTEN gene and the cat PTEN gene). It is easy to confuse the two.
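To make the distinction concrete, here is a minimal Python sketch. The gene names and sequences are made up, in the same spirit as the made-up percentages above, and the identity function assumes the two sequences have already been aligned to equal length with no gaps.

```python
# Hypothetical data only: gene names and sequences are made up to show the
# difference between "sharing a gene" and "sequence identity within that gene".

def fraction_shared(genes_a: set[str], genes_b: set[str]) -> float:
    """Fraction of species A's genes that have a counterpart in species B."""
    return len(genes_a & genes_b) / len(genes_a)

def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of matching positions between two pre-aligned, gap-free sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must already be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)

species_1 = {"geneA", "geneB", "geneC", "geneD", "geneE"}
species_2 = {"geneA", "geneB", "geneC", "geneD", "geneF"}
print(fraction_shared(species_1, species_2))         # 0.8: four of five genes shared

# The same shared gene can still differ at the sequence level.
print(percent_identity("ATGCCGTTAA", "ATGACGATGA"))   # 70.0
```

Both species "have" geneA through geneD, yet within one such gene only 70% of positions match: two different numbers answering two different questions.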

Also, are you only considering protein-coding DNA? Or are you also considering noncoding DNA (which, until cheap genome sequencing in the 2010s, was extremely difficult to capture and sequence)?

How similar are human and chimpanzee genomes?

I recently participated in a discussion on the Biologos forum on the degree of similarity between the human and chimpanzee genomes. I was asked for my current view on this issue by Dennis Venema, who had found an old quote online from a newspaper article that I had written in 2008 on this issue. In 2008, in a couple of newspaper articles, I did some simple calculations based on the 2005 Chimpanzee genome paper. On the basis of these, I had come to the surprising conclusion that these data suggested that the human and chimpanzee genomes in their entirety could be only 70% identical. Dennis Venema asked me if this was still my view. You can read the whole discussion here. It is rather long, with lots of tangential contributions. If you want a quick summary of my perspective, here is my final closing statement (which I originally posted here):

“How similar are the human and chimpanzee genomes?” is a relatively straightforward scientific question. We are hindered by the still somewhat incomplete nature of both the human and the chimpanzee reference genome assemblies, but we can make this clear in our assessments and allow for the uncertainties that it raises.

The best way to assess the similarity of two genomes is to take complete genome assemblies of both species, that have been assembled independently, and align them together. The alignment process involves searching the contents of the two genomes against each other. Parts of both genomes that are too different to match one another will be absent from the alignment, unless they are very short, in which case they will be included as “indels” (longer indels, even if they have well characterised flanking sequences, will be absent from the alignment). Within parts that do align, there will be some mismatches between the two genomes, where one or a few nucleotides differ, which in this discussion we have been calling “SNPs”. In addition there will be some parts of each genome that are present twice or multiple times in one genome and are present fewer times in the other genome. We have referred to these as “paralogs” or “copy number variants” (CNVs). To come up with an accurate figure of the similarity of the entirety of two genomes, we need to take into account all these types of difference.
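The per-column bookkeeping inside an aligned region can be sketched as follows. This is a simplified illustration, not how LASTZ or UCSC compute their statistics: the two rows are hypothetical and assumed to be already aligned, with `-` marking gaps. CNVs and non-aligning regions cannot be detected from a single block like this, because they require genome-wide context.

```python
from collections import Counter

def classify_columns(row_a: str, row_b: str) -> Counter:
    """Tally match / SNP / indel columns in one pairwise alignment block.
    Assumes both rows are already aligned and use '-' for gaps."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    counts = Counter()
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            counts["indel"] += 1   # base present in only one of the two genomes
        elif a == b:
            counts["match"] += 1
        else:
            counts["snp"] += 1     # single-nucleotide mismatch
    return counts

# Hypothetical aligned rows (not real human or chimpanzee sequence).
human = "ACGTTAGC--ATTGCA"
chimp = "ACGATAGCGGATTG-A"
print(classify_columns(human, chimp))  # 12 match, 1 snp, 3 indel columns
```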

For some purposes, when talking about the similarity between two genomes we may want to just focus on one type of difference, such as SNPs. If we do this, we should always specify which types of difference we have and have not taken into account. The most well-known estimates for the similarity of the human and chimpanzee genomes only take into account SNPs and small indels. Copy number variants are less often included, and regions of the two genomes that do not align are commonly ignored.

When assessing the total similarity of the human genome to the chimp genome, we also need to bear in mind that roughly 5% of the human genome has not been fully assembled yet, so the best we can do for that 5% is predict how similar it will be to the chimpanzee genome. We do not yet know for sure. The chimpanzee genome assembly is less well assembled, so in future we may assemble parts of the chimpanzee genome that are similar to the human genome – this is another source of uncertainty to keep in mind.

To come up with the most accurate current assessment that I could of the similarity of the human and chimpanzee genome, I downloaded from the UCSC genomics website the latest alignments (made using the LASTZ software) between the human and chimpanzee genome assemblies, hg38 and PanTro6. See discussion post #35 for details. This gave the following for the human genome:

4.06% had no alignment to the chimp assembly
5.18% was in CNVs relative to chimp
1.12% differed due to SNPs in the one-to-one best aligned regions
0.28% differed due to indels within the one-to-one best aligned regions

The percentage of nucleotides in the human genome that had one-to-one exact matches in the chimpanzee genome was 84.38%.
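As a sanity check, the four categories and the exact-match figure above do not sum to 100%. The sketch below computes the remainder, under our own assumption that all the percentages are fractions of the same whole-human-genome denominator. The roughly 5% left over is close to the ~5% of the human genome described above as not yet fully assembled; that is one plausible reading, not something the post states explicitly.

```python
# Assumption (ours, not stated in the post): every percentage below uses the
# whole human genome as its denominator, so the remainder is sequence that
# falls into none of the listed categories.

def unaccounted(percentages: dict[str, float], exact_match: float) -> float:
    return 100.0 - sum(percentages.values()) - exact_match

pantro6 = {
    "no alignment": 4.06,
    "CNV": 5.18,
    "SNP": 1.12,
    "indel": 0.28,
}
print(round(unaccounted(pantro6, 84.38), 2))  # 4.98
```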

In order to assess how improvements in genome assemblies can change these figures, I did the same analyses on the alignment of the older PanTro4 assembly against hg38 (see discussion post #40). The PanTro4 assembly was based on a much smaller amount of sequencing than the PanTro6 assembly (see discussion post #39). In this PanTro4 alignment:

6.29% had no alignment to the chimp assembly
5.01% was in CNVs relative to chimp
1.11% differed due to SNPs in the one-to-one best aligned regions
0.28% differed due to indels within the one-to-one best aligned regions

The percentage of nucleotides in the human genome that had one-to-one exact matches in the chimpanzee genome was 82.34%.

Thus the large improvement in the chimpanzee genome assembly between PanTro4 and PanTro6 has led to an increase in CNVs detected, and a decrease in the non-aligning regions. It has only increased the one-to-one exact matches from 82.34% to 84.38% even though the chimpanzee genome assembly is at least 8% more complete (I think) in PanTro6.
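The change between the two alignments can be tabulated category by category. This sketch simply subtracts the PanTro4 figures from the PanTro6 figures quoted above; the dictionary keys are our own labels.

```python
# Figures quoted above for hg38 aligned against two chimpanzee assemblies
# (percent of the human genome). Dictionary keys are our own labels.
pantro4 = {"no alignment": 6.29, "CNV": 5.01, "SNP": 1.11, "indel": 0.28, "exact match": 82.34}
pantro6 = {"no alignment": 4.06, "CNV": 5.18, "SNP": 1.12, "indel": 0.28, "exact match": 84.38}

def deltas(old: dict[str, float], new: dict[str, float]) -> dict[str, float]:
    """Change per category in percentage points, rounded to 2 decimals."""
    return {k: round(new[k] - old[k], 2) for k in old}

for category, change in deltas(pantro4, pantro6).items():
    print(f"{category:>12}: {change:+.2f} points")
```

On the quoted numbers, the ~2.2-point drop in non-aligning sequence shows up almost entirely as a ~2.0-point rise in exact matches, with CNVs up only ~0.2 points and SNPs and indels essentially unchanged.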

The PanTro4 assembly has also been aligned to the human genome using the software MUMmer4 (reported in: Marçais, Guillaume, et al. “MUMmer4: A fast and versatile genome alignment system.” PLoS Computational Biology 14.1 (2018): e1005944). This method gives broadly similar figures to my analyses of the UCSC LASTZ alignments. MUMmer places 2.782 Gb of the sequence in mutual best alignments, and the total length of the LASTZ alignment is 2.761 Gb. In the MUMmer analysis approximately 306 Mb (9.91%) of the human sequence did not align to the chimpanzee sequence in mutual best alignments. This fits well with the LASTZ result of 6.29% non-aligning plus 5.01% CNV = 11.30% not aligning. Overall, the MUMmer software has been slightly more generous in aligning the human and chimp genomes, but as Steve Schaffner has pointed out [here], MUMmer is giving a higher estimate of SNP differences within its alignments. This is probably a signal that it has over-aligned the two genomes and some of its alignments are spurious. Thus I think we are best off trusting the LASTZ alignment over the MUMmer alignment, though the difference between the results of the two methods is rather small.
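The MUMmer figures can be cross-checked with a little arithmetic. Assuming the 306 Mb and the 9.91% refer to the same human-sequence denominator (our assumption), the sketch below backs out that implied genome size and then converts the gap between the two methods (11.30% vs 9.91%) into megabases.

```python
def implied_total_mb(part_mb: float, part_pct: float) -> float:
    """Whole-genome size implied by 'part_mb is part_pct percent of it'."""
    return part_mb / (part_pct / 100.0)

total = implied_total_mb(306, 9.91)         # MUMmer: 306 Mb = 9.91% not aligned
lastz_not_aligning_pct = 6.29 + 5.01        # LASTZ, PanTro4 vs hg38: non-aligning + CNV
gap_mb = (lastz_not_aligning_pct - 9.91) / 100.0 * total

print(f"implied genome size: {total / 1000:.2f} Gb")
print(f"LASTZ leaves about {gap_mb:.0f} Mb more unaligned than MUMmer")
```

On these assumptions the implied denominator is about 3.09 Gb, and the LASTZ alignment leaves roughly 43 Mb more human sequence outside its mutual best alignments than MUMmer does, consistent with the difference being described as rather small.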

As 5% of the human genome is still unassembled, and 5% seems to be CNVs relative to chimp, and 4% is unaligned to the chimp genome, I cannot agree with Dennis Venema [here] and Steve Schaffner [here] that “95% is the best estimate we have for the genome-wide identity of chimps and humans”. I would accept 95% as a prediction, but not as a statement of established fact.

I predict that the 95% figure will prove to be wrong, because (on the basis of my comparison of the PanTro4 and PanTro6 alignments to Hg38) I think that the CNV differences are here to stay, and I doubt that all of the currently unaligned or unsequenced regions of the human genome will prove to all be 95% the same as the chimpanzee genome. Some of the “unaligned” human sequences are medium-sized indels, and it is hard to see why they would not have been assembled in the chimp if they were present. I also expect at least some of these unaligned or unsequenced sequences to be rapidly evolving.

In 2008 I wrote “I predict that when we have a reliable, complete chimpanzee genome, the overall similarity of the human genome will prove to be close to 70% (and very far from 99%).” This prediction is not borne out by the more recent data above. I made a mistake in my 2008 calculations in the way in which I dealt with CNVs, which put me out by 2.7%, but this was only a minor component of why my estimate was so low. The main reason why my estimate was so low was because I thought that the 2005 chimpanzee genome assembly was far more complete than it actually was. This was because the authors claimed in the main text of the chimpanzee paper that “the draft genome assembly…covers 94% of the chimpanzee genome with >98% of the sequence in high-quality bases.” Thanks to discussion in this thread with Steve Schaffner (see post #62, #63 and others in the discussion), who was one of the authors of the 2005 chimp genome paper, I can now see that the 2005 draft genome assembly was not as good as this claim suggested. However, in 2008 I did not know this, and my prediction was made in good faith on the basis of my understanding of the 2005 paper.

The macaque and human evolution

One of the hopes and justifications for sequencing the chimpanzee genome was that it would allow us to identify the genetic changes 'that make us human'. Once chimpanzee genome sequences started to become available, papers quickly appeared searching for unique genetic changes along the human lineage after we separated from chimpanzees. In the absence of other primate genome sequences, the mouse was used for comparison with chimpanzee and human [3]. However, because the mouse and primate lineages diverged relatively deeply, at least 70 million years ago, so many changes could have occurred either along the mouse lineage or on the long branch leading to the common ancestor of humans and chimpanzees that we cannot estimate with much confidence what nucleotide was present at any position in that ancestor. Thus, we were not able to reasonably estimate whether a given difference between the chimp and human genomes had occurred in the human lineage or in the chimpanzee lineage (Figure 1). Using the macaque genome as a comparison, however, we can now place changes on a lineage far more reliably, because the probability of convergent changes is much smaller than with the mouse.

Figure 1. The macaque is a better outgroup than the mouse for inferring the history of sequence changes in human and chimpanzee genomes. (a) The scaled phylogeny of primates with respect to the mouse. Over long evolutionary periods, multiple mutations are likely to occur at the same position in the genome, obscuring that base's true evolutionary history. This is indicated here by the change of the initial T to a C and later to an A in the mouse genome, and the change from the T to a G in the primate line, and later to an A in the chimpanzee line only. (b) If a distantly related species (the mouse) is used as the outgroup in a comparison of the human and chimpanzee genomes, this can lead to the mistaken conclusion that a unique mutation has occurred along the human lineage, as demonstrated in the diagram on the left. When the genomes are compared using a more closely related outgroup (the macaque) the more probable history of this difference is revealed, as shown in the diagram on the right.
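The logic of the figure can be written as a simple parsimony rule for one aligned position. This is a minimal sketch assuming at most one change on the human and chimpanzee branches and that the outgroup base reflects the ancestral state; that assumption is exactly what fails for a distant outgroup such as the mouse. Following the figure, we assume the macaque carries the G inherited by the primate line.

```python
def lineage_of_change(human: str, chimp: str, outgroup: str) -> str:
    """Parsimony call for a single aligned position.
    Assumes at most one change since the human-chimp split and that the
    outgroup base equals the ancestral base."""
    if human == chimp:
        return "no human-chimp difference"
    if outgroup == chimp:
        return "human lineage"
    if outgroup == human:
        return "chimpanzee lineage"
    return "ambiguous"

# Figure scenario: human G, chimp A.
print(lineage_of_change("G", "A", outgroup="A"))  # mouse (A): wrongly calls a human change
print(lineage_of_change("G", "A", outgroup="G"))  # macaque (G): chimpanzee lineage
```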

Screens for positively selected changes between chimpanzees and humans using the mouse genome as an outgroup initially suggested that selected changes were more numerous in the human lineage than in the chimp lineage [3]. Other studies found a possibly accelerated rate of change in conserved noncoding regions in the human lineage [4]. These observations were readily accepted, in part because they supported our naturally anthropocentric view that humans are special and so there should be a molecular signature of our uniqueness. More recent analyses using the more closely related macaque as the outgroup suggest, however, that a greater number of positively selected changes has in fact occurred along the chimpanzee lineage, leaving humans as the more 'primitive' species from a genomic standpoint [5, 6]. This is somewhat surprising, given that overall the skeletons of our 5-6-million-year-old ancestors look remarkably chimpanzee-like. With respect to our extremely large and complex brain, studies using the mouse as outgroup proposed an accelerated rate of evolution in nervous-system genes in humans [7]. But, perhaps no longer surprisingly, a recent study using the macaque genome for comparison showed that even genes expressed specifically in the brain were found to be under no greater selection in humans than in chimpanzees [8]. Thus, we still do not know the molecular basis for the evolution of the uniquely large human brain.

The macaque genome has also benefited our understanding of the human genome in other ways. For example, the method called 'phylogenetic shadowing' involves comparing DNA sequences across multiple species to reveal conserved sequence blocks. Such conserved regions may be putative exons, regulatory elements, or otherwise functionally significant [9, 10]. By comparing sequences of closely related species (for example, among primates, rather than distantly related animals), the rare changes within these 'least variable' regions may highlight the critical mutations that make a species unique.
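Phylogenetic shadowing in practice uses phylogeny-aware models of sequence variation. As a much cruder stand-in, the sketch below scans a hypothetical alignment of closely related sequences and reports runs of columns that are identical in every sequence; the `min_len` threshold and the sequences themselves are illustrative assumptions.

```python
def conserved_blocks(alignment: list[str], min_len: int = 4) -> list[tuple[int, int]]:
    """Half-open (start, end) column ranges where every sequence has the same
    base (gaps never count as conserved), keeping runs of at least min_len."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must be aligned to the same length")
    blocks, start = [], None
    for i in range(length):
        column = {seq[i] for seq in alignment}
        identical = len(column) == 1 and "-" not in column
        if identical and start is None:
            start = i
        elif not identical and start is not None:
            if i - start >= min_len:
                blocks.append((start, i))
            start = None
    if start is not None and length - start >= min_len:
        blocks.append((start, length))
    return blocks

# Hypothetical aligned sequences from four closely related species.
toy_alignment = [
    "ACGTACGTTTGCAAGT",
    "ACGTACGATTGCAAGC",
    "ACGTACGTTTGCTAGT",
    "ACGTACGCTTGCAAGA",
]
print(conserved_blocks(toy_alignment))
```

For the toy alignment, the function returns the blocks (0, 7) and (8, 12); the short identical run at columns 13-14 falls below the threshold.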

DNA analysis for chimpanzees and humans reveals striking differences in genes for smell, metabolism and hearing

Nearly 99 percent alike in genetic makeup, chimpanzees and humans might be even more similar were it not for what researchers call "lifestyle" changes in the 6 million years that separate us from a common ancestor. Specifically, two key differences are how humans and chimps perceive smells and what we eat.

A massive gene-comparison project involving two Cornell University scientists, and reported in the latest issue of the journal Science (Dec. 12, 2003), found these and many other differences in a search for evidence of accelerated evolution and positive selection in the genetic history of humans and chimps.

In the most comprehensive comparison to date of the genetic differences between two primates, the genomic analysts found evidence of positive selection in genes involved in olfaction, or the ability to sense and process information about odors. "Human and chimpanzee sequences are so similar, we were not sure that this kind of analysis would be informative," says evolutionary geneticist Andrew G. Clark, Cornell professor of molecular biology and genetics. "But we found hundreds of genes showing a pattern of sequence change consistent with adaptive evolution occurring in human ancestors." Those genes are involved in the sense of smell, in digestion, in long-bone growth, in hairiness and in hearing. "It is a treasure-trove of ideas to test by more careful comparison of human and chimpanzee development and physiology," Clark says.

The DNA sequencing of the chimpanzee was performed by Celera Genomics, in Rockville, Md., as part of a larger study of human variation headed by company researchers Michele Cargill and Mark Adams.

Celera generated some 18 million DNA sequence "reads," or about two-thirds as many as were required for the first sequencing of the human genome. Statistical modeling and computation was done by Clark and by Rasmus Nielsen, a Cornell assistant professor of biological statistics and computational biology. Some of the analysis, which also compared the mouse genome, used the supercomputer cluster at the Cornell Theory Center. Clark explains, "By lining up the human and chimpanzee gene sequences with those of the mouse, we thought we might be able to find genes that are evolving especially quickly in humans. In a sense, this method asks: What are the genes that make us human? Or rather, what genes were selected by natural selection to result in differences between humans and chimps?" The study started with almost 23,000 genes, but this number fell to 7,645 because of the need to be sure that the right human, chimp and mouse genes were aligned.

According to Clark, all mammals have an extensive repertoire of olfactory receptors, genes that allow specific recognition of the smell of different substances. "The signature of positive selection is very strong in both humans and chimps for tuning the sense of smell, probably because of its importance in finding food and perhaps mates," says Clark. In addition to the great departure in smell perception, differences in amino acid metabolism also seem to affect chimps' and humans' abilities to digest dietary protein and could date back to the time when early humans began eating more meat, Clark speculates. Anthropologists believe that this occurred around 2 million years ago, in concert with a major climate change.

"This study also gives tantalizing clues to an even more complex difference -- the ability to speak and understand language," Clark says. "Perhaps some of the genes that enable humans to understand speech work not only in the brain, but also are involved in hearing." Evidence for this came from a particularly strong sign of selection acting on the gene that codes for an obscure protein in the tectorial membrane of the inner ear. One form of congenital deafness in humans is caused by mutations to this gene, called alpha tectorin.

Mutations in alpha tectorin result in poor frequency response of the ear, making it hard to understand speech. "It's something like replacing the soundboard of a Stradivarius violin with a piece of plywood," Clark notes. The large divergence between humans and chimps in alpha tectorin, he says, could imply that humans needed to tune the protein for specific attributes of their sense of hearing. This leads Clark to wonder whether one of the difficulties in training chimpanzees to understand human speech is that their hearing is not quite up to the task. Although studies of chimpanzee hearing have been done, detailed tests of their transient response have not been carried out.

Clark emphasizes that a study like this cannot prove that the biology of humans and chimps differ because of this or that particular gene. "But it generates many hypotheses that can be tested to yield insight into exactly why only 1 percent in DNA sequence difference makes us such different beasts," he says.

Also collaborating in the study were researchers at Applied Biosystems (Foster City, Calif.), Celera Diagnostics (Alameda, Calif.) and Case Western Reserve University in Cleveland. The Science report is titled, "Inferring non-neutral evolution from human-chimp-mouse orthologous gene trios."

Genetic Similarities Between Humans and Chimps

  • The sizes of the human and chimp genomes are similar.
  • Genome sequences of humans and chimps are 98.8% similar.
  • Human and chimpanzee chromosomes are very similar.
  • The mean divergence of chromosomes is similar.
  • Further, the mean divergence of nonpolymorphic sites and CpG sites is also similar.
  • The number of nucleotide substitutions is about 35 million.
  • The number of insertions and deletions is about 5 million, totaling 90 Mb of sequence.
  • Nucleotide divergence of the mitochondrial genomes is similar.
  • About 66% of gene duplications are found in both genomes.
  • 29% of human and chimp orthologous proteins are identical.
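The 98.8% and 35 million figures in the list are mutually consistent under a simple assumption. The sketch below uses, as our own illustrative denominator, roughly 2.9 billion aligned base pairs, and treats each substitution as one differing base.

```python
# Assumption (ours, for illustration): about 2.9 billion aligned base pairs.
aligned_bp = 2.9e9
substitutions = 35e6   # from the list above
indel_bp = 90e6        # total insertion/deletion sequence, from the list above

substitution_pct = 100 * substitutions / aligned_bp
indel_pct = 100 * indel_bp / aligned_bp

print(f"substitutions: ~{substitution_pct:.1f}% divergence, ~{100 - substitution_pct:.1f}% similar")
print(f"indels: ~{indel_pct:.1f}% of the aligned length")
```

Under that assumption, substitutions amount to about 1.2% divergence (about 98.8% similarity), while the 90 Mb of insertions and deletions is a further ~3.1% of sequence that the 98.8% figure does not count.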

A scan for positively selected genes in the genomes of humans and chimpanzees

Since the divergence of humans and chimpanzees about 5 million years ago, these species have undergone a remarkable evolution with drastic divergence in anatomy and cognitive abilities. At the molecular level, despite the small overall magnitude of DNA sequence divergence, we might expect such evolutionary changes to leave a noticeable signature throughout the genome. We here compare 13,731 annotated genes from humans to their chimpanzee orthologs to identify genes that show evidence of positive selection. Many of the genes that present a signature of positive selection tend to be involved in sensory perception or immune defenses. However, the group of genes that show the strongest evidence for positive selection also includes a surprising number of genes involved in tumor suppression and apoptosis, and of genes involved in spermatogenesis. We hypothesize that positive selection in some of these genes may be driven by genomic conflict due to apoptosis during spermatogenesis. Genes with maximal expression in the brain show little or no evidence for positive selection, while genes with maximal expression in the testis tend to be enriched with positively selected genes. Genes on the X chromosome also tend to show an elevated tendency for positive selection. We also present polymorphism data from 20 Caucasian Americans and 19 African Americans for the 50 annotated genes showing the strongest evidence for positive selection. The polymorphism analysis further supports the presence of positive selection in these genes by showing an excess of high-frequency derived nonsynonymous mutations.
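A common way to look for a signature of positive selection in protein-coding genes is the ratio of nonsynonymous to synonymous substitution rates (dN/dS): values above 1 suggest positive selection, values near 1 neutrality, and values below 1 purifying selection. The abstract does not say which method the study used, so this is only a simplified illustration: it assumes the numbers of sites and of differences have already been counted, applies a Jukes-Cantor correction, and uses made-up counts.

```python
import math

def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance from a raw proportion of differing sites."""
    return -0.75 * math.log(1 - 4.0 * p / 3.0)

def dn_ds(nonsyn_diffs: int, nonsyn_sites: float, syn_diffs: int, syn_sites: float) -> float:
    """Ratio of corrected nonsynonymous to synonymous divergence."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    return dn / ds

# Hypothetical counts for one gene (not data from the study above).
omega = dn_ds(nonsyn_diffs=12, nonsyn_sites=900, syn_diffs=2, syn_sites=300)
print(f"dN/dS = {omega:.2f}")
```

With these hypothetical counts, dN/dS is about 2.0. Real genome scans use codon-substitution models and likelihood tests rather than a single ratio like this.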


Figure 1. Distribution of Mutations

The figure shows the number of synonymous and nonsynonymous nucleotide…

Figure 2. Frequency Spectra

The figure shows the frequency spectra of nonsynonymous (red) and synonymous…

Monkey genome reveals DNA similarities with humans

Scientists have decoded the genome of the rhesus macaque monkey and compared it with the genomes of humans and their closest living relatives — the chimps — revealing that the three primate species share about 93 percent of the same DNA.

The sequencing was completed by an international consortium of researchers, including scientists at the Genome Sequencing Center at the School of Medicine, and is published in a special section of the April 13 issue of the journal Science.

In related news, University scientists recently completed the raw sequences for the orangutan and marmoset genomes. Analysis of these genomes and a comparison with human and the other primates will be carried out over the next several months.

The National Human Genome Research Institute, one of the National Institutes of Health, is funding all three sequencing projects.

“Having this growing portfolio of primate genomes will allow us to better understand the important biology that underlies the information encoded in the genome sequences,” said Richard K. Wilson, Ph.D., director of the Genome Sequencing Center. “We’ll be able to gain clues as to why humans and some primates develop certain cancers and other diseases, while other primates do not.”

By placing the human genome alongside those of the other primates, scientists can identify all the molecular changes that separate the various species. On a practical level, this work is likely to help determine how and when genetic alterations associated with certain diseases, including hepatitis, malaria and Alzheimer’s disease, crept into the genome and why non-human primates do not develop such illnesses.

Genome sequencing involves determining the precise order of the 3 billion letters (a combination of As, Cs, Gs and Ts) that make up the animal’s DNA. DNA was derived from a blood sample taken from each primate; no animals were harmed as part of this research.

The macaque is the second non-human primate, after the chimp, to have its genome sequenced, and the first Old World monkey to have its DNA deciphered. The macaque genome is important because it is more distantly related to the human genome than the chimp or orangutan genomes are. This means that important genome features that are conserved through evolution can be seen more easily by comparing the macaque with the human.

“Having in hand the genomes of primates more distantly related to humans than the chimp will give researchers an opportunity to determine the precise changes in each of the genomes over the course of evolution, from macaques to marmosets to orangutan, chimps and humans,” Wilson said. “This is important because macaques and marmosets also serve as a valuable model for studying human infectious diseases, such as HIV, and for vaccine research.”

Independent assemblies of the macaque genome data were carried out at the University, Baylor College of Medicine and the J. Craig Venter Institute in Rockville, Md. Baylor also collaborated with the University on sequencing the orangutan and marmoset.

Human Genome Sequencing: Approaches and Applications

A list of different methods used for mapping of human genomes is given below. These techniques are also useful for the detection of normal and disease genes in humans.

1. DNA sequencing : Physical map of DNA can be identified with highest resolution.

2. Use of probes : To identify RFLPs, STS and SNPs.

3. Radiation hybrid mapping: Fragment genome into large pieces and locate markers and genes. Requires somatic cell hybrids.

4. Fluorescence in situ hybridization (FISH) : To localize a gene on chromosome.

5. Sequence tagged site (STS) mapping : Applicable to any part of DNA sequence if some sequence information is available.

6. Expressed sequence tag (EST) mapping : A variant of STS mapping in which expressed genes are mapped and located.

7. Pulsed-field gel electrophoresis (PFGE) : For the separation and isolation of large DNA fragments.

8. Cloning in vectors (plasmids, phages, cosmids, YACs, BACs) : To isolate DNA fragments of variable length.

9. Polymerase chain reaction (PCR) : To amplify gene fragments.

10. Chromosome walking : Useful for cloning of overlapping DNA fragments (restricted to about 200 kb).

11. Chromosome jumping : DNA can be cut into large fragments and circularized for use in chromosome walking.

12. Detection of cytogenetic abnormalities : Certain genetic diseases can be identified by cloning the affected genes e.g. Duchenne muscular dystrophy.

13. Databases : Existing databases facilitate gene identification by comparison of DNA and protein sequences.
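The RFLP idea in item 2 can be illustrated with a simulated restriction digest: a single-base change inside a recognition site removes a cut and changes the fragment lengths, which is what probes and gel electrophoresis detect. The sketch assumes EcoRI's recognition site GAATTC with a cut after the G, models only one strand, and uses made-up sequences.

```python
def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Fragment lengths after cutting seq at every occurrence of site.
    Default models EcoRI (GAATTC, cut between G and A) on one strand only."""
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

# Hypothetical alleles: allele_2 has one base changed inside the second site.
allele_1 = "AAAGAATTCAAAAAAGAATTCAAA"
allele_2 = "AAAGAATTCAAAAAAGATTTCAAA"
print(digest(allele_1))  # [4, 12, 8]
print(digest(allele_2))  # [4, 20]
```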

For elucidating the human genome, the two HGP groups used different approaches. The IHGSC predominantly employed a "map first, sequence later" approach. The principal method was hierarchical shotgun sequencing. This technique involves fragmenting the genome into pieces of 100-200 kb, inserting them into vectors (mostly bacterial artificial chromosomes, BACs) and cloning them. The cloned fragments are then sequenced.

Celera Genomics used the whole-genome shotgun approach. This bypasses the mapping step and saves time. In addition, the Celera group had high-throughput sequencers and powerful computer programs that helped it complete the human genome sequence early.
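To make the difference between the two strategies concrete, the toy Python sketch below mimics the core idea of whole-genome shotgun assembly: sample overlapping reads, then stitch them back together by repeatedly merging the pair of fragments with the longest suffix-prefix overlap. This greedy merge is a textbook simplification chosen for illustration, not Celera's actual assembler. All function names and parameters (read length, `min_len`) are hypothetical, and reads are assumed to be error-free and taken from one strand.

```python
import random


def shotgun_reads(genome: str, read_len: int, n_reads: int, seed: int = 0) -> list[str]:
    """Sample error-free reads of fixed length from random start positions."""
    rng = random.Random(seed)
    last_start = len(genome) - read_len
    return [genome[s:s + read_len]
            for s in (rng.randint(0, last_start) for _ in range(n_reads))]


def overlap(a: str, b: str, min_len: int) -> int:
    """Longest suffix of `a` that equals a prefix of `b`; 0 if shorter than min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def remove_contained(frags: list[str]) -> list[str]:
    """Drop exact duplicates and any fragment that lies inside another one."""
    unique = list(dict.fromkeys(frags))
    return [f for f in unique if not any(f != g and f in g for g in unique)]


def greedy_assemble(reads: list[str], min_len: int = 15) -> list[str]:
    """Repeatedly merge the two fragments with the longest overlap into contigs."""
    frags = remove_contained(reads)
    while len(frags) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: the remaining fragments are separate contigs
        merged = frags[best_i] + frags[best_j][best_k:]
        rest = [f for n, f in enumerate(frags) if n not in (best_i, best_j)]
        frags = remove_contained(rest + [merged])
    return frags


# Usage: tile a random 200 bp "genome" with 30 bp reads that overlap by 20 bp.
rng = random.Random(42)
genome = "".join(rng.choice("ACGT") for _ in range(200))
tiled_reads = [genome[s:s + 30] for s in range(0, 171, 10)]
contigs = greedy_assemble(tiled_reads)
print(len(contigs), contigs[0] == genome)
```

With randomly placed reads from `shotgun_reads`, some stretches may be missed and the result comes back as several contigs, which is one reason shotgun projects sequence each base several times over. Repeats longer than a read also defeat this greedy strategy; that difficulty was one of the arguments made for mapping clones first.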

Whose Genome was Sequenced?

One of the intriguing questions about the human genome project is whose genome was sequenced, and how it relates to the world's 6 billion or so people and their variations. There is no simple answer to this question.

On the positive side, however, it does not matter much whose genome is sequenced, since the phenotypic differences between individuals are due to variation in just 0.1% of the total genome sequence. Therefore, genomes from many individuals can be used as source material for sequencing.

Much of the human genome work was performed on material supplied by the Centre for Human Polymorphism in Paris, France. This institute had collected cell lines from sixty different French families, each spanning three generations.

Human Genome Sequence - Results Summarised:

The information from the human genome project is vast, so only some highlights are given and briefly described below.

Major Highlights of the Human Genome:

1. The draft represents about 90% of the entire human genome. It is believed that most of the important parts have been identified.

2. The remaining 10% of the genome sequences are at the very ends of chromosomes (i.e. telomeres) and around the centromeres.

3. The human genome is composed of 3200 Mb (3.2 Gb), i.e. 3.2 billion base pairs (3,200,000,000).

4. Approximately 1.1 to 1.5% of the genome codes for proteins.

5. Approximately 24% of the total genome is composed of introns, which split the coding regions (exons) and appear as repetitive sequences with no specific function.

6. The number of protein coding genes is in the range of 30,000-40,000.

7. An average gene consists of 3000 bases, although sizes vary greatly. The dystrophin gene is the largest known human gene, with 2.4 million bases.

8. Chromosome 1 (the largest human chromosome) contains the highest number of genes (2968), while the Y chromosome has the lowest. Chromosomes also differ in their GC content and number of transposable elements.

9. Genes and DNA sequences associated with many diseases such as breast cancer, muscle diseases, deafness and blindness have been identified.

10. About 100 coding regions appear to have been copied and moved by RNA-based transposition (retrotransposons).

11. Repeated sequences constitute about 50% of the human genome.

12. A vast majority of the genome (97%) has no known function.

13. Between individual humans, the DNA differs by only 0.2%, or one in 500 bases.

14. More than 3 million single nucleotide polymorphisms (SNPs) have been identified.

15. Human DNA is about 98% identical to that of chimpanzees.

16. About 200 genes are similar to those found in bacteria.
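Several of the highlights are percentages of the 3.2 Gb genome given in highlight 3. The short Python sketch below converts them into absolute numbers of base pairs. It uses only figures from the list above and is plain arithmetic, not an analysis of sequence data; the names are illustrative.

```python
GENOME_BP = 3_200_000_000  # highlight 3: 3.2 billion base pairs


def bases(fraction: float, genome_bp: int = GENOME_BP) -> int:
    """Convert a fraction of the genome into a whole number of base pairs."""
    return round(fraction * genome_bp)


coding_range = (bases(0.011), bases(0.015))  # highlight 4: 1.1-1.5% codes for protein
repeats = bases(0.50)                        # highlight 11: ~50% repeated sequences
person_to_person = bases(0.002)              # highlight 13: 0.2% difference
one_in_500 = GENOME_BP // 500                # the same difference stated as 1 in 500 bases

for label, value in [("coding (low)", coding_range[0]),
                     ("coding (high)", coding_range[1]),
                     ("repeats", repeats),
                     ("person-to-person", person_to_person)]:
    print(f"{label:>17}: {value:,} bp")
```

The last two values agree, which confirms that "0.2%" and "one in 500 bases" in highlight 13 describe the same figure.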

Most of the Genome Sequence is Identified:

About 90% of the human genome has been sequenced. It is composed of 3.2 billion base pairs (3200 Mb or 3.2 Gb). If printed in the format of a telephone book, the base sequence of the human genome would fill about 200 telephone books of 1000 pages each. Some other interesting analogies and sidelights of the genome are given in Table 12.3.
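As a quick check on the telephone-book analogy, dividing the 3.2 billion bases by 200 books of 1000 pages gives the number of letters (A, C, G or T) each page would hold. The Python sketch below simply redoes that arithmetic; the assumption of one printed letter per base comes from the analogy itself.

```python
GENOME_BP = 3_200_000_000  # 3.2 billion base pairs
BOOKS = 200
PAGES_PER_BOOK = 1000

# One printed letter per base, spread evenly over every page of every book.
bases_per_page = GENOME_BP / (BOOKS * PAGES_PER_BOOK)
print(f"{bases_per_page:,.0f} bases per page")
```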

Individual differences in genomes:

It has to be remembered that every individual, except identical twins, has their own version of the genome sequence. The differences between individuals are largely due to single nucleotide polymorphisms (SNPs). SNPs are positions in the genome where some individuals have one nucleotide (e.g. an A) and others have a different nucleotide (e.g. a G). SNPs are estimated to occur about once per 1000 base pairs. About 3 million SNPs are believed to be present, and at least half of them have been identified.
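The Python sketch below shows the basic idea: given two stretches of DNA that have already been aligned base for base, a candidate SNP is simply a position where the letters differ, and a density of one SNP per 1000 bases implies a genome-wide count. This is a deliberate simplification with hypothetical function names. It ignores insertions, deletions and sequencing errors, and it ignores that each person carries two copies of each chromosome and that a SNP is properly defined by variation across a population, not between just two people.

```python
def snp_positions(seq_a: str, seq_b: str) -> list[int]:
    """Return 0-based positions where two aligned sequences carry different bases."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def expected_snp_count(genome_bp: int, bp_per_snp: int = 1000) -> int:
    """Expected number of SNPs if one occurs every `bp_per_snp` bases on average."""
    return genome_bp // bp_per_snp


# Hypothetical 8 bp stretch from two people: they differ at positions 4 and 7.
print(snp_positions("ACGTACGT", "ACGTGCGA"))
print(f"{expected_snp_count(3_200_000_000):,} SNPs expected genome-wide")
```

The genome-wide estimate of about 3.2 million is consistent with the "about 3 million SNPs" quoted in the text.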

Benefits/Applications of Human Genome Sequencing:

It is expected that the sequencing of the human genome and the genomes of other organisms will dramatically change our understanding and perceptions of biology and medicine. Some of the benefits of the human genome project are given below.

Identification of human genes and their functions:

Analysis of genomes has helped to identify genes and the functions of some of them. The functions of other genes, and the interactions between gene products, still need to be elucidated.

Understanding of polygenic disorders:

The biochemistry and genetics of many single-gene disorders have been elucidated e.g. sickle-cell anemia, cystic fibrosis, and retinoblastoma. A majority of the common diseases in humans, however, are polygenic in nature e.g. cancer, hypertension, diabetes. At present, we have very little knowledge about the causes of these diseases. The information on the genome sequence will certainly help to unravel the mysteries surrounding polygenic diseases.

Improvements in gene therapy:

At present, human gene therapy is in its infancy for various reasons. Knowledge of the genome sequence will certainly help make gene therapy a more effective treatment for genetic diseases.

Improved diagnosis of diseases:

In the near future, probes for many genetic diseases will be available for specific identification and appropriate treatment.

Development of pharmacogenomics:

Drugs may be tailored to individual patients. This will become possible by taking into account individual variations in the enzymes and other proteins involved in drug action and drug metabolism.

Genetic basis of psychiatric disorders:

By studying the genes involved in behavioural patterns, the causation of psychiatric diseases can be understood. This will help in the better treatment of these disorders.

Understanding of complex social traits:

With the genome sequence now in hand, complex social traits can be better understood. For instance, genes involved in speech have recently been identified.

Knowledge on mutations:

Many events leading to the mutations can be uncovered with the knowledge of genome.

Better understanding of developmental biology:

By determining the biology of the human genome and its regulatory control, it will be possible to understand how humans develop from a fertilized egg to an adult.

Comparative genomics:

Genomes from many organisms have been sequenced, and the number will increase in the coming years. The information on the genomes of different species will throw light on the major stages in evolution.

Development of biotechnology:

The data on the human genome sequence will spur the development of biotechnology in various spheres.

Comparison of human and chimpanzee genomes reveals striking similarities and differences

The first comprehensive comparison of the genetic blueprints of humans and chimpanzees shows our closest living relatives share perfect identity with 96 percent of our DNA sequence, an international research consortium reported today. Led by scientists from the Broad Institute of the Massachusetts Institute of Technology and Harvard University, Cambridge, MA, and the Washington University School of Medicine in Saint Louis, MO, the Chimpanzee Sequencing and Analysis Consortium reported its findings in the Sept. 1 issue of the journal Nature.

Comparison of the chimpanzee and human genomes reveals extraordinary similarities, significant differences and new paths for biomedical research:

  • It provides unambiguous confirmation of the common and recent evolutionary origin of human and chimpanzees, as first predicted by Charles Darwin in 1871.
  • It provides key information for human medicine by revealing important properties of the human genome, including the types of genes that have been evolving most rapidly over millions of years and specific chromosomal regions that have undergone strong positive selection during recent human history. This sheds light on human biology and especially on human disease, because at least some of these reflect responses to recent infectious agents or evolutionary changes relevant to human health.
  • It demonstrates that the human and chimpanzee species have tolerated more deleterious mutations than other mammals, such as rodents. This confirms an important evolutionary prediction, and may account for greater innovation in primates than rodents, as well as a high incidence of genetic diseases.

"We now have a nearly complete catalog of the genetic changes that occurred during the evolution of the modern human and chimpanzee species from our common ancestor," said the study's lead author, Tarjei S. Mikkelsen of the Broad Institute. "By cross-referencing this catalog against clinical observations and other biological data, we can begin to identify the specific changes that underlie the unique traits of the human species."

"The evolutionary comparison of the human and chimpanzee genomes has major implications for biomedicine," said Eric Lander, director of the Broad Institute. "It provides a crucial baseline for human population genetic analysis. By identifying recent genetic changes and regions with unusually high or low variation, it can point us to genes that vary as a response to infectious agents and environmental pressures."

Among the major findings of the Consortium are:

1. The chimpanzee and human genomes are strikingly similar and encode very similar proteins. The DNA sequence that can be directly compared between the two genomes is almost 99 percent identical. When DNA insertions and deletions are taken into account, humans and chimpanzees still share 96 percent sequence identity. At the protein level, 29 percent of genes code for the same amino acid sequences in chimpanzees and humans. In fact, the typical human protein has accumulated just one unique change since chimpanzees and humans diverged from a common ancestor about 6 million years ago.

2. A few classes of genes are changing unusually quickly in both humans and chimpanzees compared with other mammals. These classes include genes involved in perception of sound, transmission of nerve signals, production of sperm and cellular transport of ions. The rapid evolution of these genes may have contributed to the special characteristics of primates.

3. Humans and chimpanzees have accumulated more potentially deleterious mutations in their genomes over the course of evolution than have mice, rats and other rodents. While such mutations can cause diseases that may erode a species' overall fitness, they may have also made primates more adaptable to rapid environmental changes and enabled them to achieve unique evolutionary adaptations.

4. About 35 million DNA base pairs differ between the shared portions of the two genomes. In addition, there are another 5 million sites that differ because of an insertion or deletion in one of the lineages, along with a much smaller number of chromosomal rearrangements. Most of these differences lie in what is believed to be DNA of little or no function. However, as many as 3 million of the differences are found in crucial protein-coding genes or other functional areas of the genome. Somewhere in these relatively few differences lies the biological basis for the unique characteristics of the human species, including human-specific diseases such as Alzheimer's disease, certain cancers, and HIV/AIDS.

5. Although the statistical signals are relatively weak, a few classes of genes appear to be evolving more rapidly in humans than in chimpanzees. The single strongest outlier involves genes that code for transcription factors, molecules that regulate the activity of other genes and that play key roles in embryonic development.

6. A small number of other genes have undergone even more dramatic changes. More than 50 genes present in the human genome are missing or partially deleted from the chimpanzee genome. The corresponding number of gene deletions in the human genome is not yet precisely known. For example, three key genes involved in inflammation appear to be deleted in the chimpanzee genome, possibly explaining some of the known differences between chimpanzees and humans with respect to immune and inflammatory response. On the other hand, humans appear to have lost the function of the caspase-12 gene, which produces an enzyme that may affect the progression of Alzheimer's disease.

7. There are six regions in the human genome that have strong signatures of selective sweeps over the past 250,000 years (selective sweeps occur when a mutation arises in a population and is so advantageous that it spreads throughout the population within a few hundred generations and eventually becomes "normal.") One region contains more than 50 genes, while another contains no known genes and lies in an area that scientists refer to as a "gene desert." Intriguingly, this gene desert may contain elements regulating the expression of a nearby protocadherin gene, which has been implicated in patterning of the nervous system.

A seventh region with moderately strong signals contains the FOXP2 and CFTR genes. FOXP2 has been implicated in the acquisition of speech in humans. CFTR, which codes for a protein involved in ion transport and, if mutated, can cause the fatal disease cystic fibrosis, is thought to be the target of positive selection in European populations.
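The "almost 99 percent" identity in finding 1 and the 35 million single-base differences in finding 4 are two views of the same comparison. The Python sketch below shows the arithmetic. The denominator, the number of aligned base pairs actually compared, is not given in this article, so the two values used here (the 3.2 Gb human genome size quoted earlier, and a hypothetical smaller aligned length of 2.8 Gb) are assumptions for illustration only.

```python
def percent_identity(differing_bp: int, compared_bp: int) -> float:
    """Percentage of compared positions that are identical between two genomes."""
    return 100.0 * (1.0 - differing_bp / compared_bp)


SUBSTITUTIONS = 35_000_000  # finding 4: single-base differences in the shared portion

# Assumed aligned lengths: neither is a figure reported by the consortium.
for compared_bp in (3_200_000_000, 2_800_000_000):
    pct = percent_identity(SUBSTITUTIONS, compared_bp)
    print(f"{compared_bp / 1e9:.1f} Gb compared -> {pct:.2f}% identical")
```

Either way the result rounds to about 99 percent. The lower 96 percent figure cannot be reproduced this simply, because it also counts the bases inside insertions and deletions, not just the number of sites where they occur.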
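How quickly can a strongly advantageous mutation sweep through a population, as described in finding 7? A standard population-genetics textbook model (deterministic selection on a single allele, ignoring random drift and dominance) updates the allele frequency each generation as p' = p(1 + s) / (1 + ps), where s is the selective advantage. The Python sketch below iterates this rule. The starting frequency and the value of s are hypothetical, and this is not how the consortium detected sweeps; detection relies instead on the patterns of variation that sweeps leave in the genome.

```python
def generations_to_sweep(p0: float, s: float, target: float = 0.99) -> int:
    """Generations for an allele to rise from frequency p0 to `target` under
    deterministic selection with advantage s: p' = p(1 + s) / (1 + p*s)."""
    if not (0.0 < p0 < target < 1.0) or s <= 0.0:
        raise ValueError("need 0 < p0 < target < 1 and s > 0")
    p, generations = p0, 0
    while p < target:
        p = p * (1.0 + s) / (1.0 + p * s)
        generations += 1
    return generations


# Hypothetical case: one new copy among 10,000 gene copies, 5% advantage.
print(generations_to_sweep(p0=1e-4, s=0.05))
```

With these assumed values the allele passes 99% frequency after roughly 280 generations, in line with the "few hundred generations" in the text; halving s roughly doubles the time.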

The initial complete sequence of the chimpanzee genome and comparison to the human genome is an important milestone in what will be several years of intensive work at understanding human evolutionary history and applying these data to biomedical research. The fact that these data, and all future data from the Consortium, are being placed in the public domain means that scientists worldwide can contribute to this work.

The 67 researchers who took part in the Chimpanzee Sequencing and Analysis Consortium share authorship of the Nature paper. The sequencing and assembly of the chimpanzee genome was done at the Broad Institute and at the Washington University School of Medicine in Saint Louis, MO. In addition to those centers, the consortium included researchers from institutions elsewhere in the United States, as well as Israel, Italy, Germany and Spain. The work of the Chimpanzee Sequencing and Analysis Consortium is funded in part by the National Human Genome Research Institute (NHGRI) of the National Institutes of Health.

The team was co-led by Lander, Richard Wilson of the Washington University School of Medicine in Saint Louis, MO, and Robert Waterston of the University of Washington, Seattle, WA.


About the Broad Institute of MIT and Harvard

The Broad Institute of MIT and Harvard was founded in 2003 to bring the power of genomics to biomedicine. It pursues this mission by empowering creative young scientists to construct new and robust tools for genomic medicine, to make them accessible to the global scientific community, and to apply them to the understanding and treatment of disease.

The Institute is a research collaboration that involves faculty, professional staff and students from throughout the MIT and Harvard academic and medical communities. It is governed jointly by the two universities.

Organized around Scientific Programs and Scientific Platforms, the unique structure of the Broad Institute enables scientists to collaborate on transformative projects across many scientific and medical disciplines.

Broad Institute

The chimpanzee is the closest living relative to modern humans and thus its genome holds a wealth of information about recent human history and biology. Although not a "model organism" in the classic sense of disease study, the chimpanzee genome can offer major insights into the genetic basis of human disease as well as understanding the mechanisms of evolution at work on the human genome.

The initial goal of the chimpanzee genome sequencing project was to produce a draft sequence of a male chimpanzee (named Clint). The project was undertaken as a collaborative effort of the Broad Institute and the Washington University Genome Sequencing Center (WUGSC) and has been used for an initial comparison with the human genome. The project also produced a large collection of single-nucleotide polymorphisms (SNPs) within and between western and central chimpanzee populations, which provide a genetic backdrop for the study of human population genetics.

Additional whole-genome shotgun sequence has subsequently been produced by our collaborators, and an updated genome assembly (PanTro2.1, 6X coverage) is available from established genome web browsers. A comprehensive BAC based physical map has also been produced.
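"6X coverage" means that, on average, each base of the genome is present in six sequencing reads. Under the classic Lander-Waterman model, which assumes reads land at uniformly random positions, the number of reads covering a given base is approximately Poisson-distributed, so the expected fraction of bases covered at least once is 1 - e^(-c) for coverage c. The Python sketch below evaluates this. The 3 Gb genome size and 700 bp read length in the example are round hypothetical values, not figures from the chimpanzee project.

```python
import math


def fraction_covered(coverage: float) -> float:
    """Expected fraction of bases hit by at least one read (Poisson approximation)."""
    return 1.0 - math.exp(-coverage)


def reads_needed(coverage: float, genome_bp: int, read_len: int) -> int:
    """Reads required so that coverage c = N * L / G reaches the target."""
    return math.ceil(coverage * genome_bp / read_len)


for c in (1, 3, 6):
    print(f"{c}X coverage -> {fraction_covered(c):.4f} of bases covered")

# Hypothetical round numbers: 3 Gb genome, 700 bp reads.
print(f"{reads_needed(6, 3_000_000_000, 700):,} reads for 6X")
```

At 6X, roughly 99.75% of bases are expected to be covered, so a 6X draft still contains gaps; a physical map such as the BAC-based map mentioned above helps to order and orient the assembled pieces.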

NIH-funded scientists publish orangutan genome sequence

Slower orangutan evolution has resulted in a more stable genome compared to humans.

It is easy to feel a kinship with orangutans when looking into their soulful eyes and observing their socially complex behavior. Perhaps that’s because orangutans and humans share 97 percent of their DNA sequence, according to an analysis of the great ape's genome published today by an international group of scientists.

Orangutans, known for their distinctive auburn hair, are primarily tree dwellers native to the Southeast Asian islands of Sumatra and Borneo. The DNA sequence published in the Jan. 27 issue of Nature is from a female Sumatran (Pongo abelii) orangutan. In addition, five Sumatran and five Bornean (Pongo pygmaeus) orangutan genomes were sequenced at a less detailed level. The orangutan is the third non-human primate to have its genome sequenced, after the chimp and rhesus macaque. Of the great apes, orangutans are the most distantly related to humans, while chimpanzees are the most closely related.

Funded in part by the National Human Genome Research Institute (NHGRI), a component of the National Institutes of Health (NIH), the study was led by scientists from the Washington University School of Medicine in St. Louis and Baylor College of Medicine, Houston.

Researchers can now leverage the orangutan genome sequence to learn more about the biology of this endangered species and to identify what has been added or deleted in the evolution of primate and human genomes that may have contributed to unique human characteristics.

"The unique evolutionary position of the orangutan can be leveraged to discover parts of the human genome that differ among primates," said NHGRI Director Eric D. Green, M.D., Ph.D. "Sequencing many primate genomes can help us define and understand the conserved DNA sequences that set humans apart from primates."

While humans and orangutans are similar at the DNA level, comparing available primate genome sequences revealed that the orangutan has evolved much more slowly than chimpanzees and humans. The orangutan genome has fewer large DNA sequence structural rearrangements than its chimpanzee and human counterparts. Large genome structural rearrangements are DNA mutations that result in large genomic segments being duplicated, deleted, inserted or inverted.

"In terms of evolution, the orangutan genome is quite special among great apes in that it has been extraordinarily stable over the past 15 million years," says senior author Richard K. Wilson, Ph.D., director of the Washington University Genome Center. "This compares with chimpanzees and humans, both of which have experienced large-scale structural rearrangements in their DNA that may have accelerated their evolution."

The researchers catalogued one type of large structural rearrangement, called segmental duplications, that has played a major role in restructuring other primate genomes. These large, almost identical copies of DNA are present in at least two locations of the genome and are known to be associated with human diseases. Segmental duplications make up about 5 percent of human and chimpanzee genomes, but only about 3.8 percent of the orangutan genome.

The orangutan genome also has far fewer species-specific Alu elements, short stretches of DNA that insert themselves into a genome and are associated with new mutations and gene recombination. The human genome possesses about 5,000 human-specific Alu elements, while the chimpanzee genome has about 2,000 chimp-specific Alu elements. Only 250 such elements were found in the orangutan genome. The lack of newer Alu elements could be one of the reasons that the orangutan genome does not have the degree of structural rearrangement found in other great apes.

Another structural oddity encountered is the presence of a neocentromere on orangutan chromosome 12. A neocentromere is a centromere that appears in a novel location. A centromere sits in the middle of and joins the two arms of a chromosome. It also helps to keep chromosomes properly aligned during the complex process of cell division. This is the first neocentromere discovered in a primate genome. One was previously found in the horse genome. Discovery of the neocentromere will help researchers understand how centromeres, and therefore chromosomes, change and evolve.

"This variant in the chromosome 12 centromere position appears in both populations of orangutan," said Kim Worley, Ph.D., an author of the study and associate professor of the Baylor College of Medicine Human Genome Sequencing Center. "It attracts centromeric proteins in the same way that a normal centromere does."

The analysis also reveals the immense genetic diversity across and within Sumatran and Bornean orangutans. Diversity is important because it enhances the ability of populations to stay healthy and adapt to changes in the environment. The new research shows that the Sumatran and Bornean orangutans diverged some 400,000 years ago. Today, only about 50,000 Bornean and 7,000 Sumatran orangutans still live in the wild. But, in a finding that seems counter-intuitive, the scientists found the smaller population of Sumatran orangutans is genetically more diverse than the Bornean population.

"The average orangutan is still more diverse — genetically speaking — than the average human," says lead author Devin Locke, Ph.D., an evolutionary geneticist at Washington University's Genome Center. "We found deep diversity in both Bornean and Sumatran orangutans, but it’s unclear whether this level of diversity can be maintained in light of continued widespread deforestation of their homes."
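One common way to put a number on "genetic diversity" is nucleotide diversity (often written π): the average fraction of sites at which two sequences drawn from a population differ. The article does not say which measure the study used, so the Python sketch below is a generic illustration with made-up sequences and hypothetical names, not the paper's method.

```python
from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """Nucleotide diversity (pi): mean fraction of differing sites over all
    pairs of aligned, equal-length sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    # Count mismatched sites for every pair, then average per pair and per site.
    differences = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return differences / (len(pairs) * length)


# Made-up 10 bp samples from two hypothetical groups.
group_1 = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAC"]
group_2 = ["ACGTACGTAC", "ACGAACGTAC", "TCGAACGTAC"]
print(nucleotide_diversity(group_1), nucleotide_diversity(group_2))
```

Diversity in this sense depends on how different the sampled sequences are, not directly on how many individuals are alive today, which is how a smaller population can, in principle, still score as more diverse.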

"It is our hope that the genome assembly and population variation data presented here provide a valuable resource to the community to aid the preservation of these precious species," according to the paper's conclusion.

The chimpanzee, orangutan and human genome sequences, along with those of a wide range of other organisms, can be accessed through the following public genome browsers: GenBank at NIH's National Center for Biotechnology Information (NCBI); the Ensembl Genome Browser at the Wellcome Trust Sanger Institute and the EMBL-European Bioinformatics Institute; the DNA Data Bank of Japan; and EMBL-Bank, at the European Molecular Biology Laboratory's Nucleotide Sequence Database.