Reading from Oshima et al. (2016):
We identified 3,868 noncoding mutations including 394 located <5 Kb downstream, 1,762 intergenic, 1,621 intronic, 81 <5 Kb upstream, 7 UTR 3', 2 UTR 5', and 1 intragenic variants.
How are upstream, downstream and intragenic variants defined? Shouldn't intragenic variants be classified as intronic variants, and upstream/downstream variants as intergenic variants?
Edit: Adding up the different numbers gives the total of 3,868, so the categories seem to be mutually exclusive. While upstream/downstream variants can be defined as intergenic variants that lie close to genes, I haven't found a definition that distinguishes intragenic variants from intronic ones.
Edit 2: The only explanation I could think of for an intragenic variant is that it is located in a coding region which is not transcribed due to alternative splicing. Is that possible?
- Oshima, Koichi, et al. "Mutational landscape, clonal evolution patterns, and role of RAS mutations in relapsed acute lymphoblastic leukemia." Proceedings of the National Academy of Sciences (2016): 201608420.
We can take a look at the following picture (from wikipedia):
The regions are then:
- intergenic: everything outside of the shown gene region
- <5 Kb upstream: outside of the transcribed region, but possibly part of the promoter or an enhancer (yellow boxes before coding region).
- UTR 5': in the transcribed region, but not part of the coding region, as shown in the picture (blue box before coding region).
- intronic: in the introns, as shown in the picture (grey boxes).
- intragenic: These have to be mutations inside the coding region (red boxes), which is confusing since the authors specifically describe them as 'non-coding' mutations. If you read that as 'mutations that don't change the protein sequence', these could be silent mutations.
- UTR 3': in the transcribed region, but not part of the coding region, as shown in the picture (blue box after coding region).
- <5 Kb downstream: outside of the transcribed region, but possibly part of an enhancer (yellow boxes after coding region).
The Sequence Ontology project defines an intragenic variant as "A variant that occurs within a gene but falls outside of all transcript features. This occurs when alternate transcripts of a gene do not share overlapping sequence."
Intragenic means the variant is located within the same gene, which only implies that the sequence analysis assigned a certain variant to a gene. So, there is incomplete data regarding the gene product (such as different splice forms).
You can look up the mutation locations here: http://grch37.ensembl.org/index.html
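The definitions above can be sketched as a small classifier. This is only an illustration under stated assumptions: the paper does not say which annotation tool produced its categories, so the `Transcript` class, the `classify` function, the label strings and the toy coordinates are all hypothetical. The 5,000 bp window is taken from the "<5 Kb" wording in the quote, and the "intragenic" branch follows the Sequence Ontology definition (inside the gene, outside every transcript).

```python
from dataclasses import dataclass
from typing import Optional, Tuple

FLANK = 5_000  # the "<5 Kb" window quoted from the paper


@dataclass(frozen=True)
class Transcript:
    start: int                       # transcript start (inclusive)
    end: int                         # transcript end (inclusive)
    exons: Tuple[Tuple[int, int], ...]
    cds: Optional[Tuple[int, int]]   # (cds_start, cds_end); None if noncoding


def classify(pos, gene_start, gene_end, transcripts, strand="+"):
    """Return one coarse label for a position relative to a single gene."""
    if pos < gene_start or pos > gene_end:
        before = pos < gene_start
        dist = gene_start - pos if before else pos - gene_end
        if dist > FLANK:
            return "intergenic"
        # On the minus strand, "upstream" lies at higher coordinates.
        is_upstream = before if strand == "+" else not before
        return "upstream" if is_upstream else "downstream"

    covering = [t for t in transcripts if t.start <= pos <= t.end]
    if not covering:
        # Inside the gene span but outside every annotated transcript.
        return "intragenic"

    # Simplification: the first transcript with an exon hit decides the label.
    for t in covering:
        for s, e in t.exons:
            if s <= pos <= e:
                if t.cds is None:
                    return "noncoding exon"
                if t.cds[0] <= pos <= t.cds[1]:
                    return "coding"
                five_prime_side = pos < t.cds[0]
                if strand == "-":
                    five_prime_side = not five_prime_side
                return "5' UTR" if five_prime_side else "3' UTR"
    return "intronic"


# Toy gene with two alternative transcripts that do not overlap.
T1 = Transcript(10_000, 13_000, ((10_000, 10_500), (12_000, 13_000)), (10_200, 12_500))
T2 = Transcript(17_000, 20_000, ((17_000, 17_500), (19_000, 20_000)), (17_100, 19_500))
GENE = (10_000, 20_000, (T1, T2))

for p in (4_000, 6_000, 10_100, 11_000, 12_800, 15_000, 22_000):
    print(p, classify(p, *GENE))
```

In this toy gene, position 15,000 falls in the gap between the two transcripts, which is the situation the Sequence Ontology definition describes.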
DNA and Race
As human migration progressed throughout the world, genetic isolation led to the development of distinct populations that shared common DNA and other genetic material. Race is defined as a group related by common descent or heredity. Often, these groups also share similar phenotypic traits. Outside of genetic characteristics, race can also include cultural and ethnic similarities of a people.
Though different populations developed through the process of migration and genetic isolation, all humans fall under the same species, Homo sapiens. The field of eugenics, which posited that most human phenotypic and personality traits were controlled by genes, was developed at the beginning of the 20th century (Allen, 2011). The practice of eugenics was often used to discriminate against certain racial and ethnic groups based on their DNA. There are many ethical problems with this now-discredited branch of “science,” in addition to the problems associated with the fraudulent scientific methods that were used to show support for the movement.
Today, researchers who are interested in genetic variation between populations focus on differences that have arisen from divergent evolutionary histories rather than trying to find a scientific or medical definition for “race”. For example, skin color is genetically controlled. All humans possess the many genes involved in determining skin color. However, these genes have different versions (alleles) that ultimately control an individual’s pigmentation. Scientists believe that all humans evolved in Africa and originally had dark skin. As groups left Africa and migrated north, there was selective pressure for lighter skin to evolve to allow for sufficient vitamin D production (which requires UV radiation).
We find another example of genetic variation due to environmental pressure in the people of Tibet. Eighty-seven percent of Tibetans have a version of the HIF2a gene (hypoxia-inducible factor 2-alpha) that allows them to live at high altitude with no ill effects. In comparison, only 9% of Han Chinese carry the same version of HIF2a.
Scientists are also interested in understanding how shared ancestry relates to disease risk. For example, Ashkenazi Jews, who descended from a common people and region, are more likely to carry the gene mutation that causes Tay-Sachs disease, for which there is a high morbidity rate and no known cure (NIH).
Proposed routes and dates of human migration out of Africa.
Image courtesy of Wikimedia Commons
Allen, Garland E. Eugenics and Modern Biology: Critiques of Eugenics, 1910-1945. Annals of Human Genetics 2011 75: 314-325.
National Institute of Health (NIH). “Tay-Sachs Disease Information Page.” Last updated 6 Oct 2011.
What is Coding DNA
Coding DNA is the portion of the genome that encodes proteins. It accounts for roughly 1% of the human genome. Coding DNA consists of the coding regions of protein-coding genes, which lie within the exons. The coding portions of all exons in a protein-coding gene are collectively known as the coding sequence (CDS). In eukaryotes, however, the coding region is interrupted by introns. The coding region starts at the start codon at the 5′ end and terminates at the stop codon at the 3′ end. Apart from DNA, RNA can also contain coding regions.
Figure 1: Protein Synthesis
Furthermore, the coding region of a protein-coding gene undergoes transcription to produce an mRNA. In the mRNA, the 5′ UTR and 3′ UTR flank the coding region. The CDS in the mRNA transcript is then translated into the amino acid sequence of a functional protein. Proteins are therefore the gene products of coding DNA, and they have structural, functional, and regulatory roles in the cell.
Cells express (transcribe and translate) only a subset of their genes. Cells respond and adapt to environmental signals by turning on or off expression of appropriate genes. In multicellular organisms, cells in different tissues and organs differentiate, or become specialized by making different sets of proteins, even though all cells in the body (with a couple of exceptions) have the same genome. Such changes in gene expression, or differential gene expression among cells, are most often regulated at the level of transcription.
There are three broad levels of regulating gene expression:
- transcriptional control (whether and how much a gene is transcribed into mRNA)
- translational control (whether and how much an mRNA is translated into protein)
- post-translational control (whether the protein is in an active or inactive form, and whether the protein is stable or degraded)
Because of our shared evolutionary origin, there are many similarities in the ways that prokaryotes and eukaryotes regulate gene expression; however, there are also many differences. All three domains of life use positive regulation (turning on gene expression), negative regulation (turning off gene expression), and co-regulation (turning multiple genes on or off together) to control gene expression, but there are some differences in the specifics of how these jobs are carried out between prokaryotes and eukaryotes.
Similarities between prokaryotes and eukaryotes: promoters and regulatory elements
Promoters are sites in the DNA where RNA polymerase binds to initiate transcription. Promoters also contain, or have near them, binding sites for transcription factors, which are DNA-binding proteins that can either help recruit, or repel, RNA polymerase. A regulatory element is a DNA sequence that certain transcription factors recognize and bind to in order to recruit or repel RNA polymerase. The promoter along with nearby transcription factor binding elements regulate gene transcription.
Regulatory elements can be used for either positive or negative transcriptional control. When a gene is subject to positive transcriptional control, the binding of a specific transcription factor to the regulatory element promotes transcription. When a gene is subject to negative transcriptional control, the binding of a specific transcription factor to a regulatory element represses transcription. A single gene can be subject to both positive and negative transcriptional control by different transcription factors, creating multiple layers of regulation.
Some genes are not subject to regulation: they are constitutively expressed, meaning they are always transcribed. What sorts of genes would you imagine a cell would always need to have on, regardless of the environment or situation?
Differences between prokaryotes and eukaryotes: mechanisms of co-regulation
Often a set of proteins are needed together to respond to a certain stimulus or carry out a certain function (for example, many metabolic pathways). There are often mechanisms to co-regulate such genes such that they are all transcribed in response to the same stimulus. Both prokaryotic and eukaryotic cells have ways of co-regulating genes, but they use very different mechanisms to accomplish this goal.
In prokaryotes, co-regulated genes are often organized into an operon, where two or more functionally related genes are transcribed together from a single promoter into one long mRNA. This mRNA is translated to make all of the proteins encoded by the genes in the operon. Ribosomes start at the 5′ end, begin translating at the first AUG codon, terminate when they run into a stop codon, and then re-initiate at the next AUG codon.
A generic operon in prokaryotes. R = a regulatory protein (transcription factor) P = promoter Pol = RNA polymerase
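The translation rule described above (start at the first AUG, stop at a stop codon, re-initiate at the next AUG) can be sketched as a scan over one polycistronic mRNA. This is a minimal illustration of that simplified description only; the function name `find_orfs` and the toy sequence are made up, and real ribosomes also depend on ribosome-binding sites that the sketch ignores.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_orfs(mrna):
    """Return the open reading frames found by scanning 5' to 3'.

    Scanning starts at the first AUG, reads codon by codon until a stop
    codon, and then looks for the next AUG after that stop codon.
    """
    orfs = []
    i = 0
    last_codon_start = len(mrna) - 3
    while i <= last_codon_start:
        if mrna[i:i + 3] != "AUG":
            i += 1
            continue
        j = i
        while j <= last_codon_start and mrna[j:j + 3] not in STOP_CODONS:
            j += 3
        if j > last_codon_start:
            break  # ran off the 3' end without reaching a stop codon
        orfs.append(mrna[i:j + 3])
        i = j + 3  # re-initiate downstream of the stop codon
    return orfs


# Toy two-gene "operon" transcript (hypothetical sequence).
operon_mrna = "GG" + "AUGAAAUAA" + "CC" + "AUGGGGCCCUGA" + "A"
print(find_orfs(operon_mrna))
```

Each returned string corresponds to one protein-coding stretch on the shared mRNA.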
With a few exceptions (C. elegans and related nematodes), eukaryotic genomes do not have genes arranged in operons. Instead, eukaryotic genes that are co-regulated tend to have the same DNA regulatory element sequence associated with each gene, even if those genes are located on completely different chromosomes. This means that the same transcriptional activator or repressor can regulate transcription of every single gene that has that particular DNA regulatory element associated with it. For example, eukaryotic HSP (heat shock protein) genes are located on different chromosomes. HSPs help cells survive and recover from heat shock (a type of cellular stress). All HSP genes are transcribed simultaneously in response to heat stress, because they all have a DNA sequence element that binds a heat shock response transcription factor.
Additional complexities specific to eukaryotic gene regulation: chromatin and alternative splicing
Another major difference between prokaryotic gene regulation and eukaryotic gene regulation is that the eukaryotic (but not prokaryotic) DNA double helix is organized around proteins called histones which organize the DNA into nucleosomes. This combination of DNA + histones is called chromatin.
Chromatin can be condensed in a 30-nm fiber formation (tightly compacted nucleosomes) or loosely arranged as “beads-on-a-string,” where the DNA between and around nucleosomes is more accessible. This compaction is controlled by post-translational modifications which are added to the histones in the nucleosomes. When histones have acetyl groups added to them by enzymes called histone acetyl transferases (HATs), the acetyl groups physically obstruct the nucleosomes from packing too densely and help to recruit other enzymes that further open the chromatin structure. Conversely, when the acetyl groups are removed by histone deacetylases (HDACs), the chromatin assumes a condensed formation that prevents transcription factors from being able to access the DNA. In the image below, you can clearly see how much more compact and inaccessible the 30-nm fiber is (top) compared to the beads-on-a-string formation (bottom).
Chromatin plays a fundamental role in positive and negative gene regulation, because transcriptional activators and RNA polymerase cannot physically access the DNA regulatory elements when chromatin is in a compact form.
Prokaryotic DNA does have some associated proteins that help to organize the genome, but this organization is fundamentally different from chromatin. Prokaryotic DNA can essentially be thought of as ‘naked’ compared to eukaryotic chromatin, so prokaryotic cells lack this layer of gene regulation.
Another difference between prokaryotic and eukaryotic gene regulation is that eukaryotic mRNAs must be properly processed with addition of the 5′ cap, splicing out of introns, and addition of the 3′ poly(A) tail (discussed in more detail here). Each of these processing steps is also subject to regulation, and the mRNA will be degraded if any of them are not properly completed. The export of mRNAs from the nucleus to the cytoplasm is also regulated, as is stability of the properly processed mRNA in the cytoplasm.
Finally, eukaryotic genes often have different splice variants, where different exons can be included in different mRNAs that are transcribed from the same gene. Here you can see a cartoon of a gene with color-coded exons, and two different mRNA molecules transcribed from this gene. The different mRNAs encode for different proteins because they contain different exons. This process is called alternative splicing and we will discuss it more here.
Often different types of cells in different tissues express different splice variants of the same gene, such that there is a heart-specific transcript and a kidney-specific transcript of a particular gene.
In general, eukaryotic gene regulation is more complex than prokaryotic gene regulation. The upstream regulatory regions of eukaryotic genes have binding sites for multiple transcription factors, both positive regulators and negative regulators, that work in combination to determine the level of transcription. Some transcription factor binding sites, called enhancers and silencers, work at quite a distance, thousands of base pairs away from the promoter. Activators are examples of positive regulation and repressors are examples of negative regulation.
Eukaryotic transcription initiation, from biology.kenyon.edu (after Tjian)
Overall differences and similarities
If you understand the similarities and differences in eukaryotic and prokaryotic gene regulation, then you know which of the following processes are exclusive to eukaryotes, which are exclusive to prokaryotes, which occur in both, and how each is accomplished:
- coupled transcription and translation
- 5′ cap and 3′ poly(A) tail
- AUG as the translation initiation codon
- regulation of gene expression by proteins binding to DNA regulatory elements
- alternative mRNA splicing
- regulation of gene expression through chromatin accessibility
Putting it all together: the lac operon in E. coli
The lac operon is a good model gene for understanding gene regulation. You should use the information below to make sure you can apply all of the details of gene regulation described above to a specific gene model.
E. coli lac operon: dual positive and negative regulation
lacI is the gene that encodes the lac Repressor protein CAP = catabolite activator protein O = Operator P = promoter lacZ = gene that encodes beta-galactosidase lacY encodes permease lacA encodes transacetylase. Source: Wikimedia Commons (https://commons.wikimedia.org/wiki/File:Lac_operon-2010-21-01.png)
The lac operon of E. coli has 3 structural genes required for metabolism of lactose, a disaccharide found at high levels in milk:
- lacZ encodes the enzyme beta-galactosidase, which cleaves lactose into glucose and galactose
- lacY encodes permease, a membrane protein for facilitated diffusion of lactose into the cell
- lacA encodes transacetylase, an enzyme that modifies lactose
An mRNA encoding all 3 proteins is transcribed at high levels only when lactose is present, and glucose is absent.
Negative regulation by the Repressor – In the absence of lactose, the lac Repressor protein, encoded by the lacI gene with a separate promoter that is always active, binds to the Operator sequence in the DNA. The Operator sequence is a type of DNA regulatory element as described above. Repressor protein bound to the Operator prevents RNA polymerase from initiating transcription.
When lactose is present, an inducer molecule derived from lactose binds allosterically to the Repressor, and causes the Repressor to leave the Operator site. RNA polymerase is then free to initiate transcription, if it successfully binds to the lac promoter.
Positive regulation by CAP – Glucose is the preferred substrate for energy metabolism. When glucose is present, cells transcribe the lac operon only at very low levels, so the cells obtain most of their energy from glucose metabolism. RNA polymerase by itself binds rather poorly to the lac promoter.
Glucose starvation causes a rise in the level of cyclic adenosine monophosphate (cAMP), an intracellular alarm signal. Cyclic AMP binds to the catabolite activator protein (CAP). The CAP+cAMP complex binds to the CAP binding site near the lac promoter and recruits RNA polymerase to the promoter.
High level transcription of the lac operon requires both that CAP+cAMP be bound to the CAP binding site, and that Repressor is absent from the Operator. These conditions normally occur only in the absence of glucose and presence of lactose.
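The two layers of control described above can be written out as a small truth table. This is a simplified sketch, not a quantitative model: the function name and the three output labels ("off", "low", "high") are illustrative choices, and the "off" state ignores the small amount of leaky transcription that occurs in real cells.

```python
def lac_operon_transcription(lactose, glucose):
    """Qualitative lac operon output for a given sugar environment."""
    # Negative regulation: without lactose, the Repressor sits on the Operator.
    repressor_on_operator = not lactose
    # Positive regulation: without glucose, cAMP rises and CAP+cAMP binds.
    cap_camp_bound = not glucose

    if repressor_on_operator:
        return "off"
    return "high" if cap_camp_bound else "low"


for lactose in (False, True):
    for glucose in (False, True):
        print(f"lactose={lactose!s:5} glucose={glucose!s:5} ->",
              lac_operon_transcription(lactose, glucose))
```

Only the combination of lactose present and glucose absent gives high-level transcription, matching the summary above.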
The lac operon in E. coli is a classic example of a prokaryotic operon which is subject to both positive and negative regulation. Positive regulation and negative regulation are universal themes for gene regulation in both prokaryotes and eukaryotes.
As with many new short-read deep-sequencing protocols, the PAR-CLIP approach to elucidate RNA binding sites enables specific opportunities for in-depth analysis and interpretation of genomic data. In addition to mapping sequence-specific RBPs such as PUM2, QKI or IGF2BP1, an anticipated popular application of this protocol will be to study binding by members of the RISC, making it possible to identify the joint set of transcriptome-wide miRNA targets under specific conditions. To address the challenges posed by these two scenarios, we described the PARalyzer approach, which uses a kernel density estimate classification to generate a high-resolution map of RNA-protein interaction sites. In addition, we described an extension of our previous motif finding algorithm, cERMIT, to subsequently identify binding motifs for sequence-specific RBPs or over-represented miRNA seed matches.
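To make the idea of a kernel density estimate classification concrete, the following toy sketch compares a density built from T-to-C conversion positions with one built from non-conversion positions and keeps the coordinates where the conversion signal dominates. It is not PARalyzer's implementation: the two-density comparison, the Gaussian kernel, the bandwidth, the `min_signal` threshold, the function names and the example positions are all assumptions made for illustration.

```python
import math


def kernel_signal(positions, x, bandwidth=3.0):
    """Sum of Gaussian kernels centred on each observed position."""
    return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in positions)


def call_sites(conversions, non_conversions, length, bandwidth=3.0, min_signal=1.0):
    """Return coordinates where conversion signal exceeds non-conversion signal.

    min_signal discards coordinates with almost no supporting conversions.
    """
    sites = []
    for x in range(length):
        conv = kernel_signal(conversions, x, bandwidth)
        non_conv = kernel_signal(non_conversions, x, bandwidth)
        if conv >= min_signal and conv > non_conv:
            sites.append(x)
    return sites


# Hypothetical read evidence along a 40-nt stretch of transcript.
conversion_positions = [10, 10, 11, 12]   # T read as C
non_conversion_positions = [30, 31]       # T read as T

print(call_sites(conversion_positions, non_conversion_positions, length=40))
```

A narrower bandwidth gives a tighter interaction site around the conversion positions; a wider one extends the called region.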
Analysis of the Argonaute datasets showed that miRNA seed matches allowed for refining several previous findings on miRNA targeting. As reported, miRNA binding sites are located within AU-rich regions, but this was limited to sites in the 3' UTR; miRNA seed matches found in the coding regions of genes did not exhibit this nucleotide bias. While the overall number of interaction sites found in coding regions was smaller than in 3' UTRs, the signal-to-noise ratio of the identified coding interaction sites almost reached the levels at seed matches found in 3' UTRs. The evidence for binding alone obviously does not imply that these sites have similar functional consequences to those found within the 3' UTR. Confirming previous studies based on sequence or expression, but not direct binding, miRNAs were most likely to interact with their targets near the ends of the 3' UTRs, including alternative poly-adenylation sites.
A detailed study of sequence-specific RBPs (PUM2, QKI and IGF2BP1) revealed the strengths and current limitations of the PAR-CLIP protocol, and as a consequence, of methods for the analysis of PAR-CLIP data. PUM2 data showed a high likelihood of T-to-C conversion occurring directly at the RNA-protein interaction site and within the conserved binding motif. In such cases, our approach can identify the true transcriptome-wide interaction sites at (nearly) single nucleotide resolution. On the other hand, analysis of QKI data exhibited differences: while the 'AUUAAY' binding motif showed strong likelihood of T-to-C conversion at a particular nucleotide in the recognition motif, the 'ACUAAY' motif had no specific site where a conversion event could be detected. In such cases, the lack of a particular location of conversion prevents single nucleotide resolution of the interaction site, and at first glance seems to erase the strengths of PAR-CLIP compared to standard CLIP data. However, requiring T-to-C conversions to occur in the vicinity is still a good method to enrich for true binding sites: while no particular nucleotide near the binding motif exhibited conversion preferences, it suggested that non-specific, possibly stabilizing interactions of another component of the RBP with the RNA molecule gave PAR-CLIP an advantage over other in vivo RBP-RNA interaction detection protocols.
The different, and in many cases unknown, crosslinking properties of RBPs present a challenge for all CLIP protocols, and require small adjustments in how interaction sites are called and expanded to ensure the inclusion of the binding site. For newly studied proteins for which the motif or conversion pattern is not known (for example, the recently analyzed HuR protein), it is thus best to use PARalyzer with the 'extend-by-read' option in combination with the output of motif finding to determine if significant top-scoring motifs tend to have specific locations of high conversion. If there is at least one location of high conversion, as is, for example, the case for PUM2, then a tighter extension can be used to reduce the size of the interaction map.
In addition to the RBP-specific sequence affinity preferences, the RBP-RNA interaction has been shown to be influenced by the secondary structure of the targeted RNA sequence, and this has been successfully exploited in previous work on RBP motif discovery [35–37]. Incorporating information on the RBP structural preferences into the motif analysis proposed in the current work could be implemented by means of a prior distribution on the binding evidence for individual sequence regions inferred by PARalyzer, biasing the motif discovery towards high-scoring sequence patterns that contain favorable sequence context for RBP binding. This could help filter out non-specific interactions with highly abundant mRNAs. In the context of AGO-mediated regulation, a prior based on the predicted miRNA-mRNA duplex stability could be used in a similar fashion.
Due to the use of the 4SU nucleoside analogue in the original PAR-CLIP protocol, the 'U' content of an actual binding site and its vicinity will obviously impact the identification of RBP binding sites. If a recognition site does not contain any uridines, precise delineation using this approach is compromised; on the other hand, many U residues may either cause problems with alignment due to the potential for many mismatches, or spread out the signal over multiple positions, or both. The current investigations of additional amenable photoactivatable nucleosides, complemented by the use of different digestion enzymes, are expected to reduce potential biases, and can easily be specified in PARalyzer. As such, our pipeline provides a standardized solution for the analysis of RBP binding sites via PAR-CLIP, for subsequent motif finding for sequence-specific RBPs, and for the elucidation of post-transcriptional regulatory mechanisms and networks.
What is Coding DNA?
The DNA sequences in the genome that transcribe and translate into proteins are known as coding DNA. Coding sequences are found within the coding region of the genes. The coding region is composed of sequences known as exons. Exons are portions of genes which have the genetic code for the production of specific proteins. Exons are interspersed within the noncoding sequences known as introns in the genes. In humans, coding DNA accounts for a small percentage. Only about 1.5 % of the entire genome length corresponds to coding DNA which translates into proteins. This coding DNA has more than 27000 genes and produces all the proteins which are essential for cellular processes.
The protein-encoding sequences of genes are first transcribed into mRNA sequences. These mRNA sequences are then translated into amino acid sequences, which form polypeptide chains. Each set of three nucleotides in the coding sequence is termed a codon. One codon carries the genetic information for one amino acid, so the sequence of codons determines the sequence of amino acids. The amino acid sequence as a whole makes up the protein encoded by the sequence.
Coding sequences usually begin with the start codon ATG and terminate with a stop codon (TAA, TAG, or TGA).
Figure 01: Coding DNA
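The codon-by-codon reading described above can be sketched in a few lines. The table below is the standard genetic code written in the conventional T, C, A, G ordering; the function name `translate` and the example sequences are illustrative only, and the sketch assumes an intron-free coding sequence given as DNA.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; "*" marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence from its ATG up to the first stop codon."""
    if not cds.startswith("ATG"):
        raise ValueError("coding sequence should begin with ATG")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            return "".join(protein)
        protein.append(amino_acid)
    raise ValueError("no in-frame stop codon found")


print(translate("ATGGCCAAATAA"))  # Met-Ala-Lys, then the TAA stop codon
```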
3. The sequences of nitrogenous bases on the two strands of a DNA molecule are complementary.
The sequence of nitrogenous bases on one strand of a DNA molecule’s double helix matches up in a particular way with the sequence on the other strand. Adenine pairs with thymine and cytosine pairs with guanine.
Why do the nitrogenous bases pair in this specific way? The bases on each strand are joined to the bases on the other strand with hydrogen bonds, but different bases have different chemical structures. Cytosine and thymine (and uracil in RNA) are pyrimidines, containing one ring. Adenine and guanine are purines, containing two rings. The pyrimidines pair with the purines: cytosine and guanine form three hydrogen bonds, and adenine and thymine form two.
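The pairing rules can be expressed as a lookup table. The sketch below builds the complementary strand and counts the hydrogen bonds holding a duplex together, using the numbers given above (two for A-T, three for C-G); the function names and the example sequence are illustrative.

```python
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}
H_BONDS = {"A": 2, "T": 2, "C": 3, "G": 3}


def complement(strand):
    """Base-by-base partner of each nucleotide, in the same left-to-right order."""
    return "".join(PAIR[base] for base in strand)


def reverse_complement(strand):
    """Partner strand written 5' to 3', since the two strands are antiparallel."""
    return complement(strand)[::-1]


def hydrogen_bonds(strand):
    """Total hydrogen bonds between this strand and its partner."""
    return sum(H_BONDS[base] for base in strand)


seq = "ATGC"
print(complement(seq), reverse_complement(seq), hydrogen_bonds(seq))
```

Because C-G pairs contribute three bonds each, GC-rich stretches are held together by more hydrogen bonds than AT-rich stretches of the same length.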
DNA forensics: Creating a DNA fingerprint
Our DNA is a genetic code made up of 4 letters (A, T, G, C), called DNA bases, that are interpreted by our cells to make the molecules and structures that allow our bodies to function. Regions of DNA that encode molecules known as “proteins” are called genes. The unique code in every person results in physical differences—such as brown or blonde hair and blue or brown eyes—between individuals. It can also be used for identification purposes. Although the vast majority of DNA (99.9% on average) between two individual humans is the same, scientists have characterized regions of DNA that are different between people who are not closely related.
The most commonly used method of genetic testing in forensics looks at these variable sections of DNA. Forensic labs look at 20 DNA regions that vary between individuals, called short tandem repeats (STRs), to create a DNA “fingerprint” (Figure 1). These STRs are located in stretches of DNA between gene-coding regions and consist of short DNA sequences (e.g. “TATT”) that are repeated different numbers of times in different people. For example, in person A, the stretch of DNA may be “TATTTATTTATT” (three repeats), but in person B, the same region of DNA may be “TATTTATTTATTTATTTATT” (five repeats). Labs can then compare the number of repeats at each of these STRs to a sample taken from a crime scene and calculate the probability that the DNA from a suspect matches that sample. The chance that two people who aren’t closely related have the same DNA profile is 1 in 1,000,000,000,000,000,000.
Figure 1: Creating a DNA “fingerprint.” DNA profiles made from STR analysis are like a fingerprint or very long social security number. We can use them to calculate the statistical likelihood that different DNA samples came from the same person. Because each person inherits two copies of each gene in a cell (one from their mother and one from their father), they can have two numbers of repeats at each STR. The chances that two unrelated people have the same number of repeats at all 20 STRs is extraordinarily small.
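The repeat counting behind an STR profile can be illustrated with the “TATT” example from the text. This is a toy sketch, not a forensic pipeline: the function names, the locus names and the allele numbers are hypothetical, and real labs measure repeat number from amplified fragment lengths rather than from string matching.

```python
def count_repeats(region, unit):
    """Length of the longest run of back-to-back copies of `unit` in `region`."""
    best = 0
    for start in range(len(region)):
        n = 0
        while region.startswith(unit, start + n * len(unit)):
            n += 1
        best = max(best, n)
    return best


def profiles_match(profile_a, profile_b):
    """True if both profiles carry the same pair of repeat counts at every locus."""
    if profile_a.keys() != profile_b.keys():
        return False
    return all(sorted(profile_a[locus]) == sorted(profile_b[locus])
               for locus in profile_a)


person_a = "TATTTATTTATT"
person_b = "TATTTATTTATTTATTTATT"
print(count_repeats(person_a, "TATT"), count_repeats(person_b, "TATT"))

# Two alleles per locus (one inherited from each parent); names are made up.
suspect = {"locus_1": (3, 5), "locus_2": (7, 7)}
crime_scene = {"locus_1": (5, 3), "locus_2": (7, 7)}
print(profiles_match(suspect, crime_scene))
```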
Biology of Race
The biological definition of race is a geographically isolated breeding population that shares certain characteristics in higher frequencies than other populations of that species, but has not become reproductively isolated from other populations of the same species. (A population is a group of organisms that inhabit the same region and interbreed.) Human racial groups compose a number of breeding units that in the past remained geographically and perhaps temporally isolated, yet could interbreed and produce viable offspring within the species Homo sapiens sapiens. Paleoanthropological evidence suggests that these units have been interbreeding between populations for at least the last two hundred thousand years or longer in what may once have been considered racial groups.
More recently, molecular techniques have been developed to examine genetic differences between individuals and populations, including karyotypes providing chromosomal number and patterns, deoxyribonucleic acid (DNA) hybridization, protein sequences, and nuclear and mitochondrial base sequences from ancient and modern DNA. From all this evidence, it is clear that populational, but not racial, differences do exist within the human species. Race should not be equated with ethnicity, which has a sociological meaning. Ethnicity is a self-described category that has three components (ancestry, language, and culture) that all have affinities to certain ancestral groups.
Early racial classification systems for humans used specific phenotypic characteristics that occurred in higher frequencies in certain populations. Initially, three classes were identified by anthropologists: Caucasoids, Mongoloids, and Negroids; later, Australoids and Capoids (Bushmen) were added. Following this, even more classifications were made, with no consensus among biological anthropologists. Difficulties with these early classification systems stem from the immense genotypic and phenotypic human variation found in modern living populations. While the genotypic variation was not studied in great detail in the early part of the twentieth century, phenotypic variation in skin color, body height, hair type, nasal width, and other characteristics was studied in great detail.
Some genetic differences do exist between groups, but these by and large do not correspond to historical racial categories. For instance, there are populational differences in the frequency of ABO blood types. Native North and South Americans have an incidence of nearly 100 percent type O (less than 1 percent have type AB), while Asians have a lower incidence of O (60 percent) and higher incidence of type B (22 percent). Some characteristics, such as skin color and body height, are considered to be polygenic traits. Skin color has a clinal distribution, with darker skin colors found in indigenous peoples at the equator and lighter skin colors found in those from higher latitudes.
Skin color is an adaptation to sunlight that provides protection from skin cancer, yet at the same time allows for vitamin D production for calcium absorption. Darker skin provides more protection, while lighter skin allows more penetration of the weaker sun in temperate regions. While body height is also considered a polygenic trait, it is very much affected by inheritance, as well as environmental stressors (such as malnutrition and infectious disease).
Some differences between populations may correlate with historical exposure to different infectious diseases. For example, certain genetic variants of hemoglobin (for example, those causing sickle-cell anemia in people of African descent and thalassemia in people of Mediterranean descent) were strongly selected because they provide defensive mechanisms against infection by the organism that causes malaria (Plasmodium). Such environmental selection pressures have caused more than three hundred variants of the hemoglobin molecule. Cystic fibrosis (CF), a disorder of a gene that produces a protein that forms a chloride pump in cell membranes, allows for the buildup of mucus in the respiratory tract, thereby leading to death from pathogenic invasion. Yet the heterozygous condition for CF protects against extreme dehydration due to cholera. Tay-Sachs disease, a disorder of an enzyme that breaks down a molecule in the myelin sheath of nerve fibers, is found more commonly in people of eastern European Jewish descent than in other populations. Whether the Tay-Sachs gene protects against an infectious disease is unknown, though some have made a connection to tuberculosis exposure.
The molecular techniques outlined above now allow anthropologists to study the migration patterns of ancient peoples. Genetic diversity has resulted from the extensive hybridization that has occurred in the last two hundred thousand years, hiding any clear evidence for typological classification of race. Moreover, when selection pressures (temperature, altitude) are coupled with phenotypic variation, phenotypic expression defies taxonomic assignment of race. The genetic diversity within any historically defined race swamps the small amount of difference between such groups, making the boundaries of these categories entirely arbitrary. Therefore, race in humans does not have a biological meaning.