I am not sure which genome database to use, UCSC or Ensembl, for my asthma study of the ADRB2 gene (Arg/Arg-16 genotype).
I am using the original Human Genome Project database at the moment. However, I think Ensembl may be better suited for me.
Which genome database is good for asthma research on single nucleotide polymorphism (SNP) variations? I am looking for something that has existing visualisation tools, or one where you can easily write one yourself.
If you are looking for variations in a gene, there are a few databases you can consult.
You can of course use the usual genome browsers, search for your gene and then select the variations listed. For ADRB2 this looks like this on Ensembl. If you plan to retrieve and compare data from different genomes or genes, I really recommend learning how to use BioMart, as this is a very powerful tool.
Then you can use specific databases or search engines that only target SNPs. Examples would be dbSNP and SPSmart (which queries different databases).
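As a minimal sketch of retrieving a variant programmatically instead of through the browser, the Python below builds a request for the Ensembl REST service. It assumes the public server at `https://rest.ensembl.org` and its `variation/<species>/<rsID>` lookup. The helper names `variation_url` and `fetch_variation` are made up for this illustration. The example ID rs1042713 is the ADRB2 SNP commonly reported as the Arg16Gly change, so check that it is the variant you actually need.

```python
import json
import urllib.request

ENSEMBL_REST = "https://rest.ensembl.org"  # assumed public REST server


def variation_url(rsid, species="human"):
    """Build the lookup URL for one variant, identified by its rsID."""
    return f"{ENSEMBL_REST}/variation/{species}/{rsid}"


def fetch_variation(rsid, species="human", timeout=30):
    """Download the variant record as a dict. Requires network access."""
    request = urllib.request.Request(
        variation_url(rsid, species),
        headers={"Content-Type": "application/json"},  # ask for JSON output
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With network access, `record = fetch_variation("rs1042713")` followed by `print(sorted(record))` lists the fields the service returned, which you can then feed into your own visualisation code.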
Utah Genome Project
Launched in 2012, the Utah Genome Project is a large-scale, genome sequencing and analysis initiative that is discovering genetic signatures of inherited disease. Current projects are focusing on understanding conditions ranging from cancers to spontaneous preterm birth to suicidality.
The UGP is unique among genome initiatives because instead of studying unrelated individuals, it utilizes the ancestral power represented within the Utah Population Database [UPDB]—the world’s largest repository of family histories and public health records linked to clinical records. Comparing genomic sequences within multigenerational high-risk families works like a magnifying glass for finding the genetic cause of their condition.
Already, UGP investigators have identified more than 50 genes behind common and rare diseases — including breast and ovarian cancers (BRCA1, 2), colon cancer (APC), and heart arrhythmia (HERG) — and research in animal models is elucidating the biology behind these conditions. In collaboration with the Center for Genomic Medicine, these discoveries are advancing a new era of precision medicine and population health.
You can read about the background and history of the 100,000 Genomes Project below.
April 2003 marked one of the most significant scientific breakthroughs of modern times. After years of painstaking research carried out by thousands of dedicated scientists across the world, the complete genetic code of a human being – their genome – was published.
The Human Genome Project, as this work was known, was the largest international collaboration ever undertaken in biology with British scientists leading the global race to read the 3 billion letters of the human genome, letter by letter. This is a technique called sequencing. The UK has often led the world in scientific breakthroughs and DNA was no exception. Crick and Watson won the Nobel Prize for discovering the double helix structure of DNA. And it was a British double Nobel Prize winning scientist, Fred Sanger, who discovered how to sequence it.
Now there is a real opportunity to turn the very important scientific discoveries about DNA and the way it works into a potentially life-saving reality for NHS patients across the country.
Most of us have heard of genetics, the study of the way particular features or diseases are inherited through genes passed down from one generation to the next. But the more we learn about genes, the more we understand that the old idea of having a single gene for this, or a single gene for that, which determines your fate is not – except in the case of unusual inherited diseases – a good way of describing the complexity of genes. In fact, groups of genes work together and their activity is influenced by a huge variety of environmental and other factors.
Your genome is your body’s instruction manual and you have a copy of it in almost every healthy cell in your body. The study of that genome and all the technologies needed to analyse and interpret it is called genomics.
When the first draft of the whole human genome was announced it was claimed that it would revolutionise medical treatment. It had taken 13 years and over £2 billion to laboriously read every letter of the human genetic code. It took such a long time because the DNA sequence of humans is very long – 3 billion letters – and because the sequencing machines available at the time were so slow and laborious. Now a human genome can be sequenced in a few days for less than £1000. It’s the leap in the speed and cost of technology that has opened up the potential of genomics and brought it within reach of mainstream healthcare.
But haven’t we already got a good understanding of genetics? One of the great surprises from the Human Genome Project was that there were only about 20,000 genes – about the same number as a starfish. The role of the remainder of a human’s genome – in fact a staggering 95 percent of it – was a mystery. Now we know that the remaining DNA is not irrelevant as was once thought but that much of it has a critically important role, influencing, regulating and controlling the rest. That’s why it’s necessary to sequence the whole human genome (rather than just looking at the 20,000 genes currently used for diagnosis in medicine) if we are to really understand the role of genes in health and disease.
But people are very different, so studying only a small number of genomes would not be enough to give doctors and scientists a true picture of our genes and their relationship to disease. Another key point is that by itself, a genome can’t tell you very much. To make sense of it, it is essential to know much more about the person who donated it: details like their symptoms and when they first started, along with physiological measurements, such as heart rate or blood pressure (this sort of information is provided by clinicians and called phenotypic data). Another set of information which may be important in interpreting genomic data comes from their past medical records and would include such things as previous illnesses, medications and birth weight.
And this is where the NHS comes in. The way in which the NHS is able to link a whole lifetime of medical records with a person’s genome data and the fact it can do this on a large scale is unique. The richness of this data can help to understand disease and to tease apart the complex relationship between our genes, what happens to us in our lives and illness.
So what can genomics do? You can use it to predict how well a person will respond to a treatment or find one that will work best for them – so called personalised medicine. A good example in use already is whether or not a woman’s breast cancer is HER2 positive. If it is, Herceptin will be very effective for her but not for someone who doesn’t have HER2. You can also use genomics to test how well a cancer might respond to radiotherapy. For some that can mean far fewer radiotherapy sessions. Or use it to find the 30,000 people who currently use insulin for their Type 1 diabetes but would do better on simple tablets. Genomics can be used to track infectious disease, precisely pinpointing the source and nature of the outbreak through looking at the whole genomes of bugs. The potential of genomics is huge, leading to more precise diagnostics for earlier diagnosis, new medical devices, faster clinical trials, new drugs and treatments and potentially, in time, new cures.
The supersonic age of genomics has begun. And just as the NHS has been at the forefront of scientific breakthroughs before, the NHS is at the forefront again, with its patients benefiting from all that genomics offers, becoming the first mainstream health service in the world to offer genomic medicine as part of routine care for NHS patients.
In late 2012, Prime Minister David Cameron announced the 100,000 Genomes Project.
Genomics England, a company wholly owned and funded by the Department of Health & Social Care, was set up to deliver this flagship project and sequence 100,000 whole genomes from NHS patients, something that at the time no one in the world had even attempted. Its four main aims were: to create an ethical and transparent programme based on consent; to bring benefit to patients and set up a genomic medicine service for the NHS; to enable new scientific discovery and medical insights; and to kick-start the development of a UK genomics industry.
The project focused on patients with a rare disease and their families and patients with cancer. The first samples for sequencing were being taken from patients living in England with discussions taking place with Scotland, Wales and Northern Ireland about potential future involvement.
In the UK, just under 160,000 people died from cancer in 2011, with over 330,000 new cases reported every year. Because cancer is more likely to occur as people age, we expect the number of cancer cases to rise as people live longer. And although rare diseases are individually very uncommon, because there are between 5000 and 8000 of them, a surprisingly large number of people are affected in total – 3 million – or, put another way, one in 17 (or between 6 and 7 percent) of the UK population. Genomics has great potential for both because both rare disease and cancer are strongly linked to changes in the genome. Cancer begins because of changes in genes within what was a normal cell. Although a cancer starts with the same DNA as the patient, it develops mutations or changes which enable the tumour to grow and spread. By taking DNA from the tumour and DNA from the patient’s normal cells and comparing them, the precise changes are detected. Knowing and understanding them strongly indicates which treatments will be the most effective. Genomics has already started to guide and inform doctors about the best treatment for individual patients. We’ve already mentioned Herceptin for HER2 positive breast cancer but we are only at the beginning. Many more cancer types, including those for which there are hardly any successful current treatments, such as lung cancer, could be helped if only we knew which gene changes were important.
At least 80 percent of rare diseases are genomic with half of new cases found in children. Knowledge of the whole genome sequence may identify the cause of some rare diseases and help point the way to new treatments for these devastating conditions – vital progress given that some rare diseases take two or more years just to identify. As most rare diseases are inherited, the genomes of the affected individual (usually a child) plus two of their closest blood relatives were included to pinpoint the cause of the condition.
In all, it was anticipated that about 75,000 people would be involved. The numbers added up like this: 50,000 genomes from cancer – two per patient, therefore 25,000 patients. 50,000 from rare disease – three per patient (affected person plus two blood relatives) – therefore roughly 17,000 rare disease patients. There was an extraordinary response by patients and their families wanting to take part in this ground-breaking project.
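The participant arithmetic can be checked in a few lines. This is only a sketch of the sums described above. The variable names are illustrative, and the split of two genomes per cancer patient reflects the tumour and normal samples mentioned earlier.

```python
GENOMES_PER_PROGRAMME = 50_000       # cancer and rare disease each
GENOMES_PER_CANCER_PATIENT = 2       # tumour DNA plus normal DNA
GENOMES_PER_RARE_DISEASE_FAMILY = 3  # affected person plus two relatives

cancer_patients = GENOMES_PER_PROGRAMME // GENOMES_PER_CANCER_PATIENT
rare_disease_patients = round(
    GENOMES_PER_PROGRAMME / GENOMES_PER_RARE_DISEASE_FAMILY
)

# Every rare disease genome comes from a different person (patient or
# relative), whereas each cancer patient contributes two genomes.
people_involved = cancer_patients + GENOMES_PER_PROGRAMME

print(cancer_patients, rare_disease_patients, people_involved)
```

The rare disease figure comes out at 16,667, which the text rounds to roughly 17,000.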
Today, we have sequenced over 100,000 genomes from over 97,000 patients and their family members, totalling over 21 petabytes of data – 1 petabyte of music would take 2,000 years to play on an MP3 player.
Some patients involved in the 100,000 Genomes Project have already benefitted (see First patients diagnosed through the 100,000 Genomes Project), because a better treatment has been identified for them or their condition has been diagnosed for the first time. However, for most, the benefit will be in knowing that they will be helping people like them in the future through research on the genome data they generously allow to be studied. All will know that, because of their involvement, an infrastructure will be developed which in the future will enable the NHS to offer genomic services much more widely, to any patient who might benefit.
To make genomics a reality for the NHS it has to be of high quality, fast and affordable with results that are readily understood. How was this achieved?
The sequencing challenge
Genomics England invested in the latest, state of the art sequencing machines to sequence the 100,000 genomes in the project. Because it was the first time sequencing had been attempted at such a scale in the UK, it was assumed that sequencing would be the most difficult part of the project. Whilst it wasn’t without its challenges, thanks to the support of our partner Illumina it proved to be less difficult than we had anticipated.
The data challenge
Data was a major challenge on two fronts. The first step after sequencing is to identify the possibly millions of differences between the patient’s genome and a reference genome, a process called variant calling. The next hurdle – annotation – is to interpret the meaning and importance of those differences. Some of the differences will just be natural harmless variations between individuals, but some will be damaging and almost certainly involved in the development of disease. Automating this process – creating the Genomics England pipeline – so that it took weeks rather than years, was very difficult.
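To make the two steps concrete, here is a deliberately tiny Python sketch. It is not the Genomics England pipeline: it assumes the sample is already aligned to the reference with no insertions or deletions, and the annotation lookup table is hypothetical. Real variant callers work from millions of short reads and model sequencing error.

```python
def call_variants(reference, sample):
    """Return (position, ref_base, alt_base) for every mismatch.

    Positions are 1-based. Both sequences must already be aligned
    and of equal length (a simplifying assumption).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference, sample), start=1)
        if ref != alt
    ]


def annotate(variants, known_effects):
    """Label each variant from a lookup table, else 'unclassified'."""
    return [(v, known_effects.get(v, "unclassified")) for v in variants]


reference = "ACGTTGCA"
sample = "ACATTGCC"
variants = call_variants(reference, sample)

# Hypothetical annotation table, for illustration only.
known_effects = {(3, "G", "A"): "harmless variation"}
annotated = annotate(variants, known_effects)
print(annotated)
```

The second variant stays "unclassified", which mirrors the hard part described above: finding differences is mechanical, while deciding which ones matter needs curated knowledge.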
The second big data problem was that information about a person and details of their illness are needed for interpretation. It’s a bit like being able to measure the amount of haemoglobin in a sample of blood but not being able to say whether it is normal without knowing more about the person who donated it – were they a child or an adult for instance? Getting data from the NHS in such a way that it all followed the same ‘rules’ (so you knew you were comparing apples with apples) was very challenging, but the NHS staff involved worked incredibly hard to make it happen.
Another data issue is its size. The raw data from one genome is about 200GB which would occupy most of the average laptop’s hard drive. Just the annotations would easily fill a DVD by themselves. This mountain of data needs to be sifted, analysed and presented in a way that is helpful to doctors, most of whom will not have specialist knowledge of gene changes.
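A rough back-of-envelope check of the scale, assuming decimal units (1 GB = 10^9 bytes, 1 PB = 10^15 bytes) and the 200GB per genome figure quoted above:

```python
GB = 10**9   # decimal gigabyte, an assumption about the units used
PB = 10**15  # decimal petabyte

RAW_BYTES_PER_GENOME = 200 * GB
GENOMES = 100_000

total_petabytes = RAW_BYTES_PER_GENOME * GENOMES / PB
print(total_petabytes)  # raw sequence data only, before annotations
```

This gives about 20 petabytes of raw data, the same order of magnitude as the "over 21 petabytes" total quoted earlier.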
The cancer challenge
At one point, the cancer programme had to be halted because it became clear that the usual methods used in the collection and analysis of cancer tissues, such as fixing them in formalin and then embedding them in paraffin (FFPE), damaged DNA. We had to find other ways to preserve samples, and decided to use fresh frozen tissue. Again, the NHS was magnificent in responding to this challenge, which required completely reconfiguring how samples were collected.
The security challenge
The genome data is large and precious. It is stored securely and respectfully, with rigorous conditions for access in which the public can have confidence. Access is controlled by a wide range of security measures as well as detailed governance. Participants are involved in deciding which researchers are allowed to access the data.
Each one of these challenges involved science at the cutting edge in a field that continues to move very rapidly, and doing it all at a scale never seen before. Genomics England had to be very flexible, changing its plans frequently to reflect new advances but also being humble enough to learn from things that didn’t go right, especially in the pilot stage. We learned a huge amount from patients and clinicians.
Delivering benefit to patients
The 100,000 Genomes Project delivered clinical benefits to patients, but an additional and critically important spin off is the importance of this huge amount of data to researchers. This includes those wanting to understand more about the genome itself but also to those wanting to develop new treatments, diagnostics, devices and medicines. Researchers can be academics as well as those from life science industries. These are not just well-known, big pharmaceutical and biotechnology companies but also a great number of innovative small and medium enterprises (SMEs) working in machine learning, data management and software.
Some people feel companies should not benefit commercially from patients who have donated their genome data without receiving any payment. Others worry that participants’ data might not be secure, that they could be identified if they take part, or that their data could be used by researchers in a way that is not fair.
Commercialisation and who benefits?
Patients donate their samples and information using models of informed consent which have been approved by an independent NHS ethics committee. Download the approved protocol for more details. Patients have explicitly been asked if they are willing for commercial companies to be able to conduct approved research on their data. Those people that have already generously consented to take part understand the challenges about sharing data in their own case but they are keen to see their data used to help progress research into the condition that affects them. If innovative treatments are to be found to extend or save lives then commercial companies will need to invest in the research, development and manufacture of new drugs and diagnostic tests. It has always been the case that this work is carried out in the commercial sector and not by government or within the NHS itself.
Genomics England is developing ways of charging for its data services to ensure that the costs of maintaining the data are shared with companies and that the UK tax payer will benefit should companies successfully develop drugs, devices, treatments, diagnostic tests or other services through its use. If successful products are developed, it means that patients are benefiting. Bespoke arrangements will be made with each company that uses the data if they are able to develop commercial products because of it.
The 100,000 Genomes Project put ethics at its heart from the outset. Without doing this, it would not have been possible to develop a service that the NHS could use. High standards of ethical practice continue to underpin the NHS Genomic Medicine Service. Genomics England has its own independent Ethics Advisory Committee which advises the Genomics England board on the ethical aspects of everything Genomics England does. Issues already scrutinised include what information patients should receive about their results as well as policies on consent. A series of engagement and involvement activities with patients, clinicians and other groups about these issues has been undertaken. The outputs of these discussions are available here.
Privacy and confidentiality issues
Any relevant information about a patient will be returned to their doctor. Access to Genomics England’s data services by other medical researchers and companies is conditional on first passing a rigorous ethical review and then having their research proposal approved by Genomics England’s Access Review Committee, using policies developed by our Ethics Advisory Committee. Insurers and marketing companies are not allowed access to the data.
Oversight by the Genomics England Data Advisory Committee will ensure that any researchers wanting access to data will go through rigorous identity checks and their use of the data will be closely supervised. No raw genome data can be taken away. The data will be kept within Genomics England’s data structures and will be constantly under its control. Genomics England commits itself to constant testing and re-testing of its security systems to ensure data safety.
While Genomics England has the data, patient identifiers (such as NHS number or postcode) are removed to reduce the risk of clinical and genomic information being re-identified with a particular individual. Only when data is used for a patient’s own care will identifiable data be made available to the patient’s doctor and medical team. Patients are told that participant anonymity cannot be absolutely guaranteed as in theory, any non-trivial piece of health records data can be re-identified by someone who already has access to sufficiently detailed information about an individual, for instance, social media posts. In practice, this is still very hard to do and harder still to achieve undetected. Genomics England can’t promise that no researcher would be able to do this but what it can promise is that it will be made so difficult that there would be far easier ways to achieve the same goal. Re-identifying patients is also illegal.
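The identifier-removal step can be pictured with a small Python sketch. The field names and the record are hypothetical and this is not Genomics England’s actual process. As the paragraph above stresses, dropping direct identifiers lowers the risk of re-identification but does not eliminate it.

```python
# Hypothetical field names for the direct identifiers named in the text.
DIRECT_IDENTIFIERS = frozenset({"nhs_number", "postcode"})


def deidentify(record, identifiers=DIRECT_IDENTIFIERS):
    """Return a copy of the record without direct identifier fields."""
    return {
        field: value
        for field, value in record.items()
        if field not in identifiers
    }


# Made-up example record, for illustration only.
record = {
    "nhs_number": "0000000000",
    "postcode": "ZZ1 1ZZ",
    "condition": "rare disease",
    "birth_weight_g": 3200,
}
research_view = deidentify(record)
print(research_view)
```

The original record is left untouched, so the identifiable version can still be supplied to the patient’s own clinical team.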
Genomics England is talking constantly to patients about their concerns to make sure that any issues they may have are addressed. Patients have been involved from the outset and are at the very heart of this project. In particular, the commitment to consent is of paramount importance.
It is not just patients and the NHS that stand to benefit from the 100,000 Genomes Project. There will be numerous knock-on advantages for the country. An example from the past of how a major infrastructure project produced widespread benefit beyond that intended might be the introduction of the railways in the Victorian era. Individuals and families benefited from cheap travel but the infrastructure created by the new railways also triggered an economic boom. Whilst the growth of some companies, say those making railway tracks, was predicted, other economic benefits were not. For instance, there was a boom in holiday travel, resulting in the development of seaside towns, of hotels and even a boom in travel guides.
The 100,000 Genomes Project has some parallels. Whilst primarily for the benefit of people who are sick, there are potentially many economic benefits for the nation. We can be certain of benefits such as new medicines and diagnostic tests but just as with railways, some of the companies that may develop will be unexpected, built on new, as yet undiscovered technologies that will emerge over the next five years.
The 100,000 Genomes Project was not guaranteed to succeed, in the same way that there was no guarantee for the railways. So only the government has been willing to take the risk and make the necessary investment in it. And just as Victorian England with its great engineers was the perfect place for the birth of the railways, the UK, which not only leads the world in life sciences but has the unique benefit of the NHS, is the best place in the world to initiate the practical use of genome sequencing and interpretation for patient benefit. Our vision was one where the UK is the leader in a new industry where genomics is used to help patients get better, more personalised care and treatment. This has happened, with Genomics England having a global reputation.
The NHS has been preparing to use genomics as part of its routine care. It needed more scientists, geneticists and doctors, and these have been trained to interpret the data and understand what it means for a patient’s medical condition. In parallel with Genomics England’s work, a skills and training programme for workers in the NHS was set up by Health Education England.
The 100,000 Genomes Project has used the generosity of patients and the outstanding skills and talent found in the medical and life sciences sectors in the UK to help deliver this project. Genomics England’s legacy is a genomics service that has been adopted by the NHS, high ethical standards and public support for genomics, new medicines, treatments and diagnostics and a country which hosts the world’s leading genomic companies.
The 100,000 Genomes Project is mainly funded by the National Institute for Health Research and NHS England. The Wellcome Trust, Cancer Research UK and the Medical Research Council have also generously funded research and infrastructure in the programme.
A great debate pre-dated the start of the HGP: was it worth mapping the vast non-coding regions of genome that were called junk DNA, or the dark matter of the genome? Thanks in large part to the HGP, it is now appreciated that the majority of functional sequences in the human genome do not encode proteins. Rather, elements such as long non-coding RNAs, promoters, enhancers and countless gene-regulatory motifs work together to bring the genome to life. Variation in these regions does not alter proteins, but it can perturb the networks governing protein expression.
With the HGP draft in hand, the discovery of non-protein-coding elements exploded. So far, that growth has outstripped the discovery of protein-coding genes by a factor of five, and shows no signs of slowing. Likewise, the number of publications about such elements also grew in the period covered by our data set (1900 to 2017; see SI, Fig. S3a). For example, there are thousands of papers on non-coding RNAs, which regulate gene expression.
The HGP also offered a way to catalogue human genetic variation, including that of SNPs. Other big efforts slashed the cost of profiling common differences across thousands of individuals; these included the International HapMap Project 8 (the third and final phase of which was completed in 2010) and the 1000 Genomes Project 9 (completed in 2015). These data sets, combined with advances in statistical analysis, ushered in genome-wide association studies (GWAS) of countless traits, including height 10, obesity 11 and susceptibility to complex diseases such as schizophrenia 12.
There are now more than 30,000 papers per year linking SNPs and traits. A large fraction of these associations are in the once-dismissed non-coding regions (see SI, Table S3).
Cellular function relies on weak and strong links between genetic material and proteins. Mapping out this network now complements the Mendelian perspective. Today, more than 300,000 regulatory network interactions have been charted — proteins binding with non-coding regions or with other proteins.
Genetics & Genomics
Whitehead Institute has been a trailblazer in the fields of genomics and genetics since it became the single largest public contributor to the Human Genome Project —and the Institute continues to be a world leader in genetics and genomics research.
Genes, the segments of DNA that code for proteins, are the instruction manuals for biological organisms. Fundamental aspects of biomedical research include decoding those instructions and discovering how and when they are carried out, which allows researchers to understand the programming behind our biology, from how genes encode basic cellular functions to how they contribute to disease when the programming, or its execution, goes awry. Whitehead Institute researchers are using the latest tools, screens, and evolutionary comparisons, among other means, to match genes to their functions and shed light on the essential programming behind our biology.
All cells in an organism have the same DNA, organized into chromosomes and then further into genes, and it is how the genes are regulated differently that defines specific cell types. Whitehead Institute researchers are investigating how gene expression is regulated by studying molecules and processes including transcription factors (proteins that “read” DNA); epigenetic marks (heritable molecules that regulate genes); small RNAs; and DNA sequences like enhancers, which regulate the expression of genes. The aim is both to identify broad patterns in gene regulation and to understand the myriad ways in which gene regulation influences biological processes and diseases of interest.
Our researchers have made important contributions to the understanding of how different RNAs play a variety of roles in gene regulation. They continue to identify regulatory RNAs and their functions in different biological processes, and to advance the field’s knowledge of microRNAs, very tiny RNAs that can regulate gene expression. Whitehead Institute researchers are also investigating how RNA can form aggregates in the cell when RNAs with too many repeats of certain nucleotide sequences clump together to create gels. These gels have been observed in, and may contribute to, neurological diseases. Other researchers are investigating RNA structure and how RNAs can change shape, leaving different sequences exposed for translation, which can lead to the production of different proteins.
3.1 Study populations
The characteristics of the study populations consisting of 1,425 children treated with at least LABA and ICS are shown in Table 1. Analyses were performed in a subset of 175 patients with LABA use from PACMAN, 306 from BREATHE and PAGES, 149 from SAGE II, 463 from SCSGES and 332 from GALA II. The proportion of exacerbations defined as oral corticosteroids (OCS) use was lower in PACMAN and SCSGES (6.3% and 16.2%, respectively) compared to other studies. GALA II had the highest numbers of OCS courses of the meta-GWAS (49.7%). The number of OCS courses was even higher in PASS (53.5%).
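As a quick consistency check, the cohort subset sizes listed above can be summed and compared with the stated total of 1,425. The dictionary below simply restates the numbers from this paragraph.

```python
# Cohort subset sizes as listed in the text (LABA users per study).
cohort_sizes = {
    "PACMAN": 175,
    "BREATHE and PAGES": 306,
    "SAGE II": 149,
    "SCSGES": 463,
    "GALA II": 332,
}

total = sum(cohort_sizes.values())

# Each cohort's share of the pooled meta-GWAS sample, in percent.
shares = {name: round(100 * n / total, 1) for name, n in cohort_sizes.items()}
print(total, shares)
```

The sum matches the 1,425 children reported for the meta-GWAS. PASS is not included because it was analysed separately.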
3.1.1 Genome-wide association meta-analysis
The Q-Q plots did not provide evidence for genomic inflation due to population stratification in any study (Figure S1A-S1E). In the meta-analysis, no associations with asthma exacerbations were genome-wide significant (P-value ≤ 5 × 10⁻⁸). However, 22 variants were suggestively associated with exacerbations (P-value ≤ 5 × 10⁻⁶) in our meta-analysis of children and young adults with asthma (Table 2, Figure 1). The SNP rs7958534, located near TBX3, had the strongest signal. The G allele of this SNP was associated with increased risk of exacerbations (odds ratio (OR) 1.86; 95% confidence interval (CI) 1.47-2.35; P = 1.15 × 10⁻⁷). Among the 22 identified SNPs, eight independent signals were identified. The forest plots of these SNPs are represented in Supplementary Figure S4. Results of the sub-analysis of the independent SNPs in 359 children from PASS are represented in Table S1. None of the SNPs were associated with increased risk of exacerbations in this sub-analysis.
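The two thresholds, and the link between an odds ratio, its confidence interval and a P-value, can be sketched in Python. This is an illustration, not the authors’ analysis code: the function names are invented, and the P-value is recovered with a normal (Wald) approximation on the log odds scale, so it will only roughly match a published value computed from unrounded estimates.

```python
import math

GENOME_WIDE = 5e-8  # genome-wide significance threshold from the text
SUGGESTIVE = 5e-6   # suggestive threshold from the text


def classify(p_value):
    """Place a P-value in one of the two tiers used in the text."""
    if p_value <= GENOME_WIDE:
        return "genome-wide significant"
    if p_value <= SUGGESTIVE:
        return "suggestive"
    return "not significant"


def wald_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Approximate SE, z and two-sided P from an OR and its 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return se, z, p


# rs7958534: OR 1.86 (95% CI 1.47-2.35), reported P = 1.15e-7
se, z, p = wald_from_ci(1.86, 1.47, 2.35)
print(round(z, 2), classify(p), classify(1.15e-7))
```

Both the reported and the back-calculated P-value land in the suggestive tier. The back-calculated value differs somewhat from the reported one because the published OR and CI are rounded.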
| Nearest gene(s) or locations | SNP | Chr. a | Position b | E/R c | MAF d | OR (95% CI) | P-value | Cochran's Q statistic | Cochran's Q P-value | I² (95% CI) |
|---|---|---|---|---|---|---|---|---|---|---|
| RMDN2 | rs163085 | 2 | 38292519 | A/T | 0.346 | 0.59 (0.47-0.74) | 4.22 × 10⁻⁶ | 1.54 | 6.73 × 10⁻¹ | 0.0 (0.0-70.2) |
| KLF7 | rs9288377 | 2 | 207856365 | G/C | 0.366 | 0.59 (0.47-0.74) | 4.98 × 10⁻⁶ | 1.52 | 4.67 × 10⁻¹ | 0.0 (0.0-86.3) |
| CLRN1 | rs358959 | 3 | 150776600 | G/A | 0.257 | 0.63 (0.52-0.77) | 4.52 × 10⁻⁶ | 3.80 | 4.34 × 10⁻¹ | 0.0 (0.0-78.1) |
| LOC105377766 | rs4700987 | 5 | 180251561 | A/T | 0.262 | 2.80 (1.81-4.33) | 3.77 × 10⁻⁶ | 0.65 | 4.19 × 10⁻¹ | 0.0 e |
| LINC00847 | rs4700988 | 5 | 180255963 | C/A | 0.262 | 2.83 (1.84-4.36) | 2.42 × 10⁻⁶ | 0.15 | 6.99 × 10⁻¹ | 0.0 e |
| EPHA7 | rs1947048 | 6 | 93012151 | G/A | 0.166 | 2.50 (1.69-3.69) | 4.36 × 10⁻⁶ | 0.33 | 8.48 × 10⁻¹ | 0.0 (0.0-37.0) |
| | rs12197506 | 6 | 93014723 | T/G | 0.166 | 2.50 (1.69-3.69) | 4.36 × 10⁻⁶ | 0.33 | 8.48 × 10⁻¹ | 0.0 (0.0-37.0) |
| | rs1596491 | 6 | 93015896 | T/A | 0.166 | 2.50 (1.69-3.69) | 4.36 × 10⁻⁶ | 0.33 | 8.48 × 10⁻¹ | 0.0 (0.0-37.0) |
| | rs1899806 | 6 | 93017419 | C/T | 0.166 | 2.50 (1.69-3.69) | 4.36 × 10⁻⁶ | 0.33 | 8.48 × 10⁻¹ | 0.0 (0.0-37.0) |
| | rs1899807 | 6 | 93017512 | T/C | 0.166 | 2.50 (1.69-3.69) | 4.36 × 10⁻⁶ | 0.33 | 8.48 × 10⁻¹ | 0.0 (0.0-37.0) |
| | rs2588041 | 6 | 93026285 | T/C | 0.166 | 2.50 (1.69-3.69) | 4.36 × 10⁻⁶ | 0.33 | 8.48 × 10⁻¹ | 0.0 (0.0-37.0) |
| | rs2588042 | 6 | 93027959 | G/A | 0.166 | 2.50 (1.69-3.69) | 4.36 × 10⁻⁶ | 0.33 | 8.48 × 10⁻¹ | 0.0 (0.0-37.0) |
| | rs2818130 | 6 | 93034458 | A/G | 0.167 | 2.62 (1.75-3.91) | 2.61 × 10⁻⁶ | 0.53 | 7.67 × 10⁻¹ | 0.0 (0.0-60.7) |
| | rs2818129 | 6 | 93035916 | A/G | 0.167 | 2.49 (1.69-3.66) | 4.18 × 10⁻⁶ | 0.60 | 7.41 × 10⁻¹ | 0.0 (0.0-65.3) |
| BUB3 | rs7918913 | 10 | 124928952 | C/T | 0.374 | 0.59 (0.47-0.74) | 4.96 × 10⁻⁶ | 0.26 | 8.77 × 10⁻¹ | 0.0 (0.0-20.9) |
| TBX3 | rs6489992 | 12 | 115352769 | A/G | 0.370 | 1.77 (1.40-2.23) | 4.96 × 10⁻⁶ | 1.64 | 4.40 × 10⁻¹ | 0.0 (0.0-87.3) |
| | rs7972038 | 12 | 115352977 | T/C | 0.340 | 1.90 (1.50-2.40) | 1.43 × 10⁻⁶ | 0.83 | 6.60 × 10⁻¹ | 0.0 (0.0-75.0) |
| | rs7958534 | 12 | 115353100 | G/A | 0.336 | 1.86 (1.47-2.35) | 1.15 × 10⁻⁷ | 1.20 | 5.48 × 10⁻¹ | 0.0 (0.0-82.7) |
| | rs10850402 | 12 | 115354123 | A/G | 0.342 | 1.88 (1.48-2.38) | 2.49 × 10⁻⁷ | 0.69 | 7.10 × 10⁻¹ | 0.0 (0.0-69.7) |
| | rs7961916 | 12 | 115355126 | A/C | 0.318 | 1.83 (1.44-2.33) | 7.09 × 10⁻⁷ | 0.38 | 8.27 × 10⁻¹ | 0.0 (0.0-45.3) |
| | rs7970471 | 12 | 115365549 | A/T | 0.288 | 1.80 (1.41-2.30) | 3.04 × 10⁻⁶ | 1.36 | 5.06 × 10⁻¹ | 0.0 (0.0-84.7) |
| RAB22A | rs55950385 | 20 | 56559152 | G/A | 0.122 | 0.27 (0.16-0.45) | 8.98 × 10⁻⁷ | 0.66 | 4.16 × 10⁻¹ | 0.0 e |
- Independent SNPs of each gene are in boldface.
- Abbreviations: CI, confidence interval; OR, odds ratio for effect alleles; SNP, single nucleotide polymorphism.
- a Chromosome
- b positions based on GRCh37/hg 19 build
- c effect allele / reference allele
- d minor allele frequency
- e confidence intervals cannot be computed due to the limited amount of studies.
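The pooled OR, Cochran's Q, and I² columns in the table can be illustrated with a short sketch. It assumes a fixed-effect, inverse-variance meta-analysis on the log-odds scale; the excerpt does not state which model the authors used, and the per-study odds ratios and standard errors below are hypothetical, not data from the cohorts.

```python
import math


def i_squared(q, df):
    """I² heterogeneity (in %) from Cochran's Q and its degrees of freedom."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def fixed_effect_meta(log_ors, std_errs):
    """Inverse-variance fixed-effect pooling of per-study log odds ratios."""
    weights = [1.0 / se ** 2 for se in std_errs]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total
    pooled_se = math.sqrt(1.0 / total)
    # Cochran's Q: weighted squared deviations of each study from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    df = len(log_ors) - 1
    return {
        "or": math.exp(pooled),
        "ci": (math.exp(pooled - 1.96 * pooled_se),
               math.exp(pooled + 1.96 * pooled_se)),
        "q": q,
        "i2": i_squared(q, df),
    }


# Hypothetical per-study estimates (illustration only).
study_ors = [1.9, 1.7, 2.0, 1.8]
study_ses = [0.25, 0.30, 0.22, 0.28]
result = fixed_effect_meta([math.log(o) for o in study_ors], study_ses)
print(result)
```

Because I² is truncated at zero whenever Q is smaller than its degrees of freedom, a row with a small Q (for example Q = 1.54 with an assumed three degrees of freedom) yields I² = 0.0, as seen throughout the table.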
3.2 Functional evaluation of variants
Next, the eight independent SNPs resulting from the meta-GWAS were further investigated in GTEx. 52 Here, the independent SNP rs4700987 (nearest gene: LOC105377766) has been described as a lung eQTL for zinc finger protein 62 (ZFP62) 53 (Figure S2).
3.3 Validation of previous reported LABA associations from candidate gene studies
Of the three previously reported SNPs, two were available in all cohorts of the current meta-GWAS dataset. None of the three variants was consistently associated with exacerbations despite LABA use (Figure S3). However, a sensitivity analysis in PACMAN, in which we stratified for LABA users without leukotriene antagonist (LTRA) use, showed a significant association for ADRB2 rs1042713: the A allele increased the risk of exacerbations (OR 7.39, 95% CI 1.95-28.01, Table S2). A trend towards a similar association for rs1042713 (OR 1.20, 95% CI 0.72-2.00) can be observed in the sensitivity analysis of LABA users without LTRA use, albeit not statistically significant (Table S3).
GWAS in the present: where are we now?
Fast-forward sixteen years from the completion of the HGP, and genomics has moved at a speed no one could have predicted. In recent years, one of the most significant developments in human genetics has been a resource called the UK Biobank. This is a massive dataset consisting of genotype information (which can be used for GWAS) from about 500,000 human volunteers. Each participant also provides a veritable treasure trove of health data, ranging from basic information such as height and weight to dietary questionnaires and disease status (a total of over 2,400 traits!). This resource has revolutionized genomics, not only because of the huge sample size and detailed medical information, but also because the data is freely accessible to any scientist who applies to use it. As a result, the genetic analysis of the UK Biobank data has essentially been crowdsourced to scientists all over the world. The impact of this is clear from the numbers – since UK Biobank’s initial release in 2015, almost 600 papers have analyzed it, with countless new studies on the way.
Building Off Known Genomes to Advance Systems and Ecosystems Biology
Jesse Poland of Kansas State University proposed sequencing intermediate wheatgrass (Thinopyrum intermedium, alternately known as Agropyron intermedium), shown on the left. Intermediate wheatgrass has a biomass yield equivalent to that of the candidate bioenergy feedstock switchgrass. The right-hand specimen is of Agropyron repens, which co-occurs with Agropyron intermedium. (Matt Lavin, CC BY-SA 2.0 Wikimedia Commons)
The U.S. Department of Energy Joint Genome Institute (DOE JGI), a DOE Office of Science User Facility, has announced that 27 new projects have been selected for the 2016 Community Science Program (CSP).
“These new CSP projects, selected through our external review process, exploit DOE JGI’s cutting-edge capabilities in nucleic acid sequencing and analysis and build our portfolio in key focus areas including sustainable bioenergy production, plant microbiomes and terrestrial biogeochemistry,” said Susannah Tringe, DOE JGI User Programs Deputy.
The CSP 2016 projects were selected from 74 full proposals received, resulting from 98 letters of intent submitted. The total allocation for the CSP 2016 portfolio is estimated to tap nearly 40 trillion bases (terabases or Tb) of the DOE JGI’s plant, fungal and microbial genome sequencing capacity. The full list of projects may be found at http://jgi.doe.gov/our-projects/csp-plans/fy-2016-csp-plans/.
One reference genome, many applications
Several projects highlight how a single reference genome can be applied to advance previously supported studies, while others focus on plant-microbial interactions. Two, in particular, leverage recent DOE Office of Biological and Environmental Research (BER) Sustainable Bioenergy awards.
Daniel Schachtman from the University of Nebraska, Lincoln proposed a project focusing on a systems analysis of Sorghum bicolor, a potential bioenergy feedstock sequenced by the DOE JGI and published in the journal Nature in 2009. The project seeks to understand how genotype—its underlying genetic makeup—microbiome composition, and the environment influence sorghum’s phenotype—the plant’s observable traits. This work is also supported by a Sustainable Bioenergy grant to Schachtman as well as colleagues at the Donald Danforth Plant Science Center and the University of North Carolina.
Another project aimed at improving bioenergy crop yields comes from Tom Juenger at University of Texas at Austin. By sequencing several hundred switchgrass genotypes, the team hopes to identify genetic variations that contribute to high yields and high quality plant biomass that can be used for biofuel production. Juenger’s project dovetails with his Sustainable Bioenergy Crop Development grant through BER. For this funding opportunity, BER solicited applications for systems-biology driven basic research focused on understanding the roles of microbes and microbial communities in contributing to the health of bioenergy crop feedstocks and their associated ecosystems.
There are four projects utilizing the Chlamydomonas reinhardtii genome resource generated by the DOE JGI in 2007, for example. One project from University of California, Berkeley’s Kris Niyogi involves resequencing algal mutants to identify genes related to photosynthesis. Another comes from Sabeeha Merchant at the University of California, Los Angeles investigating algae that colonize snow in the Arctic as potential feedstocks in algal farms for biofuel.
The CSP project led by Clark University’s David Hibbett focuses on an in-depth genomic survey of the Lentinula genus. Lentinula is a group of white-rot, wood-decaying fungi perhaps best known as the genus of shiitake mushrooms, Lentinula edodes. (Image by dominik18s via Flickr CC BY 2.0)
From Jesse Poland of Kansas State University is a proposal to sequence intermediate wheatgrass (Thinopyrum intermedium), a perennial distantly related to wheat and with a biomass yield equivalent to switchgrass. By producing a whole-genome assembly of intermediate wheatgrass, and then conducting comparative analyses with the DOE JGI Flagship and grass model species Brachypodium distachyon, and with wheat, the team hopes to develop genomic resources that can be applied toward methods for improving the productivity of candidate bioenergy feedstock grasses.
In addition to the Juenger project noted above, a project from J. Chris Pires at the University of Missouri focuses on the symbiotic relationship between orchids and fungi. Orchids are found around the world and their seeds rely on carbon solely provided by mycorrhizal fungi to germinate and develop into seedlings. Studying these relationships may provide researchers with insights into the evolution of plant-fungal interactions for DOE-relevant biomass feedstocks.
A proposal from Matteo Lorito at the University of Naples in Italy focuses on a similar symbiotic association between soil fungi and feedstock crops. His project specifically targets secondary metabolites, compounds that help the organism thrive and communicate, produced by Trichoderma fungal species interacting with the grass B. distachyon.
Other projects highlight the importance of microbial interactions within an ecosystem. One such project comes from Christopher Francis of Stanford University, who is studying the role of nitrogen-cycling microbial communities at uranium-contaminated groundwater sites within the upper Colorado River Basin. The goal is to determine the role that nitrification may play in the release of uranium into the aquifer.
Christopher Francis of Stanford University is interested in the floodplains in the upper Colorado River Basin, which are generally nutrient-poor but abundant in iron sulfide minerals, leading to the descriptor “naturally reduced zones” (NRZs). There are concerns that NRZs are slow-release sources of uranium to the aquifer that could persist for hundreds of years. (Photo by Roy Kaltschmidt, Berkeley Lab)
Two more plant microbiome projects focus on fungal interactions involving potential sustainable bioenergy feedstocks such as poplar and eucalyptus. One from Richard Hamelin at the University of British Columbia in Canada aims to develop a database of pathogens that could harm pine and poplar trees and thus prevent outbreaks through early detection, while the other from Ian Anderson at the University of Western Sydney in Australia looks at functional gene expression from the mutualistic Pisolithus genus, several species of which have symbiotic relationships with pine and eucalyptus.
Focusing on fungi
Several other projects have a fungal component, highlighting the breadth of this particular branch on the Tree of Life. Three of the selected projects extend the 1000 Fungal Genome Project, which aims to have at least two reference genomes from the more than 500 recognized families of fungi. Still other projects focus on harnessing fungal enzymes for bioenergy applications. One of the latter comes from Veronika Dollhofer at the Bavarian State Research Center for Agriculture in Germany. She proposed the study of anaerobic fungi from ruminant guts to better understand how they break down ingested plant matter. The enzymes in anaerobic fungi allow them to both degrade plant mass and convert it into sugars, a combination that could be useful in production-scale biogas plants.
Reference genome comparison finds exome variant discrepancies in 206 genes
In a new study published in the American Journal of Human Genetics, BCM-HGSC researchers identify genetic variant discrepancies between the GRCh38 (hg38) human reference genome and the older GRCh37 (hg19).
Genomic assessment of cancer-predisposition landscape of pediatric rhabdomyosarcoma
Researchers at Baylor College of Medicine led the largest genomic assessment of children with RMS to determine the prevalence of genetic changes that result in cancer predisposition.
Genome sequencing identifies cancer’s Achilles heel in exceptional responders
Researchers at Baylor College of Medicine led a six-year study with the National Cancer Institute to analyze the tumor genome and microenvironment of exceptional responders to determine if survival could be explained by genetic mechanisms.
Sequencing African genomes illuminates health and migration history
The Human Genome Sequencing Center worked with the H3Africa consortium and local African governments to acquire consented samples from countries across the continent and generate high-coverage whole genome sequence data.
Baylor genomics teams partner to provide COVID-19 testing for Houston area
Baylor's Human Genome Sequencing Center and the Alkek Center for Metagenomics and Microbiome Research are partnering with local public health departments to provide PCR testing for tens of thousands of COVID-19 samples.
Human Genome Sequencing Center acquires new sequencer
The Baylor College of Medicine Human Genome Sequencing Center recently was awarded an NIH grant for acquisition of the Pacific Biosciences Sequel II DNA sequencing instrument.
Human Genome Sequencing Center Clinical Lab
The HGSC Clinical Laboratory (HGSC-CL) is the CAP/CLIA certified molecular diagnostic laboratory operating within the Human Genome Sequencing Center at Baylor College of Medicine.
With a commitment to improving health care through genomic testing, HGSC-CL offers clinical testing services in support of large-scale clinical sequencing efforts.
Texas Medical Center Genomic Center for Infectious Diseases
The Texas Medical Center Genomic Center for Infectious Diseases (TMC GCID) is the collaborative effort of a multidisciplinary, integrated team of basic and physician scientists at three institutions — Baylor College of Medicine, The University of Texas Health Science Center at Houston (UTHealth) School of Public Health, and The University of Texas MD Anderson Cancer Center.
The TMC GCID harnesses decades of experience in genomic sequencing, renowned clinical expertise and the use of novel ex vivo models of human intestinal and pulmonary function to create a platform for large scale genomics-based interrogation of host-mucosal pathogen interactions in the context of human tissues.
Like much of the research community at this urgent time, the Baylor College of Medicine Human Genome Sequencing Center (BCM-HGSC) has turned its focus primarily toward COVID-19 testing and research. With the safety and health of the community and our personnel in mind, the BCM-HGSC is operating with staff working remotely as much as possible while a rotating lab team continues critical research on site.
Towards population-scale long-read sequencing
Advances in long-read sequencing technologies and bioinformatics have enabled the first population-scale studies with long-read sequencing in recent years. In a new review published in Nature Reviews Genetics, Dr. Fritz Sedlazeck and colleagues discuss these recent developments and highlight project strategies for experimental design with Pacific Biosciences and Oxford Nanopore Technologies systems.
Harmonizing Clinical Sequencing and Interpretation for the eMERGE III Network
A new paper available in preprint on bioRxiv describes the methods developed to harmonize genetic testing protocols linking multiple sites and investigators for the eMERGE III Network. The results mark a critical achievement toward global standardization of genetic testing by establishing protocols for different sites to harmonize the technical and interpretive aspects of sequencing tests. The integration of structured genomic results into multiple electronic health record systems by the eMERGE Network also sets the stage for clinical decision support to enable genomic medicine.
Uniquely comprehensive Pan-Cancer Atlas provides essential resource
A collection of 27 papers from The Cancer Genome Atlas (TCGA) consortium has been published reporting on the integrated project to analyze all 33 cancer types and to classify mutations and specific pathways. Many of the papers feature significant contributions from Baylor College of Medicine and its Human Genome Sequencing Center researchers. The findings from the 11,000 patient cohort data appear in Cell publications.
Hybrid computational strategy for scalable whole genome data analysis
In a study published in BMC Bioinformatics, researchers from Baylor College of Medicine’s Human Genome Sequencing Center, along with Oak Ridge National Laboratory, DNAnexus and the Human Genetics Center at the University of Texas Health Science Center, have developed a novel hybrid computational strategy to address the growing need for scalable, cost effective and real time variant calling of whole genome sequencing data.
This new strategy has proven successful in analyzing an unprecedented set of 5,000 samples, which constitute a critical part for the international consortia efforts known as The Cohorts for Heart and Aging Research in Genomic Epidemiology, or CHARGE.
Assessing structural variation in a personal genome—towards a human reference diploid genome
In a paper published in BMC Genomics, a team led by scientists from Baylor College of Medicine’s Human Genome Sequencing Center present Parliament, a structural variant (SV) calling pipeline that brings together multiple data types and SV detection methods to improve the characterization of these larger variants.
A region on human chromosome 5 (5q31.1-qter) contains several genes that encode important blood pressure regulators and thus is a good candidate for analysis of linkage and association with hypertension. We recruited 638 individuals from 212 Polish pedigrees with clustering of essential hypertension. These subjects were genotyped for 11 microsatellite markers that span this region to test for linkage to essential hypertension and systolic and diastolic blood pressures. The segment of this region of ≈7 cM delineated by D5S1480 and D5S500 markers was linked to blood pressures in multipoint analysis. In 2-point analysis, D5S1480—the marker in close proximity to β2-adrenergic receptor gene—reached the maximal linkage to essential hypertension and adjusted systolic and diastolic blood pressures, implicating this gene as a positional candidate for further association studies. Arg16Gly, Gln27Glu, and Thr164Ile—3 functional single nucleotide polymorphisms within the β2-adrenergic receptor gene—were tested for association with essential hypertension. None of these polymorphisms showed a significant association with essential hypertension, separately or in the haplotype analysis. This study provided evidence of linkage of 5q31.1-5qter region to essential hypertension in the European population. Moreover, it implicated the chromosomal segment in close proximity to D5S1480 and D5S500. The detailed analysis of 3 single nucleotide polymorphisms does not support the role of the β2-adrenergic receptor gene as a major causative gene for the detected linkage.
Essential hypertension is a multifactorial complex trait with a strong hereditary component. Apart from genome-wide scans and candidate gene approach (principal methods used in pursuit of genetic loci that may determine predisposition to essential hypertension), 1 a target chromosomal region approach combining the rationale of 2 major strategies has been postulated. 2 Selection of a small chromosomal region implicated by genome-wide searches and containing several candidate genes pathophysiologically related to the investigated phenotypes allows for denser saturation with microsatellite markers and may be followed by subsequent positional analysis. The distal segment of the long arm of chromosome 5 (5q31.1-qter) is an outstanding target chromosomal region for studies on essential hypertension, having been linked to both systolic 3 and postexercise diastolic blood pressure 4 in genome-wide scans performed in white populations. Furthermore, this region contains a cluster of genes coding for proteins known as important blood pressure regulators (β2-adrenergic receptor gene [ADRB2], α1B-adrenergic receptor, dopamine D1 receptor, annexin VI) and implicated as possible contributors to the pathogenesis of several cardiovascular disorders (platelet-derived growth factor receptor, glutathione peroxidase).
We performed a linkage analysis of this region using 3 related phenotypes: a diagnosis of essential hypertension (a qualitative trait) and 2 quantitative phenotypes—systolic and diastolic blood pressure. We searched for a linkage indicating positional locus for further association analyses. One of the loci in the implicated portion of the target region, ADRB2, was subsequently analyzed in association studies.
The participants in this project (Silesian Hypertension Study) were recruited between 1999 and 2000 in Silesia, a region in the south of Poland with a high prevalence of cardiovascular morbidity and mortality. The study was designed to investigate genetic predisposition to several cardiovascular phenotypes and was based on collecting probands with diagnosed essential hypertension along with their available parents and/or siblings. The project was approved by the local bioethical committee, and informed consent was obtained from each participant. We recruited 638 white individuals from 212 families with clustering of essential hypertension. Complete phenotypic information was obtained from 635 subjects representing 210 families. Six other individuals from 3 families were excluded because of inconsistencies in Mendelian segregation.
Phenotyping included clinical history obtained by standardized questionnaires, physical examination, and laboratory tests according to the recommendations of the World Health Organization. 5 Hypertension was defined as systolic and/or diastolic blood pressure >140/90 mm Hg on 3 separate occasions and/or remaining on antihypertensive treatment. 5 Subjects with secondary forms of hypertension were excluded from the study. Height and weight measurements were taken in standard conditions to calculate body mass index. Blood pressure was taken in a sitting position using mercury sphygmomanometer with a cuff size individually adjusted to the arm after 20 minutes of rest. Systolic blood pressure was taken at the return of arterial sounds (Korotkoff phase I), and disappearance of sounds (Korotkoff phase V) indicated diastolic blood pressure. The average of 3 consecutive recordings of both systolic and diastolic blood pressure was used to obtain the representative values. Blood pressure values from subjects remaining on antihypertensive therapy were nonparametrically adjusted for treatment effect according to the algorithm used previously in analyses of Framingham data. 6 In brief, the adjustments were based on a nonparametric method in which the blood pressures of an individual receiving antihypertensive treatment were shifted upward by adding to them the mean of the residues of the age-regressed blood pressures of those with higher blood pressures and the individual itself. Both genders were analyzed separately. Observations from individuals not taking antihypertensive therapy remained unchanged.
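The phenotype definitions in this paragraph can be sketched in a few lines. The example below averages three consecutive readings into a representative value and applies the stated hypertension definition (above 140/90 mm Hg on 3 separate occasions and/or on antihypertensive treatment). The function names and the sample readings are hypothetical, and the nonparametric treatment adjustment described above is deliberately not implemented here.

```python
from statistics import mean


def representative_bp(readings):
    """Average consecutive (systolic, diastolic) readings in mm Hg."""
    systolic = mean(s for s, _ in readings)
    diastolic = mean(d for _, d in readings)
    return systolic, diastolic


def is_hypertensive(occasions, on_treatment):
    """Apply the definition quoted in the text.

    occasions: one (systolic, diastolic) pair per separate occasion.
    A subject qualifies if treated, or if systolic > 140 and/or
    diastolic > 90 on at least 3 separate occasions.
    """
    if on_treatment:
        return True
    elevated = [s > 140 or d > 90 for s, d in occasions]
    return len(occasions) >= 3 and all(elevated)


# Hypothetical readings (illustration only).
sbp, dbp = representative_bp([(150, 95), (146, 92), (148, 90)])
print(sbp, round(dbp, 1))
print(is_hypertensive([(150, 85), (142, 88), (145, 91)], on_treatment=False))
```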
Identification and Localization of Genetic Loci Within the Candidate Chromosomal Region
Eight microsatellite markers (D5S1480, D5S636, D5S820, D5S2093, D5S1471, D5S1456, D5S462, D5S211) spanning the 35-cM region on the distal portion of the long arm of chromosome 5 (5q31.1-qter) were initially selected for molecular analysis. A set of 3 additional markers (D5S500, D5S642, D5S494) located proximally from D5S1480 and covering the distance of ≈20 cM was chosen at a later stage to define the linkage region more accurately (Figure).
Figure. Multipoint linkage analysis of systolic blood pressure to 5q31.1-qter chromosomal region. D5S494 through D5S211 represent microsatellite markers within the examined region. Distances are shown in centiMorgans.
Location and distances between the markers were obtained from public databases, including the Center for Medical Genetics at Marshfield Medical Research Foundation (http://research.marshfieldclinic.org/genetics) and the Genetic Location Database (http://cedar.genetics.soton.ac.uk/pub/chrom5/map.htm). Candidate genes were localized within the region by use of integrated information from the Unified Database for Human Genome Mapping at the Weizmann Institute of Science (http://bioinformatics.weizmann.ac.il) and the database at University of California, Santa Cruz, Human Genome Project Working Draft (http://genome.ucsc.edu). Single nucleotide polymorphisms within the ADRB2 gene were identified using the Human Genome Variation Database (http://hgvbase.cgb.ki.se) and the Single Nucleotide Polymorphism (SNP) Database of the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/SNP/index.html) and were then prioritized based on their ability to affect the function of the receptor.
Genomic DNA was extracted from whole blood samples by use of the MasterPure™ DNA purification kit (Epicentre Technologies). Each microsatellite marker was amplified by polymerase chain reaction (PCR) using Tetrad DNA Engine (MJ Research). The sequences of the primers were obtained from Genome Database Bank (http://www.gdb.org) and sequenced (Applied Biosystems). The total volume of PCR master mixture was 20 μL and included 25 ng of genomic DNA, 200 μmol/L of each dNTP (Promega), 10 pmol of forward and reverse primer, 0.2 U of Taq DNA polymerase (HotStarTaq, Qiagen), 10× PCR buffer (with 1.5 mmol/L of MgCl2, Qiagen), and polyoxyethylene ether (W-1 solution, Life Technologies). The 5′-end of the forward primers was labeled with 6-carboxyfluorescein (FAM) or its fluorescent analogs (HEX, NED). DNA polymerase activation at 95°C for 1 minute was followed by 34 cycles of 94°C (1 minute), annealing (1 minute), and 72°C (1 minute), and a final extension at 72°C for 10 minutes. PCR products were pooled and resolved on 5% polyacrylamide gel by use of an ABI 377 Sequencer (Applied Biosystems). Genotyping was performed by use of Genescan and Genotyper software (Applied Biosystems), independently by 2 individuals who were unaware of phenotypic data.
PCR amplification of the DNA fragment containing Arg16Gly polymorphism of the ADRB2 gene was performed in a volume of 20 μL, including 25 ng of genomic DNA, 200 μmol/L of each dNTP (Promega), 10 pmol of forward and reverse primer, 0.2 U of Taq DNA polymerase (HotStarTaq, Qiagen), and 10× PCR buffer (with 1.5 mmol/L of MgCl2, Qiagen). DNA polymerase activation in 95°C for 15 minutes was followed by 34 cycles of denaturation (95°C, 1 minute) annealing (52°C, 1 minute), and extension (72°C, 1 minute), with a final extension (72°C, 10 minutes). The PCR product was digested with 2 U of BsrDI restriction enzyme (New England Biolabs) in 60°C for 6 hours and resolved on 3% 1000-agarose (Life Technologies) gel containing ethidium bromide, with subsequent visualization with a Fluoro-S Multi-imager (Biorad). The primers and the sizes of digestion products were the same as described previously. 7
The amplification conditions for DNA fragment containing Gln27Glu polymorphism of the ADRB2 gene were similar to the Arg16Gly polymorphism, except for the primers sequence and annealing temperature (63°C). The PCR product was digested with 1.5 U of ItaI restriction enzyme (Roche) in 37°C for 20 hours, resolved on 2.5% 1000-agarose (Life Technologies) gel containing ethidium bromide, and visualized by means of a Fluoro-S Multi-imager. The sequence of the primers and the sizes of digestion products were the same as described previously. 7
The PCR conditions for the segment containing the Thr164Ile SNP of the ADRB2 gene were similar to those of the Arg16Gly polymorphism except for the sequence of primers and annealing temperature (55°C). The PCR product was digested with 2 U of MnlI restriction enzyme (New England Biolabs) at 37°C for 5 hours, resolved on 2% ultra pure-agarose (Life Technologies) gel containing ethidium bromide, and visualized by means of a Fluoro-S Multi-imager. The sequence of primers and the sizes of digestion products were the same as described previously. 7 Direct sequencing of the DNA fragment, including 3 functional SNPs within the ADRB2 gene, was performed in 15 randomly selected unrelated individuals to confirm the results of restriction fragment-length polymorphisms.
Verification of genotypes for inconsistencies in Mendelian segregation was performed by means of PEDCHECK program. 8 Multiple methods were used for linkage and association analysis of both qualitative and quantitative traits because this provides greater reliability of the final result. Haseman-Elston regression analysis, based on regressing the siblings’ squared phenotype difference on their genetic similarity (defined as alleles shared identical by descent [IBD]) was used to test for 2-point linkage in case of microsatellite markers and the investigated phenotypes. 9 Another IBD sib-pair test, SPLINK (Unix version 1.08), was applied to investigate for linkage of microsatellite markers to essential hypertension. Confirmatory 2-point linkage analysis based on estimation of alleles identical by state (IBS) at a microsatellite marker compared with the random distribution of alleles was performed for hypertension as a binary trait, by means of IBS χ² test. 10 The Haseman-Elston and IBS χ² 2-point linkage tests were completed with SIB-PAIR program. 11 For further confirmation of the results obtained in 2-point linkage analysis, multipoint nonparametric Z-score rank test was performed with quantitative phenotypes using MAPMAKER/SIBS.
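The idea behind Haseman-Elston regression can be shown with a minimal sketch: for each sib pair, the squared trait difference is regressed on the proportion of alleles shared IBD at the marker, and a negative slope is taken as evidence of linkage. This is an ordinary least-squares illustration, not the SIB-PAIR implementation used in the study; the sign convention for the t statistic (positive when the slope is negative) is an assumption, and the sib-pair data are hypothetical.

```python
import math


def haseman_elston(sib_pairs):
    """Regress squared sib-pair trait differences on IBD sharing.

    sib_pairs: list of (trait_sib1, trait_sib2, ibd_proportion).
    Returns (slope, t), where t = -slope / SE(slope), so that a
    positive t points towards linkage (assumed sign convention).
    """
    x = [ibd for _, _, ibd in sib_pairs]
    y = [(a - b) ** 2 for a, b, _ in sib_pairs]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with n - 2 degrees of freedom.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return slope, -slope / se_slope


# Hypothetical systolic pressures (mm Hg) and IBD sharing at one marker.
pairs = [
    (120, 150, 0.0), (130, 148, 0.0),
    (125, 140, 0.5), (135, 145, 0.5),
    (128, 132, 1.0), (140, 146, 1.0),
]
slope, t_stat = haseman_elston(pairs)
print(slope, t_stat)
```

In this toy data set, pairs sharing more alleles IBD have more similar pressures, so the slope is negative and the t statistic is positive.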
The subsequent strategy, testing for association of essential hypertension with the ADRB2 gene as a positional candidate, was performed using family-based association tests.
The transmission disequilibrium test (TDT) assessing the number of transmitted versus non-transmitted alleles from heterozygous parents to affected (hypertensive) probands (compared with expected 50%/50% transmission/nontransmission ratio) was used to test for association of essential hypertension with the SNPs of the ADRB2. 12 The results of the TDTs were then verified by means of the empirical variance-family based association test (EV-FBAT) method, determining the value of an association test under the null hypothesis of linkage but no association by use of an empirical variance-covariance estimator. 13 Unlike other family-based tests, this method is not affected by pedigree configurations and can be used in case of binary, quantitative, or time-to-onset traits, as well as multi- and biallelic markers. 13
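The core of the TDT can be written as a McNemar-type statistic: with b transmissions and c non-transmissions of an allele from heterozygous parents, (b − c)² / (b + c) is compared with a χ² distribution with 1 degree of freedom. The sketch below is a minimal illustration of that statistic only, not the software used in the study, and the counts are hypothetical.

```python
import math


def tdt(transmitted, not_transmitted):
    """Transmission disequilibrium test for one allele.

    Returns (chi2, p), with p taken from a chi-square distribution
    with 1 degree of freedom.
    """
    b, c = transmitted, not_transmitted
    chi2 = (b - c) ** 2 / (b + c)
    # For 1 df, P(chi-square > x) equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts from heterozygous parents (illustration only).
chi2, p_value = tdt(60, 40)
print(chi2, round(p_value, 4))
```

Under the null hypothesis the two counts are expected to be equal (the 50%/50% ratio mentioned above), in which case the statistic is 0 and the P-value is 1.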
Clayton’s modified TDT was performed in case of haplotype combinations, using the program TRANSMIT. 14 This test calculates a score vector that is equalized over all possible combinations of parental haplotypes and transmissions, in concordance with the observed data, and deals with the problem of partially unknown parental genotypes and haplotype phase uncertainty. 14
Binary logistic regression analysis was performed in the parental generation to test for association between essential hypertension and each genetic variant of the ADRB2 in the presence of other covariates, including age, gender, and body mass index.
There were 629 individuals (age, 45.8±15.7 years) from 207 families, with 313 (49%) men and 316 (51%) women included in the final analysis. Of these, 401 (63.7%) subjects were hypertensive, and 270 (67.3%) of the hypertensive subjects remained on treatment. The demographic and clinical data of all individuals divided into probands, parents, and siblings are shown in Table 1.
Table 1. Demographic and Clinical Characteristics of the Individuals in Silesian Hypertension Study
Linkage Studies on 5q31.1-qter
All the microsatellite markers were highly informative, with the number of alleles from 8 (D5S820, D5S462, D5S211) to 15 (D5S494), and heterozygosity from 60% (D5S462) to 82% (D5S500).
In the first set of 8 markers, there was a significant linkage of the D5S1480 microsatellite marker to essential hypertension in 2-point linkage analysis (Table 2). The other markers did not show statistically significant linkage to this phenotype. Two-point linkage analysis of systolic and diastolic blood pressures (adjusted nonparametrically for treatment effect) showed statistical significance of the same marker (Table 2).
Table 2. Results of 2-Point Linkage of Microsatellite Markers Spanning 5q31.1-5qter Region to Essential Hypertension and Systolic and Diastolic Blood Pressures
In view of the marginal location of the significant marker assigning the linkage within the candidate region, we also performed similar 2-point linkage studies for a set of 3 markers located proximally from D5S1480. Two of these markers (D5S500 and D5S642) were significantly linked to essential hypertension, reaching linkage values of t=2.45 (P=0.008) and t=1.96 (P=0.03), respectively. Consistently, Haseman-Elston regression analysis of adjusted systolic and diastolic blood pressure revealed that the marker most proximal to D5S1480 (D5S500) reached a borderline significance level in 2-point linkage analysis with diastolic blood pressure (P=0.06) and systolic blood pressure (P=0.09).
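The idea behind Haseman-Elston regression can be sketched as follows, assuming the original formulation in which the squared trait difference of each sib pair is regressed on the proportion of alleles the pair shares identical by descent (IBD) at the marker. A significantly negative slope suggests linkage. The exact variant and software used in the study are not stated in this section, and the data below are hypothetical.

```python
from math import sqrt


def haseman_elston(ibd_share: list[float], squared_diff: list[float]) -> tuple[float, float]:
    """Least-squares slope and a one-sided t statistic for linkage.

    ibd_share: estimated proportion of alleles shared IBD per sib pair.
    squared_diff: squared trait difference per sib pair.
    The t statistic is sign-flipped so that larger positive values
    correspond to a more negative slope, i.e. stronger evidence of linkage.
    """
    n = len(ibd_share)
    if n != len(squared_diff) or n < 3:
        raise ValueError("need at least 3 sib pairs with matching inputs")
    mean_x = sum(ibd_share) / n
    mean_y = sum(squared_diff) / n
    sxx = sum((x - mean_x) ** 2 for x in ibd_share)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ibd_share, squared_diff))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(ibd_share, squared_diff))
    std_err = sqrt(rss / (n - 2) / sxx)
    return slope, -slope / std_err


# Hypothetical sib pairs: more IBD sharing goes with smaller trait differences.
slope, t = haseman_elston([0, 0, 0.5, 0.5, 1, 1], [110, 90, 60, 40, 20, 10])
print(f"slope = {slope:.1f}, t = {t:.2f}")
```

The one-sided test reflects the expectation that sibs sharing more alleles at a linked marker should be more alike in blood pressure.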
A joint multipoint analysis of all 11 microsatellite markers indicated that a region of ≈7 cM in close proximity to D5S1480 and D5S500 is linked to systolic blood pressure (Figure). The same tendency was evident in multipoint linkage analysis of adjusted diastolic blood pressure, with the maximal Z-score of 1.8 at the position of the D5S1480 marker.
Association Studies of the Positional Candidate, ADRB2 Gene
Arg16Gly, Gln27Glu, and Thr164Ile polymorphisms were not associated with essential hypertension in the TDT (Table 3). This lack of association was confirmed by the EV-FBAT (Arg16Gly, P=0.67; Gln27Glu, P=0.55). Testing for association of essential hypertension with the Thr164Ile polymorphism using EV-FBAT could not be performed because of the rarity of the Ile allele (only 17 individuals in the study were carriers of this allele).
Table 3. Arg16Gly, Gln27Glu, and Thr164Ile Polymorphisms in the TDT: Transmissions of Alleles From Heterozygous Parents to Offspring With Essential Hypertension
Among 7 observed haplotypes (denoted A through G), B, D, and F represented the most common variants, comprising 97.4% of the total haplotypes (Table 4). The number of transmitted haplotypes from parents to hypertensive offspring was not significantly different from the expected number of transmissions (Table 4).
Table 4. Haplotype TDT Test for 3 Functional SNPs of the ADRB2
In the binary regression model, including age, gender, and body mass index as potential confounders, none of the ADRB2 polymorphisms were associated with hypertension. The odds ratios for hypertension in subjects homozygous for the wild-type variant, compared with heterozygous individuals and with individuals homozygous for the mutant allele, were 0.85 (95% CI, 0.3 to 2.2; P=0.74) and 1.46 (95% CI, 0.5 to 4.2; P=0.49) for Arg16Gly, 1.33 (95% CI, 0.6 to 3.1; P=0.5) and 1.54 (95% CI, 0.5 to 4.4; P=0.43) for Gln27Glu, and 0.7 (95% CI, 0.1 to 4.9; P=0.72) for Thr164Ile, respectively.
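For readers unfamiliar with how such intervals arise, the sketch below shows the usual conversion of a logistic regression coefficient and its standard error into an odds ratio with a Wald 95% confidence interval. The coefficient and standard error are hypothetical; the study's fitted values are not given in the text.

```python
from math import exp, log


def odds_ratio_ci(beta: float, std_err: float, z: float = 1.96) -> tuple[float, float, float]:
    """Odds ratio and Wald confidence limits from a logistic coefficient."""
    return exp(beta), exp(beta - z * std_err), exp(beta + z * std_err)


# Hypothetical coefficient and standard error, for illustration only.
or_, lower, upper = odds_ratio_ci(beta=log(0.85), std_err=0.5)
print(f"OR = {or_:.2f} (95% CI, {lower:.2f} to {upper:.2f})")
```

An interval that includes 1, as all of the reported intervals do, indicates no significant association at the 5% level.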
In the present study, the maximal linkage was detected both in 2-point and multipoint analysis at the same chromosomal position corresponding to the D5S1480 microsatellite marker. In contrast, the linkage analysis of systolic blood pressure performed by Krushkal et al 2 on the same chromosomal region implicated different markers located proximally to the telomere. This discrepancy is not surprising and may reflect several differences in ethnic (European versus American origin), demographic (age), and clinical (normotension versus hypertension) profile of the subjects between these studies.
To avoid a potential bias that may arise from a linkage analysis of a dichotomous trait based on arbitrary categorization (hypertension), we performed additional studies of systolic and diastolic blood pressures, detecting consistent linkage in the proximal segment of 5q31.1-qter chromosomal region for both qualitative and quantitative traits.
A qualitative-quantitative joint analysis has been postulated to increase the evidence for linkage, especially in case of multifactorial diseases, 15 and it has been widely implemented in studies aiming to dissect genetic predisposition to atopic complex disorders. 16,17
The consistent linkage signal obtained in 2-point and multipoint analysis narrowed down the search for candidate genes to the chromosomal segment assigned by D5S1480 and D5S500. Among several candidates for further positional analyses, we selected the ADRB2, the gene located in close proximity to the marker of the highest linkage. Priority was given to this candidate also in light of the well-documented role of the ADRB2 in blood pressure regulation and its essential contribution to the development of several cardiovascular and metabolic phenotypes related to hypertension. 18
To test for the relationships between essential hypertension and the ADRB2, we performed association studies of 3 functional SNPs within the coding region of this gene. Arg16Gly, Gln27Glu, and Thr164Ile were selected for further analysis in light of the data regarding their influence on agonist-mediated receptor downregulation and affinity in vitro, 18 as well as vascular desensitization in vivo. 19 The lack of association of the ADRB2 polymorphisms with essential hypertension was evident in the TDT and verified by the less conservative EV-FBAT. Furthermore, these results were confirmed by the haplotype analysis. For the 2 common SNPs, our study had 86% power to detect association (P<0.05), with an odds ratio of 1.6.
The relationships between the polymorphisms of the ADRB2 gene and cardiovascular phenotypes have been assessed in European, American, Japanese, and African Caribbean populations, 20–27 and the apparent lack of consistency in the results among these studies may be, at least partially, attributable to ethnic differences. Our results, analyzed in the context of other European studies, are in agreement with the data obtained from English, 20 French, 21 and nondiabetic Swedish 22 subjects. In contrast, association of the polymorphisms within the ADRB2 gene with blood pressure has been reported in Finnish, 22 German, 24 and Austrian 26 normotensive populations. It should be noted that all investigations on European normotensive subjects suggest the association of the ADRB2 with blood pressure, whereas the European studies involving hypertensive individuals are negative. One of the possible explanations for this discrepancy could be a pleiotropic effect exerted by the ADRB2. Its contribution to several cardiovascular and metabolic phenotypes (insulin resistance, obesity, heart failure) 18 of significantly different prevalence among hypertensive and normotensive individuals is well documented and cannot be excluded as a factor confounding the relationship of the ADRB2 and blood pressure. Whether the suggested different role of the ADRB2 in normotensive and hypertensive populations may be caused by its synergistic influence on multiple cardiovascular/metabolic phenotypes (acting as potential confounders) or represents a distinct genetic background of high and low blood pressure remains to be elucidated.
The question that remains to be answered is which locus within our candidate region may be responsible for the observed linkage. Several genes coding for proteins involved in blood pressure regulation and hypertensive complications located in close proximity to D5S1480 and D5S500—such as annexin VI, glutathione peroxidase 3 gene, platelet-derived growth factor receptor-β gene, fibroblast growth factor-1, and glucocorticoid receptor gene—seem the most obvious positional candidates for further studies.
Our study implicates a short 7-cM segment on the long arm of chromosome 5 as harboring a gene or genes for human essential hypertension. Furthermore, detailed haplotype analysis of 3 functional SNPs excluded ADRB2 as a causative gene. Further studies will focus on the remaining positional candidate genes, thus bringing closer the dissection of complex cardiovascular traits.