Why doesn't Sanger (fluorescent) DNA sequencing double count nucleotides?

My understanding is that PCR is carried out until a fluorescent nucleotide halts replication. The segments of DNA are fed through the capillary tube based on size, sifting through the segments from smallest to largest; subsequently, the sequence is read from one end to the other.

I'm confused as to how we know that each nucleotide in the sequence has been accounted for, or if a nucleotide has been counted twice. If two fragments of the same size, ending with the same ddNTP, pass through the capillary tube at slightly different times, how do we know if that specific nucleotide hasn't been double-counted?

In chain-termination sequencing, a population of molecules is detected as opposed to a single one. The readout looks like this:

[Image: example chromatogram readout]

The peaks represent intensity of the four different fluorophores at the detector. Every fragment that is terminated at the same spot will have the same fluorophore and the same length. Because capillary gel electrophoresis separates molecules by size, these fragments will pass through at approximately the same time. "Approximately the same time" is why you see broad-ish and overlapping peaks on the readout as opposed to sharp and distinct lines.

Ideally, the readout will appear like the left half of the image, where there's minimal overlap between peaks. However, often you will see overlapping peaks, as in the right half of the image. The link above explains many reasons for why this occurs. Whether or not a peak is correctly assigned is assessed with Phred scores, which come from a statistical analysis of the shape and resolution of each peak.
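As a small illustration of what a Phred score expresses: by the standard convention, a quality score Q corresponds to an estimated base-calling error probability P through Q = -10 log10(P). The function names below are made up for this sketch; how a base caller arrives at P from the peak shapes is not shown.

```python
import math

def phred_to_error_prob(q):
    """Convert a Phred quality score to the estimated probability the base call is wrong."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Convert an estimated error probability to a Phred quality score."""
    return -10 * math.log10(p)

# Q20 means roughly a 1-in-100 chance of a wrong call, Q30 roughly 1-in-1000.
for q in (10, 20, 30):
    print(q, phred_to_error_prob(q))
```

A poorly resolved, overlapping peak gets a low Q (high P), which is how a doubtful call is flagged rather than silently counted.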

Sequencing Genome


It can't go without at least some acknowledgement dimension.

If you should ever find yourself in life in a situation where you have or are about to give up all hope, you think things are utterly impossible and there's no way, you will remember this week that nothing is impossible.

It is possible to come back three games down in the bottom of the ninth inning, you've got to believe you can do it, and remember to have Dave Roberts pinch run.

Just a general bit of good advice. What an amazing week, just absolutely amazing week. Wow. There are lessons in life to be taken from it. Please do take them.

You know, there really are. I mean I'd given up hope by that point, I confess. I wish I could say oh, I knew they were going to pull it out, but I didn't.

And, boy, they pulled it out one game at a time.

So, all of you think good thoughts this week. This could be a historic week, you know, you were here. Anyway, onward.

We were talking last time about how to analyze your clone. The notion of cloning random pieces of DNA, identifying your clone within a library, purifying the DNA from a clone, doing some preliminary analysis by maybe cutting with a restriction enzyme, then sequencing it using these techniques that I'd described would allow you to take the clone that was, say, able to rescue the yeast that couldn't grow without arginine and figure out what its DNA sequence was. You could take the clone that you had obtained by hybridizing with the DNA sequence corresponding to the protein sequence for beta-globin and sequence it and see the beta-globin gene sequence perhaps. This is very powerful.

I want to take a brief moment, we'll come back to it in more detail in a subsequent lecture, but I really described how you would sequence one clone. I just want to make a note, because someone asked about it last time, about how you would sequence an entire genome. Someone asked about this.

Remember before we pulled out our clone, we sequenced it, we got its DNA sequence. What if I wanted to sequence the entirety of a genome? Yeah. Do a lot of this, right, basically if I got a whole genome. Well, somebody asked could I put a primer here and just sequence?

It would take a very long time. And it turns out that it wouldn't work, because the separation that you can achieve through gels, the separation between fragments of length N and N plus 1, goes like the logarithm of the ratio. So, it turns out that when N and N plus 1 get to about a thousand, you can achieve very little physical separation between them. And so, DNA sequencing runs cannot go much past a thousand bases. So, the problem with sequencing a genome by putting down a primer on an extraordinarily long piece of DNA, a hundred million bases, is you cannot separate the little fragments like that. So, what you do is you break up your genome into lots of pieces. One strategy, break it up into a library of some very big pieces. It turns out you can make pieces at random of a hundred thousand base pairs.
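Taking the statement above at face value (separation proportional to the logarithm of the length ratio, with an unspecified constant), a short worked estimate shows why resolution collapses near a thousand bases. The symbol \Delta(N) for the separation between fragments of length N and N+1 is introduced here only for illustration.

```latex
\Delta(N) \;\propto\; \ln\frac{N+1}{N}
        \;=\; \ln\!\left(1 + \frac{1}{N}\right)
        \;\approx\; \frac{1}{N}
\qquad\Longrightarrow\qquad
\frac{\Delta(1000)}{\Delta(10)}
        \;=\; \frac{\ln(1001/1000)}{\ln(11/10)}
        \;\approx\; \frac{0.0010}{0.0953}
        \;\approx\; 0.01
```

Under this assumption, neighbouring fragments near length 1000 are separated by only about one percent of the distance that separates neighbouring fragments near length 10.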

Cloning these in bacterial artificial chromosomes, as we talked about before. Take a library of bacterial artificial chromosomes and then begin sequencing them.

And take any given bacterial artificial chromosome and break it up into a whole lot of pieces that are maybe a thousand bases long, and you could sequence all of those. How do you arrange to get just a perfect overlapping set of thousand-base-pair clones that perfectly tile across the sequence with no redundancy?

You don't. That's the correct answer. That's how you do it, you don't. Instead you just randomly take a bunch of things.

And, in fact, typically you might take clones that give you six or eight-fold redundancy. You just sequence a lot of clones and then you ask the computer to reassemble it.

And, in fact, all that overlap is very good for being able to stick these pieces together. Sometimes people do such things as take pieces that might be four thousand bases long and sequence a thousand bases here and a thousand bases here by using a primer that starts there and a primer that starts there. And then you can get DNA sequences from two ends of a clone. And if you had that for zillions of clones your computer program might do an even better job of linking things up. It's one very big crossword puzzle of putting together all of these pieces, a jigsaw puzzle of putting together all these pieces. But, in effect, this is how you sequence a big piece of DNA. You chop it up into medium-sized pieces of DNA and then tiny pieces of DNA, you sequence them, and you use computational science to reassemble it.
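To make the "ask the computer to reassemble it" step concrete, here is a toy sketch of a greedy overlap assembler: repeatedly merge the two reads with the longest suffix-to-prefix overlap. This is a deliberately simple stand-in, not the method any particular genome project used; it assumes error-free reads all from the same strand, and the function names and the example sequence are invented for illustration.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Merge reads pairwise by best overlap until no overlaps remain; returns the contigs."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads wholly contained in another read; they add no new sequence.
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more: several separate contigs remain
        merged = reads[best_i] + reads[best_j][best_k:]
        rest = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        # Reads swallowed by the merged contig are no longer needed.
        reads = [r for r in rest if r not in merged] + [merged]
    return reads

# Hypothetical 25-base "genome" cut into four overlapping reads, given out of order.
genome = "ATGGCGTACGTTAGCCTAGGATCCA"
reads = [genome[12:22], genome[0:10], genome[17:25], genome[6:16]]
print(greedy_assemble(reads))
```

The redundancy mentioned above is what makes this work: without reads that overlap their neighbours, there is nothing to join on. Repeated sequence, like the near-identical regions of the human genome discussed next, is exactly what breaks a naive scheme like this one.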

Some people, for some genomes, take the whole big genome and immediately go to lots of little pieces. That can work, too. It depends on exactly how complicated your genome is.

In the human genome, there are some parts of a human genome that are almost identical that might be like 99.91% identical in two different parts of the genome. And so, if you do that, you may have trouble telling those pieces apart. So, for really complicated genomes people like sometimes breaking it up into intermediate-sized pieces. But basically the idea of sequencing a big piece of DNA by this process is referred to as shotgun sequencing. Shotgun sequencing, in fact, was developed in about 1980 by Fred Sanger, the same guy who developed the DNA sequencing technique that I told you about using polymerase and dideoxynucleotides.

Sanger very quickly wanted to go from sequencing a single piece to sequencing pieces, and so he developed the shotgun technique there. And it's now been applied in many different forms of intermediate shotguns, whole genome shotguns, et cetera. So, that's in reply to the question someone asked last time about, well, how would you do a whole genome? And, as a matter of fact, this is not theoretical because, in fact, people do whole genomes this way. And we do this at MIT.

Lots of genomes get done here in this fashion. Someone else asked how would you analyze your clone. And, again, I'll just make a brief remark on that in response to the question. So, analyzing some DNA sequence. So, suppose we got some DNA sequence, A-A-T-A, don't bother writing this down.

I'm just making up letters here. How would we make any sense of it?

Suppose I give you the ones and zeros from your hard-drive, how would you make any sense out of them? This is about as interesting as the ones and zeros from your hard-drive, right?

It's got four letters, not two, but this is actually what you get out of any project. You want to sequence beta-globin?

You'll get something like this. You want to sequence the arginine gene? You want to sequence the human genome? You get a very long string of four letters. What do you do with it?

Oh, well, you could compare it to a normal copy of the gene.

And if I did that I might find a bunch of differences.

But how would I even know where the beta-globin gene was within this sequence? This clone contains beta-globin. How would I even find the exons? Yes? Or whatever? Look at codons.

So, let's start looking at the codons. This codon here?

Well, or this, maybe it's this codon here. Sorry?

Find it. Do you see any start codons here? Oh, there's an ATG there. So, maybe that's the start codon or maybe not. How often do we expect to find an ATG in some reading frame? You know, it could happen fairly easily. Also, how do we know it's going this way?

Maybe we should look for an ATG, we'll put it there, going this way. Sorry? I drew the arrow there?

Well, that's because it's where the sequence started out on my page.

It doesn't tell me my gene runs that way. Yes? From five prime to three prime. Ah, but it's a double-stranded piece of DNA. You see, if it's five prime to three prime on this strand, the genome has a, another strand that reads the other way. What did I get? C-A-T-A, right, C-C-T, et cetera. And the gene could be encoded on this strand.

This could be the coding strand, that could be the coding strand, and looking for a mere ATG in one of three possible reading frames on one of two possible strands I'll find all sorts of stuff.
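The point about three reading frames on each of two strands can be sketched in a few lines. The helper names and the nine-base example are invented for illustration; the sketch only lists where ATG occurs in each of the six frames, which is exactly the "all sorts of stuff" a naive search turns up.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the other strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def atg_positions_by_frame(seq):
    """Map (strand, frame) to the start positions of in-frame ATG codons on that strand."""
    hits = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            hits[(strand, frame)] = [
                i for i in range(frame, len(s) - 2, 3) if s[i:i + 3] == "ATG"
            ]
    return hits

# A made-up sequence that carries an ATG on both strands.
example = "ATGAAACAT"
for key, positions in atg_positions_by_frame(example).items():
    print(key, positions)
```

Positions on the "-" strand are counted from the 5' end of the reverse complement, not from the original sequence.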

So, sorry? Guess. Guess. Guess is good.

You remember we talked about getting papers accepted. If you were to write up the paper that way, the reviewers would probably ding it and say, you know, the guess isn't good enough. So, that's actually very interesting. How do you actually find the gene sequence? Well, it turns out to be a non-trivial problem which often gets glossed over in the textbooks.

What you might do is if something really were exonic, if this were an exon, does it have any properties that you can think of? It shouldn't have a stop codon. No stop codon.

How often does a stop codon occur at random in a given reading frame?

How many stop codons are there? Three out of 64 possible codons.

There's about one in 20 codons in any given reading frame is a stop codon. So, that means if I read for about 20 codons, and I don't encounter a stop, it's beginning to get more likely that that's not random. If I read for, say, 60 codons, 180 bases, and I've encountered no stop codon in that reading frame, the chance of that occurring is about e to the minus three or so, right? Because I went through three characteristic lengths, e to the minus three, you know, I don't know, about 5% or something like that. If I went for thousands of bases without any stop codon, would you be impressed? That's pretty impressive. So, all I have to do is find a few thousand bases with no stop codon. Now, in bacteria there are some genes that are a thousand bases long and, there, you can read them and they have no stop codon. What's the problem with the human genome? Introns. It turns out that the coding sequences are broken up into small exons. If I found a thousand bases with no stop codons then it's very likely coding sequence.
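The arithmetic above can be checked directly. This sketch assumes, as the back-of-the-envelope argument does, that all 64 codons are equally likely and that successive codons are independent; real genomes are not that uniform. The function name is made up for illustration.

```python
P_STOP = 3 / 64  # three stop codons out of 64, assuming all codons equally likely

def prob_no_stop(n_codons):
    """Chance that n consecutive random codons in one reading frame contain no stop."""
    return (1 - P_STOP) ** n_codons

# 20 codons is about one characteristic length, 60 codons about three.
for n in (20, 60, 333):
    print(n, prob_no_stop(n))
```

Under these assumptions, 60 stop-free codons happen by chance a bit more than 5% of the time, while a stop-free run of about a thousand bases (333 codons) is a less than one-in-a-million event.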

But a typical human exon is on the order of 150 to 200 bases.

Very inconvenient because, you know, a typical exon encodes 50, 60, 70 codons. So, it turns out that even that is not so easy to do. Well, the answer is it's not a trivial problem. People do all sorts of things to figure out how to decode sequences of genomes. You do run computer filters across there that say, look, there are a bunch of consecutive codons without stop codons.
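A minimal sketch of the kind of filter just described: walk one reading frame codon by codon and report stretches of at least a given number of consecutive codons with no stop. The function name, the threshold, and the example sequence are assumptions for illustration; real gene finders combine many more signals than this.

```python
STOPS = {"TAA", "TAG", "TGA"}

def open_stretches(seq, frame, min_codons):
    """Return (start, end) index pairs of stop-free runs of at least min_codons codons."""
    runs = []
    run_start = frame
    i = frame
    while i + 3 <= len(seq):
        if seq[i:i + 3] in STOPS:
            if (i - run_start) // 3 >= min_codons:
                runs.append((run_start, i))
            run_start = i + 3  # the next run begins after the stop codon
        i += 3
    if (i - run_start) // 3 >= min_codons:
        runs.append((run_start, i))  # a run still open at the end of the sequence
    return runs

# Made-up sequence: ATG AAA CCC TAA GGG in frame 0.
example = "ATGAAACCCTAAGGG"
print(open_stretches(example, frame=0, min_codons=3))
```

To cover a real clone one would run this over all three frames of the sequence and of its reverse complement. With exons of only 50 to 70 codons, any threshold high enough to exclude chance runs will also miss many real exons, which is the difficulty being described.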

There tend to be little preferences: amongst the synonymous choices of codons, humans tend to prefer one codon for a specific amino acid over others. So, there are some biases as to which codons get used.

And the computer can kind of take a little bit of account of that.

Then you can also have made a library of cDNAs and sequence the cDNA, the mRNA, which will help you a lot, and look for where they match up. Then you can take sequences from the human and the mouse. And it turns out that the sequences in the mouse and the sequences in the human, if you line them up, the exons tend to match up better than the introns because evolution cares a lot about the exons. But it turns out this is not a trivial problem. And even today, if I give you a random stretch of human DNA, there is no simple computer program, not even a complicated computer program, that on its own would be able, without auxiliary data, to accurately pick out all the genes.

Even for simple bacteria, we cannot nail perfectly all the genes. Although, the lack of these introns means that the exons tend to be pretty big, it means the coding regions are pretty big, and we can kind of do it. So, I just wanted to point out that there's a lot still to be done there. The cell manages, thank you, to read this just fine, but we're not as smart yet as the cell, and so we're not totally able to read out all this stuff.

We'll come back to genomics in a, in a further lecture. Yes? Yeah, wouldn't -- What a cool idea.

Yes. There are actually some experiments, which maybe if you remind me we can, I can work it into a subsequent lecture, but people have some experiments where they can randomly mutagenize zillions of bacteria and determine which ones will grow and which ones won't. And they can do it all in parallel in a single test-tube. And thereby you can tell which, which nucleotides in the genome matter and which don't.

It's a kind of cool procedure. OK. Anyway, I just wanted to sort of tie up that bit here. Now, let's move on to re-sequencing a gene. So, let's suppose we've managed to sequence, I don't know, the human genome, the entirety of the human genome we have before us, OK? Actually, next week in the journal Nature will appear a paper reporting, in fact, today, yester, no, yesterday, in fact, it was yesterday, yesterday appeared in the journal Nature a paper reporting the finished sequence of the human genome.

http://www.nature.com/nature/journal/v431/n7011/pdf/nature03001.pdf And so, anybody who wants to go online, in fact, we can get, we'll get copies for the class. Why don't we get you copies of the paper? It's not as long as the last one. But, in fact, I didn't realize it was yesterday. I thought it was next week. Yesterday came out the final report on the finished sequence of the human genome, which a number of us have been laboring on for quite a long time.

And it just appeared. So, we actually have that now.

It actually, it's been on the Web for a while, but the paper describing it took a while to write up and it came out yesterday.

So, we'll get you a copy of that. But now you've got that whole sequence of the human genome here. I've been, you know, I've been working on this paper with people for so long that, you know, I hadn't actually paid attention to the fact that it just came out. You don't want to know how long it took to write this.

The paper, actually, is unusual. It's the only paper that Nature has ever published where the author list is sufficiently long that we don't have it in the Journal. There's a website that contains the author list. There are, I believe, I don't have the final count, but something in the neighborhood of about 200 authors to the paper. We decided that everybody who'd worked on it should be a co-author of the paper, and we just put it all on a website.

So, anyway, I digress. So, suppose we have the beta-globin gene here.

So, I've got that in -- I've got the normal form of the beta-globin gene, or I've got one person's form, in any case, in the human genome sequence. Now I want to take a patient with sickle cell anemia and I want to re-sequence their gene.

Now, remember what we said, we would, we would make a library from that person, right? So, we'd get that person's blood, we'd purify DNA, we'd cut it, we'd clone it, we'd probe the library with a radioactive probe for the beta-globin gene, we'd pull out the gene and we'd re-sequence it.

Suppose we wanted to do that to a hundred patients.

For every patient we'd get blood, we'd make DNA, we'd clone in a plasmid, we'd make a whole plasmid library, plate it out on filters, probe it with a radioactive probe, pull out the clone and sequence it.

Now, for any such library you probably need to look through a couple hundred thousand clones to find beta-globin.

So, for your DNA and your DNA and your DNA and your DNA, we're going to make libraries of a hundred thousand clones, that's a couple, that's a lot of plates, right?

We're going to put them all on nylon filters in these Seal-a-Meal bags with these radioactive probes, and we're going to look for your beta-globin clone, your beta-globin clone, your beta-globin clone, your beta-globin clone, et cetera, et cetera, et cetera. This is really boring. Do you realize how off-putting it would be to study sickle cell anemia if we had to do that for each successive patient, make a whole library?

But that was what you had to do in molecular biology because that was how you got the gene. You build a whole library, you withdraw it from the library. However, if you wanted to do this, could you manage to get the beta-globin sequence from your genome without having to make the whole library?

It turns out, and I know it's been covered at least in some of the sections, there's a cool technique to do that. And what is that technique? PCR. So, it turns out that the next really great advance in molecular biology was the technique of PCR.

And what PCR is, is a way to obtain a piece of DNA corresponding to an already known gene. You have to already know the gene, and what it allows you to do is then obtain that piece of DNA based on knowing at least some of its sequence.

It allows you to amplify just that DNA from any individual.

So, as compared to the experiment where I make a library for you and a library for you and a library for you and a library for you, each of which could take a month, PCR would allow us to do it in principle in five minutes. And, actually, there are machines that would let you do it in five minutes.

So, let's discuss how this PCR works. Nobody uses the five minute machines because you usually will then wait an hour or so, but anyway. Suppose I take my DNA sequence here from the human genome. Five prime to three prime.

Five prime to three prime. This sequence here beta-globin.

I want to obtain that sequence. The first thing I do is I'm going to heat my DNA sample to maybe 97 degrees Celsius to denature.

Denaturing means, of course, breaking the hydrogen bonds that hold the two strands together so that the strands come apart, five prime to three prime, five prime to three prime.

Now, what I then do is I take a specific DNA primer matching this stretch just before the beta-globin gene starts.

Or just before where I'm interested in. How do I make a primer that matches just that sequence? I order, well, how do I know what to order? I know the sequence, right?

I've got the sequence already. I just look at it and I say I want that sequence. And then how do I get it?

I order it. I type it into the Web and the machine will synthesize me this primer. Typically a 20-base stretch will suffice. So, I'll get me a twentymer, a 20-base oligonucleotide complementary to the sequence on this side of the gene.

What I'm also going to do is the same thing over here.

I'm going to get a second primer. This is primer number one. This is primer number two. OK? Now, let's see.
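As a rough sketch of how the two primers relate to the known sequence: if the known sequence is written 5' to 3' along one strand, primer one can simply be copied from the left end of the region (it pairs with the other strand), while primer two is the reverse complement of the right end. The function names, coordinates, and toy template below are invented for illustration; real primer design also weighs melting temperature, uniqueness in the genome, and more.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the other strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def design_primers(template, start, end, primer_len=20):
    """Pick primers flanking template[start:end]; both are returned 5' to 3'.

    template is the known sequence of one strand, 5' to 3'. The amplified
    product spans start..end and includes both primer sites.
    """
    if end - start < 2 * primer_len:
        raise ValueError("region too short to hold two non-overlapping primers")
    primer_one = template[start:start + primer_len]                    # same as the given strand
    primer_two = reverse_complement(template[end - primer_len:end])    # pairs with the given strand
    return primer_one, primer_two

# Toy template with 10-base primers so the example stays readable.
template = "AAAAA" + "ACGTACGTAC" + "GGGGG" + "TTGCAAGCTA" + "CCCCC"
print(design_primers(template, start=5, end=30, primer_len=10))
```

Each primer is extended from its 3' end toward the other primer, which is why the copies end up bounded by the two primer sites.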

Five prime. This is five prime, five prime. Now what I'd like to do is add polymerase, I'd like to add dNTPs.

So, plus DNA polymerase plus dNTPs.

And what will happen? Polymerase will come along and start copying my DNA, but it will only copy it starting from the primers. Now, this will keep going, of course, but DNA polymerase doesn't go forever, you know, the reaction sort of stops at some point.

And so you'll get a strand going off here and a strand going off there. Now, notice what I've done.

I started with an entire human genome, and the number of copies of beta-globin was one per genome. When I'm done with this process, how many copies, how many double-stranded copies of beta-globin do I have? Two. That's still very little, but it's more than I had before. So, what do I do next?

Repeat. So, let's heat up that sample again. We'll denature at 97 degrees, and now we have our initial strand here, we have our strand that came off this primer that runs to here and maybe goes forward, we have this strand here. We have this strand here.

And this was five prime, five prime, five prime, five prime.

Now what do we do? We repeat. We'll take our primer, this is primer number one, let's see. It matches over there.

Primer number two over here. Number one over here. Number two.

Have I got this right? Yes. Good. Then where does this guy stop?

Right at the end where my other primer was. This guy runs along here. That guy stops right at the end.

That guy might go a little further. How many copies of the beta-globin gene do I have now? Four. Two of which, by the way, perfectly sit between my pink primers. What's going to happen if I do this again? How many copies will I get?

Eight, six of which will sit perfectly and two might be a little ragged as to where they go. So, initially, after cycle number zero, that is initial conditions, the number of copies relative to the genome was one.

After one cycle it's two. After two cycles it's four.

After N cycles it's two to the N copies. Is that clear how the PCR works? And that on every round you're doubling. And, with the exception of those two white things that go off to the side, they're going back and forth and back and forth between the two primers you chose to put in.

When N equals ten, what have you got? A thousand copies.

What happens when N equals 20? A million copies.

What is the copy number of beta-globin? Beta, let's suppose beta-globin, for the sake of the argument, sake of argument is about one thousand bases.

What fraction of the human genome does beta-globin represent?

Yeah, about a millionth of the genome. No, actually, one three millionth, but we'll call it a millionth.

So, after I've made a million-fold amplification of beta-globin, beta-globin now represents half of the stuff that's in the tube.

What would happen if I go another ten rounds? How many copies do I have? A billion copies. So, in other words, I started with something that was only present at about one one-millionth of what was in my test tube. If I could make a billion copies of that specific molecule, now it so dominates the mixture that it is a thousand times more abundant than the rest of the genome.
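The doubling arithmetic can be written out as a small calculation. It assumes ideal PCR (perfect doubling every cycle, no plateau) and starts from one genome copy; the function name and default sizes are assumptions for illustration. The lecture rounds the target's share of the genome to one in a million, so the second call uses a one-billion-base genome to reproduce those round numbers, while the default uses about three billion bases.

```python
def target_fraction(cycles, target_bp=1_000, genome_bp=3_000_000_000):
    """Fraction of all DNA (by length) that is the amplified target after ideal PCR."""
    copies = 2 ** cycles                # perfect doubling each cycle
    target_dna = copies * target_bp
    background = genome_bp - target_bp  # the rest of the genome is not amplified
    return target_dna / (target_dna + background)

for n in (0, 10, 20, 30):
    print(n, target_fraction(n), target_fraction(n, genome_bp=1_000_000_000))
```

With the rounded figure, 20 cycles brings the target to about half of the DNA in the tube and 30 cycles makes it about a thousand times more abundant than everything else. With the three-billion-base figure the same conclusions hold, shifted by a factor of three.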

It works. That's the remarkable thing, this works.

Any questions about the technique? Now, yes? Well, I need two primers in their sequence. How many copies do I need of each of those primers? Well, I, I obviously need a lot of copies of those primers. So, primer number one, it's a single sequence, but when I order it from the company, I'm going to order me a boat load, a lot of that primer. So, I'm adding, I better add a billion molecules of that primer because I'm going to make a billion copies starting from such primers.

But if I have a billion copies of, of number one and a billion copies of number two and, you know, these days, billions aren't such big, you know, molecules are Avogadro's number and all that. It's not hard to get things.

So, you throw in huge excess, a massive excess of primer number one, a massive excess of primer number two, and you just do this.

Now, I mean, what does it cost to make such a massive excess of a primer? It's about ten cents a base, so it's two bucks, two bucks per primer give or take. You know, so I can get you a better price if you want, but, you know, anyway.

It's not a bad price to, to buy primer. So, you can just go out and order a pair of primers. You can have them tomorrow.

And then all you have to do is add the primers to the, so I take DNA. Do I, I, I need DNA from you.

It turns out I could draw your blood and purify DNA and all that.

But it turns out that if all I wanted to do was amplify one locus, I could actually take a Popsicle stick and ask you to scrape the inside of your cheek. That'll get enough cells off from the inside of your cheek, stick it in a test-tube, and it'll actually have enough DNA there. It turns out this is a very sensitive and powerful technique, but before we get to that notice what we had to do. We had to heat our DNA to 97 degrees and add polymerase. Then we heat again to 97, add polymerase, heat again to 97, add polymerase. Why do I have to keep adding polymerase? Because polymerase gets ruined at 97 degrees, it's denatured. So, the nuisance about PCR is I have to go to my Eppendorf plastic tube, pop open the lid, stick in some DNA polymerase, close it up, stick it back in a heating bath, let it go for a while, take it out, pop it open, add some more polymerase, put it back in the heating bath, pop it out. And this is actually the way primitive scientists did PCR not so long ago, OK? Wouldn't it be cool if we could engineer a DNA polymerase that didn't denature at 97 degrees? Because then what we could do is just add the polymerase, close up the tube, put it in a machine that goes heat, cool, heat, cool, heat, cool, heat, cool. So, what kind of clever biological engineering do we use to modify polymerase so it won't denature at 97 degrees?

Yes? Get it from a bacteria. What kind of a bacteria would you ask for an enzyme that could work in, in basically boiling water?

Bacteria that basically live in boiling water. Where would you look for such? Thermal vents.

Geysers, things like that. Life lives everywhere. What you do is you find yourself a bacterium, so you find bacteria that live in geysers or in thermal vents and you purify their DNA polymerase. The most famous one comes from the organism, the bacterium called Thermus aquaticus.

Which of course means hot water, right? That's what the bacterium is called, Thermus aquaticus. And its enzyme is called Taq. So, we'll refer to it often, Taq polymerase, meaning from this bacterium Thermus aquaticus, OK? So, that's Taq. So, it turns out that you can do this now without having to open and close the test-tubes.

Oops, I meant to put that here. How sensitive is PCR? It's very sensitive. You could do, so applications of PCR. Very versatile. First let's just re-sequence a gene.

Gene from yeast or from human. You just need, you know, any DNA sample. Get my gene, get my primers, and as I was indicating with a Popsicle stick, I don't have to have it very pure, although in a laboratory you go to the trouble of making it pure because you want it to be pure and all that. Yes?

Correct. Yeah. So, remember I was making a fuss over the accuracy of replication, right? And I said that on its own a polymerase might have an accuracy to only about ten to the minus five.

So, now, what were the two mechanisms for, for repairing DNA, for proofreading DNA?

One was a built-in proofreading activity that the enzyme had.

The enzyme would have put in a base, would check the base, and that actually helped by an order of magnitude or two.

And some of these polymerases have a proofreading activity.

But then we also discussed the mismatch repair activity that would later come along and detect mismatches. You're absolutely right, PCR is not as accurate as cells because it doesn't have that mismatch repair activity. So, when you take a PCR product, if I were to clone, so if I were to take all the PCR product, say, from my beta-globin gene, so I'm going to take my test-tube, I'm going to add my primers and everything, I'm going to PCR, I'm going to PCR, and then I'm going to get a lot of copies of beta-globin. If I were to take that beta-globin and just directly sequence the DNA in the test-tube. Here's my pieces of beta-globin.

I can now sequence it by adding a primer and doing my fluorescent sequencing and running it on a sequencer and all that.

Sorry, going the other way. I'll run a sequencing reaction.

I could actually do it, and what I do it on is the whole population of a million or a billion molecules. If any one of them is wrong it's going to be swamped out by others, OK? Because I could do my sequencing reaction on the whole PCR product.

And random mistakes in one molecule or the other will still be a tiny minority of the votes at any given base, right? But suppose I were to take my PCR product, all these amplified molecules here, and suppose I were to clone them individually and I were to sequence each of those individual clones instead of sequencing a, a mixture of all the products. I would, in fact, see a higher mutation rate. And you're absolutely right.
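The "minority of the votes" idea can be illustrated with a toy majority vote over individual molecules. This is only an analogy: the sequencer reads one bulk signal from the whole population rather than tallying separate reads, and the sketch assumes every copy is the same length and already aligned. The function name and the example sequences are made up.

```python
from collections import Counter

def consensus(copies):
    """Return the most common base at each position across equal-length copies."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*copies)
    )

# Five copies of a made-up four-base product; two carry independent random errors.
copies = ["ACGT", "ACGT", "ACTT", "ACGT", "GCGT"]
print(consensus(copies))
```

Each individual error is outvoted at its position. Cloning a single molecule and sequencing it is like reading one row of this list on its own, which is why errors show up there.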

When people clone PCR products they have to check them afterwards and throw out the ones that are wrong, OK? Absolutely right. Good, good, good. So, you guys are, you know, right on top of the important issues about, about DNA. So, so I can, I can take a gene and I can re-sequence it. I can also do things like take blood and look for the presence of a virus.

So, I could re-sequence beta-globin and study people and see who's got sickle cell anemia and all that. I could take blood and I might want to say do I see the HIV virus present in someone's blood?

For example, HIV testing can be done by making PCR primers for the sequence of the HIV virus. It has a genome.

Taking a human's blood sample and PCR-ing it. If you get a positive PCR product, a PCR product that is made by these two primers and if, for example, you checked that it, that it gives you the HIV sequence then you know that that blood sample has, that person has the HIV virus.

This is a way to do this. The PCR reaction itself is fast.

Typically takes hours. In fact, can be forced to go much more quickly by machines that rapidly thermocycle.

And you can actually PCR in five minutes, although people don't do it very often, but if you put a thin glass capillary and go heat, cold, heat, cold very, very quickly, there's a machine from Idaho Technologies that can do it in five minutes, but it's usually not worth the trouble. And you just put it in and, you know, in a couple hours you'll get an answer there as to whether or not somebody has HIV, for example. So, you can do that to detect relatively low quantities of virus. How low can you go?

Well, it turns out, what's the limit? What's the smallest number of molecules you might be able to detect in a sample?

Theoretically. One. You can't have fewer than one molecule, right? So, one might be the limit. So, how could I arrange to have a single molecule in a test-tube? I would like to have a test-tube that has exactly one copy of the beta-globin gene.

What, how's the best, what's the best way to get exactly one copy of beta-globin and put it in the test-tube?

Sorry? You can't. Why? Just one molecule.

I want to get exactly one copy of beta-globin. I could, I could just take total DNA and dilute it so, on average, there's only one copy. Or, actually, is there any way to, I mean can I, I'd just like to buy a package that contains exactly one beta-globin. Sorry? Bind it to something big.
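A quick way to see why diluting to one copy "on average" is not the same as having exactly one: if copies land in tubes independently, the number per tube is commonly modelled as Poisson-distributed. That modelling choice is an assumption of this sketch, and the function name is made up.

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k copies in a tube when the average is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

mean_copies = 1.0
p_zero = poisson_pmf(0, mean_copies)
p_one = poisson_pmf(1, mean_copies)
p_two_or_more = 1 - p_zero - p_one
print(p_zero, p_one, p_two_or_more)
```

Under this model only about 37% of tubes hold exactly one copy; about as many hold none, and the remainder hold two or more. A package that biology guarantees to contain a single copy avoids that uncertainty.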

Let's think biologically. Does biology package up a single copy of beta-globin? Sorry? Gametes. How about a sperm?

Let's grab a sperm by its tail here, put it in the test-tube.

It's one copy of beta-globin. So, you can actually take cell sorters and have it cell sort sperm into individual test-tubes.

You now know there's one copy of beta-globin.

Heat it up, it will crack open the sperm, add your primers, you can amplify beta-globin, it's a single copy. That proves its extraordinary sensitivity. You can do it with a single sperm.
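A back-of-the-envelope sketch of why one starting molecule can be enough: if each cycle ideally doubles the product, the copy number grows as 2 to the power of the number of cycles. The function name, the 30-cycle figure, and the `efficiency` parameter are illustrative assumptions, not values from the lecture.

```python
def copies_after(cycles, start_copies=1, efficiency=1.0):
    """Idealized PCR yield.

    efficiency=1.0 means every template is copied in every cycle
    (perfect doubling); real reactions fall short of this and plateau.
    """
    return start_copies * (1 + efficiency) ** cycles


for n in (10, 20, 30):
    print(f"{n} cycles: about {copies_after(n):.3g} copies from one molecule")
```

With perfect doubling, 30 cycles turn a single copy into roughly a billion copies, which is the sense in which a single sperm is detectable.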

You can do it with a single egg also, but harder to come by.

So, with that level of sensitivity, you could do the following. So, single sperm typing. Now, single sperm typing is cool but sort of useless. What are you going to do with it, right? But here's another thing you could do. Embryo typing.

Suppose someone has a genetic disease in their family, maybe it's Huntington's disease. And suppose that the individual with Huntington's disease wants to have kids. Or the individual, sorry, the individual who is at risk for Huntington's disease or breast cancer or whatever wants to have kids.

What you can do is with an in vitro fertilization clinic you're able to obtain eggs, fertilize eggs in vitro, and grow them up in a Petri plate to 8 or 16 cell stage before re-implanting embryos back in the mother. Wouldn't it be cool if we could choose to only re-implant an embryo that did not have the genetic disease? How are we going to do that? PCR. How are we, so what do we do? We take the embryo.

We make DNA from the embryo. We do PCR and we say, ah-ha, this embryo did not have the genetic disease. Problem is it has killed the cells there, right? It killed the embryo.

Any ideas? Pull off one cell. Remove a single cell. It turns out that at that stage the cells are not differentiated.

If I remove one cell from an embryo at that very early stage, the other cells will make a perfectly happy, healthy baby.

That cell is not necessary. This single cell sensitivity is very valuable because I can actually do single cell genotyping on in vitro fertilized embryos and be able to offer parents the opportunity to re-implant only those embryos that do not have the genetic defect. That's cool. That's really cool.

There are other things you might be able to do. If you're treating a cancer patient and you've given chemotherapy, you want to know: have I managed to eradicate the cancer cells? And six months later, have any of the cancer cells come back? I could look for very low quantities of cancer cells. I can do surveillance for low quantities of cancer cells following chemotherapy.

And, of course, I can also do forensics.

I could take a small sample of blood from the scene of a crime or saliva from the back of an envelope that someone has licked, and I could do PCR and look for genetic variations that distinguish people. And, presumably, you see all that stuff on television all the time. So, that's what PCR is good for.

It's good. All right. Last topic, very brief topic, but I do want to mention it. This is being able to analyze a gene by directed mutagenesis.

And I won't go through the details of all this, but I just want to at least basically describe the concept.

I could take any piece of DNA, say from a drosophila, and I can mutate the DNA in vitro. I can change this base from a G to a C. There's a proper protocol and cooking trick for doing that. It involves putting a certain oligo over it and extending, and it doesn't matter exactly how.

I could insert an extra gene into that. I could use a little restriction enzyme to open it up and stuff something in.

I could delete something from this. Maybe I'll use a restriction enzyme to cut it open, et cetera. Basically, I can fuse genes together. I can do whatever kind of construction of pieces of DNA and modifications of pieces of DNA that I would like to do in vitro. I can then take that mutated gene, let's say the gene encodes an enzyme, and the enzyme has an active site. I could change the code for the amino acid right at the active site to see if that amino acid really matters or not. I can do any of those things.

And I can put this back in an organism. Remember that I said you could transform DNA back into bacteria?
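The idea of changing one base and asking what happens to the encoded amino acid can be sketched in a few lines of Python. The nine-base sequence, the chosen position, and the function names below are hypothetical and chosen only for illustration; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding sequence, reading whole codons from position 0."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def point_mutate(dna, index, new_base):
    """Return a copy of `dna` with the base at `index` (0-based) replaced."""
    return dna[:index] + new_base + dna[index + 1:]


# Hypothetical toy gene: ATG GAG CAT encodes Met-Glu-His.
wild_type = "ATGGAGCAT"
# Change the G at index 5 to a C, turning codon GAG into GAC.
mutant = point_mutate(wild_type, 5, "C")

print(translate(wild_type), "->", translate(mutant))  # MEH -> MDH
```

Here a single G to C change swaps a glutamate for an aspartate. In a real experiment the same kind of edit would be made in vitro and the effect on the enzyme measured.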

Well, you can also do such things as simply inject DNA into a fertilized egg. In fact, at the stage where there's a male and a female pronucleus that haven't fused yet right after fertilization. You can take your little pipette and a needle and you can inject some of the DNA you want into the male pronucleus, and then when the male pronucleus and the female pronucleus fuse and the embryo grows it will have your DNA.

You can make mice that carry whatever gene you've modified like this. And you can not just modify a piece of DNA and add it, this is gene addition, you can also do gene subtraction. You can do gene subtraction, and, again, I won't worry about the details here, by taking embryonic stem cells. Much in the news these days, and we may come back to them. And in vitro, working with embryonic stem cells, you transform in a piece of DNA that has been arranged to recombine into the gene of interest and knock it out.

So, if you build a piece of DNA in vitro and you put it into a whole bunch of embryonic stem cells you can select, by various clever techniques, for those embryonic stem cells that have taken up your gene. And not just taken it up but slammed it into the normal locus in place of the normal locus.

And that way you can knock out a gene. You can do gene knockout.

So, the basic point of this now, to summarize these many lectures, is we're now at the point where this picture that we saw at the beginning, function, gene, protein, that we understood now first as a methodology, genetics, biochemistry.

And then we understood how genes encode proteins through molecular biology. These tools of recombinant DNA allow us to move in any direction. You want to find the gene underlying a function, find the gene for Huntington's disease?

We could do it. Clone it based solely on its linkage.

You want to find the gene encoding a protein? If I know its amino acid sequence, I can find the DNA sequence that corresponds to it.

If I want to find what a certain protein does, its function, I could get the gene for that protein. I could knock out the gene for that protein and see what its function is.

Suddenly, for the mathematicians amongst the group, this becomes a commutative diagram, which you can chase around in any direction. That is, in a sense, what the 20th century was about: intellectually, these two disciplines merging through molecular biology, and then recombinant DNA giving you all the tools so that if you're sitting at any place in this triangle you can move this way and that way, from a gene to a protein, from a protein to a gene, from a function to a gene, from a function to a protein. Much of the rest of the course we'll talk about how you use these tools, but this brings to a close this first chunk of the course about the concepts and the methodologies of molecular biology. Now, if you hang on one more minute, this is my last lecture for a while. We have a quiz on Monday, and then Bob's taking over again.

So, I won't see you for the next week or so. So, two things.

One, I won't see you before the World Series is over so everyone please think good thoughts about the Red Sox. Number two, I will not see you before the election. Vote.

Key Concepts and Summary

  • Finding a gene of interest within a sample requires the use of a single-stranded DNA probe labeled with a molecular beacon (typically radioactivity or fluorescence) that can hybridize with a complementary single-stranded nucleic acid in the sample.
  • Agarose gel electrophoresis allows for the separation of DNA molecules based on size.
  • Restriction fragment length polymorphism (RFLP) analysis allows for the visualization by agarose gel electrophoresis of distinct variants of a DNA sequence caused by differences in restriction sites.
  • Southern blot analysis allows researchers to find a particular DNA sequence within a sample whereas northern blot analysis allows researchers to detect a particular mRNA sequence expressed in a sample.
  • Microarray technology is a nucleic acid hybridization technique that allows for the examination of many thousands of genes at once to find differences in genes or gene expression patterns between two samples of genomic DNA or cDNA.
  • Polyacrylamide gel electrophoresis (PAGE) allows for the separation of proteins by size, especially if native protein charges are masked through pretreatment with SDS.
  • Polymerase chain reaction allows for the rapid amplification of a specific DNA sequence. Variations of PCR can be used to detect mRNA expression (reverse transcriptase PCR) or to quantify a particular sequence in the original sample (real-time PCR).
  • Although the development of Sanger DNA sequencing was revolutionary, advances in next generation sequencing allow for the rapid and inexpensive sequencing of the genomes of many organisms, accelerating the volume of new sequence data.
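The RFLP and gel electrophoresis points above can be illustrated with a short Python sketch: two variants that differ at one restriction site are cut into fragments of different lengths, and it is those lengths that a gel separates. The two allele sequences and the function name are hypothetical. The enzyme modeled is EcoRI, which recognizes GAATTC and cuts after the G; only one strand is modeled, so sticky ends are ignored.

```python
def fragment_lengths(seq, site="GAATTC", cut_offset=1):
    """Lengths of the fragments left after cutting `seq` at every `site`.

    cut_offset is the position of the cut within the recognition site
    (1 for EcoRI, which cuts between G and AATTC).
    """
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    # Largest first, like reading a gel from the wells downward.
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)


# Hypothetical alleles: allele_b has lost the second EcoRI site
# through a single base change (GAATTC -> GAATTT).
allele_a = "A" * 10 + "GAATTC" + "C" * 20 + "GAATTC" + "T" * 5
allele_b = "A" * 10 + "GAATTC" + "C" * 20 + "GAATTT" + "T" * 5

print("allele A fragments:", fragment_lengths(allele_a))  # [26, 11, 10]
print("allele B fragments:", fragment_lengths(allele_b))  # [36, 11]
```

The differing band patterns are the "polymorphism" in RFLP: the sequences are the same length, but one base change removes a cut site and merges two fragments into one.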


Part 4: Checking Nucleic Acids with an Agilent BioAnalyzer

00:00:07.05 Hello.
00:00:07.23 My name is Eric Chow,
00:00:09.14 and I direct the UCSF Center for Advanced Technology.
00:00:11.26 Today, I'm gonna give you a training video
00:00:13.24 on how to operate the Agilent Bioanalyzer.
00:00:16.22 This is the system used to QC nucleic acids
00:00:19.11 to go into next-generation sequencing library preps,
00:00:22.05 as well as check the quality of your final libraries before sequencing.
00:00:26.25 This is the Bioanalyzer.
00:00:28.26 And essentially, the system works by capillary electrophoresis.
00:00:32.10 But instead of running samples through a long, thin capillary,
00:00:35.03 all of the separation occurs on these microfluidic chips, over here.
00:00:41.12 So, I'm going to open up this chip
00:00:43.25 and show you some more details of this chip.
00:00:46.03 The chip is actually made out of glass,
00:00:47.27 and there are several wells on the chip.
00:00:50.26 And this glass chip is embedded into a plastic frame.
00:00:53.26 And on the back of this chip,
00:00:55.19 you'll be able to see a whole bunch of different channels,
00:00:58.00 and this is where all the different samples will run
00:01:00.10 and get separated and then get detected.
00:01:03.11 This chip, once it's prepared,
00:01:05.29 will go into this Bioanalyzer instrument.
00:01:10.08 And this instrument has a location to hold the chip,
00:01:13.02 as well as an optical detector at the bottom to detect the passing nucleic acids,
00:01:18.11 and an electrode array on top.
00:01:20.09 And this electrode array on top
00:01:22.29 is what causes the different samples to pass through the chip
00:01:25.11 and migrate towards that detector.
00:01:29.02 So, we haven't done anything to prepare the chip yet,
00:01:32.12 and we'll go through that next.
00:01:34.04 So, with all the different kits for the Bioanalyzer,
00:01:36.20 you can analyze both RNA and DNA.
00:01:38.24 They come with a handy user guide
00:01:41.12 that we'll essentially be following today.
00:01:43.19 And in this user guide,
00:01:45.15 there are sections where it tells you to prepare a gel-dye mixture.
00:01:49.22 This is something you'll have to do ahead of time,
00:01:52.06 and it takes about 30 minutes,
00:01:54.01 so this is something we've done
00:01:56.14 before we shot this video.
00:01:58.05 But all the reagents for the kits come in a small box like this
00:02:01.18 that's kept in the cold room,
00:02:03.12 and they all come with a pack of chips
00:02:05.24 that are at room temperature.
00:02:07.11 And so, once you open up this box
00:02:10.01 -- and today we're gonna be going over the DNA high sensitivity kits --
00:02:13.13 you'll see a set of reagents in here.
00:02:16.12 And the reagents that you'll need to prepare the gel-dye mixture ahead of time
00:02:20.17 are a dye, the gel, and a spin filter.
00:02:28.07 And what you'll do ahead of time is take a certain amount of the dye,
00:02:31.21 add it to the gel and vortex it,
00:02:34.01 and then you'll place this gel-dye mixture
00:02:37.19 into this Eppendorf tube that has a spin filter in there,
00:02:39.16 and you'll centrifuge this for about 30 minutes.
00:02:41.22 This will cause the gel-dye to form a nice even suspension.
00:02:45.16 And when you're done, you'll throw away the filter,
00:02:48.07 and keep the gel-dye in this Eppendorf tube at 4 degrees.
00:02:52.06 And you'll be able to use this gel-dye mixture several times,
00:02:55.07 for several runs across many chips.
00:03:03.19 So, what I have here is a tube of gel-dye that we've already prepared.
00:03:07.09 And this is what we'll be using today.
00:03:09.06 But before we start preparing the chip,
00:03:11.13 I'm gonna demonstrate to you
00:03:13.16 how to run the software.
00:03:15.09 The software that runs the Bioanalyzer
00:03:18.00 is the 2100 Expert software.
00:03:19.24 When you open up the software,
00:03:21.25 it will automatically connect to the Bioanalyzer instrument.
00:03:24.12 And on the left is a set of
00:03:27.27 icons that you can use to navigate to different areas of the software.
00:03:30.14 To set up the run,
00:03:32.10 you want to make sure you click on the instrument icon.
00:03:34.26 And on the instrument icon,
00:03:37.13 again you'll see a picture of the Bioanalyzer.
00:03:39.21 And the first thing you want to do is select the assay that you're running.
00:03:42.06 Again, the Bioanalyzer can analyze both DNA and RNA.
00:03:45.18 And if you're selecting DNA, you want to select the proper chip.
00:03:48.25 And today we're running the high sensitivity chip.
00:03:51.26 For RNA, there are two different types of kits.
00:03:54.24 There are these Nano and Pico kits,
00:03:58.01 as well as a Small RNA kit if you want to run small RNA assays.
00:04:02.21 But what's important about the Nano and Pico kits
00:04:04.29 if you're analyzing RNA and want to get a RIN number
00:04:07.24 -- this is an RNA integrity number --
00:04:10.00 you have to make sure that you select the total RNA options.
00:04:12.25 And this will cause the software to calculate a RIN number.
00:04:16.01 If you select the mRNA options,
00:04:18.13 you won't get a RIN number for your RNA.
00:04:20.27 So, again, since we're running DNA high sensitivity,
00:04:23.02 we'll go to double-stranded DNA and select high sensitivity.
00:04:27.08 Next, you have an option to select where your data gets saved to.
00:04:30.26 It can save into a default folder or a custom folder that you select.
00:04:35.25 After that, let the system know how many samples you'll be running.
00:04:38.06 On the high sensitivity DNA chip,
00:04:40.12 you can run up to 11 samples.
00:04:42.03 Today, we'll only be running 2 samples.
00:04:44.07 And at the bottom, you can give names to your different samples,
00:04:47.25 and you can label them down here.
00:04:50.25 And so, we'll just use sample A and sample B.
00:04:56.13 So, now we're gonna set up the instrument and the chips to run your samples.
00:05:00.13 Again, this is the Bioanalyzer,
00:05:02.23 and this head opens up and down to get the electrodes into a chip.
00:05:06.24 If you need to remove your electrode, simply flip down a tab
00:05:10.26 -- it's over here all the way to the bottom.
00:05:13.03 This will push out the electrode array by a little bit,
00:05:15.15 and once it's out you can slide it back.
00:05:17.16 And if you need to store this electrode array,
00:05:19.11 store it upside down so that these electrodes
00:05:21.26 don't get damaged by bumping up against surfaces.
00:05:25.27 To put the electrode back into the system,
00:05:28.00 simply slide it back down
00:05:30.15 and allow it to go down on its own.
00:05:32.10 Don't force it in.
00:05:33.15 Once it has dropped to the lowest position,
00:05:35.19 lift the lever up, and this will pull the electrode array in.
00:05:38.14 The next thing we're gonna do is wash the system.
00:05:40.26 To wash the system,
00:05:42.21 you want to use 350 microliters of nuclease-free water
00:05:45.18 and place it into a wash chip.
00:05:54.05 You can use any well of the wash chip,
00:05:56.06 because all the wells are connected by a reservoir.
00:06:03.11 To wash, simply place the wash chip on the chip holder
00:06:06.14 and close the system,
00:06:08.28 and let it sit for about 30 seconds to 1 minute.
00:06:11.16 All this does is wash off any salts and debris
00:06:14.11 that might be on the surface of those electrodes.
00:06:16.12 Once the wash is done,
00:06:18.09 go ahead and remove the chip,
00:06:20.03 and you can just dump out the water into the trash,
00:06:22.07 and save the chip for future runs.
00:06:29.09 Now, we'll prepare our chip.
00:06:31.03 I've already opened this up from the plastic packaging,
00:06:34.05 and we'll place this into the priming station.
00:06:38.17 The chip sits down in here and doesn't really move.
00:06:41.24 And on the priming station,
00:06:43.15 it's important to note that this is made up of
00:06:45.24 a plunger, and there is a holder to hold the plunger at different positions.
00:06:48.28 And depending on which chip and assay you use,
00:06:51.22 you'll need to make sure that you set the plunger to the proper setting.
00:06:55.12 So, there's a top, middle, and bottom setting.
00:06:58.14 For the high sensitivity DNA chips,
00:07:00.02 we'll have to use this lower setting.
00:07:02.00 And so, you just move the silver handle down
00:07:05.02 until you get it to this lower location.
00:07:07.18 And make sure it catches into the hold.
00:07:11.12 So, now this is set at the low position,
00:07:13.19 and we're ready to start adding our reagents.
00:07:15.24 The first reagent we'll be using is the gel-dye mix
00:07:18.24 that we prepared ahead of time.
00:07:20.10 This has had the dye added to the gel
00:07:23.00 and spun through a filter already.
00:07:25.07 What we'll do first is add 9 microliters of the gel-dye mixture
00:07:30.18 to the gel position that's marked with a dark circle on your chip.
00:07:34.15 And this is gonna be the second from the bottom.
00:07:39.05 When you add reagents to these chips,
00:07:40.23 you want to make sure that you touch the bottom of the chip.
00:07:43.12 You don't have to worry about marring them,
00:07:45.08 because they're made out of glass,
00:07:46.26 so they're much harder than your pipette tips.
00:07:53.11 The other thing you want to make sure that you don't do
00:07:56.01 is expel a bubble at the bottom of the chip,
00:07:58.08 because that's where those capillaries and channels are.
00:08:01.01 If you have an air bubble down there,
00:08:03.13 it's gonna disrupt the flow of current.
00:08:07.13 The next step is to bring the syringe up to the 1 mil mark
00:08:13.03 and then snap the lid shut until you hear a click.
00:08:17.29 Next, gently and steadily push the plunger all the way down
00:08:21.27 until it catches the hold.
00:08:25.28 And you want to set a timer for one minute.
00:08:28.21 What's happening right now is the pressure in the syringe
00:08:31.24 is forcing that gel-dye mixture through a capillary inside of the chip,
00:08:35.16 and this is forming that separation channel
00:08:38.23 that all the samples will go through.
00:08:40.13 So, both the pressure and timing are important for the different chips,
00:08:43.11 so make sure you consult your user guide
00:08:45.15 if you're running something else besides the high sensitivity DNA chip.
00:08:51.25 So, while this is running
00:08:53.20 -- we have about another 30 seconds --
00:08:55.13 one thing you can use to check your chips afterwards is
00:08:59.12 you want to make sure that you don't see any of the channels after pressurization,
00:09:02.12 because now they've all been filled with liquids.
00:09:04.23 Before we do this step, if you take a look at your chip,
00:09:07.18 on the back,
00:09:09.14 you can actually see all the channels if you hold it up to the light
00:09:11.14 because of the glass-air interfaces that are present.
00:09:14.03 But after priming, you shouldn't see those anymore,
00:09:17.15 because they should all be fluid-filled.
00:09:22.01 A minute is up, and we'll just release the plunger.
00:09:26.04 What you want to see is the plunger
00:09:29.07 going up past the half mil mark within a couple of seconds,
00:09:32.16 and this indicates that no pressure has leaked out of the system.
00:09:35.21 And so, now you just slowly bring the plunger back up to the 1 ml mark,
00:09:44.07 and then release the plunger from the chip.
00:09:48.21 Next, we'll add another 9 microliters of the gel-dye mixture
00:09:51.05 to the other three gel positions on the chip.
00:09:54.03 These are the other three wells on the right side of the chip.
00:09:59.22 You can use the same pipette tip for all of these,
00:10:01.20 and again, just remember not to expel any air bubbles
00:10:04.26 towards the bottom of the wells.
00:10:21.22 Next, we're gonna add 5 microliters of a marker solution
00:10:25.25 to each of the sample wells and the ladder wells.
00:10:30.07 These are the 12 wells to the left of the chip.
00:10:33.26 So, we don't add these to the gel wells that we've added material to already.
00:10:38.17 And you can use the same pipette for all of these as well.
00:10:42.26 And this just contains an upper and lower marker
00:10:45.09 that can be used as reference standards
00:10:47.10 to compare your sample to the ladder.
00:10:56.11 So, even if you're only running one or two samples,
00:10:59.15 or not a full 11 samples,
00:11:01.07 you still have to add marker to all of the lanes,
00:11:03.17 or to all of the wells.
00:11:18.29 In the next step, we'll add one microliter of the ladder
00:11:22.08 to the ladder well, and this is marked on the chip.
00:11:41.11 Now we can add your samples.
00:11:43.00 And again, there are references to the well number
00:11:46.18 that correspond to the sample numbers on the chip.
00:12:12.03 I only have two samples, so I'm now done,
00:12:14.28 but you can proceed with up to 11 samples on this chip.
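The loading steps above can be tallied as a quick sanity check before starting. This is a minimal sketch that only uses the volumes and well counts stated in the walkthrough; the function and constant names are hypothetical, and the per-sample volume is left out because the walkthrough does not state it.

```python
# Volumes in microliters, as stated in the walkthrough for the
# High Sensitivity DNA chip. Names here are illustrative only.
GEL_DYE_PER_WELL = 9
GEL_WELLS = 4          # the primed gel well plus the other three gel wells
MARKER_PER_WELL = 5
MARKER_WELLS = 12      # 11 sample wells plus the ladder well
LADDER_VOLUME = 1


def chip_reagent_totals():
    """Return the total volume of each reagent consumed by one chip."""
    return {
        "gel_dye_ul": GEL_DYE_PER_WELL * GEL_WELLS,
        "marker_ul": MARKER_PER_WELL * MARKER_WELLS,
        "ladder_ul": LADDER_VOLUME,
    }


print(chip_reagent_totals())
```

Marker goes into all 12 wells even when fewer than 11 samples are run, so the marker total does not depend on the sample count.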
00:12:18.14 The next step is to mix the contents
00:12:21.10 of the marker and the samples with the ladder in the chip,
00:12:24.21 and to do this we use a special vortexer that comes with every Bioanalyzer.
00:12:29.01 So, just make sure it's set to the proper set point,
00:12:31.23 and then allow it to mix.
00:12:34.01 And make sure you set a timer
00:12:38.16 so you know when to stop.
00:12:39.29 And again, to get the time for the different assays,
00:12:41.26 just consult your user guide.
00:12:46.01 Alright, time's up.
00:12:48.26 Now the chip is prepared and ready to run.
00:12:50.29 What we want to do is go ahead and place this into the Bioanalyzer system,
00:12:54.28 close the lid,
00:12:57.28 and then once the system detects the chip
00:13:01.10 you'll be able to click on the start button to begin the run.
00:13:04.04 A run of a full chip will generally take about 45 minutes.
00:13:07.21 And it's a good idea to stick around for a few minutes,
00:13:10.06 'til you see the ladder run,
00:13:12.13 just to make sure the run has started okay before you take off.
00:13:16.16 Now, our second and last sample is running.
00:13:18.20 What you see here is a raw trace for the Bioanalyzer,
00:13:21.06 and this is analyzing fluorescence that's being detected
00:13:24.03 as the sample passes through the capillary.
00:13:26.29 On the low end we have the lower marker,
00:13:29.29 and over here we have the upper marker.
00:13:32.13 These are used as internal references to compare the sample to the ladder that was run first.
00:13:37.29 And in the middle here is our library.
00:13:39.22 This is a high concentration sample
00:13:41.26 that is roughly at around 25 nanomolar.
00:13:44.12 And so, this is why this smear is so large.
00:13:47.28 You'll notice that the marker peaks are pretty sharp,
00:13:50.14 because they're a defined size,
00:13:52.26 and libraries tend to have a spread to them,
00:13:54.16 because they come in a variety of sizes.
00:13:57.18 They're not one specific base pair length.
00:14:00.23 And so, now the run is finishing,
00:14:03.24 and I'll show you how to analyze and export the data.
00:14:06.29 So, now that that screen has disappeared,
00:14:09.04 the run is done.
00:14:10.22 To access our data, we come back over to the left-hand side
00:14:13.15 and click on the data icon.
00:14:16.26 If you have several runs from beforehand,
00:14:18.25 you'll see the other chips here.
00:14:20.24 The newest one will be at the bottom,
00:14:22.15 which is our sample.
00:14:24.07 And if we click on that chip,
00:14:26.19 we'll get some details about the chip,
00:14:29.28 the assay properties, and then,
00:14:31.22 if you want to take a look at the raw traces,
00:14:33.09 you click on the electropherogram.
00:14:35.19 And this will show you the trace either one sample at a time
00:14:39.16 -- so, this is our first sample, and this was our second sample --
00:14:44.26 or you can look at all samples at once.
00:14:47.13 And here's sample A and sample B, and down here is the ladder.
00:14:51.22 And so, if we click on the ladder, it has a series of peaks
00:14:57.12 that are present in that ladder sample,
00:14:59.09 and the ladder also contained the marker that we added,
00:15:01.27 the 5 microliters of reagents,
00:15:04.02 and those are the 35 base pair and roughly 10 kb fragments
00:15:08.07 that are present in all samples.
00:15:10.28 And so, the ladder is used as a reference, because we know the sizes,
00:15:15.11 and we correlate it with the time that it takes for those samples
00:15:17.18 to come through the chip,
00:15:19.17 and we have these internal references
00:15:21.19 that we'll use to compare each of the samples.
00:15:24.06 So, now if you click on sample A, it'll also have those 35 bp and 10 kb markers
00:15:27.22 that are used to generate the sizing of these samples.
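The sizing idea described here, a ladder of known sizes correlated with migration time plus internal markers in every lane, can be sketched as follows. This is an illustration under stated assumptions, not the 2100 Expert algorithm: the calibration times are invented, the interpolation on log10(size) is one plausible choice, and `size_from_time` and `align_to_markers` are hypothetical names.

```python
import math
from bisect import bisect_right

# Hypothetical ladder calibration: (migration time in seconds, size in bp).
# The times are made up for illustration, not taken from a real run.
LADDER = [(43.0, 35), (50.0, 100), (60.0, 300), (70.0, 600),
          (80.0, 1000), (95.0, 3000), (110.0, 10380)]


def size_from_time(t, ladder=LADDER):
    """Estimate fragment size (bp) by interpolating log10(size) against time."""
    times = [time for time, _ in ladder]
    if not times[0] <= t <= times[-1]:
        raise ValueError("migration time falls outside the ladder range")
    # Index of the upper end of the ladder segment that contains t.
    i = min(bisect_right(times, t), len(times) - 1)
    (t0, s0), (t1, s1) = ladder[i - 1], ladder[i]
    frac = (t - t0) / (t1 - t0)
    log_size = math.log10(s0) + frac * (math.log10(s1) - math.log10(s0))
    return 10 ** log_size


def align_to_markers(t, lower_t, upper_t, ladder=LADDER):
    """Rescale a sample's time so its own markers line up with the ladder's."""
    ref_lower, ref_upper = ladder[0][0], ladder[-1][0]
    scale = (ref_upper - ref_lower) / (upper_t - lower_t)
    return ref_lower + (t - lower_t) * scale


# A sample lane whose markers ran slightly late (44 s and 112 s).
aligned = align_to_markers(61.0, lower_t=44.0, upper_t=112.0)
print(round(size_from_time(aligned)))
```

Because every size is computed relative to the two marker positions, assigning the wrong peak as a marker shifts every size in that lane, which is why a mis-called marker has to be corrected by hand.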
00:15:33.25 So, down here on the electropherogram
00:15:36.19 there are a couple of things you can do.
00:15:38.00 You can set something called a region table,
00:15:39.21 and for next-gen sequencing libraries
00:15:41.27 you typically set these around 200 base pairs
00:15:44.00 up to 1 kb.
00:15:45.21 And that's displayed here.
00:15:47.10 And what this will do is it'll analyze only the material that shows up within this range.
00:15:50.27 And this will give you a rough concentration, in picomolar.
00:15:53.24 So, this is a little over 4000 picomolar concentration,
00:15:57.22 or roughly 4 nanomolar.
00:16:01.02 It'll also give you an average size,
00:16:03.06 which is useful for some downstream QC'ing steps.
00:16:05.06 And so, this library has an average size of 320 base pairs.
00:16:08.29 Our other high concentration sample, again,
00:16:13.02 is also being analyzed in this smear region.
00:16:15.22 And what you can see is that this is a much higher concentration library.
00:16:19.17 This is measuring it at roughly 17 nanomolar,
00:16:23.17 and the average size is 491 base pairs.
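The concentrations and average sizes reported for the region can be converted between units with a little arithmetic. The sketch below assumes the common approximation of 660 g/mol per base pair of double-stranded DNA; the function names are hypothetical and the inputs are the values read off in the walkthrough.

```python
AVG_BP_MASS = 660.0  # approximate g/mol per base pair of double-stranded DNA


def picomolar_to_nanomolar(pm):
    """1 nM = 1000 pM."""
    return pm / 1000.0


def nanomolar_to_ng_per_ul(nm, avg_size_bp):
    """Convert molarity to mass concentration for dsDNA of a given mean size."""
    # nmol/L times g/mol gives ng/L; dividing by 1e6 gives ng/uL.
    return nm * AVG_BP_MASS * avg_size_bp / 1e6


print(picomolar_to_nanomolar(4000))                 # 4.0
print(round(nanomolar_to_ng_per_ul(4.0, 320), 4))   # sample A
print(round(nanomolar_to_ng_per_ul(17.0, 491), 4))  # sample B
```

The average size matters because the same molarity corresponds to more mass when the fragments are longer.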
00:16:29.05 Sometimes the Bioanalyzer won't detect your marker peaks correctly,
00:16:32.18 and you'll see a red flag or a red mark down here
00:16:35.18 in the simulated gel image.
00:16:37.10 This is something that's easy to correct
00:16:39.19 by going to the peak table tab, over here.
00:16:42.26 So, for instance, on this sample B,
00:16:45.24 we have this minor peak that showed up.
00:16:48.03 Sometimes this minor peak might get misidentified as a marker.
00:16:52.00 And on this peak table,
00:16:54.10 what you can do is you can highlight over these peaks,
00:16:57.11 and once you see a bullseye appear,
00:16:59.12 you can right click and manually set them as the lower or upper marker.
00:17:03.20 And so, for instance,
00:17:05.28 if I wanted to call this 29 base pair peak
00:17:08.08 my 35 base pair standard, or internal reference,
00:17:12.11 I can right click on it and manually set the lower marker.
00:17:15.02 And this will recalculate the size of everything.
00:17:17.27 So, now you can see that the software has now called this my 35 base pair peak,
00:17:22.03 and it's calling this marker a 46 base pair peak.
00:17:25.13 So, this isn't what I actually want, so I'll go ahead and change it back.
00:17:29.25 And hover over my actual lower marker,
00:17:32.02 see the bullseye, right click, and set lower marker.
00:17:38.05 So, once your data looks like it's all been set and
00:17:42.20 all the peaks have been properly called,
00:17:44.22 you can export this as a PDF
00:17:47.13 by clicking the printer button on the toolbar.
00:17:52.04 And so, sometimes it'll ask you if you want to save any changes.
00:17:54.14 And so, I said, yes, I want to save those peak changes that I made.
00:17:58.19 A window will show up,
00:18:01.09 and it'll give you options of what type of items to select.
00:18:03.15 We just leave this as the default.
00:18:05.20 You can tell it to print all wells
00:18:07.28 or just the wells that have samples.
00:18:09.20 And for us, we only had two samples.
00:18:12.06 And then you have options to export this as a PDF or an HTML file,
00:18:17.10 and to select the directory that you want to export this to.
00:18:19.23 So, I'm just gonna save it to the desktop
00:18:21.29 to make it easier to find.
00:18:23.13 And you have options to put four sheets per page or one sheet per page.
00:18:27.05 So, I'll go ahead and save this.
00:18:34.00 And if we go to the desktop,
00:18:36.09 we should be able to see our PDF file,
00:18:40.27 and that should be here.
00:18:43.27 So, if you double-click on this,
00:18:45.19 you'll see a summary report on the first page.
00:18:47.18 And as you scroll down,
00:18:49.17 you'll see the ladder, over here,
00:18:53.05 and you want to see nice, tight, sharp bands, or peaks, on the ladder.
00:18:57.01 And then you'll see the samples.
00:19:00.04 And these will include some extra information down at the bottom.
00:19:05.05 And so, most people will want to save the PDF,
00:19:07.10 because this is the file they can open on their own computers.
00:19:10.00 You can't open the actual Bioanalyzer raw files,
00:19:12.19 these XAD files, because it requires the proprietary software,
00:19:15.27 so you definitely want to export and save that PDF.
00:19:18.17 And generally this PDF is what you would send off to your sequencing facility,
00:19:22.11 if you're getting a sample sequenced.
00:19:26.19 So, now that the run is completed,
00:19:28.27 the last thing we have to do is wash the system.
00:19:31.10 So, the first thing you want to do is pull out the old chip,
00:19:33.11 set it aside -- you can throw it away, it's unusable right now --
00:19:40.06 and then take another wash chip and fill it again with 350 microliters of nuclease-free water.
00:19:48.28 And then place a chip in the system
00:19:50.21 and close the lid, and let it sit for about 30 seconds to one minute.
00:19:53.06 And it's really important that you do this afterwards,
00:19:55.11 because we want to clean the electrodes of any gel and salts
00:19:58.06 that are gonna be on the electrodes from running the chip.
00:20:01.03 It's not good to leave a chip in there long-term,
00:20:03.05 because those things can dry on those electrodes
00:20:05.06 and cause problems for the next user.
00:20:08.11 The other point is that it's really important that you remove the wash chip.
00:20:11.08 If you leave it in there,
00:20:13.04 the water will start to evaporate
00:20:16.03 and cause condensation inside of the electrode array,
00:20:17.24 and there are a lot of electronics up above
00:20:20.15 that we don't want to get moisture on.
00:20:23.12 So, now that 30 seconds is up,
00:20:25.11 we remove the wash chip,
00:20:26.25 we can get rid of the water,
00:20:29.14 set it aside, and then just close the system, and you're done.
00:20:32.14 Thanks for watching.

Frederick Sanger, OM

Frederick Sanger, OM, the biochemist, who has died aged 95, was the only Briton — and one of only four people in history — to win the Nobel Prize twice.

His work unlocked the chemical secrets that underlie genes — the basic building blocks of life — and laid the foundation for genetic engineering and the Human Genome Project, a unique effort to spell out the chemical structure of every gene in the human body.

Sanger was awarded his first Nobel Prize in 1958 for work carried out with colleagues in the early 1950s. Toiling away in a small hutlike laboratory buried in Cambridge University’s department of biochemistry, Sanger deduced the sequence of amino acids (chemical building blocks) in the hormone insulin, the first complete protein sequence ever to be determined.

His second Nobel, in 1980, was awarded for related work carried out at the Laboratory of Molecular Biology in Cambridge, where he developed an ingenious method of working out the basic chemical “grammar” of DNA that has enabled scientists to “read” the chemical sequencing — the long chains of DNA molecules — that form our genes. The technique he developed, known as “Sanger” sequencing, was still used decades later.

The DNA sequencing method Sanger pioneered, with Alan Coulson, involves manufacturing a replica of the gene under study. The next step is to add a specially coloured “killer chemical” that terminates the replication once it hits a particular chemical link, or nucleotide, in the gene (DNA chains have four types of chemical links, and the order of these determines what the genes do).

The process is repeated with different killer chemicals which stop the replication at different sets of links. This gives a mixture of DNA fragments of varying lengths, each finishing with one of four different fluorescent dye molecules corresponding to the four nucleotides of DNA. The fragments are then driven by an electric field through a slab of gel or hair-thin capillary tubes filled with polymer. This sorts them by length. The order of colours that emerges — corresponding to the sequence of nucleotides in the original piece of DNA — is scanned by laser and displayed on a computer screen.
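The read-out logic described above can be illustrated with a toy simulation. It is a sketch under simplifying assumptions: every extension step has the same hypothetical stop probability `p_stop`, the template is written in the order the polymerase copies it, and peak shapes and dye chemistry are ignored. The function names are invented for the example.

```python
import random

PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def chain_termination(template, n_molecules=5000, p_stop=0.1, seed=0):
    """Simulate many copying reactions, each halted by a labelled terminator.

    Returns (length, terminal_base) for every molecule that terminated.
    """
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_molecules):
        for length, base in enumerate(template, start=1):
            if rng.random() < p_stop:
                # The terminator pairs with the template base at this position.
                fragments.append((length, PAIR[base]))
                break
    return fragments


def read_sequence(fragments):
    """Sort by length; each distinct length yields exactly one base call."""
    terminal = {}
    for length, base in fragments:
        terminal[length] = base  # equal length implies the same end base
    return "".join(terminal[n] for n in sorted(terminal))


template = "TACGGATCCGTA"
fragments = chain_termination(template)
print(len(fragments), "fragments,", len(set(fragments)), "distinct peaks")
print(read_sequence(fragments))
```

Thousands of fragments collapse into one call per length, because all fragments of a given length end at the same position and carry the same dye. The detector sees a population arriving together as a single peak, not individual molecules to be counted.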

This method made it possible to sequence several hundred DNA bases in one day, a process that previously took many years. It enabled Sanger and his colleagues to map the sequence of links of simple structures such as proteins and viruses, leading to far greater scientific understanding of the chemical basis of genetic defects and the processes that lead to disease — work that has led to an explosion in drug and vaccine development.

Frederick Sanger was born on August 13 1918 into a Quaker family at the village of Rendcomb in Gloucestershire, where his father was the local doctor. Under his father’s influence and that of his elder brother Theodore, Fred became interested in biology and set his heart on following his father into medicine.

From Bryanston he won a place at St John’s College, Cambridge, but even before going to university he decided he would be best suited to a scientific career — albeit one which he hoped would have clinical applications. At Cambridge he became interested in the emerging field of biochemistry, convinced that it offered a way to develop a more scientific basis to understand many medical problems. But he did not appear to be a particularly promising student, and took three years to complete the first part of his degree when normally it took only two.

Sanger was a conscientious objector, and after taking his degree in 1939 remained at the university for a further year after the outbreak of war to take an advanced course in Biochemistry, surprising everyone by obtaining a First. From 1940 to 1943 he worked with Albert Neuberger on the metabolism of the amino acid lysine, and at the same time became involved in a government-sponsored research project looking at the protein content of the potato.

When AC Chibnall was appointed Professor of Biochemistry in 1943, Sanger joined his research group working on proteins. This was an especially exciting time in protein chemistry: new chromatography techniques had been developed by Archer Martin and Richard Synge and Chibnall and Sanger believed that there might be a real possibility of determining the exact chemical structure of proteins.

This idea was controversial at the time as, although the 20 or so amino acids that can go to make up proteins were known, most scientists believed the arrangement of different amino acids in a protein to be random. One professor had even produced a complex mathematical formula that would express this random function. Thus, when Chibnall tried to get Sanger a grant from the Medical Research Council to work on protein structure, the grant was refused because “everyone knew” that the pattern of amino acids in a protein was random.

Nevertheless, Sanger scraped together enough money from various sources to start work. From 1944 to 1951 he held a Beit Memorial Fellowship for Medical Research and in 1951, by which time the Medical Research Council had come to recognise the importance of his work, he became a member of the MRC’s external staff.

The protein which Sanger chose for his research was insulin which, as well as being relatively small in size and available in large quantities, had strong clinical implications in the understanding of diseases such as diabetes. He developed a method of marking the end amino acid and splitting it off from the insulin. The end amino acid was then identified and the process repeated. By this painstaking method, Sanger showed that a molecule of insulin contains two peptide chains made of two or more amino acids that are linked together by two disulphide bonds. It took eight more years finally to identify the 51 amino acids that make up insulin.

The award of the Nobel Prize in 1958 had an important and stimulating effect on Sanger’s subsequent career, enabling him to obtain better research facilities and to attract the brightest young scientists to work alongside him. In 1962 — with Max Perutz’s unit from the Cavendish Laboratory, which included Francis Crick, John Kendrew and Aaron Klug — Sanger moved to the MRC’s newly-built Laboratory of Molecular Biology in Cambridge.

Surrounded by researchers interested in DNA and genes, Sanger was struck by the challenge of determining the order of bases in DNA — known as DNA sequencing. It was by this time clear that DNA was a linear code, and although the code was being unravelled, no methods existed to read the code in even the simplest genome. To Sanger, though, the problem was simply a natural extension of his work on protein sequencing.

Over the next 15 years he and his team developed several methods to sequence nucleic acids (DNA and RNA), eventually developing the method for which he won his second Nobel. The Sanger method is capable of “reading” genomes as much as 3,000,000,000 base-pairs long — 500 bases at a time.
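The scale implied by those two figures is easy to work out. The sketch below is back-of-envelope arithmetic only; `reads_needed` is a hypothetical helper, and the coverage factor is an assumption added for illustration, since real projects read each base several times over.

```python
import math


def reads_needed(genome_bp, read_bp, coverage=1.0):
    """Minimum number of reads to cover a genome at a given average depth."""
    return math.ceil(genome_bp * coverage / read_bp)


print(reads_needed(3_000_000_000, 500))  # 6000000
```

Even a single pass over three billion base pairs takes six million reads of 500 bases, before allowing for overlap between reads or repeated coverage.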

Sanger shared his second Nobel Prize with Walter Gilbert, who had carried out independent research into the determination of base sequences in nucleic acids, and Paul Berg, for his work on recombinant DNA.

A courteous, serious-minded man of strong socialist opinions, Sanger’s thin, bespectacled figure, habitually dressed in academic-casual v-necked sweater, open-necked shirt and rubber-soled shoes, was a familiar sight in Cambridge for many years.

Though he was one of only four people ever to have won two Nobel Prizes (the others being Marie Curie, John Bardeen and Linus Pauling), he remained modest about his achievements, putting them down to hard work and team spirit rather than genius.

The walls of his simply-furnished house at Swaffham Bulbeck, a fen-side village outside Cambridge, were bare of plaques, certificates or citations: “You get a nice gold medal, which is in the bank,” he explained. “And you get a certificate, which is in the loft. I could put it on the wall, I suppose. I was lucky and happy to get it, but I’m more proud of the research I did. There are some people, you know, who are in science just to get prizes. But that’s not what motivates me.”

After retiring in 1985 Sanger devoted most of his time to working in his garden. In 1992 the Wellcome Trust and the Medical Research Council established the Sanger Centre, for furthering the knowledge of genomes. Located 10 miles outside Cambridge, it became one of the main sequencing centres of the Human Genome Sequencing Project.

Among many honours and awards, Sanger received the Corday-Morgan Medal and Prize of the Chemical Society in 1951; the Royal Medal of the Royal Society in 1969; the Royal Society’s Copley Medal in 1977; and the Albert Lasker Basic Medical Research Award in 1979. He won the Gold Medal of the Royal Society of Medicine in 1983.

In 1954 he became a Fellow of the Royal Society and a Fellow of King’s College, Cambridge. He was an honorary member of many foreign scientific academies, including the American Academy of Arts and Sciences.

Sanger was appointed CBE in 1963 and made a Companion of Honour in 1981 — but he turned down a knighthood, not wanting to be called “Sir”: “A knighthood makes you different, doesn’t it, and I don’t want to be different.” He did, however, accept the considerably more distinguished Order of Merit in 1986.

Frederick Sanger married, in 1940, Margaret Joan Howe. She was not a scientist, but he described her as having contributed more to his work than anyone else by providing a peaceful and happy home. They had two sons and a daughter.


Nanopore sequencing took 25 years to fully materialise. It involved close collaboration between academia and industry. One of the first people to put forward the idea for nanopore sequencing was Professor David Deamer. In 1989 he sketched out a plan to drive a single strand of DNA through a protein nanopore embedded in a thin membrane as part of his work to synthesise RNA from scratch. Realising that the same approach might hold potential to improve DNA sequencing, Deamer and his team spent the next decade testing it out. In 1999 Deamer and his colleagues published the first paper using the term 'nanopore sequencing' and two years later produced an image capturing a hairpin of DNA passing through a nanopore in real time.

Another foundation for nanopore sequencing was laid by the work of a team led by Professor Hagan Bayley, who from the 1990s began to independently develop stochastic sensing, a technique that measures the change in an ionic current passing through a nanopore to determine the concentration and identity of a substance. By 2005 Bayley had made substantial progress with the method to sequence DNA and co-founded Oxford Nanopore to help push the technology further. From the start the company believed that nanopore sequencing provided a means to make DNA sequencing much cheaper and faster, and no longer reliant on the expensive reagents and equipment that made the process the preserve of highly centralised laboratories based in high-income countries.

In 2014 the company released its first portable nanopore sequencing device. This marked a major breakthrough, as it made it possible for DNA sequencing to be carried out almost anywhere, even in remote areas with limited resources. This has opened up a new chapter for detecting, tracking and mapping out, in real time and on the ground, the evolution of newly emerging pathogens behind epidemic outbreaks. It has played a pivotal role, for example, in the COVID-19 pandemic. A quarter of all the world's SARS-CoV-2 virus genomes have now been sequenced with nanopore devices. The technology also offers an important tool for combating antimicrobial resistance, an increasing public health threat. This is only the start of the many applications nanopore sequencing now offers. [9]

The biological or solid-state membrane, where the nanopore is found, is surrounded by electrolyte solution. [10] The membrane splits the solution into two chambers. [11] A bias voltage is applied across the membrane, inducing an electric field that drives charged particles, in this case the ions, into motion. This effect is known as electrophoresis. For high enough concentrations, the electrolyte solution is well distributed and all the voltage drop concentrates near and inside the nanopore. This means charged particles in the solution only feel a force from the electric field when they are near the pore region. [12] This region is often referred to as the capture region. Inside the capture region, ions have a directed motion that can be recorded as a steady ionic current by placing electrodes near the membrane.

Imagine now a nano-sized polymer such as DNA or protein placed in one of the chambers. This molecule also has a net charge and feels a force from the electric field when it is in the capture region. [12] The molecule approaches this capture region aided by Brownian motion and any attraction it might have to the surface of the membrane. [12] Once inside the nanopore, the molecule translocates through via a combination of electrophoretic, electro-osmotic and sometimes thermophoretic forces. [10] Inside the pore the molecule occupies a volume that partially restricts the flow of ions, which is observed as a drop in ionic current. The magnitude of the change in ionic current and the duration of the translocation vary with factors such as geometry, size and chemical composition. Different molecules can then be sensed and potentially identified based on this modulation in ionic current. [13]
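As a rough illustration of the sensing principle, the Python sketch below scans a current trace for samples that fall below a fraction of the open-pore baseline and reports the duration and mean depth of each drop. The function name `find_blockades`, the threshold fraction, the sampling interval and the toy trace are all assumptions made for illustration; real instruments apply far more elaborate filtering.

```python
from typing import List, Tuple


def find_blockades(trace: List[float], baseline: float,
                   threshold_fraction: float = 0.8,
                   sample_interval_us: float = 1.0) -> List[Tuple[int, float, float]]:
    """Return (start_index, duration_us, mean_drop) for each blockade event.

    A blockade is a run of consecutive samples below
    baseline * threshold_fraction (a hypothetical detection rule).
    """
    threshold = baseline * threshold_fraction
    events = []
    start = None

    def close_event(begin: int, end: int) -> None:
        segment = trace[begin:end]
        mean_drop = baseline - sum(segment) / len(segment)
        events.append((begin, len(segment) * sample_interval_us, mean_drop))

    for i, current in enumerate(trace):
        if current < threshold and start is None:
            start = i                      # molecule enters the pore
        elif current >= threshold and start is not None:
            close_event(start, i)          # molecule has left the pore
            start = None
    if start is not None:                  # trace ended mid-event
        close_event(start, len(trace))
    return events


# Toy trace: open-pore current of 100 with two blockades of different depth.
trace = [100, 100, 60, 60, 60, 100, 100, 40, 40, 100]
events = find_blockades(trace, baseline=100.0)
```

The two events differ in both depth and duration, which are the two quantities the text says can distinguish one molecule from another.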

Base identification

The magnitude of the electric current density across a nanopore surface depends on the nanopore's dimensions and the composition of DNA or RNA that is occupying the nanopore. Sequencing was made possible because, passing through the channel of the nanopore, the samples cause characteristic changes in the density of the electric current flowing through the nanopore. The total charge flowing through a nanopore channel is equal to the surface integral of electric current density flux across the nanopore unit normal surfaces between times t1 and t2.
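Written out, the last sentence corresponds to the expression below. The symbols are assumptions introduced here, not taken from the source: $Q$ is the total charge, $\mathbf{J}$ the current density, $S$ the pore cross-section, and $\hat{\mathbf{n}}$ its unit normal.

```latex
Q = \int_{t_1}^{t_2} \iint_{S} \mathbf{J}(\mathbf{r}, t) \cdot \hat{\mathbf{n}} \, dA \, dt
```

The inner surface integral is the instantaneous ionic current $I(t)$, so $Q$ is that current integrated over the interval from $t_1$ to $t_2$.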

Biological

Biological nanopore sequencing relies on the use of transmembrane proteins, called porins, that are embedded in lipid membranes so as to create size-dependent porous surfaces, with nanometer-scale "holes" distributed across the membranes. A sufficiently low translocation velocity can be attained by incorporating various proteins that facilitate the movement of DNA or RNA through the pores of the lipid membranes. [14]

Alpha hemolysin

Alpha hemolysin (αHL), a nanopore from bacteria that causes lysis of red blood cells, has been studied for over 15 years. [15] To this point, studies have shown that all four bases can be identified using ionic current measured across the αHL pore. [16] [17] The structure of αHL is advantageous for identifying specific bases moving through the pore. The αHL pore is ~10 nm long, with two distinct 5 nm sections. The upper section consists of a larger, vestibule-like structure, and the lower section consists of three possible recognition sites (R1, R2, R3) and is able to discriminate between each base. [16] [17]

Sequencing using αHL has been developed through basic study and structural mutations, moving towards the sequencing of very long reads. Protein mutation of αHL has improved the detection abilities of the pore. [18] The next proposed step is to bind an exonuclease onto the αHL pore. The enzyme would periodically cleave single bases, enabling the pore to identify successive bases. Coupling an exonuclease to the biological pore would slow the translocation of the DNA through the pore, and increase the accuracy of data acquisition.

Notably, theorists have shown that sequencing via exonuclease enzymes as described here is not feasible. [19] This is mainly due to diffusion related effects imposing a limit on the capture probability of each nucleotide as it is cleaved. This results in a significant probability that a nucleotide is either not captured before it diffuses into the bulk or captured out of order, and therefore is not properly sequenced by the nanopore, leading to insertion and deletion errors. Therefore, major changes are needed to this method before it can be considered a viable strategy.

A recent study has pointed to the ability of αHL to detect nucleotides at two separate sites in the lower half of the pore. [20] The R1 and R2 sites enable each base to be monitored twice as it moves through the pore, creating 16 different measurable ionic current values instead of 4. This method improves upon the single read through the nanopore by doubling the number of sites at which the sequence is read per nanopore.
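One way to read the figure of 16, assuming the current at any instant depends on which base sits at R1 and which sits at R2, is that there are 4 × 4 possible combinations. The short Python sketch below simply enumerates them; the pairing of one base per site is an assumption made for illustration.

```python
from itertools import product

BASES = "ACGT"

# Every (base at R1, base at R2) combination, written as a two-letter string.
site_pairs = ["".join(pair) for pair in product(BASES, repeat=2)]

single_site_levels = len(BASES)        # one recognition site
two_site_levels = len(site_pairs)      # two recognition sites
```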

MspA

Mycobacterium smegmatis porin A (MspA) is the second biological nanopore currently being investigated for DNA sequencing. The MspA pore has been identified as a potential improvement over αHL due to a more favorable structure. [21] The pore is described as a goblet with a thick rim and a diameter of 1.2 nm at the bottom of the pore. [22] A natural MspA, while favorable for DNA sequencing because of its shape and diameter, has a negative core that prohibits single-stranded DNA (ssDNA) translocation. The natural nanopore was modified to improve translocation by replacing three negatively charged aspartic acids with neutral asparagines. [23]

The electric current detection of nucleotides across the membrane has been shown to be tenfold more specific than αHL for identifying bases. [21] Utilizing this improved specificity, a group at the University of Washington has proposed using double stranded DNA (dsDNA) between each single stranded molecule to hold the base in the reading section of the pore. [21] [23] The dsDNA would halt the base in the correct section of the pore and enable identification of the nucleotide. A recent grant has been awarded to a collaboration from UC Santa Cruz, the University of Washington, and Northeastern University to improve the base recognition of MspA using phi29 polymerase in conjunction with the pore. [24]

Solid state

Solid state nanopore sequencing approaches, unlike biological nanopore sequencing, do not incorporate proteins into their systems. Instead, solid state nanopore technology uses various metal or metal alloy substrates with nanometer sized pores that allow DNA or RNA to pass through. These substrates most often serve integral roles in the sequence recognition of nucleic acids as they translocate through the channels along the substrates. [25]

Tunneling current

Measurement of electron tunneling through bases as ssDNA translocates through the nanopore is an improved solid state nanopore sequencing method. Most research has focused on proving bases could be determined using electron tunneling. These studies were conducted using a scanning probe microscope as the sensing electrode, and have proved that bases can be identified by specific tunneling currents. [26] After the proof of principle research, a functional system must be created to couple the solid state pore and sensing devices.

Researchers at the Harvard Nanopore group have engineered solid state pores with single walled carbon nanotubes across the diameter of the pore. [27] Arrays of pores are created and chemical vapor deposition is used to create nanotubes that grow across the array. Once a nanotube has grown across a pore, the diameter of the pore is adjusted to the desired size. Successful creation of a nanotube coupled with a pore is an important step towards identifying bases as the ssDNA translocates through the solid state pore.

Another method is the use of nanoelectrodes on either side of a pore. [28] [29] The electrodes are specifically created to enable a solid state nanopore's formation between the two electrodes. This technology could be used to not only sense the bases but to help control base translocation speed and orientation.

Fluorescence

An effective technique to determine a DNA sequence has been developed using solid state nanopores and fluorescence. [30] This fluorescence sequencing method converts each base into a characteristic representation of multiple nucleotides which bind to a fluorescent probe strand, forming dsDNA. With the two color system proposed, each base is identified by two separate fluorescences, and will therefore be converted into two specific sequences. Probes consist of a fluorophore and quencher at the start and end of each sequence, respectively. Each fluorophore will be extinguished by the quencher at the end of the preceding sequence. When the dsDNA is translocating through a solid state nanopore, the probe strand will be stripped off, and the upstream fluorophore will fluoresce. [30] [31]

This sequencing method has a capacity of 50-250 bases per second per pore, and a four color fluorophore system (in which each base could be converted to one sequence instead of two) will sequence over 500 bases per second. [30] The advantages of this method stem from the clear sequencing readout, which uses a camera instead of noisy current methods. However, the method does require sample preparation to convert each base into an expanded binary code before sequencing. Instead of one base being identified as it translocates through the pore, ~12 bases are required to find the sequence of one base. [30]
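The idea of a two-color binary code can be sketched in Python as follows. The specific base-to-bit mapping and the color names are hypothetical, chosen only for illustration; the source does not give the actual code used in [30].

```python
# Hypothetical 2-bit code per base and a hypothetical color per bit.
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}
BIT_TO_COLOR = {"0": "red", "1": "green"}
COLOR_TO_BIT = {color: bit for bit, color in BIT_TO_COLOR.items()}


def encode(sequence: str) -> list:
    """Expand each base into the two color flashes that represent it."""
    colors = []
    for base in sequence:
        for bit in BASE_TO_BITS[base]:
            colors.append(BIT_TO_COLOR[bit])
    return colors


def decode(colors: list) -> str:
    """Read flashes two at a time and map each pair back to a base."""
    bits = "".join(COLOR_TO_BIT[color] for color in colors)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


flashes = encode("GAT")
recovered = decode(flashes)
```

Each base costs two flashes here, which mirrors the trade-off described above: a cleaner optical readout in exchange for an expanded representation of the sequence.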

Comparison of Biological and Solid State Nanopore Sequencing Systems Based on Major Constraints
[Table: columns are Biological and Solid State; rows are Low Translocation Velocity, Dimensional Reproducibility, Stress Tolerance, and Ease of Fabrication.]

Major constraints

  1. Low Translocation Velocity: Whether a sample passes through a unit's pore slowly enough to be measured
  2. Dimensional Reproducibility: The likelihood that a unit's pore is made the proper size
  3. Stress Tolerance: The sensitivity of a unit to internal environmental conditions
  4. Longevity: The length of time that a unit is expected to remain functioning
  5. Ease of Fabrication: The ability to produce a unit, usually with regard to mass production

Biological: advantages and disadvantages

Biological nanopore sequencing systems have several fundamental characteristics that make them advantageous compared with solid state systems, each of which stems from the incorporation of proteins into their technology. Uniform pore structure, the precise control of sample translocation through pore channels, and even the detection of individual nucleotides in samples can be facilitated by unique proteins from a variety of organism types.

The use of proteins in biological nanopore sequencing systems, despite the various benefits, also brings with it some negative characteristics. The sensitivity of the proteins in these systems to local environmental stress has a large impact on the overall longevity of the units. One example is that a motor protein may only unzip samples with sufficient speed within a certain pH range and not operate fast enough outside it; this constraint affects the functionality of the whole sequencing unit. Another example is that a transmembrane porin may only operate reliably for a certain number of runs before it breaks down. Both of these examples would have to be controlled for in the design of any viable biological nanopore system, something that may be difficult to achieve while keeping the cost of such a technology as low, and as competitive with other systems, as possible. [14]

One challenge for the 'strand sequencing' method was refining it to improve its resolution enough to detect single bases. In the methods of the early papers, a nucleotide needed to be repeated about 100 times in succession to produce a measurable characteristic change. This low resolution arises because the DNA strand moves rapidly through the nanopore, at a rate of 1 to 5 μs per base. This makes recording difficult and prone to background noise, so single-nucleotide resolution is not obtained. The problem is being tackled either by improving the recording technology or by controlling the speed of the DNA strand through various protein engineering strategies. Oxford Nanopore employs a 'kmer approach', analyzing more than one base at any one time so that stretches of DNA are subject to repeat interrogation as the strand moves through the nanopore one base at a time. [32] Various techniques, including algorithmic ones, have been used to improve the performance of the MinION technology since it was first made available to users. [33] More recently, effects of single bases due to secondary structure or released mononucleotides have been shown. [34] [35]
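The "repeat interrogation" in the kmer approach can be pictured with a sliding window. The Python sketch below is an illustration only: the window size k = 3 and the function names are assumptions, and it says nothing about how Oxford Nanopore's actual base caller works.

```python
def kmers(sequence: str, k: int) -> list:
    """All overlapping windows of length k, advancing one base at a time."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def interrogation_counts(sequence: str, k: int) -> list:
    """For each base position, count how many windows contain it."""
    counts = [0] * len(sequence)
    for start in range(len(sequence) - k + 1):
        for position in range(start, start + k):
            counts[position] += 1
    return counts


windows = kmers("ACGTAC", 3)
counts = interrogation_counts("ACGTAC", 3)
```

Away from the ends of the strand, every base appears in k successive windows, so each base contributes to k separate measurements.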

Professor Hagan Bayley proposed in 2010 that creating two recognition sites within an alpha-hemolysin pore may confer advantages in base recognition. [36]

One challenge for the 'exonuclease approach', [37] where a processive enzyme feeds individual bases, in the correct order, into the nanopore, is to integrate the exonuclease and the nanopore detection systems. In particular, [38] the problem is that when an exonuclease hydrolyzes the phosphodiester bonds between nucleotides in DNA, the subsequently released nucleotide is not necessarily guaranteed to directly move into, say, a nearby alpha-hemolysin nanopore. One idea is to attach the exonuclease to the nanopore, perhaps through biotinylation to the beta barrel hemolysin. [38] The central pore of the protein may be lined with charged residues arranged so that the positive and negative charges appear on opposite sides of the pore. However, this mechanism is primarily discriminatory and does not constitute a mechanism to guide nucleotides down some particular path.

The challenges of sequencing by synthesis

DNA sequencing-by-synthesis (SBS) technology, using a polymerase or ligase enzyme as its core biochemistry, has already been incorporated in several second-generation DNA sequencing systems with significant performance. Notwithstanding the substantial success of these SBS platforms, challenges continue to limit the ability to reduce the cost of sequencing a human genome to $100,000 or less. Achieving dramatically reduced cost with enhanced throughput and quality will require the seamless integration of scientific and technological effort across disciplines within biochemistry, chemistry, physics and engineering. The challenges include sample preparation, surface chemistry, fluorescent labels, optimizing the enzyme-substrate system, optics, instrumentation, understanding tradeoffs of throughput versus accuracy, and read-length/phasing limitations. By framing these challenges in a manner accessible to a broad community of scientists and engineers, we hope to solicit input from the broader research community on means of accelerating the advancement of genome sequencing technology.

ELI5 how do genetics, mapping genomes, sequencing DNA and all that work?

How do they do it and what do we learn from it? Just curious.

DNA is a long polymer of repeating nucleotides, which can be one of four types - A, C, G, T. For all intents and purposes you can treat nucleotides as letters of the language of DNA. Three of these nucleotides code for a particular amino acid - a word, if you will - and there are about 20 "words" in the vocabulary of human DNA. Now, if we string together a couple hundred amino acids, we get proteins - or sentences. These proteins have a function in your body - providing structural support, acting as a signalling molecule, helping you digest food, producing another molecule, etc. Ultimately, DNA is what determines what proteins are expressed - and that's why many traits are hereditary.
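To make the "three letters per word" idea concrete, here is a small Python sketch that reads a DNA string three letters at a time. The codon table is deliberately partial (only five of the 64 codons), and the function name `translate` is just an illustrative choice.

```python
# Partial codon table (DNA coding-strand codons); a full table has 64 entries.
CODON_TABLE = {
    "ATG": "Met",   # also the usual start signal
    "GCC": "Ala",
    "AAA": "Lys",
    "TGG": "Trp",
    "TAA": "STOP",
}


def translate(dna: str) -> list:
    """Read codons left to right until a stop codon or the end of the string."""
    amino_acids = []
    for i in range(0, len(dna) - 2, 3):
        word = CODON_TABLE[dna[i:i + 3]]   # KeyError if codon not in partial table
        if word == "STOP":
            break
        amino_acids.append(word)
    return amino_acids


protein = translate("ATGGCCAAATGGTAA")
```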

DNA is usually double-stranded, and each letter has its corresponding partner on the other strand. If A is on one strand, directly opposite it must be T. The same goes for C and G. During DNA replication, DNA unwinds and becomes single-stranded. Special proteins - DNA polymerases - add nucleotides one at a time onto the single-stranded DNA, according to the pairing rules mentioned above.
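The pairing rule is simple enough to write down directly. In this Python sketch (function name chosen for illustration), the partner strand is built one letter at a time, loosely mimicking what the polymerase does:

```python
PARTNER = {"A": "T", "T": "A", "C": "G", "G": "C"}


def partner_strand(strand: str) -> str:
    """Build the opposite strand one nucleotide at a time using A-T and C-G pairing."""
    new_strand = []
    for letter in strand:
        new_strand.append(PARTNER[letter])
    return "".join(new_strand)


opposite = partner_strand("ATCG")
```

Applying the function twice gives back the original strand, which is why either strand carries the full information.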

DNA sequencing is what one does to decipher the "letters" of DNA. Most modern techniques are based on the work of Frederick Sanger - with the sequencing technique named after him (Sanger sequencing). What he did was ingenious - he let DNA replication occur normally, but alongside the normal nucleotides, he threw in special ones that stop replication. So what ends up happening is that we get strands of different lengths, stopped at different letters in the sentence. By studying the lengths of these DNA strands, one can decipher the language. For example, if Sanger threw in special "C" nucleotides, and found DNA fragments of length 4 and 8, then he knew that the fourth and eighth letters must be "C". You repeat the same process for the other letters.
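The length-to-letter logic can be sketched in a few lines of Python. The strand "GATCAGTC" is made up so that C falls at positions 4 and 8, matching the example; the function names are illustrative.

```python
def termination_lengths(strand: str, base: str) -> list:
    """Fragment lengths produced when copying stops at every occurrence of `base`."""
    return [i + 1 for i, letter in enumerate(strand) if letter == base]


def read_sequence(lengths_by_base: dict) -> str:
    """Rebuild the strand: a fragment of length n ending in X means letter n is X."""
    letter_at = {}
    for base, lengths in lengths_by_base.items():
        for n in lengths:
            letter_at[n] = base
    return "".join(letter_at[n] for n in sorted(letter_at))


strand = "GATCAGTC"
lengths_by_base = {base: termination_lengths(strand, base) for base in "ACGT"}
rebuilt = read_sequence(lengths_by_base)
```

Because the letters are keyed by fragment length, any number of fragments of the same length still contribute exactly one letter to the result.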

Nowadays we mainly use fluorescence to detect it. We attach a fluorescent molecule to the nucleotides. These molecules are designed to glow when they are cut off, and we have our DNA polymerase cut off the fluorescent probe when adding on the next nucleotide. Furthermore, we can add different coloured probes, so that one corresponds to each letter. Now we let DNA polymerase do its job, and we simply "read" the colours to get the sequence.

6.1.3 Manipulating Genomes Flashcards

What is genetic engineering?

- changing the genetic makeup of an organism

- genes are taken from one organism and placed into another

- to form recombinant DNA

- the protein coded by the gene is then produced by the transgenic organism

- DNA from different organisms/sources joined together by complementary binding

Why do we genetically engineer organisms?

- to improve or introduce a feature in the recipient:

- herbicide resistant gene: can kill weeds, not harm crop

- growth controlling gene promotes muscle growth in cattle

- engineer organisms to produce useful products:

Give an overview of gene transfer processes

1. Gene is identified and cut out or made

2. Multiple copies of the gene are produced

3. Gene is inserted into vector for delivery into required cell

4. Gene inserted into recipient cells by vector and cells express the gene

5. Genetically transformed cells identified and cloned

Describe how to obtain the required gene in genetic engineering

- mRNA can be obtained from cells where the gene is expressed

- reverse transcriptase is used to form a single strand of complementary DNA (cDNA) using the mRNA as a template

- addition of primers and DNA polymerase can make this cDNA into a double-stranded length of DNA with a base sequence coding for the protein

If scientists know the nucleotide sequence of the gene:

- the gene can be synthesised using automated polynucleotide synthesis

- or the gene can be amplified (copied) from the genomic DNA using the polymerase chain reaction (PCR)

- or a DNA probe can be used to locate a gene within the genome and then the gene can be cut out using restriction enzymes

What are restriction endonuclease enzymes?

- made by bacteria and archaea to protect from phage virus attack

- they cut up foreign viral DNA, preventing the viruses from making copies

- the prokaryotic DNA is protected from the action of these endonucleases by being methylated at the recognition sites

- named according to the bacterium they have been obtained from e.g. EcoRI

- used in molecular biology and biotechnology as they recognise specific sequences within a length of DNA and cleave the molecule there

How do restriction endonuclease enzymes work?

- cut DNA at specific recognition sites which are 4-6 base pairs long

- enzymes recognise a palindromic sequence (one that reads the same from 5' to 3' on both strands)

- some make a staggered cut leaving sticky ends, others make a cut that produces blunt ends

- some need Mg2+ as a cofactor
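A minimal Python sketch of these points, using EcoRI as the example: its recognition site is GAATTC and it cuts after the first G, leaving sticky ends. The helper names and the test DNA string are made up for illustration, and only the top strand is modelled.

```python
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """The opposite strand, read in its own 5' to 3' direction."""
    return "".join(PAIR[base] for base in reversed(seq))


def is_palindromic(site: str) -> bool:
    """True if the site reads the same 5' to 3' on both strands."""
    return site == reverse_complement(site)


def digest(dna: str, site: str = "GAATTC", offset: int = 1) -> list:
    """Cut the top strand `offset` bases into every occurrence of `site`."""
    cuts = []
    i = dna.find(site)
    while i != -1:
        cuts.append(i + offset)
        i = dna.find(site, i + 1)
    bounds = [0] + cuts + [len(dna)]
    return [dna[a:b] for a, b in zip(bounds, bounds[1:])]


fragments = digest("TTGAATTCAAGGAATTCCC")
```

Each fragment after the first begins with AATTC, the top-strand side of the staggered cut; the four-base AATT overhang that forms the sticky end sits on the opposite strand, which this sketch does not model.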

How are multiple copies of the gene made in genetic engineering?

- by using polymerase chain reaction (PCR)

How can you place the gene into a vector for delivery into recipient cell?

- plasmids from bacteria can be mixed with restriction enzymes that cut the plasmid DNA at specific recognition sites

- the cut plasmid has exposed unpaired nucleotide bases called sticky ends

- if free nucleotide bases, which are complementary to the sticky ends of the plasmid, are added to the ends of the gene to be inserted, then the gene and plasmid should anneal (bind)

- DNA ligase then joins the annealed gene and plasmid

- seal the gene into an attenuated (weakened) virus that could carry it to a host cell

- place the gene inside liposomes

- ligase links nucleotides together

- catalyses the formation of covalent (phosphodiester) bonds

- sealing sugar-phosphate backbone

Why do you need various methods to aid vector getting into recipient cell?

- DNA does not easily cross the recipient's plasma membrane

What methods help the vector to get into the recipient cell?

- heat shock treatment: subject bacteria to alternating periods of cold (0ºC) and heat (42ºC) in the presence of calcium chloride

- walls and membrane will become more porous and allow in the recombinant vector

- this is because calcium ions surround the negatively charged parts of both the DNA molecules and the phospholipids in the cell membrane, reducing repulsion between the foreign DNA and the host cell membrane

- electroporation: a high voltage pulse is applied to disrupt cell membranes

- electrofusion: electrical fields can help to introduce DNA into cells

- transfection: DNA can be packaged into a bacteriophage which can then infect the host cell

- Ti (recombinant) plasmids are inserted into the bacterium Agrobacterium tumefaciens which infects some plants and naturally inserts its genome into host cell genomes

- Gene gun: small pieces of gold/tungsten are coated with DNA and shot into plant cells

Why do you have to identify cells that have successfully taken up the gene?

- not all bacteria contain the gene because

- some bacteria may not have taken up a plasmid

- some plasmids might have sealed back up after being cut with the restriction enzyme, so they didn't take in the gene

- the only bacteria we want are the ones that have the plasmid that contains the gene of interest

How do you identify the cells that have successfully taken up the gene and clone them in genetic engineering?

- replica plating and antibiotic resistance genes

- fluorescent marker gene from jellyfish

What is reverse transcriptase?

- retroviruses, such as HIV, contain RNA that they inject into the host cell; they have reverse transcriptase enzymes that catalyse the production of cDNA

- they use their RNA as a template

- this is the reverse of transcription

- these enzymes are useful for genetic engineering
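As a rough sketch of what "using RNA as a template" means, the Python function below builds the single cDNA strand for a short, made-up mRNA. The function name is illustrative; the only rules applied are the base pairings (A-T, U-A, C-G, G-C) and the fact that the new strand runs antiparallel to the template.

```python
RNA_TO_DNA_PAIR = {"A": "T", "U": "A", "C": "G", "G": "C"}


def reverse_transcribe(mrna: str) -> str:
    """Return the cDNA strand, written 5' to 3', for an mRNA written 5' to 3'."""
    paired = [RNA_TO_DNA_PAIR[base] for base in mrna]
    # The cDNA is antiparallel to its template, so reverse it to read 5' to 3'.
    return "".join(reversed(paired))


cdna = reverse_transcribe("AUGGC")
```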

How is insulin made from GM bacteria?

- scientists can obtain mRNA from beta cells of the islets of Langerhans in the human pancreas

1. adding reverse transcriptase enzymes makes a single strand of cDNA and treatment with DNA polymerase makes a double strand: the gene

2. addition of free unpaired nucleotides at the ends of the DNA produces sticky ends

3. with ligase enzyme, the insulin gene can be inserted into plasmids extracted from E.coli bacteria

- these are now called recombinant plasmids, as they contain inserted DNA

4. E.coli bacteria are mixed with recombinant plasmids and subjected to heat shock in the presence of calcium chloride, so they will take up the plasmids

Why do bacteria have to be safely contained when making insulin?

- transgenic bacteria have resistance to some antibiotics, we do not want them to escape from labs into the wild

- therefore, they have a gene knocked out, which means they cannot synthesise a particular nutrient

How did Fred Sanger sequence DNA in 1975?

- he used a single strand of DNA as a template for four experiments

- each dish contained a solution with the four bases plus DNA polymerase

- a modified version of one of the DNA bases was added to each dish

- it was modified in the way that once incorporated into the synthesised complementary strand of DNA, no more bases could be added

- each modified base was also labelled with a radioactive isotope

- as the reaction progressed, thousands of DNA fragments of varying lengths were generated

- DNA fragments were passed through a gel by electrophoresis

- smaller fragments travelled further, so they were sorted by length

- the nucleotide base at the end of each fragment was read according to its radioactive label

- although this was efficient and safe, he had to count each base one by one, so it was time-consuming

How was DNA cloned to allow it to be sequenced in early DNA sequencing?

- gene was isolated using restriction enzymes from a bacterium

- DNA is then inserted into a bacterial plasmid and then into an E.coli host, which divided many times, enabling the plasmid with the DNA to be copied

- these lengths of DNA were then isolated using plasmid preparation techniques and then sequenced

When was the first DNA sequencing machine developed?

- used fluorescent dyes instead of radioactive labels

- technicians had to read autoradiograms

What is high throughput sequencing?

- a variety of approaches was used to develop fast, cheap methods to sequence genomes

Describe pyrosequencing.

- it involves synthesising a single strand of DNA, complementary to the strand to be sequenced, one base at a time, while detecting, by light emission, which base was added at each step

1. a long length of DNA to be sequenced is mechanically cut into fragments of 300-800 base pairs, using a nebuliser

2. these lengths are then degraded into single-stranded DNA (ssDNA)

- these are the template DNAs and are immobilised

3. a sequencing primer is added and the DNA is then incubated with the enzymes DNA polymerase, ATP sulfurylase, luciferase and apyrase and the substrates adenosine 5' phosphosulfate (APS) and luciferin

- only one of the four possible activated nucleotides, ATP, TTP, CTP, GTP is added at any one time and any light generated is detected

- one activated nucleotide, such as TTP is incorporated into the complementary strand of DNA

- as this happens, two extra phosphoryls are released as pyrophosphate (PPi)
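The add-one-nucleotide-and-watch-for-light cycle can be sketched in Python as below. Several things here are assumptions for illustration: the dispensation order "TACG", the function names, and the rule that the light signal equals the number of identical bases incorporated in one step (the cards above do not state how runs of the same base are handled).

```python
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def pyrosequence(template: str, dispensation_order: str = "TACG") -> list:
    """Return (nucleotide_added, light_signal) for each dispensation step.

    `template` is written in the order the polymerase reads it.
    """
    signals = []
    position = 0
    while position < len(template):
        for nucleotide in dispensation_order:
            incorporated = 0
            # The nucleotide is only incorporated if it pairs with the template.
            while (position < len(template)
                   and PAIR[template[position]] == nucleotide):
                incorporated += 1
                position += 1
            signals.append((nucleotide, incorporated))
            if position >= len(template):
                break
    return signals


def call_sequence(signals: list) -> str:
    """Rebuild the synthesised strand from the recorded light signals."""
    return "".join(nucleotide * count for nucleotide, count in signals)


signals = pyrosequence("AACGT")
synthesised = call_sequence(signals)
```

Steps with a signal of zero are the dispensations where the added nucleotide did not match the template and no light was produced.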

