# 2.5: DNA Replication - Biology

### The scope of the problem

In this module we discuss the replication of DNA - one of the key requirements for a living system to regenerate and create the next generation. Let us first briefly consider the scope of the problem by way of a literary analogy.

The human genome consists of roughly 6.5 billion base pairs of DNA if one considers the full diploid genome (i.e. if you count the DNA inherited from both parents). Six point five billion looks like this: 6,500,000,000. That's a large number. To get a better idea of what that number means, imagine that our DNA is a set of written instructions for constructing one of us. By analogy we can then compare it to another written document. For this example we begin by considering Tolstoy's "War and Peace", a novel many people are familiar with for its voluminous nature. Wikipedia estimates that "War and Peace" contains about 560,000 words. A second written work many are familiar with is the seven-volume "Harry Potter" series by J.K. Rowling. This work checks in at ~1,080,000 words (Referenced Statistics on Wikipedia). If we assume that the length of the average English word is 5 letters, the two literary works are 2.8 million and 5.4 million letters in length, respectively. Therefore, even all seven volumes of "Harry Potter" have over 1000x fewer letters than our own genomes. The number of letters in these seven novels is, however, much closer to the number of nucleotides in a typical bacterial genome.
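The arithmetic behind this comparison can be checked with a short sketch. The word counts, the 5-letters-per-word assumption and the genome size are the estimates quoted in the text; the function and variable names are illustrative only.

```python
# Rough size comparison between two literary works and the human genome.
# All numbers are the estimates quoted in the text, not measured values.
AVG_LETTERS_PER_WORD = 5
GENOME_BASE_PAIRS = 6_500_000_000  # full diploid human genome

word_counts = {
    "War and Peace": 560_000,
    "Harry Potter (7 volumes)": 1_080_000,
}


def letters(words: int) -> int:
    """Estimate the number of letters in a text from its word count."""
    return words * AVG_LETTERS_PER_WORD


def genome_ratio(words: int) -> float:
    """How many times larger the genome is than the text (one letter per base pair)."""
    return GENOME_BASE_PAIRS / letters(words)


for title, words in word_counts.items():
    print(f"{title}: {letters(words):,} letters; "
          f"the genome is roughly {genome_ratio(words):,.0f}x larger")
```

Treating one base pair as one letter is itself part of the analogy, so the ratios are only order-of-magnitude guides.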

Now imagine yourself copying these texts. How fast could you do it? How many mistakes are you likely to make? Do you expect there to be a trade-off between the speed at which you can copy and the accuracy? What type of resources does this process need? How much energy is required? Now imagine copying something 1000x larger!

With that in mind, it is worth noting that a human cell can take about 24 hours to divide (DNA replication must therefore be a little faster). A healthy E. coli cell may take only 20 minutes to divide (including replicating its ~4.5 million base pair genome). Both the human and the bacterium do this while typically making few enough mistakes that the subsequent generation remains viable and recognizable. That's pretty amazing!

Design challenge

If the cell is to replicate - its ultimate goal - a copy of the DNA must be created. So one clear problem/statement/question is "how can the cell effectively copy its DNA?" Given the analogy above, some relevant sub-questions might be: What are the chemical and physical properties that enable DNA to be copied (we're not just building more DNA - we're building an exact copy of its sequence)? With what fidelity must the DNA be copied? What speed must it be copied at? Where does the energy come from for this task and how much is necessary? Where do the "raw materials" come from? How do the molecular machines involved in this process couple the assembly of raw materials and the energy required to build a new DNA molecule together? The list could, of course, go on.

In the following and in lecture we will discuss how DNA replication is accomplished while keeping in mind some of these driving questions. As you go through the reading and lecture materials try to be constantly aware of these and other questions associated with this process. Use these questions as guideposts for organizing your thoughts and try to find matches between the "facts" that you think you might be expected to know and the driving questions.

## The DNA Double Helix

To build some extra context we also need a little bit of empirically determined knowledge. Perhaps one of the best known and popular features of the hereditary form of the DNA molecule is that it has a double helical tertiary structure. The appreciation of this dates to the 1950s. The story of this discovery has been widely recounted - and the details are beyond the scope of this text. Briefly, Francis Crick and James Watson are credited with determining the structure of DNA. Rosalind Franklin is now also widely credited with generating critical X-ray diffraction data that enabled Watson and Crick to piece together the puzzle of the DNA molecule.

Models of the structure of DNA revealed the molecule is made up of two strands of covalently linked nucleotides that are twisted around each other to form a right-handed helix. In each strand, nucleotides are covalently joined to two other nucleotides (except at the very ends of a linear strand) via phosphodiester bonds that link the sugars via the 5' and 3' hydroxyl groups (see panel b in the figure below) - recall that the labels 5' and 3' refer to the carbons on the sugar molecule. These sugar and phosphate chains form a contiguous set of covalent links that are often referred to as the "backbone" of the structure. In a linear molecule, each strand has two free ends. One is termed the 5'-end because the unlinked functional group that is typically involved in joining nucleotides is the phosphate linked to the 5' carbon. The other end of the strand is called the 3'-end because the unlinked functional group that is involved in joining nucleotides is the hydroxyl group linked to the 3' carbon of the sugar. Since the two ends of the strand are not symmetric this makes it easy to designate or describe a direction on the strand - one can, for instance, say that they are reading from the 5'-end to 3'-end to indicate that they are "walking" along the strand starting at the 5'-end and moving towards the 3'-end. This direction (5' to 3') is, by the way, the convention used by biologists for writing sequences. The two strands of covalently linked nucleotides are found to be anti-parallel to one another in the double-helix; that is, the orientation/direction of one strand is opposite to that of the other strand (see panel b in the figure below). The sugar-phosphate backbone is on the outside of the double helix, creating a band of negative charges on the surface.
By contrast, the nitrogenous bases of each of the antiparallel strands stack on the inside of the structure and oppose one another in a way that allows hydrogen bonds between unique purine/pyrimidine pairs (A pairing with T and G pairing with C) to form. "Stacking" refers to the fact that the flat planes of the bases on the same strand of DNA stack - like a stack of pancakes. These specific pairings (A with T, C with G) are said to be complementary to one another and thus the opposite strands of a double helix are often referred to as complementary strands.

Complementary strands carry redundant information. Because of the strict chemical pairing, if you know the sequence of one strand you also know the sequence of its complement. Take for example the sequence 5′- C A T A T G G G A T G - 3′. Note how the sequence is annotated with the orientation (indicated by 5' and 3' labels). The complement of this sequence - written according to the 5' to 3' convention - is: 5′- C A T C C C A T A T G - 3′. If you aren't convinced, write these two sequences out across from one another, making sure to write them as antiparallel strands. Note that the twisting of the two complementary strands around each other results in the formation of structural features called the major and minor grooves, which will become more important when we consider the binding of proteins to DNA (see panel c in the figure below).
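The pairing rule can also be checked mechanically. The following is a minimal sketch of how one might derive the complementary strand for the example sequence; the function name `reverse_complement` and the lookup table are illustrative choices, and the sketch assumes the input contains only the letters A, T, G and C.

```python
# Watson-Crick pairing rules for DNA: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq_5to3: str) -> str:
    """Return the complementary strand, written 5' to 3'.

    The strands are antiparallel, so the complement is read in the
    opposite direction: we walk the input from its 3' end to its 5' end.
    """
    return "".join(COMPLEMENT[base] for base in reversed(seq_5to3))


top = "CATATGGGATG"
bottom = reverse_complement(top)

# Write the two strands across from one another as antiparallel partners.
print("5'-" + top + "-3'")
print("3'-" + bottom[::-1] + "-5'")
print("Complement written 5' to 3':", bottom)
```

Applying the function twice returns the original sequence, which is one way of seeing that the two strands carry the same information.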

Most of the Bis2a instructors will expect you to recognize key structural features depicted in the figure below and that you will be able to create a basic figure of the structure of DNA yourself.

DNA has (a) a double helix structure and (b) phosphodiester bonds. The (c) major and minor grooves are binding sites for sequence-specific DNA binding proteins during processes such as transcription (the creation of RNA from a DNA template), the regulation of transcription, and the identification of origins of replication by specific proteins (note that DNA polymerase is not among these!). The winding of DNA around nucleosomes, in contrast, is not sequence-specific and simply involves electrostatic interactions between the negatively charged backbone and the positively charged outer surface of the nucleosome. Similarly, DNA polymerase will copy any sequence, and will bind to the 3' end of any legitimately base-paired primer.

The structure of DNA immediately suggested how DNA might replicate. The two strands could, in theory, separate, new nucleotides could be aligned in a specific order that is complementary to the template strand, and these new nucleotides could be joined together. Three competing models for replication were suggested: the conservative model, the semi-conservative model and the dispersive model.

1. Conservative: The conservative model of replication postulated that each whole double-stranded molecule could act as a template for the synthesis of a completely new double-stranded molecule. That is, if one were to put a chemical tag on the template DNA molecule after replication none of that tag would be found on the new copy. This model seems a little silly, as the strands would have to separate, duplicate, and then separate again to find and re-bond to their original partner. Or a new strand would have to be formed side by side with an intact double helix, following some sort of templating that does not involve Watson-Crick base pairing.
2. Semi-conservative: This hypothesis stipulated that each individual strand of a DNA molecule could serve as a template for a new strand, to which it would now be hydrogen bonded. In this case, if a chemical label were placed on the double stranded template DNA molecule, one strand on each of the copies would retain the label.
3. Dispersive: This model proposed that a copied double-helix would be a piecewise combination of continuous segments of "old" and "new" strands. If a chemical label were placed on a DNA molecule that was copied using a dispersive mechanism, one would find discrete segments of the resulting copy that were labeled on both strands separated by completely unlabeled parts. This model, again, seems gratuitously complicated, requiring that the template strand be repeatedly broken and rejoined to newer molecules.

Meselson and Stahl resolved the issue in 1958 when they reported results of a now famous experiment which showed that DNA replication is semi-conservative (see figure below), and that each strand is used as a template for the creation of the new strand- as they no doubt expected. To learn more about this experiment watch The Meselson-Stahl Experiment.
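To see why a labeling experiment can tell the three models apart, it can help to tabulate what each model predicts over successive rounds of replication. The sketch below is a simplified bookkeeping exercise, not a description of the actual experimental procedure: each double helix is reduced to a pair of numbers giving the labeled fraction of each strand, and the dispersive model is approximated by splitting the label evenly between the daughters. All names are illustrative.

```python
from collections import Counter
from fractions import Fraction

ZERO = Fraction(0)


def replicate(duplex, model):
    """Return the two daughter duplexes predicted by the given model.

    A duplex is a pair (a, b): the labeled fraction of each strand.
    """
    a, b = duplex
    if model == "conservative":
        # The original duplex stays intact; the copy is entirely new.
        return [(a, b), (ZERO, ZERO)]
    if model == "semi-conservative":
        # Each old strand pairs with one entirely new strand.
        return [(a, ZERO), (b, ZERO)]
    if model == "dispersive":
        # Simplification: old material is shared evenly by both daughters.
        return [(a / 2, b / 2), (a / 2, b / 2)]
    raise ValueError(f"unknown model: {model}")


def labeled_fractions(model, generations):
    """Count duplexes by overall labeled fraction after n generations."""
    population = [(Fraction(1), Fraction(1))]  # start fully labeled
    for _ in range(generations):
        population = [d for duplex in population for d in replicate(duplex, model)]
    return Counter((a + b) / 2 for a, b in population)


for model in ("conservative", "semi-conservative", "dispersive"):
    for generation in (1, 2):
        counts = labeled_fractions(model, generation)
        summary = ", ".join(f"{n} duplex(es) {float(f):.0%} labeled"
                            for f, n in sorted(counts.items(), reverse=True))
        print(f"{model}, generation {generation}: {summary}")
```

In this bookkeeping, the first generation separates the conservative model from the other two, and the second generation separates the semi-conservative model from the dispersive one.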

DNA has an anti-parallel double helix structure; the nucleotide bases are hydrogen bonded together and each strand complements the other. DNA is replicated in a semi-conservative manner: each strand is used as the template for the newly made strand.

## DNA Replication

Having established some basic structural features and the need for a semi-conservative mechanism it is important to understand what is known about the process and to think about what questions one might want to answer.

Since DNA replication is a process we can invoke the "energy story" to think about it. Recall that an energy story is there to help us think systematically about processes (how things go from A to B). In this case the process is the act of starting with one double-stranded DNA molecule and ending up with two double-stranded DNA molecules. So, we will ask things like: What does the system look like at the beginning (matter and energy) of replication? How are matter and energy transferred in the system and what catalyzes the transfers? What does the system look like at the end of the process? We can also ask questions regarding specific events that MUST happen during the process. For instance, since DNA is a long molecule and it is sometimes circular, we can ask basic questions like, where does the process of replication start? Where does it end? We can also ask practical questions about the process like, what happens when a double-stranded structure is unwound?

We consider some of these key questions in this text and in class, and encourage you to do the same.

### Requirements for DNA replication

Let's start by listing some basic functional requirements for DNA replication that we can infer just by thinking about the process that must happen and/or be required for the replication to happen. So, what do we need?

• We know that DNA is composed of nucleotides. If we are going to create a new strand we will need a source of nucleotides.
• We can infer that building a new strand of DNA will require an energy source - we should try to find this.
• We can infer that there must be a process for finding a place to start replication.
• We can infer that there will be one or more enzymes that help catalyze the process of replication.
• We can also infer that, given the enormous number of bases to be copied, some mistakes will be made.

We will find that once we understand the basics of how replication occurs, additional questions/issues will be raised!

### Nucleotide Structure

The building blocks of DNA are the nucleotides. Nucleotides are composed of a nitrogenous base, deoxyribose (a 5-carbon sugar), and a phosphate group. The nucleotide is named according to its nitrogenous base, purines such as adenine (A) and guanine (G), or pyrimidines such as cytosine (C) and thymine (T). Recall the structures below. Note that the nucleotide Adenosine triphosphate (ATP) is a precursor of the deoxyribonucleotide (dATP) which is incorporated into DNA. The other nucleotides to be employed during DNA synthesis are also nucleoside triphosphates. Hopefully this fact will provide some clues as to where the energy for DNA synthesis might come from.

Each nucleotide is made up of a sugar (ribose or deoxyribose depending on whether it builds RNA or DNA, respectively), a phosphate group, and a nitrogenous base. The purines have a double ring structure with a six-membered ring fused to a five-membered ring. Pyrimidines are smaller in size; they have a single six-membered ring structure. The carbon atoms of the five-carbon sugar are numbered 1', 2', 3', 4', and 5' (1' is read as “one prime”). The phosphate residue is attached to the hydroxyl group of the 5' carbon of one sugar of one nucleotide and the hydroxyl group of the 3' carbon of the sugar of the next nucleotide, thereby forming a 5'-3' phosphodiester bond.

## Initiation of Replication

With millions, if not billions, of nucleotides to copy, how does the DNA polymerase know where to start? Perhaps it might start anywhere, or, more reasonably, begin at one end of a chromosome and proceed to the opposite end? Neither of these hypotheses is correct. Evidence indicates that there are specific nucleotide sequences called origins of replication along the length of the DNA at which replication begins. The circular E. coli chromosome has just one of these sites; the linear eukaryotic chromosomes, in contrast, have multiple sites on every chromosome. Once this site is identified, however, there is a problem. The DNA double helix is held together by hydrogen bonds. If each strand is to be read and copied individually, there must be some mechanism responsible for dissociating the two strands. Breaking these hydrogen bonds is an endergonic process. Where does the energy come from and how is this reaction catalyzed? Basic reasoning should, at this point, lead to the hypothesis that a protein catalyst is involved and that this enzyme couples the endergonic separation of strands to some exergonic process.

It turns out that the details of this process and the proteins involved differ depending on the specific organism in question and many of the molecular level details are yet to be completely understood. There are, however, some common features in the establishment of replication origins in eukaryotes, bacteria and archaea. First, proteins generally called "initiators" have the capacity to bind DNA at or very near the DNA sequences that mark the origins of replication. The interaction of the initiator proteins with the DNA helps to destabilize the double helix and also to recruit other proteins, including an enzyme called a helicase. In this case the energy required to destabilize the DNA double helix seems to come from the formation of new associations between DNA and the initiator proteins. The DNA helicase, in contrast, once loaded onto the origin, couples the exergonic hydrolysis of ATP to the unwinding of the DNA double helix. Additional proteins must be recruited to the partially unwound initiation complex. These include, but are not limited to, enzymes called primase and DNA polymerase. While the initiators are lost soon after the initiation of replication, the rest of the proteins work in concert to execute the process of DNA replication. This complex of enzymes functions at Y-shaped structures in the DNA called replication forks (see figure below). For any replication event two replication forks may be formed at each origin of replication, extending in both directions.

Suggested discussion

Why would different organisms have different numbers of replication origins? What could the benefit be to having more than one? Is there a drawback to having more than one?

Suggested discussion

Given what needs to happen at origins of replication, can you use logic to infer and propose for discussion some potential features that distinguish replication origins from other segments of DNA?

At the origin of replication, a replication bubble forms. The replication bubble is composed of two replication forks, each traveling in opposite directions along the DNA. The replication forks include all of the enzymes required for replication to occur - they are just not drawn explicitly in the figure so as to provide room to illustrate the relationships between the template and new DNA strands. What enzymes are needed, and where would they be located in this drawing?

### Elongation of Replication

The melting open of the DNA double helix and the assembly of the DNA replication complex is just the first step in the process of replication. Now the process of creating a new strand actually needs to get started. Here additional challenges are encountered. The first obvious issue is that of determining which of the two strands should be copied at any replication fork (i.e. which strand will serve as a template for semi-conservative synthesis). Are both strands equally viable alternatives? There is also the problem of actually getting the process of new strand synthesis started. Can the DNA polymerase initiate a new strand on its own? It has been experimentally determined that DNA polymerase can NOT initiate strand synthesis. Rather, DNA polymerase requires a short stretch of double-stranded nucleic acid followed by single-stranded template. DNA polymerase can only add a base to a properly base-paired preexisting 3' end. This raises a bit of a problem for the initiation of DNA synthesis, doesn't it?

Fortunately, DNA polymerase can add a dNTP to an RNA molecule hybridized to a DNA template, and RNA polymerases do not require a preexisting base-paired 3' end to initiate synthesis. Thus every new strand of DNA is actually initiated from a short (in E. coli, fewer than a dozen bases) RNA primer (these are depicted as short green lines in the figures above and below). The creation of a short primer is carried out by the enzyme primase. Primase is a very slow (well, relatively slow - requiring one second to make a primer) and error-prone polymerase. The error-prone nature of its activity is not a problem, because the cell will remove all the primers later in the process of DNA synthesis - and we'll revisit this later in this reading.

During the process of strand elongation, DNA polymerase polymerizes a new covalently-linked strand of DNA nucleotides (in E. coli this replicative polymerase is called DNA polymerase III; in eukaryotes polymerase nomenclature is more complex and the roles of the several replicative polymerases are not completely understood). DNA polymerase will ride along the template strand in that strand's 3' to 5' direction, synthesizing a new strand by adding bases to the nascent (new-born) strand's 3' end. (I suggest you make your own diagram to clarify this directionality - remember the two strands are antiparallel - the 3' to 5' direction on the template is the 5' to 3' direction on the new strand! If this sounds confusing, then get to work on that diagram asap).
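As a complement to drawing the diagram, the directionality can be traced in a toy model. The sketch below is only an illustration under simplifying assumptions: the primer is written with DNA letters (in the cell it is RNA), it is assumed to pair with the 3' end of the template, and the function name `extend_primer` is hypothetical. The template is read from its 3' end towards its 5' end while the new strand grows at its own 3' end.

```python
# Watson-Crick pairing rules for DNA.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extend_primer(template_5to3: str, primer_5to3: str) -> str:
    """Extend a primer along a template and return the new strand (5' to 3').

    Toy model: the primer must be base-paired to the 3' end of the template;
    without a paired primer there is no 3' end to add nucleotides to.
    """
    if not primer_5to3:
        raise ValueError("no primer: there is no 3' end to extend")
    n = len(primer_5to3)
    paired_region = template_5to3[-n:]  # the 3' end of the template
    expected = "".join(PAIR[b] for b in reversed(paired_region))
    if primer_5to3 != expected:
        raise ValueError("primer is not properly base-paired to the template")

    new_strand = list(primer_5to3)
    # Walk the template 3' -> 5' (right to left in the 5'->3' string),
    # appending each complementary base to the 3' end of the new strand.
    for i in range(len(template_5to3) - n - 1, -1, -1):
        new_strand.append(PAIR[template_5to3[i]])
    return "".join(new_strand)


template = "CATATGGGATG"  # written 5' to 3'
print(extend_primer(template, "CAT"))
```

Each loop iteration moves one position closer to the 5' end of the template while the new strand gets one base longer at its 3' end, which is the antiparallel relationship described above.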

Let us briefly consider the reaction involving the addition of a single nucleotide. The primer provides an important 3' hydroxyl on which to begin synthesis. The next deoxyribonucleotide triphosphate enters the binding site of the DNA polymerase and is oriented by the polymerase such that a hydrolysis of the incoming 5' triphosphate can occur, releasing pyrophosphate and coupling this exergonic reaction to the endergonic synthesis of a phosphodiester bond between the 5' phosphate of the incoming nucleotide and the 3' hydroxyl group of the primer. The degradation of pyrophosphate will add an extra energetic kick to this reaction. This process will then be repeated. There is no real "termination site" for DNA synthesis; synthesis by any individual polymerase will continue until the replication complex dissociates from the DNA. The complex might "fall off" the end of a broken template strand, it might run off the end of a linear chromosome, or it might run up against the 5' end of a previously synthesized primer (more on this later). This sounds much more complicated in text than it really is: see the "Leading and Lagging strand synthesis" diagram below (and definitely draw your own version).

Correct base pairing, or selection of correct nucleotide to add at each step, is accomplished by structural constraints felt by the DNA polymerase and the energetically favorable hydrogen bonds formed between complementary nucleotides. The process is energetically driven by the hydrolysis of the incoming 5' triphosphate and the energetically favorable interactions formed by the inter-nucleotide interactions in the growing double helix (base stacking and complementary base pairing hydrogen bonds).

After elongation of any particular new molecule is complete, a different DNA polymerase (in bacteria this is usually called DNA Polymerase I) comes in to remove the RNA primer and to synthesize the remaining bit of missing DNA. It will use the 3' end of the new strand recently created by DNA polymerase III as its primer.

The movement of the replication fork, and the separation of strands by DNA helicase, induces over-winding of the DNA ahead of the fork (imagine taking two strings twisted around each other, and trying to peel them apart - you'd end up with a tangle in the unseparated portion). Another ATP-consuming enzyme called, in E. coli, gyrase, helps to relieve this stress. This enzyme is located just ahead of the replication fork, and repeatedly nicks and rejoins the DNA, allowing the supercoils to relax.

DNA polymerase catalyzes the addition of the 5' phosphate group from an incoming nucleotide to the 3' hydroxyl group of the previous nucleotide. This process creates a phosphodiester bond between the nucleotides while hydrolyzing the phosphoanhydride bond in the nucleotide.
Source: http://bio1151.nicerweb.com/Locked/m...h16/elong.html

Suggested discussion

Create an energy story for the addition of a nucleotide onto a polymer as shown in the figure above. This will be an explicit learning goal from some of your Bis2A instructors.

The discussion above about strand elongation describes the process of new strand synthesis if that strand happens to be synthesized in the same direction as the replication fork is or appears to be moving along the DNA. This strand can be synthesized continuously and is called the leading strand (and it moves along the "leading strand template"). However, both strands of the original DNA double helix must be copied, and to minimize the accumulation of single-stranded DNA (which is both susceptible to damage and difficult to repair), both strands are replicated simultaneously. Since the DNA polymerase can only synthesize DNA in a 5' to 3' direction, the polymerization of the strand opposite the leading strand must occur in the direction opposite to that in which the replication fork is traveling (this would be a good time to try to draw all of this, to orient yourself). This strand is called the lagging strand and due to geometric constraints must be synthesized through a repeated series of RNA priming and DNA synthesis events, creating short segments of new DNA called Okazaki fragments. As noted, the initiation of synthesis of each Okazaki fragment requires primase to synthesize an RNA primer and each of these RNA primers must be ultimately removed and replaced with DNA nucleotides by a different DNA polymerase. DNA polymerase I performs this job; it binds to the 3' end of an existing Okazaki fragment, using this (DNA) 3' end to initiate replication. DNA polymerase I possesses a 5' to 3' exonuclease activity and destroys the RNA nucleotides (former primers) in front of it, replacing them with DNA nucleotides. This eliminates RNA from the new strands, which is good for several reasons: RNA is relatively unstable, primase is very error prone, and DNA polymerase would not be able to use RNA as a template for the next round of replication.
DNA polymerase I cannot, however, join the fragment it has just extended to the fragment in front of it. The covalent bonds between adjacent Okazaki fragments must therefore be formed by yet another enzyme called DNA ligase, which uses up an ATP to ligate the 3' end of the fragment to the 5' end of the (now RNA-free) Okazaki fragment in front of it.

Quick question- why is an ATP needed for DNA ligase to join two Okazaki fragments together? ATP's aren't needed for DNA polymerase to add a nucleotide...

The geometry of lagging strand synthesis is difficult to visualize and will be covered in class. However, this might be best represented by a video (one that moves slowly!). A realistic representation of simultaneous leading and lagging strand synthesis can be found among Drew Barry's videos, here. However, this animation moves quickly (at actual speed). A really fascinating aspect of replication fork progression is the fact that the two polymerases, on the leading and lagging strand, are tethered together. They are both moving in the same direction as the replication fork, and the lagging strand template has to repeatedly twist and coil to accommodate this directionality. You will also see in this video how DNA polymerase has to be repeatedly reloaded and then dissociated from the lagging strand template. Both polymerases are stabilized on their template strands by a clamp that encircles that strand. In the video, you will see the "clamp loader" as a large complex that repeatedly grabs and reloads green clamps from the nucleoplasm.

Leading and lagging strand synthesis. The lagging strand is created in multiple segments. A replication fork showing the leading and lagging strand. A replication bubble showing the leading and lagging strands.
Bis2A Team original image

A growing replication fork. All the enzymes required for DNA replication (in E. coli) - with the exception of initiation - are illustrated here, although in a very stylized (but very clear) fashion. Which components are involved in leading and lagging strand priming? Strand extension? Removal of RNA nucleotides (former primers)? Joining of separate Okazaki fragments? Coping with accumulation of supercoils? Note that the leading and lagging strand polymerases are both tethered to the clamp loader. How can DNA be simultaneously synthesized on both the leading and lagging strands if the polymerases are bound to each other?

## Termination of Replication

#### An issue specific to linear chromosomes: Telomeres and Telomerase

The termination of replication in circular bacterial chromosomes poses few practical problems. However, the ends of linear eukaryotic chromosomes pose a specific problem for DNA replication. Because DNA polymerase can add nucleotides in only one direction (5' to 3'), the leading strand allows for continuous synthesis until the end of the chromosome is reached; however, as the replication complex arrives at the end of the lagging strand there is no place for the primase to "land" and synthesize an RNA primer so that the synthesis of the missing lagging strand DNA fragment at the end of the chromosome can be initiated by the DNA polymerase. Without some mechanism to help fill this gap, this chromosomal end will remain unpaired and the chromosome will become progressively shorter with each round of replication, ultimately compromising the ability of the organism to survive. These ends of the linear chromosomes are known as telomeres and contain highly repetitive, very short sequences that do not code for proteins. As a consequence, these "non-coding" telomeres act as replication buffers and, in somatic cells, are indeed shortened with each round of DNA replication. For example, in humans, a six-base-pair sequence, TTAGGG, is repeated 100 to 1000 times at the end of most chromosomes. Clearly this is a stop-gap solution! The discovery of the enzyme telomerase helped in the understanding of how chromosome ends are maintained. Telomerase is an enzyme composed of protein (a reverse transcriptase, meaning an enzyme that copies an RNA template to make DNA) and a short RNA template. Telomerase attaches to the end of the chromosome by complementary base pairing between the RNA component of telomerase and the telomeric sequence of the DNA. The RNA is then used as a template for the elongation of the chromosome end. This process can be repeated numerous times.
Once the lagging strand template is sufficiently elongated by telomerase, primase will create a primer followed by DNA polymerase which can now add nucleotides that are complementary to the ends of the chromosomes. Thus, the ends of the chromosomes are replicated.
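The idea of the telomere as a "replication buffer" can be made concrete with a little arithmetic. The repeat sequence and the 100 to 1000 repeat range come from the text; the amount of sequence lost per division is NOT given in the text, so the value used below is a purely hypothetical placeholder, and the function names are illustrative. The sketch also assumes a constant loss per division and no telomerase activity.

```python
REPEAT = "TTAGGG"  # human telomeric repeat, as quoted in the text


def telomere_length_bp(repeats: int) -> int:
    """Length of a telomere made of the given number of repeats."""
    return repeats * len(REPEAT)


def divisions_buffered(repeats: int, loss_per_division_bp: int) -> int:
    """Whole divisions before the telomere is used up.

    Assumes a constant loss per division and no telomerase activity.
    """
    if loss_per_division_bp <= 0:
        raise ValueError("loss per division must be positive")
    return telomere_length_bp(repeats) // loss_per_division_bp


# HYPOTHETICAL value chosen only to illustrate the calculation;
# it is not taken from the text.
ASSUMED_LOSS_BP = 100

for repeats in (100, 1000):
    print(f"{repeats} repeats = {telomere_length_bp(repeats)} bp; "
          f"buffers {divisions_buffered(repeats, ASSUMED_LOSS_BP)} divisions "
          f"at an assumed loss of {ASSUMED_LOSS_BP} bp per division")
```

Whatever the real loss per division, the buffer is finite, which is why a mechanism such as telomerase is needed in cells whose descendants must keep dividing.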

The ends of linear chromosomes are maintained by the action of the telomerase enzyme.

Telomerase is not active in somatic cells. Adult somatic cells that undergo cell division continue to have their telomeres shortened. Telomerase is, however, expressed in germline cells- cells that will ultimately produce gametes. Thus each zygote inherits chromosomes with long telomeres. Mutants completely defective in telomerase function often have no phenotype, but as the generations progress, their offspring become increasingly defective. This is due to the gradual loss of telomeric sequences with each cell division; once these sequences are lost the chromosomes become extremely unstable and exhibit frequent breakage. Thus telomeric sequences not only keep chromosomes long, they also prevent chromosome ends from being mistakenly recognized as "chromosome breaks". The instability of telomere-less chromosomes is due to the cell's misguided attempts to "repair" these breaks.

## Differences in DNA Replication Rates Between Bacteria and Eukaryotes

DNA replication has been extremely well-studied in bacteria, primarily because of the small size of the genome and large number of variants available. Escherichia coli has 4.6 million base pairs in a single circular chromosome, and all of it gets replicated in approximately 42 minutes, starting from a single origin of replication and proceeding around the chromosome in both directions. This means that each of the two replication forks copies roughly 1000 base pairs per second. The process is much more rapid than in eukaryotes.
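The rate quoted here follows from the genome size and replication time. A minimal sketch of the calculation, using only the figures in the paragraph above (the function name is illustrative):

```python
GENOME_BP = 4_600_000      # E. coli chromosome, base pairs
REPLICATION_MINUTES = 42   # approximate time to copy the chromosome
FORKS = 2                  # one origin, replication proceeds in both directions


def fork_speed_bp_per_s(genome_bp: int, minutes: float, forks: int) -> float:
    """Average number of base pairs copied per second at each fork."""
    seconds = minutes * 60
    return genome_bp / seconds / forks


speed = fork_speed_bp_per_s(GENOME_BP, REPLICATION_MINUTES, FORKS)
print(f"Each fork copies about {speed:.0f} base pairs per second")
```

The result is an average over the whole chromosome; it says nothing about pauses or local variation in fork speed.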

Table 1: Summary of the differences between bacterial and eukaryotic replications.

| Property | Prokaryotes | Eukaryotes |
| --- | --- | --- |
| Origin of replication | Single | Multiple |
| Rate of polymerization per polymerase | 500 nucleotides/s | 50 to 100 nucleotides/s |
| Chromosome structure | circular | linear |
| Telomerase | Not present | Present |

When the cell begins the task of replicating the DNA, it does so in response to environmental signals that tell the cell it is time to divide. The goal of DNA replication is to produce two identical copies of the double-stranded DNA template and to do it in an amount of time that does not pose an unduly high evolutionarily selective cost. This is a daunting task when you consider that there are ~6,500,000,000 base pairs in the human genome and ~4,500,000 base pairs in the genome of a typical E. coli strain and that Nature has determined that the cells must make copies of themselves within 24 hours and 20 minutes, respectively. In either case many individual biochemical reactions need to take place.

While ideally replication would happen with perfect fidelity, DNA replication, like all other biochemical processes, is imperfect: bases may be left out, extra bases added, or bases may be added that do not properly base-pair. In fact, the difference in potential energy between some correctly base paired and incorrectly base paired nucleotides is simply insufficient to "power" the remarkable fidelity of DNA polymerase (which erroneously inserts nucleotides at a rate of less than 1 in 10^6). Many of the mistakes that occur during DNA replication are promptly corrected by DNA polymerase itself via a mechanism known as proofreading. In proofreading, the DNA polymerase "reads" each newly added base via sensing the presence or absence of small structural anomalies before adding the next base to the growing strand. In so doing, a correction can be made.

If the polymerase detects that a newly added base has paired correctly with the base in the template strand, the next nucleotide is added. If, however, an incorrect nucleotide is added to the growing polymer, the mis-shaped double helix will cause DNA polymerase to stall, and the newly made strand will be ejected from the polymerizing site on the polymerase and move into an exonuclease site. In this site, DNA polymerase is able to cleave off the last several nucleotides that were added to the polymer. Once a few nucleotides have been removed, new ones will be added again. This proofreading capability comes with some trade-offs: Using an error correcting/more accurate polymerase requires time. The slower you go the more accurate you can be. Going too slow, however, may keep you from replicating as fast as your competition, so figuring out the balance is key.

Errors that are not corrected by proofreading may become mutations, but only if they also escape yet another quality-control process: mismatch repair (discussed below).

Proofreading by DNA polymerase corrects errors during replication.

Suggested discussion

What are the pros and cons for DNA polymerases' proofreading capabilities?

Leading and lagging strand synthesis is complicated! Why not have a second DNA polymerase that extends the strand in the 3' to 5' direction instead? Having two polymerases, yoked together and proceeding on both strands simultaneously, would seem to be a simpler solution. The energy requirement would still be fulfilled: to run polymerization in the opposite direction we'd simply have a growing strand that ends with a 5' triphosphate, which would be attacked by the 3' OH of the incoming dNTP. However, the imaginary polymerase that adds to the 5' end, rather than the 3' end, of the chain would have issues with proofreading. On its growing chain, the most recently added nucleotide carries the triphosphate. If this base is erroneous, and excised by the proofreading exonuclease, there will no longer be a triphosphate at the 5' end of the chain. Therefore, the chain could not be elongated. Try drawing this situation for a real polymerase vs. this imaginary polymerase that elongates the 5' end of the growing chain.

## Replication Mistakes and DNA Repair

Although DNA replication is typically a highly accurate process and proofreading DNA polymerases help to keep the error rate low (down to about 1 in 10^6 bases), mistakes still occur. In addition to errors of replication, environmental damage may also occur to the DNA. Such uncorrected errors of replication or environmental DNA damage may lead to serious consequences. Therefore, Nature has evolved several mechanisms for detecting and repairing damaged or incorrectly synthesized DNA.
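To see why an error rate of about 1 in 10^6 still leaves work for repair systems, here is a rough estimate in Python. It is a sketch only: the genome sizes are the ones used elsewhere in this module, the function name is ours, and we assume that copying a duplex of N base pairs polymerizes 2N nucleotides at a uniform error rate.

```python
BASES_PER_ERROR = 10**6   # "about 1 in 10^6 bases" after proofreading

def expected_errors(genome_bp, bases_per_error=BASES_PER_ERROR):
    """Expected misincorporations in one round of replication.

    Copying a duplex of genome_bp base pairs polymerizes 2 * genome_bp
    nucleotides, because both parental strands serve as templates.
    """
    return 2 * genome_bp / bases_per_error

human = expected_errors(6_500_000_000)   # diploid human genome size used in this module
e_coli = expected_errors(4_500_000)      # typical E. coli genome size used in this module
print(human, e_coli)
```

On these assumptions a single human cell division would leave thousands of misincorporated bases behind if nothing acted after proofreading.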

### Mismatch Repair

Some errors are not corrected during replication, but are instead corrected after replication is completed; this type of repair is known as mismatch repair. Specific enzymes recognize the incorrectly added nucleotide and excise it, replacing it with the correct base, using the sister strand as a template. Simple enough! The issue is: how do mismatch repair enzymes recognize *which* of the two improperly paired bases is the *incorrect* one?

In E. coli, at some point after replication, a subset of adenines (at a specific 4 base sequence) acquires a methyl group. Immediately after replication the parental (old) DNA strand will have methyl groups on these A's, whereas the newly synthesized strand lacks them (the enzyme that recognizes these sites hasn't found the new strand yet). Thus, there is a window of opportunity during which mismatch repair enzymes are able to scan the DNA and remove the wrongly incorporated bases from the newly synthesized, non-methylated strand, using the methylated strand as the "correct" template from which to incorporate a new nucleotide. In eukaryotes, the mechanism for distinguishing old vs. new strands is not as well understood, but it is believed to involve recognition of unsealed nicks in the new strand, as well as a short-term continuing association of some of the replication proteins with the new daughter strand after replication has completed.
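The logic of strand discrimination can be illustrated with a toy model in Python. This is a deliberately simplified sketch, not a description of the real enzymes: the strand known to carry methyl marks is treated as correct, both strands are written so that position i of one pairs with position i of the other, and the function name and example sequences are made up for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def repair_mismatches(methylated_parent, new_strand):
    """Rebuild the new strand from the trusted (methylated) parental strand.

    Returns the repaired new strand and the positions that had to be fixed.
    Toy convention: index i of one strand pairs with index i of the other.
    """
    repaired = []
    fixed_positions = []
    for i, (parent_base, new_base) in enumerate(zip(methylated_parent, new_strand)):
        correct = COMPLEMENT[parent_base]
        if new_base != correct:
            fixed_positions.append(i)   # mismatch: the unmethylated strand is at fault
        repaired.append(correct)
    return "".join(repaired), fixed_positions

parent = "GATCCGTA"      # carries the methyl marks, so it is trusted
daughter = "CTAGGTAT"    # newly made strand with one wrong base
fixed, where = repair_mismatches(parent, daughter)
print(fixed, where)
```

Without the methyl marks the function would have no basis for choosing which strand to rewrite, which is exactly the problem the cell solves by delaying methylation of the new strand.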

### Nucleotide Excision Repair

Nucleotide excision repair enzymes replace damaged bases by making a cut on both the 3' and 5' ends of the damaged site. The entire segment of DNA is removed and replaced with correctly paired nucleotides by the action of a DNA polymerase. Once the bases are filled in, the remaining gap is sealed with a phosphodiester linkage catalyzed by the enzyme DNA ligase. This repair mechanism is often employed when UV exposure causes the formation of pyrimidine dimers.

Nucleotide excision repair removes thymine dimers. When exposed to UV, pyrimidines lying adjacent to each other can form dimers; a dimer distorts the helix and is not an acceptable template for DNA synthesis by the replicative polymerase. In normal cells, the dimers are excised and replaced.

## Reversal of damage

The mechanisms described above reflect a "remove and replace" strategy for repair of DNA, and take advantage of the redundancy in information of the double helix. Note that excision repair can't be employed in single-stranded DNA or RNA genomes, which are frequently found in viruses. There are, however, a couple of types of damage that are so common that a specific repair pathway has evolved to recognize and simply reverse this specific type of damage to bases, rather than excising and replacing a damaged oligonucleotide. One example is photolyase, an enzyme that recognizes and reverses pyrimidine dimers, the most common form of UV-induced damage. This enzyme is actually "powered" by blue light (which should be present in any environment where you would find UV light). The protein recognizes pyrimidine dimers and, when struck by light, donates (and then retrieves) an electron, breaking the inappropriate bonds between the two adjacent bases. Almost all living things express this enzyme, with the unfortunate exception of placental mammals. We're "stuck" with excision repair as our only mode of repair for UV-induced damage.

## Consequences of errors in replication, transcription and translation

Cells have evolved a variety of ways to make sure DNA errors are both detected and corrected. We have already discussed several of them. Why did these evolve? Such mechanisms did not evolve to repair RNA or proteins. What would the consequences be of an error in transcription? Would such an error affect the offspring? Would it be lethal to the cell? What about errors in translation? How do these contrast with errors in DNA replication? If you are not familiar with transcription or translation, don't fret. We'll learn those soon and return to this question again.

## Impaired DNA replication derepresses chromatin and generates a transgenerationally inherited epigenetic memory

Impaired DNA replication is a hallmark of cancer and a cause of genomic instability. We report that, in addition to causing genetic change, impaired DNA replication during embryonic development can have major epigenetic consequences for a genome. In a genome-wide screen, we identified impaired DNA replication as a cause of increased expression from a repressed transgene in Caenorhabditis elegans. The acquired expression state behaved as an "epiallele," being inherited for multiple generations before fully resetting. Derepression was not restricted to the transgene but was caused by a global reduction in heterochromatin-associated histone modifications due to the impaired retention of modified histones on DNA during replication in the early embryo. Impaired DNA replication during development can therefore globally derepress chromatin, creating new intergenerationally inherited epigenetic expression states.

### Figures

Fig. 1. Impaired DNA replication during embryonic development derepresses a transgene array.

Fig. 2. The absence of repressive histone modifications suppresses the effect of impaired replication on…

Fig. 3. Impaired DNA replication globally alters histone modifications. (A to C) Representative…

Fig. 4. Impaired DNA replication globally derepresses chromatin. Fold change in expression of genes mapping…

Fig. 5. Impaired replication interferes with the inheritance of H3K27me3-modified paternal histones.

Fig. 6. Transgenerational epigenetic inheritance of acquired expression following DNA replication impairment. Impaired DNA replication during early embryonic divisions leads to inefficient retention…

## Regulation of the start of DNA replication in Schizosaccharomyces pombe

Cells of Schizosaccharomyces pombe were grown in minimal medium with different nitrogen sources under steady-state conditions, with doubling times ranging from 2.5 to 14 hours. Flow cytometry and fluorescence microscopy confirmed earlier findings that at rapid growth rates, the G1 phase was short and cell separation occurred at the end of S phase. For some nitrogen sources, the growth rate was greatly decreased, the G1 phase occupied 30-50% of the cell cycle, and cell separation occurred in early G1. In contrast, other nitrogen sources supported low growth rates without any significant increase in G1 duration. The method described allows manipulation of the length of G1 and the relative cell cycle position of S phase in wild-type cells. Cell mass was measured by flow cytometry as scattered light and as protein-associated fluorescence. The extensions of G1 were not related to cell mass at entry into S phase. Our data do not support the hypothesis that the cells must reach a certain fixed, critical mass before entry into S. We suggest that cell mass at the G1/S transition point is variable and determined by a set of molecular parameters. In the present experiments, these parameters were influenced by the different nitrogen sources in a way that was independent of the actual growth rate.

## COMMENTARY

### Background Information

Human PSC hold great promise in the field of regenerative medicine. Yet, in order to reach the full potential of these cells, we must first capitalize on their ability to rapidly and endlessly renew to generate large numbers of genetically stable and undifferentiated cells. However, it has become apparent that human PSC acquire genetic changes during long-term culture, which raises concerns over the safety of stem cell-derived products that are destined for the clinic (Draper et al., 2004; Olariu et al., 2010). The recurrent nature of certain karyotypic changes, such as amplifications of chromosomes 1q, 12p, 17q, and 20q, has highlighted that certain mutations provide a growth advantage to the variant cell, which becomes selected for in a culture over time (Amps et al., 2011; Baker et al., 2016; Olariu et al., 2010). Despite the mechanism of selection now being well defined, relatively little is known about the underlying mutational mechanisms, although the observations of replication stress and genomic damage in human PSC are similar to the oncogene-induced model of genetic instability in cancer development and progression (Halazonetis, Gorgoulis, & Bartek, 2008). The self-renewal of human PSC is characterized by an abbreviated G1 phase that bypasses the Rb/E2F checkpoint and is driven by high expression of cyclin D2 and constitutive expression of cyclin E (Becker et al., 2006; Filipczyk, Laslett, Mummery, & Pera, 2007). Maintaining this rapid proliferation over extensive culture periods may expose these cells to replication stress, characteristics of which, such as reduced replication rates, have been defined in human PSC using the DNA fiber assay (Halliwell et al., 2020). Furthermore, we have recently identified regions of microhomology at the breakpoint of chromosome 20 tandem amplification, which implies that template-switching mechanisms at stalled or collapsed forks are responsible for these mutations (J.A. Halliwell, D. Baker, P.W. Andrews, I. Barbaric, unpub. observ.). Collectively, these studies have determined that genetic stability in human PSC is overtly linked to DNA replication. This article provides experimental details and protocols required to perform the DNA fiber assay, which have been optimized for studying replication stress in human PSC.

The procedure described in this article has been optimized for use with human PSC, and has been successfully applied to several human iPSC and human ESC cell lines (Halliwell et al., 2020). We have utilized this assay to improve culture conditions: supplementing cultures with nucleosides alleviates replication stress and decreases the frequency of mitotic errors, highlighting that these events are linked in human PSC (Halliwell et al., 2020). The DNA fiber assay has revealed approaches that can be used to reduce the appearance of genetic instability in human PSC, which is necessary for the safe application of human PSC in regenerative medicine.

### Critical Parameters

We have observed little difference in fiber assay results whether human PSC are cultured in mTeSR, E8, or Nutristem cell culture medium. Currently, feeder layer-dependent cell culture practices have not been tested. The whole assay can be done in a single day, although we have included a stop point following the fixation step in Basic Protocol 2, which allows the protocol to be run over a period of 2 days.

We advise allowing the human PSC to recover for at least 48 hr following plating. This will permit 24 hr in culture without Rho-associated, coiled-coil containing protein kinase 1 inhibitor (Y-27632). It will also allow recovery from stressful re-plating, and for the cells to re-enter logarithmic growth phase. It is critical, where the results of experiments are to be compared, that the density of cells initially seeded and the confluency at the point of starting the experiment be consistent. Higher confluency may cause cell proliferation and DNA replication to slow.

With regard to labeling of DNA, it is possible to increase or decrease the pulse labeling times, but this will result in longer and shorter DNA fibers, respectively. Again, the labeling time must be constant across comparable experiments. It is also important that the labeling time be accurately measured. A deviation of 1 min will change the amount of labeling by 5%, and will confound the results of the experiment. To ensure timely labeling, it is advised that the reagents be prepared well ahead of starting the experiment. Also, when adding the second nucleotide analog (IdU), the plate should be removed from the incubator 1 min before the end of the first incubation period, to give a time buffer when preparing the second label.
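The 5% figure follows from simple proportion, as the short Python sketch below shows. It assumes the 20 min pulse given in the troubleshooting table of this protocol and assumes that labeling scales linearly with time; the function name is ours.

```python
def labeling_deviation_percent(planned_min, actual_min):
    """Percent change in labeling when the pulse runs for actual_min
    instead of planned_min, assuming labeling is proportional to time."""
    return (actual_min - planned_min) * 100 / planned_min

over = labeling_deviation_percent(20, 21)    # pulse ran 1 min long
under = labeling_deviation_percent(20, 19)   # pulse ran 1 min short
print(over, under)
```

The same 1 min slip matters more for shorter pulses, which is one reason to keep the pulse length fixed across experiments that will be compared.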

Once harvested, the cells should be suspended in ice-cold PBS and kept on ice. DNA spreading should then be performed within 30 min, to minimize cell death.

When spreading DNA fibers, the user should select slides where the droplet spreads down the length of the slide in 3 to 5 min at a constant rate. In our experience, batches of slides can differ from one another, and each batch should be optimized.

### Troubleshooting

Table 2 describes problems that can arise with various steps in the assay, along with their possible causes and solutions.

| Step | Problem | Possible reason | Solution |
|---|---|---|---|
| Basic Protocol 1 (step 18) | Too many overlapping DNA fibers | Density of labeled cells for spreading is too high | Dilute the harvested labeled cell solution |
| Basic Protocol 2 (step 6) | Fiber spreading occurs too slowly or too fast | Room temperature | Increase or decrease the volume of spreading buffer |
| | | Humidity | Increase or decrease the slide tilt angle |
| | | Microscope slide | Optimize spreading conditions for each brand and batch of microscope slides prior to use |
| Support Protocol 2 | Inconsistent results between biological and technical replicates | Length of pulse labeling | Ensure CldU or IdU is added for 20 min exactly |
| | | Time between cell harvesting and spreading | Perform spreading sooner after cell harvesting. Reduce the number of samples to make them more manageable. |

### Statistical Analysis

It is important to take images of fields with minimal fiber crossovers across the whole slide to ensure a robust measurement of the progressing replication forks. A minimum of 150-200 individual fibers must be measured to account for the heterogeneity between progressing forks. The conversion factor for stretched DNA fibers is 2.59 kb/μm (Jackson & Pombo, 1998).

Consideration should be given to the presentation of data. Histograms and scatter plots are appropriate for capturing normal differences in replication fork progression. Statistical significance can be calculated using an unpaired Mann-Whitney U-test.
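For readers unfamiliar with the test, the Mann-Whitney U statistic itself is easy to compute by hand, as in the Python sketch below. This shows only the statistic, not the p-value, for which a statistics package would normally be used. The fork speeds are made-up illustrative numbers, far fewer than the 150-200 fibers recommended above, and the function name is ours.

```python
def mann_whitney_u(x, y):
    """U statistic for sample x against sample y.

    Counts the (xi, yj) pairs in which xi exceeds yj; ties count one half.
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

# Hypothetical fork speeds in kb/min, for illustration only
control = [0.70, 0.65, 0.72, 0.68]
stressed = [0.40, 0.45, 0.66, 0.38]

u_control = mann_whitney_u(control, stressed)
u_stressed = mann_whitney_u(stressed, control)
print(u_control, u_stressed)
```

The two U values always sum to the number of pairs compared; a strongly lopsided split, as here, is what a shift in fork speed between conditions looks like.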

### Understanding Results

The DNA fibers produced can be variable but, in our experience, under normal growth conditions, they should contain equal lengths of CldU and IdU staining. When measured, we find the replication fork speed to vary between 0.5 and 0.75 kb/min. A reduced fiber length or fork speed can indicate replication stress.
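Converting a measured tract into a fork speed combines the 2.59 kb/μm conversion factor from the Statistical Analysis section with the pulse length. The Python sketch below assumes a 20 min pulse and a hypothetical 5 μm IdU tract; the function name is ours.

```python
KB_PER_UM = 2.59   # conversion factor for stretched DNA fibers (Jackson & Pombo, 1998)

def fork_speed_kb_per_min(tract_um, pulse_min):
    """Fork speed from the length of one labeled tract and the pulse duration."""
    return tract_um * KB_PER_UM / pulse_min

# Hypothetical measurement: a 5 um IdU tract from a 20 min pulse
speed = fork_speed_kb_per_min(5.0, 20)
print(f"{speed:.4f} kb/min")
```

With these example numbers the speed falls inside the 0.5 to 0.75 kb/min range reported above for unstressed cells.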

Monitoring the frequency of replication events can provide valuable information regarding the replication processes going on during the culture of human PSC (Fig. 3). A strict control of replication origin density is required to maintain chromosomal stability (Prioleau & MacAlpine, 2016). Excessive replication-origin firing can deplete the proteins and metabolites required for efficient DNA replication (Sørensen & Syljuåsen, 2012). A decrease in inter-origin distance or a higher density of IdU-only labeled fibers is a measure of increased origin density, suggesting greater numbers of simultaneously firing origins of replication. Fork stalling is a prerequisite for DNA breakage, which can become a substrate for genetic instability (Toledo, Neelsen, & Lukas, 2017). Measuring the frequency of CldU-only fibers or the ratio of the IdU labels on a bi-directional fork can be used to measure fork stalling. A shorter IdU fiber on one side of a bi-directional fork implies a fork-stalling event, and will result in a greater ratio between the IdU tracts.
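The IdU ratio for a bi-directional fork can be expressed as the longer tract divided by the shorter one, so that symmetric forks score 1 and stalled forks score higher. The Python sketch below takes that convention as an assumption; the function name and the tract lengths are illustrative only, and no cutoff for calling a stall is implied.

```python
def idu_tract_ratio(left_um, right_um):
    """Ratio of the longer to the shorter IdU tract of a bi-directional fork.

    A value of 1.0 means both forks travelled equally far during the pulse.
    """
    longer = max(left_um, right_um)
    shorter = min(left_um, right_um)
    if shorter <= 0:
        raise ValueError("both IdU tracts must have a positive length")
    return longer / shorter

symmetric = idu_tract_ratio(4.0, 4.0)   # both forks progressed equally
stalled = idu_tract_ratio(6.0, 2.0)     # one fork fell behind
print(symmetric, stalled)
```

Taking the longer tract as the numerator keeps the score independent of which side of the origin happens to be drawn on the left.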

### Time Considerations

#### Basic Protocol 1

Cells should be grown for a minimum of 48 hr following seeding; we have found 72 hr to be optimal. Steps 1 to 6: ∼40 min will be needed for cell seeding. ∼20 min per day will be needed to refresh the medium.

Steps 7 to 18: ∼1 hr will be needed to pulse label and harvest the cells.

#### Basic Protocol 2

Steps 1 to 4: preparation of cell lysate for DNA spreading will require ∼15 min.

Steps 5 to 7: DNA spreading will require ∼20 min for spreading one set of three slides simultaneously.

Steps 8 to 10: fixation of DNA fibers will require ∼15 min.

#### Basic Protocol 3

Steps 1 to 18: DNA immunolabeling steps will require ∼6.5 hr.

#### Support Protocols 1 and 2

Several hours will be required for image acquisition and data analysis, but this will be variable depending on the quality of the DNA fibers and the biological question being asked.

### Acknowledgments

This work was partly funded by the European Union's Horizon 2020 research and innovation program under grant agreement No. 668724 and partly by the UK Regenerative Medicine Platform, MRC reference MR/R015724/1.

Prof. Allen Gathman has a great 10-minutes video on Youtube, explaining the reaction of adding a nucleotide in the 5' to 3' direction, and why it doesn't work the other way.

Briefly, the energy for the formation of the phosphodiester bond comes from the dNTP that is being added. A dNTP is a nucleotide which has two additional phosphates attached to its 5' end. In order to join the 3'OH group with the phosphate of the next nucleotide, one oxygen has to be removed from this phosphate group. This oxygen is also attached to two extra phosphates, which are also attached to a Mg++. Mg++ pulls up the electrons of the oxygen, which weakens this bond, and the so-called nucleophilic attack of the oxygen from the 3'OH succeeds, thus forming the phosphodiester bond.

If you try to join the dNTP's 3'OH group to the 5' phosphate of the next nucleotide, there won't be enough energy to weaken the bond between the oxygen connected to the 5' phosphorus (the other two phosphates of the dNTP are on the 5' end, not on the 3' end), which makes the nucleophilic attack harder.

Watch the video, it is better explained there.

DNA replication needs a source of energy to proceed; this energy is gained by cleaving the 5'-triphosphate of the nucleotide that is added to the existing DNA chain. Any alternative polymerase mechanism needs to account for the source of the energy required for adding a nucleotide.

The simplest way one can imagine to perform reverse 3'-5' polymerization would be to use nucleotide-3'-triphosphate instead of the nucleotide-5'-triphosphate every existing polymerase uses. This would allow for a practically identical mechanism as existing polymerases, just with different nucleotides as substrates. The problem with this model is that ribonucleotide-3'-triphosphates are less stable under acidic conditions due to the neighbouring 2'-OH (though this obviously only applies for RNA, not for DNA).

So any 3'-5' polymerase would likely need to use the same nucleotide-5'-triphosphates as the 5'-3' polymerase. This would mean that the triphosphate providing the energy for addition of a new nucleotide would be on the DNA strand that is extended, and not on the newly added nucleotide.

One disadvantage of this approach is that nucleotide triphosphates spontaneously hydrolyze under aqueous conditions. This is no significant problem for the 5'-3' polymerase, as the triphosphate is on the new nucleotide and the polymerase just has to find a new nucleotide. For the 3'-5' polymerase spontaneous hydrolysis is a problem because the triphosphate is on the growing chain. If that one gets hydrolyzed, the whole polymerization needs to be either aborted or the triphosphate needs to be re-added by some mechanism.

You can take a look at the article "A Model for the Evolution of Nucleotide Polymerase Directionality" by Joshua Ballanco and Marc L. Mansfield for more information about this. They created a model of early polymerase evolution, though they don't reach any final conclusion.

In my opinion, Prof. Allen Gathman's "great 10-minutes video on Youtube" is pretty much a waste of time if you already know how hydrolysis happens. In fact, he has not considered the 3'->5' route in an unbiased manner: he doesn't seem to look at the possibility of a triphosphate appearing at the growing 5' tip of the strand in the 3'->5' case.

Actually, the only difference between the two routes (5'->3' and 3'->5') is that the reacting triphosphate appears in different places. In the usual case, the triphosphate which is hydrolysed belongs to the added nucleotide, while in the latter case, the triphosphate which is hydrolysed belongs to the nucleotide on the growing strand. Both are feasible.

In fact, it is known that RNA polymerase has dual activity, but you see, RNA polymerase doesn't have proofreading activity! Proofreading requires removal of the mismatched base, but in the 3'->5' direction the base's attachment has consumed the triphosphate at the 5' tip of the strand, so it is no longer available to add the replacement base. 3'->5' activity readily destroys the proofreading capability of a polymerase. So, basically, it is the need for proofreading that restricts the synthesis of DNA strands to 5'->3'. Why this is so would need a lot more explanation (if in words), but I think a picture has far better explanatory power than a thousand words. I've added a picture from Essential Cell Biology that shows the answer to the 'WHY' question:

The other important consideration is repair. If one or more nucleotides are missing in one strand, repair of the missing nucleotides would be impossible for 3' to 5' synthesis, because no 5'-triphosphate is present. On the other hand, 5' to 3' synthesis does not require a 3'-triphosphate present at the repair site. This is important. That is, 3' to 5' synthesis does not allow nucleotide repair.

## Results

### Implementation of TrAEL-seq

Various ligases can attach single-stranded DNA linkers to the 3′ end of single-stranded DNA, but efficiency is generally poor. An alternative method described by Miura and colleagues utilises terminal deoxynucleotidyl transferase (TdT) to add 1 to 4 adenosine nucleotides onto single-stranded DNA 3′ ends, forming a substrate for DNA adaptor ligation by RNA ligases [41,42] (Fig 1A steps i and ii). On a test substrate in vitro, TdT added 1 to 3 nucleotide A tails to >95% of single-stranded DNA molecules, which was ligated with approximately 10% efficiency to TrAEL-seq adaptor 1 using truncated T4 RNA ligase 2 KQ (Fig 1B).

TrAEL-seq adaptor 1 is a hairpin that primes conversion of single-stranded ligation products to double-stranded DNA suitable for library construction, incorporates a biotin moiety flanked by deoxyuracil residues that allows selective purification and elution of ligation products, and includes an 8-nucleotide unique molecular identifier (UMI) for bioinformatic removal of PCR duplicates (Fig 1A). Once TrAEL-seq adaptor 1 is ligated, a thermophilic polymerase with strong strand displacement and reverse transcriptase activities extends the hairpin to form unnicked double-stranded DNA (Fig 1A, step iii), then the DNA is fragmented by sonication and adaptor-ligated material is purified on streptavidin magnetic beads (Fig 1A, steps iv and v). The DNA ends formed during fragmentation are polished and ligated to TrAEL adaptor 2 while still attached to the beads (Fig 1A, step vi), then the purified fragments flanked by TrAEL adaptors 1 and 2 are eluted by cleavage of the deoxyuracil residues prior to library amplification (Fig 1A, step vii). The resulting library is sequenced using a primer that anneals to TrAEL-seq adaptor 1, such that the TrAEL-seq read is the reverse complement of the original DNA 3′ end (Fig 1A, step viii).

### Detection of 3′ extended DNA ends by TrAEL-seq

We tested TrAEL-seq on agarose-embedded yeast genomic DNA digested with restriction enzymes NotI, PmeI, and SfiI that yield 5′ extended, blunt, and 3′ extended ends, respectively, and generated a BLESS-type END-seq library from the same digested material for comparison (Fig 1C). The resulting TrAEL-seq library contained fragments of 200 to 2,000 bp as expected (S1A Fig), and sequencing data was processed through a custom bioinformatic pipeline to remove the A-tail, map the reads, and deduplicate by UMI (illustrated in S1B Fig). Comparing TrAEL-seq and END-seq data shows that both methods detect restriction enzyme cleavage sites: Efficiency is approximately equal on 3′ extended ends, END-seq is more efficient on 5′ extended ends, while TrAEL-seq unexpectedly performed better on the blunt PmeI ends (Fig 1C). Therefore, both methods efficiently detect DSBs even though the labelling strategies are very different.

The restriction enzyme SfiI has a degenerate recognition sequence (GGCCNNNN|NGGCC) that allows assessment of TrAEL-seq ligation efficiency on different 3′ end sequences, allowing us to ensure that there is no bias for DNA ends based on the 3′ or adjacent nucleotides (S1C Fig). Fine mapping of cleavages at the SfiI recognition site GGCCNNNN|NGGCC reveals differences between END-seq and TrAEL-seq: END-seq, in common with other BLESS-type methods, degrades the 3′ overhang and returns a consensus cleavage location 3′ of nucleotides 4 to 5 of the recognition site (Fig 1D). In contrast, TrAEL-seq can map the real cleavage site (3′ of nucleotide 8) and does so for >98% of events, but only for SfiI sites lacking A nucleotides adjacent to the cleavage site (i.e., GGCCNNNB|BGGCC) (Fig 1D, top). This problem stems from the A-tails added by TdT, which cannot be distinguished from genome-encoded A’s. To resolve this issue, we used a trimming algorithm that removes up to a maximum of 3 T’s from the start of the read. Since the average tail length is 2 to 4 nucleotides, this correctly maps the SfiI cleavage site to nucleotides 7 to 9 in >98% of reads, even when only the most challenging sites for mapping are considered (those with the structure GGCCNNNA|AGGCC) (Fig 1D, bottom). Importantly, this algorithm does not overtrim ends within genome-encoded A tracts such that the 10 SfiI sites with 2 or more 3′ A’s (GGCCNNAA|NGGCC) are mapped with the same accuracy (S1D Fig). We suggest that this overall mapping accuracy of >98% within ±1 nucleotide would be sufficient for almost all applications.
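The trimming rule described above (remove at most 3 leading T's, which are the reverse complement of the TdT-added A-tail) can be sketched in a few lines of Python. This is an illustrative reimplementation of the stated rule, not the pipeline's actual code; the function name and the example reads are hypothetical.

```python
def trim_tail(read, max_t=3):
    """Remove up to max_t leading T's from a read.

    Any T's beyond max_t are kept, since they are more likely to be
    genome-encoded than part of the added tail.
    """
    n = 0
    while n < max_t and n < len(read) and read[n] == "T":
        n += 1
    return read[n:]

print(trim_tail("TTGGCCAC"))    # two tail T's removed
print(trim_tail("TTTTTGGCC"))   # only the first three T's removed
print(trim_tail("GGCCAC"))      # nothing to trim
```

Capping the trim at 3 is what limits overtrimming inside genome-encoded A tracts, at the cost of leaving residual tail bases on the occasional longer tail.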

A major strength of TrAEL-seq should be the ability to map original sites of DSBs even after resection, a point in the homologous recombination process that is particularly amenable to stabilisation using mutations that prevent strand invasion. We chose meiosis as an in vivo model system to validate this as meiotic DSB patterns have been extremely well characterised. Meiotic DSBs formed by Spo11 are processed by Sae2 among other factors prior to resection, after which strand invasion into a homologous chromosome is mediated by Dmc1 [43,44]. Loss of Sae2 therefore stabilises DSBs prior to resection, whereas loss of Dmc1 stabilises DSBs after resection and before strand invasion. TrAEL-seq for the 3′ ends of resected DSBs in dmc1Δ cells 7 h after induction of meiosis revealed a DSB pattern very similar to that observed for unresected DSBs in an sae2Δ mutant mapped by S1-seq (a BLESS variant specific for meiotic recombination) (Fig 1E) [45]. TrAEL-seq technical replicates are highly reproducible across known hotspots of Spo11 cleavage (R = 0.99) (S1E Fig), and quantitation of these hotspots by TrAEL-seq correlates well to S1-seq in sae2Δ cells (R = 0.87) (Fig 1F, left) and Spo11 oligonucleotide sequencing (R = 0.85) (S1F Fig) [46,47]. Of the 3,907 known hotspots, TrAEL-seq detects 3,542 based on a threshold of 2 SDs above background, which lies between S1-seq (2,556), and Spo11 oligonucleotide sequencing (a much more labour-intensive method that forms the gold standard for meiotic DSB mapping, 3,784). TrAEL-seq sensitivity is broadly similar to CC-seq (a method specialised for protein-associated DNA ends [19]), which detects 3,223 sites by the same criteria. This shows that TrAEL-seq accurately maps and quantifies endogenous DSB sites even after end resection. Importantly, meiotic recombination is unusual in that mutants are known which completely stabilise DSBs, whereas stabilising breaks postresection is often more practical in other systems.

Overall, TrAEL-seq provides an effective method for detecting and quantifying DSBs genome-wide even after end resection.

### High-resolution mapping of stalled replication forks by TrAEL-seq

Replication forks stall at various impediments during DNA replication and stalled forks may undergo reversal or cleavage as the cell attempts to restart replication (Fig 2A). The replication fork barrier (RFB) in the rDNA of budding yeast is a classic system for studies of replication fork stalling, and results from replication forks encountering the Fob1 protein bound to DNA [48]. Fob1 binds just downstream of the 35S ribosomal RNA (rRNA) gene and prevents the passage of replication forks moving against the direction of 35S transcription that would otherwise encounter the RNA polymerase I machinery head-on [49,50]. The RFB has been intensely studied as a model for stalled replication forks initiating recombination and genome rearrangement [51,52], and DSBs thought to stem from fork cleavage have been reported at the RFB based both on Southern blotting and qDSB-seq (a BLESS-type method for mapping double stranded DNA ends) [13,53,54].

(A) Potential processing pathways of a stalled replication fork. Lagging strand processing is likely to finish soon after stalling, and at least for the yeast RFB, it is known that the lagging strand RNA primer is removed [55]. The fork could then undergo fork reversal to yield a Holliday junction or be cleaved on the leading or lagging strand. Whereas cleavage is irreversible and requires a recombination event to restart the replication fork, reversed forks can revert to the normal replication fork structure by Holliday Junction migration (labelled HJ migration). The 3′ DNA ends predicted to be TrAEL-seq substrates are labelled with green dots. The RNA primer on the Okazaki fragment in the leftmost structure is shown in red. (B) Comparison of the yeast rDNA RFB signals in TrAEL-seq datasets compared to qDSB-seq (SRA accession: SRX5576747) [13] and GLOE-seq (SRA accessions: SRX6436839 and SRX6436840) [40]. Reads were quantified in 1 nucleotide steps and normalised to reads per million mapped. qDSB-seq data were obtained from S-phase synchronised cells, all other samples are from asynchronous log-phase cell populations growing in YPD media. Schematic diagram shows the positions of RFB elements previously mapped by 2D gel electrophoresis [49,50], and black triangles indicate previously mapped sites of DNA ends [53,55]. (C) rDNA TrAEL-seq reads in hESCs. Two biological replicates are shown, each an average of 2 technical replicates. Reads were summed in 100 bp sliding windows spaced every 10 bp. One rDNA repeat is shown, the RNA polymerase I-transcribed 45S RNA is shown as a grey line with mature rRNAs marked in green in the schematic diagram. Note that the 45S gene is shown as transcribed right to left to maintain consistency with the yeast data, such that the sequence is the reverse complement of the rDNA reference sequence U13369. 
The R repeats, which contain the RFBs, are marked in green, while the primary direction of replication is shown by a red arrow labelled as “Replication?” to take into account evidence that forks can move in both directions through the human rDNA. (D) Average TrAEL-seq profiles across centromeres +/− 1 kb for 3 biological replicates of wild-type cells (drawn in red, orange, and purple). Centromeres are categorised based on replication direction in the yeast genome assembly into those replicated forward (CEN3, CEN5, CEN13, CEN2), reverse (CEN11, CEN15, CEN10, CEN8, CEN12, CEN9), and those in termination zones that could be replicated in either direction (CEN14, CEN16, CEN1, CEN4, CEN7, CEN6); see S2C Fig for details. Read counts per million reads mapped were calculated in nonoverlapping 10 bp bins; vertical lines indicate annotated boundaries of centromeres. (E) Average TrAEL-seq profiles across tRNAs +/− 200 bp for 3 biological replicates of wild-type cells (drawn in red, orange, and purple). tRNAs are categorised into those for which transcription is codirectional with the replication fork and those for which transcription is head-on to the direction of the replication fork. tRNAs for which the replication direction is not well defined were excluded. Arrows indicate peaks that are dependent on replication direction. Read counts per million reads mapped were calculated in nonoverlapping 5 bp bins; vertical lines indicate annotated boundaries of tRNAs. Numerical data underlying this figure can be found in S2 Data. hESC, human embryonic stem cell; RFB, replication fork barrier; rRNA, ribosomal RNA; TrAEL-seq, Transferase-Activated End Ligation sequencing.
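The two quantification schemes mentioned in the legend, reads per million mapped and sums in 100 bp sliding windows spaced every 10 bp, can be illustrated as follows. This is a sketch on plain Python lists with hypothetical function names; it assumes per-bin or per-base read counts have already been computed from the alignments.

```python
def reads_per_million(bin_counts, total_mapped):
    """Scale raw per-bin read counts to reads per million mapped reads."""
    return [count * 1_000_000 / total_mapped for count in bin_counts]


def sliding_window_sums(per_base_counts, window=100, step=10):
    """Sum per-base counts in windows of `window` bp that start every `step` bp.

    Only full-length windows are reported (an illustrative choice).
    """
    last_start = len(per_base_counts) - window
    return [
        sum(per_base_counts[start:start + window])
        for start in range(0, last_start + 1, step)
    ]


if __name__ == "__main__":
    print(reads_per_million([2, 0, 5], total_mapped=2_000_000))
    print(sliding_window_sums([1] * 120, window=100, step=10))
```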

To detect replication forks stalled at the RFB and test the requirement for homologous recombination in resolution of these species, we prepared TrAEL-seq libraries from unsynchronised wild-type, fob1Δ, and rad52Δ cells growing at mid-log phase: fob1Δ cells lack RFB activity, while rad52Δ mutants cannot initiate homologous recombination. RFB signals should therefore be absent from fob1Δ, while signals representing DSBs formed by fork cleavage should accumulate in rad52Δ as this mutant cannot repair such DNA breaks once formed.

Two RFB sites are clearly visible in wild-type TrAEL-seq data as peaks of reverse strand reads but are absent in the fob1Δ mutant (Fig 2B, wild type and fob1Δ panels). These peaks are exactly reproduced between 2 libraries prepared independently from the same fixed cells (by different investigators working 6 months apart, S2A Fig) and are detected with high signal-to-noise in 3 wild-type biological replicates (S2B Fig). These sites correspond well with the RFB sites mapped using high-resolution gels [53,55] and are also visible in published qDSB-seq and GLOE-seq datasets, although TrAEL-seq data contains fewer additional peaks in this region than GLOE-seq data and the RFB peaks correspond more closely to known sites than qDSB-seq peaks (Fig 2B) [13,40].

To determine the applicability of TrAEL-seq to mammalian cells, we generated 2 TrAEL-seq datasets each from 2 biological replicate libraries of 0.5 million human embryonic stem cells (hESCs). A major peak was observed in the rDNA downstream of the RNA polymerase I termination site in both hESC biological replicates, on the reverse strand located in the most distal of the known RFB sites (Fig 2C) [56]. This observation is consistent with an efficient polar RFB located just downstream of the RNA polymerase I transcription unit, as seen in diverse species from plants to yeast to mice [49,57–60]. Furthermore, we detect smaller but reproducible peaks on both strands in all 3 RFB sites, consistent with the low efficiency bidirectional RFB activity that has been reported in human cells based on 2D gels and DNA combing (Fig 2C) [56,61,62].

rDNA RFBs are not the only sites at which replication forks stall; for example, reported GLOE-seq peaks at yeast centromeres likely stem from replication forks stalling at centromeric chromatin [40,63]. To probe this relationship, we first stratified centromeres into those replicated only by reverse forks, those replicated only by forward forks, and those sited in termination zones where forks converge (S2C Fig). At centromeres replicated from one direction only, we observed an accumulation of reads on the opposite strand to the direction of replication located just before the centromere, while centromeres in termination zones, which can be replicated in either direction, displayed both peaks (Fig 2D and S1 File). A similar analysis of tRNA loci, which are also known to stall replication forks [64], yielded more complex patterns (Fig 2E). These regions displayed peaks upstream or downstream of the tRNA depending on the direction of replication (Fig 2E, arrows), consistent with previous studies that reported both codirectional and head-on tRNA transcription can stall replication forks, at least in the absence of replicative helicases [64–67]. However, we also observed a major peak covering the first approximately 15 bp of the tRNA gene, which was not affected by replication direction and appears to mark a transcription-associated break on the template strand that must be a conserved feature of tRNA transcription as it is also detected in the hESC samples (S2D Fig). This aside, we find that sites of replication fork stalling both at the RFB and other sites are revealed by an accumulation of TrAEL-seq reads on the opposite strand to the direction of replication.

The structures resulting from stalled fork processing have various double-stranded 3′ ends that should be substrates for TrAEL-seq based on our restriction enzyme analysis (Figs 1C and 2A, green dots). However, no difference in signal intensity was observed between rad52Δ and wild type at the rDNA, centromeres or tRNAs, showing that these double-stranded ends are not normally processed by the homologous recombination machinery (Fig 2B, S2E and S2F Fig). DSBs formed in the rDNA are known to be repaired by homologous recombination, and although we and others have reported Rad52-independent recombination at the rDNA, these are rare events unknown in wild-type cells [68–70]. If TrAEL-seq peaks represented fork cleavage events, we would expect a strong stabilisation in the rad52Δ mutant. So, based on the lack of stabilisation observed, we consider that the vast majority of DNA ends at sites of replication fork stalling represent reversed forks that can revert to normal replication fork structures by Holliday Junction migration without recombination (see Fig 2A and Discussion).

Taken together, these results show that TrAEL-seq allows sensitive and precise mapping of replication fork stalling, most likely through labelling of reversed replication forks.

### TrAEL-seq profiles describe replication fork directionality

A striking feature of yeast TrAEL-seq data is the massive variation in strand bias of reads at different sites in the genome: A violin plot of the fraction of reverse reads in 1 kb bins shows 2 distinct peaks at 15% to 30% and 70% to 85%, a behaviour much less obvious in comparable GLOE-seq data (Fig 3A) [40]. TrAEL-seq read polarity in asynchronous wild-type cells (calculated from the difference between reverse and forward read densities) forms clear domains when plotted over large genomic regions that almost perfectly match the GLOE-seq map of Okazaki fragment ends in a Cdc9 DNA ligase depletion experiment, although with the opposite polarity (Fig 3B and S3A Fig) [40]. Mapping of Okazaki fragment ends is a well-validated method for detecting replication forks [35,36], and the tight correlation of TrAEL-seq data to Okazaki fragment distribution strongly suggests that TrAEL-seq detects processive replication forks even in wild-type cells. Indeed, the locations at which TrAEL-seq polarity switches from negative to positive coincide precisely with replication origins (autonomously replicating sequence or ARS elements) (Fig 3B, dotted vertical lines), and alignment of TrAEL-seq reads across 30 kb either side of all ARS elements reveals a switch in polarity as would be expected for replication forks diverging from replication origins (Fig 3C). Furthermore, TrAEL-seq reads in the rDNA reflect the known role of Fob1 in enforcing unidirectional rDNA replication, as reads are highly polarised in wild-type cells but this polarisation is absent in fob1Δ (S3B Fig).
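As a rough illustration of the polarity analysis described above, the sketch below computes a per-bin read polarity and reports the bins where polarity switches from negative to positive, which in the text coincide with replication origins. The text states only that polarity is calculated from the difference between reverse and forward read densities; dividing by the total, so that values lie between -1 and 1, is an assumption made here, and the function names are hypothetical.

```python
def read_polarity(forward, reverse):
    """Per-bin polarity, (reverse - forward) / (reverse + forward).

    Assumption: the difference is normalised by the total read count.
    Bins with no reads yield None.
    """
    polarity = []
    for f, r in zip(forward, reverse):
        total = f + r
        polarity.append((r - f) / total if total else None)
    return polarity


def negative_to_positive_switches(polarity):
    """Indices i where polarity is negative in bin i-1 and positive in bin i."""
    return [
        i
        for i in range(1, len(polarity))
        if polarity[i - 1] is not None
        and polarity[i] is not None
        and polarity[i - 1] < 0 < polarity[i]
    ]


if __name__ == "__main__":
    forward = [80, 70, 20, 30]   # toy forward-strand counts per 1 kb bin
    reverse = [20, 30, 80, 70]   # toy reverse-strand counts per 1 kb bin
    pol = read_polarity(forward, reverse)
    print(pol, negative_to_positive_switches(pol))
```

In real data the polarity track would typically be smoothed before calling switches, since single noisy bins can otherwise produce spurious sign changes.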

Absolute TrAEL-seq read density is largely uniform across the single-copy genome, except for pronounced dips at each ARS (Fig 3D), suggesting that TrAEL-seq signals are primarily derived from active replication forks with little underlying noise. If so, then TrAEL-seq signals should vary across the cell cycle. However, as with other sequencing methods, quantitative comparison of total TrAEL-seq signal between libraries is not straightforward, as there is no relationship between total read count in a library and amount of substrate in the original sample. To allow such comparisons, we modified the TrAEL-seq pipeline such that 2 samples are barcoded at an early stage and then pooled for processing, sequencing, and postprocessing as a single sample. This approach maintains the absolute ratio of substrate between the 2 samples, allowing quantitative comparison.

We applied this method to compare cells arrested in G1 using α-factor to cells from the same culture after release into S-phase. Two variants of TrAEL-seq adaptor 1 with unique barcodes were ligated to the G1 and G1->S samples which were then pooled, and in each experiment, we performed 2 technical replicates with the barcodes swapped to ensure that no quantitative differences emerged from the adaptors themselves. Two biological replicate experiments yielded essentially identical results, with the TrAEL-seq read count across single-copy regions being dramatically higher in the G1->S samples than in the G1-arrested samples. To illustrate both absolute read quantity and strand bias, we plotted the read counts on forward and reverse strands separately across chromosome V (Fig 3E). S-phase samples show strong signals that phase between forward and reverse reads across the chromosome, whereas signals from G1 cells are almost undetectable. Furthermore, the phasing between forward and reverse matches the read polarity variation of unsynchronised samples (compare Fig 3B and 3E). This experiment shows that TrAEL-seq signals primarily arise from active DNA replication forks and are very low in nonreplicating cells.
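The logic of the barcode-swap control can be made concrete with a small calculation. Because both samples are pooled before processing, the ratio of reads carrying each barcode reflects the ratio of substrate. The sketch below combines two technical replicates with swapped barcodes by taking the geometric mean of their ratios; this way of combining replicates is an assumption for illustration and is not stated in the text, and all names are hypothetical.

```python
from math import sqrt


def swap_averaged_ratio(replicate_1, replicate_2):
    """Combine S-phase/G1 read ratios from two barcode-swapped replicates.

    Each replicate is a (s_phase_reads, g1_reads) tuple of read counts
    from one pooled library. If one barcode is favoured by a factor b,
    the first ratio is inflated by b and the second deflated by b, so
    the geometric mean cancels the bias.
    """
    ratio_1 = replicate_1[0] / replicate_1[1]
    ratio_2 = replicate_2[0] / replicate_2[1]
    return sqrt(ratio_1 * ratio_2)


if __name__ == "__main__":
    # Toy counts: true ratio 4, barcode bias of 2 in opposite directions.
    print(swap_averaged_ratio((800, 100), (200, 100)))
```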

Phasing of read polarity was also noted in wild-type samples profiled by GLOE-seq but only weakly, whereas TrAEL-seq libraries display very strong read polarity differences that are highly reproducible and yield essentially identical replication profiles (Fig 3A and 3E, S3C Fig) [40]. As Sriramachandran and colleagues noted for GLOE-seq [40], the read polarity of this replication signal is opposite to what would be expected from labelling of 3′ ends in normal forks. There should never be fewer 3′ ends on the lagging strand than the leading strand, yet up to 90% of TrAEL-seq reads emanate from the leading strand. To explain the GLOE-seq signal, Sriramachandran and colleagues suggested that GLOE-seq labels sites at which DNA is nicked during removal of misincorporated ribonucleotides [40]. To test this idea, we generated TrAEL-seq libraries from rnh201Δ and rnh202Δ mutants that lack key components of RNase H2, the main enzyme that cleaves DNA at misincorporated ribonucleotides, along with a wild-type control [71,72]. Strikingly, read polarity in these mutants is equivalent to wild type, showing that the leading strand bias of TrAEL-seq reads is not caused by RNase H2 and therefore is unlikely to arise through excision of misincorporated ribonucleotides (Fig 3G and S3D Fig). It is also possible that TrAEL-seq (and indeed GLOE-seq) signals arise when the replication machinery encounters Top1 cleavage complexes [73], but we saw no reduction in TrAEL-seq polarity or signal in top1Δ cells (Fig 3G and S3D Fig). One further observation in this regard is that END-seq data show a polarity bias, albeit weak, that parallels the polarity bias in TrAEL-seq data generated from the same cells (S3E Fig). This suggests that double-stranded ends are also formed during normal replication, although these faint signals could also arise through cleavage of the delicate single-stranded regions of replication forks during processing.

We then asked if an equivalent strand bias is observed in the hESC libraries. The limited read coverage in these libraries only allowed read polarity to be determined in 250 kb windows, but nonetheless, a striking variation was observed across the genome (Fig 3H). Importantly, these profiles were very similar between technical and biological replicates and cannot therefore simply result from noise; this can be observed across defined genomic regions but is also clear in a scatter plot, which shows that the average read polarity within each window correlates between the datasets (R = 0.84, S3F and S3G Fig). Furthermore, comparison to GLOE-seq results from a LIG1-depleted human cell line that is defective in Okazaki fragment ligation again revealed a striking similarity to the hESC TrAEL-seq data, although with the opposite polarity (Fig 3H and S3H Fig) [40]. Interestingly, a subset of origins were reproducibly detected in hESC samples but absent in the HCT116 data, consistent with evidence that origin usage differs between these cell lines (Fig 3H, green arrows) [24].

We therefore conclude that TrAEL-seq primarily detects processive replication forks and does so with exceptionally high signal-to-noise. TrAEL-seq profiles are highly reproducible and can be obtained from wild-type cells without need for cell synchronisation, sorting, or labelling. The 3′ ends detected by TrAEL-seq correspond to the leading rather than the lagging strand, despite the fact that many more 3′ ends occur on the lagging strand, and we suggest that these 3′ ends are exposed by replication fork reversal occurring either in vivo or during sample processing (see Discussion).

### Environmental impacts on replication timing and fork progression

Finally, we asked whether TrAEL-seq can reveal replication changes or DNA damage, and in particular whether we can detect collisions between transcription and replication machineries.

Since all the yeast libraries generated up to this point had yielded essentially identical DNA replication profiles outside the rDNA, we were first keen to ensure that changes in replication profile are indeed detectable. We therefore examined cells lacking Clb5, a yeast cyclin B that plays a key role in the activation of late-firing replication origins [74]. The TrAEL-seq profile of clb5Δ was very similar to wild type across most of the genome, but certain origins were clearly absent or strongly repressed, resulting in extended tracts of DNA synthesis from adjacent origins visible as regions of very different polarity (Fig 4A, green arrows, S4A Fig). This is as predicted for clb5Δ mutants and confirms that TrAEL-seq is indeed sensitive to changes in replication profile.
