Next Generation (Next-Gen) sequencing is a newer form of technology capable of determining DNA sequences the size of a bacterial genome. Despite this capacity to sequence millions of bases, Next-Gen technology is not always efficient or cost-effective; sequencing smaller fragments of DNA is one example. Sometimes researchers prefer to sequence a single gene to determine its function or to fill in gaps left by the more advanced technologies. Primer walking is a common practice still employed for sequencing smaller regions of DNA. Capillary sequencers have made this kind of sequencing faster and extended read lengths. However, the basic process of primer walking and gene assembly remains the same.
Isolating the DNA Fragment
The first step in sequencing a selected DNA fragment is to isolate the fragment from genomic DNA. Isolation is often performed by amplifying copies of the chosen region using PCR (Polymerase Chain Reaction). PCR produces copies of (amplifies) a region of DNA defined by short single-stranded oligonucleotides called primers. One primer anneals at the start point on the forward strand and the other at the start point on the reverse strand, so amplification proceeds in both directions. Amplification by PCR produces millions of copies of a given region of DNA. However, it requires knowledge of the sequence flanking the region on both sides in order to select primers for amplification.
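The way two primers bound the amplified region can be sketched in Python. This is a minimal illustration, not a PCR simulator: the function names, the toy template, and both primer sequences are made up for the example, and it assumes exact primer matches on a short string.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def amplicon(template, forward_primer, reverse_primer):
    """Return the region of the forward strand bounded by the two primers.

    The reverse primer anneals to the forward strand, so its binding site
    reads as the primer's reverse complement in the forward-strand text.
    Returns None if either site is missing.
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse_primer)
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Hypothetical toy data: real primers are usually around 18 to 25 bases long.
template = "TTTTATGGCACCCGGGAAAGATTCCTTTT"
product = amplicon(template, "ATGGCA", "GGAATC")
print(product)
```

Both primer sequences have to be known in advance, which is why the sequence on either side of the target region must already be available.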
Researchers sometimes choose to sequence the DNA fragment (PCR product) directly. However, many researchers prefer to clone the fragment into a bacterial plasmid first.
Amplifying the DNA Fragment in a Bacterial Plasmid
Once the selected DNA has been amplified by PCR, it is inserted into a bacterial plasmid. The plasmid is a circular DNA molecule with known sequence. The circular DNA is cut with restriction enzymes, allowing the unknown DNA fragment to be inserted. Most commercial plasmids contain universal sites to which universal primers anneal, so researchers can sequence the inserted DNA in the forward and reverse directions. Capillary sequencers typically generate 800 to 900 bases for each primer, so a single set of sequencing reactions yields 1,600 to 1,800 total bases of data. This may cover the entire insert for smaller DNA fragments. However, genes are often over 5,000 bases. Therefore, one set of sequence results would not complete the sequencing of the entire fragment of DNA.
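The coverage arithmetic above can be checked with a short Python sketch. The function name is hypothetical, and the calculation ignores the few vector bases read before the insert begins.

```python
def uncovered_bases(insert_length, read_length=800):
    """Bases left unsequenced after one forward and one reverse read.

    Assumes both universal-primer reads yield read_length usable bases.
    """
    return max(0, insert_length - 2 * read_length)


print(uncovered_bases(1500))       # small insert, 800-base reads
print(uncovered_bases(5000))       # 5,000-base insert, 800-base reads
print(uncovered_bases(5000, 900))  # 5,000-base insert, 900-base reads
```

Any insert for which the result is greater than zero needs further reads from primers designed inside the insert.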
Sequence Results Provide Templates for Primers
Gene assembly and primer walking involve using known sequence to select primers for additional sequence data. A result generated from a universal primer provides the sequence template for designing the next primer. For a result with 800 bases of quality sequence, the primer will likely be selected around the 700-base region. The remaining 100 bases allow overlap with the newly generated sequence. Primer walking continues until the forward sequence results intersect with the reverse sequence results. A 5,000-base insert would likely need 5 to 8 results to achieve overlap between the forward and reverse directions. Researchers may sequence both the forward and reverse strands completely, providing the entire sequence of the double-stranded molecule and confirming the accuracy of all the bases.
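A rough Python sketch of one walking step and of the read count follows. The function names are hypothetical, the 20-base primer length is an assumed typical value, and the model is deliberately simple: every read is the same length, and each later read extends coverage by the distance from the start of the previous read to the primer. Real primer design also weighs melting temperature, GC content, and sequence quality, none of which is modelled here.

```python
def pick_walking_primer(read, primer_start=700, primer_length=20):
    """Return the stretch of known sequence to use as the next primer.

    primer_start follows the text (around base 700 of an 800-base read);
    primer_length=20 is an assumption for illustration.
    """
    if len(read) < primer_start + primer_length:
        raise ValueError("read is too short to place a primer there")
    return read[primer_start:primer_start + primer_length]


def estimate_reads(insert_length, read_length=800, primer_start=700):
    """Count the reads needed before forward and reverse walks meet.

    The first read from each end covers read_length bases; each later
    read, from either end, adds primer_start bases of new sequence.
    """
    reads = 2                   # one universal-primer read per direction
    covered = 2 * read_length
    while covered < insert_length:
        reads += 1              # one more walking step
        covered += primer_start
    return reads


print(estimate_reads(5000))            # 800-base reads, primer near 700
print(estimate_reads(5000, 900, 800))  # 900-base reads, primer near 800
```

Under these assumptions the estimate for a 5,000-base insert falls within the 5 to 8 results mentioned above. Sequencing both strands completely would require more reads than this minimum.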
Software Assembles the Sequence Results
Commercial software programs are available that assemble sequence results in the order of their overlapping regions, resulting in a consensus sequence. Many of these programs use chromatogram data, which allows base calls to be reviewed against their peaks and corrected if necessary. The chromatogram is a view of the actual data showing the quality of peaks generated during electrophoresis. Some programs simply use the text of the base calls, but this does not allow as detailed a review of the results.
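The text-only approach can be illustrated with a minimal Python sketch that joins reads by their overlapping ends. It is not how any particular commercial program works: the function names and toy reads are made up, the reads are assumed to be listed in walking order and already oriented to the same strand (reverse-direction reads would first need to be reverse complemented), and overlaps must match exactly because no quality data is used.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of left that equals a prefix of right."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def assemble(reads, min_overlap=3):
    """Merge reads, given in walking order, into one consensus string."""
    consensus = reads[0]
    for read in reads[1:]:
        size = overlap_length(consensus, read, min_overlap)
        if size == 0:
            raise ValueError("no overlap found; reads cannot be joined")
        consensus += read[size:]  # append only the new, non-overlapping bases
    return consensus


# Hypothetical toy reads; real reads are hundreds of bases long.
reads = ["ATGGCACCCG", "CCCGGGAAAG", "AAAGATTCC"]
print(assemble(reads))
```

Because the sketch requires exact matches, a single miscalled base in an overlap would break the join, which is one reason reviewing the chromatogram peaks is valuable.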