
Summer Research Internship in Advanced Bioinformatics: Principles & Workflow of NGS
Barcode Biotechnology
Overview
This video explains the principles and workflow of Next-Generation Sequencing (NGS), focusing on the Illumina platform. It covers the entire process, from sample preparation (DNA fragmentation and adapter ligation) through cluster generation to the sequencing-by-synthesis method. The explanation covers how fluorescently labeled nucleotides are incorporated one by one and detected, and how the label is then removed so the next base can be read. Finally, it briefly touches on setting up a Linux environment (Ubuntu) on Windows for data analysis.
Chapters
- Illumina sequencing is a high-throughput DNA sequencing technique that generates millions to billions of short DNA reads.
- It's also known as sequencing-by-synthesis because it reads DNA by adding one nucleotide (base) at a time to a growing strand.
- The core principle involves incorporating a fluorescently labeled nucleotide, which emits a specific color detected by a sensor, allowing identification of the base.
Understanding the fundamental principle of sequencing-by-synthesis is crucial for grasping how vast amounts of DNA data are generated efficiently.
In the video's illustration, each nucleotide emits a distinct color when incorporated (T red, G green, C blue, A pink), which is how the base is identified.
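The color-to-base idea can be sketched in a few lines of Python. This is only an illustration: the color scheme is the one used in the video, real instruments may use different dyes and channel assignments, and the names `COLOR_TO_BASE` and `call_bases` are hypothetical.

```python
# Color scheme taken from the video's illustration (an assumption for
# this sketch; actual instruments may use different dyes and channels).
COLOR_TO_BASE = {"red": "T", "green": "G", "blue": "C", "pink": "A"}


def call_bases(colors):
    """Translate one detected color per cycle into a base string.

    Any signal that is not in the table becomes 'N' (no call).
    """
    return "".join(COLOR_TO_BASE.get(color, "N") for color in colors)


signals = ["red", "pink", "green", "blue", "violet"]
print(call_bases(signals))  # TAGCN
```

Each list element stands for the color seen in one cycle, so the position in the list is the position in the read.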
- The workflow begins with DNA/RNA extraction, followed by fragmentation into smaller pieces.
- Adapters are added to the fragmented DNA, and then cluster generation occurs.
- Bridge amplification increases the number of copies of each DNA fragment; the copies are then denatured into single strands before the sequencing process itself.
- The final step is data interpretation, with optional paired-end sequencing for more comprehensive data.
This overview provides a roadmap of the entire NGS process, highlighting the sequential steps involved in preparing and sequencing DNA.
The complete workflow includes sample preparation, cluster generation, sequencing-by-synthesis, and data interpretation.
- Genomic DNA is very large, so it must be fragmented into manageable short pieces (e.g., 200-500 base pairs) for Illumina sequencing.
- Fragmentation methods include mechanical (sonication, nebulization, hydrodynamic shearing) and enzymatic (using enzymes such as restriction enzymes, non-specific endonucleases, or transposases).
- Mechanical methods use physical forces, while enzymatic methods use specific enzymes to cut DNA.
- Fragmentation helps achieve uniform coverage across the genome and makes the DNA compatible with high-throughput sequencing.
Proper fragmentation is essential because Illumina platforms are designed for short reads; incorrect fragmentation can lead to poor data quality and coverage.
Sonication uses high-frequency sound waves to break DNA into smaller fragments.
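As a rough illustration of fragmenting a long sequence into pieces in the 200-500 bp range mentioned above, here is a minimal Python sketch. It is a toy model only: real shearing is not a left-to-right walk with uniformly random cut sizes, and the function name `shear` and its parameters are hypothetical.

```python
import random


def shear(sequence, min_len=200, max_len=500, seed=0):
    """Cut `sequence` into consecutive pieces of random length.

    Every piece except possibly the last has a length in
    [min_len, max_len]; the last piece holds whatever remains.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    fragments = []
    pos = 0
    while pos < len(sequence):
        size = rng.randint(min_len, max_len)
        fragments.append(sequence[pos:pos + size])
        pos += size
    return fragments


# A made-up 5,000-base "genome" purely for demonstration.
genome = "".join(random.Random(1).choice("ACGT") for _ in range(5000))
fragments = shear(genome)
print(len(fragments), [len(f) for f in fragments[:3]])
```

Joining the fragments back together reproduces the input, which is a quick sanity check that no bases were lost or duplicated.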
- After fragmentation, DNA fragments are size-selected and their quality is assessed using tools like TapeStation or Bioanalyzer.
- Quality control checks parameters like average fragment size, peak size, and concentration.
- Adapters are ligated to both ends of each DNA fragment; they are needed for binding to the flow cell and for PCR amplification.
- These adapters also contain barcodes for multiplexing, allowing multiple samples to be sequenced together.
Quality control ensures that the DNA fragments are of the correct size and concentration, preventing issues in downstream steps and ensuring reliable sequencing results.
TapeStation uses microfluidic technology to provide a digital, precise, and fast analysis of DNA fragment size and concentration, often visualized as a graph or gel image.
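The quality-control parameters listed above (average fragment size and peak size) can be sketched as a small calculation. This is not how TapeStation or Bioanalyzer software works internally; it is a simplified Python example on a made-up list of fragment sizes, and the function names, bin width, and 200-500 bp acceptance window are assumptions.

```python
from collections import Counter
from statistics import mean


def summarize_fragments(sizes_bp, bin_width=50):
    """Return the average size and the most populated size bin (a crude 'peak')."""
    bins = Counter((size // bin_width) * bin_width for size in sizes_bp)
    peak_start, _count = bins.most_common(1)[0]
    return {
        "average_bp": mean(sizes_bp),
        "peak_bin_bp": (peak_start, peak_start + bin_width),
    }


def passes_size_qc(summary, low=200, high=500):
    """Accept the library if its average size lies inside the target window."""
    return low <= summary["average_bp"] <= high


sizes = [210, 260, 310, 320, 330, 340, 410, 480]  # illustrative values only
summary = summarize_fragments(sizes)
print(summary)                  # average 332.5 bp, peak bin 300-350 bp
print(passes_size_qc(summary))  # True
```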
- Single-stranded DNA fragments with adapters bind to complementary oligonucleotide sequences immobilized on a flow cell surface.
- Bridge amplification is a process where DNA fragments are repeatedly amplified in situ on the flow cell, creating clusters of identical DNA molecules.
- This process involves repeated denaturation and extension, forming bridge-like structures that eventually yield on the order of a thousand identical DNA copies per cluster, with millions of clusters across the flow cell.
- Each cluster originates from a single DNA molecule, ensuring that all sequences within a cluster are identical.
Cluster generation is vital for amplifying the signal during sequencing; a sufficient number of identical DNA copies in each cluster is needed for reliable detection of fluorescent signals.
A single DNA fragment binds to the flow cell, then polymerase extends it, forming a bridge-like structure that is amplified to create a dense cluster of identical fragments.
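To see why a modest number of amplification cycles is enough to build a dense cluster, consider an idealized model in which every cycle doubles the number of strands. Real bridge amplification is less efficient than this, so the numbers are an upper bound; the function names and target values are hypothetical.

```python
def copies_after(cycles):
    """Copies of one starting molecule after `cycles` perfect doublings."""
    return 2 ** cycles


def cycles_needed(target_copies):
    """Smallest number of perfect doublings that reaches `target_copies`."""
    cycles = 0
    while copies_after(cycles) < target_copies:
        cycles += 1
    return cycles


print(copies_after(10))          # 1024
print(cycles_needed(1_000))      # 10
print(cycles_needed(1_000_000))  # 20
```

Because growth is exponential, going from a thousand to a million copies costs only ten more ideal cycles.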
- During sequencing, fluorescently labeled nucleotides (dNTPs) are supplied in cycles, and only one nucleotide is incorporated per strand in each cycle.
- Each nucleotide has a reversible terminator and a fluorescent dye that emits a specific color.
- After a nucleotide is incorporated, its fluorescent signal is detected by a camera, and the dye and terminator are chemically removed.
- This cycle repeats, allowing the DNA sequence to be read base by base, generating millions of reads simultaneously.
This is the core of the sequencing process, where the actual DNA sequence is determined by detecting the color emitted by each incorporated nucleotide.
In one cycle, a 'T' nucleotide with a red fluorescent label is incorporated, emits red light, and then its label and terminator are removed before the next cycle begins.
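The incorporate, detect, and remove cycle can be mimicked with a short Python simulation. It is a simplified sketch: strand direction, errors, and signal decay are ignored, the colors follow the video's illustration, and `sequence_by_synthesis` is a hypothetical name.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Colors as shown in the video's illustration (assumed for this sketch).
BASE_TO_COLOR = {"T": "red", "G": "green", "C": "blue", "A": "pink"}


def sequence_by_synthesis(template, cycles):
    """Simulate `cycles` rounds of incorporate -> image -> cleave.

    Returns the synthesized read and the color observed in each cycle.
    """
    read = []
    observed_colors = []
    for position in range(min(cycles, len(template))):
        # Incorporate: only the nucleotide complementary to the template
        # pairs, and its terminator stops extension at one base per cycle.
        base = COMPLEMENT[template[position]]
        # Image: the camera records the color of the incorporated dye.
        observed_colors.append(BASE_TO_COLOR[base])
        # Cleave: dye and terminator are removed, so the loop can
        # move on to the next template position.
        read.append(base)
    return "".join(read), observed_colors


read, colors = sequence_by_synthesis("ATGC", cycles=3)
print(read)    # TAC
print(colors)  # ['red', 'pink', 'blue']
```

The first cycle matches the example in the text: a T is incorporated opposite an A in the template and a red signal is recorded.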
- After sequencing, the raw data is typically analyzed in a Linux environment.
- A Linux environment (like Ubuntu) can be installed on Windows using the Windows Subsystem for Linux (WSL).
- This involves enabling Windows features, downloading Ubuntu from the Microsoft Store, and setting up a username and password.
- Further steps include installing the Conda package manager, which is then used to install bioinformatics tools.
Setting up a suitable computational environment is essential for processing and analyzing the large datasets generated by NGS, enabling downstream biological interpretation.
Downloading and installing Ubuntu from the Microsoft Store and then using commands within Ubuntu to install the Conda package manager.
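Once Ubuntu is running, a quick way to confirm the setup is to check from Python whether the session is inside WSL and whether `conda` is on the PATH. This is a hedged sketch: it relies on the assumption that WSL kernels include "microsoft" in their release string, and the helper names are hypothetical.

```python
import platform
import shutil


def looks_like_wsl(kernel_release):
    """Heuristic: WSL kernel release strings typically contain 'microsoft'."""
    return "microsoft" in kernel_release.lower()


def conda_available():
    """True if a `conda` executable can be found on the PATH."""
    return shutil.which("conda") is not None


print("Running under WSL:", looks_like_wsl(platform.uname().release))
print("conda on PATH:", conda_available())
```

If the second line prints False after installing Conda, opening a new terminal so the updated shell configuration is loaded is usually the first thing to try.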
Key takeaways
- Illumina sequencing relies on a 'sequencing-by-synthesis' approach where DNA bases are identified by fluorescent signals emitted during nucleotide incorporation.
- The entire NGS workflow involves meticulous sample preparation, including fragmentation and adapter ligation, to make DNA suitable for high-throughput sequencing.
- Quality control at various stages, especially after fragmentation, is critical for ensuring the accuracy and reliability of sequencing data.
- Cluster generation amplifies single DNA molecules into clusters of identical copies (on the order of a thousand per cluster) on a flow cell, providing a strong enough signal for detection.
- In each sequencing cycle, one nucleotide is incorporated and its terminator blocks further extension; the signal is imaged, and the fluorescent tag and terminator are then removed before the next cycle.
- The process generates millions of short DNA reads that are then aligned to a reference or assembled, and analyzed to understand the genome.
- Bioinformatics analysis of NGS data often requires a specialized computational environment, such as Linux, which can be set up on Windows.
Key terms
- Next-Generation Sequencing (NGS)
- Illumina Sequencing
- Sequencing-by-Synthesis
- High-Throughput Sequencing
- DNA Fragmentation
- Adapter Ligation
- Cluster Generation
- Bridge Amplification
- Fluorescently Labeled Nucleotides
- Reversible Terminator
- Flow Cell
- TapeStation
- Windows Subsystem for Linux (WSL)
Test your understanding
- What is the fundamental principle behind Illumina's 'sequencing-by-synthesis' method?
- Why is DNA fragmentation a necessary first step in preparing samples for Illumina sequencing?
- How does bridge amplification contribute to the success of the sequencing process?
- Describe the role of fluorescently labeled nucleotides and reversible terminators in determining the DNA sequence.
- What are the key components of setting up a Linux environment for NGS data analysis on a Windows system?