Office Address:
Plot No. 16, Shakti Khand 3, Indirapuram, Ghaziabad - 201014
Email: nextgenlearn@gmail.com
Phone: +91 9310710211
NGS Data Analysis: 4-Week Program
Week 1 — RNA-Seq Data Analysis
Class 1: Introduction to NGS & NGS Data Types
Objective: Understand NGS technologies, applications, and data formats
✓ What is NGS? Sequencing platforms (Illumina, ONT, PacBio)
✓ Applications: Genomics, Transcriptomics, Metagenomics, Epigenomics
✓ NGS workflow overview
✓ Infrastructure requirements for NGS data analysis
✓ NCBI SRA/ENA databases and metadata
Hands-on:
✓ Linux basics (navigation, permissions, pipes, grep, awk)
✓ Navigating NCBI SRA and ENA
✓ Using the SRA Toolkit: prefetch, fasterq-dump
✓ Downloading sample datasets manually
Class 2: Introduction to RNA-seq & Experimental Design
Objective: Understand RNA-seq fundamentals
✓ Overview of RNA-seq: concepts, applications, and data generation
✓ Bulk vs. single-cell, library types, replicates, platforms
Hands-on: Download a publicly available RNA-seq dataset.
Class 3: RNA-seq QC & Preprocessing
Objective: Perform QC & trimming specifically for transcriptomics
✓ Introduction to QC and preprocessing tools and pipelines
✓ FASTQ structure and quality metrics
✓ Quality assessment using:
  - FastQC, fastp & MultiQC
✓ Key metrics to review: adapter content, overrepresented sequences, duplication rate, GC content, etc.
✓ Adapter and quality trimming
✓ Trimming tools: Trimmomatic, fastp, Trim Galore
Hands-on:
✓ FASTQ QC, read trimming with Trimmomatic/fastp, and post-trimming QC evaluation
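The FASTQ structure and quality metrics covered in this class can be illustrated with a small, dependency-free Python sketch. It assumes unwrapped 4-line records and Phred+33 quality encoding (the usual case for current Illumina data), and the read is made up. `trim_trailing` is a hypothetical helper that only mimics the idea of Trimmomatic's TRAILING step (cutting low-quality bases off the 3' end); it is not Trimmomatic's or fastp's actual code.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with exactly 4 lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"Malformed FASTQ record: {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"Sequence/quality length mismatch: {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Convert ASCII quality characters to Phred scores (Phred+33 by default)."""
    return [ord(ch) - offset for ch in quality]


def gc_content(sequence: str) -> float:
    """Fraction of G and C bases in the read."""
    seq = sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def trim_trailing(record: FastqRecord, min_quality: int = 20) -> FastqRecord:
    """Hypothetical helper: drop 3' bases while their quality is below min_quality."""
    scores = phred_scores(record.quality)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return FastqRecord(record.name, record.sequence[:end], record.quality[:end])


# Made-up read: 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
fastq_text = "@read1\nACGTGC\n+\nIIIII#\n"
for rec in read_fastq(io.StringIO(fastq_text)):
    scores = phred_scores(rec.quality)
    print(rec.name, scores, round(sum(scores) / len(scores), 2), round(gc_content(rec.sequence), 3))
    print(trim_trailing(rec).sequence)
```

For this read the sketch prints the per-base scores [40, 40, 40, 40, 40, 2], a mean quality of 33.67, a GC fraction of 0.667, and the trimmed sequence ACGTG (the Q2 base is removed).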
Class 4: Alignment & Quantification
Objective: Map reads & quantify gene expression
✓ Overview of alignment and quantification approaches: HISAT2, STAR (splice-aware alignment), Salmon/Kallisto (pseudo-alignment)
Hands-on:
✓ Alignment (HISAT2/STAR) and read counting with featureCounts / HTSeq
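featureCounts writes a tab-separated table: a comment line starting with `#`, then a header with Geneid, Chr, Start, End, Strand, Length and one count column per BAM file. The Python sketch below, assuming that default layout and using made-up numbers, turns such a table into per-gene counts and, for illustration only, converts them to TPM (transcripts per million).

```python
import csv
import io
from typing import TextIO


def read_featurecounts(handle: TextIO) -> tuple[dict[str, int], dict[str, dict[str, int]]]:
    """Parse a featureCounts table into gene lengths and per-sample raw counts."""
    lines = (line for line in handle if not line.startswith("#"))  # skip the program/command line
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    samples = header[6:]  # columns after Geneid, Chr, Start, End, Strand, Length
    lengths: dict[str, int] = {}
    counts: dict[str, dict[str, int]] = {sample: {} for sample in samples}
    for row in reader:
        gene = row[0]
        lengths[gene] = int(row[5])
        for sample, value in zip(samples, row[6:]):
            counts[sample][gene] = int(value)
    return lengths, counts


def tpm(counts: dict[str, int], lengths_bp: dict[str, int]) -> dict[str, float]:
    """Transcripts per million: normalize by gene length first, then scale to 1e6."""
    rpk = {gene: n / (lengths_bp[gene] / 1000) for gene, n in counts.items()}
    total = sum(rpk.values())
    return {gene: value / total * 1e6 for gene, value in rpk.items()}


# Made-up featureCounts-style output for one sample.
table = (
    "# Program:featureCounts; Command: (example)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tsample1.bam\n"
    "geneA\tchr1\t1\t1000\t+\t1000\t100\n"
    "geneB\tchr1\t2001\t6000\t-\t4000\t200\n"
    "geneC\tchr2\t1\t500\t+\t500\t50\n"
)
lengths, counts = read_featurecounts(io.StringIO(table))
print({gene: round(value) for gene, value in tpm(counts["sample1.bam"], lengths).items()})
```

This prints `{'geneA': 400000, 'geneB': 200000, 'geneC': 400000}`: geneB has the most reads but the longest gene, so its TPM is lowest. TPM is handy for comparing genes within a sample, but DESeq2 and edgeR expect the raw counts (which they normalize themselves), so the count matrix, not TPM, is the input for differential expression.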
Class 5: Differential Expression & Functional Analysis
Objective: Identify differentially expressed genes (DEGs) and interpret the results
✓ What differential expression is and why it matters in clinical diagnostics and disease interpretation
Hands-on:
✓ DESeq2, edgeR, volcano plots, GO/KEGG pathway analysis
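DESeq2 and edgeR report, for each gene, a log2 fold change and a p-value adjusted for multiple testing (Benjamini-Hochberg by default). The Python sketch below re-implements that adjustment and the usual volcano-plot labelling on made-up numbers. The cut-offs used here (|log2FC| ≥ 1, adjusted p < 0.05) are common conventions assumed for illustration, not fixed rules.

```python
import math


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping values monotone and capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify(log2fc: float, padj: float, lfc_cutoff: float = 1.0, alpha: float = 0.05) -> str:
    """Label a gene for a volcano plot using fold-change and adjusted p-value cut-offs."""
    if padj < alpha and log2fc >= lfc_cutoff:
        return "up"
    if padj < alpha and log2fc <= -lfc_cutoff:
        return "down"
    return "not significant"


# Made-up results for four hypothetical genes.
genes = ["gene1", "gene2", "gene3", "gene4"]
log2fc = [2.5, -1.8, 0.4, 3.0]
pvals = [0.001, 0.01, 0.02, 0.5]
padj = benjamini_hochberg(pvals)
for gene, lfc, q in zip(genes, log2fc, padj):
    # Volcano plot coordinates: x = log2 fold change, y = -log10(adjusted p-value).
    print(gene, lfc, round(-math.log10(q), 2), classify(lfc, q))
```

Here gene3 is significant after adjustment but its fold change is too small, and gene4 has a large fold change but is not significant; both end up as "not significant", which is exactly the distinction a volcano plot makes visible.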
Assignment for week 1: Download a publicly available RNA-seq dataset, perform complete preprocessing (QC + trimming), align or quantify the reads (HISAT2/STAR or Salmon/Kallisto), and generate a final count matrix along with a brief summary of differential expression and pathway results.
Week 2 — Whole Exome Sequencing (WES) Analysis: Germline & Somatic Variants
Class 1: Introduction to WES, Target Capture & Applications
Objective: Understand WES workflows and applications
✓ Overview of WES and its target enrichment (exome capture) strategies.
✓ Applications of WES in disease research, diagnostics, and personalized medicine.
✓ Comparison of WES with whole-genome sequencing (WGS).
Class 2: QC, Trimming & Alignment (BWA-MEM)
Objective: Preprocess and align exome data
✓ Overview of paired-end sequencing data and the differences between germline and somatic workflows.
✓ FastQC, Trimmomatic, BWA-MEM
✓ Post-alignment processing: sorting, marking duplicates, and base quality score recalibration (BQSR) using GATK.
✓ Introduction to SAM/BAM file formats and their handling.
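A SAM record is one tab-separated line with 11 mandatory columns; the bitwise FLAG column encodes pairing, strand, secondary/supplementary status and the duplicate bit (0x400) that duplicate marking sets. The Python sketch below decodes those fields for a single made-up line and computes where the alignment ends on the reference from its CIGAR string. It is illustrative only; in practice samtools (or a library such as pysam) does this.

```python
import re

# Bit meanings of the SAM FLAG field, per the SAM format specification.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}
SAM_COLUMNS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
               "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def decode_flag(flag: int) -> list[str]:
    """List the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


def parse_sam_line(line: str) -> dict:
    """Map the 11 mandatory SAM columns to names; optional tags are ignored."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_COLUMNS, fields[:11]))
    record["FLAG"] = int(record["FLAG"])
    record["POS"] = int(record["POS"])
    return record


def reference_end(pos: int, cigar: str) -> int:
    """1-based inclusive end coordinate of an alignment on the reference."""
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return pos + ref_len - 1


# Made-up alignment line: 3 soft-clipped bases, 5 matches, a 2-bp deletion, 4 matches.
line = "read1\t99\tchr1\t100\t60\t3S5M2D4M\t=\t300\t212\tACGTACGTACGT\tIIIIIIIIIIII"
rec = parse_sam_line(line)
print(decode_flag(rec["FLAG"]))
print(reference_end(rec["POS"], rec["CIGAR"]))
```

FLAG 99 decodes to paired, proper pair, mate on the reverse strand, and first in pair; the alignment spans reference positions 100 to 110, because soft clips do not consume the reference while the deletion does.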
Class 3: Germline & Somatic Variant Calling (GATK)
Objective: Call both germline and somatic variants
✓ Germline: HaplotypeCaller, joint genotyping
✓ Somatic: Mutect2, Panel of Normals (PoN), and filtering and refinement of somatic variant calls.
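Both callers write VCF. For tumor-normal somatic calls, one quick sanity check is the variant allele fraction (VAF) in each sample: a real somatic variant should have clear alt-read support in the tumor and little or none in the normal. The Python sketch below assumes a FORMAT column containing AD (reference depth first, then one depth per ALT allele) and uses a made-up record; the normal-then-tumor sample order is also an assumption, since real files name samples in the `#CHROM` header line.

```python
def parse_vcf_line(line: str) -> dict:
    """Split one VCF data line into fixed fields plus per-sample FORMAT dicts."""
    cols = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, _qual, filt, _info, fmt = cols[:9]
    keys = fmt.split(":")
    samples = [dict(zip(keys, sample.split(":"))) for sample in cols[9:]]
    return {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt.split(","),
            "filter": filt, "samples": samples}


def allele_fractions(sample: dict) -> list[float]:
    """Variant allele fraction per ALT allele from AD: alt depth / total depth."""
    depths = [int(x) for x in sample["AD"].split(",")]
    total = sum(depths)
    return [d / total if total else 0.0 for d in depths[1:]]


# Made-up tumor-normal record (assumed sample order: normal, tumor).
line = "chr1\t12345\t.\tC\tT\t.\tPASS\t.\tGT:AD:DP\t0/0:40,0:40\t0/1:30,10:40"
record = parse_vcf_line(line)
normal, tumor = record["samples"]
print(record["filter"], allele_fractions(normal), allele_fractions(tumor))
```

This prints `PASS [0.0] [0.25]`: 10 of 40 tumor reads carry the T allele and none of the normal reads do.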
Class 4: Variant Annotation, Prioritization & Reporting
Objective: Interpret variants and build reports
✓ Annotate variants using ANNOVAR, VEP, and SnpEff to determine their functional impact.
✓ Interpret pathogenicity scores such as SIFT, PolyPhen, CADD, and MutationTaster.
✓ Use key databases (gnomAD, ClinVar, COSMIC) to understand population frequency and clinical relevance.
✓ Prioritize important variants based on pathogenicity, frequency, and functional significance, and review them in IGV.
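The prioritization step can be sketched as a filter-and-rank pass over already-annotated variants. In this Python sketch the record layout, the 1% gnomAD frequency cut-off, the set of "damaging" consequence terms (Sequence Ontology terms as used by VEP) and the ranking order are all illustrative assumptions; real strategies depend on the disease model (for example, dominant vs. recessive inheritance) and are usually more nuanced. The genes are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed set of consequence terms treated as potentially damaging in this sketch.
DAMAGING = {"stop_gained", "frameshift_variant", "splice_acceptor_variant",
            "splice_donor_variant", "start_lost", "missense_variant"}
PATHOGENIC = {"Pathogenic", "Likely_pathogenic", "Pathogenic/Likely_pathogenic"}
BENIGN = {"Benign", "Likely_benign", "Benign/Likely_benign"}


@dataclass
class AnnotatedVariant:
    gene: str
    consequence: str
    gnomad_af: Optional[float]  # None = absent from gnomAD
    cadd_phred: Optional[float]
    clinvar: Optional[str]


def prioritize(variants: list[AnnotatedVariant], max_af: float = 0.01) -> list[AnnotatedVariant]:
    """Keep rare, damaging, non-benign variants; rank ClinVar-pathogenic first, then by CADD."""
    kept = [
        v for v in variants
        if (v.gnomad_af is None or v.gnomad_af < max_af)
        and v.consequence in DAMAGING
        and v.clinvar not in BENIGN
    ]
    return sorted(kept, key=lambda v: (v.clinvar not in PATHOGENIC, -(v.cadd_phred or 0.0)))


# Made-up annotated variants.
variants = [
    AnnotatedVariant("GENE1", "missense_variant", 0.0001, 25.0, None),
    AnnotatedVariant("GENE2", "synonymous_variant", None, 5.0, None),
    AnnotatedVariant("GENE3", "stop_gained", 0.00002, 38.0, "Pathogenic"),
    AnnotatedVariant("GENE4", "missense_variant", 0.15, 22.0, "Benign"),
    AnnotatedVariant("GENE5", "frameshift_variant", None, None, None),
]
print([v.gene for v in prioritize(variants)])
```

This prints `['GENE3', 'GENE1', 'GENE5']`: the synonymous variant and the common benign variant are dropped, the ClinVar-pathogenic stop-gain ranks first, and the rest are ordered by CADD score (a missing score is treated as 0).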
Assignment for week 2: Download a publicly available WES dataset and complete the full workflow: run QC + trimming, align reads with BWA-MEM, call either germline or somatic variants using GATK, and finally annotate and prioritize key variants using ANNOVAR/VEP/SnpEff, summarizing the biologically relevant ones in a short report.
Week 3 — Metagenomic Data Analysis
Class 1: Introduction to Metagenomics
Objective: Understand metagenomics concepts & experimental workflows
✓ Shotgun metagenomics vs. 16S/ITS amplicon sequencing
✓ Microbiome study design
✓ Challenges: contamination, low biomass, host reads
Hands-on: Explore metagenomic datasets in SRA
Class 2: QC, Host Removal & Taxonomic Profiling
Objective: Clean metagenomic reads and classify organisms
✓ QC using FastQC/fastp
✓ Host read removal using Bowtie2
✓ Taxonomic classification:
  - Kraken2, MetaPhlAn, Kaiju
Hands-on:
✓ Perform host-read filtering
✓ Run Kraken2/MetaPhlAn classification
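Kraken2's report (default six-column layout, not the `--report-minimizer-data` variant) lists, per taxon: percentage of reads in the clade, reads in the clade, reads assigned directly, a rank code, the NCBI taxonomy ID, and the indented scientific name. The Python sketch below pulls the top species-level taxa from such a report; the excerpt is made up and truncated (intermediate ranks omitted).

```python
import io
from typing import TextIO


def top_species(report: TextIO, n: int = 5) -> list[tuple[str, int, float]]:
    """Return the n species with the most clade-assigned reads from a Kraken2-style report."""
    species = []
    for line in report:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 6:
            continue
        percent, clade_reads, _direct, rank, _taxid, name = cols[:6]
        if rank == "S":  # exact species rank; sub-species levels are coded S1, S2, ...
            species.append((name.strip(), int(clade_reads), float(percent)))
    return sorted(species, key=lambda s: s[1], reverse=True)[:n]


# Truncated, made-up report excerpt.
report = io.StringIO(
    " 12.50\t125\t125\tU\t0\tunclassified\n"
    " 87.50\t875\t5\tR\t1\troot\n"
    " 20.00\t200\t200\tS\t1280\t            Staphylococcus aureus\n"
    " 60.00\t600\t10\tS\t562\t            Escherichia coli\n"
)
for name, reads, pct in top_species(report):
    print(f"{name}\t{reads}\t{pct:.1f}%")
```

Sorting by clade reads rather than directly assigned reads matters: in this excerpt most E. coli reads sit below the species node, so ranking by the third column would understate it.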
Class 3: Functional Profiling & Diversity Analysis
Objective: Explore functional capabilities and diversity metrics
✓ HUMAnN3 for functional/metabolic profiling
✓ Diversity metrics: alpha and beta diversity
✓ Visualization: Krona, heatmaps, PCA
✓ QIIME 2 workflows
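Alpha and beta diversity reduce to short formulas over a taxon count table. The Python sketch below computes two common alpha-diversity indices (Shannon and Gini-Simpson) and one common beta-diversity measure (Bray-Curtis dissimilarity) on made-up counts. QIIME 2 and R packages provide these and many more, often after rarefaction or other normalization, which this sketch omits.

```python
import math


def shannon(counts: list[int]) -> float:
    """Shannon index H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    # Written as 0.0 - ... so a single-taxon sample gives 0.0 rather than -0.0.
    return 0.0 - sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts: list[int]) -> float:
    """Gini-Simpson index 1 - sum(p_i^2): chance two random reads come from different taxa."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def bray_curtis(a: list[int], b: list[int]) -> float:
    """Bray-Curtis dissimilarity between two samples over the same ordered taxa."""
    return sum(abs(x - y) for x, y in zip(a, b)) / sum(x + y for x, y in zip(a, b))


# Made-up counts for four taxa in two samples.
even = [10, 10, 10, 10]
dominated = [40, 0, 0, 0]
print(round(shannon(even), 3), round(simpson(even), 3))
print(round(shannon(dominated), 3), round(simpson(dominated), 3))
print(bray_curtis(even, dominated))
```

The evenly spread sample has the maximum Shannon value for four taxa (ln 4 ≈ 1.386) and a Gini-Simpson index of 0.75, the single-taxon sample scores 0 on both, and the two samples are 0.75 apart on the Bray-Curtis scale (0 = identical, 1 = no shared taxa).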
Hands-on:
Class 4: Metagenomic Assembly & Interpretation
Objective: Assemble metagenomes and interpret biological meaning
✓ Metagenomic assembly using MEGAHIT / metaSPAdes
✓ Binning: MaxBin2, MetaBAT
✓ Quality check of metagenome-assembled genomes (MAGs): CheckM
✓ Reporting results
Hands-on:
✓ Assemble a sample using MEGAHIT
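CheckM reports an estimated completeness and contamination (in %) for each bin. A common way to turn those two numbers into quality tiers is the MIMAG standard (Minimum Information about a Metagenome-Assembled Genome). The Python sketch below applies only MIMAG's completeness/contamination thresholds to made-up values; the rRNA and tRNA requirements for the high-quality tier are deliberately left out.

```python
def mag_quality(completeness: float, contamination: float) -> str:
    """Classify a bin from CheckM completeness/contamination (%) with MIMAG-style thresholds.

    MIMAG's high-quality tier also requires 23S, 16S and 5S rRNA genes and at
    least 18 tRNAs; those checks are not included in this sketch.
    """
    if completeness > 90 and contamination < 5:
        return "high-quality draft (pending rRNA/tRNA checks)"
    if completeness >= 50 and contamination < 10:
        return "medium-quality draft"
    if contamination < 10:
        return "low-quality draft"
    return "fails contamination threshold"


# Made-up CheckM results for three hypothetical bins.
bins = {"bin.1": (96.4, 1.2), "bin.2": (72.0, 4.5), "bin.3": (35.0, 15.0)}
for name, (comp, cont) in bins.items():
    print(name, mag_quality(comp, cont))
```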
Assignment for week 3: Download a real metagenomic dataset, perform QC and host-read removal, generate both taxonomic and functional profiles using tools like Kraken2/MetaPhlAn and HUMAnN3 or QIIME 2, and submit a brief summary including diversity plots or key taxa identified.
Week 4 — Genome Assembly & Annotation
Class 1: Introduction to Genome Assembly
Objective: Understand assembly principles and strategies
✓ De novo vs. reference-guided assembly
✓ Illumina vs. long-read (ONT/PacBio) assemblies
✓ Metrics: N50, L50, coverage, completeness
✓ Popular assemblers (SPAdes, Unicycler, Flye)
Hands-on: Explore assembly tools & datasets
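N50 and L50 are computed from contig lengths alone: sort contigs from longest to shortest and add them up until at least half of the total assembly length is reached; the contig length at that point is the N50 and the number of contigs used is the L50. QUAST reports these (among many other metrics) for real assemblies; the Python sketch below uses made-up contig lengths.

```python
def n50_l50(contig_lengths: list[int]) -> tuple[int, int]:
    """Return (N50, L50) for a list of contig lengths."""
    if not contig_lengths:
        raise ValueError("empty assembly")
    half = sum(contig_lengths) / 2
    running = 0
    for count, length in enumerate(sorted(contig_lengths, reverse=True), start=1):
        running += length
        if running >= half:
            return length, count
    raise AssertionError("unreachable: the cumulative sum always reaches half the total")


# Made-up assembly of five contigs (1,500 bp in total).
n50, l50 = n50_l50([100, 200, 300, 400, 500])
print(f"N50 = {n50}, L50 = {l50}")
```

The two longest contigs (500 + 400 = 900 bp) already cover more than half of the 1,500 bp assembly, so the sketch prints `N50 = 400, L50 = 2`. Both numbers describe contiguity only; they say nothing about whether the contigs are correctly assembled or complete.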
Class 2: De Novo Genome Assembly (Hands-On Focus)
Objective: Assemble a draft genome
✓ SPAdes / MEGAHIT workflows
✓ Contigs, scaffolds, k-mer strategies
✓ QC of the assembled genome using QUAST
Hands-on:
✓ Assemble a bacterial/viral genome
✓ Assess assembly metrics
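The k-mer strategies used by short-read assemblers rest on the de Bruijn graph idea: cut reads into k-mers, drop rare k-mers that are likely sequencing errors, link overlapping (k-1)-mers, and walk unbranched paths to form contigs. The Python sketch below is a toy version on made-up reads with a single k and a simple count threshold. SPAdes and MEGAHIT do much more (for example, iterating over several k values, simplifying the graph, and handling both DNA strands), so treat it only as an illustration of the principle.

```python
from collections import Counter, defaultdict


def count_kmers(reads: list[str], k: int) -> Counter:
    """Count every k-mer occurring in the reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def de_bruijn(kmers: list[str]) -> dict[str, set[str]]:
    """Nodes are (k-1)-mers; each k-mer adds an edge prefix -> suffix."""
    graph: dict[str, set[str]] = defaultdict(set)
    for kmer in kmers:
        graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unitig(graph: dict[str, set[str]], start: str) -> str:
    """Extend from start while the path is unbranched (one way out, one way in)."""
    indegree = Counter(dst for dsts in graph.values() for dst in dsts)
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if indegree[nxt] != 1 or nxt in seen:
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


# Two error-free copies of a made-up read plus one read carrying a sequencing error.
reads = ["ACGTTGCA", "ACGTTGCA", "ACGATGCA"]
counts = count_kmers(reads, k=4)
solid = [kmer for kmer, n in counts.items() if n >= 2]  # drop k-mers seen only once
print(walk_unitig(de_bruijn(solid), "ACG"))
```

The erroneous read contributes k-mers seen only once, which the threshold removes, so the walk reconstructs `ACGTTGCA`. Choosing k is the trade-off behind "k-mer strategies": small k tolerates errors and low coverage but collapses repeats, while large k resolves repeats but needs more coverage.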
Class 3: Genome Annotation (Structural & Functional)
Objective: Annotate genomes using automated pipelines
✓ Prokka (bacterial/viral genome annotation)
✓ Alternatives: RAST, Bakta
✓ Prediction of CDS, tRNAs, and rRNAs
✓ Functional annotation: COG, KEGG, Pfam
Hands-on:
✓ Annotate a genome using Prokka
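Prokka's GFF3 output lists one feature per line, with the feature type (CDS, tRNA, rRNA, ...) in column 3, and appends the contig sequences after a `##FASTA` line. Prokka also writes a short statistics file with similar counts; the Python sketch below simply shows where such counts come from, using a made-up and heavily truncated file.

```python
import io
from collections import Counter
from typing import TextIO


def count_feature_types(gff: TextIO) -> Counter:
    """Count GFF3 feature types (column 3), stopping at an appended ##FASTA section."""
    counts: Counter = Counter()
    for line in gff:
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not annotation
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:
            counts[cols[2]] += 1
    return counts


# Made-up, heavily truncated Prokka-style GFF3.
gff_text = (
    "##gff-version 3\n"
    "contig_1\tProdigal\tCDS\t100\t1000\t.\t+\t0\tID=EXAMPLE_00001;product=hypothetical protein\n"
    "contig_1\tProdigal\tCDS\t1200\t2100\t.\t-\t0\tID=EXAMPLE_00002;product=hypothetical protein\n"
    "contig_1\tAragorn\ttRNA\t2300\t2375\t.\t+\t.\tID=EXAMPLE_00003;product=tRNA-Leu\n"
    "contig_2\tbarrnap\trRNA\t10\t1550\t.\t+\t.\tID=EXAMPLE_00004;product=16S ribosomal RNA\n"
    "##FASTA\n"
    ">contig_1\n"
    "ACGT\n"
)
print(dict(count_feature_types(io.StringIO(gff_text))))
```

This prints `{'CDS': 2, 'tRNA': 1, 'rRNA': 1}`. The same loop can be extended to collect the `product=` attribute when listing major genes for the report.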
Class 4: Comparative Genomics & Downstream Applications
Objective: Analyze assembled genomes for biological insights
✓ Multiple sequence alignment (MAFFT, Clustal Omega)
✓ Phylogenetic analysis (IQ-TREE)
✓ Identifying SNPs/indels across strains
✓ Pan-genome analysis (Roary)
✓ Applications:
  - Vaccine design (epitope prediction)
  - Drug target identification
Hands-on:
✓ Run a phylogenetic analysis and identify conserved regions
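Once strains are aligned (for example with MAFFT or Clustal Omega), SNPs, indels and conserved regions can be read column by column from the alignment. The Python sketch below does this for three made-up, already aligned sequences, assuming `-` marks gaps; it ignores complications such as ambiguous bases and is only meant to show the idea.

```python
def classify_columns(alignment: dict[str, str]) -> list[str]:
    """Label each alignment column as 'conserved', 'snp', or 'indel'."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("All aligned sequences must have the same length")
    labels = []
    for i in range(length):
        column = {s[i].upper() for s in seqs}
        if "-" in column:
            labels.append("indel")
        elif len(column) == 1:
            labels.append("conserved")
        else:
            labels.append("snp")
    return labels


def conserved_regions(labels: list[str], min_len: int = 3) -> list[tuple[int, int]]:
    """Runs of conserved columns at least min_len long, as 0-based [start, end) ranges."""
    regions, start = [], None
    for i, label in enumerate(labels + ["end"]):  # sentinel closes a trailing run
        if label == "conserved" and start is None:
            start = i
        elif label != "conserved" and start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    return regions


# Made-up alignment of three strains.
aligned = {
    "strainA": "ATGCCGTA-AGT",
    "strainB": "ATGCTGTACAGT",
    "strainC": "ATGCCGTACAGT",
}
labels = classify_columns(aligned)
print(labels)
print(conserved_regions(labels))
```

Column 5 (1-based) is a C/T SNP, column 9 is an indel (strainA has a gap), and the remaining columns form three conserved blocks, reported as `[(0, 4), (5, 8), (9, 12)]` in 0-based, end-exclusive coordinates.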
Assignment for week 4: Select a real microbial genome dataset, perform a full de novo assembly using SPAdes/MEGAHIT, annotate the assembled genome with Prokka, and generate a brief comparative genomics summary including key metrics (N50, completeness), major genes, and a phylogenetic placement.