RNA-seq DESeq2 Pipeline

A sequential Kubernetes Job pipeline for differential expression analysis on yeast RNA-seq data. Each stage runs, in order, as a one-shot Job that reads from and writes to a shared PersistentVolumeClaim (PVC).

Dataset

  • Source: Gierliński et al., ENA accession PRJEB5348
  • Reads: 50 bp single-end
  • Conditions: wild-type (WT: ERR458493–ERR458495) vs. snf2 deletion mutant (snf2: ERR458500–ERR458502)
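
As a convenience, the accession ranges above can be expanded into explicit run IDs before downloading. This is a minimal sketch that assumes the ranges are contiguous (ERR458493 to ERR458495 for WT, ERR458500 to ERR458502 for snf2); `expand_runs`, `WT_RUNS`, and `SNF2_RUNS` are hypothetical names, not part of the pipeline manifests.

```shell
# Print every run accession from <prefix><first> to <prefix><last>, one per line.
# Hypothetical helper; assumes the numeric part of the accession is contiguous.
expand_runs() {
  prefix="$1"
  i="$2"
  last="$3"
  while [ "$i" -le "$last" ]; do
    echo "${prefix}${i}"
    i=$((i + 1))
  done
}

WT_RUNS="$(expand_runs ERR 458493 458495)"    # wild-type runs
SNF2_RUNS="$(expand_runs ERR 458500 458502)"  # snf2 deletion mutant runs
```

The resulting lists could then feed whatever download command the SRA/ENA Job uses.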

Pipeline stages

Order  File                             Stage
1      01-pvc.yaml                      Shared PersistentVolumeClaim for pipeline data and intermediate files
2      02-job-sra-download.yaml         Downloads raw FASTQ reads from SRA/ENA
2b     02b-job-sra-download-extra.yaml  Downloads the remaining replicate samples
3      03-job-fastqc.yaml               FastQC read quality control
4      04-job-star.yaml                 STAR alignment to the reference genome
4b     04b-job-star-extra.yaml          STAR alignment for the remaining replicate samples
5      05-job-featurecounts.yaml        Gene-level count matrix from aligned reads
6      06-job-deseq2.yaml               DESeq2 differential expression analysis (WT vs. snf2)

Results

  • STAR alignment: ~85–90% mapping rate across samples
  • DESeq2 output visualized (volcano plot, etc.) in a Jupyter R notebook
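
One way to check the mapping rate per sample is to read it from the `Log.final.out` summary that STAR writes for each run. The sketch below pulls the "Uniquely mapped reads %" line; the function name `star_unique_rate` and the example path in the comment are assumptions, since the README does not say where alignment output lands on the PVC or whether its figure counts uniquely mapped reads only.

```shell
# Print the "Uniquely mapped reads %" value from one STAR Log.final.out file.
# Hypothetical helper; STAR separates label and value with "|" followed by a tab.
star_unique_rate() {
  awk -F'|' '/Uniquely mapped reads %/ { gsub(/[ \t]/, "", $2); print $2 }' "$1"
}

# Example use (the directory layout is an assumption):
#   for f in /data/star/*/Log.final.out; do
#     echo "$f: $(star_unique_rate "$f")"
#   done
```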

Running

Namespace: rnaseq. Jobs are sequential. Each depends on the output of the previous stage landing on the shared PVC, so apply one manifest and wait for its Job to complete before applying the next:

kubectl apply -f 01-pvc.yaml
kubectl apply -f 02-job-sra-download.yaml
kubectl get jobs -n rnaseq -w          # watch until the Job shows 1/1 completions, then Ctrl-C
# repeat the watch after each apply below before moving on
kubectl apply -f 02b-job-sra-download-extra.yaml
kubectl apply -f 03-job-fastqc.yaml
kubectl apply -f 04-job-star.yaml
kubectl apply -f 04b-job-star-extra.yaml
kubectl apply -f 05-job-featurecounts.yaml
kubectl apply -f 06-job-deseq2.yaml
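
The apply-then-wait sequence can also be scripted rather than watched by hand. The following is a sketch under assumptions: `run_stage` and `run_pipeline` are hypothetical helpers, each manifest is assumed to define a single Job in the rnaseq namespace, and the 6h timeout is an arbitrary placeholder. It uses `kubectl wait --for=condition=complete -f <manifest>` so that no Job names need to be hard-coded.

```shell
# Apply one Job manifest, then block until that Job reports condition=complete.
# Hypothetical helper; namespace and timeout are illustrative values.
run_stage() {
  manifest="$1"
  kubectl apply -n rnaseq -f "$manifest" || return 1
  kubectl wait -n rnaseq --for=condition=complete --timeout=6h -f "$manifest"
}

# Create the PVC, then run every Job manifest in pipeline order,
# stopping at the first stage that fails to apply or times out.
run_pipeline() {
  kubectl apply -n rnaseq -f 01-pvc.yaml || return 1
  for manifest in \
    02-job-sra-download.yaml \
    02b-job-sra-download-extra.yaml \
    03-job-fastqc.yaml \
    04-job-star.yaml \
    04b-job-star-extra.yaml \
    05-job-featurecounts.yaml \
    06-job-deseq2.yaml
  do
    run_stage "$manifest" || { echo "stage failed: $manifest" >&2; return 1; }
  done
}
```

Calling `run_pipeline` from the directory that holds the manifests runs the whole sequence. Note that waiting on `condition=complete` does not return early when a Job fails; a failed Job is only noticed once the timeout expires, so checking `kubectl get jobs -n rnaseq` during long stages is still worthwhile.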