Initial RNA-seq DESeq2 pipeline manifests

2026-06-21 06:39:53 -04:00
commit 395c18a9e6
9 changed files with 431 additions and 0 deletions
@@ -0,0 +1,43 @@
+# RNA-seq DESeq2 Pipeline
+
+A sequential Kubernetes Job pipeline for differential expression analysis on yeast RNA-seq data. Each stage runs as a one-shot Job against a shared PVC, in order.
+
+## Dataset
+
+- Source: Gierliński et al., ENA accession PRJEB5348
+- Reads: 50 bp single-end
+- Conditions: wild-type (WT: ERR458493–495) vs. snf2 deletion mutant (snf2: ERR458500–502)
+
+## Pipeline stages
+
+| Order | File | Stage |
+|---|---|---|
+| 1 | `01-pvc.yaml` | Shared PersistentVolumeClaim for pipeline data and intermediate files |
+| 2 | `02-job-sra-download.yaml` | Downloads raw FASTQ reads from SRA/ENA |
+| 2b | `02b-job-sra-download-extra.yaml` | Downloads the remaining replicate samples |
+| 3 | `03-job-fastqc.yaml` | FastQC read quality control |
+| 4 | `04-job-star.yaml` | STAR alignment to the reference genome |
+| 4b | `04b-job-star-extra.yaml` | STAR alignment for the remaining replicate samples |
+| 5 | `05-job-featurecounts.yaml` | Gene-level count matrix from aligned reads |
+| 6 | `06-job-deseq2.yaml` | DESeq2 differential expression analysis (WT vs. snf2) |
+
+## Results
+
+- STAR alignment: ~85–90% mapping rate across samples
+- DESeq2 output visualized (volcano plot, etc.) in a Jupyter R notebook
+
+## Running
+
+Namespace: `rnaseq`. Jobs are sequential: each depends on the previous stage's output landing on the shared PVC, so apply one manifest and wait for its Job to complete before applying the next:
+
+```bash
+kubectl apply -f 01-pvc.yaml
+kubectl apply -f 02-job-sra-download.yaml
+kubectl get jobs -n rnaseq -w          # Ctrl-C once COMPLETIONS shows 1/1; repeat after each Job below
+kubectl apply -f 02b-job-sra-download-extra.yaml
+kubectl apply -f 03-job-fastqc.yaml
+kubectl apply -f 04-job-star.yaml
+kubectl apply -f 04b-job-star-extra.yaml
+kubectl apply -f 05-job-featurecounts.yaml
+kubectl apply -f 06-job-deseq2.yaml
+```
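The apply-then-wait rule in the Running section could be automated roughly as follows. This is a minimal sketch, not part of the commit: it assumes each job manifest defines only Job resources in the `rnaseq` namespace, and the helper names (`kc`, `run_stage`, `run_pipeline`), the `DRY_RUN` switch, and the `6h` timeout are illustrative choices. It relies on `kubectl wait --for=condition=complete`, which blocks until the Job reports the `Complete` condition or the timeout expires.

```bash
#!/usr/bin/env bash
# Sketch only: apply each manifest, then block until its Job reports Complete.
set -eu

NAMESPACE="rnaseq"
TIMEOUT="${TIMEOUT:-6h}"   # assumed per-stage upper bound; tune per stage
DRY_RUN="${DRY_RUN:-1}"    # 1 = print the commands only; 0 = call kubectl

# Job manifests in the order given by the pipeline table.
JOB_MANIFESTS="02-job-sra-download.yaml 02b-job-sra-download-extra.yaml
03-job-fastqc.yaml 04-job-star.yaml 04b-job-star-extra.yaml
05-job-featurecounts.yaml 06-job-deseq2.yaml"

# Hypothetical wrapper: echo the command in dry-run mode, otherwise run it.
kc() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "kubectl $*"
  else
    kubectl "$@"
  fi
}

# Apply one manifest and wait for the Job(s) it defines to complete.
run_stage() {
  kc apply -n "$NAMESPACE" -f "$1"
  kc wait -n "$NAMESPACE" --for=condition=complete --timeout="$TIMEOUT" -f "$1"
}

run_pipeline() {
  # The PVC has no completion condition, so it is applied without a wait.
  kc apply -n "$NAMESPACE" -f 01-pvc.yaml
  for manifest in $JOB_MANIFESTS; do
    run_stage "$manifest"
  done
}

run_pipeline
```

By default the script only prints the `kubectl` commands it would run; setting `DRY_RUN=0` executes them. Because of `set -e`, a stage that fails or exceeds the timeout makes `kubectl wait` exit non-zero and stops the script before the next manifest is applied. A failed Job never reaches `Complete`, so it shows up as a timeout rather than an immediate error.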