Initial RNA-seq DESeq2 pipeline manifests

2026-06-21 06:39:53 -04:00
commit 395c18a9e6
9 changed files with 431 additions and 0 deletions
@@ -0,0 +1,43 @@
# RNA-seq DESeq2 Pipeline
A sequential Kubernetes Job pipeline for differential expression analysis on yeast RNA-seq data. Each stage runs as a one-shot Job against a shared PVC, in order.
## Dataset
- Source: Gierliński et al., ENA accession PRJEB5348
- Reads: 50 bp single-end
- Conditions: wild-type (WT: ERR458493–495) vs. snf2 deletion mutant (snf2: ERR458500–502)
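
The run-to-condition mapping above is what DESeq2 ultimately needs as sample metadata. A minimal sketch of writing it out as a CSV, assuming the accession lists are the inclusive ranges ERR458493 to ERR458495 (WT) and ERR458500 to ERR458502 (snf2); the function names and the `run,condition` column layout are illustrative and not taken from the manifests:

```bash
# Hypothetical helper: print one "run,condition" row per accession in an
# inclusive numeric range (the ERR prefix is added here).
accession_rows() {
  local condition="$1" n="$2" last="$3"
  while [ "$n" -le "$last" ]; do
    printf 'ERR%d,%s\n' "$n" "$condition"
    n=$((n + 1))
  done
}

# Hypothetical sample sheet for the two conditions listed above.
sample_sheet() {
  echo "run,condition"
  accession_rows WT 458493 458495
  accession_rows snf2 458500 458502
}
```

Calling `sample_sheet > samples.csv` would write the table to a file (the file name is an assumption); the DESeq2 stage may well build its sample metadata differently.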
## Pipeline stages
| Order | File | Stage |
|---|---|---|
| 1 | `01-pvc.yaml` | Shared PersistentVolumeClaim for pipeline data and intermediate files |
| 2 | `02-job-sra-download.yaml` | Downloads raw FASTQ reads from SRA/ENA |
| 2b | `02b-job-sra-download-extra.yaml` | Downloads the remaining replicate samples |
| 3 | `03-job-fastqc.yaml` | FastQC read quality control |
| 4 | `04-job-star.yaml` | STAR alignment to the reference genome |
| 4b | `04b-job-star-extra.yaml` | STAR alignment for the remaining replicate samples |
| 5 | `05-job-featurecounts.yaml` | Gene-level count matrix from aligned reads |
| 6 | `06-job-deseq2.yaml` | DESeq2 differential expression analysis (WT vs. snf2) |
## Results
- STAR alignment: ~85–90% mapping rate across samples
- DESeq2 output visualized (volcano plot, etc.) in a Jupyter R notebook
## Running
Namespace: `rnaseq`. Jobs are sequential: each depends on the previous stage's output landing on the shared PVC, so apply one manifest and wait for its Job to complete before applying the next:
```bash
kubectl apply -f 01-pvc.yaml
kubectl apply -f 02-job-sra-download.yaml
kubectl get jobs -n rnaseq -w   # wait for 1/1 completions, then Ctrl-C; repeat after each Job below
kubectl apply -f 02b-job-sra-download-extra.yaml
kubectl apply -f 03-job-fastqc.yaml
kubectl apply -f 04-job-star.yaml
kubectl apply -f 04b-job-star-extra.yaml
kubectl apply -f 05-job-featurecounts.yaml
kubectl apply -f 06-job-deseq2.yaml
```
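
The apply-then-wait sequence above can also be scripted. This is a minimal sketch, not part of the repository: it assumes every manifest whose name contains `-job-` defines a Job in the `rnaseq` namespace, and the `KUBECTL`, `NAMESPACE`, and `WAIT_TIMEOUT` variables (including the 6h default) are hypothetical knobs chosen for illustration.

```bash
# Hypothetical settings; override them in the environment if needed.
KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="${NAMESPACE:-rnaseq}"
WAIT_TIMEOUT="${WAIT_TIMEOUT:-6h}"   # arbitrary placeholder, not a measured runtime

# Apply one manifest; if it is a Job manifest, block until the Job completes.
run_stage() {
  local manifest="$1"
  "$KUBECTL" apply -n "$NAMESPACE" -f "$manifest" || return 1
  case "$manifest" in
    *-job-*)
      "$KUBECTL" wait -n "$NAMESPACE" --for=condition=complete \
        --timeout="$WAIT_TIMEOUT" -f "$manifest"
      ;;
  esac
}

# Run the given manifests in order, stopping at the first failure.
run_pipeline() {
  local manifest
  for manifest in "$@"; do
    run_stage "$manifest" || { echo "stage failed: $manifest" >&2; return 1; }
  done
}
```

With the manifests in the current directory, `run_pipeline 01-pvc.yaml 02-job-sra-download.yaml 02b-job-sra-download-extra.yaml 03-job-fastqc.yaml 04-job-star.yaml 04b-job-star-extra.yaml 05-job-featurecounts.yaml 06-job-deseq2.yaml` would walk the stages in table order. Note that `kubectl wait --for=condition=complete` does not return early when a Job fails; it keeps waiting until the timeout expires, so a failed stage surfaces only then.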