# RNA-seq DESeq2 Pipeline

A pipeline of Kubernetes Jobs for differential expression analysis of yeast RNA-seq data. The stages run in order, each as a one-shot Job that reads from and writes to a shared PersistentVolumeClaim (PVC).

## Dataset

- Source: Gierliński et al., ENA accession PRJEB5348
- Reads: 50 bp, single-end
- Conditions: wild-type (WT: ERR458493–495) vs. snf2 deletion mutant (snf2: ERR458500–502)
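
For the DESeq2 stage, the accession-to-condition mapping above is typically expressed as a sample sheet. A minimal sketch, assuming the accession ranges are inclusive and contiguous; the function name `make_sample_sheet` and the column names `run` and `condition` are illustrative and not taken from the pipeline's manifests:

```bash
#!/usr/bin/env bash
# Print a tab-separated sample sheet: run accession, then condition.
# Assumes ERR458493–495 and ERR458500–502 are inclusive, contiguous ranges.
make_sample_sheet() {
  local n
  printf 'run\tcondition\n'
  for n in 493 494 495; do printf 'ERR458%s\tWT\n' "$n"; done
  for n in 500 501 502; do printf 'ERR458%s\tsnf2\n' "$n"; done
}

make_sample_sheet
```

Redirecting the output to a file on the shared PVC (for example `make_sample_sheet > samples.tsv`, a hypothetical file name) would give the later stages a single place to look up which run belongs to which condition.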

## Pipeline stages

| Order | File | Stage |
|---|---|---|
| 1 | `01-pvc.yaml` | Shared PersistentVolumeClaim for pipeline data and intermediate files |
| 2 | `02-job-sra-download.yaml` | Downloads raw FASTQ reads from SRA/ENA |
| 2b | `02b-job-sra-download-extra.yaml` | Downloads the remaining replicate samples |
| 3 | `03-job-fastqc.yaml` | FastQC read quality control |
| 4 | `04-job-star.yaml` | STAR alignment to the reference genome |
| 4b | `04b-job-star-extra.yaml` | STAR alignment for the remaining replicate samples |
| 5 | `05-job-featurecounts.yaml` | Gene-level count matrix from aligned reads |
| 6 | `06-job-deseq2.yaml` | DESeq2 differential expression analysis (WT vs. snf2) |

## Results

- STAR alignment: ~85–90% mapping rate across samples
- DESeq2 output is visualized (volcano plot, etc.) in a Jupyter R notebook
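
One way to re-check the mapping rate is to read it from the `Log.final.out` summary that STAR writes for each run. The sketch below assumes the figure quoted above refers to STAR's "Uniquely mapped reads %" line; the function name `mapping_rates` is illustrative, and where the logs live on the PVC depends on how `04-job-star.yaml` is configured:

```bash
#!/usr/bin/env bash
# Print "<log path><TAB><uniquely mapped %>" for each STAR Log.final.out given.
# STAR writes the rate on a line of the form:
#   Uniquely mapped reads % |<TAB>89.12%
mapping_rates() {
  local log
  for log in "$@"; do
    awk -F'|' -v f="$log" '
      /Uniquely mapped reads %/ {
        gsub(/[ \t%]/, "", $2)   # strip padding and the trailing percent sign
        printf "%s\t%s\n", f, $2
      }' "$log"
  done
}
```

For example, `mapping_rates /data/star/*Log.final.out` would print one line per sample; the `/data/star` path is a placeholder for wherever the STAR Jobs write their output.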

## Running

All resources live in the `rnaseq` namespace. The Jobs are sequential: each one depends on the previous stage's output landing on the shared PVC, so apply a manifest and wait for its Job to complete before applying the next:

```bash
kubectl apply -f 01-pvc.yaml
kubectl apply -f 02-job-sra-download.yaml
kubectl get jobs -n rnaseq -w          # watch until COMPLETIONS reads 1/1, then Ctrl-C; repeat after every Job below
kubectl apply -f 02b-job-sra-download-extra.yaml
kubectl apply -f 03-job-fastqc.yaml
kubectl apply -f 04-job-star.yaml
kubectl apply -f 04b-job-star-extra.yaml
kubectl apply -f 05-job-featurecounts.yaml
kubectl apply -f 06-job-deseq2.yaml
```
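
The apply-then-wait loop can also be scripted with `kubectl wait` instead of watching by hand. This is a sketch under a few assumptions rather than part of the repository: the manifests set their own namespace (as the commands above imply), every manifest with `job` in its file name creates a Job, and the `KUBECTL`, `NAMESPACE`, and `TIMEOUT` variables and the 12-hour default are illustrative choices:

```bash
#!/usr/bin/env bash
# Apply each manifest in order and block until every Job in the namespace
# reports the Complete condition before moving on to the next manifest.
# KUBECTL, NAMESPACE and TIMEOUT are illustrative knobs, not part of the repo.
KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="${NAMESPACE:-rnaseq}"
TIMEOUT="${TIMEOUT:-12h}"   # assumed upper bound per stage; tune to your data

run_stage() {
  local manifest="$1"
  # Mirrors the manual commands: the manifest is assumed to carry its namespace.
  "$KUBECTL" apply -f "$manifest" || return 1
  case "$manifest" in
    *job*)
      # Jobs that already finished satisfy the condition immediately,
      # so only the Job just applied is actually waited on.
      "$KUBECTL" wait -n "$NAMESPACE" --for=condition=complete job --all \
        --timeout="$TIMEOUT"
      ;;
  esac
}

run_pipeline() {
  local manifest
  for manifest in "$@"; do
    echo "==> $manifest"
    run_stage "$manifest" || { echo "stage failed: $manifest" >&2; return 1; }
  done
}
```

After sourcing the script, `run_pipeline 01-pvc.yaml 02-job-sra-download.yaml 02b-job-sra-download-extra.yaml 03-job-fastqc.yaml 04-job-star.yaml 04b-job-star-extra.yaml 05-job-featurecounts.yaml 06-job-deseq2.yaml` runs the stages in the order listed in the table. Note that `kubectl wait --for=condition=complete` does not return early when a Job fails; it keeps waiting until the timeout expires, so a failed stage surfaces as a timeout.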