A bioinformatician’s main tool for discovery has often been differential expression analysis. But once you have results to interpret, which enrichment tool should you use: Enrichr, Prerank, GSEA, or ssGSEA? Here is a quick reminder X-plainer. 🧬
The key question is: what shape is your data?
Enrichr is for when you just have a list of genes (it can be small). No values, no conditions, just names. 📋
GENEA, GENEB, GENEC
Under the hood, Enrichr tests each gene set in its databases using a Fisher’s exact test (hypergeometric). It asks: is my list enriched for genes in pathway X more than expected by chance?
Best for: DE gene lists, hit lists from CRISPR screens, manually curated sets.
👉 Enrichr | Python: gseapy.enrichr
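To make the "more than expected by chance" question concrete, here is a minimal sketch of a one-sided hypergeometric test for a single gene set. It is not Enrichr's own code: the function name, the background size of 20,000 genes, and the example numbers are all assumptions chosen for illustration.

```python
from math import comb

def hypergeom_enrichment_p(overlap, list_size, set_size, background_size):
    """One-sided P(X >= overlap) when drawing list_size genes at random
    from a background that contains set_size pathway genes."""
    max_overlap = min(list_size, set_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(set_size, k) * comb(background_size - set_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total

# Hypothetical numbers: 20,000 background genes, a 100-gene pathway,
# a 50-gene input list, and 5 genes shared between the two.
p_value = hypergeom_enrichment_p(
    overlap=5, list_size=50, set_size=100, background_size=20_000
)
print(f"p = {p_value:.3g}")
```

In practice this test is repeated for every gene set in a library, so the resulting p-values need multiple-testing correction before you read anything into them.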
Prerank is for when you have a continuous value per gene that you can rank: a fold change, a correlation, a t-statistic, anything. 📊
| Gene | Value |
|---|---|
| GENEA | 12 |
| GENEB | 8 |
| GENEC | 4 |
Prerank runs GSEA logic (the enrichment score / walking statistic) on your pre-ranked list, without needing raw expression data or phenotype labels. Useful when you already have a score but not the underlying samples.
Best for: correlation with a phenotype, output from another model, single-sample pseudo-bulk scores.
👉 Python: gseapy.prerank
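The walking statistic mentioned above can be sketched in a few lines. This is a simplified illustration, not gseapy's implementation: it walks down the ranked list, steps up at genes in the set (weighted by their score) and down otherwise, and returns the largest deviation from zero. GENED and GENEE are hypothetical genes added so the list has a bottom end, and the permutation step that turns the score into a p-value is left out.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Signed maximum deviation of the running sum from zero.

    ranked_genes must already be sorted by score, highest first.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    hit_weights = [abs(s) ** weight if h else 0.0 for s, h in zip(scores, hits)]
    total_hit_weight = sum(hit_weights)
    n_miss = len(ranked_genes) - sum(hits)
    if total_hit_weight == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the list without covering all of it")

    running, best = 0.0, 0.0
    for w, is_hit in zip(hit_weights, hits):
        # Step up at a hit (proportional to its score), step down at a miss.
        running += w / total_hit_weight if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# First three rows come from the table above; GENED and GENEE are hypothetical.
ranked_genes = ["GENEA", "GENEB", "GENEC", "GENED", "GENEE"]
scores = [12, 8, 4, -3, -9]

es_top = enrichment_score(ranked_genes, scores, {"GENEA", "GENEB"})
es_bottom = enrichment_score(ranked_genes, scores, {"GENED", "GENEE"})
print(es_top, es_bottom)
```

A set concentrated at the top of the ranking gets a positive score, and one concentrated at the bottom gets a negative score.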
GSEA works best when you have a matrix of gene expression values across multiple samples with a clear phenotype label (treated vs control, disease vs healthy, etc.). It computes its own gene ranking internally. 🔬
| Gene | C1 | C2 | C3 | D1 | D2 | D3 |
|---|---|---|---|---|---|---|
| GENEA | 12 | 7 | 3 | 1 | 1 | 0 |
| GENEB | 8 | 0 | 6 | 8 | 1 | 1 |
| GENEC | 4 | 4 | 3 | 2 | 3 | 4 |
The key advantage over Enrichr: GSEA doesn’t require you to define a hard cutoff (“top 200 DE genes”). It uses the full ranked list and identifies pathways enriched at the top or bottom. This makes it more sensitive and less arbitrary. ✅
Best for: bulk RNA-seq, any two-condition comparison with replicates (n ≥ 3 per group recommended).
👉 GSEA software | Python: gseapy.gsea
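To illustrate what "computes its own gene ranking internally" can look like, here is a rough sketch that ranks the three genes from the table above with a signal-to-noise metric, one of the ranking metrics GSEA offers. The function name is my own, and the sketch skips the variance adjustments that real implementations apply, so treat the numbers as illustrative only.

```python
from statistics import mean, stdev

# Expression matrix from the table above: three controls (C), three disease (D).
expression = {
    "GENEA": [12, 7, 3, 1, 1, 0],
    "GENEB": [8, 0, 6, 8, 1, 1],
    "GENEC": [4, 4, 3, 2, 3, 4],
}
labels = ["C", "C", "C", "D", "D", "D"]

def signal_to_noise(values, labels, group_a, group_b):
    """(mean_a - mean_b) / (sd_a + sd_b); needs at least 2 samples per group
    and fails if both standard deviations are zero."""
    a = [v for v, lab in zip(values, labels) if lab == group_a]
    b = [v for v, lab in zip(values, labels) if lab == group_b]
    return (mean(a) - mean(b)) / (stdev(a) + stdev(b))

# Rank genes from most C-associated to most D-associated.
ranking = sorted(
    ((gene, signal_to_noise(vals, labels, "C", "D")) for gene, vals in expression.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for gene, score in ranking:
    print(f"{gene}\t{score:.2f}")
```

The ranked list produced this way is what the walking statistic then runs over, which is why no hard cutoff is needed.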
ssGSEA is for when you have many samples with no clear two-group contrast — or when you want a continuous enrichment score per sample rather than a comparison between groups. 🗂️
| Gene | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| GENEA | 12 | 7 | 3 | 1 | 1 | 0 |
| GENEB | 8 | 0 | 6 | 8 | 1 | 1 |
| GENEC | 4 | 4 | 3 | 2 | 3 | 4 |
Each sample gets its own enrichment score for each pathway, independently. The output is a sample × pathway matrix. Great for downstream analysis — clustering, survival analysis, correlating pathway activity with other variables.
Best for: large cohorts (TCGA, GTEx), single-cell pseudo-bulk, any analysis where you want pathway activity as a continuous feature.
👉 Python: gseapy.ssgsea
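Here is a simplified sketch of how a per-sample score can be computed, in the spirit of ssGSEA: genes are ranked within each sample, and the score sums the gap between the weighted running fraction of set genes and the running fraction of other genes. The function name, the `pathways` dictionary, and the single-gene "PATHWAY_X" set are hypothetical, and real implementations (including gseapy) add normalization options that are omitted here.

```python
def ssgsea_score(sample_values, gene_set, alpha=0.25):
    """Score one gene set in ONE sample (dict of gene -> expression)."""
    ordered = sorted(sample_values, key=sample_values.get, reverse=True)
    n = len(ordered)
    ranks = {gene: n - i for i, gene in enumerate(ordered)}  # top gene gets rank n
    hit_total = sum(ranks[g] ** alpha for g in ordered if g in gene_set)
    n_miss = sum(1 for g in ordered if g not in gene_set)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the sample without covering all of it")

    running_hit = running_miss = score = 0.0
    for gene in ordered:
        if gene in gene_set:
            running_hit += ranks[gene] ** alpha / hit_total
        else:
            running_miss += 1.0 / n_miss
        score += running_hit - running_miss  # accumulate the gap at every position
    return score

# Expression matrix from the table above, one column per sample.
genes = ["GENEA", "GENEB", "GENEC"]
columns = {
    "A": [12, 8, 4], "B": [7, 0, 4], "C": [3, 6, 3],
    "D": [1, 8, 2], "E": [1, 1, 3], "F": [0, 1, 4],
}
pathways = {"PATHWAY_X": {"GENEA"}}  # hypothetical gene set

# Sample x pathway matrix, each sample scored independently of the others.
scores = {
    sample: {name: ssgsea_score(dict(zip(genes, values)), gene_set)
             for name, gene_set in pathways.items()}
    for sample, values in columns.items()
}
print(scores)
```

Because no sample looks at any other sample, you can add or remove samples without changing the scores of the rest.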
| Tool | Input | Statistics | Best for |
|---|---|---|---|
| Enrichr | Gene list only | Fisher / hypergeometric | Small lists, no values |
| Prerank | Genes + score | GSEA walking statistic | Pre-computed rankings |
| GSEA | Expression matrix + 2 conditions | GSEA walking statistic | Bulk RNA-seq DE |
| ssGSEA | Expression matrix, no labels | Per-sample enrichment | Large cohorts, per-sample scores |
For most single-cell work: compute pseudo-bulk, run GSEA or Prerank per cell type. For single-cell pathway scoring directly, decoupleR is worth a look. 🔍
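For the pseudo-bulk step, a minimal sketch is to sum raw counts over all cells that share a sample and a cell type. The cell records, sample names, and cell-type labels below are made up for illustration; real pipelines would do this on a sparse count matrix rather than Python dictionaries.

```python
from collections import defaultdict

# Hypothetical cells: (sample, cell_type, {gene: raw count}).
cells = [
    ("S1", "T cell", {"GENEA": 2, "GENEB": 0}),
    ("S1", "T cell", {"GENEA": 1, "GENEB": 3}),
    ("S1", "B cell", {"GENEA": 0, "GENEB": 5}),
    ("S2", "T cell", {"GENEA": 4, "GENEB": 1}),
]

def pseudobulk(cells):
    """Sum counts per (cell_type, sample) so each pair becomes one bulk-like profile."""
    totals = defaultdict(lambda: defaultdict(int))
    for sample, cell_type, counts in cells:
        for gene, count in counts.items():
            totals[(cell_type, sample)][gene] += count
    return {key: dict(profile) for key, profile in totals.items()}

profiles = pseudobulk(cells)
for (cell_type, sample), profile in sorted(profiles.items()):
    print(cell_type, sample, profile)
```

Each cell type then gets its own genes × samples matrix, which is the input shape GSEA or Prerank expects.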