Enrichr, Prerank, GSEA or ssGSEA?
A bioinformatician’s main tool for discovery has often been differential expression analysis, usually followed by some form of enrichment analysis. But between Enrichr, Prerank, GSEA, and ssGSEA, which tool should you use? Here is a quick reminder X-plainer. 🧬
The Decision Tree
The key question is: what shape is your data?
Enrichr — you have a gene list, nothing else
Use Enrichr when all you have is a list of genes (it can be small). No values, no conditions, just names. 📋
GENEA, GENEB, GENEC
Under the hood, Enrichr tests each gene set in its libraries with Fisher’s exact test (equivalent to a hypergeometric test on the overlap). It asks: does my list contain more genes from pathway X than expected by chance?
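To make that test concrete, here is a minimal sketch of the hypergeometric upper-tail probability, which is the one-sided form of Fisher’s exact test. It uses only the standard library. The function name and the numbers (a 20,000-gene background, a 100-gene pathway, a 200-gene list with 8 genes in common) are made up for illustration, and Enrichr itself also reports adjusted p-values and other scores that are not reproduced here.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_in_set, n_in_list, n_overlap):
    """P(overlap >= n_overlap) when n_in_list genes are drawn at random
    from a background of n_background genes, n_in_set of which belong
    to the pathway."""
    total = comb(n_background, n_in_list)
    largest_overlap = min(n_in_set, n_in_list)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_in_list - k)
        for k in range(n_overlap, largest_overlap + 1)
    )
    return tail / total


# Hypothetical example: by chance we would expect about
# 200 * 100 / 20000 = 1 overlapping gene, but we observe 8.
p_value = hypergeom_enrichment_p(
    n_background=20000, n_in_set=100, n_in_list=200, n_overlap=8
)
print(f"p = {p_value:.2e}")
```

In practice this is repeated for every gene set in a library, so the p-values need multiple-testing correction before you interpret them.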
Best for: DE gene lists, hit lists from CRISPR screens, manually curated sets.
👉 Enrichr | Python: gseapy.enrichr
Prerank — you have a ranked gene list
Use Prerank when you have a continuous value per gene that you can rank by: a fold change, a correlation, a t-statistic, anything. 📊
| Gene | Value |
|---|---|
| GENEA | 12 |
| GENEB | 8 |
| GENEC | 4 |
Prerank runs GSEA logic (the enrichment score / walking statistic) on your pre-ranked list, without needing raw expression data or phenotype labels. Useful when you already have a score but not the underlying samples.
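The walking statistic can be sketched in a few lines. This is a simplified illustration, not gseapy’s implementation: it walks down the ranked list, steps up at every gene in the set (weighted by the gene’s score) and down at every other gene, and reports the largest deviation from zero. GENED and GENEE are hypothetical genes added so the list has a bottom end, and the exponent of 1 is an assumption chosen to match the usual weighted setting.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """Simplified GSEA-style enrichment score.

    ranked: list of (gene, score) pairs, sorted from highest to lowest score.
    gene_set: set of gene names to test.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    hit_total = sum(abs(s) ** weight for (_, s), hit in zip(ranked, hits) if hit)
    if n_miss == 0 or hit_total == 0:
        raise ValueError("need genes both inside and outside the set, with non-zero scores")

    running = best = 0.0
    for (_, score), hit in zip(ranked, hits):
        # Step up on a hit (proportional to its score), down on a miss.
        running += abs(score) ** weight / hit_total if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


ranked = [("GENEA", 12), ("GENEB", 8), ("GENEC", 4), ("GENED", -3), ("GENEE", -9)]

es_top = enrichment_score(ranked, {"GENEA", "GENEB"})     # set sits at the top: positive ES
es_bottom = enrichment_score(ranked, {"GENED", "GENEE"})  # set sits at the bottom: negative ES
print(es_top, es_bottom)
```

The score alone is not a p-value. Significance comes from comparing it with scores obtained under permutation, which this sketch leaves out.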
Best for: correlation with a phenotype, output from another model, single-sample pseudo-bulk scores.
👉 Python: gseapy.prerank
GSEA — you have expression data with two conditions
GSEA works best when you have a matrix of gene expression values across multiple samples with a clear phenotype label (treated vs control, disease vs healthy, etc.). It computes its own gene ranking internally. 🔬
| Gene | C1 | C2 | C3 | D1 | D2 | D3 |
|---|---|---|---|---|---|---|
| GENEA | 12 | 7 | 3 | 1 | 1 | 0 |
| GENEB | 8 | 0 | 6 | 8 | 1 | 1 |
| GENEC | 4 | 4 | 3 | 2 | 3 | 4 |
The key advantage over Enrichr: GSEA doesn’t require you to define a hard cutoff (“top 200 DE genes”). It uses the full ranked list and identifies pathways enriched at the top or bottom. This makes it more sensitive and less arbitrary. ✅
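The internal ranking step can be sketched as follows. This assumes the signal-to-noise metric (difference in group means divided by the sum of the group standard deviations), which is the usual default in GSEA. The function name is hypothetical, and the real tool applies extra safeguards for low-variance genes that are omitted here. The values are the C and D columns from the table above.

```python
from statistics import mean, stdev


def signal_to_noise(group_a, group_b):
    """Difference in means scaled by the summed standard deviations."""
    spread = stdev(group_a) + stdev(group_b)
    if spread == 0:
        # Assumption for this sketch: a gene that is constant in both
        # groups carries no ranking information.
        return 0.0
    return (mean(group_a) - mean(group_b)) / spread


expression = {
    "GENEA": {"C": [12, 7, 3], "D": [1, 1, 0]},
    "GENEB": {"C": [8, 0, 6], "D": [8, 1, 1]},
    "GENEC": {"C": [4, 4, 3], "D": [2, 3, 4]},
}

scores = {
    gene: signal_to_noise(groups["C"], groups["D"])
    for gene, groups in expression.items()
}
# Genes ordered from most C-associated to most D-associated.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Every gene stays in this ranking, including weak ones like GENEB. That full list is what the walking statistic then consumes, with no cutoff in between.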
Best for: bulk RNA-seq, any two-condition comparison with replicates (n ≥ 3 per group recommended).
👉 GSEA software | Python: gseapy.gsea
ssGSEA — you want a per-sample enrichment score
Use ssGSEA when you have many samples with no clear two-group contrast, or when you want a continuous enrichment score per sample rather than a comparison between groups. 🗂️
| Gene | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| GENEA | 12 | 7 | 3 | 1 | 1 | 0 |
| GENEB | 8 | 0 | 6 | 8 | 1 | 1 |
| GENEC | 4 | 4 | 3 | 2 | 3 | 4 |
Each sample gets its own enrichment score for each pathway, independently. The output is a sample × pathway matrix. Great for downstream analysis — clustering, survival analysis, correlating pathway activity with other variables.
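Here is a rough sketch of how a per-sample score can be computed, using the table above as input. It is a simplification, not gseapy’s implementation: genes are ranked within each sample, and the score accumulates the gap between the weighted hit curve and the miss curve instead of taking its maximum. The pathway names and memberships are hypothetical, the exponent of 0.25 is an assumption following the commonly used ssGSEA weighting, and real implementations add normalisation steps that are skipped here.

```python
def ssgsea_score(sample_expr, gene_set, alpha=0.25):
    """Rank-based enrichment score of one gene set in one sample.

    sample_expr maps gene -> expression value for a single sample.
    """
    # Order genes from highest to lowest expression within this sample only.
    # Ties keep their input order.
    ordered = sorted(sample_expr, key=sample_expr.get, reverse=True)
    n = len(ordered)
    rank = {gene: n - i for i, gene in enumerate(ordered)}  # top gene gets rank n
    in_set = [gene in gene_set for gene in ordered]
    n_miss = n - sum(in_set)
    if n_miss == 0 or n_miss == n:
        raise ValueError("gene set must contain some, but not all, of the genes")
    hit_total = sum(rank[g] ** alpha for g, hit in zip(ordered, in_set) if hit)

    hit_cdf = miss_cdf = score = 0.0
    for gene, hit in zip(ordered, in_set):
        if hit:
            hit_cdf += rank[gene] ** alpha / hit_total
        else:
            miss_cdf += 1.0 / n_miss
        score += hit_cdf - miss_cdf  # accumulate the gap between the two curves
    return score


samples = ["A", "B", "C", "D", "E", "F"]
expression = {
    "GENEA": [12, 7, 3, 1, 1, 0],
    "GENEB": [8, 0, 6, 8, 1, 1],
    "GENEC": [4, 4, 3, 2, 3, 4],
}
gene_sets = {"PATHWAY_X": {"GENEA"}, "PATHWAY_Y": {"GENEB", "GENEC"}}  # hypothetical

# sample x pathway matrix, one independent score per cell
score_matrix = {
    sample: {
        name: ssgsea_score({g: v[i] for g, v in expression.items()}, members)
        for name, members in gene_sets.items()
    }
    for i, sample in enumerate(samples)
}
for sample, row in score_matrix.items():
    print(sample, row)
```

No sample is compared with any other at any point, which is why the result can be used directly as a feature matrix.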
Best for: large cohorts (TCGA, GTEx), single-cell pseudo-bulk, any analysis where you want pathway activity as a continuous feature.
👉 Python: gseapy.ssgsea
Quick summary table
| Tool | Input | Statistics | Best for |
|---|---|---|---|
| Enrichr | Gene list only | Fisher / hypergeometric | Small lists, no values |
| Prerank | Genes + score | GSEA walking statistic | Pre-computed rankings |
| GSEA | Expression matrix + 2 conditions | GSEA walking statistic | Bulk RNA-seq DE |
| ssGSEA | Expression matrix, no labels | Per-sample enrichment | Large cohorts, per-sample scores |
For most single-cell work: compute pseudo-bulk, run GSEA or Prerank per cell type. For single-cell pathway scoring directly, decoupleR is worth a look. 🔍
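The pseudo-bulk step is essentially a group-and-sum. Below is a minimal sketch that assumes each cell is a dictionary with a sample label, a cell type label, and raw counts per gene. This layout, the labels, and the function name are all hypothetical; in a real project the data would normally live in a dedicated single-cell data structure.

```python
from collections import Counter, defaultdict


def pseudobulk(cells):
    """Sum raw counts over all cells sharing a (sample, cell_type) pair."""
    profiles = defaultdict(Counter)
    for cell in cells:
        profiles[(cell["sample"], cell["cell_type"])].update(cell["counts"])
    return {key: dict(counts) for key, counts in profiles.items()}


# Hypothetical toy data: three cells from one sample.
cells = [
    {"sample": "S1", "cell_type": "T cell", "counts": {"GENEA": 2, "GENEB": 0}},
    {"sample": "S1", "cell_type": "T cell", "counts": {"GENEA": 1, "GENEB": 3}},
    {"sample": "S1", "cell_type": "B cell", "counts": {"GENEA": 0, "GENEB": 5}},
]

bulk = pseudobulk(cells)
print(bulk)
```

Each (sample, cell type) profile then behaves like one bulk sample, so the per-cell-type matrices can be fed to GSEA or used to compute a ranking for Prerank.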