
A bioinformatician’s main tool for discovery has often been differential expression analysis. But when it comes to interpreting the results, which enrichment tool should you use: Enrichr, Prerank, GSEA, or ssGSEA? Here is a quick-reminder X-plainer. 🧬

The Decision Tree

The key question is: what shape is your data?
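
As a rough sketch, the decision can be written as a tiny function. The function name and its boolean flags are hypothetical, made up here to mirror the four cases in this post; it is a memory aid, not part of any library.

```python
def choose_tool(has_scores, has_expression_matrix, has_two_groups,
                want_per_sample_scores=False):
    """Pick an enrichment tool from the shape of the data (illustrative only)."""
    if has_expression_matrix:
        # A full matrix: compare two groups, or score every sample on its own.
        if want_per_sample_scores or not has_two_groups:
            return "ssGSEA"
        return "GSEA"
    if has_scores:
        # One value per gene, no underlying samples needed.
        return "Prerank"
    # Just gene names.
    return "Enrichr"


print(choose_tool(has_scores=False, has_expression_matrix=False, has_two_groups=False))
print(choose_tool(has_scores=True, has_expression_matrix=False, has_two_groups=False))
print(choose_tool(has_scores=True, has_expression_matrix=True, has_two_groups=True))
print(choose_tool(has_scores=True, has_expression_matrix=True, has_two_groups=False))
```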


Enrichr — you have a gene list, nothing else

Use Enrichr when all you have is a list of genes (it can be small). No values, no conditions, just names. 📋

GENEA, GENEB, GENEC

Under the hood, Enrichr tests each gene set in its databases using a Fisher’s exact test (equivalent to a one-sided hypergeometric test). It asks: does my list contain more genes from pathway X than expected by chance?
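
To make the statistic concrete, here is a minimal pure-Python sketch of a one-sided hypergeometric test. It is not Enrichr's own code: the function name and the example numbers (universe size, pathway size, list size, overlap) are assumptions chosen for illustration, and a real tool would also correct for testing many gene sets at once.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_pathway, n_list, n_overlap):
    """P(overlap >= n_overlap) when n_list genes are drawn without replacement
    from n_universe genes, of which n_pathway belong to the pathway."""
    max_overlap = min(n_pathway, n_list)
    total = comb(n_universe, n_list)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_list - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 background genes, a 100-gene pathway,
# a 50-gene hit list, and 5 genes shared between the two.
p_value = hypergeom_enrichment_p(20_000, 100, 50, 5)
print(f"p = {p_value:.3g}")
```

With these made-up numbers the expected overlap by chance is 50 × 100 / 20,000 = 0.25 genes, so seeing 5 is what drives the p-value down. The choice of background universe matters a lot here and is an assumption you have to make explicit.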

Best for: DE gene lists, hit lists from CRISPR screens, manually curated sets.

👉 Enrichr | Python: gseapy.enrichr

Prerank — you have a ranked gene list

Use Prerank when you have one continuous value per gene that you can rank by: a fold change, a correlation, a t-statistic, anything. 📊

Gene Value
GENEA 12
GENEB 8
GENEC 4

Prerank runs GSEA logic (the enrichment score / walking statistic) on your pre-ranked list, without needing raw expression data or phenotype labels. Useful when you already have a score but not the underlying samples.
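
For intuition, here is a simplified sketch of the walking statistic: step down the ranked list, go up when a gene is in the set (weighted by its score) and down when it is not, and keep the largest deviation from zero. The function name is made up, GENED to GENEF and the gene set are hypothetical additions to the toy table above, and the permutation step that turns the score into a p-value is left out.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """ranked: list of (gene, score) pairs sorted by score, highest first.
    Returns (enrichment score, running-sum walk)."""
    hit_weights = [abs(score) ** weight if gene in gene_set else 0.0
                   for gene, score in ranked]
    total_hit = sum(hit_weights)
    n_miss = sum(1 for gene, _ in ranked if gene not in gene_set)
    if total_hit == 0 or n_miss == 0:
        raise ValueError("need at least one weighted hit and one miss")

    running, best, walk = 0.0, 0.0, []
    for (gene, _), w in zip(ranked, hit_weights):
        if gene in gene_set:
            running += w / total_hit   # step up, proportional to the gene's score
        else:
            running -= 1.0 / n_miss    # step down by a constant amount
        walk.append(running)
        if abs(running) > abs(best):
            best = running             # keep the maximum deviation from zero
    return best, walk


# GENEA..GENEC come from the table above; GENED..GENEF are invented.
ranked = [("GENEA", 12), ("GENEB", 8), ("GENEC", 4),
          ("GENED", -1), ("GENEE", -5), ("GENEF", -9)]
es, walk = enrichment_score(ranked, {"GENEA", "GENEB"})
print(es, [round(x, 2) for x in walk])
```

A set concentrated at the top of the list gives a positive score, one concentrated at the bottom gives a negative score, and the walk always returns to roughly zero at the end.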

Best for: correlation with a phenotype, output from another model, single-sample pseudo-bulk scores.

👉 Python: gseapy.prerank


GSEA — you have expression data with two conditions

GSEA works best when you have a matrix of gene expression values across multiple samples with a clear phenotype label (treated vs control, disease vs healthy, etc.). It computes its own gene ranking internally. 🔬

Gene C1 C2 C3 D1 D2 D3
GENEA 12 7 3 1 1 0
GENEB 8 0 6 8 1 1
GENEC 4 4 3 2 3 4
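
What does "computes its own gene ranking" look like? One metric used for this is a per-gene signal-to-noise ratio, (mean of group 1 − mean of group 2) / (sd of group 1 + sd of group 2), which is the default in the Broad GSEA desktop software. Below is a minimal sketch on the toy matrix above. Treating the C and D columns as the two phenotype groups is an assumption, the function name is made up, and real implementations add safeguards (for example a floor on the standard deviation) that are omitted here.

```python
from statistics import mean, stdev

samples = ["C1", "C2", "C3", "D1", "D2", "D3"]
expression = {
    "GENEA": [12, 7, 3, 1, 1, 0],
    "GENEB": [8, 0, 6, 8, 1, 1],
    "GENEC": [4, 4, 3, 2, 3, 4],
}
# Assumption: the first letter of the sample name is the phenotype label.
labels = [name[0] for name in samples]


def signal_to_noise(values, labels, group_a, group_b):
    """(mean_a - mean_b) / (sd_a + sd_b) for one gene."""
    a = [v for v, lab in zip(values, labels) if lab == group_a]
    b = [v for v, lab in zip(values, labels) if lab == group_b]
    denom = stdev(a) + stdev(b)
    if denom == 0:
        raise ValueError("no variance in either group")
    return (mean(a) - mean(b)) / denom


# Rank genes from most C-associated to most D-associated.
ranking = sorted(
    ((gene, signal_to_noise(vals, labels, "C", "D"))
     for gene, vals in expression.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for gene, score in ranking:
    print(f"{gene}\t{score:.2f}")
```

The resulting ranked list is what the walking statistic then runs over, which is why no hard cutoff is needed.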

The key advantage over Enrichr: GSEA doesn’t require you to define a hard cutoff (“top 200 DE genes”). It uses the full ranked list and identifies pathways enriched at the top or bottom. This makes it more sensitive and less arbitrary. ✅

Best for: bulk RNA-seq, any two-condition comparison with replicates (n ≥ 3 per group recommended).

👉 GSEA software | Python: gseapy.gsea

ssGSEA — you want a per-sample enrichment score

ssGSEA is for when you have many samples with no clear two-group contrast — or when you want a continuous enrichment score per sample rather than a comparison between groups. 🗂️

Gene A B C D E F
GENEA 12 7 3 1 1 0
GENEB 8 0 6 8 1 1
GENEC 4 4 3 2 3 4

Each sample gets its own enrichment score for each pathway, independently. The output is a sample × pathway matrix. Great for downstream analysis — clustering, survival analysis, correlating pathway activity with other variables.
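
Here is a simplified, ssGSEA-style sketch of how such a matrix could be built: within each sample, genes are ranked by expression, and the score sums the gap between the running "hit" and "miss" curves instead of taking the maximum. This is a toy approximation, not the reference algorithm. The function name, the two pathways, and the rank weighting exponent of 0.25 are assumptions for illustration, and three genes are far too few for real use.

```python
samples = ["A", "B", "C", "D", "E", "F"]
expression = {
    "GENEA": [12, 7, 3, 1, 1, 0],
    "GENEB": [8, 0, 6, 8, 1, 1],
    "GENEC": [4, 4, 3, 2, 3, 4],
}
# Hypothetical pathways, invented for this example.
pathways = {"PATHWAY_X": {"GENEA", "GENEB"}, "PATHWAY_Y": {"GENEC"}}


def ssgsea_like_score(sample_values, gene_set, alpha=0.25):
    """sample_values: dict gene -> expression in ONE sample."""
    # Highest-expressed gene first; ties keep their input order.
    order = sorted(sample_values, key=sample_values.get, reverse=True)
    n = len(order)
    rank_weight = {g: (n - i) ** alpha for i, g in enumerate(order)}
    total_hit = sum(rank_weight[g] for g in order if g in gene_set)
    n_miss = sum(1 for g in order if g not in gene_set)
    if total_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, genes")

    p_hit = p_miss = score = 0.0
    for g in order:
        if g in gene_set:
            p_hit += rank_weight[g] / total_hit
        else:
            p_miss += 1.0 / n_miss
        score += p_hit - p_miss  # accumulate the gap instead of taking its maximum
    return score


# Build the sample x pathway matrix, one sample at a time.
scores = {}
for j, sample in enumerate(samples):
    values = {gene: vals[j] for gene, vals in expression.items()}
    scores[sample] = {name: ssgsea_like_score(values, genes)
                      for name, genes in pathways.items()}

for sample, row in scores.items():
    print(sample, {name: round(s, 2) for name, s in row.items()})
```

Note that no phenotype labels appear anywhere: each column is scored using only its own values, which is what makes the output usable as a feature matrix.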

Best for: large cohorts (TCGA, GTEx), single-cell pseudo-bulk, any analysis where you want pathway activity as a continuous feature.

👉 Python: gseapy.ssgsea


Quick summary table

Tool | Input | Statistic | Best for
Enrichr | Gene list only | Fisher / hypergeometric | Small lists, no values
Prerank | Genes + score | GSEA walking statistic | Pre-computed rankings
GSEA | Expression matrix + 2 conditions | GSEA walking statistic | Bulk RNA-seq DE
ssGSEA | Expression matrix, no labels | Per-sample enrichment | Large cohorts, per-sample scores

For most single-cell work: compute pseudo-bulk, run GSEA or Prerank per cell type. For single-cell pathway scoring directly, decoupleR is worth a look. 🔍
