Abstract
Single-cell RNA sequencing has opened a window into the cellular diversity of living organisms at unprecedented resolution, but making sense of millions of cells across thousands of studies remains a formidable challenge. My PhD, completed in March 2026 at ENS and Institut Pasteur under the supervision of Laura Cantini and Gabriel Peyré, tackled this challenge by building foundation models for cell biology: large-scale, pre-trained transformer models capable of learning universal representations of transcriptomic data.
The central contribution is scPRINT, a foundation model trained on tens of millions of single cells that can denoise expression profiles, embed cells in a meaningful latent space, and, most distinctively, infer gene regulatory networks (GRNs) directly from single-cell data. Unlike prior methods, scPRINT leverages the full complexity of large-scale single-cell atlases to produce cell-type-specific regulatory landscapes, without requiring bulk ATAC-seq or ChIP-seq. scPRINT was published in Nature Communications (2025).
A parallel contribution, Xpressor, introduced a novel attention mechanism that enables cross-scale biological learning and parameter-efficient fine-tuning, with applications ranging from single-cell to protein embeddings.
scPRINT-2, the successor to scPRINT, extends the framework with improved scalability and generalization, and is currently under revision at Nature Methods.
To make this work possible, I developed a full software ecosystem: scDataLoader for efficient multi-dataset training, GRnnData for storing and manipulating GRN data alongside single-cell objects, BenGRN for benchmarking GRN inference methods, Xpressor for expression prediction, and the scPRINT-2 model itself. Taken together, this thesis establishes a new paradigm for analyzing single-cell data through the lens of foundation models, with concrete biological insights into transcriptional regulation.
The Thesis Manuscript
Packages
Blog Posts
These posts trace the arc of the PhD — the decision to start, the work in progress, and the finish line.
- 🚀 The PhD Decision: GRNs & Foundation Models — Why I left industry to pursue this PhD, and what I hoped to build (Oct 2023)
- 🧬 Ancestry Bias in CRISPR Screens — A paper on population bias in CRISPR libraries, published in Nature Communications (Jun 2024)
- 📅 A Year in the PhD — Reflections after the first year: what I learned, what surprised me (Sep 2024)
- 🤖 About the AIVC Paper — Commentary on the AI Virtual Cell initiative and where it sits in the landscape (Dec 2024)
- 🧫 VCC Starter Pack — A practical guide to working with Virtual Cell Concepts (Oct 2025)
- 🏁 Finishing the PhD — The final stretch: writing up, defending, and what comes next (Mar 2026)
Technical Guides
During the PhD I wrote a few deep-dive guides on methods I used daily. These stand on their own as references.
- 📊 Enrichment Analysis: Enrichr, PreRank & GSEA — A practical walkthrough of gene set enrichment methods (Feb 2024)
- 📐 AUPRC vs Average Precision — Clarifying the difference between two commonly confused classification metrics (Jun 2024)
- 🕸️ Gene Regulatory Networks: what they are and how to use them — A conceptual and practical guide to GRNs in single-cell biology (Jun 2024)
- 🔬 What are Large Cell Models? — Defining the emerging class of foundation models for biology (Sep 2024)
Presentations
scPRINT @ISMB/ECCB
Outreach
First Year PhD Committee Presentation
scPRINT @GenBioAI
scPRINT @ValenceLabs
Hackathon using scPRINT @Owking @Servier
Outreach @ Pint of Science
I gave many more presentations at a dozen conferences and companies, covering not only scPRINT but also Xpressor (@IBM) and scPRINT-2 (@RAMH).





