Abstract

Single-cell RNA sequencing has opened a window into the cellular diversity of living organisms at unprecedented resolution, but making sense of millions of cells across thousands of studies remains a formidable challenge. My PhD, completed in March 2026 at ENS and Institut Pasteur under the supervision of Laura Cantini and Gabriel Peyré, tackled this challenge by building foundation models for cell biology: large-scale, pre-trained transformer models capable of learning universal representations of transcriptomic data.

The central contribution is scPRINT, a foundation model trained on tens of millions of single cells that can denoise expression profiles, embed cells in a meaningful latent space, and, most distinctively, infer gene regulatory networks (GRNs) directly from single-cell data. Unlike prior methods, scPRINT leverages the full complexity of large-scale single-cell atlases to produce cell-type-specific regulatory landscapes, without requiring bulk ATAC-seq or ChIP-seq. scPRINT was published in Nature Communications (2025).

A parallel contribution, Xpressor, introduced a novel attention mechanism enabling cross-scale biological learning and parameter-efficient fine-tuning with applications from single-cell to protein embeddings.

The successor to scPRINT, scPRINT-2, extends the framework with improved scalability and generalization, and is currently under revision at Nature Methods.

To make this work possible, I developed a full software ecosystem: scDataLoader for efficient multi-dataset training, GRnnData for storing and manipulating GRN data alongside single-cell objects, BenGRN for benchmarking GRN inference methods, Xpressor for expression prediction, and the scPRINT-2 model itself. Taken together, this thesis establishes a new paradigm for analyzing single-cell data through the lens of foundation models, with concrete biological insights into transcriptional regulation.

Ph.D. Proposal


The Thesis Manuscript


Packages

scPRINT scDataLoader GRnnData BenGRN Xpressor scPRINT-2

Blog Posts

These posts trace the arc of the PhD: the decision to start, the work in progress, and the finish line.


Technical Guides

During the PhD I wrote a few deep-dive guides on methods I used daily. These stand on their own as references.


Presentations

scPRINT @ISMB/ECCB

Outreach

First Year PhD Committee Presentation

scPRINT @GenBioAI

scPRINT @ValenceLabs

Hackathon using scPRINT @Owkin @Servier

Outreach @ Pint of Science

I gave many more presentations at a dozen conferences and companies, covering not only scPRINT but also Xpressor (@IBM) and scPRINT-2 (@RAMH).