Spatial Transcriptomics-Conditioned Latent Diffusion Models for Synthetic Histopathology Tissue Patch Image Generation
Arnav Chaphalkar
Lay Summary:
We developed an AI system that creates realistic images of cancer tissue by combining RNA-level information about which genes are active at each location in a tumor with modern image generation technology. This approach could help doctors and researchers study cancer in new ways, expand access to critical data, and ultimately improve diagnosis and treatment for patients.
Abstract:
Whole slide images (WSIs) provide rich morphological information for cancer diagnosis and prognosis, but their size and complexity make data curation costly and limit algorithmic performance in underrepresented populations. Synthetic histology offers a potential solution by augmenting datasets and simulating tissue heterogeneity, yet existing generative approaches struggle to reproduce biologically meaningful features of the tumor microenvironment (TME). Models conditioned on bulk or single-cell transcriptomics lack spatial resolution and fail to capture gradients of immune infiltration, stromal remodeling, or hypoxia. We present a spatial transcriptomics–conditioned latent diffusion framework for histology generation, focusing on colorectal cancer. Paired hematoxylin and eosin (H&E) WSIs and Visium transcriptomic profiles from 41 tumors yielded 255,744 spatially matched tissue patches (512 × 512 pixels) with corresponding gene expression profiles. To overcome the limitations of prior “top-gene” tokenization methods, we developed a direct expression-to-embedding encoder (GeneCondNet) that projects continuous expression vectors into the Contrastive Language-Image Pre-training (CLIP) embedding space of Stable Diffusion, enabling transcriptomic signals to condition the UNet cross-attention layers. The model was trained for 100,000 iterations with a batch size of 64 on an HPC cluster with NVIDIA A100 GPUs. Generated patches achieved a Fréchet Inception Distance (FID) of ~55, consistent with histopathology benchmarks when computed against ImageNet-based features. Qualitative inspection confirmed that most images plausibly captured tumor, stromal, and immune compartments, though occasional imperfections remained. Compared to previous approaches, the new encoder improved conditioning fidelity and preserved biologically relevant variation across the TME.
To our knowledge, this study is the first demonstration of directly integrating spatial transcriptomics into diffusion-based histology generation. Further validation with blinded pathologist review and biologically grounded evaluation metrics will be required. Beyond that, extending patch-level synthesis to full WSIs is a key next step toward dataset augmentation, improved prognostic modeling, and precision oncology.
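The expression-to-embedding idea described in the abstract can be illustrated with a minimal sketch. This is not the GeneCondNet implementation, whose architecture the abstract does not specify. The class name GeneCondSketch, the log1p normalization, the two-layer MLP, the per-token layer normalization, the gene count of 1,000, and the hidden size are all assumptions made for illustration. The 77 × 768 output shape is likewise an assumption, chosen to match the CLIP text-encoder output used by Stable Diffusion v1.x. NumPy with random weights is used only to keep the example self-contained; a trainable version would live in a deep learning framework.

```python
import numpy as np


class GeneCondSketch:
    """Toy expression-to-embedding encoder (hypothetical, not the authors' GeneCondNet)."""

    def __init__(self, n_genes, n_tokens=77, embed_dim=768, hidden_dim=32, seed=0):
        rng = np.random.default_rng(seed)
        self.n_tokens = n_tokens
        self.embed_dim = embed_dim
        # Random weights stand in for parameters that would be learned in training.
        self.w1 = rng.normal(0.0, 1.0 / np.sqrt(n_genes), size=(n_genes, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.w2 = rng.normal(
            0.0, 1.0 / np.sqrt(hidden_dim), size=(hidden_dim, n_tokens * embed_dim)
        )
        self.b2 = np.zeros(n_tokens * embed_dim)

    def __call__(self, counts):
        # counts: (batch, n_genes) array of non-negative expression values per spot.
        x = np.log1p(np.asarray(counts, dtype=np.float64))  # assumed normalization
        h = np.maximum(x @ self.w1 + self.b1, 0.0)  # ReLU hidden layer
        tokens = (h @ self.w2 + self.b2).reshape(-1, self.n_tokens, self.embed_dim)
        # Normalize each token so its scale resembles a normalized text embedding.
        mean = tokens.mean(axis=-1, keepdims=True)
        std = tokens.std(axis=-1, keepdims=True)
        return (tokens - mean) / (std + 1e-6)


# Usage: two simulated spots with 1,000 genes each (synthetic counts, not real data).
encoder = GeneCondSketch(n_genes=1000)
counts = np.random.default_rng(1).poisson(2.0, size=(2, 1000))
context = encoder(counts)
print(context.shape)  # (2, 77, 768)
```

In a setup like this, the resulting array would take the place of the text-encoder output that the UNet cross-attention layers normally attend to, so the full continuous expression vector shapes the conditioning rather than a short list of top-ranked gene names.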
Q&A:
Bios: Arnav Chaphalkar
Program Track: Advanced Research
GitHub Username:
rt5332 -Arnav Chaphalkar
What was your favorite seminar? Why?
My favorite seminar this summer was Dr. Ken Lau’s. I found his work on precancer fascinating, especially because it introduced a perspective I hadn’t considered before. What stood out most to me was the real-world applicability of his research and how far along he is in the process. I truly enjoyed learning from him! -Arnav Chaphalkar
If you were to summarize your summer internship experience in one sentence, what would it be?
An amazing learning experience that allowed me to explore new approaches, challenge myself, and collaborate with brilliant people on meaningful research. -Arnav Chaphalkar