Inferring Single-Cell Spatial Transcriptomics from H&E Histology via Deep Learning Approaches

Ashank Shah

Lay Summary:

This study developed machine-learning methods to predict gene activity from tissue images, as an alternative to costly and hard-to-access methods of measuring gene activity directly. We found that these methods, especially those that incorporated broader tissue context, showed promise in predicting gene activity despite being trained on a limited dataset.

Abstract:

Spatially localized gene expression, or spatial transcriptomics, at single-cell granularity is a highly informative data type that supports improved clinical and research outcomes, including in diagnosis, biomarker identification, and tumor microenvironment analysis. Technologies such as the 10x Genomics Xenium Analyzer enable generation of single-cell spatial transcriptomics data. However, these assays remain prohibitively expensive, largely inaccessible, and often require specialized expertise to perform. In contrast, H&E histology is a widely adopted practice that clearly delineates morphological features in tissue. With the emergence of numerous deep learning-based methods for data translation, this work aimed to evaluate the feasibility of inferring single-cell gene expression from corresponding H&E histology, specifically for spatially variable genes (SVGs), using a limited single-patient open-source dataset. We develop and evaluate the relative efficacy of six modeling approaches: a Convolutional Neural Network (CNN), a fine-tuned ResNet50 architecture, a resized and fine-tuned ResNet50 architecture, an Autoencoder-Embedded Graph Convolutional Network, a CNN-Embedded Graph Convolutional Network, and a ResNet50-Embedded Graph Convolutional Network. We find that the Autoencoder-Embedded Graph Convolutional Network is most effective at single-cell SVG inference, with an average Spearman Correlation Coefficient (SCC) of 0.163 despite the small-scale dataset. Thus, we conclude that incorporating relatively global tissue information, as opposed to solely localized cell information, enables more informed and robust gene expression predictions. We additionally find that specific genes are more strongly correlated with morphological features and, by extension, more predictable via deep learning translation approaches. Notably, the Autoencoder-Embedded Graph Convolutional Network predicts expression of the gene SFRP4 comparatively well, with an SCC of 0.503. We conclude that deep learning methods do have potential in single-cell spatial transcriptomics prediction, and urge scaling the developed methods to larger and more diverse datasets.
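To make the reported metric concrete, the following is a minimal sketch of how a per-gene and an average SCC could be computed from a matrix of measured and predicted expression values. The abstract does not state how the SCC was computed or averaged, so the convention here (one SCC per gene across cells, then an unweighted mean across genes) is an assumption, and the function names and gene labels are hypothetical.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    # Replace the ranks inside each group of tied values by the group mean.
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return rank_sums[inverse] / counts[inverse]


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    rx = rx - rx.mean()
    ry = ry - ry.mean()
    denom = np.sqrt((rx ** 2).sum() * (ry ** 2).sum())
    # A constant vector has no rank variance, so the correlation is undefined.
    return float((rx * ry).sum() / denom) if denom > 0 else float("nan")


def per_gene_scc(y_true, y_pred, gene_names):
    """One SCC per gene; inputs have shape (n_cells, n_genes)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return {g: spearman(y_true[:, j], y_pred[:, j]) for j, g in enumerate(gene_names)}


def mean_scc(scores):
    """Unweighted mean of the per-gene SCCs, ignoring undefined (NaN) entries."""
    return float(np.nanmean(list(scores.values())))


if __name__ == "__main__":
    # Synthetic toy values for illustration only, not data from the study.
    measured = np.array([[1.0, 5.0], [2.0, 3.0], [3.0, 1.0]])
    predicted = np.array([[0.1, 0.2], [0.2, 0.4], [0.3, 0.9]])
    scores = per_gene_scc(measured, predicted, ["GENE_A", "GENE_B"])
    print(scores, mean_scc(scores))
```

Because the SCC compares rankings rather than raw values, it rewards a model for ordering cells correctly by expression level even when the predicted magnitudes are off, which is one reason it is a common choice for expression-prediction tasks.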

Q&A:

Bios: Ashank Shah

Program Track: Mentor

GitHub Username:

ashankshah -Ashank Shah

What was your favorite seminar? Why?

My favorite talk of the summer was Ryan Urbanowicz’s on STREAMLINE. The talk was my first introduction to automated machine learning (AutoML) pipelines, and the discussion of its applications in biomedical contexts was especially interesting. -Ashank Shah

If you were to summarize your summer internship experience in one sentence, what would it be?

A continued learning experience full of collaboration, discussion, and hands-on opportunities. -Ashank Shah