Tanay Janmanchi, Avilash Angirekula, Matthew Spektor & Dennis Li

Bios: Tanay Janmanchi, Avilash Angirekula, Matthew Spektor, Dennis Li

Program Track: Skills Development

GitHub Username:

tjanamnchi2 -Tanay Janmanchi

aviangirekula -Avilash Angirekula

MatthewSpektor -Matthew Spektor

denden511 -Dennis Li

What was your favorite seminar? Why?

I liked the seminar on how to write a manuscript. I thought it could be very useful to anyone in the field of research, not just AI or Pathology. -Tanay Janmanchi

My favorite was the one where we discussed Detection and Segmentation for Medical Images. Segmentation of medical images is also directly related to my project and I was interested in the segmentation of medical images and how it relates to detecting diseases. -Avilash Angirekula

My favorite seminar was Kevin Cornell’s project on Environmental Risk Factors of ALS. This was because I was interested in how environmental factors could contribute to a disease like ALS and what kind of solutions would be necessary to address them. -Matthew Spektor

My favorite was Paul Pharoah’s seminar on polygenic risk models in cancer. I found his speaking particularly clear, and the way he described certain things, such as how the national cancer registry data for England was split into 3 subsets, was engaging. I was also intrigued by the website he showed us near the end of the seminar that attempted to predict cancer. -Dennis Li

If you were to summarize your summer internship experience in one sentence, what would it be?

I learned a lot while also contributing meaningfully to healthcare issues. -Tanay Janmanchi

Research-backed learning that helped me learn the fundamentals of machine learning. -Avilash Angirekula

It was an amazing experience that not only introduced me to the complex and fascinating world of machine learning in medicine but also helped me understand this world alongside amazing fellow students and mentors. -Matthew Spektor

A great time working with others and a beautiful learning experience with knowledgeable mentors whom I would gladly work with again. -Dennis Li

Blog Post

Abstract
Bladder cancer is common and highly recurrent, so non-invasive screening must be both sensitive and scalable. Urine cytology remains a mainstay, but manual reads vary across stain quality, preparation, and readers.

The nucleus-to-cytoplasm (N/C) area ratio correlates with malignancy and requires precise segmentation of nucleus and cytoplasm. This project builds an end-to-end pipeline that segments nucleus and cytoplasm in single-cell urothelial images using classical and UNet-based methods, computes per-cell N/C ratios, and combines the cell results for each sample to see how they relate to the diagnosis (negative, atypical, suspicious, positive). We emphasize reproducibility and evaluation that links pixel-level segmentation to a clinically meaningful biomarker, allowing for faster, more consistent bladder-cancer screening.

Introduction
Urine cytology lets clinicians look for signs of bladder cancer without surgery. One of the most reliable visual clues is whether a cell’s nucleus is large compared to its cytoplasm, the N/C ratio.

Currently, this judgment is made by experts looking down a microscope, which takes time and can vary from person to person. Our goal is to turn this visual clue into a consistent number for each cell, using image analysis and deep learning. We segment each cell into nucleus and cytoplasm, compute the N/C ratio automatically, and summarize these values across all cells in a patient’s specimen. The result is a quantitative readout that can help standardize screening, speed up reviews, and make results easier to compare across labs.
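To make the "consistent number for each cell" concrete, here is a minimal sketch of how an N/C area ratio could be computed from two binary masks. The function name `nc_ratio` and the `denominator` option are hypothetical and not taken from our project code. The sketch assumes the ratio is nucleus area over whole-cell area (nucleus plus cytoplasm) by default, and offers nucleus over cytoplasm as an alternative, since either definition may be intended.

```python
import numpy as np

def nc_ratio(nucleus_mask, cytoplasm_mask, denominator="cell"):
    """Return the N/C area ratio for one cell from two binary masks.

    Assumption: with denominator="cell" the ratio is
    nucleus / (nucleus + cytoplasm); with "cytoplasm" it is
    nucleus / cytoplasm.
    """
    nucleus = np.asarray(nucleus_mask, dtype=bool)
    # Drop any overlap so each pixel is counted in only one region.
    cytoplasm = np.asarray(cytoplasm_mask, dtype=bool) & ~nucleus
    n_area = int(nucleus.sum())
    c_area = int(cytoplasm.sum())
    denom = n_area + c_area if denominator == "cell" else c_area
    if denom == 0:
        return float("nan")  # no usable cell pixels
    return n_area / denom

# Toy example: a 6x6 cell (36 pixels) containing a 3x3 nucleus (9 pixels).
cell = np.zeros((10, 10), dtype=bool)
cell[2:8, 2:8] = True
nucleus = np.zeros((10, 10), dtype=bool)
nucleus[4:7, 4:7] = True
ratio = nc_ratio(nucleus, cell & ~nucleus)  # 9 / 36
```

Because the ratio depends only on pixel counts, small shifts in either boundary change the result directly, which is why segmentation quality matters so much for this biomarker.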

The Challenge
● Urine specimens contain urothelial and inflammatory cells, contaminants, and debris; stain and focus vary, and cell boundaries are often unclear.

● Touching/overlapping cells and low contrast break simple thresholding rules
● Manual review is slow and inconsistent; small differences in making boundaries can change the N/C ratio
● Differences in preparation, scanning, and labs can degrade algorithm performance if not addressed

Methodology

Datasets
● Urothelial Cell Dataset (Model Training): ~300 single-cell images with expert nucleus and cytoplasm masks.

○ Used to train and benchmark segmentation approaches and to compute the per-cell nuclear-to-cytoplasmic (N/C) ratio.

● Specimen Cell Dataset (Clinical Validation): ~25 cells per patient, grouped by diagnostic category (negative, atypical, suspicious, positive).

○ Used to examine if N/C ratios reflect diagnostic categories.

Preprocessing & Feature Engineering
● Standardization: Convert to RGB (and grayscale when needed for intensity baselines), resize, normalize intensities.

● Feature Extraction:
○ Gabor filter bank: highlights local texture and frequency patterns across orientations/scales.

○ First/second-order statistics: local mean, variance, standard deviation, (optionally entropy) computed over patches to summarize the texture.

○ Morphology cues (post-mask): area, perimeter, and simple shape descriptors from masks to stabilize N/C estimation.
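The texture features above can be sketched with NumPy alone. This is an illustration rather than our exact feature code: the function names, kernel size, wavelengths, number of orientations, and patch size are all assumed values chosen for the example.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Real (cosine) Gabor kernel: a Gaussian envelope times an oriented sinusoid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    kernel = envelope * np.cos(2.0 * np.pi * x_rot / wavelength)
    return kernel - kernel.mean()  # zero mean, so flat regions respond with ~0

def filter2d(image, kernel):
    """Same-size 2D correlation with reflect padding (odd kernel sizes)."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="reflect")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (kh, kw))
    return np.einsum("ijkl,kl->ij", windows, kernel)

def pixel_features(gray, wavelengths=(4.0, 8.0), n_orientations=4, size=9, patch=5):
    """Stack intensity, Gabor responses, and local mean/variance/std per pixel."""
    gray = np.asarray(gray, dtype=float)
    feats = [gray]
    for wavelength in wavelengths:
        for k in range(n_orientations):
            theta = k * np.pi / n_orientations
            kernel = gabor_kernel(size, wavelength, theta, sigma=0.5 * wavelength)
            feats.append(filter2d(gray, kernel))
    box = np.full((patch, patch), 1.0 / patch ** 2)  # averaging window
    local_mean = filter2d(gray, box)
    local_var = np.clip(filter2d(gray ** 2, box) - local_mean ** 2, 0.0, None)
    feats += [local_mean, local_var, np.sqrt(local_var)]
    return np.stack(feats, axis=-1)  # shape: (H, W, n_features)

rng = np.random.default_rng(0)
features = pixel_features(rng.random((32, 32)))
```

With the defaults this yields 12 features per pixel: 1 intensity channel, 8 Gabor responses (2 wavelengths by 4 orientations), and 3 local statistics.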

● Modeling
○ K-Means Clustering:
■ Unsupervised clustering on Gabor + color/texture features to propose nucleus vs. cytoplasm regions; refine with basic morphology (opening/closing, small-object removal).

○ Random Forest (RF):
■ Trained on extracted features to predict N/C ratio or to classify pixels/regions; used as a transparent comparison to deep learning.

○ UNet (deep learning):
■ Encoder–decoder network that outputs two channels (nucleus, cytoplasm).

■ Produces cleaner, more stable masks under stain/overlap variability than classical baselines.
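As a rough illustration of the K-Means baseline, the sketch below clusters per-pixel feature vectors into three groups and reads them as nucleus, cytoplasm, and background. It rests on assumptions that are ours for this example only: the first feature is grayscale intensity, the nucleus is the darkest cluster, and the background is the brightest. The names `kmeans` and `segment_cell` are hypothetical, and the morphological cleanup step is left out (something like `scipy.ndimage.binary_opening` could follow).

```python
import numpy as np

def kmeans(points, k=3, n_iter=20):
    """Plain Lloyd's algorithm with a deterministic start.

    Centers are seeded at the points whose first feature is closest to
    evenly spaced values between that feature's minimum and maximum.
    """
    targets = np.linspace(points[:, 0].min(), points[:, 0].max(), k)
    seeds = [int(np.abs(points[:, 0] - t).argmin()) for t in targets]
    centers = points[seeds].astype(float)
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = points[labels == j].mean(axis=0)
    # Final assignment against the updated centers.
    dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1), centers

def segment_cell(features):
    """Return (nucleus_mask, cytoplasm_mask) from an (H, W, D) feature stack."""
    h, w, d = features.shape
    labels, centers = kmeans(features.reshape(-1, d), k=3)
    # Assumption: feature 0 is intensity; darkest = nucleus, middle = cytoplasm.
    rank = np.argsort(centers[:, 0])
    nucleus = (labels == rank[0]).reshape(h, w)
    cytoplasm = (labels == rank[1]).reshape(h, w)
    return nucleus, cytoplasm

# Synthetic cell: bright background, mid-gray cytoplasm, dark nucleus.
image = np.full((20, 20), 0.9)
image[5:15, 5:15] = 0.5
image[8:12, 8:12] = 0.1
nucleus_mask, cytoplasm_mask = segment_cell(image[..., None])
```

On real smears the intensity ordering can fail (dark debris, pale nuclei), which is one reason a clustering baseline tends to give noisier masks than a trained network.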

● Evaluation
○ Segmentation quality: qualitative overlays and, where computed, Dice/IoU against the expert masks.
○ Biomarker fidelity: absolute error and correlation between predicted and mask-derived N/C at the cell level.

○ Clinical utility: analyze the distribution of aggregated N/C across diagnosis categories
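The biomarker-fidelity check can be sketched with the standard library only. The helper names are hypothetical and the numbers below are made-up placeholders that show the calculation; they are not results from this project.

```python
import math

def mean_absolute_error(predicted, reference):
    """Average absolute difference between paired per-cell N/C values."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Placeholder values for illustration only.
predicted_nc = [0.30, 0.50, 0.70]   # N/C from model masks
reference_nc = [0.25, 0.50, 0.75]   # N/C from expert masks
mae = mean_absolute_error(predicted_nc, reference_nc)
r = pearson_r(predicted_nc, reference_nc)
```

The two numbers answer different questions: correlation can be high even when predictions are systematically compressed or shifted, as in this toy example, so the absolute error is worth reporting alongside it.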
Results
Segmentation and Feature Output
● Each implemented method (UNet, KMeans, and Random Forest) produced pixel-level outputs interpretable as the likelihood that a pixel belongs to nucleus or cytoplasm.

● Low values corresponded to background/empty regions; high values corresponded to dense cell structures.

● Qualitative overlays (nucleus = red, cytoplasm = green) allowed for quick visual checks of boundary quality and failure modes (overlaps, debris, low contrast).
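An overlay of this kind can be produced by alpha-blending solid colors over the masked pixels. The sketch below assumes an 8-bit RGB image of shape (H, W, 3); the function name `overlay_masks` and the blending weight `alpha` are illustrative choices, not our exact plotting code.

```python
import numpy as np

def overlay_masks(rgb, nucleus_mask, cytoplasm_mask, alpha=0.4):
    """Tint nucleus pixels red and cytoplasm pixels green on a copy of the image."""
    out = np.asarray(rgb, dtype=float).copy()
    nucleus = np.asarray(nucleus_mask, dtype=bool)
    # Pixels claimed by both masks are shown as nucleus only.
    cytoplasm = np.asarray(cytoplasm_mask, dtype=bool) & ~nucleus
    red = np.array([255.0, 0.0, 0.0])
    green = np.array([0.0, 255.0, 0.0])
    out[cytoplasm] = (1.0 - alpha) * out[cytoplasm] + alpha * green
    out[nucleus] = (1.0 - alpha) * out[nucleus] + alpha * red
    return out.round().astype(np.uint8)

# Toy example: uniform gray image with one nucleus pixel and one cytoplasm pixel.
gray_image = np.full((4, 4, 3), 100, dtype=np.uint8)
nuc = np.zeros((4, 4), dtype=bool)
nuc[0, 0] = True
cyt = np.zeros((4, 4), dtype=bool)
cyt[1, 1] = True
overlay = overlay_masks(gray_image, nuc, cyt)
```

Keeping `alpha` below 1 leaves the underlying stain visible, so a reviewer can judge whether the colored boundary actually follows the cell.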

Comparative Performance & Model Insights
● The UNet model consistently returned cleaner, more contiguous masks than the classical baselines, especially around complex or overlapping boundaries and in variable staining conditions.

● KMeans + feature maps (Gabor + basic first/second-order statistics + color) provided transparent baselines and captured structure but had noisier masks and less stable cytoplasm boundaries, which impacted downstream N/C calculations.

● Where computed, Dice/IoU favored the UNet across both nucleus and cytoplasm channels. Classical baselines were competitive on high-contrast, well-separated cells but degraded in harder cases.
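For reference, Dice and IoU for a single mask channel can be computed as in the sketch below. The function name is hypothetical, and returning 1.0 when both masks are empty is a convention assumed here, not something specified in our evaluation.

```python
import numpy as np

def dice_and_iou(pred_mask, true_mask):
    """Return (Dice, IoU) overlap scores for two binary masks of equal shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    intersection = int(np.logical_and(pred, true).sum())
    total = int(pred.sum()) + int(true.sum())
    if total == 0:
        return 1.0, 1.0  # assumed convention: two empty masks agree perfectly
    union = total - intersection
    return 2.0 * intersection / total, intersection / union

# Two 4x4 squares offset by two columns: 8 shared pixels out of 16 each.
truth = np.zeros((8, 8), dtype=bool)
truth[0:4, 0:4] = True
prediction = np.zeros((8, 8), dtype=bool)
prediction[0:4, 2:6] = True
dice, iou = dice_and_iou(prediction, truth)  # 16/32 and 8/24
```

Scoring nucleus and cytoplasm channels separately is useful because an error in either one feeds into the N/C ratio.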

N/C Ratio Calculation & Diagnostic Trends
● Per-cell N/C values were aggregated per specimen (≈25 cells/patient) using the mean and median; dispersion (IQR/SD) was tracked to reflect within-specimen heterogeneity.

● UNet-derived N/C ratios aligned most closely with mask-based ground truth, showing reduced variance and better separation between diagnostic groups compared with classical baselines.
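The per-specimen aggregation can be sketched with Python's `statistics` module. The input layout (pairs of specimen ID and per-cell N/C), the function name, and the example values are assumptions made for illustration; the quartiles use the module's default "exclusive" method, which may differ slightly from other tools.

```python
from collections import defaultdict
import statistics

def summarize_specimens(cell_rows):
    """Aggregate per-cell N/C values into per-specimen summary statistics.

    cell_rows: iterable of (specimen_id, nc_ratio) pairs.
    """
    by_specimen = defaultdict(list)
    for specimen_id, nc in cell_rows:
        by_specimen[specimen_id].append(nc)

    summary = {}
    for specimen_id, values in by_specimen.items():
        if len(values) >= 2:
            q1, _, q3 = statistics.quantiles(values, n=4)
            iqr = q3 - q1
            sd = statistics.stdev(values)
        else:
            iqr, sd = 0.0, 0.0  # dispersion is undefined for a single cell
        summary[specimen_id] = {
            "n_cells": len(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "iqr": iqr,
            "sd": sd,
        }
    return summary

# Placeholder rows for illustration only.
rows = [("A", 0.2), ("A", 0.4), ("A", 0.6), ("A", 0.8), ("B", 0.5)]
stats = summarize_specimens(rows)
```

The resulting per-specimen values can then be grouped by diagnostic category to compare their distributions.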

Clinical Relevance and Interpretation
● Clear differences in aggregated N/C distributions emerged across diagnostic categories; suspicious/positive specimens trended toward higher N/C, while negative/atypical trended lower.

● The combination of quantitative N/C and overlays linked predictions to visible morphology, improving reviewer confidence and supporting use in triage/QA.

● The results suggest that deep learning segmentation improves the reliability of the N/C biomarker over classical feature-based methods, strengthening its potential use in bladder-cancer screening workflows.

Conclusion
Potential for future impact
● Provides a quantitative biomarker (N/C ratio) to support bladder-cancer screening, reducing subjectivity and variability.

○ Promotes more consistent diagnostics across readers, labs, and scanners.

● Functions as a decision-support and QA/triage aid with interpretable overlays showing nucleus vs. cytoplasm.

● Framework is extensible to other cytology workflows (e.g., cervical or lung cytology) beyond bladder screening.

Limitations
● Data scale: ~300 labeled cells for training and ~25 cells/patient for validation may limit generalizability and risk overfitting (even with UNet and KMeans/RF baselines).

● Specimen variability: Differences in preparation, staining, imaging, and focus can reduce reproducibility across sites.

○ Overlapping cells, debris, and low-contrast cytoplasm boundaries remain challenging.

● Single-biomarker focus: N/C alone may not capture all morphological cues (shape, chromatin texture); performance may plateau without richer features and external validation.

Future Directions
● Go beyond N/C with texture, chromatin density, and multi-feature risk scores.

● Integrate automatic cell detection and add stain-robust normalization and domain adaptation for multi-site generalization.

● Evaluate specimen-level models and compare against simple N/C aggregation baselines.

● Expand to multi-class cytology labels (negative, atypical, suspicious, positive)

Acknowledgements

EDIT AI Team, DPLM, Pathology Shared Resource, DCC @ DHMC, Dr. Levy, Dr. Pujara, Dr. Diallo.

Automated Bladder Cancer Screening with Deep Learning Algorithms
