Gautham Agilan, Arjun Garg & Shaurya Bisht

A Deep Learning Framework for Automated Pap Smear Analysis in Cervical Cancer Screening



Lay Summary:

This project built a deep learning framework to preprocess, classify, and cluster Pap smear whole-slide images for cervical cancer screening. The approach automates squamous cell detection using a ResNet-50 classifier and exploratory clustering, aiming to improve consistency and efficiency in screening workflows.

Abstract:

Cervical cancer remains the fourth most common cancer among women worldwide, with over 600,000 new cases annually. Early detection through cytopathology is critical, yet manual screening is labor-intensive and subject to variability in staining and interpretation. This project develops a deep learning framework that automates the classification and clustering of Pap smear whole-slide images (WSIs), with the goal of reducing pathologist workload while improving diagnostic consistency. The pipeline begins with preprocessing, in which WSIs are tiled into 1024×1024 patches and stain-normalized using Macenko’s method to minimize staining variability. From an annotated Dartmouth dataset of 38 WSIs, approximately 5,000 patches were extracted and labeled as squamous or non-squamous. A ResNet-50 model, pre-trained on ImageNet and fine-tuned for binary classification, achieved an accuracy of 0.87, precision of 0.86, recall of 0.88, and AUC of 0.92 on the test set. To explore morphological diversity, ResNet-derived feature vectors were reduced with UMAP and clustered using K-means, enabling the identification of squamous subtypes and artifacts. Beyond classification and clustering, the framework lays the foundation for more comprehensive screening capabilities. Planned extensions include YOLOv5-based detection for cell localization and U-Net segmentation for pixel-level delineation of squamous regions. These additions aim to provide a hybrid workflow that combines efficient patch-level classification with precise localization to support real-time, clinically viable screening. This work demonstrates the feasibility of applying deep learning to automate core tasks in cervical cancer screening, offering a scalable approach that can complement existing AI-based methods such as Smart-CCS and NCI’s dual-stain AI.
Future efforts will focus on validating the pipeline with external datasets, refining model performance with advanced architectures such as Vision Transformers, and collaborating with clinicians to assess its practical integration into diagnostic workflows.
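
As a rough illustration of two steps mentioned in the abstract, the sketch below computes the top-left coordinates of 1024×1024 tiles for a slide of a given size, then derives accuracy, precision, and recall from patch-level predictions. It is not the project's actual code. The function names, the choice to use non-overlapping tiles and discard partial tiles at the slide edges, and the label convention (1 = squamous, 0 = non-squamous) are all assumptions made for this example. Stain normalization, model training, and AUC are not covered here.

```python
from typing import Iterator, List, NamedTuple, Tuple


class Metrics(NamedTuple):
    accuracy: float
    precision: float
    recall: float


def tile_origins(width: int, height: int, tile: int = 1024) -> Iterator[Tuple[int, int]]:
    """Yield (x, y) top-left corners of non-overlapping tiles.

    Assumption: tiles that would extend past the slide edge are skipped.
    """
    for y in range(0, height - tile + 1, tile):
        for x in range(0, width - tile + 1, tile):
            yield (x, y)


def binary_metrics(y_true: List[int], y_pred: List[int]) -> Metrics:
    """Accuracy, precision, and recall for binary patch labels.

    Assumption: label 1 means squamous, label 0 means non-squamous.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    total = tp + tn + fp + fn
    # Guard against empty denominators so the helper never divides by zero.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return Metrics(accuracy, precision, recall)
```

For example, `list(tile_origins(2048, 3000))` gives four tile origins, because the bottom 952 rows do not fill a complete tile and are dropped under the assumption above.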



Q&A:


Bios: Gautham Agilan, Arjun Garg, Shaurya Bisht

Program Track: Advanced Research

GitHub Username:

Gautham-A10 -Gautham Agilan

arjungarg95 -Arjun Garg

https://github.com/bshaurya -Shaurya Bisht

What was your favorite seminar? Why?

My favorite seminar was “Investigating the Role of Neurotoxic Metals in ALS Pathogenesis” by Vismay and someone else. I found the methodology and implications to be really interesting. -Gautham Agilan

My favorite seminar was Zarif’s talk on multimodal modeling and research entrepreneurship. I really appreciated how he not only explained the technical aspects of multimodal AI but also tied them to the process of building something new outside of academia. It gave me a fresh perspective on how research ideas can move beyond publications and translate into real-world impact, which was both inspiring and practical to hear as someone early in their career. -Arjun Garg

My favorite seminar was the one about polygenic risk models in cancer by Paul Pharoah. It connected statistical genetics with real-world applications in oncology, showing how data-driven approaches can help stratify cancer risk and guide early interventions, which I found to be an exciting intersection of computation and medicine. -Shaurya Bisht

If you were to summarize your summer internship experience in one sentence, what would it be?

It was a little overwhelming at first, but I learned a lot about clustering and classification techniques (K-means, the use of ResNet and YOLOv8, and binary classifier implementation). -Gautham Agilan

My summer internship at EDIT AI was an incredible opportunity to apply deep learning to real-world medical imaging challenges, while learning from mentors and collaborating with peers to build impactful solutions. -Arjun Garg

My summer internship was an eye-opening experience where I developed research skills, engaged with cutting-edge seminars, met and learned from mentors and peers, and learned the importance of collaboration in advancing science. -Shaurya Bisht