Om Patel & Elijah Renner

Determining the Effects of Domain-Specific Pretraining of Long-Context Transformer Encoder Models for Automated CPT Code Assignment in Cedars-Sinai Pathology Reports




Lay Summary:

Hospitals use standardized CPT codes to bill for surgical and pathology services. Coding is typically done manually before being submitted to the revenue cycle team. We propose an automated NLP system that assigns the primary CPT code directly from the full pathology report.

Abstract:

Accurate CPT coding is vital to the finances of both pathology departments and patients. Previous approaches to CPT coding have used feature extractors such as transformer encoders to build report representations from which CPT codes are assigned. However, long-context encoders, which allow pathology reports to be processed without truncation, remain underexplored, as does pretraining on additional pathology reports. To address these gaps, we developed a long-context transformer encoder trained on over 233,000 unique pathology reports that codes pathology reports at 95.75% accuracy with an F1 of 0.8912, and we show empirically that pretraining on additional corpora with representation overlap improves coding performance. We also provide systematic comparisons against traditional baselines (Naive Bayes, random forests, XGBoost) to show why transformers are necessary. Finally, we show that our models’ predictions are efficient, interpretable, and grounded in clinical knowledge, supporting their potential for real-world deployment. This work demonstrates a path toward more accurate, generalizable, and trustworthy automation in medical coding, with direct implications for efficiency and reimbursement accuracy.
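The abstract reports accuracy and F1 side by side, and the two can diverge when a few codes dominate the data. The sketch below illustrates that general effect with plain Python. It assumes a macro-averaged F1, which the abstract does not specify, and the labels are made-up toy values rather than study data; the function names are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of reports whose predicted code equals the reference code."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-code F1, so rare codes count as much as common ones.

    Assumption: macro averaging. The abstract does not say which averaging was used.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted code p where it did not apply
            fn[t] += 1  # missed the reference code t
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, made-up labels: one frequent code and two rare ones (not study data).
y_true = ["88305"] * 8 + ["88307", "88304"]
y_pred = ["88305"] * 8 + ["88307", "88305"]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.4f} macro_f1={f1:.4f}")
```

In this toy case a single error on a rare code barely moves accuracy but pulls the macro-averaged F1 down sharply, which is one plausible reason an F1 can sit well below accuracy on an imbalanced code distribution.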



Q&A:


Bios: Om Patel, Elijah Renner

Program Track: Advanced Research

GitHub Username:

OmmyPatalonian -Om Patel

elijahrenner -Elijah Renner

What was your favorite seminar? Why?

I loved Ken Lau’s seminar on the hallmarks of precancer. He spoke about metabolomics and using it to examine ion images of tumorous tissue, which I have actually done in a lab here at the University of Pennsylvania. I asked him about temporal analysis of tumor development and whether the different ion images could be used to build a predictive model. I forget exactly what he said, but he reinforced that we need enough of that data, and that slicing a tumor open without embolizing it is sort of difficult (obviously). He seemed very genuine and kind. Hopefully I will meet him if I get into Vanderbilt. -Om Patel

I enjoyed Lou’s introduction to pathology seminar. It’s clear he’s incredibly fascinated by his work. -Elijah Renner

If you were to summarize your summer internship experience in one sentence, what would it be?

Extremely fast paced but exhilaratingly fun. -Om Patel

Learning to move fast and iterate quickly when experimenting. -Elijah Renner