Identifying Skin Cancer using Machine Learning
Ryan Guo
Lay Summary:
My project offers an easier way to check whether a skin lesion is likely to be skin cancer. It uses basic machine learning algorithms, which reached high accuracy rates, to classify a lesion as cancerous or not based on features of the person's skin.
Abstract:
Skin cancer is one of the most common cancers in the United States. Early detection of skin cancer can drastically reduce mortality. However, traditional diagnostic methods are time-consuming and relatively painful. Artificial intelligence and machine learning can play a big role in early skin cancer detection. I used data from the PH^2 Database, part of the Automatic Computer-Based Diagnosis System for Dermoscopy Images Project, which contained features and traits of 160 moles and 40 melanomas. I built multiple machine learning models to see which one would have the highest accuracy rate. The models I created were Decision Trees, Random Forest, Support Vector Machine, Naive Bayes, and KMeans. All the models performed well; the Decision Tree, Random Forest, and KMeans models had the highest accuracy rates, at 92.5%, 92.5%, and 90% respectively. Further work includes increasing the accuracy rate by refining the models and tuning their hyperparameters.
Q&A:
Bios: Ryan Guo
Program Track: Skills Development
GitHub Username:
ryan-g27 -Ryan Guo
What was your favorite seminar? Why?
submitted in first submission -Ryan Guo
If you were to summarize your summer internship experience in one sentence, what would it be?
submitted in first submission -Ryan Guo
Blog Post
Identifying Skin Cancer using Machine Learning
Ryan Guo
Skills Development Track
Skin cancer is one of the most common cancers in the United States, affecting millions of people each year. Early detection is crucial for effective treatment and can significantly improve survival rates. Traditionally, diagnosing skin cancer relies on dermatologists who examine skin lesions and order biopsies for definitive confirmation. However, this process can be time-consuming, costly, and sometimes inaccessible to those in remote areas. This is where machine learning comes in.
The PH^2 Dataset from the ADDI Project was used in this project. The dataset contained 200 dermoscopic images, along with a table of features for every image. 80 of these images were common nevi, also known as moles. Another 80 were atypical nevi, a type of mole that can sometimes be confused with melanoma, which is the skin cancer studied here. The remaining 40 images were melanomas.
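The three diagnosis groups above reduce to the two classes used for classification (160 moles versus 40 melanomas). The snippet below is a minimal sketch of that relabeling with pandas. The column name `diagnosis` and the label strings are hypothetical, and the table is built in memory rather than loaded from the actual PH^2 file.

```python
import pandas as pd

# Hypothetical stand-in for the feature table: one row per image.
# The column name "diagnosis" and the label strings are assumptions,
# not the real PH^2 column names.
diagnoses = (
    ["common nevus"] * 80
    + ["atypical nevus"] * 80
    + ["melanoma"] * 40
)
features = pd.DataFrame({"diagnosis": diagnoses})

# Binary target: 1 for melanoma, 0 for either kind of nevus (mole).
features["is_melanoma"] = (features["diagnosis"] == "melanoma").astype(int)

# Count how many images fall into each class.
class_counts = features["is_melanoma"].value_counts().to_dict()
print(class_counts)
```

With only 40 of 200 images in the positive class, a model that always predicts "mole" would already score 80% accuracy, which is a useful baseline to keep in mind when reading accuracy figures.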
In this project, multiple machine learning models were used in order to find which one was the most accurate. These models consisted of Decision Trees, Random Forest, Support Vector Machine, Naïve Bayes, and KMeans. The Decision Tree model and Random Forest model had the highest accuracy of 92.5%, followed by the KMeans model which had an accuracy of 90%.
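A comparison like this one can be sketched with scikit-learn as shown below. This is an illustration under stated assumptions, not the project's actual code: the data is synthetic (generated with `make_classification` to mimic 200 samples with roughly 20% positives), the 20% hold-out split is assumed, and all model settings are library defaults. Because KMeans is unsupervised, the sketch maps each cluster to the majority training label inside it before scoring, which is one common way to report an accuracy for a clustering model.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the feature table (assumption: not the PH^2 data).
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=5,
    weights=[0.8, 0.2], random_state=0,
)

# Assumed 80/20 split, stratified so both classes appear in the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

supervised_models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Support Vector Machine": SVC(),
    "Naive Bayes": GaussianNB(),
}

accuracies = {}
for name, model in supervised_models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

# KMeans never sees the labels while fitting. To score it, assign each
# cluster the most frequent training label among its members.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_train)
cluster_to_label = {}
for cluster in range(2):
    members = y_train[kmeans.labels_ == cluster]
    cluster_to_label[cluster] = int(np.bincount(members).argmax()) if len(members) else 0

kmeans_pred = np.array([cluster_to_label[c] for c in kmeans.predict(X_test)])
accuracies["KMeans"] = accuracy_score(y_test, kmeans_pred)

for name, acc in accuracies.items():
    print(f"{name}: {acc:.3f}")
```

The numbers this prints come from synthetic data and will not match the accuracies reported in this post.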
Figure 1. Accuracy rate of every model used
Figure 2. Decision Tree model
In future studies, hyperparameter tuning can be applied to all of the models to increase performance, reduce overfitting, and improve generalization. New approaches can also be tried, such as neural networks. More complex techniques that work directly on the images, such as image segmentation, could further improve the accuracy of the models.
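As an illustration of what hyperparameter tuning could look like, the sketch below runs a cross-validated grid search over a decision tree with scikit-learn's `GridSearchCV`. The data is again synthetic, and the parameter grid, the 5-fold setting, and the 20% hold-out are all assumptions chosen for the example rather than values from this project.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the feature table (assumption: not the PH^2 data).
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=5,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical grid: shallower trees and larger leaves limit overfitting.
param_grid = {
    "max_depth": [2, 3, 5, None],
    "min_samples_leaf": [1, 3, 5],
}

# 5-fold cross-validation on the training set picks the best combination.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

best_params = search.best_params_
# score() uses the best model, refit on the full training set.
test_accuracy = search.score(X_test, y_test)

print("Best parameters:", best_params)
print(f"Held-out accuracy: {test_accuracy:.3f}")
```

Keeping the test set out of the search matters: the grid is chosen using cross-validation on the training data only, so the held-out accuracy remains an honest estimate.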
Throughout this project, I was introduced to machine learning algorithms and their applications in biomedical studies. Using various models like decision trees, support vector machines, and random forests helped solidify my understanding. I also became proficient in using popular Python libraries such as pandas and scikit-learn. Although I didn’t attend many seminars, the ones I attended taught me about different models and motivated me to keep learning machine learning.
A huge thank you to Dr. Joshua Levy, Aruesha Srivastava, and Suchir Paruchuri for their assistance in completing this project.