Electronic Thesis/Dissertation


Application of Engineering Principles with a Comparison of Machine Learning Classification Methods to Predict Treatment Outcomes in Head and Neck Cancer Patients

Open Access

Our research compared several single classification methods (decision trees, logistic regression, naïve Bayes, linear discriminant analysis, nearest neighbors, and support vector machines) with ensemble classifier models (bagging and boosting). The models predicted two outcomes, using data up to 90 days post-treatment: weight loss of five or more kilograms, and toxicity of five or more grays above the actual radiation therapy dose received by patients. The data for this study were obtained from Johns Hopkins Hospital, Baltimore, MD, as anonymized data sets from the Oncospace® database. For predicting weight loss, the data set consisted of 326 randomly selected patient instances (rows) and 295 features, or predictor variables (columns), drawn from the 729 features available. Features included tumor factors, diagnosis, treatment, anonymized patient biographical data, cancer site, and quality-of-life surveys (Appendix A), among others. The toxicity data set included 597 patient instances (rows) and 37 predictor variables (columns), including toxicity to various organs and tissues. The Oncospace® data came from previously treated patients and were collected between January 1, 2006, and June 24, 2014 (sample data fields in Appendix B). Feature variables and models were validated using 10-fold cross-validation to evaluate predictive accuracy, together with expert feature selection (domain knowledge and tools). We built the models with the comprehensive training and testing workflow of the Classification Learner app in MathWorks® Matlab® Statistics and Machine Learning Toolbox™. Ensemble bagged-trees classifiers achieved prediction accuracies of 86.1% (toxicity) and 96.3% (weight loss); ensemble boosted trees achieved 92.3% (toxicity) and 100.0% (weight loss). The ensemble methods consistently showed higher prediction accuracies than the single classifiers.
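
The abstract compares single classifiers with bagging and boosting ensembles under 10-fold cross-validation. The thesis did this in Matlab®'s Classification Learner app; what follows is only a minimal, dependency-free Python sketch of that evaluation idea, not a reconstruction of the author's models or data. It assumes synthetic two-feature data (the `make_data` generator and every function name here are hypothetical), uses a decision stump (a depth-1 tree) as the single classifier, bootstrap resampling with majority voting for bagging, and discrete AdaBoost as the boosting variant.

```python
import math
import random
from statistics import mean

# Assumption: stumps split on a coarse fixed grid (0.05 ... 0.95) to keep the sketch fast.
THRESHOLDS = [i / 20 for i in range(1, 20)]

def make_data(n, seed=0):
    # Hypothetical synthetic data, NOT Oncospace records: label is 1 when a + b > 1.
    rng = random.Random(seed)
    X = [(rng.random(), rng.random()) for _ in range(n)]
    y = [1 if a + b > 1.0 else 0 for a, b in X]
    return X, y

def predict_stump(stump, x):
    f, t, pol = stump
    return 1 if pol * (x[f] - t) > 0 else 0

def fit_stump(X, y, w):
    """Pick the (feature, threshold, polarity) with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for t in THRESHOLDS:
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (1 if pol * (xi[f] - t) > 0 else 0) != yi)
                if best is None or err < best[0]:
                    best = (err, (f, t, pol))
    return best[1]

def fit_single(X, y):
    return fit_stump(X, y, [1.0] * len(X))

def fit_bagged(X, y, n_models=15, seed=1):
    """Bagging: each stump is trained on a bootstrap resample drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_single([X[i] for i in idx], [y[i] for i in idx]))
    return models

def predict_bagged(models, x):
    votes = sum(predict_stump(m, x) for m in models)
    return 1 if 2 * votes > len(models) else 0  # majority vote; odd n_models avoids ties

def fit_boosted(X, y, rounds=20):
    """Discrete AdaBoost: upweight misclassified samples so later stumps focus on them."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        wrong = [predict_stump(stump, xi) != yi for xi, yi in zip(X, y)]
        err = sum(wi for wi, bad in zip(w, wrong) if bad)
        if err >= 0.5:
            break  # no better than chance on the weighted data
        err = max(err, 1e-10)  # guard against division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        w = [wi * math.exp(alpha if bad else -alpha) for wi, bad in zip(w, wrong)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict_boosted(ensemble, x):
    score = sum(alpha * (1 if predict_stump(s, x) else -1) for alpha, s in ensemble)
    return 1 if score > 0 else 0

def kfold_accuracy(X, y, fit, predict, k=10, seed=2):
    """Mean accuracy over k folds; each fold is held out once while the rest trains."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accs = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in fold)
        accs.append(correct / len(fold))
    return mean(accs)

def compare_models(n=120):
    X, y = make_data(n)
    return {
        "single stump": kfold_accuracy(X, y, fit_single, predict_stump),
        "bagged stumps": kfold_accuracy(X, y, fit_bagged, predict_bagged),
        "boosted stumps": kfold_accuracy(X, y, fit_boosted, predict_boosted),
    }

if __name__ == "__main__":
    for name, acc in compare_models().items():
        print(f"{name}: {acc:.3f}")
```

Running the script prints one mean cross-validated accuracy per model. Stumps are used here so the gap between a lone weak learner and an ensemble is easy to see: a single axis-aligned split cannot follow a diagonal boundary such as a + b > 1, whereas boosting can combine several splits into a staircase approximation. Bagging tends to help less with stumps, since bootstrap resamples often lead to similar splits; the thesis's bagged and boosted models were full trees, which this sketch does not attempt to reproduce.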

Author Language Keyword Date created Type of Work Rights statement GW Unit Degree Advisor Committee Member(s) Persistent URL