dc.description.abstract |
Despite the rapid advancement in the development of hybrid ensemble Machine Learning
(ML) techniques in malignancy management, recurrence and mortality from Head and Neck
Squamous Cell Carcinoma (HNSCC) subtypes have not significantly improved in recent
decades due to poor prognosis. Moreover, recurrence increases in patients with HNSCC
owing to the metastatic stage of the tumor at diagnosis, but studies providing
promising prognostic models as a supporting tool for recurrence classification and prediction
in HNSCC are lacking. To support the identification of the most accurate prognosis and
to provide a robust model for classifying HNSCC recurrence patterns, this study presents
a hybrid stacked ensemble classifier model in which the same ML classifiers serve as
feature selectors, base classifiers, and meta-classifiers, and which could accurately predict
recurrence outcomes and identify the most accurate novel prognostic features in HNSCC
recurrence. Retrospective data of 125 HNSCC patients treated with curative intent between
2016 and 2020 at KBTH and who had a follow-up within this calendar period are collected.
Data is pre-processed using mode imputation and one-hot encoding. The proposed Hybrid
Ensemble Super Classification Algorithm (HESCA) model uses five ML classifiers, namely
Gradient Boosting Machine (GBM), Distributed Random Forest (DRF), Deep
Neural Network (DNN), Generalised Linear Model (GLM), and Naïve Bayes (NB), for
stacked ensemble learning. Each classifier is employed to construct a feature
subset, as a base classifier, and in turn as the meta-classifier in a stacking ensemble. The
performances of the HESCA model on various feature subsets are compared. Next, the
performance of the HESCA model on 8-input features is compared with the HESCA model
on full-input features. Then, based on gradient-boosted features, the performance of the
HESCA model is compared with the established stacked ensembles, namely two baseline
stacked ensemble models and one state-of-the-art stacked ensemble model. The results show
that a stacking ensemble with the GBM classifier as meta-classifier and five base
classifiers, trained on gradient-boosted features (GBM-input features) including
concurrent chemoradiotherapy treatment, age at diagnosis, p63, cervical lymph/neck nodes,
tumor size, smoking habit, pathological tumor staging at T4, and stage IV of tumor at
diagnosis, achieves higher accuracy (90.63%), with the lowest log loss (0.2959), than
that achieved by the base models and the established stacked ensemble models on the same
gradient-boosted features of recurrent HNSCC prognostic data. This gives a hybrid stacked
ensemble model termed the HESCA model, which consists of five base models under study
and a GBM meta-model. It is also observed that this HESCA model on GBM-input features
achieves better classification evaluation measures than those achieved on any other input
feature subset or on the full-input feature set considered in this study. The study
shows that using the GBM classifier as the meta-classifier in a stacking ensemble of
five base classifiers, with its gradient-boosted features, yields better performance than the
base models and any other established stacked ensemble model used in this study, and that the
HESCA model with gradient-boosted features is clinically appropriate as a supporting tool
for identifying, classifying, and predicting recurrence in patients' HNSCC prognostic data. |
en_US |