Published Paper


Recent Advancements in Cancer Diagnosis Using Machine Learning Techniques: A Systematic Review of Decades of Research, Comparisons and Problems

Sulekha Das, Avijit Kumar Chaudhuri, Partha Ghosh, Prithwish Raymahapatra
West Bengal, India
Page: 1509-1534
Published on: 2024 March

Abstract

Cancer is a non-communicable disease that spreads throughout the body through uncontrolled cell growth. The malignant cells grow into a tumor, which weakens the immune system and disrupts other biological processes. The most frequent types of cancer are breast, lung, and cervical cancer. Several screening methods are available to detect the presence of cancer at various stages. Misdiagnosis can occur in some circumstances owing to human error or incorrect data interpretation, resulting in the loss of human lives. To address these issues, this research study proposes an effective machine learning-based review and diagnosis technique backed by intelligent learning models. Artificial intelligence-based feature selection and classification techniques are used to detect cancer at an earlier stage, improve prediction accuracy, and save lives. In this research study, breast, cervical, and lung cancer datasets from the University of California, Irvine repository were used in the experimental investigations. To train and validate models on the reduced feature set selected by the proposed system, the authors used supervised machine learning approaches. Numerous features may contribute to the occurrence of cancer, and it is difficult to pinpoint the specific environmental and other diagnostic features responsible, yet these features still play a role in determining cancer occurrence. We can achieve our goal of estimating the probability of cancer occurrence by using machine learning algorithms and routine diagnostic data. Cancer datasets contain a variety of patient information features, but not all of them are useful in cancer prognosis. In such cases, a feature selection approach plays a crucial role in identifying the relevant feature set. In this research, we compare the effects of feature selection approaches on the accuracy provided by existing machine learning algorithms.
We investigated the following machine learning methods for this purpose: Logistic Regression (LR), Naive Bayes (NB), Random Forest (RF), Hoeffding Tree (HT), and Multi-Layer Perceptron (MLP). Information Gain (IG), Gain Ratio (GR), Relief-F (R-F), and One-R (OR) were all evaluated as feature selection strategies. The trained models are validated with several performance metrics, namely accuracy, sensitivity, specificity, F-measure, kappa score, and area under the ROC curve (AUC), using the 10-fold cross-validation approach. The accuracy of the proposed framework was 100%, 100%, and 91.30% on the breast, cervical, and lung cancer datasets, respectively. Furthermore, this approach may serve as a versatile tool for extracting patterns from several clinical trials for various forms of cancer.
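
As an illustration of the evaluation metrics named in the abstract, the following minimal Python sketch computes accuracy, sensitivity, specificity, F-measure, and Cohen's kappa from binary predictions. It is not the authors' implementation: the function names (`confusion_counts`, `binary_metrics`) and the toy labels are assumptions made for this example, and AUC is omitted because it needs predicted scores rather than hard labels.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def confusion_counts(y_true, y_pred, positive=1):
    """Count the four confusion-matrix cells for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def _ratio(numerator, denominator):
    """Safe division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(c):
    """Derive the metrics listed in the abstract (except AUC) from the counts."""
    n = c.tp + c.fp + c.tn + c.fn
    accuracy = _ratio(c.tp + c.tn, n)
    sensitivity = _ratio(c.tp, c.tp + c.fn)   # recall on the positive class
    specificity = _ratio(c.tn, c.tn + c.fp)   # recall on the negative class
    precision = _ratio(c.tp, c.tp + c.fp)
    f_measure = _ratio(2 * precision * sensitivity, precision + sensitivity)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = (_ratio(c.tp + c.fp, n) * _ratio(c.tp + c.fn, n)
                + _ratio(c.tn + c.fn, n) * _ratio(c.tn + c.fp, n))
    kappa = _ratio(accuracy - p_chance, 1 - p_chance)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
        "kappa": kappa,
    }


# Toy labels for illustration only (1 = malignant, 0 = benign); not study data.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(confusion_counts(y_true, y_pred))
```

In a 10-fold cross-validation setting, one plausible use is to compute these values on each held-out fold and average them across the ten folds.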
