

Artificial Intelligence Driven Diabetes Risk Assessment with Orange: A No-Code Machine Learning Approach

1 Atul Tiwari; 2 Rameshwar Kumar; 3 Ajay SK; 4 Asitava Deb Roy
Page: 1474-1488
Published on: December 2025

Abstract

Background: Diabetes mellitus represents a major and growing global public health challenge, with a substantial proportion of affected individuals remaining undiagnosed until complications arise. Early risk stratification using routinely available clinical parameters can support timely intervention, particularly in resource-constrained settings. Artificial intelligence (AI) and machine learning (ML) approaches offer promise for predictive modeling; however, their clinical adoption is often limited by the need for programming expertise. Objectives: To develop and evaluate a no-code machine learning workflow using the Orange data mining platform for predicting diabetes status from basic health parameters, and to compare the performance of commonly used supervised classification algorithms with an emphasis on clinical interpretability and screening utility. Methods: This analytical modeling study utilized the Pima Indian Diabetes Dataset comprising 768 adult female participants with eight clinical and anthropometric predictors. Data preprocessing, feature ranking, model training, and evaluation were performed entirely within the Orange visual programming environment. Six supervised classifiers (Logistic Regression, Naïve Bayes, Random Forest, Support Vector Machine, k-Nearest Neighbors, and Decision Tree) were trained and validated using stratified 10-fold cross-validation. Model performance was assessed using accuracy, precision, recall, F1-score, area under the receiver operating characteristic curve (AUC), and Matthews correlation coefficient (MCC). Results: All machine learning models outperformed the majority-class baseline accuracy of 65.1%. Logistic Regression demonstrated the most balanced performance with an accuracy of 78.4%, AUC of 0.831, F1-score of 0.774, and MCC of 0.508. Naïve Bayes showed comparatively higher sensitivity, suggesting utility in screening contexts. Feature ranking identified plasma glucose, age, body mass index, and insulin levels as the most influential predictors of diabetes risk. Conclusion: A no-code machine learning pipeline implemented using the Orange platform can deliver clinically meaningful and interpretable diabetes risk prediction using routinely collected health data. Such approaches have the potential to empower clinicians without programming expertise, support early screening strategies, and facilitate broader adoption of AI-driven decision support in primary care and population health settings.
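
For readers who want to see what the evaluation terms in the abstract mean computationally, the following is a minimal Python sketch, not the authors' workflow (which was built entirely in Orange without code). It illustrates three ideas named above: the majority-class baseline, stratified k-fold assignment, and the threshold-based metrics (accuracy, precision, recall, F1-score, MCC) computed from a confusion matrix. All function names are hypothetical, and the class counts of 500 non-diabetic and 268 diabetic records are an assumption chosen to be consistent with the 768 participants and the 65.1% baseline reported in the abstract.

```python
import random
from collections import Counter, defaultdict
from math import sqrt


def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def stratified_folds(labels, k=10, seed=0):
    """Assign sample indices to k folds so each fold keeps roughly
    the same class proportions as the full dataset."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def screening_metrics(tp, fp, fn, tn):
    """Positive-class metrics from the four confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # sensitivity
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Assumed class split (0 = non-diabetic, 1 = diabetic); see lead-in.
labels = [0] * 500 + [1] * 268
print(round(majority_baseline_accuracy(labels), 3))  # 0.651

folds = stratified_folds(labels, k=10)
print([len(fold) for fold in folds])

# Toy confusion matrix, purely illustrative (not the study's results).
print(screening_metrics(tp=2, fp=1, fn=1, tn=4))
```

The positive-class formulas shown here need not reproduce the figures in the abstract exactly, since a tool such as Orange may report metrics averaged over classes rather than for the diabetic class alone.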

 
