Enhancing Early Diagnosis of Heart Disease: A Comparative Study of K-NN and Naive Bayes Classifiers Using the UCI Heart Disease Dataset

Angga Aditya Permana(1*), Arsanah Arsanah(2)


(1) Department of Informatics, Universitas Multimedia Nusantara, Banten, Indonesia
(2) Department of Informatics Engineering, Universitas Muhammadiyah Tangerang, Banten, Indonesia
(*) Corresponding Author

Abstract


Heart disease remains a leading cause of mortality worldwide, which makes accurate predictive models for early detection and intervention essential. This study presents a detailed comparative analysis of the K-Nearest Neighbor (KNN) and Naive Bayes classifiers on the UCI Repository Heart Disease dataset to determine which algorithm is more effective for heart disease prediction. Our results show that the proposed KNN model outperforms Naive Bayes on every reported metric: KNN achieved an accuracy of 91.25%, compared with 88.75% for Naive Bayes. KNN also achieved higher precision (92%), recall (90%), and F1 score (91%) than Naive Bayes, which achieved a precision of 89%, a recall of 87%, and an F1 score of 88%. These findings have substantial practical implications for medical data analysis. The high accuracy and reliability of the KNN algorithm make it a valuable tool for healthcare professionals in the early diagnosis of heart disease. KNN-based predictive models can improve patient outcomes through timely and accurate detection of heart disease, which enables early intervention and reduces the risk of severe cardiac events. In addition, the user-friendly interface of the proposed system streamlines the classification process and makes it accessible for clinical use. Future research should explore additional machine learning algorithms and ensemble methods to further improve prediction accuracy. Real-time prediction systems integrated with electronic health records (EHR) could substantially improve patient monitoring and proactive healthcare management, ultimately contributing to better patient outcomes and more efficient healthcare delivery.
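The abstract reports precision, recall, and F1 for both classifiers. Because F1 is the harmonic mean of precision and recall, the reported figures can be cross-checked with a few lines of arithmetic. The sketch below is a minimal illustration, not the authors' evaluation code: it assumes the figures refer to the positive (heart disease) class, since the abstract does not say whether they are class-specific or macro-averaged (a macro-averaged F1 is generally not the harmonic mean of the averaged precision and recall). The helper names `f1_score` and `metrics_from_confusion` are hypothetical and defined here, not taken from any library.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def metrics_from_confusion(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1 for the positive (disease) class,
    computed from the four cells of a binary confusion matrix."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


# Cross-check the precision/recall pairs reported in the abstract
# (assumed to be positive-class values).
reported = {"KNN": (0.92, 0.90), "Naive Bayes": (0.89, 0.87)}
for name, (p, r) in reported.items():
    print(f"{name}: F1 = {f1_score(p, r):.4f}")
# KNN: F1 = 0.9099
# Naive Bayes: F1 = 0.8799
```

Rounded to whole percentages, these give 91% and 88%, matching the F1 scores stated above, so the reported metrics are internally consistent under the positive-class assumption.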

Keywords


Heart Disease Prediction; K-Nearest Neighbor; Naive Bayes; Machine Learning; Medical Data Analysis; Early Diagnosis; Clinical Decision-Making

DOI: https://doi.org/10.26714/jichi.v5i1.11251


____________________________________________________________________________
Journal of Intelligent Computing and Health Informatics (JICHI)
ISSN 2715-6923 (print) | 2721-9186 (online)
Organized by
Department of Informatics
Faculty of Engineering
Universitas Muhammadiyah Semarang

W : https://jurnal.unimus.ac.id/index.php/ICHI
E : jichi.informatika@unimus.ac.id, ahmadilham@unimus.ac.id
