Evaluating PCA and LDA to Improve Machine Learning Classification of Consumer Behavior in Health Informatics
(1) Department of Informatics, University of Muhammadiyah Semarang, Semarang, Indonesia
(2) Department of Informatics, University of Muhammadiyah Semarang, Semarang, Indonesia
(3) Department of Data Analytics, The Ohio State University, Columbus, United States
(*) Corresponding Author
Abstract
Behavioral data derived from consumer activity offer significant potential in health informatics, particularly for developing predictive models related to lifestyle, adherence, and patient engagement. This study evaluates the effect of two widely used dimensionality reduction techniques, principal component analysis (PCA) and linear discriminant analysis (LDA), on the performance of five supervised machine learning classifiers: logistic regression (LR), support vector machine (SVM), naive Bayes (NB), decision tree (DT), and random forest (RF). The experimental dataset, although sourced from a commercial context, contains demographic and economic attributes commonly found in health-related behavioral data, such as age and income. Results indicate that LDA significantly improves classification performance across all models, with RF achieving the highest scores when trained on LDA-transformed features: accuracy = 0.91, precision = 0.88, recall = 0.85, F1-score = 0.86, and AUC = 0.95. SVM also performed competitively under the same configuration (AUC = 0.94). PCA, by contrast, provided moderate gains but captured less class-discriminative information than LDA. These findings demonstrate that integrating LDA with robust classifiers such as RF and SVM enhances both predictive accuracy and model interpretability, offering practical benefits for behavior-informed health decision support systems. The study highlights the relevance of supervised feature transformation in optimizing data pipelines for personalized healthcare applications.
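The comparison described in the abstract can be illustrated with a minimal sketch, assuming scikit-learn and a synthetic binary dataset from `make_classification` in place of the study's consumer data. The component counts, hyperparameters, train/test split, and the `build_model` helper are illustrative assumptions and are not taken from the paper; with two classes, LDA can produce at most one discriminant component.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the behavioral dataset (assumption, not the study's data).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Feature transformations under comparison; "none" is an untransformed baseline.
reducers = {
    "none": None,
    "pca": PCA(n_components=3),
    # Binary target: LDA yields at most n_classes - 1 = 1 component.
    "lda": LinearDiscriminantAnalysis(n_components=1),
}

# The five classifier families named in the abstract, with assumed settings.
classifiers = {
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(probability=True, random_state=0),
    "NB": GaussianNB(),
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
}


def build_model(reducer, classifier):
    """Hypothetical helper: scale, optionally reduce, then classify."""
    steps = [StandardScaler()]
    if reducer is not None:
        steps.append(clone(reducer))
    steps.append(clone(classifier))
    return make_pipeline(*steps)


results = {}
for reducer_name, reducer in reducers.items():
    for clf_name, clf in classifiers.items():
        model = build_model(reducer, clf)
        # Fitting inside the pipeline keeps the reducer from seeing test data.
        model.fit(X_train, y_train)
        scores = model.predict_proba(X_test)[:, 1]
        results[(reducer_name, clf_name)] = {
            "accuracy": accuracy_score(y_test, model.predict(X_test)),
            "auc": roc_auc_score(y_test, scores),
        }

for (reducer_name, clf_name), metrics in sorted(results.items()):
    print(f"{reducer_name:>4} + {clf_name:<3} "
          f"acc={metrics['accuracy']:.2f} auc={metrics['auc']:.2f}")
```

Wrapping the scaler, reducer, and classifier in one pipeline means the supervised LDA projection is learned from the training split only, which avoids leaking test labels into the transformed features. Scores on synthetic data will not match the figures reported in the abstract.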
DOI: https://doi.org/10.26714/jichi.v6i1.16176
____________________________________________________________________________
Journal of Intelligent Computing and Health Informatics (JICHI)
ISSN 2715-6923 (print) | 2721-9186 (online)
Organized by
Department of Informatics
Faculty of Engineering
Universitas Muhammadiyah Semarang
W : https://jurnal.unimus.ac.id/index.php/ICHI
E : jichi.informatika@unimus.ac.id, ahmadilham@unimus.ac.id
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.