Research Article | Peer-Reviewed

Leveraging Ensemble Models for Optimizing Predictive Accuracy of Low Birthweight Risk in Kenya

Received: 14 September 2025     Accepted: 24 September 2025     Published: 27 October 2025
Abstract

Low birth weight (LBW) is a prevalent public health challenge in low- and middle-income countries, including Kenya, where approximately 11.5% of newborns are affected. LBW is linked to higher infant mortality, infections, and long-term developmental problems. Machine learning (ML), particularly ensemble learning, has shown potential for improving LBW risk prediction, but its application in resource-limited settings such as Kenya remains underexplored. Prior research has largely focused on developed countries, with limited adoption in sub-Saharan Africa, a gap this study aims to address. This research develops and evaluates ensemble machine learning models to predict LBW risk using nationally representative data from the 2022 Kenya Demographic and Health Survey. The study combines traditional clinical indicators with computational methods, employing base classifiers such as Support Vector Machines and Logistic Regression alongside ensemble methods including Random Forest, Gradient Boosting, and Extreme Gradient Boosting. Meta-ensemble approaches such as bagging, voting, and stacking were also assessed. Data preprocessing included treatment of missing values, encoding of categorical variables, and correction of class imbalance with the Synthetic Minority Over-sampling Technique (SMOTE). Models were trained and validated using stratified cross-validation and independent testing. Evaluation metrics included ROC AUC, accuracy, F1-score, Matthews Correlation Coefficient, and Brier score, covering both discrimination and calibration. Results indicate that Random Forest outperformed the other models, achieving a ROC AUC of 0.957 and a PR AUC of 0.971 with excellent calibration (Brier score 0.089), which indicates strong predictive capability for LBW risk in the Kenyan context.
The important predictors identified were gestational age, maternal height and weight, antenatal care utilization, and socioeconomic factors, consistent with known biological and contextual determinants. Ethical considerations regarding patient privacy, algorithmic fairness, and transparency were incorporated to promote responsible AI use in healthcare. The findings demonstrate that tailored ensemble learning models provide robust, interpretable, and practical tools for LBW prediction in low-resource settings. This work fills a critical research gap by applying advanced ML methods to Kenyan maternal-child health data, with the potential to enhance clinical decision-making and improve maternal and neonatal outcomes. The study underscores the importance of contextualized AI solutions and ethical governance for sustainable healthcare innovation.
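
As an illustration of the discrimination and calibration metrics named in the abstract, the following is a minimal pure-Python sketch of the Brier score, the Matthews Correlation Coefficient, and ROC AUC. The toy labels and probabilities are hypothetical and are not taken from the survey data or from the study's models, and the function names are chosen here for illustration only; the study's own implementation is not shown in this page.

```python
import math


def brier_score(y_true, y_prob):
    """Mean squared gap between predicted probability and outcome (0 is perfect)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def matthews_corrcoef(y_true, y_pred):
    """Correlation between predicted and true binary labels, in [-1, 1]."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0.0 when a row or column of the confusion matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true, y_prob):
    """Chance that a random positive case is scored above a random negative one.

    Ties count as half a win. Assumes both classes are present.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = low birth weight, 0 = normal birth weight.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.7, 0.2]
# Assumed decision threshold of 0.5 for the label-based metric (MCC).
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]

print("Brier score:", round(brier_score(y_true, y_prob), 3))
print("MCC:", round(matthews_corrcoef(y_true, y_pred), 3))
print("ROC AUC:", round(roc_auc(y_true, y_prob), 3))
```

ROC AUC and the Brier score are computed from predicted probabilities, whereas the Matthews Correlation Coefficient depends on the chosen classification threshold, which is one reason a study of this kind reports several metrics together.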

Published in American Journal of Artificial Intelligence (Volume 9, Issue 2)
DOI 10.11648/j.ajai.20250902.22
Page(s) 198-209
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2025. Published by Science Publishing Group

Keywords

Low Birth Weight, Ensemble Learning, Machine Learning, Predictive Modelling, Kenya

Cite This Article
  • APA Style

    Opiyo, V., Anyika, E. (2025). Leveraging Ensemble Models for Optimizing Predictive Accuracy of Low Birthweight Risk in Kenya. American Journal of Artificial Intelligence, 9(2), 198-209. https://doi.org/10.11648/j.ajai.20250902.22

    ACS Style

    Opiyo, V.; Anyika, E. Leveraging Ensemble Models for Optimizing Predictive Accuracy of Low Birthweight Risk in Kenya. Am. J. Artif. Intell. 2025, 9(2), 198-209. doi: 10.11648/j.ajai.20250902.22

    AMA Style

    Opiyo V, Anyika E. Leveraging Ensemble Models for Optimizing Predictive Accuracy of Low Birthweight Risk in Kenya. Am J Artif Intell. 2025;9(2):198-209. doi: 10.11648/j.ajai.20250902.22

  • @article{10.11648/j.ajai.20250902.22,
      author = {Victor Opiyo and Emma Anyika},
      title = {Leveraging Ensemble Models for Optimizing Predictive Accuracy of Low Birthweight Risk in Kenya},
      journal = {American Journal of Artificial Intelligence},
      volume = {9},
      number = {2},
      pages = {198-209},
      doi = {10.11648/j.ajai.20250902.22},
      url = {https://doi.org/10.11648/j.ajai.20250902.22},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajai.20250902.22},
     year = {2025}
    }
    

  • TY  - JOUR
    T1  - Leveraging Ensemble Models for Optimizing Predictive Accuracy of Low Birthweight Risk in Kenya
    
    AU  - Victor Opiyo
    AU  - Emma Anyika
    Y1  - 2025/10/27
    PY  - 2025
    N1  - https://doi.org/10.11648/j.ajai.20250902.22
    DO  - 10.11648/j.ajai.20250902.22
    T2  - American Journal of Artificial Intelligence
    JF  - American Journal of Artificial Intelligence
    JO  - American Journal of Artificial Intelligence
    SP  - 198
    EP  - 209
    PB  - Science Publishing Group
    SN  - 2639-9733
    UR  - https://doi.org/10.11648/j.ajai.20250902.22
    VL  - 9
    IS  - 2
    ER  - 
