Data-driven Glucose Prediction for Type 1 Diabetes: Modeling, Adaptation and Engineering

Cui, Ran


Date

2025


Abstract

Type 1 Diabetes Mellitus (T1DM) is a chronic disease affecting over 9 million people worldwide, and it requires continuous external insulin delivery to control blood glucose levels. Continuous Glucose Monitoring (CGM) systems enable individuals to track glucose trends, and the resulting availability of CGM data has made data-driven glucose prediction using Machine Learning (ML) a central focus of biomedical research. This PhD thesis aims to advance data-driven glucose prediction through three objectives: modeling, adaptation, and engineering. The modeling objective seeks to improve on existing techniques such as Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs). The adaptation objective focuses on tailoring generic glucose prediction models to individual patients, particularly when personalized data is limited. The engineering objective addresses practical implementation by optimizing computational efficiency and improving data processing.

Five studies underpin this research. The first study examines the correlation between diabetes and various blood biomarkers using ML classification. In an analysis of 9,549 individuals from the China Health and Nutrition Survey dataset, an XGBoost classifier achieved an F1 score above 86%, and blood glucose emerged as the most significant factor in determining diabetes. The second study introduces the Recurrent Self-Attention Network (RSAN) for glucose prediction, which uses a self-attention mechanism to outperform traditional RNN and Temporal Convolutional Network (TCN) models. RSAN achieved state-of-the-art results on the OhioT1DM dataset, with transfer learning further reducing prediction error by approximately 1 mg/dL. The third study explores a novel time-index modeling approach and contrasts it with historical-value models. The Meta-Optimised Time-Index Model (MOTIM) reduces the prediction problem to a point-to-point mapping, achieving predictive accuracy comparable to historical-value models at a much lower computational cost. The fourth study redefines postprandial glucose prediction as the joint prediction of hyperglycemia and hypoglycemia. A unified Long Short-Term Memory (LSTM) model with two linear heads achieved superior performance on the OhioT1DM dataset, with Matthews correlation coefficients (MCC) of 0.61 for hyperglycemia and 0.48 for hypoglycemia. The final study compares rolling and direct prediction schemes, as well as single-point and excursion prediction targets. Experiments on the OhioT1DM dataset showed that the direct scheme and the excursion target offer better accuracy and efficiency.

This thesis contributes to the modeling, adaptation, and engineering of data-driven glucose prediction, advancing the potential for precise and reliable glucose prediction technology for T1DM. The open-source code developed during this research is available to support further study.
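The abstract states only that RSAN uses a self-attention mechanism; its exact architecture is not described here. As a generic illustration of that mechanism, the following sketch computes scaled dot-product self-attention over a short window of feature vectors, for example one vector per CGM time step. It is a deliberate simplification and not RSAN itself: queries, keys and values are the inputs themselves, with no learned projection matrices, no multiple heads and no recurrence. The helper names `softmax` and `self_attention` and the example features are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]

def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x: Matrix) -> Matrix:
    """Scaled dot-product self-attention with Q = K = V = x (no learned projections)."""
    d = len(x[0])  # feature dimension per time step
    out: Matrix = []
    for q in x:  # one query per time step
        # similarity of this time step to every step in the window, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        # output = attention-weighted average of the value vectors (here, the inputs)
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out

if __name__ == "__main__":
    # hypothetical window: [scaled glucose, scaled carbohydrate intake] at three time steps
    window = [[1.0, 0.0], [0.9, 0.2], [0.8, 0.0]]
    for row in self_attention(window):
        print([round(v, 3) for v in row])
```

Each output row is a softmax-weighted average of all input rows, so every time step can draw directly on the whole window rather than only on a preceding hidden state, as in a plain RNN.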
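The final study contrasts single-point and excursion prediction targets, and the fourth study frames postprandial prediction as jointly predicting hyperglycemia and hypoglycemia. The abstract does not define these targets precisely, so the following sketch assumes that a single-point target is the glucose value at one future time step, while excursion targets flag whether glucose leaves a range at any point in the prediction window. The thresholds (above 180 mg/dL for hyperglycemia, below 70 mg/dL for hypoglycemia) are the commonly used clinical target-range bounds and are assumptions here, as are the helper names `point_target` and `excursion_targets`.

```python
from typing import Sequence, Tuple

HYPO_MGDL = 70.0    # assumed: readings below 70 mg/dL count as hypoglycemia
HYPER_MGDL = 180.0  # assumed: readings above 180 mg/dL count as hyperglycemia

def point_target(future: Sequence[float], horizon: int) -> float:
    """Single-point target: the reading exactly `horizon` samples ahead (1-indexed)."""
    if not 1 <= horizon <= len(future):
        raise ValueError("horizon must lie within the future window")
    return future[horizon - 1]

def excursion_targets(future: Sequence[float]) -> Tuple[int, int]:
    """Excursion targets: (hyper, hypo) flags, 1 if any reading in the window
    crosses the corresponding threshold, else 0."""
    hyper = int(any(g > HYPER_MGDL for g in future))
    hypo = int(any(g < HYPO_MGDL for g in future))
    return hyper, hypo

if __name__ == "__main__":
    # hypothetical post-meal CGM readings (mg/dL) at 5-minute intervals over 30 minutes
    future = [150.0, 175.0, 190.0, 185.0, 160.0, 140.0]
    print(point_target(future, 6))    # 140.0
    print(excursion_targets(future))  # (1, 0)
```

In this example the 30-minute point target (140 mg/dL) lies within range, whereas the excursion flags capture the hyperglycemic peak of 190 mg/dL that the point target misses. In a two-head model, each flag could plausibly supervise one head, although the abstract does not specify the training labels.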
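The fourth study reports Matthews correlation coefficients (MCC) of 0.61 for hyperglycemia and 0.48 for hypoglycemia. MCC summarizes all four cells of a binary confusion matrix in a single value between -1 and 1. This makes it informative for rare events such as hypoglycemia, where accuracy alone can be misleading. The standard formula is MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The following minimal sketch implements it for 0/1 labels. Returning 0 when the denominator is zero is a common convention, not something the abstract specifies, and the function name and example labels are illustrative.

```python
import math
from typing import Sequence

def matthews_corrcoef(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """MCC for binary labels, where 1 = event (e.g. hypoglycemia) and 0 = no event."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: MCC is taken as 0 when any row or column of the confusion matrix is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

if __name__ == "__main__":
    # hypothetical event labels for six prediction windows
    observed = [1, 1, 0, 0, 1, 0]
    predicted = [1, 0, 0, 0, 1, 1]
    print(matthews_corrcoef(observed, predicted))  # 1/3 for these labels
```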
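The final study compares rolling and direct prediction schemes. In a rolling (recursive) scheme, a one-step-ahead model is applied repeatedly, and each prediction is fed back into its input window. Reaching a 30-minute horizon with 5-minute CGM samples therefore takes six model calls, and early errors can compound. In a direct scheme, a model maps the observed history straight to the target horizon in a single call. The following sketch shows the two mechanics, using a hypothetical linear-trend extrapolator in place of a trained model. The function names and the toy model are illustrative assumptions, not the thesis implementation.

```python
from typing import Callable, List, Sequence

def trend_one_step(window: Sequence[float]) -> float:
    """Hypothetical one-step model: extrapolate the last 5-minute change."""
    return window[-1] + (window[-1] - window[-2])

def trend_direct(history: Sequence[float], horizon: int) -> float:
    """Hypothetical direct model: extrapolate the last change `horizon` steps ahead."""
    return history[-1] + horizon * (history[-1] - history[-2])

def rolling_forecast(history: Sequence[float],
                     one_step: Callable[[Sequence[float]], float],
                     horizon: int) -> List[float]:
    """Rolling scheme: predict one step, feed it back, repeat `horizon` times."""
    window = list(history)
    preds: List[float] = []
    for _ in range(horizon):
        nxt = one_step(window)       # one model call per step
        preds.append(nxt)
        window = window[1:] + [nxt]  # slide the window, keeping its length fixed
    return preds

def direct_forecast(history: Sequence[float],
                    direct_model: Callable[[Sequence[float], int], float],
                    horizon: int) -> float:
    """Direct scheme: a single call maps the observed history to the target horizon."""
    return direct_model(history, horizon)

if __name__ == "__main__":
    cgm = [100.0, 105.0, 110.0]  # hypothetical CGM readings (mg/dL), 5-minute spacing
    print(rolling_forecast(cgm, trend_one_step, 6)[-1])  # 30 minutes ahead, 6 calls
    print(direct_forecast(cgm, trend_direct, 6))         # 30 minutes ahead, 1 call
```

With this linear toy model, both schemes agree exactly. With a learned model, they generally differ, because the rolling scheme conditions later steps on its own earlier predictions rather than on observed data.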