Recent Developments in Applying Quantile Regression for Missing Data in Longitudinal Studies
Abstract
In this thesis, we propose new approaches based on quantile regression for handling missing data in longitudinal studies. In Chapters 2 and 3, we develop univariate quantile regression imputation models for longitudinal data that may exhibit population heterogeneity. In Chapter 4, we propose a multivariate quantile regression imputation model that handles missing values across multiple responses and also imputes censored observations. The proposed models are evaluated using three longitudinal HIV datasets from the Multicenter AIDS Cohort Study (MACS) and the Women's Interagency HIV Study (WIHS), together known as the MACS/WIHS Combined Cohort Study. We believe these models can also be applied to other longitudinal datasets. All models are built within the Bayesian framework, with estimation and inference carried out by Markov chain Monte Carlo (MCMC).
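As a brief illustration of the quantile regression idea underlying these models, the sketch below shows the check loss and the asymmetric Laplace log-density that is commonly used as a working likelihood in Bayesian quantile regression. The abstract does not state which likelihood the thesis uses, so the asymmetric Laplace choice and all function names here are assumptions made for illustration only.

```python
import math

def check_loss(u, tau):
    """Quantile check loss: rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def ald_logpdf(y, mu, sigma, tau):
    """Log-density of an asymmetric Laplace distribution (assumed working likelihood).

    f(y) = tau * (1 - tau) / sigma * exp(-rho_tau((y - mu) / sigma))
    Maximising this in mu is equivalent to minimising the check loss.
    """
    return math.log(tau * (1.0 - tau) / sigma) - check_loss((y - mu) / sigma, tau)

def quantile_by_check_loss(ys, tau):
    """Return the sample value that minimises the total check loss."""
    return min(ys, key=lambda q: sum(check_loss(y - q, tau) for y in ys))

# Toy data (hypothetical): the tau = 0.5 minimiser is the sample median.
median = quantile_by_check_loss([1.0, 2.0, 3.0, 4.0, 5.0], 0.5)
```

In a regression setting, `mu` would be replaced by a linear predictor, so that the fitted line targets the chosen quantile `tau` rather than the mean.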
In Chapter 2, to account for population heterogeneity in the longitudinal data, we construct a latent class mixture quantile regression model for imputation. To address missing values under the Missing Not at Random (MNAR) assumption, we propose an additional model for the missingness indicator, which facilitates the imputation of missing values. We test the proposed approach under various simulation settings and find that it consistently yields accurate parameter estimates across the scenarios. The model is also evaluated using real data from MACS. Based on the real-data analysis, we conclude that adding a latent class structure improves model fit when a clustering effect exists in the data.
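To make the MNAR idea concrete, the sketch below simulates a missingness indicator whose probability depends on the possibly unobserved response itself. The logistic form, the coefficients `a0` and `a1`, and the function names are assumptions chosen for illustration; the abstract does not specify the form of the missingness model.

```python
import math
import random

def missing_prob(y, a0, a1):
    """P(R = 1 | y): probability that y is missing, logistic in y itself (assumed form)."""
    return 1.0 / (1.0 + math.exp(-(a0 + a1 * y)))

def simulate_mnar(n, a0, a1, seed=0):
    """Draw y ~ N(0, 1), then delete each value with probability missing_prob(y)."""
    rng = random.Random(seed)
    full = [rng.gauss(0.0, 1.0) for _ in range(n)]
    observed = [y for y in full if rng.random() >= missing_prob(y, a0, a1)]
    return full, observed

def mean(xs):
    return sum(xs) / len(xs)

# With a1 > 0, larger responses are more likely to be missing,
# so the mean of the observed values is biased downward.
full, observed = simulate_mnar(n=5000, a0=0.0, a1=2.0)
bias = mean(observed) - mean(full)
```

Because missingness depends on the unobserved value, ignoring the mechanism biases complete-case summaries, which is why the missingness indicator is modelled jointly with the response.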
In Chapter 3, we extend the imputation model introduced in Chapter 2 by incorporating semiparametric random effects with a Dirichlet process prior. The resulting imputation model can be viewed as a semiparametric latent class mixture quantile regression model. We also include the same random effects in the model for the missingness indicator, which gives a shared-parameter model structure for MNAR. Simulation results demonstrate that the proposed model not only provides consistent and accurate estimates of the parameters of interest, but also recovers the shape of the random-effects distribution when the response variable contains missing observations. We also test the proposed model using longitudinal data from MACS. Compared to the model presented in Chapter 2, the Chapter 3 model returns a better fit.
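A Dirichlet process prior is often represented in practice through a truncated stick-breaking construction. The sketch below shows only that generic construction; the truncation level `k`, the concentration parameter `alpha`, and the function names are illustrative assumptions, and the abstract does not say which representation the thesis uses.

```python
import random

def stick_breaking_weights(fractions):
    """Turn break fractions v_1..v_K into weights w_k = v_k * prod_{j<k} (1 - v_j)."""
    weights, remaining = [], 1.0
    for v in fractions:
        weights.append(v * remaining)   # take a fraction v of the remaining stick
        remaining *= 1.0 - v            # what is left for later components
    return weights

def sample_truncated_dp_weights(alpha, k, rng):
    """Sample mixture weights from a DP truncated at k components.

    v_j ~ Beta(1, alpha) for j < k; the last fraction is set to 1 so weights sum to 1.
    """
    fractions = [rng.betavariate(1.0, alpha) for _ in range(k - 1)] + [1.0]
    return stick_breaking_weights(fractions)

weights = sample_truncated_dp_weights(alpha=1.0, k=10, rng=random.Random(1))
```

Each weight would be attached to an atom drawn from a base distribution, which gives a flexible discrete mixture for the random effects instead of a single parametric form.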
In Chapter 4, we propose a multivariate imputation approach within the quantile regression framework to address missing values in multiple responses in longitudinal studies. This model is motivated by real data from WIHS. The response variable, HIV viral load, contains missing values and is also subject to lower detection limits, so values below the limit can be treated as left-censored observations. Other variables of interest, such as CD4 and CD8 cell counts, also contain missing observations. Additionally, we incorporate an autocorrelated error structure in the imputation model, as current values can be influenced by previous observations. We test the model under various simulation settings and show that it provides accurate and consistent parameter estimates by imputing the missing observations and recovering the censored observations. In the real-data setting, we compare our model with models without an autocorrelated error structure and with multivariate linear mixed effects imputation models, and we find that our model yields a better fit.
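Within an MCMC sampler, a left-censored observation (one known only to lie below an assay detection limit) is typically imputed by drawing from the conditional distribution truncated at that limit. The sketch below uses a normal distribution purely as a stand-in to keep the example short; the thesis works with quantile regression models, and the distribution, parameter values, and function name here are assumptions.

```python
import random
from statistics import NormalDist

def draw_below_limit(mu, sigma, limit, rng):
    """Draw from N(mu, sigma^2) truncated to (-inf, limit] by inverse-CDF sampling."""
    dist = NormalDist(mu, sigma)
    p_max = dist.cdf(limit)                 # probability mass below the limit
    u = rng.uniform(0.0, p_max)             # uniform on the allowed CDF range
    u = min(max(u, 1e-12), 1.0 - 1e-12)     # inv_cdf requires 0 < u < 1
    return dist.inv_cdf(u)

# Hypothetical values on a log10 scale: conditional mean 3.0, limit 1.7.
rng = random.Random(42)
imputed = [draw_below_limit(mu=3.0, sigma=1.0, limit=1.7, rng=rng) for _ in range(1000)]
```

Repeating such draws at every MCMC iteration propagates the uncertainty about the censored values into the posterior for the regression parameters.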
In Chapter 5, we summarize the thesis and outline directions for future research.