A Missing Data Method Based on a Product Partition Model of the Missing Patterns

Song Zhang, UTSW Medical Center

3:00 pm Seminar on February 29, 2008



Abstract

Missing data is a problem encountered by statisticians in many scientific fields. Discarding all the subjects or covariates with missing values leads to loss of information. More sophisticated methods assume a joint model of the observed and missing data, and account for the uncertainty induced by missingness by integrating over the missing values. Unfortunately, such methods require correct specification of the joint model, which is difficult in practice. We construct a prior probability model that describes prior uncertainty about which missing patterns are most likely to share similar values of the regression coefficients. Two patterns that differ by one covariate x_i share the same regression coefficients for the remaining covariates if x_i has a vanishing or small regression coefficient, or if the corresponding column of the design matrix is (close to) orthogonal to the other columns. The analogous condition becomes increasingly restrictive for patterns that differ by two or more covariates. This leads us to specify a prior probability model on the partitions of missing patterns such that patterns that differ in fewer covariates are more likely to co-cluster. The proposed method extends the product partition model by introducing an additional factor into the cohesion function that measures the similarity between missing patterns. A simulation study and a real data analysis are presented.
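
As a rough illustration of the idea that patterns differing in fewer covariates should be more likely to co-cluster, the sketch below encodes each missing pattern as a 0/1 vector (1 = covariate observed) and computes a similarity factor for a candidate cluster. The abstract does not give the form of the factor, so the exponential decay in the total pairwise Hamming distance, the names hamming and pattern_similarity, and the decay parameter are all assumptions made for illustration, not the speaker's cohesion function.

```python
from itertools import combinations
import math


def hamming(p, q):
    """Count the covariates whose observed/missing status differs."""
    return sum(a != b for a, b in zip(p, q))


def pattern_similarity(cluster, decay=1.0):
    """Hypothetical similarity factor for a cluster of missing patterns.

    Assumed form: exp(-decay * sum of pairwise Hamming distances).
    A singleton cluster has no pairs, so its factor is 1.
    """
    total = sum(hamming(p, q) for p, q in combinations(cluster, 2))
    return math.exp(-decay * total)


# Three illustrative patterns over three covariates (1 = observed).
patterns = {
    "A": (1, 1, 1),
    "B": (1, 1, 0),  # differs from A in one covariate
    "C": (0, 0, 1),  # differs from A in two covariates
}

close = pattern_similarity([patterns["A"], patterns["B"]])
far = pattern_similarity([patterns["A"], patterns["C"]])
```

Under this assumed form, a cluster containing A and B receives a larger factor than one containing A and C, so multiplying it into a product partition cohesion would favor grouping patterns that differ in fewer covariates.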