Comparison of Tests and Confidence Intervals for Univariate Normal Mean Based on Multiply Imputed Synthetic Data Obtained by Posterior Predictive Sampling

September 05, 2024

Written by:

Biswajit Basak and Bimal Sinha

RRS2024-06

Abstract

There is a large literature on data analysis under privacy or confidentiality protection. Among the many inferential methods based on parametric models, those that perturb the original sensitive data by plug-in sampling or posterior predictive sampling are quite common. In this paper we consider a very basic inferential problem: tests and confidence intervals for a normal mean with unknown variance, based on synthetic data obtained from multiple imputations under the posterior predictive sampling method. Several methods are suggested and compared. A general expression for the local power of a class of tests is also derived, which can be used in a design context to determine a combination of sample size and number of imputations that guarantees a desired level of local power. A measure of privacy protection is derived to demonstrate that privacy would be compromised if too many imputations are released. An application to inference about household earnings, using US Census Bureau data, is illustrated.
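
For readers unfamiliar with posterior predictive sampling, the following is a minimal sketch of how multiply imputed synthetic data might be generated for a univariate normal sample. It is not taken from the paper. It assumes the standard noninformative prior p(mu, sigma^2) proportional to 1/sigma^2, under which sigma^2 given the data is distributed as (n-1)s^2 divided by a chi-square variate with n-1 degrees of freedom, and mu given sigma^2 is normal with mean equal to the sample mean and variance sigma^2/n. The function name `posterior_predictive_synthetic` and its parameters are hypothetical, and the paper's exact prior and notation may differ.

```python
import math
import random


def posterior_predictive_synthetic(x, m, rng):
    """Return m synthetic datasets, each of the same size as x.

    Assumption (not from the paper): prior p(mu, sigma^2) ∝ 1/sigma^2.
    """
    n = len(x)
    xbar = sum(x) / n
    ss = sum((xi - xbar) ** 2 for xi in x)  # equals (n - 1) * s^2

    datasets = []
    for _ in range(m):
        # chi-square with n-1 degrees of freedom is Gamma(shape=(n-1)/2, scale=2)
        chi2 = rng.gammavariate((n - 1) / 2.0, 2.0)
        sigma2_star = ss / chi2  # draw of sigma^2 from its posterior
        # draw of mu from its conditional posterior given sigma2_star
        mu_star = rng.gauss(xbar, math.sqrt(sigma2_star / n))
        # synthetic sample drawn from the model at the posterior draws
        sd_star = math.sqrt(sigma2_star)
        datasets.append([rng.gauss(mu_star, sd_star) for _ in range(n)])
    return datasets


rng = random.Random(0)
# Simulated stand-in for the sensitive original data (illustrative values only)
original = [rng.gauss(50.0, 10.0) for _ in range(30)]
synthetic = posterior_predictive_synthetic(original, m=5, rng=rng)
```

Only the `m` synthetic datasets would be released in place of `original`. Each imputation uses a fresh posterior draw of the mean and variance, which is why releasing more of them reveals progressively more about the original sample.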