Comparison of Tests and Confidence Intervals for Univariate Normal Mean Based on Multiply Imputed Synthetic Data Obtained by Posterior Predictive Sampling

Written by:
RRS2024-06

Abstract

There is a large literature on data analysis under privacy or confidentiality protection. Among the many inferential methods based on parametric models, those that perturb the original sensitive data through plug-in sampling or posterior predictive sampling are quite common. In this paper we consider the basic inferential problem of tests and confidence intervals for a normal mean with unknown variance, based on synthetic data obtained from multiple imputations under the posterior predictive sampling method. Several methods are suggested and compared. A general expression for the local power of a class of tests is also derived, which can be used in a design context to determine a combination of sample size and number of imputations that guarantees a desired level of local power. A measure of privacy protection is derived to demonstrate that privacy would be compromised if too many imputations are released. An application to inference about household earnings, based on US Census Bureau data, is illustrated.
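
To make the data-generating step concrete, the following is a minimal sketch of how multiply imputed synthetic data might be produced by posterior predictive sampling for a univariate normal model. It is not taken from the paper. It assumes the standard noninformative prior p(mu, sigma^2) proportional to 1/sigma^2 and synthetic datasets of the same size as the original sample. The function name `synthesize` and its parameters are hypothetical. The tests and confidence intervals compared in the paper are not reproduced here.

```python
import random
import statistics


def synthesize(original, m, seed=None):
    """Return m synthetic datasets drawn by posterior predictive sampling.

    Assumption (not from the source): prior p(mu, sigma^2) ∝ 1/sigma^2, so that
      sigma^2 | data      ~ (n - 1) * s^2 / chi-square(n - 1)
      mu | sigma^2, data  ~ Normal(xbar, sigma^2 / n)
    """
    rng = random.Random(seed)
    n = len(original)
    xbar = statistics.fmean(original)
    s2 = statistics.variance(original)  # sample variance with divisor n - 1

    datasets = []
    for _ in range(m):
        # A chi-square(k) variate is Gamma(shape=k/2, scale=2).
        chi2 = rng.gammavariate((n - 1) / 2, 2.0)
        sigma2 = (n - 1) * s2 / chi2
        mu = rng.gauss(xbar, (sigma2 / n) ** 0.5)
        # Draw one synthetic dataset from the sampled (mu, sigma^2).
        datasets.append([rng.gauss(mu, sigma2 ** 0.5) for _ in range(n)])
    return datasets


if __name__ == "__main__":
    # Illustrative values only; these are not data from the paper.
    sample = [52.1, 48.3, 50.7, 49.9, 51.4, 47.8]
    for i, d in enumerate(synthesize(sample, m=3, seed=0), start=1):
        print(i, round(statistics.fmean(d), 2))
```

Each synthetic dataset uses a fresh draw of the parameters from their posterior, and only the synthetic datasets would be released. Releasing more of them conveys more information about the original sample, which is the trade-off that the privacy measure in the abstract addresses.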

Page Last Revised - September 5, 2024