When survey organizations and statistical agencies such as the U.S. Census Bureau release microdata to the public, a major concern is controlling disclosure risk while preserving the quality and utility of the released data. Popular statistical disclosure control methods such as data swapping, multiple imputation (MI), top coding/bottom coding (especially for income data), and multiplication by random noise are often applied before the data are released. Multiple imputation has long been a viable methodology for handling missing data (see Rubin, 1987). Following the initial proposal by Rubin (1993), Reiter and his colleagues expanded its scope in a series of papers (e.g., Drechsler and Reiter, 2010; Raghunathan, Reiter, and Rubin, 2003; Reiter, 2003, 2004, 2005a, 2005b) and provided a rigorous foundation for its use, so that statistical agencies can now employ the method to protect sensitive data while data users can still carry out valid inference. When MI is applied for statistical disclosure control, the multiply imputed data that are ultimately released are usually referred to as synthetic data. More recently, An and Little (2007) used multiple imputation as an alternative to top coding. Recall that top coding consists of censoring the part of the data above a specified threshold; it is commonly used with income data to protect the identity of those in the top income bracket. We refer to the monograph by Drechsler (2011) for a detailed discussion of multiple imputation as a tool for disclosure control. Noise perturbation by addition or multiplication has also been advocated by some statisticians as a possible data confidentiality protection mechanism (Hwang, 1986; Little, 1993; Kim and Winkler, 2003), and there has recently been renewed interest in this topic (Nayak, Sinha and Zayatz, 2011; Sinha, Nayak and Zayatz, 2011).
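As a rough illustration of two of the masking methods mentioned above, the following Python sketch applies top coding and multiplicative noise to a small list of incomes. It is only a minimal sketch under stated assumptions: the function names `top_code` and `multiply_noise`, the threshold of 150,000, the uniform noise distribution on [0.9, 1.1], and the income values are all hypothetical choices made for illustration, not taken from the papers cited.

```python
import random


def top_code(values, threshold):
    """Top coding: replace every value above `threshold` with the threshold."""
    return [min(v, threshold) for v in values]


def multiply_noise(values, low=0.9, high=1.1, seed=None):
    """Multiplicative noise: scale each value by an independent random factor.

    The uniform(low, high) noise distribution is an assumption made for this
    sketch; other noise distributions could be used instead.
    """
    rng = random.Random(seed)
    return [v * rng.uniform(low, high) for v in values]


# Hypothetical income records, invented for illustration only.
incomes = [32_000, 58_500, 74_000, 410_000, 1_250_000]

top_coded = top_code(incomes, threshold=150_000)
noisy = multiply_noise(incomes, seed=0)

print(top_coded)
print([round(x) for x in noisy])
```

In this sketch, top coding leaves the lower incomes untouched and collapses the two largest onto the threshold, whereas multiplicative noise perturbs every record while keeping each within 10% of its original value.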
WORKING PAPER
Statistical Research Reports and Studies