Measuring Identification Risk in Microdata Release and Its Control by Post-randomization

May 02, 2016

Written by:

Tapan K. Nayak, Cheng Zhang, and Jiashen You

CDAR2016-02

Abstract

Statistical agencies often release a masked or perturbed version of survey data to protect respondents' confidentiality. Ideally, a perturbation procedure should protect confidentiality without much loss of data quality, so that the released data may, for practical purposes, be treated as the original data when making inferences. One major objective is to control the risk that a respondent's record in the released data is correctly identified by matching the values of some identifying, or key, variables. For categorical key variables, we propose a novel approach to measuring identification risk and setting strict disclosure control goals. The general idea is to ensure that the probability of correctly identifying any respondent or surveyed unit is at most ξ, which is pre-specified. Then, we develop an unbiased post-randomization procedure that achieves this goal for ξ > 1/3. The procedure allows substantial control over possible changes to the original data, and the variance it induces is of a lower order of magnitude than the sampling variance. We apply the procedure to a real data set, where it performs consistently with the theoretical results and, quite importantly, shows very little data quality loss.
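
To make the idea of an unbiased post-randomization concrete, the following is a minimal Python sketch of a generic scheme for one categorical key variable. It is not the procedure developed in the paper and it does not enforce the bound ξ on identification probability. The function names `post_randomize` and `expected_counts` and the parameter `theta` are hypothetical, introduced only for illustration. Each value is kept with probability 1 - theta and is otherwise replaced by a draw from the empirical distribution of the observed values, which leaves the expected count of every category equal to its original count.

```python
import random
from collections import Counter


def post_randomize(values, theta, rng):
    """Generic illustration only (not the paper's procedure).

    Keep each value with probability 1 - theta; otherwise replace it
    with a value drawn uniformly from the observed records, i.e. from
    the empirical distribution of `values`.
    """
    n = len(values)
    released = []
    for v in values:
        if rng.random() < theta:
            released.append(values[rng.randrange(n)])
        else:
            released.append(v)
    return released


def expected_counts(values, theta):
    """Expected category counts after post_randomize.

    For a category with original count k out of n records:
    (1 - theta) * k  records are kept unchanged on average, and
    theta * n * (k / n)  randomized records land in it on average,
    so the expectation equals k (the scheme is unbiased for counts).
    """
    n = len(values)
    counts = Counter(values)
    return {c: (1 - theta) * k + theta * n * (k / n) for c, k in counts.items()}


if __name__ == "__main__":
    original = ["a"] * 5 + ["b"] * 3 + ["c"] * 2  # made-up toy data
    released = post_randomize(original, theta=0.3, rng=random.Random(0))
    print(Counter(original), Counter(released))
    print(expected_counts(original, theta=0.3))
```

In this sketch a smaller `theta` changes fewer records and so perturbs the data less, while a larger `theta` makes a match on the key variable less reliable; how to choose the randomization so that a specific risk bound holds is the subject of the paper itself.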

Others in Series

Working Paper

Likelihood-Based Finite Sample Inference

July 2014

Likelihood-based finite sample inference based on synthetic data under the exponential model is developed in this paper.

Working Paper

Emerging Applications of Randomized Response Concepts

May 02, 2016

Randomized response (RR) was introduced as a technique for protecting respondents' privacy in survey interviews regarding sensitive characteristics.

Working Paper

A Concise Theory of Randomized Response Techniques for Privacy

July 28, 2016

A variety of randomized response (RR) procedures for privacy and confidentiality protection have been proposed, studied and compared in the literature.

Page Last Revised - October 8, 2021
