A Post-randomization Method for Rigorous Identification Risk Control in Releasing Microdata

Written by:
RRS2020-01

Abstract

One significant concern in releasing survey microdata is that the records of some survey units could be identified by matching the values of certain variables, called key or pseudo-identifying variables, whose values can be easily obtained from other sources. For categorical key variables, Nayak, Zhang and You [Int. Stat. Rev., 86(2), 2018, 300-321] developed a novel approach for measuring and controlling identification risks. For any ξ > 1/3, their approach guarantees that no unit's probability of correct identification exceeds ξ. We present another post-randomization method that provides this guarantee more stringently, including for ξ ≤ 1/3. We use data partitioning and unbiased post-randomization as two effective tools for preserving data utility. We illustrate and assess the procedure by applying it to a data set publicly released by the U.S. Census Bureau.
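The abstract does not spell out the procedure, so the following is only a minimal sketch of two ideas it names: unbiased post-randomization of a categorical key variable, and a per-unit probability of correct identification. Everything here is an illustrative assumption rather than the authors' method: the transition matrix (the identity mixed with the empirical category distribution), the intruder's matching rule (pick uniformly among released records whose released category equals the target's true category), and the function names `unbiased_pram_matrix`, `post_randomize` and `identification_probability`. Data partitioning is not shown.

```python
import numpy as np


def unbiased_pram_matrix(counts, theta):
    """Illustrative unbiased PRAM matrix (an assumption, not the paper's design).

    P = (1 - theta) * I + theta * 1 pi^T, where pi = counts / n.
    Rows sum to 1, and counts @ P == counts, so the expected released
    frequency of every category equals its original frequency.
    """
    counts = np.asarray(counts, dtype=float)
    k = counts.size
    pi = counts / counts.sum()
    return (1.0 - theta) * np.eye(k) + theta * np.outer(np.ones(k), pi)


def post_randomize(categories, P, rng):
    """Independently replace each record's category i by a draw from row i of P."""
    return np.array([rng.choice(len(P), p=P[c]) for c in categories])


def identification_probability(categories, P, target):
    """Exact probability that a hypothetical intruder identifies record `target`.

    Assumed intruder rule: the intruder knows the target's true category j
    and picks uniformly at random among released records whose released
    category is j. The match is correct only if the target was itself
    released as j and was the one picked.
    """
    j = categories[target]
    # Distribution of S = number of *other* records released as j.
    # S is a sum of independent Bernoulli(P[c, j]) variables (Poisson-binomial).
    dist = np.array([1.0])
    for u, c in enumerate(categories):
        if u == target:
            continue
        q = P[c, j]
        new = np.zeros(dist.size + 1)
        new[:-1] += dist * (1.0 - q)  # record u not released as j
        new[1:] += dist * q           # record u released as j
        dist = new
    # Given the target is released as j, the intruder picks it with prob 1/(1 + S).
    return P[j, j] * np.sum(dist / (1.0 + np.arange(dist.size)))


# Toy data: 10 records with one categorical key variable (categories 0, 1, 2).
counts = [6, 3, 1]
categories = [0] * 6 + [1] * 3 + [2] * 1
P = unbiased_pram_matrix(counts, theta=0.5)
print(np.allclose(np.array(counts) @ P, counts))  # True: unbiasedness holds
unique = categories.index(2)  # the only record in category 2
print(identification_probability(categories, np.eye(3), unique))  # 1.0 without perturbation
print(identification_probability(categories, P, unique))  # about 0.441 after PRAM
released = post_randomize(categories, P, np.random.default_rng(0))
```

In this toy example the sample-unique record is identified with certainty when nothing is perturbed, and with probability of roughly 0.44 under the assumed matrix. A release rule in the spirit of the abstract would compare the maximum of these probabilities over all records against a chosen bound ξ; how the paper attains that bound, including for ξ ≤ 1/3, is not reproduced here.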

Page Last Revised - October 8, 2021