A Post-randomization Method for Rigorous Identification Risk Control in Releasing Microdata

April 17, 2020

Written by:

Xiaoyu Zhai and Tapan K. Nayak

RRS2020-01

Abstract

One significant concern in releasing survey microdata is that the records of some survey units could be identified by matching on certain variables, called key or pseudo-identifying variables, whose values can be obtained easily from other sources. For categorical key variables, Nayak, Zhang and You [Int. Stat. Rev., 86(2), 2018, 300-321] developed a novel approach for measuring and controlling identification risks. For any ξ > 1/3, their approach guarantees that no unit’s probability of correct identification exceeds ξ. We present another post-randomization method that provides this guarantee under more stringent thresholds, including ξ ≤ 1/3. We use data partitioning and unbiased post-randomization as two effective tools for preserving data utility. We illustrate and assess the procedure by applying it to a data set publicly released by the U.S. Census Bureau.
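
To make the idea of unbiased post-randomization concrete, the following is a minimal sketch and not the procedure of the report. It assumes a single categorical key variable and a textbook transition matrix that keeps each value with probability theta and otherwise redraws it from the observed category proportions. The names invariant_transition_matrix, post_randomize, expected_counts and the parameter theta are hypothetical, chosen only for illustration. The sketch shows that the expected category counts after randomization equal the observed counts; it does not implement the data partitioning or the identification risk bound ξ described above.

```python
import random
from collections import Counter


def invariant_transition_matrix(values, theta):
    """Return P[a][b] = theta * 1[a == b] + (1 - theta) * pi[b].

    pi[b] is the observed proportion of category b in `values`.
    theta is an illustrative retention probability (an assumption,
    not a quantity taken from the report).
    """
    counts = Counter(values)
    n = len(values)
    categories = sorted(counts)
    pi = {c: counts[c] / n for c in categories}
    return {
        a: {b: theta * (a == b) + (1 - theta) * pi[b] for b in categories}
        for a in categories
    }


def post_randomize(values, matrix, rng):
    """Replace each value by a draw from its row of the transition matrix."""
    released = []
    for v in values:
        row = matrix[v]
        cats = list(row)
        weights = [row[c] for c in cats]
        released.append(rng.choices(cats, weights=weights, k=1)[0])
    return released


def expected_counts(values, matrix):
    """Expected category counts of the released data, given the original data."""
    counts = Counter(values)
    return {b: sum(counts[a] * matrix[a][b] for a in matrix) for b in matrix}


if __name__ == "__main__":
    original = ["A"] * 5 + ["B"] * 3 + ["C"] * 2
    P = invariant_transition_matrix(original, theta=0.6)
    released = post_randomize(original, P, random.Random(0))
    print(Counter(original))
    print(expected_counts(original, P))  # matches the original counts in expectation
    print(Counter(released))             # one random realization
```

With this choice of matrix, a simple category count from the released data is an unbiased estimate of the corresponding count in the original data, which is the sense of "unbiased" illustrated here.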

Page Last Revised - October 8, 2021
