Michael H. Freiman, Amy D. Lauger, and Jerome P. Reiter
This paper assesses an empirical measure of disclosure risk of synthetic demographic data generated using classification and regression trees. We synthesized a dataset with 50 implicates and tried to infer from the synthetic data the maximum income in the original dataset. If synthetic values were determined by drawing without noise from a leaf of the regression tree, then the maximum value across implicates was a very good estimate of the maximum value in the original dataset. If synthetic values were determined by drawing from the leaf with noise, then skewness in the incomes within the leaves led to substantial bias in the mean wage for the synthetic dataset. Furthermore, the maximum income could still be determined with unreasonable accuracy, estimable by the median of the maxima of the implicates, or in some cases by rescaling the maximum across all of the implicates. We conclude that this method of generating synthetic data does not adequately protect continuous variables such as income from reconstruction, at least not when many implicates are created.
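The no-noise result can be illustrated with a minimal sketch. It is not the authors' code and makes several simplifying assumptions: a single leaf stands in for the whole regression tree, the incomes are simulated from a lognormal distribution, and the names `synthesize_leaf` and `max_across_implicates` are hypothetical. Each synthetic value is drawn with replacement from the observed values in the leaf, so across 50 implicates the original maximum is almost certain to be drawn at least once.

```python
import random


def synthesize_leaf(leaf_values, rng):
    """Return one synthetic copy of a leaf: each record gets a value drawn
    with replacement from the leaf's observed values (no noise added)."""
    return [rng.choice(leaf_values) for _ in leaf_values]


def max_across_implicates(leaf_values, n_implicates, seed=0):
    """An intruder's estimate of the original maximum: the largest value
    seen in any of the synthetic implicates."""
    rng = random.Random(seed)
    return max(
        max(synthesize_leaf(leaf_values, rng)) for _ in range(n_implicates)
    )


# Simulated, right-skewed "incomes" for one leaf (an assumption; not real data).
data_rng = random.Random(42)
incomes = [round(data_rng.lognormvariate(10.5, 0.8)) for _ in range(200)]

true_max = max(incomes)
estimated_max = max_across_implicates(incomes, n_implicates=50)

print("original maximum:      ", true_max)
print("max across implicates: ", estimated_max)
```

With 200 records and 50 implicates there are 10,000 draws from the leaf, so the chance that the original maximum never appears is about (1 - 1/200)^10000, which is negligible. Adding noise to the draws removes this exact match, but as the abstract notes, the maximum can still be estimated closely by other means.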