A Penny Synthesized is a Penny Earned? An Exploratory Analysis of Accuracy in the SIPP Synthetic Beta

Written by:
Working Paper Number CED-WP-2021-006

Abstract

The Census Bureau has expressed interest in using modern synthetic data modeling techniques to protect privacy and confidentiality in future microdata releases. To aid understanding of how to evaluate the accuracy and usability of synthetic microdata going forward, we perform an exploratory analysis of an early synthetic microdata release known as the SIPP Synthetic Beta (SSB). We compare results generated from the synthetic microdata with results from the same analyses run on the corresponding confidential microdata. The confidential file, the SIPP Gold Standard File (GSF), links data from the Survey of Income and Program Participation (SIPP) to administrative data from the Internal Revenue Service and the Social Security Administration. The SSB is modeled from the GSF through sequential regression multivariate imputation. We find that the SSB replicates many results in the GSF, including descriptive statistics, time trends in national statistics, and coefficients from regression analyses. Relative to the GSF results, the SSB performs best when our analysis involves only variables modeled from the GSF and when we use methods that are less sensitive to outliers. Similarity between the GSF and the SSB declines noticeably when our analysis relies on merged external data or on within-person variation in earnings. Finally, we believe our findings represent something of a lower bound on the accuracy of future synthetic microdata, for two reasons: synthetic data modeling has improved since the SSB was created, and we do not account for other sources of survey error when comparing the confidential data to the synthetic data.
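The comparison described in the abstract can be illustrated with a minimal sketch on toy simulated data. Nothing here uses SIPP, GSF, or SSB data, and it is not the authors' procedure: the variable names, the linear model, and the sample size are all assumptions chosen for illustration. The sketch synthesizes a single variable from a regression fitted to the "confidential" data (a heavily simplified, one-variable stand-in for sequential regression synthesis, which would repeat this step variable by variable), then runs the same regression on both datasets and compares coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000  # hypothetical sample size

# Toy "confidential" data (assumption: y is linear in x plus normal noise).
x_conf = rng.normal(size=n)
y_conf = 1.0 + 2.0 * x_conf + rng.normal(size=n)


def ols(x, y):
    """Return [intercept, slope] from an ordinary least squares fit."""
    design = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


# Fit the synthesis model on the confidential data.
beta_conf = ols(x_conf, y_conf)
residuals = y_conf - (beta_conf[0] + beta_conf[1] * x_conf)
resid_sd = np.std(residuals, ddof=2)

# Synthesize y: replace each value with a draw from the fitted model.
y_syn = beta_conf[0] + beta_conf[1] * x_conf + rng.normal(scale=resid_sd, size=n)

# Run the same analysis on the synthetic data and compare coefficients.
beta_syn = ols(x_conf, y_syn)
coef_gap = np.abs(beta_syn - beta_conf)

print("confidential coefficients:", beta_conf)
print("synthetic coefficients:   ", beta_syn)
print("absolute gap:             ", coef_gap)
```

In this toy setting the analysis model matches the synthesis model, so the coefficients should agree closely. That loosely parallels the finding that agreement is best when an analysis involves only variables modeled from the confidential file.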

Page Last Revised - January 13, 2023