Synthetic Data: Public-Use Micro Data for a Big Data World

Tue Oct 14 2014
Ron Jarmin
Thomas A. Louis, Associate Director, Research and Methodology Directorate
Javier Miranda, Principal Economist, Center for Economic Studies

Businesses, households and policymakers need timely and accurate data to make informed decisions. National statistical offices around the world have a wealth of information from survey and administrative sources to meet these needs. However, they are constrained in their ability to release these data because of the confidentiality pledge to data respondents.

Synthetic data offer a way to expand the amount of information that national statistical offices can publicly release while maintaining respondent confidentiality. In synthetic datasets, some or all data values are simulated (synthesized) using statistical models designed to mimic the (joint) distributions of the underlying data.
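To make the idea of simulating values from a fitted model concrete, here is a minimal sketch in Python. It is a toy illustration and not the method behind any Census Bureau product: the two variables, the bivariate normal model, and the function names `fit_linear`, `synthesize` and `correlation` are all assumptions made for this example. The "confidential" records are themselves made up. Production synthesizers use far richer models and typically also account for uncertainty in the fitted parameters, which this sketch omits.

```python
import random
import statistics


def fit_linear(x, y):
    """Least-squares fit of y on x: returns intercept, slope, residual std dev."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    resid_sd = (sum(r * r for r in resid) / (len(x) - 2)) ** 0.5
    return intercept, slope, resid_sd


def synthesize(x, y, n, seed=0):
    """Simulate n (x, y) records from a model fitted to the real records.

    x is drawn from a normal distribution fitted to the real x values;
    y is then drawn from the fitted regression of y on x, so the synthetic
    pairs mimic the joint distribution rather than each margin separately.
    """
    rng = random.Random(seed)
    mx, sx = statistics.fmean(x), statistics.stdev(x)
    intercept, slope, resid_sd = fit_linear(x, y)
    syn_x = [rng.gauss(mx, sx) for _ in range(n)]
    syn_y = [intercept + slope * xi + rng.gauss(0.0, resid_sd) for xi in syn_x]
    return syn_x, syn_y


def correlation(x, y):
    """Pearson correlation, used here to compare real and synthetic data."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Made-up stand-in for confidential microdata (hypothetical values).
data_rng = random.Random(42)
real_x = [data_rng.gauss(10.0, 1.0) for _ in range(5000)]
real_y = [0.6 * xi + data_rng.gauss(0.0, 0.5) for xi in real_x]

syn_x, syn_y = synthesize(real_x, real_y, n=5000, seed=1)

# No synthetic record is a copy of a real one, yet the relationship
# between the two variables is approximately preserved.
print("real correlation:     ", round(correlation(real_x, real_y), 3))
print("synthetic correlation:", round(correlation(syn_x, syn_y), 3))
```

The two printed correlations should be close to each other, which is the sense in which the synthetic file "mimics" the joint distribution under this simple model.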

Researchers at the Census Bureau, in partnership with academic economists and statisticians through the Census Bureau’s secure research data centers, recently produced two synthetic public micro datasets. The SIPP-Synthetic Beta product combines survey data from the Survey of Income and Program Participation with administrative records from the Internal Revenue Service and the Social Security Administration (see Benedetto, Stinson and Abowd 2013). The Synthetic Longitudinal Business Database is the first business establishment-level public-use micro dataset made available by a U.S. statistical agency (see Kinney et al. 2011).

Research findings on the development and use of synthetic data, and on future uses of these data, were presented in a session of the World Statistical Congress held in Hong Kong in August 2013. The articles from that session are accessible in the Statistical Journal of the International Association for Official Statistics.

While synthetic data are exciting and hold great promise, there are challenges to expanding their development and use. Creating synthetic data requires significant technical expertise that is not widely available within many statistical agencies. Census Bureau progress on synthetic data has relied on robust collaboration with academic experts. Users also confront challenges. Synthetic microdata are still experimental and not as straightforward to use as conventional microdata. Because users may not understand what is involved in developing apps and online tools constructed using synthetic data, such as OnTheMap, they may underestimate the variance of the estimates these tools supply.
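One way to see where the extra variance comes from is a small sketch, assuming the data producer releases several synthetic replicates of the same file and the user runs the same analysis on each. The function name `combine_partially_synthetic` and all numbers are hypothetical. The formula is the combining rule usually attributed to Reiter for partially synthetic data. This post does not say which rule applies to any particular product, so treat it as illustrative only.

```python
import statistics


def combine_partially_synthetic(estimates, variances):
    """Combine results from m synthetic replicates.

    estimates: the point estimate computed on each replicate
    variances: the usual variance estimate computed on each replicate
    Returns (combined estimate, total variance), where
    total variance = average within-replicate variance
                     + between-replicate variance / m.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two replicates with matching variances")
    q_bar = statistics.fmean(estimates)          # combined point estimate
    u_bar = statistics.fmean(variances)          # average within-replicate variance
    b = statistics.variance(estimates)           # between-replicate variance
    return q_bar, u_bar + b / m


# Hypothetical results from four synthetic replicates (made-up numbers).
estimates = [50.2, 49.1, 51.0, 50.5]
variances = [0.40, 0.42, 0.38, 0.41]

q_bar, total_var = combine_partially_synthetic(estimates, variances)
naive_var = variances[0]  # what a user sees when treating one replicate as real data

print("combined estimate:", round(q_bar, 2))
print("naive variance:   ", round(naive_var, 4))
print("total variance:   ", round(total_var, 4))
```

The naive figure reflects only ordinary sampling variability within one file. The between-replicate term captures the additional uncertainty introduced by the synthesis itself, and it is the part a user would miss by analyzing a single synthetic file as if it were the original data.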

Synthetic data are one way for national statistical organizations to take the lead in making high-quality, reliable official statistics more accessible and relevant. However, creating and supporting synthetic data requires staffing and resources beyond what is generally available to them. The Census Bureau’s “two-way-street” strategy of developing partnerships with academic and funding institutions offers a way to move forward.

Ron S. Jarmin, Assistant Director, Research and Methodology Directorate
