Based on the results of over 600 experimental data runs to optimize and tune the parameters of the new 2020 Census Disclosure Avoidance System (DAS) algorithm, the Census Bureau’s Data Stewardship Executive Policy Committee (DSEP) has chosen the privacy-loss budget (PLB) for the forthcoming set of demonstration data.
The global PLB for the persons file in the next demonstration data set will be 10.3 and the PLB for the housing units data will be 1.9. As discussed previously, the four demonstration products released to date used a PLB of 4.0 for persons and 0.5 for housing units, significantly lower than we anticipate using for the final 2020 Census data. Those earlier demonstration data were purposefully tuned for privacy rather than for producing highly accurate redistricting data. We held the PLB roughly the same across those four releases to allow us to compare the effects of incremental algorithmic improvements in the system.
While significantly larger than the PLB used in the previous data, the 10.3 PLB is still allocated in a manner that provides a level of protection for every census record and every published characteristic. For those of you trying to understand the increase in accuracy attributable to a shift from a PLB of 4 to a PLB of 10.3, it’s important to understand that the PLB scale is logarithmic: each additional unit on the PLB scale corresponds to a multiplicative (exponential) increase in the underlying privacy-loss bound, not a fixed additive step. The forthcoming demonstration data, released as Privacy-Protected Microdata Files (PPMFs), will help data users see that increase in PLB reflected in the accuracy of population counts and demographic characteristics at various levels of geography.
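To make the logarithmic scale concrete, here is a minimal sketch. It assumes the PLB can be read as the ε of standard ε-differential privacy, where the bound on how much any one record can shift the probability of an output is e^ε. That reading is an assumption for illustration; the DAS's actual privacy accounting may differ, and the function name `bound_ratio` is hypothetical.

```python
import math

def bound_ratio(plb_old: float, plb_new: float) -> float:
    """Factor by which the e^PLB bound grows when the PLB moves
    from plb_old to plb_new (assumes PLB is a pure-DP epsilon)."""
    return math.exp(plb_new - plb_old)

# PLB values quoted in this newsletter: earlier releases vs. new release.
persons = bound_ratio(4.0, 10.3)   # persons file
housing = bound_ratio(0.5, 1.9)    # housing units file

print(f"persons bound grows by a factor of about {persons:.0f}")
print(f"housing-unit bound grows by a factor of about {housing:.1f}")
```

Under that assumption, the move from 4.0 to 10.3 is a difference of 6.3 on the PLB scale but a several-hundredfold change in the underlying bound, which is why the two budgets should not be compared as if the scale were linear.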
The Census Bureau announced the new PLB in a declaration submitted in response to litigation on April 13, 2021.
In the same declaration we previewed some of the high-level results from the upcoming demonstration data release regarding the accuracy criteria established for the P.L. 94-171 redistricting data (see our previous newsletter for criteria details).
We report that the new demonstration data will fully satisfy those specialized accuracy criteria. Specifically, populations, voting-age populations, and the proportion of the largest OMB-designated race and ethnicity groups are all reliable for redistricting and Voting Rights Act scrutiny. Because new districts cannot be drawn before the 2020 P.L. 94-171 Redistricting Data Summary File is released, counties, block groups, minor civil divisions, incorporated places, and census designated places were all used as on- and off-spine geographic entities for tuning purposes.
The declaration also revealed high-level results of an analysis comparing the error caused by the new differentially private methods to the other sources of error that are inherent in census data (coverage error, measurement error, etc.) based on post-census analyses.
Our internal analyses have shown that 2010 Census operations resulted in an average county-level uncertainty in total population of +/- 960 people (averaging 1.6% of the county census counts). By comparison, the noise from differential privacy in the new demonstration data produces an average error of only +/- 5 people at the county level (a mean absolute percent error of 0.04% of the counties’ population).
At the block level the differentially private data have an average population error of +/- 3 people, which includes both housing unit and group quarters populations. Compare that with the simulated error inherent in the census that puts the average uncertainty of block population counts at +/- 6 people.
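For data users who want to compute comparable summaries once the demonstration files are available, the two measures quoted above (average absolute error and mean absolute percent error) can be sketched as follows. The counts in the example are made-up placeholders, not Census data, and the function names are hypothetical. Entities with a reference count of zero are skipped in the percent measure, which is one possible convention rather than the Census Bureau's stated method.

```python
def mean_absolute_error(reference, protected):
    """Average of |reference - protected| across geographic entities."""
    diffs = [abs(r - p) for r, p in zip(reference, protected)]
    return sum(diffs) / len(diffs)

def mean_absolute_percent_error(reference, protected):
    """Average of |reference - protected| / reference, as a percentage.
    Entities whose reference count is zero are skipped (an assumption)."""
    ratios = [abs(r - p) / r for r, p in zip(reference, protected) if r != 0]
    return 100 * sum(ratios) / len(ratios)

# Placeholder county totals: reference counts vs. privacy-protected counts.
reference = [10_000, 25_000, 50_000]
protected = [10_004, 24_995, 50_006]

mae = mean_absolute_error(reference, protected)
mape = mean_absolute_percent_error(reference, protected)
print(f"mean absolute error: {mae:.1f} people")
print(f"mean absolute percent error: {mape:.3f}%")
```

The same two functions can be applied at any level of geography, for example to block totals, by swapping in the corresponding pairs of counts.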
We’ll share more information about these results in our next newsletter.
Per the calendar below, we will release the new demonstration data by April 30. We’ll release two versions to aid in your analysis: a version using the new (10.3, 1.9) PLB, and one using the earlier, development-focused PLB (4.0, 0.5).
We look forward to your feedback after that release.
By April 30:
By late May:
Early June:
Late June:
By August 16:
September:
By September 30:
* Released via Census Bureau FTP site.
** Released via data.census.gov.