Release of “2010 Demonstration Metrics 2”: First Set of Post-Baseline Quality Metrics Results

This is the first release of results that measure improvements in the iterative design of the 2020 Disclosure Avoidance System (DAS). As we develop the final DAS throughout 2020, we will continue to produce new results so you can compare the impact of this work against the baseline version of the DAS: the 2010 Demonstration Data Products released in October 2019. “2010 Demonstration Metrics 2” measures the accuracy of the DAS output using the draft measures released March 27, 2020. We are still evaluating user feedback on these metrics and will issue updates as we move forward.

About the Latest DAS Development Work Reflected in These Results

This revised set of metrics was calculated on a national run of the 2020 Disclosure Avoidance System (DAS) on the 2010 Census data following the conclusion of DAS development Sprint II (March 2-March 31, 2020). The most notable change implemented during Sprint II affects how the DAS TopDown Algorithm (TDA) converts the noisy measurements taken from the confidential data into the counts that will be tabulated and published, an operation that we call “postprocessing.”

Previously, the TDA conducted the postprocessing of all of the statistics for a particular geographic level at the same time. Unfortunately, as we saw in the 2010 demonstration data, the TDA had difficulty accurately performing this optimization when there were large quantities of statistics with zeros or very small values processed at the same time. The result was distortions in the data that effectively moved individuals from high- to low-density populations (e.g., from cities to rural areas, or from larger race groups to smaller race groups).
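The effect described above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the TDA's optimizer: it uses Gaussian noise from Python's standard library, clips negative values to zero, and rescales to a fixed total. The unit counts, the noise scale, and all variable names are hypothetical.

```python
import random

random.seed(0)  # fixed seed so the toy run is repeatable

# One large unit and 200 empty units (hypothetical values).
true_counts = [10_000] + [0] * 200
total = sum(true_counts)

# Assumption: symmetric Gaussian noise stands in for the DAS noise mechanism.
noisy = [c + random.gauss(0, 2.0) for c in true_counts]

# Naive joint postprocessing: forbid negative counts, then rescale
# so the published values still add up to the fixed total.
clipped = [max(v, 0.0) for v in noisy]
scale = total / sum(clipped)
adjusted = [v * scale for v in clipped]

city_after = adjusted[0]
small_after = sum(adjusted[1:])
print(f"large unit: {true_counts[0]} -> {city_after:.0f}")
print(f"200 empty units combined: 0 -> {small_after:.0f}")
```

Because clipping keeps the positive noise on empty units and discards the negative noise, the empty units gain population in aggregate, and the fixed total forces the large unit to give it up.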

With the changes implemented during Sprint II, the TDA now conducts the postprocessing in a series of passes through all the geographic levels.

The first pass of the algorithm works down the geographic hierarchy: at the national level, then at the state level, then at each lower level of geography, it determines only the total population count for each unit within that geographic level (e.g., for all census tracts within a county).

Once those total population counts are determined, the second pass of the algorithm processes just the statistics necessary to produce the redistricting data (also known as the Public Law 94-171 data file), constraining those statistics to the population counts determined in the first pass.

The third pass through the algorithm then processes the core statistics necessary to support population by age, sex, and broad race/ethnicity categories for the demographic analyses that underlie the Population Estimates program. Third-pass statistics are constrained to the sum of the statistics produced for the redistricting data.

A final pass through TDA processes the remainder of the statistics necessary for the Demographic and Housing Characteristics files and the Demographic Profiles, constraining these values to the sum of the ones produced in the third pass.
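The way each pass is constrained to the results of the pass before it can be sketched in a few lines. This is a minimal illustration, not the TDA itself: the helper `fit_to_total` is hypothetical and uses simple proportional scaling with largest-remainder rounding in place of the TDA's actual optimization, and all numbers are made up. Only the first two passes are shown; the third and final passes would repeat the same pattern against the totals fixed in the preceding pass.

```python
from typing import List


def fit_to_total(noisy: List[float], total: int) -> List[int]:
    """Turn noisy values into nonnegative integers that sum to `total`.

    Hypothetical stand-in for the TDA's constrained optimization.
    """
    clipped = [max(v, 0.0) for v in noisy]
    weight = sum(clipped)
    if weight == 0:
        shares = [total / len(noisy)] * len(noisy)
    else:
        shares = [total * v / weight for v in clipped]
    counts = [int(s) for s in shares]  # floor, since every share is >= 0
    leftover = total - sum(counts)
    # Hand the remaining units to the cells with the largest fractional parts.
    order = sorted(range(len(shares)),
                   key=lambda i: shares[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


# Pass 1: total population per tract, constrained to the county total.
county_total = 1000
noisy_tract_totals = [412.3, 590.8, -4.1]
tract_totals = fit_to_total(noisy_tract_totals, county_total)

# Pass 2: more detailed cells within each tract, constrained to the
# tract totals already fixed in pass 1.
noisy_detail = [[300.2, 110.5], [401.0, 195.3], [1.2, -0.7]]
detail = [fit_to_total(cells, tract_totals[i])
          for i, cells in enumerate(noisy_detail)]

for i, cells in enumerate(detail):
    print(f"tract {i}: total={tract_totals[i]}, detail={cells}")
```

The point of the design is that noise in the many small detailed cells can no longer shift the totals: each later pass only redistributes counts inside a total that an earlier pass has already fixed.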

To compare apples-to-apples and better isolate the impact of iterative DAS changes, this version of the DAS uses the same global privacy-loss budget (PLB) applied to the 2010 Demonstration Data Products, ε=6.0. Of this total budget, the person records use ε=4.0, and the housing records use ε=2.0.
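The budget split above is simple arithmetic: the person and housing budgets add up to the global PLB. A short check, using hypothetical variable names:

```python
from fractions import Fraction

# Budget split stated in the text: persons 4.0, housing 2.0.
budget = {"person": Fraction(4), "housing": Fraction(2)}

global_plb = sum(budget.values())
shares = {name: eps / global_plb for name, eps in budget.items()}

print(f"global PLB: {float(global_plb)}")
for name, share in shares.items():
    print(f"{name}: epsilon={float(budget[name])}, share of total={share}")
```

Person records therefore receive two-thirds of the global budget and housing records one-third.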

While more work remains to be done to further improve and optimize the DAS algorithms, these new accuracy metrics are intended to keep our data users informed of our progress in addressing the limitations observed in the 2010 demonstration data.

Page Last Revised - October 8, 2021