2020 Census Operational Statistics on the Planning Database and Related Disclosure Avoidance Strategies

Written by:

Estimated reading time: 7 minutes

The U.S. Census Bureau recently released the 2023 Planning Database (PDB), the first PDB to include operational statistics from the 2020 Census, such as metrics on online self-response.

The PDB was originally developed in the 1990s as an aid for survey and census planning. It includes housing, demographic and socioeconomic statistics from the decennial census and the American Community Survey (ACS) at the census tract and block group levels.

Its compact design, fine geographic granularity and large number of frequently used characteristics make the PDB an ideal data product for a broad range of data users, such as academics, policymakers and local community leaders.

Historically, the PDB has been a key source for operational statistics from the decennial census. This blog describes some of the operational statistics that are available and explains how we protect the confidentiality of the underlying response data while providing useful measures to data users.

2020 Census Operational Statistics Available

The PDB includes many operational statistics from the 2020 Census, such as:

  • Bilingual questionnaire housing unit count – the number of housing units in Self-Response enumeration areas and Update Leave enumeration areas (where a census taker drops off the census invitation) that received bilingual English/Spanish materials.
  • Internet self-response rate – the percentage of housing units providing a sufficient internet self-response.
  • Return rate – the percentage of occupied housing units that responded to the census on their own (online, by phone or by mail).

A complete list of the operational statistics is available in the technical documentation for the 2023 PDB. These statistics are available at the census tract and block group levels for all 50 states, the District of Columbia and Puerto Rico.

Stakeholders can use this updated information from the 2020 Census to inform their own data collection operations and planning.

Note that the 2023 PDB does not contain a measure called the Low Response Score (LRS), which is derived in part from the operational statistics. The LRS is used to identify communities at risk of low self-response rates to a census or survey.

Census Bureau analysts are actively researching and updating the LRS methodology. Now that the 2020 Census operational statistics have been finalized and released to the public, we anticipate that an updated version of the LRS will be ready for publication in the 2024 PDB.

Protecting the Privacy and Confidentiality of Respondents

Before we could add the 2020 operational statistics to the PDB, we needed to ensure we protected the confidentiality of census respondents.

We first considered using the same disclosure avoidance system that was used for 2020 Census data products, which is based on the differential privacy framework. For those products, to maintain confidentiality, we added noise – small, random additions or subtractions – to every published statistic so no one can reidentify a specific person or household with any certainty using any combination of the published data.

However, we determined that the noise from differential privacy had too great an impact on some operational statistics for them to remain useful, so we took a different approach.
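
To see why noise can overwhelm some statistics, the short sketch below adds noise to counts of different sizes and measures the average error relative to the true count. It is only an illustration: it uses the textbook Laplace mechanism with a hypothetical privacy parameter epsilon = 0.5, not the Census Bureau's production 2020 disclosure avoidance system, and the function names are made up for this example.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two independent exponential draws, each with mean
    # `scale`, follows a Laplace(0, scale) distribution.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    # Adding or removing one record changes a count by at most 1, so the
    # textbook Laplace mechanism uses scale = 1 / epsilon.
    return true_count + laplace_noise(1 / epsilon, rng)


def mean_relative_error(true_count: int, epsilon: float,
                        trials: int = 10_000, seed: int = 0) -> float:
    # Average absolute noise divided by the true count, over many draws.
    rng = random.Random(seed)
    total = sum(abs(noisy_count(true_count, epsilon, rng) - true_count)
                for _ in range(trials))
    return total / trials / true_count


if __name__ == "__main__":
    # Hypothetical counts: a tiny block-group figure up to a large one.
    for count in (5, 50, 5000):
        print(count, round(mean_relative_error(count, epsilon=0.5), 4))
```

The average absolute noise is about 1/epsilon no matter how large the count is, so the same amount of noise is a large share of a count of 5 but negligible for a count of 5,000. Operational statistics built from small counts at the block group level are the ones most affected.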

Instead of considering all operational items collectively, we looked at each item separately. This was a realistic approach because many of the mechanisms used to collect operational data were inherently different from those used to collect the survey responses themselves. For each item, we considered the disclosure risk and how best to mitigate that risk.

We worked with disclosure avoidance experts at the Census Bureau to apply appropriate protections based on the following scenarios:

  • No disclosure avoidance would be necessary if the Census Bureau determined the operational statistic before data collection and releasing the information produces no known disclosure concern. A good example of this is the type of enumeration area (TEA) that simply indicates the approach we planned to use to enumerate housing units in a given geographic area, such as Self-Response or Update Leave. The final TEAs were determined in 2019 (before data collection began in 2020).
  • No disclosure avoidance would be necessary if the operational statistic indicates inconsistencies between the Master Address File and what was found during data collection, because this information does not pose a disclosure risk. For example, if a census taker went to an address but there was not a housing unit there, they would remove it from the list. These are referred to as “deletes.”
  • For certain rates, we would round the rate to the nearest decimal and release only the denominator. An example is the return rate, which reflects the rate of responses to the census. The numerator (the count of sufficient responses, meaning responses with enough information to process) is not released because, together with other publicly available data, it could be disclosive. (The PDB includes response and return rates by mode of response.)
  • No disclosure avoidance would be necessary if the operational statistic reflects a count at a specific stage in the data collection process and this count is not a subset of a final count that should be protected. This is the case with the pre-Nonresponse Followup (NRFU) vacant housing unit numbers. Most vacant housing units are identified during NRFU, but in some situations a housing unit can be classified as vacant before NRFU. For example, someone could have self-reported a seasonal home as vacant. The number of pre-NRFU vacants can be much lower than, or even higher than, the total number of vacants, so knowing it does not reveal the total. (Note that the total number of vacant housing units was already released with noise in the 2020 Census Public Law 94-171 Summary File.)
  • No additional disclosure avoidance, beyond any already applied to the inputs, would be necessary if the operational statistic is derived from other releasable operational statistics. For example, the valid housing unit count is calculated as the number of mailed addresses in Self-Response or Update Leave areas minus pre-NRFU vacants, deletes and undeliverable addresses, so it carries only whatever protection, if any, was applied to those inputs.
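
The rate and derived-count scenarios above can be sketched in a few lines. This is a minimal illustration, not Census Bureau production code: the function and field names are hypothetical, "the nearest decimal" is assumed here to mean one decimal place, and the input counts in the usage lines are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateRelease:
    # Only these two fields are published; the numerator is deliberately absent.
    rate_pct: float
    denominator: int


def release_rate(numerator: int, denominator: int, decimals: int = 1) -> RateRelease:
    # The numerator is used to compute the rate but is not returned, because
    # together with other public data it could be disclosive.
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    rate_pct = 100 * numerator / denominator
    # round() resolves exact ties toward the even digit; the Bureau's actual
    # rounding rule may differ (assumption).
    return RateRelease(rate_pct=round(rate_pct, decimals), denominator=denominator)


def valid_housing_units(mailed: int, pre_nrfu_vacants: int,
                        deletes: int, undeliverable: int) -> int:
    # Derived from other releasable operational statistics, so no further
    # protection is added here beyond whatever the inputs already carry.
    return mailed - pre_nrfu_vacants - deletes - undeliverable


if __name__ == "__main__":
    print(release_rate(numerator=37, denominator=52))
    print(valid_housing_units(mailed=1200, pre_nrfu_vacants=40,
                              deletes=25, undeliverable=15))
```

With these made-up inputs, the published rate is 71.2 with a denominator of 52, and the count of 37 responses never appears in the output. The valid housing unit count works out to 1,120. Keeping the numerator out of the returned record, rather than relying on callers to drop it, makes the "denominator only" rule hard to bypass by accident.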

The above scenarios have been approved and applied to the new 2020 operational data added to the 2023 PDB. Beyond the data themselves, this disclosure avoidance logic may also help survey practitioners as we all think about ways to keep data confidential without losing useful information.
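
For practitioners who want to apply similar reasoning, the scenarios can be written as a simple item-by-item decision rule. The sketch below is hypothetical: the flag names, the order of the checks and the fallback to expert review are illustrative choices, not a documented Census Bureau procedure.

```python
from dataclasses import dataclass
from enum import Enum


class Treatment(Enum):
    NONE = "no disclosure avoidance needed"
    ROUND_RATE_WITHHOLD_NUMERATOR = "round rate, release denominator only"
    INHERIT_FROM_INPUTS = "no additional protection beyond inputs"
    NEEDS_REVIEW = "no scenario applies; refer to disclosure experts"


@dataclass
class OperationalItem:
    name: str
    # Determined before data collection, with no known disclosure concern.
    fixed_before_collection: bool = False
    # Records a mismatch between the Master Address File and the field.
    maf_inconsistency: bool = False
    # A stage count that is not a subset of a protected final count.
    independent_stage_count: bool = False
    # A rate whose numerator could be disclosive.
    is_rate: bool = False
    # Computed only from other releasable operational statistics.
    derived_from_releasable: bool = False


def choose_treatment(item: OperationalItem) -> Treatment:
    # The order of these checks is an illustrative choice (assumption).
    if item.fixed_before_collection or item.maf_inconsistency:
        return Treatment.NONE
    if item.independent_stage_count:
        return Treatment.NONE
    if item.is_rate:
        return Treatment.ROUND_RATE_WITHHOLD_NUMERATOR
    if item.derived_from_releasable:
        return Treatment.INHERIT_FROM_INPUTS
    return Treatment.NEEDS_REVIEW


if __name__ == "__main__":
    items = [
        OperationalItem("type of enumeration area", fixed_before_collection=True),
        OperationalItem("deletes", maf_inconsistency=True),
        OperationalItem("pre-NRFU vacants", independent_stage_count=True),
        OperationalItem("return rate", is_rate=True),
        OperationalItem("valid housing unit count", derived_from_releasable=True),
    ]
    for it in items:
        print(f"{it.name}: {choose_treatment(it).value}")
```

Any item that matches none of the scenarios falls through to review instead of being published by default, which mirrors the item-by-item approach described above.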

Conclusion

This blog highlights both the release of 2020 Census operational statistics on the PDB and the Census Bureau’s commitment to make these data available to the public while protecting respondent confidentiality. The PDB itself is entering its third decade as a valuable resource for the Census Bureau and data users, such as scholars, researchers, partners and community groups. We anticipate that data users will find these data helpful for evaluating the quality of the 2020 Census and planning for future data collections, including the 2030 Census.

Page Last Revised - July 10, 2024