The U.S. Census Bureau recently released the 2023 Planning Database (PDB), the first PDB to include operational statistics from the 2020 Census, including metrics on online self-response.
The PDB was originally developed in the 1990s as an aid for survey and census planning. It includes housing, demographic and socioeconomic statistics from the decennial census and the American Community Survey (ACS) at the census tract and block group levels.
Its compact design, fine geographic granularity and large number of frequently used characteristics make the PDB an ideal data product for a broad range of data users, such as academics, policymakers and local community leaders.
Historically, the PDB has been a key source for operational statistics from the decennial census. This blog describes some of the operational statistics that are available and explains how we protect the confidentiality of the underlying response data while providing useful measures to data users.
The PDB includes many operational statistics from the 2020 Census, such as:
A complete list of the operational statistics is available in the technical documentation for the 2023 PDB. These statistics are available at the census tract and block group levels for all 50 states, the District of Columbia and Puerto Rico.
Stakeholders can use this updated information from the 2020 Census to inform their own data collection operations and planning.
Note that the 2023 PDB does not contain a measure called the Low Response Score (LRS), which is derived in part from the operational statistics. The LRS is used to identify communities at risk of low self-response rates to a census or survey.
Census Bureau analysts are actively researching and updating the LRS methodology. Now that the 2020 Census operational statistics have been finalized and released to the public, we anticipate that an updated version of the LRS will be ready for publication in the 2024 PDB.
Before we could add the 2020 operational statistics to the PDB, we needed to ensure we protected the confidentiality of census respondents.
We first considered using the same disclosure avoidance system that was used for 2020 Census data products, which is based on the differential privacy framework. For those products, to maintain confidentiality, we added noise – small, random additions or subtractions – to every published statistic so no one can reidentify a specific person or household with any certainty using any combination of the published data.
However, we determined that the noise from differential privacy had too great an impact on some operational statistics for them to remain useful, so we took a different approach.
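To see why added noise can overwhelm a small statistic, consider the rough sketch below. It is not the Census Bureau's disclosure avoidance system: it assumes the textbook Laplace mechanism applied to a single count, and the function names, the epsilon value and the example counts are all hypothetical choices made for illustration. It estimates how large the noise is, on average, relative to counts of different sizes.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution."""
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_count(true_count, epsilon, rng):
    """Add Laplace noise with scale 1/epsilon (assumes sensitivity 1)."""
    return true_count + laplace_noise(1 / epsilon, rng)


def mean_relative_error(true_count, epsilon, trials=10_000, seed=0):
    """Average absolute noise, expressed as a share of the true count."""
    rng = random.Random(seed)
    total_error = sum(
        abs(noisy_count(true_count, epsilon, rng) - true_count)
        for _ in range(trials)
    )
    return total_error / trials / true_count


if __name__ == "__main__":
    # Hypothetical counts: a small block group tally versus larger totals.
    for count in (5, 50, 5000):
        print(count, round(mean_relative_error(count, epsilon=0.5), 4))
```

In this sketch the typical size of the noise does not depend on the count, so the same noise that is negligible for a count in the thousands is a large share of a count in the single digits. Operational statistics at the block group level are often small, which is where this effect matters most.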
Instead of considering all operational items collectively, the team looked at each item separately – a realistic approach, as many of the data collection mechanisms were inherently different from those used to collect the survey responses themselves. For each item, we considered the disclosure risk and how best to mitigate that risk.
We worked with disclosure avoidance experts at the Census Bureau to apply appropriate protections based on the following scenarios:
The above scenarios have been approved and applied to the new 2020 operational data added to the 2023 PDB. Beyond the data themselves, this disclosure avoidance logic may also be useful to survey practitioners as we all think about ways to keep data confidential without loss of information.
This blog highlights both the release of 2020 Census operational statistics in the PDB and the Census Bureau’s commitment to making these data available to the public while protecting respondent confidentiality. The PDB itself is entering its third decade as a valuable resource for the Census Bureau and data users, such as scholars, researchers, partners and community groups. We anticipate that data users will find these data helpful for evaluating the quality of the 2020 Census and planning for future data collections, including the 2030 Census.