
2010 Census Effectiveness of Unduplication Evaluation

Written by:
2010 Census Planning Memo No. 244

Executive Summary

Counting each person once, only once, and in the right place is the foundation of the decennial census. Often, though, people spend time at more than one place and so could be enumerated at more than one place, creating duplication in the census.

The Census Bureau has developed computer-matching algorithms to match the census universe against itself and thus identify potentially duplicated persons. The algorithms use characteristics such as first name, last name, middle initial, age, date of birth, phone number, and geographic distance to match people. Each time a person record is matched to another person record, the match is given a score that reflects its strength. The scores are then ranked, and the matches are reviewed to establish a cutoff point. All matches with scores above the cutoff are reliably identified as duplicate person records. Cutoffs are set very high during the review to minimize the number of false matches incorrectly classified as duplicates. Followup operations were expensive in the 2010 Census, so resources could not be wasted on false matches. The computer-matching process only identifies potential duplicates; no individuals are removed from the census during this process. Although extensive research has been done to ensure that chance agreements of name and date of birth are not classified as matches, and although the cutoffs are high, it is still possible that persons matched as potential duplicates are not actual duplicates. On the other hand, computer matching will fail to identify some duplicates because of inaccurate or missing data.
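The score-and-cutoff idea described above can be illustrated with a minimal Python sketch. It is not the Census Bureau's algorithm: the field weights, the exact-agreement comparison, the `PersonRecord` layout, and the cutoff value are all assumptions made for illustration, and age and geographic distance are left out for brevity.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class PersonRecord:
    # Hypothetical record layout; the real census file layout is not given in the source.
    record_id: str
    first_name: Optional[str]
    last_name: Optional[str]
    middle_initial: Optional[str]
    date_of_birth: Optional[str]  # e.g. "1980-05-01"
    phone: Optional[str]


# Assumed weights: agreement on a more distinctive field adds more to the score.
WEIGHTS = {
    "first_name": 2.0,
    "last_name": 3.0,
    "middle_initial": 1.0,
    "date_of_birth": 4.0,
    "phone": 2.0,
}


def match_score(a: PersonRecord, b: PersonRecord) -> float:
    """Sum the weights of the fields on which both records agree."""
    score = 0.0
    for field_name, weight in WEIGHTS.items():
        va, vb = getattr(a, field_name), getattr(b, field_name)
        if va is None or vb is None:
            continue  # missing data contributes nothing, so some duplicates go undetected
        if va.strip().lower() == vb.strip().lower():
            score += weight
    return score


def potential_duplicates(
    records: List[PersonRecord], cutoff: float
) -> List[Tuple[float, str, str]]:
    """Score every pair, rank by score, and keep pairs strictly above the cutoff."""
    scored = [
        (match_score(a, b), a.record_id, b.record_id)
        for a, b in combinations(records, 2)
    ]
    scored.sort(reverse=True)
    return [pair for pair in scored if pair[0] > cutoff]


records = [
    PersonRecord("r1", "Maria", "Lopez", "A", "1980-05-01", "5550100"),
    PersonRecord("r2", "maria", "Lopez", None, "1980-05-01", None),
    PersonRecord("r3", "Maria", "Chen", "B", "1975-11-30", "5550199"),
]
links = potential_duplicates(records, cutoff=8.0)
```

Raising `cutoff` trades missed duplicates for fewer false matches, which mirrors the choice to set cutoffs very high. Comparing every pair is quadratic in the number of records, so a sketch like this would only be practical on small inputs.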

The computer-matching algorithm identifies an association of one person to another, called a “link.” The Census Bureau is interested both in the individuals who are linked and in the housing units occupied by those individuals. Two linked people are considered to be a “person link.” The housing units involved in each person link are known as “housing unit links.” The census questionnaires that enumerate the linked people are known as “response links.”
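The relationship among the three kinds of links can be sketched in Python as follows. The `Enumeration` structure and its identifier fields are hypothetical, chosen only to show how housing unit links and response links follow from person links; the source does not describe the actual data layout.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Enumeration(NamedTuple):
    # Hypothetical: where and on which questionnaire a person was enumerated.
    person_id: str
    response_id: str
    housing_unit_id: str


Link = Tuple[str, str]


def derive_links(
    person_links: Iterable[Link], enumerations: List[Enumeration]
) -> Tuple[Set[Link], Set[Link]]:
    """Map each person link to the housing units and responses it involves."""
    by_person = {e.person_id: e for e in enumerations}
    housing_unit_links: Set[Link] = set()
    response_links: Set[Link] = set()
    for p, q in person_links:
        a, b = by_person[p], by_person[q]
        # Sort each pair so that (x, y) and (y, x) count as the same link.
        hu_a, hu_b = sorted((a.housing_unit_id, b.housing_unit_id))
        rs_a, rs_b = sorted((a.response_id, b.response_id))
        housing_unit_links.add((hu_a, hu_b))
        response_links.add((rs_a, rs_b))
    return housing_unit_links, response_links


enumerations = [
    Enumeration("p1", "resp-A", "hu-100"),
    Enumeration("p2", "resp-A", "hu-100"),
    Enumeration("p3", "resp-B", "hu-200"),
    Enumeration("p4", "resp-B", "hu-200"),
]
# Two person links: p1 with p3 and p2 with p4 (for example, two children listed at both homes).
hu_links, resp_links = derive_links([("p1", "p3"), ("p2", "p4")], enumerations)
```

In this example, two person links collapse into a single housing unit link and a single response link, because both linked pairs connect the same two households.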

The universe of all housing unit returns in the 2010 Census was matched against itself to identify people who may have been duplicated. Group Quarters returns were also included and compared to housing unit returns. For this research, however, census returns were included only if their data had been captured by the end of July 2010 and they were in scope for the Coverage Followup operation.

The purpose of this evaluation is:

  • To document the universe of duplication cases identified in the 2010 Census,
  • To document the results of duplication cases sent to the Coverage Followup operation,
  • To document the results of the experimental questions asked of a subset of duplicated persons at the end of the Coverage Followup interview, and
  • To convey the results of the cognitive and qualitative interviews conducted with duplication cases.

Page Last Revised - October 8, 2021