Estimating the Size of a Small Population

Wed Sep 28 2011
Robert Groves

Let me tell you a wonderful story, a statistical detective story of sorts.

During the summer, you may have seen statistics released from the 2010 Census Summary File 1 on same-sex couple unmarried partner households.

We noticed that the reported counts of same-sex couples from the 2010 Census were much higher than comparable estimates from the American Community Survey in earlier years. Our demographic analysts had some immediate ideas, explained nicely in this video:

So we suspected that the format of the nonresponse followup form was the culprit. If that were the case, one should see some obvious mismatches between the name of the person written on the form and the recorded sex of that person. Bingo! A qualitative inspection of some of the records showed suspicious combinations (e.g., “Harold” recorded as “female”). Past research led us to believe that the name entered was likely to be more accurate than the recorded sex.

How could the unintentional mistakes be fixed? We have an analysis of the full Census that lists the percentage male and female for all first names. Some names are common for both males and females (e.g., “Leslie,” “Dana,” “Alex”). Other names are very dominantly one sex or the other (e.g., “Mary,” “Thomas,” “Alicia”). Our analysts identified the names that were 95% or higher male and those 95% or higher female. Then we completely reanalyzed the entire 2010 Census. Whenever a record carried a name from one of the two lists together with the very unlikely sex for that name, we noted it as a likely error.
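To make the screening step concrete, here is a minimal sketch of the idea in Python. It is an illustration, not the Census Bureau's actual code: the function names, the "M"/"F" coding, and the record layout are all assumptions made for this example. Only the 95% threshold comes from the description above.

```python
from collections import Counter

THRESHOLD = 0.95  # the 95% cutoff described in the text


def build_name_lists(records, threshold=THRESHOLD):
    """Build the two name lists from (first_name, sex) records.

    Hypothetical helper: `records` is any iterable of (name, "M" or "F").
    Returns (male_names, female_names) as sets of lowercase names.
    """
    counts = {}
    for name, sex in records:
        counts.setdefault(name.lower(), Counter())[sex] += 1

    male_names, female_names = set(), set()
    for name, by_sex in counts.items():
        total = by_sex["M"] + by_sex["F"]
        if total == 0:
            continue
        if by_sex["M"] / total >= threshold:
            male_names.add(name)
        elif by_sex["F"] / total >= threshold:
            female_names.add(name)
    return male_names, female_names


def likely_sex_error(name, reported_sex, male_names, female_names):
    """Flag a record whose reported sex contradicts a strongly sexed name."""
    key = name.lower()
    return (key in male_names and reported_sex == "F") or (
        key in female_names and reported_sex == "M"
    )
```

Names that fall on neither list, such as a name split evenly between the sexes, are never flagged, which mirrors the point above that only the strongly one-sided names carry enough information to suggest an error.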

When we counted those apparent mistakes and reclassified each as a consistent name-sex pair, we found that the same-sex couple counts from the Census agree with other estimates. The best comparison is to the sample-based estimates of the American Community Survey, which moved to the improved question format in 2008. The chart below shows why we are confident that the “preferred estimates” are likely much better than the original counts.
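The reclassify-and-recount step can be sketched the same way. This is again a hedged illustration rather than the Bureau's procedure: the couple layout, the function names, and the two small name sets are hypothetical, and the sketch simply assumes that a strongly sexed name overrides the recorded sex.

```python
def corrected_sex(name, reported_sex, male_names, female_names):
    """Return the sex implied by a strongly sexed name, else the reported sex."""
    key = name.lower()
    if key in male_names:
        return "M"
    if key in female_names:
        return "F"
    return reported_sex


def count_same_sex_couples(couples, male_names=frozenset(), female_names=frozenset()):
    """Count couples whose two partners have the same (possibly corrected) sex.

    `couples` is an iterable of ((name, sex), (name, sex)) pairs.
    With empty name lists no correction is applied, giving the original count.
    """
    total = 0
    for (name_a, sex_a), (name_b, sex_b) in couples:
        a = corrected_sex(name_a, sex_a, male_names, female_names)
        b = corrected_sex(name_b, sex_b, male_names, female_names)
        if a == b:
            total += 1
    return total


# Made-up toy data, for illustration only.
toy_couples = [
    (("Harold", "F"), ("Mary", "F")),    # likely miscoded: "Harold" recorded as female
    (("Thomas", "M"), ("James", "M")),   # consistent name-sex pairs
    (("Leslie", "F"), ("Alicia", "F")),  # "Leslie" is on neither list, so it is left alone
]
toy_male = {"harold", "thomas", "james"}
toy_female = {"mary", "alicia"}

original_count = count_same_sex_couples(toy_couples)
preferred_count = count_same_sex_couples(toy_couples, toy_male, toy_female)
```

In this toy data the only couple that drops out of the same-sex count is the one containing the suspicious “Harold” record. That is the mechanism by which a small rate of miscoding among the very large number of opposite-sex couples can noticeably inflate the count of a small population.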

The chart above shows a large decrease in the number of same-sex couples when we changed the format of the American Community Survey in the 2007-2008 time period. We have evidence that the lower estimates are more accurate.

Similarly, we are confident that the “Preferred estimates” at the rightmost bar of the chart are more accurate than the “original counts” from the 2010 Census. The logic of our analysis and repair procedure on the 2010 coding is compelling, and the closer agreement with the just-released 2010 American Community Survey results strengthens our confidence.

This is the technical expertise of the Census Bureau at its finest – examining statistics for anomalies, detecting the cause of a found anomaly, and fixing mistakes from data collection when possible to give the country the best statistics possible.
