Quality in a Census Part 5

Comparing the census counts to alternatives is one way to get a sense of how good the census is; if all alternative ways of estimating the size of the population yield about the same conclusions, we feel good about the results. The last post on quality described “demographic analysis” as an alternative way of measuring the population of the United States.

This post describes the post-enumeration sample survey approach to measuring the population. (This decade’s version of the post-enumeration survey is called “Census Coverage Measurement.”) Post-enumeration sample surveys have been part of US censuses for some time. For 2010, the survey will be used to estimate the number of persons missed by the census as well as the number erroneously enumerated (e.g., duplicates and visitors from other countries).

A post-enumeration survey draws two samples: one from the full population, selected in complete ignorance of whether the sampled cases were covered in the census, and another from the census address universe. After the survey is done, each address, and each person enumerated at that address, is carefully matched to the census file to determine which cases were captured by both the census and the survey and which were captured by only one of them. From this matching operation, the misses and erroneous enumerations are estimated.
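The matching step feeds what is usually called dual-system (capture-recapture) estimation. The sketch below uses the simple textbook form with hypothetical counts for one area: correct census enumerations (census records minus erroneous ones) times survey persons, divided by the persons found in both. It is an illustration under those assumptions, not the Census Bureau’s production estimator, which works from samples, weights, and models.

```python
def dual_system_estimate(census_records, erroneous, survey_persons, matched):
    """Textbook dual-system (capture-recapture) estimate of the true population.

    census_records: persons enumerated by the census in the area
    erroneous: census records judged erroneous (duplicates, people who should not be counted)
    survey_persons: persons found by the independent survey
    matched: survey persons who match a correct census record
    """
    correct = census_records - erroneous
    # matched / survey_persons estimates the share of the true population the
    # census caught; dividing correct enumerations by that share scales them up.
    return correct * survey_persons / matched

# Hypothetical counts for a single area.
census_records, erroneous, survey_persons, matched = 1000, 20, 900, 840

n_hat = dual_system_estimate(census_records, erroneous, survey_persons, matched)
omissions = n_hat - (census_records - erroneous)  # people the census missed
net_undercount = n_hat - census_records           # misses minus erroneous enumerations
print(round(n_hat), round(omissions), round(net_undercount))  # 1050 70 50
```

In this made-up area the census missed an estimated 70 people and counted 20 in error, so the net undercount is only 50. The two kinds of error partly offset each other, which is why the survey has to estimate both.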

As with demographic analysis, a post-enumeration sample survey in its ideal form would offer completely accurate estimates of differential undercount, the tendency for some populations to be covered by the census less well than others. However, the ideal post-enumeration survey is never achieved.

In an ideal post-enumeration sample survey,

a) The likelihood that a sample person is measured in the survey is completely independent of the likelihood that the person is measured in the census. (More loosely stated, those who were covered by the census and those not covered by the census have the same probability of being measured in the survey.)

Problem: This independence is not fully achievable.

Fix: The field staff working on the post-enumeration survey are different from those working on the census, and they use different materials. Overlap between the operations is kept to a minimum. This keeps the operations independent, but if people who are reluctant to respond to the census are also reluctant to respond to the survey, the problem remains.
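The remaining problem can be illustrated with an expected-value calculation on hypothetical numbers. It uses the textbook dual-system estimate (census count times survey count, divided by matches) and, for simplicity, ignores erroneous enumerations. When people the census missed are also less likely to be found by the survey, the estimate falls short of the true population; this shortfall is often called correlation bias.

```python
def expected_dual_system(n, p_census, p_survey_if_counted, p_survey_if_missed):
    """Expected textbook dual-system estimate for a true population of n people.

    The survey capture probability may differ for people the census counted
    versus people it missed; equal values mean the two operations are independent.
    """
    counted = n * p_census
    missed = n - counted
    matched = counted * p_survey_if_counted         # found by both census and survey
    survey = matched + missed * p_survey_if_missed  # everyone the survey finds
    return counted * survey / matched

# Hypothetical probabilities for a true population of 1,000.
independent = expected_dual_system(1000, 0.90, 0.95, 0.95)
dependent = expected_dual_system(1000, 0.90, 0.95, 0.70)
print(round(independent, 1), round(dependent, 1))  # 1000.0 973.7
```

With independence the estimate recovers the true 1,000; when census misses are harder for the survey to reach, it drops to about 974, even though both operations were run separately.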

b) Every person has the same probability of being captured in the census, and every person has the same probability of being measured in the survey.

Problem: The assumption is violated in both the census and the survey; the probabilities of capture vary widely among people with different lifestyles (e.g., very mobile young singles who live by themselves vs. nuclear families).

Fix: Group people with similar characteristics who share similar probabilities of being captured in the census; use statistical models to reduce the effect of violating this assumption.
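Grouping helps because the dual-system estimate can be computed separately within each group (often called a post-stratum) and the results summed. The sketch below uses hypothetical expected counts for two groups whose census and survey coverage both differ; pooling everyone into one group understates the total, while estimating group by group recovers it. The group labels and numbers are illustrative assumptions, and erroneous enumerations are left out for simplicity.

```python
# Hypothetical groups: (census count, survey count, matched count).
# True sizes are 1,000 and 200; coverage is lower for mobile singles in both operations.
GROUPS = {
    "nuclear families": (980, 950, 931),  # 98% census, 95% survey coverage
    "mobile singles": (160, 140, 112),    # 80% census, 70% survey coverage
}

def dual_system(census, survey, matched):
    """Textbook dual-system estimate for one group."""
    return census * survey / matched

# Estimate within each group, then add the group estimates.
stratified = sum(dual_system(*counts) for counts in GROUPS.values())

# Pool everyone into a single group and estimate once.
pooled = dual_system(*(sum(col) for col in zip(*GROUPS.values())))

print(round(stratified, 1), round(pooled, 1))  # 1200.0 1191.4
```

The pooled figure misses about nine people because the group the census covers poorly is also the group the survey covers poorly; separating the groups removes that distortion.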

c) The respondent to the survey correctly reports his or her April 1, 2010, residence and household composition.

Problem: The survey interviewing begins in mid-August 2010; some persons, especially people who have moved, may have difficulty recalling their April 1 residence status.

Fix: Additional fieldwork and statistical models can be used to mitigate the effect of incomplete information and reporting errors. Further, an auxiliary study has been mounted to estimate what difficulties persons have in accurately reporting their April 1 residence.

d) The survey operation collects all of the requested information.

Problem: Not all persons in the sample survey will be contacted or agree to participate; those who do not may differ systematically in where they live or in their personal attributes.

Fix: We try to contact proxies, neighbors or others who know the nonrespondent cases. Statistical models will also be used in an attempt to remove the nonresponse errors from the sample survey.
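One common statistical approach to nonresponse is a weighting-class adjustment: within each class of similar households, respondents’ weights are inflated so that they also represent the nonrespondents in that class. This is offered only as a hypothetical sketch, not as the model the Bureau will use; the classes, counts, and base weight are assumptions.

```python
# Hypothetical weighting classes: (households sampled, households responding).
CLASSES = {"urban": (400, 320), "rural": (100, 90)}
BASE_WEIGHT = 50.0  # assumed: each sampled household represents 50 households

def adjusted_weights(classes, base_weight):
    """Weighting-class nonresponse adjustment, assuming equal base weights."""
    return {
        name: base_weight * sampled / responded
        for name, (sampled, responded) in classes.items()
    }

weights = adjusted_weights(CLASSES, BASE_WEIGHT)
# Respondents now carry the weight of everyone sampled in their class.
total = sum(weights[name] * responded for name, (_, responded) in CLASSES.items())
print(weights["urban"], round(weights["rural"], 2), round(total))  # 62.5 55.56 25000
```

The adjustment restores the full weighted total of 500 sampled households × 50 = 25,000, but it removes bias only to the extent that respondents and nonrespondents within a class are alike.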

Finally, an inherent weakness of a post-enumeration survey is that it is based on just a sample of the population, not the total population, so all of its estimates are subject to instability due to sampling variability. That variability, however, can be measured; it is what surveys routinely report as “margins of error” or sampling errors.
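One way to measure that variability is a bootstrap: resample the sampled blocks with replacement many times, recompute the estimate each time, and use the spread of the results. The block counts, the number of replicates, and the pooled dual-system estimate below are hypothetical simplifications; a production design would also account for sampling weights and stratification.

```python
import random
import statistics

# Hypothetical sampled blocks: (correct census enumerations, survey persons, matched persons).
BLOCKS = [
    (48, 50, 46), (31, 30, 28), (55, 60, 52), (20, 22, 18),
    (40, 41, 38), (66, 70, 61), (25, 24, 22), (37, 40, 34),
]

def dual_system(blocks):
    """Pooled textbook dual-system estimate for the persons in the given blocks."""
    correct = sum(b[0] for b in blocks)
    survey = sum(b[1] for b in blocks)
    matched = sum(b[2] for b in blocks)
    return correct * survey / matched

def bootstrap_moe(blocks, reps=2000, z=1.96, seed=2010):
    """Approximate 95% margin of error by resampling whole blocks with replacement."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    estimates = [dual_system(rng.choices(blocks, k=len(blocks))) for _ in range(reps)]
    return z * statistics.stdev(estimates)

estimate = dual_system(BLOCKS)
moe = bootstrap_moe(BLOCKS)
print(f"estimate {estimate:.1f} +/- {moe:.1f}")
```

Resampling whole blocks rather than individual people mirrors the idea that a post-enumeration survey samples areas, so people in the same block rise and fall together in each replicate.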

As this admittedly high-level description itself demonstrates, the post-enumeration survey is statistically complex, and the technical expertise required to construct and evaluate its estimates is considerable. Whenever possible, violations of these assumptions in the undercount estimates will themselves be investigated, but no one involved believes that perfection will be attained. Instead, as with demographic analysis, we will openly document the strengths and weaknesses of the estimates from the post-enumeration survey, so that all can form their own judgments about how useful the estimates are for evaluating the census.

We will not have the statistical results of the post-enumeration survey until 2012, so we have to wait awhile to compare this alternative way of measuring the population to the 2010 census. Stay tuned!