The U.S. Census Bureau depends on the Master Address File (MAF) to prepare address lists for the decennial census and household surveys. Accuracy of the MAF is critical to these operations. The Census Bureau has considered statistical models to help characterize and predict errors on the MAF. This work follows Young, Raim, & Johnson (Accepted, 2015) and further investigates zero-inflated negative binomial regression to model adds from the 2010 Address Canvassing operation. We consider several supplemental data sources, including the Planning Database, the Longitudinal Employer-Household Dynamics data, and land use data, in addition to the database with outcomes from the operation. Collection of the 2010 Address Canvassing data was subject to a variety of influences not captured in the data. These influences include variations in field representative behavior, in-office post-processing of field data, and other operational details not available at the time of data analysis. Therefore, it is not obvious which predictors explain outcomes from the operation, and variable selection is especially critical for this analysis. We carry out an exhaustive variable selection, consisting of forward and backward selection steps, and compare candidate models by several likelihood-based and prediction-based criteria. This method allows us to consider two-way interactions and to rank predictors by their contribution to the model. Our initial results find predictors based on missing delivery point type, historical coverage on the Delivery Sequence File, and IRS 1040 forms with no block ID or no MAFID to be among the most useful. The model obtained from the variable selection is shown to fit a majority of the blocks well, but the relatively small proportion of blocks that do not fit well tend to be those with the most observed adds. Therefore, future research is needed to identify other useful predictors or to permit more heterogeneity within the model.
We stress that we are not making recommendations for future Census Bureau operations; our purpose is to obtain a plausible statistical model for MAF coverage error based on the 2010 Address Canvassing outcomes.
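For readers unfamiliar with the zero-inflated negative binomial (ZINB) model named in the abstract, the following is a minimal sketch of its probability mass function and log-likelihood. The abstract does not state the parameterization used in the paper, so the mean-dispersion form of the negative binomial, the log link for the count mean, and the logit link for the zero-inflation probability are all assumptions, as are the function and argument names (`nb_pmf`, `zinb_pmf`, `zinb_loglik`, `size`, `beta`, `gamma`).

```python
import math


def nb_pmf(y, mu, size):
    """Negative binomial pmf with mean mu > 0 and dispersion size > 0.

    Assumed parameterization: variance = mu + mu**2 / size.
    """
    log_p = (
        math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
        + size * math.log(size / (size + mu))
        + y * math.log(mu / (size + mu))
    )
    return math.exp(log_p)


def zinb_pmf(y, mu, size, pi):
    """ZINB pmf: with probability pi the count is a structural zero,
    otherwise it is drawn from the negative binomial."""
    base = (1.0 - pi) * nb_pmf(y, mu, size)
    return pi + base if y == 0 else base


def zinb_loglik(counts, X, Z, beta, gamma, size):
    """Log-likelihood of block-level counts under a ZINB regression.

    X and Z are lists of covariate rows for the count and zero-inflation
    parts; beta and gamma are the matching coefficient lists.
    """
    total = 0.0
    for y, x, z in zip(counts, X, Z):
        # Log link (assumed): mean of the count component.
        mu = math.exp(sum(b * v for b, v in zip(beta, x)))
        # Logit link (assumed): probability of a structural zero.
        eta = sum(g * v for g, v in zip(gamma, z))
        pi = 1.0 / (1.0 + math.exp(-eta))
        total += math.log(zinb_pmf(y, mu, size, pi))
    return total
```

In a variable selection such as the one described above, a log-likelihood of this kind would be maximized for each candidate set of predictors, and the fitted candidates would then be compared by a likelihood-based criterion. In practice a fitted model would normally come from an established statistics package rather than hand-written code like this.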
Related Information
WORKING PAPER
Statistical Research Reports and Studies