Selection of Predictors to Model Coverage Errors in the Master Address File

Written by:
RRS2015-04

Abstract

The U.S. Census Bureau depends on the Master Address File (MAF) to prepare address lists for the decennial census and household surveys, so the accuracy of the MAF is critical to these operations. The Census Bureau has considered statistical models to help characterize and predict errors on the MAF. This work follows Young, Raim, & Johnson (Accepted, 2015) and further investigates zero-inflated negative binomial regression to model adds from the 2010 Address Canvassing operation. In addition to the database of outcomes from the operation, we consider several supplemental data sources, including the Planning Database, the Longitudinal Employer-Household Dynamics data, and land use data. Collection of the 2010 Address Canvassing data was subject to a variety of influences not captured in the data, including variations in field representative behavior, in-office post-processing of field data, and other operational details not available at the time of data analysis. Therefore, it is not obvious which predictors explain outcomes from the operation, and variable selection is especially critical for this analysis. We carry out an exhaustive variable selection, consisting of forward and backward selection steps, and compare candidate models by several likelihood-based and prediction-based criteria. This method allows us to consider two-way interactions and to rank predictors by their contribution to the model. Our initial results find that predictors based on missing delivery point type, historical coverage on the Delivery Sequence File, and IRS 1040 forms with no block ID or no MAFID are among the most useful. The model obtained from the variable selection fits a majority of the blocks well, but the relatively small proportion of blocks that it does not fit well tend to be those with the most observed adds. Therefore, future research is needed to identify other useful predictors or to permit more heterogeneity within the model.
We stress that we are not making recommendations for future Census Bureau operations; our purpose is to obtain a plausible statistical model for MAF coverage error based on the 2010 Address Canvassing outcomes.
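
For readers unfamiliar with the zero-inflated negative binomial (ZINB) distribution named above, the following is a minimal sketch of its probability mass function and log-likelihood. It assumes a common mean/dispersion parameterization that the abstract does not itself specify. The function names and the parameters `mu`, `size`, and `pi_zero` are illustrative only and are not taken from the report. In a regression setting, `mu` and `pi_zero` would typically depend on block-level predictors through link functions, which this sketch omits.

```python
import math


def nb_pmf(y, mu, size):
    """Negative binomial pmf with mean mu > 0 and dispersion size > 0.

    Assumed parameterization: variance = mu + mu**2 / size.
    """
    log_p = (
        math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
        + size * math.log(size / (size + mu))
        + y * math.log(mu / (size + mu))
    )
    return math.exp(log_p)


def zinb_pmf(y, mu, size, pi_zero):
    """ZINB pmf: a point mass at zero (probability pi_zero) mixed with a
    negative binomial count (probability 1 - pi_zero)."""
    base = (1.0 - pi_zero) * nb_pmf(y, mu, size)
    return pi_zero + base if y == 0 else base


def zinb_loglik(counts, mu, size, pi_zero):
    """Log-likelihood of independent counts sharing one set of parameters."""
    return sum(math.log(zinb_pmf(y, mu, size, pi_zero)) for y in counts)


# Hypothetical parameter values, chosen only for illustration.
example_counts = [0, 0, 0, 1, 0, 3, 0, 0, 7, 0]
example_loglik = zinb_loglik(example_counts, mu=2.0, size=1.5, pi_zero=0.3)
```

The extra mass at zero lets the model accommodate many blocks with no adds, while the negative binomial component allows overdispersed counts in the remaining blocks.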

Page Last Revised - October 28, 2021