2006 - 2009 County-Level Estimation Details

For an overview of the changes in methodology for the 2006 - 2009 estimates relative to the methodology for the 2005 estimates, see Estimation Procedure Changes.

Methodology

We estimate a regression model that predicts the number of people in poverty using single-year county-level observations from the American Community Survey (ACS) as the dependent variable, and administrative records and census data as the predictors. A single year of ACS sample is used for every county, even those below 65,000 population for which the ACS data is unpublished. Although we use only the counties with nonzero reported poverty in the ACS to estimate the equation, we make regression "predictions" for all 3,142 county-level entities in the SAIPE universe (which excludes Kalawao, HI).
The official, published direct ACS county estimates are single-year estimates only for sufficiently large counties (greater than 65,000 people); three-year or five-year accumulations of ACS data will be used in constructing estimates for smaller counties. Since modeling produces estimates with reduced sampling error, we feel we can use single-year ACS estimates for all counties in our models. We also feel it is important to do so, since primary uses of the SAIPE estimates (e.g., their use in allocation of federal funds) effectively involve comparing poverty estimates across places. For such uses, having all the estimates on a common basis is important; if we wanted to use multi-year ACS estimates for small counties, we should probably also use them for the large counties.
The model is multiplicative; that is, we model the number of people in poverty as the product of a series of predictors which are numbers (not rates) and have unknown errors. When estimating the coefficients in the model, we take logarithms of the dependent and all independent variables. While we may omit reference to logs in the description, all variables in the county regression models for numbers of people in poverty are logarithmic.
The ACS estimates for different counties differ in reliability because the ACS sample size varies from county to county. Our estimates take this factor into account.
To use the information contained in the direct survey estimates for the counties in the ACS with nonzero reported poverty, we combine the regression predictions with these direct estimates using Empirical Bayes (or "shrinkage") techniques. The Empirical Bayes techniques weight the contribution of the two components (regression and direct estimates) based on their relative precision.
We control the estimates for the counties of a given state to sum to the independently derived state estimate (which in turn has been controlled to sum to the ACS national estimate).
We provide a confidence interval, which represents uncertainty from both sampling and from modeling, for each estimate.

Estimation of the Model Equation

ACS sampling variances are not constant over all counties. We avoid giving observations with larger variances (a great deal of uncertainty) the same influence on the regression as observations with smaller variances (less uncertainty) by, in effect, weighting each observation by the inverse of its uncertainty. Representing this uncertainty requires recognizing that it arises from two sources:

uncertainty about where the estimates lie relative to the true values for each county (sampling error), and
uncertainty about where the true county values lie with respect to the regression surface (lack of fit).

To estimate the lack-of-fit component, we estimate the residual variance by a maximum likelihood procedure. Next, again using a maximum likelihood procedure, we estimate the ACS regression parameters, with the variance components serving as observational weights.
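
The two-step idea described above can be illustrated with a minimal sketch. It is not the production procedure: it assumes a single predictor plus an intercept, treats the sampling variances as known, and searches a small, hypothetical grid of candidate lack-of-fit variances instead of running a full maximum likelihood optimizer. All function and variable names are illustrative.

```python
import math

def wls_line(x, y, weights):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(weights)
    xbar = sum(w * xi for w, xi in zip(weights, x)) / sw
    ybar = sum(w * yi for w, yi in zip(weights, y)) / sw
    sxx = sum(w * (xi - xbar) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - xbar) * (yi - ybar)
              for w, xi, yi in zip(weights, x, y))
    b = sxy / sxx
    return ybar - b * xbar, b

def log_likelihood(x, y, samp_var, fit_var):
    """Normal log-likelihood for one candidate lack-of-fit variance.

    Each observation's total variance is its sampling variance plus the
    common lack-of-fit variance; the regression weights are the inverses.
    """
    total = [v + fit_var for v in samp_var]
    a, b = wls_line(x, y, [1.0 / t for t in total])
    ll = 0.0
    for xi, yi, t in zip(x, y, total):
        resid = yi - (a + b * xi)
        ll -= 0.5 * (math.log(2.0 * math.pi * t) + resid ** 2 / t)
    return ll, a, b

def fit_model(x, y, samp_var, grid):
    """Pick the grid value with the highest likelihood, then refit."""
    best = max(grid, key=lambda fv: log_likelihood(x, y, samp_var, fv)[0])
    _, a, b = log_likelihood(x, y, samp_var, best)
    return best, a, b

# Hypothetical log-scale data for five counties.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.8, 7.2, 9.1, 10.9]
samp_var = [0.10, 0.20, 0.10, 0.30, 0.20]
fit_var, intercept, slope = fit_model(x, y, samp_var, [0.0, 0.01, 0.05, 0.1, 0.5])
```

Counties with large sampling variances receive small weights in `wls_line`, which is the "inverse of its uncertainty" weighting the text describes.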

Combining Model and Direct Survey Estimates

Final estimates are weighted averages of direct ACS estimates, where they exist, and the model predictions. The two weights for each county add to 1.0, and we compute the weight on the model prediction as the sampling variance divided by the total variance (sampling plus lack-of-fit) of the direct estimate. With this technique, the larger the sampling variance of the direct estimate, the smaller its contribution and the larger the contribution from the prediction model. These weights are commonly referred to as "shrinkage weights" and the final estimates as "shrinkage" or "Empirical Bayes" estimates. For counties that have zero poor children in sample, the weight on the model's predictions is 1.0 and the weight on the direct survey estimate is zero.
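
As a minimal sketch of the weighting rule just described, the function below combines one county's direct estimate and model prediction on the log scale. The function name, the argument names, and the use of `None` to mark a county without a usable direct estimate are assumptions made for illustration.

```python
def shrinkage_estimate(direct, prediction, sampling_var, lack_of_fit_var):
    """Return (combined estimate, weight on the model prediction).

    The weight on the prediction is the sampling variance divided by the
    total variance (sampling plus lack-of-fit) of the direct estimate.
    """
    if direct is None:
        # No usable direct estimate: rely entirely on the model.
        return prediction, 1.0
    w_model = sampling_var / (sampling_var + lack_of_fit_var)
    combined = w_model * prediction + (1.0 - w_model) * direct
    return combined, w_model

# Hypothetical values: a noisy direct estimate is pulled toward the model.
estimate, weight = shrinkage_estimate(
    direct=10.0, prediction=8.0, sampling_var=3.0, lack_of_fit_var=1.0
)
```

With these inputs the weight on the prediction is 0.75, so the combined value sits three quarters of the way from the direct estimate toward the model prediction.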

Controlling to State Estimates

The last steps in the production process are transforming the county estimates from the log scale to estimates of numbers and controlling them to the independently derived state estimates. We make a simple ratio adjustment to the county-level estimates to ensure that they sum to the state totals. We control model-based estimates at the state level to the national level direct estimates derived from the ACS. We adjust the estimated standard errors of the county estimates to reflect this additional level of control. We do not control estimates of county median household income to the state medians. This would require that the estimation model produce the entire household income distribution, rather than just the median as it does now.
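
The simple ratio adjustment can be sketched as follows, assuming the county estimates have already been transformed back from the log scale to numbers of people. The function name and the county labels are hypothetical.

```python
def control_to_state(county_estimates, state_total):
    """Scale county estimates by one common factor so they sum to the state total."""
    factor = state_total / sum(county_estimates.values())
    return {county: value * factor for county, value in county_estimates.items()}

# Hypothetical state with two counties and a state estimate of 500.
controlled = control_to_state({"County A": 100.0, "County B": 300.0}, 500.0)
```

Because every county in the state is multiplied by the same factor, each county's share of the state total is unchanged by the adjustment.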

The estimates for the number of school-aged children in poverty are handled slightly differently. The Department of Education, a major sponsor of the SAIPE program, requires that the estimated numbers of school-aged children in poverty be integers. We use an algorithm to round the counties' estimates so that the estimates of school-aged children in poverty for the counties sum to the estimate for the state. Note that this algorithm is first applied to the states' estimates, so they are integers and add to the integer-valued national estimate.
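
The text does not say which rounding algorithm is used. One common way to round a set of values to integers while preserving an integer total is the largest-remainder method, sketched here purely as an illustration; it should not be read as the algorithm SAIPE actually applies.

```python
import math

def round_to_total(values, total):
    """Round values to integers that sum exactly to `total`.

    Largest-remainder method: round everything down, then add 1 to the
    entries with the largest fractional parts until the total is reached.
    """
    floors = [math.floor(v) for v in values]
    shortfall = total - sum(floors)
    if not 0 <= shortfall <= len(values):
        raise ValueError("total is not reachable by rounding each value up or down")
    # Indices ordered from largest to smallest fractional part.
    order = sorted(range(len(values)),
                   key=lambda i: values[i] - floors[i],
                   reverse=True)
    result = list(floors)
    for i in order[:shortfall]:
        result[i] += 1
    return result

# Hypothetical county estimates that must sum to an integer state estimate of 61.
rounded = round_to_total([10.4, 20.3, 30.3], 61)
```

Applying the same function first to state estimates against the national total, and then to the counties within each state, mirrors the order of operations described above.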

Standard Errors and Confidence Intervals

One goal of our small area estimation work is to provide estimates of the uncertainty surrounding the estimates of the numbers of people in poverty. The model-based estimates shown in the tables are accompanied by their 90-percent confidence intervals. These intervals were constructed from estimated standard errors. For the model-based estimates, the standard error depends mainly on the uncertainty about the model and the ACS sampling variance. While the variance of the shrinkage weights could also be a significant component of uncertainty about our estimates (if sizeable and ignored, we would be underestimating the standard errors), our research indicates that its contribution is negligible.

Predictor and Dependent Variables

For 2009, the dependent variable is based on the 2009 ACS. The predictor variables described below use 2008 tax, SNAP benefits, and Bureau of Economic Analysis (BEA) data and 2009 population estimates. SNAP stands for Supplemental Nutrition Assistance Program, and is the new name for the federal Food Stamp Program, as of October 1, 2008.
For 2008, the dependent variable is based on the 2008 ACS. The predictor variables use 2007 tax, SNAP benefits, and BEA data and 2008 population estimates.
For 2007, the dependent variable is based on the 2007 ACS. The predictor variables use 2006 tax, SNAP benefits, and BEA data and 2007 population estimates.
For further information on these variables see Information About Data Inputs.

The Model for Total Number of People in Poverty

The model is multiplicative; that is, we model the number of people in poverty as the product of a series of predictors that are numbers (not rates), and we model the unknown errors. To estimate the coefficients in the model, we take logarithms of the dependent and all independent variables. Our choice of a multiplicative model is motivated, in part, by the fact that the distribution of the number in poverty has a huge range -- from zero in some counties to more than a million in the largest county (with a mean of 10,000), based on Census 2000 -- and the distribution is highly skewed. Taking the logarithm of all variables makes their distributions more centered and symmetrical and has the effect of diminishing the otherwise inordinate influence of large counties on the coefficient estimates. Another advantage of a multiplicative model is that it makes it plausible to maintain that the (unobserved) errors for every county, no matter how large or small, are drawn from the same distribution.

The predictor variables in the regression model used to estimate the total number of people in poverty are:

the log of the number of tax return exemptions (all ages) on returns whose adjusted gross income falls below the official poverty threshold for a family of the size implied by the number of exemptions on the form;
the log of the number of SNAP benefits recipients in July of the previous year;
the log of the estimated total resident population as of July 1;
the log of the total number of tax return exemptions; and
the log of the Census 2000 estimate of the total number of people in poverty.

The dependent variable is the log of the total number of people in poverty in each county as measured by the ACS. We combine the regression predictions, in the log scale, with the logs of the direct ACS sample estimates, and then transform the results into estimates of the numbers of people in poverty. Finally, we control the estimates to the independent estimates of state totals.
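
Written out, the multiplicative model and its log-linear form might look like the following. The notation is an assumption introduced here for illustration: for county i, Y_i is the true number of people in poverty, X_{i1}, ..., X_{i5} are the five predictors listed above (before taking logs), u_i is the model (lack-of-fit) error, y_i is the log of the direct ACS estimate, and e_i is its sampling error.

```latex
% Illustrative notation only; the symbols are not taken from the source.
Y_i = e^{\beta_0} \, X_{i1}^{\beta_1} X_{i2}^{\beta_2} X_{i3}^{\beta_3}
      X_{i4}^{\beta_4} X_{i5}^{\beta_5} \, e^{u_i}
\quad\Longleftrightarrow\quad
\log Y_i = \beta_0 + \sum_{k=1}^{5} \beta_k \log X_{ik} + u_i ,
\qquad
y_i = \log Y_i + e_i
```

Taking logarithms turns the product into a sum, which is why the coefficients can be estimated with a linear regression on the logged variables.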

The Model for the Number of Related Children Ages 5 to 17 in Families in Poverty

The estimation model for related children ages 5 to 17 in poverty parallels that for all people in poverty in structure. There are five predictor variables:

the log of the number of child exemptions claimed on tax returns whose adjusted gross income falls below the official poverty threshold for a family of the size implied by the number of exemptions on the form;
the log of the number of SNAP benefits recipients in July of the previous year;
the log of the estimated resident population under age 18 as of July 1;
the log of the total number of child exemptions indicated on tax returns; and
the log of the Census 2000 estimate of the number of related children in poverty ages 5 to 17.

The dependent variable is the log of the number of related children in poverty ages 5 to 17 in each county as measured by the ACS. We combine the regression predictions, in the log scale, with the logs of the direct ACS sample estimates, and then transform the results into estimates of the numbers in poverty. Finally, we control the estimates to the independent estimates of state totals.

The Model for the Number of People Under Age 18 in Poverty

The estimation model for people under age 18 in poverty is quite similar in structure to the models above. There are five predictor variables:

the log of the number of child exemptions indicated on tax returns whose adjusted gross income falls below the official poverty threshold for a family of the size implied by the number of exemptions on the form;
the log of the number of SNAP benefits recipients in July of the previous year;
the log of the estimated resident population under age 18 as of July 1;
the log of the total number of child exemptions indicated on tax returns; and
the log of the Census 2000 estimate of the number of people under age 18 in poverty.

The dependent variable is the log of the number of people in poverty under age 18 in each county as measured by the ACS. We combine the regression predictions, in the log scale, with the logs of the direct ACS sample estimates, and then transform the results into estimates of the numbers in poverty. Finally, we control the estimates to the independent estimates of state totals.

The Model for Median Household Income

Like the models for the number of people in poverty, the model for median household income is multiplicative. A consequence of the multiplicative form and the model performing well relative to the direct ACS estimates of median household income is that the standard errors of the estimates are proportional to the point estimates. In other words, the unobserved errors associated with high-income counties are larger than the unobserved errors in counties with high proportions of people in poverty. To estimate the model, we take logarithms of the dependent and all independent variables; i.e., the model is linear in logarithms. However, we report median household income in the linear scale and, as a result, the confidence intervals are asymmetric. The predictor variables are:

the log of the Census 2000 estimate of county median household income;
the log of the median adjusted gross income from tax returns;
the log of the proportion of the Bureau of Economic Analysis (BEA) estimate of total personal income derived from government transfers;
the log of the growth of BEA total personal income from 1999 through the target year; and
the log of the "nonfiler" rate.

We define the nonfiler rate as the ratio of estimated total population minus total exemptions claimed on IRS tax returns to estimated total population.
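
The definition translates directly into a short calculation. In this sketch the function names are hypothetical, and raising an error when the rate is not positive (so that its log is undefined) is an assumption; the text does not describe how such cases are handled.

```python
import math

def nonfiler_rate(total_population, total_exemptions):
    """(Population minus exemptions on IRS tax returns) divided by population."""
    return (total_population - total_exemptions) / total_population

def log_nonfiler_rate(total_population, total_exemptions):
    """Log of the nonfiler rate, the form in which it enters the model."""
    rate = nonfiler_rate(total_population, total_exemptions)
    if rate <= 0.0:
        # Assumption: flag counties where exemptions meet or exceed population.
        raise ValueError("nonfiler rate must be positive to take its log")
    return math.log(rate)

# Hypothetical county: 1,000 residents and 900 exemptions claimed.
rate = nonfiler_rate(1000, 900)
```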

The dependent variable is the log of county median household income as interpolated from ACS data.
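
To see why intervals reported in the linear scale come out asymmetric, consider a minimal sketch that assumes the interval is formed symmetrically around the estimate in the log scale and then exponentiated. That construction, the function name, and the input values are assumptions for illustration, not a description of the production code.

```python
import math
from statistics import NormalDist

def log_scale_interval(log_estimate, log_std_error, level=0.90):
    """Return (estimate, lower, upper) in the linear scale.

    The interval is symmetric in the log scale; exponentiating it makes
    the upper half wider than the lower half in the linear scale.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # two-sided critical value
    lower = math.exp(log_estimate - z * log_std_error)
    upper = math.exp(log_estimate + z * log_std_error)
    return math.exp(log_estimate), lower, upper

# Hypothetical county: median income of 50,000 with a log-scale standard error of 0.05.
estimate, lower, upper = log_scale_interval(math.log(50000.0), 0.05)
```

Here `upper - estimate` exceeds `estimate - lower`, and both distances scale with the estimate itself, which is consistent with standard errors being proportional to the point estimates.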