1999 State Estimation Details

The 1999 state and county estimates of poverty and income were released in October 2002. For an overview of the changes in methodology between this release and the previous release, see Estimation Procedure Changes.

Methodology

1999 Estimation Procedure Changes

The following items represent changes in the estimation procedures of the county and state income and poverty estimates from the 1998 estimates to 1999.

Several features of the 1999 state estimates should be noted.

SAIPE models combine estimates from the regression with direct estimates from the Current Population Survey (CPS) in a way that varies the importance given to the direct CPS estimates from state to state depending upon their reliability.
SAIPE multiplies model-based estimates of poverty ratios by demographic estimates of the population to provide estimates of the numbers of poor people.
SAIPE controls the state estimates of the number of poor people so that the total agrees with the direct CPS national estimates.
The CPS estimates used in the SAIPE state models for 1999 are "2000-based." This means that the CPS sampling weights are controlled to reproduce, for certain groups, updated population estimates that use Census 2000 results as a base.
SAIPE state models for 1999 use Census 2000 data in defining regression variables in the models in place of the 1990 census data that were used previously.
SAIPE state models for 1999 used Census 2000 estimates as regression variables, rather than the "census residuals" (residuals from fitting the analogous model to the census data) that were used in most cases in past years. (This is a change only for 1999; we expect to return to using census residuals in future years. See Estimation Procedure Changes for the 1999 Estimates for more discussion.)
Because the Department of Education requires estimates of the number of "related children age 5 to 17 in families in poverty," and not all children 5 to 17 are "related children," there are two sets of equations for children ages 5 to 17.
SAIPE estimates the total number of poor people as the sum of estimates derived from a set of four age-specific equations.

A brief discussion of these features follows. The models are then presented.

Bayesian Estimation Techniques

The models SAIPE used to estimate 1999 income and poverty at the state level employ both direct survey-based estimates of 1999 income and poverty from the March 2000 CPS and regression predictions of income and poverty based on administrative records and Census 2000 data. We combine the regression predictions with the direct sample estimates using Bayesian techniques. The Bayesian techniques weight the contribution of the two components (regression predictions and direct estimates) on the basis of their relative precision.

The regression model used to develop the regression predictions is postulated for the true, unobserved poverty ratios or median income, but it is fitted to the CPS direct estimates allowing for the sampling error in the data. If the variance of the error term in this regression model (the model error variance) were known, then the Bayesian estimate for each state would be a weighted average (shrinkage estimate) of the state's regression prediction and direct CPS estimate. The two weights in this average add to 1.0, with the weight on the direct estimate computed as the model error variance divided by the total variance (model error variance plus sampling error variance). In this average, the larger the sampling variance of a direct sample estimate, the smaller its contribution to the shrinkage estimate, and the larger the contribution from the regression prediction. Since the model error variance is unknown, the Bayesian approach averages the shrinkage estimates computed over a plausible range of values of the model error variance, weighting the results for each of these values according to the posterior (conditional on the data) probability distribution of the model error variance developed from the Bayesian calculations. The result is generally very close to what one gets by estimating the model error variance by the mean of its posterior distribution and computing the corresponding shrinkage estimate. Technical details of the Bayesian approach are discussed in the paper, "Accounting for Uncertainty About Variances In Small Area Estimation," (Bell 1999) in the Working Papers section of this web site.
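As an illustration only, the weighted average described above can be sketched in Python. The function names, the discrete grid of candidate model error variances, and all numbers are hypothetical; they are not taken from the SAIPE production system, which follows the approach in Bell (1999).

```python
def shrinkage_estimate(direct, regression, model_var, sampling_var):
    """Weighted average of a direct estimate and a regression prediction."""
    # Weight on the direct estimate: model error variance over total variance.
    w_direct = model_var / (model_var + sampling_var)
    # The two weights add to 1.0.
    return w_direct * direct + (1.0 - w_direct) * regression


def averaged_shrinkage(direct, regression, sampling_var, model_vars, posterior_probs):
    """Average shrinkage estimates over candidate model error variances.

    Assumption: the posterior of the model error variance is approximated
    by a discrete grid (model_vars) with weights (posterior_probs).
    """
    total = sum(posterior_probs)
    return sum(
        (p / total) * shrinkage_estimate(direct, regression, v, sampling_var)
        for v, p in zip(model_vars, posterior_probs)
    )


# Hypothetical state: direct CPS ratio 12.0, regression prediction 10.0.
precise = shrinkage_estimate(12.0, 10.0, model_var=1.0, sampling_var=1.0)
noisy = shrinkage_estimate(12.0, 10.0, model_var=1.0, sampling_var=9.0)

# Hypothetical three-point posterior for the model error variance.
averaged = averaged_shrinkage(
    12.0, 10.0, sampling_var=1.0,
    model_vars=[0.5, 1.0, 2.0], posterior_probs=[0.25, 0.5, 0.25],
)
print(precise, noisy, averaged)
```

With the larger sampling variance, the result moves toward the regression prediction, which is the behavior the text describes.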

Prior Distribution for Regression Parameters

Bayesian estimation requires that a "prior probability distribution" be specified for the model parameters to reflect what we know about them prior to fitting the model to the data. In past years we used a noninformative prior that conveyed no useful prior information: the prior distribution was simply a constant over all parameter values. For 1999, however, we specified an informative prior distribution for the regression coefficients of the administrative records predictors in the poverty ratio models. (We still used a noninformative, constant prior for the other model parameters.) We made this change because of the unique status of the census poverty ratio estimates when they refer to the same year as the CPS estimates being modeled rather than to an earlier year. Since the 2000 census and 2000 CPS both estimated poverty for 1999, albeit with differences in regard to sampling and nonsampling errors, there is reason to expect the administrative records regression variables in the CPS models to be much less relevant in 1999 than in other, non-census years. Empirical verification of this came from model fits to 1990 CPS poverty ratio estimates for income year (IY) 1989 using 1990 census data in the model: the fits showed the coefficients on the administrative records variables to be statistically insignificant for all age groups. To reflect our prior knowledge that the administrative records variables should be less relevant for 1999 than for non-census years, and that in fact for 1999 they may not be useful at all, we used a prior distribution for their regression coefficients that had mean zero and a variance determined in a manner we thought to be "conservative" in the sense of avoiding placing too much confidence in the prior.

Results from the poverty ratio model fits for IY 1989 were used in developing the prior variances of the administrative records variables' regression coefficients. In fact, we simply took the variance matrix of these coefficients from the fitted model and multiplied it by 4 to reflect additional uncertainty about how these results from 1989 would translate to 1999. Multiplying the variances by 4 doubled the standard deviations of these parameters relative to the 1989 model fits, so that the prior carries a relatively mild amount of information ("conservative assumptions").
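The relationship between the factor of 4 and the doubled standard deviations can be checked with a short Python sketch. The 2-by-2 variance matrix below is hypothetical and is not the matrix from the IY 1989 model fits; the helper names are also illustrative.

```python
import math


def scale_variance_matrix(cov, factor):
    """Multiply every entry of a variance matrix by a constant factor."""
    return [[factor * v for v in row] for row in cov]


def standard_deviations(cov):
    """Standard deviations are the square roots of the diagonal entries."""
    return [math.sqrt(cov[i][i]) for i in range(len(cov))]


# Hypothetical variance matrix for two administrative records coefficients.
fitted_cov = [[0.04, 0.01],
              [0.01, 0.09]]

prior_mean = [0.0, 0.0]                            # prior mean of zero
prior_cov = scale_variance_matrix(fitted_cov, 4)   # variances multiplied by 4

fitted_sd = standard_deviations(fitted_cov)
prior_sd = standard_deviations(prior_cov)
print(fitted_sd, prior_sd)
```

Because a standard deviation is the square root of a variance, scaling the variance matrix by 4 scales each standard deviation by 2.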

The use of the informative prior reduced the uncertainty about the corresponding regression coefficients in the 1999 poverty ratio models. It thus yielded small reductions in the state prediction error variances, and hence slightly narrower confidence intervals for the true poverty ratios. The prior had very minor effects on the point estimates.

We did not use an informative prior distribution for the coefficient of the administrative records variable (IRS median adjusted gross income) in the 1999 CPS median income model, because in the median income model we fit for IY 1989, we found the IRS variable to be statistically significant, even with the census data in the model. The same result also held in the 1999 median income model.

Poverty Ratios and Numbers of Poor People

Deriving state-level estimates of the numbers of poor people of various ages involves two steps. The first step is to apply the Bayesian estimation techniques to CPS direct state estimates of "poverty ratios." The second step is to multiply the resulting model-based poverty ratio estimates by the corresponding demographic population estimates to convert the results to estimates of the numbers of poor people of various ages.

The poverty ratios used as the dependent variables in the regression models have the CPS direct-estimated number of poor people of the given age in the numerator and the CPS direct-estimated noninstitutional population of the given age in the denominator. These "poverty ratios" differ from official poverty rates, which would use the CPS estimated poverty universe of the given age as the denominator. (For a discussion of the differences between the noninstitutional population and the poverty universe, see Denominators for Model-Based State and County Poverty Rates.)

We use CPS estimated numbers in both the numerator and denominator of the poverty ratios because positive correlation between the two estimates generally makes the resulting poverty ratio estimate more precise than one obtained with a CPS estimated numerator and a demographic population estimate in the denominator. We multiply the model-based poverty ratio estimates by demographic population estimates, however, because the demographic estimates are deemed more reliable than CPS direct population estimates, which contain substantial sampling error for most states. The CPS controls survey weights only to estimates of the population age 16 and over at the state level, and we are making estimates for more specific age groups.
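The two steps can be sketched in Python as follows. All figures are hypothetical, and the model-based ratio is simply supplied as a number here; in practice it would come from the Bayesian estimation described earlier.

```python
def direct_poverty_ratio(cps_poor, cps_noninstitutional_pop):
    """CPS direct poverty ratio: both numerator and denominator come from the CPS."""
    return cps_poor / cps_noninstitutional_pop


def number_poor(model_based_ratio, demographic_population):
    """Convert a model-based poverty ratio to a number of poor people."""
    return model_based_ratio * demographic_population


# Hypothetical state and age group.
direct_ratio = direct_poverty_ratio(cps_poor=150_000, cps_noninstitutional_pop=1_000_000)

# Hypothetical model-based ratio (stands in for the output of the Bayesian step).
model_ratio = 0.14

# The demographic population estimate, not the CPS population, is the multiplier.
estimated_poor = number_poor(model_ratio, demographic_population=1_050_000)
print(direct_ratio, estimated_poor)
```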

While we have multiplied model-based poverty ratio estimates by population estimates at the state level, we have not addressed the county-level estimation in the same way, because the estimates of the populations of counties by age are likely to be much less stable than the state population estimates, and little is known about their error structure. Thus, for counties, we directly model (logarithms of) CPS estimates of the number of poor people.

Controlling to the National Estimates

After converting the Bayesian estimates of poverty ratios to state estimates of numbers of poor, we control these estimates to the direct national estimate of number poor based on the CPS. We do not control estimates of state median household income to the national median because the estimation model does not produce the entire household income distribution, which would be required to do so.
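One simple way to make state estimates sum to a national total is a proportional (ratio) adjustment, sketched below in Python. This is an assumption made for illustration: the text does not state which adjustment SAIPE applies, and the state labels and numbers are hypothetical.

```python
def control_to_national(state_estimates, national_total):
    """Scale state estimates by one common factor so they sum to national_total.

    Assumption: a proportional adjustment; the source does not specify the method.
    """
    model_total = sum(state_estimates.values())
    factor = national_total / model_total
    return {state: value * factor for state, value in state_estimates.items()}


# Hypothetical model-based numbers of poor people for two states.
model_based = {"State A": 400_000, "State B": 600_000}

# Hypothetical direct national CPS estimate.
controlled = control_to_national(model_based, national_total=1_100_000)
print(controlled)
```

Under this kind of adjustment each state's share of the total is unchanged; only the overall level moves to match the national estimate.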

Using Estimates from Census 2000 in the Models

The Census 2000 estimates provide regression predictor variables for each of the age-specific poverty ratio models and the median income model. The specific variables are documented below. The models use the actual census estimates, rather than "census residuals" as was generally done in previous years. However, since 1999 is a census year, the models using census estimates as a regression variable are equivalent or nearly equivalent to models using census residuals for the purpose of developing model-based estimates of poverty ratios or median income. For further discussion of this point see Estimation Procedure Changes for the 1999 Estimates on this web site.

The Poverty Ratio Models

The model of 1999 state poverty ratios employs the following predictors:

an intercept term.
the 1999 "tax return poverty rate" for the age group. The numerator of this rate is defined as the number of exemptions entered on returns for which the adjusted gross income falls below the official poverty threshold for a family of the size implied by the number of exemptions on the return. For the age 5-17 and 65 and over poverty models, we use poor child exemptions and poor age exemptions, respectively, in the numerator. For the other age groups, we use poor exemptions of all persons under age 65. The denominator of this rate is the state population estimate from Census 2000, for the age group corresponding to that used in the numerator, except for 5-17 for which the denominator is the total state child exemptions.
the 1999 "nonfiler rate." For all models except the 65 and over model, this is defined as the difference between the estimated population under age 65 and the number of exemptions under age 65, expressed as a percentage of the population under age 65. For the 65 and over model, we use estimates of the population age 65 and over and the number of age exemptions.
the 1999 Supplemental Security Income (SSI) recipiency rate. This is defined as the 12-month average number of state SSI recipients age 65 years and over for 1999, divided by the state population of that age estimated from Census 2000. This variable is used only for the 65 and over model.
the Census 2000 poverty ratios for 1999 for the relevant age group. Note that this is the only predictor variable that refers specifically to the age groups being modeled.
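The administrative records predictors defined in the list above reduce to simple ratios. The Python sketch below shows the arithmetic with hypothetical counts; the function and argument names are illustrative, and whether each rate is stored as a fraction or a percentage (other than the nonfiler rate, which the text describes as a percentage) is an assumption.

```python
def tax_return_poverty_rate(poor_exemptions, denominator):
    """Poor exemptions divided by the matching population (or child exemptions for 5-17)."""
    return poor_exemptions / denominator


def nonfiler_rate(population, exemptions):
    """Population not represented on tax returns, as a percentage of the population."""
    return 100.0 * (population - exemptions) / population


def ssi_recipiency_rate(monthly_recipients_65_plus, population_65_plus):
    """12-month average number of SSI recipients age 65+ divided by the population 65+."""
    average_recipients = sum(monthly_recipients_65_plus) / len(monthly_recipients_65_plus)
    return average_recipients / population_65_plus


# Hypothetical state counts.
tax_rate = tax_return_poverty_rate(poor_exemptions=90_000, denominator=900_000)
nonfilers = nonfiler_rate(population=900_000, exemptions=810_000)
ssi_rate = ssi_recipiency_rate([20_000] * 12, population_65_plus=400_000)
print(tax_rate, nonfilers, ssi_rate)
```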

Note that the estimates for other years use July 1 demographic population estimates in the denominators. For the 1999 estimates we use Census 2000, which represents the April 1, 2000 population.

For further information on these variables, go to Information about Data Inputs.

The dependent variable is the 1999 state estimate of the ratio of the number of poor people in the relevant age group to the noninstitutional population of that age, with both the numerator and the denominator estimated from the March 2000 CPS.

Estimating the Total Number of Poor People

We derive the estimate of the total number of poor people in a state by summing the separate model-based estimates of the number of poor people by age (not limited to related children). The age groups with separate models were 1) people under 5 years of age, 2) people age 5 to 17 years, 3) people age 18 to 64 years, and 4) people age 65 years and over.
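As a small worked example, the state total is the sum over the four age groups. The counts below are hypothetical and the dictionary keys are illustrative labels only.

```python
# Hypothetical model-based numbers of poor people by age group for one state.
poor_by_age = {
    "under 5": 40_000,
    "5 to 17": 90_000,
    "18 to 64": 210_000,
    "65 and over": 35_000,
}

# The total for the state is the sum of the four age-specific estimates.
total_poor = sum(poor_by_age.values())
print(total_poor)
```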