The following is a text-only version of the paper "Methods Used for Small Area Poverty and Income Estimation" by Robin Fisher. All mathematical equations are presented in two forms in the paper: a text-symbolic form and an english-only form. METHODS USED FOR SMALL AREA POVERTY AND INCOME ESTIMATION Robin Fisher, US Bureau of the Census US Bureau of the Census, Washington, DC 20233 Key Words: Poverty, Small Areas, CPS Abstract Existing postcensal estimates of poverty and income at the county level are considered inadequate for various reasons: The Census is rapidly dated and the March CPS is not sufficiently reliable, especially for those counties which are not sampled by CPS. The goal of The Small Area Income and Poverty Estimates (SAIPE) project is to form these estimates. We modeled the number of poor in various age categories and median household income as a function of various variables taken from administrative records. We recognize two sources of "error" -- sampling error and model error -- and apply a shrinkage estimator to obtain estimates of number of poor or median income by county. Finally, a ratio adjustment is used to make estimates consistent with the SAIPE state estimates. We describe the methods used to obtain these estimates and their standard errors and present some empirical evaluations of the models. 0. Introduction Existing postcensal estimates of poverty and income at the county level are considered inadequate for various reasons, including the fact that the Census is rapidly dated and the March CPS is not sufficiently reliable, especially for those counties which are not sampled by CPS. The goal of The Small Area Income and Poverty Estimates (SAIPE) project is to form these estimates. We modeled the number of poor in various age categories and median household income as a function of various variables taken from administrative records. Of particular interest to this paper are number of poor, especially those aged 5 to 17 years. We recognize two sources of "error", sampling error and model error, and apply a shrinkage estimator to obtain estimates of number of poor or income by county. To form the necessary estimates of the variance components, we use a Best Linear Unbiased predictor, estimated with a modification of the MINQUE(0) estimator. Finally, a ratio adjustment is used to make estimates consistent with the SAIPE state estimates, derived independently. We describe the methods used to obtain these estimates and their standard errors and present some empirical evaluations of the models. The model was used to "predict" numbers of poor in 1990. We then compared the results to those from the 1990 Census to evaluate our model. The results of the comparisons are presented. Section 1 describes some aspects of the data. Section 2 describes the small area estimation model we used. Section 3 describes the methods we used to form the county level estimates. Section 4 discusses the estimation of the variance components, section 5 describes the calculation of standard errors and confidence intervals, section 6 the raking and the necessary adjustment to the variance, and section 7 describes the model evaluations. We conclude in section 8. 1. Data We form estimates of poverty for every county in the US. The approach is to use CPS estimates of the number of poor at the county level as the response variable in a regression equation with administrative records data as predictors. The CPS has sample in only about a third of the counties in the US. We form parameter estimates on the basis of those counties, then apply the estimated model to the remaining counties. Once we have the regression predictions, we form the Empirical Bayes (EB) estimator by taking a weighted average of the CPS direct estimate and the regression prediction. This way we can make use of the information in the individual CPS county estimates. The data we get directly from the CPS need some modification. First, the primary sampling units (PSUs) in the CPS design include collections of counties or minor civil divisions; these PSUs are chosen from one or more in the strata. The weights for observations in CPS include a factor for the inverse of the probability of selection of the PSU. When we form county-level aggregates, then, we need to multiply by the probability of selection of the appropriate PSU so these aggregates are approximately unbiased at the county level. For details of the CPS design, see (Bureau of the Census, 1996). In an effort to reduce the variance of the response in our regression model, we took a three-year weighted average of observations in the county, weighted by the number of housing units with children 5 to 17 years old in the poverty universe. This also had the effect of increasing the number of counties with any sample cases at all. Specifically, if C sub i j is the count of interviewed housing units in which at least one person age 5-17 is found in county i in year j, U sub i j is the estimated number of related persons age 5-17 in the poverty universe in county i in year j, obtained using the CPS sampling weights adjusted to represent counties, and D sub i j is the similarly estimated number of related persons age 5-17 in families in poverty in county i in year j, then the value of the poverty rate with which county i is characterized in the regressions is Text-Symbolic: S_i = (sum_{j}C_{ij}*(D_{ij}/U_{ij}))/(sum_{j}C_{ij}) English-Only: s sub i equals the sum over j of c sub i j times d sub i j divided by u sub i j all divided by the sum over j of c sub i j. The number of persons in the poverty universe in county i is Text-Symbolic: T_{i} = (sum_{j}C_{ij}*U_{ij})/(sum_{j}C_{ij}) English-Only: t sub i equals the sum over j of c sub i j times u sub i j all divided by the sum over j of c sub i j. and the number of related persons age 5-17 in families in poverty with which county i is characterized is Text-Symbolic: P_{i} = S_{i}*T_{i}. English-Only: p sub i equals s sub i times t sub i. 2. The Model We modeled the log of the 3-year average of CPS number of poor as a linear function of the logs of variables derived from administrative records data: food stamps, number of poor from tax forms, number of exemptions, population, and the last census number of poor. Put another way, we assume the regression model with two sources of "error", one associated with the deviation of the "true" county log number of poor from the mean regression curve and one associated with sampling in CPS. We will refer to the former as the county random effect or model error and the latter as sampling error. We use a best linear unbiased predictor (BLUP) with a few modifications. In the usual BLUP case, we can estimate the variance of the error components provided some assumptions about the covariance structures are satisfied. These assumptions are satisfied here, but we thought the Census could provide some information about the fit of the model itself, that is, the magnitude of the random effects variance. We therefore modeled the Census using the same methods, assuming a common variance on the random effects. Then we estimated the random effects variance from the Census of the model. A. CPS Model We assume the vector of CPS estimates of log number of poor persons for the counties has sampling properties Text-Symbolic: Y_c | mu_c ~ N[mu_c, V_{c epsilon}] English-Only: y sub c given mu sub c follows a normal distribution with mean mu sub c and variance v sub c epsilon. which should be read as a normal distribution with mean vector mu sub c and covariance matrix V sub c epsilon. We assume V sub c epsilon is diagonal. The mean vector has the distribution Text-Symbolic: mu_c ~ N[ X_c*beta_c, V_u] English-Only: mu sub c follows a normal distribution with mean x sub c times beta sub c and variance v sub u. The covariance matrix V sub u has the form v sub u times I, for some scalar v sub u. Here we assume that the random effects variance is constant across counties. We can express this as a linear model: Text-Symbolic: Y_c = X_c*beta_c + u_c + epsilon_c English-Only: y sub c equals x sub c times beta sub c plus u sub c plus epsilon sub c. where u sub c follows a normal distribution with mean 0 and variance v sub u and epsilon sub c follows a normal distribution with mean 0 and variance v sub c spsilon. The x sub c times beta sub c term contains the explanatory variables including information from the administrative records. The second term is the random effect. The last term represents the sampling error. The assumption of normality of the log number of poor, equivalent to an assumption of lognormality of the number of poor at the county level, is necessary only insofar as it justifies our calculation of variances, described below. It is also helpful for tests of hypotheses, but those are not our primary concern. B. Census Model The terms in the CPS model above are identifiable when the covariance structures v sub u and v sub c epsilon are different, but we thought that the Census might have information about the random effects variance; indeed we thought the random effects variances might be the same, so we assume the same model holds for the Census and that the random effect variance v sub u is the same for Census and CPS. Then we estimated the random effects variance from the Census data. The hope was that the Census, with its much higher precision, would yield better estimates of v sub u. The hope seems to have been borne out; see the results section. We assume a Census model similar to the CPS model. We assume the vector of Census estimates of log number of poor persons for the counties has sampling properties Text-Symbolic: Y_d | mu_d ~ N[mu_d, V_{d epsilon}] English-Only: y sub d given mu sub d follows a normal distribution with mean mu sub d and variance v sub d epsilon. Again, v sub d epsilon is diagonal. The mean vector has the distribution Text-Symbolic: mu_d ~ N[X_d*beta_d, V_u]. English-Only: mu sub d follows a normal distribution with mean x sub d times beta sub d and variance v sub u. The covariance matrix v sub u is common with that in the CPS model. We can express this as a linear model just as we did in the CPS model: Text-Symbolic: Y_d = X_d*beta_d + u_d + epsilon_d, English-Only: Y sub d equals X sub d times beta sub d plus u sub d plus epsilon sub d, where u sub d follows a normal distribution with mean zero and variance- covariance matrix V sub u and epsilon sub d follows a normal distribution with mean zero and variance-covariance matrix V sub d epsilon. The X sub d times beta sub d term contains the explanatory variables to describe the Census responses. The second term is the random effects term with variance common to the one in the CPS model. The last term represents the sampling error and is estimated with Generalized Variance Functions (GVFs) in the Census model. See (Bureau of the Census, 1993). Note the census part of the model is similar to the Fay-Herriot(1979) model. 3. Estimation The shrinkage estimator has the form Text-Symbolic: tilde{Y}_i = {Y hat}_i + (1 - a_i) ( Y_i - {Y hat}_i ) English-Only: Y tilde sub i equals y hat sub i plus open parenthesis 1 minus a sub i close parenthesis times open parenthesis Y sub i minus Y hat sub i close parenthesis. where Y tilde sub i is the estimated mean, X beta, and Y sub i is the direct estimate, when it is available. The variable a sub i is a weight which depends on the relative sizes or the variances of u and epsilon. An expression for the weights is given below. The variable a sub i is chosen to minimize the expected squared difference between the estimator and the true county log number of poor mu sub i. The rule defined by a value a sub i that minimizes the expected difference between Y tilde sub i and mu sub i is the best linear unbiased predictor and the corresponding value for a sub i is Text-Symbolic: a_i = (V_{c epsilon i})/(V_{c epsilon i} + V_u), English-Only: a sub i equals open parenthesis V sub c epsilon i close parenthesis divided by open parenthesis V sub c epsilon i plus V sub u close parenthesis. Y tilde sub i is estimated by performing a weighted least squares procedure with weights equal to Text-Symbolic: weight for county i = 1/({V hat}_u + {V hat}_{epsilon i}). English-Only: weight for county i equals one over open parenthesis V hat sub u plus V hat sub epsilon i close parethesis. The estimation of V hat sub u and V hat sub epsilon i is discussed in the next section. 4. Variance Component Estimation It remains to estimate the variances V sub u and V sub c epsilon. First we estimate the random effects variance, V sub u. This information comes from the Census, which we assume has the same random effects variance as CPS. We apply the same basic form of this model to the Census as we apply to the CPS. The GVF variances we had for the Census were for the number of poor. We needed to transform these to variances for the log number poor. Recall we assume the number poor in county i is lognormal. That is, if W sub i is the number of poor in county i and Y sub i equals the natural log of W sub i, Text-Symbolic: Y_i | mu_i ~ N[mu_i, V_{epsilon_i}] English-Only: Y sub i given mu sub i follows a normal distribution with mean mu sub i and variance V sub epsilon i. and Text-Symbolic: W_i | mu_i ~ LN[xi_i,(eta_i)^2]. English_Only: W sub i given mu sub i follows a lognormal distribution with mean xi sub i and variance eta sub i squared. We estimate eta sub i squared with Census GVFs and solve for V sub epsilon i. If we fit either the CPS or the Census regression with ordinary least squares, we have Text-Symbolic: E(SSE) = E(Y'M'MY) = E(Y'MY) = sum V_{u i}*tr(M*E) + sum V_{epsilon i}*tr(M*E) English-Only: the expected value of the error sum of squares equals the expected value of Y transpose times M transpose times M times Y, which equals the expected value of Y transpose times M times Y, which equals the sum of V sub u i times the trace of open parenthesis M times E close parenthesis plus the sum of V sub epsilon i times the trace of open parenthesis M times E close parenthesis. where e sub i is the ith column of the identity matrix, E is a diagonal matrix with e sub i as the diagonal values, and M=I-X(X'X)^{-1}X' (M equals the indentity matrix minus X times open parenthesis X transpose X close parenthesis inverse times X transpose). Now let m sub i i be the ith diagonal element of the projection matrix I-X(X'X)^{-1}X' (the identity matrix minus X times open parenthesis X transpose X close parenthesis inverse times X transpose) or one minus the leverage of the ith observation. In our application, we have V sub u i = V sub u for all i, so Text-Symbolic: E(SSE) = V_u df_e + sum V_{epsilon i}*m_{ii}, English-Only: the expected value of the error sum of squares equals the expected value of V sub u times the error degrees of freedom plus the sum of V sub epsilon i times m sub i i, and Text-Symbolic: E(MSE) = V_u + 1/{df_e}*(sum V_{epsilon i}*m_{ii}). English-Only: the expected value of the mean square error equals V sub u plus the sum of V sub epsilon i times m sub i i over the error degrees of freedom. If the V sub epsilon i have unbiased estimators V hat epsilon i, an unbiased estimator for V sub u is Text-Symbolic: {V hat}_u = MSE - (1/{df_e})*(sum {V hat}_{epsilon i}*m_{ii}. English-Only: V hat sub u equals the mean square error minus the sum of V hat sub epsilon i times m sub i i over the error degrees of freedom. See (Fay and Herriot, 1979) or (Christiansen, 1987). This is just the MINQUE(0) (Minimum Norm Quadratic Unbiased (translation invariant) Estimation, Christiansen, 1987) estimator with V sub epsilon i given. If we fit OLS to the CPS model, Text-Symbolic: E(MSE) = V_u + (1/{df_e})*(sum V_{c epsilon i}*m_{ii}) English-Only: the expected value of the mean square error equals V sub u times the sum of V sub c epsilon i times m sub i i over the error degrees of freedom. We make the assumption that V sub c epsilon i equals sigma squared times d sub i, where d sub i is known. Here, d sub i equals 1 over n sub i, where n sub i is the CPS sample size in county i. Note this is approximately equivalent to the assumption that c v squared of W sub i equals sigma squared divided by n sub i. Some other models for d sub i are examined in section 7. The equation is Text-Symbolic: E(MSE) = V_u + ({sigma^2}/{df_e})*(sum_{i} m_{ii}/n_{i}) English-Only: the expected value of the mean square error is V sub u plus open parenthesis sigma squared over the error degrees of freedom close parenthesis the sum over i of m sub i i divided by n sub i. Solving for sigma squared yields Text-Symbolic: {sigma hat}^2 = (df_e*E(MSE) - df_e*V_u)/(sum_i m_{ii}/n_{i}). \end{eqnarray*} English-Only: sigma hat squared equals open parenthesis the error degrees of freedom times the expected value of the mean square error minus the error degrees of freedom times V sub u close parenthesis divided by the sum over i of m sub i i divided by n sub i. Now we can write our estimator for V sub epsilon i in CPS for log number of poor: Text-Symbolic: V_{c epsilon i} = {sigma hat}^2/n_i = (SSE - df_e*V_u)/(n_i sum_j (m_{jj}/n_{j})). English-Only: V sub c epsilon i equals sigma hat squared over n sub i, which equals open parenthesis the error sum of squares minus the error degrees of freedom times V sub u close parenthesis divided by open parenthesis n sub i times the sum over j of m sub j j divided by n sub j close parenthesis. This estimator is an unbiased quadratic estimator for V sub c epsilon i. (Reference?) 5. Standard Errors and Confidence Intervals The expected squared error for the EB estimator, if we ignore the variance of the (a sub i)'s, is Text-Symbolic: R_i = (a_i)^2*V({Y hat}_i) + a_i*V_u. English-Only: R sub i equals a sub i squared times the variance of Y hat sub i plus a sub i times V sub u. See Henderson(1975). In many situations, the variance of the (a sub i)'s can be pretty important in that the variance contributed by the estimation of the E B weights may not be negligible compared to the other variances. One example is the Fay-Herriot model, which is very similar to the one above, except the sampling variance is known and the random effects variance is estimated. (Fay and Herriot, 1979) In that case, the estimator above tends to nonnegligibly underestimate the total variance. In our case, the CPS sampling variance is estimated from the data and, at this writing, the census sampling variance is assumed known. In this case, the underestimation of variance, as measured with the method described by Prasad & Rao(1990), seems to be negligible. It's not obvious from the equation above, but when the (a_i)'s are close to 1, the variances of the final estimates are close to that of the (Y hat sub i)'s. In our application, then, the variances of the estimates in the log scale are somewhat uniform, so the CV's of the estimated number of poor are somewhat uniform. 5.1 Transformations of Estimates back to `Number of Poor'. It remains to convert our estimates of log number of poor to estimates for number of poor. To convert the estimates themselves, we note the mean of a lognormal distribution is exp(mu + sigma^2/2) (e raised to the power open parenthesis mu plus sigma squared over 2 close parenthesis), where mu and sigma squared are the mean and variance, respectively, of the corresponding normal distribution. We form the same transformation with sigma squared equals V hat sub u. The variance estimates also need to be formed for the number of poor. We do this simply by recognizing that the variance of the log of w is approximately c v squared of w, which is an approximation that works well when the right hand side of that equation is small. 5.2 Confidence Intervals We follow the common practice of forming our confidence intervals thus: Text-Symbolic: ({Y tilde}_i - z_{alpha/2}*(R_i)^{1/2} , {Y tilde}_i + z_{alpha/2}*(R_i)^{1/2}) English-Only: open parenthesis Y tilde sub i minus z sub alpha over two times the square root of R sub i comma Y tilde sub i plus z sub alpha over 2 times the square root of R sub i close parenthesis, where z sub alpha over two is the one minus alpha over two quantile of the standard normal distribution. See Morris(1983). This gives us symmetric confidence intervals, which are not completely appropriate for the lognormally distributed number of poor. 6. Raking One of the requirements of our estimators is that they sum to the State estimates (Fay 97). They are not constrained to this in the estimation, so ratio adjustment step was included to ensure that they do. County j in state i is multiplied by the ratio of the state estimate of poor to the sum of the county estimates of poor. That is, Text-Symbolic: {W hat}_{ij} = (Y_{Tj}*{Y tilde}_{ij})/(sum_i {Y tilde}_{ij}), English-Only: W hat sub i j equals open parenthesis Y sub T j times Y tilde sub i j close parenthesis divided by open parenthesis sum over i of Y tilde sub i j close parenthesis. where W hat sub i j is the so-called raked estimator of poor in state j, county i, T sub j is the estimate of poor in state j, and Y hat sub i j is the estimate for county i in state j. The variance on this estimator is different from the unraked estimator. The new variance is approximated with a Taylor expansion about the expectations of the three factors in the above equation. Unfortunately, this depends on the covariance of the county- and state-based estimates, which has not been estimated. 7. Some Evaluation Notes It would never do to consider a model with no alternatives and not evaluate any of the assumptions. It is also nice to consider some alternatives with respect to the estimation procedure itself. In this section we examine some of these topics. 7.1 Other Models We have a list of alternative models, some suggested by our colleagues at the National Academy of Science. In this paper we consider the estimates made for 1990, for the purpose of comparison to the Census, and for 1994, and make comparisons to two other models. The first is based on the assumption that the county shares within each state are the same as at the previous Census; the Census counts, then, are simply raked to the state-based estimates of Fay. We will refer to this estimate as U1. The second estimator is based on the assumption that the county ratios, poor/population, are the same as at the previous census; the ratios from the previous Census are multiplied by the current population estimate and raked the resulting numbers to the state estimate. We refer to this model as U2. Another model modification proposed by the panel (NRC, 1997) models the log of the rate rather than the log number poor but keeps the set of dependent variables described above. We present it here because it looks competitive with the SAIPE model where a number of other proposals have turned out not to be as interesting. We refer to this model as D. A minimum requirement for our estimates is that they perform better than the Census. Models U1 and U2 seem like very straightforward improvements over the Census, and we would like our model to do better than they do. 7.1.1. Numerical Results We have several criteria for the evaluation of the models and estimates, including the traditional regression diagnostics and comparisons to the Census in 1990. We content ourselves with Table 1, which shows the mean relative difference and mean absolute relative difference between the estimates and the Census number of poor for children 5 to 17 years old for each of the three models. The relative difference is English-Only: open parenthesis model estimate of number of poor divided by census estimate of number of poor close parenthesis minus 1. Note the numbers in Table 1 are reported for the raked estimates, since the models U1 and U2 are by definition raked. Table 1: Measure of Comparison to the Census for some Models. Model Mean Relative Difference Mean Absolute Difference ------------------------------------------------------------------- U1 17.5% 29.3% U2 15.65% 27.0% SAIPE Model 2.9% 15.7% D 1.7% 17.1% Clearly the SAIPE model and the D model are better than U1 or U2, at least as we measure it here. We chose the SAIPE model over the D model partly for its improved performance with the mean absolute relative difference. There was also the consideration that the estimated rates in the D model would need to be converted into a number of poor, that being our parameter of interest, and in so doing, we would need to multiply by an estimate of a population. We were not able to evaluate the quality of the population estimates, so we were not sure what contribution they would make to either the variance or bias of the final estimates. 8. Conclusion We formulated a model to estimate the number of poor at the county level by forming the regression of CPS direct estimates on administrative records data. We suggested some criteria by which we could judge the model and showed that the model performs better by these criteria than some of the more obvious alternatives. We also examined some variations one some of our methods and assumptions and we saw that we did not do too badly compared to them, although we saw how we might make some improvements. In particular, we may use a slightly different model for the variances and we may replace the constrained MINQUE(0) estimator with the MLE. We did not consider other models in this paper. Several have been suggested, particularly some which model rates as a function of administrative records data. That has been left for another paper. 9. References Bureau of the Census (1996), "CPS Annual Demographic Survey -- March Supplement" http://www.bls.census.gov/cps/ads/adsmain.htm. Bureau of the Census (1993), 1990 Census of Population, Social and Economic Characteristics for Various States (1990 CP-2-various), Appendix C, US Government Printing Office, Washington, DC. Christiansen, Ronald (1987), Plane Answers to Complex Questions, Springer-Verlag, New York. Fay, R. E., and Herriot, R. A. (1979), "Estimates of Income for Small Places: An Application of James-Stein Procedures to Census Data", Journal of the American Statistical Association, 74, 269-277. Henderson, C. R., (1975), "Best Linear Unbiased Estimation and Prediction Under A Selection Model," Biometrics, 31, 4423-447. Prasad, N. G., and Rao, J. N. K., (1990), "The Estimation of the Mean Squared Error of Small-Area Estimators", Journal of the American Statistical Association, 85, 163-171. Acknowledgments The author is grateful to Richard Griffiths and to Carol King for their valuable comments and to Paul Siegel for his valuable contibutions. This paper reports the general results of research undertaken by Census Bureau staff. The views expressed are attributable to the author(s) and do not necessarily reflect those of the Census Bureau.