The following is a text-only version of the paper "A Small-Area View of Poverty from the Panel" by Alan M. Zaslavsky. There are no mathematical equations in this paper.

A SMALL-AREA VIEW OF POVERTY FROM THE PANEL

Alan M. Zaslavsky, Harvard University
Department of Health Care Policy, 180 Longwood Ave., Boston, MA 02115-5899

Keywords: small-area estimation, empirical Bayes, Current Population Survey

The three papers presented today represent the current state of a vastly complex and challenging project. I address this project as a member of the Panel on Estimates of Poverty for Small Geographic Areas of the National Academy of Sciences (NAS). Although I will summarize the panel's preliminary conclusions, for an official statement please refer to the interim report (Citro, Cohen, Kalton and West 1997), available from National Academy Press and at the NAS Web site (http://www.nap.edu/readingroom/ - select Behavioral and Social Sciences and then Small area estimates of poverty). When this paper appears, our next report should also be out, with updated recommendations regarding the estimates to be released by the Census Bureau in October 1997. This paper represents my own views and not necessarily those of the panel or the NAS.

1 View from the panel

History of the project

The current small-area estimates program stems from the 1994 amendments to the Elementary and Secondary Education Act (ESEA). Until this year, ESEA Title I funds for school districts with poor children have been distributed to states based on poverty rates and numbers of children in poverty calculated from the most recent census. Consequently, at the end of the decennial cycle, allocations could be based on incomes from as much as twelve years before.
The 1994 amendments called for development by the Census Bureau of updated estimates for states and counties which could be fed into the complex allocation formula to yield state and county funding levels, to be updated every other year throughout the intercensal period. The legislation also called for establishment of a NAS panel to evaluate the acceptability of the estimates as a basis for fund allocation. The panel would make a recommendation to the Secretaries of Commerce and Education, who would then decide whether to use the updated estimates or continue to allocate funds on the basis of 1990 census estimates.

The Census Bureau's research program was well underway by 1995, and the panel was formed in June 1996. Estimates were developed beginning in September 1996 and first released in January 1997, with advance release of the panel's report in February. In March 1997, the Secretaries adopted the panel's recommendation to base allocations on a compromise between the old (1990 census) and updated poverty estimates.

The panel has had an unusual role in this process. Most NAS panels are charged to study a policy-relevant issue in general terms, but in this case the panel's recommendation was recognized by statute. In my opinion, the collaboration between the Census Bureau and the panel has been very successful, with intense interchange on technical and policy issues of the program. Furthermore, despite the implications of the work for funds distribution, there has not been political interference from either the legislative or executive branch, except for the sense of urgency expressed by all parties for moving forward to the new process.

Interim findings: The panel's findings included a number of points in favor of the revised estimates.
First, there is strong evidence that there have been substantial changes in the national distribution of poverty between 1989 (the reference year of income data from the 1990 census) and 1993 (the reference year of the 1994 Current Population Survey). Second, model-based procedures are essential to the production of estimates that are more accurate for 1993 than carrying forward the census-based estimates. None of the survey data sources available (in particular, the Current Population Survey) are capable of supporting direct estimates of poverty with adequate precision for any but the largest states, much less for counties. Third, the panel generally evaluated the methods used for modeling poverty as being in line with the current state of the art for small-area estimation, although some important issues about the methodology are still to be resolved, especially for the county-level model.

Finally, the panel found that the Census Bureau's model-based estimates for 1993 probably improve on the 1989 estimates (based on the 1990 census) as estimates of poverty distribution in 1993. Neither of the two forms of evidence for this belief gives an entirely conclusive answer. One uses model-based estimates of error, although these are to some extent dependent on the accuracy of the model specification and indirect estimates of CPS sampling variance. The other evaluation compares the model and the carried-forward (1980) census, treating the census in a later year (1990) as the "gold standard." Aggregate measures of error were much smaller for the 1990 CPS-based estimates than for estimates based on various naive models, i.e., simple updates of the 1980 census. Both the state and county components of the model contributed to accuracy. Unfortunately, 1990 is the only year for which this comparison could be implemented, so it is difficult to generalize to other years for refined comparisons among closely competitive models.
Despite these generally positive conclusions, the panel also had several concerns about moving to a set of fully model-based estimates. These included the following:

1. There appeared to be some differences between the census and CPS measures of poverty, although it was difficult to tell whether they were systematically related to features of the counties. This is unsurprising, because the CPS income questions are much more detailed than those on the census long form. Nonetheless, we were cautious about implicitly moving to a new poverty definition, even one that might prove to be superior. (Another NAS report (Citro and Michael 1995) deals at length with poverty measures.)
2. We were also concerned about possible biases due to the model specification, including biases related to size which might occur as artifacts of the nonlinearities of the county model specification.
3. We were dissatisfied with the exclusion of sampling zeros from the data set.
4. The discrepancies between predictions for states from the state model, and those obtained by aggregating predictions from the county model (before raking) up to states, were substantial. In other words, the raking factors for adjusting the county model to the state model estimates were quite variable. This does not necessarily indicate that the combined state-county procedure gives inaccurate results but suggests some lack of fit in the county model.
5. The lack of direct variance estimates for CPS estimates in small areas made it necessary to estimate CPS variances in an indirect and highly model-dependent manner.
6. The quality of the postcensal estimates, required for calculating the rates used in the allocation formulae, was unknown.
7. Because of unavoidable delays in the production of data required for the models, allocations for the school year beginning in 1997 would be based on estimates for the 1993 income year.
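The raking factors mentioned in the fourth concern can be illustrated with a minimal sketch. It assumes that raking here is a simple one-way ratio adjustment: each state's factor is the state-model estimate divided by the sum of the county-model predictions within that state. The function names, county labels, and numbers are all hypothetical and are not taken from the Census Bureau's work.

```python
from collections import defaultdict

def raking_factors(county_preds, county_state, state_estimates):
    """Return state -> (state-model estimate) / (sum of county-model predictions)."""
    totals = defaultdict(float)
    for county, pred in county_preds.items():
        totals[county_state[county]] += pred
    return {state: state_estimates[state] / total for state, total in totals.items()}

def rake(county_preds, county_state, state_estimates):
    """Scale each county prediction so that counties sum to their state estimate."""
    factors = raking_factors(county_preds, county_state, state_estimates)
    return {c: p * factors[county_state[c]] for c, p in county_preds.items()}

# Hypothetical county-model predictions of poor children (before raking).
county_preds = {"A1": 1200.0, "A2": 800.0, "B1": 500.0, "B2": 1500.0}
county_state = {"A1": "A", "A2": "A", "B1": "B", "B2": "B"}
# Hypothetical state-model estimates.
state_estimates = {"A": 2200.0, "B": 1800.0}

factors = raking_factors(county_preds, county_state, state_estimates)
raked = rake(county_preds, county_state, state_estimates)
print(factors)  # factors far from 1, or varying widely across states, hint at lack of fit
print(raked)
```

In this toy example one state is scaled up and the other down; widely varying factors of this kind are what the panel read as a sign of some lack of fit in the county model.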
Because of these uncertainties, the panel did not recommend unmodified use of the model-based estimates for 1997. Instead, it proposed averaging estimates of poverty rates from the 1990 census and the 1993 CPS-based model, and multiplying these by 1993 population estimates for counties to obtain estimated poverty counts by county. The panel did not claim to have devised a new and superior estimation method, but rather that uncertainties about the new methods were great enough that it was desirable to moderate the impact of the shift. This recommendation was accepted by the Secretaries of Commerce and Education, and became the basis for Title I funds allocations for the current school year (1997-1998).

The panel also emphasized the need for continued research to validate and improve the models. Since the report was issued, the Census Bureau has prepared, and both Census Bureau staff and the panel have studied, diagnostics for model fit and comparisons among direct CPS estimates, model-based estimates, and census-based estimates. Research has continued on evaluation of postcensal population estimates and estimation of CPS sampling variances. Alternative model specifications have also been considered, including models for rates or log rates by county, a bivariate model for census- and CPS-year poverty rates, an integrated state-county model (county model with state effects), and a generalized linear model (capable of handling sampling zeros in the CPS). I would add that over the course of several years, enough data will accumulate to support more refined evaluations of possible systematic biases in the models and to distinguish these from accidental discrepancies that are consequences of patterns in poverty distribution that vary each year.
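The averaging procedure the panel proposed for 1997 amounts to simple arithmetic, sketched below. The sketch assumes an equal-weight average of the two rates, which the text does not spell out; the function name, county labels, and all numbers are hypothetical.

```python
def blended_poverty_counts(census_rate, model_rate, population):
    """Average two poverty rates per county, then multiply by population.

    census_rate: county -> poverty rate from the 1990 census
    model_rate:  county -> poverty rate from the 1993 CPS-based model
    population:  county -> 1993 population estimate
    Equal weighting of the two rates is an assumption of this sketch.
    """
    counts = {}
    for county, pop in population.items():
        averaged_rate = 0.5 * (census_rate[county] + model_rate[county])
        counts[county] = averaged_rate * pop
    return counts

# Hypothetical inputs for two counties.
census_rate = {"X": 0.20, "Y": 0.10}
model_rate = {"X": 0.24, "Y": 0.08}
population = {"X": 10_000, "Y": 50_000}

counts = blended_poverty_counts(census_rate, model_rate, population)
print(counts)  # roughly 2,200 for X and 4,500 for Y
```

Each county's estimate moves only halfway from the census-based rate toward the model-based rate, which is how the compromise moderates the impact of the shift.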
School district estimates: the Final Frontier

Finally, the Census Bureau and the panel have begun to confront the challenges involved in calculating estimates for school districts, which the legislation anticipates phasing in later in the decade. In all respects, this is far more challenging than the county-level estimation problem. There are many more school districts (more than 16,000) than counties (about 3,000), and many of them are extremely small. Even defining the districts is difficult, because they sometimes overlap, often change their boundaries, and in many cases have school-age populations that are quite different from their school enrollments. Therefore, geographic issues become crucial at this level. (The Census Bureau is conducting a major effort to update maps of school districts.) Because of the paucity of data sources for extremely small geographic units, the methodology for county-level estimates is unlikely to translate directly and uniformly to school districts. (The new American Community Survey may be of help, however.) Creating school district estimates of poverty will require another unprecedented effort.

2 Discussion of the papers

Fisher and Siegel: For county-level estimation, a normal homoscedastic linear model is assumed for underlying (logged) poverty counts, with predictors that are also (logged) counts. A simple relationship (variance inversely proportional to sample size, with a single proportionality constant) is assumed to describe CPS sampling variances. The constant in the sampling variance function is estimated indirectly, assuming constancy of model error variances across years. Counties with sampling zeros, which cannot be accommodated in the loglinear model, are omitted from the dataset. Many specification decisions are embedded in this approach, and alternatives can be explored for many of them, e.g., nonconstant model variances, modeling of rates rather than counts, and more complex variance functions.
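Two of the specification choices just described, the one-constant sampling variance function and the omission of sampling zeros, can be made concrete with a small sketch. It is only an illustration under stated assumptions: the variance is taken to be a constant `c` divided by the county's CPS sample size, the value of `c` is made up, and the field names and data are hypothetical rather than drawn from the paper under discussion.

```python
import math

def prepare_log_scale_inputs(counties, c):
    """Build inputs for a model on the log scale.

    counties: list of dicts with keys
        'name'     - county label
        'cps_poor' - direct CPS estimate of poor children (may be 0)
        'n'        - CPS sample size in the county
    c: single proportionality constant of the variance function (assumed value)

    Counties with a sampling zero are dropped, because log(0) is undefined.
    """
    kept, dropped = [], []
    for county in counties:
        if county["cps_poor"] <= 0:
            dropped.append(county["name"])
            continue
        kept.append({
            "name": county["name"],
            "log_poor": math.log(county["cps_poor"]),
            "sampling_var": c / county["n"],  # variance inversely proportional to n
        })
    return kept, dropped

# Hypothetical data: county Q has CPS sample but no sampled poor children.
counties = [
    {"name": "P", "cps_poor": 350.0, "n": 40},
    {"name": "Q", "cps_poor": 0.0, "n": 12},
    {"name": "R", "cps_poor": 90.0, "n": 10},
]
kept, dropped = prepare_log_scale_inputs(counties, c=2.0)
print(dropped)  # ['Q']
print([(k["name"], k["sampling_var"]) for k in kept])
```

The sketch makes the cost of the log scale visible: county Q carries real information (a sampled county with no poor children found) but contributes nothing to the fit.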
Perhaps the loglinear model is both too simple and too hard to understand: it does not use current technologies for nonlinear modeling with random effects, and it is hard to explain the aggregate behavior of the model when the model assumptions (homoscedasticity and linearity on the logged scale) are not satisfied. I would like to see this research move toward a generalized linear model framework, such as a quasilogistic model for poverty rates. Random effects logistic models are routinely fitted by educational and health services researchers using commercially available software such as MLn and HLM. The design of the CPS complicates the application of such models. Nonetheless, a more appropriate regression structure with an approximation to survey design effects might give more sensible results than a simplified regression structure with elaborate estimates of variances and covariances under the design (which have not been developed yet for this application, anyway).

Fay and Train: This elegant, careful work was by and large accepted by the panel. The authors have also investigated a number of promising alternatives not represented in this paper, such as multivariate modeling of poverty in several age groups. Sampling variances were related to rates through a function proportional to the p(1-p) variance function for binomial data. This is roughly equivalent to fitting a generalized linear model with binomial likelihood and a design effect. This is a big step in the direction of the quasilogistic model suggested above, and a creative method for bridging the gap between design- and model-based analyses. I believe that the full benefits of this method will be obtained when it is applied to modeling smaller domains.

Bell and Otto: This technically ambitious venture focuses on one of several possible directions for model expansion, namely multivariate modeling of several years of CPS data.
Other directions that can be approached with similar technologies include a measurement error model for census year data (now being researched by Bell), use of additional covariates measured with error, multivariate modeling of several "outcomes" (e.g., poverty in several age groups), and more complex relationships among counties (e.g., spatial modeling). To make these methods work well, more precise sampling variance-covariance estimates are needed. I hope that the authors will investigate estimation methods making more use of the full CPS design, including the structure of rotation groups and segments (Dempster and Hwang 1993). Rough bounds might be obtained for the amount of additional information contributed by each year of CPS data.

It might be very difficult to choose conclusively among the various possible specifications of the autocorrelation model in the time series approach, using these short noisy series. Even the assumption of stationarity is suspect, as autocorrelations are partly driven by irregular short-term economic trends. However, it may be that the consequences of the various models for prediction are not very different. For example, the random walk model is conservative in its use of past data although it is not plausible as a model for long-term dynamics.

Conclusion: The authors are again to be congratulated for their progress. We can expect this program to contribute both to statistical methodology in general and to the ability of government statistics to make use of sophisticated and dynamic methods.

References

Citro, C. F., Cohen, M. L., Kalton, G. and West, K. K., eds. (1997), Small-Area Estimates of School-Age Children in Poverty: Interim Report I: Evaluation of 1993 County Estimates for Title I Allocations, Washington: National Academy Press.

Citro, C. F. and Michael, R. T., eds. (1995), Measuring Poverty: A New Approach, Washington: National Academy Press.

Dempster, A. P. and Hwang, J.-S. (1993), "Component models and Bayesian technology for estimation of state employment and unemployment rates," Proceedings of the Bureau of the Census Annual Research Conference, 571-581.