Sampling Estimation & Survey Inference

Motivation:

Survey sampling helps the Census Bureau provide timely and cost-efficient estimates of population characteristics. Demographic sample surveys estimate characteristics of people or households such as employment, income, poverty, health, insurance coverage, educational attainment, or crime victimization. Economic sample surveys estimate characteristics of businesses such as payroll, number of employees, production, sales, revenue, or inventory. Survey sampling also helps the Census Bureau assess the quality of each decennial census. Estimates are produced using design-based or model-based estimation techniques. Methods and topics across the three program areas (Demographic, Economic, and Decennial) include: sample design, estimation and use of auxiliary information (e.g., sampling frame and administrative records), weighting methodology, adjustments for non-response, proper use of population estimates as weighting controls, variance estimation, effects of imputation on variances, coverage measurement sampling and estimation, coverage measurement evaluation, evaluation of census operations, uses of administrative records in census operations, improvement in census processing, and analyses that aid in increasing census response.

Research Problems:

How to design and analyze sample surveys from "frames" determined by non-probabilistically sampled observational data to achieve representative population coverage. To make census data products based jointly on administrative and survey data fully representative of the general population, as our current surveys are, new sampling designs and analysis methods will have to be developed.
How can inclusion in observational or administrative lists be modeled jointly with indicator and mode of survey response, so that traditional survey methods can be extended to merged survey and non-survey data?
Can non-traditional design methods such as adaptive sampling be used to improve estimation for rare characteristics and populations?
How can time series and spatial methods be used to improve ACS estimates or explain patterns in the data?
Can generalized weighting methods be formulated and solved as optimization problems to avoid the ambiguities resulting from multiple weighting steps and to explicitly allow inexact calibration?
What models can aid in assessing the combined effect of all the sources of sampling and nonsampling error, including frame coverage errors and measurement errors, on sample survey estimates?
What experiments and analyses can inform the development of outreach methods to enhance census response?
Can unduplication and matching errors be accounted for in modeling frame coverage in censuses and sample surveys?
How can small-area or other model-based methods be used to improve interval estimates in sample surveys, to design survey collection methods with lowered costs, or to improve Census Bureau imputation methods?
Can classical methods in nonparametrics (e.g., using ranks) improve estimates from sample surveys?
How can we measure and present uncertainty in rankings of units based on sample survey estimates?
Can Big Data improve results from censuses and sample surveys?
How to develop and use bootstrap methods for expressing uncertainty in estimates from probability sampling?
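
As a point of reference for the weighting question above, the sketch below shows classical raking (iterative proportional fitting) to two sets of control totals, the simplest multi-step calibration that an optimization-based formulation would generalize. It is only an illustration under stated assumptions: the function name `rake`, the toy respondents, and the control totals are all hypothetical, and this is not the Census Bureau's production weighting method.

```python
def rake(weights, rows, cols, row_totals, col_totals, iters=100, tol=1e-10):
    """Scale weights until weighted counts match both sets of control totals.

    Assumes every category has at least one unit and that the two sets of
    control totals sum to the same overall total.
    """
    w = list(weights)
    for _ in range(iters):
        max_change = 0.0
        for labels, targets in ((rows, row_totals), (cols, col_totals)):
            # Current weighted total in each category of this margin.
            current = {}
            for wi, lab in zip(w, labels):
                current[lab] = current.get(lab, 0.0) + wi
            # Multiply each unit's weight so its category hits the control.
            for i, lab in enumerate(labels):
                factor = targets[lab] / current[lab]
                max_change = max(max_change, abs(factor - 1.0))
                w[i] *= factor
        # Stop once a full pass over both margins changes nothing material.
        if max_change < tol:
            break
    return w


# Hypothetical toy data: four respondents cross-classified by sex and age.
base = [10.0, 10.0, 10.0, 10.0]
sex = ["F", "F", "M", "M"]
age = ["young", "old", "young", "old"]
raked = rake(base, sex, age,
             {"F": 26.0, "M": 24.0},
             {"young": 20.0, "old": 30.0})
```

Raking enforces the controls exactly and one margin at a time; a single-stage optimization approach would instead choose all adjustment factors at once and could penalize, rather than forbid, departures from the controls.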

Current Subprojects:

Optimization-based (single-stage) approaches to Weight-adjustment for Probability and Nonprobability Samples (Slud, Morris)
The Ranking Project: Methodology Development and Evaluation (Wright, Klein/FDA, Wieczorek/Colby College, Yau)
Optimal Sample Allocation and Apportionment (Wright)
Optimal stratification in economic surveys, using multiple measures of size and multiple survey outcomes (Slud, Joyce)
Machine Learning projects related to non-response segmentation Mindsets for decennial outreach (Mulry, Morris, Scheid/DSSD), or to Frames (Weinberg, Slud)
Methods of estimating variances for survey estimates combining model- and design-based estimates, and simulation studies of bias when the design-based methods include Replication Methods in domains with small sample-size (Slud, Trudell)
Analyses supporting improvement of household rosters for census nonresponders that are projected to be occupied and to have high-quality administrative records (Mulry)

Potential Applications:

Improve estimates and reduce costs for household surveys by introducing new design and estimation methods.
Produce improved ACS small area estimates through the use of time series and spatial methods, where those methods improve upon small area methods using covariates recoded from temporal and spatial information.
Streamline documentation and make weighting methodology more transparent by applying the same nonresponse and calibration weighting adjustment software across different surveys.
Develop new procedures for adjusting weights or reported values in the monthly trade surveys and surveys of government employment, based on statistical identification of outliers and influential values, to improve the accuracy of estimates of monthly level and of month-to-month change.
Provide a synthesis of the effect of nonsampling errors on estimates of net census coverage error, erroneous enumerations, and omissions and identify the types of nonsampling errors that have the greatest effects. Employ administrative records to improve the estimates of census coverage error.
Measure and report uncertainty in rankings in household and economic sample surveys.
Develop bootstrap methods for expressing uncertainty as an alternative source of published variance estimates and as a check on existing methods of producing variances in Census Bureau sample surveys.
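
To make the idea of uncertainty in rankings concrete, the sketch below derives a set of possible ranks for each population from jointly valid confidence intervals: a population's rank can be no better than one plus the number of populations whose intervals lie entirely below its own, and no worse than the total count minus the number lying entirely above. This is in the spirit of the joint confidence region work listed in the publications, but the Bonferroni adjustment, the normal approximation, and the function name `rank_ranges` are assumptions made for illustration, not a statement of the published construction.

```python
from statistics import NormalDist


def rank_ranges(estimates, std_errors, alpha=0.10):
    """Return (best_rank, worst_rank) per population; rank 1 is the smallest.

    Assumes approximately normal estimators and uses a Bonferroni
    adjustment so that all k intervals hold jointly with level 1 - alpha.
    """
    k = len(estimates)
    z = NormalDist().inv_cdf(1 - alpha / (2 * k))
    lower = [e - z * s for e, s in zip(estimates, std_errors)]
    upper = [e + z * s for e, s in zip(estimates, std_errors)]
    ranges = []
    for i in range(k):
        # Populations whose whole interval lies below (or above) interval i.
        surely_below = sum(1 for j in range(k) if j != i and upper[j] < lower[i])
        surely_above = sum(1 for j in range(k) if j != i and lower[j] > upper[i])
        ranges.append((surely_below + 1, k - surely_above))
    return ranges


# Hypothetical estimates (for example, mean travel times) and standard errors.
ranges = rank_ranges([20.0, 25.0, 25.5, 33.0], [0.5, 0.5, 0.5, 0.5])
```

Populations with overlapping intervals receive overlapping rank ranges, so the width of each range is a direct, reportable measure of how uncertain the estimated ranking is.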

Accomplishments (October 2018-September 2020):

Contributed to team development of methods for producing differentially private decennial census tabulations conforming to legally mandated error-free disclosure of block-level population totals under Public Law 94 as well as to Title 13 requirements for nondisclosure of individual-level data.
Developed novel optimization-based weighting adjustment methods based on partially missing data, along with diagnostics based on cross-classified post-stratification variables.
Demonstrated the potential for a market segmentation from an external source to improve self-response propensity models using data from the 2010 Census and the American Community Survey.
Demonstrated that market segmentation from an external source aids in providing useful information about problems in the Census enumeration of young children.
Established theoretical limitations on consistent estimation of variance component parameters from informatively sampled complex survey data based only on single-inclusion weights.
Developed a simple and novel measure of uncertainty for an estimated ranking, with supporting theory and a visualization, illustrated using American Community Survey travel-time-to-work data.
Extended the current equal proportions methodology by appealing to probability sampling results.
Developed a general exact optimal sample allocation algorithm with bounded cost and bounded stratum sample sizes.
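
The allocation problem mentioned above can be illustrated with a small sketch. Under stratified simple random sampling, the variance of an estimated total depends on the stratum sample sizes n_h only through the sum of (N_h S_h)^2 / n_h, and moving stratum h from k to k + 1 units lowers that sum by (N_h S_h)^2 / (k (k + 1)). Because these reductions shrink as k grows, assigning units one at a time to the stratum with the largest reduction yields an integer allocation that minimizes the sum under a fixed total and per-stratum bounds. The function name `greedy_allocation` and the example inputs are hypothetical, and this greedy sketch is not claimed to be the published algorithm; in particular it handles a fixed total sample size rather than a general cost bound.

```python
import heapq


def greedy_allocation(N, S, n_total, n_min=None, n_max=None):
    """Integer allocation minimizing sum((N_h * S_h)**2 / n_h).

    N: stratum population sizes; S: stratum standard deviations.
    n_min must be at least 1 in every stratum (default 1); n_max defaults to N.
    """
    H = len(N)
    n_min = n_min or [1] * H
    n_max = n_max or list(N)
    n = list(n_min)
    if sum(n) > n_total:
        raise ValueError("lower bounds exceed the total sample size")

    def gain(h, k):
        # Reduction in the objective when stratum h goes from k to k + 1.
        return (N[h] * S[h]) ** 2 / (k * (k + 1))

    # Max-heap (via negated gains) of strata that can still grow.
    heap = [(-gain(h, n[h]), h) for h in range(H) if n[h] < n_max[h]]
    heapq.heapify(heap)
    for _ in range(n_total - sum(n)):
        if not heap:
            raise ValueError("upper bounds are too small for the total")
        _, h = heapq.heappop(heap)
        n[h] += 1
        if n[h] < n_max[h]:
            heapq.heappush(heap, (-gain(h, n[h]), h))
    return n


# Hypothetical strata: sizes, standard deviations, and a total of 12 units.
allocation = greedy_allocation([100, 200, 300], [10.0, 4.0, 2.0], 12)
```

The square root of the reduction, N_h S_h / sqrt(k (k + 1)), has the same form as the priority values used in the equal proportions method of apportionment, with N_h S_h in place of state population.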

Short-Term Activities (FY 2021 – FY 2023):

Extend Machine Learning approaches to non-response segmentation and frame changes.
Develop optimal stratification in economic surveys, using multiple measures of size and multiple survey outcomes.
Document biases of SDR design-based variance estimates for survey-weighted totals in small domains, and identify the survey design and attribute features on which those biases depend.
Continue research into post-stratified weight adjustment methodology and assessment of weights, with application to low-response probability surveys and non-probability data collection as in the Tracking Survey.
Extend research into stratification methodology for economic surveys based on multiple MOS variables and multiple outcomes.
Continue research into alternative techniques for statistical nondisclosure control motivated by randomized-response techniques.
Improve methodology for measuring uncertainty in rankings.
Extend methodology for exact optimal sample allocation and apportionment.
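
For readers unfamiliar with randomized response, the sketch below illustrates the classical Warner design that motivates the nondisclosure research above: each respondent reports the true answer with known probability p and the opposite answer otherwise, so no single report reveals the truth, yet the population proportion remains estimable. The function names and the simulated population are hypothetical, and this is the textbook design rather than any specific method under study.

```python
import random


def warner_response(true_answer, p, rng):
    """Report the true answer with probability p, otherwise its opposite."""
    return true_answer if rng.random() < p else (not true_answer)


def warner_estimate(responses, p):
    """Unbiased estimate of the true 'yes' proportion from randomized reports.

    Uses P(report yes) = p * pi + (1 - p) * (1 - pi), solved for pi.
    """
    if p == 0.5:
        raise ValueError("p = 0.5 makes the proportion unidentifiable")
    lam = sum(responses) / len(responses)
    return (lam - (1 - p)) / (2 * p - 1)


# Hypothetical population: 30% hold the sensitive attribute.
rng = random.Random(0)
truth = [i < 6000 for i in range(20000)]
reports = [warner_response(t, 0.8, rng) for t in truth]
estimate = warner_estimate(reports, 0.8)
```

Values of p closer to 0.5 give stronger protection to each respondent but inflate the variance of the estimate by a factor of 1 / (2p - 1)^2, which is the basic privacy and accuracy trade-off.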

Longer-Term Activities (beyond FY 2023):

Extension of Census Matching capability to non-PIK persons using Administrative Records, Duplicate Status and Post-Enumeration Survey data for evaluation of Matching quality.
Develop software that is re-usable and easily implementable for small area prediction within language minority groups in connection with the determinations of ballot language assistance by jurisdiction and American Indian Area under Section 203 of the Voting Rights Act.
Further investigate the statistical implications and assumptions of formal privacy (e.g., differential privacy) methods in order to understand how the methods may impact the use of data products and to develop estimates of variability of released data that has been privatized by noise infusion.
Develop statistical methods and theory related to the use of differential privacy to release data from unequal probability sampling surveys. A specific focus of this research would be on how to account for the sampling probabilities/weights in the planning of the privacy budget.
Develop probability sampling methods targeted to the complement of an administrative records database within a survey frame such as the MAF; this research will require combining statistical models for joint dependence of administrative records and survey or census response, to be incorporated into new response propensity models in terms of which the survey data can be analyzed.
Develop spatial models and associated small area estimation techniques in terms of Generalized Linear Mixed Models (GLMMs) with covariates recoded to incorporate local spatial geographic/demographic/economic effects, and compare the performance of these models with Bayes-hierarchical models currently being developed elsewhere at the Census Bureau using American Community Survey data. Such GLMM spatial models may also be applicable to the evaluation of canvassing and address status changes in the MAF.

Selected Publications:

Mulry, M.H. and Mule, V.T. (2022). “Advances in the Use of Capture-Recapture Methodology in the Estimation of U.S. Census Coverage Error,” In Recent Advances on Sampling Methods and Educational Statistics. In Honor of S. Lynne Stokes. Editors Hon Keung Tony Ng and Daniel F. Heitjan, 93–116, ISSN 2524-7735, https://doi.org/10.1007/978-3-031-14525-4

Slud, E., Hall, A., and Franco, C. (In Press). “Small Area Estimates for Voting Rights Act Section 203(b) Coverage Determinations,” Calcutta Statistical Association Bulletin.

Nayak, T.K. (2021). “A Review of Rigorous Randomized Response Methods for Protecting Respondent's Privacy and Data Confidentiality,” in Methodology and Applications of Statistics: A Volume in Honor of C.R. Rao on the Occasion of his 100th Birthday, ed. B.C. Arnold, N. Balakrishnan and C.A. Coelho, New York: Springer, pp. 319-341.

Wright, T. (2021). “From Cauchy-Schwarz to the House of Representatives: Application of Lagrange’s Identity,” Mathematics Magazine, Vol 94, 244-256.

Mulry, M., Bates, N., and Virgile, M. (2021). “Viewing Participation in Censuses and Surveys through the Lens of Lifestyle Segments” (print), Journal of Survey Statistics and Methodology, doi:10.1093/jssam/smaa006.

Zhai, X., and Nayak, T.K. (2021). “A Post-randomization Method for Rigorous Identification Risk Control in Releasing Microdata,” Journal of Statistical Theory and Practice, 15, Article 8, https://doi.org/10.1007/s42519-020-00143-2.

Trudell, T., Dong, K., Slud, E., and Cheng, Y. (In Press). “Computing Replicated Variance for Stratified Systematic Sampling,” Proceedings of the Survey Research Methods Section of the American Statistical Association.

Wright, T. (2020). “A General Exact Optimal Sample Allocation Algorithm: With Bounded Cost and Bounded Sample Sizes,” Statistics and Probability Letters, Vol 165, Article 108829.

Klein, M., Wright, T., and Wieczorek, J. (2020). “A Joint Confidence Region for an Overall Ranking of Populations,” Journal of the Royal Statistical Society, Series C, 69, Part 3, 589-606.

Franco, C., Little, R., Louis, T., and Slud, E. (2019). “Comparative Study of Confidence Intervals for Proportions in Complex Sample Surveys,” Journal of Survey Statistics and Methodology, 7, 334-364.

Slud, E. and Thibaudeau, Y. (2019). “Multi-Outcome Longitudinal Small Area Estimation, A Case Study,” Statistical Theory and Related Fields. Special Issue on Small Area Estimation, 3, 136-149.

Wright, T., Klein, M., and Wieczorek, J. (2019). “A Primer on Visualizations for Comparing Populations, Including the Issue of Overlapping Confidence Intervals,” The American Statistician, Vol 73, No 2, 165-178.

Chai, J. and Nayak, T. (2018). “A Criterion for Privacy Protection in Data Collection and its Attainment via Randomized Response Procedures,” Electronic Journal of Statistics 12 (2), 4264-4287.

de Oliveira, V., Wang, B., and Slud, E. (2018). “Spatial Modeling of Rainfall Accumulated over Short Periods of Time,” Journal of Multivariate Analysis, 166, 129-149.

Dong, K., Trudell, T., Slud, E., and Cheng, Y. (2018). “Understanding Variance Estimator Bias in Stratified Two-Stage Sampling,” Proceedings of the Survey Research Methods Section of the American Statistical Association.

Klein, M., Wright, T., and Wieczorek, J. (2018). “A Simple Joint Confidence Region for A Ranking of K Populations: Application to American Community Survey’s Travel Time to Work Data,” Research Report Series (Statistics #2018-04), Center for Statistical Research and Methodology, U.S. Census Bureau, Washington, D.C.

Lu, B. and Ashmead, R. (2018). “Propensity Score Matching Analysis for Causal Effects with MNAR Covariates,” Statistica Sinica, 28, 2005-2025.

Mulry, M.H., Kaputa, S., and Thompson, K. (2018). “Initial M-estimation Parameter Settings for Detection and Treatment of Influential Values,” Journal of Official Statistics, 34(2), 483–501. http://dx.doi.org/10.2478/JOS-2018-0022

Nayak, T., Zhang, C., and You, J. (2018). “Measuring Identification Risk in Microdata Release and Its Control by Post‐randomisation,” International Statistical Review, 86 (2), 300-321.

Slud, E., Vonta, I., and Kagan, A. (2018). “Combining Estimators of a Common Parameter across Samples,” Statistical Theory and Related Fields, 2, 158-171.

Wright, T. (2018). “No Calculation When Observation Can Be Made,” in A.K. Chattopadhyay and G. Chattopadhyay (Eds), Statistics and Its Applications, Springer Singapore, 139-154.

Ashmead, R., Slud, E., and Hughes, T. (2017). “Adaptive Intervention Methodology for Reduction of Respondent Contact Burden in the American Community Survey,” Journal of Official Statistics, 33(4), 901-919.

Ashmead, R. and Slud, E. (2017). “Small Area Model Diagnostics and Validation with Applications to the Voting Rights Act Section 203,” Proceedings of Survey Research Methods Section, American Statistical Association, Alexandria, VA.

Mulry, M.H. and Keller, A. (2017). “Comparison of 2010 Census Nonresponse Follow-up Proxy Responses with Administrative Records Using Census Coverage Measurement Results,” Journal of Official Statistics, 33(2), 455–475. DOI: https://doi.org/10.1515/jos-2017-0022

Mulry, M.H., Nichols, E. M., and Hunter Childs, J. (2017). “Using Administrative Records Data at the U.S. Census Bureau: Lessons Learned from Two Research Projects Evaluating Survey Data.” In Biemer, P.P, Eckman, S., Edwards, B., Lyberg, L., Tucker, C., de Leeuw, E., Kreuter, F., and West, B.T. Total Survey Error in Practice. Wiley. New York. 467-473.

Slud, E. and Ashmead, R. (2017). “Hybrid BRR and Parametric-Bootstrap Variance Estimates for Small Domains in Large Surveys,” Proceedings of Survey Research Methods Section, American Statistical Association, Alexandria, VA.

Thibaudeau, Y., Slud, E., and Gottschalck, A. (2017). “Modeling Log-linear Conditional Probabilities for Estimation in Surveys,” Annals of Applied Statistics, 11 (2), 680-697.

Wieczorek, J. (2017). “RankingProject: The Ranking Project: Visualizations for Comparing Populations,” R package version 0.1.1. URL: https://cran.r-project.org/package=RankingProject.

Wright, T. (2017). “Exact Optimal Sample Allocation: More Efficient Than Neyman,” Statistics and Probability Letters, 129, 50-57.

Mulry, M. H., Nichols, E. M., and Childs, J. Hunter (2016). “A Case Study of Error in Survey Reports of Move Month Using the U.S. Postal Service Change of Address Records,” Survey Methods: Insights from the Field. Retrieved from http://surveyinsights.org/?p=7794

Mulry, M.H., Oliver, B., Kaputa, S., and Thompson, K. J. (2016). “Cautionary Note on Clark Winsorization.” Survey Methodology 42 (2), 297-305. http://www.statcan.gc.ca/pub/12-001-x/2016002/article/14676-eng.pdf

Nayak, T. and Adeshiyan, S. (2016). “On Invariant Post‐randomization for Statistical Disclosure Control,” International Statistical Review, 84 (1), 26-42.

Nayak, T., Adeshiyan, S. and Zhang, C. (2016). “A Concise Theory of Randomized Response Techniques for Privacy and Confidentiality Protection,” Handbook of Statistics, 34, 273-286.

Wright, T. (2016). “Two Optimal Exact Sample Allocation Algorithms: Sampling Variance Decomposition Is Key,” Research Report Series (Statistics #2016-03), Center for Statistical Research and Methodology, U.S. Census Bureau, Washington, D.C.

Nagaraja, C. and McElroy, T. (2015). “On the Interpretation of Multi-Year Estimates of the American Community Survey as Period Estimates.” Published online, Journal of the International Association of Official Statistics.

Slud, Eric. (2015). “Impact of Mode-based Imputation on ACS Estimates,” American Community Survey Research and Evaluation Memorandum, #ACS-RER-O7.

Franco, C., Little, R., Louis, T., and Slud, E. (2014). “Coverage Properties of Confidence Intervals for Proportions in Complex Sample Surveys,” Proceedings of Survey Research Methods Section, American Statistical Association, Alexandria, VA.

Griffin, D., Slud, E., and Erdman, C. (2014). “Reducing Respondent Burden in the American Community Survey's Computer Assisted Personal Visit Interviewing Operation - Phase 3 Results,” ACS Research and Evaluation Memorandum #ACS14-RER-28.

Hogan, H. and Mulry, M. H. (2014). “Assessing Accuracy of Postcensal Estimates: Statistical Properties of Different Measures,” in N. Hogue (Ed.), Emerging Techniques in Applied Demography. Springer. New York.

Hunley, Pat. (2014). “Proof of Equivalence of Webster’s Method and Willcox’s Method of Major Fractions,” Research Report Series (Statistics #2014-04), Center for Statistical Research and Methodology, U.S. Census Bureau, Washington, D.C.

Joyce, P., Malec, D., Little, R., Gilary, A., Navarro, A., and Asiala, M. (2014). “Statistical Modeling Methodology for the Voting Rights Act Section 203 Language Assistance Determinations,” Journal of the American Statistical Association, 109 (505), 36-47.

Mulry, M. H. (2014). “Measuring Undercounts in Hard-to-Survey Groups,” in R. Tourangeau, N. Bates, B. Edwards, T. Johnson, and K. Wolter (Eds.), Hard-to-Survey Populations. Cambridge University Press, Cambridge, England.

Mulry, M. H., Oliver, B. E., and Kaputa, S. J. (2014). “Detecting and Treating Verified Influential Values in a Monthly Retail Trade Survey,” Journal of Official Statistics, 30(4), 1–28.

Shao, J., Slud, E., Cheng, Y., Wang, S., and Hogue, C. (2014). “Theoretical and Empirical Properties of Model Assisted Decision-Based Regression Estimators,” Survey Methodology 40(1), 81-104.

Tang, M., Slud, E., and Pfeiffer, R. (2014). “Goodness of Fit Tests for Linear Mixed Models,” Journal of Multivariate Analysis, 130, 176-193.

Wright, T. (2014). “A Simple Method of Exact Optimal Sample Allocation under Stratification with Any Mixed Constraint Patterns,” Research Report Series (Statistics #2014-07), Center for Statistical Research and Methodology, U.S. Census Bureau, Washington, D.C.

Wright, T. (2014). “Lagrange’s Identity and Congressional Apportionment,” The American Mathematical Monthly, 121, 523-528.

Slud, E., Grieves, C., and Rottach, R. (2013). “Single Stage Generalized Raking Weight Adjustment in the Current Population Survey,” Proceedings of Survey Research Methods Section, American Statistical Association, Alexandria, VA.

Wright, T. (2013). “A Visual Proof, a Test, and an Extension of a Simple Tool for Comparing Competing Estimates,” Research Report Series (Statistics #2013-05), Center for Statistical Research and Methodology, U.S. Census Bureau, Washington, D.C.

Wright, T., Klein, M., and Wieczorek, J. (2013). “An Overview of Some Concepts for Potential Use in Ranking Populations Based on Sample Survey Data,” 2013 Proceedings of the World Congress of Statistics (Hong Kong), International Statistical Institute.

Ikeda, M., Tsay, J., and Weidman, L. (2012). “Exploratory Analysis of the Differences in American Community Survey Respondent Characteristics between the Mandatory and Voluntary Response Methods,” Research Report Series (Statistics #2012-01), Center for Statistical Research & Methodology, U.S. Census Bureau, Washington, D.C.

Wright, T. (2012). “The Equivalence of Neyman Optimum Allocation for Sampling and Equal Proportions for Apportioning the U.S. House of Representatives,” The American Statistician, 66 (4), 217-224.

Klein, M. and Wright, T. (2011). “Ranking Procedures for Several Normal Populations: An Empirical Investigation,” International Journal of Statistical Sciences, Volume 11 (P.C. Mahalanobis Memorial Special Issue), 37-58.

Slud, E. and Thibaudeau, Y. (2010). “Simultaneous Calibration and Nonresponse Adjustment,” Research Report Series (Statistics #2010-03), Statistical Research Division, U.S. Census Bureau, Washington, D.C.