
2021 FedCASIC Virtual Conference




    Day 1: Tuesday, April 13


    9:00 am - 9:55 am
    Opening Session
    Welcoming Remarks
    Keynote Address:  How Dealing With 2020 Changed Us
    Michael Thieme, Assistant Director for Decennial Census Programs at the U.S. Census Bureau

    10:00 am - 11:25 am
    Concurrent Sessions
    Session 1A:  Alternative Data Sources
    Sylvie Bonhomme, Statistics Canada*

    Statistics Canada has recently increased its emphasis on researching and introducing innovative collection methods for household surveys. As a result, response rates have stabilized and costs have been managed effectively over the last few years. The first part of the presentation will describe the initiatives that successfully contributed to alleviating the downward trend in response rates. However, continued research on new data collection methods and techniques is required, as the downward trend in response rates could return, along with the cost increases needed to counteract it. Statistics Canada is therefore researching more advanced approaches that might change its primary data collection more dramatically by complementing or replacing traditional collection. The next steps are expected to lead toward entirely new data collection techniques, such as sensor and scanner use, crowdsourcing, web scraping, automated voice interfaces, and other innovative methods. The second part of the presentation will describe some of the experiments, risks, and opportunities being considered at Statistics Canada.

    Michael Gerling, U.S. Department of Agriculture National Agricultural Statistics Service*
    Samuel Garber, U.S. Department of Agriculture National Agricultural Statistics Service
    Tyler Wilson, U.S. Department of Agriculture National Agricultural Statistics Service

    Since 2015, the National Agricultural Statistics Service (NASS) has explored building a survey list frame of agricultural operations from open-source information. To do this, one must learn to crawl before beginning to scrape. This presentation covers the web crawling methods NASS has found efficient for identifying pertinent agriculture websites within the vast sea of internet information. The scraping processes are then described, including the list frame information necessary for current and future needs. The layout and format resulting from these processes are discussed. Finally, the laborious processes used in data cleansing (completing a record's missing information, marking duplicate records, etc.) are reviewed with specific emphasis on pragmatism and working efficiently within the federal working environment. The reasons for choosing to automate some processes and to conduct others manually are discussed. Although the methods are discussed in the context of NASS's development of a survey list frame of hemp farms, the techniques and strategies highlighted are broadly applicable.
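
    The abstract above describes a crawl-then-scrape workflow. As a rough illustration only (not NASS's production pipeline; the seed URL, keyword list, and selectors below are invented), a minimal Python sketch using requests and BeautifulSoup might look like this:

    # Minimal crawl-then-scrape sketch (not NASS's production pipeline).
    # The seed URL, keyword list, and field selectors are illustrative only.
    import re
    from urllib.parse import urljoin

    import requests
    from bs4 import BeautifulSoup

    SEED_URLS = ["https://example.org/hemp-growers-directory"]  # hypothetical seed
    KEYWORDS = re.compile(r"\b(hemp|farm|acres|grower)\b", re.IGNORECASE)

    def crawl(seed_urls, max_pages=50):
        """Breadth-first crawl that keeps only pages mentioning target keywords."""
        seen, queue, relevant = set(), list(seed_urls), []
        while queue and len(seen) < max_pages:
            url = queue.pop(0)
            if url in seen:
                continue
            seen.add(url)
            try:
                html = requests.get(url, timeout=10).text
            except requests.RequestException:
                continue
            soup = BeautifulSoup(html, "html.parser")
            if KEYWORDS.search(soup.get_text(" ", strip=True)):
                relevant.append((url, soup))
            # Queue discovered links for the next crawl layer (domain filtering omitted).
            queue.extend(urljoin(url, a["href"]) for a in soup.find_all("a", href=True))
        return relevant

    def scrape_listing(soup):
        """Pull name-like fields for a candidate list-frame record (illustrative)."""
        heading = soup.find("h1")
        return {
            "name": heading.get_text(strip=True) if heading else None,
            "raw_text": soup.get_text(" ", strip=True)[:500],
        }

    if __name__ == "__main__":
        records = [scrape_listing(soup) for _, soup in crawl(SEED_URLS)]
        print(f"Scraped {len(records)} candidate list-frame records")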

    Monica L. Wolford, AHRQ*
    Sandra Pope, SoftDev Inc.
    Patricia Keenan, AHRQ

    In 2020, the Agency for Healthcare Research and Quality (AHRQ) collected health insurance booklets from individuals participating in the Medical Expenditure Panel Survey (MEPS) for the purpose of abstracting data on health insurance cost sharing, such as deductibles and copayments/coinsurance. Policyholders were asked to call or access their insurance company websites to request documentation and to return documents either by mail or by uploading them to the MEPS website. Once received, the unstandardized files were converted to PDF files. Camelot, a Python library, was used to extract tables, and Cloud DLP was used to redact sensitive text from images. The MEPS Abstraction Tool leveraged machine learning capabilities to enhance image recognition, computer vision, and natural language processing of the booklets' text. After ingestion, trained abstractors used the MEPS Abstraction Tool to confirm and supplement the ingestion results. The Tool provided a dashboard for both the abstractors and the Quality Control staff to monitor abstraction progress.
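
    For readers unfamiliar with Camelot, the table-extraction step described above can be sketched as follows. This is illustrative only: the file name and filter terms are hypothetical, and the Cloud DLP redaction and downstream ingestion steps are not shown.

    # Extracting cost-sharing tables from an insurance booklet PDF with Camelot.
    # The file name and output paths are hypothetical; the Cloud DLP redaction
    # step described in the abstract is not shown here.
    import camelot

    # Parse every page; "lattice" suits ruled tables, "stream" suits whitespace-aligned ones.
    tables = camelot.read_pdf("insurance_booklet.pdf", pages="all", flavor="lattice")
    print(f"Found {tables.n} tables")

    for i, table in enumerate(tables):
        df = table.df  # each extracted table is exposed as a pandas DataFrame
        # Keep rows that look like deductible/copay entries (illustrative filter).
        mask = df.apply(
            lambda row: row.str.contains("deductible|copay|coinsurance", case=False).any(),
            axis=1,
        )
        df[mask].to_csv(f"extracted_table_{i}.csv", index=False)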

    Peter Baumgartner, RTI International*
    Murrey Olmsted, RTI International
    Amanda Smith, RTI International
    Dawn Ohse, RTI International
    Bucky Fairfax, RTI International

    Coding responses from free-text, open-ended survey questions (i.e., qualitative analysis) can be a labor-intensive process. The resource requirements for qualitative coding can prevent researchers from extracting value from free-text responses and can influence decisions about the inclusion of open-ended questions on surveys. Machine learning (ML) has been proposed as a potential solution to alleviate coding burden, but traditional ML methods for text classification require large amounts of training data usually not available from surveys. With that problem in mind, we evaluated a ML approach that used responses from an open-ended question on a 2018 employee survey to train a model that predicted a set of codes applied to the same question on the 2019 survey. A coding team then adjudicated these predictions and provided coding corrections when applicable. We achieved promising performance despite an original training dataset of under 3,000 survey responses by using both data augmentation and recent advances in transfer learning models for natural language processing.
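
    As a simplified stand-in for the approach described above (the authors used data augmentation and transfer-learning models, neither of which is reproduced here), a baseline multi-label coding setup in scikit-learn could look like the following; the responses and codes are invented.

    # Baseline multi-label survey-coding setup in scikit-learn. This is a
    # simplified stand-in for the transfer-learning + data-augmentation approach
    # described in the talk; the example codes and responses are invented.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import MultiLabelBinarizer

    train_responses = [
        "More flexible scheduling would help",
        "My supervisor communicates clearly",
        "Better pay and more training opportunities",
    ]
    train_codes = [["work_life"], ["leadership"], ["compensation", "training"]]

    mlb = MultiLabelBinarizer()
    y_train = mlb.fit_transform(train_codes)  # one indicator column per code

    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), min_df=1),
        OneVsRestClassifier(LogisticRegression(max_iter=1000)),
    )
    model.fit(train_responses, y_train)

    # Predict codes for next year's responses; coders then adjudicate these suggestions.
    new_responses = ["I want more training on the new systems"]
    predicted = mlb.inverse_transform(model.predict(new_responses))
    print(predicted)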

    Session 1B:  Online Diaries

    Testing a Device Optimized Online Diary for Expenditure Data Collection
    Parvati Krishnamurty, U.S. Bureau of Labor Statistics*

    As response rates decline and the costs of fielding traditional in-person, phone, and mail surveys increase, many surveys are considering alternate methods of data collection, including web surveys. The Consumer Expenditure Surveys (CE) recently completed a test of a browser-based, device-optimized online diary prior to its implementation into production. The online diary is designed to collect information on small, frequently purchased items over a two-week period that is currently collected in a paper diary. The test was fielded from October 2019 to April 2020, with the goal of identifying any methodological, operational, or technical issues with the use of online diaries in the CE Diary Survey. This presentation will discuss design challenges, usability, operational issues, and future plans for online diary data collection.

    Lin Wang, U.S. Census Bureau*
    Anthony Schulzetenberg, U.S. Census Bureau
    Alda Rivas, U.S. Census Bureau
    Heather Ridolfo, U.S. Department of Agriculture National Agricultural Statistics Service
    Shelley B. Feuer, U.S. Census Bureau

    In designing a mobile survey, data loss and data accuracy are two particular concerns for survey designers. Data entry is a crucial task in survey data collection because entering inaccurate data or failing to enter data increases survey measurement error and nonresponse error. In the present study, the authors implemented an experimental approach to developing an optimal data entry model based on empirical behavioral analysis, using a national sample survey on households' food acquisition as a case study. A paradigm of sequential card sorting was developed to simulate the process of respondents entering food information. Based on the findings from three card sorting studies, we recommend the following data entry order: food acquisition location; food items (food item name, quantity and unit, food item cost); payment method. This study demonstrates that card sorting techniques combined with rigorous experimental design can be an effective method for mobile survey design research.

    Adam Kaderabek, Institute for Social Research University of Michigan*
    Brady T. West, Institute for Social Research, University of Michigan
    John A. Kirlin, Kirlin Analytical Services
    Elina T. Page, U.S. Department of Agriculture Economic Research Service
    Jeffrey M. Gonzalez, U.S. Department of Agriculture Economic Research Service

    The USDA's first National Household Food Acquisition and Purchase Survey (FoodAPS-1) was a nationally representative survey that collected data about household food purchases and acquisitions. In advance of designing FoodAPS-2, an Alternative Data Collection Method (ADCM) study was conducted that asked respondents to use a web application and to scan and submit receipts for reported purchases. A validation of FoodAPS ADCM data was conducted using the submitted receipts. The objective of the receipt validation was to confirm the accuracy of respondent-reported expenditure data using the scanned receipts. The total cost, number of items, and item prices reported were key variables of interest. The validation effort also revealed that certain properties of the receipts directly influenced their efficacy for data validation. This presentation will discuss the accuracy of FoodAPS events with a corresponding receipt and articulate the properties of receipts that were most influential during validation.

    Jeffrey M. Gonzalez, U.S. Department of Agriculture Economic Research Service*
    Mark Denbaly, U.S. Department of Agriculture Economic Research Service
    Linda Kantor, U.S. Department of Agriculture Economic Research Service
    Elina T. Page, U.S. Department of Agriculture Economic Research Service
    John A. Kirlin, Kirlin Analytic Services

    The USDA's National Household Food Acquisition and Purchase Survey (FoodAPS-1) was the first nationally representative survey of U.S. households to collect unique and comprehensive data about household food purchases and acquisitions. Development of the survey's second round, FoodAPS-2, is underway, and its design and data collection protocols draw on the lessons learned from FoodAPS-1. Additionally, changes in the surveying environment and in how people acquire food have created a need to leverage advancements in web, mobile, and other digital technologies to combat concerns associated with data quality, including nonresponse and underreporting, respondent burden and fatigue, and significant backend data processing times. This session presents the current plans for FoodAPS-2, which will be evaluated in a forthcoming large-scale field test. We'll provide an overview of the key survey design features, present an in-depth look at a native smartphone application (the primary mode of data collection), highlight how the application uses the smartphone's built-in features, and discuss plans for leveraging extant databases to reduce burden and improve quality in real time.

    Session 1C:  2020 Decennial Census

    Elizabeth Nichols, U.S. Census Bureau*
    Shelley Feuer, U.S. Census Bureau
    Erica Olmsted-Hawala, U.S. Census Bureau
    Jasmine Luck, U.S. Census Bureau

    For the 2020 Census, over 13 million calls came into the Census Questionnaire Assistance (CQA) telephone help line. The telephone help line supported 14 languages with over 9,000 Customer Service Representatives (CSRs) hired across 10 U.S. call centers. To help evaluate the operation, focus groups with a sample of CSRs were planned to be in person at each call center, but due to COVID-19 travel restrictions, moderators from the Census Bureau conducted them remotely using Skype for Business. CSRs remained at their call centers under social distancing requirements, while moderators and observers joined from their respective homes. Census Bureau staff had conducted remote focus groups with call center agents throughout the decade during the census tests and were familiar with the protocol, but the addition of the social distancing requirements at the call centers led to some new challenges. In the talk, we will share our procedures for conducting remote focus groups, lessons learned, and suggestions for how to conduct remote focus groups when social distancing requirements are and are not necessary.

    Lydia Shia, U.S. Census Bureau*

    In 2020, the U.S. Census Bureau used the internet as a primary response option for the first time. The increasing use of web and mixed-mode surveys improved Census awareness and promoted self-response in a cost-effective manner. Unlike traditional datasets, paradata contains information on direct interactions between the respondents and the web instrument. The dataset reflects user behaviors and instrument performance rather than the response outcomes themselves. It is organized into sessions, each bundling user activities, which gives great insight into the response process in the self-administered web instrument. Paradata captures a variety of factors that reflect respondent interactions with the Census instrument, such as navigation behaviors (number of logins, breakoffs, and completion time), response sufficiency, and user characteristics (device, ID, and browser types), at different levels of detail. This presentation provides an overview of the use and application of paradata to improve the quality of the 2020 Census outcome. It introduces real-time quality measurements and resolutions based on respondent behaviors in the 2020 Census internet instrument.

    Brett Moran, U.S. Census Bureau*

    The 2020 Census paradata contains special links called Source Tracking URLs. In this presentation, we will discuss how we used these URLs to monitor the Mobile Questionnaire Assistance (MQA) operation during the census, and how we are currently using the URLs to assess both the MQA operation and the 2020 Census Digital Advertising (Digital Ad) campaign. The MQA operation involved sending Census representatives to low self-responding areas to encourage response to the census either through respondents' own devices or through interviews with representatives. We will discuss how we used paradata URLs to monitor the MQA operation in real time, how that monitoring contributed to the operation's overall success, and how we are using the URLs along with response data to assess the operation. The Digital Ad campaign created and deployed numerous digital advertisements aimed at encouraging response to the 2020 Census. We will discuss how we are using the paradata URLs to assess the effects of the campaign on response rates for different demographic groups. Finally, we will discuss some of the limitations of the 2020 Census paradata, as well as recommendations for future research.
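
    To illustrate the general idea of source-tracking analysis, the sketch below tallies response sessions by a tracking parameter found in paradata URLs. The URL structure and the "src" parameter name are hypothetical, not the actual 2020 Census Source Tracking URL format.

    # Tallying response sessions by source-tracking parameter. The URL structure
    # and the "src" parameter name are hypothetical, not the actual 2020 Census
    # Source Tracking URL format.
    from collections import Counter
    from urllib.parse import urlparse, parse_qs

    paradata_urls = [
        "https://respond.example.gov/start?src=mqa&lang=en",
        "https://respond.example.gov/start?src=digital_ad&campaign=a17",
        "https://respond.example.gov/start?src=mqa",
        "https://respond.example.gov/start",
    ]

    def source_of(url):
        """Return the tracked source for a session-start URL, or 'untracked'."""
        qs = parse_qs(urlparse(url).query)
        return qs.get("src", ["untracked"])[0]

    counts = Counter(source_of(u) for u in paradata_urls)
    print(counts)  # e.g. Counter({'mqa': 2, 'digital_ad': 1, 'untracked': 1})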

    Dave Hackbarth, U.S. Census Bureau
    Frank Fisiorek, U.S. Census Bureau
    Mark Markovic, U.S. Census Bureau*

    More than 731,000 mobile devices were acquired through a Decennial Device as a Service (dDaaS) contract in support of the 2020 Census. The program included key milestones and encountered numerous logistical challenges, such as:
    - the vendor purchasing, provisioning, and shipping over 731,000 total devices for use in multiple field operations, with varying end-user requirements;
    - distribution of devices to 255 offices through a partnership with UPS;
    - asset management: the Intelligent Tracking and Management System (ITMS) was developed to manage the order, shipment, delivery, and custody transfers of devices;
    - training: development of accountability procedures and training/guiding staff through the rollout of the ITMS;
    - break/fix: replacing broken devices during operations;
    - retrieving devices from end-users and returning them to the vendor; and
    - reconciling the status of all devices in the program with the vendor.
    The presentation will elaborate on the challenges, lessons learned, and process improvements that were implemented for key activities throughout the program. We will conclude with a set of statistics highlighting the effectiveness of the overall program.


    11:30 am - 11:55 am
    Posters and Demonstrations
    P&D Session 1A:  Meet the presenters

    Martha (Virginia) Gwengi, U.S. Census Bureau*

    The Census Bureau serves as the data collection agent for AHRQ for the Medical Expenditure Panel Survey-Insurance Component (MEPS-IC). The survey collects data on health insurance from private and public sector employers. In this work, we focus on the 1,000 government units that are sampled with certainty. Certainty government units are sampled every year, and to reduce the burden on these respondents, they are only asked unit-level questions and some health insurance questions. The respondents are then asked to upload plan forms to the data collection instrument or to provide websites where Census analysts can search for the plan forms and manually extract the remaining health insurance information. This transfers the response burden to Census analysts. We seek to lessen this burden using data extraction and web scraping tools. We create a tool to extract Summary of Benefits and Coverage (SBC) forms, which have a standardized format mandated by the Affordable Care Act. The extraction tool collects the information faster than the manual process, while the web scraping tool allows us to crawl each webpage and search for the SBC forms more efficiently.

    Grace Jaroscak, NORC, University of Chicago
    Kali Defever, NORC, University of Chicago*

    The Medicare Current Beneficiary Survey (MCBS) is a continuous, multipurpose survey of a nationally representative sample of the Medicare population, conducted by the Centers for Medicare & Medicaid Services (CMS) through a contract with NORC. The survey collects information from respondents about prescription medicine use, including medicine name, strength, and form. These medicines are linked to CMS administrative prescription medicine claims data, creating a uniquely rich data source. In 2017, CMS and NORC revised a lookup tool built in 2015, which integrates a high-quality commercial medicine name database into the questionnaire. The revised tool allows interviewers to select medicine details directly from the database, minimizing manual entry of data. The impact on reported data quality will be examined by assessing the match rates for survey-reported medicines to claims data. This poster will present the results of this descriptive analysis. The analysis includes: (1) match rates of survey-reported medicines to claims data, (2) the impact of the commercial database on match rates, and (3) the impacts of respondent characteristics and medicine name length on match rates.

    Integrating Authoritative Sources to Enhance Survey Response Quality
    Juan Salazar*

    As we look to the future of surveys, and with response levels decreasing, federal agencies like the Census Bureau have laid out their vision for the use of Authoritative (data) Sources, including Administrative Records, to further supplement responses with data that has been confirmed to be dependable for its purpose. But Administrative Records and third-party data will, by definition, come in different formats, so in order to link new data to existing records, it's necessary to create a flexible data model while still supporting the transactional aspects of a traditional database. Our poster will show the workflow for arriving Authoritative Data: how to intake streams, stage the raw data, and iterate it toward a gold copy that can be used for data linkage and Advanced Analytics. Note: This is a poster presented by the private sector, and the reference to the U.S. Census has no direct correlation to the Bureau.

    P&D Session 1B:  Meet the presenters

    Tyson Weister, U.S. Census Bureau*

    After a few years of ongoing development, testing, and user feedback, the Census Bureau is now over a year into the launch of its new data access platform, data.census.gov. This site represents a new chapter in the Census Bureau's data dissemination approach by centralizing access and allowing for a more rapid response to user feedback, replacing the previous site that had been used for the last 20 years. In this session, we will demo the new interactive site on data.census.gov. Attendees will explore the platform's latest tables, maps, and data visualizations in an easily digestible format, and will have the opportunity to provide feedback to make Census data easier to access.

    Marilyn Seastrom, NCES*
    Jennifer Nielsen, NCES
    Zac Mangold, Sanametrix
    Melissa Roessler, Sanametrix

    IPAM is a web-based record management system for processing public trust security applications throughout the stages of agency-level review and resubmission. IPAM was designed to manage and track: contract and Contracting Officer Representative assignment, initiation of applicants, approval and/or rejection of the Public Trust application, fingerprint submission, release to the Defense Counterintelligence and Security Agency (DCSA), DCSA approval or rejection, DCSA Schedule-Accepted status, and adjudication outcome. IPAM has significantly reduced the backlog of applications, improved processing times by automating data validations, streamlined the process for requesting an approval, and reduced the overall risk of sharing PII across organizations. In addition, the IPAM system includes data analysis reporting that identifies the most common rejection reasons and processing times, along with contract counts, status reports, and adjudication outcome reports. These reports contribute to the department's continuous improvement process to ensure applications are processed in a timely manner with as few errors as possible.

    P&D Session 1C:  Meet the presenters

    Christopher N. Carrino, U.S. Census Bureau*

    With the exponential growth in recent years of interest in Data Science, Machine Learning, and Artificial Intelligence, the field of Data Management has become inundated with ambiguous terms and an undefined scope. In this presentation we define the often used and more often misused terms of Metadata, Paradata and Enterprise Data. These terms are often used in working conversations, blog articles and research papers even though the terms lack a formal definition and a shared understanding. We'll examine conflicting examples of their use and provide a framework to de-conflict these meanings.

    Todd Johnsson, ExactData*
    Beverly Harris, U.S. Census Bureau

    The industry term is Fully Synthetic Data. On the Decennial program, it was more accurately called Correlated Simulated Data. The key privacy characteristic is that the data sets are NOT a derivation of production data: the link to production data is intentionally and completely broken, providing Privacy by Design in the dev and test environments. The simulated data itself is correlated across inputs and systems. It is longitudinally consistent, has happy and non-happy path scenarios, has intended patterns, and has event-level artifacts and intended aggregate-level statistics. It has the interrelated complexity and realism that sophisticated data processing application dev and test require. Imagine simulated data for dev and test of Survey Responses, Admin Records, and MAF/TIGER products (MAFX, GRFC, GRFN), all correlated, with longitudinal consistency, with ground truth, scaled to the entire US population. Each dataset contained thousands of files in various formats, types, schemas, and versions, all correlated. With current configurable, extendable, maintainable Census data models, most datasets can be generated within a day, with millions of correlated records, for any defined geography.


    1:00 pm - 2:25 pm
    Concurrent Sessions
    Session 2A:  Machine Learning

    Colleen Spagnardi, RTI International*
    Alex Waldrop, RTI International

    Transcripts hold a wealth of data, but to compare students across institutions, data such as course content must be standardized. Much of this categorization is done manually by training coders to match keywords to the best-fitting code. Leveraging artificial intelligence (AI) enhances this process and reduces the labor required to code data to existing taxonomies. In this talk, we describe how RTI piloted an AI-assisted recommendation engine to assign postsecondary fields of study and course codes to federal coding taxonomies for the Department of Education. Surveys could employ similar methods to ease the burden of coding responses both during real-time participation and in post-production activities.

    Catherine Billington, Westat
    Jiating (Kristin) Chen, Westat*
    Gonzalo Rivero, Westat
    Andrew Jannett, Westat

    Field requests for data updates to CAPI interviews are important for data quality in panel studies. Processing these open text comments is time-consuming and costly. A pilot project used Natural Language Processing (NLP) to assign category variables to comments to improve processing efficiency without loss of data quality. We trained a lasso model to perform this classification. A machine-learning (ML) pipeline in Python extracted linguistic features from the text fields. The output met our quality requirements, and we integrated a production version with our existing web application on a panel study. Data technicians must assign a category to each comment. Our ML model presents the top-3 categories by probability. Technicians can select one, or enter any category. We discuss how technicians used this feature in terms of efficiency and data quality. Superfluous comments account for about 40% of entries. We look at how comments assigned the 'Other' category by the model were dispositioned, and evaluate a similar approach to identify non-actionable comments. We discuss the risks and benefits of this approach, which will vary by project based on priorities, cost, and acceptable risk.
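
    A minimal sketch of the kind of L1-penalized ("lasso") text classifier with top-3 category suggestions described above is shown below. The comments, categories, and feature pipeline are invented for illustration and do not reproduce the production system.

    # Sketch of an L1-penalized ("lasso") text classifier that surfaces the top-3
    # most probable categories for each field comment. Categories and comments
    # are invented; the production feature pipeline is not reproduced here.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    comments = [
        "Respondent says the address is a vacant lot",
        "Phone number updated per respondent",
        "No issues, just confirming the appointment",
    ]
    labels = ["address_update", "contact_update", "no_action"]

    clf = make_pipeline(
        TfidfVectorizer(),
        LogisticRegression(penalty="l1", solver="liblinear"),
    )
    clf.fit(comments, labels)

    def top3(comment):
        """Return the three most probable categories for a comment."""
        probs = clf.predict_proba([comment])[0]
        order = np.argsort(probs)[::-1][:3]
        return [(clf.classes_[i], round(float(probs[i]), 3)) for i in order]

    print(top3("Respondent gave a new mailing address"))

    One appeal of the L1 penalty in this setting is that it zeroes out most feature weights, which keeps the learned vocabulary per category small and easier to review.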

    David H. Oh, U.S. Bureau of Labor Statistics*

    The Occupational Requirements Survey (ORS) contains text data on the critical tasks performed by selected jobs. With growing interest in the types of work being done in the U.S. economy, the ORS task data has the potential to provide invaluable insights for the public. However, originally designed simply to aid in review of the coded ORS elements, the tasks contained within ORS for each selected job are stored as free-form text with minimal structure, limiting their usability beyond manual review. To overcome this, the Office of Compensation and Working Conditions (OCWC) at the Bureau of Labor Statistics (BLS) has been exploring various ways to utilize the rich information contained within the ORS task data through the use of natural language processing (NLP) and machine learning techniques. This presentation provides a brief description of the task data contained in ORS, discusses the NLP methods used to process the text, and highlights some ongoing OCWC projects that leverage them.

    Sudip Bhattacharjee, University of Connecticut, U.S. Census Bureau*
    Nevada Basdeo, U.S. Census Bureau
    Ugochukwu Etudo, University of Connecticut
    Sara Alaoui, U.S. Census Bureau

    Response rates are dropping and data collection costs are rising in federal surveys. As a result, practitioners use response propensity models to mitigate cost while retaining response rates. We evaluate key determinants of survey completion for the American Community Survey (ACS) using paradata of Contact History Information (CHI). To our knowledge, unstructured field representative (FR) notes have been omitted from paradata models within the Census Bureau. We believe that incorporating these notes would improve the performance of paradata models. From the notes, we identify themes and terms that are useful for estimating response propensity. Additionally, FRs cannot contact a respondent if the response burden exceeds a threshold. We present the first steps in solving this optimization problem. We show two findings: (1) combining CHI and FR notes can improve response propensity estimates, and (2) FR notes can be incorporated in calculating burden scores for respondents. Our text mining of FR notes may also be useful in training FRs. Results from our study can be generalized to other surveys that capture both numeric and textual paradata from survey operations.
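
    The sketch below illustrates one way structured contact-history features and text features from FR notes could be combined in a single response propensity model, using scikit-learn's ColumnTransformer. The column names and records are invented; this is not the authors' actual model.

    # Combining structured contact-history (CHI) features with text features from
    # field representatives' notes in one response-propensity model. Column names
    # and records are invented for illustration.
    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline

    cases = pd.DataFrame({
        "n_contact_attempts": [2, 6, 1, 4],
        "ever_refused": [0, 1, 0, 1],
        "fr_notes": [
            "respondent friendly, asked to come back after work",
            "door slammed, said do not return",
            "left notice of visit, no one home",
            "wants to see badge and official letter first",
        ],
        "completed": [1, 0, 0, 1],
    })

    features = ColumnTransformer([
        ("chi", "passthrough", ["n_contact_attempts", "ever_refused"]),
        ("notes", TfidfVectorizer(), "fr_notes"),  # text column passed as a single name
    ])

    propensity_model = Pipeline([
        ("features", features),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    propensity_model.fit(cases.drop(columns="completed"), cases["completed"])
    print(propensity_model.predict_proba(cases.drop(columns="completed"))[:, 1])

    Keeping the numeric CHI columns and the TF-IDF notes features in one fitted pipeline means a single object can score new cases, which is convenient when propensity scores feed directly into case prioritization.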

    Session 2B:  Challenges in Technology and Survey Computing

    Moderator:
    Karen Davis, RTI International
    Panelists:
    Bryan Beverly, U.S. Bureau of Labor Statistics
    James Berry, U.S. Energy Information Administration
    Kyle Fennell, NORC, University of Chicago
    Gregg Peterson, University of Michigan

    Panelists will identify the top challenges facing their organizations today given the changing survey technology, data systems, and programming environments. Projects today often include innovative survey technologies, specialized programming and customization, administrative and extant data sources, and the integration of different devices and technologies to support data collection. The panelists will discuss the ways that their organizations are dealing with the environmental changes that they have identified, and offer examples and lessons learned in addressing these challenges.

    Session 2C:  Development and Pilot of the OMB ICR Application

    Moderator:
    Marilyn Seastrom, NCES
    Panelists:
    Jennifer Nielsen, NCES
    Bob Sivinski, OMB
    Carrie Clarady, NCES
    Zac Mangold, Sanametrix

    The National Center for Education Statistics has recently completed a pilot associated with the 2020 Federal Data Strategy, Action 17, to develop an automated tool for Office of Management and Budget Information Collection Review (ICR) documents that supports data inventory creation and updates. The pilot project built upon NCES' research and the Department of Education's Data Inventory to develop electronic templates within a capture/review system that generates Supporting Statements Parts A and B of the ICR while tagging the data elements needed to populate the ED Data Inventory. This panel presents stakeholder perspectives from across the project, showcases the work conducted to date on this application, and allows for discussion with participants about the project. Marilyn Seastrom, NCES, will discuss the background and context of the project. Jennifer Nielsen, NCES, Bob Sivinski, OMB, and Carrie Clarady, NCES, will discuss the OMB ICR from the perspective of the Government, while Zac Mangold will discuss the developer's perspective. Lastly, the panel will present any up-to-the-minute efforts from the current work to enhance and improve the application.


    2:30 pm - 2:55 pm
    Posters and Demonstrations II
    P&D Session 2A:  Meet the presenters

    Maurice Gonzenbach, Caplena*

    Open-ended feedback is becoming more and more abundant: ad-hoc studies, trackers, and transactional feedback often contain open questions, both reducing bias and broadening the depth of opinions. However, many researchers struggle to evaluate verbatims in a scalable, actionable way and to find the right balance between automation and quality. While AI methods have been promising relief for a long time, "traditional" AI methods are usually based on manually defined rules, which neither scale well nor deliver the desired quality. Modern AI methods, such as machine learning, have set out to fix this. Their downside is the huge amount of training data they require as well as their "black-box" behavior. Recent developments in machine learning (specifically transfer learning) allow categorizing texts with only a few dozen training examples while keeping humans in the loop. We present Caplena.com, an easy-to-use platform that enables market research agencies and corporations worldwide to efficiently categorize their open-ends and then perform advanced analyses (such as correlation or driver analysis) on the results, while keeping full control over the process.

    Putting Data Science Skills into Action
    Lisa Frid, U.S. Census Bureau*

    Census piloted a Data Science Training Program to grow the beginner and intermediate data science skills of employees. While the program included low-cost online learning components that can be readily translated to other agencies, the highest-impact portions of the program were those that allowed participants to practice their skills and get to know the Census-specific technical environment and datasets. Evaluations revealed the importance of selecting these hands-on opportunities based on their ability to translate to current Census data science needs. The success of the pilot, student feedback, and the evolving nature of data usage at Census indicate a need to (1) intentionally select participants based on their potential for using data science skills in their work, and (2) include content that specifically addresses the Census technical environment and opportunities for students to get hands-on experience. This presentation will share how we addressed these challenges in our pilot and plan to address them in our upcoming program, with the goal of educating other federal stakeholders on how to best grow their data science workforce through specialized and hands-on training.

    Benjamin Feder, Coleridge Initiative*
    Julia Lane, Coleridge Initiative, New York University
    Clayton Hunter, Coleridge Initiative
    Ekaterina Levitskaya, Coleridge Initiative

    Survey providers are being urged to combine new sources of data with their surveys. In this demonstration, we will show you how text analysis can be applied to public research information to augment survey data. We will provide an overview of the record linkage to administrative data. We will then walk through a sample Jupyter Notebook and show you how FedReporter data abstracts have been parsed into specific topics of research to better understand doctoral recipients' academic careers. You can then take the Jupyter Notebooks and apply the code to generate topics from other text sources.
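
    As a small, self-contained illustration of the topic-modeling step (not the Coleridge Initiative notebooks themselves, and not FedReporter data), the following fits a latent Dirichlet allocation model to a few invented abstract-like texts and prints the top words per topic:

    # Small topic-model sketch in the spirit of the notebook demonstration:
    # fit LDA on a handful of abstract-like texts and print the top words per
    # topic. The texts are invented, not FedReporter abstracts.
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer

    abstracts = [
        "We study gene expression in cancer cells using sequencing data",
        "A randomized trial of a new vaccine for influenza in older adults",
        "Machine learning models for predicting protein folding structures",
        "Survey methods for measuring food insecurity in rural households",
    ]

    vectorizer = CountVectorizer(stop_words="english")
    dtm = vectorizer.fit_transform(abstracts)  # document-term matrix

    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    lda.fit(dtm)

    terms = vectorizer.get_feature_names_out()
    for k, topic in enumerate(lda.components_):
        top_terms = [terms[i] for i in topic.argsort()[::-1][:5]]
        print(f"Topic {k}: {', '.join(top_terms)}")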

    P&D Session 2B:  Meet the presenters

    Nola du Toit, NORC, University of Chicago*
    Jennifer Titus, NORC University of Chicago
    Michael Latterner, NORC, University of Chicago

    The Medicare Current Beneficiary Survey (MCBS) is an ongoing survey of a representative national sample of the Medicare population, including beneficiaries aged 65 and over and beneficiaries aged 64 and below with certain disabling conditions. With the emergence of the COVID-19 pandemic in the U.S., the Centers for Medicare & Medicaid Services quickly collected vital information on how the pandemic impacted the Medicare population. The MCBS COVID-19 Supplements are a series of nationally representative, cross-sectional telephone surveys of MCBS respondents conducted by NORC as a supplement to MCBS annual data collection. To make the survey's findings more accessible to the public, NORC constructed public use files (PUFs) and developed the MCBS COVID-19 Data Tool, an interactive website created using R Shiny and D3 to present PUF data. The tool aims to accelerate research with this data and ultimately help inform stakeholders' decisions about the pandemic. Our demonstration will address the process undertaken to develop the tool, including the technical and methodological challenges overcome.

    Jennifer Nielsen, NCES*
    Marilyn Seastrom, NCES
    Zac Mangold, Sanametrix
    Rickita Walley, Sanametrix

    The National Center for Education Statistics has recently completed a pilot project for the 2020 Federal Data Strategy, Action 17, to develop an automated tool for Office of Management and Budget Information Collection Reviews (ICRs) that supports data inventory creation and updates. The pilot project built upon NCES' research and the Department of Education's Data Inventory by developing electronic templates within a capture/review system that generates Supporting Statements Parts A and B of the ICR while tagging the data elements needed to populate the ED Data Inventory. To date, the pilot tool has been tested within two agencies (NCES and the National Science Foundation's National Center for Science and Engineering Statistics), and stakeholder input and feedback have been obtained through 18 stakeholder engagement activities. The OMB ICR Application has been designed in such a manner that, should additional funding be secured in later years, implementation of this automated process could be extended beyond the pilot and into other federal agencies. This demonstration will showcase the work conducted to date on this application and allow for discussion.

    P&D Session 2C:  Meet the presenters

    Nestor Alexis Ramirez, RTI International*

    Graph databases are useful for storing and traversing relationships and connections between data and have several use cases throughout the survey lifecycle. In a graph database, data objects are represented by nodes, or vertices, and relationships between objects are represented by edges. In situations where modeling and understanding relationships among large amounts of data are important - for example, the connection between sample member survey response and sample member contact methods in a longitudinal study - using graph databases can provide projects with several advantages compared to traditional relational databases. This demonstration will showcase a use case of graph databases as part of the derived variables task of the Baccalaureate and Beyond longitudinal study. JanusGraph and Neo4j, two graph database systems, will be shown as part of the demonstration, and potential use cases across the survey lifecycle will be discussed.
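
    As an illustration of the node/edge model described above, the sketch below loads a few invented sample-member and contact-method records into Neo4j and traverses the relationships. It assumes the Neo4j 5.x Python driver and a locally running database; the connection details and data model are illustrative, not the Baccalaureate and Beyond implementation.

    # Representing sample members, contact methods, and response status as nodes
    # and relationships, using the Neo4j Python driver (5.x API). Connection
    # details and the data model are illustrative only.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

    def load_case(tx, case_id, contact_method, responded):
        tx.run(
            """
            MERGE (m:SampleMember {case_id: $case_id})
            MERGE (c:ContactMethod {name: $contact_method})
            MERGE (m)-[:CONTACTED_VIA]->(c)
            SET m.responded = $responded
            """,
            case_id=case_id, contact_method=contact_method, responded=responded,
        )

    with driver.session() as session:
        session.execute_write(load_case, "B2021-0001", "email", True)
        session.execute_write(load_case, "B2021-0002", "text_message", False)

        # Traverse: which contact methods are linked to responding members?
        result = session.run(
            "MATCH (m:SampleMember {responded: true})-[:CONTACTED_VIA]->(c) "
            "RETURN c.name AS method, count(m) AS members"
        )
        for record in result:
            print(record["method"], record["members"])

    driver.close()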

    Marcy Gialdo, Mathematica*
    Mark Lafferty, Mathematica
    Dan Glovic, Mathematica

    Collecting and monitoring data is a key component of programs providing community social services. Programs not only use data to evaluate, assess performance, and track quality improvement but also collect participant data as a condition of receiving federal grant funding. To address this growing need, Mathematica developed RAPTER, a scalable, secure, cloud-based data system enabling federal agencies and grantees to consistently and effectively collect, track, and report participation data to inform program efficacy. To meet privacy, confidentiality, and data security standards, RAPTER is NIST compliant and FedRAMP ready. RAPTER has customizable modules for participant enrollment, random assignment, service tracking, survey administration, and reporting. Within RAPTER, studies can create and manage cohorts and monitor data with built-in reporting, including a dashboard. This poster presentation explains how RAPTER supports consistent and effective data collection, analysis, and reporting across programs over time. It demonstrates how RAPTER helps solve challenges in managing data collection, specifically highlighting use cases and system benefits during the COVID-19 pandemic.

    Mark M. Pierzchala, MMP Survey Services, LLC*

    The Blaise 5 Choréo MultiMode Management System will be described. The name Choréo is derived from the word choreography because like a choreographer, the system must coordinate many moving parts all at one time. Choréo works in concert with the Blaise CATI management system, the CAPI Case Management System (CMA) and other modes like web and paper. The system fits into an institute's existing infrastructure. For example, most institutes already have a Sample Management System (SMS) to send mail and email; these are not replicated. There is a Survey Handling Database. It keeps track of all survey management happenings, statuses, counts, and indicators. It does not contain any Personally Identifiable Information (PII). It issues instructions to Blaise 5 modules and to other institute systems. Choréo operates the way each institute wants. This is done through specification databases such as for (1) survey design parameters, (2) happenings, and (3) actions. There are hooks in the system to implement management responsive design, as well as to keep track of survey burden. An institute can continue to use its own coding scheme and naming conventions.


    3:00 pm - 4:25 pm
    Concurrent Sessions
    Session 3A:  Challenges and Achievements Using AI and Data Science

    Moderator:
    Jane Shepherd, Westat
    Panelists:
    Rebecca Hutchinson, U.S. Census Bureau
    Alex Measure, U.S. Bureau of Labor Statistics
    Jason Keller, NORC, University of Chicago
    Gayle Bieler, RTI International
    Marcelo Simas, Westat

    This panel will discuss the challenges and achievements that organizations have encountered in applying AI and data science approaches to their survey/data management projects. While there is significant publicity about the application of AI and data science techniques, the level of sophistication and experience varies. Panelists will explore strategic approaches, best practices, and examples of where they have employed these techniques, and cover lessons learned in doing so.

    Session 3B:  Workshop: Tools for Survey and Census Planning


    Kathleen M. Kephart, U.S. Census Bureau*
    Suzanne McArdle, U.S. Census Bureau
    Luke Larsen, U.S. Census Bureau

    The presenters will demonstrate how to use the Planning Database (PDB) and the Response Outreach Area Mapper (ROAM), giving several examples of their capabilities. The PDB is an easy-to-access dataset that is updated annually. It contains the greatest hits of American Community Survey (ACS) 5-year estimates. These include popular U.S. housing, demographic, socioeconomic, and operational statistics from the 2010 Decennial Census and the most recent ACS dataset. The PDB also contains the Low Response Score (LRS), which is a predicted mail return rate by block group and by census tract. New to the 2019 and 2020 PDB are ACS 5-year internet access statistics and 5-year ACS self-response rates. The ROAM is an interactive mapping application, developed to make it easier to identify hard-to-survey areas and the socioeconomic and demographic profiles of those areas. It is based on a subset of PDB data at the census tract-level, including the LRS, poverty status, education level, race, Hispanic origin, and language spoken at home.

    Session 3C:  Research Methods

    Matt Jans, ICF*
    Georgette Lavetsky, Maryland Department of Health
    Samantha Collins, ICF

    Cognitive testing is a critical step in developing survey questions. Traditional methods typically include semi-structured, qualitative interviews with a small number of participants, and are generally conducted before production interviewing. This presentation asks, "Can we train standardized survey interviewers to administer cognitive probes and obtain information helpful for revising questions?" We report on question testing for the 2016 Maryland Behavioral Risk Factor Surveillance System (BRFSS), in which we incorporated five cognitive probes (e.g., "What did the word 'neighborhood' mean to you in the preceding set of questions?") into the BRFSS CATI interview following nine new (i.e., test) questions. The cognitive probes were written as standardized survey questions to adapt the cognitive interviewing method to the training and skills of standardized phone interviewers. We collected over 1,600 interviews with cognitive probe data, which is much larger than most cognitive tests. Adding cognitive probes to production CATI interviews allowed us to conduct pretesting more quickly than traditional cognitive testing and without disrupting production data collection.

    Erica Olmsted-Hawala, U.S. Census Bureau*
    Elizabeth Nichols, U.S. Census Bureau

    Before COVID-19, most usability sessions were conducted in person. For household surveys, sessions took place at the Census Bureau's headquarters, a library, or a community center. In-person testing is standard practice for many reasons: it helps interviewers orient participants to the testing procedures, gives interviewers opportunities to observe participant behavior, and simplifies logistics. However, in the spring of 2020, new social distancing requirements made such testing impossible. The usability team at the Census Bureau needed to conduct remote testing - that is, virtual testing using the Internet, with interviewer and participant in different locations - for a project that was to begin in June. This talk shares our experiences with how we transformed our traditional in-person user testing to accommodate COVID-19 social distancing restrictions and successfully conducted remote usability testing with participants. We share tips and tricks on remote testing, including best practices for familiarizing participants with new software, strategies for building connection and rapport with long-distance participants, obtaining informed consent, and working with observers.

    Matt Jans, ICF
    Davia Moyse, ICF*
    Yang Yang Deng, ICF
    Ronaldo Iachan, ICF
    Lee Harding, ICF
    Kristie Healey, ICF
    James Dayton, ICF
    Scott Worthge, MFour
    Laura O'Campo, MFour
    Sarah Chung, MFour

    Now, more than ever, there are myriad types of nonprobability samples that can replace or supplement probability samples. This study uses two nonprobability sample sources to evaluate how well they mirror estimates from the Behavioral Risk Factor Surveillance System (BRFSS): the Surveys-on-the-Go mobile panel, which only includes people who have a smartphone, and Amazon Mechanical Turk (MTurk). Respondents from both sources were asked health questions selected from the BRFSS. Initial results show that nonprobability sample source accuracy is estimate-specific, and that estimates of exercise, general health, and overall insurance coverage may be accurately obtained from nonprobability samples. Overall, there appears to be a pattern of nonprobability samples overrepresenting poorer health. This suggests that nonprobability surveys may be useful replacements for some health estimates, but not others. Researchers need to assess trade-offs cautiously and within the context of specific key health indicators.

    Jennifer Hunter Childs, U.S. Census Bureau*
    Emilia Peytcheva, RTI International

    The Census Bureau, in partnership with several other Federal Statistical Agencies, awarded a collaborative agreement to RTI International to build the Ask U.S. Panel. RTI will design, build, and maintain an address-based, probability-based online research panel that will be available for robust public opinion and methodological research for the common good by statistical agencies and nonprofit organizations. This will facilitate both longitudinal and quick-turn-around research that many organizations are interested in conducting. The Ask U.S. Panel will consist of an entirely new, representative, probability sample of U.S. adults who are not members of an existing survey panel. In the future, the panel may be supplemented with targeted subgroups or additional target populations, such as businesses and organizations. The approach will involve mixed-mode recruitment of the residential population from RTI's address-based sampling (ABS) frame. Through the life of the panel, members will be invited to participate in topical surveys about once a month. The panel will have quarterly replenishment samples and utilize multiple strategies to keep panel members engaged.



    Day 2: Wednesday, April 14


    9:00 am - 10:25 am
    Concurrent Sessions
    Session 4A:  Program Innovations

    Adela Luque, U.S. Census Bureau
    Kevin Rinz, U.S. Census Bureau
    James Noon, U.S. Census Bureau*
    Michaela Dillon, U.S. Census Bureau
    Renuka Bhaskar, U.S. Census Bureau
    Victoria Udalova, U.S. Census Bureau

    The new Nonemployer Statistics by Demographics series, or NES-D, is the Census Bureau's response to the challenges faced by 20th-century survey-based statistics while addressing 21st-century needs for more frequent and timely high-quality data, at lower cost and with no additional respondent burden. NES-D is not a survey; rather, it exclusively uses existing administrative and census records to provide demographics for the universe of nonemployer businesses by geography, industry, receipt size class, and legal form of organization. Its first release was in December 2020. NES-D replaces the nonemployer component of the quinquennial Survey of Business Owners. Coupled with the new Annual Business Survey (ABS), which provides demographics for employer businesses, Census now provides annual business owner demographics through a blended-data approach that combines AR-derived estimates for nonemployer firms and survey-derived estimates for employer firms. In the near future, NES-D will be enhanced with characteristics relevant to understanding nonemployers' behavior and dynamics, such as characteristics related to the gig economy, household characteristics, and transitions to employer status.

    So You Want to Build an Autocoding Model? Lessons learned from applied autocoding projects
    Emily Hadley, RTI International*
    Rob Chew, RTI International
    Peter Baumgartner, RTI International

    Automated coding models for open-ended text promise time and labor savings but can be challenging to implement in practice. Expectations of accuracy, implementation costs, and complexity of integration into existing processes are common sources of frustration. We discuss the lessons learned from four applied autocoding projects with varying degrees of implementation. We suggest considerations for determining the feasibility of an autocoding project, setting realistic benchmarks for the accuracy of the model, and anticipating the challenges of integrating the model into existing workflows. These lessons can inform ongoing or future development of custom autocoding models.

    Struther Van Horn, U.S. Bureau of Labor Statistics*
    Tod Sirois, U.S. Bureau of Labor Statistics
    Jean E. Fox, U.S. Bureau of Labor Statistics
    Susan Gymburch, U.S. Bureau of Labor Statistics
    Erin Lane, U.S. Bureau of Labor Statistics
    Andrew Theodore, U.S. Bureau of Labor Statistics
    Sarah Van Giezen, U.S. Bureau of Labor Statistics

    The Consumer Price Index (CPI) program at BLS wanted to solicit creative new ideas for improving data collection. We wanted to encourage all ideas, from small tweaks to large system revisions, covering any data collection procedure or system. As a framework, we used Design Thinking, a structured, human-centered approach to addressing complex problems. In Design Thinking, team members connect directly with users through interviews, developing empathy for them and learning about their successes, pain points, and ideas for improvements. These in-depth conversations reveal what's working in the current system, what's not working, and recommendations for improvements. In this presentation, we will:
    - define Design Thinking and explain how it helps address complex problems;
    - walk through the steps in a Design Thinking effort;
    - describe the goals of the CPI's effort to identify creative new ideas for data collection;
    - share how we approached each step; and
    - describe the types of findings we uncovered and how they will be useful in improving CPI data collection.
    This presentation is not technical; it is appropriate for anyone involved in developing data collection procedures and systems.

    Transparent Evaluation and Reporting on Cost Structures for Statistical Information Products and Services
    John L. Eltinge, U.S. Census Bureau*

    Many federal statistical programs are focusing increased attention on the integration of survey data with information from other sources, e.g., administrative records and commercial transactions. In some cases, prospective cost savings are viewed as a major motivating factor. In other cases, the primary motivation is improvement of data quality, but subject to the requirement that non-survey sources do not inflate aggregate production costs for the statistical program. This paper explores the transparent evaluation and reporting on cost structures required to address these issues, with emphasis on:
    (1) Practical managerial decisions that require specific types of empirical information on cost structures.
    (2) Special features arising from complex patterns of fixed and variable cost components, operating constraints and optionality.
    (3) Realistic methods to capture information for (1) in ways that account for (2).
    (4) Conceptual distinctions among correlation, causation and control that are important in empirical work to address (1)-(3).
    (5) Alignment of (1)-(4) with literature on transparent reporting for data quality.
    Two running examples illustrate the general ideas in (1)-(5).

    Session 4B:  Questionnaire Evaluation

    Using Paradata to Explore Navigation Through Web Surveys to Improve Survey Design
    Renee Ellis, U.S. Census Bureau*

    Understanding how users navigate online survey instruments may be useful for many reasons. For example, knowing more about these behaviors may alert us to problems with instrument usability. This may help identify problematic questions and common behaviors of survey respondents. One of the challenges of this type of analysis is that web paradata are unstructured and often voluminous. This project examines how we can use a qualitative review of data and data visualizations to find patterns in respondent behaviors across survey pathways. In this look at how users navigate online survey instruments, we wrangle the paradata so that we can visualize user paths. From this, we categorize common paths and discuss how they might be used to make survey design decisions.
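
    One simple way to wrangle such paradata into user paths is to collapse each respondent's ordered page-view events into a path string that can then be counted and visualized. The sketch below uses invented field names and events; it is not the project's actual processing code.

    # Collapsing page-view paradata into per-respondent navigation paths so the
    # most common paths can be tabulated (and later visualized). Field names and
    # events are invented for illustration.
    from collections import Counter, defaultdict

    paradata = [
        {"resp_id": "r1", "ts": 1, "screen": "intro"},
        {"resp_id": "r1", "ts": 2, "screen": "roster"},
        {"resp_id": "r1", "ts": 3, "screen": "income"},
        {"resp_id": "r2", "ts": 1, "screen": "intro"},
        {"resp_id": "r2", "ts": 2, "screen": "income"},
        {"resp_id": "r2", "ts": 3, "screen": "roster"},  # out-of-order navigation
        {"resp_id": "r3", "ts": 1, "screen": "intro"},
        {"resp_id": "r3", "ts": 2, "screen": "roster"},
        {"resp_id": "r3", "ts": 3, "screen": "income"},
    ]

    paths = defaultdict(list)
    for event in sorted(paradata, key=lambda e: (e["resp_id"], e["ts"])):
        paths[event["resp_id"]].append(event["screen"])

    path_counts = Counter(" > ".join(p) for p in paths.values())
    for path, n in path_counts.most_common():
        print(f"{n:>3}  {path}")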

    Matt Jans, ICF
    Ashley Schaad, MaritzCZ
    Melinda Scott, ICF*

    Questionnaire design can be the least transparent of all survey development phases, sometimes remaining a "black box" which is difficult to audit or replicate. The Questionnaire Appraisal System (QAS-99) was developed to a) make this process replicable and transparent, and b) allow questionnaire revision by survey staff with lower levels of training and experience. It has seven steps focusing on question characteristics (e.g., readability, instructions, implicit assumptions, and topic sensitivity). The QAS-04 added assessment of the translatability, cross-cultural assumptions, and issues across questions within the instrument. We present new developments in the QAS process that add a) a questionnaire-level and flow review to assess the entire instrument, and b) a step in the original question-specific evaluation that assesses how reasonable it is to assume that the respondent would have encoded the information required to produce an answer. These were incorporated into a single Excel file that facilitates QAS implementation. This innovation will be discussed in the context of time-sensitive questionnaire development, the overall survey process, and survey transparency.

    Elise M. Christopher, NCES*
    Laura Burns, RTI International

    In the High School Longitudinal Study of 2009, conducted by the National Center for Education Statistics, measures of sexual orientation and gender identity (SOGI) were added in the Second Follow-up, conducted in 2016. The panel of 2009 high school freshmen had already participated in three rounds of data collection in 2009, 2012, and 2013. These potentially sensitive SOGI measures were extensively examined via cognitive testing and field testing prior to their addition to national survey instruments. After national data collection was completed, paradata and metadata were used to examine whether these items led to data quality concerns. This presentation will share results of these analyses, including investigations of breakoffs, item-level nonresponse, and time spent on item screens. Results will be compared to those of extant sensitive items in the survey in order to make conclusions about item functioning.

    Deborah Krug Mangipudi, ICF*
    Matt Jans, ICF
    Robynne Locke, ICF
    Stephen Haas, ICF
    John Boyle, ICF
    Lizzie Remrey, ICF
    Heather Driscoll, ICF
    Samantha McCoy, Oregon CJC
    Michael Weinerman, Oregon CJC
    Siobahn McAlister, Oregon CJC
    Ken Sanchagrin, Oregon CJC

    Researchers usually design questionnaires to begin with simple topic-relevant questions. Perceived sensitivity can vary widely across respondents, producing hidden disproportionate nonresponse. This presentation addresses the following questions: 1) Do specific questions often lead to break-offs? 2) Do break-offs vary by respondent demographics? 3) Do break-off rates differ between phone and web administration? Data come from the Oregon Crime Victimization Survey (OCVS), which used a dual-frame RDD sample and a DSF-based ABS sample. Only adults who lived in Oregon for the past 12 months were eligible. Questionnaire flow was identical across modes. Sections included 1) eligibility screening, 2) consent, 3) quality of life, 4) demographics, 5) index crimes, 6) non-index crimes, and 7) crime follow-up questions. Among people who participated by phone or web, we evaluate the percentage of hang-ups or break-offs by geographic stratum and by areas with varying levels of poverty. We will focus on whether the initial topics in the questionnaire (e.g., screening, consent, and neighborhood quality of life) obtain differential nonresponse across strata and geographies.

    Session 4C:  Data Collection During COVID

    Rachel Carnahan, NORC, University of Chicago*
    Andrea Mayfield, NORC, University of Chicago
    Elise Comperchio, NORC, University of Chicago

    The Medicare Current Beneficiary Survey (MCBS) is a continuous, multi-purpose longitudinal survey covering a representative national sample of the Medicare population sponsored by the Centers for Medicare & Medicaid Services (CMS). CMS leveraged the MCBS panel design to assess the impact of the COVID-19 pandemic on the lives of beneficiaries by planning rapid response surveys to supplement the main MCBS. The first supplement was administered by telephone in Summer 2020 during the regular production cycle to existing MCBS sampled beneficiaries who were living in the community as a test of the COVID-19 rapid response methodology. The MCBS collected and released data from the COVID-19 Supplements on an expedited timeline by developing a standalone questionnaire that was simultaneously fielded alongside the main MCBS. This presentation will share the innovative operational strategies the MCBS used to conduct a field test of the COVID-19 Summer Rapid Response survey and subsequently implement additional supplements using the same methodology. We will discuss how these methods can be adapted to implement rapid response surveys on other emerging topics for large surveys in the future.

    The Impact of COVID-19 on Large-scale Phone Survey Productivity and Response Rates
    Matt Jans, ICF*
    Jamie Dayton, ICF
    Randy ZuWallack, ICF
    Don Allen, ICF
    Josh Duell, ICF
    Andy Dyer, ICF
    Thomas Brassell, ICF
    Sam Collins, ICF
    Traci Creller, ICF

    COVID-19 has impacted survey productivity and response rates. Due to stay-at-home orders and lay-offs, more people are at home full-time than ever in recent history, making households easier to contact. More interviewers are also working from home. Productivity may improve in this context compared to a centralized call center. This presentation compares outcome rates (e.g., contact, refusal, cooperation, and response rates) from several ICF phone surveys conducted before and during COVID-19. The presentation addresses the following three questions about the effect of COVID-19 on phone surveys: 1) How have survey outcome rates changed since the start of COVID compared to the months prior to COVID and, where available, the same months of the prior year? 2) For interviewers who worked both from centralized call centers and from home, is their personal productivity similar or different in the two contexts? 3) Depending on the return to centralized call centers, can we observe further or maintained productivity improvements, or performance regressions, among interviewers who return to centralized calling?

    James Dayton, ICF*
    Don Allen, ICF
    Mary Penn, ICF
    Aprille Hairston, ICF

    We will share our experience and challenges moving from 100% on-site call center operations to 100% off-site remote operations over the course of a few weeks. We will cover the process of determining whether the needed call center technology can be safely and securely deployed in a home-based environment that is outside of company firewalls and in compliance with our internal data protection protocols and Institutional Review Board (IRB) respondent protection requirements. Can the required technology be effectively deployed using consumer-level internet connectivity in interviewer homes where other household members may be attempting to work, attend online classes, and meet other broadband needs? Assessments establishing the suitability of interviewing staff's home-based work environments will also be discussed. Finally, we will explore the safe deployment of equipment, the development of interviewer technical support helpdesks, and the required updates to our interviewer supervision, quality assurance processes, and other management procedures to assure the success of our home-based interviewing staff who no longer have the luxury of interacting face-to-face in a centralized work environment.

    Melissa Kresin, U.S. Census Bureau*
    Victoria Bookhultz, U.S. Census Bureau
    Nicole Cummings, National Center for Health Statistics
    Sonja Williams, National Center for Health Statistics

    National Ambulatory Medical Care Survey (NAMCS) and National Hospital Ambulatory Medical Care Survey (NHAMCS) respondents are uniquely impacted by the coronavirus disease (COVID-19) pandemic, as they are on the front lines administering care to individuals impacted by COVID-19. As physicians' offices and hospitals adapted to providing care during the pandemic, the N(H)AMCS survey sponsor and data collection agency had the unique opportunity to modify existing methodology and introduce new procedures to encourage survey participation and to collect information on how respondents are impacted by the pandemic. To ease respondent burden during the initial phases of the pandemic, data collection start dates were slightly delayed, with further delays in areas of high COVID-19 infection rates. Data collection procedures were modified to include various remote abstraction methods at U.S. Census Bureau regional offices. COVID-19 survey questions were developed swiftly and implemented midway through 2020 NAMCS data collection. Field Representatives (FRs) were provided with monetary awards for completing cases, and thank-you letters or phone calls acknowledged their hard work.


    10:30 am - 11:55 am
    Concurrent Sessions
    Session 5A:  Software Development

    Alisu Schoua-Glusberg, Research Support Services Inc.*

    Setting up CASIC instruments requires close collaboration between survey design teams and programmers. Federal CASIC studies increasingly require setting up instruments in multiple languages, most frequently in Spanish. Translated instruments present challenges for computer-assisted setup, given grammar mismatches between languages. For example, English adjectives don't specify gender but are gendered in Spanish and require alternative endings. Fills don't work identically across the two languages and word order varies across the two languages, which also impacts setup. Contractors often rely on translators to both translate the survey questions and to adapt the programmer code by inserting the Spanish in a way that will deliver an administrable Spanish version. This puts translators in a quasi-programmer role for which they have no particular skill or training. We will make the case for training translators, so as to end up with a more finished product that involves the least additional processing for both programmers and translators. Examples from several federal surveys will be provided.

    Daniel Gillman, U.S. Bureau of Labor Statistics*

    The Data Documentation Initiative is a family of statistical metadata standards. DDI-2 Codebook and DDI-3 Lifecycle are in use, and the new DDI-4 Cross-Domain Integration (DDI-CDI) was released in April 2020 for public review. Statistical agencies were asked to comment. The version 1.0 release is due in June 2021. DDI-CDI addresses new issues that other standards don't. Administrative and other sources of data are now being used to augment and substitute for survey data. However, these data sources are often in new formats. DDI-CDI contains several new formats: key-value pair (for Big Data), event history data (in administrative records), and a new description of multi-dimensional data. A new general process model useful for describing how data are used once they are acquired is included. It does not just rely on the common processes used in traditional statistical surveys. This expanded treatment makes DDI-CDI applicable to a wide variety of data applications. The talk will briefly describe these new features and provide examples to illustrate the ideas.

    Building a Statistical Metadata Registry using ISO 11179, the Generic Statistical Information Model (GSIM) and the National Information Exchange Model (NIEM)
    Christopher N. Carrino, U.S. Census Bureau*

    In order to manage the hundreds of system interfaces required to run the 2020 Decennial Census, the US Census Bureau built a statistical metadata registry upon federal and international data standards. The Generic Statistical Information Model (GSIM) serves as the conceptual framework for the metadata registry. The National Information Exchange Model (NIEM) serves as the naming and design rules for the data elements within the registry. And the principles of the ISO/IEC 11179 Metadata Registry specification serve as both the governance framework and the theoretical basis for the registry. This presentation will show how the Census Bureau implemented and extended the various conceptual models into a production system to serve as the registration authority for metadata elements across the 60-plus systems within the Decennial enterprise.

    Gina Cheung, SRC, University of Michigan*
    Lon Hofman, CBS, Statistics Netherlands

    Now that the Blaise 5 App (on Android and iOS) has been developed, Blaise 5 instruments can be loaded on smartphones and tablets for interviewers to conduct interviews. However, a Case Management Application (CMA) is needed to manage the production process. During the presentation, we will demo selected CMA functions for mobile and tablet devices, such as:
    • Get production sample lines to interviewer mobile devices.
    • Installation and management of samples within interviewers' devices.
    • Launch Blaise 5 app instruments for data collection.
    • Administration of case statuses (record appointments, enter call notes, etc.).
    • Send survey data to the central database.
    • Export survey data and paradata from the server-side.

    Session 5B:  Data Collection from Establishments

    Nicholas Johnson, U.S. Bureau of Labor Statistics*
    Jean E. Fox, U.S. Bureau of Labor Statistics

    The Current Employment Statistics (CES) program of the Bureau of Labor Statistics recently developed and implemented a new web collection tool designed specifically for multi-worksite respondents, particularly targeted at mid-size firms (5 to 50 worksites). Historically, CES has found it challenging to develop efficient collection methods for firms of this size. Over the last five years, CES worked to develop an online solution that would ease reporting for firms of this type. This new online collection tool, which is a web-based spreadsheet entry form, was deployed in late 2018. We propose to discuss the work required to develop the application, including surveys and interviews of existing respondents, user experience design, and the challenges of implementation. The purpose of this presentation is to highlight the lessons learned during this experience.

    Melissa Krakowiecki, Mathematica
    Karen CyBulski, Mathematica*
    Kevin Manbodh, Mathematica
    Larry Vittoriano, Mathematica
    Matt Potts, Mathematica
    Herman Alvarado, SAMHSA

    The Substance Abuse and Mental Health Services Administration (SAMHSA) sponsors two annual multi-mode behavioral health surveys, the National Survey of Substance Abuse Treatment Services (N-SSATS) and the National Mental Health Services Survey (N-MHSS). Each year, Mathematica administers the surveys via the web and telephone to the directors of approximately 18,000 N-SSATS facilities and 15,000 N-MHSS facilities. Each web survey includes an open-ended question for respondent comments. While some use this field to clarify survey responses, many express feelings about the survey. SAMHSA and Mathematica continually strive to improve both surveys based on this input and have used this feedback to modify survey content, formatting, navigation, and web-specific features. This presentation will discuss several enhancements incorporated into both web instruments to reduce survey burden and improve user experience. First, we prefilled data for responses that are unlikely to change from year to year. Second, we added a feature that enabled respondents to report on multiple facilities in one session; previously, they had to log in to each web instrument separately to complete each one.

    Expansion of E-mail in an Ever-Changing Data Collection Environment
    Mark Govoni, U.S. Census Bureau*

    Economic Programs at the U.S. Census Bureau began using email as a contact strategy to provide respondents with log-in credentials in 2017 when we started to collect email addresses through our Respondent Portal. Currently over 25 surveys successfully use email as part of their collection strategies. Due to COVID-19, we have not only expanded email contacts, but in many cases use of snail mail has been curtailed or abandoned. Plans to use email in more adaptive collection strategies are also being explored. However, expanded email use has led to numerous challenges to ensuring the best and quickest email delivery rates - e.g., coding requirements that optimize the display of messages across browsers, dealing with invalid email addresses and bounce backs, respondent fatigue with too many emails and conveying legitimacy of the sender and the request. In this presentation, we will share lessons learned and future plans for email use and describe the challenges we are currently grappling with. We plan to generate discussion among FedCASIC participants about experiences using email to contact survey respondents, along with challenges and best practices for overcoming them.

    Susanne Johnson, U.S. Census Bureau*

    With shrinking budgets and declining response, it is paramount to implement cost-effective data collection strategies to maximize response for the 2022 Economic Census and other Economic surveys of businesses and governments. We have developed a comprehensive collection strategy research program based on focus groups and cognitive testing, pilots and randomized experiments, and lessons learned. The research conducted prior to the 2017 Economic Census provided invaluable information to improve our methods. We will build on that research, exploring new technology and expanding adaptive design. This testing is designed to enable data-driven decisions for comprehensive, cost-effective collection strategies to maximize response for the 2022 Economic Census and other Economic programs. We will discuss the strategic process used to identify which collection strategy methods to test and in which recurring annual surveys to embed pilots and randomized tests. Plans include tests of paradata-based systems, tailored respondent messaging, expanded use of emails, and conversion of reluctant nonrespondents. We are excited to share our research plans and seek feedback from survey professionals.

    Session 5C:  Operations

    Thomas Brassell, ICF*
    Kisha Bailly, ICF
    Joshua Duell, ICF
    Randy ZuWallack, ICF
    Priscilla Martinez, ARG
    Deidre Patterson, ARG
    Thomas K. Greenfield, ARG
    Katherine J. Karriker-Jaffe, ARG

    A recent AAPOR Task Force report cited the impacts of call "blockers" on telephone survey research. Specifically, the report noted how the recent increase in these technologies raises concerns for telephone studies given the potential for the misidentification and blocking of legitimate research calls. While the direct impact to studies is difficult to determine given the challenge of identifying whether a survey call has been incorrectly flagged, the report highlighted that such blockers have the potential to increase survey costs, reduce response rates, and perhaps even create a perceived link between reputable scientific research organizations and unethical and deceptive companies. The current research examines the effect of various strategies to mitigate spam flagging across two national surveys. The first study compares contact rates between an SMS-enabled and a non-SMS-enabled outbound number. The second study compares contact rates between a static SMS-enabled outbound number and two-week rolling SMS-enabled outbound numbers. The results will add much-needed data to advance the ongoing struggle of survey research organizations to separate themselves from spammers.

    Matthew Bensen, RTI International*
    Preethi Jayaram, RTI International

    RTI conducted a study for the University of Texas Southwestern Medical Center (UTSW) in which one randomly chosen adult from households in Dallas and Tarrant counties would take a survey and then get tested for COVID-19. With a lower-than-desired response rate, a second protocol was added using a convenience sample. RTI programmed web and CATI surveys and developed system integrations with UTSW's website to schedule COVID-19 testing appointments. Participants were routed to the scheduling site, and scheduling status for each person was returned once daily. RTI followed up by phone with those who did not complete an appointment. RTI addressed several challenges, including connecting systems and securely passing data across organizations; some involved system design, given the distinct goals of survey completion and COVID-19 testing. We will discuss challenges and successes related to connecting to the scheduler; assuring unique web and CATI CASEIDs in a context where a setting to establish an initial CASEID did not exist; and reaching out to participants who had not completed their testing, given that a person could be at one of several scheduling statuses.
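
    To make the daily reconciliation step concrete, the minimal Python sketch below (with hypothetical file and column names, not RTI's production system) merges the scheduling-status file returned by the testing site with the survey case list and queues cases that still need CATI follow-up.

        # Minimal sketch; file layouts and status values are assumptions for illustration.
        import pandas as pd

        cases = pd.read_csv("survey_cases.csv")             # one row per completed survey
        status = pd.read_csv("daily_scheduler_status.csv")  # returned once daily by the scheduler

        merged = cases.merge(status[["case_id", "scheduling_status"]],
                             on="case_id", how="left")

        # Anyone not yet scheduled or tested is queued for phone follow-up.
        needs_followup = merged[~merged["scheduling_status"].isin(["scheduled", "tested"])]
        needs_followup.to_csv("cati_followup_queue.csv", index=False)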

    Beth Fisher, NORC, University of Chicago*
    Kate Bachtell, NORC, University of Chicago

    The COVID-19 pandemic has had paradoxical effects on survey operations. For the 2020 General Social Survey (GSS), it had the unexpected effect of strengthening opportunities for adaptive survey design and the applied use of R-indicators, statistics that assess the representativeness of a survey sample. In this paper we present the 2020 GSS Panel as a case study and describe how R-indicators were used to achieve a more balanced sample and strategically time closedown activities. We focus on the operational aspects of sample monitoring, field management, and incentives in a closely coordinated intervention. Under an adaptive design approach, we used the R-indicator results to inform data collection decisions. Cases were allocated into three groups in order to have a more targeted approach. The first group received a higher monetary incentive and extra telephone outreach by a field interviewer. The second group continued to receive outreach, but at the normal level. The third group received no further outreach and served as a baseline for comparison. Through this approach, project staff were able to refine the application of adaptive design.
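
    As an illustration of how propensity information can drive such an allocation, the sketch below (hypothetical variable names and cut points, not NORC's production code) splits active cases into three intervention groups by estimated response propensity.

        import pandas as pd

        active = pd.read_csv("active_cases.csv")  # includes a model-based propensity column "p_hat"
        low, high = active["p_hat"].quantile([0.25, 0.75])

        def assign_group(p: float) -> str:
            if p <= low:
                return "escalated_incentive_plus_phone"   # hardest cases: extra outreach
            elif p <= high:
                return "standard_outreach"                # continue normal-level effort
            return "no_further_outreach"                  # baseline comparison group

        active["intervention"] = active["p_hat"].apply(assign_group)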


    1:00 pm - 2:25 pm
    Concurrent Sessions
    Session 6A:  New Data Collection Methods for the Commodity Flow Survey and HAZMAT Supplement

    Christian Moscardi, U.S. Census Bureau*

    The U.S. Census Bureau and Bureau of Transportation Statistics are exploring the feasibility of collecting more timely and voluminous shipment information that a company can easily obtain from their databases for the Commodity Flow Survey (CFS) in order to meet the needs of data users, improve the quality and usefulness of the data products, and reduce respondent burden. Rather than collecting a sample of shipment data through a questionnaire, the goal is to ingest a company's shipment records through a simple, secure file upload process. Unlike in the survey, companies have the flexibility to report each establishment separately or combined as a consolidated report, further reducing burden. In addition, respondents will report data in a format that requires minimal manual transformation, editing, or classification on their part by using machine learning for more burdensome survey items. This presentation will focus on how we accomplished this with a small number of companies during the previous CFS, how we then developed a new collection instrument, and findings from a pilot of that collection instrument with CFS respondents.
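
    A minimal sketch of the ingestion idea follows (column names and validation rules are illustrative assumptions, not the actual CFS upload specification): an uploaded shipment file is standardized and checked so that little manual transformation is required of the respondent.

        import pandas as pd

        REQUIRED = ["shipment_id", "origin_zip", "dest_zip", "value_usd", "weight_lb", "description"]

        def ingest(path: str) -> pd.DataFrame:
            # Read an uploaded shipment file and standardize it for downstream processing.
            df = pd.read_csv(path, dtype=str)
            df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
            missing = [c for c in REQUIRED if c not in df.columns]
            if missing:
                raise ValueError(f"Upload is missing required fields: {missing}")
            # Numeric fields are coerced; unparseable values become NaN for analyst review.
            for col in ("value_usd", "weight_lb"):
                df[col] = pd.to_numeric(df[col], errors="coerce")
            return df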

    Rebecca Keegan, U.S. Census Bureau*
    Kristin Stettler, U.S. Census Bureau

    The Commodity Flow Survey (CFS) collects detailed data on the movement of goods in the U.S. In order to ease burden, CFS procedures call for respondents to report on only a sample of their shipments. Previous cognitive testing and direct feedback from respondents revealed that creating this sample was difficult. A new process of data collection is being developed utilizing machine learning and providing respondents with access to a portal that will host a large amount of their shipment data, thus eliminating the need for respondents to create a sample. To facilitate this major change in data collection, qualitative research methods were adapted to obtain feedback from a wide variety of respondents. Exploratory interviews were conducted to explore the feasibility of implementing this new process on a large scale, and to assess the potential effects on respondent burden. The new platform then underwent usability testing. As more surveys move towards innovative designs utilizing advancements in automation to reduce respondent burden, this presentation demonstrates how traditional qualitative survey pretesting methodology can be adapted to evaluate and facilitate these transformations.

    Julie Parker, Bureau of Transportation Statistics*

    As part of the panel on CFS collection modernization, Julie Parker, the CFS program manager at BTS, will discuss the benefits of this new data collection tool for the CFS program and data products. By collecting more data, BTS/CFS can meet a variety of data user needs, including demand for more geographically granular estimates of shipment activity and requests for better measures of the diverse shipment activity happening in e-commerce. In addition, by improving the quality of data submitted by respondents, CFS can reduce cost and time required to validate and correct data, ultimately leading to higher-quality data products. Finally, this "digitally native" data collection instrument is a step towards more automated collection of shipment data, which may make an annual CFS data product more viable in the future.

    Krista Chan, U.S. Census Bureau*
    Christian Moscardi, U.S. Census Bureau

    In partnership with the Pipeline and Hazardous Materials Safety Administration (PHMSA), the Census Bureau is implementing a hazardous materials (hazmat) supplement to the Commodity Flow Survey (CFS). We are asking hazmat shippers to provide information about materials shipped and the packaging used to protect those shipments in transit. Hazmat packaging is federally regulated - the Code of Federal Regulations (CFR) specifies packaging for each hazardous material that can be shipped. However, these specifications are not in a structured data format - they are written as English natural-language text. We have used Natural Language Processing (NLP) techniques to categorize the packaging regulations into a structured data format. We can now combine this information with survey response data to produce richer data about hazmat packaging for PHMSA, while not burdening our survey respondents with the need to look through opaque regulatory text. Finally, PHMSA will use this structured data to enable easier and more streamlined searching through the CFR, e.g., for companies that need to comply with hazmat packaging regulations.
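
    The toy example below illustrates the kind of transformation involved (the regulation sentence is invented and the pattern greatly simplified relative to the actual CFR text and NLP pipeline): natural-language packaging authorizations are turned into structured rows.

        import re

        # Invented snippet, not actual CFR language.
        reg_text = "Steel drums (1A1) and plastic jerricans (3H1) are authorized for UN1263."

        un_number = re.search(r"UN\d{4}", reg_text).group(0)
        rows = [
            {"un_number": un_number,
             "packaging_code": code,
             "packaging_description": words.lower()}
            for words, code in re.findall(r"(\w+ \w+)\s*\((\d\w\d)\)", reg_text)
        ]
        # rows -> [{'un_number': 'UN1263', 'packaging_code': '1A1',
        #           'packaging_description': 'steel drums'}, ...]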

    Session 6B:  Evaluating Online Data Collection from Establishments

    Temika Holland, U.S. Census Bureau*

    Establishment surveys conducted by the federal government often collect factual data requiring respondents to utilize records and other data sources in order to report. This additional effort has implications related to the overall experience of the respondent with the survey instrument, their reporting behaviors, and the quality of data obtained. Given the additional complexity of the establishment survey response process, advanced features, like machine learning, have been explored to aid in reporting. As establishment surveys continue to migrate to the Web, there is an increased need for methodological research on the design and evaluation of new developments in online self-administered establishment surveys. Findings from recent research will be shared in order to provide guidance to survey researchers and practitioners on improved or alternative design elements for online establishment surveys and similar, more complex household surveys in the federal government. Considerations for usability and overall reporting experience will also be discussed.

    Jean E. Fox, U.S. Bureau of Labor Statistics*

    Testing surveys before we field them is critical to help ensure that respondents provide the information we are looking for in the way that we need. Pre-testing helps identify situations where respondents interpret instructions in unexpected ways, where their data doesn't match the constructs we intend to measure, or where they just miss important instructions. Testing is important for both household and establishment surveys, and each has its own special considerations. This presentation will focus on options for pre-testing establishment surveys, and will cover topics such as the differences between usability testing and cognitive testing, options for remote testing, recruiting (and motivating) participants, and using scenarios/vignettes.

    Melissa Cidade, U.S. Census Bureau*

    The Economic Census is a mandatory survey conducted by the Census Bureau every five years. The survey collects data electronically from nearly 4 million establishments representing all U.S. locations and industries on a range of operational and financial topics. Additionally, one question series presents respondents with a list of products and services typical for their industry using the North American Product Classification System (NAPCS); respondents then select those that apply to their firm and can write in additional responses that are not prelisted. Previous administrations of the form resulted in an inordinate number of write-in responses, which require outsized resources to code, clean, and analyze. To assist in the selection of products/services, and to potentially reduce the number of write-in responses, plans are to incorporate machine learning functionality into the upcoming 2022 NAPCS survey item. Additionally, because of the novel coronavirus global pandemic, we adapted our usability methodology to provide for remote interviewing. This presentation provides an overview of the usability testing methods we used in incorporating machine learning for the NAPCS item, as well as preliminary findings and recommendations for incorporating such a feature into an online survey.

    Sudip Bhattacharjee, University of Connecticut and U.S. Census Bureau*
    Justin C Smith, University of Connecticut
    Ugochukwu Etudo, University of Connecticut

    We develop (1) a suite of tools that systematically gather public, textual information on US establishments and (2) a natural language processing and machine learning based methodology to predict full 6-digit NAICS codes. We rely upon a novel mix of publicly available, commercial, and official data. Our sample consists of approximately 130,000 establishments across all 20 NAICS sectors (2-digit), and across approximately 500 national industry codes (6-digit). We implement an ensemble machine learning framework that relies on four constituent trained machine learning classifiers. We show that publicly available, firm-sourced data is typically most discriminative when used to train our models to detect the correct NAICS code at the 2-, 4-, and 6-digit levels. We find that data sourced from commercial entities provides additional discriminative information. Model accuracies range from 70% to 95%, depending on the level of NAICS specificity. Accuracies increase further with other feature engineering additions. We evaluate model stability and other performance criteria. Our research can reduce both respondent and analyst burden while improving the quality of business classifications.
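
    As a rough illustration of the ensemble approach (not the authors' exact features or models, and with a tiny invented training set), the sketch below combines several scikit-learn text classifiers by majority vote to predict 6-digit NAICS codes from establishment descriptions.

        from sklearn.ensemble import RandomForestClassifier, VotingClassifier
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.naive_bayes import MultinomialNB
        from sklearn.pipeline import make_pipeline

        # Tiny invented training set; the study uses roughly 130,000 establishments.
        texts = [
            "wholesale distributor of fresh fruits and vegetables",
            "produce wholesaler supplying grocery stores",
            "family dental practice offering cleanings and exams",
            "dentist office providing general dental care",
        ]
        naics = ["424480", "424480", "621210", "621210"]  # 6-digit NAICS labels

        ensemble = make_pipeline(
            TfidfVectorizer(ngram_range=(1, 2)),
            VotingClassifier(
                estimators=[("lr", LogisticRegression(max_iter=1000)),
                            ("rf", RandomForestClassifier(n_estimators=200)),
                            ("nb", MultinomialNB())],
                voting="hard",  # majority vote across the constituent classifiers
            ),
        )
        ensemble.fit(texts, naics)
        print(ensemble.predict(["organic vegetable wholesaler"]))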

    Session 6C:  Multimode Survey Considerations

    Francois Laflamme, Statistics Canada*

    Statistics Canada has been consistently focused on identifying opportunities for strategic improvement in data collection approaches, as well as on innovative data collection methods which may be more aligned with current respondent communication preferences. To meet these requirements, the Agency has implemented new multi-mode collection strategies. While these changes were necessary, they have increased the complexity of the survey collection processes and the risk of not meeting survey objectives. In fact, post-mortem analyses of surveys have indicated that key survey planning assumptions were sometimes not aligned with the expected response rate, the survey budget, or both. In practice, both survey budget and survey response rate need to be based on realistic key planning assumptions in order to obtain and manage expected results. This paper describes Statistics Canada's experiences in planning, costing, managing, and assessing multi-mode surveys that include both Web and Computer-Assisted Telephone Interview (CATI) collection, including the impact of differences between planned and observed key planning assumptions on survey results, budget, and cost.

    Hanna Popick, Westat*
    Mina Muller, Westat
    Eric Jodts, Westat

    This presentation will address data harmonization challenges in multimode studies and provide researchers with approaches to consider when a multimode data collection yields inconsistent data across modes. Paper and web surveys each have different strengths, so there can be a benefit to offering each of these options to respondents during a data collection effort. One benefit of a web survey is that it enforces logic and response constraints. Though a paper survey can be designed to maximize clear instructions, one cannot prevent a respondent from selecting more than one response on a single-response question, for example. The presentation will first address the process of harmonizing the data from multiple modes as well as the advantages and challenges of harmonized data that researchers should consider. The second part will focus on data strategies that can be applied when data for the same questions are inconsistent across modes. Specifically, the presentation will cover different question types, provide examples, and address considerations that should be taken into account while editing the data to result in a single harmonized dataset.
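
    One example of the kind of edit rule involved is sketched below (data and coding rule are hypothetical): on paper, a single-response item can arrive with several boxes marked, so a documented edit code is assigned before the paper and web records are combined.

        import pandas as pd

        paper = pd.DataFrame({
            "case_id": [101, 102, 103],
            "q5_marks": [["1"], ["2", "3"], ["4"]],  # scanned check-box marks
        })
        MULTI_MARK = -9  # documented edit code for inconsistent paper responses

        paper["q5"] = paper["q5_marks"].apply(
            lambda marks: int(marks[0]) if len(marks) == 1 else MULTI_MARK
        )

        web = pd.DataFrame({"case_id": [201, 202], "q5": [2, 4]})  # web enforces one answer
        harmonized = pd.concat([paper[["case_id", "q5"]], web], ignore_index=True)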

    Randy ZuWallack, ICF*
    Matt Jans, ICF
    Thomas Brassell, ICF
    Kisha Bailly, ICF
    Priscilla Martinez, ARG
    Deidre Patterson, ARG
    Thomas K. Greenfield, ARG
    Katherine J. Karriker-Jaffe, ARG

    The National Alcohol Survey (NAS) has been conducted as a random-digit-dial (RDD) survey since 2000. However, recent changes in respondent behavior toward telephone interviewing have necessitated a transition of the latest cycle to a multi-mode design. The current NAS cycle employs a multimode design consisting of both a national RDD and ABS frame, as well as a non-probability web panel. Building upon prior research measuring alcohol consumption through self-administered and interviewer-administered modes, our present research focuses on distinguishing between the effects of administration mode and the effects of a non-probability panel versus an ABS push-to-web design. The multi-mode, multi-frame design allows for measuring the mode effect of conducting the survey by CATI versus web collection. In addition, the design allows for a comparison of a self-administered probability-based sample with a nonprobability sample. Using a regression model, we estimate the mode effects and nonprobability panel effects after controlling for demographics. We present the results of the modeling and discuss mode adjustments as a means of transitioning surveys from RDD to an alternative mode.
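
    A bare-bones sketch of the modeling idea appears below (variable names are placeholders, not the ARG/ICF specification): an alcohol-use outcome is regressed on mode and sample-type indicators while controlling for demographics, so the indicator coefficients approximate the mode and nonprobability-panel effects.

        import pandas as pd
        import statsmodels.formula.api as smf

        df = pd.read_csv("nas_combined_sample.csv")  # pooled CATI, ABS-web, and panel cases

        model = smf.logit(
            "past30day_drinker ~ C(mode, Treatment('cati'))"
            " + C(sample_type, Treatment('probability'))"
            " + C(age_group) + C(sex) + C(race_eth) + C(education)",
            data=df,
        ).fit()
        print(model.summary())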

    Matthew Bensen, RTI International*
    Sridevi Sattulari, RTI International
    Megan Waggy, RTI International
    Jennifer Hardison, RTI International
    Hannah Feeney, RTI International
    Mike Price, RTI International

    RTI International collected data on user perceptions of the National Human Trafficking Hotline for the Administration for Children and Families, U.S. Department of Health and Human Services. A fully participating respondent completed two surveys. Given that some participants could be in potentially dangerous situations, we offered multiple ways to participate in the second survey and receive an incentive. We did not mention the study topic in our communications, and we never had nor collected a name or address. Given this, and because we were offering an incentive in the second survey, we asked participants to create a password that would be used to enter the second survey, and we developed a solution that accounted for the possibility that participants might create the same password. Finally, we needed to collect data close to the time of the reported incident. This presentation will show how we designed our case management system to accommodate various pathways; designed our communications so they did not convey the study topic; used preference data to execute the second survey; and expired cases so that we would only receive current data.


    2:30 pm - 2:55 pm
    Posters and Demonstrations III
    P&D Session 3A:  Meet the presenters

    Joe J. Murphy, RTI International*
    Michael A. Duprey, RTI International
    Rob Chew, RTI International
    Rebecca Powell, RTI International
    Katie Lewis, U.S. Energy Information Administration

    In the age of paradata, the amount of information available to inform decisions during data collection can be overwhelming. Furthermore, adaptive, responsive, or tailored designs require the survey team to monitor critical-to-quality indicators to minimize total error across data sources. To aid decision-making in a data-rich context, visualization can serve as a valuable tool to express data. In this presentation, we describe the process of designing the Adaptive Total Design (ATD) Dashboard, a tool that monitors and visualizes data from multiple sources to track experimental, multimode, and longitudinal survey designs in near-real time. Data inputs may come from various systems and may exist at multiple units of analysis; thus, we constructed a data taxonomy to allow only logical instantiations. By employing an extensible app framework for R (Shiny), the dashboard standardizes visualizations and reports. We present examples from the 2020 Residential Energy Consumption Survey (RECS) illustrating the functionality of the dashboard. For RECS, the dashboard was used to closely monitor trends from the first phase of data collection to inform the design of the second phase.

    Pascal Heus, Metadata Technology North America*
    Andrew Decarlo, Metadata Technology North America
    Carson Hunter, Metadata Technology North America
    Jack Gager, Metadata Technology North America

    Ever wonder how to deliver data to researchers, applications developers, data scientists, or the public in a modern and effective way? This presentation will demonstrate the use of Rich Data Services (RDS), an innovative platform from Metadata Technology North America, designed to concurrently deliver data and metadata as a service to both users and applications. RDS was built to reduce data wrangling and empower information systems. Based on IT industry-standard REST technology and informed by global metadata standards, it enables immediate access to data and metadata by developers or data scientists. RDS also comes with web-based applications that allow casual users to explore and tabulate the data in a browser, or download it for offline analysis. RDS opens endless capabilities for data discovery, analysis, storytelling visualizations, machine learning, and more. The platform is backed by MTNA's extensive expertise in data management and information technology. For more information, see Rich Data Services (https://www.richdataservices.com), the COVID-19 Data Center (https://covid19.richdataservices.com), the Public Data Center (https://public.richdataservices.com), and MTNA (https://www.mtna.us).

    P&D Session 3B:  Meet the presenters

    A Novel Protocol for Remote Usability Testing of a Wireframe User Interface
    Alda G. Rivas, U.S. Census Bureau*
    Erica Olmsted-Hawala, U.S. Census Bureau
    Lin Wang, U.S. Census Bureau

    Human-centered design (HCD) emphasizes iterative usability testing throughout the life cycle of system development to ensure optimal user experience. In this presentation, we describe a protocol for remote usability testing of a wireframe for a data dissemination tool. The protocol involves presenting the user with the wireframe through screen-sharing, the user verbalizing the actions they would perform on the wireframe, and the researchers performing those actions on behalf of the users. The findings from the usability sessions allowed us to provide the designers with recommendations on how to improve the user experience by addressing identified usability issues before programming the full system. The implementation of our protocol allowed us to remotely conduct the usability evaluation of a wireframe while minimizing loss of user behavior data (e.g., clicking on inactive links).

    Rachel Kinder, ICF*
    Robynne Locke, ICF
    James Dayton, ICF
    John Jasek, NYC DOHMH
    Eleni Murphy, NYC DOHMH

    Longitudinal surveys are valuable for assessing the impact of a program or intervention over time, such as public health campaign effectiveness. However, collecting longitudinal data presents numerous challenges, including the cost of recruiting and retaining panel members. The New York City Department of Health and Mental Hygiene (NYC DOHMH) and ICF have conducted three waves of a longitudinal survey to measure the impact of tobacco cessation programming on the smoking cessation behaviors of adults in NYC. The NYC Tobacco Cessation Panel Survey study includes a baseline survey and three waves of follow-up surveys administered over one year. Along with surveying existing nonprobability panel members, ICF explored alternative recruitment methods, including in-person, social media (Facebook), and online marketplaces (Craigslist), because no one mode alone could provide a sufficient eligible sample (adults who smoke). In this presentation, we will describe the advantages and disadvantages of each method, comparing panel retention, demographic coverage, data quality, and cost-effectiveness. We will also present the overall impact of SMS panel retention methods across all recruitment methods.

    Lavaughn Cadiz Gooden, Westat*
    Victoria Hoverman, Westat
    Andrew Caporaso, Westat
    Jennifer Crafts, Westat
    Douglas Williams, Westat
    Kathryn Aikin, FDA
    Helen Sullivan, FDA

    Eye-tracking is used to assess attention to defined areas or features of stimuli in participants' visual fields. Stimuli can range from survey forms to informational and marketing products. Eye-trackers attached to computer monitors are effective for screen-based stimuli, but are sub-optimal when measuring attention to real-world stimuli such as multi-page paper ads with which respondents interact. Therefore, Westat used eye-tracking glasses to assess ad-viewing behavior for a large-scale data collection in six U.S. cities for the Food and Drug Administration. The study objective was to investigate the effects of repetition and location of risk information in direct-to-consumer print prescription drug ads on risk recall, recognition, and comprehension. Participants who had one of two medical conditions wore eye-tracking glasses as they viewed a fictitious ad related to their medical condition (N = 422). After reading a randomized version of the ad, each participant completed a web questionnaire about that specific ad. This presentation will report on adjustments made between a pretest and main study to optimize data quality and will supplement results from a prior FedCASIC poster.

    P&D Session 3C:  Meet the presenters

    Monica Polino Schneider, Decision Information Resources, Inc.*
    Heather Morrison, Decision Information Resources, Inc.

    It is a known trend that survey response rates are declining. Multi-mode surveys are one way to combat this trend, but aside from concerns about mode effects, there are technical challenges to managing cases across multiple modes. Existing survey software packages typically cannot provide real-time status across modes. Similarly, few support the cross-mode management of cases that have multiple contact types (respondent/alternate) with multiple pieces of contact information (phone number, address, email address). Some organizations build proprietary systems to address these challenges, but smaller organizations are typically left adapting existing products, with mixed success. In this presentation we will discuss how we adapted our Voxco system's Computer Assisted Telephone and Web Interviewing (CATI and CAWI) modules to develop a multi-mode case management system that could accommodate complex contact information. We will discuss key system features, functionality, benefits, and drawbacks. We will also discuss next steps including: the development of advanced case management rules, the capture of paradata, and the development of safeguards to prevent incorrect case assignment.

    Donna Perlmutter, IMPAQ International*
    Kelsey Walter, IMPAQ International
    John Wendt, IMPAQ International
    Noelle Poirier, IMPAQ International
    Teerachat Techapai, IMPAQ International
    Margaret Collins, IMPAQ International

    IMPAQ conducts projects using TTY machines to make outgoing calls that test the capabilities of various locations to handle calls from hearing-impaired people. IMPAQ launched an effort to investigate and test digital TTY software to replace the current TTY machines. Software-based TTY allows TTY communications to use existing telephony infrastructure; the program communicates over IP and uses the standard Session Initiation Protocol (SIP). Our goals for upgrading IMPAQ's TTY capabilities include modernizing the system with current technology (moving from analog to digital devices), more efficient call logging and monitoring, eliminating the processing, management, and storage of TTY paper tapes, and supporting use by remote-based interviewers (desktop TTY required staff to be on site). The challenges we needed to address and document for our client were: 1) can we connect to the locations, 2) does it take a similar amount of time to operate a digital system, and 3) do we have documentation for the call. We also needed to understand the difference between digital TTY tapes and TTY paper tapes in terms of time, efficiency, and clarity.


    3:00 pm - 4:25 pm
    Concurrent Sessions
    Session 7A:  Updates from the Survey of Consumer Finances

    Kate Bachtell, NORC, University of Chicago*
    Micah Sjoblom, NORC, University of Chicago
    Catherine Haggerty, NORC, University of Chicago
    Shannon Nelson, NORC, University of Chicago
    Steven Pedlow, NORC, University of Chicago
    Joanne Hsu, Board of Governors of the Federal Reserve System

    In this paper we share results from two distinct approaches to incentive escalation implemented for the 2019 Survey of Consumer Finances (SCF). The SCF is funded triennially by the Board of Governors of the Federal Reserve System (FRB). Both escalation approaches were informed by well-documented, positive effects of monetary incentives on survey response (Godwin 1979; Church 1993; Goritz 2006; Singer and Couper 2008; Hsu et al. 2017), but varied considerably in design and execution. For the first approach, we developed an algorithm to identify SCF households that presented the most challenges for data collection and devised an experiment to isolate the impact of offering an escalated incentive - double the amount of the initial offer - beginning in week 11 of the field period. For the second approach, we worked closely with our field management team to design localized incentive escalation efforts that leveraged the presence of specialist interviewers in distinct frame areas. In this paper we highlight the challenges of balancing cost and other operational considerations, and examine the overall effect of each approach on the probability of survey participation.

    Shannon Nelson, NORC, University of Chicago*
    Catherine Haggerty, NORC, University of Chicago
    Nella Coleman, NORC, University of Chicago
    Kate Bachtell, NORC, University of Chicago
    Micah Sjoblom, NORC, University of Chicago
    Steven Pedlow, NORC, University of Chicago
    Jesse Bricker, Board of Governors of the Federal Reserve System

    The Survey of Consumer Finances (SCF) is the premier source of information on the financial circumstances of American households. It is used by researchers and policymakers to inform important monetary policy impacting individuals, households, businesses, and the overall economy. The accuracy and integrity of the SCF data are paramount. The SCF has a long tradition of engaging in continuous improvement across all survey processes. While the processes and procedures used to validate interviewer work have been examined and small changes made each round, the methods used have remained largely constant, with the first two interviews and a random ten percent of cases completed thereafter selected for validation. The use of tablets during the 2019 round expanded the set of validation data points to include real-time tracking software that examines multiple GPS data points and the collection of electronic signatures from respondents, which proved to be an additional means of identifying potentially falsified data. In this presentation we will review the standard validation measures used by the SCF in past rounds and describe a new proprietary data falsification system.

    Lisa Lee, NORC, University of Chicago*
    Richard Windle, Board of Governors of the Federal Reserve System
    Catherine Haggerty, NORC, University of Chicago
    Shannon Nelson, NORC, University of Chicago
    Frankie Duda, NORC, University of Chicago
    Kate Bachtell, NORC, University of Chicago
    Micah Sjoblom, NORC, University of Chicago
    Steven Pedlow, NORC, University of Chicago

    The SCF collects personal financial data that is both complex and sensitive, potentially affecting the likelihood of responding via the web. In recent years, a number of studies have explored the use of web and mobile surveys to collect household financial data (Jackle et al., 2017; see also Lessof et al., 2017 and Read, 2017). The results of these studies are promising and informed potential designs to include in the 2019 SCF web survey. However, it is important to note that the studies completed to date have not collected financial data via the web at the level of detail required for the SCF. The 2019 SCF included a test to allow for an assessment of how the SCF would perform in a self-administered web context. The web test included a subset of the SCF questionnaire with a range of different question types. The test successfully concluded with 222 respondents completing both a web and an interviewer-administered instrument. We present preliminary findings from this test, including the web test methodology and the quality of the data collected.

    Heather Sawyer, NORC, University of Chicago*
    Kate Bachtell, NORC, University of Chicago
    Catherine Haggerty, NORC, University of Chicago
    Shannon Nelson, NORC, University of Chicago
    Micah Sjoblom, NORC, University of Chicago
    Kevin Moore, Board of Governors of the Federal Reserve System
    Jesse Bricker, Board of Governors of the Federal Reserve System
    Richard Windle, Board of Governors of the Federal Reserve System
    Joanne Hsu, Board of Governors of the Federal Reserve System

    The Survey of Consumer Finances (SCF) is the most comprehensive source of household financial data in the U.S. It collects a broad range of financial information, which can often be complex in nature. Field interviewers are the main conduit between the survey instrument and survey participants, and as a result, minimizing interviewer error is an important aspect in achieving high data quality. Beginning in 2004 the SCF has given interviewers detailed feedback about the quality of their work throughout the data collection period. In 2019 the project team implemented ongoing field interviewer training with an improved and enhanced individually tailored Data Quality Report. The report, generated from case reviews of each interview as they were completed, provided analyses of data quality and both praised good work and highlighted survey errors. The goal was to provide targeted feedback and lesson plans to address each interviewer's unique learning needs. Doing so on a large-scale survey project presents many challenges. This paper describes interviewer feedback efforts and the operational challenges.

    Katie Archambeau, NORC, University of Chicago*
    Kate Bachtell, NORC, University of Chicago
    Cathy Haggerty, NORC, University of Chicago
    Shannon Nelson, NORC, University of Chicago
    Micah Sjoblom, NORC, University of Chicago
    Steven Pedlow, NORC, University of Chicago
    Kevin Moore, Board of Governors of the Federal Reserve System

    Adaptive Survey Design (ASD) involves strategies that inform adjustments in data collection procedures based on quantifiable metrics (Groves and Heeringa, 2006). R-indicators may be used within ASD to estimate the degree to which the sample represents the larger population and/or which sample segments are under- or over-producing (Schouten et al., 2009). This information may then spur interventions to improve the representativeness of key subgroups, reduce effort on 'unproductive' cases, and streamline survey operations (Cohen, 2019). In this paper we describe the use of R-indicators for the 2019 Survey of Consumer Finances (SCF), funded by the Federal Reserve Board. We first discuss the process of computing the R-indicators using data from the 2016 and 2019 SCF area probability samples, along with population estimates from the American Community Survey. We then present retrospective results from the 2016 SCF and discuss implications for 2019. Finally, we share findings for the 2019 SCF area probability sample. We contribute to a larger body of work on R-indicators by assessing representativeness and improving efficiency and data quality in survey research.
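
    For readers unfamiliar with the statistic, the sketch below shows the core calculation (variable names are hypothetical, and the unweighted standard deviation is a simplification of the design-weighted estimator in Schouten et al., 2009): response propensities are modeled from auxiliary variables, and the R-indicator is one minus twice their standard deviation.

        import numpy as np
        import pandas as pd
        from sklearn.linear_model import LogisticRegression

        sample = pd.read_csv("area_probability_sample.csv")  # hypothetical frame/auxiliary data
        X = pd.get_dummies(sample[["region", "urbanicity", "income_quintile"]], drop_first=True)
        y = sample["responded"]  # 1 = completed interview, 0 = did not respond

        rho_hat = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]
        r_indicator = 1 - 2 * np.std(rho_hat, ddof=1)  # closer to 1 = more representative response
        print(round(r_indicator, 3))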

    Session 7B:  User Experience and Accessibility

    Ian S. Thomas, RTI International*
    Helen Ray, RTI International

    We present a process for creating analysis and visualization tools that improve the accessibility and usability of public use administrative and survey data. By incorporating user experience methodologies into the software development process, we've created web applications that make it easy for researchers to find public use datasets; explore the contents; and perform, share, and visualize their findings. We describe our user-centered design and development process, from clearly identifying the target users to iteratively refining the user interface through user experience testing, as well as our techniques for ensuring accuracy and speed. To demonstrate this process, we will present a case study highlighting the use of user-centered design and agile software development methodologies for creating data analysis and visualization products. We conclude with lessons learned, ways of applying these ideas to survey design, and recommended best practices for developing products that foster accessibility and usability.

    Scott Crawford, SoundRocket*
    Rob Young, SoundRocket

    As we work to adapt web-based surveys to various devices, it is also important to consider how design may impact those who rely on assistive technology. Federal Section 508 compliance standards have been around for a long time -- but the survey research industry has often selected the path of using alternative (non-technology) methods for including disabled individuals in our surveys. However, taking steps to ensure a more equitable experience (not just accessible) will help ensure that the most comparable data is captured. Using assistive technologies (screen readers, mouse input grids, voice, keyboard navigation, etc.) allows a segment of the population who may otherwise not have responded a chance to participate in research in a comparable way to others. In this presentation we will report on our experience in a web-based campus-wide climate survey of diversity, equity and inclusion. We will share our experiences in ensuring an equitable experience for those with sight and movement disabilities. We will describe how we adapted an off-the-shelf survey package to meet the population needs.

    Steve Gomori, RTI International*
    Mai Nguyen, RTI International
    Charlie Knott, RTI International
    Sue Pedrazzani, RTI International
    Frank Mierzwa, RTI International

    RTI has developed web-based surveys on many projects. To enhance usability and data quality, RTI developed forms for highly complex question layouts and more interactive behaviors. For the AURORA cooperative agreement, RTI was able to develop survey forms that have complex layouts with interactive elements that are enabled/disabled or change their behaviors based on actions taken by the user on the same form. We developed instruments capable of supporting both interviewer-led and self-administered modes despite the two modes having different question wording, different response options, and separate validations. RTI introduced an interactive body manikin into a web survey, allowing the respondent to select regions on the body map, which uses branching logic to launch subsequent diagnosis module(s). To enhance clinic physiological sensory protocol adherence and quality measures, we embedded timers to direct clinic staff in completing temporal pain measures (i.e., threshold, tolerance) and cuff algometry. In our presentation, we will provide live demos of a few of these advanced customizations to illustrate some of these unique capabilities in our web-based survey systems.

    Session 7C:  Development and Evaluation of a Web-based Instrument Prototype

    Moderator:
    Lin Wang, U.S. Census Bureau
    Panelists:
    Alex Cohen, U.S. Census Bureau
    Shelley Feuer, U.S. Census Bureau
    Jonathan Katz, U.S. Census Bureau

    Identifying vacant housing in major household surveys is a crucial field operation for reducing nonresponse. This operation is currently carried out by sending interviewers to the field to identify vacant units, which is costly. With the increasing popularity of online maps, one way to reduce fieldwork is to ask the public to help report vacant units in their neighborhood using an interactive online map. In this session, we will first demonstrate the prototype of an online-map interface and discuss technical innovations, including house search strategies and the use of Apple Maps and Google Street View APIs; then we will report on the usability evaluation of the prototypes. Finally, we will present findings related to messaging, i.e., how to effectively get people to share information about their neighborhood. The three presentations provide a convergent story about how to motivate the public to provide data about their neighborhood, how many homes make up a "neighborhood," and what map interfaces help the public navigate their neighborhood.

     

    Contact us:

    For questions about the FedCASIC workshops or technical issues with the FedCASIC website, please send an email to FedCASIC@census.gov.


    Source: U.S. Census Bureau, ADSD
    Last Revised: May 23rd, 2023