
2021 FedCASIC Virtual Conference




    Day 1: Tuesday, April 13


    9:00 am - 9:55 am
    Opening Session
    Welcoming Remarks
    Keynote Address:  How Dealing With 2020 Changed Us
    Michael Thieme, Assistant Director for Decennial Census Programs at the U.S. Census Bureau

    10:00 am - 11:25 am
    Concurrent Sessions
    Session 1A:  Alternative Data Sources
    Sylvie Bonhomme, Statistics Canada*

    Statistics Canada has recently increased its emphasis on researching and introducing innovative collection methods for household surveys. As a result, response rates have stabilized and costs have been managed effectively over the last few years. The first part of the presentation will describe the initiatives that successfully contributed to alleviating the downward trend in response rates. However, continued research on new data collection methods and techniques is required, as the downward trend in response rates could return, along with the cost increases needed to counteract it. Statistics Canada is therefore researching more advanced approaches that might change its primary data collection more dramatically by complementing or replacing traditional collection. The next steps are expected to lead toward entirely new data collection techniques, such as sensor and scanner use, crowdsourcing, web scraping, automated voice interfaces, and other innovative methods. The second part of the presentation will describe some of the experiments, risks, and opportunities being considered at Statistics Canada.

    Michael Gerling, U.S. Department of Agriculture National Agricultural Statistics Service*
    Samuel Garber, U.S. Department of Agriculture National Agricultural Statistics Service
    Tyler Wilson, U.S. Department of Agriculture National Agricultural Statistics Service

    Since 2015, the National Agricultural Statistics Service (NASS) has explored building a survey list frame of agricultural operations from open-source information. To do this, one must learn to crawl before beginning to scrape. This presentation covers the web crawling methods NASS has found efficient for identifying pertinent agriculture websites within the vast sea of internet information. The scraping processes are then described, including the list frame information necessary for current and future needs. The layout and format resulting from these processes are discussed. Finally, the laborious processes used in data cleansing (completing a record's missing information, marking duplicate records, etc.) are reviewed with specific emphasis on pragmatism and working efficiently within the federal working environment. The reasons for choosing to automate some processes and to conduct others manually are discussed. Although the methods are discussed in the context of NASS's development of a survey list frame of hemp farms, the techniques and strategies highlighted are broadly applicable.
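
    The abstract above describes a crawl-then-scrape workflow. As a rough illustration only (not NASS's production pipeline; the seed URL, keyword list, and selectors below are invented), a minimal Python sketch using requests and BeautifulSoup might look like this:

    # Minimal crawl-then-scrape sketch (not NASS's production pipeline).
    # The seed URL, keyword list, and field selectors are illustrative only.
    import re
    from urllib.parse import urljoin

    import requests
    from bs4 import BeautifulSoup

    SEED_URLS = ["https://example.org/hemp-growers-directory"]  # hypothetical seed
    KEYWORDS = re.compile(r"\b(hemp|farm|acres|grower)\b", re.IGNORECASE)

    def crawl(seed_urls, max_pages=50):
        """Breadth-first crawl that keeps only pages mentioning target keywords."""
        seen, queue, relevant = set(), list(seed_urls), []
        while queue and len(seen) < max_pages:
            url = queue.pop(0)
            if url in seen:
                continue
            seen.add(url)
            try:
                html = requests.get(url, timeout=10).text
            except requests.RequestException:
                continue
            soup = BeautifulSoup(html, "html.parser")
            if KEYWORDS.search(soup.get_text(" ", strip=True)):
                relevant.append((url, soup))
            # Queue discovered links for the next crawl layer (domain filtering omitted).
            queue.extend(urljoin(url, a["href"]) for a in soup.find_all("a", href=True))
        return relevant

    def scrape_listing(soup):
        """Pull name-like fields for a candidate list-frame record (illustrative)."""
        heading = soup.find("h1")
        return {
            "name": heading.get_text(strip=True) if heading else None,
            "raw_text": soup.get_text(" ", strip=True)[:500],
        }

    if __name__ == "__main__":
        records = [scrape_listing(soup) for _, soup in crawl(SEED_URLS)]
        print(f"Scraped {len(records)} candidate list-frame records")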

    Monica L. Wolford, AHRQ*
    Sandra Pope, SoftDev Inc.
    Patricia Keenan, AHRQ

    In 2020, the Agency for Healthcare Research and Quality (AHRQ) collected health insurance booklets from individuals participating in the Medical Expenditure Panel Survey (MEPS) for the purpose of abstracting data on health insurance cost sharing, such as deductibles and copayments/coinsurance. Policyholders were asked to call or access their insurance company websites to request documentation and to return documents either by mail or by uploading them to the MEPS website. Once received, the unstandardized files were converted to PDF files. Camelot, a Python library, was used to extract tables, and Cloud DLP was used to redact sensitive text from images. The MEPS Abstraction Tool leveraged machine learning capabilities to enhance image recognition, computer vision, and natural language processing of the booklets' text. After ingestion, trained abstractors used the MEPS Abstraction Tool to confirm and supplement the ingestion results. The Tool provided a dashboard for both the abstractors and the Quality Control staff to monitor abstraction progress.
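
    For readers unfamiliar with Camelot, the table-extraction step described above can be sketched as follows. This is illustrative only: the file name and filter terms are hypothetical, and the Cloud DLP redaction and downstream ingestion steps are not shown.

    # Extracting cost-sharing tables from an insurance booklet PDF with Camelot.
    # The file name and output paths are hypothetical; the Cloud DLP redaction
    # step described in the abstract is not shown here.
    import camelot

    # Parse every page; "lattice" suits ruled tables, "stream" suits whitespace-aligned ones.
    tables = camelot.read_pdf("insurance_booklet.pdf", pages="all", flavor="lattice")
    print(f"Found {tables.n} tables")

    for i, table in enumerate(tables):
        df = table.df  # each extracted table is exposed as a pandas DataFrame
        # Keep rows that look like deductible/copay entries (illustrative filter).
        mask = df.apply(
            lambda row: row.str.contains("deductible|copay|coinsurance", case=False).any(),
            axis=1,
        )
        df[mask].to_csv(f"extracted_table_{i}.csv", index=False)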

    Peter Baumgartner, RTI International*
    Murrey Olmsted, RTI International
    Amanda Smith, RTI International
    Dawn Ohse, RTI International
    Bucky Fairfax, RTI International

    Coding responses from free-text, open-ended survey questions (i.e., qualitative analysis) can be a labor-intensive process. The resource requirements for qualitative coding can prevent researchers from extracting value from free-text responses and can influence decisions about the inclusion of open-ended questions on surveys. Machine learning (ML) has been proposed as a potential solution to alleviate coding burden, but traditional ML methods for text classification require large amounts of training data usually not available from surveys. With that problem in mind, we evaluated a ML approach that used responses from an open-ended question on a 2018 employee survey to train a model that predicted a set of codes applied to the same question on the 2019 survey. A coding team then adjudicated these predictions and provided coding corrections when applicable. We achieved promising performance despite an original training dataset of under 3,000 survey responses by using both data augmentation and recent advances in transfer learning models for natural language processing.
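
    As a simplified stand-in for the approach described above (the authors used data augmentation and transfer-learning models, neither of which is reproduced here), a baseline multi-label coding setup in scikit-learn could look like the following; the responses and codes are invented.

    # Baseline multi-label survey-coding setup in scikit-learn. This is a
    # simplified stand-in for the transfer-learning + data-augmentation approach
    # described in the talk; the example codes and responses are invented.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import MultiLabelBinarizer

    train_responses = [
        "More flexible scheduling would help",
        "My supervisor communicates clearly",
        "Better pay and more training opportunities",
    ]
    train_codes = [["work_life"], ["leadership"], ["compensation", "training"]]

    mlb = MultiLabelBinarizer()
    y_train = mlb.fit_transform(train_codes)  # one indicator column per code

    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), min_df=1),
        OneVsRestClassifier(LogisticRegression(max_iter=1000)),
    )
    model.fit(train_responses, y_train)

    # Predict codes for next year's responses; coders then adjudicate these suggestions.
    new_responses = ["I want more training on the new systems"]
    predicted = mlb.inverse_transform(model.predict(new_responses))
    print(predicted)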

    Session 1B:  Online Diaries

    Testing a Device Optimized Online Diary for Expenditure Data Collection
    Parvati Krishnamurty, U.S. Bureau of Labor Statistics*

    As response rates decline and the costs of fielding traditional in-person, phone, and mail surveys increase, many surveys are considering alternate methods of data collection, including web surveys. The Consumer Expenditure Surveys (CE) recently completed a test of a browser-based, device-optimized online diary prior to its implementation into production. The online diary is designed to collect information on small, frequently purchased items over a two-week period that is currently collected in a paper diary. The test was fielded from October 2019 to April 2020, with the goal of identifying any methodological, operational, or technical issues with the use of online diaries in the CE Diary Survey. This presentation will discuss design challenges, usability, operational issues, and future plans for online diary data collection.

    Lin Wang, U.S. Census Bureau*
    Anthony Schulzetenberg, U.S. Census Bureau
    Alda Rivas, U.S. Census Bureau
    Heather Ridolfo, U.S. Department of Agriculture National Agricultural Statistics Service
    Shelley B. Feuer, U.S. Census Bureau

    In designing a mobile survey, data loss and data accuracy are two particular concerns for survey designers. Data entry is a crucial task in survey data collection because entering inaccurate data or failing to enter data increases survey measurement error and nonresponse error. In the present study, the authors implemented an experimental approach to developing an optimal data entry model based on empirical behavioral analysis, using a national sample survey on households' food acquisition as a case study. A paradigm of sequential card sorting was developed to simulate the process of respondents entering food information. Based on the findings from three card sorting studies, we recommend the following data entry order: food acquisition location; food items (food item name, quantity and unit, food item cost); payment method. This study demonstrates that card sorting techniques combined with rigorous experimental design can be an effective method for mobile survey design research.

    Adam Kaderabek, Institute for Social Research University of Michigan*
    Brady T. West, Institute for Social Research, University of Michigan
    John A. Kirlin, Kirlin Analytical Services
    Elina T. Page, U.S. Department of Agriculture Economic Research Service
    Jeffrey M. Gonzalez, U.S. Department of Agriculture Economic Research Service

    The USDA's first National Household Food Acquisition and Purchase Survey (FoodAPS-1) was a nationally representative survey that collected data about household food purchases and acquisitions. In advance of designing FoodAPS-2, an Alternative Data Collection Method (ADCM) study was conducted that asked respondents to use a web application and to scan and submit receipts for reported purchases. A validation of FoodAPS ADCM data was conducted using the submitted receipts. The objective of the receipt validation was to confirm the accuracy of respondent-reported expenditure data using the scanned receipts. The total cost, number of items, and item prices reported were key variables of interest. The validation effort also revealed that certain properties of the receipts directly influenced their efficacy for data validation. This presentation will discuss the accuracy of FoodAPS events with a corresponding receipt and articulate the properties of receipts that were most influential during validation.

    Jeffrey M. Gonzalez, U.S. Department of Agriculture Economic Research Service*
    Mark Denbaly, U.S. Department of Agriculture Economic Research Service
    Linda Kantor, U.S. Department of Agriculture Economic Research Service
    Elina T. Page, U.S. Department of Agriculture Economic Research Service
    John A. Kirlin, Kirlin Analytic Services

    The USDA's National Household Food Acquisition and Purchase Survey (FoodAPS-1) was the first nationally representative survey of U.S. households to collect unique and comprehensive data about household food purchases and acquisitions. Development of the survey's second round, FoodAPS-2, is underway, and its design and data collection protocols draw on the lessons learned from FoodAPS-1. Additionally, changes in the surveying environment and in how people acquire food have created a need to leverage advancements in web, mobile, and other digital technologies to combat concerns associated with data quality, including nonresponse and underreporting, respondent burden and fatigue, and significant backend data processing times. This session presents the current plans for FoodAPS-2, which will be evaluated in a forthcoming large-scale field test. We'll provide an overview of the key survey design features, present an in-depth look at a native smartphone application (the primary mode of data collection), highlight how the application uses the smartphone's built-in features, and discuss plans for leveraging extant databases to reduce burden and improve quality in real time.

    Session 1C:  2020 Decennial Census

    Elizabeth Nichols, U.S. Census Bureau*
    Shelley Feuer, U.S. Census Bureau
    Erica Olmsted-Hawala, U.S. Census Bureau
    Jasmine Luck, U.S. Census Bureau

    For the 2020 Census, over 13 million calls came into the Census Questionnaire Assistance (CQA) telephone help line. The telephone help line supported 14 languages with over 9,000 Customer Service Representatives (CSRs) hired across 10 U.S. call centers. To help evaluate the operation, focus groups with a sample of CSRs were planned to be in person at each call center, but due to COVID-19 travel restrictions, moderators from the Census Bureau conducted them remotely using Skype for Business. CSRs remained at their call centers under social distancing requirements, while moderators and observers joined from their respective homes. Census Bureau staff had conducted remote focus groups with call center agents throughout the decade during the census tests and were familiar with the protocol, but the addition of the social distancing requirements at the call centers led to some new challenges. In the talk, we will share our procedures for conducting remote focus groups, lessons learned, and suggestions for how to conduct remote focus groups when social distancing requirements are and are not necessary.

    Lydia Shia, U.S. Census Bureau*

    In 2020, the U.S. Census Bureau used the internet as a primary response option for the first time. The increasing use of web and mixed-mode surveys improved Census awareness and promoted self-response in a cost-effective manner. Unlike traditional datasets, paradata contains information on direct interactions between the respondents and the web instrument. The dataset reflects user behaviors and instrument performance rather than the response outcomes themselves. It is organized into sessions, each bundling user activities, which gives great insight into the response process in the self-administered web instrument. Paradata captures a variety of factors that reflect respondent interactions with the Census instrument, such as navigation behaviors (number of logins, breakoffs, and completion time), response sufficiency, and user characteristics (device, ID, and browser types), at different levels of detail. This presentation provides an overview of the use and application of paradata to improve the quality of the 2020 Census outcome. It introduces real-time quality measurements and resolutions based on respondent behaviors in the 2020 Census internet instrument.

    Brett Moran, U.S. Census Bureau*

    The 2020 Census paradata contains special links called Source Tracking URLs. In this presentation, we will discuss how we used these URLs to monitor the Mobile Questionnaire Assistance (MQA) operation during the census, and how we are currently using the URLs to assess both the MQA operation and the 2020 Census Digital Advertising (Digital Ad) campaign. The MQA operation involved sending Census representatives to low self-responding areas to encourage response to the census either through respondents' own devices or through interviews with representatives. We will discuss how we used paradata URLs to monitor the MQA operation in real time, how that monitoring contributed to the operation's overall success, and how we are using the URLs along with response data to assess the operation. The Digital Ad campaign created and deployed numerous digital advertisements aimed at encouraging response to the 2020 Census. We will discuss how we are using the paradata URLs to assess the effects of the campaign on response rates for different demographic groups. Finally, we will discuss some of the limitations of the 2020 Census paradata, as well as recommendations for future research.
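
    To illustrate the general idea of source-tracking analysis, the sketch below tallies response sessions by a tracking parameter found in paradata URLs. The URL structure and the "src" parameter name are hypothetical, not the actual 2020 Census Source Tracking URL format.

    # Tallying response sessions by source-tracking parameter. The URL structure
    # and the "src" parameter name are hypothetical, not the actual 2020 Census
    # Source Tracking URL format.
    from collections import Counter
    from urllib.parse import urlparse, parse_qs

    paradata_urls = [
        "https://respond.example.gov/start?src=mqa&lang=en",
        "https://respond.example.gov/start?src=digital_ad&campaign=a17",
        "https://respond.example.gov/start?src=mqa",
        "https://respond.example.gov/start",
    ]

    def source_of(url):
        """Return the tracked source for a session-start URL, or 'untracked'."""
        qs = parse_qs(urlparse(url).query)
        return qs.get("src", ["untracked"])[0]

    counts = Counter(source_of(u) for u in paradata_urls)
    print(counts)  # e.g. Counter({'mqa': 2, 'digital_ad': 1, 'untracked': 1})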

    Dave Hackbarth, U.S. Census Bureau
    Frank Fisiorek, U.S. Census Bureau
    Mark Markovic, U.S. Census Bureau*

    More than 731,000 mobile devices were acquired through a Decennial Device as a Service (dDaaS) contract in support of the 2020 Census. The program included key milestones and encountered numerous logistical challenges, such as:
    - the vendor purchasing, provisioning, and shipping over 731,000 total devices for use in multiple field operations, with varying end-user requirements;
    - distribution of devices to 255 offices through a partnership with UPS;
    - asset management: the Intelligent Tracking and Management System (ITMS) was developed to manage the order, shipment, delivery, and custody transfers of devices;
    - training: development of accountability procedures and training/guiding staff through the rollout of the ITMS;
    - break/fix: replacing broken devices during operations;
    - retrieving devices from end-users and returning them to the vendor; and
    - reconciling the status of all devices in the program with the vendor.
    The presentation will elaborate on the challenges, lessons learned, and process improvements that were implemented for key activities throughout the program. We will conclude with a set of statistics highlighting the effectiveness of the overall program.


    11:30 am - 11:55 am
    Posters and Demonstrations
    P&D Session 1A:  Meet the presenters

    Martha (Virginia) Gwengi, U.S. Census Bureau*

    The Census Bureau serves as the data collection agent for AHRQ for the Medical Expenditure Panel Survey-Insurance Component (MEPS-IC). The survey collects data on health insurance from private and public sector employers. In this work, we focus on the 1,000 government units that are sampled with certainty. Certainty government units are sampled every year, and to reduce the burden on these respondents, they are only asked unit-level questions and some health insurance questions. The respondents are then asked to upload plan forms to the data collection instrument or to provide websites where Census analysts can search for the plan forms and manually extract the remaining health insurance information. This transfers the response burden to Census analysts. We seek to lessen this burden using data extraction and web scraping tools. We create a tool to extract Summary of Benefits and Coverage (SBC) forms, which have a standardized format mandated by the Affordable Care Act. The extraction tool collects the information faster than the manual process, while the web scraping tool allows us to crawl each webpage and search for the SBC forms more efficiently.

    Grace Jaroscak, NORC, University of Chicago
    Kali Defever, NORC, University of Chicago*

    The Medicare Current Beneficiary Survey (MCBS) is a continuous, multipurpose survey of a nationally representative sample of the Medicare population, conducted by the Centers for Medicare & Medicaid Services (CMS) through a contract with NORC. The survey collects information from respondents about prescription medicine use, including medicine name, strength, and form. These medicines are linked to CMS administrative prescription medicine claims data, creating a uniquely rich data source. In 2017, CMS and NORC revised a lookup tool built in 2015, which integrates a high-quality commercial medicine name database into the questionnaire. The revised tool allows interviewers to select medicine details directly from the database, minimizing manual entry of data. The impact on reported data quality will be examined by assessing the match rates for survey-reported medicines to claims data. This poster will present the results of this descriptive analysis. The analysis includes: (1) match rates of survey-reported medicines to claims data, (2) the impact of the commercial database on match rates, and (3) the impacts of respondent characteristics and medicine name length on match rates.

    Integrating Authoritative Sources to Enhance Survey Response Quality
    Juan Salazar*

    As we look to the future of surveys, and with response levels decreasing, federal agencies like the Census Bureau have laid out their vision for the use of Authoritative (data) Sources, including Administrative Records, to further supplement responses with data that has been confirmed to be dependable for its purpose. But Administrative Records and third-party data will, by definition, come in different formats, so in order to link new data to existing records, it's necessary to create a flexible data model while still supporting the transactional aspects of a traditional database. Our poster will show the workflow for arriving Authoritative Data: how to intake streams, stage the raw data, and iterate it toward a gold copy that can be used for data linkage and Advanced Analytics. Note: This is a poster presented by the private sector, and the reference to the U.S. Census has no direct correlation to the Bureau.

    P&D Session 1B:  Meet the presenters

    Tyson Weister, U.S. Census Bureau*

    After a few years of ongoing development, testing, and user feedback, the Census Bureau is now over a year into the launch of its new data access platform, data.census.gov. This site represents a new chapter in the Census Bureau's data dissemination approach by centralizing access and allowing for a more rapid response to user feedback, replacing the previous site that had been used for the last 20 years. In this session, we will demo the new interactive site on data.census.gov. Attendees will explore the platform's latest tables, maps, and data visualizations in an easily digestible format, and will have the opportunity to provide feedback to make Census data easier to access.

    Marilyn Seastrom, NCES*
    Jennifer Nielsen, NCES
    Zac Mangold, Sanametrix
    Melissa Roessler, Sanametrix

    IPAM is a web-based record management system for processing public trust security applications throughout the stages of agency-level review and resubmission. IPAM was designed to manage and track: contract and Contracting Officer Representative assignment, initiation of applicants, approval and/or rejection of the Public Trust application, fingerprint submission, release to the Defense Counterintelligence and Security Agency (DCSA), DCSA approval or rejection, DCSA Schedule-Accepted status, and adjudication outcome. IPAM has significantly reduced the backlog of applications, improved processing times by automating data validations, streamlined the process for requesting an approval, and reduced the overall risk of sharing PII across organizations. In addition, the IPAM system includes data analysis reporting that identifies the most common rejection reasons and processing times, along with contract counts, status reports, and adjudication outcome reports. These reports contribute to the department's continuous improvement process to ensure applications are processed in a timely manner with as few errors as possible.

    P&D Session 1C:  Meet the presenters

    Christopher N. Carrino, U.S. Census Bureau*

    With the exponential growth in recent years of interest in Data Science, Machine Learning, and Artificial Intelligence, the field of Data Management has become inundated with ambiguous terms and an undefined scope. In this presentation we define the often used and more often misused terms of Metadata, Paradata and Enterprise Data. These terms are often used in working conversations, blog articles and research papers even though the terms lack a formal definition and a shared understanding. We'll examine conflicting examples of their use and provide a framework to de-conflict these meanings.

    Todd Johnsson, ExactData*
    Beverly Harris, U.S. Census Bureau

    The industry term is Fully Synthetic Data. On the Decennial program, it was more accurately called Correlated Simulated Data. The key privacy characteristic is that the data sets are NOT a derivation of production data: the link to production data is intentionally and completely broken, providing Privacy by Design in the dev and test environments. The simulated data itself is correlated across inputs and systems. It is longitudinally consistent, has happy and non-happy path scenarios, has intended patterns, and has event-level artifacts and intended aggregate-level statistics. It has the interrelated complexity and realism that sophisticated data processing application dev and test require. Imagine simulated data for dev and test of Survey Responses, Admin Records, and MAF/TIGER products (MAFX, GRFC, GRFN), all correlated, with longitudinal consistency, with ground truth, scaled to the entire US population. Each dataset contained thousands of files in various formats, types, schemas, and versions, all correlated. With current configurable, extendable, maintainable Census data models, most datasets can be generated within a day, with millions of correlated records, for any defined geography.


    1:00 pm - 2:25 pm
    Concurrent Sessions
    Session 2A:  Machine Learning

    Colleen Spagnardi, RTI International*
    Alex Waldrop, RTI International

    Transcripts hold a wealth of data, but to compare students across institutions, data such as course content must be standardized. Much of this categorization is done manually by training coders to match keywords to the best-fitting code. Leveraging artificial intelligence (AI) enhances this process and reduces the labor required to code data to existing taxonomies. In this talk, we describe how RTI piloted an AI-assisted recommendation engine to assign postsecondary fields of study and course codes to federal coding taxonomies for the Department of Education. Surveys could employ similar methods to ease the burden of coding responses both during real-time participation and in post-production activities.

    Catherine Billington, Westat
    Jiating (Kristin) Chen, Westat*
    Gonzalo Rivero, Westat
    Andrew Jannett, Westat

    Field requests for data updates to CAPI interviews are important for data quality in panel studies. Processing these open text comments is time-consuming and costly. A pilot project used Natural Language Processing (NLP) to assign category variables to comments to improve processing efficiency without loss of data quality. We trained a lasso model to perform this classification. A machine-learning (ML) pipeline in Python extracted linguistic features from the text fields. The output met our quality requirements, and we integrated a production version with our existing web application on a panel study. Data technicians must assign a category to each comment. Our ML model presents the top-3 categories by probability. Technicians can select one, or enter any category. We discuss how technicians used this feature in terms of efficiency and data quality. Superfluous comments account for about 40% of entries. We look at how comments assigned the 'Other' category by the model were dispositioned, and evaluate a similar approach to identify non-actionable comments. We discuss the risks and benefits of this approach, which will vary by project based on priorities, cost, and acceptable risk.
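
    A minimal sketch of the kind of L1-penalized ("lasso") text classifier with top-3 category suggestions described above is shown below. The comments, categories, and feature pipeline are invented for illustration and do not reproduce the production system.

    # Sketch of an L1-penalized ("lasso") text classifier that surfaces the top-3
    # most probable categories for each field comment. Categories and comments
    # are invented; the production feature pipeline is not reproduced here.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    comments = [
        "Respondent says the address is a vacant lot",
        "Phone number updated per respondent",
        "No issues, just confirming the appointment",
    ]
    labels = ["address_update", "contact_update", "no_action"]

    clf = make_pipeline(
        TfidfVectorizer(),
        LogisticRegression(penalty="l1", solver="liblinear"),
    )
    clf.fit(comments, labels)

    def top3(comment):
        """Return the three most probable categories for a comment."""
        probs = clf.predict_proba([comment])[0]
        order = np.argsort(probs)[::-1][:3]
        return [(clf.classes_[i], round(float(probs[i]), 3)) for i in order]

    print(top3("Respondent gave a new mailing address"))

    One appeal of the L1 penalty in this setting is that it zeroes out most feature weights, which keeps the learned vocabulary per category small and easier to review.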

    David H. Oh, U.S. Bureau of Labor Statistics*

    The Occupational Requirements Survey (ORS) contains text data on the critical tasks performed by selected jobs. With growing interest in the types of work being done in the U.S. economy, the ORS task data has the potential to provide invaluable insights for the public. However, originally designed simply to aid in review of the coded ORS elements, the tasks contained within ORS for each selected job are stored as free-form text with minimal structure, limiting their usability beyond manual review. To overcome this, the Office of Compensation and Working Conditions (OCWC) at the Bureau of Labor Statistics (BLS) has been exploring various ways to utilize the rich information contained within the ORS task data through the use of natural language processing (NLP) and machine learning techniques. This presentation provides a brief description of the task data contained in ORS, discusses the NLP methods used to process the text, and highlights some ongoing OCWC projects that leverage them.

    Sudip Bhattacharjee, University of Connecticut, U.S. Census Bureau*
    Nevada Basdeo, U.S. Census Bureau
    Ugochukwu Etudo, University of Connecticut
    Sara Alaoui, U.S. Census Bureau

    Response rates are dropping and data collection costs are rising in federal surveys. As a result, practitioners use response propensity models to mitigate cost while retaining response rates. We evaluate key determinants of survey completion for the American Community Survey (ACS) using paradata of Contact History Information (CHI). To our knowledge, unstructured field representative (FR) notes have been omitted from paradata models within the Census Bureau. We believe that incorporating these notes would improve the performance of paradata models. From the notes, we identify themes and terms that are useful for estimating response propensity. Additionally, FRs cannot contact a respondent if the response burden exceeds a threshold. We present the first steps in solving this optimization problem. We show two findings: (1) combining CHI and FR notes can improve response propensity estimates, and (2) FR notes can be incorporated in calculating burden scores for respondents. Our text mining of FR notes may also be useful in training FRs. Results from our study can be generalized to other surveys that capture both numeric and textual paradata from survey operations.
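
    The sketch below illustrates one way structured contact-history features and text features from FR notes could be combined in a single response propensity model, using scikit-learn's ColumnTransformer. The column names and records are invented; this is not the authors' actual model.

    # Combining structured contact-history (CHI) features with text features from
    # field representatives' notes in one response-propensity model. Column names
    # and records are invented for illustration.
    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline

    cases = pd.DataFrame({
        "n_contact_attempts": [2, 6, 1, 4],
        "ever_refused": [0, 1, 0, 1],
        "fr_notes": [
            "respondent friendly, asked to come back after work",
            "door slammed, said do not return",
            "left notice of visit, no one home",
            "wants to see badge and official letter first",
        ],
        "completed": [1, 0, 0, 1],
    })

    features = ColumnTransformer([
        ("chi", "passthrough", ["n_contact_attempts", "ever_refused"]),
        ("notes", TfidfVectorizer(), "fr_notes"),  # text column passed as a single name
    ])

    propensity_model = Pipeline([
        ("features", features),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    propensity_model.fit(cases.drop(columns="completed"), cases["completed"])
    print(propensity_model.predict_proba(cases.drop(columns="completed"))[:, 1])

    Keeping the numeric CHI columns and the TF-IDF notes features in one fitted pipeline means a single object can score new cases, which is convenient when propensity scores feed directly into case prioritization.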

    Session 2B:  Challenges in Technology and Survey Computing

    Moderator:
    Karen Davis, RTI International
    Panelists:
    Bryan Beverly, U.S. Bureau of Labor Statistics
    James Berry, U.S. Energy Information Administration
    Kyle Fennell, NORC, University of Chicago
    Gregg Peterson, University of Michigan

    Panelists will identify the top challenges facing their organizations today given the changing survey technology, data systems, and programming environments. Projects today often include innovative survey technologies, specialized programming and customization, administrative and extant data sources, and the integration of different devices and technologies to support data collection. The panelists will discuss the ways that their organizations are dealing with the environmental changes that they have identified, and offer examples and lessons learned in addressing these challenges.

    Session 2C:  Development and Pilot of the OMB ICR Application

    Moderator:
    Marilyn Seastrom, NCES
    Panelists:
    Jennifer Nielsen, NCES
    Bob Sivinski, OMB
    Carrie Clarady, NCES
    Zac Mangold, Sanametrix

    The National Center for Education Statistics has recently completed a pilot associated with the 2020 Federal Data Strategy, Action 17, to develop an automated tool for Office of Management and Budget Information Collection Review (ICR) documents that supports data inventory creation and updates. The pilot project built upon NCES' research and the Department of Education's Data Inventory to develop electronic templates within a capture/review system that generates Supporting Statements Parts A and B of the ICR while tagging the data elements needed to populate the ED Data Inventory. This panel presents stakeholder perspectives from across the project, showcases the work conducted to date on this application, and allows for discussion with participants about the project. Marilyn Seastrom, NCES, will discuss the background and context of the project. Jennifer Nielsen, NCES, Bob Sivinski, OMB, and Carrie Clarady, NCES, will discuss the OMB ICR from the perspective of the Government, while Zac Mangold will discuss the developer's perspective. Lastly, the panel will present any up-to-the-minute efforts from the current work to enhance and improve the application.


    2:30 pm - 2:55 pm
    Posters and Demonstrations II
    P&D Session 2A:  Meet the presenters

    Maurice Gonzenbach, Caplena*

    Open-ended feedback is becoming more and more abundant: ad-hoc studies, trackers, and transactional feedback often contain open questions, both reducing bias and broadening the depth of opinions. However, many researchers struggle to evaluate verbatims in a scalable, actionable way and to find the right balance between automation and quality. While AI methods have been promising relief for a long time, "traditional" AI methods are usually based on manually defined rules, which neither scale well nor deliver the desired quality. Modern AI methods, such as machine learning, have set out to fix this. Their downside is the huge amount of training data they require as well as their "black-box" behavior. Recent developments in machine learning (specifically transfer learning) allow categorizing texts with only a few dozen training examples while keeping humans in the loop. We present Caplena.com, an easy-to-use platform that enables market research agencies and corporations worldwide to efficiently categorize their open-ends and then perform advanced analyses (such as correlation or driver analysis) on the results, while keeping full control over the process.

    Putting Data Science Skills into Action
    Lisa Frid, U.S. Census Bureau*

    Census piloted a Data Science Training Program to grow the beginner and intermediate data science skills of employees. While the program included low-cost online learning components that can be readily translated to other agencies, the highest-impact portions of the program were those that allowed participants to practice their skills and get to know the Census-specific technical environment and datasets. Evaluations revealed the importance of selecting these hands-on opportunities based on their ability to translate to current Census data science needs. The success of the pilot, student feedback, and the evolving nature of data usage at Census indicate a need to (1) intentionally select participants based on their potential for using data science skills in their work, and (2) include content that specifically addresses the Census technical environment and opportunities for students to get hands-on experience. This presentation will share how we addressed these challenges in our pilot and plan to address them in our upcoming program, with the goal of educating other federal stakeholders on how to best grow their data science workforce through specialized and hands-on training.

    Benjamin Feder, Coleridge Initiative*
    Julia Lane, Coleridge Initiative, New York University
    Clayton Hunter, Coleridge Initiative
    Ekaterina Levitskaya, Coleridge Initiative

    Survey providers are being urged to combine new sources of data with their surveys. In this demonstration, we will show you how text analysis can be applied to public research information to augment survey data. We will provide an overview of the record linkage to administrative data. We will then walk through a sample Jupyter Notebook and show you how FedReporter data abstracts have been parsed into specific topics of research to better understand doctoral recipients' academic careers. You can then take the Jupyter Notebooks and apply the code to generate topics from other text sources.
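
    As a small, self-contained illustration of the topic-modeling step (not the Coleridge Initiative notebooks themselves, and not FedReporter data), the following fits a latent Dirichlet allocation model to a few invented abstract-like texts and prints the top words per topic:

    # Small topic-model sketch in the spirit of the notebook demonstration:
    # fit LDA on a handful of abstract-like texts and print the top words per
    # topic. The texts are invented, not FedReporter abstracts.
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer

    abstracts = [
        "We study gene expression in cancer cells using sequencing data",
        "A randomized trial of a new vaccine for influenza in older adults",
        "Machine learning models for predicting protein folding structures",
        "Survey methods for measuring food insecurity in rural households",
    ]

    vectorizer = CountVectorizer(stop_words="english")
    dtm = vectorizer.fit_transform(abstracts)  # document-term matrix

    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    lda.fit(dtm)

    terms = vectorizer.get_feature_names_out()
    for k, topic in enumerate(lda.components_):
        top_terms = [terms[i] for i in topic.argsort()[::-1][:5]]
        print(f"Topic {k}: {', '.join(top_terms)}")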

    P&D Session 2B:  Meet the presenters

    Nola du Toit, NORC, University of Chicago*
    Jennifer Titus, NORC University of Chicago
    Michael Latterner, NORC, University of Chicago

    The Medicare Current Beneficiary Survey (MCBS) is an ongoing survey of a representative national sample of the Medicare population, including beneficiaries aged 65 and over and beneficiaries aged 64 and below with certain disabling conditions. With the emergence of the COVID-19 pandemic in the U.S., the Centers for Medicare & Medicaid Services quickly collected vital information on how the pandemic impacted the Medicare population. The MCBS COVID-19 Supplements are a series of nationally representative, cross-sectional telephone surveys of MCBS respondents conducted by NORC as a supplement to MCBS annual data collection. To make the survey's findings more accessible to the public, NORC constructed public use files (PUFs) and developed the MCBS COVID-19 Data Tool, an interactive website created using R Shiny and D3 to present PUF data. The tool aims to accelerate research with this data and ultimately help inform stakeholders' decisions about the pandemic. Our demonstration will address the process undertaken to develop the tool, including the technical and methodological challenges overcome.

    Jennifer Nielsen, NCES*
    Marilyn Seastrom, NCES
    Zac Mangold, Sanametrix
    Rickita Walley, Sanametrix

    The National Center for Education Statistics has recently completed a pilot project for the 2020 Federal Data Strategy, Action 17, to develop an automated tool for Office of Management and Budget Information Collection Reviews (ICRs) that supports data inventory creation and updates. The pilot project built upon NCES' research and the Department of Education's Data Inventory by developing electronic templates within a capture/review system that generates Supporting Statements Parts A and B of the ICR while tagging the data elements needed to populate the ED Data Inventory. To date, the pilot tool has been tested within two agencies (NCES and the National Science Foundation's National Center for Science and Engineering Statistics), and stakeholder input and feedback have been obtained through 18 stakeholder engagement activities. The OMB ICR Application has been designed in such a manner that, should additional funding be secured in later years, implementation of this automated process could be extended beyond the pilot and into other federal agencies. This demonstration will showcase the work conducted to date on this application and allow for discussion.

    P&D Session 2C:  Meet the presenters

    Nestor Alexis Ramirez, RTI International*

    Graph databases are useful for storing and traversing relationships and connections between data and have several use cases throughout the survey lifecycle. In a graph database, data objects are represented by nodes, or vertices, and relationships between objects are represented by edges. In situations where modeling and understanding relationships among large amounts of data are important - for example, the connection between sample member survey response and sample member contact methods in a longitudinal study - using graph databases can provide projects with several advantages compared to traditional relational databases. This demonstration will showcase a use case of graph databases as part of the derived variables task of the Baccalaureate and Beyond longitudinal study. JanusGraph and Neo4j, two graph database systems, will be shown as part of the demonstration, and potential use cases across the survey lifecycle will be discussed.
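
    As an illustration of the node/edge model described above, the sketch below loads a few invented sample-member and contact-method records into Neo4j and traverses the relationships. It assumes the Neo4j 5.x Python driver and a locally running database; the connection details and data model are illustrative, not the Baccalaureate and Beyond implementation.

    # Representing sample members, contact methods, and response status as nodes
    # and relationships, using the Neo4j Python driver (5.x API). Connection
    # details and the data model are illustrative only.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

    def load_case(tx, case_id, contact_method, responded):
        tx.run(
            """
            MERGE (m:SampleMember {case_id: $case_id})
            MERGE (c:ContactMethod {name: $contact_method})
            MERGE (m)-[:CONTACTED_VIA]->(c)
            SET m.responded = $responded
            """,
            case_id=case_id, contact_method=contact_method, responded=responded,
        )

    with driver.session() as session:
        session.execute_write(load_case, "B2021-0001", "email", True)
        session.execute_write(load_case, "B2021-0002", "text_message", False)

        # Traverse: which contact methods are linked to responding members?
        result = session.run(
            "MATCH (m:SampleMember {responded: true})-[:CONTACTED_VIA]->(c) "
            "RETURN c.name AS method, count(m) AS members"
        )
        for record in result:
            print(record["method"], record["members"])

    driver.close()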

    Marcy Gialdo, Mathematica*
    Mark Lafferty, Mathematica
    Dan Glovic, Mathematica

    Collecting and monitoring data is a key component of programs providing community social services. Programs not only use data to evaluate, assess performance, and track quality improvement but also collect participant data as a condition of receiving federal grant funding. To address this growing need, Mathematica developed RAPTER, a scalable, secure, cloud-based data system enabling federal agencies and grantees to consistently and effectively collect, track, and report participation data to inform program efficacy. To meet privacy, confidentiality, and data security standards, RAPTER is NIST compliant and FedRAMP ready. RAPTER has customizable modules for participant enrollment, random assignment, service tracking, survey administration, and reporting. Within RAPTER, studies can create and manage cohorts and monitor data with built-in reporting, including a dashboard. This poster presentation explains how RAPTER supports consistent and effective data collection, analysis, and reporting across programs over time. It demonstrates how RAPTER helps solve challenges in managing data collection, specifically highlighting use cases and system benefits during the COVID-19 pandemic.

    Mark M. Pierzchala, MMP Survey Services, LLC*

    The Blaise 5 Choréo MultiMode Management System will be described. The name Choréo is derived from the word choreography because like a choreographer, the system must coordinate many moving parts all at one time. Choréo works in concert with the Blaise CATI management system, the CAPI Case Management System (CMA) and other modes like web and paper. The system fits into an institute's existing infrastructure. For example, most institutes already have a Sample Management System (SMS) to send mail and email; these are not replicated. There is a Survey Handling Database. It keeps track of all survey management happenings, statuses, counts, and indicators. It does not contain any Personally Identifiable Information (PII). It issues instructions to Blaise 5 modules and to other institute systems. Choréo operates the way each institute wants. This is done through specification databases such as for (1) survey design parameters, (2) happenings, and (3) actions. There are hooks in the system to implement management responsive design, as well as to keep track of survey burden. An institute can continue to use its own coding scheme and naming conventions.


    3:00 pm - 4:25 pm
    Concurrent Sessions
    Session 3A:  Challenges and Achievements Using AI and Data Science

    Moderator:
    Jane Shepherd, Westat
    Panelists:
    Rebecca Hutchinson, U.S. Census Bureau
    Alex Measure, U.S. Bureau of Labor Statistics
    Jason Keller, NORC, University of Chicago
    Gayle Bieler, RTI International
    Marcelo Simas, Westat

    This panel will discuss the challenges and achievements that organizations have encountered in applying AI and data science approaches to their survey/data management projects. While there is significant publicity about the application of AI and data science techniques, the level of sophistication and experience varies. Panelists will explore strategic approaches, best practices, and examples of where they have employed these techniques, and cover lessons learned in doing so.

    Session 3B:  Workshop: Tools for Survey and Census Planning


    Kathleen M. Kephart, U.S. Census Bureau*
    Suzanne McArdle, U.S. Census Bureau
    Luke Larsen, U.S. Census Bureau

    The presenters will demonstrate how to use the Planning Database (PDB) and the Response Outreach Area Mapper (ROAM), giving several examples of their capabilities. The PDB is an easy-to-access dataset that is updated annually. It contains the greatest hits of American Community Survey (ACS) 5-year estimates. These include popular U.S. housing, demographic, socioeconomic, and operational statistics from the 2010 Decennial Census and the most recent ACS dataset. The PDB also contains the Low Response Score (LRS), which is a predicted mail return rate by block group and by census tract. New to the 2019 and 2020 PDB are ACS 5-year internet access statistics and 5-year ACS self-response rates. The ROAM is an interactive mapping application, developed to make it easier to identify hard-to-survey areas and the socioeconomic and demographic profiles of those areas. It is based on a subset of PDB data at the census tract-level, including the LRS, poverty status, education level, race, Hispanic origin, and language spoken at home.

    Session 3C:  Research Methods

    Matt Jans, ICF*
    Georgette Lavetsky, Maryland Department of Health
    Samantha Collins, ICF

    Cognitive testing is a critical step in developing survey questions. Traditional methods typically include semi-structured, qualitative interviews with a small number of participants, and are generally conducted before production interviewing. This presentation asks, "Can we train standardized survey interviewers to administer cognitive probes and obtain information helpful for revising questions?" We report on question testing for the 2016 Maryland Behavioral Risk Factor Surveillance System (BRFSS), in which we incorporated five cognitive probes (e.g., "What did the word 'neighborhood' mean to you in the preceding set of questions?") into the BRFSS CATI interview following nine new (i.e., test) questions. The cognitive probes were written as standardized survey questions to adapt the cognitive interviewing method to the training and skills of standardized phone interviewers. We collected over 1,600 interviews with cognitive probe data, which is much larger than most cognitive tests. Adding cognitive probes to production CATI interviews allowed us to conduct pretesting more quickly than traditional cognitive testing and without disrupting production data collection.

    Erica Olmsted-Hawala, U.S. Census Bureau*
    Elizabeth Nichols, U.S. Census Bureau

    Before COVID-19, most usability sessions were conducted in person. For household surveys, sessions took place at the Census Bureau's headquarters, a library, or a community center. In-person testing is standard practice for many reasons: it helps interviewers orient participants to the testing procedures, gives interviewers opportunities to observe participant behavior, and simplifies logistics. However, in the spring of 2020, new social distancing requirements made such testing impossible. The usability team at the Census Bureau needed to conduct remote testing - that is, virtual testing using the Internet, with interviewer and participant in different locations - for a project that was to begin in June. This talk shares our experiences with how we transformed our traditional in-person user testing to accommodate COVID-19 social distancing restrictions and successfully conducted remote usability testing with participants. We share tips and tricks on remote testing, including best practices for familiarizing participants with new software, strategies for building connection and rapport with long-distance participants, obtaining informed consent, and working with observers.

    Matt Jans, ICF
    Davia Moyse, ICF*
    Yang Yang Deng, ICF
    Ronaldo Iachan, ICF
    Lee Harding, ICF
    Kristie Healey, ICF
    James Dayton, ICF
    Scott Worthge, MFour
    Laura O'Campo, MFour
    Sarah Chung, MFour

    Now, more than ever, there are myriad types of nonprobability samples that can replace or supplement probability samples. This study uses two nonprobability sample sources to evaluate how well they mirror estimates from the Behavioral Risk Factor Surveillance System (BRFSS): the Surveys-on-the-Go mobile panel, which only includes people who have a smartphone, and Amazon Mechanical Turk (MTurk). Respondents from both sources were asked health questions selected from the BRFSS. Initial results show that nonprobability sample source accuracy is estimate-specific, and that estimates of exercise, general health, and overall insurance coverage may be accurately obtained from nonprobability samples. Overall, there appears to be a pattern of nonprobability samples overrepresenting poorer health. This suggests that nonprobability surveys may be useful replacements for some health estimates, but not others. Researchers need to assess trade-offs cautiously and within the context of specific key health indicators.

    Jennifer Hunter Childs, U.S. Census Bureau*
    Emilia Peytcheva, RTI International

    The Census Bureau, in partnership with several other Federal Statistical Agencies, awarded a collaborative agreement to RTI International to build the Ask U.S. Panel. RTI will design, build, and maintain an address-based, probability-based online research panel that will be available for robust public opinion and methodological research for the common good by statistical agencies and nonprofit organizations. This will facilitate both longitudinal and quick-turn-around research that many organizations are interested in conducting. The Ask U.S. Panel will consist of an entirely new, representative, probability sample of U.S. adults who are not members of an existing survey panel. In the future, the panel may be supplemented with targeted subgroups or additional target populations, such as businesses and organizations. The approach will involve mixed-mode recruitment of the residential population from RTI's address-based sampling (ABS) frame. Through the life of the panel, members will be invited to participate in topical surveys about once a month. The panel will have quarterly replenishment samples and utilize multiple strategies to keep panel members engaged.



    Day 2: Wednesday, April 14


    9:00 am - 10:25 am
    Concurrent Sessions
    Session 4A:  Program Innovations

    Adela Luque, U.S. Census Bureau
    Kevin Rinz, U.S. Census Bureau
    James Noon, U.S. Census Bureau*
    Michaela Dillon, U.S. Census Bureau
    Renuka Bhaskar, U.S. Census Bureau
    Victoria Udalova, U.S. Census Bureau

    The new Nonemployer Statistics by Demographics series, or NES-D, is the Census Bureau's response to the challenges faced by 20th-century survey-based statistics while addressing 21st-century needs for more frequent and timely high-quality data, at lower cost and with no additional respondent burden. NES-D is not a survey; rather, it exclusively uses existing administrative and census records to provide demographics for the universe of nonemployer businesses by geography, industry, receipt size class, and legal form of organization. Its first release was in December 2020. NES-D replaces the nonemployer component of the quinquennial Survey of Business Owners. Coupled with the new Annual Business Survey (ABS), which provides demographics for employer businesses, Census now provides annual business owner demographics through a blended-data approach that combines AR-derived estimates for nonemployer firms and survey-derived estimates for employer firms. In the near future, NES-D will be enhanced with characteristics relevant to understanding nonemployers' behavior and dynamics, such as characteristics related to the gig economy, household characteristics, and transitions to employer status.

    So You Want to Build an Autocoding Model? Lessons learned from applied autocoding projects
    Emily Hadley, RTI International*
    Rob Chew, RTI International
    Peter Baumgartner, RTI International

    Automated coding models for open-ended text promise time and labor savings but can be challenging to implement in practice. Expectations of accuracy, implementation costs, and complexity of integration into existing processes are common sources of frustration. We discuss the lessons learned from four applied autocoding projects with varying degrees of implementation. We suggest considerations for determining the feasibility of an autocoding project, setting realistic benchmarks for the accuracy of the model, and anticipating the challenges of integrating the model into existing workflows. These lessons can inform ongoing or future development of custom autocoding models.

    Struther Van Horn, U.S. Bureau of Labor Statistics*
    Tod Sirois, U.S. Bureau of Labor Statistics
    Jean E. Fox, U.S. Bureau of Labor Statistics
    Susan Gymburch, U.S. Bureau of Labor Statistics
    Erin Lane, U.S. Bureau of Labor Statistics
    Andrew Theodore, U.S. Bureau of Labor Statistics
    Sarah Van Giezen, U.S. Bureau of Labor Statistics

    The Consumer Price Index (CPI) program at BLS wanted to solicit creative new ideas for improving data collection. We wanted to encourage all ideas, from small tweaks to large system revisions, covering any data collection procedure or system. As a framework, we used Design Thinking, a structured, human-centered approach to addressing complex problems. In Design Thinking, team members connect directly with users through interviews, developing empathy for them and learning about their successes, pain points, and ideas for improvements. These in-depth conversations reveal what's working in the current system, what's not working, and recommendations for improvements. In this presentation, we will:
    - define Design Thinking and explain how it helps address complex problems;
    - walk through the steps in a Design Thinking effort;
    - describe the goals of the CPI's effort to identify creative new ideas for data collection;
    - share how we approached each step; and
    - describe the types of findings we uncovered and how they will be useful in improving CPI data collection.
    This presentation is not technical; it is appropriate for anyone involved in developing data collection procedures and systems.

    Transparent Evaluation and Reporting on Cost Structures for Statistical Information Products and Services
    John L. Eltinge, U.S. Census Bureau*

    Many federal statistical programs are focusing increased attention on the integration of survey data with information from other sources, e.g., administrative records and commercial transactions. In some cases, prospective cost savings are viewed as a major motivating factor. In other cases, the primary motivation is improvement of data quality, but subject to the requirement that non-survey sources do not inflate aggregate production costs for the statistical program. This paper explores the transparent evaluation and reporting on cost structures required to address these issues, with emphasis on:
    (1) Practical managerial decisions that require specific types of empirical information on cost structures.
    (2) Special features arising from complex patterns of fixed and variable cost components, operating constraints and optionality.
    (3) Realistic methods to capture information for (1) in ways that account for (2).
    (4) Conceptual distinctions among correlation, causation and control that are important in empirical work to address (1)-(3).
    (5) Alignment of (1)-(4) with literature on transparent reporting for data quality.
    Two running examples illustrate the general ideas in (1)-(5).

    Session 4B:  Questionnaire Evaluation

    Using Paradata to Explore Navigation Through Web Surveys to Improve Survey Design
    Renee Ellis, U.S. Census Bureau*

    Understanding how users navigate online survey instruments may be useful for many reasons. For example, knowing more about these behaviors may alert us to problems with instrument usability. This may help identify problematic questions and common behaviors of survey respondents. One of the challenges of this type of analysis is that web paradata are unstructured and often voluminous. This project examines how we can use a qualitative review of data and data visualizations to find patterns in respondent behaviors across survey pathways. In this look at how users navigate online survey instruments, we wrangle the paradata so that we can visualize user paths. From this, we categorize common paths and discuss how they might be used to make survey design decisions.
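
    One simple way to wrangle such paradata into user paths is to collapse each respondent's ordered page-view events into a path string that can then be counted and visualized. The sketch below uses invented field names and events; it is not the project's actual processing code.

    # Collapsing page-view paradata into per-respondent navigation paths so the
    # most common paths can be tabulated (and later visualized). Field names and
    # events are invented for illustration.
    from collections import Counter, defaultdict

    paradata = [
        {"resp_id": "r1", "ts": 1, "screen": "intro"},
        {"resp_id": "r1", "ts": 2, "screen": "roster"},
        {"resp_id": "r1", "ts": 3, "screen": "income"},
        {"resp_id": "r2", "ts": 1, "screen": "intro"},
        {"resp_id": "r2", "ts": 2, "screen": "income"},
        {"resp_id": "r2", "ts": 3, "screen": "roster"},  # out-of-order navigation
        {"resp_id": "r3", "ts": 1, "screen": "intro"},
        {"resp_id": "r3", "ts": 2, "screen": "roster"},
        {"resp_id": "r3", "ts": 3, "screen": "income"},
    ]

    paths = defaultdict(list)
    for event in sorted(paradata, key=lambda e: (e["resp_id"], e["ts"])):
        paths[event["resp_id"]].append(event["screen"])

    path_counts = Counter(" > ".join(p) for p in paths.values())
    for path, n in path_counts.most_common():
        print(f"{n:>3}  {path}")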

    Matt Jans, ICF
    Ashley Schaad, MaritzCZ
    Melinda Scott, ICF*

    Questionnaire design can be the least transparent of all survey development phases, sometimes remaining a "black box" which is difficult to audit or replicate. The Questionnaire Appraisal System (QAS-99) was developed to a) make this process replicable and transparent, and b) allow questionnaire revision by survey staff with lower levels of training and experience. It has seven steps focusing on question characteristics (e.g., readability, instructions, implicit assumptions, and topic sensitivity). The QAS-04 added assessment of the translatability, cross-cultural assumptions, and issues across questions within the instrument. We present new developments in the QAS process that add a) a questionnaire-level and flow review to assess the entire instrument, and b) a step in the original question-specific evaluation that assesses how reasonable it is to assume that the respondent would have encoded the information required to produce an answer. These were incorporated into a single Excel file that facilitates QAS implementation. This innovation will be discussed in the context of time-sensitive questionnaire development, the overall survey process, and survey transparency.

    Elise M. Christopher, NCES*
    Laura Burns, RTI International

    In the High School Longitudinal Study of 2009, conducted by the National Center for Education Statistics, measures of sexual orientation and gender identity (SOGI) were added in the Second Follow-up, conducted in 2016. The panel of 2009 high school freshmen had already participated in three rounds of data collection in 2009, 2012, and 2013. These potentially sensitive SOGI measures were extensively examined via cognitive testing and field testing prior to their addition to national survey instruments. After national data collection was completed, paradata and metadata were used to examine whether these items led to data quality concerns. This presentation will share results of these analyses, including investigations of breakoffs, item-level nonresponse, and time spent on item screens. Results will be compared to those of extant sensitive items in the survey in order to make conclusions about item functioning.

    Deborah Krug Mangipudi, ICF*
    Matt Jans, ICF
    Robynne Locke, ICF
    Stephen Haas, ICF
    John Boyle, ICF
    Lizzie Remrey, ICF
    Heather Driscoll, ICF
    Samantha McCoy, Oregon CJC
    Michael Weinerman, Oregon CJC
    Siobahn McAlister, Oregon CJC
    Ken Sanchagrin, Oregon CJC

    Researchers usually design questionnaires to begin with simple topic-relevant questions. Perceived sensitivity can vary widely across respondents, producing hidden disproportionate nonresponse. This presentation addresses the following questions: 1) Do specific questions often lead to break-offs? 2) Do break-offs vary by respondent demographics? 3) Do break-off rates differ between phone and web administration? Data come from the Oregon Crime Victimization Survey (OCVS), which used a dual-frame RDD sample and a DSF-based ABS sample. Only adults who lived in Oregon for the past 12 months were eligible. Questionnaire flow was identical across modes. Sections included 1) eligibility screening, 2) consent, 3) quality of life, 4) demographics, 5) index crimes, 6) non-index crimes, and 7) crime follow-up questions. Among people who participated by phone or web, we evaluate the percentage of hang-ups or break-offs by geographic stratum and by areas with varying levels of poverty. We will focus on whether the initial topics in the questionnaire (e.g., screening, consent, and neighborhood quality of life) obtain differential nonresponse across strata and geographies.

    Session 4C:  Data Collection During COVID

    Rachel Carnahan, NORC, University of Chicago*
    Andrea Mayfield, NORC, University of Chicago
    Elise Comperchio, NORC, University of Chicago

    The Medicare Current Beneficiary Survey (MCBS) is a continuous, multi-purpose longitudinal survey covering a representative national sample of the Medicare population sponsored by the Centers for Medicare & Medicaid Services (CMS). CMS leveraged the MCBS panel design to assess the impact of the COVID-19 pandemic on the lives of beneficiaries by planning rapid response surveys to supplement the main MCBS. The first supplement was administered by telephone in Summer 2020 during the regular production cycle to existing MCBS sampled beneficiaries who were living in the community as a test of the COVID-19 rapid response methodology. The MCBS collected and released data from the COVID-19 Supplements on an expedited timeline by developing a standalone questionnaire that was simultaneously fielded alongside the main MCBS. This presentation will share the innovative operational strategies the MCBS used to conduct a field test of the COVID-19 Summer Rapid Response survey and subsequently implement additional supplements using the same methodology. We will discuss how these methods can be adapted to implement rapid response surveys on other emerging topics for large surveys in the future.

    The Impact of COVID-19 on Large-scale Phone Survey Productivity and Response Rates
    Matt Jans, ICF*
    Jamie Dayton, ICF
    Randy ZuWallack, ICF
    Don Allen, ICF
    Josh Duell, ICF
    Andy Dyer, ICF
    Thomas Brassell, ICF
    Sam Collins, ICF
    Traci Creller, ICF

    COVID-19 has impacted survey productivity and response rates. Due to stay-at-home orders and lay-offs, more people are at home full-time than ever in recent history, making households easier to contact. More interviewers are also working from home. Productivity may improve in this context compared to a centralized call center. This presentation compares outcome rates (e.g., contact, refusal, cooperation, and response rates) from several ICF phone surveys conducted before and during COVID-19. The presentation addresses the following three questions about the effect of COVID-19 on phone surveys: 1) How have survey outcome rates changed since the start of COVID compared to the months prior to COVID and, where available, the same months of the prior year? 2) For interviewers who worked both from centralized call centers and from home, is their personal productivity similar or different in the two contexts? 3) Depending on the return to centralized call centers, can we observe further or maintained productivity improvements, or performance regressions, among interviewers who return to centralized calling?

    James Dayton, ICF*
    Don Allen, ICF
    Mary Penn, ICF
    Aprille Hairston, ICF

    We will share our experience and challenges moving from 100% on-site call center operations to 100% off-site remote operations over the course of a few weeks. We will cover the process of determining whether the needed call center technology can be safely and securely deployed in a home-based environment that is outside of company firewalls and in compliance with our internal data protection protocols and Institutional Review Board (IRB) respondent protection requirements. Can the required technology be effectively deployed using consumer-level internet connectivity in interviewer homes where other household members may be attempting to work, attend online classes, and meet other broadband needs? Assessments establishing the suitability of interviewing staff's home-based work environments will also be discussed. Finally, we will explore the safe deployment of equipment, the development of interviewer technical support helpdesks, and the required updates to our interviewer supervision, quality assurance processes, and other management procedures to assure the success of our home-based interviewing staff who no longer have the luxury of interacting face-to-face in a centralized work environment.

    Melissa Kresin, U.S. Census Bureau*
    Victoria Bookhultz, U.S. Census Bureau
    Nicole Cummings, National Center for Health Statistics
    Sonja Williams, National Center for Health Statistics

    National Ambulatory Medical Care Survey (NAMCS) and National Hospital Ambulatory Medical Care Survey (NHAMCS) respondents are uniquely impacted by the coronavirus disease (COVID-19) pandemic, as they are on the front lines administering care to individuals impacted by COVID-19. As physicians' offices and hospitals adapted to providing care during the pandemic, the N(H)AMCS survey sponsor and data collection agency had the unique opportunity to modify existing methodology and introduce new procedures to encourage survey participation and to collect information on how respondents are impacted by the pandemic. To ease respondent burden during the initial phases of the pandemic, data collection start dates were slightly delayed, with further delays in areas of high COVID-19 infection rates. Data collection procedures were modified to include various remote abstraction methods at U.S. Census Bureau regional offices. COVID-19 survey questions were developed swiftly and implemented midway through 2020 NAMCS data collection. Field Representatives (FRs) were provided with monetary awards for completing cases, and thank-you letters or phone calls acknowledged their hard work.


    10:30 am - 11:55 am
    Concurrent Sessions
    Session 5A:  Software Development

    Alisu Schoua-Glusberg, Research Support Services Inc.*

    Setting up CASIC instruments requires close collaboration between survey design teams and programmers. Federal CASIC studies increasingly require setting up instruments in multiple languages, most frequently in Spanish. Translated instruments present challenges for computer-assisted setup, given grammar mismatches between languages. For example, English adjectives don't specify gender but are gendered in Spanish and require alternative endings. Fills don't work identically across the two languages and word order varies across the two languages, which also impacts setup. Contractors often rely on translators to both translate the survey questions and to adapt the programmer code by inserting the Spanish in a way that will deliver an administrable Spanish version. This puts translators in a quasi-programmer role for which they have no particular skill or training. We will make the case for training translators, so as to end up with a more finished product that involves the least additional processing for both programmers and translators. Examples from several federal surveys will be provided.

    Daniel Gillman, U.S. Bureau of Labor Statistics*

    The Data Documentation Initiative is a family of statistical metadata standards. DDI-2 Codebook and DDI-3 Lifecycle are in use, and the new DDI-4 Cross-Domain Integration (DDI-CDI) was released in April 2020 for public review. Statistical agencies were asked to comment. The version 1.0 release is due in June 2021. DDI-CDI addresses new issues that other standards don't. Administrative and other sources of data are now being used to augment and substitute for survey data. However, these data sources are often in new formats. DDI-CDI contains several new formats: key-value pair (for Big Data), event history data (in administrative records), and a new description of multi-dimensional data. A new general process model useful for describing how data are used once they are acquired is included. It does not just rely on the common processes used in traditional statistical surveys. This expanded treatment makes DDI-CDI applicable to a wide variety of data applications. The talk will briefly describe these new features and provide examples to illustrate the ideas.

    Building a Statistical Metadata Registry using ISO 11179, the Generic Statistical Information Model (GSIM) and the National Information Exchange Model (NIEM)
    Christopher N. Carrino, U.S. Census Bureau*

    In order to manage the hundreds of system interfaces required to run the 2020 Decennial Census, the US Census Bureau built a statistical metadata registry upon federal and international data standards. The Generic Statistical Information Model (GSIM) serves as the conceptual framework for the metadata registry. The National Information Exchange Model (NIEM) serves as the naming and design rules for the data elements within the registry. And the principles of the ISO/IEC 11179 Metadata Registry specification serve as both the governance framework and the theoretical basis for the registry. This presentation will show how the Census Bureau implemented and extended the various conceptual models into a production system to serve as the registration authority for metadata elements across the 60-plus systems within the Decennial enterprise.

    Gina Cheung, SRC, University of Michigan*
    Lon Hofman, CBS, Statistics Netherlands

    Now that the Blaise 5 App (on Android and iOS) has been developed, Blaise 5 instruments can be loaded on smartphones and tablets for interviewers to conduct interviews. However, a Case Management Application (CMA) is needed to manage the production process. During the presentation, we will demo selected CMA functions for mobile and tablet devices, such as:
    • Get production sample lines to interviewer mobile devices.
    • Installation and management of samples within interviewers' devices.
    • Launch Blaise 5 app instruments for data collection.
    • Administration of case statuses (record appointments, enter call notes, etc.).
    • Send survey data to the central database.
    • Export survey data and paradata from the server-side.

    Session 5B:  Data Collection from Establishments

    Nicholas Johnson, U.S. Bureau of Labor Statistics*
    Jean E. Fox, U.S. Bureau of Labor Statistics

    The Current Employment Statistics (CES) program of the Bureau of Labor Statistics recently developed and implemented a new web collection tool designed specifically for multi-worksite respondents, particularly targeted at mid-size firms (5 to 50 worksites). Historically, CES has found it challenging to develop efficient collection methods for firms of this size. Over the last five years, CES worked to develop an online solution that would ease reporting for firms of this type. This new online collection tool, which is a web-based spreadsheet entry form, was deployed in late 2018. We propose to discuss the work required to develop the application, including surveys and interviews of existing respondents, user experience design, and the challenges of implementation. The purpose of this presentation is to highlight the lessons learned during this experience.

    Melissa Krakowiecki, Mathematica
    Karen CyBulski, Mathematica*
    Kevin Manbodh, Mathematica
    Larry Vittoriano, Mathematica
    Matt Potts, Mathematica
    Herman Alvarado, SAMHSA

    The Substance Abuse and Mental Health Services Administration (SAMHSA) sponsors two annual multi-mode behavioral health surveys, the National Survey of Substance Abuse Treatment Services (N-SSATS) and the National Mental Health Services Survey (N-MHSS). Each year, Mathematica administers the surveys via the web and telephone to the directors of approximately 18,000 N-SSATS facilities and 15,000 N-MHSS facilities. Each web survey includes an open-ended question for respondent comments. While some use this field to clarify survey responses, many express feelings about the survey. SAMHSA and Mathematica continually strive to improve both surveys based on this input and have used this feedback to modify survey content, formatting, navigation, and web-specific features. This presentation will discuss several enhancements incorporated into both web instruments to reduce survey burden and improve user experience. First, we prefilled data for responses that are unlikely to change from year to year. Second, we added a feature that enabled respondents to report on multiple facilities in one session; previously, they had to log in to each web instrument separately to complete each one.

    Expansion of E-mail in an Ever-Changing Data Collection Environment
    Mark Govoni, U.S. Census Bureau*

    Economic Programs at the U.S. Census Bureau began using email as a contact strategy to provide respondents with log-in credentials in 2017 when we started to collect email addresses through our Respondent Portal. Currently over 25 surveys successfully use email as part of their collection strategies. Due to COVID-19, we have not only expanded email contacts, but in many cases use of snail mail has been curtailed or abandoned. Plans to use email in more adaptive collection strategies are also being explored. However, expanded email use has led to numerous challenges to ensuring the best and quickest email delivery rates - e.g., coding requirements that optimize the display of messages across browsers, dealing with invalid email addresses and bounce backs, respondent fatigue with too many emails and conveying legitimacy of the sender and the request. In this presentation, we will share lessons learned and future plans for email use and describe the challenges we are currently grappling with. We plan to generate discussion among FedCASIC participants about experiences using email to contact survey respondents, along with challenges and best practices for overcoming them.

    Susanne Johnson, U.S. Census Bureau*

    With shrinking budgets and declining response, it is paramount to implement cost-effective data collection strategies to maximize response for the 2022 Economic Census and other Economic surveys of businesses and governments. We have developed a comprehensive collection strategy research program based on focus groups and cognitive testing, pilots and randomized experiments, and lessons learned. The research conducted prior to the 2017 Economic Census provided invaluable information to improve our methods. We will build on that research, exploring new technology and expanding adaptive design. This testing is designed to enable data-driven decisions for comprehensive, cost-effective collection strategies to maximize response for the 2022 Economic Census and other Economic programs. We will discuss the strategic process used to identify which collection strategy methods to test and in which recurring annual surveys to embed pilots and randomized tests. Plans include tests of paradata-based systems, tailored respondent messaging, expanded use of emails, and conversion of reluctant nonrespondents. We are excited to share our research plans and seek feedback from survey professionals.

    Session 5C:  Operations

    Thomas Brassell, ICF*
    Kisha Bailly, ICF
    Joshua Duell, ICF
    Randy ZuWallack, ICF
    Priscilla Martinez, ARG
    Deidre Patterson, ARG
    Thomas K. Greenfield, ARG
    Katherine J. Karriker-Jaffe, ARG

    A recent AAPOR Task Force report cited the impacts of call "blockers" on telephone survey research. Specifically, the report noted how the recent increase in these technologies raises concerns for telephone studies given the potential for the misidentification and blocking of legitimate research calls. While the direct impact to studies is difficult to determine given the challenge of identifying whether a survey call has been incorrectly flagged, the report highlighted that such blockers have the potential to increase survey costs, reduce response rates, and perhaps even create a perceived link between reputable scientific research organizations and unethical and deceptive companies. The current research examines the effect of various strategies to mitigate spam flagging across two national surveys. The first study compares contact rates between an SMS-enabled and a non-SMS-enabled outbound number. The second study compares contact rates between a static SMS-enabled outbound number and two-week rolling SMS-enabled outbound numbers. The results will add much-needed data to advance the ongoing struggle of survey research organizations to separate themselves from spammers.

    Matthew Bensen, RTI International*
    Preethi Jayaram, RTI International

    RTI conducted a study for the University of Texas Southwestern Medical Center (UTSW) in which one randomly chosen adult from households in Dallas and Tarrant counties would take a survey and then get tested for COVID-19. With a lower-than-desired response rate, a second protocol was added using a convenience sample. RTI programmed web and CATI surveys and developed system integrations with UTSW's website to schedule COVID-19 testing appointments. Participants were routed to the scheduling site, and scheduling status for each person was returned once daily. RTI followed up by phone with those who did not complete an appointment. RTI addressed several challenges, including connecting systems and securely passing data across organizations; some involved system design, given the distinct goals of survey completion and COVID-19 testing. We will discuss challenges and successes related to connecting to the scheduler; assuring unique web and CATI CASEIDs in a context where a setting to establish an initial CASEID did not exist; and reaching out to participants who had not completed their testing, given that a person could be at one of several scheduling statuses.
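
    To make the daily reconciliation step concrete, the minimal Python sketch below (with hypothetical file and column names, not RTI's production system) merges the scheduling-status file returned by the testing site with the survey case list and queues cases that still need CATI follow-up.

        # Minimal sketch; file layouts and status values are assumptions for illustration.
        import pandas as pd

        cases = pd.read_csv("survey_cases.csv")             # one row per completed survey
        status = pd.read_csv("daily_scheduler_status.csv")  # returned once daily by the scheduler

        merged = cases.merge(status[["case_id", "scheduling_status"]],
                             on="case_id", how="left")

        # Anyone not yet scheduled or tested is queued for phone follow-up.
        needs_followup = merged[~merged["scheduling_status"].isin(["scheduled", "tested"])]
        needs_followup.to_csv("cati_followup_queue.csv", index=False)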

    Beth Fisher, NORC, University of Chicago*
    Kate Bachtell, NORC, University of Chicago

    The COVID-19 pandemic has had paradoxical effects on survey operations. For the 2020 General Social Survey (GSS), it had the unexpected effect of strengthening opportunities for adaptive survey design and the applied use of R-indicators, statistics that assess the representativeness of a survey sample. In this paper we present the 2020 GSS Panel as a case study and describe how R-indicators were used to achieve a more balanced sample and strategically time closedown activities. We focus on the operational aspects of sample monitoring, field management, and incentives in a closely coordinated intervention. Under an adaptive design approach, we used the R-indicator results to inform data collection decisions. Cases were allocated into three groups in order to have a more targeted approach. The first group received a higher monetary incentive and extra telephone outreach by a field interviewer. The second group continued to receive outreach, but at the normal level. The third group received no further outreach and served as a baseline for comparison. Through this approach, project staff were able to refine the application of adaptive design.
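
    As an illustration of how propensity information can drive such an allocation, the sketch below (hypothetical variable names and cut points, not NORC's production code) splits active cases into three intervention groups by estimated response propensity.

        import pandas as pd

        active = pd.read_csv("active_cases.csv")  # includes a model-based propensity column "p_hat"
        low, high = active["p_hat"].quantile([0.25, 0.75])

        def assign_group(p: float) -> str:
            if p <= low:
                return "escalated_incentive_plus_phone"   # hardest cases: extra outreach
            elif p <= high:
                return "standard_outreach"                # continue normal-level effort
            return "no_further_outreach"                  # baseline comparison group

        active["intervention"] = active["p_hat"].apply(assign_group)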


    1:00 pm - 2:25 pm
    Concurrent Sessions
    Session 6A:  New Data Collection Methods for the Commodity Flow Survey and HAZMAT Supplement

    Christian Moscardi, U.S. Census Bureau*

    The U.S. Census Bureau and Bureau of Transportation Statistics are exploring the feasibility of collecting more timely and voluminous shipment information that a company can easily obtain from their databases for the Commodity Flow Survey (CFS) in order to meet the needs of data users, improve the quality and usefulness of the data products, and reduce respondent burden. Rather than collecting a sample of shipment data through a questionnaire, the goal is to ingest a company's shipment records through a simple, secure file upload process. Unlike in the survey, companies have the flexibility to report each establishment separately or combined as a consolidated report, further reducing burden. In addition, respondents will report data in a format that requires minimal manual transformation, editing, or classification on their part by using machine learning for more burdensome survey items. This presentation will focus on how we accomplished this with a small number of companies during the previous CFS, how we then developed a new collection instrument, and findings from a pilot of that collection instrument with CFS respondents.
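
    A minimal sketch of the ingestion idea follows (column names and validation rules are illustrative assumptions, not the actual CFS upload specification): an uploaded shipment file is standardized and checked so that little manual transformation is required of the respondent.

        import pandas as pd

        REQUIRED = ["shipment_id", "origin_zip", "dest_zip", "value_usd", "weight_lb", "description"]

        def ingest(path: str) -> pd.DataFrame:
            # Read an uploaded shipment file and standardize it for downstream processing.
            df = pd.read_csv(path, dtype=str)
            df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
            missing = [c for c in REQUIRED if c not in df.columns]
            if missing:
                raise ValueError(f"Upload is missing required fields: {missing}")
            # Numeric fields are coerced; unparseable values become NaN for analyst review.
            for col in ("value_usd", "weight_lb"):
                df[col] = pd.to_numeric(df[col], errors="coerce")
            return df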

    Rebecca Keegan, U.S. Census Bureau*
    Kristin Stettler, U.S. Census Bureau

    The Commodity Flow Survey (CFS) collects detailed data on the movement of goods in the U.S. In order to ease burden, CFS procedures call for respondents to report on only a sample of their shipments. Previous cognitive testing and direct feedback from respondents revealed that creating this sample was difficult. A new process of data collection is being developed utilizing machine learning and providing respondents with access to a portal that will host a large amount of their shipment data, thus eliminating the need for respondents to create a sample. To facilitate this major change in data collection, qualitative research methods were adapted to obtain feedback from a wide variety of respondents. Exploratory interviews were conducted to explore the feasibility of implementing this new process on a large scale, and to assess the potential effects on respondent burden. The new platform then underwent usability testing. As more surveys move towards innovative designs utilizing advancements in automation to reduce respondent burden, this presentation demonstrates how traditional qualitative survey pretesting methodology can be adapted to evaluate and facilitate these transformations.

    Julie Parker, Bureau of Transportation Statistics*

    As part of the panel on CFS collection modernization, Julie Parker, the CFS program manager at BTS, will discuss the benefits of this new data collection tool for the CFS program and data products. By collecting more data, BTS/CFS can meet a variety of data user needs, including demand for more geographically granular estimates of shipment activity and requests for better measures of the diverse shipment activity happening in e-commerce. In addition, by improving the quality of data submitted by respondents, CFS can reduce cost and time required to validate and correct data, ultimately leading to higher-quality data products. Finally, this "digitally native" data collection instrument is a step towards more automated collection of shipment data, which may make an annual CFS data product more viable in the future.

    Krista Chan, U.S. Census Bureau*
    Christian Moscardi, U.S. Census Bureau

    In partnership with the Pipeline and Hazardous Materials Safety Administration (PHMSA), the Census Bureau is implementing a hazardous materials (hazmat) supplement to the Commodity Flow Survey (CFS). We are asking hazmat shippers to provide information about materials shipped and the packaging used to protect those shipments in transit. Hazmat packaging is federally regulated - the Code of Federal Regulations (CFR) specifies packaging for each hazardous material that can be shipped. However, these specifications are not in a structured data format - they are written as English natural-language text. We have used Natural Language Processing (NLP) techniques to categorize the packaging regulations into a structured data format. We can now combine this information with survey response data to produce richer data about hazmat packaging for PHMSA, while not burdening our survey respondents with the need to look through opaque regulatory text. Finally, PHMSA will use this structured data to enable easier and more streamlined searching through the CFR, e.g., for companies that need to comply with hazmat packaging regulations.
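
    The toy example below illustrates the kind of transformation involved (the regulation sentence is invented and the pattern greatly simplified relative to the actual CFR text and NLP pipeline): natural-language packaging authorizations are turned into structured rows.

        import re

        # Invented snippet, not actual CFR language.
        reg_text = "Steel drums (1A1) and plastic jerricans (3H1) are authorized for UN1263."

        un_number = re.search(r"UN\d{4}", reg_text).group(0)
        rows = [
            {"un_number": un_number,
             "packaging_code": code,
             "packaging_description": words.lower()}
            for words, code in re.findall(r"(\w+ \w+)\s*\((\d\w\d)\)", reg_text)
        ]
        # rows -> [{'un_number': 'UN1263', 'packaging_code': '1A1',
        #           'packaging_description': 'steel drums'}, ...]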

    Session 6B:  Evaluating Online Data Collection from Establishments

    Temika Holland, U.S. Census Bureau*

    Establishment surveys conducted by the federal government often collect factual data requiring respondents to utilize records and other data sources in order to report. This additional effort has implications related to the overall experience of the respondent with the survey instrument, their reporting behaviors, and the quality of data obtained. Given the additional complexity of the establishment survey response process, advanced features, like machine learning, have been explored to aid in reporting. As establishment surveys continue to migrate to the Web, there is an increased need for methodological research on the design and evaluation of new developments in online self-administered establishment surveys. Findings from recent research will be shared in order to provide guidance to survey researchers and practitioners on improved or alternative design elements for online establishment surveys and similar, more complex household surveys in the federal government. Considerations for usability and overall reporting experience will also be discussed.

    Jean E. Fox, U.S. Bureau of Labor Statistics*

    Testing surveys before we field them is critical to help ensure that respondents provide the information we are looking for in the way that we need. Pre-testing helps identify situations where respondents interpret instructions in unexpected ways, where their data doesn't match the constructs we intend to measure, or where they just miss important instructions. Testing is important for both household and establishment surveys, and each has its own special considerations. This presentation will focus on options for pre-testing establishment surveys, and will cover topics such as the differences between usability testing and cognitive testing, options for remote testing, recruiting (and motivating) participants, and using scenarios/vignettes.

    Melissa Cidade, U.S. Census Bureau*

    The Economic Census is a mandatory survey conducted by the Census Bureau every five years. The survey collects data electronically from nearly 4 million establishments representing all U.S. locations and industries on a range of operational and financial topics. Additionally, one question series presents respondents with a list of products and services typical for their industry using the North American Product Classification System (NAPCS); respondents then select those that apply to their firm and can write in additional responses that are not prelisted. Previous administrations of the form resulted in an inordinate number of write-in responses, which require outsized resources to code, clean, and analyze. To assist in the selection of products/services, and to potentially reduce the number of write-in responses, plans are to incorporate machine learning functionality into the upcoming 2022 NAPCS survey item. Additionally, because of the novel coronavirus global pandemic, we adapted our usability methodology to provide for remote interviewing. This presentation provides an overview of the usability testing methods we used in incorporating machine learning for the NAPCS item, as well as preliminary findings and recommendations for incorporating such a feature into an online survey.

    Sudip Bhattacharjee, University of Connecticut and U.S. Census Bureau*
    Justin C Smith, University of Connecticut
    Ugochukwu Etudo, University of Connecticut

    We develop (1) a suite of tools that systematically gather public, textual information on US establishments and (2) a natural language processing and machine learning based methodology to predict full 6-digit NAICS codes. We rely upon a novel mix of publicly available, commercial, and official data. Our sample consists of approximately 130,000 establishments across all 20 NAICS sectors (2-digit), and across approximately 500 national industry codes (6-digit). We implement an ensemble machine learning framework that relies on four constituent trained machine learning classifiers. We show that publicly available, firm-sourced data is typically most discriminative when used to train our models to detect the correct NAICS code at the 2-, 4-, and 6-digit levels. We find that data sourced from commercial entities provides additional discriminative information. Model accuracies range from 70% to 95%, depending on the level of NAICS specificity. Accuracies increase further with other feature engineering additions. We evaluate model stability and other performance criteria. Our research can reduce both respondent and analyst burden while improving the quality of business classifications.
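
    As a rough illustration of the ensemble approach (not the authors' exact features or models, and with a tiny invented training set), the sketch below combines several scikit-learn text classifiers by majority vote to predict 6-digit NAICS codes from establishment descriptions.

        from sklearn.ensemble import RandomForestClassifier, VotingClassifier
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.naive_bayes import MultinomialNB
        from sklearn.pipeline import make_pipeline

        # Tiny invented training set; the study uses roughly 130,000 establishments.
        texts = [
            "wholesale distributor of fresh fruits and vegetables",
            "produce wholesaler supplying grocery stores",
            "family dental practice offering cleanings and exams",
            "dentist office providing general dental care",
        ]
        naics = ["424480", "424480", "621210", "621210"]  # 6-digit NAICS labels

        ensemble = make_pipeline(
            TfidfVectorizer(ngram_range=(1, 2)),
            VotingClassifier(
                estimators=[("lr", LogisticRegression(max_iter=1000)),
                            ("rf", RandomForestClassifier(n_estimators=200)),
                            ("nb", MultinomialNB())],
                voting="hard",  # majority vote across the constituent classifiers
            ),
        )
        ensemble.fit(texts, naics)
        print(ensemble.predict(["organic vegetable wholesaler"]))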

    Session 6C:  Multimode Survey Considerations

    Francois Laflamme, Statistics Canada*

    Statistics Canada has been consistently focused on identifying opportunities for strategic improvement in data collection approaches, as well as on innovative data collection methods which may be more aligned with current respondent communication preferences. To meet these requirements, the Agency has implemented new multi-mode collection strategies. While these changes were necessary, they have increased the complexity of the survey collection processes and the risk of not meeting survey objectives. In fact, post-mortem analyses of surveys have indicated that key survey planning assumptions were sometimes not aligned with the expected response rate, the survey budget, or both. In practice, both survey budget and survey response rate need to be based on realistic key planning assumptions in order to obtain and manage expected results. This paper describes Statistics Canada's experiences in planning, costing, managing, and assessing multi-mode surveys that include both Web and Computer-Assisted Telephone Interview (CATI) collection, including the impact of differences between planned and observed key planning assumptions on survey results, budget, and cost.

    Hanna Popick, Westat*
    Mina Muller, Westat
    Eric Jodts, Westat

    This presentation will address data harmonization challenges in multimode studies and provide researchers with approaches to consider when a multimode data collection yields inconsistent data across modes. Paper and web surveys each have different strengths, so there can be a benefit to offering each of these options to respondents during a data collection effort. One benefit of a web survey is that it enforces logic and response constraints. Though a paper survey can be designed to maximize clear instructions, one cannot prevent a respondent from selecting more than one response on a single-response question, for example. The presentation will first address the process of harmonizing the data from multiple modes as well as the advantages and challenges of harmonized data that researchers should consider. The second part will focus on data strategies that can be applied when data for the same questions are inconsistent across modes. Specifically, the presentation will cover different question types, provide examples, and address considerations that should be taken into account while editing the data to result in a single harmonized dataset.
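
    One example of the kind of edit rule involved is sketched below (data and coding rule are hypothetical): on paper, a single-response item can arrive with several boxes marked, so a documented edit code is assigned before the paper and web records are combined.

        import pandas as pd

        paper = pd.DataFrame({
            "case_id": [101, 102, 103],
            "q5_marks": [["1"], ["2", "3"], ["4"]],  # scanned check-box marks
        })
        MULTI_MARK = -9  # documented edit code for inconsistent paper responses

        paper["q5"] = paper["q5_marks"].apply(
            lambda marks: int(marks[0]) if len(marks) == 1 else MULTI_MARK
        )

        web = pd.DataFrame({"case_id": [201, 202], "q5": [2, 4]})  # web enforces one answer
        harmonized = pd.concat([paper[["case_id", "q5"]], web], ignore_index=True)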

    Randy ZuWallack, ICF*
    Matt Jans, ICF
    Thomas Brassell, ICF
    Kisha Bailly, ICF
    Priscilla Martinez, ARG
    Deidre Patterson, ARG
    Thomas K. Greenfield, ARG
    Katherine J. Karriker-Jaffe, ARG

    The National Alcohol Survey (NAS) has been conducted as a random-digit-dial (RDD) survey since 2000. However, recent changes in respondent behavior toward telephone interviewing have necessitated a transition of the latest cycle to a multi-mode design. The current NAS cycle employs a multimode design consisting of both a national RDD and ABS frame, as well as a non-probability web panel. Building upon prior research measuring alcohol consumption through self-administered and interviewer-administered modes, our present research focuses on distinguishing between the effects of administration mode and the effects of a non-probability panel versus an ABS push-to-web design. The multi-mode, multi-frame design allows for measuring the mode effect of conducting the survey by CATI versus web collection. In addition, the design allows for a comparison of a self-administered probability-based sample with a nonprobability sample. Using a regression model, we estimate the mode effects and nonprobability panel effects after controlling for demographics. We present the results of the modeling and discuss mode adjustments as a means of transitioning surveys from RDD to an alternative mode.
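
    A bare-bones sketch of the modeling idea appears below (variable names are placeholders, not the ARG/ICF specification): an alcohol-use outcome is regressed on mode and sample-type indicators while controlling for demographics, so the indicator coefficients approximate the mode and nonprobability-panel effects.

        import pandas as pd
        import statsmodels.formula.api as smf

        df = pd.read_csv("nas_combined_sample.csv")  # pooled CATI, ABS-web, and panel cases

        model = smf.logit(
            "past30day_drinker ~ C(mode, Treatment('cati'))"
            " + C(sample_type, Treatment('probability'))"
            " + C(age_group) + C(sex) + C(race_eth) + C(education)",
            data=df,
        ).fit()
        print(model.summary())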

    Matthew Bensen, RTI International*
    Sridevi Sattulari, RTI International
    Megan Waggy, RTI International
    Jennifer Hardison, RTI International
    Hannah Feeney, RTI International
    Mike Price, RTI International

    RTI International collected data on user perceptions of the National Human Trafficking Hotline for the Administration for Children and Families, U.S. Department of Health and Human Services. A fully participating respondent completed two surveys. Given that some participants could be in potentially dangerous situations, we offered multiple ways to participate in the second survey and receive an incentive. We did not mention the study topic in our communications, and we never had nor collected a name or address. Given this, and because we were offering an incentive in the second survey, we asked participants to create a password that would be used to enter the second survey, and we developed a solution that accounted for the possibility that participants might create the same password. Finally, we needed to collect data close to the time of the reported incident. This presentation will show how we designed our case management system to accommodate various pathways; designed our communications so they did not convey the study topic; used preference data to execute the second survey; and expired cases so that we would only receive current data.


    2:30 pm - 2:55 pm
    Posters and Demonstrations III
    P&D Session 3A:  Meet the presenters

    Joe J. Murphy, RTI International*
    Michael A. Duprey, RTI International
    Rob Chew, RTI International
    Rebecca Powell, RTI International
    Katie Lewis, U.S. Energy Information Administration

    In the age of paradata, the amount of information available to inform decisions during data collection can be overwhelming. Furthermore, adaptive, responsive, or tailored designs require the survey team to monitor critical-to-quality indicators to minimize total error across data sources. To aid decision-making in a data-rich context, visualization can serve as a valuable tool to express data. In this presentation, we describe the process of designing the Adaptive Total Design (ATD) Dashboard, a tool that monitors and visualizes data from multiple sources to track experimental, multimode, and longitudinal survey designs in near-real time. Data inputs may come from various systems and may exist at multiple units of analysis; thus, we constructed a data taxonomy to allow only logical instantiations. By employing an extensible app framework for R (Shiny), the dashboard standardizes visualizations and reports. We present examples from the 2020 Residential Energy Consumption Survey (RECS) illustrating the functionality of the dashboard. For RECS, the dashboard was used to closely monitor trends from the first phase of data collection to inform the design of the second phase.

    Pascal Heus, Metadata Technology North America*
    Andrew Decarlo, Metadata Technology North America
    Carson Hunter, Metadata Technology North America
    Jack Gager, Metadata Technology North America

    Ever wonder how to deliver data to researchers, applications developers, data scientists, or the public in a modern and effective way? This presentation will demonstrate the use of Rich Data Services (RDS), an innovative platform from Metadata Technology North America, designed to concurrently deliver data and metadata as a service to both users and applications. RDS was built to reduce data wrangling and empower information systems. Based on IT industry-standard REST technology and informed by global metadata standards, it enables immediate access to data and metadata by developers or data scientists. RDS also comes with web-based applications that allow casual users to explore and tabulate the data in a browser, or download it for offline analysis. RDS opens endless capabilities for data discovery, analysis, storytelling visualizations, machine learning, and more. The platform is backed by MTNA's extensive expertise in data management and information technology. For more information, see Rich Data Services (https://www.richdataservices.com), the COVID-19 Data Center (https://covid19.richdataservices.com), the Public Data Center (https://public.richdataservices.com), and MTNA (https://www.mtna.us).

    P&D Session 3B:  Meet the presenters

    A Novel Protocol for Remote Usability Testing of a Wireframe User Interface
    Alda G. Rivas, U.S. Census Bureau*
    Erica Olmsted-Hawala, U.S. Census Bureau
    Lin Wang, U.S. Census Bureau

    Human-centered design (HCD) emphasizes iterative usability testing throughout the life cycle of system development to ensure optimal user experience. In this presentation, we describe a protocol for remote usability testing of a wireframe for a data dissemination tool. The protocol involves presenting the user with the wireframe through screen-sharing, the user verbalizing the actions they would perform on the wireframe, and the researchers performing those actions on behalf of the users. The findings from the usability sessions allowed us to provide the designers with recommendations on how to improve the user experience by addressing identified usability issues before programming the full system. The implementation of our protocol allowed us to remotely conduct the usability evaluation of a wireframe while minimizing loss of user behavior data (e.g., clicking on inactive links).

    Rachel Kinder, ICF*
    Robynne Locke, ICF
    James Dayton, ICF
    John Jasek, NYC DOHMH
    Eleni Murphy, NYC DOHMH

    Longitudinal surveys are valuable for assessing the impact of a program or intervention over time, such as public health campaign effectiveness. However, collecting longitudinal data presents numerous challenges, including the cost of recruiting and retaining panel members. The New York City Department of Health and Mental Hygiene (NYC DOHMH) and ICF have conducted three waves of a longitudinal survey to measure the impact of tobacco cessation programming on the smoking cessation behaviors of adults in NYC. The NYC Tobacco Cessation Panel Survey study includes a baseline survey and three waves of follow-up surveys administered over one year. Along with surveying existing nonprobability panel members, ICF explored alternative recruitment methods, including in-person, social media (Facebook), and online marketplaces (Craigslist), because no one mode alone could provide a sufficient eligible sample (adults who smoke). In this presentation, we will describe the advantages and disadvantages of each method, comparing panel retention, demographic coverage, data quality, and cost-effectiveness. We will also present the overall impact of SMS panel retention methods across all recruitment methods.

    Lavaughn Cadiz Gooden, Westat*
    Victoria Hoverman, Westat
    Andrew Caporaso, Westat
    Jennifer Crafts, Westat
    Douglas Williams, Westat
    Kathryn Aikin, FDA
    Helen Sullivan, FDA

    Eye-tracking is used to assess attention to defined areas or features of stimuli in participants' visual fields. Stimuli can range from survey forms to informational and marketing products. Eye-trackers attached to computer monitors are effective for screen-based stimuli, but are sub-optimal when measuring attention to real-world stimuli such as multi-page paper ads with which respondents interact. Therefore, Westat used eye-tracking glasses to assess ad-viewing behavior for a large-scale data collection in six U.S. cities for the Food and Drug Administration. The study objective was to investigate the effects of repetition and location of risk information in direct-to-consumer print prescription drug ads on risk recall, recognition, and comprehension. Participants who had one of two medical conditions wore eye-tracking glasses as they viewed a fictitious ad related to their medical condition (N = 422). After reading a randomized version of the ad, each participant completed a web questionnaire about that specific ad. This presentation will report on adjustments made between a pretest and main study to optimize data quality and will supplement results from a prior FedCASIC poster.

    P&D Session 3C:  Meet the presenters

    Monica Polino Schneider, Decision Information Resources, Inc.*
    Heather Morrison, Decision Information Resources, Inc.

    It is a known trend that survey response rates are declining. Multi-mode surveys are one way to combat this trend, but aside from concerns about mode effects, there are technical challenges to managing cases across multiple modes. Existing survey software packages typically cannot provide real-time status across modes. Similarly, few support the cross-mode management of cases that have multiple contact types (respondent/alternate) with multiple pieces of contact information (phone number, address, email address). Some organizations build proprietary systems to address these challenges, but smaller organizations are typically left adapting existing products, with mixed success. In this presentation we will discuss how we adapted our Voxco system's Computer Assisted Telephone and Web Interviewing (CATI and CAWI) modules to develop a multi-mode case management system that could accommodate complex contact information. We will discuss key system features, functionality, benefits, and drawbacks. We will also discuss next steps including: the development of advanced case management rules, the capture of paradata, and the development of safeguards to prevent incorrect case assignment.

    Donna Perlmutter, IMPAQ International*
    Kelsey Walter, IMPAQ International
    John Wendt, IMPAQ International
    Noelle Poirier, IMPAQ International
    Teerachat Techapai, IMPAQ International
    Margaret Collins, IMPAQ International

    IMPAQ conducts projects using TTY machines to make outgoing calls that test the capabilities of various locations to handle calls from hearing-impaired people. IMPAQ launched an effort to investigate and test digital TTY software to replace the current TTY machines. Software-based TTY allows TTY communications to use existing telephony infrastructure; the program communicates over IP and uses the standard Session Initiation Protocol (SIP). Our goals for upgrading IMPAQ's TTY capabilities include modernizing the system with current technology (moving from analog to digital devices), more efficient call logging and monitoring, eliminating the processing, management, and storage of TTY paper tapes, and supporting use by remote-based interviewers (desktop TTY required staff to be on site). The challenges we needed to address and document for our client were: 1) can we connect to the locations, 2) does it take a similar amount of time to operate a digital system, and 3) do we have documentation for the call. We also needed to understand the difference between digital TTY tapes and TTY paper tapes in terms of time, efficiency, and clarity.


    3:00 pm - 4:25 pm
    Concurrent Sessions
    Session 7A:  Updates from the Survey of Consumer Finances

    Kate Bachtell, NORC, University of Chicago*
    Micah Sjoblom, NORC, University of Chicago
    Catherine Haggerty, NORC, University of Chicago
    Shannon Nelson, NORC, University of Chicago
    Steven Pedlow, NORC, University of Chicago
    Joanne Hsu, Board of Governors of the Federal Reserve System

    In this paper we share results from two distinct approaches to incentive escalation implemented for the 2019 Survey of Consumer Finances (SCF). The SCF is funded triennially by the Board of Governors of the Federal Reserve System (FRB). Both escalation approaches were informed by well-documented, positive effects of monetary incentives on survey response (Godwin 1979; Church 1993; Goritz 2006; Singer and Couper 2008; Hsu et al. 2017), but varied considerably in design and execution. For the first approach, we developed an algorithm to identify SCF households that presented the most challenges for data collection and devised an experiment to isolate the impact of offering an escalated incentive - double the amount of the initial offer - beginning in week 11 of the field period. For the second approach, we worked closely with our field management team to design localized incentive escalation efforts that leveraged the presence of specialist interviewers in distinct frame areas. In this paper we highlight the challenges of balancing cost and other operational considerations, and examine the overall effect of each approach on the probability of survey participation.

    Shannon Nelson, NORC, University of Chicago*
    Catherine Haggerty, NORC, University of Chicago
    Nella Coleman, NORC, University of Chicago
    Kate Bachtell, NORC, University of Chicago
    Micah Sjoblom, NORC, University of Chicago
    Steven Pedlow, NORC, University of Chicago
    Jesse Bricker, Board of Governors of the Federal Reserve System

    The Survey of Consumer Finances (SCF) is the premier source of information on the financial circumstances of American households. It is used by researchers and policymakers to inform important monetary policy impacting individuals, households, businesses, and the overall economy. The accuracy and integrity of the SCF data are paramount. The SCF has a long tradition of engaging in continuous improvement across all survey processes. While the processes and procedures used to validate interviewer work have been examined and small changes made each round, the methods used have remained largely constant, with the first two interviews and a random ten percent of cases completed thereafter selected for validation. The use of tablets during the 2019 round expanded the set of validation data points to include real-time tracking software that examines multiple GPS data points and the collection of electronic signatures from respondents, which proved to be an additional means of identifying potentially falsified data. In this presentation we will review the standard validation measures used by the SCF in past rounds and describe a new proprietary data falsification system.

    Lisa Lee, NORC, University of Chicago*
    Richard Windle, Board of Governors of the Federal Reserve System
    Catherine Haggerty, NORC, University of Chicago
    Shannon Nelson, NORC, University of Chicago
    Frankie Duda, NORC, University of Chicago
    Kate Bachtell, NORC, University of Chicago
    Micah Sjoblom, NORC, University of Chicago
    Steven Pedlow, NORC, University of Chicago

    The SCF collects personal financial data that is both complex and sensitive, potentially affecting the likelihood of responding via the web. In recent years, a number of studies have explored the use of web and mobile surveys to collect household financial data (Jackle et al., 2017; see also Lessof et al., 2017 and Read, 2017). The results of these studies are promising and informed potential designs to include in the 2019 SCF web survey. However, it is important to note that the studies completed to date have not collected financial data via the web at the level of detail required for the SCF. The 2019 SCF included a test to allow for an assessment of how the SCF would perform in a self-administered web context. The web test included a subset of the SCF questionnaire with a range of different question types. The test successfully concluded with 222 respondents completing both a web and an interviewer-administered instrument. We present preliminary findings from this test, including the web test methodology and the quality of the data collected.

    Heather Sawyer, NORC, University of Chicago*
    Kate Bachtell, NORC, University of Chicago
    Catherine Haggerty, NORC, University of Chicago
    Shannon Nelson, NORC, University of Chicago
    Micah Sjoblom, NORC, University of Chicago
    Kevin Moore, Board of Governors of the Federal Reserve System
    Jesse Bricker, Board of Governors of the Federal Reserve System
    Richard Windle, Board of Governors of the Federal Reserve System
    Joanne Hsu, Board of Governors of the Federal Reserve System

    The Survey of Consumer Finances (SCF) is the most comprehensive source of household financial data in the U.S. It collects a broad range of financial information, which can often be complex in nature. Field interviewers are the main conduit between the survey instrument and survey participants, and as a result, minimizing interviewer error is an important aspect in achieving high data quality. Beginning in 2004 the SCF has given interviewers detailed feedback about the quality of their work throughout the data collection period. In 2019 the project team implemented ongoing field interviewer training with an improved and enhanced individually tailored Data Quality Report. The report, generated from case reviews of each interview as they were completed, provided analyses of data quality and both praised good work and highlighted survey errors. The goal was to provide targeted feedback and lesson plans to address each interviewer's unique learning needs. Doing so on a large-scale survey project presents many challenges. This paper describes interviewer feedback efforts and the operational challenges.

    Katie Archambeau, NORC, University of Chicago*
    Kate Bachtell, NORC, University of Chicago
    Cathy Haggerty, NORC, University of Chicago
    Shannon Nelson, NORC, University of Chicago
    Micah Sjoblom, NORC, University of Chicago
    Steven Pedlow, NORC, University of Chicago
    Kevin Moore, Board of Governors of the Federal Reserve System

    Adaptive Survey Design (ASD) involves strategies that inform adjustments in data collection procedures based on quantifiable metrics (Groves and Heeringa, 2006). R-indicators may be used within ASD to estimate the degree to which the sample represents the larger population and/or which sample segments are under- or over-producing (Schouten et al., 2009). This information may then spur interventions to improve the representativeness of key subgroups, reduce effort on 'unproductive' cases, and streamline survey operations (Cohen, 2019). In this paper we describe the use of R-indicators for the 2019 Survey of Consumer Finances (SCF), funded by the Federal Reserve Board. We first discuss the process of computing the R-indicators using data from the 2016 and 2019 SCF area probability samples, along with population estimates from the American Community Survey. We then present retrospective results from the 2016 SCF and discuss implications for 2019. Finally, we share findings for the 2019 SCF area probability sample. We contribute to a larger body of work on R-indicators by assessing representativeness and improving efficiency and data quality in survey research.
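
    For readers unfamiliar with the statistic, the sketch below shows the core calculation (variable names are hypothetical, and the unweighted standard deviation is a simplification of the design-weighted estimator in Schouten et al., 2009): response propensities are modeled from auxiliary variables, and the R-indicator is one minus twice their standard deviation.

        import numpy as np
        import pandas as pd
        from sklearn.linear_model import LogisticRegression

        sample = pd.read_csv("area_probability_sample.csv")  # hypothetical frame/auxiliary data
        X = pd.get_dummies(sample[["region", "urbanicity", "income_quintile"]], drop_first=True)
        y = sample["responded"]  # 1 = completed interview, 0 = did not respond

        rho_hat = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]
        r_indicator = 1 - 2 * np.std(rho_hat, ddof=1)  # closer to 1 = more representative response
        print(round(r_indicator, 3))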

    Session 7B:  User Experience and Accessibility

    Ian S. Thomas, RTI International*
    Helen Ray, RTI International

    We present a process for creating analysis and visualization tools that improve the accessibility and usability of public use administrative and survey data. By incorporating user experience methodologies into the software development process, we've created web applications that make it easy for researchers to find public use datasets; explore the contents; and perform, share, and visualize their findings. We describe our user-centered design and development process, from clearly identifying the target users to iteratively refining the user interface through user experience testing, as well as our techniques for ensuring accuracy and speed. To demonstrate this process, we will present a case study highlighting the use of user-centered design and agile software development methodologies for creating data analysis and visualization products. We conclude with lessons learned, ways of applying these ideas to survey design, and recommended best practices for developing products that foster accessibility and usability.

    Scott Crawford, SoundRocket*
    Rob Young, SoundRocket

    As we work to adapt web-based surveys to various devices, it is also important to consider how design may impact those who rely on assistive technology. Federal Section 508 compliance standards have been around for a long time -- but the survey research industry has often selected the path of using alternative (non-technology) methods for including disabled individuals in our surveys. However, taking steps to ensure a more equitable experience (not just accessible) will help ensure that the most comparable data is captured. Using assistive technologies (screen readers, mouse input grids, voice, keyboard navigation, etc.) allows a segment of the population who may otherwise not have responded a chance to participate in research in a comparable way to others. In this presentation we will report on our experience in a web-based campus-wide climate survey of diversity, equity and inclusion. We will share our experiences in ensuring an equitable experience for those with sight and movement disabilities. We will describe how we adapted an off-the-shelf survey package to meet the population needs.

    Steve Gomori, RTI International*
    Mai Nguyen, RTI International
    Charlie Knott, RTI International
    Sue Pedrazzani, RTI International
    Frank Mierzwa, RTI International

    RTI has developed web-based surveys on many projects. To enhance usability and data quality, RTI developed forms for highly complex question layouts and more interactive behaviors. For the AURORA cooperative agreement, RTI was able to develop survey forms that have complex layouts with interactive elements that are enabled/disabled or change their behaviors based on actions taken by the user on the same form. We developed instruments capable of supporting both interviewer-led and self-administered modes despite the two modes having different question wording, different response options, and separate validations. RTI introduced an interactive body manikin into a web survey, allowing the respondent to select regions on the body map, which uses branching logic to launch subsequent diagnosis module(s). To enhance clinic physiological sensory protocol adherence and quality measures, we embedded timers to direct clinic staff in completing temporal pain measures (i.e., threshold, tolerance) and cuff algometry. In our presentation, we will provide live demos of a few of these advanced customizations to illustrate some of these unique capabilities in our web-based survey systems.

    Session 7C:  Development and Evaluation of a Web-based Instrument Prototype

    Moderator:
    Lin Wang, U.S. Census Bureau
    Panelists:
    Alex Cohen, U.S. Census Bureau
    Shelley Feuer, U.S. Census Bureau
    Jonathan Katz, U.S. Census Bureau

    Identifying vacant housing in major household surveys is a crucial field operation for reducing nonresponse. This operation is currently carried out by sending interviewers to the field to identify vacant units, which is costly. With the increasing popularity of online maps, one way to reduce fieldwork is to ask the public to help report vacant units in their neighborhood using an interactive online map. In this session, we will first demonstrate the prototype of an online-map interface and discuss technical innovations, including house search strategies and the use of Apple Maps and Google Street View APIs; then we will report on the usability evaluation of the prototypes. Finally, we will present findings related to messaging, i.e., how to effectively get people to share information about their neighborhood. The three presentations provide a convergent story about how to motivate the public to provide data about their neighborhood, how many homes make up a "neighborhood," and what map interfaces help the public navigate their neighborhood.

     

    Contact us:

    For questions about the FedCASIC workshops or technical issues with the FedCASIC website, please send an email to FedCASIC@census.gov.


    Source: U.S. Census Bureau, ADSD
    Last Revised: May 23rd, 2023