Improvements to the 2020 Census Race and Hispanic Origin Question Designs, Data Processing, and Coding Procedures

August 03, 2021

Written by:

Rachel Marks, chief, Racial Statistics Branch, Population Division and Merarys Rios-Vargas, chief, Ethnicity and Ancestry Branch, Population Division

Tiempo de lectura estimado: 10 minutos

La Oficina del Censo de los EE. UU. ha recopilado datos (solo en inglés) sobre la raza desde el primer Censo de 1790 y sobre el origen hispano o latino (al que nos referimos como origen hispano en este blog) desde el Censo de 1970. La manera en que se miden estos temas, y cómo se recopilan y codifican sus estadísticas, ha cambiado prácticamente cada década durante toda la historia del censo como reflejo de factores sociales, políticos
y económicos.

Este blog analiza cómo hemos mejorado las preguntas del censo sobre la raza (solo en inglés) y el origen hispano (solo en inglés), también conocido como origen étnico, entre 2010 y 2020. Estos cambios proporcionan un contexto importante mientras nos preparamos para la publicación del Archivo resumen de los datos para la redistribución legislativa del Censo del 2020 (Ley Pública 94-171).

Los datos para la redistribución legislativa proporcionarán las primeras estadísticas sobre la raza y el origen hispano del Censo del 2020. Esperamos que los datos reflejen no solo los cambios en la población, sino también las mejoras en cómo formulamos las preguntas y cómo registramos y codificamos las respuestas.

El diseño de las preguntas

Es importante tener en cuenta que la Oficina del Censo recopila datos sobre la raza y el origen étnico en conformidad con los Estándares para conservar, recopilar y presentar datos federales sobre raza y origen étnico de 1997 (solo en inglés), estipulados por la Oficina de Administración y Presupuesto de los EE. UU. (OMB, por sus siglas en inglés).

Por lo tanto, el diseño de las preguntas sobre el origen hispano y la raza del Censo del 2020 es similar al diseño que se usó en los censos del 2000 y el 2010.

Aunque la Oficina del Censo probó diseños de preguntas alternativos en el 2015 (solo en inglés), a fin de cuentas debemos seguir los estándares de la OMB de 1997 y, por consiguiente, usamos dos preguntas separadas para recopilar datos sobre la raza y sobre el origen étnico. Sin embargo, nuestra prueba demostró que podíamos hacer mejoras a las preguntas sobre la raza y el origen étnico del Censo del 2020 que siguieran las pautas establecidas por la OMB.

Entonces, basándonos en una amplia investigación y extensión comunitaria (solo en inglés), junto con asesores y partes interesadas durante la última década, hicimos varias mejoras a las preguntas del Censo del 2020. También mejoramos la forma en que la Oficina del Censo procesa y codifica las respuestas para estas preguntas.

Mejoras a la pregunta sobre el origen hispano del Censo del 2020

La pregunta sobre el origen hispano del Censo del 2020 (Figura 1) incluyó las mismas tres casillas de verificación detalladas que aparecieron en el Censo del 2010 (“mexicano, mexicano americano, chicano”, “puertorriqueño”, “cubano”), junto con una casilla de verificación “Sí, otro origen hispano, latino o español”. Se hicieron dos cambios a la pregunta sobre el origen hispano del Censo del 2020.

La instrucción “Escriba el origen, por ejemplo” se cambió a “Escriba, por ejemplo”. Durante nuestra investigación, descubrimos que este cambio en la instrucción les permitía a las personas encuestadas comprender cuál era la información que se solicitaba en la pregunta
y no limitaba su respuesta escrita con instrucciones confusas con los términos (p. ej.: origen) que significan cosas distintas para diferentes personas.
Se cambiaron los grupos de ejemplo “argentino, colombiano, dominicano, nicaragüense, salvadoreño, español, etc.” por “salvadoreño, dominicano, colombiano, guatemalteco, español, ecuatoriano, etc.” para representar a los grupos más numerosos de población origen hispano (solo en inglés) y la diversidad geográfica de la categoría hispano o latino, según se define en los estándares de la OMB de 1997.

Design Improvements to the 2020 Census Race Question

We also made several design improvements to the race question for the 2020 Census (Figure 2) based on our research over the past decade.

In response to community feedback over the past decade, we added dedicated write-in response areas and examples for the “White” and the “Black or African Am.” racial categories.
We provided six example groups for each of the “White,” “Black or African American,” and “American Indian or Alaska Native” racial categories. These examples represent the largest population groups within each of the geographically diverse population groups of each race category, as defined by the 1997 OMB standards.
Based on successful previous testing, the term “Negro” was removed from the 2020 Census by updating the category “Black, African Am., or Negro” to “Black or African Am.” on paper questionnaires and “Black or African American” on electronic instruments.
We reordered detailed Asian and Native Hawaiian or Other Pacific Islander checkboxes by population size.
We changed the checkbox category “Guamanian or Chamorro” to “Chamorro” based on research and positive stakeholder feedback.
We updated the write-in instructions for the “Some Other Race” category to better solicit detailed reporting. The 2010 Census form included the instruction to “Print race,” but we updated the 2020 Census instruction to read “Print race or origin” to correspond with the overall question instruction to “Mark ☒ one or more boxes AND print origins.”

How Data on Hispanic Origin and Race Are Processed and Coded in the 2020 Census Compared to the 2010 Census

Coding is a process we use to assign numeric codes to write-in responses to the Hispanic origin and race questions, and we use the numeric codes when we process and tabulate the data. In the 2010 Census, we only captured the first 30 characters of written responses to the race and ethnicity questions and coded up to two write-in responses in each write-in line.

Our research found that people were reporting longer and more detailed responses to the questions. For the 2020 Census, we wanted to reflect more fully and accurately the complex details of how people identify their race and ethnicity.

Based on further research, testing and outreach throughout the decade, we changed how we captured and coded responses for the 2020 Census race and Hispanic origin questions:

We increased the number of characters captured from 30 to 200, which allowed us to capture and fully recognize longer write-in responses.
Instead of prioritizing multiple responses into only two codes, we coded up to six detailed codes for each write-in area. This increase effectively gave all responses an equal opportunity to be coded into one of the six major race categories. It also simplified our coding work that removed duplicate or repetitive terms to reduce responses to two codes.

We fully tested these coding and question changes in the 2015 National Content Test and finalized them in the 2018 Census Test. We processed and coded the race and ethnicity data from the 2020 Census from April to December 2020.

The figures below illustrate how a response was coded in 2010 versus 2020, based on the differences described above.

Figure 3 shows that in the 2010 Census, the response of “MEXICAN AMERICAN INDIAN AND PORTUGUESE AND AFRICAN AMERICAN” was not fully coded because it was longer than 30 characters.

Only the text outlined by the red box “MEXICAN AMERICAN INDIAN AND PO” was captured and coded in the 2010 Census.
The additional text “RTUGUESE AND AFRICAN AMERICAN” was not captured.
Portuguese and African American was not coded.

Figure 4 shows that if the same write-in response was provided in 2020, all of the text outlined by the red box was captured. This change enabled all three terms of “MEXICAN AMERICAN INDIAN AND PORTUGUESE AND AFRICAN AMERICAN” to be recognized and coded.

In another major improvement for the 2020 Census, we used a single code list for coding data from the Hispanic origin question and the race question.

In previous censuses, we used two separate code lists — one for the Hispanic origin question and one for the race question. Previously, these code lists focused on providing codes for either detailed Hispanic groups or detailed race groups. By combining these code lists, we expanded the number of detailed groups that could be coded in each question.

For example, if someone reported their detailed Hispanic origin response in the race question, we were easily able to code it because all detailed Hispanic origin groups are included in the newly combined code list. Likewise, if they reported their detailed Asian response in the Hispanic origin question, we were easily able to code it because all detailed Asian groups are included in the combined code list.

We also expanded our code list to include additional detailed White and Black or African American groups as the race question purposefully elicited the collection of detailed White and Black or African American responses through dedicated write-in lines for the first time.

The 2020 Census Hispanic Origin and Race Code List is available online as part of the 2020 Census National Redistricting Data Summary File Technical Documentation, and illustrates the extensive codes added for 2020 to enable the coding of responses for all racial and ethnic groups in the United States.

Coding Improvements to the 2020 Census Hispanic Origin Question

During the 2010 Census, if someone provided more than two write-in responses in the Hispanic origin question write-in area, we prioritized coding Hispanic groups over race groups or other types of responses.

In the 2020 Census, subject-matter experts coded what they saw, coding up to six responses from left to right, regardless of the Hispanic origin or race group. This enabled all responses to be treated equally. Table 1 illustrates this coding change.

After coding up to six responses on the Hispanic origin question write-in line, like the 2010 Census and previous censuses, only one response is permitted to be tabulated for Hispanic origin in accordance with the 1997 OMB standards.

In Figure 5, we can see how this does not impact the Hispanic origin data for the redistricting data. In 2020, with the coding improvements, we code all three responses of Black (as shown in the green text), Colombian and Peruvian. All of these codes are retained internally for research purposes.

For the official redistricting file tabulations, this response is tabulated as Hispanic or Latino. Following the 1997 OMB standards, respondents can only be Hispanic or Not Hispanic. As long as the respondent provides at least one Hispanic origin response, they are tabulated as Hispanic.

Coding Improvements to the 2020 Census Race Question

The 2010 Census used a complex series of coding rules to determine how to prioritize and assign up to two codes for each unique text string. In 2010, if more than two groups were part of a write-in text string on the same line in the race question, we prioritized coding race groups over Hispanic origin groups or other types of responses because we were limited to only coding two responses.

In the 2020 Census, our subject-matter experts coded what they saw, coding up to six responses from left to right, regardless of race group or Hispanic origin, enabling all responses to be treated equally. Table 2 illustrates this coding change.

In Figure 6, we can see how this improvement to the coding rules impacts our final data by recognizing the rich and complex detailed identities reported by respondents.

In 2010, even though Cuban was written first (as shown in the green text), the responses of Thai and Filipino were prioritized over Cuban because they are detailed race groups.
Cuban was not coded or included in race tabulations because the two race groups were prioritized over it in coding.
This response was tabulated as part of the Asian race category (representing Thai and Filipino) in 2010.
In 2020, all three groups in the write-in response are coded and the response would be tabulated as Asian and Some Other Race (representing Thai and Filipino, and Cuban). The 2020 Census Redistricting Data (Public Law 94-171) Summary File tabulates major race groups (e.g., White, Black, Asian, etc.), so the responses of Thai and Filipino are included in the aggregate Asian racial category. Note, Hispanic responses to the race question are tabulated as part of the Some Other Race category, as Hispanic or Latino is not considered a race in the 1997 OMB standards.

Summary

Improving the 2020 Census questions on Hispanic origin and race, along with our coding procedures, enable us to have a more complete picture of the detailed identities reported by the U.S. population in 2020.

Because of the changes associated with questionnaire design, processing, and coding, users may see differences in the data when comparing to other Census Bureau surveys or non-Census Bureau data sources. If unexpected differences occur, this may be related to a number of factors, primarily the design of the race and ethnicity questions and the improvements to the ways in which we code what people tell us.

We expect that the race and Hispanic origin statistics in the upcoming redistricting data release will not only reflect demographic changes, but also improvements in how we asked the questions and captured and coded the responses.

These improvements more accurately illustrate the richness and complexity of how people identify their race and ethnicity in the 21st century.