Estimated reading time: 10 minutes
The U.S. Census Bureau has collected data on race since the first census in 1790 and on Hispanic or Latino origin (which we refer to as Hispanic origin in this blog) since the 1970 Census. The way these topics are measured, and how the statistics are collected and coded, has changed nearly every decade throughout the history of the census, reflecting social, political, and economic factors.
This blog looks at how we improved the census questions on race and Hispanic origin (also known as ethnicity) between 2010 and 2020. These changes provide important context as we prepare to release the 2020 Census Redistricting Data (Public Law 94-171) Summary File.
The redistricting data will provide the first statistics on race and Hispanic origin from the 2020 Census. We expect the data to reflect not only changes in the population but also improvements in how we asked the questions and how we captured and coded the responses.
It is important to note that the Census Bureau collects data on race and ethnicity in accordance with the 1997 Standards for Maintaining, Collecting, and Presenting Federal Data on Race and Ethnicity, issued by the U.S. Office of Management and Budget (OMB).
Therefore, the design of the 2020 Census Hispanic origin and race questions is similar to the design used in the 2000 and 2010 Censuses.
Although the Census Bureau tested alternative question designs in 2015, we ultimately must follow the 1997 OMB standards and therefore used two separate questions to collect data on race and ethnicity. However, our testing showed that we could improve the 2020 Census race and ethnicity questions while staying within the OMB guidelines.
So, based on extensive research and community outreach with advisors and stakeholders over the past decade, we made several improvements to the 2020 Census questions. We also improved the way the Census Bureau processes and codes responses to these questions.
The 2020 Census Hispanic origin question (Figure 1) included the same three detailed checkboxes that appeared in the 2010 Census (“Mexican, Mexican American, Chicano,” “Puerto Rican,” and “Cuban”), along with a “Yes, another Hispanic, Latino, or Spanish origin” checkbox. Two changes were made to the Hispanic origin question for the 2020 Census.
We also made several design improvements to the race question for the 2020 Census (Figure 2) based on our research over the past decade.
Coding is a process we use to assign numeric codes to write-in responses to the Hispanic origin and race questions, and we use the numeric codes when we process and tabulate the data. In the 2010 Census, we only captured the first 30 characters of written responses to the race and ethnicity questions and coded up to two write-in responses in each write-in line.
Our research found that people were reporting longer and more detailed responses to the questions. For the 2020 Census, we wanted to reflect more fully and accurately the complex details of how people identify their race and ethnicity.
Based on further research, testing, and outreach throughout the decade, we changed how we captured and coded responses to the 2020 Census race and Hispanic origin questions.
We fully tested these coding and question changes in the 2015 National Content Test and finalized them in the 2018 Census Test. We processed and coded the race and ethnicity data from the 2020 Census from April to December 2020.
The figures below illustrate how a response was coded in 2010 versus 2020, based on the differences described above.
Figure 3 shows that in the 2010 Census, the response of “MEXICAN AMERICAN INDIAN AND PORTUGUESE AND AFRICAN AMERICAN” was not fully coded because it was longer than 30 characters.
Figure 4 shows that if the same write-in response was provided in 2020, all of the text outlined by the red box was captured. This change enabled all three terms of “MEXICAN AMERICAN INDIAN AND PORTUGUESE AND AFRICAN AMERICAN” to be recognized and coded.
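To make the capture limit concrete, here is a minimal Python sketch; it is not the Census Bureau's processing system. The 30-character cutoff for 2010 comes from the text above. Splitting the captured text into terms on the word "AND" is an assumption made only for this illustration, and the sketch treats the 2020 capture as taking the whole write-in, which holds for this example but does not represent any specific 2020 length limit.

```python
from typing import List, Optional

# Example write-in from Figures 3 and 4.
WRITE_IN = "MEXICAN AMERICAN INDIAN AND PORTUGUESE AND AFRICAN AMERICAN"
CAPTURE_LIMIT_2010 = 30  # characters captured per write-in in the 2010 Census


def capture(text: str, limit: Optional[int]) -> str:
    """Return the part of a write-in that is captured; None means no cutoff."""
    return text if limit is None else text[:limit]


def split_terms(captured: str) -> List[str]:
    """Split captured text into candidate terms on ' AND ' (assumed rule)."""
    return [term.strip() for term in captured.split(" AND ") if term.strip()]


captured_2010 = capture(WRITE_IN, CAPTURE_LIMIT_2010)
captured_2020 = capture(WRITE_IN, None)  # whole write-in, for this example only

print(captured_2010)               # MEXICAN AMERICAN INDIAN AND PO
print(split_terms(captured_2010))  # ['MEXICAN AMERICAN INDIAN', 'PO']
print(split_terms(captured_2020))  # ['MEXICAN AMERICAN INDIAN', 'PORTUGUESE', 'AFRICAN AMERICAN']
```

Only the first term survives the 2010 cutoff intact: "PORTUGUESE" is reduced to the fragment "PO", and "AFRICAN AMERICAN" is lost entirely.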
In another major improvement for the 2020 Census, we used a single code list for coding data from the Hispanic origin question and the race question.
In previous censuses, we used two separate code lists: one for the Hispanic origin question and one for the race question. Each list focused on providing codes for either detailed Hispanic groups or detailed race groups. By combining these code lists, we expanded the number of detailed groups that could be coded in each question.
For example, if someone reported their detailed Hispanic origin response in the race question, we were easily able to code it because all detailed Hispanic origin groups are included in the newly combined code list. Likewise, if they reported their detailed Asian response in the Hispanic origin question, we were easily able to code it because all detailed Asian groups are included in the combined code list.
We also expanded our code list to include additional detailed White and Black or African American groups, because for the first time the race question included dedicated write-in lines intended to collect detailed White and Black or African American responses.
The 2020 Census Hispanic Origin and Race Code List is available online as part of the 2020 Census National Redistricting Data Summary File Technical Documentation, and illustrates the extensive codes added for 2020 to enable the coding of responses for all racial and ethnic groups in the United States.
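The effect of merging the two lists can be sketched as a dictionary lookup in Python. This is a deliberately simplified model: the code values below are hypothetical placeholders, not entries from the published 2020 Census Hispanic Origin and Race Code List, and modeling the earlier setup as "each question consults only its own list" is an assumption that mirrors the description above rather than the full 2010 procedure.

```python
from typing import Dict, Optional

# Hypothetical placeholder codes; the real code list uses its own numeric codes.
HISPANIC_CODES: Dict[str, str] = {"MEXICAN": "H01", "COLOMBIAN": "H02", "PERUVIAN": "H03"}
RACE_CODES: Dict[str, str] = {"CHINESE": "R01", "FILIPINO": "R02", "NIGERIAN": "R03"}

# Separate lists: each question looks only at its own list.
SEPARATE_LISTS: Dict[str, Dict[str, str]] = {
    "hispanic_origin": HISPANIC_CODES,
    "race": RACE_CODES,
}

# Combined list: one lookup table shared by both questions.
COMBINED_LIST: Dict[str, str] = {**HISPANIC_CODES, **RACE_CODES}


def code_term(term: str, question: str, combined: bool) -> Optional[str]:
    """Return the code for a write-in term, or None if no code is found."""
    table = COMBINED_LIST if combined else SEPARATE_LISTS[question]
    return table.get(term.strip().upper())


# A detailed Hispanic response written in the race question:
print(code_term("Colombian", "race", combined=False))  # None
print(code_term("Colombian", "race", combined=True))   # H02
# A detailed Asian response written in the Hispanic origin question:
print(code_term("Filipino", "hispanic_origin", combined=True))  # R02
```

With a single shared table, a response is recognized no matter which question it was written in.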
During the 2010 Census, if someone provided more than two write-in responses in the Hispanic origin question write-in area, we prioritized coding Hispanic groups over race groups or other types of responses.
In the 2020 Census, subject-matter experts coded what they saw, coding up to six responses from left to right, regardless of the Hispanic origin or race group. This enabled all responses to be treated equally. Table 1 illustrates this coding change.
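The difference between the two rules for the Hispanic origin write-in line can be sketched in Python. The sketch assumes each write-in has already been parsed into terms tagged as a Hispanic group or a race group (that parsing step is not shown), and it reduces the 2010 procedure to a single rule, "Hispanic groups first, at most two codes." The example terms come from Figure 5, but their left-to-right order here is assumed.

```python
from typing import List, Tuple

# (term, group) pairs; the tags "hispanic" and "race" are hypothetical labels.
Term = Tuple[str, str]


def code_2010_hispanic_line(terms: List[Term], max_codes: int = 2) -> List[str]:
    """Simplified 2010 rule: Hispanic groups take priority, at most two codes."""
    hispanic = [text for text, group in terms if group == "hispanic"]
    other = [text for text, group in terms if group != "hispanic"]
    return (hispanic + other)[:max_codes]


def code_2020_line(terms: List[Term], max_codes: int = 6) -> List[str]:
    """2020 rule as described: code left to right, up to six, any group."""
    return [text for text, _ in terms][:max_codes]


write_in = [("BLACK", "race"), ("COLOMBIAN", "hispanic"), ("PERUVIAN", "hispanic")]
print(code_2010_hispanic_line(write_in))  # ['COLOMBIAN', 'PERUVIAN']
print(code_2020_line(write_in))           # ['BLACK', 'COLOMBIAN', 'PERUVIAN']
```

Under the simplified 2010 rule, the race response is crowded out by the two Hispanic groups; under the 2020 rule, all three terms receive codes.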
Even though we code up to six responses on the Hispanic origin write-in line, only one response is tabulated for Hispanic origin, as in the 2010 Census and previous censuses, in accordance with the 1997 OMB standards.
Figure 5 shows that this change does not affect the Hispanic origin data in the redistricting data. With the 2020 coding improvements, we code all three responses: Black (shown in green text), Colombian, and Peruvian. All of these codes are retained internally for research purposes.
For the official redistricting file tabulations, this response is tabulated as Hispanic or Latino. Under the 1997 OMB standards, each respondent is tabulated as either Hispanic or Not Hispanic. As long as the respondent provides at least one Hispanic origin response, they are tabulated as Hispanic.
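The tabulation rule in this paragraph reduces to an "at least one" check, sketched below in Python. The record type and the "hispanic"/"race" group tags are hypothetical; the only rules taken from the text are that all coded responses are retained and that a single Hispanic origin response is enough for the respondent to be tabulated as Hispanic or Latino.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class HispanicOriginResponse:
    # Every coded (term, group) pair is kept, mirroring internal retention.
    coded: List[Tuple[str, str]] = field(default_factory=list)

    def tabulated_category(self) -> str:
        """Collapse all retained codes to a single Hispanic origin category."""
        if any(group == "hispanic" for _, group in self.coded):
            return "Hispanic or Latino"
        return "Not Hispanic or Latino"


figure_5 = HispanicOriginResponse(
    [("BLACK", "race"), ("COLOMBIAN", "hispanic"), ("PERUVIAN", "hispanic")]
)
print(len(figure_5.coded))            # 3
print(figure_5.tabulated_category())  # Hispanic or Latino
```

The detail stays available in `coded`, while the redistricting tabulation sees only the single collapsed category.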
The 2010 Census used a complex series of coding rules to determine how to prioritize and assign up to two codes for each unique text string. In 2010, if more than two groups were part of a write-in text string on the same line in the race question, we prioritized coding race groups over Hispanic origin groups or other types of responses because we were limited to only coding two responses.
In the 2020 Census, our subject-matter experts coded what they saw, coding up to six responses from left to right, regardless of race group or Hispanic origin, enabling all responses to be treated equally. Table 2 illustrates this coding change.
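For the race question, the same contrast can be written as one Python function with an optional priority group. The terms, their group tags, and the idea of modeling the 2010 rules as a single "race groups first" sort are all assumptions for illustration; the actual 2010 rules were more complex, as noted above. Because Python's `sorted` is stable, terms within each group keep their left-to-right order.

```python
from typing import List, Optional, Tuple

Term = Tuple[str, str]  # (write-in term, hypothetical group tag)


def code_write_in_line(
    terms: List[Term], max_codes: int, priority: Optional[str] = None
) -> List[str]:
    """Return the terms that receive codes on one write-in line.

    With a priority group, its terms move ahead of all others before the
    cutoff is applied; without one, terms are coded strictly left to right.
    """
    if priority is not None:
        # False sorts before True, so priority-group terms come first.
        terms = sorted(terms, key=lambda term: term[1] != priority)
    return [text for text, _ in terms[:max_codes]]


# Hypothetical race-question write-in line.
line = [("MEXICAN", "hispanic"), ("IRISH", "race"), ("NIGERIAN", "race")]
print(code_write_in_line(line, max_codes=2, priority="race"))  # ['IRISH', 'NIGERIAN']
print(code_write_in_line(line, max_codes=6))                   # ['MEXICAN', 'IRISH', 'NIGERIAN']
```

In this example the Hispanic group is dropped under the simplified 2010 rule, while the 2020 rule codes all three terms.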
In Figure 6, we can see how this improvement to the coding rules impacts our final data by recognizing the rich and complex detailed identities reported by respondents.
Our improvements to the 2020 Census questions on Hispanic origin and race, along with our improved coding procedures, give us a more complete picture of the detailed identities reported by the U.S. population in 2020.
Because of the changes associated with questionnaire design, processing, and coding, users may see differences when comparing these data with other Census Bureau surveys or with non-Census Bureau data sources. Unexpected differences may be related to a number of factors, primarily the design of the race and ethnicity questions and the improvements to the ways in which we code what people tell us.
We expect that the race and Hispanic origin statistics in the upcoming redistricting data release will not only reflect demographic changes, but also improvements in how we asked the questions and captured and coded the responses.
These improvements more accurately illustrate the richness and complexity of how people identify their race and ethnicity in the 21st century.