SAHIE 2008 - 2023 Demographic and Income Model Methodology: Summary for Counties and for States

Starting in 2008, SAHIE began utilizing the American Community Survey (ACS) as the basis for its estimates. For years prior to 2008, SAHIE utilized the Annual Social and Economic Supplement to the Current Population Survey (CPS ASEC). Other input data sources remain the same, as described further on this page.

The definitions of health insurance coverage differ between the two surveys. In the CPS ASEC, "insured" was defined as being covered SOME TIME during the past calendar year. The ACS health insurance question asks, "Is this person CURRENTLY covered by [specifically stated] health insurance or health coverage plans?"

Due to these definitional differences, comparisons between 2008-2023 SAHIE estimates and earlier years are not recommended. Guidance for making statistical comparisons using SAHIE is available on the SAHIE FAQ page.

For 2008-2023, SAHIE publishes STATE and COUNTY estimates of population with and without health insurance coverage, along with measures of uncertainty, for the full cross-classification of:

5 age categories: 0-64, 18-64, 21-64, 40-64, and 50-64
3 sex categories: both sexes, male, and female
6 income categories: all incomes, as well as income-to-poverty ratio (IPR) categories 0-138%, 0-200%, 0-250%, 0-400%, and 138-400% of the poverty threshold
8 races/ethnicities (for states only): all races/ethnicities, White alone (not Hispanic or Latino), Black or African American alone (not Hispanic or Latino), American Indian and Alaska Native alone (not Hispanic or Latino), Asian alone (not Hispanic or Latino), Native Hawaiian and Other Pacific Islander alone (not Hispanic or Latino), Two or More Races (not Hispanic or Latino), Hispanic or Latino (any race).

In addition, SAHIE publishes estimates for the 0-18 age category by the income categories listed above.
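To make "full cross-classification" concrete, the sketch below enumerates every combination of the published categories listed above. The variable names are illustrative and not part of any SAHIE product, and the additional 0-18 age category is left out of the count.

```python
from itertools import product

# Category labels copied from the lists above; variable names are illustrative.
ages = ["0-64", "18-64", "21-64", "40-64", "50-64"]
sexes = ["both sexes", "male", "female"]
incomes = ["all incomes", "0-138%", "0-200%", "0-250%", "0-400%", "138-400%"]
races = [
    "all races/ethnicities",
    "White alone (not Hispanic or Latino)",
    "Black or African American alone (not Hispanic or Latino)",
    "American Indian and Alaska Native alone (not Hispanic or Latino)",
    "Asian alone (not Hispanic or Latino)",
    "Native Hawaiian and Other Pacific Islander alone (not Hispanic or Latino)",
    "Two or More Races (not Hispanic or Latino)",
    "Hispanic or Latino (any race)",
]

# Counties: age x sex x income. States add the race/ethnicity dimension.
county_cells = list(product(ages, sexes, incomes))
state_cells = list(product(ages, sexes, incomes, races))

print(len(county_cells), len(state_cells))
```

Under these lists, each county has 5 x 3 x 6 published cells and each state has 5 x 3 x 6 x 8.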

Each year’s estimates are adjusted so that, before rounding, the county estimates sum to their respective state totals and, for key demographic groups, the state estimates sum to the national ACS estimates of the numbers insured and uninsured.

The remainder of this page provides a summary of the demographic and income model methodology used for the SAHIE estimates. Additional methodological detail is available at the links below.

Overview

The SAHIE program produces model-based estimates of health insurance coverage for demographic groups within counties and states. We publish state estimates by sex (female, male, both), race/ethnicity (all races, non-Hispanic White, non-Hispanic Black, non-Hispanic American Indian and Alaska Native, non-Hispanic Asian, non-Hispanic Native Hawaiian and Other Pacific Islander, non-Hispanic Two or More Races, Hispanic or Latino), age (0-18, under 65, 18-64, 21-64, 40-64, 50-64), and income. We publish county estimates by the same sex, age and income groups, but not by race. Income groups are defined by the income-to-poverty ratio (IPR) – the ratio of family income to the appropriate federal poverty level. We produce estimates for all incomes, and for the IPR groups: 0-138%, 0-200%, 0-250%, 0-400%, and, beginning in 2012, 138-400% IPR.

Model Summary

For estimation, SAHIE uses statistical models that combine survey data from the American Community Survey (ACS) with administrative records data and Census 2010 data. The models are "area-level" models because we use survey estimates and administrative data at certain levels of aggregation, rather than individual survey and administrative records. Our modeling approach is similar to that of common models developed for small area estimation, but with some additional complexities.

The published estimates are based on aggregates of modeled demographic groups. For states, we model at a base level defined by the full cross-classification of: five age groups, seven race/ethnicity groups, both sexes, and five income groups. For counties, we model at a base level defined by the same age, sex, and income groups.

We use estimates from the Census Bureau’s Population Estimates Program for the population in groups defined for state by age by race/ethnicity by sex, and for county by age by sex. We treat these populations as known. Within each of these groups, the number with health insurance coverage in any of the income categories is given by that population multiplied by two unknown proportions to be estimated: the proportion in the income category and the proportion insured within that income category. The models have two largely distinct parts - an "income part" and an "insurance part" - that correspond to these proportions. We use survey estimates of the proportions in the income groups and of the proportions insured within those groups. We assume these survey estimates are unbiased and follow known distributions. We also assume functional forms for the variances of the survey estimates that involve parameters that are estimated. We treat supplemental variables that predict one or both of the unknown income and insurance proportions in one of two ways.
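The decomposition described above (known population multiplied by two proportions) can be sketched as follows. The function name and all numeric values are hypothetical and chosen only for illustration. In the actual models the two proportions are unknown quantities to be estimated, not fixed inputs.

```python
def number_insured(population, p_income, p_insured_given_income):
    """Return population * P(income category) * P(insured | income category).

    All three inputs are illustrative; in SAHIE the population is treated as
    known and the two proportions are estimated by the model.
    """
    for name, p in (("p_income", p_income),
                    ("p_insured_given_income", p_insured_given_income)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return population * p_income * p_insured_given_income


# Hypothetical demographic group: 10,000 people, 25% in the income category,
# 80% of those insured.
population = 10_000
in_income_group = population * 0.25
insured = number_insured(population, 0.25, 0.80)
uninsured = in_income_group - insured
print(insured, uninsured)
```

The number uninsured in the income category follows by subtraction, which is why the two proportions are enough to describe both published counts.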

Some of these variables are used as fixed predictors in a regression model. There is a regression component in both the income and insurance parts of the model. In each case, a transformation of the proportion is predicted by a linear combination of fixed predictors. Some of these predictors are categorical variables that define the demographic groups we model. Others are continuous. The continuous fixed predictors include variables regarding employment, educational attainment, and demographic population.
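As a minimal sketch of a regression component of this kind, the code below predicts a transformed proportion from a linear combination of fixed predictors and maps it back to the 0-1 scale. The logit transformation, the function names, and the coefficient values are assumptions made for illustration. This page does not state which transformation the SAHIE models use.

```python
import math


def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Map a value on the log-odds scale back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def predicted_proportion(coefficients, predictors):
    """Proportion implied by a linear predictor on the (assumed) logit scale."""
    if len(coefficients) != len(predictors):
        raise ValueError("coefficients and predictors must have equal length")
    linear = sum(b * x for b, x in zip(coefficients, predictors))
    return inv_logit(linear)


# Hypothetical predictors: intercept, an indicator for a demographic group,
# and one continuous predictor (for example, an employment rate).
coefficients = [0.5, -0.3, 1.2]
predictors = [1.0, 1.0, 0.6]
print(predicted_proportion(coefficients, predictors))
```

Working on a transformed scale keeps the predicted proportion between 0 and 1 regardless of the predictor values.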

We also use random continuous predictors, which include data from the 5-year ACS, the Internal Revenue Service, the Supplemental Nutrition Assistance Program, and Medicaid/Children’s Health Insurance Program. These are not fixed predictors in the model. We treat them as random, in a way similar to survey estimates, but not as unbiased estimators of the numbers in an income group or the numbers insured. Rather, we assume that their expectations are linear functions of the number in an income group or the number insured within an income group. We typically assume they are normally distributed with variances that depend on unknown parameters.
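One way to read this treatment is as a simple generative assumption: an administrative value is drawn from a normal distribution whose mean is a linear function of the true count. The sketch below simulates that assumption. The function name, intercept, slope, and standard deviation are all hypothetical. In the actual models the corresponding parameters are unknown and estimated.

```python
import random


def simulate_admin_record(true_count, intercept, slope, sd, rng):
    """Draw one administrative value with mean intercept + slope * true_count.

    Illustrative only: the record is not assumed to be an unbiased estimate
    of true_count, merely linearly related to it in expectation.
    """
    mean = intercept + slope * true_count
    return rng.gauss(mean, sd)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
true_count = 2_000      # hypothetical number in an income group
draws = [simulate_admin_record(true_count, 50.0, 0.9, 40.0, rng)
         for _ in range(5)]
print(draws)
```

With a slope other than 1 or a nonzero intercept, the simulated value is systematically different from the true count, which is the sense in which such data are informative without being unbiased.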

We formulate the model in a Bayesian framework and report the posterior means as the point estimates. We use the posterior means and variances together with a normal approximation to calculate symmetric 90-percent confidence intervals, and report their half-widths as the margins of error.
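The interval calculation described above can be sketched as follows, assuming only a posterior mean and posterior variance are available. The function names and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist


def margin_of_error(posterior_variance, level=0.90):
    """Half-width of a symmetric interval under a normal approximation."""
    if posterior_variance < 0:
        raise ValueError("variance must be non-negative")
    # Two-sided critical value, about 1.645 for a 90-percent interval.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return z * math.sqrt(posterior_variance)


def interval(posterior_mean, posterior_variance, level=0.90):
    """Return (lower, upper) bounds centered on the posterior mean."""
    moe = margin_of_error(posterior_variance, level)
    return posterior_mean - moe, posterior_mean + moe


# Hypothetical posterior summary for one estimate.
lower, upper = interval(posterior_mean=2_000.0, posterior_variance=2_500.0)
print(lower, upper)
```

Because the interval is symmetric, reporting the point estimate and the margin of error is enough to reconstruct both bounds.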

Controlling to National Estimates

We control the estimates to be consistent with specified national totals. As a result, when the estimates are summed over the states, they match specified national ACS survey estimates. We match the national estimates for both the number insured and the number uninsured for the following groups:

Ages 0-18
Ages 0-18, IPR 0-138%
Ages 0-18, IPR 0-200%
Ages 0-18, IPR 0-400%
Ages 0-64
Ages 0-64, IPR 0-138%
Ages 0-64, IPR 0-200%
Ages 0-64, IPR 0-400%
Ages 0-64, Hispanic
Ages 0-64, Black non-Hispanic
Ages 0-64, White non-Hispanic
Ages 0-64, American Indian and Alaska Native non-Hispanic
Ages 0-64, Asian non-Hispanic
Ages 0-64, Native Hawaiian and Other Pacific Islander non-Hispanic
Ages 0-64, Two or More Races non-Hispanic

Our margin of error estimates account for the fact that these controls are themselves subject to error.

We also control the estimates from the SAHIE county model to the state small area estimates of the number insured and uninsured by demographic group. As a result, there is arithmetic consistency across the geographic levels for many of the demographic groups.
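As an illustration of what controlling to a total means arithmetically, the sketch below rescales a set of county estimates so that they sum to a state total. Simple proportional scaling is an assumption made here for illustration. This page does not specify the adjustment SAHIE applies, and the county names and values are hypothetical.

```python
def control_to_total(estimates, control_total):
    """Scale estimates proportionally so that they sum to control_total.

    A simple ratio adjustment, used here only as an illustration of
    arithmetic consistency across geographic levels.
    """
    current_total = sum(estimates.values())
    if current_total <= 0:
        raise ValueError("estimates must sum to a positive number")
    factor = control_total / current_total
    return {area: value * factor for area, value in estimates.items()}


# Hypothetical county estimates of the number insured for one demographic
# group, and a hypothetical state estimate for the same group.
county_insured = {"County A": 100.0, "County B": 300.0}
state_insured = 500.0

controlled = control_to_total(county_insured, state_insured)
print(controlled)
```

After the adjustment, the unrounded county values sum to the state value while keeping the same relative sizes across counties. The same arithmetic would apply to controlling state values to a national total.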

Related Information

New Race Groups Added to SAHIE

Methodology

Demographic and Income Model Methodology (2005-2007)

Page Last Revised - July 22, 2025