There are two noteworthy changes to employment income imputation in the 2018 Survey of Income and Program Participation (SIPP). First, self-employed respondents are now able to report an income range if they do not know or refuse to report their business profits. Second, employment income is now primarily imputed by Sequential Regression Multivariate Imputation (SRMI).

Income Ranges

Since wave 1 of the 2014 SIPP Panel, respondents who did not know or refused to report their earnings have been asked to report a range for their wage and salary earnings. For example, a respondent who did not know her annual salary would be asked if her income was (1) less than $20,000, (2) between $20,000 and $34,999, (3) between $35,000 and $49,999, or (4) $50,000 or more. In the 2014 SIPP Panel, respondents who reported a range were then imputed an income value within this range via hot deck, and the status flag associated with this variable was assigned a value of 6.

Beginning with the 2018 SIPP Panel, self-employed respondents who did not know or refused to report their profits were asked whether their profits were (1) less than $7,000, (2) between $7,000 and $19,999, (3) between $20,000 and $39,999, or (4) $40,000 or more. This range follow-up question was only asked when self-employed respondents reported that their businesses ran at a profit.
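
As a rough illustration of how the four self-employment profit ranges partition reported amounts, the sketch below maps a profit value to its range code. The cutoffs come from the question wording above; the function and constant names are hypothetical and are not part of any SIPP processing system.

```python
from bisect import bisect_right

# Lower edges of ranges 2, 3, and 4 from the 2018 range follow-up question.
# Names are illustrative only.
PROFIT_CUTOFFS = [7_000, 20_000, 40_000]


def profit_range_code(profit: float) -> int:
    """Return the range code (1-4) that a positive profit amount falls into."""
    if profit <= 0:
        # The follow-up is only asked when the business ran at a profit.
        raise ValueError("range question applies only to positive profits")
    return bisect_right(PROFIT_CUTOFFS, profit) + 1


print(profit_range_code(6_999))   # range 1: less than $7,000
print(profit_range_code(25_000))  # range 3: $20,000 to $39,999
```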

Missing Data Imputation

A primary limitation of the current hot-deck imputation algorithm, which matches nonrespondents with demographically identical donors based upon a set of covariates (e.g., sex, age, and education), is that the number of matching covariates is inherently limited by the curse of dimensionality. Each matching variable added to the hot-deck algorithm multiplies the number of cells, so the cell count grows exponentially and the likelihood of being able to assign a donor falls substantially. Unfortunately, imputed responses are unlikely to be as strongly correlated with the excluded variables as reported responses are. Covariates excluded from the hot-deck matching algorithm are only correlated with the imputed values to the extent that these excluded variables are correlated with covariates included in the hot-deck matching algorithm.
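
The growth in cells can be made concrete with a small calculation. The sketch below multiplies the number of categories per matching covariate to get the cell count and the average number of donors per cell. The category counts and the donor pool size are made-up values for illustration, not SIPP specifications.

```python
from math import prod


def hot_deck_cells(levels_per_covariate: list[int]) -> int:
    """Number of matching cells is the product of category counts."""
    return prod(levels_per_covariate)


# Hypothetical category counts: sex (2), age group (6), education (5).
base = [2, 6, 5]
# Adding two more hypothetical covariates: race (5) and occupation (20).
extended = base + [5, 20]

donors = 30_000  # assumed donor pool size, for illustration only
for label, levels in (("base", base), ("extended", extended)):
    cells = hot_deck_cells(levels)
    print(f"{label}: {cells} cells, about {donors / cells:.1f} donors per cell")
```

With these assumed numbers, two extra covariates raise the cell count by a factor of 100, so the average cell holds far fewer potential donors and many cells may hold none.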

SRMI imputation addresses the issue in two ways. First, SRMI allows for the inclusion of far more covariates than hot-deck methodologies. Because SRMI does not suffer from the curse of dimensionality, there is no practical limit to the number of covariates that can be included in the model. Second, SRMI as applied to SIPP allows us to incorporate administrative measures of earnings into our imputation procedure. We primarily include contemporaneous and lagged earnings from the Social Security Administration’s Detailed Earnings Record, which include all income reported on W-2 and 1040-SE tax forms. The administrative data offer direct insight into non-respondents’ total earnings and allow us to include this information in our imputation procedure.

It is worth noting that the 2018 SIPP is not the first application of SRMI imputation in the SIPP. Starting with wave 1 of the 2014 SIPP, SRMI imputation has been used to impute topic flag variables (e.g., the presence of a job). In addition, Benedetto et al. (2015) find that SRMI improves upon hot-deck methodologies.

The Implementation of SRMI

For each job in the SIPP missing any component of earnings, we impute the total annual labor income for the calendar year using SRMI. Since a job can have multiple income sources (e.g., hourly earnings and tips) and changes in earnings during the calendar year, we must decide how to allocate the annual income imputed by SRMI. The task of allocating income is further complicated by the fact that respondents may partially report income; for example, a waiter may report his hourly wage but not his tips.

Our process for allocating the SRMI imputed income is as follows. We first allow the hot-deck procedure to impute income as it did in the 2014 SIPP panel. We then sum all hot-deck imputed income on the job and separately sum all reported income on the job. Next, we subtract the job’s sum of reported income from the SRMI imputed income. We then divide the difference by the sum of the hot-deck imputed income. This gives us a scaling factor which, when multiplied by each hot-deck imputed income amount, guarantees that the sum of reported and scaled hot-deck income on each job equals the SRMI imputed value.
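
The allocation steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the production code: the function names are hypothetical, the dollar amounts are invented, and the handling of a job with no hot-deck imputed income (raising an error) is an assumption, since that case is not described here.

```python
def srmi_scaling_factor(srmi_total: float,
                        reported: list[float],
                        hot_deck: list[float]) -> float:
    """(SRMI imputed income - sum of reported) / sum of hot-deck imputed."""
    hot_deck_sum = sum(hot_deck)
    if hot_deck_sum == 0:
        # Assumption: nothing to scale, so the factor is undefined.
        raise ValueError("no hot-deck imputed income on this job")
    return (srmi_total - sum(reported)) / hot_deck_sum


def scale_hot_deck(srmi_total: float,
                   reported: list[float],
                   hot_deck: list[float]) -> list[float]:
    """Scale only the hot-deck amounts; reported amounts are left untouched."""
    factor = srmi_scaling_factor(srmi_total, reported, hot_deck)
    return [factor * amount for amount in hot_deck]


# Invented example: reported wages, hot-deck imputed tips and bonus.
reported = [30_000.0]
hot_deck = [4_000.0, 1_000.0]
scaled = scale_hot_deck(40_000.0, reported, hot_deck)
print(scaled, sum(reported) + sum(scaled))
```

Because every hot-deck amount is multiplied by the same factor, the relative split among the imputed components (here, tips versus bonus) is preserved while the job total is brought into line with the SRMI value.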

We assign a value of 4 (SRMI) to the status flag associated with any scaled hot-deck imputed income variable. We do not scale any reported income amounts.

SRMI Scaling Example

Suppose a respondent reported $98,000 in annual earnings on her job and additionally was imputed to have a $1,000 bonus by hot-deck imputation. Assume she was also imputed to earn $100,000 on this job by SRMI. The difference between SRMI imputed and reported earnings is then $2,000, implying the scaling factor is 2 ($2,000/$1,000). Therefore, this respondent will have reported $98,000 in earnings and be imputed a $2,000 bonus, for a total annual labor income of $100,000 on this job. Below is an equation showing how we calculate the scaling factor for this example:

Scaling factor = (SRMI imputed income - reported income) / hot-deck imputed income = ($100,000 - $98,000) / $1,000 = 2

SRMI Reasonableness Constraints

The SRMI procedure can be applied to all earnings variables, but we impose a set of reasonableness constraints to ensure the imputed data are not distorted. When any of these constraints bind, total annual earnings on that job will deviate from SRMI imputed annual earnings on that job. For example, if SRMI implies a within-year increase in hourly wage exceeding 100 percent, we use the hot-deck imputed values of earnings instead. The hot-deck imputation procedure implemented in the 2014 panel also prohibits increases in hourly earnings exceeding 100 percent. Therefore, some earnings will still be imputed by hot deck and assigned an allocation flag of 2.

A second example relates to cases when respondents report their earnings within a range. The hot-deck imputation procedure implemented in the 2014 panel requires that imputed values fall within any reported range. We attempt to apply our SRMI imputation algorithm to these hot-deck imputed amounts. If the scaled amount falls within the reported range, we keep the scaled value and assign an allocation flag value of 4 (SRMI). If the scaled amount would fall outside the reported range, we set the imputed income value to the minimum or maximum of the range and set the allocation flag to 6.
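
The range constraint amounts to clamping the scaled value to the reported range and choosing the flag accordingly. The sketch below is one way to express that rule; the function name and the use of `None` for an open-ended bound (such as "$50,000 or more") are assumptions made for illustration.

```python
from typing import Optional, Tuple

FLAG_SRMI = 4   # scaled value kept
FLAG_RANGE = 6  # value set to the edge of the reported range


def constrain_to_range(scaled: float,
                       low: Optional[float],
                       high: Optional[float]) -> Tuple[float, int]:
    """Return (imputed value, allocation flag) for a scaled amount."""
    # Assumption: None marks an open-ended bound.
    lo = float("-inf") if low is None else low
    hi = float("inf") if high is None else high
    if lo <= scaled <= hi:
        return scaled, FLAG_SRMI
    # Outside the range: use the nearest bound of the reported range.
    return (lo if scaled < lo else hi), FLAG_RANGE


print(constrain_to_range(25_000, 20_000, 34_999))  # inside the range
print(constrain_to_range(40_000, 20_000, 34_999))  # above the range
```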

Page Last Revised - March 24, 2022
