Data Processing & Treatment of Nonresponse

To prepare economic census data for release to the public, the data are processed in three primary ways:

Data Edits - to detect reporting errors and other problems
Nonresponse Imputation - to estimate missing data
Tabulation and Analytical Processing - to tabulate and analyze summary data and prevent disclosure of respondents’ identities

Data Edits

Data captured in an economic census must be edited to identify and correct reporting errors. The data also must be adjusted to account for missing items and for businesses that do not respond. Data edits detect errors and validate data by considering factors such as the proper classification for a given record, the record's reporting history and industry/geographic ratios and averages.

The first step of the data editing process is classification. To assign a valid kind-of-business or industry classification code to each establishment, computer programs run the respondent's answers to pre-specified items through a series of data edits. The specific items used for classification depend on the census report form and include:

self-designated kind-of-business check-box classifications,
responses to product lines sold by a retail establishment,
products manufactured by a plant and
entries written in by the respondent explaining the establishment’s activities.

If critical information is missing, the record is flagged and fixed by analysts before further processing occurs.

If all critical information is available, the classification code is assigned automatically. After classification codes are assigned, a "verification" operation is performed to validate the industry, geography and ZIP Codes.
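The classification logic above can be sketched roughly as follows. This is a minimal illustration, not the Census Bureau's edit program: the lookup table, the two example codes, the `Record` fields and the rule of falling back to the product line with the largest sales are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical lookup from a kind-of-business label to an industry code.
# Illustrative only; the real census edit tables are not described here.
KIND_OF_BUSINESS_CODES: Dict[str, str] = {
    "grocery": "445110",
    "hardware": "444130",
}


@dataclass
class Record:
    establishment_id: str
    checkbox: Optional[str] = None  # self-designated kind-of-business check box
    product_lines: Dict[str, float] = field(default_factory=dict)  # line -> sales
    industry_code: Optional[str] = None
    flags: List[str] = field(default_factory=list)


def classify(record: Record) -> Record:
    """Assign a code automatically if the critical items allow it; otherwise flag."""
    code = KIND_OF_BUSINESS_CODES.get(record.checkbox) if record.checkbox else None
    if code is None and record.product_lines:
        # Hypothetical fallback: use the product line with the largest reported sales.
        dominant = max(record.product_lines, key=record.product_lines.__getitem__)
        code = KIND_OF_BUSINESS_CODES.get(dominant)
    if code is None:
        # Critical information missing or unusable: route the record to an analyst.
        record.flags.append("needs_analyst_review")
    else:
        record.industry_code = code
    return record
```

For example, a record with no check box and product-line sales of `{"hardware": 500.0, "grocery": 100.0}` would receive the hardware code, while a record with neither item would be flagged for analyst review.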

After an establishment has been assigned a valid industry code, the data edits further evaluate the response data for consistency and validity, for example by checking that employment data are consistent with payroll or sales/receipts data. Response data are always evaluated by industry; in some cases, type of operation or tax-exempt status is also taken into account. Additional checks compare current-year data with data reported in previous censuses or from administrative sources.
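A consistency edit of this kind might look like the sketch below. The payroll-per-employee bounds, the threefold change tolerance against the prior census and the failure names are hypothetical values chosen for illustration; the actual edit parameters are not given on this page.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical industry bounds on annual payroll per employee, in dollars.
PAYROLL_PER_EMPLOYEE_BOUNDS: Dict[str, Tuple[float, float]] = {
    "445110": (15_000.0, 80_000.0),
}
# Hypothetical tolerance: flag sales that changed more than 3x since the prior census.
MAX_CHANGE_FACTOR = 3.0


def edit_failures(industry_code: str, employees: int, payroll: float,
                  sales: float, prior_sales: Optional[float] = None) -> List[str]:
    """Return the names of the consistency edits a record fails."""
    failures: List[str] = []
    bounds = PAYROLL_PER_EMPLOYEE_BOUNDS.get(industry_code)
    if employees > 0 and bounds is not None:
        low, high = bounds
        # Employment vs. payroll: average pay must fall within industry bounds.
        if not low <= payroll / employees <= high:
            failures.append("payroll_per_employee_out_of_range")
    elif employees == 0 and payroll > 0:
        failures.append("payroll_without_employees")
    if prior_sales:  # historical comparison, only when a prior value exists
        change = sales / prior_sales
        if change > MAX_CHANGE_FACTOR or change < 1 / MAX_CHANGE_FACTOR:
            failures.append("sales_change_vs_prior_census")
    return failures
```

Because the bounds are keyed by industry code, the same record can pass in one industry and fail in another, which mirrors the statement that response data are always evaluated by industry.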

Nonresponse Imputation

Nonresponse is handled by estimating or imputing missing data. Imputation is defined as the replacement of a missing or incorrectly reported item with another value derived from logical edits or statistical procedures.
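As one example of a statistical procedure, the sketch below uses ratio imputation: a missing payroll value is replaced by the record's employment multiplied by its industry's payroll-per-employee ratio, computed from complete reporters. This is a common textbook technique chosen for illustration, not necessarily the method used in the economic census; the dictionary keys are also assumptions, and employment is assumed to be reported.

```python
from collections import defaultdict
from typing import Dict, List


def impute_payroll(records: List[dict]) -> List[dict]:
    """Fill missing payroll with employees x the industry payroll-per-employee ratio.

    The ratio is a ratio of sums over complete reporters in the same industry.
    Assumes employment is reported for every record.
    """
    payroll_sum: Dict[str, float] = defaultdict(float)
    employee_sum: Dict[str, float] = defaultdict(float)
    for r in records:
        if r["payroll"] is not None and r["employees"] > 0:
            payroll_sum[r["industry"]] += r["payroll"]
            employee_sum[r["industry"]] += r["employees"]

    for r in records:
        r["payroll_imputed"] = False
        # Impute only when the industry has at least one usable reporter.
        if r["payroll"] is None and employee_sum[r["industry"]] > 0:
            ratio = payroll_sum[r["industry"]] / employee_sum[r["industry"]]
            r["payroll"] = ratio * r["employees"]
            r["payroll_imputed"] = True
    return records
```

Keeping a `payroll_imputed` marker on each record makes it possible to measure later how much of a published total came from imputation.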

There are two types of nonresponse:

Unit nonresponse occurs when an eligible unit fails to provide sufficient data to be classified as a response.
Item nonresponse occurs when some but not all data have been collected for the respondent.
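The distinction between the two types can be illustrated with a small classifier. Which items make a return "sufficient" is not stated here, so the `MINIMUM_ITEMS` and `ALL_ITEMS` sets below are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical item lists; the actual criteria for counting a return as a
# response are not stated in this section.
MINIMUM_ITEMS = {"sales", "employees"}
ALL_ITEMS = {"sales", "employees", "payroll", "expenses"}


def nonresponse_type(reported: Dict[str, Optional[float]]) -> str:
    """Classify a return as 'unit' nonresponse, 'item' nonresponse or 'complete'."""
    present = {item for item, value in reported.items() if value is not None}
    if not MINIMUM_ITEMS <= present:
        return "unit"  # not enough data to count as a response
    if not ALL_ITEMS <= present:
        return "item"  # a response, but some items are missing
    return "complete"
```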

Title 13 of the United States Code states that respondents are required to answer all questions to the best of their ability. Incomplete forms, unclear or erroneous data, or nonresponse can affect data analyses and the quality of the published data.

Problems that arise from missing data include:

Tables with missing data are harder to analyze than complete tables.
Similar analyses can produce inconsistent results when they treat missing data differently.
Imputation models can be inappropriate if nonresponse is not random but depends on the data item being collected.

Note: If too large a share of a data cell's value comes from imputation, the value is suppressed and shown with an ‘S’ flag.
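The suppression rule can be sketched as a share-of-total check. The 50 percent cutoff below is a hypothetical placeholder; the actual publication threshold is not given on this page.

```python
from typing import Union

# Hypothetical threshold; the actual publication standard is not given here.
MAX_IMPUTED_SHARE = 0.5


def publish_cell(reported: float, imputed: float,
                 max_share: float = MAX_IMPUTED_SHARE) -> Union[float, str]:
    """Return the cell total, or 'S' if too much of it was imputed."""
    total = reported + imputed
    if total > 0 and imputed / total > max_share:
        return "S"
    return total
```

With these assumptions, a cell built from 80 reported and 20 imputed dollars is published as 100.0, while one built from 20 reported and 80 imputed dollars is replaced by `"S"`.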

Tabulation and Analytical Processing

Individual establishment records are tabulated in different ways based on data product and analytical needs. Tabulations include data summed by industry, specified geographic areas, establishment size, products produced, materials used, fuels used and product lines sold.
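At its simplest, this kind of tabulation is a grouped sum over establishment records. The sketch below assumes records are dictionaries with hypothetical keys such as `industry`, `state` and `sales`; it omits disclosure avoidance and the other checks applied to published cells.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def tabulate(records: Iterable[dict], keys: Sequence[str],
             measure: str) -> Dict[Tuple, float]:
    """Sum `measure` over establishment records grouped by the given keys."""
    totals: Dict[Tuple, float] = defaultdict(float)
    for record in records:
        group = tuple(record[k] for k in keys)
        totals[group] += record[measure]
    return dict(totals)
```

Passing `("industry",)` gives national industry totals, while `("industry", "state")` gives the same measure summed by industry within each state.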

During macro-analysis: