Statistical Quality Standard F2: Providing Documentation to Support Transparency in Information Products

Purpose: The purpose of this standard is to specify the documentation that must be readily accessible to the public to ensure transparency and reproducibility in information products released by the Census Bureau.

The documentation required by this standard aims to provide sufficient transparency into the Census Bureau’s information products so that qualified users can reproduce the estimates and results in the products. However, federal law (e.g., Title 13, Title 15, and Title 26) and Census Bureau policies require safeguarding the confidentiality of protected information or administratively restricted information. Therefore, complete transparency and reproducibility may not always be possible. At a minimum, the documentation will allow users to assess the accuracy and reliability of the estimates and results in the Census Bureau’s information products.

Note: Statistical Quality Standard F1, Releasing Information Products, addresses the required documentation and metadata to describe any serious data quality problems and the likely effects of the problems on the data and estimates in the Census Bureau’s information products.

Scope: The Census Bureau’s statistical quality standards apply to all information products released by the Census Bureau and the activities that generate those products, including products released to the public, sponsors, joint partners, or other customers. All Census Bureau employees and Special Sworn Status individuals must comply with these standards; this includes contractors and other individuals who receive Census Bureau funding to develop and release Census Bureau information products.

Exclusions:
The global exclusions to the standards are listed in the Preface. No additional exclusions apply to this standard.

Key Terms: Administratively restricted information, data program, information product, protected information, qualified user, readily accessible, reproducibility, and transparency.

Requirement F2-1: Documentation that would breach the confidentiality of protected information or administratively restricted information or that would violate data-use agreements with other agencies must not be released. (See Statistical Quality Standard S1, Protecting Confidentiality.)

Requirement F2-2: Documentation must be readily accessible in sufficient detail to allow qualified users to understand and analyze the information and to reproduce (within the constraints of confidentiality requirements) and evaluate the results. The documentation must be made readily accessible by doing one or more of the following:

  1. Including the documentation in the information product if it is necessary for readers to understand the results.
  2. Referencing the full methodological documentation in the information product (e.g., providing a URL) and publishing the documentation on the Census Bureau’s Internet Web site.
  3. Delivering the full methodological documentation to the sponsors of reimbursable programs or providing them with a URL to the documentation.

Note: The Census Bureau Geospatial Product Metadata Standard (GPMS) and the Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata (CSDGM) provide additional requirements for geospatial products.

Sub-Requirement F2-2.1: Descriptions of the data program must be readily accessible.

Examples of information that describes the data program include:

  • The purpose of the program (e.g., survey, census, evaluation study, or research).
  • The organizational sponsor(s) of the program.
  • The organization that conducted the program.
  • The data source (e.g., organization or agency) and the database or systems from which the data are drawn for administrative records data.
  • The universe of inference or target population for the program.

Sub-Requirement F2-2.2: Descriptions of the concepts, variables, and classifications that underlie the data must be readily accessible.

Examples of concepts, variables, and classifications that underlie the data include:

  • Definitions of the primary concepts being measured.
  • The wording of questions asked in surveys or censuses.
  • Identification of the key variables.
  • Descriptions of the concepts underlying all variables.
  • Geographic levels of the data.
  • The reference dates for the data and for the geographic levels.
  • Descriptions of any derived measures.

Sub-Requirement F2-2.3: Descriptions of the methodology, including the methods used to collect and process the data and to produce estimates, must be readily accessible.

Examples of documentation of the methodology include:

  • Discussion of methods employed to ensure data quality.
  • Quality profiles. (See the Census Bureau Guideline on Quality Profiles.)
  • Documentation of pretesting of the data collection instruments, including qualitative studies.
  • Source and accuracy statement.
  • Description of the sampling frame.
  • Description of the sample design.
  • The size of the sample.
  • Information on eligibility criteria and screening procedures.
  • Description of sample weights, including adjustments for nonresponse.
  • The mode and methods used to collect the data.
  • The dates of data collection.
  • Description of any bounding methods used to control telescoping.
  • Description of estimation procedures, including weighting, editing, and imputation methods.
  • Reasons for not imputing the data when imputation for item nonresponse is not carried out.
  • Description of how to calculate variance estimates.
  • Discussion of potential nonsampling errors (e.g., nonresponse, coverage, processing, and measurement).
  • Discussion of the methods to approximate the standard errors of derived statistics.
  • Description of any substantial changes in procedures or methodology over time and the known impact on the data.
  • References to methodological documentation maintained by the source organization supplying administrative records data.
  • Model description, including assumptions and type of model.
  • Equations or algorithms used to generate estimates.
  • Description of seasonal adjustment methods. (See the Census Bureau Guideline on Seasonal Adjustment Diagnostics.)
  • Description of small area estimation methods.
  • Any limitations or data quality problems affecting the estimates or projections.
  • Descriptions of known data anomalies and corrective actions.

Sub-Requirement F2-2.3.1: Measures and indicators of the quality of the data must be readily accessible.

Examples of measures and indicators of the quality of the data include:

  • The disposition of sample cases (e.g., numbers of interviewed cases, ineligible cases, and nonresponding cases).
  • Unit response rates or quantity response rates.
  • Item response rates, item allocation rates, total quantity response rates, or quantity response rates for key data items.
  • Rates for the types of nonresponse (e.g., refusal, unable to locate, no one home, temporarily absent, language problem, insufficient data, and undeliverable as addressed).
  • Coverage ratios.
  • Indicators of the statistical precision of the estimates (e.g., estimates of sampling variances, standard errors, coefficients of variation, or confidence intervals).
  • Coverage of the target population by the set of administrative records.
  • The proportion of administrative records that have missing data items or that contain invalid data for key variables.
  • The proportion of data items with edit changes because the data items were invalid or otherwise required changes.
  • The proportion of records lost from the analysis or estimate due to nonmatches when linking data sets.
  • Effects on the estimates related to coverage issues, nonmatches in record linking, and missing data items in surveys, censuses, or administrative records.
  • Model diagnostics (e.g., goodness of fit, coefficient of variation, and percent reduction in confidence interval of the direct estimates).
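
As an illustration of the precision indicators listed above, the following minimal Python sketch derives a coefficient of variation and a confidence interval from an estimate and its standard error. The function name, the 90 percent default confidence level, and the normal approximation are assumptions made for this example; this standard does not prescribe them.

```python
from statistics import NormalDist


def precision_indicators(estimate, standard_error, confidence=0.90):
    """Return common precision indicators for a single estimate.

    Hypothetical helper: assumes the estimate is approximately normally
    distributed and that the standard error was computed elsewhere.
    """
    if standard_error < 0:
        raise ValueError("standard_error must be non-negative")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")

    # Two-sided critical value from the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error

    # The coefficient of variation is undefined for a zero estimate.
    cv = standard_error / abs(estimate) if estimate != 0 else float("nan")

    return {
        "standard_error": standard_error,
        "coefficient_of_variation": cv,
        "margin_of_error": margin,
        "confidence_interval": (estimate - margin, estimate + margin),
    }
```

Publishing the standard error alongside the derived indicators lets a qualified user recompute intervals at other confidence levels.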

Note: Statistical Quality Standard D3, Producing Measures and Indicators of Nonsampling Error, contains requirements on producing measures and indicators of nonsampling error.
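
To illustrate how the disposition of sample cases relates to the response and nonresponse rates listed above, here is a minimal Python sketch. It is a simplified, unweighted calculation that assumes eligibility is known for every case. The function name and the disposition labels are hypothetical, and a data program's actual rate definitions (for example, weighting or the treatment of cases with unknown eligibility) may differ.

```python
def disposition_rates(dispositions):
    """Compute unweighted rates from a mapping of disposition -> case count.

    Assumption for this sketch: every key other than "interviewed" and
    "ineligible" is a type of nonresponse among eligible cases.
    """
    if any(count < 0 for count in dispositions.values()):
        raise ValueError("case counts must be non-negative")

    interviewed = dispositions.get("interviewed", 0)
    nonresponse = {
        kind: count
        for kind, count in dispositions.items()
        if kind not in ("interviewed", "ineligible")
    }

    # Ineligible cases are excluded from the denominator.
    eligible = interviewed + sum(nonresponse.values())
    if eligible == 0:
        raise ValueError("no eligible cases")

    return {
        "eligible_cases": eligible,
        "unit_response_rate": interviewed / eligible,
        "nonresponse_rates": {
            kind: count / eligible for kind, count in nonresponse.items()
        },
    }
```

For example, 800 interviewed cases, 120 refusals, 80 no-one-home cases, and 50 ineligible cases give 1,000 eligible cases, a unit response rate of 0.80, and a refusal rate of 0.12.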

Sub-Requirement F2-2.3.2: The methodology and results of evaluations or studies of the quality of the data must be readily accessible.

Examples of evaluations or studies of the quality of the data include:

  • Nonresponse bias analyses.
  • Evaluation studies (e.g., evaluation studies of response error, interviewer variance, respondent debriefing, record check or validation, and mode effects).
  • Response analysis surveys.
  • Comparisons with independent sources, if available.
  • Match analyses.
  • Reconciliations (e.g., a comparison of import and export data).
  • Periodic summaries of quality control results (e.g., interviewer quality control (QC) results and error rates measured by data entry QC and coding QC).

Note: Results of routine reviews and verifications need not be readily accessible unless needed for data users to assess the quality of the information product.

Sub-Requirement F2-2.4: Documentation of public-use data files must be readily accessible in sufficient detail to allow a qualified user to understand and work with the files.

Examples of documentation of public-use data files include:

  • File description.
  • File format (e.g., SAS file or text file).
  • Variable names and descriptions (e.g., data dictionary or record layout).
  • Data type for each variable (e.g., numeric, alphanumeric, and length).
  • Description of variables used to uniquely identify records in the data file.
  • Description of flags to indicate missing and imputed items.
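
As an illustration of the file documentation elements above, the following minimal Python sketch represents a data dictionary in machine-readable form and checks one record against it. All variable names, descriptions, lengths, and flag codes are hypothetical and are not taken from any Census Bureau file; the sketch also assumes each record has been read from a text file as a mapping of variable name to string value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariableSpec:
    name: str
    description: str
    data_type: str          # "numeric" or "alphanumeric"
    length: int             # maximum field width in characters
    is_record_id: bool = False


# Hypothetical record layout for illustration only.
DATA_DICTIONARY = [
    VariableSpec("RECID", "Unique record identifier", "alphanumeric", 8,
                 is_record_id=True),
    VariableSpec("AGE", "Age in years", "numeric", 3),
    VariableSpec("AGE_FLAG",
                 "Imputation flag for AGE (0 = reported, 1 = imputed)",
                 "numeric", 1),
]


def check_record(record, dictionary=DATA_DICTIONARY):
    """Return a list of problems found when comparing a record to the layout."""
    problems = []
    for spec in dictionary:
        value = record.get(spec.name)
        if value is None:
            problems.append(f"{spec.name}: missing")
            continue
        if len(value) > spec.length:
            problems.append(f"{spec.name}: longer than {spec.length} characters")
        if spec.data_type == "numeric" and not value.isdigit():
            problems.append(f"{spec.name}: expected numeric, got {value!r}")
    return problems
```

A record that conforms to the layout yields an empty list; any deviation is reported by variable name, which is the kind of check a documented record layout makes possible.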

Page Last Revised - December 16, 2021