Methodology

2016

Sources of the Data

Letters informed business owners of the business’s legal requirement to complete the Annual Survey of Entrepreneurs (ASE) and provided instructions for accessing the survey and reporting electronically; there was no paper reporting option. These letters were mailed to a random sample of employer businesses selected from a list of all firms operating during 2016 with receipts of $1,000 or more, except those classified in the following NAICS industries:

  • Crop and Animal Production (NAICS 111 and 112)
  • Rail Transportation (NAICS 482)
  • Postal Service (NAICS 491)
  • Monetary Authorities-Central Bank (NAICS 521)
  • Funds, Trusts, and Other Financial Vehicles (NAICS 525)
  • Religious, Grantmaking, Civic, Professional, and Similar Organizations (NAICS 813)
  • Private Households (NAICS 814)
  • Public Administration (NAICS 92)
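
The scope rule above can be expressed as a simple filter. The following is a minimal sketch, not the Census Bureau's implementation: the function name `in_ase_scope` and its parameters are hypothetical, and it assumes NAICS codes are supplied as strings of digits.

```python
# NAICS prefixes excluded from the ASE universe, as listed above.
EXCLUDED_NAICS_PREFIXES = (
    "111", "112", "482", "491", "521", "525", "813", "814", "92",
)

def in_ase_scope(naics_code: str, receipts: float, has_paid_employees: bool) -> bool:
    """Return True if a firm passes the scope rules described in the text."""
    if not has_paid_employees:      # only employer businesses were sampled
        return False
    if receipts < 1000:             # receipts of $1,000 or more are required
        return False
    # str.startswith accepts a tuple and checks each prefix in turn.
    return not naics_code.startswith(EXCLUDED_NAICS_PREFIXES)
```

Under these assumptions, a restaurant (NAICS 722511) with employees and $250,000 in receipts is in scope, while a crop producer (NAICS 111110) is not.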

The 2012 North American Industry Classification System (NAICS) is the standard used by Federal statistical agencies in classifying business establishments.

The list of all firms (or universe) was compiled from a combination of 2016 business tax returns and data collected on other economic census reports. The Census Bureau obtained electronic files from the Internal Revenue Service (IRS) for all companies reporting any business activity on any one of the following 2016 IRS tax forms:

  • 1065, "U.S. Return of Partnership Income"
  • any one of the 1120 corporation tax forms
  • 941, "Employer's Quarterly Federal Tax Return"
  • 944, "Employer's Annual Federal Tax Return"

The IRS provided certain identification, classification, and measurement data for businesses filing those forms.

For most firms with paid employees, the Census Bureau also obtained employment, payroll, and industry classification data for each establishment from administrative records. These administrative records are taken either directly from the Business Register (the listing of all businesses in the United States, used as a frame by many economic surveys) or from the County Business Patterns series, an annual Census Bureau series that provides subnational economic data by industry. Detailed receipts data are not available from administrative records. For the ASE, receipts data are based on administrative payroll and payroll-to-receipts ratios.

Content for the ASE includes questions from the 2012 Survey of Business Owners (SBO) (form SBO-1). The Census Bureau’s collaboration with the survey sponsors, the Ewing Marion Kauffman Foundation and the Minority Business Development Agency (MBDA), resulted in the expanded collection of the businesses’ sources of capital and access to financing, and a new module of questions for each survey year based on relevant economic topics. The module selected for the 2016 ASE focused on business banking relationships, practices in obtaining and using advice from professional and nonprofessional sources, and the effect of regulations on business growth. A summary of the module content for each survey year is shown in the table below.


Survey Year: Module Content Topics

  • 2014 ASE: Innovation; Research and Development
  • 2015 ASE: Management Practices; Record Keeping Practices
  • 2016 ASE: Business Banking Relationships; Business Advice and Planning; Regulations


The information collection maximized response through the following means:

  • Mailing materials that emphasize the mandatory and confidential nature of census reports, as provided by Title 13, United States Code;
  • Designing effective electronic reporting instruments and instructions;
  • Providing respondents a preview of the survey with the option to print the survey worksheet to aid in data collection from multiple business owners prior to reporting electronically (examples of the initial letter and the ASE worksheet are available from the Information for Respondents/Questionnaires page);
  • Offering a toll-free telephone number for companies that have questions or need assistance in completing the electronic survey;
  • Conducting systematic mail follow-ups to nonrespondents;
  • Conducting nonresponse bias analysis if unit response rate falls below 60 percent.


Response data were geographically coded and edited in preparation for tabulation. The data editing process applied corrections in batch or manually when needed using standard procedures.

The data were then tabulated by the 2012 NAICS, subjected to further data analysis, and the resulting corrections applied to individual data records. Tabulations were then produced for the final published results available through American FactFinder (AFF), the Census Bureau's online, self-service data access tool. Later, the data were transitioned from AFF to the more centralized Census data platform, data.census.gov.

A more detailed examination of census methodology is presented in the History of the Economic Census.

 

Industry Classification of Firms

A firm is a business organization or entity consisting of one domestic establishment (location) or more under common ownership or control. All establishments are included as part of the owning or controlling firm. For the economic census, the terms "firm" and "company" are synonymous.

The industry classifications for all firms are based on the 2012 North American Industry Classification System (NAICS).

Firms with more than one domestic establishment are counted in each industry and geographic area in which they operate, but only once in the total for all sectors and the totals at the national and state levels. Industry classification is derived primarily from data collected through the economic census or through other Census Bureau surveys. When these data are not available, the Census Bureau uses a hierarchy of administrative record sources to assign a NAICS code, including classifications from the Bureau of Labor Statistics, business birth information, and self-assigned codes from income tax records.
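
The counting rule for multi-establishment firms can be illustrated with a short sketch. The establishment records, firm identifiers, and the function name `count_firms` below are hypothetical and serve only to show why industry or area detail can sum to more than the all-firms total.

```python
from collections import defaultdict

# Hypothetical establishment records: (firm_id, NAICS sector, state).
establishments = [
    ("F1", "44-45", "OH"),
    ("F1", "72", "OH"),
    ("F1", "72", "KY"),
    ("F2", "72", "OH"),
]

def count_firms(records):
    """Count distinct firms per sector, per state, and overall."""
    by_sector = defaultdict(set)
    by_state = defaultdict(set)
    all_firms = set()
    for firm_id, sector, state in records:
        by_sector[sector].add(firm_id)   # a firm counts once per sector it operates in
        by_state[state].add(firm_id)     # and once per state it operates in
        all_firms.add(firm_id)           # but only once in the overall total
    return (
        {k: len(v) for k, v in by_sector.items()},
        {k: len(v) for k, v in by_state.items()},
        len(all_firms),
    )

sector_counts, state_counts, total = count_firms(establishments)
```

In this example firm F1 appears in two sectors and two states, so the sector counts sum to 3 even though only 2 firms exist.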

Precautions in Analyzing and Interpreting Data

All survey and census results contain measurement errors and may contain sampling errors. Information about these potential errors is provided or referenced with the data or the source of the data. The Census Bureau recommends that data users incorporate this information into their analyses as these errors could impact inferences. Researchers analyzing data to create their own estimates are responsible for the validity of those estimates and should not cite the Census Bureau as the source of the estimates but only as the source of the core data.

Please contact the Census Bureau for more detailed information and interpretation of the sampling and nonsampling errors.

Comparability to Other Surveys

Many economic surveys from the Census Bureau directly collect receipts, payroll, and employment, or are conducted on a quinquennial basis to coincide with collection of these data from the economic census. The ASE does not directly collect receipts, payroll, or employment. Therefore, the payroll and employment figures are derived from administrative sources and may not be directly comparable to the values reported in other surveys.

In addition, administrative receipts are not available in non-economic census years. In order to estimate receipts for the ASE, an imputation model was developed using administrative payroll and data from the 2012 Economic Census. These imputed receipts are used to calculate model-based estimates for receipt totals throughout the publication.

Caution should be exercised when making any comparisons between ASE and another survey.

Basis of Reporting

The ASE is conducted on a company or firm basis rather than an establishment basis. A company or firm is a business consisting of one or more domestic establishments under its ownership or control at the end of 2016. One report is collected from each company or firm, and administrative records of the establishments that make up the firm are used to assign the company to different tabulation categories.

Sampling and Estimation Methodologies

Sampling. The Census Bureau uses the following sources of information to estimate the probability that a business was minority- or women-owned:

  • Administrative data from the Social Security Administration
  • Lists of minority- and women-owned businesses published in syndicated magazines, located on the Internet, or disseminated by trade or special interest groups
  • Word strings in the company name indicating possible minority ownership (derived from 2012 survey responses)
  • Racial distributions for various state-industry classes (derived from 2012 survey responses) and racial distributions for various ZIP Codes
  • Gender, ethnicity, race, and veteran status responses of a single-owner business to a previous SBO, a previous American Community Survey (ACS), or the 2010 Decennial Census


These probabilities were then used to place each firm in the ASE universe in one of nine frames for sampling:

  • American Indian
  • Asian
  • Black or African American
  • Hispanic
  • Non-Hispanic white men
  • Native Hawaiian and Other Pacific Islander
  • Other (a different race was supplied as a write-in to another source)
  • Publicly owned
  • Women

The ASE universe was stratified by frame, geographic area, and the number of years the firm was in business. The geographic area stratification variable combined state and the 50 most populous metropolitan statistical areas (MSAs) as of 2014, which are listed below. If a business operated in multiple states or large MSAs, its geographic area was set to a multi-state category. If it operated in only one large MSA, its geographic area was set to that large MSA. Otherwise, the sole state in which the business operated was used as its geographic area.

Only companies that had paid employees were sampled. The Census Bureau selected large companies with certainty. These companies were selected based on volume of sales, payroll, or number of paid employees. All certainty cases were sure to be selected and represented only themselves (i.e., had a selection probability of one and a sampling weight of one). The certainty cutoffs varied by sampling stratum, and each stratum was sampled at varying rates, depending on the number of firms in a particular industry in a particular state. The remaining universe was subjected to stratified systematic random sampling.
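
The geographic assignment rule and the systematic selection step can be sketched as follows. This is an illustration under stated assumptions, not the Census Bureau's production procedure: the function names, the "MULTI" label, and the sampling rate are hypothetical, and certainty cases are assumed to have been set aside before systematic selection within a stratum.

```python
import random

def assign_geo_area(states, large_msas):
    """Pick the geographic stratum for a firm.

    states: set of state codes in which the firm operates.
    large_msas: set of codes for the large MSAs in which it operates.
    """
    if len(states) > 1 or len(large_msas) > 1:
        return "MULTI"                    # hypothetical label for the multi-state category
    if len(large_msas) == 1:
        return next(iter(large_msas))     # exactly one large MSA
    return next(iter(states))             # otherwise the sole state

def systematic_sample(units, rate, rng):
    """Select units at a fixed interval (1 / rate) after a random start."""
    interval = 1.0 / rate
    position = rng.uniform(0, interval)   # random start within the first interval
    selected = []
    while position < len(units):
        selected.append(units[int(position)])
        position += interval
    return selected

rng = random.Random(2016)                 # seeded only to make the example repeatable
stratum = list(range(20))                 # 20 hypothetical noncertainty firms
sample = systematic_sample(stratum, 0.25, rng)
```

With a rate of 0.25, each selected firm would carry a sampling weight of 4, the reciprocal of its selection probability.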

The ASE was designed to sample the same in-scope businesses each survey year. Approximately 81.2 percent of the businesses selected for the 2014 ASE continued for the 2016 ASE. Businesses that closed or were otherwise determined to be out of scope were removed from the sample. The remaining sample consisted of 8.7 percent of businesses previously selected in 2015, and 10.1 percent newly selected in 2016. The number of new businesses selected was based on the number of businesses that were removed. New businesses were selected using the same method as the 2014 ASE sample selection.

Each firm selected into the sample was asked the percentage of ownership, gender, ethnicity, race, and veteran status for up to four persons owning the largest percentages in the business. These firms were then asked additional characteristic questions (e.g., age, education level).

50 Most Populous MSAs based on the 2014 Annual Estimates of the Resident Population: April 1, 2010 to July 1, 2014

MSA Code  Title
12060  Atlanta-Sandy Springs-Roswell, GA Metro Area
12420  Austin-Round Rock, TX Metro Area
12580  Baltimore-Columbia-Towson, MD Metro Area
13820  Birmingham-Hoover, AL Metro Area
14460  Boston-Cambridge-Newton, MA-NH Metro Area
15380  Buffalo-Cheektowaga-Niagara Falls, NY Metro Area
16740  Charlotte-Concord-Gastonia, NC-SC Metro Area
16980  Chicago-Naperville-Elgin, IL-IN-WI Metro Area
17140  Cincinnati, OH-KY-IN Metro Area
17460  Cleveland-Elyria, OH Metro Area
18140  Columbus, OH Metro Area
19100  Dallas-Fort Worth-Arlington, TX Metro Area
19740  Denver-Aurora-Lakewood, CO Metro Area
19820  Detroit-Warren-Dearborn, MI Metro Area
25540  Hartford-West Hartford-East Hartford, CT Metro Area
26420  Houston-The Woodlands-Sugar Land, TX Metro Area
26900  Indianapolis-Carmel-Anderson, IN Metro Area
27260  Jacksonville, FL Metro Area
28140  Kansas City, MO-KS Metro Area
29820  Las Vegas-Henderson-Paradise, NV Metro Area
31080  Los Angeles-Long Beach-Anaheim, CA Metro Area
31140  Louisville/Jefferson County, KY-IN Metro Area
32820  Memphis, TN-MS-AR Metro Area
33100  Miami-Fort Lauderdale-West Palm Beach, FL Metro Area
33340  Milwaukee-Waukesha-West Allis, WI Metro Area
33460  Minneapolis-St. Paul-Bloomington, MN-WI Metro Area
34980  Nashville-Davidson--Murfreesboro--Franklin, TN Metro Area
35380  New Orleans-Metairie, LA Metro Area
35620  New York-Newark-Jersey City, NY-NJ-PA Metro Area
36420  Oklahoma City, OK Metro Area
36740  Orlando-Kissimmee-Sanford, FL Metro Area
37980  Philadelphia-Camden-Wilmington, PA-NJ-DE-MD Metro Area
38060  Phoenix-Mesa-Scottsdale, AZ Metro Area
38300  Pittsburgh, PA Metro Area
38900  Portland-Vancouver-Hillsboro, OR-WA Metro Area
39300  Providence-Warwick, RI-MA Metro Area
39580  Raleigh, NC Metro Area
40060  Richmond, VA Metro Area
40140  Riverside-San Bernardino-Ontario, CA Metro Area
40900  Sacramento--Roseville--Arden-Arcade, CA Metro Area
41180  St. Louis, MO-IL Metro Area
41620  Salt Lake City, UT Metro Area
41700  San Antonio-New Braunfels, TX Metro Area
41740  San Diego-Carlsbad, CA Metro Area
41860  San Francisco-Oakland-Hayward, CA Metro Area
41940  San Jose-Sunnyvale-Santa Clara, CA Metro Area
42660  Seattle-Tacoma-Bellevue, WA Metro Area
45300  Tampa-St. Petersburg-Clearwater, FL Metro Area
47260  Virginia Beach-Norfolk-Newport News, VA-NC Metro Area
47900  Washington-Arlington-Alexandria, DC-VA-MD-WV Metro Area

Tabulation. Business ownership is defined as having 51 percent or more of the stock or equity in the business and is categorized by:

All firms classifiable by gender, ethnicity, race, and veteran status

  • Gender
    • Female-owned
    • Male-owned
    • Equally male-/female-owned
  • Ethnicity
    • Hispanic
    • Equally Hispanic/non-Hispanic
    • Non-Hispanic
  • Race
    • White
    • Black or African American
    • American Indian and Alaska Native
    • Asian
    • Native Hawaiian and Other Pacific Islander
    • Some other race
    • Minority
    • Equally minority/nonminority
    • Nonminority
  • Veteran status
    • Veteran-owned
    • Equally veteran-/nonveteran-owned
    • Nonveteran-owned
  • Publicly held and other firms not classifiable by gender, ethnicity, race, and veteran status

Businesses could be tabulated in more than one racial group. This can result because:

  1. The sole owner was reported to be of more than one race.
  2. The majority owner was reported to be of more than one race.
  3. A majority combination of owners was reported to be of more than one race.


The detail may not add to the total or subgroup total because a Hispanic or Latino firm may be of any race, and because a firm could be tabulated in more than one racial group. For example, if a firm responded as both Chinese and Black majority owned, the firm would be included in the detailed Asian and Black estimates, but would only be counted once toward the higher level all firms' estimates.

For the tabulations by gender, ethnicity, race, and veteran status, the data for each firm in the ASE sample were weighted by the reciprocal of the firm's probability of selection.
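
The weighting rule, together with the tabulation of firms in more than one racial group, can be sketched as below. The sample records and the function name `weighted_firm_counts` are hypothetical; the sketch only illustrates why detailed race estimates can sum to more than the all-firms estimate.

```python
from collections import defaultdict

# Hypothetical sample firms: selection probability and the race groups
# in which each firm is tabulated.
sample_firms = [
    {"prob": 1.0, "races": {"White"}},                              # certainty case, weight 1
    {"prob": 0.25, "races": {"Asian", "Black or African American"}},
    {"prob": 0.5, "races": {"Asian"}},
]

def weighted_firm_counts(firms):
    """Return (estimated firm count per race group, estimated count of all firms)."""
    by_race = defaultdict(float)
    total = 0.0
    for firm in firms:
        weight = 1.0 / firm["prob"]      # reciprocal of the probability of selection
        total += weight                  # each firm counts once toward all firms
        for race in firm["races"]:
            by_race[race] += weight      # and once in every race group reported
    return dict(by_race), total

race_estimates, all_firms_estimate = weighted_firm_counts(sample_firms)
```

Here the second firm contributes its weight of 4 to both the Asian and the Black or African American estimates, so the race detail sums to 11 while the all-firms estimate is 7.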

It is important to note that while a business’s eligibility to enter the sample and be included in tabulations was determined using administrative data from 2015, the actual tabulations use administrative data from 2016 whenever they are available. This was done due to time constraints on the availability of administrative data. It can produce unexpected results, such as businesses being classified as employers while having zero payroll and employment.

Modeled Receipts

For most firms, administrative receipts were not available. Therefore, receipts were modeled using a combination of 2012 Economic Census data and data compiled by the Bureau of Economic Analysis (BEA). Payroll and receipts data from the 2012 Economic Census were used as the basis. From the BEA, industry-level output data were used as well as industry-level data on wages and salary.

To calculate the modeled receipts, establishment-level ratios of payroll to receipts were constructed using payroll and receipts from the 2012 Economic Census. The ratios were calculated within cells defined by one or more of the following categories: 4-digit NAICS code, state, establishment type (single-unit or multi-unit), and size of payroll. The ratios were then adjusted using the industry-level gross output and wages and salary statistics from the BEA for 2013 through 2015.

The modeled receipts were created by multiplying the establishment’s administrative payroll by the ratio calculated above. Estimates for receipts were calculated as weighted totals, using the same weights as used for all other variables.
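
The final step of the receipts model can be sketched as follows. The text describes the ratio as payroll to receipts; for the multiplication to yield receipts, this sketch assumes the ratio is applied as receipts per dollar of payroll. That reading, the function names, and all numbers are assumptions for illustration, and the BEA adjustment is omitted.

```python
def modeled_receipts(payroll, receipts_per_payroll_dollar):
    """Modeled receipts for one establishment: administrative payroll times the ratio."""
    return payroll * receipts_per_payroll_dollar

def weighted_receipts_total(establishments):
    """Weighted total of modeled receipts, using the ordinary sampling weights."""
    return sum(
        e["weight"] * modeled_receipts(e["payroll"], e["ratio"])
        for e in establishments
    )

# Hypothetical establishments with illustrative ratios and weights.
example = [
    {"weight": 1, "payroll": 200_000, "ratio": 4.0},   # certainty case
    {"weight": 5, "payroll": 50_000, "ratio": 3.0},    # sampled at 1 in 5
]
estimate = weighted_receipts_total(example)
```

In this example the first establishment contributes $800,000 and the second contributes 5 x $150,000, giving a weighted total of $1,550,000.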

Reliability of Estimates

The figures shown in these datasets are, in part, estimated from a sample and will differ from the figures that would have been obtained from a complete census. Two types of possible errors are associated with estimates based on data from sample surveys: sampling errors and nonsampling errors. The accuracy of a survey result depends not only on the sampling errors and nonsampling errors measured, but also on the nonsampling errors not explicitly measured. For particular estimates, the total error may considerably exceed the measured error. In particular, the estimates of receipts are subject to nonsampling errors, due to the model, that are likely to be much larger than the stated errors. The following is a description of the sampling and nonsampling errors associated with this tabulation.

Sampling Variability. The particular sample used for this survey is one of a large number of all possible samples of the same size that could have been selected using the same sample design. Estimates derived from the different samples would differ from each other. The relative standard error and standard error are measures of the variability among the estimates from all possible samples. The estimated relative standard errors and estimated standard errors presented in the tables estimate the sampling variability, and thus measure the precision with which an estimate from the particular sample selected for this survey approximates the average result of all possible samples. Relative standard errors and standard errors are applicable only to those published cells in which sample cases are tabulated. A relative standard error is an expression of the standard error as a percent of the quantity being estimated.

The sample estimate and an estimate of its relative standard error can be used to estimate the standard error and then construct interval estimates with a prescribed level of confidence that the interval includes the average results of all samples. To illustrate, if all possible samples were surveyed under essentially the same condition, and estimates calculated from each sample, then:

  1. Approximately 68 percent of the intervals from one standard error below the estimate to one standard error above the estimate would include the average value of all possible samples.
  2. Approximately 90 percent of the intervals from 1.6 standard errors below the estimate to 1.6 standard errors above the estimate would include the average value of all possible samples.

Thus, for a particular sample, one can say with specified confidence that the average of all possible samples is included in the constructed interval.

Example of a confidence interval. Suppose the estimate is 51,707 and the estimated relative standard error is 2 percent. The standard error is then 2 percent of 51,707 or 1,034. An approximate 90-percent confidence interval is found by first multiplying the standard error by 1.6 and then adding and subtracting that result from the estimate to obtain the upper and lower bounds. Since 1.6 x 1,034 = 1,654, the confidence interval in this example is 51,707 + or - 1,654 or the range 50,053 to 53,361.

For the Characteristics of Businesses and Characteristics of Business Owners datasets, some data are expressed as percentages with standard errors rather than relative standard errors as indicated above. Construction of the confidence interval is illustrated by the following example:

Example of a confidence interval for percentage data. Suppose the estimate is 76.9 percent and the estimated standard error is 0.4 percentage points. An approximate 90-percent confidence interval is found by first multiplying the standard error by 1.6 and then adding and subtracting that result from the estimate to obtain the upper and lower bounds. Since 1.6 x 0.4 = 0.64, the confidence interval in this example is 76.9 + or - 0.64 or the range 76.26 to 77.54 percent.
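
Both worked examples can be reproduced with a few lines of code. The function names are hypothetical, and the rounding of the standard error and margin to whole units in the first function is an assumption made to match the arithmetic shown in the first example.

```python
def ci_from_rse(estimate, rse_percent, multiplier=1.6):
    """Approximate 90-percent interval from an estimate and its relative standard error."""
    standard_error = round(estimate * rse_percent / 100)   # 2% of 51,707 -> 1,034
    margin = round(multiplier * standard_error)            # 1.6 x 1,034 -> 1,654
    return estimate - margin, estimate + margin

def ci_from_se(estimate, standard_error, multiplier=1.6):
    """Approximate 90-percent interval when the standard error is given directly."""
    margin = multiplier * standard_error
    return round(estimate - margin, 2), round(estimate + margin, 2)

count_interval = ci_from_rse(51707, 2)
percent_interval = ci_from_se(76.9, 0.4)
```

Under these assumptions, `count_interval` is (50053, 53361) and `percent_interval` is approximately (76.26, 77.54), matching the two examples.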

Nonsampling Errors. All surveys and censuses are subject to nonsampling errors. Nonsampling errors are attributable to various sources, including the inability to obtain information for all cases in the universe, imputation for missing data, data errors and biases, mistakes in recording or keying data, errors in collection or processing, and coverage problems.

While explicit measures of the effects of these nonsampling errors are not available, adjustments are made to the published relative standard errors to account for errors associated with imputation of missing data. It is believed that most of the important operational and data errors were detected and corrected through an automated data edit designed to review the data for reasonableness and consistency. Quality control techniques were used to verify that operating procedures were carried out as specified.

Unpublished Estimates. Some unpublished estimates can be derived directly from datasets by subtracting published estimates from their respective totals. However, the estimates obtained by such subtraction would be subject to poor response, high sampling variability, or other factors that may make them potentially misleading. Individuals who use estimates in datasets to create new estimates should cite the Census Bureau as the source of only the original estimates.

Treatment of Nonresponse

Approximately 64.7 percent of the 289,937 businesses in the ASE sample responded to the survey. For the 2016 ASE, 67.9 percent of businesses submitted an online questionnaire (the only response mode available), but 4.7 percent of submissions did not contain enough information to be considered a response for the estimates by gender, ethnicity, race and veteran status.
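
The three percentages above appear to be related by simple arithmetic: the share of businesses that submitted a questionnaire, reduced by the share of submissions that were unusable, gives the overall response rate. The sketch below checks this reading; the constant and function names are hypothetical.

```python
SUBMITTED_SHARE = 0.679   # businesses that submitted an online questionnaire
UNUSABLE_SHARE = 0.047    # submissions without enough information to count

def usable_response_rate(submitted_share, unusable_share):
    """Share of sampled businesses whose submission counted as a response."""
    return submitted_share * (1.0 - unusable_share)

rate = usable_response_rate(SUBMITTED_SHARE, UNUSABLE_SHARE)
rate_percent = round(rate * 100, 1)
```

The product rounds to 64.7 percent, consistent with the response rate stated for the sample.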

Of the 2016 ASE nonrespondents, approximately 52.1 percent responded to the 2012 SBO, the 2014 ASE, or the 2015 ASE, and had prior survey response data substituted for missing 2016 ASE responses to determine the gender, ethnicity, race, and veteran status of the business owner or majority business owners. The remaining nonrespondents’ gender, ethnicity, race and veteran status were imputed from donor respondents in the same sampling frame with similar characteristics (industry, legal form of organization, geography). Because the assignment of businesses to sampling frames relies heavily on administrative data, and there is a high level of agreement between sampling frame assignment and tabulated race or ethnicity for responding firms, the donor imputations are considered to be reliable. Estimates of sampling variability are adjusted to account for nonresponse. Estimates with high error (for example, relative standard error for sales or receipts of 50 percent or more) are suppressed.

Overall, imputed data accounted for approximately 17.7 percent of the firm count estimates by gender, ethnicity, race, and veteran status and approximately 24.4 percent of the estimates of sales.

Firm Size and Years in Business Categories

The firm size categories, both by receipts and employment, are based on the total nationwide receipts and/or employment of the firm.

The receipts and employment of a multi-unit firm are determined by summing the receipts and employment, respectively, of all associated establishments, and the firm's receipts size and employment size are based on those sums. The employment size group "0" includes firms for which no associated establishment reported paid employees in the mid-March pay period but which paid employees at some time during the year.

Receipts size and employment size are determined for the entire company. Hence, counterintuitive results are possible: for example, a table for a particular industry may show only 100 employees in the category of firms with 500 employees or more.

Data by receipts size of firm are presented by the following receipts size categories:

  • All firms
  • Firms with sales/receipts of less than $10,000
  • Firms with sales/receipts of $10,000 to $49,999
  • Firms with sales/receipts of $50,000 to $99,999
  • Firms with sales/receipts of $100,000 to $249,999
  • Firms with sales/receipts of $250,000 to $499,999
  • Firms with sales/receipts of $500,000 to $999,999
  • Firms with sales/receipts of $1,000,000 or more

Data by employment size of firm are presented by the following employment size categories:

  • All firms
  • Firms with no employees
  • Firms with 1 to 4 employees
  • Firms with 5 to 9 employees
  • Firms with 10 to 19 employees
  • Firms with 20 to 49 employees
  • Firms with 50 to 99 employees
  • Firms with 100 to 249 employees
  • Firms with 250 to 499 employees
  • Firms with 500 to 999 employees
  • Firms with 1,000 or more employees
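
Because size is determined for the whole firm, the category is assigned after summing across establishments. The following minimal sketch shows that assignment; the function name and example figures are hypothetical.

```python
from bisect import bisect_right

# Lower bounds of each category after "no employees", and the matching labels.
_LOWER_BOUNDS = [1, 5, 10, 20, 50, 100, 250, 500, 1000]
_LABELS = [
    "no employees", "1 to 4", "5 to 9", "10 to 19", "20 to 49",
    "50 to 99", "100 to 249", "250 to 499", "500 to 999", "1,000 or more",
]

def employment_size_category(establishment_employment):
    """Sum employment over a firm's establishments, then look up its size category."""
    firm_employment = sum(establishment_employment)
    # bisect_right counts how many lower bounds are <= the firm total.
    return _LABELS[bisect_right(_LOWER_BOUNDS, firm_employment)]

# A hypothetical firm with 100 employees in one industry and 500 in another.
category = employment_size_category([100, 500])
```

The example firm falls in the "500 to 999" category, so a table for the first industry would show its 100 employees under a firm size of 500 or more.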

Employer firms include firms with payroll at any time during 2014. Employment reflects the number of paid employees during the March 12, 2016 pay period.

ASE data are also categorized by the firm’s number of years in business, which is determined by the first year that the Census Bureau received administrative records data for the business.  Data by years in business are presented by the following years in business categories:

  • All firms
  • Firms with less than 2 years in business
  • Firms with 2 to 3 years in business
  • Firms with 4 to 5 years in business
  • Firms with 6 to 10 years in business
  • Firms with 11 to 15 years in business
  • Firms with 16 or more years in business

Disclosure

Confidentiality. In accordance with federal law governing census reports (Title 13 of the United States Code), no data are published that would disclose the operations of an individual establishment or business. However, the number of firms is not considered a disclosure. Therefore, the number of firms may be released even though other information is withheld. Techniques employed to limit disclosure are discussed at the Census Business Help Site.

The information and data obtained from the Internal Revenue Service, the Social Security Administration, and other sources are also treated as confidential and can be seen only by Census Bureau employees sworn to protect the data from disclosure.

Disclosure Avoidance. Disclosure is the release of data that have been deemed confidential. It generally reveals information about a specific individual or firm or permits deduction of sensitive information about a particular individual or establishment. Disclosure avoidance is the process used to protect the confidentiality of the survey data provided by an individual or firm. Using disclosure avoidance procedures, the Census Bureau modifies or removes the characteristics that put confidential information at risk of disclosure. Although it may appear that a table shows information about a specific individual or business, the Census Bureau has taken steps to disguise or suppress the original data while making sure the results are still useful. The techniques used by the Census Bureau to protect confidentiality in tabulations vary, depending on the type of data.

Noise Infusion. The ASE uses noise infusion as the primary method of disclosure avoidance. Noise infusion is a method of disclosure avoidance in which values are perturbed prior to tabulation by applying a random noise multiplier to the magnitude data, such as the sales and receipts for all firms. Disclosure protection is accomplished in a manner that causes the vast majority of cell values to be perturbed by, at most, a few percentage points. For sample-based tabulations, such as ASE, the estimated relative standard error for a published cell includes both the estimated sampling error and the amount of perturbation in the estimated cell value due to noise.
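
The general idea of a random noise multiplier can be sketched as follows. The actual multiplier distribution and its parameters are not given in this document; the bounds of 1 to 3 percent, the uniform draw, and the function names below are purely illustrative assumptions.

```python
import random

def noise_multiplier(rng, low=0.01, high=0.03):
    """Draw a multiplier that moves a value up or down by between low and high.

    The bounds are hypothetical; they are not the parameters used for the ASE.
    """
    magnitude = rng.uniform(low, high)
    direction = rng.choice((-1, 1))      # perturb upward or downward
    return 1.0 + direction * magnitude

def infuse_noise(values, rng):
    """Apply an independent noise multiplier to each magnitude value before tabulation."""
    return [value * noise_multiplier(rng) for value in values]

rng = random.Random(13)                  # seeded only to make the example repeatable
receipts = [250_000.0, 1_200_000.0, 75_000.0]
perturbed = infuse_noise(receipts, rng)
```

Each perturbed value in this sketch differs from the original by between 1 and 3 percent, and cell totals are then built from the perturbed values.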

In certain circumstances, some individual cells may be suppressed for additional disclosure avoidance and the data replaced by one of the following characters:

  • N - Not available or not comparable
  • S - Withheld because estimates did not meet publication standards, for example, because the relative standard error of sales and receipts is 50 percent or more
  • X - Not applicable

To provide meaningful information for cells that have suppression of sensitive employment data, these characters are used to indicate the employment range for a firm:

  • a - 0 to 19 employees
  • b - 20 to 99 employees
  • c - 100 to 249 employees
  • e - 250 to 499 employees
  • f - 500 to 999 employees
  • g - 1,000 to 2,499 employees
  • h - 2,500 to 4,999 employees
  • i - 5,000 to 9,999 employees
  • j - 10,000 to 24,999 employees
  • k - 25,000 to 49,999 employees
  • l - 50,000 to 99,999 employees
  • m - 100,000 employees or more

Receipts ranges are used for estimates at the sector, state, and MSA level. Receipts ranges are presented in the following categories:

  • B - Less than $1 million
  • I - $1 million to less than $5 million
  • K - $5 million to less than $15 million
  • L - $15 million to less than $50 million
  • M - $50 million to less than $75 million
  • O - $75 million to less than $150 million
  • R - $150 million to less than $500 million
  • T - $500 million to less than $1 billion
  • U - $1 billion to less than $5 billion
  • W - $5 billion or more
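
The employment and receipts range symbols listed above amount to two lookup tables. The sketch below encodes them; the function names are hypothetical, and the cut points are taken directly from the lists.

```python
from bisect import bisect_right

# Upper-exclusive cut points between employment ranges; note there is no "d" symbol.
_EMPLOYMENT_CUTS = [20, 100, 250, 500, 1_000, 2_500, 5_000,
                    10_000, 25_000, 50_000, 100_000]
_EMPLOYMENT_FLAGS = "abcefghijklm"

# Upper-exclusive cut points between receipts ranges, in dollars.
_RECEIPTS_CUTS = [1_000_000, 5_000_000, 15_000_000, 50_000_000, 75_000_000,
                  150_000_000, 500_000_000, 1_000_000_000, 5_000_000_000]
_RECEIPTS_FLAGS = "BIKLMORTUW"

def employment_range_flag(employees):
    """Return the symbol for the employment range containing `employees`."""
    return _EMPLOYMENT_FLAGS[bisect_right(_EMPLOYMENT_CUTS, employees)]

def receipts_range_flag(receipts):
    """Return the symbol for the receipts range containing `receipts` (in dollars)."""
    return _RECEIPTS_FLAGS[bisect_right(_RECEIPTS_CUTS, receipts)]
```

For example, `employment_range_flag(300)` returns "e" and `receipts_range_flag(2_000_000)` returns "I".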


For a complete list of all economic program symbols, see the Economic Census Data Dictionary.



Page Last Revised - October 8, 2021