Methodology

Introduction

The Business Formation Statistics (BFS) are a product of the U.S. Census Bureau developed in research collaboration with economists affiliated with the Board of Governors of the Federal Reserve System, the Federal Reserve Bank of Atlanta, the University of Maryland, and the University of Notre Dame. The current BFS is released as a research product in “beta” form. A final version is in the research and development phase.

The Business Formation Statistics (BFS) provide timely, high-frequency data on business applications and employer business formations. The BFS measure business initiation activity (Business Application Series) as indicated by applications for an Employer Identification Number (EIN) on the IRS Form SS-4. The BFS also provide information on actual and projected employer business formations (Business Formation Series) that originate from these applications, based on the record of first payroll tax liability for an EIN. In addition, the BFS contain measures of delay in business starts as indicated by the average duration between the application for an EIN and the transition to an employer business.

The BFS currently cover the period starting from the third quarter of 2004 (2004q3) onwards at a quarterly frequency. The data are available nationally and by individual states.

Business Formation Statistics Datasets

IRS Form SS-4

Uses

Understanding how business formation is related to current national and local economic conditions is a challenging task that requires accurate, timely and comprehensive high-frequency data. The BFS respond to this challenge by providing comprehensive data on business application and formation in a timely manner. The BFS can help economists, policymakers, regional planners and businesses assess the current state of early entrepreneurship at the national and state levels. The BFS uncover the trends in business applications and formations at previously unavailable levels of frequency, coverage, and timeliness. The data can be used to study a variety of issues in entrepreneurship, including, but not limited to, the high-frequency dynamics of entrepreneurial activity, the effects of business cycles on entrepreneurship, the effects of regional economic development policies on new business formation, the impact of state tax policies and regulations on business initiation, and the formation of new industrial clusters and agglomerations. A key benefit of these data is their timeliness and high frequency, which allow policymakers, analysts and researchers to better monitor the state of entrepreneurial activity in the United States.

Data Sources

The data for the BFS come from three main sources. The data on business applications are based on applications for an Employer Identification Number (EIN) through filings of IRS Form SS-4. Employer business formations originating from these business applications are identified using the Census Bureau’s Business Register (BR) and the Longitudinal Business Database (LBD), which together provide information on the timing of first payroll tax filing for a business based on tax records.  The BR is the Census Bureau’s main sampling frame for the universe of U.S. businesses and contains quarterly payroll and employment information for employer businesses. The LBD is constructed by linking annual snapshot files from the BR to provide a longitudinal history for each business establishment (see Jarmin and Miranda (2002)). Through these linkages, the LBD is able to provide information on the first-ever appearance of a business in the BR as a business with payroll or employment.

Business Register

Longitudinal Business Database

IRS Form SS-4

Coverage, Scope and Key Definitions

The BFS contain EIN applications made in the United States, including those associated with starting a new employer business. EINs are IDs used by business entities for tax purposes. All employer businesses in the United States must have an EIN to file payroll taxes (see the IRS guidance on who needs an EIN; sole proprietors with no employees can use Social Security Numbers (SSN) instead of an EIN for tax filings). Applications for new EINs are filed through IRS Form SS-4. Applicants submit information such as the name and address of the applicant and the intended business, the reason for application, the type of business entity, information on the principal activity of the business, plans to hire employees and date of initial wage payments, and the business start date. The Census Bureau uses the EIN applications to support its Business Register (BR). The BR is the enumeration list for the Economic Census and is the sampling frame for other business surveys conducted by the Census Bureau. It serves as the central storage for administrative business data at the Census Bureau and is the source of statistical products including the County Business Patterns (CBP) and the Business Dynamics Statistics (BDS). EIN applications are used to keep the BR and the associated sampling frames current.

The BFS currently cover the entire set of EIN applications transmitted to the Census Bureau starting with the period 2004q3. The data are presented at quarterly frequency.  

A number of restrictions are placed on the set of applications that are used by the BFS to generate business applications series and to model business formation. Four broad types of applications are omitted from the analysis based on type of entity, industry, geography, and the observed concentration of applications from a specific source. With regard to type of entity, three groups are removed from the data: applications associated with tax liens, trusts, and estates. These applications are generally not associated with business formation and their presence in the data varies over time. Applications from a set of detailed industries within the agricultural, financial services and private household sectors are also excluded. Applications from these specific industries have very low transition rates to employer businesses. Applications by public entities (e.g., state or local governments) are also not included. The analysis also omits applications with missing state information (a small fraction of applications) and applications made from outside the 50 states or the District of Columbia, such as Puerto Rico or the Virgin Islands. Finally, applications are also excluded if they are part of concentrated filing spikes. A concentrated filing spike is defined as a group of EIN applications that appear in the same weekly application cycle, come from the same zip code, and share the same industry code. These filings are mainly related to some type of financial filing and do not represent an intent to form a business.
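The concentrated-filing-spike rule can be read as a grouping operation, sketched below. The field names (`week`, `zip_code`, `naics`) and the `min_group_size` threshold are hypothetical: the text does not say how large a group must be before it counts as a spike, so the threshold here is purely illustrative.

```python
from collections import Counter


def drop_filing_spikes(applications, min_group_size=50):
    """Drop applications that belong to a concentrated filing spike.

    A group is every application sharing the same weekly application
    cycle, zip code, and industry code. `min_group_size` is an assumed
    cutoff, not a value taken from the BFS methodology.
    """
    def key(app):
        return (app["week"], app["zip_code"], app["naics"])

    group_sizes = Counter(key(app) for app in applications)
    return [app for app in applications if group_sizes[key(app)] < min_group_size]


# Toy data: three filings in one (week, zip, industry) cell, one ordinary filing.
apps = [
    {"id": 1, "week": "2010-W05", "zip_code": "19801", "naics": "525990"},
    {"id": 2, "week": "2010-W05", "zip_code": "19801", "naics": "525990"},
    {"id": 3, "week": "2010-W05", "zip_code": "19801", "naics": "525990"},
    {"id": 4, "week": "2010-W05", "zip_code": "30303", "naics": "722511"},
]
kept = drop_filing_spikes(apps, min_group_size=3)
print([app["id"] for app in kept])  # prints [4]
```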

The resulting business applications are matched to the set of businesses in the BR that are identified as new employer businesses in the Longitudinal Business Database. The match to the BR reveals which applications become employer businesses and the quarter in which they begin to pay employees. Currently, the applications are matched to the employer business universe only, though many applications for new businesses may end up as non-employer businesses. In addition, the BFS emphasize new business formations, so EIN applications from existing business entities due to changes in legal form, reorganizations or expansions are not included in the match.

It is important to note that, while comprehensive, EIN applications may leave out some business initiations in the form of sole proprietorships with no employees. These business initiations represent many types of entrepreneurship in the form of independent contractors who rely on the entrepreneur's Social Security Number (SSN) for tax purposes instead of an EIN.

For a detailed description of various business application and formation series, see Business Application Series and Business Formation Series.

Business Application Series

Business Formation Series

IRS Guidance on EINs

Concepts and Methodology

The information submitted by applicants in the IRS Form SS-4 for an EIN application is used to model employer business formation for the U.S. economy as a whole and for individual states. Let $N_{gt}$ be the number of new applications in a geographic region $g$ (e.g., a state or the entire U.S.) in quarter $t$. The total number of business formations that occur between quarters $t$ and $t + k$ from these applications is then given by

$$S_{g,t+k} = \sum_{i=1}^{N_{gt}} I_{ig,t+k},$$

where $I_{ig,t+k}$ is a realization of a Bernoulli random variable that governs whether application $i$ turns into an employer business by the end of quarter $t + k$. The probability distribution function for $I_{ig,t+k}$ is given by

$$\Pr(I_{ig,t+k} = 1) = P_{ig,t+k}, \qquad \Pr(I_{ig,t+k} = 0) = 1 - P_{ig,t+k},$$

where $P_{ig,t+k}$ is the probability that application $i$ (filed in region $g$ in quarter $t$) turns into an employer business between quarters $t$ and $t + k$. Then, by linearity of expectation, the expected number of business formations can be written as

$$E[S_{g,t+k}] = \sum_{i=1}^{N_{gt}} P_{ig,t+k}.$$

To estimate $E[S_{g,t+k}]$, an estimate of $P_{ig,t+k}$ is needed. Towards that goal, one can model $I_{ig,t+k}$ as a function of application-level variables, $Z_{igt}$, provided as part of an EIN application in the IRS Form SS-4 and a set of unknown parameters, $\beta_{gt}$. Using a Linear Probability Model (LPM), the probability of an application transitioning to an employer can then be estimated as

$$\hat{P}_{ig,t+k} = F(Z_{igt}; \hat{\beta}_{gt}),$$

where $F$ is a linear function, and $\hat{\beta}_{gt}$ is an estimate of the unknown parameters, $\beta_{gt}$, based on the LPM. The predicted application-level probabilities, $\hat{P}_{ig,t+k}$, can be used to construct an estimate of the expected number of business formations, $E[S_{g,t+k}]$, as

$$\hat{E}[S_{g,t+k}] = \sum_{i=1}^{N_{gt}} \hat{P}_{ig,t+k}.$$

This approach amounts to reweighting each application by the predicted probability that the application becomes an employer business between quarters t and t + k. In the analysis, k is set to either four or eight, corresponding to four and eight quarters, respectively. The four- and eight-quarter windows were chosen to allow a long enough time for an application to become an employer business and to cover a majority of transitions to employer business. These choices prevent a significant loss of information due to right censoring, which arises because some applications transition only after the four- or eight-quarter window. The estimated expected numbers of business formations are used to generate the series Projected Business Formations within 4 Quarters (PBF4Q) and Projected Business Formations within 8 Quarters (PBF8Q), as described in the Business Formation Series section below. For further details on the estimation methodology, see Bayard, Dinlersoz, Dunne, Haltiwanger, Miranda, and Stevens (2018).
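The reweighting idea can be illustrated with a small, self-contained sketch: fit a linear probability model by ordinary least squares on a cohort whose outcomes are already observed, predict a probability for each new application, and sum the predictions. This is not the Census Bureau's implementation; the single regressor (a planned-wages indicator), the toy data, and the function names are all assumptions made for illustration. Note that an LPM can predict values outside [0, 1]; the sketch does not clip them.

```python
def fit_lpm(X, y):
    """OLS estimate of beta in y = X beta + e, via the normal equations.

    Each row of X must include a leading 1.0 for the intercept.
    """
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(p):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][p] for i in range(p)]


def expected_formations(beta, Z):
    """Sum of predicted transition probabilities over a cohort of applications."""
    return sum(sum(b * z for b, z in zip(beta, row)) for row in Z)


# Historical cohort: columns are [intercept, planned_wages]; y = 1 if the
# application became an employer business within the window (toy values).
X_hist = [[1.0, 1.0]] * 4 + [[1.0, 0.0]] * 4
y_hist = [1, 1, 1, 0, 0, 0, 0, 1]
beta_hat = fit_lpm(X_hist, y_hist)

# New cohort with outcomes not yet observed: two with planned wages, two without.
Z_new = [[1.0, 1.0], [1.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
projected = expected_formations(beta_hat, Z_new)
print(round(projected, 6))  # prints 2.0
```

With a single binary regressor the LPM reproduces the observed transition rate in each group (0.75 and 0.25 here), so the projection is simply 2 × 0.75 + 2 × 0.25.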

Comparability with Other Data

The Business Dynamics Statistics (BDS) program of the Census Bureau also provides information on new employer businesses at annual frequency. However, there are some key differences in how the BDS and BFS account for new business formation. First, the BDS use employment rather than payroll to identify new businesses. Employment in the BDS is a point-in-time measure. The BDS capture employment as of the payroll week covered by March 12 of the year. The BFS, by contrast, use the presence of payroll as a measure of business formation activity. In addition, the BFS are based on a quarterly measure of payroll within each year. The quarterly frequency leads to timing differences with respect to the BDS in the identification of business startups that hired their first employee after the payroll week of March 12. Second, because of left censoring in the business applications, the BFS do not account for employer business formations that originate from EIN applications dated before 2004q3. This effect, however, dissipates toward the end of the sample period, as nearly all business formations eventually tend to arise from business applications made since 2004q3. For these reasons, the BDS annual count of new employer businesses does not exactly match the corresponding count in the BFS, but the two track each other closely.

Business Dynamics Statistics

Reliability of the Data

Because the BFS are constructed using a combination of administrative data, rather than a probability sample, sampling error does not apply to the BFS. Non-sampling error, however, still exists. Non-sampling errors can occur for many reasons, such as employers submitting corrected payroll or employment data after the end of the year, or filing late. Other sources of error include typographical errors made by businesses when providing information on the survey or administrative forms. Such errors, however, are likely to be distributed randomly throughout the dataset.

There is also projection error in the projected number of business formations based on the econometric models. The models perform well in terms of prediction error within the estimation sample and in out-of-sample projection exercises (see Bayard, Dinlersoz, Dunne, Haltiwanger, Miranda and Stevens (2017)).  It is possible to provide measures of error and confidence bands for the projected number of business formations, and such measures will be considered for future versions of the BFS.  

Changes in administrative data can also sometimes create complications in identifying business startups with payroll. The Longitudinal Business Database (LBD) addresses these issues in detail in order to avoid overstating business openings (Jarmin and Miranda (2002)). The BFS are subject to periodic changes based on corrections to the LBD due to updates coming from the new BR files. Such changes are reflected in the actual and projected business formation series on an annual basis, once the BFS are revised based on the updated LBD-based firm birth information. There are also some changes in the content of the IRS Form SS-4 over time, and new information in the form is incorporated into the analysis as it becomes available.

Business Application Series

These series describe the business applications for tax IDs as indicated by applications for an Employer Identification Number (EIN) through filings of IRS Form SS-4. Business applications are presented in four different series reflecting different subsets of the applications for an EIN.  All business applications series cover the period from 2004q3 onwards.

  • Business Applications (BA): The core business applications series, which corresponds to a subset of all applications for an EIN. Includes all applications for an EIN, except for applications for tax liens, estates, trusts, or certain financial filings, applications outside the 50 states and DC or with no state‐county geocodes, applications with a NAICS sector code of 11 (agriculture, forestry, fishing and hunting) or 92 (public administration), and applications in certain industries (e.g. private households, civic and social organizations).
  • High-Propensity Business Applications (HBA): Business Applications (BA) that have a high propensity of turning into businesses with payroll. The identification of high-propensity applications is based on the characteristics of applications revealed on the IRS Form SS-4 that are associated with a high rate of business formation. High-propensity applications include applications: (a) for a corporate entity; (b) that indicate they are hiring employees, purchasing a business or changing organizational type; (c) that provide a first wages-paid date (planned wages); or (d) that have a NAICS industry code in manufacturing (31-33), retail stores (44), health care (62), or restaurants/food service (72).
  • Business Applications with Planned Wages (WBA): High-Propensity Business Applications (HBA) that indicate a first wages‐paid date on the IRS Form SS-4. The indication of a wages-paid date is associated with a high likelihood of transitioning into a business with payroll.
  • Business Applications from Corporations (CBA): High-Propensity Business Applications (HBA) from a corporation or personal service corporation, based on the legal form of organization stated in the IRS Form SS-4. Similar to the WBA series, this series is important primarily because it consists of a set of applications that have a high rate of transitioning into businesses with payroll.
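The nesting of the four series can be made concrete with a small classifier. The sketch below assumes an application has already passed the BA exclusions and uses hypothetical field names and string values (`entity_type`, `reason`, `first_wages_date`, `naics`); the actual coding of the Form SS-4 fields is not described here.

```python
# Two-digit NAICS prefixes listed in criterion (d) of the HBA definition.
HIGH_PROPENSITY_NAICS = {"31", "32", "33", "44", "62", "72"}
# Illustrative labels for criterion (b); the real SS-4 coding may differ.
HIGH_PROPENSITY_REASONS = {
    "hired employees",
    "purchased business",
    "changed organization type",
}
CORPORATE_TYPES = {"corporation", "personal service corporation"}


def classify_application(app):
    """Return the set of series (BA, HBA, WBA, CBA) an application counts toward."""
    labels = {"BA"}
    is_corporation = app["entity_type"] in CORPORATE_TYPES
    has_planned_wages = app.get("first_wages_date") is not None
    if (is_corporation
            or has_planned_wages
            or app["reason"] in HIGH_PROPENSITY_REASONS
            or app["naics"][:2] in HIGH_PROPENSITY_NAICS):
        labels.add("HBA")
    # WBA and CBA are subsets of HBA by construction: each of their
    # defining conditions is also one of the HBA criteria.
    if has_planned_wages:
        labels.add("WBA")
    if is_corporation:
        labels.add("CBA")
    return labels


law_office = {"entity_type": "llc", "reason": "started new business",
              "first_wages_date": None, "naics": "541110"}
restaurant = {"entity_type": "llc", "reason": "started new business",
              "first_wages_date": None, "naics": "722511"}
staffed_corp = {"entity_type": "corporation", "reason": "hired employees",
                "first_wages_date": "2020-03-01", "naics": "541110"}

print(sorted(classify_application(law_office)))    # prints ['BA']
print(sorted(classify_application(restaurant)))    # prints ['BA', 'HBA']
print(sorted(classify_application(staffed_corp)))  # prints ['BA', 'CBA', 'HBA', 'WBA']
```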

The following is a Venn diagram of the relationship between the four business applications series (BA, HBA, WBA, CBA) and EIN applications.

[Figure: The Relationship Between Different Business Applications Series]

Business Formation Series

These series describe employer business formations as indicated by the first instance of payroll tax liabilities for the corresponding business applications. The business formation series are forward-looking in the sense that they measure new business formations from the time of business application in any given quarter. Two series are provided: the first describes transitions within the next four quarters, and the second within the next eight quarters. All business formation series start in 2004q3, the earliest quarter for which the data on business applications are available.

  • Business Formations within 4 Quarters (BF4Q): This series provides the number of employer businesses that originate from Business Applications (BA) within four quarters from the quarter of application. By definition, the end-point of this series is determined by the most recent quarter for which the administrative data on payroll is available.
  • Projected Business Formations within 4 Quarters (PBF4Q): The projected number of employer businesses that originate from Business Applications (BA) within four quarters from the quarter of application.  The projections are based on an econometric model that generates estimates of the likelihood that a business application turns into an employer business. For the details of the model, see the working paper. The projected business formation series cover the period for which the actual number of quarterly business formations is not yet available. Combining the projected series with the actual business formations (the BF4Q series) results in an up-to-date, forward-looking business formation series.
  • Spliced Business Formations within 4 Quarters (SBF4Q): This series combines (splices) BF4Q and PBF4Q to provide the entire time series for the actual and projected business formations within 4 quarters.
  • Business Formations within 8 Quarters (BF8Q):  The number of employer businesses that originate from Business Applications (BA) within eight quarters from the quarter of application, similar to the BF4Q series. Again, the end-point of this series is determined by the most recent quarter for which the administrative data on payroll is available.
  • Projected Business Formations within 8 Quarters (PBF8Q): The projected number of employer businesses that originate from Business Applications (BA) within eight quarters from the quarter of application, similar to the PBF4Q series. The projected business formation series cover the period for which the actual business formations are not yet available.
  • Spliced Business Formations within 8 Quarters (SBF8Q): This series combines (splices) BF8Q and PBF8Q to provide the entire time series for the actual and projected business formations within 8 quarters.
  • Average Duration (in Quarters) from Business Application to Formation within 4 Quarters (DUR4Q): A measure of delay between business application and formation, measured as the average duration (in quarters) between the quarter of business application and the quarter of business formation, conditional on business formation within four quarters. This series spans the same period as BF4Q.
  • Average Duration (in Quarters) from Business Application to Formation within 8 Quarters (DUR8Q): A measure of delay between business application and formation, similar to the DUR4Q series. The difference is that the window for business formation is restricted to eight quarters, rather than four. This series spans the same period as BF8Q.
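Two of the constructions in the list above, the spliced series and the conditional average duration, reduce to a few lines of code. The sketch below uses made-up numbers and hypothetical function names. It also assumes that a duration of 0 means formation in the quarter of application and that "within four quarters" means a duration of at most 4; the text does not spell out either convention.

```python
def splice(actual, projected):
    """Combine actual (BF) and projected (PBF) values into one spliced (SBF) series.

    Both arguments map quarter labels such as "2019q4" to counts. Actual
    values take precedence wherever they exist.
    """
    quarters = sorted(set(actual) | set(projected))
    return {q: actual[q] if q in actual else projected[q] for q in quarters}


def average_duration(durations, window):
    """Mean quarters from application to formation, conditional on formation
    within `window` quarters. `None` marks applications with no formation observed.
    """
    within = [d for d in durations if d is not None and d <= window]
    return sum(within) / len(within) if within else None


bf4q = {"2019q3": 310, "2019q4": 295}      # toy actual counts
pbf4q = {"2020q1": 300.5, "2020q2": 280.2}  # toy projected counts
sbf4q = splice(bf4q, pbf4q)
print(list(sbf4q))  # prints ['2019q3', '2019q4', '2020q1', '2020q2']

cohort = [0, 1, 3, 6, None]  # quarters to first payroll for five applications
print(average_duration(cohort, window=8))  # prints 2.5
```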

The following is a graphical representation of the relationship between business application and formation series.

[Figure: The Relationship Between Business Application and Business Formation Series]

Seasonal Adjustment

Because of strong seasonality detected in most of the business application and formation series, all series are provided with and without seasonal adjustment. In the case of the duration series (DUR4Q and DUR8Q), seasonality is not significant in general. Therefore, no seasonally adjusted duration series are provided. Seasonal adjustment is performed using the X-13ARIMA-SEATS seasonal adjustment program of the U.S. Census Bureau. Users can implement their own seasonal adjustment methods using the unadjusted data.
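As an illustration of what a user-implemented adjustment on the unadjusted data might look like, the sketch below applies a simple ratio-to-annual-mean method. This is not X-13ARIMA-SEATS and is far cruder: it assumes stable multiplicative seasonality, ignores within-year trend, and requires the series to consist of whole four-quarter blocks. Function names and numbers are illustrative only.

```python
def seasonal_factors(values, period=4):
    """Multiplicative seasonal factors: for each position in the cycle, the
    average ratio of the observation to the mean of its own block of
    `period` observations. len(values) must be a multiple of `period`.
    """
    if len(values) % period != 0:
        raise ValueError("series must contain whole cycles")
    blocks = [values[i:i + period] for i in range(0, len(values), period)]
    ratios = [[v / (sum(block) / period) for v in block] for block in blocks]
    return [sum(r[q] for r in ratios) / len(ratios) for q in range(period)]


def seasonally_adjust(values, period=4):
    """Divide each observation by the seasonal factor for its position."""
    factors = seasonal_factors(values, period)
    return [v / factors[i % period] for i, v in enumerate(values)]


# Two toy years of quarterly counts with the same seasonal shape.
unadjusted = [120, 100, 90, 90, 240, 200, 180, 180]
adjusted = seasonally_adjust(unadjusted)
print([round(v, 6) for v in adjusted])
```

In this toy example the seasonal pattern is identical in both years, so the adjusted series is flat within each year (about 100, then about 200).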

Page Last Revised - October 8, 2021