Methodology

2018

Survey Design

For purposes of this document, the following definitions are provided:

Building—a separate physical structure identified by the respondent containing one or more units.
Property—one or more buildings owned by a single entity (person, group, leasing company, and so on). For example, an apartment complex may have several buildings but they are owned as one property.

Target population: All rental housing properties in the United States, circa 2017.

Sampling frame: The Rental Housing Finance Survey (RHFS) sample frame is a single frame based on a subset of the 2017 American Housing Survey (AHS) sample units. The RHFS frame included all 2017 AHS sample units that were identified as:

rented or occupied without payment of rent
owner occupied and listed as “for sale or rent”

Two categories of vacant AHS units were inadvertently excluded from the RHFS sample frame: “for rent” and “rented but not yet occupied”. An internal analysis determined that the distribution of the RHFS sample frame, by region and by size (number of units), was not significantly different from that of an alternative frame that included the vacant units. Characteristics such as year built and metro status were also found to be similar between the two frames. The estimation procedure includes a population adjustment to the total number of rental units that accounts for vacant units.

By design, the RHFS sample frame excluded public housing and transient housing types (i.e., boat, RV, van, other). Public housing units are identified in the AHS through a match with Department of Housing and Urban Development (HUD) administrative records.

The RHFS frame is derived from the AHS sample, which is itself composed of housing units derived from the Census Bureau Master Address File. The AHS sample frame excludes group quarters housing. Group quarters are places where people live or stay in a group living arrangement. Examples include dormitories, residential treatment centers, skilled nursing facilities, correctional facilities, military barracks, group homes, and maritime or military vessels. As such, all of these types of group quarters housing facilities are, by design, excluded from the RHFS.

In some cases, nursing homes are co-located with other non-group quarters housing unit types, such as “assisted living” or “independent living” housing units. Since these units are “in scope” in the AHS sample frame, they are deemed “in scope” for the RHFS sample frame. However, it can be difficult to separate the group quarters units from the housing units (i.e., non-group quarters) when reporting total units for the RHFS property. Census Bureau field representatives do their best to ensure the property information collected in the RHFS reflects only the housing units portion of the property. Moreover, due to the often complicated and unique financial structure of these types of properties, it makes little sense to attempt to administer the RHFS questionnaire to them. Properties identified as assisted living facilities are counted in the RHFS for purposes of total rental units and are flagged as “assisted living”, but little information is collected about the property.

Finally, the 2018 RHFS sample was selected from the 2017 AHS but surveyed in 2018. The 2017 AHS sample was selected in late calendar year 2016 and includes new housing units built and ready for occupancy as of late 2016. As such, the RHFS sample does not include any rental properties built between late 2016 and 2018. The total number of rental units in properties with two or more units built in 2017 was 336,000[1].

Sampling unit: Buildings with at least one unit that is either rented or vacant for rent.

Sample design: The 2018 RHFS was sampled from all rental units as identified in the 2017 AHS sample. AHS cases were stratified based on building size. Building size was based on the number of units in the building as reported by the AHS respondent. Cases whose AHS respondents reported a single-family attached or detached structure were placed in the single-unit building stratum. Multiunit buildings were further categorized into four pre-defined strata based on the number of units (2-4, 5-24, 25-49, 50 or more).

Within each stratum, eligible buildings were sorted by geographic variables, including census region, state, urban/rural status, county, and zip code to obtain a stratified systematic sample. The within-stratum sampling rates were determined to result in an expected coefficient of variation (CV)[2] of 10 percent for the aggregated stratum estimates at the national level.
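The systematic selection within a sorted stratum can be sketched in Python as follows. This is a minimal illustration and not the Census Bureau's production procedure: the record layout, the fractional-interval method, and the function name systematic_sample are assumptions made for the example.

```python
import random

def systematic_sample(sorted_frame, n, rng=None):
    """Draw n units from an already sorted list with a random start and a fixed interval."""
    rng = rng or random.Random()
    size = len(sorted_frame)
    if n >= size:
        # Target meets or exceeds the frame: every unit is taken with certainty.
        return list(sorted_frame)
    interval = size / n                # fractional sampling interval (greater than 1)
    start = rng.uniform(0, interval)   # random start within the first interval
    picks = []
    for k in range(n):
        index = min(int(start + k * interval), size - 1)
        picks.append(sorted_frame[index])
    return picks

# Hypothetical stratum frame: (region, state, urban/rural, county, ZIP code, building id).
frame = [
    (2, "OH", "urban", "049", "43215", "B3"),
    (1, "NY", "urban", "061", "10027", "B1"),
    (3, "TX", "rural", "171", "78624", "B5"),
    (1, "PA", "rural", "027", "16801", "B2"),
    (2, "OH", "urban", "049", "43201", "B4"),
    (4, "CA", "urban", "037", "90012", "B6"),
]
frame.sort()  # sort by the geographic keys before sampling
sample = systematic_sample(frame, 3, random.Random(2018))
```

Sorting before selection spreads the sample across the geographic sort keys, which is the usual reason for pairing a sort with systematic selection.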

Table 1 below shows the frame size and the final sample sizes by stratum. The sample size incorporates an oversample of buildings that was based on the results from the 2015 iteration of this survey. This oversample was needed for three reasons: a high nonresponse rate, a large number of ineligible properties, and a large number of sampled units that changed strata after being selected from the frame. Table 1 also displays the number of completed interviews in both the original and final stratum. Units in the 5-24 units and 25-49 units strata were selected with certainty because the target sample size exceeded the number of units in the frame for these strata. The owners and/or property managers of the sampled buildings were contacted and asked about specific financing and property-related characteristics.

Table 1. 2018 RHFS Sample Size by Stratum

Stratum	Frame Size	Sample Size	Completed Interviews by Original Stratum	Completed Interviews By Final Stratum
1 unit	9,977	2,000	746	600
2-4 units	3,262	1,500	623	419
5-24 units	5,782	5,782	2,217	678
25-49 units	1,056	1,056	357	370
50+ units	2,543	1,000	388	2,184
Total Sample	22,620	11,338	4,331	4,331

Frequency of sample redesign: The RHFS sample is reselected approximately every three years.

Sample maintenance: There are no sample maintenance procedures since the sample is selected from a new frame each iteration.

Data Collection

Data items requested and reference period covered: RHFS collects data on the financial, managerial, and physical characteristics of rental housing properties nationwide. The reference period of the survey was all twelve months of 2017.

Key data items: Key data items for RHFS are the definition of the property and the presence of a mortgage.

Type of request: Voluntary

Frequency and mode of contact: The 2018 RHFS included single-family residential and multifamily residential properties with at least one housing unit rented or intended for rent. Data were collected from May 2018 through September 2018. Data collection was conducted in two phases. During Phase 1, cases with contact information were first invited to self-respond through a website that provided access to an online questionnaire. Also during Phase 1, field staff began searching for property owners and managers and conducting interviews for cases that did not include owner contact information. During Phase 2, field staff began working the cases that did not self-respond during Phase 1 while continuing work on the original Phase 1 cases.

Data collection unit: Data were collected from owners, managers, or knowledgeable agents of rental housing properties.

Special Procedures: There are no special procedures for the survey.

Compilation of Data

Editing: Respondent data were reviewed for consistency across related items.

Nonresponse: Nonresponse is defined as the inability to obtain requested data from an eligible survey unit. Two types of nonresponse are often distinguished. Unit nonresponse is the inability to obtain any of the substantive measurements about a unit. In most cases of unit nonresponse, the Census Bureau was unable to obtain any information from the survey unit after several attempts to elicit a response. Item nonresponse occurs when the response to a question is either missing or unusable.

Nonresponse adjustment and imputation: A nonresponse adjustment factor, which is the ratio of the sample properties to the interviewed sample properties, is calculated and assigned to the interviewed sample properties. Separate factors are computed for each combination of region and sampling stratum.

For details on the nonresponse adjustment factor, see the Estimation section.

Other macro-level adjustments: Weights of the sampled rental units are adjusted to create rental housing properties based on the respondents’ answers to a specific set of questions. For details on the weighting adjustments, see the Estimation section.

Tabulation unit: Rental housing properties and units within rental housing properties.

Estimation: Estimates of total rental housing properties were calculated using the final sample weights which include sampling, nonresponse, and population adjustments.

This final weight is the product of the following components:

Basic Weight (SMPWGT)
Sampling Adjustment Factor (GWGT)
Nonresponse Adjustment Factor (NR_ADJ)
Total Rental Units Control Factor, based on 2017 AHS (PS_ADJ)

The factors are successively multiplied by the sample weight to obtain the final weight. The completed responses receive a final weight greater than 0, and each ineligible sample unit receives a weight of 0.
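As a minimal sketch of how the four components combine, the function below multiplies them and assigns a weight of 0 to ineligible units. The function name, the ineligible flag, and the component values are hypothetical and are used only for illustration.

```python
def final_weight(smpwgt, gwgt, nr_adj, ps_adj, ineligible=False):
    """Return the final weight as the product of the four components."""
    if ineligible:
        return 0.0  # ineligible sample units receive a weight of 0
    return smpwgt * gwgt * nr_adj * ps_adj

# Hypothetical component values for one completed response.
weight = final_weight(smpwgt=2500.0, gwgt=0.6, nr_adj=1.5, ps_adj=1.1)
```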

Basic Weight (SMPWGT)

The basic weight is the inverse of the initial probability of selection of the address. The sample address was selected from rental units identified in the 2017 AHS sample. The sample weight (SMPWGT) is the product of the first stage weight (FWGT) and the second stage weight (SWGT) as described below:

First Stage Weight (FWGT)

The first stage (AHS sample rental units) of the RHFS design is the same as the 2015 AHS. In order to account for this stage in the sample, the AHS sample weight is used in the calculation of the basic weight for the sample. The first stage weight for the sample AHS cases is the AHS basic weight provided at the time of RHFS sample selection.

Second Stage Weight (SWGT)

The second stage weight (SWGT) of the RHFS sample design accounts for the within-stratum sampling across the five building size strata shown in Table 1. It is the ratio of the number of units in each stratum to the number of units sampled from that stratum.

The sample weight is calculated as: SMPWGT = FWGT * SWGT
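The second stage weight can be illustrated with the frame and sample sizes from Table 1. The sketch below is an illustration only; the function names and the first stage weight of 1,000 are hypothetical.

```python
# Frame size and sample size by stratum, copied from Table 1.
TABLE_1 = {
    "1 unit": (9977, 2000),
    "2-4 units": (3262, 1500),
    "5-24 units": (5782, 5782),
    "25-49 units": (1056, 1056),
    "50+ units": (2543, 1000),
}

def second_stage_weight(stratum):
    """SWGT: units in the stratum frame divided by units sampled from it."""
    frame_size, sample_size = TABLE_1[stratum]
    return frame_size / sample_size

def sample_weight(fwgt, stratum):
    """SMPWGT = FWGT * SWGT."""
    return fwgt * second_stage_weight(stratum)

# Hypothetical AHS basic weight (FWGT) of 1,000 for a case in the 50+ units stratum.
smpwgt = sample_weight(1000.0, "50+ units")
```

The two certainty strata (5-24 units and 25-49 units) have a second stage weight of exactly 1, so their sample weight equals the first stage weight.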

Sampling Adjustment Factor (GWGT)

The sampling adjustment factor adjusts the sample weights so that the final weight is representative of the complete rental property instead of the originally selected sample AHS_ID, where AHS_ID refers to an AHS sample case. Calculating the correct probability of selection and subsequent weight requires the assumption that each part of the property is selected independently. The probability of selection for the property is calculated as follows:

π_property = 1 - [(1 - π_1)(1 - π_2)...(1 - π_n)]

Where:

π_property = calculated probability of selection for the total property

π_i = probability of selection for AHS_ID_i of the property

1 - π_i = probability of not selecting AHS_ID_i

n = number of AHS_IDs that make up the property

Thus,

Property Weight = 1/ π_property

For example, AHS_ID₁ was originally selected for interview. However, the property is made up of 3 more AHS_IDs (AHS_ID₂, AHS_ID₃, AHS_ID₄), and each has an associated probability of selection (π₁ = 0.5, π₂ = 0.5, π₃ = 0.26, π₄ = 0.36). Using the above formula gives π_property = 1-[(1-0.5)(1-0.5)(1-0.26)(1-0.36)] = 0.8816 and a new weight of 1/0.8816 = 1.13. A property cannot have a π_property greater than 1.

This approach is based on this concept: P(A or B or C or D selected) = 1 - P(none of A, B, C, or D is selected)
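The worked example can be reproduced with a short Python sketch. The function name is hypothetical; the probabilities are the ones given in the example, and independence of the selections is assumed, as stated in the text.

```python
from math import prod

def property_selection_probability(pis):
    """P(at least one AHS_ID selected) = 1 - product of (1 - pi_i), assuming independence."""
    return 1.0 - prod(1.0 - p for p in pis)

pis = [0.5, 0.5, 0.26, 0.36]
pi_property = property_selection_probability(pis)
property_weight = 1.0 / pi_property

print(round(pi_property, 4), round(property_weight, 2))  # 0.8816 1.13
```

Because every factor (1 - π_i) lies between 0 and 1, the result can never exceed 1, and adding more AHS_IDs to a property can only raise its probability of selection.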

The adjustment factor is then calculated as the ratio of the property weight to the sample weight:

GWGT = (1 / π_property) / SMPWGT

When this factor is applied to the SMPWGT we get the property weight (PROPWGT = SMPWGT * GWGT). This factor was calculated by looking at the respondent’s property definition, which includes the number of units per building.

Nonresponse Adjustment Factor (NR_ADJ)

The purpose of the nonresponse adjustment is to inflate the weights of responses to account for eligible nonresponses. The calculations to compute the nonresponse adjustment are as follows:

Step 1. Assign both the complete response and the nonresponse cases to the appropriate cells as shown using region and sampling strata information.

2018 RHFS Nonresponse Adjustment Cell Assignment

	BUILDSTRAT=00 (1 Unit)	BUILDSTRAT=01 (2-4 Units)	BUILDSTRAT=02 (5-24 Units)	BUILDSTRAT=03 (25-49 Units)	BUILDSTRAT=04 (50+ Units)
Region 1
Region 2
Region 3
Region 4

Step 2. For each of the cells above, obtain the totals shown below:

Weighted count of completed responses (WC)

Weighted count of nonresponses (WNR)

Step 3. Calculate the nonresponse adjustment for each cell i as:

NR_ADJ_i = (WC_i + WNR_i) / WC_i

Step 4. Apply the calculated nonresponse adjustment factor to all complete responses in the appropriate cell. This results in a non-response adjusted property weight (NR_PROPWGT).

NR_ADJ_i * PROPWGT = NR_PROPWGT
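Steps 1 through 4 can be sketched in Python as follows. This is a minimal sketch that assumes the adjustment is the weighted count of completed responses plus nonresponses divided by the weighted count of completed responses, consistent with the description of the factor as inflating the weights of responses. The field names and case values are hypothetical.

```python
from collections import defaultdict

def nonresponse_factors(cases):
    """Compute NR_ADJ = (WC + WNR) / WC for each region-by-stratum cell."""
    wc = defaultdict(float)   # weighted count of completed responses
    wnr = defaultdict(float)  # weighted count of eligible nonresponses
    for case in cases:
        cell = (case["region"], case["buildstrat"])
        if case["status"] == "complete":
            wc[cell] += case["propwgt"]
        elif case["status"] == "nonresponse":
            wnr[cell] += case["propwgt"]
    # Cells without any completed response are skipped here; how such cells
    # are handled in production is not described in the text.
    return {cell: (wc[cell] + wnr[cell]) / wc[cell] for cell in wc}

# Hypothetical cases in one cell (Region 1, BUILDSTRAT=00).
cases = [
    {"region": 1, "buildstrat": "00", "status": "complete", "propwgt": 100.0},
    {"region": 1, "buildstrat": "00", "status": "complete", "propwgt": 50.0},
    {"region": 1, "buildstrat": "00", "status": "nonresponse", "propwgt": 50.0},
]
factors = nonresponse_factors(cases)

# Step 4: NR_PROPWGT = NR_ADJ * PROPWGT for each completed response.
nr_propwgt = [
    factors[(c["region"], c["buildstrat"])] * c["propwgt"]
    for c in cases
    if c["status"] == "complete"
]
```

After the adjustment, the completed responses in a cell carry the combined weight of the completed responses and the nonresponses in that cell.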

Total Rental Units Control Factor (PS_ADJ)

The 2018 RHFS sample was selected from the 2017 AHS rental units. HUD and the Census Bureau deemed it desirable to ensure consistency between RHFS and AHS rental unit estimates. To carry this out, a Total Rental Units Control Factor, otherwise known as AHS control totals, was applied to the 2018 RHFS rental unit estimates.

The application of the Total Rental Units Control Factor proceeded in four steps. In the first step, estimates of “AHS rental units by building size” were derived from the 2017 AHS. This included occupied rental units, vacant rented and vacant for rent units. The estimates excluded public housing and transient housing. Table 2 shows the weighted number of AHS Units that are in scope for RHFS.

Table 2. AHS Unit Control Totals

Number of AHS units in building	AHS Units in RHFS Scope[3] (weighted) (in thousands)
1 unit	20,068
2 to 4 units	7,812
5 to 24 units	13,075
25 to 49 units	2,320
50+ units	4,973
Total	48,248

In step 2, we created an original-to-final stratum matrix. Recall that to stratify the AHS rental units for the RHFS, the AHS building size (structure type) variable was used. However, this “original” stratification variable reflected “units in a building,” not “units in a property.” A property in RHFS is defined as all units owned under a single mortgage, which may, and often does, include more than one building. For example, an AHS unit in a 10-unit building could be part of a larger property with five 10-unit buildings, meaning the property has 50 units. Similarly, a single unit may actually be part of a multifamily property. Because of the “buildings may not equal properties” issue, it was necessary to create an original-to-final stratification matrix summarizing the share of units that were originally stratified in one category (e.g., 2-4 units) but were determined to be in a different stratification category once the property was visited (e.g., 5-24 units). Table 3 shows the original-to-final stratification matrix.

Table 3. Original-to-Final stratification matrix for AHS control totals

		Original RHFS Stratum (based on AHS building size)
		1 unit	2-4 units	5-24 units	25-49 units	50+ units
Final RHFS Stratum (Based on RHFS data collection)	1 unit	0.7694	0.1064	0.0484	0.0302	0.0267
	2-4 units	0.1129	0.4894	0.0101	0.0027	0.0049
	5-24 units	0.0242	0.0988	0.2501	0.0632	0.0146
	25-49 units	0.0177	0.0502	0.0695	0.4258	0.0316
	50+ units	0.0758	0.2553	0.6220	0.4780	0.9223

The third step was to multiply the “units by building size” vector in Table 2 by the original-to-final stratification matrix in Table 3. Each final-stratum control total is the sum, across the original strata, of the Table 2 unit count multiplied by the corresponding share in Table 3. The final AHS control totals are presented in Table 4. Note that in the original AHS “units by building size” estimate (Table 2), there were 20.068 million rental units in single-unit buildings. After the original-to-final stratification matrix is applied, there are 17.106 million rental units in single-unit properties. This means that nearly 3 million rental units classified as single unit in the AHS are actually part of multifamily properties.

Table 4. Final AHS Control Totals

Number of units in RHFS property	Post-Stratified Unit Controls (in thousands)
1 unit	17,106
2 to 4 units	6,251
5 to 24 units	4,746
25 to 49 units	2,801
50+ units	17,344
Total	48,248
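The third step can be checked with a short Python sketch that multiplies the Table 3 matrix by the Table 2 vector. The variable names are hypothetical. Because the published shares are rounded to four decimal places, the products match Table 4 only to within about two thousand units.

```python
# Table 2: AHS units in RHFS scope by original stratum (in thousands).
TABLE_2 = [20068, 7812, 13075, 2320, 4973]

# Table 3: rows are final strata, columns are original strata.
TABLE_3 = [
    [0.7694, 0.1064, 0.0484, 0.0302, 0.0267],
    [0.1129, 0.4894, 0.0101, 0.0027, 0.0049],
    [0.0242, 0.0988, 0.2501, 0.0632, 0.0146],
    [0.0177, 0.0502, 0.0695, 0.4258, 0.0316],
    [0.0758, 0.2553, 0.6220, 0.4780, 0.9223],
]

# Table 4: published post-stratified unit controls (in thousands).
TABLE_4 = [17106, 6251, 4746, 2801, 17344]

# Matrix-vector product: one control total per final stratum.
controls = [
    sum(share * units for share, units in zip(row, TABLE_2))
    for row in TABLE_3
]

# Each column of Table 3 sums to about 1, so the overall total is preserved.
column_sums = [sum(row[j] for row in TABLE_3) for j in range(5)]
```

Since every column of the matrix sums to approximately 1, the multiplication redistributes units across strata without changing the overall total of 48,248 thousand units.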

The fourth and final step was to apply the control totals from Table 4. The non-response adjusted property weight (NR_PROPWGT) for each case i is used to compute a weighted total of units within each RHFS property size stratum k:

Weighted Units_k = sum of (NR_PROPWGT_i * UNITS_i) over all cases i in stratum k, where UNITS_i is the number of units in the property for case i

The adjustment factor (PS_ADJ) for each stratum is then calculated as the ratio of each stratum’s control total to the weighted sum of units for that stratum:

PS_ADJ_k = Control Total_k / Weighted Units_k
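The ratio described above can be sketched in Python as follows. This is a minimal sketch that assumes the weighted sum of units is the sum of NR_PROPWGT multiplied by the number of units in each property. The field names and case values are hypothetical; the control total is the 25 to 49 units figure from Table 4 expressed in units.

```python
def ps_adjustment(cases, control_totals):
    """Return PS_ADJ per final stratum: control total / weighted sum of units."""
    weighted_units = {}
    for case in cases:
        k = case["final_stratum"]
        weighted_units[k] = (
            weighted_units.get(k, 0.0) + case["nr_propwgt"] * case["units"]
        )
    return {k: control_totals[k] / total for k, total in weighted_units.items()}

# Control total from Table 4 (2,801 thousand units).
control_totals = {"25 to 49 units": 2_801_000}

# Hypothetical responding properties in that stratum.
cases = [
    {"final_stratum": "25 to 49 units", "nr_propwgt": 40_000.0, "units": 30},
    {"final_stratum": "25 to 49 units", "nr_propwgt": 35_000.0, "units": 40},
]
ps_adj = ps_adjustment(cases, control_totals)

# After applying PS_ADJ, the weighted unit total matches the control total.
adjusted_units = sum(
    ps_adj[c["final_stratum"]] * c["nr_propwgt"] * c["units"] for c in cases
)
```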

Sampling Error: The sampling error of an estimate based on a sample survey is the difference between the estimate and the result that would be obtained from a complete census conducted under the same survey conditions. This error occurs because characteristics differ among sampling units in the population and only a subset of the population is measured in a sample survey. The particular sample used in this survey is one of a large number of samples of the same size that could have been selected using the same design. Because each unit in the sampling frame had a known probability of being selected into the sample, it was possible to estimate the sampling variability of the survey estimates.

Common measures of the variability among these estimates are the sampling variance, the standard error, and the coefficient of variation (CV), which is also referred to as the relative standard error (RSE). The sampling variance is defined as the squared difference, averaged over all possible samples of the same size and design, between the estimator and its average value. The standard error is the square root of the sampling variance. The CV expresses the standard error as a percentage of the estimate to which it refers. For example, an estimate of 200 units that has an estimated standard error of 10 units has an estimated CV of 5 percent. The sampling variance, standard error, and CV of an estimate can be estimated from the selected sample because the sample was selected using probability sampling. Note that measures of sampling variability, such as the standard error and CV, are estimated from the sample and are also subject to sampling variability. It is also important to note that the standard error and CV only measure sampling variability. They do not measure any systematic biases in the estimates.
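The relationship among the three measures can be shown with a short Python sketch that uses the numbers from the example in the text. The function name is hypothetical.

```python
def coefficient_of_variation(estimate, standard_error):
    """Return the CV (relative standard error) as a percentage of the estimate."""
    return 100.0 * standard_error / estimate

standard_error = 10.0
sampling_variance = standard_error ** 2      # the variance is the squared standard error
cv = coefficient_of_variation(200.0, standard_error)  # 5.0 percent
```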

The Census Bureau recommends that individuals using these estimates incorporate sampling error information into their analyses, as this could affect the conclusions drawn from the estimates.

To estimate the variance of the 2018 RHFS survey estimates, a method of successive difference replication as outlined by Ash (2014)[4] was adapted for use by RHFS. This method uses replicate weights to measure the variation in the sampled units by cycling through a pattern of replicate factors. This method of replication described by Ash (2014) builds on the successive difference replication method developed by Fay and Train (1995).[5] The weighting procedure, including all adjustments, was repeated for each replicate r = 1 to 160 to produce 160 sets of replicate weights.
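Once an estimate has been computed with the full-sample weights and with each set of replicate weights, the variance can be sketched as below. The text does not state the formula; this sketch assumes the standard successive difference replication estimator, in which the sum of squared deviations of the replicate estimates from the full-sample estimate is multiplied by 4/R. The function name and the replicate values are hypothetical.

```python
from math import sqrt

def sdr_variance(full_estimate, replicate_estimates):
    """Assumed SDR variance: (4 / R) * sum of squared deviations from the full estimate."""
    r = len(replicate_estimates)
    return (4.0 / r) * sum((rep - full_estimate) ** 2 for rep in replicate_estimates)

# Hypothetical full-sample estimate and 160 replicate estimates.
full_estimate = 100.0
replicates = [101.0] * 80 + [99.0] * 80

variance = sdr_variance(full_estimate, replicates)
standard_error = sqrt(variance)
```

The standard error obtained this way is the input to the coefficient of variation and margin of error calculations.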

Confidence Interval: The sample estimate and an estimate of its standard error allow us to construct interval estimates with prescribed confidence that the interval includes the average result of all possible samples with the same size and design. To illustrate, if all possible samples were surveyed under essentially the same conditions, and an estimate and its standard error were calculated from each sample:

Approximately 68 percent of the intervals from one standard error below the estimate to one standard error above the estimate would include the average estimate derived from all possible samples.
Approximately 90 percent of the intervals from 1.645 standard errors below the estimate to 1.645 standard errors above the estimate would include the average estimate derived from all possible samples.

In the example above, the margin of error (MOE) associated with the 90 percent confidence interval is the product of 1.645 and the estimated standard error.
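As an illustration, the sketch below converts between a standard error and a 90 percent margin of error and builds the corresponding confidence interval. The function names are hypothetical; the estimate and MOE are the published figures for the number of properties (in thousands) from Table 5.

```python
Z_90 = 1.645  # multiplier for a 90 percent confidence interval

def margin_of_error(standard_error, z=Z_90):
    """MOE is the multiplier times the estimated standard error."""
    return z * standard_error

def confidence_interval(estimate, moe):
    """Return the (lower, upper) bounds of the interval estimate."""
    return (estimate - moe, estimate + moe)

# Number of properties (in thousands) and its MOE, from Table 5.
estimate, moe = 19955.0, 696.4

standard_error = moe / Z_90                 # back out the standard error from the MOE
low, high = confidence_interval(estimate, moe)
cv_percent = 100.0 * standard_error / estimate
```

For this estimate the interval runs from roughly 19,259 to 20,651 thousand properties, and the implied CV is a little over 2 percent.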

An MOE is provided for each survey estimate displayed in the tables. The sample was designed to result in an expected coefficient of variation of 10% at the national level. The key estimates and MOEs are displayed below in Table 5.

Table 5. 2018 RHFS Key Estimates

	Number of Properties (000s)		Number of Units within Properties (000s)
Key Estimate	Estimate	Margin of Error	Estimate	Margin of Error
Number of Properties	19,955	696.4
Number of Units			48,248	1,289.1
Properties with Mortgages	8,275	441.3	28,534	1,143.8
Properties without Mortgages	11,680	816.9	19,714	967.8

Nonsampling error:

Nonsampling error encompasses all factors other than sampling error that contribute to the total error associated with an estimate. This error may also be present in censuses and other nonsurvey programs. Nonsampling error arises from many sources: inability to obtain information on all units in the sample; response errors; differences in the interpretation of the questions; mismatches between sampling units and reporting units, requested data and data available or accessible in respondents' records, or with regard to reference periods; mistakes in coding or keying the data obtained; and other errors of collection, response, coverage, and processing.

The Census Bureau recommends that individuals using these estimates factor in this information when assessing their analyses of these data, as nonsampling error could affect the conclusions drawn from the estimates.

A potential source of nonsampling error in the estimates is nonresponse. Nonresponse is the inability to obtain all the intended measurements or responses about all selected units. Unit nonresponse is used to describe the inability to obtain any of the substantive measurements about a sampled unit. For the 2018 survey, the average unit response rate was 64.3%. To mitigate the effect of nonresponse, a nonresponse adjustment was used to inflate the weights of responses to account for eligible nonresponses. For details on the nonresponse adjustment factor, see the Estimation section.

Disclosure avoidance: Disclosure is the release of data that reveals information or permits deduction of information about a particular survey unit through the release of either tables or microdata. Disclosure avoidance is the process used to protect each survey unit’s identity and data from disclosure. Using disclosure avoidance procedures, the Census Bureau modifies or removes the characteristics that put information at risk of disclosure. Although it may appear that a table shows information about a specific survey unit, the Census Bureau has taken steps to disguise or suppress a unit’s data that may be “at risk” of disclosure while making sure the results are still useful.

Cell suppression (primary and complementary) is applied to estimates for the RHFS. Cell suppression is a disclosure avoidance technique that protects the confidentiality of individual survey units by withholding cell values from release and replacing the cell value with a symbol, usually a “D.” If the suppressed cell value were known, it would allow one to estimate an individual survey unit’s data too closely.

The cells that must be protected are called primary suppressions.

To make sure the cell values of the primary suppressions cannot be closely estimated by using other published cell values, additional cells may also be suppressed. These additional suppressed cells are called complementary suppressions.

The process of suppression does not usually change the higher-level totals. Values for cells that are not suppressed remain unchanged. Before the Census Bureau releases data, computer programs and analysts ensure primary and complementary suppressions have been correctly applied.

For more information on disclosure avoidance practices, see FCSM Statistical Policy Working Paper 22.

The Census Bureau has reviewed the estimates in Table Creator for unauthorized disclosure of confidential information and has approved the disclosure avoidance practices applied. (Approval ID: CBDRB-FY19-450).

History of Survey Program:

Data users should exercise caution when making comparisons between the 2015 and 2018 Rental Housing Finance Survey estimates. The 2015 sample design used separate frames for single and multi-unit addresses. Single unit rentals were selected from a frame of eligible rental units identified in the 2013 American Housing Survey (AHS) sample and multi-unit addresses were selected from a frame based on a list of basic street addresses on the Master Address File (MAF) located in 2013 AHS sample Primary Sampling Units (PSUs). The 2018 sample design used a single frame based solely on addresses of rental units identified in the 2017 AHS. The 2017 AHS was based on the new sample that was redesigned in 2015, while the 2013 AHS was based on the previous AHS sample design. Thus, the post-stratification of unit control totals had very different distributions across survey years. The differences between 2015 and 2018 RHFS estimates of rental properties can be largely attributed to the differences in the post-stratification of unit control totals and the increase in average property size as measured in units per property between the 2015 and 2018 RHFS sample designs.

[1] Source: Survey of Construction, U.S. Census Bureau and Dept. of Housing and Urban Development, Annual Characteristics Tables, 2017. https://www.census.gov/construction/chars/xls/mfu_design_cust.xls

[2] CV is defined as the stratum standard error divided by the stratum total.

[3] AHS In Scope criteria: if ((INTSTATUS eq '1' and TENURE in ('2' '3')) or (INTSTATUS in ('2' '3') and VACANCY in ('01' '02' '04'))) and (HUDSUB_IUF ne '1') and (BLD ne '10')

[4] Ash, S. (2014). Using successive difference replication for estimating variances. Survey Methodology, 40(1), 47-59.

[5] Fay, R.E., and Train, G.F. (1995). Aspects of survey and model-based postcensal estimation of income and poverty characteristics for states and counties. Proceedings of the Section on Government Statistics, American Statistical Association, 154-159.