Large-scale data processing and analysis is not a new challenge for the U.S. Census Bureau, but the number of statistical programming languages and tools available to perform such work has expanded in recent years. We evaluate how statistical programming languages perform on a common data management task within the Census Bureau’s high-performance computing cluster. Specifically, we develop Python, SAS, Stata, and R scripts that merge the person, household, and geographic microdata from the full-count 1990 Census microdata files. We then use these merged data to perform basic analyses, such as counting the number of individuals per household and calculating the average household size for every county in the U.S. We compare the language implementations of these scripts by their runtime on each task. We find wide variation in runtime between languages, and the speed of each language depends most heavily on the file format of the input data.
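
To make the benchmark task concrete, the following is a minimal Python sketch of the merge-and-summarize workflow the abstract describes. It is not the authors' script: the toy records, the field names (`person_id`, `hh_id`, `geo_id`, `county`), and the helper function names are all hypothetical, and it uses only the standard library rather than whatever data-handling packages the paper's scripts relied on.

```python
from collections import defaultdict

# Toy stand-ins for the three microdata files.
# All field names and values here are hypothetical, for illustration only.
persons = [
    {"person_id": 1, "hh_id": "A"},
    {"person_id": 2, "hh_id": "A"},
    {"person_id": 3, "hh_id": "B"},
    {"person_id": 4, "hh_id": "C"},
    {"person_id": 5, "hh_id": "C"},
    {"person_id": 6, "hh_id": "C"},
]
households = [
    {"hh_id": "A", "geo_id": "G1"},
    {"hh_id": "B", "geo_id": "G1"},
    {"hh_id": "C", "geo_id": "G2"},
]
geography = [
    {"geo_id": "G1", "county": "001"},
    {"geo_id": "G2", "county": "003"},
]


def merge_microdata(persons, households, geography):
    """Attach household and geographic fields to every person record."""
    hh_by_id = {h["hh_id"]: h for h in households}
    geo_by_id = {g["geo_id"]: g for g in geography}
    merged = []
    for person in persons:
        hh = hh_by_id[person["hh_id"]]
        geo = geo_by_id[hh["geo_id"]]
        merged.append({**person, **hh, **geo})
    return merged


def household_sizes(merged):
    """Count persons in each (county, household) pair."""
    sizes = defaultdict(int)
    for row in merged:
        sizes[(row["county"], row["hh_id"])] += 1
    return dict(sizes)


def mean_household_size_by_county(sizes):
    """Average the household sizes within each county."""
    totals = defaultdict(lambda: [0, 0])  # county -> [persons, households]
    for (county, _hh_id), n_persons in sizes.items():
        totals[county][0] += n_persons
        totals[county][1] += 1
    return {county: p / h for county, (p, h) in totals.items()}


merged = merge_microdata(persons, households, geography)
sizes = household_sizes(merged)
averages = mean_household_size_by_county(sizes)
```

On this toy input, county `001` has households of sizes 2 and 1 and county `003` has a single household of size 3. A benchmark along the lines described would time each of these steps separately, since the abstract reports runtime per task.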


Page Last Revised - October 8, 2021
