
An official website of the United States government



A Semi-Supervised Active Learning Approach for Block-Status Classification

Written by:

Abstract

As part of its decennial census, the Census Bureau must maintain and update every address in the United States and its territories. These addresses inform federal policymaking and the allocation of federal resources. For the 2020 Census, in-office staff manually canvassed address coverage in every block. Although effective, this process was costly and time-consuming. To help the Census Bureau label and classify blocks, we propose a semi-supervised machine learning approach that improves both the labeling and the classification of parcel data, enabling new data-driven insights while reducing the cost and effort of data assessment. To this end, we employ an active-learning scheme that makes accurate and precise classifications using fewer than 1% (about 50,000) of the more than 8,000,000 blocks in the country as labeled data. We train multiple machine learning models, including Logistic Regression, Random Forest, Gradient Boosting, Extreme Gradient Boosting (XGBoost), Light Gradient Boosting (LightGBM), and Categorical Boosting (CatBoost), on the small labeled set and use them to predict labels for the unlabeled blocks. We then compare the predictions of all the models to pinpoint the blocks on which they disagree, and forward those blocks to human labelers for a final decision. Once human labelers have validated a subset of the predictions, that subset is added to the training data before predictions are made on the next subset. We also discuss the challenges of working with real-world data at this scale, such as class imbalance, data completeness, and data integrity.
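The disagreement-driven loop described in the abstract resembles query-by-committee active learning. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. Each "model factory" is a hypothetical callable that fits on labeled data and returns a predictor; `threshold_factory` is a toy stand-in for the six models named above; `ask_human` represents the human labelers; and blocks on which every model agrees are assumed to keep the shared prediction as their label.

```python
from typing import Callable, Sequence

# Hypothetical interfaces for illustration only.
Features = Sequence[float]
Predictor = Callable[[Features], int]
ModelFactory = Callable[[list[Features], list[int]], Predictor]


def threshold_factory(feature: int) -> ModelFactory:
    """Toy stand-in model: threshold one feature at the midpoint of the two class means."""
    def fit(X: list[Features], y: list[int]) -> Predictor:
        pos = [x[feature] for x, label in zip(X, y) if label == 1]
        neg = [x[feature] for x, label in zip(X, y) if label == 0]
        cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return lambda x: int(x[feature] > cut)
    return fit


def disagreement_indices(committee: list[Predictor], batch: list[Features]) -> list[int]:
    """Return positions in `batch` where the committee's predictions are not unanimous."""
    flagged = []
    for i, x in enumerate(batch):
        votes = {predict(x) for predict in committee}
        if len(votes) > 1:
            flagged.append(i)
    return flagged


def active_learning_round(
    factories: list[ModelFactory],
    X_train: list[Features],
    y_train: list[int],
    batch: list[Features],
    ask_human: Callable[[Features], int],
) -> tuple[list[Features], list[int], list[int]]:
    """One round: train the committee, send disagreements to a human, grow the training set."""
    committee = [fit(X_train, y_train) for fit in factories]
    flagged = set(disagreement_indices(committee, batch))
    new_labels = []
    for i, x in enumerate(batch):
        if i in flagged:
            new_labels.append(ask_human(x))  # human labeler makes the final call
        else:
            new_labels.append(committee[0](x))  # unanimous, so any model's vote works
    # Return new lists so the caller's training data is not mutated in place.
    return X_train + list(batch), y_train + new_labels, sorted(flagged)


if __name__ == "__main__":
    X_train = [[0.0, 0.0], [1.0, 1.0]]
    y_train = [0, 1]
    factories = [threshold_factory(0), threshold_factory(1)]
    batch = [[0.2, 0.1], [0.9, 0.8], [0.9, 0.1]]
    X2, y2, flagged = active_learning_round(factories, X_train, y_train, batch, lambda x: 1)
    print(flagged)  # [2]: the two toy models disagree only on the third block
    print(y2)       # [0, 1, 0, 1, 1]
```

Sending only the contested blocks to people concentrates scarce human effort where the models are least certain, while the unanimous blocks still enlarge the training set for the next round.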

Page Last Revised - December 12, 2024
