Overview of Record Linkage and Current Research Directions

RRS2006-02

Introduction

Record linkage is the means of combining information from a variety of computerized files. It is also referred to as data cleaning (McCallum and Wellner 2003) or object identification (Tejada et al. 2002). The basic methods compare name and address information across pairs of files to determine which pairs of records are associated with the same entity. An entity might be a business, a person, or some other type of unit that is listed. Straightforward extensions of these methods, based on economic relationships, might create functions and associated metrics for comparing information such as receipts or taxable income. The most sophisticated methods use information from multiple lists (Winkler 1999b), create new functional relationships between variables in two files that can be associated with new metrics for identifying corresponding entities (Scheuren and Winkler 1997), or use graph-theoretic ideas for representing linkage relationships as conditional random fields that can be partitioned into clusters representing individual entities (McCallum and Wellner 2003, Wei 2004).
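The basic idea of comparing name and address information across pairs of files can be illustrated with a minimal sketch. This is not the method of any of the cited papers: the field names, the use of Python's standard `difflib.SequenceMatcher` as the string comparator, the unweighted averaging of the two scores, and the 0.85 threshold are all assumptions chosen only for illustration.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def similarity(a, b):
    """Return a string similarity in [0, 1] using difflib's ratio."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(file_a, file_b, threshold=0.85):
    """Compare every pair across two files and keep pairs at or above threshold.

    The score is the unweighted mean of name and address similarity; both the
    scoring rule and the threshold are illustrative assumptions.
    """
    links = []
    for i, rec_a in enumerate(file_a):
        for j, rec_b in enumerate(file_b):
            score = (similarity(rec_a["name"], rec_b["name"])
                     + similarity(rec_a["address"], rec_b["address"])) / 2
            if score >= threshold:
                links.append((i, j, round(score, 3)))
    return links


# Two tiny hypothetical files; the records are invented for this example.
file_a = [
    {"name": "John Smith", "address": "12 Main Street"},
    {"name": "Acme Hardware", "address": "400 Industrial Road"},
]
file_b = [
    {"name": "Acme Hardware", "address": "400 Industrial Road"},
    {"name": "Jon Smith", "address": "12 Main Street"},
]

links = link_records(file_a, file_b)
for i, j, score in links:
    print(file_a[i]["name"], "<->", file_b[j]["name"], score)
```

Comparing every pair is quadratic in the file sizes, so a sketch like this is only practical for small files.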

Page Last Revised - October 28, 2021