Using the EM Algorithm for Weight Computation in the Fellegi-Sunter Model of Record Linkage

Written by:
RR2000-05

Abstract

Let A × B be the product space of two sets A and B, which is divided into matches (pairs representing the same entity) and nonmatches (pairs representing different entities). Linkage rules are those that divide A × B into links (designated matches), possible links (pairs for which we delay a decision), and nonlinks (designated nonmatches). Under fixed bounds on the error rates, Fellegi and Sunter (1969) provided a linkage rule that is optimal in the sense that it minimizes the set of possible links. The optimality depends on knowledge of certain joint inclusion probabilities that are used in a crucial likelihood ratio. In applying the record linkage model, assumptions are often made that allow estimation of weights that are a function of the joint inclusion probabilities. If the assumptions are not met, then the linkage procedure using estimates computed under those assumptions may not be optimal. This paper describes a method for estimating weights using the EM Algorithm under less restrictive assumptions. The weight computation automatically incorporates a Bayesian adjustment based on file characteristics.
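For orientation, the following is a minimal sketch of the kind of EM weight estimation the abstract refers to. It is not the paper's method: it assumes the common simplification that field agreements are conditionally independent within the match and nonmatch classes, which is exactly the sort of assumption the paper seeks to relax, and it omits the Bayesian adjustment. All names (`em_weights`, `simulate`, `pattern_prob`, `EPS`) and the starting values are hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter

EPS = 1e-6  # keeps probabilities away from 0 and 1 so ratios and logs stay finite


def clamp(x):
    return min(max(x, EPS), 1.0 - EPS)


def pattern_prob(gamma, probs):
    """P(gamma | class), ASSUMING fields agree independently within a class."""
    out = 1.0
    for agree, q in zip(gamma, probs):
        out *= q if agree else 1.0 - q
    return out


def em_weights(pairs, p=0.1, m_init=0.9, u_init=0.1, n_iter=200):
    """Estimate p = P(match), m_i = P(agree_i | match), u_i = P(agree_i | nonmatch).

    pairs: iterable of tuples of 0/1 agreement indicators, one tuple per record pair.
    Returns (p, m, u, agreement_weights, disagreement_weights), weights in log base 2.
    """
    counts = Counter(pairs)  # EM only needs the count of each agreement pattern
    n = sum(counts.values())
    k = len(next(iter(counts)))
    m, u = [m_init] * k, [u_init] * k
    for _ in range(n_iter):
        # E-step: posterior probability that a pair with each pattern is a match.
        post = {}
        for gamma in counts:
            pm = p * pattern_prob(gamma, m)
            pu = (1.0 - p) * pattern_prob(gamma, u)
            post[gamma] = pm / (pm + pu)
        # M-step: re-estimate p, m, u from the expected class memberships.
        n_match = sum(c * post[g] for g, c in counts.items())
        p = clamp(n_match / n)
        m = [clamp(sum(c * post[g] * g[i] for g, c in counts.items()) / n_match)
             for i in range(k)]
        u = [clamp(sum(c * (1.0 - post[g]) * g[i] for g, c in counts.items())
                   / (n - n_match))
             for i in range(k)]
    agree_w = [math.log2(mi / ui) for mi, ui in zip(m, u)]
    disagree_w = [math.log2((1.0 - mi) / (1.0 - ui)) for mi, ui in zip(m, u)]
    return p, m, u, agree_w, disagree_w


def simulate(n, p, m, u, seed=0):
    """Draw n synthetic agreement patterns (illustrative data only, not real files)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        probs = m if rng.random() < p else u
        pairs.append(tuple(int(rng.random() < q) for q in probs))
    return pairs


if __name__ == "__main__":
    data = simulate(5000, 0.2, [0.95, 0.90, 0.85], [0.10, 0.05, 0.20])
    p_hat, m_hat, u_hat, agree_w, disagree_w = em_weights(data)
    print("estimated match proportion:", round(p_hat, 3))
    print("agreement weights:", [round(w, 2) for w in agree_w])
    print("disagreement weights:", [round(w, 2) for w in disagree_w])
```

In this sketch the total weight of a pair is the sum of the agreement or disagreement weight of each field, which equals the base-2 log of the likelihood ratio under the independence assumption. Pairs with high totals would be designated links, low totals nonlinks, and the band in between possible links, with the two cutoffs chosen from the error-rate bounds.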

Page Last Revised - October 28, 2021