
Use of Field Information to Match the Records in Two Files

Written by:
RR86-21

Introduction

Consider two files of records. Within each file, each record corresponds to a different population unit, and both files cover the same general population. We want to identify "matches," i.e., pairs of records (one from each file) that correspond to the same population unit.

Each record contains data in K fields, which correspond to characteristics such as age, race, etc. For each pair of records, we may observe a pattern of agreement/disagreement among the fields. Using this information, we want to identify matches as accurately as possible. The problem of how best to use the field information has been addressed for K=3, under the assumption that the events "agreement in field i," i=1, ..., K, are stochastically mutually independent, both for true matches and for true nonmatches. We address the problem for K>3 and avoid reliance on the independence assumption by fitting interaction terms that reflect positive stochastic dependence.
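
To make the independence assumption concrete, here is a minimal Python sketch of how an agreement pattern could be scored when field agreements are treated as mutually independent within matches and within nonmatches. It illustrates only the independence baseline, not the interaction-term model of this report. The function names and the per-field agreement probabilities (m_probs for true matches, u_probs for true nonmatches) are hypothetical values chosen for illustration.

```python
import math
from itertools import product


def pattern_probability(pattern, agree_probs):
    """Probability of an agreement pattern when fields agree independently.

    pattern     -- tuple of booleans, True where the field agrees
    agree_probs -- per-field probability of agreement
    """
    prob = 1.0
    for agrees, p in zip(pattern, agree_probs):
        prob *= p if agrees else (1.0 - p)
    return prob


def match_weight(pattern, m_probs, u_probs):
    """Log2 likelihood ratio of a pattern: true matches vs. true nonmatches."""
    return math.log2(
        pattern_probability(pattern, m_probs)
        / pattern_probability(pattern, u_probs)
    )


# Hypothetical agreement probabilities for K = 3 fields (assumed, not from the report).
m_probs = [0.9, 0.8, 0.95]   # P(agreement in field i | true match)
u_probs = [0.1, 0.2, 0.05]   # P(agreement in field i | true nonmatch)

# Score every one of the 2**K agreement patterns.
patterns = list(product([True, False], repeat=len(m_probs)))
weights = {p: match_weight(p, m_probs, u_probs) for p in patterns}

# List patterns from strongest evidence for a match to strongest evidence against.
for pattern, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(pattern, round(weight, 2))
```

Under independence, the weight of a pattern is a sum of per-field terms, so each field contributes the same amount regardless of what the other fields show. When agreements are positively dependent, that additive form can overstate the evidence carried by several fields agreeing together, which is the situation the interaction terms are meant to address.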

Page Last Revised - October 28, 2021