The quality of the information in large computer files affects any analysis or data mining performed on them. In large population registers, and in files created by merging two or more files, duplicate entries must be identified. Identifying duplicates can depend on record linkage software that handles name, address, and date-of-birth data containing many typographical errors. Quantitative and qualitative data must also be edited so that mutually contradictory or missing items are corrected automatically and quickly. This paper describes computational methods and software suitable for groups of files in which individual files contain between 1 million and 4 billion records.
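As a rough illustration of duplicate identification in the presence of typographical errors, the sketch below compares records field by field with an approximate string comparator and flags pairs whose average similarity exceeds a threshold. It is not the paper's method: the field names (`name`, `address`, `dob`), the 0.85 threshold, the sample records, and the choice of Python's standard `difflib.SequenceMatcher` as the comparator are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a: str, b: str) -> float:
    """Approximate string similarity in [0, 1], ignoring case and outer spaces."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def record_score(r1: dict, r2: dict, fields=("name", "address", "dob")) -> float:
    """Average field-level similarity between two records (hypothetical fields)."""
    return sum(similarity(r1[f], r2[f]) for f in fields) / len(fields)

def find_duplicates(records: list, threshold: float = 0.85) -> list:
    """Return index pairs whose average similarity meets the assumed threshold."""
    pairs = []
    for (i, r1), (j, r2) in combinations(enumerate(records), 2):
        if record_score(r1, r2) >= threshold:
            pairs.append((i, j))
    return pairs

# Made-up sample records; the first two differ only by typographical errors.
records = [
    {"name": "John Smith", "address": "12 Main Street", "dob": "1970-01-02"},
    {"name": "Jon Smith", "address": "12 Main Stret", "dob": "1970-01-02"},
    {"name": "Mary Jones", "address": "48 Oak Avenue", "dob": "1985-11-30"},
]
duplicate_pairs = find_duplicates(records)
print(duplicate_pairs)
```

Comparing every pair of records grows quadratically with file size, so this sketch only suits small inputs; at the file sizes discussed here, linkage systems typically restrict comparisons to records that already agree on some key.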

Others in Series

Working Paper
Convergence of a Robbins-Monro Algorithm for Recursive Parameter Estimation with Non-Monotone Weights and Multiple Zeroes
June 26, 2001

Working Paper
Record Linkage Software and Methods for Merging Administrative Lists
July 23, 2001

Working Paper
1979-80-Census Seasonal Adjustment Project: Final Report on Research Activities
1980

Related Information

WORKING PAPER

Statistical Research Reports and Studies

Page Last Revised - October 28, 2021
