Quality of Very Large Databases

Written by:
RR2001-04

Abstract

The quality of the information in large computer files affects any analysis or data mining performed on them. In large population registers, and in files created by merging two or more files, duplicate entries must be identified. Identifying duplicates can depend on record linkage software that copes with name, address, and date-of-birth data containing many typographical errors. Quantitative and qualitative data must also be edited so that mutually contradictory or missing items are corrected automatically and quickly. This paper describes computational methods and software suitable for groups of files in which individual files contain between 1 million and 4 billion records.
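To make the duplicate-identification idea concrete, the following is a minimal sketch of approximate matching on name and address. It is not the software the paper describes. The record layout, the `find_duplicates` and `similarity` helpers, the 0.85 threshold, and the sample records are all assumptions made for illustration, and Python's standard-library `difflib.SequenceMatcher` stands in for whatever string comparator a production record linkage system would use.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def find_duplicates(records, threshold=0.85):
    """Return id pairs whose name and address are both similar.

    Records are first grouped ("blocked") by exact date of birth so that
    only records inside the same group are compared. This is a
    simplification: a typo in the date of birth would hide a true duplicate.
    """
    blocks = {}
    for rec in records:
        blocks.setdefault(rec["dob"], []).append(rec)

    pairs = []
    for group in blocks.values():
        for r1, r2 in combinations(group, 2):
            if (similarity(r1["name"], r2["name"]) >= threshold
                    and similarity(r1["address"], r2["address"]) >= threshold):
                pairs.append((r1["id"], r2["id"]))
    return pairs


# Hypothetical sample data: records 1 and 2 differ only by typos.
records = [
    {"id": 1, "name": "Jonathan Smith", "address": "12 Main Street", "dob": "1970-03-05"},
    {"id": 2, "name": "Jonathon Smith", "address": "12 Main Stret", "dob": "1970-03-05"},
    {"id": 3, "name": "Maria Lopez", "address": "48 Oak Avenue", "dob": "1970-03-05"},
    {"id": 4, "name": "Jonathan Smith", "address": "12 Main Street", "dob": "1982-11-20"},
]

print(find_duplicates(records))  # prints [(1, 2)]
```

Grouping by date of birth keeps the number of pairwise comparisons small, which matters at the file sizes the abstract mentions, but it also means record 4 is never compared with record 1 despite having the same name and address. A real system would need a blocking strategy that tolerates errors in the blocking field as well.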

Page Last Revised - October 28, 2021