Record Linkage: Theory and Practice
Abstract
Record linkage concerns the consolidation of data records from disparate sources into a single file or database. We explore the common design of a record linkage system, beginning with data preprocessing techniques. We then examine distance metrics and similarity measures, such as the Levenshtein edit distance, for rating the similarity between features of interest. Next, we look at clustering techniques that reduce computational complexity by limiting the number of record pairs that must be compared. Finally, we assess the performance of the system as a whole using common evaluation metrics.
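To make the Levenshtein measure mentioned above concrete, here is a minimal sketch in Python. It computes the edit distance with the standard dynamic-programming recurrence, keeping only one previous row of the table. It then converts the distance to a similarity in [0, 1] by dividing by the longer string's length. The function names `levenshtein` and `levenshtein_similarity` are hypothetical. The max-length normalisation is an assumption and only one of several choices a linkage system might make.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the first i-1 chars of a
    # and the first j chars of b (one row of the DP table).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Hypothetical normalisation: 1 - distance / length of longer string."""
    if not a and not b:
        return 1.0  # two empty fields are treated as identical
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))
```

For example, `levenshtein("kitten", "sitting")` returns 3 (two substitutions and one insertion). `levenshtein_similarity("Jon Smith", "John Smith")` gives about 0.9, because a single inserted character separates the two name fields.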