Overview
Overview of Record Linkage and Current Research Directions
Duplicate record detection: A survey
Record linkage: Current practice and future directions
A Generalized Cost Optimal Decision Model for Record Matching
String Comparison Research
String Comparator Metrics and Enhanced Decision Rules in the Fellegi-Sunter Model of Record Linkage
A Comparison of String Metrics for Matching Names and Records
The Flamingo Package (Approximate String Matching Research)
Robust and Efficient Fuzzy Match for Online Data Cleaning
Blocking Research
A Comparison of Fast Blocking Methods for Record Linkage
Approximate String Comparator Search Strategies for Very Large Administrative Lists
Learning Blocking Schemes for Record Linkage
Indexing Text with Approximate q-grams
Finding Approximate Matches in Large Lexicons
Space-Constrained Gram-Based Indexing for Efficient Approximate String Search
Adaptive Blocking: Learning to Scale Up Record Linkage
Adaptive filtering for efficient record linkage
Efficient Record Linkage in Large Data Sets
Entity Resolution
Entity Resolution with Markov Logic
Swoosh: A Generic Approach to Entity Resolution
D-Swoosh: A Family of Algorithms for Generic, Distributed Entity Resolution
Generic Entity Resolution with Data Confidences
Generic Entity Resolution with Negative Rules
Improving Grouped-Entity Resolution Using Quasi-Cliques
Real-world data is dirty: Data cleansing and the merge/purge problem
Reference reconciliation in complex information spaces
The merge/purge problem for large databases
Interactive deduplication using active learning
Learning object identification rules for information integration
Classification/Clustering Research
Efficient Clustering of High Dimensional Data Sets with Application to Reference Matching
A Hierarchical Graphical Model for Record Linkage
Learning to Match and Cluster Large High Dimensional Data Sets for Data Integration
Automatic Record Linkage using Seeded Nearest Neighbor and Support Vector Machine Classification
Learnable Similarity Functions and Their Application to Record Linkage and Clustering
Relational Clustering for Entity Resolution Queries
Automatic Training Example Selection for Scalable Unsupervised Record Linkage
Exploiting Secondary Sources for Unsupervised Record Linkage
Febrl
Febrl Home Page
A Parallel Open Source Data Linkage System
High Performance Computing Techniques for Record Linkage
BigMatch
BigMatch: A Program for Large Scale Record Linkage
BigMatch: A Program for Extracting Probable Matches from a Large File
TAILOR
Record Linkage: A Machine Learning Approach, a Toolbox, and a Digital Government Web Service
OpenMRS
OpenMRS: A Google Summer of Code Record Linkage System
Record Linkage in Databases
A small approximately min-wise independent family of hash functions
Using q-grams in a DBMS for Approximate String Processing
Private/Blindfold Record Linkage
A Hybrid Approach to Private Record Linkage
Blocking Aware Private Record Linkage
Record Linkage of Anonymous Data by Control Numbers
Datasets
Machine Learning