Overview

Wikipedia

Overview of Record Linkage and Current Research Directions

Duplicate record detection: A survey 

Record linkage: Current practice and future directions 

A Generalized Cost Optimal Decision Model for Record Matching 

String Comparison Research

String Comparator Metrics and Enhanced Decision Rules in the Fellegi-Sunter Model of Record Linkage

A Comparison of String Metrics for Matching Names and Records

The Flamingo Package (Approximate String Matching Research)

Robust and Efficient Fuzzy Match for Online Data Cleaning

Blocking Research

A Comparison of Fast Blocking Methods for Record Linkage

Approximate String Comparator Search Strategies for Very Large Administrative Lists

Learning Blocking Schemes for Record Linkage

Indexing Text with Approximate q-grams

Finding Approximate Matches in Large Lexicons

Space-Constrained Gram-Based Indexing for Efficient Approximate String Search

Adaptive Blocking: Learning to Scale Up Record Linkage 

Adaptive filtering for efficient record linkage

Cost-Based Variable-Length-Gram Selection for String Collections to Support Approximate Queries Efficiently

Efficient Record Linkage in Large Data Sets

Entity Resolution

Query Time Entity Resolution

Entity Resolution with Markov Logic

Swoosh: A Generic Approach to Entity Resolution

D-Swoosh: A Family of Algorithms for Generic, Distributed Entity Resolution

Generic Entity Resolution with Data Confidences

Generic Entity Resolution with Negative Rules

Improving Grouped-Entity Resolution Using Quasi-Cliques

Real-world data is dirty: Data cleansing and the merge/purge problem

Reference reconciliation in complex information spaces

The merge/purge problem for large databases

Interactive deduplication using active learning 

Learning object identification rules for information integration

Classification/Clustering Research

Efficient Clustering of High Dimensional Data Sets with Application to Reference Matching

A Hierarchical Graphical Model for Record Linkage 

Learning to Match and Cluster Large High Dimensional Data Sets for Data Integration

Automatic Record Linkage using Seeded Nearest Neighbor and Support Vector Machine Classification

Learnable Similarity Functions and Their Application to Record Linkage and Clustering

Relational Clustering for Entity Resolution Queries

Automatic Training Example Selection for Scalable Unsupervised Record Linkage

Exploiting Secondary Sources for Unsupervised Record Linkage

Febrl

Febrl Home Page

A Parallel Open Source Data Linkage System

High Performance Computing Techniques for Record Linkage

BigMatch

BigMatch: A Program for Large Scale Record Linkage

BigMatch: A Program for Extracting Probable Matches from a Large File

TAILOR

Record Linkage: A Machine Learning Approach, a Toolbox, and a Digital Government Web Service

OpenMRS

OpenMRS homepage 

OpenMRS: A Google Summer of Code Record Linkage System

Record Linkage in Databases

A small approximately min-wise independent family of hash functions 

Using q-grams in a DBMS for Approximate String Processing

Private/Blindfold Record Linkage

A Hybrid Approach to Private Record Linkage

Blocking Aware Private Record Linkage

Record Linkage of Anonymous Data by Control Numbers

Datasets  

Various datasets 

Machine Learning

Map-Reduce for Machine Learning on Multicore