
ISTAT (Istituto Nazionale di Statistica) distributes its RELAIS toolkit under the EUPL licence.

EUPL in Italy

Published on: 30/03/2020 News Archived

RELAIS (REcord Linkage At IStat) is a toolkit for dealing with record linkage projects.

The purpose of record linkage is to identify records that refer to the same real-world entity, which may be represented differently across multiple data sources, even when unique or common identifiers are unavailable or affected by errors.

As an example, you may consider two standardized data sets, Set A and Set B, that contain different bits of information about individuals, using a variety of identifiers like the Social Security Number (SSN), name, date of birth (DOB), sex, and ZIP code (ZIP):

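| Data Set | # | SSN | Name | DOB | Sex | ZIP |
|---|---|---|---|---|---|---|
| Set A | 1 | 000956723 | Smith, William | 1973/01/02 | | 94701 |
| | 2 | 000005555 | Jones, Robert | 1942/08/14 | Male | 94701 |
| | 3 | 123001234 | Sue, Mary | 1972/11/19 | Female | 94109 |
| Set B | 1 | 000005555 | Jones, Bob | 1942/08/14 | | |
| | 2 | | Smith, Bill | 1973/01/02 | Male | 94701 |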

Record linkage will establish, with a high level of probability, that record 1 in Set A (Smith, William) in fact refers to the same person as record 2 in Set B (Smith, Bill), even though the first names differ, the SSN is missing from the Set B record and the sex is missing from the Set A record.
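To make the example concrete, here is a minimal sketch of probabilistic matching applied to the two data sets above. It is an illustration only and not the method implemented in RELAIS: the field weights, the `M_U` probabilities, the threshold and every function name are assumptions chosen for this example, and names are compared on the surname alone as a deliberate simplification.

```python
from math import log2

# Records from the example table; None marks a missing value.
SET_A = [
    {"id": 1, "ssn": "000956723", "name": "Smith, William", "dob": "1973/01/02", "sex": None, "zip": "94701"},
    {"id": 2, "ssn": "000005555", "name": "Jones, Robert", "dob": "1942/08/14", "sex": "Male", "zip": "94701"},
    {"id": 3, "ssn": "123001234", "name": "Sue, Mary", "dob": "1972/11/19", "sex": "Female", "zip": "94109"},
]
SET_B = [
    {"id": 1, "ssn": "000005555", "name": "Jones, Bob", "dob": "1942/08/14", "sex": None, "zip": None},
    {"id": 2, "ssn": None, "name": "Smith, Bill", "dob": "1973/01/02", "sex": "Male", "zip": "94701"},
]

# Hypothetical probabilities per field (assumed values, not estimated from data):
# m = P(fields agree | same person), u = P(fields agree | different people).
M_U = {
    "ssn": (0.95, 0.001),
    "surname": (0.90, 0.01),
    "dob": (0.95, 0.005),
    "sex": (0.98, 0.5),
    "zip": (0.85, 0.05),
}


def field_value(rec, field):
    """Return the value to compare; the surname is taken from 'Surname, First'."""
    if field == "surname":
        return rec["name"].split(",")[0].strip().lower() if rec["name"] else None
    return rec[field]


def match_weight(a, b):
    """Sum log-likelihood-ratio weights over the fields present in both records."""
    total = 0.0
    for field, (m, u) in M_U.items():
        va, vb = field_value(a, field), field_value(b, field)
        if va is None or vb is None:
            continue  # a missing value gives no evidence either way
        if va == vb:
            total += log2(m / u)              # agreement raises the weight
        else:
            total += log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


def link(set_a, set_b, threshold=10.0):
    """Return (id_a, id_b, weight) for every pair at or above the assumed threshold."""
    links = []
    for a in set_a:
        for b in set_b:
            weight = match_weight(a, b)
            if weight >= threshold:
                links.append((a["id"], b["id"], round(weight, 2)))
    return links


if __name__ == "__main__":
    for id_a, id_b, weight in link(SET_A, SET_B):
        print(f"Set A record {id_a} <-> Set B record {id_b} (weight {weight})")
```

With these assumed weights, the sketch links record 1 of Set A to record 2 of Set B, and record 2 of Set A to record 1 of Set B. Missing fields are skipped rather than counted as disagreements. A real system would typically also use approximate string comparison so that variants such as "William" and "Bill" can contribute evidence.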

Record linkage has become a popular technique, employed not only to integrate or centralize different databases for the purposes of social management, epidemiology, medical studies or even counterterrorism, but also for data cleaning and data quality (detecting duplicate records). In the context of data privacy, record linkage has emerged as an important technique for evaluating the disclosure risk of protected data. In the context of developing Artificial Intelligence (AI) processes, record linkage helps provide valid information by consolidating multiple sources of input data.

Since record linkage is a complex process made up of phases that involve different knowledge areas, different techniques can be adopted for each phase. The RELAIS toolkit lets users choose the most appropriate technique for each phase, depending on the application.

RELAIS has been developed as an open source project, so several solutions already available for record linkage in the scientific community can be easily re-used. It is released under the EUPL (European Union Public Licence) and is a valuable contribution from ISTAT, a regular EUPL user, to interoperable European knowledge.

More information: https://www.istat.it/en/methods-and-tools/methods-and-it-tools/process/processing-tools/relais
