Study on data quality management

Published on: 23/09/2019. Last update: 27/09/2019.

The SEMIC action of the ISA² programme seeks to promote the use of semantic tools and solutions to overcome interoperability challenges between Member States' public administrations when maintaining and exchanging data for the execution of European Public Services. To support this objective, the action has produced a number of studies that focus on various aspects of semantic tools.

This study explores the intersection between data quality management (from a data governance point of view) and semantic interoperability: how semantic assets support and advance data quality. It describes state-of-the-art concepts and frameworks for data quality and links them to semantic interoperability by examining how data quality can be improved.

The main objective of this study is to investigate how data quality in the context of data governance can be improved through the use of semantic methodologies and technologies. It presents the most prominent data quality dimensions and focuses on the semantic web methodologies, technologies and open standards that public organisations can use to improve their data quality, primarily with respect to those dimensions and also from a general perspective. It also explores the idea of semantic enrichment of metadata, considering the impact of metadata on data quality and discoverability.

The main finding of the study is that by employing knowledge representation technologies (ontologies, thesauri, vocabularies, open standards) and mechanisms to model and organise governance data, public organisations can improve the quality of their data and achieve interoperability. The use of ontologies enables automated reasoning, which can infer new relationships and properties and thus contribute to data accuracy and completeness. Semantic web query languages can be used to enhance the relevance of data. RDF validation mechanisms (e.g. SHACL) can improve the integrity and semantic accuracy of data. The study also demonstrated that semantic enrichment of metadata can lead to improved data quality. Machine learning techniques such as natural language processing combined with deep learning can be used to systematically enhance the quality of governance data (structured, semi-structured and unstructured). In combination with human-in-the-loop methodologies, they can improve data discoverability and accessibility and provide data of high accuracy and completeness.
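To make the reasoning and validation findings concrete, the following is a minimal sketch in plain Python. It is not taken from the study and uses no RDF library: triples are simple tuples, the inference step applies only the RDFS subclass rule, and the completeness check merely imitates the spirit of a SHACL minimum-count constraint. All identifiers (the `ex:` names, `infer_types`, `missing_property`) are hypothetical and chosen for illustration only.

```python
# Toy triples (subject, predicate, object); every name here is hypothetical.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

triples = {
    ("ex:Ministry", SUBCLASS_OF, "ex:PublicOrganisation"),
    ("ex:PublicOrganisation", SUBCLASS_OF, "ex:Organisation"),
    ("ex:moj", RDF_TYPE, "ex:Ministry"),
    ("ex:moj", "ex:legalName", "Ministry of Justice"),
    ("ex:agencyX", RDF_TYPE, "ex:PublicOrganisation"),
}


def infer_types(graph):
    """Apply the RDFS subclass rule until no new triple appears (fixpoint)."""
    inferred = set(graph)
    changed = True
    while changed:
        changed = False
        subclass = [(s, o) for s, p, o in inferred if p == SUBCLASS_OF]
        for s, p, o in list(inferred):
            if p != RDF_TYPE:
                continue
            for sub, sup in subclass:
                if o == sub and (s, RDF_TYPE, sup) not in inferred:
                    inferred.add((s, RDF_TYPE, sup))
                    changed = True
    return inferred


def missing_property(graph, target_class, required_property):
    """Return instances of target_class that lack required_property, sorted."""
    instances = {s for s, p, o in graph if p == RDF_TYPE and o == target_class}
    having = {s for s, p, o in graph if p == required_property}
    return sorted(instances - having)


# Without inference, nothing is typed as ex:Organisation, so the check is blind.
before = missing_property(triples, "ex:Organisation", "ex:legalName")

# After inference, both resources are organisations and the gap becomes visible.
closed = infer_types(triples)
violations = missing_property(closed, "ex:Organisation", "ex:legalName")
print(before, violations)
```

In this toy setup the check finds nothing on the raw data, because no resource is directly typed as `ex:Organisation`. Once the subclass rule has been applied, the same check reports the resource that lacks a legal name. A real deployment would more likely rely on an RDF store, a reasoner and a SHACL engine rather than hand-written rules.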

Type of document: Document

Attachment: SEMIC Study on data quality management.pdf
