The German Open Data Portal (GovData.de) is using the Test Bed’s RDF validator to provide quality assurance for its published data.
GovData.de, the Open Data Portal for Germany, offers uniform, central access to administrative data from federal, state, and local authorities. The goal is to centralise such data and expose it through a searchable catalogue, built on metadata that enrich the recorded information and make it easier to categorise and search. These metadata are expressed using DCAT-AP.de, the common German metadata model for the exchange of open administrative data. DCAT-AP.de is an RDF specification that is fully aligned with DCAT-AP to ensure cross-border interoperability, while also catering for German specificities.
The data published on the GovData.de portal is harvested from data providers at various administrative levels, who are expected to make their datasets available in DCAT-AP.de. To verify the quality of the provided data and report issues, GovData.de created a set of specification profiles whose rules are expressed as SHACL shapes, SHACL being the W3C constraint language for RDF data.
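To give a feel for what such a rule checks, the sketch below shows a hypothetical SHACL shape (in a comment) requiring every `dcat:Dataset` to have at least one `dct:title` and one `dct:description`, followed by a small pure-Python emulation of a single `sh:minCount` check over plain (subject, predicate, object) tuples. This is an illustration under simplifying assumptions, not a SHACL engine and not one of GovData.de's actual shapes; the shape name `ex:DatasetShape` and the helper `min_count_violations` are invented for this example. In practice the shapes are evaluated by a real SHACL processor such as the Test Bed's RDF validator.

```python
# Hypothetical shape, for reference only (Turtle):
#
#   ex:DatasetShape a sh:NodeShape ;
#       sh:targetClass dcat:Dataset ;
#       sh:property [ sh:path dct:title ;       sh:minCount 1 ] ;
#       sh:property [ sh:path dct:description ; sh:minCount 1 ] .

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT_DATASET = "http://www.w3.org/ns/dcat#Dataset"
DCT = "http://purl.org/dc/terms/"


def min_count_violations(triples: list[tuple[str, str, str]],
                         target_class: str, path: str, min_count: int) -> list[dict]:
    """Emulate one sh:minCount constraint: report each instance of target_class
    that has fewer than min_count values for the given predicate (path)."""
    # Focus nodes are all subjects typed with the target class (like sh:targetClass).
    focus_nodes = {s for s, p, o in triples if p == RDF_TYPE and o == target_class}
    violations = []
    for node in sorted(focus_nodes):
        count = sum(1 for s, p, _ in triples if s == node and p == path)
        if count < min_count:
            # Field names loosely mirror those of a SHACL validation result.
            violations.append({
                "focusNode": node,
                "resultPath": path,
                "severity": "Violation",
                "message": f"Less than {min_count} values for {path}",
            })
    return violations


if __name__ == "__main__":
    data = [
        ("http://example.org/ds1", RDF_TYPE, DCAT_DATASET),
        ("http://example.org/ds1", DCT + "title", "Luftqualität"),
    ]
    # ds1 has a title but no description, so only the description check fails.
    print(min_count_violations(data, DCAT_DATASET, DCT + "description", 1))
```

Real SHACL results carry more information (source shape, constraint component, value), which is part of what makes them useful in reports.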
At the start of this activity, GovData.de approached the Interoperability Test Bed to benefit from its experience in validating RDF, and specifically to make use of its RDF validation service. Following initial development and testing, this work resulted in the DCAT-AP.de validator, a public validator instance for different DCAT-AP.de profiles that is hosted by the Test Bed. This milestone, as well as subsequent development, resulted from close collaboration between GovData.de, supported by its service providers SEITENBAU and INIT, and the Test Bed’s experts. The validator was first brought online on March 12th, 2020, and has since been continuously updated to extend the validation options it provides. The validator’s configuration and underlying SHACL shapes are publicly available on GitHub.
As a complement to the public validator, GovData.de developed a data quality dashboard that informs data providers of the quality of the data they provide so that they can take corrective action if needed. For this purpose, the data ingestion process was extended with a second, on-premise RDF validator instance, used as an internal component for quality control and reporting. This second instance is called through its machine-to-machine API and integrates with internal triple stores: SPARQL queries are used both to read the data to validate and to enrich the resulting reports before these are stored for the dashboard. The dashboard was announced and released to data providers on April 13th, 2021.
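The sketch below outlines, under stated assumptions, how such an ingestion step could be wired together using only the Python standard library: a SPARQL CONSTRUCT query pulls one catalogue's datasets and distributions from a triple store (via the SPARQL 1.1 Protocol), the resulting Turtle is posted to a validator's REST API, and the validation results are enriched before storage. The endpoint URL, the validation type and the helper names (`dataset_query`, `validation_request`, `enrich`) are hypothetical. The JSON field names follow the Test Bed SHACL validator's REST API as we understand it and should be checked against the deployed instance's API documentation. The provider lookup is a plain dictionary standing in for the SPARQL-based enrichment described above.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical internal endpoint; GovData.de's actual setup is not described in the source.
VALIDATOR_API = "http://validator.internal/shacl/dcat-ap.de/api/validate"


def dataset_query(catalog_uri: str) -> str:
    """SPARQL CONSTRUCT query returning the triples of a catalogue's datasets and distributions."""
    return f"""PREFIX dcat: <http://www.w3.org/ns/dcat#>
CONSTRUCT {{ ?s ?p ?o }}
WHERE {{
  {{ <{catalog_uri}> dcat:dataset ?s }}
  UNION
  {{ <{catalog_uri}> dcat:dataset/dcat:distribution ?s }}
  ?s ?p ?o .
}}"""


def fetch_turtle(sparql_endpoint: str, query: str) -> str:
    """Run a CONSTRUCT query over the SPARQL 1.1 Protocol (form-encoded POST), asking for Turtle."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    req = urllib.request.Request(sparql_endpoint, data=data, headers={
        "Content-Type": "application/x-www-form-urlencoded",
        "Accept": "text/turtle",
    })
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def validation_request(turtle: str, validation_type: str) -> dict:
    """JSON body for the validator's REST API (field names assumed, see lead-in)."""
    return {
        "contentToValidate": turtle,      # the RDF content itself, passed inline
        "embeddingMethod": "STRING",      # content is embedded as a string, not a URL
        "contentSyntax": "text/turtle",
        "validationType": validation_type,
        "reportSyntax": "application/ld+json",
    }


def validate(turtle: str, validation_type: str) -> str:
    """POST the content to the validator and return the raw report."""
    body = json.dumps(validation_request(turtle, validation_type)).encode("utf-8")
    req = urllib.request.Request(VALIDATOR_API, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def enrich(results: list[dict], provider_of: dict[str, str]) -> list[dict]:
    """Attach the responsible data provider to each result (hypothetical enrichment step)."""
    return [{**r, "provider": provider_of.get(r.get("focusNode", ""), "unknown")}
            for r in results]
```

Keeping the query, request construction and enrichment as separate pure functions means they can be tested without a running triple store or validator; only `fetch_turtle` and `validate` perform network calls.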
As next steps, work continues on further validation artefacts, focusing notably on the upcoming DCAT-AP.de version 2.0. In addition, the data quality dashboard will continue to be extended, primarily by making further use of SHACL and SPARQL to enrich the RDF validator’s reports. The goal is to give data providers concrete assistance in resolving reported validation errors.
GovData.de’s use of the Test Bed’s RDF validator is an interesting example of a validator used both as a public community tool hosted on the Test Bed and as a separate internal quality control component, integrated on-premise into a data ingestion workflow. Quality control for data ingestion is an increasingly common need, for which the GovData.de case stands as a good reference.
Further information on the German Open Data activities and services can be found on the GovData.de portal. You are also invited to follow the GovData.de Twitter account for updates on the portal, DCAT-AP.de development or Open Data in Germany.
If you are new to the Test Bed’s RDF validator, you can find out more in the Test Bed’s RDF validation guide, with similar guides available for XML, JSON and CSV. Finally, details on the Test Bed itself can be found in its Joinup space, where its value proposition is a good starting point for newcomers. To stay up to date with the latest Test Bed news:
- Join the Test Bed’s community and subscribe to its news feed using your favourite RSS reader, or by managing your email notifications.
- Follow ISA²'s social media channels (Twitter, LinkedIn) for updates on the Test Bed and other interoperability solutions.