DCAT-AP: How to model Dataset series?



Issue

During the revision process of DCAT-AP in 2015, it was noted that the base specification of DCAT at W3C only considers relationships between a catalogue and the datasets described in the catalogue, and between a dataset and the distributions that represent the manifestations of the dataset.

The base specification is silent on any relationships between datasets, although many such relationships may exist. Two types are common: time series, e.g. annual budget files, and relationships in a geographic dimension, e.g. data from weather sensors in various locations that are combined into data for a wider geographic area.

Additionally, DCAT-AP allows relating datasets as ‘versions’ using dct:hasVersion/dct:isVersionOf, but it does not clearly describe in which cases to use these properties.

Current situation

In real-world implementations, these relationships may be modelled in different ways. In some implementations, (time) series are modelled as Distributions of a single Dataset; in others, as separate Datasets with or without links between them.

As a result, different providers may apply different approaches to very similar data, which may confuse users and make it difficult to compare data across providers.

Recommendation

Based upon consideration of existing practice and further discussion, the following approaches are suggested:

  • If users are mostly interested in the individual members of the series, it is recommended to describe them as separate Datasets. While DCAT itself and DCAT-AP do not specify a mechanism to express the relationship among such Datasets, the GeoDCAT Application Profile proposes the following approach:
      • One Dataset description is created with a dct:type of http://inspire.ec.europa.eu/metadata-codelist/ResourceType/series, linking to the members in the series using dct:hasPart;
      • For the individual members of the series, separate Dataset descriptions are created that can link back to the series using dct:isPartOf.
  • If users are mostly interested in the series as such, it is recommended to describe the members as multiple Distributions of a single Dataset. To indicate what each Distribution covers, the metadata for the Distributions may include temporal or spatial coverage (dct:temporal and dct:spatial), which helps users navigate to a particular file within the collection.
  • If user expectations are difficult to determine, create separate Datasets and one combined Dataset with the members as Distributions.
  • If you want to indicate precedence/sequence among different versions of a Dataset, DCAT-AP proposes the use of dct:hasVersion/dct:isVersionOf. Moreover, a versioning scheme should be put in place and version numbers should be assigned as the value of owl:versionInfo. adms:versionNotes can be used to describe the differences between a version and its previous one, or to indicate that a newer version is more valid than an older one.
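
As a minimal sketch of the versioning recommendation in the last point, two versions of a Dataset might be linked as follows. The version URIs, version numbers and note text are hypothetical and chosen only for illustration. The direction of the links assumes that the later version is described as a version of the earlier one, which is one possible reading of dct:hasVersion/dct:isVersionOf.

```xml
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dct="http://purl.org/dc/terms/"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:adms="http://www.w3.org/ns/adms#">

  <!-- Earlier version (hypothetical URI and version number) -->
  <rdf:Description rdf:about="http://dataportal.example.eu/datasets/EUBudget2016-v1.0">
    <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
    <dct:title xml:lang="en">EU Budget 2016</dct:title>
    <owl:versionInfo>1.0</owl:versionInfo>
    <!-- Points forward to the later version -->
    <dct:hasVersion rdf:resource="http://dataportal.example.eu/datasets/EUBudget2016-v1.1"/>
  </rdf:Description>

  <!-- Later version (hypothetical URI, version number and notes) -->
  <rdf:Description rdf:about="http://dataportal.example.eu/datasets/EUBudget2016-v1.1">
    <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
    <dct:title xml:lang="en">EU Budget 2016</dct:title>
    <owl:versionInfo>1.1</owl:versionInfo>
    <adms:versionNotes xml:lang="en">Corrects figures in version 1.0; this version supersedes it.</adms:versionNotes>
    <!-- Points back to the earlier version -->
    <dct:isVersionOf rdf:resource="http://dataportal.example.eu/datasets/EUBudget2016-v1.0"/>
  </rdf:Description>

</rdf:RDF>
```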

Rationale

In the absence of consensus on how to model temporal or spatial series, the recommendation intends to give advice that considers the issue from the user perspective and may lead to a more coherent environment that is understandable to users, while retaining flexibility in the approach followed by data providers.

Example

Annual budget data - focus on individual members of the series

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dct="http://purl.org/dc/terms/">

  <!-- The series, typed as a series and linking to its members -->
  <rdf:Description rdf:about="http://dataportal.example.eu/datasets/EUBudget">
    <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
    <dct:type rdf:resource="http://inspire.ec.europa.eu/metadata-codelist/ResourceType/series"/>
    <dct:title xml:lang="en">EU Budget Data</dct:title>
    <dct:hasPart rdf:resource="http://dataportal.example.eu/datasets/EUBudget2015"/>
    <dct:hasPart rdf:resource="http://dataportal.example.eu/datasets/EUBudget2016"/>
  </rdf:Description>

  <!-- Each member is a Dataset in its own right and links back to the series -->
  <rdf:Description rdf:about="http://dataportal.example.eu/datasets/EUBudget2015">
    <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
    <dct:title xml:lang="en">EU Budget 2015</dct:title>
    <dct:isPartOf rdf:resource="http://dataportal.example.eu/datasets/EUBudget"/>
  </rdf:Description>

  <rdf:Description rdf:about="http://dataportal.example.eu/datasets/EUBudget2016">
    <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
    <dct:title xml:lang="en">EU Budget 2016</dct:title>
    <dct:isPartOf rdf:resource="http://dataportal.example.eu/datasets/EUBudget"/>
  </rdf:Description>

</rdf:RDF>

Annual budget data - focus on the collection

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dct="http://purl.org/dc/terms/"
    xmlns:dcat="http://www.w3.org/ns/dcat#">

  <!-- A single Dataset for the whole collection, with one Distribution per year -->
  <rdf:Description rdf:about="http://dataportal.example.eu/datasets/EUBudget">
    <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
    <dct:title xml:lang="en">EU Budget Data</dct:title>
    <dcat:distribution rdf:resource="http://dataportal.example.eu/datasets/EUBudget2015"/>
    <dcat:distribution rdf:resource="http://dataportal.example.eu/datasets/EUBudget2016"/>
  </rdf:Description>

  <!-- The members of the series are described as Distributions -->
  <rdf:Description rdf:about="http://dataportal.example.eu/datasets/EUBudget2015">
    <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Distribution"/>
    <dct:title xml:lang="en">EU Budget 2015</dct:title>
  </rdf:Description>

  <rdf:Description rdf:about="http://dataportal.example.eu/datasets/EUBudget2016">
    <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Distribution"/>
    <dct:title xml:lang="en">EU Budget 2016</dct:title>
  </rdf:Description>

</rdf:RDF>
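
The recommendation also mentions that Distributions may carry temporal coverage so that users can find a particular file in the collection. A possible sketch for the 2015 Distribution is shown here. It assumes that the period is expressed as a dct:PeriodOfTime with schema:startDate and schema:endDate, as DCAT-AP does for Datasets; the dates themselves are illustrative assumptions.

```xml
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dct="http://purl.org/dc/terms/"
    xmlns:schema="http://schema.org/">

  <rdf:Description rdf:about="http://dataportal.example.eu/datasets/EUBudget2015">
    <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Distribution"/>
    <dct:title xml:lang="en">EU Budget 2015</dct:title>
    <!-- Temporal coverage of this file (assumed to be the calendar year 2015) -->
    <dct:temporal>
      <dct:PeriodOfTime>
        <schema:startDate rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2015-01-01</schema:startDate>
        <schema:endDate rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2015-12-31</schema:endDate>
      </dct:PeriodOfTime>
    </dct:temporal>
  </rdf:Description>

</rdf:RDF>
```

The same pattern could be repeated for the 2016 Distribution, and dct:spatial could be used in a similar way for series with a geographic dimension.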