Contextual metadata and data set provenance


EuroCris (Brigitte Joerg, Valerie Brasse, Nikos Houssos, Keith Jeffery, Jan Dvorak, and Miguel-Angel Sicilia) contributed the following: 

Section 3.1

The aspect of contextual metadata and data set provenance (besides accrual periodicity) is important and could be mentioned in the User Scenario. Users want to know which persons and organisations (e.g. in the roles of creator, maintainer, publisher) are or have been involved with a particular data set, who was the last to modify it, through which activity (e.g. a project) it was produced or is being maintained, etc. This was also captured as an important user requirement in the ENGAGE project evaluation activities. Two key aspects are (a) clear semantics of relationships (e.g. the label “maintainer” for a relationship between an organisation and a data set) and (b) temporal information (e.g. organisation X was the maintainer of data set Y between 01-01-2012 and 31-03-2013).
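
A minimal sketch of aspects (a) and (b), assuming a simple in-memory record: each link between an agent and a data set carries an explicit role label and a validity period. The class name, field names and identifiers are hypothetical and not part of DCAT or any existing profile.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RoleAssignment:
    """One agent-to-dataset link with an explicit role and validity period."""
    agent: str                   # person or organisation (hypothetical identifier)
    dataset: str                 # data set identifier (hypothetical)
    role: str                    # e.g. "creator", "maintainer", "publisher"
    start: date
    end: Optional[date] = None   # None means the role is still held

    def active_on(self, day: date) -> bool:
        """True if the role was held on the given day (bounds inclusive)."""
        return self.start <= day and (self.end is None or day <= self.end)


# The example from the text: X maintained Y from 01-01-2012 to 31-03-2013.
assignment = RoleAssignment("organisation-X", "dataset-Y", "maintainer",
                            date(2012, 1, 1), date(2013, 3, 31))
```

Keeping the role and the period on the relationship itself, rather than on the agent or the data set, is what allows the same organisation to appear in different roles over time.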






Thu, 04/04/2013 - 08:45

Strongly agreed. A typical use case may be a local government that uses its data portal to relate datasets internally (e.g. to align data owned by the mobility department with the geographical sectors used by the GIS service). This would require a search function that lists all datasets that are "maintained" by each of the city services.

Alternatively, when crowdsourcing data, citizens need to know which city service to contact for updates or corrections on a particular dataset. 
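
The two lookups described above could be sketched roughly as follows, assuming the portal can export its records as (dataset, role, agent) tuples. The sample records and function names are invented for illustration only.

```python
from collections import defaultdict

# Hypothetical catalogue records: (dataset, role, city service).
records = [
    ("traffic-counts", "maintainer", "Mobility Department"),
    ("parking-zones", "maintainer", "Mobility Department"),
    ("city-sectors", "maintainer", "GIS Service"),
    ("city-sectors", "publisher", "Open Data Office"),
]


def datasets_by_agent(records, role):
    """Group dataset identifiers by the agent holding the given role."""
    grouped = defaultdict(list)
    for dataset, rec_role, agent in records:
        if rec_role == role:
            grouped[agent].append(dataset)
    return dict(grouped)


def agents_for(records, dataset, role):
    """List the agents holding the given role for one dataset."""
    return [a for d, r, a in records if d == dataset and r == role]
```

The first function covers the internal listing per city service; the second covers the citizen who needs to know whom to contact about one dataset.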

Can we contribute by writing a scenario? If so, could you provide me with a few pointers or suggestions on how to go about this?

Thu, 04/04/2013 - 12:35

Please note that a comment was made on the Last Call Working Draft of DCAT that brings up the issue of provenance for Catalogue and Distribution (not Dataset!):

The comment is at

Thu, 04/04/2013 - 14:36

Thank you for pointing this out! Keeping provenance information for catalog records and distributions should do fine for these use cases.

Mon, 15/04/2013 - 18:15


Properties related to provenance are not currently defined in DCAT. There was a comment on the Last Call Working Draft of DCAT on this issue. Resolution of that comment by the GLD WG is pending.

Tue, 16/04/2013 - 23:55

@Thimo: Please feel free to refine the user scenario you mentioned for this requirement. I wonder, though, whether we really need provenance information on the CatalogRecord and Distribution. Many data portals will not be able to provide this. Taking into account temporal aspects, as suggested by EuroCris, will further increase complexity...

Wed, 17/04/2013 - 16:30

I do not agree with this proposal. Contextual and provenance metadata are needed for records management, not for "the cataloguing business". I see this request as a meta-metadata business need. You do not need this meta-metadata for exchanging catalogues, but for other reasons related to the legal and preservation requirements of the administrations.

In records management you usually have "business metadata" on one side and, on the other, separate, extremely rich metadata for archiving purposes that serves many different "contextual needs", such as (1) documenting the transfer between systems, which implies the physical transfer of documents (in our case the data exchange) and, equally important, the transfer of responsibility for record keeping; (2) setting the retention and disposition decisions legally established for the documents (in our case for the data; remember the "right to be forgotten" issue, or the many other legal issues about how long the data should be accessible?); (3) setting the digital preservation conditions of the data, which is a really complex sub-domain in its own right; (4) other record-keeping needs. See MoReq2010 at for detailed explanations on this (among many other authoritative sources).

If we try to define contextual metadata we risk falling into a dangerous pitfall: re-inventing an extremely rich and complex model that has been perfected "for some centuries" (by the way, "provenance" is one kind of contextual metadata among many others; in fact it constitutes a very strong archival principle).

Instead of adding contextual metadata to the DCAT Application Profile, I propose adding 0..* references to external metadata sources that serve needs beyond catalogue exchange, such as record-keeping management needs (among others to be identified). In this way, DCAT-based catalogues could re-use existing powerful metadata models.
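
To make the 0..* idea concrete, here is a rough sketch, assuming a catalogue entry simply carries an optional list of pointers to externally managed metadata. The class names, field names and the example.org URI are placeholders; no such property is currently defined in DCAT.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExternalMetadataRef:
    """Pointer to metadata kept outside the catalogue (hypothetical shape)."""
    purpose: str   # e.g. "records-management", "preservation"
    scheme: str    # name of the external model, e.g. "MoReq2010"
    uri: str       # where the external record lives (placeholder value below)


@dataclass
class DatasetEntry:
    identifier: str
    title: str
    # 0..* references: empty by default, any number may be added.
    external_metadata: List[ExternalMetadataRef] = field(default_factory=list)


entry = DatasetEntry("dataset-Y", "Example data set")
entry.external_metadata.append(
    ExternalMetadataRef("records-management", "MoReq2010",
                        "https://example.org/records/dataset-Y"))
```

The catalogue entry stays small, and the rich contextual model remains in the system that already maintains it.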

Fri, 26/04/2013 - 08:59

This use case has been included in Draft 2 (section 3.5).

Fri, 26/04/2013 - 14:02

Probably this is already implicit in the scenarios outlined by Enric, but I would like to stress the importance of provenance for assessing data quality. This is a fundamental issue for data that are used as a basis for policy making, as most government (and, often, also research) data are.

A possible alternative to Enric's proposal (i.e., adding references to external resources etc.) would be to recommend the use of the W3C Provenance Ontology (PROV-O) [1], which defines a generic framework for representing provenance information and can also be re-used to address scenarios not explicitly discussed in this thread (like the issue on dataset quality rating [2]).

NB: I don't see Enric's approach and the one proposed above as mutually exclusive. We could have either or both. The advantage of using PROV-O is to make provenance info directly accessible in a harmonised way. On the other hand, links to external sources concerning provenance, if available, are important exactly for the reason outlined by Enric - the re-use of existing and well-established metadata schemas.


Fri, 26/04/2013 - 14:06

Sorry, I forgot to mention that the W3C Provenance WG defined a set of mappings [1] that make provenance information expressed in Dublin Core interoperable with PROV-O.
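
As a rough illustration of how such a mapping could be applied, the sketch below rewrites Dublin Core statements as PROV-O statements. The table is a partial selection and an assumption on my part, to be checked against the mapping document itself; the `ex:` identifiers and the function name are hypothetical.

```python
# Dublin Core Terms properties and the PROV-O properties they are assumed
# to specialise (partial, illustrative selection; verify against the mapping).
DC_TO_PROV = {
    "dct:creator": "prov:wasAttributedTo",
    "dct:publisher": "prov:wasAttributedTo",
    "dct:contributor": "prov:wasAttributedTo",
    "dct:source": "prov:wasDerivedFrom",
}


def to_prov(triples):
    """Return the PROV-O statements implied by Dublin Core triples.

    Triples whose predicate has no entry in DC_TO_PROV are skipped.
    """
    return [(s, DC_TO_PROV[p], o) for s, p, o in triples if p in DC_TO_PROV]


dc_triples = [
    ("ex:dataset-Y", "dct:publisher", "ex:organisation-X"),
    ("ex:dataset-Y", "dct:title", "Example data set"),
]
```

Note that the role label (publisher, creator) is lost when only the generic PROV-O property is kept, so a portal would probably want to publish both the Dublin Core and the derived PROV-O statements.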

