PR20 - Add new property to Distribution with the main identifier

10/03/2015

Description

Add new property to Distribution with the main identifier

Proposed solution

Add optional property dct:identifier to Distribution with the main identifier (URI or other unique identifier in the context of the Catalog).

Component

Documentation

Category

improvement

Comments

Wed, 08/04/2015 - 10:58

DCAT proper defines such a property for Dataset (http://www.w3.org/TR/vocab-dcat/#Property:dataset_identifier), although DCAT does not give a strong justification for it: "having it represented explicitly is useful" (for what?)

The question is: will such a property be used in practice? Is there any evidence that such a property is currently used in data catalogues, and is there evidence that it is being used for a particular purpose?

Wed, 08/04/2015 - 17:00

Plenty of catalogues use an internal ID for a dataset (e.g. CKAN, Socrata, etc). The point of it is that the title of a dataset, and therefore its URL can change, but the ID would always stay the same, so you can keep track of the dataset.

 

Of course, in the linked data world we don't use IDs much: the URI is the identifier, and we prefer URIs to stay the same. My understanding is that if a URI had a spelling mistake and was corrected, there would be a way to express that the two URIs mean the same dataset (sameAs). But CKAN, Socrata etc. will probably never do it that way, so it is best to record their IDs.

 

So yes, let's include it in the DCAT-AP as an optional value, with the example as a CKAN id value.

Thu, 09/04/2015 - 14:51

David, dct:identifier is already specified as a property for Dataset in DCAT proper (http://www.w3.org/TR/vocab-dcat/#Property:dataset_identifier). DCAT does not specify it for any of the other Classes. 

The DCAT-AP mentions it as an optional property for Dataset.

The question is: do we want to make it an optional property for other classes (Catalog, CatalogRecord, Distribution) as well?

The property dct:identifier is still expected to carry a URI, and in particular the URI of the Dataset so it is duplicated. The RDF statement would be:

<http://example.org/dataset/001> dct:identifier "http://example.org/dataset/001"

For other types of identifiers, there is the optional property adms:identifier for Dataset.
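As a rough illustration of the two options mentioned above, the following Turtle sketch puts them side by side. The `adms:Identifier` node with `skos:notation` follows the usual ADMS pattern, and the value "hypothetical-local-id-001" is invented for this example; neither comes from the thread itself.

```turtle
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix adms: <http://www.w3.org/ns/adms#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

<http://example.org/dataset/001>
    # main identifier as a plain literal; here it simply repeats the Dataset URI
    dct:identifier "http://example.org/dataset/001" ;
    # hypothetical secondary identifier, modelled as an adms:Identifier node
    adms:identifier [
        a adms:Identifier ;
        skos:notation "hypothetical-local-id-001"
    ] .
```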

I don't understand your sentence "the title of a dataset, and therefore its URL can change". We're talking URIs here, not URLs, and URIs should not be dependent on titles. And: "Cool URIs don't change".

Fri, 17/04/2015 - 10:32

My two cents on this:

 

- AFAIK the range of dct:identifier is a Literal, not a URI

- I think David's point here is that several data catalogues use URLs and not URIs in practice, so they can't rely on those as unique and persistent IDs

Fri, 17/04/2015 - 10:48

Yes, the range of dct:identifier is rdfs:Literal. That's why there are double quotes in my example.

Fri, 17/04/2015 - 12:14

Yes, I saw the quotes in the example, but your textual explanation reads "The property dct:identifier is still expected to carry a URI, and in particular the URI of the Dataset so it is duplicated." That looks more like a personal opinion than anything from the current formal DCAT-AP specification, which reads "This property contains the main identifier for the Dataset, e.g. the URI or other unique identifier in the context of the Catalogue."

Wed, 22/04/2015 - 22:25

Use case has not been made clear. No change will be made.

Thu, 23/04/2015 - 06:35

Please could you make clear whether the dct:identifier has to be a URI or not? And if so, why is it there? As Makx said, it would be a duplicate of the dataset URI. And to be consistent, isn't it meta-meta information in the context of the local catalog, which should therefore be part of the catalog record?

Thu, 23/04/2015 - 07:56

There is no formal requirement that the string in dct:identifier is a URI. As Carlos points out it "contains the main identifier for the Dataset, e.g. the URI or other unique identifier in the context of the Catalogue". The base definition from DCMI is "unambiguous reference to the resource within a given context". DCAT has a usage note that says "might be used as part of the URI of the dataset, but still having it represented explicitly is useful". So the dct:identifier might be a URI, part of a URI or some other unique identifier in the context of the catalogue.

The URI of the Dataset is already included in the CatalogRecord as the object of foaf:primaryTopic.
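A minimal sketch of how this could look in Turtle, assuming a hypothetical record URI `http://example.org/record/001` and an identifier "001" that is only part of the Dataset URI (both chosen for illustration, not taken from any specification):

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# The CatalogRecord points at the Dataset URI via foaf:primaryTopic
<http://example.org/record/001> a dcat:CatalogRecord ;
    foaf:primaryTopic <http://example.org/dataset/001> .

# The Dataset carries its identifier as a literal, not as a URI reference
<http://example.org/dataset/001> a dcat:Dataset ;
    dct:identifier "001" .
```

In this arrangement the Dataset URI appears once as a resource, while dct:identifier holds whatever string the catalogue treats as its main identifier.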

Mon, 27/04/2015 - 16:50
