[Issue #05] Content of Datasets (DataElement)

23/05/2019

This issue follows from Issue #4, on how to describe the themes and the content of datasets. Issue #4 covers the thematic classification of datasets; this one covers how to specify their content, that is, whether datasets may be broken down into 'data elements' as in the Belgian Registry (http://vocab.belgif.be/ns/authsrc#):

  • Datasets are described as DataSource(s)
  • A DataSource contains (0..n) DataElement(s)
  • A DataElement describes what is contained within the dataset, including literals and the dcterms:type property with a set of SKOS concepts to be used (i.e., businesses, locations, and people).

[Figure: Overview of classes and relations]
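
To make the structure above concrete, here is a minimal Python sketch of a DataSource holding 0..n DataElements, each typed with SKOS concepts. It is only an illustration of the shape of the model: the field names (`label`, `types`, `elements`), the helper `concepts()` and the `http://example.org/` concept URIs are assumptions made for this example and are not taken from the Belgian vocabulary.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical concept URIs, standing in for the SKOS concepts
# (businesses, locations, people) mentioned in the description.
BUSINESS = "http://example.org/concept/business"
LOCATION = "http://example.org/concept/location"


@dataclass
class DataElement:
    """One piece of content found within a dataset."""
    label: str                                       # literal description
    types: List[str] = field(default_factory=list)   # SKOS concept URIs (dcterms:type)


@dataclass
class DataSource:
    """A dataset, described as a DataSource with 0..n DataElements."""
    title: str
    elements: List[DataElement] = field(default_factory=list)

    def concepts(self) -> Set[str]:
        """All concepts used by any DataElement of this DataSource."""
        return {t for e in self.elements for t in e.types}


registry = DataSource(
    title="Company register",
    elements=[
        DataElement("Registered company name", [BUSINESS]),
        DataElement("Registered office address", [BUSINESS, LOCATION]),
    ],
)
empty = DataSource(title="Placeholder")  # a DataSource may have no DataElements
```

Calling `registry.concepts()` gives the set of concepts covered by the dataset content, which is the kind of fine-grained lookup the proposals below are trying to enable.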

Proposals/options to discuss:

  1. Create a similar structure of (DataSource ->) DataElement to contain the relevant pieces of data within datasets.
  2. Create a controlled fine-grained scheme to classify those topics (aligned with Eurovoc).
  3. Use Eurovoc terms directly to classify the topics.
  4. Use dcterms:type to classify Datasets directly.
  5. Create a new sub-property of dcterms:subject/dcat:theme to classify the fine-grained elements.

 

Comments

Mon, 27/05/2019 - 17:00

This proposal would benefit from a pause and consideration of the changes being proposed by DCAT Version 2 (because in the Belgian model :DataSource is a subClass of dcat:Dataset [this is DCAT Version 1] - see https://vocab.belgif.be/ns/authsrc#DataSource ).  

There are significant changes from version 1 for dcat:Dataset.

From https://www.w3.org/TR/vocab-dcat-2/#changes-since-20140116, Class: Dataset: "In DCAT 2014 [VOCAB-DCAT-20140116], dcat:Dataset was a sub-class of dctype:Dataset, which is a term of the DCMI Types vocabulary [DCTERMS]. This relationship has been removed in the revised DCAT vocabulary - see Issue #98."

[note that the Editor's Draft - https://w3c.github.io/dxwg/dcat/ - is now only going to have stylistic changes.  It is effectively completed]

In DCAT v2, a dcat:Resource dct:conformsTo some specification, and it is there that the model or set of constraints describing the data elements that make up the resource is placed. This is perhaps the way to go for describing a dataset: the specification can be an XML schema, a spreadsheet of element names and descriptions, a UML diagram, or anything else that can be identified by a URI.

Regarding proposal #1, it is probably worth discussing the merits of an approach similar to UN/CEFACT, where the "core components" can be either basic or aggregate business information entities. This means that a DataElement can contain other DataElements, and at each level there might be a link to a skos:Concept [or any other theme classifier].
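
The nesting idea can be sketched roughly as follows in Python. This is an illustration only, not UN/CEFACT's actual model: the class layout, the `children` and `concept` fields, the `walk()` helper and all `http://example.org/` URIs are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class DataElement:
    """A data element that is either basic (no children) or aggregate."""
    name: str
    concept: Optional[str] = None   # URI of a skos:Concept, if one is linked
    children: List["DataElement"] = field(default_factory=list)

    def is_aggregate(self) -> bool:
        """True when this element contains other DataElements."""
        return bool(self.children)

    def walk(self) -> Iterator["DataElement"]:
        """Yield this element and every nested element, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


# Hypothetical example: a Company element aggregating an Address element.
address = DataElement(
    "Address",
    concept="http://example.org/concept/location",
    children=[DataElement("Street"), DataElement("PostalCode"), DataElement("City")],
)
company = DataElement(
    "Company",
    concept="http://example.org/concept/business",
    children=[DataElement("Name"), address],
)

names = [e.name for e in company.walk()]
concepts = {e.concept for e in company.walk() if e.concept}
```

A concept link is optional at every level here, so a classifier can be attached to the aggregate, to a basic element, or to both.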

 

 

Mon, 08/07/2019 - 10:57

We need a standardized way to describe the content (the data elements) in a dataset, in particular 1) whether a DataElement isAuthoritative and 2) the Concept that a DataElement represents (the meaning of the DataElement). The specification that a dataset conformsTo could contain anything, so dct:conformsTo alone will not give us a standardized way of describing these aspects.

How about letting Dataset hasPart DataElement (Dataset as in DCAT2, a subclass of dcat:Resource)?
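
A rough Python sketch of the difference, using plain tuples as triples rather than a real RDF library. dct:conformsTo, dct:hasPart and dcat:Dataset are existing terms; everything under `http://example.org/`, including the `isAuthoritative` and `concept` properties, is a hypothetical placeholder for whatever terms would actually be agreed.

```python
DCT = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace for this sketch

DATASET = EX + "dataset/companies"

triples = {
    (DATASET, RDF_TYPE, DCAT + "Dataset"),
    # conformsTo only points at a specification; its content is opaque here.
    (DATASET, DCT + "conformsTo", EX + "spec/companies.xsd"),
    # hasPart exposes each data element as a describable resource.
    (DATASET, DCT + "hasPart", EX + "element/name"),
    (DATASET, DCT + "hasPart", EX + "element/website"),
    (EX + "element/name", EX + "isAuthoritative", True),
    (EX + "element/name", EX + "concept", EX + "concept/business"),
    (EX + "element/website", EX + "isAuthoritative", False),
}


def objects(subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def authoritative_elements(dataset):
    """Data elements of a dataset that are flagged as authoritative."""
    return sorted(
        element
        for element in objects(dataset, DCT + "hasPart")
        if True in objects(element, EX + "isAuthoritative")
    )


specs = objects(DATASET, DCT + "conformsTo")
authoritative = authoritative_elements(DATASET)
```

With hasPart, both questions (is the element authoritative, and which concept does it represent) can be answered from the metadata itself, whereas the conformsTo link only returns the URI of a document that would still have to be interpreted.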
