[Issue #04] Subject of Datasets

Published on: 15/05/2019
Last update: 29/05/2019
Discussion

W3C DCAT (and DCAT-AP) defines a property to indicate the thematic scope of datasets: dcat:theme, a sub-property of dcterms:subject. DCAT-AP recommends the use of the 13-term Data Theme Taxonomy defined by the EU Publications Office: http://publications.europa.eu/resource/authority/data-theme

CODE Themes
AGRI Agriculture, fisheries, forestry and food
ECON Economy and finance
EDUC Education, culture and sport
ENER Energy
ENVI Environment
GOVE Government and public sector
HEAL Health
INTR International issues
JUST Justice, legal system and public safety
REGI Regions and cities
SOCI Population and society
TECH Science and technology
TRAN Transport

A dataset may have zero or more dcat:theme properties, so a dataset covering several topics may be described using more than one dcat:theme property. For instance, a registry of persons and vehicles would be classified under both TRAN and SOCI.
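
As an illustration of the multi-theme case above, here is a minimal Python sketch that builds a Turtle description of a dataset with zero or more dcat:theme values. It assumes that each theme concept URI is formed by appending the code to the base URI of the authority table; the dataset URI, the title and the function names are hypothetical and only serve the example.

```python
# Sketch only: the dataset URI and helper names are illustrative assumptions.
# Assumption: concept URIs are the authority-table base URI plus the theme code.
DATA_THEME_BASE = "http://publications.europa.eu/resource/authority/data-theme/"

# The 13 codes and labels listed in the table above.
THEMES = {
    "AGRI": "Agriculture, fisheries, forestry and food",
    "ECON": "Economy and finance",
    "EDUC": "Education, culture and sport",
    "ENER": "Energy",
    "ENVI": "Environment",
    "GOVE": "Government and public sector",
    "HEAL": "Health",
    "INTR": "International issues",
    "JUST": "Justice, legal system and public safety",
    "REGI": "Regions and cities",
    "SOCI": "Population and society",
    "TECH": "Science and technology",
    "TRAN": "Transport",
}


def theme_uri(code: str) -> str:
    """Return the concept URI for a theme code, rejecting unknown codes."""
    if code not in THEMES:
        raise ValueError(f"Unknown data theme code: {code}")
    return DATA_THEME_BASE + code


def dataset_turtle(dataset_uri: str, title: str, codes: list[str]) -> str:
    """Serialise a dcat:Dataset with zero or more dcat:theme statements."""
    # Escape backslashes and double quotes so the title is a valid Turtle literal.
    escaped = title.replace("\\", "\\\\").replace('"', '\\"')
    statements = ["a dcat:Dataset", f'dct:title "{escaped}"@en']
    # One dcat:theme statement per code; an empty list adds none.
    statements += [f"dcat:theme <{theme_uri(code)}>" for code in codes]
    body = " ;\n    ".join(statements)
    return (
        "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
        "@prefix dct: <http://purl.org/dc/terms/> .\n\n"
        f"<{dataset_uri}> {body} .\n"
    )


if __name__ == "__main__":
    # The registry of persons and vehicles from the text: TRAN and SOCI.
    print(dataset_turtle(
        "http://example.org/dataset/persons-and-vehicles",  # hypothetical URI
        "Registry of persons and vehicles",
        ["TRAN", "SOCI"],
    ))
```

Passing an empty list of codes yields a dataset description without any dcat:theme statement, which matches the "zero or more" cardinality.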

There is no discussion on the use of dcat:theme and the 13-term taxonomy for datasets, but this high-level classification does not seem sufficient to classify master data.

Proposals/options to discuss:

  1. Use only this 13-term taxonomy
  2. Create a controlled fine-grained scheme to classify those topics (aligned with Eurovoc)
  3. Use Eurovoc terms directly to classify the topics.
  4. Use dcterms:type to classify Datasets directly using other schemes.
  5. Create a new sub-property of dcterms:subject/dcat:theme to classify the fine-grained elements.

This issue originally also covered how to describe the elements of datasets. A separate issue was created to deal with that topic: see [Issue #05] Content of Datasets (DataElement).

Comments are welcome!

Comments

Fotis Zygoulis (not verified)
Wed, 22/05/2019 - 13:33

Is it obligatory to use Eurovoc terminology to classify the topics?

Bart Hanssens (not verified)
Wed, 22/05/2019 - 16:37

I'd recommend splitting this into two issues: one on whether or not to create a DataElement (something I'd recommend) and another on the thesaurus to be used to classify the datasources.

For the classification I'd suggest using dcat:theme as the property and suggesting (but not requiring, nor restricting to) the EU data (portal) themes. I'm not against Eurovoc, on the contrary, but some users may feel a bit overwhelmed by such a large thesaurus.

In addition, regardless of the outcome of this issue, it would be nice if the EU Publications Office could publish a SKOS mapping to match the MDR data themes to Eurovoc terms (it would be very useful in other domains / projects as well).

Thu, 23/05/2019 - 17:22

Thanks for your comments. This issue was already split in two, as suggested.

The current one remains as [Issue #04] Subject of Datasets (thematic taxonomy).

A new one, [Issue #05], was created to discuss the DataElement proposal.

Jim Yang (not verified)
Mon, 27/05/2019 - 08:09

Support creating a DataElement (which is already done).

Mon, 27/05/2019 - 09:42

Fotis, Eurovoc is not mandatory. It is perhaps too detailed for an easy-to-use solution, but it should definitely be recommended in some way.

+1 to Bart's proposal for a mapping of MDR dataset themes <-> Eurovoc. This link to Eurovoc would be really helpful.

Tue, 03/09/2019 - 09:55

CLOSED ISSUE. Decision: to use Eurovoc + Themes NAL.