W3C DCAT (and DCAT-AP) defines a property to indicate the thematic scope of datasets: dcat:theme (a sub-property of dcterms:subject). DCAT-AP recommends the use of the 13-term Data Theme Taxonomy defined by the EU Publications Office: http://publications.europa.eu/resource/authority/data-theme
CODE | Themes |
---|---|
AGRI | Agriculture, fisheries, forestry and food |
ECON | Economy and finance |
EDUC | Education, culture and sport |
ENER | Energy |
ENVI | Environment |
GOVE | Government and public sector |
HEAL | Health |
INTR | International issues |
JUST | Justice, legal system and public safety |
REGI | Regions and cities |
SOCI | Population and society |
TECH | Science and technology |
TRAN | Transport |
A dataset may have zero or more dcat:theme values, so a dataset that covers several topics can be described with more than one dcat:theme property. For instance, a registry of persons and vehicles would be classified under both TRAN and SOCI.
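As a minimal sketch of the multi-theme case, the Python below builds the theme URIs for the persons-and-vehicles example and prints a Turtle description of the dataset. It assumes that each concept URI is the authority URI followed by the code; the dataset URI `http://example.org/dataset/persons-vehicles` and the helper names `theme_uri` and `dataset_turtle` are hypothetical and only serve the illustration.

```python
# Base URI of the EU Publications Office Data Theme authority.
THEME_BASE = "http://publications.europa.eu/resource/authority/data-theme"

# The 13 codes of the Data Theme Taxonomy listed in the table.
THEME_CODES = {
    "AGRI", "ECON", "EDUC", "ENER", "ENVI", "GOVE", "HEAL",
    "INTR", "JUST", "REGI", "SOCI", "TECH", "TRAN",
}


def theme_uri(code: str) -> str:
    """Return the concept URI for a theme code, rejecting unknown codes."""
    if code not in THEME_CODES:
        raise ValueError(f"unknown data theme code: {code!r}")
    return f"{THEME_BASE}/{code}"


def dataset_turtle(dataset_uri: str, codes: list[str]) -> str:
    """Serialise a dcat:Dataset with one dcat:theme value per code as Turtle."""
    lines = [
        "@prefix dcat: <http://www.w3.org/ns/dcat#> .",
        "",
        f"<{dataset_uri}> a dcat:Dataset",
    ]
    if codes:
        # Several objects for the same predicate are separated by commas.
        themes = " ,\n        ".join(f"<{theme_uri(c)}>" for c in codes)
        lines[-1] += " ;"
        lines.append(f"    dcat:theme {themes}")
    lines[-1] += " ."
    return "\n".join(lines)


# Hypothetical dataset URI, for illustration only.
print(dataset_turtle("http://example.org/dataset/persons-vehicles", ["TRAN", "SOCI"]))
```

Passing an empty list yields a dataset with no dcat:theme at all, which matches the "zero or more" cardinality.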
The use of dcat:theme and the 13-term taxonomy for datasets is not under discussion, but this high-level classification does not seem sufficient to classify master data.
Proposals/options to discuss:
- Use only this 13-term taxonomy.
- Create a controlled fine-grained scheme to classify those topics (aligned with Eurovoc).
- Use Eurovoc terms directly to classify the topics.
- Use dcterms:type to classify Datasets directly using other schemes.
- Create a new sub-property of dcterms:subject/dcat:theme to classify the fine-grained elements.
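To illustrate the last option, here is a rough sketch of why a sub-property would remain compatible with consumers that only understand dcat:theme: applying the RDFS sub-property rule derives a dcat:theme triple (and in turn a dcterms:subject triple) from every use of the finer-grained property. The property `ex:fineTheme` and the other `ex:` names are hypothetical; nothing in this issue defines them.

```python
# Triples are (subject, predicate, object) tuples of prefixed names.
SUBPROPERTY_OF = "rdfs:subPropertyOf"


def infer_superproperties(triples):
    """Apply RDFS rule rdfs7 until nothing new is derived:
    (p rdfs:subPropertyOf q) and (s p o) entail (s q o)."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        sub_props = {(s, o) for s, p, o in inferred if p == SUBPROPERTY_OF}
        for s, p, o in list(inferred):
            for sub, sup in sub_props:
                if p == sub and (s, sup, o) not in inferred:
                    inferred.add((s, sup, o))
                    changed = True
    return inferred


graph = {
    # Stated by DCAT itself.
    ("dcat:theme", SUBPROPERTY_OF, "dcterms:subject"),
    # Hypothetical fine-grained property and sample data.
    ("ex:fineTheme", SUBPROPERTY_OF, "dcat:theme"),
    ("ex:dataset1", "ex:fineTheme", "ex:vehicleRegistration"),
}

closure = infer_superproperties(graph)
for triple in sorted(closure - graph):
    print(triple)
```

Whether a fine-grained concept should also count as a dcat:theme value is exactly the design question this option raises; the sketch only shows the mechanical consequence of declaring the sub-property.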
This issue originally also covered how to describe the elements of datasets. A separate issue was created to deal with that topic; see [Issue #05] Content of Datasets (DataElement).
Comments are welcome!
Comments
Is it obligatory to use Eurovoc terminology to classify the topics?
I'd recommend splitting this into two issues: one on whether or not to create a DataElement (something I'd recommend), and another on the thesaurus to be used to classify the data sources.
For the classification, I'd suggest using dcat:theme as the property and suggesting (but not requiring, nor restricting to) the EU data (portal) themes. I'm not against Eurovoc, on the contrary, but some users may feel a bit overwhelmed by such a large thesaurus.
In addition, regardless of the outcome of this issue, it would be nice if the EU Publications Office could publish a SKOS mapping to match the MDR data themes to Eurovoc terms (it would be very useful in other domains and projects as well).
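As a rough idea of what such a mapping file could contain, the sketch below writes SKOS mapping triples from data theme concepts to target concepts as Turtle. No such mapping is given in this thread, so the target URI is a placeholder under example.org rather than a real Eurovoc identifier, the choice of mapping property is an assumption, and the helper name `mapping_turtle` is hypothetical.

```python
THEME_BASE = "http://publications.europa.eu/resource/authority/data-theme"

# The five mapping properties defined by SKOS.
SKOS_MAPPING_PROPERTIES = {
    "exactMatch", "closeMatch", "broadMatch", "narrowMatch", "relatedMatch",
}


def mapping_turtle(mappings):
    """Serialise (theme_code, skos_property, target_uri) tuples as Turtle."""
    lines = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> .", ""]
    for code, prop, target in mappings:
        if prop not in SKOS_MAPPING_PROPERTIES:
            raise ValueError(f"not a SKOS mapping property: {prop!r}")
        lines.append(f"<{THEME_BASE}/{code}> skos:{prop} <{target}> .")
    return "\n".join(lines)


# Placeholder target URI: NOT a real Eurovoc concept identifier.
example = [
    ("TRAN", "narrowMatch", "http://example.org/eurovoc-placeholder/road-transport"),
]
print(mapping_turtle(example))
```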
Thanks for your comments. This issue was already split in two, as suggested.
The current one remains as [Issue #04] Subject of Datasets (thematic taxonomy).
A new one, [Issue #05], was created to discuss the DataElement proposal.
Support creating a DataElement (which is already done).
Fotis, Eurovoc is not mandatory. It is perhaps too detailed for an easy-to-use solution, but it should definitely be recommended in some way.
+1 to Bart's proposal for a mapping of MDR dataset themes <-> Eurovoc. This link to Eurovoc would be really helpful.
CLOSED ISSUE. Decision: to use Eurovoc + Themes NAL.