Based on the discussion during the DCAT-AP workshop in Rome (13/05/2016), we extracted the following key points:
The participants discussed the different approaches for expressing relationships between datasets as proposed by the guideline on Joinup.
Some participants were concerned that one of the workarounds proposed by the guidelines, i.e. describing members of a group as different distributions of a dataset, might create confusion. Is this use of Distribution compliant with its definition in DCAT?
In the current situation, i.e. with the current DCAT and DCAT-AP versions, expressing relationships requires a lot of (manual) work.
Other workarounds proposed by the guideline might also create confusion. For example, the guideline proposes using the hasPart and isPartOf attributes on the Dataset class. For some participants of the discussion, this could create confusion if the difference between a dataset and a catalogue is no longer clear.
Regarding versioning, the participants noted that it should be possible to link a new dataset directly to its previous version.
The participants think that the best solution would be to extend DCAT first and then DCAT-AP, so that both offer an option to express relationships between datasets.
There are two issues here that I'd like to comment on:
1. describing members of a group as different distributions of a dataset
This issue was discussed at length during the development of DCAT-AP (see https://joinup.ec.europa.eu/discussion/mo12-grouping-datasets). Various points of view were expressed, including the view that it would not make sense to model a dataset that has been split up into different time periods or locations as different dcat:Datasets. The DCAT-AP working group also concluded that DCAT itself does not give guidance on whether distributions necessarily contain the same data in different formats. DCAT is silent on that, which means that differing opinions on this point cannot really be resolved. This is a point that will certainly come up if and when W3C starts work on revising DCAT.
A possible example that indicates that it might be reasonable to have a single dataset with distributions that contain different data is the budget of a multi-annual programme. This might be managed as a single budget (e.g. in a spreadsheet workbook) with the individual years as worksheets in the workbook. The budget is managed as a whole and users can be expected to be interested primarily in the overall view. So it makes sense to publish this as a single dataset. This is obvious if the distribution is the spreadsheet workbook as a single file. If the publisher also distributes the information in CSV, the worksheets will be exported as separate files. It seems to me that it would then still be reasonable to treat the separate files as distributions of the single dataset, and not oblige the publisher to create separate datasets for what they perceive as a single entity.
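To make the budget example concrete, here is a rough sketch of how such a description could look as plain subject/predicate/object triples. It is only an illustration under assumptions: the example.org URIs, the file names and the years 2014-2016 are invented, and the helper budget_triples is hypothetical. Only dcat:Dataset, dcat:Distribution, dcat:distribution, dcat:downloadURL and dct:title are taken from the vocabularies.

```python
# Hypothetical identifiers throughout: the example.org URIs, file names and
# years are illustrative only and do not come from any real catalogue.
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def budget_triples(base, years):
    """Describe one dcat:Dataset with a workbook plus one CSV file per year."""
    dataset = base + "dataset/programme-budget"
    triples = [(dataset, RDF_TYPE, DCAT + "Dataset")]

    # The whole workbook as a single distribution.
    files = [("budget.xlsx", "Budget workbook (all years)")]
    # One CSV export per worksheet, i.e. per year.
    files += [(f"budget-{y}.csv", f"Budget {y} (CSV)") for y in years]

    for name, title in files:
        dist = base + "distribution/" + name
        triples.append((dataset, DCAT + "distribution", dist))
        triples.append((dist, RDF_TYPE, DCAT + "Distribution"))
        # Titles are kept as plain strings here; in RDF they would be literals.
        triples.append((dist, DCT + "title", title))
        triples.append((dist, DCAT + "downloadURL", base + "files/" + name))
    return triples


triples = budget_triples("http://example.org/", [2014, 2015, 2016])
datasets = {s for s, p, o in triples if p == RDF_TYPE and o == DCAT + "Dataset"}
distributions = [o for s, p, o in triples if p == DCAT + "distribution"]
print(len(datasets), "dataset,", len(distributions), "distributions")
```

The publisher's view is preserved: there is one dataset, and the per-year CSV files sit next to the workbook as further distributions of it.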
Given the situation that publishers use different approaches, the guidelines try to make people think from the perspective of practical use, rather than from theoretical considerations.
2. using the hasPart and isPartOf attributes on the Dataset class
I do not understand how the use of these properties both on Catalog and on Dataset creates confusion. The semantics of Catalog and Dataset are different and do not change because of using the same properties. After all, other properties are also used on both, e.g. dct:title, dct:publisher, dct:description etc. Applied to Catalog, dct:hasPart indicates that there is another Catalog that is part of this Catalog; applied to Dataset it says that there is another Dataset that is part of this Dataset. Where is the confusion?
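As a small illustration of this point, the sketch below puts dct:hasPart on a Catalog and on a Dataset and then reads back the types on both sides of each statement. The resources under example.org and the helper functions are invented for the example; the triples are held as plain Python tuples rather than in an RDF library.

```python
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace, for illustration only

triples = [
    # A catalogue that has another catalogue as a part.
    (EX + "catalog/national", RDF_TYPE, DCAT + "Catalog"),
    (EX + "catalog/regional", RDF_TYPE, DCAT + "Catalog"),
    (EX + "catalog/national", DCT + "hasPart", EX + "catalog/regional"),
    # A dataset that has another dataset as a part.
    (EX + "dataset/census", RDF_TYPE, DCAT + "Dataset"),
    (EX + "dataset/census-2011", RDF_TYPE, DCAT + "Dataset"),
    (EX + "dataset/census", DCT + "hasPart", EX + "dataset/census-2011"),
]


def types_of(resource):
    """Return the set of rdf:type values stated for a resource."""
    return {o for s, p, o in triples if s == resource and p == RDF_TYPE}


def part_pairs():
    """Yield (types of the whole, types of the part) per dct:hasPart statement."""
    for s, p, o in triples:
        if p == DCT + "hasPart":
            yield types_of(s), types_of(o)


for whole, part in part_pairs():
    print(sorted(whole), "hasPart", sorted(part))
```

The class of each resource comes from its rdf:type statement, not from the property, so a consumer can always tell a catalogue-level part relation from a dataset-level one.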
I have an additional question: why is "has part" 0..n while "is part of" is only 0..1? That implies that catalogs can have multiple parts, but each part can only ever belong to one catalog. Is this correct?
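To show what the question amounts to in practice, here is a minimal sketch of a check that flags any resource with more than one dct:isPartOf value. It assumes the cardinalities are as quoted in the question (0..1 for "is part of"); the function is_part_of_violations and the example.org catalogues are hypothetical.

```python
from collections import defaultdict

DCT = "http://purl.org/dc/terms/"
EX = "http://example.org/"  # hypothetical namespace, for illustration only


def is_part_of_violations(triples, max_count=1):
    """Map each subject with more than max_count dct:isPartOf values to them."""
    parents = defaultdict(set)
    for s, p, o in triples:
        if p == DCT + "isPartOf":
            parents[s].add(o)
    return {s: sorted(ps) for s, ps in parents.items() if len(ps) > max_count}


triples = [
    # A city catalogue harvested into both a regional and a thematic catalogue.
    (EX + "catalog/city", DCT + "isPartOf", EX + "catalog/regional"),
    (EX + "catalog/city", DCT + "isPartOf", EX + "catalog/thematic"),
    # A catalogue that belongs to exactly one parent.
    (EX + "catalog/village", DCT + "isPartOf", EX + "catalog/regional"),
]

violations = is_part_of_violations(triples)
print(violations)
```

Under a strict 0..1 reading, the city catalogue in this sketch would be flagged, although being part of two larger catalogues does not seem an unreasonable situation.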