StatDCAT-AP respects the conformance requirements defined for DCAT-AP version 1.1 (https://joinup.ec.europa.eu/release/dcat-ap-v11), which means that it will have, at least, the same mandatory classes and mandatory properties as DCAT-AP 1.1. StatDCAT-AP may extend DCAT-AP by specifying additional properties, as long as they are reused from existing RDF vocabularies. StatDCAT-AP may also provide usage notes for properties that already exist in DCAT-AP with specific recommendations for their use for statistical data.
During discussion with stakeholders it was suggested that it would be useful if quality aspects of datasets could be expressed.
Participants in this activity are asked to respond to the following questions:
- What kinds of quality information would be useful to express?
- Is quality information available in existing statistical systems and applications?
- How will exposing this information to general data portals enhance the discoverability of statistical datasets?
- Do you know of any existing RDF vocabularies that could be used to express this information?
Component: Documentation
Category: feature
Comments
An option could be to use some properties from the Data Quality Vocabulary that is currently under development at W3C in the Data on the Web Working Group.
A simple way may be to add dqv:hasQualityAnnotation that has a range of dqv:QualityAnnotation, a subclass of oa:Annotation. DQV also uses dct:conformsTo which is already in DCAT-AP and is also, maybe more implicitly, a measure of quality.
On the issue of quality, another source of inspiration could be the list of quality-related concepts in EURO-SDMX Metadata Structure (ESMS). Some of those (e.g. Data description, Classification system, Reference area, Time coverage, Frequency of dissemination) are already covered elsewhere in DCAT-AP but many are not.
Two concepts have been separately proposed as additions (Statistical unit and Statistical population) but maybe there are others that could be considered useful for discovery of statistical datasets in general data portals.
We welcome suggestions for further additions in this area.
On our Statistics.gov.scot open data platform we've tried to model some of the ESS quality dimensions as RDF. Our aim was to provide some human readable metadata that would help users to understand the quality of the data.
An example of how we've done it is:
http://statistics.gov.scot/def/statistical-quality/accuracy-and-reliability
An example of a dataset containing some of these descriptions is (click the "About" tab to see them):
http://statistics.gov.scot/data/scottish-health-survey
I think it's useful information to provide to help users understand how the data can be used. I don't think it will enhance the discoverability of the data though.
The WG decided in the meeting of 13 May to discuss this issue further. It was noted that quality is an important aspect, but given the time constraint, the working group should decide whether a quality aspects extension will be included in the first version of the StatDCAT-AP, or in the second version (i.e. version 1.1, or version 2.0).
We are soliciting opinions and comments from the community on this issue.
Gregor, if I understand correctly, the properties defined in your ontology all are text-based. Would it be sensible to think these could be subproperties of dct:description?
Makx, before going into the reuse of RDF terms, I believe that your first question is key.
For example, accuracy and reliability mentioned by Gregor, but also things like the data collection method, the sample, the statistical population etc. can all be determinants of quality.
The experts are invited to provide their input.
There are eight aspects of quality in the Scottish Statistical Quality Ontology (http://statistics.gov.scot/def/statistical-quality/ontology) of which Accuracy and Reliability is but one.
Another list of quality aspects is included in the ESMS structure.
At this point in the process, I think we need to decide on one of three courses of action:
Approach 1 is easy but does not reflect the opinions in the group that quality information is very important. Approach 3 may require more discussion and more time than we have at this stage of the work.
Would approach 2 be acceptable? In a further revision, we may then create subproperties of the 'generic' property.
Comments welcome.
Yes, that might be a better way to describe them. I've been speaking to Peter Winstanley in the Scottish Government to think about how we could better represent the quality metadata on statistics.gov.scot.
Comment by Chris Nelson
Or it could be a link to the quality metadata as provided by a webpage,
e.g. http://dsbb.imf.org/Pages/SDDS/DQAFBase.aspx?ctycode=LUX&catcode=EMP00
or Eurostat quality metadata as published on the web for a data set.
Then you just need one property with a URL.
Chris, the drawback of this solution is that it suffers in granularity and machine-readability.
However, it does provide information about the quality determinants, e.g. methodology, serviceability, accuracy and reliability, and accessibility.
Could those be used as input to point 3 of Makx?
>>3. Add a set of properties based on an existing list of aspects to StatDCAT-AP. Possible sources for such a list are ESMS and the Scottish Statistical Quality Ontology.
Comment by Marco Pellegrino
In the ESS, we have an exhaustive framework for quality reporting, widely discussed, agreed upon and used by member States (http://ec.europa.eu/eurostat/web/quality/quality-reporting).
This is embedded into the ESQRS, which is, in SDMX terms, another MSD, different from ESMS. The distinction between the two is that ESMS is general, oriented towards dissemination and the public, while ESQRS reports are sometimes restricted and are more oriented to data producers. But the vocabulary, implicitly, exists.
In principle, I would agree to option 3, but this would require a detailed description of all the properties now. Furthermore, different providers (Eurostat/ESS, OECD, IMF, other entities) might still structure quality info differently.
I agree that option 2 (or the URL solution brought forward by Chris) has the drawbacks that Nikos indicated, but it is a first solution that indicates the need for further improvements in a version 2. Actually, I would say that it would "oblige" us to move towards a version 2. It would allow some experimentation in the short run. We would start using it at Eurostat.
Option 1 is clear and clean, but too radical: it entails the risk of doing nothing now and just waiting for the next version. Are you sure we are going to have a new version?
Proposed resolution: Extension of DCAT-AP with DQV property dqv:hasQualityAnnotation
The resolution proposes alignment with the emerging W3C Data Quality Vocabulary (https://www.w3.org/TR/vocab-dqv/), which “provides a framework in which the quality of a dataset can be described, whether by the dataset publisher or by a broader community of users”.
The property dqv:hasQualityAnnotation has a range that is a subclass of oa:Annotation from the Open Annotation Model (https://www.w3.org/ns/oa), which allows annotations to be either embedded text or an external resource identified by a URI.
This would allow expression of quality information either as text or as a link to a document or webpage with details about the quality of the dataset.
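To make the two options concrete, here is a rough sketch in Python that prints the Turtle a portal might emit for either case. It is only an illustration under assumptions, not part of the proposed specification. The function names and the example.org URIs are hypothetical. The use of oa:hasTarget, oa:hasBody and oa:TextualBody with rdf:value follows the W3C Web Annotation Vocabulary as I understand it; the exact pattern StatDCAT-AP would recommend has not been decided.

```python
# Minimal sketch: render a dqv:hasQualityAnnotation statement as Turtle text.
# All URIs passed in below are hypothetical placeholders (example.org).

PREFIXES = """\
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dqv:  <http://www.w3.org/ns/dqv#> .
@prefix oa:   <http://www.w3.org/ns/oa#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
"""


def escape_literal(text):
    """Escape backslashes, double quotes and newlines for a Turtle string."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def quality_annotation(dataset_uri, annotation_uri, text=None, link=None):
    """Return Turtle linking a dataset to one quality annotation.

    Exactly one of `text` (embedded note) or `link` (external page) is expected.
    """
    if (text is None) == (link is None):
        raise ValueError("provide exactly one of text or link")
    if text is not None:
        # Assumption: embedded text is modelled as an oa:TextualBody.
        body = f'[ a oa:TextualBody ; rdf:value "{escape_literal(text)}" ]'
    else:
        # The body is an external resource identified by its URI.
        body = f"<{link}>"
    return (
        f"{PREFIXES}\n"
        f"<{dataset_uri}> a dcat:Dataset ;\n"
        f"    dqv:hasQualityAnnotation <{annotation_uri}> .\n\n"
        f"<{annotation_uri}> a dqv:QualityAnnotation ;\n"
        f"    oa:hasTarget <{dataset_uri}> ;\n"
        f"    oa:hasBody {body} .\n"
    )


# Usage: the "one property with a URL" case, pointing at a quality report page.
print(quality_annotation(
    "http://example.org/dataset/1",
    "http://example.org/annotation/1",
    link="http://example.org/quality-report",
))
```

Passing `text="..."` instead of `link=...` gives the embedded-text variant, so a publisher could start with a link to an existing quality report and move to finer-grained statements later.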
The use of the suggested property from DQV does not preclude the use of more of the DQV model to describe the quality of datasets in the future. A future revision of StatDCAT-AP may consider adding more elements from DQV, for example the parts that allow the expression of quality measurements against quality metrics. Such quality metrics could be based on the quality specifications of the European Statistical System (ESS).
If the WG does not want to align with this W3C specification at this time, the alternative would be to create a new property to express quality statements in the StatDCAT-AP.
The property dqv:hasQualityAnnotation is included in Draft 4, section 6.2.2.