[Issue #02] property 'isAuthoritative' for datasets

Published on: 15/05/2019
Last update: 29/05/2019
Discussion

As Bart H. mentioned, the Belgian base registry model includes a boolean property "isAuthoritative" to indicate that a dataset or part of a dataset (called a data element) is the official base data. A base registry can be a mix of authoritative and non-authoritative elements; e.g. a company base registry may also contain addresses because the official address registry is not yet in production.

There is always controversy when such flags are used to describe things semantically, but this is a direct way to describe master data and to state which parts are official and which are not. Declaring this property is mandatory in the Belgian model.

 

Proposal: create the property 'isAuthoritative' to describe whether data (dataset/datasource/data-element) is official or not.

Comments

Thu, 16/05/2019 - 11:31

The Dutch base registry model (currently Dutch commentary only) implemented a similar property, i.e. 'sc:authentiek', for the same reasons, but only with data elements in mind. It is also not mandatory yet.

Although there are some differences, +1 to the proposal.

Jim Yang (not verified)
Tue, 28/05/2019 - 07:57

Support this proposal (in our work with the national data catalog, which also contains descriptions of all major base registries, we found that such a property is missing in DCAT-AP).

Thu, 30/05/2019 - 17:47

+1 to the proposal too.

We don't have a registry of Spanish registries yet, but in our prototype we also use a similar property for data elements.

Tue, 03/09/2019 - 11:27

During the evolution of the specification, while testing the model with real-world examples, I realized that a new 'isAuthoritative' property is not the best option.

We all agree that it may be useful for most registries to express whether certain data is official or not. The problem I see is about the semantics. What does 'isAuthoritative' mean? Different organisations may have different criteria for considering the same dataset trustworthy. An organisation that compiles information from other organisations may consider that information fully reliable even though it is not itself the official source (so the data is non-authoritative). For instance, this may happen with regulators that collect information from private entities and republish it.

One solution for this is adding proper semantics to the description of 'isAuthoritative' (i.e., data_x is authoritative for org_x; data_y is authoritative for org_y). Through 'dct:conformsTo' we can specify that a Dataset is compliant with our concept of 'official data'. 
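A minimal sketch of the 'dct:conformsTo' idea, just to make it concrete. The eg: namespace, the dataset name and the policy resource are hypothetical placeholders, not agreed identifiers:

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix eg:   <http://example.org/> .

# Hypothetical document stating what org_x means by "official data".
eg:officialDataPolicyOrgX a dct:Standard ;
  dct:title "Criteria for official data (org_x)"@en .

# The dataset declares that it complies with that concept of "official data".
eg:dataX a dcat:Dataset ;
  dct:conformsTo eg:officialDataPolicyOrgX .
```

With this pattern the meaning of "authoritative" lives in the referenced standard, so data_y could point to a different policy published by org_y.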

Another straightforward solution is relying on DQV to express the official nature of the data (i.e., dqv:hasQualityAnnotation and/or dqv:QualityPolicy).

We would need to define our own standard for considering data 'authoritative'. This would enable interoperability with respect to identifying the official data source.

Any comment? 

Tue, 03/09/2019 - 13:24

An example in the Spanish case:

  • The "oX" organization provides the "dsXA" dataset with three data elements: dsXA = {de1; de2; de3}
  • But "oX" is only the competent authority to assess the accuracy and validity of "de1" and "de2"; "oX" imports "de3" from "oZ" because "oZ" is the competent authority to assess the accuracy and validity of "de3".
  • However, the three data elements are important to the dataset "dsXA".

This is because some registries need data from other registries in order to address their own functions appropriately.

I think that if a public organization is the competent authority for a registry, its datasets are always "official" datasets and they conform to the registry regulation. However, a dataset can include data elements whose accuracy and validity are not the responsibility of that public organization but of others.

So, I agree with Amir and I see "isAuthoritative" (or whatever you call it) as only applicable to data elements.
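The Spanish example could be written roughly as follows. This is only a sketch under assumptions: eg:DataElement, eg:isAuthoritative and eg:competentAuthority are hypothetical terms invented for illustration, and linking data elements with dct:hasPart is one possible modelling choice among others:

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix eg:   <http://example.org/> .

# Dataset dsXA, provided by organization oX, with three data elements.
eg:dsXA a dcat:Dataset ;
  dct:publisher eg:oX ;
  dct:hasPart eg:de1, eg:de2, eg:de3 .

# oX is the competent authority for de1 and de2.
eg:de1 a eg:DataElement ;
  eg:competentAuthority eg:oX ;
  eg:isAuthoritative true .

eg:de2 a eg:DataElement ;
  eg:competentAuthority eg:oX ;
  eg:isAuthoritative true .

# de3 is imported from oZ, so within dsXA it is not authoritative.
eg:de3 a eg:DataElement ;
  eg:competentAuthority eg:oZ ;
  eg:isAuthoritative false .
```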

Thu, 05/09/2019 - 07:26

First, on a new property "isAuthoritative" versus using DQV: DQV is needed to describe quality (accuracy, completeness, consistency, currentness, ...), cf. issue #6. However, if we are aiming at machine-processable semantics (which I hope we are), we should explicitly express whether (parts of) a dcat:Dataset isAuthoritative or not (instead of using dqv:QualityAnnotation or pointing to a dqv:QualityPolicy).

Second, on the level: I agree with Amir and Ana that "isAuthoritative" should be applicable at the level of "data elements". However, if we don't introduce DataElements as a new class, but use dct:hasPart to break down a dcat:Dataset into lower-level dcat:Datasets (cf. issue #5), then it is OK to have "isAuthoritative" at dcat:Dataset, or at dcat:Resource so that it is also applicable to dcat:DataService.
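To illustrate the second point, here is a rough sketch of the dct:hasPart breakdown, reusing the company registry case from the issue description. The eg: namespace, the resource names and the eg:isAuthoritative property itself are assumptions for illustration only:

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix eg:   <http://example.org/> .

# Hypothetical property, declared on dcat:Resource so that it covers
# both dcat:Dataset and dcat:DataService.
eg:isAuthoritative a rdf:Property ;
  rdfs:domain dcat:Resource ;
  rdfs:range xsd:boolean .

# The registry is broken down into lower-level datasets.
eg:companyRegistry a dcat:Dataset ;
  dct:hasPart eg:companyCoreData, eg:companyAddresses .

eg:companyCoreData a dcat:Dataset ;
  eg:isAuthoritative true .

# Addresses are held in the registry but are not the official base data.
eg:companyAddresses a dcat:Dataset ;
  eg:isAuthoritative false .

# The same flag applied to a data service.
eg:companyLookupService a dcat:DataService ;
  dcat:servesDataset eg:companyCoreData ;
  eg:isAuthoritative true .
```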

Fri, 20/09/2019 - 10:52

This definition extracted from the Oxford Dictionary may be valid for our purposes: Proceeding from an official source and requiring compliance or obedience.

 

Anyway, personally I prefer the standard approach (using the Data Quality Vocabulary), since we would be able to describe quality in terms of trustworthiness in a straightforward way. Indeed, there are classes and properties that make this possible (dqv:hasQualityAnnotation, ldqd:trustworthiness).
 

:myDataset a dcat:Dataset ;
  dqv:hasQualityAnnotation eg:isAuthoritative .

Then, we would need a common definition of isAuthoritative for our purposes (given at the bottom of this comment to keep the examples simple), which could be created and published by the Publications Office.

This is also valid at data element level. Following the proposal to use the Data Cube Vocabulary, this hasQualityAnnotation may be indicated on each measure (i.e. a data element in our context) that defines the structure of a dataset, so it could be something like this:

eg:myDatasetDefinition a qb:DataStructureDefinition ;
  rdfs:comment "personal data about drivers"@en ;
  qb:component
    # authoritative data elements carry the quality annotation
    [
        qb:measure eg:sex ;
        dqv:hasQualityAnnotation eg:isAuthoritative
    ],
    [
        qb:measure eg:passport ;
        dqv:hasQualityAnnotation eg:isAuthoritative
    ],
    # no annotation: this data element is not declared authoritative
    [ qb:measure eg:postalCode ],
    ....

 

This would be the corresponding definition of isAuthoritative.

eg:isAuthoritative a dqv:QualityCertificate ;
  oa:hasBody eg:isAuthoritativeNational ;
  oa:motivatedBy dqv:qualityAssessment ;
  dqv:inDimension ldqd:trustworthiness .

eg:isAuthoritativeNational a skos:Concept ;
  skos:prefLabel "Data is proceeding from an official source in Europe"@en ;
  skos:definition "Dataset maintained by an EU Member State… ."@en .


 

Please correct me if I am wrong, but I think this proposal, based on the Data Quality Vocabulary and the Data Cube Vocabulary, solves both this Issue #2 and Issue #5. It would avoid creating new properties.

Thu, 26/09/2019 - 13:36

Our need is to express that something (a whole dataset, a data element in a dataset, a data service, ...) ‘is authoritative’, in a standardized and machine-readable way. [How each MS defines authoritativeness (what is required to become ‘authoritative’) will have to depend on national policies, I believe (similar to being a 'citizen' of an MS).]

To do this in a standardized and machine-readable way, and if DQV is the solution, I hope that we can all refer to a common definition using the same URI (let’s say isa:isAuthoritative), instead of each and every MS defining it in its own way (eg:isAuthoritative, sc:authentiek, no:autoritativ, …). Although these would all be dqv:QualityCertificates and all dqv:inDimension ldqd:trustworthiness, there may be other kinds of quality certificates in the ldqd:trustworthiness dimension, so the type and dimension alone do not identify 'authoritative' data.
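As a rough illustration of what I mean by a shared URI: the isa: namespace below is a placeholder (no such namespace has been agreed), and the two dataset names are made up.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dqv:  <http://www.w3.org/ns/dqv#> .
@prefix ldqd: <http://www.w3.org/2016/05/ldqd#> .
@prefix isa:  <http://example.org/isa/> .
@prefix eg:   <http://example.org/> .

# One common certificate, defined once and referenced by every MS.
isa:isAuthoritative a dqv:QualityCertificate ;
  dqv:inDimension ldqd:trustworthiness .

# Datasets from different Member States point to the same URI.
eg:belgianCompanyRegistry a dcat:Dataset ;
  dqv:hasQualityAnnotation isa:isAuthoritative .

eg:dutchAddressRegistry a dcat:Dataset ;
  dqv:hasQualityAnnotation isa:isAuthoritative .
```

A consumer could then find all authoritative datasets by matching on that single URI, whatever each MS requires nationally before granting the certificate.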

By the way, what is the status of ldqd-definitions?