
Number of observations

Published on: 28/04/2016 Discussion

StatDCAT-AP respects the conformance requirements defined for DCAT-AP version 1.1 (https://joinup.ec.europa.eu/release/dcat-ap-v11), which means that it will have, at least, the same mandatory classes and mandatory properties as DCAT-AP 1.1.  StatDCAT-AP may extend DCAT-AP by specifying additional properties, as long as they are reused from existing RDF vocabularies.

 

During discussion with stakeholders, the following additional property was proposed:

 

‘Number of observations’ as a property for Dataset

 

The number of observations is the total number of values contained in the Dataset.

 

This property is intended to indicate the size of a Dataset. DCAT-AP already allows the size in bytes of a data file to be indicated through the property byteSize (https://www.w3.org/TR/vocab-dcat/#Property:distribution_size) for Distribution. However, that only gives the physical size of the file, which is not the only aspect of interest for statistical Datasets.

 

The expected value for this property is a string in an agreed format, e.g. “20 observations”.

 

Participants in this activity are invited to respond to the following questions:

  1. Is the information for this property available in existing statistical systems and applications?
  2. How will exposing this information to general data portals enhance the discoverability of statistical datasets?
  3. Do you know of any property in existing RDF vocabularies that could hold this information?

Please note that there is also a proposal (https://joinup.ec.europa.eu/discussion/number-data-series) to provide information about the number of series in the Dataset.

Component

Documentation

Category

feature

Comments

Makx DEKKERS Fri, 29/04/2016 - 13:15

An option could be to use the property dct:extent which is defined as "The size or duration of the resource". The range of this property is dct:SizeOrDuration, for which the definition gives examples "a number of pages, a specification of length, width, and breadth, or a period in hours, minutes, and seconds".

The definition implies that the value is a resource, so it would be expressed as a blank node with an rdfs:label containing the text.

This text could be normalised as suggested above: "20 observations".
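
To make the suggestion concrete, here is a minimal sketch of what such a description might look like, written as a small Python helper that emits Turtle. The helper name `extent_turtle` and the example dataset URI are hypothetical and used for illustration only; the `dct:` and `rdfs:` prefixes are the standard Dublin Core Terms and RDF Schema namespaces.

```python
def extent_turtle(dataset_uri: str, count: int, unit: str = "observations") -> str:
    """Return a Turtle snippet that describes a dataset's size via dct:extent.

    The value of dct:extent is a blank node typed as dct:SizeOrDuration and
    carrying a human-readable rdfs:label such as "20 observations".
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    return (
        "@prefix dct: <http://purl.org/dc/terms/> .\n"
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
        "\n"
        f"<{dataset_uri}> dct:extent [\n"
        "    a dct:SizeOrDuration ;\n"
        f'    rdfs:label "{count} {unit}"\n'
        "] .\n"
    )


# Hypothetical dataset URI, for illustration only.
print(extent_turtle("http://example.org/dataset/1", 20))
```

In this form the count and the unit are both carried inside one label string, so a consumer has to parse the text to recover the number.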

 

 

Bert VAN NUFFELEN Mon, 09/05/2016 - 15:47

I see that the same predicate is suggested both for the number of observations and for the number of time series. How do we know which one we are dealing with, without a human reading the text?

I am also not much in favor of a text such as "20 observations", because it touches on the language aspect.

I prefer a clear value definition with a machine-readable unit.

I would rather propose to standardize a concept in a controlled vocabulary for this.

 

 

 

Makx DEKKERS Mon, 09/05/2016 - 17:39

The proposal is for a simple, human-readable approach. The language issue could be mitigated by using shorter strings, e.g. "obs." which would be understandable across several languages. This would be in line with usage of dct:extent for book descriptions, e.g. "25 p." (p. for pages).

 

Can you explain how this would work with a concept in a controlled vocabulary?

 

Another more 'semantic' approach would be to define a new property, e.g. statdcatap:numObs as a subproperty of dct:extent.

 

The question is: how would a more semantic approach enhance discoverability? Would a user want to find Datasets by their number of observations (select datasets that have fewer than x, exactly x, or more than x observations)? Or is its primary use for display to a user as a measure of how big the dataset is?
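
As a rough illustration of the selection use case, the sketch below shows what a consumer would need to do if the value were only available as a normalised string. The accepted label formats ("20 observations", "20 obs."), the function names, and the example identifiers are assumptions made for this sketch, not part of any agreed specification.

```python
import re
from typing import Dict, List, Optional

# Assumed normalised formats: "<integer> observations", "<integer> observation"
# or "<integer> obs." (these formats are hypothetical).
_LABEL = re.compile(r"^\s*(\d+)\s+(?:obs\.|observations?)\s*$")


def parse_observation_label(label: str) -> Optional[int]:
    """Extract the count from a normalised label, or None if it does not match."""
    match = _LABEL.match(label)
    return int(match.group(1)) if match else None


def datasets_with_more_than(labels: Dict[str, str], threshold: int) -> List[str]:
    """Return identifiers of datasets whose parsed count exceeds the threshold."""
    result = []
    for dataset_id, label in labels.items():
        count = parse_observation_label(label)
        if count is not None and count > threshold:
            result.append(dataset_id)
    return result


# Hypothetical extent labels keyed by dataset identifier.
example_labels = {
    "dataset-a": "20 observations",
    "dataset-b": "1500 obs.",
    "dataset-c": "25 p.",  # a page count, not an observation count
}
print(datasets_with_more_than(example_labels, 100))
```

Labels that do not follow the agreed format are silently skipped here, which is one reason numeric selection is harder with text values than with typed numbers.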

Anonymous (not verified) Mon, 09/05/2016 - 18:23

+1 on structured, machine readable values, regardless of the possible use cases we (do not) foresee. The RDF Primer gives some simple examples relying on rdf:value. Of course, decoupling the actual value from the unit still implies we would have to define a new concept to identify that unit, one way or another.

An alternative to defining a subproperty of dct:extent, perhaps, could be to define the measure as a subclass of dct:SizeOrDuration.
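
For comparison with a plain text label, the following sketch shows one way the rdf:value pattern might look when the number and the unit are decoupled. The `ex:` namespace, the `ex:unit` property, and the unit concept URI are all hypothetical placeholders; as noted above, a real unit concept would still have to be defined somewhere.

```python
def structured_extent_turtle(dataset_uri: str, count: int, unit_uri: str) -> str:
    """Return a Turtle snippet using rdf:value for the number and a URI for the unit.

    ex:unit and the ex: namespace are hypothetical; no existing vocabulary is implied.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    return (
        "@prefix dct: <http://purl.org/dc/terms/> .\n"
        "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n"
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
        "@prefix ex: <http://example.org/ns#> .\n"
        "\n"
        f"<{dataset_uri}> dct:extent [\n"
        "    a dct:SizeOrDuration ;\n"
        f'    rdf:value "{count}"^^xsd:integer ;\n'
        f"    ex:unit <{unit_uri}>\n"
        "] .\n"
    )


# Hypothetical dataset and unit URIs, for illustration only.
print(
    structured_extent_turtle(
        "http://example.org/dataset/1",
        20,
        "http://example.org/units/observation",
    )
)
```

With a typed integer and a unit URI, observations and series could be told apart by the unit rather than by reading the label text.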

Makx DEKKERS Fri, 20/05/2016 - 16:01

The WG decided in the meeting of 13 May 2016 not to include the number of observations. This issue is now closed.