DT2: Service-based data access

06/06/2016

Based on the discussion during the DCAT-AP workshop in Rome (13/05/2016), we extracted the following key points:

In this discussion table, participants discussed the current limitations of DCAT-AP with regard to describing datasets that are available via service endpoints, and proposed an approach for overcoming them. The DCAT-AP and the GeoDCAT-AP working groups have been discussing this issue: https://joinup.ec.europa.eu/discussion/service-based-data-access

The participants argued that the preferred option is to model service-based access to datasets as a dcat:Distribution. The modelling must cover both SOAP-based web services and RESTful services, e.g. a SPARQL endpoint. For example, in the case of SOAP-based web services modelled as dcat:Distribution, the value of dcat:downloadURL will include the link to the WSDL file.

In this case however, one of the limitations relates to the cardinality of dct:format, which is currently 0..1. This does not make sense in the case of services, as every format returned by the service would have to be modelled as a different dcat:Distribution. Hence, the participants agreed to propose a change request, i.e. changing the cardinality of dct:format from 0..1 to 0..n.
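
As an illustration of the proposed change, the following is a minimal RDF/XML sketch of a single service-based distribution that lists two formats. The example.org URLs are placeholders. The two file-type URIs are assumed to follow the pattern of the Publications Office File Type NAL; neither is taken from the discussion.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcat="http://www.w3.org/ns/dcat#"
         xmlns:dct="http://purl.org/dc/terms/">
  <!-- Hypothetical service-based distribution; all example.org URLs are placeholders -->
  <dcat:Distribution rdf:about="http://example.org/dataset/abc/distribution/service">
    <dcat:accessURL rdf:resource="http://example.org/service/abc"/>
    <!-- With dct:format at 0..n, one distribution can list every format the service returns -->
    <dct:format rdf:resource="http://publications.europa.eu/resource/authority/file-type/XML"/>
    <dct:format rdf:resource="http://publications.europa.eu/resource/authority/file-type/JSON"/>
  </dcat:Distribution>
</rdf:RDF>
```

With the current cardinality of 0..1, the same service would instead need one dcat:Distribution per format.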

In order to model service-based access to datasets, the JRC are setting the value of dct:type to WEB_SERVICE, an authority code from the Distribution Types NAL of the Publications Office.

The Distribution Types NAL is not referenced in the DCAT-AP v1.1 specification. The participants agreed to log the inclusion of a reference to it as a change request.

Additionally, the participants suggested that the Distribution Types NAL can be extended to support service types, such as SOAP-based web service, RESTful service and end-user application.

In order to describe service capabilities, the participants suggest reusing the metadata provided in the OpenSearch description document; void:openSearchDescription can be used for this. The idea is to include a reference to the XML OpenSearch description document in order not to redefine all its elements in RDF. The participants agreed that the following basic metadata should be completed in the OpenSearch description document: service type, name, possible values or data type, service binding, type of output (media type for format) and compression (for geodata).
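
A minimal sketch of such a reference, assuming void:openSearchDescription is attached directly to the distribution. VoID defines this property for void:Dataset, so using it on a dcat:Distribution is an assumption here, and the example.org URLs are placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcat="http://www.w3.org/ns/dcat#"
         xmlns:void="http://rdfs.org/ns/void#">
  <!-- Hypothetical distribution; the URLs below are placeholders -->
  <dcat:Distribution rdf:about="http://example.org/dataset/abc/distribution/service">
    <!-- Points at the XML description document instead of restating its elements in RDF -->
    <void:openSearchDescription rdf:resource="http://example.org/service/abc/osdd.xml"/>
  </dcat:Distribution>
</rdf:RDF>
```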

Uwe Voges provided an example (see table below) of distributing a dataset ABC via a specific download service XYZ. It is an HTTP/GET/KVP-based API, which is described in an OpenSearch description document. The HTTP/KVP service description is kept in an external file; including it directly in the DCAT-AP record would require the development of an RDF-based representation.

 

The DCAT-AP "Distribution" part:

                                   <dcat:distribution>

                                               <dcat:Distribution>

                                                           <dct:type rdf:resource="http://www.someServiceRegistry.eu/serviceTypes/XYZDownload/1.0"/>

                                                           <dct:title xml:lang="en">Downloads parts of dataset ABC via specific download service XYZ</dct:title>

                                                           <!-- the download service provides the data in XML and JSON format -->

                                                           <dcat:mediaType rdf:resource="http://www.iana.org/assignments/media-types/application/xml"/>

                                                           <dcat:mediaType rdf:resource="http://www.iana.org/assignments/media-types/application/json"/>

                                                           <!-- data compressed -> question: how to express "not compressed" ? -->

                                                           <ns:compression rdf:resource="http://www.iana.org/assignments/media-types/application/gzip"/>

                                                           <ns:httpService>

                                                                       <ns:HTTPService>

                                                                                   <!-- The service is not directly accessible but via the service endpoint described as URL-templates in an OpenSearch Description Document -->

                                                                                   <!-- we cannot currently include these URL-templates directly here because for OSDD there is not an RDF-model available -->

                                                                                   <dct:type rdf:resource="http://www.iana.org/assignments/media-types/application/opensearchdescr…"/>

                                                                                   <!-- Access URL of the OSDD via HTTP/GET -->

                                                                                   <dcat:accessURL rdf:resource="http://www.someXYZDownloadService.eu/OSDD.xml"/>

                                                                                   <ns:binding rdf:parseType="Resource">

                                                                                               <rdfs:label>HTTP/GET</rdfs:label>

                                                                                   </ns:binding>

                                                                       </ns:HTTPService>

                                                           </ns:httpService>

                                               </dcat:Distribution>

                                   </dcat:distribution>

 

 

The OSDD part:

 

<?xml version="1.0" encoding="UTF-8"?>

<OpenSearchDescription xml:lang="en" xmlns="http://a9.com/-/spec/opensearch/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:time="http://a9.com/-/opensearch/extensions/time/1.0/" xmlns:geo="http://a9.com/-/opensearch/extensions/geo/1.0/" xmlns:parameters="http://a9.com/-/spec/opensearch/extensions/parameters/1.0/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">

            <ShortName>XYZDownloadService</ShortName>

            <Description>The XYZDownloadService................</Description>

            <Tags>DownloadService XYZ</Tags>

            <Contact>u.voges@conterra.de</Contact>

            <Url type="application/xml" rel="results" template="http://www.someXYZDownloadService.eu/datasetABC/download.xml?lang={language?}&amp;box={geo:box?}&amp;startDate={time:startDate?}&amp;endDate={time:endDate?}">

                        <parameters:Parameter name="lang" value="{language}" title="Two letters language code according to ISO 639-1">

                                   <Option value="en" label="English"/>

                                   <Option value="de" label="Deutsch"/>

                        </parameters:Parameter>

                        <parameters:Parameter name="box" value="{geo:box}" title="Defined by 'west, south, east, north' coordinates of longitude, latitude, in decimal degrees (EPSG:4326)" pattern="^[0-9\.\,\-]*$"/>

                        <parameters:Parameter

                                   name="startDate"

                                   value="{time:startDate}"

                                   title="Beginning of the time slice. Format should follow RFC-3339"

                                   minInclusive="2016-05-11T00:00:00Z"

                                   maxExclusive="2016-05-13T00:00:00Z"

                                   pattern="^[0-9]{4}-[0-9]{2}-[0-9]{2}(T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?(|Z|[\+\-][0-9]{2}:[0-9]{2}))?$"/>

                        <parameters:Parameter

                                   name="endDate"

                                   value="{time:endDate}"

                                   title="End of the time slice. Format should follow RFC-3339"

                                   minInclusive="2016-05-11T00:00:00Z"

                                   maxExclusive="2016-05-13T00:00:00Z"

                                   pattern="^[0-9]{4}-[0-9]{2}-[0-9]{2}(T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?(|Z|[\+\-][0-9]{2}:[0-9]{2}))?$"/>

            </Url>

            <Url type="application/json" rel="results" template="http://www.someXYZDownloadService.eu/datasetABC/download.json?lang={language?}&amp;box={geo:box?}&amp;startDate={time:startDate?}&amp;endDate={time:endDate?}">

                        <parameters:Parameter name="lang" value="{language}" title="Two letters language code according to ISO 639-1">

                                   <Option value="en" label="English"/>

                                   <Option value="de" label="Deutsch"/>

                        </parameters:Parameter>

                        <parameters:Parameter name="box" value="{geo:box}" title="Defined by 'west, south, east, north' coordinates of longitude, latitude, in decimal degrees (EPSG:4326)" pattern="^[0-9\.\,\-]*$"/>

                        <parameters:Parameter

                                   name="startDate"

                                   value="{time:startDate}"

                                   title="Beginning of the time slice. Format should follow RFC-3339"

                                   minInclusive="2016-05-11T00:00:00Z"

                                   maxExclusive="2016-05-13T00:00:00Z"

                                   pattern="^[0-9]{4}-[0-9]{2}-[0-9]{2}(T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?(|Z|[\+\-][0-9]{2}:[0-9]{2}))?$"/>

                        <parameters:Parameter

                                   name="endDate"

                                   value="{time:endDate}"

                                   title="End of the time slice. Format should follow RFC-3339"

                                   minInclusive="2016-05-11T00:00:00Z"

                                   maxExclusive="2016-05-13T00:00:00Z"

                                   pattern="^[0-9]{4}-[0-9]{2}-[0-9]{2}(T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?(|Z|[\+\-][0-9]{2}:[0-9]{2}))?$"/>

            </Url>

            <Developer>Uwe Voges</Developer>

            <SyndicationRight>open</SyndicationRight>

            <AdultContent>false</AdultContent>

            <Language>en</Language>

            <Language>de</Language>

            <OutputEncoding>UTF-8</OutputEncoding>

            <InputEncoding>UTF-8</InputEncoding>

</OpenSearchDescription>

 

Component

Documentation

Category

improvement

Comments

Thu, 09/06/2016 - 09:56

One additional indication: when we transfer the HTTP/KVP service description into an external file, it becomes more complicated for the provider, as they have to create and process this file on a web-accessible endpoint. This would speak for a description within the DCAT-AP representation. But then we need to develop (or reuse, if one already exists somewhere) such an RDF-based representation.

Tue, 06/09/2016 - 16:01

Mon, 12/12/2016 - 22:41

During the "Smart Descriptions and Smarter Vocabularies" (SDSVoc) workshop, we had an ad hoc session on service / API-based data access.

I'm now going to include here the summary of that session, and some off-list email exchange following it.

Bar Camp session: Modelling service/API-based data access

Chair: Andrea Perego

Participants:

  • Makx Dekkers
  • Matthias Palmer
  • Christian Mader
  • Simon Dutkowski
  • Uwe Voges
  • Phil Archer (for some time)

Andrea: [Summarising the problem, starting from what was proposed in the DCAT-AP IGs - see https://www.w3.org/2016/11/sdsvoc/SDSVoc16_paper_27#modelling-service-api-based-data-access]

Makx: Recap: we have files and endpoints/API - so this is about the distribution type. The type may be more than one.

All: [Discussion on the type of distributions from the OP's MDR and how to use them: http://publications.europa.eu/mdr/authority/distribution-type/ ]

Uwe: It may not be so straightforward to tell whether it's a file or a service

Andrea: The distinction can simply be based on what we get back, a file or a query interface

All: [Discussing again about the distribution types from the OP's MDR: fine with downloadable file, fine with visualisation]

Matthias: It may be good to provide information on the "type of service", e.g., REST or SOAP - which is a different thing from the specific service interface - e.g., SPARQL, WMS, WPS

Makx: An option is to use the format, maybe

Simon: What we are still missing is how to query the service, and the relevant dataset

Makx: We have 4 levels: distribution type, service type, link to a service description, and a description of how to instantiate the service.

Simon + Matthias: [discussion about the need of being able to parametrise the call to the service to get the relevant dataset]

Matthias: I would also like to have a textual, human-readable description.

All: [No problem to have it, but we have to decide whether to have also a machine readable description]

Phil: I think this came up frequently, and the issue is that there's no standard way of specifying the relevant data subset.

All: [Discussion on the possibility of using URL templates to specify parameters as a general approach - see https://tools.ietf.org/html/rfc6570]

Andrea: Trying to make the point:

  1. We agree to have the distribution type (dct:type) - see MDR NAL: http://publications.europa.eu/mdr/resource/authority/distribution-type/
  2. We agree to have a free text description to inform humans this is not a file and, possibly, how to use the service / API. A possibility is to use dct:description
  3. We agree to have the "service macro-type" (we can also call it "service category" or "service protocol") - SOAP, REST, etc. - but we have to decide how to model this.
  4. We agree to have the "service type" (dct:conformsTo) - WMS, SPARQL - see, e.g., https://github.com/OSGeo/Cat-Interop/blob/master/LinkPropertyLookupTable.csv
  5. We agree to have the template URL but (a) we need a specific property for this and (b) this does not address POST requests
  6. We agree to have the information used to instantiate the template URL - to be investigated how this information can be found / derived

Barcamp session closes. Results reported to the plenary by Matthias.

Mon, 12/12/2016 - 22:44

Mail from Uwe Voges (6 Dec 2016):

The only point I have is: how can a client detect whether the response to a request sent to the URL already includes the data or just the service description (an indirection to the data)? For example, a SOAP-based GetProduct request already provides the data, while a SOAP-based GetCapabilities request returns a service description including a link to a WSDL doc, both for the same service type ("OGC EO Order 1.0").

Mon, 12/12/2016 - 22:47

Mail from Matthias Palmér (6 Dec 2016):

I think that a distribution provides a way to get to the data of a dataset. If it is something that is downloadable, the distribution tells you about the format of the data and how to get to it (the downloadURL). In the same sense, for a service the distribution should tell you about the format of the data AND how to get to it. The latter includes the service endpoint as well as some information that helps you to use it (invoke it). The focus should never be solely on the service, the intent is to provide a way to get to the data.

So I think we had better exemplify. I am a bit unsure, but I guess a GIS dataset could be provided via both a WMS and a WFS service. If so, I would argue they correspond to two distributions of the same dataset, e.g. getting the information in a structured format vs. getting the information as images.

(I am a bit new to the geo domain, so you have to pardon my ignorance here.) 

Assuming this is correct, an example of a WFS service is outlined:

Distribution type: http://publications.europa.eu/resource/authority/distribution-type/WEB_…

Service type: SOAP  (we have to introduce a vocabulary here)

Access URL:  ?   (*)

Download URL: -

Format: application/json,text/csv,application/gml+xml  (**)

URITemplate: http://example.com/wfs?service=wfs&version=2.0.0&request=GetFeature&typ…{featuretype}&outputFormat={format}

Invocation information: http://example.com/wfs?service=wfs&version=1.1.0&request=GetCapabilities

Conforms to: http://www.opengis.net/def/serviceType/ogc/wfs/

(*) Provide only if there is a nice webpage describing how this dataset is accessed via the WFS service.

(**) Assuming the requirement on a single format has been lifted

After writing this, I am starting to doubt the need for an "invocation information" property; maybe we can use the access URL for this anyway.

The logic would be quite clear:

  • If the distribution is downloadable, use downloadURL.
  • If the distribution is provided through a service, use the URITemplate.
  • In both cases accessURL can be used to provide more assistance for accessing the data.
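
The rule above can be sketched in RDF/XML as two distributions of the same dataset, one downloadable and one service-based. The property ns:uriTemplate and its namespace are hypothetical, since no property for the URI template has been agreed, and all example.org URLs are placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcat="http://www.w3.org/ns/dcat#"
         xmlns:ns="http://example.org/ns#">
  <dcat:Dataset rdf:about="http://example.org/dataset/abc">
    <dcat:distribution>
      <!-- Downloadable distribution: the file itself is behind downloadURL -->
      <dcat:Distribution rdf:about="http://example.org/dataset/abc/distribution/file">
        <dcat:downloadURL rdf:resource="http://example.org/files/abc.csv"/>
        <dcat:accessURL rdf:resource="http://example.org/dataset/abc/about-download"/>
      </dcat:Distribution>
    </dcat:distribution>
    <dcat:distribution>
      <!-- Service-based distribution: a template is given instead of a downloadURL -->
      <dcat:Distribution rdf:about="http://example.org/dataset/abc/distribution/service">
        <ns:uriTemplate>http://example.org/api/abc{?format}</ns:uriTemplate>
        <dcat:accessURL rdf:resource="http://example.org/dataset/abc/about-api"/>
      </dcat:Distribution>
    </dcat:distribution>
  </dcat:Dataset>
</rdf:RDF>
```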

Mon, 12/12/2016 - 22:53

Mail from Uwe Voges (7 Dec 2016):

From a formal point of view I guess there is no difference between a view service and a web service (as we try to differentiate): a GetMap request (a view service) just returns a different format (e.g. geoTiff) than a GetFeature request (e.g. GML in some flavor).

The question for me is which endpoints the distribution should point to: the (data) service URL (e.g. the GetMap request of an OGC WMS) AND/OR the service description URL (e.g. the GetCapabilities request), OR something we specified at the SEMIC meeting in Rome: the URL of an OpenSearch description document (OSDD) embedding one or both of the URLs described before. In any case we need to define what the response format is, e.g. geoTiff (for GetMap), Capabilities-Doc (for GetCapabilities) or application/opensearchdescription+xml (for the OSDD)*. So we would need at most two formats: that of the data (returned after invoking the service) and that of the service URL (which may in some cases be the same, when it is already the link to GetMap).

* in OSDD we could provide more than one URL with different formats. But the problem with OSDD is that for every service a separate OSDD must be created, which may not be preferred by people…

I propose to provide one or more service endpoints with different response formats.

Example:

  • GetMap / geoTiff (correct mimeType to use here)
  • GetCapabilities / OGCCapabilities (correct mimeType to use here)
