PR3 - Add new property to Dataset to indicate why the Dataset is restricted or non-public

10/03/2015

Description

From: http://joinup.ec.europa.eu/mailman/archives/dcat_application_profile/2015-February/000120.html

Based on feedback from public sector organizations we have thrown in these extra properties for the dataset class:

  • Access Level - to distinguish open data from the rest by dividing datasets into public, restricted and non-public.
  • Access Rights - to express why the dataset is restricted/non-public. Applies only for restricted/non-public datasets.

Proposed solution

Add new property to Dataset to indicate why the Dataset is restricted or non-public.

Component

Code

Category

improvement

Comments

Tue, 24/03/2015 - 09:31

From POD's usage note: (https://project-open-data.cio.gov/v1.1/schema/#rights) This may include information regarding access or restrictions based on privacy, security, or other policies. This should also serve as an explanation for the selected “accessLevel” including instructions for how to access a restricted file, if applicable, or explanation for why a “non-public” or “restricted public” data asset is not “public,” if applicable. If the dataset can be made available through a website indirectly, use accessURL for the URL that provides such access. Example {"rights":"This dataset contains Personally Identifiable Information and could not be released for public access."}
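To illustrate the usage note quoted above, here is a minimal sketch in Python. It assumes the accessLevel values named in the note ("public", "restricted public", "non-public"); the helper name `missing_rights_explanation` is hypothetical and not part of POD or DCAT-AP.

```python
import json

# Access levels that, per the usage note, call for an explanation in "rights".
NON_PUBLIC_LEVELS = {"restricted public", "non-public"}


def missing_rights_explanation(record: dict) -> bool:
    """Return True when a restricted or non-public record has no 'rights' text."""
    level = record.get("accessLevel")
    rights = (record.get("rights") or "").strip()
    return level in NON_PUBLIC_LEVELS and not rights


# Record built from the example in the usage note; the accessLevel value is assumed.
record = json.loads("""
{
  "accessLevel": "non-public",
  "rights": "This dataset contains Personally Identifiable Information and could not be released for public access."
}
""")

needs_fix = missing_rights_explanation(record)
```

A catalogue publisher could run a check of this kind before export to flag restricted or non-public records that carry no explanation.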

Tue, 24/03/2015 - 20:01

Proposed resolution: No new property. This information may be included as text in dct:description.

Tue, 24/03/2015 - 23:12

I think the issue here is how to provide a machine-readable description of access levels (or restrictions) that would enable data consumers (users and software agents) to filter out data not satisfying given requirements.

About the access levels in POD, the term "restricted" probably covers very different situations - e.g., the data are behind a paywall, you have to register and/or to do additional things to get the data.

It would, however, be useful to have a distinction between discriminatory and non-discriminatory access. For instance, we have quite a few examples of "open data" that can be downloaded by anybody after registering. This is different from the notion of "authorisation", where data are accessible after authentication only if you have the right privileges. By contrast, non-discriminatory registration means that ALL registered users can access the data.

Non-discriminatory registration is usually motivated by the need of data producers/providers to have feedback on who is using their data, but some users may not be willing to provide even minimal personal information (e.g., name and email address) because of privacy concerns and/or other reasons. So, for them, it might be important to be able to exclude this kind of data from search results as well.
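The kind of filtering described above could look like the following sketch. The access labels ("open", "registration", "authorisation") and the names `DatasetEntry` and `filter_by_access` are assumptions made for illustration; no such vocabulary is defined in DCAT-AP or proposed in this thread.

```python
from dataclasses import dataclass


@dataclass
class DatasetEntry:
    title: str
    # Hypothetical labels:
    #   "open"          - free and immediate access
    #   "registration"  - non-discriminatory: every registered user gets access
    #   "authorisation" - discriminatory: access depends on the user's privileges
    access: str


def filter_by_access(entries, acceptable):
    """Keep only the entries whose access label the consumer is willing to accept."""
    return [entry for entry in entries if entry.access in acceptable]


catalogue = [
    DatasetEntry("Air quality measurements", "open"),
    DatasetEntry("Road traffic API", "registration"),
    DatasetEntry("Patient records", "authorisation"),
]

# A user unwilling to register keeps only the immediately accessible data.
no_registration = filter_by_access(catalogue, {"open"})
```

A user who accepts registration but not authorisation would pass `{"open", "registration"}` instead.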

 

Wed, 08/04/2015 - 07:07

Andrea's contextualization is indeed relevant. I have encountered several public bodies that would like to offer the data to anybody but, at the same time, would like to provide some SLA via registration prior to granting access to the data.

In particular, for larger amounts of data and for offerings via an API, a form of registration is used (in order to keep erroneous programs under control).

To the question: to what extent do we want to go in DCAT-AP? Creating a machine-to-machine interaction that, based on a query on the dataset catalogue, can automatically obtain access to all the data would imply standardizing (or selecting among existing models for) file, data and API access on the web.

However, I agree that for a user of a dataset catalogue it is valuable to know whether access is free and immediate (today's default in open data portals), free after registration, or a paid service.

Whereas the first option does not require more user feedback, the second and the third require a pointer to the page where the process is started and the conditions are explained. From the end-user perspective, I believe this information is more valuable than introducing a property with a range in a controlled vocabulary.

Fri, 10/04/2015 - 17:40

We have also seen the potential need for an 'access level' property.

But maybe this is a more fundamental question: is DCAT-AP designed only for 'Open Data', in the sense that it is envisioned to be used only with datasets associated with an open license? If so, there's no need for access levels, as all data is freely accessible.

 

However, if DCAT-AP could be used for data portals with closed and/or restricted data, then it is important to be able to classify the access level in a machine-readable way, e.g. confidentiality, data protection, research, etc.

Fri, 10/04/2015 - 19:52

If indeed we should consider such a property, two questions arise:

 

1. Is there a property in a well-known namespace that could be used for this?

2. Is there an existing SKOS concept scheme that can provide the values for it?

Tue, 14/04/2015 - 11:44

I definitely agree with the analysis of Andrea and Bert. I would also like to hear the view of the pan-European data portal team, especially if access to "not open" data falls within its scope. 

Thu, 16/04/2015 - 10:14

+1

In Belgium, we have had a few interesting discussions on various interpretations of "open", on the use of DCAT-AP for the exchange of metadata about not-entirely-open data, and on the need to distinguish between them when presenting these datasets to the users.

Fri, 17/04/2015 - 10:14

The intention of this proposal was to widen the scope of DCAT-AP, assuming it is currently restricted to open data only. By adding accessLevel, agents can expose descriptions of all of their datasets within the same catalogue.

Why:

  • An assumption that knowledge of the existence of a dataset has value for reusers even if it is non-public.
  • Reusers find it hard to be specific when they push for more open data, because they don't know what the agents hold. Inventory lists will help reusers to ask for data and the public sector to set its priorities "on demand".
  • If agents expose information on all of their datasets and express why a dataset is restricted/non-public, this will improve transparency on their judgements.
  • By opening up for restricted/non-public datasets, DCAT-AP becomes a useful tool for public sector bodies' internal information management activities (assumption).

I agree that a distinction between free and immediate access and flavors of non-discriminatory "access on demand" is indeed tempting, but this can be done on dcat:distribution. To me, discriminatory access is not compliant with open data and should not be in an open data catalogue if open data is the scope.

Does it make sense to split in two? Add a property for dcat:distribution for different flavors of non-discriminatory access and one for dcat:dataset to identify the accessLevel based on the legal aspects?

Examples of accessLevel for dcat:dataset:

  • a dataset containing personal information = restricted (can e.g. be shared with other public bodies)
  • a dataset containing sensitive information = non-public (no sharing possible)

Yes, I realise that this opens up a can of worms, since it probably introduces a rights statement for dcat:dataset AND widens the scope of DCAT-AP.

Fri, 17/04/2015 - 11:05

In fact, I think this makes even more sense at the Distribution level, given that quite frequently you could be combining a free-access distribution for a given Dataset with more restricted ones that require, for example, prior registration (e.g. an API distribution).
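As a rough sketch of what recording access per distribution could look like, the snippet below models one dataset with a free download and a registration-only API. The "access" key, its labels and the function `least_restrictive_access` are hypothetical and used only for illustration.

```python
# Hypothetical access labels, ordered from least to most restrictive.
ACCESS_ORDER = ["open", "registration", "authorisation"]

dataset = {
    "title": "Example dataset",
    "distributions": [
        {"format": "CSV", "access": "open"},          # free download
        {"format": "API", "access": "registration"},  # requires prior registration
    ],
}


def least_restrictive_access(ds: dict) -> str:
    """Return the most permissive access label found among the distributions.

    Raises ValueError if the dataset has no distributions.
    """
    labels = [dist["access"] for dist in ds["distributions"]]
    return min(labels, key=ACCESS_ORDER.index)


best = least_restrictive_access(dataset)
```

With the label on each distribution, a catalogue can still show the dataset to users who refuse to register, as long as at least one distribution is open.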
