
Tutorial: Give software taxonomies multilingual labels represented in SKOS

Published on: 11/10/2012 Last update: 04/10/2017 Document Archived

The use of common, multilingual taxonomies to categorise software can significantly facilitate the exchange of software description metadata and thus help make software searchable on the Web. This article provides an overview of our work to publish the software taxonomies included in the Asset Description Metadata Schema for Software (ADMS.SW) specification as machine-readable reference data with multilingual labels. For this work, we need your help to review and curate the translated labels in this Google Docs spreadsheet.

The ADMS.SW v1.00 specification recommends, but does not impose (see this issue for a discussion), a number of relevant software taxonomies, such as the Trove Software Map contributed by SourceForge (© SourceForge 2012, CC BY) and the contributions by CENATIC and the Spanish Technology Transfer Centre. The objective of this work is to make the software taxonomies recommended in the ADMS.SW specification available in a way that maximally encourages reuse. This means that the taxonomies should meet three criteria:

  1. The taxonomies must be published in a machine-readable format;
  2. The taxonomies must have multi-lingual labels; and
  3. The taxonomies must be published as Linked Data.

Status of our work

The table below provides an overview of the status per software taxonomy and language. The colour codes indicate the following:

  • Red indicates that no translation has been provided;
  • Orange indicates that a translation has been provided by a non-native speaker or via machine translation, but has not yet been reviewed;
  • Green indicates that the translation has been reviewed by a native speaker.

The initials in the table refer to the following contributors:

  • BH - Bart Hanssens (FEDICT)
  • DDG - Débora Di Giacomo (SEMIC team)
  • MDK - Michiel De Keyzer (SEMIC team)
  • SK - Saky Kourtidis (SEMIC team)
 

dk - Danish
en - English
es - Spanish
fr - French
ge - German
it - Italian
nl - Dutch

Intended Audience     DDG SK   DDG MDK BH
Locale     DDG SK   DDG MDK BH
Operating System     DDG SK   DDG MDK BH
Programming Language     DDG SK   DDG MDK BH
Status     DDG SK   DDG MDK BH
Topic     DDG SK   DDG MDK BH
User Interface Type     DDG SK   DDG MDK BH

This Subversion directory will contain the latest version of our work.

How can you help?

You can help by reviewing the multilingual labels in the Google Docs spreadsheet for one or more software taxonomies.

  1. Become a member of the ADMS.SW project and electronically sign the ISA Contributor Agreement v1.1;
  2. Contact us and tell us which taxonomies and languages you wish to curate, and send us your Google account;
  3. We will give you write access to the Google Docs Spreadsheet;
  4. Review the translated labels. Do not make changes directly in the text but rather add a comment;
  5. Our team will go over your review comments and update the spreadsheet. Periodically, we will generate a new SKOS file from the spreadsheet.

What is SKOS?

The Simple Knowledge Organisation System (SKOS) is a lightweight RDF vocabulary that allows representing the terms in a controlled vocabulary as instances of the class skos:Concept. SKOS also defines properties for multilingual labels (skos:prefLabel), associated codes (skos:notation), and definitions (skos:definition). The SKOS specification meets the three aforementioned criteria.

1. Machine-readability: Using SKOS, each term in each taxonomy can be represented in a machine-readable format that contains the definitions, labels, and related concepts for that term. SKOS is a W3C Recommendation and a commonly used representation format for controlled vocabularies. Well-known controlled vocabularies such as EuroVoc have been expressed using an ontology that extends SKOS.

2. Multilingualism: SKOS allows associating labels and definitions in multiple languages with any concept. This means that we can associate the labels “Mobile devices”@en, “Dispositivos móviles”@es, “appareils mobiles”@fr, and “Mobile Geräte”@de with the concept identified by the URI http://dbpedia.org/resource/Mobile_device to give it English, Spanish, French, and German labels.

3. De-referencing: In SKOS, each term in the controlled vocabulary is identified by a corresponding term URI, which for Linked Data is based on the HTTP protocol. For example, the term “Taxonomy” in the “Asset Type” scheme has the following term URI: <http://purl.org/adms/assettype/Taxonomy>. This means that anyone who encounters such a URI can look up its meaning by entering the URI in the address bar of a browser. This is called de-referencing. It is a simple yet powerful feature of the Web.
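De-referencing can also be done by a program instead of a browser. The following is a minimal sketch in Python that only prepares the HTTP request for the term URI quoted above; it does not send it. The function name `build_dereference_request` is made up for this example, and whether the server behind the URI honours the `Accept` header (content negotiation) is an assumption.

```python
from urllib.request import Request

# Term URI quoted in the text above.
TERM_URI = "http://purl.org/adms/assettype/Taxonomy"


def build_dereference_request(term_uri, media_type="text/turtle"):
    """Prepare (but do not send) an HTTP GET request for a term URI.

    The Accept header asks for an RDF serialisation. A server that does not
    support content negotiation may simply return an HTML page instead.
    """
    return Request(term_uri, headers={"Accept": media_type}, method="GET")


request = build_dereference_request(TERM_URI)
print(request.get_method(), request.full_url, request.get_header("Accept"))
# Passing `request` to urllib.request.urlopen() would perform the actual lookup.
```

Keeping the request construction separate from the network call makes it easy to inspect what would be sent before any lookup takes place.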

The SKOS wiki page maintained by W3C lists a number of tools that allow creating SKOS files. Listpoint.co.uk, the Code List Management Service, provides a free online editor for managing codelists and taxonomies on the Web and exporting them as SKOS RDF files. The Poolparty Thesaurus Manager also provides suitable software for maintaining large multilingual taxonomies and exporting them as SKOS RDF. Mondeca’s ITM tool also allows managing large thesauri collaboratively and has built-in support for SKOS.

The entire SKOS representation of the ADMS controlled vocabularies can be retrieved via this link. The Poolparty.biz SKOS validation service can be used to validate the created SKOS file.

What is our overall approach?

We will develop the software taxonomies by taking the following steps:

  • STEP 1 – Represent the software taxonomy as tabular data. Use a spreadsheet to represent all taxonomies in a machine-readable format.
  • STEP 2 – Give a Uniform Resource Identifier to each taxonomy term. Where relevant, reconcile the URIs of the taxonomy terms with concepts that are included in DBpedia. This is relevant for operating systems and programming languages.
  • STEP 3 – Translate the labels of the concepts in each taxonomy. Use a machine translation service to provide multilingual labels for each concept in each software taxonomy. Manually curate the translations. Here we need your help.
  • STEP 4 – Export as SKOS. Use the Google Refine RDF tool to publish the taxonomies as machine-readable data.
  • STEP 5 – Promote the software taxonomies. Promote the use of the software taxonomies.

These steps are described in the remainder of this article. Our team will take care of the conversion into SKOS, but we do rely on your help in STEP 3.

STEP 1 - Put the Software taxonomies in a spreadsheet

The first step is to represent the software taxonomies as tabular data in the embedded spreadsheet. The table below gives an excerpt of the “Intended Audience” taxonomy represented as tabular data. The columns are based on the following SKOS classes, relationships, and properties:

  • skos:ConceptScheme: the software taxonomies can be represented as instances of the class skos:ConceptScheme. For example, “Intended Audience” with URI <http://purl.org/adms/intendedaudience/1.00> is a skos:ConceptScheme.
  • skos:Concept: each term within a controlled vocabulary can be represented as an instance of the class skos:Concept. For example, the term “Public Administrations” with URI <http://purl.org/adms/intendedaudience/PublicAdministrations> is a skos:Concept.
  • skos:hasTopConcept: the relationship “skos:hasTopConcept” associates a concept scheme with the top-level concepts within it. The concept scheme “Intended Audience” has the concept “Public Administrations” as a top-level concept.
  • skos:inScheme: the relationship “skos:inScheme” expresses to which concept scheme(s) a particular concept belongs. The concept “Regional Public Administrations” belongs to the concept scheme “Intended Audience”.
  • skos:broader: the relationship “skos:broader” allows indicating that one term is more general than another term. For example, the concept “Regional Public Administrations” has a broader term “Public Administrations”.
  • skos:definition: the property “skos:definition” allows adding a textual definition to the terms in a controlled vocabulary.
  • skos:notation: the property “skos:notation” allows linking a concept to a particular code.
  • skos:prefLabel: the property “skos:prefLabel” allows associating preferred labels to a concept in multiple languages. To denote the language of a label, it is represented as a language-tagged string. SKOS also allows defining alternative labels for a concept.
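The relationship between skos:broader and skos:hasTopConcept can be illustrated with a small Python sketch: concepts that have no broader term are the candidates for top-level concepts of the scheme. Only the link from “Regional Public Administrations” to “Public Administrations” is stated in this article; the other two links, and the helper names `top_concepts` and `narrower_of`, are assumptions made for illustration.

```python
# skos:broader links as a mapping: narrower notation -> broader notation.
# The "Regional" link is stated in the text; the other two are assumed here
# for illustration only.
BROADER = {
    "RegionalPublicAdministrations": "PublicAdministrations",
    "NationalPublicAdministrations": "PublicAdministrations",
    "LocalPublicAdministrations": "PublicAdministrations",
}

CONCEPTS = [
    "Citizens",
    "PublicAdministrations",
    "NationalPublicAdministrations",
    "RegionalPublicAdministrations",
    "LocalPublicAdministrations",
]


def top_concepts(concepts, broader):
    """Concepts without a broader term are candidates for skos:hasTopConcept."""
    return [c for c in concepts if c not in broader]


def narrower_of(concept, broader):
    """Invert skos:broader to list the direct narrower terms of a concept."""
    return sorted(n for n, b in broader.items() if b == concept)


print(top_concepts(CONCEPTS, BROADER))
print(narrower_of("PublicAdministrations", BROADER))
```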

 

ConceptURI (a skos:Concept) | ConceptNotation (skos:notation) | Concept-Label (skos:prefLabel) | ConceptScheme-URI (a skos:ConceptScheme)
http://purl.org/adms/intendedaudience/Citizens | Citizens | Citizens | http://purl.org/adms/intendedaudience/1.00
http://purl.org/adms/intendedaudience/Developers | Developers | Developer Community | http://purl.org/adms/intendedaudience/1.00
http://purl.org/adms/intendedaudience/Individuals | Individuals | Individual User | http://purl.org/adms/intendedaudience/1.00
http://purl.org/adms/intendedaudience/PublicAdministrations | PublicAdministrations | Public administrations | http://purl.org/adms/intendedaudience/1.00
http://purl.org/adms/intendedaudience/NationalPublicAdministrations | NationalPublicAdministrations | National public administrations | http://purl.org/adms/intendedaudience/1.00
http://purl.org/adms/intendedaudience/RegionalPublicAdministrations | RegionalPublicAdministrations | Regional public administrations | http://purl.org/adms/intendedaudience/1.00
http://purl.org/adms/intendedaudience/LocalPublicAdministrations | LocalPublicAdministrations | Local public administrations | http://purl.org/adms/intendedaudience/1.00
http://purl.org/adms/intendedaudience/Enterprise | Enterprise | Enterprise | http://purl.org/adms/intendedaudience/1.00
http://purl.org/adms/intendedaudience/SelfEmployedIndividuals | SelfEmployedIndividuals | Self Employed Individuals (sei) | http://purl.org/adms/intendedaudience/1.00
http://purl.org/adms/intendedaudience/SME | SME | Small and Medium Enterprises (sme) | http://purl.org/adms/intendedaudience/1.00
http://purl.org/adms/intendedaudience/LargeEnterprise | LargeEnterprise | Large Enterprise | http://purl.org/adms/intendedaudience/1.00
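Before moving on, it can be useful to check the tabular data for simple inconsistencies. The following Python sketch is one possible way to do so, not part of the actual workflow: it reads a few rows of the “Intended Audience” excerpt as CSV and reports rows whose URI contains a space, whose notation differs from the last segment of the concept URI, or whose scheme URI is unexpected. The simplified column names, the function `check_rows`, and the rule that the notation equals the last URI segment (a pattern observed in the excerpt, not a requirement of SKOS) are assumptions.

```python
import csv
import io

SCHEME_URI = "http://purl.org/adms/intendedaudience/1.00"

# A few rows of the excerpt, written as CSV. The column names are simplified
# versions of the table headers.
SAMPLE = """ConceptURI,ConceptNotation,ConceptLabel,ConceptSchemeURI
http://purl.org/adms/intendedaudience/Citizens,Citizens,Citizens,http://purl.org/adms/intendedaudience/1.00
http://purl.org/adms/intendedaudience/Developers,Developers,Developer Community,http://purl.org/adms/intendedaudience/1.00
http://purl.org/adms/intendedaudience/SME,SME,Small and Medium Enterprises (sme),http://purl.org/adms/intendedaudience/1.00
"""


def check_rows(text):
    """Return a list of problem descriptions; an empty list means all rows pass."""
    problems = []
    # Data rows start on line 2 because line 1 holds the column names.
    for line_no, row in enumerate(csv.DictReader(io.StringIO(text)), start=2):
        uri = row["ConceptURI"].strip()
        if " " in uri:
            problems.append(f"line {line_no}: URI contains a space")
        if uri.rsplit("/", 1)[-1] != row["ConceptNotation"].strip():
            problems.append(f"line {line_no}: notation differs from last URI segment")
        if row["ConceptSchemeURI"].strip() != SCHEME_URI:
            problems.append(f"line {line_no}: unexpected concept scheme URI")
    return problems


print(check_rows(SAMPLE))
```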

STEP 2 - Give a Uniform Resource Identifier to each taxonomy term

In a second step, ensure that each concept in each taxonomy is given a Uniform Resource Identifier (URI):

  • Intended Audience - Trove: the URIs attributed by the SourceForge Trove taxonomy;
  • Intended Audience - Elena Muñoz: purl.org URIs (see the table above);
  • Locale: the URIs attributed by the US Library of Congress;
  • Operating System: the URIs attributed by the SourceForge taxonomy and DBpedia.org;
  • Programming Language: the URIs attributed by the SourceForge taxonomy and DBpedia.org;
  • Status: the URIs attributed by the SourceForge taxonomy;
  • Topic: the URIs attributed by the SourceForge taxonomy; and
  • User Interface Type: the URIs attributed by the SourceForge taxonomy.

The taxonomy terms for ‘Operating System’ and ‘Programming Language’ have been supplemented with URIs from DBpedia.org. DBpedia.org is a community effort to extract structured information from Wikipedia and to make it accessible on the Web under the terms of the Creative Commons Attribution-ShareAlike 3.0 License and the GNU Free Documentation License.

To look up the DBpedia URIs, we imported the ‘Operating System’ and ‘Programming Language’ taxonomies into Google Refine. Using the reconciliation feature of the Google Refine RDF tool, we looked up a corresponding DBpedia URI for each label in a semi-automated way, as depicted in the figure below.

[Figure: Adms_tutorial_SKOS]
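To give an idea of what a reconciliation candidate looks like, the sketch below builds a candidate DBpedia resource URI from a label by replacing spaces with underscores, following the pattern of the URI http://dbpedia.org/resource/Mobile_device used earlier in this article. This is a naive guess for illustration only and not how the Google Refine reconciliation works; every candidate would still need to be checked against DBpedia. The function name `candidate_dbpedia_uri` is made up.

```python
from urllib.parse import quote

DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def candidate_dbpedia_uri(label):
    """Guess a DBpedia resource URI for a label (to be verified manually)."""
    # Collapse runs of whitespace and join the words with underscores.
    title = "_".join(label.strip().split())
    # Keep parentheses readable; percent-encode other reserved characters.
    return DBPEDIA_RESOURCE + quote(title, safe="()")


print(candidate_dbpedia_uri("Mobile device"))
print(candidate_dbpedia_uri("Python (programming language)"))
```

A semi-automated process would present such candidates to a curator, who accepts or corrects each match.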

STEP 3 - Translate the labels of the concepts in each taxonomy

As a next step, use the Google Translate API to translate the labels of all taxonomy terms into the target languages. In our case, the translation API was called from within Google Docs. Machine translations need to be manually curated. Here we need your help, as described in the section ‘How can you help?’ above.

[Figure: ADMS_Tutorial_SKOS_2]
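The curation state of each translated label can be tracked with the same colour codes used in the status table. The Python sketch below is a hypothetical illustration: the function `curation_status`, the reviewer initials “XX”, and the empty Italian cell are made up, while the four labels come from the multilingual example earlier in this article.

```python
def curation_status(label, reviewer=""):
    """Map a spreadsheet cell to the colour codes of the status table."""
    if not label.strip():
        return "red"  # no translation has been provided
    # A label counts as reviewed only when a reviewer is recorded.
    return "green" if reviewer.strip() else "orange"


# language code -> (label, reviewer initials); "XX" is a made-up reviewer.
CELLS = {
    "en": ("Mobile devices", "XX"),
    "es": ("Dispositivos móviles", ""),
    "fr": ("appareils mobiles", ""),
    "de": ("Mobile Geräte", ""),
    "it": ("", ""),
}

STATUS = {
    lang: curation_status(label, reviewer)
    for lang, (label, reviewer) in CELLS.items()
}
print(STATUS)
```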

STEP 4 - Represent in SKOS format

As a fourth step, use a tool like Google Refine RDF to export the taxonomies represented in the spreadsheet in SKOS format:

  1. Import the spreadsheet into Google Refine;
  2. Define a mapping to SKOS RDF, as indicated in the figure below; and
  3. Export the taxonomies as RDF.

The entire SKOS representation of the software taxonomies can be retrieved in Turtle and RDF/XML syntax. The Poolparty.biz SKOS validation service can be used to validate the created SKOS file.

[Figure: Adms_tutorial_SKOS_3]
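To show roughly what the exported SKOS looks like, the following Python sketch serialises one spreadsheet row as Turtle using plain string formatting. It is not the Google Refine RDF export and makes no claim to match its output exactly. The helper names are made up, and the French label “Citoyens” is an illustrative translation that does not come from the spreadsheet.

```python
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."


def escape(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def turtle_literal(text, lang):
    """Build a language-tagged string such as "Citizens"@en."""
    return f'"{escape(text)}"@{lang}'


def concept_to_turtle(uri, notation, scheme_uri, labels):
    """Serialise one concept; `labels` maps language codes to label text."""
    label_parts = [turtle_literal(text, lang) for lang, text in sorted(labels.items())]
    return "\n".join([
        f"<{uri}> a skos:Concept ;",
        f'    skos:notation "{escape(notation)}" ;',
        f"    skos:inScheme <{scheme_uri}> ;",
        "    skos:prefLabel " + ", ".join(label_parts) + " .",
    ])


turtle = SKOS_PREFIX + "\n\n" + concept_to_turtle(
    "http://purl.org/adms/intendedaudience/Citizens",
    "Citizens",
    "http://purl.org/adms/intendedaudience/1.00",
    {"en": "Citizens", "fr": "Citoyens"},  # "Citoyens" is illustrative only
)
print(turtle)
```

For real data, an RDF library or the export tool itself is the safer choice, because it also handles URI escaping and the other serialisation formats.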

STEP 5 - Promote the software taxonomies

The value of a taxonomy lies in its use. Once finalised, promote the software taxonomies using all possible means.

 

Nature of documentation: Technical report
