
Semantic enrichment of APIs: an OpenAPI case study


Introduction

The objective of this blog post is to explore the relations between the semantic and technical layers when working with OpenAPI. It is aimed at semantic engineers looking for solutions to publish APIs, and at API designers who seek to improve the interoperability of their APIs.

Feedback on this article, on the use of the presented solutions or of other solutions, or on experience with semantics and APIs can be provided on the Style Guide GitHub repository under the label ‘Blog-OpenAPI’.


Building on top of vocabularies

While semantic data specifications (which include vocabularies and Application Profiles) make it possible to describe domain-specific or cross-domain concepts at the semantic level, semantic engineers also need physical data models, such as XML and JSON schemas, that APIs can use for data exchange.

Note: moving from the semantic level to the technical level is outside the scope of this blog post; considerations on mapping XML to vocabularies can be found here.

APIs are a means of exchanging data using defined architectural approaches and protocols, according to a physical data model. These physical data models play an important role because they can be used to validate the exchanged data.

There is therefore a relationship between data semantics, data specifications and APIs, even though each of the three has its own life cycle, going through distinct stages of development, deployment and maintenance. By making this relationship explicit, for example by mapping the concepts in an API response to a semantic data model, it becomes possible to find common models among APIs, which improves their interoperability.

Several existing catalogues already put this into practice, as described in the ‘OpenAPI adoption’ and ‘OpenAPI and semantics’ sections.


API roles

The most generic roles to consider when talking about APIs are the API service provider, on one end, and the API consumer, on the other. These roles correspond to the technical terms server and client, respectively.

Within the role of service provider, a distinction can be made between business roles and technical roles (Medjaoui, Wilde, Mitra, & Amundsen, 2019):

Business roles
  • API Designer
    • Responsible for all aspects of design.
    • Bridges business and technical sides to ensure design aligns with technical KPIs.
    • Makes design choices such as allowing RDF serialization or implementing JSON-LD.
  • API Technical Writer
    • Writes documentation related to the API.
    • Documentation serves both internal and external stakeholders.
    • Ensures correct interpretation of semantic concepts and proper reuse.
    • Documents additional complexity from APIs enriched with semantics.

Technical roles
  • Backend Developer
    • Implements the actual interface of the API.
    • Manages data storage.
    • Connects API to necessary services for its operation.
  • Test/Quality Assurance Engineer (QA Engineer)
    • Validates API design and tests its functionality.
    • Tests for interoperability, scalability, security, and capacity.
  • DevOps Engineer
    • Builds and deploys the API.
    • Monitors the API's performance to align with KPIs.

On the consumer side, the main role is that of the developer:

  • Frontend Developer
    • Integrates the API into applications.

These roles support the various methods to semantically enrich APIs that are presented in the next sections.


API evolution

APIs have been in use since at least the 1970s (Date & Codd, 1974). Since then, the technologies have evolved drastically. This section gives an overview of the API landscape, focusing on the technical interoperability achieved by SOAP, REST and GraphQL, and on the extensions that bring semantic interoperability into play.

SOAP

SOAP is a protocol endorsed by W3C for exchanging structured information via web services, often paired with HTTP. It is known for its robustness and strict adherence to standards. Key to SOAP is WSDL, an XML-based interface definition language that outlines the functionalities of web services. WSDL describes how SOAP web service requests and responses should be structured and which protocols will be used for data exchange. This model allows any system that understands XML to interoperate with SOAP-based services.

Despite its success in making services accessible, SOAP struggled to achieve loose coupling, which is crucial for technological agility and evolvability.

In addition, SOAP's verbosity and complexity, stemming from its XML foundations, make it less appealing for modern web applications and APIs. JSON, designed to describe data structures in JavaScript, has become the preferred standard for data exchange. JSON's structure aligns well with many programming languages, simplifying data binding and serialization compared to XML.

REST

REST is an architectural style that, in practice, relies on standard HTTP methods. RESTful APIs are designed to be stateless, cacheable and to expose a uniform interface, which makes them well suited to web services. While SOAP services can also be stateless and cacheable, REST requires these properties by design.

HATEOAS, a REST constraint, augments responses with hypermedia links to related resources, so that clients do not need to know the specific URL structure of the API. For example, a client requesting data on a customer would receive not only the customer data but also links to related resources, such as the orders placed by that customer.
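As a minimal sketch of this idea, assuming a hypothetical /customers/{id} resource and a simple "links" array of "rel"/"href" pairs (one of several link formats in use, chosen here only for illustration), a HATEOAS-style response and a client that follows it could look like this:

```python
import json

BASE_URL = "https://api.example.org"  # hypothetical API root


def customer_representation(customer_id: int, name: str) -> dict:
    """Return customer data plus hypermedia links to related resources."""
    self_url = f"{BASE_URL}/customers/{customer_id}"
    return {
        "id": customer_id,
        "name": name,
        # The client follows these links instead of building URLs itself.
        "links": [
            {"rel": "self", "href": self_url},
            {"rel": "orders", "href": f"{self_url}/orders"},
        ],
    }


def find_link(representation: dict, rel: str) -> str:
    """Look up a link by its relation name, as a generic client would."""
    for link in representation["links"]:
        if link["rel"] == rel:
            return link["href"]
    raise KeyError(rel)


if __name__ == "__main__":
    customer = customer_representation(42, "Jane Doe")
    print(json.dumps(customer, indent=2))
    print(find_link(customer, "orders"))
```

The client only needs to know the relation name "orders"; if the server later changes its URL layout, the client keeps working.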

REST's simplicity and scalability have made it popular, with technologies like Hydra significantly enhancing these attributes.

Hydra is a lightweight vocabulary that promotes hypermedia-driven web APIs. It helps developers design APIs adhering to HATEOAS, making APIs more flexible and easier to evolve. Hydra documents APIs in a human- and machine-readable manner, detailing available operations, required inputs, and produced outputs. This approach reduces client-server coupling, addresses issues like over-fetching or under-fetching, and improves data transfer efficiency, performance, and user experience.

Hydra also enhances web API discoverability and machine-readability, facilitating automated documentation and enabling generic clients to interact with any Hydra-based API.
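To illustrate what "machine-readable" means here, the sketch below writes a small Hydra API documentation as JSON-LD (held in a Python dict) and lets a generic client discover the supported operations. The Customer class and example.org IRIs are hypothetical, and the term names (ApiDocumentation, supportedClass, supportedOperation, method, expects, returns) follow the Hydra Core Vocabulary as we understand it; check them against the specification before reuse.

```python
# Simplified Hydra API documentation; class and IRIs are hypothetical.
API_DOC = {
    "@context": "http://www.w3.org/ns/hydra/context.jsonld",
    "@id": "https://api.example.org/doc",
    "@type": "ApiDocumentation",
    "supportedClass": [
        {
            "@id": "https://api.example.org/vocab#Customer",
            "@type": "Class",
            "title": "Customer",
            "supportedOperation": [
                {"@type": "Operation", "method": "GET",
                 "returns": "https://api.example.org/vocab#Customer"},
                {"@type": "Operation", "method": "PUT",
                 "expects": "https://api.example.org/vocab#Customer",
                 "returns": "https://api.example.org/vocab#Customer"},
            ],
        }
    ],
}


def list_operations(api_doc: dict) -> list[tuple[str, str]]:
    """Return (class title, HTTP method) pairs a generic client could discover."""
    operations = []
    for cls in api_doc.get("supportedClass", []):
        for op in cls.get("supportedOperation", []):
            operations.append((cls["title"], op["method"]))
    return operations


if __name__ == "__main__":
    for title, method in list_operations(API_DOC):
        print(f"{method} {title}")
```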

GraphQL

GraphQL is a query language offering a powerful alternative to REST for designing APIs. It allows clients to request exactly the data they need using a GraphQL schema, which can be more efficient than REST.

GraphQL-LD extends GraphQL queries with a JSON-LD context, enabling evaluation over RDF data. This combines JSON-LD's web-scale data integration benefits with GraphQL's querying advantages, though it is less expressive than SPARQL. However, for many application data retrieval tasks, GraphQL-LD is sufficient.
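The core idea of GraphQL-LD can be sketched as follows: the JSON-LD context turns each GraphQL field name into an IRI, so a nested selection becomes a SPARQL graph pattern. This is a simplified illustration under stated assumptions (the selection is given as a nested dict instead of parsed GraphQL syntax, and the variable naming scheme is our own); it is not the translation implemented by the GraphQL-LD tooling. The IRIs are chosen for illustration.

```python
# Illustrative context: each GraphQL field name maps to a property IRI.
CONTEXT = {
    "givenName": "http://xmlns.com/foaf/0.1/givenName",
    "address": "http://www.w3.org/ns/locn#address",
    "fullAddress": "http://www.w3.org/ns/locn#fullAddress",
}


def to_sparql(selection: dict, context: dict[str, str]) -> str:
    """Translate nested field selections into a SPARQL SELECT query."""
    variables: list[str] = []
    patterns: list[str] = []

    def walk(subject: str, fields: dict, prefix: str) -> None:
        for field, sub_fields in fields.items():
            var = f"{prefix}_{field}" if prefix else field
            patterns.append(f"{subject} <{context[field]}> ?{var} .")
            if sub_fields:
                walk(f"?{var}", sub_fields, var)  # nested object: recurse
            else:
                variables.append(f"?{var}")  # leaf field: project it

    walk("?s", selection, "")
    return f"SELECT {' '.join(variables)} WHERE {{ {' '.join(patterns)} }}"


if __name__ == "__main__":
    # Equivalent to the GraphQL selection { givenName address { fullAddress } }
    print(to_sparql({"givenName": None, "address": {"fullAddress": None}}, CONTEXT))
```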

The image below provides an overview of the technical and semantic layers of these various APIs.

[Image: overview of the technical and semantic layers of SOAP, REST and GraphQL APIs]

OpenAPI adoption

OpenAPI is a way to document API capabilities, designed mainly for REST APIs. There are other ways to document APIs, such as RAML or API Blueprint; the OpenAPI Specification (OAS), however, has gathered a large community of adopters.

Typically, technical writers and API backend developers collaborate and choose one of two approaches:

  • Model-driven approach: design the OAS first, in YAML or JSON, and then generate the code together with the HTML documentation.

  • Code-driven approach: write the server code first and let a server framework generate the documentation.

In both cases, the client sends requests and the server returns responses, usually in JSON format and according to a defined model. This model is described as components in the OAS file, such as a Person object with its properties.
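For readers new to OAS, a minimal OpenAPI 3.0 document with a Person component might look like the sketch below. It is written as a Python dict and serialised to JSON; the /persons/{id} path, the operation and the Person and Address properties are hypothetical examples, not taken from a real API.

```python
import json

# Minimal OpenAPI 3.0 document; path, operation and components are hypothetical.
OAS = {
    "openapi": "3.0.3",
    "info": {"title": "Person API", "version": "1.0.0"},
    "paths": {
        "/persons/{id}": {
            "get": {
                "operationId": "getPersonById",
                "parameters": [{"name": "id", "in": "path", "required": True,
                                "schema": {"type": "string"}}],
                "responses": {
                    "200": {
                        "description": "The requested person",
                        "content": {"application/json": {
                            "schema": {"$ref": "#/components/schemas/Person"}}},
                    }
                },
            }
        }
    },
    "components": {
        "schemas": {
            "Address": {
                "type": "object",
                "properties": {"fullAddress": {"type": "string"}},
            },
            "Person": {
                "type": "object",
                "required": ["givenName", "familyName"],
                "properties": {
                    "givenName": {"type": "string"},
                    "familyName": {"type": "string"},
                    # A property can reference another component.
                    "address": {"$ref": "#/components/schemas/Address"},
                },
            },
        }
    },
}

if __name__ == "__main__":
    # The same structure could equally be written in YAML.
    print(json.dumps(OAS, indent=2))
```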

The image below shows both approaches, in which Swagger UI is typically used to generate HTML documentation from the OAS file.

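[Image: model-driven and code-driven approaches to producing the OAS file and its HTML documentation]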

A backend developer can rely on the OpenAPI Generator, which takes as input an OAS file, written in either JSON or YAML, containing components such as Person or Address.

The OpenAPI Generator produces the server code (“Server Stub”) in the programming language chosen by the backend developer. This code contains only the operations and components described in the OAS file; it is up to the backend developer to write the logic that implements the API. Once this is done, the server code runs within a framework that receives the requests, passes them to the server code, and returns the responses it produces.
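The split between generated code and hand-written logic can be illustrated with the hand-written Python sketch below. It is not actual OpenAPI Generator output, whose layout depends on the chosen generator; the class, the functions and the in-memory store are hypothetical and stand for the Person component and an operation that retrieves a Person by identifier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    """Model class corresponding to the Person component (hypothetical)."""
    given_name: str
    family_name: str
    full_address: Optional[str] = None


def get_person_by_id_stub(person_id: str) -> Person:
    """Stub as a generator might leave it: the signature follows the OAS
    operation, but the body is left for the backend developer."""
    raise NotImplementedError("to be implemented by the backend developer")


# In-memory store standing in for the real back end (hypothetical data).
_PEOPLE = {"1": Person("Jane", "Doe", "Rue de la Loi 200, 1049 Brussels")}


def get_person_by_id(person_id: str) -> Person:
    """The backend developer's implementation of the same operation."""
    try:
        return _PEOPLE[person_id]
    except KeyError:
        raise LookupError(f"no person with id {person_id}") from None
```

In a real project, the framework would route a GET request for a person to the implemented function and serialise the returned object into the JSON response defined in the OAS file.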

A frontend developer can later use the OpenAPI Generator to produce client code in the chosen programming language to connect to the API.

The OpenAPI documentation can be generated either at run time by a framework or at design time by the Swagger UI application, using the OAS file as input.

All the API roles can take advantage of the OAS file:

  • As described, API frontend and backend developers can:
    • generate code for the client and server in different programming languages from the OAS file using Swagger Codegen
  • API technical writers can:
    • generate the HTML documentation from the OAS file, for example using Swagger UI.
  • API DevOps engineers can:
    • implement API gateways that can import OAS files, such as WSO2 API Manager, Kong or Gravitee, so that they can monitor API usage.
  • API Test engineers can:

At European level, the importance of APIs is recognised by Directive (EU) 2019/1024 on open data and the re-use of public sector information, which requires High-Value Datasets (HVDs) to be made available through APIs.

To support this directive, the European Commission launched the study ‘Application Programming Interfaces for Digital Government’ (APIs4DGov) in 2018, followed by the 2020 study ‘API for Innovative Public services’ (API4IPS). OAS was found to be used by almost half of all reviewed API portals (Vaccari, Posada, Boyd, & Santoro, 2021).

Within the SEMIC Action, the use of APIs to support public service provision was highlighted in the 2019 paper ‘APIs for CPSV-AP based Catalogue of Services’; in this context, CPSV-AP is a semantic specification that allows Member States to describe public services.

On the one hand, the increasing spread of APIs, and particularly of OAS, in governments created the need for design rules, such as those defined by Belgium or the Netherlands. On the other hand, it required the creation of registries, the most notable of which are:

Country Registry
Finland https://liityntakatalogi.suomi.fi/en_GB/ 
France https://api.gouv.fr/
Ireland https://data.gov.ie/dataset?res_format=API&_res_format_limit=0
Italy https://developers.italia.it/it/api 
Norway https://data.norge.no/data-services 

Among these registries, some go a step further regarding the use of OAS:

  • Finland created the tietomallit.suomi.fi portal from which data models can be exported in an OAS format to be reused as components.

  • France requires the use of OAS when submitting an API to the registry.

  • Italy created the National Data Catalog, which analyses the components of OAS files so that they can be linked at a semantic level, and provides the Italian OpenAPI Checker service to validate OAS files against different rules.


OpenAPI and semantics

While the focus of OpenAPI is at the technical level, there have been different approaches to adding semantics to REST APIs.

Mapping

Adding JSON-LD context to the API response

The most common method to enrich APIs is to add a JSON-LD context on top of the API response, either described in the OAS file or passed via an HTTP header. This is the method chosen by SEMIC, where a JSON-LD context associated with a model is provided, such as for the Core Person Vocabulary shown on the left of the image below.

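[Image: JSON-LD context of the Core Person Vocabulary (left) and an API response mapped through it (right)]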

The JSON-LD context acts as a mapping mechanism between the concepts described in the API response and the URI of each class and property described in the model. For example, the “Address” object in the API response is mapped to the URI http://www.w3.org/ns/locn#Address, as depicted on the right of the image above.
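A minimal sketch of both delivery options, assuming an illustrative context whose IRIs follow the Core Person and Location vocabularies as we understand them (check them against the published SEMIC context) and a hypothetical URL where the context would be published. The term_iris function is a naive lookup that only shows how keys map to IRIs; it is not JSON-LD expansion, for which a JSON-LD processor should be used.

```python
# Illustrative context; verify the IRIs against the published SEMIC context.
CONTEXT = {
    "type": "@type",  # alias, so that plain JSON "type" values act as classes
    "Person": "http://www.w3.org/ns/person#Person",
    "Address": "http://www.w3.org/ns/locn#Address",
    "givenName": "http://xmlns.com/foaf/0.1/givenName",
    "familyName": "http://xmlns.com/foaf/0.1/familyName",
    "address": "http://www.w3.org/ns/locn#address",
    "fullAddress": "http://www.w3.org/ns/locn#fullAddress",
}

# Hypothetical location of the published context document.
CONTEXT_URL = "https://api.example.org/contexts/person.jsonld"


def embed_context(response: dict) -> dict:
    """Option 1: add the context to the body of an otherwise unchanged response."""
    return {"@context": CONTEXT, **response}


def context_link_header() -> str:
    """Option 2: leave the body untouched and reference the context
    in an HTTP Link header, as JSON-LD 1.1 allows for plain JSON."""
    return (f'<{CONTEXT_URL}>; rel="http://www.w3.org/ns/json-ld#context"; '
            'type="application/ld+json"')


def term_iris(obj: dict) -> dict:
    """Naive key-to-IRI lookup (NOT full JSON-LD expansion); every key
    is assumed to be defined in CONTEXT."""
    result = {}
    for key, value in obj.items():
        if key.startswith("@"):
            continue  # skip JSON-LD keywords such as @context
        iri = CONTEXT[key]
        if iri == "@type":
            result["@type"] = CONTEXT[value]  # class name -> class IRI
        elif isinstance(value, dict):
            result[iri] = term_iris(value)  # nested object
        else:
            result[iri] = value
    return result
```

The JSON body the API already returns does not change: the semantics are added on top, either inline or through the header.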

With this method, the API technical writer needs to be only minimally involved, since the OAS file is not modified; a semantic expert can work in coordination with the API backend developer to perform the mapping.

Adding metadata in the OAS file

With this method, additional metadata is included in the OAS file. It requires the API technical writer to be aware of such metadata.

Usually, an OAS file contains generic metadata:

  • to describe the API, such as a “title”, “description”, “contact”, “license” or “version”

  • to describe the operations, such as a “description”

  • for each single component, the “description” of the component and a “description” for each single property

  • to define the tags used to classify operations, each with a “name” and a “description”

  • to reference external documentation via the “externalDocs” object that includes a “description” and a “url” that can be used at any of the above levels, as used by the Finnish portal as shown later.

Nothing prevents adding further metadata, for example as specification extensions whose names start with “x-”, although tools may not recognise it. This is the approach taken by Italy, as shown later.

Examples

Member States take different approaches to using mappings to enrich APIs. The following two examples illustrate both methods:

  • The Finnish portal, where each data model can be exported as a JSON-LD context and also in the OpenAPI format.
[Image: a data model from the Finnish portal exported in OpenAPI format]

The OpenAPI export contains additional metadata pointing to the URIs of the data model. As can be seen in the image above, each object is associated with the URI of the corresponding class of the data model via the “url” property of an “externalDocs” object; see, for example, the Location object.

These components could then be referred to within an OAS file provided by the API technical writer.
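A minimal sketch of how such components can be processed, assuming hypothetical schemas in the style described above (the example.org URIs are placeholders, not real Finnish data model URIs): a tool can recover the link between each component and its class simply by reading externalDocs.url.

```python
# Hypothetical OAS components; each schema points to a data model class
# through externalDocs.url. The example.org URIs are placeholders.
COMPONENTS = {
    "Location": {
        "type": "object",
        "description": "A spatial region or named place.",
        "externalDocs": {"url": "https://example.org/datamodel#Location"},
        "properties": {"name": {"type": "string"}},
    },
    "Address": {
        "type": "object",
        "externalDocs": {"url": "https://example.org/datamodel#Address"},
        "properties": {"fullAddress": {"type": "string"}},
    },
    # A component without semantic metadata is simply skipped.
    "Note": {"type": "object", "properties": {"text": {"type": "string"}}},
}


def class_uris(components: dict) -> dict[str, str]:
    """Map each component name to the class URI found in its externalDocs."""
    return {
        name: schema["externalDocs"]["url"]
        for name, schema in components.items()
        if "url" in schema.get("externalDocs", {})
    }


if __name__ == "__main__":
    for name, uri in class_uris(COMPONENTS).items():
        print(f"{name} -> {uri}")
```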

  • In Italy, the National Data Catalog, following the guidance of the PDND (Piattaforma Digitale Nazionale Dati, National Digital Data Platform) that prescribes the use of semantics, includes a “Schemi dati” (data schemas) section listing OAS specifications with their metadata.

[Image: the “Schemi dati” section of the Italian National Data Catalog]

Each component in the schema has an associated persistent URI. In addition, by clicking the “Vai al sorgente” (“Go to source”) button, the end user can find two files:

  1. The OAS file, in a YAML format, enriched with metadata associated with the objects (x-jsonld-type) together with a JSON-LD context (x-jsonld-context). By doing this, it is possible to map the properties of the object to those referenced in the context.

  2. An RDF file describing metadata for the OAS file, which must contain ADMS-AP_IT metadata.

The National Data Catalog requests these two files from public administrations so that OAS descriptions can be harvested, with the objective of supporting developers in reusing concepts.
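As an illustration of the x-jsonld-type and x-jsonld-context extensions, the sketch below resolves each property of a schema to an IRI. The schema is hypothetical, and the CPV namespace (https://w3id.org/italia/onto/CPV/) is assumed for illustration. The resolution rules are a simplification, not a full JSON-LD processor: absolute IRIs are kept as they are, while other mapped values and unmapped property names are appended to @vocab.

```python
# Hypothetical schema in the style of the Italian catalogue; the CPV
# namespace below is assumed for illustration.
PERSON_SCHEMA = {
    "type": "object",
    "x-jsonld-type": "https://w3id.org/italia/onto/CPV/Person",
    "x-jsonld-context": {
        "@vocab": "https://w3id.org/italia/onto/CPV/",
        "tax_code": "taxCode",
        "given_name": "givenName",
        "family_name": "familyName",
    },
    "properties": {
        "tax_code": {"type": "string"},
        "given_name": {"type": "string"},
        "family_name": {"type": "string"},
    },
}


def property_iris(schema: dict) -> dict[str, str]:
    """Resolve each schema property to an IRI using x-jsonld-context.

    Simplified rules: absolute IRIs are kept, anything else is
    appended to @vocab (unmapped names are used as they are).
    """
    context = schema.get("x-jsonld-context", {})
    vocab = context.get("@vocab", "")
    iris = {}
    for prop in schema.get("properties", {}):
        term = context.get(prop, prop)
        iris[prop] = term if "://" in term else vocab + term
    return iris


if __name__ == "__main__":
    print("class:", PERSON_SCHEMA["x-jsonld-type"])
    for prop, iri in property_iris(PERSON_SCHEMA).items():
        print(f"{prop} -> {iri}")
```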

Transformation

The relation between the semantic and the technical layer is not just about mapping, but also transforming.

Semantically enriching an OAS file so that it can be transformed into RDF has been investigated in the literature for some years, for example by Cremaschi & de Paoli (2017) and Bonnel & Mouton (2021). The objectives of these approaches are to enable service discovery, reusability and composition (mapping the output of one service to the input of another).

In Norway, the national data portal harvests API descriptions of public services, expressed directly in RDF, from different public administrations, and displays them using DCAT-AP.

In addition, the oastodcat tool is provided to public administrations so that they can convert an OAS file into a dcat:DataService, extracting only the metadata of the service and not the data itself.
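The kind of conversion described above can be sketched as follows. This is a hypothetical illustration in the spirit of oastodcat, not its actual code or output: it reads only the service-level metadata of an OAS document (title, description, server URL) and writes a dcat:DataService in Turtle, ignoring paths and schemas. The service IRI and description URL are parameters supplied by the caller.

```python
def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def oas_to_dcat(oas: dict, service_iri: str, description_url: str) -> str:
    """Describe the API as a dcat:DataService; paths and schemas are ignored."""
    info = oas["info"]
    servers = oas.get("servers") or [{}]
    endpoint = servers[0].get("url")
    lines = [
        "@prefix dcat: <http://www.w3.org/ns/dcat#> .",
        "@prefix dct: <http://purl.org/dc/terms/> .",
        "",
        f"<{service_iri}> a dcat:DataService ;",
        f"    dct:title {turtle_literal(info['title'])} ;",
    ]
    if "description" in info:
        lines.append(f"    dct:description {turtle_literal(info['description'])} ;")
    if endpoint:
        lines.append(f"    dcat:endpointURL <{endpoint}> ;")
    # The OAS file itself describes the endpoint.
    lines.append(f"    dcat:endpointDescription <{description_url}> .")
    return "\n".join(lines)


if __name__ == "__main__":
    oas = {
        "openapi": "3.0.3",
        "info": {"title": "Person API", "description": "Look up persons",
                 "version": "1.0.0"},
        "servers": [{"url": "https://api.example.org/v1"}],  # hypothetical
        "paths": {},
    }
    print(oas_to_dcat(oas, "https://example.org/services/person-api",
                      "https://api.example.org/v1/openapi.json"))
```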

The components in an OAS file can be considered as classes, with their properties and relations, which could be mapped to RDF classes.
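A minimal sketch of this mapping, under simple assumptions of our own: each component becomes an owl:Class, each property referencing another component ($ref) becomes an owl:ObjectProperty, and every other property becomes an owl:DatatypeProperty with an XSD range. The namespace is a placeholder, and arrays, enumerations and properties shared by several components (which would get several domains) are not handled.

```python
# Assumed mapping from JSON Schema types to XSD datatypes.
XSD_TYPES = {"string": "xsd:string", "integer": "xsd:integer",
             "number": "xsd:decimal", "boolean": "xsd:boolean"}


def components_to_turtle(schemas: dict, ns: str) -> str:
    """Turn OAS component schemas into an OWL skeleton in Turtle."""
    lines = [
        f"@prefix : <{ns}> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
    ]
    for name, schema in schemas.items():
        lines.append(f":{name} a owl:Class .")  # each component becomes a class
        for prop, spec in schema.get("properties", {}).items():
            if "$ref" in spec:
                # A reference to another component becomes an object property.
                target = spec["$ref"].rsplit("/", 1)[-1]
                lines.append(f":{prop} a owl:ObjectProperty ; "
                             f"rdfs:domain :{name} ; rdfs:range :{target} .")
            else:
                xsd = XSD_TYPES.get(spec.get("type"), "xsd:string")
                lines.append(f":{prop} a owl:DatatypeProperty ; "
                             f"rdfs:domain :{name} ; rdfs:range {xsd} .")
    return "\n".join(lines)


if __name__ == "__main__":
    schemas = {
        "Person": {"type": "object", "properties": {
            "givenName": {"type": "string"},
            "address": {"$ref": "#/components/schemas/Address"}}},
        "Address": {"type": "object", "properties": {
            "fullAddress": {"type": "string"}}},
    }
    print(components_to_turtle(schemas, "https://example.org/model#"))
```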

Conversely, the transformation can also go from RDF (OWL) to an OAS file, as in the OWL2OAS project.
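The reverse direction can be sketched in the same spirit. This is a hypothetical illustration, not the OWL2OAS implementation: to stay self-contained, the ontology is given as plain Python data (property name, domain class, range) instead of being parsed from OWL, XSD ranges become JSON Schema types, and class ranges become $ref links between components.

```python
# Assumed mapping from XSD datatypes to OAS (JSON Schema) types.
XSD_TO_OAS = {
    "xsd:string": {"type": "string"},
    "xsd:integer": {"type": "integer"},
    "xsd:boolean": {"type": "boolean"},
    "xsd:date": {"type": "string", "format": "date"},
}

# Tiny hypothetical ontology: property name -> (domain class, range).
ONTOLOGY = {
    "givenName": ("Person", "xsd:string"),
    "birthDate": ("Person", "xsd:date"),
    "address": ("Person", "Address"),
    "fullAddress": ("Address", "xsd:string"),
}


def ontology_to_schemas(ontology: dict) -> dict:
    """Build OAS component schemas from domain/range declarations."""
    schemas: dict = {}
    for prop, (domain, rng) in ontology.items():
        schema = schemas.setdefault(domain, {"type": "object", "properties": {}})
        if rng in XSD_TO_OAS:
            schema["properties"][prop] = dict(XSD_TO_OAS[rng])  # copy the mapping
        else:
            # A class range becomes a reference to that class's schema.
            schema["properties"][prop] = {"$ref": f"#/components/schemas/{rng}"}
            schemas.setdefault(rng, {"type": "object", "properties": {}})
    return schemas


if __name__ == "__main__":
    import json
    print(json.dumps({"components": {"schemas": ontology_to_schemas(ONTOLOGY)}}, indent=2))
```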

Architectural

Whereas a transformation process converts an input into an output in a single step, the approaches described in this section take a more holistic, architectural view, generating various outputs and integrations from a given input.

The approach taken by the Ontology-Based API (OBA) framework has ontologies as input and generates:

[Image: outputs generated by the OBA framework from an ontology]

The OBA approach is the opposite of the one taken by the GRLC tool, which starts from SPARQL queries and creates an API server together with the corresponding OAS file.
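The idea of deriving an API operation from a SPARQL query can be sketched as below. This is a hypothetical illustration, not GRLC's code: it treats variables written as ?_name as required API parameters, a convention loosely modelled on GRLC's (see its documentation for the actual rules, including optional parameters and typing), and produces an OAS path item for a GET operation.

```python
import re

# Example query; ?_givenName is meant to be filled in from an API parameter.
QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?familyName WHERE {
  ?person foaf:givenName ?_givenName ;
          foaf:familyName ?familyName .
}
"""


def query_to_operation(query: str, summary: str) -> dict:
    """Derive an OAS path item from a SPARQL query's ?_name variables."""
    names = sorted(set(re.findall(r"\?_(\w+)", query)))
    return {
        "get": {
            "summary": summary,
            "parameters": [
                {"name": n, "in": "query", "required": True,
                 "schema": {"type": "string"}}
                for n in names
            ],
            "responses": {"200": {"description": "SPARQL query results"}},
        }
    }


if __name__ == "__main__":
    import json
    print(json.dumps(query_to_operation(QUERY, "Find persons by given name"), indent=2))
```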

Both approaches help semantic engineers publish APIs without the support of backend developers.

In other cases, the semantic layer and the technical layer can be cleanly separated without touching the API description, for example by placing a SPARQL endpoint behind a REST API. This is already possible in Finland, where SPARQL queries are passed as parameters to the REST API.
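A minimal sketch of passing a SPARQL query as a REST parameter, assuming a hypothetical endpoint URL and a parameter called "query". The name mirrors the parameter defined by the SPARQL 1.1 Protocol, but the actual API may use another name. The client URL-encodes the query, and the server side decodes it before forwarding it to the SPARQL endpoint.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical REST endpoint that forwards its "query" parameter to SPARQL.
API_URL = "https://api.example.org/sparql"


def build_request_url(sparql: str) -> str:
    """Client side: encode a SPARQL query as a URL parameter of a GET request."""
    params = urlencode({"query": sparql})
    return f"{API_URL}?{params}"


def extract_query(url: str) -> str:
    """Server side: recover the SPARQL query before forwarding it."""
    return parse_qs(urlparse(url).query)["query"][0]


if __name__ == "__main__":
    url = build_request_url("SELECT ?s WHERE { ?s ?p ?o } LIMIT 10")
    print(url)
    print(extract_query(url))
```

The API description only needs to declare a single string parameter; the semantics stay entirely in the query and the SPARQL endpoint behind it.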

The same approach will be available in Italy to monitor the use of code lists stored on SPARQL endpoints.


Conclusions

At the technical layer, APIs enable the exchange of data between information systems. This requires defining a data model, which can be derived from a semantic data specification.

OpenAPI is a well-known method for describing API capabilities, including how the API operates and its associated data model. In addition, various tools exist to address the needs of the different API roles.

A semantic engineer can work together with the team responsible for developing an API by applying semantics in different ways: by performing mappings (creating a JSON-LD context or enriching OAS files), by transforming (from OWL to an OAS file and vice versa), or by leveraging frameworks to publish APIs starting from ontologies or SPARQL queries.


Glossary

Abbreviation Meaning
API Application Programming Interface is a way for two or more computer programs or components to communicate with each other.
GraphQL(-LD) GraphQL is a query language for APIs and a runtime for fulfilling those queries with your existing data. GraphQL-LD extends those queries with a JSON-LD context.
HATEOAS HATEOAS (Hypermedia As The Engine Of Application State) is a constraint of the REST application architecture that allows clients to dynamically navigate resources by providing hypermedia links with the responses.
HTML HTML (HyperText Markup Language) is the standard markup language used for creating and structuring content on the web, including text, images, and links.
HTTP The Hypertext Transfer Protocol is a protocol designed to transfer information between networked devices using hypertext links.
JSON(-LD) The JavaScript Object Notation (JSON) is a lightweight, language-independent data-interchange format that is both human-readable and machine-readable. JSON for Linking Data (JSON-LD) is a method for using JSON in a linked data context.
OAS The OpenAPI Specification is a specification for a machine-readable interface definition language for describing, producing, consuming and visualizing web services.
OWL OWL (Web Ontology Language) is a semantic web language designed to represent complex information and knowledge about things, groups of things, and relations between things.
RAML RESTful API Modeling Language (RAML) is a YAML-based language for describing static APIs designed for REST APIs, but capable of describing other APIs as well.
RDF RDF is a standard model for data interchange on the Web developed by W3C.
REST The representational state transfer (REST) is a software architectural style that was created to guide the design and development of the architecture for the World Wide Web.
SOAP The Simple Object Access Protocol (SOAP) is a messaging protocol specification for exchanging structured information in the implementation of web services in computer networks.
SPARQL SPARQL is an RDF query language, that is, a semantic query language for databases, able to retrieve and manipulate data stored in Resource Description Framework (RDF) format.
W3C The World Wide Web Consortium (W3C) is a standards organisation that develops standards and guidelines to help build a web based on the principles of accessibility, internationalisation, privacy and security.
WSDL The Web Services Description Language (WSDL) is an XML-based interface description language that is used for describing the functionality offered by a web service.
XML Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.
YAML YAML (YAML Ain't Markup Language, originally Yet Another Markup Language) is a human-readable data serialization standard commonly used for configuration files and data exchange between languages with different data structures.

References

Bonnel, N., & Mouton, A. (2021, May 12). Semantic Open API Specification library. Retrieved from GitHub: https://github.com/koumoul-dev/soas

Cremaschi, M., & de Paoli, F. (2017). Toward Automatic Semantic API Descriptions to Support Services Composition. 6th European Conference on Service-Oriented and Cloud Computing (ESOCC), 159-167. doi:10.1007/978-3-319-67262-5_12

Date, C., & Codd, E. (1974). The relational and network approaches: Comparison of the application programming interfaces. SIGFIDET '74: Proceedings of the 1974 ACM SIGFIDET (now SIGMOD) workshop on Data description, access and control: Data models: Data-structure-set versus relational, 83-113. doi:10.1145/800297.811532

Medjaoui, M., Wilde, E., Mitra, R., & Amundsen, M. (2019). Continuous API Management. O'Reilly Media, Inc.

Vaccari, L., Posada, M., Boyd, M., & Santoro, M. (2021). APIs for EU Governments: A Landscape Analysis on Policy Instruments, Standards, Strategies and Best Practices. Data, 6, 59. doi:10.3390/data6060059
