The eGovernment Core Vocabularies are simplified, re-usable and extensible data models that capture the fundamental characteristics of an entity in a context-neutral fashion. This guide presents some practical guidance on how users can take advantage of the Core Vocabularies. The information summarised here is available in the ISA² Handbook for using Core Vocabularies.
Interested users can start taking advantage of the Core Vocabularies in two ways:
- By designing a new data model and either binding it to an existing syntax or creating a new syntax for it; or
- By creating mappings from a data model to the Core Vocabularies’ conceptual data model and to the respective syntaxes.
There are three ways to bind a data model to a syntax:
- Select a standard syntax that can support the defined data model; or
- Create a new syntax (in case no suitable standard syntax is found); or
- Do both: bind to a standard syntax and create new elements where needed.
A selected standard syntax should support the requirements of the data model, but it may also cover additional requirements and include optional or unnecessary data elements. This extra information can confuse implementers, leading to different interpretations of the syntax and thus hindering interoperability.
When no standard syntax is suitable for the data model, the authors of the data model should create a new syntax.
This guide to using the Core Vocabularies covers both cases. The figure below describes the steps in this methodology.
Step 1: Information Modelling
This step creates a conceptual data model covering the information requirements gathered when designing a data model. The output of this step is the conceptual data model aligned with the Core Vocabularies.
The conceptual data model should contain the following:
- Information requirement identifier: The unique identifier for the information requirement.
- Type of business term: Identifies whether the information requirement corresponds to a class, a property or an association.
- Business term: The information requirement name. It has to follow the Core Vocabulary terminology when possible.
- Business term definition: The explanatory definition of the business term.
- Core Vocabulary identifier: The global and unique identifier of the corresponding Core Vocabulary concept, for those business terms where a corresponding Core Vocabulary term exists.
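As a minimal sketch, one row of such a conceptual data model could be represented as follows. The class name, field names, identifiers (`IR-001`, `ex:Person`) and definitions are hypothetical and only illustrate the five fields listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InformationRequirement:
    """One row of the conceptual data model (hypothetical structure)."""
    identifier: str     # unique identifier of the information requirement
    term_type: str      # "class", "property" or "association"
    business_term: str  # name, following Core Vocabulary terminology when possible
    definition: str     # explanatory definition of the business term
    core_vocabulary_id: Optional[str] = None  # only set when a Core Vocabulary term exists

    def __post_init__(self) -> None:
        if self.term_type not in {"class", "property", "association"}:
            raise ValueError(f"unknown type of business term: {self.term_type!r}")


# Illustrative rows; identifiers and definitions are made up.
requirements = [
    InformationRequirement("IR-001", "class", "Person", "An individual person.", "ex:Person"),
    InformationRequirement("IR-002", "property", "Badge number", "Number printed on a staff badge."),
]

# Requirements that are linked to a Core Vocabulary concept.
linked = [r.identifier for r in requirements if r.core_vocabulary_id is not None]
```

Keeping the Core Vocabulary identifier optional reflects that only some business terms have a corresponding Core Vocabulary term.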
After defining the conceptual data model, the Core Vocabularies can help users:
- Check for alignment: the Core Vocabularies serve as a pattern to build similar data elements of the new conceptual data model. For instance, if the concept of Person is necessary in the conceptual data model, users should analyse the Core Person Vocabulary semantic concept to check if the Core Person attributes could serve as a pattern to fulfil the new conceptual data model requirements.
- Enhance names and semantics. Users should reuse the semantics provided by the Core Vocabularies for those data elements that have an exact match.
In order to perform these tasks:
- Download the Core Vocabulary spreadsheet from the Core Vocabulary repository.
- Identify common concepts. Compare the concepts of the new data model with the concepts defined in the Core Vocabulary in order to find matches.
- Align concepts and classes using the Core Vocabularies:
  - Name data elements with the Core Vocabulary terms when the match is exact. For non-exact matches, this activity helps to find synonyms or refined names in the context of the conceptual data model. Synonyms are acceptable, but the link to the corresponding Core Vocabulary term is necessary.
  - Align data element descriptions using the Core Vocabulary descriptions.
  - Align the data model classes to the Core Vocabulary classes. The conceptual data model classes should be a specialization of the Core Vocabulary classes by:
    - Adding new properties representing new concepts in the new context.
    - Removing properties from the Core Vocabularies that are not used in the new context.
    - Replacing properties and associations only when needed. Users should not replace properties or associations for narrow matches, as it can have a negative impact on interoperability.
- Identify in the conceptual data model the link to the Core Vocabulary using the Core Vocabulary identifier to ease the syntax binding and documentation steps.
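The matching and alignment activities above can be sketched as a simple name comparison. This is only an illustration: the `CORE_TERMS` extract, its identifiers and its definitions are hypothetical placeholders, and real alignment needs human review of the semantics rather than string matching alone.

```python
# Hypothetical extract of a Core Vocabulary: identifier -> (term, definition).
CORE_TERMS = {
    "ex:Person": ("Person", "Placeholder definition of a person."),
    "ex:PersonFamilyName": ("Family Name", "Placeholder definition of a family name."),
}


def normalise(name: str) -> str:
    """Lower-case a name and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.casefold().split())


def align(model_terms, core_terms):
    """Reuse the Core Vocabulary name and definition wherever the names match exactly."""
    index = {
        normalise(term): (core_id, term, definition)
        for core_id, (term, definition) in core_terms.items()
    }
    aligned = []
    for name in model_terms:
        hit = index.get(normalise(name))
        if hit is not None:
            core_id, term, definition = hit
            aligned.append({"term": term, "definition": definition, "core_id": core_id})
        else:
            # No exact match: keep the local name and leave the link for manual review.
            aligned.append({"term": name, "definition": None, "core_id": None})
    return aligned


result = align(["family  name", "Badge number"], CORE_TERMS)
```

Terms without an exact match keep an empty `core_id`, which marks them as candidates for a synonym link or for a new concept.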
Step 2: Business Rules
The previous step defined the information requirements. There are still action assertions, constraints, and derivations concerning some aspects of the conceptual data model that have to be defined. They are:
- Integrity constraints on the information model;
- Model dependencies and derivations;
- Inferences and mathematical calculations;
- Conditional business rules (including action assertions) and co-occurrence constraints; and
- Sets of allowed values for coded data elements.
The Core Vocabularies themselves define neither cardinalities for attributes nor business rules or additional constraints.
The outcome of this step is an enhanced data model that includes the cardinalities, the constraints and the code lists that restrict the possible values of coded elements.
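As a hedged illustration of what such an enhanced data model adds, the sketch below attaches cardinalities and a code list to two properties and checks an instance against them. The property names, the `PropertyRule` structure and the code list values are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PropertyRule:
    """Cardinality and code list for one property (hypothetical structure)."""
    min_occurs: int = 0
    max_occurs: Optional[int] = None            # None means unbounded
    allowed_values: Optional[frozenset] = None  # None means no code list


# Illustrative rules; names and codes are made up.
RULES = {
    "familyName": PropertyRule(min_occurs=1, max_occurs=1),
    "gender": PropertyRule(max_occurs=1, allowed_values=frozenset({"F", "M", "X"})),
}


def violations(instance: dict, rules: dict) -> list:
    """Return human-readable rule violations; instance maps property name -> list of values."""
    problems = []
    for name, rule in rules.items():
        values = instance.get(name, [])
        if len(values) < rule.min_occurs:
            problems.append(f"{name}: expected at least {rule.min_occurs} value(s), found {len(values)}")
        if rule.max_occurs is not None and len(values) > rule.max_occurs:
            problems.append(f"{name}: expected at most {rule.max_occurs} value(s), found {len(values)}")
        if rule.allowed_values is not None:
            for value in values:
                if value not in rule.allowed_values:
                    problems.append(f"{name}: value {value!r} is not in the code list")
    return problems


ok = violations({"familyName": ["Doe"], "gender": ["F"]}, RULES)
bad = violations({"gender": ["Q"]}, RULES)
```

Here `ok` is empty, while `bad` reports the missing mandatory `familyName` and the `gender` value outside the code list.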
Step 3: Create data models using the Core Vocabularies
During the syntax binding process, the information requirements are bound to actual elements of a given syntax. When a standard syntax supports the conceptual data model, it is recommended to reuse that syntax as much as possible. If no standard syntax is available, new syntax elements can be created.
The process to create a syntax binding is:
1. Choose a representation format: the information requirements can be implemented in different ways depending on the use case, for example:
  - XML Schema when creating an information exchange model or domain model;
  - Linked Data (RDF), such as RDF/XML or JSON-LD, when creating an information exchange model or domain model;
  - Data Definition Language (SQL) when creating a database.
2. Choose standard syntax bindings and naming and design rules: there are several standard syntaxes, depending on the domain of the conceptual data model and the selected representation format.
These standard syntaxes provide support for different domains such as transportation or procurement. For specific domains, other standard syntaxes exist, such as HL7 in the health domain, or XBRL for financial reporting.
In addition to standard syntax bindings, naming and design rules (NDRs) are necessary to create the actual syntax. Several sets of naming and design rules exist for creating XSD schemas.
The Core Vocabularies use the UBL methodology.
In most cases, the selection of the standard syntax also indicates the naming and design rules that apply.
3. Use existing mappings where available: The Core Vocabularies provide guidance in the syntax binding process as they pre-define a set of mappings to existing standard syntaxes. Currently, the following syntaxes have mappings to the Core Vocabularies:
- Core Vocabularies RDF Schemas;
- NIEM 3.0;
- UN/CEFACT CCL 13B;
- MUG- BII;
- OASIS UBL Common Library 2.1;
- KoSIT – XOV;
- Swedish Company data model;
- eIDAS minimum dataset;
- IMI Core Vocabulary 2.2;
- FSB Canonical Data Model PersonServices.
These mappings are available for download. These pre-defined mappings provide a consistent way to map the same concepts to the same syntax elements across projects and across domains. Use the Core Vocabulary mappings as follows:
- Download the Core Vocabulary spreadsheet from the Core Vocabulary repository.
- Identify the classes, properties and associations in the conceptual data model that refer to a Core Vocabulary identifier.
- Select the sheet "Mappings" from the Core Vocabulary spreadsheet. This sheet has the following information:
  - Core Vocabulary Identifier: The identifier of the Core Vocabulary term.
  - Relation: The type of relation with the Core Vocabulary.
  - Foreign identifier: The identifier of the data element in the standard syntax.
  - Foreign source: The name of the standard syntax.
  - Comment: An additional comment describing the relationship.
- Filter the sheet by selecting the chosen syntax in the Foreign source column.
- Use the Core Vocabulary Identifier to find the corresponding class, property or association.
- Use the Foreign identifier as the mapping.
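The filtering and lookup steps above can be sketched as follows, assuming the "Mappings" sheet has been exported to CSV with the column names listed above. The sample rows, including the `ex:` identifiers and the foreign identifiers, are illustrative and not copied from the actual spreadsheet.

```python
import csv
import io

# Illustrative CSV export of the "Mappings" sheet; rows are made up for this example.
SAMPLE_MAPPINGS_CSV = """Core Vocabulary Identifier,Relation,Foreign identifier,Foreign source,Comment
ex:PersonFamilyName,Has exact match,cbc:FamilyName,OASIS UBL Common Library 2.1,
ex:PersonFamilyName,Has close match,nc:PersonSurName,NIEM 3.0,
ex:PersonGivenName,Has exact match,cbc:FirstName,OASIS UBL Common Library 2.1,
"""


def mappings_for(csv_text: str, foreign_source: str) -> dict:
    """Filter on the Foreign source column and index the result by Core Vocabulary identifier."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["Core Vocabulary Identifier"]: (row["Foreign identifier"], row["Relation"])
        for row in reader
        if row["Foreign source"] == foreign_source
    }


ubl = mappings_for(SAMPLE_MAPPINGS_CSV, "OASIS UBL Common Library 2.1")

# Look up the syntax element for a concept referenced in the conceptual data model.
foreign_id, relation = ubl["ex:PersonFamilyName"]
```

The sketch assumes at most one mapping per concept and syntax; if the sheet holds several, a list per identifier would be needed instead.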
4. Use standard syntax where available: The information requirements that do not have a correspondence to a Core Vocabulary concept need to be mapped to the proper element in the standard syntax.
Use the semantics of the standard syntax to identify the mapping.
5. Mint new terms where needed: If an information requirement cannot be bound to the standard syntax, it will be necessary to mint new terms.
If the representation format is XML, the standard syntax needs to be extended with the new terms. The new terms must be defined in a new schema, and this new schema must then be used.
6. Create a specific schema (validation artefacts): The outcome of this step is a schema, a specification that defines the new syntax.
- XML Schemas: Standards Development Organisations (SDOs) provide validation artefacts for their standard syntaxes, following predefined XML Naming and Design Rules (NDR). XSD schemas are the main type of validation artefact provided by SDOs. They validate that a particular XML document instance fulfils the structural and type constraints defined by the standard. The syntax binding process adds further constraints on top of the standard restrictions, so developers must create additional validation artefacts that allow users to verify that instances fulfil the restrictions of the new data model. These validation artefacts can be created using different technologies:
  - Restricted XSD Schema. An XSD schema restricted to the elements and attributes from the standard syntax actually used for the new data model.
  - Schematron validation file. An artefact that checks for the presence of required data elements from the new model, and ensures there are no elements not belonging to the data model.
- RDF Schemas: Unlike XML Schema, RDF Schema is intended for definition, not for validation. SHACL is appropriate for validation.
- Data Definition Language (SQL): The end product for a relational database representation is an SQL Data Definition Language (DDL) script that can be run to create a relational database structure that meets the information requirements and chosen syntax. Organisations use their own database engineering methodologies, using logical data model design and naming conventions.
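For the relational case, a minimal sketch of such a DDL script is shown below, run against an in-memory SQLite database to check that it enforces the constraints. The table name, column names, cardinalities and code list are assumptions made for this example and do not come from the Core Vocabularies.

```python
import sqlite3

# Hypothetical DDL derived from a small conceptual data model.
DDL = """
CREATE TABLE person (
    person_id   INTEGER PRIMARY KEY,
    family_name TEXT NOT NULL,                               -- cardinality 1..1
    given_name  TEXT,                                        -- cardinality 0..1
    gender_code TEXT CHECK (gender_code IN ('F', 'M', 'X'))  -- code list
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# A row that respects the cardinalities and the code list is accepted.
conn.execute(
    "INSERT INTO person (family_name, gender_code) VALUES (?, ?)", ("Doe", "F")
)

# A row with a value outside the code list violates the CHECK constraint.
try:
    conn.execute(
        "INSERT INTO person (family_name, gender_code) VALUES (?, ?)", ("Roe", "Q")
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
```

Expressing cardinalities as `NOT NULL` and code lists as `CHECK` constraints is one possible design; other engines and methodologies may prefer lookup tables with foreign keys.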
In summary, the Core Vocabularies help implementers to map the core concepts appropriately, which improves the interoperability of the conceptual data model and leads to a consistent use of standard syntaxes.
Step 4: Syntax documentation and mapping
The aim of this step is to create documentation of the syntax that allows users to implement it, and at the same time allows the owner to claim conformance of the data model to the Core Vocabularies. The syntax documentation takes the following form:
- A mapping spreadsheet (mandatory): a spreadsheet documenting the syntax and its mapping to the Core Vocabularies.
- Schema annotations (optional): documentation provided as part of the validation artefacts.
Mapping spreadsheet (mandatory)
The syntax documentation must be done using the spreadsheet that can be downloaded via the following link: http://mapping.semic.eu
In this spreadsheet, the sheet ‘mappings’ contains the mapping information. It already includes a number of sample mappings. The figure below contains a screenshot with information on the mappings. These mappings must be publicly accessible online as a Core Vocabulary self-conformance statement.
Schema annotations (optional)
The technical artefacts created in Step 3 should include schema annotations capturing the mapping to the Core Vocabularies so that they are self-descriptive:
- Using XML Schema annotations: to include annotations for type definitions within the <xsd:annotation><xsd:documentation> elements. These annotations can be included as described in the figure below.
- Using RDF Schema annotations: the figure below shows an example of RDF Schema annotations.
The XSD Schemas should use <xsd:annotation> to describe the mappings to the Core Vocabularies.
The annotation documentation is useful to convey the following descriptive and mapping metadata:
- Identifier: The uniform resource identifier that is used to uniquely identify an element. This should preferably be an HTTP URI that is dereferenceable.
- Label: A meaningful label that represents the meaning of the element.
- Definition: A meaningful definition that univocally defines the element.
- Core Vocabulary URI: The global and unique uniform resource identifier that uniquely identifies the corresponding element of the Core Vocabulary, as it is defined in the Core Vocabulary specification.
- Core Vocabulary Version: The version of the Core Vocabulary specification.
- Mapping relation: The mapping relation of the annotated element to the Core Vocabulary element:
  - Has exact match
  - Has close match
  - Has broad match
  - Has narrow match
  - Has related match
- Mapping comment: Explanatory comment on the mapping.
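As a rough sketch, the metadata items above could be written into an `<xsd:annotation><xsd:documentation>` block as shown below. The layout inside the documentation element is an assumption: the `meta` namespace, the child element names, the URIs and the version value are hypothetical placeholders, not a format prescribed by the Core Vocabularies.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"
META_NS = "http://example.org/mapping-metadata"  # hypothetical namespace for the metadata

ET.register_namespace("xsd", XSD_NS)
ET.register_namespace("meta", META_NS)


def annotated_complex_type(type_name: str, metadata: dict) -> ET.Element:
    """Build an xsd:complexType whose annotation carries the mapping metadata."""
    complex_type = ET.Element(f"{{{XSD_NS}}}complexType", {"name": type_name})
    annotation = ET.SubElement(complex_type, f"{{{XSD_NS}}}annotation")
    documentation = ET.SubElement(annotation, f"{{{XSD_NS}}}documentation")
    for key, value in metadata.items():
        ET.SubElement(documentation, f"{{{META_NS}}}{key}").text = value
    return complex_type


# Placeholder values; replace them with the real identifiers of the data model.
person_type = annotated_complex_type("PersonType", {
    "Identifier": "http://example.org/model/PersonType",
    "Label": "Person",
    "Definition": "Placeholder definition of a person.",
    "CoreVocabularyURI": "http://example.org/core/Person",
    "CoreVocabularyVersion": "0.0-placeholder",
    "MappingRelation": "Has exact match",
    "MappingComment": "Illustrative mapping only.",
})

xml_text = ET.tostring(person_type, encoding="unicode")
```

Printing `xml_text` shows the annotated type definition, which can then be placed inside the schema produced in the syntax binding step.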