Using the validator
There are four different ways to start using the validator:
- Online form: A web user interface allowing a user to provide the CSV content to validate.
- SOAP API: A SOAP API allowing contract-based machine-to-machine integration using SOAP web service calls.
- Docker image: for on-premise or local usage.
- Command line: available as an executable JAR file, packaged alongside a README file in a ZIP archive.
The following sub-sections describe each of these options in detail. Additionally, two screencasts showing the use of the validator from the web interface and from the command-line tool are provided as additional material in the "Attachments" section at the bottom of this page.
The validator’s user interface is available at the /csv/kohesio/upload path. The exact path when using the validator on the Test Bed is: https://www.itb.ec.europa.eu/csv/kohesio/upload
The first page that you see is a simple form to provide the CSV content to validate.
This form expects the following input:
- Content to validate: The Kohesio CSV file that will be submitted for validation. The preceding dropdown selection determines how this will be provided, specifically as a file input (pre-selected), as a URI to be loaded remotely or as content to be provided using an editor.
- Specify CSV syntax settings: This allows you to define the delimiter character used to separate field values if it differs from the standard comma (i.e., “,” as defined in rule 2.5 of the CSV specification RFC 4180).
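As a rough local check before filling in the form, the following Python sketch reads the first rows of some CSV text with an explicit delimiter, so you can confirm that the fields split as expected. The `preview_rows` helper, the column names and the sample content are hypothetical and are not part of the validator.

```python
import csv
import io


def preview_rows(text, delimiter=",", limit=3):
    """Return the first `limit` rows of CSV text, split on `delimiter`."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    rows = []
    for row in reader:
        rows.append(row)
        if len(rows) >= limit:
            break
    return rows


# Hypothetical sample that uses a semicolon instead of the standard comma.
sample = "Operation_Name;Country\nExample project;IE\n"

# With the right delimiter each row splits into two fields.
print(preview_rows(sample, delimiter=";"))
# With the default comma each row stays a single field, which hints that
# a custom delimiter has to be specified in the form.
print(preview_rows(sample))
```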
Once you have provided your input, click the Validate button to trigger the validation. Upon completion you will be presented with the validation results:
This screen includes an overview of the result listing:
- The validation timestamp (in UTC), the name of the validated file and the applied validation type (the Core Kohesio CSV specification).
- The overall result (SUCCESS or FAILURE).
- The number of:
  - Errors: critical errors that need to be corrected to ensure the conformance of the data.
  - Warnings: irregularities that, even if non-critical, may hinder the consistency of the data.
  - Information messages: noteworthy points identified during the validation.
Please note that if the validation produces more than 5,000 report items, neither the PDF report nor the online visualisation is generated. To consult the validation report in that case, use the XML report instead, as it lists all report items up to its own limit (50,000 items). The displayed count of report items is always accurate (i.e., if there are 51,000 report items, the count shows 51,000).
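The limits described above can be summarised in a small Python sketch. It only restates the thresholds given in the text; the `report_availability` function and its key names are hypothetical and do not correspond to anything exposed by the validator.

```python
PDF_AND_ONLINE_LIMIT = 5_000  # above this, no PDF report or online visualisation
XML_LIMIT = 50_000            # maximum number of items listed in the XML report


def report_availability(item_count):
    """Describe which report outputs to expect for a given number of items."""
    return {
        # PDF and on-screen details are only produced up to 5,000 items.
        "pdf_and_online": item_count <= PDF_AND_ONLINE_LIMIT,
        # The XML report lists items up to its own limit.
        "xml_items_listed": min(item_count, XML_LIMIT),
        # The displayed count always reflects the real number of items.
        "displayed_count": item_count,
    }


print(report_availability(51_000))
```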
This section is followed by the Details panel, where the details of each report item are listed:
- Its type (whether this is an error, warning or information message).
- Its description.
Clicking on an item's details opens a popup showing, within the provided content, the specific point that triggered the issue:
As you can see, the numbers on the left refer to the row numbers in the CSV, counting the header row (column names). Within each row, the columns are separated by the standard comma character “,” (or by a different separating character, if one was specified as mentioned previously).
In terms of reporting, besides the on-screen display, the following buttons allow you to view or download the validation report:
- View annotated input: Opens a popup with the provided content, including annotations for the lines with relevant report items, just as when clicking on the details panel, but including the whole report.
- Download XML report: Download as XML.
- Download PDF report: Download as PDF.
Note that the download buttons are initially disabled but are enabled as soon as the respective reports become available.
Finally, to trigger a new validation you may either use the form at the top of the result screen or click on the form’s title, which will take you back to the previous page.
The validator’s SOAP API is available under the /csv/soap path. The exact path (path to WSDL provided) when accessing the validator from the Test Bed is https://www.itb.ec.europa.eu/csv/soap/kohesio/validation?wsdl
The operations supported are as follows:
- getModuleDefinition: Called to return information on how to call the service (i.e. what inputs are expected).
- validate: Called to trigger validation for the provided content.
Regarding the getModuleDefinition operation, a request of:
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:v1="http://www.gitb.com/vs/v1/">
Will produce a response as follows:
xmlns:ns2="http://www.gitb.com/core/v1/" xmlns:ns3="http://www.gitb.com/tr/v1/" xmlns:ns4="http://www.gitb.com/vs/v1/">
<module operation="V" id="ValidatorService">
<ns2:param type="binary" name="contentToValidate" use="R" kind="SIMPLE" desc="The content to validate, provided as a string, BASE64 or a URI."/>
<ns2:param type="string" name="embeddingMethod" use="O" kind="SIMPLE" desc="The embedding method to consider for the 'contentToValidate' input ('BASE64', 'URL' or 'STRING')."/>
<ns2:param type="string" name="delimiter" use="O" kind="SIMPLE" desc="The character to be used as the field delimiter."/>
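For illustration, the getModuleDefinition call could be issued from Python roughly as follows. This is a minimal sketch under several assumptions that are not confirmed by this page: that the request body consists of a single `v1:GetModuleDefinitionRequest` element, that the service endpoint is the WSDL address without its `?wsdl` suffix, and that an empty `SOAPAction` header is accepted. The helper names are hypothetical.

```python
import urllib.request

WSDL_URL = "https://www.itb.ec.europa.eu/csv/soap/kohesio/validation?wsdl"
# Assumption: the endpoint is the WSDL address without the query string.
ENDPOINT = WSDL_URL.split("?", 1)[0]


def build_module_definition_request():
    """Build the SOAP envelope (body element name is an assumption)."""
    return (
        '<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" '
        'xmlns:v1="http://www.gitb.com/vs/v1/">'
        "<soapenv:Header/>"
        "<soapenv:Body><v1:GetModuleDefinitionRequest/></soapenv:Body>"
        "</soapenv:Envelope>"
    )


def call_soap(endpoint, envelope, timeout=30):
    """POST a SOAP 1.1 envelope and return the raw response text."""
    request = urllib.request.Request(
        endpoint,
        data=envelope.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8", "SOAPAction": ""},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage (performs a network call, so it is left commented out here):
# print(call_soap(ENDPOINT, build_module_definition_request()))
```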
Running the validation itself is done through the validate operation. This expects the following inputs:
- contentToValidate: The CSV content to validate.
- embeddingMethod: The way to interpret the content provided for the contentToValidate input (STRING, which is the default, BASE64 or URI).
- delimiter: The delimiter character used to separate fields in the provided content (the default is ,). This can be changed to match any custom delimiter used in the CSV file.
The content to validate can be provided by any of three means that are determined by the input element’s embeddingMethod attribute. Specifically:
- STRING: The content is provided as embedded text within the request.
- BASE64: The content is provided as a BASE64 encoded string.
- URI: The content is to be loaded remotely from the provided URI.
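The three embedding methods can be sketched in Python as follows. The `input` element with its `name` and `embeddingMethod` attributes is taken from the examples on this page; the nested `v11:value` element and the `build_content_input` helper are assumptions made for illustration, so check the WSDL before relying on this structure.

```python
import base64
from xml.sax.saxutils import escape, quoteattr

SUPPORTED_METHODS = ("STRING", "BASE64", "URI")


def build_content_input(content, method="STRING"):
    """Build the contentToValidate input for the given embedding method.

    For STRING and BASE64, `content` is the CSV text itself.
    For URI, `content` is the address the CSV is to be loaded from.
    """
    if method not in SUPPORTED_METHODS:
        raise ValueError(f"Unsupported embedding method: {method}")
    if method == "BASE64":
        value = base64.b64encode(content.encode("utf-8")).decode("ascii")
    else:
        value = content
    # Assumption: the value is carried in a nested v11:value element,
    # where v11 is bound to http://www.gitb.com/core/v1/.
    return (
        f'<input name="contentToValidate" embeddingMethod={quoteattr(method)}>'
        f"<v11:value>{escape(value)}</v11:value>"
        "</input>"
    )


# Hypothetical content and address, for illustration only.
print(build_content_input("a,b\n1,2\n", "STRING"))
print(build_content_input("a,b\n1,2\n", "BASE64"))
print(build_content_input("https://example.org/data.csv", "URI"))
```

Escaping the value matters mainly for the STRING method, since CSV content may contain characters such as `&` or `<` that are not allowed unescaped inside XML text.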
Using the same example as in the online form validation section, validating via URI would be done using the following envelope:
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:v1="http://www.gitb.com/vs/v1/" xmlns:v11="http://www.gitb.com/core/v1/">
<input name="contentToValidate" embeddingMethod="URI" delimiter=";">
With the resulting report provided as follows:
<ns4:ValidationResponse xmlns:ns2="http://www.gitb.com/core/v1/" xmlns:ns3="http://www.gitb.com/tr/v1/" xmlns:ns4="http://www.gitb.com/vs/v1/">
<ns2:item name="contentToValidate" embeddingMethod="STRING" type="string">
<ns3:error xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="ns3:BAR">
<ns3:description>[Row: 2]: At least one of the location indicator fields [Location_Indicator_Postcode, Location_Indicator_NUTS_code, Location_Indicator_latitude_longitude] must be provided</ns3:description>
The returned report is the same as the XML report you can download from the user interface (see Validation via user interface). It includes:
- The validation timestamp (in UTC).
- The overall result (SUCCESS or FAILURE).
- The count of errors, warnings and information messages.
- The context for the validation (i.e. the CSV content that was validated).
- The list of report items displaying per item its description and location in the validated content.
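To process such a report programmatically, the error items can be extracted with a few lines of Python. The sketch below relies only on the `error` and `description` elements and the `http://www.gitb.com/tr/v1/` namespace shown in the response above; the sample document is a simplified, hypothetical nesting (a real report contains additional wrapper elements), and `list_errors` is an illustrative helper.

```python
import re
import xml.etree.ElementTree as ET

TR_NS = "http://www.gitb.com/tr/v1/"
ROW_PATTERN = re.compile(r"\[Row: (\d+)\]")

# Simplified sample; the real report has more elements around the items.
SAMPLE_REPORT = """<ns4:ValidationResponse xmlns:ns2="http://www.gitb.com/core/v1/" xmlns:ns3="http://www.gitb.com/tr/v1/" xmlns:ns4="http://www.gitb.com/vs/v1/">
  <ns3:error>
    <ns3:description>[Row: 2]: At least one of the location indicator fields [Location_Indicator_Postcode, Location_Indicator_NUTS_code, Location_Indicator_latitude_longitude] must be provided</ns3:description>
  </ns3:error>
</ns4:ValidationResponse>"""


def list_errors(report_xml):
    """Return the row number (if present) and description of each error."""
    root = ET.fromstring(report_xml)
    errors = []
    # iter() searches at any depth, so the exact nesting does not matter.
    for item in root.iter(f"{{{TR_NS}}}error"):
        description = item.findtext(f"{{{TR_NS}}}description", default="")
        match = ROW_PATTERN.search(description)
        errors.append({
            "row": int(match.group(1)) if match else None,
            "description": description,
        })
    return errors


for error in list_errors(SAMPLE_REPORT):
    print(error["row"], error["description"])
```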
The purpose of setting up the validator as a Docker container is to host it yourself or run it locally on workstations. For this, the European Commission's Interoperability Test Bed has produced and made available a Docker image that can easily be downloaded from Docker Hub and run locally.
The first step is installing and setting up Docker (see Docker’s documentation).
If you are setting this up on your own workstation you will most likely need to use Docker Desktop (available for Windows, Mac and Linux based systems). For this, see the dedicated chapter of the Docker documentation.
Once Docker Desktop is installed, the next step is downloading the image provided by the Test Bed from Docker Hub by typing the following command in a console:
docker pull isaitb/validator-kohesio
Once the image has been successfully pulled from Docker Hub, just type the following command to run it:
docker run -d -p 8080:8080 --name validator-kohesio isaitb/validator-kohesio:latest
After the container has been started, the validator is ready to use, either through the built-in graphical user interface in the browser or through the SOAP API. For this, the same instructions detailed in the sections above can be followed. The only difference is the address used to access the service:
- For the user interface: http://DOCKER_MACHINE:8080/csv/kohesio/upload
- For the SOAP API: http://DOCKER_MACHINE:8080/csv/soap/kohesio/validation?wsdl
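The two addresses above can be derived and checked with a short Python sketch. It assumes that DOCKER_MACHINE resolves to `localhost` when the container runs on your own workstation with the port mapping shown earlier; both helper functions are hypothetical.

```python
import urllib.error
import urllib.request


def validator_urls(host="localhost", port=8080):
    """Return the user interface and WSDL addresses of a local validator."""
    base = f"http://{host}:{port}"
    return {
        "ui": f"{base}/csv/kohesio/upload",
        "wsdl": f"{base}/csv/soap/kohesio/validation?wsdl",
    }


def is_reachable(url, timeout=5):
    """Return True if the address answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


urls = validator_urls()
print(urls["ui"])
print(urls["wsdl"])
# Usage once the container is running:
# print(is_reachable(urls["wsdl"]))
```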
Note for Windows users
Windows users may need to switch Docker from Windows containers to Linux containers, as the provided image uses the latter. To do so, right-click the Docker icon in the taskbar and choose the “Switch to Linux containers…” option, as depicted in the figure below:
The online form may be suitable for validating relatively small files, but for larger validations, uploading data and downloading the reports one by one would be unnecessarily time-consuming. The command-line tool allows validating large sets of files in a few seconds and retrieving the related reports in the usual ways: printed directly in the command prompt, or as XML or PDF files.
It is available as an executable JAR file, packaged alongside a README file in a ZIP archive. The ZIP file is downloadable from this link.
To use it you need to:
- Ensure you have Java installed on your workstation (minimum version 11).
- Download and extract the validator’s ZIP file.
- Open a command prompt and change to the directory in which you extracted the JAR file.
- View the validator’s help message by issuing
java -jar validator.jar
> Expected usage: java -jar validator.jar -input FILE_OR_URI_1 ... [-input FILE_OR_URI_N] [-noreports] [-delimiter]
- FILE_OR_URI_X is the full file path or URI to the content to validate.
- SCHEMA_FILE_OR_URI is the full file path or URI to a schema for the validation.
The summary of each validation will be printed and the detailed reports produced in the current directory (as "report.X.xml" and "report.X.pdf")
Running the validator will produce a summary output on the command console as well as the detailed validation report(s) (unless flag -noreports has been specified).
To resolve potential problems during execution, an output log is also generated with a detailed log trace.
An example is shown below:
> java -jar validator.jar -input latest_IE_example_adapted_v0.2.csv
Validating 1 of 1 ... Done.
Validation report summary [latest_IE_example_adapted_v0.2.csv]:
- Date: 2020-12-17T09:44:56.230Z
- Result: FAILURE
- Errors: 1
- Warnings: 0
- Messages: 0
- Detailed reports in [D:\tmp\report.0.xml] and [D:\tmp\report.0.pdf]
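When validating many files, the command line can be assembled and its summary parsed from a script. The Python sketch below uses only the `-input` and `-noreports` flags listed in the help message and the summary layout shown in the example above. The helper names are hypothetical, and the parser assumes the output of a single validated file.

```python
import re


def build_command(inputs, jar="validator.jar", no_reports=False):
    """Build the java command line, repeating -input once per file or URI."""
    command = ["java", "-jar", jar]
    for item in inputs:
        command += ["-input", item]
    if no_reports:
        command.append("-noreports")
    return command


SUMMARY_LINE = re.compile(r"^- (Date|Result|Errors|Warnings|Messages): (.+)$")


def parse_summary(output):
    """Extract the summary fields of one validation from the console output."""
    summary = {}
    for line in output.splitlines():
        match = SUMMARY_LINE.match(line.strip())
        if match:
            key, value = match.groups()
            # Counters become integers; date and result stay as text.
            summary[key] = int(value) if value.isdigit() else value
    return summary


EXAMPLE_OUTPUT = """Validating 1 of 1 ... Done.
Validation report summary [latest_IE_example_adapted_v0.2.csv]:
- Date: 2020-12-17T09:44:56.230Z
- Result: FAILURE
- Errors: 1
- Warnings: 0
- Messages: 0
"""

print(build_command(["first.csv", "second.csv"], no_reports=True))
print(parse_summary(EXAMPLE_OUTPUT))
# Usage with a real run (requires Java and the extracted JAR file):
# import subprocess
# result = subprocess.run(build_command(["first.csv"]), capture_output=True, text=True)
# print(parse_summary(result.stdout))
```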