Linked Data is about connecting pieces of related data and information coming from different sources (e.g. information systems and databases). This article explains how to identify and establish links between data.
We are now experiencing the bloom of the so-called Web of Data. This term denotes the evolution of the Web into an ecosystem of interconnected data and information contributed by individuals, governments, businesses and machines (e.g. sensors).
The real value of the Web of Data does not lie solely in the volume of data and information published online. It lies just as much in the relationships between the data. These relationships (referred to as links) put data in context and enrich their meaning and expressiveness.
This means that putting the data online is not enough. Publishers need to ensure that data is made available in both human- and machine-understandable formats and is linked to other data.
Linked Data, a set of four design principles put forward by Tim Berners-Lee in 2006, serves exactly that purpose. To publish Linked Data, publishers should:
- Use Uniform Resource Identifiers (URIs) as names for things, e.g. http://dbpedia.org/resource/Brussels can be used for referring to the city of Brussels.
- Use HTTP URIs, so that people can look up those names.
- When someone looks up a URI, provide useful information, using the standards (i.e. RDF, SPARQL).
- Include links to other URIs, so more things can be discovered, e.g. from http://dbpedia.org/resource/Brussels a link is available to http://dbpedia.org/resource/Belgium.
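The second and third principles can be illustrated with a small sketch of what "looking up" a URI might involve: an HTTP GET that asks the server for an RDF serialization through the Accept header. This is a minimal sketch using only the Python standard library. The helper names `build_lookup` and `look_up` are hypothetical, and the choice of `text/turtle` as the requested format is an assumption, since a given server may offer other RDF formats or none at all.

```python
from urllib.request import Request, urlopen

BRUSSELS = "http://dbpedia.org/resource/Brussels"


def build_lookup(uri: str, media_type: str = "text/turtle") -> Request:
    """Prepare an HTTP GET for `uri` that asks for an RDF serialization.

    `media_type` is an assumption: the server decides which formats it offers.
    """
    return Request(uri, headers={"Accept": media_type})


def look_up(uri: str, media_type: str = "text/turtle") -> str:
    """Dereference `uri` and return the response body as text.

    Requires network access, so it is defined here but not called.
    """
    with urlopen(build_lookup(uri, media_type), timeout=10) as response:
        return response.read().decode("utf-8")


# Inspect the prepared request without sending it.
request = build_lookup(BRUSSELS)
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Calling `look_up(BRUSSELS)` would send the request. Whether the response is RDF depends on the server honouring the requested media type.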
Links and relationships can be identified between:
- Overlapping data resources, i.e. data resources that refer to the same entity (often sharing some common information).
In this case, the linking takes place at the level of the unique identifiers (i.e. the URIs) of the different data resources. For example, the DBpedia resource for the city of Brussels (accessible at http://dbpedia.org/resource/Brussels) can be linked to the one maintained by Statistics Belgium (accessible at http://location.testproject.eu/so/au/AdministrativeUnit/STATBEL/21004). Linking these two data resources allows us to get richer information about Brussels.
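As a rough illustration, such an identity link can be written as a single RDF statement whose subject and object are the two URIs. The sketch below assumes the link is expressed with `owl:sameAs`, a common choice for this purpose that the text above does not prescribe. The helpers `to_ntriple` and `aliases` are hypothetical names introduced for the example.

```python
# Assumption: owl:sameAs is used as the linking predicate.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
DBPEDIA_BRUSSELS = "http://dbpedia.org/resource/Brussels"
STATBEL_BRUSSELS = (
    "http://location.testproject.eu/so/au/AdministrativeUnit/STATBEL/21004"
)


def to_ntriple(subject: str, predicate: str, obj: str) -> str:
    """Serialize a triple whose three terms are all URIs as one N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def aliases(uri: str, triples: list) -> set:
    """Return every URI reachable from `uri` over owl:sameAs, in either direction."""
    found = {uri}
    frontier = [uri]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p != OWL_SAME_AS:
                continue
            # Treat the link as symmetric: follow it from both ends.
            for start, end in ((s, o), (o, s)):
                if start == current and end not in found:
                    found.add(end)
                    frontier.append(end)
    return found


triples = [(DBPEDIA_BRUSSELS, OWL_SAME_AS, STATBEL_BRUSSELS)]
link_line = to_ntriple(*triples[0])
print(link_line)
print(sorted(aliases(STATBEL_BRUSSELS, triples)))
```

With the set of aliases in hand, a consumer could gather the statements published about either URI and treat them as describing the same city.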
- Complementary data resources, i.e. data resources that refer to different entities that somehow relate.
Imagine that one of the attributes of the data entity for the city of Brussels is country. This attribute states that a city is located in, or belongs to, a country. In our case, the value for country is Belgium. There are different options for encoding this information.
One way would be to include the value for country as text, e.g. a literal or a string. This option, however, does not take us very far: the same country may be written in different ways or in different languages, and the text may even contain spelling errors. The Linked Data approach instead replaces the text value with a URI pointing to the specific country, i.e. to Belgium (the URI of DBpedia’s resource for Belgium is http://dbpedia.org/resource/Belgium). The Linked Data option allows us to refer to Belgium unambiguously and to navigate through the links in order to collect more information about Brussels.
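The difference between the two encodings can be sketched with a few in-memory triples. The records below are illustrative rather than real published data, the property URI `http://dbpedia.org/ontology/country` is an assumption about how the country attribute might be named, and `distinct_values` is a hypothetical helper.

```python
BRUSSELS = "http://dbpedia.org/resource/Brussels"
BELGIUM = "http://dbpedia.org/resource/Belgium"
# Assumption: the country attribute is identified by this property URI.
COUNTRY = "http://dbpedia.org/ontology/country"

# Illustrative records from three hypothetical publishers using text values.
literal_records = [
    (BRUSSELS, COUNTRY, "Belgium"),   # English
    (BRUSSELS, COUNTRY, "Belgique"),  # French
    (BRUSSELS, COUNTRY, "Belgum"),    # spelling error
]

# The same three records with the text replaced by the country's URI.
uri_records = [
    (BRUSSELS, COUNTRY, BELGIUM),
    (BRUSSELS, COUNTRY, BELGIUM),
    (BRUSSELS, COUNTRY, BELGIUM),
]


def distinct_values(records: list) -> set:
    """Collect the distinct object values found in a list of triples."""
    return {obj for _, _, obj in records}


literal_count = len(distinct_values(literal_records))
uri_count = len(distinct_values(uri_records))
print(literal_count, uri_count)
```

The text values leave a consumer with three apparently different countries to reconcile, while the URI values collapse to one identifier that can also be looked up for further information.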
Find more practical information and examples about Linked Data
- Case study on how Linked Data is transforming eGovernment
- Linked Data – Design Issues (note by Tim Berners-Lee)
- 10 Rules for persistent URIs
- Core Location Pilot on interconnecting Belgian National and Regional address data
- How to describe organizations in RDF using the Registered Organization vocabulary
- SEMIC webpage on Joinup
- e-Government Core Vocabularies
Related News
- 2013-01-28 - New publication: Case study on how Linked Data is transforming eGovernment
- 2012-09-26 - Registered Organization and Core Location Vocabularies piloted in Greece