TenderGalaxy: interactive visualisation of Dutch public tenders

TenderGalaxy is a web application that allows visitors to interactively browse the connections between Dutch government entities, their published tenders, and the businesses interested in these tenders. It was built by André Vermeij at the first Dutch Accountability Hack last fall, where it won the second prize.

Massive numbers

The application builds on open data published on TenderNed, the Dutch marketplace for public tenders. The metadata in these datasets provide a wide variety of relationships between data points, Vermeij explains. Each tender is connected to both the government entity that published it and the businesses that were interested in participating.

According to Vermeij, the full dataset includes 10,672 nodes (5,938 tenders, 1,084 government entities and 3,650 businesses) linked by 14,630 connections. Because of the massive number of nodes and connections, an approach was chosen where users can incrementally build the network based on searches for specific tenders, government entities or businesses. Clicking on a search result shows the direct connections of that specific node. These can be expanded successively to build up a view of the network step by step.
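
The incremental approach described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the application's actual code: the `VisibleNetwork` class, its `expand` method and the sample node names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample connections; the names are invented for illustration.
EDGES = [
    ("tender:T1", "gov:Ministry A"),
    ("tender:T1", "biz:Company X"),
    ("tender:T1", "biz:Company Y"),
    ("tender:T2", "gov:Ministry A"),
    ("tender:T2", "biz:Company X"),
]


def build_adjacency(edges):
    """Index every node by the set of nodes it is directly connected to."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


class VisibleNetwork:
    """The part of the full network that the user has revealed so far."""

    def __init__(self, adjacency):
        self.adjacency = adjacency
        self.nodes = set()
        self.edges = set()

    def expand(self, node):
        """Show a node and its direct connections; return the newly shown nodes."""
        neighbours = self.adjacency.get(node, set())
        newly_shown = ({node} | neighbours) - self.nodes
        self.nodes |= newly_shown
        for other in neighbours:
            # frozenset makes the edge undirected, so duplicates collapse.
            self.edges.add(frozenset((node, other)))
        return newly_shown


view = VisibleNetwork(build_adjacency(EDGES))
first = view.expand("tender:T1")       # clicking a search result
second = view.expand("biz:Company X")  # expanding one of its neighbours
```

Only the nodes that were searched for or expanded are ever shown, which keeps the view readable even when the full dataset holds thousands of nodes.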

Policy context

The Dutch Accountability Hack took place for the first time in September last year. It's a hackathon organised by the Dutch Court of Audit (Algemene Rekenkamer) and the House of Representatives (Tweede Kamer), in cooperation with several Dutch ministries, Statistics Netherlands (CBS), and the Open State Foundation.

During the day, developers, designers and data journalists worked on challenges provided by the organisers or on their own ideas, building on open data (e.g. public tender data) and open APIs (i.e. online information services).

Accountability Hack 2017

The second edition of this event, in which a total of 150 people in 30 teams participated, took place earlier this month. The winner of the first prize was the 'Gemeente Deler' ("Municipal Denominator") app, which allows users to compare municipalities with regard to their spending and performance.

The Court of Audit will collaborate with the teams in further developing and possibly launching their applications. Where relevant, the agency will connect teams to specific departments, provinces and cities, to increase the chances that their projects will grow into full applications that can be deployed inside or outside the public sector.

Description of target users and groups

Even though the TenderGalaxy application was developed on his own account and is publicly available, Vermeij describes himself as a professional rather than a representative of civil society. "I love doing network analyses, and developing and publishing showcases is a good way, also in a commercial sense, to show my capabilities. That's why I participated again earlier this month in this year's Accountability Hack, this time working on a network analysis of the Dutch Top Sectors Policy and Horizon 2020 research projects."

Description of the way to implement the initiative

Vermeij is the owner of Kenedict Innovation Analytics, a one-man company specialising in network analysis, visualisations and innovation. The company performs analyses on patents, publications and projects, for example, in specific markets and areas. The outcomes — information and tooling — can be used by R&D organisations and research centres to gain insight into innovation structures, competition and opportunities.

Open-source software (in the form of JavaScript front-end libraries for web development), open data and public web services play a pivotal role in these analyses and visualisations.

Technology solution

The source for the information in the TenderGalaxy application was a dataset on all Dutch tenders for the period H1 2016, which was published as open data by TenderNed. What made this dataset especially interesting to Vermeij was that it contained information not only on the consortia and companies behind the winning bids but also on all other interested parties.

"I specifically focus on network analyses, so without this additional information the dataset would not have been very interesting to me. Now I was able to look at the cooperative and competitive relations between all participants."

Obviously, this information was not supposed to be out there, so this specific dataset was retracted later on and replaced with a set that contained only the winning bids.

Main results, benefits and impacts

"Insights that can be gained from traditional analyses (statistics, that is) on tender data are often based on simple sums and counts, without taking into account the actual connections between data points," Vermeij continues. "Outcomes typically include something like a top-ten list of government expenditure. Making these applications network-based and interactive allows for a far better utilisation of the available data. First, because an interactive visualisation can make an application more accessible and more attractive to a larger group of users. Second, because working with a network structure rather than a list or table allows users to discover more and deeper insights from the data."
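
The difference between a count-based and a network-based view can be made concrete with a small Python sketch. The data below is invented, and the choice of "co-interest" (two businesses showing interest in the same tender) as the network measure is an assumption made for illustration; the source does not state which measures the application computes.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: tender -> businesses that showed interest in it.
INTEREST = {
    "T1": ["Company X", "Company Y"],
    "T2": ["Company X", "Company Y", "Company Z"],
    "T3": ["Company Z"],
}

# Count-based view: how many tenders each business was interested in.
tender_counts = Counter(
    business for parties in INTEREST.values() for business in parties
)

# Network-based view: how often two businesses met on the same tender.
co_interest = Counter()
for parties in INTEREST.values():
    for pair in combinations(sorted(set(parties)), 2):
        co_interest[pair] += 1

strongest_pair, shared_tenders = co_interest.most_common(1)[0]
```

In this sample every business has the same count, so a ranking by count says nothing, while the pairwise view shows that Company X and Company Y encounter each other more often than any other pair.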

Return on investment description

According to Vermeij, the development of the core of the TenderGalaxy application was all done during the one-day hackathon. After that, he only had to do some additional enrichment and cleaning up of the data and interface before the application was published online.

"It took about one hour to get my head around the structure and content of the dataset, basically an Excel spreadsheet with one explanatory page. Then I needed a couple of hours to interconnect related columns, enrich the data, and convert it into a graph (i.e. a network data structure). That left more than half of the work for building the interface (i.e. the actual visualisation) and the functionality of the application (e.g. buttons, labels and search fields)."
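
As a rough illustration of the conversion step, the sketch below turns spreadsheet-like rows into a set of nodes and a list of edges. The column names (`tender_id`, `authority`, `interested_parties`), the semicolon separator and the sample rows are assumptions; the actual layout of the TenderNed export is not described in the source.

```python
import csv
import io

# Hypothetical export: one row per tender, interested parties in one cell.
SAMPLE = """tender_id,authority,interested_parties
T1,Ministry A,Company X; Company Y
T2,Ministry A,Company X
"""


def rows_to_graph(rows):
    """Turn tabular rows into (nodes, edges) for a three-type network."""
    nodes = set()
    edges = []
    for row in rows:
        tender = ("tender", row["tender_id"].strip())
        authority = ("government", row["authority"].strip())
        nodes.update([tender, authority])
        edges.append((authority, tender, "published"))
        # Interconnect the related column: one edge per interested business.
        for name in row["interested_parties"].split(";"):
            name = name.strip()
            if not name:
                continue
            business = ("business", name)
            nodes.add(business)
            edges.append((business, tender, "interested"))
    return nodes, edges


nodes, edges = rows_to_graph(csv.DictReader(io.StringIO(SAMPLE)))
```

Because the nodes are stored in a set, an authority or business that appears in many rows becomes a single node with many edges, which is what makes the result a network and not a table.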

Toolchain

Most of Vermeij's data-driven web visualisation projects, TenderGalaxy included, start by converting the original dataset into a file in an open format such as CSV or JSON. Over recent years he has built a toolchain based on JavaScript and Python to process and interconnect the columns, allowing him to focus his creative work on the visual aspects of his applications.
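
A conversion to JSON might look like the sketch below, which writes an edge list in a common nodes-and-links layout. That layout is an assumption: the source does not say which schema or which JavaScript libraries the application uses, and the function name `to_node_link` is hypothetical.

```python
import json


def to_node_link(edges):
    """Build a dict with a 'nodes' list and a 'links' list from edge pairs."""
    ids = sorted({node for edge in edges for node in edge})
    return {
        "nodes": [{"id": node} for node in ids],
        "links": [{"source": s, "target": t} for s, t in edges],
    }


# Hypothetical sample edges (government entity or business -> tender).
sample_edges = [("Ministry A", "T1"), ("Company X", "T1")]
payload = json.dumps(to_node_link(sample_edges), indent=2)
```

The resulting string can be saved as a static file and loaded by the front-end, which matches the file-based way of working described in this section.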

The fact that most open data is currently published as computer files rather than through online web services means that only static applications can be built on this information. When new data becomes available online, it has to be fed into the application again in order to update it. A case study on the Flemish DYNAcity project describes some of the difficulties government agencies face in making reliable and scalable web services available to the public.

Track record of sharing

Vermeij's toolchain itself is not available as open source, because it may evolve into a commercial software product in the near future. The front-end of the application is fully based on open-source JavaScript libraries, whose structure and configuration can easily be found in the source code of the web page.

Although the TenderGalaxy application has been visited only a few hundred times since its publication online, Vermeij's server logs show that most of the visitors come from ministries and research institutes. "This application has not resulted in any concrete traction yet. I now use it in the guest lectures I'm giving on network analysis."

Lessons learnt

Data definitions and quality

According to Vermeij, combining datasets is the most difficult part of his work. First, there are differences in data definitions and specifications (sometimes undocumented) that make it hard to combine columns that don't fully match. "Generally, the definitions of the fields are not the main issue. These tend to become clear from their names and context. For one of my analyses, however, I had to combine a portfolio of patents and a list of scientific publications. Both had columns of authors, but they were specified in completely different ways."

Second, there is the quality of the data itself. "It turns out that a dataset I'm currently working on has fourteen different ways of writing names. I do use fuzzy matching to solve these kinds of issues, but for up to thousands of records I learned that doing it by hand works best."
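
Fuzzy matching of name variants can be sketched with Python's standard `difflib` module. This is one possible technique, not necessarily the one Vermeij uses; the `normalise` and `match_name` helpers, the 0.8 cutoff and the company names are all hypothetical.

```python
import difflib


def normalise(name):
    """Lower-case, drop full stops and commas, and collapse whitespace."""
    cleaned = name.lower().replace(".", "").replace(",", " ")
    return " ".join(cleaned.split())


def match_name(raw, canonical_names, cutoff=0.8):
    """Return the canonical name closest to `raw`, or None if none is close."""
    lookup = {normalise(c): c for c in canonical_names}
    hits = difflib.get_close_matches(
        normalise(raw), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[hits[0]] if hits else None


# Invented names, for illustration only.
canonical = ["Acme Bouw BV", "Noord Infra BV"]
exact_variant = match_name("ACME Bouw B.V.", canonical)
typo_variant = match_name("Acme Bow BV", canonical)
unknown = match_name("Zuid Water NV", canonical)
```

Names that fall below the cutoff come back as `None` and can be set aside for review by hand, in line with the observation above that manual checking remains necessary.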

Innovation

Besides transparency, Vermeij sees innovation as an important driver for governments to publish and work with open data. Regional administrations, for example, are very interested in analyses of their innovation ecosystems. They want to combine information on patents, startups, research institutes and industry to gain insight into the knowledge and technologies that are available in their regions. In other words, the regions see open data as a resource for the development of future business.

Case Info

Acronym:
TenderGalaxy
Website URL:
Start date:
2016
Operational date:
09 September 2016

Information

highlight:
ePractice
Case status:
Operation
Case type:
General case study
Funding source:
Private sector
Geographic coverage:
Netherlands
Themes:
Communication (infrastructure), Education, Science and Research, eGovernment, Infrastructure, Open Source, Procurement, Regional and Local, Services for Businesses, Services for Citizens, Business and Competition
Type of service:
IT Infrastructures and products
Technology choice:
Mainly (or only) open standards, Open source software
Scope:
National, Regional (sub-national)
Type of initiative:
Project or service