TenderGalaxy is a web application that allows visitors to interactively browse the connections between Dutch government entities, their published tenders, and the businesses interested in these tenders. It was built by André Vermeij at the first Dutch Accountability Hack last fall, where it won the second prize.
The application builds on open data published on TenderNed, the Dutch marketplace for public tenders.
The metadata in these datasets provide a wide variety of relationships between data points, Vermeij explains.
Each tender is connected to both the government entity that published it and the businesses that were interested in participating.
According to Vermeij, the full dataset includes 10,672 nodes (5,938 tenders, 1,084 government entities and 3,650 businesses) linked by 14,630 connections.
Because of the massive number of nodes and connections, an approach was chosen where users can incrementally build the network based on searches for specific tenders, government entities or businesses. Clicking on a search result shows the direct connections of that specific node. These can be expanded successively to build up a view of the network step by step.
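The search-then-expand interaction described above can be illustrated with a minimal Python sketch. It is not TenderGalaxy's actual code: the node labels, the `EDGES` sample data and the function names are all hypothetical, and the graph is assumed to be a plain adjacency mapping.

```python
from collections import defaultdict

# Hypothetical sample data: each pair links a tender to a government
# entity or to an interested business.
EDGES = [
    ("tender:road-maintenance", "entity:Ministry X"),
    ("tender:road-maintenance", "business:Company P"),
    ("tender:road-maintenance", "business:Company Q"),
    ("tender:office-cleaning", "entity:Ministry X"),
    ("tender:office-cleaning", "business:Company Q"),
]


def build_adjacency(edges):
    """Store the undirected network as node -> set of direct neighbours."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def search(adjacency, text):
    """Return all node labels containing the search text (case-insensitive)."""
    needle = text.lower()
    return sorted(node for node in adjacency if needle in node.lower())


def expand(adjacency, visible, node):
    """Reveal a node and its direct neighbours; return the newly shown nodes."""
    newly_shown = ({node} | adjacency.get(node, set())) - visible
    visible |= newly_shown
    return newly_shown


adjacency = build_adjacency(EDGES)
visible = set()

# Step 1: a search result is clicked, showing its direct connections.
start = search(adjacency, "ministry")[0]
expand(adjacency, visible, start)

# Step 2: one of the revealed tenders is expanded in turn.
expand(adjacency, visible, "tender:road-maintenance")
print(sorted(visible))
```

Only the nodes reached so far are kept in `visible`, so the view grows one click at a time instead of drawing the full network at once.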
The Dutch Accountability Hack took place for the first time in September last year. It's a hackathon organised by the Dutch Court of Audit (Algemene Rekenkamer) and the House of Representatives (Tweede Kamer), in cooperation with several Dutch ministries, Statistics Netherlands (CBS), and the Open State Foundation.
During the day, developers, designers and data journalists worked on challenges provided by the organisers or on their own ideas, building on open data (e.g. public tender data) and open APIs (i.e. online information services).
Accountability Hack 2017
The second edition of this event, in which a total of 150 people in 30 teams participated, took place earlier this month. The winner of the first prize was the 'Gemeente Deler' ("Municipal Denominator") app, which allows users to compare municipalities with regard to their spending and performance.
The Court of Audit will collaborate with the teams in further developing and possibly launching their applications. Where relevant, the agency will connect teams to specific departments, provinces and cities, to increase the chances that their projects will grow into full applications that can be deployed inside or outside the public sector.
Description of target users and groups
Even though the TenderGalaxy application was developed on his own account and is publicly available, Vermeij describes himself as a professional rather than a representative of civil society.
"I love doing network analyses, and developing and publishing showcases is a good way, also in a commercial sense, to show my capabilities. That's why I participated again earlier this month in this year's Accountability Hack, this time working on a network analysis of the Dutch Top Sectors Policy and Horizon 2020 research projects."
Description of the way to implement the initiative
Vermeij is the owner of Kenedict Innovation Analytics, a one-man company specialising in network analysis, visualisations and innovation. The company analyses patents, publications and projects, for example, within specific markets and fields. The outcomes (information and tooling) can be used by R&D organisations and research centres to gain insight into innovation structures, competition and opportunities.
The source for the information in the TenderGalaxy application was a dataset on all Dutch tenders for the period H1 2016, which was published as open data by TenderNed. What made this dataset specifically interesting to Vermeij was that it contained not only information on the consortia and companies behind the winning bids but also on all other interested parties.
"I specifically focus on network analyses, so without this additional information the dataset would not have been very interesting to me. Now I was able to look at the cooperative and competitive relations between all participants."
Technology choice: Mainly (or only) open standards, Open source software
Obviously, this information was not supposed to be out there, so this specific dataset was retracted later on and replaced with a set that contained only the winning bids.
Main results, benefits and impacts
Insights that can be gained from traditional analyses — statistics, that is — on tender data are often based on simple sums and counts, without taking into account the actual connections between data points, Vermeij continues.
Outcomes typically include something like a top-ten list of government expenditure. Making these applications network-based and interactive allows for a far better utilisation of the available data. First, because an interactive visualisation can make an application more accessible and more attractive to a larger group of users. Second, because working with a network structure rather than a list or table allows users to discover more and deeper insights from the data.
Return on investment
According to Vermeij, the development of the core of the TenderGalaxy application was all done during the one-day hackathon. After that, he only had to do some additional enrichment and cleaning up of the data and interface before the application was published online.
"It took about one hour to get my head around the structure and content of the dataset, basically an Excel spreadsheet with one explanatory page. Then I needed a couple of hours to interconnect related columns, enrich the data, and convert it into a graph (i.e. a network data structure). That left more than half of the work in building the interface (i.e. the actual visualisation) and the functionality of the application (e.g. buttons, labels and search fields)."
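As an illustration of the conversion step, the sketch below turns tabular tender rows into a set of typed nodes and a set of connections. The column names (`tender_id`, `publisher`, `interested_business`) and the sample rows are assumptions made for this example; the actual layout of the TenderNed spreadsheet is not described here, and the rows are assumed to have been exported to CSV first.

```python
import csv
import io

# Hypothetical export: one row per (tender, interested business) pair.
SAMPLE_CSV = """tender_id,publisher,interested_business
T1,Ministry X,Company P
T1,Ministry X,Company Q
T2,Municipality Y,Company Q
"""


def rows_to_graph(rows):
    """Build typed nodes and tender-centred edges from tabular rows."""
    nodes = set()   # (node type, name) pairs
    edges = set()   # (tender node, connected node) pairs
    for row in rows:
        tender = ("tender", row["tender_id"].strip())
        publisher = ("entity", row["publisher"].strip())
        business = ("business", row["interested_business"].strip())
        nodes.update((tender, publisher, business))
        # Each tender links to its publishing entity and to every
        # business that showed interest in it.
        edges.add((tender, publisher))
        edges.add((tender, business))
    return nodes, edges


nodes, edges = rows_to_graph(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(len(nodes), "nodes,", len(edges), "edges")
```

Using sets means that repeated rows for the same tender do not create duplicate nodes or connections.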
The fact that most open data is currently published as computer files rather than through online web services means that only static applications can be built on this information. When new data becomes available online, it has to be fed into the application again in order to update it. A case study on the Flemish DYNAcity project describes some of the difficulties government agencies face in making reliable and scalable web services available to the public.
Track record of sharing
Although the TenderGalaxy application has been visited only a few hundred times since its publication online, Vermeij's server logs show that most of the visitors come from ministries and research institutes.
"This application has not resulted in any concrete traction yet. I now use it in the guest lectures I'm giving on network analysis."
Data definitions and quality
According to Vermeij, combining datasets is the most difficult part of his work. First, there are differences in data definitions and specifications (sometimes undocumented) that make it hard to combine columns that don't fully match.
"Generally, the definitions of the fields are not the main issue. These tend to become clear from their names and context. For one of my analyses, however, I had to combine a portfolio of patents and a list of scientific publications. Both had columns of authors, but they were specified in completely different ways."
Second, there is the quality of the data itself.
"It turns out that a dataset I'm currently working on has fourteen different ways of writing names. I do use fuzzy matching to solve these kinds of issues, but for up to thousands of records I learned that doing it by hand works best."
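To give an idea of what fuzzy matching on name variants can look like, here is a small sketch using `difflib` from the Python standard library. The source does not say which technique or tool Vermeij uses, so this is only one possible approach; the company names, the `0.85` threshold and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def normalise(name):
    """Lower-case, drop punctuation and collapse repeated whitespace."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def similarity(a, b):
    """Similarity between 0.0 and 1.0 of two names after normalisation."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def group_variants(names, threshold=0.85):
    """Greedily assign each name to the first group it resembles."""
    groups = []
    for name in names:
        for group in groups:
            # Compare against the first member, which represents the group.
            if similarity(name, group[0]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Hypothetical spellings of the same company, plus one unrelated name.
names = [
    "Jansen Bouw B.V.",
    "jansen bouw bv",
    "Janssen  Bouw BV",
    "De Vries Infra",
]
for group in group_variants(names):
    print(group)
```

The threshold decides how aggressively names are merged, so borderline groups still need a manual check, which fits Vermeij's remark that working by hand is often the most reliable option for smaller datasets.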
Next to transparency, Vermeij sees innovation as an important driver for governments to publish and work with open data.
"Regional administrations, for example, are very interested in analyses of their innovation ecosystems. They want to combine information on patents, startups, research institutes and industry to gain insight into the knowledge and technologies that are available in their regions. In other words, the regions see open data as a resource for the development of future business."