"Data is the fuel of growth"

OECD: open-source software pivotal in artificial intelligence

Published on: 23/11/2018
Last update: 06/12/2018

Open-source software plays an important role in artificial intelligence (AI). This includes specific software libraries, editors and development environments, and machine learning platforms. So say the authors of the latest OECD Science, Technology and Innovation (STI) Outlook (2018), who identified AI and (big) data as the most prominent disruptive developments in innovation.


According to the report, many of the tools deployed in AI exist as open-source software. "These include software libraries such as TensorFlow and Keras, and tools that facilitate coding such as GitHub, text editors like Atom and Nano, and development environments like Anaconda and RStudio."

But openness in a far wider sense is a fundamental theme throughout the report, which emphasises the importance of open access, open science, open innovation (involving large communities of experts and consumers), and open data (FAIR [1, 2]) to enable and facilitate interoperability and collaborative development.

AI and big data

AI and big data turn out to be inextricably connected, specifically for deep machine learning: complex neural networks need to be trained (automatically) using huge datasets. One example is computer vision, which is important in robotics.

At the same time, AI and open data have an impact on the scientific process itself: experiments are being replaced by data-driven validation, and discovery is becoming part of automated experiments. Some examples are the automated discovery of new medicines, the use of pattern recognition to detect skin cancer, and the analysis of huge amounts of data in astronomy (from telescopes) and physics (e.g. from particle accelerators).

Data publishing

Making the data underlying a scientific publication available allows for validation and replication. Data citations have become a standard requirement for publications, and some journals, including Science, also require the software used to be made available.

From an innovation perspective, the availability of data for re-use at low marginal cost (i.e. open under the FAIR principles) allows for easy market entry by newcomers. For incumbents, meanwhile, not collaborating in an open way carries high costs in the form of lost opportunities.

At the highest level, open data on the scientific process itself is required to create and adjust policies (meta-science, i.e. science on science). Such a change in policy management allows policy to rest on data-driven evidence rather than on indicator-driven evidence; the latter is "softer" and hence less effective.