Contributions to open source software used as a proxy to measure Artificial Intelligence developments

AI in Open Source

Published on: 11/06/2020

The Organisation for Economic Co-operation and Development (OECD) released a study on ‘Identifying and measuring developments in artificial intelligence’ in May 2020. The paper assesses global software advancements in Artificial Intelligence (AI) by measuring the number of contributions made to AI open source software on GitHub.

 

OECD Report

 

 

The OECD's paper, ‘Identifying and measuring developments in artificial intelligence’, aims to explore Artificial Intelligence and establish a common understanding of the technological developments that make up AI. Additionally, the paper explores potential applications to help foster the practical development of AI. The study proposes an operational definition of AI based on its measurement in three distinct information sources:

  • Scientific development through published research papers;
  • Technological development through patents; and
  • Software development through open source software.

This three-pronged methodology aims to provide as complete an overview as possible of existing AI developments. It is complemented by an experimental machine learning (ML) approach, tailored to each of the three information sources, which is used to analyse the collected data and gather insights on AI developments up to May 2020.

For software development, the OECD relies on data on contributions to open source software on GitHub to track AI-related software developments and applications. The study focuses on open source software because data on proprietary software is rarely available. However, the OECD brings forward qualitative evidence suggesting that proprietary software is often built upon open source components and combines them. For instance, Google’s TensorFlow is an open source library developed collaboratively on GitHub that has been used to programme numerous neural networks, such as those of Google Translate or Twitter. Using open source software as a proxy for software development in general allows the study to capture global technological advancements in the field of AI.

 

Understanding AI through Open Source Data

To better understand AI-related development, the OECD focuses on GitHub repositories labelled by the ML algorithm as being potentially related to AI. Using this technique, approximately 11,500 repositories were classified as being “unambiguously” AI-related. These repositories were then analysed through their Readme files, which provide further information on a given solution. Compared with the overall pool of GitHub repositories, the AI-related ones were found to contain more occurrences of words such as “learning”, “algorithm” and “training”.
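The word comparison described above can be illustrated with a minimal sketch. It is not the OECD's actual procedure: the README snippets are invented for illustration, and the helper name `relative_frequencies` is hypothetical. The idea is simply to compute each word's share of all word tokens in two collections and compare the shares.

```python
import re
from collections import Counter


def relative_frequencies(documents):
    """Return each word's share of all word tokens across the documents."""
    counts = Counter()
    for text in documents:
        # Lowercase and keep alphabetic runs only; a real pipeline would
        # likely also remove stop words and markup.
        counts.update(re.findall(r"[a-z]+", text.lower()))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


# Hypothetical README snippets, invented for illustration only.
ai_readmes = [
    "Training a deep learning model with a simple algorithm",
    "Learning rate schedules for training neural networks",
]
other_readmes = [
    "A command line tool for renaming files",
    "Simple web server with a learning guide for beginners",
]

ai_freq = relative_frequencies(ai_readmes)
other_freq = relative_frequencies(other_readmes)

for word in ("learning", "algorithm", "training"):
    print(word, ai_freq.get(word, 0.0), other_freq.get(word, 0.0))
```

Relative shares are used rather than raw counts so that the comparison is not distorted by one collection being much larger than the other.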

The OECD’s study shows that contributions to AI-related repositories made up 0.26% of global contributions on GitHub in 2010 and 0.74% in 2017. Most of this growth took place between 2014 and 2017, when the number of AI-related open source software repositories was found to have grown approximately three times faster than the number of other repositories.
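To put the two shares in perspective, the short calculation below derives the change in percentage points, the ratio between the two shares, and an averaged annual growth rate of the share. Only the figures 0.26% and 0.74% come from the study as reported here; the constant-rate average is an illustrative simplification, since the study indicates that most of the growth was concentrated between 2014 and 2017.

```python
# Shares of GitHub contributions going to AI-related repositories (percent),
# as reported for 2010 and 2017.
share_2010 = 0.26
share_2017 = 0.74

# Absolute change in percentage points.
point_change = share_2017 - share_2010

# How many times larger the 2017 share is than the 2010 share.
ratio = share_2017 / share_2010

# Averaged annual growth of the share, assuming (for illustration only)
# a constant growth rate over the seven years.
years = 2017 - 2010
annual_growth = ratio ** (1 / years) - 1

print(f"change: {point_change:.2f} percentage points")
print(f"ratio: {ratio:.2f}x")
print(f"averaged annual growth of the share: {annual_growth:.1%}")
```

Note that these figures describe the share of contributions, whereas the "three times faster" finding refers to the number of repositories, so the two should not be compared directly.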

The gathered software documentation (Readme files) was analysed with the Latent Dirichlet Allocation algorithm, a topic modelling technique that scans a set of documents and detects wording patterns, in order to identify broad themes covered in AI developments. Prominent among the resulting themes are those related to ML (several techniques, courses, deep learning etc.) and computational methods (including mathematics and statistics). Other recurrent themes include applications of AI such as image recognition, biology, text mining and simulation.
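For readers unfamiliar with Latent Dirichlet Allocation, the sketch below fits a tiny model using collapsed Gibbs sampling, one common way to estimate it. It is not the study's implementation: the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the number of topics and the tokenised documents are all assumptions chosen for illustration. Each word token is repeatedly reassigned to a topic with probability proportional to how common that topic is in its document and how common the word is in that topic.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return per-topic word counts."""
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})
    # Start from a random topic assignment for every word token.
    assignments = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    for d, doc in enumerate(docs):
        for word, k in zip(doc, assignments[d]):
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1
                # Unnormalised conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1
    return topic_word


# Hypothetical tokenised README texts, invented for illustration only.
docs = [
    "neural network training deep learning model".split(),
    "deep learning image recognition neural network".split(),
    "statistics regression probability mathematics".split(),
    "probability statistics mathematics simulation".split(),
]

topics = lda_gibbs(docs, num_topics=2)
for k, counts in enumerate(topics):
    print(k, [word for word, n in counts.most_common(4) if n > 0])
```

On a corpus this small the resulting topics are unstable and depend on the seed; for real analyses an established library implementation and a proper preprocessing step would be the more sensible choice.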

 

More Information