The European Organization for Nuclear Research (CERN) is a prestigious research institution and intergovernmental organisation in the area of particle physics. Established in 1954, with representatives from 23 countries sitting on its Council, CERN is also renowned for being the home of the Large Hadron Collider (LHC), the world’s largest and most powerful particle accelerator.
To achieve its objectives as the world-leading laboratory for particle physics, CERN gathers scientists of all disciplines, including computer science. CERN has a long-standing history of research in computer science and has been a pioneer in the development of open source software. In 1989, British scientist Sir Tim Berners-Lee invented the World Wide Web while working at CERN. The original purpose of the World Wide Web software was to create an automated information-sharing system between scientists in universities and research institutes around the world. In 1993, CERN released the World Wide Web software into the public domain and subsequently under an open source licence in order to maximise its dissemination.
CERN’s ability to combine top-tier technology and international collaboration among different organisations allowed the research institution to develop cutting-edge open source software to support its research activities. In 2018, CERN launched the MALT Project (Managing Accessibility and Leveraging open Technologies project) with the aim of mitigating anticipated increases in software licence fees by transitioning to open source products. In December 2019, OSOR reported on an initiative adopted in the framework of the MALT project: CERN’s evaluation of the suitability of the open source email server Kopano. As part of the MALT Project, CERN also decided to adopt the open source messaging app Mattermost for its personnel in early 2020.
To learn more about the development and use of open source software by CERN, OSOR interviewed Bob Jones, head of the Helix Nebula Science Cloud project at CERN, who is also CERN’s representative in the European Open Science Cloud initiative.
OPEN SOURCE AT CERN
CERN, an early open source actor
CERN has a long-standing history in the development and use of open source software, starting with the development of the World Wide Web software. In the early 1990s, the concepts of open source software and the public domain were neither well known nor clearly defined. CERN, a pioneer in the field, decided to develop its own open source software licence in order to guarantee the dissemination of scientific discoveries and to foster international collaboration. CERN opted for a fully permissive open source licence. Licensees had the right to release derivative works under a licence of their choice, provided that they displayed the statement attributing the credit of the initial work to CERN.
With time, CERN observed that a fully permissive licence allowing derivative works to be distributed under a different licence was not suitable for encouraging the creation of open source communities. The software could be appropriated by third parties, thus limiting the possibilities of further distributing and modifying the software. CERN then decided to use copyleft licensing to ensure that third parties would not appropriate the open source software. Under these licensing conditions, licensees cannot redistribute the original software or a derivative work with fewer rights than the ones they received. CERN now promotes standard form licences certified by the Open Source Initiative (OSI), such as GPLv3. The international recognition of such licences allows CERN to maximise the dissemination of its software.
Nowadays, open source software can be found in almost every corner of CERN, as in other sciences and industry. Indico and Invenio – two of the largest open source projects developed at CERN to promote open collaboration – rely on Flask, an open source Python web framework. Experimental data are stored in CERN’s Exascale Open Storage system, and most of the servers in the CERN computing centre are running on OpenStack – an open source cloud infrastructure to which CERN is an active contributor. Of course, CERN also relies heavily on open source GNU/Linux as both a server and desktop operating system. On the accelerator and physics analysis side, it is also all about open source. From C2MON, a system at the heart of accelerator monitoring and data acquisition, to ROOT, the main data-analysis framework used to analyse experimental data, the vast majority of the software components behind the science done at CERN are released under an open source licence.
Key milestones of CERN’s open source initiatives
- 1989: Sir Tim Berners-Lee submitted his first proposal for what became the World Wide Web.
- 1993: The World Wide Web software entered the public domain.
- 1994: “Licensing the Web” under a custom CERN OSS licence.
- 2002: First release of the Invenio software.
- 2004: The first CERN-branded Linux distribution (Scientific Linux CERN).
- 2011: Release of CERN’s open source hardware licence.
- 2012: CERN Open Source Licensing Task Force recommends GPLv3 for CERN-made software.
- 2018: Launch of the MALT project.
MOVING AWAY FROM VENDOR LOCK-IN: the MALT strategy
In recent years, CERN has been developing a new approach to prevent vendor lock-in by using open source solutions. In 2018, the strategy led to the launch of the Managing Accessibility and Leveraging open Technologies (MALT) project, which seeks simple exit strategies from proprietary software and low switching costs to open source alternatives. Aiming to provide high-quality IT services to the whole CERN community, the MALT project focuses on finding or developing open source alternatives. Every software solution considered as an alternative to a proprietary solution used by CERN must undergo a strict selection procedure. The underlying aim of CERN’s open source policy is threefold:
- Avoiding vendor lock-in.
- Reducing the price of its software, considering the total cost of ownership of software alternatives.
- Ensuring the long-term viability and ease-of-use of the software adopted by CERN.
CERN’s procurement processes
CERN is a publicly funded intergovernmental organisation. However, CERN is not required to follow national public procurement rules and thus has its own procurement procedure. CERN’s procurement procedure is very similar to the public procurement procedures in place at the national level across the EU, and its rules are guided by the same values of transparency, accountability and value for money. Bob Jones explained that CERN, in general, has a preference for open source solutions; however, should no open source software be available or mature enough, CERN adopts proprietary solutions. When CERN opens a procurement process for open source software, its evaluation criteria include the size of the open source community, the commercial support for the solution's distribution, and its governance model. Evaluating these criteria ensures that the open source software CERN uses is sustainable in the long term.
When no open source software is available to respond to CERN’s needs, the organisation can decide to develop it in-house or contract a company to develop it for CERN. The decision is made according to CERN’s available engineering capacity and a cost-benefit analysis.
In the framework of the MALT project, if no free and open source software alternative has been identified by CERN, enterprise versions of open source solutions are taken into account. To that end, they must comply with the CERN cloud policy and be approved by CERN’s Cloud Licence Office to ensure appropriate data protection.
Next steps of the development of CERN’s open source strategy
Following ‘recent sharp increases in the cost of many licenced software products’ according to CERN’s website, the organisation is using the MALT project to find alternatives for 37 proprietary software products that are currently in use. As of July 2020, 11 projects are in the pilot phase, 15 are under development, six are still prototypes, and four are undergoing an evaluation. With the goal of striking the right balance between efficiency, agility and cost, CERN’s aim is to replace those 37 identified proprietary software solutions with viable and cost-effective open source solutions. Another key factor that triggered CERN’s switch to open source solutions is the scalability of open source software products, which allows affordable solutions to be rolled out to CERN’s vast user base.
Raising awareness of open source software is also an important part of CERN’s open source approach. After the adoption of the open source messaging platform Mattermost, a Mattermost channel dedicated to FOSS solutions was created. Additionally, the ‘FOSS at CERN’ online portal was created in 2019. Although it is currently under development, the portal will host information on CERN’s in-house developed open source software. CERN also maintains public open source repositories on GitHub and GitLab.
THE INVENIO SOFTWARE, AN EXAMPLE OF OPEN SOURCE SOFTWARE DEVELOPED IN-HOUSE
Invenio, an open source framework for large-scale digital repositories
Developed by CERN more than 10 years ago, Invenio is an open source framework for large-scale digital repositories. First released in 2002, Invenio v1 was a software application that acted as both a digital repository and an integrated library system based on MARC21. In 2015, Invenio v3 received an in-depth update in order to better support large-scale research data management use cases as well as to integrate major developments in open source off-the-shelf solutions. Invenio v3 also marked the departure from using MARC21 as the core metadata format towards a fully JSON-based system supporting several different metadata formats. The open source software is now available on GitHub under an MIT licence.
Developed in Python, the Invenio software has three main components:
- InvenioRDM: A repository/document management platform running on container platforms such as Kubernetes and OpenShift. InvenioRDM is easily scalable and can be integrated with other open science infrastructures such as ORCID, DataCite and OpenAIRE. InvenioRDM comes with pre-configured repository profiles for institutional repositories (IRs), research data management (RDM) systems, and domain-specific repositories for health and biomedical sciences. Additionally, InvenioRDM is characterised by a high level of interoperability: it can be integrated with any single sign-on solution, storage system, or public or private cloud, and it supports custom permission models and custom fields. Finally, its stable API allows for easy and secure customisation.
- InvenioILS: An integrated library system that provides a data model based on JSON Schema with structured bibliographic records such as Documents, Series, Items, and Electronic Items. Its high-quality user interface guarantees user-friendliness. InvenioILS’ powerful back office makes it simple to manage file circulation workflows.
- Invenio Framework: A code library to build large-scale information systems such as InvenioRDM and InvenioILS. Its flexible data model uses JSON Schema to describe articles, books, theses, photos, videos, research data, and software. Invenio Framework has advanced file management features and a powerful search engine.
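To make the idea of a JSON Schema-based data model more concrete, the following is a minimal sketch in Python. It is not Invenio code and does not use Invenio's API: the `DOCUMENT_SCHEMA` fields and the `validate` helper are hypothetical, and the helper handles only a small subset of JSON Schema keywords (`type`, `required`, `properties`, `items`) so that the example runs with the standard library alone.

```python
import json

# Hypothetical, simplified schema for a "Document" record. The field names
# are illustrative assumptions, not Invenio's actual data model.
DOCUMENT_SCHEMA = {
    "type": "object",
    "required": ["title", "authors"],
    "properties": {
        "title": {"type": "string"},
        "authors": {"type": "array", "items": {"type": "string"}},
        "publication_year": {"type": "integer"},
    },
}

# Mapping from JSON Schema type names to Python types (subset only).
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
}


def validate(instance, schema, path="$"):
    """Return a list of error messages; an empty list means the record is valid.

    Supports only the `type`, `required`, `properties` and `items` keywords.
    """
    errors = []
    expected = schema.get("type")
    if expected is not None:
        py_type = _TYPES[expected]
        # bool is a subclass of int in Python, so reject it for non-boolean types.
        is_bool = isinstance(instance, bool)
        if not isinstance(instance, py_type) or (is_bool and expected != "boolean"):
            return [f"{path}: expected {expected}"]
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required field '{key}'")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate(instance[key], subschema, f"{path}.{key}"))
    if isinstance(instance, list) and "items" in schema:
        for index, item in enumerate(instance):
            errors.extend(validate(item, schema["items"], f"{path}[{index}]"))
    return errors


# Usage: parse a JSON record and check it against the schema.
record = json.loads(
    '{"title": "Example thesis", "authors": ["A. Author"], "publication_year": 2020}'
)
print(validate(record, DOCUMENT_SCHEMA))  # prints []
```

In a real deployment a complete JSON Schema validator would be used instead of this hand-written subset. The sketch only illustrates how one schema can describe the required and optional fields of a record type, and how the same mechanism extends to other record types by swapping in a different schema.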
Development and implementation
The development of the Invenio software was initiated by CERN with the support of other universities and research organisations. CERN also received financial support from international organisations such as the European Commission. The underlying motivation for the development of Invenio was linked to transparency requirements and the desire to foster knowledge exchange. CERN wanted to encourage the barrier-free adoption of the open source software. Invenio is a convenient and affordable tool, and the availability of training material makes it easily accessible. ‘The fact that it is open source software has made it easier for the network of partners and users to deploy it on a wider variety of platforms’, added Bob Jones.
Apart from the technical challenges raised by the development of Invenio, CERN also had to overcome challenges in terms of software use policy. Bob Jones described Invenio as ‘policy agnostic’ since the software does not determine the use policy for the operators which adopt it. Each operator is free to define its own conditions for sharing and giving access to the content of the library. Such freedom in user policy has been developed over time since restricting access to content can be necessary, be it for ethical or security reasons.
Funding and maintenance of the solution
The maintenance and funding of Invenio are guaranteed by several research institutes and public organisations around the world, led by CERN. Annual conferences allow the partners to share their views on the challenges and success factors of the Invenio project and to agree on future milestones. Additionally, the user base is consulted regularly in order to tailor the solution to users’ needs. After the development of Invenio, CERN’s partners committed to continue supporting the software and its updates, which are released regularly. The partners sign a Memorandum of Understanding in which they collectively agree to contribute resources to the development and maintenance of the software. Private organisations have also offered to develop specific features but, as Invenio is a public sector-led project, a collaboration scheme is difficult to put in place. One of the solutions found by CERN is the use of grants. For Bob Jones, the maintenance of the Invenio open source software is thus stable. The latest version of the software, Invenio 3.3.0, was released in May 2020. It has grown significantly thanks to important updates to its core components and various engineering updates. The latest release also improved the interoperability of Invenio, making it ‘part of an ecosystem of collaborative tools’ in the opinion of Bob Jones.
Within CERN, Invenio benefits from strong support from management, since CERN itself uses it as the basis for several services: the CERN Document Server (CDS), which is CERN’s official institutional repository; the CERN Open Data portal; INSPIREHEP; HEPData; CERN Analysis Preservation; CDS Videos; and Zenodo, the default repository for hosting the outputs of many research activities around the world. Invenio is also recommended by important international organisations such as the European Research Council and the European Commission, which contributes to its attractiveness and visibility.
Return on investment
In the case of Invenio, there are no official figures to monitor the solution’s return on investment. However, the economic gains were multiple, according to Bob Jones. The code that Invenio is based on could be used to develop additional software.
Another important aspect of the software is the long-term perspective of document archiving and the ease of use, which could not be guaranteed with proprietary software. Users benefit from direct and immediate access to research materials shared on Invenio or on Invenio-based digital libraries.
Another measurement of Invenio’s success is the number of instances around the world based on Invenio’s source code and the sheer volume of material currently accessible on Invenio.
Regarding lessons learnt on the development and use of open source software by CERN, Bob Jones underlined the importance of implementing a transparent governance model since CERN is involved in several large international partnerships. Additionally, one should ensure that the decision-making process and the governance model are fair and transparent in order to foster efficient collaboration among partners. ‘Trust is a very important element in the adoption of open source software’, added Bob Jones. The implementation of an efficient governance model has an important effect on the long-term sustainability of the open source software. Indeed, open source software developed by small companies all too often becomes proprietary once the company is bought out. Setting a clear governance model helps to secure the long-term sustainability of the open source software solution and ensures that the software will remain open source. Another key element mentioned by Bob Jones to ensure the long-term sustainability of an open source project is the identification of international standards. Their adoption allows the open source solution to be reused more easily. Finally, other important elements to assess the sustainability of an open source project are the community vibrancy, funding, and the software’s technological maturity. When starting a new project, it is critical to evaluate those elements before scaling up.
A European pioneer in the field of open source software since the 1980s, CERN has paved the way for international collaboration and the use of open source licences with the aim of fostering the dissemination of research tools and findings. In the pursuit of long-term open source solutions independent of private companies, CERN has dedicated resources for the in-house development of open source solutions such as Invenio. The success of Invenio is a testament to the great interest from public organisations and the research community in long-lasting, transparent and scalable solutions.
Today, CERN continues to be a leading organisation in the field of open source. Beyond open source software, CERN also actively contributes to the development of open source hardware. In March 2020, CERN released version 2.0 of its open source hardware licence, introducing three variants of the licence – strongly reciprocal, weakly reciprocal and permissive. More recently, CERN has also contributed actively to researching open source solutions to tackle the COVID-19 crisis with the creation of the CERN against COVID-19 task force. The task force has developed medical devices under the CERN open source hardware licence. It has also implemented several partnerships with public organisations to foster the use of several open source software solutions for computing and data analysis in the context of COVID-19. CERN’s activities with open source software, typified by the MALT project, will continue and have attracted much interest from the research community as well as governmental organisations in CERN’s member states.