"500,000 data scientists need…

"500,000 data scientists needed in European open research data"

08/11/2016

There is an alarming shortage of data experts both globally and in the European Union. This is partly due to an archaic reward and funding system for science and innovation, which sustains the article culture and prevents effective data publishing and re-use. A lack of core intermediary expertise has created a chasm between e-infrastructure providers and scientific domain specialists.

These are three of the observations made in the report 'Realising the European Open Science Cloud: First report and recommendations of the Commission High Level Expert Group on the European Open Science Cloud'.

European Open Science Cloud

The European Open Science Cloud (EOSC) is an EC initiative under the Digital Single Market — part of the Digital Agenda for Europe 2020 programme. It is working towards an infrastructure to support Open Research Data and Open Science in Europe.

The initiative was established in September 2015, when the European Commission formed the High Level Expert Group (HLEG) to advise on the governance and funding of an Open Science Cloud. The group was asked to draw up a clear roadmap and set concrete commitments for the Commission to make this vision a reality by 2020.

Core Data Experts

The authors of this first report describe a historically grown chasm between domain specialists and e-infrastructure specialists. While the traditional research analyst was a full member and co-publisher of the research team, the modern computer and data specialists come from scientific and engineering cultures with very different reward systems and incentives, different jargons, and very different skill sets. These cultural differences have resulted in alarming scarcity and loss of crucial data-related skills in research.

As a side effect of the above, there is an alarming shortage of data expertise in the EU, and a pressing requirement with regard to the data expertise needed to support the aims of the EOSC. It became clear — and has been reflected in nearly all stakeholder contributions to the HLEG — that there will be a major hole in the EOSC planning if we do not repair the significant lack of Core Data Experts.

We use the term Core Data Experts here deliberately, emphasising that we are dealing with a range of skills that warrant the definition of a new class of colleagues with core scientific professional competencies and the communication skills to fill the gap between the two cultures. The number of people with these skills needed to effectively operate the EOSC is, we estimate, likely exceeding half a million within a decade.

Recommendations

The authors recommend (recommendation I3) the funding of a concerted effort to develop core data expertise in Europe, comprising a very substantial training initiative in Europe so as to locate, create, maintain and sustain the required core data expertise.

This program should aim to:

  • by 2022, train hundreds of thousands of certified core data experts with a demonstrable effect on ESFRI/e-INFRA activities and collaboration and prospects for long-term sustainability of this critical human resource;
  • consolidate and further develop assisting material and tools for the construction and review of Data Management Plans (including budgeting for re-use of data) and Data Stewardship plans (including budgeting for data publication and long-term preservation in FAIR status);
  • by 2020, have in each Member State and for each discipline at least one certified institute to support implementation of Data Stewardship per discipline.

From data-sparse to data-saturated

Computers have long surpassed individuals in their ability to perform pattern recognition over large data sets, says Barend Mons, Chairman of the HLEG-EOSC, in his foreword to the report. Scientific data is in dire need of openness, better handling, careful management, machine actionability and sheer re-use. One of the sobering conclusions of our consultations was that research infrastructure and communication appear to be stuck in the 20th century paradigm of data scarcity. We should see this step-change in science as an enormous opportunity and not as a threat.

According to Mons, the science system is in a landslide transition from data-sparse to data-saturated. Meanwhile, scholarly communication, data management methodologies, reward systems and training curricula are not adapting quickly enough, if at all, to this revolution. Researchers, funders and publishers keep each other hostage in a deadly embrace by continuing to conduct, publish, fund and judge science in the same way as in the past century.
