Owner
Irish Centre for High End Computing
Academia/Scientific organisation

The Big Data Sandbox was created in 2014 in the context of the UNECE Big Data project overseen by the High-Level Group for the Modernisation of Official Statistics (HLG-MOS). It provides a shared platform for statistical organisations to collaborate on evaluating and testing new tools, techniques and sources which could be useful for modern statistical analysis.

Users from various international statistical organisations access the Sandbox remotely and collaborate on shared methodologies and datasets. The platform is composed of 5 servers, which together provide reliable storage for up to 56TB of data and high-performance concurrent processing of this data across 80 CPU cores. Users access a unified login portal connected to the Internet via a high-bandwidth network connection.

The platform can be extended, in both data storage capacity and computational performance, simply by adding more servers.
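As a rough illustration of what adding servers could mean, the sketch below scales the published totals (5 servers, 56TB, 80 CPU cores) linearly. It assumes that every server contributes an equal share of storage and cores and that any added server matches the existing ones; neither assumption is confirmed by the platform description, and the `Cluster` class and `scaled_to` method are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Aggregate capacity of a cluster (hypothetical helper for this example)."""
    servers: int
    storage_tb: float
    cpu_cores: int

    def scaled_to(self, servers: int) -> "Cluster":
        """Estimate capacity at a different server count.

        Assumption: capacity grows linearly, i.e. every server provides
        the same share of storage and CPU cores as the current average.
        """
        if servers <= 0:
            raise ValueError("servers must be positive")
        return Cluster(
            servers=servers,
            storage_tb=self.storage_tb * servers / self.servers,
            cpu_cores=round(self.cpu_cores * servers / self.servers),
        )


# Totals taken from the platform description.
sandbox = Cluster(servers=5, storage_tb=56.0, cpu_cores=80)

# Hypothetical expansion to 8 servers.
bigger = sandbox.scaled_to(8)
print(f"{bigger.servers} servers: ~{bigger.storage_tb:.1f} TB, ~{bigger.cpu_cores} cores")
```

In practice the usable storage of a Hadoop cluster also depends on the replication settings, so figures of this kind are best read as an upper-level estimate.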

This platform was built entirely from Open Source software and is based on Linux and Hadoop.

The Sandbox is governed through the UNECE (United Nations Economic Commission for Europe) High-Level Group for the Modernisation of Official Statistics (HLG-MOS). The mission of the HLG-MOS is to oversee the development of frameworks, tools and methods to support modernisation in statistical organisations. The aim is to improve the efficiency of statistical production and the ability to produce outputs that better meet user needs.

The HLG-MOS has a diverse international membership with over 20 countries and organisations actively represented and using the Big Data Sandbox. Users include CBS (NL), CSO (IE), Hellenic Statistical Authority (GR), INEGI (MX), Istat (IT), ONS (UK), ABS (AU), Eurostat, UN, OECD.
The UNECE HLG-MOS is responsible for governance of the Sandbox and organises training and workshops for its use. Staff from Istat in Italy provide much of the technical training and support involved in using the Sandbox and assisting the collaborative projects. ICHEC in Ireland is responsible for hosting and maintaining the core infrastructure as well as providing technical support for software installation and user support. 

The Big Data Sandbox helps national statistical organisations by providing a shared platform on which to collaborate with their peers in exploring the value of emerging tools and datasets to augment the information they produce for governments and the public. The technical work and associated costs of designing, building and installing the software for this platform are handled by domain experts in ICHEC. Organisations are therefore free to concentrate on evaluating the benefit of Big Data methodologies, and to do so in collaboration with their international peers rather than in isolation.

The Hadoop software ecosystem is notoriously difficult to deploy and manage. Despite being Open Source, there can be a high cost in human time associated with using it. ICHEC has extensive experience in managing complex, large-scale, production-oriented compute services, along with providing technical support to the end users of these platforms. Thus, users can concentrate on using the tools and working on methodologies rather than having to struggle with complex software installation and configuration details. Users also benefit from having a larger shared platform than might be possible in their own institution, which enables bigger, more complex problems to be addressed.

The approximate cost of the hardware is €50,000. The software used has no cost, but staff time for technical maintenance and support, together with data centre costs, amounts to approximately €30,000 per annum.
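To put these figures in context, the sketch below adds the one-off hardware cost to the recurring annual cost over a chosen number of years and, purely for illustration, divides the total evenly across participating organisations. The even split, the three-year horizon and the count of 20 organisations are assumptions made for this example (the membership is stated only as "over 20"), and the function names are hypothetical; this is not a description of how the Sandbox is actually funded.

```python
# Approximate figures from the text above, in euro.
HARDWARE_EUR = 50_000          # one-off hardware cost
ANNUAL_RUNNING_EUR = 30_000    # staff time plus data centre costs, per annum


def total_cost(years: int) -> int:
    """Cumulative cost after `years` of operation (hardware bought once)."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return HARDWARE_EUR + ANNUAL_RUNNING_EUR * years


def cost_per_organisation(years: int, organisations: int) -> float:
    """Hypothetical even split of the cumulative cost across organisations."""
    if organisations <= 0:
        raise ValueError("organisations must be positive")
    return total_cost(years) / organisations


# Illustrative assumptions: a 3-year horizon and 20 organisations.
years, organisations = 3, 20
print(f"Total over {years} years: EUR {total_cost(years):,}")
print(f"Even share per organisation: EUR {cost_per_organisation(years, organisations):,.0f}")
```

Because more than 20 organisations take part, an even share computed with 20 is an upper estimate under these assumptions.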

To a large extent, the point of the Sandbox is that it provides a central shared resource so that organisations don't have to implement one of their own. The success of this approach is borne out by the number of organisations using it, as indicated above. Having said that, some organisations, such as Statistics Netherlands (CBS), have used the Big Data Sandbox as a prototype on which to base their own internal platforms. In these cases, the reason is to enable the use of sensitive data while applying the software tools and methodologies learned on the shared Sandbox.

ICHEC operates a dedicated web- and email-based Helpdesk ticketing system for users to submit issues. ICHEC can assist with software installation and upgrades, ensure a stable and functioning system, investigate performance issues and bring new users onboard. A future enhancement would be to offer more in-depth assistance with developing applications and using the existing tools optimally. This would require domain-specific skills in data analytics and statistics as well as advanced IT technical skills, and would align well with the educational courses currently run by Istat.
