by Panos Charitos. Published: 24 April 2015

On 20 November 2014, CERN launched the Open Data Portal (ODP), making data from real collision events at the LHC experiments available to the general public for the first time. This project is part of the Organisation's policy of openness, which is enshrined in its founding convention and has contributed to the creation of the open internet, the development of open source, and the dissemination of open access publications. In this framework, the LHC collaborations recently approved Open Data policies and will release collision data over the coming years.

"Data from the LHC programme are among the most precious assets of the LHC experiments, that today we start sharing openly with the world. We hope these open data will support and inspire the global research community, including students and citizen scientists," said Rolf Heuer, CERN Director-General.

The purpose of the ODP is to publish and archive data obtained by the CERN experiments, making them available to everybody for further analysis or for use as educational material. Like most projects at CERN, its development was a collaborative process that required the concerted efforts and hard work of digital library experts, data curators, metadata experts, researchers, and outreach teams from the four LHC experiments. The portal uses a number of technologies to distribute and give access to the data, namely Invenio, CernVM and EOS.

The Invenio digital library software enables users to run their own digital library on the web, offering a valuable tool for digital library management. CernVM is a baseline virtual machine already used by the LHC experiments; it enables users around the world to develop and run LHC data analyses locally or on institutional and commercial computing clouds. Finally, EOS is a disk-based service that provides a low-latency storage infrastructure for physics users. Building on these technologies, the ODP assigns digital object identifiers (DOIs) to the data sets and code, making them citable objects in scientific communication and helping to organise the content effectively. Besides event data sets, users will also be able to find open source software to analyse the data provided.

ALICE is also participating in the ODP, publicly releasing a number of datasets customised for demonstration and educational purposes. The ALICE experiment works in close collaboration with DPHEP, the other CERN experiments and CERN IT/GS on:

  • Implementing a common approach and generic solutions for data preservation and open access
  • Using the same data preservation principles and experiment policy guidelines
  • Relying on a common open access portal, a common analysis preservation framework and virtualization technology

ALICE is currently releasing about 8 TB of reconstructed event data corresponding to 2010 proton-proton and lead-lead collisions, which are being staged and indexed on the CERN data preservation portal. The analysis tools available on the portal currently allow only basic plots of transverse momentum and pseudorapidity distributions, but more advanced analyses will be available in future releases.
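For readers unfamiliar with the two quantities, the following is a minimal Python sketch of how transverse momentum and pseudorapidity can be computed from a track's momentum components and binned into a distribution. It is an illustration only and not the portal's own tooling: the function names, the track tuples and the binning are assumptions made for this example, and the beam is taken to run along the z axis.

```python
import math


def transverse_momentum(px, py):
    """Momentum component perpendicular to the beam (z) axis."""
    return math.hypot(px, py)


def pseudorapidity(px, py, pz):
    """eta = -ln(tan(theta / 2)), written equivalently as asinh(pz / pT)."""
    pt = math.hypot(px, py)
    if pt == 0.0:
        # A track exactly along the beam axis has divergent pseudorapidity.
        return math.copysign(math.inf, pz)
    return math.asinh(pz / pt)


def histogram(values, lo, hi, n_bins):
    """Count values in n_bins equal-width bins covering [lo, hi)."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v < hi:
            # min() guards against floating-point rounding at the upper edge.
            index = min(int((v - lo) / width), n_bins - 1)
            counts[index] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical tracks as (px, py, pz) in GeV/c; not real ALICE data.
    tracks = [(0.3, 0.4, 0.0), (1.0, 0.0, 1.0), (0.0, 2.0, -2.0)]
    pts = [transverse_momentum(px, py) for px, py, _ in tracks]
    etas = [pseudorapidity(px, py, pz) for px, py, pz in tracks]
    print("pT distribution: ", histogram(pts, 0.0, 3.0, 3))
    print("eta distribution:", histogram(etas, -1.0, 1.0, 4))
```

The real datasets are distributed as ROOT files, so an actual analysis would read the tracks with the experiment's software rather than from a hand-written list.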

At this stage, a set of outreach and educational analysis exercises will be made available on the portal. They are based on specifically selected ALICE data, are widely used in the particle physics masterclasses, and come in the form of analysis packages and small datasets organised as ROOT files. Although the tools are simplified, users will get a feel for the real tools employed by physicists for data analysis. Each analysis downloads the required software and data on demand from a common graphical interface.

These masterclass exercises highlight some of the ALICE physics. One concerns the search for particles containing strange quarks, based on their V0 decays; the motivation is to give an insight into how strangeness enhancement, one of the first signatures of the quark-gluon plasma (QGP), is observed. Another exercise examines charged-particle tracks; the aim is to calculate the nuclear modification factor RAA by comparing particle yields in lead-lead and proton-proton collisions. An RAA of less than one indicates suppression of charged particles due to interactions of partons with the QGP.
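As an illustration of the second exercise, the sketch below computes RAA bin by bin using the standard definition: the lead-lead yield divided by the proton-proton yield scaled by the mean number of binary nucleon-nucleon collisions. The scaling factor is not mentioned above and is an assumption of this example, as are the function name and all numbers; the masterclass package may organise the calculation differently.

```python
import math


def nuclear_modification_factor(yield_aa, yield_pp, n_coll):
    """Return RAA per transverse-momentum bin.

    yield_aa: per-event charged-particle yields in lead-lead collisions
    yield_pp: per-event yields in proton-proton collisions, same binning
    n_coll:   mean number of binary nucleon-nucleon collisions (assumed
              to be supplied by the user for the chosen centrality class)
    """
    if len(yield_aa) != len(yield_pp):
        raise ValueError("yields must use the same binning")
    if n_coll <= 0:
        raise ValueError("n_coll must be positive")
    raa = []
    for aa, pp in zip(yield_aa, yield_pp):
        # An empty proton-proton bin gives an undefined ratio.
        raa.append(aa / (n_coll * pp) if pp > 0 else math.nan)
    return raa


if __name__ == "__main__":
    # Hypothetical yields for three pT bins; not measured values.
    pb_pb = [800.0, 60.0, 3.0]
    p_p = [1.0, 0.1, 0.01]
    for value in nuclear_modification_factor(pb_pb, p_p, n_coll=1000.0):
        print(f"RAA = {value:.2f}")
```

With this definition, a value of one means the lead-lead collision behaves like a simple superposition of independent nucleon-nucleon collisions, which is why values below one are read as suppression.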

It should be noted that all four LHC experiments have approved data preservation and access policies, which state that they will make their data (except level 4 data) available. ALICE developed its own policy, approved by the collaboration board, which can be found here: http://opendata.cern.ch/.

The plan is to release more collision data and analysis tools over the coming years to increase the scope and uses of the ODP. This open access initiative is only the beginning of a large-scale effort to be pursued even beyond the experiment’s lifetime.