by Iva Raynova. Published: 30 July 2016

The Data Quality Monitoring shift

Imagine a camera. A professional one with custom-made parts, which you are using for a very big and important project. You carefully adjust all the settings, take care of the aperture, the shutter speed, but… you never look through the lens. It would be no surprise if you later discovered that all your photographs were blurry and overexposed, and that you simply could not use them.

To avoid a similar scenario in ALICE, our much more sophisticated “camera”, we have the Data Quality Monitoring (DQM) system. Absolutely crucial for the experiment, it prevents us from recording bad or low-quality data by alerting the DQM shifter in the ALICE Run Control Centre (ARC) to any potential problem with the raw data.

AD timing, one of the plots which the DQM shifter monitors, where it is possible to see the luminous region during collisions.

Daniele De Gruttola, system run coordinator of the DQM, is responsible for training the future DQM shifters. The first part of the training is a lecture, which can be followed remotely via Vidyo. Since most of the trainees are based in their home institutes, this is the option they often prefer. “This doesn’t mean that they could just connect to the training and do something completely different while it is ongoing. I make sure that they are present and that they are paying attention.” The class teaches people how to monitor the DQM system and the part of the Offline system which processes critical information for the data reconstruction, and also how to handle the Event Display in the control room.

The theoretical part of the training ends with a two-part online test: one part about what was taught during the presentation and one about the plots that the shifters will have to monitor. Then comes the practical part, which includes a three-day supervised training shift in the ARC. Daniele trained about 250 people in the first year of Run 2, and has trained nearly a hundred since the beginning of 2016.

Once certified, the shifter can book his block and start his experience in the control centre, where he will have the very important task of monitoring the quality of the recorded data. The DQM system receives information from all ALICE detectors, analyses the data sample online and produces plots, which are constantly monitored by the shifter. The process is very fast. When a new physics run is initiated, the DQM needs around five minutes to gather enough statistics to produce reliable plots. After that, the plots are constantly updated. The shifter looks at around one hundred of these plots, each of which has an automatic check with alarms of different levels. The framework that provides this feedback is called AMORE (Automatic MOnitoRing Environment) and is developed and maintained by Barthélémy von Haller with the help of Adriana Telesca. The maintainer of the Event Display is Jeremi Niedziela, who is doing his PhD at CERN.

Event display from a Pb-Pb collision at 5.02 TeV, recorded during the last heavy-ion data taking (November 2015).

The shifter has clear instructions on how to react in case of an alarm for any of the plots. “They have to notify the shift leader if there is a problem, and the shift leader then decides if the run should be stopped or not. In some cases they just have to make an entry in the logbook to say that there is an issue which is not crucial for the data taking. Other times they have to call the detector expert, who can decide if the detector has to be reconfigured, for example,” explains Daniele. “We have to separate the main task of the DQM shifter from the issues in the framework that he could encounter. Indeed, the shifter may have to call a detector expert because he has spotted an issue in their system through the DQM. That is the aim of the DQM. Other times there might be a problem with the visualisation of the plots, which could be related either to the code of a single system, or to the framework itself. There have been a few cases in which the framework was not able to fill the plots, but eventually it turned out to be a minor bug, which was fixed in no time.” The shifter also has another task: monitoring the processes that produce the plots. These processes are called agents, and each of them analyses the sample of raw data for a particular system.

What future is planned for the DQM? It will change substantially for Run 3: it will be more automated, and it will be integrated with the rest of the online processes as part of the O2 project.