CERN has publicly released 300 TB of Large Hadron Collider data

That's one way to smash through your data allowance


Just how badly would you like to get a look at the raw data collected from experiments performed in the Large Hadron Collider? Enough to consider downloading a massive 300 TB of data, which would take over 63,000 single-sided DVDs to hold? Great! Now you can, because the CMS Collaboration at CERN has published just that much on the CERN Open Data Portal. Amazingly, it’s only 1% of the data that CERN scientists have to analyse annually.

The data covers observations gathered from the Compact Muon Solenoid Experiment. Let’s face it, most of us don’t have the time, the computer processing power or the inclination to actually download and use this data – it’s not exactly light reading – but it is important that it’s available. It’s all part of CERN’s commitment to transparency and long-term data preservation. Not only does making this data freely available ensure that it’s kept alive (it’s said nothing’s ever really gone from the internet), it can also act as a source of inspiration for future physicists.

According to Kati Lassila-Perini, a CMS physicist who leads the data-preservation efforts:

“Members of the CMS Collaboration put in lots of effort and thousands of person-hours each of service work in order to operate the CMS detector and collect these research data for our analysis. However, once we’ve exhausted our exploration of the data, we see no reason not to make them available publicly. The benefits are numerous, from inspiring high-school students to the training of the particle physicists of tomorrow. And personally, as CMS’s data-preservation co-ordinator, this is a crucial part of ensuring the long-term availability of our research data.”

The data does, of course, include “primary data sets” which remain in the same format used by the CMS collaboration to perform its research. It’s the “derived data sets” that are more useful for inspiring future generations of scientists as they require much less computer processing power and are in a format ready for analysis by high school or university students.

There’s even a free download of modelling software based on CernVM, CERN’s own software, so that students can more easily analyse the simulated data, which plays a “crucial role in particle-physics research.” Kati says that CERN is looking forward to seeing how the data will be “utilised outside [their] collaboration, for research as well as for building educational tools.”

That’s our weekend reading sorted for a few months at least.


Image via Flickr © Image Editor