Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

The Next CERN Accelerator Logging Service—A Road to Big Data with Jakub Wozniak (CERN)

The LHC performance has reached new heights, in-turn the amount of data produced by the underlying infrastructure and observation systems has increased by an order of magnitude putting unprecedented load on the current data Logging Service. In this talk Jakub will introduce you to the core business of beam production at CERN and the motivations behind the evolution of the current CERN Accelerator Logging Service towards Hadoop and Apache Spark.

  • Login to see the comments

The Next CERN Accelerator Logging Service—A Road to Big Data with Jakub Wozniak (CERN)

  1. 1. Jakub Wozniak, CERN Next CERN Accelerator Logging Service A road to Big Data #SparkSummit
  2. 2. 2 CERN?Physics Lab since 1954
  3. 3. CERN Complex On Map 3 LHC SPS
  4. 4. Future Circular Collider (FCC) 4 Plans for 80-100 km tunnel under Geneva Lake and Alps FCC
  5. 5. 5 Experiments
  6. 6. 6 1 PB / second of events (raw physics data) Total stored beam energy 116e09 p x 2800 bunches 50MeV 6.5TeV Proton Energy LHCBeam
  7. 7. Accelerator Timing • Beam time shared per experiment in cycles – of 1.2 sec, 2.4 sec, 3.6 sec, ... 7 Exp1 Exp2 Exp1 Exp3 Exp3 Exp4 Exp1
  8. 8. Accelerator Hardware 8 • Magnets – Dipoles - bending – Quadruples - focusing – Kickers – inject/extract • Radio-Freq. cavities - to accelerate • Beam diagnostics (intensity, position) • Timing & synchronization systems • Orbit feedback & steering systems • Radiation monitors • Security systems • Vacuum, cooling & ventilation • High voltage electrical systems • Specialized controls electronics RF Bending Quads Kicker Diagnostics
  9. 9. Hardware Controls 9 In practice • Thousands of different devices • Hundreds of properties each
  10. 10. LEIR 10 Low Energy Ion Ring @CERN
  11. 11. Controls Data Logging • Equipment readings (acquisitions or settings) – Not physics data from experiments! – Needed to operate the accelerators • Online monitoring & ad-hoc queries – Alarms, device/system failures – Beam degradation or dumps • Offline analytics & studies – Beam quality improvements – New beam types – Future experiments or machines 11
  12. 12. CERN Accelerator Logging Service • Old system (CALS) based on Oracle (2 DBs) – ~20,000 devices (from ~120,000 devices) – 1,500,000 signals – 5,000,000 extractions per day – 71,000,000,000 records per day • 2 TB / day (unfiltered data, 2 DBs) – 1 PB of total data storage (heavily filtered in 95%) 12
  13. 13. Current Issues With CALS • Performance / scalability problems – Difficult to scale horizontally – “… to extract 24h of data takes 12h” • CALS & tools not ready for Big Data! – Have to extract subsets of data to do analysis! 13
  14. 14. Upcoming Challenges • Future big scientific projects – High Luminosity LHC (HL-LHC) – Future Circular Collider (FCC) • Important increase of data loads – Data frequency increase (from 1Hz to 100Hz) – Much bigger vectors – Limited filtering 14
  15. 15. HL-LHC Luminosity Forecast 15
  16. 16. Current Controls Data Storage 16 Run 1 Run 2LS 1 900 GB/day
  17. 17. Big Data For Controls? 17 CALS Impala Kudu Next CALS (NXCALS)
  18. 18. Query Performance Results 18 0 50 100 150 200 250 23000 57747 230000 1350000 2650000 Time(s) Number of Records To Process Oracle Impala Spark Lower is better Disclaimer: Specific join on controls data
  19. 19. Python on Jupyter Platform Data Science & Scientific Computing 19
  20. 20. CERN On-Premise Cloud 20 250,000 cores available
  21. 21. NXCALS Architecture Spark Lo g. Pr oc. Datasources 21 Jupyter Old API NXCALS API ETL Kafka HBase HDFS Avro Parquet Hadoop API Meta-data service DB Scientists Programmers Applications Clients
  22. 22. Future Directions • Visualization for Analytics • Service for Web based Analysis (SWAN) – With NXCALS for Accelerator Controls data… – …and Spark as first class citizen • Unified Software Platform • Interactive data analysis in the cloud 22
  23. 23. Conclusions • Technology has to answer scientific needs • Big Data is present in Accelerator Controls • CERN started a sector-wide joint effort – NXCALS (Controls Big Data Storage) – SWAN (Analysis & Visualization) • To promote and support science • Spark is our choice for data extraction & analytics! 23
  24. 24. Links • NXCALS - https://gitlab.cern.ch/acc-logging-team/nxcals • Please come to the technical talk today afternoon! • SWAN - https://swan.web.cern.ch/ • Science, accelerators, experiments – https://greybook.cern.ch – https://home.cern/about/accelerators – PS virtual visit: https://goo.gl/1zqSTA • Lectures from accelerator schools, summer courses • http://cas.web.cern.ch/previous-schools • https://indico.cern.ch/category/345/ • Beams Department, Controls Group - https://be-dep-co.web.cern.ch/ • Accelerators state - https://op-webtools.web.cern.ch/vistar/vistars.php?usr=SPS1 • Take Part: https://jobs.web.cern.ch/ 24

×