
Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB Helps Vera C. Rubin Observatory Make the Deepest, Widest Image of the Universe | InfluxDays Virtual Experience NA 2020

  1. Angelo Fausti & Frossie Economou, Vera C. Rubin Observatory. How InfluxDB is helping us in our quest to make the deepest, widest image of the universe
  2. Influx for hardware telemetry. Influx for devops-type metrics. Influx for capturing scientific insight. … but how did we get here?
  3. Space
  4. Space
  5. Space is in a state of flux • Comets and asteroids vary in position • (Super)novae, variable stars vary in brightness • Galaxies vary in age • Dark energy varies in, uh, spacetime? maybe? (Subaru HSC colour composite of COSMOS field, NAOJ)
  6. How to understand the changing universe in 5 [not very] easy steps (xkcd 1522)
  7. Step 0: Find Funding
  8. Step 1: Build a 3200 Megapixel Camera. LSST Camera (Media: Rubin Observatory)
  9. Step 2: Build a large but nimble telescope (Media: Rubin Observatory) <- 8.4 meter continuous surface primary-tertiary mirror
  10. Step 3: Haul everything up a mountain (Media: Rubin Observatory). Yes there’s Internet. No you can’t count on it
  11. Step 4: Observe the Sky Relentlessly for 10 years; Issue 10M Alerts Every Night (Media: Rubin Observatory) • “All” sky 2x per week • 60 seconds to produce alerts • 10-year images: 0.5 EB • Final DB size: 15 PB (Legacy Survey of Space & Time (LSST) observing cadence simulation)
  12. Step 5: Get People (also a data centre or three). Write Software. Wait for 2022 (Media: Rubin Observatory). And get yourself a data centre or three… All our own code is 💯% open source: github.com/lsst, github.com/lsst-sqre
  13. (photo: Wil O’Mullane, ~ Oct 2019) We’ll hang out on #influxdays-virtual for more Q&A (@frossie @afausti). Over to Angelo
  14. How InfluxDB Helps Vera C. Rubin Observatory Make the Deepest, Widest Image of the Universe. InfluxDays North America, November 2020. Frossie Economou, Technical Manager for Data Management, Vera C. Rubin Observatory. Angelo Fausti, Software Engineer, Vera C. Rubin Observatory
  15. HSC COSMOS Ultra Deep Field (1.77 deg²) ~ Rubin 10yr depth
  16. Data processing in Astronomy https://pipelines.lsst.io
  17. Data Management team: ~70 FTEs (105 members)
  18. I - Application Monitoring: Science Requirements and Performance Metrics
  19. Rubin Science Requirements https://ls.st/lpm-17 Example: Astrometric Performance. Better astrometry: Minimum goal, Design goal, Stretch goal
  20. What a metric definition looks like. Verification Framework https://sqr-019.lsst.io
  21. What a specification looks like https://sqr-019.lsst.io
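
The slide text names three goal levels (minimum, design, stretch) but does not reproduce the specification itself. As a rough sketch of how a measured value could be checked against such levels, the snippet below assumes a lower-is-better metric; the `Specification` class and the threshold numbers are hypothetical placeholders, not the Verification Framework API and not values read from LPM-17.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Specification:
    """Three threshold levels for a lower-is-better metric (hypothetical)."""

    minimum: float  # loosest requirement
    design: float   # intended performance
    stretch: float  # most demanding goal

    def level_met(self, value: float) -> str:
        """Return the most demanding level that the measured value satisfies."""
        if value <= self.stretch:
            return "stretch"
        if value <= self.design:
            return "design"
        if value <= self.minimum:
            return "minimum"
        return "failed"


# Placeholder thresholds chosen only for illustration.
example_spec = Specification(minimum=9.0, design=6.0, stretch=3.0)

# AM1 value taken from the "Example of a Series" slide.
print(example_spec.level_met(6.42357))
```

With these placeholder thresholds the AM1 value from the series example meets only the minimum level; real thresholds would come from the specification documents linked on the slide.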
  22. [No slide text]
  23. Problems with our in-house solution ● A relational DB is not optimized for time series data ● Stuck with predefined dashboards and visualizations ● Limited exploratory analysis capabilities ● Our in-house development didn’t scale ● Use time more wisely: adopt an existing solution instead of (re)inventing our own
  24. Adopting a TSDB, which one? https://db-engines.com/en/ranking [Plot: log(Score) vs. Time (Years); labels: 25+, 30+]
  25. “If it takes more than three days to get it working it is not the right solution for you.” Frossie Economou
  26. Why InfluxDB? ● It is more than a TSDB, it is an innovative solution ● Open source software and community ● InfluxDB: efficient store for time series + InfluxQL and Flux language ● Chronograf: postdefined visualizations ● Kapacitor: foster collaborative conversation (Slack)
  27. InfluxDB schema design. Results from the Data Release Production pipeline [figure labels: Tags, Fields, Timestamp] ● Measurement groups the results of the pipeline ● Timestamp is the time when the pipeline run finishes ● Tags are metadata associated with the pipeline run ● Fields are the metrics measured by the pipeline
  28. First the Tags, then the Series: drp,dataset=HSC,tract=509,filter=g {fields} timestamp. For each combination of tag values, there’s a new series. filter is the name of the optical filter used at the telescope at a given time. A tract identifies a region in the sky (see https://pipelines.lsst.io/modules/lsst.skymap)
  29. Example of a Series: drp,dataset=HSC,tract=509,filter=g. Each point in a series contains the set of metrics measured by the pipeline run, and the results are grouped by the pipeline name. [Plot: {field-set}_i vs. Time (run ID); example point AM1: 6.42357, AM2: 6.48177, AM3: 4.62033]
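
To make the schema concrete, here is a minimal sketch that assembles one InfluxDB line protocol point from the measurement, tags, and fields shown on the slides. The helper name `to_line_protocol` and the timestamp value are illustrative assumptions; the sketch also assumes keys and values contain no spaces, commas, or equals signs, so it does no escaping.

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one point as: measurement,tag_set field_set timestamp.

    Tags are sorted by key, which InfluxDB recommends for write performance.
    Only float fields are handled in this sketch.
    """
    tag_set = ",".join(f"{key}={value}" for key, value in sorted(tags.items()))
    field_set = ",".join(f"{key}={float(value)}" for key, value in fields.items())
    return f"{measurement},{tag_set} {field_set} {timestamp_ns}"


line = to_line_protocol(
    measurement="drp",
    tags={"dataset": "HSC", "tract": "509", "filter": "g"},
    fields={"AM1": 6.42357, "AM2": 6.48177, "AM3": 4.62033},
    timestamp_ns=1600000000000000000,  # hypothetical pipeline-run finish time
)
print(line)
```

Sorting puts the tags in the order dataset, filter, tract rather than the order shown on the slide; the series is identified by the set of tag key-value pairs, so both orderings refer to the same series.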
  30. Tracking application metrics with InfluxDB https://squash.lsst.codes
  31. Notifications going to Slack
  32. Why did that metric value change? Make an annotation!
  33. “Annotations are more important than the data itself.” Frossie Economou
  34. II - Engineering and Facilities Database: Real-Time Monitoring of the Observatory Data
  35. All subsystems of the Observatory coexist in a state of active interplay.
  36. Observatory Data https://ts-xml.lsst.io ● 60+ subsystems ● Total of 1148 DDS topics ○ 350 commands ○ 531 events ○ 267 telemetry topics ● Total throughput ~21GB/h → real-time monitoring ○ ~15TB per month → offline analysis ○ ~1.5PB for the 10yr of operations → trend analysis
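
The volume figures on this slide can be related with a quick back-of-the-envelope calculation. The sketch below assumes continuous operation, a 30-day month, a 365-day year, and decimal units (1 TB = 1000 GB); none of these assumptions are stated on the slide.

```python
# Throughput quoted on the slide.
GB_PER_HOUR = 21.0

# Assumptions for this sketch only.
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30
DAYS_PER_YEAR = 365
YEARS_OF_OPERATIONS = 10

tb_per_month = GB_PER_HOUR * HOURS_PER_DAY * DAYS_PER_MONTH / 1_000
pb_per_10yr = (
    GB_PER_HOUR * HOURS_PER_DAY * DAYS_PER_YEAR * YEARS_OF_OPERATIONS / 1_000_000
)

print(f"{tb_per_month:.2f} TB per month")
print(f"{pb_per_10yr:.2f} PB over {YEARS_OF_OPERATIONS} years")
```

The monthly result lands close to the slide's ~15TB. The ten-year result under continuous operation comes out somewhat above the slide's ~1.5PB; the slides do not say how that figure was derived.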
  37. The M1M3 mirror cell subsystem
  38. M1M3 mirror cell data ● 156 force actuators and sensors producing data at 50Hz ● Can we record and analyze the M1M3 data in real-time?
  39. Kafka + InfluxDB architecture https://sqr-029.lsst.io Stream Reactor (OSS)
  40. End-to-end latency characterization: Latency = (WriteTimestamp - SndTimestamp) [diagram labels: SndTimestamp, WriteTimestamp]
  41. Median latency ~60ms writing ~100k ppm. Executing queries while writing
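
The latency definition above is a simple difference of two timestamps per message. A minimal sketch of computing it and taking the median follows; the sample timestamps are made up for illustration and are not measurements from the talk.

```python
from statistics import median


def latencies_ms(records):
    """Return WriteTimestamp - SndTimestamp in milliseconds for each record.

    Each record is a (snd_timestamp, write_timestamp) pair in seconds.
    """
    return [(write_ts - snd_ts) * 1000.0 for snd_ts, write_ts in records]


# Hypothetical (SndTimestamp, WriteTimestamp) pairs, in seconds.
records = [
    (100.000, 100.055),
    (100.020, 100.080),
    (100.040, 100.110),
]

median_latency_ms = median(latencies_ms(records))
print(f"median latency: {median_latency_ms:.1f} ms")
```

The median is a reasonable summary here because a few slow writes would skew a mean much more than they shift the middle value.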
  42. Aux Telescope and Weather Station tower
  43. Aux Telescope Camera, Tucson Teststand - Aug 2019
  44. Weather Station, Summit - September 2019
  45. M2 mirror cell functional testing, Summit - March 2020
  46. M2 mirror cell functional testing, Summit - March 2020
  47. The beginnings of the Telescope control room, Summit - March 2020
  48. A preview of operations. [Diagram labels: US Data Facility, Urbana, IL; Project staff access; RP 10yr; TestStand, Tucson, AZ; Summit, Cerro Pachon, Chile; Restricted access; RP ~30 days; TestStand; Chilean Data Facility, La Serena, Chile; <10MB/s raw stream]
  49. Data Replication and Aggregation https://sqr-034.lsst.codes
  50. Data Aggregation in Kafka with Faust https://kafka-aggregator.lsst.io Faust agents compute summary statistics on non-overlapping windows of N seconds. Data Reduction factor R~10
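
The idea of summarizing a stream over non-overlapping (tumbling) windows can be sketched in plain Python. This is a stand-in for illustration only: it does not use Faust or Kafka, the `aggregate` function and sample data are hypothetical, and timestamps are integer milliseconds so the window arithmetic stays exact.

```python
from collections import defaultdict
from statistics import mean


def aggregate(samples, window_ms):
    """Group (timestamp_ms, value) samples into tumbling windows.

    Returns one summary row per window, ordered by window start.
    """
    windows = defaultdict(list)
    for timestamp_ms, value in samples:
        # Every timestamp falls into exactly one window.
        window_start = (timestamp_ms // window_ms) * window_ms
        windows[window_start].append(value)
    return [
        {
            "window_start_ms": start,
            "count": len(values),
            "min": min(values),
            "mean": mean(values),
            "max": max(values),
        }
        for start, values in sorted(windows.items())
    ]


# Two seconds of made-up data sampled at 50 Hz (one sample every 20 ms).
samples = [(i * 20, float(i % 5)) for i in range(100)]
summary = aggregate(samples, window_ms=200)

reduction_factor = len(samples) / len(summary)
print(len(summary), "rows, reduction factor", reduction_factor)
```

In this toy case each 200 ms window holds ten samples, so the number of rows drops by a factor of ten. How the talk's R is defined is not spelled out on the slide, so the match with R~10 here is only by construction.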
  51. What’s next ● Migration to InfluxDB 2.0 ○ Conversation with InfluxData design team about Annotations in 2.0 ○ Flux training for the Observatory Staff ○ Flux Tasks for downsampling and trend analysis ● Rubin Observatory Interim Data Facility on Google Cloud ● Project transition from Construction to Operations is happening ○ New opportunities for using InfluxDB ● Self-monitoring ● Scalability as we load more data, RPs, etc.
  52. Learn more… ● Vera C. Rubin Observatory ● Data Processing ● Verification Framework ● Engineering and Facilities Database ● Kafka Aggregator ● Rubin Science Platform ● Rubin Technical Documentation
  53. Thank you!
