The Pill for Your Migration Hell

This is the story of a great software war. Migrating legacy Big Data systems always involves great pain and sleepless nights. Migrating Big Data systems with multiple pipelines and machine learning models only adds to the complexity. What about migrating legacy systems that protect the Microsoft Azure cloud backbone from network cyber attacks? That adds pressure and immense responsibility. In this session, we share our migration story: migrating a machine-learning-based product with thousands of paying customers that processes petabytes of network events a day. We talk about our migration strategy, how we broke the system down into migratable parts, tested every piece of every pipeline, validated results, and overcame challenges. Lastly, we share why we picked Azure Databricks as our new modern environment for both data engineering and data science workloads.

  1. The Pill for Your Migration Hell. Roy Levin & Tomer Koren, Microsoft.
  2. Session Goals: discuss the challenges in migrating a production pipeline from a legacy Big Data platform to Spark, and present a methodical approach to accomplishing it: measuring quality, optimizing performance and scalability, and deciding on the 'Definition of Done'.
  3. Background: Azure Security Center (on-premises and Azure).
  4. Background.
  5. Background: pipeline architecture diagram. Raw Data 1..M pass through Preprocessing into Processed Data 1..K; Detection Pipelines 1..N each run Feature Engineering, Classification-Based Detection, and Time-Series Anomaly Detection, with a State Manager feeding previous states back into the current state; resulting alerts flow to the Alert Publisher.
  6. Agenda: Discussing the Challenges; Formalization; Defining Metrics & Testing; Tuning Performance; Summarization.
  7. Expectations vs. Reality: the legacy pipeline.
  8. Challenges: maintaining semantics (same input == same output); real-time constraint (over 100 TB per day); cost of goods sold (COGS).
  9. Maintaining Semantics, Code Conversion: rewrite UDFs (C# -> Python) and rewrite transformations (U-SQL -> PySpark), as in the sketch below.
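
To make the conversion concrete, here is a minimal sketch of what such a rewrite can look like. The UDF body, column names, and input path are hypothetical illustrations, not the product's actual code.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

# A C# UDF such as `string NormalizeHost(string host)` becomes a Python UDF.
@F.udf(returnType=StringType())
def normalize_host(host):
    return host.strip().lower() if host is not None else None

# A U-SQL transformation along the lines of
#   @out = SELECT NormalizeHost(SourceHostname) AS Host, COUNT(*) AS Events
#          FROM @events GROUP BY NormalizeHost(SourceHostname);
# becomes a PySpark transformation:
events = spark.read.parquet("/data/raw_events")  # hypothetical input path
out = (events
       .withColumn("Host", normalize_host(F.col("SourceHostname")))
       .groupBy("Host")
       .agg(F.count("*").alias("Events")))
```
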
  10. Maintaining Semantics, Code Conversion: different datatypes; different ML libraries.
  11. Maintaining Semantics, Schema Changes. The legacy schema carries both hostnames on one row, while the Spark schema emits one row per hostname:

      Legacy:  Source IP | Destination IP | Source Hostname | Destination Hostname
               1.0.0.1   | 2.0.0.2        | Vm1             | Vm2
               3.0.0.3   | 4.0.0.4        | Vm3             | -

      Spark:   Source IP | Destination IP | Source Hostname Exist? | Hostname | Host Id
               1.0.0.1   | 2.0.0.2        | True                   | Vm1      | 111-11
               1.0.0.1   | 2.0.0.2        | False                  | Vm2      | 222-22
               3.0.0.3   | 4.0.0.4        | True                   | Vm3      | 333-33
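
A hedged sketch of a translator for this particular schema change, unpivoting the two hostname columns into one row per hostname. Column names follow the slide, but the logic (and the Host Id lookup, omitted here) is our illustration.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

legacy = spark.createDataFrame(
    [("1.0.0.1", "2.0.0.2", "Vm1", "Vm2"),
     ("3.0.0.3", "4.0.0.4", "Vm3", None)],
    ["SourceIP", "DestinationIP", "SourceHostname", "DestinationHostname"])

# stack() emits two rows per input row, one flagged as the source hostname
# and one as the destination; rows without a hostname are dropped.
translated = (legacy.select(
        "SourceIP", "DestinationIP",
        F.expr("stack(2, true, SourceHostname, false, DestinationHostname)"
               " AS (SourceHostnameExists, Hostname)"))
    .where(F.col("Hostname").isNotNull()))
# A join against a host inventory to attach Host Id would follow here.
```
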
  12. Real-Time Constraint: multiple ETL pipelines run on an hourly basis; the largest data feed contains ~4 TB of events per hour; running time must stay under 60 minutes to avoid accumulating latency.
  13. Cost of Goods Sold (COGS).
  14. Agenda: Discussing the Challenges; Formalization; Defining Metrics & Testing; Tuning Performance; Summarization.
  15. View Legacy Code in Terms of High-Level Components (the pipeline architecture diagram of slide 5, redrawn as components).
  16. View Legacy Code in Terms of High-Level Components (the same diagram, focusing on individual components).
  17. View Legacy Code in Terms of High-Level Components: each Cosmos component maps to a corresponding Spark component.
  18. Validating the Migrated Components: feed the same inputs (CosmosInputData1, CosmosInputData2) to both the Cosmos Feature Engineering and the Spark Feature Engineering, then compare CosmosOutputData1/2 against SparkOutputData1/2. Problem: schema mismatch!
  19. Validating the Migrated Components: a Translator first converts each Cosmos output to the Spark schema, and a Comparator then checks it against the corresponding Spark output; a comparator sketch follows below.
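
A minimal comparator sketch, assuming both outputs were already translated to a common schema; the paths and the comparison policy are illustrative, not the team's actual tooling.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical locations of the translated Cosmos output and the Spark output.
legacy_out = spark.read.parquet("/validation/translated_cosmos_output")
spark_out = spark.read.parquet("/validation/spark_output")

# Rows that appear on only one side point at a semantic difference.
only_legacy = legacy_out.exceptAll(spark_out)
only_spark = spark_out.exceptAll(legacy_out)

print("rows only in legacy output:", only_legacy.count())
print("rows only in Spark output:", only_spark.count())
```
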
  20. Managing It All: this migration process must be repeated for every component of every detection; some components are reused across detections; and the connections between components are dependencies that need to be represented and validated as well.
  21. Introducing CyFlow: Component Reuse & Connectivity.
  22. MultiTransformer per Component: a DataItem is a named DataFrame, model, or state; a MultiDataItem maps names (name1..namen) to DataItems; a MultiTransformer consumes one MultiDataItem and produces another (name1..namem). A sketch of the interface follows.
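
The slide suggests an interface roughly like the sketch below. The class and method names are our reading of the diagram, not CyFlow's actual API.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict

# A DataItem can be a DataFrame, a model, or a state object; a MultiDataItem
# is a bag of DataItems addressed by name.
MultiDataItem = Dict[str, Any]

class MultiTransformer(ABC):
    """One pipeline component: named inputs in, named outputs out."""

    @abstractmethod
    def transform(self, inputs: MultiDataItem) -> MultiDataItem:
        ...

class Preprocessing(MultiTransformer):  # hypothetical example component
    def transform(self, inputs: MultiDataItem) -> MultiDataItem:
        raw = inputs["raw_data_1"]
        processed = raw  # the real parsing/cleaning logic would go here
        return {"processed_data_1": processed}
```
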
  23. DAG of Components: MultiTransformers (MT) are wired into a DagMultiTransformer whose edges capture the data dependencies between components; a toy runner is sketched below.
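
Building on the MultiTransformer sketch above, a toy DagMultiTransformer could run components in dependency order and route outputs to downstream inputs by name. Again, this is illustrative, not CyFlow's implementation.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def run_dag(components, upstream, initial):
    """components: name -> MultiTransformer; upstream: name -> set of
    upstream component names; initial: the initial named DataItems."""
    pool = dict(initial)  # shared pool of named DataItems
    for name in TopologicalSorter(upstream).static_order():
        # Outputs go back into the pool, feeding later components.
        pool.update(components[name].transform(pool))
    return pool

# Hypothetical usage with the Preprocessing sketch above:
# result = run_dag({"preprocessing": Preprocessing()},
#                  {"preprocessing": set()},
#                  {"raw_data_1": raw_events_df})
```
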
  24. Stateful MultiTransformers: along with its input datasets (dataset1, dataset2), a StatefulMultiTransformer receives the state slice from iteration i-1 and emits output-dataset1 plus the state slice of iteration i (sketch below).
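
A sketch of the stateful variant: the state slice written in iteration i-1 arrives as just another named DataItem, and the slice for iteration i leaves the same way. The names and skeleton are illustrative.

```python
from typing import Any, Dict, Tuple

class StatefulMultiTransformer:
    """Skeleton of a component that threads state across hourly iterations."""

    def transform(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        prev_state = inputs.get("state_slice")  # written by iteration i-1
        batch = inputs["dataset1"]
        output, new_state = self._step(batch, prev_state)
        return {"output_dataset1": output,
                "state_slice": new_state}       # consumed by iteration i+1

    def _step(self, batch, prev_state) -> Tuple[Any, Any]:
        raise NotImplementedError  # detection-specific update logic
```
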
  25. Using notebooks without CyFlow vs. with CyFlow (each row reads: without -> with):
      Deployment: multiple notebooks, one per component -> deploy the DAG.
      Unit testing: no framework -> standalone Spark with pyunit.
      Shared utility code: importing whl files (hard to maintain) -> using a repository.
  26. The same comparison, adding: Structure: implicit, according to schedules -> explicit and visually depicted.
  27. And adding: Typing: no schema checks -> schema checks before running.
  28. Agenda: Discussing the Challenges; Formalization; Defining Metrics & Testing; Tuning Performance; Summarization.
  29. Validating the Migrated Components, revisited: Translators convert the Cosmos outputs (CosmosOutputData1/2) to the Spark schema and Comparators check them against SparkOutputData1/2.
  30. Validation, a Closer Look (comparing CosmosOutputData2 with SparkOutputData2). Recall our challenges: legacy ML models; non-reproducible ML models; nondeterministic semantics (e.g., logic based on row numbers); some non-translatable schema changes; some randomly generated UUIDs (e.g., AlertIds). How much should we invest in achieving full parity?
  31. Decide Based on a Soft Metric of the Final Output.

      Alerts generated by the Legacy component (y):  Res1, Res2, Res3, Res4, Res5, Res6, Res7
      Alerts generated by the Spark component (y'):  Res1, Res3, Res4, Res5, Res6, Res8, Res9, Res10

      Jaccard similarity: js = |y ∩ y'| / |y ∪ y'| = 5/10 = 0.50
      Precision:          pr = |y ∩ y'| / |y'|     = 5/8 ≈ 0.63
      Recall:             re = |y ∩ y'| / |y|      = 5/7 ≈ 0.71

      For y ∩ y' we write an alert content validator (for the rest of the column values).
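
In plain Python, the slide's soft metrics reduce to set operations over the alerted resource ids:

```python
# Resource ids taken from the slide's example tables.
legacy = {"Res1", "Res2", "Res3", "Res4", "Res5", "Res6", "Res7"}          # y
spark = {"Res1", "Res3", "Res4", "Res5", "Res6", "Res8", "Res9", "Res10"}  # y'

common = legacy & spark                      # |y ∩ y'| = 5
jaccard = len(common) / len(legacy | spark)  # 5/10 = 0.50
precision = len(common) / len(spark)         # 5/8 ≈ 0.63
recall = len(common) / len(legacy)           # 5/7 ≈ 0.71
```
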
  32. Agenda: Discussing the Challenges; Formalization; Defining Metrics & Testing; Tuning Performance; Summarization.
  33. Measure Running Time on Actual Load: running the CyFlow DAG (Feature Engineering and Classification-Based Detection over Processed Data 1 and 2, then the Alert Publisher emitting the published alerts) took 14 hours!
  34. Finding the Culprit: splitting Feature Engineering into its transformations and materializing each intermediate result isolated the slow step: one transformation completed in 20 minutes, while the other took 13 hours!
  35. Tuning Performance: aggregate the hourly state into a daily state at end of day; a Daily Aggregator folds the 24 hourly slices (partitions 1..N per hour) into a single daily state per partition. A sketch follows below.
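
As a hedged sketch, the end-of-day aggregation might look like this in PySpark; the storage layout, keys, and aggregation functions are assumptions for illustration.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical layout: one folder of state per hour of the day.
hourly = spark.read.parquet("/state/2020-06-01/hour=*")

# End-of-day aggregator: collapse the 24 hourly slices into one daily state
# per entity, so the next day's jobs read 1 input instead of 24.
daily = (hourly
         .groupBy("entity_id")
         .agg(F.sum("event_count").alias("event_count"),
              F.max("last_seen").alias("last_seen")))

daily.write.mode("overwrite").parquet("/state/2020-06-01/daily")
```
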
  36. Tuning Performance: modify the default partition count (number of cores x 3); use broadcasting when UDFs with large signatures are reused; cache (beware of memory failures); unpersist to remove a DataFrame from the cache when it is no longer needed. See the sketch below.
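
These knobs map to standard Spark APIs. A sketch, with the cluster size, data, and column names as stand-ins:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# 1. Default partition count: number of cores x 3 (cluster size hypothetical).
cores = 96
spark.conf.set("spark.sql.shuffle.partitions", str(cores * 3))

# 2. Broadcast a large object reused by a UDF once per executor, instead of
#    shipping it with every task (the lookup content is a stand-in).
big_lookup = {"1.0.0.1": "Vm1"}
b_lookup = spark.sparkContext.broadcast(big_lookup)

@F.udf("string")
def host_of(ip):
    return b_lookup.value.get(ip)  # executors read their local broadcast copy

events = spark.read.parquet("/data/processed")  # hypothetical input
enriched = events.withColumn("host", host_of("source_ip"))

# 3. Cache a DataFrame shared by several branches (watch executor memory),
#    then unpersist it once the last consumer finishes.
enriched.cache()
with_host = enriched.where(F.col("host").isNotNull()).count()
total = enriched.count()
enriched.unpersist()
```
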
  37. Agenda: Discussing the Challenges; Formalization; Defining Metrics & Testing; Tuning Performance; Summarization.
  38. Summary: presented the challenges in migrating a large-scale legacy Big Data system to Spark (preserving semantics, real-time constraints, COGS); introduced CyFlow, a framework built over Apache Spark that allows component reuse & connectivity; discussed different validation strategies; and covered aspects of Spark performance tuning that reduce runtime and COGS.
  39. Feedback: your feedback is important to us. Don't forget to rate and review the sessions.
