
Scalable complex event processing on samza @UBER



The Marketplace data team at Uber has built a scalable complex event processing platform to meet many challenging real-time data needs across Uber products. The platform has been in production for almost a year and has proven flexible enough to handle many use cases. In this talk, we share in detail the design and architecture of the platform, and how we employ Samza, Kafka, and Siddhi at scale.

These slides were presented at the Stream Processing Meetup @ LinkedIn on June 15, 2016.

Published in: Engineering


  1. 1. Scalable Complex Event Processing On Samza @Uber Shuyi Chen Uber Technologies Inc.
  2. 2. ● 6 continents, 70 countries, 400+ cities ● Transportation as reliable as running water, everywhere, for everyone Uber
  3. 3. Outline ● Motivation ● Architecture ● Limitations ● Challenges
  4. 4. Outline ● Motivation ● Architecture ● Limitations ● Challenges
  5. 5. Uber is a data-driven company
  6. 6. Thousands of Kafka topics from different services
  7. 7. We can extract a lot of useful information from this rich set of logs in real-time!
  8. 8. Multiple logins from the same IP within a short interval
  9. 9. Partner accepts a trip → partner calls rider through the Uber app → rider cancels the trip
  10. 10. Partners reject the second pickup of an UberPOOL trip
  11. 11. Multiple logins from the same IP within a short interval Window Aggregation
  12. 12. Partner accepts a trip → partner calls rider through the Uber app → rider cancels the trip Pattern detection
  13. 13. Partners reject the second pickup of an UberPOOL trip Filter
  14. 14. Can we use declarative semantics to specify this stream processing logic?
  15. 15. Complex event processing ● Combines data from multiple sources to infer events or patterns that suggest more complicated circumstances ● CEP is used across many industries for various use cases, including: ○ Finance: Trade analysis, fraud detection ○ Airlines: Operations monitoring ○ Healthcare: Claims processing, patient monitoring ○ Energy and Telecommunications: Outage detection ● CEP uses a declarative rule/query language to specify event processing logic
  16. 16. Siddhi: Complex event processing engine ● Lightweight, extensible, open source, released as a Java library ● Features supported ○ Filter ○ Join ○ Aggregation ○ Group by ○ Window ○ Pattern processing ○ Sequence processing ○ Event tables ○ Event-time processing ○ Declarative query language: SiddhiQL
  17. 17. How Siddhi works ● Specify processing logic declaratively with SiddhiQL
  18. 18. How Siddhi works ● The query is parsed at runtime into an execution plan runtime ● As events flow in, the execution plan runtime processes them inside the CEP engine according to the query logic
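To make the "window aggregation" use case from the motivation slides concrete (multiple logins from the same IP within a short interval), here is a minimal plain-Python sketch of the kind of logic a windowed query expresses. It is not SiddhiQL and not how Siddhi is implemented; the class name, the 60-second window, the threshold of 3, and the sample IPs are all assumptions chosen for illustration.

```python
from collections import defaultdict, deque


class SlidingWindowCounter:
    """Flags a key once it has been seen at least `threshold` times
    within the last `window_seconds` (hypothetical helper)."""

    def __init__(self, window_seconds, threshold):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.events = defaultdict(deque)  # key -> timestamps still in the window

    def on_event(self, key, timestamp):
        window = self.events[key]
        window.append(timestamp)
        # Expire timestamps that have fallen out of the time window.
        while window and window[0] <= timestamp - self.window_seconds:
            window.popleft()
        return len(window) >= self.threshold


# Illustrative login events: (ip, timestamp in seconds).
logins = [
    ("10.0.0.1", 0),
    ("10.0.0.2", 5),
    ("10.0.0.1", 20),
    ("10.0.0.1", 45),
    ("10.0.0.1", 200),
]

detector = SlidingWindowCounter(window_seconds=60, threshold=3)
alerts = [(ip, ts) for ip, ts in logins if detector.on_event(ip, ts)]
print(alerts)
```

With a declarative query language, the window length, grouping key, and threshold become part of the query text rather than hand-written code like this.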
  19. 19. How can we make it scalable at Uber scale?
  20. 20. Samza ● A distributed stream processing framework ○ Scalable ○ Built-in state management ○ Built-in fault tolerance ○ At-least-once message processing ● Good support from our data infra team
  21. 21. How can we make the stream processing output useful?
  22. 22. Actions ● Generalize a set of common action templates to make it easy for services and humans to harness the power of real-time stream processing ● Currently we support ○ Make an RPC call ○ Invoke a webhook endpoint ○ Index to ElasticSearch ○ Write to Cassandra ○ Kafka ○ Statsd ○ Chat service ○ Email ○ Push notification
  23. 23. Actions Real-time Scalable Complex Event Processing
  24. 24. Outline ● Motivation ● Architecture ● Limitations ● Challenges
  25. 25. Preprocessor ● Enrich raw Kafka events with business information
  26. 26. Shuffler ● Re-shuffle events ● Prefiltering for predicate pushdown
  27. 27. Complex event processor ● Parse Siddhi queries into execution plan runtime ● Process events in Siddhi execution plan runtime ● Checkpoint state regularly to ensure recovery upon crash/restart using RocksDB
  28. 28. Action processor ● Execute actions upon the complex event output ● Support various kinds of actions for easy integration ● Implement a configurable, finite action retry mechanism using RocksDB
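The finite retry idea on the action processor slide can be sketched as follows. This is an assumption-laden illustration, not the platform's code: an in-memory dict stands in for the RocksDB-backed store the slide mentions, and `RetryingActionExecutor`, `flaky_webhook`, and the event payload are hypothetical names.

```python
class RetryingActionExecutor:
    """Retries a failing action up to `max_attempts` times, then drops it."""

    def __init__(self, action, max_attempts):
        self.action = action
        self.max_attempts = max_attempts
        # event_id -> (event, failed attempts so far).
        # Stand-in for a persistent store such as RocksDB.
        self.pending = {}
        self.dropped = []

    def submit(self, event_id, event):
        self.pending[event_id] = (event, 0)

    def run_once(self):
        """Make one attempt for every pending event."""
        for event_id, (event, attempts) in list(self.pending.items()):
            try:
                self.action(event)
                del self.pending[event_id]
            except Exception:
                attempts += 1
                if attempts >= self.max_attempts:
                    # Retry budget exhausted: give up on this event.
                    del self.pending[event_id]
                    self.dropped.append(event_id)
                else:
                    self.pending[event_id] = (event, attempts)


# Hypothetical action that fails twice before succeeding.
calls = {"n": 0}


def flaky_webhook(event):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("endpoint unavailable")


executor = RetryingActionExecutor(flaky_webhook, max_attempts=5)
executor.submit("evt-1", {"type": "trip_cancelled"})

passes = 0
while executor.pending:
    executor.run_once()
    passes += 1
print(passes, executor.dropped)
```

Keeping the attempt counter next to the event is what makes the retry finite; persisting that pair is what would let retries survive a restart.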
  29. 29. No stream processing logic is hard-coded in the data pipeline
  30. 30. REST API backend ● All queries, actions, shuffling logic and pre-filtering logic are stored externally in Cassandra ● RESTful API for CRUD operations ● The data pipeline automatically reloads the data upon update without a job restart ○ Fast data exploration ○ Real-time feedback loop ○ Incremental DAG construction ● Decouple processing logic from the data pipeline
  31. 31. Unified management and monitoring ● Every use case ○ share the same data pipeline architecture ○ Use queries and actions to describe its processing logic ● A single monitoring template can be reused across different use cases
  32. 32. Applications ● Real-time fraud detection ● Real-time anomaly detection ● Real-time marketing campaign ● Real-time promotion ● Real-time monitoring ● Real-time feedback system ● Real-time analytics ● Real-time visualizations ● etc.
  33. 33. Outline ● Motivation ● Architecture ● Limitations ● Challenges
  34. 34. Not a general purpose stream processing system
  35. 35. No dynamic topology ● The DAG is not dynamic ● Cannot shuffle an arbitrary number of times ● Ideally, we can chain multiple copies of the data pipeline to build an arbitrary DAG ○ A large DAG can be difficult to manage and monitor ○ Samza uses Kafka as the intermediate message queue between jobs, so wide DAGs put a large load on Kafka ○ Out of 40+ use cases we run in production, none requires it.
  36. 36. Out-of-order event handling ● Not a big concern ○ Events of the same rider/partner are usually seconds apart ● K-slack extension in Siddhi for out-of-order event processing
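The K-slack idea mentioned on the out-of-order slide can be illustrated with a small buffer that holds events back until they are at least K time units older than the newest timestamp seen, then releases them in timestamp order. This is a simplified sketch with a fixed K (an assumption; it does not reproduce the Siddhi extension's API or any adaptive behavior), and `KSlackBuffer` is a hypothetical name.

```python
import heapq


class KSlackBuffer:
    """Reorders events that arrive at most `k` time units late (fixed K)."""

    def __init__(self, k):
        self.k = k
        self.max_seen = float("-inf")
        self.heap = []  # min-heap of (timestamp, arrival sequence, event)
        self.seq = 0    # tie-breaker so events themselves are never compared

    def add(self, timestamp, event):
        """Buffer one event and return the events that are now safe to emit."""
        self.max_seen = max(self.max_seen, timestamp)
        heapq.heappush(self.heap, (timestamp, self.seq, event))
        self.seq += 1
        released = []
        while self.heap and self.heap[0][0] <= self.max_seen - self.k:
            ts, _, ev = heapq.heappop(self.heap)
            released.append((ts, ev))
        return released

    def flush(self):
        """Emit everything still buffered, in timestamp order."""
        released = []
        while self.heap:
            ts, _, ev = heapq.heappop(self.heap)
            released.append((ts, ev))
        return released


# Events arrive slightly out of order: (timestamp, payload).
arrivals = [(1, "a"), (3, "b"), (2, "c"), (5, "d"), (4, "e"), (8, "f")]

buffer = KSlackBuffer(k=2)
ordered = []
for ts, ev in arrivals:
    ordered.extend(buffer.add(ts, ev))
ordered.extend(buffer.flush())
print(ordered)
```

The trade-off is latency: every event waits until the stream has advanced K units past it, and an event that arrives more than K late can still be emitted out of order.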
  37. 37. Job deployment ● Samza job creation is semi-automated ○ Auto-generate standard job properties ○ JVM memory tuning ○ Samza parameter tuning, e.g. container count ● Integrate with in-house cluster job management system to simplify start/restart/stop/upgrade of Samza jobs
  38. 38. Predicate pushdown ● Allow prefiltering of streams in shuffle stage ● Need manual configuration through Web UI ● In the future, we can automate this by query analysis
  39. 39. Outline ● Motivation ● Architecture ● Limitations ● Challenges
  40. 40. Broadcast stream ● We need a broadcast stream to broadcast updates in the storage backend to the data pipeline ● No broadcast stream in Samza 0.9.1 ● Workaround: override SystemStreamPartitionGrouper ● Samza 0.10.0 added broadcast support (SAMZA-676)
  41. 41. Unbalanced task workload ● Shufflers ingest multiple topics with different partition counts ● Default task partition assignment does not scale ● Override SystemStreamPartitionGrouper to balance the partitions across all tasks
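To show why topics with different partition counts can skew task workload, the sketch below compares two assignment strategies in plain Python. It assumes the default behaves like grouping by partition id (task i receives partition i of every topic), and it uses a simple round-robin as the balanced alternative; the slide does not say which balancing scheme the custom grouper uses. The topic names and partition counts are made up.

```python
from collections import defaultdict


def group_by_partition_id(topics):
    """Assumed default: task i receives partition i of every topic."""
    tasks = defaultdict(list)
    for topic, count in topics.items():
        for partition in range(count):
            tasks[partition].append((topic, partition))
    return dict(tasks)


def group_round_robin(topics, num_tasks):
    """Spread all (topic, partition) pairs evenly over a fixed number of tasks."""
    tasks = {task: [] for task in range(num_tasks)}
    all_partitions = [
        (topic, partition)
        for topic, count in sorted(topics.items())
        for partition in range(count)
    ]
    for index, topic_partition in enumerate(all_partitions):
        tasks[index % num_tasks].append(topic_partition)
    return tasks


# Hypothetical topics with different partition counts.
topics = {"logins": 2, "trips": 8, "payments": 4}

skewed = group_by_partition_id(topics)
balanced = group_round_robin(topics, num_tasks=4)

skewed_sizes = sorted(len(parts) for parts in skewed.values())
balanced_sizes = sorted(len(parts) for parts in balanced.values())
print(skewed_sizes)
print(balanced_sizes)
```

Round-robin evens out the number of partitions per task, not the traffic per partition, so a hot topic can still need a weighted scheme.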
  42. 42. Large checkpointing state ● Samza uses Kafka to log state changes ● Kafka message size is limited to 1 MB by default ● Solution: we built logic to slice state into smaller pieces and checkpoint them into RocksDB
  43. 43. Synchronous checkpointing ● If state is large, time to checkpoint can be long ● Samza uses a single-threaded model, so checkpointing asynchronously is unsafe ● Ongoing work on multi-thread support in Samza (SAMZA-863)
  44. 44. Exactly once state processing? ● Cannot commit state and offset atomically ● No exactly once state processing
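Slicing a large serialized state into pieces that each fit under a message size limit might look like the sketch below. The key scheme (`name:index` plus a `name:count` entry), the function names, and the 1000-byte limit are assumptions for illustration, and a plain dict stands in for the key-value store.

```python
def slice_state(state_key, payload, max_chunk_bytes):
    """Split `payload` into chunks of at most `max_chunk_bytes` bytes."""
    chunks = {}
    count = 0
    for start in range(0, len(payload), max_chunk_bytes):
        chunks[f"{state_key}:{count}"] = payload[start:start + max_chunk_bytes]
        count += 1
    # Record how many chunks exist so the reader knows when it has them all.
    chunks[f"{state_key}:count"] = str(count).encode()
    return chunks


def restore_state(state_key, store):
    """Reassemble the original payload from its chunks."""
    count = int(store[f"{state_key}:count"].decode())
    return b"".join(store[f"{state_key}:{index}"] for index in range(count))


# A dict stands in for the key-value store; sizes are illustrative.
payload = bytes(range(256)) * 10  # 2560 bytes of serialized state
store = {}
store.update(slice_state("plan-42", payload, max_chunk_bytes=1000))

restored = restore_state("plan-42", store)
print(len(store), restored == payload)
```

Each write then stays under the per-message limit, at the cost of the reader having to fetch and join several entries on recovery.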
  45. 45. Debugging ● Need to inspect multiple logs to diagnose Samza job problems ○ Application master log ○ Multiple container logs ○ Log size is huge ○ Container logs are difficult to locate after job failure ● Sometimes a Samza job gets stuck at launch, and no log can be found ○ YARN problem ○ Binary downloading problem
  46. 46. Upgrading Samza jobs ● Upgrading Samza jobs requires a full restart, which can take minutes due to ○ Offset checkpointing topic too large → set retention to hours ○ Changelog topic too large → set retention or enable compaction in Kafka or host affinity (SAMZA-617) ● To minimize the interruption during upgrade, it would be nice to have ○ Rolling restart ○ Per container restart
  47. 47. Our solution: non-interrupted handoff ● For critical jobs, we use replication during upgrade ○ Start a shadow job ○ Upgrade shadow ○ Switch primary and shadow ○ Upgrade primary ○ Switch back ● Downside: requires 2x capacity during upgrade
  48. 48. Manage complicated DAG ● Samza uses Kafka as the message queue for intermediate processing output ○ This enables sharing of shuffler or preprocessor output among multiple downstream Samza jobs ○ Increases resource efficiency ● This gradually results in a large and complicated DAG ○ Complicated dependencies between jobs ○ Jobs closer to the sources of the DAG become more and more critical ● In practice, we isolate DAGs by logical groups
  49. 49. Thank you