Scaling Up WSO2 BAM for Billions of Requests and Terabytes of Data

  1. Scaling Up WSO2 BAM for Billions of Requests and Terabytes of Data
      Buddhika Chamith, Software Engineer – WSO2 BAM
  2. Business Activity Monitoring
      “The aggregation, analysis, and presentation of real-time information about activities inside organizations and involving customers and partners.” - Gartner
  3. Aggregation
      ● Capturing data
      ● Data storage
      ● What data to capture?
  4. Analysis
      ● Data operations
      ● Building KPIs
      ● Operate on large amounts of historic data or new data
      ● Building BI
  5. Presentation
      ● Visualizing KPIs/BI
      ● Custom dashboards
      ● Visualization tools
      ● Not just dashboards!
  6. Need for Scalability
  7. BAM 2.x - Component Architecture
  8. Data Agents
      ● Push data to BAM
      ● Collecting
          ● Service data
          ● Mediation data
          ● Logs etc.
      ● Various interceptors used
          ● Axis2 Handlers
          ● Synapse Mediators
          ● Tomcat Valves
          ● Log4j Appenders
  9. Performance Considerations
      ● Should be asynchronous
      ● Event batching
      ● SOAP?
      ● Apache Thrift (binary protocol)
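
The first two considerations on this slide, asynchronous publishing and event batching, can be illustrated with a minimal Python sketch. It is not the BAM data agent: the class name `BatchingPublisher`, the `send_batch` callback, and the default batch size and flush interval are all assumptions made for illustration. Callers enqueue events and return immediately, while a background thread groups events into batches before handing them to the transport.

```python
import queue
import threading


class BatchingPublisher:
    """Hypothetical agent-side buffer: publish() never blocks on the network."""

    def __init__(self, send_batch, batch_size=100, flush_interval=0.5):
        self._send_batch = send_batch          # assumed transport callback
        self._batch_size = batch_size          # assumed default, not from the slides
        self._flush_interval = flush_interval  # seconds of idle time before a flush
        self._queue = queue.Queue()
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def publish(self, event):
        # Asynchronous from the caller's point of view: just enqueue and return.
        self._queue.put(event)

    def close(self):
        # Ask the worker to drain the queue, then wait for it to finish.
        self._stop.set()
        self._worker.join()

    def _run(self):
        batch = []
        while not (self._stop.is_set() and self._queue.empty()):
            try:
                batch.append(self._queue.get(timeout=self._flush_interval))
            except queue.Empty:
                pass  # queue went idle: flush whatever has accumulated
            else:
                if len(batch) < self._batch_size:
                    continue  # keep accumulating until the batch is full
            if batch:
                self._send_batch(batch)
                batch = []
        if batch:
            self._send_batch(batch)  # final partial batch on shutdown


sent_batches = []
publisher = BatchingPublisher(sent_batches.append, batch_size=3, flush_interval=0.05)
for event_id in range(7):
    publisher.publish(event_id)
publisher.close()
```

Sending one batch per network call instead of one call per event is what amortizes the per-request overhead; the batch size and idle timeout trade latency against throughput.
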
  10. Apache Thrift
      ● An RPC framework
      ● With a pluggable architecture for mixing different transports with different protocols
      ● Has multiple language bindings (Java, C++, Python, Perl, C# etc.)
      ● We mainly use the Java binding
  11. Not Just Performance...
      ● Load balancing
      ● Failover
      ● All available within a Java SDK library.
      ● You can use it too.
  12. Data Receiver
      ● Capture and transfer data to subscribed sinks.
      ● Not just the database.
      ● Can be clustered.
      ● Load balancing is handled from the client side.
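
Client-side load balancing with failover, as mentioned on the two slides above, might look roughly like the following Python sketch. This is not the Java SDK's API: `RoundRobinPublisher` and the callables standing in for receiver connections are hypothetical, and round-robin is only one possible balancing policy. Each send starts at the next receiver in rotation and falls through to the remaining receivers if a connection fails.

```python
import itertools


class RoundRobinPublisher:
    """Hypothetical client-side balancer over a list of receiver callables."""

    def __init__(self, receivers):
        self._receivers = list(receivers)
        # Rotating start index spreads batches evenly across receivers.
        self._start = itertools.cycle(range(len(self._receivers)))

    def send(self, batch):
        first = next(self._start)
        count = len(self._receivers)
        for offset in range(count):
            receiver = self._receivers[(first + offset) % count]
            try:
                return receiver(batch)
            except ConnectionError:
                continue  # failover: try the next receiver in the ring
        raise ConnectionError("no receiver reachable")


def make_receiver(name):
    # Stand-in for a healthy receiver node; returns its own name.
    def receive(batch):
        return name
    return receive


def down_receiver(batch):
    # Stand-in for a receiver node that is currently unreachable.
    raise ConnectionError("receiver down")


balancer = RoundRobinPublisher([make_receiver("a"), down_receiver, make_receiver("c")])
served_by = [balancer.send(["event"]) for _ in range(4)]
```

Because the client picks the receiver, no separate load balancer sits in front of the receiver cluster.
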
  13. Data Bridge
  14. Data Storage
      ● Apache Cassandra
      ● NoSQL column family implementation
      ● Scalable, HA and no SPOF.
      ● Very high write throughput and good read throughput
      ● Tunable consistency with data replication
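
"Tunable consistency" rests on a simple piece of arithmetic: with N replicas of each row, a read that contacts R replicas is guaranteed to overlap the most recent write acknowledged by W replicas whenever R + W > N. The Python sketch below works through that rule. The function names are hypothetical, and the replication factor of 3 used in the example is an assumption, not a figure from the slides.

```python
def quorum(replication_factor):
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor, write_replicas, read_replicas):
    """True when every read set must intersect every write set (R + W > N)."""
    return write_replicas + read_replicas > replication_factor


# Assumed replication factor, for illustration only.
replication_factor = 3
majority = quorum(replication_factor)

# Quorum writes plus quorum reads overlap on at least one replica.
quorum_both = reads_see_latest_write(replication_factor, majority, majority)

# Writing and reading a single replica is fastest but may return stale data.
one_and_one = reads_see_latest_write(replication_factor, 1, 1)
```

A write-heavy workload such as event capture can lower W to favor write throughput and accept a higher R, or weaker guarantees, on the read side.
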
  15. Deployment – Storage Cluster
  16. Receiver Cluster
  17. Results
      With a single receiver node allocated a 2 GB heap on a quad-core machine running RHEL.
  18. Disk Growth
  19. Analyzer Engine
      ● Idea: Distribute processing to multiple nodes to run in parallel
      ● Obvious choice: Hadoop
      ● Uses the Map Reduce programming paradigm
  20. Map Reduce
      ● Process multiple data chunks in parallel at Mappers.
      ● Aggregate map outputs having similar keys at Reducers and store the result.
      ● Let's think of a useful example...
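
The example used in the talk is not part of this transcript. As an illustrative stand-in, the Python sketch below counts requests per service in a single process: each chunk would go to a separate Mapper in a real cluster, and the grouping step plays the role of the shuffle that routes equal keys to one Reducer. The event shape (a dict with a `service` field) and all function names are assumptions.

```python
from collections import defaultdict


def mapper(event):
    # Emit one (key, value) pair per event: the service name and a count of 1.
    yield event["service"], 1


def reducer(key, values):
    # Aggregate all values that share a key.
    return key, sum(values)


def run_map_reduce(chunks):
    grouped = defaultdict(list)
    for chunk in chunks:                  # in Hadoop, one mapper per chunk
        for event in chunk:
            for key, value in mapper(event):
                grouped[key].append(value)  # "shuffle": group values by key
    return dict(reducer(key, values) for key, values in grouped.items())


# Hypothetical service events split into two chunks.
chunks = [
    [{"service": "orders"}, {"service": "billing"}],
    [{"service": "orders"}],
]
request_counts = run_map_reduce(chunks)
```

Because mappers look at one event at a time and reducers only see values for a single key, both phases can be spread across as many nodes as there are chunks and keys.
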
  21. Hadoop Components
      ● Job Tracker
      ● Name Node
      ● Secondary Name Node
      ● Task Trackers
      ● Data Nodes
  22. It's Cool, But...
      ● Do we need to have a Hadoop cluster in order to try out BAM?
      ● Are we supposed to code Hadoop jobs to get BAM to summarize something?
      ● Answers
          1) No
          2) No. OK, maybe very rarely at best.
      Image courtesy: http://goo.gl/QEnpN
  23. Apache Hive
      ● You write SQL. (Almost)
      ● Let Hive convert it to Map Reduce jobs.
      ● So Hive does two things
          ● Provide an abstraction for Hadoop Map Reduce
          ● Submit the analytic jobs to Hadoop
      ● Hive may spawn a Hadoop JVM locally or delegate to a Hadoop cluster
  24. A Typical Hive Script
  25. Results
  26. Task Framework
      ● Run Hive scripts periodically
      ● Schedules can be specified as cron expressions or predefined templates
      ● Handles task failover in case of node failure
      ● Uses Zookeeper for coordination
  27. Zookeeper
      ● Can be run separately or embedded within BAM
  28. Analyzer Cluster
  29. Dashboard
      ● Making the dashboard scale.
  30. Deployment Patterns: Single Node
  31. High Availability
  32. Fully Distributed Setup
  33. Summary
      ● BAM
      ● Need for scalability
      ● Scaling BAM components
      ● Results
      ● BAM deployment patterns
