SlideShare a Scribd company logo
1 of 33
Download to read offline
Scaling Up WSO2 BAM for Billions of
  Requests and Terabytes of Data
             Buddhika Chamith
      Software Engineer – WSO2 BAM
Business Activity Monitoring

“The aggregation, analysis, and
presentation of real-time information
about activities inside organizations
and involving customers and partners.”
- Gartner
Aggregation

●   Capturing data
●   Data storage
●   What data to
    capture?
Analysis
●   Data operations
●   Building KPIs
●   Operate on large
    amounts of historic
    data or new data
●   Building BI
Presentation

●   Visualizing KPIs/BI
●   Custom Dashboards
●   Visualization tools
●   Not just dashboards!
Need for Scalability
BAM 2.x - Component Architecture
Data Agents
●   Push data to BAM
●   Collecting
    ●   Service data
    ●   Mediation data
    ●   Logs etc.
●   Various interceptors used
    ●   Axis2 Handlers
    ●   Synapse Mediators
    ●   Tomcat Valves
    ●   Log4j Appenders
Performance Considerations

●   Should be asynchronous
●   Event batching
●   SOAP?
●   Apache Thrift (Binary protocol)
Apache Thrift

  ●   A RPC framework
  ●   With a pluggable architecture
      for mixing different transports
      with different protocols
  ●   Has multiple language
      bindings (Java, C++, Python,
      Perl, C# etc.)
  ●   We mainly use Java binding
Not Just Performance...

●   Load balancing
●   Failover
●   All available within a Java SDK libary.
●   You can use it too.
Data Receiver

●   Capture and transfer data to subscribed sinks.
●   Not just the database.
●   Can be clustered.
●   Load balancing is handled from client side.
Data Bridge
Data Storage
      ●   Apache Cassandra
      ●   NoSQL column family
          implementation
      ●   Scalable, HA and no
          SPOF.
      ●   Very high write
          throughput and good
          read throughput
      ●   Tunable consistency
          with data replication
Deployment – Storage Cluster
Reciever Cluster
Results
With a single receiver node allocated 2GB heap with quad core on
RHEL.
Disk Growth
Analyzer Engine
●   Idea : Distribute processing to multiple nodes to
    run in parallel
●   Obvious choice : Hadoop
●   Uses Map Reduce Programming paradigm
Map Reduce
      ●   Process multiple data
          chunks paralley at
          Mappers.
      ●   Aggregate map
          outputs having similar
          keys at Reducers and
          store the result.
      ●   Let's think of a useful
          example..
Hadoop Components
● Job Tracker
● Name node

● Secondary Name Node

● Task Trackers

● Data Nodes
It's Cool But ..
                                ●   Do we need to have a
                                    Hadoop cluster in order to
                                    try out BAM?
                                ●   Are we supposed to code
                                    Hadoop jobs to get
                                    BAM to summarize some
                                    thing?
                                ●   Answers
                                    1) No
Courtesy: http://goo.gl/QEnpN       2) No. Ok may be very
                                    rarely at best.
Apache Hive
●   You write SQL. (Almost)
●   Let Hive convert to Map Reduce jobs.
●   So Hive does two things
    ●   Provide an abstraction for Hadoop Map Reduce
    ●   Submit the analytic jobs to Hadoop
●   Hive may spawn a Hadoop JVM locally or
    delegate to a Hadoop Cluster
A Typical Hive Script
Results
Task Framework
●   Run Hive scripts periodically
●    Can specify as cron expressions/ predefined
    templates
●   Handles task failover in case of node faliure
●   Uses Zookeeper for coordination
Zookeeper




●   Can be run seperately or embedded within
    BAM
Analyzer Cluster
Dashboard
●   Making dashboard scale.
Deployment Patterns
Single Node
High Availability
Fully Distributed Setup
Summary
●   BAM
●   Need for scalability
●   Scaling BAM components
●   Results
●   BAM deployment patterns

More Related Content

What's hot

Spark streaming with apache kafka
Spark streaming with apache kafkaSpark streaming with apache kafka
Spark streaming with apache kafkapunesparkmeetup
 
#GeodeSummit - Redis to Geode Adaptor
#GeodeSummit - Redis to Geode Adaptor#GeodeSummit - Redis to Geode Adaptor
#GeodeSummit - Redis to Geode AdaptorPivotalOpenSourceHub
 
Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0Kai Sasaki
 
HBaseCon 2012 | Low Latency OLAP with HBase - Cosmin Lehene, Adobe
HBaseCon 2012 | Low Latency OLAP with HBase - Cosmin Lehene, AdobeHBaseCon 2012 | Low Latency OLAP with HBase - Cosmin Lehene, Adobe
HBaseCon 2012 | Low Latency OLAP with HBase - Cosmin Lehene, AdobeCloudera, Inc.
 
Event Streaming Architectures with Confluent and ScyllaDB
Event Streaming Architectures with Confluent and ScyllaDBEvent Streaming Architectures with Confluent and ScyllaDB
Event Streaming Architectures with Confluent and ScyllaDBScyllaDB
 
Netflix's Big Leap from Oracle to Cassandra
Netflix's Big Leap from Oracle to CassandraNetflix's Big Leap from Oracle to Cassandra
Netflix's Big Leap from Oracle to CassandraRoopa Tangirala
 
Building big data pipelines with Kafka and Kubernetes
Building big data pipelines with Kafka and KubernetesBuilding big data pipelines with Kafka and Kubernetes
Building big data pipelines with Kafka and KubernetesVenu Ryali
 
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...Yahoo Developer Network
 
MariaDB on Docker
MariaDB on DockerMariaDB on Docker
MariaDB on DockerMariaDB plc
 
MySQL in the Hosted Cloud
MySQL in the Hosted CloudMySQL in the Hosted Cloud
MySQL in the Hosted CloudColin Charles
 
Using Kafka to scale database replication
Using Kafka to scale database replicationUsing Kafka to scale database replication
Using Kafka to scale database replicationVenu Ryali
 
M|18 Migrating from Oracle and Handling PL/SQL Stored Procedures
M|18 Migrating from Oracle and Handling PL/SQL Stored ProceduresM|18 Migrating from Oracle and Handling PL/SQL Stored Procedures
M|18 Migrating from Oracle and Handling PL/SQL Stored ProceduresMariaDB plc
 
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetHBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetCloudera, Inc.
 
CosmosDB for DBAs & Developers
CosmosDB for DBAs & DevelopersCosmosDB for DBAs & Developers
CosmosDB for DBAs & DevelopersNiko Neugebauer
 
Columnstore improvements in SQL Server 2016
Columnstore improvements in SQL Server 2016Columnstore improvements in SQL Server 2016
Columnstore improvements in SQL Server 2016Niko Neugebauer
 
M|18 Analyzing Data with the MariaDB AX Platform
M|18 Analyzing Data with the MariaDB AX PlatformM|18 Analyzing Data with the MariaDB AX Platform
M|18 Analyzing Data with the MariaDB AX PlatformMariaDB plc
 
AWS Cloud SAA Relational Database presentation
AWS Cloud SAA Relational Database presentationAWS Cloud SAA Relational Database presentation
AWS Cloud SAA Relational Database presentationTATA LILIAN SHULIKA
 
2010 12 mysql_clusteroverview
2010 12 mysql_clusteroverview2010 12 mysql_clusteroverview
2010 12 mysql_clusteroverviewDimas Prasetyo
 
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...Chicago Hadoop Users Group
 

What's hot (20)

Spark streaming with apache kafka
Spark streaming with apache kafkaSpark streaming with apache kafka
Spark streaming with apache kafka
 
#GeodeSummit - Redis to Geode Adaptor
#GeodeSummit - Redis to Geode Adaptor#GeodeSummit - Redis to Geode Adaptor
#GeodeSummit - Redis to Geode Adaptor
 
Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0
 
HBaseCon 2012 | Low Latency OLAP with HBase - Cosmin Lehene, Adobe
HBaseCon 2012 | Low Latency OLAP with HBase - Cosmin Lehene, AdobeHBaseCon 2012 | Low Latency OLAP with HBase - Cosmin Lehene, Adobe
HBaseCon 2012 | Low Latency OLAP with HBase - Cosmin Lehene, Adobe
 
Event Streaming Architectures with Confluent and ScyllaDB
Event Streaming Architectures with Confluent and ScyllaDBEvent Streaming Architectures with Confluent and ScyllaDB
Event Streaming Architectures with Confluent and ScyllaDB
 
Netflix's Big Leap from Oracle to Cassandra
Netflix's Big Leap from Oracle to CassandraNetflix's Big Leap from Oracle to Cassandra
Netflix's Big Leap from Oracle to Cassandra
 
Building big data pipelines with Kafka and Kubernetes
Building big data pipelines with Kafka and KubernetesBuilding big data pipelines with Kafka and Kubernetes
Building big data pipelines with Kafka and Kubernetes
 
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
 
MariaDB on Docker
MariaDB on DockerMariaDB on Docker
MariaDB on Docker
 
MySQL in the Hosted Cloud
MySQL in the Hosted CloudMySQL in the Hosted Cloud
MySQL in the Hosted Cloud
 
Using Kafka to scale database replication
Using Kafka to scale database replicationUsing Kafka to scale database replication
Using Kafka to scale database replication
 
M|18 Migrating from Oracle and Handling PL/SQL Stored Procedures
M|18 Migrating from Oracle and Handling PL/SQL Stored ProceduresM|18 Migrating from Oracle and Handling PL/SQL Stored Procedures
M|18 Migrating from Oracle and Handling PL/SQL Stored Procedures
 
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetHBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
 
CosmosDB for DBAs & Developers
CosmosDB for DBAs & DevelopersCosmosDB for DBAs & Developers
CosmosDB for DBAs & Developers
 
Columnstore improvements in SQL Server 2016
Columnstore improvements in SQL Server 2016Columnstore improvements in SQL Server 2016
Columnstore improvements in SQL Server 2016
 
M|18 Analyzing Data with the MariaDB AX Platform
M|18 Analyzing Data with the MariaDB AX PlatformM|18 Analyzing Data with the MariaDB AX Platform
M|18 Analyzing Data with the MariaDB AX Platform
 
AWS Cloud SAA Relational Database presentation
AWS Cloud SAA Relational Database presentationAWS Cloud SAA Relational Database presentation
AWS Cloud SAA Relational Database presentation
 
2010 12 mysql_clusteroverview
2010 12 mysql_clusteroverview2010 12 mysql_clusteroverview
2010 12 mysql_clusteroverview
 
Sun Web Server Brief
Sun Web Server BriefSun Web Server Brief
Sun Web Server Brief
 
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
 

Similar to Scaling up wso2 bam for billions of requests and terabytes of data

BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...Big Data Montreal
 
MapReduce with Hadoop and Ruby
MapReduce with Hadoop and RubyMapReduce with Hadoop and Ruby
MapReduce with Hadoop and RubySwanand Pagnis
 
New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015Robbie Strickland
 
Apache spark on Hadoop Yarn Resource Manager
Apache spark on Hadoop Yarn Resource ManagerApache spark on Hadoop Yarn Resource Manager
Apache spark on Hadoop Yarn Resource Managerharidasnss
 
Balkan - data eng meetup - data fusion
Balkan - data eng meetup - data fusionBalkan - data eng meetup - data fusion
Balkan - data eng meetup - data fusionBalkan Misirli
 
Architecting and productionising data science applications at scale
Architecting and productionising data science applications at scaleArchitecting and productionising data science applications at scale
Architecting and productionising data science applications at scalesamthemonad
 
H2O on Hadoop Dec 12
H2O on Hadoop Dec 12 H2O on Hadoop Dec 12
H2O on Hadoop Dec 12 Sri Ambati
 
Introduction to AWS Big Data
Introduction to AWS Big Data Introduction to AWS Big Data
Introduction to AWS Big Data Omid Vahdaty
 
Web-scale data processing: practical approaches for low-latency and batch
Web-scale data processing: practical approaches for low-latency and batchWeb-scale data processing: practical approaches for low-latency and batch
Web-scale data processing: practical approaches for low-latency and batchEdward Capriolo
 
Tom Kraljevic presents H2O on Hadoop- how it works and what we've learned
Tom Kraljevic presents H2O on Hadoop- how it works and what we've learnedTom Kraljevic presents H2O on Hadoop- how it works and what we've learned
Tom Kraljevic presents H2O on Hadoop- how it works and what we've learnedSri Ambati
 
Apache bigtopwg7142013
Apache bigtopwg7142013Apache bigtopwg7142013
Apache bigtopwg7142013Doug Chang
 
Chicago Data Summit: Keynote - Data Processing with Hadoop: Scalable and Cost...
Chicago Data Summit: Keynote - Data Processing with Hadoop: Scalable and Cost...Chicago Data Summit: Keynote - Data Processing with Hadoop: Scalable and Cost...
Chicago Data Summit: Keynote - Data Processing with Hadoop: Scalable and Cost...Cloudera, Inc.
 
SCM Puppet: from an intro to the scaling
SCM Puppet: from an intro to the scalingSCM Puppet: from an intro to the scaling
SCM Puppet: from an intro to the scalingStanislav Osipov
 
Hadoop And Big Data - My Presentation To Selective Audience
Hadoop And Big Data - My Presentation To Selective AudienceHadoop And Big Data - My Presentation To Selective Audience
Hadoop And Big Data - My Presentation To Selective AudienceChandra Sekhar
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Hadoop Cluster on Docker Containers
Hadoop Cluster on Docker ContainersHadoop Cluster on Docker Containers
Hadoop Cluster on Docker Containerspranav_joshi
 
Apache Jena Elephas and Friends
Apache Jena Elephas and FriendsApache Jena Elephas and Friends
Apache Jena Elephas and FriendsRob Vesse
 
LAS16-305: Smart City Big Data Visualization on 96Boards
LAS16-305: Smart City Big Data Visualization on 96BoardsLAS16-305: Smart City Big Data Visualization on 96Boards
LAS16-305: Smart City Big Data Visualization on 96BoardsLinaro
 
Smart City Big Data Visualization on 96Boards - Linaro Connect Las Vegas 2016
Smart City Big Data Visualization on 96Boards - Linaro Connect Las Vegas 2016Smart City Big Data Visualization on 96Boards - Linaro Connect Las Vegas 2016
Smart City Big Data Visualization on 96Boards - Linaro Connect Las Vegas 2016Ganesh Raju
 
Cloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache HadoopCloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache HadoopCloudera, Inc.
 

Similar to Scaling up wso2 bam for billions of requests and terabytes of data (20)

BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
 
MapReduce with Hadoop and Ruby
MapReduce with Hadoop and RubyMapReduce with Hadoop and Ruby
MapReduce with Hadoop and Ruby
 
New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015
 
Apache spark on Hadoop Yarn Resource Manager
Apache spark on Hadoop Yarn Resource ManagerApache spark on Hadoop Yarn Resource Manager
Apache spark on Hadoop Yarn Resource Manager
 
Balkan - data eng meetup - data fusion
Balkan - data eng meetup - data fusionBalkan - data eng meetup - data fusion
Balkan - data eng meetup - data fusion
 
Architecting and productionising data science applications at scale
Architecting and productionising data science applications at scaleArchitecting and productionising data science applications at scale
Architecting and productionising data science applications at scale
 
H2O on Hadoop Dec 12
H2O on Hadoop Dec 12 H2O on Hadoop Dec 12
H2O on Hadoop Dec 12
 
Introduction to AWS Big Data
Introduction to AWS Big Data Introduction to AWS Big Data
Introduction to AWS Big Data
 
Web-scale data processing: practical approaches for low-latency and batch
Web-scale data processing: practical approaches for low-latency and batchWeb-scale data processing: practical approaches for low-latency and batch
Web-scale data processing: practical approaches for low-latency and batch
 
Tom Kraljevic presents H2O on Hadoop- how it works and what we've learned
Tom Kraljevic presents H2O on Hadoop- how it works and what we've learnedTom Kraljevic presents H2O on Hadoop- how it works and what we've learned
Tom Kraljevic presents H2O on Hadoop- how it works and what we've learned
 
Apache bigtopwg7142013
Apache bigtopwg7142013Apache bigtopwg7142013
Apache bigtopwg7142013
 
Chicago Data Summit: Keynote - Data Processing with Hadoop: Scalable and Cost...
Chicago Data Summit: Keynote - Data Processing with Hadoop: Scalable and Cost...Chicago Data Summit: Keynote - Data Processing with Hadoop: Scalable and Cost...
Chicago Data Summit: Keynote - Data Processing with Hadoop: Scalable and Cost...
 
SCM Puppet: from an intro to the scaling
SCM Puppet: from an intro to the scalingSCM Puppet: from an intro to the scaling
SCM Puppet: from an intro to the scaling
 
Hadoop And Big Data - My Presentation To Selective Audience
Hadoop And Big Data - My Presentation To Selective AudienceHadoop And Big Data - My Presentation To Selective Audience
Hadoop And Big Data - My Presentation To Selective Audience
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Hadoop Cluster on Docker Containers
Hadoop Cluster on Docker ContainersHadoop Cluster on Docker Containers
Hadoop Cluster on Docker Containers
 
Apache Jena Elephas and Friends
Apache Jena Elephas and FriendsApache Jena Elephas and Friends
Apache Jena Elephas and Friends
 
LAS16-305: Smart City Big Data Visualization on 96Boards
LAS16-305: Smart City Big Data Visualization on 96BoardsLAS16-305: Smart City Big Data Visualization on 96Boards
LAS16-305: Smart City Big Data Visualization on 96Boards
 
Smart City Big Data Visualization on 96Boards - Linaro Connect Las Vegas 2016
Smart City Big Data Visualization on 96Boards - Linaro Connect Las Vegas 2016Smart City Big Data Visualization on 96Boards - Linaro Connect Las Vegas 2016
Smart City Big Data Visualization on 96Boards - Linaro Connect Las Vegas 2016
 
Cloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache HadoopCloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache Hadoop
 

More from WSO2

Driving Innovation: Scania's API Revolution with WSO2
Driving Innovation: Scania's API Revolution with WSO2Driving Innovation: Scania's API Revolution with WSO2
Driving Innovation: Scania's API Revolution with WSO2WSO2
 
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformLess Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformWSO2
 
Modernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaModernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaWSO2
 
WSO2CON 2024 - Unlocking the Identity: Embracing CIAM 2.0 for a Competitive A...
WSO2CON 2024 - Unlocking the Identity: Embracing CIAM 2.0 for a Competitive A...WSO2CON 2024 - Unlocking the Identity: Embracing CIAM 2.0 for a Competitive A...
WSO2CON 2024 - Unlocking the Identity: Embracing CIAM 2.0 for a Competitive A...WSO2
 
WSO2CON 2024 Slides - Unlocking Value with AI
WSO2CON 2024 Slides - Unlocking Value with AIWSO2CON 2024 Slides - Unlocking Value with AI
WSO2CON 2024 Slides - Unlocking Value with AIWSO2
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
Quantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingQuantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingWSO2
 
WSO2CON 2024 - Elevating the Integration Game to the Cloud
WSO2CON 2024 - Elevating the Integration Game to the CloudWSO2CON 2024 - Elevating the Integration Game to the Cloud
WSO2CON 2024 - Elevating the Integration Game to the CloudWSO2
 
WSO2CON 2024 - OSU & WSO2: A Decade Journey in Integration & Innovation
WSO2CON 2024 - OSU & WSO2: A Decade Journey in Integration & InnovationWSO2CON 2024 - OSU & WSO2: A Decade Journey in Integration & Innovation
WSO2CON 2024 - OSU & WSO2: A Decade Journey in Integration & InnovationWSO2
 
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2
 
WSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaSWSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaSWSO2
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2
 
WSO2CON 2024 - IoT Needs CIAM: The Importance of Centralized IAM in a Growing...
WSO2CON 2024 - IoT Needs CIAM: The Importance of Centralized IAM in a Growing...WSO2CON 2024 - IoT Needs CIAM: The Importance of Centralized IAM in a Growing...
WSO2CON 2024 - IoT Needs CIAM: The Importance of Centralized IAM in a Growing...WSO2
 
WSO2CON 2024 - Architecting AI in the Enterprise: APIs and Applications
WSO2CON 2024 - Architecting AI in the Enterprise: APIs and ApplicationsWSO2CON 2024 - Architecting AI in the Enterprise: APIs and Applications
WSO2CON 2024 - Architecting AI in the Enterprise: APIs and ApplicationsWSO2
 
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2
 
WSO2CON 2024 - Software Engineering for Digital Businesses
WSO2CON 2024 - Software Engineering for Digital BusinessesWSO2CON 2024 - Software Engineering for Digital Businesses
WSO2CON 2024 - Software Engineering for Digital BusinessesWSO2
 
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2
 
WSO2CON 2024 - Designing Event-Driven Enterprises: Stories of Transformation
WSO2CON 2024 - Designing Event-Driven Enterprises: Stories of TransformationWSO2CON 2024 - Designing Event-Driven Enterprises: Stories of Transformation
WSO2CON 2024 - Designing Event-Driven Enterprises: Stories of TransformationWSO2
 
WSO2CON 2024 - Not Just Microservices: Rightsize Your Services!
WSO2CON 2024 - Not Just Microservices: Rightsize Your Services!WSO2CON 2024 - Not Just Microservices: Rightsize Your Services!
WSO2CON 2024 - Not Just Microservices: Rightsize Your Services!WSO2
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2
 

More from WSO2 (20)

Driving Innovation: Scania's API Revolution with WSO2
Driving Innovation: Scania's API Revolution with WSO2Driving Innovation: Scania's API Revolution with WSO2
Driving Innovation: Scania's API Revolution with WSO2
 
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformLess Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
 
Modernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaModernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using Ballerina
 
WSO2CON 2024 - Unlocking the Identity: Embracing CIAM 2.0 for a Competitive A...
WSO2CON 2024 - Unlocking the Identity: Embracing CIAM 2.0 for a Competitive A...WSO2CON 2024 - Unlocking the Identity: Embracing CIAM 2.0 for a Competitive A...
WSO2CON 2024 - Unlocking the Identity: Embracing CIAM 2.0 for a Competitive A...
 
WSO2CON 2024 Slides - Unlocking Value with AI
WSO2CON 2024 Slides - Unlocking Value with AIWSO2CON 2024 Slides - Unlocking Value with AI
WSO2CON 2024 Slides - Unlocking Value with AI
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Quantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingQuantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation Computing
 
WSO2CON 2024 - Elevating the Integration Game to the Cloud
WSO2CON 2024 - Elevating the Integration Game to the CloudWSO2CON 2024 - Elevating the Integration Game to the Cloud
WSO2CON 2024 - Elevating the Integration Game to the Cloud
 
WSO2CON 2024 - OSU & WSO2: A Decade Journey in Integration & Innovation
WSO2CON 2024 - OSU & WSO2: A Decade Journey in Integration & InnovationWSO2CON 2024 - OSU & WSO2: A Decade Journey in Integration & Innovation
WSO2CON 2024 - OSU & WSO2: A Decade Journey in Integration & Innovation
 
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
 
WSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaSWSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaS
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
WSO2CON 2024 - IoT Needs CIAM: The Importance of Centralized IAM in a Growing...
WSO2CON 2024 - IoT Needs CIAM: The Importance of Centralized IAM in a Growing...WSO2CON 2024 - IoT Needs CIAM: The Importance of Centralized IAM in a Growing...
WSO2CON 2024 - IoT Needs CIAM: The Importance of Centralized IAM in a Growing...
 
WSO2CON 2024 - Architecting AI in the Enterprise: APIs and Applications
WSO2CON 2024 - Architecting AI in the Enterprise: APIs and ApplicationsWSO2CON 2024 - Architecting AI in the Enterprise: APIs and Applications
WSO2CON 2024 - Architecting AI in the Enterprise: APIs and Applications
 
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
 
WSO2CON 2024 - Software Engineering for Digital Businesses
WSO2CON 2024 - Software Engineering for Digital BusinessesWSO2CON 2024 - Software Engineering for Digital Businesses
WSO2CON 2024 - Software Engineering for Digital Businesses
 
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
 
WSO2CON 2024 - Designing Event-Driven Enterprises: Stories of Transformation
WSO2CON 2024 - Designing Event-Driven Enterprises: Stories of TransformationWSO2CON 2024 - Designing Event-Driven Enterprises: Stories of Transformation
WSO2CON 2024 - Designing Event-Driven Enterprises: Stories of Transformation
 
WSO2CON 2024 - Not Just Microservices: Rightsize Your Services!
WSO2CON 2024 - Not Just Microservices: Rightsize Your Services!WSO2CON 2024 - Not Just Microservices: Rightsize Your Services!
WSO2CON 2024 - Not Just Microservices: Rightsize Your Services!
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 

Scaling up wso2 bam for billions of requests and terabytes of data

  • 1. Scaling Up WSO2 BAM for Billions of Requests and Terabytes of Data Buddhika Chamith Software Engineer – WSO2 BAM
  • 2. Business Activity Monitoring “The aggregation, analysis, and presentation of real-time information about activities inside organizations and involving customers and partners.” - Gartner
  • 3. Aggregation ● Capturing data ● Data storage ● What data to capture?
  • 4. Analysis ● Data operations ● Building KPIs ● Operate on large amounts of historic data or new data ● Building BI
  • 5. Presentation ● Visualizing KPIs/BI ● Custom Dashboards ● Visualization tools ● Not just dashboards!
  • 7. BAM 2.x - Component Architecture
  • 8. Data Agents ● Push data to BAM ● Collecting ● Service data ● Mediation data ● Logs etc. ● Various interceptors used ● Axis2 Handlers ● Synapse Mediators ● Tomcat Valves ● Log4j Appenders
  • 9. Performance Considerations ● Should be asynchronous ● Event batching ● SOAP? ● Apache Thrift (Binary protocol)
  • 10. Apache Thrift ● A RPC framework ● With a pluggable architecture for mixing different transports with different protocols ● Has multiple language bindings (Java, C++, Python, Perl, C# etc.) ● We mainly use Java binding
  • 11. Not Just Performance... ● Load balancing ● Failover ● All available within a Java SDK libary. ● You can use it too.
  • 12. Data Receiver ● Capture and transfer data to subscribed sinks. ● Not just the database. ● Can be clustered. ● Load balancing is handled from client side.
  • 14. Data Storage ● Apache Cassandra ● NoSQL column family implementation ● Scalable, HA and no SPOF. ● Very high write throughput and good read throughput ● Tunable consistency with data replication
  • 17. Results With a single receiver node allocated 2GB heap with quad core on RHEL.
  • 19. Analyzer Engine ● Idea : Distribute processing to multiple nodes to run in parallel ● Obvious choice : Hadoop ● Uses Map Reduce Programming paradigm
  • 20. Map Reduce ● Process multiple data chunks paralley at Mappers. ● Aggregate map outputs having similar keys at Reducers and store the result. ● Let's think of a useful example..
  • 21. Hadoop Components ● Job Tracker ● Name node ● Secondary Name Node ● Task Trackers ● Data Nodes
  • 22. It's Cool But .. ● Do we need to have a Hadoop cluster in order to try out BAM? ● Are we supposed to code Hadoop jobs to get BAM to summarize some thing? ● Answers 1) No Courtesy: http://goo.gl/QEnpN 2) No. Ok may be very rarely at best.
  • 23. Apache Hive ● You write SQL. (Almost) ● Let Hive convert to Map Reduce jobs. ● So Hive does two things ● Provide an abstraction for Hadoop Map Reduce ● Submit the analytic jobs to Hadoop ● Hive may spawn a Hadoop JVM locally or delegate to a Hadoop Cluster
  • 24. A Typical Hive Script
  • 26. Task Framework ● Run Hive scripts periodically ● Can specify as cron expressions/ predefined templates ● Handles task failover in case of node faliure ● Uses Zookeeper for coordination
  • 27. Zookeeper ● Can be run seperately or embedded within BAM
  • 29. Dashboard ● Making dashboard scale.
  • 33. Summary ● BAM ● Need for scalability ● Scaling BAM components ● Results ● BAM deployment patterns