Introduction to Large Scale Data Analysis and WSO2 Analytics Platform
Srinath Perera
Director Research WSO2, Apache Member
(@srinath_perera)
srinath@wso2.com
At Indiana University Bloomington
Who Are We?
We are an open source middleware company
- We build systems upon which others build their systems
Venture funded: Intel Capital, Cisco, Toba Capital
400+ people, with offices in Silicon Valley, Sri Lanka, London and Bloomington
Customers include banks, aircraft manufacturers, governments (state and federal), media companies, telcos, retail, healthcare ...
Outline
Introduction to Big Data
The Problem we are trying to solve
WSO2 Big Data Platform
Next steps
A Day in Your Life
Think about a day in your life:
- What is the best road to take?
- Will there be any bad weather?
- How should I invest my money?
- How is my health?
There are many decisions you could make better if only you could access the relevant data and process it.
http://www.flickr.com/photos/kcolwell/55124616
CC licence
Internet of Things
Currently the physical and software worlds are detached
The Internet of Things promises to bridge this
- It is about sensors and actuators everywhere
- In your fridge, in your blanket, in your chair, in your carpet... yes, even in your socks
- Umbrellas that light up when there is rain, and medicine cups
What Can We Do with Big Data?
Optimize (the world is inefficient)
- 30% of food is wasted from farm to plate
- GE Save 1% initiative (http://goo.gl/eYC0QE)
- Trains => 2B/year
- US healthcare => 20B/year
Save lives
- Weather, disease identification, personalized treatment
Technology advancement
- Most high-tech research is done via simulations
Big Data Architecture
Big Data Processing Technologies Landscape
(Batch) Analytics
Scientists have been doing this for 25 years with MPI (1991) on special hardware
- Open MPI is being developed at IU!
It took off with Google's MapReduce paper (2004), after which Apache Hadoop, Hive and a whole ecosystem were created.
It was successful, so here we are!
But processing takes time.
Use case: Targeted Advertising
Analytics implemented with MapReduce or queries
- Min, max, average, correlation, histograms; may join or group data in many ways
- Heatmaps, temporal trends
Key performance indicators (KPIs)
- E.g. profit per square foot for retail
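The batch aggregates above (min, max, average, histograms, grouped by some key) map naturally onto a map, group (shuffle), reduce sequence. The sketch below runs that sequence in plain Python on a small in-memory list so the data flow is visible; the (region, amount) records and the bucket width of 10 are made-up assumptions, and a real deployment would run the same steps distributed on Hadoop or Spark.

```python
from collections import defaultdict

def map_phase(records):
    """Emit (key, value) pairs; here each record is already (region, amount)."""
    for region, amount in records:
        yield region, amount

def shuffle(pairs):
    """Group all values by key, as the MapReduce framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, bucket_width=10):
    """Compute min, max, average and a fixed-width histogram for each key."""
    results = {}
    for key, values in groups.items():
        histogram = defaultdict(int)
        for v in values:
            # Bucket label is the lower bound of the bucket, e.g. 12.0 -> 10
            histogram[int(v // bucket_width) * bucket_width] += 1
        results[key] = {
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
            "histogram": dict(histogram),
        }
    return results

if __name__ == "__main__":
    # Hypothetical sales records: (store region, sale amount)
    sales = [("north", 12.0), ("north", 30.0), ("south", 5.0), ("north", 18.0)]
    print(reduce_phase(shuffle(map_phase(sales)))["north"])
```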
Use case: Big Data for Development
Done using CDR (call detail record) data
People density at noon vs. midnight
(red => increased, blue => decreased)
Urban Planning
- People distribution
- Mobility
- Waste Management
- E.g. see http://goo.gl/jPujmM
From: http://lirneasia.net/2014/08/what-does-big-data-say-about-sri-lanka/
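The noon vs. midnight map essentially counts the distinct subscribers seen at each cell tower in two time slots and compares the counts. A minimal sketch, assuming a hypothetical simplified CDR record of (subscriber id, tower id, hour of day); real CDRs carry many more fields and the study behind the map is not reproduced here.

```python
from collections import defaultdict

def density_by_tower(cdrs, hour):
    """Count distinct subscribers seen at each tower during the given hour."""
    seen = defaultdict(set)
    for subscriber, tower, h in cdrs:
        if h == hour:
            seen[tower].add(subscriber)
    return {tower: len(subs) for tower, subs in seen.items()}

def density_change(cdrs, day_hour=12, night_hour=0):
    """Label each tower 'increased' (red) or 'decreased' (blue), noon vs. midnight."""
    day = density_by_tower(cdrs, day_hour)
    night = density_by_tower(cdrs, night_hour)
    changes = {}
    for tower in set(day) | set(night):
        diff = day.get(tower, 0) - night.get(tower, 0)
        if diff > 0:
            changes[tower] = "increased"
        elif diff < 0:
            changes[tower] = "decreased"
        else:
            changes[tower] = "unchanged"
    return changes
```

With records where people sleep in the suburb and work in the city, the city tower comes out "increased" and the suburb tower "decreased".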
The Value of Some Insights Degrades Fast!
For some use cases (e.g. stock markets, traffic, surveillance, patient monitoring) the value of insights degrades very quickly with time.
- E.g. in stock markets, even the speed of light matters
We need technology that can produce outputs fast
- Static queries that need very fast output (alerts, realtime control)
- Dynamic and interactive queries (data exploration)
Predictive Analytics
If we know how to solve a problem, that is, if we know a finite set of rules, then we can program it.
For some problems (e.g. driving a car, character recognition), we do not know a finite, fixed rule set.
Instead of programming, we give the computer many examples and ask it to learn (often called machine learning)
Many tools
- R (statistical language)
- scikit-learn (Python)
- Apache Spark's MLBase and Apache Mahout (Java)
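To make "learning from examples instead of rules" concrete, here is a tiny 1-nearest-neighbor classifier in plain Python: it encodes no rule at all, it only stores labeled examples and labels a new input like the closest stored one. This is an illustrative stand-in chosen for brevity, not how R, scikit-learn, MLBase or Mahout work internally; the 2-D feature vectors and labels are made up.

```python
import math

def train(examples):
    """'Training' a 1-nearest-neighbor model is just remembering labeled examples.

    examples: iterable of (feature_vector, label) pairs.
    """
    return list(examples)

def predict(model, point):
    """Return the label of the stored example closest to `point` (Euclidean distance)."""
    _, label = min(model, key=lambda example: math.dist(example[0], point))
    return label
```

Adding more examples changes the behavior without changing any code, which is the point the slide makes.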
Use case: Predictive Maintenance
The idea is to fix a problem before it happens, avoiding expensive downtime
- Airplanes, turbines, windmills
- Construction equipment
- Cars, golf carts
How
- Build a model of normal operation and measure deviations from it
- Match against known error patterns
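One simple way to realize "build a model of normal operation and measure deviations" is to summarize healthy readings by their mean and standard deviation and flag readings that are too many standard deviations away (a z-score test). The threshold of 3 and the vibration readings are assumptions for illustration; production systems would typically use richer, multi-sensor models.

```python
import statistics

def fit_normal_model(readings):
    """Model 'normal operation' as the mean and sample standard deviation of healthy readings."""
    return statistics.mean(readings), statistics.stdev(readings)

def deviates(model, reading, threshold=3.0):
    """Flag a reading whose z-score (distance from the mean in std devs) exceeds threshold."""
    mean, stdev = model
    return abs(reading - mean) / stdev > threshold

if __name__ == "__main__":
    # Hypothetical vibration readings from a turbine during normal operation
    model = fit_normal_model([70.0, 71.0, 69.0, 70.0, 70.0])
    print(deviates(model, 75.0), deviates(model, 70.5))
```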
The Problem We Are Trying to Solve!
Build a platform on which others can build their analytics systems
- Collect, analyze, communicate
- End to end: starts from humans and ends with humans
Different audiences
- Technical (developers)
- Non-technical (CXOs, sales, analysts)
"There are two things you need to know about business: make something users love and make more than you spend."
-- Paul Graham (Lisp, Y Combinator)
Running Example
Monitor temperature and hot airflow across multiple buildings (e.g. central AC)
- More people => hotter
Analytics
- Historical behavior of temperature by the hour
- Alerts if temperature drops too low or rises too high
- Modeling and predicting temperature to adjust proactively
define stream TemperatureStream (ts long, buildingNo long, t double);
define stream AirflowStream (ts long, buildingNo long, aflow double, aT double);
Collect Data
One sensor API to publish events
- REST, Thrift, Java, JMS, Kafka
- Java clients, JavaScript clients*
First you define streams (think of a stream as an infinite table in a SQL database)
Then you send events via the API
* Challenges (performance, guaranteed delivery, scale)
Events can be sent to the batch pipeline, the realtime pipeline, or both via configuration!
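The "define a stream, then send events, routed by configuration" flow can be sketched in a few lines. Everything here is hypothetical (the `StreamDefinition` and `Publisher` classes are illustrations, not the WSO2 client API): the stream definition acts like a table schema, each event is checked against it, and two configuration flags decide whether the event goes to the batch pipeline, the realtime pipeline, or both.

```python
from dataclasses import dataclass, field

@dataclass
class StreamDefinition:
    """Schema of a stream: like a SQL table definition, but rows keep arriving."""
    name: str
    attributes: dict  # attribute name -> expected Python type

@dataclass
class Publisher:
    """Hypothetical publisher that routes events to batch and/or realtime pipelines."""
    send_to_batch: bool = True
    send_to_realtime: bool = True
    batch_store: list = field(default_factory=list)
    realtime_queue: list = field(default_factory=list)

    def publish(self, definition, event):
        # Reject events whose attributes do not match the stream definition
        if set(event) != set(definition.attributes):
            raise ValueError(f"event does not match stream {definition.name}")
        for attr, attr_type in definition.attributes.items():
            if not isinstance(event[attr], attr_type):
                raise TypeError(f"{attr} should be {attr_type.__name__}")
        # Routing is a configuration choice, not a code change
        if self.send_to_batch:
            self.batch_store.append((definition.name, event))
        if self.send_to_realtime:
            self.realtime_queue.append((definition.name, event))
```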
Collecting Data: Example
Java example: create and send events
Events are sent asynchronously
See the client at http://goo.gl/vIJzqc for more info
// Initialize the agent and the asynchronous publisher
Agent agent = new Agent(agentConfiguration);
AsyncDataPublisher publisher = new AsyncDataPublisher("tcp://hostname:7612", .. );
// Define the stream and register it with the publisher
StreamDefinition definition = new StreamDefinition(STREAM_NAME, VERSION);
definition.addPayloadData("sid", STRING);
...
publisher.addStreamDefinition(definition);
...
// Send events
Event event = new Event();
event.setPayloadData(eventData);
publisher.publish(STREAM_NAME, VERSION, event);
Batch Analytics: Spark
Two frameworks: Hadoop (http://hadoop.apache.org ) and
Spark (https://spark.apache.org )
- Hadoop is a MapReduce implementation
Spark is faster (30X and ) and much more flexible.
Spark set a record in the Gray Sort benchmark (100 TB): 3X faster with 10X fewer machines, http://goo.gl/r5LGvD
For Hadoop and MapReduce resources, Google it.
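lines = sc.textFile("hdfs://...")  # sc is a SparkContext
hourly = (lines.flatMap(tsToHourFunction)  # tsToHourFunction (not shown) emits (hour, temp) pairs
          .reduceByKey(lambda a, b: a + b))  # sums the temperatures for each hour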
SQL-like Queries: Hive
Apache Hive provides a SQL-like data processing language
Since many people understand SQL, Hive made large-scale (Big Data) processing accessible to many
Expressive, short, and sweet.
Defines core operations that cover 90% of problems
Lets experts dig in when they like! (via user-defined functions)
Hourly Temperature Average
Hive compiles the SQL-like query into a set of MapReduce jobs that run on Hadoop, or on Spark (in WSO2 BAM from the 2015 Q2 release)
-- getHour() is a user-defined function that extracts the hour from ts
insert overwrite table TemperatureHistory
select getHour(ts) as hour, avg(t) as avgT, buildingNo
from TemperatureStream group by buildingNo, getHour(ts);
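Whether it runs as Spark transformations or as a Hive query, the per-group arithmetic is the same: keep a running sum and count per (building, hour) key, then divide. A single-machine sketch of that GROUP BY, assuming `ts` is epoch seconds and that `getHour` means hour of day in UTC (the slides do not define either).

```python
from collections import defaultdict
from datetime import datetime, timezone

def get_hour(ts_seconds):
    """Assumed meaning of the getHour(ts) UDF: hour of day (UTC) of an epoch-seconds timestamp."""
    return datetime.fromtimestamp(ts_seconds, tz=timezone.utc).hour

def hourly_average(events):
    """Average t per (buildingNo, hour), like the query's GROUP BY.

    events: iterable of (ts, buildingNo, t) tuples from TemperatureStream.
    """
    sums = defaultdict(lambda: [0.0, 0])  # (buildingNo, hour) -> [running sum of t, count]
    for ts, building_no, t in events:
        acc = sums[(building_no, get_hour(ts))]
        acc[0] += t
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}
```

Keeping (sum, count) rather than a running average is also what makes the computation easy to distribute: partial sums and counts from different machines can simply be added.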
Complex Event Processing
Operators: Filters
Assume a temperature stream
Here weather:convertFtoC() is a user-defined function. Such functions are used to extend the language.
define stream TemperatureStream (ts long, roomNo int, temp double);
from TemperatureStream[weather:convertFtoC(temp) > 30.0
    and roomNo != 2043]
select roomNo, temp
insert into HotRoomsStream;
Use cases:
- Alerts, thresholds (e.g. alarm on high temperature)
- Preprocessing: filtering, transformations (e.g. data cleanup)
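What the filter query does to each event can be written as a Python generator. This is only an illustration of the semantics, not how the CEP engine executes queries; `convert_f_to_c` stands in for the `weather:convertFtoC()` UDF and assumes the input is in Fahrenheit.

```python
def convert_f_to_c(fahrenheit):
    """Stand-in for the weather:convertFtoC() user-defined function."""
    return (fahrenheit - 32.0) * 5.0 / 9.0

def hot_rooms(events, threshold_c=30.0, excluded_room=2043):
    """Keep events hotter than threshold_c (in Celsius), skip one room, project roomNo and temp."""
    for event in events:
        if convert_f_to_c(event["temp"]) > threshold_c and event["roomNo"] != excluded_room:
            yield {"roomNo": event["roomNo"], "temp": event["temp"]}
```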
Operators: Windows and Aggregation
Supports many window types
- Batch windows, sliding windows, custom windows
Use cases
- Simple counting (e.g. failure count)
- Counting with windows (e.g. failure count every hour)
from TemperatureStream#window.time(1 min)
select roomNo, avg(temp) as avgTemp
group by roomNo
insert into HotRoomsStream;
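A sliding time window keeps only the events from the last minute and recomputes the aggregate as events arrive and expire. A minimal sketch, assuming timestamps in seconds that arrive in order; unlike a real CEP engine it only emits on arrival (not on expiry) and rescans the window instead of maintaining running sums.

```python
from collections import deque

class SlidingTimeWindowAverage:
    """Sketch of `#window.time(1 min)` with `avg(temp) ... group by roomNo`."""

    def __init__(self, window_seconds=60):
        self.window_seconds = window_seconds
        self.events = deque()  # (ts, roomNo, temp) in arrival order

    def on_event(self, ts, room_no, temp):
        self.events.append((ts, room_no, temp))
        # Expire events that have fallen out of the time window
        while self.events and self.events[0][0] <= ts - self.window_seconds:
            self.events.popleft()
        # Average over the current window, for this event's room only
        temps = [t for _, r, t in self.events if r == room_no]
        return room_no, sum(temps) / len(temps)
```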
Operators: Patterns
Models a "followed by" relation, e.g. event A followed by event B
A very powerful tool for tracking and detecting patterns
from every (a1 = TemperatureStream)
    -> a2 = TemperatureStream[temp > a1.temp + 5]
    within 1 day
select a2.ts as ts, a2.temp - a1.temp as diff
insert into HotDayAlertStream;
Use cases
- Detecting event sequence patterns
- Tracking
- Detecting trends
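The "followed by" pattern can be evaluated by keeping a list of partial matches: `every` makes each incoming event start a new candidate `a1`, and each candidate completes with the first later event that is more than 5 degrees hotter, or is dropped once its one-day deadline passes. A simplified sketch, assuming timestamps in seconds that arrive in order.

```python
def hot_day_alerts(events, within_seconds=24 * 3600, rise=5.0):
    """Sketch of: every a1 -> a2[temp > a1.temp + rise] within 1 day.

    events: (ts, temp) tuples in timestamp order.
    Returns (a2.ts, a2.temp - a1.temp) for each completed match.
    """
    pending = []  # a1 events still waiting for a matching a2
    alerts = []
    for ts, temp in events:
        still_waiting = []
        for a1_ts, a1_temp in pending:
            if ts - a1_ts > within_seconds:
                continue  # the 'within 1 day' deadline has passed; drop this a1
            if temp > a1_temp + rise:
                alerts.append((ts, temp - a1_temp))  # match completed
            else:
                still_waiting.append((a1_ts, a1_temp))
        # 'every': each incoming event also starts a new pattern instance
        pending = still_waiting + [(ts, temp)]
    return alerts
```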
Operators: Joins
Join two data streams based on a condition and windows
Use cases
- Data correlation, detecting missing events, detecting erroneous data
- Joining event streams
from TemperatureStream[temp > 30.0]#window.time(1 min) as T
    join RegulatorStream[isOn == false]#window.length(1) as R
    on T.roomNo == R.roomNo
select T.roomNo, R.deviceID, 'start' as action
insert into RegulatorActionStream;
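A windowed stream join keeps one window per side; an event arriving on either side is matched against the current contents of the other side's window. The sketch below mirrors the query above under simplifying assumptions (timestamps in seconds, in order; hypothetical method names), and is not the engine's actual implementation.

```python
from collections import deque

class HotRoomRegulatorJoin:
    """Sketch of a 1-minute window of hot temperature events (T) joined with a
    length-1 window of switched-off regulator events (R) on roomNo."""

    def __init__(self, window_seconds=60):
        self.window_seconds = window_seconds
        self.hot_events = deque()       # T window: (ts, roomNo) of events with temp > 30.0
        self.last_off_regulator = None  # R window (length 1): (roomNo, deviceID)

    def _expire(self, now):
        # Drop hot-temperature events older than the time window
        while self.hot_events and self.hot_events[0][0] <= now - self.window_seconds:
            self.hot_events.popleft()

    def on_temperature(self, ts, room_no, temp):
        self._expire(ts)
        if temp <= 30.0:
            return []  # filtered out before reaching the window
        self.hot_events.append((ts, room_no))
        # A new T event joins with whatever is in the R window
        if self.last_off_regulator and self.last_off_regulator[0] == room_no:
            return [(room_no, self.last_off_regulator[1], "start")]
        return []

    def on_regulator(self, ts, room_no, device_id, is_on):
        self._expire(ts)
        if is_on:
            return []  # filtered out: only switched-off regulators enter the window
        self.last_off_regulator = (room_no, device_id)
        # A new R event joins with every matching event in the T window
        return [(room_no, device_id, "start") for _, r in self.hot_events if r == room_no]
```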
Operators: Access Data from the Disk
Event tables allow users to map a database table to a window and join a data stream with that window
Use cases
- Merge with data in a database, collect, update data conditionally
define table HistTempTable (day long, avgT double);
from TemperatureStream#window.length(1) join HistTempTable
    on getDayOfYear(ts) == HistTempTable.day and temp > HistTempTable.avgT
select ts, temp
insert into PurchaseUserStream;
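Joining a stream with an event table amounts to a lookup per event: find the table row for the event's day and keep the event if it is hotter than that day's historical average. In this sketch a dict stands in for the database-backed `HistTempTable`, and `getDayOfYear` is assumed to mean the UTC day of year of an epoch-seconds timestamp.

```python
from datetime import datetime, timezone

def get_day_of_year(ts_seconds):
    """Assumed meaning of getDayOfYear(ts): day of year (1-366, UTC) of an epoch-seconds timestamp."""
    return datetime.fromtimestamp(ts_seconds, tz=timezone.utc).timetuple().tm_yday

def hotter_than_history(events, hist_temp_table):
    """Yield (ts, temp) for events hotter than the historical average for their day.

    hist_temp_table: dict mapping day of year -> avgT (stand-in for the event table).
    """
    for ts, temp in events:
        avg_t = hist_temp_table.get(get_day_of_year(ts))
        if avg_t is not None and temp > avg_t:
            yield ts, temp
```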
Realtime Analytics Patterns
Simple counting (e.g. failure count)
Counting with windows (e.g. failure count every hour)
Preprocessing: filtering, transformations (e.g. data cleanup)
Alerts, thresholds (e.g. alarm on high temperature)
Data Correlation, Detect missing events, detecting erroneous data
(e.g. detecting failed sensors)
Joining event streams (e.g. detect a hit on soccer ball)
Merge with data in a database, collect, update data conditionally
Realtime Analytics Patterns (contd.)
Detecting Event Sequence Patterns (e.g. small transaction followed
by large transaction)
Tracking: follow some related entity's state in space, time, etc. (e.g. location of airline baggage, vehicles, tracking wildlife)
Detecting trends: rise, turn, fall, outliers, complex trends like triple bottom, etc. (e.g. algorithmic trading, SLA, load balancing)
Learning a Model (e.g. Predictive maintenance)
Predicting next value and corrective actions (e.g. automated car)
Predictive Analytics
Build models and use them with WSO2 CEP, BAM and ESB using the upcoming WSO2 Machine Learner product (2015 Q2)
Build models using R, export them as PMML, and use them within WSO2 CEP
Call R scripts from CEP queries
Regression and anomaly detection operators in CEP
Predictive Analytics
WSO2 Machine Learner provides a wizard to explore data and build models
E.g. build a model to predict the temperature for the next 15 minutes
- Trivial option: (historical mean + last 15 min mean)/2
- Better model: ARIMA, from time series analysis
To learn more, take an ML class
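The trivial forecast above is just the average of two means. A sketch, assuming "historical mean" refers to readings for the same time slot on previous days (the slide does not say which history is used).

```python
def trivial_forecast(history, recent):
    """Next-15-minute forecast: (historical mean + last-15-minute mean) / 2.

    history: temperatures for this time slot on previous days (assumed meaning)
    recent: temperatures observed in the last 15 minutes
    """
    historical_mean = sum(history) / len(history)
    recent_mean = sum(recent) / len(recent)
    return (historical_mean + recent_mean) / 2
```

For example, a historical mean of 22.0 and a recent mean of 27.0 give a forecast of 24.5. It is cheap to compute, which makes it a useful baseline to beat with a model such as ARIMA.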
Communicate: Dashboards
The idea is to give the overall idea at a glance (e.g. a car dashboard)
Supports personalization: you can build your own dashboard
Also the entry point for drill-down
How to build?
- Dashboard via Google Gadgets, with content via HTML5 + JavaScript
- Use WSO2 User Engagement Server to build a dashboard (or JSP or PHP)
- Use charting libraries like Vega or D3
Communicate: Alerts
Conditions can be detected via CEP queries
The key is the "last mile"
- Email
- SMS
- Push notifications to a UI
- Pager
- Trigger a physical alarm
How?
- Select the email sender "Output Adaptor" in CEP, or send events from CEP to the ESB, which has many connectors
Communicate: APIs
With mobile apps, most data is exposed and shared with end users as APIs (REST/JSON).
Some challenges:
- Security and permissions
- API discovery
- Billing, throttling, quotas
- SLA enforcement
How?
- Write data to a database from CEP event tables
- Build services via WSO2 Data Service
- Expose them as APIs via API Manager
Smart Home
2015 yearly DEBS (Distributed Event Based Systems)
DEBS Grand Challenge (http://goo.gl/0htxlj)
Smart Home electricity data: 2000 sensors, 40 houses, 4 billion events
We posted 400K events/sec, and close to one million events/sec of distributed throughput with 4 nodes.
The WSO2 CEP-based solution is one of the four finalists (with Dresden University of Technology, Fraunhofer Institute, and Imperial College London)
It is the only generic solution to become a finalist
Case Study: Realtime Soccer Analysis
Watch at: https://www.youtube.com/watch?v=nRI6buQ0NOM
Case Study: TFL Traffic Analysis
Built using TFL (Transport for London) open data feeds.
http://goo.gl/04tX6k
http://goo.gl/9xNiCm
WSO2 Big Data Analytics Platform
Conclusion
Goal: build a platform on which others can build their analytics systems
- End to end: starts from humans and ends with humans
The whole platform is open source under the Apache License
What can you do with the platform?
- Solve hard problems, build great apps with the platform
- Add and contribute extensions to the platform (e.g. GSoC http://goo.gl/QNFP6Y)
- Fix problems (patches)
Find us on the architecture@wso2.org list or on Stack Overflow (tag wso2)
Questions?

A Visual Canvas for Judging New Technologies
 
Privacy in Bigdata Era
Privacy in Bigdata  EraPrivacy in Bigdata  Era
Privacy in Bigdata Era
 
Blockchain, Impact, Challenges, and Risks
Blockchain, Impact, Challenges, and RisksBlockchain, Impact, Challenges, and Risks
Blockchain, Impact, Challenges, and Risks
 
Today's Technology and Emerging Technology Landscape
Today's Technology and Emerging Technology LandscapeToday's Technology and Emerging Technology Landscape
Today's Technology and Emerging Technology Landscape
 
An Emerging Technologies Timeline
An Emerging Technologies TimelineAn Emerging Technologies Timeline
An Emerging Technologies Timeline
 
The Rise of Streaming SQL and Evolution of Streaming Applications
The Rise of Streaming SQL and Evolution of Streaming ApplicationsThe Rise of Streaming SQL and Evolution of Streaming Applications
The Rise of Streaming SQL and Evolution of Streaming Applications
 
Analytics and AI: The Good, the Bad and the Ugly
Analytics and AI: The Good, the Bad and the UglyAnalytics and AI: The Good, the Bad and the Ugly
Analytics and AI: The Good, the Bad and the Ugly
 
Transforming a Business Through Analytics
Transforming a Business Through AnalyticsTransforming a Business Through Analytics
Transforming a Business Through Analytics
 
SoC Keynote:The State of the Art in Integration Technology
SoC Keynote:The State of the Art in Integration TechnologySoC Keynote:The State of the Art in Integration Technology
SoC Keynote:The State of the Art in Integration Technology
 

Recently uploaded

Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 

Recently uploaded (20)

Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 

Introduction to Large Scale Data Analysis with WSO2 Analytics Platform

  • 1. Introduction to Large Scale Data Analysis and WSO2 Analytics Platform. Srinath Perera, Director of Research, WSO2; Apache Member (@srinath_perera), srinath@wso2.com. At Indiana University Bloomington.
  • 2. Who Are We? We are an open source middleware company: we build systems upon which others build their systems. Venture funded by Intel Capital, Cisco, and Toba Capital. 400+ people, with offices in Silicon Valley, Sri Lanka, London, and Bloomington. Customers include banks, aircraft manufacturers, governments (state and federal), media companies, telcos, retail, healthcare, and more.
  • 3. Outline Introduction to Big Data The Problem we are trying to solve WSO2 Big Data Platform Next steps
  • 4. A Day in Your Life. Think about a day in your life: What is the best road to take? Will there be bad weather? How should I invest my money? How is my health? There are many decisions you could make better if only you could access and process the data. (Image: http://www.flickr.com/photos/kcolwell/55124616, CC license)
  • 6. Internet of Things. Currently the physical and software worlds are detached, and the Internet of Things promises to bridge them. It is about sensors and actuators everywhere: in your fridge, in your blanket, in your chair, in your carpet... yes, even in your socks. Examples: umbrellas that light up when there is rain, and medicine cups.
  • 7. What Can We Do with Big Data? Optimize (the world is inefficient): 30% of food is wasted between farm and plate; GE Save 1% initiative (http://goo.gl/eYC0QE): trains => 2B/year, US healthcare => 20B/year. Save lives: weather, disease identification, personalized treatment. Technology advancement: most high-tech research is done via simulations.
  • 10. (Batch) Analytics. Scientists have been doing this for 25 years with MPI (1991) on special hardware; Open MPI work is being done at IU! It took off with Google's MapReduce paper (2004), after which Apache Hadoop, Hive, and a whole ecosystem were created. It was successful, so here we are! But processing takes time.
  • 11. Use Case: Targeted Advertising. Analytics implemented with MapReduce or queries: min, max, average, correlation, histograms, possibly joining or grouping data in many ways; heatmaps and temporal trends. Key performance indicators (KPIs), e.g. profit per square foot for retail.
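The aggregate analytics on this slide boil down to simple reductions over the data. Below is a minimal single-machine sketch in Python. The `summarize` helper and the clicks-per-ad sample data are hypothetical. At scale, the same reductions would run as MapReduce jobs or queries.

```python
from collections import Counter
from statistics import mean


def summarize(values, bin_width):
    """Hypothetical helper: min, max, average, and a fixed-width
    histogram (bin start -> count) of the values."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    return {
        "min": min(values),
        "max": max(values),
        "avg": mean(values),
        "histogram": dict(sorted(bins.items())),
    }


def profit_per_square_foot(profit, floor_area_sqft):
    """Retail KPI: profit divided by selling floor area."""
    return profit / floor_area_sqft


if __name__ == "__main__":
    clicks_per_ad = [12, 7, 30, 18, 25, 3]  # hypothetical sample data
    print(summarize(clicks_per_ad, bin_width=10))
    print(profit_per_square_foot(50_000, 2_000))
```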
  • 12. Use Case: Big Data for Development. Done using CDR (call detail record) data. Figure: people density at noon vs. midnight (red => increased, blue => decreased). Urban planning: people distribution, mobility, waste management (e.g. see http://goo.gl/jPujmM). From: http://lirneasia.net/2014/08/what-does-big-data-say-about-sri-lanka/
  • 13. The Value of Some Insights Degrades Fast! For some use cases (e.g. stock markets, traffic, surveillance, patient monitoring), the value of insights degrades very quickly with time; e.g. stock markets and the speed of light. We need technology that can produce outputs fast: static queries that need very fast output (alerts, realtime control), and dynamic, interactive queries (data exploration).
  • 15. Predictive Analytics. If we know how to solve a problem, that is, if we know a finite set of rules, then we can program it. For some problems (e.g. driving a car, character recognition), we do not know a finite, fixed rule set. Instead of programming, we give the computer many examples and ask it to learn (often called machine learning). There are many tools: R (statistical language), scikit-learn (Python), Apache Spark's MLBase and Apache Mahout (Java).
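To make "learning from examples" concrete, here is a minimal sketch that learns a straight-line model from example pairs instead of from hand-coded rules. Ordinary least squares is our choice for illustration, since the slide does not name an algorithm. The occupancy and temperature data is hypothetical. Tools like R or scikit-learn do this, and much more, for you.

```python
def fit_line(xs, ys):
    """Learn slope and intercept of y = slope * x + intercept from
    example pairs using ordinary least squares (illustrative choice)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned line to a new input."""
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    # Hypothetical examples: occupants in a room vs. temperature rise (deg C).
    occupants = [1, 2, 3, 4]
    temp_rise = [3.0, 5.0, 7.0, 9.0]
    model = fit_line(occupants, temp_rise)
    print(model)              # learned (slope, intercept)
    print(predict(model, 5))  # prediction for an unseen input
```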
  • 16. Use Case: Predictive Maintenance. The idea is to fix the problem before it happens, avoiding expensive downtime, e.g. for airplanes, turbines, windmills, construction equipment, cars, and golf carts. How: build a model of normal operation and measure deviation from it, or match against known error patterns.
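The "model of normal operation" approach can be sketched with a simple statistical baseline. First learn the mean and spread of healthy sensor readings, then flag readings that stray too far from them. The z-score threshold is our illustrative choice, not a method from the slide. The readings and the 3.0 threshold are hypothetical.

```python
from statistics import mean, stdev


def build_normal_model(healthy_readings):
    """Model 'normal operation' as the mean and standard deviation of
    readings collected while the machine was known to be healthy."""
    return mean(healthy_readings), stdev(healthy_readings)


def deviates(reading, model, threshold=3.0):
    """Flag a reading more than `threshold` (hypothetical default)
    standard deviations away from normal."""
    mu, sigma = model
    return abs(reading - mu) / sigma > threshold


if __name__ == "__main__":
    # Hypothetical vibration readings from a healthy turbine.
    healthy = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95]
    model = build_normal_model(healthy)
    for reading in [1.02, 1.5]:
        print(reading, "-> schedule maintenance" if deviates(reading, model) else "-> ok")
```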
  • 17. The Problem We Are Trying to Solve! Build a platform using which others can build their analytics systems: collect, analyze, communicate. End to end: it starts from humans and ends with humans. Different audiences: technical (developers) and non-technical (CXOs, sales, analysts). "There are two things you need to know about business: make something users love and make more than you spend." (Paul Graham, Lisp, Y Combinator)
  • 19. Running Example. Monitor temperature and hot airflow across multiple buildings (e.g. central AC); more people => hotter. Analytics: historical behavior of temperature by the hour; alerts if the temperature drops too low or rises too high; modeling and predicting temperature to adjust proactively.
      define stream TemperatureStream(ts long, buildingNo long, t double);
      define stream AirflowStream(ts long, buildingNo long, aflow double, aT double);
  • 20. Collect Data. One sensor API to publish events: REST, Thrift, Java, JMS, Kafka; Java clients and JavaScript clients*. First you define streams (think of a stream as an infinite table in an SQL DB), then send events via the API. *Challenges: performance, guaranteed delivery, scale. Events can be sent to the batch pipeline, the realtime pipeline, or both via configuration!
  • 21. Collecting Data: Example. A Java example that creates and sends events; events are sent asynchronously. See the client given in http://goo.gl/vIJzqc for more info.
      // Initialize stream
      Agent agent = new Agent(agentConfiguration);
      publisher = new AsyncDataPublisher("tcp://hostname:7612", .. );
      // Define stream
      StreamDefinition definition = new StreamDefinition(STREAM_NAME, VERSION);
      definition.addPayloadData("sid", STRING);
      ...
      publisher.addStreamDefinition(definition);
      ...
      // Send events
      Event event = new Event();
      event.setPayloadData(eventData);
      publisher.publish(STREAM_NAME, VERSION, event);
  • 22. Batch Analytics: Spark. Two frameworks: Hadoop (http://hadoop.apache.org) and Spark (https://spark.apache.org). Hadoop is a MapReduce implementation; Spark is faster (30X and ) and much more flexible. Spark set a record in the Gray Sort benchmark (100 TB), 3X faster than the previous Hadoop record with 10X fewer machines (http://goo.gl/r5LGvD). For Hadoop and MapReduce resources, Google it.
      file = spark.textFile("hdfs://...")
      file.flatMap(tsToHourFunction).reduceByKey(lambda a, b: a + b)
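What the two Spark calls compute can be sketched in plain, single-process Python. This sketch makes two assumptions, since the slide defines neither. First, each input line is `ts,buildingNo,t` with `ts` in epoch milliseconds. Second, `tsToHourFunction` emits an `(hour, 1)` pair per event, so the reduce counts events per hour. Spark runs the same two steps in parallel across partitions.

```python
from datetime import datetime, timezone


def ts_to_hour(line):
    """Hypothetical stand-in for tsToHourFunction: parse 'ts,buildingNo,t'
    (ts in epoch milliseconds) and emit one (hour_of_day, 1) pair."""
    ts_millis = int(line.split(",")[0])
    hour = datetime.fromtimestamp(ts_millis / 1000, tz=timezone.utc).hour
    return [(hour, 1)]


def flat_map(func, records):
    """Like RDD.flatMap: apply func to each record and flatten the results."""
    for record in records:
        yield from func(record)


def reduce_by_key(func, pairs):
    """Like RDD.reduceByKey: combine all values sharing a key with func."""
    result = {}
    for key, value in pairs:
        result[key] = func(result[key], value) if key in result else value
    return result


if __name__ == "__main__":
    lines = ["0,1,21.5", "1800000,1,22.0", "3600000,2,23.1"]
    print(reduce_by_key(lambda a, b: a + b, flat_map(ts_to_hour, lines)))
```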
  • 23. SQL-like Queries: Hive. Apache Hive provides a SQL-like data processing language. Since many people understand SQL, Hive made large-scale (Big Data) processing accessible to many. Expressive, short, and sweet. Defines core operations that cover 90% of problems. Lets experts dig in when they like (via user-defined functions)!
  • 24. Hourly Temperature Average. Hive compiles the SQL-like query into a set of MapReduce jobs running on Hadoop or Spark (Spark is available in WSO2 BAM from the 2015 Q2 release).
      insert overwrite table TemperatureHistory
      select getHour(ts) as hour, avg(t) as avgT, buildingNo
      from TemperatureStream
      group by buildingNo, getHour(ts);
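The grouping that the Hive query performs can be sketched in a few lines of Python. This is roughly what each reduce step computes per key. We assume `ts` is in epoch milliseconds and read `getHour(ts)` as an hour bucket (hours since the epoch). The slide does not define `getHour`, so treat that reading as hypothetical.

```python
from collections import defaultdict

MILLIS_PER_HOUR = 3_600_000


def hourly_average(events):
    """events: iterable of (ts_millis, building_no, temperature).
    Returns {(building_no, hour_bucket): average temperature}, mirroring
    GROUP BY building, hour with an AVG aggregate."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, building_no, t in events:
        # Hypothetical reading of getHour(ts): hours since the epoch.
        key = (building_no, ts // MILLIS_PER_HOUR)
        sums[key] += t
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


if __name__ == "__main__":
    events = [(0, 1, 20.0), (1_000, 1, 22.0), (3_600_000, 1, 25.0), (0, 2, 18.0)]
    print(hourly_average(events))
```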
  • 26. Operators: Filters. Assume a temperature stream. Here weather:convertFtoC() is a user-defined function; such functions are used to extend the language. Use cases: alerts and thresholds (e.g. alarm on high temperature); preprocessing: filtering and transformations (e.g. data cleanup).
      define stream TemperatureStream(ts long, roomNo int, temp double);
      from TemperatureStream[weather:convertFtoC(temp) > 30.0 and roomNo != 2043]
      select roomNo, temp
      insert into HotRoomsStream;
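Semantically, a filter is a stateless, per-event predicate: each event is either passed on or dropped as it arrives. Below is a minimal Python sketch of the same query. It assumes temperatures arrive in Fahrenheit. We implement `convert_f_to_c` ourselves as a stand-in for the `weather:` extension, whose real implementation is not shown on the slide.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class TemperatureEvent(NamedTuple):
    ts: int
    room_no: int
    temp: float  # assumed to be in Fahrenheit


def convert_f_to_c(temp_f: float) -> float:
    """Stand-in for the user-defined function weather:convertFtoC()."""
    return (temp_f - 32.0) * 5.0 / 9.0


def hot_rooms(events: Iterable[TemperatureEvent]) -> Iterator[Tuple[int, float]]:
    """Pass through only events matching the filter, projecting (roomNo, temp)."""
    for e in events:
        if convert_f_to_c(e.temp) > 30.0 and e.room_no != 2043:
            yield (e.room_no, e.temp)


if __name__ == "__main__":
    stream = [
        TemperatureEvent(1, 101, 95.0),    # 35 C: passes
        TemperatureEvent(2, 2043, 100.0),  # excluded room
        TemperatureEvent(3, 102, 80.0),    # about 26.7 C: filtered out
    ]
    print(list(hot_rooms(stream)))
```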
  • 27. Operators: Windows and Aggregation. Supports many window types: batch windows, sliding windows, custom windows. Use cases: simple counting (e.g. failure count); counting with windows (e.g. failure count every hour).
      from TemperatureStream#window.time(1 min)
      select roomNo, avg(temp) as avgTemp
      group by roomNo
      insert into HotRoomsStream;
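A `#window.time(1 min)` keeps the events of the last minute and evicts older ones as new events arrive. The sketch below mimics that with a deque and computes the average per room. It assumes in-order timestamps in milliseconds and per-room grouping. For clarity, it recomputes the average on each event, whereas a real engine would more likely maintain running aggregates.

```python
from collections import deque


class SlidingTimeWindowAvg:
    """Sliding time window with a per-room average, emitting an updated
    average for the event's room on every arrival. Assumes events arrive
    in timestamp order (milliseconds)."""

    def __init__(self, window_millis=60_000):
        self.window_millis = window_millis
        self.events = deque()  # (ts, room_no, temp), oldest first

    def on_event(self, ts, room_no, temp):
        self.events.append((ts, room_no, temp))
        # Expire events that are a full window old or older.
        while self.events and self.events[0][0] <= ts - self.window_millis:
            self.events.popleft()
        temps = [t for _, r, t in self.events if r == room_no]
        return room_no, sum(temps) / len(temps)


if __name__ == "__main__":
    w = SlidingTimeWindowAvg()
    print(w.on_event(0, 1, 20.0))
    print(w.on_event(30_000, 1, 22.0))
    print(w.on_event(70_000, 1, 30.0))  # the event at ts=0 has expired
```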
  • 28. Operators: Patterns. Models a "followed by" relation, e.g. event A followed by event B. A very powerful tool for tracking and detecting patterns. Use cases: detecting event sequence patterns, tracking, detecting trends.
      from every (a1 = TemperatureStream) -> a2 = TemperatureStream[temp > a1.temp + 5] within 1 day
      select a2.ts as ts, a2.temp - a1.temp as diff
      insert into HotDayAlertStream;
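A pattern needs state. Every earlier event is a pending "a1" that waits for a later event more than 5 degrees hotter, within one day. Below is a minimal Python sketch of those semantics. It assumes timestamps in milliseconds. It also assumes that each a1 completes at most once, on its first match, which is how we read `every` here. This is an illustration, not Siddhi's implementation.

```python
DAY_MILLIS = 24 * 60 * 60 * 1000


class RiseDetector:
    """Sketch of: every a1 -> a2[temp > a1.temp + rise] within 1 day."""

    def __init__(self, rise=5.0, within_millis=DAY_MILLIS):
        self.rise = rise
        self.within_millis = within_millis
        self.pending = []  # earlier events (a1) still waiting for a match

    def on_event(self, ts, temp):
        # Drop a1 candidates whose 'within' window has passed.
        self.pending = [(t0, v0) for t0, v0 in self.pending
                        if ts - t0 <= self.within_millis]
        alerts, still_waiting = [], []
        for t0, v0 in self.pending:
            if temp > v0 + self.rise:
                alerts.append((ts, temp - v0))  # select a2.ts, a2.temp - a1.temp
            else:
                still_waiting.append((t0, v0))
        # 'every': each new event also starts a new a1 candidate.
        self.pending = still_waiting + [(ts, temp)]
        return alerts


if __name__ == "__main__":
    d = RiseDetector()
    for ts, temp in [(0, 20.0), (1_000, 23.0), (2_000, 26.0)]:
        print(ts, d.on_event(ts, temp))
```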
  • 29. Operators: Joins. Join two data streams based on a condition and windows. Use cases: data correlation, detecting missing events, detecting erroneous data; joining event streams.
      from TemperatureStream[temp > 30.0]#window.time(1 min) as T
        join RegulatorStream[isOn == false]#window.length(1) as R
        on T.roomNo == R.roomNo
      select T.roomNo, R.deviceID, 'start' as action
      insert into RegulatorActionStream;
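A join keeps a window on each side. When an event arrives on either stream, it is matched against the other side's window. Below is a minimal Python sketch of this query. It assumes millisecond timestamps and that a regulator event carries `(roomNo, deviceID, isOn)`. Output is emitted from both sides, which is our reading of the default join behaviour.

```python
from collections import deque


class RegulatorJoin:
    """Join hot temperature events (1 min time window) with the latest
    'off' regulator event (length-1 window) on room number."""

    def __init__(self, window_millis=60_000):
        self.window_millis = window_millis
        self.hot = deque()    # T side: (ts, room_no) where temp > 30.0
        self.last_off = None  # R side: (room_no, device_id) of latest isOn == False

    def _expire(self, now):
        while self.hot and self.hot[0][0] <= now - self.window_millis:
            self.hot.popleft()

    def on_temperature(self, ts, room_no, temp):
        self._expire(ts)
        if temp <= 30.0:  # filter applied before the window
            return []
        self.hot.append((ts, room_no))
        if self.last_off is not None and self.last_off[0] == room_no:
            return [(room_no, self.last_off[1], "start")]
        return []

    def on_regulator(self, ts, room_no, device_id, is_on):
        self._expire(ts)
        if is_on:  # filter: only regulators that are off
            return []
        self.last_off = (room_no, device_id)  # length(1) keeps only the latest
        return [(room_no, device_id, "start") for _, r in self.hot if r == room_no]


if __name__ == "__main__":
    j = RegulatorJoin()
    print(j.on_regulator(0, 5, "reg-5", False))
    print(j.on_temperature(1_000, 5, 31.0))
```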
  • 30. Operators: Access Data from the Disk. Event tables allow users to map a database table to a window and join a data stream with it. Use cases: merging with data in a database; collecting and updating data conditionally.
      define table HistTempTable(day long, avgT double);
      from TemperatureStream#window.length(1) join HistTempTable
        on getDayOfYear(ts) == HistTempTable.day and temp > HistTempTable.avgT
      select ts, temp
      insert into PurchaseUserStream;
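Conceptually, the event-table join looks up the stored row for each incoming event. It then compares the incoming temperature against the stored average for that day. Below is a minimal sketch that uses Python's built-in `sqlite3` as the database. The historical averages are hypothetical. Our own `day_of_year` helper stands in for `getDayOfYear`, assuming epoch-millisecond timestamps in UTC.

```python
import sqlite3
from datetime import datetime, timezone


def day_of_year(ts_millis):
    """Hypothetical stand-in for getDayOfYear(ts), ts in epoch milliseconds (UTC)."""
    return datetime.fromtimestamp(ts_millis / 1000, tz=timezone.utc).timetuple().tm_yday


def make_table():
    """An in-memory table standing in for HistTempTable (day -> avgT),
    filled with hypothetical historical averages."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE HistTempTable (day INTEGER PRIMARY KEY, avgT REAL)")
    conn.executemany("INSERT INTO HistTempTable VALUES (?, ?)", [(1, 10.0), (2, 12.0)])
    return conn


def above_historical_average(conn, ts, temp):
    """Join one incoming event with the table: emit (ts, temp) if the reading
    beats the stored average for that day, else None."""
    row = conn.execute(
        "SELECT avgT FROM HistTempTable WHERE day = ?", (day_of_year(ts),)
    ).fetchone()
    if row is not None and temp > row[0]:
        return (ts, temp)
    return None


if __name__ == "__main__":
    conn = make_table()
    print(above_historical_average(conn, 0, 15.0))
    print(above_historical_average(conn, 0, 5.0))
```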
  • 31. Realtime Analytics Patterns: simple counting (e.g. failure count); counting with windows (e.g. failure count every hour); preprocessing: filtering, transformations (e.g. data cleanup); alerts and thresholds (e.g. alarm on high temperature); data correlation, detecting missing events, detecting erroneous data (e.g. detecting failed sensors); joining event streams (e.g. detecting a hit on the soccer ball); merging with data in a database, collecting and updating data conditionally.
  • 32. Realtime Analytics Patterns (contd.): detecting event sequence patterns (e.g. a small transaction followed by a large transaction); tracking: following some related entity's state in space, time, etc. (e.g. location of airline baggage or vehicles, tracking wildlife); detecting trends: rise, turn, fall, outliers, complex trends like triple bottom, etc. (e.g. algorithmic trading, SLA, load balancing); learning a model (e.g. predictive maintenance); predicting the next value and corrective actions (e.g. automated car).
  • 33. Predictive Analytics. Build models and use them with WSO2 CEP, BAM, and ESB using the upcoming WSO2 Machine Learner product (2015 Q2). Build models using R, export them as PMML, and use them within WSO2 CEP. Call R scripts from CEP queries. Regression and anomaly detection operators in CEP.
  • 34. Predictive Analytics. WSO2 Machine Learner provides a wizard to explore data and build models. E.g. build a model to predict the temperature for the next 15 minutes. Trivial option: (historical mean + last 15 min mean) / 2. Better model: ARIMA, from time series analysis. To know more, take an ML class.
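The trivial option is simple enough to write down directly. Below is a minimal sketch with hypothetical readings. Which historical readings to average (for example, the same time of day on past days) is our assumption. Its main use is as a baseline that a model like ARIMA should beat before it is worth using.

```python
from statistics import mean


def predict_next_15_min(historical_temps, last_15_min_temps):
    """Trivial baseline: (historical mean + last 15 min mean) / 2."""
    return (mean(historical_temps) + mean(last_15_min_temps)) / 2


if __name__ == "__main__":
    history = [20.0, 22.0]  # hypothetical readings for this time slot
    recent = [26.0, 28.0]   # hypothetical readings from the last 15 minutes
    print(predict_next_15_min(history, recent))
```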
  • 35. Communicate: Dashboards. The idea is to give the "overall idea" at a glance (e.g. a car dashboard). Supports personalization: you can build your own dashboard. Also the entry point for drill-down. How to build? Dashboards via Google Gadgets, with content via HTML5 + JavaScript; use WSO2 User Engagement Server to build a dashboard (or use JSP or PHP); use charting libraries like Vega or D3.
  • 37. Communicate: Alerts. Detecting conditions can be done via CEP queries. The key is the "last mile": email, SMS, push notifications to a UI, pager, triggering a physical alarm. How? Select the email sender "Output Adaptor" in CEP, or send from CEP to ESB, which has many connectors.
  • 38. Communicate: APIs. With mobile apps, most data is exposed and shared with end users as APIs (REST/JSON). Some challenges: security and permissions; API discovery; billing, throttling, quotas; SLA enforcement. How? Write data to a database from CEP event tables, build services via WSO2 Data Service, and expose them as APIs via API Manager.
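Throttling is one of the listed challenges. One common way to implement it is a token bucket. The sketch below shows that generic technique, not API Manager's implementation. The capacity and refill rate are hypothetical. Time is passed in explicitly so the behaviour is deterministic.

```python
class TokenBucket:
    """Allow bursts up to `capacity` requests, refilled at `refill_per_second`
    (both hypothetical limits chosen per API or per user)."""

    def __init__(self, capacity, refill_per_second):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)
        self.last = 0.0  # time of the previous call, in seconds

    def allow(self, now):
        """Return True if a request at time `now` (seconds) may proceed."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # over quota: reject (e.g. with HTTP 429)


if __name__ == "__main__":
    bucket = TokenBucket(capacity=2, refill_per_second=1.0)
    print([bucket.allow(t) for t in (0.0, 0.0, 0.0, 1.0)])
```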
  • 39. Smart Home 2015 yearly DEBS (Distributed Event Based Systems) DEBS Grand Challenge (http://goo.gl/0htxlj) Smart Home electricity data: 2000 sensors, 40 houses, 4 Billion events We posted (400K events/sec) and close to one million distributed throughput with 4 nodes. WSO2 CEP based solution is one of the four finalists (with Dresden University of Technology, Fraunhofer Institute, and Imperial College London) Only generic solution to become a finalist
  • 40. Case Study: Realtime Soccer Analysis Watch at: https://www.youtube.com/watch?v=nRI6buQ0NOM
  • 41. Case Study: TFL Traffic Analysis. Built using TFL (Transport for London) open data feeds. http://goo.gl/04tX6k http://goo.gl/9xNiCm
  • 42. WSO2 Big Data Analytics Platform
  • 43. Conclusion. Goal: build a platform using which others can build their analytics systems, end to end, starting from humans and ending with humans. The whole platform is open source under the Apache License. What can you do with the platform? Solve hard problems and build great apps with it; add and contribute extensions to the platform (e.g. GSoC, http://goo.gl/QNFP6Y); fix problems (patches). Find us on the architecture@wso2.org list or on Stack Overflow (tag wso2).