Real-time Streaming Analytics for Enterprises based on Apache Storm - Impetus StreamAnalytix Webinar
2. Our Speakers Today
Dr. Vijay Agneeswaran, Ph.D., Director, Big Data R&D
Anand Venugopal, Sr. Director, Business Development
© 2014 Impetus Technologies
Recorded version available at http://bit.ly/1wb9SZg
3. Agenda
Introductions and Background
Streaming Analytics Options
StreamAnalytix Introduction
StreamAnalytix Demo
Q&A
4. Real-time Streaming Analytics Outlook
• A 2014 survey revealed a 66% increase in firms’ use of streaming analytics over the past two years
• 70% of the most profitable companies will manage their business processes using real-time predictive analytics or extreme collaboration by 2016
• Analysts believe that it has taken 15 years or so for companies to harness about 50% of the productivity potential of the Internet, and that the next 50% of productivity gains will likely require connecting things
5. Business Value of Analytics
Diminishes with the age of data
$$$ ?
Before
• Predictive analytics based on current events
• Value depends on accuracy
$$
NOW
• Real-time
• Certainty is high – REAL
• Value based on quick response
$$$
Later
• Descriptive
• Diagnostic
• Least value
The drop is non-linear
[Chart: Value of Data vs. Age of Data]
6. Business Value of Real-time Streaming Analytics
7. SECTION 1
Open Source Options for Stream Processing
8. Stream Processing Open Source Options
• S4 from Yahoo
• MillWheel from Google
• Samza from LinkedIn
• Storm
• Spark Streaming
9. Storm and Spark Positioning
FEATURE | STORM | SPARK STREAMING
Processing Methodology | Processes and dispatches messages as soon as they are received | Treats streaming computations as a series of deterministic batch computations on small time intervals
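To make the contrast in processing methodology concrete, here is a minimal plain-Python sketch. It uses neither the Storm nor the Spark API; the `Event` shape, the function names and the `interval` parameter are assumptions made for illustration only.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical event shape: (arrival time in seconds, payload).
Event = Tuple[float, str]


def process_per_event(events: Iterable[Event], handle: Callable[[str], None]) -> None:
    """Per-message style: hand each payload to the handler as soon as it is seen."""
    for _, payload in events:
        handle(payload)


def process_micro_batches(events: Iterable[Event], interval: float) -> List[List[str]]:
    """Micro-batch style: group payloads by fixed time interval, one batch per interval."""
    batches: Dict[int, List[str]] = {}
    for arrival, payload in events:
        # Events arriving in [k * interval, (k + 1) * interval) share batch index k.
        batches.setdefault(int(arrival // interval), []).append(payload)
    return [batches[index] for index in sorted(batches)]
```

In the first function the handler runs once per message; in the second, nothing downstream runs until a whole interval's worth of messages has been collected, which is where the latency and throughput trade-off comes from.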
10. Storm and Spark Positioning
FEATURE | STORM | SPARK STREAMING
Processing Latency | Lower Latency | Higher Latency, Higher Throughput
11. Storm and Spark Positioning
FEATURE | STORM | SPARK STREAMING
Availability | Available through the support of YARN and Mesos | Available through the support of YARN and Mesos
12. Storm and Spark Positioning
FEATURE | STORM | SPARK STREAMING
Complex Event Processing | Run SQL-like commands using Esper | Spark SQL works on top of Spark Streaming, still in Beta
13. Storm and Spark Positioning
FEATURE | STORM | SPARK STREAMING
Intermediate Data Storage | Uses ZeroMQ / Netty for exchange of data amongst different Storm topology tasks | Stores the intermediate data in-memory in the form of RDDs
14. Storm and Spark Positioning
FEATURE | STORM | SPARK STREAMING
Sliding Window Concept | Achievable through Esper | Built-in support
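As a rough illustration of the sliding window concept itself, the sketch below counts events inside a moving time window in plain Python. It is not Esper's query language and not Spark Streaming's window operator; the function name and parameters are hypothetical, and timestamps are assumed to arrive in ascending order.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple


def sliding_window_counts(timestamps: Iterable[float], window: float) -> List[Tuple[float, int]]:
    """For each event, count the events whose timestamp lies in (ts - window, ts]."""
    recent: Deque[float] = deque()
    counts: List[Tuple[float, int]] = []
    for ts in timestamps:
        recent.append(ts)
        # Evict events that have slid out of the window.
        while recent[0] <= ts - window:
            recent.popleft()
        counts.append((ts, len(recent)))
    return counts
```

For example, with a 5-second window and events at 1, 2, 3 and 10 seconds, the count grows to 3 and then falls back to 1 once the early events have aged out.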
15. Storm and Spark Positioning
FEATURE | STORM | SPARK STREAMING
Lambda Architecture | May need separate batch pipeline | Can have same pipeline for batch and stream computations
16. Storm and Spark Positioning
FEATURE | STORM | SPARK STREAMING
Message Processing Semantics | Exactly once messaging achieved with Trident | Exactly once messaging achieved through RDD lineage
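One common way to get exactly-once state updates on top of replayed batches is to store a batch transaction id next to the state and skip any batch already applied. The sketch below shows that idea in plain Python; it is similar in spirit to transactional state in Trident but is not the Trident API, and the class and method names are hypothetical. A real system would also persist the count and the id together atomically.

```python
from typing import List, Optional


class TransactionalCounter:
    """Counts messages so that a replayed batch is applied at most once (illustrative only)."""

    def __init__(self) -> None:
        self.count = 0
        self.last_txid: Optional[int] = None

    def apply_batch(self, txid: int, batch: List[str]) -> int:
        # A batch replayed after a failure carries the same txid, so it is skipped.
        if txid == self.last_txid:
            return self.count
        self.count += len(batch)
        self.last_txid = txid
        return self.count
```

This assumes batches are applied in order and that a replayed batch contains the same messages as the original attempt.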
17. Storm and Spark Positioning
FEATURE | STORM | SPARK STREAMING
Fault Tolerance | Highly fault tolerant through the use of Zookeeper cluster that stores the cluster and topology state | Maintains state information by writing metadata information of the DStreams to HDFS directory and periodic check-pointing
18. Storm and Spark Positioning
FEATURE | STORM | SPARK STREAMING
Production Deployments | Groupon, Twitter, The Weather Channel, Infochimps, Aeris, The Ladders, Yahoo and many more. | Sharethrough, Yahoo, Ooyala, Conviva
19. Legend: + Storm, + Spark, Neutral
Feature | Storm | Spark Streaming
Processing Methodology | Processes and dispatches messages as soon as they are received | Treats streaming computations as a series of deterministic batch computations on small time intervals
Processing Latency | Lower Latency | Higher Latency, Higher throughput
Availability | Available through the support of YARN and Mesos | Available through the support of YARN and Mesos
Complex Event Processing | Run SQL-like commands using Esper | Spark SQL works on top of Spark Streaming, still in Beta
Intermediate Data Storage | Uses ZeroMQ / Netty for exchange of data amongst different Storm topology tasks | Stores the intermediate data in-memory in the form of RDDs
Sliding Window Concept | Achievable through Esper | Built-in support available
Lambda Architecture | May need separate batch pipeline | Can have same pipeline for batch and stream computations
Message Processing Semantics | Exactly once messaging achieved with Trident | Exactly once messaging achieved through RDD lineage
Fault Tolerance | Highly fault tolerant through the use of Zookeeper cluster that stores the cluster and topology state | Maintains state information by writing metadata information of the DStreams to HDFS directory and periodic check-pointing
Production Deployments | Groupon, Twitter, The Weather Channel, Infochimps, Aeris, The Ladders, Yahoo and many more. | Sharethrough, Yahoo, Ooyala, Conviva
21. StreamAnalytix – gives you a future-proof option
[Figure: STORM, SPARK and OTHERS shown against Time, with NOW marked]
22. SECTION 2
Introduction to StreamAnalytix
23. "Default" Approaches to Streaming Analytics
Proprietary Platforms
• No leverage of Open Source
• Vendor lock-in
• Could be high cost
• Limited flexibility
"Do it yourself"
• Native Open source
• No vendor support
• Integration & maintenance nightmare
• Significant delays in time-to-market
24. The 3rd Approach: Best of Both Worlds
StreamAnalytix mitigates the disadvantages of the "default" approaches and offers the benefits of both worlds to enterprises for streaming analytics.
25. StreamAnalytix Platform Benefits
An “App Server” for real-time apps
Focus on your business logic - leave infra to us
Handle all the 3V’s of Big Data on one platform
12-18 months of time to market acceleration
Seamless integration with Hadoop and NoSQL
27. Key Features
High Speed Data Ingestion
Elastic Scaling – Volume, Velocity
Data Parsing - Variety
Pluggable Persistence
Real-time Index and Search
Dynamic Message Routing
Rule Based Alert
Pluggable Workflow Management
Fault Tolerance and Data Integrity
Optimized for High Performance
28. SECTION 3
Demo
29. Recap of Key StreamAnalytix Functions
Read
• Ready-to-use connector
• Support for AMQP and Kafka connectors
• Supports XML, JSON, delimited formats, etc.
Analyze
• Analyze data in near real-time
• Rule based data routing and alerts
• Workflow management
Persist
• Cross vendor persistence abstraction
• Support for HBase, Cassandra, Oracle NoSQL DB
• Support of indexing and distributed caching
Reports
• Near real-time data visualization
• Historical data visualization
• Search on moving and historical data
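The rule-based data routing and alerts item under Analyze can be pictured with a small plain-Python sketch. This is not the StreamAnalytix rule engine or its API; the `Rule` fields, the `route` function and the destination names are all hypothetical and only illustrate the general idea of "first matching rule decides the destination and whether to alert".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Message = Dict[str, object]  # Hypothetical parsed message: field name -> value.


@dataclass
class Rule:
    name: str
    matches: Callable[[Message], bool]  # Predicate evaluated against each message.
    route_to: str                       # Destination chosen when the predicate holds.
    alert: bool = False                 # Whether a match should also raise an alert.


def route(message: Message, rules: List[Rule], default: str = "default") -> Tuple[str, List[str]]:
    """Return (destination, alert names) for the first matching rule, else the default route."""
    for rule in rules:
        if rule.matches(message):
            return rule.route_to, ([rule.name] if rule.alert else [])
    return default, []
```

A caller would build a list of `Rule` objects once and call `route` for every incoming message; rule order matters because the first match wins.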
30. Value Adds over Open Source – 1 of 3
Full System Integration, Testing, Benchmarking and Technical Support
• Kafka and/or RabbitMQ
• Apache Storm
• Data-storage and Indexing layer - abstraction and integration
• Alerting and CEP Framework
• Real-time Web based visualization framework
• Distributed and off-heap caching for reliability and shared-state solutions
• BPM / Workflow integration
System Level Performance
Data compression and other optimizations to minimize latency of event processing
Admin Tool - Automated Installation, Provisioning, Management and Monitoring tool for all components and infrastructure with multi-cluster management from one station
Monitoring and Automation of full system availability
31. Value Adds over Open Source – 2 of 3
Visual definition of incoming message formats, fields
Storm Workspaces – the beginning of multi-tenancy
PMML integration over Storm – UI to import and run PMML models on Storm
Visual definition of Alerting, regular expressions
Visual topology creation and integration with Alerting, custom-logic, web UI
Multi-topology creation, linking and dynamic “run-time” data routing
Application and pipeline monitoring and management (visual interface)
32. Value Adds over Open Source – 3 of 3
Real-time Visualization Support
• Web-socket framework for custom real-time visualization
• Alerts, Raw Data, Enriched data results
Persistence and Indexing
• Abstraction layer supporting any NoSQL database as persistence store
• Abstraction layer supporting Elasticsearch (Solr support soon)
• Indexing and Persistence policy definition at system level and per field
• Encryption support configurable at column level
• Fast search support by orchestration between Index and Storage
• ‘oData’ interface support for 3rd party BI tools for offline analytics
33. Licensing Model and Controls
TIME
• Perpetual license + Annual support and maintenance from second year
OR
• Annual Subscription (includes support)
QUANTITY
• Total number of cores
OR
• Total number of nodes (plus a max cores per node)
FUNCTIONALITY
• Only Indexing / Only Persistence/ Indexing + Persistence
• Admin tool – Yes/ No
34. Recorded version available at http://bit.ly/1nMw8nQ
For general inquiries about the StreamAnalytix platform and related services, reach us at inquiry@streamanalytix.com
Editor's Notes
Forrester,
[Punit] – change the title to ‘Real-Time Streaming Analytics (RTSA) Outlook’
[Punit] – remove the line ‘An Editorial…. From Impetus Labs:’ - done
9 min av can be faster exactly at 6 mins
[Punit] – remove this slide
[PB] Vijay will spend less than 1.5 mins on this one
[Punit] – this slide not needed. Slide #25 will suffice. See notes for slide #25
[PB] These are the talking slides, as there is otherwise too much text on the single table slide #19
AV and Vijay evaluated each and every point again today. It is either neutral or favors Storm
Need to end the poll by 25 mins - are you implementing RTSA currently? If so, which is the preferred option – Spark or Storm? To discuss this with AV
Should we say 18 – 20 months? It reflects how quickly we can have the next product release
[Punit] – remove the grey background
Check with Ratish on distributed cache image
[Punit] – rename Ankush to Cluster Provisioning Tool
Correct the spelling of ‘Storm’. It currently reads ‘Story’
[Punit] – insert a slide after this for S-Ax as to what is its current stance in real-time world
Mention things like: adopted Storm because it is enterprise-ready. Closely watching the Spark community and will adopt Spark as well
Helps enterprises to become hassle free of technology upgrades and version compatibility problems
Riding over proven popular open-source stack etc.
[PB] we are doing exactly the same in slide 23
Start at 38 mins, ends at 45 mins
[Punit] – we do not need this slide. The next slide covers it all. In case we do keep it, make it less verbose. No need to write complete English sentences like ‘Ready to Use Connector’ or ‘Out of the box support’ etc.
This graphic needs to be adjusted.
[Punit] – Capitalize the title to ‘Licensing Model and Controls’ (title caps)