As organizations modernize their data and analytics platforms, the data lake concept has gained momentum as a shared enterprise resource for supporting insights across multiple lines of business. The perception is that data lakes are vast, slow-moving bodies of data, but innovations like Apache Kafka for streaming-first architectures put real-time data flows at the forefront. Combining real-time alerts and fast-moving data with rich historical analysis lets you respond quickly to changing business conditions with powerful data lake analytics to make smarter decisions.
Join this complimentary webinar with industry experts from 451 Research and Arcadia Data who will discuss:
- Business requirements for combining real-time streaming and ad hoc visual analytics.
- Innovations in real-time analytics using tools like Confluent’s KSQL.
- Machine-assisted visualization to guide business analysts to faster insights.
- Elevating user concurrency and analytic performance on data lakes.
- How applications in cybersecurity, regulatory compliance, and predictive maintenance on manufacturing equipment benefit from streaming visualizations.
Accelerating Data Lakes and Streams with Real-time Analytics
1. Arcadia Data. Proprietary and Confidential
2. Arcadia Data. Proprietary and Confidential
Today’s Presenters
Matt Aslett
Research Director
Data Platforms and Analytics
Shant Hovsepian
Co-Founder, CTO
3. Arcadia Data. Proprietary and Confidential
Topics
1. Accelerating Data Lakes and Streams with Real-time Analytics
Matt Aslett, 451 Research
2. Native Visual Analytics for Data Lakes and Streams
Shant Hovsepian, Arcadia Data
3. Q&A
4. Arcadia Data. Proprietary and Confidential
Poll 1 of 2: Where are you with your big data deployment?
a) Gathering knowledge - thinking about Hadoop or other scale-out data platforms.
b) Developing strategy - defining architecture, selecting tools.
c) Piloting - have a big data analytics platform in place and are beginning to experiment.
d) Deployed - have a defined use case, and end users are accessing and analyzing data.
5. Copyright (C) 2017 451 Research LLC
Accelerating Data Lakes and Streams with Real-time Analytics
Matt Aslett, Research Director, Data Platforms & Analytics
6. Copyright (C) 2017 451 Research LLC
451 Research is a leading IT research & advisory company
Founded in 2000
300+ employees, including over 120 analysts
2,000+ clients: Technology & Service providers, corporate advisory, finance, professional services, and IT decision makers
70,000+ IT professionals, business users and consumers in our research community
Over 52 million data points published each quarter and 4,500+ reports published each year
3,000+ technology & service providers under coverage
451 Research and its sister company, Uptime Institute, are the two divisions of The 451 Group
Headquartered in New York City, with offices in London, Boston, San Francisco, Washington DC, Mexico, Costa Rica, Brazil, Spain, UAE, Russia, Taiwan, Singapore and Malaysia
Research & Data
Advisory
Events
Go 2 Market
11. Copyright (C) 2017 451 Research LLC
Data processing pipeline
Data Ingestion
Data Inventory
Data Preparation
Data Delivery
Data Discovery
Data Visualization
Self-Service
Data Management and Data Governance
12. Copyright (C) 2017 451 Research LLC
DATA-AS-A-SERVICE
PARTNERS
SUPPLIERS
IT
APPLICATIONS
DATA GOVERNANCE
Data lineage
Data inventory
Data catalog
Data security
Data quality
Data pipelines
DATA STEWARDS
ADVANCED ANALYTICS
DATA SCIENTISTS
SELF-SERVICE ANALYTICS
SENIOR EXECUTIVES BUSINESS ANALYSTS DATA ANALYSTS
SELF-SERVICE
DATA PREPARATION
Data cleansing
Data harmonization
Data discovery
Collaboration
Data matching
Data enrichment
DATA LAKE
SCALE-OUT ANALYTICS ACCELERATION LAYER
16. Copyright (C) 2017 451 Research LLC
DECISION MAKERS
DATA ANALYSTS
IT PROS
ENTERPRISE APPLICATIONS
DATA WAREHOUSE
Democratization
17. Copyright (C) 2017 451 Research LLC
Democratization
ENTERPRISE APPLICATIONS
CLOUD STORAGE
MOBILE APPS
BOTS
IOT DEVICES AND SENSORS
SOCIAL MEDIA
BUSINESS USERS
DATA-DRIVEN APPLICATIONS
DATA SCIENTISTS
DECISION MAKERS
HADOOP
SPARK
STREAMS
DATA ANALYSTS
IT PROS
LOG AND CLICKSTREAM DATA
OT USERS
DATA WAREHOUSE
21. Arcadia Data. Proprietary and Confidential
Data Lakes Are Comprehensive
Data Warehouse
RDBMS
Streaming Sources
NoSQL
Data Lake
Users
Other Data
22. Arcadia Data. Proprietary and Confidential
Poll 2 of 2: How do you plan to give users access to analyze their data?
a) Development tools (e.g., Spark, MapReduce)
b) SQL engines (e.g., Hive, Impala, Spark SQL, Drill)
c) Traditional BI tools (e.g., Tableau, Qlik, MicroStrategy)
d) Data-native, distributed BI platforms
e) Other (please specify in the comments section)
23. Arcadia Data. Proprietary and Confidential
3 Ways Customers Accelerate Value from Data Lakes
1. Move beyond batch: enable LIVE, real-time analytics
(… and address business problems requiring both real-time and historical analysis)
2. Provide direct, interactive visual analysis to 100s of users
3. Let the data do the talking: machine-assisted insights
24. Arcadia Data. Proprietary and Confidential
Tip #1: Move Beyond Batch (But Why Real-Time Analytics?)
I want to respond faster to recent events.
I want to be alerted immediately.
I want to outperform the competition.
25. Arcadia Data. Proprietary and Confidential
Why Don’t You Currently Use Real-Time Analytics?
I don’t know how to get started.
It seems hard to set up and maintain.
I’m still trying to get the basics working.
26. Arcadia Data. Proprietary and Confidential
Don’t fear the challenges.
Real-time can be achieved and provide real value.
30.
No One Has Time to Sit There and Look at a Dashboard!
It is a better use of human time to interact and explore than to monitor.
We can have systems (computers, aka AI) automatically alert us if something is wrong.
31. Arcadia Data. Proprietary and Confidential
Why Is Real-Time Getting So Popular Today?
The world of real-time applications had always been relegated to proprietary, heavyweight applications.
Modern technologies have improved:
The Web played a big role
• WebSockets, WebRTC, SSE, Polling
Programming models have evolved
• Transformative – takes input, transforms it, and produces output
• Interactive – responds to external input at a speed the program sets itself
• Reactive – responds to external input at the speed of the environment
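The contrast between polling (the client pulls at its own pace) and the reactive model (the environment pushes as events occur) can be illustrated with a minimal Python sketch. The `Feed` class and its methods are hypothetical names invented for this illustration; they are not part of any product or library named in the slides, and a real deployment would sit on a transport such as WebSockets or SSE.

```python
from typing import Callable, List


class Feed:
    """Hypothetical in-memory event feed, used only for illustration."""

    def __init__(self) -> None:
        self.events: List[int] = []
        self.subscribers: List[Callable[[int], None]] = []

    def subscribe(self, callback: Callable[[int], None]) -> None:
        # Reactive/push model: register once, then get called per event.
        self.subscribers.append(callback)

    def publish(self, event: int) -> None:
        self.events.append(event)
        for callback in self.subscribers:
            callback(event)  # delivered at the speed of the environment

    def read_since(self, offset: int) -> List[int]:
        # Polling/pull model: the client decides when to ask.
        return self.events[offset:]


feed = Feed()
pushed: List[int] = []
feed.subscribe(pushed.append)

for reading in (3500, 3600, 3700):
    feed.publish(reading)

# A polling client sees nothing until it asks, and must track its own offset.
polled = feed.read_since(0)
```

In the push case the subscriber has already seen every reading by the time the loop ends; the polling client only catches up when it issues `read_since`, which is the refresh cost that grows with the number of clients.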
32. Arcadia Data. Proprietary and Confidential
What is Visual Analytics?
Think of Visual Analytics as sitting somewhere between Charting/Plotting and BI/Reporting.
Visual Analytics is about interactive visual interfaces, which makes it more interactive than BI/Reporting but less so than Charting/Plotting.
Visual Analytics tends to be more business-user friendly than Charting/Plotting but less so than BI/Reporting.
Visual Analytics incorporates more sophisticated analytics than BI/Reporting but less than Charting/Plotting.
33. Arcadia Data. Proprietary and Confidential
Real-Time Visualizations: Current Approaches and Challenges
Current Approaches
• Require an intermediary store
• Data stores like Solr, HBase, Cassandra, etc., used to hold streaming data
• Lack real-time visuals
• Manual requests for refreshes are required to redraw the screen
• Depend heavily on developers
• Java/Scala/Python required for streaming analytics
Challenges with These Approaches
• Complicated to set up
• Data staging inhibits real-time access
• Requires data modeling for the updatable store
• Polling limits scalability across many clients
• No ability to ask dynamic questions of the stream
• Not self-service since significant IT work is required
34. Arcadia Data. Proprietary and Confidential
Visual Analytics + Real-time =
Streaming Visual Analytics?
Not Quite Yet!
35. Arcadia Data. Proprietary and Confidential
Architectures
The world of real-time applications had always been relegated to proprietary, heavyweight applications.
The Web has recently changed that for us.
• WebSockets, WebRTC, SSE, Polling
Programming models have evolved
• Transformative – takes input, transforms it, and produces output
• Interactive – responds to external input at a speed the program sets itself
• Reactive – responds to external input at the speed of the environment
36. Arcadia Data. Proprietary and Confidential
Strategy 1: Lambda Architecture
Real-time Store and Analytic Store (RDBMS) Together
Pros
Well-known setup
Lets you leverage existing setup
Cons
✘ Lacks ad hoc freedom
✘ Tricky to reason about
✘ Logic is duplicated in two places
✘ Data consolidation must happen
✘ Increased administration – separate security models, administration
✘ Pulling/polling model
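To make the "logic is duplicated in two places" and "data consolidation must happen" points concrete, here is a minimal Python sketch of a Lambda-style query path. The event shape and the function names (`batch_view`, `speed_view`, `serve`) are assumptions made for this illustration only; they do not come from the slides or from any specific framework.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Event = Tuple[str, int]  # (device_id, count): hypothetical event shape


def batch_view(history: Iterable[Event]) -> Dict[str, int]:
    """Batch layer: aggregate the full historical data set."""
    totals: Counter = Counter()
    for device, count in history:
        totals[device] += count
    return dict(totals)


def speed_view(recent: Iterable[Event]) -> Dict[str, int]:
    """Speed layer: the same aggregation, written a second time
    for events that arrived after the last batch run."""
    totals: Counter = Counter()
    for device, count in recent:
        totals[device] += count
    return dict(totals)


def serve(batch: Dict[str, int], speed: Dict[str, int]) -> Dict[str, int]:
    """Serving layer: consolidate both views at query time."""
    merged: Counter = Counter(batch)
    merged.update(speed)  # Counter.update adds counts key by key
    return dict(merged)


history = [("a", 2), ("b", 1)]
recent = [("a", 3)]
answer = serve(batch_view(history), speed_view(recent))
```

Any change to the aggregation has to be made in both `batch_view` and `speed_view`, and every query pays for the merge in `serve`, which is the maintenance burden listed under Cons.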
37. Arcadia Data. Proprietary and Confidential
Strategy 2: Staging/Kappa Store
Stream to a fast updatable store
Solr, Elastic, Aerospike, Kudu, HBase, MemSQL
Pros
Client only reasons about a single store
One copy in the K/V store
Can leverage flexible querying of the store
Lower latency
Cons
✘ Schema evolution gets tricky
✘ Separate security models
✘ Still need to maintain two systems
✘ Many tradeoffs for a K/V store
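As a rough illustration of streaming into a fast updatable store, the sketch below uses a plain Python dict as a stand-in for the key/value store. The `apply_stream` function and the field names (`device_no`, `ts`, `temp`) are hypothetical and chosen only for this example; none of the stores listed above is being modeled.

```python
from typing import Dict, Iterable, Mapping

Record = Mapping[str, int]


def apply_stream(store: Dict[int, Record], events: Iterable[Record]) -> None:
    """Upsert each event so the store holds one latest record per key."""
    for event in events:
        key = event["device_no"]
        current = store.get(key)
        # Keep the newest record; ignore events that arrive out of order.
        if current is None or event["ts"] >= current["ts"]:
            store[key] = event


store: Dict[int, Record] = {}
apply_stream(store, [
    {"device_no": 1, "ts": 1, "temp": 118},
    {"device_no": 2, "ts": 1, "temp": 97},
    {"device_no": 1, "ts": 2, "temp": 125},
])

# The client queries a single store, but only the latest state survives.
hot = sorted(key for key, rec in store.items() if rec["temp"] > 120)
```

The client reasons about one copy of the data, which matches the Pros; the history of each device is gone, which is one of the tradeoffs of keeping state in a K/V store.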
38. Arcadia Data. Proprietary and Confidential
Strategy 3: Native Streaming
Pros
Direct access to data in the streams
Linear scalability
Agility for analysts to ask arbitrary queries
Supports complex data types
Truly real-time
Lowest TCO: simplified architecture
Push-based
Cons
✘ Newer technology and approach
✘ Still not quite GA
39. Arcadia Data. Proprietary and Confidential
Streams/Topics
KSQL
Real-Time Data
SQL Engine
Visualizations
Other Consumers
Arcadia Enterprise Provides True Streaming Visualizations
Coming Soon
Reads directly from the Apache Kafka stream via KSQL, including complex types:
{
  "device_no": 12345,
  "timestamp": "0000001",
  "readings": {
    "rpm": 3500,
    "temp": 120,
    "start_time": "8/1/17:00:00"
  }
}
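One way a consumer might turn a nested record like the sample above into flat columns for charting is sketched below in Python. This is not a description of how KSQL or Arcadia Enterprise handles complex types; the `flatten` helper and the dotted column names are assumptions made purely for illustration.

```python
import json
from typing import Any, Dict

# The sample record from the slide, with standard JSON quoting.
RECORD = """
{
  "device_no": 12345,
  "timestamp": "0000001",
  "readings": {"rpm": 3500, "temp": 120, "start_time": "8/1/17:00:00"}
}
"""


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Recursively flatten nested objects into dotted column names."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


row = flatten(json.loads(RECORD))
columns = sorted(row)
```

After flattening, `row["readings.rpm"]` and `row["readings.temp"]` can be plotted like any other column.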
40. Arcadia Data. Proprietary and Confidential
Three Typical Streaming Capabilities
1. Alert response
• A real-time machine learning or alerting system notices a situation and issues an alert or incident for a subject matter expert to investigate.
• The user may want a real-time dashboard about what happened, e.g., cybersecurity, healthcare monitoring, etc.
2. Pivot from historic forensic analysis into real time
• An end user is looking through deep historic information with traditional OLAP techniques and finds something interesting.
• They then want to pivot into a real-time view of the data to test their theory, e.g., a misbehaving device, a bad marketing campaign, fraud at an ATM, etc.
41. Arcadia Data. Proprietary and Confidential
Three Typical Streaming Capabilities (cont.)
3. Stream data enrichment
• Join stream data with existing table data to add more information.
• E.g., join “machine_id” in stream and table to get all data about the machine.
Kafka stream:
machine_id: 123
temp: 125
timestamp: 0:00:00
Lookup table:
machine_id: 123
location: Building 10
manufacturer: Acme
model: 8800
Joined result:
machine_id: 123
temp: 125
timestamp: 0:00:00
location: Building 10
manufacturer: Acme
model: 8800
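The enrichment join can be sketched in a few lines of Python, using the values from the slide. The `enrich` function and the in-memory `LOOKUP` dict are hypothetical stand-ins for the stream-table join; in practice the join would be expressed in the streaming SQL layer rather than in client code.

```python
from typing import Any, Dict, Iterable, Iterator

# Lookup table keyed by machine_id (values taken from the slide).
LOOKUP: Dict[int, Dict[str, Any]] = {
    123: {"location": "Building 10", "manufacturer": "Acme", "model": "8800"},
}


def enrich(
    stream: Iterable[Dict[str, Any]],
    lookup: Dict[int, Dict[str, Any]],
) -> Iterator[Dict[str, Any]]:
    """Join each stream event with its lookup row on machine_id."""
    for event in stream:
        extra = lookup.get(event["machine_id"], {})
        # Events with no matching row pass through unchanged (left join).
        yield {**event, **extra}


events = [
    {"machine_id": 123, "temp": 125, "timestamp": "0:00:00"},
    {"machine_id": 999, "temp": 80, "timestamp": "0:00:01"},
]
enriched = list(enrich(events, LOOKUP))
```

The left-join behavior for unknown machines is a design choice for this sketch; dropping unmatched events (an inner join) would be equally reasonable depending on the use case.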
42. Example Big Data Application Areas
Customer Intelligence
● Customer 360
● Click-stream analysis
● Campaign management
IoT Analytics
● Data center monitoring
● Network performance optimization
● Predictive maintenance
Cybersecurity
● Incident response
● Forensic analysis
● Greenfield threat hunting
Financial Services Regulatory Compliance
● Cross-organizational model validation
● Stress test evaluation
● Fundamental review of trading book (FRTB)
● Trade surveillance
43. Arcadia Data. Proprietary and Confidential
Tip #2: Scale to 100s of Users with Smart Acceleration
Smart Acceleration™
1. Start with exploration of raw data; there is no need to determine the design of acceleration structures such as cubes ahead of time
2. Recommendation engine generates Analytical Views (AVs), which are derived forms of raw data, based on dynamic data usage
3. Re-routes data queries to AVs transparently, providing automated acceleration when needed for production/high-concurrency uses
Automatically modeled and maintained within data platform
Keep logical data models simple without needing to target specific data cube structures
Diagram labels: Modern Data Platform; Consumption Layer; Processing Layer; Queries; Queries automatically redirected; Results (100x Faster); Recommendation Engine; Analytical Views (Stores Derived Forms of Raw Data in File System); Raw Data Storage
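The idea of transparently redirecting queries to Analytical Views can be sketched as a small routing function in Python. Everything here is an assumption for illustration: the `AnalyticalView` and `Query` types, the `route` function, the table names, and the "fewest dimensions wins" heuristic are invented for this example and do not describe Arcadia Data's actual implementation.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class AnalyticalView:
    name: str
    dimensions: FrozenSet[str]  # columns the view is grouped by
    measures: FrozenSet[str]    # additive aggregates, so roll-ups stay valid


@dataclass(frozen=True)
class Query:
    group_by: FrozenSet[str]
    measures: FrozenSet[str]


def route(query: Query, views: List[AnalyticalView],
          raw_table: str = "raw_events") -> str:
    """Pick a view that can answer the query, else fall back to raw data."""
    candidates = [
        view for view in views
        if query.group_by <= view.dimensions and query.measures <= view.measures
    ]
    if not candidates:
        return raw_table
    # Assumed heuristic: fewer dimensions means a smaller, faster view.
    return min(candidates, key=lambda view: len(view.dimensions)).name


views = [
    AnalyticalView("av_device_day", frozenset({"device", "day"}),
                   frozenset({"event_count", "sum_temp"})),
    AnalyticalView("av_device", frozenset({"device"}),
                   frozenset({"event_count", "sum_temp"})),
]

by_device = route(Query(frozenset({"device"}), frozenset({"event_count"})), views)
by_day = route(Query(frozenset({"device", "day"}), frozenset({"sum_temp"})), views)
by_site = route(Query(frozenset({"location"}), frozenset({"event_count"})), views)
```

The caller never names a view: it describes the grouping and measures it needs, and the router decides whether a precomputed view or the raw data serves the request.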
44. Arcadia Data. Proprietary and Confidential
Tip #3: Instant Visuals -- Analytical Recommendations
Select data fields, then one click…
Visualization Builder
Recommended Visualizations shows which visuals best represent your data.
46. Arcadia Data. Proprietary and Confidential
Summary: Accelerate Value from Data Lakes
1. Move beyond batch: enable LIVE, real-time analytics
(… and address business problems requiring both real-time and historical analysis)
2. Provide direct, interactive visual analysis to 100s of users
3. Let the data do the talking: machine-assisted insights
47. Q&A & Next Steps
Learn More – Resource Center
https://www.arcadiadata.com/resources
Try Arcadia Instant – Free Download
www.arcadiadata.com/Instant
Read our Blog:
https://www.arcadiadata.com/blog/
Follow Arcadia on Social:
@arcadiadata
See Arcadia in Action: