Streaming Goes Mainstream: New Architecture & Emerging Technologies for Stream Transport and Processing

  1. Streaming Goes Mainstream: Transport, Processing & Architecture. Ellen Friedman, 12 October 2016, Women in Big Data Meetup, #datawomen. © 2016 MapR Technologies
  2. Contact Information: Ellen Friedman, Solutions Consultant, MapR Technologies; Committer, Apache Drill & Apache Mahout projects; Author, O’Reilly short books. Email: ellenf@apache.org, efriedman@maprtech.com. Twitter: @Ellen_Friedman, #datawomen
  3. Please support women in tech – help build girls’ dreams of what they can accomplish (© Ellen Friedman 2015)
  4. The entire industry is undergoing a career change
  5. Big Data has caught on
     • Potential value of big data approaches is widely recognized
     • Technologies for distributed storage at low cost are maturing
     • People are looking for operational and analytical solutions in order to take advantage of large scale data opportunities…
     • Now there’s a new form of revolution based on streaming data
  6. Why stream?
  7. “Our best understanding comes when our conclusions fit the evidence. And that is most effectively done when our analyses fit the way life happens.” (Introduction to Apache Flink, Friedman & Tzoumas, O’Reilly, Sept 2016)
  8. Life doesn’t happen in batches…
  9. Time Series Data & the IoT. Sensors in airplanes not only send data to the ERD (black box); they also report back to manufacturers of “smart parts” such as turbines found in jet engines or wind farms. (Images © Friedman & Dunning from O’Reilly book A New Look at Anomaly Detection, used with permission)
  10. Big data project: Maury’s Wind and Currents charts
     - Value from big data in aggregate
     - Crowd sourced
     - But static: not real time insights
  11. Modern big data navigation: WAZE
     • Uses real-time streaming traffic & road information shared by 65 million drivers/month
     • Intended to save fuel and time during commute
     • Partnered with Esri GSI software to help put data insights to work for cities, states (11 Oct 2016 article in TechCrunch: http://bit.ly/tech-crunch-waze-esri )
     • Time-value of data often is important
     “Outsmarting traffic, together” (WAZE website: https://www.waze.com/ )
  12. Crowd-sourced Traffic. Streaming sensor data + long term maintenance histories!
     • Machine learning model detects anomalous pattern
     • Signals need for maintenance before damage occurs
     (Image courtesy Mtell; from Real World Hadoop by Dunning & Friedman, © 2015, Chap 6)
  13. Streaming is mainstream
  14. Web-based Business
     A: Real-time insights from low latency applications (update a real-time dashboard)
     B: Current status updated in databases or search documents (Customer 360)
     C: Durable messages for auditable history (Security analytics)
     (Diagram labels: Messages, Logs, Real-time dashboards, Customer 360 database, Security analytics, Archived data, A, B, C)
  15. Web-based Business
     A: Real-time insights from low latency applications (update a real-time dashboard)
     B: Current status updated in databases or search documents (Customer 360)
     C: Durable messages for auditable history (Security analytics)
     (Diagram labels: Messages, Logs, Real-time dashboards, Customer 360 database, Security analytics, Archived data, A, B, C)
  16. Streaming data has value beyond real-time insights
  17. Web-based Business
     A: Real-time insights from low latency applications (update a real-time dashboard)
     B: Current status updated in databases or search documents (Customer 360)
     C: Durable messages for auditable history (Security analytics)
     (Diagram labels: Messages, Logs, Real-time dashboards, Customer 360 database, Security analytics, Archived data, A, B, C)
  18. At the heart of an effective streaming architecture is the right choice of stream transport.
  19. Message Stream Transport: Apache Kafka or MapR Streams; Others
  20. Key capabilities. Message Transport Technology: Kafka & MapR Streams
     ● Highly scalable
     ● High throughput, low latency
     ● Decouple multiple producers & consumers
     ● Durable messages with configurable time to live
     ● Geo-distributed replication (MapR Streams)
     (Diagram labels: Producer, Producer, Messages, Consumer group, Consumer group, Consumer group)
  21. Alert: Pre-conceptions can make you miss new ideas
     • It’s hard to order a coffee if you want mostly milk
     • Example: MapR Streams is part of the converged data platform so does not require a separate cluster for message transport (as you would with Kafka)
     • Example: Message streams can support microservices
     “Getting Past Pre-conceptions”: http://bit.ly/mapr-blog-ef-17-08
  22. MapR Streams: Topics, Partitions
     • Data is assigned to topics (as in Kafka)
     • Topic can be partitioned for load balancing/performance (as in Kafka)
     • Topic partition is distributed across the MapR cluster (not restricted to one node as in Kafka)
        – Makes long-term auditable history practical
     (Diagram labels: Producer 1, Producer 2, Topic 1, Consumer 1, Consumer 2, Consumer 3, Consumer group)
  23. Stream-first Architecture: Basis for MicroServices. Stream as the shared “truth” instead of a database; database as local truth. (Diagram labels: POS 1..n, Fraud detector, Last card use, Updater, Card analytics, Other card activity)
  24. MapR Streams: Part of MapR Converged Data Platform. MapR Converged Data Platform has distributed files, NoSQL DB & message streams engineered into one technology. (Diagram labels: Open Source Engines & Tools; Commercial Engines & Applications; Cloud & Managed Services; Custom Apps; Search & Others; Data Processing; Enterprise Storage; Database; Event Streaming; MapR-FS; MapR-DB; MapR Streams; Utility-Grade Platform Services; Global Namespace; High Availability; Data Protection; Self-healing; Unified Security; Real-time; Multi-tenancy; Unified Management and Monitoring)
  25. Unique to MapR: Manage topics at Stream level
     • Topics are grouped together in a Stream (different from Kafka)
     • Policies such as time-to-live and ACEs are set at the Stream level (controlled access at this level is different from Kafka)
     • Geo-distributed replication at Stream level (different from Kafka)
     (Diagram labels: Stream, Topic 1, Topic 2, Topic 3)
  26. MapR Streams: Geo-distributed replication of message stream across data centers
  27. Multiple Stakeholders: Container Shipping. Over 20% of world’s shipping containers pass through Singapore’s port. (Image © Ellen Friedman 2015)
  28. MapR Streams replication across data centers
     A: Sensors stream data to on-board cluster that reports to onshore cluster while in port
     B: MapR Streams geo-replication sends data to next port before ship arrives
     C: Real-time insights alert to “high humidity” in some containers
     (Diagram labels: Singapore, Tokyo, Sydney, Corporate HQ, A, B, C)
     Find details on this use case in Chap 7 of book “Streaming Architecture”. Read online here: http://bit.ly/streams-ebook-ch7
  29. MapR Streams: Replication Across Data Centers. What’s the value?
     – Replication across data centers with preserved offsets (unlike Kafka)
     – Opens new use cases
     – Example: Shared inventory, as with ad-tech use case
     (Diagram labels: Data center 1, Data center 2, Central data center, Inventory model, Local state, Global analytics, Database)
  30. What about stream processing?
  31. Several good choices for stream processing
     • You choose the tool you like for processing streaming data
        – MapR ships & supports the full Apache Spark stack including Spark Streaming
        – Apache Flink has been benchmarked on MapR with extremely good performance on MapR Streams transport; Flink not yet supported by MapR
        – Other good options include Apache Apex (think Data Torrent) & Apache Storm
  32. Overview: Apache Flink Stream Processing. (Diagram labels: Transport: Kafka / MapR Streams; Processing: Flink; Database; File.) Figure 2-1 from “Introduction to Apache Flink” book, used with permission. Download free pdf here: http://bit.ly/mapr-intro-flink-book-pdf
  33. Overview: Apache Flink
     • Top level Apache project with big international OSS community
     • True stream processing
        – Advantage if SLAs require extremely low latency (real-time)
        – Good fit to continuous events
     • Also works well for batch processing
     • Being used in production (telecom; games)
  34. Flink is BIG in Europe ;-)
  35. Stream Processing: Compare Choices
     “Real-time” event-by-event processing:
     • Apache Flink
     • Apache Apex
     • Apache Storm
     Not “real-time” processing: micro-batching
     • Apache Spark Streaming
     But latency is just one issue to consider in choosing a stream processing technology…
  36. Capabilities for Stream Processing Options. (Figure labels: Correct under stress; Correct time / window semantics; Ease of use / expressiveness; High throughput; Low latency; Flink; Spark Streaming; Storm.) Figure 1-2 from “Introduction to Apache Flink” book, used with permission. Download free pdf here: http://bit.ly/mapr-intro-flink-book-pdf
  37. Overview: Apache Flink Windowing
     Before: Windows defined by micro-batches (not Flink). (Figure labels: A, B, C)
     Now: Windows defined by the gap between activity (this is Flink). (Figure labels: A, B, C, Gap)
     Figures 3-1 and 3-2 from “Introduction to Apache Flink” book, used with permission. Download free pdf here: http://bit.ly/mapr-intro-flink-book-pdf
  38. Overview: Apache Flink Event Time. Computation can be based on when data is processed (processing time) or on when the event occurred (event time). In many situations, processing by event time provides more accurate results. Figure 3-3 from “Introduction to Apache Flink” book, used with permission.
  39. Overview: Apache Flink Event Time. When you analyze data by event time, you must take into account that events may arrive delayed or out of order. This is important for use cases in which you want to correlate events. Stephan Ewen, Apache Flink PMC Committer, explains the event time processing option for Flink in a Whiteboard Walkthrough video: http://bit.ly/mapr-whiteboard-walkthrough-flink-event-time
  40. Apache Flink: Useful Characteristics
     • Stateful processing & accuracy under stress: Checkpoints
     • Windowing options are a good fit to the way natural sessions occur
     • Event time option for accurate computation
        – See Whiteboard Walkthrough video by Stephan Ewen (PMC member Apache Flink) on event time: http://bit.ly/mapr-whiteboard-walkthrough-flink-event-time
     • Savepoints let you reprocess data (bug fixes, updates, etc)
        – See Whiteboard Walkthrough video by Stephan Ewen on Flink savepoints: http://bit.ly/whiteboard-walkthrough-flink-1
  41. Streaming Resources from MapR (thank you). Free resource from MapR: book on Apache Spark
     Download free pdf courtesy of MapR Technologies: http://bit.ly/mapr-apache-spark-book-pdf
     Or read online: http://bit.ly/mapr-apache-spark-ebook
  42. Streaming Resources from MapR (thank you). Free resource from MapR: book on stream-1st architecture & message transport
     Download free pdf courtesy of MapR Technologies: http://bit.ly/mapr-streams-ebook
     Or read online: http://bit.ly/mapr-streaming-data-ebook
  43. Streaming Resources from MapR (thank you). Free resource from MapR: book on Apache Flink stream processing
     Download free pdf courtesy of MapR Technologies: http://bit.ly/mapr-intro-flink-book-pdf
     Or read online: (coming soon)
     Introduction to Apache Flink: Stream Processing for Real Time and Beyond, new ebook by Ellen Friedman and Kostas Tzoumas. In this book you’ll learn:
     · What Apache Flink can do
     · How it maintains consistency and provides flexibility
     · How people are using it, including in production
     · Best practices for streaming architectures
     Download your copy: mapr.com/flink-book
  44. Short Books by Ted Dunning & Ellen Friedman. For sale from Amazon or O’Reilly. Free pdf download courtesy of MapR: www.mapr.com/ebook
     http://bit.ly/ebook-real-world-hadoop
     http://bit.ly/mapr-tsdb-ebook
     http://bit.ly/ebook-anomaly
     http://bit.ly/recommendation-ebook
     http://bit.ly/mapr-ebook-sharing-data
  45. Please support women in tech – help build girls’ dreams of what they can accomplish (© Ellen Friedman 2015)
  46. Thank you!
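
Slides 20 and 22 describe topics split into partitions, with multiple producers and consumer groups decoupled from one another. A minimal in-memory sketch of that idea follows. It is not the Kafka or MapR Streams client API: the `Topic` and `ConsumerGroup` classes, the CRC32-based partition choice, and the card keys are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict


class Topic:
    """In-memory stand-in for a partitioned topic (hypothetical, not a real client)."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def send(self, key, value):
        # The same key always maps to the same partition, so per-key order is kept.
        p = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p


class ConsumerGroup:
    """Tracks its own read position per partition, independent of other groups."""

    def __init__(self, topic):
        self.topic = topic
        self.offsets = defaultdict(int)  # partition index -> next offset to read

    def poll(self):
        # Return every message this group has not seen yet, then advance offsets.
        batch = []
        for p, log in enumerate(self.topic.partitions):
            batch.extend(log[self.offsets[p]:])
            self.offsets[p] = len(log)
        return batch


topic = Topic(num_partitions=3)
topic.send("card-17", "swipe-0")
topic.send("card-42", "swipe-0")

dashboard = ConsumerGroup(topic)  # e.g. a low-latency consumer
audit = ConsumerGroup(topic)      # e.g. a consumer that reads later

print(len(dashboard.poll()))  # the dashboard group reads both messages now
topic.send("card-17", "swipe-1")
print(len(audit.poll()))      # the audit group still sees all three messages
```

Because the messages stay in the log and each group keeps its own offsets, a slow or late consumer does not affect a fast one, which is the decoupling the slides refer to.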
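
Slides 37 to 39 contrast windows cut at fixed intervals with windows defined by a gap in activity, computed on event time even when events arrive out of order. The sketch below illustrates the difference in plain Python. It is not the Flink API; the function names, the integer timestamps, and the choice to sort a finite list (instead of handling late data with watermarks, as a real stream processor would) are assumptions for illustration.

```python
from itertools import groupby


def fixed_windows(event_times, size):
    """Group event times into fixed, aligned windows of `size` time units."""
    ordered = sorted(event_times)
    return [list(group) for _, group in groupby(ordered, key=lambda t: t // size)]


def session_windows(event_times, gap):
    """Group event times into sessions separated by more than `gap` of inactivity."""
    sessions = []
    for t in sorted(event_times):  # order by event time, not by arrival order
        if sessions and t - sessions[-1][-1] <= gap:
            sessions[-1].append(t)  # still within the current burst of activity
        else:
            sessions.append([t])    # inactivity exceeded the gap: new session
    return sessions


# Event timestamps listed in arrival order; some arrive out of order.
arrivals = [4, 1, 11, 9, 33, 30]

print(fixed_windows(arrivals, size=10))  # [[1, 4, 9], [11], [30, 33]]
print(session_windows(arrivals, gap=5))  # [[1, 4, 9, 11], [30, 33]]
```

The fixed windows split the first burst of activity at the boundary between 9 and 11, while the gap-based sessions keep it together, which is the point the windowing slide makes about natural sessions.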
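
Slide 29 names replication with preserved offsets as the source of new use cases. One way to read that claim: a consumer that stops at a given offset in one data center can resume at the same offset in another and get the same next message. The sketch below models a message log as a Python list whose index is the offset. The `replicate` and `read_from` helpers are hypothetical and say nothing about how MapR Streams implements replication.

```python
def replicate(log):
    """Copy a message log so each message keeps its original offset (list index)."""
    return list(log)


def read_from(log, offset, max_messages=2):
    """Read up to `max_messages` starting at `offset`; return the batch and next offset."""
    batch = log[offset:offset + max_messages]
    return batch, offset + len(batch)


primary = ["m0", "m1", "m2", "m3", "m4"]  # log in data center 1
replica = replicate(primary)              # log in data center 2

# Start reading in data center 1, then continue in data center 2.
first_batch, next_offset = read_from(primary, 0)
second_batch, next_offset = read_from(replica, next_offset)

print(first_batch)   # ['m0', 'm1']
print(second_batch)  # ['m2', 'm3']
```

No message is skipped or read twice across the switch, because the offset means the same thing in both copies.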
