Ted Dunning - Keynote: How Can We Take Flink Forward?

http://flink-forward.org/kb_sessions/keynote-tba/

Apache Flink has come a long way from its academic beginnings. It is now one of the most technically advanced solutions for streaming computation. And batch computation, too. Flink has serious technical advantages when compared with nearly every alternative system.

This success ironically means that Apache Flink is right on the cusp of a critical moment. Over the next few months it will be decided whether Flink is the Next Big Thing or if it is a fine technology with limited impact.

Right now, what you and I do can make a huge difference. But as business people like to say, what got Flink here isn’t what’s going to get it there. The challenges the Flink community faces now are different from the technical challenges it has met so far.

I will talk about what I think will help and how we can all pitch in to take Flink forward.

  1. © 2014 MapR Technologies
  2. Me, Us • Ted Dunning, MapR Chief Application Architect, Apache Member – Committer and PMC member for ZooKeeper, Drill, others – Mentor for Flink, Beam (née Dataflow), Drill, Storm, Zeppelin – VP Incubator – Bought the beer at the first HUG • MapR – Produces the first converged platform for big and fast data – Includes data platform (files, streams, tables) + open source – Adds major technology for performance, HA, industry-standard APIs • Contact: @ted_dunning, ted.dunning@gmail.com, tdunning@mapr.com
  3. Note: I may need to rely on my laryngitis interpreter
  4. New book on Apache Flink • Download free PDF courtesy of MapR Technologies: mapr.com/flink-book
  5. What is happening now in computing has only happened a few times before
  6. Businesses are changing to become completely digital
  7. That is causing a complete re-implementation of the software that runs the world
  8. Comparable Events in Software • Accounting invented in Sumeria • Indic numerals (including zero) brought to Europe by Arabs • Banking by letter of credit • Open source data • Electronic automation of business processes • SQL and the relational model • The Internet • ?? Whatever it is that is happening now ??
  9. Early Accounting • Most early writing samples were accounting records • This one is from Crete and records grain inventories • Accounting is a major advance because it allows you to abstract the count of a thing from the thing
  10. Letters of Credit • Used by the Knights Templar to record deposits to be protected on crusade • Popularized by the Italian banking system in the Renaissance • Destroyed competing systems that required transfer of silver, such as the Hansa
  11. Big data project: Maury’s Wind and Currents charts • At first, nobody was interested in them… • …until Captain Jackson shaved a month off the run from Baltimore to Rio de Janeiro • Then everybody wanted one!
  12. What is it that is happening now?
  13. There is a revolution going on
  14. Companies get more value from our data than we can get from it ourselves
  15. How Much Value? Market capitalization rank on 2/12/16 (market cap in billions of US dollars)
      Rank  Symbol  Company              Market Cap
      1     AAPL    Apple                521.1
      2     GOOGL   Alphabet             485.9
      3     MSFT    Microsoft            399.4
      4     XOM     Exxon Mobil          336.8
      5     BRK-A   Berkshire Hathaway   318.7
      6     FB      Facebook             290.3
      7     JNJ     Johnson & Johnson    281.7
      8     GE      General Electric     275.4
      9     WFC     Wells Fargo          240.9
      10    AMZN    Amazon.com           238.8
  18. Data has value in the aggregate and in the moment
  19. But we can’t aggregate it ourselves, nor pass it to each other
  20. But we can’t aggregate it ourselves, nor pass it to each other • It’s big
  21. What’s Going On? • Revolution in computing A – Big data just works better • Revolution in computing B – The database is not the core • Change in social structure • Change in computing technology – Big three replatforming events (SQL, Internet, streams) • What does it mean to us?
  22. Revolution A: Big is better
  23. More Data Beats Better Algorithms, ish • [Chart from Banko and Brill, 2001, “Scaling to Very Very Large Corpora for Natural Language Disambiguation”] • Increasing the data size has a much bigger effect than changing the algorithm • Does not imply big and stupid is best • Big and smart is better
  24. Examples of Big Data Advantage • Credit card fraud detection – Data consortium wins therefore data consortium wins • Speech recognition – Siri and others • Image analysis – Can you identify which of 120 breeds of dog are in the picture? – Real applications coming – Facebook tagging just the start • Digital marketing – Google’s non-ad
  25. Revolution B: How to build big systems
  26. Evolution Beyond Massive Monolithic Systems • In monoliths, the complexity of mainframe systems led to specialization – Storage – DB – Systems analysis – Programmers – Operations – Data entry • This made n-tier architectures a natural next step
  27. 3-tier Architecture [diagram: Web tier, Middle tier, Data tier]
  28. 3-tier Architecture (essence) [diagram: Web tier, Middle tier, Data tier]
  29. 3-tier, in Practice [diagram: four separate stacks, each with a Web tier, Middle tier, and Data tier]
  30. Enter micro-services
  31. Start with Service Partitioning [diagram: three services, each with an RPC layer, logic, and disk]
  33. Make Systems Opaque [diagram: three services, each with an RPC layer, logic, and disk]
  34. Give Them a Job, and a Way to Communicate • Keep it very light-weight!
  35. This is called micro-services
  36. Results Can Be Stunning • Companies who adopted this style are associated with stunning success – Google, Facebook, Netflix (after DVD mail), Amazon, LinkedIn (v. 2) – And a gazillion less well known companies • Companies that did not are associated with … • Of course, this may just be what happens when you hire smart folk – Correlation, causation, et cetera
  37. But … • Much of the discussion talks about RPC (call/response) services • This is fine, but limiting • Key idiom is deferred processing – Do something urgently – Queue message to complete later
  38. Who Has the Ball? [diagram: Sender, Receiver] Sender wants to send a message
  39. Who Has the Ball? [diagram: Sender, Receiver] But the receiver might be indisposed for the moment
  40. Who Has the Ball? [diagram: Sender, Receiver] After sending, the sender may exit
  41. Who Has the Ball? [diagram: Sender, Receiver] The receiver has returned, but who has the message?
  42. Who Has the Ball? [diagram: Sender, Receiver] The message queue must retain the message
  43. For Message-Based Services • We need a persistent queue • The number of messages is plausibly very high – Total number of external requests (x 5-10) – Total number of persistence ops (x 2-3) • Millions of messages, GB/s of traffic quite plausible • Moving this to enterprise from startups adds challenges
  44. Summary • Micro-services require durable, high-performance message queues • These systems don’t just like durable, high-performance queues • These systems require durability. And high performance. • Old school queues need not apply
  45. Streaming data is different
  46. [Diagram: Input and Output timelines, annotated Δt and t_provisional] Note that the existence of provisional outputs implies we have to handle provisional inputs as well
  47. More Complications • Our latency isn’t the only story • We don’t get data instantly • So we don’t even start with zero latency • In fact, delay is the key problem in flow-based computing
  48. Thought Problem • What is the temperature everywhere on earth – Right now – This is impossible • What was the temperature everywhere on earth an hour ago? – This is hard • What was the temperature everywhere on earth last month? – This is pretty easy • Does this mean we cannot talk about today’s weather?
  49. The Problem of State • The present temperature of Earth may or may not exist • Only the delayed temperature can matter to a practical computation • But computations in different places will see different delays • (promise me you know that I’m not just talking temperature)
  50. Summary • For important problems, we have to represent distributed computations as messages and flows • This isn’t a matter of convenience • The concept of “now” is either dead or dying
  51. Getting stuff done in the real world
  52. Looking forward
  53. [Diagram: real-time processing [1] with log-synth, sort by time, replay, real-time tick, explode [2], by_sender and by_recipient streams, query by sender, query by recipient, and a by_sender replica for off-line purposes; rate labels 300k/s, 300k/s, and 3M/s; time and timemark annotations]
  54. Looking backwards
  55. [Diagram: Web-site, Auth service, Upload service, Image extractor, Transcoder, User profiles, Search, User action logging, and Recommendation analysis, backed by several mySQL databases, Oracle, Solr, Elastic, files, and Video metadata]
  57. [Diagram, zoomed in: Upload service, Image extractor, and Transcoder with two mySQL stores, files, and Video metadata]
  58. [Diagram: Upload service, Thumbnail extractor, and Transcoder connected by the uploads, thumbs, and recodes flows, with file storage at each step and video adds feeding Video metadata]
  59. Micro-service Diagram [diagram: Upload service, Thumbnail extractor, Transcoder, and Video metadata linked by the uploads, thumbs, and recodes flows; storage for raw files, image files, and video files]
  60. Real World Implications • Messaging must be durable and infrastructural – Can’t depend on sender or receiver actually running • Messages aren’t great for everything – 1TB message? • We need (scalable) files • We need (scalable) tables • We need (scalable) streams • We still should isolate persistence if possible
  61. The Third Replatforming • From 1970-1995 … relational database • From 1991-2005 … Internet • From 2005-? … flow-based, streaming computing
  62. Where does this go?
  63. General Questions to Ponder • What are the consequences of listening to customers? – Really listening? • We are willing to pay people to listen to us – Did we want that? Are the fears rational? • Will more data, better algorithms lead to a “cuddly” internet?
  64. Will Flink be at the core of this revolution?
  65. Will Flink be at the core of this revolution? It could be
  66. Will Flink be at the core of this revolution? It could be • Or not
  67. It really depends on us • Everyone here • How can we drive adoption?
  68. The Lessons • Flink was built for the future • It is right in the core of these changes happening now • But what got Flink here isn’t enough to get it there • Large-scale production adoption is the key
  69. New book on Apache Flink • Download free PDF courtesy of MapR Technologies: mapr.com/flink-book
  70. Streaming Architecture by Ted Dunning and Ellen Friedman © 2016 (published by O’Reilly) • Free signed hard copies at MapR booth at Flink Forward • http://bit.ly/mapr-ebook-streams
  71. Short Books by Ted Dunning & Ellen Friedman • Published by O’Reilly in 2014 - 2016 • For sale from Amazon or O’Reilly • Free e-books currently available courtesy of MapR • Download PDFs: mapr.com/ebooks-pdf
  72. Thank You!
  73. Q&A • @mapr • maprtech • tdunning@maprtech.com • Engage with us! MapR maprtech mapr-technologies
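Slide 37 names deferred processing as the key idiom: do something urgently, then queue a message to complete the rest later. A minimal Python sketch of that idiom follows, assuming a single process. `handle_upload`, `worker`, and the `pending` queue are hypothetical names, and the standard-library `queue.Queue` is only an in-memory stand-in for the durable queue the talk says is actually required.

```python
import queue
import threading

# Hypothetical in-process stand-in for the durable queue the talk calls for;
# a queue.Queue loses its contents if the process dies.
pending = queue.Queue()
completed = []


def handle_upload(video_id):
    """Do the urgent part now and defer the slow part (names are illustrative)."""
    receipt = f"accepted {video_id}"   # urgent: answer the caller immediately
    pending.put(video_id)              # deferred: finish the work later
    return receipt


def worker():
    """Drain deferred work until a None sentinel arrives."""
    while True:
        video_id = pending.get()
        if video_id is None:
            break
        completed.append(f"transcoded {video_id}")


if __name__ == "__main__":
    t = threading.Thread(target=worker)
    t.start()
    print(handle_upload("v1"))  # accepted v1
    print(handle_upload("v2"))  # accepted v2
    pending.put(None)           # ask the worker to stop after the backlog
    t.join()
    print(completed)            # ['transcoded v1', 'transcoded v2']
```

The caller gets its answer as soon as the message is queued; how quickly the slow work finishes becomes the worker's problem, which is exactly why the queue itself has to be reliable.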
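Slides 38 to 42 argue that once the sender may exit and the receiver may be away, only the queue can hold the message. The sketch below shows that property with an append-only file and per-reader offsets. `FileLog` is a hypothetical class, not MapR Streams, Kafka, or any Flink API, and it ignores concurrent writers, torn writes, and retention.

```python
import json
import os
import tempfile


class FileLog:
    """Hypothetical append-only log file used as a minimal durable queue.

    Messages live on disk, so they outlive the sender process; each reader
    keeps its own offset, so a receiver that was away can resume later.
    Not safe for concurrent writers; a sketch only.
    """

    def __init__(self, path):
        self.path = path

    def append(self, message):
        # One JSON record per line, flushed and fsync'ed before returning.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(message) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def read_from(self, offset):
        """Return (records at or after `offset`, next offset to read from)."""
        if not os.path.exists(self.path):
            return [], offset
        with open(self.path, encoding="utf-8") as f:
            lines = f.readlines()
        records = [json.loads(line) for line in lines[offset:]]
        return records, offset + len(records)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "messages.log")
    FileLog(path).append({"body": "hello"})    # the sender can now exit
    msgs, offset = FileLog(path).read_from(0)  # a receiver shows up later
    print(msgs, offset)                        # [{'body': 'hello'}] 1
```

Keeping the offset with the reader rather than deleting messages on delivery is what lets several independent receivers share one log.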
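Slide 43's multipliers can be turned into a rough sizing estimate. The sketch assumes one reading of the slide: each external request fans out into 5 to 10 internal messages, each persistence operation into 2 to 3, and the two contributions add. The input rates and the 1,000-byte message size are hypothetical.

```python
def message_rate(requests_per_s, persistence_ops_per_s,
                 request_fanout=(5, 10), persistence_fanout=(2, 3)):
    """Estimate (low, high) internal messages per second.

    Assumption: the slide's "x 5-10" and "x 2-3" are per-request and
    per-persistence-op multipliers whose contributions add together.
    """
    low = (requests_per_s * request_fanout[0]
           + persistence_ops_per_s * persistence_fanout[0])
    high = (requests_per_s * request_fanout[1]
            + persistence_ops_per_s * persistence_fanout[1])
    return low, high


def bandwidth_gb_per_s(messages_per_s, bytes_per_message=1_000):
    """Convert a message rate to GB/s (1 GB = 10**9 bytes) at a fixed size."""
    return messages_per_s * bytes_per_message / 1e9


if __name__ == "__main__":
    low, high = message_rate(100_000, 200_000)
    print(low, high)                                         # 900000 1600000
    print(bandwidth_gb_per_s(low), bandwidth_gb_per_s(high))  # 0.9 1.6
```

With these made-up inputs the estimate lands at 0.9 to 1.6 million messages per second and roughly 0.9 to 1.6 GB/s, the "millions of messages, GB/s of traffic" regime the slide describes.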
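Slide 46 notes that provisional outputs force downstream stages to handle provisional inputs. The sketch below illustrates why. A hypothetical upstream operator emits a running per-window count after every event; a consumer that treats each update as a replacement gets the right answer, while one that adds updates together double counts. The function names and the `(window_start, count, final)` record format are assumptions for illustration.

```python
from collections import defaultdict


def upstream_counts(events, window_size):
    """Hypothetical operator: count events per fixed window.

    Emits a provisional (window_start, count, False) after every event,
    then a final (window_start, count, True) per window once input ends.
    """
    counts = defaultdict(int)
    for t in events:
        w = t - t % window_size
        counts[w] += 1
        yield (w, counts[w], False)   # provisional output
    for w in sorted(counts):
        yield (w, counts[w], True)    # final output


def downstream_view(updates):
    """Correct handling: each update replaces the previous value for its window."""
    view = {}
    for w, count, final in updates:
        view[w] = (count, final)
    return view


def naive_sum(updates):
    """Wrong handling: treats every provisional update as new data and adds it."""
    totals = defaultdict(int)
    for w, count, final in updates:
        if not final:
            totals[w] += count
    return dict(totals)


if __name__ == "__main__":
    updates = list(upstream_counts([1, 3, 12, 4], window_size=10))
    print(downstream_view(updates))  # {0: (3, True), 10: (1, True)}
    print(naive_sum(updates))        # {0: 6, 10: 1}
```

Upsert-by-key is only one way to absorb revisions; retraction messages that cancel an earlier value are another common choice.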
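Slides 47 to 50 argue that we only ever see delayed data, so a result about a time period can be finalized only after enough delay has passed. One common way to encode that, similar in spirit to the event-time watermarks Flink uses, is sketched below under an assumed bound: no reading arrives more than `max_delay` behind the newest event time seen so far. `finalized_averages` is a hypothetical function, and dropping too-late readings is one policy choice among several.

```python
def finalized_averages(readings, window_size, max_delay):
    """Average temperature per event-time window, emitted only when complete.

    readings: (event_time, temperature) pairs in arrival order.
    Assumption: readings lag the newest event time by at most max_delay,
    so watermark = newest event time - max_delay.
    """
    open_windows = {}            # window start -> temperatures so far
    results = []
    watermark = float("-inf")
    for t, temp in readings:
        w = t - t % window_size
        if w + window_size <= watermark:
            continue             # window already finalized: drop late reading
        open_windows.setdefault(w, []).append(temp)
        watermark = max(watermark, t - max_delay)
        # Finalize every window that ends at or before the watermark.
        for start in sorted(open_windows):
            if start + window_size <= watermark:
                temps = open_windows.pop(start)
                results.append((start, sum(temps) / len(temps)))
    return results


if __name__ == "__main__":
    readings = [(1, 20.0), (4, 22.0), (12, 18.0), (3, 21.0),
                (17, 19.0), (26, 25.0), (2, 30.0)]
    print(finalized_averages(readings, window_size=10, max_delay=5))
    # [(0, 21.0), (10, 18.5)]
```

On this sample, the out-of-order reading at time 3 still counts, the reading at time 2 arrives after its window closed and is dropped, and window [20, 30) is never reported because the watermark has not passed its end: the stream's "now" is still provisional.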
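Slides 58 and 59 redraw the upload pipeline as services connected by the uploads, thumbs, and recodes flows. The sketch below models those flows as in-memory append-only topics where each consumer group keeps its own offset, so the thumbnail extractor and transcoder read the same uploads independently and at their own pace. `Topics` and the service functions are hypothetical, and file storage is reduced to path strings.

```python
from collections import defaultdict


class Topics:
    """Hypothetical in-memory stand-in for durable streams: each topic is an
    append-only list, and each consumer group keeps its own read offset."""

    def __init__(self):
        self.logs = defaultdict(list)
        self.offsets = defaultdict(int)   # (topic, group) -> next index

    def publish(self, topic, msg):
        self.logs[topic].append(msg)

    def poll(self, topic, group):
        """Return messages this group has not yet seen and advance its offset."""
        start = self.offsets[(topic, group)]
        batch = self.logs[topic][start:]
        self.offsets[(topic, group)] = start + len(batch)
        return batch


def upload_service(bus, video_id):
    # Store the raw file (elided) and announce it on "uploads".
    bus.publish("uploads", {"video": video_id, "raw": f"raw/{video_id}"})


def thumbnail_extractor(bus):
    for m in bus.poll("uploads", "thumbnailer"):
        bus.publish("thumbs", {"video": m["video"], "thumb": f"img/{m['video']}.jpg"})


def transcoder(bus):
    for m in bus.poll("uploads", "transcoder"):
        bus.publish("recodes", {"video": m["video"], "file": f"video/{m['video']}.mp4"})


def video_metadata(bus, table):
    # Fold both derived flows into a metadata table keyed by video id.
    for m in bus.poll("thumbs", "metadata"):
        table.setdefault(m["video"], {})["thumb"] = m["thumb"]
    for m in bus.poll("recodes", "metadata"):
        table.setdefault(m["video"], {})["file"] = m["file"]


if __name__ == "__main__":
    bus, table = Topics(), {}
    upload_service(bus, "v1")
    thumbnail_extractor(bus)   # the transcoder has not run yet...
    upload_service(bus, "v2")
    transcoder(bus)            # ...but it still sees both uploads
    thumbnail_extractor(bus)
    video_metadata(bus, table)
    print(table)
```

No service calls another directly; each one only knows the flows it reads and writes, which is what lets them be deployed, restarted, or scaled separately.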
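Slide 60 points out that messages are a poor fit for very large payloads such as a 1TB message, which is why scalable files are still needed alongside streams. A common remedy, usually called the claim-check pattern, writes the payload to a file store and sends only a small reference through the stream. The sketch assumes a local directory stands in for scalable file storage; `send_large` and `receive_large` are hypothetical helpers.

```python
import hashlib
import json
import os
import tempfile


def send_large(payload, store_dir, publish):
    """Write the payload to shared file storage and publish only a reference."""
    digest = hashlib.sha256(payload).hexdigest()
    path = os.path.join(store_dir, digest)
    with open(path, "wb") as f:
        f.write(payload)
    # The message stays small no matter how big the payload is.
    publish(json.dumps({"path": path, "sha256": digest, "size": len(payload)}))


def receive_large(message):
    """Resolve the reference and verify the content before using it."""
    ref = json.loads(message)
    with open(ref["path"], "rb") as f:
        data = f.read()
    if hashlib.sha256(data).hexdigest() != ref["sha256"]:
        raise ValueError("payload does not match its checksum")
    return data


if __name__ == "__main__":
    stream = []                  # stand-in for a message stream
    store = tempfile.mkdtemp()   # stand-in for scalable file storage
    send_large(b"x" * 1_000_000, store, stream.append)
    print(len(stream[0]), len(receive_large(stream[0])))
```

The stream still carries the ordering and durability guarantees, while the bulk bytes go to storage that is built for them.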
