Cloudera Streaming with Flink, Oct 29, 2020, Meetup London

Future of Data: London presents: Cloudera Streaming with Flink

Published in: Technology

  1. Future of Data: London presents: Cloudera Streaming with Flink. Timothy Spann, Principal DataFlow Field Engineer
  2. © 2020 Cloudera, Inc. All rights reserved. More Information. Cloudera’s Next Presentation: Validating a Jet Engine Predictive Model in a Cloud Environment. Wednesday, November 18, 2020, 11:00 AM to 12:30 PM CST. https://www.meetup.com/futureofdata-austin/events/273929240/
  3. Welcome to Future of Data - Virtual. @PaasDev https://www.meetup.com/futureofdata-princeton/ From Big Data to AI to Streaming to Containers to Cloud to Analytics to Cloud Storage to Fast Data to Machine Learning to Microservices to ...
  4. Today’s Lead: Who am I? Data in Motion Field Engineer. @PaasDev. DZone Zone Leader and Big Data MVB. Princeton NJ Future of Data Meetup. Ex-Pivotal Field Engineer. https://github.com/tspannhw https://www.datainmotion.dev/ Speaking at Flink Forward, AI Dev World, Nethope and OSS.
  5. Yes, Franz, It’s Kafka. Let’s do a metamorphosis on your data. Don’t fear changing data. You don’t need to be a brilliant writer to stream data. "Franz Kafka was a German-speaking Bohemian novelist and short-story writer, widely regarded as one of the major figures of 20th-century literature. His work fuses elements of realism and the fantastic." (Source: Wikipedia)
  6. Apache Kafka
     • Highly reliable distributed messaging system
     • Decouples applications and enables many-to-many patterns
     • Publish-subscribe semantics
     • Horizontal scalability
     • Efficient implementation that operates at speed with big data volumes
     • Organized by topic to support several use cases
     [Diagram: source systems feeding Fraud Detection, Security Systems and Real-Time Monitoring, shown as Many-To-Many Publish-Subscribe through Kafka and as Point-To-Point Request-Response]
  7. Weather Streaming Pipeline [Diagram labels: Weather, Climate, Aggregates, Sensors, SQL, Analytics, Sources, Pollution]
  8. Weather Streaming Pipeline
  9. Weather Streaming Pipeline
  10. https://dzone.com/articles/lets-build-a-simple-ingest-to-cloud-data-warehouse
  11. Simplifying the User Experience
  12. DATA-IN-MOTION COMPONENTS IN CONTEXT
  13. SDX ENABLES GOVERNANCE FOR THE ENTIRE DATA LIFECYCLE [Diagram labels: CDF, CDE, CML, Data Engineering Transformations, Hive Table, Fraud ML Project, Fraud Model Training, NiFi Processors, Fraud Model Running in Prod]
  14. End-to-End Streaming with Cloudera DataFlow
  15. CLOUDERA FLOW AND EDGE MANAGEMENT
      Enable easy ingestion, routing, management and delivery of any data anywhere (edge, cloud, data center) to any downstream system, with built-in end-to-end security and provenance. Advanced tooling to industrialize flow development (Flow Development Life Cycle).
      ACQUIRE
      • Over 300 Prebuilt Processors
      • Easy to build your own
      • Parse, Enrich & Apply Schema
      • Filter, Split, Merge & Route
      • Throttle & Backpressure
      FTP, SFTP, HL7, UDP, XML, HTTP, EMAIL, HTML, IMAGE, SYSLOG
      PROCESS
      HASH, MERGE, EXTRACT, DUPLICATE, SPLIT, ENCRYPT, TAIL, EVALUATE, EXECUTE, GEOENRICH, SCAN, REPLACE, TRANSLATE, CONVERT, ROUTE TEXT, ROUTE CONTENT, ROUTE CONTEXT, ROUTE RATE, DISTRIBUTE LOAD
      DELIVER
      • Guaranteed Delivery
      • Full data provenance from acquisition to delivery
      • Diverse, Non-Traditional Sources
      • Eco-system integration
      FTP, SFTP, HL7, UDP, XML, HTTP, EMAIL, HTML, IMAGE, SYSLOG
  16. Processing one billion events per second with NiFi: https://blog.cloudera.com/benchmarking-nifi-performance-and-scalability/
  17. New features delivered with Cloudera Streaming Analytics (CSA) 1.2: Next Generation Streaming Analytics
      • Flink SQL Support: Agile Streaming App Development using SQL
      • New Flink Atlas Hook: Capture operational Flink app metadata and lineage
      • Single View of Flink Yarn Jobs: Improve Developer Experience & operational visibility
  18. REFERENCES
      ● https://www.cloudera.com/tutorials/building-a-sentiment-analysis-application/3.html
      ● https://blog.cloudera.com/integrating-machine-learning-models-into-your-big-data-pipelines-in-real-time-with-no-coding/
      ● https://community.cloudera.com/t5/Community-Articles/Processing-Real-Time-Social-Media-Twitter-with-Apache-NiFi-1/ta-p/248354
      ● https://www.datainmotion.dev/2020/09/using-djlai-for-deep-learning-based.html
      ● https://github.com/tspannhw/airline-sentiment-streaming
      ● https://github.com/tspannhw/nifi-corenlp-processor
      ● https://github.com/tspannhw/nifi-cdsw
      ● https://github.com/tspannhw/nifi-djlsentimentanalysis-processor
      ● https://www.datainmotion.dev/2020/04/harnessing-data-lifecycle-for-customer.html
  19. THANK YOU
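
The Apache Kafka slide (slide 6) describes topic-based publish-subscribe, where several source systems and several consumers are decoupled through a topic. A minimal sketch of that pattern might look like the following. It assumes a toy in-process broker: `ToyBroker`, `run_demo`, the `events` topic and the message strings are hypothetical names for illustration only, and none of this is the Kafka client API.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class ToyBroker:
    """In-process stand-in for a topic-based broker (hypothetical, not the Kafka API)."""

    def __init__(self) -> None:
        # Maps each topic name to the handlers subscribed to it.
        self._handlers: Dict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[str], None]) -> None:
        """Register a consumer callback for a topic."""
        self._handlers[topic].append(handler)

    def publish(self, topic: str, message: str) -> int:
        """Deliver a message to every subscriber of the topic; return how many received it."""
        handlers = self._handlers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


def run_demo() -> Dict[str, List[str]]:
    """Three producers and three consumers share one topic (many-to-many)."""
    broker = ToyBroker()
    received: Dict[str, List[str]] = {
        "fraud_detection": [],
        "security_systems": [],
        "real_time_monitoring": [],
    }
    # Each consumer only knows the topic name, not the producers.
    for name in received:
        broker.subscribe("events", received[name].append)
    # Each producer only knows the topic name, not the consumers.
    for source in ("source_a", "source_b", "source_c"):
        broker.publish("events", f"{source}: reading")
    return received


if __name__ == "__main__":
    for consumer, messages in run_demo().items():
        print(consumer, messages)
```

In this sketch, adding a fourth consumer is one more `subscribe` call and needs no change to any producer, which is the decoupling the slide refers to. A real Kafka deployment also provides persistence, partitioning and consumer groups, none of which this toy models.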