Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Using the FLaNK Stack for edge ai (flink, nifi, kafka, kudu)

249 views

Published on

Using the FLaNK Stack for edge ai (flink, nifi, kafka, kudu)

ntroducing the FLaNK stack which combines Apache Flink, Apache NiFi, Apache Kafka and Apache Kudu to build fast applications for IoT, AI, rapid ingest.

FLaNK provides a quick set of tools to build applications at any scale for any streaming and IoT use cases.

https://www.flankstack.dev/

Tools
Apache Flink, Apache Kafka, Apache NiFi, MiNiFi, Apache MXNet, Apache Kudu, Apache Impala, Apache HDFS

References
https://www.datainmotion.dev/2019/08/rapid-iot-development-with-cloudera.html
https://www.datainmotion.dev/2019/09/powering-edge-ai-for-sensor-reading.html
https://www.datainmotion.dev/2019/05/dataworks-summit-dc-2019-report.html
https://www.datainmotion.dev/2019/03/using-raspberry-pi-3b-with-apache-nifi.html

Track
Community and Industry Impact

Published in: Technology
  • Be the first to comment

Using the FLaNK Stack for edge ai (flink, nifi, kafka, kudu)

  1. 1. Using The FLaNK Stack for Edge AI Timothy Spann Principal DataFlow Field Engineer Cloudera @PaasDev
  2. 2. 2© 2020 Cloudera, Inc. All rights reserved.
  3. 3. © 2020 Cloudera, Inc. All rights reserved. 3 FLaNK Speaker Who am I? Principal DataFlow Field Engineer @PaasDev DZone Zone Leader and Big Data MVB Future of Data Meetup Leader ex-Pivotal Field Engineer https://github.com/tspannhw https://www.datainmotion.dev/
  4. 4. © 2020 Cloudera, Inc. All rights reserved. 4 Welcome to Future of Data - Princeton - Virtual @PaasDev https://www.meetup.com/futureofdata-princeton/ From Big Data to AI to Streaming to Containers to Cloud to Analytics to Cloud Storage to Fast Data to Machine Learning to Microservices to ...
  5. 5. © 2020 Cloudera, Inc. All rights reserved. 5 Cloudera Commitment to the Community Keynote @ Flink Forward 2020 San Francisco https://www.youtube.com/watch?v=ckcOyRA6ZOc ● We support Flink deployments both in the public cloud and on premise ● 2 PMC members onboard ● Multiple service integrations: Schema Registry HBase Kudu Atlas Knox ● Regular blog posts, talks and Apache code contributions
  6. 6. 6 Analyze Streaming OLAP Analytics & Time Series Store Powered by Druid & Kudu Buffer Apache Kafka Topics Ingest Gateway Powered by Kafka Distribute Apache NiFi Data Flow Apps Powered by NiFi Buffer Apache Kafka Syndicate topics Syndicate Services Powered by Kafka Collect Syndicate topics Syndicate Services Powered by Kafka Replication / Data Deployment Analyze Streaming Analytics Apps Stream Processing Powered by Flink Demo Reference Architecture Data Collection at the Edge Apache NiFi / MiNiFi - sensors, IoT - databases - file systems - app sidecar - live streams - MQ - logs - network Anything… you name it!
  7. 7. © 2020 Cloudera, Inc. All rights reserved. 7 THE COMPLETE AND CONNECTED DATA LIFECYCLE Collect Edge & Flow Management ActData-in- Motion Curate Data Engineering Report Data Warehouse Serve Operational Database Predict Machine Learning and AI Data-at- Rest A Connected Data Lifecycle is Critical to Meet the Needs of Real-time Use CasesPOWERED BY Distribute Flow Management Buffer Streams Messaging Analyze Streaming Analytics Enrichment Operational insights ScoringBatch curation
  8. 8. © 2020 Cloudera, Inc. All rights reserved. 8 End-to-End Schema and Lineage Management
  9. 9. © 2020 Cloudera, Inc. All rights reserved. 9 Streaming Data Pipelines with NiFi + Kafka + Flink
  10. 10. © 2020 Cloudera, Inc. All rights reserved. 10 CLOUDERA DATAFLOW DATA-IN-MOTION PLATFORM
  11. 11. 11 Where Can I Run FLaNK Easily? CDP services are optimized for the elastic compute & ‘always-on’ storage services provided by any cloud provider Web service hosted and managed by Cloudera Hosted in the your cloud environment, but managed by the CDP Management Console Shared Data Experience (SDX) technologies form a secure and governed data lake backed by object storage (S3, ADLS, GCS) Flow Management Streams Messaging Streaming Analytics
  12. 12. 12 What is NiFi used for?
  13. 13. © 2020 Cloudera, Inc. All rights reserved. 13 Cloudera Flow Management Enable easy ingestion, routing, management and delivery of any data anywhere (Edge, cloud, data center) to any downstream system with built in end-to-end security and provenance ACQUIRE PROCESS DELIVER • Over 300 Prebuilt Processors • Easy to build your own • Parse, Enrich & Apply Schema • Filter, Split, Merger & Route • Throttle & Backpressure • Guaranteed Delivery • Full data provenance from acquisition to delivery • Diverse, Non-Traditional Sources • Eco-system integration Advanced tooling to industrialize flow development (Flow Development Life Cycle)
  14. 14. © 2020 Cloudera, Inc. All rights reserved. 14 https://blog.cloudera.com/benchmarking-nifi-performance-and-scalability/ NiFi Processing Billions of Events
  15. 15. © 2020 Cloudera, Inc. All rights reserved. 15 STREAMS MESSAGING / APACHE KAFKA Kafka Connect Support Simple Data Movement In/Out of Kafka Schema Registry Ranger Plugin Improved ACL and Audit for Kafka and Schema Registry Cruise Control Support Intelligent Kafka Cluster Rebalancing & Self Healing
  16. 16. © 2020 Cloudera, Inc. All rights reserved. 16 Key Capabilities STREAMING ANALYTICS / APACHE FLINK Flink SQL Support Agile Streaming App Development using SQL Apache Flink Atlas Hook Capture operational Flink app metadata and lineage Single View of Flink Yarn Jobs Improve Developer Experience & operational visibility
  17. 17. Demo
  18. 18. 18 Edge AI to Cloud Streaming Pipeline Device Data SensorsEnergy Logs Weather Sensors Aggregates Energy SQL Analytics MiNiFi Agent Deep Learning Classification Edge Private Cloud Multi-Public Cloud
  19. 19. If You Missed This Live
  20. 20. MiNiFi Agents
  21. 21. © 2020 Cloudera, Inc. All rights reserved. 21 MiNiFi Java Agent ● Reads Sensor Logs ● OpenVino NCC2 AI ● Reads Images ● Sends to NiFi Gateway Apache NiFi Gateway processors, validates, transforms, cleans, routes and streams events for additional processing through Apache Kafka topics. MiNiFi Agents Running Deep Learning Classification And Sending Images and Results to NiFi Gateways
  22. 22. 22© 2020 Cloudera, Inc. All rights reserved. FLINK SQL DEMO - DML - CATALOG AND TABLES SHOW catalogs; USE CATALOG registry; SHOW tables; SELECT * FROM energy;
  23. 23. © 2020 Cloudera, Inc. All rights reserved. 23 INSERT INTO global_sensor_events SELECT scada.uuid,scada.systemtime,scada.temperaturef, scada.pressure,scada.humidity,scada.lux,scada.proximity, scada.oxidising,scada.reducing,scada.nh3,scada.gasko, energy.`current`,energy.voltage,energy.`power`, energy.`total`,energy.fanstatus FROM energy, scada WHERE scada.systemtime = energy.systemtime; WHERE IS THE FLINK CODE!??!
  24. 24. 24© 2020 Cloudera, Inc. All rights reserved. FLINK SQL DEMO
  25. 25. 25© 2020 Cloudera, Inc. All rights reserved.
  26. 26. 26© 2020 Cloudera, Inc. All rights reserved. FLINK SQL DEMO - FLINK DASHBOARD
  27. 27. 27© 2020 Cloudera, Inc. All rights reserved. FLINK SQL DEMO
  28. 28. 28© 2020 Cloudera, Inc. All rights reserved. FLINK SQL DEMO
  29. 29. © 2020 Cloudera, Inc. All rights reserved. 29 {"uuid": "rpi4_uuid_jfx_20200826203733", "amplitude100": 1.2, "amplitude500": 0.6, "amplitude1000": 0.3, "lownoise": 0.6, "midnoise": 0.2, "highnoise": 0.2, "amps": 0.3, "ipaddress": "192.168.1.76", "host": "rp4", "host_name": "rp4", "macaddress": "6e:37:12:08:63:e1", "systemtime": "08/26/2020 16:37:34", "endtime": "1598474254.75", "runtime": "28179.03", "starttime": "08/26/2020 08:47:54", "cpu": 48.3, "cpu_temp": "72.0", "diskusage": "40219.3 MB", "memory": 24.3, "id": "20200826203733_28ce9520-6832-4f80-b17d-f36c21fd8fc9", "temperature": "47.2", "adjtemp": "35.8", "adjtempf": "76.4", "temperaturef": "97.0", "pressure": 1010.0, "humidity": 8.3, "lux": 67.4, "proximity": 0, "oxidising": 77.9, "reducing": 184.6, "nh3": 144.7, "gasKO": "Oxidising: 77913.04 OhmsnReducing: 184625.00 OhmsnNH3: 144651.47 Ohms"} SHOW ME THE DATA
  30. 30. © 2020 Cloudera, Inc. All rights reserved. 30 BME280 - temperature, pressure, humidity sensor LTR-559 - light and proximity sensor MICS6814 - analog gas sensor ADS1015 ADC MEMS - microphone 0.96-inch, 160 x 80 color LCD WHERE DID THAT DATA COME FROM?
  31. 31. © 2020 Cloudera, Inc. All rights reserved. 31
  32. 32. © 2020 Cloudera, Inc. All rights reserved. 32
  33. 33. © 2020 Cloudera, Inc. All rights reserved. 33
  34. 34. © 2020 Cloudera, Inc. All rights reserved. 34
  35. 35. © 2020 Cloudera, Inc. All rights reserved. 35
  36. 36. © 2020 Cloudera, Inc. All rights reserved. 36
  37. 37. © 2020 Cloudera, Inc. All rights reserved. 37
  38. 38. Learn More
  39. 39. 39© 2020 Cloudera, Inc. All rights reserved. DEMO SOURCE CODE ● https://github.com/tspannhw/FlinkForwardGlobal2020 ● https://github.com/tspannhw/ApacheConAtHome2020 ● https://github.com/tspannhw/minifi-xaviernx ● https://github.com/tspannhw/minifi-jetson-nano ● https://github.com/tspannhw/minifi-enviroplus The code, build scripts, schemas, table DDL, Flink SQL, Kafka Connect configuration, NiFi flows, HBase tables, Kudu tables, Hive tables, HDFS directories, alerts, images, HTML, docs, links and all the goodies are here. Please fork and contribute.
  40. 40. 40© 2020 Cloudera, Inc. All rights reserved. FLINK SQL LINKS ● https://github.com/cloudera/flink-tutorials ● https://docs.cloudera.com/csa/1.1.0/job-lifecycle/topics/csa-run-job.html ● https://github.com/asdaraujo/edge2ai-workshop/tree/master/flink ● https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/sqlClient.html ● https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/sqlClient.html ● https://github.com/tspannhw/FlinkSQLDemo ● https://www.datainmotion.dev/2020/05/flank-low-code-streaming-populating.html ● https://github.com/tspannhw/meetup-sensors/tree/master/flink-sql ● https://www.datainmotion.dev/2020/05/flink-sql-preview.html
  41. 41. 41© 2020 Cloudera, Inc. All rights reserved. Flink SQL LINKS ● https://docs.cloudera.com/csa/1.2.0/flink-sql-table-api/topics/csa-kafka-regi stry-avro.html ● https://github.com/tspannhw/ApacheConAtHome2020/ ● https://github.com/tspannhw/FlinkSQLDemo ● https://github.com/tspannhw/meetup-sensors/tree/main/flink-sql ● https://github.com/cloudera/flink-tutorials/tree/master/flink-sql-tutorial
  42. 42. © 2020 Cloudera, Inc. All rights reserved. 42 TH N Y U

×