Learning the Basics of Apache NiFi for IoT: OSS Europe 2020

Best practices for using NiFi, Hue, Impala, Kafka, Flink, and CDSW in IoT applications

Published in: Technology

  1. Learning the Basics of Apache NiFi for IoT. Timothy Spann, Principal DataFlow Field Engineer, Cloudera. #ossummit @PaasDev
  2. #ossummit #lfelc Speaker: Timothy Spann, Principal DataFlow Field Engineer, @PaasDev. DZone Zone Leader and Big Data MVB. Princeton NJ Future of Data Meetup. https://github.com/tspannhw https://www.datainmotion.dev/
  3. Future of Data - Princeton (Global via YouTube). @PaasDev https://www.meetup.com/futureofdata-princeton/ From Big Data to AI to Streaming to Containers to Cloud to Analytics to Cloud Storage to Fast Data to Machine Learning to Microservices to ...
  4. BASICS of APACHE NIFI • What Is NiFi: a general overview of capabilities • Grand Tour: navigating the Apache NiFi canvas • Example IoT Flows: examples of processing IoT data from edge to consumption
  5. Learning the Basics of Apache NiFi
  6. IoT REFERENCE ARCHITECTURE [diagram labels: sensors; Apache NiFi (data flow apps powered by NiFi); Apache Kafka (data syndication service by Kafka; Kafka topic: iot); storage layer; Apache Impala; deep learning & machine learning; model execution; REST]
  7. End to End Logs Pipeline [diagram labels: routers, databases, firewalls; logs, errors, aggregates, alerts, other data, events; ETL; analytics; enterprise analysis; real-time analytics; complexity reduction]
  8. Apache Hue VISUALIZATION: SQL and Query Editor & Performance Diagnostics Tool for the Cloudera Data Platform
  9. What is Apache NiFi?
  10. Apache NiFi
  11. Apache NiFi High Level Capabilities • Scale horizontally and vertically: scale your data flow to millions of events per second; ingest TB to PB of data per day • Adapt to your flow requirements: back pressure and dynamic prioritization; loss tolerant vs. guaranteed delivery; low latency vs. high throughput • Secure: SSL, HTTPS, SFTP, etc.; governance and data provenance • Extensible: build your own processors and controller services (providers); integrate with external systems (security, monitoring, governance, etc.)
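
The back pressure capability listed on slide 11 can be illustrated with a minimal sketch. This is not NiFi code: `Connection`, `backpressure_threshold`, and `run_source` are hypothetical names, and the model assumes a simple object-count threshold on a queue between two components. When the queue is full, the upstream side stops handing over data until the downstream side drains it.

```python
from collections import deque


class Connection:
    """Hypothetical stand-in for a queue between two flow components."""

    def __init__(self, backpressure_threshold):
        self.backpressure_threshold = backpressure_threshold
        self.queue = deque()

    def is_full(self):
        # Back pressure applies once the queued object count hits the threshold.
        return len(self.queue) >= self.backpressure_threshold

    def offer(self, item):
        """Enqueue item unless back pressure applies; return True if accepted."""
        if self.is_full():
            return False
        self.queue.append(item)
        return True

    def poll(self):
        """Remove and return the oldest queued item, or None if empty."""
        return self.queue.popleft() if self.queue else None


def run_source(connection, events):
    """Push events until back pressure applies; return the events left unsent."""
    pending = deque(events)
    while pending and connection.offer(pending[0]):
        pending.popleft()
    return list(pending)


conn = Connection(backpressure_threshold=3)
leftover = run_source(conn, ["e1", "e2", "e3", "e4", "e5"])
print(leftover)            # events held back upstream
conn.poll()                # downstream consumes one event
leftover = run_source(conn, leftover)
print(leftover)
```

The source keeps its unsent events instead of dropping them, which is the point of back pressure: the slow consumer sets the pace without data loss.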
  12. FLOW FILES ARE LIKE HTTP DATA. HTTP data = header + content. Header: HTTP/1.1 200 OK; Date: Sun, 10 Oct 2010 23:26:07 GMT; Server: Apache/2.2.8 (CentOS) OpenSSL/0.9.8g; Last-Modified: Sun, 26 Sep 2010 22:04:35 GMT; ETag: "45b6-834-49130cc1182c0"; Accept-Ranges: bytes; Content-Length: 13; Connection: close; Content-Type: text/html. Content: Hello world! FlowFile = attribute map (header) + content. Standard FlowFile attributes: Key: 'entryDate', Value: 'Fri Jun 17 17:15:04 EDT 2016'; Key: 'lineageStartDate', Value: 'Fri Jun 17 17:15:04 EDT 2016'; Key: 'fileSize', Value: '23609'; Key: 'filename', Value: '15650246997242'; Key: 'path', Value: './'. Content: binary content.
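
The analogy on slide 12 can be sketched as a small data structure: a map of string attributes plus opaque bytes. The `FlowFile` class below is a hypothetical model, not NiFi's Java API, and the `http.status.line` and `http.header.*` attribute names are invented for illustration. Only `fileSize` mirrors an attribute name shown on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Hypothetical model: key/value attributes (the header) plus raw content."""
    attributes: dict = field(default_factory=dict)
    content: bytes = b""


def from_http_response(raw: bytes) -> FlowFile:
    """Split a raw HTTP response into attributes (headers) and content (body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    attrs = {"http.status.line": lines[0]}  # illustrative attribute name
    for line in lines[1:]:
        name, _, value = line.partition(":")
        attrs["http.header." + name.strip().lower()] = value.strip()
    attrs["fileSize"] = str(len(body))  # attribute values are strings
    return FlowFile(attrs, body)


raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Length: 13\r\n"
       b"Content-Type: text/html\r\n"
       b"\r\n"
       b"Hello world!\n")
flowfile = from_http_response(raw)
print(flowfile.attributes)
print(flowfile.content)
```

Keeping metadata separate from content means routing decisions can be made on attributes alone, without reading the payload.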
  13. Apache NiFi: Enable easy ingestion, routing, management, and delivery of any data anywhere (edge, cloud, data center) to any downstream system, with built-in end-to-end security and provenance • Over 300 prebuilt processors • Easy to build your own • Parse, enrich & apply schema • Filter, split, merge & route • Throttle & back pressure • Guaranteed delivery • Full data provenance • Ecosystem integration. Advanced tooling to industrialize flow development (Flow Development Life Cycle). [Diagram labels, protocols and formats: FTP, SFTP, HL7, UDP, XML, HTTP, EMAIL, HTML, IMAGE, SYSLOG. Processing: HASH, MERGE, EXTRACT, DUPLICATE, SPLIT, ROUTE TEXT, ROUTE CONTENT, ROUTE CONTEXT, CONTROL RATE, DISTRIBUTE LOAD, GEOENRICH, SCAN, REPLACE, TRANSLATE, CONVERT, ENCRYPT, TALL, EVALUATE, EXECUTE]
  14. Provenance/Lineage
  15. Prioritization • Configure a prioritizer per connection • Determine what is important for your data: time based, arrival order, importance of a data set • Funnel many connections down to a single connection to prioritize across data sets • Develop your own prioritizer if needed
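
A per-connection prioritizer as described on slide 15 can be sketched as a queue ordered by a pluggable key function. `PrioritizedQueue` and `by_priority_attribute` are hypothetical names, and the sketch assumes FlowFiles are plain dictionaries carrying an optional numeric `priority` attribute where a lower number is served first. Ties fall back to arrival order.

```python
import heapq
import itertools


def by_priority_attribute(flowfile):
    """Lower 'priority' value is served first; missing or bad values go last."""
    try:
        return int(flowfile.get("priority", ""))
    except ValueError:
        return float("inf")


class PrioritizedQueue:
    """Hypothetical connection queue ordered by a key, then by arrival order."""

    def __init__(self, key):
        self._key = key
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker: first in, first out

    def put(self, flowfile):
        entry = (self._key(flowfile), next(self._arrival), flowfile)
        heapq.heappush(self._heap, entry)

    def get(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


# Two data sets funneled into one connection and prioritized together.
queue = PrioritizedQueue(by_priority_attribute)
queue.put({"source": "telemetry", "id": "t1", "priority": "5"})
queue.put({"source": "alerts", "id": "a1", "priority": "1"})
queue.put({"source": "telemetry", "id": "t2", "priority": "5"})
queue.put({"source": "telemetry", "id": "t3"})

while queue:
    print(queue.get()["id"])
```

Swapping the key function is the sketch's equivalent of choosing a different prioritizer, for example ordering by a timestamp attribute instead of importance.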
  16. SQL BASED ROUTING WITH NiFi's QueryRecord Processor • QueryRecord processor: executes a SQL statement against records and writes the results to the FlowFile content • CSVReader: looks up the schema from the Schema Registry (SR) and converts CSV records into process records • SQL execution via Apache Calcite: executes the configured SQL against the process records for routing • CSVRecordSetWriter: converts the query result from process records back into CSV for the FlowFile content. Route the geo and speed streams using standard SQL instead of complex regular expressions.
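
The read, query, write sequence on slide 16 can be approximated outside NiFi. The sketch below swaps Apache Calcite for Python's built-in `sqlite3` and uses no schema registry, so every column is loaded as text and the query casts `speed` explicitly. The function name `query_record`, the sample columns, and the route names are hypothetical. The table name `FLOWFILE` follows the convention QueryRecord uses in its SQL.

```python
import csv
import io
import sqlite3


def query_record(csv_text, queries):
    """Run each named SQL query over the CSV records; return CSV text per route."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return {name: "" for name in queries}
    columns = list(rows[0].keys())

    conn = sqlite3.connect(":memory:")
    col_defs = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f"CREATE TABLE FLOWFILE ({col_defs})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO FLOWFILE VALUES ({placeholders})",
        [[row[c] for c in columns] for row in rows],
    )

    results = {}
    for name, sql in queries.items():
        cursor = conn.execute(sql)
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow([d[0] for d in cursor.description])  # header row
        writer.writerows(cursor.fetchall())
        results[name] = out.getvalue()
    conn.close()
    return results


records = "driverId,eventSource,speed\n11,speed,95\n12,geo,\n13,speed,60\n"
routes = query_record(records, {
    "speeding": "SELECT driverId, speed FROM FLOWFILE "
                "WHERE eventSource = 'speed' AND CAST(speed AS REAL) > 80",
    "geo": "SELECT driverId FROM FLOWFILE WHERE eventSource = 'geo'",
})
for name, content in routes.items():
    print(name, repr(content))
```

Each named query plays the role of one outgoing route, so adding a route means adding one SQL statement rather than another regular expression.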
  17. STATELESS ENGINE • Granular containers per flow • Flows from NiFi Registry. Command: bin/nifi.sh stateless RunFromRegistry Continuous --file kafka.json. See https://www.datainmotion.dev/2019/11/exploring-apache-nifi-110-parameters.html and https://github.com/apache/nifi/blob/ea1becac4fc519c54b8b4d21773e68f8da364755/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-stateless/README.md
  18. STATELESS ENGINE • See also Parameters • Docker • YARN • Kubernetes (K8s) • Stateful NiFi clusters • Apache OpenWhisk (FaaS). https://www.datainmotion.dev/2019/11/exploring-apache-nifi-110-parameters.html Example configuration file: {"registryUrl": "http://tspann-mbp15-hw14277:18080", "bucketId": "140b30f0-5a47-4747-9021-19d4fde7f993", "flowId": "0540e1fd-c7ca-46fb-9296-e37632021945", "ssl": { "keystoreFile": "", "keystorePass": "", "keyPass": "", "keystoreType": "", "truststoreFile": "/Library/Java/JavaVirtualMachines/amazon-corretto-11.jdk/Contents/Home/lib/security/cacerts", "truststorePass": "changeit", "truststoreType": "JKS" }, "parameters": { "broker" : "4.317.852.100:9092", "topic" : "iot", "group_id" : "nifi-stateless-kafka-consumer", "DestinationDirectory" : "/tmp/nifistateless/output2/", "output_dir": "/Users/tspann/Documents/nifi-1.10.0-SNAPSHOT/logs/output" } } https://github.com/tspannhw/stateless-examples
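
A configuration file shaped like the one on slide 18 can be sanity-checked before it is passed to the stateless runner. This is a sketch under assumptions: the helper `load_stateless_config` is hypothetical, treating `registryUrl`, `bucketId`, and `flowId` as mandatory is a guess based on the slide's example rather than documented behavior, and the registry URL and broker address below are placeholders.

```python
import json

# Assumption: these three keys identify the flow and must be present.
REQUIRED_KEYS = ("registryUrl", "bucketId", "flowId")

EXAMPLE_CONFIG = """
{
  "registryUrl": "http://localhost:18080",
  "bucketId": "140b30f0-5a47-4747-9021-19d4fde7f993",
  "flowId": "0540e1fd-c7ca-46fb-9296-e37632021945",
  "parameters": {
    "broker": "localhost:9092",
    "topic": "iot",
    "group_id": "nifi-stateless-kafka-consumer"
  }
}
"""


def load_stateless_config(text):
    """Parse the JSON text and fail early if an identifying key is missing."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError("missing keys: " + ", ".join(missing))
    config.setdefault("parameters", {})  # parameters are optional in this sketch
    return config


config = load_stateless_config(EXAMPLE_CONFIG)
for name, value in sorted(config["parameters"].items()):
    print(f"{name} = {value}")
```

Checking the file up front gives a clear error message instead of a failure partway through flow startup.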
  19. PARAMETER CONTEXT • Parameters • Parameter Context https://www.datainmotion.dev/2019/11/exploring-apache-nifi-110-parameters.html
  20. PARAMETERS • Parameters • Parameter Context https://www.datainmotion.dev/2019/11/exploring-apache-nifi-110-parameters.html
  21. RETRYFLOWFILE • Configurable retries • Maximum number of retries • Penalties • When to fail • Reuse mode https://medium.com/@abdelkrim.hadjidj/apache-nifi-1-10-series-simplifying-error-handling-7de86f130acd
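
The retry behavior outlined on slide 21 boils down to a counter carried in a FlowFile attribute. The function below is a simplified sketch, not the processor's implementation: the attribute name `flowfile.retries`, the default of three retries, and the route names `retry`, `retries_exceeded`, and `failure` are assumptions modeled on the processor's description, and penalties and reuse mode are left out.

```python
def retry_flowfile(attributes, max_retries=3, retry_attribute="flowfile.retries"):
    """Return (route, updated_attributes) for one pass through the retry step."""
    try:
        count = int(attributes.get(retry_attribute, 0))
    except ValueError:
        # The counter attribute holds something that is not a number.
        return "failure", dict(attributes)

    updated = dict(attributes)
    if count < max_retries:
        updated[retry_attribute] = str(count + 1)  # attributes are strings
        return "retry", updated

    updated.pop(retry_attribute, None)  # clear the counter once we give up
    return "retries_exceeded", updated


attrs = {"filename": "reading-001.json"}
for attempt in range(4):
    route, attrs = retry_flowfile(attrs)
    print(attempt, route, attrs)
```

Because the counter travels with the FlowFile, the retry loop needs no external state and each file is limited independently.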
  22. BACKPRESSURE PREDICTION • Models: OrdinaryLeastSquares, SimpleRegression • Enable the analytics feature http://lonnifi.blogspot.com/2019/11/back-pressure-prediction-deep-dive.html?es_id=5233333939 https://youtu.be/Tt8TSlHu7PE
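
The idea behind slide 22 is to fit a line to recent queue sizes and extrapolate to the back pressure threshold. The sketch below is a plain ordinary least squares fit in pure Python, not NiFi's analytics code. The function names, the sample observations, and the threshold are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def seconds_until_threshold(times, counts, threshold):
    """Estimate seconds from the last sample until the queue hits the threshold.

    Returns None when the queue is not growing, since the fitted line
    never reaches the threshold in that case.
    """
    slope, intercept = fit_line(times, counts)
    if slope <= 0:
        return None
    time_at_threshold = (threshold - intercept) / slope
    return max(0.0, time_at_threshold - times[-1])


# Hypothetical observations: queued object count sampled every 10 seconds.
times = [0, 10, 20, 30]
counts = [100, 200, 300, 400]
print(seconds_until_threshold(times, counts, threshold=1000))
```

A linear fit is only a rough guide for bursty traffic, so the estimate is best read as an early warning rather than a precise deadline.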
  23. PARQUET READER AND WRITER • Native record processors for Apache Parquet files • CSV <-> Parquet • XML <-> Parquet • Avro <-> Parquet • JSON <-> Parquet • More... https://www.datainmotion.dev/2019/11/exploring-apache-nifi-110-parameters.html https://www.datainmotion.dev/2019/10/migrating-apache-flume-flows-to-apache_7.html
  24. MANY OTHER FEATURES • Prometheus Reporting Task • Experimental encrypted content repository • PublishKafka partition support • Toolkit module to generate and build Swagger • GeoEnrichIPRecord Processor • Command-line diagnostics • RocksDB FlowFile Repository • PutBigQueryStreaming Processor • Enhanced DevOps and CD/CI. ELT/ETL Lookup Services: • DatabaseRecordLookupService • KuduLookupService • HBase_2_ListLookupService
  25. Scalable and distributed architecture
  26. NiFi Flow Registry
  27. Example of NiFi Transformations: data enrichment. Enrich events by adding a classification based on the host, using a reference lookup table from a CSV file. Record with a match: [ { "time" : "7845800765", "host" : "web-...", "sourcetype" : "cpu_resource_usage", "source" : "...", "index" : "_metrics", "meta" : "...", "event" : "..."}}", "classification" : "internal" }, ... Record without a match: [ { "time" : "7845800765", "host" : "web-...", "sourcetype" : "cpu_resource_usage", "source" : "...", "index" : "_metrics", "meta" : "...", "event" : "..."}}", "classification" : null }, ...
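
The enrichment on slide 27 is a keyed lookup against a reference table. The sketch below shows the idea with Python's standard `csv` module. The helper names, the lookup file's column names, and the host values are hypothetical, since the slide elides the real ones. A host with no match gets `None`, which corresponds to the `null` classification on the slide.

```python
import csv
import io


def load_lookup(csv_text, key_column, value_column):
    """Build a key -> value dictionary from CSV reference data."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key_column]: row[value_column] for row in reader}


def enrich(events, lookup, key_field="host", target_field="classification"):
    """Return copies of the events with the looked-up value added."""
    return [
        {**event, target_field: lookup.get(event.get(key_field))}
        for event in events
    ]


# Hypothetical reference table and events.
reference_csv = "host,classification\nweb-01,internal\nweb-02,external\n"
events = [
    {"host": "web-01", "sourcetype": "cpu_resource_usage"},
    {"host": "db-07", "sourcetype": "cpu_resource_usage"},
]

lookup = load_lookup(reference_csv, "host", "classification")
for event in enrich(events, lookup):
    print(event)
```

Loading the table once and reusing the dictionary keeps the per-event cost to a single hash lookup.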
  28. INGEST RDBMS TABLES https://community.cloudera.com/t5/Community-Articles/Incrementally-Streaming-RDBMS-Data-to-Your-Hadoop-DataLake/ta-p/247927 https://community.cloudera.com/t5/Community-Articles/Ingesting-Golden-Gate-Records-From-Apache-Kafka-and/ta-p/247557 https://community.cloudera.com/t5/Community-Articles/Incremental-Fetch-in-NiFi-with-QueryDatabaseTable/ta-p/247073
  29. EXAMPLE IoT Flows
  30. IoT Reference Architecture [diagram labels: sensors; Apache NiFi (data flow apps powered by NiFi); Apache Kafka (data syndication service by Kafka; Kafka topic: iot); storage layer; Apache Impala; deep learning & machine learning; model execution; REST]
  31. Best Practices https://www.datainmotion.dev/2020/06/no-more-spaghetti-flows.html ● Reduce, reuse, recycle: use Parameters to reuse common modules. ● Put flows and reusable chunks into separate Process Groups. ● Write custom processors if you need new or specialized features. ● Use Cloudera-supported NiFi processors. ● Use Record processors everywhere.
  32. Cloudera Communities: Got questions? Leverage community.cloudera.com. Join our meetup: www.meetup/pro/futureofdata