Learning the Basics of Apache NiFi for IoT (OSS Europe 2020)
1. Learning the Basics of
Apache NiFi for IoT
Timothy Spann
Principal DataFlow Field Engineer
Cloudera
#ossummit @PaasDev
2. #ossummit #lfelc
Speaker - Timothy Spann
Principal DataFlow Field Engineer
@PaasDev
DZone Zone Leader and Big Data MVB
Princeton NJ Future of Data Meetup
https://github.com/tspannhw
https://www.datainmotion.dev/
3. #ossummit #lfelc
Future of Data - Princeton (Global via YouTube)
@PaasDev
https://www.meetup.com/futureofdata-princeton/
From Big Data to AI to Streaming to Containers to
Cloud to Analytics to Cloud Storage to Fast Data to
Machine Learning to Microservices to ...
4. #ossummit #lfelc
BASICS of APACHE NIFI
• What Is NiFi: a general overview of capabilities
• Grand Tour: navigating the Apache NiFi canvas
• Example IoT Flows: examples of processing IoT data from edge to consumption
6. #ossummit #lfelc
IoT Reference Architecture (diagram): sensors → data flow apps powered by Apache NiFi → data syndication service by Apache Kafka (Kafka topic "iot") → storage layer; Apache Impala for analytics; deep learning & machine learning model execution via REST
7. #ossummit #lfelc
End-to-End Logs Pipeline (diagram): routers, databases, and firewalls emit logs and events → ETL with complexity reduction → errors, aggregates, alerts, and other data → enterprise analysis and real-time analytics
11. #ossummit #lfelc
Apache NiFi High Level Capabilities
• Scale horizontally and vertically
• Scale your data flow to millions of events per second
• Ingest TB to PB of data per day
• Adapt to your flow requirements
• Back pressure & Dynamic prioritization
• Loss tolerant vs guaranteed delivery
• Low latency vs high throughput
• Secure
• SSL, HTTPS, SFTP, etc.
• Governance and data provenance
• Extensible
• Build your own processors and Controller services (providers)
• Integrate with external systems (security, monitoring, governance, etc.)
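Back pressure, mentioned above, can be illustrated with a minimal sketch (this is a generic bounded-queue analogy, not NiFi's actual API): when the queue between two steps fills up, the producer blocks until the consumer catches up, which is the same effect NiFi achieves with per-connection back pressure thresholds on object count or data size.

```python
# Hypothetical sketch of back pressure (not NiFi's API): a bounded queue
# between a producer and a consumer. When the queue is full, put() blocks,
# throttling the producer automatically.
import queue
import threading

connection = queue.Queue(maxsize=5)  # back pressure threshold: 5 objects

def producer(n):
    for i in range(n):
        connection.put(f"event-{i}")  # blocks while the queue is full

def consumer(n, out):
    for _ in range(n):
        out.append(connection.get())
        connection.task_done()

received = []
t1 = threading.Thread(target=producer, args=(20,))
t2 = threading.Thread(target=consumer, args=(20, received))
t1.start(); t2.start()
t1.join(); t2.join()
print(len(received))  # all 20 events arrive, but the producer was throttled
```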
12. #ossummit #lfelc
FLOW FILES ARE LIKE HTTP DATA
HTTP Data FlowFile
HTTP/1.1 200 OK
Date: Sun, 10 Oct 2010 23:26:07 GMT
Server: Apache/2.2.8 (CentOS) OpenSSL/0.9.8g
Last-Modified: Sun, 26 Sep 2010 22:04:35 GMT
ETag: "45b6-834-49130cc1182c0"
Accept-Ranges: bytes
Content-Length: 13
Connection: close
Content-Type: text/html
Hello world!
Standard FlowFile Attributes
Key: 'entryDate' Value: 'Fri Jun 17 17:15:04 EDT 2016'
Key: 'lineageStartDate' Value: 'Fri Jun 17 17:15:04 EDT 2016'
Key: 'fileSize' Value: '23609'
Key: 'filename' Value: '15650246997242'
Key: 'path' Value: './'
A FlowFile pairs an attribute map (like the HTTP header) with binary content (like the HTTP body).
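The HTTP analogy above can be sketched in a few lines (an illustrative, simplified model, not NiFi's actual Java class): a FlowFile is an attribute map plus opaque binary content, and NiFi populates a handful of standard attributes automatically.

```python
# Minimal illustrative model of a FlowFile (assumption: simplified,
# not NiFi's real implementation). Attributes play the role of HTTP
# headers; content plays the role of the HTTP body.
from dataclasses import dataclass, field
import time
import uuid

@dataclass
class FlowFile:
    content: bytes = b""
    attributes: dict = field(default_factory=dict)

    def __post_init__(self):
        # Standard attributes every FlowFile carries
        self.attributes.setdefault("uuid", str(uuid.uuid4()))
        self.attributes.setdefault("filename", str(int(time.time() * 1000)))
        self.attributes.setdefault("path", "./")
        self.attributes.setdefault("entryDate", time.ctime())
        self.attributes["fileSize"] = str(len(self.content))

ff = FlowFile(content=b"Hello world!")
print(ff.attributes["fileSize"])  # "12" -- derived from the content length
```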
13. #ossummit #lfelc
Apache NiFi
Enable easy ingestion, routing, management, and delivery of any data anywhere (edge, cloud,
data center) to any downstream system, with built-in end-to-end security and provenance
• Over 300 Prebuilt Processors
• Easy to build your own
• Parse, Enrich & Apply Schema
• Filter, Split, Merge & Route
• Throttle & Backpressure
• Guaranteed Delivery
• Full data provenance
• Eco-system integration
Advanced tooling to industrialize flow development
(Flow Development Life Cycle)
FTP
SFTP
HL7
UDP
XML
HTTP
EMAIL
HTML
IMAGE
SYSLOG
HASH
MERGE
EXTRACT
DUPLICATE
SPLIT
ROUTE TEXT
ROUTE CONTENT
ROUTE CONTEXT
CONTROL RATE
DISTRIBUTE LOAD
GEOENRICH
SCAN
REPLACE
TRANSLATE
CONVERT
ENCRYPT
TAIL
EVALUATE
EXECUTE
15. #ossummit #lfelc
Prioritization
• Configure a prioritizer per
connection
• Determine what is important for
your data – time based, arrival
order, importance of a data set
• Funnel many connections down
to a single connection to
prioritize across data sets
• Develop your own prioritizer if
needed
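The prioritization behavior described above can be sketched as a priority queue (a hypothetical stand-in for NiFi's Java `Prioritizer` interface): queued flow files are ordered by a priority attribute first, with arrival order as the tiebreaker, similar in spirit to NiFi's PriorityAttributePrioritizer.

```python
# Hypothetical prioritizer sketch (not NiFi's actual Prioritizer API):
# order queued flow files by a "priority" attribute (lower = more
# important), falling back to arrival order for ties.
import heapq
import itertools

counter = itertools.count()  # monotonically increasing arrival order

def enqueue(q, flowfile):
    prio = int(flowfile.get("priority", 99))  # default: lowest importance
    heapq.heappush(q, (prio, next(counter), flowfile))

def dequeue(q):
    return heapq.heappop(q)[2]

q = []
enqueue(q, {"filename": "a", "priority": "5"})
enqueue(q, {"filename": "b", "priority": "1"})
enqueue(q, {"filename": "c", "priority": "5"})
order = [dequeue(q)["filename"] for _ in range(3)]
print(order)  # ['b', 'a', 'c'] -- b wins on priority, a beats c on arrival
```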
16. #ossummit #lfelc
SQL BASED ROUTING WITH NiFi’s QueryRecord Processor
• QueryRecord Processor: executes a SQL
statement against records and writes the results
to the flow file content.
• CSVReader: looks up the schema from the
Schema Registry and converts CSV records
into process records.
• SQL execution via Apache Calcite: executes the
configured SQL against the process records for
routing.
• CSVRecordSetWriter: converts the query
results from process records back into CSV for
the flow file content.
Do routing (e.g., separate geo and speed streams) using standard SQL instead of complex regular expressions.
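SQL-based routing can be sketched outside NiFi too. This example uses Python's built-in sqlite3 as an illustrative stand-in for NiFi's embedded Apache Calcite engine (the sensor CSV, column names, and speed threshold are all invented for the example): CSV records are loaded as rows, and each output stream is simply a SQL query over them.

```python
# Illustrative sketch of QueryRecord-style routing (assumption: sqlite3
# stands in for NiFi's Apache Calcite; the data and threshold are made up).
# Records from a CSV flow file are routed into separate streams via SQL
# rather than regular expressions.
import csv
import io
import sqlite3

csv_content = """sensor_id,speed,lat,lon
s1,42,40.35,-74.66
s2,130,40.36,-74.65
s3,88,40.37,-74.64
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flowfile (sensor_id TEXT, speed INT, lat REAL, lon REAL)")
reader = csv.DictReader(io.StringIO(csv_content))
conn.executemany(
    "INSERT INTO flowfile VALUES (?,?,?,?)",
    [(r["sensor_id"], int(r["speed"]), float(r["lat"]), float(r["lon"]))
     for r in reader])

# In QueryRecord, each outgoing relationship gets its own SQL statement.
speeding = conn.execute(
    "SELECT sensor_id, speed FROM flowfile WHERE speed > 100").fetchall()
geo = conn.execute(
    "SELECT sensor_id, lat, lon FROM flowfile").fetchall()
print(speeding)  # [('s2', 130)] -- only the record over the threshold
```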