Introducing #ApacheNiFi
Saptak Sen [@saptak]
Technical Product Manager, Hortonworks
© Hortonworks Inc. 2011 – 2015. All Rights Reserved
#seascale
Agenda
• New Data Sources and the Rise of the Internet of Anything
• Introducing: Hortonworks DataFlow powered by Apache NiFi
• Key concepts, architecture, and use cases
• Demo
• Q&A
IoAT Data Grows Faster Than We Consume It
Much of the new data exists in-flight, between systems and devices, as part of the Internet of Anything
NEW
TRADITIONAL
The Opportunity
Unlock transformational business value
from a full fidelity of data and analytics
for all data.
Geolocation
Server logs
Files & emails
ERP, CRM, SCM
Traditional Data Sources
Internet of Anything
Sensors
and machines
Clickstream
Social media
Interconnectedness Demands User Centricity
Changes Organizations into Data Companies
Hortonworks Data Platform
for rich historical insights
from data-at-rest
NEW Hortonworks DataFlow
for securely collecting,
conducting, and curating
data-in-motion while ALSO
driving value for data-at-rest
analytics and use cases
Source: Gartner - Architecture Options for Big Data Analytics on Hadoop, July 2015
Simplistic View of IoAT & Data Flow
The Data Flow Thing
Process and
Analyze Data
Acquire Data
Store Data
Global interactions with customers, business partners, and things
spanning different volume, velocity, bandwidth, and latency needs
Realistic View of IoAT and Data Flow
Meeting IoAT Edge Requirements
GATHER
DELIVER
PRIORITIZE
Track from the edge Through to the datacenter
Small Footprints
operate with very little power
Limited Bandwidth
can create high latency
Data Availability
exceeds transmission bandwidth
Data Must Be Secured
throughout its journey
Hortonworks Acquires Onyara
Turn Internet of Anything Data Into Actionable
Insights
• Onyara is the creator of and key contributor to Apache NiFi,
an open source solution for processing and distributing data.
• Over the past 8 years, Onyara engineers developed the U.S.
government software project called “Niagara Files”, the
precursor to Apache NiFi.
• Apache NiFi was made available as an Apache Incubator
project through the NSA Technology Transfer Program in the
Fall of 2014.
NEW Hortonworks DataFlow offering will
securely and easily collect, conduct and curate
any data, from anything, anywhere.
The IoAT Data Flow
Hortonworks Data Platform
powered by Apache Hadoop
Enrich
Context
Store Data
and Metadata
Internet
of Anything
Hortonworks DataFlow
powered by Apache NiFi
Perishable
Insights
Historical
Insights
Introducing Hortonworks DataFlow powered by
Apache NiFi
Hortonworks DataFlow and the Hortonworks Data Platform
deliver the industry’s most complete solution for management of Big Data.
Apache NiFi: Three key concepts
• Manage the flow of information
• Data Provenance
• Secure the control plane and data plane
Apache NiFi – Key Features
• Guaranteed delivery
• Data buffering
- Backpressure
- Pressure release
• Prioritized queuing
• Flow specific QoS
- Latency vs. throughput
- Loss tolerance
• Data provenance
• Recovery/recording a rolling log of fine-grained history
• Visual command and control
• Flow templates
• Pluggable/multi-role security
• Designed for extension
• Clustering
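To make the buffering features in the list above concrete, here is a minimal Python sketch of a queue between two processors that combines backpressure with prioritized queuing. It is a toy model, not NiFi's implementation: the `Connection` class, its `backpressure_threshold` parameter, and the sample item names are all assumptions made for illustration.

```python
import heapq
import itertools


class Connection:
    """Toy queue between two processors (illustrative, not NiFi's code)."""

    def __init__(self, backpressure_threshold):
        # Hypothetical setting: maximum number of queued items before
        # the upstream side is told to stop sending.
        self.backpressure_threshold = backpressure_threshold
        self._heap = []
        # Tie-breaker so items with equal priority leave in arrival order.
        self._order = itertools.count()

    def is_full(self):
        return len(self._heap) >= self.backpressure_threshold

    def offer(self, priority, item):
        """Enqueue an item; return False when backpressure is applied."""
        if self.is_full():
            return False
        heapq.heappush(self._heap, (priority, next(self._order), item))
        return True

    def poll(self):
        """Return the item with the lowest priority number, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


conn = Connection(backpressure_threshold=3)
accepted = [
    conn.offer(priority, name)
    for priority, name in [
        (5, "debug-log"),
        (1, "alarm"),
        (5, "metrics"),
        (1, "late-alarm"),  # arrives after the threshold is reached
    ]
]
drained = [conn.poll() for _ in range(3)]
print(accepted)  # [True, True, True, False]
print(drained)   # ['alarm', 'debug-log', 'metrics']
```

The fourth offer is refused because the queue is already at its threshold, which is the signal an upstream component would use to slow down. The drain order shows the high-priority item leaving first even though it arrived second.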
Common Apache NiFi Use Cases
Predictive Analytics
Ensure the highest value data is captured and available for analysis
Compliance
Gain full transparency into provenance and flow of data
IoT Optimization
Secure, Prioritize, Enrich and Trace data at the edge
Fraud Detection
Move sales transaction data in real time to analyze on demand
Big Data Ingest
Easily and efficiently ingest data into Hadoop
Value Resources
Gain visibility into how data sources are used to determine value
Architecture
Standalone NiFi node (OS/Host, JVM):
• Web Server
• Flow Controller
• Processor 1 … Extension N
• FlowFile Repository, Content Repository, Provenance Repository (on Local Storage)
NiFi cluster:
• Master: NiFi Cluster Manager (NCM) on its own OS/Host and JVM, with a Web Server and Request Replicator
• Slaves: NiFi Nodes, each an OS/Host and JVM with Web Server, Flow Controller, Processor 1 … Extension N, and FlowFile, Content, and Provenance Repositories on Local Storage
High Availability: Control plane vs Data plane…
HDF – Powered by Apache NiFi
Add processor for data intake
1 Drag and drop processor icon from the top menu
Choose the specific processor
2 Choose one of the processors – currently 90 available – designed for extension
Example: Pick Twitter Processor
Configure the processor
3 Select processor and choose option to Configure
4 Adjust parameters as required
Another processor for data output
5 Drag and drop processor icon from the top menu
6 Example: choose PutHDFS processor
Configure second processor
7 Configure 2nd processor
8 Connect processors, configure connection
9 Click Start to begin processing
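The demo steps so far (add an intake processor, add an output processor, connect them, start the flow) can be pictured with a small Python sketch. Everything here is a stand-in chosen for illustration: `FlowFile`, `ListSource`, `MemorySink`, and `run` are hypothetical names, the source reads from an in-memory list instead of Twitter, and the sink stores bytes in a list instead of writing to HDFS.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Toy unit of data: raw content plus descriptive attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


class ListSource:
    """Stand-in for an intake processor; emits one FlowFile per trigger."""

    def __init__(self, messages):
        self._messages = deque(messages)

    def on_trigger(self, outbound):
        if self._messages:
            text = self._messages.popleft()
            outbound.append(FlowFile(text.encode("utf-8"), {"source": "list"}))


class MemorySink:
    """Stand-in for an output processor; keeps delivered content in memory."""

    def __init__(self):
        self.stored = []

    def on_trigger(self, inbound):
        if inbound:
            self.stored.append(inbound.popleft().content)


def run(source, sink, connection, ticks):
    """Trigger both processors once per tick, like a very simple scheduler."""
    for _ in range(ticks):
        source.on_trigger(connection)
        sink.on_trigger(connection)


connection = deque()  # the queue that connects the two processors
sink = MemorySink()
run(ListSource(["tweet one", "tweet two"]), sink, connection, ticks=3)
print(sink.stored)  # [b'tweet one', b'tweet two']
```

The two processors never call each other directly; they only share the connection. That decoupling is what lets either side be reconfigured or replaced without touching the other.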
See processors update with real time changes
10 As data flows, the GUI updates in real time.
Dynamically adjust and tune data flow as needed
11 Dynamically adjust and tune dataflow as needed, in real time. Can also replicate data for testing and comparison.
Understand the data path with Data Provenance
14 Select Data Provenance
Trace lineage of a particular piece of data
15 Icon for Data Lineage
Every change to data is tracked: processing, views
16 Provenance event is tracked
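As a rough illustration of how tracked events allow the lineage of one piece of data to be traced, the Python sketch below records one event per change and walks from a given item back through its ancestors. This is an assumed, simplified model rather than NiFi's provenance repository: `ProvenanceEvent`, `lineage`, the event labels, and the component names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProvenanceEvent:
    event_type: str                  # label chosen for this sketch
    flowfile_id: str                 # the piece of data the event is about
    component: str                   # hypothetical name of the step involved
    parent_id: Optional[str] = None  # set when the data was derived from another item


def lineage(events, flowfile_id):
    """Return the events for flowfile_id and its ancestors, oldest first."""
    chain = []
    seen = set()
    current = flowfile_id
    while current is not None and current not in seen:
        seen.add(current)
        own = [e for e in events if e.flowfile_id == current]
        chain = own + chain
        # Follow the first recorded parent link, if any.
        current = next((e.parent_id for e in own if e.parent_id is not None), None)
    return chain


log = [
    ProvenanceEvent("RECEIVE", "ff-1", "intake"),
    ProvenanceEvent("FORK", "ff-2", "splitter", parent_id="ff-1"),
    ProvenanceEvent("RECEIVE", "ff-9", "other-intake"),  # unrelated data
    ProvenanceEvent("SEND", "ff-2", "output"),
]

history = [(e.event_type, e.component) for e in lineage(log, "ff-2")]
print(history)
# [('RECEIVE', 'intake'), ('FORK', 'splitter'), ('SEND', 'output')]
```

The unrelated `ff-9` event is excluded from the result, which is the point of a lineage view: only the events that contributed to the selected item appear.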
Updates as changes happen
17 Updates as data flows
Easily access and trace changes to dataflow
Audit trail of Hortonworks DataFlow User Actions
Operations: Planned
Q & A
Comparing Linux OS Image Update Models - EOSS 2024.pdf
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryError
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogue
 
Patterns for automating API delivery. API conference
Patterns for automating API delivery. API conferencePatterns for automating API delivery. API conference
Patterns for automating API delivery. API conference
 

Introduction to Apache NiFi - Seattle Scalability Meetup

  • 1. Introducing #ApacheNiFi Saptak Sen [@saptak] Technical Product Manager, Hortonworks © Hortonworks Inc. 2011 – 2015. All Rights Reserved #seascale
  • 2. Agenda • New Data Sources and the Rise of the Internet of Anything • Introducing: Hortonworks DataFlow powered by Apache NiFi • Key concepts, architecture, and use cases • Demo • Q&A
  • 3. IoAT Data Grows Faster Than We Consume It. Much of the new data exists in-flight, between systems and devices, as part of the Internet of Anything. The Opportunity: Unlock transformational business value from a full fidelity of data and analytics for all data. NEW (Internet of Anything): Geolocation; Server logs; Files & emails; Sensors and machines; Clickstream; Social media. TRADITIONAL (Traditional Data Sources): ERP, CRM, SCM.
  • 4. Interconnectedness Demands User Centricity Changes Organizations into Data Companies. Hortonworks Data Platform for rich historical insights from data-at-rest. NEW Hortonworks DataFlow for securely collecting, conducting, and curating data-in-motion while ALSO driving value for data-at-rest analytics and use cases. Source: Gartner - Architecture Options for Big Data Analytics on Hadoop, July 2015
  • 5. Simplistic View of IoAT & Data Flow. The Data Flow Thing: Process and Analyze Data; Acquire Data; Store Data.
  • 6. Realistic View of IoAT and Data Flow. Global interactions with customers, business partners, and things spanning different volume, velocity, bandwidth, and latency needs.
  • 7. Meeting IoAT Edge Requirements. GATHER, DELIVER, PRIORITIZE. Track from the edge through to the datacenter. Small Footprints operate with very little power. Limited Bandwidth can create high latency. Data Availability exceeds transmission bandwidth. Data Must Be Secured throughout its journey.
  • 8. Hortonworks Acquires Onyara: Turn Internet of Anything Data Into Actionable Insights • Onyara is the creator of and key contributor to Apache NiFi, an open source solution for processing and distributing data. • Over the past 8 years, Onyara engineers developed the U.S. government software project called “Niagara Files”, the precursor to Apache NiFi. • Apache NiFi was made available as an Apache Incubator project through the NSA Technology Transfer Program in the Fall of 2014. NEW Hortonworks DataFlow offering will securely and easily collect, conduct and curate any data, from anything, anywhere.
  • 9. Introducing Hortonworks DataFlow powered by Apache NiFi. The IoAT Data Flow: Internet of Anything; Hortonworks DataFlow powered by Apache NiFi; Hortonworks Data Platform powered by Apache Hadoop; Enrich Context; Store Data and Metadata; Perishable Insights; Historical Insights. Hortonworks DataFlow and the Hortonworks Data Platform deliver the industry’s most complete solution for management of Big Data.
  • 10. Apache NiFi: Three key concepts • Manage the flow of information • Data Provenance • Secure the control plane and data plane
  • 11. Apache NiFi – Key Features • Guaranteed delivery • Data buffering (Backpressure; Pressure release) • Prioritized queuing • Flow specific QoS (Latency vs. throughput; Loss tolerance) • Data provenance • Recovery/recording a rolling log of fine-grained history • Visual command and control • Flow templates • Pluggable/multi-role security • Designed for extension • Clustering
  • 12. Common Apache NiFi Use Cases. Predictive Analytics: Ensure the highest value data is captured and available for analysis. Compliance: Gain full transparency into provenance and flow of data. IoT Optimization: Secure, Prioritize, Enrich and Trace data at the edge. Fraud Detection: Move sales transaction data in real time to analyze on demand. Big Data Ingest: Easily and efficiently ingest data into Hadoop. Value Resources: Gain visibility into how data sources are used to determine value.
  • 13. Architecture. Master, the NiFi Cluster Manager (NCM): OS/Host; JVM; NiFi Cluster Manager – Request Replicator; Web Server. Slaves, the NiFi Nodes (each): OS/Host; JVM; Flow Controller; Web Server; Processor 1; Extension N; FlowFile Repository; Content Repository; Provenance Repository; Local Storage. High Availability: Control plane vs Data plane…
  • 14. HDF – Powered by Apache NiFi
  • 15. Add processor for data intake. (1) Drag and drop processor icon from the top menu.
  • 16. Choose the specific processor. (2) Choose one of the processors – currently 90 available – designed for extension.
  • 17. Example: Pick Twitter Processor
  • 18. Configure the processor. (3) Select processor and choose option to Configure. (4) Adjust parameters as required.
  • 19. Another processor for data output. (5) Drag and drop processor icon from the top menu. (6) Example: choose PutHDFS processor.
  • 20. Configure second processor. (7) Configure 2nd processor.
  • 21. Connect processors, configure connection. (8)
  • 22. Click Start to begin processing. (9)
  • 23. See processors update with real time changes. (10) As data flows, the GUI updates in real time.
  • 24. Dynamically adjust and tune data flow as needed. (11) Dynamically adjust and tune dataflow as needed, in real time. Can also replicate data for testing and comparison.
  • 25. Understand the data path with Data Provenance. (14) Select Data Provenance.
  • 26. Trace lineage of a particular piece of data. (15) Icon for Data Lineage.
  • 27. Every change to data is tracked: processing, views. (16) Provenance event is tracked.
  • 28. Updates as changes happen. (17) Updates as data flows.
  • 29. Easily access and trace changes to dataflow
  • 30. Audit trail of Hortonworks DataFlow User Actions
  • 31. (no text on this slide)
  • 32. Operations: Planned
  • 33. (no text on this slide)
  • 34. (no text on this slide)
  • 35. Q & A
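
Slide 11 lists data buffering with backpressure and prioritized queuing among NiFi's key features. As a rough illustration of how those two ideas combine, the following Python sketch models a bounded queue between two processing steps. It is not NiFi code and does not use the NiFi API; the `Connection` class, its `backpressure_threshold` parameter, and the `offer`/`poll` methods are hypothetical names chosen for this example.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class QueuedItem:
    priority: int                       # lower number is delivered first
    sequence: int                       # tie-breaker that keeps arrival order
    payload: str = field(compare=False)


class Connection:
    """Hypothetical bounded queue between two processing steps (not the NiFi API)."""

    def __init__(self, backpressure_threshold: int):
        self.backpressure_threshold = backpressure_threshold
        self._heap = []
        self._counter = itertools.count()

    def is_full(self) -> bool:
        # Backpressure applies once the queue reaches its threshold.
        return len(self._heap) >= self.backpressure_threshold

    def offer(self, payload: str, priority: int) -> bool:
        """Queue an item, or return False so the upstream step can pause."""
        if self.is_full():
            return False
        item = QueuedItem(priority, next(self._counter), payload)
        heapq.heappush(self._heap, item)
        return True

    def poll(self):
        """Return the highest-priority payload, or None when the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap).payload


def demo():
    conn = Connection(backpressure_threshold=3)
    accepted = [
        conn.offer("routine reading", priority=5),
        conn.offer("alarm", priority=1),
        conn.offer("status ping", priority=9),
        conn.offer("late reading", priority=5),  # refused: threshold reached
    ]
    drained = [conn.poll() for _ in range(3)]
    return accepted, drained
```

In this sketch the fourth `offer` is refused because the queue already holds three items, and `poll` hands back the alarm first even though it arrived second. A real dataflow system would also have to decide what the refused producer does next (wait, drop, or reroute), which the sketch leaves out.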

Editor's Notes

  1. TALK TRACK: The emergence and explosion of Internet of Anything data puts tremendous pressure on existing platforms. Exponential Growth. As of 2014 there was an estimated 4ZB of data across the cybersphere, and that is expected to grow to 44ZB by 2020, with 85% of this data growth coming from newer types of data from sources like sensors and machines, geo-location tracking devices, server logs, clickstreams, social media, or emails and shared files. Variable Structures. The incoming data is often unstructured, or its structure changes too frequently for reliable schema creation at time of ingest. Low Value Per Unit, but High in Aggregate. The incoming data can have little or no value as individual records or small groups of records. But at high volumes and with longer retention horizons, the enterprise can find previously unknown patterns. Advanced analytic applications turn these new insights into business value. This insight is transforming business outcomes in every major industry, but to participate in that transformation, companies must first ingest that new data into an analytic platform. [NEXT SLIDE]
  2. TALK TRACK: The IoAT data edges created specific data flow requirements that Hortonworks DataFlow satisfies: edges with small footprints operate with very little power; limited bandwidth and high latency are commonplace; data availability often exceeds transmission bandwidth; data must be secured throughout its journey. [NEXT SLIDE]
  3. What is the announcement? Hortonworks has signed a definitive agreement to acquire Onyara, including the Onyara products and the team of engineers developing and supporting them. The new Hortonworks DataFlow powered by Apache NiFi, an open source project based on technology that has been in development at the NSA as “Niagara Files” for the last 8 years, is complementary to the Hortonworks Data Platform. With this acquisition, customers will be able to securely and easily collect, conduct and curate any type of data from any origin with the new Hortonworks DataFlow offering. Traditional data at rest as well as real-time data in motion can now be blended to provide historical and perishable insights for predictive analytics. What is the rationale behind the acquisition? As more and more data is generated from every possible source (machines, sensors, IoT, streaming, social, etc.), Hortonworks capitalized on the opportunity to acquire key technology to augment and complement the Hortonworks Data Platform. Onyara, a spin-out of the NSA Technology Transfer Program, has contributed to and developed Apache NiFi over the last 8 years and has created a compelling set of tools to collect, conduct, and curate data. The new Hortonworks DataFlow powered by Apache NiFi provides the ability for more data to be delivered into the Hortonworks Data Platform and delivers full fidelity analytics on all data for every Hortonworks customer. Onyara’s employees, technology and products are complementary to Hortonworks’. With this acquisition, Hortonworks will be positioned as a leader in IoAT and Big Data with Hortonworks DataFlow and the Hortonworks Data Platform.
  4. Focus on the predictive analytics case – use the uptake/cat/etc. case, but generified.
  5. Introduce the architecture of NiFi, describe major system components, and describe the single node and clustering models. For each component, describe its available (and potential) deployment models (relate it to Hadoop). Focus on the two deployment models (single node and cluster); roughly, think of this as ‘edge’ vs. ‘data center’.
  6. Questions?
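
The growth figures quoted in note 1 (4ZB in 2014, 44ZB expected by 2020, 85% of the growth from newer data types) can be turned into a yearly rate. The short Python sketch below assumes steady compound growth over the six years, which is a simplification not stated in the talk track; the function and variable names are hypothetical.

```python
def implied_annual_growth(start_zb: float, end_zb: float, years: int) -> float:
    """Compound annual growth rate implied by a start value, an end value, and a span."""
    return (end_zb / start_zb) ** (1 / years) - 1


# Figures as quoted in note 1 of the editor's notes.
START_ZB = 4.0        # estimated data volume in 2014
END_ZB = 44.0         # expected data volume in 2020
NEW_TYPE_SHARE = 0.85  # share of the growth attributed to newer data types

rate = implied_annual_growth(START_ZB, END_ZB, 2020 - 2014)
new_type_growth_zb = NEW_TYPE_SHARE * (END_ZB - START_ZB)

print(f"Implied yearly growth: {rate:.1%}")
print(f"Growth from newer data types: {new_type_growth_zb:.0f} ZB of {END_ZB - START_ZB:.0f} ZB")
```

Under that assumption the quoted figures correspond to roughly 49% growth per year, with about 34ZB of the 40ZB increase coming from the newer data types.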