SlideShare a Scribd company logo
1 of 50
Best practices and lessons learnt
from Running NiFi at Renault
Kamelia Benchekroun- Big Data Architect
Renault
Abdelkrim Hadjidj - Solution Engineer
Hortonworks
2
Agenda
 The NiFi journey at at Renault
 Best practices for running NiFi in production
 Lessons learnt at Renault
 Questions & answers
3
The NiFi journey at
Renault
4
About Us
Platform Design
Deploy and Automate
Capture and Analyze
Raw Logs and Machine
Data
Improve
Reliability
Speed
Operations
Monitor and
React
Datalake Squad Activities
The Datalake Squad:
• A passionate Team who work really hard to
deliver solutions and services and empower
Data Initiatives at Renault
Datalake Platform Metrics:
450
Connected
Users
30
Data Initiatives
and use cases
8000
Daily
Queries
3500
Service
Requests
300TB
Efficient
Storage
5
Data Lake at Renault – the beginning of the story
CFT
HDP Admin team
DataLake Renault
Node 1 …. Node 16
June 2016
- 10 Users
- Requests by emails, 1 Admin
- Manual provisioning
NFS-GWExports
6
Big Data Ingestion
7
HDF came in (Q1 2017)
Data Flow Platform
Node 1 …. Node 5
HDF Admin team HDP Admin team
DataLake Renault
Node 1 …. Node 40
2016
- 10 Users
- Requests by emails, 1 Admin
- Manual provisioning
2017
- 200 users
- Jira portal, Admins/devops
- NiFi, Automation (Jenkins for HDP)
Today
- 300 – 500 active users
- 7000 query per day
- 40 data source in production
8
Transit Zone RAW Zone
DATA
SOURCES
Data Lake
Data Engineers (DL)NIFI
Other Zones
Data Ingestion Process
Data + Triggers
HDFS / KAFKA
Mail to admin if critical isssues
Technical metrics to AMS
Other Zones
Other Zones
9
Data ingestion workflow
9
10
Dev to prod testing
/tmp/devp/${DESTINATION}
S2S
Dev Prod
11
NiFi Value for Renault
90% of data
ingestion since2016
One Platform for
all data sources : Files,
DBs, API, Brokers, etc.
Offload CFT
100 active data
flows, + 2000
processors in
production,.
Accelerate time to
insights, use case
development and
improve monitoring and
governance
Also used export
data from Data Lake
to other
systems/sites/Cloud
Enables new use
cases connected
plants, package
tracking, IoT
12
Packaging Traceability
13
Packaging Traceability
 POC: 2500 Packages running in the Loop
– 1 package lost = 800 € (= IPhone)
 If the solution is generalized : 600k package
 2016 Status
– 400 K€ packaging re-investment due to packaging
losses
 2017 Status
– 100 K€ cardboard in January
 Test Expectations:
– Reduce cardboard costs and packaging losses
– Test LoRa technology in an industrial context and
Renault activities
– Validate operational added Value of LoRa technology
UMCD
Pitesti Meca
UVD
Pitesti (Dacia)
Bursa
Nissan Sunderland NMUK
Maubeuge (MCA)
Cross dock
(Hungaria)
Nissan Barcelone NMISA
Douai
14
Example flow
 Pitesti -> Barcelona (full)
– 3 transports/week
– 1 truck each time
– Lead Time : 6 days
 NMUK -> Pitesti (Empty)
– once a week
– 1 truck each time
– Lead Time : 6 days
– Cross Dock in Strančice (Tchek Rep.)
15
Use case scope
NETWORK & SECURITY
SENSORS
IOT PLATEFORME
OBJECT VISUALISATION
MÉTIER INTELLIGENCE
IOT PLATEFORME
DATA VISUALISATION
Connect
Collect
Innovate
Drive
Analyse
Objenious
Manual
Treatment
2017
Objenious
Manual
Treatment
S1 2018
Renault IT
(DLK +
DFP)
Objenious
S2 2018
Renault IT
(DLK + DFP
+ IS
integration
+ Analytics)
Telecom
WTB
Renault IT
(DLK + DFP
+ IS
integration
+ Analytics)
Sensor
Not in Scope of Objenious offer
16
Renault Data Centers
Access Control List
or
Certificates
Architecture 1
Remote Process Group
over Proxy
POST HTTP LISTEN HTTP OUTPUT PORT
(Passerelle)
17
Renault Data Centers
Architecture 2
Remote Process Group
over Proxy
OUTPUT PORT
MQTT
AMQP
Websocket…
POST HTTPs LISTEN HTTPs
Or
Any other Operator (Worldwide)
(Passerelle)
IOT Platform
- Device management/provisioning
- Network & Security
18
Reduce cloud communication cost by 50%
Tag Data Decoding with NiFi
ExecuteScript Processor
Raw Data out of sensor : ff1401aa
Decoded Data out of sensor : [{"data":{"battery":3.6007,"temperature":20}}]
19
Architecture + Cloud ingestion
Data Flow Platform
Node 1 …. Node 3
HDF Admin team HDP Admin team
DataLake Renault
Node 1 …. Node 40
Data Flow Platform
Node 1 Node 2
S2S outbound
20
Connected plants
21
Connected plants
MiNiFi
Source 1
Source 2
FTP
MQTT
Source 3
OPCUA
Aggregation server
Collect, processing,
ingestion
OPCUA JMS, Kafka, MQTT,
WebSocket, *
S2S (http, security,
HA)
HDFS, Hive, Solr,
Hbase, etc
JMS
Other sources /
protocols
D A T A I N M O T I O N D A T A A T R E S T
Facility Level Plant Level Corporate Level
Governance
&Integration
Security
Operations
Data Access
Data
Management
22
Overview of Plant Assembly Factory UC
23
Sensors
Full HDF use case (NiFi, Kafka and Storm)
24
Connected plants
Site
To
Site
Demo with MQTTool from
Hand to Data Lake !
Handy, MQTT Broker, NiFi
ConsumeMQTT to Kafka,
NIFI consume Kafka to
ElasticSearch and Hive…
25
Screwing tools valladolid data analysis
26
Data Flow Platform
Node 1 …. Node 3
HDF Admin team HDP Admin team
DataLake Renault
Node 1 …. Node 40
Data Flow Platform
Node 1 Node 2
S2S outbound
DFP Usine - ----------
HDF Flow Mgmt – X coeursN
o
d
e
1
N
o
d
e
X
….
DFP Usine - ----------
HDF Flow Mgmt – X coeursN
o
d
e
1
N
o
d
e
X
….
Data Flow Platform
Node 1 Node 2
S2S
Connected plants
Architecture + Connected plants
27
Best practices for
running NiFi in
production
28
Architecture & sizing
29
Typical NiFi Production Cluster Logical View
30
Sizing considerations
 There are baselines for NiFi sizing but NiFi is like a “programming language” !!
 NiFi resources usage depends on used processors but it’s always IO intensive
– Use different disks / volumes for the three repositories : flow file, content & provenance
– For content repository, it’s recommended to have multiple mount points
– For content and provenance repositories, SSD can provide the best performances
– Heap sizing depends on the use case and used processors
– Depending on the workload, we can scale vertically or horizontally
 Default settings are only for getting started. Tune based on your use case.
– Thread pool size : Maximum Timer Driven Thread Count
– Timeouts: nifi.cluster.node.connection/read.timeout, nifi.zookeeper.connect/session.timeout
– Pluggable modules: WAL Provenance Repository
31
Development &
deployment
32
Let’s consider a simple data flow
Public class Project {
private int foo(int x) {
//do something with x -> y
return y;
}
public static void main(String [] args) {
//some code
try {
BufferedReader bufferRead = new
BufferedReader ...;
String data =bufferRead.readLine(
data = foo(data);
System.out.println(data);
} catch ...
}
}
33
Flow development = software development
 Integrated Development Environment
 Algorithm, code, instructions
 Functions (arguments, results)
 Libraries
 …
 Apache NiFi UI
 Flows, processors, funnels
 Process groups (input/out ports)
 Templates
 …
Programing language Apache NiFi
Design Code Test Deploy Monitor Evolve
34
General guideline
 Use separate environments
 No flows at root level: use a PG per
department, BL, project, etc
 Break your flow into process groups
 Use a naming convention, use comments
(labels, comments)
 Use variable when possible
 Organize your projects into three PGs:
ingestion, test & monitoring
 Performance, security and SLA
 Easy to secure, everything in NiFi is
attached to a PG, heritage d
 Easy to test, update & version
 Very useful for development/monitoring.
Ideally use unique names
 Promotion (dev > test > prod), update
 NiFi can generate data for TDD (Test
Driven Dev), can collect/parse logs for
BAM (Business App Monitoring)
Principle Benefits
35
NiFi organization example
36
NiFi organization example
37
Know & use NiFi Design Patterns
Fan IN/OUT
List & Fetch
Attributes
promotion
Extract, Update Attribute
Throttling
ControlRate, expiration
Funneling
RPG, RouteOnAttribute
Error loops
Relations, Counters
etc
38
FDLC: Flow Development LifeCycle
NiFi Cluster
Node 1 Node x
Tests cluster
Registry
..
NiFi Cluster
Node 1 Node x
Production cluster
Registry
..
NiFi Cluster
Node 1 Node x
Dev cluster
Registry
..
Dev Ops chain
NiFi CLI, NipyAPI, Jenkins
1- push
new flow
version
2- get new
version of
the flow 3 & 6 - automated testing / promotion
(naming convention, scheduling, variable
replacement)
4- push
new flow
version
5- get new
version of
the flow
7- push
new flow
version
39
Monitoring
40
NiFi Monitoring
 Is NiFi service running correctly?
 Monitor global system metrics such as
threads, JVM, disk, etc
 Monitor global flow metrics such as
number of flow files sent, received or
queued, processors stopped, etc
 Solutions
– NiFi UI
– Reporting tasks
– Ambari
– Grafana
 Are a particular flow running correctly?
 Monitor per application (flow, PG,
processor) metrics such as number of
flow files, data size, queues, back
pressure, etc
 Solution
– S2S Reporting tasks
– Custom flow developments (integrate
monitoring and reporting in the application
logic)
Service monitoring Applications (Flow) monitoring
41
NiFi UI for service monitoring
42
Tools available for service monitoring
 Bootstrap notifier: send notification when the NiFi starts,
stops or died unexpectedly
– Email/HTTP notification services
 Use reporting tasks: export metrics to your monitoring
solution
– AmbariReportingTask (global, process group)
– MonitorDiskUsage (Flowfile, content, provenance repositories)
– MonitorMemory
 Also, monitor inactivity
– NiFi has a built-in MonitorActivity processor
– To be used with the S2SBulletinReportingTask
– You can use InvokeHTTP to call the reporting Rest API
43
How to achieve granular monitoring?
 Integrate your monitoring logic in the
flow design
– Count data (lines, tables, events, etc)
– Extract business metadata from data (table
names, project name, source directory, etc)
 Handle different types of errors
(connection, format, schema, etc)
 Send extracted KPI to brokers,
dashboards, API, files, etc
Monitoring Driven Development (MDD)
 NIFInception: NiFi can export metrics to
to another cluster or to itself
 Bulletins provide information on errors
 Status provide metrics on usage
 These reports become data and we can
use the power of NiFi to extract our KPI
 Naming convention is key
S2S reporting tasks
44
PutHDFS UpdateAttribute
Monitoring Driven Development
PutHiveQL
PutElasticSearch
UpdateAttribute
…
…
45
NiFInception architecture
NiFi Cluster
Node 1 Node y
Production cluster
S2S Bulletins
S2S Status
Other applications
Monitoring
Alerting
• Generate status and bulletins
• Send them through S2S to a local
input port
• Ingest bulletin and status reports
• Parse reports (JSON) and extract
useful KPI
• Send KPI to a dashboard tool
(AMS, Elastic, etc)
• Use a broker for alerting (Kafka)
• Can ingest logs from other
systems or application
46
NiFInception architecture 2
NiFi Cluster
Node 1 Node x
Production cluster
NiFi Cluster
Node 1 Node y
Monitoring cluster
S2S Bulletins
S2S Status
Other applications
Monitoring
Alerting
• Ingest data
• Generate status and bulletins
• Send them through S2S to a
remote cluster
• Ingest bulletin and status reports
• Parse reports (JSON) and extract
useful KPI
• Send KPI to a dashboard tool
(AMS, Elastic, etc)
• Use a broker for alerting (Kafka)
• Can ingest logs from other
systems or application
47
Lessons learnt at
Renault
48
Recommendations from day to day life
 High availability means several weeks to validate before Go live
 Use Ranger authorizations instead of NIFI build in ACL management
 Too much « if then do else »: do the data flow and do not offload other processing
solutions
 S2S : Minimize the number of RPG and self-RPG. Use attributes and routing.
 Put hive via Two redundant processors
 Discuss with DBA to better handle impacts (Views VS Tables, Open Sockets ..etc).
 Backup flowfile.xml.gz (users.xml et autorizations.xml) using NiFi itself.
49
Recommendations from day to day life
 Success flag can be done trough counters instead of InvokHTTP
 In case of CIFS, do not forget to use AUTO MOUNT for NFS client side on NiFi Servers.
 Check ‘0’ size before transmitting into HDFS (especially for IoT use case).
 Configure a TIMER for when we use FAILURE redirections to avoid back-pressure
scenario.
 Build separate clusters for separate use cases (Real Time with SSD, Batch, etc …)
 CA PKI very useful for internal communications, no need to wait Security teams answers
but need security skills (SSL) .
50
Questions

More Related Content

What's hot

Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the EnterpriseUsing Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the EnterpriseDataWorks Summit
 
Running Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsRunning Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsTimothy Spann
 
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature DataWorks Summit
 
When NOT to use Apache Kafka?
When NOT to use Apache Kafka?When NOT to use Apache Kafka?
When NOT to use Apache Kafka?Kai Wähner
 
Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiManish Gupta
 
Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4Timothy Spann
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingTill Rohrmann
 
Apache Hadoop YARN: best practices
Apache Hadoop YARN: best practicesApache Hadoop YARN: best practices
Apache Hadoop YARN: best practicesDataWorks Summit
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotXiang Fu
 
Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?Kai Wähner
 
Stl meetup cloudera platform - january 2020
Stl meetup   cloudera platform  - january 2020Stl meetup   cloudera platform  - january 2020
Stl meetup cloudera platform - january 2020Adam Doyle
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Cilium - Bringing the BPF Revolution to Kubernetes Networking and Security
Cilium - Bringing the BPF Revolution to Kubernetes Networking and SecurityCilium - Bringing the BPF Revolution to Kubernetes Networking and Security
Cilium - Bringing the BPF Revolution to Kubernetes Networking and SecurityThomas Graf
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeFlink Forward
 
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018Timothy Spann
 
Practical introduction to hadoop
Practical introduction to hadoopPractical introduction to hadoop
Practical introduction to hadoopinside-BigData.com
 

What's hot (20)

Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the EnterpriseUsing Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
 
Running Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsRunning Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration Options
 
Apache Ranger
Apache RangerApache Ranger
Apache Ranger
 
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature
 
When NOT to use Apache Kafka?
When NOT to use Apache Kafka?When NOT to use Apache Kafka?
When NOT to use Apache Kafka?
 
Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFi
 
Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
 
Nifi
NifiNifi
Nifi
 
Apache Hadoop YARN: best practices
Apache Hadoop YARN: best practicesApache Hadoop YARN: best practices
Apache Hadoop YARN: best practices
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache Pinot
 
Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?
 
Stl meetup cloudera platform - january 2020
Stl meetup   cloudera platform  - january 2020Stl meetup   cloudera platform  - january 2020
Stl meetup cloudera platform - january 2020
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Cilium - Bringing the BPF Revolution to Kubernetes Networking and Security
Cilium - Bringing the BPF Revolution to Kubernetes Networking and SecurityCilium - Bringing the BPF Revolution to Kubernetes Networking and Security
Cilium - Bringing the BPF Revolution to Kubernetes Networking and Security
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Druid deep dive
Druid deep diveDruid deep dive
Druid deep dive
 
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018
 
Apache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop Ecosystem Apache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop Ecosystem
 
Practical introduction to hadoop
Practical introduction to hadoopPractical introduction to hadoop
Practical introduction to hadoop
 

Similar to Best practices and lessons learnt from Running NiFi at Renault

Pivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream AnalyticsPivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream Analyticskgshukla
 
Splunk Conf2010: Corporate Express presents Splunk with SAP
Splunk Conf2010: Corporate Express presents Splunk with SAPSplunk Conf2010: Corporate Express presents Splunk with SAP
Splunk Conf2010: Corporate Express presents Splunk with SAPSplunk
 
IPMI is dead, Long live Redfish
IPMI is dead, Long live RedfishIPMI is dead, Long live Redfish
IPMI is dead, Long live RedfishBruno Cornec
 
FIWARE Tech Summit - Stream Processing with Kurento Media Server
FIWARE Tech Summit - Stream Processing with Kurento Media ServerFIWARE Tech Summit - Stream Processing with Kurento Media Server
FIWARE Tech Summit - Stream Processing with Kurento Media ServerFIWARE
 
Near real-time anomaly detection at Lyft
Near real-time anomaly detection at LyftNear real-time anomaly detection at Lyft
Near real-time anomaly detection at Lyftmarkgrover
 
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...confluent
 
network-management Web base.ppt
network-management Web base.pptnetwork-management Web base.ppt
network-management Web base.pptAssadLeo1
 
Spark Streaming the Industrial IoT
Spark Streaming the Industrial IoTSpark Streaming the Industrial IoT
Spark Streaming the Industrial IoTJim Haughwout
 
Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop
Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop
Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop Shirshanka Das
 
Opentelemetry - From frontend to backend
Opentelemetry - From frontend to backendOpentelemetry - From frontend to backend
Opentelemetry - From frontend to backendSebastian Poxhofer
 
Getting Access to ALCF Resources and Services
Getting Access to ALCF Resources and ServicesGetting Access to ALCF Resources and Services
Getting Access to ALCF Resources and Servicesdavidemartin
 
Monitoring federation open stack infrastructure
Monitoring federation open stack infrastructureMonitoring federation open stack infrastructure
Monitoring federation open stack infrastructureFernando Lopez Aguilar
 
SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration)
SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration) SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration)
SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration) Surendar S
 
Stream Data Processing at Big Data Landscape by Oleksandr Fedirko
Stream Data Processing at Big Data Landscape by Oleksandr Fedirko Stream Data Processing at Big Data Landscape by Oleksandr Fedirko
Stream Data Processing at Big Data Landscape by Oleksandr Fedirko GlobalLogic Ukraine
 
Redfish and python-redfish for Software Defined Infrastructure
Redfish and python-redfish for Software Defined InfrastructureRedfish and python-redfish for Software Defined Infrastructure
Redfish and python-redfish for Software Defined InfrastructureBruno Cornec
 
10 years in Network Protocol testing L2 L3 L4-L7 Tcl Python Manual and Automa...
10 years in Network Protocol testing L2 L3 L4-L7 Tcl Python Manual and Automa...10 years in Network Protocol testing L2 L3 L4-L7 Tcl Python Manual and Automa...
10 years in Network Protocol testing L2 L3 L4-L7 Tcl Python Manual and Automa...Mullaiselvan Mohan
 
Big Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source ToolkitsBig Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source ToolkitsDataWorks Summit
 
Scallable Distributed Deep Learning on OpenPOWER systems
Scallable Distributed Deep Learning on OpenPOWER systemsScallable Distributed Deep Learning on OpenPOWER systems
Scallable Distributed Deep Learning on OpenPOWER systemsGanesan Narayanasamy
 

Similar to Best practices and lessons learnt from Running NiFi at Renault (20)

Pivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream AnalyticsPivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream Analytics
 
Splunk Conf2010: Corporate Express presents Splunk with SAP
Splunk Conf2010: Corporate Express presents Splunk with SAPSplunk Conf2010: Corporate Express presents Splunk with SAP
Splunk Conf2010: Corporate Express presents Splunk with SAP
 
IPMI is dead, Long live Redfish
IPMI is dead, Long live RedfishIPMI is dead, Long live Redfish
IPMI is dead, Long live Redfish
 
FIWARE Tech Summit - Stream Processing with Kurento Media Server
FIWARE Tech Summit - Stream Processing with Kurento Media ServerFIWARE Tech Summit - Stream Processing with Kurento Media Server
FIWARE Tech Summit - Stream Processing with Kurento Media Server
 
Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...
Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...
Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...
 
Near real-time anomaly detection at Lyft
Near real-time anomaly detection at LyftNear real-time anomaly detection at Lyft
Near real-time anomaly detection at Lyft
 
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
 
network-management Web base.ppt
network-management Web base.pptnetwork-management Web base.ppt
network-management Web base.ppt
 
Spark Streaming the Industrial IoT
Spark Streaming the Industrial IoTSpark Streaming the Industrial IoT
Spark Streaming the Industrial IoT
 
Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop
Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop
Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop
 
Opentelemetry - From frontend to backend
Opentelemetry - From frontend to backendOpentelemetry - From frontend to backend
Opentelemetry - From frontend to backend
 
Getting Access to ALCF Resources and Services
Getting Access to ALCF Resources and ServicesGetting Access to ALCF Resources and Services
Getting Access to ALCF Resources and Services
 
Monitoring federation open stack infrastructure
Monitoring federation open stack infrastructureMonitoring federation open stack infrastructure
Monitoring federation open stack infrastructure
 
SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration)
SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration) SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration)
SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration)
 
Stream Data Processing at Big Data Landscape by Oleksandr Fedirko
Stream Data Processing at Big Data Landscape by Oleksandr Fedirko Stream Data Processing at Big Data Landscape by Oleksandr Fedirko
Stream Data Processing at Big Data Landscape by Oleksandr Fedirko
 
ION Costa Rica - About the IETF and How to Get Involved
ION Costa Rica - About the IETF and How to Get InvolvedION Costa Rica - About the IETF and How to Get Involved
ION Costa Rica - About the IETF and How to Get Involved
 
Redfish and python-redfish for Software Defined Infrastructure
Redfish and python-redfish for Software Defined InfrastructureRedfish and python-redfish for Software Defined Infrastructure
Redfish and python-redfish for Software Defined Infrastructure
 
10 years in Network Protocol testing L2 L3 L4-L7 Tcl Python Manual and Automa...
10 years in Network Protocol testing L2 L3 L4-L7 Tcl Python Manual and Automa...10 years in Network Protocol testing L2 L3 L4-L7 Tcl Python Manual and Automa...
10 years in Network Protocol testing L2 L3 L4-L7 Tcl Python Manual and Automa...
 
Big Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source ToolkitsBig Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source Toolkits
 
Scallable Distributed Deep Learning on OpenPOWER systems
Scallable Distributed Deep Learning on OpenPOWER systemsScallable Distributed Deep Learning on OpenPOWER systems
Scallable Distributed Deep Learning on OpenPOWER systems
 

More from DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...DataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
 

Recently uploaded

The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 

Recently uploaded (20)

The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 

Best practices and lessons learnt from Running NiFi at Renault

  • 1. Best practices and lessons learnt from Running NiFi at Renault Kamelia Benchekroun- Big Data Architect Renault Abdelkrim Hadjidj - Solution Engineer Hortonworks
  • 2. 2 Agenda  The NiFi journey at at Renault  Best practices for running NiFi in production  Lessons learnt at Renault  Questions & answers
  • 3. 3 The NiFi journey at Renault
  • 4. 4 About Us Platform Design Deploy and Automate Capture and Analyze Raw Logs and Machine Data Improve Reliability Speed Operations Monitor and React Datalake Squad Activities The Datalake Squad: • A passionate Team who work really hard to deliver solutions and services and empower Data Initiatives at Renault Datalake Platform Metrics: 450 Connected Users 30 Data Initiatives and use cases 8000 Daily Queries 3500 Service Requests 300TB Efficient Storage
  • 5. 5 Data Lake at Renault – the beginning of the story CFT HDP Admin team DataLake Renault Node 1 …. Node 16 June 2016 - 10 Users - Requests by emails, 1 Admin - Manual provisioning NFS-GWExports
  • 7. 7 HDF came in (Q1 2017) Data Flow Platform Node 1 …. Node 5 HDF Admin team HDP Admin team DataLake Renault Node 1 …. Node 40 2016 - 10 Users - Requests by emails, 1 Admin - Manual provisioning 2017 - 200 users - Jira portal, Admins/devops - NiFi, Automation (Jenkins for HDP) Today - 300 – 500 active users - 7000 query per day - 40 data source in production
  • 8. 8 Transit Zone RAW Zone DATA SOURCES Data Lake Data Engineers (DL)NIFI Other Zones Data Ingestion Process Data + Triggers HDFS / KAFKA Mail to admin if critical isssues Technical metrics to AMS Other Zones Other Zones
  • 10. 10 Dev to prod testing /tmp/devp/${DESTINATION} S2S Dev Prod
  • 11. 11 NiFi Value for Renault 90% of data ingestion since2016 One Platform for all data sources : Files, DBs, API, Brokers, etc. Offload CFT 100 active data flows, + 2000 processors in production,. Accelerate time to insights, use case development and improve monitoring and governance Also used export data from Data Lake to other systems/sites/Cloud Enables new use cases connected plants, package tracking, IoT
  • 13. 13 Packaging Traceability  POC: 2500 Packages running in the Loop – 1 package lost = 800 € (= IPhone)  If the solution is generalized : 600k package  2016 Status – 400 K€ packaging re-investment due to packaging losses  2017 Status – 100 K€ cardboard in January  Test Expectations: – Reduce cardboard costs and packaging losses – Test LoRa technology in an industrial context and Renault activities – Validate operational added Value of LoRa technology UMCD Pitesti Meca UVD Pitesti (Dacia) Bursa Nissan Sunderland NMUK Maubeuge (MCA) Cross dock (Hungaria) Nissan Barcelone NMISA Douai
  • 14. 14 Example flow  Pitesti -> Barcelona (full) – 3 transports/week – 1 truck each time – Lead Time : 6 days  NMUK -> Pitesti (Empty) – once a week – 1 truck each time – Lead Time : 6 days – Cross Dock in Strančice (Tchek Rep.)
  • 15. 15 Use case scope NETWORK & SECURITY SENSORS IOT PLATEFORME OBJECT VISUALISATION MÉTIER INTELLIGENCE IOT PLATEFORME DATA VISUALISATION Connect Collect Innovate Drive Analyse Objenious Manual Treatment 2017 Objenious Manual Treatment S1 2018 Renault IT (DLK + DFP) Objenious S2 2018 Renault IT (DLK + DFP + IS integration + Analytics) Telecom WTB Renault IT (DLK + DFP + IS integration + Analytics) Sensor Not in Scope of Objenious offer
  • 16. 16 Renault Data Centers Access Control List or Certificates Architecture 1 Remote Process Group over Proxy POST HTTP LISTEN HTTP OUTPUT PORT (Passerelle)
  • 17. 17 Renault Data Centers Architecture 2 Remote Process Group over Proxy OUTPUT PORT MQTT AMQP Websocket… POST HTTPs LISTEN HTTPs Or Any other Operator (Worldwide) (Passerelle) IOT Platform - Device management/provisioning - Network & Security
  • 18. 18 Reduce cloud communication cost by 50% Tag Data Decoding with NiFi ExecuteScript Processor Raw Data out of sensor : ff1401aa Decoded Data out of sensor : [{"data":{"battery":3.6007,"temperature":20}}]
  • 19. 19 Architecture + Cloud ingestion Data Flow Platform Node 1 …. Node 3 HDF Admin team HDP Admin team DataLake Renault Node 1 …. Node 40 Data Flow Platform Node 1 Node 2 S2S outbound
  • 21. 21 Connected plants MiNiFi Source 1 Source 2 FTP MQTT Source 3 OPCUA Aggregation server Collect, processing, ingestion OPCUA JMS, Kafka, MQTT, WebSocket, * S2S (http, security, HA) HDFS, Hive, Solr, Hbase, etc JMS Other sources / protocols D A T A I N M O T I O N D A T A A T R E S T Facility Level Plant Level Corporate Level Governance &Integration Security Operations Data Access Data Management
  • 22. 22 Overview of Plant Assembly Factory UC
  • 23. 23 Sensors Full HDF use case (NiFi, Kafka and Storm)
  • 24. 24 Connected plants Site To Site Demo with MQTTool from Hand to Data Lake ! Handy, MQTT Broker, NiFi ConsumeMQTT to Kafka, NIFI consume Kafka to ElasticSearch and Hive…
  • 26. 26 Data Flow Platform Node 1 …. Node 3 HDF Admin team HDP Admin team DataLake Renault Node 1 …. Node 40 Data Flow Platform Node 1 Node 2 S2S outbound DFP Usine - ---------- HDF Flow Mgmt – X coeursN o d e 1 N o d e X …. DFP Usine - ---------- HDF Flow Mgmt – X coeursN o d e 1 N o d e X …. Data Flow Platform Node 1 Node 2 S2S Connected plants Architecture + Connected plants
  • 27. 27 Best practices for running NiFi in production
  • 29. 29 Typical NiFi Production Cluster Logical View
  • 30. 30 Sizing considerations  There are baselines for NiFi sizing but NiFi is like a “programming language” !!  NiFi resources usage depends on used processors but it’s always IO intensive – Use different disks / volumes for the three repositories : flow file, content & provenance – For content repository, it’s recommended to have multiple mount points – For content and provenance repositories, SSD can provide the best performances – Heap sizing depends on the use case and used processors – Depending on the workload, we can scale vertically or horizontally  Default settings are only for getting started. Tune based on your use case. – Thread pool size : Maximum Timer Driven Thread Count – Timeouts: nifi.cluster.node.connection/read.timeout, nifi.zookeeper.connect/session.timeout – Pluggable modules: WAL Provenance Repository
  • 32. 32 Let’s consider a simple data flow Public class Project { private int foo(int x) { //do something with x -> y return y; } public static void main(String [] args) { //some code try { BufferedReader bufferRead = new BufferedReader ...; String data =bufferRead.readLine( data = foo(data); System.out.println(data); } catch ... } }
  • 33. 33 Flow development = software development  Integrated Development Environment  Algorithm, code, instructions  Functions (arguments, results)  Libraries  …  Apache NiFi UI  Flows, processors, funnels  Process groups (input/out ports)  Templates  … Programing language Apache NiFi Design Code Test Deploy Monitor Evolve
  • 34. 34 General guideline  Use separate environments  No flows at root level: use a PG per department, BL, project, etc  Break your flow into process groups  Use a naming convention, use comments (labels, comments)  Use variable when possible  Organize your projects into three PGs: ingestion, test & monitoring  Performance, security and SLA  Easy to secure, everything in NiFi is attached to a PG, heritage d  Easy to test, update & version  Very useful for development/monitoring. Ideally use unique names  Promotion (dev > test > prod), update  NiFi can generate data for TDD (Test Driven Dev), can collect/parse logs for BAM (Business App Monitoring) Principle Benefits
  • 37. 37 Know & use NiFi Design Patterns Fan IN/OUT List & Fetch Attributes promotion Extract, Update Attribute Throttling ControlRate, expiration Funneling RPG, RouteOnAttribute Error loops Relations, Counters etc
  • 38. 38 FDLC: Flow Development LifeCycle NiFi Cluster Node 1 Node x Tests cluster Registry .. NiFi Cluster Node 1 Node x Production cluster Registry .. NiFi Cluster Node 1 Node x Dev cluster Registry .. Dev Ops chain NiFi CLI, NipyAPI, Jenkins 1- push new flow version 2- get new version of the flow 3 & 6 - automated testing / promotion (naming convention, scheduling, variable replacement) 4- push new flow version 5- get new version of the flow 7- push new flow version
  • 40. 40 NiFi Monitoring  Is NiFi service running correctly?  Monitor global system metrics such as threads, JVM, disk, etc  Monitor global flow metrics such as number of flow files sent, received or queued, processors stopped, etc  Solutions – NiFi UI – Reporting tasks – Ambari – Grafana  Are a particular flow running correctly?  Monitor per application (flow, PG, processor) metrics such as number of flow files, data size, queues, back pressure, etc  Solution – S2S Reporting tasks – Custom flow developments (integrate monitoring and reporting in the application logic) Service monitoring Applications (Flow) monitoring
  • 41. 41 NiFi UI for service monitoring
  • 42. 42 Tools available for service monitoring  Bootstrap notifier: send notification when the NiFi starts, stops or died unexpectedly – Email/HTTP notification services  Use reporting tasks: export metrics to your monitoring solution – AmbariReportingTask (global, process group) – MonitorDiskUsage (Flowfile, content, provenance repositories) – MonitorMemory  Also, monitor inactivity – NiFi has a built-in MonitorActivity processor – To be used with the S2SBulletinReportingTask – You can use InvokeHTTP to call the reporting Rest API
  • 43. 43 How to achieve granular monitoring?  Integrate your monitoring logic in the flow design – Count data (lines, tables, events, etc) – Extract business metadata from data (table names, project name, source directory, etc)  Handle different types of errors (connection, format, schema, etc)  Send extracted KPI to brokers, dashboards, API, files, etc Monitoring Driven Development (MDD)  NIFInception: NiFi can export metrics to to another cluster or to itself  Bulletins provide information on errors  Status provide metrics on usage  These reports become data and we can use the power of NiFi to extract our KPI  Naming convention is key S2S reporting tasks
  • 44. 44 PutHDFS UpdateAttribute Monitoring Driven Development PutHiveQL PutElasticSearch UpdateAttribute … …
  • 45. 45 NiFInception architecture NiFi Cluster Node 1 Node y Production cluster S2S Bulletins S2S Status Other applications Monitoring Alerting • Generate status and bulletins • Send them through S2S to a local input port • Ingest bulletin and status reports • Parse reports (JSON) and extract useful KPI • Send KPI to a dashboard tool (AMS, Elastic, etc) • Use a broker for alerting (Kafka) • Can ingest logs from other systems or application
  • 46. 46 NiFInception architecture 2 NiFi Cluster Node 1 Node x Production cluster NiFi Cluster Node 1 Node y Monitoring cluster S2S Bulletins S2S Status Other applications Monitoring Alerting • Ingest data • Generate status and bulletins • Send them through S2S to a remote cluster • Ingest bulletin and status reports • Parse reports (JSON) and extract useful KPI • Send KPI to a dashboard tool (AMS, Elastic, etc) • Use a broker for alerting (Kafka) • Can ingest logs from other systems or application
  • 48. 48 Recommendations from day to day life  High availability means several weeks to validate before Go live  Use Ranger authorizations instead of NIFI build in ACL management  Too much « if then do else »: do the data flow and do not offload other processing solutions  S2S : Minimize the number of RPG and self-RPG. Use attributes and routing.  Put hive via Two redundant processors  Discuss with DBA to better handle impacts (Views VS Tables, Open Sockets ..etc).  Backup flowfile.xml.gz (users.xml et autorizations.xml) using NiFi itself.
  • 49. 49 Recommendations from day to day life  Success flag can be done trough counters instead of InvokHTTP  In case of CIFS, do not forget to use AUTO MOUNT for NFS client side on NiFi Servers.  Check ‘0’ size before transmitting into HDFS (especially for IoT use case).  Configure a TIMER for when we use FAILURE redirections to avoid back-pressure scenario.  Build separate clusters for separate use cases (Real Time with SSD, Batch, etc …)  CA PKI very useful for internal communications, no need to wait Security teams answers but need security skills (SSL) .

Editor's Notes

  1. Only files and DB via Sqoop && CFT
  2. MSG and API Ingestions, Use a dedicated servers and bandwith and separate from the compute, simplified FW rules, and better credentials administration, one point of monitoring…
  3. Expliquer la Zone de Transit Cas le plus courant : ORC file
  4. ${Destination} étant dans l’update attribut prévue avant le S2S. Bien que nos devellopeur aujourdhui l’utilisent tèrs rarement, les flux de capture étant réalisé par la SQUAD qui assure l’opérationel.
  5. CDC actif avec NIFI pour une source MySQL, Etude en cours pour Oracle et DB2
  6. if it’s foreseen to have millions of flow files being processed by NiFi at one point in time, the heap size should be adjusted accordingly. A common value is to set the heap size to 8GB and it is usually not recommended to exceed 16GB unless very specific processors are used requiring a lot of memory. Since there is no resource isolation in NiFi, it’s important not to set this value too high in order to ensure that sufficient resources are available for the operating system and the other running processes. It is usually recommended to set this value to 2 or 3 times the number of available cores on the NiFi host. Increasing the value to a higher number should only be performed if there is a need for it (NiFi UI shows that the maximum number of threads is constantly used), and if a strict monitoring of the host is performed to ensure correct CPU/load metrics. the "nifi.cluster.node.connection.timeout" and "nifi.cluster.node.read.timeout" properties in nifi.properties) are set to 5 seconds. This is fine for a "getting started" type of cluster. However, when processing large numbers of FlowFiles, the JVM's garbage collection can sometimes cause some fairly lengthy pauses. This could cause timeouts. It is recommended to increase that to 10-15 seconds or more. the "nifi.zookeeper.connect.timeout" and "nifi.zookeeper.session.timeout" default to 3 seconds, as this is what the ZooKeeper default is. However, in case ZooKeeper is very busy being used by multiple components it could create timeouts which would cause the Primary Node and Cluster Coordinator to change frequently. Changing the value to 10 seconds is recommended.
  7. If you need to success in your NiFi projects, you really need to consider NiFi flow like software development Your code or your algorithm is a set of instructions just like your flows are a set of processor. You can have if else statments with routing processor, you can have for or while loops with update attributes, you can have catch statment with error relations When you build your flow you divide it into process groups like you use functions in your code. This make your applications easier to understand, maintain, and debug You can have templates for repetitive things like libraries that you build and use accross your projects
  8. Controller services should be local within the Process Group, not in the root or parent process group. All Flows should be contained within a child process group For larger flows, develop several smaller flows that can be discretely tested, and then combine them into composites using input/output or S2S ports Use Variable Registry for attributes that vary between environments Updating only a part is useful for RT system where data flow continuosly
  9. Manque Variable