Copyright © 2016, Schlumberger, All rights reserved.
From Zero to Data Flow in Hours with Apache NiFi
Hadoop Summit – San Jose 2016
Chris Herrera
Schlumberger
Agenda
• Why is composable data flow important to the drilling industry?
• Current State of the System
• The Breaking Point That Led to the New System
• An unexpected workflow in testing
• How are we using it today?
• What’s Next
Legal Notices
This presentation is for informational purposes only. STATEMENTS AND OPINIONS EXPRESSED IN THIS PRESENTATION ARE
THOSE OF THE PRESENTER AND DO NOT REFLECT THE OPINIONS OF SCHLUMBERGER. SCHLUMBERGER AND THE
PRESENTER HEREBY DISCLAIM ANY REPRESENTATIONS AND/OR WARRANTIES EXPRESS OR IMPLIED. SCHLUMBERGER AND
THE PRESENTER HEREBY DISCLAIM ANY RESPONSIBILITY FOR THE CONTENT, ACCURACY, AND/OR COMPLETENESS OF THE
INFORMATION IN THIS PRESENTATION.
This presentation, and any recordings or reproductions in various media formats, including, without limitation, print, audio,
and video, is the copyrighted work of Schlumberger, and Schlumberger hereby retains all intellectual property and/or
proprietary rights related thereto.
Schlumberger and the Schlumberger logo are trademarks of Schlumberger in the U.S. and/or other countries. Other names
and brands referenced in this presentation are the trademarks of their respective owners, and any references thereto are
not endorsements or approvals.
Introduction
• 2 years managing product development and innovation teams working on real-time data ingestion and delivery
• 5 years of experience in the Hadoop ecosystem
• 11 years of experience with various aspects of the oilfield (operational and technical)
Chris Herrera
Schlumberger
The Major Components of a Drilling Project
Diagram labels: Rig; Wireline; Measurement / Logging While Drilling; Mud logging; Fluids; Completions; Cementing
• Several contractors are brought in to develop and complete the well
• They can come from one company or, most of the time, from many companies
• Each brings its own system, often without a central repository of data
• The site can be within decent cell connectivity, or deep in the middle of a jungle with only 128k of high-latency bandwidth
Where Does This Data Need to Go?
Diagram labels: RT Server; Operational Support; Client Monitoring; Processing and Print Centers
Workflow of Data During and Post Operations
Diagram labels (stages and steps): Sales and Job Planning; Acquisition; Processing; Acq; Processing Center; Data Server; Classification & Labelling; Quality Control; Classification; Quality Control; Hosting; QC & Labelling; Conversion; Data Delivery; Client Data Delivery; KPI & Reporting
Diagram labels (roles): Sales; Field Engineer; Data Processor; Customer Manager
What Does This Mean in a Data Sense
Input: DLIS; LAS (1.2, 2.0, 3.0); WITS (Level 0, Level 1, Level 2); CSV; Profibus; Modbus
Output: CSV; PDS; LAS (1.2, 2.0, 3.0); DLIS
RT Server
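To make one of the input formats listed above concrete, here is a minimal sketch of splitting a WITS Level 0 stream into records. It assumes the commonly described Level 0 framing: a record opens with a `&&` line, closes with a `!!` line, and each data line is a two-digit record id, a two-digit item id, then the value. The function name `parse_wits0` and the sample values are hypothetical and do not come from the talk.

```python
from typing import Dict, List, Tuple


def parse_wits0(stream: str) -> List[Dict[Tuple[int, int], str]]:
    """Split a WITS Level 0 text stream into records.

    Each record maps (record_id, item_id) to the raw value string.
    Lines outside a && ... !! frame, or too short to hold an
    identifier plus a value, are skipped.
    """
    records: List[Dict[Tuple[int, int], str]] = []
    current = None  # None means we are not inside a frame
    for raw_line in stream.splitlines():
        line = raw_line.strip()
        if line == "&&":
            current = {}
        elif line == "!!":
            if current is not None:
                records.append(current)
            current = None
        elif current is not None and len(line) > 4 and line[:4].isdigit():
            key = (int(line[:2]), int(line[2:4]))
            current[key] = line[4:]
    return records


# Hypothetical two-item record, for illustration only.
sample = "&&\n01083650.40\n01103650.40\n!!\n"
records = parse_wits0(sample)
```

Values are kept as strings on purpose: deciding units and numeric types is a separate cleansing step, and keeping the raw text preserves fidelity.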
What Does This Mean in a Volume Sense
• ~9000 users / month
• ~10 files / minute
• ~480 data queries / sec
• ~3050 wells / month
A Quick(ish) Note On The Importance of Data Provenance
Diagram labels: Context; Fidelity; Time; Acquisition - Field; Interpretation - Office
• Need to retain the fidelity throughout the flow.
Typical Data Problems and Concerns
• What is the time zone of the data we are receiving? One day UTC...
• "Ahh, I see you did not implement that part of the standard..."
• Wait, why are you sending data at 5 times the sampling rate of the sensor...
• I did not get the memo that you were changing your data model today...
• Governmental / client data residency concerns
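Two of the problems above, an undeclared time zone and data arriving faster than the sensor samples, lend themselves to simple checks. The sketch below is an illustration only: the helper names `to_utc` and `oversampling_factor`, the fixed UTC offset, and the one-second sensor period are all assumptions, not details from the talk.

```python
from datetime import datetime, timedelta, timezone
from statistics import median


def to_utc(naive_local: datetime, utc_offset_hours: float) -> datetime:
    """Attach a declared fixed UTC offset to a naive timestamp, then convert to UTC."""
    source_zone = timezone(timedelta(hours=utc_offset_hours))
    return naive_local.replace(tzinfo=source_zone).astimezone(timezone.utc)


def oversampling_factor(timestamps, sensor_period_s: float) -> float:
    """Return sensor period divided by the median observed interval.

    About 1.0 means data arrives at the sensor rate; about 5.0 means
    five samples arrive for every real sensor reading.
    """
    ordered = sorted(timestamps)
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    if not gaps:
        raise ValueError("need at least two timestamps")
    typical_gap = median(gaps)
    if typical_gap <= 0:
        raise ValueError("timestamps do not advance")
    return sensor_period_s / typical_gap


# Hypothetical feed: local time at UTC-5, one sample every 0.2 s,
# from a sensor that is assumed to measure once per second.
start = to_utc(datetime(2016, 6, 28, 7, 0, 0), utc_offset_hours=-5)
stamps = [start + timedelta(seconds=0.2 * i) for i in range(50)]
factor = oversampling_factor(stamps, sensor_period_s=1.0)
```

The median is used rather than the mean so that a single long gap, for example a dropped connection, does not hide a consistently oversampled feed.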
Current Solution…
• 100+ man-years of effort over 14 years
• ~2,000,000+ lines of code
• Extreme barrier to entry for workflow changes
• Very little understanding of what happened to the data
Input: DLIS; LAS (1.2, 2.0, 3.0); WITS (Level 0, Level 1, Level 2); CSV; Profibus; Modbus
Output: CSV; PDS; LAS (1.2, 2.0, 3.0); DLIS
RT Server
We Needed a Simpler, More Maintainable Solution…
The Original Plan…
Diagram labels: RabbitMQ; DLIS Parser; LAS Parser; ETP Endpoint; Data Writer; {} DB; Event Publisher; Node.js
What About:
• Data cleansing
• Routing
• The ability to debug what has gone wrong
• TIME (estimated 6 man-months)
How does NiFi fit into the equation?
• Knowing where data came from is crucial to real-time decision making (and is often missing)
• The ability to visualize the data flow at a granular level aids in troubleshooting and operational understanding
• With several processors already available, there is a low barrier to entry when it comes to data flow creation
Enter NiFi…
Diagram labels: Processor Creation; Data Flow Creation; Play…
• 10 man-hours
• ETP
• WITSML 1.3.1.1 / 1.4.1.1
• LAS 1.2 / 2.0
• 1 man-day
Prototype Setup
2 man-days
Diagram labels: Data Source; Processor Input; Data Cleansing; Data Enrichment; Put Data; Data Storage ({ } Repo); Routing
Process Group (Data Enrichment): Get; Update
• Append Well Name
• Append Client Name
• Append Run Name
• Append Pass Name
Process Group (Data Cleansing):
• Fix Time Zone
• Remove Absent Indexes
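The cleansing and enrichment steps named on this slide can be sketched outside NiFi as two small functions. This is not the processor code from the prototype: the function names, the `DEPT` index key, and the use of -999.25 as the absent-value marker (a common LAS null value, which each LAS file declares in its own header) are assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Assumed absent-value marker; a real LAS file declares its own in the header.
ABSENT = -999.25

Row = Dict[str, Optional[float]]


def remove_absent_indexes(rows: List[Row], index_key: str = "DEPT") -> List[Row]:
    """Drop rows whose index value is missing or equals the absent marker."""
    return [
        row for row in rows
        if row.get(index_key) is not None and row[index_key] != ABSENT
    ]


def enrich(rows: List[Row], well: str, client: str, run: str, pass_name: str) -> List[dict]:
    """Append well, client, run and pass names to each row without mutating the input."""
    context = {"well": well, "client": client, "run": run, "pass": pass_name}
    return [{**row, **context} for row in rows]


# Hypothetical rows and names, for illustration only.
raw = [
    {"DEPT": 1000.0, "GR": 45.2},
    {"DEPT": ABSENT, "GR": 47.9},
    {"DEPT": 1000.5, "GR": 46.1},
]
clean = enrich(remove_absent_indexes(raw), "Well-A", "Client-X", "Run-1", "Pass-1")
```

Keeping each step a pure function mirrors the process-group idea: every stage can be tested, reordered, or replaced on its own.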
What About Testing?
Testing Landscape Today
2.2 TB Test Data
• 22 Applications
• 14 Different formats of data
• Data of questionable quality
• Stored on a file share
Effort
• 0.5 man effort / sprint on maintenance
• 2 weeks to perform a full test
Step 1: Data Set Curation – Creating the Reference Set
Diagram labels: 2.2 TB Test Data; LAS (1.2, 2.0, 3.0); WITS (Level 0, Level 1, Level 2); CSV; Clean Test Data Set
6 hours
Step 2: Immediate Test Harness
Diagram labels: Docker; Clean Test Data Set
• Step 1: Need data
• Step 2: docker pull xxx.xxx.xxx.xxx:xxxx/flowTest
• Step 3: Add put processor
• Step 4: Start dataflow
From: 2 weeks to set up a test to:
Step 3: Immediate Live Data Testing
Diagram labels: Docker; Production RT System; Processor Input; Testing Processor Group; Anonymize Data
• Significantly cuts down the time to test an application against real data
• Especially in brownfield applications
• Brings a level of confidence to the project that otherwise would be missing.
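The slide does not say how the "Anonymize Data" step works. One possible approach, sketched here as an assumption, is keyed pseudonymization: sensitive fields are replaced by a stable token so that records from the same well still group together while the real name never reaches the test environment. The function names, field names, and key below are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, Iterable


def pseudonymize(value: str, secret: bytes) -> str:
    """Return a stable keyed hash so the same input always maps to the same token."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "anon-" + digest[:12]


def anonymize_record(record: Dict[str, object],
                     sensitive: Iterable[str],
                     secret: bytes) -> Dict[str, object]:
    """Copy a record, replacing the sensitive fields with pseudonyms."""
    masked = dict(record)
    for field_name in sensitive:
        if field_name in masked:
            masked[field_name] = pseudonymize(str(masked[field_name]), secret)
    return masked


secret = b"rotate-me"  # hypothetical key; keep real keys out of source control
live = {"well": "Well-A", "client": "Client-X", "depth": 1000.5, "gr": 46.1}
safe = anonymize_record(live, ["well", "client"], secret)
```

A keyed hash is used instead of a plain hash so that someone who knows a list of candidate well names cannot recompute the tokens without the key. Measurement values are left untouched so the test still exercises real data shapes.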
Next Steps
Use Cases to be Explored for MiNiFi – Rig Data Ingestion with Provenance
Diagram labels: RT Server
• Understanding the chain of custody from sensor to user
• Tracking the provenance of the data as it traverses the system
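As a way to picture "chain of custody from sensor to user", the sketch below keeps an append-only log in which every event stores a hash of the previous event, so any later edit breaks the chain. This is an illustration of the idea only; it is not how NiFi or MiNiFi store provenance, and the class name, actor names, and payload are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List


def _hash_body(body: dict) -> str:
    """Hash an event body deterministically (keys sorted before serializing)."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class CustodyLog:
    """Append-only list of events where each entry hashes the previous one."""
    events: List[dict] = field(default_factory=list)

    def record(self, actor: str, action: str, payload: bytes) -> dict:
        previous = self.events[-1]["event_hash"] if self.events else ""
        body = {
            "actor": actor,
            "action": action,
            "payload_sha256": hashlib.sha256(payload).hexdigest(),
            "previous_hash": previous,
        }
        body["event_hash"] = _hash_body(body)  # hash computed before this key is added
        self.events.append(body)
        return body

    def verify(self) -> bool:
        """Recompute every hash and check that each event links to its predecessor."""
        previous = ""
        for event in self.events:
            body = {k: v for k, v in event.items() if k != "event_hash"}
            if event["previous_hash"] != previous or _hash_body(body) != event["event_hash"]:
                return False
            previous = event["event_hash"]
        return True


log = CustodyLog()
log.record("rig-sensor", "acquired", b"depth=3650.40")
log.record("edge-agent", "forwarded", b"depth=3650.40")
log.record("rt-server", "stored", b"depth=3650.40")
intact = log.verify()
```

Because each event carries the payload hash, a reader can also confirm that the data stored at the end of the flow is byte-for-byte what the sensor produced.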
Thank You! Questions?

More Related Content

What's hot

Real time stock processing with apache nifi, apache flink and apache kafka
Real time stock processing with apache nifi, apache flink and apache kafkaReal time stock processing with apache nifi, apache flink and apache kafka
Real time stock processing with apache nifi, apache flink and apache kafkaTimothy Spann
 
NiFi Developer Guide
NiFi Developer GuideNiFi Developer Guide
NiFi Developer GuideDeon Huang
 
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry confluent
 
Integrating NiFi and Flink
Integrating NiFi and FlinkIntegrating NiFi and Flink
Integrating NiFi and FlinkBryan Bende
 
Building Modern Data Streaming Apps with Python
Building Modern Data Streaming Apps with PythonBuilding Modern Data Streaming Apps with Python
Building Modern Data Streaming Apps with PythonTimothy Spann
 
Scaling Hadoop at LinkedIn
Scaling Hadoop at LinkedInScaling Hadoop at LinkedIn
Scaling Hadoop at LinkedInDataWorks Summit
 
Real-time Twitter Sentiment Analysis and Image Recognition with Apache NiFi
Real-time Twitter Sentiment Analysis and Image Recognition with Apache NiFiReal-time Twitter Sentiment Analysis and Image Recognition with Apache NiFi
Real-time Twitter Sentiment Analysis and Image Recognition with Apache NiFiTimothy Spann
 
Introduction to Apache Flink
Introduction to Apache FlinkIntroduction to Apache Flink
Introduction to Apache Flinkdatamantra
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingDataWorks Summit
 
Apache Kafka Streams + Machine Learning / Deep Learning
Apache Kafka Streams + Machine Learning / Deep LearningApache Kafka Streams + Machine Learning / Deep Learning
Apache Kafka Streams + Machine Learning / Deep LearningKai Wähner
 
Iceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data AnalyticsIceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data AnalyticsAlluxio, Inc.
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversScyllaDB
 
Streaming Data Analytics with ksqlDB and Superset | Robert Stolz, Preset
Streaming Data Analytics with ksqlDB and Superset | Robert Stolz, PresetStreaming Data Analytics with ksqlDB and Superset | Robert Stolz, Preset
Streaming Data Analytics with ksqlDB and Superset | Robert Stolz, PresetHostedbyConfluent
 
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...HostedbyConfluent
 
Druid and Hive Together : Use Cases and Best Practices
Druid and Hive Together : Use Cases and Best PracticesDruid and Hive Together : Use Cases and Best Practices
Druid and Hive Together : Use Cases and Best PracticesDataWorks Summit
 

What's hot (20)

Real time stock processing with apache nifi, apache flink and apache kafka
Real time stock processing with apache nifi, apache flink and apache kafkaReal time stock processing with apache nifi, apache flink and apache kafka
Real time stock processing with apache nifi, apache flink and apache kafka
 
NiFi Developer Guide
NiFi Developer GuideNiFi Developer Guide
NiFi Developer Guide
 
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
 
Integrating NiFi and Flink
Integrating NiFi and FlinkIntegrating NiFi and Flink
Integrating NiFi and Flink
 
Building Modern Data Streaming Apps with Python
Building Modern Data Streaming Apps with PythonBuilding Modern Data Streaming Apps with Python
Building Modern Data Streaming Apps with Python
 
Flink Streaming
Flink StreamingFlink Streaming
Flink Streaming
 
Nifi
NifiNifi
Nifi
 
Nifi workshop
Nifi workshopNifi workshop
Nifi workshop
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
 
Scaling Hadoop at LinkedIn
Scaling Hadoop at LinkedInScaling Hadoop at LinkedIn
Scaling Hadoop at LinkedIn
 
Real-time Twitter Sentiment Analysis and Image Recognition with Apache NiFi
Real-time Twitter Sentiment Analysis and Image Recognition with Apache NiFiReal-time Twitter Sentiment Analysis and Image Recognition with Apache NiFi
Real-time Twitter Sentiment Analysis and Image Recognition with Apache NiFi
 
Introduction to Apache Flink
Introduction to Apache FlinkIntroduction to Apache Flink
Introduction to Apache Flink
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
 
Apache Kafka Streams + Machine Learning / Deep Learning
Apache Kafka Streams + Machine Learning / Deep LearningApache Kafka Streams + Machine Learning / Deep Learning
Apache Kafka Streams + Machine Learning / Deep Learning
 
Iceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data AnalyticsIceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data Analytics
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the Covers
 
Apache flink
Apache flinkApache flink
Apache flink
 
Streaming Data Analytics with ksqlDB and Superset | Robert Stolz, Preset
Streaming Data Analytics with ksqlDB and Superset | Robert Stolz, PresetStreaming Data Analytics with ksqlDB and Superset | Robert Stolz, Preset
Streaming Data Analytics with ksqlDB and Superset | Robert Stolz, Preset
 
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
 
Druid and Hive Together : Use Cases and Best Practices
Druid and Hive Together : Use Cases and Best PracticesDruid and Hive Together : Use Cases and Best Practices
Druid and Hive Together : Use Cases and Best Practices
 

Similar to From Zero to Data Flow in Hours with Apache NiFi

Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18Cloudera, Inc.
 
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineSpark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineData Con LA
 
Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Ret...
Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Ret...Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Ret...
Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Ret...Cloudera, Inc.
 
Architecting application with Hadoop - using clickstream analytics as an example
Architecting application with Hadoop - using clickstream analytics as an exampleArchitecting application with Hadoop - using clickstream analytics as an example
Architecting application with Hadoop - using clickstream analytics as an examplehadooparchbook
 
Application Architectures with Hadoop
Application Architectures with HadoopApplication Architectures with Hadoop
Application Architectures with Hadoophadooparchbook
 
Application Architectures with Hadoop | Data Day Texas 2015
Application Architectures with Hadoop | Data Day Texas 2015Application Architectures with Hadoop | Data Day Texas 2015
Application Architectures with Hadoop | Data Day Texas 2015Cloudera, Inc.
 
Gab Genai Cloudera - Going Beyond Traditional Analytic
Gab Genai Cloudera - Going Beyond Traditional Analytic Gab Genai Cloudera - Going Beyond Traditional Analytic
Gab Genai Cloudera - Going Beyond Traditional Analytic IntelAPAC
 
Application Architectures with Hadoop
Application Architectures with HadoopApplication Architectures with Hadoop
Application Architectures with Hadoophadooparchbook
 
Capital One: Using Cassandra In Building A Reporting Platform
Capital One: Using Cassandra In Building A Reporting PlatformCapital One: Using Cassandra In Building A Reporting Platform
Capital One: Using Cassandra In Building A Reporting PlatformDataStax Academy
 
Openstack Summit Vancouver 2015 - Maintaining and Operating Swift at Public C...
Openstack Summit Vancouver 2015 - Maintaining and Operating Swift at Public C...Openstack Summit Vancouver 2015 - Maintaining and Operating Swift at Public C...
Openstack Summit Vancouver 2015 - Maintaining and Operating Swift at Public C...donaghmccabe
 
Oracle Cloud : Big Data Use Cases and Architecture
Oracle Cloud : Big Data Use Cases and ArchitectureOracle Cloud : Big Data Use Cases and Architecture
Oracle Cloud : Big Data Use Cases and ArchitectureRiccardo Romani
 
End to End Streaming Architectures
End to End Streaming ArchitecturesEnd to End Streaming Architectures
End to End Streaming ArchitecturesCloudera, Inc.
 
Customer Applications Of Hadoop On Red Hat Storage Server
Customer Applications Of Hadoop On Red Hat Storage ServerCustomer Applications Of Hadoop On Red Hat Storage Server
Customer Applications Of Hadoop On Red Hat Storage ServerRed_Hat_Storage
 
HadoopCon- Trend Micro SPN Hadoop Overview
HadoopCon- Trend Micro SPN Hadoop OverviewHadoopCon- Trend Micro SPN Hadoop Overview
HadoopCon- Trend Micro SPN Hadoop OverviewYafang Chang
 
Big Data LDN 2016: When Big Data Meets Fast Data
Big Data LDN 2016: When Big Data Meets Fast DataBig Data LDN 2016: When Big Data Meets Fast Data
Big Data LDN 2016: When Big Data Meets Fast DataMatt Stubbs
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform WebinarCloudera, Inc.
 
Architecting Applications with Hadoop
Architecting Applications with HadoopArchitecting Applications with Hadoop
Architecting Applications with Hadoopmarkgrover
 
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...Cloudera, Inc.
 
Intel and Cloudera: Accelerating Enterprise Big Data Success
Intel and Cloudera: Accelerating Enterprise Big Data SuccessIntel and Cloudera: Accelerating Enterprise Big Data Success
Intel and Cloudera: Accelerating Enterprise Big Data SuccessCloudera, Inc.
 
TDC2016SP - Trilha NoSQL
TDC2016SP - Trilha NoSQLTDC2016SP - Trilha NoSQL
TDC2016SP - Trilha NoSQLtdc-globalcode
 

Similar to From Zero to Data Flow in Hours with Apache NiFi (20)

Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18
 
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineSpark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
 
Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Ret...
Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Ret...Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Ret...
Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Ret...
 
Architecting application with Hadoop - using clickstream analytics as an example
Architecting application with Hadoop - using clickstream analytics as an exampleArchitecting application with Hadoop - using clickstream analytics as an example
Architecting application with Hadoop - using clickstream analytics as an example
 
Application Architectures with Hadoop
Application Architectures with HadoopApplication Architectures with Hadoop
Application Architectures with Hadoop
 
Application Architectures with Hadoop | Data Day Texas 2015
Application Architectures with Hadoop | Data Day Texas 2015Application Architectures with Hadoop | Data Day Texas 2015
Application Architectures with Hadoop | Data Day Texas 2015
 
Gab Genai Cloudera - Going Beyond Traditional Analytic
Gab Genai Cloudera - Going Beyond Traditional Analytic Gab Genai Cloudera - Going Beyond Traditional Analytic
Gab Genai Cloudera - Going Beyond Traditional Analytic
 
Application Architectures with Hadoop
Application Architectures with HadoopApplication Architectures with Hadoop
Application Architectures with Hadoop
 
Capital One: Using Cassandra In Building A Reporting Platform
Capital One: Using Cassandra In Building A Reporting PlatformCapital One: Using Cassandra In Building A Reporting Platform
Capital One: Using Cassandra In Building A Reporting Platform
 
Openstack Summit Vancouver 2015 - Maintaining and Operating Swift at Public C...
Openstack Summit Vancouver 2015 - Maintaining and Operating Swift at Public C...Openstack Summit Vancouver 2015 - Maintaining and Operating Swift at Public C...
Openstack Summit Vancouver 2015 - Maintaining and Operating Swift at Public C...
 
Oracle Cloud : Big Data Use Cases and Architecture
Oracle Cloud : Big Data Use Cases and ArchitectureOracle Cloud : Big Data Use Cases and Architecture
Oracle Cloud : Big Data Use Cases and Architecture
 
End to End Streaming Architectures
End to End Streaming ArchitecturesEnd to End Streaming Architectures
End to End Streaming Architectures
 
Customer Applications Of Hadoop On Red Hat Storage Server
Customer Applications Of Hadoop On Red Hat Storage ServerCustomer Applications Of Hadoop On Red Hat Storage Server
Customer Applications Of Hadoop On Red Hat Storage Server
 
HadoopCon- Trend Micro SPN Hadoop Overview
HadoopCon- Trend Micro SPN Hadoop OverviewHadoopCon- Trend Micro SPN Hadoop Overview
HadoopCon- Trend Micro SPN Hadoop Overview
 
Big Data LDN 2016: When Big Data Meets Fast Data
Big Data LDN 2016: When Big Data Meets Fast DataBig Data LDN 2016: When Big Data Meets Fast Data
Big Data LDN 2016: When Big Data Meets Fast Data
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform Webinar
 
Architecting Applications with Hadoop
Architecting Applications with HadoopArchitecting Applications with Hadoop
Architecting Applications with Hadoop
 
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...
 
Intel and Cloudera: Accelerating Enterprise Big Data Success
Intel and Cloudera: Accelerating Enterprise Big Data SuccessIntel and Cloudera: Accelerating Enterprise Big Data Success
Intel and Cloudera: Accelerating Enterprise Big Data Success
 
TDC2016SP - Trilha NoSQL
TDC2016SP - Trilha NoSQLTDC2016SP - Trilha NoSQL
TDC2016SP - Trilha NoSQL
 

More from DataWorks Summit/Hadoop Summit

Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerDataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformDataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLDataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...DataWorks Summit/Hadoop Summit
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesDataWorks Summit/Hadoop Summit
 

More from DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
 



From Zero to Data Flow in Hours with Apache NiFi

  • 1. Copyright © 2016, Schlumberger, All rights reserved. From Zero to Data Flow in Hours with Apache NiFi. Hadoop Summit – San Jose 2016. Chris Herrera, Schlumberger
  • 2. Agenda • Why is composable data flow important to the drilling industry? • Current state of the system • The breaking point to the new system • An unexpected workflow in testing • How are we using it today? • What’s next
  • 3. Legal Notices. This presentation is for informational purposes only. STATEMENTS AND OPINIONS EXPRESSED IN THIS PRESENTATION ARE THOSE OF THE PRESENTER AND DO NOT REFLECT THE OPINIONS OF SCHLUMBERGER. SCHLUMBERGER AND THE PRESENTER HEREBY DISCLAIM ANY REPRESENTATIONS AND/OR WARRANTIES EXPRESS OR IMPLIED. SCHLUMBERGER AND THE PRESENTER HEREBY DISCLAIM ANY RESPONSIBILITY FOR THE CONTENT, ACCURACY, AND/OR COMPLETENESS OF THE INFORMATION IN this presentation. This presentation, and any recordings or reproductions in various media formats, including, without limitation, print, audio, and video, is the copyrighted work of Schlumberger, and Schlumberger hereby retains all intellectual property and/or proprietary rights related thereto. Schlumberger and the Schlumberger logo are trademarks of Schlumberger in the U.S. and/or other countries. Other names and brands referenced in this presentation are the trademarks of their respective owners, and any references thereto are not endorsements or approvals.
  • 4. Introduction: Chris Herrera, Schlumberger • 2 years managing product development and innovation teams working on real-time data ingestion and delivery • 5 years of experience in the Hadoop ecosystem • 11 years of experience with various aspects of the oilfield (operational and technical)
  • 5. The Major Components of a Drilling Project: Wireline; Measurement / Logging While Drilling; Mud logging; Fluids; Completions; Cementing; Rig • Several contractors are brought in to develop and complete the well • Can comprise one company or, most of the time, many • Each brings its own system, often without a central repository of data • Can be within decent cell connectivity, or deep in the middle of a jungle with only 128k of high-latency bandwidth
  • 6. Where Does This Data Need to Go? RT Server; Operational Support; Client Monitoring; Processing and Print Centers
  • 7. Workflow of Data During and Post Operations (diagram labels): Processing Center Acquisition Data Server Classification & Labelling Quality Control Classification Quality Control Hosting QC & Labelling Conversion Data Delivery KPI & Reporting Processing Acq Sales and Job Planning Data Processor Customer Manager Client Data Delivery Sales Field Engineer
  • 8. What Does This Mean in a Data Sense? Input: DLIS; LAS 1.2, 2.0, 3.0; WITS Level 0, Level 1, Level 2; CSV; Profibus; Modbus. Output: CSV; PDS; LAS 1.2, 2.0, 3.0; DLIS. RT Server
  • 9. What Does This Mean in a Volume Sense? • ~9000 users / month • ~10 files / minute • ~480 data queries / sec • ~3050 wells / month
  • 10. A Quick(ish) Note on the Importance of Data Provenance • Need to retain the fidelity throughout the flow. (Chart labels: Context; Fidelity; Time; Acquisition - Field; Interpretation - Office)
  • 11. Typical Data Problems / Concerns • What is the time zone of the data we are receiving – one day UTC... • “Ahh, I see you did not implement that part of the standard...” • Wait, why are you sending data at 5 times the sampling rate of the sensor... • I did not get the memo that you were changing your data model today... • Governmental / client data residency concerns
  • 12. Current Solution… • 100+ man-years of effort over 14 years • ~2,000,000+ lines of code • Extreme barrier to entry for workflow changes • Very little understanding of what happened to the data. Input: DLIS; LAS 1.2, 2.0, 3.0; WITS Level 0, Level 1, Level 2; CSV; Profibus; Modbus. Output: CSV; PDS; LAS 1.2, 2.0, 3.0; DLIS. RT Server
  • 13. We Needed A Simpler – Maintainable Solution…
  • 14. The Original Plan… (diagram components: RabbitMQ; DLIS Parser; ETP Endpoint; LAS Parser; Data Writer; {} DB; Event Publisher; Node.js) What about: • Data cleansing • Routing • The ability to debug what has gone wrong • TIME (estimated 6 man-months)
  • 15. How does NiFi fit into the equation? • Knowing where data came from is crucial to real-time decision making (and often missing) • The ability to visualize the data flow at a granular level aids in troubleshooting and operational understanding • With several processors already available, there is a low barrier to entry when it comes to data flow creation
  • 16. Enter NiFi… Processor Creation Data Flow Creation Creation Play… 10 Man Hours ETP WITSML 1.3.1.1 / 1.4.1.1 LAS 1.2 / 2.0 1 Man Day
  • 17. Prototype Setup (diagram labels): Data Source Processor Input Data Cleansing Data Enrichment { } Repo Data Storage Put Data 2 Man Days • Append Well Name • Append Client Name • Append Run Name • Append Pass Name Process Group: Get Update Process Group: Fix Time Zone Remove Absent Indexes Data Cleansing Routing
  • 18. What About Testing!
  • 19. Testing Landscape Today. 2.2 TB test data: • 22 applications • 14 different formats of data • Data of questionable quality • Stored on a file share. Effort: • 0.5 man effort / sprint on maintenance • 2 weeks to perform a full test
  • 20. Step 1: Data Set Curation – Creating the Set of Reference. LAS 1.2, 2.0, 3.0; WITS Level 0, Level 1, Level 2; CSV. Clean Test Data Set. 2.2 TB Test Data. 6 Hours
  • 21. Docker Step 2: Immediate Test Harness. Clean Test Data Set • Step 1: Need data • Step 2: docker pull xxx.xxx.xxx.xxx:xxxx/flowTest • Step 3: Add put processor • Step 4: Start dataflow. From: 2 weeks to set up a test to:
  • 22. Docker Step 3: Immediate Live Data Testing. Production RT System; Processor Input; Testing Processor Group; Anonymize Data • Significantly cuts down the time to test an application against real data • Especially in brownfield applications • Brings a level of confidence to the project that otherwise would be missing.
  • 23. Next Steps
  • 24. Use Cases to be Explored for MiNiFi – Rig Data Ingestion with Provenance. RT Server • Understanding the chain of custody from sensor to user • Tracking the provenance of the data as it traverses the system
  • 25. Thank You! Questions?
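
The cleansing and enrichment steps named on slide 17 (fix time zone, remove absent indexes, append well, client, run, and pass names) can be sketched outside NiFi as plain functions. This is only an illustration under stated assumptions, not the presenter's processors: the record layout, the field names `time` and `depth`, the function names, and the absent-value marker -999.25 are all hypothetical and do not come from the slides.

```python
from datetime import datetime, timedelta, timezone

# Assumed marker for an absent index value (hypothetical; not from the slides).
ABSENT = -999.25


def fix_time_zone(record, utc_offset_hours):
    """Return a copy of the record with its timestamp normalized to UTC.

    Naive timestamps are assumed to be in the given local offset.
    """
    stamp = datetime.fromisoformat(record["time"])
    if stamp.tzinfo is None:
        stamp = stamp.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    fixed = dict(record)
    fixed["time"] = stamp.astimezone(timezone.utc).isoformat()
    return fixed


def has_index(record):
    """True when the record carries a usable index (here: depth)."""
    depth = record.get("depth")
    return depth is not None and depth != ABSENT


def enrich(record, context):
    """Append well, client, run, and pass names to the record."""
    return {**record, **context}


def run_flow(records, utc_offset_hours, context):
    """Cleanse (drop absent indexes, fix time zone), then enrich."""
    for record in records:
        if not has_index(record):
            continue  # remove records whose index is absent
        yield enrich(fix_time_zone(record, utc_offset_hours), context)


if __name__ == "__main__":
    sample = [
        {"time": "2016-06-28T07:00:00", "depth": 1000.0, "value": 1.5},
        {"time": "2016-06-28T07:00:01", "depth": ABSENT, "value": 1.6},
    ]
    context = {"well": "W-1", "client": "C-1", "run": "R-1", "pass": "P-1"}
    for row in run_flow(sample, utc_offset_hours=-5, context=context):
        print(row)
```

Keeping each step a separate function mirrors the idea of small, composable processors: a step can be reordered or replaced without touching the others.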

Editor's Notes

  1. Different arrival times; different data streams; exchanging data amongst themselves; unknown quality