SlideShare a Scribd company logo
1 of 34
Download to read offline
© OPITZ CONSULTING 2020
Informationsklassifikation:
Öffentlich
 Überraschend mehr Möglichkeiten
© OPITZ CONSULTING 2020
Automatisierung mit Hilfe von
Infrastructure as Code
Fabian Hardt,
Senior Consultant – Big Data / Analytics
Data Lab Umgebung für Data
Scientists
© OPITZ CONSULTING 2020
Informationsklassifikation:
Öffentlich© OPITZ CONSULTING 2020
#Figures€49,0
55,5 56,0
2017 2018 2019
Turnover in million euros Ø Sector distribution
Retail/
Logistics/
Services
Chemical
&
pharma
Other
Public
IndustryFinance
455 482
555
2017 2018 2019
Employees
91% customer satisfaction
Recommendation: NPS 40.26
> 130
actively managed
service customers
> 600
active
customers
> 5000
databases
in support
> 4000
systems
in support
> 99.5%
SLA compliance
§
> 90% of our
customers’ projects
use agile methods
“Sound management in conjunction with a
long-term customer-oriented strategy
guarantees financial success.”
– Torsten Schlautmann
© OPITZ CONSULTING 2020
Informationsklassifikation:
Öffentlich Seite 3
Agenda
1
2
3
4
5
Data Lake
Data Lab vs. Data Factory
Infrastructure as Code
Demo
Summary
© OPITZ CONSULTING 2020
Informationsklassifikation:
Öffentlich Seite 4
Data Lake
 Purpose
 Requirements
 Architecture
1
© OPITZ CONSULTING 2020
Informationsklassifikation:
Öffentlich
Data Lake - Purpose
 Primary task: Centralized data provisioning
 All data of a company is collected in one centralized data platform
 Also external data is collected (social-media, weather, stock market prices, etc.)
 Similar to DWH Systems, integration of different data sources
 Store Raw data without knowing the intended purpose
 Structured data
 Unstructured data
 Suitable for very large amounts of data
 Mostly implemented with big data technologies
Data Lake
© OPITZ CONSULTING 2020
Informationsklassifikation:
Öffentlich
Data Lake – Enterprise Requirements (Data Factory)
 Usability of stored Data
 Searchability of data
 Evaluability of data
 Availability of data
 Data Security / Data protection regulations
 Authentication of users
 Authorization of users
 Data Governance
 Quality Assurance of data
 Governs the compliance requirements
} Users should only see the data that they’re allowed to see
© OPITZ CONSULTING 2020
Informationsklassifikation:
Öffentlich
Core components of a Data Lake
Refined Data
- Quality assured data,
- typically data that could
transition into a classical DWH
Data Refinery
- Preprocessing Area
Raw Data
- sensor data,
- streaming,
- social media,
- documents,
- Images
- …
Metadata
© OPITZ CONSULTING 2020
Informationsklassifikation:
Öffentlich Seite 9
Data Lab vs. Data Factory
 Purpose
 Requirements
 Architecture
2
© OPITZ CONSULTING 2020
Informationsklassifikation:
Öffentlich
Data Lab - purpose
 Develop processes and algorithms to gain data insights
 Development Area for Data Scientists
 Allows Data Scientist to install and use any software of their choice
 Allows Data Scientist experimental processes with external data
 Allows Data Scientist to manage all data and processes
© OPITZ CONSULTING 2020
Informationsklassifikation:
Öffentlich
Data Lab - Requirements
 Should give all freedoms to Data Scientist
 Use all kinds of technologies and software (in container)
 Use data as close as possible to real enterprise data
 Use data from external sources
 Should match all Requirements from a Software Development
Environment
 Software artifact should be runnable in Data Factory
 Should prevent Data Scientist from seeing data he should not see
 Should ensure data lake does not get messed up
 Should ensure even resource hungry processes do not influence the data
lake
© OPITZ CONSULTING 2020
Informationsklassifikation:
Öffentlich
Data Lab – Requirements implementation
 Data Lab as Sandbox dedicated to Data Scientist
 No Authorization needed
 Gives all freedoms to Data Scientist
 Access to Data Lake to use Data from there
 Only read access at most
 Sometimes only working on copies of data from Data Lake
 Sandbox Architecture prevents Data Lab from influencing Data Lake
processes
© OPITZ CONSULTING 2020
Informationsklassifikation:
Öffentlich
Data Lab - Architecture
Refined Data
- Quality assured data,
- typically data that could
transition into a classical
DWH
Data Refinery
- Preprocessing Area
Raw Data
- sensor data,
- streaming,
- social media,
- documents,
- Images
- …
Metadata
Analytics Sandbox
External Data
Data
from
Lake
© OPITZ CONSULTING 2020
Informationsklassifikation:
Öffentlich
Data Factory - Requirements
 New models and algorithms should be integrated easily
 Established algorithms should be updated easily
 Found insights should be integrated in Data Lake to be evalueable
 Challenge to design extensible data model
 Single Data Factory components should not influence ohter Data Lake
processes
 Must be complient conform
© OPITZ CONSULTING 2020
Informationsklassifikation:
Öffentlich
Data Science vs. Enterprise
 Looking for business insights in a
huge amount of data
 Mainly used to train algorithms
 Not a production system
 no operating team necessary
 Usually no other target groups
 Often no further authorization is needed
 Usually only authentication is enabled
Data Lab Enterprise Data Lake
 Is very integrated into the daily
business
 Production system
 Typically many target applications are
based on this collection of data
 High availability
 Backup & Recovery
© OPITZ CONSULTING 2020
Informationsklassifikation:
Öffentlich
Data Science vs. Enterprise
Data Lab
 Explorative approach
 Insights generating
 Work with production near data
 Use Sandboxes
 Data Scientists
 Experts from the specialist department
 Goal: Train models and algorithms
 Generating value from trained
models and algorithms
 Automatic processing of data from
Data Lake
 Further development like in other
software development products
 Makes use of trained models and
algorithms
Data Factory
© OPITZ CONSULTING 2020
Informationsklassifikation:
Öffentlich
Deployment from Data Lab to Factory
Data FactoryData Lab Trained Algorithms
Generated Insights from Analysis
Refined Data
Data Refinery
Raw Data
DWH
Weitere Datenquellen
No Data Transfer from Lab to
Factory
© OPITZ CONSULTING 2020
Informationsklassifikation:
Öffentlich
Conclusion
Why do we need a Data Lab?
 Where is the problem? Why should we do this?
 „Playground“ / sandbox for Data Scientists
 No risk to break something in EDL / Data Factory
 No dependency on release cycles, etc.
 "Free" choice of tools
 Made possible by Infrastructure as Code
 Setting up entire environments in minutes
 Very similar or identical setup to EDL / Data Factory
© OPITZ CONSULTING 2020
Informationsklassifikation:
Öffentlich Seite 19
Infrastructure as Code (IaC)
 What is IaC
 Advantages
3
© OPITZ CONSULTING 2020
Informationsklassifikation:
Öffentlich
Keyword – Infrastructure as Code
 Called IaC
 Process of managing and provision computers, vm‘s, networks
 Version controlled infrastructure definitions
 Repeatable and reliable
 Frameworks
 Hashicorp Terraform
 Puppet
 SaltStack
 RedHat Ansible
 …
© OPITZ CONSULTING 2020
Informationsklassifikation:
Öffentlich
Advantages of IaC
 Costs
 Reduce costs by
 Avoiding repetitive tasks
 Avoiding mistakes
 Reuse code / infrastructure
 Speed
 Provisioning on multiple machines in parallel
 Calculations / regex in code
 Risk
 Reduces the risk through
 Better preparation
 Idempotent approach
 Prepared updates
© OPITZ CONSULTING 2020
Informationsklassifikation:
Öffentlich
Challenges for IaC
 A few new skills needed
 Typical development workflows (e.g. GitFlow)
 Some basic coding paradigms
 JSON
 Ruby
 YAML
 HCL
 …
 Monitoring
 Who is provisioning – what?, where?, how?
 Legacy tracking via worksheet not possible
 → Solution: e.g. Ansible Tower / AWX
© OPITZ CONSULTING 2020
Informationsklassifikation:
Öffentlich
Reuse / reproducible result
Use Playbooks / scripts / templates for all stages
DEV TEST PROD
Playbook
Script
Reliable and easier
deployment in next
stages
Development of
scripts / playbooks
in DEV environment
Inventory Inventory Inventory
© OPITZ CONSULTING 2020
Informationsklassifikation:
Öffentlich
Often seen like this
TEST
Oh, this is our Data
Science environment!
Yes, I trained this
modelon TEST
environment first.
No, we can‘t use TEST
for new Release now,
Data Scientist using it!
DEV Team
Data Science Team
Product-
management
© OPITZ CONSULTING 2020
Informationsklassifikation:
Öffentlich
Better like this…
PROD
TEST
Data Lab /
SandboxFork
Trained models
New Reports
© OPITZ CONSULTING 2020
Informationsklassifikation:
Öffentlich
Terraform (Hashicorp)
 Open Source software tool
 Released 2014
 Code in HCL (Hashicorp Configuration Language) or JSON
 Terraform CLI to trigger Terraform actions
 Manages public / private cloud infrastructure
 Extendable with providers – AWS, Azure, OCI, vSphere
 Performs CRUD operations to accomplish the target state
© OPITZ CONSULTING 2020
Informationsklassifikation:
Öffentlich
Cloud-init
 Customizes OS directly after cloning from template
 Many public- / private clouds and OS Types are supported
 Perfectly for „low-level stuff“
 Set default locale
 Set hostnames
 Set up SSH keys
 Mount some shared folders
 It can also install some software packages
 Transfer point to e.g. Ansible
© OPITZ CONSULTING 2020
Informationsklassifikation:
Öffentlich
Ansible (RedHat)
 Open Source software, founded by AnsibleWorks in 2012
 Very good supported on many Unix-like systems – but also on Windows
 Strengths: Provisioning, configuration management, some kind of
application development
 Agentless – working with temporary SSH access
 Features
 Inventory based configuration of different stages
 Ansible Vault – Store sensitive data
 A playbook can be written idempotent
 Ansible Tower / Upstream project AWX – REST API, web bases console for Ansible
© OPITZ CONSULTING 2020
Informationsklassifikation:
Öffentlich Seite 29
Demo
 Here comes a small sample…
4
© OPITZ CONSULTING 2020
Informationsklassifikation:
Öffentlich
© OPITZ CONSULTING 2020
Informationsklassifikation:
Öffentlich
Data Lab creation workflow
Admin
Chooses template
according to requirements
Environment is
getting provisioned
Data Lab –
ready to use
Pull data through APIs
Push data into Data Lab
Analytics Sandbox
© OPITZ CONSULTING 2020
Informationsklassifikation:
Öffentlich
Sample workflow (on premise)
Admin
triggers runs a playbook
vm creation
add dns entries /
AD or LDAP users
do some configuration
copy data into Data Lab
notify Data Scientists
© OPITZ CONSULTING 2020
Informationsklassifikation:
Öffentlich
Sample workflow (Oracle Cloud)
Admin
triggers use predefined stack
do some configuration
copy data into Data Lab
notify Data Scientists
create vm instances and
networks
Optional: add
some firewall
rules
© OPITZ CONSULTING 2020
Informationsklassifikation:
Öffentlich
Summary
 First of all → Don‘t use TEST as Data Science ENV
 Use a dedicated Data Lab environment to innovate
 IaC is the way to build and destroy a Data Lab environment
 Scalable and repeatable
 Mix multiple clouds or on-premise environments
 Use the mix of roles to succeed – admins, developers, data scientists
 Increase satisfaction (company-wide)
 More flexibility for Data Scientists
 Matching processes for administration and development in Data Factory
 Speed up innovation lifecycle
© OPITZ CONSULTING 2020
Informationsklassifikation:
Öffentlich
 Überraschend mehr Möglichkeiten
@OC_WIRE
OPITZCONSULTING
opitzconsulting
opitz-consulting-bcb8-1009116
WWW.OPITZ-CONSULTING.COM
Vielen Dank für Ihr Interesse, wir freuen uns auf
den gemeinsamen Austausch mit Ihnen!
Fabian Hardt
Senior Consultant,
Business Intelligence & Analytics
Kirchstraße 6
51647 Gummersbach
Fabian.Hardt@opitz-consulting.com
+49 (0) 2261 6001-1045

More Related Content

What's hot

Ten Pillars of World Class Data Virtualization
Ten Pillars of World Class Data VirtualizationTen Pillars of World Class Data Virtualization
Ten Pillars of World Class Data VirtualizationDenodo
 
Solving the Really Big Tech Problems with IoT
 Solving the Really Big Tech Problems with IoT Solving the Really Big Tech Problems with IoT
Solving the Really Big Tech Problems with IoTEric Kavanagh
 
Hortonworks & IBM solutions
Hortonworks & IBM solutionsHortonworks & IBM solutions
Hortonworks & IBM solutionsThiago Santiago
 
Oracle IoT Cloud Service - First practical experience
Oracle IoT Cloud Service - First practical experience Oracle IoT Cloud Service - First practical experience
Oracle IoT Cloud Service - First practical experience OPITZ CONSULTING Deutschland
 
Teradata Listener™: Radically Simplify Big Data Streaming
Teradata Listener™: Radically Simplify Big Data StreamingTeradata Listener™: Radically Simplify Big Data Streaming
Teradata Listener™: Radically Simplify Big Data StreamingTeradata
 
A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...
A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...
A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...DataWorks Summit/Hadoop Summit
 
Unit 1-android-and-its-tools-ass
Unit 1-android-and-its-tools-assUnit 1-android-and-its-tools-ass
Unit 1-android-and-its-tools-assARVIND SARDAR
 
Teradata Aster: Big Data Discovery Made Easy
Teradata Aster: Big Data Discovery Made EasyTeradata Aster: Big Data Discovery Made Easy
Teradata Aster: Big Data Discovery Made EasyTIBCO Spotfire
 
The New Database Frontier: Harnessing the Cloud
The New Database Frontier: Harnessing the CloudThe New Database Frontier: Harnessing the Cloud
The New Database Frontier: Harnessing the CloudInside Analysis
 
SAP Analytics Cloud: Haben Sie schon alle Datenquellen im Live-Zugriff?
SAP Analytics Cloud: Haben Sie schon alle Datenquellen im Live-Zugriff?SAP Analytics Cloud: Haben Sie schon alle Datenquellen im Live-Zugriff?
SAP Analytics Cloud: Haben Sie schon alle Datenquellen im Live-Zugriff?Denodo
 
Pentaho Data Integration Introduction
Pentaho Data Integration IntroductionPentaho Data Integration Introduction
Pentaho Data Integration Introductionmattcasters
 
30 for 30: Quick Start Your Pentaho Evaluation
30 for 30: Quick Start Your Pentaho Evaluation30 for 30: Quick Start Your Pentaho Evaluation
30 for 30: Quick Start Your Pentaho EvaluationPentaho
 
Big Data at Oracle - Strata 2015 San Jose
Big Data at Oracle - Strata 2015 San JoseBig Data at Oracle - Strata 2015 San Jose
Big Data at Oracle - Strata 2015 San JoseJeffrey T. Pollock
 
"From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",...
"From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",..."From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",...
"From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",...Dataconomy Media
 
D365 Finance & Operations - Data & Analytics (see newer release of this docum...
D365 Finance & Operations - Data & Analytics (see newer release of this docum...D365 Finance & Operations - Data & Analytics (see newer release of this docum...
D365 Finance & Operations - Data & Analytics (see newer release of this docum...Gina Pabalan
 
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...Cloudera, Inc.
 
Oil and gas big data edition
Oil and gas  big data editionOil and gas  big data edition
Oil and gas big data editionMark Kerzner
 
Secure Your Data with Virtual Data Fabric (ASEAN)
Secure Your Data with Virtual Data Fabric (ASEAN)Secure Your Data with Virtual Data Fabric (ASEAN)
Secure Your Data with Virtual Data Fabric (ASEAN)Denodo
 
Pentaho Enterprise vs. Pentaho Community
Pentaho Enterprise vs. Pentaho CommunityPentaho Enterprise vs. Pentaho Community
Pentaho Enterprise vs. Pentaho CommunityMauricio Murillo
 

What's hot (20)

Ten Pillars of World Class Data Virtualization
Ten Pillars of World Class Data VirtualizationTen Pillars of World Class Data Virtualization
Ten Pillars of World Class Data Virtualization
 
Solving the Really Big Tech Problems with IoT
 Solving the Really Big Tech Problems with IoT Solving the Really Big Tech Problems with IoT
Solving the Really Big Tech Problems with IoT
 
Hortonworks & IBM solutions
Hortonworks & IBM solutionsHortonworks & IBM solutions
Hortonworks & IBM solutions
 
Oracle IoT Cloud Service - First practical experience
Oracle IoT Cloud Service - First practical experience Oracle IoT Cloud Service - First practical experience
Oracle IoT Cloud Service - First practical experience
 
Teradata Listener™: Radically Simplify Big Data Streaming
Teradata Listener™: Radically Simplify Big Data StreamingTeradata Listener™: Radically Simplify Big Data Streaming
Teradata Listener™: Radically Simplify Big Data Streaming
 
A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...
A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...
A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...
 
Unit 1-android-and-its-tools-ass
Unit 1-android-and-its-tools-assUnit 1-android-and-its-tools-ass
Unit 1-android-and-its-tools-ass
 
Teradata Aster: Big Data Discovery Made Easy
Teradata Aster: Big Data Discovery Made EasyTeradata Aster: Big Data Discovery Made Easy
Teradata Aster: Big Data Discovery Made Easy
 
The New Database Frontier: Harnessing the Cloud
The New Database Frontier: Harnessing the CloudThe New Database Frontier: Harnessing the Cloud
The New Database Frontier: Harnessing the Cloud
 
SAP Analytics Cloud: Haben Sie schon alle Datenquellen im Live-Zugriff?
SAP Analytics Cloud: Haben Sie schon alle Datenquellen im Live-Zugriff?SAP Analytics Cloud: Haben Sie schon alle Datenquellen im Live-Zugriff?
SAP Analytics Cloud: Haben Sie schon alle Datenquellen im Live-Zugriff?
 
Pentaho Data Integration Introduction
Pentaho Data Integration IntroductionPentaho Data Integration Introduction
Pentaho Data Integration Introduction
 
30 for 30: Quick Start Your Pentaho Evaluation
30 for 30: Quick Start Your Pentaho Evaluation30 for 30: Quick Start Your Pentaho Evaluation
30 for 30: Quick Start Your Pentaho Evaluation
 
Big Data at Oracle - Strata 2015 San Jose
Big Data at Oracle - Strata 2015 San JoseBig Data at Oracle - Strata 2015 San Jose
Big Data at Oracle - Strata 2015 San Jose
 
"From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",...
"From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",..."From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",...
"From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",...
 
Hybrid Cloud Strategy for Big Data and Analytics
Hybrid Cloud Strategy for Big Data and Analytics Hybrid Cloud Strategy for Big Data and Analytics
Hybrid Cloud Strategy for Big Data and Analytics
 
D365 Finance & Operations - Data & Analytics (see newer release of this docum...
D365 Finance & Operations - Data & Analytics (see newer release of this docum...D365 Finance & Operations - Data & Analytics (see newer release of this docum...
D365 Finance & Operations - Data & Analytics (see newer release of this docum...
 
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...
 
Oil and gas big data edition
Oil and gas  big data editionOil and gas  big data edition
Oil and gas big data edition
 
Secure Your Data with Virtual Data Fabric (ASEAN)
Secure Your Data with Virtual Data Fabric (ASEAN)Secure Your Data with Virtual Data Fabric (ASEAN)
Secure Your Data with Virtual Data Fabric (ASEAN)
 
Pentaho Enterprise vs. Pentaho Community
Pentaho Enterprise vs. Pentaho CommunityPentaho Enterprise vs. Pentaho Community
Pentaho Enterprise vs. Pentaho Community
 

Similar to Automatisierte Provisionierung einer Data Lab Umgebung für Data Scientists

Next Gen Big Data Plattform mit Hadoop, APIs und Kubernetes
Next Gen Big Data Plattform mit Hadoop, APIs und KubernetesNext Gen Big Data Plattform mit Hadoop, APIs und Kubernetes
Next Gen Big Data Plattform mit Hadoop, APIs und KubernetesSven Bernhardt
 
Extending open source and hybrid cloud to drive OT transformation - Future Oi...
Extending open source and hybrid cloud to drive OT transformation - Future Oi...Extending open source and hybrid cloud to drive OT transformation - Future Oi...
Extending open source and hybrid cloud to drive OT transformation - Future Oi...John Archer
 
Analyst Keynote: Forrester: Data Fabric Strategy is Vital for Business Innova...
Analyst Keynote: Forrester: Data Fabric Strategy is Vital for Business Innova...Analyst Keynote: Forrester: Data Fabric Strategy is Vital for Business Innova...
Analyst Keynote: Forrester: Data Fabric Strategy is Vital for Business Innova...Denodo
 
Managing the Impact of COVID-19 Using Data Virtualization
Managing the Impact of COVID-19 Using Data VirtualizationManaging the Impact of COVID-19 Using Data Virtualization
Managing the Impact of COVID-19 Using Data VirtualizationDenodo
 
Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)Denodo
 
Big Data for Product Managers
Big Data for Product ManagersBig Data for Product Managers
Big Data for Product ManagersPentaho
 
dell-emc-powerscale-for-ngs.pptx
dell-emc-powerscale-for-ngs.pptxdell-emc-powerscale-for-ngs.pptx
dell-emc-powerscale-for-ngs.pptxSriramFreelance
 
Big data presentation, explanations and use cases in industrial sector
Big data presentation, explanations and use cases in industrial sectorBig data presentation, explanations and use cases in industrial sector
Big data presentation, explanations and use cases in industrial sectorNicolas Sarramagna
 
Is it sensible to use Data Vault at all? Conclusions from a project.
Is it sensible to use Data Vault at all? Conclusions from a project.Is it sensible to use Data Vault at all? Conclusions from a project.
Is it sensible to use Data Vault at all? Conclusions from a project.Capgemini
 
zData BI & Advanced Analytics Platform + 8 Week Pilot Programs
zData BI & Advanced Analytics Platform + 8 Week Pilot ProgramszData BI & Advanced Analytics Platform + 8 Week Pilot Programs
zData BI & Advanced Analytics Platform + 8 Week Pilot ProgramszData Inc.
 
Data Virtualization: From Zero to Hero
Data Virtualization: From Zero to HeroData Virtualization: From Zero to Hero
Data Virtualization: From Zero to HeroDenodo
 
Bridging the Last Mile: Getting Data to the People Who Need It
Bridging the Last Mile: Getting Data to the People Who Need ItBridging the Last Mile: Getting Data to the People Who Need It
Bridging the Last Mile: Getting Data to the People Who Need ItDenodo
 
Product Keynote: Denodo 8.0 - A Logical Data Fabric for the Intelligent Enter...
Product Keynote: Denodo 8.0 - A Logical Data Fabric for the Intelligent Enter...Product Keynote: Denodo 8.0 - A Logical Data Fabric for the Intelligent Enter...
Product Keynote: Denodo 8.0 - A Logical Data Fabric for the Intelligent Enter...Denodo
 
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...Tomasz Bednarz
 
Webinar: Improving Time to Value for Enterprise Big Data Analytics
Webinar: Improving Time to Value for Enterprise Big Data AnalyticsWebinar: Improving Time to Value for Enterprise Big Data Analytics
Webinar: Improving Time to Value for Enterprise Big Data AnalyticsStorage Switzerland
 
Paris FOD Meetup #5 Cognizant Presentation
Paris FOD Meetup #5 Cognizant PresentationParis FOD Meetup #5 Cognizant Presentation
Paris FOD Meetup #5 Cognizant PresentationAbdelkrim Hadjidj
 
Accelerate and Scale Big Data Analytics and Machine Learning Pipelines with D...
Accelerate and Scale Big Data Analytics and Machine Learning Pipelines with D...Accelerate and Scale Big Data Analytics and Machine Learning Pipelines with D...
Accelerate and Scale Big Data Analytics and Machine Learning Pipelines with D...Alluxio, Inc.
 
Oracle databáze - zkonsolidovat, ochránit a ještě ušetřit! (1. část)
Oracle databáze - zkonsolidovat, ochránit a ještě ušetřit! (1. část)Oracle databáze - zkonsolidovat, ochránit a ještě ušetřit! (1. část)
Oracle databáze - zkonsolidovat, ochránit a ještě ušetřit! (1. část)MarketingArrowECS_CZ
 
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoptionHortonworks
 

Similar to Automatisierte Provisionierung einer Data Lab Umgebung für Data Scientists (20)

Next Gen Big Data Plattform mit Hadoop, APIs und Kubernetes
Next Gen Big Data Plattform mit Hadoop, APIs und KubernetesNext Gen Big Data Plattform mit Hadoop, APIs und Kubernetes
Next Gen Big Data Plattform mit Hadoop, APIs und Kubernetes
 
Extending open source and hybrid cloud to drive OT transformation - Future Oi...
Extending open source and hybrid cloud to drive OT transformation - Future Oi...Extending open source and hybrid cloud to drive OT transformation - Future Oi...
Extending open source and hybrid cloud to drive OT transformation - Future Oi...
 
Analyst Keynote: Forrester: Data Fabric Strategy is Vital for Business Innova...
Analyst Keynote: Forrester: Data Fabric Strategy is Vital for Business Innova...Analyst Keynote: Forrester: Data Fabric Strategy is Vital for Business Innova...
Analyst Keynote: Forrester: Data Fabric Strategy is Vital for Business Innova...
 
Managing the Impact of COVID-19 Using Data Virtualization
Managing the Impact of COVID-19 Using Data VirtualizationManaging the Impact of COVID-19 Using Data Virtualization
Managing the Impact of COVID-19 Using Data Virtualization
 
Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)
 
Haven 2 0
Haven 2 0 Haven 2 0
Haven 2 0
 
Big Data for Product Managers
Big Data for Product ManagersBig Data for Product Managers
Big Data for Product Managers
 
dell-emc-powerscale-for-ngs.pptx
dell-emc-powerscale-for-ngs.pptxdell-emc-powerscale-for-ngs.pptx
dell-emc-powerscale-for-ngs.pptx
 
Big data presentation, explanations and use cases in industrial sector
Big data presentation, explanations and use cases in industrial sectorBig data presentation, explanations and use cases in industrial sector
Big data presentation, explanations and use cases in industrial sector
 
Is it sensible to use Data Vault at all? Conclusions from a project.
Is it sensible to use Data Vault at all? Conclusions from a project.Is it sensible to use Data Vault at all? Conclusions from a project.
Is it sensible to use Data Vault at all? Conclusions from a project.
 
zData BI & Advanced Analytics Platform + 8 Week Pilot Programs
zData BI & Advanced Analytics Platform + 8 Week Pilot ProgramszData BI & Advanced Analytics Platform + 8 Week Pilot Programs
zData BI & Advanced Analytics Platform + 8 Week Pilot Programs
 
Data Virtualization: From Zero to Hero
Data Virtualization: From Zero to HeroData Virtualization: From Zero to Hero
Data Virtualization: From Zero to Hero
 
Bridging the Last Mile: Getting Data to the People Who Need It
Bridging the Last Mile: Getting Data to the People Who Need ItBridging the Last Mile: Getting Data to the People Who Need It
Bridging the Last Mile: Getting Data to the People Who Need It
 
Product Keynote: Denodo 8.0 - A Logical Data Fabric for the Intelligent Enter...
Product Keynote: Denodo 8.0 - A Logical Data Fabric for the Intelligent Enter...Product Keynote: Denodo 8.0 - A Logical Data Fabric for the Intelligent Enter...
Product Keynote: Denodo 8.0 - A Logical Data Fabric for the Intelligent Enter...
 
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
 
Webinar: Improving Time to Value for Enterprise Big Data Analytics
Webinar: Improving Time to Value for Enterprise Big Data AnalyticsWebinar: Improving Time to Value for Enterprise Big Data Analytics
Webinar: Improving Time to Value for Enterprise Big Data Analytics
 
Paris FOD Meetup #5 Cognizant Presentation
Paris FOD Meetup #5 Cognizant PresentationParis FOD Meetup #5 Cognizant Presentation
Paris FOD Meetup #5 Cognizant Presentation
 
Accelerate and Scale Big Data Analytics and Machine Learning Pipelines with D...
Accelerate and Scale Big Data Analytics and Machine Learning Pipelines with D...Accelerate and Scale Big Data Analytics and Machine Learning Pipelines with D...
Accelerate and Scale Big Data Analytics and Machine Learning Pipelines with D...
 
Oracle databáze - zkonsolidovat, ochránit a ještě ušetřit! (1. část)
Oracle databáze - zkonsolidovat, ochránit a ještě ušetřit! (1. část)Oracle databáze - zkonsolidovat, ochránit a ještě ušetřit! (1. část)
Oracle databáze - zkonsolidovat, ochránit a ještě ušetřit! (1. část)
 
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
 

More from Fabian Hardt

Advanced Observability & Security
Advanced Observability & SecurityAdvanced Observability & Security
Advanced Observability & SecurityFabian Hardt
 
Advanced Observability & Security
Advanced Observability & SecurityAdvanced Observability & Security
Advanced Observability & SecurityFabian Hardt
 
Mit APIs auf der Überholspur zur produktorientierten Organisation
Mit APIs auf der Überholspur zur produktorientierten OrganisationMit APIs auf der Überholspur zur produktorientierten Organisation
Mit APIs auf der Überholspur zur produktorientierten OrganisationFabian Hardt
 
Data Mesh und Domain Driven Design - rücken Analytics und SD nun doch näher z...
Data Mesh und Domain Driven Design - rücken Analytics und SD nun doch näher z...Data Mesh und Domain Driven Design - rücken Analytics und SD nun doch näher z...
Data Mesh und Domain Driven Design - rücken Analytics und SD nun doch näher z...Fabian Hardt
 
Analytics meets Integration – Modern Development mit Data APIs
Analytics meets Integration – Modern Development mit Data APIsAnalytics meets Integration – Modern Development mit Data APIs
Analytics meets Integration – Modern Development mit Data APIsFabian Hardt
 
Service Mesh Advanced Use Cases
Service Mesh Advanced Use CasesService Mesh Advanced Use Cases
Service Mesh Advanced Use CasesFabian Hardt
 
How Service Mesh Fits into the Modern Data Stack
How Service Mesh Fits into the Modern Data StackHow Service Mesh Fits into the Modern Data Stack
How Service Mesh Fits into the Modern Data StackFabian Hardt
 
Modern Data Stack – Buzzword oder echter Game-Changer?
Modern Data Stack – Buzzword oder echter Game-Changer?Modern Data Stack – Buzzword oder echter Game-Changer?
Modern Data Stack – Buzzword oder echter Game-Changer?Fabian Hardt
 
Persönliche Filmtipps mittels Recommender System und Chatbot
Persönliche Filmtipps mittels Recommender System und ChatbotPersönliche Filmtipps mittels Recommender System und Chatbot
Persönliche Filmtipps mittels Recommender System und ChatbotFabian Hardt
 
Augmented Analytics mit Amazon Alexa
Augmented Analytics mit Amazon AlexaAugmented Analytics mit Amazon Alexa
Augmented Analytics mit Amazon AlexaFabian Hardt
 

More from Fabian Hardt (10)

Advanced Observability & Security
Advanced Observability & SecurityAdvanced Observability & Security
Advanced Observability & Security
 
Advanced Observability & Security
Advanced Observability & SecurityAdvanced Observability & Security
Advanced Observability & Security
 
Mit APIs auf der Überholspur zur produktorientierten Organisation
Mit APIs auf der Überholspur zur produktorientierten OrganisationMit APIs auf der Überholspur zur produktorientierten Organisation
Mit APIs auf der Überholspur zur produktorientierten Organisation
 
Data Mesh und Domain Driven Design - rücken Analytics und SD nun doch näher z...
Data Mesh und Domain Driven Design - rücken Analytics und SD nun doch näher z...Data Mesh und Domain Driven Design - rücken Analytics und SD nun doch näher z...
Data Mesh und Domain Driven Design - rücken Analytics und SD nun doch näher z...
 
Analytics meets Integration – Modern Development mit Data APIs
Analytics meets Integration – Modern Development mit Data APIsAnalytics meets Integration – Modern Development mit Data APIs
Analytics meets Integration – Modern Development mit Data APIs
 
Service Mesh Advanced Use Cases
Service Mesh Advanced Use CasesService Mesh Advanced Use Cases
Service Mesh Advanced Use Cases
 
How Service Mesh Fits into the Modern Data Stack
How Service Mesh Fits into the Modern Data StackHow Service Mesh Fits into the Modern Data Stack
How Service Mesh Fits into the Modern Data Stack
 
Modern Data Stack – Buzzword oder echter Game-Changer?
Modern Data Stack – Buzzword oder echter Game-Changer?Modern Data Stack – Buzzword oder echter Game-Changer?
Modern Data Stack – Buzzword oder echter Game-Changer?
 
Persönliche Filmtipps mittels Recommender System und Chatbot
Persönliche Filmtipps mittels Recommender System und ChatbotPersönliche Filmtipps mittels Recommender System und Chatbot
Persönliche Filmtipps mittels Recommender System und Chatbot
 
Augmented Analytics mit Amazon Alexa
Augmented Analytics mit Amazon AlexaAugmented Analytics mit Amazon Alexa
Augmented Analytics mit Amazon Alexa
 

Recently uploaded

How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 

Recently uploaded (20)

How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 

Automatisierte Provisionierung einer Data Lab Umgebung für Data Scientists

  • 1. © OPITZ CONSULTING 2020 Informationsklassifikation: Öffentlich  Überraschend mehr Möglichkeiten © OPITZ CONSULTING 2020 Automatisierung mit Hilfe von Infrastructure as Code Fabian Hardt, Senior Consultant – Big Data / Analytics Data Lab Umgebung für Data Scientists
  • 2. © OPITZ CONSULTING 2020 Informationsklassifikation: Öffentlich© OPITZ CONSULTING 2020 #Figures€49,0 55,5 56,0 2017 2018 2019 Turnover in million euros Ø Sector distribution Retail/ Logistics/ Services Chemical & pharma Other Public IndustryFinance 455 482 555 2017 2018 2019 Employees 91% customer satisfaction Recommendation: NPS 40.26 > 130 actively managed service customers > 600 active customers > 5000 databases in support > 4000 systems in support > 99.5% SLA compliance § > 90% of our customers’ projects use agile methods “Sound management in conjunction with a long-term customer-oriented strategy guarantees financial success.” – Torsten Schlautmann
  • 3. © OPITZ CONSULTING 2020 Informationsklassifikation: Öffentlich Seite 3 Agenda 1 2 3 4 5 Data Lake Data Lab vs. Data Factory Infrastructure as Code Demo Summary
  • 4. © OPITZ CONSULTING 2020 Informationsklassifikation: Öffentlich Seite 4 Data Lake  Purpose  Requirements  Architecture 1
  • 5. © OPITZ CONSULTING 2020 Informationsklassifikation: Öffentlich Data Lake - Purpose  Primary task: Centralized data provisioning  All data of a company is collected in one centralized data platform  Also external data is collected (social-media, weather, stock market prices, etc.)  Similar to DWH Systems, integration of different data sources  Store Raw data without knowing the intended purpose  Structured data  Unstructured data  Suitable for very large amounts of data  Mostly implemented with big data technologies Data Lake
  • 6. © OPITZ CONSULTING 2020 Informationsklassifikation: Öffentlich Data Lake – Enterprise Requirements (Data Factory)  Usability of stored Data  Searchability of data  Evaluability of data  Availability of data  Data Security / Data protection regulations  Authentication of users  Authorization of users  Data Governance  Quality Assurance of data  Governs the compliance requirements } Users should only see the data that they’re allowed to see
  • 7. © OPITZ CONSULTING 2020 Informationsklassifikation: Öffentlich Core components of a Data Lake Refined Data - Quality assured data, - typically data that could transition into a classical DWH Data Refinery - Preprocessing Area Raw Data - sensor data, - streaming, - social media, - documents, - Images - … Metadata
  • 8. © OPITZ CONSULTING 2020 Informationsklassifikation: Öffentlich Seite 9 Data Lab vs. Data Factory  Purpose  Requirements  Architecture 2
  • 9. © OPITZ CONSULTING 2020 Informationsklassifikation: Öffentlich Data Lab - purpose  Develop processes and algorithms to gain data insights  Development Area for Data Scientists  Allows Data Scientist to install and use any software of their choice  Allows Data Scientist experimental processes with external data  Allows Data Scientist to manage all data and processes
  • 10. © OPITZ CONSULTING 2020 Informationsklassifikation: Öffentlich Data Lab - Requirements  Should give all freedoms to Data Scientist  Use all kinds of technologies and software (in container)  Use data as close as possible to real enterprise data  Use data from external sources  Should match all Requirements from a Software Development Environment  Software artifact should be runnable in Data Factory  Should prevent Data Scientist from seeing data he should not see  Should ensure data lake does not get messed up  Should ensure even resource hungry processes do not influence the data lake
  • 11. © OPITZ CONSULTING 2020 Informationsklassifikation: Öffentlich Data Lab – Requirements implementation  Data Lab as Sandbox dedicated to Data Scientist  No Authorization needed  Gives all freedoms to Data Scientist  Access to Data Lake to use Data from there  Only read access at most  Sometimes only working on copies of data from Data Lake  Sandbox Architecture prevents Data Lab from influencing Data Lake processes
  • 12. © OPITZ CONSULTING 2020 Informationsklassifikation: Öffentlich Data Lab - Architecture Refined Data - Quality assured data, - typically data that could transition into a classical DWH Data Refinery - Preprocessing Area Raw Data - sensor data, - streaming, - social media, - documents, - Images - … Metadata Analytics Sandbox External Data Data from Lake
  • 13. © OPITZ CONSULTING 2020 Informationsklassifikation: Öffentlich Data Factory - Requirements  New models and algorithms should be integrated easily  Established algorithms should be updated easily  Found insights should be integrated in Data Lake to be evalueable  Challenge to design extensible data model  Single Data Factory components should not influence ohter Data Lake processes  Must be complient conform
  • 14. © OPITZ CONSULTING 2020 Informationsklassifikation: Öffentlich Data Science vs. Enterprise  Looking for business insights in a huge amount of data  Mainly used to train algorithms  Not a production system  no operating team necessary  Usually no other target groups  Often no further authorization is needed  Usually only authentication is enabled Data Lab Enterprise Data Lake  Is very integrated into the daily business  Production system  Typically many target applications are based on this collection of data  High availability  Backup & Recovery
  • 15. © OPITZ CONSULTING 2020 Informationsklassifikation: Öffentlich Data Science vs. Enterprise Data Lab  Explorative approach  Insights generating  Work with production near data  Use Sandboxes  Data Scientists  Experts from the specialist department  Goal: Train models and algorithms  Generating value from trained models and algorithms  Automatic processing of data from Data Lake  Further development like in other software development products  Makes use of trained models and algorithms Data Factory
  • 16. © OPITZ CONSULTING 2020 Informationsklassifikation: Öffentlich Deployment from Data Lab to Factory Data FactoryData Lab Trained Algorithms Generated Insights from Analysis Refined Data Data Refinery Raw Data DWH Weitere Datenquellen No Data Transfer from Lab to Factory
  • 17. © OPITZ CONSULTING 2020 Informationsklassifikation: Öffentlich Conclusion Why do we need a Data Lab?  Where is the problem? Why should we do this?  „Playground“ / sandbox for Data Scientists  No risk to break something in EDL / Data Factory  No dependency on release cycles, etc.  "Free" choice of tools  Made possible by Infrastructure as Code  Setting up entire environments in minutes  Very similar or identical setup to EDL / Data Factory
  • 18. © OPITZ CONSULTING 2020 Informationsklassifikation: Öffentlich Seite 19 Infrastructure as Code (IaC)  What is IaC  Advantages 3
  • 19. © OPITZ CONSULTING 2020 Informationsklassifikation: Öffentlich Keyword – Infrastructure as Code  Called IaC  Process of managing and provision computers, vm‘s, networks  Version controlled infrastructure definitions  Repeatable and reliable  Frameworks  Hashicorp Terraform  Puppet  SaltStack  RedHat Ansible  …
  • 20. © OPITZ CONSULTING 2020 Informationsklassifikation: Öffentlich Advantages of IaC  Costs  Reduce costs by  Avoiding repetitive tasks  Avoiding mistakes  Reuse code / infrastructure  Speed  Provisioning on multiple machines in parallel  Calculations / regex in code  Risk  Reduces the risk through  Better preparation  Idempotent approach  Prepared updates
  • 21. © OPITZ CONSULTING 2020 Informationsklassifikation: Öffentlich Challenges for IaC  A few new skills needed  Typical development workflows (e.g. GitFlow)  Some basic coding paradigms  JSON  Ruby  YAML  HCL  …  Monitoring  Who is provisioning – what?, where?, how?  Legacy tracking via worksheet not possible  → Solution: e.g. Ansible Tower / AWX
  • 22. © OPITZ CONSULTING 2020 Informationsklassifikation: Öffentlich Reuse / reproducible result Use Playbooks / scripts / templates for all stages DEV TEST PROD Playbook Script Reliable and easier deployment in next stages Development of scripts / playbooks in DEV environment Inventory Inventory Inventory
  • 23. © OPITZ CONSULTING 2020 Informationsklassifikation: Öffentlich Often seen like this TEST Oh, this is our Data Science environment! Yes, I trained this modelon TEST environment first. No, we can‘t use TEST for new Release now, Data Scientist using it! DEV Team Data Science Team Product- management
  • 24. © OPITZ CONSULTING 2020 Informationsklassifikation: Öffentlich Better like this… PROD TEST Data Lab / SandboxFork Trained models New Reports
  • 25. © OPITZ CONSULTING 2020 Informationsklassifikation: Öffentlich Terraform (Hashicorp)  Open Source software tool  Released 2014  Code in HCL (Hashicorp Configuration Language) or JSON  Terraform CLI to trigger Terraform actions  Manages public / private cloud infrastructure  Extendable with providers – AWS, Azure, OCI, vSphere  Performs CRUD operations to accomplish the target state
  • 26. © OPITZ CONSULTING 2020 Informationsklassifikation: Öffentlich Cloud-init  Customizes OS directly after cloning from template  Many public- / private clouds and OS Types are supported  Perfectly for „low-level stuff“  Set default locale  Set hostnames  Set up SSH keys  Mount some shared folders  It can also install some software packages  Transfer point to e.g. Ansible
  • 27. © OPITZ CONSULTING 2020 Informationsklassifikation: Öffentlich Ansible (RedHat)  Open Source software, founded by AnsibleWorks in 2012  Very good supported on many Unix-like systems – but also on Windows  Strengths: Provisioning, configuration management, some kind of application development  Agentless – working with temporary SSH access  Features  Inventory based configuration of different stages  Ansible Vault – Store sensitive data  A playbook can be written idempotent  Ansible Tower / Upstream project AWX – REST API, web bases console for Ansible
  • 28. © OPITZ CONSULTING 2020 Informationsklassifikation: Öffentlich Seite 29 Demo  Here comes a small sample… 4
  • 29. © OPITZ CONSULTING 2020 Informationsklassifikation: Öffentlich
  • 30. © OPITZ CONSULTING 2020 Informationsklassifikation: Öffentlich Data Lab creation workflow Admin Chooses template according to requirements Environment is getting provisioned Data Lab – ready to use Pull data through APIs Push data into Data Lab Analytics Sandbox
  • 31. © OPITZ CONSULTING 2020 Informationsklassifikation: Öffentlich Sample workflow (on premise) Admin triggers runs a playbook vm creation add dns entries / AD or LDAP users do some configuration copy data into Data Lab notify Data Scientists
  • 32. © OPITZ CONSULTING 2020 Informationsklassifikation: Öffentlich Sample workflow (Oracle Cloud) Admin triggers use predefined stack do some configuration copy data into Data Lab notify Data Scientists create vm instances and networks Optional: add some firewall rules
  • 33. © OPITZ CONSULTING 2020 Informationsklassifikation: Öffentlich Summary  First of all → Don‘t use TEST as Data Science ENV  Use a dedicated Data Lab environment to innovate  IaC is the way to build and destroy a Data Lab environment  Scalable and repeatable  Mix multiple clouds or on-premise environments  Use the mix of roles to succeed – admins, developers, data scientists  Increase satisfaction (company-wide)  More flexibility for Data Scientists  Matching processes for administration and development in Data Factory  Speed up innovation lifecycle
  • 34. © OPITZ CONSULTING 2020 Informationsklassifikation: Öffentlich  Überraschend mehr Möglichkeiten @OC_WIRE OPITZCONSULTING opitzconsulting opitz-consulting-bcb8-1009116 WWW.OPITZ-CONSULTING.COM Vielen Dank für Ihr Interesse, wir freuen uns auf den gemeinsamen Austausch mit Ihnen! Fabian Hardt Senior Consultant, Business Intelligence & Analytics Kirchstraße 6 51647 Gummersbach Fabian.Hardt@opitz-consulting.com +49 (0) 2261 6001-1045