Copyright 2019 Baker Hughes, a GE company, LLC (“BHGE”). All rights reserved.
How Baker Hughes, a GE Company
Migrated its Data Lake to AWS and Greenplum
March 19, 2019
Rajan, Senior Director Data & Analytics
Baker Hughes, a GE Company
Venkat Gullapalli, VP
Wissen Infotech
Co-Presenter Presenter
From the reservoir to the refinery
We are fullstream
$23B REVENUE
64K+ EMPLOYEES
120 COUNTRIES
Only BHGE has fullstream capability: the portfolio, the technology, and the people to radically transform the oil and gas industry and deliver unparalleled improvement in industrial yield for our customers.
March 26, 2019
Confidential. Not to be copied, distributed, or reproduced without prior approval.
BHGE: We are fullstream
Fullstream projects leverage the breadth of the BHGE portfolio, combined with new commercial models,
to drive successful outcomes for our customers.
UPSTREAM
• Evaluation
• Drilling
• Completion and production
• Subsea
MIDSTREAM
• LNG
• Pipeline
• Storage
DOWNSTREAM
• Refining
• Petrochemical and fertilizer
• Processing
INDUSTRIAL
• Power and renewables
• Control and sensing
BHGE Data Lake Platform @ AWS
DataLake On-Premise
• Part of Corporate DataLake
• Shared infrastructure with other GE businesses
• Defined TechStack
• Multiple instances of Greenplum
DataLake @ AWS
• Dedicated DataLake Instance
• Dedicated instance @ AWS VPC
• Flexible TechStack
• One Consolidated GP Instance
20B+ records per Data Lake instance
~PB storage size
100M+ daily real-time transactions
3.5B+ daily batch transactions
70+ data sources (ERP & non-ERP)
BHGE’s D&A Platform @
Massive growth of data demanding huge Compute & I/O Resources
Ingestion layer | Visualization layer
Mirror | Analytical | Consumption layers
• 70+ data sources
• Tables: ~30K+
• Storage: ~1 PB
• Records: 50+ billion
• Daily update: 20+ TB in size; 500M+ records updated/inserted
• Environments: Dev + QA + Prod
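As a rough back-of-the-envelope check on the daily update figures above, the sketch below converts them into sustained rates. It assumes, purely for illustration, that the load is spread evenly over 24 hours and that TB means decimal terabytes; the slide states neither, and real batch windows are likely shorter and burstier. The function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Figures from the slide (lower bounds: "20+TB", "500+M").
DAILY_BYTES = 20 * 10**12      # 20 TB, decimal units assumed
DAILY_RECORDS = 500 * 10**6    # 500 million records updated/inserted


def sustained_rates(daily_bytes: int, daily_records: int) -> tuple[float, float]:
    """Return (MB per second, records per second) for an evenly spread daily load."""
    mb_per_second = daily_bytes / SECONDS_PER_DAY / 10**6
    records_per_second = daily_records / SECONDS_PER_DAY
    return mb_per_second, records_per_second


mb_s, rec_s = sustained_rates(DAILY_BYTES, DAILY_RECORDS)
print(f"{mb_s:.0f} MB/s sustained, {rec_s:.0f} records/s")
```

Under these assumptions the daily update works out to a little over 230 MB/s and several thousand records per second around the clock, which gives a sense of why compute and I/O capacity were a concern.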
Key Lessons Learned
Major Challenges
Security: Additional Considerations
• Data Classification
• Firewall Ports
• Security Groups
• Re-Validate access
One Data Lake: Consolidation Efforts
• De-Duplication of Data
• Consolidate before move
• License / software upgrades
Performance: Cloud Is a Different Ball Game
• Resource Intensive
• st1 vs. io1 disk storage (Amazon EBS volume types)
• Horizontal vs. vertical scaling
• Iterative process
• Performance vs. cost
Data Transfer: OnPrem to Cloud
• Large volume of data (> 100 TB)
• Time-sensitive load
• Data Replicate for < 150 GB
• Used GP utilities for > 150 GB (gpcrondump / gptransfer)
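The size-based split in the data-transfer lessons above can be sketched as a simple routing rule. This is a minimal illustration, not the team's actual tooling: it assumes the 150 GB threshold applies per table, it arbitrarily sends a table of exactly 150 GB down the utilities path (the slide does not say), and `TableInfo`, `choose_transfer_method`, and `plan_migration` are hypothetical names. It only labels each table and does not invoke any Greenplum utility.

```python
from dataclasses import dataclass

# Threshold from the slide: under 150 GB was replicated,
# over 150 GB was moved with Greenplum utilities.
SIZE_THRESHOLD_GB = 150


@dataclass(frozen=True)
class TableInfo:
    """Hypothetical descriptor for one table to migrate."""
    name: str
    size_gb: float


def choose_transfer_method(table: TableInfo) -> str:
    """Return the transfer path label for a table based on its size."""
    if table.size_gb < SIZE_THRESHOLD_GB:
        return "replicate"
    # Assumption: exactly 150 GB is treated like the larger tables.
    return "gp_utilities"  # e.g. gpcrondump or gptransfer, per the slide


def plan_migration(tables: list[TableInfo]) -> dict[str, list[str]]:
    """Group table names by transfer path, largest tables first."""
    plan: dict[str, list[str]] = {"replicate": [], "gp_utilities": []}
    for table in sorted(tables, key=lambda t: t.size_gb, reverse=True):
        plan[choose_transfer_method(table)].append(table.name)
    return plan


if __name__ == "__main__":
    demo = [TableInfo("orders", 900.0), TableInfo("lookup", 2.5)]
    print(plan_migration(demo))
```

Sorting largest-first is a design choice for this sketch: with a time-sensitive load, starting the longest transfers early tends to shorten the overall window.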
Secret to Success
Compliance, Security & Risk
Early Engagement for Security Clearance
One of the major milestones
C-Level Leadership Engagement
Paramount Importance
One of the key success factors
Cross Functional Team Engagement
All Hands on Deck
Keeping every major stakeholder on the same page
Partnership
Partner Engagement (AWS, Pivotal, Wissen)
Helped the project stay on track
Transparency
Team Building for Great Results
24x7 efforts between on-site and offshore teams
The team spirit won the game!
How Baker Hughes, a GE Company Migrated its Data Lake to AWS and Greenplum - Greenplum Summit 2019
