How Baker Hughes, a GE Company Migrated its Data Lake to AWS and Greenplum - Greenplum Summit 2019
1. Copyright 2019 Baker Hughes, a GE company, LLC (“BHGE”). All rights reserved.
How Baker Hughes, a GE Company
Migrated its Data Lake to AWS and Greenplum
March 19, 2019
Rajan, Senior Director, Data & Analytics
Baker Hughes, a GE company
Venkat Gullapalli, VP
Wissen Infotech
Co-Presenter Presenter
2. From the reservoir to the refinery
We are fullstream
$23B REVENUE
64K+ EMPLOYEES
120 COUNTRIES
March 26, 2019 2
Confidential. Not to be copied, distributed, or reproduced without prior approval.
BHGE
Only BHGE has fullstream capability: the portfolio, the technology, and the people to radically transform the oil and gas industry and deliver unparalleled improvement in industrial yield for our customers.
3. BHGE: We are fullstream
Fullstream projects leverage the breadth of the BHGE portfolio, combined with new commercial models, to drive successful outcomes for our customers.
UPSTREAM
• Evaluation
• Drilling
• Completion and production
• Subsea
MIDSTREAM
• LNG
• Pipeline
• Storage
DOWNSTREAM
• Refining
• Petrochemical and fertilizer
• Processing
INDUSTRIAL
• Power and renewables
• Control and sensing
4.
BHGE Data Lake Platform @ AWS
Data Lake On-Premises
• Part of the corporate data lake
• Infrastructure shared with other GE businesses
• Predefined tech stack
• Multiple Greenplum instances
Data Lake @ AWS
• Dedicated data lake instance
• Dedicated instance in an AWS VPC
• Flexible tech stack
• One consolidated Greenplum instance
• 20B+ data lake records per instance
• Petabyte-scale (~PB) storage
• 100M+ daily real-time transactions
• 3.5B+ daily batch transactions
• 70+ data sources (ERP and non-ERP)
5.
BHGE’s D&A Platform @
Massive data growth demands large compute and I/O resources
Layers: 70+ data sources → ingestion layer → mirror | analytical | consumption layers → visualization layer
Environments: Dev + QA + Prod
• Tables: ~30K+
• Storage: ~1 PB
• Records: ~50B+
• Daily updates: 20+ TB; 500M+ records inserted/updated
6. Key Lessons Learned
Major Challenges
Security: Additional Considerations
• Data classification
• Firewall ports
• Security groups
• Re-validate access
One Data Lake: Consolidation Efforts
• De-duplication of data
• Consolidate before the move
• License / software upgrades
Performance: Cloud Is a Different Ball Game
• Resource intensive
• st1 vs io1 disk storage (Amazon EBS volume types)
• Horizontal vs vertical scaling
• Iterative process
• Performance vs cost
Data Transfer: On-Prem to Cloud
• Large volume of data (> 100 TB)
• Time-sensitive load
• Data replication for < 150 GB
• Greenplum utilities (gpcrondump / gptransfer) for > 150 GB
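The data-transfer rule on this slide (replication below 150 GB, Greenplum bulk utilities above it) can be expressed as a small planning helper. This is a minimal Python sketch, not the team's actual tooling: the `TableInfo` record, the method labels, treating exactly 150 GB as "large", and the sample table names and sizes are all hypothetical.

```python
from dataclasses import dataclass

# Cut-over point quoted on the slide. Treating exactly 150 GB as "large"
# is an assumption: the slide only states < 150 GB and > 150 GB.
THRESHOLD_GB = 150.0


@dataclass(frozen=True)
class TableInfo:
    """Hypothetical record describing one source table to migrate."""
    name: str
    size_gb: float


def choose_transfer_method(table: TableInfo) -> str:
    """Return the transfer path for one table under the 150 GB rule."""
    if table.size_gb < THRESHOLD_GB:
        return "replication"  # smaller tables: data replication tooling
    return "gp_utilities"  # larger tables: gpcrondump / gptransfer bulk copy


def plan_migration(tables: list[TableInfo]) -> dict[str, list[str]]:
    """Group table names by transfer method, largest tables first."""
    plan: dict[str, list[str]] = {"replication": [], "gp_utilities": []}
    for table in sorted(tables, key=lambda t: t.size_gb, reverse=True):
        plan[choose_transfer_method(table)].append(table.name)
    return plan


if __name__ == "__main__":
    # Hypothetical inventory, for illustration only.
    inventory = [
        TableInfo("sales_orders", 420.0),
        TableInfo("customers", 12.5),
        TableInfo("sensor_readings", 1800.0),
    ]
    print(plan_migration(inventory))
    # {'replication': ['customers'], 'gp_utilities': ['sensor_readings', 'sales_orders']}
```

Ordering by size, largest first, is one possible choice for a time-sensitive load: the longest bulk transfers start earliest. It is an illustrative assumption, not something the slide states.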
7.
Secret to Success
Compliance, Security & Risk
Early Engagement for Security Clearance
One of the major milestones
C-Level Leadership Engagement
Paramount Importance
One of the key success factors
Cross-Functional Team Engagement
All Hands on Deck
Keeping every major stakeholder on the same page
Partnership
Partner Engagement (AWS, Pivotal, Wissen)
Helped keep the project on track
Transparency
Team Building for Great Results
24x7 efforts between on-site and offshore teams
The team spirit won the game!