Cloud Independent
Cloud Data Platform
from EMR/DMS/GlueJobs to Kubernetes
Kubernetes?
Presto on k8s
Spark on k8s
Kubernetes operations
Why Kubernetes?
EMR
- Managed service
- Software list
  - Presto
  - Spark server
  - Hive/Hue
  - Hadoop
  - Zeppelin
  - JupyterHub
- Pros
  - Installation, configuration, and maintenance are taken care of
- Cons
  - Version updates (though they still arrive fairly quickly)
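Rethinking data platform services on AWS
Services
- EMR
  - HDFS: s3
  - Hadoop/Hive/Pig/...: convert to Spark or Presto
- Spark
  - Job: Spark on Kubernetes
  - Server: no replacement found for the Spark server
- Presto: prestodb, trino, starburst, and others have well-documented Kubernetes deployments
- Zeppelin: use EMR when a Spark server is needed
- Glue/DMS: Glue and DMS are still good services for building quickly and operating stably, but if you have varied requirements around performance, cost, or data transformation, building and running your own ETL image is also a good choice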
Kubernetes
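Advantages (Why)
- Service stability and container management
- Easy server management through nodegroup settings
- Fast container deployment (the whole flow of EC2 creation, container startup, and so on)
- Installing, running a POC on, and testing new products is easy with Helm or vendor-provided yaml files
Additional resources
- Kubernetes has to be operated in place of EMR
- R&R (roles and responsibilities) for Kubernetes operations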
Presto
Presto
- prestodb, trino (prestosql)
- Auto scaling
  - Auto scaling based on resource utilization
  - Schedule-based
- K8s
  - Internet/Internal service
  - Use only one subnet AZ
  - ELB -> Ingress-Nginx -> Service (k8s) -> Coordinator (Deployment) -> Worker (Deployment)
- Advantages over EMR
  - Easy restarts
    - On failure or after a configuration change
  - Easy scale-out
    - Changing the replica count of the worker deployment is all it takes to scale
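The schedule-based scaling and "change the replica count" points above can be sketched roughly as follows. This is only an illustration: the deployment name `presto-worker`, the namespace `presto`, and the schedule itself are hypothetical and not taken from the slides. The script only builds the `kubectl scale` command; it does not run it.

```python
from datetime import time
from typing import List

# Hypothetical schedule: (start, end, replicas). Not from the slides.
SCHEDULE = [
    (time(9, 0), time(19, 0), 10),  # more workers during business hours
]
DEFAULT_REPLICAS = 3  # hypothetical off-hours size


def replicas_for(now: time) -> int:
    """Return the worker replica count for the given time of day."""
    for start, end, replicas in SCHEDULE:
        if start <= now < end:
            return replicas
    return DEFAULT_REPLICAS


def scale_command(replicas: int,
                  deployment: str = "presto-worker",
                  namespace: str = "presto") -> List[str]:
    """Build the kubectl invocation that resizes the worker Deployment."""
    return ["kubectl", "scale", f"deployment/{deployment}",
            f"--replicas={replicas}", "--namespace", namespace]


if __name__ == "__main__":
    print(" ".join(scale_command(replicas_for(time(10, 30)))))
```

A cron job or scheduler task could run a script like this on a timer; utilization-based scaling would more likely be handled by a HorizontalPodAutoscaler instead.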
Trino (prestosql) - image
1. git clone https://github.com/trinodb/trino.git
2. ./mvnw clean install -DskipTests
3. Additional libraries
  - Add them to docker/Dockerfile
  - Library for exposing JMX metrics to Prometheus
  - RUN yum -y -q install wget && \
      wget https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.12.0/jmx_prometheus_javaagent-0.12.0.jar -P /usr/lib/presto/
4. docker/build-local.sh
5. Push to ECR
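The final "push to ECR" step might look like the sketch below. The account ID, region, repository name, and local image name are all placeholders (the slides do not say what `docker/build-local.sh` names the image), so treat every argument as an assumption. The function only assembles the shell commands so they can be reviewed before running.

```python
from typing import List


def ecr_registry(account_id: str, region: str) -> str:
    """ECR registry hostname for an account and region."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def push_commands(local_image: str, repository: str, tag: str,
                  account_id: str, region: str) -> List[str]:
    """Return the login, tag, and push commands for one image."""
    registry = ecr_registry(account_id, region)
    remote = f"{registry}/{repository}:{tag}"
    return [
        f"aws ecr get-login-password --region {region} | "
        f"docker login --username AWS --password-stdin {registry}",
        f"docker tag {local_image} {remote}",
        f"docker push {remote}",
    ]


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    for command in push_commands("trino:local", "trino", "0.1",
                                 "123456789012", "ap-northeast-2"):
        print(command)
```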
Presto deploy
- yaml files
  - Key configMap entries
  - The coordinator and the workers each run as a Deployment
jvm.config: |
  -server
  -Xmx51g
  -javaagent:/usr/lib/presto/jmx_prometheus_javaagent-0.12.0.jar=8081:/usr/lib/presto/etc/presto.yml
config.properties: |
  # set coordinator=false on workers
  coordinator=true
  query.max-memory=200GB
  query.max-memory-per-node=20GB
  query.max-total-memory-per-node=35GB
  discovery-server.enabled=true
  query.max-stage-count=250
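One way to keep the coordinator and worker ConfigMaps in sync is to render both from the same values. The sketch below does that with the numbers from the slide. The ConfigMap names, the `discovery.uri` value, and the service name `presto-coordinator` are hypothetical. The 30% heap-headroom check reflects my understanding of Presto's default `memory.heap-headroom-per-node` and should be read as an assumption, not as something the slides state.

```python
import json
from typing import Dict

# Values copied from the slide.
JVM_CONFIG = "\n".join([
    "-server",
    "-Xmx51g",
    "-javaagent:/usr/lib/presto/jmx_prometheus_javaagent-0.12.0.jar"
    "=8081:/usr/lib/presto/etc/presto.yml",
]) + "\n"

QUERY_LIMITS = {
    "query.max-memory": "200GB",
    "query.max-memory-per-node": "20GB",
    "query.max-total-memory-per-node": "35GB",
    "query.max-stage-count": "250",
}


def config_properties(coordinator: bool, discovery_uri: str) -> str:
    """Render config.properties for a coordinator or a worker."""
    props: Dict[str, str] = {"coordinator": "true" if coordinator else "false"}
    if coordinator:
        # Only the coordinator hosts the embedded discovery server.
        props["discovery-server.enabled"] = "true"
    props["discovery.uri"] = discovery_uri  # assumed; not shown on the slide
    props.update(QUERY_LIMITS)
    return "".join(f"{key}={value}\n" for key, value in props.items())


def config_map(name: str, coordinator: bool, discovery_uri: str) -> Dict:
    """Build a ConfigMap manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": {
            "jvm.config": JVM_CONFIG,
            "config.properties": config_properties(coordinator, discovery_uri),
        },
    }


def fits_in_heap(total_per_node_gb: float, heap_gb: float,
                 headroom: float = 0.3) -> bool:
    """Check per-node query memory plus assumed headroom against the heap."""
    return total_per_node_gb + heap_gb * headroom <= heap_gb


if __name__ == "__main__":
    uri = "http://presto-coordinator:8080"  # hypothetical Service name
    print(json.dumps(config_map("presto-worker-config", False, uri), indent=2))
    print(fits_in_heap(35, 51))
```

Under that headroom assumption, 35GB per node plus 30% of a 51g heap comes to 50.3GB, which would explain why the slide pairs `-Xmx51g` with a 35GB per-node limit.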
Spark
Architecture of Spark-on-Kubernetes
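- The Spark driver and executors run as pods
- Advantages over EMR
  - Saves the EMR startup time that was paid when each DAG launched its own EMR cluster
  - Multiple concurrent tasks run fully independently (no need to worry about tasks affecting each other, as happens when several run on one EMR cluster)
  - A job scheduler such as Airflow can run each task on its own, without having to launch EMR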
Spark
- spark-submit
  - Set the master to Kubernetes
- spark on k8s operator
  - Run Spark by submitting a yaml
  - https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/user-guide.md
- Image build
- airflow operator
  - Replaces the existing EMR submit operator
  - https://github.com/kubernetes-client/python
- k8s
  - Use only one subnet AZ, because of network transfer costs
  - r5.2xlarge or larger (so that 5 CPU cores can be assigned)
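A custom Airflow operator that replaces the EMR submit step could be built around something like the sketch below, which creates a SparkApplication through kubernetes-client/python. The namespace, service account, resource sizes, and application name are hypothetical. The `apiVersion` default follows the value quoted in this deck; newer operator releases use a different version string, so check what your installed CRD expects. The kubernetes package is imported lazily so the manifest builder can be used and tested without it.

```python
from typing import Dict


def spark_application(name: str, image: str, main_class: str, jar: str,
                      executors: int = 2, namespace: str = "spark",
                      api_version: str = "sparkoperator.k8s.io/v1beta1") -> Dict:
    """Build a SparkApplication manifest as a plain dict."""
    return {
        "apiVersion": api_version,
        "kind": "SparkApplication",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "Scala",
            "mode": "cluster",
            "image": image,
            "mainClass": main_class,
            "mainApplicationFile": jar,
            "sparkVersion": "3.0.1",
            "restartPolicy": {"type": "Never"},
            # Sizes and service account below are illustrative only.
            "driver": {"cores": 1, "memory": "2g", "serviceAccount": "spark"},
            "executor": {"cores": 5, "instances": executors, "memory": "8g"},
        },
    }


def submit(manifest: Dict) -> Dict:
    """Create the SparkApplication in the cluster (needs the kubernetes package)."""
    from kubernetes import client, config  # third-party, imported lazily

    # Inside a pod, config.load_incluster_config() would be used instead.
    config.load_kube_config()
    group, version = manifest["apiVersion"].split("/")
    return client.CustomObjectsApi().create_namespaced_custom_object(
        group=group,
        version=version,
        namespace=manifest["metadata"]["namespace"],
        plural="sparkapplications",
        body=manifest,
    )


if __name__ == "__main__":
    app = spark_application(
        "spark-pi", "<spark-image>",
        "org.apache.spark.examples.SparkPi",
        "local:///path/to/examples.jar",
    )
    print(app["metadata"], app["spec"]["executor"])
```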
Spark - image
- Download Spark
  - https://www.apache.org/dyn/closer.lua/spark/spark-3.0.1/spark-3.0.1-bin-hadoop3.2.tgz
- Configuration files
  - Dockerfile
    - Scala: kubernetes/dockerfiles/spark/Dockerfile
    - PySpark: kubernetes/dockerfiles/spark/bindings/python/Dockerfile, dockerfiles/spark/bindings/python/requirements.txt
  - Configuration files (refer to EMR for the detailed settings)
    - COPY $conf_path/hive-site.xml /opt/spark/spark-conf/
    - COPY $conf_path/spark-defaults.conf /opt/spark/spark-conf/
    - COPY $conf_path/core-site.xml /opt/spark/spark-conf/
- Library
  - COPY the libraries you need, such as aws, s3, kinesis, and delta lake
- Run bin/docker-image-tool.sh, then push to ECR
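The build-and-push step with `bin/docker-image-tool.sh` could be scripted along these lines. The repository URI and tag are placeholders, and the sketch assumes Docker is already logged in to the registry. It relies on the tool's `-r`, `-t`, and `-p` options and its `build` and `push` commands; the commands are only assembled here, not executed.

```python
from typing import List

PYSPARK_DOCKERFILE = "kubernetes/dockerfiles/spark/bindings/python/Dockerfile"


def image_tool_commands(repo: str, tag: str,
                        pyspark: bool = False) -> List[List[str]]:
    """Return the build and push invocations for docker-image-tool.sh."""
    base = ["./bin/docker-image-tool.sh", "-r", repo, "-t", tag]
    build = list(base)
    if pyspark:
        # -p points the tool at the PySpark binding Dockerfile.
        build += ["-p", PYSPARK_DOCKERFILE]
    return [build + ["build"], base + ["push"]]


if __name__ == "__main__":
    # Hypothetical ECR registry and tag.
    repo = "123456789012.dkr.ecr.ap-northeast-2.amazonaws.com"
    for argv in image_tool_commands(repo, "3.0.1", pyspark=True):
        print(" ".join(argv))
```

Run from the unpacked Spark distribution, the tool names the images `<repo>/spark:<tag>` and, when the PySpark Dockerfile is given, `<repo>/spark-py:<tag>`, so matching ECR repositories need to exist beforehand.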
Spark on Kubernetes - execution
Spark default
./bin/spark-submit --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
  --deploy-mode cluster --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.container.image=<spark-image> \
  local:///path/to/examples.jar
Spark on k8s operator
- apiVersion: "sparkoperator.k8s.io/v1beta1"
  kind: SparkApplication
Resource Conf
- The CPU limits are set by spark.kubernetes.{driver,executor}.limit.cores.
- The CPU request is set by spark.{driver,executor}.cores.
- The memory request and limit are set by summing the values of spark.{driver,executor}.memory and spark.{driver,executor}.memoryOverhead.
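To make the memory rule concrete, the sketch below computes the pod memory that results from `spark.executor.memory` plus the overhead. When no explicit overhead is given, it falls back to Spark's documented default for JVM jobs: 10% of the memory setting, with a 384 MiB minimum. The helper names and the example sizes are hypothetical, and the parser only understands the `m` and `g` suffixes.

```python
from typing import Dict, List, Optional

MIN_OVERHEAD_MIB = 384  # Spark's minimum memory overhead


def parse_mib(value: str) -> int:
    """Convert a size such as '8g' or '512m' to MiB."""
    units = {"m": 1, "g": 1024}
    return int(value[:-1]) * units[value[-1].lower()]


def pod_memory_mib(memory: str, overhead: Optional[str] = None,
                   factor: float = 0.1) -> int:
    """Memory request/limit of the pod: memory plus memoryOverhead."""
    base = parse_mib(memory)
    if overhead is not None:
        extra = parse_mib(overhead)
    else:
        extra = max(int(base * factor), MIN_OVERHEAD_MIB)
    return base + extra


def conf_args(conf: Dict[str, str]) -> List[str]:
    """Turn a settings dict into spark-submit --conf arguments."""
    args: List[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    conf = {
        "spark.executor.cores": "5",
        "spark.executor.memory": "8g",
        "spark.kubernetes.executor.limit.cores": "5",
    }
    print(conf_args(conf))
    print(pod_memory_mib(conf["spark.executor.memory"]), "MiB per executor pod")
```

Sizing this way makes it easier to judge how many executors fit on a node once the overhead is counted, keeping in mind that a node's allocatable CPU and memory are lower than the instance's nominal size.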
DMS/Glue
- ETL image development
  - Libraries for sources and targets are convenient and varied
  - Add features for data transformation and monitoring
  - Use Python for data extraction, and load with Spark (hive metastore)
- airflow
  - Run via KubernetesPodOperator
  - Airflow manages execution and monitoring together with the other data processing jobs
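Running the ETL image from Airflow might look roughly like this. The image name, namespace, entry point `extract.py`, and arguments are hypothetical. The import path of `KubernetesPodOperator` differs between versions of the cncf.kubernetes provider, so the sketch tries both locations and imports Airflow lazily.

```python
from typing import Any, Dict, List


def etl_pod_kwargs(task_id: str, image: str, arguments: List[str],
                   namespace: str = "etl") -> Dict[str, Any]:
    """Keyword arguments for a KubernetesPodOperator that runs the ETL image."""
    return {
        "task_id": task_id,
        # Pod names cannot contain underscores.
        "name": task_id.replace("_", "-"),
        "namespace": namespace,
        "image": image,
        "cmds": ["python", "extract.py"],  # hypothetical entry point
        "arguments": arguments,
        "get_logs": True,
    }


def etl_task(dag: Any, task_id: str, image: str, arguments: List[str]) -> Any:
    """Create the operator inside a DAG (needs Airflow and its Kubernetes provider)."""
    try:
        from airflow.providers.cncf.kubernetes.operators.pod import (
            KubernetesPodOperator,
        )
    except ImportError:
        from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
            KubernetesPodOperator,
        )
    return KubernetesPodOperator(dag=dag, **etl_pod_kwargs(task_id, image, arguments))


if __name__ == "__main__":
    print(etl_pod_kwargs("extract_orders", "<etl-image>", ["--date", "2021-01-01"]))
```

Keeping the keyword arguments in a plain function makes the task definition easy to unit-test without an Airflow installation.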
Kubernetes operations
AWS services we hope for
- Fargate
  - w/ no wait
  - Resource limits
- Better serverless data processing performance (Redshift Spectrum)
- Snowflake or Databricks
Kubernetes operations
- Manual scheduling based on nodegroup monitoring
  - prometheus, grafana
- Add scheduled actions to the ASG when stable operation is required
- Periodically upgrade the nodegroup AMI (https://docs.aws.amazon.com/eks/latest/userguide/eks-optimized-amis.html)
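Adding scheduled actions to the nodegroup's ASG could be done with boto3 along these lines. The ASG name, action names, sizes, and cron expressions are hypothetical. `Recurrence` is interpreted in UTC unless a time zone is supplied. boto3 is imported lazily so the request builder can be checked without AWS credentials.

```python
from typing import Any, Dict, List


def scheduled_action(asg_name: str, action_name: str, cron: str,
                     desired: int, min_size: int, max_size: int) -> Dict[str, Any]:
    """Build the request for one ASG scheduled action."""
    return {
        "AutoScalingGroupName": asg_name,
        "ScheduledActionName": action_name,
        "Recurrence": cron,  # cron expression, UTC by default
        "MinSize": min_size,
        "MaxSize": max_size,
        "DesiredCapacity": desired,
    }


def apply_actions(actions: List[Dict[str, Any]], region: str) -> None:
    """Send the scheduled actions to AWS (needs boto3 and credentials)."""
    import boto3  # third-party, imported lazily

    autoscaling = boto3.client("autoscaling", region_name=region)
    for action in actions:
        autoscaling.put_scheduled_update_group_action(**action)


if __name__ == "__main__":
    # Hypothetical nodegroup ASG: scale up before the nightly batch, down after.
    actions = [
        scheduled_action("spark-nodegroup-asg", "batch-scale-up",
                         "0 15 * * *", desired=10, min_size=10, max_size=20),
        scheduled_action("spark-nodegroup-asg", "batch-scale-down",
                         "0 21 * * *", desired=2, min_size=2, max_size=20),
    ]
    for action in actions:
        print(action)
```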
Data engineer role
- Service operations
- Image build/management
- Kubernetes management
  - Pod/Deployments
  - Nodegroup
  - Services
  - SG
  - AWS ASG
- AWS management

Transfer to kubernetes data platform from EMR
