A Data Orchestration System with Open Source Alluxio on AWS EKS with Terraform
About me
Vasista Polali
Founder @ boolean UG
Berlin, Germany
http://booleancomputing.com
Email: vasista.polali@booleancomputing.com
Agenda
• Bootstrap an EKS cluster in AWS with Terraform.
• Deploy open source Alluxio in a namespace with persistence in AWS EFS.
• Scale the Alluxio workers, deployed as a DaemonSet, up and down by scaling the EKS nodes with Terraform.
• Access data with an S3 mount.
• Control access to Alluxio with the “setfacl” functionality and Kubernetes service accounts.
• Re-use the metadata in the persistence layer on a new cluster.
Use case:
Data Security:
• Data had to be shared across different storage systems, such as AWS S3 and Azure Blob storage, and across different buckets in the same object store controlled by different teams.
• This caused a lot of data movement, and delays in getting approvals from data security for making multiple copies of data, access control, data retention and GDPR-driven deletion of data, not to mention the additional ETL development and maintenance effort.
Data Sharing and intermediate data persistence:
• Data had to be shared between various Spark jobs in the ETL and iterative machine learning workflows. Intermediate data was written back to the storage systems and re-ingested by the subsequent steps, causing higher processing time, increased data transfer and increased costs.
Fault Tolerance:
• Crashes of long-running jobs cause loss of the data held in memory, while persisting intermediate data to storage increases processing time.
MVP
The goal was to build a cloud native data sharing system by taking open source Alluxio
and wrapping it in a set of processes that adhere to the minimum required
enterprise-wide standards and the DevOps principles of security, automation, infrastructure
as code, continuous improvement and deployment, and short lead times.
In scope:
• Everything that open source Alluxio provides out of the box.
• Should be cloud native and deployable on AWS EKS.
Out of scope:
• No forking, customization, or maintenance of open source code.
https://www.alluxio.io/
AWS EKS - quick look
Amazon Elastic Kubernetes Service (Amazon EKS) gives you the flexibility to start, run, and scale
Kubernetes applications in the AWS cloud or on-premises
Environment:
• AWS EKS with an assigned minimum size of 5 nodes.
• Infrastructure as code provisioned by Terraform.
• A workspace in Terraform Cloud to plan and apply the configuration and maintain remote state storage.
• Auto Scaling group to provision and maintain EC2 instance capacity.
• CICD pipeline with Git actions.
• Official Alluxio Docker images.
• Kubernetes Persistent Volume mounted on AWS EFS.
[Diagram: Bootstrap AWS EKS with Terraform. Terraform Cloud: maintain and apply configuration. CICD with Git Actions.]
Deploy Open Source Alluxio on Kubernetes
• Deploy alluxio-master as a StatefulSet, which guarantees the ordering and uniqueness of the master pods when they are scaled.
• Deploy worker pods as a DaemonSet, which guarantees that one worker runs on each node.
• Scalability: as nodes are added to the cluster, worker pods are added to them. As nodes are removed from the cluster, those pods are garbage collected.
• Set the Alluxio configuration properties through a Kubernetes ConfigMap.
• Deploy Persistent Volume Claims backed by AWS EFS, with the Persistent Volume provisioned by a StorageClass.
• None of the services are exposed to the outside world; they remain accessible only on the internal network.
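The worker deployment described above can be sketched as a manifest built in Python and printed as JSON, which kubectl accepts alongside YAML. This is a minimal sketch, not the deck's actual manifest: the namespace "alluxio", the labels, the ConfigMap name "alluxio-config" and the image reference are assumptions for illustration, and container arguments, ports, volumes and resource settings are left out.

```python
import json


def worker_daemonset(namespace: str, image: str, config_map: str) -> dict:
    """Build a minimal DaemonSet manifest that runs one Alluxio worker pod per node."""
    # Hypothetical labels; the selector must match the pod template labels.
    labels = {"app": "alluxio", "role": "alluxio-worker"}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": "alluxio-worker", "namespace": namespace, "labels": labels},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": "alluxio-worker",
                            "image": image,
                            # Alluxio properties are injected from a ConfigMap (assumed name).
                            "envFrom": [{"configMapRef": {"name": config_map}}],
                        }
                    ]
                },
            },
        },
    }


# Assumed values for illustration only.
manifest = worker_daemonset("alluxio", "alluxio/alluxio", "alluxio-config")
print(json.dumps(manifest, indent=2))
```

Because the workload kind is DaemonSet, no replica count appears in the manifest: the number of worker pods follows the number of nodes.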
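[Diagram: three nodes (Node 1, Node 2, Node 3). Two nodes each run an Alluxio master pod from the StatefulSet (alluxio-master-0, alluxio-master-1) together with an Alluxio worker from the DaemonSet; the third node runs only an Alluxio worker from the DaemonSet. ConfigMap and Persistent Volume are shown alongside.]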
Idea was to provide a unified namespace for accessing and processing data
Mount S3 bucket to alluxio fs directory:
• Create a CICD action to run alluxio fs mount:
alluxio fs mount --option aws.accessKeyId=<aws key> --option aws.secretKey=<aws secret> <alluxio dir> s3://<s3 bucket>
• Create a service user in IAM with access to the S3 bucket.
• Store the service user's security credentials as secrets in the CICD tool.
• Create a Kubernetes user and role that can port-forward a pod, limited to the namespace that Alluxio is deployed in.
• Generate a kubeconfig file for the Kubernetes user.
• Generate a Git workflow YAML to run the setup on merge to master.
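As a rough illustration of the mount step, the command can be assembled as an argument list so that credentials coming from CICD secrets never pass through shell string interpolation. This is a sketch under assumptions: the function name and the example values are hypothetical, and the option names are copied from the slide (the exact property names can differ between Alluxio versions).

```python
import shlex


def mount_command(alluxio_dir: str, s3_bucket: str, access_key: str, secret_key: str) -> list:
    """Assemble the alluxio fs mount invocation shown on the slide as an argv list."""
    return [
        "alluxio", "fs", "mount",
        # Option names as written on the slide; check them against your Alluxio version.
        "--option", f"aws.accessKeyId={access_key}",
        "--option", f"aws.secretKey={secret_key}",
        alluxio_dir,
        f"s3://{s3_bucket}",
    ]


# Placeholder values; in a pipeline these would come from the CICD secret store.
cmd = mount_command("/mnt/sales", "example-bucket", "EXAMPLE_KEY_ID", "EXAMPLE_SECRET")
print(shlex.join(cmd))
```

Passing the list to subprocess.run without shell=True keeps the secret values out of shell parsing; avoid echoing the joined command in real workflow logs.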
Implementation:
• Check out a branch from master and add the list of the directories to be mounted.
• Create a pull request.
• Merge to master on approval from the data owners, which triggers the workflow.
Actions:
• kubectl port-forward <alluxio-master-pod> on 19998. The Alluxio master will be available on localhost:19998.
• Check out the repo with the Alluxio binaries.
• Provide the AWS credentials of the AWS service user from secrets.
• Run alluxio fs mount against Alluxio fs.
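The workflow actions could be driven by a small script along these lines. It is only a sketch: the one-mount-per-line file format ("<alluxio dir> <s3 uri>") is an assumption, since the deck does not say how the directory list is stored, and the namespace name is hypothetical. The pod name alluxio-master-0 and port 19998 come from the slides.

```python
def port_forward_command(master_pod: str, namespace: str, port: int = 19998) -> list:
    """Build the kubectl port-forward call that exposes the Alluxio master on localhost."""
    return ["kubectl", "port-forward", master_pod, f"{port}:{port}", "--namespace", namespace]


def parse_mount_list(text: str) -> list:
    """Parse an assumed list format: one '<alluxio dir> <s3 uri>' pair per line."""
    mounts = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        alluxio_dir, s3_uri = line.split()
        mounts.append((alluxio_dir, s3_uri))
    return mounts


# Hypothetical contents of the reviewed list file in the repository.
example_list = """
# alluxio dir   s3 uri
/mnt/sales      s3://sales-bucket/raw
/mnt/marketing  s3://marketing-bucket
"""

forward = port_forward_command("alluxio-master-0", "alluxio")
mounts = parse_mount_list(example_list)
print(" ".join(forward))
for alluxio_dir, s3_uri in mounts:
    print(alluxio_dir, "->", s3_uri)
```

Keeping the list in a reviewed file means the pull request diff is exactly what the data owners approve.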
Control Access to data in Alluxio fs:
• Data mounted from S3, and other directories in open source Alluxio, are accessible only to the user that created them.
• Create a CICD action to run the alluxio fs setfacl command:
alluxio fs setfacl -R -m user:<user>:<permissions> <dir>
• Set POSIX permissions, in the form rwx, for a user that needs access to data in an Alluxio directory.
• alluxio fs setfacl can also remove permissions with the -x flag:
alluxio fs setfacl -R -x user:<user> <dir>
• Generate a Git workflow YAML to run the setup on merge to master.
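The two setfacl forms above can be produced by one helper, sketched here under the assumption that permissions are given as a three-character string such as rwx or r-x. The function name and example values are hypothetical; the flags are the ones shown on the slide.

```python
import re
from typing import Optional


def setfacl_command(user: str, directory: str, permissions: Optional[str] = None) -> list:
    """Build an alluxio fs setfacl call: grant when permissions are given, revoke otherwise."""
    base = ["alluxio", "fs", "setfacl", "-R"]
    if permissions is None:
        # Remove the user's ACL entry, as in: setfacl -R -x user:<user> <dir>
        return base + ["-x", f"user:{user}", directory]
    # Accept only rwx-style strings such as "rwx", "r-x" or "r--".
    if not re.fullmatch(r"[r-][w-][x-]", permissions):
        raise ValueError(f"invalid permissions: {permissions!r}")
    # Add or modify the entry, as in: setfacl -R -m user:<user>:<permissions> <dir>
    return base + ["-m", f"user:{user}:{permissions}", directory]


grant = setfacl_command("analyst", "/mnt/sales", "r-x")
revoke = setfacl_command("analyst", "/mnt/sales")
print(" ".join(grant))
print(" ".join(revoke))
```

Validating the permission string before the command is built keeps a malformed entry in the reviewed list from reaching the cluster.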
Idea was to implement access control for data security on the mounted data
Implementation:
• Check out a branch from master and add the list of the users, directories and corresponding permissions to be applied, or the list of the users and corresponding permissions to be removed.
• Create a pull request.
• Merge to master on approval from the data owners, which triggers the workflow.
Actions:
• kubectl port-forward <alluxio-master-pod> on 19998. The Alluxio master will be available on localhost:19998.
• Check out the repo with the Alluxio binaries.
• Provide the AWS credentials of the AWS service user from secrets.
• Run alluxio fs setfacl against Alluxio fs to add or remove permissions.
Idea was to establish a mechanism to process data with Kubernetes Spark adhering to Alluxio fs permissions
Run an application in cluster deploy mode with Spark on Kubernetes
to process the data in Alluxio fs:
• Build a Spark Docker image, “adduser” the user who needs to access the data in Alluxio, and set it as the default user with the USER instruction.
• Make sure the default user in the Spark container has the necessary access permissions for the data in Alluxio.
• Build the application jar into the Spark image, or provide it as a mount, and access it with the local:// scheme when executing spark-submit.
• Alternatively, place the artifact in S3 or other storage of choice and have Spark download it at runtime.
• The driver and the number of executors specified during submission are created as pods in the specified namespace.
• The user submitting the Spark application is also the user of the client process accessing Alluxio fs, and so is only allowed to perform operations permitted by their access privileges.
• Any other user in the client process without the requisite access to the data is rejected.
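A cluster-mode submission along the lines described above might be assembled as follows. This is a sketch, not the deck's actual command: the API server address, namespace, image reference, class name and jar path are placeholders, while the flags and spark.kubernetes.* property names are standard Spark-on-Kubernetes settings.

```python
def spark_submit_command(api_server: str, namespace: str, image: str,
                         main_class: str, jar_path: str, executors: int) -> list:
    """Build a spark-submit call for cluster deploy mode on Kubernetes."""
    return [
        "spark-submit",
        "--master", f"k8s://{api_server}",
        "--deploy-mode", "cluster",
        "--class", main_class,
        # Driver and executor pods are created in this namespace.
        "--conf", f"spark.kubernetes.namespace={namespace}",
        # Image built by the workflow; its default USER is the Alluxio client user.
        "--conf", f"spark.kubernetes.container.image={image}",
        "--conf", f"spark.executor.instances={executors}",
        # local:// points at a jar already present inside the image.
        f"local://{jar_path}",
    ]


# Placeholder values for illustration only.
submit = spark_submit_command(
    api_server="https://example-eks-endpoint:443",
    namespace="alluxio",
    image="example-registry/spark-alluxio:latest",
    main_class="com.example.EtlJob",
    jar_path="/opt/spark/jars/etl-job.jar",
    executors=3,
)
print(" ".join(submit))
```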
[Diagram: Git workflow to build Spark Docker image → Push image to ECR → Pull image from ECR → spark-submit --master=Kubernetes API → Create Driver Pod → Create Executor Pods → Access Alluxio]
• Alluxio workers run as a DaemonSet in the EKS cluster. For every node added to the cluster, Kubernetes starts an Alluxio worker pod from the DaemonSet on that node.
• Scaling down works the same way.
• This allows the cluster capacity to be moved up and down on demand in an automated way.
• Using DevOps processes to bootstrap and control the EKS and Alluxio cluster size makes the system flexible, resulting in optimal use of resources and reduced costs.
• The /journal folder of the Alluxio master is persisted in EFS through a Persistent Volume Claim. Keeping the EFS file system up and running allows us to spin up Alluxio clusters on demand with all the mount points and user-permission metadata intact, and tear them down after the data processing has finished, saving time, effort and cost. This is most beneficial when big clusters are needed to run computations on huge batches of data running into terabytes.
• Time-bound operations through scheduling in Git actions.
Idea was to make the setup scalable to resize the Alluxio cluster on demand and save costs.
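One way to drive the node count from a pipeline is to generate a Terraform variables file in JSON form (Terraform reads *.tfvars.json files). The sketch below assumes the Terraform configuration exposes three variables named node_min_size, node_desired_size and node_max_size; those names are hypothetical and not taken from the deck. The minimum of 5 nodes is the value given in the Environment slide.

```python
import json


def node_group_tfvars(min_size: int, desired_size: int, max_size: int) -> dict:
    """Return Terraform variable values for the EKS node group size (assumed variable names)."""
    if not (min_size <= desired_size <= max_size):
        raise ValueError("expected min_size <= desired_size <= max_size")
    return {
        "node_min_size": min_size,
        "node_desired_size": desired_size,
        "node_max_size": max_size,
    }


# Scale from the minimum of 5 nodes up to 8; one Alluxio worker pod follows each node.
tfvars = node_group_tfvars(min_size=5, desired_size=8, max_size=10)
print(json.dumps(tfvars, indent=2))
```

Writing this output to a file ending in .tfvars.json and applying the configuration would resize the node group; the DaemonSet then adds or removes Alluxio workers to match.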
Other options:
• Adopting this process for centralized operations and data security teams.
• Opening up the EKS cluster to spark-submit jobs in client mode, where the driver runs on a remote system, and controlling user access to data in Alluxio.
THANK YOU ALL

More Related Content

More from Alluxio, Inc.

Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeData Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeAlluxio, Inc.
 
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudData Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudAlluxio, Inc.
 
Data Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderData Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderAlluxio, Inc.
 
Data Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionData Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionAlluxio, Inc.
 
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio, Inc.
 
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...Alluxio, Inc.
 
AI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAlluxio, Inc.
 
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...Alluxio, Inc.
 
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...Alluxio, Inc.
 
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ MetaAI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ MetaAlluxio, Inc.
 
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber ScaleAI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber ScaleAlluxio, Inc.
 
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWSAlluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWSAlluxio, Inc.
 
Alluxio + Eckerson Webinar | Simplifying and Accelerating Data Access for AI/...
Alluxio + Eckerson Webinar | Simplifying and Accelerating Data Access for AI/...Alluxio + Eckerson Webinar | Simplifying and Accelerating Data Access for AI/...
Alluxio + Eckerson Webinar | Simplifying and Accelerating Data Access for AI/...Alluxio, Inc.
 
Alluxio Monthly Webinar - Accelerate AI Path to Production
Alluxio Monthly Webinar - Accelerate AI Path to ProductionAlluxio Monthly Webinar - Accelerate AI Path to Production
Alluxio Monthly Webinar - Accelerate AI Path to ProductionAlluxio, Inc.
 
Alluxio Webinar - Maximize GPU Utilization for Model Training
Alluxio Webinar - Maximize GPU Utilization for Model TrainingAlluxio Webinar - Maximize GPU Utilization for Model Training
Alluxio Webinar - Maximize GPU Utilization for Model TrainingAlluxio, Inc.
 
Alluxio Product school Webinar - Distributed Caching for Generative AI
Alluxio Product school Webinar - Distributed Caching for Generative AIAlluxio Product school Webinar - Distributed Caching for Generative AI
Alluxio Product school Webinar - Distributed Caching for Generative AIAlluxio, Inc.
 
Alluxio Product School Webinar - Get Started with Alluxio on Kubernetes
Alluxio Product School Webinar - Get Started with Alluxio on KubernetesAlluxio Product School Webinar - Get Started with Alluxio on Kubernetes
Alluxio Product School Webinar - Get Started with Alluxio on KubernetesAlluxio, Inc.
 
Alluxio Product School Webinar - Boosting Trino Performance.
Alluxio Product School Webinar - Boosting Trino Performance.Alluxio Product School Webinar - Boosting Trino Performance.
Alluxio Product School Webinar - Boosting Trino Performance.Alluxio, Inc.
 
Alluxio Product School Webinar - Transparent URI
Alluxio Product School Webinar - Transparent URIAlluxio Product School Webinar - Transparent URI
Alluxio Product School Webinar - Transparent URIAlluxio, Inc.
 
Alluxio 2.9 Release Overview
Alluxio 2.9 Release OverviewAlluxio 2.9 Release Overview
Alluxio 2.9 Release OverviewAlluxio, Inc.
 

More from Alluxio, Inc. (20)

Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeData Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
 
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudData Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
 
Data Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderData Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet Reader
 
Data Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionData Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage Evolution
 
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
 
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
 
AI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI Era
 
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
 
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
 
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ MetaAI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
 
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber ScaleAI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
 
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWSAlluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
 
Alluxio + Eckerson Webinar | Simplifying and Accelerating Data Access for AI/...
Alluxio + Eckerson Webinar | Simplifying and Accelerating Data Access for AI/...Alluxio + Eckerson Webinar | Simplifying and Accelerating Data Access for AI/...
Alluxio + Eckerson Webinar | Simplifying and Accelerating Data Access for AI/...
 
Alluxio Monthly Webinar - Accelerate AI Path to Production
Alluxio Monthly Webinar - Accelerate AI Path to ProductionAlluxio Monthly Webinar - Accelerate AI Path to Production
Alluxio Monthly Webinar - Accelerate AI Path to Production
 
Alluxio Webinar - Maximize GPU Utilization for Model Training
Alluxio Webinar - Maximize GPU Utilization for Model TrainingAlluxio Webinar - Maximize GPU Utilization for Model Training
Alluxio Webinar - Maximize GPU Utilization for Model Training
 
Alluxio Product school Webinar - Distributed Caching for Generative AI
Alluxio Product school Webinar - Distributed Caching for Generative AIAlluxio Product school Webinar - Distributed Caching for Generative AI
Alluxio Product school Webinar - Distributed Caching for Generative AI
 
Alluxio Product School Webinar - Get Started with Alluxio on Kubernetes
Alluxio Product School Webinar - Get Started with Alluxio on KubernetesAlluxio Product School Webinar - Get Started with Alluxio on Kubernetes
Alluxio Product School Webinar - Get Started with Alluxio on Kubernetes
 
Alluxio Product School Webinar - Boosting Trino Performance.
Alluxio Product School Webinar - Boosting Trino Performance.Alluxio Product School Webinar - Boosting Trino Performance.
Alluxio Product School Webinar - Boosting Trino Performance.
 
Alluxio Product School Webinar - Transparent URI
Alluxio Product School Webinar - Transparent URIAlluxio Product School Webinar - Transparent URI
Alluxio Product School Webinar - Transparent URI
 
Alluxio 2.9 Release Overview
Alluxio 2.9 Release OverviewAlluxio 2.9 Release Overview
Alluxio 2.9 Release Overview
 

Recently uploaded

Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 

Recently uploaded (20)

Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 

Integrating Open Source Alluxio in AWS EKS with Terraform

  • 1. A Data orchestration system with Open Source Alluxio on AWS EKS with Terraform
  • 2. About me Vasista Polali Founder @ boolean UG Berlin, Germany http://booleancomputing.com Email: vasista.polali@booleancomputing.com
  • 3. Agenda • Bootstrap EKS cluster in AWS with Terraform. • Deploy open source Alluxio in a namespace with persistence in AWS EFS. • Scale up and down the Alluxio worker nodes as Daemon sets by Scaling the EKS nodes with Terraform. • Accessing data with S3 mount. • Controlling the access to Alluxio with “setfacl” functionality, and Kubernetes service accounts. • Re-using the metadata in the persistence layer on a new cluster
  • 4. Use case: Data Security: • There was a need for sharing data stored in different storage systems like AWS S3 and Azure Blob storage and also in different buckets in the same object store controlled by different teams. • This caused a lot of data movement and also time delay in getting approvals from data security around making multiple copies of data, access control , data retention and deletion of data owing to GDPR not to mention the additional ETL development effort and maintenance. Data Sharing and intermediate data persistence: • There was a need for data sharing between various spark jobs in the ETL and Iterative machine learning workflows where intermediate data had to be written back to the storage systems and re- ingested by the consecutive steps causing higher processing time, increased data transfer and increased costs. Fault Tolerance: • Long running job crashes cause loss of data in memory and intermediate persistence to storage increases processing time, causing pain.
  • 5. MVP The goal was to build a cloud native data sharing system by taking open source Alluxio and wrapping it in set a of processes that would adhere to minimally required Enterprise wide standards and DevOps principles of Security, Automation, Infrastructure as code, Continuous improvement and Deployment and short lead times. In scope: • Everything that open source Alluxio provides out-of-box. • Should be cloud native and deployable on AWS EKS. Out of scope: • No forking , customization, or maintenance of open source code.
  • 7. AWS EKS - quick look Amazon Elastic Kubernetes Service (Amazon EKS) gives you the flexibility to start, run, and scale Kubernetes applications in the AWS cloud or on-premises
  • 8. Environment
      • AWS EKS with an assigned minimum size of 5 nodes.
      • Infrastructure as code, provisioned by Terraform.
      • A workspace in Terraform Cloud to plan and apply the configuration and to maintain remote state storage.
      • An Auto Scaling group to provision and maintain EC2 instance capacity.
      • CI/CD pipeline with GitHub Actions.
      • Official Alluxio Docker images.
      • Kubernetes Persistent Volume backed by AWS EFS.
      [Diagram: bootstrap AWS EKS with Terraform; maintain and apply the configuration in Terraform Cloud; CI/CD with GitHub Actions]
  • 9. Deploy Open Source Alluxio on Kubernetes
      • Deploy alluxio-master as a StatefulSet, which guarantees the ordering and uniqueness of the master pods when they are scaled.
      • Deploy the worker pods as a DaemonSet, which guarantees that one worker runs on each node.
      • Scalability: as nodes are added to the cluster, worker pods are added to them. As nodes are removed from the cluster, those pods are garbage collected.
      • Set the Alluxio configuration properties through a Kubernetes ConfigMap.
      • Deploy Persistent Volume Claims backed by AWS EFS, with the Persistent Volume provisioned by a StorageClass.
      • None of the services is exposed to the outside world; they remain accessible only on the internal network.
      [Diagram: three nodes. Two run an Alluxio master (alluxio-master-0, alluxio-master-1; StatefulSet) alongside an Alluxio worker, and the third runs only an Alluxio worker (DaemonSet). A ConfigMap and a Persistent Volume complete the picture.]
  • 10. Idea was to provide a unified namespace for accessing and processing data
      Mount an S3 bucket to an Alluxio fs directory:
      • Create a CI/CD action to run alluxio fs mount:
        alluxio fs mount --option aws.accessKeyId=<aws key> --option aws.secretKey=<aws secret> <alluxio dir> s3://<s3 bucket>
      • Create a service user in IAM with access to the S3 bucket.
      • Store the service user's security credentials as secrets in the CI/CD tool.
      • Create a Kubernetes user and role that can port-forward a pod, limited to the namespace that Alluxio is deployed in.
      • Generate a kubeconfig file for the Kubernetes user.
      • Generate a GitHub Actions workflow YAML to run the setup on merge to master.
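A minimal shell sketch of the mount step on this slide, assuming the service user's credentials reach the job as the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (for example from CI/CD secrets). The helper name mount_s3, the DRY_RUN switch, and the example directory and bucket are hypothetical; the option keys are the ones shown on the slide.

```shell
#!/bin/sh
# Sketch only: mount one S3 bucket into the Alluxio namespace.
# mount_s3 and DRY_RUN are hypothetical names, not part of Alluxio.
set -eu

mount_s3() {
  # $1 = Alluxio directory (mount point), $2 = S3 bucket name
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Dry run (the default here): print the command with credentials masked.
    echo "alluxio fs mount --option aws.accessKeyId=*** --option aws.secretKey=*** $1 s3://$2"
  else
    # Real run: credentials are read from the environment, never from the repo.
    # With `set -u`, a missing credential aborts the script instead of mounting.
    alluxio fs mount \
      --option "aws.accessKeyId=${AWS_ACCESS_KEY_ID}" \
      --option "aws.secretKey=${AWS_SECRET_ACCESS_KEY}" \
      "$1" "s3://$2"
  fi
}

# Example with placeholder values (prints the masked command).
mount_s3 /data/sales sales-bucket
```

Defaulting to a dry run keeps secrets out of the workflow log and lets reviewers see exactly which mount point and bucket a merge would touch.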
  • 11. Implementation
      Master branch:
      • Check out a branch.
      • Add the list of directories to be mounted.
      • Create a pull request.
      • Merge to master on approval from the data owners.
      • The merge triggers the workflow.
      Actions:
      • kubectl port-forward <alluxio-master-pod> on 19998. The Alluxio master will then be available on localhost:19998.
      • Check out the repo with the alluxio binaries.
      • Provide the AWS credentials of the AWS service user from secrets.
      • Run alluxio fs mount.
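The port-forward step of this workflow could be wrapped roughly as follows. This is a sketch under stated assumptions: with_port_forward and PF_WAIT_SECONDS are hypothetical names, the fixed wait is a crude stand-in for polling the port, and the Alluxio client is assumed to be configured to reach the master at localhost:19998.

```shell
#!/bin/sh
# Sketch only: run one command while the Alluxio master port is forwarded.
# with_port_forward and PF_WAIT_SECONDS are hypothetical names.
set -eu

with_port_forward() (
  # $1 = namespace, $2 = master pod name, remaining arguments = command to run
  ns="$1"; pod="$2"; shift 2

  # Forward local port 19998 to port 19998 on the master pod, in the background.
  kubectl port-forward -n "$ns" "pod/$pod" 19998:19998 >/dev/null 2>&1 &
  pf_pid=$!

  # Crude wait for the tunnel to come up; a real workflow might poll the port.
  sleep "${PF_WAIT_SECONDS:-5}"

  # Run the command, remember its exit status, then always stop the tunnel.
  status=0
  "$@" || status=$?
  kill "$pf_pid" 2>/dev/null || true
  exit "$status"
)

# Usage (not executed here):
#   with_port_forward <namespace> <alluxio-master-pod> alluxio fs mount ...
```

The function body runs in a subshell, so its variables and the final exit stay local, and the exit status of the wrapped command is what the workflow step sees.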
  • 12. Idea was to implement access control for data security on the mounted data
      Control access to data in Alluxio fs:
      • Data mounted from S3, and other directories in open source Alluxio, are accessible only by the user that created them.
      • Create a CI/CD action to run the alluxio fs setfacl command:
        alluxio fs setfacl -R -m user:<user>:<permissions> <dir>
      • Set POSIX permissions, in the form rwx, for a user that needs access to data in an Alluxio directory.
      • alluxio fs setfacl can also remove permissions with the -x flag:
        alluxio fs setfacl -R -x user:<user> <dir>
      • Generate a GitHub Actions workflow YAML to run the setup on merge to master.
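As an illustration of the two setfacl forms on this slide, the sketch below prints the command for granting or revoking access. The helper name acl_cmd and the example user and directory are hypothetical; the flags are the ones shown on the slide. It prints rather than runs, so the change can be inspected in the workflow log first.

```shell
#!/bin/sh
# Sketch only: build the `alluxio fs setfacl` command for one user and directory.
# acl_cmd is a hypothetical helper, not part of Alluxio.
set -eu

acl_cmd() {
  # $1 = grant|revoke, $2 = user, $3 = Alluxio directory,
  # $4 = permissions in rwx form (required for grant only)
  case "$1" in
    grant)
      # -m modifies (adds or updates) the ACL entry, recursively with -R.
      echo "alluxio fs setfacl -R -m user:$2:${4:?permissions required} $3" ;;
    revoke)
      # -x removes the user's ACL entry, recursively with -R.
      echo "alluxio fs setfacl -R -x user:$2 $3" ;;
    *)
      echo "usage: acl_cmd grant|revoke <user> <dir> [permissions]" >&2
      return 1 ;;
  esac
}

# Examples with placeholder values.
acl_cmd grant analyst /data/sales rwx
acl_cmd revoke analyst /data/sales
```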
  • 13. Implementation
      Master branch:
      • Check out a branch.
      • Add the list of users, directories, and corresponding permissions to be applied.
      • Add the list of users and corresponding permissions to be removed.
      • Create a pull request.
      • Merge to master on approval from the data owners.
      • The merge triggers the workflow.
      Actions:
      • kubectl port-forward <alluxio-master-pod> on 19998. The Alluxio master will then be available on localhost:19998.
      • Check out the repo with the alluxio binaries.
      • Provide the AWS credentials of the AWS service user from secrets.
      • Run alluxio fs setfacl to add or remove permissions.
  • 14. Idea was to establish a mechanism to process data with Spark on Kubernetes while adhering to Alluxio fs permissions
      Run an application in cluster deploy mode with Spark on Kubernetes to process the data in Alluxio fs:
      • Build a Spark Docker image, add the user who needs to access the data in Alluxio with “adduser”, and set that user as the default with the USER instruction.
      • Make sure the default user in the Spark container has the necessary access permissions for the data in Alluxio.
      • Build the application jar into the Spark image, or provide it as a mount, and access it with the local:// scheme when running spark-submit.
      • Alternatively, place the artifact in S3 or another storage of choice and have Spark download it at runtime.
      • The driver and the number of executors specified at submission are created as pods in the specified namespace.
      • The user submitting the Spark application is also the user of the client process accessing Alluxio fs, and so is only allowed to perform operations permitted by their access privileges.
      • Any other user in the client process without the requisite access to the data is rejected.
      [Diagram: Git workflow to build the Spark Docker image; push image to ECR; pull image from ECR; spark-submit --master=<Kubernetes API>; create driver pod; create executor pods; access Alluxio]
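A cluster-mode submission along these lines might look like the sketch below. Every value is a hypothetical placeholder (API endpoint, namespace, image, service account, class, and jar path); the options themselves are standard Spark on Kubernetes settings. The helper only prints the command so it can be reviewed before it is run.

```shell
#!/bin/sh
# Sketch only: assemble a cluster-mode spark-submit call for Spark on Kubernetes.
# All default values below are hypothetical placeholders.
set -eu

K8S_API="${K8S_API:-https://<eks-api-endpoint>}"
NAMESPACE="${NAMESPACE:-spark-jobs}"
IMAGE="${IMAGE:-<ecr-repo>/spark-app:latest}"
SERVICE_ACCOUNT="${SERVICE_ACCOUNT:-spark}"
EXECUTORS="${EXECUTORS:-3}"

spark_submit_cmd() {
  # $1 = main class, $2 = absolute path of the application jar inside the image
  echo "spark-submit" \
    "--master k8s://${K8S_API}" \
    "--deploy-mode cluster" \
    "--name alluxio-etl" \
    "--class $1" \
    "--conf spark.executor.instances=${EXECUTORS}" \
    "--conf spark.kubernetes.namespace=${NAMESPACE}" \
    "--conf spark.kubernetes.container.image=${IMAGE}" \
    "--conf spark.kubernetes.authenticate.driver.serviceAccountName=${SERVICE_ACCOUNT}" \
    "local://$2"
}

# Example with placeholder values; the local:// scheme points at a jar
# that is already present in the container image.
spark_submit_cmd com.example.Etl /opt/spark/app/etl.jar
```

The image is assumed to have been built with the default user that holds the Alluxio permissions, as described in the bullets above.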
  • 15. Idea was to make the setup scalable, to resize the Alluxio cluster on demand and save costs
      • Alluxio workers run as a DaemonSet in the EKS cluster. For every node added to the cluster, Kubernetes starts an Alluxio worker pod on that node.
      • Scaling down works the same way.
      • This allows the cluster capacity to be moved up and down on demand in an automated way.
      • Using DevOps processes to bootstrap and control the size of the EKS and Alluxio clusters makes the system flexible, resulting in optimal use of resources and lower costs.
      • The /journal folder of the Alluxio master is persisted in EFS through a Persistent Volume Claim. Keeping the EFS file system up and running allows us to spin up Alluxio clusters on demand with all the metadata for mount points and user permissions intact, and to tear them down after the data processing has finished, saving time, effort, and cost. This is most beneficial when big clusters are needed to run computations on huge batches of data running into terabytes.
      • Time-bound operations by scheduling workflows in GitHub Actions.
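After Terraform changes the node count, a workflow could check that the worker DaemonSet has caught up. The sketch below assumes one worker per node with no taints or node selectors excluding any node; the helper name workers_match_nodes and the namespace and DaemonSet names in the usage line are hypothetical.

```shell
#!/bin/sh
# Sketch only: compare the number of cluster nodes with the number of ready
# Alluxio worker pods. workers_match_nodes is a hypothetical helper.
set -eu

workers_match_nodes() {
  # $1 = namespace, $2 = name of the worker DaemonSet
  # Count the nodes currently registered in the cluster.
  nodes="$(kubectl get nodes --no-headers | wc -l | tr -d ' ')"
  # Read how many DaemonSet pods report ready.
  ready="$(kubectl get daemonset "$2" -n "$1" -o jsonpath='{.status.numberReady}')"
  echo "nodes=${nodes} ready_workers=${ready}"
  # Succeed only when every node has a ready worker.
  [ "$nodes" -eq "$ready" ]
}

# Usage (not executed here):
#   workers_match_nodes <namespace> <worker-daemonset-name>
```

A non-zero exit status can be used to make the workflow wait and retry until the new workers are ready, or to fail the run if they never appear.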
  • 16. Other options
      • Adopting this process in centralized operations and data security teams.
      • Opening up the EKS cluster to spark-submit jobs in client mode, where the driver runs on a remote system, while controlling user access to data in Alluxio.