Apache Spark Development Lifecycle @ Workday
Pavel Hardak – Eren Avsarogullari
• What is Workday?
• “Power of One” and Prism Analytics
• How does Apache Spark fit in?
• Custom Spark Upgrade Model
• Runtime Metrics Pipeline
• What is next?
Agenda
• FY20 Revenue $3.6B
• ~28% Y/Y Growth
• >7,700 customers
• >45% of Fortune 500
• >12,300 employees
• NASDAQ: WDAY
About Workday
Enterprise Business Applications for a Changing World
• Human Capital, Financials, Planning,
Analytics
• Cloud native, multi-tenant
• 30% revenue re-invested in product
each year
• >40 Advisory Partners
• >200 Software Partners
Planning | Financial Management | Human Capital Management | Analytics & Benchmarking
Business Process
Framework
Object
Data Model
Reporting and
Analytics
Security Integration
Cloud
One Source for Data | One Security Model | One Experience | One Community
Machine
Learning
One Platform
Durable
Object Data Model
Metadata
Extensible
Security
Encryption
Privacy and Compliance
Trust
Reporting and Analytics
Exploratory
Descriptive
Augmented
The Leading Enterprise Cloud for Finance and HR
37 Million+ workers
100 Billion+ transactions per year
96.1% of transactions < 1 second
99.9% actual availability
200+ companies
#1 Future 50, Fortune
#2 40 Best Workplaces in Technology, Fortune
10 Thousand+ certified resources in the ecosystem
Planning
Financial
Management
Human Capital
Management
Analytics
Financial Employees
GL HR &
Payroll
Third-Party
HR & FIN
Industry &
Homegrown
CRM Marketing Service Subsidiaries Contract
Labor
Workday Maintains Your Data Gravity
Workday Prism Analytics
The full spectrum of workforce,
financial, and operational
insights, all within Workday.
Workday
Data
Non-Workday
Data
Prism Analytics Momentum - 100% YoY growth
Workday Confidential
Over 500 Prism Analytics customers
Table
Ingestion
Data Prep
Examples
Engine
Lens Build
Engine
Query Engine
and Mercury
Workday Spark Runtime Engine
Compute (YARN) and Storage (HDFS/S3)
Prism UI and APIs
Accounting
Center
People
Analytics
DBFR Analytics Platform Cosmos DD4A
Apache Spark as foundational technology
HDFS / S3
Prism 01
Tenant 01
Prism 02
Tenant 02
Prism 03
Tenant 03
Prism 04
Tenant 04
Spark Cluster Spark Cluster Spark Cluster Spark Cluster
Prism Tenants - Deployment (simplified)
Workday in the Cloud
ASH
PDX
ATL
PROD & NPRD
ENG
PROD & NPRD
DR for PDX
SALES
DR for ASH
PROD & NPRD
DUB
AMS
DR for DUB
ORE
MTL
PROD & NPRD
NPRD
COL
PROD
Prism
Prism
Prism
Prism
HDFS / S3
Spark
Driver
Data Prep
Interactive
Spark
Driver
Spark
Executor
Spark
Executor
Spark
Driver Lens Build
Phase 1
Lens Build
Phase 2
YARN
ADS
Spark
Executor
Query
Engine
Prism-enabled Tenant - Today
Workday Spark = Apache Spark ++
Apache Spark
Autonomous
Operational
Stability
Core
Stability
Complex Application logic
as Spark Plans
Performance & Scalability for
batch processing
Serviceability
Multi-tenancy
Ingest Latency
Interactive Query
Performance
With this scale, complexity, dependencies…
How can you do Spark version upgrades?
Spark Upgrade challenges:
‒ high number of tenants,
‒ long-running Spark Applications,
‒ progressive roll-out,
‒ rollback case,
‒ maintaining custom Spark fork
Custom Spark Upgrade Model
Previous Approach: Custom Repo → Spark Version
(Spark single-version support against a single repo)
New Approach: Custom Repo → Shim API → Spark Current Version / Spark Next Version
(Spark multi-version support against a single repo)
This upgrade model is not specific to Spark upgrades: it can be
applied to any internal or external API upgrade that faces
these kinds of challenges.
This upgrade model is also
used for major and minor
Spark version upgrades.
• Remove PII Data from Logs: obfuscation of Spark query plans and DataFrame
schemas.
• Catalyst Optimizer: additional optimization rules for aggregations and large
CASE statements.
• Extension for Physical Plan: enables correlation between physical operators
and their runtime metrics.
• Rest APIs: SQL Rest API improvements to query and aggregate metrics at the
physical-operator level.
• Benchmark Module: an additional module that runs benchmark tests on newly
introduced Spark patches, using standard TPC-H and custom queries.
Custom Spark Release Preparation
Shim API
SparkShim Interface
SparkShimImpl for Spark v2.3.0
SparkShimImpl for Spark v2.4.4
Spark API differences between the two versions may introduce
compile-time issues (e.g. invalid type) and/or
runtime issues (e.g. NoSuchMethodError).
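The shim layering on this slide can be sketched in a few lines. This is a minimal illustration in Python, not the Workday implementation: the class names mirror the slide (SparkShim, SparkShimImpl per version), while describe_plan, load_shim and the registry are hypothetical names invented for the example, and the method bodies are placeholders for whatever version-specific Spark calls a real shim would wrap.

```python
from abc import ABC, abstractmethod


class SparkShim(ABC):
    """Version-neutral interface; application code depends only on this."""

    @abstractmethod
    def spark_version(self) -> str:
        """Return the Spark version this implementation targets."""

    @abstractmethod
    def describe_plan(self, plan: str) -> str:
        """Hypothetical call whose underlying Spark API differs by version."""


class SparkShimImplV230(SparkShim):
    def spark_version(self) -> str:
        return "2.3.0"

    def describe_plan(self, plan: str) -> str:
        # Placeholder for the v2.3.0-specific API call.
        return f"[2.3.0] {plan}"


class SparkShimImplV244(SparkShim):
    def spark_version(self) -> str:
        return "2.4.4"

    def describe_plan(self, plan: str) -> str:
        # Placeholder for the v2.4.4-specific API call.
        return f"[2.4.4] {plan}"


# Hypothetical registry keyed by Spark version.
_SHIMS = {
    "2.3.0": SparkShimImplV230,
    "2.4.4": SparkShimImplV244,
}


def load_shim(version: str) -> SparkShim:
    """Return the shim for the given Spark version, or fail loudly."""
    try:
        return _SHIMS[version]()
    except KeyError:
        raise ValueError(f"no shim for Spark {version}") from None
```

Because callers only see SparkShim, version-specific differences stay inside one implementation class each, which is what lets a single repo support two Spark versions at once.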
Compile-time & Runtime Version Selections
Selected Spark versions by classpath type:
Classpath | Types | Description
Compile-time | compileClasspath + testCompileClasspath | Spark compile-time version is the current version.
Runtime | runtimeClasspath + testRuntimeClasspath | Spark runtime version is selected by feature toggle as current or next version.
[Figure: sample Gradle build script snippet selecting the Spark and Shim versions for the compile-time and runtime classpaths]
A feature toggle selects the Spark version at:
- Build time (runtime version selection for the classpath)
- Test pipelines (to run UT, IT and Perf Tests per Spark version)
- Environment (to enable a Spark version at env level: test, preprod or prod)
Shim API artifacts are shipped in addition to Spark artifacts (by version)
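The selection rule described here (compile against the current version, choose the runtime version by toggle) can be modelled as follows. This is a sketch of the logic only, written in Python rather than Gradle. It assumes v2.3.0 is the current version and v2.4.4 the next one, and the artifact names and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

CURRENT_SPARK = "2.3.0"  # assumption: the version currently deployed
NEXT_SPARK = "2.4.4"     # assumption: the upgrade target


@dataclass(frozen=True)
class ClasspathVersions:
    compile_version: str  # compileClasspath + testCompileClasspath
    runtime_version: str  # runtimeClasspath + testRuntimeClasspath


def select_versions(use_next_spark: bool) -> ClasspathVersions:
    """Compile against the current version; pick the runtime by toggle."""
    runtime = NEXT_SPARK if use_next_spark else CURRENT_SPARK
    return ClasspathVersions(CURRENT_SPARK, runtime)


def artifacts(versions: ClasspathVersions) -> List[str]:
    """Shim artifacts ship alongside the Spark artifacts, per version."""
    v = versions.runtime_version
    return [f"spark-{v}", f"spark-shim-{v}"]  # hypothetical artifact names
```

Keeping the compile-time version fixed means API differences surface either at compile time against the current version or in the test pipelines that run with the toggle switched to the next version.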
Verification & Progressive Roll-out & Cleanup
Verification Phase
Verify the following test pipelines against both Spark
versions:
• Automated Regression Testing: running unit and
integration test pipelines
• Performance Testing:
‒ Spark Benchmark Pipeline: current vs. new Spark
version perf tests (executing standard
TPC-H and custom queries) + Hadoop
‒ End2End Perf Pipeline: custom applications +
Spark + Hadoop
Progressive Roll-out Phase
WAVE I
Scope: Single Tenant (Internal)
Duration: 2 Weeks
WAVE II
Scope: Multiple Tenants (Impl / NonProd)
Duration: 2 Weeks
WAVE III
Scope: All Tenants (Internal/Impl/Prod)
Duration: 4 Weeks
Cleanup Phase
Previous Spark version:
‒ Fork
‒ Artifacts (from Artifactory /
mvn repository)
Shim API
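The three roll-out waves can be written down as data to see the overall timeline. This is a small sketch that assumes the waves run back to back with no gap between them, which the slide does not state. The scopes and durations are taken from the slide, and the function name is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Wave:
    name: str
    scope: str
    weeks: int


WAVES = [
    Wave("WAVE I", "Single Tenant (Internal)", 2),
    Wave("WAVE II", "Multiple Tenants (Impl / NonProd)", 2),
    Wave("WAVE III", "All Tenants (Internal/Impl/Prod)", 4),
]


def schedule(waves: List[Wave]) -> List[Tuple[str, int, int]]:
    """Return (name, start_week, end_week), assuming sequential waves."""
    plan = []
    start = 1
    for wave in waves:
        end = start + wave.weeks - 1
        plan.append((wave.name, start, end))
        start = end + 1
    return plan
```

Under that assumption the roll-out spans eight weeks in total, with each wave widening the set of tenants only after the previous one has completed.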
Spark SQL Engine - Query Planning & Execution
SQL / Dataset / DataFrame
→ Unresolved Logical Plan
→ Logical Plan (Analysis)
→ Optimized Logical Plan (Optimizations)
→ Physical Plans (Physical Plans Generation)
→ Selected Physical Plan (Cost Model)
→ DAG Execution
Phases: Logical Planning | Physical Planning | Execution
Metrics: Application, Job, Stage, Task, SQL Metrics
Exposed via: Spark UI, Rest APIs, Event Logs
Runtime Metrics Pipeline Architecture
Proton (Application Server)
Data Acquisition
Data Preparation
Query Engine
HDFS / S3
Spark History Server
Data Warehouse
Stats App
Hadoop Cluster
Spark Applications
Spark Hive Tables
• app_metrics
• job_metrics
• stage_metrics
• task_metrics
• executor_metrics
• sql_metrics
Spark Rest APIs
• Application
• Job
• Stage
• Task
• Executors
• SQL (New)
New Spark SQL Rest API [coming with v3.1.0]
New SQL Rest Endpoints
Comparison of new Spark SQL Rest API Json Outputs
Improved Version | Older Version (Cherry-picked from OSS)
Improvements
1. Correlation between physical operators and their runtime metrics
2. wholeStageCodegenId support across multiple physical operators
3. Normalization of metric values to enable aggregations
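The first and third improvements can be illustrated with a small flattening step. It attaches each metric to its physical operator and turns display strings into numbers so they can be summed. The payload below is hand-written for the example and only approximates the shape of a SQL Rest API response. The field names, the values, and the helper names normalize and flatten are assumptions, and real metric values can carry units that this simple parser does not handle.

```python
import json
from typing import Dict, List

# Hand-written sample; field names and values are illustrative assumptions.
SAMPLE = json.loads("""
{
  "id": 0,
  "nodes": [
    {"nodeId": 1, "nodeName": "Filter", "wholeStageCodegenId": 1,
     "metrics": [{"name": "number of output rows", "value": "1,250"}]},
    {"nodeId": 2, "nodeName": "Scan parquet", "wholeStageCodegenId": 1,
     "metrics": [{"name": "number of files read", "value": "12"},
                 {"name": "number of output rows", "value": "4,000"}]}
  ]
}
""")


def normalize(value: str) -> int:
    """Turn a display string such as '1,250' into an integer."""
    return int(value.replace(",", ""))


def flatten(execution: Dict) -> List[Dict]:
    """Emit one row per (operator, metric) so rows can be aggregated."""
    rows = []
    for node in execution["nodes"]:
        for metric in node["metrics"]:
            rows.append({
                "execution_id": execution["id"],
                "node_id": node["nodeId"],
                "operator": node["nodeName"],
                "codegen_id": node.get("wholeStageCodegenId"),
                "metric": metric["name"],
                "value": normalize(metric["value"]),
            })
    return rows
```

Rows in this shape can be loaded into a table such as sql_metrics and grouped by operator, tenant or date.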
Sample Queries on Spark SQL Metrics
What is the total number of loaded input/output rows by file type, tenant, application, date?
What are the top 25 tenants running Join, Filter, Sort (etc.) operations?
What are the most-used operations by tenant, application, date?
File Scan Operation
What is the number of files by file type, tenant, application, date?
What are the total scan time and total metadata time (min, med, max) by file type, tenant, application, date?
What is the total number of operations by tenant, application, date?
What are the top 25 tenants having the max broadcasted data size (GB)?
What is the total number of joins (BroadcastHashJoin or SortMergeJoin) across all tenants by day?
Join
What are the top 25 tenants having the max time to collect during broadcast (minutes)?
What are the top 25 tenants having the max time to broadcast (minutes) or to build during broadcast (minutes)?
...
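Two of these questions can be expressed as SQL over a flattened metrics table. The sketch below uses an in-memory SQLite database as a stand-in for the Hive sql_metrics table. The column layout, the sample rows (tenants, dates, values) and the function names are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per operator metric.
conn.execute(
    "CREATE TABLE sql_metrics ("
    " tenant TEXT, day TEXT, operator TEXT, metric TEXT, value INTEGER)"
)
conn.executemany(
    "INSERT INTO sql_metrics VALUES (?, ?, ?, ?, ?)",
    [
        ("t1", "2020-09-01", "BroadcastHashJoin", "number of output rows", 100),
        ("t1", "2020-09-01", "SortMergeJoin", "number of output rows", 900),
        ("t2", "2020-09-01", "BroadcastHashJoin", "number of output rows", 50),
        ("t2", "2020-09-02", "Filter", "number of output rows", 10),
    ],
)


def joins_by_day(connection):
    """Count join operators per day and join type across all tenants."""
    return connection.execute(
        "SELECT day, operator, COUNT(*) FROM sql_metrics"
        " WHERE operator IN ('BroadcastHashJoin', 'SortMergeJoin')"
        " GROUP BY day, operator ORDER BY day, operator"
    ).fetchall()


def top_tenants(connection, operator, limit=25):
    """Rank tenants by how many times they ran the given operator."""
    return connection.execute(
        "SELECT tenant, COUNT(*) AS n FROM sql_metrics"
        " WHERE operator = ? GROUP BY tenant ORDER BY n DESC, tenant LIMIT ?",
        (operator, limit),
    ).fetchall()
```

The same GROUP BY pattern extends to the file scan questions by filtering on the scan operator and aggregating the relevant metric values.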
Correlation between Physical Operators & SQL Metrics
• We also integrated our physical plans with runtime SQL metrics.
• Physical operators can be correlated with their runtime metrics in application logs, for troubleshooting and debugging purposes.
The patches we developed were also contributed back to the OSS repo for community use:
• [SPARK-31440][SQL] Improve SQL Rest API
https://github.com/apache/spark/pull/28208
• [SPARK-32548][SQL] Add Application attemptId support to SQL Rest API
https://github.com/apache/spark/pull/29364
• [SPARK-31566][SQL][DOCS] Add SQL Rest API Documentation
https://github.com/apache/spark/pull/28354
Backported Patches to Spark OSS Repo [v3.1.0]
Spark 3.0 introduced the following features:
‒ Adaptive Query Execution (SPARK-31412)
‒ Dynamic Partition Pruning (SPARK-11150)
‒ Scala 2.12 Support (SPARK-26132)
‒ JDK 11 Support (SPARK-24417)
‒ Hadoop 3 Support (SPARK-23534)
• Spark 3.x Upgrade (+ Scala, JDK, Hadoop)
• Performance, Troubleshooting and Debugging Improvements
• Multi-Tenancy Support
What is next?
One more thing...
HDFS / S3
Prism 01
Tenant 01
Prism 02
Tenant 02
Prism 03
Tenant 03
Prism 04
Tenant 04
Spark Cluster Spark Cluster Spark Cluster Spark Cluster
Prism Deployment - Today
Prism Deployment - “Multiverse”
Spark Cluster Spark Cluster
HDFS / S3
Tenant 02 Tenant 04 Tenant 03 Tenant 06 Tenant 05 Tenant 07 Tenant 01 Tenant 08
Prism 01 Prism 02 Prism 03
Spark Cluster
Thank You!
Q & A

Spark Development Lifecycle at Workday - ApacheCon 2020

  • 1. Apache Spark Development Lifecycle @ Workday Pavel Hardak – Eren Avsarogullari
  • 2. •What is Workday? •“Power of One” and Prism Analytics •How Apache Spark fits in? •Custom Spark Upgrade Model •Runtime Metrics Pipeline •What is the next? Agenda
  • 3. • FY20 Revenue $3.6B • ~28% Y/Y Growth • >7,700 customers • >45% of Fortune 500 • >12,300 employees • NASDAQ: WDAY About Workday Enterprise Business Applications for a Changing World • Human Capital, Financials, Planning, Analytics • Cloud native, multi-tenant • 30% revenue re-invested in product each year • >40 Advisory Partners • >200 Software Partners Planning Financial Management Human Capital Management Analytics & Benchmarking
  • 5. Business Process Framework Object Data Model Reporting and Analytics Security Integration Cloud One Source for Data | One Security Model | One Experience | One Community Machine Learning One Platform
  • 6. Durable Object Data Model MetadataExtensible Business Process Framework Object Data Model Reporting and Analytics Security Integration Cloud One Source for Data | One Security Model | One Experience | One Community Machine Learning One Platform
  • 7. Security Encryption Privacy and Compliance Trust Business Process Framework Object Data Model Reporting and Analytics Security Integration Cloud One Source for Data | One Security Model | One Experience | One Community Machine Learning One Platform
  • 8. Reporting and Analytics ExploratoryDescriptive Business Process Framework Object Data Model Reporting and Analytics Security Integration Cloud One Source for Data | One Security Model | One Experience | One Community Machine Learning One Platform Augmented
  • 9. The Leading Enterprise Cloud for Finance and HR 37 Million + workers 100 Billion + transactions per year 96.1% transactions < 1 seconds 99.9% actual availability 200+ companies #1 Future 50, Fortune #2 40 Best Workplaces in Technology, Fortune 10 Thousand + certified resources in the ecosystem
  • 11. Financial Employees GL HR & Payroll Third-Party HR & FIN Industry & Homegrown CRM Marketing Service Subsidiaries Contract Labor Workday Maintains Your Data Gravity
  • 12. Workday Prism Analytics The full spectrum of workforce, financial, and operational insights, all within Workday. Workday Data Non-Workday Data
  • 13. Prism Analytics Momentum - 100% YoY growth Workday Confidential Over Prism Analytics Customers500
  • 14. Table Ingestion Data Prep Examples Engine Lens Build Engine Query Engine and Mercury Workday Spark Runtime Engine Compute (YARN) and Storage (HDFS/S3) Prism UI and APIs Accounting Center People Analytics DBFR Analytics PlatformCosmos DD4A Apache Spark as foundational technology
  • 15. HDFS / S3 Prism 01 Tenant 01 Prism 02 Tenant 02 Prism 03 Tenant 03 Prism 04 Tenant 04 Spark Cluster Spark Cluster Spark Cluster Spark Cluster Prism Tenants - Deployment (simplified)
  • 16. Workday in the Cloud ASH PDX ATL PROD & NPRD ENG PROD & NPRD DR for PDX SALES DR for ASH PROD & NPRD DUB AMS DR for DUB ORE MTL PROD & NPRD NPRD COL PROD
  • 17. Prism Prism Prism Prism HDFS / S3 Spark Driver Data Prep Interactive Spark Driver Spark Executor Spark Executor Spark Driver Lens Build Phase 1 Lens Build Phase 2 YARN ADS Spark Executor Query Engine Prism-enabled Tenant - Today
  • 18. Workday Spark = Apache Spark ++ Apache Spark Autonomous Operational Stability Core Stability Complex Application logic as Spark Plans Performance & Scalability for batch processing Serviceability Multi-tenancy Ingest Latency Interactive Query Performance
  • 19. With this scale, complexity, dependencies… How can you do Spark version upgrades?
  • 20. Spark Upgrade challenges: ‒ high number of tenants, ‒ long-running Spark Applications, ‒ progressive roll-out, ‒ rollback case, ‒ maintaining custom Spark fork Custom Spark Upgrade Model Custom Repo Spark Version Custom Repo Spark Current Version Shim API Spark Next Version Previous Approach New Approach Spark single-version support against a single repo Spark multi-versions support against a single repo This upgrade model is not specific for Spark upgrade so can be applied for any internal & external API upgrades when dealing with these kind of challenges. This upgrade model is also used for major and minor Spark version upgrades.
  • 21. •Remove PII Data from Logs: Spark query plans and DataFrame schema obfuscation. •Catalyst Optimizer: Additional optimization rules on aggregation and large case statements optimizations. •Extension for Physical Plan: Enable correlation between Physical Operators and their runtime metrics. •Rest APIs: SQL Rest API improvements to query and aggregate physical operation level metrics. •Benchmark Module: Additional module to run benchmark tests on introduced new Spark patches by using standard TPCH and custom queries. Custom Spark Release Preparation
  • 22. Shim API SparkShim Interface SparkShimImpl for Spark v2.3.0 SparkShimImpl for Spark v2.4.4 Spark API diffs between both versions may introduce both compile-time(e.g: Invalid type) and/or runtime issues (e.g: NoSuchMethodError)
  • 23. Compile-time & Runtime Version Selections Classpath Types Description Compile -Time compileClasspath + testCompileClasspath Spark compile-time version is the current version. Runtime runtimeClasspath + testRuntimeClasspath Spark runtime version is selected by feature toggle as current or next version. A sample Gradle build script code snippet on selections of both Spark and Shim compile-time and runtime classpath versions:Selected Spark versions by classpath types: Feature Toggle is being used to select Spark version on: - Build Time (runtime version selection for classpath) - Test Pipelines (to run UT, IT and Perf Tests by Spark version) - Environment (to enable Spark version at env level – test, preprod or prod) Shim API artifacts are shipped in addition to Spark artifacts (by version)
  • 24. Verification & Progressive Roll-out & Cleanup Progressive Roll-out Phase WAVE III Scope: All Tenants (Internal/Impl/Prod) Duration: 4 Weeks WAVE II Scope: Multiple Tenants (Impl / NonProd) Duration: 2 Weeks WAVE I Scope: Single Tenant (Internal) Duration: 2 Weeks Verification Phase Verify following test pipelines against to both Spark versions: • Automated Regression Testing: Running Unit & Integration Test Pipelines • Performance Testing: ‒ Spark Benchmark Pipeline: Spark current vs new version Perf Tests (by executing standard TPCH and custom queries.) + Hadoop ‒ End2End Perf Pipeline: Custom applications + Spark + Hadoop Previous Spark version: ‒ Fork, ‒ Artifacts from artifactory / mvn repository) Shim API Cleanup Phase
  • 25. Spark SQL Engine - Query Planning & Execution
    [Diagram] SQL / Dataset / DataFrame → Unresolved Logical Plan → (Analysis) → Logical Plan → (Optimizations) → Optimized Logical Plan → (Physical Plans Generation) → Physical Plan → Cost Model → Selected Physical Plan → DAG Execution
    Phases: Logical Planning, Physical Planning, Execution
    Execution levels: Application, Job, Stage, Task
    SQL Metrics: Spark UI, REST APIs, Event Logs
  • 26. Runtime Metrics Pipeline Architecture
    [Diagram] Components: Proton (Application Server), Data Acquisition, Data Preparation, Query Engine, Hadoop Cluster, Spark Applications, Spark, HDFS / S3, Spark History Server, Stats App, Data Warehouse
    Spark REST APIs: Application, Job, Stage, Task, Executors, SQL (New)
    Hive tables (1x1 with the REST APIs): app_metrics, job_metrics, stage_metrics, task_metrics, executor_metrics, sql_metrics
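As a rough sketch of the REST-to-table step in this pipeline, the snippet below flattens a JSON payload from one Spark REST endpoint into rows for the matching metrics table. It makes several assumptions: the endpoint-to-table mapping is inferred from the slide's naming, the `app_id` column and the row layout are hypothetical, and the sample payload is an illustrative subset of job fields, not real output. The URL shape follows the documented `/api/v1/applications/[app-id]/...` layout of the Spark monitoring REST API.

```python
import json

# Assumed mapping of per-application REST endpoints to Hive tables.
ENDPOINT_TO_TABLE = {
    "jobs": "job_metrics",
    "stages": "stage_metrics",
    "executors": "executor_metrics",
    "sql": "sql_metrics",
}


def endpoint_url(base_url, app_id, endpoint):
    """Build the REST URL for one endpoint of one application."""
    return f"{base_url}/api/v1/applications/{app_id}/{endpoint}"


def to_rows(app_id, payload):
    """Turn a JSON array of records into flat rows keyed by application id."""
    rows = []
    for record in json.loads(payload):
        row = {"app_id": app_id}
        for key, value in record.items():
            if value is None or isinstance(value, (str, int, float, bool)):
                row[key] = value
            else:
                # Nested lists/objects are stored as JSON text in a single column.
                row[key] = json.dumps(value)
        rows.append(row)
    return rows


# Illustrative payload only; a real response carries many more fields.
sample = '[{"jobId": 0, "status": "SUCCEEDED", "numTasks": 4, "stageIds": [0, 1]}]'
job_rows = to_rows("app-0001", sample)
```

In a real pipeline the payload would be fetched from the Spark History Server and the rows appended to the table named by `ENDPOINT_TO_TABLE`; both steps are left out here to keep the sketch self-contained.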
  • 27. New Spark SQL Rest API [coming with v3.1.0] New SQL Rest Endpoints
  • 28. Comparison of New Spark SQL REST API JSON Outputs
    [Slide images: Improved Version | Older Version (Cherry-picked from OSS)]
    Improvements:
    1. Correlation between physical operators and their runtime metrics
    2. wholeStageCodegenId support across multiple physical operators
    3. Normalization of metric values so that aggregations can be run on them
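To illustrate why normalization (improvement 3) matters: display-formatted values such as "1,250" or "3.5 s" cannot be summed until they are converted to plain numbers. The sketch below is one possible normalizer, not the logic of the actual patch. The accepted input formats and the unit table are assumptions made for this example.

```python
import re

# Assumed time units, converted to milliseconds.
_TIME_UNITS_MS = {"ms": 1, "s": 1_000, "min": 60_000, "h": 3_600_000}

# A number with optional thousands separators and decimals, then an optional unit.
_PATTERN = re.compile(r"^\s*([0-9][0-9,]*(?:\.[0-9]+)?)\s*([a-zA-Z]*)\s*$")


def normalize_metric(raw):
    """Convert a display string to a float (counts as-is, durations in ms)."""
    match = _PATTERN.match(raw)
    if match is None:
        raise ValueError(f"unrecognized metric value: {raw!r}")
    number = float(match.group(1).replace(",", ""))
    unit = match.group(2)
    if unit == "":
        return number
    if unit in _TIME_UNITS_MS:
        return number * _TIME_UNITS_MS[unit]
    raise ValueError(f"unsupported unit: {unit!r}")


# Once normalized, values from different operators can be aggregated directly.
total_ms = sum(normalize_metric(v) for v in ["3.5 s", "120 ms", "2 min"])
```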
  • 29. Sample Queries on Spark SQL Metrics
    • What are the top 25 tenants running Join, Filter, Sort (etc.) operations?
    • What are the most used operations by tenant, application, date?
    • What is the total number of operations by tenant, application, date?
    File Scan Operation
    • What is the total number of input/output rows loaded by file type, tenant, application, date?
    • What is the number of files by file type, tenant, application, date?
    • What is the total scan time and total metadata time by min, med, max, file type, tenant, application, date?
    Join
    • What is the total number of joins (BroadcastHashJoin or SortMergeJoin) across all tenants by day?
    • What are the top 25 tenants with the max broadcasted data size (GB)?
    • What are the top 25 tenants with the max time to collect during broadcast (minutes)?
    • What are the top 25 tenants with the max time to broadcast or time to build during broadcast (minutes)?
    ...
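One of the questions above, the top tenants by a given operator metric, can be sketched as a plain aggregation. The row layout (`tenant`, `operator`, `metric`, `value`) is a hypothetical shape for the sql_metrics table, and the sample rows are made up for illustration. In practice this would be a SQL query against the Hive table.

```python
from collections import defaultdict


def top_tenants(rows, operator, metric, n=25):
    """Sum one metric of one operator per tenant and return the n largest totals."""
    totals = defaultdict(float)
    for row in rows:
        if row["operator"] == operator and row["metric"] == metric:
            totals[row["tenant"]] += row["value"]
    # Sort by total descending; tenant name breaks ties deterministically.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Made-up sample rows; values are assumed to be normalized numbers already.
rows = [
    {"tenant": "t1", "operator": "BroadcastExchange", "metric": "data size", "value": 2.0},
    {"tenant": "t2", "operator": "BroadcastExchange", "metric": "data size", "value": 5.0},
    {"tenant": "t1", "operator": "BroadcastExchange", "metric": "data size", "value": 1.5},
    {"tenant": "t2", "operator": "Filter", "metric": "number of output rows", "value": 10.0},
]
ranking = top_tenants(rows, "BroadcastExchange", "data size", n=2)
```

This kind of aggregation only works if metric values are numeric, which is why the normalization of metric values in the REST output is a prerequisite for these queries.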
  • 30. Correlation between Physical Operators & SQL Metrics
    • We also integrated our physical plans with runtime SQL metrics.
    • Physical operators can be correlated with their runtime metrics from application logs for troubleshooting and debugging purposes.
    Workday Confidential
  • 31. Backported Patches to Spark OSS Repo [v3.1.0]
    The developed patches were also backported to the OSS repo for community usage:
    • [SPARK-31440][SQL] Improve SQL Rest API: https://github.com/apache/spark/pull/28208
    • [SPARK-32548][SQL] Add Application attemptId support to SQL Rest API: https://github.com/apache/spark/pull/29364
    • [SPARK-31566][SQL][DOCS] Add SQL Rest API Documentation: https://github.com/apache/spark/pull/28354
  • 32. What is next?
    • Spark 3.x Upgrade (+ Scala, JDK, Hadoop). Spark 3.0 introduced the following features:
      ‒ Adaptive Query Execution (SPARK-31412)
      ‒ Dynamic Partition Pruning (SPARK-11150)
      ‒ Scala 2.12 Support (SPARK-26132)
      ‒ JDK 11 Support (SPARK-24417)
      ‒ Hadoop 3 Support (SPARK-23534)
    • Performance, Troubleshooting and Debugging Improvements
    • Multi-Tenancy Support
  • 34. Prism Deployment - Today
    [Diagram] Four Prism deployments, one tenant each (Prism 01 / Tenant 01, Prism 02 / Tenant 02, Prism 03 / Tenant 03, Prism 04 / Tenant 04), each with its own Spark Cluster, on top of HDFS / S3
  • 35. Prism Deployment - “Multiverse”
    [Diagram] Tenants 01 to 08 shared across three Prism deployments (Prism 01, Prism 02, Prism 03), each with a Spark Cluster, on top of HDFS / S3
  • 36. Thank You! Q & A