Pavel Hardak, Dir Product (Workday)
Ned Borisov (Ph.D), Sr Eng Mgr (Workday)
Lightning-Fast Analytics for
Workday Transactional Data
#ExpSAIS18
Agenda
• Workday (Pavel H)
– Introduction to Workday
– Business challenges
– Platform for Transactional Apps
• Prism Analytics (Ned B)
– High Level Architecture
– Functional Modules
– Problems encountered
• Wrap-up (Pavel H)
Workday
• Pure SaaS company (founded in 2005)
• Enterprise cloud apps – HCM and Finances
– Named as “Leader” in Gartner Magic Quadrants
• 2200+ customers, 175+ of Fortune 500
– Revenue: $2.1B, 36% YoY
• 8600+ employees worldwide
– #7 in FORTUNE “100 Best Companies to Work For”
– Pleasanton (HQ), San Mateo, San Francisco
– Boulder (CO), Dublin (Ireland), Victoria (BC), …
Continuous Innovation in Cloud
Enterprise SaaS Challenges
• Concurrency
– From small to huge companies - every ‘worker’ is a Workday user
• Reliability
– All users add and change data, generating many transactions
• Security
– Customers trust us with very confidential and private information
• Scalability
– Import several years from the previous system(s) and keep growing
• Speed
– Everybody wants fast response time :)
One Platform
[Diagram: Business Process Framework, Object Data Model, Reporting and Analytics, Security, Integration, Cloud, Machine Learning]
One Source for Data | One Security Model | One Experience | One Community
Object Data Model
Metadata | Extensible | Durable
Reporting and Analytics
Dashboards | Collaboration | Distribution
But we want more…
• Import 3rd party data from external sources
– Unknown schema, need validations and cleansing
• Blend external data with Workday data
– Self Service Data Preparation
– Publish custom report sources
– Leverage the same security paradigms
• Data Discovery and Reporting
– Visualize, slice and dice by any dimension
– Perform faster than ever before
Just add some …
• Water (?)
• Coffee (?)
• Energy drink (?)
• Apache Spark (!)
Why Apache Spark
• Wanted to standardize on ONE data processing technology which keeps evolving
• Needed extensibility to handle diverse use cases
• Scalability for on-disk views and in-memory processing
• SQL processing is a HUGE plus
High Level Prism Architecture
[Diagram labels: Prism Server; Report Queries; Web UI Requests; Data Prep: Interactive Transforms; HDFS: Workday Data, External Data, Samples]
Data Preparation
• A dataset may import other datasets to transform them (think SQL View)
• Transforms include: Filter, Join, Union, Group By, etc.
• Example data are shown to help verify the transformation
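A minimal sketch of the idea above: a dataset imports other datasets and applies transforms lazily, the way a SQL view does, and a small sample is used to preview the result. This is plain Python, not Prism or Spark code; the `Dataset` class, its method names, and the example rows are all hypothetical.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


@dataclass(frozen=True)
class Dataset:
    """A lazily evaluated dataset; nothing runs until rows() is called."""
    source: Callable[[], Iterable[Row]]

    def rows(self) -> List[Row]:
        return list(self.source())

    def sample(self, n: int) -> "Dataset":
        # Keep only the first n rows so previews stay cheap.
        return Dataset(lambda: islice(self.source(), n))

    def filter(self, pred: Callable[[Row], bool]) -> "Dataset":
        return Dataset(lambda: (r for r in self.source() if pred(r)))

    def union(self, other: "Dataset") -> "Dataset":
        return Dataset(lambda: list(self.source()) + list(other.source()))

    def join(self, other: "Dataset", key: str) -> "Dataset":
        def run() -> List[Row]:
            # Index the right side by key, then merge matching rows.
            index: Dict[object, List[Row]] = {}
            for right in other.source():
                index.setdefault(right[key], []).append(right)
            return [{**left, **right}
                    for left in self.source()
                    for right in index.get(left[key], [])]
        return Dataset(run)

    def group_sum(self, key: str, value: str) -> "Dataset":
        def run() -> List[Row]:
            totals: Dict[object, int] = {}
            for r in self.source():
                totals[r[key]] = totals.get(r[key], 0) + r[value]
            return [{key: k, value: v} for k, v in totals.items()]
        return Dataset(run)


grants = Dataset(lambda: [
    {"employee": "ann", "org": "org1", "quantity": 10},
    {"employee": "bob", "org": "org2", "quantity": 5},
    {"employee": "ann", "org": "org1", "quantity": 7},
])
orgs = Dataset(lambda: [
    {"org": "org1", "region": "US"},
    {"org": "org2", "region": "EU"},
])

# A derived dataset defined purely in terms of other datasets.
us_totals = (grants.join(orgs, "org")
                   .filter(lambda r: r["region"] == "US")
                   .group_sum("employee", "quantity"))

# Preview over a two-row sample, then the full result.
preview = grants.sample(2).join(orgs, "org").group_sum("employee", "quantity").rows()
full = us_totals.rows()
```

Because each transform only wraps its inputs, the same definition can be run over a sample for interactive feedback and over the full data later.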
High Level Prism Architecture
[Diagram labels: Prism Server; Report Queries; Web UI Requests; Data Prep: Interactive Transforms; Lens Build: Batch Transforms; HDFS: Workday Data, External Data, Samples, Data]
Lens Build
Lens
• Materializing all transforms
• Columnar format, further split into small blocks
Spark Jobs
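To illustrate what a columnar format split into small blocks can look like, here is a minimal Python sketch that pivots rows into per-column arrays and chunks each column into fixed-size blocks. The function name, the block size, and the assumption that every row has the same fields are illustrative only; the slides do not describe the actual Lens file format.

```python
from typing import Dict, List, Sequence


def to_column_blocks(rows: Sequence[Dict[str, object]],
                     block_size: int) -> Dict[str, List[List[object]]]:
    """Pivot row dicts into columns, then split each column into blocks."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    columns: Dict[str, List[object]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return {
        name: [values[i:i + block_size] for i in range(0, len(values), block_size)]
        for name, values in columns.items()
    }


rows = [
    {"employee": "ann", "quantity": 10},
    {"employee": "bob", "quantity": 5},
    {"employee": "cy", "quantity": 7},
]
blocks = to_column_blocks(rows, block_size=2)

# A query that sums `quantity` only has to read the quantity blocks.
total = sum(sum(block) for block in blocks["quantity"])
```

Storing each column separately lets a query touch only the columns it needs, and small blocks give a natural unit for caching and parallel reads.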
High Level Prism Architecture
[Diagram labels: Prism Server; Report Queries; Web UI Requests; Query Engine: Interactive BI Queries; Data Prep: Interactive Transforms; Lens Build: Batch Transforms; HDFS: Workday Data, External Data, Samples, Lens, Data]
Query Engine
• Analyst-driven Analysis
– Drag & drop chart creation
– Analyst-defined computed fields
– Quick measurement aggregates
• Execution
– Query Engine executes the queries
– Interactive response is required
Spark in Prism Architecture
Prism Analytics launches and manages the lifecycle of three types of Spark applications:
• Data Prep: a single (smaller) always-on Spark Application
– executes dataset transformations over small samples of data
• Lens Build: on-demand batch Application
– one per Lens Build process
– executes dataset transformations over full datasets
• Query Engine: a single (larger) always-on Application
– executes reporting queries over Lens data
– caches columns of Lenses in memory
Query Engine & Spark
[Diagram labels: Query Engine; Prism Spark Server; Spark Driver; Prism Server; Data Prep; Spark Executor (×12)]
Notable Observations
• Memory Allocation Strategy
• Row Level Security
Memory Allocation Strategy
• Executors
• Driver
[Diagram: Executor JVM with regions Column Data Cache, Execution, Buffer (percentages shown: 30%, 60%, 10%)]
[Diagram: Driver JVM with regions Accumulators, Streaming, Buffer (percentages shown: 20%, 60%, 20%)]
→ 20% faster queries
Row-Level Security
• Implemented as a dimension predicate. For example:
SELECT employee, SUM(quantity)
FROM Employee_Stock_Grants
WHERE supervisory_org IN ('org1', 'org33', 'org_508')
GROUP BY employee;
• In-List for supervisory_org could be very large
• More than one In-List
• Complex list values (e.g. nested conjunctions)
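The dimension-predicate approach can be sketched with Python's standard `sqlite3` module: the caller's permitted organizations are bound as an In-List on every query. The helper name `grants_visible_to` and the sample rows are hypothetical, and SQLite stands in for the real query engine only to keep the example runnable.

```python
import sqlite3
from typing import List, Sequence, Tuple


def grants_visible_to(conn: sqlite3.Connection,
                      allowed_orgs: Sequence[str]) -> List[Tuple[str, int]]:
    """Sum stock grants per employee, restricted to orgs the caller may see."""
    if not allowed_orgs:
        return []  # No permitted orgs means no visible rows.
    # One bound parameter per permitted org; values never enter the SQL text.
    placeholders = ", ".join("?" for _ in allowed_orgs)
    sql = (
        "SELECT employee, SUM(quantity) FROM Employee_Stock_Grants "
        f"WHERE supervisory_org IN ({placeholders}) "
        "GROUP BY employee ORDER BY employee"
    )
    return conn.execute(sql, list(allowed_orgs)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee_Stock_Grants "
    "(employee TEXT, supervisory_org TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee_Stock_Grants VALUES (?, ?, ?)",
    [("ann", "org1", 10), ("ann", "org33", 5), ("bob", "org2", 7), ("cy", "org_508", 3)],
)

visible = grants_visible_to(conn, ["org1", "org33", "org_508"])
```

With three values the predicate is trivial; the difficulty described in the slides comes from In-Lists with thousands of values, where the list itself dominates the cost of handling the query.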
Scenario Details
• Customer Use Case
– Predicates with 10+ In-Lists
– Values between 6K and 12K
– Additional mix of conjunctions and disjunctions
• The Same Query
– With security, it ran 100X slower than without security
Analysis
• Finding 1
– Parsing, planning and optimizing was taking ~27 seconds
– We did it 4 times
• Finding 2
– Major cause is the number of times the Catalyst expressions (In and InSet) and their arguments were being traversed and copied during plan analysis and optimization.
– Minor cause is the amount of time spent in serializing Scala’s TrieSet when shipping the plan to executors
Solution
• Custom InSet-Like expressions (case classes)
– Hide the large literal sets in a curried argument
– Resulted in queries going from 27 sec to 4 sec.
• Further Optimizations
– Our InSet-Like expression did not materialize the target in-sets until after the plan was deserialized on the executors
– Resulted in improvement from 4 sec to 2 sec.
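The two ideas above can be illustrated with a small Python analogy. This is not Spark or Catalyst code, and every class and function name here is hypothetical. The naive `In` node exposes each list value as a child, so every rewrite pass visits and copies all of them. The `InSetLike` node keeps the values outside `children()`, playing the role of the curried argument, and builds its lookup set only on first evaluation, which happens after the node has been serialized and shipped.

```python
import pickle


class Column:
    def __init__(self, name):
        self.name = name
    def children(self):
        return ()
    def with_children(self, kids):
        return self
    def eval(self, row):
        return row[self.name]


class Literal:
    def __init__(self, value):
        self.value = value
    def children(self):
        return ()
    def with_children(self, kids):
        return self
    def eval(self, row):
        return self.value


class In:
    """Naive form: every list value is a child node of the expression."""
    def __init__(self, child, literals):
        self.child, self.literals = child, tuple(literals)
    def children(self):
        return (self.child,) + self.literals
    def with_children(self, kids):
        return In(kids[0], kids[1:])
    def eval(self, row):
        return self.child.eval(row) in {lit.value for lit in self.literals}


class InSetLike:
    """Values stay out of children(), so tree rewrites never walk them."""
    def __init__(self, child, values):
        self.child, self.values = child, tuple(values)
        self._set = None  # Built lazily, on first eval.
    def children(self):
        return (self.child,)
    def with_children(self, kids):
        return InSetLike(kids[0], self.values)
    def eval(self, row):
        if self._set is None:
            self._set = frozenset(self.values)
        return self.child.eval(row) in self._set
    def __getstate__(self):
        # Never ship a materialized set; the receiver rebuilds it on demand.
        state = self.__dict__.copy()
        state["_set"] = None
        return state


def rewrite(node, counter):
    """One optimizer-style pass: visit and rebuild every reachable node."""
    counter[0] += 1
    kids = tuple(rewrite(k, counter) for k in node.children())
    return node.with_children(kids) if kids else node


values = range(10_000)
naive = In(Column("supervisory_org"), [Literal(v) for v in values])
hidden = InSetLike(Column("supervisory_org"), values)

naive_count, hidden_count = [0], [0]
rewrite(naive, naive_count)
rewrite(hidden, hidden_count)
naive_visits, hidden_visits = naive_count[0], hidden_count[0]

# Simulate shipping the plan: the set is absent until the first evaluation.
shipped = pickle.loads(pickle.dumps(hidden))
was_lazy = shipped._set is None
hit = shipped.eval({"supervisory_org": 42})
```

In this toy setup one pass over the naive tree visits 10,002 nodes, while the hidden form visits 2. An optimizer that runs many passes multiplies that difference, which matches the direction of the speedups reported on the slide.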
Future Plans
• Better query latency for big datasets
• Deeper integration with reports and apps
• Integration with Kubernetes and AWS
• Improved scalability and concurrency
• Achieve ‘Zero DownTime’
…and much more I cannot share here :)
Questions?
• IF ( you are looking for …
Great work culture &&
Technology challenges &&
Lots of fun and perks )
• THEN
Come to work with us!!!
workday.com/jobs
More Info
• Building a modern data discovery and BI platform using
Apache Spark and Catalyst by Kevin Beyer
• Data Preparation in Workday Prism Analytics: Solving
Complex Problems the Workday Way by Jianneng Li
• Exploring Workday’s Architecture by James Pasley

Workday Prism Analytics for Fast Transactional Data Insights

  • 1. Lightning-Fast Analytics for Workday Transactional Data
      Pavel Hardak, Dir Product (Workday)
      Ned Borisov (Ph.D), Sr Eng Mgr (Workday)
      #ExpSAIS18
  • 2. Agenda
      • Workday (Pavel H)
          – Introduction to Workday
          – Business challenges
          – Platform for Transactional Apps
      • Prism Analytics (Ned B)
          – High Level Architecture
          – Functional Modules
          – Problems encountered
      • Wrap-up (Pavel H)
  • 3. Workday
      • Pure SaaS company (founded in 2005)
      • Enterprise cloud apps – HCM and Finances
          – Named as “Leader” in Gartner Magic Quadrants
      • 2200+ customers, 175+ of Fortune 500
          – Revenue: $2.1B, 36% YoY
      • 8600+ employees worldwide
          – #7 in FORTUNE “100 Best Companies to Work For”
          – Pleasanton (HQ), San Mateo, San Francisco
          – Boulder (CO), Dublin (Ireland), Victoria (BC), …
  • 5. Continuous Innovation in Cloud
  • 7. Enterprise SaaS Challenges
      • Concurrency
          – From small to huge companies, every ‘worker’ is a Workday user
      • Reliability
          – All users add and change data, generating many transactions
      • Security
          – Customers trust us with very confidential and private information
      • Scalability
          – Import several years of data from the previous system(s) and keep growing
      • Speed
          – Everybody wants fast response time :)
  • 8. One Platform (diagram labels: Business Process Framework; Object Data Model; Reporting and Analytics; Security; Integration; Cloud; Machine Learning)
      One Source for Data | One Security Model | One Experience | One Community
  • 9. One Platform: Object Data Model (Metadata; Extensible; Durable)
  • 10. One Platform: Reporting and Analytics (Dashboards; Collaboration; Distribution)
  • 11. But we want more…
      • Import 3rd party data from external sources
          – Unknown schema, need validations and cleansing
      • Blend external data with Workday data
          – Self Service Data Preparation
          – Publish custom report sources
          – Leverage the same security paradigms
      • Data Discovery and Reporting
          – Visualize, slice and dice by any dimension
          – Perform faster than ever before
  • 13. Just add some …
      • Water (?)
      • Coffee (?)
      • Energy drink (?)
      • Apache Spark (!)
  • 14. Why Apache Spark
      • Wanted to standardize on ONE data processing technology which keeps evolving
      • Needed extensibility to handle diverse use cases
      • Scalability for on-disk views and in-memory processing
      • SQL processing is a HUGE plus
  • 15. High Level Prism Architecture (diagram labels: Report Queries; Web UI Requests; Prism Server; Data Prep: Interactive Transforms; HDFS; Workday Data; External Data; Samples)
  • 16. Data Preparation
      • A dataset may import other datasets to transform them (think SQL View)
      • Transforms include: Filter, Join, Union, Group By, etc.
      • Example data are shown to help verify the transformation
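
The "think SQL View" analogy can be made concrete with a small sketch. It uses Python's built-in `sqlite3` rather than Prism or Spark, and the table names, column names, and the `preview` helper are all hypothetical; it only illustrates a derived dataset defined as Join + Filter + Group By over other datasets, with a few example rows shown to check the transform.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two hypothetical base datasets.
    CREATE TABLE workers (worker_id INTEGER, name TEXT, org TEXT);
    CREATE TABLE grants  (worker_id INTEGER, quantity INTEGER);
    INSERT INTO workers VALUES (1, 'Ana', 'org1'), (2, 'Bo', 'org2'), (3, 'Cy', 'org1');
    INSERT INTO grants  VALUES (1, 100), (1, 50), (2, 70), (3, 10);

    -- A derived dataset: it "imports" the base datasets and applies
    -- Join, Filter and Group By, exactly like a SQL view.
    CREATE VIEW org1_grant_totals AS
        SELECT w.name AS name, SUM(g.quantity) AS total_quantity
        FROM workers w JOIN grants g ON g.worker_id = w.worker_id
        WHERE w.org = 'org1'
        GROUP BY w.name;
""")

def preview(dataset: str, limit: int = 5) -> list:
    """Return a few example rows so the transform can be checked by eye.

    The dataset name is interpolated directly, which is acceptable only
    because this sketch passes hard-coded names.
    """
    query = f"SELECT * FROM {dataset} ORDER BY name LIMIT {limit}"
    return conn.execute(query).fetchall()

sample = preview("org1_grant_totals")
print(sample)
```

Nothing is materialized here: the view is re-evaluated on each preview, which matches the interactive, sample-sized role the slide gives Data Prep.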
  • 17. High Level Prism Architecture (diagram labels: Report Queries; Web UI Requests; Prism Server; Data Prep: Interactive Transforms; Lens Build: Batch Transforms; HDFS; Workday Data; External Data; Samples; Data)
  • 18. Lens Build (diagram labels: Spark Jobs; Lens)
      • Materializing all transforms
      • Columnar format, further split into small blocks
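
As a rough illustration of "columnar format, further split into small blocks", the sketch below turns rows into fixed-size blocks and stores each block column by column. It is plain Python, not the actual Lens file format; `build_blocks`, `column_sum`, and the block size are assumptions made for the example.

```python
from typing import Any

def build_blocks(rows: list[dict[str, Any]], block_size: int) -> list[dict[str, list[Any]]]:
    """Split rows into blocks of block_size and store each block as columns."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    blocks = []
    for start in range(0, len(rows), block_size):
        chunk = rows[start:start + block_size]
        # One list per column, so a query reads only the columns it needs.
        columns = {name: [row[name] for row in chunk] for name in chunk[0]}
        blocks.append(columns)
    return blocks

def column_sum(blocks: list[dict[str, list[Any]]], column: str) -> int:
    """Aggregate one column block by block, never touching the others."""
    return sum(sum(block[column]) for block in blocks)

rows = [{"employee": f"e{i}", "quantity": i * 10} for i in range(5)]
blocks = build_blocks(rows, block_size=2)
total = column_sum(blocks, "quantity")
print(len(blocks), total)
```

Small blocks let a reader load or cache one column of one block at a time, which is the property the Query Engine's column cache relies on.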
  • 19. High Level Prism Architecture (diagram labels: Report Queries; Web UI Requests; Prism Server; Query Engine: Interactive BI Queries; Data Prep: Interactive Transforms; Lens Build: Batch Transforms; HDFS; Workday Data; External Data; Samples; Lens Data)
  • 20. Query Engine
      • Analyst-driven Analysis
          – Drag & drop chart creation
          – Analyst-defined computed fields
          – Quick measurement aggregates
      • Execution
          – Query Engine executes the queries
          – Interactive response is required
  • 21. High Level Prism Architecture (diagram labels: Report Queries; Web UI Requests; Prism Server; Query Engine: Interactive BI Queries; Data Prep: Interactive Transforms; Lens Build: Batch Transforms; HDFS; Workday Data; External Data; Samples; Lens Data)
  • 22. Spark in Prism Architecture
      Prism Analytics launches and maintains the lifecycle of three types of Spark Applications:
      • Data Prep: a single (smaller) always-on Spark Application
          – executes dataset transformations over small samples of data
      • Lens Build: on-demand batch Application
          – one per Lens Build process
          – executes dataset transformations over full datasets
      • Query Engine: a single (larger) always-on Application
          – executes reporting queries over Lens data
          – caches columns of Lenses in memory
  • 23. Query Engine & Spark (diagram labels: Prism Server; Data Prep; Query Engine; Prism Spark Server; Spark Driver; Spark Executor ×12)
  • 24. Notable Observations
      • Memory Allocation Strategy
      • Row Level Security
  • 25. Memory Allocation Strategy
      • Executors (Executor JVM): Column Data Cache 30%, Execution 60%, Buffer 10%
      • Driver (Driver JVM): Accumulators 20%, Streaming 60%, Buffer 20%
      → 20% faster queries
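
To make the percentages tangible, here is a small arithmetic sketch. It reads the slide's figures as an executor split of 30/60/10 (column data cache, execution, buffer) and a driver split of 20/60/20 (accumulators, streaming, buffer). The heap sizes are invented for the example, and `split_heap` is a hypothetical helper, not a Spark setting.

```python
# Percent splits as read from the slide; the region names are labels only.
EXECUTOR_SPLIT = {"column_data_cache": 30, "execution": 60, "buffer": 10}
DRIVER_SPLIT = {"accumulators": 20, "streaming": 60, "buffer": 20}

def split_heap(heap_mb: int, percentages: dict[str, int]) -> dict[str, int]:
    """Divide a JVM heap (in MB) across regions according to percent shares."""
    if sum(percentages.values()) != 100:
        raise ValueError("percentages must add up to 100")
    return {region: heap_mb * pct // 100 for region, pct in percentages.items()}

# Hypothetical heap sizes, chosen only so the numbers divide evenly.
executor = split_heap(32_000, EXECUTOR_SPLIT)
driver = split_heap(8_000, DRIVER_SPLIT)
print(executor)
print(driver)
```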
  • 26. Row-Level Security
      • Implemented as a dimension predicate. For example:
        SELECT employee, SUM(quantity)
        FROM Employee_Stock_Grants
        WHERE supervisory_org IN (org1, org33, org_508)
        GROUP BY employee;
      • In-List for supervisory_org could be very large
      • More than one In-List
      • Complex list values (e.g. nested conjunctions)
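
The idea of security as a dimension predicate can be sketched as follows: the reporting query stays the same, and the set of organizations a user may see is appended as an In-List. The sketch uses Python's `sqlite3` instead of Spark; the sample rows and the `secured_query` helper are hypothetical, while the table and column names follow the slide's example.

```python
import sqlite3

def secured_query(conn: sqlite3.Connection, allowed_orgs: list[str]) -> list[tuple]:
    """Run the slide's aggregate, restricted to the caller's visible orgs."""
    if not allowed_orgs:
        return []  # no visible orgs means no visible rows
    # One bound parameter per org keeps the values out of the SQL text.
    placeholders = ", ".join("?" for _ in allowed_orgs)
    sql = (
        "SELECT employee, SUM(quantity) FROM Employee_Stock_Grants "
        f"WHERE supervisory_org IN ({placeholders}) "
        "GROUP BY employee ORDER BY employee"
    )
    return conn.execute(sql, list(allowed_orgs)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee_Stock_Grants "
    "(employee TEXT, supervisory_org TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee_Stock_Grants VALUES (?, ?, ?)",
    [("ana", "org1", 100), ("ana", "org1", 50), ("bo", "org33", 70), ("cy", "org7", 10)],
)

visible = secured_query(conn, ["org1", "org33", "org_508"])
print(visible)
```

With three orgs the predicate is trivial. The slides that follow concern the case where each list holds thousands of values and several lists are combined.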
  • 27. Scenario Details
      • Customer Use Case
          – Predicates with 10+ In-Lists
          – Between 6K and 12K values
          – Additional mix of conjunctions and disjunctions
      • The same query with security took 100X as long as without security
  • 28. Analysis
      • Finding 1
          – Parsing, planning and optimizing was taking ~27 seconds
          – We did it 4 times
      • Finding 2
          – Major cause is the number of times the Catalyst expressions (In and InSet) and their arguments were being traversed and copied during plan analysis and optimization.
          – Minor cause is the amount of time spent in serializing Scala’s TrieSet when shipping the plan to executors
  • 29. Solution
      • Custom InSet-Like expressions (case classes)
          – Hide the large literal sets through a curried argument
          – Resulted in queries going from 27 sec to 4 sec.
      • Further Optimizations
          – Our InSet-Like expression did not materialize the target in-sets until after the plan was de-serialized on the executors
          – Resulted in improvement from 4 sec to 2 sec.
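
The effect described in the analysis and solution slides can be imitated with a toy expression tree. This is a Python analogy, not Catalyst code: `Expr`, `In`, `OpaqueInSet`, and `counting_rule` are hypothetical names, and the real fix used Scala case classes with a curried argument. The sketch only shows why keeping a large value set out of an expression's children makes each rewrite pass cheap, and how building the lookup set can be deferred to first evaluation.

```python
from dataclasses import dataclass
from typing import Any

class Expr:
    """Toy plan node; transform() rebuilds the tree bottom-up like a rewrite pass."""
    def children(self): return ()
    def with_children(self, new_children): return self
    def transform(self, rule):
        rebuilt = tuple(child.transform(rule) for child in self.children())
        return rule(self.with_children(rebuilt))

@dataclass(frozen=True)
class Column(Expr):
    name: str
    def eval(self, row): return row[self.name]

@dataclass(frozen=True)
class Literal(Expr):
    value: Any
    def eval(self, row): return self.value

@dataclass(frozen=True)
class In(Expr):
    """Naive In-List: every value is a child, so every pass visits and copies it."""
    child: Expr
    options: tuple
    def children(self): return (self.child, *self.options)
    def with_children(self, new): return In(new[0], tuple(new[1:]))
    def eval(self, row):
        value = self.child.eval(row)
        return any(value == option.eval(row) for option in self.options)

class OpaqueInSet(Expr):
    """InSet-like node: values are held by reference, outside children()."""
    def __init__(self, child, raw_values):
        self.child = child
        self._raw_values = raw_values  # never traversed or copied by transform()
        self._materialized = None      # lookup set is built lazily
    def children(self): return (self.child,)
    def with_children(self, new): return OpaqueInSet(new[0], self._raw_values)
    def eval(self, row):
        if self._materialized is None:  # first use, e.g. after shipping the plan
            self._materialized = frozenset(self._raw_values)
        return self.child.eval(row) in self._materialized

def counting_rule(counter):
    """Identity rewrite rule that counts how many nodes a pass touches."""
    def rule(expr):
        counter[0] += 1
        return expr
    return rule

orgs = [f"org{i}" for i in range(6000)]

naive_count = [0]
naive = In(Column("supervisory_org"), tuple(Literal(org) for org in orgs))
naive = naive.transform(counting_rule(naive_count))

opaque_count = [0]
opaque = OpaqueInSet(Column("supervisory_org"), orgs)
opaque = opaque.transform(counting_rule(opaque_count))

print(naive_count[0], opaque_count[0])
```

A single pass touches every literal of the naive node but only two nodes of the opaque one. An optimizer runs many such passes, so the gap compounds, consistent with the direction of the 27 sec to 4 sec improvement reported on the slide.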
  • 30. Future Plans
      • Better query latency for big datasets
      • Deeper integration with reports and apps
      • Integration with Kubernetes and AWS
      • Improved scalability and concurrency
      • Achieve ‘Zero DownTime’
      …and much more I cannot share here :)
  • 31. Questions?
      • IF ( you are looking for … Great work culture && Technology challenges && Lots of fun and perks )
      • THEN Come to work with us!!! workday.com/jobs
  • 32. More Info
      • Building a modern data discovery and BI platform using Apache Spark and Catalyst, by Kevin Beyer
      • Data Preparation in Workday Prism Analytics: Solving Complex Problems the Workday Way, by Jianneng Li
      • Exploring Workday’s Architecture, by James Pasley