RRE: Faster than SAS
Results from Benchmarking
Thomas W. Dinsmore, Revolution Analytics
John Wallace, DataSong
Polling Question
Do you currently use:
– A) R or Revolution R Enterprise (RRE)
– B) SAS
– C) Both
– D) Neither
Benchmarking RRE vs. SAS
Background
Approach
Results
Discussion
Revolution R Enterprise
Open source R
Commercially supported distribution
Enhanced for enterprise use:
– Scalable analytics
– Developer tools
– Integration tools
– Deployment tools
2012: Allstate Benchmark
Poisson Regression, 150MM rows
Runtime, Minutes:
– SAS PROC GENMOD: 300
– RRE: 6
Criticism: “Apples to Oranges”
– RRE: 20 cores
– SAS: 16 cores
Most SAS/STAT PROCs (including PROC GENMOD) run single-threaded.
SAS/STAT: 91 PROCs
• 69 single-threaded
• 13 multi-threaded
• 9 distributed (if you license SAS HP Statistics)
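The breakdown above can be checked with a few lines of arithmetic. This is a minimal sketch in Python that uses only the counts quoted on the slide; the dictionary and variable names are illustrative.

```python
# PROC counts as quoted on the slide; names are illustrative only.
proc_counts = {"single-threaded": 69, "multi-threaded": 13, "distributed": 9}

# The three groups should add up to the 91 SAS/STAT PROCs cited.
total = sum(proc_counts.values())

# Share of each execution mode, in percent of all PROCs.
shares = {mode: round(100 * n / total, 1) for mode, n in proc_counts.items()}

print(total)
print(shares)
```

By these counts, roughly three quarters of the SAS/STAT PROCs run single-threaded.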
2013: SAS Benchmark
PROC HPGENSELECT
– SAS/STAT
– SAS High Performance Statistics
Massive grid (140/144 nodes)
– 16 cores per node
– 2,240/2,304 cores
Conclusion: SAS on 2,304 cores is competitive
with RRE on 20 cores.
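The core totals follow directly from the node counts, and the same arithmetic shows the size of the hardware gap behind the "competitive" result. This is a minimal sketch in Python that uses only the figures quoted above; the constant names are illustrative.

```python
# Figures quoted on the slide; names are illustrative only.
CORES_PER_NODE = 16
SAS_NODES = (140, 144)  # the slide quotes both node counts
RRE_CORES = 20

# Total cores on the SAS grid for each quoted node count.
sas_cores = [nodes * CORES_PER_NODE for nodes in SAS_NODES]

# How many times more cores the SAS grid had than the RRE setup.
core_ratios = [cores / RRE_CORES for cores in sas_cores]

print(sas_cores)
print(core_ratios)
```

On these figures the SAS grid had over 100 times as many cores as the RRE configuration.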
Honest Benchmarking
Compare RRE and SAS/STAT performance
– Same data
– Same environment
– Same tasks
Test under real-world conditions
Make the test fair and transparent
Data
• Manufactured data
• Reproducible in any environment
• Designed to emulate “typical” working data
• “Entity” tables: 1MM, 5MM rows
• “Predict” tables: 10MM, 50MM rows
[Diagram: Fact, Predict, Entity 1, and Entity 2 tables; labels: Entity key, 571 Columns, 21 Columns]
Benchmarking Environment
SAS 9.4:
• Base
• STAT
• Grid Manager
Commodity servers:
• 4 cores
• 16GB Memory
Gbit network
CentOS
RRE 7.0
Platform LSF 9
Analytic Tasks
Task                                     SAS Capability      RRE Capability
Descriptive Statistics                   PROC SURVEYMEANS    rxSummary
Median and Deciles                       PROC SURVEYMEANS    rxQuantile
Frequency Distribution                   PROC FREQ           rxCube
Linear Regression (Numeric predictors)   PROC REG, HPREG     rxLinMod
Linear Regression (Mixed predictors)     PROC GENMOD         rxLinMod
Stepwise Linear (100 predictors)         PROC REG            rxLinMod/rxStepControl
Logistic Regression                      PROC LOGISTIC       rxLogit
Generalized Linear                       PROC GENMOD         rxGLM
K-Means Clustering                       PROC FASTCLUS       rxKMeans
Score                                    PROC SCORE          rxPredict
Preparation
Generated data with randomized procedure
Loaded data into native formats:
– RRE: XDF file
– SAS: SAS DATA set
Generation and load times not included
No meaningful differences
RRE: 42 Times Faster Than SAS 9.4
Complete script: ten analytic tasks, N=5,000,000
Runtime, Seconds:
– SAS 9.4: 5,192 (~1 hour, 26 minutes)
– RRE: 124 (~2 minutes)
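The headline multiple and the rounded durations can be reproduced from the two runtimes on this slide. This is a minimal sketch in Python; the helper `as_hms` is a hypothetical name introduced here for illustration.

```python
# Total script runtimes in seconds, as quoted on the slide (N=5,000,000).
SAS_SECONDS = 5192
RRE_SECONDS = 124

def as_hms(seconds):
    """Split a duration in seconds into (hours, minutes, seconds)."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return hours, minutes, secs

# Speed multiple: SAS runtime divided by RRE runtime.
speedup = SAS_SECONDS / RRE_SECONDS

print(round(speedup))       # 42
print(as_hms(SAS_SECONDS))  # (1, 26, 32)
print(as_hms(RRE_SECONDS))  # (0, 2, 4)
```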
RRE: Linear Scalability
Runtime, Seconds, by # Rows in Entity Table:
Rows        RRE 7   SAS 9.4
1,000,000   68      623
5,000,000   124     5,192
RRE: consistent performance with increased data volume.
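One way to read the scalability chart is to compare how much each runtime grew against how much the data grew. This is a minimal sketch in Python. It assumes the chart labels 68 and 623 belong to the 1MM-row run for RRE and SAS respectively; 124 and 5,192 are the N=5MM totals reported on the previous slide.

```python
# Runtimes in seconds as read from the chart labels. Pairing 68 and 623
# with the 1MM-row run is an assumption; 124 and 5,192 are the N=5MM totals.
runtimes = {
    "RRE 7": {1_000_000: 68, 5_000_000: 124},
    "SAS 9.4": {1_000_000: 623, 5_000_000: 5192},
}

# Growth in data volume between the two entity-table sizes.
data_growth = 5_000_000 / 1_000_000

# Growth in runtime for each product over the same step.
growth = {}
for product, by_rows in runtimes.items():
    growth[product] = by_rows[5_000_000] / by_rows[1_000_000]
    print(f"{product}: {data_growth:.0f}x rows -> {growth[product]:.2f}x runtime")
```

Under that reading, RRE's runtime grew by less than the data volume did, while the SAS runtime grew by more.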
RRE: Up to 350X Faster Than SAS
RRE Speed Multiple by task (N=5MM):
Task         Speed Multiple
Stats        213
Quintiles    185
Freq         351
Lin Reg 1    39
Lin Reg 2    37
Step Lin     19
Logistic     58
GLM          18
Kmeans 1     101
Kmeans 2     32
Why is RRE faster than SAS?
RRE supports scalable computing out of the box
– Multi-threaded processing
– Distributed processing
Legacy SAS is mostly single-threaded
– DATA Step processing
– Most SAS/STAT PROCs
SAS HP PROCs
9 new SAS PROCs
Bundled into SAS 9.4
Designed for scalability
Multiple operating modes:
– Single machine
– Distributed (must license SAS HP Statistics)
HP PROCs: Minimal Improvement
Linear regression, 20 predictors, N=5,000,000
Runtime, Seconds:
– SAS: PROC REG: 267.17
– SAS: PROC HPREG: 253.82
– RRE: rxLinMod: 6.8
HPREG running in single machine mode.
Summary
• RRE is faster than Legacy SAS:
– Same tasks
– Same hardware
• RRE speed:
– Efficient engineering
– Multi-threaded and distributed processing
• SAS performance claims:
– Massive hardware requirements
– Force you to license more software from SAS
– Don’t apply to Legacy SAS
Polling Question
Which of the following analytic software
benefits is most important to you:
– A) Completing projects faster
– B) Building better predictive models
– C) High performance with low infrastructure costs
John Wallace, Founder & CEO
DataSong at a Glance
• Background
– Approaching $1 trillion in revenue analyzed. $3 billion in marketing spend under our lens.
– Experienced 60+ person team based in San Francisco with offices in Seattle, Los Angeles, Singapore, and India.
– Founded in 2003 with a proven history of solving difficult analytics problems. Evolved from consulting through close partnerships with our clients.
• Our Offerings
– Customer interaction insight that powers applications for customer-level revenue attribution, targeting, media optimization.
– Descriptive and predictive modeling of hidden trends and relationships in big data.
– Custom development including applications, process automation, and decision support solutions.
DataSong Offerings
Hosted Applications
● Revenue Attribution
● Customer Targeting
● Marketing Planning
We know Big Data. We analyze and provide the “so what”.
DataSong Architecture
• ETL
• N marketing channels
• Behavioral variables
• Promotional data
• Overlay data
• Functions to read Hadoop output; xdf creation
• Exploratory data analysis
• GAM survival models
• Scoring for inference
• Scoring for prediction
• 5 billion scores per day per customer
DATASONG DATA FORMAT (DDF)
CUSTOM VARIABLES (PMML)
Where Speed Matters
3 key dimensions
● how many rows
● how many variables
● how many iterations of a model
Trade-offs for speed
● Sampling variance
● Test fewer features
● Have less understanding of the signal
The 3rd dimension (N iterations of a model) means we must multiply any benchmark by N
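The multiplication by N can be illustrated with the total script runtimes reported earlier in the deck (5,192 seconds for SAS 9.4 and 124 seconds for RRE at N=5,000,000 rows). This is a minimal sketch in Python; the iteration count of 50 and the function name are hypothetical values chosen only for illustration, not figures from the presentation.

```python
def total_runtime_seconds(seconds_per_run, iterations):
    """Total wall-clock time when one benchmark run is repeated `iterations` times."""
    return seconds_per_run * iterations

# Hypothetical number of model iterations; not a figure from the deck.
N_ITERATIONS = 50

# Single-run script runtimes in seconds, as reported in the benchmark results.
single_run = {"SAS 9.4": 5192, "RRE": 124}

# Total hours for each product if the full script is rerun N_ITERATIONS times.
total_hours = {
    product: total_runtime_seconds(seconds, N_ITERATIONS) / 3600
    for product, seconds in single_run.items()
}

for product, hours in total_hours.items():
    print(f"{product}: {hours:.1f} hours for {N_ITERATIONS} iterations")
```

The speed multiple stays the same, but the absolute gap grows with N: a difference of minutes per run becomes a difference of days across many iterations.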
Thank You

Is Revolution R Enterprise Faster than SAS? Benchmarking Results Revealed
