Performance In Geode:
How Fast Is It, How Is It Measured, and How Can It Be Improved?
Helena Bales, Senior Software Engineer at Pivotal
What Is The Performance Of Geode?
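Performance of Geode 1.9.0
[Chart: four benchmark results for Geode 1.9.0, in operations per second: 203,855; 244,463; 181,655; 207,697. Benchmark labels not captured.]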
What do those numbers mean?
● 200,000 operations per second means nothing to a person.
○ Is that good?
○ Is the performance consistent and accurate?
○ Has it improved or regressed since the last version?
○ Can it be better?
What do those numbers mean?
● 200,000 operations per second means nothing to a person.
○ Is that good? Pretty good, yes.
○ Is the performance consistent and accurate? Not yet.
○ Has it improved since the last version? Yes, slightly.
○ Can it be better? YES.
How do you know???
How Is Performance Measured?
Creating the Geode Benchmark - Features
● On demand
● Against any revision of Geode
● On AWS cluster deployment of Geode
● On any dev machine in the office
● From Concourse CI pipeline
● With a profiler attached
● Compare two runs of benchmarks for performance changes
Creating the Geode Benchmark - Goals
● Run by anyone interested in Geode
● Have others create benchmarks
● Visualize benchmark results over time
● Increase benchmark coverage of Geode
Tests Currently in the Benchmarks
○ ReplicatedGetBenchmark
○ ReplicatedGetLongBenchmark
○ ReplicatedPutBenchmark
○ ReplicatedPutLongBenchmark
○ ReplicatedPutAllBenchmark
○ ReplicatedPutAllLongBenchmark
○ ReplicatedFunctionExecutionBenchmark
○ ReplicatedFunctionExecutionWithArgumentsBenchmark
○ ReplicatedFunctionExecutionWithFiltersBenchmark
○ PartitionedGetBenchmark
○ PartitionedGetLongBenchmark
○ PartitionedPutBenchmark
○ PartitionedPutLongBenchmark
○ PartitionedPutAllBenchmark
○ PartitionedPutAllLongBenchmark
○ PartitionedIndexedQueryBenchmark
○ PartitionedFunctionExecutionWithArgumentsBenchmark
○ PartitionedFunctionExecutionWithFiltersBenchmark
Other Tested Configurations
● With SSL
● With JDKs: 8, 11, 12, 13
● With Security Manager
● With Garbage Collectors:
○ CMS
○ G1
○ ZGC
○ Shenandoah
● Adjustable max heap size
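For reference, the collectors and heap setting listed above correspond to standard HotSpot JVM options. The sketch below only maps each collector to its usual flag. The class and method names (`GcFlags`, `jvmArgs`) are hypothetical, and the slides do not show how the benchmark harness actually passes these options, so treat this as an illustration of the JVM flags rather than of the benchmark's own interface. ZGC and Shenandoah were experimental on the JDK versions listed, which is why the sketch adds the unlock flag for them.

```java
import java.util.ArrayList;
import java.util.List;

public class GcFlags {
    // Hypothetical helper type: pairs each collector with its HotSpot flag.
    enum Collector {
        CMS("-XX:+UseConcMarkSweepGC", false),
        G1("-XX:+UseG1GC", false),
        Z("-XX:+UseZGC", true),
        SHENANDOAH("-XX:+UseShenandoahGC", true);

        final String flag;
        final boolean experimental;

        Collector(String flag, boolean experimental) {
            this.flag = flag;
            this.experimental = experimental;
        }
    }

    // Builds the JVM arguments for a collector and a max heap size such as "8g".
    static List<String> jvmArgs(Collector gc, String maxHeap) {
        List<String> args = new ArrayList<>();
        args.add("-Xmx" + maxHeap);
        if (gc.experimental) {
            // Must appear before the experimental collector flag.
            args.add("-XX:+UnlockExperimentalVMOptions");
        }
        args.add(gc.flag);
        return args;
    }

    public static void main(String[] args) {
        for (Collector gc : Collector.values()) {
            System.out.println(gc + ": " + String.join(" ", jvmArgs(gc, "8g")));
        }
    }
}
```

Not every collector exists on every JDK (for example, ZGC is not available on JDK 8), so a given combination may be rejected by the JVM at startup.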
How Can Performance Be Improved?
Finding Performance Bottlenecks
● Monitor locks
● Thread park/unpark on reentrant locks
● Allocations/GC
● Overuse of synchronization
● Getting a system property in a hot path
● Lazy initialization of objects in a hot path
● Synchronization on a container (e.g., a hash map)
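Two of the patterns above are easier to recognize with a concrete shape in mind. The following is a minimal sketch, not Geode code: the class `HotPath`, the property name `example.batchSize`, and the counter methods are all hypothetical. It contrasts reading a system property on every call with reading it once, and synchronizing on a plain `HashMap` with using a `ConcurrentHashMap`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class HotPath {
    // Hypothetical property name, used only for illustration.
    static final String PROP = "example.batchSize";

    // Read once at class initialization instead of on every operation.
    static final int CACHED_BATCH_SIZE = Integer.getInteger(PROP, 100);

    private final Map<String, Long> lockedCounts = new HashMap<>();
    private final Map<String, Long> concurrentCounts = new ConcurrentHashMap<>();

    // Slow shape: looks up and parses the system property on every call.
    static int batchSizeSlow() {
        return Integer.getInteger(PROP, 100);
    }

    // Faster shape: returns the value cached at class initialization.
    static int batchSizeFast() {
        return CACHED_BATCH_SIZE;
    }

    // Slow shape: every caller serializes on the map's monitor.
    long incrementLocked(String key) {
        synchronized (lockedCounts) {
            return lockedCounts.merge(key, 1L, Long::sum);
        }
    }

    // Alternative: ConcurrentHashMap.merge is atomic without a map-wide lock.
    long incrementConcurrent(String key) {
        return concurrentCounts.merge(key, 1L, Long::sum);
    }

    public static void main(String[] args) {
        HotPath h = new HotPath();
        h.incrementLocked("get");
        h.incrementConcurrent("get");
        System.out.println(batchSizeSlow() + " " + batchSizeFast());
    }
}
```

The trade-off of caching is that a property changed at runtime is no longer picked up, so this only suits settings that are fixed at startup.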
Case Study – The Connection Pool
● Why were we even looking for anything?
○ Couldn’t saturate network, CPU, or memory, no matter the available resources
○ Profiler gave us no suspect hot spots
● How did we find the issue?
○ Found the secret profiler option to measure zero-time reentrant locks
○ Thread.park() became a hot spot, with reentrant lock and connection pool as callers
○ The connection pool was holding a reentrant lock in a hot path while using a deque.
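The pattern described above can be sketched as follows. This is an illustration only, not Geode's connection pool: the class `LockedPool` and its method names are hypothetical. Every borrow and return takes the same `ReentrantLock`, so under contention the waiting threads are parked (the JDK does this through `LockSupport.park`), which matches the hot spot the profiler surfaced.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantLock;

public class LockedPool<T> {
    private final ReentrantLock lock = new ReentrantLock();
    private final Deque<T> idle = new ArrayDeque<>();

    /** Returns the most recently returned item, or null if the pool is empty. */
    public T borrow() {
        lock.lock(); // all threads on the hot path queue up here
        try {
            return idle.pollFirst();
        } finally {
            lock.unlock();
        }
    }

    /** Puts an item back at the head of the deque. */
    public void giveBack(T item) {
        lock.lock();
        try {
            idle.addFirst(item);
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        LockedPool<String> pool = new LockedPool<>();
        pool.giveBack("conn-1");
        System.out.println(pool.borrow());
    }
}
```

Each critical section is tiny, so the lock rarely shows up as time spent inside it. The cost is in threads waiting to enter, which is why a profiler that ignores near-zero-duration lock events can miss it.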
Case Study – Finding the Problem
[Image slides; no text captured]
Case Study – Solving the Problem
[Slide images not captured; callouts:]
● No lock!
● Lock-free structure
● No locks!
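The code on these slides was not captured, so the following is only a sketch of what "lock-free structure" can look like for a pool of idle items. It assumes `java.util.concurrent.ConcurrentLinkedDeque` as the structure; the slides do not say which one was actually used, and the class `LockFreePool` and its methods are hypothetical.

```java
import java.util.concurrent.ConcurrentLinkedDeque;

public class LockFreePool<T> {
    // Non-blocking deque: no explicit lock around head operations.
    private final ConcurrentLinkedDeque<T> idle = new ConcurrentLinkedDeque<>();

    /** Returns the most recently returned item, or null if the pool is empty. */
    public T borrow() {
        return idle.pollFirst();
    }

    /** Puts an item back at the head of the deque. */
    public void giveBack(T item) {
        idle.addFirst(item);
    }

    /** Traverses the deque, so use it for diagnostics only. */
    public int idleCount() {
        return idle.size();
    }

    public static void main(String[] args) throws InterruptedException {
        LockFreePool<String> pool = new LockFreePool<>();
        pool.giveBack("conn-1");
        pool.giveBack("conn-2");

        // Four threads repeatedly borrow and return without any lock.
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < 10_000; n++) {
                    String c = pool.borrow();
                    if (c != null) {
                        pool.giveBack(c);
                    }
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        System.out.println(pool.idleCount());
    }
}
```

Because every borrowed item is returned, the idle count ends where it started. Contending threads retry on a compare-and-set instead of being parked.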
Case Study - Profiling
Case Study – Testing
● Unit Testing
● Integration Testing
● Distributed Testing
● Concurrency Testing
● Performance Testing
Case Study - Performance Testing
● Before: 197,686
● After: 659,980
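To put the before and after figures in proportion, the change can be worked out directly. The sketch below assumes both numbers are throughput, where higher is better; the class `Speedup` and its methods are hypothetical helpers, not part of the benchmark tooling.

```java
public class Speedup {
    // Ratio of the new throughput to the old one.
    static double ratio(long before, long after) {
        return (double) after / before;
    }

    // Relative change as a percentage of the old value.
    static double percentChange(long before, long after) {
        return 100.0 * (after - before) / before;
    }

    public static void main(String[] args) {
        long before = 197_686;
        long after = 659_980;
        System.out.printf("ratio: %.2f, change: %.0f%%%n",
                ratio(before, after), percentChange(before, after));
    }
}
```

For these two values the ratio comes out at roughly 3.3, so throughput on this benchmark a little more than tripled.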
Other Bottlenecks – Over Eager Allocations
● 2 potentially unused objects per call: new HashSet() => 1 HashSet & 1 HashMap
Other Bottlenecks – Over Eager Allocations (fixed)
● Do not allocate eagerly
● Allocate near first use
● Allocate after early returns that don’t use the allocated object
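The guidelines above can be sketched as a before-and-after pair. This is an illustration under assumed names (`LazyAlloc`, `distinctEager`, `distinctDeferred`), not the Geode code from the slide. It relies on one fact about the JDK: a `HashSet` is backed by a `HashMap`, so each `new HashSet()` creates two objects.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class LazyAlloc {
    // Eager: allocates a HashSet and its backing HashMap even when
    // the early return makes them unnecessary.
    static Set<String> distinctEager(Collection<String> keys) {
        Set<String> result = new HashSet<>();
        if (keys == null || keys.isEmpty()) {
            return Collections.emptySet();
        }
        result.addAll(keys);
        return result;
    }

    // Deferred: the early return happens before any allocation,
    // and the set is created right where it is first used.
    static Set<String> distinctDeferred(Collection<String> keys) {
        if (keys == null || keys.isEmpty()) {
            return Collections.emptySet();
        }
        return new HashSet<>(keys);
    }

    public static void main(String[] args) {
        System.out.println(distinctDeferred(Arrays.asList("a", "b", "a")).size());
        System.out.println(distinctDeferred(Collections.emptyList()).isEmpty());
    }
}
```

Both methods return the same results. The difference is only in how much garbage the empty-input path produces on every call.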
Other Bottlenecks – Know Your Structures
● Methods are called for every operation, resulting in 1 add and 1 remove per op
Other Bottlenecks – Know Your Structures (fixed)
● Methods are still called for every operation but do not allocate or trigger GC
How much has performance improved?
Comparing Performance of 1.9.0 & 1.10.0
[Chart values, in source order; benchmark labels not captured]
● 1.9.0: 203,855; 244,463; 181,655; 207,697
● 1.10.0: 692,725; 736,022; 357,507; 372,430
Comparing Performance of 1.9.0 & 1.10.0
[Chart values, in source order; benchmark labels and units not captured]
● 1.9.0: 1,764,765; 1,980,391; 1,471,434; 1,731,946
● 1.10.0: 518,534; 488,051; 1,005,730; 965,404
Why Upgrade to Geode 1.10.0?
Comparing Performance of 1.9.0 & 1.10.0
[Chart: PartitionedGetBenchmark, v. 1.10.0 vs. v. 1.9.0]
Relevant Links
● Geode repo: https://github.com/apache/geode
● Benchmark repo: https://github.com/apache/geode-benchmarks
● JIRA query for Performance Issues: https://issues.apache.org/jira/browse/GEODE-7134?jql=project%20%3D%20GEODE%20AND%20labels%20%3D%20performance
Thank You
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsChristian Birchler
 
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonLeveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonApplitools
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slidesvaideheekore1
 

Recently uploaded (20)

Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
 
Patterns for automating API delivery. API conference
Patterns for automating API delivery. API conferencePatterns for automating API delivery. API conference
Patterns for automating API delivery. API conference
 
eSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolseSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration tools
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
VictoriaMetrics Anomaly Detection Updates: Q1 2024
VictoriaMetrics Anomaly Detection Updates: Q1 2024VictoriaMetrics Anomaly Detection Updates: Q1 2024
VictoriaMetrics Anomaly Detection Updates: Q1 2024
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptx
 
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingOpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Development
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZ
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecture
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdf
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero results
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogue
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identity
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their Engineering
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonLeveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slides
 

Performance in Geode: How Fast Is It, How Is It Measured, and How Can It Be Improved?

  • 1. Performance In Geode: How Fast Is It, How Is It Measured, and How Can It Be Improved? Helena Bales, Senior Software Engineer at Pivotal
  • 2. What Is The Performance Of Geode?
  • 3. Performance of Geode 1.9.0 (chart values, throughput in operations per second: 203,855; 244,463; 181,655; 207,697)
  • 4. What do those numbers mean?
      ● 200,000 operations per second means nothing to a person.
          ○ Is that good?
          ○ Is the performance consistent and accurate?
          ○ Has it improved or regressed since the last version?
          ○ Can it be better?
  • 5. What do those numbers mean?
      ● 200,000 operations per second means nothing to a person.
          ○ Is that good? Pretty good, yes.
          ○ Is the performance consistent and accurate? Not yet.
          ○ Has it improved since the last version? Yes, slightly.
          ○ Can it be better? YES.
  • 6. What do those numbers mean?
      ● 200,000 operations per second means nothing to a person.
          ○ Is that good? Pretty good, yes.
          ○ Is the performance consistent and accurate? Not yet.
          ○ Has it improved since the last version? Yes, slightly.
          ○ Can it be better? YES.
      How do you know???
  • 7. How Is Performance Measured?
  • 8. Creating the Geode Benchmark - Features
      ● On demand
      ● Against any revision of Geode
      ● On AWS cluster deployment of Geode
      ● On any dev machine in the office
      ● From Concourse CI pipeline
      ● With a profiler attached
      ● Compare two runs of benchmarks for performance changes
  • 9. Creating the Geode Benchmark - Goals
      ● Run by anyone interested in Geode
      ● Have others create benchmarks
      ● Visualize benchmark results over time
      ● Increase benchmark coverage of Geode
  • 10. Tests Currently in the Benchmarks
      ○ ReplicatedGetBenchmark
      ○ ReplicatedGetLongBenchmark
      ○ ReplicatedPutBenchmark
      ○ ReplicatedPutLongBenchmark
      ○ ReplicatedPutAllBenchmark
      ○ ReplicatedPutAllLongBenchmark
      ○ ReplicatedFunctionExecutionBenchmark
      ○ ReplicatedFunctionExecutionWithArgumentsBenchmark
      ○ ReplicatedFunctionExecutionWithFiltersBenchmark
      ○ PartitionedGetBenchmark
      ○ PartitionedGetLongBenchmark
      ○ PartitionedPutBenchmark
      ○ PartitionedPutLongBenchmark
      ○ PartitionedPutAllBenchmark
      ○ PartitionedPutAllLongBenchmark
      ○ PartitionedIndexedQueryBenchmark
      ○ PartitionedFunctionExecutionWithArgumentsBenchmark
      ○ PartitionedFunctionExecutionWithFiltersBenchmark
  • 11. Other Tested Configurations
      ● With SSL
      ● With JDKs: 8, 11, 12, 13
      ● With Security Manager
      ● With Garbage Collectors:
          ○ CMS
          ○ G1
          ○ Z
          ○ Shenandoah
      ● Adjustable max heap size
  • 12. How Can Performance Be Improved?
  • 13. Finding Performance Bottlenecks
      ● Monitor locks
      ● Thread Park/Unpark Reentrant Locks
      ● Allocations/GC
      ● Overuse of synchronization
      ● Getting a system property in a hot path
      ● Lazy initialization of objects in a hot path
      ● Synchronization on a container (ex. hash map)
  • 14. Case Study – The Connection Pool
      ● Why were we even looking for anything?
          ○ Couldn’t saturate network, CPU, memory; no matter the available resources
          ○ Profiler gave us no suspect hot spots
      ● How did we find the issue?
          ○ Found the secret profiler option to measure zero-time reentrant locks
          ○ Thread.park() became a hot spot, with reentrant lock and connection pool as callers
          ○ The connection pool was holding a reentrant lock in a hot path while using a deque.
  • 15. Case Study – Finding the Problem
  • 16. Case Study – Finding the Problem
  • 17. Case Study – Finding the Problem
  • 18. Case Study – Finding the Problem
  • 19. (no slide text)
  • 20. Case Study – Solving the Problem (slide callout: no lock!)
  • 21. Case Study – Solving the Problem (slide callout: lock free structure)
  • 22. Case Study – Solving the Problem (slide callout: no locks!)
  • 23. Case Study – Solving the Problem
  • 24. Case Study – Profiling
  • 25. Case Study – Testing
      ● Unit testing
      ● Integration Testing
      ● Distributed Testing
      ● Concurrency Testing
      ● Performance Testing
  • 26. Case Study – Performance Testing (chart values, operations per second: 197,686 before; 659,980 after)
  • 27. Case Study – Performance Testing
  • 28. Other Bottlenecks – Over Eager Allocations: 2 potentially unused objects per call – new HashSet() => 1 HashSet & 1 HashMap
  • 29. Other Bottlenecks – Over Eager Allocations (fixed)
      ● Do not allocate eagerly
      ● Allocate near first use
      ● Allocate after early returns that don’t use the allocated object
  • 30. Other Bottlenecks – Know Your Structures: methods are called for every operation, resulting in 1 add and 1 remove per op
  • 31. Other Bottlenecks – Know Your Structures (fixed): methods are still called for every operation but do not allocate/GC
  • 32. How much has performance improved?
  • 33. Comparing Performance of 1.9.0 & 1.10.0 (throughput chart values; benchmark labels not preserved)
      ● 1.9.0: 203,855; 244,463; 181,655; 207,697
      ● 1.10.0: 692,725; 736,022; 357,507; 372,430
  • 34. Comparing Performance of 1.9.0 & 1.10.0 (latency chart values; benchmark labels not preserved)
      ● 1.9.0: 1,764,765; 1,980,391; 1,471,434; 1,731,946
      ● 1.10.0: 518,534; 488,051; 1,005,730; 965,404
  • 35. Why Upgrade to Geode 1.10.0?
  • 36. Comparing Performance of 1.9.0 & 1.10.0 (chart: PartitionedGetBenchmark, v. 1.10.0 against v. 1.9.0)
  • 37. Relevant Links
      ● Geode repo: https://github.com/apache/geode
      ● Benchmark repo: https://github.com/apache/geode-benchmarks
      ● JIRA query for Performance Issues: https://issues.apache.org/jira/browse/GEODE-7134?jql=project%20%3D%20GEODE%20AND%20labels%20%3D%20performance

Editor's Notes

  1. Hi, my name is Helena Bales, and my pronouns are they/them. I am a Senior Software Engineer at Pivotal, working on GemFire, and have been a Geode committer for about a year and a half. Today I want to talk to you about Geode’s performance. Specifically, what is the performance, how it is measured, and how it can be improved.
  2. So let’s start with the most basic of those three questions: what is the performance of geode?
  3. So this is the performance of Geode. The vertical axis shows throughput in operations per second, and the horizontal axis shows four different benchmark tests. We can see that PartitionedGetBenchmark had an average throughput of about 200,000 operations per second. But what does that mean? AWS machine info: type c5.9xlarge; 36 vCPUs; 72 GiB memory; 10 Gbps network; 7,000 Mbps EBS bandwidth.
  4. A number on its own doesn’t tell us much about the performance of Geode; it just raises more questions. Is 200,000 good? Is the measurement consistent and accurate? Has it changed since the last version? And perhaps most importantly, can it be improved?
  5. Well, here are my answers to those questions from when we started these new benchmarks. We had pretty good performance, but we were seeing some variance between runs and some issues with stop-the-world garbage collections. We also saw some improvements over the previous version, but there was still a lot of room for improvement.
  6. But that brings up one more question. How do we know any of this?
  7. To answer that, let’s start by talking about what the benchmarks test now.
  8. When the Performance team started replacing the previous bare metal performance testing of Geode, we had several goals for the project. These are the ones that we have completed so far. The benchmarks can be run on demand against any revision of Geode (released or in development), on an AWS cluster or on any dev machine. They can also be run from Concourse CI pipelines. We also enabled running with a profiler attached for use in debugging performance bottlenecks. And finally, we can compare any two runs of benchmarks for changes in performance.
  9. And these are the goals that we are still working on. We want benchmarks to be run by members of the Geode community against their changes to Geode, or against their deployments. So far we have not received feedback that anyone outside of our office has used this project. We also would like for members of the community to create their own benchmarks to add to the existing list. The visualization of data is also something that is in progress, as that requires many iterations to get right. And finally, we would like to increase the test coverage that Benchmarks provide over Geode.
  10. This is our current list of tests. We’re only going to focus on the highlighted four today, but you can see that we do have some good coverage over operations so far. The four that we are going to focus on are ReplicatedGetBenchmark, PartitionedGetBenchmark, ReplicatedPutBenchmark, and PartitionedPutBenchmark. That’s because gets and puts on replicated and partitioned regions are some of the most commonly used and basic operations that Geode supports.
  11. With those tests, we also provide many different configuration options for running the benchmarks. We support running the cluster with and without SSL enabled, with JDKs 8 through 13, with or without security manager enabled, with a variety of garbage collectors, and with adjustable max heap size. These options all change the geode cluster, so it has increased our coverage to run with them.
  12. So now that we all know a bit about the goals and features of this new benchmarking framework, let’s pivot and talk about how we can improve performance.
  13. And the first step to fixing performance issues is finding them. Here are some of the things to look for using the profiler, including both monitor and reentrant locks, extra allocations and garbage collections. Other things to look for are overuse of synchronization, getting a system property in a hot path, lazy initialization of objects in a hot path, and synchronization on a container such as a hash map. And I’ll go over some examples of these in a bit.
  14. So now let’s focus on a specific example of a performance refactor, starting with the reason that we thought anything was wrong in the first place. When the benchmark was run, none of the resources were saturated, and we couldn’t figure out where the bottleneck was, since the profiler gave us no hot spots. Eventually we found the secret profiler option that shows the zero-time reentrant locks. With it enabled, Thread.park became a hot spot, and its callers were the reentrant lock and the connection pool. So we found that the connection pool was holding a reentrant lock in a hot path while using a deque.
  15. (Highlight the 36-vCPU AWS instance.) This graph shows the average performance of the get operation with different numbers of threads on the client, using version 1.9.0 of Geode. As you can see, the performance stops scaling pretty quickly after 32 threads. This ended up being due to the connection pool. It did not support enough concurrent operations for more than 32 threads, causing decreasing performance.
  16. And the profiler shows where the issue is occurring in the code. Every operation that is executed on the server results in one call to borrowConnection and one call to returnConnection. Both of those methods get a reentrant lock. This lock is responsible for almost half the time spent in these two methods. This is the cause of that taper in performance as the thread count increases, and contention for the lock increases as operations both borrow and return connections concurrently.
  17. Here is the issue in code. This is a pared-down version of ConnectionManagerImpl, which implements ConnectionManager. With the first arrow here I have highlighted that the available connections are being stored in a deque. Because a deque is not a thread-safe structure, it has to be guarded, and the second arrow highlights the reentrant lock that was appearing in the profiler.
  18. I’m going to focus on the borrow operation when talking about this issue, but it is also an issue in the returnConnection method. There are also two signatures of borrowConnection, one of which looks for a connection to a specific server. The other just takes a timeout and gets a connection to any available server. This is the one that I’ll be focusing on from here on.
  19. So this is the borrowConnection method. The red arrows on the left highlight that the lock is held for a significant portion of the method. And note that this is a collapsed view of the method to fit on one page. Holding the lock for this long makes it difficult for multiple threads to use the connection pool at the same time. Another issue with this code is that there is an await in here. The await causes the thread to be paused until the condition has been met and a signal is received. During this time, the lock is released, which means that it must be reacquired before the thread can continue. This further delays the return of a connection to the caller by, in the worst case, the duration of the timeout plus the time it takes to reacquire the lock under contention.
  20. Let’s move on and discuss the solution to this issue. The first part is to replace the deque in the connection manager with something else. In order to introduce some modularity of this code, all of the behavior related to the available connections will be moved into another class, called the AvailableConnectionManager. This allows us to get rid of the lock in the connection manager. This is due to the implementation of AvailableConnectionManager.
  21. This is the signature for the AvailableConnectionManager. As you can see, the deque has been replaced with a concurrent linked deque. The linked nature of the deque does carry some performance cost, because nodes must be allocated and garbage collected. However, the structure relies on compare-and-swap for a lock-free implementation, which makes ConcurrentLinkedDeque the ideal choice for this implementation.
  22. With that change in mind, this is what borrowConnection() in the ConnectionManager looks like now. There are no locks in this method. Instead, we call useFirst on the available connection manager, with a predicate to get a connection to the server that we want.
  23. And this is the implementation of useFirst. There is still no locking in this method, and removeFirstOccurrence is thread-safe, meaning that with a sufficiently large pool of connections, scaling should continue well past 32 threads on the client.
  24. To test if this new solution has other hot spots, we can use a profiler. And this time, you can see that the operation still results in an execution on the server, which calls both borrow and return connection, but both of those calls take 0-1% of the time spent in those methods. This provides good confidence that this implementation does not have a performance bottleneck.
  25. The new implementations of ConnectionManagerImpl and AvailableConnectionManager have been thoroughly tested at every level. I’m sure most of you are familiar with the concepts of unit and integration testing. But this has also been tested in three other ways. Distributed tests check how the connection manager behaves in a real Geode cluster. A cluster is spun up in several VMs and operations are run, causing connections to be created and destroyed, borrowed and returned. The next type of test is the concurrency test. For concurrency testing, an executor is given multiple threads to run in parallel, applying pressure to the connection manager to test that certain timings do not result in concurrency issues. And finally, there is the testing that we’ve been talking about this whole time: performance testing.
  26. These are the results of the performance test, comparing the commit before the refactor with the refactored code. As you can see, this one commit takes PartitionedGetBenchmark from 197,686 to 659,980 operations per second, which is roughly a 234% increase. And with this run of the tests the CPU of the client was saturated.
  27. Here is how the performance scales with the number of threads on the client in version 1.10.0, which includes the connection pool refactor as well as several other smaller refactors. As you can see, scaling continues significantly beyond 32 threads.
  28. So now let’s quickly look at a couple of other performance bottlenecks, starting with over eager allocations. What I mean by that is allocating objects long before they are used, resulting in excess garbage production. So once again, ignore most of the code here, and focus on the highlighted areas. Note that the declaration of the attemptedServers object (the first highlighted area) occurs well before the first use of that object, the second yellow highlighted area. And since there is an early return, highlighted in green, between the declaration and the first usage, there is a chance that the object could be allocated and garbage collected without ever having been used. And in this case, a HashSet is being allocated, which results in one HashSet and one HashMap, creating a significant amount of garbage.
  29. The best way to avoid this issue is to allocate close to the first use of that object. And make sure that you’re allocating after any early returns that don’t use the object, so that those paths avoid the allocation in the first place.
  30. Another common performance bottleneck is caused by choosing the wrong structure for the implementation. In this case, a linked list is used. This code is a hot path, and the borrowConnection and returnConnection are each called once per operation. This means that each operation results in one allocation of a node and one dereference of a node.
  31. In this case, a deque is a better choice. Since the connection pool is of relatively constant size, the deque will not need to be resized very often. This shows how important it is for performance to understand your data structures.
  32. So to wrap things up, let’s talk about how much performance has improved.
  33. This is a graph of the throughputs of our four tests in version 1.10.0 compared to version 1.9.0. Each of these tests saw a significant improvement in performance due to the connection pool and other refactors.
  34. This is a similar graph to the previous slide but for latency. It shows that latency was also reduced by a significant amount between versions 1.9.0 and 1.10.0.
  35. So why should you upgrade to Geode 1.10.0?
  36. Well, I think we’d all rather see a VSD output like the red line instead of the blue.
  37. Finally, I’d like to point you to some useful resources. We’d love to have more people use these benchmarks. There are instructions for running, and adding new benchmarks, in the benchmarks repository. We also have a great list of performance bottlenecks that we’ve found in our investigations but have not been able to prioritize. If you’re interested in working on performance issues, check out this JIRA search.
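
Note 13 lists "getting a system property in a hot path" as something to look for. A minimal sketch of that pattern and one common way around it follows. The class name and the property name are made up for this illustration and do not come from Geode.

```java
// Hypothetical illustration of caching a system property outside a hot path.
final class HotPathConfig {
  // Assumed property name, chosen only for this example.
  static final String PROPERTY = "example.pool.verbose";

  // Looked up and parsed once, when the class is initialized.
  private static final boolean VERBOSE = Boolean.getBoolean(PROPERTY);

  /** Slow variant: looks up and parses the property on every call. */
  static boolean verboseLookupPerCall() {
    return Boolean.getBoolean(PROPERTY);
  }

  /** Fast variant: returns the value cached at class initialization. */
  static boolean verboseCached() {
    return VERBOSE;
  }
}
```

The trade-off is that the cached value no longer reflects changes made to the property after the class has been initialized, which is usually acceptable for startup configuration.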
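
Notes 17 to 19 describe a pool that guards a plain deque with a reentrant lock and waits on a condition when no connection is free. The slide code is not reproduced in this transcript, so the following is only a sketch of that general shape. `LockedPool`, `borrow`, and `giveBack` are hypothetical names, not the Geode `ConnectionManagerImpl` source.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration of a lock-guarded connection pool.
final class LockedPool<C> {
  private final Deque<C> available = new ArrayDeque<>(); // not thread-safe on its own
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition freeConnection = lock.newCondition();

  /** Returns a pooled connection, or null if none arrives before the timeout. */
  C borrow(long timeoutMillis) {
    long remainingNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    lock.lock(); // every borrowing thread serializes here
    try {
      while (available.isEmpty()) {
        if (remainingNanos <= 0L) {
          return null;
        }
        // awaitNanos releases the lock while the thread is parked and must
        // reacquire it before returning, so a woken thread queues again.
        remainingNanos = freeConnection.awaitNanos(remainingNanos);
      }
      return available.removeFirst();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      return null;
    } finally {
      lock.unlock();
    }
  }

  /** Puts a connection back and wakes one waiting borrower. */
  void giveBack(C connection) {
    lock.lock(); // returning threads contend for the same lock
    try {
      available.addFirst(connection);
      freeConnection.signal();
    } finally {
      lock.unlock();
    }
  }
}
```

Because both `borrow` and `giveBack` take the same lock once per operation, contention grows with the client thread count, which matches the tapering described in notes 15 and 16.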
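
Notes 21 to 23 describe replacing the locked deque with a `ConcurrentLinkedDeque` and a `useFirst` method that takes a predicate. The sketch below shows one way such a method could be written with the standard `java.util.concurrent` API. The class body is an assumption made for illustration, not the Geode `AvailableConnectionManager` source, and it assumes connections use identity-based `equals`.

```java
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.function.Predicate;

// Hypothetical illustration of a lock-free pool of available connections.
final class AvailablePool<C> {
  private final ConcurrentLinkedDeque<C> available = new ConcurrentLinkedDeque<>();

  /** Takes any available connection, or returns null if the pool is empty. */
  C useFirst() {
    return available.pollFirst(); // lock-free; null when empty
  }

  /** Takes the first available connection that matches, or null if none does. */
  C useFirst(Predicate<C> matches) {
    for (C candidate : available) { // weakly consistent iteration, no lock needed
      // removeFirstOccurrence succeeds for only one caller per element, so a
      // connection that another thread took first is simply skipped.
      if (matches.test(candidate) && available.removeFirstOccurrence(candidate)) {
        return candidate;
      }
    }
    return null;
  }

  /** Makes a connection available again. */
  void addFirst(C connection) {
    available.addFirst(connection);
  }
}
```

A caller looking for a connection to a particular server would pass a predicate that checks the connection's server, which is the role note 22 gives to the predicate argument.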
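
Note 25 describes concurrency testing as giving an executor multiple threads that apply pressure to the connection manager. The sketch below illustrates that idea against a bare `ConcurrentLinkedDeque` rather than the real Geode classes. `PoolConcurrencyCheck` and its parameters are hypothetical, and integers stand in for connections.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration of a concurrency test for a connection pool.
final class PoolConcurrencyCheck {
  /** Returns how many times one connection was held by two threads at once. */
  static int run(int threads, int iterationsPerThread, int poolSize) {
    ConcurrentLinkedDeque<Integer> pool = new ConcurrentLinkedDeque<>();
    for (int id = 0; id < poolSize; id++) {
      pool.addFirst(id); // integers stand in for connections
    }
    Set<Integer> inUse = ConcurrentHashMap.newKeySet();

    Callable<Integer> worker = () -> {
      int violations = 0;
      for (int i = 0; i < iterationsPerThread; i++) {
        Integer connection = pool.pollFirst(); // borrow
        if (connection == null) {
          continue; // pool momentarily empty
        }
        if (inUse.add(connection)) {
          inUse.remove(connection);
        } else {
          violations++; // another thread already holds this connection
        }
        pool.addFirst(connection); // return
      }
      return violations;
    };

    List<Callable<Integer>> workers = new ArrayList<>();
    for (int t = 0; t < threads; t++) {
      workers.add(worker);
    }
    ExecutorService executor = Executors.newFixedThreadPool(threads);
    try {
      int total = 0;
      for (Future<Integer> result : executor.invokeAll(workers)) {
        total += result.get();
      }
      return total;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException("concurrency check did not complete", e);
    } finally {
      executor.shutdownNow();
    }
  }
}
```

Running more worker threads than there are pooled connections, for example `PoolConcurrencyCheck.run(8, 10_000, 4)`, keeps the pool under constant contention, which is the kind of pressure the note describes.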
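
Notes 28 and 29 recommend allocating after early returns and close to first use. Since the slide code is not in this transcript, here is a small sketch of the fixed shape. Only the name `attemptedServers` comes from the notes; the method, its parameters, and the selection logic are assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical illustration of allocating after an early return.
final class LazyAllocationExample {
  /** Returns the cached server if present, else the first reachable candidate, else null. */
  static String pickServer(String cached, List<String> candidates, Predicate<String> reachable) {
    // An over-eager version would declare attemptedServers here, before the
    // early return, and create garbage on every call that takes the fast path.
    if (cached != null) {
      return cached; // early return: nothing has been allocated yet
    }
    Set<String> attemptedServers = new HashSet<>(); // allocated only when needed
    for (String server : candidates) {
      // add returns false for a server that was already attempted
      if (attemptedServers.add(server) && reachable.test(server)) {
        return server;
      }
    }
    return null;
  }
}
```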
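
Notes 30 and 31 contrast a linked list with a deque for a pool of roughly constant size. The notes do not name the deque implementation, so the sketch below assumes `java.util.ArrayDeque` for the array-backed case. `DequeChoice` and `cycle` are hypothetical names.

```java
import java.util.Deque;

// Hypothetical illustration of one borrow and one return per operation.
final class DequeChoice {
  /** Runs borrow/return cycles against the pool and returns its final size. */
  static int cycle(Deque<String> pool, int operations) {
    for (int i = 0; i < operations; i++) {
      String connection = pool.removeFirst(); // borrow
      pool.addFirst(connection);              // return
    }
    return pool.size();
  }
}
```

Passing a `java.util.LinkedList` makes every `addFirst` allocate a new node that `removeFirst` later turns into garbage, while a `java.util.ArrayDeque` writes into a slot of its backing array and only allocates when that array has to grow. Neither class is thread-safe, so this sketch only illustrates the allocation behavior, not the concurrent pool.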