Comparative Performance Analysis of an Algebraic-Multigrid Solver on Leading Multicore Architectures
Brian Austin, Alex Druinsky, Pieter Ghysels, Xiaoye Sherry Li,
Osni A. Marques, Eric Roman, Samuel Williams
Lawrence Berkeley National Laboratory
Andrew Barker, Panayot Vassilevski
Lawrence Livermore National Laboratory
Delyan Kalchev
University of Colorado, Boulder
What this talk is about
Performance optimization, comparison, and modeling of a novel shared-memory algebraic-multigrid solver,
using the SPE10 reservoir-modeling problem,
on one node of a Cray XC30 and on a Xeon Phi.
How our multigrid solver works
Repeat until converged:
  pre-smoothing:          $y \leftarrow x + M^{-1}(b - Ax)$
  coarse-grid correction: $z \leftarrow y + P A_c^{-1} P^T (b - Ay)$
  post-smoothing:         $x \leftarrow z + M^{-1}(b - Az)$
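The three steps of the cycle can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the solver from the talk: the test matrix is a small 1D Poisson matrix, `P` is a hypothetical piecewise-constant interpolator over pairs of unknowns, the smoother `M` is weighted Jacobi, the coarse matrix is taken to be the Galerkin product $P^T A P$, and the coarse system is solved directly by Gaussian elimination.

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[piv] = M[piv], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def smooth(A, x, b, omega=2.0 / 3.0):
    """One step x + M^{-1}(b - Ax), assuming weighted Jacobi: M = D / omega."""
    r = [bi - ax for bi, ax in zip(b, matvec(A, x))]
    return [x[i] + omega * r[i] / A[i][i] for i in range(len(x))]

def two_grid_cycle(A, Ac, P, x, b):
    y = smooth(A, x, b)                                  # pre-smoothing
    r = [bi - ay for bi, ay in zip(b, matvec(A, y))]     # b - Ay
    ec = solve(Ac, matvec(transpose(P), r))              # A_c^{-1} P^T (b - Ay)
    z = [yi + ei for yi, ei in zip(y, matvec(P, ec))]    # coarse-grid correction
    return smooth(A, z, b)                               # post-smoothing

# Hypothetical test problem: 1D Poisson matrix, aggregates of two unknowns.
n = 8
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
P = [[1.0 if i // 2 == j else 0.0 for j in range(n // 2)] for i in range(n)]
Ac = matmul(transpose(P), matmul(A, P))                  # assumed Galerkin product
b = [1.0] * n
x = [0.0] * n

for cycle in range(100):                                 # "repeat until converged"
    x = two_grid_cycle(A, Ac, P, x, b)
    residual = max(abs(bi - ax) for bi, ax in zip(b, matvec(A, x)))
    if residual < 1e-8:
        break
```

In a real AMG solver the matrices are sparse and the coarse solve is itself iterative or a sparse factorization; the dense lists here only keep the sketch self-contained.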
How we construct the interpolator
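[Figure: matrix diagram showing the interpolator P as the product of a matrix S and a second, accented P factor.]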
How we construct the coarse-grid matrix
[Figure: matrix diagram of the coarse-grid matrix as a triple product, $A_c = P^T A P$.]
What the SPE10 problem is and how we are solving it
Credit: http://www.spe10.org
oil-reservoir modeling benchmark problem
solved using Darcy’s equation (in primal form)
$-\nabla \cdot (\kappa(x)\, \nabla p(x)) = f(x)$,
where $p(x)$ = pressure and $\kappa(x)$ = permeability
defined over a 60 × 220 × 85 grid
with isotropic and anisotropic versions
What are the machines that we study?
                    Edison             Babbage
name                Ivy Bridge         Knights Corner
model               Xeon E5-2695 v2    Xeon Phi 5110P
clock speed         2.4 GHz            1.053 GHz
cores               12                 60
SMT threads         2                  4
SIMD width          4                  8
peak gflop/s        230.4              1010.88
bandwidth           48.5 GB/s          122.9 GB/s
per-core caches:
  L1-D              32 KB              32 KB
  L2                256 KB             512 KB
shared cache:
  L3                30 MB              none
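The peak rates in the table appear to follow from the other rows. The sketch below reproduces them under one assumption that is not stated on the slide: each SIMD lane retires two floating-point operations per cycle (one multiply and one add). The function name and the `flops_per_lane` parameter are illustrative.

```python
def peak_gflops(cores, clock_ghz, simd_width, flops_per_lane=2):
    """Peak rate = cores x clock x SIMD lanes x flops per lane per cycle.

    flops_per_lane=2 is an assumption (one multiply plus one add per cycle).
    """
    return cores * clock_ghz * simd_width * flops_per_lane

edison_peak = peak_gflops(cores=12, clock_ghz=2.4, simd_width=4)
babbage_peak = peak_gflops(cores=60, clock_ghz=1.053, simd_width=8)

# Machine balance: flops available per byte of memory traffic.
edison_balance = edison_peak / 48.5     # bandwidth in GB/s from the table
babbage_balance = babbage_peak / 122.9
```

With these inputs the balance comes out at roughly 4.8 flops per byte on Edison and 8.2 on Babbage, so a kernel that performs fewer flops per byte than that is limited by memory bandwidth rather than by arithmetic.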
What the coarse-grid system is
n = 7,782; nnz = 1,412,840; nnz/n = 181.6
How we chose the preconditioner for PCG
preconditioner            operator
Jacobi                    $z = D^{-1} r$
Symmetric Gauss–Seidel    $z = (L + D)^{-1} D (L + D)^{-T} r$
where $A_c = L + D + L^T$
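To make the two operators concrete, here is a minimal sketch that applies each of them to a residual vector. It uses dense Python lists and a hypothetical 3 × 3 test matrix purely for illustration; a production code would work on sparse storage.

```python
def jacobi(A, r):
    """z = D^{-1} r: every entry is independent, so this parallelizes trivially."""
    return [r[i] / A[i][i] for i in range(len(r))]

def forward_solve(T, r):
    """Solve T y = r for a lower-triangular T."""
    n = len(r)
    y = [0.0] * n
    for i in range(n):
        y[i] = (r[i] - sum(T[i][j] * y[j] for j in range(i))) / T[i][i]
    return y

def backward_solve_transposed(T, r):
    """Solve T^T y = r for a lower-triangular T (T^T is upper triangular)."""
    n = len(r)
    y = [0.0] * n
    for i in reversed(range(n)):
        y[i] = (r[i] - sum(T[j][i] * y[j] for j in range(i + 1, n))) / T[i][i]
    return y

def sgs(A, r):
    """z = (L + D)^{-1} D (L + D)^{-T} r, applied right to left."""
    n = len(r)
    LD = [[A[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]
    y = backward_solve_transposed(LD, r)      # (L + D)^{-T} r
    y = [A[i][i] * y[i] for i in range(n)]    # multiply by D
    return forward_solve(LD, y)               # (L + D)^{-1} ...

# Hypothetical symmetric positive-definite test matrix and residual.
A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
r = [1.0, 2.0, 3.0]
z_jacobi = jacobi(A, r)
z_sgs = sgs(A, r)
```

Each triangular solve carries a dependence from one row to the next, which is why a straightforward SGS is much harder to thread than Jacobi.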
How we chose the preconditioner for PCG
                 unprecond       Jacobi          SGS
conditioning
  isotropic      3.37 × 10^4     1.35 × 10^3     1.83 × 10^2
  anisotropic    9.68 × 10^6     1.89 × 10^4     2.91 × 10^3
iterations
  isotropic      605.53          194.57          78.87
  anisotropic    1,267.85        288.32          122.85
How we chose the preconditioner for PCG
                 SGS          Jacobi       Jacobi
                 1 thread     1 thread     12 threads
time (s)
  isotropic      83.0         80.3         29.2
  anisotropic    128.6        121.6        43.8
Where does the AMG cycle spend most of its time?
[Figure: runtime (s) versus number of threads (1 to 120), both on logarithmic axes, with curves for smoothing, PCG, and total.]
How to improve the performance of PCG
while not converged do
    ρ ← σ
    omp parallel for: w ← A p
    omp parallel for: τ ← w · p
    α ← ρ / τ
    omp parallel for: x ← x + α p
    omp parallel for: r ← r − α w
    omp parallel for: z ← M^{-1} r
    omp parallel for: σ ← z · r
    β ← σ / ρ
    omp parallel for: p ← z + β p
end while
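For reference, the loop body above can be written as a serial, runnable sketch. The code below is plain Python and does not reproduce the OpenMP threading; each list comprehension or `dot` call stands where the slide has one `omp parallel for`. The initialization before the loop, the stopping test, and the Jacobi preconditioner used in the example are assumptions, since the slide shows only the loop.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def pcg(A, b, apply_minv, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradients; apply_minv(r) returns M^{-1} r."""
    x = [0.0] * len(b)
    r = b[:]                       # residual for the zero initial guess
    z = apply_minv(r)
    p = z[:]
    sigma = dot(z, r)
    iterations = 0
    while dot(r, r) ** 0.5 > tol and iterations < max_iter:
        rho = sigma                                      # ρ ← σ
        w = matvec(A, p)                                 # w ← A p
        tau = dot(w, p)                                  # τ ← w · p
        alpha = rho / tau                                # α ← ρ / τ
        x = [xi + alpha * pi for xi, pi in zip(x, p)]    # x ← x + α p
        r = [ri - alpha * wi for ri, wi in zip(r, w)]    # r ← r − α w
        z = apply_minv(r)                                # z ← M^{-1} r
        sigma = dot(z, r)                                # σ ← z · r
        beta = sigma / rho                               # β ← σ / ρ
        p = [zi + beta * pi for zi, pi in zip(z, p)]     # p ← z + β p
        iterations += 1
    return x, iterations

# Hypothetical test: 1D Poisson matrix with a Jacobi preconditioner.
n = 10
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [1.0] * n
x, iterations = pcg(A, b, lambda r: [r[i] / A[i][i] for i in range(n)])
```

Counting the operations per iteration (one matrix-vector product, two dot products, three vector updates, one preconditioner application) shows why the number of parallel regions and barriers per iteration matters once the matrix is small.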
How to improve the performance of PCG
omp parallel
    while not converged do
        omp single: τ ← 0.0                      (implied barrier)
        omp single nowait: ρ ← σ, σ ← 0.0
        omp for nowait: w ← A p
        omp for reduction: τ ← w · p             (implied barrier)
        α ← ρ / τ
        omp for nowait: x ← x + α p
        omp for nowait: r ← r − α w
        omp for nowait: z ← M^{-1} r
        omp for reduction: σ ← z · r             (implied barrier)
        β ← σ / ρ
        omp for nowait: p ← z + β p
    end while
end omp parallel
How to improve the performance of PCG
omp parallel
    while not converged do
        omp for: w ← A p
        omp single
            τ ← w · p
            α ← ρ / τ
            x ← x + α p
            r ← r − α w
            z ← M^{-1} r
            ρ ← σ
            σ ← z · r
            β ← σ / ρ
            p ← z + β p
        end omp single
    end while
end omp parallel
How to improve the performance of PCG
while not converged do
    ρ ← σ
    omp parallel for: w ← A p
    τ ← w · p
    α ← ρ / τ
    x ← x + α p
    r ← r − α w
    z ← M^{-1} r
    σ ← z · r
    β ← σ / ρ
    p ← z + β p
end while
How to improve the performance of PCG
[Figure: PCG runtime (s) versus number of threads (1 to 120), both on logarithmic axes, for Algorithms 1 through 4.]
How the sparse HSS solver works
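sparse matrix-factorization algorithm
represents the frontal matrices as hierarchically-semiseparable (HSS) matrices
uses randomized sampling for faster compression
[Figure: block structure of an HSS matrix, with dense diagonal blocks D_i, off-diagonal blocks such as U_3 B_3 V_6^H and U_6 B_6 V_3^H, and the nested basis U_7 assembled from U_3 R_3 and U_6 R_6.]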
More details in Pieter Ghysels’ talk tomorrow!
How do the parameters of the solver affect performance?
Parameter                   Values
coarse solver               HSS, PCG
elements-per-agglomerate    64, 128, 256, 512
ν_P                         0, 1, 2
ν_{M^{-1}}                  1, 3, 5
θ                           0.001, 0.001 × 10^{0.5}, 0.01
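To get a feel for the size of this parameter space, the values in the table can be enumerated as a Cartesian product. Whether the study ran every combination is not stated on the slide, so the full sweep below is an assumption, and the dictionary keys are illustrative names rather than the solver's actual option names.

```python
from itertools import product

# Values copied from the table; key names are hypothetical.
grid = {
    "coarse_solver": ["HSS", "PCG"],
    "elements_per_agglomerate": [64, 128, 256, 512],
    "nu_P": [0, 1, 2],
    "nu_M": [1, 3, 5],
    "theta": [0.001, 0.001 * 10 ** 0.5, 0.01],
}

# One dictionary per configuration, assuming every combination is tried.
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
num_configs = len(configs)   # 2 * 4 * 3 * 3 * 3 combinations
```

Under that assumption there are 216 configurations per machine, which is small enough to run exhaustively and rank by runtime.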
How do the parameters of the solver affect performance?
[Figure: runtime (s) versus percentile rank (1% to 64%) of the tested configurations, with curves for Babbage (HSS), Babbage (PCG), Edison (HSS), and Edison (PCG), and the default configuration marked.]
What our performance model is
stage                    bytes                           flops
pre- and post-smooth     (3ν + 1)(12 nz_a + 3 · 8n)      2(3ν + 1)(nz_a + 2n)
restriction              12 nz_a + 12 nz_p + 3 · 8n      2(nz_a + nz_p)
one coarse solve
  multiply by A_c        12 nz_c                         2 nz_c
  preconditioner         2 · 8 n_c                       n_c
  vector operations      5 · 8 n_c                       2 · 5 n_c
interpolation            12 nz_p + 8n                    2 nz_p
stopping criterion       12 nz_a + 4 · 8n                2(nz_a + n)
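The table can be turned into a small calculator. The sketch below rests on several assumptions that the slide does not spell out: nz_a, nz_p, and nz_c are read as the nonzero counts of A, P, and A_c, n and n_c as the fine and coarse dimensions, the three coarse-solve rows are taken as the cost of one PCG iteration and multiplied by a hypothetical iteration count, and the two time bounds are simply total bytes over bandwidth and total flops over peak rate. The fine-grid sizes in the example are placeholders, not measurements.

```python
def amg_cycle_costs(nza, nzp, nzc, n, nc, nu, coarse_iters):
    """Return {stage: (bytes, flops)} following the table's formulas."""
    return {
        "smooth": ((3 * nu + 1) * (12 * nza + 3 * 8 * n),
                   2 * (3 * nu + 1) * (nza + 2 * n)),
        "restriction": (12 * nza + 12 * nzp + 3 * 8 * n, 2 * (nza + nzp)),
        # Assumption: the three coarse-solve rows are per PCG iteration.
        "coarse_solve": (coarse_iters * (12 * nzc + 2 * 8 * nc + 5 * 8 * nc),
                         coarse_iters * (2 * nzc + nc + 2 * 5 * nc)),
        "interpolation": (12 * nzp + 8 * n, 2 * nzp),
        "stopping": (12 * nza + 4 * 8 * n, 2 * (nza + n)),
    }

def time_bounds(stages, bandwidth_gbs, peak_gflops):
    """Return (memory-bound seconds, flops-bound seconds) for one cycle."""
    total_bytes = sum(b for b, _ in stages.values())
    total_flops = sum(f for _, f in stages.values())
    return total_bytes / (bandwidth_gbs * 1e9), total_flops / (peak_gflops * 1e9)

# Coarse sizes from the coarse-grid slide; everything else is a placeholder.
example = amg_cycle_costs(nza=7_000_000, nzp=1_000_000, nzc=1_412_840,
                          n=1_000_000, nc=7_782, nu=3, coarse_iters=100)
memory_seconds, flops_seconds = time_bounds(example, bandwidth_gbs=48.5,
                                            peak_gflops=230.4)

# Small hand-checkable case.
check = amg_cycle_costs(nza=100, nzp=10, nzc=20, n=10, nc=2, nu=1,
                        coarse_iters=1)
```

The 12 bytes per nonzero would be consistent with an 8-byte value plus a 4-byte column index, and the 8 bytes per vector entry with double precision; both readings are inferences from the coefficients.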
What our performance model is
[Figure: runtime (s) versus number of cores (1 to 12), comparing the memory-bound and flops-bound model predictions with the actual runtime.]
Final comments
HSS is an attractive option for solving coarse systems
performance is quite sensitive to parameter tuning
performance model indicates where the bottlenecks are
Thank you!
