Parallel Algorithms for Monte Carlo Linear Solvers
Stuart Slattery
Steven Hamilton
Tom Evans (PI)
Oak Ridge National Laboratory
March 17, 2015
Acknowledgments
This material is based upon work supported by the U.S. Department of
Energy, Office of Science, Advanced Scientific Computing Research
program.
This research used resources of the Oak Ridge Leadership Computing
Facility at the Oak Ridge National Laboratory, which is supported by the
Office of Science of the U.S. Department of Energy under Contract No.
DE-AC05-00OR22725.
Slattery, Hamilton, Evans (ORNL) Parallel Monte Carlo Solvers 3/17/15 2 / 22
1. Motivation
2. Monte Carlo Methods
3. Domain Decomposition and Replication
4. Scaling Studies
5. Algorithm Variations
6. Current/Future Work and Conclusions
Motivation
As we move towards exascale computing, the rate of errors is
expected to increase dramatically
The probability that a compute node will fail during the course of a
large scale calculation may be near 1
Algorithms need to not only have increased concurrency/scalability
but have the ability to recover from hardware faults
Lightweight machines
Heterogeneous machines
Both characterized by low power and high concurrency
Towards Exascale Concurrency and Resiliency
Two basic strategies:
1 Start with current “state of the art” methods and make incremental
modifications to improve scalability and fault tolerance
Many efforts are heading in this direction, attempting to find additional
concurrency to exploit
2 Start with methods having natural scalability and resiliency aspects and
work at improving performance (e.g. Monte Carlo)
Soft failures introduce an additional stochastic error component
Hard failures potentially mitigated by replication
Concurrency enabled by several levels of parallelism
Monte Carlo for Linear Systems
Suppose we want to solve Ax = b
If ρ(I − A) < 1, we can write the solution using the Neumann series
$$x = \sum_{n=0}^{\infty} (I - A)^n b = \sum_{n=0}^{\infty} H^n b$$
where H ≡ (I − A) is the Richardson iteration matrix
Build the Neumann series stochastically
$$x_i = \sum_{k=0}^{\infty} \sum_{i_1}^{N} \sum_{i_2}^{N} \cdots \sum_{i_k}^{N} h_{i,i_1} h_{i_1,i_2} \cdots h_{i_{k-1},i_k} b_{i_k}$$
Define a sequence of state transitions
$$\nu = i \to i_1 \to \cdots \to i_{k-1} \to i_k$$
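The deterministic Neumann series that the random walks approximate can be summed directly for a small system. The sketch below is only illustrative: it uses the 3 × 3 matrix from the sampling example in this deck, and the right-hand side `b` and the helper names `matvec` and `neumann_partial_sum` are assumptions made for the example.

```python
def matvec(M, v):
    """Dense matrix-vector product on nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def neumann_partial_sum(A, b, n_terms):
    """Return sum_{n=0}^{n_terms} H^n b with H = I - A."""
    size = len(A)
    H = [[(1.0 if i == j else 0.0) - A[i][j] for j in range(size)]
         for i in range(size)]
    term = list(b)                                # H^0 b
    x = list(b)
    for _ in range(n_terms):
        term = matvec(H, term)                    # next term H^n b
        x = [xi + ti for xi, ti in zip(x, term)]  # accumulate the series
    return x


A = [[1.0, -0.2, -0.6],
     [-0.4, 1.0, -0.4],
     [-0.1, -0.4, 1.0]]
b = [1.0, 1.0, 1.0]   # assumed right-hand side, not from the slides

x = neumann_partial_sum(A, b, 200)
residual = [bi - axi for bi, axi in zip(b, matvec(A, x))]
print(x, max(abs(r) for r in residual))
```

Because every row of |H| sums to less than 1 for this matrix, the partial sums converge and the residual b − Ax shrinks geometrically with the number of terms.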
Forward Monte Carlo
Choose a row-stochastic matrix P and weight matrix W such that
H = P ◦ W
Typical choice (Monte Carlo Almost-Optimal):
$$P_{ij} = \frac{|H_{ij}|}{\sum_{j=1}^{N} |H_{ij}|}$$
To compute solution component $x_i$:
Start a history in state i (with initial weight of 1)
Transition to a new state j based on the probabilities in row i of P
Modify the history weight by the corresponding entry $W_{ij}$
Add a contribution to $x_i$ based on the current history weight and the value of $b_j$
A given random walk can only contribute to a single component of
the solution vector, with $x \approx M_{MC} b$
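The steps above can be sketched as a short serial routine. This is a minimal illustration rather than the authors' implementation: the fixed history length, the number of histories, the right-hand side `b`, the seed, and the function names `sample_next` and `forward_mc` are all assumptions, and the matrix H is taken from the sampling example in this deck.

```python
import random


def sample_next(abs_row, row_sum, rng):
    """Sample column j with probability abs_row[j] / row_sum."""
    u = rng.random() * row_sum
    acc = 0.0
    last = 0
    for j, a in enumerate(abs_row):
        if a > 0.0:
            acc += a
            last = j
            if u < acc:
                return j
    return last  # guards against floating-point round-off


def forward_mc(H, b, i, n_histories, history_length, rng):
    """Estimate x_i = sum_k (H^k b)_i with forward random walks."""
    abs_rows = [[abs(h) for h in row] for row in H]
    row_sums = [sum(row) for row in abs_rows]
    total = 0.0
    for _ in range(n_histories):
        state, weight = i, 1.0          # start in state i with weight 1
        tally = b[state]                # k = 0 term of the series
        for _ in range(history_length):
            s = row_sums[state]
            if s == 0.0:
                break                   # no transitions out of this state
            nxt = sample_next(abs_rows[state], s, rng)
            # W_ij = H_ij / P_ij = sign(H_ij) * (row sum of |H_i|)
            weight *= s if H[state][nxt] > 0.0 else -s
            state = nxt
            tally += weight * b[state]  # contribution to x_i
        total += tally
    return total / n_histories


H = [[0.0, 0.2, 0.6],
     [0.4, 0.0, 0.4],
     [0.1, 0.4, 0.0]]
b = [1.0, 1.0, 1.0]                     # assumed right-hand side
rng = random.Random(2015)               # arbitrary seed
x_est = [forward_mc(H, b, i, 5000, 50, rng) for i in range(3)]
print(x_est)
```

Each call estimates one component only, which mirrors the remark that a forward random walk contributes to a single entry of the solution vector.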
Sampling Example (Forward Monte Carlo)
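Suppose
$$A = \begin{bmatrix} 1.0 & -0.2 & -0.6 \\ -0.4 & 1.0 & -0.4 \\ -0.1 & -0.4 & 1.0 \end{bmatrix} \;\rightarrow\; H \equiv (I - A) = \begin{bmatrix} 0.0 & 0.2 & 0.6 \\ 0.4 & 0.0 & 0.4 \\ 0.1 & 0.4 & 0.0 \end{bmatrix}$$
then
$$P = \begin{bmatrix} 0.0 & 0.25 & 0.75 \\ 0.5 & 0.0 & 0.5 \\ 0.2 & 0.8 & 0.0 \end{bmatrix}, \quad W = \begin{bmatrix} 0.0 & 0.8 & 0.8 \\ 0.8 & 0.0 & 0.8 \\ 0.5 & 0.5 & 0.0 \end{bmatrix}$$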
If a history is started in state 3, there is a 20% chance of it
transitioning to state 1 and an 80% chance of moving to state 2
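The matrices in this example can be reproduced with a few lines of code. The sketch below applies the almost-optimal choice of P row by row and then recovers W entrywise from H = P ◦ W; the function name `almost_optimal_split` is a hypothetical label used only here.

```python
def almost_optimal_split(H):
    """Return (P, W) with P_ij = |H_ij| / sum_j |H_ij| and H = P ∘ W."""
    P, W = [], []
    for row in H:
        row_sum = sum(abs(h) for h in row)
        p_row = [abs(h) / row_sum if row_sum > 0.0 else 0.0 for h in row]
        # Where P_ij is zero the weight is never used, so it is set to 0.
        w_row = [h / p if p > 0.0 else 0.0 for h, p in zip(row, p_row)]
        P.append(p_row)
        W.append(w_row)
    return P, W


H = [[0.0, 0.2, 0.6],
     [0.4, 0.0, 0.4],
     [0.1, 0.4, 0.0]]
P, W = almost_optimal_split(H)
for p_row, w_row in zip(P, W):
    print(p_row, w_row)
```

With this choice every nonzero weight in row i equals the sign of the entry times the row sum of |H|, which is why each row of W above holds a single repeated value.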
Solving the Heat Equation: Forward Method
Figure: Forward solution with 2.5 × 10^3, 2.5 × 10^4, 2.5 × 10^5, and 2.5 × 10^6 total histories (one panel per history count).
Domain Decomposed Monte Carlo
Each parallel process owns a piece of the domain (linear system)
Random walks must be transported between adjacent domains through parallel communication
Domain decomposition determined by the input system
Load balancing not yet addressed
Figure: Domain decomposition example (subdomains A, B, and C) illustrating how domain-to-domain transport creates communication costs.
Asynchronous Monte Carlo Transport Kernel
General extension of the Milagro algorithm (LANL)
Asynchronous nearest neighbor communication of histories
System-tunable communication parameters of buffer size and check frequency (performance impact)
Need an asynchronous strategy for exiting the transport loop without a collective (running sum)
Figure: Flowchart of the asynchronous transport loop on processes 0 to 8. Node labels: Local Source Not Empty, Transport Source History, Send message if buffer full, Local Bank Not Empty, Transport Bank History, Check Frequency, Add Messages to Bank, Local Source and Bank Empty, Control Termination, Completion Signal, Continue, Exit Loop.
Exiting the Transport Loop without Collectives
Figure: Master/Slave (processes 0 to 8 exchanging "# RUN" counts and STOP signals)
Figure: Binary Tree (processes 0 to 8 exchanging "# RUN" counts and STOP signals)
Figure: Weak scaling absolute efficiency. Axes: # of Cores (10 to 100000) versus Absolute Efficiency (0.50 to 1.10). Series: MCSA Binary Tree, MCSA Master/Slave.
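The difference between the two termination patterns can be illustrated with simple index arithmetic. The sketch below assumes a heap-style binary tree rooted at process 0 (children of p are 2p + 1 and 2p + 2), which matches the process numbering in the figure but is otherwise an assumption; the per-process history counts are made up, and no real message passing is performed.

```python
def parent(rank):
    """Parent of a process in the assumed binary-tree layout (root is 0)."""
    return None if rank == 0 else (rank - 1) // 2


def children(rank, n_procs):
    """Children of a process in the assumed layout; at most two."""
    return [c for c in (2 * rank + 1, 2 * rank + 2) if c < n_procs]


def subtree_count(done, rank=0):
    """Histories completed in the subtree rooted at rank, i.e. the running
    sum a process would report to its parent."""
    return done[rank] + sum(subtree_count(done, c)
                            for c in children(rank, len(done)))


def stop_hops(rank):
    """Forwarding steps before a STOP issued by the root reaches rank."""
    hops = 0
    while rank != 0:
        rank = parent(rank)
        hops += 1
    return hops


n_procs = 9
done = [10, 12, 9, 11, 10, 8, 13, 10, 7]     # made-up per-process counts
total = subtree_count(done)                  # running sum seen by the root
root_msgs_master_slave = n_procs - 1         # root hears from every process
root_msgs_tree = len(children(0, n_procs))   # root hears from its children only
max_hops = max(stop_hops(r) for r in range(n_procs))
print(total, root_msgs_master_slave, root_msgs_tree, max_hops)
```

Under these assumptions the root handles a fixed number of messages in the tree layout regardless of process count, at the cost of a STOP signal that needs a logarithmic number of forwarding steps to reach the leaves.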
Replication
Different batches of Monte Carlo samples can be combined in summation
via superposition if they have different random number streams. For two
different batches:
$$M_{MC} x = \frac{1}{2} (M_1 + M_2) x$$
Consider each of these batches independent subsets of a Monte Carlo
operator where now the operator can be formed as a general additive
decomposition of $N_S$ subsets:
$$M_{MC} = \frac{1}{N_S} \sum_{n=1}^{N_S} M_n$$
We replicate the linear problem and form each subset on a different group
of parallel processes. Applying the subsets to a vector requires an
AllReduce to form the sum. Each subset is domain decomposed.
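The additive decomposition can be mimicked serially by giving each subset its own random number stream and averaging the results. In the sketch below a plain Python sum stands in for the AllReduce, each subset is a forward Monte Carlo estimate of the Neumann series, and the matrix, right-hand side, seeds, history counts, and function names are all assumptions made for illustration.

```python
import random


def sample_next(abs_row, row_sum, rng):
    """Sample column j with probability abs_row[j] / row_sum."""
    u = rng.random() * row_sum
    acc = 0.0
    last = 0
    for j, a in enumerate(abs_row):
        if a > 0.0:
            acc += a
            last = j
            if u < acc:
                return j
    return last  # guards against floating-point round-off


def mc_subset(H, v, n_histories, seed, history_length=40):
    """One subset M_n applied to v, using its own random number stream."""
    rng = random.Random(seed)
    abs_rows = [[abs(h) for h in row] for row in H]
    row_sums = [sum(row) for row in abs_rows]
    result = []
    for i in range(len(H)):
        total = 0.0
        for _ in range(n_histories):
            state, weight, tally = i, 1.0, v[i]
            for _ in range(history_length):
                s = row_sums[state]
                if s == 0.0:
                    break
                nxt = sample_next(abs_rows[state], s, rng)
                weight *= s if H[state][nxt] > 0.0 else -s
                state = nxt
                tally += weight * v[state]
            total += tally
        result.append(total / n_histories)
    return result


def replicated_apply(H, v, n_subsets, n_histories):
    """M_MC v = (1 / N_S) * sum_n M_n v; the sum stands in for the AllReduce."""
    parts = [mc_subset(H, v, n_histories, seed=1000 + n)
             for n in range(n_subsets)]
    combined = [sum(col) / n_subsets for col in zip(*parts)]
    return combined, parts


H = [[0.0, 0.2, 0.6],
     [0.4, 0.0, 0.4],
     [0.1, 0.4, 0.0]]
v = [1.0, 1.0, 1.0]
combined, parts = replicated_apply(H, v, n_subsets=4, n_histories=1000)
print(combined)
```

Because the subsets are statistically independent, losing one of them would only reduce the number of samples in the average rather than invalidate the result, which is the resiliency argument made for replication.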
Parallel Test - Simplified P_N (SP_N) Assembly Problem
Figure: SP_N solution example
The SP_N equations are an approximation to the Boltzmann neutron transport equation used to simulate nuclear reactors
$$-\nabla \cdot D_n \nabla U_n + \sum_{m=1}^{4} A_{nm} U_m = \frac{1}{k} \sum_{m=1}^{4} F_{nm} U_m$$
Scaling problem: 1 × 1 to 17 × 17 array of fuel assemblies with 289 pins each resolved by a 2 × 2 spatial mesh and 200 axial zones
7 energy groups, 1 angular moment, 1.6M to 273.5M degrees of freedom
64 to 10,800 computational cores via domain decomposition
We are usually interested in solving the generalized eigenvalue problem; we use the operator from these problems to test the kernel scaling
Monte Carlo Communication Parameters
Message Buffer Size   Message Check Frequency
                      128     256     512     1 024
256                   1.054   1.061   1.076   1.076
512                   1.103   1.146   1.211   1.270
1 024                 1.062   1.088   1.133   1.176
2 048                 1.030   1.042   1.072   1.107
4 096                 1.010   1.012   1.025   1.050
8 192                 1.001   1.000   1.008   1.018
16 384                1.017   1.003   1.010   1.009
OLCF Eos: 736-node Cray XC30, Intel Xeon E5-2670, 11,776 cores,
47 TB memory, Cray Aries interconnect
64 cores, 1.6M DOFs, history length of 15, 3 histories per DOF
Up to 27% increase in runtime observed for poor parameter choices (1.270 versus 1.000 in the table above)
Worth the time to do this parameter study when running on new
hardware
Monte Carlo Scaling
Cores     DOFs          DOFs/Core   Time Min (s)   Time Max (s)   Time Ave (s)   Efficiency
256       273 509 600   1 068 397   260.53         260.54         260.54         1.00
1 024     273 509 600   267 099     61.92          61.92          61.92          1.05
4 096     273 509 600   66 775
7 744     273 509 600   35 319
10 816    273 509 600   25 288
Table: Strong Scaling
Cores     DOFs          DOFs/Core   Time Min (s)   Time Max (s)   Time Ave (s)   Efficiency
64        1 618 400     25 288      6.432          6.432          6.432          1.00
256       6 473 600     25 288      6.493          6.493          6.493          0.99
1 024                   25 288
4 096                   25 288
7 744                   25 288
10 816                  25 288
Table: Weak Scaling
Subsets   Cores     DOFs        Time Min (s)   Time Max (s)   Time Ave (s)   Efficiency
1         256       6 473 600   6.493          6.493          6.493          1.00
2         512       6 473 600
3         768       6 473 600
4         1 024     6 473 600
Table: Replication Scaling. 256 cores per subset.
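The efficiency columns in the filled rows are consistent with the usual definitions of strong and weak scaling efficiency relative to the first row of each table. The check below assumes those definitions, which the slides do not state explicitly; the function names are illustrative.

```python
def strong_efficiency(ref_cores, ref_time, cores, time):
    """Fixed problem size: ideal time scales as 1 / cores."""
    return (ref_time * ref_cores) / (time * cores)


def weak_efficiency(ref_time, time):
    """Fixed work per core: ideal time stays constant."""
    return ref_time / time


# Average times taken from the strong and weak scaling tables.
strong = strong_efficiency(256, 260.54, 1024, 61.92)
weak = weak_efficiency(6.432, 6.493)
print(round(strong, 2), round(weak, 2))
```

A strong scaling efficiency slightly above 1 means the 1 024-core run was a little more than four times faster than the 256-core run.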
Monte Carlo Synthetic Acceleration (MCSA)
Devised by Evans and Mosher in the 2000s as an acceleration scheme
for radiation diffusion problems (LANL)
Can be abstracted as a general linear solver with Monte Carlo as a
preconditioner
First Richardson step hits the high frequency error modes and second
Monte Carlo step hits the low frequency error modes
$$r^{k} = b - A x^{k}$$
$$x^{k+1/2} = x^{k} + r^{k}$$
$$r^{k+1/2} = b - A x^{k+1/2}$$
$$x^{k+1} = x^{k+1/2} + M_{MC} r^{k+1/2}$$
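The four update equations translate almost line for line into code. The following is a rough serial sketch under stated assumptions, not the implementation behind the results in this deck: M_MC is realized by a forward Monte Carlo estimate of the truncated Neumann series, and the matrix, right-hand side, history length, history count, seed, and function names are illustrative choices.

```python
import random


def matvec(M, v):
    """Dense matrix-vector product on nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def sample_next(abs_row, row_sum, rng):
    """Sample column j with probability abs_row[j] / row_sum."""
    u = rng.random() * row_sum
    acc = 0.0
    last = 0
    for j, a in enumerate(abs_row):
        if a > 0.0:
            acc += a
            last = j
            if u < acc:
                return j
    return last  # guards against floating-point round-off


def mc_apply(H, r, n_histories, history_length, rng):
    """Forward Monte Carlo estimate of sum_k H^k r, standing in for M_MC r."""
    abs_rows = [[abs(h) for h in row] for row in H]
    row_sums = [sum(row) for row in abs_rows]
    out = []
    for i in range(len(H)):
        total = 0.0
        for _ in range(n_histories):
            state, weight, tally = i, 1.0, r[i]
            for _ in range(history_length):
                s = row_sums[state]
                if s == 0.0:
                    break
                nxt = sample_next(abs_rows[state], s, rng)
                weight *= s if H[state][nxt] > 0.0 else -s
                state = nxt
                tally += weight * r[state]
            total += tally
        out.append(total / n_histories)
    return out


def mcsa(A, b, n_iters, n_histories=500, history_length=10, seed=7):
    """MCSA iteration: a Richardson step followed by a Monte Carlo correction."""
    n = len(A)
    H = [[(1.0 if i == j else 0.0) - A[i][j] for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    x = [0.0] * n
    for _ in range(n_iters):
        r = [bi - axi for bi, axi in zip(b, matvec(A, x))]             # r^k
        x_half = [xi + ri for xi, ri in zip(x, r)]                     # x^{k+1/2}
        r_half = [bi - axi for bi, axi in zip(b, matvec(A, x_half))]   # r^{k+1/2}
        dx = mc_apply(H, r_half, n_histories, history_length, rng)     # M_MC r^{k+1/2}
        x = [xh + d for xh, d in zip(x_half, dx)]                      # x^{k+1}
    return x


A = [[1.0, -0.2, -0.6],
     [-0.4, 1.0, -0.4],
     [-0.1, -0.4, 1.0]]
b = [1.0, 1.0, 1.0]
x = mcsa(A, b, n_iters=30)
final_residual = max(abs(bi - axi) for bi, axi in zip(b, matvec(A, x)))
print(x, final_residual)
```

Because the Monte Carlo step is applied to the residual rather than to b, its statistical error is scaled by the size of the current residual, so the iteration can converge well below the accuracy of any single Monte Carlo estimate.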
Matrix-Free Algorithm
At each application of $M_{MC}$, execute the Monte Carlo process
Monte Carlo must be performed at every application; a better approximation requires more time and more operations
Variations in random number streams are amortized over iterations
Vast majority of solve time spent doing Monte Carlo
L     N_S   MC Time (s)   MC Fraction   MCSA Iters
3     1     30.885        0.96          266
3     2     60.869        0.98          261
5     1     27.422        0.97          180
5     2     54.319        0.98          175
10    1     23.871        0.98          102
10    2     45.551        0.99          97
15    1     50.395        0.98          164
15    2     42.951        0.99          69
15    3     65.292        0.99          68
25    2     70.505        0.99          78
25    3     63.677        1.00          47
Table: MCSA performance. A had 115 600 rows and 1 186 464 non-zero entries.
Stochastic Approximate Inverse Algorithm
Construct MMC as a sparse matrix by executing the Monte Carlo process
once and tallying the row entries
Use this operator as a stochastic approximation to the inverse
A better approximation to the inverse requires more setup time and more
storage
We will investigate a drop tolerance strategy to control sparsity
L     N_S   NNZ         NNZ Ratio   MC Time (s)   Setup Time (s)   MCSA Iters
3     2     484 714     0.41        0.104         0.671            255
3     3     622 123     0.52        0.145         0.705            255
5     2     783 153     0.66        0.158         0.737            185
5     3     1 032 573   0.87        0.237         0.831            171
5     4     1 241 442   1.05        0.302         0.906            171
10    3     1 969 540   1.66        0.433         1.061            95
10    4     2 416 572   2.04        0.570         1.214            95
15    3     2 867 005   2.42        0.645         1.317            132
15    4     3 544 181   2.99        0.833         1.534            67
15    5     4 157 269   3.50        1.029         1.765            66
Table: MCSA performance. A had 115 600 rows and 1 186 464 non-zero entries.
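One way to picture the tallying of row entries is to record, for each starting row i, the weight a history carries every time it visits a column j. The sketch below is an assumption-laden illustration of that idea on a 3 × 3 system: rows are stored as dictionaries, no drop tolerance is applied, and the matrix, seed, history counts, and function names are chosen only for the example.

```python
import random


def sample_next(abs_row, row_sum, rng):
    """Sample column j with probability abs_row[j] / row_sum."""
    u = rng.random() * row_sum
    acc = 0.0
    last = 0
    for j, a in enumerate(abs_row):
        if a > 0.0:
            acc += a
            last = j
            if u < acc:
                return j
    return last  # guards against floating-point round-off


def build_approximate_inverse(H, n_histories, history_length, rng):
    """Tally random-walk weights into the rows of M_MC (one dict per row)."""
    abs_rows = [[abs(h) for h in row] for row in H]
    row_sums = [sum(row) for row in abs_rows]
    M = []
    for i in range(len(H)):
        row = {}
        for _ in range(n_histories):
            state, weight = i, 1.0
            row[state] = row.get(state, 0.0) + weight   # k = 0 (identity) term
            for _ in range(history_length):
                s = row_sums[state]
                if s == 0.0:
                    break
                nxt = sample_next(abs_rows[state], s, rng)
                weight *= s if H[state][nxt] > 0.0 else -s
                state = nxt
                row[state] = row.get(state, 0.0) + weight
        M.append({j: w / n_histories for j, w in row.items()})
    return M


def apply_sparse(M, v):
    """Apply the stored operator to a vector without any further sampling."""
    return [sum(w * v[j] for j, w in row.items()) for row in M]


H = [[0.0, 0.2, 0.6],
     [0.4, 0.0, 0.4],
     [0.1, 0.4, 0.0]]
M = build_approximate_inverse(H, 4000, 40, random.Random(11))
b = [1.0, 1.0, 1.0]
x_approx = apply_sparse(M, b)
nnz = sum(len(row) for row in M)
print(x_approx, nnz)
```

Once M_MC is stored, each application costs a sparse matrix-vector product instead of a new Monte Carlo run, which is the trade of storage for operations described on this slide.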
Unpreconditioned Algorithm Comparison
No preconditioning, serial computation, fastest MCSA times reported
GMRES easier to precondition - performance here only indicates Monte
Carlo potential
These results indicate good stochastic approximate inverse performance for
traditional CPU architectures
Matrix-free approach may be more effective when vectorized for new
architectures by favoring operations over storage - 95%+ of the runtime
spent in Monte Carlo
Solver                     Setup Time (s)   Solve Time (s)   Total Time (s)   Iters
Richardson                 2.098            1.6709           3.769            1 017
MCSA Matrix-Free           2.104            24.389           26.493           102
MCSA Approximate Inverse   2.953            0.779            3.731            95
Belos GMRES                1.791            1.021            2.812            81
Table: A had 115 600 rows and 1 186 464 non-zero entries.
Current Work - Vectorization and Threading
We have implemented a Monte Carlo kernel using the Kokkos
threading model (Trilinos)
The kernel supports multi-threaded CPU, GPU, and Xeon Phi
architectures with a single implementation
Vectorization is an area of active research with a focus on
heterogeneous architectures (Titan and Summit)
Currently investigating event-based algorithms to enable vectorization
Event-based algorithm concepts are based on vectorized Monte Carlo
algorithms from particle transport
We are exploring an additive-Schwarz formulation to eliminate parallel
communication in the Monte Carlo kernel
Other threading models will be considered (e.g. HPX)
Conclusions and Future Work
Monte Carlo methods offer great potential for both resilient and highly
parallel solvers
A fully asynchronous algorithm provides scalability without collectives
Replication potentially offers resiliency with some overhead
Matrix-free and stochastic approximate inverse algorithms are
complementary - trade operations for storage
Extending methods to broader problem areas is a significant algorithmic
challenge and an attractive area for continued research
Explicit preconditioners are required for all problems
Performance modeling and resiliency simulations this FY
Fault injection studies using the xSim simulator

M.Pharm - Question Bank - Drug Delivery Systems
 
RCPE terms and cycles scenarios as of March 2024
RCPE terms and cycles scenarios as of March 2024RCPE terms and cycles scenarios as of March 2024
RCPE terms and cycles scenarios as of March 2024
 
CW marking grid Analytical BS - M Ahmad.docx
CW  marking grid Analytical BS - M Ahmad.docxCW  marking grid Analytical BS - M Ahmad.docx
CW marking grid Analytical BS - M Ahmad.docx
 
Digitized Continuous Magnetic Recordings for the August/September 1859 Storms...
Digitized Continuous Magnetic Recordings for the August/September 1859 Storms...Digitized Continuous Magnetic Recordings for the August/September 1859 Storms...
Digitized Continuous Magnetic Recordings for the August/September 1859 Storms...
 
Legacy Analysis of Dark Matter Annihilation from the Milky Way Dwarf Spheroid...
Legacy Analysis of Dark Matter Annihilation from the Milky Way Dwarf Spheroid...Legacy Analysis of Dark Matter Annihilation from the Milky Way Dwarf Spheroid...
Legacy Analysis of Dark Matter Annihilation from the Milky Way Dwarf Spheroid...
 
Unit 3, Herbal Drug Technology, B Pharmacy 6th Sem
Unit 3, Herbal Drug Technology, B Pharmacy 6th SemUnit 3, Herbal Drug Technology, B Pharmacy 6th Sem
Unit 3, Herbal Drug Technology, B Pharmacy 6th Sem
 
soft skills question paper set for bba ca
soft skills question paper set for bba casoft skills question paper set for bba ca
soft skills question paper set for bba ca
 
Shiva and Shakti: Presumed Proto-Galactic Fragments in the Inner Milky Way
Shiva and Shakti: Presumed Proto-Galactic Fragments in the Inner Milky WayShiva and Shakti: Presumed Proto-Galactic Fragments in the Inner Milky Way
Shiva and Shakti: Presumed Proto-Galactic Fragments in the Inner Milky Way
 
Role of herbs in hair care Amla and heena.pptx
Role of herbs in hair care  Amla and  heena.pptxRole of herbs in hair care  Amla and  heena.pptx
Role of herbs in hair care Amla and heena.pptx
 
KeyBio pipeline for bioinformatics and data science
KeyBio pipeline for bioinformatics and data scienceKeyBio pipeline for bioinformatics and data science
KeyBio pipeline for bioinformatics and data science
 
Applied Biochemistry feedback_M Ahwad 2023.docx
Applied Biochemistry feedback_M Ahwad 2023.docxApplied Biochemistry feedback_M Ahwad 2023.docx
Applied Biochemistry feedback_M Ahwad 2023.docx
 
3.2 Pests of Sorghum_Identification, Symptoms and nature of damage, Binomics,...
3.2 Pests of Sorghum_Identification, Symptoms and nature of damage, Binomics,...3.2 Pests of Sorghum_Identification, Symptoms and nature of damage, Binomics,...
3.2 Pests of Sorghum_Identification, Symptoms and nature of damage, Binomics,...
 

Slattery_SIAMCSE15

  • 1. Parallel Algorithms for Monte Carlo Linear Solvers. Stuart Slattery, Steven Hamilton, Tom Evans (PI). Oak Ridge National Laboratory, March 17, 2015.
  • 2. Acknowledgments This material is based upon work supported by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research program. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. Slattery, Hamilton, Evans (ORNL) Parallel Monte Carlo Solvers 3/17/15 2 / 22
  • 3. 1. Motivation 2. Monte Carlo Methods 3. Domain Decomposition and Replication 4. Scaling Studies 5. Algorithm Variations 6. Current/Future Work and Conclusions
  • 4. Motivation
      • As we move towards exascale computing, the rate of errors is expected to increase dramatically.
      • The probability that a compute node will fail during the course of a large-scale calculation may be near 1.
      • Algorithms need not only increased concurrency/scalability but also the ability to recover from hardware faults.
      • Target platforms are lightweight machines and heterogeneous machines, both characterized by low power and high concurrency.
  • 5. Towards Exascale Concurrency and Resiliency
      Two basic strategies:
      1. Start with current "state of the art" methods and make incremental modifications to improve scalability and fault tolerance.
          • Many efforts are heading in this direction, attempting to find additional concurrency to exploit.
      2. Start with methods having natural scalability and resiliency aspects and work at improving performance (e.g. Monte Carlo).
          • Soft failures introduce an additional stochastic error component.
          • Hard failures are potentially mitigated by replication.
          • Concurrency is enabled by several levels of parallelism.
  • 6. Monte Carlo for Linear Systems
      • Suppose we want to solve $Ax = b$.
      • If $\rho(I - A) < 1$, we can write the solution using the Neumann series
          $x = \sum_{n=0}^{\infty} (I - A)^n b = \sum_{n=0}^{\infty} H^n b$
        where $H \equiv (I - A)$ is the Richardson iteration matrix.
      • Build the Neumann series stochastically:
          $x_i = \sum_{k=0}^{\infty} \sum_{i_1}^{N} \sum_{i_2}^{N} \cdots \sum_{i_k}^{N} h_{i,i_1} h_{i_1,i_2} \cdots h_{i_{k-1},i_k} b_{i_k}$
      • Define a sequence of state transitions:
          $\nu = i \to i_1 \to \cdots \to i_{k-1} \to i_k$
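The deterministic Neumann series above can be checked numerically before any sampling is introduced. The following is a minimal sketch, not the authors' code: it uses the 3×3 matrix from the deck's sampling example, while the right-hand side `b`, the helper name `neumann_solve`, and the truncation length `n_terms` are assumptions made for illustration.

```python
import numpy as np

def neumann_solve(A, b, n_terms=200):
    """Approximate x = sum_{n>=0} H^n b with H = I - A, truncated at n_terms."""
    H = np.eye(A.shape[0]) - A
    x = np.zeros_like(b, dtype=float)
    term = b.astype(float)          # H^0 b
    for _ in range(n_terms):
        x += term                   # add H^n b
        term = H @ term             # advance to H^(n+1) b
    return x

# Matrix from the deck's sampling example; b is an assumed right-hand side.
A = np.array([[1.0, -0.2, -0.6],
              [-0.4, 1.0, -0.4],
              [-0.1, -0.4, 1.0]])
b = np.array([1.0, 2.0, 3.0])

spectral_radius = max(abs(np.linalg.eigvals(np.eye(3) - A)))
x_series = neumann_solve(A, b)
x_direct = np.linalg.solve(A, b)
```

The series converges here because the spectral radius of `I - A` is below 1; for a matrix that violates this condition the partial sums would diverge.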
  • 7. Forward Monte Carlo
      • Choose a row-stochastic matrix $P$ and weight matrix $W$ such that $H = P \circ W$.
      • Typical choice (Monte Carlo Almost-Optimal): $P_{ij} = \frac{|H_{ij}|}{\sum_{j=1}^{N} |H_{ij}|}$
      • To compute solution component $x_i$:
          • Start a history in state $i$ (with an initial weight of 1).
          • Transition to a new state $j$ based on the probabilities in row $i$ of $P$.
          • Modify the history weight based on the corresponding entry $W_{ij}$.
          • Add a contribution to $x_i$ based on the current history weight and the value of $b_j$.
      • A given random walk can only contribute to a single component of the solution vector, with $x \approx M_{MC} b$.
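The four steps above can be sketched as a short random-walk loop. This is an illustrative sketch under stated assumptions, not the authors' kernel: the function name `forward_mc_component`, the fixed history length used to terminate each walk, the seed, and the right-hand side `b` are all hypothetical; the `P` and `W` values are the ones given in the deck's sampling example.

```python
import numpy as np

# P and W from the deck's sampling example (H = P * W elementwise).
P = np.array([[0.0, 0.25, 0.75],
              [0.5, 0.0, 0.5],
              [0.2, 0.8, 0.0]])
W = np.array([[0.0, 0.8, 0.8],
              [0.8, 0.0, 0.8],
              [0.5, 0.5, 0.0]])
b = np.array([1.0, 2.0, 3.0])       # assumed right-hand side

def forward_mc_component(P, W, b, i, n_histories, length, rng):
    """Estimate x_i by averaging weighted tallies over random walks from state i."""
    n = len(b)
    total = 0.0
    for _ in range(n_histories):
        state, weight = i, 1.0
        tally = b[state]                       # k = 0 term of the series
        for _ in range(length):                # assumed fixed-length cutoff
            nxt = rng.choice(n, p=P[state])    # sample transition from row of P
            weight *= W[state, nxt]            # update history weight
            state = nxt
            tally += weight * b[state]         # contribution to x_i
        total += tally
    return total / n_histories

rng = np.random.default_rng(2015)
estimate = forward_mc_component(P, W, b, i=0, n_histories=2000, length=25, rng=rng)
exact = np.linalg.solve(np.eye(3) - P * W, b)  # A = I - H
```

Each call estimates one component only, which mirrors the slide's remark that a forward walk contributes to a single entry of the solution vector.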
  • 8. Sampling Example (Forward Monte Carlo)
      • Suppose
          $A = \begin{pmatrix} 1.0 & -0.2 & -0.6 \\ -0.4 & 1.0 & -0.4 \\ -0.1 & -0.4 & 1.0 \end{pmatrix} \;\to\; H \equiv (I - A) = \begin{pmatrix} 0.0 & 0.2 & 0.6 \\ 0.4 & 0.0 & 0.4 \\ 0.1 & 0.4 & 0.0 \end{pmatrix}$
        then
          $P = \begin{pmatrix} 0.0 & 0.25 & 0.75 \\ 0.5 & 0.0 & 0.5 \\ 0.2 & 0.8 & 0.0 \end{pmatrix}, \quad W = \begin{pmatrix} 0.0 & 0.8 & 0.8 \\ 0.8 & 0.0 & 0.8 \\ 0.5 & 0.5 & 0.0 \end{pmatrix}$
      • If a history is started in state 3, there is a 20% chance of it transitioning to state 1 and an 80% chance of it moving to state 2.
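The construction of `P` and `W` from `H` in this example can be reproduced in a few lines. The sketch below assumes the Monte Carlo Almost-Optimal choice (each row of `|H|` normalized by its sum) and that every row of `H` has at least one non-zero entry; the helper name `mao_split` is hypothetical.

```python
import numpy as np

def mao_split(H):
    """Split H into a row-stochastic P and weights W with H = P * W elementwise."""
    abs_h = np.abs(H)
    P = abs_h / abs_h.sum(axis=1, keepdims=True)   # assumes no all-zero rows
    # W_ij = H_ij / P_ij where P_ij > 0; entries with P_ij = 0 stay 0.
    W = np.divide(H, P, out=np.zeros_like(H), where=P > 0)
    return P, W

A = np.array([[1.0, -0.2, -0.6],
              [-0.4, 1.0, -0.4],
              [-0.1, -0.4, 1.0]])
H = np.eye(3) - A
P, W = mao_split(H)
```

With this choice every non-zero weight in a row equals the signed row sum of `|H|`, which is why the first two rows of `W` hold 0.8 and the third holds 0.5.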
  • 9. Solving the Heat Equation: Forward Method. Figure: Forward solution, $2.5 \times 10^3$ total histories.
  • 10. Solving the Heat Equation: Forward Method. Figure: Forward solution, $2.5 \times 10^4$ total histories.
  • 11. Solving the Heat Equation: Forward Method. Figure: Forward solution, $2.5 \times 10^5$ total histories.
  • 12. Solving the Heat Equation: Forward Method. Figure: Forward solution, $2.5 \times 10^6$ total histories.
  • 13. Domain Decomposed Monte Carlo
      • Each parallel process owns a piece of the domain (linear system).
      • Random walks must be transported between adjacent domains through parallel communication.
      • Domain decomposition is determined by the input system.
      • Load balancing is not yet addressed.
      Figure: Domain decomposition example (domains A, B, C) illustrating how domain-to-domain transport creates communication costs.
  • 14. Asynchronous Monte Carlo Transport Kernel
      • General extension of the Milagro algorithm (LANL).
      • Asynchronous nearest-neighbor communication of histories.
      • System-tunable communication parameters of buffer size and check frequency (performance impact).
      • Need an asynchronous strategy for exiting the transport loop without a collective (running sum).
      Figure: Flowchart of the transport loop. While the local source is not empty, transport a source history; while the local bank is not empty, transport a bank history; send a message if a buffer is full; at the check frequency, add received messages to the bank; when the local source and bank are empty, control termination and exit the loop on the completion signal.
  • 15. Exiting the Transport Loop without Collectives
      Figure: Master/Slave (processes 0 to 8 exchanging "# RUN" and "STOP" messages).
      Figure: Binary Tree (processes 0 to 8 exchanging "# RUN" and "STOP" messages).
      Figure: Weak scaling absolute efficiency versus number of cores (10 to 100,000) for MCSA Binary Tree and MCSA Master/Slave; efficiency axis from 0.50 to 1.10.
  • 16. Replication
      • Different batches of Monte Carlo samples can be combined in summation via superposition if they have different random number streams. For two different batches:
          $M_{MC} x = \frac{1}{2} (M_1 + M_2) x$
      • Consider each of these batches as independent subsets of a Monte Carlo operator, where the operator can now be formed as a general additive decomposition of $N_S$ subsets:
          $M_{MC} = \frac{1}{N_S} \sum_{n=1}^{N_S} M_n$
      • We replicate the linear problem and form each subset on a different group of parallel processes.
      • Applying the subsets to a vector requires an AllReduce to form the sum.
      • Each subset is domain decomposed.
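The additive decomposition can be illustrated serially by giving each subset its own random number stream and averaging the results, which stands in for the AllReduce across process groups. This is a hedged sketch: `apply_subset`, the history count, the fixed walk length, the seed, and the vector `b` are assumptions, and the `P` and `W` values come from the deck's sampling example.

```python
import numpy as np

P = np.array([[0.0, 0.25, 0.75], [0.5, 0.0, 0.5], [0.2, 0.8, 0.0]])
W = np.array([[0.0, 0.8, 0.8], [0.8, 0.0, 0.8], [0.5, 0.5, 0.0]])
b = np.array([1.0, 2.0, 3.0])       # assumed vector to apply the operator to

def apply_subset(r, n_histories, length, rng):
    """One subset M_n applied to r: forward random walks started from every row."""
    n = len(r)
    out = np.zeros(n)
    for i in range(n):
        for _ in range(n_histories):
            state, weight, tally = i, 1.0, r[i]
            for _ in range(length):
                nxt = rng.choice(n, p=P[state])
                weight *= W[state, nxt]
                state = nxt
                tally += weight * r[state]
            out[i] += tally
    return out / n_histories

n_subsets = 4
# Independent child streams, one per subset (one per process group in parallel).
streams = np.random.SeedSequence(2015).spawn(n_subsets)
partials = [apply_subset(b, 200, 25, np.random.default_rng(s)) for s in streams]
combined = sum(partials) / n_subsets          # serial stand-in for the AllReduce
exact = np.linalg.solve(np.eye(3) - P * W, b)
```

Because the subsets are statistically independent, losing one of them would leave a valid, if noisier, estimate from the remaining ones.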
  • 17. Parallel Test: Simplified $P_N$ ($SP_N$) Assembly Problem
      Figure: $SP_N$ solution example.
      • The $SP_N$ equations are an approximation to the Boltzmann neutron transport equation used to simulate nuclear reactors:
          $-\nabla \cdot D_n \nabla U_n + \sum_{m=1}^{4} A_{nm} U_m = \frac{1}{k} \sum_{m=1}^{4} F_{nm} U_m$
      • Scaling problem: 1 × 1 to 17 × 17 array of fuel assemblies with 289 pins, each resolved by a 2 × 2 spatial mesh and 200 axial zones.
      • 7 energy groups, 1 angular moment, 1.6M to 273.5M degrees of freedom.
      • 64 to 10,800 computational cores via domain decomposition.
      • We are usually interested in solving a generalized eigenvalue problem; we use the operator from these problems to test the kernel scaling.
  • 18. Monte Carlo Communication Parameters
      Message Buffer Size    Message Check Frequency
                             128      256      512      1,024
      256                    1.054    1.061    1.076    1.076
      512                    1.103    1.146    1.211    1.270
      1,024                  1.062    1.088    1.133    1.176
      2,048                  1.030    1.042    1.072    1.107
      4,096                  1.010    1.012    1.025    1.050
      8,192                  1.001    1.000    1.008    1.018
      16,384                 1.017    1.003    1.010    1.009
      • OLCF Eos: 736-node Cray XC30, Intel Xeon E5-2670, 11,776 cores, 47 TB memory, Cray Aries interconnect.
      • 64 cores, 1.6M DOFs, history length of 15, 3 histories per DOF.
      • 27% increase in runtime observed for bad parameter choices.
      • It is worth the time to do this parameter study when running on new hardware.
  • 19. Monte Carlo Scaling
      Table: Strong Scaling
      Cores     DOFs           DOFs/Core    Time Min (s)   Time Max (s)   Time Ave (s)   Efficiency
      256       273,509,600    1,068,397    260.53         260.54         260.54         1.00
      1,024     273,509,600    267,099      61.92          61.92          61.92          1.05
      4,096     273,509,600    66,775
      7,744     273,509,600    35,319
      10,816    273,509,600    25,288

      Table: Weak Scaling
      Cores     DOFs           DOFs/Core    Time Min (s)   Time Max (s)   Time Ave (s)   Efficiency
      64        1,618,400      25,288       6.432          6.432          6.432          1.00
      256       6,473,600      25,288       6.493          6.493          6.493          0.99
      1,024                    25,288
      4,096                    25,288
      7,744                    25,288
      10,816                   25,288

      Table: Replication Scaling. 256 cores per subset.
      Subsets   Cores     DOFs         Time Min (s)   Time Max (s)   Time Ave (s)   Efficiency
      1         256       6,473,600    6.493          6.493          6.493          1.00
      2         512       6,473,600
      3         768       6,473,600
      4         1,024     6,473,600
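The efficiency columns are consistent with the usual definitions of strong and weak scaling efficiency. The sketch below assumes those standard definitions (the slides do not state them) and applies them to the populated rows of the tables; the function names are hypothetical.

```python
def strong_scaling_efficiency(cores_ref, time_ref, cores, time):
    """Fixed problem size: ideal time shrinks in proportion to the core count."""
    return (cores_ref * time_ref) / (cores * time)

def weak_scaling_efficiency(time_ref, time):
    """Fixed work per core: ideal time stays constant as cores are added."""
    return time_ref / time

# Average times taken from the populated rows of the scaling tables.
strong = strong_scaling_efficiency(256, 260.54, 1024, 61.92)
weak = weak_scaling_efficiency(6.432, 6.493)

print(f"strong scaling, 256 -> 1,024 cores: {strong:.2f}")
print(f"weak scaling, 64 -> 256 cores: {weak:.2f}")
```

Under these definitions a strong scaling efficiency above 1.00 indicates super-linear speedup relative to the 256-core reference.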
  • 20. Monte Carlo Synthetic Acceleration (MCSA)
      • Devised by Evans and Mosher in the 2000s as an acceleration scheme for radiation diffusion problems (LANL).
      • Can be abstracted as a general linear solver with Monte Carlo as a preconditioner.
      • The first (Richardson) step hits the high-frequency error modes and the second (Monte Carlo) step hits the low-frequency error modes:
          $r^k = b - A x^k$
          $x^{k+1/2} = x^k + r^k$
          $r^{k+1/2} = b - A x^{k+1/2}$
          $x^{k+1} = x^{k+1/2} + M_{MC} r^{k+1/2}$
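The four-line iteration can be written as a short loop with the correction operator passed in as a callable. This is a sketch under stated assumptions rather than the authors' solver: `mcsa`, `truncated_neumann`, and `no_correction` are hypothetical names, and a deterministic truncated Neumann series is swapped in for the Monte Carlo operator $M_{MC}$ so that the example is reproducible. The matrix is the one from the deck's sampling example and `b` is assumed.

```python
import numpy as np

def mcsa(A, b, apply_mmc, tol=1e-10, max_iters=500):
    """Richardson step followed by a correction from apply_mmc, until converged."""
    x = np.zeros_like(b, dtype=float)
    for k in range(max_iters):
        r = b - A @ x                          # r^k
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k
        x_half = x + r                         # x^{k+1/2}
        r_half = b - A @ x_half                # r^{k+1/2}
        x = x_half + apply_mmc(r_half)         # x^{k+1}
    return x, max_iters

def truncated_neumann(A, n_terms):
    """Deterministic stand-in for M_MC: sum of the first n_terms of H^n r."""
    H = np.eye(A.shape[0]) - A
    def apply(r):
        out = np.zeros_like(r)
        term = r.copy()
        for _ in range(n_terms):
            out += term
            term = H @ term
        return out
    return apply

def no_correction(r):
    """Zero correction, which reduces the loop to plain Richardson iteration."""
    return np.zeros_like(r)

A = np.array([[1.0, -0.2, -0.6],
              [-0.4, 1.0, -0.4],
              [-0.1, -0.4, 1.0]])
b = np.array([1.0, 2.0, 3.0])

x_mcsa, iters_mcsa = mcsa(A, b, truncated_neumann(A, 3))
x_rich, iters_rich = mcsa(A, b, no_correction)
```

Passing the operator as a callable keeps the outer loop unchanged whether the correction comes from a matrix-free random-walk kernel or from a stored sparse approximation.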
  • 21. Matrix-Free Algorithm
      • At each application of $M_{MC}$, execute the Monte Carlo process.
      • Monte Carlo must be performed every time the operator is applied; a better approximation requires more time and more operations.
      • Variations in random number streams are amortized over iterations.
      • The vast majority of solve time is spent doing Monte Carlo.
      Table: MCSA performance. A had 115,600 rows and 1,186,464 non-zero entries.
      L     N_S   MC Time (s)   MC Fraction   MCSA Iters
      3     1     30.885        0.96          266
      3     2     60.869        0.98          261
      5     1     27.422        0.97          180
      5     2     54.319        0.98          175
      10    1     23.871        0.98          102
      10    2     45.551        0.99          97
      15    1     50.395        0.98          164
      15    2     42.951        0.99          69
      15    3     65.292        0.99          68
      25    2     70.505        0.99          78
      25    3     63.677        1.00          47
  • 22. Stochastic Approximate Inverse Algorithm
      • Construct $M_{MC}$ as a sparse matrix by executing the Monte Carlo process once and tallying the row entries.
      • Use this operator as a stochastic approximation to the inverse.
      • A better approximation to the inverse requires more setup time and more storage.
      • We will investigate a drop tolerance strategy to control sparsity.
      Table: MCSA performance. A had 115,600 rows and 1,186,464 non-zero entries.
      L     N_S   NNZ          NNZ Ratio   MC Time (s)   Setup Time (s)   MCSA Iters
      3     2     484,714      0.41        0.104         0.671            255
      3     3     622,123      0.52        0.145         0.705            255
      5     2     783,153      0.66        0.158         0.737            185
      5     3     1,032,573    0.87        0.237         0.831            171
      5     4     1,241,442    1.05        0.302         0.906            171
      10    3     1,969,540    1.66        0.433         1.061            95
      10    4     2,416,572    2.04        0.570         1.214            95
      15    3     2,867,005    2.42        0.645         1.317            132
      15    4     3,544,181    2.99        0.833         1.534            67
      15    5     4,157,269    3.50        1.029         1.765            66
  • 23. Unpreconditioned Algorithm Comparison
      • No preconditioning, serial computation, fastest MCSA times reported.
      • GMRES is easier to precondition; performance here only indicates Monte Carlo potential.
      • These results indicate good stochastic approximate inverse performance for traditional CPU architectures.
      • The matrix-free approach may be more effective when vectorized for new architectures by favoring operations over storage; 95%+ of the runtime is spent in Monte Carlo.
      Table: A had 115,600 rows and 1,186,464 non-zero entries.
      Solver                      Setup Time (s)   Solve Time (s)   Total Time (s)   Iters
      Richardson                  2.098            1.6709           3.769            1,017
      MCSA Matrix-Free            2.104            24.389           26.493           102
      MCSA Approximate Inverse    2.953            0.779            3.731            95
      Belos GMRES                 1.791            1.021            2.812            81
  • 24. Current Work: Vectorization and Threading
      • We have implemented a Monte Carlo kernel using the Kokkos threading model (Trilinos).
      • The kernel supports multi-threaded CPU, GPU, and Xeon Phi architectures with a single implementation.
      • Vectorization is an area of active research with a focus on heterogeneous architectures (Titan and Summit).
      • We are currently investigating event-based algorithms to enable vectorization.
      • Event-based algorithm concepts are based on vectorized Monte Carlo algorithms from particle transport.
      • We are exploring an additive-Schwarz formulation to eliminate parallel communication in the Monte Carlo kernel.
      • Other threading models will be considered (e.g. HPX).
  • 25. Conclusions and Future Work
      • Monte Carlo methods offer great potential for both resilient and highly parallel solvers.
      • A fully asynchronous algorithm provides scalability without collectives.
      • Replication potentially offers resiliency with some overhead.
      • Matrix-free and stochastic approximate inverse algorithms are complementary: they trade operations for storage.
      • Extending the methods to broader problem areas is a significant algorithmic challenge and an attractive area for continued research.
      • Explicit preconditioners are required for all problems.
      • Performance modeling and resiliency simulations are planned for this FY.
      • Fault injection studies will use the xSim simulator.