A challenge for thread parallelism on OpenFOAM
YOSHIFUJI Naoki*
TOMIOKA Minoru
FUJIWARA Ko
SAWAHARA Masataka
ITO Yuki
MARUISHI Takafumi
Otherwise noted, available under GPL version 3; ©2019 Fixstars Corporation
Who we are
Japanese software company
– Accelerating customers’ software
– In any area, on any device
Professionals in software speedup
– Not a manufacturer that uses CAE software
– Not a CAE software developer
Who I am
Computational Civil Engineer
– Lead Engineer @ Solution Div., Fixstars Corporation
– Doctoral student @ Coastal and Ocean Lab., Nagoya University
Interests and expertise
– High performance computing (HPC)
– Computational Fluid Dynamics (CFD)
– Software speedup (on anything from SoCs to supercomputers)
Name: YOSHIFUJI Naoki / 𠮷藤 尚生
Online ID: @LWisteria
Email: yoshifuji@fixstars.com
– Feel free to contact me about anything
What we’ve done
x13.5 speedup
1. Abstract
• Case: OpenFOAM Benchmark Test case "channelReTau110"
provided by The Open CAE Society of Japan
• Solver: pimpleFoam with DIC-PCG
• Base OpenFOAM version: OpenFOAM Foundation version 16b559c1
• Time: average of the first five time steps
• Computer: Intel Ninja Developer Platform
(Intel Xeon Phi 7210, DDR4)
Abstract
1. Experimental implementation with OpenMP
2. Target: pimpleFoam for the channel flow benchmark
3. Solver: DIC-PCG, one of the most challenging cases for thread parallelism
4. Measured speedup factor: x13.5 (excluding the Cuthill-McKee (CM) reordering) over a single process on Intel Knights Landing
5. This study shows the potential of thread parallelism
6. Improvements and investigation with other cases will continue in the future
Table of Contents
1. Abstract
2. Background and motivation
3. Parallel methodology
4. Performance measurement
5. Conclusion and future work
2. Background and motivation
Modern engineering and OpenFOAM
• Product designers and engineers need CAE
• OpenFOAM is one of the most widely used CAE software packages
• Speeding up OpenFOAM is important in modern engineering
OpenFOAM is slow on modern computers
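Quoted from Imano (2017): "Fluid analysis benchmark tests with OpenFOAM: channel and box-fan flow analyses on FOCUS, the cloud, and supercomputers" (OpenFOAMによる流体解析ベンチマークテスト FOCUS・クラウド・スパコンでのチャネルおよびボックスファン流れ解析), 17th PC Cluster Symposium, p. 19.
Copyright 2017 OCAEL. All rights reserved.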
Difference of computers
Item | Older computers | Modern computers
Num. of CPU cores | Single / a few | Many
Num. of computer nodes | A few | Many
CPU speed relative to interconnect speed | Low | High
i.e.
Num. of MPI processes | A few | Massive
MPI management cost relative to arithmetic operations | Light | Heavy
MPI communication cost relative to arithmetic operations | Light | Heavy
Solution: Parallelism
Item | Current OpenFOAM | This study
Framework | MPI | OpenMP
Mechanism | Process | Thread
Data communication | Message passing | Shared memory
Scope | All cores (within and across nodes) | Cores within the same node
i.e.
Management cost | Heavy | Light
Communication cost | Heavy | Light
• Using OpenMP could speed up OpenFOAM
Our goal
1. Implement thread parallelism with OpenMP for intra-node parallelism (hybrid parallelism)
2. Measure the performance improvement
3. Share the implementation and results with the world
[Figure: hybrid parallelism, with 16 CPU cores in four groups; OpenMP threads within each group and MPI between the groups]
Our goal in this study
This study shows work in progress and incomplete results
pimpleFoam with DIC-PCG only
– To estimate the worst-case improvement
Single node only
– MPI cost is small
– Expected to be about as fast as flat MPI
• Possibly faster on multiple nodes
Extra motivation for our business
Outreach to customers
– Provide an example of Fixstars’ work
– We’d be happy if you asked us to speed up your software
https://www.fixstars.com/en/service/acceleration/
Employee training
– Provide an exercise for Fixstars’ engineers
– A CPU-only problem is good for beginners
3. Parallel methodology
Target in this study
Speed up / parallelize the solution of sparse linear systems
– Generally known to take a large part of CFD run time
Solver: DIC-PCG
– Diagonal Incomplete Cholesky preconditioner
– Preconditioned Conjugate Gradient
Many-core CPU, single node only
Tackle one of the hardest cases for thread parallelism
– To estimate the worst-case improvement
Components of DIC-PCG
Amul: sparse matrix-vector multiplication (SpMV)
DIC precondition: application of the DIC preconditioner
WAXPBY: vector addition, w = αx + βy
sumMag: sum of the absolute values of a vector's elements
sumProd: vector inner product
DIC-PCG consists only of these matrix/vector operations
– Each is element-independent, and thus easy to parallelize (in principle)
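The following minimal serial PCG loop in C++ shows where each of the five components is called. It is a sketch, not OpenFOAM's PCG, and it makes three assumptions. First, it uses a small symmetric positive-definite matrix stored in a hypothetical CSR struct. Second, it uses diagonal (Jacobi) scaling as a stand-in for the DIC preconditioner. Third, it stops when sumMag of the residual falls below a fixed tolerance, rather than using OpenFOAM's normalised residual. All function names are illustrative.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Hypothetical CSR container (illustrative only, not OpenFOAM's lduMatrix).
struct Csr {
    std::vector<std::size_t> offset, column;
    Vec data;
    std::size_t n() const { return offset.size() - 1; }
};

// Amul: y = A x
void amul(const Csr& A, const Vec& x, Vec& y) {
    for (std::size_t i = 0; i < A.n(); ++i) {
        double s = 0.0;
        for (std::size_t k = A.offset[i]; k < A.offset[i + 1]; ++k)
            s += A.data[k] * x[A.column[k]];
        y[i] = s;
    }
}

// Stand-in preconditioner: diagonal (Jacobi) scaling w = D^-1 r, NOT DIC.
void precondition(const Csr& A, const Vec& r, Vec& w) {
    for (std::size_t i = 0; i < A.n(); ++i)
        for (std::size_t k = A.offset[i]; k < A.offset[i + 1]; ++k)
            if (A.column[k] == i) w[i] = r[i] / A.data[k];
}

// WAXPBY: w = alpha*x + beta*y (w may be the same vector as x or y)
void waxpby(double alpha, const Vec& x, double beta, const Vec& y, Vec& w) {
    for (std::size_t i = 0; i < w.size(); ++i) w[i] = alpha * x[i] + beta * y[i];
}

// sumProd: inner product
double sumProd(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// sumMag: sum of absolute values, used here as the residual measure
double sumMag(const Vec& a) {
    double s = 0.0;
    for (double v : a) s += std::fabs(v);
    return s;
}

// Preconditioned CG; x holds the initial guess. Returns the iteration count.
int pcg(const Csr& A, const Vec& b, Vec& x, double tol, int maxIter) {
    const std::size_t n = A.n();
    Vec r(n), z(n), p(n), q(n);
    amul(A, x, q);
    waxpby(1.0, b, -1.0, q, r);            // r = b - A x
    if (sumMag(r) < tol) return 0;
    precondition(A, r, z);                 // z = M^-1 r
    p = z;
    double rz = sumProd(r, z);
    for (int it = 1; it <= maxIter; ++it) {
        amul(A, p, q);                     // q = A p
        const double alpha = rz / sumProd(p, q);
        waxpby(1.0, x, alpha, p, x);       // x += alpha p
        waxpby(1.0, r, -alpha, q, r);      // r -= alpha q
        if (sumMag(r) < tol) return it;
        precondition(A, r, z);
        const double rzNew = sumProd(r, z);
        waxpby(1.0, z, rzNew / rz, p, p);  // p = z + beta p
        rz = rzNew;
    }
    return maxIter;
}
```

With this split, parallelizing DIC-PCG reduces to parallelizing `amul`, `precondition`, `waxpby`, `sumProd` and `sumMag` one at a time. The loop itself stays serial.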
Parallelization of DIC-PCG with OpenMP
Difficult
– lduMatrix format for Amul
– DIC’s substitution operations
Easy
– Elementwise operations
• WAXPBY
– Parallel reductions
• sumMag
• sumProd
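The following minimal OpenMP sketch covers the three "easy" kernels. It assumes plain `std::vector<double>` storage rather than OpenFOAM's Field types. The function names mirror the slide but are hypothetical. If the code is compiled without OpenMP, the pragmas are ignored and the loops run serially.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// WAXPBY: w = alpha*x + beta*y.
// Iteration i touches only element i, so a plain "parallel for" is race-free.
void waxpby(double alpha, const std::vector<double>& x,
            double beta, const std::vector<double>& y,
            std::vector<double>& w)
{
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(w.size());
    #pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < n; ++i)
        w[i] = alpha * x[i] + beta * y[i];
}

// sumMag: sum of |a_i|.
// reduction(+ : s) gives each thread a private partial sum and adds the
// partial sums at the end, so threads never update "s" concurrently.
double sumMag(const std::vector<double>& a)
{
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(a.size());
    double s = 0.0;
    #pragma omp parallel for reduction(+ : s)
    for (std::ptrdiff_t i = 0; i < n; ++i)
        s += std::fabs(a[i]);
    return s;
}

// sumProd: inner product a . b, using the same parallel-reduction pattern.
double sumProd(const std::vector<double>& a, const std::vector<double>& b)
{
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(a.size());
    double s = 0.0;
    #pragma omp parallel for reduction(+ : s)
    for (std::ptrdiff_t i = 0; i < n; ++i)
        s += a[i] * b[i];
    return s;
}
```

One caveat: a parallel reduction adds the partial sums in a different order from the serial loop. As a result, sumMag and sumProd can differ from the single-thread result in the last bits, and can vary with the thread count.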
lduMatrix SpMV parallelization
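// Naive parallelization of the face loop; NOT race-free
#pragma omp parallel for
for (label face=0; face<nFaces; face++)
{
    // Faces that share a cell update the same ApsiPtr element concurrently
    ApsiPtr[uPtr[face]] += lowerPtr[face]*psiPtr[lPtr[face]];
    ApsiPtr[lPtr[face]] += upperPtr[face]*psiPtr[uPtr[face]];
}
Based on src/OpenFOAM/matrices/lduMatrix/lduMatrix/lduMatrixATmul.C, Foam::lduMatrix::Amul()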
face | 0 | 1 | 2
lPtr | 0 | 0 | 2
uPtr | 2 | 3 | 3
Matrix (4×4):
1  0  5  6
0  2  0  0
8  0  3  7
9  0 10  4
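Dependency among faces
– Data race: different faces write the same element at the same time (faces 0 and 1 both update ApsiPtr[0]; faces 1 and 2 both update ApsiPtr[3])
→ The lduMatrix face loop cannot be parallelized as is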
lduMatrix to CSR format
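Compressed Sparse Row (CSR)
– Widely used sparse matrix format
– Stores each row's nonzeros contiguously, so each element of the product can be computed independently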
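The following minimal C++ sketch converts LDU storage to CSR. It assumes OpenFOAM-style LDU arrays (diag, upper, lower, lowerAddr, upperAddr, with the same meaning as in the appendix example), held in `std::vector` rather than OpenFOAM's own containers. `CsrMatrix` and `lduToCsr` are hypothetical names. The sketch favours clarity over speed. A faster version would build the offsets once from the mesh addressing and refill only the values when the coefficients change.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical CSR container.
struct CsrMatrix {
    std::vector<double> data;         // element values, row by row
    std::vector<std::size_t> column;  // column index of each value
    std::vector<std::size_t> offset;  // start of each row in data/column (size n+1)
};

// Convert LDU storage to CSR.
// Face f couples cells l = lowerAddr[f] and u = upperAddr[f] (l < u):
//   upper[f] is the entry at (row l, column u), lower[f] the entry at (row u, column l).
CsrMatrix lduToCsr(const std::vector<double>& diag,
                   const std::vector<double>& upper,
                   const std::vector<double>& lower,
                   const std::vector<std::size_t>& lowerAddr,
                   const std::vector<std::size_t>& upperAddr)
{
    const std::size_t n = diag.size();
    // Gather (column, value) pairs per row.
    std::vector<std::vector<std::pair<std::size_t, double>>> rows(n);
    for (std::size_t i = 0; i < n; ++i)
        rows[i].emplace_back(i, diag[i]);
    for (std::size_t f = 0; f < upper.size(); ++f) {
        rows[lowerAddr[f]].emplace_back(upperAddr[f], upper[f]);
        rows[upperAddr[f]].emplace_back(lowerAddr[f], lower[f]);
    }
    // Flatten into CSR with the columns sorted inside each row.
    CsrMatrix A;
    A.offset.push_back(0);
    for (auto& row : rows) {
        std::sort(row.begin(), row.end());
        for (const auto& entry : row) {
            A.column.push_back(entry.first);
            A.data.push_back(entry.second);
        }
        A.offset.push_back(A.data.size());
    }
    return A;
}
```

Take the 4×4 example from the appendix: diag = 1, 2, 3, 4; upper = 5, 6, 7; lower = 8, 9, 10; lowerAddr = 0, 0, 2; upperAddr = 2, 3, 3. For that input, the function returns the following arrays:
- data = [1, 5, 6, 2, 8, 3, 7, 9, 10, 4]
- column = [0, 2, 3, 1, 0, 2, 3, 0, 2, 3]
- offset = [0, 3, 4, 7, 10]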
DIC preconditioner
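// Forward substitution with a naive "parallel for"; NOT race-free
#pragma omp parallel for
for (label face=0; face<nFaces; face++)
{
    // Writes wAPtr[uPtr[face]], which other faces may also write, and reads
    // wAPtr[lPtr[face]], which an earlier face may have to update first
    wAPtr[uPtr[face]] -= rDPtr[uPtr[face]]*upperPtr[face]*wAPtr[lPtr[face]];
}
Based on src/OpenFOAM/matrices/lduMatrix/preconditioners/DICPreconditioner/DICPreconditioner.C, Foam::DICPreconditioner::precondition()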
face | 0 | 1 | 2
lPtr | 0 | 0 | 2
uPtr | 2 | 3 | 3
Matrix (4×4):
1  0  5  6
0  2  0  0
8  0  3  7
9  0 10  4
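Substitution phase (forward)
– Face 2 reads wA[2] (lPtr[2] = 2), which face 0 updates (uPtr[0] = 2), so face 2 must wait for face 0
– Running them concurrently is a data race: the result would change
→ Cannot be parallelized as is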
Cuthill-McKee ordering
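Cells (3×3 regular mesh):
6 7 8
3 4 5
0 1 2
Example: 4 neighbour points per cell (5-point stencil) on a regular mesh
Matrix (9×9, "." = 0):
4 1 . 1 . . . . .
1 4 1 . 1 . . . .
. 1 4 . . 1 . . .
1 . . 4 1 . 1 . .
. 1 . 1 4 1 . 1 .
. . 1 . 1 4 . . 1
. . . 1 . . 4 1 .
. . . . 1 . 1 4 1
. . . . . 1 . 1 4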
Cuthill-McKee ordering: dependency
– Same mesh and matrix as above
– In the forward substitution, each cell depends on its lower-numbered neighbours (here, the cells to its left and below it)
Cuthill-McKee ordering: independent among colors
– A breadth-first search from cell 0 groups the cells into levels (colors): {0}, {1, 3}, {2, 4, 6}, {5, 7}, {8}
– Cells of the same color share no face, so they do not depend on each other
→ Cells of the same color can be executed in parallel; the colors are processed one after another
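The following minimal C++ sketch applies the Cuthill-McKee step to the 3×3 example mesh. It is written for illustration only: `gridGraph`, `cuthillMcKee` and `levelsAreIndependent` are hypothetical names, and the authors' implementation (start cell, tie-breaking) may differ. Cuthill-McKee is a breadth-first search that visits lower-degree neighbours first. The BFS depth of each cell gives the colors (levels) described above.

```cpp
#include <algorithm>
#include <cstddef>
#include <queue>
#include <vector>

using Graph = std::vector<std::vector<std::size_t>>;  // cell -> face neighbours

// Face-neighbour graph of an nx-by-ny regular mesh, cells numbered row by row.
Graph gridGraph(std::size_t nx, std::size_t ny)
{
    Graph g(nx * ny);
    for (std::size_t j = 0; j < ny; ++j)
        for (std::size_t i = 0; i < nx; ++i) {
            const std::size_t c = j * nx + i;
            if (i > 0) g[c].push_back(c - 1);
            if (i + 1 < nx) g[c].push_back(c + 1);
            if (j > 0) g[c].push_back(c - nx);
            if (j + 1 < ny) g[c].push_back(c + nx);
        }
    return g;
}

// Cuthill-McKee: breadth-first search from `start`, visiting unvisited
// neighbours in order of increasing degree (ties broken by index). Fills the
// new ordering and each cell's BFS level. Assumes a connected mesh.
void cuthillMcKee(const Graph& g, std::size_t start,
                  std::vector<std::size_t>& order, std::vector<std::size_t>& level)
{
    const std::size_t unvisited = g.size();  // no valid level reaches this value
    level.assign(g.size(), unvisited);
    order.clear();
    std::queue<std::size_t> q;
    level[start] = 0;
    q.push(start);
    while (!q.empty()) {
        const std::size_t c = q.front();
        q.pop();
        order.push_back(c);
        std::vector<std::size_t> nbrs = g[c];
        std::sort(nbrs.begin(), nbrs.end(), [&g](std::size_t a, std::size_t b) {
            return g[a].size() != g[b].size() ? g[a].size() < g[b].size() : a < b;
        });
        for (std::size_t nb : nbrs)
            if (level[nb] == unvisited) {
                level[nb] = level[c] + 1;
                q.push(nb);
            }
    }
}

// True if no two face neighbours share a level, i.e. every level is an
// independent set whose cells can be processed concurrently.
bool levelsAreIndependent(const Graph& g, const std::vector<std::size_t>& level)
{
    for (std::size_t c = 0; c < g.size(); ++c)
        for (std::size_t nb : g[c])
            if (level[nb] == level[c]) return false;
    return true;
}
```

Starting from cell 0, the search yields the levels {0}, {1, 3}, {2, 4, 6}, {5, 7}, {8}. In the new ordering (0, 1, 3, 2, 4, 6, 5, 7, 8), each level forms a contiguous block. On this regular mesh no two face neighbours share a level. On a general unstructured mesh they can, so a check like `levelsAreIndependent` is worth keeping.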
Parallelization of DIC-PCG
Amul: CSR format
DIC precondition: Cuthill-McKee ordering
WAXPBY: parallel elementwise operation
sumMag: parallel reduction
sumProd: parallel reduction
→ The whole DIC-PCG can be parallelized
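The colors make the DIC substitution parallel: the colors (levels) run in sequence, and the cells inside one level are split among threads. The C++ sketch below makes one substitution. Instead of reproducing the authors' Cuthill-McKee renumbering, it uses generic level scheduling, which derives levels directly from the lower-triangular dependencies of a CSR matrix. On the 3×3 example mesh these levels coincide with the Cuthill-McKee wavefronts {0}, {1, 3}, {2, 4, 6}, {5, 7}, {8}. Here `rD` stands for the reciprocal DIC diagonal. The struct and function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical CSR container; columns are sorted within each row.
struct Csr {
    std::vector<std::size_t> offset, column;
    std::vector<double> data;
};

// 5-point matrix of an nx-by-ny regular mesh: 4 on the diagonal, 1 per face neighbour.
Csr gridMatrix(std::size_t nx, std::size_t ny)
{
    Csr A;
    A.offset.push_back(0);
    for (std::size_t c = 0; c < nx * ny; ++c) {
        const std::size_t i = c % nx, j = c / nx;
        auto add = [&A](std::size_t col, double v) { A.column.push_back(col); A.data.push_back(v); };
        if (j > 0) add(c - nx, 1.0);       // columns pushed in increasing order
        if (i > 0) add(c - 1, 1.0);
        add(c, 4.0);
        if (i + 1 < nx) add(c + 1, 1.0);
        if (j + 1 < ny) add(c + nx, 1.0);
        A.offset.push_back(A.column.size());
    }
    return A;
}

// Group rows into levels: level[i] = 1 + max level of its lower neighbours (j < i).
// A row depends only on rows in earlier levels, never on rows in its own level.
std::vector<std::vector<std::size_t>> lowerLevels(const Csr& A)
{
    const std::size_t n = A.offset.size() - 1;
    std::vector<std::size_t> level(n, 0);
    std::vector<std::vector<std::size_t>> levels;
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = A.offset[i]; k < A.offset[i + 1]; ++k)
            if (A.column[k] < i) level[i] = std::max(level[i], level[A.column[k]] + 1);
        if (level[i] >= levels.size()) levels.resize(level[i] + 1);
        levels[level[i]].push_back(i);
    }
    return levels;
}

// One row of a DIC-style forward sweep: y_i = rD_i * (r_i - sum_{j<i} a_ij y_j).
double forwardRow(const Csr& A, const std::vector<double>& rD,
                  const std::vector<double>& r, const std::vector<double>& y, std::size_t i)
{
    double s = r[i];
    for (std::size_t k = A.offset[i]; k < A.offset[i + 1]; ++k)
        if (A.column[k] < i) s -= A.data[k] * y[A.column[k]];
    return rD[i] * s;
}

// Serial reference: rows in natural order.
std::vector<double> forwardSerial(const Csr& A, const std::vector<double>& rD,
                                  const std::vector<double>& r)
{
    std::vector<double> y(r.size(), 0.0);
    for (std::size_t i = 0; i < r.size(); ++i) y[i] = forwardRow(A, rD, r, y, i);
    return y;
}

// Level-scheduled version: levels run in sequence, rows inside a level in parallel.
std::vector<double> forwardByLevel(const Csr& A, const std::vector<double>& rD,
                                   const std::vector<double>& r,
                                   const std::vector<std::vector<std::size_t>>& levels)
{
    std::vector<double> y(r.size(), 0.0);
    for (const auto& rows : levels) {
        const std::ptrdiff_t m = static_cast<std::ptrdiff_t>(rows.size());
        #pragma omp parallel for
        for (std::ptrdiff_t t = 0; t < m; ++t)
            y[rows[t]] = forwardRow(A, rD, r, y, rows[t]);  // reads only earlier levels
    }
    return y;
}
```

`forwardRow` reads only rows from earlier levels, so the level-scheduled result matches the serial sweep exactly. A production version would likely also renumber the cells so that each level is contiguous in memory, which is what the Cuthill-McKee reordering provides. The price is reordering the input and output vectors.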
4. Performance measurement
Benchmark condition
Case: OpenFOAM Benchmark Test case "channelReTau110"
provided by The Open CAE Society of Japan
Solver: pimpleFoam with DIC-PCG
Base OpenFOAM version: OpenFOAM Foundation version 16b559c1
Computer: Intel Ninja Developer Platform
(Intel Xeon Phi 7210, 64 cores / 256 logical cores, DDR4)
Timing: average of the first five time steps
– Clock timers inserted manually at the beginning and end of each function in the source code
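The slide describes inserting clock timers by hand around each function. The following minimal C++ sketch shows that kind of instrumentation, assuming `std::chrono::steady_clock` for wall-clock time. `StepTimer` and its methods are hypothetical, not the authors' code. The timer accumulates seconds per function name and divides by the number of completed time steps, which gives per-step averages like the ones reported here.

```cpp
#include <chrono>
#include <map>
#include <string>

// Hypothetical manual profiler: accumulates wall-clock seconds per function name.
class StepTimer {
public:
    using Clock = std::chrono::steady_clock;

    // Run f() once and add its elapsed time to the total for `name`.
    template <class F>
    void time(const std::string& name, F&& f)
    {
        const Clock::time_point t0 = Clock::now();   // timer at the beginning
        f();
        const Clock::time_point t1 = Clock::now();   // timer at the end
        total_[name] += std::chrono::duration<double>(t1 - t0).count();
    }

    // Call once at the end of each time step.
    void endStep() { ++steps_; }

    // Average seconds per completed time step spent in `name` (0 if never timed).
    double averagePerStep(const std::string& name) const
    {
        const auto it = total_.find(name);
        if (it == total_.end() || steps_ == 0) return 0.0;
        return it->second / steps_;
    }

private:
    std::map<std::string, double> total_;
    int steps_ = 0;
};
```

In OpenMP builds, `omp_get_wtime()` from `<omp.h>` is a common alternative clock.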
Single process result
PCG takes the largest part of the whole pimpleFoam run time
Main matrix operations in PCG
– DIC::precondition
– Amul
[Figure: single-process run-time breakdown, with PCG, DIC and Amul labelled]
Step-by-step speedup (0)
The base version
– Same as the previous slide
Step-by-step speedup (1)
Change to CSR
– DIC took longer, for reasons unclear at this step
Step-by-step speedup (2)
ldu-CSR format
Divide CSR into
– Lower triangular part
– Upper triangular part
– Diagonal
DIC time went back to the original level
– Fewer cache misses
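The slide does not show the exact layout, so the following C++ sketch is one plausible reading of it. The diagonal is stored as a dense array, and the strictly lower and strictly upper triangles each get their own CSR arrays. `LduCsr`, `splitCsr` and `amulLduCsr` are hypothetical names. Keeping the triangles apart lets a substitution sweep stream through only the part it needs, while SpMV still works row by row.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CSR container (columns sorted within each row).
struct Csr {
    std::vector<std::size_t> offset, column;
    std::vector<double> data;
};

// Hypothetical "ldu-CSR" layout: dense diagonal plus separate CSR arrays for
// the strictly lower and strictly upper triangles.
struct LduCsr {
    std::vector<double> diag;
    Csr lower, upper;
};

// Split a square CSR matrix into diagonal, lower, and upper parts.
LduCsr splitCsr(const Csr& A)
{
    const std::size_t n = A.offset.size() - 1;
    LduCsr M;
    M.diag.assign(n, 0.0);
    M.lower.offset.push_back(0);
    M.upper.offset.push_back(0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = A.offset[i]; k < A.offset[i + 1]; ++k) {
            const std::size_t j = A.column[k];
            if (j == i) {
                M.diag[i] = A.data[k];
            } else {
                Csr& part = (j < i) ? M.lower : M.upper;
                part.column.push_back(j);
                part.data.push_back(A.data[k]);
            }
        }
        M.lower.offset.push_back(M.lower.column.size());
        M.upper.offset.push_back(M.upper.column.size());
    }
    return M;
}

// y = (D + L + U) x; iteration i writes only y[i], so rows can run in parallel.
void amulLduCsr(const LduCsr& M, const std::vector<double>& x, std::vector<double>& y)
{
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(M.diag.size());
    #pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        double s = M.diag[i] * x[i];
        for (std::size_t k = M.lower.offset[i]; k < M.lower.offset[i + 1]; ++k)
            s += M.lower.data[k] * x[M.lower.column[k]];
        for (std::size_t k = M.upper.offset[i]; k < M.upper.offset[i + 1]; ++k)
            s += M.upper.data[k] * x[M.upper.column[k]];
        y[i] = s;
    }
}
```

For the 4×4 example matrix from the appendix and x = (1, 2, 3, 4), `amulLduCsr` gives y = (40, 4, 45, 55).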
Step-by-step speedup (3)
Parallelize matrix operations
– Amul
– DIC precondition
+ Cuthill-McKee
x2.0
Step-by-step speedup (4)
Parallelize vector operations
– WAXPBY
– sumMag
– sumProd
x3.4
Step-by-step speedup (5)
Change the OpenMP setting
– From 256 threads (4 per core)
– To 64 threads (1 per core)
x4.8
Achieved speedup excluding CM: x13.5
CM time can be excluded
– The reordering is required only when the mesh changes (remeshing)
vs. flat MPI
About half the speed of flat MPI
Speedup by each function
Function | Single [s] | OpenMP [s] | Speedup factor
Amul | 60.0 | 3.0 | x19.8
DIC::precondition | 80.1 | 7.9 | x10.1
WAXPBY | 21.1 | 0.4 | x53.2
sumMag | 6.1 | 0.1 | x44.6
sumProd | 12.3 | 0.3 | x43.6
Cuthill-McKee | 0.0 | 24.0 | ---
(other) | 1.1 | 1.6 | x0.7
total | 180.8 | 37.4 | x4.8
total excluding CM | 180.8 | 13.4 | x13.5
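The two totals follow directly from the table. The only difference is whether the Cuthill-McKee cost (24.0 s, about 64% of the OpenMP total) is counted:

```latex
S_{\text{total}} = \frac{180.8\,\text{s}}{37.4\,\text{s}} \approx 4.8,
\qquad
S_{\text{excl. CM}} = \frac{180.8\,\text{s}}{37.4\,\text{s} - 24.0\,\text{s}}
                    = \frac{180.8\,\text{s}}{13.4\,\text{s}} \approx 13.5
```

Some per-function factors differ slightly from the ratio of the rounded times (for example, Amul: 60.0 / 3.0 = 20 vs. x19.8). They were presumably computed from unrounded measurements.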
Why slower than MPI (1)
DIC was slow
– Theoretical reason
• DIC is inherently difficult for thread parallelism
• Only a small number of threads can work in parallel
– Implementation reason
• The reordering of input/output vectors could be reduced
Why slower than MPI (2)
Expected: the number of iterations decreases with OpenMP
– Convergence with OpenMP was expected to be better than with MPI, because OpenMP does not require domain decomposition
– Domain decomposition degrades convergence
Actual: the number of iterations did not decrease
– The benchmark case already converges very well
5. Conclusion and future work
Conclusion
Achieved
– x13.5 speedup over a single process (excluding CM)
– About half the speed of flat MPI
Conditions
– Channel flow benchmark with a regular mesh
– DIC-PCG solver
A very difficult method on a very simple problem
= even this worst-case condition shows only a ½ degradation
Future work
Speed up DIC
– Eliminate the vector reordering
Simpler preconditioners / solvers
– Diagonal, GAMG
More complex benchmarks
– Motorbike, dam-break
Multi-node supercomputers
– Efficiency gained by reducing the number of MPI processes
Please look forward to our next work
return 0;
lduMatrix format
Sparse matrix storage format
Used by pimpleFoam (and by many other OpenFOAM solvers)
Three parts
– Upper triangular part U: row major
– Lower triangular part L: column major
– Diagonal part D
Essentially a COO (coordinate) format, with the diagonal stored separately
Example of lduMatrix
Matrix (4×4):
1  0  5  6
0  2  0  0
8  0  3  7
9  0 10  4
diag = 1, 2, 3, 4 : values of the diagonal elements (D)
upper = 5, 6, 7 : values of the upper triangular elements (U)
lower = 8, 9, 10 : values of the lower triangular elements (L)
lowerAddr = 0, 0, 2 : row numbers of the upper elements = column numbers of the lower elements
upperAddr = 2, 3, 3 : column numbers of the upper elements = row numbers of the lower elements
Example of CSR matrix
Matrix (4×4):
1  0  5  6
0  2  0  0
8  0  3  7
9  0 10  4
data = [1, 5, 6, 2, 8, 3, 7, 9, 10, 4] : element values
column = [0, 2, 3, 1, 0, 2, 3, 0, 2, 3] : column number of each element
offset = [0, 3, 4, 7, 10] : start position of each row (the last entry is the total number of elements)
CSR SpMV parallelization
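// CSR SpMV y = A x for an n-row matrix stored as data/column/offset
#pragma omp parallel for
for (label i = 0; i < n; i++)
{
    // Row i: accumulate into a thread-local sum, then write y[i] once
    double y_i = 0.0;
    for (label index = offset[i]; index < offset[i + 1]; index++)
    {
        y_i += data[index] * x[column[index]];
    }
    y[i] = y_i;
}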
Independent among rows i
– Each iteration writes only y[i], so no two threads write the same element
→ Can be parallelized

AIチップ戦国時代における深層学習モデルの推論の最適化と実用的な運用を可能にするソフトウェア技術について
 
株式会社フィックスターズ 会社説明資料(抜粋)
株式会社フィックスターズ 会社説明資料(抜粋)株式会社フィックスターズ 会社説明資料(抜粋)
株式会社フィックスターズ 会社説明資料(抜粋)
 
第8回 社内プログラミングコンテスト 結果発表会
第8回社内プログラミングコンテスト 結果発表会第8回社内プログラミングコンテスト 結果発表会
第8回 社内プログラミングコンテスト 結果発表会
 
第8回 社内プログラミングコンテスト 第1位 taiyo
第8回社内プログラミングコンテスト 第1位 taiyo第8回社内プログラミングコンテスト 第1位 taiyo
第8回 社内プログラミングコンテスト 第1位 taiyo
 

Recently uploaded

Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine HarmonyLeveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmonyelliciumsolutionspun
 
Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!Neo4j
 
AI Embracing Every Shade of Human Beauty
AI Embracing Every Shade of Human BeautyAI Embracing Every Shade of Human Beauty
AI Embracing Every Shade of Human BeautyRaymond Okyere-Forson
 
Top Software Development Trends in 2024
Top Software Development Trends in  2024Top Software Development Trends in  2024
Top Software Development Trends in 2024Mind IT Systems
 
eAuditor Audits & Inspections - conduct field inspections
eAuditor Audits & Inspections - conduct field inspectionseAuditor Audits & Inspections - conduct field inspections
eAuditor Audits & Inspections - conduct field inspectionsNirav Modi
 
JS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AIJS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AIIvo Andreev
 
Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...
Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...
Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...OnePlan Solutions
 
Kawika Technologies pvt ltd Software Development Company in Trivandrum
Kawika Technologies pvt ltd Software Development Company in TrivandrumKawika Technologies pvt ltd Software Development Company in Trivandrum
Kawika Technologies pvt ltd Software Development Company in TrivandrumKawika Technologies
 
How Does the Epitome of Spyware Differ from Other Malicious Software?
How Does the Epitome of Spyware Differ from Other Malicious Software?How Does the Epitome of Spyware Differ from Other Malicious Software?
How Does the Epitome of Spyware Differ from Other Malicious Software?AmeliaSmith90
 
20240330_고급진 코드를 위한 exception 다루기
20240330_고급진 코드를 위한 exception 다루기20240330_고급진 코드를 위한 exception 다루기
20240330_고급진 코드를 위한 exception 다루기Chiwon Song
 
Growing Oxen: channel operators and retries
Growing Oxen: channel operators and retriesGrowing Oxen: channel operators and retries
Growing Oxen: channel operators and retriesSoftwareMill
 
Cybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and BadCybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and BadIvo Andreev
 
Kubernetes go-live checklist for your microservices.pptx
Kubernetes go-live checklist for your microservices.pptxKubernetes go-live checklist for your microservices.pptx
Kubernetes go-live checklist for your microservices.pptxPrakarsh -
 
Mastering Kubernetes - Basics and Advanced Concepts using Example Project
Mastering Kubernetes - Basics and Advanced Concepts using Example ProjectMastering Kubernetes - Basics and Advanced Concepts using Example Project
Mastering Kubernetes - Basics and Advanced Concepts using Example Projectwajrcs
 
Why Choose Brain Inventory For Ecommerce Development.pdf
Why Choose Brain Inventory For Ecommerce Development.pdfWhy Choose Brain Inventory For Ecommerce Development.pdf
Why Choose Brain Inventory For Ecommerce Development.pdfBrain Inventory
 
ERP For Electrical and Electronics manufecturing.pptx
ERP For Electrical and Electronics manufecturing.pptxERP For Electrical and Electronics manufecturing.pptx
ERP For Electrical and Electronics manufecturing.pptxAutus Cyber Tech
 
IA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeIA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeNeo4j
 
OpenChain Webinar: Universal CVSS Calculator
OpenChain Webinar: Universal CVSS CalculatorOpenChain Webinar: Universal CVSS Calculator
OpenChain Webinar: Universal CVSS CalculatorShane Coughlan
 
online pdf editor software solutions.pdf
online pdf editor software solutions.pdfonline pdf editor software solutions.pdf
online pdf editor software solutions.pdfMeon Technology
 

Recently uploaded (20)

Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine HarmonyLeveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
 
Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!
 
AI Embracing Every Shade of Human Beauty
AI Embracing Every Shade of Human BeautyAI Embracing Every Shade of Human Beauty
AI Embracing Every Shade of Human Beauty
 
Top Software Development Trends in 2024
Top Software Development Trends in  2024Top Software Development Trends in  2024
Top Software Development Trends in 2024
 
eAuditor Audits & Inspections - conduct field inspections
eAuditor Audits & Inspections - conduct field inspectionseAuditor Audits & Inspections - conduct field inspections
eAuditor Audits & Inspections - conduct field inspections
 
JS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AIJS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AI
 
Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...
Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...
Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...
 
Kawika Technologies pvt ltd Software Development Company in Trivandrum
Kawika Technologies pvt ltd Software Development Company in TrivandrumKawika Technologies pvt ltd Software Development Company in Trivandrum
Kawika Technologies pvt ltd Software Development Company in Trivandrum
 
How Does the Epitome of Spyware Differ from Other Malicious Software?
How Does the Epitome of Spyware Differ from Other Malicious Software?How Does the Epitome of Spyware Differ from Other Malicious Software?
How Does the Epitome of Spyware Differ from Other Malicious Software?
 
20240330_고급진 코드를 위한 exception 다루기
20240330_고급진 코드를 위한 exception 다루기20240330_고급진 코드를 위한 exception 다루기
20240330_고급진 코드를 위한 exception 다루기
 
Growing Oxen: channel operators and retries
Growing Oxen: channel operators and retriesGrowing Oxen: channel operators and retries
Growing Oxen: channel operators and retries
 
Cybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and BadCybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and Bad
 
Kubernetes go-live checklist for your microservices.pptx
Kubernetes go-live checklist for your microservices.pptxKubernetes go-live checklist for your microservices.pptx
Kubernetes go-live checklist for your microservices.pptx
 
Sustainable Web Design - Claire Thornewill
Sustainable Web Design - Claire ThornewillSustainable Web Design - Claire Thornewill
Sustainable Web Design - Claire Thornewill
 
Mastering Kubernetes - Basics and Advanced Concepts using Example Project
Mastering Kubernetes - Basics and Advanced Concepts using Example ProjectMastering Kubernetes - Basics and Advanced Concepts using Example Project
Mastering Kubernetes - Basics and Advanced Concepts using Example Project
 
Why Choose Brain Inventory For Ecommerce Development.pdf
Why Choose Brain Inventory For Ecommerce Development.pdfWhy Choose Brain Inventory For Ecommerce Development.pdf
Why Choose Brain Inventory For Ecommerce Development.pdf
 
ERP For Electrical and Electronics manufecturing.pptx
ERP For Electrical and Electronics manufecturing.pptxERP For Electrical and Electronics manufecturing.pptx
ERP For Electrical and Electronics manufecturing.pptx
 
IA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeIA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG time
 
OpenChain Webinar: Universal CVSS Calculator
OpenChain Webinar: Universal CVSS CalculatorOpenChain Webinar: Universal CVSS Calculator
OpenChain Webinar: Universal CVSS Calculator
 
online pdf editor software solutions.pdf
online pdf editor software solutions.pdfonline pdf editor software solutions.pdf
online pdf editor software solutions.pdf
 

A challenge for thread parallelism on OpenFOAM

  • 6. Table of Contents
      1. Abstract
      2. Background and motivation
      3. Parallel methodology
      4. Performance measurement
      5. Conclusion and future work
  • 7. Table of Contents, section 2: Background and motivation
  • 8. Modern engineering and OpenFOAM
      – Product designers and engineers need CAE
      – OpenFOAM is one of the most widely used CAE packages
      – Speeding up OpenFOAM is therefore important for modern engineering
  • 9. OpenFOAM is slow on modern computers
      [Figure quoted from Imano (2017): “OpenFOAMによる流体解析ベンチマークテスト FOCUS・クラウド・スパコンでのチャネルおよびボックスファン流れ解析” (Fluid analysis benchmark tests with OpenFOAM: channel and box-fan flow analyses on FOCUS, cloud, and supercomputers), 17th PC Cluster Symposium, p. 19. Copyright 2017 OCAEL. All rights reserved.]
  • 10. Differences between older and modern computers
                                                     Older computer   Modern computer
      Number of CPU cores                            Single / a few   Many
      Number of compute nodes                        A few            Many
      CPU speed relative to the interconnect         Low              High
      Hence, number of MPI processes                 A few            Massive
      MPI management cost relative to arithmetic     Light            Heavy
      MPI communication cost relative to arithmetic  Light            Heavy
  • 11. Solution: thread parallelism
                                 Current OpenFOAM   This study
      Framework                  MPI                OpenMP
      Mechanism                  Process            Thread
      Data communication         Socket             Shared memory
      Target                     Between all cores  Within one node
      Hence, management cost     Heavy              Light
      Hence, communication cost  Heavy              Light
      Using OpenMP could speed up OpenFOAM.
  • 12. Our goal
      1. Implement thread parallelism with OpenMP for intra-node parallelism (hybrid MPI + OpenMP parallelism)
      2. Measure the performance improvement
      3. Share the implementation and results with the world
      [Figure: 16 CPU cores split among 4 OpenMP regions, which communicate through MPI]
  • 13. Our goal in this study
      This study reports work in progress and incomplete results.
      pimpleFoam with DIC-PCG only
      – To estimate the worst-case improvement
      Single node only
      – Little MPI cost
      – Expected to be about as fast as flat MPI
          • Possibly faster on multiple nodes
  • 14. Extra motivation for our business
      Outreach to customers
      – Provide an example of Fixstars' work
      – We would be glad to receive an order to speed up your software: https://www.fixstars.com/en/service/acceleration/
      Employee training
      – Provide an exercise for Fixstars engineers
      – A CPU-only problem is a good exercise for beginners
  • 15. Table of Contents, section 3: Parallel methodology
  • 16. Target in this study
      Speed up / parallelize the solution of the sparse linear system
      – It is generally known to take a large part of CFD run time
      Solver: DIC-PCG
      – Diagonal Incomplete Cholesky (DIC) preconditioner
      – Preconditioned Conjugate Gradient (PCG)
      Many-core CPU, single node only
      Tackle one of the hardest cases for thread parallelism
      – To estimate the worst-case improvement
  • 17. Components of DIC-PCG
      – Amul: sparse matrix-vector multiply (SpMV)
      – DIC precondition
      – WAXPBY: vector-vector addition, w = a*x + b*y
      – sumMag: sum of the absolute values of the vector elements
      – sumProd: vector inner product
      DIC-PCG consists only of matrix/vector operations. These are element-independent and thus easy to parallelize (in principle).
  • 18. Parallelization of DIC-PCG with OpenMP
      Difficult
      – lduMatrix format for Amul
      – DIC's substitution operation
      Easy
      – Elementwise operation: WAXPBY
      – Parallel reduction: sumMag, sumProd
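The "easy" operations on slide 18 map directly onto OpenMP worksharing loops. Below is a minimal sketch, assuming plain `std::vector<double>` storage instead of OpenFOAM's field types; the function names mirror the slide's labels but are hypothetical, not OpenFOAM's API. Without OpenMP enabled the pragmas are ignored and the code simply runs serially.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// w = a*x + b*y (WAXPBY). Every element is updated independently,
// so the loop can be split across threads without synchronisation.
// Assumes w, x and y have the same length.
void waxpby(std::vector<double>& w, double a, const std::vector<double>& x,
            double b, const std::vector<double>& y)
{
    const long n = static_cast<long>(w.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i)
    {
        w[i] = a * x[i] + b * y[i];
    }
}

// Sum of |x_i| (sumMag). reduction(+ : s) gives each thread a private
// partial sum and adds the partial sums together at the end of the loop.
double sumMag(const std::vector<double>& x)
{
    const long n = static_cast<long>(x.size());
    double s = 0.0;
    #pragma omp parallel for reduction(+ : s)
    for (long i = 0; i < n; ++i)
    {
        s += std::fabs(x[i]);
    }
    return s;
}

// Inner product, sum of x_i * y_i (sumProd), with the same reduction pattern.
double sumProd(const std::vector<double>& x, const std::vector<double>& y)
{
    const long n = static_cast<long>(x.size());
    double s = 0.0;
    #pragma omp parallel for reduction(+ : s)
    for (long i = 0; i < n; ++i)
    {
        s += x[i] * y[i];
    }
    return s;
}
```

A parallel reduction changes the order of the additions, so for general data the result can differ from the serial sum in the last bits.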
  • 19. lduMatrix SpMV parallelization
      src/OpenFOAM/matrices/lduMatrix/lduMatrix/lduMatrixATmul.C, Foam::lduMatrix::Amul():
          // Naive parallelization of the face loop: NOT safe (see below)
          #pragma omp parallel for
          for (label face=0; face<nFaces; face++)
          {
              ApsiPtr[uPtr[face]] += lowerPtr[face]*psiPtr[lPtr[face]];
              ApsiPtr[lPtr[face]] += upperPtr[face]*psiPtr[uPtr[face]];
          }
      Example (the 4 × 4 matrix of slide 45):
          face   0  1  2
          lPtr   0  0  2
          uPtr   2  3  3
      Faces depend on each other: different faces write to the same ApsiPtr entry (e.g. faces 0 and 1 both update ApsiPtr[0]), so running them at the same time causes a data race. The face loop over an lduMatrix cannot be parallelized as is.
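To see where the race on slide 19 comes from, here is a serial sketch of the ldu product y = A x, including the diagonal term that the excerpt on the slide does not show. The struct and function names are hypothetical stand-ins, not OpenFOAM's `lduMatrix` class; the test data is the 4 × 4 example of slide 45.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for ldu storage (not OpenFOAM's lduMatrix class).
struct LduMatrix
{
    std::vector<double> diag;   // D: one value per cell
    std::vector<double> upper;  // U: value at (lowerAddr[f], upperAddr[f])
    std::vector<double> lower;  // L: value at (upperAddr[f], lowerAddr[f])
    std::vector<int> lowerAddr; // "lPtr" on the slide
    std::vector<int> upperAddr; // "uPtr" on the slide
};

// y = A x, serial. Each face scatters into TWO cells, and several faces can
// share a cell, so running faces concurrently would race on y.
std::vector<double> lduAmul(const LduMatrix& A, const std::vector<double>& x)
{
    std::vector<double> y(A.diag.size());
    for (std::size_t c = 0; c < A.diag.size(); ++c)
    {
        y[c] = A.diag[c] * x[c];
    }
    for (std::size_t f = 0; f < A.upper.size(); ++f)
    {
        const int l = A.lowerAddr[f];
        const int u = A.upperAddr[f];
        y[u] += A.lower[f] * x[l];  // lower element, row u
        y[l] += A.upper[f] * x[u];  // upper element, row l
    }
    return y;
}
```

With x = (1, 2, 3, 4) the example matrix gives y = (40, 4, 45, 55); faces 0 and 1 both add into y[0], which is exactly the conflict the slide describes.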
  • 20. lduMatrix to CSR format
      Convert the matrix to Compressed Sparse Row (CSR), a widely used sparse matrix format
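One possible way to build the CSR arrays from ldu storage is sketched below. It assumes the faces are in OpenFOAM's upper-triangular order (sorted by lower address, then by upper address), which keeps the columns of each row sorted. The helper and struct names are hypothetical, and the authors' conversion may differ; applied to the slide 45 example, it reproduces the arrays listed on slide 46.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CSR container matching the arrays on slide 46.
struct Csr
{
    std::vector<double> data;  // element values
    std::vector<int> column;   // column index of each element
    std::vector<int> offset;   // start of each row; offset[n] = number of nonzeros
};

// Convert ldu storage to CSR. Assumes faces are in upper-triangular order,
// so the three fill passes leave each row ordered: lower part, diagonal, upper part.
Csr lduToCsr(const std::vector<double>& diag, const std::vector<double>& upper,
             const std::vector<double>& lower, const std::vector<int>& lowerAddr,
             const std::vector<int>& upperAddr)
{
    const int n = static_cast<int>(diag.size());
    const std::size_t nFaces = upper.size();

    // Row lengths: one diagonal entry plus one entry per face touching the row.
    Csr A;
    A.offset.assign(n + 1, 0);
    for (int i = 0; i < n; ++i) A.offset[i + 1] = 1;
    for (std::size_t f = 0; f < nFaces; ++f)
    {
        ++A.offset[lowerAddr[f] + 1];
        ++A.offset[upperAddr[f] + 1];
    }
    for (int i = 0; i < n; ++i) A.offset[i + 1] += A.offset[i];  // prefix sum
    A.data.resize(A.offset[n]);
    A.column.resize(A.offset[n]);

    // Next free slot in each row.
    std::vector<int> pos(A.offset.begin(), A.offset.end() - 1);
    auto put = [&](int row, int col, double value) {
        A.data[pos[row]] = value;
        A.column[pos[row]] = col;
        ++pos[row];
    };
    for (std::size_t f = 0; f < nFaces; ++f) put(upperAddr[f], lowerAddr[f], lower[f]);
    for (int i = 0; i < n; ++i) put(i, i, diag[i]);
    for (std::size_t f = 0; f < nFaces; ++f) put(lowerAddr[f], upperAddr[f], upper[f]);
    return A;
}
```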
  • 21. DIC preconditioner
      src/OpenFOAM/matrices/lduMatrix/preconditioners/DICPreconditioner/DICPreconditioner.C, Foam::DICPreconditioner::precondition():
          // Forward substitution phase; naive parallelization is NOT safe (see below)
          #pragma omp parallel for
          for (label face=0; face<nFaces; face++)
          {
              wAPtr[uPtr[face]] -= rDPtr[uPtr[face]]*upperPtr[face]*wAPtr[lPtr[face]];
          }
      Example (the 4 × 4 matrix of slide 45):
          face   0  1  2
          lPtr   0  0  2
          uPtr   2  3  3
      In the forward substitution phase, face 2 reads wA[2], which face 0 writes. If the two run concurrently, the result depends on timing (data race), so this loop cannot be parallelized as is.
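The forward loop on slide 21 is one of three steps in applying DIC. Below is a serial sketch of all of them: the reciprocal modified diagonal rD, the forward sweep and the backward sweep, for a symmetric matrix in ldu form. The setup recurrence d[u] -= a_f^2 / d[l] is our assumption of the standard DIC scheme, and the struct and function names are hypothetical. The forward loop is where the slide's dependency appears: a face reads wA[l], which an earlier face may have just updated.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Symmetric matrix in ldu form (lower == upper); hypothetical stand-in.
struct SymLdu
{
    std::vector<double> diag;
    std::vector<double> upper;  // a_f at (lowerAddr[f], upperAddr[f]) and its mirror
    std::vector<int> lowerAddr;
    std::vector<int> upperAddr;
};

// Reciprocal of the modified DIC diagonal: for each face in order,
// d[u] -= a_f^2 / d[l]; then rD = 1 / d.
std::vector<double> dicReciprocalD(const SymLdu& A)
{
    std::vector<double> rD = A.diag;
    for (std::size_t f = 0; f < A.upper.size(); ++f)
    {
        rD[A.upperAddr[f]] -= A.upper[f] * A.upper[f] / rD[A.lowerAddr[f]];
    }
    for (double& d : rD) d = 1.0 / d;
    return rD;
}

// wA = M^-1 rA with M = (D* + L) D*^-1 (D* + U), D* = modified diagonal.
std::vector<double> dicPrecondition(const SymLdu& A, const std::vector<double>& rD,
                                    const std::vector<double>& rA)
{
    const std::size_t nCells = rA.size();
    const std::size_t nFaces = A.upper.size();
    std::vector<double> wA(nCells);
    for (std::size_t c = 0; c < nCells; ++c) wA[c] = rD[c] * rA[c];

    // Forward sweep (the loop on slide 21): reads wA[l] updated by earlier faces.
    for (std::size_t f = 0; f < nFaces; ++f)
    {
        const int l = A.lowerAddr[f], u = A.upperAddr[f];
        wA[u] -= rD[u] * A.upper[f] * wA[l];
    }
    // Backward sweep, faces in reverse order.
    for (std::size_t f = nFaces; f-- > 0;)
    {
        const int l = A.lowerAddr[f], u = A.upperAddr[f];
        wA[l] -= rD[l] * A.upper[f] * wA[u];
    }
    return wA;
}
```

For a tridiagonal matrix DIC drops no fill-in, so M equals A, and applying the preconditioner to r = A x returns x. With A = tridiag(-1, 2, -1) and x = (1, 2, 3, 4), r = (0, 0, 0, 5) and the result is x up to rounding, which makes a convenient check.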
  • 22. Cuthill-McKee ordering
      Example: 4-point stencil on a regular mesh
      [Figure: 3 × 3 regular mesh with cells numbered 0 to 8 (bottom row 0 1 2, middle row 3 4 5, top row 6 7 8) and its coefficient matrix]
  • 23. Cuthill-McKee ordering: dependency
      [Figure: the same 3 × 3 mesh and coefficient matrix, with the dependencies between cells highlighted]
  • 24. Cuthill-McKee ordering: coloring
      [Figure: the same 3 × 3 mesh and coefficient matrix, with the cells grouped into colors]
      Cells of the same color do not depend on each other, so they can be processed in parallel.
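Cuthill-McKee is a breadth-first renumbering that visits unvisited neighbours in increasing order of degree. The sketch below reproduces the 3 × 3 example of slides 22 to 24, numbering cells as cell = j*nx + i, and records each cell's BFS level. Treating each level as one color is our reading of the slides; the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <queue>
#include <vector>

// 4-neighbour adjacency of an nx-by-ny regular mesh, cell = j*nx + i
// (j = row, counted from the bottom as in the slide's figure).
std::vector<std::vector<int>> gridAdjacency(int nx, int ny)
{
    std::vector<std::vector<int>> adj(nx * ny);
    for (int j = 0; j < ny; ++j)
    {
        for (int i = 0; i < nx; ++i)
        {
            const int c = j * nx + i;
            if (i > 0) adj[c].push_back(c - 1);
            if (i + 1 < nx) adj[c].push_back(c + 1);
            if (j > 0) adj[c].push_back(c - nx);
            if (j + 1 < ny) adj[c].push_back(c + nx);
        }
    }
    return adj;
}

// Cuthill-McKee: breadth-first search from `start`, enqueueing unvisited
// neighbours in increasing degree (ties by index). Returns the new order;
// level[c] receives the BFS distance of cell c from `start`.
// Assumes the mesh graph is connected.
std::vector<int> cuthillMcKee(const std::vector<std::vector<int>>& adj, int start,
                              std::vector<int>& level)
{
    level.assign(adj.size(), -1);
    std::vector<int> order;
    std::queue<int> pending;
    level[start] = 0;
    pending.push(start);
    while (!pending.empty())
    {
        const int c = pending.front();
        pending.pop();
        order.push_back(c);
        std::vector<int> next;
        for (int nb : adj[c])
        {
            if (level[nb] < 0)
            {
                level[nb] = level[c] + 1;
                next.push_back(nb);
            }
        }
        std::sort(next.begin(), next.end(), [&adj](int a, int b) {
            if (adj[a].size() != adj[b].size()) return adj[a].size() < adj[b].size();
            return a < b;
        });
        for (int nb : next) pending.push(nb);
    }
    return order;
}
```

Starting from cell 0, the order is 0, 1, 3, 2, 4, 6, 5, 7, 8 and the levels are the anti-diagonals {0}, {1, 3}, {2, 4, 6}, {5, 7}, {8}. On a regular grid no two cells of the same level are neighbours, so each level is a set of mutually independent cells. On a general unstructured mesh, two cells of the same level can be neighbours, so this property has to be checked separately.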
  • 25. Parallelization of DIC-PCG
      – Amul: CSR format
      – DIC precondition: Cuthill-McKee ordering
      – WAXPBY: parallel elementwise operation
      – sumMag: parallel reduction
      – sumProd: parallel reduction
      The whole DIC-PCG can be parallelized.
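Once cells are grouped into levels, a substitution sweep such as DIC's forward phase becomes a sequence of parallel loops: one level at a time, with the rows inside a level split across threads. Below is a minimal sketch for a lower-triangular solve (D + L) y = b, with the strictly lower part stored in CSR. The layout, the names and the small test matrix are hypothetical. It assumes that every entry in a row refers to a row from an earlier level.

```cpp
#include <cassert>
#include <vector>

// Solve (D + L) y = b, where L is strictly lower triangular and stored in CSR
// (offset/column/data), processing rows level by level.
// Assumption: every column referenced in a row belongs to an EARLIER level,
// so the rows within one level are independent of each other.
std::vector<double> levelScheduledForwardSolve(
    const std::vector<int>& offset, const std::vector<int>& column,
    const std::vector<double>& data, const std::vector<double>& diag,
    const std::vector<double>& b, const std::vector<std::vector<int>>& levels)
{
    std::vector<double> y(b.size(), 0.0);
    for (const std::vector<int>& rows : levels)  // levels run in order
    {
        const long nRows = static_cast<long>(rows.size());
        // The implicit barrier at the end of this parallel loop makes
        // level k finish before level k+1 starts.
        #pragma omp parallel for
        for (long k = 0; k < nRows; ++k)
        {
            const int i = rows[k];
            double s = b[i];
            for (int idx = offset[i]; idx < offset[i + 1]; ++idx)
            {
                s -= data[idx] * y[column[idx]];
            }
            y[i] = s / diag[i];
        }
    }
    return y;
}
```

In the test, rows 0 and 1 depend on nothing, row 2 depends on rows 0 and 1, and row 3 depends on row 2, which gives three levels.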
  • 26. Table of Contents, section 4: Performance measurement
  • 27. Benchmark conditions
      – Case: OpenFOAM benchmark test case "channelReTau110", provided by The Open CAE Society of Japan
      – Solver: pimpleFoam with DIC-PCG
      – Base OpenFOAM version: the OpenFOAM Foundation version 16b559c1
      – Computer: Intel Ninja Developer Platform (Intel Xeon Phi 7210, 256 logical cores, DDR4)
      – Metric: average time of the first five steps, measured with clock timers inserted manually at the beginning and end of each function in the source code
  • 28. Single-process result
      PCG is the largest part of the whole pimpleFoam run.
      Matrix operations dominate PCG:
      – DIC::precondition
      – Amul
      [Figure: run-time breakdown with PCG, DIC and Amul highlighted]
  • 29. Step-by-step speedup (0)
      The base version (same as the previous slide)
  • 30. Step-by-step speedup (1)
      Changed the matrix format to CSR
      – DIC became slower (cause unclear at this step)
  • 31. Step-by-step speedup (2)
      ldu-CSR format: split the CSR matrix into
      – Lower triangular part
      – Upper triangular part
      – Diagonal part
      DIC time went back to the original level (fewer cache misses)
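The ldu-CSR layout of slide 31 stores the diagonal, the strictly lower part and the strictly upper part in separate arrays. Below is a minimal SpMV sketch over such a layout, with hypothetical field names. The slide attributes the gain to fewer cache misses. One plausible reason (our assumption) is that each DIC sweep can then stream through only the triangular part it needs. The test uses the 4 × 4 example of slide 45.

```cpp
#include <cassert>
#include <vector>

// Hypothetical ldu-CSR layout: diagonal kept apart, strictly lower and
// strictly upper parts each stored as their own CSR arrays.
struct LduCsr
{
    std::vector<double> diag;
    std::vector<int> lowerOffset, lowerColumn;
    std::vector<double> lowerData;
    std::vector<int> upperOffset, upperColumn;
    std::vector<double> upperData;
};

// y = A x = (L + D + U) x. Row i is computed and written by iteration i only,
// so the rows can be shared among threads.
std::vector<double> lduCsrAmul(const LduCsr& A, const std::vector<double>& x)
{
    const long n = static_cast<long>(A.diag.size());
    std::vector<double> y(n);
    #pragma omp parallel for
    for (long i = 0; i < n; ++i)
    {
        double s = A.diag[i] * x[i];
        for (int k = A.lowerOffset[i]; k < A.lowerOffset[i + 1]; ++k)
            s += A.lowerData[k] * x[A.lowerColumn[k]];
        for (int k = A.upperOffset[i]; k < A.upperOffset[i + 1]; ++k)
            s += A.upperData[k] * x[A.upperColumn[k]];
        y[i] = s;
    }
    return y;
}
```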
  • 32. Step-by-step speedup (3)
      Parallelized matrix operations
      – Amul
      – DIC precondition + Cuthill-McKee ordering
      Speedup: x2.0
  • 33. Step-by-step speedup (4)
      Parallelized vector operations
      – WAXPBY
      – sumMag
      – sumProd
      Speedup: x3.4
  • 34. Step-by-step speedup (5)
      Changed the OpenMP setting
      – From 256 threads
      – To 64 threads (one thread per physical core of the 64-core Xeon Phi 7210)
      Speedup: x4.8
  • 35. Achieved speedup excluding CM: x13.5
      The Cuthill-McKee (CM) ordering time can be excluded, because the ordering has to be recomputed only when the mesh changes (remeshing).
  • 36. vs. flat MPI
      The OpenMP version runs at about ½ the speed of flat MPI.
  • 37. Speedup by each function
      Function            Single [s]   OpenMP [s]   Speedup
      Amul                60.0         3.0          x19.8
      DIC::precondition   80.1         7.9          x10.1
      WAXPBY              21.1         0.4          x53.2
      sumMag              6.1          0.1          x44.6
      sumProd             12.3         0.3          x43.6
      Cuthill-McKee       0.0          24.0         ---
      (other)             1.1          1.6          x0.7
      Total               180.8        37.4         x4.8
      Total excluding CM  180.8        13.4         x13.5
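As a check on slide 37's table, both totals follow from S = T_single / T_OpenMP. The Cuthill-McKee row explains the gap between them: the ordering alone takes about 64 % of the threaded run time.

```latex
\begin{aligned}
S_{\text{total}} &= \frac{180.8\,\text{s}}{37.4\,\text{s}} \approx 4.83 \quad (\text{x}4.8) \\
S_{\text{excl. CM}} &= \frac{180.8\,\text{s}}{(37.4 - 24.0)\,\text{s}} = \frac{180.8\,\text{s}}{13.4\,\text{s}} \approx 13.49 \quad (\text{x}13.5) \\
\frac{T_{\text{CM}}}{T_{\text{OpenMP}}} &= \frac{24.0\,\text{s}}{37.4\,\text{s}} \approx 0.64
\end{aligned}
```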
  • 38. Why slower than MPI (1)
      DIC was slow
      – Theoretical reason: DIC is difficult to parallelize with threads; only a small number of threads can work in parallel
      – Implementation reason: the reordering of the input/output vectors could be reduced
  • 39. Why slower than MPI (2)
      Expected: the number of iterations decreases with OpenMP
      – Convergence with OpenMP was expected to be better than with MPI, because OpenMP does not require domain decomposition
      – Domain decomposition degrades convergence
      Actual: the number of iterations did not decrease
      – Convergence on this benchmark is already so good that there was little to gain
  • 40. Table of Contents, section 5: Conclusion and future work
  • 41. Conclusion
      Achieved
      – x13.5 speedup over a single process
      – About ½ the speed of flat MPI
      Conditions
      – Channel flow benchmark with a regular mesh
      – DIC-PCG solver
      A very difficult method on a very simple problem is the worst condition, yet it shows only a ½ degradation.
  • 42. Future work
      – Speed up DIC: eliminate the vector reordering
      – Simpler preconditioners / solvers: diagonal, GAMG
      – More complicated benchmarks: motorbike, dam break
      – Multi-node supercomputers: evaluate the benefit of reducing the number of MPI processes
      Please look forward to our next work.
  • 43. return 0;
  • 44. lduMatrix format
      Sparse matrix storage format used by pimpleFoam (and by many other solvers)
      Three parts:
      – Upper triangular part U: stored row by row (faces are sorted by lower address)
      – Lower triangular part L: stored column by column
      – Diagonal part D
      Equivalent to the COO (coordinate) format
  • 45. Example of lduMatrix
      Matrix:
           1   0   5   6
           0   2   0   0
           8   0   3   7
           9   0  10   4
      diag      = 1, 2, 3, 4   : values of the diagonal elements (D)
      upper     = 5, 6, 7      : values of the upper triangular elements (U)
      lower     = 8, 9, 10     : values of the lower triangular elements (L)
      lowerAddr = 0, 0, 2      : row of each upper element = column of each lower element
      upperAddr = 2, 3, 3      : column of each upper element = row of each lower element
  • 46. Example of CSR matrix (same 4 × 4 matrix as slide 45)
      data   = [1, 5, 6, 2, 8, 3, 7, 9, 10, 4]  : element values
      column = [0, 2, 3, 1, 0, 2, 3, 0, 2, 3]   : column index of each element
      offset = [0, 3, 4, 7, 10]                : start position of each row (last entry = number of nonzeros)
  • 47. CSR SpMV parallelization
          // y = A x with A in CSR format; n = number of rows
          #pragma omp parallel for
          for (label i = 0; i < n; i++)
          {
              double y_i = 0.0;
              for (label index = offset[i]; index < offset[i + 1]; index++)
              {
                  y_i += data[index] * x[column[index]];
              }
              y[i] = y_i;  // only iteration i writes y[i]
          }
      Rows are independent: different values of i never write to the same element, so this loop can be parallelized.

Editor's Notes

  1. Now I'll talk about the methodology.
  2. Next, we examine the improvement of the thread-parallelized version.
  3. Thank you for listening.