A challenge for thread parallelism on OpenFOAM
Fixstars Corporation
Slides presented at the 7th OpenFOAM Conference, held October 15 to 17, 2019.
1.
A challenge for thread parallelism on OpenFOAM
YOSHIFUJI Naoki*, TOMIOKA Minoru, FUJIWARA Ko, SAWAHARA Masataka, ITO Yuki, MARUISHI Takafumi
Unless otherwise noted, available under GPL version 3; ©2019 Fixstars Corporation
2.
Who we are
Japanese software company
– Accelerating customers' software
– In any area, on any device
Professionals in software speedup
– Not a manufacturer using CAE software
– Not a CAE software developer
3.
Who I am
Computational civil engineer
– Lead Engineer @ Solution Div., Fixstars Corporation
– Doctoral student @ Coastal and Ocean Lab., Nagoya University
Interests and expertise
– High performance computing (HPC)
– Computational Fluid Dynamics (CFD)
– Software speedup (on anything from SoCs to supercomputers)
Name: YOSHIFUJI Naoki / 𠮷藤 尚生
Online ID: @LWisteria
Email: yoshifuji@fixstars.com
– Feel free to contact me about anything
4.
What we’ve done: x13.5 speedup
• Case: OpenFOAM Benchmark Test case "channelReTau110" provided by The Open CAE Society of Japan
• Solver: pimpleFoam with DIC-PCG
• Base OpenFOAM version: the Foundation version 16b559c1
• Average time of the first five steps
• Computer: Intel Ninja Developer Platform (Intel Xeon Phi 7210, DDR4)
5.
Abstract
1. Experimental implementation with OpenMP
2. Target: pimpleFoam on a channel flow benchmark
3. Solver: DIC-PCG, one of the most challenging cases for thread parallelism
4. Measured speedup factor: x13.5 (excluding the Cuthill-McKee (CM) reordering) over a single process on Intel Knights Landing
5. This study shows the potential of thread parallelism
6. Improvements and investigation with other cases will continue in the future
6.
Table of Contents
1. Abstract
2. Background and motivation
3. Parallel methodology
4. Performance measurement
5. Conclusion and future work
7.
Table of Contents (current section: 2. Background and motivation)
8.
Modern engineering and OpenFOAM
All product designers and engineers need CAE
OpenFOAM is one of the most widely used CAE software packages
Speeding up OpenFOAM is important in modern engineering
9.
OpenFOAM is slow on modern computers
[Figure: OpenFOAM benchmark results]
Quoted from Imano (2017): "OpenFOAMによる流体解析ベンチマークテスト FOCUS・クラウド・スパコンでのチャネルおよびボックスファン流れ解析" (CFD benchmark tests with OpenFOAM: channel and box-fan flow analyses on FOCUS, cloud, and supercomputers), 第17回PCクラスタシンポジウム (17th PC Cluster Symposium), p.19. Copyright 2017 OCAEL. All rights reserved.
10.
Difference of computers
| | Older computers | Modern computers |
|---|---|---|
| Num. of CPU cores | Single / a few | Many |
| Num. of computer nodes | A few | Many |
| CPU speed relative to interconnect speed | Low | High |
| ⇒ Num. of MPI processes | A few | Massive |
| ⇒ MPI management cost relative to arithmetic | Light | Heavy |
| ⇒ MPI communication cost relative to arithmetic | Light | Heavy |
11.
Solution: Parallelism
| | Current OpenFOAM | This study |
|---|---|---|
| Framework | MPI | OpenMP |
| Mechanism | Process | Thread |
| Data communication | Socket | Shared memory |
| Target | Between all cores | Cores within the same node |
| ⇒ Management cost | Heavy | Light |
| ⇒ Communication cost | Heavy | Light |
Using OpenMP could speed up OpenFOAM
12.
Our goal
1. Implement thread parallelism with OpenMP for intra-node parallelism (hybrid parallelism)
2. Measure the performance improvement
3. Share the implementation and results with the world
[Figure: 16 CPU cores in four groups of four; OpenMP within each group, MPI between groups]
13.
Our goal in this study
This study shows work in progress and incomplete results
pimpleFoam & DIC-PCG only
– To estimate the worst-case improvement
Single node only
– Little MPI cost
– Expected to be about as fast as flat MPI
  • Possibly faster on multiple nodes
14.
Extra motivation for our business
Outreach to customers
– Provide an example of Fixstars' work
– We would be happy if you placed an order with us to speed up your software: https://www.fixstars.com/en/service/acceleration/
Employee training
– Provide an exercise for Fixstars' engineers
– A CPU-only problem is good for beginners
15.
Table of Contents (current section: 3. Parallel methodology)
16.
Target in this study
Speed up / parallelize the solution of sparse linear equations
– Generally known to take a large part of CFD run time
Solver: DIC-PCG
– Diagonal Incomplete Cholesky (DIC) preconditioner
– Preconditioned Conjugate Gradient (PCG)
Many-core CPU, only one node
Challenge one of the hardest cases for thread parallelism
– To estimate the worst-case improvement
17.
Components of DIC-PCG
– Amul: sparse matrix-vector multiply (SpMV)
– DIC precondition
– WAXPBY: vector-vector addition (w = a·x + b·y)
– sumMag: sum of the absolute values of vector elements
– sumProd: vector inner product
DIC-PCG consists only of matrix/vector operations
– They are element-independent, thus easy to parallelize (in principle)
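The components above fit together in the standard PCG loop. The following is a minimal C++ sketch, assuming a plain CSR matrix stored in std::vector; the helper names mirror the slide's component names but are hypothetical, not OpenFOAM's API. To keep the skeleton short, a diagonal (Jacobi) preconditioner stands in for DIC, and the convergence test uses a simplified normalisation rather than OpenFOAM's.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Plain CSR matrix (hypothetical helper type, not OpenFOAM's).
struct CsrMatrix {
    int n;
    std::vector<int> offset, column;
    std::vector<double> data;
};

// Amul: y = A x (sparse matrix-vector multiply).
void amul(const CsrMatrix& A, const std::vector<double>& x, std::vector<double>& y) {
    for (int i = 0; i < A.n; ++i) {
        double s = 0.0;
        for (int k = A.offset[i]; k < A.offset[i + 1]; ++k) s += A.data[k] * x[A.column[k]];
        y[i] = s;
    }
}

// Stand-in preconditioner: w = D^{-1} r (diagonal/Jacobi, NOT DIC).
void precondition(const std::vector<double>& rD, const std::vector<double>& r,
                  std::vector<double>& w) {
    for (std::size_t i = 0; i < r.size(); ++i) w[i] = rD[i] * r[i];
}

// WAXPBY: w = a*x + b*y (w may alias x or y; each index is handled independently).
void waxpby(double a, const std::vector<double>& x, double b,
            const std::vector<double>& y, std::vector<double>& w) {
    for (std::size_t i = 0; i < w.size(); ++i) w[i] = a * x[i] + b * y[i];
}

// sumMag: sum of absolute values.
double sumMag(const std::vector<double>& x) {
    double s = 0.0;
    for (double v : x) s += std::fabs(v);
    return s;
}

// sumProd: inner product.
double sumProd(const std::vector<double>& x, const std::vector<double>& y) {
    double s = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) s += x[i] * y[i];
    return s;
}

// PCG for a symmetric positive definite A; returns the number of iterations used.
int pcg(const CsrMatrix& A, const std::vector<double>& b, std::vector<double>& x,
        double tol, int maxIter) {
    const int n = A.n;
    std::vector<double> rD(n, 0.0), r(n), w(n), p(n, 0.0), q(n);
    for (int i = 0; i < n; ++i)          // reciprocal diagonal for the stand-in preconditioner
        for (int k = A.offset[i]; k < A.offset[i + 1]; ++k)
            if (A.column[k] == i) rD[i] = 1.0 / A.data[k];
    amul(A, x, q);
    waxpby(1.0, b, -1.0, q, r);          // r = b - A x
    const double normB = sumMag(b);
    assert(normB > 0.0);
    double wrOld = 1.0;
    for (int iter = 0; iter < maxIter; ++iter) {
        if (sumMag(r) / normB < tol) return iter;   // simplified convergence test
        precondition(rD, r, w);          // w = M^{-1} r
        const double wr = sumProd(w, r);
        const double beta = (iter == 0) ? 0.0 : wr / wrOld;
        waxpby(1.0, w, beta, p, p);      // p = w + beta p
        amul(A, p, q);                   // q = A p
        const double alpha = wr / sumProd(p, q);
        waxpby(1.0, x, alpha, p, x);     // x = x + alpha p
        waxpby(1.0, r, -alpha, q, r);    // r = r - alpha q
        wrOld = wr;
    }
    return maxIter;
}
```

Every call inside the loop is one of the five components, which is why parallelizing DIC-PCG reduces to parallelizing these kernels; in the study, DIC takes the place of the stand-in `precondition`.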
18.
Parallelization of DIC-PCG with OpenMP
Difficult
– lduMatrix format for Amul
– DIC's substitution operation
Easy
– Elementwise operation
  • WAXPBY
– Parallel reduction
  • sumMag
  • sumProd
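The easy cases map directly onto OpenMP constructs. Below is a minimal sketch, assuming std::vector storage instead of OpenFOAM's Field types; the function names follow the slide, while the signatures are hypothetical. WAXPBY uses a plain `parallel for`, and sumMag/sumProd use the `reduction` clause. Compile with `-fopenmp` (GCC/Clang); without it the pragmas are ignored and the code runs serially.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Elementwise: each index i is written by exactly one iteration, so a plain
// parallel for is race-free.
void waxpby(double a, const std::vector<double>& x, double b,
            const std::vector<double>& y, std::vector<double>& w) {
    assert(x.size() == w.size() && y.size() == w.size());
    const long n = static_cast<long>(w.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) w[i] = a * x[i] + b * y[i];
}

// Reduction: each thread accumulates a private partial sum, and OpenMP
// combines the partial sums when the loop ends.
double sumMag(const std::vector<double>& x) {
    const long n = static_cast<long>(x.size());
    double s = 0.0;
    #pragma omp parallel for reduction(+:s)
    for (long i = 0; i < n; ++i) s += std::fabs(x[i]);
    return s;
}

double sumProd(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    const long n = static_cast<long>(x.size());
    double s = 0.0;
    #pragma omp parallel for reduction(+:s)
    for (long i = 0; i < n; ++i) s += x[i] * y[i];
    return s;
}
```

One side effect of the reduction is that the summation order depends on the number of threads, so floating-point results may differ in the last bits between runs with different thread counts.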
19.
lduMatrix SpMV parallelization
Naive attempt: add an OpenMP pragma to the face loop of src/OpenFOAM/matrices/lduMatrix/lduMatrix/lduMatrixATmul.C::Foam::lduMatrix::Amul()

    #pragma omp parallel for
    for (label face=0; face<nFaces; face++)
    {
        ApsiPtr[uPtr[face]] += lowerPtr[face]*psiPtr[lPtr[face]];
        ApsiPtr[lPtr[face]] += upperPtr[face]*psiPtr[uPtr[face]];
    }

Example addressing and matrix:
    face  0 1 2
    lPtr  0 0 2
    uPtr  2 3 3

    1  .  5  6
    .  2  .  .
    8  .  3  7
    9  . 10  4

Dependency among faces
– Data race: different faces write to the same element at the same time (faces 0 and 1 both update ApsiPtr[0]; faces 1 and 2 both update ApsiPtr[3])
The lduMatrix face loop cannot be parallelized with a plain parallel for
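For reference, the serial face loop is correct; the problem appears only when faces run concurrently. Below is a minimal sketch with std::vector storage; the struct and function names are hypothetical, the loop body follows the slide, and the diagonal term that a full Amul also needs is added. The second function shows one alternative fix, atomic updates. This is not the approach taken in this study (which converts to CSR instead), and atomics are race-free but often slow under contention.

```cpp
#include <cassert>
#include <vector>

// lduMatrix-style storage (hypothetical standalone struct; array roles follow OpenFOAM).
struct LduMatrix {
    std::vector<double> diag, upper, lower;   // upper[f] at (l, u), lower[f] at (u, l)
    std::vector<int> lowerAddr, upperAddr;    // l and u cell of each face f
};

// Serial Amul: Apsi = A psi.
void amulSerial(const LduMatrix& A, const std::vector<double>& psi, std::vector<double>& Apsi) {
    const int nCells = static_cast<int>(A.diag.size());
    const int nFaces = static_cast<int>(A.upper.size());
    assert(psi.size() == A.diag.size() && Apsi.size() == A.diag.size());
    for (int c = 0; c < nCells; ++c) Apsi[c] = A.diag[c] * psi[c];
    for (int f = 0; f < nFaces; ++f) {
        const int l = A.lowerAddr[f], u = A.upperAddr[f];
        Apsi[u] += A.lower[f] * psi[l];
        Apsi[l] += A.upper[f] * psi[u];
    }
}

// Race-free parallel face loop via atomic updates (alternative, not used in this study).
void amulAtomic(const LduMatrix& A, const std::vector<double>& psi, std::vector<double>& Apsi) {
    const int nCells = static_cast<int>(A.diag.size());
    const int nFaces = static_cast<int>(A.upper.size());
    assert(psi.size() == A.diag.size() && Apsi.size() == A.diag.size());
    #pragma omp parallel for
    for (int c = 0; c < nCells; ++c) Apsi[c] = A.diag[c] * psi[c];
    #pragma omp parallel for
    for (int f = 0; f < nFaces; ++f) {
        const int l = A.lowerAddr[f], u = A.upperAddr[f];
        // Two faces may share l or u, so each update must be atomic.
        #pragma omp atomic
        Apsi[u] += A.lower[f] * psi[l];
        #pragma omp atomic
        Apsi[l] += A.upper[f] * psi[u];
    }
}
```

With the 4x4 example matrix from the appendix and psi = (1, 2, 3, 4), both functions give (40, 4, 45, 55).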
20.
lduMatrix to CSR format
Convert to Compressed Sparse Row (CSR), a widely used sparse matrix format
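The conversion is mechanical: every face contributes one entry to row l (its upper coefficient) and one entry to row u (its lower coefficient), and each cell contributes its diagonal. Below is a minimal sketch, assuming std::vector storage and per-row sorting by column; the struct and function names are hypothetical and not the authors' implementation. Applied to the appendix example (diag = 1, 2, 3, 4; upper = 5, 6, 7; lower = 8, 9, 10; lowerAddr = 0, 0, 2; upperAddr = 2, 3, 3), it reproduces the appendix CSR arrays.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// CSR arrays named as in the appendix slide (hypothetical struct).
struct Csr {
    std::vector<int> offset, column;
    std::vector<double> data;
};

// Convert lduMatrix-style arrays to CSR, with columns sorted within each row.
Csr lduToCsr(const std::vector<double>& diag, const std::vector<double>& upper,
             const std::vector<double>& lower, const std::vector<int>& lowerAddr,
             const std::vector<int>& upperAddr) {
    assert(lower.size() == upper.size() && lowerAddr.size() == upper.size() &&
           upperAddr.size() == upper.size());
    const std::size_t n = diag.size();
    // Collect (column, value) pairs per row.
    std::vector<std::vector<std::pair<int, double>>> rows(n);
    for (std::size_t c = 0; c < n; ++c) rows[c].push_back({static_cast<int>(c), diag[c]});
    for (std::size_t f = 0; f < upper.size(); ++f) {
        const int l = lowerAddr[f], u = upperAddr[f];
        rows[l].push_back({u, upper[f]});  // upper coefficient sits at (row l, column u)
        rows[u].push_back({l, lower[f]});  // lower coefficient sits at (row u, column l)
    }
    Csr A;
    A.offset.push_back(0);
    for (auto& row : rows) {
        std::sort(row.begin(), row.end());  // pairs compare by column first
        for (const auto& entry : row) {
            A.column.push_back(entry.first);
            A.data.push_back(entry.second);
        }
        A.offset.push_back(static_cast<int>(A.column.size()));
    }
    return A;
}
```

The sparsity pattern depends only on the mesh, so in practice one would likely compute the face-to-CSR mapping once and copy only the coefficient values before each solve; this sketch rebuilds everything for clarity.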
21.
DIC preconditioner
Naive attempt on the forward substitution loop of src/OpenFOAM/matrices/lduMatrix/preconditioners/DICPreconditioner/DICPreconditioner.C::Foam::DICPreconditioner::precondition()

    #pragma omp parallel for
    for (label face=0; face<nFaces; face++)
    {
        wAPtr[uPtr[face]] -= rDPtr[uPtr[face]]*upperPtr[face]*wAPtr[lPtr[face]];
    }

Example addressing and matrix:
    face  0 1 2
    lPtr  0 0 2
    uPtr  2 3 3

    1  .  5  6
    .  2  .  .
    8  .  3  7
    9  . 10  4

Substitution phase (forward)
– Face 2 reads wA[2], which face 0 writes: the result changes with execution order (data race)
Cannot be parallelized as is
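To see why the forward loop is order-dependent, it helps to look at the whole serial DIC preconditioner. Below is a minimal sketch for a symmetric matrix stored lduMatrix-style, assuming faces are in upper-triangular order (sorted by lowerAddr, then upperAddr). It follows the structure of OpenFOAM's DICPreconditioner (modified diagonal, then forward and backward substitution), but the struct and function names are hypothetical. The result satisfies M wA = rA with M = (D* + L) D*^{-1} (D* + U), where D* is the modified diagonal and L is the transpose of U.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Symmetric lduMatrix-style storage: lower == upper, so only upper is kept (hypothetical struct).
struct SymLdu {
    std::vector<double> diag, upper;
    std::vector<int> lowerAddr, upperAddr;  // faces in upper-triangular order
};

// Reciprocal of the DIC-modified diagonal D*.
std::vector<double> dicReciprocalD(const SymLdu& A) {
    std::vector<double> rD = A.diag;
    for (std::size_t f = 0; f < A.upper.size(); ++f) {
        const int l = A.lowerAddr[f], u = A.upperAddr[f];
        // rD[l] is already final here: every face with upperAddr == l has a smaller
        // lowerAddr, so it comes earlier in upper-triangular order.
        rD[u] -= A.upper[f] * A.upper[f] / rD[l];
    }
    for (double& d : rD) {
        assert(d > 0.0);  // a non-positive pivot means the incomplete factorisation broke down
        d = 1.0 / d;
    }
    return rD;
}

// wA = M^{-1} rA with M = (D* + L) D*^{-1} (D* + U).
void dicPrecondition(const SymLdu& A, const std::vector<double>& rD,
                     const std::vector<double>& rA, std::vector<double>& wA) {
    const int nCells = static_cast<int>(rD.size());
    const int nFaces = static_cast<int>(A.upper.size());
    for (int c = 0; c < nCells; ++c) wA[c] = rD[c] * rA[c];
    // Forward sweep (the loop on the slide): face f reads wA[l], which an earlier
    // face may have just written, so the iterations cannot simply run concurrently.
    for (int f = 0; f < nFaces; ++f) {
        const int l = A.lowerAddr[f], u = A.upperAddr[f];
        wA[u] -= rD[u] * A.upper[f] * wA[l];
    }
    // Backward sweep, in reverse face order.
    for (int f = nFaces - 1; f >= 0; --f) {
        const int l = A.lowerAddr[f], u = A.upperAddr[f];
        wA[l] -= rD[l] * A.upper[f] * wA[u];
    }
}
```

Both sweeps are triangular solves, so each face needs a value produced by an earlier face. Reordering the cells so that independent cells can be grouped is what makes these sweeps parallelizable.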
22.
Cuthill-McKee ordering
Example: 4-point stencil (regular mesh)
Cell numbering:
    6 7 8
    3 4 5
    0 1 2
[Figure: the corresponding 9x9 coefficient matrix, with 4 on the diagonal and 1 for each neighbouring cell]
23.
Cuthill-McKee ordering
Dependency between cells on the same 3x3 mesh (cells 0 to 8)
[Figure: dependencies shown on the mesh and in the 9x9 coefficient matrix]
24.
Cuthill-McKee ordering
Cells of the 3x3 mesh are grouped into colors; cells of the same color are independent of each other
[Figure: colored cells on the mesh and in the 9x9 coefficient matrix]
Cells of the same color can be executed in parallel
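One way to obtain such colors is to take the level sets of a Cuthill-McKee breadth-first traversal: starting from one cell, each BFS level becomes one color. Below is a minimal sketch, assuming an adjacency-list mesh; the helper names are hypothetical, and this is not necessarily how the authors built their colors. On the 3x3 mesh above, starting from cell 0 gives the anti-diagonal levels {0}, {1, 3}, {2, 4, 6}, {5, 7}, {8}. On a regular quadrilateral mesh no two cells in one level are neighbours, but on general unstructured meshes this is not guaranteed, so the sketch includes a check.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

using Adjacency = std::vector<std::vector<int>>;

// 4-neighbour adjacency of an nx-by-ny mesh, cell c = j*nx + i (row j = 0 at the bottom).
Adjacency gridAdjacency(int nx, int ny) {
    Adjacency adj(nx * ny);
    for (int j = 0; j < ny; ++j)
        for (int i = 0; i < nx; ++i) {
            const int c = j * nx + i;
            if (i > 0) adj[c].push_back(c - 1);
            if (i + 1 < nx) adj[c].push_back(c + 1);
            if (j > 0) adj[c].push_back(c - nx);
            if (j + 1 < ny) adj[c].push_back(c + nx);
        }
    return adj;
}

// Cuthill-McKee traversal from `start`: a BFS that visits unnumbered neighbours in
// order of increasing degree. Returns the BFS levels; each level is used as one color.
std::vector<std::vector<int>> cuthillMcKeeLevels(const Adjacency& adj, int start) {
    std::vector<int> level(adj.size(), -1);
    std::vector<std::vector<int>> levels;
    std::queue<int> queue;
    level[start] = 0;
    queue.push(start);
    while (!queue.empty()) {
        const int c = queue.front();
        queue.pop();
        if (static_cast<int>(levels.size()) <= level[c]) levels.emplace_back();
        levels[level[c]].push_back(c);
        std::vector<int> next;
        for (int nb : adj[c])
            if (level[nb] < 0) next.push_back(nb);
        std::stable_sort(next.begin(), next.end(),
                         [&](int a, int b) { return adj[a].size() < adj[b].size(); });
        for (int nb : next) {
            level[nb] = level[c] + 1;
            queue.push(nb);
        }
    }
    return levels;
}

// True if no two cells in the same level are neighbours. Then, after renumbering the
// cells level by level, all cells of one level can be updated concurrently in a sweep.
bool levelsAreIndependent(const Adjacency& adj, const std::vector<std::vector<int>>& levels) {
    std::vector<int> colorOf(adj.size(), -1);
    for (std::size_t k = 0; k < levels.size(); ++k)
        for (int c : levels[k]) colorOf[c] = static_cast<int>(k);
    for (std::size_t c = 0; c < adj.size(); ++c)
        for (int nb : adj[c])
            if (colorOf[c] == colorOf[nb]) return false;
    return true;
}
```

In a substitution sweep, the levels would then be processed in order, with a `#pragma omp parallel for` over the cells inside each level. The small number of cells in the first and last levels is one reason why this approach exposes less parallelism than the elementwise kernels.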
25.
Parallelization of DIC-PCG
– Amul: CSR format
– DIC precondition: Cuthill-McKee ordering
– WAXPBY: parallel elementwise operation
– sumMag: parallel reduction
– sumProd: parallel reduction
The whole DIC-PCG can be parallelized
26.
Table of Contents (current section: 4. Performance measurement)
27.
Benchmark condition
Case: OpenFOAM Benchmark Test case "channelReTau110" provided by The Open CAE Society of Japan
Solver: pimpleFoam with DIC-PCG
Base OpenFOAM version: the Foundation version 16b559c1
Computer: Intel Ninja Developer Platform (Intel Xeon Phi 7210, 256 logical cores, DDR4)
Average time of the first five steps
– Measured with clock timers inserted manually at the beginning and end of each function in the source code
28.
Single process result
PCG is the largest part of the whole pimpleFoam run
Matrix operations in PCG:
– DIC::precondition
– Amul
[Figure: single-process time breakdown; labels: PCG, DIC, Amul]
29.
Step-by-step speedup (0)
The base version (same as the previous page)
30.
Step-by-step speedup (1)
Change the matrix format to CSR
DIC took longer (cause not yet clear at this step)
31.
Step-by-step speedup (2)
ldu-CSR format: divide the CSR matrix into
– Lower triangular part
– Upper triangular part
– Diagonal
DIC time went back to the original level
– Fewer cache misses
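The slide does not detail the ldu-CSR layout, so the following is only one plausible reading: a strictly lower CSR part, a separate diagonal array, and a strictly upper CSR part. The struct and function names are hypothetical. A possible reason for fewer cache misses, which is an assumption rather than something the slide states, is that a DIC forward sweep then touches only the lower arrays and a backward sweep only the upper arrays, each stored contiguously.

```cpp
#include <cassert>
#include <vector>

struct Csr {
    std::vector<int> offset, column;
    std::vector<double> data;
};

// Hypothetical "ldu-CSR" layout: strictly lower part, diagonal, strictly upper part.
struct LduCsr {
    Csr lower, upper;
    std::vector<double> diag;
};

LduCsr splitCsr(const Csr& A) {
    const int n = static_cast<int>(A.offset.size()) - 1;
    LduCsr S;
    S.diag.assign(n, 0.0);
    S.lower.offset.push_back(0);
    S.upper.offset.push_back(0);
    for (int i = 0; i < n; ++i) {
        for (int k = A.offset[i]; k < A.offset[i + 1]; ++k) {
            const int j = A.column[k];
            if (j < i) {
                S.lower.column.push_back(j);
                S.lower.data.push_back(A.data[k]);
            } else if (j > i) {
                S.upper.column.push_back(j);
                S.upper.data.push_back(A.data[k]);
            } else {
                S.diag[i] = A.data[k];
            }
        }
        S.lower.offset.push_back(static_cast<int>(S.lower.column.size()));
        S.upper.offset.push_back(static_cast<int>(S.upper.column.size()));
    }
    return S;
}

// y = (L + D + U) x, parallel over rows: each row i is written by exactly one iteration.
void amul(const LduCsr& S, const std::vector<double>& x, std::vector<double>& y) {
    const int n = static_cast<int>(S.diag.size());
    assert(x.size() == S.diag.size() && y.size() == S.diag.size());
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        double s = S.diag[i] * x[i];
        for (int k = S.lower.offset[i]; k < S.lower.offset[i + 1]; ++k)
            s += S.lower.data[k] * x[S.lower.column[k]];
        for (int k = S.upper.offset[i]; k < S.upper.offset[i + 1]; ++k)
            s += S.upper.data[k] * x[S.upper.column[k]];
        y[i] = s;
    }
}
```

For the appendix example matrix, the upper and lower value arrays come out as 5, 6, 7 and 8, 9, 10, the same values lduMatrix stores, but here each part is indexed row by row.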
32.
Step-by-step speedup (3)
Parallelize matrix operations
– Amul
– DIC precondition + Cuthill-McKee
x2.0
33.
Step-by-step speedup (4)
Parallelize vector operations
– WAXPBY
– sumMag
– sumProd
x3.4
34.
Step-by-step speedup (5)
Change the OpenMP setting (the Xeon Phi 7210 has 64 physical cores)
– From 256 threads
– To 64 threads
x4.8
35.
Achieved speedup without CM: x13.5
The Cuthill-McKee (CM) reordering can be excluded: it is required only when the mesh changes (remeshing)
36.
vs. flat MPI
About ½ the speed of flat MPI
37.
Speedup by each function
| Function | Single [s] | OpenMP [s] | Speedup factor |
|---|---|---|---|
| Amul | 60.0 | 3.0 | x19.8 |
| DIC::precondition | 80.1 | 7.9 | x10.1 |
| WAXPBY | 21.1 | 0.4 | x53.2 |
| sumMag | 6.1 | 0.1 | x44.6 |
| sumProd | 12.3 | 0.3 | x43.6 |
| Cuthill-McKee | 0.0 | 24.0 | --- |
| (other) | 1.1 | 1.6 | x0.7 |
| total | 180.8 | 37.4 | x4.8 |
| total excluding CM | 180.8 | 13.4 | x13.5 |
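Each speedup factor is the ratio of single-process time to OpenMP time. For the two totals in the table:

```latex
S_{\text{total}} = \frac{T_{\text{single}}}{T_{\text{OpenMP}}}
  = \frac{180.8\,\text{s}}{37.4\,\text{s}} \approx 4.8,
\qquad
S_{\text{excl. CM}} = \frac{180.8\,\text{s}}{37.4\,\text{s} - 24.0\,\text{s}}
  = \frac{180.8\,\text{s}}{13.4\,\text{s}} \approx 13.5
```

The per-function factors were presumably computed from unrounded times, which would explain why, for example, 60.0 / 3.0 = 20 is listed as x19.8.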
38.
Why slower than MPI (1)
DIC was slow
– Theoretical reason
  • DIC is difficult for thread parallelism: only a small number of threads can work in parallel
– Implementation reason
  • The reordering of input/output vectors could be reduced
39.
Why slower than MPI (2)
Expected: the number of iterations decreases with OpenMP
– Convergence with OpenMP was expected to be better than with MPI, because OpenMP does not require domain decomposition
– Domain decomposition degrades convergence
Actual: not decreased
– The benchmark used converges too well
40.
Table of Contents (current section: 5. Conclusion and future work)
41.
Conclusion
Achieved
– x13.5 speedup over a single process
– About ½ the speed of flat MPI
Conditions
– Channel flow benchmark with a regular mesh
– DIC-PCG solver
A very difficult method on a very simple problem, i.e. the worst condition, shows only a ½ degradation
42.
Future work
Speed up DIC
– Eliminate vector reordering
Simpler preconditioners / solvers
– Diagonal, GAMG
More complicated benchmarks
– Motorbike, dam-break
Multi-node supercomputers
– Efficiency gained by reducing the number of MPI processes
Please look forward to our next work
43.
return 0;
44.
lduMatrix format (appendix for 3. Parallel methodology)
Sparse matrix storage format used by pimpleFoam (and by many other solvers)
Three parts
– Upper triangular part U: stored row by row (row-major)
– Lower triangular part L: stored column by column (column-major)
– Diagonal part D
Equivalent to the COO (coordinate) format
45.
Example of lduMatrix (appendix for 3. Parallel methodology)
    1  .  5  6
    .  2  .  .
    8  .  3  7
    9  . 10  4
diag = 1, 2, 3, 4 : values of the diagonal elements (D)
upper = 5, 6, 7 : values of the upper triangular elements (U)
lower = 8, 9, 10 : values of the lower triangular elements (L)
lowerAddr = 0, 0, 2 : row number of the upper elements, column number of the lower elements
upperAddr = 2, 3, 3 : column number of the upper elements, row number of the lower elements
46.
Example of CSR matrix (appendix for 3. Parallel methodology)
    1  .  5  6
    .  2  .  .
    8  .  3  7
    9  . 10  4
data = [1, 5, 6, 2, 8, 3, 7, 9, 10, 4] : element values
column = [0, 2, 3, 1, 0, 2, 3, 0, 2, 3] : column number of each element
offset = [0, 3, 4, 7, 10] : start position of each row
47.
CSR SpMV parallelization (appendix for 3. Parallel methodology)

    #pragma omp parallel for
    for (label i = 0; i < n; i++)
    {
        double y_i = 0.0;
        for (label index = offset[i]; index < offset[i + 1]; index++)
        {
            y_i += data[index] * x[column[index]];
        }
        y[i] = y_i;
    }

Iterations are independent in i
– Each iteration writes only y[i], so no two threads write the same element
Can be parallelized
Editor's Notes
Now I'll talk about the methodology.
Then we investigate the improvement of the thread-parallelized version.
Thank you for listening.