SlideShare a Scribd company logo
1 of 8
Download to read offline
PG-Strom
~A FDW module utilizing GPU device~


                 NEC Europe
         SAP Global Competence Center
  KaiGai Kohei <kohei.kaigai@emea.nec.com>
FDW is fun
                                                           Exec
                                           Regular                    Exec
                                            Table




                   Executor
                                           Foreign          MySQL
                                            Table            FDW
SELECT * FROM …


                                           Foreign          Oracle      Exec
                                            Table           FDW

          Run on
          single                                           PG-Strom
                                           Foreign
          thread                                             FDW
                                            Table
                                                                       Utilizing
                                                                       External
                                                            Regular   Computing
                                                             Table    Resource!



 Page 2                       PostgreSQL Conference 2012
Idea of Asynchronous Execution using CPU and GPU
    vanilla PostgreSQL                                                               PostgreSQL with PG-Strom

           CPU                                                                       CPU                     GPU

                                                                                                    Asynchronous memory
                                                                                                    transfer and execution


                         Iteration of scan tuples and
                            evaluation of qualifiers




                                                                         Synchronization


                                                                    Larger “chunk” to scan
                                                                                                        Earlier than
                                                                     the database at once
                                                                                                      “Only CPU” scan


          : Red means, scan tuples from the database
          : Green means, execution of the qualifiers

 Page 3                                                 PostgreSQL Conference 2012
Architecture of PG-Strom
                                                           World of CPU                World of GPU
          regular       shadow
                                                               Data Exchange
           tables        tables
                                                              via shared chunk
                                                                                             Massive
              shared buffer                     shared chunks                                Parallel
                                                                                            Execution

          Exec                                                               Preload
                      PG-Strom                             PG-Strom                         GPU
                                                              GPU                          Kernel
                                                           Calculation                    Function
                 Executor
                                                             Server                        Exec
                  Backend
                   Backend
              Backend Process
                   Process
                    Process
                                                                                         GPU Device
                                                                           DMA            Memory
                               Postmaster                                Transfer


                                                      PCI-E x16 Gen2 (16GB/sec)


 Page 4                       PostgreSQL Conference 2012
Data Density and Column-base structure
              Foreign Table FT1
                 (a, b, c, d)
                                                                                                                                             Table: my_schema.ft1.c.cs



                                                                           column store of D
                                                       column store of C
               column store of A

                                   column store of B
                                                                                               Shadow                                    10100        {‘2010-10-21’, …}
  rowid map




                                                                                               Tables                                    10200        {‘2011-01-23’, …}
                                                                                                                                         10300        {‘2011-08-17’, …}


                                                                                                                     Table: my_schema.ft1.b.cs
                                                                                                                  10100       {2.4, 5.6, 4.95, … }
              row store of FT1
                                                                                                                  10300     {10.23, 7.54, 5.43, … }


  ② Calculation
                                                                                                                           rowmap
                                                                                                     Chunk Buffer of FT1



                                                       ① Transfer
                                                                                                                           value   a[]       <not used>

                                                                                                                           value   b[]
                                                       ③ Write-Back
                                                                                                                           value   c[]
                                                                                                                           value   d[]       <not used>

 Page 5                                                                                        PostgreSQL Conference 2012
Benchmark Result
 postgres=# SELECT COUNT(*) FROM rtbl
                WHERE sqrt((x-256)^2 + (y-128)^2) < 40;
  count
 -------
  25069
 (1 row)
                                             GPU
                                          Accelerated!
 Time: 3739.492 ms

 postgres=# SELECT COUNT(*) FROM ftbl
                WHERE sqrt((x-256)^2 + (y-128)^2) < 40;
  count
 -------
  25069               X10 times
 (1 row)                Faster
 Time: 227.023 ms


▌CPU: Intel Xeon E5504 (2.0GHz/4core), GPU: Nvidia GeForce GTS450 (128 cuda core)
▌rtbl and ftbl contains 5 million tuples, with same values.
▌All the tuples are already in the shared buffers, so seldom disk i/o happen.

 Page 6                        PostgreSQL Conference 2012
Future Development

▌Git URL
 https://github.com/kaigai/pg_strom
▌v9.3 development
  Writable Foreign Table
  Sort / Aggregate acceleration using GPU
  Inheritance between regular and foreign tables
▌Need your help
  Folks who can review the patches
  Folks who can provide real-life big data
  Folks who can know typical workload of analytic queries




 Page 7                   PostgreSQL Conference 2012
Page 8   PostgreSQL Conference 2012

More Related Content

What's hot

What's hot (20)

20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storage20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storage
 
PostgreSQL with OpenCL
PostgreSQL with OpenCLPostgreSQL with OpenCL
PostgreSQL with OpenCL
 
pgconfasia2016 plcuda en
pgconfasia2016 plcuda enpgconfasia2016 plcuda en
pgconfasia2016 plcuda en
 
Gpu with cuda architecture
Gpu with cuda architectureGpu with cuda architecture
Gpu with cuda architecture
 
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
 
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
 
GPGPU programming with CUDA
GPGPU programming with CUDAGPGPU programming with CUDA
GPGPU programming with CUDA
 
Let's turn your PostgreSQL into columnar store with cstore_fdw
Let's turn your PostgreSQL into columnar store with cstore_fdwLet's turn your PostgreSQL into columnar store with cstore_fdw
Let's turn your PostgreSQL into columnar store with cstore_fdw
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
 
GPU Programming with Java
GPU Programming with JavaGPU Programming with Java
GPU Programming with Java
 
Lec04 gpu architecture
Lec04 gpu architectureLec04 gpu architecture
Lec04 gpu architecture
 
Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.
 
Gpu and The Brick Wall
Gpu and The Brick WallGpu and The Brick Wall
Gpu and The Brick Wall
 
GPU Programming
GPU ProgrammingGPU Programming
GPU Programming
 
HC-4018, How to make the most of GPU accessible memory, by Paul Blinzer
HC-4018, How to make the most of GPU accessible memory, by Paul BlinzerHC-4018, How to make the most of GPU accessible memory, by Paul Blinzer
HC-4018, How to make the most of GPU accessible memory, by Paul Blinzer
 
Exploiting GPUs in Spark
Exploiting GPUs in SparkExploiting GPUs in Spark
Exploiting GPUs in Spark
 
Making Hardware Accelerator Easier to Use
Making Hardware Accelerator Easier to UseMaking Hardware Accelerator Easier to Use
Making Hardware Accelerator Easier to Use
 
The Rise of Parallel Computing
The Rise of Parallel ComputingThe Rise of Parallel Computing
The Rise of Parallel Computing
 
GPU and Deep learning best practices
GPU and Deep learning best practicesGPU and Deep learning best practices
GPU and Deep learning best practices
 
PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...
PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...
PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...
 

Viewers also liked

Building Hybrid data cluster using PostgreSQL and MongoDB
Building Hybrid data cluster using PostgreSQL and MongoDBBuilding Hybrid data cluster using PostgreSQL and MongoDB
Building Hybrid data cluster using PostgreSQL and MongoDB
Ashnikbiz
 

Viewers also liked (10)

Task Parallel Library (TPL)
Task Parallel Library (TPL)Task Parallel Library (TPL)
Task Parallel Library (TPL)
 
TPL Dataflow – зачем и для кого?
TPL Dataflow – зачем и для кого?TPL Dataflow – зачем и для кого?
TPL Dataflow – зачем и для кого?
 
Task Parallel Library 2014
Task Parallel Library 2014Task Parallel Library 2014
Task Parallel Library 2014
 
An Intelligent Storage?
An Intelligent Storage?An Intelligent Storage?
An Intelligent Storage?
 
Building Hybrid data cluster using PostgreSQL and MongoDB
Building Hybrid data cluster using PostgreSQL and MongoDBBuilding Hybrid data cluster using PostgreSQL and MongoDB
Building Hybrid data cluster using PostgreSQL and MongoDB
 
20170127 JAWS HPC-UG#8
20170127 JAWS HPC-UG#820170127 JAWS HPC-UG#8
20170127 JAWS HPC-UG#8
 
Performance improvements in PostgreSQL 9.5 and beyond
Performance improvements in PostgreSQL 9.5 and beyondPerformance improvements in PostgreSQL 9.5 and beyond
Performance improvements in PostgreSQL 9.5 and beyond
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
 
Convolutional Neural Network (CNN) presentation from theory to code in Theano
Convolutional Neural Network (CNN) presentation from theory to code in TheanoConvolutional Neural Network (CNN) presentation from theory to code in Theano
Convolutional Neural Network (CNN) presentation from theory to code in Theano
 
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)
 

Similar to PG-Strom - A FDW module utilizing GPU device

2D Games to HPC
2D Games to HPC2D Games to HPC
2D Games to HPC
DVClub
 
ScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAs
ScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAsScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAs
ScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAs
Shinya Takamaeda-Y
 
Os Madsen Block
Os Madsen BlockOs Madsen Block
Os Madsen Block
oscon2007
 
Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer
Introduction to National Supercomputer center in Tianjin TH-1A SupercomputerIntroduction to National Supercomputer center in Tianjin TH-1A Supercomputer
Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer
Förderverein Technische Fakultät
 
Revisiting Co-Processing for Hash Joins on the Coupled Cpu-GPU Architecture
Revisiting Co-Processing for Hash Joins on the CoupledCpu-GPU ArchitectureRevisiting Co-Processing for Hash Joins on the CoupledCpu-GPU Architecture
Revisiting Co-Processing for Hash Joins on the Coupled Cpu-GPU Architecture
mohamedragabslideshare
 

Similar to PG-Strom - A FDW module utilizing GPU device (20)

Pgopencl
PgopenclPgopencl
Pgopencl
 
AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.”
 AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.” AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.”
AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.”
 
Nvidia Cuda Apps Jun27 11
Nvidia Cuda Apps Jun27 11Nvidia Cuda Apps Jun27 11
Nvidia Cuda Apps Jun27 11
 
3 d to _hpc
3 d to _hpc3 d to _hpc
3 d to _hpc
 
3 d to_hpc
3 d to_hpc3 d to_hpc
3 d to_hpc
 
2D Games to HPC
2D Games to HPC2D Games to HPC
2D Games to HPC
 
Toward a practical “HPC Cloud”: Performance tuning of a virtualized HPC cluster
Toward a practical “HPC Cloud”: Performance tuning of a virtualized HPC clusterToward a practical “HPC Cloud”: Performance tuning of a virtualized HPC cluster
Toward a practical “HPC Cloud”: Performance tuning of a virtualized HPC cluster
 
Exploiting GPUs in Spark
Exploiting GPUs in SparkExploiting GPUs in Spark
Exploiting GPUs in Spark
 
Efficient Parallel Set-Similarity Joins Using MapReduce - Poster
Efficient Parallel Set-Similarity Joins Using MapReduce - PosterEfficient Parallel Set-Similarity Joins Using MapReduce - Poster
Efficient Parallel Set-Similarity Joins Using MapReduce - Poster
 
LCA 2013 - Baremetal Provisioning with Openstack
LCA 2013 - Baremetal Provisioning with OpenstackLCA 2013 - Baremetal Provisioning with Openstack
LCA 2013 - Baremetal Provisioning with Openstack
 
GPU Virtualization on VMware's Hosted I/O Architecture
GPU Virtualization on VMware's Hosted I/O ArchitectureGPU Virtualization on VMware's Hosted I/O Architecture
GPU Virtualization on VMware's Hosted I/O Architecture
 
ScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAs
ScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAsScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAs
ScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAs
 
Os Madsen Block
Os Madsen BlockOs Madsen Block
Os Madsen Block
 
Introduction to GPU Programming
Introduction to GPU ProgrammingIntroduction to GPU Programming
Introduction to GPU Programming
 
gpuprogram_lecture,architecture_designsn
gpuprogram_lecture,architecture_designsngpuprogram_lecture,architecture_designsn
gpuprogram_lecture,architecture_designsn
 
Gpu Cuda
Gpu CudaGpu Cuda
Gpu Cuda
 
Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer
Introduction to National Supercomputer center in Tianjin TH-1A SupercomputerIntroduction to National Supercomputer center in Tianjin TH-1A Supercomputer
Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer
 
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
 
MEW22 22nd Machine Evaluation Workshop Microsoft
MEW22 22nd Machine Evaluation Workshop MicrosoftMEW22 22nd Machine Evaluation Workshop Microsoft
MEW22 22nd Machine Evaluation Workshop Microsoft
 
Revisiting Co-Processing for Hash Joins on the Coupled Cpu-GPU Architecture
Revisiting Co-Processing for Hash Joins on the CoupledCpu-GPU ArchitectureRevisiting Co-Processing for Hash Joins on the CoupledCpu-GPU Architecture
Revisiting Co-Processing for Hash Joins on the Coupled Cpu-GPU Architecture
 

More from Kohei KaiGai

More from Kohei KaiGai (20)

20221116_DBTS_PGStrom_History
20221116_DBTS_PGStrom_History20221116_DBTS_PGStrom_History
20221116_DBTS_PGStrom_History
 
20221111_JPUG_CustomScan_API
20221111_JPUG_CustomScan_API20221111_JPUG_CustomScan_API
20221111_JPUG_CustomScan_API
 
20211112_jpugcon_gpu_and_arrow
20211112_jpugcon_gpu_and_arrow20211112_jpugcon_gpu_and_arrow
20211112_jpugcon_gpu_and_arrow
 
20210928_pgunconf_hll_count
20210928_pgunconf_hll_count20210928_pgunconf_hll_count
20210928_pgunconf_hll_count
 
20210731_OSC_Kyoto_PGStrom3.0
20210731_OSC_Kyoto_PGStrom3.020210731_OSC_Kyoto_PGStrom3.0
20210731_OSC_Kyoto_PGStrom3.0
 
20210511_PGStrom_GpuCache
20210511_PGStrom_GpuCache20210511_PGStrom_GpuCache
20210511_PGStrom_GpuCache
 
20210301_PGconf_Online_GPU_PostGIS_GiST_Index
20210301_PGconf_Online_GPU_PostGIS_GiST_Index20210301_PGconf_Online_GPU_PostGIS_GiST_Index
20210301_PGconf_Online_GPU_PostGIS_GiST_Index
 
20201128_OSC_Fukuoka_Online_GPUPostGIS
20201128_OSC_Fukuoka_Online_GPUPostGIS20201128_OSC_Fukuoka_Online_GPUPostGIS
20201128_OSC_Fukuoka_Online_GPUPostGIS
 
20201113_PGconf_Japan_GPU_PostGIS
20201113_PGconf_Japan_GPU_PostGIS20201113_PGconf_Japan_GPU_PostGIS
20201113_PGconf_Japan_GPU_PostGIS
 
20201006_PGconf_Online_Large_Data_Processing
20201006_PGconf_Online_Large_Data_Processing20201006_PGconf_Online_Large_Data_Processing
20201006_PGconf_Online_Large_Data_Processing
 
20200828_OSCKyoto_Online
20200828_OSCKyoto_Online20200828_OSCKyoto_Online
20200828_OSCKyoto_Online
 
20200806_PGStrom_PostGIS_GstoreFdw
20200806_PGStrom_PostGIS_GstoreFdw20200806_PGStrom_PostGIS_GstoreFdw
20200806_PGStrom_PostGIS_GstoreFdw
 
20200424_Writable_Arrow_Fdw
20200424_Writable_Arrow_Fdw20200424_Writable_Arrow_Fdw
20200424_Writable_Arrow_Fdw
 
20191211_Apache_Arrow_Meetup_Tokyo
20191211_Apache_Arrow_Meetup_Tokyo20191211_Apache_Arrow_Meetup_Tokyo
20191211_Apache_Arrow_Meetup_Tokyo
 
20191115-PGconf.Japan
20191115-PGconf.Japan20191115-PGconf.Japan
20191115-PGconf.Japan
 
20190926_Try_RHEL8_NVMEoF_Beta
20190926_Try_RHEL8_NVMEoF_Beta20190926_Try_RHEL8_NVMEoF_Beta
20190926_Try_RHEL8_NVMEoF_Beta
 
20190925_DBTS_PGStrom
20190925_DBTS_PGStrom20190925_DBTS_PGStrom
20190925_DBTS_PGStrom
 
20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGai20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGai
 
20190516_DLC10_PGStrom
20190516_DLC10_PGStrom20190516_DLC10_PGStrom
20190516_DLC10_PGStrom
 
20190418_PGStrom_on_ArrowFdw
20190418_PGStrom_on_ArrowFdw20190418_PGStrom_on_ArrowFdw
20190418_PGStrom_on_ArrowFdw
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Recently uploaded (20)

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 

PG-Strom - A FDW module utilizing GPU device

  • 1. PG-Strom ~A FDW module utilizing GPU device~ NEC Europe SAP Global Competence Center KaiGai Kohei <kohei.kaigai@emea.nec.com>
  • 2. FDW is fun Exec Regular Exec Table Executor Foreign MySQL Table FDW SELECT * FROM … Foreign Oracle Exec Table FDW Run on single PG-Strom Foreign thread FDW Table Utilizing External Regular Computing Table Resource! Page 2 PostgreSQL Conference 2012
  • 3. Idea of Asynchronous Execution using CPU and GPU vanilla PostgreSQL PostgreSQL with PG-Strom CPU CPU GPU Asynchronous memory transfer and execution Iteration of scan tuples and evaluation of qualifiers Synchronization Larger “chunk” to scan Earlier than the database at once “Only CPU” scan : Red means, scan tuples from the database : Green means, execution of the qualifiers Page 3 PostgreSQL Conference 2012
  • 4. Architecture of PG-Strom World of CPU World of GPU regular shadow Data Exchange tables tables via shared chunk Massive shared buffer shared chunks Parallel Execution Exec Preload PG-Strom PG-Strom GPU GPU Kernel Calculation Function Executor Server Exec Backend Backend Backend Process Process Process GPU Device DMA Memory Postmaster Transfer PCI-E x16 Gen2 (16GB/sec) Page 4 PostgreSQL Conference 2012
  • 5. Data Density and Column-base structure Foreign Table FT1 (a, b, c, d) Table: my_schema.ft1.c.cs column store of D column store of C column store of A column store of B Shadow 10100 {‘2010-10-21’, …} rowid map Tables 10200 {‘2011-01-23’, …} 10300 {‘2011-08-17’, …} Table: my_schema.ft1.b.cs 10100 {2.4, 5.6, 4.95, … } row store of FT1 10300 {10.23, 7.54, 5.43, … } ② Calculation rowmap Chunk Buffer of FT1 ① Transfer value a[] <not used> value b[] ③ Write-Back value c[] value d[] <not used> Page 5 PostgreSQL Conference 2012
  • 6. Benchmark Result postgres=# SELECT COUNT(*) FROM rtbl WHERE sqrt((x-256)^2 + (y-128)^2) < 40; count ------- 25069 (1 row) GPU Accelerated! Time: 3739.492 ms postgres=# SELECT COUNT(*) FROM ftbl WHERE sqrt((x-256)^2 + (y-128)^2) < 40; count ------- 25069 X10 times (1 row) Faster Time: 227.023 ms ▌CPU: Intel Xeon E5504 (2.0GHz/4core), GPU: Nvidia GeForce GTS450 (128 cuda core) ▌rtbl and ftbl contains 5 million tuples, with same values. ▌All the tuples are already in the shared buffers, so seldom disk i/o happen. Page 6 PostgreSQL Conference 2012
  • 7. Future Development ▌Git URL https://github.com/kaigai/pg_strom ▌v9.3 development  Writable Foreign Table  Sort / Aggregate acceleration using GPU  Inheritance between regular and foreign tables ▌Need your help  Folks who can review the patches  Folks who can provide real-life big data  Folks who can know typical workload of analytic queries Page 7 PostgreSQL Conference 2012
  • 8. Page 8 PostgreSQL Conference 2012