This document discusses using GPUs and SSDs to accelerate PostgreSQL queries. It introduces PG-Strom, a project that generates CUDA code from SQL to execute queries massively in parallel on GPUs. The document proposes enhancing PG-Strom to directly transfer data from SSDs to GPUs without going through CPU/RAM, in order to filter and join tuples during loading for further acceleration. Challenges include improving the NVIDIA driver for NVMe devices and tracking shared buffer usage to avoid unnecessary transfers. The goal is to maximize query performance by leveraging the high bandwidth and parallelism of GPUs and SSDs.
2. Self Introduction
▌Name: Wasserschwein@Shinagawa
▌PostgreSQL: 9 years (2006~)
▌Works: Security, FDW, etc...
▌Hobby: Mixing heterogeneous technologies with PostgreSQL
PostgreSQL Conference Japan - LT: SQL+GPU+SSD=∞
[Diagram: GPGPU (very powerful computing capability) combined with a very functional & well-used database gives PG-Strom, what I'm making]
3. What’s PG-Strom – Brief overview
▌Core ideas
① GPU native code generation on the fly
② Asynchronous massive parallel execution
▌Advantages
Transparent acceleration with 100% query compatibility
Commodity H/W and less system integration cost
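Query: SELECT * FROM l_tbl JOIN r_tbl ON l_tbl.lid = r_tbl.rid;
[Diagram: the query passes through the Parser, Planner, and Executor; PG-Strom plugs in through the Custom-Scan/Join interface, generates CUDA source code, builds it with nvrtc, and uses the CUDA driver for DMA data transfer and massive parallel execution]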
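Core idea ① can be illustrated with a minimal sketch that turns a scan qualifier into CUDA source text. This is not PG-Strom's actual generator: the tuple-based expression format, the function names, and the all-integer columns are assumptions made for the example. It only builds the source string; in the real pipeline the text would be handed to a runtime compiler such as nvrtc.

```python
# Hypothetical sketch: render a tiny qualifier tree as CUDA C source text.
# None of these names come from PG-Strom itself.

BINARY_OPS = ("<", ">", "<=", ">=", "==", "!=", "&&", "||", "+", "-", "*")


def expr_to_cuda(expr):
    """Render a nested tuple expression as a C expression string."""
    kind = expr[0]
    if kind == "col":        # ("col", "x") becomes x[i]
        return f"{expr[1]}[i]"
    if kind == "const":      # ("const", 10) becomes 10 (integers only here)
        return repr(expr[1])
    if kind in BINARY_OPS:
        return f"({expr_to_cuda(expr[1])} {kind} {expr_to_cuda(expr[2])})"
    raise ValueError(f"unsupported node: {kind!r}")


def columns_of(expr):
    """Collect the column names referenced by the expression."""
    if expr[0] == "col":
        return {expr[1]}
    if expr[0] == "const":
        return set()
    return columns_of(expr[1]) | columns_of(expr[2])


def generate_scan_kernel(name, qual):
    """Build a kernel that marks, per row, whether the qualifier holds."""
    params = ", ".join(f"const int *{c}" for c in sorted(columns_of(qual)))
    return (
        f'extern "C" __global__ void {name}({params}, int nrows, char *visible)\n'
        "{\n"
        "    int i = blockIdx.x * blockDim.x + threadIdx.x;\n"
        "    if (i < nrows)\n"
        f"        visible[i] = {expr_to_cuda(qual)} ? 1 : 0;\n"
        "}\n"
    )


# WHERE x > 10 AND y < 5
qual = ("&&", (">", ("col", "x"), ("const", 10)), ("<", ("col", "y"), ("const", 5)))
source = generate_scan_kernel("scan_quals", qual)
print(source)
```

Each GPU thread evaluates the qualifier for one row, which is where the massive parallelism of core idea ② comes from.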
4. Supported Workload – Scan, Join, Aggregation
▌SELECT cat, AVG(x) FROM t0 NATURAL JOIN t1 [, ...] GROUP BY cat;
t0: 100M rows; t1~t10: 100K rows each; all the data was preloaded.
▌Environment:
PostgreSQL v9.5beta1 + PG-Strom (22-Oct), CUDA 7.0 + RHEL6.6 (x86_64)
CPU: Xeon E5-2670v3, RAM: 384GB, GPU: NVIDIA TESLA K20c (2496cores)
[Figure: Time consumption per component (PostgreSQL v9.5β vs PG-Strom). Paired PgSQL and Strom bars stacked by Scan, Join, Aggregate, Others; x axis: # of tables involved (2 to 8); y axis: query response time [sec], 0 to 300]
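The benchmark query template can be expanded for a given number of tables with a small helper. This is a sketch for illustration only: the function name is hypothetical, and it assumes that "# of tables involved" counts t0 plus the joined tables, so that 2 means t0 and t1.

```python
def build_benchmark_query(n_tables):
    """Expand the slide's query template; n_tables counts t0 as well."""
    if not 2 <= n_tables <= 11:
        raise ValueError("the slide defines t0 plus t1..t10")
    joins = "".join(f" NATURAL JOIN t{i}" for i in range(1, n_tables))
    return f"SELECT cat, AVG(x) FROM t0{joins} GROUP BY cat;"


for n in (2, 3):
    print(build_benchmark_query(n))
```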
5. Next target is I/O acceleration – from TPC-DS results
[Figure: Time consumption per workload (PostgreSQL v9.5beta + PG-Strom). Bars stacked by Scan, Join, Aggregate, Others; y axis: 0% to 100%]
So, how can the GPU accelerate I/O?
6. NOTICE
What I'd like to introduce next is...
Just my idea at this moment
...so I'll put in the effort to implement it
7. A rough x86_64 hardware architecture
[Diagram: GPU and SSD attached to two CPU + RAM nodes; links labeled PCI-E and SAS; callout: usual I/O bottleneck]
8. Simplified diagram for introduction
[Diagram: GPU and SSD attached to CPU + RAM over PCI-E]
OK, it's storage
9. NVM EXPRESS SSD
PCI-E direct SSD device – low latency and higher bandwidth
Samsung SSD 950 PRO
Intel SSD 750
HGST Ultrastar SN100
Intel SSD DC P3700
10. Data Flow in analytic queries
① Data load from storage to CPU/RAM
[Diagram: GPU, SSD, and CPU + RAM on PCI-E; the table is loaded from the SSD into CPU + RAM]
11. Data Flow in analytic queries
① Data load from storage to CPU/RAM
② Remove invisible rows (Select)
12. Data Flow in analytic queries
① Data load from storage to CPU/RAM
② Remove invisible rows (Select)
③ Remove unreferenced columns (Projection)
13. The job of CPU
Data Flow in analytic queries
① Data load from storage to CPU/RAM
② Remove invisible rows (Select)
③ Remove unreferenced columns (Projection)
④ Join with other tables (Join)
[Diagram: GPU, SSD, and CPU + RAM on PCI-E; the table is loaded into CPU + RAM and joined (+) there]
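The four steps above can be sketched in plain Python to make the CPU's share of the work concrete. The table contents, the x > 10 predicate, and all function names are made up for the example. The point is that every row and column reaches CPU/RAM before anything is discarded.

```python
def load(table):
    """Step 1: every row and every column arrives in CPU/RAM."""
    return [dict(row) for row in table]


def select(rows, predicate):
    """Step 2: drop rows that do not satisfy the scan qualifier."""
    return [r for r in rows if predicate(r)]


def project(rows, columns):
    """Step 3: drop columns the query never references."""
    return [{c: r[c] for c in columns} for r in rows]


def hash_join(outer, inner, outer_key, inner_key):
    """Step 4: join with another table through a hash table on the inner side."""
    index = {}
    for r in inner:
        index.setdefault(r[inner_key], []).append(r)
    return [{**o, **i} for o in outer for i in index.get(o[outer_key], [])]


# Hypothetical sample data.
l_tbl = [
    {"lid": 1, "x": 5, "pad": "unused"},
    {"lid": 2, "x": 50, "pad": "unused"},
    {"lid": 3, "x": 70, "pad": "unused"},
]
r_tbl = [{"rid": 2, "cat": "a"}, {"rid": 3, "cat": "b"}]

rows = load(l_tbl)
rows = select(rows, lambda r: r["x"] > 10)
rows = project(rows, ["lid", "x"])
result = hash_join(rows, r_tbl, "lid", "rid")
print(result)
```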
14. SSD-to-GPU Direct
Data is transferred directly between the SSD and the GPU, bypassing CPU/RAM
Also available on NVMe, not only Fusion-IO
15. Data Flow in analytic queries (1/3) – Basic
[Diagram: GPU, SSD, and CPU + RAM on PCI-E]
▌SSD-to-GPU Direct moves the table from the SSD to the GPU
16. Data Flow in analytic queries (1/3) – Basic
[Diagram: GPU, SSD, and CPU + RAM on PCI-E]
▌SSD-to-GPU Direct moves the table from the SSD to the GPU
▌GPU code generated from SQL on the fly
17. Data Flow in analytic queries (1/3) – Basic
[Diagram: GPU, SSD, and CPU + RAM on PCI-E]
▌SSD-to-GPU Direct moves the table from the SSD to the GPU
▌GPU code generated from SQL on the fly
▌Remove invisible rows according to the scan qualifiers
18. Data Flow in analytic queries (1/3) – Basic
[Diagram: GPU, SSD, and CPU + RAM on PCI-E]
▌SSD-to-GPU Direct moves the table from the SSD to the GPU
▌GPU code generated from SQL on the fly
▌Remove invisible rows according to the scan qualifiers
▌Only visible rows are moved to CPU+RAM
19. Data Flow in analytic queries (2/3) – Advanced
[Diagram: GPU, SSD, and CPU + RAM on PCI-E]
▌SSD-to-GPU Direct moves the table from the SSD to the GPU
▌GPU code generated from SQL on the fly
20. Data Flow in analytic queries (2/3) – Advanced
[Diagram: GPU, SSD, and CPU + RAM on PCI-E]
▌SSD-to-GPU Direct moves the table from the SSD to the GPU
▌GPU code generated from SQL on the fly
▌Remove invisible rows according to the scan qualifiers
21. Data Flow in analytic queries (2/3) – Advanced
[Diagram: GPU, SSD, and CPU + RAM on PCI-E]
▌SSD-to-GPU Direct moves the table from the SSD to the GPU
▌GPU code generated from SQL on the fly
▌Remove invisible rows and unreferenced columns according to the scan qualifiers and projection
▌Only visible rows and referenced columns are moved to CPU+RAM
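A rough worked example shows how much data would still reach CPU+RAM under each flow. All numbers are hypothetical, and the model is deliberately crude: it assumes equally sized rows and equally wide columns, and it ignores tuple headers.

```python
def gb_moved_to_ram(table_gb, visible_row_fraction, referenced_col_fraction):
    """Estimate the volume (in GB) that reaches CPU+RAM for each data flow."""
    return {
        # Conventional: the whole table is loaded before any filtering.
        "conventional": table_gb,
        # Basic: the GPU drops invisible rows first.
        "basic": table_gb * visible_row_fraction,
        # Advanced: the GPU also drops unreferenced columns.
        "advanced": table_gb * visible_row_fraction * referenced_col_fraction,
    }


# Hypothetical workload: 64 GB table, 25% of rows visible, half the columns referenced.
volumes = gb_moved_to_ram(table_gb=64, visible_row_fraction=0.25, referenced_col_fraction=0.5)
for flow, gb in volumes.items():
    print(f"{flow:>12}: {gb:g} GB")
```

Under these assumptions, the saving grows with the selectivity of the scan qualifiers and with the share of columns the query never touches.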
22. Data Flow in analytic queries (3/3) – Ultimate
[Diagram: GPU, SSD, and CPU + RAM on PCI-E; Table]
▌GPU code generated from SQL on the fly
▌Inner relations (JOIN target)
23. Data Flow in analytic queries (3/3) – Ultimate
[Diagram: GPU, SSD, and CPU + RAM on PCI-E]
▌SSD-to-GPU Direct moves the table from the SSD to the GPU
▌GPU code generated from SQL on the fly
▌Inner relations (JOIN target)
24. Data Flow in analytic queries (3/3) – Ultimate
[Diagram: GPU, SSD, and CPU + RAM on PCI-E]
▌SSD-to-GPU Direct moves the table from the SSD to the GPU
▌GPU code generated from SQL on the fly
▌Inner relations (JOIN target)
25. Data Flow in analytic queries (3/3) – Ultimate
[Diagram: GPU, SSD, and CPU + RAM on PCI-E]
▌SSD-to-GPU Direct moves the table from the SSD to the GPU
▌GPU code generated from SQL on the fly
▌Inner relations (JOIN target)
▌Generate joined tuples on the GPU side
▌Tuples are already joined by the time the data has been read from the storage
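The "ultimate" flow can be simulated on the CPU as a sketch: each block read from storage is filtered, projected, and joined against the inner relation in a single pass, and only joined tuples are handed on. The function `process_block` stands in for the generated GPU kernel. The block layout, names, and sample data are assumptions for illustration, not PG-Strom's design.

```python
def build_hash_table(inner, key):
    """Index the inner relations (JOIN target) once, before the scan starts."""
    table = {}
    for row in inner:
        table.setdefault(row[key], []).append(row)
    return table


def process_block(block, inner_hash, outer_key, predicate, columns):
    """Stand-in for the GPU kernel: filter, project, and join one block."""
    out = []
    for row in block:
        if not predicate(row):
            continue                      # invisible row never leaves the "GPU"
        slim = {c: row[c] for c in columns}
        for match in inner_hash.get(row[outer_key], []):
            out.append({**slim, **match})
    return out


def scan_with_direct_join(blocks, inner, outer_key, inner_key, predicate, columns):
    """Yield joined tuples block by block, as they would arrive in CPU+RAM."""
    inner_hash = build_hash_table(inner, inner_key)
    for block in blocks:                  # each block models one SSD-to-GPU transfer
        yield from process_block(block, inner_hash, outer_key, predicate, columns)


# Hypothetical sample data: two storage blocks and one inner relation.
blocks = [
    [{"lid": 1, "x": 5, "pad": 0}, {"lid": 2, "x": 50, "pad": 0}],
    [{"lid": 3, "x": 70, "pad": 0}, {"lid": 4, "x": 90, "pad": 0}],
]
inner = [{"rid": 2, "cat": "a"}, {"rid": 3, "cat": "b"}]

joined = list(scan_with_direct_join(
    blocks, inner, "lid", "rid", lambda r: r["x"] > 10, ["lid", "x"]))
print(joined)
```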
26. Primitive Technologies
▌NVIDIA GPUDirect enhancement on NVMe device driver
Interaction between the NVMe and NVIDIA drivers is needed
▌Usage statistics of shared_buffers per relation
To avoid SSD-to-GPU Direct on relations that are already preloaded
▌Add a new access mode to shared_buffers
Nobody may make a buffer dirty while an SSD-to-GPU Direct transfer is in progress
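The per-relation usage statistics could feed a simple policy like the one sketched below. The function name, its inputs, and the 0.5 threshold are all hypothetical. The slides only state that such statistics are needed, not how they would be used.

```python
def should_use_direct(cached_blocks, total_blocks, threshold=0.5):
    """Hypothetical policy: go SSD-to-GPU Direct only if little of the relation
    already sits in shared_buffers (cached fraction below the threshold)."""
    if total_blocks <= 0:
        return False                      # empty relation: nothing to transfer
    return cached_blocks / total_blocks < threshold


print(should_use_direct(cached_blocks=10, total_blocks=1000))    # mostly on SSD
print(should_use_direct(cached_blocks=900, total_blocks=1000))   # mostly preloaded
```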
We welcome all developers who would like to join the PG-Strom project