1. Performing SQL with SSD-to-GPU P2P Transfer
かぴばらの旦那 / Herr.Wasserschwein
<kaigai@kaigai.gr.jp>
2. The PG-Strom Project
Feedback from the PG-Strom v1.0 development
[Diagram: PG-Strom is a PostgreSQL extension that hooks the query optimizer and query executor, sitting between the application, the SQL parser, the storage manager, and the GPU.]
• Computing-intensive workloads
  (statistics, science, marketing, etc.)
  → addressed by PL/CUDA + Matrix-Array
• I/O-intensive workloads
  (DWH, ETL, reporting, etc.)
  → addressed by SSD-to-GPU P2P DMA
3. The PG-Strom Project
Architecture of an x86 server
[Diagram: RAM attaches directly to the CPU; the NVMe-SSD (PCIe x4~x8) and the GPU (PCIe x16) sit on the PCI bus; other slow devices hang off the PCH.]
4. The PG-Strom Project
Architecture of an x86 server
[Diagram: a normal I/O READ first copies a disk block from the NVMe-SSD into a disk buffer in RAM. Catalog spec of such SSDs: 2GB~6GB/s.]
5. The PG-Strom Project
Architecture of an x86 server
[Diagram: the data then travels from the disk buffer in RAM back across the PCI bus to the GPU, so the same block crosses the PCIe bus twice.]
6. The PG-Strom Project
What I want to do
[Diagram: load the disk block from the NVMe-SSD directly into the GPU over the PCI bus, and return only the result buffer to RAM, bypassing the disk buffer entirely.]
7. The PG-Strom Project
What I want to do
[Diagram: large PostgreSQL tables stream from the NVMe-SSD straight into the GPU, where the WHERE clause, JOIN (against small inner tables) and GROUP BY run, making the data size much smaller before it ever reaches RAM.]
8. The PG-Strom Project
What I want to do
[Same diagram: the GPU-side WHERE clause, JOIN and GROUP BY path described above is already implemented as of this talk.]
9. The PG-Strom Project
Element technology: GPUDirect RDMA by NVIDIA
An API to map GPU device memory into the physical address space of the host system.
This allows GPU device memory to be specified as the destination address of a DMA transfer from storage.
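As a rough illustration, a kernel module can pin a GPU buffer and obtain its physical pages through this API roughly as sketched below. This is a minimal sketch based on NVIDIA's public nv-p2p.h kernel interface, not the actual NVMe-Strom source; the function and callback names (map_gpu_buffer, my_free_callback) are made up for illustration.

  #include <nv-p2p.h>   /* GPUDirect RDMA kernel interface, ships with the nvidia driver */

  /* Called back by the nvidia driver if the GPU mapping is revoked;
   * a real driver must stop in-flight DMA and release the page table here. */
  static void my_free_callback(void *data)
  {
  }

  /* Pin a range of GPU device memory and obtain its physical pages. */
  static int map_gpu_buffer(uint64_t gpu_vaddr, uint64_t length,
                            struct nvidia_p2p_page_table **p2p_pgtbl)
  {
      /* p2p_token and va_space are legacy parameters; 0 is the usual value */
      int rc = nvidia_p2p_get_pages(0, 0, gpu_vaddr, length,
                                    p2p_pgtbl, my_free_callback, NULL);
      if (rc)
          return rc;
      /*
       * (*p2p_pgtbl)->pages[i]->physical_address now holds host physical
       * addresses of the GPU memory, usable as DMA destinations for the
       * NVMe SSD's read commands.
       */
      return 0;
  }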
10. The PG-Strom Project
NVMe-Strom Driver
[Diagram: the baseline software stack. In user space, PostgreSQL with pg-strom; in kernel space, the NVMe-Strom module exposes /proc/nvme-strom, alongside the VFS, the page cache, the NVMe SSD driver, and the nvidia driver. The normal path loads data with read(2) through the VFS and page cache.]
11. The PG-Strom Project
NVMe-Strom Driver
[Diagram, step 1: pg-strom allocates GPU device memory with cuMemAlloc(), backed by the nvidia driver.]
12. The PG-Strom Project
NVMe-Strom Driver
[Diagram, step 2: pg-strom maps that GPU device memory via ioctl(2) on /proc/nvme-strom, making it visible to NVMe-Strom in kernel space.]
13. The PG-Strom Project
NVMe-Strom Driver
[Diagram, step 3: pg-strom supplies a file offset; NVMe-Strom translates it through the VFS into the underlying block number on the SSD.]
14. The PG-Strom Project
NVMe-Strom Driver
[Diagram, step 4: NVMe-Strom issues a DMA request for those blocks to the NVMe SSD driver.]
15. The PG-Strom Project
NVMe-Strom Driver
[Diagram, step 5: the NVMe SSD executes the SSD-to-GPU peer-to-peer DMA, writing the blocks directly into the mapped GPU device memory.]
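Put together, the user-space side of the five steps above could be driven by a call sequence like the sketch below. The ioctl command numbers and the strom_dma_request layout are hypothetical stand-ins, not the real NVMe-Strom ABI; only open(2), ioctl(2) and the CUDA driver API are real interfaces here, and real code would check every return value.

  #include <cuda.h>
  #include <fcntl.h>
  #include <sys/ioctl.h>

  struct strom_dma_request {          /* hypothetical request layout */
      unsigned long gpu_mem_handle;   /* mapped GPU buffer            */
      int           file_desc;        /* data file on the NVMe-SSD    */
      long          file_offset;      /* where to start reading       */
      long          length;           /* number of bytes to transfer  */
  };

  /* hypothetical ioctl commands; the real ABI is defined by nvme-strom */
  #define STROM_IOCTL_MAP_GPU_MEMORY  _IOWR('S', 1, unsigned long)
  #define STROM_IOCTL_MEMCPY_SSD2GPU  _IOW('S', 2, struct strom_dma_request)

  /* Assumes cuInit() was called and a CUDA context is current. */
  static void ssd2gpu_read(int data_fd, long offset, long length)
  {
      int strom_fd = open("/proc/nvme-strom", O_RDONLY);
      CUdeviceptr dbuf;

      /* step 1: allocate GPU device memory */
      cuMemAlloc(&dbuf, length);

      /* step 2: let NVMe-Strom map the GPU buffer into kernel space */
      unsigned long handle = (unsigned long)dbuf;
      ioctl(strom_fd, STROM_IOCTL_MAP_GPU_MEMORY, &handle);

      /* steps 3-5: the driver translates the file offset to block numbers,
       * issues the DMA request, and the SSD writes directly to the GPU */
      struct strom_dma_request req = {
          .gpu_mem_handle = handle,
          .file_desc      = data_fd,
          .file_offset    = offset,
          .length         = length,
      };
      ioctl(strom_fd, STROM_IOCTL_MEMCPY_SSD2GPU, &req);
  }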
16. The PG-Strom Project
Raw I/O Performance
Six 32MB buffers were used; an asynchronous DMA was kicked each time a buffer became empty.
Environment:
  CPU: Xeon E5-2670 v3, RAM: 64GB
  Intel SSD 750 (400GB; PCIe x4)
  NVIDIA Tesla K20c (2496 cores; 706MHz, 5GB GDDR5; 208GB/s)
  OS: CentOS 7 (3.10.0-327.18.2.el7.x86_64), Filesystem: Ext4
[Chart annotation: the measured throughput reaches the SSD's catalog spec.]
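The buffering strategy is a plain multi-buffer pipeline. The self-contained sketch below shows only the scheduling idea; the stub functions kick_ssd2gpu_dma() and dma_done() are assumptions standing in for the NVMe-Strom ioctl(2) calls, and the file size is assumed to be a multiple of the buffer size.

  #include <stdio.h>

  #define NBUFFERS 6                 /* six buffers ...      */
  #define BUFSZ    (32L << 20)       /* ... of 32MB each     */

  /* stubs standing in for the real NVMe-Strom DMA kick and completion check */
  static void kick_ssd2gpu_dma(int buf, long offset)
  {
      printf("buffer %d: async DMA kicked at offset %ld\n", buf, offset);
  }
  static int dma_done(int buf) { return 1; }  /* pretend instant completion */

  int main(void)
  {
      long file_size = 16 * BUFSZ;   /* demo: 512MB, a multiple of BUFSZ */
      long offset = 0, done = 0;
      int  inflight[NBUFFERS] = {0};

      /* fill the pipeline: one in-flight DMA per buffer */
      for (int i = 0; i < NBUFFERS && offset < file_size; i++) {
          kick_ssd2gpu_dma(i, offset);
          inflight[i] = 1;
          offset += BUFSZ;
      }
      /* whenever a buffer becomes empty, immediately kick the next DMA */
      while (done < file_size) {
          for (int i = 0; i < NBUFFERS; i++) {
              if (!inflight[i] || !dma_done(i))
                  continue;
              inflight[i] = 0;
              done += BUFSZ;
              if (offset < file_size) {
                  kick_ssd2gpu_dma(i, offset);
                  inflight[i] = 1;
                  offset += BUFSZ;
              }
          }
      }
      return 0;
  }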
17. The PG-Strom Project
NVMe-SSD used for this measurement
  Capacity | Seq Read 128KB | Seq Write 128KB | Random Read 4KB | Random Write 4KB | Interface
  ---------+----------------+-----------------+-----------------+------------------+------------
  400GB    | 2,200MB/s      | 900MB/s         | 430,000 IOPS    | 230,000 IOPS     | PCIe 3.0 x4
  800GB    | 2,100MB/s      | 800MB/s         | 420,000 IOPS    | 210,000 IOPS     | PCIe 3.0 x4
  1.2TB    | 2,500MB/s      | 1,200MB/s       | 460,000 IOPS    | 290,000 IOPS     | PCIe 3.0 x4
Besides these, operation on a Samsung PM1725 NVMe SSD (1.6TB, 6GB/s) has been reported,
with raw SSD-to-GPU I/O reaching 5634MB/s:
https://github.com/kaigai/nvme-kmod/issues/1
18. The PG-Strom Project
SQL Scan Performance
▌What this measurement tells us
Performance limit of the existing storage layer:
  64GB / 140sec = 468MB/s, i.e. about 20% extra cost on top of the raw-I/O throughput (587MB/s)
Improvement by NVMe-Strom:
  64GB / 43sec = 1524MB/s
19. The PG-Strom Project
Queries used for the measurement
CREATE TABLE t_64g (id   int not null,
                    x    float not null,
                    y    float not null,
                    z    float not null,
                    memo text);
INSERT INTO t_64g (SELECT x, random()*1000, random()*1000,
                          random()*1000, md5(x::text)
                     FROM generate_series(1,700000000) x);

postgres=# \d+
                     List of relations
 Schema | Name  | Type  | Owner  |  Size  | Description
--------+-------+-------+--------+--------+-------------
 public | t     | table | kaigai | 965 MB |
 public | t_64g | table | kaigai | 66 GB  |

Query-1) Scan query with a simple WHERE-clause
  SELECT * FROM t WHERE x BETWEEN y-2 AND y+2;
Query-2) Scan query with a complicated WHERE-clause
  SELECT * FROM t_64g WHERE sqrt((x-200)^2 + (y-300)^2 +
                                 (z-400)^2) < 10;
Query-3) Scan query with text matching
  SELECT * FROM t WHERE memo LIKE '%abcd%';
20. The PG-Strom Project
Development Roadmap
① NVMe-Strom driver: the basic functionality
  • Host mapping of GPU device memory, and issuing P2P DMA requests for SSD-to-GPU transfer
② PG-Strom: integration of GpuScan + NVMe-Strom
  • Support of CPU+GPU hybrid parallelism in PostgreSQL v9.6, and peer-to-peer data loading by NVMe-Strom
③ PG-Strom: JOIN/GROUP BY support
  • Support of the new optimizer in PostgreSQL v9.6, and integration with GpuScan for simple scans
  (We are here!!)
④ NVMe-Strom driver: RAID-0/1 support
  • Striping READ on RAID-0/1 volumes
⑤ Quality improvement and stabilization
  • Test, test, test, debug
⑥ PG-Strom v2.0!! (2017/2Q~3Q)
21. The PG-Strom Project
Target for PG-Strom v2.0
Aiming at up to 20GB/s data processing capability on a single node.
[Diagram: two pairs of dual NVMe-SSDs (PCIe x8, 5.0GB/s per pair) feed two GPUs (PCIe x16, ~10GB/s each).]
• Dual NVMe-SSD + RAID0/1 support
• Loading SSD blocks onto the GPU at 10GB/s throughput
• GPU parallel processing by thousands of cores