
pgconfasia2016 lt ssd2gpu


Slides for the lightning talk (LT) at PGconf.ASIA 2016: SSD-to-GPU P2P Transfer



  1. Performing SQL with SSD-to-GPU P2P Transfer. Speaker: かぴばらの旦那 / Herr.Wasserschwein <kaigai@kaigai.gr.jp>
  2. Feedback from the PG-Strom v1.0 development. [Diagram: the PG-Strom extension hooks into PostgreSQL between the SQL parser, query optimizer, query executor and storage manager, and offloads work to the GPU.] Two workload classes are targeted: computing-intensive workloads (statistics, scientific computing, marketing, ...) served by PL/CUDA + Matrix-Array, and I/O-intensive workloads (DWH, ETL, reporting, ...) served by SSD-to-GPU P2P DMA.
  3. Architecture of an x86 server. [Diagram: RAM attached to the CPU; on the PCI bus, an NVMe-SSD on PCIe x4~x8 and a GPU on PCIe x16; other slow devices hang off the PCH.]
  4. Architecture of an x86 server (cont.). [Diagram: a conventional I/O READ copies disk blocks from the NVMe-SSD into a disk buffer in RAM; the catalog spec of such SSDs is 2GB~6GB/s.]
  5. Architecture of an x86 server (cont.). [Same diagram, with the disk buffer in RAM highlighted.]
  6. What I want to do. [Diagram: transfer disk blocks from the NVMe-SSD directly to the GPU over the PCI bus, and place only the result buffer in RAM.]
  7. What I want to do. [Diagram: WHERE clauses, JOINs against small inner tables, and GROUP BY run on the GPU while large PostgreSQL tables stream in from the SSD, making the data size much smaller before it reaches RAM.]
  8. What I want to do. [Same diagram; an annotation marks the part completed so far.]
  9. Element technology: GPUDirect RDMA by NVIDIA. An API to map GPU device memory into the physical address space of the host system, so that GPU device memory can be specified as the destination address of a DMA from storage. (A minimal kernel-side sketch follows below.)
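As an illustration, here is a minimal sketch of the kernel-side mapping, using the documented GPUDirect RDMA entry points from NVIDIA's nv-p2p.h. The map_gpu_memory() helper, the free callback, and the single global page table are hypothetical simplifications, not NVMe-Strom's actual code, and error handling is omitted:

    /* Sketch of kernel-side GPU memory mapping via GPUDirect RDMA. */
    #include <nv-p2p.h>

    static struct nvidia_p2p_page_table *page_table;

    static void p2p_free_callback(void *data)
    {
        /* invoked by the nvidia driver if the mapping is revoked */
        nvidia_p2p_free_page_table(page_table);
    }

    int map_gpu_memory(uint64_t gpu_vaddr, uint64_t length)
    {
        /* gpu_vaddr/length must be 64KB-aligned; on success, page_table
         * holds the physical (bus) addresses of the GPU device memory,
         * which can be handed to the NVMe driver as DMA destinations */
        return nvidia_p2p_get_pages(0, 0, gpu_vaddr, length,
                                    &page_table,
                                    p2p_free_callback, NULL);
    }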
  10. NVMe-Strom driver. [Diagram: in user space, PostgreSQL / PG-Strom; in kernel space, the NVMe-Strom module exposed via /proc/nvme-strom, alongside the VFS, page cache, NVMe SSD driver and nvidia driver; ordinary read(2) calls go through the VFS and page cache.]
  11. NVMe-Strom driver, step 1: PG-Strom allocates GPU device memory with cuMemAlloc().
  12. NVMe-Strom driver, step 2: PG-Strom maps that GPU device memory for peer-to-peer DMA via ioctl(2) on /proc/nvme-strom.
  13. NVMe-Strom driver, step 3: PG-Strom supplies a file offset, which NVMe-Strom translates into device block numbers through the VFS.
  14. NVMe-Strom driver, step 4: NVMe-Strom issues a DMA request for those blocks to the NVMe SSD driver.
  15. NVMe-Strom driver, step 5: the SSD performs the SSD-to-GPU peer-to-peer DMA, writing the blocks directly into the mapped GPU device memory and bypassing the page cache. (A user-space sketch of the whole sequence follows below.)
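Putting steps 1 through 5 together, the user-space side could look roughly like the sketch below. The CUDA driver API calls are real; the ioctl command numbers and argument structs (strom_map_arg, strom_read_arg, STROM_IOCTL_*) and the file path are hypothetical placeholders rather than the actual nvme-kmod interface:

    #include <cuda.h>
    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <sys/types.h>

    /* hypothetical ioctl interface to /proc/nvme-strom */
    struct strom_map_arg  { unsigned long long gpu_vaddr; size_t length;
                            unsigned long handle; /* set by the driver */ };
    struct strom_read_arg { unsigned long handle; int file_desc;
                            off_t file_offset; size_t length; };
    #define STROM_IOCTL_MAP_GPU_MEMORY  _IOWR('S', 1, struct strom_map_arg)
    #define STROM_IOCTL_MEMCPY_SSD2GPU  _IOWR('S', 2, struct strom_read_arg)

    int main(void)
    {
        CUdevice    dev;
        CUcontext   ctx;
        CUdeviceptr dptr;

        cuInit(0);
        cuDeviceGet(&dev, 0);
        cuCtxCreate(&ctx, 0, dev);
        cuMemAlloc(&dptr, 32UL << 20);      /* step 1: 32MB of GPU memory */

        int strom_fd = open("/proc/nvme-strom", O_RDONLY);
        int data_fd  = open("/path/to/table/file", O_RDONLY);

        struct strom_map_arg map = { .gpu_vaddr = dptr,
                                     .length = 32UL << 20 };
        ioctl(strom_fd, STROM_IOCTL_MAP_GPU_MEMORY, &map);  /* step 2 */

        struct strom_read_arg req = { .handle = map.handle,
                                      .file_desc = data_fd,
                                      .file_offset = 0,
                                      .length = 32UL << 20 };
        /* steps 3-5: the driver resolves the file offset to block
         * numbers and kicks the SSD-to-GPU peer-to-peer DMA */
        ioctl(strom_fd, STROM_IOCTL_MEMCPY_SSD2GPU, &req);
        return 0;
    }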
  16. Raw I/O performance. Six 32MB buffers were used, and an asynchronous DMA was kicked whenever a buffer became empty. [Chart: the measured SSD-to-GPU throughput reaches the catalog spec of the SSD!] Environment: CPU: Xeon E5-2670 v3; RAM: 64GB; SSD: Intel SSD 750 (400GB; PCI-E x4); GPU: NVIDIA Tesla K20c (2496 cores; 706MHz; 5GB GDDR5; 208GB/s); OS: CentOS 7 (3.10.0-327.18.2.el7.x86_64); Filesystem: Ext4. (A sketch of the buffering loop follows below.)
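The 6-buffer scheme might be sketched as below, reusing the hypothetical strom_read_arg / STROM_IOCTL_MEMCPY_SSD2GPU interface from the previous sketch; a real implementation would wait for a DMA-completion notification before reusing a buffer, rather than issuing requests back-to-back:

    #define NBUFS 6
    #define BUFSZ (32UL << 20)            /* 32MB per buffer */

    static void load_file(int strom_fd, int data_fd,
                          unsigned long handle[NBUFS], size_t file_size)
    {
        size_t offset = 0;
        int    next = 0;

        while (offset < file_size) {
            struct strom_read_arg req = {
                .handle      = handle[next],    /* next empty buffer */
                .file_desc   = data_fd,
                .file_offset = offset,
                .length      = BUFSZ,
            };
            /* kick the async DMA as soon as a buffer is empty, so the
             * SSD stays busy while the GPU consumes the other buffers */
            ioctl(strom_fd, STROM_IOCTL_MEMCPY_SSD2GPU, &req);
            offset += BUFSZ;
            next = (next + 1) % NBUFS;
        }
    }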
  17. NVMe-SSD used for this measurement:

     Capacity | Seq Read (128KB) | Seq Write (128KB) | Random Read (4KB) | Random Write (4KB) | Interface
     400GB    | 2,200MB/s        | 900MB/s           | 430,000 IOPS      | 230,000 IOPS       | PCIe 3.0 x4
     800GB    | 2,100MB/s        | 800MB/s           | 420,000 IOPS      | 210,000 IOPS       | PCIe 3.0 x4
     1.2TB    | 2,500MB/s        | 1,200MB/s         | 460,000 IOPS      | 290,000 IOPS       | PCIe 3.0 x4

     Besides these, working operation on a Samsung PM1725 NVMe SSD (1.6TB, 6GB/s) was reported, with raw-I/O SSD-to-GPU reaching 5,634MB/s: https://github.com/kaigai/nvme-kmod/issues/1
  18. SQL scan performance. [Chart: scan throughput, with the limit of the existing storage layer marked.] What this measurement tells us: the performance limit of the existing storage layer is 64GB / 140sec = 468MB/s, about a 20% extra cost on top of the raw-I/O throughput (587MB/s); improvement by NVMe-Strom: throughput of 64GB / 43sec = 1524MB/s. (The arithmetic is worked out below.)
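For reference, the arithmetic behind these figures, taking 1GB = 1024MB and the 587MB/s raw-I/O throughput quoted on the slide:

\[
\frac{64 \times 1024\ \mathrm{MB}}{140\ \mathrm{s}} \approx 468\ \mathrm{MB/s},
\qquad 1 - \frac{468}{587} \approx 0.20,
\qquad \frac{64 \times 1024\ \mathrm{MB}}{43\ \mathrm{s}} \approx 1524\ \mathrm{MB/s}.
\]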
  19. Queries used for the measurement:

     CREATE TABLE t_64g (id int not null,
                         x float not null,
                         y float not null,
                         z float not null,
                         memo text);
     INSERT INTO t_64g (SELECT x, random()*1000, random()*1000, random()*1000,
                               md5(x::text)
                          FROM generate_series(1,700000000) x);

     postgres=# \d+
                         List of relations
      Schema | Name  | Type  | Owner  |  Size  | Description
     --------+-------+-------+--------+--------+-------------
      public | t     | table | kaigai | 965 MB |
      public | t_64g | table | kaigai | 66 GB  |

     Query-1) Scan query with a simple WHERE-clause:
       SELECT * FROM t WHERE x BETWEEN y-2 AND y+2;

     Query-2) Scan query with a complicated WHERE-clause:
       SELECT * FROM t_64g WHERE sqrt((x-200)^2 + (y-300)^2 + (z-400)^2) < 10;

     Query-3) Scan query with text matching:
       SELECT * FROM t WHERE memo LIKE '%abcd%';
  20. Development roadmap:
     ① NVMe-Strom driver: the basic functionality. Host mapping of GPU device memory, and P2P DMA requests for SSD-to-GPU transfer.
     ② PG-Strom: integration of GpuScan + NVMe-Strom. Support of CPU+GPU hybrid parallelism in PostgreSQL v9.6, and peer-to-peer data loading by NVMe-Strom.
     ③ PG-Strom: JOIN/GROUP BY support. Support of the new optimizer in PostgreSQL v9.6, and integration with GpuScan for simple scans.
     ④ NVMe-Strom driver: RAID-0/1 support. Striping READ on RAID-0/1 volumes.
     ⑤ Quality improvement and stabilization. Test, test, test, debug.
     ⑥ PG-Strom v2.0!! (2017/2Q~3Q)
     (Slide marker: "We are here!!")
  21. Target for PG-Strom v2.0: towards 20GB/s data-processing capability on a single node. [Diagram: two GPUs, each on PCI-E x16 (~10GB/s), each fed by a dual NVMe-SSD pair on PCI-E x8 (5.0GB/s each).] Dual NVMe-SSD + RAID-0/1 support; loading SSD blocks to the GPU at 10GB/s throughput; GPU parallel processing by thousands of cores.
  22. Stay tuned, don't miss it!
