Performance Limits of Computing Systems and How to Think About Them
2018/11/29 SAKURA Internet Inc. / SAKURA Internet Research Center, Senior Researcher / Naoto Matsumoto
(C) Copyright 1996-2017 SAKURA Internet Inc
From the next-generation database research report
Data processing flow in servers/storage across a storage network
2
On each host the path runs Application → OS (kernel) → CPU/DRAM (data processing) → PCI Express 3.0 → 40/100Gbit/s NIC; the request crosses the storage network and continues on the storage side through the NIC → PCI Express 3.0 → CPU/DRAM → OS (kernel) → HDD/SSD, and the data then returns along the same chain.
Flow: data reference (request) → data provision → result output.
Because the processing chain is so long and drawn out, it is often likened to a plesiosaur (a "long-necked" shape).
Server-to-server data communication always involves this long chain of intermediate processing (a basic technical point to keep in mind).
Comparison of data size and processing performance per unit time (one second)
3
Normalizing to a common unit (ops/byte):
- 40Gbit/s Ethernet (DPDK): 47M pps at 64 Bytes* → roughly 732 kpps/byte (CPU)
- 40Gbit/s Ethernet (line rate): 3M pps at 1500 Bytes* → roughly 2 kpps/byte (NIC)
- fio RAMDISK (DDR4): 19M IOPS at 128 Bytes*** → roughly 148 kIOPS/byte (CPU)
- redis GET (localhost/DRAM): 2M rps at 2 Bytes** → roughly 1 Mrps/byte (CPU)
- NVMe SSD (U.2): 1-3M IOPS at 4 KBytes*** → roughly 750 IOPS/byte (CPU)
- cupy.sort (GPU GDDR5): 214 Mops on uint8***** → roughly 214 Mops/byte (GPU)
- Apache Ignite (CPU): 250 Kops****
When a workload needs a large amount of high-speed computation, a configuration that tightly couples fast memory with the compute units is the better choice.
SOURCE: Linux 40GbE DPDK Performance / High Speed Packet Processing with Terminator 5 / Chelsio Communications Inc. (2015)*,
redis-benchmark with AMD RYZEN 1800X / Intel Kaby Lake (i7-7700K) memo [GET rates] / SAKURA Internet Research Center (2017/05)**,
SAKURA Internet Research Center lab test results (2017)***, Apache Ignite on Intel Core i7 (4.5GHz)****,
R = randint(0,100,600000000); a = cp.array(R, dtype=np.uint8) 2.27 sec; cp.sort(a) 0.54 sec / SAKURA Internet Research Center (2018/05)*****
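The normalization is simple division: operations per second divided by the payload size each operation touches. A minimal sketch (not from the original slides; the NVMe row uses the 3M IOPS upper bound, so the rounding differs slightly from the chart):

# Reproduce the "ops per byte" normalization used on this slide.
measurements = {
    # name: (operations per second, payload size in bytes)
    "40GbE DPDK":         (47e6, 64),
    "40GbE line rate":    (3e6, 1500),
    "fio RAMDISK (DDR4)": (19e6, 128),
    "redis GET (DRAM)":   (2e6, 2),
    "NVMe SSD (U.2)":     (3e6, 4096),
    "cupy.sort (GPU)":    (214e6, 1),   # uint8, one byte per element
}

for name, (ops_per_sec, payload_bytes) in measurements.items():
    print(f"{name:>20}: {ops_per_sec / payload_bytes:12,.0f} ops/byte")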
Problems with conventional information-sharing systems and open issues in the next-generation database area
4
A typical data-processing lifecycle (example): an unspecified, large population of read-only users (about 80%) and a smaller population of posting users (about 20%) pass through request-distribution processing, then a cache / service (API) layer, and finally the database/storage layer, which handles permanent data storage, consistency checks, and archive processing. This layering is complex and costly.
Today the problem is solved by brute force and sheer quantities of hardware.
Optimizing cache efficiency for read traffic and improving request distribution remain open issues going forward.
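To see why read-cache efficiency dominates, a toy calculation (only the 80%/20% read/write split comes from the slide; the request rate and cache hit rates are assumptions):

# Toy back-of-the-envelope: load that reaches the database/storage layer.
total_rps = 100_000                        # assumed total requests per second
reads, writes = 0.8 * total_rps, 0.2 * total_rps
for hit_rate in (0.90, 0.99, 0.999):       # assumed cache hit rates
    backend_rps = reads * (1 - hit_rate) + writes   # writes always reach the backend
    print(f"cache hit rate {hit_rate:.1%}: {backend_rps:,.0f} requests/s hit the database")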
Appendix/memo
CPU/GPU
5
How to measure your dataflow using Apache Ignite
6
Apache Ignite benchmark (operations/sec) compared across Intel Core i7 (4.5GHz), AMD Threadripper (3.8GHz), and AMD EPYC (2.1GHz).
For processes that are effectively single-threaded, an environment with a high CPU clock speed performs best.
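The benchmark tool itself is not shown on the slide; as a rough stand-in, a minimal single-threaded put/get loop using the pyignite thin client (assumes an Ignite node listening on 127.0.0.1:10800 and the pyignite package):

# Single-threaded put/get micro-benchmark against Apache Ignite.
import time
from pyignite import Client

client = Client()
client.connect("127.0.0.1", 10800)
cache = client.get_or_create_cache("bench")

n = 100_000
start = time.time()
for i in range(n):
    cache.put(i, i)
    cache.get(i)
print(f"{2 * n / (time.time() - start):,.0f} ops/sec (single thread)")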
How to measure your dataflow using cupy & numpy (NVIDIA GPU)
7
SOURCE: SAKURA Internet Research Center. (04/2018) Project Sprig.
import time
import cupy as cp
import numpy as np
from numpy.random import *
start = time.time()
R = randint(0,100,600000000)
end = time.time()
print ( end - start )
start = time.time()
a = np.array(R, dtype=np.uint8)
end = time.time()
print ( end - start )
start = time.time()
np.sort(a)
end = time.time()
print ( end - start )
import time
import cupy as cp
import numpy as np
from numpy.random import *
start = time.time()
R = randint(0,100,600000000)
end = time.time()
print ( end - start )
start = time.time()
a = cp.array(R, dtype=cp.uint8)
end = time.time()
print ( end - start )
start = time.time()
cp.sort(a)
end = time.time()
print ( end - start )
Performance comparison: numpy (CPU) vs cupy (GPU)
# apt install python-pip
# pip install --upgrade pip
# pip install --upgrade setuptools
# pip install numpy cupy    # (time is part of the Python standard library, no install needed)
# python
Results for 600,000,000 random integers (seconds, lower is better):
                                   numpy (CPU)   cupy (GPU)
R = randint(0,100,600000000)       5.36 sec      5.36 sec
a = np.array(R, dtype=np.uint8)    0.46 sec      -
a = cp.array(R, dtype=np.uint8)    -             2.27 sec
np.sort(a)                         15.1 sec      -
cp.sort(a)                         -             0.54 sec
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1050 Off | 00000000:65:00.0 Off | N/A |
| 29% 27C P8 N/A / 65W | 1205MiB / 1997MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
Time (Lower is better)
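One caveat worth noting (not on the original slide): CuPy launches GPU kernels asynchronously, so wall-clock timing can under-report unless the device is synchronized before reading the clock. A minimal sketch of such a timing helper (the helper name is illustrative):

import time
import cupy as cp
import numpy as np

def timed(label, fn):
    # Run fn() and wait for all queued GPU work before stopping the clock.
    start = time.time()
    out = fn()
    cp.cuda.Stream.null.synchronize()
    print(label, round(time.time() - start, 2), "sec")
    return out

R = np.random.randint(0, 100, 600000000)
a = timed("cp.array", lambda: cp.array(R, dtype=cp.uint8))
timed("cp.sort", lambda: cp.sort(a))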
ROCm with dGPU(AMD GPU) using pyopencl
8
# uname -sr; cat /etc/lsb-release
Linux 4.4.0-116-generic
DISTRIB_DESCRIPTION="Ubuntu 16.04.4 LTS" ( ROCm does not support 17.10)
# lscpu
Model name: Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz
# lspci | grep VGA
65:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 67ef (rev cf)
ROCm Platform Supports Two Graphics Core Next (GCN) GPU Generations
GFX8: Radeon RX 480,Radeon RX 470,Radeon RX 460,R9 Nano,Radeon R9 Fury,Radeon R9 Fury X
Radeon Pro WX7100, FirePro S9300 x2, Radeon Vega Frontier Edition, Radeon Instinct: MI6, MI8, and MI25
(https://rocm.github.io/hardware.html)
# apt update
# apt dist-upgrade -y
# apt-get install -y libnuma-dev
# wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo apt-key add -
# sh -c 'echo deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main > /etc/apt/sources.list.d/rocm.list'
# apt update
# apt-get install -y rocm-dkms
# ln -s /opt/rocm/opencl/lib/x86_64/libOpenCL.so.1 /usr/lib/libOpenCL.so
# usermod -a -G video $LOGNAME
# sync; sync; sync; reboot
# /opt/rocm/opencl/bin/x86_64/clinfo
Platform Version: OpenCL 2.1 AMD-APP.internal (2576.0)
Platform Name: AMD Accelerated Parallel Processing
# apt install python-pip opencl-headers -y
# pip install --upgrade pip
# pip install --upgrade setuptools
# pip install pyopencl
Successfully installed pyopencl-2018.1.1
>>> import numpy as np
>>> import pyopencl as cl
>>> from pyopencl import array as clarray
>>> from pyopencl import algorithm as clalg
>>> ctx = cl.create_some_context(0)
>>> queue = cl.CommandQueue(ctx)
>>> R = np.random.randint(0, 99, 100000000).astype(np.int8)
>>> a = clarray.to_device(queue, R)
>>> b = clalg.copy_if(a, 'ary[i] >= 55')
>>> print b
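A follow-up note (not on the original slide): in current pyopencl, algorithm.copy_if returns an (output array, count, event) tuple, so the selected elements can be read back on the host like this:
>>> out, count, _ = clalg.copy_if(a, 'ary[i] >= 55')
>>> n = int(count.get())       # number of elements that satisfied the predicate
>>> print(out.get()[:n])       # copy back to the host, keep only the valid prefix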
How to burn your GPU with CUDA9.1
9
# uname -sr; cat /etc/lsb-release
Linux 4.13.0-21-generic
DISTRIB_DESCRIPTION="Ubuntu 17.10"
# vi /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
# sync; sync; reboot
# apt install g++ freeglut3-dev build-essential libx11-dev libxmu-dev
# apt install libxi-dev libglu1-mesa libglu1-mesa-dev gcc-6 g++-6
Download CUDA9.1 from https://developer.nvidia.com/cuda-toolkit
# bash cuda_9.1.85_387.26_linux.run --silent --toolkit --override --no-opengl-libs --driver
# ln -s /usr/bin/gcc-6 /usr/local/cuda/bin/gcc
# ln -s /usr/bin/g++-6 /usr/local/cuda/bin/g++
# vi ~/.bashrc
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDA_HOME=/usr/local/cuda
# source ~/.bashrc
# git clone https://github.com/wilicc/gpu-burn.git
# cd gpu-burn/
# vi Makefile
NVCC=/usr/local/cuda/bin/nvcc
# make
# ./gpu_burn 1000
# watch -n 1 nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 387.26 Driver Version: 387.26 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1050 Off | 00000000:65:00.0 Off | N/A |
| 37% 72C P0 N/A / 65W | 1793MiB / 1997MiB | 100% Default |
+-------------------------------+----------------------+----------------------+
How to burn your GPU with CUDA9.1 (MapD Community Edition 3.4.0)
10
# apt install -y curl apt-transport-https
# useradd -U mapd
# ufw disable; ufw enable; ufw allow 9092/tcp; ufw allow 22/tcp
# curl https://releases.mapd.com/ce/mapd-ce-cuda.list | sudo tee /etc/apt/sources.list.d/mapd.list
# curl https://releases.mapd.com/GPG-KEY-mapd | sudo apt-key add -
# apt update
# apt install -y mapd
# vi ~/.bashrc
export MAPD_USER=mapd
export MAPD_GROUP=mapd
export MAPD_STORAGE=/var/lib/mapd
export MAPD_PATH=/opt/mapd
# source ~/.bashrc
# mkdir -p $MAPD_STORAGE
# chown -R $MAPD_USER $MAPD_STORAGE
# cd $MAPD_PATH/systemd
# ./install_mapd_systemd.sh
# cd $MAPD_PATH
# systemctl start mapd_server; systemctl enable mapd_server
# systemctl start mapd_web_server; systemctl enable mapd_web_server
# $MAPD_PATH/insert_sample_data
2) Flights (2008) 10k
2
# $MAPD_PATH/bin/mapdql -t
Password: HyperInteractive
mapdql> SELECT origin_city AS "Origin", dest_city AS "Destination", AVG(airtime) AS "Average Airtime" FROM flights_2008_10k
WHERE distance <= 33 GROUP BY origin_city, dest_city;
Execution time: 1268 ms, Total time: 1269 ms
SOURCE: https://www.mapd.com/platform/download-community/
+----------------------------------------------------------
| NVIDIA-SMI 387.26 Driver Version: 387.26
|-------------------------------+----------------------+---
| GPU Name Persistence-M| Bus-Id Disp.A |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage |
|===============================+======================+===
| 0 GeForce GTX 1050 Off | 00000000:65:00.0 Off |
| 29% 27C P0 N/A / 65W | 1449MiB / 1997MiB |
+-------------------------------+----------------------+---
|==========================================================
| 0 5828 C /opt/mapd/bin/mapd_server
+----------------------------------------------------------
Origin|Destination|Average Airtime
West Palm Beach|Tampa|33.81818181818182
Norfolk|Baltimore|36.07142857142857
Ft. Myers|Orlando|28.66666666666667
Indianapolis|Chicago|39.53846153846154
Tampa|West Palm Beach|33.25
Orlando|Ft. Myers|32.58333333333334
Austin|Houston|33.05555555555556
Chicago|Indianapolis|32.7
Baltimore|Norfolk|31.71428571428572
Houston|Austin|29.61111111111111
ROCm with dGPU(AMD GPU) (memo)
11
# uname -sr; cat /etc/lsb-release
Linux 4.4.0-87-generic
DISTRIB_DESCRIPTION="Ubuntu 16.04.3 LTS"
# lscpu
Model name: Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz
# lspci | grep VGA
65:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 67ef (rev cf) / *[Radeon RX 460]
ROCm Platform Supports Two Graphics Core Next (GCN) GPU Generations
GFX8: Radeon RX 480,Radeon RX 470,Radeon RX 460,R9 Nano,Radeon R9 Fury,Radeon R9 Fury X Radeon Pro WX7100, FirePro S9300 x2
Radeon Vega Frontier Edition, Radeon Instinct: MI6, MI8, and MI25 (https://rocm.github.io/hardware.html)
# apt update
# apt dist-upgrade
# apt-get install -y libnuma-dev
# sync; sync; sync; reboot
# wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo apt-key add -
# sh -c 'echo deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main > /etc/apt/sources.list.d/rocm.list'
# apt-get install -y rocm-dkms
# usermod -a -G video $LOGNAME
# sync; sync; sync; reboot
# /opt/rocm/opencl/bin/x86_64/clinfo
Platform Version: OpenCL 2.1 AMD-APP.internal (2545.0)
Platform Name: AMD Accelerated Parallel Processing
# wget https://raw.githubusercontent.com/bgaster/opencl-book-samples/master/src/Chapter_2/HelloWorld/HelloWorld.cpp
# wget https://raw.githubusercontent.com/bgaster/opencl-book-samples/master/src/Chapter_2/HelloWorld/HelloWorld.cl
# g++ -I /opt/rocm/opencl/include/ ./HelloWorld.cpp -o HelloWorld -L/opt/rocm/opencl/lib/x86_64 -lOpenCL
# ./HelloWorld
0 3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60 63 66 69 72 75 78 81 84 87 90 93 96 99 102 105 108 111 114 117 120
... 2985 2988 2991 2994 2997
Executed program succesfully.
AMDGPU ROCm Tensorflow 1.8 install memo (does not support Ubuntu 18.04)
12
# uname -sr; tail -2 /etc/lsb-release
Linux 4.4.0-131-generic
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.5 LTS"
# lspci
17:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 67ef (rev cf)
# apt update
# apt dist-upgrade
# apt install -y libnuma-dev wget python3-pip
# sync; sync; sync; reboot
# wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | apt-key add -
# vi /etc/apt/sources.list.d/rocm.list
deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main
# apt update
# apt install -y rocm-dkms
# usermod -a -G video $LOGNAME
# sync; sync; sync; reboot
# apt install -y rocm-libs miopen-hip cxlactivitylogger
# sync; sync; sync; reboot
# wget http://repo.radeon.com/rocm/misc/tensorflow/tensorflow-1.8.0-cp35-cp35m-manylinux1_x86_64.whl
# pip3 install ./tensorflow-1.8.0-cp35-cp35m-manylinux1_x86_64.whl
# git clone https://github.com/tensorflow/models.git
# python3 classify_image.py
# cd ; git clone https://github.com/tensorflow/tensorflow.git
# cd tensorflow/
# python3 tensorflow/examples/speech_commands/train.py
# watch -n 1 /opt/rocm/bin/rocm-smi
==================== ROCm System Management Interface ====================
================================================================================
GPU Temp AvgPwr SCLK MCLK Fan Perf SCLK OD MCLK OD
0 35c 21.82W 1210Mhz 300Mhz 0.0% auto 0% 0%
================================================================================
==================== End of ROCm SMI Log ====================
2018-09-02 10:40:10.368117: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1451] Found device 0 with properties:
name: Device 67ef
AMDGPU ISA: gfx803
memoryClockRate (GHz) 1.21
pciBusID 0000:17:00.0
Total memory: 2.00GiB
Free memory: 1.75GiB
Adding visible gpu devices: 0
Device interconnect
Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1567 MB memory) -> physical GPU (device: 0, name: Device 67ef, pci bus id: 0000:17:00.0)
AMDGPU ROCm Tensorflow 1.8 (classify_image.py)
13
# wget http://repo.radeon.com/rocm/misc/tensorflow/tensorflow-1.8.0-cp35-cp35m-manylinux1_x86_64.whl
# pip3 install ./tensorflow-1.8.0-cp35-cp35m-manylinux1_x86_64.whl
# git clone https://github.com/tensorflow/models.git
# python3 classify_image.py
2018-09-02 10:40:10.368117: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1451] Found device 0 with
properties:
name: Device 67ef
AMDGPU ISA: gfx803
memoryClockRate (GHz) 1.21
pciBusID 0000:17:00.0
Total memory: 2.00GiB
Free memory: 1.75GiB
2018-09-02 10:40:10.368135: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1562] Adding visible gpu devices: 0
2018-09-02 10:40:10.368153: I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Device interconnect
StreamExecutor with strength 1 edge matrix:
2018-09-02 10:40:10.368162: I tensorflow/core/common_runtime/gpu/gpu_device.cc:995] 0
2018-09-02 10:40:10.368175: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1008] 0: N
2018-09-02 10:40:10.368207: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1124] Created TensorFlow device
(/job:localhost/replica:0/task:0/device:GPU:0 with 1567 MB memory) -> physical GPU (device: 0, name: Device
/opt/rocm/miopen/share/miopen/db/gfx803_14.cd.pdb.txt
giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca (score = 0.89107)
indri, indris, Indri indri, Indri brevicaudatus (score = 0.00779)
lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens (score = 0.00296)
custard apple (score = 0.00147)
earthstar (score = 0.00117)
#
AMDGPU ROCm Tensorflow 1.8 (speech_commands/train.py)
14
# git clone https://github.com/tensorflow/tensorflow.git
# cd tensorflow/
# python3 tensorflow/examples/speech_commands/train.py
2018-09-02 10:43:36.924800: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions
that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
AMDGPU ISA: gfx803
memoryClockRate (GHz) 1.21
pciBusID 0000:17:00.0
Total memory: 2.00GiB
Free memory: 1.75GiB
:
INFO:tensorflow:Step #1: rate 0.001000, accuracy 9.0%, cross entropy 2.724346
INFO:tensorflow:Step #2: rate 0.001000, accuracy 9.0%, cross entropy 2.521507
:
INFO:tensorflow:Saving to "/tmp/speech_commands_train/conv.ckpt-4300"
INFO:tensorflow:Step #4301: rate 0.001000, accuracy 65.0%, cross entropy 1.094288
INFO:tensorflow:Step #4302: rate 0.001000, accuracy 69.0%, cross entropy 0.876309
:
# /opt/rocm/bin/rocm-smi
GPU Temp AvgPwr SCLK MCLK Fan Perf SCLK OD MCLK OD
0 52c 44.230W 1172Mhz 1750Mhz 0.0% auto 0% 0%
# top
top - 10:58:10 up 25 min, 2 users, load average: 1.51, 1.29, 0.89
Tasks: 222 total, 2 running, 220 sleeping, 0 stopped, 0 zombie
%Cpu0 : 6.2 us, 1.7 sy, 0.0 ni, 92.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 5.6 us, 2.8 sy, 0.0 ni, 91.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 8.3 us, 3.1 sy, 0.0 ni, 88.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 6.4 us, 2.7 sy, 0.0 ni, 90.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 9.8 us, 3.7 sy, 0.0 ni, 86.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 8.4 us, 3.0 sy, 0.0 ni, 88.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 5.4 us, 2.3 sy, 0.0 ni, 92.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 3.4 us, 2.0 sy, 0.0 ni, 94.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu8 : 3.4 us, 1.7 sy, 0.0 ni, 94.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu9 : 3.7 us, 1.7 sy, 0.0 ni, 94.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu10 : 6.0 us, 2.7 sy, 0.0 ni, 91.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu11 : 4.4 us, 2.0 sy, 0.0 ni, 93.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
Appendix/memo
NVMe SSD/SPDK/DRAM
15
In-Memory Computing for FASTDATA using fio with RAMDISK(DDR4)
16
# uname -sr; cat /etc/lsb-release
Linux 4.13.0-21-generic
DISTRIB_DESCRIPTION="Ubuntu 17.10"
# lshw -c cpu
product: Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz
# lshw -class memory
description: DIMM DDR4 Synchronous 2666 MHz (0.4 ns)
# mkdir /ramdisk
# mount -t tmpfs tmpfs /ramdisk
# fio -directory=/ramdisk -rw=read -bs=* -size=1G -numjobs=16 -runtime=10 -group_reporting -name=data
64GB RAMDISK (fio, block size swept in Bytes) with Core i7-7800X overclocked to 5GHz.
The original chart plots IOPS against fio block size: 19.9M, 18.6M, 16.3M, 12.6M, 7.8M, 4.6M, 2.4M, 1.2M IOPS.
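A minimal sketch of the same sweep driven from Python (assumes fio is installed and /ramdisk is mounted as above; the block-size list is illustrative), reading the IOPS figure from fio's JSON output:

# Sweep fio block sizes on the RAMDISK and print read IOPS for each.
import json
import subprocess

for bs in ("64", "128", "256", "512", "1k", "4k", "64k"):
    cmd = ["fio", "--directory=/ramdisk", "--rw=read", "--bs=" + bs,
           "--size=1G", "--numjobs=16", "--runtime=10",
           "--group_reporting", "--name=data", "--output-format=json"]
    report = json.loads(subprocess.run(cmd, capture_output=True, text=True).stdout)
    print(bs, f"{report['jobs'][0]['read']['iops']:,.0f} IOPS")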
How To Configure NVMe over Fabrics using MLNX_OFED <DRAFT>
17
NVME Target Configuration
# ./mlnxofedinstall --add-kernel-support --with-nvmf
# modprobe mlx5_core
# modprobe nvmet
# modprobe nvmet-rdma
# modprobe nvme-rdma
# mkdir /sys/kernel/config/nvmet/subsystems/nvme-subsystem-name
# cd /sys/kernel/config/nvmet/subsystems/nvme-subsystem-name
# echo 1 > attr_allow_any_host
# mkdir namespaces/10
# cd namespaces/10
# echo -n /dev/nvme0n1> device_path
# echo 1 > enable
# mkdir /sys/kernel/config/nvmet/ports/1
# cd /sys/kernel/config/nvmet/ports/1
# ip addr add 1.1.1.1/24 dev enp2s0f0
# echo 1.1.1.1 > addr_traddr
# echo rdma > addr_trtype
# echo 4420 > addr_trsvcid
# echo ipv4 > addr_adrfam
# ln -s /sys/kernel/config/nvmet/subsystems/nvme-subsystem-name /sys/kernel/config/nvmet/ports/1/subsystems/nvme-subsystem-name
NVMe Client (Initiator) Configuration
# ./mlnxofedinstall --add-kernel-support --with-nvmf
# modprobe mlx5_core
# modprobe nvme-rdma
# git clone https://github.com/linux-nvme/nvme-cli.git
# cd nvme-cli
# make
# make install
# nvme discover -t rdma -a 1.1.1.1 -s 4420
# nvme connect -t rdma -n nvme-subsystem-name -a 1.1.1.1 -s 4420
# nvme disconnect -d /dev/nvme0n1
Intel SPDK(Storage Performance Development Kit) benchmark
18
# uname -sr;
Linux 4.10.0-40-generic
# apt-get install libnuma-dev git uuid-dev libaio-dev libcunit1-dev libcunit1 libssl-dev g++ -y
# cd /opt/; git clone https://github.com/axboe/fio
# cd fio; git checkout -b fio-2.21
# make; make install
# cd /opt/; git clone https://github.com/spdk/spdk
# cd spdk; git submodule update --init
# ./configure --with-fio=/opt/fio/
# make
# /opt/spdk/scripts/setup.sh
# fio --name=nvme --numjobs=8 --filename="trtype=PCIe traddr=0000.01.00.0 ns=1" --bs=4K --iodepth=4 \
  --ioengine=/opt/spdk/examples/nvme/fio_plugin/fio_plugin \
  --group_reporting --size=50% --runtime=100 --thread=8 --rw=read
nvme: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T)
4096B-4096B, ioengine=spdk, iodepth=4
...
fio-3.2-19-g609ac1
Starting 8 threads
Starting DPDK 17.11.0 initialization...
[ DPDK EAL parameters: fio -c 0x1 -m 512 --file-prefix=spdk_pid18356 ]
EAL: Detected 8 lcore(s)
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: PCI device 0000:01:00.0 on NUMA socket 0
EAL: probe driver: 8086:2700 spdk_nvme
nvme: (groupid=0, jobs=8): err= 0: pid=18367: Mon Nov 27 15:36:06 2017
read: IOPS=572k, BW=2236MiB/s (2345MB/s)(218GiB/100001msec)
slat (nsec): min=91, max=471828, avg=200.94, stdev=122.85
clat (usec): min=9, max=13319, avg=55.44, stdev= 7.84
lat (usec): min=14, max=13319, avg=55.64, stdev= 7.84
clat percentiles (usec):
| 1.00th=[ 48], 5.00th=[ 50], 10.00th=[ 50], 20.00th=[ 51],
| 30.00th=[ 52], 40.00th=[ 53], 50.00th=[ 53], 60.00th=[ 54],
| 70.00th=[ 56], 80.00th=[ 60], 90.00th=[ 64], 95.00th=[ 67],
| 99.00th=[ 88], 99.50th=[ 91], 99.90th=[ 100], 99.95th=[ 111],
| 99.99th=[ 121]
bw ( KiB/s): min=242664, max=310392, per=12.50%, avg=286296.77, stdev=11653.87, samples=1592
iops : min=60666, max=77598, avg=71574.18, stdev=2913.46, samples=1592
lat (usec) : 10=0.01%, 20=0.01%, 50=9.44%, 100=90.46%, 250=0.09%
lat (usec) : 500=0.01%, 750=0.01%
lat (msec) : 2=0.01%, 20=0.01%
In-Memory Database Registration Performance Check (Intel vs AMD)
19
Purley# uname -sr; cat /etc/redhat-release
Linux 3.10.0-514.el7.x86_64
CentOS Linux release 7.3.1611 (Core)
Purley# grep proc /proc/cpuinfo | wc -l
48
Purley# lscpu
Model name: Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz
RYZEN# uname -sr; cat /etc/debian_version
Linux 4.10.0-19-generic
stretch/sid
RYZEN# grep proc /proc/cpuinfo | wc -l
16
RYZEN# lscpu
Model name: AMD Ryzen 7 1800X Eight-Core Processor
redis shows a drop in per-process throughput as the stored data size grows.
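A minimal sketch of how that trend can be measured (assumes a local redis-server and the redis-py package; value sizes and iteration count are illustrative):

# Measure GET throughput for increasing value sizes against a local redis.
import time
import redis

r = redis.Redis(host="localhost", port=6379)
for size in (2, 64, 1024, 16 * 1024, 256 * 1024):
    r.set("key", b"x" * size)
    n = 100_000
    start = time.time()
    for _ in range(n):
        r.get("key")
    print(f"value size {size:>7} bytes: {n / (time.time() - start):,.0f} GET/s")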
In-Memory Database Performance Check
20
Intel Purley
AMD Ryzen
Xeon Phi(KNL)
# uname -sr; cat /etc/redhat-release
Linux 3.10.0-514.el7.x86_64
CentOS Linux release 7.3.1611 (Core)
# grep proc /proc/cpuinfo | wc -l
48
# lscpu
Model name: Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz
ALL FLASH DATACENTER & IN-MEMORY COMPUTING: HOT TOPICS
21
SOURCE: SAKURA Internet Research Center. (2017/10), Project Sprig.
ClickHouse column-oriented database Install memo
22
# uname -sr; cat /etc/issue
Linux 4.10.0-35-generic
Ubuntu 17.04
# apt install software-properties-common
# apt-key adv --keyserver keyserver.ubuntu.com --recv E0C56BD4
# apt-add-repository "deb http://repo.yandex.ru/clickhouse/trusty stable main"
# apt-get update
# apt-get install clickhouse-server-common clickhouse-client -y
# service clickhouse-server start
# clickhouse-client --multiline
ClickHouse client version 1.1.54304.
Connecting to localhost:9000.
Connected to ClickHouse server version 1.1.54304.
:) CREATE TABLE ontime
(
Year UInt16,
Quarter UInt8,
Month UInt8,
:
Div5TailNum String
)
ENGINE = MergeTree(FlightDate, (Year, FlightDate), 8192);
or
# xz -v -c -d < ontime.csv.xz | clickhouse-client --query="INSERT INTO ontime FORMAT CSV"
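Once the ontime table is loaded, queries can also be issued from Python; a minimal sketch (assumes the clickhouse-driver package and the default localhost:9000 native port):

# Query the ontime table created above over the native protocol.
from clickhouse_driver import Client

client = Client("localhost")
rows = client.execute("SELECT Year, count() FROM ontime GROUP BY Year ORDER BY Year")
for year, flights in rows:
    print(year, flights)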
MariaDB ColumnStore column-oriented database Install memo
23
# uname -sr; cat /etc/redhat-release
Linux 3.10.0-514.el7.x86_64
Red Hat Enterprise Linux Server release 7.4 (Maipo)
# mkdir mcs; cd mcs;
# wget https://downloads.mariadb.com/ColumnStore/1.0.11/centos/x86_64/7/mariadb-columnstore-1.0.11-1-centos7.x86_64.rpm.tar.gz
# tar xzvf ./mariadb-columnstore-1.0.11-1-centos7.x86_64.rpm.tar.gz
# yum install boost boost-devel boost-doc expect perl-DBD-MySQL -y
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-common.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-client.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-server.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-libs.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-shared.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-gssapi-client.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-gssapi-server.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-platform.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-storage-engine.rpm -vh
# /usr/local/mariadb/columnstore/bin/postConfigure
Select the type of System Server install [1=single, 2=multi] (2) > 1
Enter System Name (columnstore-1) > sprig-1
Select the type of Data Storage [1=internal, 2=external, 3=GlusterFS] (1) > 1
Enter the list (Nx,Ny,Nz) or range (Nx-Nz) of DBRoot IDs assigned to module 'pm1' (1) > 1
# . /usr/local/mariadb/columnstore/bin/columnstoreAlias
# mcsadmin
MariaDB ColumnStore Admin Console
enter 'help' for list of commands
enter 'exit' to exit the MariaDB ColumnStore Command Console
use up/down arrows to recall commands
mcsadmin>
Appendix/memo
eBPF/XDP/DPDK
24
Quagga with ROUTE_MULTIPATH (memo)
25
# uname -sr; cat /etc/lsb-release
Linux 4.13.0-21-generic
DISTRIB_DESCRIPTION="Ubuntu 17.10"
# grep ROUTE_MULTIPATH /usr/src/*/.config
CONFIG_IP_ROUTE_MULTIPATH=y
# apt-get install -y quagga traceroute
# vi /etc/sysctl.conf
net.ipv4.conf.all.forwarding=1
net.ipv4.fib_multipath_hash_policy = 1
net.ipv4.conf.all.arp_announce = 2
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.default.arp_filter = 1
net.ipv6.conf.all.forwarding=1
net.ipv6.route.max_size = 32768
net.ipv6.xfrm6_gc_thresh = 32768
# touch /etc/quagga/zebra.conf
# touch /etc/quagga/ospfd.conf
# touch /etc/quagga/ospf6d.conf
# chown quagga.quaggavty /etc/quagga/*.conf
# chmod 640 /etc/quagga/*.conf
# ufw disable
# vi /etc/quagga/daemons
zebra=yes
ospfd=yes
ospf6d=yes
# echo VTYSH_PAGER=more >> /etc/environment
# sync; sync; sync; reboot
# vtysh
Quagga with ROUTE_MULTIPATH
My First XDP (eXpress Data Path)
26
# uname -sr; cat /etc/lsb-release
Linux 4.13.0-21-generic
DISTRIB_DESCRIPTION="Ubuntu 17.10"
# apt install -y make gcc libssl-dev bc libelf-dev libcap-dev clang
# apt install -y gcc-multilib llvm libncurses5-dev git bison flex pkg-config
# apt install -y libmnl0 libmnl-dev clang libasm1 libasm-dev
# mkdir /usr/local/include/asm
# ln -s /usr/include/x86_64-linux-gnu/asm/* /usr/local/include/asm
# git clone git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git
# cd iproute2/
# ./configure --prefix=/sbin
# make; make install
# vi xdp_example.c
#include <linux/bpf.h>
#ifndef __section
# define __section(NAME) __attribute__((section(NAME), used))
#endif
__section("prog")
int xdp_drop(struct xdp_md *ctx)
{
return XDP_DROP;
}
char __license[] __section("license") = "GPL";
# clang -O2 -Wall -target bpf -c xdp_example.c -o xdp_example.o
# ip link set dev eth0 xdp obj xdp_example.o
# ip link set dev eth0 xdp off
SOURCE: https://github.com/torvalds/linux/tree/master/samples/bpf,
http://cilium.readthedocs.io/en/latest/bpf/#llvm,
http://vger.kernel.org/netconf2017_files/XDP_devel_update_NetConf2017_Seoul.pdf,
http://prototype-kernel.readthedocs.io/en/latest/blogposts/xdp25_eval_generic_xdp_tx.html,
https://netdevconf.org/1.2/slides/oct7/10_nic_viljoen_eBPF_Offload_to_Hardware__cls_bpf_and_XDP_finalised.pdf,
https://people.netfilter.org/hawk/presentations/NetDev2.2_2017/XDP_for_the_Rest_of_Us_Part_2.pdf,
XDP – eXpress Data Path
My First F-Stack
27
# lscpu
Model name: AMD Ryzen Threadripper 1900X 8-Core Processor
# uname -sr; cat /etc/lsb-release
Linux 4.10.0-35-generic
DISTRIB_CODENAME=zesty
DISTRIB_DESCRIPTION="Ubuntu 17.04"
# cd /opt
# git clone https://github.com/F-Stack/f-stack.git
# /opt/f-stack/dpdk/tools/dpdk-setup.sh
[15] x86_64-native-linuxapp-gcc
Option: 15
# echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
# mkdir /mnt/huge
# mount -t hugetlbfs nodev /mnt/huge
# echo 0 > /proc/sys/kernel/randomize_va_space
# modprobe uio
# insmod /opt/f-stack/dpdk/x86_64-native-linuxapp-gcc/kmod/igb_uio.ko
# insmod /opt/f-stack/dpdk/x86_64-native-linuxapp-gcc/kmod/rte_kni.ko
# export FF_PATH=/opt/f-stack/
# export FF_DPDK=/opt/f-stack/dpdk/x86_64-native-linuxapp-gcc/
# cd /opt/f-stack/lib
# make ; make ; make ; make install
# cd /opt/f-stack/app/nginx-1.11.10
# ./configure --prefix=/usr/local/nginx_fstack --with-ff_module --without-http_rewrite_module
# make
# make install
# grep f-stack /usr/local/nginx_fstack/conf/nginx.conf
fstack_conf f-stack.conf;
# grep addr /usr/local/nginx_fstack/conf/f-stack.conf
addr=192.168.1.2
Copyright © 2018. Tencent Cloud All rights reserved.
My First FD.io VPP (Segment Routing for IPv6 / L3VPN for IPv4 traffic)
28
# uname -sr; cat /etc/lsb-release
Linux 4.13.0-21-generic
DISTRIB_DESCRIPTION="Ubuntu 17.10"
# vi /etc/apt/sources.list.d/99fd.io.list
deb [trusted=yes] https://nexus.fd.io/content/repositories/fd.io.ubuntu.xenial.main/ ./
# apt-get update
# apt-get install -y vpp-lib vpp vpp-plugins
# service vpp start
# service vpp status
● vpp.service - vector packet processing engine
Loaded: loaded (/lib/systemd/system/vpp.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2018-02-13 09:30:25 JST; 21s ago
:
CGroup: /system.slice/vpp.service
└─2011 /usr/bin/vpp -c /etc/vpp/startup.conf
# vppctl
vpp# set sr encaps source addr C1::
vpp# sr policy add bsid C1::999:2 next C2:: next C4::4 encap
vpp# sr steer l3 1.1.1.0/24 via sr policy bsid C1::999:2
:
vpp# sr localsid address C4::4 behavior end.dx4 GigabitEthernet0/6/0 1.1.1.1
vpp# show sr localsid
SRv6 - My LocalSID Table:
=========================
Address: c4::4
Behavior: DX4 (Endpoint with decapsulation and IPv4 cross-connect)
Iface: GigabitEthernet0/6/0
Next hop: 1.1.1.1
SOURCE: VPP/Segment Routing for IPv6 (https://wiki.fd.io/view/VPP/Segment_Routing_for_IPv6)
© 2017 FD.io is a Linux Foundation Project. All Rights Reserved.
FD.io VPP with XeonPhi (Basic Configuration)
29
# uname -sr; cat /etc/lsb-release
Linux 4.13.0-21-generic
DISTRIB_DESCRIPTION="Ubuntu 17.10"
# lscpu
CPU(s): 256
Model name: Intel(R) Xeon Phi(TM) CPU 7210 @ 1.30GHz
# vi /etc/apt/sources.list.d/99fd.io.list
deb [trusted=yes] https://nexus.fd.io/content/repositories/fd.io.ubuntu.xenial.main/ ./
# apt-get update
# apt install vpp vpp-lib vpp-plugins python-pip
# pip install vpp-config
# vpp-config
5) Execute some basic tests.
Command: 5
1) List/Create Simple IPv4 Setup
Command: 1
Would you like to keep this configuration [Y/n]? n
Would you like add address to interface GigabitEthernet4/0/1 [Y/n]? Y
Please enter the IPv4 Address [n.n.n.n/n]: 1.1.1.11/24
# vi /etc/vpp/startup.conf
unix {
nodaemon
log /var/log/vpp/vpp.log
full-coredump
cli-listen /run/vpp/cli.sock
exec /usr/local/vpp/vpp-config/scripts/set_int_ipv4_and_up
}
# sync; sync; sync; reboot
© 2017 FD.io is a Linux Foundation Project. All Rights Reserved.
# vppctl
# show int
Name Idx State
GigabitEthernet4/0/1 1 up
# show int addr
GigabitEthernet4/0/1 (up):
1.1.1.11/24
FD.io VPP with XeonPhi (Load Balancer plugin)
30
# vppctl
# show int addr
GigabitEthernet4/0/1 (up):
1.1.1.11/24
# lb conf ip4-src-address 1.1.1.11 timeout 3
# lb vip 1.2.3.4/32 encap gre4 new_len 1024
# lb as 1.2.3.4/32 1.1.1.8 1.1.1.9 1.1.1.10
# show lb vips 1.2.3.4
ip4-gre4 1.2.3.4/32 new_size:1024 #as:3
Application Server(1.1.1.8,9,10) side Configuration
# ip tunnel add tun0 mode gre local 1.1.1.8 remote 1.1.1.11 ttl 255
# ifconfig tun0 1.2.3.4/32 up
# echo 1 > /proc/sys/net/ipv4/conf/tun0/arp_ignore
# echo 2 > /proc/sys/net/ipv4/conf/tun0/arp_announce
# echo 0 > /proc/sys/net/ipv4/conf/tun0/rp_filter
# echo 0 > /proc/sys/net/ipv4/conf/all/rp_filter
# echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore
# echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce
© 2017 FD.io is a Linux Foundation Project. All Rights Reserved.
Topology: the FD.io VPP load balancer (1.1.1.11) forwards traffic for the VIP 1.2.3.4/32 over GRE tunnels to the application servers (1.1.1.8, 1.1.1.9, 1.1.1.10 in the configuration above); each server configures 1.2.3.4/32 on tun0 and answers clients directly via ordinary IP routing (Direct Server Return, DSR).
Quagga with ROUTE_MULTIPATH for BGP load balancing (memo)
31
# uname -sr; cat /etc/lsb-release
Linux 4.13.0-36-generic
DISTRIB_DESCRIPTION="Ubuntu 17.10"
# grep ROUTE_MULTIPATH /usr/src/*/.config
/usr/src/linux-headers-4.13.0-36-generic/.config:CONFIG_IP_ROUTE_MULTIPATH=y
# apt install -y quagga traceroute
# touch /etc/quagga/zebra.conf; touch /etc/quagga/bgpd.conf; chown quagga.quaggavty /etc/quagga/*.conf
# chmod 640 /etc/quagga/*.conf
# ufw disable ; echo VTYSH_PAGER=more >> /etc/environment
# vi /etc/quagga/daemons
zebra=yes
bgpd=yes
# sync; sync; sync; reboot
# vtysh
# router bgp 65001
# bgp router-id 1.1.1.1
# bgp bestpath as-path multipath-relax
# bgp bestpath compare-routerid
# redistribute connected
# neighbor 1.1.1.2 remote-as 65002
# neighbor 1.1.1.3 remote-as 65003
# maximum-paths 64
# interface lo
# ip address 1.2.3.4/24
# router bgp 65002
# bgp router-id 1.1.1.2
# bgp bestpath as-path multipath-relax
# bgp bestpath compare-routerid
# redistribute connected
# neighbor 1.1.1.1 remote-as 65001
# maximum-paths 64
# interface lo
# ip address 1.2.3.4/24
# router bgp 65003
# bgp router-id 1.1.1.3
# bgp bestpath as-path multipath-relax
# bgp bestpath compare-routerid
# redistribute connected
# neighbor 1.1.1.1 remote-as 65001
# maximum-paths 64
# show ip bgp
BGP table version is 0, local router ID is 1.1.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
Network Next Hop Metric LocPrf Weight Path
*> 1.2.3.0/24 1.1.1.2 0 0 65002
*= 1.1.1.3 0 0 65003
FD.io VPP tap-inject with sample_plugins
32
© 2017 FD.io is a Linux Foundation Project. All Rights Reserved.
# uname -sr; cat /etc/lsb-release
Linux 4.13.0-37-generic
DISTRIB_DESCRIPTION="Ubuntu 17.10"
# echo VTYSH_PAGER=more >> /etc/environment
# apt install -y quagga
# touch /etc/quagga/zebra.conf
# touch /etc/quagga/bgpd.conf
# chown quagga.quaggavty /etc/quagga/*.conf
# chmod 640 /etc/quagga/*.conf
# ufw disable
# vi /etc/quagga/daemons
zebra=yes
bgpd=yes
# sync; sync; sync; reboot
# apt install build-essential -y
# cd /opt/
# git clone https://gerrit.fd.io/r/vpp
# git clone https://gerrit.fd.io/r/vppsb
# cd /opt/vpp
# ./extras/vagrant/build.sh
# make install-dep; make bootstrap; make build
# vi /opt/vppsb/router/router/tap_inject_node.c
#include <sys/uio.h>
# ln -sf /opt/vppsb/netlink
# ln -sf /opt/vppsb/router
# ln -sf /opt/vppsb/netlink/netlink.mk build-data/packages/
# ln -sf /opt/vppsb/router/router.mk build-data/packages/
# cd build-root/
# make V=0 PLATFORM=vpp TAG=vpp_debug netlink-install router-install
# dpkg -i *.deb
# cp -p /opt/vpp/build-root/install-vpp_debug-native/router/lib64/router.so.0.0.0 /usr/lib/vpp_plugins/router.so
# service vpp restart
# vppctl enable tap-inject
# vppctl show tap-inject
GigabitEthernet13/0/0 -> vpp1
GigabitEthernetb/0/0 -> vpp0
# vtysh (quagga)
# configure terminal
(config)# interface vpp0
(config-if)# ip address 192.168.11.100/24
(config-if)# exit
(config)# exit
# write
# quit
# vppctl show int addr
GigabitEthernetb/0/0 (up):
L3 192.168.11.100/24
L3 fe80::20c:29ff:fe24:af28/64
# cd /opt/vpp/src/examples/sample-plugin
# libtoolize
# aclocal
# autoconf
# autoheader
# automake --add-missing
# chmod +x configure
# ./configure
# make
# make install
Diagram: GigabitEthernetb/0/0 is exposed to the Linux kernel as vpp0 through tap-inject (vpp_plugins/router.so and vpp_plugins/sample_plugin.so), and quagga configures its addresses on vpp0.
FD.io VPP 18.07 with Ubuntu 16.04.5 LTS (does not support Ubuntu 18.04)
33
# uname -sr; tail -2 /etc/lsb-release
Linux 4.4.0-131-generic
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.5 LTS"
# apt remove --purge vpp*
# vi /etc/apt/sources.list.d/99fd.io.list
deb [trusted=yes]
https://nexus.fd.io/content/repositories/fd.io.stable.1807.ubuntu.xenial.main/ ./
# apt update
# apt dist-upgrade -y
# apt install -y vpp vpp-lib vpp-plugins vpp-dpdk-dkms
# vppctl show pci
Address Sock VID:PID Link Speed Driver Product Name
0000:05:00.0 0 8086:1539 2.5 GT/s x1 uio_pci_generic
0000:65:00.0 0 8086:1584 8.0 GT/s x8 uio_pci_generic XL710 40GbE Controller
# vi /etc/vpp/startup.conf
dpdk {
dev 0000:65:00.0
}
# service vpp restart
# service vpp status
Active: active (running) since Tue 2018-09-04 18:50:02 JST; 2s ago
# vppctl set int ip address FortyGigabitEthernet65/0/0 1.2.3.4/24
# vppctl set int state FortyGigabitEthernet65/0/0 up
# vppctl show interface addr
FortyGigabitEthernet65/0/0 (up):
L3 1.2.3.4/24
# vppctl show version
vpp v18.07-rc2~6-gdb6d6b3~b28 built by root on 10268b67c8b1 at Mon Jul 30 ...
# vi /etc/apt/sources.list
deb http://security.ubuntu.com/ubuntu bionic-security main
# apt update
# apt install libssl1.1 -y
Download from http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.19-rc2/
# dpkg -i linux-headers-4.19.0-041900rc2_...rc2.201809022230_all.deb
# dpkg -i linux-headers-4.19.0-041900rc2-...rc2.201809022230_amd64.deb
# dpkg -i linux-headers-4.19.0-041900rc2-...rc2.201809022230_amd64.deb
# dpkg -i linux-image-unsigned-4.19.0-......rc2.201809022230_amd64.deb
# sync; sync; sync; reboot
# uname -sr; tail -2 /etc/lsb-release
Linux 4.19.0-041900rc2-generic
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.5 LTS"
# vppctl show int
# (it does not work)
NOTICE: does not work with kernel 4.19-rc2
© 2018 The Fast Data Project. Copyright © 2018 FD.IO Project a Series of LF Projects, LLC
Appendix/memo
etc
34
My First Intel/Movidius NCS
35
SOURCE: SAKURA Internet Research Center. (2017/07) Project Sprig.
$ sudo su
# apt-get update ; apt-get upgrade -y
# mkdir /opt/mvncsdk ; cd /opt/mvncsdk/
GoTo: https://developer.movidius.com/getting-started
# wget https://ncs-forum-uploads.s3.amazonaws.com/ncsdk/MvNC_SDK_01_07_07/MvNC_SDK_1.07.07.tgz
# tar zxvf MvNC_SDK_1.07.07.tgz ; tar zxvf MvNC_Toolkit-1.07.06.tgz ; tar xzvf ./MvNC_API-1.07.07.tgz
# ./bin/setup.sh ; ./bin/data/dlnets.sh
# source ~/.bashrc
# cd /opt/mvncsdk/ncapi/; ./setup.sh ; cd ./c_examples/ ; make
# ./ncs-fullcheck -l2 -c1 ../networks/AlexNet ../images/cat.jpg
Device 0 Address: 2 - VID/PID 03e7:2150
Starting wait for connect with 2000ms timeout
Found Address: 2 - VID/PID 03e7:2150
Found EP 0x81 : max packet size is 512 bytes
Found EP 0x01 : max packet size is 512 bytes
Found and opened device
Performing bulk write of 825136 bytes...
Successfully sent 825136 bytes of data in 35.764553 ms (22.002540 MB/s)
Boot successful, device address 2
Found Address: 2 - VID/PID 040e:f63b
done
Booted 2 -> VSC
OpenDevice 2 succeeded
Graph allocated
:
$ uname -sr;
Linux 4.8.0-36-generic (Ubuntu 16.04.02)
$ lsusb -v
Device Descriptor:
iProduct 2 Movidius MA2X5X
MaxPower 500mA
© Copyright Movidius 2017. All Rights Reserved.
UP Board AI Core Configuration memo
36
# uname -sr; cat /etc/lsb-release
Linux 4.4.0-116-generic
DISTRIB_DESCRIPTION="Ubuntu 16.04.4 LTS"
# lshw
*-pci:1
*-usb
description: USB controller
product: FL1100 USB 3.0 Host Controller
*-usbhost:1
*-usb UNCLAIMED
description: Generic USB device
product: Movidius MA2X5X
vendor: Movidius Ltd.
# git clone -b ncsdk2 http://github.com/Movidius/ncsdk && cd ncsdk && make install
# export PYTHONPATH="${PYTHONPATH}:/opt/movidius/caffe/python"
# cd /examples/tensorflow/inception_v3
# cat run.py
image_filename = path_to_images + 'nps_electric_guitar.png'
devices = mvnc.enumerate_devices()
# python3 run.py
Number of categories: 1001
Start download to NCS...
*******************************************************************************
inception-v3 on NCS
*******************************************************************************
547 electric guitar 0.988281
403 acoustic guitar 0.00751877
715 pick, plectrum, plectron 0.0014801
421 banjo 0.000901222
820 stage 0.000654221
*******************************************************************************
Finished
Copyright 2018 Up Board | All Rights Reserved
USB 3.0 CAPTURE HDMI 4K with Loop-through for Image redistribution
37
# uname -sr; tail -1 /etc/redhat-release
Linux 3.10.0-862.9.1.el7.x86_64
CentOS Linux release 7.4.1708 (Core)
# yum install -y usbutils hwinfo mplayer v4l-utils ffmpeg git
# lsusb -t
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/10p, 5000M
|__ Port 4: Dev 2, If 9, Class=Human Interface Device, Driver=usbhid, 5000M
# lsusb -vv
# hwinfo --usb
# v4l2-ctl --list-devices
USB Capture HDMI 4K+ (usb-0000:00:14.0-4):
/dev/video0
# v4l2-ctl -d /dev/video0 --info
# v4l2-ctl --list-formats-ext -d /dev/video0
Type : Video Capture
Name : YUV 4:2:2 (YUYV)
Size: Discrete 4096x2160
Interval: Discrete 0.017s (60.000 fps)
# wget https://libav.org/releases/libav-12.3.tar.xz
# tar Jxvf ./libav-12.3.tar.xz; cd libav-12.3
# ./configure --disable-yasm; make; make install
# avconv -f video4linux2 -input_format nv12 -s 1920x1080 -i /dev/video0 -qscale 10 out.mpeg
Input #0, video4linux2, from '/dev/video0':
Duration: N/A, start: 1240.062083, bitrate: 1492992 kb/s
nv12, 1920x1080, 1492992 kb/s
60 fps, 1000k tbn
# ffmpeg -f v4l2 -list_formats all -i /dev/video0
[video4linux2,v4l2 @ 0x24114c0] Raw
: yuyv422 : YUV 4:2:2 (YUYV) :
640x360 640x480 720x480 720x576 768x576 800x600
856x480 960x540 1024x576 1024x768 1280x720 1280x800
1280x960 1280x1024 1368x768 1440x900 1600x1200 1680x1050
1920x1080 1920x1200 2048x1080 2560x1440 3840x2160 4096x2160
[video4linux2,v4l2 @ 0x24114c0] Raw
: nv12 : YUV 4:2:0 (NV12) :
640x360 640x480 720x480 720x576 768x576 800x600
856x480 960x540 1024x576 1024x768 1280x720 1280x800
1280x960 1280x1024 1368x768 1440x900 1600x1200 1680x1050
1920x1080 1920x1200 2048x1080 2560x1440 3840x2160 4096x2160
© 2018, Nanjing Magewell Electronics Co., Ltd
Diagram: the source server's HDMI output is read once (the ORIGINAL) and daisy-chained through USB 3.0 bus-powered HDMI capture devices; each device's HDMI loop-through feeds the next server in the chain, so every server receives a COPY of the image, and the chain keeps working across power-on / OS-down events on individual servers.
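A minimal sketch of grabbing a frame from the capture device programmatically (assumes the opencv-python package; device index 0 corresponds to /dev/video0 as listed by v4l2-ctl above):

# Grab one 1080p frame from the USB HDMI capture device and save it.
import cv2

cap = cv2.VideoCapture(0)                      # /dev/video0
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1920)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 1080)
ok, frame = cap.read()                         # one BGR frame as a numpy array
if ok:
    cv2.imwrite("capture.png", frame)
cap.release()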
AMD Threadripper 1900X overview/spec
38
# uname -sr
Linux 4.10.0-19-generic
# vi /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="pci=noaer"
# update-grub; sync; sync; sync; reboot
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1
:
Model name: AMD Ryzen Threadripper 1900X 8-Core Processor
CPU MHz: 3800.000
CPU max MHz: 3800.0000
CPU min MHz: 2200.0000
BogoMIPS: 7585.39
Virtualization: AMD-V
L1d cache: 32K
L1i cache: 64K
L2 cache: 512K
L3 cache: 8192K
NUMA node0 CPU(s): 0-15
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht
syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid amd_dcm aperfmperf pni
pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic
cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx cpb
hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf
arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic overflow_recov
succor smca
AMD EPYC 7251 overview/spec
39
# uname -sr ; cat /etc/redhat-release
Linux 3.10.0-693.5.2.el7.x86_64
CentOS Linux release 7.4.1708 (Core)
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 8
Vendor ID: AuthenticAMD
CPU family: 23
Model: 1
Model name: AMD EPYC 7251 8-Core Processor
Stepping: 2
CPU MHz: 1200.000
CPU max MHz: 2100.0000
CPU min MHz: 1200.0000
BogoMIPS: 4199.47
Virtualization: AMD-V
L1d cache: 32K
L1i cache: 64K
L2 cache: 512K
L3 cache: 4096K
NUMA node0 CPU(s): 0,1,16,17
NUMA node1 CPU(s): 2,3,18,19
NUMA node2 CPU(s): 4,5,20,21
NUMA node3 CPU(s): 6,7,22,23
NUMA node4 CPU(s): 8,9,24,25
NUMA node5 CPU(s): 10,11,26,27
NUMA node6 CPU(s): 12,13,28,29
NUMA node7 CPU(s): 14,15,30,31
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt
pdpe1gb rdtscp lm constant_tsc art rep_good nopl nonstop_tsc extd_apicid amd_dcm aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2
movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext
perfctr_core perfctr_nb bpext perfctr_l2 cpb hw_pstate avic fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 arat
npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold overflow_recov succor smca
Appendix: SSD DC P4800X and cost/performance analysis
40
In-Memory Computing: 192GB DDR4 DRAM (2666MHz/ECC), $2,399.88; 30GB (300M records) sort: 296 sec
All Flash Computing: 750GB 3D XPOINT/NVMe SSD (P4800X x2), $3,790.00; 200GB (2,000M records) sort: 4,648 sec
In-Memory Computing: 512GB DDR4 DRAM (2666MHz/ECC), $6,399.68
All Flash vs 192GB In-Memory: Processing Size 6.7x, Processing Cost 1.5x, Processing Time 15x
512GB vs 192GB In-Memory: Processing Size 2.7x, Processing Cost 2.7x, Processing Time: N/A
# gensort -a 2000000000 test
# time sort --parallel=52 -T /memdrv test -o out
# gensort -a 300000000 test
# time sort --parallel=52 -T /ramdisk test -o out
SOURCE: © 2016 Colfax International. © 2000-2017 Newegg Inc. / SAKURA Internet Research Center. (08/2017) Project Sprig.
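The ratios follow directly from the figures above; a small sketch (numbers copied from this slide; the slide rounds the 1.58x cost ratio down to 1.5x):

# Cost/performance ratios: All Flash (P4800X x2) vs 192GB DDR4 In-Memory.
in_memory_192 = {"size_gb": 30,  "cost_usd": 2399.88, "sort_sec": 296}
all_flash     = {"size_gb": 200, "cost_usd": 3790.00, "sort_sec": 4648}

for label, key in (("Processing Size", "size_gb"),
                   ("Processing Cost", "cost_usd"),
                   ("Processing Time", "sort_sec")):
    print(f"{label}: {all_flash[key] / in_memory_192[key]:.1f}x")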
Appendix: stream_openmp performance check
41
Chart: STREAM (stream_openmp) memory-bandwidth results compared across Xeon Phi, AMD RYZEN, and two Xeon systems.
Special Thanks: Takefumi Miyoshi
Appendix: Network Application Benchmark result (iperf with 20 servers)
42
How to measure your dataflow using fio, pktgen and bandwidthTest
43
Measured on Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz nodes:
RAMDISK (DDR4 2133MHz 16GB x4): WRITE 12,648MB/s, READ 13,793MB/s (bs=256KB)
Mellanox ConnectX-4, 40GbE: 40Mpps (pkt=64B) = 2,560MB/s; 40GbE max rate = 5,000MB/s
# cd /opt
# git clone git://dpdk.org/dpdk
# git clone git://dpdk.org/apps/pktgen-dpdk
export RTE_SDK=/opt/dpdk
export RTE_TARGET=x86_64-native-linuxapp-gcc
# sysctl vm.nr_hugepages=2048
# cd /opt/dpdk
# make install T=x86_64-native-linuxapp-gcc
# /opt/dpdk/usertools/dpdk-devbind.py -u 0b:00.0
# /opt/dpdk/usertools/dpdk-devbind.py -u 13:00.0
# /opt/dpdk/usertools/dpdk-devbind.py -b igb_uio 0b:00.0
# /opt/dpdk/usertools/dpdk-devbind.py -b igb_uio 13:00.0
# /opt/dpdk/usertools/dpdk-devbind.py --status
# cd /opt/pktgen-dpdk/
# make
# /opt/pktgen-dpdk/tools/setup.sh
# /opt/pktgen-dpdk/app/x86_64-native-linuxapp-gcc/pktgen -- -m "1.0, 2.1"
GeForce GTX 1050 on Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz (bandwidthTest): Host to Device 6,029MB/s, Device to Host 6,448MB/s
Intel Optane 900P (3D XPoint): WRITE 2,000MB/s, READ 2,500MB/s (bs=4KB)
# mount -t tmpfs -o size=32G tmpfs /ramdisk
# fio --directory=/ramdisk --rw=write --bs=4k --size=1G --numjobs=3 \
  --runtime=100 --group_reporting --name=data
# bash cuda_9.1.85_387.26_linux.run --silent --toolkit --override \
  --no-opengl-libs --driver
:
# cd NVIDIA_CUDA-9.1_Samples/1_Utilities/bandwidthTest
# ./bandwidthTest
LTE-M/NB IoTを試してみる nRF9160/Thingy:91
 
災害時における無線モニタリングによる社会インフラの見える化
災害時における無線モニタリングによる社会インフラの見える化災害時における無線モニタリングによる社会インフラの見える化
災害時における無線モニタリングによる社会インフラの見える化
 
BeautifulSoup / selenium Deep dive
BeautifulSoup / selenium Deep diveBeautifulSoup / selenium Deep dive
BeautifulSoup / selenium Deep dive
 
AMDGPU ROCm Deep dive
AMDGPU ROCm Deep diveAMDGPU ROCm Deep dive
AMDGPU ROCm Deep dive
 
Network Adapter Deep dive
Network Adapter Deep diveNetwork Adapter Deep dive
Network Adapter Deep dive
 
RTL2838 DVB-T Deep dive
RTL2838 DVB-T Deep diveRTL2838 DVB-T Deep dive
RTL2838 DVB-T Deep dive
 
x86_64 Hardware Deep dive
x86_64 Hardware Deep divex86_64 Hardware Deep dive
x86_64 Hardware Deep dive
 
ADS-B, AIS, APRS cheatsheet
ADS-B, AIS, APRS cheatsheetADS-B, AIS, APRS cheatsheet
ADS-B, AIS, APRS cheatsheet
 
curl --http3 cheatsheet
curl --http3 cheatsheetcurl --http3 cheatsheet
curl --http3 cheatsheet
 
3/4G USB modem Cheat Sheet
3/4G USB modem Cheat Sheet3/4G USB modem Cheat Sheet
3/4G USB modem Cheat Sheet
 
How To Train Your ARM(SBC)
How To  Train Your ARM(SBC)How To  Train Your ARM(SBC)
How To Train Your ARM(SBC)
 
全国におけるCOVID-19対策の見える化 ~宿泊業の場合~
全国におけるCOVID-19対策の見える化 ~宿泊業の場合~全国におけるCOVID-19対策の見える化 ~宿泊業の場合~
全国におけるCOVID-19対策の見える化 ~宿泊業の場合~
 
我が国の電波の使用状況/携帯電話向け割当 (2019年3月1日現在)
我が国の電波の使用状況/携帯電話向け割当 (2019年3月1日現在)我が国の電波の使用状況/携帯電話向け割当 (2019年3月1日現在)
我が国の電波の使用状況/携帯電話向け割当 (2019年3月1日現在)
 
私たちに訪れる(かもしれない)未来と計算機によるモノコトの見える化
私たちに訪れる(かもしれない)未来と計算機によるモノコトの見える化私たちに訪れる(かもしれない)未来と計算機によるモノコトの見える化
私たちに訪れる(かもしれない)未来と計算機によるモノコトの見える化
 

Recently uploaded

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

• 8. ROCm with dGPU(AMD GPU) using pyopencl
# uname -sr; cat /etc/lsb-release
Linux 4.4.0-116-generic
DISTRIB_DESCRIPTION="Ubuntu 16.04.4 LTS" (ROCm does not support 17.10)
# lscpu
Model name: Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz
# lspci | grep VGA
65:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 67ef (rev cf)
ROCm Platform Supports Two Graphics Core Next (GCN) GPU Generations
GFX8: Radeon RX 480, Radeon RX 470, Radeon RX 460, R9 Nano, Radeon R9 Fury, Radeon R9 Fury X, Radeon Pro WX7100, FirePro S9300 x2,
Radeon Vega Frontier Edition, Radeon Instinct: MI6, MI8, and MI25 (https://rocm.github.io/hardware.html)
# apt update
# apt dist-upgrade -y
# apt-get install -y libnuma-dev
# wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo apt-key add -
# sh -c 'echo deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main > /etc/apt/sources.list.d/rocm.list'
# apt update
# apt-get install -y rocm-dkms
# ln -s /opt/rocm/opencl/lib/x86_64/libOpenCL.so.1 /usr/lib/libOpenCL.so
# usermod -a -G video $LOGNAME
# sync; sync; sync; reboot
# /opt/rocm/opencl/bin/x86_64/clinfo
Platform Version: OpenCL 2.1 AMD-APP.internal (2576.0)
Platform Name: AMD Accelerated Parallel Processing
# apt install python-pip opencl-headers -y
# pip install --upgrade pip
# pip install --upgrade setuptools
# pip install pyopencl
Successfully installed pyopencl-2018.1.1
>>> import numpy as np
>>> import pyopencl as cl
>>> from pyopencl import array as clarray
>>> from pyopencl import algorithm as clalg
>>> ctx = cl.create_some_context(0)
>>> queue = cl.CommandQueue(ctx)
>>> R = np.random.randint(0, 99, 100000000).astype(np.int8)
>>> a = clarray.to_device(queue, R)
>>> b = clalg.copy_if(a, 'ary[i] >= 55')
>>> print b

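For comparison, the same ">= 55" filter on the CPU side is a one-line NumPy boolean mask. The sketch below is illustrative only and not part of the original measurement; it reuses the array R from the listing above.

import time
import numpy as np

# Same input as the pyopencl listing: 100M random int8 values in [0, 99)
R = np.random.randint(0, 99, 100000000).astype(np.int8)

start = time.time()
b_cpu = R[R >= 55]          # NumPy boolean-mask equivalent of copy_if(a, 'ary[i] >= 55')
print(time.time() - start)  # wall-clock seconds for the CPU-side filter
print(b_cpu[:10])
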
• 9. How to burn your GPU with CUDA9.1
# uname -sr; cat /etc/lsb-release
Linux 4.13.0-21-generic
DISTRIB_DESCRIPTION="Ubuntu 17.10"
# vi /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
# sync; sync; reboot
# apt install g++ freeglut3-dev build-essential libx11-dev libxmu-dev
# apt install libxi-dev libglu1-mesa libglu1-mesa-dev gcc-6 g++-6
Download CUDA9.1 from https://developer.nvidia.com/cuda-toolkit
# bash cuda_9.1.85_387.26_linux.run --silent --toolkit --override --no-opengl-libs --driver
# ln -s /usr/bin/gcc-6 /usr/local/cuda/bin/gcc
# ln -s /usr/bin/g++-6 /usr/local/cuda/bin/g++
# vi ~/.bashrc
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDA_HOME=/usr/local/cuda
# source ~/.bashrc
# git clone https://github.com/wilicc/gpu-burn.git
# cd gpu-burn/
# vi Makefile
NVCC=/usr/local/cuda/bin/nvcc
# make
# ./gpu_burn 1000
# watch -n 1 nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 387.26                 Driver Version: 387.26                    |
|-------------------------------+----------------------+----------------------|
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1050    Off  | 00000000:65:00.0 Off |                  N/A |
|  37%   72C    P0   N/A /  65W |  1793MiB /  1997MiB  |    100%      Default |
+-------------------------------+----------------------+----------------------+

• 10. How to burn your GPU with CUDA9.1 (MapD Community Edition 3.4.0)
# apt install -y curl apt-transport-https
# useradd -U mapd
# ufw disable; ufw enable; ufw allow 9092/tcp; ufw allow 22/tcp
# curl https://releases.mapd.com/ce/mapd-ce-cuda.list | sudo tee /etc/apt/sources.list.d/mapd.list
# curl https://releases.mapd.com/GPG-KEY-mapd | sudo apt-key add -
# apt update
# apt install -y mapd
# vi ~/.bashrc
export MAPD_USER=mapd
export MAPD_GROUP=mapd
export MAPD_STORAGE=/var/lib/mapd
export MAPD_PATH=/opt/mapd
# source ~/.bashrc
# mkdir -p $MAPD_STORAGE
# chown -R $MAPD_USER $MAPD_STORAGE
# cd $MAPD_PATH/systemd
# ./install_mapd_systemd.sh
# cd $MAPD_PATH
# systemctl start mapd_server; systemctl enable mapd_server
# systemctl start mapd_web_server; systemctl enable mapd_web_server
# $MAPD_PATH/insert_sample_data
2) Flights (2008) 10k
2
# $MAPD_PATH/bin/mapdql -t
Password: HyperInteractive
mapdql> SELECT origin_city AS "Origin", dest_city AS "Destination", AVG(airtime) AS "Average Airtime" FROM flights_2008_10k WHERE distance <= 33 GROUP BY origin_city, dest_city;
Execution time: 1268 ms, Total time: 1269 ms
nvidia-smi (excerpt, truncated on the slide):
| NVIDIA-SMI 387.26   Driver Version: 387.26 |
| 0  GeForce GTX 1050  Off | 00000000:65:00.0 Off |
| 29%  27C  P0  N/A / 65W  | 1449MiB / 1997MiB    |
| 0  5828  C  /opt/mapd/bin/mapd_server
Origin|Destination|Average Airtime
West Palm Beach|Tampa|33.81818181818182
Norfolk|Baltimore|36.07142857142857
Ft. Myers|Orlando|28.66666666666667
Indianapolis|Chicago|39.53846153846154
Tampa|West Palm Beach|33.25
Orlando|Ft. Myers|32.58333333333334
Austin|Houston|33.05555555555556
Chicago|Indianapolis|32.7
Baltimore|Norfolk|31.71428571428572
Houston|Austin|29.61111111111111
SOURCE: https://www.mapd.com/platform/download-community/

• 11. ROCm with dGPU(AMD GPU) (memo)
# uname -sr; cat /etc/lsb-release
Linux 4.4.0-87-generic
DISTRIB_DESCRIPTION="Ubuntu 16.04.3 LTS"
# lscpu
Model name: Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz
# lspci | grep VGA
65:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 67ef (rev cf) / *[Radeon RX 460]
ROCm Platform Supports Two Graphics Core Next (GCN) GPU Generations
GFX8: Radeon RX 480, Radeon RX 470, Radeon RX 460, R9 Nano, Radeon R9 Fury, Radeon R9 Fury X, Radeon Pro WX7100, FirePro S9300 x2,
Radeon Vega Frontier Edition, Radeon Instinct: MI6, MI8, and MI25 (https://rocm.github.io/hardware.html)
# apt update
# apt dist-upgrade
# apt-get install -y libnuma-dev
# sync; sync; sync; reboot
# wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo apt-key add -
# sh -c 'echo deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main > /etc/apt/sources.list.d/rocm.list'
# apt-get install -y rocm-dkms
# usermod -a -G video $LOGNAME
# sync; sync; sync; reboot
# /opt/rocm/opencl/bin/x86_64/clinfo
Platform Version: OpenCL 2.1 AMD-APP.internal (2545.0)
Platform Name: AMD Accelerated Parallel Processing
# wget https://raw.githubusercontent.com/bgaster/opencl-book-samples/master/src/Chapter_2/HelloWorld/HelloWorld.cpp
# wget https://raw.githubusercontent.com/bgaster/opencl-book-samples/master/src/Chapter_2/HelloWorld/HelloWorld.cl
# g++ -I /opt/rocm/opencl/include/ ./HelloWorld.cpp -o HelloWorld -L/opt/rocm/opencl/lib/x86_64 -lOpenCL
# ./HelloWorld
0 3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60 63 66 69 72 75 78 81 84 87 90 93 96 99 102 105 108 111 114 117 120 ... 2985 2988 2991 2994 2997
Executed program succesfully.

• 12. AMDGPU ROCm Tensorflow 1.8 install memo (not supported on Ubuntu 18.04)
# uname -sr; tail -2 /etc/lsb-release
Linux 4.4.0-131-generic
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.5 LTS"
# lspci
17:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 67ef (rev cf)
# apt update
# apt dist-upgrade
# apt install -y libnuma-dev wget python3-pip
# sync; sync; sync; reboot
# wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | apt-key add -
# vi /etc/apt/sources.list.d/rocm.list
deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main
# apt update
# apt install -y rocm-dkms
# usermod -a -G video $LOGNAME
# sync; sync; sync; reboot
# apt install -y rocm-libs miopen-hip cxlactivitylogger
# sync; sync; sync; reboot
# wget http://repo.radeon.com/rocm/misc/tensorflow/tensorflow-1.8.0-cp35-cp35m-manylinux1_x86_64.whl
# pip3 install ./tensorflow-1.8.0-cp35-cp35m-manylinux1_x86_64.whl
# git clone https://github.com/tensorflow/models.git
# python3 classify_image.py
# cd ; git clone https://github.com/tensorflow/tensorflow.git
# cd tensorflow/
# python3 tensorflow/examples/speech_commands/train.py
# watch -n 1 /opt/rocm/bin/rocm-smi
==================== ROCm System Management Interface ====================
GPU  Temp  AvgPwr  SCLK     MCLK    Fan   Perf  SCLK OD  MCLK OD
0    35c   21.82W  1210Mhz  300Mhz  0.0%  auto  0%       0%
==================== End of ROCm SMI Log ====================
2018-09-02 10:40:10.368117: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1451] Found device 0 with properties:
name: Device 67ef
AMDGPU ISA: gfx803
memoryClockRate (GHz) 1.21
pciBusID 0000:17:00.0
Total memory: 2.00GiB
Free memory: 1.75GiB
Adding visible gpu devices: 0
Device interconnect
Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1567 MB memory) -> physical GPU (device: 0, name: Device 67ef, pci bus id: 0000:17:00.0)

• 13. AMDGPU ROCm Tensorflow 1.8 (classify_image.py)
# wget http://repo.radeon.com/rocm/misc/tensorflow/tensorflow-1.8.0-cp35-cp35m-manylinux1_x86_64.whl
# pip3 install ./tensorflow-1.8.0-cp35-cp35m-manylinux1_x86_64.whl
# git clone https://github.com/tensorflow/models.git
# python3 classify_image.py
2018-09-02 10:40:10.368117: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1451] Found device 0 with properties:
name: Device 67ef
AMDGPU ISA: gfx803
memoryClockRate (GHz) 1.21
pciBusID 0000:17:00.0
Total memory: 2.00GiB
Free memory: 1.75GiB
2018-09-02 10:40:10.368135: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1562] Adding visible gpu devices: 0
2018-09-02 10:40:10.368153: I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-02 10:40:10.368162: I tensorflow/core/common_runtime/gpu/gpu_device.cc:995] 0
2018-09-02 10:40:10.368175: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1008] 0: N
2018-09-02 10:40:10.368207: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1124] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1567 MB memory) -> physical GPU (device: 0, name: Device 67ef, pci bus id: 0000:17:00.0)
/opt/rocm/miopen/share/miopen/db/gfx803_14.cd.pdb.txt
giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca (score = 0.89107)
indri, indris, Indri indri, Indri brevicaudatus (score = 0.00779)
lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens (score = 0.00296)
custard apple (score = 0.00147)
earthstar (score = 0.00117)
#

• 14. AMDGPU ROCm Tensorflow 1.8 (speech_commands/train.py)
# git clone https://github.com/tensorflow/tensorflow.git
# cd tensorflow/
# python3 tensorflow/examples/speech_commands/train.py
2018-09-02 10:43:36.924800: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
AMDGPU ISA: gfx803
memoryClockRate (GHz) 1.21
pciBusID 0000:17:00.0
Total memory: 2.00GiB
Free memory: 1.75GiB
:
INFO:tensorflow:Step #1: rate 0.001000, accuracy 9.0%, cross entropy 2.724346
INFO:tensorflow:Step #2: rate 0.001000, accuracy 9.0%, cross entropy 2.521507
:
INFO:tensorflow:Saving to "/tmp/speech_commands_train/conv.ckpt-4300"
INFO:tensorflow:Step #4301: rate 0.001000, accuracy 65.0%, cross entropy 1.094288
INFO:tensorflow:Step #4302: rate 0.001000, accuracy 69.0%, cross entropy 0.876309
:
# /opt/rocm/bin/rocm-smi
GPU  Temp  AvgPwr   SCLK     MCLK     Fan   Perf  SCLK OD  MCLK OD
0    52c   44.230W  1172Mhz  1750Mhz  0.0%  auto  0%       0%
# top
top - 10:58:10 up 25 min, 2 users, load average: 1.51, 1.29, 0.89
Tasks: 222 total, 2 running, 220 sleeping, 0 stopped, 0 zombie
%Cpu0  : 6.2 us, 1.7 sy, 0.0 ni, 92.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1  : 5.6 us, 2.8 sy, 0.0 ni, 91.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2  : 8.3 us, 3.1 sy, 0.0 ni, 88.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3  : 6.4 us, 2.7 sy, 0.0 ni, 90.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4  : 9.8 us, 3.7 sy, 0.0 ni, 86.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5  : 8.4 us, 3.0 sy, 0.0 ni, 88.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6  : 5.4 us, 2.3 sy, 0.0 ni, 92.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7  : 3.4 us, 2.0 sy, 0.0 ni, 94.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu8  : 3.4 us, 1.7 sy, 0.0 ni, 94.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu9  : 3.7 us, 1.7 sy, 0.0 ni, 94.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu10 : 6.0 us, 2.7 sy, 0.0 ni, 91.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu11 : 4.4 us, 2.0 sy, 0.0 ni, 93.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st

• 16. In-Memory Computing for FASTDATA using fio with RAMDISK(DDR4)
# uname -sr; cat /etc/lsb-release
Linux 4.13.0-21-generic
DISTRIB_DESCRIPTION="Ubuntu 17.10"
# lshw -c cpu
product: Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz
# lshw -class memory
description: DIMM DDR4 Synchronous 2666 MHz (0.4 ns)
# mkdir /ramdisk
# mount -t tmpfs tmpfs /ramdisk
# fio -directory=/ramdisk -rw=read -bs=* -size=1G -numjobs=16 -runtime=10 -group_reporting -name=data
[Chart] 64GB RAMDISK with Core i7-7800X overclocked to 5GHz, x-axis = fio block size in bytes: measured 19.9M, 18.6M, 16.3M, 12.6M, 7.8M, 4.6M, 2.4M and 1.2M IOPS as the block size increases.

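The -bs=* above stands for a sweep over block sizes. A minimal way to script such a sweep is sketched below; the block-size list is an assumption, since the original sizes are not stated on the slide.

import subprocess

# Illustrative sweep of fio block sizes against the tmpfs mount created above.
for bs in ["512", "1k", "4k", "16k", "64k", "256k", "1m"]:
    subprocess.run(
        ["fio", "-directory=/ramdisk", "-rw=read", "-bs=" + bs,
         "-size=1G", "-numjobs=16", "-runtime=10",
         "-group_reporting", "-name=data"],
        check=True,
    )
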
• 17. How To Configure NVMe over Fabrics using MLNX_OFED <DRAFT>
NVMe Target Configuration
# ./mlnxofedinstall --add-kernel-support --with-nvmf
# modprobe mlx5_core
# modprobe nvmet
# modprobe nvmet-rdma
# modprobe nvme-rdma
# mkdir /sys/kernel/config/nvmet/subsystems/nvme-subsystem-name
# cd /sys/kernel/config/nvmet/subsystems/nvme-subsystem-name
# echo 1 > attr_allow_any_host
# mkdir namespaces/10
# cd namespaces/10
# echo -n /dev/nvme0n1 > device_path
# echo 1 > enable
# mkdir /sys/kernel/config/nvmet/ports/1
# cd /sys/kernel/config/nvmet/ports/1
# ip addr add 1.1.1.1/24 dev enp2s0f0
# echo 1.1.1.1 > addr_traddr
# echo rdma > addr_trtype
# echo 4420 > addr_trsvcid
# echo ipv4 > addr_adrfam
# ln -s /sys/kernel/config/nvmet/subsystems/nvme-subsystem-name /sys/kernel/config/nvmet/ports/1/subsystems/nvme-subsystem-name
NVMe Client (Initiator) Configuration
# ./mlnxofedinstall --add-kernel-support --with-nvmf
# modprobe mlx5_core
# modprobe nvme-rdma
# git clone https://github.com/linux-nvme/nvme-cli.git
# cd nvme-cli
# make
# make install
# nvme discover -t rdma -a 1.1.1.1 -s 4420
# nvme connect -t rdma -n nvme-subsystem-name -a 1.1.1.1 -s 4420
# nvme disconnect -d /dev/nvme0n1

• 18. Intel SPDK (Storage Performance Development Kit) benchmark
# uname -sr
Linux 4.10.0-40-generic
# apt-get install libnuma-dev git uuid-dev libaio-dev libcunit1-dev libcunit1 libssl-dev g++ -y
# cd /opt/; git clone https://github.com/axboe/fio
# cd fio; git checkout -b fio-2.21
# make; make install
# cd /opt/; git clone https://github.com/spdk/spdk
# cd spdk; git submodule update --init
# ./configure --with-fio=/opt/fio/
# make
# /opt/spdk/scripts/setup.sh
# fio --name=nvme --numjobs=8 --filename="trtype=PCIe traddr=0000.01.00.0 ns=1" --bs=4K --iodepth=4 --ioengine=/opt/spdk/examples/nvme/fio_plugin/fio_plugin --group_reporting --size=50% --runtime=100 --thread=8 --rw=read
nvme: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=spdk, iodepth=4
...
fio-3.2-19-g609ac1
Starting 8 threads
Starting DPDK 17.11.0 initialization...
[ DPDK EAL parameters: fio -c 0x1 -m 512 --file-prefix=spdk_pid18356 ]
EAL: Detected 8 lcore(s)
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: PCI device 0000:01:00.0 on NUMA socket 0
EAL: probe driver: 8086:2700 spdk_nvme
nvme: (groupid=0, jobs=8): err= 0: pid=18367: Mon Nov 27 15:36:06 2017
  read: IOPS=572k, BW=2236MiB/s (2345MB/s)(218GiB/100001msec)
  slat (nsec): min=91, max=471828, avg=200.94, stdev=122.85
  clat (usec): min=9, max=13319, avg=55.44, stdev= 7.84
  lat (usec): min=14, max=13319, avg=55.64, stdev= 7.84
  clat percentiles (usec):
   |  1.00th=[  48],  5.00th=[  50], 10.00th=[  50], 20.00th=[  51],
   | 30.00th=[  52], 40.00th=[  53], 50.00th=[  53], 60.00th=[  54],
   | 70.00th=[  56], 80.00th=[  60], 90.00th=[  64], 95.00th=[  67],
   | 99.00th=[  88], 99.50th=[  91], 99.90th=[ 100], 99.95th=[ 111],
   | 99.99th=[ 121]
  bw ( KiB/s): min=242664, max=310392, per=12.50%, avg=286296.77, stdev=11653.87, samples=1592
  iops       : min=60666, max=77598, avg=71574.18, stdev=2913.46, samples=1592
  lat (usec) : 10=0.01%, 20=0.01%, 50=9.44%, 100=90.46%, 250=0.09%
  lat (usec) : 500=0.01%, 750=0.01%
  lat (msec) : 2=0.01%, 20=0.01%

• 19. In-Memory Database Registration Performance Check (Intel vs AMD)
Purley# uname -sr; cat /etc/redhat-release
Linux 3.10.0-514.el7.x86_64
CentOS Linux release 7.3.1611 (Core)
Purley# grep proc /proc/cpuinfo | wc -l
48
Purley# lscpu
Model name: Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz
RYZEN# uname -sr; cat /etc/debian_version
Linux 4.10.0-19-generic
stretch/sid
RYZEN# grep proc /proc/cpuinfo | wc -l
16
RYZEN# lscpu
Model name: AMD Ryzen 7 1800X Eight-Core Processor
redis shows a drop in per-process throughput as the stored data size grows.

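One way to reproduce that per-value-size trend is sketched below. It assumes the redis-py client and a redis-server on localhost, and the value sizes and request count are illustrative; the original measurement tool is not shown on this slide.

import time
import redis  # redis-py client (assumed to be installed)

r = redis.Redis(host="localhost", port=6379)

# Measure GET throughput for a few value sizes and watch how it degrades.
for size in (2, 64, 1024, 16384):
    r.set("k", b"x" * size)
    n = 100000
    start = time.time()
    for _ in range(n):
        r.get("k")
    elapsed = time.time() - start
    print("value %6d bytes: %.0f GET/s" % (size, n / elapsed))
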
• 20. In-Memory Database Performance Check
(Compared platforms: Intel Purley / AMD Ryzen / Xeon Phi(KNL))
# uname -sr; cat /etc/redhat-release
Linux 3.10.0-514.el7.x86_64
CentOS Linux release 7.3.1611 (Core)
# grep proc /proc/cpuinfo | wc -l
48
# lscpu
Model name: Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz

• 21. ALL FLASH DATACENTER & IN-MEMORY COMPUTING: HOT TOPICS
SOURCE: SAKURA Internet Research Center. (2017/10), Project Sprig.

• 22. ClickHouse column-oriented database install memo
# uname -sr; cat /etc/issue
Linux 4.10.0-35-generic
Ubuntu 17.04
# apt install software-properties-common
# apt-key adv --keyserver keyserver.ubuntu.com --recv E0C56BD4
# apt-add-repository "deb http://repo.yandex.ru/clickhouse/trusty stable main"
# apt-get update
# apt-get install clickhouse-server-common clickhouse-client -y
# service clickhouse-server start
# clickhouse-client --multiline
ClickHouse client version 1.1.54304.
Connecting to localhost:9000.
Connected to ClickHouse server version 1.1.54304.
:) CREATE TABLE ontime ( Year UInt16, Quarter UInt8, Month UInt8, : Div5TailNum String ) ENGINE = MergeTree(FlightDate, (Year, FlightDate), 8192);
or
# xz -v -c -d < ontime.csv.xz | clickhouse-client --query="INSERT INTO ontime FORMAT CSV"

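After loading, a quick sanity check can be run through clickhouse-client. The query below is an illustrative example only; the column names follow the CREATE TABLE excerpt above.

import subprocess

# Count the loaded rows per year via clickhouse-client (illustrative query).
query = "SELECT Year, count() FROM ontime GROUP BY Year ORDER BY Year"
print(subprocess.check_output(["clickhouse-client", "--query", query]).decode())
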
• 23. MariaDB ColumnStore column-oriented database install memo
# uname -sr; cat /etc/redhat-release
Linux 3.10.0-514.el7.x86_64
Red Hat Enterprise Linux Server release 7.4 (Maipo)
# mkdir mcs; cd mcs
# wget https://downloads.mariadb.com/ColumnStore/1.0.11/centos/x86_64/7/mariadb-columnstore-1.0.11-1-centos7.x86_64.rpm.tar.gz
# tar xzvf ./mariadb-columnstore-1.0.11-1-centos7.x86_64.rpm.tar.gz
# yum install boost boost-devel boost-doc expect perl-DBD-MySQL -y
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-common.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-client.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-server.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-libs.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-shared.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-gssapi-client.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-gssapi-server.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-platform.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-storage-engine.rpm -vh
# /usr/local/mariadb/columnstore/bin/postConfigure
Select the type of System Server install [1=single, 2=multi] (2) > 1
Enter System Name (columnstore-1) > sprig-1
Select the type of Data Storage [1=internal, 2=external, 3=GlusterFS] (1) > 1
Enter the list (Nx,Ny,Nz) or range (Nx-Nz) of DBRoot IDs assigned to module 'pm1' (1) > 1
# . /usr/local/mariadb/columnstore/bin/columnstoreAlias
# mcsadmin
MariaDB ColumnStore Admin Console
enter 'help' for list of commands
enter 'exit' to exit the MariaDB ColumnStore Command Console
use up/down arrows to recall commands
mcsadmin>

• 25. Quagga with ROUTE_MULTIPATH (memo)
# uname -sr; cat /etc/lsb-release
Linux 4.13.0-21-generic
DISTRIB_DESCRIPTION="Ubuntu 17.10"
# grep ROUTE_MULTIPATH /usr/src/*/.config
CONFIG_IP_ROUTE_MULTIPATH=y
# apt-get install -y quagga traceroute
# vi /etc/sysctl.conf
net.ipv4.conf.all.forwarding=1
net.ipv4.fib_multipath_hash_policy = 1
net.ipv4.conf.all.arp_announce = 2
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.default.arp_filter = 1
net.ipv6.conf.all.forwarding=1
net.ipv6.route.max_size = 32768
net.ipv6.xfrm6_gc_thresh = 32768
# touch /etc/quagga/zebra.conf
# touch /etc/quagga/ospfd.conf
# touch /etc/quagga/ospf6d.conf
# chown quagga.quaggavty /etc/quagga/*.conf
# chmod 640 /etc/quagga/*.conf
# ufw disable
# vi /etc/quagga/daemons
zebra=yes
ospfd=yes
ospf6d=yes
# echo VTYSH_PAGER=more >> /etc/environment
# sync; sync; sync; reboot
# vtysh

• 26. My First XDP (eXpress Data Path)
# uname -sr; cat /etc/lsb-release
Linux 4.13.0-21-generic
DISTRIB_DESCRIPTION="Ubuntu 17.10"
# apt install -y make gcc libssl-dev bc libelf-dev libcap-dev clang
# apt install -y gcc-multilib llvm libncurses5-dev git bison flex pkg-config
# apt install -y libmnl0 libmnl-dev clang libasm1 libasm-dev
# mkdir /usr/local/include/asm
# ln -s /usr/include/x86_64-linux-gnu/asm/* /usr/local/include/asm
# git clone git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git
# cd iproute2/
# ./configure --prefix=/sbin
# make; make install
# vi xdp_example.c
#include <linux/bpf.h>
#ifndef __section
# define __section(NAME) __attribute__((section(NAME), used))
#endif
__section("prog")
int xdp_drop(struct xdp_md *ctx)
{
    return XDP_DROP;
}
char __license[] __section("license") = "GPL";
# clang -O2 -Wall -target bpf -c xdp_example.c -o xdp_example.o
# ip link set dev eth0 xdp obj xdp_example.o
# ip link set dev eth0 xdp off
SOURCE: https://github.com/torvalds/linux/tree/master/samples/bpf, http://cilium.readthedocs.io/en/latest/bpf/#llvm, http://vger.kernel.org/netconf2017_files/XDP_devel_update_NetConf2017_Seoul.pdf, http://prototype-kernel.readthedocs.io/en/latest/blogposts/xdp25_eval_generic_xdp_tx.html, https://netdevconf.org/1.2/slides/oct7/10_nic_viljoen_eBPF_Offload_to_Hardware__cls_bpf_and_XDP_finalised.pdf, https://people.netfilter.org/hawk/presentations/NetDev2.2_2017/XDP_for_the_Rest_of_Us_Part_2.pdf

• 27. My First F-Stack
# lscpu
Model name: AMD Ryzen Threadripper 1900X 8-Core Processor
# uname -sr; cat /etc/lsb-release
Linux 4.10.0-35-generic
DISTRIB_CODENAME=zesty
DISTRIB_DESCRIPTION="Ubuntu 17.04"
# cd /opt
# git clone https://github.com/F-Stack/f-stack.git
# /opt/f-stack/dpdk/tools/dpdk-setup.sh
[15] x86_64-native-linuxapp-gcc
Option: 15
# echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
# mkdir /mnt/huge
# mount -t hugetlbfs nodev /mnt/huge
# echo 0 > /proc/sys/kernel/randomize_va_space
# modprobe uio
# insmod /opt/f-stack/dpdk/x86_64-native-linuxapp-gcc/kmod/igb_uio.ko
# insmod /opt/f-stack/dpdk/x86_64-native-linuxapp-gcc/kmod/rte_kni.ko
# export FF_PATH=/opt/f-stack/
# export FF_DPDK=/opt/f-stack/dpdk/x86_64-native-linuxapp-gcc/
# cd /root/f-stack/lib
# make; make; make; make install
# cd /opt/f-stack/app/nginx-1.11.10
# ./configure --prefix=/usr/local/nginx_fstack --with-ff_module --without-http_rewrite_module
# make
# make install
# grep f-stack /usr/local/nginx_fstack/conf/nginx.conf
fstack_conf f-stack.conf;
# grep addr /usr/local/nginx_fstack/conf/f-stack.conf
addr=192.168.1.2
Copyright © 2018. Tencent Cloud All rights reserved.

• 28. My First FD.io VPP (Segment Routing for IPv6 / L3VPN for IPv4 traffic)
# uname -sr; cat /etc/lsb-release
Linux 4.13.0-21-generic
DISTRIB_DESCRIPTION="Ubuntu 17.10"
# vi /etc/apt/sources.list.d/99fd.io.list
deb [trusted=yes] https://nexus.fd.io/content/repositories/fd.io.ubuntu.xenial.main/ ./
# apt-get update
# apt-get install -y vpp-lib vpp vpp-plugins
# service vpp start
# service vpp status
● vpp.service - vector packet processing engine
  Loaded: loaded (/lib/systemd/system/vpp.service; enabled; vendor preset: enabled)
  Active: active (running) since Tue 2018-02-13 09:30:25 JST; 21s ago
  :
  CGroup: /system.slice/vpp.service
          └─2011 /usr/bin/vpp -c /etc/vpp/startup.conf
# vppctl
vpp# set sr encaps source addr C1::
vpp# sr policy add bsid C1::999:2 next C2:: next C4::4 encap
vpp# sr steer l3 1.1.1.0/24 via sr policy bsid C1::999:2
:
vpp# sr localsid address C4::4 behavior end.dx4 GigabitEthernet0/6/0 1.1.1.1
vpp# show sr localsid
SRv6 - My LocalSID Table:
=========================
Address: c4::4
Behavior: DX4 (Endpoint with decapsulation and IPv4 cross-connect)
Iface: GigabitEthernet0/6/0
Next hop: 1.1.1.1
SOURCE: VPP/Segment Routing for IPv6 (https://wiki.fd.io/view/VPP/Segment_Routing_for_IPv6)
© 2017 FD.io is a Linux Foundation Project. All Rights Reserved.

• 29. FD.io VPP with XeonPhi (Basic Configuration)
# uname -sr; cat /etc/lsb-release
Linux 4.13.0-21-generic
DISTRIB_DESCRIPTION="Ubuntu 17.10"
# lscpu
CPU(s): 256
Model name: Intel(R) Xeon Phi(TM) CPU 7210 @ 1.30GHz
# vi /etc/apt/sources.list.d/99fd.io.list
deb [trusted=yes] https://nexus.fd.io/content/repositories/fd.io.ubuntu.xenial.main/ ./
# apt-get update
# apt install vpp vpp-lib vpp-plugins python-pip
# pip install vpp-config
# vpp-config
5) Execute some basic tests.
Command: 5
1) List/Create Simple IPv4 Setup
Command: 1
Would you like to keep this configuration [Y/n]? n
Would you like add address to interface GigabitEthernet4/0/1 [Y/n]? Y
Please enter the IPv4 Address [n.n.n.n/n]: 1.1.1.11/24
# vi /etc/vpp/startup.conf
unix {
  nodaemon
  log /var/log/vpp/vpp.log
  full-coredump
  cli-listen /run/vpp/cli.sock
  exec /usr/local/vpp/vpp-config/scripts/set_int_ipv4_and_up
}
# sync; sync; sync; reboot
# vppctl
# show int
Name                  Idx  State
GigabitEthernet4/0/1  1    up
# show int addr
GigabitEthernet4/0/1 (up):
  1.1.1.11/24
© 2017 FD.io is a Linux Foundation Project. All Rights Reserved.

• 30. FD.io VPP with XeonPhi (Load Balancer plugin)
# vppctl
# show int addr
GigabitEthernet4/0/1 (up):
  1.1.1.11/24
# lb conf ip4-src-address 1.1.1.11 timeout 3
# lb vip 1.2.3.4/32 encap gre4 new_len 1024
# lb as 1.2.3.4/32 1.1.1.8 1.1.1.9 1.1.1.10
# show lb vips
1.2.3.4 ip4-gre4 1.2.3.4/32 new_size:1024 #as:3
Application Server (1.1.1.8, 1.1.1.9, 1.1.1.10) side configuration:
# ip tunnel add tun0 mode gre local 1.1.1.8 remote 1.1.1.11 ttl 255
# ifconfig tun0 1.2.3.4/32 up
# echo 1 > /proc/sys/net/ipv4/conf/tun0/arp_ignore
# echo 2 > /proc/sys/net/ipv4/conf/tun0/arp_announce
# echo 0 > /proc/sys/net/ipv4/conf/tun0/rp_filter
# echo 0 > /proc/sys/net/ipv4/conf/all/rp_filter
# echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore
# echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce
(Diagram: FD.io VPP at 1.1.1.11 forwards the VIP 1.2.3.4/32 to the application servers over GRE tunnels; each server terminates 1.2.3.4/32 on tun0 and answers clients directly, i.e. Direct Server Response (DSR).)
© 2017 FD.io is a Linux Foundation Project. All Rights Reserved.

• 31. Quagga with ROUTE_MULTIPATH for BGP load balancing (memo)
# uname -sr; cat /etc/lsb-release
Linux 4.13.0-36-generic
DISTRIB_DESCRIPTION="Ubuntu 17.10"
# grep ROUTE_MULTIPATH /usr/src/*/.config
/usr/src/linux-headers-4.13.0-36-generic/.config:CONFIG_IP_ROUTE_MULTIPATH=y
# apt install -y quagga traceroute
# touch /etc/quagga/zebra.conf; touch /etc/quagga/bgpd.conf; chown quagga.quaggavty /etc/quagga/*.conf
# chmod 640 /etc/quagga/*.conf
# ufw disable; echo VTYSH_PAGER=more >> /etc/environment
# vi /etc/quagga/daemons
zebra=yes
bgpd=yes
# sync; sync; sync; reboot
# vtysh
Router AS 65001:
# router bgp 65001
# bgp router-id 1.1.1.1
# bgp bestpath as-path multipath-relax
# bgp bestpath compare-routerid
# redistribute connected
# neighbor 1.1.1.2 remote-as 65002
# neighbor 1.1.1.3 remote-as 65003
# maximum-paths 64
Router AS 65002:
# interface lo
# ip address 1.2.3.4/24
# router bgp 65002
# bgp router-id 1.1.1.2
# bgp bestpath as-path multipath-relax
# bgp bestpath compare-routerid
# redistribute connected
# neighbor 1.1.1.1 remote-as 65001
# maximum-paths 64
Router AS 65003:
# interface lo
# ip address 1.2.3.4/24
# router bgp 65003
# bgp router-id 1.1.1.3
# bgp bestpath as-path multipath-relax
# bgp bestpath compare-routerid
# redistribute connected
# neighbor 1.1.1.1 remote-as 65001
# maximum-paths 64
# show ip bgp
BGP table version is 0, local router ID is 1.1.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
   Network        Next Hop   Metric  LocPrf  Weight  Path
*> 1.2.3.0/24     1.1.1.2    0               0       65002
*=                1.1.1.3    0               0       65003

• 32. FD.io VPP tap-inject with sample_plugins
# uname -sr; cat /etc/lsb-release
Linux 4.13.0-37-generic
DISTRIB_DESCRIPTION="Ubuntu 17.10"
# echo VTYSH_PAGER=more >> /etc/environment
# apt install -y quagga
# touch /etc/quagga/zebra.conf
# touch /etc/quagga/bgpd.conf
# chown quagga.quaggavty /etc/quagga/*.conf
# chmod 640 /etc/quagga/*.conf
# ufw disable
# vi /etc/quagga/daemons
zebra=yes
bgpd=yes
# sync; sync; sync; reboot
# apt install build-essential -y
# cd /opt/
# git clone https://gerrit.fd.io/r/vpp
# git clone https://gerrit.fd.io/r/vppsb
# cd /opt/vpp
# ./extras/vagrant/build.sh
# make install-dep; make bootstrap; make build
# vi /opt/vppsb/router/router/tap_inject_node.c
#include <sys/uio.h>
# ln -sf /opt/vppsb/netlink
# ln -sf /opt/vppsb/router
# ln -sf /opt/vppsb/netlink/netlink.mk build-data/packages/
# ln -sf /opt/vppsb/router/router.mk build-data/packages/
# cd build-root/
# make V=0 PLATFORM=vpp TAG=vpp_debug netlink-install router-install
# dpkg -i *.deb
# cp -p /opt/vpp/build-root/install-vpp_debug-native/router/lib64/router.so.0.0.0 /usr/lib/vpp_plugins/router.so
# service vpp restart
# vppctl enable tap-inject
# vppctl show tap-inject
GigabitEthernet13/0/0 -> vpp1
GigabitEthernetb/0/0 -> vpp0
# vtysh (quagga)
# configure terminal
(config)# interface vpp0
(config-if)# ip address 192.168.11.100/24
(config-if)# exit
(config)# exit
# write
# quit
# vppctl show int addr
GigabitEthernetb/0/0 (up):
  L3 192.168.11.100/24
  L3 fe80::20c:29ff:fe24:af28/64
# /opt/vpp/src/examples/sample-plugin
# libtoolize
# aclocal
# autoconf
# autoheader
# automake --add-missing
# chmod +x configure
# ./configure
# make
# make install
(Diagram: GigabitEthernetb/0/0 is exposed as vpp0; the vpp_plugins/router.so and vpp_plugins/sample_plugin.so plugins are loaded, and quagga configures vpp0.)
© 2017 FD.io is a Linux Foundation Project. All Rights Reserved.

• 33. FD.io VPP 18.07 with Ubuntu 16.04.5 LTS (not supported on Ubuntu 18.04)
# uname -sr; tail -2 /etc/lsb-release
Linux 4.4.0-131-generic
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.5 LTS"
# apt remove --purge vpp*
# vi /etc/apt/sources.list.d/99fd.io.list
deb [trusted=yes] https://nexus.fd.io/content/repositories/fd.io.stable.1807.ubuntu.xenial.main/ ./
# apt update
# apt dist-upgrade -y
# apt install -y vpp vpp-lib vpp-plugins vpp-dpdk-dkms
# vppctl show pci
Address       Sock  VID:PID    Link Speed   Driver           Product Name
0000:05:00.0  0     8086:1539  2.5 GT/s x1  uio_pci_generic
0000:65:00.0  0     8086:1584  8.0 GT/s x8  uio_pci_generic  XL710 40GbE Controller
# vi /etc/vpp/startup.conf
dpdk {
  dev 0000:65:00.0
}
# service vpp restart
# service vpp status
Active: active (running) since Tue 2018-09-04 18:50:02 JST; 2s ago
# vppctl set int ip address FortyGigabitEthernet65/0/0 1.2.3.4/24
# vppctl set int state FortyGigabitEthernet65/0/0 up
# vppctl show interface addr
FortyGigabitEthernet65/0/0 (up):
  L3 1.2.3.4/24
# vppctl show version
vpp v18.07-rc2~6-gdb6d6b3~b28 built by root on 10268b67c8b1 at Mon Jul 30 ...
# vi /etc/apt/sources.list
deb http://security.ubuntu.com/ubuntu bionic-security main
# apt update
# apt install libssl1.1 -y
Download from http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.19-rc2/
# dpkg -i linux-headers-4.19.0-041900rc2_...rc2.201809022230_all.deb
# dpkg -i linux-headers-4.19.0-041900rc2-...rc2.201809022230_amd64.deb
# dpkg -i linux-headers-4.19.0-041900rc2-...rc2.201809022230_amd64.deb
# dpkg -i linux-image-unsigned-4.19.0-......rc2.201809022230_amd64.deb
# sync; sync; sync; reboot
# uname -sr; tail -2 /etc/lsb-release
Linux 4.19.0-041900rc2-generic
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.5 LTS"
# vppctl show int
NOTICE: does not work with kernel 4.19-rc2
© 2018 The Fast Data Project. Copyright © 2018 FD.IO Project a Series of LF Projects, LLC

• 35. My First Intel/Movidius NCS
SOURCE: SAKURA Internet Research Center. (2017/07) Project Sprig.
$ sudo su
# apt-get update; apt-get upgrade -y
# mkdir /opt/mvncsdk; cd /opt/mvncsdk/
GoTo: https://developer.movidius.com/getting-started
# wget https://ncs-forum-uploads.s3.amazonaws.com/ncsdk/MvNC_SDK_01_07_07/MvNC_SDK_1.07.07.tgz
# tar zxvf MvNC_SDK_1.07.07.tgz; tar zxvf MvNC_Toolkit-1.07.06.tgz; tar xzvf ./MvNC_API-1.07.07.tgz
# ./bin/setup.sh; ./bin/data/dlnets.sh
# source ~/.bashrc
# cd /opt/mvncsdk/ncapi/; ./setup.sh; cd ./c_examples/; make
# ./ncs-fullcheck -l2 -c1 ../networks/AlexNet ../images/cat.jpg
Device 0 Address: 2 - VID/PID 03e7:2150
Starting wait for connect with 2000ms timeout
Found Address: 2 - VID/PID 03e7:2150
Found EP 0x81 : max packet size is 512 bytes
Found EP 0x01 : max packet size is 512 bytes
Found and opened device
Performing bulk write of 825136 bytes...
Successfully sent 825136 bytes of data in 35.764553 ms (22.002540 MB/s)
Boot successful, device address 2
Found Address: 2 - VID/PID 040e:f63b
done
Booted 2 -> VSC
OpenDevice 2 succeeded
Graph allocated
:
$ uname -sr
Linux 4.8.0-36-generic (Ubuntu 16.04.02)
$ lsusb -v
Device Descriptor:
  iProduct 2 Movidius MA2X5X
  MaxPower 500mA
© Copyright Movidius 2017. All Rights Reserved.

• 36. UP Board AI Core Configuration memo
# uname -sr; cat /etc/lsb-release
Linux 4.4.0-116-generic
DISTRIB_DESCRIPTION="Ubuntu 16.04.4 LTS"
# lshw
*-pci:1
  *-usb
    description: USB controller
    product: FL1100 USB 3.0 Host Controller
    *-usbhost:1
      *-usb UNCLAIMED
        description: Generic USB device
        product: Movidius MA2X5X
        vendor: Movidius Ltd.
# git clone -b ncsdk2 http://github.com/Movidius/ncsdk && cd ncsdk && make install
# export PYTHONPATH="${PYTHONPATH}:/opt/movidius/caffe/python"
# cd /examples/tensorflow/inception_v3
# cat run.py
image_filename = path_to_images + 'nps_electric_guitar.png'
devices = mvnc.enumerate_devices()
# python3 run.py
Number of categories: 1001
Start download to NCS...
*******************************************************************************
inception-v3 on NCS
*******************************************************************************
547 electric guitar 0.988281
403 acoustic guitar 0.00751877
715 pick, plectrum, plectron 0.0014801
421 banjo 0.000901222
820 stage 0.000654221
*******************************************************************************
Finished
Copyright 2018 Up Board | All Rights Reserved

• 37. USB 3.0 CAPTURE HDMI 4K with Loop-through for Image redistribution
# uname -sr; tail -1 /etc/redhat-release
Linux 3.10.0-862.9.1.el7.x86_64
CentOS Linux release 7.4.1708 (Core)
# yum install -y usbutils hwinfo mplayer v4l-utils ffmpeg git
# lsusb -t
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/10p, 5000M
   |__ Port 4: Dev 2, If 9, Class=Human Interface Device, Driver=usbhid, 5000M
# lsusb -vv
# hwinfo --usb
# v4l2-ctl --list-devices
USB Capture HDMI 4K+ (usb-0000:00:14.0-4):
  /dev/video0
# v4l2-ctl -d /dev/video0 --info
# v4l2-ctl --list-formats-ext -d /dev/video0
Type: Video Capture
Name: YUV 4:2:2 (YUYV)
Size: Discrete 4096x2160
Interval: Discrete 0.017s (60.000 fps)
# wget https://libav.org/releases/libav-12.3.tar.xz
# tar Jxvf ./libav-12.3.tar.xz; cd libav-12.3
# ./configure --disable-yasm; make; make install
# avconv -f video4linux2 -input_format nv12 -s 1920x1080 -i /dev/video0 -qscale 10 out.mpeg
Input #0, video4linux2, from '/dev/video0':
  Duration: N/A, start: 1240.062083, bitrate: 1492992 kb/s
  nv12, 1920x1080, 1492992 kb/s, 60 fps, 1000k tbn
# ffmpeg -f v4l2 -list_formats all -i /dev/video0
[video4linux2,v4l2 @ 0x24114c0] Raw : yuyv422 : YUV 4:2:2 (YUYV) : 640x360 640x480 720x480 720x576 768x576 800x600 856x480 960x540 1024x576 1024x768 1280x720 1280x800 1280x960 1280x1024 1368x768 1440x900 1600x1200 1680x1050 1920x1080 1920x1200 2048x1080 2560x1440 3840x2160 4096x2160
[video4linux2,v4l2 @ 0x24114c0] Raw : nv12 : YUV 4:2:0 (NV12) : 640x360 640x480 720x480 720x576 768x576 800x600 856x480 960x540 1024x576 1024x768 1280x720 1280x800 1280x960 1280x1024 1368x768 1440x900 1600x1200 1680x1050 1920x1080 1920x1200 2048x1080 2560x1440 3840x2160 4096x2160
(Diagram: the HDMI output of one server is read once (ORIGINAL) and redistributed to the following servers through HDMI loop-through, each stage producing a COPY with a USB 3.0 bus-powered HDMI capture device, e.g. to watch POWER ON/OS DOWN consoles.)
© 2018, Nanjing Magewell Electronics Co., Ltd

• 38. AMD Threadripper 1900X overview/spec
# uname -sr
Linux 4.10.0-19-generic
# vi /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="pci=noaer"
# update-grub; sync; sync; sync; reboot
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1
:
Model name: AMD Ryzen Threadripper 1900X 8-Core Processor
CPU MHz: 3800.000
CPU max MHz: 3800.0000
CPU min MHz: 2200.0000
BogoMIPS: 7585.39
Virtualization: AMD-V
L1d cache: 32K
L1i cache: 64K
L2 cache: 512K
L3 cache: 8192K
NUMA node0 CPU(s): 0-15
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx cpb hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic overflow_recov succor smca

• 39. AMD EPYC 7251 overview/spec
# uname -sr; cat /etc/redhat-release
Linux 3.10.0-693.5.2.el7.x86_64
CentOS Linux release 7.4.1708 (Core)
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 8
Vendor ID: AuthenticAMD
CPU family: 23
Model: 1
Model name: AMD EPYC 7251 8-Core Processor
Stepping: 2
CPU MHz: 1200.000
CPU max MHz: 2100.0000
CPU min MHz: 1200.0000
BogoMIPS: 4199.47
Virtualization: AMD-V
L1d cache: 32K
L1i cache: 64K
L2 cache: 512K
L3 cache: 4096K
NUMA node0 CPU(s): 0,1,16,17
NUMA node1 CPU(s): 2,3,18,19
NUMA node2 CPU(s): 4,5,20,21
NUMA node3 CPU(s): 6,7,22,23
NUMA node4 CPU(s): 8,9,24,25
NUMA node5 CPU(s): 10,11,26,27
NUMA node6 CPU(s): 12,13,28,29
NUMA node7 CPU(s): 14,15,30,31
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc art rep_good nopl nonstop_tsc extd_apicid amd_dcm aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 cpb hw_pstate avic fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold overflow_recov succor smca

• 40. Appendix: SSD DC P4800X and cost/performance analysis
(Cost) 512GB DDR4 DRAM (2666MHz/ECC): $6,399.68 / 192GB DDR4 DRAM (2666MHz/ECC): $2,399.88 / 750GB 3D XPoint/NVMe SSD (P4800X x2): $3,790.00
In-Memory Computing: 30GB (300M records) sort: 296 sec
# gensort -a 300000000 test
# time sort --parallel=52 -T /ramdisk test -o out
All Flash Computing: 200GB (2,000M records) sort: 4,648 sec
# gensort -a 2000000000 test
# time sort --parallel=52 -T /memdrv test -o out
All Flash vs In-Memory: Processing Size 6.7x, Processing Cost 1.5x, Processing Time 15x
512GB vs 192GB In-Memory: Processing Size 2.7x, Processing Cost 2.7x, Processing Time N/A
SOURCE: © 2016 Colfax International, © 2000-2017 Newegg Inc. / SAKURA Internet Research Center. (08/2017) Project Sprig.

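The multipliers can be cross-checked against the raw figures on this slide; the arithmetic sketch below simply recomputes them, and the pairing of numbers to labels follows the reading above.

# Cross-check of the cost/performance multipliers from the figures on this slide.
flash_size_gb, dram_size_gb = 200, 30            # sorted data set sizes
flash_cost, dram192_cost = 3790.00, 2399.88      # P4800X x2 vs 192GB DDR4
flash_time, dram_time = 4648, 296                # sort wall-clock seconds

print(round(flash_size_gb / dram_size_gb, 1))    # 6.7  -> "Processing Size 6.7x"
print(round(flash_cost / dram192_cost, 1))       # 1.6  -> roughly the "1.5x" cost figure
print(round(flash_time / dram_time, 1))          # 15.7 -> "Processing Time 15x"

dram512_cost = 6399.68
print(round(512 / 192, 1))                       # 2.7  -> "Processing Size 2.7x"
print(round(dram512_cost / dram192_cost, 1))     # 2.7  -> "Processing Cost 2.7x"
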
• 41. Appendix: stream_openmp performance check
[Chart] stream_openmp results compared across Xeon Phi, AMD RYZEN and two Xeon platforms.
Special Thanks: Takefumi Miyoshi

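stream_openmp reports sustained memory bandwidth. As a rough illustration of what the benchmark measures, here is a NumPy stand-in for the STREAM "Copy" kernel; this is an illustrative sketch, not the stream_openmp binary used for the chart.

import time
import numpy as np

# Rough stand-in for the STREAM "Copy" kernel: c[:] = a, reported in GB/s.
n = 100000000                       # ~0.8 GB per float64 array
a = np.random.rand(n)
c = np.empty_like(a)

start = time.time()
c[:] = a
elapsed = time.time() - start
moved_bytes = 2 * a.nbytes          # one read + one write per element
print("%.1f GB/s" % (moved_bytes / elapsed / 1e9))
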
• 42. Appendix: Network Application Benchmark result (iperf with 20 servers)

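Only the resulting chart appears on this slide. A hedged sketch of how such a run can be driven is shown below; the host list, 10-second duration and 4 parallel streams are assumptions, not the original test harness.

import subprocess

# Launch iperf clients against a list of servers in parallel and print each summary line.
servers = ["192.168.0.%d" % i for i in range(101, 121)]   # 20 hypothetical servers
procs = [subprocess.Popen(["iperf", "-c", host, "-t", "10", "-P", "4"],
                          stdout=subprocess.PIPE) for host in servers]
for host, p in zip(servers, procs):
    out, _ = p.communicate()
    print(host, out.decode().splitlines()[-1])   # last line carries the summary bandwidth
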
• 43. How to measure your dataflow using fio, pktgen and bandwidthTest
Measured on Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz hosts:
  RAMDISK (DDR4 2133MHz 16GB x4): WRITE 12,648MB/s, READ 13,793MB/s (bs=256KB)
  Mellanox ConnectX-4 40GbE: 40Mpps (pkt=64B) = 2,560MB/s; 40GbE max rate = 5,000MB/s
  GeForce GTX 1050: Host to Device 6,029MB/s, Device to Host 6,448MB/s
  Intel Optane 900P (3DXP): WRITE 2,000MB/s, READ 2,500MB/s (bs=4KB)
RAMDISK / fio:
# mount -t tmpfs -o size=32G tmpfs /ramdisk
# fio --directory=/ramdisk --rw=write --bs=4k --size=1G --numjobs=3 --runtime=100 --group_reporting --name=data
DPDK pktgen:
# cd /opt
# git clone git://dpdk.org/dpdk
# git clone git://dpdk.org/apps/pktgen-dpdk
export RTE_SDK=/opt/dpdk
export RTE_TARGET=x86_64-native-linuxapp-gcc
# sysctl vm.nr_hugepages=2048
# cd /opt/dpdk
# make install T=x86_64-native-linuxapp-gcc
# /opt/dpdk/usertools/dpdk-devbind.py -u 0b:00.0
# /opt/dpdk/usertools/dpdk-devbind.py -u 13:00.0
# /opt/dpdk/usertools/dpdk-devbind.py -b igb_uio 0b:00.0
# /opt/dpdk/usertools/dpdk-devbind.py -b igb_uio 13:00.0
# /opt/dpdk/usertools/dpdk-devbind.py --status
# cd /opt/pktgen-dpdk/
# make
# /opt/pktgen-dpdk/tools/setup.sh
# /opt/pktgen-dpdk/app/x86_64-native-linuxapp-gcc/pktgen -- -m "1.0, 2.1"
CUDA bandwidthTest:
# bash cuda_9.1.85_387.26_linux.run --silent --toolkit --override --no-opengl-libs --driver
:
# cd NVIDIA_CUDA-9.1_Samples/1_Utilities/bandwidthTest
# ./bandwidthTest

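The 40GbE figures above follow from simple unit conversions; the short check below just redoes the arithmetic with the numbers quoted on this slide.

# Unit-conversion check for the 40GbE figures quoted above.
pps = 40000000                    # 40 Mpps at 64-byte packets
pkt_bytes = 64
print(pps * pkt_bytes / 1e6)      # 2560.0 -> "2,560MB/s"

line_rate_bit = 40e9              # 40 Gbit/s
print(line_rate_bit / 8 / 1e6)    # 5000.0 -> "5,000MB/s (40GbE max rate)"
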