Performance Limits of Computing Systems and How to Think About Them
2018/11/29 SAKURA Internet Inc. / SAKURA Internet Research Center, Senior Researcher / Naoto Matsumoto
(C) Copyright 1996-2017 SAKURA Internet Inc
From the next-generation database research report
Data processing flow in servers/storage across a storage network
2
On each host the path runs Application → OS (kernel) → CPU/DRAM (data processing) → PCI Express 3.0 → 40/100Gbit/s NIC; the request crosses the storage network and continues on the storage side through the NIC → PCI Express 3.0 → CPU/DRAM → OS (kernel) → HDD/SSD, and the data then returns along the same chain.
Flow: data reference (request) → data provision → result output.
Because the processing chain is so long and drawn out, it is often likened to a plesiosaur (a "long-necked" shape).
Server-to-server data communication always involves this long chain of intermediate processing (a basic technical point to keep in mind).
Comparison of data size and processing performance per unit time (one second)
3
Normalizing to a common unit (ops/byte):
- 40Gbit/s Ethernet (DPDK): 47M pps at 64 Bytes* → roughly 732 kpps/byte (CPU)
- 40Gbit/s Ethernet (line rate): 3M pps at 1500 Bytes* → roughly 2 kpps/byte (NIC)
- fio RAMDISK (DDR4): 19M IOPS at 128 Bytes*** → roughly 148 kIOPS/byte (CPU)
- redis GET (localhost/DRAM): 2M rps at 2 Bytes** → roughly 1 Mrps/byte (CPU)
- NVMe SSD (U.2): 1-3M IOPS at 4 KBytes*** → roughly 750 IOPS/byte (CPU)
- cupy.sort (GPU GDDR5): 214 Mops on uint8***** → roughly 214 Mops/byte (GPU)
- Apache Ignite (CPU): 250 Kops****
When a workload needs a large amount of high-speed computation, a configuration that tightly couples fast memory with the compute units is the better choice.
SOURCE: Linux 40GbE DPDK Performance / High Speed Packet Processing with Terminator 5 / Chelsio Communications Inc. (2015)*,
redis-benchmark with AMD RYZEN 1800X / Intel Kaby Lake (i7-7700K) memo [GET rates] / SAKURA Internet Research Center (2017/05)**,
SAKURA Internet Research Center lab test results (2017)***, Apache Ignite on Intel Core i7 (4.5GHz)****,
R = randint(0,100,600000000); a = cp.array(R, dtype=np.uint8) 2.27 sec; cp.sort(a) 0.54 sec / SAKURA Internet Research Center (2018/05)*****
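The normalization is simple division: operations per second divided by the payload size each operation touches. A minimal sketch (not from the original slides; the NVMe row uses the 3M IOPS upper bound, so the rounding differs slightly from the chart):

# Reproduce the "ops per byte" normalization used on this slide.
measurements = {
    # name: (operations per second, payload size in bytes)
    "40GbE DPDK":         (47e6, 64),
    "40GbE line rate":    (3e6, 1500),
    "fio RAMDISK (DDR4)": (19e6, 128),
    "redis GET (DRAM)":   (2e6, 2),
    "NVMe SSD (U.2)":     (3e6, 4096),
    "cupy.sort (GPU)":    (214e6, 1),   # uint8, one byte per element
}

for name, (ops_per_sec, payload_bytes) in measurements.items():
    print(f"{name:>20}: {ops_per_sec / payload_bytes:12,.0f} ops/byte")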
Problems with conventional information-sharing systems and open issues in the next-generation database area
4
A typical data-processing lifecycle (example): an unspecified, large population of read-only users (about 80%) and a smaller population of posting users (about 20%) pass through request-distribution processing, then a cache / service (API) layer, and finally the database/storage layer, which handles permanent data storage, consistency checks, and archive processing. This layering is complex and costly.
Today the problem is solved by brute force and sheer quantities of hardware.
Optimizing cache efficiency for read traffic and improving request distribution remain open issues going forward.
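To see why read-cache efficiency dominates, a toy calculation (only the 80%/20% read/write split comes from the slide; the request rate and cache hit rates are assumptions):

# Toy back-of-the-envelope: load that reaches the database/storage layer.
total_rps = 100_000                        # assumed total requests per second
reads, writes = 0.8 * total_rps, 0.2 * total_rps
for hit_rate in (0.90, 0.99, 0.999):       # assumed cache hit rates
    backend_rps = reads * (1 - hit_rate) + writes   # writes always reach the backend
    print(f"cache hit rate {hit_rate:.1%}: {backend_rps:,.0f} requests/s hit the database")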
Appendix/memo
CPU/GPU
5
How to measure your dataflow using Apache Ignite
6
Apache Ignite benchmark (operations/sec) compared across Intel Core i7 (4.5GHz), AMD Threadripper (3.8GHz), and AMD EPYC (2.1GHz).
For processes that are effectively single-threaded, an environment with a high CPU clock speed performs best.
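The benchmark tool itself is not shown on the slide; as a rough stand-in, a minimal single-threaded put/get loop using the pyignite thin client (assumes an Ignite node listening on 127.0.0.1:10800 and the pyignite package):

# Single-threaded put/get micro-benchmark against Apache Ignite.
import time
from pyignite import Client

client = Client()
client.connect("127.0.0.1", 10800)
cache = client.get_or_create_cache("bench")

n = 100_000
start = time.time()
for i in range(n):
    cache.put(i, i)
    cache.get(i)
print(f"{2 * n / (time.time() - start):,.0f} ops/sec (single thread)")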
How to measure your dataflow using cupy & numpy (NVIDIA GPU)
7
SOURCE: SAKURA Internet Research Center. (04/2018) Project Sprig.
import time
import cupy as cp
import numpy as np
from numpy.random import *
start = time.time()
R = randint(0,100,600000000)
end = time.time()
print ( end - start )
start = time.time()
a = np.array(R, dtype=np.uint8)
end = time.time()
print ( end - start )
start = time.time()
np.sort(a)
end = time.time()
print ( end - start )
import time
import cupy as cp
import numpy as np
from numpy.random import *
start = time.time()
R = randint(0,100,600000000)
end = time.time()
print ( end - start )
start = time.time()
a = cp.array(R, dtype=cp.uint8)
end = time.time()
print ( end - start )
start = time.time()
cp.sort(a)
end = time.time()
print ( end - start )
Performance comparison: numpy (CPU) vs cupy (GPU)
# apt install python-pip
# pip install --upgrade pip
# pip install --upgrade setuptools
# pip install numpy cupy    # (time is part of the Python standard library, no install needed)
# python
Results for 600,000,000 random integers (seconds, lower is better):
                                   numpy (CPU)   cupy (GPU)
R = randint(0,100,600000000)       5.36 sec      5.36 sec
a = np.array(R, dtype=np.uint8)    0.46 sec      -
a = cp.array(R, dtype=np.uint8)    -             2.27 sec
np.sort(a)                         15.1 sec      -
cp.sort(a)                         -             0.54 sec
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1050 Off | 00000000:65:00.0 Off | N/A |
| 29% 27C P8 N/A / 65W | 1205MiB / 1997MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
Time (Lower is better)
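One caveat worth noting (not on the original slide): CuPy launches GPU kernels asynchronously, so wall-clock timing can under-report unless the device is synchronized before reading the clock. A minimal sketch of such a timing helper (the helper name is illustrative):

import time
import cupy as cp
import numpy as np

def timed(label, fn):
    # Run fn() and wait for all queued GPU work before stopping the clock.
    start = time.time()
    out = fn()
    cp.cuda.Stream.null.synchronize()
    print(label, round(time.time() - start, 2), "sec")
    return out

R = np.random.randint(0, 100, 600000000)
a = timed("cp.array", lambda: cp.array(R, dtype=cp.uint8))
timed("cp.sort", lambda: cp.sort(a))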
ROCm with dGPU(AMD GPU) using pyopencl
8
# uname -sr; cat /etc/lsb-release
Linux 4.4.0-116-generic
DISTRIB_DESCRIPTION="Ubuntu 16.04.4 LTS" ( ROCm does not support 17.10)
# lscpu
Model name: Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz
# lspci | grep VGA
65:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 67ef (rev cf)
ROCm Platform Supports Two Graphics Core Next (GCN) GPU Generations
GFX8: Radeon RX 480,Radeon RX 470,Radeon RX 460,R9 Nano,Radeon R9 Fury,Radeon R9 Fury X
Radeon Pro WX7100, FirePro S9300 x2, Radeon Vega Frontier Edition, Radeon Instinct: MI6, MI8, and MI25
(https://rocm.github.io/hardware.html)
# apt update
# apt dist-upgrade -y
# apt-get install -y libnuma-dev
# wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo apt-key add -
# sh -c 'echo deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main > /etc/apt/sources.list.d/rocm.list'
# apt update
# apt-get install -y rocm-dkms
# ln -s /opt/rocm/opencl/lib/x86_64/libOpenCL.so.1 /usr/lib/libOpenCL.so
# usermod -a -G video $LOGNAME
# sync; sync; sync; reboot
# /opt/rocm/opencl/bin/x86_64/clinfo
Platform Version: OpenCL 2.1 AMD-APP.internal (2576.0)
Platform Name: AMD Accelerated Parallel Processing
# apt install python-pip opencl-headers -y
# pip install --upgrade pip
# pip install --upgrade setuptools
# pip install pyopencl
Successfully installed pyopencl-2018.1.1
>>> import numpy as np
>>> import pyopencl as cl
>>> from pyopencl import array as clarray
>>> from pyopencl import algorithm as clalg
>>> ctx = cl.create_some_context(0)
>>> queue = cl.CommandQueue(ctx)
>>> R = np.random.randint(0, 99, 100000000).astype(np.int8)
>>> a = clarray.to_device(queue, R)
>>> b = clalg.copy_if(a, 'ary[i] >= 55')
>>> print b
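A follow-up note (not on the original slide): in current pyopencl, algorithm.copy_if returns an (output array, count, event) tuple, so the selected elements can be read back on the host like this:
>>> out, count, _ = clalg.copy_if(a, 'ary[i] >= 55')
>>> n = int(count.get())       # number of elements that satisfied the predicate
>>> print(out.get()[:n])       # copy back to the host, keep only the valid prefix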
How to burn your GPU with CUDA9.1
9
# uname -sr; cat /etc/lsb-release
Linux 4.13.0-21-generic
DISTRIB_DESCRIPTION="Ubuntu 17.10"
# vi /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
# sync; sync; reboot
# apt install g++ freeglut3-dev build-essential libx11-dev libxmu-dev
# apt install libxi-dev libglu1-mesa libglu1-mesa-dev gcc-6 g++-6
Download CUDA9.1 from https://developer.nvidia.com/cuda-toolkit
# bash cuda_9.1.85_387.26_linux.run --silent --toolkit --override --no-opengl-libs --driver
# ln -s /usr/bin/gcc-6 /usr/local/cuda/bin/gcc
# ln -s /usr/bin/g++-6 /usr/local/cuda/bin/g++
# vi ~/.bashrc
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDA_HOME=/usr/local/cuda
# source ~/.bashrc
# git clone https://github.com/wilicc/gpu-burn.git
# cd gpu-burn/
# vi Makefile
NVCC=/usr/local/cuda/bin/nvcc
# make
# ./gpu_burn 1000
# watch -n 1 nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 387.26 Driver Version: 387.26 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1050 Off | 00000000:65:00.0 Off | N/A |
| 37% 72C P0 N/A / 65W | 1793MiB / 1997MiB | 100% Default |
+-------------------------------+----------------------+----------------------+
How to burn your GPU with CUDA9.1 (MapD Community Edition 3.4.0)
10
# apt install -y curl apt-transport-https
# useradd -U mapd
# ufw disable; ufw enable; ufw allow 9092/tcp; ufw allow 22/tcp
# curl https://releases.mapd.com/ce/mapd-ce-cuda.list | sudo tee /etc/apt/sources.list.d/mapd.list
# curl https://releases.mapd.com/GPG-KEY-mapd | sudo apt-key add -
# apt update
# apt install -y mapd
# vi ~/.bashrc
export MAPD_USER=mapd
export MAPD_GROUP=mapd
export MAPD_STORAGE=/var/lib/mapd
export MAPD_PATH=/opt/mapd
# source ~/.bashrc
# mkdir -p $MAPD_STORAGE
# chown -R $MAPD_USER $MAPD_STORAGE
# cd $MAPD_PATH/systemd
# ./install_mapd_systemd.sh
# cd $MAPD_PATH
# systemctl start mapd_server; systemctl enable mapd_server
# systemctl start mapd_web_server; systemctl enable mapd_web_server
# $MAPD_PATH/insert_sample_data
2) Flights (2008) 10k
2
# $MAPD_PATH/bin/mapdql -t
Password: HyperInteractive
mapdql> SELECT origin_city AS "Origin", dest_city AS "Destination", AVG(airtime) AS "Average Airtime" FROM flights_2008_10k
WHERE distance <= 33 GROUP BY origin_city, dest_city;
Execution time: 1268 ms, Total time: 1269 ms
SOURCE: https://www.mapd.com/platform/download-community/
+----------------------------------------------------------
| NVIDIA-SMI 387.26 Driver Version: 387.26
|-------------------------------+----------------------+---
| GPU Name Persistence-M| Bus-Id Disp.A |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage |
|===============================+======================+===
| 0 GeForce GTX 1050 Off | 00000000:65:00.0 Off |
| 29% 27C P0 N/A / 65W | 1449MiB / 1997MiB |
+-------------------------------+----------------------+---
|==========================================================
| 0 5828 C /opt/mapd/bin/mapd_server
+----------------------------------------------------------
Origin|Destination|Average Airtime
West Palm Beach|Tampa|33.81818181818182
Norfolk|Baltimore|36.07142857142857
Ft. Myers|Orlando|28.66666666666667
Indianapolis|Chicago|39.53846153846154
Tampa|West Palm Beach|33.25
Orlando|Ft. Myers|32.58333333333334
Austin|Houston|33.05555555555556
Chicago|Indianapolis|32.7
Baltimore|Norfolk|31.71428571428572
Houston|Austin|29.61111111111111
ROCm with dGPU(AMD GPU) (memo)
11
# uname -sr; cat /etc/lsb-release
Linux 4.4.0-87-generic
DISTRIB_DESCRIPTION="Ubuntu 16.04.3 LTS"
# lscpu
Model name: Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz
# lspci | grep VGA
65:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 67ef (rev cf) / *[Radeon RX 460]
ROCm Platform Supports Two Graphics Core Next (GCN) GPU Generations
GFX8: Radeon RX 480,Radeon RX 470,Radeon RX 460,R9 Nano,Radeon R9 Fury,Radeon R9 Fury X Radeon Pro WX7100, FirePro S9300 x2
Radeon Vega Frontier Edition, Radeon Instinct: MI6, MI8, and MI25 (https://rocm.github.io/hardware.html)
# apt update
# apt dist-upgrade
# apt-get install -y libnuma-dev
# sync; sync; sync; reboot
# wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo apt-key add -
# sh -c 'echo deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main > /etc/apt/sources.list.d/rocm.list'
# apt-get install -y rocm-dkms
# usermod -a -G video $LOGNAME
# sync; sync; sync; reboot
# /opt/rocm/opencl/bin/x86_64/clinfo
Platform Version: OpenCL 2.1 AMD-APP.internal (2545.0)
Platform Name: AMD Accelerated Parallel Processing
# wget https://raw.githubusercontent.com/bgaster/opencl-book-samples/master/src/Chapter_2/HelloWorld/HelloWorld.cpp
# wget https://raw.githubusercontent.com/bgaster/opencl-book-samples/master/src/Chapter_2/HelloWorld/HelloWorld.cl
# g++ -I /opt/rocm/opencl/include/ ./HelloWorld.cpp -o HelloWorld -L/opt/rocm/opencl/lib/x86_64 -lOpenCL
# ./HelloWorld
0 3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60 63 66 69 72 75 78 81 84 87 90 93 96 99 102 105 108 111 114 117 120
... 2985 2988 2991 2994 2997
Executed program succesfully.
AMDGPU ROCm Tensorflow 1.8 install memo (does not support Ubuntu 18.04)
12
# uname -sr; tail -2 /etc/lsb-release
Linux 4.4.0-131-generic
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.5 LTS"
# lspci
17:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 67ef (rev cf)
# apt update
# apt dist-upgrade
# apt install -y libnuma-dev wget python3-pip
# sync; sync; sync; reboot
# wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | apt-key add -
# vi /etc/apt/sources.list.d/rocm.list
deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main
# apt update
# apt install -y rocm-dkms
# usermod -a -G video $LOGNAME
# sync; sync; sync; reboot
# apt install -y rocm-libs miopen-hip cxlactivitylogger
# sync; sync; sync; reboot
# wget http://repo.radeon.com/rocm/misc/tensorflow/tensorflow-1.8.0-cp35-cp35m-manylinux1_x86_64.whl
# pip3 install ./tensorflow-1.8.0-cp35-cp35m-manylinux1_x86_64.whl
# git clone https://github.com/tensorflow/models.git
# python3 classify_image.py
# cd ; git clone https://github.com/tensorflow/tensorflow.git
# cd tensorflow/
# python3 tensorflow/examples/speech_commands/train.py
# watch -n 1 /opt/rocm/bin/rocm-smi
==================== ROCm System Management Interface ====================
================================================================================
GPU Temp AvgPwr SCLK MCLK Fan Perf SCLK OD MCLK OD
0 35c 21.82W 1210Mhz 300Mhz 0.0% auto 0% 0%
================================================================================
==================== End of ROCm SMI Log ====================
2018-09-02 10:40:10.368117: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1451] Found device 0 with properties:
name: Device 67ef
AMDGPU ISA: gfx803
memoryClockRate (GHz) 1.21
pciBusID 0000:17:00.0
Total memory: 2.00GiB
Free memory: 1.75GiB
Adding visible gpu devices: 0
Device interconnect
Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1567 MB memory) -> physical GPU (device: 0, name: Device 67ef, pci bus id: 0000:17:00.0)
AMDGPU ROCm Tensorflow 1.8 (classify_image.py)
13
# wget http://repo.radeon.com/rocm/misc/tensorflow/tensorflow-1.8.0-cp35-cp35m-manylinux1_x86_64.whl
# pip3 install ./tensorflow-1.8.0-cp35-cp35m-manylinux1_x86_64.whl
# git clone https://github.com/tensorflow/models.git
# python3 classify_image.py
2018-09-02 10:40:10.368117: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1451] Found device 0 with
properties:
name: Device 67ef
AMDGPU ISA: gfx803
memoryClockRate (GHz) 1.21
pciBusID 0000:17:00.0
Total memory: 2.00GiB
Free memory: 1.75GiB
2018-09-02 10:40:10.368135: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1562] Adding visible gpu devices: 0
2018-09-02 10:40:10.368153: I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Device interconnect
StreamExecutor with strength 1 edge matrix:
2018-09-02 10:40:10.368162: I tensorflow/core/common_runtime/gpu/gpu_device.cc:995] 0
2018-09-02 10:40:10.368175: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1008] 0: N
2018-09-02 10:40:10.368207: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1124] Created TensorFlow device
(/job:localhost/replica:0/task:0/device:GPU:0 with 1567 MB memory) -> physical GPU (device: 0, name: Device
/opt/rocm/miopen/share/miopen/db/gfx803_14.cd.pdb.txt
giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca (score = 0.89107)
indri, indris, Indri indri, Indri brevicaudatus (score = 0.00779)
lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens (score = 0.00296)
custard apple (score = 0.00147)
earthstar (score = 0.00117)
#
AMDGPU ROCm Tensorflow 1.8 (speech_commands/train.py)
14
# git clone https://github.com/tensorflow/tensorflow.git
# cd tensorflow/
# python3 tensorflow/examples/speech_commands/train.py
2018-09-02 10:43:36.924800: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions
that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
AMDGPU ISA: gfx803
memoryClockRate (GHz) 1.21
pciBusID 0000:17:00.0
Total memory: 2.00GiB
Free memory: 1.75GiB
:
INFO:tensorflow:Step #1: rate 0.001000, accuracy 9.0%, cross entropy 2.724346
INFO:tensorflow:Step #2: rate 0.001000, accuracy 9.0%, cross entropy 2.521507
:
INFO:tensorflow:Saving to "/tmp/speech_commands_train/conv.ckpt-4300"
INFO:tensorflow:Step #4301: rate 0.001000, accuracy 65.0%, cross entropy 1.094288
INFO:tensorflow:Step #4302: rate 0.001000, accuracy 69.0%, cross entropy 0.876309
:
# /opt/rocm/bin/rocm-smi
GPU Temp AvgPwr SCLK MCLK Fan Perf SCLK OD MCLK OD
0 52c 44.230W 1172Mhz 1750Mhz 0.0% auto 0% 0%
# top
top - 10:58:10 up 25 min, 2 users, load average: 1.51, 1.29, 0.89
Tasks: 222 total, 2 running, 220 sleeping, 0 stopped, 0 zombie
%Cpu0 : 6.2 us, 1.7 sy, 0.0 ni, 92.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 5.6 us, 2.8 sy, 0.0 ni, 91.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 8.3 us, 3.1 sy, 0.0 ni, 88.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 6.4 us, 2.7 sy, 0.0 ni, 90.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 9.8 us, 3.7 sy, 0.0 ni, 86.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 8.4 us, 3.0 sy, 0.0 ni, 88.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 5.4 us, 2.3 sy, 0.0 ni, 92.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 3.4 us, 2.0 sy, 0.0 ni, 94.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu8 : 3.4 us, 1.7 sy, 0.0 ni, 94.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu9 : 3.7 us, 1.7 sy, 0.0 ni, 94.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu10 : 6.0 us, 2.7 sy, 0.0 ni, 91.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu11 : 4.4 us, 2.0 sy, 0.0 ni, 93.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
Appendix/memo
NVMe SSD/SPDK/DRAM
15
In-Memory Computing for FASTDATA using fio with RAMDISK(DDR4)
16
# uname -sr; cat /etc/lsb-release
Linux 4.13.0-21-generic
DISTRIB_DESCRIPTION="Ubuntu 17.10"
# lshw -c cpu
product: Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz
# lshw -class memory
description: DIMM DDR4 Synchronous 2666 MHz (0.4 ns)
# mkdir /ramdisk
# mount -t tmpfs tmpfs /ramdisk
# fio -directory=/ramdisk -rw=read -bs=* -size=1G -numjobs=16 -runtime=10 -group_reporting -name=data
64GB RAMDISK (fio, block size swept in Bytes) with Core i7-7800X overclocked to 5GHz.
The original chart plots IOPS against fio block size: 19.9M, 18.6M, 16.3M, 12.6M, 7.8M, 4.6M, 2.4M, 1.2M IOPS.
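A minimal sketch of the same sweep driven from Python (assumes fio is installed and /ramdisk is mounted as above; the block-size list is illustrative), reading the IOPS figure from fio's JSON output:

# Sweep fio block sizes on the RAMDISK and print read IOPS for each.
import json
import subprocess

for bs in ("64", "128", "256", "512", "1k", "4k", "64k"):
    cmd = ["fio", "--directory=/ramdisk", "--rw=read", "--bs=" + bs,
           "--size=1G", "--numjobs=16", "--runtime=10",
           "--group_reporting", "--name=data", "--output-format=json"]
    report = json.loads(subprocess.run(cmd, capture_output=True, text=True).stdout)
    print(bs, f"{report['jobs'][0]['read']['iops']:,.0f} IOPS")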
How To Configure NVMe over Fabrics using MLNX_OFED <DRAFT>
17
NVME Target Configuration
# ./mlnxofedinstall --add-kernel-support --with-nvmf
# modprobe mlx5_core
# modprobe nvmet
# modprobe nvmet-rdma
# modprobe nvme-rdma
# mkdir /sys/kernel/config/nvmet/subsystems/nvme-subsystem-name
# cd /sys/kernel/config/nvmet/subsystems/nvme-subsystem-name
# echo 1 > attr_allow_any_host
# mkdir namespaces/10
# cd namespaces/10
# echo -n /dev/nvme0n1> device_path
# echo 1 > enable
# mkdir /sys/kernel/config/nvmet/ports/1
# cd /sys/kernel/config/nvmet/ports/1
# ip addr add 1.1.1.1/24 dev enp2s0f0
# echo 1.1.1.1 > addr_traddr
# echo rdma > addr_trtype
# echo 4420 > addr_trsvcid
# echo ipv4 > addr_adrfam
# ln -s /sys/kernel/config/nvmet/subsystems/nvme-subsystem-name /sys/kernel/config/nvmet/ports/1/subsystems/nvme-subsystem-name
NVMe Client (Initiator) Configuration
# ./mlnxofedinstall --add-kernel-support --with-nvmf
# modprobe mlx5_core
# modprobe nvme-rdma
# git clone https://github.com/linux-nvme/nvme-cli.git
# cd nvme-cli
# make
# make install
# nvme discover -t rdma -a 1.1.1.1 -s 4420
# nvme connect -t rdma -n nvme-subsystem-name -a 1.1.1.1 -s 4420
# nvme disconnect -d /dev/nvme0n1
Intel SPDK(Storage Performance Development Kit) benchmark
18
# uname -sr;
Linux 4.10.0-40-generic
# apt-get install libnuma-dev git uuid-dev libaio-dev libcunit1-dev libcunit1 libssl-dev g++ -y
# cd /opt/; git clone https://github.com/axboe/fio
# cd fio; git checkout -b fio-2.21
# make; make install
# cd /opt/; git clone https://github.com/spdk/spdk
# cd spdk; git submodule update --init
# ./configure --with-fio=/opt/fio/
# make
# /opt/spdk/scripts/setup.sh
# fio --name=nvme --numjobs=8 --filename="trtype=PCIe traddr=0000.01.00.0 ns=1" --bs=4K --iodepth=4 \
  --ioengine=/opt/spdk/examples/nvme/fio_plugin/fio_plugin \
  --group_reporting --size=50% --runtime=100 --thread=8 --rw=read
nvme: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T)
4096B-4096B, ioengine=spdk, iodepth=4
...
fio-3.2-19-g609ac1
Starting 8 threads
Starting DPDK 17.11.0 initialization...
[ DPDK EAL parameters: fio -c 0x1 -m 512 --file-prefix=spdk_pid18356 ]
EAL: Detected 8 lcore(s)
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: PCI device 0000:01:00.0 on NUMA socket 0
EAL: probe driver: 8086:2700 spdk_nvme
nvme: (groupid=0, jobs=8): err= 0: pid=18367: Mon Nov 27 15:36:06 2017
read: IOPS=572k, BW=2236MiB/s (2345MB/s)(218GiB/100001msec)
slat (nsec): min=91, max=471828, avg=200.94, stdev=122.85
clat (usec): min=9, max=13319, avg=55.44, stdev= 7.84
lat (usec): min=14, max=13319, avg=55.64, stdev= 7.84
clat percentiles (usec):
| 1.00th=[ 48], 5.00th=[ 50], 10.00th=[ 50], 20.00th=[ 51],
| 30.00th=[ 52], 40.00th=[ 53], 50.00th=[ 53], 60.00th=[ 54],
| 70.00th=[ 56], 80.00th=[ 60], 90.00th=[ 64], 95.00th=[ 67],
| 99.00th=[ 88], 99.50th=[ 91], 99.90th=[ 100], 99.95th=[ 111],
| 99.99th=[ 121]
bw ( KiB/s): min=242664, max=310392, per=12.50%, avg=286296.77, stdev=11653.87, samples=1592
iops : min=60666, max=77598, avg=71574.18, stdev=2913.46, samples=1592
lat (usec) : 10=0.01%, 20=0.01%, 50=9.44%, 100=90.46%, 250=0.09%
lat (usec) : 500=0.01%, 750=0.01%
lat (msec) : 2=0.01%, 20=0.01%
In-Memory Database Registration Performance Check (Intel vs AMD)
19
Purley# uname -sr; cat /etc/redhat-release
Linux 3.10.0-514.el7.x86_64
CentOS Linux release 7.3.1611 (Core)
Purley# grep proc /proc/cpuinfo | wc -l
48
Purley# lscpu
Model name: Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz
RYZEN# uname -sr; cat /etc/debian_version
Linux 4.10.0-19-generic
stretch/sid
RYZEN# grep proc /proc/cpuinfo | wc -l
16
RYZEN# lscpu
Model name: AMD Ryzen 7 1800X Eight-Core Processor
redis shows a drop in per-process throughput as the stored data size grows.
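A minimal sketch of how that trend can be measured (assumes a local redis-server and the redis-py package; value sizes and iteration count are illustrative):

# Measure GET throughput for increasing value sizes against a local redis.
import time
import redis

r = redis.Redis(host="localhost", port=6379)
for size in (2, 64, 1024, 16 * 1024, 256 * 1024):
    r.set("key", b"x" * size)
    n = 100_000
    start = time.time()
    for _ in range(n):
        r.get("key")
    print(f"value size {size:>7} bytes: {n / (time.time() - start):,.0f} GET/s")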
In-Memory Database Performance Check
20
Intel Purley
AMD Ryzen
Xeon Phi(KNL)
# uname -sr; cat /etc/redhat-release
Linux 3.10.0-514.el7.x86_64
CentOS Linux release 7.3.1611 (Core)
# grep proc /proc/cpuinfo | wc -l
48
# lscpu
Model name: Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz
ALL FLASH DATACENTER & IN-MEMORY COMPUTING: HOT TOPICS
21
SOURCE: SAKURA Internet Research Center. (2017/10), Project Sprig.
ClickHouse column-oriented database Install memo
22
# uname -sr; cat /etc/issue
Linux 4.10.0-35-generic
Ubuntu 17.04
# apt install software-properties-common
# apt-key adv --keyserver keyserver.ubuntu.com --recv E0C56BD4
# apt-add-repository "deb http://repo.yandex.ru/clickhouse/trusty stable main"
# apt-get update
# apt-get install clickhouse-server-common clickhouse-client -y
# service clickhouse-server start
# clickhouse-client --multiline
ClickHouse client version 1.1.54304.
Connecting to localhost:9000.
Connected to ClickHouse server version 1.1.54304.
:) CREATE TABLE ontime
(
Year UInt16,
Quarter UInt8,
Month UInt8,
:
Div5TailNum String
)
ENGINE = MergeTree(FlightDate, (Year, FlightDate), 8192);
or
# xz -v -c -d < ontime.csv.xz | clickhouse-client --query="INSERT INTO ontime FORMAT CSV"
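Once the ontime table is loaded, queries can also be issued from Python; a minimal sketch (assumes the clickhouse-driver package and the default localhost:9000 native port):

# Query the ontime table created above over the native protocol.
from clickhouse_driver import Client

client = Client("localhost")
rows = client.execute("SELECT Year, count() FROM ontime GROUP BY Year ORDER BY Year")
for year, flights in rows:
    print(year, flights)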
MariaDB ColumnStore column-oriented database Install memo
23
# uname -sr; cat /etc/redhat-release
Linux 3.10.0-514.el7.x86_64
Red Hat Enterprise Linux Server release 7.4 (Maipo)
# mkdir mcs; cd mcs;
# wget https://downloads.mariadb.com/ColumnStore/1.0.11/centos/x86_64/7/mariadb-columnstore-1.0.11-1-centos7.x86_64.rpm.tar.gz
# tar xzvf ./mariadb-columnstore-1.0.11-1-centos7.x86_64.rpm.tar.gz
# yum install boost boost-devel boost-doc expect perl-DBD-MySQL -y
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-common.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-client.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-server.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-libs.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-shared.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-gssapi-client.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-gssapi-server.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-platform.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-storage-engine.rpm -vh
# /usr/local/mariadb/columnstore/bin/postConfigure
Select the type of System Server install [1=single, 2=multi] (2) > 1
Enter System Name (columnstore-1) > sprig-1
Select the type of Data Storage [1=internal, 2=external, 3=GlusterFS] (1) > 1
Enter the list (Nx,Ny,Nz) or range (Nx-Nz) of DBRoot IDs assigned to module 'pm1' (1) > 1
# . /usr/local/mariadb/columnstore/bin/columnstoreAlias
# mcsadmin
MariaDB ColumnStore Admin Console
enter 'help' for list of commands
enter 'exit' to exit the MariaDB ColumnStore Command Console
use up/down arrows to recall commands
mcsadmin>
Appendix/memo
eBPF/XDP/DPDK
24
Quagga with ROUTE_MULTIPATH (memo)
25
# uname -sr; cat /etc/lsb-release
Linux 4.13.0-21-generic
DISTRIB_DESCRIPTION="Ubuntu 17.10"
# grep ROUTE_MULTIPATH /usr/src/*/.config
CONFIG_IP_ROUTE_MULTIPATH=y
# apt-get install -y quagga traceroute
# vi /etc/sysctl.conf
net.ipv4.conf.all.forwarding=1
net.ipv4.fib_multipath_hash_policy = 1
net.ipv4.conf.all.arp_announce = 2
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.default.arp_filter = 1
net.ipv6.conf.all.forwarding=1
net.ipv6.route.max_size = 32768
net.ipv6.xfrm6_gc_thresh = 32768
# touch /etc/quagga/zebra.conf
# touch /etc/quagga/ospfd.conf
# touch /etc/quagga/ospf6d.conf
# chown quagga.quaggavty /etc/quagga/*.conf
# chmod 640 /etc/quagga/*.conf
# ufw disable
# vi /etc/quagga/daemons
zebra=yes
ospfd=yes
ospf6d=yes
# echo VTYSH_PAGER=more >> /etc/environment
# sync; sync; sync; reboot
# vtysh
Quagga with ROUTE_MULTIPATH
My First XDP (eXpress Data Path)
26
# uname -sr; cat /etc/lsb-release
Linux 4.13.0-21-generic
DISTRIB_DESCRIPTION="Ubuntu 17.10"
# apt install -y make gcc libssl-dev bc libelf-dev libcap-dev clang
# apt install -y gcc-multilib llvm libncurses5-dev git bison flex pkg-config
# apt install -y libmnl0 libmnl-dev clang libasm1 libasm-dev
# mkdir /usr/local/include/asm
# ln -s /usr/include/x86_64-linux-gnu/asm/* /usr/local/include/asm
# git clone git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git
# cd iproute2/
# ./configure --prefix=/sbin
# make; make install
# vi xdp_example.c
#include <linux/bpf.h>
#ifndef __section
# define __section(NAME) __attribute__((section(NAME), used))
#endif
__section("prog")
int xdp_drop(struct xdp_md *ctx)
{
return XDP_DROP;
}
char __license[] __section("license") = "GPL";
# clang -O2 -Wall -target bpf -c xdp_example.c -o xdp_example.o
# ip link set dev eth0 xdp obj xdp_example.o
# ip link set dev eth0 xdp off
SOURCE: https://github.com/torvalds/linux/tree/master/samples/bpf,
http://cilium.readthedocs.io/en/latest/bpf/#llvm,
http://vger.kernel.org/netconf2017_files/XDP_devel_update_NetConf2017_Seoul.pdf,
http://prototype-kernel.readthedocs.io/en/latest/blogposts/xdp25_eval_generic_xdp_tx.html,
https://netdevconf.org/1.2/slides/oct7/10_nic_viljoen_eBPF_Offload_to_Hardware__cls_bpf_and_XDP_finalised.pdf,
https://people.netfilter.org/hawk/presentations/NetDev2.2_2017/XDP_for_the_Rest_of_Us_Part_2.pdf,
XDP – eXpress Data Path
My First F-Stack
27
# lscpu
Model name: AMD Ryzen Threadripper 1900X 8-Core Processor
# uname -sr; cat /etc/lsb-release
Linux 4.10.0-35-generic
DISTRIB_CODENAME=zesty
DISTRIB_DESCRIPTION="Ubuntu 17.04"
# cd /opt
# git clone https://github.com/F-Stack/f-stack.git
# /opt/f-stack/dpdk/tools/dpdk-setup.sh
[15] x86_64-native-linuxapp-gcc
Option: 15
# echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
# mkdir /mnt/huge
# mount -t hugetlbfs nodev /mnt/huge
# echo 0 > /proc/sys/kernel/randomize_va_space
# modprobe uio
# insmod /opt/f-stack/dpdk/x86_64-native-linuxapp-gcc/kmod/igb_uio.ko
# insmod /opt/f-stack/dpdk/x86_64-native-linuxapp-gcc/kmod/rte_kni.ko
# export FF_PATH=/opt/f-stack/
# export FF_DPDK=/opt/f-stack/dpdk/x86_64-native-linuxapp-gcc/
# cd /opt/f-stack/lib
# make ; make ; make ; make install
# cd /opt/f-stack/app/nginx-1.11.10
# ./configure --prefix=/usr/local/nginx_fstack --with-ff_module --without-http_rewrite_module
# make
# make install
# grep f-stack /usr/local/nginx_fstack/conf/nginx.conf
fstack_conf f-stack.conf;
# grep addr /usr/local/nginx_fstack/conf/f-stack.conf
addr=192.168.1.2
Copyright © 2018. Tencent Cloud All rights reserved.
My First FD.io VPP (Segment Routing for IPv6 / L3VPN for IPv4 traffic)
28
# uname -sr; cat /etc/lsb-release
Linux 4.13.0-21-generic
DISTRIB_DESCRIPTION="Ubuntu 17.10"
# vi /etc/apt/sources.list.d/99fd.io.list
deb [trusted=yes] https://nexus.fd.io/content/repositories/fd.io.ubuntu.xenial.main/ ./
# apt-get update
# apt-get install -y vpp-lib vpp vpp-plugins
# service vpp start
# service vpp status
● vpp.service - vector packet processing engine
Loaded: loaded (/lib/systemd/system/vpp.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2018-02-13 09:30:25 JST; 21s ago
:
CGroup: /system.slice/vpp.service
└─2011 /usr/bin/vpp -c /etc/vpp/startup.conf
# vppctl
vpp# set sr encaps source addr C1::
vpp# sr policy add bsid C1::999:2 next C2:: next C4::4 encap
vpp# sr steer l3 1.1.1.0/24 via sr policy bsid C1::999:2
:
vpp# sr localsid address C4::4 behavior end.dx4 GigabitEthernet0/6/0 1.1.1.1
vpp# show sr localsid
SRv6 - My LocalSID Table:
=========================
Address: c4::4
Behavior: DX4 (Endpoint with decapsulation and IPv4 cross-connect)
Iface: GigabitEthernet0/6/0
Next hop: 1.1.1.1
SOURCE: VPP/Segment Routing for IPv6 (https://wiki.fd.io/view/VPP/Segment_Routing_for_IPv6)
© 2017 FD.io is a Linux Foundation Project. All Rights Reserved.
FD.io VPP with XeonPhi (Basic Configuration)
29
# uname -sr; cat /etc/lsb-release
Linux 4.13.0-21-generic
DISTRIB_DESCRIPTION="Ubuntu 17.10"
# lscpu
CPU(s): 256
Model name: Intel(R) Xeon Phi(TM) CPU 7210 @ 1.30GHz
# vi /etc/apt/sources.list.d/99fd.io.list
deb [trusted=yes] https://nexus.fd.io/content/repositories/fd.io.ubuntu.xenial.main/ ./
# apt-get update
# apt install vpp vpp-lib vpp-plugins python-pip
# pip install vpp-config
# vpp-config
5) Execute some basic tests.
Command: 5
1) List/Create Simple IPv4 Setup
Command: 1
Would you like to keep this configuration [Y/n]? n
Would you like add address to interface GigabitEthernet4/0/1 [Y/n]? Y
Please enter the IPv4 Address [n.n.n.n/n]: 1.1.1.11/24
# vi /etc/vpp/startup.conf
unix {
nodaemon
log /var/log/vpp/vpp.log
full-coredump
cli-listen /run/vpp/cli.sock
exec /usr/local/vpp/vpp-config/scripts/set_int_ipv4_and_up
}
# sync; sync; sync; reboot
© 2017 FD.io is a Linux Foundation Project. All Rights Reserved.
# vppctl
# show int
Name Idx State
GigabitEthernet4/0/1 1 up
# show int addr
GigabitEthernet4/0/1 (up):
1.1.1.11/24
FD.io VPP with XeonPhi (Load Balancer plugin)
30
# vppctl
# show int addr
GigabitEthernet4/0/1 (up):
1.1.1.11/24
# lb conf ip4-src-address 1.1.1.11 timeout 3
# lb vip 1.2.3.4/32 encap gre4 new_len 1024
# lb as 1.2.3.4/32 1.1.1.8 1.1.1.9 1.1.1.10
# show lb vips 1.2.3.4
ip4-gre4 1.2.3.4/32 new_size:1024 #as:3
Application Server(1.1.1.8,9,10) side Configuration
# ip tunnel add tun0 mode gre local 1.1.1.8 remote 1.1.1.11 ttl 255
# ifconfig tun0 1.2.3.4/32 up
# echo 1 > /proc/sys/net/ipv4/conf/tun0/arp_ignore
# echo 2 > /proc/sys/net/ipv4/conf/tun0/arp_announce
# echo 0 > /proc/sys/net/ipv4/conf/tun0/rp_filter
# echo 0 > /proc/sys/net/ipv4/conf/all/rp_filter
# echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore
# echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce
© 2017 FD.io is a Linux Foundation Project. All Rights Reserved.
Topology: the FD.io VPP load balancer (1.1.1.11) forwards traffic for the VIP 1.2.3.4/32 over GRE tunnels to the application servers (1.1.1.8, 1.1.1.9, 1.1.1.10 in the configuration above); each server configures 1.2.3.4/32 on tun0 and answers clients directly via ordinary IP routing (Direct Server Return, DSR).
Quagga with ROUTE_MULTIPATH for BGP load balancing (memo)
31
# uname -sr; cat /etc/lsb-release
Linux 4.13.0-36-generic
DISTRIB_DESCRIPTION="Ubuntu 17.10"
# grep ROUTE_MULTIPATH /usr/src/*/.config
/usr/src/linux-headers-4.13.0-36-generic/.config:CONFIG_IP_ROUTE_MULTIPATH=y
# apt install -y quagga traceroute
# touch /etc/quagga/zebra.conf; touch /etc/quagga/bgpd.conf; chown quagga.quaggavty /etc/quagga/*.conf
# chmod 640 /etc/quagga/*.conf
# ufw disable ; echo VTYSH_PAGER=more >> /etc/environment
# vi /etc/quagga/daemons
zebra=yes
bgpd=yes
# sync; sync; sync; reboot
# vtysh
# router bgp 65001
# bgp router-id 1.1.1.1
# bgp bestpath as-path multipath-relax
# bgp bestpath compare-routerid
# redistribute connected
# neighbor 1.1.1.2 remote-as 65002
# neighbor 1.1.1.3 remote-as 65003
# maximum-paths 64
# interface lo
# ip address 1.2.3.4/24
# router bgp 65002
# bgp router-id 1.1.1.2
# bgp bestpath as-path multipath-relax
# bgp bestpath compare-routerid
# redistribute connected
# neighbor 1.1.1.1 remote-as 65001
# maximum-paths 64
# interface lo
# ip address 1.2.3.4/24
# router bgp 65003
# bgp router-id 1.1.1.3
# bgp bestpath as-path multipath-relax
# bgp bestpath compare-routerid
# redistribute connected
# neighbor 1.1.1.1 remote-as 65001
# maximum-paths 64
# show ip bgp
BGP table version is 0, local router ID is 1.1.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
Network Next Hop Metric LocPrf Weight Path
*> 1.2.3.0/24 1.1.1.2 0 0 65002
*= 1.1.1.3 0 0 65003
FD.io VPP tap-inject with sample_plugins
32
© 2017 FD.io is a Linux Foundation Project. All Rights Reserved.
# uname -sr; cat /etc/lsb-release
Linux 4.13.0-37-generic
DISTRIB_DESCRIPTION="Ubuntu 17.10"
# echo VTYSH_PAGER=more >> /etc/environment
# apt install -y quagga
# touch /etc/quagga/zebra.conf
# touch /etc/quagga/bgpd.conf
# chown quagga.quaggavty /etc/quagga/*.conf
# chmod 640 /etc/quagga/*.conf
# ufw disable
# vi /etc/quagga/daemons
zebra=yes
bgpd=yes
# sync; sync; sync; reboot
# apt install build-essential -y
# cd /opt/
# git clone https://gerrit.fd.io/r/vpp
# git clone https://gerrit.fd.io/r/vppsb
# cd /opt/vpp
# ./extras/vagrant/build.sh
# make install-dep; make bootstrap; make build
# vi /opt/vppsb/router/router/tap_inject_node.c
#include <sys/uio.h>
# ln -sf /opt/vppsb/netlink
# ln -sf /opt/vppsb/router
# ln -sf /opt/vppsb/netlink/netlink.mk build-data/packages/
# ln -sf /opt/vppsb/router/router.mk build-data/packages/
# cd build-root/
# make V=0 PLATFORM=vpp TAG=vpp_debug netlink-install router-install
# dpkg -i *.deb
# cp -p /opt/vpp/build-root/install-vpp_debug-native/router/lib64/router.so.0.0.0 /usr/lib/vpp_plugins/router.so
# service vpp restart
# vppctl enable tap-inject
# vppctl show tap-inject
GigabitEthernet13/0/0 -> vpp1
GigabitEthernetb/0/0 -> vpp0
# vtysh (quagga)
# configure terminal
(config)# interface vpp0
(config-if)# ip address 192.168.11.100/24
(config-if)# exit
(config)# exit
# write
# quit
# vppctl show int addr
GigabitEthernetb/0/0 (up):
L3 192.168.11.100/24
L3 fe80::20c:29ff:fe24:af28/64
# cd /opt/vpp/src/examples/sample-plugin
# libtoolize
# aclocal
# autoconf
# autoheader
# automake --add-missing
# chmod +x configure
# ./configure
# make
# make install
Diagram: GigabitEthernetb/0/0 is exposed to the Linux kernel as vpp0 through tap-inject (vpp_plugins/router.so and vpp_plugins/sample_plugin.so), and quagga configures its addresses on vpp0.
FD.io VPP 18.07 with Ubuntu 16.04.5 LTS (does not support Ubuntu 18.04)
33
# uname -sr; tail -2 /etc/lsb-release
Linux 4.4.0-131-generic
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.5 LTS"
# apt remove --purge vpp*
# vi /etc/apt/sources.list.d/99fd.io.list
deb [trusted=yes]
https://nexus.fd.io/content/repositories/fd.io.stable.1807.ubuntu.xenial.main/ ./
# apt update
# apt dist-upgrade -y
# apt install -y vpp vpp-lib vpp-plugins vpp-dpdk-dkms
# vppctl show pci
Address Sock VID:PID Link Speed Driver Product Name
0000:05:00.0 0 8086:1539 2.5 GT/s x1 uio_pci_generic
0000:65:00.0 0 8086:1584 8.0 GT/s x8 uio_pci_generic XL710 40GbE Controller
# vi /etc/vpp/startup.conf
dpdk {
dev 0000:65:00.0
}
# service vpp restart
# service vpp status
Active: active (running) since Tue 2018-09-04 18:50:02 JST; 2s ago
# vppctl set int ip address FortyGigabitEthernet65/0/0 1.2.3.4/24
# vppctl set int state FortyGigabitEthernet65/0/0 up
# vppctl show interface addr
FortyGigabitEthernet65/0/0 (up):
L3 1.2.3.4/24
# vppctl show version
vpp v18.07-rc2~6-gdb6d6b3~b28 built by root on 10268b67c8b1 at Mon Jul 30 ...
# vi /etc/apt/sources.list
deb http://security.ubuntu.com/ubuntu bionic-security main
# apt update
# apt install libssl1.1 -y
Download from http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.19-rc2/
# dpkg -i linux-headers-4.19.0-041900rc2_...rc2.201809022230_all.deb
# dpkg -i linux-headers-4.19.0-041900rc2-...rc2.201809022230_amd64.deb
# dpkg -i linux-headers-4.19.0-041900rc2-...rc2.201809022230_amd64.deb
# dpkg -i linux-image-unsigned-4.19.0-......rc2.201809022230_amd64.deb
# sync; sync; sync; reboot
# uname -sr; tail -2 /etc/lsb-release
Linux 4.19.0-041900rc2-generic
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.5 LTS"
# vppctl show int
# (it does not work)
NOTICE: does not work with kernel 4.19-rc2
© 2018 The Fast Data Project. Copyright © 2018 FD.IO Project a Series of LF Projects, LLC
Appendix/memo
etc
34
My First Intel/Movidius NCS
35
SOURCE: SAKURA Internet Research Center. (2017/07) Project Sprig.
$ sudo su
# apt-get update ; apt-get upgrade -y
# mkdir /opt/mvncsdk ; cd /opt/mvncsdk/
GoTo: https://developer.movidius.com/getting-started
# wget https://ncs-forum-uploads.s3.amazonaws.com/ncsdk/MvNC_SDK_01_07_07/MvNC_SDK_1.07.07.tgz
# tar zxvf MvNC_SDK_1.07.07.tgz ; tar zxvf MvNC_Toolkit-1.07.06.tgz ; tar xzvf ./MvNC_API-1.07.07.tgz
# ./bin/setup.sh ; ./bin/data/dlnets.sh
# source ~/.bashrc
# cd /opt/mvncsdk/ncapi/; ./setup.sh ; cd ./c_examples/ ; make
# ./ncs-fullcheck -l2 -c1 ../networks/AlexNet ../images/cat.jpg
Device 0 Address: 2 - VID/PID 03e7:2150
Starting wait for connect with 2000ms timeout
Found Address: 2 - VID/PID 03e7:2150
Found EP 0x81 : max packet size is 512 bytes
Found EP 0x01 : max packet size is 512 bytes
Found and opened device
Performing bulk write of 825136 bytes...
Successfully sent 825136 bytes of data in 35.764553 ms (22.002540 MB/s)
Boot successful, device address 2
Found Address: 2 - VID/PID 040e:f63b
done
Booted 2 -> VSC
OpenDevice 2 succeeded
Graph allocated
:
$ uname -sr;
Linux 4.8.0-36-generic (Ubuntu 16.04.02)
$ lsusb -v
Device Descriptor:
iProduct 2 Movidius MA2X5X
MaxPower 500mA
© Copyright Movidius 2017. All Rights Reserved.
UP Board AI Core Configuration memo
36
# uname -sr; cat /etc/lsb-release
Linux 4.4.0-116-generic
DISTRIB_DESCRIPTION="Ubuntu 16.04.4 LTS"
# lshw
*-pci:1
*-usb
description: USB controller
product: FL1100 USB 3.0 Host Controller
*-usbhost:1
*-usb UNCLAIMED
description: Generic USB device
product: Movidius MA2X5X
vendor: Movidius Ltd.
# git clone -b ncsdk2 http://github.com/Movidius/ncsdk && cd ncsdk && make install
# export PYTHONPATH="${PYTHONPATH}:/opt/movidius/caffe/python"
# cd /examples/tensorflow/inception_v3
# cat run.py
image_filename = path_to_images + 'nps_electric_guitar.png'
devices = mvnc.enumerate_devices()
# python3 run.py
Number of categories: 1001
Start download to NCS...
*******************************************************************************
inception-v3 on NCS
*******************************************************************************
547 electric guitar 0.988281
403 acoustic guitar 0.00751877
715 pick, plectrum, plectron 0.0014801
421 banjo 0.000901222
820 stage 0.000654221
*******************************************************************************
Finished
Copyright 2018 Up Board | All Rights Reserved
USB 3.0 CAPTURE HDMI 4K with Loop-through for Image redistribution
37
# uname -sr; tail -1 /etc/redhat-release
Linux 3.10.0-862.9.1.el7.x86_64
CentOS Linux release 7.4.1708 (Core)
# yum install -y usbutils hwinfo mplayer v4l-utils ffmpeg git
# lsusb -t
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/10p, 5000M
|__ Port 4: Dev 2, If 9, Class=Human Interface Device, Driver=usbhid, 5000M
# lsusb -vv
# hwinfo --usb
# v4l2-ctl --list-devices
USB Capture HDMI 4K+ (usb-0000:00:14.0-4):
/dev/video0
# v4l2-ctl -d /dev/video0 --info
# v4l2-ctl --list-formats-ext -d /dev/video0
Type : Video Capture
Name : YUV 4:2:2 (YUYV)
Size: Discrete 4096x2160
Interval: Discrete 0.017s (60.000 fps)
# wget https://libav.org/releases/libav-12.3.tar.xz
# tar Jxvf ./libav-12.3.tar.xz; cd libav-12.3
# ./configure --disable-yasm; make; make install
# avconv -f video4linux2 -input_format nv12 -s 1920x1080 -i /dev/video0 -qscale 10 out.mpeg
Input #0, video4linux2, from '/dev/video0':
Duration: N/A, start: 1240.062083, bitrate: 1492992 kb/s
nv12, 1920x1080, 1492992 kb/s
60 fps, 1000k tbn
# ffmpeg -f v4l2 -list_formats all -i /dev/video0
[video4linux2,v4l2 @ 0x24114c0] Raw
: yuyv422 : YUV 4:2:2 (YUYV) :
640x360 640x480 720x480 720x576 768x576 800x600
856x480 960x540 1024x576 1024x768 1280x720 1280x800
1280x960 1280x1024 1368x768 1440x900 1600x1200 1680x1050
1920x1080 1920x1200 2048x1080 2560x1440 3840x2160 4096x2160
[video4linux2,v4l2 @ 0x24114c0] Raw
: nv12 : YUV 4:2:0 (NV12) :
640x360 640x480 720x480 720x576 768x576 800x600
856x480 960x540 1024x576 1024x768 1280x720 1280x800
1280x960 1280x1024 1368x768 1440x900 1600x1200 1680x1050
1920x1080 1920x1200 2048x1080 2560x1440 3840x2160 4096x2160
© 2018, Nanjing Magewell Electronics Co., Ltd
Diagram: the source server's HDMI output is read once (the ORIGINAL) and daisy-chained through USB 3.0 bus-powered HDMI capture devices; each device's HDMI loop-through feeds the next server in the chain, so every server receives a COPY of the image, and the chain keeps working across power-on / OS-down events on individual servers.
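A minimal sketch of grabbing a frame from the capture device programmatically (assumes the opencv-python package; device index 0 corresponds to /dev/video0 as listed by v4l2-ctl above):

# Grab one 1080p frame from the USB HDMI capture device and save it.
import cv2

cap = cv2.VideoCapture(0)                      # /dev/video0
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1920)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 1080)
ok, frame = cap.read()                         # one BGR frame as a numpy array
if ok:
    cv2.imwrite("capture.png", frame)
cap.release()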
AMD Threadripper 1900X overview/spec
38
# uname -sr
Linux 4.10.0-19-generic
# vi /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="pci=noaer"
# update-grub; sync; sync; sync; reboot
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1
:
Model name: AMD Ryzen Threadripper 1900X 8-Core Processor
CPU MHz: 3800.000
CPU max MHz: 3800.0000
CPU min MHz: 2200.0000
BogoMIPS: 7585.39
Virtualization: AMD-V
L1d cache: 32K
L1i cache: 64K
L2 cache: 512K
L3 cache: 8192K
NUMA node0 CPU(s): 0-15
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht
syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid amd_dcm aperfmperf pni
pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic
cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx cpb
hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf
arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic overflow_recov
succor smca
AMD EPYC 7251 overview/spec
39
# uname -sr ; cat /etc/redhat-release
Linux 3.10.0-693.5.2.el7.x86_64
CentOS Linux release 7.4.1708 (Core)
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 8
Vendor ID: AuthenticAMD
CPU family: 23
Model: 1
Model name: AMD EPYC 7251 8-Core Processor
Stepping: 2
CPU MHz: 1200.000
CPU max MHz: 2100.0000
CPU min MHz: 1200.0000
BogoMIPS: 4199.47
Virtualization: AMD-V
L1d cache: 32K
L1i cache: 64K
L2 cache: 512K
L3 cache: 4096K
NUMA node0 CPU(s): 0,1,16,17
NUMA node1 CPU(s): 2,3,18,19
NUMA node2 CPU(s): 4,5,20,21
NUMA node3 CPU(s): 6,7,22,23
NUMA node4 CPU(s): 8,9,24,25
NUMA node5 CPU(s): 10,11,26,27
NUMA node6 CPU(s): 12,13,28,29
NUMA node7 CPU(s): 14,15,30,31
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt
pdpe1gb rdtscp lm constant_tsc art rep_good nopl nonstop_tsc extd_apicid amd_dcm aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2
movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext
perfctr_core perfctr_nb bpext perfctr_l2 cpb hw_pstate avic fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 arat
npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold overflow_recov succor smca
Appendix: SSD DC P4800X and cost/performance analysis
40
In-Memory Computing: 192GB DDR4 DRAM (2666MHz/ECC), $2,399.88; 30GB (300M records) sort: 296 sec
All Flash Computing: 750GB 3D XPOINT/NVMe SSD (P4800X x2), $3,790.00; 200GB (2,000M records) sort: 4,648 sec
In-Memory Computing: 512GB DDR4 DRAM (2666MHz/ECC), $6,399.68
All Flash vs 192GB In-Memory: Processing Size 6.7x, Processing Cost 1.5x, Processing Time 15x
512GB vs 192GB In-Memory: Processing Size 2.7x, Processing Cost 2.7x, Processing Time: N/A
# gensort -a 2000000000 test
# time sort --parallel=52 -T /memdrv test -o out
# gensort -a 300000000 test
# time sort --parallel=52 -T /ramdisk test -o out
SOURCE: © 2016 Colfax International. © 2000-2017 Newegg Inc. / SAKURA Internet Research Center. (08/2017) Project Sprig.
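The ratios follow directly from the figures above; a small sketch (numbers copied from this slide; the slide rounds the 1.58x cost ratio down to 1.5x):

# Cost/performance ratios: All Flash (P4800X x2) vs 192GB DDR4 In-Memory.
in_memory_192 = {"size_gb": 30,  "cost_usd": 2399.88, "sort_sec": 296}
all_flash     = {"size_gb": 200, "cost_usd": 3790.00, "sort_sec": 4648}

for label, key in (("Processing Size", "size_gb"),
                   ("Processing Cost", "cost_usd"),
                   ("Processing Time", "sort_sec")):
    print(f"{label}: {all_flash[key] / in_memory_192[key]:.1f}x")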
Appendix: stream_openmp performance check
41
Chart: STREAM (stream_openmp) memory-bandwidth results compared across Xeon Phi, AMD RYZEN, and two Xeon systems.
Special Thanks: Takefumi Miyoshi
Appendix: Network Application Benchmark result (iperf with 20 servers)
42
How to measure your dataflow using fio, pktgen and bandwidthTest
43
Measured on Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz nodes:
RAMDISK (DDR4 2133MHz 16GB x4): WRITE 12,648MB/s, READ 13,793MB/s (bs=256KB)
Mellanox ConnectX-4, 40GbE: 40Mpps (pkt=64B) = 2,560MB/s; 40GbE max rate = 5,000MB/s
# cd /opt
# git clone git://dpdk.org/dpdk
# git clone git://dpdk.org/apps/pktgen-dpdk
export RTE_SDK=/opt/dpdk
export RTE_TARGET=x86_64-native-linuxapp-gcc
# sysctl vm.nr_hugepages=2048
# cd /opt/dpdk
# make install T=x86_64-native-linuxapp-gcc
# /opt/dpdk/usertools/dpdk-devbind.py -u 0b:00.0
# /opt/dpdk/usertools/dpdk-devbind.py -u 13:00.0
# /opt/dpdk/usertools/dpdk-devbind.py -b igb_uio 0b:00.0
# /opt/dpdk/usertools/dpdk-devbind.py -b igb_uio 13:00.0
# /opt/dpdk/usertools/dpdk-devbind.py --status
# cd /opt/pktgen-dpdk/
# make
# /opt/pktgen-dpdk/tools/setup.sh
# /opt/pktgen-dpdk/app/x86_64-native-linuxapp-gcc/pktgen -- -m "1.0, 2.1"
GeForce GTX 1050 on Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz (bandwidthTest): Host to Device 6,029MB/s, Device to Host 6,448MB/s
Intel Optane 900P (3D XPoint): WRITE 2,000MB/s, READ 2,500MB/s (bs=4KB)
# mount -t tmpfs -o size=32G tmpfs /ramdisk
# fio --directory=/ramdisk --rw=write --bs=4k --size=1G --numjobs=3 \
  --runtime=100 --group_reporting --name=data
# bash cuda_9.1.85_387.26_linux.run --silent --toolkit --override \
  --no-opengl-libs --driver
:
# cd NVIDIA_CUDA-9.1_Samples/1_Utilities/bandwidthTest
# ./bandwidthTest
LTE-M/NB IoTを試してみる nRF9160/Thingy:91
 
災害時における無線モニタリングによる社会インフラの見える化
災害時における無線モニタリングによる社会インフラの見える化災害時における無線モニタリングによる社会インフラの見える化
災害時における無線モニタリングによる社会インフラの見える化
 
BeautifulSoup / selenium Deep dive
BeautifulSoup / selenium Deep diveBeautifulSoup / selenium Deep dive
BeautifulSoup / selenium Deep dive
 
AMDGPU ROCm Deep dive
AMDGPU ROCm Deep diveAMDGPU ROCm Deep dive
AMDGPU ROCm Deep dive
 
Network Adapter Deep dive
Network Adapter Deep diveNetwork Adapter Deep dive
Network Adapter Deep dive
 
RTL2838 DVB-T Deep dive
RTL2838 DVB-T Deep diveRTL2838 DVB-T Deep dive
RTL2838 DVB-T Deep dive
 
x86_64 Hardware Deep dive
x86_64 Hardware Deep divex86_64 Hardware Deep dive
x86_64 Hardware Deep dive
 
ADS-B, AIS, APRS cheatsheet
ADS-B, AIS, APRS cheatsheetADS-B, AIS, APRS cheatsheet
ADS-B, AIS, APRS cheatsheet
 
curl --http3 cheatsheet
curl --http3 cheatsheetcurl --http3 cheatsheet
curl --http3 cheatsheet
 
3/4G USB modem Cheat Sheet
3/4G USB modem Cheat Sheet3/4G USB modem Cheat Sheet
3/4G USB modem Cheat Sheet
 
How To Train Your ARM(SBC)
How To  Train Your ARM(SBC)How To  Train Your ARM(SBC)
How To Train Your ARM(SBC)
 
全国におけるCOVID-19対策の見える化 ~宿泊業の場合~
全国におけるCOVID-19対策の見える化 ~宿泊業の場合~全国におけるCOVID-19対策の見える化 ~宿泊業の場合~
全国におけるCOVID-19対策の見える化 ~宿泊業の場合~
 
我が国の電波の使用状況/携帯電話向け割当 (2019年3月1日現在)
我が国の電波の使用状況/携帯電話向け割当 (2019年3月1日現在)我が国の電波の使用状況/携帯電話向け割当 (2019年3月1日現在)
我が国の電波の使用状況/携帯電話向け割当 (2019年3月1日現在)
 
私たちに訪れる(かもしれない)未来と計算機によるモノコトの見える化
私たちに訪れる(かもしれない)未来と計算機によるモノコトの見える化私たちに訪れる(かもしれない)未来と計算機によるモノコトの見える化
私たちに訪れる(かもしれない)未来と計算機によるモノコトの見える化
 

Recently uploaded

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

• 8. ROCm with dGPU(AMD GPU) using pyopencl
# uname -sr; cat /etc/lsb-release
Linux 4.4.0-116-generic
DISTRIB_DESCRIPTION="Ubuntu 16.04.4 LTS" (ROCm does not support 17.10)
# lscpu
Model name: Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz
# lspci | grep VGA
65:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 67ef (rev cf)
ROCm Platform Supports Two Graphics Core Next (GCN) GPU Generations
GFX8: Radeon RX 480, Radeon RX 470, Radeon RX 460, R9 Nano, Radeon R9 Fury, Radeon R9 Fury X, Radeon Pro WX7100, FirePro S9300 x2,
Radeon Vega Frontier Edition, Radeon Instinct: MI6, MI8, and MI25 (https://rocm.github.io/hardware.html)
# apt update
# apt dist-upgrade -y
# apt-get install -y libnuma-dev
# wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo apt-key add -
# sh -c 'echo deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main > /etc/apt/sources.list.d/rocm.list'
# apt update
# apt-get install -y rocm-dkms
# ln -s /opt/rocm/opencl/lib/x86_64/libOpenCL.so.1 /usr/lib/libOpenCL.so
# usermod -a -G video $LOGNAME
# sync; sync; sync; reboot
# /opt/rocm/opencl/bin/x86_64/clinfo
Platform Version: OpenCL 2.1 AMD-APP.internal (2576.0)
Platform Name: AMD Accelerated Parallel Processing
# apt install python-pip opencl-headers -y
# pip install --upgrade pip
# pip install --upgrade setuptools
# pip install pyopencl
Successfully installed pyopencl-2018.1.1
>>> import numpy as np
>>> import pyopencl as cl
>>> from pyopencl import array as clarray
>>> from pyopencl import algorithm as clalg
>>> ctx = cl.create_some_context(0)
>>> queue = cl.CommandQueue(ctx)
>>> R = np.random.randint(0, 99, 100000000).astype(np.int8)
>>> a = clarray.to_device(queue, R)
>>> b = clalg.copy_if(a, 'ary[i] >= 55')
>>> print b

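For comparison, the same ">= 55" filter on the CPU side is a one-line NumPy boolean mask. The sketch below is illustrative only and not part of the original measurement; it reuses the array R from the listing above.

import time
import numpy as np

# Same input as the pyopencl listing: 100M random int8 values in [0, 99)
R = np.random.randint(0, 99, 100000000).astype(np.int8)

start = time.time()
b_cpu = R[R >= 55]          # NumPy boolean-mask equivalent of copy_if(a, 'ary[i] >= 55')
print(time.time() - start)  # wall-clock seconds for the CPU-side filter
print(b_cpu[:10])
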
• 9. How to burn your GPU with CUDA9.1
# uname -sr; cat /etc/lsb-release
Linux 4.13.0-21-generic
DISTRIB_DESCRIPTION="Ubuntu 17.10"
# vi /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
# sync; sync; reboot
# apt install g++ freeglut3-dev build-essential libx11-dev libxmu-dev
# apt install libxi-dev libglu1-mesa libglu1-mesa-dev gcc-6 g++-6
Download CUDA9.1 from https://developer.nvidia.com/cuda-toolkit
# bash cuda_9.1.85_387.26_linux.run --silent --toolkit --override --no-opengl-libs --driver
# ln -s /usr/bin/gcc-6 /usr/local/cuda/bin/gcc
# ln -s /usr/bin/g++-6 /usr/local/cuda/bin/g++
# vi ~/.bashrc
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDA_HOME=/usr/local/cuda
# source ~/.bashrc
# git clone https://github.com/wilicc/gpu-burn.git
# cd gpu-burn/
# vi Makefile
NVCC=/usr/local/cuda/bin/nvcc
# make
# ./gpu_burn 1000
# watch -n 1 nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 387.26                 Driver Version: 387.26                    |
|-------------------------------+----------------------+----------------------|
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1050    Off  | 00000000:65:00.0 Off |                  N/A |
|  37%   72C    P0   N/A /  65W |  1793MiB /  1997MiB  |    100%      Default |
+-------------------------------+----------------------+----------------------+

• 10. How to burn your GPU with CUDA9.1 (MapD Community Edition 3.4.0)
# apt install -y curl apt-transport-https
# useradd -U mapd
# ufw disable; ufw enable; ufw allow 9092/tcp; ufw allow 22/tcp
# curl https://releases.mapd.com/ce/mapd-ce-cuda.list | sudo tee /etc/apt/sources.list.d/mapd.list
# curl https://releases.mapd.com/GPG-KEY-mapd | sudo apt-key add -
# apt update
# apt install -y mapd
# vi ~/.bashrc
export MAPD_USER=mapd
export MAPD_GROUP=mapd
export MAPD_STORAGE=/var/lib/mapd
export MAPD_PATH=/opt/mapd
# source ~/.bashrc
# mkdir -p $MAPD_STORAGE
# chown -R $MAPD_USER $MAPD_STORAGE
# cd $MAPD_PATH/systemd
# ./install_mapd_systemd.sh
# cd $MAPD_PATH
# systemctl start mapd_server; systemctl enable mapd_server
# systemctl start mapd_web_server; systemctl enable mapd_web_server
# $MAPD_PATH/insert_sample_data
2) Flights (2008) 10k
2
# $MAPD_PATH/bin/mapdql -t
Password: HyperInteractive
mapdql> SELECT origin_city AS "Origin", dest_city AS "Destination", AVG(airtime) AS "Average Airtime" FROM flights_2008_10k WHERE distance <= 33 GROUP BY origin_city, dest_city;
Execution time: 1268 ms, Total time: 1269 ms
nvidia-smi (excerpt, truncated on the slide):
| NVIDIA-SMI 387.26   Driver Version: 387.26 |
| 0  GeForce GTX 1050  Off | 00000000:65:00.0 Off |
| 29%  27C  P0  N/A / 65W  | 1449MiB / 1997MiB    |
| 0  5828  C  /opt/mapd/bin/mapd_server
Origin|Destination|Average Airtime
West Palm Beach|Tampa|33.81818181818182
Norfolk|Baltimore|36.07142857142857
Ft. Myers|Orlando|28.66666666666667
Indianapolis|Chicago|39.53846153846154
Tampa|West Palm Beach|33.25
Orlando|Ft. Myers|32.58333333333334
Austin|Houston|33.05555555555556
Chicago|Indianapolis|32.7
Baltimore|Norfolk|31.71428571428572
Houston|Austin|29.61111111111111
SOURCE: https://www.mapd.com/platform/download-community/

• 11. ROCm with dGPU(AMD GPU) (memo)
# uname -sr; cat /etc/lsb-release
Linux 4.4.0-87-generic
DISTRIB_DESCRIPTION="Ubuntu 16.04.3 LTS"
# lscpu
Model name: Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz
# lspci | grep VGA
65:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 67ef (rev cf) / *[Radeon RX 460]
ROCm Platform Supports Two Graphics Core Next (GCN) GPU Generations
GFX8: Radeon RX 480, Radeon RX 470, Radeon RX 460, R9 Nano, Radeon R9 Fury, Radeon R9 Fury X, Radeon Pro WX7100, FirePro S9300 x2,
Radeon Vega Frontier Edition, Radeon Instinct: MI6, MI8, and MI25 (https://rocm.github.io/hardware.html)
# apt update
# apt dist-upgrade
# apt-get install -y libnuma-dev
# sync; sync; sync; reboot
# wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo apt-key add -
# sh -c 'echo deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main > /etc/apt/sources.list.d/rocm.list'
# apt-get install -y rocm-dkms
# usermod -a -G video $LOGNAME
# sync; sync; sync; reboot
# /opt/rocm/opencl/bin/x86_64/clinfo
Platform Version: OpenCL 2.1 AMD-APP.internal (2545.0)
Platform Name: AMD Accelerated Parallel Processing
# wget https://raw.githubusercontent.com/bgaster/opencl-book-samples/master/src/Chapter_2/HelloWorld/HelloWorld.cpp
# wget https://raw.githubusercontent.com/bgaster/opencl-book-samples/master/src/Chapter_2/HelloWorld/HelloWorld.cl
# g++ -I /opt/rocm/opencl/include/ ./HelloWorld.cpp -o HelloWorld -L/opt/rocm/opencl/lib/x86_64 -lOpenCL
# ./HelloWorld
0 3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60 63 66 69 72 75 78 81 84 87 90 93 96 99 102 105 108 111 114 117 120 ... 2985 2988 2991 2994 2997
Executed program succesfully.

• 12. AMDGPU ROCm Tensorflow 1.8 install memo (not supported on Ubuntu 18.04)
# uname -sr; tail -2 /etc/lsb-release
Linux 4.4.0-131-generic
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.5 LTS"
# lspci
17:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 67ef (rev cf)
# apt update
# apt dist-upgrade
# apt install -y libnuma-dev wget python3-pip
# sync; sync; sync; reboot
# wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | apt-key add -
# vi /etc/apt/sources.list.d/rocm.list
deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main
# apt update
# apt install -y rocm-dkms
# usermod -a -G video $LOGNAME
# sync; sync; sync; reboot
# apt install -y rocm-libs miopen-hip cxlactivitylogger
# sync; sync; sync; reboot
# wget http://repo.radeon.com/rocm/misc/tensorflow/tensorflow-1.8.0-cp35-cp35m-manylinux1_x86_64.whl
# pip3 install ./tensorflow-1.8.0-cp35-cp35m-manylinux1_x86_64.whl
# git clone https://github.com/tensorflow/models.git
# python3 classify_image.py
# cd ; git clone https://github.com/tensorflow/tensorflow.git
# cd tensorflow/
# python3 tensorflow/examples/speech_commands/train.py
# watch -n 1 /opt/rocm/bin/rocm-smi
==================== ROCm System Management Interface ====================
GPU  Temp  AvgPwr  SCLK     MCLK    Fan   Perf  SCLK OD  MCLK OD
0    35c   21.82W  1210Mhz  300Mhz  0.0%  auto  0%       0%
==================== End of ROCm SMI Log ====================
2018-09-02 10:40:10.368117: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1451] Found device 0 with properties:
name: Device 67ef
AMDGPU ISA: gfx803
memoryClockRate (GHz) 1.21
pciBusID 0000:17:00.0
Total memory: 2.00GiB
Free memory: 1.75GiB
Adding visible gpu devices: 0
Device interconnect
Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1567 MB memory) -> physical GPU (device: 0, name: Device 67ef, pci bus id: 0000:17:00.0)

• 13. AMDGPU ROCm Tensorflow 1.8 (classify_image.py)
# wget http://repo.radeon.com/rocm/misc/tensorflow/tensorflow-1.8.0-cp35-cp35m-manylinux1_x86_64.whl
# pip3 install ./tensorflow-1.8.0-cp35-cp35m-manylinux1_x86_64.whl
# git clone https://github.com/tensorflow/models.git
# python3 classify_image.py
2018-09-02 10:40:10.368117: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1451] Found device 0 with properties:
name: Device 67ef
AMDGPU ISA: gfx803
memoryClockRate (GHz) 1.21
pciBusID 0000:17:00.0
Total memory: 2.00GiB
Free memory: 1.75GiB
2018-09-02 10:40:10.368135: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1562] Adding visible gpu devices: 0
2018-09-02 10:40:10.368153: I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-02 10:40:10.368162: I tensorflow/core/common_runtime/gpu/gpu_device.cc:995] 0
2018-09-02 10:40:10.368175: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1008] 0: N
2018-09-02 10:40:10.368207: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1124] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1567 MB memory) -> physical GPU (device: 0, name: Device 67ef, pci bus id: 0000:17:00.0)
/opt/rocm/miopen/share/miopen/db/gfx803_14.cd.pdb.txt
giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca (score = 0.89107)
indri, indris, Indri indri, Indri brevicaudatus (score = 0.00779)
lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens (score = 0.00296)
custard apple (score = 0.00147)
earthstar (score = 0.00117)
#

• 14. AMDGPU ROCm Tensorflow 1.8 (speech_commands/train.py)
# git clone https://github.com/tensorflow/tensorflow.git
# cd tensorflow/
# python3 tensorflow/examples/speech_commands/train.py
2018-09-02 10:43:36.924800: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
AMDGPU ISA: gfx803
memoryClockRate (GHz) 1.21
pciBusID 0000:17:00.0
Total memory: 2.00GiB
Free memory: 1.75GiB
:
INFO:tensorflow:Step #1: rate 0.001000, accuracy 9.0%, cross entropy 2.724346
INFO:tensorflow:Step #2: rate 0.001000, accuracy 9.0%, cross entropy 2.521507
:
INFO:tensorflow:Saving to "/tmp/speech_commands_train/conv.ckpt-4300"
INFO:tensorflow:Step #4301: rate 0.001000, accuracy 65.0%, cross entropy 1.094288
INFO:tensorflow:Step #4302: rate 0.001000, accuracy 69.0%, cross entropy 0.876309
:
# /opt/rocm/bin/rocm-smi
GPU  Temp  AvgPwr   SCLK     MCLK     Fan   Perf  SCLK OD  MCLK OD
0    52c   44.230W  1172Mhz  1750Mhz  0.0%  auto  0%       0%
# top
top - 10:58:10 up 25 min, 2 users, load average: 1.51, 1.29, 0.89
Tasks: 222 total, 2 running, 220 sleeping, 0 stopped, 0 zombie
%Cpu0  : 6.2 us, 1.7 sy, 0.0 ni, 92.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1  : 5.6 us, 2.8 sy, 0.0 ni, 91.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2  : 8.3 us, 3.1 sy, 0.0 ni, 88.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3  : 6.4 us, 2.7 sy, 0.0 ni, 90.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4  : 9.8 us, 3.7 sy, 0.0 ni, 86.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5  : 8.4 us, 3.0 sy, 0.0 ni, 88.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6  : 5.4 us, 2.3 sy, 0.0 ni, 92.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7  : 3.4 us, 2.0 sy, 0.0 ni, 94.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu8  : 3.4 us, 1.7 sy, 0.0 ni, 94.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu9  : 3.7 us, 1.7 sy, 0.0 ni, 94.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu10 : 6.0 us, 2.7 sy, 0.0 ni, 91.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu11 : 4.4 us, 2.0 sy, 0.0 ni, 93.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st

• 16. In-Memory Computing for FASTDATA using fio with RAMDISK(DDR4)
# uname -sr; cat /etc/lsb-release
Linux 4.13.0-21-generic
DISTRIB_DESCRIPTION="Ubuntu 17.10"
# lshw -c cpu
product: Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz
# lshw -class memory
description: DIMM DDR4 Synchronous 2666 MHz (0.4 ns)
# mkdir /ramdisk
# mount -t tmpfs tmpfs /ramdisk
# fio -directory=/ramdisk -rw=read -bs=* -size=1G -numjobs=16 -runtime=10 -group_reporting -name=data
[Chart] 64GB RAMDISK with Core i7-7800X overclocked to 5GHz, x-axis = fio block size in bytes: measured 19.9M, 18.6M, 16.3M, 12.6M, 7.8M, 4.6M, 2.4M and 1.2M IOPS as the block size increases.

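The -bs=* above stands for a sweep over block sizes. A minimal way to script such a sweep is sketched below; the block-size list is an assumption, since the original sizes are not stated on the slide.

import subprocess

# Illustrative sweep of fio block sizes against the tmpfs mount created above.
for bs in ["512", "1k", "4k", "16k", "64k", "256k", "1m"]:
    subprocess.run(
        ["fio", "-directory=/ramdisk", "-rw=read", "-bs=" + bs,
         "-size=1G", "-numjobs=16", "-runtime=10",
         "-group_reporting", "-name=data"],
        check=True,
    )
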
• 17. How To Configure NVMe over Fabrics using MLNX_OFED <DRAFT>
NVMe Target Configuration
# ./mlnxofedinstall --add-kernel-support --with-nvmf
# modprobe mlx5_core
# modprobe nvmet
# modprobe nvmet-rdma
# modprobe nvme-rdma
# mkdir /sys/kernel/config/nvmet/subsystems/nvme-subsystem-name
# cd /sys/kernel/config/nvmet/subsystems/nvme-subsystem-name
# echo 1 > attr_allow_any_host
# mkdir namespaces/10
# cd namespaces/10
# echo -n /dev/nvme0n1 > device_path
# echo 1 > enable
# mkdir /sys/kernel/config/nvmet/ports/1
# cd /sys/kernel/config/nvmet/ports/1
# ip addr add 1.1.1.1/24 dev enp2s0f0
# echo 1.1.1.1 > addr_traddr
# echo rdma > addr_trtype
# echo 4420 > addr_trsvcid
# echo ipv4 > addr_adrfam
# ln -s /sys/kernel/config/nvmet/subsystems/nvme-subsystem-name /sys/kernel/config/nvmet/ports/1/subsystems/nvme-subsystem-name
NVMe Client (Initiator) Configuration
# ./mlnxofedinstall --add-kernel-support --with-nvmf
# modprobe mlx5_core
# modprobe nvme-rdma
# git clone https://github.com/linux-nvme/nvme-cli.git
# cd nvme-cli
# make
# make install
# nvme discover -t rdma -a 1.1.1.1 -s 4420
# nvme connect -t rdma -n nvme-subsystem-name -a 1.1.1.1 -s 4420
# nvme disconnect -d /dev/nvme0n1

• 18. Intel SPDK (Storage Performance Development Kit) benchmark
# uname -sr
Linux 4.10.0-40-generic
# apt-get install libnuma-dev git uuid-dev libaio-dev libcunit1-dev libcunit1 libssl-dev g++ -y
# cd /opt/; git clone https://github.com/axboe/fio
# cd fio; git checkout -b fio-2.21
# make; make install
# cd /opt/; git clone https://github.com/spdk/spdk
# cd spdk; git submodule update --init
# ./configure --with-fio=/opt/fio/
# make
# /opt/spdk/scripts/setup.sh
# fio --name=nvme --numjobs=8 --filename="trtype=PCIe traddr=0000.01.00.0 ns=1" --bs=4K --iodepth=4 --ioengine=/opt/spdk/examples/nvme/fio_plugin/fio_plugin --group_reporting --size=50% --runtime=100 --thread=8 --rw=read
nvme: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=spdk, iodepth=4
...
fio-3.2-19-g609ac1
Starting 8 threads
Starting DPDK 17.11.0 initialization...
[ DPDK EAL parameters: fio -c 0x1 -m 512 --file-prefix=spdk_pid18356 ]
EAL: Detected 8 lcore(s)
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: PCI device 0000:01:00.0 on NUMA socket 0
EAL: probe driver: 8086:2700 spdk_nvme
nvme: (groupid=0, jobs=8): err= 0: pid=18367: Mon Nov 27 15:36:06 2017
  read: IOPS=572k, BW=2236MiB/s (2345MB/s)(218GiB/100001msec)
  slat (nsec): min=91, max=471828, avg=200.94, stdev=122.85
  clat (usec): min=9, max=13319, avg=55.44, stdev= 7.84
  lat (usec): min=14, max=13319, avg=55.64, stdev= 7.84
  clat percentiles (usec):
   |  1.00th=[  48],  5.00th=[  50], 10.00th=[  50], 20.00th=[  51],
   | 30.00th=[  52], 40.00th=[  53], 50.00th=[  53], 60.00th=[  54],
   | 70.00th=[  56], 80.00th=[  60], 90.00th=[  64], 95.00th=[  67],
   | 99.00th=[  88], 99.50th=[  91], 99.90th=[ 100], 99.95th=[ 111],
   | 99.99th=[ 121]
  bw ( KiB/s): min=242664, max=310392, per=12.50%, avg=286296.77, stdev=11653.87, samples=1592
  iops       : min=60666, max=77598, avg=71574.18, stdev=2913.46, samples=1592
  lat (usec) : 10=0.01%, 20=0.01%, 50=9.44%, 100=90.46%, 250=0.09%
  lat (usec) : 500=0.01%, 750=0.01%
  lat (msec) : 2=0.01%, 20=0.01%

• 19. In-Memory Database Registration Performance Check (Intel vs AMD)
Purley# uname -sr; cat /etc/redhat-release
Linux 3.10.0-514.el7.x86_64
CentOS Linux release 7.3.1611 (Core)
Purley# grep proc /proc/cpuinfo | wc -l
48
Purley# lscpu
Model name: Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz
RYZEN# uname -sr; cat /etc/debian_version
Linux 4.10.0-19-generic
stretch/sid
RYZEN# grep proc /proc/cpuinfo | wc -l
16
RYZEN# lscpu
Model name: AMD Ryzen 7 1800X Eight-Core Processor
redis shows a drop in per-process throughput as the stored data size grows.

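One way to reproduce that per-value-size trend is sketched below. It assumes the redis-py client and a redis-server on localhost, and the value sizes and request count are illustrative; the original measurement tool is not shown on this slide.

import time
import redis  # redis-py client (assumed to be installed)

r = redis.Redis(host="localhost", port=6379)

# Measure GET throughput for a few value sizes and watch how it degrades.
for size in (2, 64, 1024, 16384):
    r.set("k", b"x" * size)
    n = 100000
    start = time.time()
    for _ in range(n):
        r.get("k")
    elapsed = time.time() - start
    print("value %6d bytes: %.0f GET/s" % (size, n / elapsed))
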
• 20. In-Memory Database Performance Check
(Compared platforms: Intel Purley / AMD Ryzen / Xeon Phi(KNL))
# uname -sr; cat /etc/redhat-release
Linux 3.10.0-514.el7.x86_64
CentOS Linux release 7.3.1611 (Core)
# grep proc /proc/cpuinfo | wc -l
48
# lscpu
Model name: Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz

• 21. ALL FLASH DATACENTER & IN-MEMORY COMPUTING: HOT TOPICS
SOURCE: SAKURA Internet Research Center. (2017/10), Project Sprig.

• 22. ClickHouse column-oriented database install memo
# uname -sr; cat /etc/issue
Linux 4.10.0-35-generic
Ubuntu 17.04
# apt install software-properties-common
# apt-key adv --keyserver keyserver.ubuntu.com --recv E0C56BD4
# apt-add-repository "deb http://repo.yandex.ru/clickhouse/trusty stable main"
# apt-get update
# apt-get install clickhouse-server-common clickhouse-client -y
# service clickhouse-server start
# clickhouse-client --multiline
ClickHouse client version 1.1.54304.
Connecting to localhost:9000.
Connected to ClickHouse server version 1.1.54304.
:) CREATE TABLE ontime ( Year UInt16, Quarter UInt8, Month UInt8, : Div5TailNum String ) ENGINE = MergeTree(FlightDate, (Year, FlightDate), 8192);
or
# xz -v -c -d < ontime.csv.xz | clickhouse-client --query="INSERT INTO ontime FORMAT CSV"

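After loading, a quick sanity check can be run through clickhouse-client. The query below is an illustrative example only; the column names follow the CREATE TABLE excerpt above.

import subprocess

# Count the loaded rows per year via clickhouse-client (illustrative query).
query = "SELECT Year, count() FROM ontime GROUP BY Year ORDER BY Year"
print(subprocess.check_output(["clickhouse-client", "--query", query]).decode())
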
• 23. MariaDB ColumnStore column-oriented database install memo
# uname -sr; cat /etc/redhat-release
Linux 3.10.0-514.el7.x86_64
Red Hat Enterprise Linux Server release 7.4 (Maipo)
# mkdir mcs; cd mcs
# wget https://downloads.mariadb.com/ColumnStore/1.0.11/centos/x86_64/7/mariadb-columnstore-1.0.11-1-centos7.x86_64.rpm.tar.gz
# tar xzvf ./mariadb-columnstore-1.0.11-1-centos7.x86_64.rpm.tar.gz
# yum install boost boost-devel boost-doc expect perl-DBD-MySQL -y
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-common.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-client.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-server.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-libs.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-shared.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-gssapi-client.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-gssapi-server.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-platform.rpm -vh
# rpm -i mariadb-columnstore-1.0.11-1-x86_64-centos7-storage-engine.rpm -vh
# /usr/local/mariadb/columnstore/bin/postConfigure
Select the type of System Server install [1=single, 2=multi] (2) > 1
Enter System Name (columnstore-1) > sprig-1
Select the type of Data Storage [1=internal, 2=external, 3=GlusterFS] (1) > 1
Enter the list (Nx,Ny,Nz) or range (Nx-Nz) of DBRoot IDs assigned to module 'pm1' (1) > 1
# . /usr/local/mariadb/columnstore/bin/columnstoreAlias
# mcsadmin
MariaDB ColumnStore Admin Console
enter 'help' for list of commands
enter 'exit' to exit the MariaDB ColumnStore Command Console
use up/down arrows to recall commands
mcsadmin>

• 25. Quagga with ROUTE_MULTIPATH (memo)
# uname -sr; cat /etc/lsb-release
Linux 4.13.0-21-generic
DISTRIB_DESCRIPTION="Ubuntu 17.10"
# grep ROUTE_MULTIPATH /usr/src/*/.config
CONFIG_IP_ROUTE_MULTIPATH=y
# apt-get install -y quagga traceroute
# vi /etc/sysctl.conf
net.ipv4.conf.all.forwarding=1
net.ipv4.fib_multipath_hash_policy = 1
net.ipv4.conf.all.arp_announce = 2
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.default.arp_filter = 1
net.ipv6.conf.all.forwarding=1
net.ipv6.route.max_size = 32768
net.ipv6.xfrm6_gc_thresh = 32768
# touch /etc/quagga/zebra.conf
# touch /etc/quagga/ospfd.conf
# touch /etc/quagga/ospf6d.conf
# chown quagga.quaggavty /etc/quagga/*.conf
# chmod 640 /etc/quagga/*.conf
# ufw disable
# vi /etc/quagga/daemons
zebra=yes
ospfd=yes
ospf6d=yes
# echo VTYSH_PAGER=more >> /etc/environment
# sync; sync; sync; reboot
# vtysh

• 26. My First XDP (eXpress Data Path)
# uname -sr; cat /etc/lsb-release
Linux 4.13.0-21-generic
DISTRIB_DESCRIPTION="Ubuntu 17.10"
# apt install -y make gcc libssl-dev bc libelf-dev libcap-dev clang
# apt install -y gcc-multilib llvm libncurses5-dev git bison flex pkg-config
# apt install -y libmnl0 libmnl-dev clang libasm1 libasm-dev
# mkdir /usr/local/include/asm
# ln -s /usr/include/x86_64-linux-gnu/asm/* /usr/local/include/asm
# git clone git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git
# cd iproute2/
# ./configure --prefix=/sbin
# make; make install
# vi xdp_example.c
#include <linux/bpf.h>
#ifndef __section
# define __section(NAME) __attribute__((section(NAME), used))
#endif
__section("prog")
int xdp_drop(struct xdp_md *ctx)
{
    return XDP_DROP;
}
char __license[] __section("license") = "GPL";
# clang -O2 -Wall -target bpf -c xdp_example.c -o xdp_example.o
# ip link set dev eth0 xdp obj xdp_example.o
# ip link set dev eth0 xdp off
SOURCE: https://github.com/torvalds/linux/tree/master/samples/bpf, http://cilium.readthedocs.io/en/latest/bpf/#llvm, http://vger.kernel.org/netconf2017_files/XDP_devel_update_NetConf2017_Seoul.pdf, http://prototype-kernel.readthedocs.io/en/latest/blogposts/xdp25_eval_generic_xdp_tx.html, https://netdevconf.org/1.2/slides/oct7/10_nic_viljoen_eBPF_Offload_to_Hardware__cls_bpf_and_XDP_finalised.pdf, https://people.netfilter.org/hawk/presentations/NetDev2.2_2017/XDP_for_the_Rest_of_Us_Part_2.pdf

• 27. My First F-Stack
# lscpu
Model name: AMD Ryzen Threadripper 1900X 8-Core Processor
# uname -sr; cat /etc/lsb-release
Linux 4.10.0-35-generic
DISTRIB_CODENAME=zesty
DISTRIB_DESCRIPTION="Ubuntu 17.04"
# cd /opt
# git clone https://github.com/F-Stack/f-stack.git
# /opt/f-stack/dpdk/tools/dpdk-setup.sh
[15] x86_64-native-linuxapp-gcc
Option: 15
# echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
# mkdir /mnt/huge
# mount -t hugetlbfs nodev /mnt/huge
# echo 0 > /proc/sys/kernel/randomize_va_space
# modprobe uio
# insmod /opt/f-stack/dpdk/x86_64-native-linuxapp-gcc/kmod/igb_uio.ko
# insmod /opt/f-stack/dpdk/x86_64-native-linuxapp-gcc/kmod/rte_kni.ko
# export FF_PATH=/opt/f-stack/
# export FF_DPDK=/opt/f-stack/dpdk/x86_64-native-linuxapp-gcc/
# cd /root/f-stack/lib
# make; make; make; make install
# cd /opt/f-stack/app/nginx-1.11.10
# ./configure --prefix=/usr/local/nginx_fstack --with-ff_module --without-http_rewrite_module
# make
# make install
# grep f-stack /usr/local/nginx_fstack/conf/nginx.conf
fstack_conf f-stack.conf;
# grep addr /usr/local/nginx_fstack/conf/f-stack.conf
addr=192.168.1.2
Copyright © 2018. Tencent Cloud All rights reserved.

• 28. My First FD.io VPP (Segment Routing for IPv6 / L3VPN for IPv4 traffic)
# uname -sr; cat /etc/lsb-release
Linux 4.13.0-21-generic
DISTRIB_DESCRIPTION="Ubuntu 17.10"
# vi /etc/apt/sources.list.d/99fd.io.list
deb [trusted=yes] https://nexus.fd.io/content/repositories/fd.io.ubuntu.xenial.main/ ./
# apt-get update
# apt-get install -y vpp-lib vpp vpp-plugins
# service vpp start
# service vpp status
● vpp.service - vector packet processing engine
  Loaded: loaded (/lib/systemd/system/vpp.service; enabled; vendor preset: enabled)
  Active: active (running) since Tue 2018-02-13 09:30:25 JST; 21s ago
  :
  CGroup: /system.slice/vpp.service
          └─2011 /usr/bin/vpp -c /etc/vpp/startup.conf
# vppctl
vpp# set sr encaps source addr C1::
vpp# sr policy add bsid C1::999:2 next C2:: next C4::4 encap
vpp# sr steer l3 1.1.1.0/24 via sr policy bsid C1::999:2
:
vpp# sr localsid address C4::4 behavior end.dx4 GigabitEthernet0/6/0 1.1.1.1
vpp# show sr localsid
SRv6 - My LocalSID Table:
=========================
Address: c4::4
Behavior: DX4 (Endpoint with decapsulation and IPv4 cross-connect)
Iface: GigabitEthernet0/6/0
Next hop: 1.1.1.1
SOURCE: VPP/Segment Routing for IPv6 (https://wiki.fd.io/view/VPP/Segment_Routing_for_IPv6)
© 2017 FD.io is a Linux Foundation Project. All Rights Reserved.

• 29. FD.io VPP with XeonPhi (Basic Configuration)
# uname -sr; cat /etc/lsb-release
Linux 4.13.0-21-generic
DISTRIB_DESCRIPTION="Ubuntu 17.10"
# lscpu
CPU(s): 256
Model name: Intel(R) Xeon Phi(TM) CPU 7210 @ 1.30GHz
# vi /etc/apt/sources.list.d/99fd.io.list
deb [trusted=yes] https://nexus.fd.io/content/repositories/fd.io.ubuntu.xenial.main/ ./
# apt-get update
# apt install vpp vpp-lib vpp-plugins python-pip
# pip install vpp-config
# vpp-config
5) Execute some basic tests.
Command: 5
1) List/Create Simple IPv4 Setup
Command: 1
Would you like to keep this configuration [Y/n]? n
Would you like add address to interface GigabitEthernet4/0/1 [Y/n]? Y
Please enter the IPv4 Address [n.n.n.n/n]: 1.1.1.11/24
# vi /etc/vpp/startup.conf
unix {
  nodaemon
  log /var/log/vpp/vpp.log
  full-coredump
  cli-listen /run/vpp/cli.sock
  exec /usr/local/vpp/vpp-config/scripts/set_int_ipv4_and_up
}
# sync; sync; sync; reboot
# vppctl
# show int
Name                  Idx  State
GigabitEthernet4/0/1  1    up
# show int addr
GigabitEthernet4/0/1 (up):
  1.1.1.11/24
© 2017 FD.io is a Linux Foundation Project. All Rights Reserved.

• 30. FD.io VPP with XeonPhi (Load Balancer plugin)
# vppctl
# show int addr
GigabitEthernet4/0/1 (up):
  1.1.1.11/24
# lb conf ip4-src-address 1.1.1.11 timeout 3
# lb vip 1.2.3.4/32 encap gre4 new_len 1024
# lb as 1.2.3.4/32 1.1.1.8 1.1.1.9 1.1.1.10
# show lb vips
1.2.3.4 ip4-gre4 1.2.3.4/32 new_size:1024 #as:3
Application Server (1.1.1.8, 1.1.1.9, 1.1.1.10) side configuration:
# ip tunnel add tun0 mode gre local 1.1.1.8 remote 1.1.1.11 ttl 255
# ifconfig tun0 1.2.3.4/32 up
# echo 1 > /proc/sys/net/ipv4/conf/tun0/arp_ignore
# echo 2 > /proc/sys/net/ipv4/conf/tun0/arp_announce
# echo 0 > /proc/sys/net/ipv4/conf/tun0/rp_filter
# echo 0 > /proc/sys/net/ipv4/conf/all/rp_filter
# echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore
# echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce
(Diagram: FD.io VPP at 1.1.1.11 forwards the VIP 1.2.3.4/32 to the application servers over GRE tunnels; each server terminates 1.2.3.4/32 on tun0 and answers clients directly, i.e. Direct Server Response (DSR).)
© 2017 FD.io is a Linux Foundation Project. All Rights Reserved.

• 31. Quagga with ROUTE_MULTIPATH for BGP load balancing (memo)
# uname -sr; cat /etc/lsb-release
Linux 4.13.0-36-generic
DISTRIB_DESCRIPTION="Ubuntu 17.10"
# grep ROUTE_MULTIPATH /usr/src/*/.config
/usr/src/linux-headers-4.13.0-36-generic/.config:CONFIG_IP_ROUTE_MULTIPATH=y
# apt install -y quagga traceroute
# touch /etc/quagga/zebra.conf; touch /etc/quagga/bgpd.conf; chown quagga.quaggavty /etc/quagga/*.conf
# chmod 640 /etc/quagga/*.conf
# ufw disable; echo VTYSH_PAGER=more >> /etc/environment
# vi /etc/quagga/daemons
zebra=yes
bgpd=yes
# sync; sync; sync; reboot
# vtysh
Router AS 65001:
# router bgp 65001
# bgp router-id 1.1.1.1
# bgp bestpath as-path multipath-relax
# bgp bestpath compare-routerid
# redistribute connected
# neighbor 1.1.1.2 remote-as 65002
# neighbor 1.1.1.3 remote-as 65003
# maximum-paths 64
Router AS 65002:
# interface lo
# ip address 1.2.3.4/24
# router bgp 65002
# bgp router-id 1.1.1.2
# bgp bestpath as-path multipath-relax
# bgp bestpath compare-routerid
# redistribute connected
# neighbor 1.1.1.1 remote-as 65001
# maximum-paths 64
Router AS 65003:
# interface lo
# ip address 1.2.3.4/24
# router bgp 65003
# bgp router-id 1.1.1.3
# bgp bestpath as-path multipath-relax
# bgp bestpath compare-routerid
# redistribute connected
# neighbor 1.1.1.1 remote-as 65001
# maximum-paths 64
# show ip bgp
BGP table version is 0, local router ID is 1.1.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
   Network        Next Hop   Metric  LocPrf  Weight  Path
*> 1.2.3.0/24     1.1.1.2    0               0       65002
*=                1.1.1.3    0               0       65003

• 32. FD.io VPP tap-inject with sample_plugins
# uname -sr; cat /etc/lsb-release
Linux 4.13.0-37-generic
DISTRIB_DESCRIPTION="Ubuntu 17.10"
# echo VTYSH_PAGER=more >> /etc/environment
# apt install -y quagga
# touch /etc/quagga/zebra.conf
# touch /etc/quagga/bgpd.conf
# chown quagga.quaggavty /etc/quagga/*.conf
# chmod 640 /etc/quagga/*.conf
# ufw disable
# vi /etc/quagga/daemons
zebra=yes
bgpd=yes
# sync; sync; sync; reboot
# apt install build-essential -y
# cd /opt/
# git clone https://gerrit.fd.io/r/vpp
# git clone https://gerrit.fd.io/r/vppsb
# cd /opt/vpp
# ./extras/vagrant/build.sh
# make install-dep; make bootstrap; make build
# vi /opt/vppsb/router/router/tap_inject_node.c
#include <sys/uio.h>
# ln -sf /opt/vppsb/netlink
# ln -sf /opt/vppsb/router
# ln -sf /opt/vppsb/netlink/netlink.mk build-data/packages/
# ln -sf /opt/vppsb/router/router.mk build-data/packages/
# cd build-root/
# make V=0 PLATFORM=vpp TAG=vpp_debug netlink-install router-install
# dpkg -i *.deb
# cp -p /opt/vpp/build-root/install-vpp_debug-native/router/lib64/router.so.0.0.0 /usr/lib/vpp_plugins/router.so
# service vpp restart
# vppctl enable tap-inject
# vppctl show tap-inject
GigabitEthernet13/0/0 -> vpp1
GigabitEthernetb/0/0 -> vpp0
# vtysh (quagga)
# configure terminal
(config)# interface vpp0
(config-if)# ip address 192.168.11.100/24
(config-if)# exit
(config)# exit
# write
# quit
# vppctl show int addr
GigabitEthernetb/0/0 (up):
  L3 192.168.11.100/24
  L3 fe80::20c:29ff:fe24:af28/64
# /opt/vpp/src/examples/sample-plugin
# libtoolize
# aclocal
# autoconf
# autoheader
# automake --add-missing
# chmod +x configure
# ./configure
# make
# make install
(Diagram: GigabitEthernetb/0/0 is exposed as vpp0; the vpp_plugins/router.so and vpp_plugins/sample_plugin.so plugins are loaded, and quagga configures vpp0.)
© 2017 FD.io is a Linux Foundation Project. All Rights Reserved.

• 33. FD.io VPP 18.07 with Ubuntu 16.04.5 LTS (not supported on Ubuntu 18.04)
# uname -sr; tail -2 /etc/lsb-release
Linux 4.4.0-131-generic
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.5 LTS"
# apt remove --purge vpp*
# vi /etc/apt/sources.list.d/99fd.io.list
deb [trusted=yes] https://nexus.fd.io/content/repositories/fd.io.stable.1807.ubuntu.xenial.main/ ./
# apt update
# apt dist-upgrade -y
# apt install -y vpp vpp-lib vpp-plugins vpp-dpdk-dkms
# vppctl show pci
Address       Sock  VID:PID    Link Speed   Driver           Product Name
0000:05:00.0  0     8086:1539  2.5 GT/s x1  uio_pci_generic
0000:65:00.0  0     8086:1584  8.0 GT/s x8  uio_pci_generic  XL710 40GbE Controller
# vi /etc/vpp/startup.conf
dpdk {
  dev 0000:65:00.0
}
# service vpp restart
# service vpp status
Active: active (running) since Tue 2018-09-04 18:50:02 JST; 2s ago
# vppctl set int ip address FortyGigabitEthernet65/0/0 1.2.3.4/24
# vppctl set int state FortyGigabitEthernet65/0/0 up
# vppctl show interface addr
FortyGigabitEthernet65/0/0 (up):
  L3 1.2.3.4/24
# vppctl show version
vpp v18.07-rc2~6-gdb6d6b3~b28 built by root on 10268b67c8b1 at Mon Jul 30 ...
# vi /etc/apt/sources.list
deb http://security.ubuntu.com/ubuntu bionic-security main
# apt update
# apt install libssl1.1 -y
Download from http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.19-rc2/
# dpkg -i linux-headers-4.19.0-041900rc2_...rc2.201809022230_all.deb
# dpkg -i linux-headers-4.19.0-041900rc2-...rc2.201809022230_amd64.deb
# dpkg -i linux-headers-4.19.0-041900rc2-...rc2.201809022230_amd64.deb
# dpkg -i linux-image-unsigned-4.19.0-......rc2.201809022230_amd64.deb
# sync; sync; sync; reboot
# uname -sr; tail -2 /etc/lsb-release
Linux 4.19.0-041900rc2-generic
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.5 LTS"
# vppctl show int
NOTICE: does not work with kernel 4.19-rc2
© 2018 The Fast Data Project. Copyright © 2018 FD.IO Project a Series of LF Projects, LLC

• 35. My First Intel/Movidius NCS
SOURCE: SAKURA Internet Research Center. (2017/07) Project Sprig.
$ sudo su
# apt-get update; apt-get upgrade -y
# mkdir /opt/mvncsdk; cd /opt/mvncsdk/
GoTo: https://developer.movidius.com/getting-started
# wget https://ncs-forum-uploads.s3.amazonaws.com/ncsdk/MvNC_SDK_01_07_07/MvNC_SDK_1.07.07.tgz
# tar zxvf MvNC_SDK_1.07.07.tgz; tar zxvf MvNC_Toolkit-1.07.06.tgz; tar xzvf ./MvNC_API-1.07.07.tgz
# ./bin/setup.sh; ./bin/data/dlnets.sh
# source ~/.bashrc
# cd /opt/mvncsdk/ncapi/; ./setup.sh; cd ./c_examples/; make
# ./ncs-fullcheck -l2 -c1 ../networks/AlexNet ../images/cat.jpg
Device 0 Address: 2 - VID/PID 03e7:2150
Starting wait for connect with 2000ms timeout
Found Address: 2 - VID/PID 03e7:2150
Found EP 0x81 : max packet size is 512 bytes
Found EP 0x01 : max packet size is 512 bytes
Found and opened device
Performing bulk write of 825136 bytes...
Successfully sent 825136 bytes of data in 35.764553 ms (22.002540 MB/s)
Boot successful, device address 2
Found Address: 2 - VID/PID 040e:f63b
done
Booted 2 -> VSC
OpenDevice 2 succeeded
Graph allocated
:
$ uname -sr
Linux 4.8.0-36-generic (Ubuntu 16.04.02)
$ lsusb -v
Device Descriptor:
  iProduct 2 Movidius MA2X5X
  MaxPower 500mA
© Copyright Movidius 2017. All Rights Reserved.

• 36. UP Board AI Core Configuration memo
# uname -sr; cat /etc/lsb-release
Linux 4.4.0-116-generic
DISTRIB_DESCRIPTION="Ubuntu 16.04.4 LTS"
# lshw
*-pci:1
  *-usb
    description: USB controller
    product: FL1100 USB 3.0 Host Controller
    *-usbhost:1
      *-usb UNCLAIMED
        description: Generic USB device
        product: Movidius MA2X5X
        vendor: Movidius Ltd.
# git clone -b ncsdk2 http://github.com/Movidius/ncsdk && cd ncsdk && make install
# export PYTHONPATH="${PYTHONPATH}:/opt/movidius/caffe/python"
# cd /examples/tensorflow/inception_v3
# cat run.py
image_filename = path_to_images + 'nps_electric_guitar.png'
devices = mvnc.enumerate_devices()
# python3 run.py
Number of categories: 1001
Start download to NCS...
*******************************************************************************
inception-v3 on NCS
*******************************************************************************
547 electric guitar 0.988281
403 acoustic guitar 0.00751877
715 pick, plectrum, plectron 0.0014801
421 banjo 0.000901222
820 stage 0.000654221
*******************************************************************************
Finished
Copyright 2018 Up Board | All Rights Reserved

• 37. USB 3.0 CAPTURE HDMI 4K with Loop-through for Image redistribution
# uname -sr; tail -1 /etc/redhat-release
Linux 3.10.0-862.9.1.el7.x86_64
CentOS Linux release 7.4.1708 (Core)
# yum install -y usbutils hwinfo mplayer v4l-utils ffmpeg git
# lsusb -t
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/10p, 5000M
   |__ Port 4: Dev 2, If 9, Class=Human Interface Device, Driver=usbhid, 5000M
# lsusb -vv
# hwinfo --usb
# v4l2-ctl --list-devices
USB Capture HDMI 4K+ (usb-0000:00:14.0-4):
  /dev/video0
# v4l2-ctl -d /dev/video0 --info
# v4l2-ctl --list-formats-ext -d /dev/video0
Type: Video Capture
Name: YUV 4:2:2 (YUYV)
Size: Discrete 4096x2160
Interval: Discrete 0.017s (60.000 fps)
# wget https://libav.org/releases/libav-12.3.tar.xz
# tar Jxvf ./libav-12.3.tar.xz; cd libav-12.3
# ./configure --disable-yasm; make; make install
# avconv -f video4linux2 -input_format nv12 -s 1920x1080 -i /dev/video0 -qscale 10 out.mpeg
Input #0, video4linux2, from '/dev/video0':
  Duration: N/A, start: 1240.062083, bitrate: 1492992 kb/s
  nv12, 1920x1080, 1492992 kb/s, 60 fps, 1000k tbn
# ffmpeg -f v4l2 -list_formats all -i /dev/video0
[video4linux2,v4l2 @ 0x24114c0] Raw : yuyv422 : YUV 4:2:2 (YUYV) : 640x360 640x480 720x480 720x576 768x576 800x600 856x480 960x540 1024x576 1024x768 1280x720 1280x800 1280x960 1280x1024 1368x768 1440x900 1600x1200 1680x1050 1920x1080 1920x1200 2048x1080 2560x1440 3840x2160 4096x2160
[video4linux2,v4l2 @ 0x24114c0] Raw : nv12 : YUV 4:2:0 (NV12) : 640x360 640x480 720x480 720x576 768x576 800x600 856x480 960x540 1024x576 1024x768 1280x720 1280x800 1280x960 1280x1024 1368x768 1440x900 1600x1200 1680x1050 1920x1080 1920x1200 2048x1080 2560x1440 3840x2160 4096x2160
(Diagram: the HDMI output of one server is read once (ORIGINAL) and redistributed to the following servers through HDMI loop-through, each stage producing a COPY with a USB 3.0 bus-powered HDMI capture device, e.g. to watch POWER ON/OS DOWN consoles.)
© 2018, Nanjing Magewell Electronics Co., Ltd

• 38. AMD Threadripper 1900X overview/spec
# uname -sr
Linux 4.10.0-19-generic
# vi /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="pci=noaer"
# update-grub; sync; sync; sync; reboot
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1
:
Model name: AMD Ryzen Threadripper 1900X 8-Core Processor
CPU MHz: 3800.000
CPU max MHz: 3800.0000
CPU min MHz: 2200.0000
BogoMIPS: 7585.39
Virtualization: AMD-V
L1d cache: 32K
L1i cache: 64K
L2 cache: 512K
L3 cache: 8192K
NUMA node0 CPU(s): 0-15
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx cpb hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic overflow_recov succor smca

• 39. AMD EPYC 7251 overview/spec
# uname -sr; cat /etc/redhat-release
Linux 3.10.0-693.5.2.el7.x86_64
CentOS Linux release 7.4.1708 (Core)
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 8
Vendor ID: AuthenticAMD
CPU family: 23
Model: 1
Model name: AMD EPYC 7251 8-Core Processor
Stepping: 2
CPU MHz: 1200.000
CPU max MHz: 2100.0000
CPU min MHz: 1200.0000
BogoMIPS: 4199.47
Virtualization: AMD-V
L1d cache: 32K
L1i cache: 64K
L2 cache: 512K
L3 cache: 4096K
NUMA node0 CPU(s): 0,1,16,17
NUMA node1 CPU(s): 2,3,18,19
NUMA node2 CPU(s): 4,5,20,21
NUMA node3 CPU(s): 6,7,22,23
NUMA node4 CPU(s): 8,9,24,25
NUMA node5 CPU(s): 10,11,26,27
NUMA node6 CPU(s): 12,13,28,29
NUMA node7 CPU(s): 14,15,30,31
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc art rep_good nopl nonstop_tsc extd_apicid amd_dcm aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 cpb hw_pstate avic fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold overflow_recov succor smca

• 40. Appendix: SSD DC P4800X and cost/performance analysis
(Cost) 512GB DDR4 DRAM (2666MHz/ECC): $6,399.68 / 192GB DDR4 DRAM (2666MHz/ECC): $2,399.88 / 750GB 3D XPoint/NVMe SSD (P4800X x2): $3,790.00
In-Memory Computing: 30GB (300M records) sort: 296 sec
# gensort -a 300000000 test
# time sort --parallel=52 -T /ramdisk test -o out
All Flash Computing: 200GB (2,000M records) sort: 4,648 sec
# gensort -a 2000000000 test
# time sort --parallel=52 -T /memdrv test -o out
All Flash vs In-Memory: Processing Size 6.7x, Processing Cost 1.5x, Processing Time 15x
512GB vs 192GB In-Memory: Processing Size 2.7x, Processing Cost 2.7x, Processing Time N/A
SOURCE: © 2016 Colfax International, © 2000-2017 Newegg Inc. / SAKURA Internet Research Center. (08/2017) Project Sprig.

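The multipliers can be cross-checked against the raw figures on this slide; the arithmetic sketch below simply recomputes them, and the pairing of numbers to labels follows the reading above.

# Cross-check of the cost/performance multipliers from the figures on this slide.
flash_size_gb, dram_size_gb = 200, 30            # sorted data set sizes
flash_cost, dram192_cost = 3790.00, 2399.88      # P4800X x2 vs 192GB DDR4
flash_time, dram_time = 4648, 296                # sort wall-clock seconds

print(round(flash_size_gb / dram_size_gb, 1))    # 6.7  -> "Processing Size 6.7x"
print(round(flash_cost / dram192_cost, 1))       # 1.6  -> roughly the "1.5x" cost figure
print(round(flash_time / dram_time, 1))          # 15.7 -> "Processing Time 15x"

dram512_cost = 6399.68
print(round(512 / 192, 1))                       # 2.7  -> "Processing Size 2.7x"
print(round(dram512_cost / dram192_cost, 1))     # 2.7  -> "Processing Cost 2.7x"
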
• 41. Appendix: stream_openmp performance check
[Chart] stream_openmp results compared across Xeon Phi, AMD RYZEN and two Xeon platforms.
Special Thanks: Takefumi Miyoshi

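stream_openmp reports sustained memory bandwidth. As a rough illustration of what the benchmark measures, here is a NumPy stand-in for the STREAM "Copy" kernel; this is an illustrative sketch, not the stream_openmp binary used for the chart.

import time
import numpy as np

# Rough stand-in for the STREAM "Copy" kernel: c[:] = a, reported in GB/s.
n = 100000000                       # ~0.8 GB per float64 array
a = np.random.rand(n)
c = np.empty_like(a)

start = time.time()
c[:] = a
elapsed = time.time() - start
moved_bytes = 2 * a.nbytes          # one read + one write per element
print("%.1f GB/s" % (moved_bytes / elapsed / 1e9))
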
• 42. Appendix: Network Application Benchmark result (iperf with 20 servers)

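Only the resulting chart appears on this slide. A hedged sketch of how such a run can be driven is shown below; the host list, 10-second duration and 4 parallel streams are assumptions, not the original test harness.

import subprocess

# Launch iperf clients against a list of servers in parallel and print each summary line.
servers = ["192.168.0.%d" % i for i in range(101, 121)]   # 20 hypothetical servers
procs = [subprocess.Popen(["iperf", "-c", host, "-t", "10", "-P", "4"],
                          stdout=subprocess.PIPE) for host in servers]
for host, p in zip(servers, procs):
    out, _ = p.communicate()
    print(host, out.decode().splitlines()[-1])   # last line carries the summary bandwidth
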
• 43. How to measure your dataflow using fio, pktgen and bandwidthTest
Measured on Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz hosts:
  RAMDISK (DDR4 2133MHz 16GB x4): WRITE 12,648MB/s, READ 13,793MB/s (bs=256KB)
  Mellanox ConnectX-4 40GbE: 40Mpps (pkt=64B) = 2,560MB/s; 40GbE max rate = 5,000MB/s
  GeForce GTX 1050: Host to Device 6,029MB/s, Device to Host 6,448MB/s
  Intel Optane 900P (3DXP): WRITE 2,000MB/s, READ 2,500MB/s (bs=4KB)
RAMDISK / fio:
# mount -t tmpfs -o size=32G tmpfs /ramdisk
# fio --directory=/ramdisk --rw=write --bs=4k --size=1G --numjobs=3 --runtime=100 --group_reporting --name=data
DPDK pktgen:
# cd /opt
# git clone git://dpdk.org/dpdk
# git clone git://dpdk.org/apps/pktgen-dpdk
export RTE_SDK=/opt/dpdk
export RTE_TARGET=x86_64-native-linuxapp-gcc
# sysctl vm.nr_hugepages=2048
# cd /opt/dpdk
# make install T=x86_64-native-linuxapp-gcc
# /opt/dpdk/usertools/dpdk-devbind.py -u 0b:00.0
# /opt/dpdk/usertools/dpdk-devbind.py -u 13:00.0
# /opt/dpdk/usertools/dpdk-devbind.py -b igb_uio 0b:00.0
# /opt/dpdk/usertools/dpdk-devbind.py -b igb_uio 13:00.0
# /opt/dpdk/usertools/dpdk-devbind.py --status
# cd /opt/pktgen-dpdk/
# make
# /opt/pktgen-dpdk/tools/setup.sh
# /opt/pktgen-dpdk/app/x86_64-native-linuxapp-gcc/pktgen -- -m "1.0, 2.1"
CUDA bandwidthTest:
# bash cuda_9.1.85_387.26_linux.run --silent --toolkit --override --no-opengl-libs --driver
:
# cd NVIDIA_CUDA-9.1_Samples/1_Utilities/bandwidthTest
# ./bandwidthTest

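The 40GbE figures above follow from simple unit conversions; the short check below just redoes the arithmetic with the numbers quoted on this slide.

# Unit-conversion check for the 40GbE figures quoted above.
pps = 40000000                    # 40 Mpps at 64-byte packets
pkt_bytes = 64
print(pps * pkt_bytes / 1e6)      # 2560.0 -> "2,560MB/s"

line_rate_bit = 40e9              # 40 Gbit/s
print(line_rate_bit / 8 / 1e6)    # 5000.0 -> "5,000MB/s (40GbE max rate)"
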