A Random Forest using a Multi-valued
Decision Diagram on an FPGA
Hiroki Nakahara¹, Akira Jinguji¹, Shimpei Sato¹,
Tsutomu Sasao²
¹Tokyo Institute of Technology, JP, ²Meiji University, JP
May 22, 2017
@ISMVL2017
Outline
• Background
• Random forest (RF)
• Multi-valued decision diagram (MDD)
• RF using MDDs
• Experimental results
• Conclusion
Machine Learning
Abundant computation power and big data
(Left): “Single-Threaded Integer Performance,” 2016
(Right): Nakahara, “Trend of Search Engine on modern Internet,” 2014
Machine Learning Algorithms
M. Warrick, “How to get started with machine learning,” PyCon2014
Introduction
• Random Forest (RF)
• Ensemble learning method
• Consists of multiple decision trees (DTs)
• Applications: segmentation, human pose detection
• It is based on binary DTs (BDTs)
• A node is evaluated by an if-then-else statement
• The same variable may appear several times on a path
• Multi-valued decision diagram (MDD)
• Each variable appears only once on a path
Introduction (Contʼd)
• Target platform
• CPU: Too slow
• GPU: Not well suited to the RF → slow, and consumes much power
• FPGA: Faster and low power, but long design turnaround time (TAT)
• High-level synthesis (HLS) for the RF using MDDs on an FPGA
• Low power, high performance, short design time
Random Forest
Classification by a Binary
Decision Tree (BDT)
• Partition of the feature map
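[Figure: partition of the (X1, X2) feature map into class regions C1/C2, with split points X2 = 0.29, 0.53 and X1 = 0.09, 0.63, 0.71, next to the corresponding BDT: root X2<0.53?, second level X2<0.29? and X1<0.09?, third level X1<0.63? and X1<0.71?, Y/N edges, and leaves labeled C1/C2]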
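The if-then-else classification shown on this slide can be sketched in Python. This is a minimal sketch, not the authors' code: only the thresholds come from the slide, the wiring of the tree is one assumed reading of the figure, and `Leaf`, `Node`, `classify` and `EXAMPLE_BDT` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    label: str  # class label, e.g. "C1"


@dataclass
class Node:
    var: int          # index of the tested feature (0 -> X1, 1 -> X2)
    threshold: float  # the node tests x[var] < threshold
    yes: "Tree"       # child taken when the test is true
    no: "Tree"        # child taken when the test is false


Tree = Union[Leaf, Node]


def classify(tree: Tree, x: Sequence[float]) -> str:
    """Follow if-then-else tests from the root until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.yes if x[tree.var] < tree.threshold else tree.no
    return tree.label


# Assumed wiring of the slide's tree (X1 = x[0], X2 = x[1]).
EXAMPLE_BDT = Node(1, 0.53,
    yes=Node(1, 0.29,
        yes=Node(0, 0.63, yes=Leaf("C2"), no=Leaf("C1")),
        no=Leaf("C1")),
    no=Node(0, 0.09,
        yes=Leaf("C1"),
        no=Node(0, 0.71, yes=Leaf("C2"), no=Leaf("C1"))))

print(classify(EXAMPLE_BDT, [0.50, 0.10]))  # X2 < 0.29, X1 < 0.63 -> C2
```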
Training of a BDT
• Each tree is built from randomized samples of the training data
• The dataset is recursively partitioned so as to maximize the information gain (the reduction in entropy) → the same variable may appear several times on a path
[Figure: the same feature-map partition and BDT as on the previous slide]
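As a sketch of one such split criterion, assuming entropy-based information gain over a single feature, the best threshold for a test `x < t` can be found as below. scikit-learn's `RandomForestClassifier` defaults to Gini impurity, so the criterion actually used here is an assumption; `entropy` and `best_split` are hypothetical helper names.

```python
import math
from collections import Counter
from typing import Optional, Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a multiset of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(xs: Sequence[float], labels: Sequence[str]) -> Tuple[Optional[float], float]:
    """Find the threshold t of the test `x < t` with the largest information gain.

    Candidate thresholds are midpoints between consecutive distinct values of xs.
    Returns (None, 0.0) if no split reduces the entropy.
    """
    n = len(labels)
    parent = entropy(labels)
    values = sorted(set(xs))
    best_t, best_gain = None, 0.0
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, labels) if x < t]
        right = [y for x, y in zip(xs, labels) if x >= t]
        # Weighted entropy of the two children after the split.
        child = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        gain = parent - child
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


# Hypothetical toy data for one feature: the classes separate cleanly.
t, gain = best_split([0.10, 0.20, 0.60, 0.70], ["C1", "C1", "C2", "C2"])
print(t, gain)
```

Applying `best_split` again to each side, with a fresh random choice of candidate features per node, is what makes the same variable reappear deeper on a path.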
Random Forest (RF)
• Ensemble learning
• Classification and regression
• Consists of multiple BDTs
[Figure: the input is fed to Tree 1, Tree 2, ..., Tree n, whose predicted classes (e.g., C1, C2, C1) go to a voter that outputs the class C1. Inset: Tree 1 drawn as a binary decision tree with tests such as X1<0.53?, X3<0.71?, X2<0.63?, X3<0.72? and leaves C1, C2, C3]
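The voter can be sketched as a plain majority vote. This assumes hard voting over class labels; the trees below are hypothetical one-node stumps standing in for trained BDTs, and `forest_predict` is a hypothetical name.

```python
from collections import Counter
from typing import Callable, List, Sequence

# A tree is modeled as any function mapping an input vector to a class label.
TreeFn = Callable[[Sequence[float]], str]


def forest_predict(trees: List[TreeFn], x: Sequence[float]) -> str:
    """Hard majority vote over the labels predicted by the trees.

    On a tie, Counter.most_common returns the label that was seen first.
    """
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]


# Three hypothetical one-node trees, echoing the C1/C2/C1 votes in the figure.
trees = [
    lambda x: "C1" if x[0] < 0.53 else "C2",
    lambda x: "C2" if x[2] < 0.71 else "C1",
    lambda x: "C1" if x[1] < 0.63 else "C3",
]
print(forest_predict(trees, [0.2, 0.5, 0.3]))  # votes C1, C2, C1 -> C1
```

Note that scikit-learn's `RandomForestClassifier.predict` averages the trees' class-probability estimates rather than counting hard votes, so a hard-vote voter can occasionally disagree with it on close calls.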
Applications
• Key point matching [Lepetit et al., 2006]
• Object detection [Shotton et al., 2008][Gall et al., 2011]
• Handwritten character recognition [Amit & Geman, 1997]
• Visual word clustering [Moosmann et al., 2006]
• Pose recognition [Yamashita et al., 2010]
• Human detection [Mitsui et al., 2011][Dahang et al., 2012]
• Human pose estimation [Shotton et al., 2011]
Known Problem
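• BDTs are built from randomized samples
• The same variable may appear several times on a path
• Evaluation tends to be slow, even on GPUs
[Figure: a BDT in which X2 is tested at the root (X2<0.53?) and again at both of its children (X2<0.29?, X2<0.09?), with further tests X1<0.63? and X1<0.71? and leaves C1/C2. Each node is evaluated as an if-then-else statement:]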
if X2 < 0.09 then
output C1;
else
goto Child_node;
Multi-valued Decision Diagram
Binary Decision Diagram (BDD)
• Recursively apply Shannon expansion to a
given logic function
• Non-terminal node: If-then-else statement
• Terminal node: Set functional value
[Figure: a BDD over x1, ..., x6 with terminal nodes 0 and 1, annotated with a non-terminal node and a terminal node]
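For reference, each non-terminal node of the BDD implements one step of the Shannon expansion of a function f with respect to a variable x_i:

$$
f(x_1,\dots,x_n) \;=\; \bar{x}_i \cdot f\big|_{x_i=0} \;\vee\; x_i \cdot f\big|_{x_i=1}
$$

Here f|_{x_i=0} and f|_{x_i=1} are the cofactors of f; expanding them recursively on x_1, x_2, ... until only constants remain yields the terminal nodes 0 and 1.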
Measures of a BDD
Memory size: # of nodes × size of a node
Worst-case performance: LPL (longest path length)
→ Dedicated fully pipelined hardware
[Figure: the BDD over x1, ..., x6 from the previous slide]
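The two measures can be computed for a small decision diagram as follows. This is a sketch assuming the diagram is given as an adjacency map, with LPL counted in edges from the root to a terminal; the example BDD and the 32-bit node size are hypothetical.

```python
from functools import lru_cache
from typing import Dict, List


def longest_path_length(children: Dict[str, List[str]], root: str) -> int:
    """Edges on the longest root-to-terminal path of a decision diagram.

    `children` maps each non-terminal node to its successors; any node that
    is not a key (e.g. the constants "0" and "1") is treated as a terminal.
    Shared sub-diagrams are evaluated once thanks to memoization.
    """
    @lru_cache(maxsize=None)
    def lpl(node: str) -> int:
        succ = children.get(node, [])
        return 1 + max(lpl(s) for s in succ) if succ else 0

    return lpl(root)


def memory_bits(num_nodes: int, bits_per_node: int) -> int:
    """Memory size = (# of nodes) x (size of a node)."""
    return num_nodes * bits_per_node


# Hypothetical BDD over x1..x6 with terminals "0" and "1".
BDD = {
    "x1": ["x2", "x3"], "x2": ["x3", "0"], "x3": ["x4", "1"],
    "x4": ["x5", "0"], "x5": ["x6", "1"], "x6": ["0", "1"],
}
print(longest_path_length(BDD, "x1"))  # 6
print(memory_bits(len(BDD), 32))       # 192, assuming 32-bit nodes
```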
Multi-Valued Decision Diagram (MDD)
• MDD(k): 2^k outgoing edges per node
• Evaluates k variables at a time
[Figure: a BDD over x1, ..., x6 (terminals 0 and 1) and the equivalent MDD(2) with multi-valued variables X1 = {x1, x2}, X2 = {x3, x4}, X3 = {x5, x6}]
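A sketch of MDD(k) evaluation, assuming the binary variables are grouped in order into k-bit multi-valued variables, as in the MDD(2) figure (X1 = {x1, x2}, X2 = {x3, x4}, X3 = {x5, x6}). Each non-terminal node reads one k-bit value and follows one of its 2^k edges. The 6-input parity function is only a convenient test case, not taken from the slides, and all function names are hypothetical.

```python
from typing import List, Sequence


def group_bits(bits: Sequence[int], k: int) -> List[int]:
    """Pack binary inputs x1, x2, ... into k-bit values X1, X2, ...
    (x1 becomes the most significant bit of X1)."""
    if len(bits) % k != 0:
        raise ValueError("number of bits must be a multiple of k")
    values = []
    for i in range(0, len(bits), k):
        v = 0
        for b in bits[i:i + k]:
            v = (v << 1) | b
        values.append(v)
    return values


def eval_mdd(node, values: Sequence[int]) -> int:
    """Follow one 2**k-way edge per level until a terminal (an int) is reached."""
    while not isinstance(node, int):
        level, children = node
        node = children[values[level]]
    return node


def parity_mdd(n_levels: int, k: int):
    """Build an MDD(k) for the parity of n_levels * k bits, bottom-up.

    A non-terminal node is (level, children) with 2**k children. Two nodes per
    level suffice: one for 'even parity so far' and one for 'odd'.
    """
    below = [0, 1]  # terminals: final parity 0 or 1
    for level in reversed(range(n_levels)):
        below = [
            (level, [below[p ^ (bin(v).count("1") & 1)] for v in range(2 ** k)])
            for p in (0, 1)
        ]
    return below[0]  # the root starts with even parity


root = parity_mdd(n_levels=3, k=2)  # 6 binary variables, MDD(2)
bits = [1, 0, 1, 1, 0, 1]
print(eval_mdd(root, group_bits(bits, 2)))  # four ones -> parity 0
```

The path length drops from 6 binary tests to 3 multi-valued ones, at the cost of 4 edges per node instead of 2.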
Comparison of the BDT with the MDD
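[Figure: the BDT from the earlier slides (root X2<0.53?, internal tests X2<0.29?, X1<0.09?, X1<0.63?, X1<0.71?) beside the equivalent MDD, in which a single X2 node branches into X1 nodes and the terminals C1, C2; MDD edges are labeled with interval bounds (<0.29, <0.53, <0.63, <0.71, <1.00)]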
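The same classifier can be written both ways in Python to compare path lengths. Both structures are an assumed reading of the figures (thresholds from the slides, wiring guessed), and `bdt_classify`, `mdd_classify` and the node names `X1_low`/`X1_high` are hypothetical. The MDD splits each variable into intervals and uses `bisect.bisect_right` to pick the outgoing edge.

```python
from bisect import bisect_right
from typing import Sequence, Tuple


def bdt_classify(x1: float, x2: float) -> Tuple[str, int]:
    """Assumed BDT: returns (class, number of decision nodes visited)."""
    if x2 < 0.53:
        if x2 < 0.29:
            return ("C2" if x1 < 0.63 else "C1"), 3
        return "C1", 2
    if x1 < 0.09:
        return "C1", 2
    return ("C2" if x1 < 0.71 else "C1"), 3


# Assumed MDD: node -> (variable index, sorted interval bounds, children).
# children[i] is taken when the value lies in the i-th interval, i.e.
# i = number of bounds <= value, which matches the BDT's `x < t` tests.
MDD = {
    "X2": (1, [0.29, 0.53], ["X1_low", "C1", "X1_high"]),
    "X1_low": (0, [0.63], ["C2", "C1"]),
    "X1_high": (0, [0.09, 0.71], ["C1", "C2", "C1"]),
}


def mdd_classify(x: Sequence[float], root: str = "X2") -> Tuple[str, int]:
    """Evaluate the MDD: one multi-way branch per node, each variable read once."""
    node, visits = root, 0
    while node in MDD:
        var, bounds, children = MDD[node]
        node = children[bisect_right(bounds, x[var])]
        visits += 1
    return node, visits


print(bdt_classify(0.30, 0.90), mdd_classify([0.30, 0.90]))  # ('C2', 3) ('C2', 2)
```

On this example every input reaches a terminal after at most two MDD nodes versus three BDT nodes. In software each multi-way branch costs a binary search, so the saving mainly pays off when each level maps to a hardware pipeline stage.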
# of Nodes
[Figure: the (X1, X2) feature-map partition drawn twice, once for the BDT and once for the MDD, with the same class regions C1/C2]
Complexities of the BDT
and the MDD
BDT: # nodes O(Σ|Xi|), LPL O(Σ|Xi|)
MDD: # nodes O(|Xi|^k), LPL O(n)
The RF prefers shallow decision trees to avoid overfitting
Random Forest
using MDDs on an FPGA
FPGA (Field Programmable
Gate Array)
• Reconfigurable architecture
• Look-up Table (LUT)
• Configurable channels
• Advantages
• Faster than a CPU
• Lower power dissipation than a GPU
• Shorter design time than an ASIC
Fully Pipelined Circuit
[Figure: the input X is fed to Tree 1, Tree 2, ..., Tree b; their outputs (e.g., C1, C2, C1) go to a voter that outputs C1]
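A rough throughput model for such a pipeline, assuming one decision-diagram level per stage, all trees evaluated in parallel (so the deepest tree sets the depth), one extra stage for the voter, and one new input accepted per clock cycle. The stage count and the 200 MHz clock are hypothetical, and the function names are illustrative.

```python
def pipeline_cycles(num_inputs: int, depth: int) -> int:
    """Clock cycles for num_inputs to leave a depth-stage pipeline that accepts
    one new input per cycle: depth cycles to fill, then one result per cycle."""
    if num_inputs <= 0:
        return 0
    return depth + num_inputs - 1


def ideal_throughput(num_inputs: int, depth: int, clock_hz: float) -> float:
    """Classifications per second under the same idealized model."""
    return num_inputs * clock_hz / pipeline_cycles(num_inputs, depth)


# Hypothetical numbers: 20 decision-diagram levels + 1 voter stage, 200 MHz clock.
depth = 20 + 1
print(pipeline_cycles(10_000, depth))  # 10020
print(ideal_throughput(10_000, depth, 200e6))
```

The model ignores host-to-FPGA transfers, which the measured execution times in the experiments include.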
MUX-based Realization
System Design Tool
1. Behavior design + pragmas
2. Profile analysis
3. IP core generation by HLS
4. Bitstream generation by FPGA CAD tool
5. Middleware generation
↓
Automatically done
Proposed Tool Flow
Training dataset → scikit-learn (hyperparameters chosen by grid search) → random forest → RF2AOC → host code + kernel code
Host code → gcc → host binary (host PC)
Kernel code → aoc → .aocx binary (FPGA board)
Tools: scikit-learn, Intel SDK for OpenCL
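The slides do not detail how RF2AOC generates the kernel code. As a hypothetical illustration of the general idea only, a trained tree can be turned into C-style source by emitting nested if-else statements. The tuple tree format and `emit_c`/`emit_function` are assumptions, and a real generator would also need the OpenCL `__kernel` entry point, the voter, and the host code.

```python
from typing import List, Union

# Hypothetical tree format: a leaf is an int class index, a node is a tuple
# (var, threshold, yes_child, no_child) meaning `if (x[var] < threshold)`.
Tree = Union[int, tuple]


def emit_c(tree: Tree, indent: int = 1) -> List[str]:
    """Emit nested if-else statements (C syntax) for one decision tree."""
    pad = "    " * indent
    if isinstance(tree, int):
        return [f"{pad}return {tree};"]
    var, threshold, yes, no = tree
    lines = [f"{pad}if (x[{var}] < {threshold}f) {{"]
    lines += emit_c(yes, indent + 1)
    lines.append(f"{pad}}} else {{")
    lines += emit_c(no, indent + 1)
    lines.append(f"{pad}}}")
    return lines


def emit_function(name: str, tree: Tree) -> str:
    """Wrap the statements in a function returning the predicted class index."""
    return "\n".join([f"int {name}(const float *x) {{", *emit_c(tree), "}"])


print(emit_function("tree0", (1, 0.53, 0, (0, 0.09, 0, 1))))
```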
Experimental Results
Comparison of the MDD-based and BDT-based realizations

| Name | BDT path len. (perform.) | BDT #nodes (mem.) | Max. path | MDD path len. (perform.) | MDD #nodes (mem.) |
|---|---|---|---|---|---|
| Dermatology | 720 | 676 | 15 | 322 | 118336 |
| Contraceptive Method | 600 | 1055 | 9 | 198 | 7360 |
| Glass Identification | 952 | 1260 | 10 | 268 | 17204 |
| Hayes-Roth | 480 | 577 | 5 | 73 | 448 |
| Hepatitis | 720 | 1040 | 15 | 357 | 145664 |
| Ionosphere | 1196 | 1077 | 20 | 381 | 671744 |
| Iris | 1056 | 777 | 4 | 199 | 517 |
Dataset: UCI Machine Learning Repository
http://archive.ics.uci.edu/ml/datasets.html
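The trade-off in the table can be quantified with simple arithmetic on its columns: how much shorter the MDD paths are, and how much the node count grows. The values are copied from the table; the ratios are derived here and are not reported on the slide.

```python
# ((BDT path len., BDT #nodes), (MDD path len., MDD #nodes)), copied from the table.
RESULTS = {
    "Dermatology": ((720, 676), (322, 118336)),
    "Contraceptive Method": ((600, 1055), (198, 7360)),
    "Glass Identification": ((952, 1260), (268, 17204)),
    "Hayes-Roth": ((480, 577), (73, 448)),
    "Hepatitis": ((720, 1040), (357, 145664)),
    "Ionosphere": ((1196, 1077), (381, 671744)),
    "Iris": ((1056, 777), (199, 517)),
}

# Path-length reduction (BDT / MDD) and node-count growth (MDD / BDT).
PATH_RATIO = {n: bdt[0] / mdd[0] for n, (bdt, mdd) in RESULTS.items()}
NODE_RATIO = {n: mdd[1] / bdt[1] for n, (bdt, mdd) in RESULTS.items()}

for name in RESULTS:
    print(f"{name:22s} path {PATH_RATIO[name]:5.2f}x shorter, "
          f"nodes x{NODE_RATIO[name]:.2f}")
```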
Comparison of Platforms
• Implemented the RF on the following devices
• CPU: Intel Core i7 650
• GPU: NVIDIA GeForce GTX Titan
• FPGA: Terasic DE5-NET
• Measured dynamic power, including the host PC
• Test bench: 10,000 random vectors
• Execution time includes the communication time between the host PC and the devices
Comparison of Platforms
GPU: GeForce Titan @ 86 W; CPU: Xeon(R) E5607 @ 13 W; FPGA: Stratix V A7 @ 15 W

| Name | GPU LPS | GPU LPS/W | CPU LPS | CPU LPS/W | FPGA LPS | FPGA LPS/W |
|---|---|---|---|---|---|---|
| Dermatology | 336.2 | 3.9 | 211.6 | 16.3 | 3221.2 | 214.7 |
| Contraceptive Method | 521.9 | 6.1 | 286.4 | 22.0 | 10924.3 | 728.3 |
| Glass Identification | 726.7 | 8.5 | 587.5 | 45.2 | 6442.3 | 429.5 |
| Hayes-Roth | 1512.9 | 17.6 | 1165.5 | 89.7 | 12884.6 | 859.0 |
| Hepatitis | 739.1 | 8.6 | 662.7 | 51.0 | 8209.9 | 547.3 |
| Ionosphere | 821.0 | 9.5 | 595.9 | 45.8 | 9663.5 | 644.2 |
| Iris | 446.6 | 5.2 | 436.7 | 33.6 | 4831.7 | 322.1 |

LPS: # of looks per second
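The LPS/W columns follow from dividing LPS by each platform's power. The snippet below recomputes them from the LPS values and the stated power figures, and also prints per-dataset FPGA speedups; the speedups are derived arithmetic, not figures from the slides.

```python
POWER_W = {"GPU": 86.0, "CPU": 13.0, "FPGA": 15.0}

# LPS values copied from the table above.
LPS = {
    "Dermatology": {"GPU": 336.2, "CPU": 211.6, "FPGA": 3221.2},
    "Contraceptive Method": {"GPU": 521.9, "CPU": 286.4, "FPGA": 10924.3},
    "Glass Identification": {"GPU": 726.7, "CPU": 587.5, "FPGA": 6442.3},
    "Hayes-Roth": {"GPU": 1512.9, "CPU": 1165.5, "FPGA": 12884.6},
    "Hepatitis": {"GPU": 739.1, "CPU": 662.7, "FPGA": 8209.9},
    "Ionosphere": {"GPU": 821.0, "CPU": 595.9, "FPGA": 9663.5},
    "Iris": {"GPU": 446.6, "CPU": 436.7, "FPGA": 4831.7},
}


def lps_per_watt(lps: float, platform: str) -> float:
    """Energy efficiency: looks per second divided by the measured power."""
    return lps / POWER_W[platform]


for name, row in LPS.items():
    eff = {p: round(lps_per_watt(v, p), 1) for p, v in row.items()}
    vs_gpu = row["FPGA"] / row["GPU"]
    vs_cpu = row["FPGA"] / row["CPU"]
    print(f"{name:22s} LPS/W={eff}  FPGA/GPU={vs_gpu:.2f}x  FPGA/CPU={vs_cpu:.2f}x")
```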
Conclusion
• Proposed the RF using MDDs
• Reduced the path length
• Increased the column multiplicity
• # of nodes: O(|X|^k)
• Shallow decision diagrams are recommended to avoid overfitting
• Developed a high-level synthesis design flow for the FPGA realization
• 10.7x faster than the GPU
• 14.0x faster than the CPU