Hardware acceleration based on FPGAs has become increasingly popular in recent years, thanks to their availability as cloud commodities and the continuous improvement of High-Level Synthesis (HLS) tools, which simplify the adoption of this technology for software developers. HLS tools allow designers to generate several hardware implementations of a given function and to explore different trade-offs between performance and resource requirements. Nevertheless, when addressing complex applications featuring many candidate functions for acceleration, the designer faces new challenges and design opportunities. Indeed, the designer has to: map each candidate function to a specific hardware implementation, partition the kernel functions into one or more FPGA configurations, and schedule the FPGA reconfigurations throughout the execution of the application. In this work, we present an approach that automatically performs the mapping, partitioning and scheduling of the candidate functions for hardware acceleration so as to minimize the overall execution time of the application. Furthermore, based on the identified solution, we also provide automatic code transformations that integrate the original software code with calls to the accelerated hardware functions and with the FPGA reconfiguration requests.
Automatic mapping, partitioning and scheduling for hardware acceleration on FPGAs
Slide 1
DIPARTIMENTO DI ELETTRONICA,
INFORMAZIONE E BIOINGEGNERIA
AMPS
Automatic Mapping, Partitioning and Scheduling
for hardware acceleration on FPGAs
Mirko Salaris: mirko.salaris@mail.polimi.it
Marco Rabozzi: marco.rabozzi@polimi.it
May 17-31, 2019
NGCX at San Francisco
Slide 2
Steps for software acceleration on FPGA
• Software profiling
• Identification of candidate hardware functions
• Design Space Exploration of hardware function implementations
• Choice of the function to implement in hardware
Slide 3
Steps for software acceleration on FPGA
✓ All four steps are already supported by CAOS [1]
[1] CAOS: CAD as an Adaptive Open Platform Service, http://caos.necst.it
Slide 4
Steps for software acceleration on FPGA
What about the acceleration of multiple functions?
Slides 5-6
Steps for software acceleration on FPGA
✓ Software profiling
✓ Identification of candidate hardware functions
✓ Design Space Exploration of hardware function implementations
• Selection of the hardware function implementations
• Partitioning of the hardware functions into one or more bitstreams
• Scheduling of the FPGA reconfigurations
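The partitioning and scheduling steps interact: packing functions that are called close together into the same bitstream reduces the number of reconfigurations. A minimal sketch of this coupling, under a simple load-on-demand policy (the partition, trace and policy are illustrative, not the AMPS model):

```python
# Sketch: count the FPGA reconfigurations implied by a given partition of the
# hardware functions into bitstreams, under a load-on-demand schedule.
# All names and the partition below are illustrative.
partition = {          # bitstream id -> functions packed into that configuration
    0: {"funA", "funC"},
    1: {"funB", "funD"},
}
bitstream_of = {f: b for b, funcs in partition.items() for f in funcs}

def count_reconfigurations(trace, bitstream_of):
    """Reconfigure only when the next hardware call needs a bitstream that is
    not currently loaded; calls to software-only functions are skipped."""
    loaded, reconfs = None, 0
    for call in trace:
        needed = bitstream_of.get(call)       # None -> runs in software
        if needed is not None and needed != loaded:
            loaded, reconfs = needed, reconfs + 1
    return reconfs

trace = ["funB", "funD", "funA", "funA", "funB", "funD", "funC"]
print(count_reconfigurations(trace, bitstream_of))   # prints 4
```

With funB and funD packed into the same bitstream, their alternating calls never trigger a reconfiguration; splitting them across bitstreams would make every alternation pay the reconfiguration cost.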
Slide 8
AMPS – Profiling

Function   Self Time %   Total Time %
funA       98.71%        32.26%
funB       92.65%        12.83%
funC       89.26%        27.98%
funD       94.37%        9.41%
funE       2.73%         68.52%
…          …             …
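The two profiling columns play complementary roles: Total Time % (inclusive of callees) tells how much of the application a function covers, while Self Time % tells whether that time is spent in the function's own body, and hence whether accelerating that body alone can pay off. A minimal selection sketch using the numbers from the table (the thresholds and helper name are illustrative, not part of AMPS):

```python
# Hypothetical profiling summary with the values from the slides.
# "self_pct": share of the function's inclusive time spent in its own body;
# "total_pct": share of the whole application's run time (callees included).
profile = {
    "funA": {"self_pct": 98.71, "total_pct": 32.26},
    "funB": {"self_pct": 92.65, "total_pct": 12.83},
    "funC": {"self_pct": 89.26, "total_pct": 27.98},
    "funD": {"self_pct": 94.37, "total_pct": 9.41},
    "funE": {"self_pct": 2.73,  "total_pct": 68.52},
}

def acceleration_candidates(profile, min_self=80.0, min_total=5.0):
    """Keep functions that both account for a meaningful share of the run
    (high total %) and spend that time in their own body (high self %).
    funE fails the second test: it mostly waits on its callees, so moving
    its body to hardware would gain almost nothing."""
    return sorted(
        (name for name, p in profile.items()
         if p["self_pct"] >= min_self and p["total_pct"] >= min_total),
        key=lambda name: -profile[name]["total_pct"],
    )

print(acceleration_candidates(profile))   # ['funA', 'funC', 'funB', 'funD']
```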
Slides 10-12
AMPS – Call Trace Analysis
The list of function calls, in order:
funE, funB, funD, funA, funA, funA, funA, funA, funF, funG, funG, funG,
funE, funB, funD, funB, funD, funF, funG, funH, funC,
funE, funA, funA, funA, funF, funG, funG, funH, funC, funF, funG, funG,
funE, funA, funA, funA, funB, funD, funB, funD, funB, funF, funG, funF, funG, funH, funC,
funE, funA, funA, funA
NOT synthesizable in hardware: funE, funF, funG, funH
Slides 13-16
AMPS – Call Trace Analysis
The list of function calls, in order (restricted to the synthesizable functions):
funB, funD, funA, funA, funA, funA, funA,
funB, funD, funB, funD, funC,
funA, funA, funA, funC, funA, funA, funA,
funB, funD, funB, funD, funB, funC,
funA, funA, funA
funA is always called in blocks of multiple calls.
funB and funD are always called in quick succession and in an alternating fashion.
Other patterns?
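Patterns like the ones annotated above can be detected mechanically on the filtered trace. A minimal sketch over the trace from the slides (the helper functions are hypothetical, not part of AMPS):

```python
from itertools import groupby

# Filtered call trace from the slides (non-synthesizable functions removed).
trace = ["funB", "funD", "funA", "funA", "funA", "funA", "funA",
         "funB", "funD", "funB", "funD", "funC",
         "funA", "funA", "funA", "funC", "funA", "funA", "funA",
         "funB", "funD", "funB", "funD", "funB", "funC",
         "funA", "funA", "funA"]

def run_lengths(trace, name):
    """Lengths of the maximal runs of consecutive calls to `name`."""
    return [len(list(group)) for key, group in groupby(trace) if key == name]

def always_preceded_by(trace, b, a):
    """True if every call to `b` immediately follows a call to `a`."""
    return all(i > 0 and trace[i - 1] == a
               for i, f in enumerate(trace) if f == b)

# funA only ever appears in blocks of multiple calls:
print(run_lengths(trace, "funA"))                  # [5, 3, 3, 3]
# funB/funD alternate: every funD directly follows a funB (the converse does
# not hold, since the last funB of a burst may be followed by funC):
print(always_preceded_by(trace, "funD", "funB"))   # True
```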
Slide 17
AMPS – DSE

Function    Implementation  Performance               Resources
function_1  F1.impl_1       Execution time: 14.68 s   BRAM_18K: 1523 (35%)
                            Clock frequency: 200 MHz  FF: 1211 (0.05%)
                                                      LUT: 2211 (0.19%)
                                                      [...]
            F1.impl_2       Execution time: 12.47 s   BRAM_18K: 3 (0.07%)
                            Clock frequency: 220 MHz  FF: 1274 (0.05%)
                                                      LUT: 1937 (0.16%)
                                                      [...]
function_2  F2.impl_1       [...]                     [...]

Automated Design Space Exploration and Roofline Analysis for FPGA-based HLS Applications.
Marco Siracusa, Marco Rabozzi, Lorenzo di Tucci, Marco Domenico Santambrogio
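Given the per-implementation DSE results, a natural first filter before mapping is to keep only the Pareto-optimal implementations in the performance/resources plane. A minimal sketch; the first two points mirror the table above, while F1.impl_3 is an extra, faster-but-larger point added purely for illustration:

```python
# Hypothetical DSE results shaped like the table above: estimated execution
# time vs. a resource footprint (LUT count here).
implementations = {
    "F1.impl_1": {"time_s": 14.68, "luts": 2211},
    "F1.impl_2": {"time_s": 12.47, "luts": 1937},
    "F1.impl_3": {"time_s": 10.02, "luts": 5804},   # invented for illustration
}

def pareto_front(impls):
    """Drop every implementation dominated by another one, i.e. one that is
    at least as fast AND at least as small, and strictly better on one axis."""
    front = {}
    for name, p in impls.items():
        dominated = any(
            q["time_s"] <= p["time_s"] and q["luts"] <= p["luts"]
            and (q["time_s"] < p["time_s"] or q["luts"] < p["luts"])
            for other, q in impls.items() if other != name)
        if not dominated:
            front[name] = p
    return front

print(sorted(pareto_front(implementations)))   # ['F1.impl_2', 'F1.impl_3']
```

Here F1.impl_2 dominates F1.impl_1 on both axes (faster and smaller, as in the table), so only two points survive for the later mapping step to choose between.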
Slide 21
Partitioning, Mapping and Scheduling: is this the best?
Slides 22-27
Partitioning, Mapping and Scheduling
The profiling and call-trace findings feed the Partitioning, Mapping and Scheduling step:

Function   Self Time %   Total Time %
funA       98.71%        32.26%
funC       89.26%        27.98%

funA is always called in blocks of multiple calls.
funC is called only a few times.
funB and funD are always called in quick succession and in an alternating fashion.

Is this the best?
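The question "is this the best?" can only be answered against the objective AMPS minimizes: the end-to-end execution time implied by a mapping, a partitioning and a reconfiguration schedule. A minimal cost-model sketch; all timings, the chosen mapping (funA and funC in hardware) and the one-function-per-bitstream partition are invented for illustration, not measured data:

```python
# Illustrative per-call timings for a candidate solution.
SW_TIME = {"funA": 1.00, "funB": 0.40, "funC": 2.50, "funD": 0.30}  # seconds
HW_TIME = {"funA": 0.05, "funC": 0.60}       # functions mapped to hardware
RECONF_TIME = 0.8                            # seconds per bitstream load
bitstream_of = {"funA": 0, "funC": 1}        # one function per bitstream

def total_time(trace):
    """Walk the trace, paying a reconfiguration cost whenever the next
    hardware call needs a bitstream different from the loaded one."""
    loaded, t = None, 0.0
    for call in trace:
        if call in HW_TIME:
            if bitstream_of[call] != loaded:
                t += RECONF_TIME
                loaded = bitstream_of[call]
            t += HW_TIME[call]
        else:
            t += SW_TIME[call]
    return t

trace = ["funA", "funA", "funA", "funC", "funA", "funA", "funA"]
print(round(total_time(trace), 2))   # 3.3
```

Even paying three reconfigurations, this hardware mapping (3.3 s) beats the all-software run (6 × 1.0 s + 2.5 s = 8.5 s); a partition packing funA and funC into the same bitstream would remove the reconfigurations entirely, which is exactly the kind of trade-off the three coupled steps must explore together.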
Slide 28
Conclusions
• The concurrent acceleration of multiple functions requires multiple steps
• There is no easy way to decouple these steps while still guaranteeing optimality
Future work:
• Validate the proposed flow on a set of real applications
• Integrate this flow into CAOS