SlideShare a Scribd company logo
1 of 32
Download to read offline
Automatic generation of hardware
memory architectures for HPC
Christian Pilato
Assistant Professor
christian.pilato@polimi.it
Universidad Complutense de Madrid – April 21, 2022
©Christian Pilato, 2022 2
Assistant Professor (RTD-B - Ricercatore a Tempo Determinato Senior)
About Me
PhD Student
2008-2011
Research Assistant
2011-2013
Postdoc Research Scientist
2013-2016
Postdoc
Research Assistant
2016-2018
Assistant Prof.
2018-now
R&D
Internship
6 months
Visiting
Researcher
3 months
Visiting
Researcher
4 months
Visiting
Researcher
9 months
FP6 HARTES
FP7 SYNAPTIC
FP7 FASTER DARPA PERFECT
SRC CFAR
H2020 CERBERO
DARPA CRAFT
R&D Projects
Office: DEIB (Building 20 - 1st floor – Room 029) Website: http://pilato.faculty.polimi.it
https://github.com/sld-columbia/esp
H2020
EVEREST
©Christian Pilato, 2022 3
©EVEREST Consortium, 2020-2021
H2020 project funded under the call – "Big Data technologies and extreme-
scale analytics" [Kick-off on Oct 1, 2020] – http://www.everest-h2020.eu
Project Coordinator: Christoph Hagleitner, IBM Research Europe, Zurich, Switzerland
Scientific Coordinator: Christian Pilato, Politecnico di Milano, Italy
Key idea: a coordinated action with the appropriate technology areas (e.g., AI, analytics,
software engineering, HPC, Cloud technologies, IoT and edge/fog/ubiquitous computing) è Computing
continuum to enable cloud-to-edge integration
system engineering/tools to contribute to the co-design of federated/distributed systems è
EVEREST system development kit
EVEREST: Big Data Analytics on FPGA
We can use AI/ML to optimize data analytics and knowledge
extraction, but this is not strictly an AI/ML-related project!
standardized interconnection methods
architectures for collecting, managing and exploiting
data security
hardware acceleration
runtime management
domain-specific
extensions
virtualization
©Christian Pilato, 2022 4
EVEREST Partners
IBM Reseach Lab, Zurich (Switzerland)
Project administration, prototype of the target system
PI: Christoph Hagleitner
Politecnico di Milano (Italy)
Scientific coordination, high-level synthesis, flexible memory
managers, autotuning
PI: Christian Pilato
Università della Svizzera italiana (Switzerland)
Data security requirements and protection techniques
PI: Francesco Regazzoni
TU Dresden (Germany)
Domain-specific extensions, code optimizations and variants
PI: Jeronimo Castrillon
Centro Internazionale di Monitoraggio Ambientale (Italy)
Weather prediction models
PI: Antonio Parodi
IT4Innovations (Czech Republic)
Exploitation leaders, large HPC infrastructure, workflow
libraries
PI: Katerina Slaninova
Virtual Open Systems (France)
Virtualization techniques, runtime extensions to manage
heterogeneous resources
PI: Michele Paolino
Duferco Energia (Italy)
Application for prediction of renewable energies
PI: Lorenzo Pittaluga
Numtech (France)
Application for monitoring the air quality of industrial sites
PI: Fabien Brocheton
Sygic A/S (Slovakia)
Application for intelligent transportation in smart cities
PI: Radim Cmar
©EVEREST Consortium, 2020-2021
©Christian Pilato, 2022 5
Three use cases provided by the application partners
Looking for hardware acceleration (intense data computation) with efficient and secure data
management (distributed data sources)
Possibility of AI/ML-based decision making
Combination of the tasks in different pipelines
Application Concepts
Weather-based prediction of
renewable energy production
Air-quality monitoring in industrial sites
Traffic modelling for intelligent
transportation
©EVEREST Consortium, 2020-2021
©Christian Pilato, 2022 6
EVEREST Target System
Target prototype based on IBM products and internal projects
• Combination of CPU-managed systems with tightly-coupled bus-attached FPGAs and FPGA-
disaggregated systems with loosely-coupled network-attached FPGAs
Other platforms (e.g., from IT4I and NUM or existing SoCs)
• Combination of devices and architectures: Xilinx Alveo, cloudFPGA, Columbia ESP, …
Seamless integration of
additional EVEREST nodes
Standard communication
formats (e.g., OpenCAPI)
Virtualization of resources
Efficient communication
libraries with clear API
FPGA device
FPGA device
FPGA device
Acc
HW
Mem
Mgr. OCAPI
Ctrl.
I/O
MEM
MEM
DRAM
HBM
HBM
FPGA device
Acc
HW Mem
Mgr.
NC
I/O
MEM
MEM
DRAM
HBM
HBM
POWER9
I/O MEM
MEM
DRAM
HBM
CPU
HBM
On-chip Acc
OpenCAPI
25
Gb/s
x8
100
Gb/s x2
DC Network
EVEREST Heterogeneous Node
POWER9 CPU with bus-attached FPGA
EVEREST Node
Network-attached FPGA
OCAPI
TCP/UDP
EVEREST
Node
CPU, GPU,
FPGA, …
up to 4x per 2U node
(Wistron Mihawk)
up to 64x per 2U node
(cloudFPGA)
TCP/UDP
Acc
10
Gb/s
Dual-port Mellanox
ConnectX-5 100G
NC
TCP/UDP
10
Gb/s
©EVEREST Consortium, 2020-2021
©Christian Pilato, 2022 7
EVEREST Programming Environment
1. Compilation Environment: analyzes
application and creates all "variants"
based on architecture abstraction and
application/data requirements
• Exploring unified IR framework (e.g., MLIR)
• Integration of non-functional properties
with domain-specific extensions
• Hardware acceleration and High-level
synthesis (Bambu, Vivado HLS)
EVEREST Runtime Environment
Unified IR
framework
Implemented with high-level
abstractions, e.g., in MLIR
Middle-
end Opt-IR/
C-code
SW HW
Multi-variant and optimized IR with
SW/HW components (memory managers)
Meta-data/Info: HW
interfaces, variants info
Front-end
Backend
Implementation (SYCL, C, HDL,
meta-data, EVEREST APIs)
SW-optimization HW-optimization
HW-info
Standard
compilers
Bin/bit-
stream
Use case description, e.g., Short-time
prediction in traffic simulations
Application high-level
dataflow
ML-Kernel
Simulation
kernel
auto A = Matrix(m, n),
B = Matrix(m, n),
C = Matrix(m, n);
auto u = Tensor<3>
(n, n, n);
auto v = (A*B*C)(u);
Kernel DSL-spec, e.g., using
C++ syntax from [RINK19]
Possibility of using different
(ML) frameworks
Interoperability with
different HLS tools
Standard IR format and
exchange files
Novel domain-specific
extensions (format)
System and resource description (format)
©EVEREST Consortium, 2020-2021
©Christian Pilato, 2022 8
EVEREST Programming Environment
Autotuning API
Runtime API
Seamless execution when varying
the system configuration
(resources, nodes, data, etc.)
Hiding communication latency
(e.g., prefetching)
How to collect system status and
expose it to the runtime?
2. Runtime Environment: implements
the selection of "variants" and the
hardware configuration based
on the system status
• Dynamic adaptation and autotuning
(mARGOt)
• Two-level runtime for (1) virtualization of
hardware resources regardless their distribution
and the low-level details of the platforms; (2) implement
functional decisions (VOSYS solutions, mARGOt, HyperLoom)
©EVEREST Consortium, 2020-2021
©Christian Pilato, 2022 9
Hardware Compilation Flow
Annotated C code
/ LLVM IR / MLIR
HLS
(Vitis/Bambu)
Arch. Info
Mem. Gen.
(Mnemosyne)
IP config.
System Integration
(Olympus)
DSL
Src-to-Src (MLIR)
Compiler+DSE
Security/data
requirements
Mem. Info
Security/data
requirements
Memory
access patterns
IP requirements
Synthesis Tools
©Christian Pilato, 2022 10
From DSL to Bitstream
kernel_body
PLM
void kernel_body(double S[11][11], double D[11][11][11], double u[11][11][11],
double v[11][11][11],
double t[11][11][11], double r[11][11][11], double t1[11][11][11],
double t3[11][11][11], double t0[11][11][11], double t2[11][11][11])
kernel_body
ctrl S D u v
t r t1 t3 t0 t2
CE0 A0 Q0
kernel_body
PLM
CE1 A1 D1 WE1
...
...
Read port Write port
S
D
r
u
v t3
t1 t0
t2
t
©Christian Pilato, 2022 11
High-Level Synthesis (HLS) to create the accelerator logic
• Definition of memory-related parameters
(e.g. number of process interfaces)
Generation of specialized PLMs
• Technology-related optimizations
• Possibility of system-level optimizations
across accelerators
PLM Customization for Heterogeneous SoCs
Accelerator Tile
DMA
Ctrl
Load
Compute 1
Store
Compute n
kernel()
Private Local Memory
PLM ports
ping-pong buffer
read
write
circular buffer
1
2
3
4
5
6
…
1 2
in
out
Accelerator Logic
Memory Library
PLM
Generation
High-Level
Synthesis
Data
Structures
High-Level
Description
(C/C++/SystemC)
©Christian Pilato, 2022 12
System-level methodology for PLM customization
PLM Customization
Data structures, access
patterns, …
HLS optimizations, number of
memory interfaces, …
Memory IPs, multi-bank
architectures, …
SystemC
SystemC + RTL
RTL
Designer
HLS tool Optimizations to reduce memory cost
Flexible memory controller to coordinate
memory accesses
Data Access
Requirements
Memory
Library
PLM Generation
PLM architecture
(RTL)
Automatic Generation
Data
Structures
PLM Generation
Performance optimization: HLS defines how the accelerator logic accesses the
data structures (e.g. number of parallel accesses)
Cost optimization: PLM Customization defines the best PLM microarchitecture
to achieve the desired performance (e.g. number of banks, data allocation)
©Christian Pilato, 2022 13
Generally we can use one PLM unit (eventually composed of many
banks) for each data structure
“Two data structures are compatible if they can be
allocated to the same PLM unit (memory IPs)”
A common case: accelerators never executed at the same time
• Possible only at system-level, when integrating the components
• Optimizations of accelerator logic and memory subsystem are independent
Reuse What is not Used
Reuse the same memory IPs
for several data structures
©Christian Pilato, 2022 14
Accelerator(s) memory subsystem is defined during SoC integration
• Possibility for more optimizations
Optimization only at the System-Level
Logic
PLM
IP DESIGN
Logic
PLM
IP DESIGN
SOC INTEGRATION
Accelerator
Design (SystemC)
Algorithm
Design (C/C++)
Accelerator
Design (SystemC)
Algorithm
Design (C/C++)
Accelerator
Design (SystemC)
Algorithm
Design (C/C++)
Logic
IP DESIGN
Accelerator
Design (SystemC)
Algorithm
Design (C/C++)
SOC INTEGRATION
Memory Subsystem Design
Logic
IP DESIGN
Mem
Reqs
Mem
Reqs
Component-based Approach System-Level Approach
©Christian Pilato, 2022 15
PLM Optimization for Multiple Accelerators
HLS and DSE
Accelerator Design1
(SystemC)
Accelerator Logic1
(Verilog)
Memory
Requirements1 HLS and DSE
Accelerator Designk
(SystemC)
Accelerator Logick
(Verilog)
Memory
Requirementsk
Compatibility
Information
Memory
IPs
Technology-unaware
Transformations1
Local Tech-aware
Transformations1
Memory
Subsystem
(Verilog)
Global Technology-aware Transformations
1
1
2
3
4
MNEMOSYNE Technology-unaware
Transformationsk
2
Local Tech-aware
Transformationsk
3
Generation of RTL Architecture
5
…
©Christian Pilato, 2022 16
Let us assume to have the two following data structures that are never
alive at the same time
• A[1024] with data duplication over 4 parallel banks
• B[4096] with data distribution over 2 parallel banks
Address-Space Compatibility
A0 A1 A2 A3
+
B0 B1
Memory footprint: 4x1024x32
+ 2x2048x32 = 254,485.68 um2
A0 A1 A2 A3
A0 A2
A1 A3
Reused to store B by
putting banks in “series” to
virtually increase capacity
Memory footprint: 4x1024x32
= 140,426.46 um2 (-44.8%)
©Christian Pilato, 2022 17
A classical example is the ping-pong buffer (two 2048x16 arrays – A0/A1)
• When process P writes A0 (A1), it never writes A1 (A0)
• When process C reads from A0 (A1), it never reads from A1 (A0)
Memory-Interface Compatibility
if (ping)
A0[i] = …
else
A1[i] = …
if (ping)
… = f(A0[i])
else
… = f(A1[i])
μ-architectural optimizations
P C
P C
memory controller
valid
ready
A0 A1
A0
(odd)
A1
(even)
A0 A1
A0
(even)
A1
(odd)
Memory footprint: 4x1024x32 = 140,426 um2
P C
P C
memory controller
valid
ready
A0 A1
A0
(even)
A1
(even)
A0
(odd)
A1
(odd)
A0 A1
Merged in the same IP, but in
a different memory space
Memory footprint: 2x2048x32 = 114,059.2 um2
Area reduced by 18% without any
performance overhead!
©Christian Pilato, 2022 18
Graph to represent the possibilities for optimizing the data structures
• Each node represents a data structure to be allocated, annotated with its data
footprint (after data allocation)
• Each edge represents compatibility between the two data structures
Memory Compatibility Graph (MCG)
A0
2x1024x32
A1
2x1024x32
B0
1x2048x32
a
a
b
a) Address-space compatibility: the
data structures are compatible and
can use the same memory IPs
b) Memory-interface compatibility:
the ports are never accessed at the
same time and the data structures
can stay in the same memory IP
©Christian Pilato, 2022 19
“A clique is a subset of the vertices of the memory
compatibility graph such that every two vertices are
connected by an edge”
Clique Definition
A0
2x1024x32
A1
2x1024x32
B0
2048x32
a
a
b
A0
2x1024x32
A1
2x1024x32
B0
2048x32
a
We need two distinct configurations!
{A0,B0} and {A1} or {A1,B0} and {A0}?
a
A clique represents a set of
data structures that can
share the same memory IPs
©Christian Pilato, 2022 20
Memory Cost Minimization
To determine how to partition the MCG such that the total memory cost is minimized
Clique Characterization
To determine the memory architecture of all cliques and their memory cost
Clique Enumeration
To define the list of admissible cliques in the MCG
How to Determine the Memory Subsystem
©Christian Pilato, 2022 21
A lightweight PLM controller is created for each compatibility set
(clique) based on the bank configuration
• Accelerator logic is not aware of the actual memory organization
• Array offsets need to be translated into proper memory addresses
PLM Controller Generation
Clique Configuration
B0 B1 B2 B3
PLM Controller
Custom logic with negligible overhead, especially when
the number of banks and their size is a power of two
0x0 0x1
0x0
0x1
0x0
0x1
ATU ATU ATU ATU
CE	
WE	
A	
D	
Q	
CE	
WE	
A	
D	
Q	
0x00
0x01
0x02
0x03
…
CE	
WE	
A	
D	
Q	
CE	
WE	
A	
D	
Q	
ATU ATU
ATU ATU
CE	
WE	
A	
D	
CE	
A	
Q	
0x00
0x01
0x02
0x03
…
0x00
0x01
0x02
0x03
…
0x00
0x01
0x02
0x03
…
…	
1 0010 1
100101
©Christian Pilato, 2022 22
Industrial 32nm CMOS
technology
• Memory library with 18 SRAMs
Impact of Optimizations
0
0.2
0.4
0.6
0.8
1.0
Compatibility Coloring Final
Normalized
area
of
the
PLM
Sort
FFT1D
FFT2D
Debayer
Lucas Kanade
Change Detection
Interpolation 1
Interpolation 2
Backprojection
Disparity
PCA
RBM
SRR
0
0.2
0.4
0.6
0.8
1.0
Compatibility Coloring Final
Normalized
area
of
the
PLM
Sort
FFT1D
FFT2D
Debayer
Lucas Kanade
Change Detection
Interpolation 1
Interpolation 2
Backprojection
Disparity
PCA
RBM
SRR
Xilinx Virtex-7 FPGA
• Memory library with 6
BRAM configurations
©Christian Pilato, 2022 23
Bram
Ctrl
PLM0 ACC0
Ctrl
ctrl
Creation of Parallel Architectures
PLM0
Ctrl
PLMm-1
…
ACC0
ctrl
Batch
Bram
Ctrl
A[MSBs]
PLM0
Ctrl
PLMm-1
…
ACCk-1
ctrl
ACC0
ctrl
A[MSBs]
…
Bram
Ctrl
©Christian Pilato, 2022 24
• Xilinx Zynq UltraScale+ MPSoC ZCU106 board
• CFD simulation of 50,000 elements
• Memory sharing allows us to fit more
kernels
Preliminary Evaluation
©Christian Pilato, 2022 25
DSL for representing the kernel
Moving to a system-level representation
• Simple example for a massively parallel architecture:
LOOP ~ KERNEL(S, D, u, v)
Possibility to decide the memory layout and configure DMA/prefetchers
based on the target architecture/platform
Next Step: System-Level DSL
©Christian Pilato, 2022 26
We are building a compilation flow based on
LLVM MLIR for automatic specialization:
• MLIR Input – From DSL descriptions of the system
functionality
• Data Organization – Determine which data resides
off chip (also based on user/compiler annotations)
• Layout – Reorganize communication to exploit local
memories (cache/PLM)
• Communication – Configure prefetcher to hide
transfer latency
• Local Partitioning – Determine multi-bank PLM
architecture (Mnemosyne1)
• HLS – Generate computation part (interfacing with
existing HLS tools, e.g., open-source Vitis HLS
frontend)
• HDL Output – Automated code generation and
system-level integration based on the target platform
Next Step: Let’s Put Memory First
S. Soldavini and C. Pilato. "Compiler Infrastructure for Specializing
Domain-Specific Memory Templates" LATTE’21
External Memory
Kernel
Logic to Resolve Addr
and Reduce Delay
Cache
DMA
Prefetcher
Multi-Channel
Controller
DRAM
HBM
Remote
PLM PLM
Multi port
(based on access
patterns)
Data Org Layout Communication
Local
Partitioning
Kernel Gen
System-Level
Description
HDL
Intelligent Memory Logic
(Latency Insensitive)
Direct Access Memory
(Fixed Latency)
©Christian Pilato, 2022 27
We are developing a complete hardware architecture generation flow
based on MLIR description of the system functionality
Platform-specific description
• HBM-based Xilinx Alveo
• IBM CloudFPGA
• …
Host code generation
• Based on platform libraries
to be developed for the specific
target
Olympus – Automated System-Level Integration
©Christian Pilato, 2022 28
• Determines the system-level architectures based on:
• Algorithm parallelism
• Characteristics of the target platform(s)
• Interfaces of the modules (HLS tools)
• Produces
• Synthesizable C++ code that includes:
• Accelerators and PLM generated with HLS
• Communication modules to match interfaces
• Standard AXI interfaces to the system (either cloudFPGA SHELL or HBM channels)
• May include “intelligent” policies to coordinate data transfers
• System configuration file to create the overall architecture
• Support for multiple computing units executing in parallel
• Interfacing with Xilinx HLS and synthesis tools
Olympus – System generation flow
©Christian Pilato, 2022 29
Automatic integration of memory optimizations for high-performance
data transfers, such as:
• Double buffering to hide latency of host-FPGA data transfers
• Bus optimization (and data interleaving) for maximizing bandwidth (e.g., 256-bit
AXI channels) – algorithms for efficient data layout on the bus
• Dataflow execution model to enable kernel pipelining – automatic (pre-HLS) code
transformations
From MLIR to System Architecture
Double buffering Bus optimization Dataflow
©Christian Pilato, 2022 30
Results on HBM FPGA
Best performance: 103 GOPS
(118x faster than our "starting point")
Results are 6x better than HPC ones
(~25x more energy efficient)
Possibility of integrating custom
data formats and configure memories
and data transfers accordingly
©Christian Pilato, 2022 31
Data management optimizations are becoming the key element for the
creation of efficient FPGA architectures
HLS is now used not only to create accelerator kernels but also to
generate the system-level architecture
• Portable solutions across multiple target platforms
Novel HBM architectures offer high bandwidth (that’s why are called
high-bandwidth memory architectures… J) but their design is complex:
• Necessary to match application requirements and technology characteristics
Conclusions
Thank you!
Christian Pilato, christian.pilato@polimi.it
This project has received funding from the European
Union’s Horizon 2020 research and innovation
programme under grant agreement No 957269
Work done in collaboration with Stephanie Soldavini (Politecnico di
Milano), Jeronimo Castrillon (TU Dresden), Karl F. A. Friebel (TU
Dresden), and Gerald Hempel (TU Dresden)

More Related Content

Similar to Automatic generation of hardware memory architectures for HPC

Lecture_IIITD.pptx
Lecture_IIITD.pptxLecture_IIITD.pptx
Lecture_IIITD.pptxachakracu
 
Amruth_Kumar_Juturu_Resume
Amruth_Kumar_Juturu_ResumeAmruth_Kumar_Juturu_Resume
Amruth_Kumar_Juturu_ResumeAmruth Kumar
 
The CAOS framework: Democratize the acceleration of compute intensive applica...
The CAOS framework: Democratize the acceleration of compute intensive applica...The CAOS framework: Democratize the acceleration of compute intensive applica...
The CAOS framework: Democratize the acceleration of compute intensive applica...NECST Lab @ Politecnico di Milano
 
SS-CPSIoT 2023_Kevin Mika and Piotr Zierhoffer presentation
SS-CPSIoT 2023_Kevin Mika and Piotr Zierhoffer presentationSS-CPSIoT 2023_Kevin Mika and Piotr Zierhoffer presentation
SS-CPSIoT 2023_Kevin Mika and Piotr Zierhoffer presentationVEDLIoT Project
 
The CAOS framework: democratize the acceleration of compute intensive applica...
The CAOS framework: democratize the acceleration of compute intensive applica...The CAOS framework: democratize the acceleration of compute intensive applica...
The CAOS framework: democratize the acceleration of compute intensive applica...NECST Lab @ Politecnico di Milano
 
IoT Analytics from Edge to Cloud - using IBM Informix
IoT Analytics from Edge to Cloud - using IBM InformixIoT Analytics from Edge to Cloud - using IBM Informix
IoT Analytics from Edge to Cloud - using IBM InformixPradeep Muthalpuredathe
 
“eXtending” the Automation Toolbox: Introduction to TwinCAT 3 Software and eX...
“eXtending” the Automation Toolbox: Introduction to TwinCAT 3 Software and eX...“eXtending” the Automation Toolbox: Introduction to TwinCAT 3 Software and eX...
“eXtending” the Automation Toolbox: Introduction to TwinCAT 3 Software and eX...Design World
 
Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...
Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...
Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...Kai Wähner
 
Cluster Tutorial
Cluster TutorialCluster Tutorial
Cluster Tutorialcybercbm
 
IoT Week 2021_Jens Hagemeyer presentation
IoT Week 2021_Jens Hagemeyer presentationIoT Week 2021_Jens Hagemeyer presentation
IoT Week 2021_Jens Hagemeyer presentationVEDLIoT Project
 
Industrial Pioneers Days - Machine Learning
Industrial Pioneers Days - Machine LearningIndustrial Pioneers Days - Machine Learning
Industrial Pioneers Days - Machine LearningVEDLIoT Project
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaData Science Milan
 
Building a reliable and scalable IoT platform with MongoDB and HiveMQ
Building a reliable and scalable IoT platform with MongoDB and HiveMQBuilding a reliable and scalable IoT platform with MongoDB and HiveMQ
Building a reliable and scalable IoT platform with MongoDB and HiveMQDominik Obermaier
 
HOMOGENEOUS MULTISTAGE ARCHITECTURE FOR REAL-TIME IMAGE PROCESSING
HOMOGENEOUS MULTISTAGE ARCHITECTURE FOR REAL-TIME IMAGE PROCESSINGHOMOGENEOUS MULTISTAGE ARCHITECTURE FOR REAL-TIME IMAGE PROCESSING
HOMOGENEOUS MULTISTAGE ARCHITECTURE FOR REAL-TIME IMAGE PROCESSINGcscpconf
 

Similar to Automatic generation of hardware memory architectures for HPC (20)

Lecture_IIITD.pptx
Lecture_IIITD.pptxLecture_IIITD.pptx
Lecture_IIITD.pptx
 
Cloud, Fog, or Edge: Where and When to Compute?
Cloud, Fog, or Edge: Where and When to Compute?Cloud, Fog, or Edge: Where and When to Compute?
Cloud, Fog, or Edge: Where and When to Compute?
 
Par com
Par comPar com
Par com
 
Amruth_Kumar_Juturu_Resume
Amruth_Kumar_Juturu_ResumeAmruth_Kumar_Juturu_Resume
Amruth_Kumar_Juturu_Resume
 
The CAOS framework: Democratize the acceleration of compute intensive applica...
The CAOS framework: Democratize the acceleration of compute intensive applica...The CAOS framework: Democratize the acceleration of compute intensive applica...
The CAOS framework: Democratize the acceleration of compute intensive applica...
 
SS-CPSIoT 2023_Kevin Mika and Piotr Zierhoffer presentation
SS-CPSIoT 2023_Kevin Mika and Piotr Zierhoffer presentationSS-CPSIoT 2023_Kevin Mika and Piotr Zierhoffer presentation
SS-CPSIoT 2023_Kevin Mika and Piotr Zierhoffer presentation
 
NWU and HPC
NWU and HPCNWU and HPC
NWU and HPC
 
The CAOS framework: democratize the acceleration of compute intensive applica...
The CAOS framework: democratize the acceleration of compute intensive applica...The CAOS framework: democratize the acceleration of compute intensive applica...
The CAOS framework: democratize the acceleration of compute intensive applica...
 
IoT Analytics from Edge to Cloud - using IBM Informix
IoT Analytics from Edge to Cloud - using IBM InformixIoT Analytics from Edge to Cloud - using IBM Informix
IoT Analytics from Edge to Cloud - using IBM Informix
 
“eXtending” the Automation Toolbox: Introduction to TwinCAT 3 Software and eX...
“eXtending” the Automation Toolbox: Introduction to TwinCAT 3 Software and eX...“eXtending” the Automation Toolbox: Introduction to TwinCAT 3 Software and eX...
“eXtending” the Automation Toolbox: Introduction to TwinCAT 3 Software and eX...
 
Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...
Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...
Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...
 
Io t data streaming
Io t data streamingIo t data streaming
Io t data streaming
 
Smartblitzmerker
SmartblitzmerkerSmartblitzmerker
Smartblitzmerker
 
Embedded system
Embedded systemEmbedded system
Embedded system
 
Cluster Tutorial
Cluster TutorialCluster Tutorial
Cluster Tutorial
 
IoT Week 2021_Jens Hagemeyer presentation
IoT Week 2021_Jens Hagemeyer presentationIoT Week 2021_Jens Hagemeyer presentation
IoT Week 2021_Jens Hagemeyer presentation
 
Industrial Pioneers Days - Machine Learning
Industrial Pioneers Days - Machine LearningIndustrial Pioneers Days - Machine Learning
Industrial Pioneers Days - Machine Learning
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at Helixa
 
Building a reliable and scalable IoT platform with MongoDB and HiveMQ
Building a reliable and scalable IoT platform with MongoDB and HiveMQBuilding a reliable and scalable IoT platform with MongoDB and HiveMQ
Building a reliable and scalable IoT platform with MongoDB and HiveMQ
 
HOMOGENEOUS MULTISTAGE ARCHITECTURE FOR REAL-TIME IMAGE PROCESSING
HOMOGENEOUS MULTISTAGE ARCHITECTURE FOR REAL-TIME IMAGE PROCESSINGHOMOGENEOUS MULTISTAGE ARCHITECTURE FOR REAL-TIME IMAGE PROCESSING
HOMOGENEOUS MULTISTAGE ARCHITECTURE FOR REAL-TIME IMAGE PROCESSING
 

More from Facultad de Informática UCM

¿Por qué debemos seguir trabajando en álgebra lineal?
¿Por qué debemos seguir trabajando en álgebra lineal?¿Por qué debemos seguir trabajando en álgebra lineal?
¿Por qué debemos seguir trabajando en álgebra lineal?Facultad de Informática UCM
 
TECNOPOLÍTICA Y ACTIVISMO DE DATOS: EL MAPEO COMO FORMA DE RESILIENCIA ANTE L...
TECNOPOLÍTICA Y ACTIVISMO DE DATOS: EL MAPEO COMO FORMA DE RESILIENCIA ANTE L...TECNOPOLÍTICA Y ACTIVISMO DE DATOS: EL MAPEO COMO FORMA DE RESILIENCIA ANTE L...
TECNOPOLÍTICA Y ACTIVISMO DE DATOS: EL MAPEO COMO FORMA DE RESILIENCIA ANTE L...Facultad de Informática UCM
 
DRAC: Designing RISC-V-based Accelerators for next generation Computers
DRAC: Designing RISC-V-based Accelerators for next generation ComputersDRAC: Designing RISC-V-based Accelerators for next generation Computers
DRAC: Designing RISC-V-based Accelerators for next generation ComputersFacultad de Informática UCM
 
Tendencias en el diseño de procesadores con arquitectura Arm
Tendencias en el diseño de procesadores con arquitectura ArmTendencias en el diseño de procesadores con arquitectura Arm
Tendencias en el diseño de procesadores con arquitectura ArmFacultad de Informática UCM
 
Introduction to Quantum Computing and Quantum Service Oriented Computing
Introduction to Quantum Computing and Quantum Service Oriented ComputingIntroduction to Quantum Computing and Quantum Service Oriented Computing
Introduction to Quantum Computing and Quantum Service Oriented ComputingFacultad de Informática UCM
 
Inteligencia Artificial en la atención sanitaria del futuro
Inteligencia Artificial en la atención sanitaria del futuroInteligencia Artificial en la atención sanitaria del futuro
Inteligencia Artificial en la atención sanitaria del futuroFacultad de Informática UCM
 
Design Automation Approaches for Real-Time Edge Computing for Science Applic...
 Design Automation Approaches for Real-Time Edge Computing for Science Applic... Design Automation Approaches for Real-Time Edge Computing for Science Applic...
Design Automation Approaches for Real-Time Edge Computing for Science Applic...Facultad de Informática UCM
 
Estrategias de navegación para robótica móvil de campo: caso de estudio proye...
Estrategias de navegación para robótica móvil de campo: caso de estudio proye...Estrategias de navegación para robótica móvil de campo: caso de estudio proye...
Estrategias de navegación para robótica móvil de campo: caso de estudio proye...Facultad de Informática UCM
 
Fault-tolerance Quantum computation and Quantum Error Correction
Fault-tolerance Quantum computation and Quantum Error CorrectionFault-tolerance Quantum computation and Quantum Error Correction
Fault-tolerance Quantum computation and Quantum Error CorrectionFacultad de Informática UCM
 
Cómo construir un chatbot inteligente sin morir en el intento
Cómo construir un chatbot inteligente sin morir en el intentoCómo construir un chatbot inteligente sin morir en el intento
Cómo construir un chatbot inteligente sin morir en el intentoFacultad de Informática UCM
 
Hardware/software security contracts: Principled foundations for building sec...
Hardware/software security contracts: Principled foundations for building sec...Hardware/software security contracts: Principled foundations for building sec...
Hardware/software security contracts: Principled foundations for building sec...Facultad de Informática UCM
 
Jose carlossancho slidesLa seguridad en el desarrollo de software implementad...
Jose carlossancho slidesLa seguridad en el desarrollo de software implementad...Jose carlossancho slidesLa seguridad en el desarrollo de software implementad...
Jose carlossancho slidesLa seguridad en el desarrollo de software implementad...Facultad de Informática UCM
 
Redes neuronales y reinforcement learning. Aplicación en energía eólica.
Redes neuronales y reinforcement learning. Aplicación en energía eólica.Redes neuronales y reinforcement learning. Aplicación en energía eólica.
Redes neuronales y reinforcement learning. Aplicación en energía eólica.Facultad de Informática UCM
 
Challenges and Opportunities for AI and Data analytics in Offshore wind
Challenges and Opportunities for AI and Data analytics in Offshore windChallenges and Opportunities for AI and Data analytics in Offshore wind
Challenges and Opportunities for AI and Data analytics in Offshore windFacultad de Informática UCM
 
Evolution and Trends in Edge AI Systems and Architectures for the Internet of...
Evolution and Trends in Edge AI Systems and Architectures for the Internet of...Evolution and Trends in Edge AI Systems and Architectures for the Internet of...
Evolution and Trends in Edge AI Systems and Architectures for the Internet of...Facultad de Informática UCM
 

More from Facultad de Informática UCM (20)

¿Por qué debemos seguir trabajando en álgebra lineal?
¿Por qué debemos seguir trabajando en álgebra lineal?¿Por qué debemos seguir trabajando en álgebra lineal?
¿Por qué debemos seguir trabajando en álgebra lineal?
 
TECNOPOLÍTICA Y ACTIVISMO DE DATOS: EL MAPEO COMO FORMA DE RESILIENCIA ANTE L...
TECNOPOLÍTICA Y ACTIVISMO DE DATOS: EL MAPEO COMO FORMA DE RESILIENCIA ANTE L...TECNOPOLÍTICA Y ACTIVISMO DE DATOS: EL MAPEO COMO FORMA DE RESILIENCIA ANTE L...
TECNOPOLÍTICA Y ACTIVISMO DE DATOS: EL MAPEO COMO FORMA DE RESILIENCIA ANTE L...
 
DRAC: Designing RISC-V-based Accelerators for next generation Computers
DRAC: Designing RISC-V-based Accelerators for next generation ComputersDRAC: Designing RISC-V-based Accelerators for next generation Computers
DRAC: Designing RISC-V-based Accelerators for next generation Computers
 
uElectronics ongoing activities at ESA
uElectronics ongoing activities at ESAuElectronics ongoing activities at ESA
uElectronics ongoing activities at ESA
 
Tendencias en el diseño de procesadores con arquitectura Arm
Tendencias en el diseño de procesadores con arquitectura ArmTendencias en el diseño de procesadores con arquitectura Arm
Tendencias en el diseño de procesadores con arquitectura Arm
 
Formalizing Mathematics in Lean
Formalizing Mathematics in LeanFormalizing Mathematics in Lean
Formalizing Mathematics in Lean
 
Introduction to Quantum Computing and Quantum Service Oriented Computing
Introduction to Quantum Computing and Quantum Service Oriented ComputingIntroduction to Quantum Computing and Quantum Service Oriented Computing
Introduction to Quantum Computing and Quantum Service Oriented Computing
 
Computer Design Concepts for Machine Learning
Computer Design Concepts for Machine LearningComputer Design Concepts for Machine Learning
Computer Design Concepts for Machine Learning
 
Inteligencia Artificial en la atención sanitaria del futuro
Inteligencia Artificial en la atención sanitaria del futuroInteligencia Artificial en la atención sanitaria del futuro
Inteligencia Artificial en la atención sanitaria del futuro
 
Design Automation Approaches for Real-Time Edge Computing for Science Applic...
 Design Automation Approaches for Real-Time Edge Computing for Science Applic... Design Automation Approaches for Real-Time Edge Computing for Science Applic...
Design Automation Approaches for Real-Time Edge Computing for Science Applic...
 
Estrategias de navegación para robótica móvil de campo: caso de estudio proye...
Estrategias de navegación para robótica móvil de campo: caso de estudio proye...Estrategias de navegación para robótica móvil de campo: caso de estudio proye...
Estrategias de navegación para robótica móvil de campo: caso de estudio proye...
 
Fault-tolerance Quantum computation and Quantum Error Correction
Fault-tolerance Quantum computation and Quantum Error CorrectionFault-tolerance Quantum computation and Quantum Error Correction
Fault-tolerance Quantum computation and Quantum Error Correction
 
Cómo construir un chatbot inteligente sin morir en el intento
Cómo construir un chatbot inteligente sin morir en el intentoCómo construir un chatbot inteligente sin morir en el intento
Cómo construir un chatbot inteligente sin morir en el intento
 
Type and proof structures for concurrency
Type and proof structures for concurrencyType and proof structures for concurrency
Type and proof structures for concurrency
 
Hardware/software security contracts: Principled foundations for building sec...
Hardware/software security contracts: Principled foundations for building sec...Hardware/software security contracts: Principled foundations for building sec...
Hardware/software security contracts: Principled foundations for building sec...
 
Jose carlossancho slidesLa seguridad en el desarrollo de software implementad...
Jose carlossancho slidesLa seguridad en el desarrollo de software implementad...Jose carlossancho slidesLa seguridad en el desarrollo de software implementad...
Jose carlossancho slidesLa seguridad en el desarrollo de software implementad...
 
Do you trust your artificial intelligence system?
Do you trust your artificial intelligence system?Do you trust your artificial intelligence system?
Do you trust your artificial intelligence system?
 
Redes neuronales y reinforcement learning. Aplicación en energía eólica.
Redes neuronales y reinforcement learning. Aplicación en energía eólica.Redes neuronales y reinforcement learning. Aplicación en energía eólica.
Redes neuronales y reinforcement learning. Aplicación en energía eólica.
 
Challenges and Opportunities for AI and Data analytics in Offshore wind
Challenges and Opportunities for AI and Data analytics in Offshore windChallenges and Opportunities for AI and Data analytics in Offshore wind
Challenges and Opportunities for AI and Data analytics in Offshore wind
 
Evolution and Trends in Edge AI Systems and Architectures for the Internet of...
Evolution and Trends in Edge AI Systems and Architectures for the Internet of...Evolution and Trends in Edge AI Systems and Architectures for the Internet of...
Evolution and Trends in Edge AI Systems and Architectures for the Internet of...
 

Recently uploaded

VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).pptssuser5c9d4b1
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...RajaP95
 

Recently uploaded (20)

VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
 

Automatic generation of hardware memory architectures for HPC

  • 1. Automatic generation of hardware memory architectures for HPC Christian Pilato Assistant Professor christian.pilato@polimi.it Universidad Complutense de Madrid – April 21, 2022
  • 2. ©Christian Pilato, 2022 2 Assistant Professor (RTD-B - Ricercatore a Tempo Determinato Senior) About Me PhD Student 2008-2011 Research Assistant 2011-2013 Postdoc Research Scientist 2013-2016 Postdoc Research Assistant 2016-2018 Assistant Prof. 2018-now R&D Internship 6 months Visiting Researcher 3 months Visiting Researcher 4 months Visiting Researcher 9 months FP6 HARTES FP7 SYNAPTIC FP7 FASTER DARPA PERFECT SRC CFAR H2020 CERBERO DARPA CRAFT R&D Projects Office: DEIB (Building 20 - 1st floor – Room 029) Website: http://pilato.faculty.polimi.it https://github.com/sld-columbia/esp H2020 EVEREST
  • 3. ©Christian Pilato, 2022 3 ©EVEREST Consortium, 2020-2021 H2020 project funded under the call – "Big Data technologies and extreme- scale analytics" [Kick-off on Oct 1, 2020] – http://www.everest-h2020.eu Project Coordinator: Christoph Hagleitner, IBM Research Europe, Zurich, Switzerland Scientific Coordinator: Christian Pilato, Politecnico di Milano, Italy Key idea: a coordinated action with the appropriate technology areas (e.g., AI, analytics, software engineering, HPC, Cloud technologies, IoT and edge/fog/ubiquitous computing) è Computing continuum to enable cloud-to-edge integration system engineering/tools to contribute to the co-design of federated/distributed systems è EVEREST system development kit EVEREST: Big Data Analytics on FPGA We can use AI/ML to optimize data analytics and knowledge extraction, but this is not strictly an AI/ML-related project! standardized interconnection methods architectures for collecting, managing and exploiting data security hardware acceleration runtime management domain-specific extensions virtualization
  • 4. ©Christian Pilato, 2022 4 EVEREST Partners IBM Reseach Lab, Zurich (Switzerland) Project administration, prototype of the target system PI: Christoph Hagleitner Politecnico di Milano (Italy) Scientific coordination, high-level synthesis, flexible memory managers, autotuning PI: Christian Pilato Università della Svizzera italiana (Switzerland) Data security requirements and protection techniques PI: Francesco Regazzoni TU Dresden (Germany) Domain-specific extensions, code optimizations and variants PI: Jeronimo Castrillon Centro Internazionale di Monitoraggio Ambientale (Italy) Weather prediction models PI: Antonio Parodi IT4Innovations (Czech Republic) Exploitation leaders, large HPC infrastructure, workflow libraries PI: Katerina Slaninova Virtual Open Systems (France) Virtualization techniques, runtime extensions to manage heterogeneous resources PI: Michele Paolino Duferco Energia (Italy) Application for prediction of renewable energies PI: Lorenzo Pittaluga Numtech (France) Application for monitoring the air quality of industrial sites PI: Fabien Brocheton Sygic A/S (Slovakia) Application for intelligent transportation in smart cities PI: Radim Cmar ©EVEREST Consortium, 2020-2021
  • 5. ©Christian Pilato, 2022 5 Three use cases provided by the application partners Looking for hardware acceleration (intense data computation) with efficient and secure data management (distributed data sources) Possibility of AI/ML-based decision making Combination of the tasks in different pipelines Application Concepts Weather-based prediction of renewable energy production Air-quality monitoring in industrial sites Traffic modelling for intelligent transportation ©EVEREST Consortium, 2020-2021
  • 6. ©Christian Pilato, 2022 6 EVEREST Target System Target prototype based on IBM products and internal projects • Combination of CPU-managed systems with tightly-coupled bus-attached FPGAs and FPGA- disaggregated systems with loosely-coupled network-attached FPGAs Other platforms (e.g., from IT4I and NUM or existing SoCs) • Combination of devices and architectures: Xilinx Alveo, cloudFPGA, Columbia ESP, … Seamless integration of additional EVEREST nodes Standard communication formats (e.g., OpenCAPI) Virtualization of resources Efficient communication libraries with clear API FPGA device FPGA device FPGA device Acc HW Mem Mgr. OCAPI Ctrl. I/O MEM MEM DRAM HBM HBM FPGA device Acc HW Mem Mgr. NC I/O MEM MEM DRAM HBM HBM POWER9 I/O MEM MEM DRAM HBM CPU HBM On-chip Acc OpenCAPI 25 Gb/s x8 100 Gb/s x2 DC Network EVEREST Heterogeneous Node POWER9 CPU with bus-attached FPGA EVEREST Node Network-attached FPGA OCAPI TCP/UDP EVEREST Node CPU, GPU, FPGA, … up to 4x per 2U node (Wistron Mihawk) up to 64x per 2U node (cloudFPGA) TCP/UDP Acc 10 Gb/s Dual-port Mellanox ConnectX-5 100G NC TCP/UDP 10 Gb/s ©EVEREST Consortium, 2020-2021
  • 7. ©Christian Pilato, 2022 7 EVEREST Programming Environment 1. Compilation Environment: analyzes application and creates all "variants" based on architecture abstraction and application/data requirements • Exploring unified IR framework (e.g., MLIR) • Integration of non-functional properties with domain-specific extensions • Hardware acceleration and High-level synthesis (Bambu, Vivado HLS) EVEREST Runtime Environment Unified IR framework Implemented with high-level abstractions, e.g., in MLIR Middle- end Opt-IR/ C-code SW HW Multi-variant and optimized IR with SW/HW components (memory managers) Meta-data/Info: HW interfaces, variants info Front-end Backend Implementation (SYCL, C, HDL, meta-data, EVEREST APIs) SW-optimization HW-optimization HW-info Standard compilers Bin/bit- stream Use case description, e.g., Short-time prediction in traffic simulations Application high-level dataflow ML-Kernel Simulation kernel auto A = Matrix(m, n), B = Matrix(m, n), C = Matrix(m, n); auto u = Tensor<3> (n, n, n); auto v = (A*B*C)(u); Kernel DSL-spec, e.g., using C++ syntax from [RINK19] Possibility of using different (ML) frameworks Interoperability with different HLS tools Standard IR format and exchange files Novel domain-specific extensions (format) System and resource description (format) ©EVEREST Consortium, 2020-2021
  • 8. ©Christian Pilato, 2022 8 EVEREST Programming Environment Autotuning API Runtime API Seamless execution when varying the system configuration (resources, nodes, data, etc.) Hiding communication latency (e.g., prefetching) How to collect system status and expose it to the runtime? 2. Runtime Environment: implements the selection of "variants" and the hardware configuration based on the system status • Dynamic adaptation and autotuning (mARGOt) • Two-level runtime for (1) virtualization of hardware resources regardless their distribution and the low-level details of the platforms; (2) implement functional decisions (VOSYS solutions, mARGOt, HyperLoom) ©EVEREST Consortium, 2020-2021
  • 9. ©Christian Pilato, 2022 9 Hardware Compilation Flow Annotated C code / LLVM IR / MLIR HLS (Vitis/Bambu) Arch. Info Mem. Gen. (Mnemosyne) IP config. System Integration (Olympus) DSL Src-to-Src (MLIR) Compiler+DSE Security/data requirements Mem. Info Security/data requirements Memory access patterns IP requirements Synthesis Tools
  • 10. ©Christian Pilato, 2022 10 From DSL to Bitstream kernel_body PLM void kernel_body(double S[11][11], double D[11][11][11], double u[11][11][11], double v[11][11][11], double t[11][11][11], double r[11][11][11], double t1[11][11][11], double t3[11][11][11], double t0[11][11][11], double t2[11][11][11]) kernel_body ctrl S D u v t r t1 t3 t0 t2 CE0 A0 Q0 kernel_body PLM CE1 A1 D1 WE1 ... ... Read port Write port S D r u v t3 t1 t0 t2 t
  • 11. ©Christian Pilato, 2022 11 High-Level Synthesis (HLS) to create the accelerator logic • Definition of memory-related parameters (e.g. number of process interfaces) Generation of specialized PLMs • Technology-related optimizations • Possibility of system-level optimizations across accelerators PLM Customization for Heterogeneous SoCs Accelerator Tile DMA Ctrl Load Compute 1 Store Compute n kernel() Private Local Memory PLM ports ping-pong buffer read write circular buffer 1 2 3 4 5 6 … 1 2 in out Accelerator Logic Memory Library PLM Generation High-Level Synthesis Data Structures High-Level Description (C/C++/SystemC)
  • 12. ©Christian Pilato, 2022 12 System-level methodology for PLM customization PLM Customization Data structures, access patterns, … HLS optimizations, number of memory interfaces, … Memory IPs, multi-bank architectures, … SystemC SystemC + RTL RTL Designer HLS tool Optimizations to reduce memory cost Flexible memory controller to coordinate memory accesses Data Access Requirements Memory Library PLM Generation PLM architecture (RTL) Automatic Generation Data Structures PLM Generation Performance optimization: HLS defines how the accelerator logic accesses the data structures (e.g. number of parallel accesses) Cost optimization: PLM Customization defines the best PLM microarchitecture to achieve the desired performance (e.g. number of banks, data allocation)
  • 13. ©Christian Pilato, 2022 13 Generally we can use one PLM unit (eventually composed of many banks) for each data structure “Two data structures are compatible if they can be allocated to the same PLM unit (memory IPs)” A common case: accelerators never executed at the same time • Possible only at system-level, when integrating the components • Optimizations of accelerator logic and memory subsystem are independent Reuse What is not Used Reuse the same memory IPs for several data structures
  • 14. ©Christian Pilato, 2022 14 Accelerator(s) memory subsystem is defined during SoC integration • Possibility for more optimizations Optimization only at the System-Level Logic PLM IP DESIGN Logic PLM IP DESIGN SOC INTEGRATION Accelerator Design (SystemC) Algorithm Design (C/C++) Accelerator Design (SystemC) Algorithm Design (C/C++) Accelerator Design (SystemC) Algorithm Design (C/C++) Logic IP DESIGN Accelerator Design (SystemC) Algorithm Design (C/C++) SOC INTEGRATION Memory Subsystem Design Logic IP DESIGN Mem Reqs Mem Reqs Component-based Approach System-Level Approach
  • 15. ©Christian Pilato, 2022 15 PLM Optimization for Multiple Accelerators HLS and DSE Accelerator Design1 (SystemC) Accelerator Logic1 (Verilog) Memory Requirements1 HLS and DSE Accelerator Designk (SystemC) Accelerator Logick (Verilog) Memory Requirementsk Compatibility Information Memory IPs Technology-unaware Transformations1 Local Tech-aware Transformations1 Memory Subsystem (Verilog) Global Technology-aware Transformations 1 1 2 3 4 MNEMOSYNE Technology-unaware Transformationsk 2 Local Tech-aware Transformationsk 3 Generation of RTL Architecture 5 …
  • 16. ©Christian Pilato, 2022 16 Let us assume to have the two following data structures that are never alive at the same time • A[1024] with data duplication over 4 parallel banks • B[4096] with data distribution over 2 parallel banks Address-Space Compatibility A0 A1 A2 A3 + B0 B1 Memory footprint: 4x1024x32 + 2x2048x32 = 254,485.68 um2 A0 A1 A2 A3 A0 A2 A1 A3 Reused to store B by putting banks in “series” to virtually increase capacity Memory footprint: 4x1024x32 = 140,426.46 um2 (-44.8%)
  • 17. ©Christian Pilato, 2022 17 A classical example is the ping-pong buffer (two 2048x16 arrays – A0/A1) • When process P writes A0 (A1), it never writes A1 (A0) • When process C reads from A0 (A1), it never reads from A1 (A0) Memory-Interface Compatibility if (ping) A0[i] = … else A1[i] = … if (ping) … = f(A0[i]) else … = f(A1[i]) μ-architectural optimizations P C P C memory controller valid ready A0 A1 A0 (odd) A1 (even) A0 A1 A0 (even) A1 (odd) Memory footprint: 4x1024x32 = 140,426 um2 P C P C memory controller valid ready A0 A1 A0 (even) A1 (even) A0 (odd) A1 (odd) A0 A1 Merged in the same IP, but in a different memory space Memory footprint: 2x2048x32 = 114,059.2 um2 Area reduced by 18% without any performance overhead!
  • 18. ©Christian Pilato, 2022 18 Graph to represent the possibilities for optimizing the data structures • Each node represents a data structure to be allocated, annotated with its data footprint (after data allocation) • Each edge represents compatibility between the two data structures Memory Compatibility Graph (MCG) A0 2x1024x32 A1 2x1024x32 B0 1x2048x32 a a b a) Address-space compatibility: the data structures are compatible and can use the same memory IPs b) Memory-interface compatibility: the ports are never accessed at the same time and the data structures can stay in the same memory IP
  • 19. ©Christian Pilato, 2022 19 “A clique is a subset of the vertices of the memory compatibility graph such that every two vertices are connected by an edge” Clique Definition A0 2x1024x32 A1 2x1024x32 B0 2048x32 a a b A0 2x1024x32 A1 2x1024x32 B0 2048x32 a We need two distinct configurations! {A0,B0} and {A1} or {A1,B0} and {A0}? a A clique represents a set of data structures that can share the same memory IPs
  • 20. ©Christian Pilato, 2022 20 Memory Cost Minimization To determine how to partition the MCG such that the total memory cost is minimized Clique Characterization To determine the memory architecture of all cliques and their memory cost Clique Enumeration To define the list of admissible cliques in the MCG How to Determine the Memory Subsystem
  • 21. ©Christian Pilato, 2022 21 A lightweight PLM controller is created for each compatibility set (clique) based on the bank configuration • Accelerator logic is not aware of the actual memory organization • Array offsets need to be translated into proper memory addresses PLM Controller Generation Clique Configuration B0 B1 B2 B3 PLM Controller Custom logic with negligible overhead, especially when the number of banks and their size is a power of two 0x0 0x1 0x0 0x1 0x0 0x1 ATU ATU ATU ATU CE WE A D Q CE WE A D Q 0x00 0x01 0x02 0x03 … CE WE A D Q CE WE A D Q ATU ATU ATU ATU CE WE A D CE A Q 0x00 0x01 0x02 0x03 … 0x00 0x01 0x02 0x03 … 0x00 0x01 0x02 0x03 … … 1 0010 1 100101
  • 22. ©Christian Pilato, 2022 22 Industrial 32nm CMOS technology • Memory library with 18 SRAMs Impact of Optimizations 0 0.2 0.4 0.6 0.8 1.0 Compatibility Coloring Final Normalized area of the PLM Sort FFT1D FFT2D Debayer Lucas Kanade Change Detection Interpolation 1 Interpolation 2 Backprojection Disparity PCA RBM SRR 0 0.2 0.4 0.6 0.8 1.0 Compatibility Coloring Final Normalized area of the PLM Sort FFT1D FFT2D Debayer Lucas Kanade Change Detection Interpolation 1 Interpolation 2 Backprojection Disparity PCA RBM SRR Xilinx Virtex-7 FPGA • Memory library with 6 BRAM configurations
  • 23. ©Christian Pilato, 2022 23 Bram Ctrl PLM0 ACC0 Ctrl ctrl Creation of Parallel Architectures PLM0 Ctrl PLMm-1 … ACC0 ctrl Batch Bram Ctrl A[MSBs] PLM0 Ctrl PLMm-1 … ACCk-1 ctrl ACC0 ctrl A[MSBs] … Bram Ctrl
  • 24. ©Christian Pilato, 2022 24 • Xilinx Zynq UltraScale+ MPSoC ZCU106 board • CFD simulation of 50,000 elements • Memory sharing allows us to fit more kernels Preliminary Evaluation
  • 25. ©Christian Pilato, 2022 25 DSL for representing the kernel Moving to a system-level representation • Simple example for a massively parallel architecture: LOOP ~ KERNEL(S, D, u, v) Possibility to decide the memory layout and configure DMA/prefetchers based on the target architecture/platform Next Step: System-Level DSL
  • 26. ©Christian Pilato, 2022 26 We are building a compilation flow based on LLVM MLIR for automatic specialization: • MLIR Input – From DSL descriptions of the system functionality • Data Organization – Determine which data resides off chip (also based on user/compiler annotations) • Layout – Reorganize communication to exploit local memories (cache/PLM) • Communication – Configure prefetcher to hide transfer latency • Local Partitioning – Determine multi-bank PLM architecture (Mnemosyne1) • HLS – Generate computation part (interfacing with existing HLS tools, e.g., open-source Vitis HLS frontend) • HDL Output – Automated code generation and system-level integration based on the target platform Next Step: Let’s Put Memory First S. Soldavini and C. Pilato. "Compiler Infrastructure for Specializing Domain-Specific Memory Templates" LATTE’21 External Memory Kernel Logic to Resolve Addr and Reduce Delay Cache DMA Prefetcher Multi-Channel Controller DRAM HBM Remote PLM PLM Multi port (based on access patterns) Data Org Layout Communication Local Partitioning Kernel Gen System-Level Description HDL Intelligent Memory Logic (Latency Insensitive) Direct Access Memory (Fixed Latency)
  • 27. ©Christian Pilato, 2022 27 We are developing a complete hardware architecture generation flow based on MLIR description of the system functionality Platform-specific description • HBM-based Xilinx Alveo • IBM CloudFPGA • … Host code generation • Based on platform libraries to be developed for the specific target Olympus – Automated System-Level Integration
  • 28. ©Christian Pilato, 2022 28 • Determines the system-level architectures based on: • Algorithm parallelism • Characteristics of the target platform(s) • Interfaces of the modules (HLS tools) • Produces • Synthesizable C++ code that includes: • Accelerators and PLM generated with HLS • Communication modules to match interfaces • Standard AXI interfaces to the system (either cloudFPGA SHELL or HBM channels) • May include “intelligent” policies to coordinate data transfers • System configuration file to create the overall architecture • Support for multiple computing units executing in parallel • Interfacing with Xilinx HLS and synthesis tools Olympus – System generation flow
  • 29. ©Christian Pilato, 2022 29 Automatic integration of memory optimizations for high-performance data transfers, such as: • Double buffering to hide latency of host-FPGA data transfers • Bus optimization (and data interleaving) for maximizing bandwidth (e.g., 256-bit AXI channels) – algorithms for efficient data layout on the bus • Dataflow execution model to enable kernel pipelining – automatic (pre-HLS) code transformations From MLIR to System Architecture Double buffering Bus optimization Dataflow
  • 30. ©Christian Pilato, 2022 30 Results on HBM FPGA Best performance: 103 GOPS (118x faster than our "starting point") Results are 6x better than HPC ones (~25x more energy efficient) Possibility of integrating custom data formats and configure memories and data transfers accordingly
  • 31. ©Christian Pilato, 2022 31 Data management optimizations are becoming the key element for the creation of efficient FPGA architectures HLS is now used not only to create accelerator kernels but also to generate the system-level architecture • Portable solutions across multiple target platforms Novel HBM architectures offer high bandwidth (that’s why are called high-bandwidth memory architectures… J) but their design is complex: • Necessary to match application requirements and technology characteristics Conclusions
  • 32. Thank you! Christian Pilato, christian.pilato@polimi.it This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 957269 Work done in collaboration with Stephanie Soldavini (Politecnico di Milano), Jeronimo Castrillon (TU Dresden), Karl F. A. Friebel (TU Dresden), and Gerald Hempel (TU Dresden)