For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/fotonation/embedded-vision-training/videos/pages/may-2018-embedded-vision-summit-teig
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Steve Teig, Chief Technology Officer at Xperi, presents the "Embedding Programmable DNNs in Low-Power SoCs" tutorial at the May 2018 Embedded Vision Summit.
This talk presents the latest generation of the Image Processing Unit (IPU) from FotoNation (a core business unit of Xperi): an embedded, AI-enabled image processing engine that can be customized and adapted to suit a wide range of imaging tasks. Thanks to its scalable nature, the IPU can be deployed in low-power applications such as IoT devices, and can also be scaled up to much more powerful configurations suitable for demanding automotive computer vision applications. And, in perhaps the most exciting development, the latest variants of the IPU feature FotoNation's programmable convolutional neural network engine (PCNN), which can implement CNN architectures created with state-of-the-art design tools such as TensorFlow and Caffe.
The PCNN hardware architecture, optimized for image analytics and combined with Xperi's state-of-the-art DBI™ interconnect technology, can also run multiple CNNs in parallel, meeting the most stringent real-time requirements. Together, the IPU and DBI™ enable advanced artificial intelligence solutions on mid-sized chips, opening the door to powerful AI-driven imaging solutions that you can carry in your pocket.
"Embedding Programmable DNNs in Low-Power SoCs," a Presentation from Xperi
1.
2.
3. Portfolio of Trusted Brands
• Semiconductor Intellectual Property Licensing
• Imaging and Computer Vision silicon IP cores and solutions
• Audio Technology Solutions
• Automotive Audio, Data, and Digital Radio Broadcast Solutions
• Semiconductor and Interconnect Packaging Technology & Solutions
Deployments across the portfolio: 3.4+ B devices, 70+ M cars, 1+ B devices, 100+ B devices, 2+ B devices
4.
5.
• Always-on inference: operates even while the device is "off" (e.g., ultra-low-power FD/FR as an enabler)
• Head-mounted displays for AR or MR (e.g., ultra-low-power iris recognition, eye gaze, scene understanding, etc.)
• Smart IoT: TVs to drones to microwave ovens to … (e.g., ultra-low-power people detection)
• Driver state sensing for autonomous driving (e.g., always-on driver assistant)
6. Enhance
• De-warping & stitching
• Stabilization with rolling-shutter correction
• HDR & LTM
Understand
• Face, people, object detection, segmentation & tracking
• Scene classification and recognition
Personalize
• Visible and NIR 3D FR
• Iris recognition
• Liveness detection & continuous recognition (hand jitter, facial, etc.)
Accelerate
• Computer vision accelerator
[SoC block diagram (legend: FN IP vs. 3rd-party blocks): sensor (MIPI), ISP, GPU, CPU, display/LTM, DDR controller, COMMS and flash interfaces, with the IPU comprising low-power face detection, HQ distortion correction, facial feature extraction, image registration, object detection & classification, PCNN clusters, and biometrics cores]
10. Pre-processing
• Multi-resolution stream generation
• Local tone mapping enhancement (significantly improves detection ratios)
• High-quality, low-latency distortion correction
Dedicated Cores
• Facial & people analytics (AI/ML)
• Stabilization (HQ resampling, analytics)
• Optional depth (AI/ML)
PCNNs
• PCNNs for reconfigurable functionality
• Concurrent support of multiple real-time networks
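The multi-resolution stream generation above can be pictured as an image pyramid: each stream halves the resolution of the previous one so detectors can run at several scales. A minimal sketch in Python/NumPy using simple 2x2 box-filter downsampling; the IPU's actual hardware resampler is not described in the slides:

```python
import numpy as np

def multires_streams(image: np.ndarray, levels: int = 3) -> list[np.ndarray]:
    """Build a multi-resolution pyramid by repeated 2x2 box-filter
    downsampling -- a software stand-in for a hardware stream generator."""
    streams = [image.astype(np.float32)]
    for _ in range(levels - 1):
        prev = streams[-1]
        # Crop to even dimensions so the image tiles exactly into 2x2 blocks.
        h, w = prev.shape[0] // 2 * 2, prev.shape[1] // 2 * 2
        cropped = prev[:h, :w]
        # Average each 2x2 block to halve resolution in both dimensions.
        down = cropped.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        streams.append(down)
    return streams

pyr = multires_streams(np.zeros((64, 48)), levels=3)
print([s.shape for s in pyr])  # [(64, 48), (32, 24), (16, 12)]
```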
11.
• IPU
  • RTL (image pre-processing, dedicated analytics and inference cores)
  • Tools for programming, training and debugging
• Testing Framework (a.k.a. ImageDB): a test framework supporting acquisition, marking, testing and reporting
• Data Sets (a.k.a. CV Infra): computer vision infrastructure supporting 2D and 3D image set acquisition, annotation, marking and training-set generation with ground truth
[Diagram: IPU cores & tools, testing framework, data sets; note: the CNN inference cores are a small part of the total]
15.
• Built-in on-the-fly pre-processing imaging engine (e.g., layer 0)
• Local memory for fast data access and reduced memory bandwidth
• Designed for very low latency and real-time network inference
• Supporting toolchain to implement customer-defined network architectures
• PCNN 1.2: 36 MAC/cycle, or PCNN 2.0: 512 MAC/cycle
• Support for compression, quantization, decryption
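To see what the MAC/cycle figures buy, a back-of-the-envelope frame-rate bound: sustained MACs per second divided by the MACs a network needs per frame. The 600 MHz clock and the 570M-MAC (MobileNet-class) network below are illustrative assumptions, not figures from the presentation:

```python
def fps_budget(macs_per_cycle: int, clock_hz: float, network_macs: float) -> float:
    """Upper-bound frame rate assuming the engine sustains its peak MAC rate
    every cycle (real utilization will be lower)."""
    return macs_per_cycle * clock_hz / network_macs

# Assumed 600 MHz clock; assumed network cost of ~570M MACs per frame.
for name, mpc in [("PCNN 1.2", 36), ("PCNN 2.0", 512)]:
    print(f"{name}: {fps_budget(mpc, 600e6, 570e6):.1f} fps upper bound")
```

Under these assumptions the 36 MAC/cycle variant caps out near 38 fps on such a network, while the 512 MAC/cycle variant has headroom for several concurrent real-time networks.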
16. PCNN CORE
[Block diagram: the PCNN engine sits behind a register block (REGS) with MAP RD, MAP WR and CODE RD masters on the system bus (AXI), an IRQ line, and an APB control bus; DDR, flash and the host CPU share the system side]
18. PCNN-CLUSTER
[Block diagram: a PCNN-C engine groups four PCNN cores behind an arbiter, with shared SRAM and an SRAM controller on a 1K-bit bus; the cluster core adds a RISC processor, mailbox, IRQ and CFG (AHB) interfaces, and connects over the system bus (AXI) to the host CPU, system memory (DDR, flash) and other PCNN-C clusters]
19.
• Networks sit in the device's permanent storage, and the device can be accessed and its storage contents read.
• Neural processor makers offer network transfer tools, so network representation patterns can be identified and localized in the storage contents.
• Once the network representation is known, the architecture and weight values can be obtained: network architecture & weights extraction.
• Networks can then be remapped and inferred on alternative architectures and passed off as one's own: network re-map and inference on alternative architectures.
20.
• The chip maker's software generates a chip secret Csec, stored on chip in fuses, and a public value Cpub; the NN provider's software generates a network secret Nsec and a public value Npub.
• Both sides derive the same stream-cipher key: Kstream = Npub * Csec = Cpub * Nsec.
• The NN provider encrypts the neural network with a stream cipher under Kstream; on chip, the PCNN DECIPHER block regenerates Kstream, decrypts the network, and passes it to PCNN INFERENCE.
• Only the encrypted network and the public values cross the SW-HW public interface; Csec never leaves the FotoNation HW IPU fuses, and Nsec never leaves the NN provider.
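The two key equations on this slide (Kstream = Npub * Csec = Cpub * Nsec) describe a commutative key agreement: each side combines its own secret with the other side's public value and arrives at the same Kstream. A toy sketch in Python, using Diffie-Hellman-style modular exponentiation as the commutative operation and a SHA-256-based XOR keystream; the actual cipher, group, and key sizes used by the PCNN are not specified in the slides:

```python
import hashlib
from itertools import count

# Toy parameters for illustration only; a real deployment uses vetted
# groups and ciphers, and the chip secret lives in on-chip fuses.
P = 0xFFFFFFFFFFFFFFC5  # prime (2**64 - 59); far too small for real security
G = 5

def keypair(secret: int) -> tuple[int, int]:
    """Return (secret, public) where public = G**secret mod P."""
    return secret, pow(G, secret, P)

def shared_key(my_secret: int, their_public: int) -> int:
    """Commutative combine: their_public**my_secret mod P."""
    return pow(their_public, my_secret, P)

def keystream(key: int, n: int) -> bytes:
    """Expand the shared key into n pseudorandom bytes via SHA-256 counter mode."""
    out = b""
    for i in count():
        if len(out) >= n:
            return out[:n]
        out += hashlib.sha256(key.to_bytes(16, "big") + i.to_bytes(4, "big")).digest()

def xor_cipher(data: bytes, key: int) -> bytes:
    """Stream cipher: XOR data with the keystream (same call encrypts/decrypts)."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

n_sec, n_pub = keypair(0x1234)   # NN provider's secret/public pair
c_sec, c_pub = keypair(0x5678)   # chip's secret (fused) / public pair
assert shared_key(n_sec, c_pub) == shared_key(c_sec, n_pub)  # same Kstream

weights = b"network weights blob"
encrypted = xor_cipher(weights, shared_key(n_sec, c_pub))    # provider side
decrypted = xor_cipher(encrypted, shared_key(c_sec, n_pub))  # on-chip decipher
assert decrypted == weights
```

The point of the construction is that neither secret ever crosses the public interface, yet both sides reach the identical keystream.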
21.
22. ZiBond® (homogeneous bonding) and DBI® (hybrid bonding)
• Scalable to 1,000,000 interconnects per mm²
• Supported configurations: die-to-wafer (D2W), die-to-die (D2D), wafer-to-wafer (W2W)
• Capabilities: 3D design & architecture, materials characterization, simulation, wafer/die bonding & processing, reliability, failure analysis, technology development & optimization
23.
• Array of 2 µm pitch DBI® interconnects between silicon dies: up to 250,000 vertical interconnects per mm² enables a groundbreaking computing architecture
• Very short vertical interconnects offer ultra-high performance at very low power
[Block diagram: an SRAM die of 8 MB SRAM banks with controllers (8K- and 1K-bit buses) stacked via DBI® on a logic die carrying the host CPU, PCNN-C clusters, cache, AXI/AHB fabric (2x AXI 128), IRQ, DDR controller (LPDDR4 x32) and flash controller (NAND flash)]
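The density figures on this slide and the previous one follow directly from the pitch: a square grid at pitch p µm gives (1000/p)² pads per mm². A quick check:

```python
def interconnects_per_mm2(pitch_um: float) -> int:
    """Interconnect density of a square grid at the given pad pitch."""
    per_side = 1000.0 / pitch_um  # pads per mm along one axis
    return int(per_side * per_side)

print(interconnects_per_mm2(2.0))  # 250000: the 2 um DBI array above
print(interconnects_per_mm2(1.0))  # 1000000: the scaling target on slide 22
```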
30. DBI® stacking flow (industry-standard damascene process):
1. Logic wafer/die with pre-drilled TSVs and DBI® layer
2. SRAM wafer/die with DBI® layer
3. Face-to-face ZiBond® bonding of the two wafers/dies
4. Thinning of the logic wafer/die to expose the TSVs
5. Wafer-level packaging of the stacked wafers/dies
(Cross-section labels: DBI® layer, interconnect layers, active (IC) layer, TSVs, bulk silicon)
31. 3D Wafer Bonding with Copper Interconnects (DBI® with ZiBond®: the most advanced 3D interconnects)
• Supported configurations: wafer-to-wafer, chip-to-wafer, and chip-to-chip bonding
• ZiBond®: full-surface bonding (no underfill); DBI®: Cu-Cu interconnect joining (no solder)
• Sony's latest image sensor uses DBI®: a BSI image sensor (wafer 1) is bonded to a logic wafer (wafer 2) carrying local memory and signal processing, joined by µm-scale Cu interconnects over a full-surface oxide bond