Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Sessions

SIGGRAPH 2019 | LOS ANGELES | 28 JULY - 1 AUGUST

Charlie Wang, Peng Tu, Mikhail Nikolsky, Jerry Dong
Building a Deep Learning Video Analytics
Framework for Intel AI Platforms

Speakers
• Charlie Wang, Principal Engineer, VTT
• Peng Tu, Principal Engineer, CPDP
• Mikhail Nikolsky, Sr. Staff Engineer, CPDP
Intel Architecture Graphics Software (IAGS)

Agenda
▪ Video Analytics Usages
▪ Build a Video Analytics Framework for all Intel HWs
▪ FFMPEG Filter Implementation
▪ GStreamer Plugin Implementation
▪ Video Analytics as REST Service
▪ Demo
▪ Resources

Intelligence on Video Data
• Retail analytics
• Industrial inspection
• Content filtering
• Parking management
• Super resolution
• Autonomous driving
• Action recognition
• Encode quality control

Intel Video Analytics HW
• Intel® CPU: client and server
• Intel® Vision Accelerator Design with Intel® Movidius™ Vision Processing Units (VPU)
• Intel® GPU: integrated and discrete
• Intel® Vision Accelerator Design with an Intel® Arria 10 FPGA (preview)
• Architecture classes shown: scalar, vector, matrix, spatial

Typical Video Analytics Flow
Decode → Scale/CSC → Inference → Object tracking → Crop/scale → Inference → Post processing + Encode
• Decode input: 720p, 1080p, 4K (AVC, HEVC)
• Scale/CSC output: 224x224 RGBP
• First inference: object detection, image segmentation
• Object tracking: LKT, IOU
• Crop/scale: bounding box
• Second inference: object recognition, image classification, action recognition
• Encode output: 720p, 1080p (AVC, HEVC)
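The flow lists IOU as one of the object tracking options. The following is a minimal sketch of IOU-based association, assuming boxes are (x_min, y_min, x_max, y_max) tuples in pixels; the function names, the greedy matching strategy, and the 0.5 threshold are illustrative assumptions, not details from the talk.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def match_tracks(tracks, detections, threshold=0.5):
    """Greedily assign new detections to existing tracks by descending IOU.

    tracks: dict of track_id -> box from the previous frame.
    detections: list of boxes from the current frame.
    Returns a dict of detection index -> track_id for matched pairs.
    """
    pairs = [
        (iou(box, det), track_id, index)
        for track_id, box in tracks.items()
        for index, det in enumerate(detections)
    ]
    pairs.sort(key=lambda pair: pair[0], reverse=True)

    assigned = {}
    used_tracks = set()
    for score, track_id, index in pairs:
        if score < threshold:
            break  # remaining pairs overlap even less
        if track_id in used_tracks or index in assigned:
            continue
        assigned[index] = track_id
        used_tracks.add(track_id)
    return assigned
```

Detections left unmatched would typically start new tracks, which lets the expensive inference stage run less often than once per frame.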

Map to Intel Hardware
[Figure: each stage of the flow (decode, scale/CSC, inference, object tracking, crop/scale, post processing + encode) mapped to the hardware that can run it: CPU, Media FF, programmable GPU, VPU, FPGA]
Video analytics requires heavy interaction between video and compute

Popular Video Processing Software Frameworks
Video and audio demux, decode, processing, encoding, rendering, and muxing; the frameworks also allow customized plugins/filters

Intel Video Software Offering
• Applications: FFMPEG apps, GStreamer apps, apps built on a customized framework
• Libraries: Next Gen Media Library, SW codec
• Drivers and APIs: VAAPI/DXVA, FPGA driver
• Hardware: CPU, GPU, Media FF, FPGA
Intel GPU media FF supports video decode/encode and video processing

Intel Computer Vision Software Offering
Deep Learning for Computer Vision
Accelerate and deploy convolutional neural networks (CNN) on Intel® platforms with the Deep Learning Deployment Toolkit included in OpenVINO
Traditional Computer Vision
Develop and optimize classic computer vision applications
built with the OpenCV library or OpenVX API.

OpenVINO for Inference
Inference App → DL Inference Engine API → Heterogeneous Execution Engine
• CPU Plugin (MKLDNN) → CPU
• GPU Plugin (CLDNN) → GPU
• VPU Plugin (MVNC) → VPU
• FPGA Plugin (DLA) → FPGA
A single interface supports all platforms, with no SW change.
The library is designed for CNN inference acceleration on Intel HW

Intel Deep Learning Deployment Toolkit
Model Optimizer
This Python*-based command line tool imports trained models from popular deep learning frameworks such as Caffe*, TensorFlow*, and Apache MXNet*, as well as from the Open Neural Network Exchange (ONNX) format.
Inference Engine
This execution engine uses a common API to deliver inference solutions on the platform of your choice: CPU, GPU, VPU, or FPGA.

Now let’s build the video
analytics framework
Write-once, Deploy on All HWs

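Video Analytics Framework
• Application: Video Analytics Application
• Framework integration: FFMPEG Plugin, GStreamer Plugin, your own framework
• Media: Next Gen Media Library (VAAPI, Media Driver, CPU codec)
• Inference: OpenVINO (MKLDNN, CLDNN, DLA, Compute Driver)
• Surface format shown between media and inference: NV12
• Hardware: CPU, GPU, VPU, FPGA
The framework supports load balancing among devices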

Media & Compute Interoperability
• Media and compute/inference use different color formats
• Flow: Decode → tiled NV12 surface → scaling/CSC → linear RGBP surface → inference → metadata
• Common media format: YUV with padding, pitch, etc.
• Inference format: tensor array
• A copy is required if we don’t handle buffer sharing correctly
Media Format as a Compute/Inference Format
• Flow: Decode → NV12 surface → scaling → inference → metadata
• Layer1 (csc+convolution): the Y channel and UV channel are the inputs; the CSC that produces R, G, B is combined with weight1, weight2, weight3 and bias
Inference time is reduced by 3% to 20%, depending on resolution
Model | Resolution | Time reduction
GoogleNetV1 | 224x224 | 3.6%
YoloTinyV1 | 448x448 | 9.3%
Mtcnn | 1280x720 | 20.9%
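The "Layer1 (csc+convolution)" idea can be illustrated with plain algebra: if rgb = M·yuv + c and the first layer computes out = W·rgb + b, then out = (W·M)·yuv + (W·c + b), so the color conversion disappears into new weights and a new bias. The sketch below checks this for a single pixel and a 1x1 convolution. The BT.601 full-range coefficients are an assumption, and the talk does not say how the folding is actually implemented; real NV12 also subsamples UV, which this sketch ignores.

```python
# YCbCr -> RGB, BT.601 full range (assumed for illustration).
M = [
    [1.0, 0.0, 1.402],
    [1.0, -0.344136, -0.714136],
    [1.0, 1.772, 0.0],
]


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def matmul(left, right):
    columns = list(zip(*right))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns] for row in left]


# Constant term from centering Cb and Cr at 128: c = M * (0, -128, -128).
OFFSET = matvec(M, [0.0, -128.0, -128.0])


def csc(yuv):
    """Explicit color space conversion of one (Y, Cb, Cr) pixel to RGB."""
    return [v + c for v, c in zip(matvec(M, yuv), OFFSET)]


def conv1x1(weights, bias, pixel):
    """A 1x1 convolution applied to one pixel: weights * pixel + bias."""
    return [v + b for v, b in zip(matvec(weights, pixel), bias)]


def fold_csc(weights, bias):
    """Return weights and bias that consume YCbCr directly."""
    folded_weights = matmul(weights, M)
    folded_bias = [wc + b for wc, b in zip(matvec(weights, OFFSET), bias)]
    return folded_weights, folded_bias


weights = [[0.5, -1.0, 2.0], [1.0, 0.25, 0.0]]  # two output channels
bias = [0.1, -0.2]
pixel = [120.0, 90.0, 200.0]

two_step = conv1x1(weights, bias, csc(pixel))
one_step = conv1x1(*fold_csc(weights, bias), pixel)
print(two_step, one_step)
```

The same identity holds per kernel tap for larger kernels because both operations are linear, which is consistent with the reported savings growing with input resolution.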

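Video Analytics e2e Flow
[Figure: offline, the Model Optimizer (DLDT) converts a trained model into a model IR. The application then either calls the Inference Engine API (DLDT Inference Engine) and the DEC/ENC/VPP API (MSDK library) directly, or uses FFMPEG-VA/GST-VA, which exposes a combined DEC/ENC/VPP/INF API over MSDK and DLDT. Other labels: video sources, VA pipeline design, VA pipeline implementation]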

FFMPEG Components
Commands (console)
• ffmpeg: command tool to do transcoding
• ffplay: simple player with SDL, using the ffmpeg demux/decoder
• ffprobe: tool to extract the information of a multimedia stream
• ffserver: real-time stream server to broadcast multimedia streams
Libraries
• libavdevice
• libavformat: mux / demux
• libavcodec: a library implementing most A/V codecs, used by most popular codec tools
• libavfilter: library for A/V filters, implementing all kinds of effects such as scale, crop, frc, etc.
• libavutil: common tool library
• libpostproc
• libavresample
• libswresample
We are adding inference as filters

FFmpeg Filter VA Architecture
• APP: ffmpeg or ffplay
• LIBAVFILTER: SR filter, classify filter, detect filter, built on a DNN interface (DNNModel) with a TensorFlow backend and an Inference Engine backend
• LIBAVFORMAT: metadata muxer, Kafka producer
• 3RD LIBRARIES: TensorFlow; Inference Engine (OpenVINO) with MKLDNN, clDNN, Movidius, DLA; librdkafka
• HW: CPU, GPU, VPU, FPGA
Legend: new, FFmpeg, hardware, 3rd party, …

The pipelines look like ...
• Face detection + emotion & age_gender recognition:
ffmpeg -i clip.mp4 -vf "
detect=model=$DETECT_MODEL1:model_proc=$MODEL1-JSON:device=$CPU,
classify=model=$EMOTION_MODEL2:model_proc=$MODEL2-JSON:device=$CPU,
classify=model=$AGE_GENDER_MODEL3:model_proc=$MODEL3-JSON:device=$CPU" \
-an -y -f iemetadata metadata.json
• Stages: input stream → decode (decoder) → detect (face detection) → classify (emotion recognition) → classify (age-gender recognition) → iemetadata (convert to JSON) → Kafka (send to Kafka server)
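The filter chain in the command above is a comma-separated list of filters, each with colon-separated key=value options. The following sketch assembles such a command programmatically. The filter and option names (detect, classify, model, model_proc, device, iemetadata) are copied from the slide and exist only in an FFmpeg build that carries these video analytics filters; the helper names and the model file names are hypothetical placeholders.

```python
def va_filter(name, model, model_proc, device="CPU"):
    """Render one analytics filter as name=key=value:key=value."""
    return f"{name}=model={model}:model_proc={model_proc}:device={device}"


def build_command(input_file, stages, output="metadata.json"):
    """Build the ffmpeg argument list for a chain of analytics filters.

    stages: list of (filter_name, model, model_proc, device) tuples.
    """
    graph = ",".join(va_filter(*stage) for stage in stages)
    return [
        "ffmpeg", "-i", input_file,
        "-vf", graph,
        "-an", "-y",
        "-f", "iemetadata", output,
    ]


# Hypothetical model files, for illustration only.
command = build_command(
    "clip.mp4",
    [
        ("detect", "face.xml", "face.json", "CPU"),
        ("classify", "emotion.xml", "emotion.json", "CPU"),
        ("classify", "age_gender.xml", "age_gender.json", "CPU"),
    ],
)
print(" ".join(command))
```

Returning an argument list rather than one string avoids shell quoting issues when the list is passed to `subprocess.run`. Paths that contain `:` or `,` would still need escaping for the filter graph syntax, which this sketch does not handle.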

Why Add VA to GStreamer
• Multiplatform: Linux, Windows, Mac OS X, Android, …
• Comprehensive core: graph-based multi-threaded pipeline, lightweight data passing
• Broad coverage of media technologies: file and streaming packet I/O, codecs, metadata, video and audio
• Extensive development tools: gst-launch, Python, C++ API
• Easy to extend and reuse through plugins and metadata

GStreamer Plugins Architecture
• Application on top of the GStreamer API: pipeline control, pipeline events, inference plugins configuration (per-plugin params/API), inference results (metadata)
• Video Analytics plugins (GVA*): gvainference, gvadetect, gvaclassify, gvaidentify, meta convert, publish (Kafka, MQTT); zero-copy, multi-channel inference
• Image Inference API over the OpenVINO Inference Engine: MKLDNN plugin (CPU), clDNN plugin (iGPU/dGPU), MDK plugin (KMB/HDDL)
• Media plugins: decodebin; HW decode (VAAPI) and SW decode (libav/ffmpeg); HW encode (VAAPI) and SW encode (SVT); running on GPU (i/d/VSI) or CPU with media HW acceleration
• Other plugins (200+ plugins): video sources (V4L2), RTSP, WebRTC, render, file IO, …

GVA Inference Plugins Architecture
• Sink pad input: DMABuf or RGBx (any resolution)
• GvaDetect thread: pre-processing through VAAPI (downscale, NV12→RGBP) on a VASurface or RGBx buffer, then the input layer
• Inference shared instance: one inference queue per device, served by OpenVINO IE
• Output layer and post-processing: attach GstMeta to the original GstBuffer
• Source pad output: GstBuffer + GstMeta’s
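One way to picture the "Inference Shared Instance" box is a registry that hands the same inference object to every element that asks for the same configuration. The sketch below is a hypothetical illustration only: it assumes sharing is keyed by model and device, and the class name and factory callback are invented here rather than taken from the plugin source.

```python
import threading


class SharedInferenceRegistry:
    """Return one shared inference instance per (model, device) pair."""

    def __init__(self, factory):
        self._factory = factory      # callable(model, device) -> instance
        self._instances = {}
        self._lock = threading.Lock()

    def acquire(self, model, device):
        key = (model, device)
        with self._lock:  # elements may call this from different threads
            if key not in self._instances:
                self._instances[key] = self._factory(model, device)
            return self._instances[key]


# Stand-in factory: a real one would load the model onto the device.
loaded = []


def load_model(model, device):
    loaded.append((model, device))
    return {"model": model, "device": device}


registry = SharedInferenceRegistry(load_model)
first = registry.acquire("face-detection.xml", "CPU")
second = registry.acquire("face-detection.xml", "CPU")
print(first is second, len(loaded))
```

Sharing an instance this way means several video channels can feed a single per-device request queue instead of each loading its own copy of the model.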

GStreamer pipeline example
Video Analytics pipeline in Ad Insertion demo: face detection plus age, gender, person recognition
Stage | GST element
input | filesrc
HW/SW decode | decodebin
face detection | gvadetect
age/gender classification | gvaclassify
emotion recognition | gvaclassify
landmark points | gvaclassify
re-identify inference | gvaclassify
face recognition | gvaidentify
convert to JSON | gvametaconvert
overlay result | gvawatermark
gst-launch-1.0 filesrc location=${FILE} ! \
decodebin ! \
gvadetect model=face-detection-adas-0001.xml model-proc=face-detection-adas-0001.json ! queue ! \
gvaclassify device=CPU model=age-gender-recognition.xml model-proc=age-gender-recognition.json ! queue ! \
gvaclassify device=GPU model=emotions-recognition.xml model-proc=emotions-recognition.json ! queue ! \
gvaclassify device=CPU,GPU model=landmarks-regression.xml model-proc=landmarks-regression.json ! queue ! \
gvaclassify model=face-reidentification.xml model-proc=face-reidentification.json ! queue ! \
gvaidentify gallery=face_gallery.json ! queue ! \
gvawatermark ! videoconvert ! fpsdisplaysink sync=false
Platform specific tuning: gvaclassify device={CPU|CPU,GPU} cpu-streams=15 nireq=16 …
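A pipeline description like the one above is just elements joined with `!`, so it is easy to generate when the set of classifiers or the device assignment varies per platform. The sketch below builds the description string; element and property names come from the slide, while the helper names and the shortened model file names are illustrative assumptions.

```python
def element(name, **props):
    """Render one element; keyword model_proc becomes property model-proc."""
    rendered = [f"{key.replace('_', '-')}={value}" for key, value in props.items()]
    return " ".join([name] + rendered)


def build_pipeline(video_file, detector, classifiers):
    """Return a gst-launch style description: decode, detect, classify, overlay."""
    stages = [
        element("filesrc", location=video_file),
        "decodebin",
        element("gvadetect", **detector),
        "queue",
    ]
    for props in classifiers:
        stages += [element("gvaclassify", **props), "queue"]
    stages += ["gvawatermark", "videoconvert", element("fpsdisplaysink", sync="false")]
    return " ! ".join(stages)


description = build_pipeline(
    "clip.mp4",
    {"model": "face-detection.xml", "model_proc": "face-detection.json"},
    [
        {"device": "CPU", "model": "age-gender.xml", "model_proc": "age-gender.json"},
        {"device": "GPU", "model": "emotions.xml", "model_proc": "emotions.json", "nireq": 16},
    ],
)
print(description)
```

The resulting string can be given to `gst-launch-1.0` on the command line, or to `Gst.parse_launch` when using the GStreamer Python bindings, provided the GVA plugins are installed.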

Multiple Programming Language Bindings
• Easily build pipelines from C/C++ or Python, in addition to the gst-launch command

GStreamer Video Analytics Plugins List
GST element | Description | INPUT | OUTPUT
gvainference | Generic inference | GstBuffer | INPUT + GvaTensorMeta
gvadetect | Object detection | GstBuffer | INPUT + GstVideoRegionOfInterestMeta
gvaclassify | Object classification | GstBuffer + GstVideoRegionOfInterestMeta | INPUT + GstVideoRegionOfInterestMeta
gvaidentify | Object identification/recognition | GstBuffer + | INPUT + GstVideoRegionOfInterestMeta
gvametaconvert | Metadata conversion | GstBuffer + GvaTensorMeta/ | INPUT + GvaJSONMeta
gvawatermark | Overlay | GstBuffer + GvaDetectionMeta + { GvaTensorMeta } | INPUT (with modified image)
gvametapublish | Message bus (Kafka, MQTT) | GstBuffer + GvaJSONMeta | -

Video Analytics API levels
• Low-level (per frame processing)
▪ VAAPI, OpenVINO, OpenCL, OpenCV
• Pipeline level
▪ GStreamer, FFMPEG, MediaFoundation, …
• Video Analytics as Service
▪ REST/gRPC multi-node service

Video Analytics REST Service
• Provide RESTful interfaces for executing pipelines and monitoring pipeline status
• Interface agnostic to underlying implementation (GStreamer, FFMPEG, custom backend)
• Support scaling through container deployment and orchestration frameworks
• Load balancing
Video Analytics Pipeline Service
• Edge / Cloud Integration: REST / gRPC / Message Bus
• Video Analytics Normalization: Pipeline Manager (Pipeline A, Pipeline B, …), Model Manager (Model A, Model B, …)
• Pipeline Support: GStreamer / FFMPEG
• HW Acceleration: VAAPI / MSDK, OpenVINO
• Hardware: CPU, GPU, VPU

Video Analytics Service Developer Workflow
1. VA Pipeline Developer creates pipeline templates with customizable parameters and models (pipeline configuration JSON files, model configuration JSON files)
2. VA Pipeline Developer builds a Docker image (Docker build from HW/OS-optimized Dockerfiles and HW/OS-optimized libraries)
3. Developer / System Integrator deploys containers to cloud or edge and integrates them into the application
• At run time: Application 1, Application 2, … send HTTP requests; a scheduler dispatches the work to workers

End-to-End Distributed Video Analytics Example
• Components, from edge to cloud: RTSP video sources, NVR storage, Video Analytics REST Service, database, analytics algorithm, web dashboard
• REST request that starts a pipeline:
POST /pipelines/vehicles/1
{
  "source": "rtsp://10.43.38.158",
  "destination": {
    "type": "Kafka",
    "hosts": ["10.43.38.150:9299"]
  }
}
• Metadata produced for a detected object:
{
  "camera_id": "173207954",
  "label": "vehicle",
  "object_id": 3,
  "bounding_box": [0.13958, 0.33766, 0.42094, 0.06687],
  "confidence": 0.83
}
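The request and the metadata record in this example are plain JSON, so a client needs nothing beyond an HTTP library. The sketch below prepares the POST request and reads back one metadata record using only the Python standard library. The endpoint path and field names come from the slide; the base URL, the helper names, and the choice of which fields to return are assumptions for illustration.

```python
import json
from urllib.request import Request


def build_pipeline_request(base_url, name, version, source_uri, kafka_hosts):
    """Prepare (but do not send) the POST that starts a pipeline instance."""
    body = {
        "source": source_uri,
        "destination": {"type": "Kafka", "hosts": list(kafka_hosts)},
    }
    return Request(
        url=f"{base_url}/pipelines/{name}/{version}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_detection(message):
    """Extract label, confidence and bounding box from one metadata record."""
    record = json.loads(message)
    return record["label"], record["confidence"], record["bounding_box"]


# http://localhost:8080 is a hypothetical service address.
request = build_pipeline_request(
    "http://localhost:8080", "vehicles", 1,
    "rtsp://10.43.38.158", ["10.43.38.150:9299"],
)
print(request.get_method(), request.full_url)

sample = '{"camera_id": "173207954", "label": "vehicle", "object_id": 3, ' \
         '"bounding_box": [0.13958, 0.33766, 0.42094, 0.06687], "confidence": 0.83}'
print(parse_detection(sample))
```

Sending the request is a single `urllib.request.urlopen(request)` call once a service is reachable; the metadata records would normally be consumed from the Kafka topic rather than from a string.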

Video Analytics Service is part of Open Visual Cloud
https://01.org/openvisualcloud
The Open Visual Cloud is an open source project that offers a set of pre-defined reference
pipelines for various target visual cloud use cases.

Demo
• 4 channel face detection
• Smart city, edge to cloud

Summary
• We presented the Intel video analytics e2e pipeline in GStreamer and FFMPEG
• It supports multiple Intel HWs with the same SW pipeline/API
• It provides an optimized flow between media and DL inference: zero copy and inference on NV12 images
• It provides scalable deployment across edge and cloud
• Call to Action
• Try it out and let us know your feedback

Resources
▪ OpenVINO - https://github.com/opencv/dldt
▪ MediaSDK - https://github.com/Intel-Media-SDK
▪ GStreamer Plugin - https://github.com/opencv/gst-video-analytics
▪ Open Visual Cloud
▪ Smart City sample
https://github.com/OpenVisualCloud/Smart-City-Sample
▪ Ad Insertion sample
https://github.com/OpenVisualCloud/Ad-Insertion-Sample
▪ Docker files including FFMPEG Video Analytics Filters
https://github.com/OpenVisualCloud/Dockerfiles
