Explore how to build a unified framework based on FFmpeg and GStreamer to enable video analytics on all Intel® hardware, including CPUs, GPUs, VPUs, FPGAs, and in-circuit emulators.
2. Charlie Wang, Peng Tu, Mikhail Nikolsky, Jerry Dong
Building a Deep Learning Video Analytics
Framework for Intel AI Platforms
SIGGRAPH 2019 | LOS ANGELES | 28 JULY - 1 AUGUST
3. Speakers
• Charlie Wang
• Principal Engineer, VTT
• Peng Tu
• Principal Engineer, CPDP
• Mikhail Nikolsky
• Sr. Staff Engineer, CPDP
Intel Architecture Graphics Software (IAGS)
4. Agenda
▪ Video Analytics Usages
▪ Build a Video Analytics Framework for all Intel HWs
▪ FFMPEG Filter Implementation
▪ GStreamer Plugin Implementation
▪ Video Analytics as REST Service
▪ Demo
▪ Resources
5. Intelligence on Video Data
▪ Retail analytics
▪ Industrial inspection
▪ Content filtering
▪ Parking management
▪ Super resolution
▪ Autonomous driving
▪ Action recognition
▪ Encode quality control
6. Intel Video Analytics HW
▪ Intel® CPU: client and server
▪ Intel® GPU: integrated and discrete
▪ Intel® Vision Accelerator Design with Intel® Movidius™ Vision Processing Units (VPU)
▪ Intel® Vision Accelerator Design with an Intel® Arria 10 FPGA (preview)
Architecture types covered: scalar, vector, matrix, spatial
8. Map to Intel Hardware
[Diagram: pipeline stages and the hardware that can run each one]
▪ Stages: decode, scale/csc, inference, object tracking, crop/scale, further inference, post-processing + encode
▪ Hardware options: CPU, Media FF, programmable GPU, VPU, FPGA
▪ Inference stages list CPU, GPU, VPU, and FPGA
Video analytics requires heavy interaction between video and compute.
10. Popular Video Processing Software Frameworks
These frameworks handle video and audio demuxing, decoding, processing, encoding, rendering, and muxing, and also allow customized plugins/filters.
11. Intel Video Software Offering
[Diagram: software stack, top to bottom]
▪ Applications
▪ Frameworks: FFMPEG, GStreamer, customized framework
▪ Next Gen Media Library
▪ VAAPI/DXVA, SW codec, FPGA driver
▪ Hardware: CPU, GPU, Media FF, FPGA
Intel GPU Media FF supports video decode/encode and video processing.
12. Intel Computer Vision Software Offering
Deep Learning for Computer Vision
Accelerate and deploy convolutional neural networks (CNN) on Intel® platforms with the Deep Learning Deployment Toolkit included in OpenVINO.
Traditional Computer Vision
Develop and optimize classic computer vision applications built with the OpenCV library or OpenVX API.
13. OpenVINO for Inference
[Diagram: software stack, top to bottom]
▪ Inference App
▪ DL Inference Engine API
▪ Heterogeneous Execution Engine
▪ Plugins: CPU Plugin (MKLDNN), GPU Plugin (CLDNN), VPU Plugin (MVNC), FPGA Plugin (DLA)
▪ Hardware: CPU, GPU, VPU, FPGA
A single interface supports all platforms, with no SW change.
The library is designed for CNN inference acceleration on Intel HW.
14. Intel Deep Learning Deployment Toolkit
▪ Inference Engine: This execution engine uses a common API to deliver inference solutions on the platform of your choice: CPU, GPU, VPU, or FPGA.
▪ Model Optimizer: This Python*-based command line tool imports trained models from popular deep learning frameworks such as Caffe*, TensorFlow*, and Apache MXNet*, and from Open Neural Network Exchange (ONNX).
15. Now let’s build the video
analytics framework
Write-once, Deploy on All HWs
16. Video Analytics Framework
[Diagram: framework stack, top to bottom]
▪ Video Analytics Applications, built on the FFMPEG plugin, the GStreamer plugin, or your own framework
▪ Next Gen Media Library and OpenVINO, with NV12 as the format shown between them
▪ CPU codec, VAAPI, MKLDNN, CLDNN, DLA
▪ Media Driver, Compute Driver
▪ Hardware: CPU, GPU, VPU, FPGA
The framework supports load balancing among devices.
17. Media & Compute interoperability
• Media and compute/inference use different color formats
[Diagram: Decode → tiled NV12 surface → Scaling/csc → linear RGBP surface → Inference → Meta data]
• A copy is required if we don’t handle buffer sharing correctly
• Common media format: YUV with padding, pitch, etc.
• Inference format: tensor array
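The gap between the two formats can be made concrete with a small sketch. It assumes the usual linear NV12 layout (a full-resolution Y plane followed by one interleaved, half-height UV plane that shares the same row pitch) and a dense planar RGB tensor; the function names, the 1920x1080 frame, and the 2048-byte pitch are illustrative assumptions, and tiling is ignored.

```python
def nv12_offsets(x, y, height, pitch):
    """Byte offsets of the Y, U and V samples for pixel (x, y) in linear NV12.

    Assumes the UV plane starts right after `height` rows of Y and uses the
    same pitch; real surfaces may add extra row alignment or tiling.
    """
    y_off = y * pitch + x
    uv_row = height * pitch + (y // 2) * pitch
    u_off = uv_row + (x // 2) * 2  # U and V are interleaved, one pair per 2x2 block
    return y_off, u_off, u_off + 1


def nv12_size(height, pitch):
    """Total bytes: the Y plane plus a half-height UV plane."""
    return pitch * height + pitch * (height // 2)


def rgbp_offset(channel, x, y, width, height):
    """Element index in a dense planar RGB (channel-major) tensor, no padding."""
    return channel * width * height + y * width + x


if __name__ == "__main__":
    w, h, pitch = 1920, 1080, 2048
    print("NV12 bytes with pitch:", nv12_size(h, pitch))
    print("RGBP elements:", 3 * w * h)
    print("Y/U/V offsets of pixel (3, 5):", nv12_offsets(3, 5, h, pitch))
```

Because the media surface carries a pitch and the tensor does not, a consumer that ignores the pitch reads padding bytes as pixels, which is one reason a naive hand-off falls back to a copy.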
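18. Media Format as a Compute/Inference Format
[Diagram: Decode → NV12 surface → Scaling → Inference → Meta data]
Layer1 (csc+convolution): the Y channel and the UV channel pass through csc to give R, G, and B, which the convolution combines using weight1, weight2, weight3, and bias.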
Inference time is reduced by roughly 3% to 20%, depending on resolution:
Model        Resolution  Time reduction
GoogleNetV1  224x224     3.6%
YoloTinyV1   448x448     9.3%
Mtcnn        1280x720    20.9%
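One plausible reading of "Layer1 (csc+convolution)" is that the linear color conversion is folded into the first layer's weights, so the network consumes YUV directly. The sketch below shows that algebra for a single output channel of a 1x1 convolution. It is an illustration under stated assumptions, not the presenters' implementation: the matrix uses approximate BT.601 limited-range coefficients, clamping and chroma upsampling are ignored, and the function names are made up for the example.

```python
# Approximate BT.601 limited-range conversion: rgb = M * (yuv - OFFSET)
M = [
    [1.164, 0.000, 1.596],
    [1.164, -0.392, -0.813],
    [1.164, 2.017, 0.000],
]
OFFSET = [16.0, 128.0, 128.0]


def yuv_to_rgb(yuv):
    """Explicit csc step for one pixel (no clamping)."""
    centered = [v - o for v, o in zip(yuv, OFFSET)]
    return [sum(m * c for m, c in zip(row, centered)) for row in M]


def conv1x1(weights, bias, pixel):
    """One output channel of a 1x1 convolution on a 3-channel pixel."""
    return sum(w * p for w, p in zip(weights, pixel)) + bias


def fold_csc(weights, bias):
    """Return (weights', bias') such that
    conv1x1(weights', bias', yuv) == conv1x1(weights, bias, yuv_to_rgb(yuv))."""
    folded = [sum(weights[r] * M[r][c] for r in range(3)) for c in range(3)]
    folded_bias = bias - sum(f * o for f, o in zip(folded, OFFSET))
    return folded, folded_bias


if __name__ == "__main__":
    w, b = [0.2, -0.5, 0.1], 0.3
    yuv = [120.0, 90.0, 200.0]
    fw, fb = fold_csc(w, b)
    print(conv1x1(w, b, yuv_to_rgb(yuv)), conv1x1(fw, fb, yuv))
```

For a k×k kernel the same fold applies to the three-channel weight vector at every tap, with the bias adjusted once over all taps; padding at image borders needs separate care because a zero in RGB is not a zero in YUV.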
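19. Video Analytics e2e Flow
[Diagram: end-to-end flow]
▪ Model path: trained model → offline model optimization (DLDT) → model IR → Inference Engine (DLDT), exposed through the Inference Engine API
▪ Media path: video sources → MSDK library, exposed through the DEC/ENC/VPP API
▪ FFMPEG-VA / GST-VA expose a combined DEC/ENC/VPP/INF API to the application
▪ Phases: VA pipeline design, VA pipeline implementation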
21. FFMPEG Components
Commands (console)
▪ ffmpeg: command tool to do transcoding
▪ ffplay: simple player with SDL, using the ffmpeg demux/decoder
▪ ffprobe: tool to extract the information of a multimedia stream
▪ ffserver: real-time stream server to broadcast multimedia streams
Libraries
▪ libavdevice
▪ libavformat: mux / demux
▪ libavcodec: a library that implements most A/V codecs and is used by most popular codec tools
▪ libavfilter: library for a/v filters, which implement all kinds of effects such as scale, crop, frc, etc.
▪ libavutil: common tool library
▪ libpostproc
▪ libavresample
▪ libswresample
We are adding inference as filters.
22. FFmpeg Filter VA Architecture
[Diagram: layers, top to bottom; the legend distinguishes new, FFmpeg, hardware, 3rd party, …]
▪ APP: ffmpeg or ffplay
▪ LIBAVFILTER: SR filter, detect filter, classify filter, on top of a DNN interface (DNNModel) with a Tensorflow backend and an Inference Engine backend
▪ LIBAVFORMAT: metadata muxer, Kafka producer
▪ 3RD LIBRARIES: Tensorflow; Inference Engine (OpenVINO) with MKLDNN, clDNN, Movidius, and DLA; Librdkafka
▪ HW: CPU, GPU, VPU, FPGA
23. The pipelines look like ...
• Face detection + emotion & age_gender recognition:
ffmpeg -i clip.mp4 -vf "
detect=model=$DETECT_MODEL1:model_proc=$MODEL1-JSON:device=$CPU,
classify=model=$EMOTION_MODEL2:model_proc=$MODEL2-JSON:device=$CPU,
classify=model=$AGE_GENDER_MODEL3:model_proc=$MODEL3-JSON:device=$CPU" \
-an -y -f iemetadata metadata.json
[Diagram: stage (element)]
input stream → decoder (decode) → face detection (detect) → emotion recognition (classify) → age-gender recognition (classify) → convert to JSON (iemetadata) → send to Kafka server (Kafka)
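The filter chain above follows the regular FFmpeg filtergraph shape: filters separated by commas, and `key=value` options separated by colons. A minimal sketch of assembling such a command programmatically is shown here; the filter and option names (`detect`, `classify`, `model`, `model_proc`, `device`, `iemetadata`) come from the slide, while the helper names and model file names are hypothetical, and values that contain `:` or `,` would need escaping that this sketch does not do.

```python
def filter_graph(stages):
    """Join (filter_name, options) pairs into a single -vf argument.

    Options inside one filter are joined with ':', filters with ','.
    """
    parts = []
    for name, options in stages:
        opts = ":".join(f"{key}={value}" for key, value in options.items())
        parts.append(f"{name}={opts}" if opts else name)
    return ",".join(parts)


def ffmpeg_args(source, stages, output):
    """Argument list for the metadata-producing command shown on the slide."""
    return ["ffmpeg", "-i", source, "-vf", filter_graph(stages),
            "-an", "-y", "-f", "iemetadata", output]


if __name__ == "__main__":
    stages = [
        ("detect", {"model": "face.xml", "model_proc": "face.json", "device": "CPU"}),
        ("classify", {"model": "emotion.xml", "model_proc": "emotion.json", "device": "CPU"}),
    ]
    print(" ".join(ffmpeg_args("clip.mp4", stages, "metadata.json")))
```

Passing the result as a list to a process launcher avoids shell quoting problems with the long `-vf` value.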
25. Why Add VA to GStreamer
• Multiplatform: Linux, Windows, Mac OS X, Android, …
• Comprehensive core: graph-based multi-threaded pipeline, lightweight data passing
• Broad coverage of media technologies: file and streaming packet i/o, codecs, metadata, video and audio
• Extensive development tools: gst-launch, Python, C++ API
• Easy to extend and reuse through plugins and metadata
26. GStreamer Plugins Architecture
[Diagram: layers, top to bottom]
▪ Application, using the GStreamer API for pipeline control, pipeline events, inference plugin configuration (per-plugin params/API), and inference results (metadata)
▪ Media plugins: decodebin, HW decode, SW decode, HW encode, SW encode; video sources (V4L2; RTSP, WebRTC, render, file IO, …)
▪ Video Analytics plugins (GVA*): gvainference, gvadetect, gvaclassify, gvaidentify, meta convert, publish (Kafka, MQTT)
▪ Other plugins: 200+ plugins
▪ Image Inference API: zero-copy, multi-channel inference
▪ OpenVINO Inference Engine: MKLDNN plugin (CPU), clDNN plugin (iGPU/dGPU), MDK plugin (KMB/HDDL)
▪ Media HW acceleration: VAAPI on GPU (i/d/VSI); libav/ffmpeg and SVT on CPU
27. GVA Inference Plugins Architecture
[Diagram: data flow inside a GVA inference element such as GvaDetect]
▪ Sink pad: receives a GstBuffer as DMABuf or RGBx (any resolution)
▪ Input queue, then input-layer pre-processing through VAAPI on a VASurface or RGBx: downscale, NV12→RGBP
▪ Inference shared instance: one inference queue per device, served by OpenVINO IE on its own thread
▪ Output-layer post-processing: attach GstMeta to the original GstBuffer
▪ Source pad: emits the GstBuffer with its GstMeta
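The shared-instance idea on this slide (one inference queue per device, results attached to the original buffer) can be sketched with plain Python threads. This is only an illustration of the pattern under stated assumptions: `SharedInference`, the dictionary standing in for a GstBuffer, and the callable standing in for OpenVINO IE are all hypothetical, and ordering of buffers downstream is not modeled.

```python
import queue
import threading


class SharedInference:
    """One request queue and one worker thread per device, shared by all callers."""

    def __init__(self, devices, infer_fn):
        self._infer_fn = infer_fn  # stand-in for the real inference call
        self._queues = {d: queue.Queue() for d in devices}
        self._workers = [
            threading.Thread(target=self._run, args=(d,), daemon=True)
            for d in devices
        ]
        for worker in self._workers:
            worker.start()

    def _run(self, device):
        requests = self._queues[device]
        while True:
            item = requests.get()
            if item is None:  # shutdown marker
                return
            buffer, preprocessed = item
            result = self._infer_fn(device, preprocessed)
            # Post-processing: attach the result to the original buffer.
            buffer["meta"].append({"device": device, "result": result})

    def submit(self, device, buffer, preprocess):
        """Pre-process the data and queue it; the buffer's data is not modified."""
        self._queues[device].put((buffer, preprocess(buffer["data"])))

    def close(self):
        """Drain all queues and stop the workers."""
        for requests in self._queues.values():
            requests.put(None)
        for worker in self._workers:
            worker.join()


if __name__ == "__main__":
    engine = SharedInference(["CPU", "GPU"], lambda device, x: sum(x))
    frame = {"data": [1, 2, 3], "meta": []}
    engine.submit("CPU", frame, preprocess=lambda d: [v * 2 for v in d])
    engine.close()
    print(frame["meta"])
```

Keeping the original buffer untouched and attaching only metadata mirrors the diagram's "original GstBuffer" path, which is what lets downstream elements reuse the decoded frame without a copy.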
28. GStreamer Pipeline Example
[Diagram: stage (element)]
input (filesrc) → HW/SW decode (decodebin) → face detection (gvadetect) → age/gender classification (gvaclassify) → emotion recognition (gvaclassify) → landmark points (gvaclassify) → re-identify inference (gvaclassify) → face recognition (gvaidentify) → convert to JSON (gvametaconvert) → overlay result (gvawatermark)
Video Analytics pipeline in Ad Insertion demo: face detection plus age, gender, person recognition
gst-launch-1.0 filesrc location=${FILE} ! \
decodebin ! \
gvadetect model=face-detection-adas-0001.xml model-proc=face-detection-adas-0001.json ! queue ! \
gvaclassify device=CPU model=age-gender-recognition.xml model-proc=age-gender-recognition.json ! queue ! \
gvaclassify device=GPU model=emotions-recognition.xml model-proc=emotions-recognition.json ! queue ! \
gvaclassify device=CPU,GPU model=landmarks-regression.xml model-proc=landmarks-regression.json ! queue ! \
gvaclassify model=face-reidentification.xml model-proc=face-reidentification.json ! queue ! \
gvaidentify gallery=face_gallery.json ! queue ! \
gvawatermark ! videoconvert ! fpsdisplaysink sync=false
Platform specific tuning: gvaclassify device={CPU|CPU,GPU} cpu-streams=15 nireq=16 …
32. Video Analytics API levels
• Low-level (per frame processing)
▪ VAAPI, OpenVINO, OpenCL, OpenCV
• Pipeline level
▪ GStreamer, FFMPEG, MediaFoundation, …
• Video Analytics as Service
▪ REST/gRPC multi-node service
33. Video Analytics REST Service
• Provide RESTful interfaces for executing pipelines and monitoring pipeline status
• Interface agnostic to the underlying implementation (GStreamer, FFMPEG, custom backend)
• Support scaling through container deployment and orchestration frameworks
• Load balancing
[Diagram: Video Analytics Pipeline Service stack]
▪ Edge / cloud integration: REST / gRPC / message bus
▪ Video analytics normalization: Pipeline Manager (Pipeline A, Pipeline B, …) and Model Manager (Model A, Model B, …)
▪ Pipeline support: GStreamer / FFMPEG
▪ HW acceleration: VAAPI / MSDK, OpenVINO
▪ Hardware: CPU, GPU, VPU
34. Video Analytics Service Developer Workflow
▪ Step 1: VA Pipeline Developer creates pipeline templates with customizable parameters and models (pipeline configuration JSON files, model configuration JSON files)
▪ Step 2: VA Pipeline Developer builds the Docker image (Docker build from HW/OS-optimized Dockerfiles and HW/OS-optimized libraries)
▪ Step 3: Developer / System Integrator deploys containers to cloud or edge and integrates them into the application
[Diagram: Application 1, Application 2, … issue HTTP requests; a scheduler and workers handle them]
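A pipeline template with customizable parameters can be pictured as a JSON file that holds an element chain with placeholders plus default values, filled in from each request. The sketch below is a guess at that shape, not the service's actual schema: the JSON keys, the `render_pipeline` helper, and the file names are hypothetical, while the element names (`filesrc`, `decodebin`, `gvadetect`, `gvametaconvert`) appear earlier in the deck.

```python
import json
import string

# Hypothetical template format: an element chain with ${...} placeholders
# plus default parameter values.
TEMPLATE_JSON = """
{
  "name": "face_detection",
  "template": "filesrc location=${source} ! decodebin ! gvadetect model=${model} device=${device} ! gvametaconvert ! fakesink",
  "defaults": {"device": "CPU"}
}
"""


def render_pipeline(template_json, request_params):
    """Merge request parameters over the template defaults and fill the placeholders."""
    spec = json.loads(template_json)
    params = {**spec.get("defaults", {}), **request_params}
    # substitute() raises KeyError if a required parameter is missing.
    return string.Template(spec["template"]).substitute(params)


if __name__ == "__main__":
    request = {"source": "clip.mp4", "model": "face.xml"}
    print(render_pipeline(TEMPLATE_JSON, request))
```

Separating the template from the request parameters is what lets the same container image serve different sources and models without a rebuild.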
36. Video Analytics Service is part of Open Visual Cloud
https://01.org/openvisualcloud
The Open Visual Cloud is an open source project that offers a set of pre-defined reference
pipelines for various target visual cloud use cases.
39. Summary
• We presented Intel's end-to-end video analytics pipeline in GStreamer and FFMPEG
• It supports multiple Intel HWs with the same SW pipeline/API
• It provides an optimized flow between media and DL inference, with zero copy and inference on NV12 images
• It provides scalable deployment across edge and cloud
• Call to Action
• Try it out and let us know your feedback
40. Resources
▪ OpenVINO - https://github.com/opencv/dldt
▪ MediaSDK - https://github.com/Intel-Media-SDK
▪ GStreamer Plugin - https://github.com/opencv/gst-video-analytics
▪ Open Visual Cloud
▪ Smart City sample
https://github.com/OpenVisualCloud/Smart-City-Sample
▪ Ad Insertion sample
https://github.com/OpenVisualCloud/Ad-Insertion-Sample
▪ Docker files including FFMPEG Video Analytics Filters
https://github.com/OpenVisualCloud/Dockerfiles