Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Sessions

Explore how to build a unified framework based on FFmpeg and GStreamer to enable video analytics on all Intel® hardware, including CPUs, GPUs, VPUs, FPGAs, and in-circuit emulators.

  1. 1. SIGGRAPH 2019 | LOS ANGELES | 28 JULY - 1 AUGUST
  2. 2. Building a Deep Learning Video Analytics Framework for Intel AI Platforms • Charlie Wang, Peng Tu, Mikhail Nikolsky, Jerry Dong
  3. 3. Speakers • Charlie Wang, Principal Engineer, VTT • Peng Tu, Principal Engineer, CPDP • Mikhail Nikolsky, Sr. Staff Engineer, CPDP • Intel Architecture Graphics Software (IAGS)
  4. 4. Agenda ▪ Video Analytics Usages ▪ Build a Video Analytics Framework for all Intel HWs ▪ FFMPEG Filter Implementation ▪ GStreamer Plugin Implementation ▪ Video Analytics as REST Service ▪ Demo ▪ Resources
  5. 5. Intelligence on Video Data • Retail analytics • Industrial inspection • Content filtering • Parking management • Super resolution • Autonomous driving • Action recognition • Encode quality control
  6. 6. Intel Video Analytics HW • Intel® CPU: client and server • Intel® Vision Accelerator Design with Intel® Movidius™ Vision Processing Units (VPU) • Intel® GPU: integrated and discrete • Intel® Vision Accelerator Design with an Intel® Arria 10 FPGA (preview) • Architecture types: scalar, vector, matrix, spatial
  7. 7. Typical Video Analytics Flow • Decode: 720p, 1080p, 4K (AVC, HEVC) • Scale/csc: output 224x224 RGBP • Inference: object detection, image segmentation • Object tracking: LKT, IOU • Crop/scale followed by further inference: object recognition, image classification, action recognition • Post-processing + encode: bounding box; 720p, 1080p (AVC, HEVC)
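The flow above lists IOU as one object-tracking option. A minimal Python sketch of that idea follows; it is not the framework's tracker. The `(x_min, y_min, x_max, y_max)` box layout, the greedy matching strategy, the `0.3` threshold, and the function names `iou` and `match_detections` are all assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


def match_detections(tracks, detections, threshold=0.3):
    """Greedily assign each track id to the unused detection with the highest IoU.

    tracks: dict mapping track id -> last known box
    detections: list of boxes from the current frame
    Returns a dict mapping track id -> index into detections.
    """
    matches = {}
    unused = set(range(len(detections)))
    for track_id, track_box in tracks.items():
        best = max(unused, key=lambda i: iou(track_box, detections[i]), default=None)
        if best is not None and iou(track_box, detections[best]) >= threshold:
            matches[track_id] = best
            unused.remove(best)
    return matches
```

Detections left unmatched would typically start new tracks, and tracks left unmatched would age out; both policies are outside this sketch.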
  8. 8. Map to Intel Hardware • Pipeline stages: Decode, Scale/csc, Inference, Object tracking, Crop/scale, Inference, Post-processing + Encode • Each stage maps to one or more of: CPU, Media FF, GPU (programmable), VPU, FPGA • Video analytics requires heavy video and compute interaction
  9. 9. Agenda ▪ Video Analytics Usages ▪ Build a Video Analytics Framework for all Intel HWs ▪ FFMPEG Filter Implementation ▪ GStreamer Plugin Implementation ▪ Video Analytics as REST Service ▪ Demo ▪ Resources
  10. 10. Popular Video Processing Software Frameworks • They cover video and audio demux, decode, processing, encoding, rendering, and muxing, and also allow customized plugins/filters
  11. 11. Intel Video Software Offering • Apps: built on FFMPEG, GStreamer, or a customized framework • Libraries and drivers: Next Gen Media Library, SW codec, VAAPI/DXVA, FPGA driver • Hardware: CPU, GPU Media FF, FPGA • The Intel GPU media FF supports video decode/encode and video processing
  12. 12. Intel Computer Vision Software Offering • Deep Learning for Computer Vision: accelerate and deploy convolutional neural networks (CNN) on Intel® platforms with the Deep Learning Deployment Toolkit included in OpenVINO • Traditional Computer Vision: develop and optimize classic computer vision applications built with the OpenCV library or OpenVX API
  13. 13. OpenVINO for Inference • Inference App → DL Inference Engine API → Heterogeneous Execution Engine • CPU Plugin (MKLDNN) → CPU • GPU Plugin (CLDNN) → GPU • VPU Plugin (MVNC) → VPU • FPGA Plugin (DLA) → FPGA • A single interface supports all platforms, with no SW change • The library is designed for CNN inference acceleration on Intel HW
  14. 14. Intel Deep Learning Deployment Toolkit • Inference Engine: This execution engine uses a common API to deliver inference solutions on the platform of your choice: CPU, GPU, VPU, or FPGA. • Model Optimizer: This Python*-based command line tool imports trained models from popular deep learning frameworks such as Caffe*, TensorFlow*, and Apache MXNet*, as well as Open Neural Network Exchange (ONNX).
  15. 15. Now let’s build the video analytics framework: write once, deploy on all HW
  16. 16. Video Analytics Framework • Video Analytics Applications: built on the FFMPEG plugin, the GStreamer plugin, or your own framework • Media: Next Gen Media Library, VAAPI, Media Driver, CPU codec • Inference: OpenVINO, MKLDNN, CLDNN, DLA, Compute Driver • Shared surface format: NV12 • Hardware: CPU, GPU, VPU, FPGA • The framework supports load balancing among devices
  17. 17. Media & Compute Interoperability • Media and compute/inference use different color formats • Flow: Decode → tiled NV12 surface → Scaling/csc → linear RGBP surface → Inference → metadata • A copy is required if we don’t handle buffer sharing correctly • Common media format: YUV with padding, pitch, etc. • Inference format: tensor array
  18. 18. Media Format as a Compute/Inference Format • Flow: Decode → NV12 surface → Scaling → Inference → metadata • The Y channel and UV channel feed Layer1 (csc + convolution) directly; the csc to R, G, B is combined with weight1, weight2, weight3, and bias • Inference time is reduced by 3% to 20%, depending on resolution • Model (resolution): time reduction • GoogleNetV1 (224x224): 3.6% • YoloTinyV1 (448x448): 9.3% • Mtcnn (1280x720): 20.9%
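To illustrate why csc and the first layer can be merged: both are linear per-pixel operations, so the conversion matrix can be multiplied into the layer's weights once, offline. The sketch below shows this for a single pixel and a 1x1 layer only. The full-range BT.601 coefficients, the chroma offset of 128, and the helper names are assumptions; the slides do not state which matrix is used, and a real first layer is a spatial convolution over NV12 with subsampled chroma.

```python
# Assumed full-range BT.601 YUV -> RGB matrix (rows: R, G, B; columns: Y, U, V).
CSC = [
    [1.0, 0.0, 1.402],
    [1.0, -0.344136, -0.714136],
    [1.0, 1.772, 0.0],
]
# Assumed offsets subtracted from (Y, U, V) before applying the matrix.
OFFSET = [0.0, 128.0, 128.0]


def yuv_to_rgb(yuv):
    """Explicit color-space conversion of one pixel."""
    centered = [value - offset for value, offset in zip(yuv, OFFSET)]
    return [sum(m * c for m, c in zip(row, centered)) for row in CSC]


def linear(weights, bias, pixel):
    """Per-pixel linear layer (a 1x1 convolution): weights @ pixel + bias."""
    return [sum(w * v for w, v in zip(row, pixel)) + b for row, b in zip(weights, bias)]


def fold_csc(weights, bias):
    """Return (weights, bias) that consume YUV directly and give the same output.

    W @ (CSC @ (yuv - OFFSET)) + b == (W @ CSC) @ yuv + (b - (W @ CSC) @ OFFSET)
    """
    folded_w = [
        [sum(row[k] * CSC[k][j] for k in range(3)) for j in range(3)]
        for row in weights
    ]
    folded_b = [
        b - sum(fw * offset for fw, offset in zip(folded_row, OFFSET))
        for folded_row, b in zip(folded_w, bias)
    ]
    return folded_w, folded_b
```

With the folded weights, the separate csc pass and its intermediate RGB surface are no longer needed at run time, which is the kind of saving the table above reports.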
  19. 19. Video Analytics e2e Flow • Offline: Trained Model → Model Opt → Model IR • APIs: DLDT Inference Engine API; MSDK DEC/ENC/VPP API; FFMPEG-VA/GST-VA DEC/ENC/VPP/INF API • Libraries: DLDT Inference Engine, MSDK Library, FFMPEG-VA/GST-VA • Application side: video sources, VA pipeline design, VA pipeline implementation
  20. 20. Agenda ▪ Video Analytics Usages ▪ Build a Video Analytics Framework for all Intel HWs ▪ FFMPEG Filter Implementation ▪ GStreamer Plugin Implementation ▪ Video Analytics as REST Service ▪ Demo ▪ Resources
  21. 21. FFMPEG Components • Commands (console) ▪ ffmpeg: command tool to do transcoding ▪ ffplay: simple player with SDL using the ffmpeg demux/decoder ▪ ffprobe: tool to extract the information of a multimedia stream ▪ ffserver: real-time stream server to broadcast multimedia streams • Libraries ▪ libavformat: mux/demux ▪ libavcodec: library implementing most A/V codecs, used by most popular codec tools ▪ libavfilter: library for A/V filters, implementing all kinds of effects such as scale, crop, frc, etc. ▪ libavutil: common tool library ▪ libavdevice, libpostproc, libavresample, libswresample • We are adding inference as filters
  22. 22. FFmpeg Filter VA Architecture • APP: ffmpeg or ffplay • LIBAVFILTER: SR filter, Detect filter, Classify filter; DNN interface (DNNModel) with a TensorFlow backend and an Inference Engine backend • LIBAVFORMAT: metadata muxer, Kafka producer • 3rd-party libraries: TensorFlow, Inference Engine (OpenVINO) with MKLDNN, clDNN, Movidius, DLA; librdkafka • HW: CPU, GPU, VPU, FPGA • Legend: new, FFmpeg, hardware, 3rd party
  23. 23. Face detection + emotion and age_gender recognition • Command: ffmpeg -i clip.mp4 -vf "detect=model=$DETECT_MODEL1:model_proc=$MODEL1-JSON:device=$CPU,classify=model=$EMOTION_MODEL2:model_proc=$MODEL2-JSON:device=$CPU,classify=model=$AGE_GENDER_MODEL3:model_proc=$MODEL3-JSON:device=$CPU" -an -y -f iemetadata metadata.json • The pipeline looks like: input stream → decode (decoder) → detect (face detection) → classify (emotion recognition) → classify (age-gender recognition) → iemetadata (convert to JSON) → Kafka (send to Kafka server)
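The `-vf` argument above is an ordinary FFmpeg filter chain: filters are separated by commas and each filter's options by colons. A small Python sketch of assembling such a command from a list of stages follows. The filter names (`detect`, `classify`), option keys (`model`, `model_proc`, `device`), and the `iemetadata` output format come from the slide; the helper names and the file names in the usage example are hypothetical, and the sketch does no escaping, so option values containing `:` or `,` would need FFmpeg's filter-graph escaping.

```python
def build_filter(name, **options):
    """Format one filter as name=key1=value1:key2=value2 (FFmpeg filter syntax)."""
    joined = ":".join(f"{key}={value}" for key, value in options.items())
    return f"{name}={joined}" if joined else name


def build_ffmpeg_command(input_path, stages, output_path):
    """Return an argv list mirroring the slide's command.

    stages: list of (filter_name, options_dict) applied in order.
    """
    graph = ",".join(build_filter(name, **options) for name, options in stages)
    return [
        "ffmpeg", "-i", input_path,
        "-vf", graph,
        "-an", "-y",
        "-f", "iemetadata", output_path,
    ]


if __name__ == "__main__":
    # Hypothetical model and model_proc file names, for illustration only.
    command = build_ffmpeg_command(
        "clip.mp4",
        [
            ("detect", {"model": "face.xml", "model_proc": "face.json", "device": "CPU"}),
            ("classify", {"model": "emotion.xml", "model_proc": "emotion.json", "device": "CPU"}),
        ],
        "metadata.json",
    )
    print(" ".join(command))
```

Returning an argv list rather than a single string lets the command be passed to `subprocess.run` without shell quoting concerns.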
  24. 24. Agenda ▪ Video Analytics Usages ▪ Build a Video Analytics Framework for all Intel HWs ▪ FFMPEG Filter Implementation ▪ GStreamer Plugin Implementation ▪ Video Analytics as REST Service ▪ Demo ▪ Resources
  25. 25. Why Add VA to GStreamer • Multiplatform: Linux, Windows, Mac OS X, Android, … • Comprehensive core: graph-based multi-threaded pipeline, lightweight data passing • Broad coverage of media technologies: file and streaming packet I/O, codecs, metadata, video and audio • Extensive development tools: gst-launch, Python, C++ API • Easy to extend and reuse through plugins and metadata
  26. 26. GStreamer Plugins Architecture • Application: uses the GStreamer API for pipeline control, pipeline events, per-plugin params/API, inference plugin configuration, and inference results (metadata) • Video Analytics plugins (GVA*): gvainference, gvadetect, gvaclassify, gvaidentify, meta convert, publish (Kafka, MQTT) • Image Inference API: zero-copy, multi-channel inference on the OpenVINO Inference Engine (MKLDNN plugin, clDNN plugin, MDK plugin) targeting CPU, iGPU/dGPU, KMB/HDDL • Media plugins: decodebin, HW decode, SW decode, HW encode, SW encode, using VAAPI, libav/ffmpeg, SVT, and media HW acceleration • Other plugins (200+): RTSP, WebRTC, render, file IO, V4L2, … • Video sources
  27. 27. GVA Inference Plugins Architecture • GvaDetect thread: sink pad receives DMABuf or RGBx (any resolution) → queue → pre-processing (VAAPI downscale, NV12→RGBP; VASurface or RGBx) → input layer → inference → output layer → post-processing → attach GstMeta to the original GstBuffer → source pad emits GstBuffer + GstMeta’s • Inference shared instance: OpenVINO IE with one inference queue per device
  28. 28. GStreamer Pipeline Example • Video Analytics pipeline in the Ad Insertion demo: face detection plus age, gender, and person recognition • Elements: filesrc (input) → decodebin (HW/SW decode) → gvadetect (face detection) → gvaclassify (age/gender classification) → gvaclassify (emotion recognition) → gvaclassify (landmark points) → gvaclassify (re-identify inference) → gvaidentify (face recognition) → gvametaconvert (convert to JSON) → gvawatermark (overlay result) • Command: gst-launch-1.0 filesrc location=${FILE} ! decodebin ! gvadetect model=face-detection-adas-0001.xml model-proc=face-detection-adas-0001.json ! queue ! gvaclassify device=CPU model=age-gender-recognition.xml model-proc=age-gender-recognition.json ! queue ! gvaclassify device=GPU model=emotions-recognition.xml model-proc=emotions-recognition.json ! queue ! gvaclassify device=CPU,GPU model=landmarks-regression.xml model-proc=landmarks-regression.json ! queue ! gvaclassify model=face-reidentification.xml model-proc=face-reidentification.json ! queue ! gvaidentify gallery=face_gallery.json ! queue ! gvawatermark ! videoconvert ! fpsdisplaysink sync=false • Platform-specific tuning: gvaclassify device={CPU|CPU,GPU} cpu-streams=15 nireq=16 …
  29. 29. Multiple Programming Language Bindings • Easily build a pipeline from C/C++ or Python, in addition to the gst-launch command
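As a sketch of what the Python route can look like, the code below builds a shortened detection pipeline description and runs it through the standard GStreamer Python bindings (`Gst.parse_launch`). It assumes PyGObject with GStreamer 1.0 and the GVA plugins are installed; the function names and the model file names in the usage example are hypothetical, and values containing spaces would need quoting.

```python
def build_pipeline_description(video_path, detect_model, detect_model_proc):
    """Join elements with ' ! ', using the same syntax gst-launch-1.0 accepts."""
    elements = [
        f"filesrc location={video_path}",
        "decodebin",
        f"gvadetect model={detect_model} model-proc={detect_model_proc}",
        "queue",
        "gvawatermark",
        "videoconvert",
        "fpsdisplaysink sync=false",
    ]
    return " ! ".join(elements)


def run_pipeline(description):
    """Run the pipeline until end-of-stream or error (needs PyGObject + GStreamer)."""
    # Imported lazily so the string builder above works without GStreamer installed.
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst

    Gst.init(None)
    pipeline = Gst.parse_launch(description)
    pipeline.set_state(Gst.State.PLAYING)
    bus = pipeline.get_bus()
    # Block until the stream finishes or an error is posted on the bus.
    bus.timed_pop_filtered(
        Gst.CLOCK_TIME_NONE,
        Gst.MessageType.EOS | Gst.MessageType.ERROR,
    )
    pipeline.set_state(Gst.State.NULL)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    run_pipeline(build_pipeline_description("clip.mp4", "face.xml", "face.json"))
```

Building the description as a string keeps the Python version directly comparable with the equivalent `gst-launch-1.0` command line.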
  30. 30. GStreamer Video Analytics Plugins List (GST element: description; input; output) • gvainference: generic inference; GstBuffer; INPUT + GvaTensorMeta • gvadetect: object detection; GstBuffer; INPUT + GstVideoRegionOfInterestMeta • gvaclassify: object classification; GstBuffer + GstVideoRegionOfInterestMeta; INPUT + GstVideoRegionOfInterestMeta • gvaidentify: object identification/recognition; GstBuffer + GstVideoRegionOfInterestMeta; INPUT + GstVideoRegionOfInterestMeta • gvametaconvert: metadata conversion; GstBuffer + GvaTensorMeta/GstVideoRegionOfInterestMeta; INPUT + GvaJSONMeta • gvawatermark: overlay; GstBuffer + GvaDetectionMeta + { GvaTensorMeta }; INPUT (with modified image) • gvametapublish: message bus (Kafka, MQTT); GstBuffer + GvaJSONMeta; -
  31. 31. Agenda ▪ Video Analytics Usages ▪ Build a Video Analytics Framework for all Intel HWs ▪ FFMPEG Filter Implementation ▪ GStreamer Plugin Implementation ▪ Video Analytics as REST Service ▪ Demo ▪ Resources
  32. 32. Video Analytics API Levels • Low-level (per-frame processing) ▪ VAAPI, OpenVINO, OpenCL, OpenCV • Pipeline level ▪ GStreamer, FFMPEG, MediaFoundation, … • Video Analytics as Service ▪ REST/gRPC multi-node service
  33. 33. Video Analytics REST Service • Provides RESTful interfaces for executing pipelines and monitoring pipeline status • Interface is agnostic to the underlying implementation (GStreamer, FFMPEG, custom backend) • Supports scaling through container deployment and orchestration frameworks • Load balancing • Stack: REST / gRPC / Message Bus; Video Analytics Pipeline Service (Pipeline Manager: Pipeline A, Pipeline B, …; Model Manager: Model A, Model B, …); GStreamer / FFMPEG; OpenVINO; VAAPI / MSDK; CPU, GPU, VPU • Layer labels: Edge / Cloud Integration, Video Analytics Normalization, Pipeline Support, HW Acceleration
  34. 34. Video Analytics Service Developer Workflow • 1: VA Pipeline Developer creates pipeline templates with customizable parameters and models • 2: VA Pipeline Developer builds the Docker image • 3: Developer / System Integrator deploys containers to Cloud or Edge and integrates them into the application • Docker build inputs: HW/OS optimized Docker files, pipeline configuration json files, model configuration json files, HW/OS optimized libraries • Runtime components: Application 1 and Application 2 (HTTP requests), Scheduler, Workers
  35. 35. End-to-End Distributed Video Analytics Example (Edge, Cloud) • Request to the Video Analytics REST Service: POST /pipelines/vehicles/1 { "source": "rtsp://", "destination": { "type": "Kafka", "hosts": [] } } • Example result metadata: { "camera_id": "173207954", "label": "vehicle", "object_id": 3, "bounding_box": [0.13958, 0.33766, 0.42094, 0.06687], "confidence": 0.83 } • Other components: RTSP video streams, NVR storage, database, web dashboard, analytics algorithm
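A hedged Python sketch of a client for the example above, using only the standard library. The endpoint path and the JSON field names follow the slide; the base URL, RTSP URI, and Kafka host passed in by the caller are placeholders the caller must supply (the slide elides them), and the helper names and the `0.5` confidence threshold are assumptions. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_pipeline_request(base_url, pipeline, version, source_uri, kafka_hosts):
    """Build (but do not send) the POST request that starts a pipeline instance."""
    body = {
        "source": source_uri,
        "destination": {"type": "Kafka", "hosts": list(kafka_hosts)},
    }
    return urllib.request.Request(
        url=f"{base_url}/pipelines/{pipeline}/{version}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_confident_vehicle(message, threshold=0.5):
    """Check one result message (JSON text in the slide's metadata format)."""
    record = json.loads(message)
    return record["label"] == "vehicle" and record["confidence"] >= threshold


if __name__ == "__main__":
    # All three addresses below are hypothetical placeholders.
    request = build_pipeline_request(
        "http://localhost:8080", "vehicles", 1,
        "rtsp://camera.example/stream", ["kafka.example:9092"],
    )
    print(request.get_method(), request.full_url)
```

Sending would be a call to `urllib.request.urlopen(request)`; how the service responds is not described in the slides, so response handling is left out.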
  36. 36. Video Analytics Service is part of Open Visual Cloud • The Open Visual Cloud is an open source project that offers a set of pre-defined reference pipelines for various target visual cloud use cases.
  37. 37. Agenda ▪ Video Analytics Usages ▪ Build a Video Analytics Framework for all Intel HWs ▪ FFMPEG Filter Implementation ▪ GStreamer Plugin Implementation ▪ Video Analytics as REST Service ▪ Demo ▪ Resources
  38. 38. Demo • 4-channel face detection • Smart city, edge to cloud
  39. 39. Summary • We presented Intel's end-to-end video analytics pipeline in GStreamer and FFMPEG • It supports multiple Intel HW targets with the same SW pipeline/API • It provides an optimized flow between media and DL inference: zero copy, and inference directly on NV12 images • It provides scalable deployment across edge and cloud • Call to action: try it out and let us know your feedback
  40. 40. Resources ▪ OpenVINO - ▪ MediaSDK - ▪ GStreamer Plugin - ▪ Open Visual Cloud ▪ Smart City sample ▪ Ad Insertion sample ▪ Docker files including FFMPEG Video Analytics Filters