Computer Vision and Media Analytics Creating New Opportunities

Computer Vision and Media Analytics
Creating New Opportunities for Value-Add Connected TV Services
Tony Emerson
Managing Director, Worldwide Media and Cable

Our OTT Vision
• Capitalize on existing content rights
• Unlock new niche markets
• Create a better viewing experience
• Create a closer connection to the audience
• Grow the business

One of the biggest networks in the world
• 30 regions are generally available.
• 6 regions are coming soon.
https://azure.microsoft.com/ja-jp/regions/

[Diagram: Azure service map with three groups: Platform Services, Security & Management, Infrastructure Services]
• Web Apps, Mobile Apps, API Management, API Apps, Logic Apps, Notification Hubs
• HDInsight, Machine Learning, Stream Analytics, Data Factory, Event Hubs, Mobile Engagement
• Active Directory, Multi-Factor Authentication, Automation, Portal, Key Vault, Store / Marketplace
• Hybrid Operations, Backup, StorSimple, Site Recovery, Import/Export
• SQL Database, DocumentDB, Redis Cache, Search, Tables, SQL Data Warehouse
• Azure AD Connect Health, AD Privileged Identity Management, Operational Insights
• Cloud Services, Batch, Remote App, Service Fabric
• Visual Studio, Application Insights, Azure SDK, Team Project
• VM Image Gallery & VM Depot
• Content Delivery Network (CDN), Media Insights, VoD/Live Transcoding, Azure Media Player, Multi DRM, VoD/Live Channel Streaming

Rio Olympics most successful in history

Microsoft Azure and Partners deliver globally

Azure Media Services
Scalable components for building custom media workflows in the cloud
• Cloud Upload & Storage
• Encoding & Media Analytics
• Content Protection
• Live & On Demand Streaming with integrated CDN
• Player Clients
Plus a growing ecosystem of value-add third party partner components

Wide Adoption
Premium video on-demand content, broadcasts & live event streaming, online video platforms for web and mobile, enterprise video management… And more!

What will you do to differentiate your service?
How can you increase the value of content?
Can you make it easier to search for text, faces, logos, images, and specific actions?
How can you pull more data out of the content to enhance discoverability and viewability?
How can you deal more efficiently with legal and regulatory compliance?

Azure Media Analytics – Enhancing Your Content
Make Video and Audio Searchable
Creating a database of rich metadata pulled directly out of the video and audio content itself
Powerful new media processors
• Speech-to-Text
• Facial and Emotion Detection & Facial Redaction
• Motion Detection, Stabilization, and Acceleration
• Object, Character, and Logo Recognition
• Automated Video Summarization

Indexer
• Enables speech-to-text conversion
• Languages supported
  • English & Spanish as GA
  • German, French, Italian, Chinese, Portuguese, Arabic as Preview
• Use cases
  • Deep Search & First-pass captions
• Capable of custom vocabulary adaptation
  • User provides a list of words related to the video to improve speech recognition

AZURE MEDIA INDEXER
TECHNICAL DETAILS
Input: Audio or Video (MP4, WMV, MP3, M4A, AAC, WAV, WMA)
Azure Media Indexer stages: Audio Decoding, Vocabulary Adaptation, Segmentation, Speech Recognition, Caption Alignment
Outputs:
• Closed captions (TTML/WebVTT/SAMI)
• Audio Indexing Blob (AIB) for use with SQL Server and custom IFilter add-on (link)
• Flexible metadata files (keywords, word info)
Face Detection
• Detect faces that appear in your videos
• Track faces as they move around the frame
• Output Metadata with face locations and timestamps
• Age Detection
• Gender Detection
• Facial Recognition

Emotion Recognition
• Recognize the emotion of a person or crowd over time based on the facial expressions in the video
• Designed for real emotions in the wild.
• Identifies emotions based on expressions that psychological research has identified as universal
• A solution for personalizing experiences, analyzing responses to media and products, and crowd analytics
• Recognize: happiness, sadness, surprise, anger, contempt, fear, disgust, neutral
• Use cases – Audience Analytics, Personalization etc.

Video OCR
• Extract typeset words from video content
• Select your own sampling rate to balance performance and quality
• Specify where in the video to look (e.g. bottom third for captions)
• Output describes text with location
Example output:
Text: Who are we?
Location: (200,100,250,50)
Time: 0:45:02
Text: Who are you and who is the person sitting next to you?
Location: (100,250,350,90)
Time: 0:45:02
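
The text, location, and time records shown above can be filtered by screen region, for example to keep only text in the bottom third of the frame where captions usually sit. The following is a rough sketch in Python. It assumes the four location numbers mean (left, top, width, height) in pixels and that the frame is 360 pixels tall; the slide states neither, and the `OcrRegion` type and `in_bottom_third` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class OcrRegion:
    """One detected text region. Field meanings are an assumption."""
    text: str
    left: int
    top: int
    width: int
    height: int
    time: str


def in_bottom_third(region, frame_height):
    """True when the region starts at or below two thirds of the frame height."""
    return region.top >= frame_height * 2 / 3


# The two records from the slide, read as (left, top, width, height).
regions = [
    OcrRegion("Who are we?", 200, 100, 250, 50, "0:45:02"),
    OcrRegion(
        "Who are you and who is the person sitting next to you?",
        100, 250, 350, 90, "0:45:02",
    ),
]

FRAME_HEIGHT = 360  # assumed frame height in pixels, not given in the slide

captions = [r.text for r in regions if in_bottom_third(r, FRAME_HEIGHT)]
print(captions)
```

Under these assumptions only the second record is kept, because its top edge (250) lies below the two-thirds line (240).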

Hyperlapse
• Transforms first-person videos into smooth time-lapses
• Designed for forward-moving camera scenarios (action sports, dash camera)

Video Summarization
• Creates an automatic summary for videos to let people see a preview or snapshot of their video
• Frames are selected based on video quality, diversity, and stability of the footage

Content Moderation
• Detect video content policy violations
• Save time and money spent manually reviewing your content for offensive, illicit and inappropriate material
• Currently supports adult content classification

Indexer Success Story
"As a company dedicated to building intelligent cloud solutions across industries, we’re excited to incorporate Microsoft Azure Media Analytics’ advanced machine learning technology in speech and vision onto our platform."
Ryan Steelberg, President of Veritone Media and Co-Founder of Veritone Inc.
Veritone’s Cognitive Media Platform (CMP) is an open cloud ecosystem of cognitive tools to harness the power of media

GrayMeta™ - The video & metadata experts behind MetaFarm
MetaFarm is a powerful platform that tackles big data and metadata problems, saving businesses time and money and bringing insight to the data that is already there.
• Connect to dispersed and siloed data for the right reasons, enabling easier migration and adoption of Azure and other services
• Extract embedded metadata from any file type across all file systems, databases and data feeds
• Create new metadata leveraging the exponential growth of cognitive, machine learning & AI services powered by Azure and the Cortana Suite
Easy Upload to Azure (Signiant)
Powerful Search & Discovery
Review, Consume, Share and Take Action
Bring Cognitive, AI & Machine Learning to the data in an easy-to-digest way

Azure Consumption
• Storage
• Compute
• Cognitive Services
- Vision
- Speech
- Language
- Knowledge
- Search

SORT, ORGANIZE & ACCESS
“What data do I really have? How many duplicates?”
Find and organize all assets / data across all departments, which also helps prior to data migration.
SEARCH & DISCOVERY ACROSS ENTERPRISE DATA SILOS
“I need the same asset another group has, why do I need to create or source it again?”
Leverage data / content across Linear, VOD and OTT within a broadcaster’s multiple data locations and increase efficiencies across the enterprise while driving cost down.
TIME & COST EFFICIENCIES
“I need more data quicker and the FTE approach doesn’t have the ROI.”
Bring efficiencies to the workplace, reducing the need for manual tagging and labeling of content and reducing the time to access the right data by 10x.

Video Stream Networks S.L. – Copyright © 2016 www.vsn-tv.com
MICROSOFT AZURE
• VSN is a leading End-to-End IT developer company for the Broadcast and the M&E Industries, with over 1000 clients in more than 100 countries.
• VSNEXPLORER provides corporations a secure, always-on media asset management solution that allows companies and users to collaborate with their media archive, optimizing their processes and enhancing their capabilities from any location
MICROSOFT AZURE MEDIA SERVICES
VSNEXPLORER and Azure Media Services
Integration with Speech to Text and Translation
VSNEXPLORER working with Azure Media Services

Celebrity Search
• Input – Entertainment videos
• Output – Search index based on celebrities in videos
• User should be able to search for videos where
  • Celebrity X and Y were sighted together
  • Celebrity X said certain words or phrases
  • …..
Components: Media Analytics, Cognitive Vision API, Azure Search
Face recognition
• Input – Videos of any type (entertainment, surveillance etc.)
• Output – Search index based on list of known faces
• User should be able to search for videos where
  • Person X and Y were sighted together
  • Person X said certain words or phrases
  • …..
Components: Media Analytics, Cognitive Face API, Azure Search
Audio redaction
• Input – Videos of any type
• Output – Videos with keywords redacted
• Useful in the following scenarios
  • Identity protection (security videos)
  • Applying censorship (broadcasting on public channels)
  • …..
Components: Media Indexer, Transcript Filter, Redaction timecodes, AMS Audio Overlays
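
The Transcript Filter step, which turns a transcript into redaction timecodes, can be sketched as follows. This Python example assumes a word-level transcript with start and end times in seconds; the transcript data, the `redaction_timecodes` function, and its `padding` parameter are all hypothetical and stand in for whatever the actual filter does.

```python
# Hypothetical word-level transcript: (word, start_seconds, end_seconds).
TRANSCRIPT = [
    ("my", 0.0, 0.2),
    ("name", 0.2, 0.5),
    ("is", 0.5, 0.6),
    ("alice", 0.6, 1.0),
    ("smith", 1.0, 1.4),
    ("from", 1.4, 1.6),
    ("tokyo", 1.6, 2.0),
]


def redaction_timecodes(transcript, keywords, padding=0.0):
    """Return merged (start, end) spans covering every keyword occurrence."""
    blocked = {keyword.lower() for keyword in keywords}
    spans = []
    for word, start, end in transcript:
        if word.lower() not in blocked:
            continue
        # Optionally widen the span so the overlay fully covers the word.
        start, end = max(0.0, start - padding), end + padding
        if spans and start <= spans[-1][1]:
            # Touching or overlapping spans are merged into one.
            spans[-1] = (spans[-1][0], max(spans[-1][1], end))
        else:
            spans.append((start, end))
    return spans


print(redaction_timecodes(TRANSCRIPT, ["Alice", "Smith"]))
```

Adjacent keywords collapse into a single span, so an audio overlay such as a beep or silence would be applied once per contiguous stretch rather than once per word.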

Video Tagging
• Identifies objects and categories that are within a video frame
• Uses a trained model with over 2000 tags
• Output metadata with video tags by frames

Action Recognition
• Identifies the actions taking place within a sequence of frames
• Starting with 61 categories (46 sports + 15 daily activities)
• Output metadata with action and time stamp

Redaction
• Protect the identities of individuals by blurring the video
• Automatically detect and redact faces
• Tag and blur identifiable information in dynamic settings such as license plates

Computer Vision and Media Analytics
Imagine what you can do with these services.

#azurejp
https://www.facebook.com/dahatake/
https://twitter.com/dahatake/
https://github.com/dahatake/
https://daiyuhatakeyama.wordpress.com/

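• Speech-to-text: extracts spoken text; currently supports 8 languages
• Face & Emotion detection: counts faces and estimates gender, age, and emotion
• Hyperlapse: stabilization and time-lapse
• Video summarization: automatically creates a summary video from highlight scenes
• Motion detection: detects where motion occurred
• Optical Character Recognition (OCR): extracts text from images in the video (sample text: 450 6th St. San Francisco)
• Face Redaction: blurs the faces of specific people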
