The document discusses using computer vision and media analytics technologies to create new opportunities for value-added connected TV services. It describes using these technologies to capitalize on existing content rights, unlock new niche markets, create a better viewing experience, and grow the business. Specific technologies mentioned include speech-to-text, facial detection, emotion recognition, video summarization, object recognition, and face redaction.
2. Our OTT Vision
Capitalize on existing content rights
Unlock new niche markets
Create a better viewing experience
Create a closer connection to the audience
Grow the business
3. One of the biggest networks in the world
• 30 regions generally available
• 6 regions coming soon
https://azure.microsoft.com/ja-jp/regions/
4. Platform Services
Security & Management
Hybrid Operations
Infrastructure Services
• Web Apps, Mobile Apps, API Management, API Apps, Logic Apps, Notification Hubs
• HDInsight, Machine Learning, Stream Analytics, Data Factory, Event Hubs, Mobile Engagement
• Active Directory, Multi-Factor Authentication, Automation, Portal, Key Vault, Store / Marketplace
• Backup, StorSimple, Site Recovery, Import/Export
• SQL Database, DocumentDB, Redis Cache, Search, Tables, SQL Data Warehouse
• Azure AD Connect Health, AD Privileged Identity Management, Operational Insights
• Cloud Services, Batch, RemoteApp, Service Fabric
• Visual Studio, Application Insights, Azure SDK, Team Project
• VM Image Gallery & VM Depot
• Content Delivery Network (CDN)
• Media Insights, VoD/Live Transcoding, Azure Media Player, Multi DRM, VoD/Live Channel, Streaming
8. Plus a growing ecosystem of value-add third-party partner components
Azure Media Services: scalable components for building custom media workflows in the cloud
• Cloud Upload & Storage
• Encoding & Media Analytics
• Content Protection
• Live & On Demand Streaming with integrated CDN
• Player
• Clients
10. What will you do to differentiate your service?
How can you increase the value of content?
Can you make it easier to search for text, faces, logos, images, and specific actions?
How can you pull more data out of the content to enhance discoverability and viewability?
How can you deal more efficiently with legal and regulatory compliance?
11. Make Video and Audio Searchable
Creating a database of rich metadata pulled directly out of the video
and audio content itself
Powerful new media processors
Speech-to-Text
Facial and Emotion Detection & Facial Redaction
Motion Detection, Stabilization, and Acceleration
Object, Character, and Logo Recognition
Automated Video Summarization
Azure Media Analytics – Enhancing Your Content
12. • Enables speech-to-text conversion
• Languages supported
• English & Spanish as GA
• German, French, Italian, Chinese, Portuguese, Arabic as Preview
• Use cases
• Deep Search & First-pass captions
• Capable of custom vocabulary adaptation
• User provides list of words related to video to improve speech recognition
Indexer
13. AZURE MEDIA INDEXER
TECHNICAL DETAILS
Input: audio or video (MP4, WMV, MP3, M4A, AAC, WAV, WMA)
Azure Media Indexer stages: Audio Decoding, Vocabulary Adaptation, Segmentation, Speech Recognition, Caption Alignment
Outputs:
• Closed captions (TTML/WebVTT/SAMI)
• Audio Indexing Blob (AIB) for use with SQL Server and a custom IFilter add-on
• Flexible metadata files (keywords, word info)
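The sketch below shows what the closed-caption output looks like. It turns timed speech segments into a WebVTT caption file, assuming the recognizer output has already been reduced to (start, end, text) segments. The `Segment` class and the helper names are hypothetical and do not reflect the Indexer's actual output schema. Only the WebVTT framing follows the format itself: the `WEBVTT` header, `-->` cue timings, and blank-line cue separators.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One recognized speech segment (hypothetical shape, not the Indexer's schema)."""
    start: float  # seconds
    end: float    # seconds
    text: str


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp HH:MM:SS.mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def to_webvtt(segments: list[Segment]) -> str:
    """Render timed segments as a minimal WebVTT document."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        lines.append(f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}")
        lines.append(seg.text)
        lines.append("")  # a blank line terminates each cue
    return "\n".join(lines)


segments = [Segment(0.0, 2.5, "Welcome to the show."), Segment(2.5, 5.04, "Let's get started.")]
print(to_webvtt(segments))
```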
14. • Detect faces that appear in your videos
• Track faces as they move around the frame
• Output metadata with face locations and timestamps
• Age Detection
• Gender Detection
• Facial Recognition
Face Detection
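Per-frame face metadata (locations and timestamps) can be post-processed into "when was this face on screen" intervals. This is a minimal sketch assuming each detection is a dict with a tracked `face_id` and a `time` in seconds. Both keys and the `max_gap` merge tolerance are hypothetical choices, not the detector's documented output.

```python
from collections import defaultdict


def face_appearances(detections, max_gap=1.0):
    """Group per-frame face detections into per-face (start, end) intervals.

    detections: hypothetical list of dicts with 'face_id' and 'time' (seconds).
    Sightings of the same face closer than max_gap seconds merge into one interval.
    """
    times = defaultdict(list)
    for d in detections:
        times[d["face_id"]].append(d["time"])
    result = {}
    for face_id, ts in times.items():
        ts.sort()
        intervals = [[ts[0], ts[0]]]
        for t in ts[1:]:
            if t - intervals[-1][1] <= max_gap:
                intervals[-1][1] = t          # face is still on screen: extend
            else:
                intervals.append([t, t])      # face re-enters after a gap
        result[face_id] = [tuple(iv) for iv in intervals]
    return result
```

An interval list like this is a natural input for screen-time reports or for the search scenarios later in the deck.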
15. • Recognize the emotion of a person or crowd over time based on the facial expressions in the video
• Designed for real emotions in the wild
• Identifies emotions based on expressions that psychological research has identified as universal
• A solution for personalizing experiences, analyzing responses to media and products, and crowd analytics
• Recognizes: happiness, sadness, surprise, anger, contempt, fear, disgust, neutral
• Use cases: audience analytics, personalization, etc.
Emotion Recognition
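"Emotion over time" for audience analytics usually means aggregating noisy per-frame scores into windows. This sketch assumes per-frame output of the form (time, {emotion: score}) using the eight emotions listed above. It averages the scores over fixed windows and reports the dominant emotion for each window. The input shape and the 5-second window are hypothetical.

```python
from collections import defaultdict

EMOTIONS = ("happiness", "sadness", "surprise", "anger", "contempt", "fear", "disgust", "neutral")


def dominant_emotions(frames, window=5.0):
    """Average per-frame emotion scores over fixed windows and return the top emotion.

    frames: hypothetical list of (time_seconds, {emotion: score}) pairs.
    Returns {window_start_seconds: emotion_name}.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for t, scores in frames:
        start = int(t // window) * window  # which window this frame falls into
        counts[start] += 1
        for emotion in EMOTIONS:
            sums[start][emotion] += scores.get(emotion, 0.0)
    return {
        start: max(EMOTIONS, key=lambda e: sums[start][e] / counts[start])
        for start in sorted(sums)
    }
```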
16. • Extract typeset words from video content
• Select your own sampling rate to balance performance and quality
• Specify where in the video to look (e.g., the bottom third for captions)
• Output describes text with location
Video OCR
Text: Who are we?
Location: (200,100,250,50)
Time: 0:45:02
Text: Who are you and who is the person sitting next to you?
Location: (100,250,350,90)
Time: 0:45:02
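Restricting OCR to a screen region can also be done after the fact on the output records. This sketch reuses the two sample records above and assumes the location tuple means (left, top, width, height) in pixels; the slide does not define it. The 360-pixel frame height is also a hypothetical value. The processor's own region setting would normally do this filtering up front.

```python
from typing import NamedTuple


class OcrResult(NamedTuple):
    text: str
    location: tuple[int, int, int, int]  # assumed (left, top, width, height) in pixels
    time: str


def in_bottom_third(result: OcrResult, frame_height: int) -> bool:
    """True if the text box lies entirely inside the bottom third of the frame."""
    _, top, _, height = result.location
    return top >= frame_height * 2 / 3 and top + height <= frame_height


# The two records from the slide; frame_height=360 is a hypothetical value.
results = [
    OcrResult("Who are we?", (200, 100, 250, 50), "0:45:02"),
    OcrResult("Who are you and who is the person sitting next to you?", (100, 250, 350, 90), "0:45:02"),
]
captions = [r for r in results if in_bottom_third(r, frame_height=360)]
```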
17. • Transforms first-person videos into smooth time-lapses
• Designed for forward-moving camera scenarios (action sports, dash camera)
Hyperlapse
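A time-lapse that simply keeps every k-th frame amplifies camera shake. One way to choose frames instead is dynamic programming over the frame sequence. The sketch below is a simplified, swapped-in illustration of that idea, not the Hyperlapse processor's implementation. `cost[i][j]` is a hypothetical input scoring how badly frames i and j line up; lower means a smoother cut. A quadratic penalty keeps each jump near the target speed-up.

```python
def select_frames(cost, speedup, max_skip, smooth_weight=1.0):
    """Choose frame indices from 0 to n-1 that minimize total transition cost.

    cost: hypothetical n x n matrix; cost[i][j] is the penalty for cutting from
    frame i straight to frame j. Each jump j - i (1..max_skip) also pays
    smooth_weight * (j - i - speedup) ** 2 to stay near the target speed-up.
    """
    n = len(cost)
    best = [float("inf")] * n   # best[j]: lowest total cost of a path ending at j
    prev = [-1] * n             # back-pointers for path reconstruction
    best[0] = 0.0
    for j in range(1, n):
        for i in range(max(0, j - max_skip), j):
            c = best[i] + cost[i][j] + smooth_weight * (j - i - speedup) ** 2
            if c < best[j]:
                best[j], prev[j] = c, i
    path, k = [], n - 1
    while k != -1:
        path.append(k)
        k = prev[k]
    return path[::-1]
```

With uniform costs this reduces to even k-frame skipping. A high cost between two frames makes the path route around that cut.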
18. • Creates an automatic summary of a video so people can see a preview or snapshot of it
• Frames are selected based on the video quality, diversity, and stability of the footage
Video Summarization
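Balancing "quality" against "diversity" can be illustrated with a simple greedy selection. The sketch below is not the summarization processor's algorithm. It assumes each candidate frame already has a quality score (for example sharpness or stability) and a feature vector (for example a colour histogram); both are hypothetical inputs. It picks the best frames while skipping near-duplicates.

```python
import math


def summarize(frames, k, min_distance):
    """Greedily pick up to k high-quality frames that are mutually dissimilar.

    frames: hypothetical (frame_index, quality_score, feature_vector) tuples.
    A frame is skipped if its features are closer than min_distance to any
    already chosen frame. Returns chosen indices in playback order.
    """
    chosen = []
    for idx, quality, features in sorted(frames, key=lambda f: f[1], reverse=True):
        if len(chosen) == k:
            break
        if all(math.dist(features, other) >= min_distance for _, _, other in chosen):
            chosen.append((idx, quality, features))
    return sorted(idx for idx, _, _ in chosen)
```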
19. • Detect video content policy violations
• Save time and money spent manually reviewing your content for
offensive, illicit and inappropriate material
• Currently supports adult content classification
Content Moderation
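Moderation saves time when it narrows manual review to specific time ranges. This sketch assumes a per-frame adult-content score in [0, 1], sampled every `frame_interval` seconds. The score shape, the 0.8 threshold, and the function name are hypothetical. Adjacent flagged frames are merged into review segments.

```python
def flag_segments(scores, threshold=0.8, frame_interval=1.0):
    """Turn per-frame classifier scores into (start, end) ranges for manual review.

    scores: hypothetical list of scores in [0, 1], one per sampled frame.
    Consecutive frames at or above threshold are merged into one range.
    """
    segments = []
    for i, score in enumerate(scores):
        if score < threshold:
            continue
        start, end = i * frame_interval, (i + 1) * frame_interval
        if segments and segments[-1][1] == start:
            segments[-1] = (segments[-1][0], end)  # extend the open range
        else:
            segments.append((start, end))
    return segments
```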
20. Indexer Success Story
“As a company dedicated to building intelligent cloud solutions across industries, we’re excited to incorporate Microsoft Azure Media Analytics’ advanced machine learning technology in speech and vision onto our platform.”
Ryan Steelberg, President of Veritone Media and Co-Founder of Veritone Inc.
Veritone’s Cognitive Media Platform (CMP) is an open cloud ecosystem of cognitive tools to harness the power of media.
21. GrayMeta™ - The video & metadata experts behind MetaFarm
MetaFarm is a powerful platform that tackles big data and metadata problems, saving businesses time and money and bringing insight to the data that is already there.
• Connect to dispersed and siloed data for the right reasons, enabling easier migration and adoption of Azure and other services
• Extract embedded metadata from any file type across all file systems, databases and data feeds
• Create new metadata leveraging the exponential growth of cognitive, machine learning & AI services powered by Azure and the Cortana Suite
Easy Upload to Azure (Signiant)
Powerful Search & Discovery
Review, Consume, Share and Take Action
Bring Cognitive, AI & Machine Learning to the data in an easy-to-digest way
23. SORT, ORGANIZE & ACCESS
“What data do I really have? How many duplicates?”
Find and organize all assets / data across all departments, which also helps prior to data migration.
SEARCH & DISCOVERY ACROSS ENTERPRISE DATA SILOS
“I need the same asset another group has, why do I need to create or source it again?”
Leverage data / content across Linear, VOD and OTT within a broadcaster’s multiple data locations and increase efficiencies across the enterprise while driving cost down.
TIME & COST EFFICIENCIES
“I need more data quicker and the FTE approach doesn’t have the ROI.”
Bring efficiencies to the workplace, reducing the need for manual tagging and labeling of content and reducing the time to access the right data by 10x.
25. Celebrity Search
• Input – Entertainment videos
• Output – Search index based on celebrities in videos
• User should be able to search for videos where
  • Celebrity X and Y were sighted together
  • Celebrity X said certain words or phrases
  • …
Components: Media Analytics, Cognitive Vision API, Azure Search
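The "X and Y sighted together" query reduces to an inverted index from person to (video, time). This sketch assumes recognized sightings have already been produced as (video_id, timestamp, name) tuples, for example by a recognition step run over face-detection output. The tuple shape and function names are hypothetical, and the sketch does not use the Azure Search API. Timestamps are assumed to be aligned sample times, so exact matches count as "together".

```python
from collections import defaultdict


def build_index(sightings):
    """Map name -> {video_id: set of sampled timestamps}.

    sightings: hypothetical (video_id, timestamp_seconds, name) tuples.
    """
    index = defaultdict(lambda: defaultdict(set))
    for video_id, t, name in sightings:
        index[name][video_id].add(t)
    return index


def seen_together(index, a, b):
    """Videos in which a and b appear at the same sampled timestamp."""
    shared = index[a].keys() & index[b].keys()
    return sorted(v for v in shared if index[a][v] & index[b][v])
```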
26. Face Recognition
• Input – Videos of any type (entertainment, surveillance, etc.)
• Output – Search index based on a list of known faces
• User should be able to search for videos where
  • Person X and Y were sighted together
  • Person X said certain words or phrases
  • …
Components: Media Analytics, Cognitive Face API, Azure Search
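The "Person X said certain words" query needs two signals to be joined: when X is on screen (from face recognition) and when the words occur (from the transcript). This is a minimal sketch under loudly stated assumptions:
• Appearances are (start, end) seconds.
• The transcript is (time, word) pairs in order.
• Being on screen is used as a rough proxy for being the speaker; real speaker attribution is harder.

All names here are hypothetical.

```python
def said_while_on_screen(appearances, words, phrase):
    """Return start times where `phrase` is spoken while the person is on screen.

    appearances: hypothetical list of (start, end) seconds when the person is visible.
    words: hypothetical transcript as (time_seconds, word) pairs in spoken order.
    Matching is case-insensitive and ignores punctuation handling for brevity.
    """
    target = phrase.lower().split()
    tokens = [w.lower() for _, w in words]
    hits = []
    for i in range(len(tokens) - len(target) + 1):
        if tokens[i:i + len(target)] == target:
            t = words[i][0]
            if any(start <= t <= end for start, end in appearances):
                hits.append(t)
    return hits
```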
27. Audio Redaction
• Input – Videos of any type
• Output – Videos with keywords redacted
• Useful in the following scenarios
  • Identity protection (security videos)
  • Applying censorship (broadcasting on public channels)
  • …
Components: Media Indexer, Transcript, Filter, Redaction timecodes, AMS Audio Overlays
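The "Transcript → Filter → Redaction timecodes" step of this pipeline can be sketched as follows. It assumes a word-level transcript of (start, end, word) tuples; that shape is hypothetical and not the Indexer's exact output format. It emits merged time ranges to mute, which would then be handed to the audio-overlay step. The sketch does not call any AMS API, and the padding value is an illustrative choice.

```python
def redaction_timecodes(words, banned, pad=0.1):
    """Compute merged (start, end) ranges to mute from a word-level transcript.

    words: hypothetical (start_seconds, end_seconds, word) tuples.
    banned: keywords to redact (case-insensitive, trailing punctuation ignored).
    pad: seconds added on each side so partial syllables are also covered.
    """
    banned = {w.lower() for w in banned}
    ranges = sorted(
        (max(0.0, start - pad), end + pad)
        for start, end, word in words
        if word.lower().strip(".,!?") in banned
    )
    merged = []
    for start, end in ranges:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))  # overlapping: merge
        else:
            merged.append((start, end))
    return merged
```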
28. • Identifies objects and categories within a video frame
• Uses a trained model with over 2000 tags
• Outputs metadata with video tags by frame
Video Tagging
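Per-frame tags are usually rolled up into a short list of video-level tags for search. This sketch assumes each sampled frame yields a {tag: confidence} dict; that shape and both thresholds are hypothetical. A tag is kept if it is confidently detected in enough of the frames.

```python
from collections import Counter


def video_tags(frame_tags, min_confidence=0.5, min_fraction=0.3):
    """Summarize per-frame tags into sorted video-level tags.

    frame_tags: hypothetical list (one entry per sampled frame) of {tag: confidence}.
    A tag is kept if it reaches min_confidence in at least min_fraction of frames.
    """
    counts = Counter(
        tag
        for tags in frame_tags
        for tag, confidence in tags.items()
        if confidence >= min_confidence
    )
    n = len(frame_tags)
    return sorted(tag for tag, c in counts.items() if n and c / n >= min_fraction)
```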
29. • Identifies the actions taking place within a sequence of frames
• Starting with 61 categories (46 sports + 15 daily activities)
• Outputs metadata with action and timestamp
Action Recognition
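Action metadata with timestamps can answer questions such as "what does this clip mostly show?". This sketch assumes the output has been reduced to (start, end, action) segments; the shape and function name are hypothetical. It totals the time spent on each action, longest first.

```python
from collections import defaultdict


def action_durations(segments):
    """Total seconds spent on each recognized action, longest first.

    segments: hypothetical (start_seconds, end_seconds, action_label) tuples.
    """
    totals = defaultdict(float)
    for start, end, action in segments:
        totals[action] += end - start
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```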
30. • Protect the identities of individuals by blurring the video
• Automatically detect and redact faces
• Tag and blur identifiable information in dynamic settings, such as license plates
Redaction
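Once a face or license plate has a bounding box, redaction is an image operation on that region. The slide describes blurring; this sketch swaps in pixelation (block averaging) because it is short and easy to verify. It works on a grayscale frame stored as a list of rows, and it assumes the box is (left, top, width, height) as produced by some hypothetical detector. The function name is also hypothetical.

```python
def pixelate(frame, box, block=4):
    """Return a copy of a grayscale frame with the box region pixelated.

    frame: 2-D list of pixel intensities (list of rows).
    box: hypothetical detector output (left, top, width, height), clipped to the frame.
    Each block x block tile inside the box is replaced by its integer average.
    """
    out = [row[:] for row in frame]
    left, top, width, height = box
    bottom = min(top + height, len(frame))
    right = min(left + width, len(frame[0]))
    for y0 in range(top, bottom, block):
        for x0 in range(left, right, block):
            ys = range(y0, min(y0 + block, bottom))
            xs = range(x0, min(x0 + block, right))
            avg = sum(frame[y][x] for y in ys for x in xs) // (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = avg
    return out
```

Larger blocks obscure more detail. In a video pipeline this would run per frame, using the tracked box for that frame.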
31. Computer Vision and Media Analytics
Imagine what you can do with these services.