Video AI for Media and Entertainment Industry

Albert Y. C. Chen, Ph.D.
Vice President, R&D
Viscovery

Albert Y. C. Chen, Ph.D.
陳彥呈博士
• Experience
2017-present: Vice President of R&D @ Viscovery
2016-2017: Chief Scientist @ Viscovery
2015: Principal Scientist @ Nervve Technologies
2013-2014: Computer Vision Scientist @ Tandent
2011-2012: GE Global Research
• Education
Ph.D. in Computer Science, SUNY-Buffalo
M.S. in Computer Science, NTNU
B.S. in Computer Science, NTHU

Viscovery = Video Discovery
Leading provider of Video AI analytic products
[Timeline, 2013 to 2017, with milestones: Optical Character Recognition, Offline Recognition, Product Recognition, Video Content related Advertisements, Wearable Devices, Video Content Discovery & Interaction]

Current AI does not “solve it all”
[Diagram: AI industry stack]
• Application layer (solutions, platforms): apps built on general-purpose and app-specific platforms; vertical AI startups in agriculture, manufacturing, medical, finance, retail, and transportation
• Technology layer (libraries, modules): AI/DNN libraries
• Infrastructure layer (data, machine computing power): data accumulation via open API; hardware companies
E.g., 1: Google, Amazon, FB, 2: IBM, 3: Walmart, 5: NVidia

Vertical AI
Solving industry-specific problems by combining
AI and Subject Matter Expertise.
• Full Stack Products
• Subject Matter Expertise
• Proprietary Data
• AI delivers core value
(Bradford Cross, 2017/06/14)

Media & Entertainment
Industry’s challenge
• Internet Era: make content free and maximize traffic; is ad revenue waiting at the end of the rainbow?
• It worked for nearly 20 years, with Google and Facebook being the only beneficiaries; they control 75% of digital ad revenue and 99% of future growth.
• Is this business model still working? Does it work for others? The latest unicorns from Silicon Valley suggest otherwise.

Content Farms, maximizing traffic,
killing the Internet along the way.

The NY Times is saying no. WSJ and
many others are following.
Source: https://www.nytimes.com/projects/2020-report/

People are willing to pay for
good content

The curveball: App Stores
and News Syndicators!
• News Republic (acquired for 57M USD, Aug 2016)
• 12.5 million daily active users
• 60k USD annual revenue
• 今日頭條 (Toutiao, toutiao.com)
• 80 million daily active users.
• 1B USD annual revenue.

Pay source, or pay platform?
• Platform:
• More focus, less distraction: news focus on
content instead of customer service, software
development, etc.
• Potential Problem:
• Facebook and Google control 75% of all traffic
and 99% of expected future growth?

Netflix
• Netflix spends USD 250M yearly on personalization and content recommendation.
• 104M subscribers worldwide; 52M in the US (75% market penetration, #1 in the US; YouTube is #2 at 53%)
• Netflix subscribers watch on 19 days per month, for 28 hours/month (#2, behind Dish’s 47 hours/month)

Netflix annual revenue
(2002-2016)
https://www.statista.com/statistics/272545/annual-revenue-of-netflix/

Netflix net income
(2000-2016)
https://www.statista.com/statistics/272561/netflix-net-income/

People are willing to pay for
good content and good service.

The evolution of methods for
monetizing text/video content
[Diagram: timeline running from 2000 through 2005 and 2010 to now]
• Struggling traditional media; free content with ad revenue; subscription revenue
• Do nothing? Sitting duck.
• Improve ad revenue? Ad tech; video content-related ads
• Own platform, or shared platform with licensed content? Tailored recommendations (user and video content related) to improve UX and stickiness
• Video data mining

If we already have such precise
indexing of video content
(Jay Chao singing A, dancing B, wearing C, with items D, in front of E, at time F)?
• We will disrupt:
• advertisement
• e-commerce
• online video platform ecosystem
• screenwriting, film production and film editing
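
One way to picture such an index is as a list of time-stamped tag records that can be filtered by who, what, and where. The sketch below is only an illustration under stated assumptions, not Viscovery's actual schema: the `Segment` record, its field names, the `find_segments` helper, and the sample tags are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One indexed span of video; all field names are hypothetical."""
    start_s: float                             # segment start, in seconds
    end_s: float                               # segment end, in seconds
    tags: dict = field(default_factory=dict)   # tag type -> tag value


def find_segments(index, **wanted):
    """Return segments whose tags match every requested key/value pair."""
    return [
        seg for seg in index
        if all(seg.tags.get(key) == value for key, value in wanted.items())
    ]


# Made-up sample index for a single video.
index = [
    Segment(0.0, 12.5, {"person": "singer_1", "action": "singing", "scene": "stage"}),
    Segment(12.5, 30.0, {"person": "singer_1", "action": "dancing", "scene": "stage"}),
    Segment(30.0, 41.0, {"person": "host_1", "action": "talking", "scene": "studio"}),
]

hits = find_segments(index, person="singer_1", action="dancing")
for seg in hits:
    print(f"{seg.start_s:.1f}s-{seg.end_s:.1f}s: {seg.tags}")
```

With records like these, an advertiser, a shop, or an editor could each query the same index with different tag combinations.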

Video content-related
advertisements
Food Delivery Service Ad:
Previous moment: dining scene; insert food delivery service ad; next moment: dining scene
Ad copy: “饿了吗？快点饿了么！” (“Hungry? Order on Ele.me now!”)
Restaurant Ad:
Previous moment: dining scene; insert KFC ad; next second: dining scene
Ad copy: “炸鸡红包快来抢！” (“Grab your fried chicken red envelope now!”)

Video content-related advertisements
Automobile Ad:
Previous moment: driving scene; insert automobile ad; next moment: driving scene
Consumer Electronics Ad:
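
The placement rule in these examples (an ad goes between two moments that share a scene type relevant to it) can be sketched in a few lines. This is a simplified illustration, not the production logic: the `AD_FOR_SCENE` mapping, the `find_ad_slots` helper, and the sample timeline are hypothetical.

```python
# Hypothetical mapping from a recognized scene tag to an ad category.
AD_FOR_SCENE = {
    "dining": "food delivery",
    "driving": "automobile",
}


def find_ad_slots(scenes):
    """Find cut points where the scenes before and after share a tag with a matching ad.

    `scenes` is a list of (start_s, end_s, scene_tag) tuples sorted by time.
    Returns a list of (insert_time_s, ad_category) tuples.
    """
    slots = []
    for prev, nxt in zip(scenes, scenes[1:]):
        prev_tag, next_tag = prev[2], nxt[2]
        if prev_tag == next_tag and prev_tag in AD_FOR_SCENE:
            # Insert at the boundary, i.e. where the previous scene ends.
            slots.append((prev[1], AD_FOR_SCENE[prev_tag]))
    return slots


timeline = [
    (0, 40, "office"),
    (40, 95, "dining"),
    (95, 130, "dining"),
    (130, 180, "driving"),
    (180, 220, "driving"),
]
slots = find_ad_slots(timeline)
print(slots)
```

Requiring the same scene tag on both sides of the cut is one simple way to keep the ad from interrupting a change of context.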

interactive shopping

Recommendations

Video-content insights
(for producers, writers, editors)
Viscovery’s video insight publication on “Ode to Joy 2”

Mining Video Content with
Computer Vision
• 85% of data are unstructured, e.g., videos.
• Previously, videos needed manual tagging before their content could be indexed and further utilized.
• Computer Vision is the AI subfield that focuses on recognizing and understanding visual content.

What algorithms do we need?
[Diagram: Face, Image scene, Object, Text, Audio, Motion, Semantics]

Where are we now?
• Face
• Object
• Scene
• Logos
• Text
• Audio
• Motion
• Semantics

Where are we now?
Face Recognition
• 1 to 1: 99%+
• 1 to 100: 90%
• 1 to 10,000: 50%-70%.
• 1 to 1M: 30%.
LFW dataset, common FN↑, FP↓

Where are we now?
Image Scene Classification
• MIT Places 365 dataset.
• top-5 accuracy rates >85%.
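
For readers unfamiliar with the metric: a prediction counts as a top-5 hit when the true class is among the five highest-scoring classes. The sketch below shows the computation on made-up scores; the `top_k_accuracy` helper and the toy data are illustrative and are not taken from the Places 365 evaluation code.

```python
def top_k_accuracy(scores, labels, k=5):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    `scores` is a list of per-class score lists; `labels` holds true class indices.
    """
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy scores over 6 classes for 3 images (made-up numbers).
scores = [
    [0.05, 0.10, 0.40, 0.20, 0.15, 0.10],   # true class 2 is ranked first
    [0.30, 0.25, 0.20, 0.15, 0.08, 0.02],   # true class 5 is ranked last
    [0.10, 0.35, 0.05, 0.30, 0.15, 0.05],   # true class 3 is ranked second
]
labels = [2, 5, 3]

top1 = top_k_accuracy(scores, labels, k=1)
top5 = top_k_accuracy(scores, labels, k=5)
print(f"top-1: {top1:.2f}, top-5: {top5:.2f}")
```

Top-5 is always at least as high as top-1, which is worth keeping in mind when comparing headline numbers.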

Where are we now?
Object Detection & Classification
• ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
• 1000+ classes, 1.2M images.
[Chart: classification error and classification+localization error by year, 11 to 14 on the x-axis; y-axis from 0 to 0.5]

Putting things together is not
trivial and often very messy.
Classical Workflow:
1. Data collection
2. Feature Extraction
3. Dimension Reduction
4. Classifier (re)Design
5. Classifier Verification
6. Deploy
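
The classical workflow above can be sketched end to end on synthetic data. This is a toy illustration, not a recommended pipeline: the data generator, the three hand-crafted features, the single-feature selection (a crude stand-in for dimension reduction), and the nearest-centroid classifier are all assumptions chosen to keep the example dependency-free.

```python
import random
import statistics

random.seed(0)


def collect(n_per_class, length=50):
    """Step 1: synthetic 'raw signals'; class 1 is shifted upward by 1.0."""
    data = []
    for label in (0, 1):
        for _ in range(n_per_class):
            signal = [random.gauss(label * 1.0, 1.0) for _ in range(length)]
            data.append((signal, label))
    random.shuffle(data)
    return data


def extract_features(signal):
    """Step 2: hand-crafted features (mean, standard deviation, range)."""
    return [statistics.mean(signal), statistics.pstdev(signal), max(signal) - min(signal)]


def select_feature(rows, labels):
    """Step 3: keep the single feature whose class means are furthest apart."""
    def gap(j):
        m0 = statistics.mean([r[j] for r, y in zip(rows, labels) if y == 0])
        m1 = statistics.mean([r[j] for r, y in zip(rows, labels) if y == 1])
        spread = statistics.pstdev([r[j] for r in rows]) or 1.0
        return abs(m0 - m1) / spread
    return max(range(len(rows[0])), key=gap)


def train_centroids(values, labels):
    """Step 4: nearest-centroid classifier on the selected feature."""
    return {
        y: statistics.mean([v for v, lab in zip(values, labels) if lab == y])
        for y in (0, 1)
    }


def predict(centroids, value):
    """Assign the class whose centroid is closest to the feature value."""
    return min(centroids, key=lambda y: abs(value - centroids[y]))


# Steps 1-3 use the training split only.
data = collect(n_per_class=100)
train, test = data[:150], data[150:]
train_rows = [extract_features(s) for s, _ in train]
train_labels = [y for _, y in train]
j = select_feature(train_rows, train_labels)

# Step 4: fit. Step 5: verify on held-out data before any deployment (step 6).
centroids = train_centroids([r[j] for r in train_rows], train_labels)
correct = sum(predict(centroids, extract_features(s)[j]) == y for s, y in test)
accuracy = correct / len(test)
print(f"selected feature index: {j}, held-out accuracy: {accuracy:.2f}")
```

Each step is small and inspectable, which is the main contrast with throwing raw data at a large network.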
Modern Brute-force workflow
1. Data collection
2. Throw everything into a Deep Neural Network
3. Mommy, why doesn’t it work ???

Classical Problem #1:
Curse of Dimensionality
[Figure: the word “sit” written in several languages (坐, sit, 앉다, sentarse, and others)]
• Number of Variables vs Number of Samples
Q. Who would make such naive mistakes?
A. Many “newbies” repeatedly do so.

Example 1-1:
illegal parking detection
Legal parking samples ×100; illegal parking samples ×100
Let’s train a 150-layer ResNet!!!
What could possibly go wrong?

Example 1-1:
illegal parking detection
• Data: try cleaner data
• Feature: fine-tune a pre-trained model; don’t train from scratch
• Classifier overfitting: beware of statistical coincidences.

Example 1-2: Smart Photo
Album with Google Cloud Vision

Example 1-2: Smart Photo
Album with Google Cloud Vision
No distance measure is effective across thousands,
if not millions, of dimensions (tags); the similarity between two photos would be
approximately zero most of the time.
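
A small simulation shows why tag overlap becomes useless at this scale. It is a rough sketch under strong assumptions: the vocabulary size, the number of tags per photo, and uniformly random tags are all made up for illustration, and no real tagging service is called.

```python
import random

random.seed(0)

VOCAB_SIZE = 10_000   # assumed number of distinct tags the tagger can emit
TAGS_PER_PHOTO = 5    # assumed number of tags returned per photo
N_PHOTOS = 200


def jaccard(a, b):
    """Jaccard similarity between two tag sets: |a & b| / |a | b|."""
    return len(a & b) / len(a | b)


# Each photo gets a random set of tags. Real tags are not uniform, so this is
# only a rough illustration of how sparse the overlap becomes.
photos = [set(random.sample(range(VOCAB_SIZE), TAGS_PER_PHOTO)) for _ in range(N_PHOTOS)]

pairs = [(i, j) for i in range(N_PHOTOS) for j in range(i + 1, N_PHOTOS)]
sims = [jaccard(photos[i], photos[j]) for i, j in pairs]
zero_fraction = sum(s == 0 for s in sims) / len(sims)

print(f"pairs compared: {len(sims)}")
print(f"fraction with zero similarity: {zero_fraction:.4f}")
```

When nearly every pair scores exactly zero, the measure cannot rank photos against each other, which is the practical meaning of the curse of dimensionality here.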

Classical Problem #2:
Overfitting Data
• Make sure your deep learning algorithm is
learning better features for data, not overfitting
the data with complex classifiers.
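
The gap between fitting and learning is easy to reproduce. In the sketch below the labels are pure noise, so there is nothing to learn, yet a flexible classifier (1-nearest-neighbor, chosen here only because it memorizes perfectly) scores 100% on its own training data. The sample counts and dimensions are arbitrary assumptions.

```python
import random

random.seed(0)

N_TRAIN, N_TEST, N_DIMS = 200, 200, 50


def random_sample():
    """Features and label are independent noise, so nothing real can be learned."""
    features = [random.gauss(0.0, 1.0) for _ in range(N_DIMS)]
    return features, random.randint(0, 1)


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_1nn(train, features):
    """1-nearest-neighbor: copy the label of the closest training sample."""
    return min(train, key=lambda item: sq_dist(item[0], features))[1]


def accuracy(train, samples):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(predict_1nn(train, x) == y for x, y in samples) / len(samples)


train = [random_sample() for _ in range(N_TRAIN)]
test = [random_sample() for _ in range(N_TEST)]

train_acc = accuracy(train, train)   # each point is its own nearest neighbor
test_acc = accuracy(train, test)     # expected to hover around chance (0.5)
print(f"train accuracy: {train_acc:.2f}, held-out accuracy: {test_acc:.2f}")
```

A perfect training score paired with chance-level held-out accuracy is the signature to look for before trusting any model trained on a few hundred samples.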

Luckily, we’re in AI startup boom!
(BCG AI Report, 2016/10)
[Diagram: the same AI industry stack shown earlier, with application, technology, and infrastructure layers and vertical AI startups at the application layer]
E.g., 1: Google, Amazon, FB, 2: IBM, 3: Walmart, 5: NVidia

Vertical AI Startups
Solving industry-specific problems by combining
AI and Subject Matter Expertise.
• Full Stack Products
• Subject Matter Expertise
• Proprietary Data
• AI delivers core value
(Bradford Cross, 2017/06/14)

Examples of Vertical AI
beating General Purpose AI

TOP 5 TAGS COMPARISON (“FIRST LOVE” drama series scene)
TAG          AD PLACEMENT VALUE    TAG                      AD PLACEMENT VALUE
Person       Low                   Coulee Nazha (actress)   High
Anime        Low                   Sean Sun (actor)         High
Screenshot   Low                   Back of smartphone       High
Cartoon      Low                   Female                   Medium
Adult        Medium                Young                    Medium

Competitive Analysis: Baidu vs. Viscovery
TOP 5 TAGS COMPARISON
TAG (Man’s Face)   AD PLACEMENT VALUE    TAG            AD PLACEMENT VALUE
Age: 32            Medium                Necklace       High
Asian              Medium                Baseball cap   High
Male               Medium                Bracelet       High
Not smiling        Low (inaccurate)      Ziwen Wang     High

Use AI to turn unstructured
video data into a gold mine!
[Figure: smartphone ad placement along three 0-60 min video timelines]
• Using only physical tags for recommendation: CTR 0.2%
• Physical plus abstract and emotional tags: CTR 0.9%
• Physical, abstract and emotional tags plus feedback: CTR 2.0%
Tags shown on the timelines (translated from Chinese): apparel, car, spokesperson, party, mobile phone, home, travel, vitality, work, chatting, studying, living room, joy.
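
The three CTR figures on this slide (0.2%, 0.9%, 2.0%) are easier to compare as relative lift. The arithmetic below assumes the figures pair with the three tagging strategies in the order they appear on the slide; the impression count is a hypothetical number used only to turn rates into clicks.

```python
# CTR figures from the slide, in basis points (1 bp = 0.01%) to keep arithmetic exact.
# The pairing of figures to strategies is assumed from the order on the slide.
ctr_bp = {
    "physical tags only": 20,                                 # 0.2%
    "physical + abstract + emotional tags": 90,               # 0.9%
    "physical + abstract + emotional tags + feedback": 200,   # 2.0%
}

IMPRESSIONS = 1_000_000  # hypothetical campaign size

baseline = ctr_bp["physical tags only"]
clicks = {name: IMPRESSIONS * bp // 10_000 for name, bp in ctr_bp.items()}
lifts = {name: bp / baseline for name, bp in ctr_bp.items()}

for name in ctr_bp:
    print(f"{name}: {clicks[name]} clicks, {lifts[name]:.1f}x baseline")
```

Under these assumptions, adding abstract and emotional tags gives 4.5 times the baseline click-through, and adding feedback gives 10 times.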

Thank you!
albert@viscovery.com
