SlideShare a Scribd company logo
1 of 216
Computer Vision Crash Course
Jia-Bin Huang
Electrical and Computer Engineering
Virginia Tech
Today’s agenda
14:00 - 15:20 • A little about me
• Introduction to computer vision
15:20 - 15:40
• Coffee break
15:40 - 17:00
• Fundamentals of computer vision
• Recent advances
What this crash course is about?
• Learn cool applications in computer vision
• Understand the fundamentals
• light, geometry, matching, alignment, grouping, recognition
• Pointers to explore further
WARNING
The following contains
shameless self-promotion,
and may cause audience to
groan, scoff, or ill. Consider
reading at your own risk
About me
7 months old!
National Chiao-Tung University
B.S. in EE
IIS, Academia Sinica
Research Assistant
UIUC
Ph.D. in ECE 2016
UC, Merced
Visiting Student
Microsoft Research
Research Intern
2012, 2013
Disney Research
Research Intern 2014
Quick facts
• Faculty members x 86
• IEEE Fellows x 31
• NAE Members x 4
• $34+ Million in annual
research
• Rank 19-21 (US News)
Area of Focus
• Computer Systems
• Networking
• VLSI and Design
Automation
• Software and Machine
Intelligence
Vision and Learning Lab @ Virginia Tech
Jinwoo Choi
PhD student
Badour AlBahar
MS student
Hao-Wei Yeh
(Univ. Tokyo)
Yen-Chen Lin
(NTHU)
Graduate students Visiting students (Summer 2017)
Open positions for PhD students and visiting students!
?
Alumni
Former undergrad students
• [Le, Zelun, JunYoung] Stanford x 3
• [Linjia, Sakshi, Danyang] UIUC x 3
• [Kevin] UC Berkeley x 1
• [Anarghya] Google x 1
Former student collaborators
• [Shun Zhang] Assistant Prof. at Northwestern Polytechnical Univ.
• [Chao Ma] Postdoc at University of Adelaide
• [Dong Li] Research Scientist at Face++
What we do?
Vision/learning/graphics
• Unsupervised learning
• Deep generative/predictive
modeling
• Computational photography
Applied, interdisciplinary research
• Visual sensing for ecology and conservation
• Interactive neurorehabilitation
Image Completion [SIGGRAPH14]
- Revealing unseen pixels
Video Completion [SIGGRAPH Asia16]
- Revealing temporally coherent pixels
Image super-resolution [CVPR15]
- Revealing unseen high frequency details
Deep Joint Image Filtering [ECCV16]
- Transferring structural details
Depth upsampling Noise reduction Inverse halftoning Texture removal
Detecting migrating birds [CVPR16]
Object tracking [ICCV15]
Multi-face tracking [ECCV16]
Visual Tracking
- Locating moving objects across video frames
Weakly supervised localization [CVPR16]
k-NN GraphUnlabeled images
Negative miningPositive mining
Unsupervised feature learning [ECCV16]
Learning with weak supervision
What is Computer Vision?
• Make computers understand images and videos.
• Where is this?
• What are they doing?
• Why is this happening?
• What is important?
• What will I see?
Computer Vision and Nearby Fields
Images (2D)
Geometry (3D)
Shape
Photometry
Appearance
Digital Image Processing
Computational Photography
Computer Graphics
Computer Vision
Machine learning:
Vision = Machine learning applied to visual data
Visual data on the Internet
• Flickr
• 10+ billion photographs
• 60 million images uploaded a month
• Facebook
• 250 billion+
• 300 million a day
• Instagram
• 55 million a day
• YouTube
• 100 hours uploaded every minute
90% of net traffic
will be visual!
Mostly about cats
Too big for humans
• Need automatic tools to access and analyze visual data!
http://www.petittube.com/
Slide credit: Alexei A. Efros
Vision is Really Hard
• Vision is an amazing feature of natural intelligence
• Visual cortex occupies about 50% of Macaque brain
• More human brain devoted to vision than anything else
Is that a
queen or a
bishop?
Why is Computer Vision Hard?
Why is Computer Vision Hard?
What did you see?
• Where this picture was taken?
• How many people are there?
• What are they doing?
• What object the person on the left standing on?
• Why this is a funny picture?
Why is Computer Vision Hard?
Why is Computer Vision Hard?
Why is Computer Vision Hard?
Why is Computer Vision Hard?
Why is Computer Vision Hard?
Why is Computer Vision Hard?
Computer: okay, it’s a funny picture
Safety Health Security
Comfort AccessFun
Computer Vision
Technology
Can Better Our Lives
History of Computer Vision
Marvin Minsky, MIT
Turing award, 1969
“In 1966, Minsky hired a first-year
undergraduate student and
assigned him a problem to solve
over the summer:
connect a camera to a computer
and get the machine to describe
what it sees.”
Crevier 1993, pg. 88
Half a century later,
we're still working on it.
History of Computer Vision
Marvin Minsky, MIT
Turing award, 1969
Gerald Sussman, MIT
“You’ll notice that Sussman never worked
in vision again!” – Berthold Horn
1960’s: interpretation of synthetic worlds
Larry Roberts
“Father of Computer Vision”
Larry Roberts PhD Thesis, MIT, 1963,
Machine Perception of Three-Dimensional Solids
Input image 2x2 gradient operator computed 3D model
rendered from new viewpoint
Slide credit: Steve Seitz
1970’s: some progress on interpreting selected images
The representation and matching of pictorial structures
Fischler and Elschlager, 1973
1970’s: some progress on interpreting selected images
The representation and matching of pictorial structures
Fischler and Elschlager, 1973
1980’s: ANNs come and go; shift toward geometry
and increased mathematical rigor
Image credit: Rick Szeliski
Goodbye
science
1990’s: face recognition; statistical analysis in vogue
Image credit: Rick Szeliski
2000’s: broader recognition; large annotated
datasets available; video processing starts
Image credit: Rick Szeliski
2010’s: resurgence of deep learning
[AlexNet NIPS 2012] [DeepFace CVPR 2014]
[DeepPose CVPR 2014] [Show, Attend and Tell ICML 2015]
2020’s: autonomous vehicles
2030’s: robot uprising?
3-mins break
Examples of Computer Vision Applications
• How is computer vision used today?
Face detection
• Most digital cameras and smart phones detect faces (and more)
• Canon, Sony, Fuji, …
• For smart focus, exposure compensation, and cropping
Slide credit: Steve Seitz
Face recognition
Facebook face auto-tagging
Application: building a boss detector
http://ahogrammer.com/2016/11/15/deep-learning-enables-you-to-hide-screen-
when-your-boss-is-approaching/
Face Landmark Alignment
TAAZ.com: Virtual makeover Face morphing
Jia-Bin Huang
Miss Daegu 2013 Contestants Face Morphing
Face Landmark Alignment- Pose Estimation
Pitch: ~ 15 degree
Yaw: ~ +20, -20 degree
Jia-Bin Huang What's the Best Pose for a Selfie?
Face Landmark Alignment- Augmented Reality
SnapChat Lens
Face Landmark Alignment- Reenactment
Face2Face CVPR 2016
Video-based face replacement
Hand tracking
HandPose, CHI 2015
Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields
Smile Detection
Sony Cyber-shot® T70 Digital Still Camera Slide credit: Steve Seitz
Eye contact detection
Gaze locking, UIST 2013
Vision-based Biometrics
“How the Afghan Girl was Identified by Her Iris Patterns” Read the story wikipedia
Slide credit: Steve Seitz
Vision-based Biometrics
Optical Character Recognition (OCR)
Digit recognition, AT&T labs
http://www.research.att.com/~yann/
License plate readers
http://en.wikipedia.org/wiki/Automatic_number_plate_recognition
• Technology to convert scanned docs to text
• If you have a scanner, it probably came with OCR software
Slide credit: Steve Seitz
Computer vision in sports
Hawk-Eye: helping/improving referee decisions
Computer vision in sports
SportVision: improving viewer experiences
Computer vision in sports
Replay Technologies: improving viewer experiences
Computer vision in sports
Play tracking
Computer vision in sports
Second Spectrum: visual analytics
Visual recognition for photo organization
Google photo
Matching street clothing photos in online Shops
Street2shop, ICCV 2015
3D scanner from a mobile camera
MobileFusion
Earth viewers (3D modeling)
Image from Microsoft’s Virtual Earth
(see also: Google Earth)
Slide credit: Steve Seitz
3D from thousands of images
Building Rome in a Day: Agarwal et al. 2009
3D from thousands of images
[Furukawa et al. CVPR 2010]
Microsoft PhotoSynth: Photo Tourism
MS PhotoSynth in CSI
Indoor Scene Reconstruction
Structured Indoor Modeling, ICCV2015
Tour into picture
Adobe Affer Effect
First-person Hyperlapse Videos
[Kopf et al. SIGGRAPH 2014]
3D Time-lapse from Internet Photos
3D Time-lapse from Internet Photos, ICCV 2015
Special effects: matting and composition
Kylie Minogue - Come Into My World
Style transfer
A Neural Algorithm of Artistic Style [Gatys et al. 2015]
Target image (Content)Source image (Style) Output (deepart)
Prisma – Best app of the year
EverFilter
The Matrix movies, ESC Entertainment, XYZRGB, NRC
Special effects: shape capture
Slide credit: Steve Seitz
Pirates of the Carribean, Industrial Light and Magic
Special effects: motion capture
Slide credit: Steve Seitz
Google cars
Google in talks with Ford, Toyota and Volkswagen to realise driverless cars
http://www.theatlantic.com/technology/archive/2014/05/all-the-world-a-track-
the-trick-that-makes-googles-self-driving-cars-work/370871/
Interactive Games: Kinect
• Object Recognition: http://www.youtube.com/watch?feature=iv&v=fQ59dXOo63o
• Mario: http://www.youtube.com/watch?v=8CTJL5lUjHg
• 3D: http://www.youtube.com/watch?v=7QrnwoO1-8A
• Robot: http://www.youtube.com/watch?v=w8BmgtMKFbY
Vision in space
Vision systems (JPL) used for several tasks
• Panorama stitching
• 3D terrain modeling
• Obstacle detection, position tracking
• For more, read “Computer Vision on Mars” by Matthies et al.
NASA'S Mars Exploration Rover Spirit captured this westward view from atop
a low plateau where Spirit spent the closing months of 2007.
Industrial robots
Vision-guided robots position nut runners on wheels
http://www.automationworld.com/computer-vision-opportunity-or-threat
Mobile robots
http://www.robocup.org/NASA’s Mars Spirit Rover
Saxena et al. 2008
STAIR at Stanford
http://www.youtube.com/w
atch?v=DF39Ygp53mQ
Medical imaging
Image guided surgery
Grimson et al., MIT3D imaging
MRI, CT
Computer vision for healthcare
BiliCam, Ubicomp 2014
Computer vision for healthcare
Autism Screening
Computer vision for healthcare
Video magnification
Computer vision for the mass
Counting cells Predicting poverty
Current state of the art
• Many of these are less than 5 years old
• Very active and exciting research area!
• To learn more about vision applications and companies
– David Lowe maintains an excellent overview of vision companies
• http://www.cs.ubc.ca/spider/lowe/vision.html
20-mins coffee break
Fundamentals of Computer Vision
• Light
– What an image records
• Matching
– How to measure the similarity of two regions
• Alignment
– How to recover transformation parameters based on matched points
• Geometry
– How to relate world coordinates and image coordinates
• Grouping
– What points/regions/lines belong together?
• Recognition
– What similarities are important?
Slide credit: Derek Hoiem
Light
–What an image records
Image Formation
Digital camera
A digital camera replaces film with a sensor array
• Each cell in the array is light-sensitive diode that converts photons to electrons
• Two common types:
• Charge Coupled Device (CCD): larger yet slower, better quality
• Complementary Metal Oxide Semiconductor (CMOS): high bandwidth, lower quality
• http://electronics.howstuffworks.com/digital-camera.htm
Slide by Steve Seitz
Sensor Array
CMOS sensor
CCD sensor
0.92 0.93 0.94 0.97 0.62 0.37 0.85 0.97 0.93 0.92 0.99
0.95 0.89 0.82 0.89 0.56 0.31 0.75 0.92 0.81 0.95 0.91
0.89 0.72 0.51 0.55 0.51 0.42 0.57 0.41 0.49 0.91 0.92
0.96 0.95 0.88 0.94 0.56 0.46 0.91 0.87 0.90 0.97 0.95
0.71 0.81 0.81 0.87 0.57 0.37 0.80 0.88 0.89 0.79 0.85
0.49 0.62 0.60 0.58 0.50 0.60 0.58 0.50 0.61 0.45 0.33
0.86 0.84 0.74 0.58 0.51 0.39 0.73 0.92 0.91 0.49 0.74
0.96 0.67 0.54 0.85 0.48 0.37 0.88 0.90 0.94 0.82 0.93
0.69 0.49 0.56 0.66 0.43 0.42 0.77 0.73 0.71 0.90 0.99
0.79 0.73 0.90 0.67 0.33 0.61 0.69 0.79 0.73 0.93 0.97
0.91 0.94 0.89 0.49 0.41 0.78 0.78 0.77 0.89 0.99 0.93
Perception of Intensity
from Ted Adelson
• A is brighter?
• B is brighter?
• They are equally bright
Perception of Intensity
from Ted Adelson
Digital Color Images
Color Image R
G
B
Image filtering
• Linear filtering: function is a weighted sum/difference of pixel
values
• Really important!
• Enhance images
• Denoise, smooth, increase contrast, etc.
• Extract information from images
• Texture, edges, distinctive points, etc.
• Detect patterns
• Template matching
111
111
111
Slide credit: David Lowe (UBC)
],[g 
Example: box filter
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 0 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 0 0 0 0 0 0 0
0 0 90 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0
Credit: S. Seitz
],[],[],[
,
lnkmflkgnmh
lk
 
[.,.]h[.,.]f
Image filtering
111
111
111
],[g 
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 0 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 0 0 0 0 0 0 0
0 0 90 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 10
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 0 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 0 0 0 0 0 0 0
0 0 90 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
[.,.]h[.,.]f
Image filtering
111
111
111
],[g 
Credit: S. Seitz
],[],[],[
,
lnkmflkgnmh
lk
 
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 0 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 0 0 0 0 0 0 0
0 0 90 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 10 20
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 0 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 0 0 0 0 0 0 0
0 0 90 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
[.,.]h[.,.]f
Image filtering
111
111
111
],[g 
Credit: S. Seitz
],[],[],[
,
lnkmflkgnmh
lk
 
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 0 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 0 0 0 0 0 0 0
0 0 90 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 10 20 30
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 0 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 0 0 0 0 0 0 0
0 0 90 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
[.,.]h[.,.]f
Image filtering
111
111
111
],[g 
Credit: S. Seitz
],[],[],[
,
lnkmflkgnmh
lk
 
0 10 20 30 30
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 0 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 0 0 0 0 0 0 0
0 0 90 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
[.,.]h[.,.]f
Image filtering
111
111
111
],[g 
Credit: S. Seitz
],[],[],[
,
lnkmflkgnmh
lk
 
0 10 20 30 30
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 0 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 0 0 0 0 0 0 0
0 0 90 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
[.,.]h[.,.]f
Image filtering
111
111
111
],[g 
Credit: S. Seitz
?
],[],[],[
,
lnkmflkgnmh
lk
 
0 10 20 30 30
50
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 0 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 0 0 0 0 0 0 0
0 0 90 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
[.,.]h[.,.]f
Image filtering
111
111
111
],[g 
Credit: S. Seitz
?
],[],[],[
,
lnkmflkgnmh
lk
 
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 0 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 0 0 0 0 0 0 0
0 0 90 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 10 20 30 30 30 20 10
0 20 40 60 60 60 40 20
0 30 60 90 90 90 60 30
0 30 50 80 80 90 60 30
0 30 50 80 80 90 60 30
0 20 30 50 50 60 40 20
10 20 30 30 30 30 20 10
10 10 10 0 0 0 0 0
[.,.]h[.,.]f
Image filtering
111
111
111
],[g 
Credit: S. Seitz
],[],[],[
,
lnkmflkgnmh
lk
 
What does it do?
• Replaces each pixel with an
average of its neighborhood
• Achieve smoothing effect
(remove sharp features)
111
111
111
Slide credit: David Lowe (UBC)
],[g 
Box Filter
Smoothing with box filter
Practice with linear filters
000
010
000
Original
?
Source: D. Lowe
Practice with linear filters
000
010
000
Original Filtered
(no change)
Source: D. Lowe
Practice with linear filters
000
100
000
Original
?
Source: D. Lowe
Practice with linear filters
000
100
000
Original Shifted left
By 1 pixel
Source: D. Lowe
Practice with linear filters
Original
111
111
111
000
020
000
- ?
(Note that filter sums to 1)
Source: D. Lowe
Practice with linear filters
Original
111
111
111
000
020
000
-
Sharpening filter
- Accentuates differences with local
average
Source: D. Lowe
Sharpening
Source: D. Lowe
Other filters
-101
-202
-101
Vertical Edge
(absolute value)
Sobel
Other filters
-1-2-1
000
121
Horizontal Edge
(absolute value)
Sobel
Basic gradient filters
000
10-1
000
010
000
0-10
10-1
or
Horizontal Gradient Vertical Gradient
1
0
-1
or
Demo - image filtering
Matching
How to measure the similarity of two regions
Alignment
How to recover transformation parameters
based on matched points
Correspondence and alignment
• Correspondence: matching points, patches, edges, or regions across
images
≈
Correspondence and alignment
• Alignment: solving the transformation that makes two things match
better
A
Example: fitting an 2D shape template
Slide from Silvio Savarese
Example: fitting a 3D object model
Slide from Silvio Savarese
Example: estimating “fundamental matrix” that
corresponds two views
Slide from Silvio Savarese
Example: tracking points
frame 0 frame 22 frame 49
x x
x
Overview of Keypoint Matching
K. Grauman, B. Leibe
Af Bf
A1
A2 A3
Tffd BA ),(
1. Find a set of
distinctive key-points
3. Extract and normalize
the region content
2. Define a region around
each keypoint
4. Compute a local descriptor
from the normalized region
5. Match local descriptors
Goals for Keypoints
Detect points that are repeatable and distinctive
Key trade-offs
More Repeatable More Points
A1
A2 A3
Detection
More Distinctive More Flexible
Description
Robust to occlusion
Works with less texture
Minimize wrong matches Robust to expected variations
Maximize correct matches
Robust detection
Precise localization
Choosing interest points
Where would you tell your
friend to meet you?
Choosing interest points
Where would you tell your
friend to meet you?
Local Descriptors: SIFT Descriptor
[Lowe, ICCV 1999]
Histogram of oriented
gradients
• Captures important texture
information
• Robust to small translations /
affine deformations
K. Grauman, B. Leibe
Examples of recognized objects
Demo – interest point detection
Geometry
How to relate world coordinates and
image coordinates
Single-view Geometry
How tall is this woman?
Which ball is closer?
How high is the camera?
What is the camera
rotation?
What is the focal length of
the camera?
Single-view Geometry
Mapping between image and world coordinates
• Pinhole camera model
• Projective geometry
• Vanishing points and lines
Image formation
Let’s design a camera
– Idea 1: put a piece of film in front of an object
– Do we get a reasonable image?
Slide credit: Steve Seitz
Pinhole camera
Idea 2: add a barrier to block off most of the rays
• This reduces blurring
• The opening known as the aperture
Slide credit: Steve Seitz
Pinhole camera
Figure from David Forsyth
f
f = focal length
c = center of the camera
c
Figures © Stephen E. Palmer, 2002
Dimensionality Reduction Machine (3D to 2D)
Point of observation
3D world 2D image
Projection can be tricky…
Slide source: Seitz
Projection can be tricky…
Slide source: Seitz
Making of 3D sidewalk art: http://www.youtube.com/watch?v=3SNYtd0Ayt0
Vanishing points and lines
Parallel lines in the world intersect in the image at a “vanishing point”
Vanishing points and lines
Vanishing
point
Vanishing
line
Vanishing
point
Vertical vanishing
point
(at infinity)
Slide from Efros, Photo from Criminisi
Comparing heights
Vanishing
Point
Slide by Steve Seitz
163
Measuring height
1
2
3
4
5
5.3
2.8
3.3
Camera height
Slide by Steve Seitz
164
Image Stitching
• Combine two or more overlapping images to make one larger image
Add example
Slide credit: Vaibhav Vaish
Illustration
Camera Center
RANSAC for Homography
Initial Matched Points
Final Matched Points
RANSAC for Homography
RANSAC for Homography
Bundle Adjustment
• New images initialised with rotation, focal length of best matching
image
Bundle Adjustment
• New images initialised with rotation, focal length of best matching
image
Details to make it look good
• Choosing seams
• Blending
Choosing seams
• Better method: dynamic program to find seam along well-matched
regions
Illustration: http://en.wikipedia.org/wiki/File:Rochester_NY.jpg
Gain compensation
• Simple gain adjustment
• Compute average RGB intensity of each image in overlapping region
• Normalize intensities by ratio of averages
Multi-band Blending
• Burt & Adelson 1983
• Blend frequency bands over range  l
Blending comparison (IJCV 2007)
Depth from Stereo
• Goal: recover depth by finding image coordinate x’ that corresponds
to x
f
x x’
Baseline
B
z
C C’
X
f
X
x
x'
Correspondence Problem
• We have two images taken from cameras with different intrinsic and extrinsic
parameters
• How do we match a point in the first image to a point in the second? How can
we constrain our search?
x ?
Potential matches for x have to lie on the corresponding line l’.
Potential matches for x’ have to lie on the corresponding line l.
Key idea: Epipolar constraint
x x’
X
x’
X
x’
X
Basic stereo matching algorithm
•If necessary, rectify the two stereo images to transform epipolar lines into
scanlines
•For each pixel x in the first image
• Find corresponding epipolar scanline in the right image
• Search the scanline and pick the best match x’
• Compute disparity x-x’ and set depth(x) = fB/(x-x’)
800 most confident matches among
10,000+ local features.
Epipolar lines
Keep only the matches at are “inliers” with
respect to the “best” fundamental matrix
The Fundamental Matrix Song
5-mins break
Recognition
What similarities are important?
What do you see in this image?
Can I put stuff in it?
Forest
Trees
Bear
Man
Rabbit Grass
Camera
Describe, predict, or interact with the
object based on visual cues
Is it alive?
Is it dangerous?
How fast does it run? Is it soft?
Does it have a tail? Can I poke with it?
Why do we care about categories?
• From an object’s category, we can make predictions about its behavior in the future,
beyond of what is immediately perceived.
• Pointers to knowledge
• Help to understand individual cases not previously encountered
• Communication
Image categorization: Fine-grained recognition
Visipedia Project
Place recognition
Places Database [Zhou et al. NIPS 2014]
Image style recognition
[Karayev et al. BMVC 2014]
Training phase
Training
Labels
Training
Images
Classifier
Training
Training
Image
Features
Trained
Classifier
Testing phase
Training
Labels
Training
Images
Classifier
Training
Training
Image
Features
Image
Features
Testing
Test Image
Trained
Classifier
Outdoor
PredictionTrained
Classifier
Q: What are good features for…
• recognizing a beach?
Q: What are good features for…
• recognizing cloth fabric?
Q: What are good features for…
• recognizing a mug?
What are the right features?
Depend on what you want to know!
• Object: shape
• Local shape info, shading, shadows, texture
• Scene : geometric layout
• linear perspective, gradients, line segments
• Material properties: albedo, feel, hardness
• Color, texture
• Action: motion
• Optical flow, tracked points
• Image features: map images to feature space
• Classifiers: map feature space to label space
x x
x
x
x
x
x
x
o
o
o
o
o
x2
x1
x
xx
x
o
ooo
o
o
o
x x
x
x
x
x
x
x
o
o
o
o
o
x2
x1
x
xx
x
o
ooo
o
o
o
x x
x
x
x
x
x
x
o
o
o
o
o
x2
x1
x
xx
x
o
ooo
o
o
o
“Shallow” vs. “deep” architectures
Hand-designed
feature extraction
Trainable
classifier
Image/ Video
Pixels
Object
Class
Layer 1 Layer N
Simple
classifier
Object
Class
Image/ Video
Pixels
Traditional recognition: “Shallow” architecture
Deep learning: “Deep” architecture
…
Progress from Supervised Learning
2012
AlexNet
2013
ZF
2014
VGG
2014
GoogLeNet
2015
ResNet
2016
GoogLeNet-v4
15
10
5
ImageNet Image Classification Top5 Error
Object Detection
Search by Sliding Window Detector
• May work well for rigid objects
• Key idea: simple alignment for simple deformations
Object or
Background?
Object Detection
Search by Parts-based model
• Key idea: more flexible alignment for articulated objects
• Defined by models of part appearance, geometry or spatial
layout, and search algorithm
Vision as part of an intelligent system
3D Scene
Feature
Extraction
Interpretation
Action
Texture Color
Optical
Flow
Stereo
Disparity
Grouping Surfaces
Bits of
objects
Sense of
depth
Objects
Agents
and goals
Shapes and
properties
Open
paths
Words
Walk, touch, contemplate, smile, evade, read on, pick up, …
Motion
patterns
Well-Established
(patch matching)
Major Progress
(pattern matching++)
New Opportunities
(interpretation/tasks)
Multi-view Geometry
Face Detection/Recognition
Object Tracking / Flow
Category Detection
Human Pose
3D Scene Layout Vision for Robots
Life-long Learning
Entailment/Prediction
Slide credit: Derek Hoiem
Scene Understanding =
Objects + People + Layout +
Interpretation within Task Context
What do I see?  Why is this happening?
What is important?
What will I see?
How can we learn about the world through vision?
How do we create/evaluate vision systems that adapt
to useful tasks?
Important open problems
• How can we interpret vision given structured plans of a scene?
Important open problems
• Algorithms: works pretty well  perfect
• E.g., stereo: top of wish list from Pixar guy Micheal Kass
• Good directions:
• Incorporate higher level knowledge
Important open problems
• Spatial understanding: what is it doing? Or how do I do it?
Important questions:
• What are good representations of space for navigation and
interaction? What kind of details are important?
• How can we combine single-image cues with multi-view cues?
Important open problems
Object representation: what is it?
Important questions:
• How can we pose recognition so that it lets us deal with new objects?
• What do we want to predict or infer, and to what extent does that rely
on categorization?
• How do we transfer knowledge of one type of object to another?
Important problems
• Learning features and intermediate representations that
transfer to new tasks
Figure from http://www.datarobot.com/blog/a-primer-on-deep-learning/
Important open problems
• Almost everything is still unsolved!
• Robust 3D shape from multiple images
• Recognize objects (only faces and maybe cars is really
solved, thanks to tons of data)
• Caption images/video
• Predict intention
• Object segmentation
• Count objects in an image
• Estimate pose
• Recognize actions
• ….
This course has provided fundamentals
• What is computer vision?
• Examples of computer vision applications
• Basic principles of computer vision:
• Image formation
• Filtering
• Correspondence and alignment
• Geometry
• Recognition.
How do you learn more?
Explore!
Resources – where to learn more
• Awesome Computer Vision, Awesome Deep Vision
• A curated list of awesome computer vision resources
• Computer Vision Taiwanese Group
• > 3,000 members in facebook
• Videolectures and ML&CV Talks
• Lectures, Keynotes, Panel Discussions
• OpenCV – C++/Python/Java
Thank You!
Image: www.clubtuzki.com

More Related Content

What's hot

Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Simplilearn
 
pattern classification
pattern classificationpattern classification
pattern classification
Ranjan Ganguli
 
Facial Recognition System
Facial Recognition SystemFacial Recognition System
Facial Recognition System
Arun ACE
 

What's hot (20)

Computer Vision for autonomous driving
Computer Vision for autonomous drivingComputer Vision for autonomous driving
Computer Vision for autonomous driving
 
Differences Between Machine Learning Ml Artificial Intelligence Ai And Deep L...
Differences Between Machine Learning Ml Artificial Intelligence Ai And Deep L...Differences Between Machine Learning Ml Artificial Intelligence Ai And Deep L...
Differences Between Machine Learning Ml Artificial Intelligence Ai And Deep L...
 
Uses of AI
Uses of AIUses of AI
Uses of AI
 
Statistical learning vs. Machine Learning
Statistical learning vs. Machine LearningStatistical learning vs. Machine Learning
Statistical learning vs. Machine Learning
 
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
 
Kapil dikshit ppt
Kapil dikshit pptKapil dikshit ppt
Kapil dikshit ppt
 
Machine Learning Interview Questions and Answers | Machine Learning Interview...
Machine Learning Interview Questions and Answers | Machine Learning Interview...Machine Learning Interview Questions and Answers | Machine Learning Interview...
Machine Learning Interview Questions and Answers | Machine Learning Interview...
 
pattern classification
pattern classificationpattern classification
pattern classification
 
Computer vision
Computer visionComputer vision
Computer vision
 
Computer vision and robotics
Computer vision and roboticsComputer vision and robotics
Computer vision and robotics
 
Responsible AI
Responsible AIResponsible AI
Responsible AI
 
TYBSC CS SEM 5 AI NOTES
TYBSC CS SEM 5 AI NOTESTYBSC CS SEM 5 AI NOTES
TYBSC CS SEM 5 AI NOTES
 
GENERATIVE AI, THE FUTURE OF PRODUCTIVITY
GENERATIVE AI, THE FUTURE OF PRODUCTIVITYGENERATIVE AI, THE FUTURE OF PRODUCTIVITY
GENERATIVE AI, THE FUTURE OF PRODUCTIVITY
 
Traffic congestion prediction with images
Traffic congestion prediction with imagesTraffic congestion prediction with images
Traffic congestion prediction with images
 
Facial Recognition System
Facial Recognition SystemFacial Recognition System
Facial Recognition System
 
Attendance Management System using Face Recognition
Attendance Management System using Face RecognitionAttendance Management System using Face Recognition
Attendance Management System using Face Recognition
 
Credit card fraud detection through machine learning
Credit card fraud detection through machine learningCredit card fraud detection through machine learning
Credit card fraud detection through machine learning
 
Driver fatigue detection system
Driver fatigue detection systemDriver fatigue detection system
Driver fatigue detection system
 
MCS Best Presentation - Artificial intelligence
MCS Best Presentation - Artificial intelligenceMCS Best Presentation - Artificial intelligence
MCS Best Presentation - Artificial intelligence
 
Supervised vs Unsupervised vs Reinforcement Learning | Edureka
Supervised vs Unsupervised vs Reinforcement Learning | EdurekaSupervised vs Unsupervised vs Reinforcement Learning | Edureka
Supervised vs Unsupervised vs Reinforcement Learning | Edureka
 

Viewers also liked

UIUC CS 498 - Computational Photography - Final project presentation
UIUC CS 498 - Computational Photography - Final project presentation UIUC CS 498 - Computational Photography - Final project presentation
UIUC CS 498 - Computational Photography - Final project presentation
Jia-Bin Huang
 
Recent Advances in Computer Vision
Recent Advances in Computer VisionRecent Advances in Computer Vision
Recent Advances in Computer Vision
antiw
 
Three Reasons to Join FVE at uiuc
Three Reasons to Join FVE at uiucThree Reasons to Join FVE at uiuc
Three Reasons to Join FVE at uiuc
Jia-Bin Huang
 
Estimating Human Pose from Occluded Images (ACCV 2009)
Estimating Human Pose from Occluded Images (ACCV 2009)Estimating Human Pose from Occluded Images (ACCV 2009)
Estimating Human Pose from Occluded Images (ACCV 2009)
Jia-Bin Huang
 
Lecture 21 - Image Categorization - Computer Vision Spring2015
Lecture 21 - Image Categorization -  Computer Vision Spring2015Lecture 21 - Image Categorization -  Computer Vision Spring2015
Lecture 21 - Image Categorization - Computer Vision Spring2015
Jia-Bin Huang
 
Backpropagation in Convolutional Neural Network
Backpropagation in Convolutional Neural NetworkBackpropagation in Convolutional Neural Network
Backpropagation in Convolutional Neural Network
Hiroshi Kuwajima
 

Viewers also liked (20)

Research 101 - Paper Writing with LaTeX
Research 101 - Paper Writing with LaTeXResearch 101 - Paper Writing with LaTeX
Research 101 - Paper Writing with LaTeX
 
How to Read Academic Papers
How to Read Academic PapersHow to Read Academic Papers
How to Read Academic Papers
 
Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015
Lecture 29 Convolutional Neural Networks -  Computer Vision Spring2015Lecture 29 Convolutional Neural Networks -  Computer Vision Spring2015
Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015
 
How to come up with new research ideas
How to come up with new research ideasHow to come up with new research ideas
How to come up with new research ideas
 
Writing Fast MATLAB Code
Writing Fast MATLAB CodeWriting Fast MATLAB Code
Writing Fast MATLAB Code
 
Enhancing Color Representation for the Color Vision Impaired (CVAVI 2008)
Enhancing Color Representation for the Color Vision Impaired (CVAVI 2008)Enhancing Color Representation for the Color Vision Impaired (CVAVI 2008)
Enhancing Color Representation for the Color Vision Impaired (CVAVI 2008)
 
UIUC CS 498 - Computational Photography - Final project presentation
UIUC CS 498 - Computational Photography - Final project presentation UIUC CS 498 - Computational Photography - Final project presentation
UIUC CS 498 - Computational Photography - Final project presentation
 
Recent Advances in Computer Vision
Recent Advances in Computer VisionRecent Advances in Computer Vision
Recent Advances in Computer Vision
 
What Makes a Creative Photograph?
What Makes a Creative Photograph?What Makes a Creative Photograph?
What Makes a Creative Photograph?
 
Transformation Guided Image Completion ICCP 2013
Transformation Guided Image Completion ICCP 2013Transformation Guided Image Completion ICCP 2013
Transformation Guided Image Completion ICCP 2013
 
Three Reasons to Join FVE at uiuc
Three Reasons to Join FVE at uiucThree Reasons to Join FVE at uiuc
Three Reasons to Join FVE at uiuc
 
Toward Accurate and Robust Cross-Ratio based Gaze Trackers Through Learning F...
Toward Accurate and Robust Cross-Ratio based Gaze Trackers Through Learning F...Toward Accurate and Robust Cross-Ratio based Gaze Trackers Through Learning F...
Toward Accurate and Robust Cross-Ratio based Gaze Trackers Through Learning F...
 
Saliency Detection via Divergence Analysis: A Unified Perspective ICPR 2012
Saliency Detection via Divergence Analysis: A Unified Perspective ICPR 2012Saliency Detection via Divergence Analysis: A Unified Perspective ICPR 2012
Saliency Detection via Divergence Analysis: A Unified Perspective ICPR 2012
 
Applying for Graduate School in S.T.E.M.
Applying for Graduate School in S.T.E.M.Applying for Graduate School in S.T.E.M.
Applying for Graduate School in S.T.E.M.
 
Image Completion using Planar Structure Guidance (SIGGRAPH 2014)
Image Completion using Planar Structure Guidance (SIGGRAPH 2014)Image Completion using Planar Structure Guidance (SIGGRAPH 2014)
Image Completion using Planar Structure Guidance (SIGGRAPH 2014)
 
Single Image Super-Resolution from Transformed Self-Exemplars (CVPR 2015)
Single Image Super-Resolution from Transformed Self-Exemplars (CVPR 2015)Single Image Super-Resolution from Transformed Self-Exemplars (CVPR 2015)
Single Image Super-Resolution from Transformed Self-Exemplars (CVPR 2015)
 
Estimating Human Pose from Occluded Images (ACCV 2009)
Estimating Human Pose from Occluded Images (ACCV 2009)Estimating Human Pose from Occluded Images (ACCV 2009)
Estimating Human Pose from Occluded Images (ACCV 2009)
 
Lecture 21 - Image Categorization - Computer Vision Spring2015
Lecture 21 - Image Categorization -  Computer Vision Spring2015Lecture 21 - Image Categorization -  Computer Vision Spring2015
Lecture 21 - Image Categorization - Computer Vision Spring2015
 
Linear Algebra and Matlab tutorial
Linear Algebra and Matlab tutorialLinear Algebra and Matlab tutorial
Linear Algebra and Matlab tutorial
 
Backpropagation in Convolutional Neural Network
Backpropagation in Convolutional Neural NetworkBackpropagation in Convolutional Neural Network
Backpropagation in Convolutional Neural Network
 

Similar to Computer Vision Crash Course

01Introduction.pptx - C280, Computer Vision
01Introduction.pptx - C280, Computer Vision01Introduction.pptx - C280, Computer Vision
01Introduction.pptx - C280, Computer Vision
butest
 

Similar to Computer Vision Crash Course (20)

Computer Vision Crash Course
Computer Vision Crash CourseComputer Vision Crash Course
Computer Vision Crash Course
 
Introduction
IntroductionIntroduction
Introduction
 
Overview of Computer Vision For Footwear Industry
Overview of Computer Vision For Footwear IndustryOverview of Computer Vision For Footwear Industry
Overview of Computer Vision For Footwear Industry
 
vision.ppt
vision.pptvision.ppt
vision.ppt
 
vision_2.ppt
vision_2.pptvision_2.ppt
vision_2.ppt
 
vision.ppt
vision.pptvision.ppt
vision.ppt
 
Computer vision introduction
Computer vision  introduction Computer vision  introduction
Computer vision introduction
 
Computer Vision meets Fashion (第12回ステアラボ人工知能セミナー)
Computer Vision meets Fashion (第12回ステアラボ人工知能セミナー)Computer Vision meets Fashion (第12回ステアラボ人工知能セミナー)
Computer Vision meets Fashion (第12回ステアラボ人工知能セミナー)
 
Computer vision - Applications and Trends
Computer vision - Applications and TrendsComputer vision - Applications and Trends
Computer vision - Applications and Trends
 
ICS1020 CV
ICS1020 CVICS1020 CV
ICS1020 CV
 
01 cie552 introduction
01 cie552 introduction01 cie552 introduction
01 cie552 introduction
 
Lecture 1 computer vision introduction
Lecture 1 computer vision introductionLecture 1 computer vision introduction
Lecture 1 computer vision introduction
 
Intro
IntroIntro
Intro
 
The Opportunities and Challenges of Putting the Latest Computer Vision and De...
The Opportunities and Challenges of Putting the Latest Computer Vision and De...The Opportunities and Challenges of Putting the Latest Computer Vision and De...
The Opportunities and Challenges of Putting the Latest Computer Vision and De...
 
AN Introduction to Augmented Reality(AR)
AN Introduction to Augmented Reality(AR)AN Introduction to Augmented Reality(AR)
AN Introduction to Augmented Reality(AR)
 
OpenCV
OpenCVOpenCV
OpenCV
 
بینایی ماشین
بینایی ماشینبینایی ماشین
بینایی ماشین
 
01Introduction.pptx - C280, Computer Vision
01Introduction.pptx - C280, Computer Vision01Introduction.pptx - C280, Computer Vision
01Introduction.pptx - C280, Computer Vision
 
Introduction talk to Computer Vision
Introduction talk to Computer Vision Introduction talk to Computer Vision
Introduction talk to Computer Vision
 
Lecture No. 1 introduction.pptx
Lecture No. 1 introduction.pptxLecture No. 1 introduction.pptx
Lecture No. 1 introduction.pptx
 

More from Jia-Bin Huang

Pose aware online visual tracking
Pose aware online visual trackingPose aware online visual tracking
Pose aware online visual tracking
Jia-Bin Huang
 
Face Expression Enhancement
Face Expression EnhancementFace Expression Enhancement
Face Expression Enhancement
Jia-Bin Huang
 
Image Smoothing for Structure Extraction
Image Smoothing for Structure ExtractionImage Smoothing for Structure Extraction
Image Smoothing for Structure Extraction
Jia-Bin Huang
 
Real-Time Face Detection, Tracking, and Attributes Recognition
Real-Time Face Detection, Tracking, and Attributes RecognitionReal-Time Face Detection, Tracking, and Attributes Recognition
Real-Time Face Detection, Tracking, and Attributes Recognition
Jia-Bin Huang
 
Estimating Human Pose from Occluded Images (ACCV 2009)
Estimating Human Pose from Occluded Images (ACCV 2009)Estimating Human Pose from Occluded Images (ACCV 2009)
Estimating Human Pose from Occluded Images (ACCV 2009)
Jia-Bin Huang
 
Information Preserving Color Transformation for Protanopia and Deuteranopia (...
Information Preserving Color Transformation for Protanopia and Deuteranopia (...Information Preserving Color Transformation for Protanopia and Deuteranopia (...
Information Preserving Color Transformation for Protanopia and Deuteranopia (...
Jia-Bin Huang
 
Enhancing Color Representation for the Color Vision Impaired (CVAVI 2008)
Enhancing Color Representation for the Color Vision Impaired (CVAVI 2008)Enhancing Color Representation for the Color Vision Impaired (CVAVI 2008)
Enhancing Color Representation for the Color Vision Impaired (CVAVI 2008)
Jia-Bin Huang
 
Learning Moving Cast Shadows for Foreground Detection (VS 2008)
Learning Moving Cast Shadows for Foreground Detection (VS 2008)Learning Moving Cast Shadows for Foreground Detection (VS 2008)
Learning Moving Cast Shadows for Foreground Detection (VS 2008)
Jia-Bin Huang
 
Learning Moving Cast Shadows for Foreground Detection (VS 2008)
Learning Moving Cast Shadows for Foreground Detection (VS 2008)Learning Moving Cast Shadows for Foreground Detection (VS 2008)
Learning Moving Cast Shadows for Foreground Detection (VS 2008)
Jia-Bin Huang
 
A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)
A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)
A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)
Jia-Bin Huang
 

More from Jia-Bin Huang (14)

How to write a clear paper
How to write a clear paperHow to write a clear paper
How to write a clear paper
 
Real-time Face Detection and Recognition
Real-time Face Detection and RecognitionReal-time Face Detection and Recognition
Real-time Face Detection and Recognition
 
Jia-Bin Huang's Curriculum Vitae
Jia-Bin Huang's Curriculum VitaeJia-Bin Huang's Curriculum Vitae
Jia-Bin Huang's Curriculum Vitae
 
Pose aware online visual tracking
Pose aware online visual trackingPose aware online visual tracking
Pose aware online visual tracking
 
Face Expression Enhancement
Face Expression EnhancementFace Expression Enhancement
Face Expression Enhancement
 
Image Smoothing for Structure Extraction
Image Smoothing for Structure ExtractionImage Smoothing for Structure Extraction
Image Smoothing for Structure Extraction
 
Static and Dynamic Hand Gesture Recognition
Static and Dynamic Hand Gesture RecognitionStatic and Dynamic Hand Gesture Recognition
Static and Dynamic Hand Gesture Recognition
 
Real-Time Face Detection, Tracking, and Attributes Recognition
Real-Time Face Detection, Tracking, and Attributes RecognitionReal-Time Face Detection, Tracking, and Attributes Recognition
Real-Time Face Detection, Tracking, and Attributes Recognition
 
Estimating Human Pose from Occluded Images (ACCV 2009)
Estimating Human Pose from Occluded Images (ACCV 2009)Estimating Human Pose from Occluded Images (ACCV 2009)
Estimating Human Pose from Occluded Images (ACCV 2009)
 
Information Preserving Color Transformation for Protanopia and Deuteranopia (...
Information Preserving Color Transformation for Protanopia and Deuteranopia (...Information Preserving Color Transformation for Protanopia and Deuteranopia (...
Information Preserving Color Transformation for Protanopia and Deuteranopia (...
 
Enhancing Color Representation for the Color Vision Impaired (CVAVI 2008)
Enhancing Color Representation for the Color Vision Impaired (CVAVI 2008)Enhancing Color Representation for the Color Vision Impaired (CVAVI 2008)
Enhancing Color Representation for the Color Vision Impaired (CVAVI 2008)
 
Learning Moving Cast Shadows for Foreground Detection (VS 2008)
Learning Moving Cast Shadows for Foreground Detection (VS 2008)Learning Moving Cast Shadows for Foreground Detection (VS 2008)
Learning Moving Cast Shadows for Foreground Detection (VS 2008)
 
Learning Moving Cast Shadows for Foreground Detection (VS 2008)
Learning Moving Cast Shadows for Foreground Detection (VS 2008)Learning Moving Cast Shadows for Foreground Detection (VS 2008)
Learning Moving Cast Shadows for Foreground Detection (VS 2008)
 
A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)
A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)
A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)
 

Recently uploaded

Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Christo Ananth
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
Tonystark477637
 

Recently uploaded (20)

UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineering
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 

Computer Vision Crash Course

  • 1. Computer Vision Crash Course Jia-Bin Huang Electrical and Computer Engineering Virginia Tech
  • 2.
  • 3. Today’s agenda 14:00 - 15:20 • A little about me • Introduction to computer vision 15:20 - 15:40 • Coffee break 15:40 - 17:00 • Fundamentals of computer vision • Recent advances
  • 4. What this crash course is about? • Learn cool applications in computer vision • Understand the fundamentals • light, geometry, matching, alignment, grouping, recognition • Pointers to explore further
  • 5. WARNING The following contains shameless self-promotion, and may cause audience to groan, scoff, or ill. Consider reading at your own risk
  • 7. National Chiao-Tung University B.S. in EE IIS, Academia Sinica Research Assistant UIUC Ph.D. in ECE 2016 UC, Merced Visiting Student Microsoft Research Research Intern 2012, 2013 Disney Research Research Intern 2014
  • 8.
  • 9. Quick facts • Faculty members x 86 • IEEE Fellows x 31 • NAE Members x 4 • $34+ Million in annual research • Rank 19-21 (US News) Area of Focus • Computer Systems • Networking • VLSI and Design Automation • Software and Machine Intelligence
  • 10. Vision and Learning Lab @ Virginia Tech Jinwoo Choi PhD student Badour AlBahar MS student Hao-Wei Yeh (Univ. Tokyo) Yen-Chen Lin (NTHU) Graduate students Visiting students (Summer 2017) Open positions for PhD students and visiting students! ? Alumni Former undergrad students • [Le, Zelun, JunYoung] Stanford x 3 • [Linjia, Sakshi, Danyang] UIUC x 3 • [Kevin] UC Berkeley x 1 • [Anarghya] Google x 1 Former student collaborators • [Shun Zhang] Assistant Prof. at Northwestern Polytechnical Univ. • [Chao Ma] Postdoc at University of Adelaide • [Dong Li] Research Scientist at Face++
  • 11. What we do? Vision/learning/graphics • Unsupervised learning • Deep generative/predictive modeling • Computational photography Applied, interdisciplinary research • Visual sensing for ecology and conservation • Interactive neurorehabilitation
  • 12. Image Completion [SIGGRAPH14] - Revealing unseen pixels
  • 13. Video Completion [SIGGRAPH Asia16] - Revealing temporally coherent pixels
  • 14. Image super-resolution [CVPR15] - Revealing unseen high frequency details
  • 15. Deep Joint Image Filtering [ECCV16] - Transferring structural details Depth upsampling Noise reduction Inverse halftoning Texture removal
  • 16. Detecting migrating birds [CVPR16] Object tracking [ICCV15] Multi-face tracking [ECCV16] Visual Tracking - Locating moving objects across video frames
  • 17. Weakly supervised localization [CVPR16] k-NN GraphUnlabeled images Negative miningPositive mining Unsupervised feature learning [ECCV16] Learning with weak supervision
  • 18. What is Computer Vision? • Make computers understand images and videos. • Where is this? • What are they doing? • Why is this happening? • What is important? • What will I see?
  • 19. Computer Vision and Nearby Fields Images (2D) Geometry (3D) Shape Photometry Appearance Digital Image Processing Computational Photography Computer Graphics Computer Vision Machine learning: Vision = Machine learning applied to visual data
  • 20.
  • 21. Visual data on the Internet • Flickr • 10+ billion photographs • 60 million images uploaded a month • Facebook • 250 billion+ • 300 million a day • Instagram • 55 million a day • YouTube • 100 hours uploaded every minute 90% of net traffic will be visual! Mostly about cats
  • 22. Too big for humans • Need automatic tools to access and analyze visual data! http://www.petittube.com/
  • 24. Vision is Really Hard • Vision is an amazing feature of natural intelligence • Visual cortex occupies about 50% of Macaque brain • More human brain devoted to vision than anything else Is that a queen or a bishop?
  • 25. Why is Computer Vision Hard?
  • 26. Why is Computer Vision Hard?
  • 27. What did you see? • Where this picture was taken? • How many people are there? • What are they doing? • What object the person on the left standing on? • Why this is a funny picture?
  • 28. Why is Computer Vision Hard?
  • 29. Why is Computer Vision Hard?
  • 30. Why is Computer Vision Hard?
  • 31. Why is Computer Vision Hard?
  • 32. Why is Computer Vision Hard?
  • 33. Why is Computer Vision Hard?
  • 34. Computer: okay, it’s a funny picture
  • 35. Safety Health Security Comfort AccessFun Computer Vision Technology Can Better Our Lives
  • 36. History of Computer Vision Marvin Minsky, MIT Turing award, 1969 “In 1966, Minsky hired a first-year undergraduate student and assigned him a problem to solve over the summer: connect a camera to a computer and get the machine to describe what it sees.” Crevier 1993, pg. 88
  • 37. Half a century later, we're still working on it.
  • 38. History of Computer Vision Marvin Minsky, MIT Turing award, 1969 Gerald Sussman, MIT “You’ll notice that Sussman never worked in vision again!” – Berthold Horn
  • 39. 1960’s: interpretation of synthetic worlds Larry Roberts “Father of Computer Vision” Larry Roberts PhD Thesis, MIT, 1963, Machine Perception of Three-Dimensional Solids Input image 2x2 gradient operator computed 3D model rendered from new viewpoint Slide credit: Steve Seitz
  • 40. 1970’s: some progress on interpreting selected images The representation and matching of pictorial structures Fischler and Elschlager, 1973
  • 41. 1970’s: some progress on interpreting selected images The representation and matching of pictorial structures Fischler and Elschlager, 1973
  • 42. 1980’s: ANNs come and go; shift toward geometry and increased mathematical rigor Image credit: Rick Szeliski
  • 44. 1990’s: face recognition; statistical analysis in vogue Image credit: Rick Szeliski
  • 45. 2000’s: broader recognition; large annotated datasets available; video processing starts Image credit: Rick Szeliski
  • 46. 2010’s: resurgence of deep learning [AlexNet NIPS 2012] [DeepFace CVPR 2014] [DeepPose CVPR 2014] [Show, Attend and Tell ICML 2015]
  • 50. Examples of Computer Vision Applications • How is computer vision used today?
  • 51. Face detection • Most digital cameras and smart phones detect faces (and more) • Canon, Sony, Fuji, … • For smart focus, exposure compensation, and cropping Slide credit: Steve Seitz
  • 53. Application: building a boss detector http://ahogrammer.com/2016/11/15/deep-learning-enables-you-to-hide-screen- when-your-boss-is-approaching/
  • 54. Face Landmark Alignment TAAZ.com: Virtual makeover Face morphing Jia-Bin Huang Miss Daegu 2013 Contestants Face Morphing
  • 55. Face Landmark Alignment- Pose Estimation Pitch: ~ 15 degree Yaw: ~ +20, -20 degree Jia-Bin Huang What's the Best Pose for a Selfie?
  • 56. Face Landmark Alignment- Augmented Reality SnapChat Lens
  • 57. Face Landmark Alignment- Reenactment Face2Face CVPR 2016
  • 60. Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields
  • 61. Smile Detection Sony Cyber-shot® T70 Digital Still Camera Slide credit: Steve Seitz
  • 62. Eye contact detection Gaze locking, UIST 2013
  • 63. Vision-based Biometrics “How the Afghan Girl was Identified by Her Iris Patterns” Read the story wikipedia Slide credit: Steve Seitz
  • 65. Optical Character Recognition (OCR) Digit recognition, AT&T labs http://www.research.att.com/~yann/ License plate readers http://en.wikipedia.org/wiki/Automatic_number_plate_recognition • Technology to convert scanned docs to text • If you have a scanner, it probably came with OCR software Slide credit: Steve Seitz
  • 66. Computer vision in sports Hawk-Eye: helping/improving referee decisions
  • 67. Computer vision in sports SportVision: improving viewer experiences
  • 68. Computer vision in sports Replay Technologies: improving viewer experiences
  • 69. Computer vision in sports Play tracking
  • 70. Computer vision in sports Second Spectrum: visual analytics
  • 71. Visual recognition for photo organization Google photo
  • 72. Matching street clothing photos in online Shops Street2shop, ICCV 2015
  • 73. 3D scanner from a mobile camera MobileFusion
  • 74. Earth viewers (3D modeling) Image from Microsoft’s Virtual Earth (see also: Google Earth) Slide credit: Steve Seitz
  • 75. 3D from thousands of images Building Rome in a Day: Agarwal et al. 2009
  • 76. 3D from thousands of images [Furukawa et al. CVPR 2010]
  • 79. Indoor Scene Reconstruction Structured Indoor Modeling, ICCV2015
  • 80. Tour into picture Adobe Affer Effect
  • 81. First-person Hyperlapse Videos [Kopf et al. SIGGRAPH 2014]
  • 82. 3D Time-lapse from Internet Photos 3D Time-lapse from Internet Photos, ICCV 2015
  • 83. Special effects: matting and composition Kylie Minogue - Come Into My World
  • 84. Style transfer A Neural Algorithm of Artistic Style [Gatys et al. 2015] Target image (Content)Source image (Style) Output (deepart)
  • 85. Prisma – Best app of the year
  • 87. The Matrix movies, ESC Entertainment, XYZRGB, NRC Special effects: shape capture Slide credit: Steve Seitz
  • 88. Pirates of the Carribean, Industrial Light and Magic Special effects: motion capture Slide credit: Steve Seitz
  • 89. Google cars Google in talks with Ford, Toyota and Volkswagen to realise driverless cars http://www.theatlantic.com/technology/archive/2014/05/all-the-world-a-track- the-trick-that-makes-googles-self-driving-cars-work/370871/
  • 90. Interactive Games: Kinect • Object Recognition: http://www.youtube.com/watch?feature=iv&v=fQ59dXOo63o • Mario: http://www.youtube.com/watch?v=8CTJL5lUjHg • 3D: http://www.youtube.com/watch?v=7QrnwoO1-8A • Robot: http://www.youtube.com/watch?v=w8BmgtMKFbY
  • 91. Vision in space Vision systems (JPL) used for several tasks • Panorama stitching • 3D terrain modeling • Obstacle detection, position tracking • For more, read “Computer Vision on Mars” by Matthies et al. NASA'S Mars Exploration Rover Spirit captured this westward view from atop a low plateau where Spirit spent the closing months of 2007.
  • 92. Industrial robots Vision-guided robots position nut runners on wheels http://www.automationworld.com/computer-vision-opportunity-or-threat
  • 93. Mobile robots http://www.robocup.org/NASA’s Mars Spirit Rover Saxena et al. 2008 STAIR at Stanford http://www.youtube.com/w atch?v=DF39Ygp53mQ
  • 94. Medical imaging Image guided surgery Grimson et al., MIT3D imaging MRI, CT
  • 95. Computer vision for healthcare BiliCam, Ubicomp 2014
  • 96. Computer vision for healthcare Autism Screening
  • 97. Computer vision for healthcare Video magnification
  • 98. Computer vision for the mass Counting cells Predicting poverty
  • 99. Current state of the art • Many of these are less than 5 years old • Very active and exciting research area! • To learn more about vision applications and companies – David Lowe maintains an excellent overview of vision companies • http://www.cs.ubc.ca/spider/lowe/vision.html
  • 101. Fundamentals of Computer Vision • Light – What an image records • Matching – How to measure the similarity of two regions • Alignment – How to recover transformation parameters based on matched points • Geometry – How to relate world coordinates and image coordinates • Grouping – What points/regions/lines belong together? • Recognition – What similarities are important? Slide credit: Derek Hoiem
  • 104. Digital camera A digital camera replaces film with a sensor array • Each cell in the array is light-sensitive diode that converts photons to electrons • Two common types: • Charge Coupled Device (CCD): larger yet slower, better quality • Complementary Metal Oxide Semiconductor (CMOS): high bandwidth, lower quality • http://electronics.howstuffworks.com/digital-camera.htm Slide by Steve Seitz
  • 106.
  • 107. 0.92 0.93 0.94 0.97 0.62 0.37 0.85 0.97 0.93 0.92 0.99 0.95 0.89 0.82 0.89 0.56 0.31 0.75 0.92 0.81 0.95 0.91 0.89 0.72 0.51 0.55 0.51 0.42 0.57 0.41 0.49 0.91 0.92 0.96 0.95 0.88 0.94 0.56 0.46 0.91 0.87 0.90 0.97 0.95 0.71 0.81 0.81 0.87 0.57 0.37 0.80 0.88 0.89 0.79 0.85 0.49 0.62 0.60 0.58 0.50 0.60 0.58 0.50 0.61 0.45 0.33 0.86 0.84 0.74 0.58 0.51 0.39 0.73 0.92 0.91 0.49 0.74 0.96 0.67 0.54 0.85 0.48 0.37 0.88 0.90 0.94 0.82 0.93 0.69 0.49 0.56 0.66 0.43 0.42 0.77 0.73 0.71 0.90 0.99 0.79 0.73 0.90 0.67 0.33 0.61 0.69 0.79 0.73 0.93 0.97 0.91 0.94 0.89 0.49 0.41 0.78 0.78 0.77 0.89 0.99 0.93
  • 108. Perception of Intensity from Ted Adelson • A is brighter? • B is brighter? • They are equally bright
  • 112. Image filtering • Linear filtering: function is a weighted sum/difference of pixel values • Really important! • Enhance images • Denoise, smooth, increase contrast, etc. • Extract information from images • Texture, edges, distinctive points, etc. • Detect patterns • Template matching
  • 113. 111 111 111 Slide credit: David Lowe (UBC) ],[g  Example: box filter
  • 114. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 0 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Credit: S. Seitz ],[],[],[ , lnkmflkgnmh lk   [.,.]h[.,.]f Image filtering 111 111 111 ],[g 
  • 115. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 0 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 0 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [.,.]h[.,.]f Image filtering 111 111 111 ],[g  Credit: S. Seitz ],[],[],[ , lnkmflkgnmh lk  
  • 116. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 0 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 0 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [.,.]h[.,.]f Image filtering 111 111 111 ],[g  Credit: S. Seitz ],[],[],[ , lnkmflkgnmh lk  
  • 117. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 0 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 20 30 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 0 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [.,.]h[.,.]f Image filtering 111 111 111 ],[g  Credit: S. Seitz ],[],[],[ , lnkmflkgnmh lk  
  • 118. 0 10 20 30 30 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 0 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [.,.]h[.,.]f Image filtering 111 111 111 ],[g  Credit: S. Seitz ],[],[],[ , lnkmflkgnmh lk  
  • 119. 0 10 20 30 30 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 0 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [.,.]h[.,.]f Image filtering 111 111 111 ],[g  Credit: S. Seitz ? ],[],[],[ , lnkmflkgnmh lk  
  • 120. 0 10 20 30 30 50 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 0 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [.,.]h[.,.]f Image filtering 111 111 111 ],[g  Credit: S. Seitz ? ],[],[],[ , lnkmflkgnmh lk  
  • 121. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 0 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 20 30 30 30 20 10 0 20 40 60 60 60 40 20 0 30 60 90 90 90 60 30 0 30 50 80 80 90 60 30 0 30 50 80 80 90 60 30 0 20 30 50 50 60 40 20 10 20 30 30 30 30 20 10 10 10 10 0 0 0 0 0 [.,.]h[.,.]f Image filtering 111 111 111 ],[g  Credit: S. Seitz ],[],[],[ , lnkmflkgnmh lk  
  • 122. What does it do? • Replaces each pixel with an average of its neighborhood • Achieve smoothing effect (remove sharp features) 111 111 111 Slide credit: David Lowe (UBC) ],[g  Box Filter
  • 124. Practice with linear filters 000 010 000 Original ? Source: D. Lowe
  • 125. Practice with linear filters 000 010 000 Original Filtered (no change) Source: D. Lowe
  • 126. Practice with linear filters 000 100 000 Original ? Source: D. Lowe
  • 127. Practice with linear filters 000 100 000 Original Shifted left By 1 pixel Source: D. Lowe
  • 128. Practice with linear filters Original 111 111 111 000 020 000 - ? (Note that filter sums to 1) Source: D. Lowe
  • 129. Practice with linear filters Original 111 111 111 000 020 000 - Sharpening filter - Accentuates differences with local average Source: D. Lowe
  • 134. Demo - image filtering
  • 135. Matching How to measure the similarity of two regions Alignment How to recover transformation parameters based on matched points
  • 136. Correspondence and alignment • Correspondence: matching points, patches, edges, or regions across images ≈
  • 137. Correspondence and alignment • Alignment: solving the transformation that makes two things match better
  • 138. A Example: fitting an 2D shape template Slide from Silvio Savarese
  • 139. Example: fitting a 3D object model Slide from Silvio Savarese
  • 140. Example: estimating “fundamental matrix” that corresponds two views Slide from Silvio Savarese
  • 141. Example: tracking points frame 0 frame 22 frame 49 x x x
  • 142. Overview of Keypoint Matching K. Grauman, B. Leibe Af Bf A1 A2 A3 Tffd BA ),( 1. Find a set of distinctive key-points 3. Extract and normalize the region content 2. Define a region around each keypoint 4. Compute a local descriptor from the normalized region 5. Match local descriptors
  • 143. Goals for Keypoints Detect points that are repeatable and distinctive
  • 144. Key trade-offs More Repeatable More Points A1 A2 A3 Detection More Distinctive More Flexible Description Robust to occlusion Works with less texture Minimize wrong matches Robust to expected variations Maximize correct matches Robust detection Precise localization
  • 145. Choosing interest points Where would you tell your friend to meet you?
  • 146. Choosing interest points Where would you tell your friend to meet you?
  • 147. Local Descriptors: SIFT Descriptor [Lowe, ICCV 1999] Histogram of oriented gradients • Captures important texture information • Robust to small translations / affine deformations K. Grauman, B. Leibe
  • 149. Demo – interest point detection
  • 150. Geometry How to relate world coordinates and image coordinates
  • 151. Single-view Geometry How tall is this woman? Which ball is closer? How high is the camera? What is the camera rotation? What is the focal length of the camera?
  • 152. Single-view Geometry Mapping between image and world coordinates • Pinhole camera model • Projective geometry • Vanishing points and lines
  • 153. Image formation Let’s design a camera – Idea 1: put a piece of film in front of an object – Do we get a reasonable image? Slide credit: Steve Seitz
  • 154. Pinhole camera Idea 2: add a barrier to block off most of the rays • This reduces blurring • The opening known as the aperture Slide credit: Steve Seitz
  • 155. Pinhole camera Figure from David Forsyth f f = focal length c = center of the camera c
  • 156. Figures © Stephen E. Palmer, 2002 Dimensionality Reduction Machine (3D to 2D) Point of observation 3D world 2D image
  • 157. Projection can be tricky… Slide source: Seitz
  • 158. Projection can be tricky… Slide source: Seitz Making of 3D sidewalk art: http://www.youtube.com/watch?v=3SNYtd0Ayt0
  • 159. Vanishing points and lines Parallel lines in the world intersect in the image at a “vanishing point”
  • 160. Vanishing points and lines Vanishing point Vanishing line Vanishing point Vertical vanishing point (at infinity) Slide from Efros, Photo from Criminisi
  • 163. Image Stitching • Combine two or more overlapping images to make one larger image Add example Slide credit: Vaibhav Vaish
  • 165. RANSAC for Homography Initial Matched Points
  • 166. Final Matched Points RANSAC for Homography
  • 168. Bundle Adjustment • New images initialised with rotation, focal length of best matching image
  • 169. Bundle Adjustment • New images initialised with rotation, focal length of best matching image
  • 170. Details to make it look good • Choosing seams • Blending
  • 171. Choosing seams • Better method: dynamic program to find seam along well-matched regions Illustration: http://en.wikipedia.org/wiki/File:Rochester_NY.jpg
  • 172. Gain compensation • Simple gain adjustment • Compute average RGB intensity of each image in overlapping region • Normalize intensities by ratio of averages
  • 173. Multi-band Blending • Burt & Adelson 1983 • Blend frequency bands over range  l
  • 175.
  • 176. Depth from Stereo • Goal: recover depth by finding image coordinate x’ that corresponds to x f x x’ Baseline B z C C’ X f X x x'
  • 177. Correspondence Problem • We have two images taken from cameras with different intrinsic and extrinsic parameters • How do we match a point in the first image to a point in the second? How can we constrain our search? x ?
  • 178. Potential matches for x have to lie on the corresponding line l’. Potential matches for x’ have to lie on the corresponding line l. Key idea: Epipolar constraint x x’ X x’ X x’ X
  • 179. Basic stereo matching algorithm •If necessary, rectify the two stereo images to transform epipolar lines into scanlines •For each pixel x in the first image • Find corresponding epipolar scanline in the right image • Search the scanline and pick the best match x’ • Compute disparity x-x’ and set depth(x) = fB/(x-x’)
  • 180. 800 most confident matches among 10,000+ local features.
  • 182. Keep only the matches at are “inliers” with respect to the “best” fundamental matrix
  • 186. What do you see in this image? Can I put stuff in it? Forest Trees Bear Man Rabbit Grass Camera
  • 187. Describe, predict, or interact with the object based on visual cues Is it alive? Is it dangerous? How fast does it run? Is it soft? Does it have a tail? Can I poke with it?
  • 188. Why do we care about categories? • From an object’s category, we can make predictions about its behavior in the future, beyond of what is immediately perceived. • Pointers to knowledge • Help to understand individual cases not previously encountered • Communication
  • 189. Image categorization: Fine-grained recognition Visipedia Project
  • 190. Place recognition Places Database [Zhou et al. NIPS 2014]
  • 191. Image style recognition [Karayev et al. BMVC 2014]
  • 194. Q: What are good features for… • recognizing a beach?
  • 195. Q: What are good features for… • recognizing cloth fabric?
  • 196. Q: What are good features for… • recognizing a mug?
  • 197. What are the right features? Depend on what you want to know! • Object: shape • Local shape info, shading, shadows, texture • Scene : geometric layout • linear perspective, gradients, line segments • Material properties: albedo, feel, hardness • Color, texture • Action: motion • Optical flow, tracked points
  • 198. • Image features: map images to feature space • Classifiers: map feature space to label space x x x x x x x x o o o o o x2 x1 x xx x o ooo o o o x x x x x x x x o o o o o x2 x1 x xx x o ooo o o o x x x x x x x x o o o o o x2 x1 x xx x o ooo o o o
  • 199. “Shallow” vs. “deep” architectures Hand-designed feature extraction Trainable classifier Image/ Video Pixels Object Class Layer 1 Layer N Simple classifier Object Class Image/ Video Pixels Traditional recognition: “Shallow” architecture Deep learning: “Deep” architecture …
  • 200. Progress from Supervised Learning 2012 AlexNet 2013 ZF 2014 VGG 2014 GoogLeNet 2015 ResNet 2016 GoogLeNet-v4 15 10 5 ImageNet Image Classification Top5 Error
  • 201. Object Detection Search by Sliding Window Detector • May work well for rigid objects • Key idea: simple alignment for simple deformations Object or Background?
  • 202. Object Detection Search by Parts-based model • Key idea: more flexible alignment for articulated objects • Defined by models of part appearance, geometry or spatial layout, and search algorithm
  • 203. Vision as part of an intelligent system 3D Scene Feature Extraction Interpretation Action Texture Color Optical Flow Stereo Disparity Grouping Surfaces Bits of objects Sense of depth Objects Agents and goals Shapes and properties Open paths Words Walk, touch, contemplate, smile, evade, read on, pick up, … Motion patterns
  • 204. Well-Established (patch matching) Major Progress (pattern matching++) New Opportunities (interpretation/tasks) Multi-view Geometry Face Detection/Recognition Object Tracking / Flow Category Detection Human Pose 3D Scene Layout Vision for Robots Life-long Learning Entailment/Prediction Slide credit: Derek Hoiem
  • 205. Scene Understanding = Objects + People + Layout + Interpretation within Task Context
  • 206. What do I see?  Why is this happening? What is important? What will I see? How can we learn about the world through vision? How do we create/evaluate vision systems that adapt to useful tasks?
  • 207. Important open problems • How can we interpret vision given structured plans of a scene?
  • 208. Important open problems • Algorithms: works pretty well  perfect • E.g., stereo: top of wish list from Pixar guy Micheal Kass • Good directions: • Incorporate higher level knowledge
  • 209. Important open problems • Spatial understanding: what is it doing? Or how do I do it? Important questions: • What are good representations of space for navigation and interaction? What kind of details are important? • How can we combine single-image cues with multi-view cues?
  • 210. Important open problems Object representation: what is it? Important questions: • How can we pose recognition so that it lets us deal with new objects? • What do we want to predict or infer, and to what extent does that rely on categorization? • How do we transfer knowledge of one type of object to another?
  • 211. Important problems • Learning features and intermediate representations that transfer to new tasks Figure from http://www.datarobot.com/blog/a-primer-on-deep-learning/
  • 212. Important open problems • Almost everything is still unsolved! • Robust 3D shape from multiple images • Recognize objects (only faces and maybe cars is really solved, thanks to tons of data) • Caption images/video • Predict intention • Object segmentation • Count objects in an image • Estimate pose • Recognize actions • ….
  • 213. This course has provided fundamentals • What is computer vision? • Examples of computer vision applications • Basic principles of computer vision: • Image formation • Filtering • Correspondence and alignment • Geometry • Recognition.
  • 214. How do you learn more? Explore!
  • 215. Resources – where to learn more • Awesome Computer Vision, Awesome Deep Vision • A curated list of awesome computer vision resources • Computer Vision Taiwanese Group • > 3,000 members in facebook • Videolectures and ML&CV Talks • Lectures, Keynotes, Panel Discussions • OpenCV – C++/Python/Java