SlideShare a Scribd company logo
1 of 66
Image Features and Categorization
Computer Vision
CS 543 / ECE 549
University of Illinois
Derek Hoiem
Jia-Bin Huang
04/07/15
Where are we now?
• Object instance recognition
• Face recognition
• Today: Image features and
categorization
• Object detection
• Object tracking
Today: Image features and categorization
• General concepts of categorization
– Why? What? How?
• Image features
– Color, texture, gradient, shape, interest points
– Histograms, feature encoding, and pooling
– CNN as feature
• Image and region categorization
What do you see in this image?
Can I put stuff in it?
Forest
Trees
Bear
Man
Rabbit Grass
Camera
Describe, predict, or interact with the
object based on visual cues
Is it alive?
Is it dangerous?
How fast does it run? Is it soft?
Does it have a tail? Can I poke with it?
Why do we care about categories?
• From an object’s category, we can make predictions about its
behavior in the future, beyond of what is immediately
perceived.
• Pointers to knowledge
– Help to understand individual cases not previously encountered
• Communication
Theory of categorization
How do we determine if something is a member
of a particular category?
•Definitional approach
•Prototype approach
•Exemplar approach
Definitional approach:
classical view of categories
• Plato & Aristotle
– Categories are defined by a list of
properties shared by all elements
in a category
– Category membership is binary
– Every member in the category is
equal
The Categories (Aristotle)
Slide Credit: A. A. Efros
Aristotle by Francesco Hayez
Prototype or sum of exemplars ?
Prototype Model Exemplars Model
Category judgments are made
by comparing a new exemplar
to the prototype.
Category judgments are made
by comparing a new exemplar
to all the old exemplars of a category
or to the exemplar that is the most
appropriate
Slide Credit: Torralba
Levels of categorization [Rosch 70s]
Definition of Basic Level:
• Similar shape: Basic level categories are the highest-level
category for which their members have similar shapes.
• Similar motor interactions: … for which people interact with its
members using similar motor sequences.
• Common attributes: … there are a significant number
of attributes in common between pairs of members.
Sub Basic Superordinate
similarity
Basic level
Subordinate
level
Superordinate
levels
“Fido”
dog
animal
quadruped
German
shepherd
Doberman
cat cow
…
…
……
… …
Rosch et a. Principle of categorization, 1978
Image categorization
• Cat vs Dog
Image categorization
• Object recognition
Caltech 101 Average Object Images
Image categorization
• Fine-grained recognition
Visipedia Project
Image categorization
• Place recognition
Places Database [Zhou et al. NIPS 2014]
Image categorization
• Visual font recognition
[Chen et al. CVPR 2014]
Image categorization
• Dating historical photos
[Palermo et al. ECCV 2012]
1940 1953 1966 1977
Image categorization
• Image style recognition
[Karayev et al. BMVC 2014]
Region categorization
• Layout prediction
Assign regions to orientation
Geometric context [Hoiem et al. IJCV 2007]
Assign regions to depth
Make3D [Saxena et al. PAMI 2008]
Region categorization
• Semantic segmentation from RGBD images
[Silberman et al. ECCV 2012]
Region categorization
• Material recognition
[Bell et al. CVPR 2015]
Supervised learning
LAB Histogram
Textons
Bag of SIFT
HOG
xx
x x
x
x
x
x
x
o
o
o
o
o
= Category
label
Examples Image Features Classifier+ +
Training phase
Training
Labels
Training
Images
Classifier
Training
Training
Image
Features
Trained
Classifier
Testing phase
Training
Labels
Training
Images
Classifier
Training
Training
Image
Features
Image
Features
Testing
Test Image
Trained
Classifier
Outdoor
PredictionTrained
Classifier
Testing phase
Training
Labels
Training
Images
Classifier
Training
Training
Image
Features
Image
Features
Testing
Test Image
Trained
Classifier
Outdoor
PredictionTrained
Classifier
Q: What are good features for…
• recognizing a beach?
Q: What are good features for…
• recognizing cloth fabric?
Q: What are good features for…
• recognizing a mug?
What are the right features?
Depend on what you want to know!
•Object: shape
– Local shape info, shading, shadows, texture
•Scene : geometric layout
– linear perspective, gradients, line segments
•Material properties: albedo, feel, hardness
– Color, texture
•Action: motion
– Optical flow, tracked points
General principles of representation
• Coverage
– Ensure that all relevant info is
captured
• Concision
– Minimize number of features
without sacrificing coverage
• Directness
– Ideal features are independently
useful for prediction
Image representations
• Templates
– Intensity, gradients, etc.
• Histograms
– Color, texture, SIFT descriptors,
etc.
• Average of features
Image
Intensity
Gradient
template
Space Shuttle
Cargo Bay
Image representations: histograms
Global histogram
- Represent distribution of features
– Color, texture, depth, …
Images from Dave Kauchak
Image representations: histograms
• Data samples in 2D
Feature 1
Feature2
Image representations: histograms
• Probability or count of data in each bin
• Marginal histogram on feature 1
Feature 1
Feature2
bin
Image representations: histograms
• Marginal histogram on feature 2
Feature 1
Feature2
bin
Image representations: histograms
• Joint histogram
Feature 1
Feature2
bin
Modeling multi-dimensional data
Joint histogram
•Requires lots of data
•Loss of resolution to
avoid empty bins
Feature 1
Feature2
Feature 1
Feature2
Marginal histogram
• Requires independent features
• More data/bin than
joint histogram
Feature 1
Feature2
Modeling multi-dimensional data
• Clustering
• Use the same cluster centers for all images
Feature 1
Feature2
bin
Computing histogram distance
• Histogram intersection
• Chi-squared Histogram matching distance
• Earth mover’s distance
(Cross-bin similarity measure)
– minimal cost paid to transform one distribution into the
other
( )∑=
−=
K
m
jiji mhmhhh
1
)(),(min1),histint(
∑= +
−
=
K
m ji
ji
ji
mhmh
mhmh
hh
1
2
2
)()(
)]()([
2
1
),(χ
[Rubner et al. The Earth Mover's Distance as a Metric for Image Retrieval, IJCV 2000]
Histograms: implementation issues
Few Bins
Need less data
Coarser representation
Many Bins
Need more data
Finer representation
• Quantization
– Grids: fast but applicable only with few dimensions
– Clustering: slower but can quantize data in higher
dimensions
• Matching
– Histogram intersection or Euclidean may be faster
– Chi-squared often works better
– Earth mover’s distance is good for when nearby bins
represent similar values
• Color
• Texture (filter banks or HOG over regions)
L*a*b* color space HSV color space
What kind of things do we compute
histograms of?
What kind of things do we compute
histograms of?
• Histograms of descriptors
• “Bag of visual words”
SIFT – [Lowe IJCV 2004]
Analogy to documentsAnalogy to documents
Of all the sensory impressions proceeding to
the brain, the visual experiences are the
dominant ones. Our perception of the world
around us is based essentially on the
messages that reach the brain from our eyes.
For a long time it was thought that the retinal
image was transmitted point by point to visual
centers in the brain; the cerebral cortex was
a movie screen, so to speak, upon which the
image in the eye was projected. Through the
discoveries of Hubel and Wiesel we now
know that behind the origin of the visual
perception in the brain there is a considerably
more complicated course of events. By
following the visual impulses along their path
to the various cell layers of the optical cortex,
Hubel and Wiesel have been able to
demonstrate that the message about the
image falling on the retina undergoes a step-
wise analysis in a system of nerve cells
stored in columns. In this system each cell
has its specific function and is responsible for
a specific detail in the pattern of the retinal
image.
sensory, brain,
visual, perception,
retinal, cerebral cortex,
eye, cell, optical
nerve, image
Hubel, Wiesel
China is forecasting a trade surplus of $90bn
(£51bn) to $100bn this year, a threefold
increase on 2004's $32bn. The Commerce
Ministry said the surplus would be created by
a predicted 30% jump in exports to $750bn,
compared with a 18% rise in imports to
$660bn. The figures are likely to further
annoy the US, which has long argued that
China's exports are unfairly helped by a
deliberately undervalued yuan. Beijing
agrees the surplus is too high, but says the
yuan is only one factor. Bank of China
governor Zhou Xiaochuan said the country
also needed to do more to boost domestic
demand so more goods stayed within the
country. China increased the value of the
yuan against the dollar by 2.1% in July and
permitted it to trade within a narrow band, but
the US wants the yuan to be allowed to trade
freely. However, Beijing has made it clear
that it will take its time and tread carefully
before allowing the yuan to rise further in
value.
China, trade,
surplus, commerce,
exports, imports, US,
yuan, bank, domestic,
foreign, increase,
trade, value
ICCV 2005 short course, L. Fei-Fei
Bag of visual words
• Image
patches
• BoW
histogram
• Codewords
Image categorization with bag of words
Training
1. Extract keypoints and descriptors for all training images
2. Cluster descriptors
3. Quantize descriptors using cluster centers to get “visual words”
4. Represent each image by normalized counts of “visual words”
5. Train classifier on labeled examples using histogram values as features
Testing
1. Extract keypoints/descriptors and quantize into visual words
2. Compute visual word histogram
3. Compute label or confidence using classifier
HW5 - Prob2 scene categorization
• Training and testing images for 8 categories
• Implement representation: color histogram
• BoW model: preprocessed descriptors
– Learning dictionary using K-Means
– Learning classifier (NN, SVM)
• Report results
Take a break…
Image source:
Bag of visual words image classification
[Chatfieldet al. BMVC 2011]
Feature encoding
• Hard/soft assignment to clusters
Fisher encoding
Kernel codebook encoding
Locality constrained encoding
[Chatfieldet al. BMVC 2011]
Histogram encoding
Fisher vector encoding
• Fit Gaussian Mixture Models
• Posterior probability
• First and second order differences to cluster k
[Perronnin et al. ECCV 2010]
Performance comparisons
• Fisher vector encoding outperforms others
• Higher-order statistics helps
[Chatfieldet al. BMVC 2011]
But what about spatial layout?
All of these images have the same color histogram
Spatial pyramid
Compute histogram in each spatial bin
Spatial pyramid
High number of features – PCA to reduce dimensionality
[Lazebnik et al. CVPR 2006]
Pooling
• Average/max pooling
• Second-order pooling
[Joao et al. PAMI 2014]
Source: Unsupervised Feature
Learning and Deep Learning
=avg/max
=avg/max
2012 ImageNet 1K
(Fall 2012)
2012 ImageNet 1K
(Fall 2012)
Shallow vs. deep learning
• Engineered vs. learned
features
ImageImage
Feature extractionFeature extraction
PoolingPooling
ClassifierClassifier
Label
ImageImage
ConvolutionConvolution
ConvolutionConvolution
ConvolutionConvolution
ConvolutionConvolution
ConvolutionConvolution
DenseDense
DenseDense
DenseDense
Label
Imagenet Classification with Deep Convolutional Neural
Networks, Krizhevsky, Sutskever, and Hinton, NIPS 2012
Gradient-Based Learning Applied to Document
Recognition, LeCun, Bottou, Bengio and Haffner, Proc. of
the IEEE, 1998
Slide Credit: L. Zitnick
Imagenet Classification with Deep Convolutional Neural
Networks, Krizhevsky, Sutskever, and Hinton, NIPS 2012
Gradient-Based Learning Applied to Document
Recognition, LeCun, Bottou, Bengio and Haffner, Proc. of
the IEEE, 1998
GPUs
+
Data*
* Rectified activations and dropout
Slide Credit: L. Zitnick
Convolutional activation features
[Donahue et al. ICML 2013]
CNN Features off-the-shelf:
an Astounding Baseline for Recognition
[Razavian et al. 2014]
Region representation
• Segment the image into superpixels
• Use features to represent each image
segment
Joseph Tighe and Svetlana Lazebnik
Region representation
• Color, texture, BoW
– Only computed within the local region
• Shape of regions
• Position in the image
Working with regions
• Spatial support is important –
multiple segmentation
• Spatial consistency – MRF smoothing
Geometric context [Hoiem et al. ICCV 2005]
Beyond categorization
• Exemplar models [Malisiewicz and Efros NIPS09, ICCV11]
– Ask not “what is this?”, ask “what is this like” –
Moshe Bar
• A train?
Things to remember
• Visual categorization help transfer knowledge
• Image features
– Coverage, concision, directness
– Color, gradients, textures, motion, descriptors
– Histogram, feature encoding, and pooling
– CNN as features
• Image/region categorization
Next lecture - Classifiers
Training
Labels
Training
Images
Classifier
Training
Training
Image
Features
Image
Features
Testing
Test Image
Trained
Classifier
Outdoor
PredictionTrained
Classifier

More Related Content

What's hot

Synthesizing pseudo 2.5 d content from monocular videos for mixed reality
Synthesizing pseudo 2.5 d content from monocular videos for mixed realitySynthesizing pseudo 2.5 d content from monocular videos for mixed reality
Synthesizing pseudo 2.5 d content from monocular videos for mixed reality
NAVER Engineering
 
“Person Re-Identification and Tracking at the Edge: Challenges and Techniques...
“Person Re-Identification and Tracking at the Edge: Challenges and Techniques...“Person Re-Identification and Tracking at the Edge: Challenges and Techniques...
“Person Re-Identification and Tracking at the Edge: Challenges and Techniques...
Edge AI and Vision Alliance
 

What's hot (20)

Scene classification using Convolutional Neural Networks - Jayani Withanawasam
Scene classification using Convolutional Neural Networks - Jayani WithanawasamScene classification using Convolutional Neural Networks - Jayani Withanawasam
Scene classification using Convolutional Neural Networks - Jayani Withanawasam
 
Dip chapter 2
Dip chapter 2Dip chapter 2
Dip chapter 2
 
Fame cvpr
Fame cvprFame cvpr
Fame cvpr
 
CV_2 Image Processing
CV_2 Image ProcessingCV_2 Image Processing
CV_2 Image Processing
 
Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...
Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...
Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...
 
Action Genome: Action As Composition of Spatio Temporal Scene Graphs
Action Genome: Action As Composition of Spatio Temporal Scene GraphsAction Genome: Action As Composition of Spatio Temporal Scene Graphs
Action Genome: Action As Composition of Spatio Temporal Scene Graphs
 
AI&BigData Lab. Артем Чернодуб "Распознавание изображений методом Lazy Deep ...
AI&BigData Lab. Артем Чернодуб  "Распознавание изображений методом Lazy Deep ...AI&BigData Lab. Артем Чернодуб  "Распознавание изображений методом Lazy Deep ...
AI&BigData Lab. Артем Чернодуб "Распознавание изображений методом Lazy Deep ...
 
Visual geometry with deep learning
Visual geometry with deep learningVisual geometry with deep learning
Visual geometry with deep learning
 
Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015
Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015
Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015
 
Digital image processing 2
Digital image processing   2Digital image processing   2
Digital image processing 2
 
Synthesizing pseudo 2.5 d content from monocular videos for mixed reality
Synthesizing pseudo 2.5 d content from monocular videos for mixed realitySynthesizing pseudo 2.5 d content from monocular videos for mixed reality
Synthesizing pseudo 2.5 d content from monocular videos for mixed reality
 
Deep Learning in Robotics
Deep Learning in RoboticsDeep Learning in Robotics
Deep Learning in Robotics
 
“Person Re-Identification and Tracking at the Edge: Challenges and Techniques...
“Person Re-Identification and Tracking at the Edge: Challenges and Techniques...“Person Re-Identification and Tracking at the Edge: Challenges and Techniques...
“Person Re-Identification and Tracking at the Edge: Challenges and Techniques...
 
HiPEAC 2019 Workshop - Real-Time Modelling Visual Scenes with Biological Insp...
HiPEAC 2019 Workshop - Real-Time Modelling Visual Scenes with Biological Insp...HiPEAC 2019 Workshop - Real-Time Modelling Visual Scenes with Biological Insp...
HiPEAC 2019 Workshop - Real-Time Modelling Visual Scenes with Biological Insp...
 
Iciap 2
Iciap 2Iciap 2
Iciap 2
 
Image Search: Then and Now
Image Search: Then and NowImage Search: Then and Now
Image Search: Then and Now
 
Visual Search and Question Answering II
Visual Search and Question Answering IIVisual Search and Question Answering II
Visual Search and Question Answering II
 
Image recognition
Image recognitionImage recognition
Image recognition
 
Literature Review on Single Image Super Resolution
Literature Review on Single Image Super ResolutionLiterature Review on Single Image Super Resolution
Literature Review on Single Image Super Resolution
 
Depth estimation do we need to throw old things away
Depth estimation do we need to throw old things awayDepth estimation do we need to throw old things away
Depth estimation do we need to throw old things away
 

Viewers also liked

Estimating Human Pose from Occluded Images (ACCV 2009)
Estimating Human Pose from Occluded Images (ACCV 2009)Estimating Human Pose from Occluded Images (ACCV 2009)
Estimating Human Pose from Occluded Images (ACCV 2009)
Jia-Bin Huang
 
Three Reasons to Join FVE at uiuc
Three Reasons to Join FVE at uiucThree Reasons to Join FVE at uiuc
Three Reasons to Join FVE at uiuc
Jia-Bin Huang
 
A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)
A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)
A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)
Jia-Bin Huang
 
UIUC CS 498 - Computational Photography - Final project presentation
UIUC CS 498 - Computational Photography - Final project presentation UIUC CS 498 - Computational Photography - Final project presentation
UIUC CS 498 - Computational Photography - Final project presentation
Jia-Bin Huang
 
美國研究所申請流程 (A Guide for Applying Graduate Schools in USA)
美國研究所申請流程 (A Guide for Applying Graduate Schools in USA)美國研究所申請流程 (A Guide for Applying Graduate Schools in USA)
美國研究所申請流程 (A Guide for Applying Graduate Schools in USA)
Jia-Bin Huang
 
An Exemplar Model For Learning Object Classes
An Exemplar Model For Learning Object ClassesAn Exemplar Model For Learning Object Classes
An Exemplar Model For Learning Object Classes
Shao-Chuan Wang
 

Viewers also liked (20)

Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015
Lecture 29 Convolutional Neural Networks -  Computer Vision Spring2015Lecture 29 Convolutional Neural Networks -  Computer Vision Spring2015
Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015
 
A Computational Framework for Concept Representation in Cognitive Systems and...
A Computational Framework for Concept Representation in Cognitive Systems and...A Computational Framework for Concept Representation in Cognitive Systems and...
A Computational Framework for Concept Representation in Cognitive Systems and...
 
Transformation Guided Image Completion ICCP 2013
Transformation Guided Image Completion ICCP 2013Transformation Guided Image Completion ICCP 2013
Transformation Guided Image Completion ICCP 2013
 
Enhancing Color Representation for the Color Vision Impaired (CVAVI 2008)
Enhancing Color Representation for the Color Vision Impaired (CVAVI 2008)Enhancing Color Representation for the Color Vision Impaired (CVAVI 2008)
Enhancing Color Representation for the Color Vision Impaired (CVAVI 2008)
 
How to Read Academic Papers
How to Read Academic PapersHow to Read Academic Papers
How to Read Academic Papers
 
Estimating Human Pose from Occluded Images (ACCV 2009)
Estimating Human Pose from Occluded Images (ACCV 2009)Estimating Human Pose from Occluded Images (ACCV 2009)
Estimating Human Pose from Occluded Images (ACCV 2009)
 
Research 101 - Paper Writing with LaTeX
Research 101 - Paper Writing with LaTeXResearch 101 - Paper Writing with LaTeX
Research 101 - Paper Writing with LaTeX
 
Linear Algebra and Matlab tutorial
Linear Algebra and Matlab tutorialLinear Algebra and Matlab tutorial
Linear Algebra and Matlab tutorial
 
Writing Fast MATLAB Code
Writing Fast MATLAB CodeWriting Fast MATLAB Code
Writing Fast MATLAB Code
 
Three Reasons to Join FVE at uiuc
Three Reasons to Join FVE at uiucThree Reasons to Join FVE at uiuc
Three Reasons to Join FVE at uiuc
 
Applying for Graduate School in S.T.E.M.
Applying for Graduate School in S.T.E.M.Applying for Graduate School in S.T.E.M.
Applying for Graduate School in S.T.E.M.
 
A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)
A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)
A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)
 
Jia-Bin Huang's Curriculum Vitae
Jia-Bin Huang's Curriculum VitaeJia-Bin Huang's Curriculum Vitae
Jia-Bin Huang's Curriculum Vitae
 
UIUC CS 498 - Computational Photography - Final project presentation
UIUC CS 498 - Computational Photography - Final project presentation UIUC CS 498 - Computational Photography - Final project presentation
UIUC CS 498 - Computational Photography - Final project presentation
 
美國研究所申請流程 (A Guide for Applying Graduate Schools in USA)
美國研究所申請流程 (A Guide for Applying Graduate Schools in USA)美國研究所申請流程 (A Guide for Applying Graduate Schools in USA)
美國研究所申請流程 (A Guide for Applying Graduate Schools in USA)
 
Single Image Super-Resolution from Transformed Self-Exemplars (CVPR 2015)
Single Image Super-Resolution from Transformed Self-Exemplars (CVPR 2015)Single Image Super-Resolution from Transformed Self-Exemplars (CVPR 2015)
Single Image Super-Resolution from Transformed Self-Exemplars (CVPR 2015)
 
What Makes a Creative Photograph?
What Makes a Creative Photograph?What Makes a Creative Photograph?
What Makes a Creative Photograph?
 
Computer Vision Crash Course
Computer Vision Crash CourseComputer Vision Crash Course
Computer Vision Crash Course
 
How to come up with new research ideas
How to come up with new research ideasHow to come up with new research ideas
How to come up with new research ideas
 
An Exemplar Model For Learning Object Classes
An Exemplar Model For Learning Object ClassesAn Exemplar Model For Learning Object Classes
An Exemplar Model For Learning Object Classes
 

Similar to Lecture 21 - Image Categorization - Computer Vision Spring2015

Lec18 bag of_features
Lec18 bag of_featuresLec18 bag of_features
Lec18 bag of_features
Bo Li
 
Wits presentation 2_19052015
Wits presentation 2_19052015Wits presentation 2_19052015
Wits presentation 2_19052015
Beatrice van Eden
 
Fcv scene efros
Fcv scene efrosFcv scene efros
Fcv scene efros
zukun
 
Cvpr2007 tutorial bag_of_words
Cvpr2007 tutorial bag_of_wordsCvpr2007 tutorial bag_of_words
Cvpr2007 tutorial bag_of_words
Bo Li
 

Similar to Lecture 21 - Image Categorization - Computer Vision Spring2015 (20)

12 cie552 object_recognition
12 cie552 object_recognition12 cie552 object_recognition
12 cie552 object_recognition
 
Promising avenues for interdisciplinary research in vision
Promising avenues for interdisciplinary research in visionPromising avenues for interdisciplinary research in vision
Promising avenues for interdisciplinary research in vision
 
Lec18 bag of_features
Lec18 bag of_featuresLec18 bag of_features
Lec18 bag of_features
 
Wits presentation 2_19052015
Wits presentation 2_19052015Wits presentation 2_19052015
Wits presentation 2_19052015
 
Bagwords
BagwordsBagwords
Bagwords
 
Fcv scene efros
Fcv scene efrosFcv scene efros
Fcv scene efros
 
Deep Learning for NLP: An Introduction to Neural Word Embeddings
Deep Learning for NLP: An Introduction to Neural Word EmbeddingsDeep Learning for NLP: An Introduction to Neural Word Embeddings
Deep Learning for NLP: An Introduction to Neural Word Embeddings
 
Ch 5 Object Recognition.pptx
Ch 5 Object Recognition.pptxCh 5 Object Recognition.pptx
Ch 5 Object Recognition.pptx
 
Mayank Raj - 4th Year Project on CBIR (Content Based Image Retrieval)
Mayank Raj - 4th Year Project on CBIR (Content Based Image Retrieval)Mayank Raj - 4th Year Project on CBIR (Content Based Image Retrieval)
Mayank Raj - 4th Year Project on CBIR (Content Based Image Retrieval)
 
Cvpr2007 tutorial bag_of_words
Cvpr2007 tutorial bag_of_wordsCvpr2007 tutorial bag_of_words
Cvpr2007 tutorial bag_of_words
 
Intro to data visualization
Intro to data visualizationIntro to data visualization
Intro to data visualization
 
Digital immortality Roadmap
Digital immortality RoadmapDigital immortality Roadmap
Digital immortality Roadmap
 
bag-of-words models
bag-of-words models bag-of-words models
bag-of-words models
 
Idm unit i ppt (deleted 38112ace3a82cbb8fba22044606fd8dc)
Idm  unit i ppt (deleted 38112ace3a82cbb8fba22044606fd8dc)Idm  unit i ppt (deleted 38112ace3a82cbb8fba22044606fd8dc)
Idm unit i ppt (deleted 38112ace3a82cbb8fba22044606fd8dc)
 
How ‘How’ Reflects What’s What: Content-based Exploitation of How Users Frame...
How ‘How’ Reflects What’s What: Content-based Exploitation of How Users Frame...How ‘How’ Reflects What’s What: Content-based Exploitation of How Users Frame...
How ‘How’ Reflects What’s What: Content-based Exploitation of How Users Frame...
 
Introduction talk to Computer Vision
Introduction talk to Computer Vision Introduction talk to Computer Vision
Introduction talk to Computer Vision
 
Untangling Concepts, Objects, and Information
Untangling Concepts, Objects, and InformationUntangling Concepts, Objects, and Information
Untangling Concepts, Objects, and Information
 
CV
CVCV
CV
 
strong slot and filler
strong slot and fillerstrong slot and filler
strong slot and filler
 
Information to Wisdom: Commonsense Knowledge Extraction and Compilation - Part 3
Information to Wisdom: Commonsense Knowledge Extraction and Compilation - Part 3Information to Wisdom: Commonsense Knowledge Extraction and Compilation - Part 3
Information to Wisdom: Commonsense Knowledge Extraction and Compilation - Part 3
 

More from Jia-Bin Huang

Pose aware online visual tracking
Pose aware online visual trackingPose aware online visual tracking
Pose aware online visual tracking
Jia-Bin Huang
 
Face Expression Enhancement
Face Expression EnhancementFace Expression Enhancement
Face Expression Enhancement
Jia-Bin Huang
 
Image Smoothing for Structure Extraction
Image Smoothing for Structure ExtractionImage Smoothing for Structure Extraction
Image Smoothing for Structure Extraction
Jia-Bin Huang
 
Real-Time Face Detection, Tracking, and Attributes Recognition
Real-Time Face Detection, Tracking, and Attributes RecognitionReal-Time Face Detection, Tracking, and Attributes Recognition
Real-Time Face Detection, Tracking, and Attributes Recognition
Jia-Bin Huang
 
Estimating Human Pose from Occluded Images (ACCV 2009)
Estimating Human Pose from Occluded Images (ACCV 2009)Estimating Human Pose from Occluded Images (ACCV 2009)
Estimating Human Pose from Occluded Images (ACCV 2009)
Jia-Bin Huang
 
Information Preserving Color Transformation for Protanopia and Deuteranopia (...
Information Preserving Color Transformation for Protanopia and Deuteranopia (...Information Preserving Color Transformation for Protanopia and Deuteranopia (...
Information Preserving Color Transformation for Protanopia and Deuteranopia (...
Jia-Bin Huang
 
Enhancing Color Representation for the Color Vision Impaired (CVAVI 2008)
Enhancing Color Representation for the Color Vision Impaired (CVAVI 2008)Enhancing Color Representation for the Color Vision Impaired (CVAVI 2008)
Enhancing Color Representation for the Color Vision Impaired (CVAVI 2008)
Jia-Bin Huang
 
Learning Moving Cast Shadows for Foreground Detection (VS 2008)
Learning Moving Cast Shadows for Foreground Detection (VS 2008)Learning Moving Cast Shadows for Foreground Detection (VS 2008)
Learning Moving Cast Shadows for Foreground Detection (VS 2008)
Jia-Bin Huang
 
Learning Moving Cast Shadows for Foreground Detection (VS 2008)
Learning Moving Cast Shadows for Foreground Detection (VS 2008)Learning Moving Cast Shadows for Foreground Detection (VS 2008)
Learning Moving Cast Shadows for Foreground Detection (VS 2008)
Jia-Bin Huang
 

More from Jia-Bin Huang (12)

How to write a clear paper
How to write a clear paperHow to write a clear paper
How to write a clear paper
 
Real-time Face Detection and Recognition
Real-time Face Detection and RecognitionReal-time Face Detection and Recognition
Real-time Face Detection and Recognition
 
Pose aware online visual tracking
Pose aware online visual trackingPose aware online visual tracking
Pose aware online visual tracking
 
Face Expression Enhancement
Face Expression EnhancementFace Expression Enhancement
Face Expression Enhancement
 
Image Smoothing for Structure Extraction
Image Smoothing for Structure ExtractionImage Smoothing for Structure Extraction
Image Smoothing for Structure Extraction
 
Static and Dynamic Hand Gesture Recognition
Static and Dynamic Hand Gesture RecognitionStatic and Dynamic Hand Gesture Recognition
Static and Dynamic Hand Gesture Recognition
 
Real-Time Face Detection, Tracking, and Attributes Recognition
Real-Time Face Detection, Tracking, and Attributes RecognitionReal-Time Face Detection, Tracking, and Attributes Recognition
Real-Time Face Detection, Tracking, and Attributes Recognition
 
Estimating Human Pose from Occluded Images (ACCV 2009)
Estimating Human Pose from Occluded Images (ACCV 2009)Estimating Human Pose from Occluded Images (ACCV 2009)
Estimating Human Pose from Occluded Images (ACCV 2009)
 
Information Preserving Color Transformation for Protanopia and Deuteranopia (...
Information Preserving Color Transformation for Protanopia and Deuteranopia (...Information Preserving Color Transformation for Protanopia and Deuteranopia (...
Information Preserving Color Transformation for Protanopia and Deuteranopia (...
 
Enhancing Color Representation for the Color Vision Impaired (CVAVI 2008)
Enhancing Color Representation for the Color Vision Impaired (CVAVI 2008)Enhancing Color Representation for the Color Vision Impaired (CVAVI 2008)
Enhancing Color Representation for the Color Vision Impaired (CVAVI 2008)
 
Learning Moving Cast Shadows for Foreground Detection (VS 2008)
Learning Moving Cast Shadows for Foreground Detection (VS 2008)Learning Moving Cast Shadows for Foreground Detection (VS 2008)
Learning Moving Cast Shadows for Foreground Detection (VS 2008)
 
Learning Moving Cast Shadows for Foreground Detection (VS 2008)
Learning Moving Cast Shadows for Foreground Detection (VS 2008)Learning Moving Cast Shadows for Foreground Detection (VS 2008)
Learning Moving Cast Shadows for Foreground Detection (VS 2008)
 

Lecture 21 - Image Categorization - Computer Vision Spring2015

  • 1. Image Features and Categorization Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem Jia-Bin Huang 04/07/15
  • 2. Where are we now? • Object instance recognition • Face recognition • Today: Image features and categorization • Object detection • Object tracking
  • 3. Today: Image features and categorization • General concepts of categorization – Why? What? How? • Image features – Color, texture, gradient, shape, interest points – Histograms, feature encoding, and pooling – CNN as feature • Image and region categorization
  • 4. What do you see in this image? Can I put stuff in it? Forest Trees Bear Man Rabbit Grass Camera
  • 5. Describe, predict, or interact with the object based on visual cues Is it alive? Is it dangerous? How fast does it run? Is it soft? Does it have a tail? Can I poke with it?
  • 6. Why do we care about categories? • From an object’s category, we can make predictions about its behavior in the future, beyond of what is immediately perceived. • Pointers to knowledge – Help to understand individual cases not previously encountered • Communication
  • 7. Theory of categorization How do we determine if something is a member of a particular category? •Definitional approach •Prototype approach •Exemplar approach
  • 8. Definitional approach: classical view of categories • Plato & Aristotle – Categories are defined by a list of properties shared by all elements in a category – Category membership is binary – Every member in the category is equal The Categories (Aristotle) Slide Credit: A. A. Efros Aristotle by Francesco Hayez
  • 9. Prototype or sum of exemplars ? Prototype Model Exemplars Model Category judgments are made by comparing a new exemplar to the prototype. Category judgments are made by comparing a new exemplar to all the old exemplars of a category or to the exemplar that is the most appropriate Slide Credit: Torralba
  • 10. Levels of categorization [Rosch 70s] Definition of Basic Level: • Similar shape: Basic level categories are the highest-level category for which their members have similar shapes. • Similar motor interactions: … for which people interact with its members using similar motor sequences. • Common attributes: … there are a significant number of attributes in common between pairs of members. Sub Basic Superordinate similarity Basic level Subordinate level Superordinate levels “Fido” dog animal quadruped German shepherd Doberman cat cow … … …… … … Rosch et a. Principle of categorization, 1978
  • 12. Image categorization • Object recognition Caltech 101 Average Object Images
  • 13. Image categorization • Fine-grained recognition Visipedia Project
  • 14. Image categorization • Place recognition Places Database [Zhou et al. NIPS 2014]
  • 15. Image categorization • Visual font recognition [Chen et al. CVPR 2014]
  • 16. Image categorization • Dating historical photos [Palermo et al. ECCV 2012] 1940 1953 1966 1977
  • 17. Image categorization • Image style recognition [Karayev et al. BMVC 2014]
  • 18. Region categorization • Layout prediction Assign regions to orientation Geometric context [Hoiem et al. IJCV 2007] Assign regions to depth Make3D [Saxena et al. PAMI 2008]
  • 19. Region categorization • Semantic segmentation from RGBD images [Silberman et al. ECCV 2012]
  • 20. Region categorization • Material recognition [Bell et al. CVPR 2015]
  • 21. Supervised learning LAB Histogram Textons Bag of SIFT HOG xx x x x x x x x o o o o o = Category label Examples Image Features Classifier+ +
  • 25. Q: What are good features for… • recognizing a beach?
  • 26. Q: What are good features for… • recognizing cloth fabric?
  • 27. Q: What are good features for… • recognizing a mug?
  • 28. What are the right features? Depend on what you want to know! •Object: shape – Local shape info, shading, shadows, texture •Scene : geometric layout – linear perspective, gradients, line segments •Material properties: albedo, feel, hardness – Color, texture •Action: motion – Optical flow, tracked points
  • 29. General principles of representation • Coverage – Ensure that all relevant info is captured • Concision – Minimize number of features without sacrificing coverage • Directness – Ideal features are independently useful for prediction
  • 30. Image representations • Templates – Intensity, gradients, etc. • Histograms – Color, texture, SIFT descriptors, etc. • Average of features Image Intensity Gradient template
  • 31. Space Shuttle Cargo Bay Image representations: histograms Global histogram - Represent distribution of features – Color, texture, depth, … Images from Dave Kauchak
  • 32. Image representations: histograms • Data samples in 2D Feature 1 Feature2
  • 33. Image representations: histograms • Probability or count of data in each bin • Marginal histogram on feature 1 Feature 1 Feature2 bin
  • 34. Image representations: histograms • Marginal histogram on feature 2 Feature 1 Feature2 bin
  • 35. Image representations: histograms • Joint histogram Feature 1 Feature2 bin
  • 36. Modeling multi-dimensional data Joint histogram •Requires lots of data •Loss of resolution to avoid empty bins Feature 1 Feature2 Feature 1 Feature2 Marginal histogram • Requires independent features • More data/bin than joint histogram Feature 1 Feature2
  • 37. Modeling multi-dimensional data • Clustering • Use the same cluster centers for all images Feature 1 Feature2 bin
  • 38. Computing histogram distance • Histogram intersection • Chi-squared Histogram matching distance • Earth mover’s distance (Cross-bin similarity measure) – minimal cost paid to transform one distribution into the other ( )∑= −= K m jiji mhmhhh 1 )(),(min1),histint( ∑= + − = K m ji ji ji mhmh mhmh hh 1 2 2 )()( )]()([ 2 1 ),(χ [Rubner et al. The Earth Mover's Distance as a Metric for Image Retrieval, IJCV 2000]
  • 39. Histograms: implementation issues Few Bins Need less data Coarser representation Many Bins Need more data Finer representation • Quantization – Grids: fast but applicable only with few dimensions – Clustering: slower but can quantize data in higher dimensions • Matching – Histogram intersection or Euclidean may be faster – Chi-squared often works better – Earth mover’s distance is good for when nearby bins represent similar values
  • 40. • Color • Texture (filter banks or HOG over regions) L*a*b* color space HSV color space What kind of things do we compute histograms of?
  • 41. What kind of things do we compute histograms of? • Histograms of descriptors • “Bag of visual words” SIFT – [Lowe IJCV 2004]
  • 42. Analogy to documentsAnalogy to documents Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step- wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image. sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical nerve, image Hubel, Wiesel China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with a 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value. China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, trade, value ICCV 2005 short course, L. Fei-Fei
  • 43. Bag of visual words • Image patches • BoW histogram • Codewords
  • 44. Image categorization with bag of words Training 1. Extract keypoints and descriptors for all training images 2. Cluster descriptors 3. Quantize descriptors using cluster centers to get “visual words” 4. Represent each image by normalized counts of “visual words” 5. Train classifier on labeled examples using histogram values as features Testing 1. Extract keypoints/descriptors and quantize into visual words 2. Compute visual word histogram 3. Compute label or confidence using classifier
  • 45. HW5 - Prob2 scene categorization • Training and testing images for 8 categories • Implement representation: color histogram • BoW model: preprocessed descriptors – Learning dictionary using K-Means – Learning classifier (NN, SVM) • Report results
  • 47. Bag of visual words image classification [Chatfieldet al. BMVC 2011]
  • 48. Feature encoding • Hard/soft assignment to clusters Fisher encoding Kernel codebook encoding Locality constrained encoding [Chatfieldet al. BMVC 2011] Histogram encoding
  • 49. Fisher vector encoding • Fit Gaussian Mixture Models • Posterior probability • First and second order differences to cluster k [Perronnin et al. ECCV 2010]
  • 50. Performance comparisons • Fisher vector encoding outperforms others • Higher-order statistics helps [Chatfieldet al. BMVC 2011]
  • 51. But what about spatial layout? All of these images have the same color histogram
  • 52. Spatial pyramid Compute histogram in each spatial bin
  • 53. Spatial pyramid High number of features – PCA to reduce dimensionality [Lazebnik et al. CVPR 2006]
  • 54. Pooling • Average/max pooling • Second-order pooling [Joao et al. PAMI 2014] Source: Unsupervised Feature Learning and Deep Learning =avg/max =avg/max
  • 57. Shallow vs. deep learning • Engineered vs. learned features ImageImage Feature extractionFeature extraction PoolingPooling ClassifierClassifier Label ImageImage ConvolutionConvolution ConvolutionConvolution ConvolutionConvolution ConvolutionConvolution ConvolutionConvolution DenseDense DenseDense DenseDense Label
  • 58. Imagenet Classification with Deep Convolutional Neural Networks, Krizhevsky, Sutskever, and Hinton, NIPS 2012 Gradient-Based Learning Applied to Document Recognition, LeCun, Bottou, Bengio and Haffner, Proc. of the IEEE, 1998 Slide Credit: L. Zitnick
  • 59. Imagenet Classification with Deep Convolutional Neural Networks, Krizhevsky, Sutskever, and Hinton, NIPS 2012 Gradient-Based Learning Applied to Document Recognition, LeCun, Bottou, Bengio and Haffner, Proc. of the IEEE, 1998 GPUs + Data* * Rectified activations and dropout Slide Credit: L. Zitnick
  • 60. Convolutional activation features [Donahue et al. ICML 2013] CNN Features off-the-shelf: an Astounding Baseline for Recognition [Razavian et al. 2014]
  • 61. Region representation • Segment the image into superpixels • Use features to represent each image segment Joseph Tighe and Svetlana Lazebnik
  • 62. Region representation • Color, texture, BoW – Only computed within the local region • Shape of regions • Position in the image
  • 63. Working with regions • Spatial support is important – multiple segmentation • Spatial consistency – MRF smoothing Geometric context [Hoiem et al. ICCV 2005]
  • 64. Beyond categorization • Exemplar models [Malisiewicz and Efros NIPS09, ICCV11] – Ask not “what is this?”, ask “what is this like” – Moshe Bar • A train?
  • 65. Things to remember • Visual categorization help transfer knowledge • Image features – Coverage, concision, directness – Color, gradients, textures, motion, descriptors – Histogram, feature encoding, and pooling – CNN as features • Image/region categorization
  • 66. Next lecture - Classifiers Training Labels Training Images Classifier Training Training Image Features Image Features Testing Test Image Trained Classifier Outdoor PredictionTrained Classifier

Editor's Notes

  1. Afternoon Image and region categorization Use one image , what do you see Can see different, transfer knowledge Can know a lot of things by seeing it a dog Type of categorization
  2. In the previous classes covered vision tasks involving recognition of particular instance Object instance recognition is localizing particular object instance in an image The other related task is face recognition, match one face to a particular person Today, talking about another task, mapping images and regions to categories. What is the difference? For example for face images, finding if the image contains John is a instance recognition task, but john is not a category. Recognizing a face images is a male or a female is categorization task.
  3. What do you say about this image? So this is a forest scene, and there is a lot trees, grass on the ground, man and holding a camera looking rabbit, bear behind. Giving out category names, doing categorization
  4. Looking at this image, by categorizing each of the object, we infer a lot of information. Animal alive, dangerous, how fast runs, I poke it. Infer in man is in danger
  5. I think about things in categories because there are many benefit to it. Mainly the benefit is, from categories, we can go beyond what we see and make prediction
  6. Knowing what a category is, how do we recognize them? One way to think about it is that I have a generic prototype of something say dog, and I compare everything to the prototype to make this decision. The other way to think about it is I have kept in my memory a number of exemplars and I compare new things to those exemplars and make the prediction
  7. So category is important but what kind of categories do we what to work with? It turns out that philosophers have been thinking about it over thousands of years and some of recent theories about categorization is made by a research Elenor Rosch in 1976. They found that first categories can be put into taxonomies. Dog is subcategory of animal, and German shepherd is subcategory of dog. At higher level things share very little in common, and at lower level, things share a lot in common. a fly is very different from a dog, but german shepherd is very similar to a greyhound, but the boundary may be less clear. So in the taxonomy tree, there is level called basic level where people are very comfortable to work with, they are like dog, cat, cow. In this level categories still share a lot in common and the boundaries of categories are very clear.
  8. Supervised learning First we represent the visual appearance of our examples using some image features, we want the features to capture important visual characteristics Then, we will build models to separate the examples from the same category from the examples that do not belong to this category
  9. At testing time, we are given with an new image, we also represent it in terms of image features and apply the trained model to obtain the prediction
  10. At testing time, we are given with an new image, we also represent it in terms of image features and apply the trained model to obtain the prediction
  11. At testing time, we are given with an new image, we also represent it in terms of image features and apply the trained model to obtain the prediction
  12. Object: shape, shadows
  13. To represent images using features, there is not a certain best way to represent an image. However, there are a few guidelines that apply: 1. we want our features to capture what is relveant to our categorization task.For example, we want to focus on material category, look at color and texture categorize the scene category, then we want to use some feature that can capture the overall layout. 2. Features to be simple, less complicated model. In the extreme case, we can directly use raw image intensity as features. But the problem is, structured, correlated and classifier would not be able to learn. Learned by classifiers 3. Correlated features, each predict different things. RGB, LAB, color. RGB and texture, different perspective.
  14. Genres of representation. Template of gradients, capturing the shape, shape are cues for object 2.histogram distribution of color and texture
  15. Histograms, measure feature distribution. General representation, The feature maybe color text, depth
  16. Recap on histogram Design issue histogram how many bin grids. Fewer loss more information compact More bins data points fine Euclidean
  17. Histogram of what. Color, RGB values LAB approximate human perception, HSV hue staturation and illumination. Textures: texture filter response
  18. Histogram of distinctive interesting points. Detect and describe, interesting point descriptor, histogram of descriptors Bag of visual words
  19. Analogy Language Bag of lexical word articles
  20. Patch local shape, by quantize them, dictionary of visual word and then we patch into a codeword Frequency distribution Visual words to represent
  21. So far, global histogram over the whole image discard spatial information Images contain structures and looking at a scene spatial distribution of feature we want to capture but we throw them away Result in the fact that I can construct a very different image and still get the same histogram This image is obvious a scene on grassland but with the noisy and gradient image on the bottom, color histogram
  22. Overcome this, global, pyramid of histogram Split into spatial bins at several levels and compute the histogram in each of them
  23. In more details, we still have the global histogram, but then I also have histograms computed in this 2x2 grid and histogram computed in 4x4 grid Each of the histograms are individual normalized. They are weighted because you want each level to have equal weights, and combined together. This spatial pyramid bag of bow is often considered as the goto feature to image categorization, works best for scenes, also works for most object. One critism high dimensionality and you can deal with by doing PCA dimension reduction
  24. Whole images regions Each region, extract features, predict on that. Features are extract within the region or the surrounding rather than global
  25. Color texture bag of word
  26. At testing time, we are given with an new image, we also represent it in terms of image features and apply the trained model to obtain the prediction