Learning structured representations


           Deva Ramanan
             UC Irvine
Visual representations

Geometric models (1970s-1990s)
    Hand-coded models

Statistical classifiers (1990s-present)
    Large-scale training
    Appearance-based representations

Learned model: fw(x) = w · Φ(x)

Training
    • Training data consists of images with labeled bounding boxes
    • Need to learn the model structure, filters and deformation costs
Learned visual representations

Where is invariance built in?

    Representation (linear classifier, ...)
    Features

Viola-Jones          Dalal-Triggs
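
A linear representation of this kind scores a window x as fw(x) = w · Φ(x). The sketch below evaluates that score for every window of a feature map. It assumes Φ(x) is simply the stack of per-cell features inside the window; the function name `score_windows` and the (rows, cols, channels) array layout are illustrative choices, not something stated in the talk.

```python
import numpy as np

def score_windows(feature_map, w):
    """Return w . Phi(x) for every window x of the feature map.

    feature_map: array of shape (H, W, D), one D-dim feature per cell.
    w:           template of shape (h, ww, D), the learned weights.
    Both layouts are assumptions made for this sketch.
    """
    H, W, _ = feature_map.shape
    h, ww, _ = w.shape
    scores = np.empty((H - h + 1, W - ww + 1))
    for i in range(scores.shape[0]):
        for j in range(scores.shape[1]):
            # Phi(x): the features inside the window anchored at (i, j)
            window = feature_map[i:i + h, j:j + ww, :]
            # Linear score: dot product between template and window features
            scores[i, j] = np.sum(window * w)
    return scores
```

With a fixed-size template like this, any invariance has to come from the features or from the search over positions and scales, which is the question the slide raises.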
Learned visual representations

Where is invariance built in?

    Representation (latent-variable classifier)
    Features

Felzenszwalb et al 09

Fig. 1. Detections obtained with a single component person model. The model is defined by a coarse root filter (a), several higher resolution part filters (b) and a spatial model for the location of each part relative to the root (c). The filters specify weights for histogram of oriented gradients features. Their visualization shows the positive weights at different orientations. The visualization of the spatial models reflects the “cost” of placing the center of a part at different locations relative to the root.
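
The caption describes a score built from a root filter, part filters, and a per-part cost for placing the part away from its expected position relative to the root. The sketch below shows that kind of latent-variable scoring, where each part placement is maximized out. It is a simplification under stated assumptions: filter responses are taken as already computed on one common grid (Felzenszwalb et al. evaluate part filters at a higher resolution than the root), and the displacement cost is purely quadratic. The names `part_based_score`, `anchors` and `deform_weight` are hypothetical.

```python
import numpy as np

def part_based_score(root_response, part_responses, anchors, deform_weight):
    """Score every root location with latent part placements maximized out.

    root_response:  (H, W) array of root filter responses.
    part_responses: list of (H, W) arrays, one per part filter.
    anchors:        list of (dy, dx) ideal part offsets from the root.
    deform_weight:  scalar weight on squared displacement (assumed form).
    """
    H, W = root_response.shape
    total = root_response.astype(float).copy()
    ys, xs = np.mgrid[0:H, 0:W]  # all candidate part positions
    for resp, (ay, ax) in zip(part_responses, anchors):
        for y in range(H):
            for x in range(W):
                # Cost of placing the part at each position, given that the
                # root at (y, x) would ideally put it at (y + ay, x + ax)
                cost = deform_weight * ((ys - (y + ay)) ** 2
                                        + (xs - (x + ax)) ** 2)
                # Latent variable: keep only the best placement of this part
                total[y, x] += np.max(resp - cost)
    return total
```

The brute-force maximization here is quadratic in the number of grid cells per part; it is written for clarity rather than speed.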

Where does learning fit in?

    Training images      Matching alg      Alg output      Ground truth

Tune parameters ( ,            ) till desired output on training set

‘Graduate Student Descent’ might take a while
(phrase from Marshall Tappen)
5 years of PASCAL people detection

[Chart: average precision (axis 0 to 50) by year, 2005 to 2010]

Matching results (after non-maximum suppression): ~1 second to search all scales

1% to 47% in 5 years
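
Average precision summarizes a detector's precision-recall curve in a single number. The sketch below shows one common way to compute it from ranked detections, assuming each detection has already been marked as a true or false positive. It is an illustrative area-under-the-envelope computation, not the exact PASCAL evaluation protocol, and `average_precision` is a hypothetical helper name.

```python
import numpy as np

def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the monotone precision-recall envelope.

    scores:           detection confidences (higher is more confident).
    is_true_positive: one boolean per detection (matching done beforehand).
    num_ground_truth: number of annotated objects, used for recall.
    """
    # Rank detections from most to least confident
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(is_true_positive, dtype=bool)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(~tp)
    recall = cum_tp / num_ground_truth
    precision = cum_tp / (cum_tp + cum_fp)
    # Envelope: at each rank, the best precision achieved at that rank or later
    envelope = np.maximum.accumulate(precision[::-1])[::-1]
    # Sum envelope precision over each increase in recall
    prev_recall = np.concatenate(([0.0], recall[:-1]))
    return float(np.sum((recall - prev_recall) * envelope))
```

A detector that ranks every true positive above every false positive and misses nothing scores 1.0 under this definition.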

               How do we move beyond the plateau?
How do we move beyond the plateau?

1. Develop more structured models with less invariant features
Invariance vs Search

    Projective Invariants

    View-Based Mixtures

Invariance vs Parametric Search

    Part-Based Models

Learned visual representations
                  Where is invariance built in?

                             Representation
                       (latent-variable classifier)




                              Features
Yi & Ramanan 11



                    Buffy performance: 88% vs 73%
Qualitative Results
How do we move beyond the plateau?

1. Develop more structured models with less invariant features


2. Score syntax as semantics
The forgotten challenge....

Results

    Two methods, both using detectors trained on other data
    Oxford method does not attempt to detect feet

                          Head    Hand    Foot
    BCNPCL_HumanLayout    (values not legible in the extracted text)
    OXFORD_SBD            (values not legible in the extracted text)

Structured classifiers

    shape, reflectance      classifier      Estimated shape, Estimated reflectance

[Figures from Baran and Popović 2007 (Pinocchio). Figure 10: A centaur pirate with a centaur skeleton embedded looks at a cat with a quadruped skeleton embedded. Figure 11: The human scan on the left is rigged by Pinocchio and is posed on the right by changing joint angles in the embedded skeleton. The well-known deficiencies of LBS can be seen in the right knee and hip areas.]

Structured object reports

“If you’re not winning the game, change the rules”

Lead: Jitendra Malik (UC Berkeley)
Participants: Deva Ramanan (UC Irvine), Steve Seitz (U Washington), ...

Introduction/goal: Human detection and pose estimation are tasks with many applications, including next-generation human-computer interfaces and activity understanding. Detection ... cast as a classification problem (does this window contain a person or not?), while pose estimation is often cast as a regression problem, where given an image or sequence of frames, one m... joint angles. This project will take a more general view and cast both tasks as one of “p... a full syntactic parse will report the number of people present (if any), their body ...

Caveat: we need more pixels

We should focus on high-resolution data
(in contrast to most learning methods)

Multiresolution models for object d...
Dennis Park, Deva Ramanan, Charless Fowlkes

Motivation & Goal
    Objects in images come with various resolutions.
    Most recognition systems are scale-invariant, i.e. fixed-size template
    More pixels mean more information!
    We want to use the information when it is available.

Goal:
    1. We want to use more pixels.
    2. We want to detect small instances as well.
    3. In addition, we try to address the correlation between resolution and the role of context.

Model
    Building blocks: HOG features [1], SVM

Caltech Pedestrian Benchmark

Multiresolution model

[Figure caption, partially recoverable: On the left, we show the result of our low-resolution rigid-template baseline. ... fails to detect large instances. On the right, we show detections of ..., part-based baseline, which fails to find small instances. On the ... detections of our multiresolution model that is able to detect both ... instances. The threshold of each model is set to yield the same rate of ...]

Park et al. 2010

Multiresolution representations decrease the error by 2X compared to previous work
How do we move beyond the plateau?

1. Develop more structured models with less invariant features


2. Score syntax as semantics


3. Generate ground-truth datasets of structured labels
Case study: small or big parts




Skeleton   Parts/Poselets   Mini-parts
What are good representations?

            Exemplars
               Parts
             Attributes
           Visual Phrases
             Grammars
                  ?
Even worse: what are the parts
         (if any)?
     Is there any structure to label here?
Sharing surfaces?
Selective parameter sharing






Exemplars => Parts => Attributes => Grammars

   Multi-task training of instance-specific classifiers
Human-in-the-loop structure learning
How do we move beyond the plateau?

1. Develop more structured models with less invariant features


2. Score “nuisance” variables as meaningful output


3. Generate ground-truth datasets of structured labels
Diagram for Eero

       Machine Learning


      Vision




Vision as applied machine learning
Diagram for Eero
                  Vision




    Graphics               Machine Learning
(shape & appearance)

    Vision as structured pattern recognition

More Related Content

What's hot

Ph d colloquium 2
Ph d colloquium 2Ph d colloquium 2
Ph d colloquium 2Rishi Roy
Ā 
COLLADA to WebGL (GDC 2013 presentation)
COLLADA to WebGL (GDC 2013 presentation)COLLADA to WebGL (GDC 2013 presentation)
COLLADA to WebGL (GDC 2013 presentation)Remi Arnaud
Ā 
Video Copy Detection Using Inclined Video Tomography and Bag-of-Visual-Words
Video Copy Detection Using Inclined Video Tomography and Bag-of-Visual-WordsVideo Copy Detection Using Inclined Video Tomography and Bag-of-Visual-Words
Video Copy Detection Using Inclined Video Tomography and Bag-of-Visual-WordsWesley De Neve
Ā 
7806 java 6 programming essentials using helios eclipse
7806 java 6 programming essentials using helios eclipse7806 java 6 programming essentials using helios eclipse
7806 java 6 programming essentials using helios eclipsebestip
Ā 
Reconsidering Custom Memory Allocation
Reconsidering Custom Memory AllocationReconsidering Custom Memory Allocation
Reconsidering Custom Memory AllocationEmery Berger
Ā 
Ten Commandments of Formal Methods: A decade later
Ten Commandments of Formal Methods: A decade laterTen Commandments of Formal Methods: A decade later
Ten Commandments of Formal Methods: A decade laterJonathan Bowen
Ā 

What's hot (6)

Ph d colloquium 2
Ph d colloquium 2Ph d colloquium 2
Ph d colloquium 2
Ā 
COLLADA to WebGL (GDC 2013 presentation)
COLLADA to WebGL (GDC 2013 presentation)COLLADA to WebGL (GDC 2013 presentation)
COLLADA to WebGL (GDC 2013 presentation)
Ā 
Video Copy Detection Using Inclined Video Tomography and Bag-of-Visual-Words
Video Copy Detection Using Inclined Video Tomography and Bag-of-Visual-WordsVideo Copy Detection Using Inclined Video Tomography and Bag-of-Visual-Words
Video Copy Detection Using Inclined Video Tomography and Bag-of-Visual-Words
Ā 
7806 java 6 programming essentials using helios eclipse
7806 java 6 programming essentials using helios eclipse7806 java 6 programming essentials using helios eclipse
7806 java 6 programming essentials using helios eclipse
Ā 
Reconsidering Custom Memory Allocation
Reconsidering Custom Memory AllocationReconsidering Custom Memory Allocation
Reconsidering Custom Memory Allocation
Ā 
Ten Commandments of Formal Methods: A decade later
Ten Commandments of Formal Methods: A decade laterTen Commandments of Formal Methods: A decade later
Ten Commandments of Formal Methods: A decade later
Ā 

Viewers also liked

Fcv appli science_fergus
Fcv appli science_fergusFcv appli science_fergus
Fcv appli science_ferguszukun
Ā 
Fcv acad ind_martin
Fcv acad ind_martinFcv acad ind_martin
Fcv acad ind_martinzukun
Ā 
Fcv hum mach_perona
Fcv hum mach_peronaFcv hum mach_perona
Fcv hum mach_peronazukun
Ā 
Fcv appli science_perona
Fcv appli science_peronaFcv appli science_perona
Fcv appli science_peronazukun
Ā 
Fcv acad ind_lowe
Fcv acad ind_loweFcv acad ind_lowe
Fcv acad ind_lowezukun
Ā 
02 cv mil_intro_to_probability
02 cv mil_intro_to_probability02 cv mil_intro_to_probability
02 cv mil_intro_to_probabilityzukun
Ā 
Fcv learn fergus
Fcv learn fergusFcv learn fergus
Fcv learn ferguszukun
Ā 
Fcv the revolution will be curated: human in the loop fine grained visual cat...
Fcv the revolution will be curated: human in the loop fine grained visual cat...Fcv the revolution will be curated: human in the loop fine grained visual cat...
Fcv the revolution will be curated: human in the loop fine grained visual cat...zukun
Ā 

Viewers also liked (8)

Fcv appli science_fergus
Fcv appli science_fergusFcv appli science_fergus
Fcv appli science_fergus
Ā 
Fcv acad ind_martin
Fcv acad ind_martinFcv acad ind_martin
Fcv acad ind_martin
Ā 
Fcv hum mach_perona
Fcv hum mach_peronaFcv hum mach_perona
Fcv hum mach_perona
Ā 
Fcv appli science_perona
Fcv appli science_peronaFcv appli science_perona
Fcv appli science_perona
Ā 
Fcv acad ind_lowe
Fcv acad ind_loweFcv acad ind_lowe
Fcv acad ind_lowe
Ā 
02 cv mil_intro_to_probability
02 cv mil_intro_to_probability02 cv mil_intro_to_probability
02 cv mil_intro_to_probability
Ā 
Fcv learn fergus
Fcv learn fergusFcv learn fergus
Fcv learn fergus
Ā 
Fcv the revolution will be curated: human in the loop fine grained visual cat...
Fcv the revolution will be curated: human in the loop fine grained visual cat...Fcv the revolution will be curated: human in the loop fine grained visual cat...
Fcv the revolution will be curated: human in the loop fine grained visual cat...
Ā 

More from zukun

My lyn tutorial 2009
My lyn tutorial 2009My lyn tutorial 2009
My lyn tutorial 2009zukun
Ā 
ETHZ CV2012: Tutorial openCV
Fcv learn ramanan

  • 1. Learning structured representations Deva Ramanan UC Irvine
  • 2. Visual representations. Geometric models (1970s-1990s): hand-coded models. Statistical classifiers (1990s-present): large-scale training, appearance-based representations. [Figure overlay: learned model f_w(x) = w · Φ(x), with positive and negative weights. Training: training data consists of images with labeled bounding boxes; need to learn the model structure, filters and deformation costs.]
  • 3. Learned visual representations. Where is invariance built in? Representation (linear classifier, ...). Features. Viola Jones; Dalal Triggs. [Figure overlay: learned model f_w(x) = w · Φ(x), with positive and negative weights. Training: training data consists of images with labeled bounding boxes; need to learn the model structure, filters and deformation costs.]
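As a rough illustration of the linear scoring rule f_w(x) = w · Φ(x) from slide 3, the sketch below scores every window of a small feature map against a fixed weight vector. The feature map, the `window_features` helper standing in for Φ, and the toy 2x2 template are all hypothetical and are not taken from the slides; a real detector would compute its own features and also search over scales.

```python
from typing import List, Sequence, Tuple


def score(w: Sequence[float], phi: Sequence[float]) -> float:
    """Linear score f_w(x) = w . Phi(x)."""
    if len(w) != len(phi):
        raise ValueError("weight and feature vectors must have equal length")
    return sum(wi * xi for wi, xi in zip(w, phi))


def window_features(feature_map: List[List[float]], top: int, left: int,
                    height: int, width: int) -> List[float]:
    """Hypothetical Phi(x): flatten one height x width window, row by row."""
    return [feature_map[r][c]
            for r in range(top, top + height)
            for c in range(left, left + width)]


def scan(feature_map: List[List[float]], w: Sequence[float],
         height: int, width: int) -> List[Tuple[float, int, int]]:
    """Score every window position; returns (score, top, left) tuples."""
    rows, cols = len(feature_map), len(feature_map[0])
    results = []
    for top in range(rows - height + 1):
        for left in range(cols - width + 1):
            phi = window_features(feature_map, top, left, height, width)
            results.append((score(w, phi), top, left))
    return results


# Toy data (assumed for illustration only).
feature_map = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
]
w = [1.0, 1.0, 1.0, 1.0]  # toy 2x2 template
best = max(scan(feature_map, w, 2, 2))
print(best)  # highest-scoring window as (score, top, left)
```

In this toy setting the best window is the one that fully overlaps the block of ones; learning would amount to choosing `w` so that positive windows score above negative ones.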
  • 4. Learned visual representations. Where is invariance built in? Representation (latent-variable classifier). Features. Felzenszwalb et al 09. [Figure caption: Fig. 1. Detections obtained with a single component person model. The model is defined by a coarse root filter (a), several higher resolution part filters (b) and a spatial model for the location of each part relative to the root (c). The filters specify weights for histogram of oriented gradients features. Their visualization shows the positive weights at different orientations. The visualization of the spatial models reflects the "cost" of placing the center of a part at different locations relative to the root.]
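The caption on slide 4 describes a root filter, part filters, and a spatial model that charges a "cost" for placing a part away from its expected location. A minimal sketch of how such a score could be combined follows, assuming a quadratic displacement cost and precomputed filter responses. The function names, the quadratic form, the `max_shift` search radius and the toy numbers are illustrative assumptions, not the formulation used by Felzenszwalb et al.

```python
from typing import List, Optional, Sequence, Tuple

Location = Tuple[int, int]


def best_part_placement(part_response: List[List[float]], anchor: Location,
                        deform_weight: float,
                        max_shift: int = 1) -> Tuple[float, Location]:
    """Maximise part response minus an assumed quadratic displacement cost.

    The anchor is the part's expected (row, col) relative to the root and is
    assumed to lie inside the response map.
    """
    rows, cols = len(part_response), len(part_response[0])
    ar, ac = anchor
    best: Optional[Tuple[float, Location]] = None
    for dr in range(-max_shift, max_shift + 1):
        for dc in range(-max_shift, max_shift + 1):
            r, c = ar + dr, ac + dc
            if 0 <= r < rows and 0 <= c < cols:
                value = part_response[r][c] - deform_weight * (dr * dr + dc * dc)
                if best is None or value > best[0]:
                    best = (value, (r, c))
    if best is None:
        raise ValueError("anchor lies outside the response map")
    return best


def model_score(root_score: float,
                part_responses: Sequence[List[List[float]]],
                anchors: Sequence[Location],
                deform_weights: Sequence[float]) -> Tuple[float, List[Location]]:
    """Root score plus, for each part, its best response net of deformation cost."""
    total = root_score
    placements = []
    for response, anchor, weight in zip(part_responses, anchors, deform_weights):
        value, location = best_part_placement(response, anchor, weight)
        total += value
        placements.append(location)
    return total, placements


# Toy response map for one part (assumed values).
response = [
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 3.0],
    [0.0, 0.0, 0.0],
]
cheap = model_score(2.0, [response], [(1, 1)], [1.0])
stiff = model_score(2.0, [response], [(1, 1)], [5.0])
print(cheap, stiff)
```

With a small deformation weight the part moves one cell to reach the stronger response; with a large weight it stays at its anchor. The part locations act as latent variables that are maximised over when scoring.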
  • 5. Where does learning fit in? Diagram: training images, matching alg, output, ground truth. Tune parameters ( , ) till desired output on training set. 'Graduate Student Descent' might take a while (phrase from Marshall Tappen). [Figure labels: person, bottle, cat]
  • 6. 5 years of PASCAL people detection. [Plot: average precision (axis 0 to 50) by year, 2005 to 2010.] 1% to 47% in 5 years. Matching results (after non-maximum suppression): ~1 second to search all scales. How do we move beyond the plateau?
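Slide 6 reports matching results "after non-maximum suppression". For readers unfamiliar with that step, here is a minimal sketch of one common greedy variant, assuming axis-aligned boxes given as (x1, y1, x2, y2) and an intersection-over-union overlap threshold of 0.5. The box format, the threshold and the toy detections are assumptions for illustration; the slides do not say which variant was used.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format
Detection = Tuple[float, Box]            # (score, box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def non_maximum_suppression(detections: List[Detection],
                            threshold: float = 0.5) -> List[Detection]:
    """Greedy NMS: keep the highest-scoring boxes, drop heavy overlaps."""
    kept: List[Detection] = []
    for det_score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        if all(iou(box, kept_box) <= threshold for _, kept_box in kept):
            kept.append((det_score, box))
    return kept


# Toy detections (assumed values): the second box overlaps the first heavily.
detections = [
    (0.8, (1.0, 1.0, 11.0, 11.0)),
    (0.9, (0.0, 0.0, 10.0, 10.0)),
    (0.7, (20.0, 20.0, 30.0, 30.0)),
]
kept = non_maximum_suppression(detections)
print(kept)
```

Here the 0.8 detection is suppressed because it overlaps the 0.9 detection by more than the threshold, while the distant 0.7 detection survives.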
  • 7. How do we move beyond the plateau? 1. Develop more structured models with less invariant features
  • 8. Invariance vs Search Projective Invariants View-Based Mixtures
  • 9. Invariance vs Parametric Search. Part-Based Models. [Figure: example detections labeled person, bottle and cat; part-model panels (a), (b), (c)]
  • 10. Learned visual representations Where is invariance built in? Representation (latent-variable classifier) Features Yi & Ramanan 11 Buffy performance: 88% vs 73%
  • 12. How do we move beyond the plateau? 1. Develop more structured models with less invariant features 2. Score syntax as semantics
  • 13. The forgotten challenge.... Results: two methods, both using detectors trained on other data. The Oxford method does not attempt to detect feet. [Table: results for Head, Hand and Foot; rows BCNPCL_HumanLayout and OXFORD_SBD; the numeric values are not recoverable from the extraction]
  • 14. Structured classifiers. Diagram labels: shape, estimated shape, classifier, reflectance, estimated reflectance. [Background image: a page from the Pinocchio rigging paper (Baran and Popović 2007) covering heat-equilibrium bone weights, equation (1): −Δw^i + Hw^i = Hp^i, and its Results section (Figures 8 to 11)]
  • 15. Structured object reports. "If you're not winning the game, change the rules". [Background text, partly cut off: Lead: Jitendra Malik (UC Berkeley). Participants: Deva Ramanan (UC Irvine), Steve Seitz (U Washington ... Introduction/goal: Human detection and pose estimation are tasks with many applications, including next-generation human-computer interfaces and activity understanding. Detection ... cast as a classification problem (does this window contain a person or not?), while pose estimation ... often cast as a regression problem, where given an image or sequence of frames, one m... joint angles. This project will take a more general view and cast both tasks as one of "p... a full syntactic parse will report the number of people present (if any), their body ...]
  • 16. Caveat: we need more pixels. We should focus on high-resolution data (in contrast to most learning methods). [Background poster, partly cut off: Multiresolution models for object d... Dennis Park, Deva Ramanan, Charless Fowlkes. Motivation & Goal: Objects in images come with various resolutions. Most recognition systems are scale-invariant, i.e. fixed-size template. More pixels mean more information! We want to use the information when it is available. Goal: 1. We want to use more pixels. 2. We want to detect small instances as well. 3. In addition, we try to address the correlation between resolution and the role of context. Building blocks: HOG features [1], SVM ...]
  • 17. Caltech Pedestrian Benchmark. Multiresolution model (Park et al. 2010). Multiresolution representations decrease the error by 2X compared to previous work. [Figure caption, partly cut off: ..., we show the result of our low-resolution rigid-template baseline. ...s to detect large instances. On the right, we show detections of ..., part-based baseline, which fails to find small instances. On the ... detections of our multiresolution model that is able to detect both ...tances. The threshold of each model is set to yield the same rate of ...]
  • 18. How do we move beyond the plateau? 1. Develop more structured models with less invariant features 2. Score syntax as semantics 3. Generate ground-truth datasets of structured labels
  • 19. Case study: small or big parts Skeleton Parts/Poselets Mini-parts
  • 20. What are good representations? Exemplars Parts Attributes Visual Phrases Grammars ?
  • 21. Even worse: what are the parts (if any)? Is there any structure to label here?
  • 23. Selective parameter sharing. Exemplars => Parts => Attributes => Grammars. Multi-task training of instance-specific classifiers
  • 25. How do we move beyond the plateau? 1. Develop more structured models with less invariant features 2. Score "nuisance" variables as meaningful output 3. Generate ground-truth datasets of structured labels
  • 26. Diagram for Eero Machine Learning Vision Vision as applied machine learning
  • 27. Diagram for Eero Vision Graphics Machine Learning (shape & appearance) Vision as structured pattern recognition