Robust Video Duplicates Detection & Localization
                 in Large Video Repositories
                                            Zhu Li

                                     Dept of Computing
                             The Hong Kong Polytechnic University




Z. Li, HK Polytechnic Univ                    1
Robust Video Duplicates Localization in Large Repository
●   Outline
     ●   The duplicate and near duplicate localization problem
           –   The problem
           –   The applications
           –   The challenges
     ●   Duplicate Likelihood Maximization Formulation
           –   The duplicate likelihood Gaussian Process modeling
           –   Duplicate Likelihood approximation via multi-indexed locality search
           –   Sequence Likelihood pruning
     ●   Simulation Results & Discussions
           –   Data Set and Simulation Setup
           –   Accuracy, Complexity and Tradeoffs
     ●   Conclusion, Related & Future Work




The Application & Problem
   ●   Robust duplicate and near duplicate video detection and localization
       has many applications
        ●   Copyright protection
        ●   Video repository mining – find out how video programs are related
        ●   Query by capture
   ●   The challenges:
        ●   Robustness:
              –   Be able to detect edited, corrupted, and reformatted video duplicates.
              –   Be able to locate the duplicate clips in a very large repository
              –   Scales with the length of the query clip
        ●   Complexity:
              –   The algorithm should scale well with the size of the repository
              –   Complexity – Robustness tradeoffs
              –   Parallelization



The Duplicate Likelihood of a frame
  ●   Duplicate frames are not exact copies of the originals in the
      repository
       ●   Otherwise the problem would be trivial: a well-designed hash function
           would return the exact match
  ●   Instead, duplicate frames can have various degrees of degradation
      and corruption due to:
       ●   Coding and communication losses, e.g., quantization effects, packet losses
       ●   Image formation variations: lighting changes, affine transforms induced by
           the re-capture process, etc.
       ●   Editing effects: subtitles, graphics and text overlays, resizing, cropping, etc.




The Duplicate Likelihood of a frame
  ●   This can be captured by a probabilistic model: for a given frame x,
      the likelihood that y is a duplicate of x is given by a Gaussian
      distribution:
          $$ L(y; x) \sim N(m_x, \sigma_x) $$
          $$ L(y; m_x, \sigma_x) = \frac{1}{(2\pi)^{d/2} |\sigma_x|^{1/2}} \exp\left( -\frac{1}{2} (y - m_x)^T \sigma_x^{-1} (y - m_x) \right) $$
       ●   with a d x 1 mean vector m_x and a d x d covariance matrix \sigma_x in the feature space of choice
  ●   The solution is actually feature agnostic.
       ●   The choice of feature space is up to the application, and the amount of loss,
           corruption and editing effects in the video frames affects the covariance matrix.
       ●   The choice of frame vs shot/GoP level features only affects the localization
           accuracy in this framework.
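
As a minimal sketch of the frame-level Gaussian likelihood on this slide, the function below evaluates it in the log domain, which avoids underflow for larger d. The function name, the use of NumPy, and the 2-D example values are assumptions made for illustration only; the slides do not specify an implementation.

```python
import numpy as np

def frame_log_likelihood(y, m_x, cov_x):
    """Log of the d-dimensional Gaussian duplicate likelihood L(y; m_x, cov_x)."""
    y = np.asarray(y, dtype=float)
    m_x = np.asarray(m_x, dtype=float)
    cov_x = np.asarray(cov_x, dtype=float)
    d = y.shape[0]
    diff = y - m_x
    # Solve cov_x * z = diff instead of forming the explicit inverse.
    z = np.linalg.solve(cov_x, diff)
    # slogdet returns (sign, log|cov_x|) and avoids overflow of the determinant.
    _, logdet = np.linalg.slogdet(cov_x)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + diff @ z)

# Hypothetical 2-D example: unit covariance, query equal to the mean.
m = np.array([1.0, -2.0])
cov = np.eye(2)
print(frame_log_likelihood(m, m, cov))        # equals log(1 / (2*pi))
print(frame_log_likelihood(m + 1.0, m, cov))  # lower: the query is farther from the mean
```

A larger covariance, as caused by heavier distortion, flattens the likelihood so that more distant frames still receive non-negligible likelihood.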




A Scaled Appearance Feature
●   Icon frame and its PCA projection
      ●   Video frame size: W=352, H=288
      ●   Icon size: w=16, h=12
      ●   A: d x 192 PCA projection matrix (192 = 16 x 12 icon pixels)

[Figures: info loss vs. dimensions; basis functions]
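
The feature extraction can be sketched roughly as follows. The slide gives only the frame size, the icon size and the shape of A, so the block-averaging downscale, the SVD-based estimate of A, the random stand-in data, and the names `frame_to_icon` and `project` are all assumptions of this sketch; d = 32 is the dimension used later in the simulation setup.

```python
import numpy as np

W, H = 352, 288   # frame size from the slide
w, h = 16, 12     # icon size from the slide

def frame_to_icon(frame):
    """Downscale an H x W luminance frame to an h x w icon by block averaging."""
    bh, bw = H // h, W // w            # 24 x 22 pixel blocks
    blocks = frame.reshape(h, bh, w, bw)
    return blocks.mean(axis=(1, 3))    # shape (12, 16)

def project(icon, A, mean_icon):
    """Project the flattened 192-dim icon with the d x 192 PCA basis A."""
    return A @ (icon.reshape(-1) - mean_icon)

rng = np.random.default_rng(0)
training_icons = rng.random((500, w * h))   # stand-in for real icon data
mean_icon = training_icons.mean(axis=0)
# Rows of Vt are the principal directions; keep the first d of them.
_, _, Vt = np.linalg.svd(training_icons - mean_icon, full_matrices=False)
d = 32
A = Vt[:d]                                  # d x 192

frame = rng.random((H, W))
feature = project(frame_to_icon(frame), A, mean_icon)
print(feature.shape)   # (32,)
```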


Likelihood Covariance Modeling
  ●   The mean of the duplicate likelihood function is the original frame in the
      feature space
  ●   The covariance is determined by the type of feature and the severity of
      the degradation of the duplicate frame. Examples: eigenvalues of the
      32x32 covariance for the JPEG quantization and AWGN cases

[Figures: \sigma_quant; \sigma_awgn]
Likelihood mean process: x(t)
  Video sequence mean process m_x(t) in PCA space, 1st, 2nd and 3rd components

[Figure: 3-D trajectories of four sequences in PCA space]
       ●   "foreman": 400 frames
       ●   "stefan": 300 frames
       ●   "mother-daughter": 300 frames
       ●   "mixed": 40 shots of 60 frames each, from randomly selected sequences



The Likelihood Function of a Video Sequence
  ●   Given a video sequence x(t), t=1..n, the likelihood that a sequence y(t)
      is a duplicate of x(t) is given by
          $$ L(y(t); x(t)) = \prod_{t=1}^{n} L(y(t); m_x(t), \sigma_x(t)) $$
       ●   where m_x(t) is the mean and \sigma_x(t) is the covariance matrix of the likelihood
           function at time t.
  ●   Video duplicate detection is then a maximum likelihood problem: given
      k=1..K video sequences { x_k(t) } in the repository, the duplicate is
      found by maximum likelihood detection:
          $$ k^* = \arg\max_k \prod_{t=1}^{n} L(y(t); m_k(t), \sigma_k(t)) $$
          $$ m_k(t) = x_k(t), \qquad \sigma_k^2(t) = \sigma_{quant}^2 + \sigma_{loss}^2 + \sigma_{gamma}^2 + \sigma_{edit}^2 $$
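
A brute-force version of this maximum likelihood detection might look like the sketch below. It sums per-frame log-likelihoods, which is equivalent to taking the log of the product, and assumes a diagonal covariance that is constant over time to keep the code short. The function names, the synthetic repository and the four variance values are hypothetical.

```python
import numpy as np

def sequence_log_likelihood(y, x_k, var_k):
    """Sum of per-frame Gaussian log-likelihoods for an n x d query y.

    x_k: n x d mean process of repository sequence k (m_k(t) = x_k(t)).
    var_k: length-d vector of per-dimension variances (diagonal covariance,
           an assumption made here to keep the sketch short).
    """
    d = y.shape[1]
    diff = y - x_k
    log_norm = -0.5 * (d * np.log(2.0 * np.pi) + np.sum(np.log(var_k)))
    per_frame = log_norm - 0.5 * np.sum(diff * diff / var_k, axis=1)
    return float(per_frame.sum())   # log of the product over t

def detect(y, repository, var_k):
    """Return k* = argmax_k of the sequence log-likelihood, plus all scores."""
    scores = [sequence_log_likelihood(y, x_k, var_k) for x_k in repository]
    return int(np.argmax(scores)), scores

rng = np.random.default_rng(1)
n, d, K = 20, 4, 5
repository = [rng.normal(size=(n, d)) for _ in range(K)]
# Independent distortion sources add their variances (hypothetical values
# standing in for the quantization, loss, gamma and editing terms).
var_k = np.full(d, 0.01 + 0.02 + 0.005 + 0.015)
query = repository[3] + rng.normal(scale=np.sqrt(var_k), size=(n, d))
k_star, _ = detect(query, repository, var_k)
print(k_star)
```

This exhaustive scan touches every frame of every sequence, which is the cost the pruning approach in the following slides is designed to avoid.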

Illustration of the Duplicate Likelihood Function for Sequences
   ●   A set of 6 sequences and their likelihood functions in R^2(t): for a given
       query, we need to find which one gives the maximum likelihood.
   ●   To untangle the mess...




The Likelihood Pruning Algorithm
  ●   Brute-force computation of the likelihood function for all candidate
      sequences in the repository is computationally prohibitive
       ●   E.g., with 1 million clips of 5-minute video at 30 frames per second, the total
           number of likelihood evaluations would be 10^6 x 5 x 60 x 30 = 9 billion.
  ●   Instead, we can start with the set of sequences that share non-zero
      likelihood at a certain timestamp, and prune those sequences that have
      zero likelihood at other timestamps.
  ●   The key is to approximate the duplicate likelihood estimation with a
      non-zero likelihood detection, implemented via a multi-indexed
      locality search
  ●   The sequence duplicate likelihood function is approximated by a
      timestamp subset pruning process, which finds the matching duplicate
      sequences as
          $$ S = \{ k \mid \prod_{t \in ts} L(y(t); x_k(t), \sigma_k(t)) > 0 \} $$


The Likelihood Pruning Algorithm
  ●   This leads to a likelihood pruning algorithm:
       ●   Given query clip y(t), compute a timestamp set ts = {t_1, t_2, ..., t_m}
       ●   Init S_1 = \{ k \mid L(y(t_1); x_k(t_1), \sigma(t_1)) > 0 \}
       ●   FOR i = 2:m
                 S_i = S_{i-1} \cap \{ k \mid L(y(t_i); x_k(t_i), \sigma(t_i)) > 0 \}
                 Break if S_i has 1 or 0 elements left.
       ●   END
  ●   Criteria for ts selection:
       ●   The selected timestamps should reflect maximum spatial-temporal diversity
           in the query clip y(t), i.e., find m points on y(t) with the maximum possible
           separation among them; otherwise the problem degrades into an image search problem
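
The pruning loop above can be sketched as a running set intersection with an early exit. Here `candidates_at` is a hypothetical callable standing in for the index search that returns the sequences with non-zero likelihood at a timestamp, and the lookup table holds made-up sequence ids.

```python
def prune(timestamps, candidates_at):
    """Intersect per-timestamp candidate sets, stopping early when resolved.

    timestamps: the selected set ts = [t1, ..., tm].
    candidates_at: hypothetical callable returning the set of sequence ids k
                   whose likelihood at timestamp t is non-zero.
    Returns (surviving ids, number of stages evaluated).
    """
    survivors = set(candidates_at(timestamps[0]))
    stages = 1
    for t in timestamps[1:]:
        if len(survivors) <= 1:
            break
        survivors &= candidates_at(t)
        stages += 1
    return survivors, stages

# Toy lookup table standing in for the index search.
table = {
    10: {1, 2, 3, 7},
    55: {2, 3, 9},
    90: {3, 5},
    140: {3},
}
survivors, stages = prune([10, 55, 90, 140], table.__getitem__)
print(survivors, stages)   # {3} 3
```

In this toy run the fourth timestamp is never evaluated, because a single candidate already remains after the third stage.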




Timestamp Selection Heuristic
  ●   We want the selected matching points to have the best spatial-temporal
      diversity, i.e., we don't want video duplicate search to degrade into image
      duplicate search.
  ●   An effective heuristic solution: Curve Length Guided Selection
       ●   Compute the cumulative curve length of y(t) by fitting a spline model y(s) on y(t):
          $$ L_y(t) = \int_{s=0}^{t} \| y'(s) \| \, ds $$
       ●   To obtain m timestamps, use a step size \Delta in curve length:
          $$ ts(k): L_y(ts(k)) = k\Delta, \quad k = 1..m, \qquad \Delta = \frac{L_y(n)}{m} $$
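
A simplified version of the curve-length heuristic is sketched below. It accumulates length over straight segments between consecutive frames rather than over a fitted spline, which is an assumption of this sketch; the function name and the toy trajectory are also hypothetical.

```python
import math

def select_timestamps(y, m):
    """Pick m frame indices spaced evenly along the trajectory's arc length.

    y: list of feature vectors, one per frame (the query trajectory y(t)).
    The length is accumulated over straight segments between consecutive
    frames, a simplification of the spline fit described above.
    """
    cum = [0.0]
    for prev, cur in zip(y, y[1:]):
        cum.append(cum[-1] + math.dist(prev, cur))
    step = cum[-1] / m                      # delta = L_y(n) / m
    picks, t = [], 0
    for k in range(1, m + 1):
        target = k * step
        # Advance to the first frame whose cumulative length reaches k * delta.
        while t < len(cum) - 1 and cum[t] < target - 1e-12:
            t += 1
        picks.append(t)
    return picks

# A trajectory that is static for 5 frames, then moves one unit per frame.
traj = [(0.0, 0.0)] * 5 + [(float(i), 0.0) for i in range(1, 9)]
print(select_timestamps(traj, 4))   # [6, 8, 10, 12]
```

The static opening frames contribute no curve length, so no timestamp is spent on them. Uniform sampling in time would have placed picks inside that static stretch.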

Likelihood approximation via Multi-Indexing Locality
  ●   We need to find the frames that have non-zero likelihood for a query
      frame y(t):
          $$ S(y) = \{ x \mid L(y; m_x, \sigma_x) > 0 \} $$
      This can be approximated by an epsilon-NN search on m_x, i.e.,
          $$ S(y) = \{ x \mid d_{\sigma_x}(x, y) \le d_0 \} $$
  ●   The choice of d_0 sets the Mahalanobis distance cut-off beyond which the
      likelihood is treated as zero.
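
As an illustration of the epsilon-NN test, the sketch below does an exhaustive Mahalanobis range query. It assumes one covariance shared by all repository frames, whereas the slide allows a per-frame covariance. The function name and the four example frames are made up.

```python
import numpy as np

def nonzero_likelihood_set(y, frames, cov, d0):
    """Indices of repository frames within Mahalanobis distance d0 of y.

    A shared covariance for all frames is an assumption of this sketch.
    """
    diff = frames - y                       # N x d
    z = np.linalg.solve(cov, diff.T).T      # rows are cov^{-1} (x - y)
    dist = np.sqrt(np.sum(diff * z, axis=1))
    return np.flatnonzero(dist <= d0)

frames = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [4.0, 4.0]])
cov = np.diag([1.0, 9.0])                   # larger variance along the second axis
print(nonzero_likelihood_set(np.zeros(2), frames, cov, d0=1.5))  # [0 1 2]
```

The frame at (0, 3) is three units away in Euclidean terms but only one unit in Mahalanobis distance, so it is kept. This is how the distortion covariance shapes the candidate set.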




Likelihood approximation via Multi-Indexing Locality
  ●   This can be approximated with a multi-indexing structure by searching
      for the leaf nodes containing frame y:
          $$ S(y) = \bigcup_j F_j(y) $$
           where F_j(y) returns the frame set of the leaf node from index structure j. By
           choosing an appropriate indexing height we can control the leaf node volume
           and therefore d_0.
       ●   However, a single indexing structure is not a good approximation: if the query frame
           y is not at the centroid of the leaf node, then many non-zero likelihood frames
           will be missed.
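
One way to picture the union of leaf sets is the sketch below. `LeafIndex` is a hypothetical, minimal median-split kd-tree that stores only leaf buckets; the data are random, and the tree depth is chosen arbitrarily. The four groups of 8 dimensions mirror the grouping described in the simulation setup.

```python
import numpy as np

class LeafIndex:
    """Median-split kd-tree over a subset of dimensions, stored as leaf buckets."""

    def __init__(self, data, dims, depth):
        self.dims, self.depth = list(dims), depth
        self.splits = {}     # node id -> (dimension, threshold)
        self.leaves = {}     # leaf node id -> array of frame ids
        self._build(data, np.arange(len(data)), node=1, level=0)

    def _build(self, data, ids, node, level):
        if level == self.depth or len(ids) <= 1:
            self.leaves[node] = ids
            return
        dim = self.dims[level % len(self.dims)]
        thr = float(np.median(data[ids, dim]))
        self.splits[node] = (dim, thr)
        left = data[ids, dim] <= thr
        self._build(data, ids[left], 2 * node, level + 1)
        self._build(data, ids[~left], 2 * node + 1, level + 1)

    def query(self, y):
        """Return F_j(y): the frame ids stored in the leaf that contains y."""
        node = 1
        while node in self.splits:
            dim, thr = self.splits[node]
            node = 2 * node if y[dim] <= thr else 2 * node + 1
        return set(self.leaves[node].tolist())

def multi_index_candidates(indexes, y):
    """S(y) as the union of the leaf sets F_j(y) over all index structures."""
    out = set()
    for index in indexes:
        out |= index.query(y)
    return out

rng = np.random.default_rng(2)
data = rng.normal(size=(4096, 32))
groups = [range(0, 8), range(8, 16), range(16, 24), range(24, 32)]
indexes = [LeafIndex(data, g, depth=6) for g in groups]
y = data[100] + 0.01 * rng.normal(size=32)   # lightly distorted copy of frame 100
single = indexes[0].query(y)
merged = multi_index_candidates(indexes, y)
print(len(single), len(merged))
```

If the distorted query falls across a split boundary in one tree, the original frame can still be recovered from another tree's leaf. The price is a larger candidate set to merge.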




Multi-Index Likelihood Approximation
  ●   Graphical illustration
       ●   A query frame y, marked +, is at the northwest corner of its leaf node F_1(y) in index
           scheme 1. Non-zero likelihood frames outside the red rectangle will be
           missed
       ●   But the leaf nodes of the other 3 indexing schemes, in black, can cover this loss
       ●   Careful selection of the indexing schemes and leaf node volume can give a good
           approximation of the likelihood function

[Figure: query frame + with overlapping leaf nodes and the zero-likelihood probability cut-off]
Likelihood Pruning
  ●   Taking advantage of the temporal dimension constraint, the sequence
      duplicate likelihood L_k(y) becomes zero if any frame's duplicate
      likelihood is zero:
          $$ L_k(y) = \prod_{t=1}^{n} L(y(t); m_k(t), \sigma_k(t)) $$
          $$ L_k(y) = 0 \quad \text{if } \exists\, t \ \text{s.t.}\ L(y(t); m_k(t), \sigma_k(t)) = 0 $$
  ●   This means that if, at a certain timestamp t_1, the duplicate likelihood of
      y(t_1) is zero w.r.t. sequence k in the repository, we can remove k from
      the candidate set.




Likelihood Trellis Pruning
●   Illustration of the process:
     ●   Start with the initial non-zero likelihood set
     ●   Prune the trellises that have zero likelihood at the next stage of likelihood evaluation
     ●   Stop when only one or no trellis is left (in simulation, 3~5 stages are
         typically enough to locate a 2~4 minute clip in the 116-hour repository)

[Figure: trellis over stages S(t1), S(t2), S(t3), S(t4), S(t5), S(t6)]

Simulation
  ●   Data Set
       ●   Indexed Set:
             –   12 x 2^20 frames, or approx. 116 hours of video from various sources
             –   Each frame is scaled to a 16x12 icon and projected to a 32-dimensional space
             –   Four kd-tree index structures are built on dimensions 1~8, 9~16, 17~24 and 25~32, to
                 provide diversity in the approximate likelihood function
             –   Alternative approach: a hierarchical quadtree structure that can give a better
                 likelihood approximation.
       ●   Negative Probe Set:
             –   5 hours of video from the TRECVID shot detection set
             –   Scaled to 16x12 icons and projected to the 32-dimensional space
       ●   Distortions/Image Formation Variations in query
             –   Quantization induced distortions: JPEG quality 30, 40, ..., 90.
             –   Block losses: random block loss at rates of 2%~10%
             –   Blurs, gamma corrections
             –   AWGN: equivalent to 20dB~40dB PSNR range
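
To make the AWGN setting concrete, the sketch below converts a target PSNR into a noise standard deviation using PSNR = 10 log10(peak^2 / MSE). It assumes 8-bit pixels (peak 255) and no clipping of the noisy values. The function names and the flat test signal are hypothetical and not taken from the simulation.

```python
import math
import random

def awgn_sigma(psnr_db, peak=255.0):
    """Noise standard deviation giving the requested PSNR for 8-bit pixels."""
    # PSNR = 10 log10(peak^2 / MSE) and MSE = sigma^2 for zero-mean noise.
    return peak / (10.0 ** (psnr_db / 20.0))

def add_awgn(pixels, psnr_db, seed=0):
    """Add zero-mean Gaussian noise (no clipping) at the target PSNR."""
    rng = random.Random(seed)
    sigma = awgn_sigma(psnr_db)
    return [p + rng.gauss(0.0, sigma) for p in pixels]

def measured_psnr(clean, noisy, peak=255.0):
    mse = sum((a - b) ** 2 for a, b in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(peak * peak / mse)

clean = [128.0] * 50000
for target in (20, 30, 40):
    noisy = add_awgn(clean, target)
    print(target, round(awgn_sigma(target), 2), round(measured_psnr(clean, noisy), 1))
```

Under these assumptions the 20~40 dB range corresponds to a noise standard deviation between 25.5 and 2.55 grey levels.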


Performance - Accuracy
       ●   Randomly generated 4000 negative probes and 8000 positive probes with
           various degrees of corruption
       ●   Probe clip length = 1, 2 and 4 minutes
       ●   Recall = 100%.
       ●   The precision performance for the equivalent AWGN range of 20~40 dB in PSNR:

[Figure: precision plot. Diversity in likelihood modeling improves accuracy.]




Complexity Analysis
  ●   Duplicate Likelihood Estimation
       ●   Basically O(L) for a repository indexed with an L-level kd-tree.
             –   Note that if we accept some loss in duplicate localization accuracy, GoP-level
                 feature indexing can reduce the problem set size by a factor of 15.
       ●   With multiple indexing structures, we need to merge the multiple localities to
           obtain the non-zero likelihood repository frame estimate for the given probe.
       ●   The robustness of the likelihood estimation is tied to the index tree height and
           the covariances of the distortions in the probe
  ●   Likelihood Pruning
       ●   The pruning stops if there is only one trellis left, or no trellis.
       ●   Starting with an initial set, the size of candidate trellis decreases
       ●   Complexity is content dependent.
  ●   Parallelization
       ●   The index search can be parallelized, as each index structure is independent.
       ●   Pruning

Performance - Complexity
  ●   Computing resource
       ●   Pure Matlab implementation, not thoroughly optimized.
       ●   Lenovo ThinkCentre running Linux, 3.0 GHz processor, 4 GB memory
  ●   Index complexity:
       ●   Building an 8-dimensional kd-tree with 2^14 leaf nodes over the 12 x 2^20 frames takes
           approximately 63~65 seconds on a Linux PC with a 3.0 GHz processor.
       ●   Each leaf node holds 768 frames.
       ●   There is a tradeoff between the size of leaf nodes and the robustness of the
           likelihood estimation
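
The index sizes quoted on this slide can be cross-checked with a few lines of arithmetic. The 30 frames-per-second rate is an assumption, taken from the factor of 30 in the earlier brute-force estimate; the variable names are illustrative.

```python
FPS = 30                       # frame rate assumed in the brute-force estimate
frames = 12 * 2**20            # indexed frames
leaves = 2**14                 # leaf nodes per kd-tree

hours = frames / FPS / 3600
frames_per_leaf = frames // leaves
tree_height = leaves.bit_length() - 1   # levels of binary splits

print(f"{hours:.1f} hours, {frames_per_leaf} frames per leaf, height {tree_height}")
# 116.5 hours, 768 frames per leaf, height 14
```

Halving the number of leaves doubles the frames per leaf. That widens the locality each lookup returns, at the cost of a larger candidate set to merge and prune.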




Performance - Complexity
  ●   Likelihood evaluation complexity, 5 timestamps:
       ●   For a given duplicate frame, finding all its non-zero likelihood repository frames
           takes 0.13 ms on average with one index structure
       ●   With multiple index structures, finding and merging the non-zero likelihood repository
           frames takes approx. 1.01 ms for the 2-index case and 2.20 ms for the 3-index case.
       ●   The 4-index case gives little precision gain, so it is not reported.
  ●   Likelihood pruning complexity
       ●   For the 5-timestamp case: 0.67, 0.94 and 0.94 ms for 1, 2 and 3 index structures
       ●   Not a significant source of complexity; we can afford more elaborate pruning
           sequences
  ●   Total time complexity:
       ●   Within 3 ms for the 116-hour repository
       ●   The goal is to handle 10,000 hours within 10 ms, which will probably need a more
           elaborate multi-pass likelihood pruning process



Conclusion & Future Work
  ●   Conclusion
       ●   A feature agnostic and flexible likelihood modeling and pruning scheme for
           video (and also audio) duplicate detection and localization
       ●   Very high accuracy in precision-recall performance, roughly 98% precision on
           100% recall, to localize 2~4 minute probes within 3 seconds in a repository of
           116 hours
       ●   Good response time, takes only 3 ms for 116 hour repository and on track to
           achieve 10 ms response time for 10,000 hour repository (need some help in
           getting the data set).
       ●   Flexible Computational complexity – Robustness – Localization Accuracy
           scalability in performance.
  ●   Future Work
       ●   More sophisticated likelihood pruning schemes
       ●   Subspace learning to minimize the distortion-induced likelihood covariance
       ●   Hierarchical approach to localization accuracy, i.e., mixed indexing of shot-level
           color/SIFT histogram features and frame-level appearance features.


Acknowledgement
  ●   The work is supported with generous grants from MSRA and HK
      RGC.
  ●   Thanks for the stimulating discussions with Xian-Sheng, Lei Zhang,
      Mei Tao, and Lingjun.
  ●   Questions ?
  ●   If time permits......




Related Research – Graph Indexing
       ●   Model duplicate/object/biometric recognition problems as a graph matching
           problem on G(V,E)
       ●   Bag-of-words approach: treat V as a set
       ●   Computational topology approach: model 2 affinity matrices of G
             –   A_pos(j,k) = | pos(v_j) - pos(v_k) |
             –   A_fea(j,k) = d_fea(v_j, v_k)
             –   Then use the eigenvalues of the Laplacians of A_pos and A_fea as the feature vector for
                 indexing/hashing
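
A rough sketch of such a spectral signature is given below. The slide does not say which Laplacian or which feature distance is used, so the unnormalized Laplacian L = D - A, the Euclidean distance for d_fea, the function names and the random keypoints are all assumptions.

```python
import numpy as np

def laplacian_spectrum(A):
    """Sorted eigenvalues of the unnormalized Laplacian L = D - A."""
    A = np.asarray(A, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    return np.linalg.eigvalsh(L)    # ascending order for a symmetric matrix

def graph_signature(pos, fea):
    """Concatenate the Laplacian spectra of A_pos and A_fea."""
    A_pos = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=2)
    A_fea = np.linalg.norm(fea[:, None, :] - fea[None, :, :], axis=2)
    return np.concatenate([laplacian_spectrum(A_pos), laplacian_spectrum(A_fea)])

rng = np.random.default_rng(3)
pos = rng.random((6, 2))            # hypothetical keypoint positions
fea = rng.random((6, 8))            # hypothetical keypoint descriptors
perm = rng.permutation(6)
sig_a = graph_signature(pos, fea)
sig_b = graph_signature(pos[perm], fea[perm])
print(np.allclose(sig_a, sig_b))    # True: the spectrum ignores vertex order
```

Because the eigenvalues do not depend on vertex ordering, the signature can be indexed without first solving the vertex correspondence problem.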

[Figure: 4 x 4 grid of panels labeled f1-s1 through f4-s4]



Related Work – Subspace Indexing on Grassmann
                               Manifold
  ●   Subspace Indexing on Grassmann Manifold
       ●   Motivation: a rich set of subspaces for recognition problems with large subject sets
       ●   Judicious use of local discriminative models over local data patches
       ●   Dual tree structure: Data Partition Tree (DPT) and Model Hierarchical
           Tree (MHT)
       ●   The DPT partitions the data set into local patches; local discriminative models
           are trained on the local patches
       ●   Then apply VQ to the subspaces on the Grassmann manifold G(D, d), with a data patch
           locality constraint, generating the MHT in the process
       ●   Label the model accuracy on each node in the MHT
       ●   At query time, the probe searches the DPT to locate its local data patch, then
           backtracks to the root, finding the best model along the way.




Subspace Indexing on Grassmann Manifold
                                Model Hierarchical Tree




                              Performance on Face Recognition




Z. Li, HK Polytechnic Univ               28

More Related Content

What's hot

MATHEON D-Day: Numerical simulation of integrated circuits for future chip ge...
MATHEON D-Day: Numerical simulation of integrated circuits for future chip ge...MATHEON D-Day: Numerical simulation of integrated circuits for future chip ge...
MATHEON D-Day: Numerical simulation of integrated circuits for future chip ge...Dagmar Monett
 
Deep Learningによる超解像の進歩
Deep Learningによる超解像の進歩Deep Learningによる超解像の進歩
Deep Learningによる超解像の進歩Hiroto Honda
 
Convolution Neural Networks
Convolution Neural NetworksConvolution Neural Networks
Convolution Neural NetworksAhmedMahany
 
A brief introduction to recent segmentation methods
A brief introduction to recent segmentation methodsA brief introduction to recent segmentation methods
A brief introduction to recent segmentation methodsShunta Saito
 
210523 swin transformer v1.5
210523 swin transformer v1.5210523 swin transformer v1.5
210523 swin transformer v1.5taeseon ryu
 
【DL輪読会】Patches Are All You Need? (ConvMixer)
【DL輪読会】Patches Are All You Need? (ConvMixer)【DL輪読会】Patches Are All You Need? (ConvMixer)
【DL輪読会】Patches Are All You Need? (ConvMixer)Deep Learning JP
 
Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020
Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020
Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Universitat Politècnica de Catalunya
 
P-Systems for approximating NP-Complete optimization problems
P-Systems for approximating NP-Complete optimization problemsP-Systems for approximating NP-Complete optimization problems
P-Systems for approximating NP-Complete optimization problemsFrancesco Corucci
 
Robust-Circuit-Sizing:-EDA-for-EDA
Robust-Circuit-Sizing:-EDA-for-EDARobust-Circuit-Sizing:-EDA-for-EDA
Robust-Circuit-Sizing:-EDA-for-EDAhauschildm
 

What's hot (12)

MATHEON D-Day: Numerical simulation of integrated circuits for future chip ge...
MATHEON D-Day: Numerical simulation of integrated circuits for future chip ge...MATHEON D-Day: Numerical simulation of integrated circuits for future chip ge...
MATHEON D-Day: Numerical simulation of integrated circuits for future chip ge...
 
Deep Learningによる超解像の進歩
Deep Learningによる超解像の進歩Deep Learningによる超解像の進歩
Deep Learningによる超解像の進歩
 
Convolution Neural Networks
Convolution Neural NetworksConvolution Neural Networks
Convolution Neural Networks
 
A brief introduction to recent segmentation methods
A brief introduction to recent segmentation methodsA brief introduction to recent segmentation methods
A brief introduction to recent segmentation methods
 
210523 swin transformer v1.5
210523 swin transformer v1.5210523 swin transformer v1.5
210523 swin transformer v1.5
 
Lecture12 - SVM
Lecture12 - SVMLecture12 - SVM
Lecture12 - SVM
 
【DL輪読会】Patches Are All You Need? (ConvMixer)
【DL輪読会】Patches Are All You Need? (ConvMixer)【DL輪読会】Patches Are All You Need? (ConvMixer)
【DL輪読会】Patches Are All You Need? (ConvMixer)
 
Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020
Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020
Recurrent Neural Networks RNN - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
 
P-Systems for approximating NP-Complete optimization problems
P-Systems for approximating NP-Complete optimization problemsP-Systems for approximating NP-Complete optimization problems
P-Systems for approximating NP-Complete optimization problems
 
Java and Deep Learning
Java and Deep LearningJava and Deep Learning
Java and Deep Learning
 
Robust-Circuit-Sizing:-EDA-for-EDA
Robust-Circuit-Sizing:-EDA-for-EDARobust-Circuit-Sizing:-EDA-for-EDA
Robust-Circuit-Sizing:-EDA-for-EDA
 

Viewers also liked

Tutorial on MPEG CDVS/CDVA Standardization at ICNITS L3 Meeting
Tutorial on MPEG CDVS/CDVA Standardization at ICNITS L3 MeetingTutorial on MPEG CDVS/CDVA Standardization at ICNITS L3 Meeting
Tutorial on MPEG CDVS/CDVA Standardization at ICNITS L3 MeetingUnited States Air Force Academy
 
Light Weight Fingerprinting for Video Playback Verification in MPEG DASH
Light Weight Fingerprinting for Video Playback Verification in MPEG DASHLight Weight Fingerprinting for Video Playback Verification in MPEG DASH
Light Weight Fingerprinting for Video Playback Verification in MPEG DASHUnited States Air Force Academy
 
Subspace Indexing on Grassmannian Manifold for Large Scale Visual Identification
Subspace Indexing on Grassmannian Manifold for Large Scale Visual IdentificationSubspace Indexing on Grassmannian Manifold for Large Scale Visual Identification
Subspace Indexing on Grassmannian Manifold for Large Scale Visual IdentificationUnited States Air Force Academy
 
Mobile Visual Search: Object Re-Identification Against Large Repositories
Mobile Visual Search: Object Re-Identification Against Large RepositoriesMobile Visual Search: Object Re-Identification Against Large Repositories
Mobile Visual Search: Object Re-Identification Against Large RepositoriesUnited States Air Force Academy
 
Lec-08 Feature Aggregation II: Fisher Vector, AKULA and Super Vector
Lec-08 Feature Aggregation II: Fisher Vector, AKULA and Super VectorLec-08 Feature Aggregation II: Fisher Vector, AKULA and Super Vector
Lec-08 Feature Aggregation II: Fisher Vector, AKULA and Super VectorUnited States Air Force Academy
 

Viewers also liked (16)

ECE 4490 Multimedia Communication Lec01
ECE 4490 Multimedia Communication Lec01ECE 4490 Multimedia Communication Lec01
ECE 4490 Multimedia Communication Lec01
 
Tutorial on MPEG CDVS/CDVA Standardization at ICNITS L3 Meeting
Tutorial on MPEG CDVS/CDVA Standardization at ICNITS L3 MeetingTutorial on MPEG CDVS/CDVA Standardization at ICNITS L3 Meeting
Tutorial on MPEG CDVS/CDVA Standardization at ICNITS L3 Meeting
 
Lec07 aggregation-and-retrieval-system
Lec07 aggregation-and-retrieval-systemLec07 aggregation-and-retrieval-system
Lec07 aggregation-and-retrieval-system
 
Lec11 rate distortion optimization
Lec11 rate distortion optimizationLec11 rate distortion optimization
Lec11 rate distortion optimization
 
Light Weight Fingerprinting for Video Playback Verification in MPEG DASH
Light Weight Fingerprinting for Video Playback Verification in MPEG DASHLight Weight Fingerprinting for Video Playback Verification in MPEG DASH
Light Weight Fingerprinting for Video Playback Verification in MPEG DASH
 
Multimedia Communication Lec02: Info Theory and Entropy
Multimedia Communication Lec02: Info Theory and EntropyMultimedia Communication Lec02: Info Theory and Entropy
Multimedia Communication Lec02: Info Theory and Entropy
 
Subspace Indexing on Grassmannian Manifold for Large Scale Visual Identification
Subspace Indexing on Grassmannian Manifold for Large Scale Visual IdentificationSubspace Indexing on Grassmannian Manifold for Large Scale Visual Identification
Subspace Indexing on Grassmannian Manifold for Large Scale Visual Identification
 
Lec14 eigenface and fisherface
Lec14 eigenface and fisherfaceLec14 eigenface and fisherface
Lec14 eigenface and fisherface
 
Lec15 graph laplacian embedding
Lec15 graph laplacian embeddingLec15 graph laplacian embedding
Lec15 graph laplacian embedding
 
Lec11 object-re-id
Lec11 object-re-idLec11 object-re-id
Lec11 object-re-id
 
Lec17 sparse signal processing & applications
Lec17 sparse signal processing & applicationsLec17 sparse signal processing & applications
Lec17 sparse signal processing & applications
 
Lec12 review-part-i
Lec12 review-part-iLec12 review-part-i
Lec12 review-part-i
 
Lec-03 Entropy Coding I: Hoffmann & Golomb Codes
Lec-03 Entropy Coding I: Hoffmann & Golomb CodesLec-03 Entropy Coding I: Hoffmann & Golomb Codes
Lec-03 Entropy Coding I: Hoffmann & Golomb Codes
 
Mobile Visual Search: Object Re-Identification Against Large Repositories
Mobile Visual Search: Object Re-Identification Against Large RepositoriesMobile Visual Search: Object Re-Identification Against Large Repositories
Mobile Visual Search: Object Re-Identification Against Large Repositories
 
Lec-08 Feature Aggregation II: Fisher Vector, AKULA and Super Vector
Lec-08 Feature Aggregation II: Fisher Vector, AKULA and Super VectorLec-08 Feature Aggregation II: Fisher Vector, AKULA and Super Vector
Lec-08 Feature Aggregation II: Fisher Vector, AKULA and Super Vector
 
Lec16 subspace optimization
Lec16 subspace optimizationLec16 subspace optimization
Lec16 subspace optimization
 


Scaled Eigen Appearance and Likelihood Prunning for Large Scale Video Duplicate Detection and Localization

  • 1. Robust Video Duplicates Detection & Localization in Large Video Repositories Zhu Li Dept of Computing The Hong Kong Polytechnic University Z. Li, HK Polytechnic Univ 1
  • 2. Robust Video Duplicates Localization in Large Repository
    ● Outline
      ● The duplicate and near-duplicate localization problem
        – The problem
        – The applications
        – The challenges
      ● Duplicate Likelihood Maximization Formulation
        – The duplicate likelihood Gaussian Process modeling
        – Duplicate Likelihood approximation via multi-indexed locality search
        – Sequence Likelihood pruning
      ● Simulation Results & Discussions
        – Data Set and Simulation Setup
        – Accuracy, Complexity and Tradeoffs
      ● Conclusion, Related & Future Work
  • 3. The Application & Problem
    ● Robust duplicate and near-duplicate video detection and localization has many applications
      ● Copyright protection
      ● Video repository mining – find out how video programs are related
      ● Query by capture
    ● The challenges:
      ● Robustness:
        – Be able to detect edited, corrupted and reformatted video duplicates
        – Be able to locate the duplicate clips in a very large repository
        – Scale with the length of the query clip
      ● Complexity:
        – The algorithm should scale well with the size of the repository
        – Complexity – Robustness tradeoffs
        – Parallelization
  • 4. The Duplicate Likelihood of a frame
    ● Duplicate frames are not exact copies of the original in the repository
      ● Otherwise the problem would be trivial: a hash function would give us the exact match
    ● Instead, duplicate frames can have various degrees of degradation and corruption due to:
      ● Coding and communication losses, e.g., quantization effects, packet losses
      ● Image formation variations: lighting change, affine transforms induced by the re-capture process, etc.
      ● Editing effects: subtitles, graphics and text overlay, re-sizing, cropping, etc.
  • 5. The Duplicate Likelihood of a frame
    ● This can be captured by a probabilistic model: for a given frame x, the likelihood that y is a duplicate of x is given by a Gaussian distribution:
      $$ L(y; x) \sim N(m_x, \sigma_x) $$
      $$ L(y; m_x, \sigma_x) = \frac{1}{(2\pi)^{d/2} |\sigma_x|^{1/2}} \exp\left( -\frac{1}{2} (y - m_x)^T \sigma_x^{-1} (y - m_x) \right) $$
    ● With a d x 1 mean vector $m_x$ and a d x d covariance matrix $\sigma_x$ in the feature space of choice
    ● The solution is actually feature agnostic.
      ● The choice of feature space is up to the application, and the amount of loss, corruption and editing effects in the video frames affects the covariance matrix.
      ● The choice of frame vs shot/GoP level features only affects the localization accuracy in this framework.
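As a rough illustration of the Gaussian duplicate likelihood on slide 5, the sketch below evaluates its logarithm in plain Python. It assumes a diagonal covariance (one variance per feature dimension), which the slides do not state; the function name and the toy vectors are hypothetical.

```python
import math


def log_duplicate_likelihood(y, m_x, var_x):
    """Log of the Gaussian duplicate likelihood L(y; m_x, sigma_x).

    Assumption for this sketch: sigma_x is diagonal, so var_x holds the
    d per-dimension variances instead of a full d x d matrix.
    """
    d = len(y)
    log_det = sum(math.log(v) for v in var_x)
    mahalanobis_sq = sum((yi - mi) ** 2 / v for yi, mi, v in zip(y, m_x, var_x))
    return -0.5 * (d * math.log(2.0 * math.pi) + log_det + mahalanobis_sq)


# Toy data: a repository frame, a lightly distorted copy, and an unrelated frame.
original = [0.0, 1.0, -2.0]
variances = [1.0, 1.0, 1.0]
near_copy = [0.1, 0.9, -2.1]
unrelated = [5.0, -4.0, 3.0]

print(log_duplicate_likelihood(near_copy, original, variances))
print(log_duplicate_likelihood(unrelated, original, variances))
```

Working in the log domain avoids underflow once per-frame likelihoods are multiplied over a whole sequence.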
  • 6. A Scaled Appearance Feature
    ● Icon frame and its PCA projection
      ● Video frame size: W=352, H=288
      ● Icon size: w=16, h=12
      ● PCA projection matrix A: d x 192
    [Figures: information loss vs. number of dimensions; basis functions]
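The feature pipeline on slide 6 (scale each frame to a 16x12 icon, then project the 192 icon values with a d x 192 matrix A) could look roughly like the sketch below. Block averaging as the scaling method is an assumption, as are the function names; the matrix used in the demo is a placeholder that simply keeps the first four icon values, not a trained PCA basis.

```python
def to_icon(frame, w=16, h=12):
    """Block-average a frame (H rows of W luma values) down to w x h.

    Assumes W and H are multiples of w and h, which holds for
    352x288 -> 16x12 (22x24 pixel blocks). Returns the icon flattened
    row by row, i.e. a list of w * h values.
    """
    H, W = len(frame), len(frame[0])
    bh, bw = H // h, W // w
    icon = []
    for r in range(h):
        for c in range(w):
            block = [frame[r * bh + i][c * bw + j]
                     for i in range(bh) for j in range(bw)]
            icon.append(sum(block) / len(block))
    return icon


def project(A, icon):
    """Apply a d x 192 projection matrix A (list of rows) to a flattened icon."""
    return [sum(a * v for a, v in zip(row, icon)) for row in A]


# Synthetic 352x288 frame and a placeholder projection matrix.
frame = [[(x + y) % 256 for x in range(352)] for y in range(288)]
icon = to_icon(frame)
A = [[1.0 if j == i else 0.0 for j in range(192)] for i in range(4)]
feature = project(A, icon)
print(len(icon), len(feature))
```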
  • 7. Likelihood Covariance Modeling
    ● The mean of the duplicate likelihood function is the original frame in the feature space
    ● The covariance is determined by the type of feature and the severity of the degradation of the duplicate frame; examples are shown for the JPEG quantization and AWGN cases
    [Figures: 32x32 covariance matrices and their eigenvalues for $\sigma_{quant}$ and $\sigma_{awgn}$]
  • 8. Likelihood mean process: x(t)
    ● Video sequence mean process $m_x(t)$ in PCA space, plotted on the 1st, 2nd and 3rd components
    ● Sequences shown:
      – “foreman”: 400 frames
      – “stefan”: 300 frames
      – “mother-daughter”: 300 frames
      – “mixed”: 40 shots of 60 frames each, from randomly selected sequences
    [Figure: 3-D trajectories of “foreman”, “stefan”, “mother-daughter” and “mixed” in PCA space]
  • 9. The Likelihood Function of a Video Sequence
    ● Given a video sequence x(t), t=1..n, the likelihood that a sequence y(t) is a duplicate of x(t) is given by
      $$ L(y(t); x(t)) = \prod_{t=1}^{n} L(y(t); m_x(t), \sigma_x(t)) $$
    ● where $m_x(t)$ is the mean and $\sigma_x(t)$ is the covariance matrix of the likelihood function at time t.
    ● The video duplicate detection problem is then a maximum likelihood problem: given k=1..K video sequences { x_k(t) } in the repository, the duplicate is found by maximum likelihood detection:
      $$ k^* = \arg\max_k \prod_{t=1}^{n} L(y(t); m_k(t), \sigma_k(t)) $$
      $$ m_k(t) = x_k(t), \qquad \sigma_k^2(t) = \sigma_{quant}^2 + \sigma_{loss}^2 + \sigma_{gamma}^2 + \sigma_{edit}^2 $$
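The maximum likelihood detection on slide 9 can be sketched as a brute-force search that sums per-frame log-likelihoods, which is equivalent to maximizing the product. This is only an illustration under simplifying assumptions not in the slides: an isotropic covariance shared by all frames, and a query already aligned with the start of each repository sequence. All names and data are hypothetical.

```python
import math


def frame_log_likelihood(y, m, var):
    """Log Gaussian likelihood with isotropic covariance var * I (an assumption)."""
    d = len(y)
    sq_dist = sum((a - b) ** 2 for a, b in zip(y, m))
    return -0.5 * (d * math.log(2.0 * math.pi * var) + sq_dist / var)


def most_likely_sequence(query, repository, var=1.0):
    """Return (k*, scores): the sequence id maximizing the summed log-likelihood."""
    scores = {
        k: sum(frame_log_likelihood(y, m, var) for y, m in zip(query, seq))
        for k, seq in repository.items()
    }
    return max(scores, key=scores.get), scores


# Toy repository of two 3-frame sequences in a 2-D feature space.
repository = {
    "a": [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],
    "b": [[5.0, 5.0], [6.0, 6.0], [7.0, 7.0]],
}
query = [[0.1, -0.1], [1.2, 0.9], [2.0, 2.1]]
best, scores = most_likely_sequence(query, repository)
print(best)
```

This exhaustive evaluation is the baseline that the pruning algorithm on the following slides is designed to avoid.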
  • 10. Illustration of the Duplicate Likelihood Function for sequences
    ● A set of 6 sequences and their likelihood functions in R^2(t): for a given query, we need to find out which one gives the maximum likelihood.
    ● To untangle the mess...
  • 11. The Likelihood Pruning Algorithm
    ● Brute-force computation of the likelihood function for all candidate sequences in the repository is computationally prohibitive
      ● E.g., if we have 1 million clips of 5-minute video, the total number of likelihood evaluations would be 10^6 x 5 x 60 x 30 = 9 billion.
    ● Instead, we can start with the set of sequences that share non-zero likelihood at a certain timestamp, and prune those sequences that have zero likelihood at other timestamps.
    ● The key is to approximate the duplicate likelihood estimation with a non-zero-likelihood detection, implemented via a multi-indexed locality search
    ● The sequence duplicate likelihood function is approximated by a pruning process over a subset ts of the time stamps, finding the matching duplicate sequences as
      $$ S = \{ k \mid \prod_{t \in ts} L(y(t); x_k(t), \sigma_k(t)) > 0 \} $$
  • 12. The Likelihood Pruning Algorithm
    ● This leads to a likelihood pruning algorithm:
      ● Given query clip y(t), compute a time stamp set ts = {t_1, t_2, …, t_m}
      ● Init $S_1 = \{ k \mid L(y(t_1); x_k(t_1), \sigma(t_1)) > 0 \}$
      ● FOR i = 2:m
          $S_i = S_{i-1} \cap \{ k \mid L(y(t_i); x_k(t_i), \sigma(t_i)) > 0 \}$
          Break if $S_i$ has 1 or 0 elements left.
      ● END
    ● Criteria of ts selection:
      ● The selected time stamps should reflect maximum spatial-temporal diversity in the query clip y(t), i.e., find m points on y(t) that have the maximum possible separation among them; otherwise the search degenerates into an image search problem
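The pruning loop on slide 12 amounts to intersecting, timestamp by timestamp, the sets of sequences that still have non-zero likelihood. A minimal sketch follows; the `nonzero_set` callback stands in for the index lookup, and the toy detector (a distance threshold on scalar features, with the query aligned to the repository timeline) is purely an assumption for illustration.

```python
def prune_candidates(query, timestamps, nonzero_set):
    """Intersect non-zero-likelihood candidate sets over the chosen timestamps.

    nonzero_set(y_t, t) must return the set of sequence ids whose frame at
    time t has non-zero duplicate likelihood for the query frame y_t.
    Stops early once at most one candidate is left.
    """
    survivors = None
    for t in timestamps:
        hits = nonzero_set(query[t], t)
        survivors = hits if survivors is None else survivors & hits
        if len(survivors) <= 1:
            break
    return survivors if survivors is not None else set()


# Toy repository with scalar features per frame (hypothetical data).
repository = {"a": [0, 1, 2, 3], "b": [0, 1, 9, 9], "c": [7, 7, 7, 7]}


def toy_nonzero_set(y_t, t, tol=0.5):
    # Treat the likelihood as zero when the feature differs by more than tol.
    return {k for k, seq in repository.items() if abs(seq[t] - y_t) <= tol}


query = [0.1, 1.1, 2.0, 3.2]
print(prune_candidates(query, [0, 2, 3], toy_nonzero_set))
```

In this toy run the first timestamp cannot separate "a" from "b", while the second one does, so the loop stops before the third lookup.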
  • 13. Timestamp Selection Heuristic
    ● We want the selected matching points to have the best spatial-temporal diversity, i.e., we don't want video duplicate search to degrade into image duplicate search.
    ● An effective heuristic solution: Curve Length Guided Selection
      ● Compute the cumulative curve length of y(t) by fitting a spline model y(s) on y(t):
        $$ L_y(t) = \int_{s=0}^{t} \| y'(s) \| \, ds $$
      ● To obtain m time stamps, choose ts(k) at equal curve-length steps:
        $$ L_y(ts(k)) = k\Delta, \quad k = 1..m, \qquad \Delta = \frac{L_y(n)}{m} $$
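The curve-length guided selection on slide 13 can be approximated without a spline fit by treating the feature trajectory as a polyline; that substitution, and the function name, are assumptions of this sketch. It picks the first frame index at which the cumulative length reaches each multiple of the step size.

```python
import math


def select_timestamps(y, m):
    """Pick m frame indices at roughly equal arc-length steps along y.

    y is a list of feature vectors, one per frame. The cumulative curve
    length is approximated by summing distances between consecutive
    frames (a polyline instead of the spline mentioned on the slide).
    """
    cum = [0.0]
    for prev, cur in zip(y, y[1:]):
        cum.append(cum[-1] + math.dist(prev, cur))
    step = cum[-1] / m
    picks, t = [], 0
    for k in range(1, m + 1):
        # Advance to the first frame whose cumulative length reaches k * step.
        while t < len(cum) - 1 and cum[t] < k * step:
            t += 1
        picks.append(t)
    return picks


# A clip that is static for five frames and then moves steadily.
clip = [[0.0, 0.0]] * 5 + [[float(i), 0.0] for i in range(1, 5)]
print(select_timestamps(clip, 4))
```

Uniform sampling in time would spend most of its picks on the static part of this clip, where all frames look alike; sampling by curve length concentrates them where the content changes.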
  • 14. Likelihood approximation via Multi-Indexing Locality
    ● We need to find the frames that have non-zero likelihood for a query frame y(t):
      $$ S(y) = \{ x \mid L(y; m_x, \sigma_x) > 0 \} $$
      This can be approximated by an epsilon-NN search on $m_x$, i.e.,
      $$ S(y) = \{ x \mid d_{\sigma_x}(x, y) \le d_0 \} $$
    ● $d_0$ is the cut-off Mahalanobis distance beyond which the likelihood is treated as zero.
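The epsilon-NN approximation on slide 14 can be illustrated with a linear scan that keeps every repository frame within Mahalanobis distance d0 of the query. The sketch assumes one diagonal covariance shared by all frames, whereas the slide allows a per-frame covariance; names and data are hypothetical, and a real system would replace the scan with an index lookup.

```python
import math


def mahalanobis(x, y, var):
    """Mahalanobis distance for a diagonal covariance given as variances."""
    return math.sqrt(sum((a - b) ** 2 / v for a, b, v in zip(x, y, var)))


def nonzero_likelihood_frames(y, frames, var, d0):
    """Indices of repository frames within the cut-off distance d0 of y."""
    return [i for i, x in enumerate(frames) if mahalanobis(x, y, var) <= d0]


frames = [[0.0, 0.0], [1.0, 0.0], [0.0, 4.0], [10.0, 10.0]]
variances = [1.0, 4.0]  # the second dimension is assumed noisier
print(nonzero_likelihood_frames([0.0, 0.0], frames, variances, d0=2.0))
```

The larger variance in the second dimension lets a frame that is four units away along that axis still count as a candidate, which is the effect of using the Mahalanobis rather than the Euclidean distance.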
  • 15. Likelihood approximation via Multi-Indexing Locality
    ● This can be approximated with a multi-indexing structure by searching for the leaf nodes containing frame y:
      $$ S(y) = \bigcup_j F_j(y) $$
      where $F_j(y)$ returns the frame set of the leaf node containing y in index structure j. By choosing an appropriate indexing height we can control the leaf node volume and therefore $d_0$.
    ● However, a single indexing structure is not a good approximation: if the query frame y is not at the centroid of the leaf node, then many non-zero likelihood frames will be missed.
  • 16. Multi-Index Likelihood Approximation
    ● Graphical illustration
      ● A query frame y, marked “+”, is at the northwest corner of its leaf node F1(y) in index scheme 1. Non-zero likelihood frames outside the red rectangle will be missed
      ● But the leaf nodes of the other 3 indexing schemes, in black, can cover this loss
      ● Careful selection of the indexing schemes and leaf node volume can give a good approximation of the likelihood function
    [Figure: query frame “+”, leaf nodes of four index schemes, and the zero-likelihood cut-off region]
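The union of leaf nodes from several index structures can be sketched as below. Uniform grids over different dimension subsets stand in for the kd-trees of the talk; that replacement, the cell size, and all names are assumptions made only to keep the example short.

```python
import math
from collections import defaultdict


def leaf_key(x, dims, cell):
    """Grid cell (stand-in for a kd-tree leaf) of x over the chosen dimensions."""
    return tuple(math.floor(x[d] / cell) for d in dims)


def build_index(frames, dims, cell):
    """Bucket frame ids by their grid cell over the chosen dimensions."""
    buckets = defaultdict(set)
    for i, x in enumerate(frames):
        buckets[leaf_key(x, dims, cell)].add(i)
    return buckets


def multi_index_lookup(y, indexes, cell):
    """Union of the leaf buckets containing y across all index structures."""
    found = set()
    for dims, buckets in indexes:
        found |= buckets.get(leaf_key(y, dims, cell), set())
    return found


frames = [
    [0.9, 0.5, 0.5, 0.5],
    [1.1, 0.5, 0.5, 0.5],  # close to frame 0, but across a cell boundary in dim 0
    [5.5, 5.5, 5.5, 5.5],
]
cell = 1.0
index_a = ((0, 1), build_index(frames, (0, 1), cell))
index_b = ((2, 3), build_index(frames, (2, 3), cell))

query = [0.95, 0.5, 0.5, 0.5]
print(multi_index_lookup(query, [index_a], cell))
print(multi_index_lookup(query, [index_a, index_b], cell))
```

With a single index the neighbour just across the cell boundary is missed; the second index recovers it. The union can also admit frames that match on only some dimensions, which the pruning over further timestamps is then expected to remove.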
  • 17. Likelihood Pruning
    ● Taking advantage of the temporal dimension constraint, the sequence duplicate likelihood $L_k(y)$ becomes zero if any frame's duplicate likelihood is zero:
      $$ L_k(y) = \prod_{t=1}^{n} L(y(t); m_k(t), \sigma_k(t)) $$
      $$ L_k(y) = 0, \quad \text{if } \exists\, t \ \text{s.t.}\ L(y(t); m_k(t), \sigma_k(t)) = 0 $$
    ● This means that if, at a certain timestamp t1, the duplicate likelihood of y(t1) is zero w.r.t. sequence k in the repository, we can remove sequence k from the candidate set.
  • 18. Likelihood Trellis Pruning
    ● Illustration of the process:
      ● Start with the initial non-zero likelihood set
      ● Prune those having zero likelihood in the next stage of likelihood evaluation
      ● Stop when only one trellis, or none, is left in the node (in simulation, 3~5 stages are typically enough to locate a 2~4 minute clip in a 116 hour repository)
    [Figure: trellis of candidate sets S(t1), S(t2), S(t3), S(t4), S(t5), S(t6)]
  • 19. Simulation
    ● Data Set
      ● Indexed Set:
        – 12 x 2^20 frames, or approx. 116 hours of video from various sources
        – Each frame is scaled to 16x12 icon size and projected to a 32-dimensional space
        – Four kd-tree index structures are built, on dimensions 1~8, 9~16, 17~24 and 25~32, to provide diversity in the approximate likelihood function
        – Alternative approach: a hierarchical quadtree structure that can give a better likelihood approximation.
      ● Negative Probe Set:
        – 5 hours of video from the TRECVID shot detection set
        – Scaled to 16x12 icons and projected to the 32-dimensional space
      ● Distortions/Image Formation Variations in query
        – Quantization induced distortions: JPEG quality 30, 40, ..., 90.
        – Block losses: random block loss at rates of 2%~10%
        – Blurs, Gamma Corrections
        – AWGN: equivalent to a 20dB~40dB PSNR range
  • 20. Performance - Accuracy
    ● Randomly generated 4000 negative probes and 8000 positive probes with various degrees of corruption
    ● Probe clip length = 1, 2 and 4 minutes
    ● Recall = 100%.
    ● The precision performance for the equivalent AWGN range of 20~40 dB in PSNR:
    [Figure: precision vs. PSNR; diversity in likelihood modeling improves accuracy]
  • 21. Complexity Analysis
    ● Duplicate Likelihood Estimation
      ● Basically O(L) for a repository indexed with an L-level kd-tree.
        – Notice that if we allow some sacrifice in duplicate localization accuracy, GoP level feature indexing can reduce the problem set size by 15 times.
      ● If we involve multiple indexing structures, we need to merge multiple localities to come up with the non-zero likelihood repository frame estimate for the given probe.
      ● The robustness of the likelihood estimation is tied to the index tree height and the covariances of the distortions in the probe
    ● Likelihood Pruning
      ● The pruning stops if there is only one trellis left, or no trellis.
      ● Starting with an initial set, the size of the candidate trellis set decreases
      ● Complexity is content dependent.
    ● Parallelization
      ● The index search can be parallelized, as each index structure is independent.
      ● Pruning
  • 22. Performance - Complexity
    ● Computing resource
      ● Pure Matlab implementation, not optimized thoroughly.
      ● Lenovo ThinkCentre running Linux, 3.0GHz processor, 4GB memory
    ● Index complexity:
      ● Building an 8-dimension kd-tree with 2^14 leaf nodes over the 12 x 2^20 frames takes approximately 63~65 seconds on a Linux PC with a 3.0GHz processor.
      ● Each leaf node has 768 frames.
      ● There is a tradeoff between the size of leaf nodes and the robustness of the likelihood estimation
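The repository figures quoted on slides 19 and 22 can be cross-checked with a few lines of arithmetic. The frame rate of 30 frames per second is an assumption here, taken from the 5 x 60 x 30 frame count used on slide 11.

```python
frames = 12 * 2**20          # indexed set size from slide 19
leaves = 2**14               # kd-tree leaf nodes from slide 22
fps = 30                     # assumed frame rate (see slide 11)

frames_per_leaf = frames // leaves
hours = frames / fps / 3600

print(frames_per_leaf)       # frames stored in each leaf node
print(round(hours, 1))       # repository duration in hours
```

Both results agree with the slides: 768 frames per leaf and roughly 116 hours of video.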
  • 23. Performance - Complexity
    ● Likelihood evaluation complexity, 5 time stamps:
      ● For a given duplicate frame, finding all its non-zero likelihood repository frames takes on average 0.13 ms for one index structure
      ● For the multi-index structure, finding and merging the non-zero likelihood repository frames takes approx. 1.01 ms for the 2 index case, and 2.20 ms for the 3 index case.
      ● Not much precision performance gain for the 4 index case, so not reported.
    ● Likelihood pruning complexity
      ● For the 5 timestamp case: 0.67, 0.94, 0.94 ms for 1, 2, and 3 index structures
      ● Not a significant source of complexity; we can afford more elaborate pruning sequences
    ● Total time complexity:
      ● Within 3 ms for the 116 hour repository
      ● The goal is to handle 10,000 hours within 10 ms, which will probably need a more elaborate multiple-pass likelihood pruning process
  • 24. Conclusion & Future Work
    ● Conclusion
      ● A feature agnostic and flexible likelihood modeling and pruning scheme for video (and also audio) duplicate detection and localization
      ● Very high accuracy in precision-recall performance, roughly 98% precision on 100% recall, to localize 2~4 minute probes within 3 seconds in a repository of 116 hours
      ● Good response time: takes only 3 ms for the 116 hour repository, and on track to achieve 10 ms response time for a 10,000 hour repository (need some help in getting the data set).
      ● Flexible scalability in performance across computational complexity, robustness and localization accuracy.
    ● Future Work
      ● More sophisticated likelihood pruning schemes
      ● Subspace learning to minimize the distortion induced likelihood covariance
      ● Hierarchical approach to localization accuracy, i.e., shot level color/SIFT histogram features + frame level appearance features in a mixed indexing.
  • 25. Acknowledgement
    ● The work is supported with generous grants from MSRA and HK RGC.
    ● Thanks for the stimulating discussions with Xian-Sheng, Lei Zhang, Mei Tao, and Lingjun.
    ● Questions?
    ● If time permits...
  • 26. Related Research – Graph Indexing
    ● Model duplicate/object/biometric recognition problems as a graph matching problem, G(V,E)
      ● Bag of words approach – treat V as a set
      ● Computational Topology approach – model two affinity matrices of G
        – A_pos(j,k) = | pos(v_j) – pos(v_k) |
        – A_fea(j,k) = d_fea(v_j, v_k)
        – Then use the eigenvalues of the Laplacians of A_pos and A_fea as the feature vector for indexing/hashing
    [Figure: 4 x 4 grid of panels labelled f1-s1 … f4-s4]
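For reference, one common way to form the Laplacian of an affinity matrix is the unnormalized graph Laplacian sketched below. The slide does not say which variant is used, so this choice is an assumption; A stands for either A_pos or A_fea.

```latex
% Unnormalized graph Laplacian of an affinity matrix A (assumed variant)
D_{jj} = \sum_{k} A(j,k), \qquad L = D - A
% Feature vector: the eigenvalues of L, e.g. sorted in ascending order
L u_i = \lambda_i u_i, \qquad f = (\lambda_1, \lambda_2, \dots, \lambda_{|V|})
```

The eigenvalues do not change when the vertices are relabelled, which is one reason a spectrum can serve as an indexing feature for graph matching.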
  • 27. Related Work – Subspace Indexing on Grassmann Manifold
    ● Subspace Indexing on Grassmann Manifold
      ● Motivation: a rich set of subspaces for recognition problems with large subject sets
      ● Judicious use of local discriminative models over local data patches
      ● Dual tree structure – Data Partition Tree (DPT) and Model Hierarchical Tree (MHT)
      ● The DPT partitions the data set into local patches, and local discriminative models are trained on the local patches
      ● Then VQ on the subspaces on the G(D, d) Grassmann manifold, with a data patch locality constraint, which in the process also generates the MHT
      ● Label the model accuracy on each node in the MHT
      ● At query time, the probe searches the DPT to locate its local data patch, then backtracks to the root, finding the best model along the way.
  • 28. Subspace Indexing on Grassmann Manifold
    [Figures: Model Hierarchical Tree; performance on face recognition]