Performance over Random: A Robust Evaluation Protocol for Video Summarization Methods
E. Apostolidis (1,2), E. Adamantidou (1), A. I. Metsai (1), V. Mezaris (1), I. Patras (2)
(1) CERTH-ITI, Thermi - Thessaloniki, Greece
(2) School of EECS, Queen Mary University of London, London, UK
28th ACM Int. Conf. on Multimedia, Seattle, WA, USA, October 2020

Outline
- What's the goal of video summarization?
- How to evaluate video summarization?
- Established evaluation protocol and its weaknesses
- Proposed approach: Performance over Random
- Experiments
- Conclusions

What's the goal of video summarization?
A video summary is a short visual synopsis that encapsulates the flow of the story and the essential parts of the full-length video. Two summary formats are considered: 1) the video storyboard and 2) the video skim.

How to evaluate video summarization? (Evaluating video skims)
- An evaluation approach, along with a benchmark dataset for video summarization, was introduced in [11].
- SumMe dataset (https://gyglim.github.io/me/vsum/index.html#benchmark):
  - 25 videos capturing multiple events (e.g. cooking and sports)
  - video length: 1 to 6 min
  - annotation: fragment-based video summaries (15-18 per video)

[11] M. Gygli, H. Grabner, H. Riemenschneider, L. Van Gool. 2014. Creating Summaries from User Videos. In Proc. of the 2014 European Conf. on Computer Vision (ECCV), D. Fleet, T. Pajdla, B. Schiele, T. Tuytelaars (Eds.). Springer International Publishing, Cham, 505–520.

How to evaluate video summarization? (Evaluating video skims)
- Agreement between the automatically-generated summary (A) and a user-defined summary (U) is expressed by the F-Score (%), with Precision (P) and Recall (R) measuring their temporal overlap (∩) at the frame level.
- 80% of the video samples are used for training and the remaining 20% for testing.
- Typically, the generated summary must not exceed 15% of the video length.
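
The Precision/Recall/F-Score formulas on this slide were embedded as an image and lost in extraction; reconstructed here are the standard frame-level definitions used by this protocol, with A and U denoting the frame sets of the automatic and the user summary:

\[
P = \frac{|A \cap U|}{|A|}, \qquad
R = \frac{|A \cap U|}{|U|}, \qquad
F\text{-}Score = \frac{2 \cdot P \cdot R}{P + R} \times 100\%
\]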

How to evaluate video summarization? (Evaluating video skims)
- This protocol was also used to evaluate summarization on another benchmark dataset [12].
- TVSum dataset (https://github.com/yalesong/tvsum):
  - 50 videos from 10 categories of the TRECVid MED task
  - video length: 1 to 11 min
  - annotation: frame-level importance scores (20 per video)

[12] Y. Song, J. Vallmitjana, A. Stent, A. Jaimes. 2015. TVSum: Summarizing Web Videos Using Titles. In Proc. of the 2015 IEEE Conf. on Computer Vision and Pattern Recognition (CVPR). 5179–5187.

Established evaluation protocol (typical setting in the bibliography)
- Most-used benchmark datasets: SumMe and TVSum.
- Alignment between automatically-created and user-defined summaries is quantified by the F-Score; the maximum of the computed values is kept for SumMe, their average for TVSum.
- The summary length should be less than 15% of the video duration.
- 80% of the data is used for training (plus validation) and the remaining 20% for testing.
- Most works perform evaluations using 5 different randomly-created data splits and report the average performance, though variations of this setting (1 split, 10 splits, "few" splits, 5-fold cross-validation) exist.
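
As a concrete illustration of this setting, below is a minimal sketch of how such random 80/20 splits are typically generated; the function and key names (make_splits, train_keys, test_keys) are illustrative assumptions, not the authors' released code:

```python
import json
import random

def make_splits(n_videos, n_splits=5, train_ratio=0.8, seed=42):
    """Create n_splits random train/test partitions of video indices (illustrative sketch)."""
    rng = random.Random(seed)
    splits = []
    for _ in range(n_splits):
        indices = list(range(n_videos))
        rng.shuffle(indices)
        cut = int(train_ratio * n_videos)
        splits.append({"train_keys": sorted(indices[:cut]),
                       "test_keys": sorted(indices[cut:])})
    return splits

# e.g. SumMe has 25 videos -> 20 training / 5 test videos per split
print(json.dumps(make_splits(25)[0]))
```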

Studying the established protocol: setting of the study
- Considered aspects:
  - representativeness of results when the evaluation relies on a small set of randomly-created splits
  - reliability of performance comparisons that use different data splits for each algorithm
- Used algorithms:
  - supervised: dppLSTM [14] and VASNet [15]
  - unsupervised: DR-DSN [16], SUM-GAN-sl [17] and SUM-GAN-AAE [18]
- First experiment: performance evaluation using a fixed set of 5 randomly-created data splits of SumMe and TVSum.
- Second experiment: performance evaluation using a fixed set of 50 randomly-created data splits of SumMe and TVSum.
- Plus: comparison with the values reported in the corresponding papers.

Studying the established protocol: outcomes
(Values denote F-Score (%); "Rep." is the value reported in the relevant paper; in the accompanying table, the best score is in bold and the second-best is underlined.)
- Noticeable difference between the evaluation results on 5 and on 50 splits; the differences between 5 and 50 splits are often larger than the differences between methods.
- The methods' rankings differ between 5 and 50 splits, and neither matches the ranking based on the reported results.
=> Limited representativeness of results when the evaluation relies on a few data splits, and a serious lack of reliability for comparisons that rely on a limited number of data splits.

Studying the established protocol
- Noticeable variability of performance over the set of splits.
- The variability follows a quite similar pattern for all methods.
=> Hypothesis: the used splits have different levels of difficulty.

How to mitigate the observed weaknesses? Reduce the impact of the used data splits:
- Check for a potential association between a method's performance and a measure of how challenging each data split is.
- Using the same data splits, examine the performance of:
  - a Random Summarizer
  - an Average Human Summarizer

Estimate random performance
For a given video of a test set:
1) Generate random frame-level importance scores, drawn from a uniform distribution.
2) Derive fragment-level importance scores from them.
3) Form the summary of the random summarizer by selecting fragments with the Knapsack algorithm.
4) Compare the random summary with each of the N user-generated summaries, obtaining F-Score_1, F-Score_2, ..., F-Score_N; the F-Score for the video is max{F-Score_i, i=1..N} (for SumMe) or avg{F-Score_i, i=1..N} (for TVSum).
For the entire test set of a data split: compute the F-Score for each of the M test videos as above and average them to get the test-set F-Score; repeat the whole procedure 100 times and average the outcomes.
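
A compact sketch of steps 1-4 for a single video, assuming fragments are given as (start, end) frame-index pairs and user summaries as binary frame vectors; the averaging of frame scores into fragment scores and all names are assumptions in the spirit of the protocol, not the authors' released code (linked at the end of the deck):

```python
import numpy as np

def f_score(summary, user_summary):
    """Frame-level F-Score (%) between two binary frame vectors."""
    overlap = int(np.logical_and(summary, user_summary).sum())
    if overlap == 0:
        return 0.0
    precision = overlap / summary.sum()
    recall = overlap / user_summary.sum()
    return 2 * precision * recall / (precision + recall) * 100

def knapsack_select(values, lengths, budget):
    """0/1 knapsack: pick the fragments maximizing total importance
    while keeping the total length (in frames) within the budget."""
    n = len(values)
    dp = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for c in range(budget + 1):
            dp[i][c] = dp[i - 1][c]
            if lengths[i - 1] <= c:
                dp[i][c] = max(dp[i][c],
                               dp[i - 1][c - lengths[i - 1]] + values[i - 1])
    chosen, c = [], budget          # backtrack to recover the selected fragments
    for i in range(n, 0, -1):
        if dp[i][c] != dp[i - 1][c]:
            chosen.append(i - 1)
            c -= lengths[i - 1]
    return chosen

def random_summary_f_score(fragments, n_frames, user_summaries,
                           eval_mode="max", rng=None):
    """Steps 1-4 of the random-performance estimation for one video."""
    rng = rng or np.random.default_rng()
    frame_scores = rng.uniform(size=n_frames)                           # 1) random frame scores
    frag_scores = [frame_scores[s:e + 1].mean() for s, e in fragments]  # 2) fragment scores
    frag_lens = [e - s + 1 for s, e in fragments]
    chosen = knapsack_select(frag_scores, frag_lens,
                             int(0.15 * n_frames))                      # 3) Knapsack, 15% budget
    summary = np.zeros(n_frames, dtype=int)
    for i in chosen:
        s, e = fragments[i]
        summary[s:e + 1] = 1
    scores = [f_score(summary, u) for u in user_summaries]              # 4) compare with users
    return max(scores) if eval_mode == "max" else float(np.mean(scores))  # max: SumMe, avg: TVSum
```

In the full protocol, this per-video score would be averaged over the M test videos of a split, and the whole procedure repeated 100 times and averaged, to obtain a stable estimate of random performance for the split.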

Estimate average human performance
For a given video of a test set:
1) For each user i (i = 1, ..., N), compare their summary with the summaries of the remaining users, obtaining F-Score_i2, F-Score_i3, ..., and aggregate these into the user's score F-Score_i.
2) Average the per-user scores F-Score_1, ..., F-Score_N to get the F-Score for the video.
For the entire test set of a data split: compute the F-Score for each of the M test videos as above and average them to get the final F-Score.
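
A matching sketch for the human baseline on one video, reusing f_score from the sketch above; the max/avg aggregation of each user's leave-one-out scores is an assumption aligned with the established protocol:

```python
import numpy as np

def average_human_f_score(user_summaries, eval_mode="max"):
    """Average human performance on one video: each user's summary is scored
    against the summaries of all other users, then averaged over users."""
    per_user = []
    for i, own in enumerate(user_summaries):
        vs_others = [f_score(own, other)  # f_score defined in the sketch above
                     for j, other in enumerate(user_summaries) if j != i]
        per_user.append(max(vs_others) if eval_mode == "max"
                        else float(np.mean(vs_others)))
    return float(np.mean(per_user))  # averaging this over the M test videos gives H
```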

Updated performance curve
- Noticeable variance in the performance of the random and the human summarizer across splits.
=> Confirms the different levels of difficulty of the used splits.

How to decide on the most suitable measure? Correlation with the performance of the random and the human summarizer
- Covariance: a measure of the joint variability of two random variables, defined for two jointly distributed real-valued random variables X and Y with finite second moments (see below).
- Pearson Correlation Coefficient: a normalized version of the covariance that indicates, via its magnitude (a value in [0, 1]), the strength of the linear relation.
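
The formulas shown as images on this slide were lost in extraction; reconstructed here are the standard definitions (note that the Pearson coefficient itself lies in [-1, 1], so its magnitude lies in [0, 1]):

\[
\operatorname{cov}(X, Y) = \mathbb{E}\big[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])\big],
\qquad
\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}
\]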

How to decide on the most suitable measure? Correlation with the performance of the random and the human summarizer
=> In terms of performance, there is a clearly stronger correlation of the tested methods with the random summarizer.

Proposed approach: Performance over Random (PoR)
Core idea:
- Estimate the difficulty of a data split by computing the performance of a random summarizer on it.
- Exploit this information when using the data split to assess a video summarization algorithm.
Main targets:
- Reduce the impact of the used data splits on the performance evaluation.
- Increase the representativeness of evaluation outcomes.
- Enhance the reliability of comparisons based on different data splits.

Proposed approach: Performance over Random (PoR): computing steps
For a given summarization method and a data split:
1) Compute Ƒ, the F-Score (%) of a random summarizer for this split.
2) Compute S, the method's F-Score (%) on the data split, based on the established evaluation protocol.
3) Compute "Performance over Random" as:

   PoR = (S / Ƒ) · 100

PoR < 100: performance worse than the (random) baseline; PoR > 100: performance better than the (random) baseline.
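
As a sketch, the metric is a single ratio (variable names mirror the slide's symbols):

```python
def performance_over_random(s, f_random):
    """PoR: the method's F-Score S relative to the random summarizer's F-Score F (both in %)."""
    return s / f_random * 100

performance_over_random(42.0, 40.0)  # -> 105.0: 5% better than random on this split
```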

Experiments: representativeness of performance evaluation
- Considered evaluation approaches:
  - estimate performance using the F-Score
  - estimate performance using Performance over Random (PoR)
  - estimate performance using Performance over Human, PoH = (S / H) · 100, where H is the performance of the average human summarizer on the split
- The methods' performance was examined on:
  - the large-scale setting of 50 fixed splits
  - 20 fixed split-sets of 5 data splits each
- Main focus: to what extent the methods' performance varies across the different data splits / split-sets.
- Used measure: the Relative Standard Deviation, RSD(x) = STD(x) / Mean(x).
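
A minimal sketch of this dispersion measure; np.std defaults to the population STD, and whether the paper uses the population or the sample STD is not stated on the slide:

```python
import numpy as np

def rsd(per_split_scores):
    """Relative Standard Deviation of a method's scores over a set of splits."""
    x = np.asarray(per_split_scores, dtype=float)
    return x.std() / x.mean()
```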

Experiments: representativeness of performance evaluation
- Similar RSD values for F-Score and PoH in most cases.
- Remarkably smaller RSD values for PoR.
- Reminder: the results need to vary as little as possible!
=> PoR is more representative of an algorithm's performance.

Experiments: reliability of performance comparisons (generation of mixed split-sets)
- Performance comparisons in the bibliography rely on the values reported in the relevant papers, and the used data splits are completely unknown; but the data splits can affect the evaluation outcomes!
- To assess the robustness of each evaluation protocol to such comparisons, we simulated 20 of them by creating 20 mixed split-sets and ranking the methods from best to worst.

Experiments: reliability of performance comparisons
- For each method, we studied: i) its overall ranking and ii) the variation of its ranking, when using: i) the 20 fixed split-sets and ii) the 20 mixed split-sets.
- Variation was quantified by computing the STD of a method's ranking over the group of split-sets.
=> PoR is much more robust than F-Score.

Experiments: reliability of performance comparisons
Using the same (fixed) split-sets:
- Same average ranking for all methods under both evaluation protocols.
Using different (mixed) split-sets:
- The average ranking may differ, as PoR considers the difficulty of each split-set.
- The STD of the average ranking differs significantly between F-Score and PoR, with lower STD values for PoR.
=> PoR is more suitable for comparing methods run on different split-sets.

Conclusions
- Early experiments documented the varying difficulty of the different randomly-created data splits of the established benchmark datasets; most state-of-the-art works use just a handful of different splits for evaluation.
- This varying difficulty significantly affects the evaluation results and the reliability of performance comparisons that rely on reported values.
- New evaluation protocol: Performance over Random (PoR), which takes into consideration estimates of the level of difficulty of each used data split.
- Experiments documented the increased robustness of PoR over the F-Score and its suitability for comparing methods run on different split-sets.

References
1. S. E. F. de Avila, A. da Luz Jr., A. de A. Araújo, M. Cord. 2008. VSUMM: An Approach for Automatic Video Summarization and Quantitative Evaluation. In Proc. of the 2008 XXI Brazilian Symposium on Computer Graphics and Image Processing. 103–110.
2. N. Ejaz, I. Mehmood, S. W. Baik. 2014. Feature Aggregation Based Visual Attention Model for Video Summarization. Computers and Electrical Engineering 40, 3 (2014), 993–1005. Special Issue on Image and Video Processing.
3. V. Chasanis, A. Likas, N. Galatsanos. 2008. Efficient Video Shot Summarization Using an Enhanced Spectral Clustering Approach. In Proc. of the Artificial Neural Networks - ICANN 2008, V. Kurková, R. Neruda, J. Koutník (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 847–856.
4. S. E. F. de Avila, A. P. B. Lopes, A. da Luz Jr., A. de A. Araújo. 2011. VSUMM: A Mechanism Designed to Produce Static Video Summaries and a Novel Evaluation Method. Pattern Recognition Letters 32, 1 (Jan. 2011), 56–68.
5. J. Almeida, N. J. Leite, R. da S. Torres. 2012. VISON: VIdeo Summarization for ONline Applications. Pattern Recognition Letters 33, 4 (March 2012), 397–409.
6. E. J. Y. C. Cahuina, G. C. Chavez. 2013. A New Method for Static Video Summarization Using Local Descriptors and Video Temporal Segmentation. In Proc. of the 2013 XXVI Conf. on Graphics, Patterns and Images. 226–233.
7. N. Ejaz, T. Bin Tariq, S. W. Baik. 2012. Adaptive Key Frame Extraction for Video Summarization Using an Aggregation Mechanism. Journal of Visual Communication and Image Representation 23, 7 (Oct. 2012), 1031–1040.
8. H. Jacob, F. L. Pádua, A. Lacerda, A. C. Pereira. 2017. A Video Summarization Approach Based on the Emulation of Bottom-up Mechanisms of Visual Attention. Journal of Intelligent Information Systems 49, 2 (Oct. 2017), 193–211.
9. K. M. Mahmoud, N. M. Ghanem, M. A. Ismail. 2013. Unsupervised Video Summarization via Dynamic Modeling-Based Hierarchical Clustering. In Proc. of the 12th Int. Conf. on Machine Learning and Applications, Vol. 2. 303–308.
10. B. Gong, W.-L. Chao, K. Grauman, F. Sha. 2014. Diverse Sequential Subset Selection for Supervised Video Summarization. In Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, K. Q. Weinberger (Eds.). Curran Associates, Inc., 2069–2077.
11. M. Gygli, H. Grabner, H. Riemenschneider, L. Van Gool. 2014. Creating Summaries from User Videos. In Proc. of the 2014 European Conf. on Computer Vision (ECCV), D. Fleet, T. Pajdla, B. Schiele, T. Tuytelaars (Eds.). Springer International Publishing, Cham, 505–520.
12. Y. Song, J. Vallmitjana, A. Stent, A. Jaimes. 2015. TVSum: Summarizing Web Videos Using Titles. In Proc. of the 2015 IEEE Conf. on Computer Vision and Pattern Recognition (CVPR). 5179–5187.
13. E. Rahtu, M. Otani, Y. Nakahima, J. Heikkilä. 2019. Rethinking the Evaluation of Video Summaries. In Proc. of the 2019 IEEE Conf. on Computer Vision and Pattern Recognition (CVPR).
14. K. Zhang, W.-L. Chao, F. Sha, K. Grauman. 2016. Video Summarization with Long Short-Term Memory. In Proc. of the 2016 European Conf. on Computer Vision (ECCV), B. Leibe, J. Matas, N. Sebe, M. Welling (Eds.). Springer International Publishing, Cham, 766–782.
15. J. Fajtl, H. S. Sokeh, V. Argyriou, D. Monekosso, P. Remagnino. 2019. Summarizing Videos with Attention. In Proc. of the 2018 Asian Conf. on Computer Vision (ACCV) Workshops, G. Carneiro, S. You (Eds.). Springer International Publishing, Cham, 39–54.
16. K. Zhou, Y. Qiao, T. Xiang. 2018. Deep Reinforcement Learning for Unsupervised Video Summarization with Diversity-Representativeness Reward. In Proc. of the 2018 AAAI Conf. on Artificial Intelligence.
17. E. Apostolidis, A. I. Metsai, E. Adamantidou, V. Mezaris, I. Patras. 2019. A Stepwise, Label-based Approach for Improving the Adversarial Training in Unsupervised Video Summarization. In Proc. of the 1st Int. Workshop on AI for Smart TV Content Production, Access and Delivery (Nice, France) (AI4TV '19). Association for Computing Machinery, New York, NY, USA, 17–25.
18. E. Apostolidis, E. Adamantidou, A. I. Metsai, V. Mezaris, I. Patras. 2020. Unsupervised Video Summarization via Attention-Driven Adversarial Learning. In Proc. of the MultiMedia Modeling 2020, Y. M. Ro, W.-H. Cheng, J. Kim, W.-T. Chu, P. Cui, J.-W. Choi, M.-C. Hu, W. De Neve (Eds.). Springer International Publishing, Cham, 492–504.

Thank you for your attention!
Questions?
Evlampios Apostolidis, apostolid@iti.gr
Vasileios Mezaris, bmezaris@iti.gr
Code and documentation publicly available at: https://github.com/e-apostolidis/PoR-Summarization-Measure

This work was supported by the EU's Horizon 2020 research and innovation programme under grant agreement H2020-780656 ReTV. The work of Ioannis Patras has been supported by EPSRC under grant No. EP/R026424/1.
More Related Content

What's hot

PERCEPTUALLY LOSSLESS COMPRESSION WITH ERROR CONCEALMENT FOR PERISCOPE AND SO...
PERCEPTUALLY LOSSLESS COMPRESSION WITH ERROR CONCEALMENT FOR PERISCOPE AND SO...PERCEPTUALLY LOSSLESS COMPRESSION WITH ERROR CONCEALMENT FOR PERISCOPE AND SO...
PERCEPTUALLY LOSSLESS COMPRESSION WITH ERROR CONCEALMENT FOR PERISCOPE AND SO...sipij
 
Perceptually Lossless Compression with Error Concealment for Periscope and So...
Perceptually Lossless Compression with Error Concealment for Periscope and So...Perceptually Lossless Compression with Error Concealment for Periscope and So...
Perceptually Lossless Compression with Error Concealment for Periscope and So...sipij
 
Scenario demonstrators
Scenario demonstratorsScenario demonstrators
Scenario demonstratorsLinkedTV
 
Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...
Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...
Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...Ijripublishers Ijri
 
IRJET- A Hybrid Image and Video Compression of DCT and DWT Techniques for H.2...
IRJET- A Hybrid Image and Video Compression of DCT and DWT Techniques for H.2...IRJET- A Hybrid Image and Video Compression of DCT and DWT Techniques for H.2...
IRJET- A Hybrid Image and Video Compression of DCT and DWT Techniques for H.2...IRJET Journal
 
First LinkedTV End-to-end Platform
First LinkedTV End-to-end PlatformFirst LinkedTV End-to-end Platform
First LinkedTV End-to-end PlatformLinkedTV
 
Syllabus for fourth year of engineering
Syllabus for fourth year of engineeringSyllabus for fourth year of engineering
Syllabus for fourth year of engineeringtakshakpdesai
 
PEPPOL Test Guidelines
PEPPOL Test GuidelinesPEPPOL Test Guidelines
PEPPOL Test GuidelinesFriso de Jong
 
Social networks, microblogging, virtual worlds, and Web 2.0 in the teaching o...
Social networks, microblogging, virtual worlds, and Web 2.0 in the teaching o...Social networks, microblogging, virtual worlds, and Web 2.0 in the teaching o...
Social networks, microblogging, virtual worlds, and Web 2.0 in the teaching o...Gonçalo Cruz Matos
 

What's hot (9)

PERCEPTUALLY LOSSLESS COMPRESSION WITH ERROR CONCEALMENT FOR PERISCOPE AND SO...
PERCEPTUALLY LOSSLESS COMPRESSION WITH ERROR CONCEALMENT FOR PERISCOPE AND SO...PERCEPTUALLY LOSSLESS COMPRESSION WITH ERROR CONCEALMENT FOR PERISCOPE AND SO...
PERCEPTUALLY LOSSLESS COMPRESSION WITH ERROR CONCEALMENT FOR PERISCOPE AND SO...
 
Perceptually Lossless Compression with Error Concealment for Periscope and So...
Perceptually Lossless Compression with Error Concealment for Periscope and So...Perceptually Lossless Compression with Error Concealment for Periscope and So...
Perceptually Lossless Compression with Error Concealment for Periscope and So...
 
Scenario demonstrators
Scenario demonstratorsScenario demonstrators
Scenario demonstrators
 
Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...
Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...
Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...
 
IRJET- A Hybrid Image and Video Compression of DCT and DWT Techniques for H.2...
IRJET- A Hybrid Image and Video Compression of DCT and DWT Techniques for H.2...IRJET- A Hybrid Image and Video Compression of DCT and DWT Techniques for H.2...
IRJET- A Hybrid Image and Video Compression of DCT and DWT Techniques for H.2...
 
First LinkedTV End-to-end Platform
First LinkedTV End-to-end PlatformFirst LinkedTV End-to-end Platform
First LinkedTV End-to-end Platform
 
Syllabus for fourth year of engineering
Syllabus for fourth year of engineeringSyllabus for fourth year of engineering
Syllabus for fourth year of engineering
 
PEPPOL Test Guidelines
PEPPOL Test GuidelinesPEPPOL Test Guidelines
PEPPOL Test Guidelines
 
Social networks, microblogging, virtual worlds, and Web 2.0 in the teaching o...
Social networks, microblogging, virtual worlds, and Web 2.0 in the teaching o...Social networks, microblogging, virtual worlds, and Web 2.0 in the teaching o...
Social networks, microblogging, virtual worlds, and Web 2.0 in the teaching o...
 

Similar to PoR_evaluation_measure_acm_mm_2020

Unsupervised Video Summarization via Attention-Driven Adversarial Learning
Unsupervised Video Summarization via Attention-Driven Adversarial LearningUnsupervised Video Summarization via Attention-Driven Adversarial Learning
Unsupervised Video Summarization via Attention-Driven Adversarial LearningVasileiosMezaris
 
PGL SUM Video Summarization
PGL SUM Video SummarizationPGL SUM Video Summarization
PGL SUM Video SummarizationVasileiosMezaris
 
Icme2020 tutorial video_summarization_part1
Icme2020 tutorial video_summarization_part1Icme2020 tutorial video_summarization_part1
Icme2020 tutorial video_summarization_part1VasileiosMezaris
 
On the application of SAT solvers for Search Based Software Testing
On the application of SAT solvers for Search Based Software TestingOn the application of SAT solvers for Search Based Software Testing
On the application of SAT solvers for Search Based Software Testingjfrchicanog
 
Partitioned Based Regression Verification
Partitioned Based Regression VerificationPartitioned Based Regression Verification
Partitioned Based Regression VerificationAung Thu Rha Hein
 
Explaining video summarization based on the focus of attention
Explaining video summarization based on the focus of attentionExplaining video summarization based on the focus of attention
Explaining video summarization based on the focus of attentionVasileiosMezaris
 
UVM_Full_Print_n.pptx
UVM_Full_Print_n.pptxUVM_Full_Print_n.pptx
UVM_Full_Print_n.pptxnikitha992646
 
CA-SUM Video Summarization
CA-SUM Video SummarizationCA-SUM Video Summarization
CA-SUM Video SummarizationVasileiosMezaris
 
Cocomo ( cot constrictive model) and capability maturity model
Cocomo ( cot constrictive model) and capability maturity modelCocomo ( cot constrictive model) and capability maturity model
Cocomo ( cot constrictive model) and capability maturity modelPrakash Poudel
 
Declarative Performance Testing Automation - Automating Performance Testing f...
Declarative Performance Testing Automation - Automating Performance Testing f...Declarative Performance Testing Automation - Automating Performance Testing f...
Declarative Performance Testing Automation - Automating Performance Testing f...Vincenzo Ferme
 
Reengineering framework for open source software using decision tree approach
Reengineering framework for open source software using decision tree approachReengineering framework for open source software using decision tree approach
Reengineering framework for open source software using decision tree approachIJECEIAES
 
SUBJECTIVE QUALITY EVALUATION OF H.264 AND H.265 ENCODED VIDEO SEQUENCES STRE...
SUBJECTIVE QUALITY EVALUATION OF H.264 AND H.265 ENCODED VIDEO SEQUENCES STRE...SUBJECTIVE QUALITY EVALUATION OF H.264 AND H.265 ENCODED VIDEO SEQUENCES STRE...
SUBJECTIVE QUALITY EVALUATION OF H.264 AND H.265 ENCODED VIDEO SEQUENCES STRE...ijma
 
Subjective Quality Evaluation of H.264 and H.265 Encoded Video Sequences Stre...
Subjective Quality Evaluation of H.264 and H.265 Encoded Video Sequences Stre...Subjective Quality Evaluation of H.264 and H.265 Encoded Video Sequences Stre...
Subjective Quality Evaluation of H.264 and H.265 Encoded Video Sequences Stre...ijma
 
ISTQB, ISEB Lecture Notes- 2
ISTQB, ISEB Lecture Notes- 2ISTQB, ISEB Lecture Notes- 2
ISTQB, ISEB Lecture Notes- 2onsoftwaretest
 
Automated Context-based Question-Distractor Generation using Extractive Summa...
Automated Context-based Question-Distractor Generation using Extractive Summa...Automated Context-based Question-Distractor Generation using Extractive Summa...
Automated Context-based Question-Distractor Generation using Extractive Summa...IRJET Journal
 
Design & Implementation.pptx
Design & Implementation.pptxDesign & Implementation.pptx
Design & Implementation.pptxSalmaItagi2
 
IceBreaker Solving Cold Start Problem For Video Recommendation Engines
IceBreaker  Solving Cold Start Problem For Video Recommendation EnginesIceBreaker  Solving Cold Start Problem For Video Recommendation Engines
IceBreaker Solving Cold Start Problem For Video Recommendation EnginesJamie Boyd
 
An Investigation Of EXtreme Programming Practices
An Investigation Of EXtreme Programming PracticesAn Investigation Of EXtreme Programming Practices
An Investigation Of EXtreme Programming PracticesGabriel Moreira
 

Similar to PoR_evaluation_measure_acm_mm_2020 (20)

Unsupervised Video Summarization via Attention-Driven Adversarial Learning
Unsupervised Video Summarization via Attention-Driven Adversarial LearningUnsupervised Video Summarization via Attention-Driven Adversarial Learning
Unsupervised Video Summarization via Attention-Driven Adversarial Learning
 
PGL SUM Video Summarization
PGL SUM Video SummarizationPGL SUM Video Summarization
PGL SUM Video Summarization
 
Icme2020 tutorial video_summarization_part1
Icme2020 tutorial video_summarization_part1Icme2020 tutorial video_summarization_part1
Icme2020 tutorial video_summarization_part1
 
On the application of SAT solvers for Search Based Software Testing
On the application of SAT solvers for Search Based Software TestingOn the application of SAT solvers for Search Based Software Testing
On the application of SAT solvers for Search Based Software Testing
 
Partitioned Based Regression Verification
Partitioned Based Regression VerificationPartitioned Based Regression Verification
Partitioned Based Regression Verification
 
Explaining video summarization based on the focus of attention
Explaining video summarization based on the focus of attentionExplaining video summarization based on the focus of attention
Explaining video summarization based on the focus of attention
 
UVM_Full_Print_n.pptx
UVM_Full_Print_n.pptxUVM_Full_Print_n.pptx
UVM_Full_Print_n.pptx
 
Video Thumbnail Selector
Video Thumbnail SelectorVideo Thumbnail Selector
Video Thumbnail Selector
 
CA-SUM Video Summarization
CA-SUM Video SummarizationCA-SUM Video Summarization
CA-SUM Video Summarization
 
Cocomo ( cot constrictive model) and capability maturity model
Cocomo ( cot constrictive model) and capability maturity modelCocomo ( cot constrictive model) and capability maturity model
Cocomo ( cot constrictive model) and capability maturity model
 
Declarative Performance Testing Automation - Automating Performance Testing f...
Declarative Performance Testing Automation - Automating Performance Testing f...Declarative Performance Testing Automation - Automating Performance Testing f...
Declarative Performance Testing Automation - Automating Performance Testing f...
 
Furuyama - analysis of factors that affect productivity
Furuyama - analysis of factors that affect productivityFuruyama - analysis of factors that affect productivity
Furuyama - analysis of factors that affect productivity
 
Reengineering framework for open source software using decision tree approach
Reengineering framework for open source software using decision tree approachReengineering framework for open source software using decision tree approach
Reengineering framework for open source software using decision tree approach
 
SUBJECTIVE QUALITY EVALUATION OF H.264 AND H.265 ENCODED VIDEO SEQUENCES STRE...
SUBJECTIVE QUALITY EVALUATION OF H.264 AND H.265 ENCODED VIDEO SEQUENCES STRE...SUBJECTIVE QUALITY EVALUATION OF H.264 AND H.265 ENCODED VIDEO SEQUENCES STRE...
SUBJECTIVE QUALITY EVALUATION OF H.264 AND H.265 ENCODED VIDEO SEQUENCES STRE...
 
Subjective Quality Evaluation of H.264 and H.265 Encoded Video Sequences Stre...
Subjective Quality Evaluation of H.264 and H.265 Encoded Video Sequences Stre...Subjective Quality Evaluation of H.264 and H.265 Encoded Video Sequences Stre...
Subjective Quality Evaluation of H.264 and H.265 Encoded Video Sequences Stre...
 
ISTQB, ISEB Lecture Notes- 2
ISTQB, ISEB Lecture Notes- 2ISTQB, ISEB Lecture Notes- 2
ISTQB, ISEB Lecture Notes- 2
 
Automated Context-based Question-Distractor Generation using Extractive Summa...
Automated Context-based Question-Distractor Generation using Extractive Summa...Automated Context-based Question-Distractor Generation using Extractive Summa...
Automated Context-based Question-Distractor Generation using Extractive Summa...
 
Design & Implementation.pptx
Design & Implementation.pptxDesign & Implementation.pptx
Design & Implementation.pptx
 
IceBreaker Solving Cold Start Problem For Video Recommendation Engines
IceBreaker  Solving Cold Start Problem For Video Recommendation EnginesIceBreaker  Solving Cold Start Problem For Video Recommendation Engines
IceBreaker Solving Cold Start Problem For Video Recommendation Engines
 
An Investigation Of EXtreme Programming Practices
An Investigation Of EXtreme Programming PracticesAn Investigation Of EXtreme Programming Practices
An Investigation Of EXtreme Programming Practices
 

More from VasileiosMezaris

Multi-Modal Fusion for Image Manipulation Detection and Localization
Multi-Modal Fusion for Image Manipulation Detection and LocalizationMulti-Modal Fusion for Image Manipulation Detection and Localization
Multi-Modal Fusion for Image Manipulation Detection and LocalizationVasileiosMezaris
 
CERTH-ITI at MediaEval 2023 NewsImages Task
CERTH-ITI at MediaEval 2023 NewsImages TaskCERTH-ITI at MediaEval 2023 NewsImages Task
CERTH-ITI at MediaEval 2023 NewsImages TaskVasileiosMezaris
 
Spatio-Temporal Summarization of 360-degrees Videos
Spatio-Temporal Summarization of 360-degrees VideosSpatio-Temporal Summarization of 360-degrees Videos
Spatio-Temporal Summarization of 360-degrees VideosVasileiosMezaris
 
Masked Feature Modelling for the unsupervised pre-training of a Graph Attenti...
Masked Feature Modelling for the unsupervised pre-training of a Graph Attenti...Masked Feature Modelling for the unsupervised pre-training of a Graph Attenti...
Masked Feature Modelling for the unsupervised pre-training of a Graph Attenti...VasileiosMezaris
 
Cross-modal Networks and Dual Softmax Operation for MediaEval NewsImages 2022
Cross-modal Networks and Dual Softmax Operation for MediaEval NewsImages 2022Cross-modal Networks and Dual Softmax Operation for MediaEval NewsImages 2022
Cross-modal Networks and Dual Softmax Operation for MediaEval NewsImages 2022VasileiosMezaris
 
TAME: Trainable Attention Mechanism for Explanations
TAME: Trainable Attention Mechanism for ExplanationsTAME: Trainable Attention Mechanism for Explanations
TAME: Trainable Attention Mechanism for ExplanationsVasileiosMezaris
 
Combining textual and visual features for Ad-hoc Video Search
Combining textual and visual features for Ad-hoc Video SearchCombining textual and visual features for Ad-hoc Video Search
Combining textual and visual features for Ad-hoc Video SearchVasileiosMezaris
 
Explaining the decisions of image/video classifiers
Explaining the decisions of image/video classifiersExplaining the decisions of image/video classifiers
Explaining the decisions of image/video classifiersVasileiosMezaris
 
Learning visual explanations for DCNN-based image classifiers using an attent...
Learning visual explanations for DCNN-based image classifiers using an attent...Learning visual explanations for DCNN-based image classifiers using an attent...
Learning visual explanations for DCNN-based image classifiers using an attent...VasileiosMezaris
 
Are all combinations equal? Combining textual and visual features with multi...
Are all combinations equal?  Combining textual and visual features with multi...Are all combinations equal?  Combining textual and visual features with multi...
Are all combinations equal? Combining textual and visual features with multi...VasileiosMezaris
 
Video smart cropping web application
Video smart cropping web applicationVideo smart cropping web application
Video smart cropping web applicationVasileiosMezaris
 
Hard-Negatives Selection Strategy for Cross-Modal Retrieval
Hard-Negatives Selection Strategy for Cross-Modal RetrievalHard-Negatives Selection Strategy for Cross-Modal Retrieval
Hard-Negatives Selection Strategy for Cross-Modal RetrievalVasileiosMezaris
 
Misinformation on the internet: Video and AI
Misinformation on the internet: Video and AIMisinformation on the internet: Video and AI
Misinformation on the internet: Video and AIVasileiosMezaris
 
Migration-related video retrieval
Migration-related video retrievalMigration-related video retrieval
Migration-related video retrievalVasileiosMezaris
 
Fractional step discriminant pruning
Fractional step discriminant pruningFractional step discriminant pruning
Fractional step discriminant pruningVasileiosMezaris
 
Video, AI and News: video analysis and verification technologies for supporti...
Video, AI and News: video analysis and verification technologies for supporti...Video, AI and News: video analysis and verification technologies for supporti...
Video, AI and News: video analysis and verification technologies for supporti...VasileiosMezaris
 
Subclass deep neural networks
Subclass deep neural networksSubclass deep neural networks
Subclass deep neural networksVasileiosMezaris
 
Video & AI: capabilities and limitations of AI in detecting video manipulations
Video & AI: capabilities and limitations of AI in detecting video manipulationsVideo & AI: capabilities and limitations of AI in detecting video manipulations
Video & AI: capabilities and limitations of AI in detecting video manipulationsVasileiosMezaris
 

More from VasileiosMezaris (20)

Multi-Modal Fusion for Image Manipulation Detection and Localization
Multi-Modal Fusion for Image Manipulation Detection and LocalizationMulti-Modal Fusion for Image Manipulation Detection and Localization
Multi-Modal Fusion for Image Manipulation Detection and Localization
 
CERTH-ITI at MediaEval 2023 NewsImages Task
CERTH-ITI at MediaEval 2023 NewsImages TaskCERTH-ITI at MediaEval 2023 NewsImages Task
CERTH-ITI at MediaEval 2023 NewsImages Task
 
Spatio-Temporal Summarization of 360-degrees Videos
Spatio-Temporal Summarization of 360-degrees VideosSpatio-Temporal Summarization of 360-degrees Videos
Spatio-Temporal Summarization of 360-degrees Videos
 
Masked Feature Modelling for the unsupervised pre-training of a Graph Attenti...
Masked Feature Modelling for the unsupervised pre-training of a Graph Attenti...Masked Feature Modelling for the unsupervised pre-training of a Graph Attenti...
Masked Feature Modelling for the unsupervised pre-training of a Graph Attenti...
 
Cross-modal Networks and Dual Softmax Operation for MediaEval NewsImages 2022
Cross-modal Networks and Dual Softmax Operation for MediaEval NewsImages 2022Cross-modal Networks and Dual Softmax Operation for MediaEval NewsImages 2022
Cross-modal Networks and Dual Softmax Operation for MediaEval NewsImages 2022
 
TAME: Trainable Attention Mechanism for Explanations
TAME: Trainable Attention Mechanism for ExplanationsTAME: Trainable Attention Mechanism for Explanations
TAME: Trainable Attention Mechanism for Explanations
 
Gated-ViGAT
Gated-ViGATGated-ViGAT
Gated-ViGAT
 
Combining textual and visual features for Ad-hoc Video Search
Combining textual and visual features for Ad-hoc Video SearchCombining textual and visual features for Ad-hoc Video Search
Combining textual and visual features for Ad-hoc Video Search
 
Explaining the decisions of image/video classifiers
Explaining the decisions of image/video classifiersExplaining the decisions of image/video classifiers
Explaining the decisions of image/video classifiers
 
Learning visual explanations for DCNN-based image classifiers using an attent...
Learning visual explanations for DCNN-based image classifiers using an attent...Learning visual explanations for DCNN-based image classifiers using an attent...
Learning visual explanations for DCNN-based image classifiers using an attent...
 
Are all combinations equal? Combining textual and visual features with multi...
Are all combinations equal?  Combining textual and visual features with multi...Are all combinations equal?  Combining textual and visual features with multi...
Are all combinations equal? Combining textual and visual features with multi...
 
Video smart cropping web application
Video smart cropping web applicationVideo smart cropping web application
Video smart cropping web application
 
Hard-Negatives Selection Strategy for Cross-Modal Retrieval
Hard-Negatives Selection Strategy for Cross-Modal RetrievalHard-Negatives Selection Strategy for Cross-Modal Retrieval
Hard-Negatives Selection Strategy for Cross-Modal Retrieval
 
Misinformation on the internet: Video and AI
Misinformation on the internet: Video and AIMisinformation on the internet: Video and AI
Misinformation on the internet: Video and AI
 
LSTM Structured Pruning
LSTM Structured PruningLSTM Structured Pruning
LSTM Structured Pruning
 
Migration-related video retrieval
Migration-related video retrievalMigration-related video retrieval
Migration-related video retrieval
 
Fractional step discriminant pruning
Fractional step discriminant pruningFractional step discriminant pruning
Fractional step discriminant pruning
 
Video, AI and News: video analysis and verification technologies for supporti...
Video, AI and News: video analysis and verification technologies for supporti...Video, AI and News: video analysis and verification technologies for supporti...
Video, AI and News: video analysis and verification technologies for supporti...
 
Subclass deep neural networks
Subclass deep neural networksSubclass deep neural networks
Subclass deep neural networks
 
Video & AI: capabilities and limitations of AI in detecting video manipulations
Video & AI: capabilities and limitations of AI in detecting video manipulationsVideo & AI: capabilities and limitations of AI in detecting video manipulations
Video & AI: capabilities and limitations of AI in detecting video manipulations
 

Recently uploaded

GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsSumit Kumar yadav
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisDiwakar Mishra
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxpradhanghanshyam7136
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfSumit Kumar yadav
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCEPRINCE C P
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)Areesha Ahmad
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxjana861314
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...ssifa0344
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 

Recently uploaded (20)

GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questions
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptx
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 

PoR_evaluation_measure_acm_mm_2020

retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Studying the established protocol
8
Setting of the study
 Considered aspects
 Representativeness of results when evaluation relies on a small set of randomly-created splits
 Reliability of performance comparisons that use different data splits for each algorithm
 Used algorithms
 Supervised dppLSTM [14] and VASNet [15] methods
 Unsupervised DR-DSN [16], SUM-GAN-sl [17] and SUM-GAN-AAE [18] methods
 First experiment: performance evaluation using a fixed set of 5 randomly-created data splits of SumMe and TVSum
 Second experiment: performance evaluation using a fixed set of 50 randomly-created data splits of SumMe and TVSum
 Plus: comparison with the reported values in the corresponding papers
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Studying the established protocol
10
Outcomes
 Noticeable difference of evaluation results on 5 and 50 splits
 Differences between 5 and 50 splits are often larger than differences between methods
 Methods' rankings differ between 5 and 50 splits; moreover, they do not match the ranking based on the reported results
(Values denote F-Score (%); Rep. is the reported value from the relevant paper; best score in bold, second-best underlined)
Limited representativeness of results when the evaluation relies on a few data splits
Serious lack of reliability of comparisons that rely on a limited number of data splits
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Studying the established protocol
12
 Noticeable variability of performance over the set of splits
 Variability follows a quite similar pattern for all methods
Hypothesis: different levels of difficulty for the used splits
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
How to mitigate the observed weaknesses?
13
Reduce the impact of the used data splits
 Check a potential association between a method's performance and a measure of how challenging each data split is
 Use these data splits and examine the performance of:
 a Random Summarizer
 an Average Human Summarizer
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Estimate random performance
16
For a given video of a test set:
1) Assign random frame-level importance scores drawn from a uniform distribution
2) Derive fragment-level importance scores from the frame-level ones
3) Build the summary of the random summarizer by selecting fragments via the Knapsack algorithm
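These steps can be made concrete with a minimal Python sketch; the authors' actual implementation is in the GitHub repository linked at the end, so treat the function names, the shot representation as (start, end) frame-index pairs, and the 15% budget below as illustrative assumptions:

```python
import numpy as np

def knapsack(values, weights, capacity):
    """0/1 Knapsack via dynamic programming; returns indices of chosen items."""
    n = len(values)
    table = np.zeros((n + 1, capacity + 1))
    for i in range(1, n + 1):
        for c in range(capacity + 1):
            if weights[i - 1] <= c:
                table[i, c] = max(table[i - 1, c],
                                  table[i - 1, c - weights[i - 1]] + values[i - 1])
            else:
                table[i, c] = table[i - 1, c]
    selected, c = [], capacity  # backtrack to recover the chosen fragments
    for i in range(n, 0, -1):
        if table[i, c] != table[i - 1, c]:
            selected.append(i - 1)
            c -= weights[i - 1]
    return selected

def random_summary(shots, n_frames, budget=0.15, rng=None):
    """Boolean frame mask of a random summary (names are ours, not the authors')."""
    rng = rng if rng is not None else np.random.default_rng()
    frame_scores = rng.uniform(size=n_frames)                        # step 1
    shot_scores = [frame_scores[s:e + 1].mean() for s, e in shots]   # step 2
    shot_lengths = [e - s + 1 for s, e in shots]
    keep = knapsack(shot_scores, shot_lengths, int(budget * n_frames))  # step 3
    summary = np.zeros(n_frames, dtype=bool)
    for i in keep:
        s, e = shots[i]
        summary[s:e + 1] = True
    return summary
```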
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Estimate random performance
23
For a given video of a test set:
4) Compare the random summary with each of the N user-generated summaries, obtaining F-Score1, F-Score2, …, F-ScoreN
 F-Score for the video = max{F-Scorei} (SumMe) or avg{F-Scorei} (TVSum)
For the entire test set of a data split:
 Calculate the F-Score for each of the M test videos and average them into the F-Score for the test set
 Repeat the whole procedure 100 times and average the resulting test-set F-Scores
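A sketch of step 4 and the aggregation, reusing random_summary from the previous sketch; f_score follows the frame-level Precision/Recall definition of the established protocol, while the test_set dictionaries are an assumed data layout:

```python
import numpy as np

def f_score(pred, gt):
    """F-Score (%) between two boolean frame masks, from temporal overlap."""
    overlap = np.logical_and(pred, gt).sum()
    if overlap == 0:
        return 0.0
    precision = overlap / pred.sum()
    recall = overlap / gt.sum()
    return 2 * precision * recall / (precision + recall) * 100

def video_f_score(pred, user_summaries, eval_mode):
    """Aggregate per-user F-Scores: 'max' for SumMe, 'avg' for TVSum."""
    scores = [f_score(pred, u) for u in user_summaries]
    return max(scores) if eval_mode == 'max' else float(np.mean(scores))

def random_performance(test_set, eval_mode, n_runs=100):
    """Average test-set F-Score of the random summarizer over n_runs runs."""
    run_scores = []
    for _ in range(n_runs):
        per_video = [
            video_f_score(random_summary(v['shots'], v['n_frames']),
                          v['user_summaries'], eval_mode)
            for v in test_set
        ]
        run_scores.append(np.mean(per_video))
    return float(np.mean(run_scores))
```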
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Estimate average human performance
32
For a given video of a test set with N user summaries:
 Performance of User #i: compare this user's summary with each of the remaining users' summaries (e.g. F-Score12, F-Score13, …, F-Score1N for User #1) and average these values into F-Scorei
 Average the N per-user scores (F-Score1, …, F-ScoreN) into the F-Score for the video
For the entire test set of a data split:
 Calculate the F-Score for each of the M test videos and average them into the final F-Score
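The same machinery gives the average human summarizer, again reusing f_score from the sketch above; the leave-one-out loop mirrors the per-user comparisons just described:

```python
import numpy as np

def human_performance(test_set):
    """Average human F-Score over all users and all videos of a test set."""
    video_scores = []
    for v in test_set:
        users = v['user_summaries']
        per_user = []
        for i, u in enumerate(users):
            # F-Score_i: user i's summary scored against every other user's
            others = [f_score(u, w) for j, w in enumerate(users) if j != i]
            per_user.append(np.mean(others))
        video_scores.append(np.mean(per_user))  # average over the N users
    return float(np.mean(video_scores))          # average over the M videos
```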
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Updated performance curve
33
 Noticeable variance in the performance of the random and the human summarizer
Different levels of difficulty for the used splits
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
How to decide on the most suitable measure?
34
Correlation with the performance of random and human summarizers
 Covariance: measure of the joint variability of two random variables; for two jointly distributed real-valued random variables X and Y with finite second moments:
Cov(X, Y) = E[(X − E[X]) (Y − E[Y])]
 Pearson Correlation Coefficient: normalized version of the Covariance that indicates (via its magnitude, which lies in [0, 1]) the strength of the linear relation:
ρ(X, Y) = Cov(X, Y) / (σX σY)
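In practice the coefficient can be computed directly with NumPy; the per-split scores below are made-up placeholders, not values from the paper:

```python
import numpy as np

# Hypothetical per-split F-Scores for one method and the random summarizer
method_scores = np.array([42.1, 44.8, 39.5, 47.2, 41.0])
random_scores = np.array([38.9, 41.2, 36.7, 43.5, 37.8])

# corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is rho
rho = np.corrcoef(method_scores, random_scores)[0, 1]
print(f"Pearson correlation: {rho:.3f}")
```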
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
How to decide on the most suitable measure?
35
Correlation with the performance of random and human summarizers
In terms of performance, there is a clearly stronger correlation of the tested methods with the random summarizer
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Proposed approach: Performance over Random (PoR)
36
Core idea
 Estimate the difficulty of a data split by computing the performance of a random summarizer
 Exploit this information when using the data split to assess a video summarization algorithm
Main targets
 Reduce the impact of the used data splits on the performance evaluation
 Increase the representativeness of evaluation outcomes
 Enhance the reliability of comparisons based on different data splits
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Proposed approach: Performance over Random (PoR)
39
Computing steps
For a given summarization method and a data split:
1) Compute Ƒ, the performance (F-Score, %) of a random summarizer for this split
2) Compute the method's performance S (F-Score, %) on the data split, based on the established evaluation protocol
3) Compute "Performance over Random" as: PoR = (S / Ƒ) ∙ 100
 PoR < 100: performance worse than the (random) baseline
 PoR > 100: performance better than the (random) baseline
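The definition translates to a one-liner (the example values are hypothetical):

```python
def performance_over_random(s, f_rand):
    """PoR = (S / F_random) * 100; > 100 means better than the random baseline."""
    return s / f_rand * 100

# e.g. a method scoring 44.0 (F-Score, %) on a split where random scores 40.0
print(performance_over_random(44.0, 40.0))  # 110.0 -> better than random
```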
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Experiments
40
Representativeness of performance evaluation
 Considered evaluation approaches:
 Estimate performance using the F-Score
 Estimate performance using Performance over Random: PoR = (S / Ƒ) ∙ 100
 Estimate performance using Performance over Human: PoH = (S / H) ∙ 100, where H is the average human performance on the split
 Methods' performance was examined on:
 The large-scale setting of 50 fixed splits
 20 fixed split-sets of 5 data splits each
 Main focus: the extent to which the methods' performance varies across the different data splits / split-sets
 Used measure: Relative Standard Deviation, RSD(x) = STD(x) / Mean(x)
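RSD is equally direct; we assume the population STD (NumPy's default), since the slides do not specify the estimator:

```python
import numpy as np

def rsd(x):
    """Relative Standard Deviation: STD(x) / Mean(x)."""
    x = np.asarray(x, dtype=float)
    return np.std(x) / np.mean(x)  # np.std defaults to the population STD (ddof=0)
```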
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Experiments
42
Representativeness of performance evaluation
 Similar RSD values for F-Score and PoH in most cases
 Remarkably smaller RSD values for PoR
 Reminder: the results need to vary as little as possible!
PoR is more representative of an algorithm's performance
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Experiments
43
Reliability of performance comparisons
 Performance comparisons in the bibliography rely on the values reported in the relevant papers, and the used data splits are completely unknown
 But the data splits can affect the evaluation outcomes!
Generation of mixed split-sets
 Assess the robustness of each evaluation protocol to such comparisons
 Simulate 20 such comparisons by creating 20 mixed split-sets
 Rank methods from best to worst
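One plausible way to build such a mixed split-set, under the assumption that each method is simply assigned its own randomly drawn 5 splits out of the 50 fixed ones (the exact sampling scheme is in the authors' released code, so treat this purely as a sketch):

```python
import random

def mixed_split_set(methods, all_splits, k=5, seed=0):
    """Draw an independent set of k splits for every method."""
    rng = random.Random(seed)
    return {m: rng.sample(all_splits, k) for m in methods}

# e.g. mixed = mixed_split_set(['dppLSTM', 'VASNet'], list(range(50)))
```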
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Experiments
47
Reliability of performance comparisons
 For each method, we studied: i) the overall ranking and ii) the variation of its ranking when using: i) 20 fixed split-sets and ii) 20 mixed split-sets
 Variation quantified by computing the STD of a method's ranking over the group of split-sets
PoR is much more robust than F-Score
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Experiments
51
Reliability of performance comparisons
Using the same (fixed) split-sets:
 Same average ranking for all methods under both evaluation protocols
Using different (mixed) split-sets:
 Average ranking may differ, as PoR considers the difficulty of each split-set
 STD of the average ranking differs significantly between F-Score and PoR, with lower STD values for PoR
PoR is more suitable for comparing methods run on different split-sets
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Conclusions
52
 Early experiments documented the varying difficulty of the different randomly-created data splits of the established benchmark datasets
 Most state-of-the-art works use just a handful of different splits for evaluation
 The varying difficulty significantly affects the evaluation results and the reliability of performance comparisons that rely on the reported values
 New evaluation protocol: Performance over Random (PoR), which takes into consideration estimates of the level of difficulty of each used data split
 Experiments documented the increased robustness of PoR over the F-Score and its suitability for comparing methods run on different split-sets
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
References
53
1. S. E. F. de Avila, A. da Luz Jr., A. de A. Araújo, M. Cord. 2008. VSUMM: An Approach for Automatic Video Summarization and Quantitative Evaluation. In Proc. of the 2008 XXI Brazilian Symposium on Computer Graphics and Image Processing. 103–110.
2. N. Ejaz, I. Mehmood, S. W. Baik. 2014. Feature Aggregation Based Visual Attention Model for Video Summarization. Computers and Electrical Engineering 40, 3 (2014), 993–1005. Special Issue on Image and Video Processing.
3. V. Chasanis, A. Likas, N. Galatsanos. 2008. Efficient Video Shot Summarization Using an Enhanced Spectral Clustering Approach. In Proc. of the Artificial Neural Networks - ICANN 2008, V. Kurková, R. Neruda, J. Koutník (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 847–856.
4. S. E. F. de Avila, A. P. B. Lopes, A. da Luz Jr., A. de A. Araújo. 2011. VSUMM: A Mechanism Designed to Produce Static Video Summaries and a Novel Evaluation Method. Pattern Recognition Letters 32, 1 (Jan. 2011), 56–68.
5. J. Almeida, N. J. Leite, R. da S. Torres. 2012. VISON: VIdeo Summarization for ONline Applications. Pattern Recognition Letters 33, 4 (March 2012), 397–409.
6. E. J. Y. C. Cahuina, G. C. Chavez. 2013. A New Method for Static Video Summarization Using Local Descriptors and Video Temporal Segmentation. In Proc. of the 2013 XXVI Conf. on Graphics, Patterns and Images. 226–233.
7. N. Ejaz, T. Bin Tariq, S. W. Baik. 2012. Adaptive Key Frame Extraction for Video Summarization Using an Aggregation Mechanism. Journal of Visual Communication and Image Representation 23, 7 (Oct. 2012), 1031–1040.
8. H. Jacob, F. L. Pádua, A. Lacerda, A. C. Pereira. 2017. A Video Summarization Approach Based on the Emulation of Bottom-up Mechanisms of Visual Attention. Journal of Intelligent Information Systems 49, 2 (Oct. 2017), 193–211.
9. K. M. Mahmoud, N. M. Ghanem, M. A. Ismail. 2013. Unsupervised Video Summarization via Dynamic Modeling-Based Hierarchical Clustering. In Proc. of the 12th Int. Conf. on Machine Learning and Applications, Vol. 2. 303–308.
10. B. Gong, W.-L. Chao, K. Grauman, F. Sha. 2014. Diverse Sequential Subset Selection for Supervised Video Summarization. In Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, K. Q. Weinberger (Eds.). Curran Associates, Inc., 2069–2077.
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
References
54
11. M. Gygli, H. Grabner, H. Riemenschneider, L. Van Gool. 2014. Creating Summaries from User Videos. In Proc. of the 2014 European Conf. on Computer Vision (ECCV), D. Fleet, T. Pajdla, B. Schiele, T. Tuytelaars (Eds.). Springer International Publishing, Cham, 505–520.
12. Y. Song, J. Vallmitjana, A. Stent, A. Jaimes. 2015. TVSum: Summarizing Web Videos Using Titles. In Proc. of the 2015 IEEE Conf. on Computer Vision and Pattern Recognition (CVPR). 5179–5187.
13. E. Rahtu, M. Otani, Y. Nakashima, J. Heikkilä. 2019. Rethinking the Evaluation of Video Summaries. In Proc. of the 2019 IEEE Conf. on Computer Vision and Pattern Recognition (CVPR).
14. K. Zhang, W.-L. Chao, F. Sha, K. Grauman. 2016. Video Summarization with Long Short-Term Memory. In Proc. of the 2016 European Conf. on Computer Vision (ECCV), B. Leibe, J. Matas, N. Sebe, M. Welling (Eds.). Springer International Publishing, Cham, 766–782.
15. J. Fajtl, H. S. Sokeh, V. Argyriou, D. Monekosso, P. Remagnino. 2019. Summarizing Videos with Attention. In Proc. of the 2018 Asian Conf. on Computer Vision (ACCV) Workshops, G. Carneiro, S. You (Eds.). Springer International Publishing, Cham, 39–54.
16. K. Zhou, Y. Qiao, T. Xiang. 2018. Deep Reinforcement Learning for Unsupervised Video Summarization with Diversity-Representativeness Reward. In Proc. of the 2018 AAAI Conf. on Artificial Intelligence.
17. E. Apostolidis, A. I. Metsai, E. Adamantidou, V. Mezaris, I. Patras. 2019. A Stepwise, Label-based Approach for Improving the Adversarial Training in Unsupervised Video Summarization. In Proc. of the 1st Int. Workshop on AI for Smart TV Content Production, Access and Delivery (Nice, France) (AI4TV '19). Association for Computing Machinery, New York, NY, USA, 17–25.
18. E. Apostolidis, E. Adamantidou, A. I. Metsai, V. Mezaris, I. Patras. 2020. Unsupervised Video Summarization via Attention-Driven Adversarial Learning. In Proc. of the MultiMedia Modeling 2020, Y. M. Ro, W.-H. Cheng, J. Kim, W.-T. Chu, P. Cui, J.-W. Choi, M.-C. Hu, W. De Neve (Eds.). Springer International Publishing, Cham, 492–504.
retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
Thank you for your attention!
55
Questions?
Evlampios Apostolidis, apostolid@iti.gr
Vasileios Mezaris, bmezaris@iti.gr
Code and documentation publicly available at: https://github.com/e-apostolidis/PoR-Summarization-Measure
This work was supported by the EU's Horizon 2020 research and innovation programme under grant agreement H2020-780656 ReTV. The work of Ioannis Patras has been supported by EPSRC under grant No. EP/R026424/1.