Thessaloniki, October 2020
GAN-based Video Summarization
Vasileios Mezaris
CERTH-ITI
Presentation at the AI4Media
Workshop on GANs for Media
Content Generation
Joint work with
E. Apostolidis, E. Adamantidou,
A. Metsai (CERTH-ITI);
I. Patras (QMUL)
Video summary: a short visual summary that encapsulates the flow of the story and
the essential parts of the full-length video
Original video
Video summary (storyboard)
Problem statement
Applications of video summarization
 Professional CMS: effective indexing,
browsing, retrieval & promotion of media
assets
 Video sharing platforms: improved viewer
experience, enhanced viewer engagement &
increased content consumption
 Other summarization scenarios: movie trailer production, sports highlights video generation,
video synopsis of 24h surveillance recordings
Related work
Deep-learning approaches
 Various supervised methods (i.e., learning from ground-truth manually-generated summaries)
 Using feedforward neural nets (CNNs) for e.g. identifying semantically-important video parts
 Exploiting video-level metadata
 Capturing the story flow using recurrent neural nets (e.g. LSTMs)
 …and many more
 Unsupervised algorithms that build summaries without relying on human annotations
 Using adversarial learning to: minimize the distance between videos and their summary-based
reconstructions; maximize the mutual information between summary and video; learn a mapping
from raw videos to human-like summaries based on online available summaries
 …and a few more approaches (see tutorial at IEEE ICME 2020,
https://www.slideshare.net/VasileiosMezaris/icme2020-tutorial-videosummarizationpart1)
+ No need for training data (limited, hard to produce)
+ Avoid the subjectivity & biases of manually-generated summaries
+ Adaptability to different types of video
GANs for unsupervised video summarization
 Our starting point: the SUM-GAN architecture [1]
 Main idea: build a keyframe selection mechanism
by minimizing the distance between the deep
representations of the original video and a
reconstructed version of it based on the selected
keyframes
 Problem: how to define a good distance?
 Solution: use a trainable discriminator network!
 Goal: train the Summarizer to maximally confuse
the Discriminator when distinguishing the original
from the reconstructed video
SUM-GAN
[1] B. Mahasseni, M. Lam, S. Todorovic, "Unsupervised Video
Summarization with Adversarial LSTM Networks", 2017 IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), pp.
2982–2991.
 SUM-GAN-sl introduces two extensions to SUM-GAN [2]:
 A linear compression layer that reduces the size
of the CNN feature vectors
 An incremental and fine-grained approach to
train the model’s components
[2] E. Apostolidis, A. Metsai, E. Adamantidou, V. Mezaris, I. Patras, "A Stepwise, Label-
based Approach for Improving the Adversarial Training in Unsupervised Video
Summarization", Proc. 1st Int. Workshop on AI for Smart TV Content Production,
Access and Delivery (AI4TV'19) at ACM Multimedia 2019, Nice, France, October 2019.
SUM-GAN-sl
 Incremental approach to train the model’s components
[Figures: four slides illustrating the training steps of the model’s components; one is annotated "(regularization factor)"]
 Adversarial learning driven by deterministic
attention auto-encoder
 The VAE of the previous architecture (SUM-GAN-sl) is entirely
replaced by an attention auto-encoder (AAE)
network, forming the SUM-GAN-AAE
architecture [3]
[3] E. Apostolidis, E. Adamantidou, A. Metsai, V. Mezaris, I. Patras, "Unsupervised
Video Summarization via Attention-Driven Adversarial Learning", Proc. 26th Int.
Conf. on Multimedia Modeling (MMM2020), Daejeon, Korea, Jan. 2020.
SUM-GAN-AAE
Attention auto-encoder
Processing pipeline
 Weighted feature vectors are fed to the Encoder
 Encoder’s output (V) and Decoder’s previous
hidden state are fed to the Attention component
 For t > 1: use the hidden state of the previous
Decoder’s step (h1)
 For t = 1: use the hidden state of the last
Encoder’s step (He)
 Attention weights (αt) are computed using:
 Energy score function
 Soft-max function
 αt is multiplied with V to form the Context Vector vt’
 vt’ is combined with the Decoder’s previous output yt-1
 The Decoder gradually reconstructs the video
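The attention step of the pipeline above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the slides do not give the form of the energy score function, so a plain dot product between each encoder output and the previous decoder hidden state is assumed here, and the names `attention_step`, `V` and `h_prev` are hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable soft-max over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_step(V, h_prev):
    """One attention step.

    V      : list of encoder output vectors (one per frame)
    h_prev : previous decoder hidden state (or the last encoder
             hidden state when t = 1)
    Returns the attention weights and the context vector.
    """
    # Energy score: ASSUMED to be a dot product (not stated in the slides).
    energies = [dot(v, h_prev) for v in V]
    alpha = softmax(energies)
    # Context vector: attention-weighted sum of the encoder outputs.
    dim = len(V[0])
    context = [sum(a * v[d] for a, v in zip(alpha, V)) for d in range(dim)]
    return alpha, context


V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h_prev = [2.0, 0.0]
alpha, context = attention_step(V, h_prev)
print(alpha, context)
```

In a full decoder, the returned context vector would then be combined with the previous output before the next LSTM step; that part is omitted here.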
Video summarization practicalities
 Input: The CNN feature vectors of the (sampled) video frames
 Output: Frame-level importance scores
 Summarization process:
 CNN features pass through the linear compression layer and the frame selector → importance
scores computed at frame-level
 Given a video segmentation (using KTS), fragment-level importance scores are calculated by averaging
the scores of each fragment's frames
 The summary is created by selecting the fragments that maximize the total importance score, provided
that the summary length does not exceed 15% of the video duration, by solving the 0/1 Knapsack problem
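The summarization process above can be sketched as follows. This is a minimal illustration under stated assumptions: fragment boundaries are given as hypothetical `(start, end)` frame-index pairs (standing in for the KTS output), fragment length in frames is used as a proxy for duration, and the knapsack is solved with a textbook dynamic program rather than any particular library routine.

```python
def fragment_scores(frame_scores, fragments):
    """Average the frame-level scores inside each (start, end) fragment; end is exclusive."""
    return [sum(frame_scores[s:e]) / (e - s) for s, e in fragments]


def select_fragments(scores, lengths, budget):
    """0/1 knapsack: pick fragments maximizing total score with total length <= budget."""
    n = len(scores)
    # best[i][c] = best total score using the first i fragments within capacity c
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w, v = lengths[i - 1], scores[i - 1]
        for c in range(budget + 1):
            best[i][c] = best[i - 1][c]
            if w <= c and best[i - 1][c - w] + v > best[i][c]:
                best[i][c] = best[i - 1][c - w] + v
    # Backtrack to recover which fragments were chosen.
    chosen, c = [], budget
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= lengths[i - 1]
    return sorted(chosen)


# Toy example: 100 frames split into 5 fragments with made-up scores.
fragments = [(0, 10), (10, 15), (15, 40), (40, 50), (50, 100)]
frame_scores = [0.9] * 10 + [0.5] * 5 + [0.8] * 25 + [0.7] * 10 + [0.1] * 50
lengths = [e - s for s, e in fragments]
budget = (15 * len(frame_scores)) // 100  # 15% of the video length, in frames
selected = select_fragments(fragment_scores(frame_scores, fragments), lengths, budget)
print(selected)
```

Note that the high-scoring 25-frame fragment is never selected in this toy case, because on its own it already exceeds the 15-frame budget.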
Model’s I/O and summarization process
Experiments
Datasets
 SumMe (https://gyglim.github.io/me/vsum/index.html#benchmark)
 25 videos capturing multiple events (e.g. cooking and sports)
 video length: 1 to 6 min
 annotation: fragment-based video summaries
 TVSum (https://github.com/yalesong/tvsum)
 50 videos from 10 categories of TRECVid MED task
 video length: 1 to 11 min
 annotation: frame-level importance scores
Evaluation protocol
 The generated summary should not exceed 15% of the video length
 Similarity between automatically generated (A) and ground-truth (G) summary is expressed
by the F-Score (%), with (P)recision and (R)ecall measuring the temporal overlap (∩) (|| ||
means duration)
 Typical metrics for computing Precision and Recall at the frame-level
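The formula shown on the slide did not survive in this transcript. With the notation above, the usual definitions would be P = ||A ∩ G|| / ||A||, R = ||A ∩ G|| / ||G|| and F = 2·P·R / (P + R), expressed as a percentage. The sketch below computes them at the frame level, assuming both summaries are given as equal-length 0/1 lists (a hypothetical representation chosen for illustration).

```python
def f_score(auto, gt):
    """F-Score (%) between two frame-level binary summaries.

    auto, gt: equal-length lists of 0/1 flags marking the frames that
    belong to the automatic (A) and ground-truth (G) summary.
    """
    overlap = sum(a * g for a, g in zip(auto, gt))  # ||A ∩ G|| in frames
    if overlap == 0:
        return 0.0
    precision = overlap / sum(auto)  # ||A ∩ G|| / ||A||
    recall = overlap / sum(gt)       # ||A ∩ G|| / ||G||
    return 100 * 2 * precision * recall / (precision + recall)


auto = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
gt = [1, 0, 0, 0, 1, 1, 0, 0, 0, 0]
print(f_score(auto, gt))
```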
 Slight but important distinction w.r.t. what is eventually used as ground-truth summary
 Most used approach in the literature
[Figure: N scores, F-Score1, F-Score2, ..., F-ScoreN, are computed and then combined into a single value, with separate formulas for SumMe and TVSum; the formulas are not recoverable from this transcript]
 Alternative approach
[Figure: a single F-Score is computed]
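The two protocols can be contrasted with a short sketch. The per-dataset formulas are missing from this transcript, so the reduction used below is an assumption taken from the convention commonly reported in the literature (maximum over user summaries for SumMe, average for TVSum), not something confirmed by the slide text. The function names are hypothetical.

```python
from statistics import mean


def f_score(auto, gt):
    """F-Score (%) between two equal-length 0/1 frame-level summaries."""
    overlap = sum(a * g for a, g in zip(auto, gt))
    if overlap == 0:
        return 0.0
    precision = overlap / sum(auto)
    recall = overlap / sum(gt)
    return 100 * 2 * precision * recall / (precision + recall)


def multi_user_score(auto, user_summaries, reduce):
    """Compare against each of the N user summaries, then combine the N F-Scores."""
    return reduce([f_score(auto, g) for g in user_summaries])


def single_gt_score(auto, gt_summary):
    """Alternative protocol: one ground-truth summary, one F-Score."""
    return f_score(auto, gt_summary)


auto = [1, 1, 0, 0]
users = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 0, 0]]
# ASSUMED reductions: max (as commonly used for SumMe), mean (for TVSum).
print(multi_user_score(auto, users, max))
print(multi_user_score(auto, users, mean))
```

Because the reduction changes the resulting number considerably, scores obtained under different protocols are not directly comparable.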
 Videos were down-sampled to 2 fps
 Feature extraction was based on the pool5 layer of GoogLeNet trained on ImageNet
 Linear compression layer reduces the size of these vectors from 1024 to 500
 All components are 2-layer LSTMs with 500 hidden units; the Frame selector is a bi-directional LSTM
 Training is based on the Adam optimizer; Summarizer’s learning rate = 10^-4; Discriminator’s
learning rate = 10^-5
 Each dataset was split into two non-overlapping sets: a training set with 80% of the data and a testing
set with the remaining 20%
 Experiments were run on 5 different random splits; the reported performance is the average over these
runs at the training-epoch level (i.e. for the same training epoch)
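The split-and-average procedure can be sketched as below. This is only an illustration of the protocol as described: how the authors actually generated their splits is not stated, and `make_splits`, `epoch_average` and the seed are hypothetical.

```python
import random
from statistics import mean


def make_splits(video_ids, n_splits=5, train_fraction=0.8, seed=0):
    """Create n_splits random, non-overlapping train/test partitions."""
    rng = random.Random(seed)
    n_train = round(train_fraction * len(video_ids))
    splits = []
    for _ in range(n_splits):
        shuffled = list(video_ids)
        rng.shuffle(shuffled)
        splits.append((shuffled[:n_train], shuffled[n_train:]))
    return splits


def epoch_average(scores_per_split):
    """Average the test score of the same training epoch across all splits.

    scores_per_split: one list per split, holding one score per epoch.
    """
    return [mean(epoch_scores) for epoch_scores in zip(*scores_per_split)]


splits = make_splits(range(50))  # e.g. the 50 TVSum videos
for train_ids, test_ids in splits:
    print(len(train_ids), len(test_ids))
```

Averaging per epoch, rather than taking each split's best epoch, avoids selecting a different stopping point for every split.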
Implementation details
 Comparison with SoA unsupervised approaches based on multiple user summaries
 Outcomes
 A few SoA methods are comparable to (or even worse than) a random summary generator
 Best method on TVSum shows random-level performance on SumMe
 Best method on SumMe performs worse than SUM-GAN-AAE and is less competitive on TVSum
 Variational attention reduces SUM-GAN-sl efficiency due to the difficulty in efficiently defining two
latent spaces in parallel to the continuous update of the model's components during the training
 Replacement of VAE with AAE leads to a noticeable performance improvement over SUM-GAN-sl
Note: SUM-GAN is not listed in this table as it follows
the single gt-summary evaluation protocol
 Evaluating the effect of the AAE component
 Training efficiency: much faster and more stable training of the model
Loss curves for the SUM-GAN-sl and SUM-GAN-AAE
 Comparison with SoA supervised approaches based on multiple user summaries
 Outcomes
 The best methods on TVSum (MAVS and Tessellation_sup) seem adapted to this dataset, as
they exhibit random-level performance on SumMe
 Only a few supervised methods surpass the performance of a random summary generator on both
datasets, with VASNet being the best among them
 The performance of these methods ranges from 44.1 to 49.7 on SumMe, and from 56.1 to 61.4 on TVSum
 The unsupervised SUM-GAN-AAE model is comparable with SoA supervised methods
+/- indicate better/worse performance compared to SUM-GAN-AAE
Adapting / re-purposing the content
 Main requirements:
 Target distribution platforms & devices have varying requirements (e.g. the optimal
duration of a video differs from one platform to another)
 Target audiences have different preferences / information needs
 Video summarization:
 Create editions of the content that are adapted to different platforms and audiences
Web application [4] for video summarization (try it with your video!):
http://multimedia2.iti.gr/videosummarization/service/start.html
Demo video:
https://youtu.be/LbjPLJzeNII
[4] C. Collyda, K. Apostolidis, E. Apostolidis, E. Adamantidou, A. Metsai, V. Mezaris, "A
Web Service for Video Summarization", Proc. ACM Int. Conf. on Interactive Media
Experiences (IMX 2020), Barcelona, Spain, June 2020.
 Presented two new video summarization methods, making use of:
 The learning efficiency of the generative adversarial networks for unsupervised training
 The effectiveness of attention mechanisms in spotting the most important parts of the video
 Experimental evaluations on two benchmarking datasets
 Documented the positive contribution of the introduced attention auto-encoder component in the
model's training and summarization performance
 Highlighted the competitiveness of the unsupervised SUM-GAN-AAE method against SoA video
summarization techniques
 Used GANs in a new web application for video summarization
 Keep in mind: complete automation is sometimes not desired! (AI + human symbiosis is key)
Conclusions
Questions?
Contact: Dr. Vasileios Mezaris
Information Technologies Institute
Centre for Research and Technology Hellas
Thermi-Thessaloniki, Greece
Tel: +30 2311 257770
Email: bmezaris@iti.gr, web: http://www.iti.gr/~bmezaris/
This work was supported in part by the EU’s Horizon 2020 research and innovation programme under grant
agreement H2020-780656 ReTV.

More Related Content

What's hot

Video summarization using clustering
Video summarization using clusteringVideo summarization using clustering
Video summarization using clusteringSahil Biswas
 
ReTV at EBU MDN Workshop 2020
ReTV at EBU MDN Workshop 2020ReTV at EBU MDN Workshop 2020
ReTV at EBU MDN Workshop 2020ReTV project
 
ReTV: Bringing Broadcaster Archives to the 21st-century Audiences
 ReTV: Bringing Broadcaster Archives to the 21st-century Audiences ReTV: Bringing Broadcaster Archives to the 21st-century Audiences
ReTV: Bringing Broadcaster Archives to the 21st-century AudiencesReTV project
 
The Trans-Vector Platform, by Lyndon Nixon at TVX 2019 @datatv
The Trans-Vector Platform,  by Lyndon Nixon at TVX 2019 @datatv The Trans-Vector Platform,  by Lyndon Nixon at TVX 2019 @datatv
The Trans-Vector Platform, by Lyndon Nixon at TVX 2019 @datatv ReTV project
 
Mtech Second progresspresentation ON VIDEO SUMMARIZATION
Mtech Second progresspresentation ON VIDEO SUMMARIZATIONMtech Second progresspresentation ON VIDEO SUMMARIZATION
Mtech Second progresspresentation ON VIDEO SUMMARIZATIONNEERAJ BAGHEL
 
PERCEPTUALLY LOSSLESS COMPRESSION WITH ERROR CONCEALMENT FOR PERISCOPE AND SO...
PERCEPTUALLY LOSSLESS COMPRESSION WITH ERROR CONCEALMENT FOR PERISCOPE AND SO...PERCEPTUALLY LOSSLESS COMPRESSION WITH ERROR CONCEALMENT FOR PERISCOPE AND SO...
PERCEPTUALLY LOSSLESS COMPRESSION WITH ERROR CONCEALMENT FOR PERISCOPE AND SO...sipij
 
MPEG Visual Quality Assessment: Tasks and Perspectives
MPEG Visual Quality Assessment: Tasks and PerspectivesMPEG Visual Quality Assessment: Tasks and Perspectives
MPEG Visual Quality Assessment: Tasks and PerspectivesAlpen-Adria-Universität
 
Perceptually Lossless Compression with Error Concealment for Periscope and So...
Perceptually Lossless Compression with Error Concealment for Periscope and So...Perceptually Lossless Compression with Error Concealment for Periscope and So...
Perceptually Lossless Compression with Error Concealment for Periscope and So...sipij
 
MPEG AG 5 Workshop on Quality of Immersive Media: Assessment and Metrics
MPEG AG 5 Workshop on Quality of Immersive Media: Assessment and MetricsMPEG AG 5 Workshop on Quality of Immersive Media: Assessment and Metrics
MPEG AG 5 Workshop on Quality of Immersive Media: Assessment and MetricsAlpen-Adria-Universität
 
Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...
Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...
Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...Ijripublishers Ijri
 
2018 FiTCE congress
2018 FiTCE congress2018 FiTCE congress
2018 FiTCE congressSilvia Rossi
 
IRJET- A Hybrid Image and Video Compression of DCT and DWT Techniques for H.2...
IRJET- A Hybrid Image and Video Compression of DCT and DWT Techniques for H.2...IRJET- A Hybrid Image and Video Compression of DCT and DWT Techniques for H.2...
IRJET- A Hybrid Image and Video Compression of DCT and DWT Techniques for H.2...IRJET Journal
 
Fast object re-detection and localization in video for spatio-temporal fragme...
Fast object re-detection and localization in video for spatio-temporal fragme...Fast object re-detection and localization in video for spatio-temporal fragme...
Fast object re-detection and localization in video for spatio-temporal fragme...LinkedTV
 
First LinkedTV End-to-end Platform
First LinkedTV End-to-end PlatformFirst LinkedTV End-to-end Platform
First LinkedTV End-to-end PlatformLinkedTV
 
Tutorial on Point Cloud Compression and standardisation
Tutorial on Point Cloud Compression and standardisationTutorial on Point Cloud Compression and standardisation
Tutorial on Point Cloud Compression and standardisationRufael Mekuria
 

What's hot (18)

Video Thumbnail Selector
Video Thumbnail SelectorVideo Thumbnail Selector
Video Thumbnail Selector
 
Video summarization using clustering
Video summarization using clusteringVideo summarization using clustering
Video summarization using clustering
 
ReTV at EBU MDN Workshop 2020
ReTV at EBU MDN Workshop 2020ReTV at EBU MDN Workshop 2020
ReTV at EBU MDN Workshop 2020
 
ReTV: Bringing Broadcaster Archives to the 21st-century Audiences
 ReTV: Bringing Broadcaster Archives to the 21st-century Audiences ReTV: Bringing Broadcaster Archives to the 21st-century Audiences
ReTV: Bringing Broadcaster Archives to the 21st-century Audiences
 
The Trans-Vector Platform, by Lyndon Nixon at TVX 2019 @datatv
The Trans-Vector Platform,  by Lyndon Nixon at TVX 2019 @datatv The Trans-Vector Platform,  by Lyndon Nixon at TVX 2019 @datatv
The Trans-Vector Platform, by Lyndon Nixon at TVX 2019 @datatv
 
Mtech Second progresspresentation ON VIDEO SUMMARIZATION
Mtech Second progresspresentation ON VIDEO SUMMARIZATIONMtech Second progresspresentation ON VIDEO SUMMARIZATION
Mtech Second progresspresentation ON VIDEO SUMMARIZATION
 
PERCEPTUALLY LOSSLESS COMPRESSION WITH ERROR CONCEALMENT FOR PERISCOPE AND SO...
PERCEPTUALLY LOSSLESS COMPRESSION WITH ERROR CONCEALMENT FOR PERISCOPE AND SO...PERCEPTUALLY LOSSLESS COMPRESSION WITH ERROR CONCEALMENT FOR PERISCOPE AND SO...
PERCEPTUALLY LOSSLESS COMPRESSION WITH ERROR CONCEALMENT FOR PERISCOPE AND SO...
 
MPEG Visual Quality Assessment: Tasks and Perspectives
MPEG Visual Quality Assessment: Tasks and PerspectivesMPEG Visual Quality Assessment: Tasks and Perspectives
MPEG Visual Quality Assessment: Tasks and Perspectives
 
AVSTP2P Overview
AVSTP2P OverviewAVSTP2P Overview
AVSTP2P Overview
 
Perceptually Lossless Compression with Error Concealment for Periscope and So...
Perceptually Lossless Compression with Error Concealment for Periscope and So...Perceptually Lossless Compression with Error Concealment for Periscope and So...
Perceptually Lossless Compression with Error Concealment for Periscope and So...
 
MPEG AG 5 Workshop on Quality of Immersive Media: Assessment and Metrics
MPEG AG 5 Workshop on Quality of Immersive Media: Assessment and MetricsMPEG AG 5 Workshop on Quality of Immersive Media: Assessment and Metrics
MPEG AG 5 Workshop on Quality of Immersive Media: Assessment and Metrics
 
Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...
Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...
Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...
 
2018 FiTCE congress
2018 FiTCE congress2018 FiTCE congress
2018 FiTCE congress
 
IRJET- A Hybrid Image and Video Compression of DCT and DWT Techniques for H.2...
IRJET- A Hybrid Image and Video Compression of DCT and DWT Techniques for H.2...IRJET- A Hybrid Image and Video Compression of DCT and DWT Techniques for H.2...
IRJET- A Hybrid Image and Video Compression of DCT and DWT Techniques for H.2...
 
Fast object re-detection and localization in video for spatio-temporal fragme...
Fast object re-detection and localization in video for spatio-temporal fragme...Fast object re-detection and localization in video for spatio-temporal fragme...
Fast object re-detection and localization in video for spatio-temporal fragme...
 
First LinkedTV End-to-end Platform
First LinkedTV End-to-end PlatformFirst LinkedTV End-to-end Platform
First LinkedTV End-to-end Platform
 
Tutorial on Point Cloud Compression and standardisation
Tutorial on Point Cloud Compression and standardisationTutorial on Point Cloud Compression and standardisation
Tutorial on Point Cloud Compression and standardisation
 
Activity report
Activity reportActivity report
Activity report
 

Similar to GAN-based video summarization

absorption, Cu2+ : glass, emission, excitation, XRD
absorption, Cu2+ : glass, emission, excitation, XRDabsorption, Cu2+ : glass, emission, excitation, XRD
absorption, Cu2+ : glass, emission, excitation, XRDIJERA Editor
 
Improved Error Detection and Data Recovery Architecture for Motion Estimation...
Improved Error Detection and Data Recovery Architecture for Motion Estimation...Improved Error Detection and Data Recovery Architecture for Motion Estimation...
Improved Error Detection and Data Recovery Architecture for Motion Estimation...IJERA Editor
 
Video content analysis and retrieval system using video storytelling and inde...
Video content analysis and retrieval system using video storytelling and inde...Video content analysis and retrieval system using video storytelling and inde...
Video content analysis and retrieval system using video storytelling and inde...IJECEIAES
 
5 ijaems sept-2015-9-video feature extraction based on modified lle using ada...
5 ijaems sept-2015-9-video feature extraction based on modified lle using ada...5 ijaems sept-2015-9-video feature extraction based on modified lle using ada...
5 ijaems sept-2015-9-video feature extraction based on modified lle using ada...INFOGAIN PUBLICATION
 
VCIP_MCBE_presentation.pdf
VCIP_MCBE_presentation.pdfVCIP_MCBE_presentation.pdf
VCIP_MCBE_presentation.pdfVignesh V Menon
 
Energy-Efficient Multi-Codec Bitrate-Ladder Estimation for Adaptive Video Str...
Energy-Efficient Multi-Codec Bitrate-Ladder Estimation for Adaptive Video Str...Energy-Efficient Multi-Codec Bitrate-Ladder Estimation for Adaptive Video Str...
Energy-Efficient Multi-Codec Bitrate-Ladder Estimation for Adaptive Video Str...Alpen-Adria-Universität
 
HTTP Adaptive Streaming – Quo Vadis? (2023)
HTTP Adaptive Streaming – Quo Vadis? (2023)HTTP Adaptive Streaming – Quo Vadis? (2023)
HTTP Adaptive Streaming – Quo Vadis? (2023)Alpen-Adria-Universität
 
SVM Based Saliency Map Technique for Reducing Time Complexity in HEVC
SVM Based Saliency Map Technique for Reducing Time Complexity in HEVCSVM Based Saliency Map Technique for Reducing Time Complexity in HEVC
SVM Based Saliency Map Technique for Reducing Time Complexity in HEVCIRJET Journal
 
Video Coding Enhancements for HTTP Adaptive Streaming
Video Coding Enhancements for HTTP Adaptive StreamingVideo Coding Enhancements for HTTP Adaptive Streaming
Video Coding Enhancements for HTTP Adaptive StreamingAlpen-Adria-Universität
 
Research@Lunch_Presentation.pdf
Research@Lunch_Presentation.pdfResearch@Lunch_Presentation.pdf
Research@Lunch_Presentation.pdfVignesh V Menon
 
Effective Compression of Digital Video
Effective Compression of Digital VideoEffective Compression of Digital Video
Effective Compression of Digital VideoIRJET Journal
 
Overview of Selected Current MPEG Activities
Overview of Selected Current MPEG ActivitiesOverview of Selected Current MPEG Activities
Overview of Selected Current MPEG ActivitiesAlpen-Adria-Universität
 
Overview of Selected Current MPEG Activities
Overview of Selected Current MPEG ActivitiesOverview of Selected Current MPEG Activities
Overview of Selected Current MPEG ActivitiesAlpen-Adria-Universität
 
OBSERVATIONAL DISCRETE LINES FOR THE DETECTION OF MOVING VEHICLES IN ROAD TRA...
OBSERVATIONAL DISCRETE LINES FOR THE DETECTION OF MOVING VEHICLES IN ROAD TRA...OBSERVATIONAL DISCRETE LINES FOR THE DETECTION OF MOVING VEHICLES IN ROAD TRA...
OBSERVATIONAL DISCRETE LINES FOR THE DETECTION OF MOVING VEHICLES IN ROAD TRA...ijcseit
 
OBSERVATIONAL DISCRETE LINES FOR THE DETECTION OF MOVING VEHICLES IN ROAD TRA...
OBSERVATIONAL DISCRETE LINES FOR THE DETECTION OF MOVING VEHICLES IN ROAD TRA...OBSERVATIONAL DISCRETE LINES FOR THE DETECTION OF MOVING VEHICLES IN ROAD TRA...
OBSERVATIONAL DISCRETE LINES FOR THE DETECTION OF MOVING VEHICLES IN ROAD TRA...ijcseit
 
Paper discussion:Video-to-Video Synthesis (NIPS 2018)
Paper discussion:Video-to-Video Synthesis (NIPS 2018)Paper discussion:Video-to-Video Synthesis (NIPS 2018)
Paper discussion:Video-to-Video Synthesis (NIPS 2018)Motaz Sabri
 
Design and Analysis of Quantization Based Low Bit Rate Encoding System
Design and Analysis of Quantization Based Low Bit Rate Encoding SystemDesign and Analysis of Quantization Based Low Bit Rate Encoding System
Design and Analysis of Quantization Based Low Bit Rate Encoding Systemijtsrd
 
OBSERVATIONAL DISCRETE LINES FOR THE DETECTION OF MOVING VEHICLES IN ROAD TRA...
OBSERVATIONAL DISCRETE LINES FOR THE DETECTION OF MOVING VEHICLES IN ROAD TRA...OBSERVATIONAL DISCRETE LINES FOR THE DETECTION OF MOVING VEHICLES IN ROAD TRA...
OBSERVATIONAL DISCRETE LINES FOR THE DETECTION OF MOVING VEHICLES IN ROAD TRA...ijcseit
 

Similar to GAN-based video summarization (20)

absorption, Cu2+ : glass, emission, excitation, XRD
absorption, Cu2+ : glass, emission, excitation, XRDabsorption, Cu2+ : glass, emission, excitation, XRD
absorption, Cu2+ : glass, emission, excitation, XRD
 
Improved Error Detection and Data Recovery Architecture for Motion Estimation...
Improved Error Detection and Data Recovery Architecture for Motion Estimation...Improved Error Detection and Data Recovery Architecture for Motion Estimation...
Improved Error Detection and Data Recovery Architecture for Motion Estimation...
 
Video content analysis and retrieval system using video storytelling and inde...
Video content analysis and retrieval system using video storytelling and inde...Video content analysis and retrieval system using video storytelling and inde...
Video content analysis and retrieval system using video storytelling and inde...
 
5 ijaems sept-2015-9-video feature extraction based on modified lle using ada...
5 ijaems sept-2015-9-video feature extraction based on modified lle using ada...5 ijaems sept-2015-9-video feature extraction based on modified lle using ada...
5 ijaems sept-2015-9-video feature extraction based on modified lle using ada...
 
A04840107
A04840107A04840107
A04840107
 
VCIP_MCBE_presentation.pdf
VCIP_MCBE_presentation.pdfVCIP_MCBE_presentation.pdf
VCIP_MCBE_presentation.pdf
 
Energy-Efficient Multi-Codec Bitrate-Ladder Estimation for Adaptive Video Str...
Energy-Efficient Multi-Codec Bitrate-Ladder Estimation for Adaptive Video Str...Energy-Efficient Multi-Codec Bitrate-Ladder Estimation for Adaptive Video Str...
Energy-Efficient Multi-Codec Bitrate-Ladder Estimation for Adaptive Video Str...
 
HTTP Adaptive Streaming – Quo Vadis? (2023)
HTTP Adaptive Streaming – Quo Vadis? (2023)HTTP Adaptive Streaming – Quo Vadis? (2023)
HTTP Adaptive Streaming – Quo Vadis? (2023)
 
SVM Based Saliency Map Technique for Reducing Time Complexity in HEVC
SVM Based Saliency Map Technique for Reducing Time Complexity in HEVCSVM Based Saliency Map Technique for Reducing Time Complexity in HEVC
SVM Based Saliency Map Technique for Reducing Time Complexity in HEVC
 
Video Coding Enhancements for HTTP Adaptive Streaming
Video Coding Enhancements for HTTP Adaptive StreamingVideo Coding Enhancements for HTTP Adaptive Streaming
Video Coding Enhancements for HTTP Adaptive Streaming
 
Research@Lunch_Presentation.pdf
GAN-based video summarization

  • 1. GAN-based Video Summarization. Vasileios Mezaris, CERTH-ITI. Presentation at the AI4Media Workshop on GANs for Media Content Generation, Thessaloniki, October 2020. Joint work with E. Apostolidis, E. Adamantidou, A. Metsai (CERTH-ITI); I. Patras (QMUL).
  • 2. Problem statement. Video summary: a short visual summary that encapsulates the flow of the story and the essential parts of the full-length video. [Figure: original video and its video summary (storyboard)]
  • 3. Problem statement: applications of video summarization. Professional CMS: effective indexing, browsing, retrieval and promotion of media assets. Video sharing platforms: improved viewer experience, enhanced viewer engagement and increased content consumption. Other summarization scenarios: movie trailer production, sports highlights video generation, video synopsis of 24h surveillance recordings.
  • 4. Related work: deep-learning approaches. (a) Various supervised methods (i.e., learning from ground-truth, manually generated summaries): using feedforward neural nets (CNNs), e.g. for identifying semantically important video parts; exploiting video-level metadata; capturing the story flow using recurrent neural nets (e.g. LSTMs); and many more. (b) Unsupervised algorithms that build summaries without relying on human annotations: using adversarial learning to minimize the distance between videos and their summary-based reconstructions, to maximize the mutual information between summary and video, or to learn a mapping from raw videos to human-like summaries based on summaries available online; and a few more approaches (see tutorial at IEEE ICME 2020, https://www.slideshare.net/VasileiosMezaris/icme2020-tutorial-videosummarizationpart1). Advantages of unsupervised learning: no need for training data (limited, hard to produce); avoids the subjectivity and biases of manually generated summaries; adaptability to different types of video.
  • 5. GANs for unsupervised video summarization: SUM-GAN. Our starting point: the SUM-GAN architecture [1]. Main idea: build a keyframe selection mechanism by minimizing the distance between the deep representations of the original video and a reconstructed version of it based on the selected keyframes. Problem: how to define a good distance? Solution: use a trainable discriminator network. Goal: train the Summarizer to maximally confuse the Discriminator when distinguishing the original from the reconstructed video. [1] B. Mahasseni, M. Lam, S. Todorovic, "Unsupervised Video Summarization with Adversarial LSTM Networks", 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2982–2991.
  • 6. GANs for unsupervised video summarization: SUM-GAN-sl. Introduces two extensions [2]: a linear compression layer that reduces the size of the CNN feature vectors; an incremental and fine-grained approach to train the model's components. [2] E. Apostolidis, A. Metsai, E. Adamantidou, V. Mezaris, I. Patras, "A Stepwise, Label-based Approach for Improving the Adversarial Training in Unsupervised Video Summarization", Proc. 1st Int. Workshop on AI for Smart TV Content Production, Access and Delivery (AI4TV'19) at ACM Multimedia 2019, Nice, France, October 2019.
  • 7-10. GANs for unsupervised video summarization: SUM-GAN-sl. Incremental approach to train the model's components. [Figures: successive training steps; slide 8 marks a regularization factor]
  • 11. GANs for unsupervised video summarization: SUM-GAN-AAE. Adversarial learning driven by a deterministic attention auto-encoder. The VAE of the previous architecture was entirely replaced by an attention auto-encoder (AAE) network, forming the SUM-GAN-AAE architecture [3]. [3] E. Apostolidis, E. Adamantidou, A. Metsai, V. Mezaris, I. Patras, "Unsupervised Video Summarization via Attention-Driven Adversarial Learning", Proc. 26th Int. Conf. on Multimedia Modeling (MMM2020), Daejeon, Korea, Jan. 2020.
  • 12-19. GANs for unsupervised video summarization: SUM-GAN-AAE. Attention auto-encoder, processing pipeline: (1) Weighted feature vectors are fed to the Encoder. (2) The Encoder's output (V) and the Decoder's previous hidden state are fed to the Attention component; for t > 1, use the hidden state of the Decoder's previous step (h1); for t = 1, use the hidden state of the Encoder's last step (He). (3) Attention weights (αt) are computed using an energy score function and a soft-max function. (4) αt is multiplied with V to form the context vector vt'. (5) vt' is combined with the Decoder's previous output yt-1. (6) The Decoder gradually reconstructs the video.
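
The attention step in the pipeline above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the slides do not specify the energy score function, so a plain dot product between each Encoder output and the Decoder's previous hidden state is assumed here, and the names `attention_step`, `V` and `h_prev` are hypothetical.

```python
import math

def attention_step(V, h_prev):
    """One soft-attention step over the Encoder outputs.

    V      -- list of T Encoder output vectors (each a list of floats)
    h_prev -- Decoder's previous hidden state (same dimension as each vector in V)
    Returns the attention weights and the context vector.
    """
    # Energy score per time step. ASSUMPTION: dot product; the slides do not
    # state which energy function is actually used.
    energies = [sum(v_d * h_d for v_d, h_d in zip(v, h_prev)) for v in V]

    # Soft-max over the energies (shifted by the maximum for numerical stability).
    peak = max(energies)
    exps = [math.exp(e - peak) for e in energies]
    total = sum(exps)
    alpha = [x / total for x in exps]

    # Context vector: attention-weighted sum of the Encoder outputs.
    dim = len(V[0])
    context = [sum(a * v[d] for a, v in zip(alpha, V)) for d in range(dim)]
    return alpha, context

# Toy example with three Encoder outputs of dimension 2.
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h_prev = [0.5, -0.5]
alpha, context = attention_step(V, h_prev)
```

In the toy example the first Encoder output is the one most aligned with `h_prev`, so it receives the largest weight.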
  • 20. Video summarization practicalities: model's I/O and summarization process. Input: the CNN feature vectors of the (sampled) video frames. Output: frame-level importance scores. Summarization process: CNN features pass through the linear compression layer and the frame selector, which yields importance scores at the frame level; given a video segmentation (using KTS), fragment-level importance scores are calculated by averaging the scores of each fragment's frames; the summary is created by selecting the fragments that maximize the total importance score, provided that the summary length does not exceed 15% of the video duration, by solving the 0/1 Knapsack problem.
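
The fragment scoring and selection described above can be illustrated with a short Python sketch. It assumes that fragment boundaries are already available (the KTS segmentation itself is not shown) and that durations are measured in frames. The function names and the toy data are hypothetical, and the dynamic-programming solver is a textbook 0/1 Knapsack, not necessarily the authors' implementation.

```python
def fragment_scores(frame_scores, fragments):
    """Average the frame-level scores inside each (start, end) fragment (end exclusive)."""
    return [sum(frame_scores[s:e]) / (e - s) for s, e in fragments]

def select_fragments(scores, lengths, budget):
    """0/1 Knapsack by dynamic programming.

    Picks the fragments with maximal total score whose total length
    (in frames) does not exceed `budget`. Returns the chosen indices.
    """
    n = len(scores)
    # best[i][c]: best total score using the first i fragments within capacity c.
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w, v = lengths[i - 1], scores[i - 1]
        for c in range(budget + 1):
            best[i][c] = best[i - 1][c]
            if w <= c and best[i - 1][c - w] + v > best[i][c]:
                best[i][c] = best[i - 1][c - w] + v
    # Backtrack to recover which fragments were taken.
    chosen, c = [], budget
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= lengths[i - 1]
    return sorted(chosen)

# Toy video: 40 sampled frames split into 6 fragments (hypothetical data).
fragments = [(0, 4), (4, 6), (6, 10), (10, 13), (13, 16), (16, 40)]
per_fragment = [0.9, 0.5, 0.8, 0.6, 0.7, 0.1]
frame_scores = [s for (a, b), s in zip(fragments, per_fragment) for _ in range(b - a)]

scores = fragment_scores(frame_scores, fragments)
lengths = [e - s for s, e in fragments]
budget = (15 * len(frame_scores)) // 100   # 15% of the video duration, in frames
selected = select_fragments(scores, lengths, budget)
```

With a budget of 6 frames, the sketch selects the first two fragments, whose combined length exactly fills the budget.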
  • 21. Experiments: datasets. SumMe (https://gyglim.github.io/me/vsum/index.html#benchmark): 25 videos capturing multiple events (e.g. cooking and sports); video length: 1 to 6 min; annotation: fragment-based video summaries. TVSum (https://github.com/yalesong/tvsum): 50 videos from 10 categories of the TRECVid MED task; video length: 1 to 11 min; annotation: frame-level importance scores.
  • 22. Experiments: evaluation protocol. The generated summary should not exceed 15% of the video length. Similarity between the automatically generated (A) and the ground-truth (G) summary is expressed by the F-Score (%), with (P)recision and (R)ecall measuring the temporal overlap (∩) between them (|| || denotes duration). Typical metrics, with Precision and Recall computed at the frame level.
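
The formula itself did not survive in the transcript. The sketch below assumes the standard definitions that match the description above, P = ||A ∩ G|| / ||A||, R = ||A ∩ G|| / ||G|| and F = 2PR / (P + R), computed at the frame level. The function name and the binary per-frame encoding of the summaries are illustrative choices.

```python
def f_score(auto, truth):
    """F-Score (%) between two summaries given as per-frame 0/1 selections.

    auto  -- automatically generated summary (A), 1 = frame selected
    truth -- ground-truth summary (G), same length as `auto`
    """
    overlap = sum(a and g for a, g in zip(auto, truth))   # ||A ∩ G|| in frames
    if overlap == 0:
        return 0.0
    precision = overlap / sum(auto)    # share of A that is also in G
    recall = overlap / sum(truth)      # share of G that is covered by A
    return 100.0 * 2 * precision * recall / (precision + recall)

# Toy example: 10 frames, A selects frames 0-2, G selects frames 1-4.
auto = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
truth = [0, 1, 1, 1, 1, 0, 0, 0, 0, 0]
score = f_score(auto, truth)   # P = 2/3, R = 1/2, so F = 4/7, about 57.1%
```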
  • 23-28. Experiments: evaluation protocol. Slight but important distinction w.r.t. what is eventually used as the ground-truth summary. Most used approach in the literature: F-Score1, F-Score2, ..., F-ScoreN are computed, and dataset-specific formulas over the N scores are given for SumMe and for TVSum. [Figures: evaluation diagram and formulas]
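
The per-dataset formulas were images and are missing from the transcript. As an assumption based on common practice in the video summarization literature (not on the slide itself), the sketch below keeps the maximum of the N F-Scores for SumMe and their average for TVSum. The function name is hypothetical.

```python
def aggregate_f_scores(f_scores, dataset):
    """Combine the N per-user F-Scores of one video into a single value.

    ASSUMPTION: maximum for SumMe, mean for TVSum, as commonly done in the
    literature; the slide's own formulas are not available in the transcript.
    """
    if not f_scores:
        raise ValueError("at least one F-Score is required")
    if dataset == "SumMe":
        return max(f_scores)
    if dataset == "TVSum":
        return sum(f_scores) / len(f_scores)
    raise ValueError(f"unknown dataset: {dataset}")

# Toy example: F-Scores of one generated summary against N = 4 user summaries.
user_scores = [40.0, 55.0, 35.0, 50.0]
summe_value = aggregate_f_scores(user_scores, "SumMe")
tvsum_value = aggregate_f_scores(user_scores, "TVSum")
```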
  • 29-30. Experiments: evaluation protocol. Slight but important distinction w.r.t. what is eventually used as the ground-truth summary. Alternative approach: a single F-Score. [Figure: evaluation diagram]
  • 31. Experiments: implementation details. Videos were down-sampled to 2 fps. Feature extraction was based on the pool5 layer of GoogLeNet trained on ImageNet. The linear compression layer reduces the size of these vectors from 1024 to 500. All components are 2-layer LSTMs with 500 hidden units; the frame selector is a bi-directional LSTM. Training is based on the Adam optimizer; Summarizer's learning rate = 10^-4; Discriminator's learning rate = 10^-5. Each dataset was split into two non-overlapping sets: a training set with 80% of the data and a testing set with the remaining 20%. Experiments were run on 5 differently created random splits, and we report the average performance at the training-epoch level (i.e. for the same training epoch) over these runs.
  • 32. Experiments: comparison with SoA unsupervised approaches based on multiple user summaries. [Table of results] Note: SUM-GAN is not listed in this table, as it follows the single gt-summary evaluation protocol. Outcomes: a few SoA methods are comparable to (or even worse than) a random summary generator; the best method on TVSum shows random-level performance on SumMe; the best method on SumMe performs worse than SUM-GAN-AAE and is less competitive on TVSum; variational attention reduces the efficiency of SUM-GAN-sl, due to the difficulty of efficiently defining two latent spaces in parallel to the continuous update of the model's components during training; replacing the VAE with the AAE leads to a noticeable performance improvement over SUM-GAN-sl.
  • 33. Experiments: evaluating the effect of the AAE component. Training efficiency: much faster and more stable training of the model. [Figure: loss curves for SUM-GAN-sl and SUM-GAN-AAE]
  • 34. Experiments: comparison with SoA supervised approaches based on multiple user summaries. [Table of results; +/- indicate better/worse performance compared to SUM-GAN-AAE] Outcomes: the best methods on TVSum (MAVS and Tessellation_sup, respectively) seem adapted to this dataset, as they exhibit random-level performance on SumMe; only a few supervised methods surpass the performance of a random summary generator on both datasets, with VASNet being the best among them; the performance of these methods ranges between 44.1 and 49.7 on SumMe, and between 56.1 and 61.4 on TVSum; the unsupervised SUM-GAN-AAE model is comparable with SoA supervised methods.
  • 35. Adapting / re-purposing the content. Main requirements: target distribution platforms and devices have varying requirements (e.g. the optimal duration of a video differs from one platform to another); target audiences have different preferences / information needs. Video summarization: create editions of the content that are adapted to different platforms and audiences.
  • 36. Adapting / re-purposing the content. Web application [4] for video summarization (try it with your video!): http://multimedia2.iti.gr/videosummarization/service/start.html. Demo video: https://youtu.be/LbjPLJzeNII. [4] C. Collyda, K. Apostolidis, E. Apostolidis, E. Adamantidou, A. Metsai, V. Mezaris, "A Web Service for Video Summarization", Proc. ACM Int. Conf. on Interactive Media Experiences (IMX 2020), Barcelona, Spain, June 2020.
  • 37. Conclusions. Presented two new video summarization methods, making use of: the learning efficiency of generative adversarial networks for unsupervised training; the effectiveness of attention mechanisms in spotting the most important parts of the video. Experimental evaluations on two benchmarking datasets: documented the positive contribution of the introduced attention auto-encoder component to the model's training and summarization performance; highlighted the competitiveness of the unsupervised SUM-GAN-AAE method against SoA video summarization techniques. Used GANs in a new web application for video summarization. Keep in mind: complete automation is sometimes not desired! (AI + human symbiosis is key)
  • 38. Questions? Contact: Dr. Vasileios Mezaris, Information Technologies Institute, Centre for Research and Technology Hellas, Thermi-Thessaloniki, Greece. Tel: +30 2311 257770. Email: bmezaris@iti.gr, web: http://www.iti.gr/~bmezaris/. This work was supported in part by the EU's Horizon 2020 research and innovation programme under grant agreement H2020-780656 ReTV.