SlideShare a Scribd company logo
1 of 44
Download to read offline
Low computational cost algorithms
for photo clustering and mail
signature detection in the cloud!
Daniel Manchón
Co-directors: Xavi Giró (UPC) Omar Pera (Pixable)
1
Outline
• Motivation!
• Tasks summary
• Pixable internship
• GPI research assistant
• Photo clustering
• Mail signature detection
• Conclusions
• Introduction
• Requirements
• Design
• Results
2
Motivation: Photo clustering
3
Low computational cost algorithms for photo clustering and mail signature detection in the cloud
Motivation: Mail signature detection
4
Low computational cost algorithms for photo clustering and mail signature detection in the cloud
Motivation: Cloud computing
5
Low computational cost algorithms for photo clustering and mail signature detection in the cloud
Outline
• Motivation
• Tasks summary
• Pixable internship!
• GPI research assistant
• Photo clustering
• Mail signature detection
• Conclusions
• Introduction
• Requirements
• Design
• Results
6
Pixable internship
- Social photos aggregation!
- Photo ranking!
- Editorial content!
- Contacts feeds!
- Owned by Singtel
- Photo storage!
- Synchronization across multiple devices!
- Support for RAW
- CallerID application!
- Multiple contact source support!
- Contact backup and synchronization!
- SPAM detection
7
Photofeed tasks
• Instagram source (in-production)
• Referrals and invitations method
• "New relic" integration
• Photo clustering and
summarization
• Photo download service 

(in-production)
8
• Mail scrapping monitorization
• Signature detection!
• Identity analysis improvement
• Tooling (in-production)
Contactive tasks
9
Outline
• Motivation
• Tasks summary
• Pixable internship
• GPI research assistant!
• Photo clustering
• Mail signature detection
• Conclusions
• Introduction
• Requirements
• Design
• Results
10
GPI research assistant
• Mediaeval 2013 (published paper)
• ICMR SEWM (published paper)
• Pyxel software framework
• Mediaeval 2014
11
Multimedia retrieval conference
GPI: Image and Video Processing Group
Outline
• Motivation
• Tasks summary
• Pixable internship
• GPI research assistant
• Photo clustering!
• Mail signature detection
• Conclusions
• Introduction!
• Requirements
• Design
• Results
12
Photo Clustering: Intro
PhotoTOC
[Platt et al, PACRIM 2003]
State of the artEvent detection
13
Outline
• Motivation
• Tasks summary
• Pixable internship
• GPI research assistant
• Photo clustering!
• Mail signature detection
• Conclusions
• Introduction
• Requirements!
• Design
• Results
14
Photo Clustering: Requirements
• User data stored in Amazon
cloud and MongoDB.
• Low computing
• Easily configurable using
REST API
• Event generation
• Visual and metadata information
available
• F1 and NMI as evaluation metrics
• 400k annotated photo dataset
Mediaeval requirements Photofeed constrains
15
Outline
• Motivation
• Tasks summary
• Pixable internship
• GPI research assistant
• Photo clustering!
• Mail signature detection
• Conclusions
• Introduction
• Requirements
• Design!
• Results
16
Design
Hi, I’m John. Hi, I’m Emily.
(a) Temporal sorting by each user independently
17
Design
(b) Temporal-based oversegmentation in mini-clusters
PhotoTOC
[Platt et al, PacRim 2003]
18
Design
(b) Temporal-based oversegmentation in mini-clusters, mean values modelization
19
Username= John
T.taken= 2010-09-10 02:10:12
GPS= (42.1,-10)
tags= live,stage,deerhunter
Username= emily
T.taken= 2010-12-13 02:11:10
GPS= (43,-8.40)
tags= live,deerhunter
Username= emily
T.taken= 2010-12-13 03:11:10
GPS= (no data)
tags= live,stones
Username= emily
T.taken= 2010-12-14 23:11:10
GPS= (43.2,-8.2)
tags= sound, test
Design
(c) Sequential merging of mini-clusters
?
t
avg(·) avg(·) avg(·)avg(·)
20
Design
(c) Sequential merging of mini-clusters
21
Outline
• Motivation
• Tasks summary
• Pixable internship
• GPI research assistant
• Photo clustering!
• Mail signature detection
• Conclusions
• Introduction
• Requirements
• Design
• Results
22
Results
F1 = 2
PR
P + R
UPC 3rd place of 12 teams!!!
23
Outline
• Motivation
• Tasks summary
• Pixable internship
• GPI research assistant
• Photo clustering
• Mail signature detection!
• Conclusions
• Introduction!
• Requirements
• Design
• Results
24
Mail signature detection: Intro
• Email information extraction
• SPAM detection
• Low computation
State of the artKEY TOPICS
Learning to extract signature and reply lines from email
[Vitor R. Carvalho and William W. Cohen, 2004 ]
25
Outline
• Motivation
• Tasks summary
• Pixable internship
• GPI research assistant
• Photo clustering
• Mail signature detection!
• Conclusions
• Introduction
• Requirements!
• Design
• Results
26
Mail signature detection: Requirements
• Mail scrapping service improvement
• Pre-process the input to reduce the execution time
• Adapt the mail scrapping service to Contactive product
?
fewer information
filter only signatures
MongoDB entries
User mailbox
id 89012
name John Doe
email j.doe@gmail.com
linkedin Id 7788455367_e
phone 789675463
27
Mail
scrapping
service
Outline
• Motivation
• Tasks summary
• Pixable internship
• GPI research assistant
• Photo clustering
• Mail signature detection!
• Conclusions
• Introduction
• Requirements
• Design!
• Results
28
Design
2. Problem Definition and Corpus
A signature block is the set of lines, usually in the end of a message, that contain information about the sender,
such as personal name, affiliation, postal address, web address, email address, telephone number, etc. Quotes from
famous persons and creative ASCII drawings are often present in this block also. An example of a signature block
can be seen in last six lines of the email message pictured in Figure 1 (marked with the line label <sig>). Figure 1
also contains six lines of text that were quoted from a preceding message (marked with the line label <reply>). In
this paper we will call such lines reply lines.
<other> From: wcohen@cs.cmu.edu
<other> To: Vitor Carvalho <vitor@cs.cmu.edu>
<other> Subject: Re: Did you try to compile javadoc recently?
<other> Date: 25 Mar 2004 12:05:51 -0500
<other>
<other> Try cvs update –dP, this removes files & directories that have been
<other> deleted from cvs.
<other> - W
<other>
<reply> On Wed, 2004-03-24 at 19:58, Vitor Carvalho wrote:
<reply> > I’ve just checked-out the baseline m3 code and
<reply> > "Ant dist" is working fine, but "ant javadoc" is not.
<reply> > Thanks
<reply> > Vitor
<other>
<sig> ------------------------------------------------------------------
<sig> William W. Cohen “Would you drive a mime
<sig> wcohen@cs.cmu.edu nuts if you played a
<sig> http://www.wcohen.com blank audio tape
<sig> Associate Research Professor full blast?”
<sig> CALD, Carnegie-Mellon University - S. Wright
Figure 1 - Excerpt from a labeled email message
(a) Split the K last mail lines and retrieve the annotations
Last K
lines
Ground truth
annotations
29
2. Problem Definition and Corpus
A signature block is the set of lines, usually in the end of a message, that contain information about the sender,
such as personal name, affiliation, postal address, web address, email address, telephone number, etc. Quotes from
famous persons and creative ASCII drawings are often present in this block also. An example of a signature block
can be seen in last six lines of the email message pictured in Figure 1 (marked with the line label <sig>). Figure 1
also contains six lines of text that were quoted from a preceding message (marked with the line label <reply>). In
Lines
N Feature
Patterns
(b) feature extraction
30
Design
Design
(c) SVM training and model generation
nition and Corpus
is the set of lines, usually in the end of a message, that contain information about the sender,
e, affiliation, postal address, web address, email address, telephone number, etc. Quotes from
reative ASCII drawings are often present in this block also. An example of a signature block
lines of the email message pictured in Figure 1 (marked with the line label <sig>). Figure 1
of text that were quoted from a preceding message (marked with the line label <reply>). In
such lines reply lines.
om: wcohen@cs.cmu.edu
: Vitor Carvalho <vitor@cs.cmu.edu>
31
Feature matrix
[KxN]
Vector ground truth
[K]
+ SVM
training Model=
Design
(c) SVM training and model generation
Model
● Other
● Reply
● Signature
Lines
Classes
pre-process
Features
32
Outline
• Motivation
• Tasks summary
• Pixable internship
• GPI research assistant
• Photo clustering
• Mail signature detection!
• Conclusions
• Introduction
• Requirements
• Design
• Results
33
Results
F1 = 2
Precision · Recall
Precision + Recall
34
With annotated dataset Without annotated dataset
Manual evaluation
Contactive user base mailboxes
Outline
• Motivation
• Tasks summary
• Pixable internship
• GPI research assistant
• Photo clustering
• Mail signature detection
• Conclusions
• Introduction
• Requirements
• Design
• Results
35
Conclusions
• Academic
• Papers: Mediaeval 2013 and ICMR SEWM, and Mediaeval 2014 on preparation.
• UPC Pyxel framework foundations
• Industrial
• Contributions to Pixable in production servers:
• Instagram integration
• Photofeed Downloader
• Mail signature detection: Proof of concept successful.
• Work in the USA!
36
Thank you very much!!
Q&A
37
BACKUP SLIDES
38
Design
39
(c) Sequential merging of mini-clusters
Weighted
modalities
● creation (or upload) time
● geolocation
● textual labels
● same user
Design
40
(c) Sequential merging of mini-clusters
Geolocation (d=haversine)Time stamp (d=L1)
Text labels (d=Jaccard) Same user (d=boolean)
Design
41
(c) Sequential merging of mini-clusters
Design
42
(c) Sequential merging of mini-clusters
42
Mean and std.
deviation learned on
pairs of photos within
the same training
event.
Design
43
(c) Sequential merging of mini-clusters
43
phi function
Design
44
(c) Sequential merging of mini-clusters
decision threhold

More Related Content

Viewers also liked

Bebida " Real Peru "- Marketing Informatico - UNFV
Bebida " Real Peru "- Marketing Informatico - UNFVBebida " Real Peru "- Marketing Informatico - UNFV
Bebida " Real Peru "- Marketing Informatico - UNFVMedaly Ventocilla
 
Global Real Estate Institute - Connecting Real Estate Leaders Worldwide
Global Real Estate Institute - Connecting Real Estate Leaders WorldwideGlobal Real Estate Institute - Connecting Real Estate Leaders Worldwide
Global Real Estate Institute - Connecting Real Estate Leaders WorldwideRoy Maybury
 
Dr. Douglas Rosendale
Dr. Douglas Rosendale Dr. Douglas Rosendale
Dr. Douglas Rosendale Investnet
 
Presentación Psico Educa Vet Corp
Presentación Psico Educa Vet CorpPresentación Psico Educa Vet Corp
Presentación Psico Educa Vet CorpJose Rafael Romero
 
B5 - OTHER REFERENCES - PRIOR TO STEINHOFF
B5 - OTHER REFERENCES - PRIOR TO STEINHOFFB5 - OTHER REFERENCES - PRIOR TO STEINHOFF
B5 - OTHER REFERENCES - PRIOR TO STEINHOFFNivera Ishwarlall
 
Pensemos _un__docente__actor__y_constructor__de__innovaciones[1]
Pensemos  _un__docente__actor__y_constructor__de__innovaciones[1]Pensemos  _un__docente__actor__y_constructor__de__innovaciones[1]
Pensemos _un__docente__actor__y_constructor__de__innovaciones[1]Lorena Mariela Rodriguez
 
La gatera de_la_villa_La Gatera de la Villa nº 5
La gatera de_la_villa_La Gatera de la Villa nº 5La gatera de_la_villa_La Gatera de la Villa nº 5
La gatera de_la_villa_La Gatera de la Villa nº 5La Gatera de la Villa
 
Trend One (Web Expansion) Grape Online Strategies 2009 by Nick Sohnemann
Trend One (Web Expansion) Grape Online Strategies 2009 by Nick SohnemannTrend One (Web Expansion) Grape Online Strategies 2009 by Nick Sohnemann
Trend One (Web Expansion) Grape Online Strategies 2009 by Nick SohnemannHUNGRY BOYS Creative agency
 
Español Aeronáutico creado por Lidia Están
Español Aeronáutico creado por Lidia Están Español Aeronáutico creado por Lidia Están
Español Aeronáutico creado por Lidia Están Lidia Mar Est
 
TechTalk: What is DDVS and How to Make Sense of Data-Driven Service Image.
TechTalk: What is DDVS and How to Make Sense of Data-Driven Service Image.TechTalk: What is DDVS and How to Make Sense of Data-Driven Service Image.
TechTalk: What is DDVS and How to Make Sense of Data-Driven Service Image.CA Technologies
 
Curso de Instrumentación Endoscópica para Enfermería
Curso de Instrumentación Endoscópica para EnfermeríaCurso de Instrumentación Endoscópica para Enfermería
Curso de Instrumentación Endoscópica para EnfermeríaBlog Materno-Infantil
 

Viewers also liked (19)

Bebida " Real Peru "- Marketing Informatico - UNFV
Bebida " Real Peru "- Marketing Informatico - UNFVBebida " Real Peru "- Marketing Informatico - UNFV
Bebida " Real Peru "- Marketing Informatico - UNFV
 
Desarrollo de competencias genéricas y yoga
Desarrollo de competencias genéricas  y yogaDesarrollo de competencias genéricas  y yoga
Desarrollo de competencias genéricas y yoga
 
Global Real Estate Institute - Connecting Real Estate Leaders Worldwide
Global Real Estate Institute - Connecting Real Estate Leaders WorldwideGlobal Real Estate Institute - Connecting Real Estate Leaders Worldwide
Global Real Estate Institute - Connecting Real Estate Leaders Worldwide
 
Dr. Douglas Rosendale
Dr. Douglas Rosendale Dr. Douglas Rosendale
Dr. Douglas Rosendale
 
Comenzar
ComenzarComenzar
Comenzar
 
Presentación Psico Educa Vet Corp
Presentación Psico Educa Vet CorpPresentación Psico Educa Vet Corp
Presentación Psico Educa Vet Corp
 
B5 - OTHER REFERENCES - PRIOR TO STEINHOFF
B5 - OTHER REFERENCES - PRIOR TO STEINHOFFB5 - OTHER REFERENCES - PRIOR TO STEINHOFF
B5 - OTHER REFERENCES - PRIOR TO STEINHOFF
 
Pensemos _un__docente__actor__y_constructor__de__innovaciones[1]
Pensemos  _un__docente__actor__y_constructor__de__innovaciones[1]Pensemos  _un__docente__actor__y_constructor__de__innovaciones[1]
Pensemos _un__docente__actor__y_constructor__de__innovaciones[1]
 
Expo
ExpoExpo
Expo
 
La gatera de_la_villa_La Gatera de la Villa nº 5
La gatera de_la_villa_La Gatera de la Villa nº 5La gatera de_la_villa_La Gatera de la Villa nº 5
La gatera de_la_villa_La Gatera de la Villa nº 5
 
Trend One (Web Expansion) Grape Online Strategies 2009 by Nick Sohnemann
Trend One (Web Expansion) Grape Online Strategies 2009 by Nick SohnemannTrend One (Web Expansion) Grape Online Strategies 2009 by Nick Sohnemann
Trend One (Web Expansion) Grape Online Strategies 2009 by Nick Sohnemann
 
PhD_APC_UPMC_IFPEN_dec1997
PhD_APC_UPMC_IFPEN_dec1997PhD_APC_UPMC_IFPEN_dec1997
PhD_APC_UPMC_IFPEN_dec1997
 
Español Aeronáutico creado por Lidia Están
Español Aeronáutico creado por Lidia Están Español Aeronáutico creado por Lidia Están
Español Aeronáutico creado por Lidia Están
 
TechTalk: What is DDVS and How to Make Sense of Data-Driven Service Image.
TechTalk: What is DDVS and How to Make Sense of Data-Driven Service Image.TechTalk: What is DDVS and How to Make Sense of Data-Driven Service Image.
TechTalk: What is DDVS and How to Make Sense of Data-Driven Service Image.
 
Man Base Datos I
Man Base Datos IMan Base Datos I
Man Base Datos I
 
Proyecto
ProyectoProyecto
Proyecto
 
Pintados de pasi
Pintados de pasiPintados de pasi
Pintados de pasi
 
Curso de Instrumentación Endoscópica para Enfermería
Curso de Instrumentación Endoscópica para EnfermeríaCurso de Instrumentación Endoscópica para Enfermería
Curso de Instrumentación Endoscópica para Enfermería
 
Manual corporativo - MARCA
Manual corporativo - MARCAManual corporativo - MARCA
Manual corporativo - MARCA
 

Similar to Low computational cost algorithms for photo clustering and mail signature detection in the cloud

SFScon 21 - Matteo Camilli - Performance assessment of microservices with str...
SFScon 21 - Matteo Camilli - Performance assessment of microservices with str...SFScon 21 - Matteo Camilli - Performance assessment of microservices with str...
SFScon 21 - Matteo Camilli - Performance assessment of microservices with str...South Tyrol Free Software Conference
 
Advanced Schema Design Patterns
Advanced Schema Design PatternsAdvanced Schema Design Patterns
Advanced Schema Design PatternsMongoDB
 
SH 1 - SES 1 - advanced_schema_design.pptx
SH 1 - SES 1 - advanced_schema_design.pptxSH 1 - SES 1 - advanced_schema_design.pptx
SH 1 - SES 1 - advanced_schema_design.pptxMongoDB
 
SH 1 - SES 1 - advanced_schema_design.pptx
SH 1 - SES 1 - advanced_schema_design.pptxSH 1 - SES 1 - advanced_schema_design.pptx
SH 1 - SES 1 - advanced_schema_design.pptxMongoDB
 
Open Source North - MongoDB Advanced Schema Design Patterns
Open Source North - MongoDB Advanced Schema Design PatternsOpen Source North - MongoDB Advanced Schema Design Patterns
Open Source North - MongoDB Advanced Schema Design PatternsMatthew Kalan
 
An early look at the LDBC Social Network Benchmark's Business Intelligence wo...
An early look at the LDBC Social Network Benchmark's Business Intelligence wo...An early look at the LDBC Social Network Benchmark's Business Intelligence wo...
An early look at the LDBC Social Network Benchmark's Business Intelligence wo...Gábor Szárnyas
 
Chengqi zhang graph processing and mining in the era of big data
Chengqi zhang graph processing and mining in the era of big dataChengqi zhang graph processing and mining in the era of big data
Chengqi zhang graph processing and mining in the era of big datajins0618
 
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AIQualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AIQualcomm Research
 
Advanced Schema Design Patterns
Advanced Schema Design PatternsAdvanced Schema Design Patterns
Advanced Schema Design PatternsMongoDB
 
tranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And When
tranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And WhentranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And When
tranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And WhenDavid Peyruc
 
SQL Query Optimization: Why Is It So Hard to Get Right?
SQL Query Optimization: Why Is It So Hard to Get Right?SQL Query Optimization: Why Is It So Hard to Get Right?
SQL Query Optimization: Why Is It So Hard to Get Right?Brent Ozar
 
PASS Summit 2010 Keynote David DeWitt
PASS Summit 2010 Keynote David DeWittPASS Summit 2010 Keynote David DeWitt
PASS Summit 2010 Keynote David DeWittGraySystemsLab
 
Neo4j: What's Under the Hood & How Knowing This Can Help You
Neo4j: What's Under the Hood & How Knowing This Can Help You Neo4j: What's Under the Hood & How Knowing This Can Help You
Neo4j: What's Under the Hood & How Knowing This Can Help You Neo4j
 
Keynote IDEAS2013 - Peter Boncz
Keynote IDEAS2013 - Peter BonczKeynote IDEAS2013 - Peter Boncz
Keynote IDEAS2013 - Peter BonczIoan Toma
 
MongoDB Schema Design: Practical Applications and Implications
MongoDB Schema Design: Practical Applications and ImplicationsMongoDB Schema Design: Practical Applications and Implications
MongoDB Schema Design: Practical Applications and ImplicationsMongoDB
 

Similar to Low computational cost algorithms for photo clustering and mail signature detection in the cloud (20)

SFScon 21 - Matteo Camilli - Performance assessment of microservices with str...
SFScon 21 - Matteo Camilli - Performance assessment of microservices with str...SFScon 21 - Matteo Camilli - Performance assessment of microservices with str...
SFScon 21 - Matteo Camilli - Performance assessment of microservices with str...
 
Advanced Schema Design Patterns
Advanced Schema Design PatternsAdvanced Schema Design Patterns
Advanced Schema Design Patterns
 
PCL (Point Cloud Library)
PCL (Point Cloud Library)PCL (Point Cloud Library)
PCL (Point Cloud Library)
 
SH 1 - SES 1 - advanced_schema_design.pptx
SH 1 - SES 1 - advanced_schema_design.pptxSH 1 - SES 1 - advanced_schema_design.pptx
SH 1 - SES 1 - advanced_schema_design.pptx
 
SH 1 - SES 1 - advanced_schema_design.pptx
SH 1 - SES 1 - advanced_schema_design.pptxSH 1 - SES 1 - advanced_schema_design.pptx
SH 1 - SES 1 - advanced_schema_design.pptx
 
Open Source North - MongoDB Advanced Schema Design Patterns
Open Source North - MongoDB Advanced Schema Design PatternsOpen Source North - MongoDB Advanced Schema Design Patterns
Open Source North - MongoDB Advanced Schema Design Patterns
 
An early look at the LDBC Social Network Benchmark's Business Intelligence wo...
An early look at the LDBC Social Network Benchmark's Business Intelligence wo...An early look at the LDBC Social Network Benchmark's Business Intelligence wo...
An early look at the LDBC Social Network Benchmark's Business Intelligence wo...
 
Chengqi zhang graph processing and mining in the era of big data
Chengqi zhang graph processing and mining in the era of big dataChengqi zhang graph processing and mining in the era of big data
Chengqi zhang graph processing and mining in the era of big data
 
Complexity metrics and models
Complexity metrics and modelsComplexity metrics and models
Complexity metrics and models
 
Complexity metrics and models
Complexity metrics and modelsComplexity metrics and models
Complexity metrics and models
 
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AIQualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI
 
Advanced Schema Design Patterns
Advanced Schema Design PatternsAdvanced Schema Design Patterns
Advanced Schema Design Patterns
 
tranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And When
tranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And WhentranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And When
tranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And When
 
SQL Query Optimization: Why Is It So Hard to Get Right?
SQL Query Optimization: Why Is It So Hard to Get Right?SQL Query Optimization: Why Is It So Hard to Get Right?
SQL Query Optimization: Why Is It So Hard to Get Right?
 
Effective C++
Effective C++Effective C++
Effective C++
 
PASS Summit 2010 Keynote David DeWitt
PASS Summit 2010 Keynote David DeWittPASS Summit 2010 Keynote David DeWitt
PASS Summit 2010 Keynote David DeWitt
 
computer architecture.
computer architecture.computer architecture.
computer architecture.
 
Neo4j: What's Under the Hood & How Knowing This Can Help You
Neo4j: What's Under the Hood & How Knowing This Can Help You Neo4j: What's Under the Hood & How Knowing This Can Help You
Neo4j: What's Under the Hood & How Knowing This Can Help You
 
Keynote IDEAS2013 - Peter Boncz
Keynote IDEAS2013 - Peter BonczKeynote IDEAS2013 - Peter Boncz
Keynote IDEAS2013 - Peter Boncz
 
MongoDB Schema Design: Practical Applications and Implications
MongoDB Schema Design: Practical Applications and ImplicationsMongoDB Schema Design: Practical Applications and Implications
MongoDB Schema Design: Practical Applications and Implications
 

More from Universitat Politècnica de Catalunya

The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...Universitat Politècnica de Catalunya
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoUniversitat Politècnica de Catalunya
 
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Universitat Politècnica de Catalunya
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosUniversitat Politècnica de Catalunya
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Universitat Politècnica de Catalunya
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Universitat Politècnica de Catalunya
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Universitat Politècnica de Catalunya
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Universitat Politècnica de Catalunya
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Universitat Politècnica de Catalunya
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Universitat Politècnica de Catalunya
 

More from Universitat Politècnica de Catalunya (20)

Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Deep Generative Learning for All
Deep Generative Learning for AllDeep Generative Learning for All
Deep Generative Learning for All
 
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
 
The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021
 
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
 
Open challenges in sign language translation and production
Open challenges in sign language translation and productionOpen challenges in sign language translation and production
Open challenges in sign language translation and production
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
 
Discovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in MinecraftDiscovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in Minecraft
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...
 
Intepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural NetworksIntepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural Networks
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
 
Curriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object SegmentationCurriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object Segmentation
 
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
 

Recently uploaded

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 

Recently uploaded (20)

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 

Low computational cost algorithms for photo clustering and mail signature detection in the cloud

  • 1. Low computational cost algorithms for photo clustering and mail signature detection in the cloud! Daniel Manchón Co-directors: Xavi Giró (UPC) Omar Pera (Pixable) 1
  • 2. Outline • Motivation! • Tasks summary • Pixable internship • GPI research assistant • Photo clustering • Mail signature detection • Conclusions • Introduction • Requirements • Design • Results 2
  • 3. Motivation: Photo clustering 3 Low computational cost algorithms for photo clustering and mail signature detection in the cloud
  • 4. Motivation: Mail signature detection 4 Low computational cost algorithms for photo clustering and mail signature detection in the cloud
  • 5. Motivation: Cloud computing 5 Low computational cost algorithms for photo clustering and mail signature detection in the cloud
  • 6. Outline • Motivation • Tasks summary • Pixable internship! • GPI research assistant • Photo clustering • Mail signature detection • Conclusions • Introduction • Requirements • Design • Results 6
  • 7. Pixable internship - Social photos aggregation! - Photo ranking! - Editorial content! - Contacts feeds! - Owned by Singtel - Photo storage! - Synchronization across multiple devices! - Support for RAW - CallerID application! - Multiple contact source support! - Contact backup and synchronization! - SPAM detection 7
  • 8. Photofeed tasks • Instagram source (in-production) • Referrals and invitations method • "New relic" integration • Photo clustering and summarization • Photo download service 
 (in-production) 8
  • 9. • Mail scrapping monitorization • Signature detection! • Identity analysis improvement • Tooling (in-production) Contactive tasks 9
  • 10. Outline • Motivation • Tasks summary • Pixable internship • GPI research assistant! • Photo clustering • Mail signature detection • Conclusions • Introduction • Requirements • Design • Results 10
  • 11. GPI research assistant • Mediaeval 2013 (published paper) • ICMR SEWM (published paper) • Pyxel software framework • Mediaeval 2014 11 Multimedia retrieval conference GPI: Image and Video Processing Group
  • 12. Outline • Motivation • Tasks summary • Pixable internship • GPI research assistant • Photo clustering! • Mail signature detection • Conclusions • Introduction! • Requirements • Design • Results 12
  • 13. Photo Clustering: Intro PhotoTOC [Platt et al, PACRIM 2003] State of the artEvent detection 13
  • 14. Outline • Motivation • Tasks summary • Pixable internship • GPI research assistant • Photo clustering! • Mail signature detection • Conclusions • Introduction • Requirements! • Design • Results 14
  • 15. Photo Clustering: Requirements • User data stored in Amazon cloud and MongoDB. • Low computing • Easily configurable using REST API • Event generation • Visual and metadata information available • F1 and NMI as evaluation metrics • 400k annotated photo dataset Mediaeval requirements Photofeed constrains 15
  • 16. Outline • Motivation • Tasks summary • Pixable internship • GPI research assistant • Photo clustering! • Mail signature detection • Conclusions • Introduction • Requirements • Design! • Results 16
  • 17. Design Hi, I’m John. Hi, I’m Emily. (a) Temporal sorting by each user independently 17
  • 18. Design (b) Temporal-based oversegmentation in mini-clusters PhotoTOC [Platt et al, PacRim 2003] 18
  • 19. Design (b) Temporal-based oversegmentation in mini-clusters, mean values modelization 19 Username= John T.taken= 2010-09-10 02:10:12 GPS= (42.1,-10) tags= live,stage,deerhunter Username= emily T.taken= 2010-12-13 02:11:10 GPS= (43,-8.40) tags= live,deerhunter Username= emily T.taken= 2010-12-13 03:11:10 GPS= (no data) tags= live,stones Username= emily T.taken= 2010-12-14 23:11:10 GPS= (43.2,-8.2) tags= sound, test
  • 20. Design (c) Sequential merging of mini-clusters ? t avg(·) avg(·) avg(·)avg(·) 20
  • 21. Design (c) Sequential merging of mini-clusters 21
  • 22. Outline • Motivation • Tasks summary • Pixable internship • GPI research assistant • Photo clustering! • Mail signature detection • Conclusions • Introduction • Requirements • Design • Results 22
  • 23. Results F1 = 2 PR P + R UPC 3rd place of 12 teams!!! 23
  • 24. Outline • Motivation • Tasks summary • Pixable internship • GPI research assistant • Photo clustering • Mail signature detection! • Conclusions • Introduction! • Requirements • Design • Results 24
  • 25. Mail signature detection: Intro • Email information extraction • SPAM detection • Low computation State of the artKEY TOPICS Learning to extract signature and reply lines from email [Vitor R. Carvalho and William W. Cohen, 2004 ] 25
  • 26. Outline • Motivation • Tasks summary • Pixable internship • GPI research assistant • Photo clustering • Mail signature detection! • Conclusions • Introduction • Requirements! • Design • Results 26
  • 27. Mail signature detection: Requirements • Mail scrapping service improvement • Pre-process the input to reduce the execution time • Adapt the mail scrapping service to Contactive product ? fewer information filter only signatures MongoDB entries User mailbox id 89012 name John Doe email j.doe@gmail.com linkedin Id 7788455367_e phone 789675463 27 Mail scrapping service
  • 28. Outline • Motivation • Tasks summary • Pixable internship • GPI research assistant • Photo clustering • Mail signature detection! • Conclusions • Introduction • Requirements • Design! • Results 28
  • 29. Design 2. Problem Definition and Corpus A signature block is the set of lines, usually in the end of a message, that contain information about the sender, such as personal name, affiliation, postal address, web address, email address, telephone number, etc. Quotes from famous persons and creative ASCII drawings are often present in this block also. An example of a signature block can be seen in last six lines of the email message pictured in Figure 1 (marked with the line label <sig>). Figure 1 also contains six lines of text that were quoted from a preceding message (marked with the line label <reply>). In this paper we will call such lines reply lines. <other> From: wcohen@cs.cmu.edu <other> To: Vitor Carvalho <vitor@cs.cmu.edu> <other> Subject: Re: Did you try to compile javadoc recently? <other> Date: 25 Mar 2004 12:05:51 -0500 <other> <other> Try cvs update –dP, this removes files & directories that have been <other> deleted from cvs. <other> - W <other> <reply> On Wed, 2004-03-24 at 19:58, Vitor Carvalho wrote: <reply> > I’ve just checked-out the baseline m3 code and <reply> > "Ant dist" is working fine, but "ant javadoc" is not. <reply> > Thanks <reply> > Vitor <other> <sig> ------------------------------------------------------------------ <sig> William W. Cohen “Would you drive a mime <sig> wcohen@cs.cmu.edu nuts if you played a <sig> http://www.wcohen.com blank audio tape <sig> Associate Research Professor full blast?” <sig> CALD, Carnegie-Mellon University - S. Wright Figure 1 - Excerpt from a labeled email message (a) Split the K last mail lines and retrieve the annotations Last K lines Ground truth annotations 29
  • 30. 2. Problem Definition and Corpus A signature block is the set of lines, usually in the end of a message, that contain information about the sender, such as personal name, affiliation, postal address, web address, email address, telephone number, etc. Quotes from famous persons and creative ASCII drawings are often present in this block also. An example of a signature block can be seen in last six lines of the email message pictured in Figure 1 (marked with the line label <sig>). Figure 1 also contains six lines of text that were quoted from a preceding message (marked with the line label <reply>). In Lines N Feature Patterns (b) feature extraction 30 Design
  • 31. Design (c) SVM training and model generation nition and Corpus is the set of lines, usually in the end of a message, that contain information about the sender, e, affiliation, postal address, web address, email address, telephone number, etc. Quotes from reative ASCII drawings are often present in this block also. An example of a signature block lines of the email message pictured in Figure 1 (marked with the line label <sig>). Figure 1 of text that were quoted from a preceding message (marked with the line label <reply>). In such lines reply lines. om: wcohen@cs.cmu.edu : Vitor Carvalho <vitor@cs.cmu.edu> 31 Feature matrix [KxN] Vector ground truth [K] + SVM training Model=
  • 32. Design (c) SVM training and model generation Model ● Other ● Reply ● Signature Lines Classes pre-process Features 32
  • 33. Outline • Motivation • Tasks summary • Pixable internship • GPI research assistant • Photo clustering • Mail signature detection! • Conclusions • Introduction • Requirements • Design • Results 33
  • 34. Results F1 = 2 Precision · Recall Precision + Recall 34 With annotated dataset Without annotated dataset Manual evaluation Contactive user base mailboxes
  • 35. Outline • Motivation • Tasks summary • Pixable internship • GPI research assistant • Photo clustering • Mail signature detection • Conclusions • Introduction • Requirements • Design • Results 35
  • 36. Conclusions • Academic • Papers: Mediaeval 2013 and ICMR SEWM, and Mediaeval 2014 on preparation. • UPC Pyxel framework foundations • Industrial • Contributions to Pixable in production servers: • Instagram integration • Photofeed Downloader • Mail signature detection: Proof of concept successful. • Work in the USA! 36
  • 37. Thank you very much!! Q&A 37
  • 39. Design 39 (c) Sequential merging of mini-clusters Weighted modalities ● creation (or upload) time ● geolocation ● textual labels ● same user
  • 40. Design 40 (c) Sequential merging of mini-clusters Geolocation (d=haversine)Time stamp (d=L1) Text labels (d=Jaccard) Same user (d=boolean)
  • 42. Design 42 (c) Sequential merging of mini-clusters 42 Mean and std. deviation learned on pairs of photos within the same training event.
  • 43. Design 43 (c) Sequential merging of mini-clusters 43 phi function
  • 44. Design 44 (c) Sequential merging of mini-clusters decision threhold