Understanding email traffic 
David Graus, University of Amsterdam 
d.p.graus@uva.nl 
@dvdgrs
Dec. 12, 2014 - Frontiers of Forensic Science
Some background… 
• PhD candidate at ILPS 
• Information Extraction & Retrieval 
• Project in NWO’s Forensic Science program 
• Semantic Search in E-Discovery
Information Retrieval? 
Ò Finding material of unstructured nature from 
large collections
Dec. 12, 2014 - Frontiers of Forensic Science 6 
Information Extraction? 
Ò Text mining 
Ò Discovering patterns in text data
Semantic Search in E-Discovery? 
Semantic Search?
E-Discovery? 
• Retrieving and securing digital forensic evidence
Semantic Search in E-Discovery 
• Supporting search for digital forensic evidence 
• from emails, hard drives, mobile phones, etc.
• not the open web
• (Google won’t help us here)
Search in E-Discovery 
• Finding out who knew what, from whom, and when
• We don’t know what we’re looking for
• What we’re looking for might be deliberately hidden
• Communication might be very domain-specific, contextualized or incomplete
Approach 
• Generic search is not the answer
• Google: high precision search
• E-Discovery: high recall & exploratory search
Tasks 
• Support iterative search
• Support (re)formulating questions and hypotheses
• Retrieve all relevant traces
Recipient recommendation 
Ò Given a sender, an email, all possible 
recipients (in an enterprise); 
Ò Predict which recipient(s) are most likely to 
receive the email 
Dec. 12, 2014 - Frontiers of Forensic Science 17
Dec. 12, 2014 - Frontiers of Forensic Science 18 
Why? 
Ò Understanding communication in/structure of an 
enterprise 
Ò Finding “unexpected” communication 
Ò Applications in: 
Ò enterprise search 
Ò expert finding 
Ò community detection 
Ò spam classification 
Ò anomaly detection
Dec. 12, 2014 - Frontiers of Forensic Science 19 
How? 
Ò Gmail 
Ò Who do you frequently “co-address” 
Ò egonetwork 
Ò Related work 
Ò Social Network Analysis (SNA) 
Ò Email content 
Ò Us 
Ò SNA + email content
Part 1: Social Network Analysis? 
d.p.graus@uva.nl z.ren@uva.nl 
derijke@uva.nl 
image by Calvinius - Creative Commons Attribution-Share Alike 3.0
SNA for predicting recipients? 
1. Importance of a node in the network
   • Prior probability
   • More important people are more likely to be recipients of an(y) email
2. Connection strength between two nodes
   • Conditional probability
   • Given the sender, nodes that are strongly associated with the sender are more likely to be recipients
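The two SNA signals above can be illustrated with simple counts. This is a minimal sketch, not the talk's implementation: the toy email log, the names, and the choice to estimate both probabilities from raw email counts are assumptions made for illustration.

```python
from collections import Counter

# Toy email log: (sender, [recipients]). All names are made up.
emails = [
    ("alice", ["bob"]),
    ("alice", ["bob", "carol"]),
    ("bob", ["carol"]),
    ("carol", ["bob"]),
]

received = Counter()  # number of emails received per node
sent_to = Counter()   # number of emails sent from a sender to a recipient
for sender, recipients in emails:
    for r in recipients:
        received[r] += 1
        sent_to[(sender, r)] += 1

total_received = sum(received.values())

def prior(r):
    """P(R): share of all received emails that went to r (node importance)."""
    return received[r] / total_received

def sender_given_recipient(s, r):
    """P(S|R): share of r's incoming emails that came from s (connection strength)."""
    return sent_to[(s, r)] / received[r] if received[r] else 0.0
```

Here bob receives three of the five delivered emails, so his prior is higher than carol's, and two of those three come from alice.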
Part 2: Email content 
Ò Statistical Language Models (LMs) 
Ò Assign a probability to [a sequence of] words; 
Ò By counting words 
Ò Used in lots of places; 
Ò Web Search 
Ò Machine Translation 
Ò Speech Recognition
Language Models 
Ò Language models as communication “profiles” 
1. Incoming LM (how people talk to user) 
2. Outgoing LM (how user talks to people) 
3. Interpersonal LM (how node1 
talks with node2) 
4. Corpus LM (how everyone 
talks)
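The four profiles can be collected in a single pass over the emails. A rough sketch under stated assumptions: the toy emails are invented, each email has one recipient, and the interpersonal profile is keyed on an unordered pair, which is one possible reading of "how node1 talks with node2".

```python
from collections import Counter, defaultdict

# Toy emails: (sender, recipient, text). Names and texts are invented.
emails = [
    ("alice", "bob", "budget report attached"),
    ("bob", "alice", "thanks for the report"),
    ("alice", "carol", "lunch today"),
]

incoming = defaultdict(Counter)       # words received by a user
outgoing = defaultdict(Counter)       # words sent by a user
interpersonal = defaultdict(Counter)  # words exchanged between a pair of users
corpus = Counter()                    # words used by everyone

for sender, recipient, text in emails:
    words = text.lower().split()
    outgoing[sender].update(words)
    incoming[recipient].update(words)
    interpersonal[frozenset((sender, recipient))].update(words)
    corpus.update(words)

def normalize(counts):
    """Turn raw word counts into a probability distribution."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}
```

Calling `normalize(interpersonal[frozenset(("alice", "bob"))])` gives the word distribution for that pair; the other profiles are normalized the same way.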
Why language models? 
Ò Comparisons between communication profiles: 
Ò Find nodes with most similar communication
Model 
Ò Given sender and email, predict recipients 
Ò Ranking function:
Email likelihood 
Estimate using language modeling 
Sender likelihood 
using SNA to estimate closeness of R and S 
Recipient likelihood 
using SNA to estimate importance of R 
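The ranking formula itself did not survive in the slide text. One reading consistent with the three named terms (email likelihood, sender likelihood, recipient likelihood) is Bayes' rule applied to the probability of a recipient R given sender S and email E. The derivation below is a reconstruction under that assumption, not a quote from the slides.

```latex
\begin{align}
P(R \mid S, E) &= \frac{P(E \mid R, S)\, P(S \mid R)\, P(R)}{P(S, E)} \\
               &\propto P(E \mid R, S)\, P(S \mid R)\, P(R)
\end{align}
```

The denominator P(S, E) is the same for every candidate recipient, so it can be dropped when only the ranking matters.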
Email likelihood 
• P(word|R,S): interpersonal LM
• P(word|R): recipient LM
• P(word): corpus LM
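The three word probabilities can be mixed into a single email likelihood. A hedged sketch: linear interpolation, word independence, and the mapping of the 60%-20%-20% weights (reported later in the findings) onto the interpersonal, recipient, and corpus models in that order are all assumptions.

```python
import math

# Assumed mixing weights: interpersonal, recipient, corpus.
WEIGHTS = (0.6, 0.2, 0.2)

def email_log_likelihood(words, interpersonal_lm, recipient_lm, corpus_lm,
                         weights=WEIGHTS):
    """log P(E|R,S): sum over words of the log of the interpolated word probability."""
    w_pair, w_recipient, w_corpus = weights
    total = 0.0
    for w in words:
        p = (w_pair * interpersonal_lm.get(w, 0.0)
             + w_recipient * recipient_lm.get(w, 0.0)
             + w_corpus * corpus_lm.get(w, 0.0))
        if p == 0.0:
            return float("-inf")  # word unseen even in the corpus model
        total += math.log(p)
    return total

# Hypothetical word distributions for one sender-candidate pair.
pair_lm = {"budget": 0.5, "report": 0.5}
cand_lm = {"budget": 0.25, "report": 0.25, "lunch": 0.5}
all_lm = {"budget": 0.2, "report": 0.2, "lunch": 0.2, "today": 0.4}

score = email_log_likelihood(["budget", "lunch"], pair_lm, cand_lm, all_lm)
```

Working in log space avoids numerical underflow on long emails, and the corpus model acts as a fallback for words the pair has never exchanged.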
Recipient likelihood P(R): importance of node
1. Number of emails received
2. PageRank score
Sender likelihood P(S|R): strength of connection between two nodes
1. Number of emails sent between nodes
2. Number of times two nodes are addressed together
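For the PageRank importance signal, a small power-iteration version is enough to show the idea. The damping factor of 0.85, the iteration count, and the toy graph are conventional or invented values, not taken from the talk.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank on a directed graph of (sender, recipient) pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for s, r in edges:
        out_links[s].append(r)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            targets = out_links[n] or nodes  # dangling node: spread rank evenly
            share = damping * rank[n] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank

# Toy communication graph: an edge means "sender emailed recipient".
edges = [("alice", "bob"), ("carol", "bob"), ("bob", "alice"), ("dave", "bob")]
scores = pagerank(edges)
```

In this graph bob is emailed by everyone else and ends up with the highest score; the scores sum to 1, so they can be read as a prior over recipients.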
SNA
1. Importance of a node in the network
2. Strength of connection between nodes
Email content
1. Interpersonal LM
2. Recipient LM
3. Corpus LM
Approach: time-based 
• Training period: build models (SNA + LM)
• Testing period: predict recipients
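A time-based split can be sketched as follows. The email records, field names, and cutoff date are hypothetical; the only idea taken from the slide is that models are built on an earlier period and evaluated on a later one.

```python
from datetime import datetime

# Toy emails with timestamps; all values are invented for illustration.
emails = [
    {"sent": datetime(2001, 1, 5), "sender": "alice", "recipients": ["bob"]},
    {"sent": datetime(2001, 2, 9), "sender": "bob", "recipients": ["carol"]},
    {"sent": datetime(2001, 3, 2), "sender": "carol", "recipients": ["alice"]},
]

def split_by_time(emails, cutoff):
    """Emails before the cutoff train the models; later ones are held out for testing."""
    train = [e for e in emails if e["sent"] < cutoff]
    test = [e for e in emails if e["sent"] >= cutoff]
    return train, test

train, test = split_by_time(emails, cutoff=datetime(2001, 3, 1))
```

Splitting on time rather than at random keeps future emails out of the training data, which matches how a deployed recommender would see the mailbox.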
Testing 
Ò Remove recipients from email 
Ò Rank all nodes in the network, by computing: 
1. P(E|R,S): Similarity between sender and 
candidate LMs 
2. P(S|R): Strength of connection between 
sender and candidate 
3. P(R): Importance of candidate
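The ranking step can be sketched by combining the three scores per candidate. Multiplying the terms (adding them in log space) is an assumption based on reading the model as Bayes' rule, and the scores below are hypothetical placeholders rather than results.

```python
import math

def rank_candidates(candidates, email_loglik, sender_lik, recipient_prior):
    """Sort candidates by log P(E|R,S) + log P(S|R) + log P(R), best first."""
    def score(r):
        p_s, p_r = sender_lik[r], recipient_prior[r]
        if p_s <= 0 or p_r <= 0:
            return float("-inf")  # no connection or no prior mass
        return email_loglik[r] + math.log(p_s) + math.log(p_r)
    return sorted(candidates, key=score, reverse=True)

# Hypothetical precomputed scores for one held-out email from a fixed sender.
email_loglik = {"bob": -4.0, "carol": -6.5, "dave": -5.0}
sender_lik = {"bob": 0.5, "carol": 0.3, "dave": 0.0}
recipient_prior = {"bob": 0.4, "carol": 0.5, "dave": 0.1}

ranking = rank_candidates(["bob", "carol", "dave"],
                          email_loglik, sender_lik, recipient_prior)
```

The removed recipients of the test email are then compared against the top of this ranking. Note that a zero sender likelihood pushes dave to the bottom despite a reasonable content score, so in practice the SNA terms would likely need smoothing.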
Findings: What works? 
Ò Importance of node: 
Number of received emails of node 
Pagerank 
Ò Strength of connection: 
Number of emails between nodes 
Number of times co-addressed 
Ò LM Similarity: 
Interpersonal LM is most important (60%-20%-20%)
Analysis: SNA vs email content 
• SNA:
  • SNA signals deteriorate over time
  • SNA signals are most informative on highly active users
• Email content:
  • LM signal improves over time
  • LM signal does worse with highly active users
Finally 
Ò Combining Social Network Analysis with 
Language Modeling is better than doing either.
Dec. 12, 2014 - Frontiers of Forensic Science 43 
Future work 
Ò Consider structure of network in more detail 
Ò Departments? 
Ò Friends/family? 
Ò Include ‘time decay’ 
Ò Dynamically weight LM/SNA?
Applications in E-Discovery/Digital Forensics 
• Anomaly detection
  • Given a working prediction model, identify “unexpected” communication
• Language models for communication
  • For a node, find the most different interpersonal communication
    • Friends/family vs colleagues?
  • Find communication that differs from the corpus-based communication
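The anomaly-detection idea can be sketched by checking actual recipients against the model's ranking. The top-k cutoff and the example ranking are hypothetical; the slides only state the goal of identifying “unexpected” communication with a working prediction model.

```python
def unexpected_recipients(ranking, actual_recipients, top_k=2):
    """Return actual recipients that the model did not place in its top-k predictions."""
    expected = set(ranking[:top_k])
    return [r for r in actual_recipients if r not in expected]

# Hypothetical model output for one email, best candidate first.
ranking = ["bob", "carol", "dave", "erin"]
flagged = unexpected_recipients(ranking, actual_recipients=["bob", "erin"])
```

Here erin received the email although the model ranked her last, so that email would be surfaced for an investigator to review.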
Fin 
Ò Questions?

Understanding Email Traffic

  • 1. Understanding email traffic David Graus, University of Amsterdam d.p.graus@uva.nl @dvdgrs
  • 2. Dec. 12, 2014 - Frontiers of Forensic Science 2 Some background… • PhD candidate at ILPS • Information Extraction & Retrieval • Project in NWO’s Forensic Science program • Semantic Search in E-Discovery
  • 3. Dec. 12, 2014 - Frontiers of Forensic Science 3 Some background… • PhD candidate at ILPS • Information Extraction & Retrieval • Project in NWO’s Forensic Science program • Semantic Search in E-Discovery
  • 4. Dec. 12, 2014 - Frontiers of Forensic Science 4 Information Retrieval?
  • 5. Dec. 12, 2014 - Frontiers of Forensic Science 5 Information Retrieval? Ò Finding material of unstructured nature from large collections
  • 6. Dec. 12, 2014 - Frontiers of Forensic Science 6 Information Extraction? Ò Text mining Ò Discovering patterns in text data
  • 7. Semantic Search in E-Discovery? Dec. 12, 2014 - Frontiers of Forensic Science 7
  • 8. Dec. 12, 2014 - Frontiers of Forensic Science 8 Semantic Search?
  • 9. Dec. 12, 2014 - Frontiers of Forensic Science 9 E-Discovery? • Retrieving and securing digital forensic evidence
  • 10. Dec. 12, 2014 - Frontiers of Forensic Science 10 E-Discovery ⬜ Semantic Search in E-Discovery
  • 11. Semantic Search in E-Discovery • Supporting search for digital forensic evidence • from emails, hard drives, mobile phones, etc… • not the open web Dec. 12, 2014 - Frontiers of Forensic Science 11 • (Google won’t help us here)
  • 12. Dec. 12, 2014 - Frontiers of Forensic Science 12 Search in E-Discovery ¢ Finding out who knew what, from whom, and when ¢We don’t know what we’re looking for ¢ What we’re looking for might be deliberately hidden ¢ Communication might be very domain-specific, contextualized or incomplete
  • 13. Dec. 12, 2014 - Frontiers of Forensic Science 13 Approach ¢ Generic search is not the answer ¢ Google: high precision search ¢ E-Discovery: high recall & exploratory search
  • 14. Dec. 12, 2014 - Frontiers of Forensic Science 14 Tasks ¢ Support iterative search ¢ Support (re)formulating questions and hypotheses ¢ Retrieve all relevant traces
  • 15. Dec. 12, 2014 - Frontiers of Forensic Science 15
  • 16. Dec. 12, 2014 - Frontiers of Forensic Science 16
  • 17. Recipient recommendation Ò Given a sender, an email, all possible recipients (in an enterprise); Ò Predict which recipient(s) are most likely to receive the email Dec. 12, 2014 - Frontiers of Forensic Science 17
  • 18. Dec. 12, 2014 - Frontiers of Forensic Science 18 Why? Ò Understanding communication in/structure of an enterprise Ò Finding “unexpected” communication Ò Applications in: Ò enterprise search Ò expert finding Ò community detection Ò spam classification Ò anomaly detection
  • 19. Dec. 12, 2014 - Frontiers of Forensic Science 19 How? Ò Gmail Ò Who do you frequently “co-address” Ò egonetwork Ò Related work Ò Social Network Analysis (SNA) Ò Email content Ò Us Ò SNA + email content
  • 20. Part 1: Social Network Analysis? d.p.graus@uva.nl z.ren@uva.nl derijke@uva.nl Dec. 12, 2014 - Frontiers of Forensic Science 20
  • 21. Dec. 12, 2014 - Frontiers of Forensic Science 21 image by Calvinius - Creative Commons Attribution-Share Alike 3.0
  • 22. SNA for predicting recipients? 1. Importance of a node in the network Prior probability More important people are more likely to be recipients of an(y) email 2. Connection strength between two nodes Conditional probability Given the sender, the recipients who are strongly associated are more likely to be the recipient Dec. 12, 2014 - Frontiers of Forensic Science 22
  • 23. Dec. 12, 2014 - Frontiers of Forensic Science 23 Part 2: Email content Ò Statistical Language Models (LMs) Ò Assign a probability to [a sequence of] words; Ò By counting words Ò Used in lots of places; Ò Web Search Ò Machine Translation Ò Speech Recognition
  • 29. Language Models
      • Language models as communication “profiles”
          1. Incoming LM (how people talk to user)
          2. Outgoing LM (how user talks to people)
          3. Interpersonal LM (how node1 talks with node2)
          4. Corpus LM (how everyone talks)
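The four profiles above can be sketched as plain word counts. This is a minimal illustration, not the speaker's implementation: the toy emails, the whitespace tokenizer, and the names `build_profiles` and `probability` are all assumptions, and the interpersonal profile is treated as directed (sender to recipient) for simplicity.

```python
from collections import Counter, defaultdict

# Hypothetical toy emails: (sender, recipients, text). Not data from the talk.
EMAILS = [
    ("ann", ["bob"], "budget meeting tomorrow"),
    ("ann", ["bob", "cat"], "budget report attached"),
    ("bob", ["ann"], "lunch tomorrow"),
]


def tokenize(text):
    """Naive lowercase whitespace tokenizer (an assumption for this sketch)."""
    return text.lower().split()


def build_profiles(emails):
    """Count words into the four profile types named on the slide."""
    incoming = defaultdict(Counter)       # words received by a user
    outgoing = defaultdict(Counter)       # words sent by a user
    interpersonal = defaultdict(Counter)  # words sent from one user to another
    corpus = Counter()                    # words across all emails
    for sender, recipients, text in emails:
        words = tokenize(text)
        corpus.update(words)
        outgoing[sender].update(words)
        for recipient in recipients:
            incoming[recipient].update(words)
            interpersonal[(sender, recipient)].update(words)
    return incoming, outgoing, interpersonal, corpus


def probability(counts, word):
    """Maximum-likelihood estimate: count of the word over the total count."""
    total = sum(counts.values())
    return counts[word] / total if total else 0.0


incoming, outgoing, interpersonal, corpus = build_profiles(EMAILS)
print(probability(outgoing["ann"], "budget"))
print(probability(corpus, "tomorrow"))
```

Each profile is a unigram model estimated "by counting words"; smoothing is left out here to keep the counting step visible.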
  • 30. Why language models?
      • Comparisons between communication profiles:
          • Find nodes with most similar communication
  • 31. Model
      • Given sender and email, predict recipients
      • Ranking function: P(R | S, E) ∝ P(E | R, S) · P(S | R) · P(R)
  • 32. Components of the ranking function
      • Email likelihood: estimated using language modeling
      • Sender likelihood: using SNA to estimate closeness of R and S
      • Recipient likelihood: using SNA to estimate importance of R
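The three components are what Bayes' rule yields for the probability of a candidate recipient given the sender and the email. The derivation below is a sketch, assuming E denotes the email, S the sender, and R a candidate recipient, as in the later slides:

```latex
P(R \mid S, E)
  = \frac{P(E \mid R, S)\, P(S \mid R)\, P(R)}{P(S, E)}
  \propto P(E \mid R, S)\, P(S \mid R)\, P(R)
```

The denominator P(S, E) is the same for every candidate R, so it can be dropped when ranking.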
  • 33. Email likelihood
  • 34. Email likelihood
      • P(word | R, S): interpersonal LM
      • P(word | R): recipient LM
      • P(word): corpus LM
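One common way to combine three word distributions like these is linear interpolation. The sketch below assumes that form; the slide itself does not show the formula. The default weights (0.6, 0.2, 0.2) are borrowed from the "60%-20%-20%" reported in the findings and are read here as interpolation weights, which is also an assumption. All function and variable names are hypothetical.

```python
import math
from collections import Counter


def mle(counts, word):
    """Maximum-likelihood word probability from raw counts."""
    total = sum(counts.values())
    return counts[word] / total if total else 0.0


def email_log_likelihood(words, interpersonal, recipient, corpus,
                         weights=(0.6, 0.2, 0.2)):
    """Log-likelihood of an email under an assumed linear mixture of
    P(word | R, S), P(word | R), and P(word)."""
    w_rs, w_r, w_c = weights
    score = 0.0
    for word in words:
        p = (w_rs * mle(interpersonal, word)
             + w_r * mle(recipient, word)
             + w_c * mle(corpus, word))
        if p == 0.0:
            return float("-inf")  # word unseen in all three models
        score += math.log(p)
    return score


# Hypothetical toy counts, not data from the talk.
corpus = Counter("budget meeting tomorrow budget report lunch tomorrow".split())
ann_to_bob = Counter("budget meeting budget report".split())
to_bob = Counter("budget meeting budget report lunch".split())
ann_to_cat = Counter()
to_cat = Counter("lunch tomorrow".split())

email = ["budget", "report"]
print(email_log_likelihood(email, ann_to_bob, to_bob, corpus))
print(email_log_likelihood(email, ann_to_cat, to_cat, corpus))
```

The corpus model acts as a fallback: a candidate with no interpersonal history still gets a non-zero score for words seen anywhere in the collection.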
  • 35. Recipient and sender likelihood
      • Recipient likelihood P(R): importance of node
          1. Number of emails received
          2. PageRank score
      • Sender likelihood P(S|R): strength of connection between two nodes
          1. Number of emails sent between nodes
          2. Number of times two nodes are addressed together
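The count-based signals above can be sketched directly from an email log. This is an illustration under stated assumptions: the toy log and function names are hypothetical, pair counts are treated as undirected, and PageRank is left out. How the raw counts are normalized into P(S|R) is not given on the slide, so the pair counts are returned unnormalized.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy log: (sender, recipients). Not data from the talk.
EMAILS = [
    ("ann", ["bob"]),
    ("ann", ["bob", "cat"]),
    ("bob", ["ann"]),
    ("cat", ["bob"]),
]


def recipient_prior(emails):
    """P(R) estimated as the share of all received emails that went to R."""
    received = Counter(r for _, recipients in emails for r in recipients)
    total = sum(received.values())
    return {node: count / total for node, count in received.items()}


def emails_between(emails):
    """Undirected count of emails exchanged between each pair of nodes."""
    counts = Counter()
    for sender, recipients in emails:
        for recipient in recipients:
            counts[frozenset((sender, recipient))] += 1
    return counts


def co_addressed(emails):
    """How often two nodes appear together as recipients of the same email."""
    counts = Counter()
    for _, recipients in emails:
        for a, b in combinations(sorted(set(recipients)), 2):
            counts[frozenset((a, b))] += 1
    return counts


print(recipient_prior(EMAILS))
print(emails_between(EMAILS)[frozenset(("ann", "bob"))])
print(co_addressed(EMAILS)[frozenset(("bob", "cat"))])
```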
  • 36. Signals
      • SNA
          1. Importance of a node in the network
          2. Strength of connection between nodes
      • Email content
          1. Interpersonal LM
          2. Recipient LM
          3. Corpus LM
  • 37. Approach: time-based
      • Training period: build models (SNA + LM)
      • Testing period: predict recipients
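A time-based split keeps every training email strictly earlier than every test email. A minimal sketch, assuming each email is a dict with a hypothetical `sent` timestamp field and that the cutoff date is chosen by the experimenter:

```python
from datetime import datetime


def split_by_time(emails, cutoff):
    """Emails sent before the cutoff build the models; the rest are for testing."""
    train = [e for e in emails if e["sent"] < cutoff]
    test = [e for e in emails if e["sent"] >= cutoff]
    return train, test


# Hypothetical toy emails, not data from the talk.
EMAILS = [
    {"id": 1, "sent": datetime(2001, 3, 1)},
    {"id": 2, "sent": datetime(2001, 6, 15)},
    {"id": 3, "sent": datetime(2001, 9, 30)},
]

train, test = split_by_time(EMAILS, cutoff=datetime(2001, 7, 1))
print([e["id"] for e in train], [e["id"] for e in test])
```

Splitting on time rather than at random prevents the models from seeing communication that happened after the emails they are asked to predict.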
  • 38. Testing
      • Remove recipients from email
      • Rank all nodes in the network by computing:
          1. P(E|R,S): similarity between sender and candidate LMs
          2. P(S|R): strength of connection between sender and candidate
          3. P(R): importance of candidate
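The ranking step can be sketched as summing the three components in log space and sorting candidates by the result. The component scores below are made-up numbers, and `rank_candidates` is a hypothetical helper, not code from the talk.

```python
import math


def rank_candidates(candidates, email_loglik, sender_likelihood, recipient_prior):
    """Order candidates by log P(E|R,S) + log P(S|R) + log P(R), best first."""
    scored = []
    for r in candidates:
        p_s = sender_likelihood.get(r, 0.0)
        p_r = recipient_prior.get(r, 0.0)
        if p_s <= 0.0 or p_r <= 0.0:
            score = float("-inf")  # a zero component rules the candidate out
        else:
            score = (email_loglik.get(r, float("-inf"))
                     + math.log(p_s) + math.log(p_r))
        scored.append((score, r))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in scored]


# Hypothetical component scores for three candidates.
email_loglik = {"bob": -2.0, "cat": -5.0, "dan": -1.0}   # log P(E|R,S)
sender_likelihood = {"bob": 0.5, "cat": 0.4, "dan": 0.1}  # P(S|R)
recipient_prior = {"bob": 0.6, "cat": 0.2, "dan": 0.2}    # P(R)

ranking = rank_candidates(["bob", "cat", "dan"],
                          email_loglik, sender_likelihood, recipient_prior)
print(ranking)
```

The ranked list can then be compared with the recipients that were removed from the test email.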
  • 40. Findings: What works?
      • Importance of node:
          • Number of emails received by the node
          • PageRank
      • Strength of connection:
          • Number of emails between nodes
          • Number of times co-addressed
      • LM similarity:
          • Interpersonal LM is most important (60%-20%-20%)
  • 41. Analysis: SNA vs. email content
      • SNA:
          • SNA signals deteriorate over time
          • SNA signals are most informative for highly active users
      • Email content:
          • LM signal improves over time
          • LM signal does worse with highly active users
  • 42. Finally
      • Combining Social Network Analysis with Language Modeling works better than either alone.
  • 43. Future work
      • Consider structure of network in more detail
          • Departments?
          • Friends/family?
      • Include “time decay”
      • Dynamically weight LM/SNA?
  • 44. Applications in E-Discovery/Digital Forensics
      • Anomaly detection
          • Given a working prediction model, identify “unexpected” communication
      • Language models for communication
          • For a node, find the most different interpersonal communication
              • Friends/family vs. colleagues?
          • Find communication that differs from the corpus-based communication
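Finding the "most different" interpersonal communication needs some measure of distance between language models. The slide does not name one; the sketch below assumes KL divergence over add-one-smoothed unigram models, with a reference model such as the corpus LM. The toy counts and function names are hypothetical.

```python
import math
from collections import Counter


def smoothed(counts, vocabulary, alpha=1.0):
    """Additive smoothing so every vocabulary word has non-zero probability."""
    total = sum(counts.values()) + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / total for w in vocabulary}


def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def most_different_partner(interpersonal, reference):
    """Return the partner whose interpersonal LM diverges most from the reference."""
    vocabulary = set(reference)
    for counts in interpersonal.values():
        vocabulary |= set(counts)
    ref = smoothed(reference, vocabulary)
    scores = {partner: kl_divergence(smoothed(counts, vocabulary), ref)
              for partner, counts in interpersonal.items()}
    return max(scores, key=scores.get), scores


# Hypothetical toy counts: one node's interpersonal LMs and a corpus LM.
reference = Counter("budget report meeting budget report meeting lunch".split())
interpersonal = {
    "bob": Counter("budget report meeting".split()),
    "eve": Counter("offshore account offshore transfer".split()),
}

partner, scores = most_different_partner(interpersonal, reference)
print(partner, scores)
```

A high divergence only flags communication as unusual relative to the reference model; whether it is relevant remains a judgment for the investigator.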
  • 45. Fin
      • Questions?