Semantic Search in E-Discovery
Research on the application of text mining and information retrieval
for fact finding in regulatory investigations

                                                                       David Graus
Who’s Involved?

     Prof. dr. Maarten de Rijke
     Director Intelligent Systems Lab, UvA

     Dr. Hans Henseler
     Lector E-Discovery, CREATE-IT applied research

     David Graus, MSc.
     PhD Candidate, Semantic Search in E-Discovery, UvA

     David van Dijk, MSc.
     Researcher E-Discovery, CREATE-IT applied research

     Zhaochun Ren, MSc.
     PhD Candidate, Semantic Search in E-Discovery, UvA

     Menno Israël, MSc.
     Teamleader Knowledge and Expertise Centre for Intelligent Data Analysis (Kecida), NFI




                                             Semantic search in e-discovery              2
Introduction

 ◦ Semantic Search in E-Discovery




What is

 ◦ Semantic Search in E-Discovery
      - retrieving and securing digital forensic evidence
      - from emails, forums, etc...




Challenge

• Finding out who knew what, from whom, and when
• Generic search is not the answer




Finding evidence for E-Discovery

• We don’t know what we’re looking for
• What we’re looking for might be deliberately hidden
• Communication might be very domain-specific, contextualized or incomplete




Task

• Retrieve all relevant traces
• Highly iterative search process
• Support (re)formulating questions and hypotheses




How do we approach this?

• Two subprojects:
     ◦ Information Retrieval
          - Finding material of unstructured nature from large collections
     ◦ Information Extraction/Text Mining
          - Discovering patterns in data




How do we approach this?

• Information Retrieval
     ◦ Integrating structure/context of data in retrieval models
          - Capturing forum and email context
          - Conversational search
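
One way to picture "integrating context in retrieval models" is to score a message not only on its own text but also on the thread it belongs to. The sketch below is an illustration under assumptions, not the project's retrieval model: the function names, the log-scaled term-frequency score, and the interpolation `weight` are all hypothetical choices made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def tf_score(query, text):
    """Sum of log-scaled term frequencies of the query terms in the text."""
    counts = Counter(tokenize(text))
    return sum(math.log1p(counts[term]) for term in tokenize(query))


def contextual_score(query, message, thread, weight=0.3):
    """Interpolate a message's own score with the score of its whole thread.

    `weight` (an assumed parameter) controls how much the thread contributes.
    """
    own = tf_score(query, message)
    context = tf_score(query, " ".join(thread))
    return (1 - weight) * own + weight * context


# A reply that never mentions the query terms still gets credit from its thread.
thread = [
    "Can we move the shipment before the audit?",
    "Yes, do it tonight.",
]
plain = tf_score("shipment audit", thread[1])
with_context = contextual_score("shipment audit", thread[1], thread)
```

Here the reply "Yes, do it tonight." scores zero on its own, yet receives a positive score once the thread is taken into account, which is the kind of contextualized, incomplete communication the earlier slides describe.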




How do we approach this?

• Information Extraction/Text Mining
     ◦ Extracting structured knowledge from user generated content
          - Semantic pre-processing
          - Social network inference
          - Information maps
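
As a rough illustration of social network inference, the simplest starting point is to count who writes to whom in a set of emails. The sketch below assumes raw RFC 822 style messages and uses only the Python standard library; the function name `communication_edges` and the sample messages are made up for this example and are not part of xTAS or any project code.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses


def communication_edges(raw_messages):
    """Count directed (sender, recipient) pairs across raw email messages."""
    edges = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        senders = [addr for _, addr in getaddresses(msg.get_all("From", []))]
        recipients = [
            addr
            for _, addr in getaddresses(
                msg.get_all("To", []) + msg.get_all("Cc", [])
            )
        ]
        for sender in senders:
            for recipient in recipients:
                edges[(sender.lower(), recipient.lower())] += 1
    return edges


# Hypothetical sample messages, for illustration only.
SAMPLE = [
    "From: alice@example.com\nTo: bob@example.com\nSubject: Q3\n\nSee attached.",
    "From: alice@example.com\nTo: bob@example.com\nCc: carol@example.com\n"
    "Subject: Re: Q3\n\nUpdated.",
]
edges = communication_edges(SAMPLE)
```

The resulting edge counts form a weighted directed graph. Adding message dates to each edge would be a natural next step toward the "who knew what, from whom, and when" question.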




How do we approach this?

• Information Retrieval <-> Information Extraction




Current work (first steps)

• Information Retrieval
     ◦ Twitter Mining (as a form of conversational search)

• Information Extraction/Text Mining
     ◦ Entity linking (for semantic document enrichment)

• TREC/TAC benchmarking events
     ◦ TREC Legal Track 2011 (2013?)
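
To make the entity linking item above concrete: in its most basic form, entity linking maps mentions in text to entries in a knowledge base. The sketch below is a toy dictionary lookup with greedy longest match, assuming a tiny hand-made knowledge base; the entity identifiers and the `link_entities` function are hypothetical, and the approach actually used in this research may well differ (real systems also have to disambiguate between candidate entities).

```python
import re

# Hypothetical toy knowledge base: lowercase surface form -> entity identifier.
KNOWLEDGE_BASE = {
    "uva": "University_of_Amsterdam",
    "university of amsterdam": "University_of_Amsterdam",
    "nfi": "Netherlands_Forensic_Institute",
}


def link_entities(text, kb=KNOWLEDGE_BASE, max_len=3):
    """Greedy longest-match lookup of word n-grams against the knowledge base."""
    tokens = re.findall(r"\w+", text.lower())
    links = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate mention first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            mention = " ".join(tokens[i:i + n])
            if mention in kb:
                links.append((mention, kb[mention]))
                i += n
                break
        else:
            # No mention starts at this token; move on.
            i += 1
    return links


links = link_entities("The NFI works with the University of Amsterdam.")
```

Attaching the linked identifiers to a document is one simple form of semantic document enrichment: a search for one surface form ("UvA") can then also find documents that use another ("University of Amsterdam").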



Contributions

• xTAS: Open source text analysis toolkit
• iColumbo: Internet monitoring framework
• Used by:
     ◦ Internet Recherche Netwerk
     ◦ Koninklijke Bibliotheek
     ◦ Beeld en Geluid
     ◦ ... You?




Semantic search in E-discovery

• David Graus
• d.p.graus@uva.nl




