Text Mining lecture
Information Retrieval
Prof.dr.ir. Arjen P. de Vries
arjen@acm.org
Nijmegen, October 18th, 2017
A Tutorial on Models of Information Seeking, Searching & Retrieval by @leifos & @guidozuc
Core Research Questions
• How to represent information?
- The information need and search requests
- The objects to be shown in response to an information request
• How to match information representations?
The information objects to be retrieved are not necessarily textual!
Van Rijsbergen, 1979
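For textual objects, one classic answer to both questions is the vector space model: represent the request and each object as a bag of words, then match them by cosine similarity. The following is a minimal sketch built on simplifying assumptions (whitespace tokenization, raw term counts, no term weighting). The example strings are invented for illustration.

```python
import math
from collections import Counter


def represent(text: str) -> Counter:
    """Bag-of-words representation: lower-cased whitespace tokens with raw counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # Counter returns 0 for missing terms
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


query = represent("information retrieval")
docs = {
    "d1": represent("an introduction to information retrieval"),
    "d2": represent("database query processing"),
}
ranking = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranking)  # ['d1', 'd2']
```

Swapping the representation (stems, entities, image features) changes `represent`, and swapping the matching function changes `cosine`. These are exactly the two research questions above.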
Two views on ‘search’
DB
• Business applications
• Deductive reasoning
• Precise and efficient query processing
• Users with technical skills (SQL) and precise information needs
• Selection: books where category=‘CS’
IR
• Digital libraries, patent collections, etc.
• Inductive reasoning
• Best-effort processing
• Untrained users with imprecise information needs
• Ranking: books about CS
Note: the Semantic Web (SemWeb) is more DB than IR!
Symbolic (DB) vs. connectionist (IR)
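The contrast between selection and ranking can be made concrete with a toy catalogue. The book records and the scoring rule below are hypothetical: the scorer simply counts query terms in the title and blurb. In the DB view, a book either satisfies the predicate or it does not. In the IR view, every book receives a graded score and the user sees a best-effort ordering.

```python
from typing import Dict, List

# Hypothetical catalogue, invented for illustration.
BOOKS: List[Dict[str, str]] = [
    {"title": "Compilers", "category": "CS", "blurb": "parsing and code generation"},
    {"title": "Search Engines", "category": "IR", "blurb": "ranking documents about CS topics"},
    {"title": "Bird Songs", "category": "Nature", "blurb": "field recordings"},
]


def select(books: List[Dict[str, str]], category: str) -> List[str]:
    """DB view: exact-match selection on a structured attribute."""
    return [b["title"] for b in books if b["category"] == category]


def rank(books: List[Dict[str, str]], query: str) -> List[str]:
    """IR view: order all books by how many query terms occur in title or blurb."""
    terms = query.lower().split()

    def score(book: Dict[str, str]) -> int:
        text = (book["title"] + " " + book["blurb"]).lower().split()
        return sum(1 for t in terms if t in text)

    scored = [(score(b), b["title"]) for b in books]
    # sorted() is stable, so books with equal scores keep their catalogue order
    return [title for _, title in sorted(scored, key=lambda p: p[0], reverse=True)]


print(select(BOOKS, "CS"))              # ['Compilers']
print(rank(BOOKS, "books about CS"))    # ['Search Engines', 'Compilers', 'Bird Songs']
```

The predicate returns only the book labelled "CS". The ranking instead puts a book that is *about* CS first, even though its category label differs.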
Search Flow Chart
A Tutorial on Models of Information Seeking, Searching & Retrieval by @leifos & @guidozuc
IR vs. AI
• Many related topics in AI:
- Computational Linguistics
- Natural Language Processing
- Question Answering
- Information Extraction
- Machine Translation
- Computer Vision / Multimedia
vs.
• Information Retrieval?
IR vs. AI (Artificial Intelligence)
“In some sense, of course, classic IR is superhuman: there was
no pre-existing human skill, as there was with seeing, talking or
even chess playing that corresponded to the search through
millions of words of text on the basis of indices. But if one took
the view, by contrast, that theologians, lawyers and, later, literary
scholars were able, albeit slowly, to search vast libraries of
sources for relevant material, then on that view IR is just the
optimisation of a human skill and not a superhuman activity. If
one takes that view, IR is a proper part of AI, as traditionally
conceived.”
Yorick Wilks, Unhappy bedfellows: the relationship of AI and IR
An “Essay in honour of Karen Spärck Jones”, 2006
Relevance
• Inherently dependent on user, context and task
• Different “relevance criteria”
- Topicality: is the document about the information request?
- Readability: can I understand the text?
- Authoritativeness: can I trust the text?
- Child-suitability: is the text appropriate for children?
- Etc.
“Computational Relevance”
“Intellectually it is possible for a human to establish the
relevance of a document to a query. For a computer to do
this we need to construct a model within which
relevance decisions can be quantified. It is interesting to
note that most research in information retrieval can be
shown to have been concerned with different aspects of
such a model.”
Van Rijsbergen, 1976
Retrieval Model
‘Computational Relevance’
• How to combine different indicators of relevance?
- E.g., topicality, child-suitability, polarity, …
• Apply ‘copulas’ (a technique from econometrics) to model non-linear dependencies (SIGIR 2013, CIKM 2014)
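A copula joins per-criterion marginal distributions into a joint distribution, so it can model how the criteria interact rather than just adding them up. The following is a minimal illustration under explicit assumptions; it is not the method of the cited SIGIR/CIKM papers. The assumptions are: two criteria only, empirical (rank-based) marginals, and the Clayton copula, one common Archimedean family, with an arbitrarily chosen θ = 2. The scores are invented.

```python
from typing import List


def empirical_cdf(scores: List[float]) -> List[float]:
    """Map each raw score to the fraction of scores <= it, i.e. a value in (0, 1]."""
    n = len(scores)
    return [sum(1 for t in scores if t <= s) / n for s in scores]


def clayton(u: float, v: float, theta: float) -> float:
    """Bivariate Clayton copula C(u, v); requires theta > 0 and u, v in (0, 1]."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def rank_by_copula(docs: List[str], topicality: List[float],
                   child_suitability: List[float], theta: float = 2.0) -> List[str]:
    """Rank documents by the joint (copula) value of two relevance criteria.

    theta = 2.0 is an illustrative assumption; in practice it would be fitted.
    """
    u = empirical_cdf(topicality)
    v = empirical_cdf(child_suitability)
    joint = {d: clayton(a, b, theta) for d, a, b in zip(docs, u, v)}
    return sorted(docs, key=lambda d: joint[d], reverse=True)


docs = ["d1", "d2", "d3"]
topicality = [0.9, 0.8, 0.1]         # hypothetical criterion scores
child_suitability = [0.1, 0.7, 0.9]
print(rank_by_copula(docs, topicality, child_suitability))
```

With these marginals, an unweighted average of the two criteria gives all three documents the same score (2/3). The Clayton copula behaves like a soft AND (C(u, v) ≤ min(u, v)) and ranks d2 first, because d2 is reasonably good on both criteria.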
Relevance
• Understanding the various aspects of relevance places information retrieval between computer science and information science
• Examples of questions that traditionally do not even presume the involvement of a computer:
- What makes an information object relevant?
- What stages constitute a search process?
- How does relevance evolve during this search process?
- How do users learn from the search process?
- Why do users issue short queries even though we know that longer ones are more effective?
- Etc.
NLP in IR
• Stemming & stopping
- De facto default setting
• N-grams (bigrams)
- SDM (Sequential Dependence Model)
• Entity tagging
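Here is a minimal sketch of how stopping, stemming and bigram features can fit together in a Sequential Dependence Model scorer, in the spirit of Metzler and Croft. The scorer mixes three kinds of features: single terms, ordered adjacent pairs, and unordered pairs within a window. Each feature is scored with a Dirichlet-smoothed log-probability. The weights (0.85, 0.10, 0.05), the window of 8 and μ = 2500 are commonly used defaults. Several parts are assumptions of this sketch rather than anything from the lecture:

- the tiny stopword list;
- the crude suffix stripper, which stands in for a real stemmer;
- the add-0.5 floor on collection counts;
- the simplified unordered-window count.

```python
import math
import re
from typing import List, Sequence, Tuple

# Tiny illustrative stopword list (assumption); real systems use longer lists.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "on", "for", "to", "is"}


def preprocess(text: str) -> List[str]:
    """Lower-case, tokenize, drop stopwords, and strip a few suffixes.

    The suffix stripping is a crude stand-in for a real stemmer such as Porter's.
    """
    terms = []
    for tok in re.findall(r"[a-z0-9]+", text.lower()):
        if tok in STOPWORDS:
            continue
        for suffix in ("ing", "s"):
            if tok.endswith(suffix) and len(tok) > len(suffix) + 2:
                tok = tok[: -len(suffix)]
                break
        terms.append(tok)
    return terms


def count_ordered(a: str, b: str, doc: Sequence[str]) -> int:
    """Occurrences of the exact bigram 'a b'."""
    return sum(1 for i in range(len(doc) - 1) if doc[i] == a and doc[i + 1] == b)


def count_unordered(a: str, b: str, doc: Sequence[str], window: int) -> int:
    """Simplified count: occurrences of a that have b within window - 1 positions."""
    n = 0
    for i, tok in enumerate(doc):
        if tok == a:
            nearby = list(doc[max(0, i - window + 1):i]) + list(doc[i + 1:i + window])
            if b in nearby:
                n += 1
    return n


def dirichlet_log(tf: int, doc_len: int, cf: int, coll_len: int, mu: float) -> float:
    """Dirichlet-smoothed log-probability; the +0.5 floor (assumption) avoids log(0)."""
    p_coll = (cf + 0.5) / (coll_len + 1.0)
    return math.log((tf + mu * p_coll) / (doc_len + mu))


def sdm_score(query: List[str], doc: List[str], collection: List[List[str]],
              mu: float = 2500.0,
              weights: Tuple[float, float, float] = (0.85, 0.10, 0.05),
              window: int = 8) -> float:
    lam_t, lam_o, lam_u = weights
    coll_len = sum(len(d) for d in collection)
    score = 0.0
    for q in query:  # unigram (term) features
        cf = sum(d.count(q) for d in collection)
        score += lam_t * dirichlet_log(doc.count(q), len(doc), cf, coll_len, mu)
    for a, b in zip(query, query[1:]):  # adjacent query-term pairs
        cf_o = sum(count_ordered(a, b, d) for d in collection)
        cf_u = sum(count_unordered(a, b, d, window) for d in collection)
        score += lam_o * dirichlet_log(count_ordered(a, b, doc), len(doc), cf_o, coll_len, mu)
        score += lam_u * dirichlet_log(count_unordered(a, b, doc, window), len(doc), cf_u, coll_len, mu)
    return score


collection = [preprocess(t) for t in [
    "information retrieval ranks documents",
    "retrieval of information is hard work",
    "documents about cooking recipes",
]]
query = preprocess("information retrieval")
print([round(sdm_score(query, d, collection), 4) for d in collection])
```

All three toy documents have four terms after preprocessing. The first scores highest because it contains the exact phrase. The second still beats the third, because it contains both terms inside the unordered window.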
Footnote in Victor Lavrenko’s PhD thesis
“It is my personal observation that almost every
mathematically inclined graduate student in Information
Retrieval attempts to formulate some sort of a non-independent
model of IR within the first two or three years
of his studies. The vast majority of these attempts yield no
improvements and remain unpublished.”
Take words as they stand!
The Secret
• The user can simply reformulate their information need in response to insufficiently relevant results retrieved by the system!
Why Search Remains Difficult to Get Right
• Heterogeneous data sources
- WWW, Wikipedia, news, e-mail, patents, Twitter, personal information, …
• Varying result types
- “Documents”, tweets, courses, people, experts, gene expressions, temperatures, …
• Multiple dimensions of relevance
- Topicality, recency, reading level, …
Actual information needs often require a mix within and across dimensions, e.g., “recent news and patents from our top competitors”.
• System’s internal information representation
- Linguistic annotations
  - Named entities, sentiment, dependencies, …
- Knowledge resources
  - Wikipedia, Freebase, ICD-9, IPTC, …
- Links to related documents
  - Citations, URLs
• Anchors that describe the URI
- Anchor text
• Queries that lead to clicks on the URI
- Session, user, dwell time, …
• Tweets that mention the URI
- Time, location, user, …
• Other social media that describe the URI
- User, rating
- Tag, organisation of ‘folksonomy’
+ UNCERTAINTY ALL OVER!
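Extra evidence about a URI, such as anchor text or tweets, can be folded into a single score. One common approach treats each source as a separate field and scores the query against a weighted mixture of per-field language models. The following is a minimal sketch under several assumptions:

- three hypothetical fields (`body`, `anchor`, `tweets`);
- hand-picked field weights;
- Jelinek-Mercer smoothing against the collection with λ = 0.5;
- a tiny probability floor for terms never seen in the collection.

It is not the lecture's own method, and real weights would be tuned.

```python
import math
from collections import Counter
from typing import Dict, List

# Hypothetical field weights (assumption); they sum to 1 so the mixture is a distribution.
FIELD_WEIGHTS = {"body": 0.5, "anchor": 0.3, "tweets": 0.2}


def field_prob(term: str, tokens: List[str], coll_counts: Counter,
               coll_len: int, lam: float = 0.5) -> float:
    """Jelinek-Mercer smoothed P(term | field)."""
    p_field = tokens.count(term) / len(tokens) if tokens else 0.0
    p_coll = coll_counts[term] / coll_len
    return lam * p_field + (1 - lam) * p_coll


def score(query: List[str], doc: Dict[str, List[str]],
          coll_counts: Counter, coll_len: int) -> float:
    """log P(query | doc) under a weighted mixture of field language models."""
    total = 0.0
    for term in query:
        p = sum(w * field_prob(term, doc[f], coll_counts, coll_len)
                for f, w in FIELD_WEIGHTS.items())
        total += math.log(max(p, 1e-12))  # floor for terms unseen anywhere (assumption)
    return total


# Invented example pages: page1's body never mentions the query, but its anchors and tweets do.
docs = {
    "page1": {"body": "welcome to our homepage".split(),
              "anchor": "acme patents".split(),
              "tweets": "new acme patent filed".split()},
    "page2": {"body": "acme company news and patents".split(),
              "anchor": "news".split(),
              "tweets": []},
}
coll_counts = Counter(t for d in docs.values() for toks in d.values() for t in toks)
coll_len = sum(coll_counts.values())
query = ["acme", "patents"]
for name, doc in docs.items():
    print(name, round(score(query, doc, coll_counts, coll_len), 4))
```

Here page1 outscores page2, although only page2's body text matches the query. This illustrates how anchor text and social signals can change the ranking, for better or worse.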

Information Retrieval intro TMM

  • 1. Text Mining lecture Information Retrieval Prof.dr.ir. Arjen P. de Vries arjen@acm.org Nijmegen, October 18th , 2017
  • 2. A Tutorial on Models of Information Seeking, Searching & Retrieval by @leifos & @guidozuc
  • 3. Core Research Questions  How to represent information? - The information need and search requests - The objects to be shown in response to an information request  How to match information representations? The information objects to be retrieved are not necessarily textual! Van Rijsbergen, 1979
  • 4. Two views on ‘search’ DB  Business applications  Deductive reasoning  Precise and efficient query processing  Users with technical skills (SQL) and precise information needs Selection Books where category=‘CS’ IR  Digital libraries, patent collections, etc.  Inductive reasoning  Best-effort processing  Untrained users with imprecise information needs Ranking Books about CS Note: SemWeb more DB than IR!!! Symbolic Connectionist
  • 5. Search Flow Chart A Tutorial on Models of Information Seeking, Searching & Retrieval by @leifos & @guidozuc 5
  • 6. IR vs. AI  Many related topics in AI: - Computational Linguistics - Natural Language Processing - Question Answering - Information Extraction - Machine Translation - Computer vision / Multimedia vs.  Information Retrieval?
  • 7. IR vs. AI (Kunstmatige Intelligentie) “In some sense, of course, classic IR is superhuman: there was no pre-existing human skill, as there was with seeing, talking or even chess playing that corresponded to the search through millions of words of text on the basis of indices. But if one took the view, by contrast, that theologians, lawyers and, later, literary scholars were able, albeit slowly, to search vast libraries of sources for relevant material, then on that view IR is just the optimisation of a human skill and not a superhuman activity. If one takes that view, IR is a proper part of AI, as traditionally conceived.” Yorick Wilks, Unhappy bedfellows: the relationship of AI and IR An “Essay in honour of Karen Spärck Jones”, 2006
  • 8. IR vs. AI “In some sense, of course, classic IR is superhuman: there was no pre-existing human skill, as there was with seeing, talking or even chess playing that corresponded to the search through millions of words of text on the basis of indices. But if one took the view, by contrast, that theologians, lawyers and, later, literary scholars were able, albeit slowly, to search vast libraries of sources for relevant material, then on that view IR is just the optimisation of a human skill and not a superhuman activity. If one takes that view, IR is a proper part of AI, as traditionally conceived.” Yorick Wilks, Unhappy bedfellows: the relationship of AI and IR An “Essay in honour of Karen Spärck Jones”, 2006
  • 9. IR vs. AI “In some sense, of course, classic IR is superhuman: there was no pre-existing human skill, as there was with seeing, talking or even chess playing that corresponded to the search through millions of words of text on the basis of indices. But if one took the view, by contrast, that theologians, lawyers and, later, literary scholars were able, albeit slowly, to search vast libraries of sources for relevant material, then on that view IR is just the optimisation of a human skill and not a superhuman activity. If one takes that view, IR is a proper part of AI, as traditionally conceived.” Yorick Wilks, Unhappy bedfellows: the relationship of AI and IR An “Essay in honour of Karen Spärck Jones”, 2006
  • 10. Relevance
    - Inherently dependent on user, context and task
    - Different “relevance criteria”:
      - Topicality: is the document about the information request?
      - Readability: can I understand the text?
      - Authoritativeness: can I trust the text?
      - Child-suitability: is the text appropriate for children?
      - Etc.
  • 11. “Computational Relevance”
    “Intellectually it is possible for a human to establish the relevance of a document to a query. For a computer to do this we need to construct a model within which relevance decisions can be quantified. It is interesting to note that most research in information retrieval can be shown to have been concerned with different aspects of such a model.”
    Van Rijsbergen, 1976
    Retrieval Model
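To make the quote concrete: one classic way to quantify relevance inside a model is to rank documents by query likelihood, log P(q | d), under a unigram language model with Dirichlet smoothing. This is an illustrative choice, not the model the slide refers to; the toy documents and the smoothing parameter `mu` are assumptions.

```python
import math
from collections import Counter


def query_likelihood(query, doc, collection_tf, collection_len, mu):
    """Score log P(query | doc) with a Dirichlet-smoothed unigram document model."""
    tf = Counter(doc)
    score = 0.0
    for term in query:
        p_coll = collection_tf[term] / collection_len
        if p_coll == 0:
            continue  # term absent from the whole collection: skip it instead of taking log(0)
        score += math.log((tf[term] + mu * p_coll) / (len(doc) + mu))
    return score


# Toy collection (assumed for illustration).
docs = {
    "d1": "information retrieval ranks documents by estimated relevance".split(),
    "d2": "databases answer precise queries with deductive reasoning".split(),
    "d3": "relevance feedback improves relevance in information retrieval".split(),
}
collection_tf = Counter(t for d in docs.values() for t in d)
collection_len = sum(collection_tf.values())

query = "information retrieval relevance".split()
# mu=10 is an arbitrary value suited to these tiny documents.
ranking = sorted(
    docs,
    key=lambda name: query_likelihood(query, docs[name], collection_tf, collection_len, mu=10.0),
    reverse=True,
)
print(ranking)  # ['d3', 'd1', 'd2']
```

Documents that mention the query terms more often, relative to their length, score higher; smoothing with collection statistics keeps a document that misses every query term (d2) from scoring minus infinity.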
  • 12. ‘Computational Relevance’
    - How to combine different indicators of relevance?
      - E.g., topicality, child-suitability, polarity, …
    - Apply ‘copulas’ (a technique from econometrics) to model non-linear dependencies (SIGIR 2013, CIKM 2014)
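A minimal sketch of the copula idea, assuming a Clayton copula with a fixed parameter `theta=2.0` (both hypothetical choices; this is not the method of the cited SIGIR/CIKM papers). Each indicator's raw scores are first turned into pseudo-observations in (0, 1) via their ranks, and the copula value C(u, v) then serves as the combined score. The indicator scores below are made up.

```python
def pseudo_observations(scores):
    """Map raw scores to (0, 1) by rank: u_i = rank_i / (n + 1), lowest score gets rank 1."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


def clayton(u, v, theta=2.0):
    """Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta), used here with theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


# Hypothetical per-document scores for two relevance indicators.
topicality = [0.9, 0.8, 0.3, 0.6]
child_suitability = [0.2, 0.7, 0.9, 0.1]

u = pseudo_observations(topicality)          # [0.8, 0.6, 0.2, 0.4]
v = pseudo_observations(child_suitability)   # [0.4, 0.6, 0.8, 0.2]
combined = [clayton(a, b) for a, b in zip(u, v)]
ranking = sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
print(ranking)  # [1, 0, 2, 3]
```

Every copula satisfies C(u, v) <= min(u, v), so a document that is weak on either indicator cannot score high. Averaging u and v would tie documents 0 and 1 at 0.6; the Clayton copula prefers document 1, which is moderately strong on both.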
  • 13. Relevance
    - Various aspects of understanding this notion of relevance position information retrieval between computer science and information science
    - Examples of questions that traditionally do not even presume the involvement of a computer:
      - What makes an information object relevant?
      - What stages constitute a search process?
      - How does relevance evolve during this search process?
      - How do users learn from the search process?
      - Why do users issue short queries even if we know that long ones are more effective?
      - Etc.
  • 14. NLP in IR
    - Stemming & stopping
      - De facto default setting
    - N-grams (bigrams)
      - SDM (Sequential Dependence Model)
    - Entity tagging
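The techniques on this slide can be sketched together. The code below uses a tiny stop list and a crude plural-stripping rule (a stand-in for a real stemmer such as Porter's), then scores documents with a simplified Sequential Dependence Model: a weighted sum of unigram, ordered-bigram (#1) and unordered-window (#uw8) features, each a Dirichlet-smoothed log probability. The weights (0.85, 0.10, 0.05) and window size 8 are the values commonly used with SDM; the stop list, the `mu` value, the window-counting rule and the floor for unseen features are simplifying assumptions.

```python
import math

STOPWORDS = {"the", "of", "a", "an", "and", "in", "for", "to", "is"}  # tiny illustrative stop list


def preprocess(text):
    """Lowercase, remove stopwords, and strip a trailing plural 's' (crude stand-in for stemming)."""
    tokens = [t for t in text.lower().split() if t not in STOPWORDS]
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]


def ordered_count(doc, a, b):
    """#1(a b): how often a is immediately followed by b."""
    return sum(1 for i in range(len(doc) - 1) if doc[i] == a and doc[i + 1] == b)


def unordered_count(doc, a, b, window=8):
    """Simplified #uw8(a b): position pairs of a and b, in either order, spanning at most `window` tokens."""
    pos_a = [i for i, t in enumerate(doc) if t == a]
    pos_b = [i for i, t in enumerate(doc) if t == b]
    return sum(1 for i in pos_a for j in pos_b if i != j and abs(i - j) < window)


def smoothed_log(count, doc_len, coll_count, coll_len, mu):
    """Dirichlet-smoothed log probability; unseen features get a collection-count floor of 0.5 (arbitrary)."""
    p_coll = max(coll_count, 0.5) / coll_len
    return math.log((count + mu * p_coll) / (doc_len + mu))


def sdm_score(query, doc, docs, mu=100.0, weights=(0.85, 0.10, 0.05)):
    lam_t, lam_o, lam_u = weights
    coll_len = sum(len(d) for d in docs)

    def feature(count_fn):
        coll_count = sum(count_fn(d) for d in docs)
        return smoothed_log(count_fn(doc), len(doc), coll_count, coll_len, mu)

    score = lam_t * sum(feature(lambda d, t=t: d.count(t)) for t in query)
    for a, b in zip(query, query[1:]):  # adjacent query-term pairs
        score += lam_o * feature(lambda d: ordered_count(d, a, b))
        score += lam_u * feature(lambda d: unordered_count(d, a, b))
    return score


docs = [
    preprocess("information retrieval models for text"),
    preprocess("retrieval of information from the web"),
    preprocess("text models and information theory"),
]
query = preprocess("information retrieval")
ranking = sorted(range(len(docs)), key=lambda i: sdm_score(query, docs[i], docs), reverse=True)
print(ranking)  # [0, 1, 2]
```

Documents 0 and 1 contain the same query terms; only document 0 has them as an adjacent, ordered phrase, which the ordered-window feature rewards.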
  • 15. Footnote in Victor Lavrenko’s PhD thesis
    “It is my personal observation that almost every mathematically inclined graduate student in Information Retrieval attempts to formulate some sort of a non-independent model of IR within the first two or three years of his studies. The vast majority of these attempts yield no improvements and remain unpublished.”
  • 18. The Secret
    - The user can simply reformulate their information need in response to insufficiently relevant results retrieved by the system!
  • 19. Why Search Remains Difficult to Get Right
    - Heterogeneous data sources
      - WWW, Wikipedia, news, e-mail, patents, Twitter, personal information, …
    - Varying result types
      - “Documents”, tweets, courses, people, experts, gene expressions, temperatures, …
    - Multiple dimensions of relevance
      - Topicality, recency, reading level, …
    Actual information needs often require a mix within and across dimensions, e.g., “recent news and patents from our top competitors”.
  • 20.
    - System's internal information representation
      - Linguistic annotations
        - Named entities, sentiment, dependencies, …
      - Knowledge resources
        - Wikipedia, Freebase, ICD-9, IPTC, …
      - Links to related documents
        - Citations, URLs
    - Anchors that describe the URI
      - Anchor text
    - Queries that lead to clicks on the URI
      - Session, user, dwell-time, …
    - Tweets that mention the URI
      - Time, location, user, …
    - Other social media that describe the URI
      - User, rating
      - Tag, organisation of ‘folksonomy’
    + UNCERTAINTY ALL OVER!
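One way to fold such extra evidence into a document representation is to treat each source as a separate field and mix per-field language models. The sketch below assumes three fields (body text, anchor text, and queries that led to clicks), hypothetical field weights, and Jelinek-Mercer smoothing with the collection; it ignores the uncertainty in each source that the slide stresses, and the page data is invented.

```python
import math
from collections import Counter

# Hypothetical weights per evidence source; renormalised over the fields a page actually has.
FIELD_WEIGHTS = {"body": 0.6, "anchor": 0.25, "clicked_queries": 0.15}


def field_mixture_score(query, page, collection_tf, collection_len, background=0.2):
    """log P(query | page), where P(t | page) mixes per-field estimates with the collection model."""
    present = {f: w for f, w in FIELD_WEIGHTS.items() if page.get(f)}
    total = sum(present.values())
    score = 0.0
    for term in query:
        p_page = sum(
            (w / total) * Counter(page[f])[term] / len(page[f]) for f, w in present.items()
        )
        p = (1 - background) * p_page + background * collection_tf[term] / collection_len
        score += math.log(p) if p > 0 else float("-inf")
    return score


page_a = {
    "body": "the jaguar is a large cat of the americas".split(),
    "anchor": "jaguar big cat".split(),
}
page_b = {
    "body": "the jaguar xf is a saloon built in england".split(),
    "anchor": "jaguar car".split(),
    "clicked_queries": "jaguar car price".split(),
}
collection_tf = Counter(t for page in (page_a, page_b) for field in page.values() for t in field)
collection_len = sum(collection_tf.values())

query = "jaguar car".split()
score_a = field_mixture_score(query, page_a, collection_tf, collection_len)
score_b = field_mixture_score(query, page_b, collection_tf, collection_len)
print(score_b > score_a)  # True
```

Neither body mentions “car”; page_b still ranks higher because its anchor text and clicked queries do.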

Editor's Notes

  1. The fundamental research questions are all about REPRESENTATION and MATCHING these representations. The long-term research agenda is to unify two fundamentally different views on these problems: that of the database domain and that of the information retrieval domain. This is a fundamental challenge, as the deductive approach of the DB world is not easily reconciled with the inductive approach underlying IR.
  2. Some of our research is really about mathematical modelling, like our recent ACM SIGIR paper on deploying copulas (a mathematical approach first applied in economics to represent macro-economic processes) to model the interactions between different types of relevance; here, topical relevance and subjectivity.