Web-Based Information Retrieval


       Patrick Alfred Waluchio Ongwen
       Knowledge Management Officer
African Research and Resource Forum
Web as Agent of Change
   "ICT is not an end in itself or an agent of
    change by itself but when incorporated
    into a well managed change process it is a
    powerful enabler and amplifier."
                                           Bryn Jones,
   Effective use of the Web therefore requires content,
    content management, hyperlinks and navigation
    tools, and retrieval approaches, among others.
Introduction
 • A critical goal of successful information retrieval
   on the web is to identify which pages are of high
   quality and relevance to the user’s query.
 • Each search engine indexes web pages,
   representing each page by a set of weighted keywords.
 • A crawler (robot or spider) traverses
   the web with the goal of fetching high-quality pages
   for indexing and retrieval.
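
The crawler's traversal can be sketched as a breadth-first walk over links. This is a minimal illustration, not how any particular search engine works: the `TOY_WEB` dictionary, its URLs, and the `crawl` function are all assumptions made up for the example, and no real pages are fetched.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the URLs it links to.
TOY_WEB = {
    "a.example/home": ["a.example/about", "b.example/news"],
    "a.example/about": ["a.example/home"],
    "b.example/news": ["c.example/post", "a.example/home"],
    "c.example/post": [],
}


def crawl(seed, fetch_links, max_pages=100):
    """Breadth-first traversal from `seed`; returns pages in visit order."""
    visited = []
    seen = {seed}              # URLs already queued, to avoid revisiting
    frontier = deque([seed])   # URLs waiting to be visited
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


order = crawl("a.example/home", lambda url: TOY_WEB.get(url, []))
print(order)
```

A real crawler would also have to judge page quality and respect site policies, which this sketch leaves out.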
Introduction
 • Search engines must filter the most
   relevant information matching a user’s
   query and present the retrieved information in
   a way the user will understand.
 • Hyperlinks provide a valuable source of
   information for web retrieval.
 • Hypertext and hypermedia enable searching
   the web in a non-sequential manner.
Challenges of Web Information
Retrieval
 • Managing a huge number of hyperlinked
   pages
 • Crawling the web to find appropriate web
   sites to index
 • Accessing documents
 • Measuring the quality or authority of available
   information
Hypertext
   A non-linear arrangement of textual material is called
    hypertext. The prefix "hyper" means extension to other
    dimensions.
   Hypertext converts text into a multidimensional space.
   The term was coined by Ted Nelson in 1965.
   Hypertext is
       “non-sequential writing” (Nelson, T. 1987. Literary
       Machines).
   Non-linear sequences of information also appear in, for example, a dictionary,
    encyclopaedia or newspaper.
   Hypertext systems manage collections of
    information that can be accessed non-sequentially.
Hypertext
 • Consists of a network of nodes and logical links
   between nodes
 • A node refers to a chunk of content or a web page
 • The variety of nodes and links makes hypertext a
   flexible structure in which information is
   provided both by what is stored in the nodes and by the links to
   each node.
 • Hypertext retrieval systems are the products of
   an emerging technology that offers an alternative
   approach to retrieving information from the web.
Hypertext and Non-Linearity
 • Allows users to follow their own path in a
   non-sequential manner to access
   information
 • Hypertext is not usually read linearly
 • Links encourage branching off
     • History and the back button permit backtracking
 • The immediacy of following links by clicking
   creates a different experience from traditional
   non-linearity
Structure of Hypertext
   In a hypertext system, objects in a database, called
    nodes, are connected to one another by
    machine-supported links. Users follow links to access
    information.
   Text augmented with links
       Link: a pointer to another piece of text in the same or a different
        document
       Hypertext systems can be accessed by selecting link icons
        and following links from node to node
       or by searching the database for a keyword in the
        normal way
       A browser is used to view and navigate hypertext
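
The node-and-link structure described above can be sketched as a small data structure that supports both access routes: following links and keyword search. The class names, node names and texts below are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    text: str
    links: list = field(default_factory=list)  # names of target nodes


class Hypertext:
    """A toy hypertext: a database of nodes connected by links."""

    def __init__(self):
        self.nodes = {}

    def add(self, name, text, links=()):
        self.nodes[name] = Node(name, text, list(links))

    def follow(self, name, index):
        """Follow the index-th link of a node and return the target node."""
        target = self.nodes[name].links[index]
        return self.nodes[target]

    def search(self, keyword):
        """Return the names of nodes whose text contains the keyword."""
        kw = keyword.lower()
        return sorted(n.name for n in self.nodes.values() if kw in n.text.lower())


ht = Hypertext()
ht.add("home", "Welcome to the library", ["catalogue", "contact"])
ht.add("catalogue", "Browse the library catalogue", ["home"])
ht.add("contact", "Email the librarian", ["home"])

print(ht.follow("home", 0).name)
print(ht.search("library"))
```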
Hierarchical Structure
 • Hierarchy is the basis of almost all websites
   as well as hypertext
 • Hierarchies are orderly and provide ample
   navigational freedom
 • Users start at the home page, descend the
   branch that most interests them, and continue
   making further choices as the branch divides
Web-like Structures
 • Relatively unsystematic and difficult to
   navigate
 • Mostly used in short stories and other
   fiction, in which artistic considerations may
   override the desire for efficient navigation
Multipath Structures
 • Largely linear and to some extent hierarchical,
   but offer alternative pathways, hence the name
   multipath structures
Hypertext Components
The hypertext model has three layers:
 • The run-time layer, which controls the user
   interface
 • The storage layer, which is a database
   containing a network of nodes connected by
   links
 • The within-component layer, which is the
   content structure inside the node.
Hypermedia
(Hypermedia = Hypertext + Multimedia)
 • Hypermedia integrates text, images, video,
   graphics and sound within a web page or node
 • Hypermedia represents textual and
   non-textual information in a non-sequential
   manner.
 • Allows embedding of bitmapped images (GIF,
   JPEG, PNG)
History of Hypertext
 • 1945: Vannevar Bush describes the “memex”
   (Atlantic Monthly)
 • 1965: Ted Nelson coins the term “hypertext”
 • 1985: Peter Brown, University of Kent,
   develops the first commercially available
   hypertext system, Guide
 • 1986-1990: More sophisticated hypertext
   systems developed
History …
 • 1991: Tim Berners-Lee builds an IP-based
   distributed hypertext system at CERN and
   develops UDI/URI, HTTP, and HTML…
 • 1993: Mosaic, the first widely popular graphical Web browser,
   released
 • 2002: Work begins on the Semantic Web
Hypertext and Hypermedia in
Information Retrieval
 • Browsing – retrieve information by
   association
     • Follow links, backtrack
     • Maintain history, bookmarks
 • Searching – retrieve information by content
     • Construct indexes of URLs
     • Search by keyword/description of page
Cont…
   Hypertext and hypermedia, with standards such as
    HTML and XHTML, have revolutionised
    the creation and delivery of content, as well as
    access to and processing of information.
   They allow users to navigate within or across a range of
    documents on different computer networks.
   They allow browsers and other software to interpret and
    process information for different purposes.
   Search engines use the links among pages to select
    information resources from the Internet.
   Google uses link data to rank pages in order of
    their relevance to the query.
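
The slides do not say how link data is turned into a ranking. As an assumed stand-in, the sketch below uses a simplified PageRank-style iteration, the link-analysis technique publicly associated with Google: a page scores highly when many well-scored pages link to it. The graph `toy_links`, the damping value and the iteration count are illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to.

    Every link target must also appear as a key of `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small base share regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


toy_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_links)
best = max(ranks, key=ranks.get)
print(best, round(ranks[best], 3))
```

In this toy graph the page with the most incoming links comes out on top. Link-based scores of this kind measure authority rather than relevance to a specific query, so a search engine would combine them with content matching.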
Underlying Principles and
Challenges of IR
 • Information retrieval is a complex task.
 • A query-based IR system must be able to accept
   a query about any topic and find texts that
   contain the information the query specifies.
 • IR systems are required to operate in real time,
   which demands that they be fast and efficient.
 • Most searches are conducted on natural
   language text, which is inherently
   ambiguous and imprecise.
 • The following are some of the challenges IR
   systems face in natural language processing:
Synonyms
 • Synonymy occurs when different words or
   phrases mean essentially the same thing.
 • For example, the words “finance”, “fund” and
   “support” may be related, depending on the context
   of the inquiry.
 • Natural language is filled with words and
   phrases that have similar meanings, and it is
   often impossible for users to supply all the
   words that might be relevant to the query.
 • To address this problem, some IR systems
   expand the query to include the synonyms
   of a given word with the help of a
   thesaurus.
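
Thesaurus-based query expansion can be sketched as a dictionary lookup. The `THESAURUS` contents below are a hypothetical fragment built from the slide's own example words; a real system would load a curated thesaurus.

```python
# Hypothetical thesaurus fragment: term -> set of synonyms.
THESAURUS = {
    "finance": {"fund", "support"},
    "fund": {"finance", "support"},
}


def expand_query(terms, thesaurus):
    """Return the query terms plus their synonyms, without duplicates."""
    expanded = []
    for term in terms:
        # Keep the original term first, then its synonyms in sorted order.
        for candidate in [term, *sorted(thesaurus.get(term, ()))]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


print(expand_query(["finance", "research"], THESAURUS))
```

Expansion raises the chance of matching relevant documents, but it can also pull in unrelated ones when a synonym only fits some contexts.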
Polysemy
 • Polysemy occurs when a single word has
   more than one meaning. For example, the
   word “shot” can have the following meanings:
     • A shooting, as in “He shot at a tiger.”
     • An attempt, as in “I took a shot at playing.”
     • A photograph, as in “He took a nice shot.”
Phrases in Information
Retrieval
 • Expressions consisting of multiple words often
   have a meaning that is substantially different
   from the meaning of the individual words.
 • The phrase “Artificial Intelligence” means something different
   from the individual words “Artificial” and
   “Intelligence”, and “Operating System” is
   different from “Operating” and “System”.
 • One method for phrase-based indexing is to use
   proximity measures to specify the acceptable
   distance between the words, for example with the operators WITH or NEAR.
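
A proximity test of the NEAR kind can be sketched as a comparison of word positions. The function name `near` and its `max_distance` parameter are assumptions for illustration; real search systems define their proximity operators differently, and this sketch ignores punctuation.

```python
def near(text, first, second, max_distance=1):
    """True if `first` and `second` occur within `max_distance` words of each other."""
    words = text.lower().split()
    pos_first = [i for i, w in enumerate(words) if w == first]
    pos_second = [i for i, w in enumerate(words) if w == second]
    return any(abs(i - j) <= max_distance for i in pos_first for j in pos_second)


doc_a = "research in artificial intelligence methods"
doc_b = "artificial flowers show no intelligence"

print(near(doc_a, "artificial", "intelligence"))
print(near(doc_b, "artificial", "intelligence"))
```

Both documents contain the two words, but only the first has them adjacent, so only the first matches the phrase when the allowed distance is one word.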
Object Recognition
   Certain types of information require special
    procedures to identify them. For example, dates
    come in various forms, such as July 3, 2001;
    3.7.2001; and 7.3.2001 (American
    system). Once recognised, such values can be compared with greater-than and less-than logic (<, >).
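
Recognising the date forms above can be sketched by trying a few known layouts in turn. The function `parse_date` and its `american` flag are assumptions for illustration. Note that the dotted form is ambiguous on its own, so the caller has to say which convention applies, and that month names are parsed according to the current locale (English is assumed here).

```python
from datetime import datetime


def parse_date(text, american=False):
    """Parse a date written as 'July 3, 2001' or in dotted numeric form.

    `american` selects month-first order for dotted dates.
    """
    dotted = "%m.%d.%Y" if american else "%d.%m.%Y"
    for layout in ("%B %d, %Y", dotted):
        try:
            return datetime.strptime(text, layout).date()
        except ValueError:
            continue  # Try the next layout.
    raise ValueError(f"unrecognised date: {text!r}")


same_day = [
    parse_date("July 3, 2001"),
    parse_date("3.7.2001"),
    parse_date("7.3.2001", american=True),
]
print(same_day)
```

Once the three spellings are normalised to the same value, they can be compared and matched like any other date.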
Semantics and Role Relationships
 • Some information can only be identified through
   semantics.
 • Suppose a user wants to find the names of
   lecturers teaching courses in the area of
   “Artificial Intelligence”.
 • First, the system must know which
   courses are related to Artificial Intelligence. These
   can be AI, Fuzzy Logic, Genetic Algorithms,
   Neural Networks and Machine Learning.
 • These courses should then be linked to the lecturers allocated
   to them.
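
The two-step lookup described above (topic to related courses, then courses to lecturers) can be sketched with two mappings. The course list comes from the slide; the lecturer names and the `LECTURER_OF` allocation are hypothetical placeholders.

```python
# Topic -> related courses (from the slide's example).
RELATED_COURSES = {
    "Artificial Intelligence": [
        "AI", "Fuzzy Logic", "Genetic Algorithms",
        "Neural Networks", "Machine Learning",
    ],
}

# Hypothetical allocation of courses to lecturers.
LECTURER_OF = {
    "AI": "Lecturer A",
    "Fuzzy Logic": "Lecturer B",
    "Neural Networks": "Lecturer A",
    "Databases": "Lecturer C",
}


def lecturers_for(topic):
    """Return the lecturers allocated to any course related to `topic`."""
    courses = RELATED_COURSES.get(topic, [])
    return sorted({LECTURER_OF[c] for c in courses if c in LECTURER_OF})


print(lecturers_for("Artificial Intelligence"))
```

A plain keyword search for "Artificial Intelligence" would miss the lecturer of "Fuzzy Logic"; the relationship between topic and course has to be known to the system.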
Computable Values
 • Determining whether information is relevant
   sometimes depends on a specific
   calculation.
 • Suppose a user is interested in newspaper
   articles about corporate mergers that
   occurred after January 1995.
 • The IR system must identify the documents
   using a comparison operator such as LESS THAN (<) or
   GREATER THAN (>).
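
The merger example reduces to a greater-than comparison on dates. In the sketch below the article titles and publication dates are invented for illustration, and "after January 1995" is read as later than 31 January 1995.

```python
from datetime import date

# Hypothetical article records.
articles = [
    {"title": "Merger A", "published": date(1994, 11, 2)},
    {"title": "Merger B", "published": date(1995, 3, 15)},
    {"title": "Merger C", "published": date(1996, 6, 1)},
]


def published_after(docs, cutoff):
    """Titles of documents whose date is GREATER THAN the cutoff."""
    return [d["title"] for d in docs if d["published"] > cutoff]


recent = published_after(articles, date(1995, 1, 31))
print(recent)
```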
Text Representation Techniques
 • The purpose of an IR system is to search the text
   database for relevant documents in real time.
 • Consequently, the text database is
   preprocessed and stored in a structure that
   supports fast searching.
 • This preprocessed form is called the text
   representation.
Inverted File Approach
   The inverted file approach is used in text representation.
    It allows an IR system to quickly determine which
    documents contain a given set of words, and how often
    each word appears in each document.
   In an inverted file system, each database contains two files:
       the text file, which holds the documents in their normal form as they appear in the
        database, and
       the inverted file, which contains all index terms drawn
        automatically from the document records.
   Provides indirect file access
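
The two-file arrangement can be sketched with two dictionaries: the text file maps document ids to text, and the inverted file maps each term to the documents containing it, with counts. The sample documents and function names are assumptions for illustration, and tokenisation is a plain whitespace split.

```python
from collections import defaultdict


def build_inverted_file(text_file):
    """text_file: dict doc_id -> text. Returns term -> {doc_id: count}."""
    inverted = defaultdict(dict)
    for doc_id, text in text_file.items():
        for word in text.lower().split():
            inverted[word][doc_id] = inverted[word].get(doc_id, 0) + 1
    return inverted


def docs_containing(inverted, words):
    """Ids of documents that contain every word in `words`."""
    postings = [set(inverted.get(w, {})) for w in words]
    return sorted(set.intersection(*postings)) if postings else []


docs = {
    1: "web retrieval on the web",
    2: "hypertext retrieval systems",
    3: "web crawler",
}
index = build_inverted_file(docs)

print(index["web"])
print(docs_containing(index, ["web", "retrieval"]))
```

Answering the query touches only the entries for the query words, not the full text of every document, which is what makes the lookup fast.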
Using Probability Methods
 • All IR systems draw conclusions about the
   content of a document by examining its
   representation.
 • An IR system must base its conclusions on
   document features, such as the presence or
   absence of particular words or phrases.
 • An IR system must take these
   uncertain relationships into account to determine the strength
   of the relevance of a document or documents to a particular
   request.
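
The slides do not name a specific probabilistic method. One common way to turn word presence into a strength of relevance is a log-odds score in the style of the binary independence model, used here as an assumed example. The probabilities in `P_REL` and `P_NONREL` (chance that a term appears in relevant and in non-relevant documents) are invented numbers for illustration only.

```python
import math

# Hypothetical estimates: P(term present | relevant) and P(term present | not relevant).
P_REL = {"merger": 0.8, "bank": 0.6}
P_NONREL = {"merger": 0.2, "bank": 0.3}


def relevance_score(doc_terms, query_terms, p_rel, p_nonrel):
    """Sum the log-odds weights of the query terms present in the document."""
    score = 0.0
    for term in query_terms:
        if term in doc_terms:
            p, q = p_rel[term], p_nonrel[term]
            score += math.log((p * (1 - q)) / (q * (1 - p)))
    return score


query = ["merger", "bank"]
both = relevance_score({"merger", "bank", "news"}, query, P_REL, P_NONREL)
one = relevance_score({"bank", "holiday"}, query, P_REL, P_NONREL)
none = relevance_score({"football"}, query, P_REL, P_NONREL)
print(both, one, none)
```

A document containing both query terms scores higher than one containing a single term, and a document containing neither scores zero.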
Relevance Feedback
 • Relevance feedback is a technique used by
   some IR systems to improve performance on a
   query by asking the user for feedback about
   the retrieved texts.
 • Evaluation forms can also be given to users to seek their
   views on the performance of information retrieval
   systems.
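
The slides do not say how the user's feedback is applied. One classic approach, used here as an assumed example, is Rocchio-style query modification: the query is moved towards documents the user marked relevant and away from those marked non-relevant. The weights `alpha`, `beta` and `gamma` and the sample vectors are illustrative assumptions.

```python
def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Vectors are dicts term -> weight; returns the adjusted query vector."""
    terms = set(query)
    for doc in relevant + non_relevant:
        terms.update(doc)
    new_query = {}
    for term in terms:
        # Average weight of the term in each feedback group.
        pos = sum(d.get(term, 0.0) for d in relevant) / len(relevant) if relevant else 0.0
        neg = sum(d.get(term, 0.0) for d in non_relevant) / len(non_relevant) if non_relevant else 0.0
        weight = alpha * query.get(term, 0.0) + beta * pos - gamma * neg
        new_query[term] = max(weight, 0.0)  # Negative weights are clipped to zero.
    return new_query


original = {"web": 1.0}
liked = [{"web": 1.0, "crawler": 1.0}]
disliked = [{"cookies": 1.0}]
updated = rocchio(original, liked, disliked)
print(sorted(updated.items()))
```

After feedback the query gains the term "crawler" from the document the user liked, while "cookies" from the rejected document gets no weight.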
CAT TWO - Search Engines
 • Compare the assigned search engines and
   comment competently on the following:
     • Structure of the search engines
     • Indexing techniques used
     • Information resources offered
     • Links between nodes
     • Search facilities and retrieval approaches used
     • Ease of use by novice and experienced users
     • Interface design and display of screen layout
Search Engines
 • Group 1: Yahoo and Lycos
 • Group 2: Google and AltaVista
 • Group 3: Ask Jeeves and Excite
 • Group 4: AlltheWeb and HotBot
 • Group 5: WebCrawler and MSN
 • Group 6: Dogpile and eBay

Web-Based Information Retrieval Techniques

  • 1. Web-Based Information Retrieval
      • Patrick Alfred Waluchio Ongwen, Knowledge Management Officer, African Research and Resource Forum
  • 2. Web as Agent of Change
      • "ICT is not an end in itself or an agent of change by itself but when incorporated into a well managed change process it is a powerful enabler and amplifier." (Bryn Jones)
      • Effective use of the Web therefore requires content, content management, hyperlinks and navigation tools, and retrieval approaches, among others.
  • 3. Introduction
      • A critical goal of successful information retrieval on the web is to identify which pages are of high quality and relevant to the user’s query.
      • Each search engine indexes web pages, representing each page by a set of weighted keywords.
      • A crawler (robot or spider) traverses the web with the goal of fetching high-quality pages for indexing and retrieval.
  • 4. Introduction
      • Search engines must filter the most relevant information matching a user’s query and present the retrieved information in a way the user will understand.
      • Hyperlinks provide a valuable source of information for web retrieval.
      • Hypertext and hypermedia enable searching the web in a non-sequential manner.
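The crawler traversal mentioned in the introduction can be illustrated with a minimal sketch. It assumes a toy, in-memory "web" (the `TOY_WEB` dictionary and the `crawl` function are hypothetical names, not from the slides) and uses breadth-first order; a real crawler would fetch pages over HTTP and apply quality heuristics that are not shown here.

```python
from collections import deque

# Hypothetical toy web: page name -> pages it links to (illustration only).
TOY_WEB = {
    "home": ["about", "courses"],
    "about": ["home"],
    "courses": ["ai", "ir"],
    "ai": ["courses"],
    "ir": ["courses", "home"],
}

def crawl(start, web, max_pages=10):
    """Traverse links breadth-first and return pages in the order visited."""
    visited = []
    seen = {start}            # pages already queued, to avoid revisiting
    queue = deque([start])
    while queue and len(visited) < max_pages:
        page = queue.popleft()
        visited.append(page)  # a real crawler would fetch and index here
        for target in web.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited
```

Calling `crawl("home", TOY_WEB)` visits every reachable page once; `max_pages` is an assumed cap standing in for whatever stopping rule a real crawler applies.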
  • 5. Challenges of Web Information Retrieval
      • Managing a huge number of hyperlinked pages
      • Crawling the web to find appropriate web sites to index
      • Accessing documents
      • Measuring the quality or authority of available information
  • 6. Hypertext
      • A non-linear arrangement of textual material is called hypertext. The prefix “hyper” means extension to other dimensions.
      • Hypertext converts text into a multidimensional space.
      • The term was coined by Ted Nelson in 1965.
      • Hypertext is “non-sequential writing” (Nelson, T. 1987. Literary Machines).
      • Examples of non-linear sequences of information: dictionary, encyclopaedia, newspaper.
      • Hypertext systems manage collections of information that can be accessed non-sequentially.
  • 7. Hypertext
      • Consists of a network of nodes and logical links between nodes.
      • A node refers to a chunk of content or a web page.
      • The variety of nodes and links makes hypertext a flexible structure: information is carried both by what is stored in the nodes and by the links to each node.
      • Hypertext retrieval systems are products of an emerging technology that offers an alternative approach to retrieving information from the web.
  • 8. Hypertext and Non-Linearity
      • Allows users to follow their own path in a non-sequential manner to access information.
      • Hypertext is not usually read linearly.
      • Links encourage branching off.
      • The history and back button permit backtracking.
      • The immediacy of following links by clicking creates a different experience from traditional non-linearity.
  • 9. Structure of Hypertext
      • In a hypertext system, objects in a database, called nodes, are connected to one another by machine-supported links. Users follow links to access information.
      • Hypertext is text augmented with links.
      • Link: a pointer to another piece of text in the same or a different document.
      • Hypertext systems can be accessed by selecting links or icons and following links from node to node, by searching the database for a keyword in the normal way, or by using a browser to view and navigate the hypertext.
  • 11. Hierarchical Structure
      • Hierarchy is the basis of almost all websites as well as hypertext.
      • Hierarchies are orderly and provide ample navigational freedom.
      • Users start at the home page, descend the branch that most interests them, and continue making further choices as the branch divides.
  • 13. Web-like Structures
      • Relatively unsystematic and difficult to navigate.
      • Mostly used in short stories and fiction, where artistic considerations may override the desire for efficient navigation.
  • 15. Multipath Structures
      • Largely linear and to some extent hierarchical, but they offer alternative pathways, hence the name multipath structures.
  • 16. Hypertext Components
      • The hypertext model has three layers:
      • The run-time layer, which controls the user interface.
      • The storage layer, which is a database containing a network of nodes connected by links.
      • The within-component layer, which is the content structure inside the node.
  • 17. Hypermedia (Hypermedia = Hypertext + Multimedia)
      • Hypermedia integrates text, images, video, graphics and sound within a web page or node.
      • Hyper-representation of textual and non-textual information in a non-sequential manner.
      • Allows embedding of bitmapped images (GIF, JPEG, PNG).
  • 18. History of Hypertext
      • 1945: Vannevar Bush describes the “memex” (Atlantic Monthly).
      • 1965: Ted Nelson coins the term “hypertext”.
      • 1985: Peter Brown, University of Kent, develops the first commercially available hypertext system, Guide.
      • 1986-1990: More sophisticated hypertext systems are developed.
  • 19. History …
      • 1991: Tim Berners-Lee builds an IP-based distributed hypertext system at CERN and develops UDI/URI, HTTP, and HTML…
      • 1993: Mosaic, the first graphical Web browser, is released.
      • 2002: Work begins on the Semantic Web.
  • 20. Hypertext and Hypermedia in Information Retrieval
      • Browsing: retrieve information by association (follow links and backtrack; maintain history and bookmarks).
      • Searching: retrieve information by content (construct indexes of URLs; search by keyword or description of page).
  • 21. Cont…
      • Hypertext and hypermedia, together with standards such as HTML and XHTML, have revolutionized the creation and delivery of content as well as the access to and processing of information.
      • They allow users to navigate within or across a range of documents from several computer networks.
      • They allow browsers and other software to interpret and process information for different purposes.
      • Search engines use the links among pages to select information resources from the Internet.
      • Google uses link data to rank pages in order of their relevance to the query.
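How link data can feed into ranking may be sketched as follows. The slides do not say which algorithm is meant; this example assumes a simplified PageRank-style iteration, and the function name `link_rank`, the damping value and the iteration count are illustrative assumptions rather than a description of any search engine's actual method.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by incoming links (simplified PageRank-style sketch).

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small base score regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page shares its score equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank
```

For a hypothetical three-page graph such as `{"a": ["c"], "b": ["c"], "c": ["a"]}`, page `c` ends up with the highest score because two pages link to it, while `b`, which nothing links to, scores lowest.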
  • 22. Underlying Principles and Challenges of IR
      • Information retrieval is a complex task.
      • A query-based IR system must be able to accept a query about any topic and find texts that contain the information the query specifies.
      • IR systems are required to operate in real time, which demands that they be fast and efficient.
      • Most searches are conducted on natural language text, which inherently carries ambiguity and imprecision.
      • The following are some of the challenges IR systems face in natural language processing:
  • 23. Synonyms
      • Synonymy occurs when different words or phrases mean essentially the same thing.
      • For example, the words “finance”, “fund” and “support” may be related, depending on the context of the inquiry.
      • Natural language is filled with words and phrases that have similar meanings, and it is often impossible for users to provide all the words that might be relevant to the query.
      • To address this problem, some IR systems use a thesaurus to expand the query to include the synonyms of each query word.
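Thesaurus-based query expansion can be sketched in a few lines. The thesaurus below is a hypothetical, hand-written dictionary built from the slide's own example words; real systems draw on much larger resources, and `expand_query` is an illustrative name.

```python
# Hypothetical mini-thesaurus based on the example words above.
THESAURUS = {
    "finance": ["fund", "support"],
    "fund": ["finance", "support"],
}

def expand_query(terms, thesaurus):
    """Return the query terms plus their synonyms, without duplicates."""
    expanded = []
    for term in terms:
        # Keep the original term first, then any synonyms listed for it.
        for candidate in [term] + thesaurus.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded
```

With this sketch, a query for `["finance", "research"]` also matches documents that only mention "fund" or "support", while "research", which has no thesaurus entry, is passed through unchanged.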
  • 24. Polysemy
      • Polysemy occurs when a single word has more than one meaning. For example, the word “shot” can have the following meanings:
      • A shooting, as in “He shot at a tiger.”
      • An attempt, as in “I took a shot at playing.”
      • A photograph, as in “He took a nice shot.”
  • 25. Phrases in Information Retrieval
      • Expressions consisting of multiple words often have a meaning that is substantially different from the meanings of the individual words.
      • The phrase “Artificial Intelligence” differs from the individual words “Artificial” and “Intelligence”, and “Operating System” differs from “Operating” and “System”.
      • One method for phrase-based indexing is to use proximity measures that specify the acceptable distance between the words, for example with the operators WITH or NEAR.
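A proximity check in the spirit of a NEAR operator might look like the sketch below. The function name `near`, the word-based distance and the default of one position are assumptions for illustration; the exact semantics of WITH and NEAR vary between retrieval systems.

```python
import re

def near(text, first, second, max_distance=1):
    """Return True if the two words occur within max_distance positions."""
    words = re.findall(r"\w+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    # Compare every occurrence of one word with every occurrence of the other.
    return any(
        abs(i - j) <= max_distance
        for i in first_positions
        for j in second_positions
    )
```

Under these assumptions, "Artificial intelligence is a field" matches `near(..., "artificial", "intelligence")`, whereas "Artificial flowers need no intelligence" does not, because the two words are four positions apart.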
  • 26. Object Recognition
      • Certain types of information require special procedures to identify them. For example, dates come in various forms, such as July 3, 2001, 3.7.2001, and 7.3.2001 (American system).
      • Recognized values can then be compared using greater-than and less-than logic (< and >).
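Recognizing the date forms above can be sketched with Python's standard `datetime.strptime`. The helper name `parse_date` and the `american` flag are assumptions; the flag is needed because a string such as 7.3.2001 is ambiguous without knowing which convention the source uses. Month names are matched in the default (English) locale.

```python
from datetime import datetime

def parse_date(text, american=False):
    """Normalize a date written in one of the forms above to a date object."""
    # Numeric dates are day-first unless the caller says they are American.
    numeric_format = "%m.%d.%Y" if american else "%d.%m.%Y"
    for fmt in ("%B %d, %Y", numeric_format):
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"unrecognised date: {text!r}")
```

All three spellings from the slide then normalize to the same value, provided the American form is parsed with `american=True`.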
  • 27. Semantics and Role Relationships
      • Some information can only be identified through semantics.
      • Suppose a user wants to find the names of lecturers teaching courses in the area of “Artificial Intelligence”.
      • First, the system must know which courses are related to Artificial Intelligence. These can be AI, Fuzzy Logic, Genetic Algorithms, Neural Networks and Machine Learning.
      • These courses should then be linked to the lecturers allocated to them.
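The two-step lookup described above (area to courses, then courses to lecturers) can be sketched with two small tables. The course list comes from the slide; the lecturer names and the allocation table are hypothetical placeholders.

```python
# Courses per subject area, taken from the example above.
AREA_COURSES = {
    "Artificial Intelligence": [
        "AI", "Fuzzy Logic", "Genetic Algorithms",
        "Neural Networks", "Machine Learning",
    ],
}

# Hypothetical allocation of lecturers to courses (placeholder names).
COURSE_LECTURERS = {
    "AI": "Lecturer A",
    "Fuzzy Logic": "Lecturer B",
    "Neural Networks": "Lecturer A",
}

def lecturers_for_area(area, area_courses, course_lecturers):
    """Find lecturers by first resolving the area to its related courses."""
    courses = area_courses.get(area, [])
    names = {course_lecturers[c] for c in courses if c in course_lecturers}
    return sorted(names)
```

A plain keyword search for "Artificial Intelligence" would miss a lecturer whose only course is "Fuzzy Logic"; the intermediate table is what supplies the semantic relationship.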
  • 28. Computable Values
      • Determining whether information is relevant sometimes depends on a specific calculation.
      • Suppose a user is interested in newspaper articles about a corporate merger that occurred after January 1995.
      • The IR system must identify the documents using the search logic operators LESS THAN (<) or GREATER THAN (>).
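The merger example amounts to a GREATER THAN comparison on a date field. In this minimal sketch the article records are hypothetical placeholders, and "after January 1995" is interpreted as later than 31 January 1995, which is an assumption about the intended cutoff.

```python
from datetime import date

# Hypothetical article records used only to illustrate the comparison.
ARTICLES = [
    {"title": "Article 1", "published": date(1994, 11, 2)},
    {"title": "Article 2", "published": date(1995, 1, 20)},
    {"title": "Article 3", "published": date(1996, 6, 15)},
]

def published_after(articles, cutoff):
    """Return titles of articles dated strictly later than the cutoff."""
    return [a["title"] for a in articles if a["published"] > cutoff]
```

Calling `published_after(ARTICLES, date(1995, 1, 31))` keeps only the 1996 record; the same pattern with `<` gives the LESS THAN case.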
  • 29. Text Representation Techniques
      • The purpose of an IR system is to search the text database for relevant documents in real time.
      • Consequently, the text database is preprocessed and stored in a structure that supports fast searching.
      • This preprocessed form is called the text representation.
  • 30. Inverted File Approach
      • The inverted file approach is used in text representation. It allows an IR system to quickly determine which documents contain a given set of words, and how often each word appears in each document.
      • In an inverted file system, each database contains two files:
      • Text file: the normal form in which documents appear in a database.
      • Inverted file: contains all index terms drawn automatically from the document records.
      • The inverted file provides indirect file access.
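The inverted file idea can be sketched with a dictionary that maps each word to the documents containing it and the number of occurrences. The function names and the simple word-splitting rule are assumptions; production systems add steps such as stop-word removal and stemming that are not shown.

```python
import re
from collections import defaultdict

def build_inverted_file(documents):
    """Map each word to {document id: number of occurrences}."""
    inverted = defaultdict(dict)
    for doc_id, text in documents.items():
        for word in re.findall(r"\w+", text.lower()):
            inverted[word][doc_id] = inverted[word].get(doc_id, 0) + 1
    return dict(inverted)

def documents_containing(inverted, words):
    """Return ids of documents that contain every one of the given words."""
    postings = [set(inverted.get(word.lower(), {})) for word in words]
    return set.intersection(*postings) if postings else set()
```

Here the `documents` argument plays the role of the text file and the returned dictionary plays the role of the inverted file: a query is answered from the index alone, without scanning the document text.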
  • 31. Using Probability Methods
      • All IR systems draw conclusions about the content of a document by examining its source representation.
      • An IR system must base its conclusions on document features, such as the presence or absence of particular words or phrases.
      • Because these features are only uncertain indicators of content, the IR system must take the uncertainty into account when determining how strongly a document is relevant to a particular request.
  • 32. Relevance Feedback
      • Relevance feedback is a technique used by some IR systems to improve performance on a query by asking the user for feedback about the retrieved texts.
      • Evaluation forms can also be given to users to seek their views on the performance of information retrieval systems.
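One way feedback on retrieved texts can be turned into a better query is sketched below. The slides do not name a method; this example swaps in the classic Rocchio update as an assumption, with term weights held in plain dictionaries and commonly quoted default coefficients (`alpha`, `beta`, `gamma`) that are illustrative rather than prescribed.

```python
def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query towards relevant documents and away from the others.

    query and each document are dictionaries of term -> weight.
    """
    terms = set(query)
    for doc in relevant + non_relevant:
        terms.update(doc)

    def mean_weight(docs, term):
        # Average weight of the term over a group of judged documents.
        return sum(d.get(term, 0.0) for d in docs) / len(docs) if docs else 0.0

    updated = {}
    for term in terms:
        weight = (
            alpha * query.get(term, 0.0)
            + beta * mean_weight(relevant, term)
            - gamma * mean_weight(non_relevant, term)
        )
        if weight > 0:  # terms pushed to zero or below are dropped
            updated[term] = weight
    return updated
```

If a user marks a document about cars as relevant and one about cats as not relevant for the query "jaguar", the updated query gains the term "car" and never picks up "cat".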
  • 33. CAT TWO: Search Engines
      • Compare the assigned search engines and competently comment on the following:
      • Structure of the search engines
      • Indexing techniques used
      • Information resources offered
      • Links between nodes
      • Search facilities and retrieval approaches used
      • Ease of use by novice and experienced users
      • Interface design and display of screen layout
  • 34. Search Engines
      • Group 1: Yahoo and Lycos
      • Group 2: Google and AltaVista
      • Group 3: Ask Jeeves and Excite
      • Group 4: AlltheWeb and HotBot
      • Group 5: WebCrawler and MSN
      • Group 6: Dogpile and eBay