Cross Language Information
Retrieval (CLIR)
INFORMATION SEARCHING AND RETRIEVAL (MLS 712)

PREPARED FOR:
ASSOC. PROF. HAJAH FUZIAH MOHD NADZAR
PREPARED BY:
ASYURA BINTI AMINORDIN (2012482362)
MOHD IQBAL AL-FARABI B YAHYA (2012253658)
DATE: DECEMBER 17, 2012
Introduction
Cross-language information retrieval (CLIR) is a subfield of information retrieval dealing with retrieving information written in a language different from the language of the user's query. For example, a user may pose their query in English but retrieve relevant documents written in French.

http://en.wikipedia.org/wiki/Cross-language_information_retrieval
CLIR Purpose
Researchers in Cross-Language Information Retrieval (CLIR) seek to support the process of finding documents written in one natural language with automated systems that can accept queries expressed in other languages.
English-Chinese Information Retrieval System (ECIRS)
ECIRS is a web-based English-Chinese information retrieval system. It provides a cross-language platform that helps people retrieve Chinese information without entering a Chinese query. The web-based client-server architecture allows more users to access ECIRS over the Internet.
Conts…
ECIRS consists of a client side and a server side. The client side is a web-based user interface. The server side includes bilingual dictionaries, content-based document index files, a Chinese search engine and Chinese document collections.
Conts…
Client side: Allows a user to enter a query in English and send it to the server side. The result contains a list of relevant documents in Chinese.

Server side: An English-Chinese dictionary and a Chinese-English dictionary are used to translate the user's query from English into Chinese keywords in ECIRS.
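
The client/server flow described here can be sketched roughly as follows. This is only an illustration, not the ECIRS implementation: the toy dictionary `EN_ZH`, the sample documents in `DOCS`, and the function names are all assumptions, and the matching is a plain substring test rather than a real Chinese search engine.

```python
from typing import Dict, List

# Hypothetical toy English-to-Chinese dictionary (the real ECIRS
# dictionaries are not shown in the slides).
EN_ZH: Dict[str, List[str]] = {
    "computer": ["计算机", "电脑"],
    "network": ["网络"],
}


def translate_query(query: str, dictionary: Dict[str, List[str]]) -> List[str]:
    """Translate each English query word into all of its Chinese keywords."""
    keywords: List[str] = []
    for word in query.lower().split():
        # Words missing from the dictionary are skipped in this sketch.
        for zh in dictionary.get(word, []):
            if zh not in keywords:
                keywords.append(zh)
    return keywords


def search(keywords: List[str], documents: Dict[str, str]) -> List[str]:
    """Return ids of documents that contain at least one translated keyword."""
    return [doc_id for doc_id, text in documents.items()
            if any(k in text for k in keywords)]


# Invented sample collection, for illustration only.
DOCS = {"d1": "计算机科学导论", "d2": "中国历史", "d3": "电脑网络基础"}
hits = search(translate_query("Computer", EN_ZH), DOCS)
print(hits)
```

Keeping every dictionary translation of a word, as done here, trades precision for recall; choosing among candidate translations is a separate problem that the sketch does not address.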
English - Chinese Information retrieval

Screen shot of English Chinese Information retrieval System Layout:
http://www.cs.nmsu.edu/~sliu/main_frame.html
English - Chinese Information retrieval

The side bar of the system lets the user choose any of the buttons provided. For example, the On-line English Chinese Dictionary allows the user to translate an English word into a Chinese word.
English - Chinese Information retrieval

Keyword: computer

In the screenshot above, we enter the keyword we want to search for.
Example: Computer
Screen shot of English Chinese Information retrieval System Layout:
http://www.cs.nmsu.edu/~sliu/main_frame.html
English - Chinese Information retrieval

Translation from English into Chinese

Screen shot of English Chinese Information retrieval System Layout:
http://www.cs.nmsu.edu/~sliu/main_frame.html
English Chinese Information retrieval

On-Line Chinese Information Retrieval System: the database that holds all documents or information related to the information need, which here is “Computer”.
English Chinese Information retrieval

The list of documents related to “computer”. There were 294 results.
Screen shot of English Chinese Information retrieval System Layout:
http://www.cs.nmsu.edu/~sliu/main_frame.html
English Chinese Information retrieval

Screen shot of English Chinese Information retrieval System Layout:
http://www.cs.nmsu.edu/~sliu/main_frame.html
Big 5 - GB
Big 5 is a Chinese character encoding method used in Taiwan, Hong Kong, and Macau for Traditional Chinese characters.
GB (Guojia Biaozhun, 国家标准, "national standard") is the prefix of the official character set standards of the People's Republic of China. GB2312 is the registered internet name for a key official character set used for Simplified Chinese characters.
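
As a small illustration of why the encoding matters for a Chinese retrieval system, the sketch below encodes the same two characters with Python's built-in `big5` and `gb2312` codecs. The sample string and the helper `decode_or_none` are assumptions chosen for the example; the point is only that the two encodings produce different bytes, so text must be decoded with the codec it was written in.

```python
from typing import Optional

text = "中文"  # sample string; both characters exist in Big5 and GB2312

big5_bytes = text.encode("big5")
gb_bytes = text.encode("gb2312")

# Each character takes two bytes in either encoding, but the byte values differ.
print(big5_bytes.hex(), gb_bytes.hex())

# Decoding with the matching codec restores the original text.
assert big5_bytes.decode("big5") == text
assert gb_bytes.decode("gb2312") == text


def decode_or_none(data: bytes, encoding: str) -> Optional[str]:
    """Decode data, returning None when the bytes are invalid for the codec."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Decoding Big5 bytes as GB2312 either fails or yields unrelated characters.
wrong = decode_or_none(big5_bytes, "gb2312")
```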
Cross Language Information Retrieval

Layout of a website that people use to book hotels and flights for travel.
Conts…

Users can choose
any language.
Example: Japanese
Conts…

As we can see, the wording in the layout changes into Japanese.
Conts…

Google Translate allows users to identify the meaning of the Japanese words.
EXAMPLE: MALAY-to-JAPANESE
Conts…

Insert the translated word from Google Translate into the search engine of www.easytobook.com
Conts…

Click any result.

A list of results shows that 131 hotels are available; the wording shown is still in Japanese.
Conts…

The description of the hotel in Kuala Lumpur is written
in Japanese.
CLIR WEBSITE EXAMPLE
http://www.cs.nmsu.edu/~sliu/main_frame.html
http://www.easytobook.com/
CINDOR (Conceptual Interlingua Document Retrieval)
CINDOR is a cross-language text retrieval system capable of accepting a user's query stated in their native language and then seamlessly searching, retrieving, relevance ranking and displaying documents written in a variety of foreign languages.
CINDOR allows users of the system to state queries in any of the supported languages (currently English, French, Spanish, and Japanese) and search and retrieve documents from any of the supported languages.
It adopts a ‘Conceptual Interlingua’: a unique approach to cross-language information management based on a language-independent conceptual representation.
CINDOR
The ‘conceptual’ resource of the conceptual interlingua: the concept of “elasticity: the tendency of a body to return to its original shape after it has been stretched or compressed”, which has the label 131186, is instantiated in English and French:

131186 spring, give, springiness
131186 élasticité, flexibilité, moëlleux
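
The idea of matching through a language-independent concept label can be sketched as below. The concept id 131186 and its English and French terms are taken from the slide; the lookup table layout, the function names, and the whitespace tokenisation are assumptions made for illustration and do not describe how CINDOR itself is built.

```python
from typing import Dict, Set

# Concept id and terms come from the slide; the structure is assumed.
CONCEPT_TERMS: Dict[int, Dict[str, Set[str]]] = {
    131186: {
        "en": {"spring", "give", "springiness"},
        "fr": {"élasticité", "flexibilité", "moëlleux"},
    },
}


def concepts_for(text: str, lang: str) -> Set[int]:
    """Map the words of a text to the ids of the concepts they instantiate."""
    tokens = set(text.lower().split())
    return {cid for cid, by_lang in CONCEPT_TERMS.items()
            if tokens & by_lang.get(lang, set())}


def matches(query: str, query_lang: str, doc: str, doc_lang: str) -> bool:
    """A query and a document match when they share at least one concept id."""
    return bool(concepts_for(query, query_lang) & concepts_for(doc, doc_lang))


print(matches("springiness of rubber", "en", "la flexibilité du caoutchouc", "fr"))
```

Words such as "spring" and "give" have several senses, so a real system would need some form of sense disambiguation before assigning a concept label; the sketch ignores that step.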
The Eurovision St Andrews Photographic Collection
The site presents the collection in a variety of ways: full-text search, or browsing a list of 999 pre-defined index terms organised alphabetically and hierarchically via a categories page.
The St Andrews collection (SAC) consists of 28,133 thumbnail images (around 120x76 pixels), larger versions of these images (around 368x234 pixels), and associated captions, giving a total of 84,399 files in the main body of the collection.
Eurovision
Photograph metadata:
(1) a unique record number,
(2) a short title,
(3) a full title,
(4) a textual description of the image content,
(5) the date when the photograph was taken (most frequently with the day, month and year),
(6) the originator, i.e. the name of an individual or company to which the photograph is attributed,
(7) the location of the photograph (e.g. the county and the country), and
(8) a line for notes to offer additional information about the photograph.
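
The eight metadata fields listed above could be held in a simple record type such as the following. The class name, field names, and the `searchable_text` helper are assumptions for illustration; the sample values in the usage lines are invented and do not come from the collection.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PhotoRecord:
    """One caption record; field order follows items (1) to (8) above."""
    record_number: str
    short_title: str
    full_title: str
    description: str
    date: Optional[str] = None        # most often day, month and year
    originator: Optional[str] = None  # individual or company
    location: Optional[str] = None    # e.g. county and country
    notes: str = ""

    def searchable_text(self) -> str:
        """Concatenate the free-text fields that a text search could index."""
        parts = [self.short_title, self.full_title, self.description,
                 self.location or "", self.notes]
        return " ".join(p for p in parts if p)


# Invented example values, for illustration only.
record = PhotoRecord(
    record_number="X-0001",
    short_title="Harbour view",
    full_title="View of the harbour",
    description="Boats at low tide",
    location="Fife, Scotland",
)
print(record.searchable_text())
```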
Eurovision
The St Andrews collection has been used for bilingual ad-hoc retrieval, where queries typical of this kind of historic collection have been generated in English and translated into languages including a range of Indo-European, Asian and Romance languages.
Challenges include:
• Captions that are short in length, increasing the likelihood of vocabulary mismatch.
• Captions with text not directly associated with the visual content of an image (e.g. expressing something in the background).
• The use of colloquial and domain-specific language in the caption (i.e. British English).
The web interface to the St Andrews collection
CLIR University of Indonesia
Query expansion technique: pseudo relevance feedback
• The assumption is that the top few documents initially retrieved are indeed relevant to the query, and so they must contain other terms that are also relevant to the query.
• To choose the relevant terms from the top-ranked documents, we used the tf*idf term weighting formula.
• We added a certain number of noun terms that have the highest weight scores.
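
The pseudo relevance feedback steps above can be sketched as follows. The slides do not give the exact tf*idf variant, so the weight used here (term frequency in the top documents times log(N/df)) is an assumption, as are the function name, the parameters `top_k` and `n_terms`, and the optional `nouns` set that stands in for a part-of-speech tagger. The toy collection is invented.

```python
import math
from collections import Counter
from typing import List, Optional, Set


def expand_query(query_terms: List[str],
                 ranked_docs: List[List[str]],
                 collection: List[List[str]],
                 top_k: int = 2,
                 n_terms: int = 2,
                 nouns: Optional[Set[str]] = None) -> List[str]:
    """Add the n_terms best-weighted terms from the top_k ranked documents."""
    n_docs = len(collection)
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for doc in collection:
        df.update(set(doc))
    # Term frequency, counted only in the pseudo-relevant top documents.
    tf = Counter()
    for doc in ranked_docs[:top_k]:
        tf.update(doc)
    scores = {}
    for term, freq in tf.items():
        if term in query_terms or df[term] == 0:
            continue
        if nouns is not None and term not in nouns:
            continue  # keep noun terms only when a noun list is supplied
        scores[term] = freq * math.log(n_docs / df[term])
    # Highest weight first; ties are broken alphabetically.
    best = sorted(scores, key=lambda t: (-scores[t], t))[:n_terms]
    return list(query_terms) + best


# Invented toy collection of already tokenised documents.
collection = [
    ["hotel", "jakarta", "booking", "hotel"],
    ["hotel", "jakarta", "price"],
    ["football", "match", "price"],
    ["weather", "jakarta", "rain"],
]
# Assume an initial search for "hotel" ranked the first two documents on top.
expanded = expand_query(["hotel"], collection[:2], collection)
print(expanded)  # ['hotel', 'booking', 'price']
```

In this toy run "jakarta" occurs twice in the top documents but also in three of the four documents overall, so its idf is low and the rarer terms "booking" and "price" are added instead.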
Interface and program demo
Interface and program demo
INFOMAP
• Chinese question classification is the process that analyzes a question and labels it based on its question type and expected answer type.
• The INFOMAP inference engine is adopted to support the knowledge-based approach, in which Chinese questions are formulated as templates, and SVM (Support Vector Machines) is used as the machine learning approach for large collections of labeled Chinese questions.
• INFOMAP is a knowledge representation framework that extracts important concepts from a natural language text.
• A feature of INFOMAP is its capability to represent and match complicated template structures, such as hierarchical matching, regular expressions, semantic template matching, frame (non-linear relations) matching, and graph matching.
• Using INFOMAP, we can identify the question category from a Chinese question.
Example
Question: (In which city were the Olympics held in 2004?)

In INFOMAP, the question can be formulated as a rule or template with four elements (denoted as "HAS-PART"):

"[5 Time]:[3 Organization]:[7 Q_Location]: ([9 LocationRelatedEvent])"
2004
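
To make the four-slot template more concrete, the sketch below fills the same slots from the English gloss of the question using regular expressions. It is not the INFOMAP engine and does not process Chinese: the slot names come from the slide, while the patterns, the function names, and the category label `Q_LOCATION` are assumptions, and slot order is ignored.

```python
import re
from typing import Dict, Optional

# Slot names are from the slide's template; the patterns are invented
# for the English gloss of the example question.
SLOT_PATTERNS: Dict[str, str] = {
    "Time": r"\b(?:19|20)\d{2}\b",
    "Organization": r"\bOlympics\b",
    "Q_Location": r"\b(?:which|what) city\b",
    "LocationRelatedEvent": r"\bheld\b",
}


def match_template(question: str) -> Optional[Dict[str, str]]:
    """Return the filled slots, or None if any of the four slots is missing."""
    filled: Dict[str, str] = {}
    for slot, pattern in SLOT_PATTERNS.items():
        found = re.search(pattern, question, flags=re.IGNORECASE)
        if found is None:
            return None
        filled[slot] = found.group(0)
    return filled


def classify(question: str) -> str:
    """Label a question with a hypothetical category when the template fits."""
    return "Q_LOCATION" if match_template(question) else "UNKNOWN"


slots = match_template("In which city were the Olympics held in 2004?")
print(slots)
```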
Searching Demo
Searching demo
Searching demo
Thank You

Cross language information retrieval (clir)slide

  • 1. Cross Language Information Retrieval (CLIR) INFORMATION SEARCHING AND RETRIEVAL (MLS 712) PREPARED FOR: ASSOC. PROF. HAJAH FUZIAH MOHD NADZAR PREPARED BY: ASYURA BINTI AMINORDIN (2012482362) MOHD IQBAL AL-FARABI B YAHYA (2012253658) DATE: DECEMBER 17, 2012
  • 2. Introduction Cross-language information retrieval (CLIR) is a subfield of information retrieval dealing with retrieving information written in a language different from the language of the user's query. For example, a user may pose their query in English but retrieve relevant documents written in French. http://en.wikipedia.org/wiki/Cross-language_information_retrieval
  • 3. CLIR Purpose Researchers in Cross-Language Information Retrieval (CLIR) seek to support the process of finding documents written in one natural language with automated systems that can accept queries expressed in other languages.
  • 4. English-Chinese Information Retrieval System (ECIRS) ECIRS is a web-based English-Chinese information retrieval system. It provides a cross-language platform that helps people retrieve Chinese information without entering a Chinese query. The web-based client-server architecture makes ECIRS accessible to more users over the Internet.
  • 5. Cont'd: ECIRS consists of a client side and a server side. The client side is a web-based user interface. The server side includes bilingual dictionaries, content-based document index files, a Chinese search engine and Chinese document collections.
  • 6. Cont'd: Client side: allows a user to enter a query in English and send it to the server side; the result is a list of relevant documents in Chinese. Server side: an English-Chinese dictionary and a Chinese-English dictionary are used to translate the user's query from English into Chinese keywords.
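The dictionary-based query translation described for the ECIRS server side can be sketched roughly as follows. This is only an illustration: the dictionary entries, the name `EN_ZH` and the function `translate_query` are assumptions made here, not part of ECIRS itself.

```python
# Hypothetical toy English-to-Chinese dictionary; the entries are assumptions
# for illustration, not the actual ECIRS dictionary.
EN_ZH = {
    "computer": ["计算机", "电脑"],
    "network": ["网络"],
}


def translate_query(query, dictionary=EN_ZH):
    """Replace each English query word with its Chinese dictionary translations."""
    keywords = []
    for word in query.lower().split():
        # Words missing from the dictionary are simply dropped in this sketch.
        keywords.extend(dictionary.get(word, []))
    return keywords


print(translate_query("Computer network"))
```

The translated keywords would then be passed to the Chinese search engine; keeping every translation of an ambiguous word is one simple choice, and a real system might rank or filter them.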
  • 7. English-Chinese Information Retrieval. Screenshot of the English-Chinese Information Retrieval System layout: http://www.cs.nmsu.edu/~sliu/main_frame.html
  • 8. English-Chinese Information Retrieval. Sidebar of the system, where the user can choose any of the buttons provided. Example: the On-line English-Chinese Dictionary allows the user to translate an English word into Chinese.
  • 9. English-Chinese Information Retrieval. Keyword: computer. In the screen shown, we enter any keyword we want to search for, for example "computer". Screenshot of the English-Chinese Information Retrieval System layout: http://www.cs.nmsu.edu/~sliu/main_frame.html
  • 10. English-Chinese Information Retrieval. Translation from English into Chinese. Screenshot of the English-Chinese Information Retrieval System layout: http://www.cs.nmsu.edu/~sliu/main_frame.html
  • 11. English-Chinese Information Retrieval. On-Line Chinese Information Retrieval System: the database holding all documents and information related to the information need, which here is “Computer”.
  • 12. English-Chinese Information Retrieval. The list of documents related to "computer". There were 294 results. Screenshot of the English-Chinese Information Retrieval System layout: http://www.cs.nmsu.edu/~sliu/main_frame.html
  • 13. English-Chinese Information Retrieval. Screenshot of the English-Chinese Information Retrieval System layout: http://www.cs.nmsu.edu/~sliu/main_frame.html
  • 14. Big5 and GB. Big5 is a Chinese character encoding method used in Taiwan, Hong Kong, and Macau for Traditional Chinese characters. GB (Guojia Biaozhun, 国家标准) is the registered Internet name for a key official character set of the People's Republic of China, used for Simplified Chinese characters.
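As a small illustration of the two encodings, the sketch below encodes the phrase from the slide in its Traditional form with Python's built-in `big5` codec and in its Simplified form with the `gb2312` codec. The helper name `roundtrip` is an assumption introduced here for the example.

```python
traditional = "國家標準"  # Traditional Chinese form of "national standard"
simplified = "国家标准"   # Simplified Chinese form, as written on the slide

# Both encodings store each of these characters in two bytes,
# but the byte values differ between the two schemes.
big5_bytes = traditional.encode("big5")
gb_bytes = simplified.encode("gb2312")


def roundtrip(text, encoding):
    """Encode text and decode it again with the same codec."""
    return text.encode(encoding).decode(encoding)


print(len(big5_bytes), len(gb_bytes))
print(roundtrip(traditional, "big5"), roundtrip(simplified, "gb2312"))
```

Decoding bytes with the wrong codec yields unreadable text, which is why a retrieval system has to know which encoding each Chinese document uses.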
  • 15. Cross Language Information Retrieval. Layout of the website that people use to book hotels and flights for travel.
  • 16. Cont'd: Users can choose any language. Example: Japanese.
  • 17. Cont'd: Change to Japanese. As we can see, the wording in the layout changes to Japanese.
  • 18. Cont'd: Google Translate allows users to identify the meaning of the Japanese words. Example: Malay to Japanese.
  • 19. Cont'd: Enter the translated word from Google Translate into the search engine of www.easytobook.com.
  • 20. Cont'd: Click any result. The result list shows 131 available hotels, and the wording is still displayed in Japanese.
  • 21. Cont'd: The description of the hotel in Kuala Lumpur is written in Japanese.
  • 23. CINDOR (Conceptual Interlingua Document Retrieval). A cross-language text retrieval system capable of accepting a user's query stated in their native language and then seamlessly searching, retrieving, relevance ranking and displaying documents written in a variety of foreign languages. CINDOR allows users of the system to state queries in any of the supported languages (currently English, French, Spanish, and Japanese) and to search and retrieve documents in any of the supported languages. It adopts a ‘Conceptual Interlingua’: a unique approach to cross-language information management based on a language-independent conceptual representation.
  • 24. CINDOR. The ‘conceptual’ resource of our conceptual interlingua: the concept of “elasticity: the tendency of a body to return to its original shape after it has been stretched or compressed”, which has the label 131186, is instantiated in English and French. English: 131186 spring, give, springiness. French: 131186 élasticité, flexibilité, moëlleux.
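The idea of a language-independent concept label can be sketched as a lookup table keyed by concept ID. The table below contains only the single concept quoted on the slide; the structure `INTERLINGUA` and the functions `concepts_for` and `cross_language_terms` are assumptions for illustration, not CINDOR's actual data model.

```python
# Concept ID -> terms per language, using the one example given on the slide.
INTERLINGUA = {
    131186: {
        "en": ["spring", "give", "springiness"],
        "fr": ["élasticité", "flexibilité", "moëlleux"],
    },
}


def concepts_for(term, lang):
    """Return the IDs of all concepts that list this term in the given language."""
    return [cid for cid, labels in INTERLINGUA.items() if term in labels.get(lang, [])]


def cross_language_terms(term, source_lang, target_lang):
    """Map a term to its concepts, then to the target-language terms of those concepts."""
    terms = []
    for cid in concepts_for(term, source_lang):
        terms.extend(INTERLINGUA[cid].get(target_lang, []))
    return terms


print(cross_language_terms("spring", "en", "fr"))
```

Because queries and documents are both mapped to concept IDs, matching can happen at the concept level without translating directly between each pair of languages.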
  • 25. The Eurovision St Andrews Photographic Collection. The site presents the collection in a variety of ways: full-text search, or browsing a list of 999 pre-defined index terms organised alphabetically and hierarchically via a categories page. The St Andrews collection (SAC) consists of 28,133 thumbnail images (around 120x76 pixels), larger versions of these images (around 368x234 pixels), and associated captions, giving a total of 84,399 files in the main body of the collection.
  • 26. Eurovision. Photograph metadata: (1) a unique record number, (2) a short title, (3) a full title, (4) a textual description of the image content, (5) the date when the photograph was taken (most frequently with the day, month and year), (6) the originator, i.e. the name of an individual or company to which the photograph is attributed, (7) the location of the photograph (e.g. the county and the country), and (8) a line for notes to offer additional information about the photograph.
  • 27. Eurovision. The St Andrews collection has been used for bilingual ad hoc retrieval, where queries typical of this kind of historic collection were generated in English and translated into other languages, including a range of Indo-European, Asian and Romance languages. Challenges include: captions that are short, increasing the likelihood of vocabulary mismatch; captions with text not directly associated with the visual content of an image (e.g. expressing something in the background); and the use of colloquial and domain-specific language in the captions (i.e. British English).
  • 28. The web interface to the St Andrews collection
  • 29. The web interface to the St Andrews collection
  • 30. CLIR, University of Indonesia. Query expansion technique: pseudo relevance feedback. It rests on the assumption that the top few documents initially retrieved are indeed relevant to the query, and so they must contain other terms that are also relevant to the query. To choose the relevant terms from the top-ranked documents, we used the tf*idf term weighting formula. We added a certain number of noun terms that have the highest weight scores.
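A minimal sketch of pseudo relevance feedback with tf*idf weighting is given below. It assumes documents are already tokenised into word lists and that the top-ranked documents belong to the collection. It does not filter for nouns as the slide describes, and the function name `expand_query`, its parameters and the sample documents are all hypothetical.

```python
import math
from collections import Counter


def expand_query(query_terms, ranked_docs, collection, top_docs=2, num_terms=2):
    """Add the highest tf*idf terms from the top-ranked documents to the query."""
    n_docs = len(collection)
    # Document frequency of each term over the whole collection.
    df = Counter(term for doc in collection for term in set(doc))
    # Term frequency summed over the top documents, which are assumed relevant.
    tf = Counter(term for doc in ranked_docs[:top_docs] for term in doc)
    scores = {
        term: count * math.log(n_docs / df[term])
        for term, count in tf.items()
        if term not in query_terms
    }
    # Highest score first; ties are broken alphabetically.
    best = sorted(scores, key=lambda t: (-scores[t], t))[:num_terms]
    return list(query_terms) + best


# Hypothetical tokenised documents, listed in retrieval order.
docs = [
    ["hotel", "booking", "booking", "kuala"],
    ["hotel", "booking", "flight"],
    ["weather", "report"],
    ["flight", "delay"],
    ["hotel", "review"],
]
print(expand_query(["hotel"], docs[:2], docs))
```

The expanded query is then run again; how many documents to trust and how many terms to add are tuning choices.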
  • 33. INFOMAP. Chinese question classification is the process that analyzes a question and labels it based on its question type and expected answer type. The approach adopts the INFOMAP inference engine to support the knowledge-based approach for Chinese questions, which can be formulated as templates, and uses SVM (Support Vector Machines) as the machine learning approach for large collections of labeled Chinese questions. INFOMAP is a knowledge representation framework that extracts important concepts from a natural language text. A key feature of INFOMAP is its capability to represent and match complicated template structures, such as hierarchical matching, regular expressions, semantic template matching, frame (non-linear relations) matching, and graph matching. Using INFOMAP, we can identify the question category from a Chinese question.
  • 34. Example Question  (In which city were the Olympics held in 2004?) INFOMAP can be formulated as a rule or template (four elements (denoted as "HAS-PART") in this rule)   "[5 Time]:[3 Organization]:[7 Q_Location]: ([9 LocationRelatedEvent])“ 2004