Cross-Language Information Retrieval
University of Arizona

Sumin Byeon

1
Overview
Query (Korean): 안드로이드 이메일 암호화
-> Matching algorithm (backed by the bilingual corpus database)
-> Translated query: Android email encryption
-> Google Search
-> Results in English

2
Background
• Corpus - a collection of written text: single words, multiple words, or even phrases and sentences
• Comparable corpus - a collection of text from pairs of languages referring to the same domain [1]; here, a (source text, target text) pair
• N-gram - an n-character or n-word slice of a longer string [2]. We use the term n-gram for n-character slices, and we use 4-grams (four-grams, or quad-grams)
• Source language - the language of the original phrases
• Target language - the language into which CLIR translates the original phrases

[1]: Picchi, Eugenio, and Carol Peters. Cross-Language Information Retrieval: A System for Comparable Corpus Querying. Vol. 2. Springer US, 1998. 1387-5264.
[2]: Cavnar, William B., and John M. Trenkle. "N-Gram-Based Text Categorization." 1994.
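The character n-gram definition above can be illustrated with a minimal sketch. The function name `char_ngrams` is hypothetical and not taken from the slides. Later slides write a space inside a quad-gram as "_"; this sketch leaves spaces as they are.

```python
def char_ngrams(text: str, n: int = 4) -> list[str]:
    """Return every n-character slice of text, ordered by start position."""
    if n <= 0:
        raise ValueError("n must be positive")
    # A string shorter than n characters yields no slices at all.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


if __name__ == "__main__":
    for position, gram in enumerate(char_ngrams("global variable")):
        print(position, repr(gram))
```

With n = 4, a string of length m produces m - 3 overlapping slices, so "global variable" (15 characters) gives 12 quad-grams.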

3
Motivation
• People want to acquire information even when it is not sufficiently available in their native language
• A survey has shown that people have a higher foreign-language proficiency level in reading than in writing
• CLIR may bridge the gap between the desire to obtain information and the unavailability or under-availability of that information in the user's native language

4
Goals
• Allow users to query for domain-specific (i.e., computer science and software engineering) information in their native language
• Present relevant search results in the target language, that is, the language in which the largest amount of information is available

5
Components
• Domain-specific bilingual corpus extraction from multiple sources
• Corpus indexing
• Querying and string matching

6
Corpus Extraction

7
Corpus Indexing
(S, T) -> (i1, h1), (i2, h2), …, (in, hn)

• Quad-grams (k=4)
• Fingerprint overlapping is okay, although it is not the most space-efficient way

Example entries, written as position: quad-gram (hash), with "_" standing for a space:

Java - 자바
  0: Java (20451)
global variable - 전역 변수
  3: bal_ (14870)
  8: aria (14269)
example - 예제
  1: xamp (20451)

[Figure: frequency histogram; y-axis "Frequency" from 0 to 50000, x-axis from 1 to 103]
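The mapping from a corpus pair to a list of (position, hash) fingerprints might be sketched as follows. The slides do not say which hash function produces values such as 20451, so this sketch substitutes CRC-32 reduced to 16 bits; that choice, along with the names `fingerprints` and `build_index`, is an assumption. The hash values it prints will therefore differ from those on the slide.

```python
import zlib


def fingerprints(text: str, k: int = 4) -> list[tuple[int, int]]:
    """Return (position, hash) for every overlapping k-character slice."""
    result = []
    for i in range(len(text) - k + 1):
        gram = text[i:i + k]
        # Stand-in hash (assumption): CRC-32 of the UTF-8 bytes, kept to 16 bits.
        h = zlib.crc32(gram.encode("utf-8")) & 0xFFFF
        result.append((i, h))
    return result


def build_index(pairs):
    """Map each hash to the (position, english, korean) entries containing it."""
    index = {}
    for english, korean in pairs:
        for position, h in fingerprints(english):
            index.setdefault(h, []).append((position, english, korean))
    return index


if __name__ == "__main__":
    pairs = [("Java", "자바"), ("global variable", "전역 변수"), ("example", "예제")]
    for h, entries in sorted(build_index(pairs).items()):
        print(h, entries)
```

Storing every overlapping quad-gram is the simple but space-hungry option noted in the bullet above; keeping only a subset of fingerprints per entry would shrink the index at the cost of a more involved lookup.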

8
Querying & Matching
Query: Java global variable example

Query fingerprints      Matched corpus entry
0: Java (20451)         Java - 자바, 0: Java (20451)
1: ava_ (24085)
…
8: bal_ (14870)         global variable - 전역 변수, 3: bal_ (14870)
…
13: aria (14269)        global variable - 전역 변수, 8: aria (14269)
…
22: xamp (20451)        example - 예제, 1: xamp (20451)
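One way to read the matching step: a corpus entry matches the query at some offset when each of the entry's fingerprints reappears in the query, shifted by that offset. The sketch below follows that reading. It is not the author's implementation; the CRC-32 stand-in hash and the names `fingerprints` and `find_matches` are assumptions, and it ignores word boundaries.

```python
import zlib


def fingerprints(text, k=4):
    """(position, hash) for each k-character slice; CRC-32 is a stand-in hash."""
    return [(i, zlib.crc32(text[i:i + k].encode("utf-8")) & 0xFFFF)
            for i in range(len(text) - k + 1)]


def find_matches(query, corpus, k=4):
    """Return sorted (offset, english, korean) for entries found in query."""
    query_fp = dict(fingerprints(query, k))  # position -> hash
    matches = []
    for english, korean in corpus:
        entry_fp = fingerprints(english, k)
        if not entry_fp:
            continue  # entry shorter than k characters has no fingerprint
        for offset in range(len(query) - len(english) + 1):
            aligned = all(query_fp.get(offset + p) == h for p, h in entry_fp)
            # Different slices can share a hash, so confirm with a direct comparison.
            if aligned and query[offset:offset + len(english)] == english:
                matches.append((offset, english, korean))
    return sorted(matches)


if __name__ == "__main__":
    corpus = [("Java", "자바"), ("global variable", "전역 변수"), ("example", "예제")]
    for match in find_matches("Java global variable example", corpus):
        print(match)
```

For the query on this slide, the entries are found at offsets 0, 5, and 21, which agrees with the fingerprint positions shown: for example, "bal_" sits at position 3 in the entry and at 5 + 3 = 8 in the query.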

9
Multiple Candidates
• Longest match first
• Confidence: how many times does this comparable corpus pair appear in a set of documents?
• Outcome of matching depends on the domain of the documents stored in the database

Candidates for "global variable":

global variable - 전역 변수
  3: bal_ (14870)
  8: aria (14269)
global - 세계적인 ("worldwide")
  0: loba (25848)
variable - 변수 ("variable", the noun)
  1: aria (14269)
variable - 가변적인 ("changeable")
  1: aria (14269)
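The two ranking rules above could be combined as in this sketch. The slides do not define how confidence is counted, so the sketch assumes it is the number of comparable document pairs in which both phrases occur. The sample documents and the names `confidence` and `rank_candidates` are made up for illustration.

```python
def confidence(source, target, documents):
    """Count comparable document pairs that contain both phrases (assumed definition)."""
    return sum(1 for src_doc, tgt_doc in documents
               if source in src_doc and target in tgt_doc)


def rank_candidates(candidates, documents):
    """Order candidates by longest source phrase first, then by higher confidence."""
    return sorted(
        candidates,
        key=lambda c: (len(c[0]), confidence(c[0], c[1], documents)),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical comparable documents: (English text, Korean text).
    documents = [
        ("a global variable is shared", "전역 변수는 공유된다"),
        ("declare a variable", "변수를 선언한다"),
        ("variable names", "변수 이름"),
    ]
    candidates = [
        ("global", "세계적인"),
        ("variable", "변수"),
        ("variable", "가변적인"),
        ("global variable", "전역 변수"),
    ]
    for candidate in rank_candidates(candidates, documents):
        print(candidate, confidence(candidate[0], candidate[1], documents))
```

Because the sample documents come from a programming domain, "변수" outranks "가변적인" for "variable", which is the domain dependence the third bullet describes.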
10
Indexing and Querying Recap

Query: 자바 전역 변수 예제

자바: Java
전역: transfer
전역: all parts (of)
전역 변수: global variable
변수: variable
예제: example

Translated query: Java global variable example
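The recap can be traced with a greedy longest-match pass over the query words. This is a minimal sketch: the dictionary holds only the entries listed on this slide, the function name `translate_query` and the three-word phrase limit are assumptions, and picking the first listed candidate stands in for whatever candidate selection the real system performs.

```python
def translate_query(query, dictionary, max_words=3):
    """Translate a query by trying the longest dictionary phrase at each position."""
    words = query.split()
    output = []
    i = 0
    while i < len(words):
        for span in range(min(max_words, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + span])
            if phrase in dictionary:
                output.append(dictionary[phrase][0])  # first listed candidate
                i += span
                break
        else:
            # No entry covers this word: keep it untranslated.
            output.append(words[i])
            i += 1
    return " ".join(output)


if __name__ == "__main__":
    dictionary = {
        "자바": ["Java"],
        "전역": ["transfer", "all parts (of)"],
        "전역 변수": ["global variable"],
        "변수": ["variable"],
        "예제": ["example"],
    }
    print(translate_query("자바 전역 변수 예제", dictionary))
```

Trying "전역 변수" before "전역" is what keeps the sketch from producing "transfer variable"; that is the point of matching the longest phrase first.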

11
Relationship with Content Addressability

자바 전역 변수 예제
자바 -> Java
전역 변수 -> global variable
예제 -> example

Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Quisque id Java tristique nunc. Vestibulum sit amet tortor
ullamcorper, pretium augue ac, facilisis quam. Ut convallis
suscipit mauris, at porta erat vulputate in. Nulla vitae
consectetur risus. global variable Aenean justo risus, mollis
sed condimentum sed, sagittis eget nisl. Phasellus sem leo,
commodo at dignissim vitae, ullamcorper nec metus. Proin
pretium porta lectus nec example pulvinar. Nulla non
elementum nisi, vel hendrerit quam. Curabitur bibendum
lobortis tincidunt. Proin vel velit porta, tempus ligula a,
interdum leo. Aenean lorem nibh, facilisis ut porta sit amet,
ornare quis ligula.

12
Evaluation
• Matching
  • Did it translate all the search terms to the target language properly?
  • Did it preserve domain-specific information?
• Searching
  • Hit ratio: # of relevant web pages / # of results on the first page
  • Total number of search results
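The hit ratio defined above is a simple fraction. A small sketch, with a hypothetical function name and input checks that are not part of the slides:

```python
def hit_ratio(relevant: int, shown: int) -> float:
    """Relevant web pages divided by the results shown on the first page."""
    if shown <= 0:
        raise ValueError("the first page must show at least one result")
    if not 0 <= relevant <= shown:
        raise ValueError("relevant must be between 0 and shown")
    return relevant / shown


if __name__ == "__main__":
    # Six relevant pages among ten first-page results.
    print(hit_ratio(6, 10))  # 0.6
```

The denominator is the number of results actually shown, which can be fewer than ten when a query returns only a handful of pages.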
13
Evaluation
• 재귀 열거 집합 - recursively enumerable sets
  • (3/3, 1/1)
• 배낭 문제 시간 복잡도 - 배낭 issue the time complexity
  • (3/4, 1/2)
• 가상화를 통한 데이터센터 에너지 효율 극대화 - through virtualization datacenter energy efficiency maximization
  • (7/7, 4/4)
14
Evaluation
• Query in source language “재귀 열거 집합”
  • (6/10, 15,300)
• Query in target language “recursively enumerable sets”
  • (10/10, 105,000)
• Google Translate result “Set of recursive enumeration”
  • (10/10, 1,990,000)
15
Evaluation
• Query in source language “배낭 문제 시간 복잡도”
  • (10/10, 31,200)
• Query in target language “배낭 issue time complexity”
  • (2/6, 2,270)
• Google Translate result “Knapsack problem, the time complexity”
  • (10/10, 206,000)
16
Evaluation
• Query in source language “가상화를 통한 데이터센터 에너지 효율 극대화”
  • (5/10, 36,100)
• Query in target language “through virtualization datacenter energy efficiency maximization”
  • (8/10, 264,000)
• Google Translate result “Maximize energy efficiency through data center virtualization”
  • (10/10, 284,000)
17
Conclusion & Future Work
• Preliminary results look satisfactory
• Machine-translation-based CLIR appears to be more useful in many cases
• Evaluation factors may not reflect the actual quality of the system
• Labor-intensive evaluation process - need for an automated evaluation
• Fuzzy matching based on lexical information (e.g., call, calls)
• Fuzzy matching based on semantic information (e.g., maximize, maximizing, maximization, maximum)
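Lexical fuzzy matching is listed here as future work, so the slides give no method for it. As one possible starting point, the sketch below compares surface forms with `difflib.SequenceMatcher` from the Python standard library. The 0.8 threshold and the function names are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def lexical_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] based on matching character blocks, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_fuzzy_match(a: str, b: str, threshold: float = 0.8) -> bool:
    """Treat two terms as the same entry when their surface forms are close enough."""
    return lexical_similarity(a, b) >= threshold


if __name__ == "__main__":
    for left, right in [("call", "calls"), ("call", "example")]:
        print(left, right, round(lexical_similarity(left, right), 3))
```

A surface-form measure like this only covers the lexical case. Grouping maximize, maximizing, maximization, and maximum would need additional information such as stemming or a thesaurus.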
18

The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 

Cross-Language Information Retrieval

  • 3. Background
      • Corpus - a collection of written text, which may be a single word, multiple words, or even phrases and sentences
      • Comparable corpus - a collection of text from pairs of languages referring to the same domain [1]; stored as (source text, target text) pairs
      • N-gram - an n-character or n-word slice of a longer string [2]. Here, n-gram refers to n-character slices, and we use 4-grams (four-grams or quad-grams)
      • Source language - the language of the original phrases
      • Target language - the language into which CLIR translates the original phrases
      [1] Picchi, Eugenio, and Carol Peters. Cross-Language Information Retrieval: A System for Comparable Corpus Querying. Vol. 2. N.p.: Springer US, 1998. Print. 1387-5264.
      [2] Cavnar, William B., and John M. Trenkle. "N-Gram-Based Text Categorization." (1994) Print.
  • 4. Motivation
      • People want to acquire information even when it is not sufficiently available in their native language
      • A survey has shown that people have a higher foreign-language proficiency level in reading than in writing
      • CLIR may bridge the gap between their desire to obtain information and the unavailability or under-availability of such information in their native language
  • 5. Goals
      • Allow users to query for domain-specific (i.e., computer science and software engineering) information in their native language
      • Present relevant search results in the target language, that is, the language in which the largest amount of information is available
  • 6. Components
      • Domain-specific bilingual corpus extraction from multiple sources
      • Corpus indexing
      • Querying and string matching
  • 8. Corpus Indexing
      • (S, T) -> (i1, h1), (i2, h2), …, (in, hn)
      • Quad-grams (k=4)
      • Fingerprint overlapping is okay, although it is not the most space-efficient way
      • Example entries:
          Java / 자바 - 0: Java (20451)
          global variable / 전역 변수 - 3: bal_ (14870), 8: aria (14269)
          example / 예제 - 1: xamp (20451)
      [Figure: chart with a "Frequency" axis from 0 to 50000 and a horizontal axis from 1 to 103]
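
The indexing step on this slide can be sketched as follows. This is a minimal illustration, not the author's implementation. It assumes each (i, h) pair is a character position and a hash of the quad-gram starting there. The hash function (`zlib.crc32` truncated to 16 bits) and the names `kgrams`, `gram_hash`, and `fingerprints` are hypothetical, so the hash values will not match those shown on the slide.

```python
import zlib

K = 4  # quad-grams (k=4), as stated on the slide


def kgrams(text, k=K):
    """Return (position, k-gram) for every k-character slice of text."""
    return [(i, text[i:i + k]) for i in range(len(text) - k + 1)]


def gram_hash(gram):
    """Assumed hash: CRC-32 truncated to 16 bits (not the slide's hash)."""
    return zlib.crc32(gram.encode("utf-8")) & 0xFFFF


def fingerprints(text, k=K):
    """Map a text to its overlapping fingerprints (i1, h1), ..., (in, hn)."""
    return [(i, gram_hash(gram)) for i, gram in kgrams(text, k)]


if __name__ == "__main__":
    for position, gram in kgrams("global variable"):
        print(position, repr(gram), gram_hash(gram))
```

Every position produces a fingerprint, so consecutive fingerprints overlap by three characters. That matches the slide's remark that overlapping is acceptable but not the most space-efficient choice.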
  • 9. Querying & Matching
      • Query: Java global variable example
      • Query fingerprints: 0: Java (20451), 1: ava_ (24085), …, 8: bal_ (14870), …, 13: aria (14269), …, 22: xamp (20451)
      • Matching corpus entries:
          Java / 자바 - 0: Java (20451)
          global variable / 전역 변수 - 3: bal_ (14870), 8: aria (14269)
          example / 예제 - 1: xamp (20451)
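
One way to read this slide is that a corpus entry matches when all of its fingerprints occur in the query at one consistent offset (for example, bal_ at 8 versus 3 and aria at 13 versus 8 both give an offset of 5). The sketch below illustrates that reading under stated assumptions: the hash function, the inverted-index layout, and the names `build_index` and `match` are hypothetical and do not come from the slides.

```python
import zlib
from collections import defaultdict

K = 4  # quad-grams


def fingerprints(text, k=K):
    """(position, 16-bit hash) for every k-character slice; hash is assumed."""
    return [(i, zlib.crc32(text[i:i + k].encode("utf-8")) & 0xFFFF)
            for i in range(len(text) - k + 1)]


def build_index(pairs):
    """Inverted index: hash -> list of (entry id, position within entry)."""
    index = defaultdict(list)
    for entry_id, (indexed_text, _paired_text) in enumerate(pairs):
        for position, h in fingerprints(indexed_text):
            index[h].append((entry_id, position))
    return index


def match(query, pairs, index):
    """Return (offset, indexed text, paired text) for entries found in query."""
    votes = defaultdict(int)  # (entry id, offset in query) -> agreeing grams
    for query_pos, h in fingerprints(query):
        for entry_id, entry_pos in index.get(h, ()):
            votes[(entry_id, query_pos - entry_pos)] += 1
    hits = []
    for (entry_id, offset), count in votes.items():
        indexed_text, paired_text = pairs[entry_id]
        needed = len(indexed_text) - K + 1  # fingerprints in the entry
        # Re-check the characters so hash collisions cannot cause a false hit.
        if (offset >= 0 and count == needed
                and query[offset:offset + len(indexed_text)] == indexed_text):
            hits.append((offset, indexed_text, paired_text))
    return sorted(hits)


if __name__ == "__main__":
    corpus = [("Java", "자바"), ("global variable", "전역 변수"), ("example", "예제")]
    for hit in match("Java global variable example", corpus, build_index(corpus)):
        print(hit)
```

Entries shorter than four characters produce no fingerprints in this sketch and therefore never match; a real system would need a separate path for them.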
  • 10. Multiple Candidates
      • Longest match first
      • Confidence: how many times does this comparable corpus pair appear in a set of documents?
      • Outcome of matching depends on the domain of the documents stored in the database
      • Example candidates:
          global variable / 전역 변수 - 3: bal_ (14870), 8: aria (14269)
          global / 세계적인 - 0: loba (25848)
          variable / 변수 - 1: aria (14269)
          variable / 가변적인 - 1: aria (14269)
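
The two selection rules on this slide, longest match first and confidence as an occurrence count, can be sketched as below. This is an illustration only: the occurrence counts are invented for the example, and the function name `translate` and the candidate-table layout are hypothetical.

```python
def translate(query, candidates):
    """Greedy longest-match-first translation over whitespace-separated words.

    candidates maps a phrase to a list of (translation, occurrence count).
    Among several translations of one phrase, the highest count wins.
    """
    words = query.split()
    output = []
    i = 0
    while i < len(words):
        # Try the longest phrase starting at word i first, then shorter ones.
        for length in range(len(words) - i, 0, -1):
            phrase = " ".join(words[i:i + length])
            options = candidates.get(phrase)
            if options:
                best_translation, _count = max(options, key=lambda o: o[1])
                output.append(best_translation)
                i += length
                break
        else:
            # No candidate at all: keep the word untranslated.
            output.append(words[i])
            i += 1
    return output


if __name__ == "__main__":
    # Occurrence counts below are made up for illustration.
    table = {
        "global variable": [("전역 변수", 12)],
        "global": [("세계적인", 7)],
        "variable": [("변수", 9), ("가변적인", 2)],
    }
    print(translate("global variable", table))
    print(translate("local variable", table))
```

Because the counts come from the documents stored in the database, a different domain would rank the candidates differently, which is the point of the slide's last bullet.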
  • 11. Indexing and Querying Recap
      • Query: 자바 전역 변수 예제
      • Corpus entries:
          자바 : Java
          전역 : transfer
          전역 : all parts (of)
          전역 변수 : global variable
          변수 : variable
          예제 : example
      • Result: Java global variable example
  • 12. Relationship with Content Addressability
      • Query: 자바 전역 변수 예제
      • Translated terms: 자바 / Java, 전역 변수 / global variable, 예제 / example
      • Sample document text containing the translated terms:
          Lorem ipsum dolor sit amet, consectetur adipiscing elit. Quisque id Java tristique nunc. Vestibulum sit amet tortor ullamcorper, pretium augue ac, facilisis quam. Ut convallis suscipit mauris, at porta erat vulputate in. Nulla vitae consectetur risus. global variable Aenean justo risus, mollis sed condimentum sed, sagittis eget nisl. Phasellus sem leo, commodo at dignissim vitae, ullamcorper nec metus. Proin pretium porta lectus nec example pulvinar. Nulla non elementum nisi, vel hendrerit quam. Curabitur bibendum lobortis tincidunt. Proin vel velit porta, tempus ligula a, interdum leo. Aenean lorem nibh, facilisis ut porta sit amet, ornare quis ligula.
  • 13. Evaluation
      • Matching
          • Did it translate all the search terms to the target language properly?
          • Did it preserve domain-specific information?
      • Searching
          • Hit ratio: # of relevant web pages / # of results on the first page
          • Total number of search results
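
The hit ratio defined on this slide is a simple quotient. A minimal sketch, with a hypothetical function name `hit_ratio` and inputs that would have to be judged by hand:

```python
def hit_ratio(relevant_on_first_page, results_on_first_page):
    """Hit ratio = # of relevant web pages / # of results on the first page."""
    if results_on_first_page <= 0:
        raise ValueError("the first page must contain at least one result")
    return relevant_on_first_page / results_on_first_page


if __name__ == "__main__":
    # Example: 6 relevant pages among 10 first-page results.
    print(hit_ratio(6, 10))
```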
  • 14. Evaluation
      • 재귀 열거 집합 - recursively enumerable sets
          • (3/3, 1/1)
      • 배낭 문제 시간 복잡도 - 배낭 issue the time complexity
          • (3/4, 1/2)
      • 가상화를 통한 데이터센터 에너지 효율 극대화 - through virtualization datacenter energy efficiency maximization
          • (7/7, 4/4)
  • 15. Evaluation
      • Query in source language “재귀 열거 집합”
          • (6/10, 15,300)
      • Query in target language “recursively enumerable sets”
          • (10/10, 105,000)
      • Google Translate result “Set of recursive enumeration”
          • (10/10, 1,990,000)
  • 16. Evaluation
      • Query in source language “배낭 문제 시간 복잡도”
          • (10/10, 31,200)
      • Query in target language “배낭 issue time complexity”
          • (2/6, 2,270)
      • Google Translate result “Knapsack problem, the time complexity”
          • (10/10, 206,000)
  • 17. Evaluation
      • Query in source language “가상화를 통한 데이터센터 에너지 효율 극대화”
          • (5/10, 36,100)
      • Query in target language “through virtualization datacenter energy efficiency maximization”
          • (8/10, 264,000)
      • Google Translate result “Maximize energy efficiency through data center virtualization”
          • (10/10, 284,000)
  • 18. Conclusion & Future Work
      • Preliminary results look satisfactory
      • Machine-translation-based CLIR appears to be more useful in many cases
      • The evaluation factors may not reflect the actual quality of the system
      • The evaluation process is labor-intensive; an automated evaluation is needed
      • Fuzzy matching based on lexical information (e.g., call, calls)
      • Fuzzy matching based on semantic information (e.g., maximize, maximizing, maximization, maximum)