Presented by
NIKHIL.P
MCA S4
CHINTECH
INTRODUCTION
• TRANSLATION??
Translation is the communication of the meaning of a source-language text
by means of an equivalent target-language text.
• TRANSLITERATION??
Transliteration is the conversion of a text from one script to another.
INTRODUCTION
• Why TRANSLATION??
Establishing links between two languages allows resources to be
transferred from one language to another.
Books written in unfamiliar foreign languages can be read by translating
their contents into our own language.
[Figure: diagram relating Computers, Databases, Robotics, Artificial Intelligence, Algorithms, Natural Language Processing, Information Retrieval, Machine Translation, Networking, and Search]
INTRODUCTION
• Natural Language Processing (NLP)
NLP is a field of computer science, artificial intelligence, and
linguistics concerned with the interactions between computers and
human (natural) languages.
Applications of NLP
• Machine translation
• Database access
• Information retrieval
Machine Translation??
• Machine translation is the automatic translation of text, for example
by a computer system, from one language (the source language) into
another language (the target language).
Background
• Machine translation was one of the first natural language processing
applications developed in computer science.
• Research explores rule-based, example-based, knowledge-based, and
statistical approaches.
• Statistical Machine Translation (SMT) is the preferred approach in much
industrial and academic research.
• Rule-based machine translation: a system of lexical, grammatical, and
reordering rules is created for each source/target language pair. The
rules are then applied to the source text to produce the output.
• Example-based machine translation: a bilingual text corpus is used
directly for comparison against the source text, and case-based
reasoning is applied to create the output.
What is Moses?
• It is an open-source toolkit for Statistical Machine Translation (SMT).
• Moses is released under the LGPL license.
• It uses standard external toolkits such as GIZA++ and SRILM.
Statistical Machine Translation??
• The goal is to produce, from a source sentence f, the target sentence e
that maximizes the probability P(e|f).
• A statistical MT system is modeled as three separate parts:
  • language model
  • translation model
  • decoder
• Language model (LM): assigns a probability P(e) to any target string of
words. An LM is a probability distribution over strings S that attempts
to reflect how frequently a string S occurs as a sentence.
• Translation model (TM): assigns a probability P(f|e) to any pair of
target and source strings.
• Decoder: determines the translation based on the probabilities from the
LM and the TM.
GIZA++
• It is used for making word alignments.
• The toolkit is an implementation of the original IBM Models that
started statistical machine translation research.
• First, the language pair is aligned in both directions, for example
English to German and German to English.
• This generates two word alignments, which are then combined:
  • Intersection: gives a high-precision alignment of high-confidence
  alignment points.
  • Union: gives a high-recall alignment with additional alignment points.
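Combining the two directional alignments amounts to set intersection and set union. A minimal sketch, assuming each alignment is represented as a set of (source index, target index) pairs; the two alignments below are hypothetical and are not real GIZA++ output.

```python
# Alignment points are (source_index, target_index) pairs.
# Both alignments are hypothetical examples, not real GIZA++ output.
source_to_target = {(0, 0), (1, 1), (2, 3)}
target_to_source = {(0, 0), (1, 1), (2, 2)}

# Points found in both directions: fewer points, higher precision.
intersection = source_to_target & target_to_source

# Points found in either direction: more points, higher recall.
union = source_to_target | target_to_source

print(sorted(intersection))
print(sorted(union))
```

The intersection keeps only the points both directions agree on, while the union also keeps the points proposed by just one direction.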
SRILM
• It is used for language modeling.
• It consists of the following components:
  • A set of C++ class libraries implementing language models, supporting
  data structures, and miscellaneous utility functions.
  • A set of executable programs built on top of these libraries to
  perform standard tasks such as training LMs and testing them on data.
  • A collection of miscellaneous scripts facilitating minor related tasks.
Moses Translation Process
• It involves:
  • segmenting the source sentence into source phrases,
  • translating each source phrase into a target phrase, and
  • optionally reordering the target phrases into a target sentence.
Moses Toolkit
• Consists of all the components needed to preprocess data and to train
the language models and the translation models.
• Also contains tools for tuning these models using minimum error rate
training (MERT).
• Relies on external tools such as GIZA++ and SRILM.
Moses Toolkit
• The decoder is the core component of Moses.
• A phrase-based decoder is used.
• The job of the decoder is to find the highest-scoring sentence in the
target language corresponding to a given source sentence.
• It can also output a ranked list of translation candidates.
• Principles used when developing the Moses decoder:
  • Accessibility
  • Easy to maintain
  • Flexibility
  • Easy for distributed team development
  • Portability
• It was developed in C++ for efficiency and follows a modular,
object-oriented design.
• The decoding process can be varied in several ways:
  • Input (for example, a plain sentence)
  • Translation model
  • Decoding algorithm
  • Language model
• Contributed tools:
  • Moses Server: provides an XML-RPC interface to the decoder.
  • Web translation: a set of scripts to translate web pages.
  • Analysis tools: scripts to enable the analysis and visualization of
  Moses output.
Moses Decoder
A simple translation model
Contains two files:
• phrase-table (phrase translation table), with entries such as
  der ||| the ||| 0.3 ||| |||
• moses.ini (configuration file)
• Phrase table:
The phrase translation tables are the main knowledge source for the
machine translation decoder.
The entry above means that the probability of translating the German
word "der" into the English word "the" is 0.3.
• Configuration file:
The decoder is controlled by the Moses configuration file, moses.ini.
The translation model files and language model files are specified there.
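The phrase-table entry format described above (fields separated by "|||") can be read with a few lines of code. This is a minimal sketch, not the parser Moses itself uses; the function name parse_phrase_table_line is hypothetical, and it assumes the third field holds one or more whitespace-separated scores.

```python
def parse_phrase_table_line(line):
    """Split one 'source ||| target ||| scores ...' line into its fields."""
    fields = [field.strip() for field in line.split("|||")]
    source, target = fields[0], fields[1]
    # Assumption: the third field holds whitespace-separated scores.
    scores = [float(score) for score in fields[2].split()]
    return source, target, scores


entry = parse_phrase_table_line("der ||| the ||| 0.3 ||| |||")
print(entry)
```

Any trailing empty fields, as in the example entry, are simply ignored by this sketch.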
Moses Decoder
Trace
The trace option reveals which phrase translations were used in the best
translation found by the decoder.
Moses Decoder
Tuning for Quality
The probability cost of a translation is assigned by four models:
• Phrase translation table (phi(f|e)): ensures that the source and target
phrases are good translations of each other.
• Language model (LM(e)): ensures that the output is fluent in the target
language.
• Reordering model (D(e,f)): allows for reordering of the input sentence,
at a cost.
• Word penalty (W(e)): ensures that translations do not get too long or
too short.
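One way to picture how the four models combine is as a weighted sum of log-scores, with tuning amounting to choosing the weights. The sketch below assumes that form; the function name translation_score, the weight values, and the treatment of the word penalty as a simple per-word cost are all illustrative assumptions, not the exact Moses feature definitions.

```python
import math


def translation_score(phrase_prob, lm_prob, distortion, num_words, weights):
    """Weighted sum of log-scores from the four models (higher is better)."""
    return (
        weights["phrase"] * math.log(phrase_prob)
        + weights["lm"] * math.log(lm_prob)
        - weights["distortion"] * distortion   # words skipped when reordering
        - weights["word_penalty"] * num_words  # per-word cost on output length
    )


# Hypothetical weights; in practice they would be set by tuning.
weights = {"phrase": 1.0, "lm": 1.0, "distortion": 0.5, "word_penalty": 0.1}

monotone = translation_score(0.3, 0.01, distortion=0, num_words=4, weights=weights)
reordered = translation_score(0.3, 0.01, distortion=3, num_words=4, weights=weights)
print(monotone > reordered)
```

With a positive word-penalty weight, longer outputs cost more; a negative weight would favor longer outputs instead, which is how this single weight can push translations in either direction.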
Moses Decoder
Tuning for Speed
Speed-ups are achieved by limiting the search space of the decoder:
• Translation table size
• Hypothesis stack size
• Translation table size
One strategy is to reduce the number of translation options used for each
input phrase, i.e., the number of table entries that are retrieved.
There are two ways to limit the table size:
I. a fixed limit on the number of translation options retrieved
II. a threshold: the phrase translation probability has to be above some value
• Hypothesis stack size
Another way to reduce the search space is to reduce the size of the
hypothesis stacks. For each number of foreign words translated, the
decoder keeps a stack of the best partial translations.
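Both pruning strategies can be sketched as simple list operations. The function names limit_translation_options and prune_stack are hypothetical, the probabilities are invented, and the choice to keep options whose probability equals the threshold is an assumption.

```python
def limit_translation_options(options, table_limit=None, threshold=None):
    """Keep the best translation options for one input phrase.

    options: list of (target_phrase, probability) pairs.
    table_limit: keep at most this many options (fixed limit).
    threshold: drop options whose probability is below this value.
    """
    kept = sorted(options, key=lambda option: option[1], reverse=True)
    if threshold is not None:
        kept = [option for option in kept if option[1] >= threshold]
    if table_limit is not None:
        kept = kept[:table_limit]
    return kept


def prune_stack(hypotheses, stack_size):
    """Keep only the stack_size highest-scoring (score, translation) pairs."""
    return sorted(hypotheses, key=lambda h: h[0], reverse=True)[:stack_size]


options = [("the", 0.3), ("that", 0.15), ("which", 0.05), ("who", 0.01)]
print(limit_translation_options(options, table_limit=2))
print(limit_translation_options(options, threshold=0.05))
print(prune_stack([(-2.0, "a"), (-1.0, "b"), (-3.0, "c")], stack_size=2))
```

Smaller limits make decoding faster but raise the risk of discarding the option or hypothesis that would have led to the best translation.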
Moses Decoder
Limit on Distortion
• Reordering cost is measured by the number of words skipped when foreign
phrases are picked out of order.
• The reordering cost is taken into account when searching for the most
probable translation.
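Counting skipped words can be made concrete with a short function. This sketch assumes the common phrase-based definition in which each step costs |start - previous_end - 1|, where the positions refer to the source sentence; the name distortion_cost and the example spans are hypothetical.

```python
def distortion_cost(phrase_spans):
    """Sum the words skipped between consecutively translated source phrases.

    phrase_spans: (start, end) source positions, inclusive, listed in the
    order in which the phrases are translated.
    """
    cost = 0
    previous_end = -1  # position just before the first source word
    for start, end in phrase_spans:
        cost += abs(start - previous_end - 1)
        previous_end = end
    return cost


# Phrases taken in source order: nothing is skipped.
print(distortion_cost([(0, 1), (2, 2), (3, 4)]))
# Words 3-4 are translated before words 1-2: jumps forward and back.
print(distortion_cost([(0, 0), (3, 4), (1, 2)]))
```

A limit on distortion would then reject any hypothesis whose next jump, or total cost, exceeds a chosen bound.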
Moses Decoder
Decoding Algorithm
• The decoder uses a beam search algorithm.
• The output sentence is generated left to right in the form of hypotheses.
• Final states in the search are hypotheses that cover all foreign words.
• Beam search
An efficient search algorithm that searches for the highest-probability
translation among the exponential number of choices.
The space of generated hypotheses is searched with beam search, which
keeps for each node a list of the best translations for that node.
The score for a translation is computed using the weights of the
individual phrases that make up the translation and the overall LM
probability of the combination.
The scores are computed by querying the standard Moses phrase table and
the LM for the target language.
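The search described above can be illustrated with a heavily simplified stack decoder. This is a sketch under stated assumptions, not the Moses implementation: it translates monotonically (no reordering), scores hypotheses with phrase probabilities only (the language model is omitted for brevity), and uses an invented toy phrase table. The names PHRASE_TABLE, MAX_PHRASE_LEN, and beam_search are hypothetical.

```python
import math

# Hypothetical toy phrase table: source phrase -> [(target phrase, probability)].
PHRASE_TABLE = {
    ("das",): [("the", 0.7), ("that", 0.3)],
    ("haus",): [("house", 0.8), ("home", 0.2)],
    ("das", "haus"): [("the house", 0.9)],
    ("ist",): [("is", 1.0)],
    ("klein",): [("small", 0.6), ("little", 0.4)],
}
MAX_PHRASE_LEN = 2


def beam_search(source_words, stack_size=3):
    """Monotone stack decoding: stacks[n] holds hypotheses covering n source words."""
    stacks = [[] for _ in range(len(source_words) + 1)]
    stacks[0].append((0.0, ""))  # (log-probability score, output so far)
    for covered in range(len(source_words)):
        # Prune: expand only the best hypotheses in this stack.
        stacks[covered].sort(key=lambda h: h[0], reverse=True)
        for score, output in stacks[covered][:stack_size]:
            for length in range(1, MAX_PHRASE_LEN + 1):
                phrase = tuple(source_words[covered:covered + length])
                if len(phrase) < length:
                    continue  # ran past the end of the sentence
                for target, prob in PHRASE_TABLE.get(phrase, []):
                    new_output = (output + " " + target).strip()
                    stacks[covered + length].append(
                        (score + math.log(prob), new_output)
                    )
    final = stacks[-1]  # hypotheses that cover all source words
    return max(final, key=lambda h: h[0]) if final else None


result = beam_search(["das", "haus", "ist", "klein"])
print(result)
```

Each stack groups hypotheses by the number of source words covered, so pruning compares hypotheses that have translated the same amount of input. If no phrase covers some word, no hypothesis reaches the final stack and the function returns None.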
Language Models
• The decoder works with the following language models:
  • SRI language model (SRILM)
  • IRST language model (IRSTLM)
  • RandLM
  • KenLM (included by default in Moses)
Translating Webpages with Moses
• Moses servers are installed on one or several computers.
• On each Moses server, a daemon (daemon.pl) accepts network connections
on a given port and copies everything it receives from the connection
to Moses.
• A separate web server runs Apache or any other web server software.
• Through the web server, CGI scripts (index.cgi, translate.cgi) are
served to clients.
• A client requests index.cgi via the web server, and a form containing a
text box for entering the URL is served back.
• The form is submitted to translate.cgi, which does the job:
  • it fetches the page from the web,
  • extracts the plain text from it,
  • sends the text to a Moses server, and
  • inserts the translation back into the document and returns it to the
  client.
Setting up MOSES server
Choosing machines for Moses servers
Running Moses is a slow and expensive process, so the machine used should
have a fast processor and as much memory as possible.
Install Moses
On each Moses server, install Moses and configure the language pair that
you wish to use.
Setting up MOSES server
Install daemon.pl
Open bin/daemon.pl and edit the $MOSES and $MOSES_INI paths to point to
the locations of the Moses binary and the Moses configuration file.
Choose a port number
Pick any port number between 1024 and 49151 for the daemon process to
listen on.
Setting up MOSES server
Start the daemon
To activate the Moses server, type the following in a shell on the server:
./daemon.pl <hostname> <port>
Here hostname is the name of the host on which Moses is installed, and
port is the selected port.
Setting up MOSES server
Configure web server to connect to Moses server
The final step is to tell the front-end web server where to find the
back-end Moses servers.
In the translate.cgi script, set the @MOSES_ADDRESS array to the list of
hostname:port strings identifying the Moses servers.
[Figure: comparison with Pharaoh and Phramer on a fr-en translation of 2000 sentences]
Installing Moses
Install Boost:
sudo apt-get install libboost-all-dev
Get the source code:
git clone https://github.com/moses-smt/mosesdecoder.git
Installing GIZA++
wget http://giza-pp.googlecode.com/files/giza-pp-v1.0.7.tar.gz
tar xzvf giza-pp-v1.0.7.tar.gz
cd giza-pp
make
cd ~/mosesdecoder
mkdir tools
cp ~/giza-pp/GIZA++-v2/GIZA++ ~/giza-pp/GIZA++-v2/snt2cooc.out ~/giza-pp/mkcls-v2/mkcls tools
Installing IRSTLM
tar zxvf irstlm-5.80.01.tgz
cd irstlm-5.80.01
./regenerate-makefiles.sh
./configure --prefix=$HOME/irstlm
make install
Moses Platform
• The primary development platform for Moses is Linux.
• Linux is also the recommended platform, since it is easier to get
support for it.
• However, Moses works on other platforms as well.
Moses Releases
• Moses 1.0 (28th Jan 2013)
• Moses 0.91 (12th Oct 2012)
Importance of Moses
• Moses is installable software, unlike online-only translation systems.
• Online systems cannot be trained on your own data.
• Online systems also raise privacy concerns if you have to translate
sensitive information.
Conclusion
Moses is an open-source toolkit, so users can modify and customize it to
suit their own needs and requirements.
Questions??
