1. Drop me a mail:Drop me a mail: rushdecoder@yahoo.comrushdecoder@yahoo.com
Visit me at:Visit me at: http://http://rushdishams.googlepages.comrushdishams.googlepages.com
1Rushdi Shams, Lecturer, Dept of CSE, KUET, Bangladesh
2. Translation Approach
The translation process may be stated as:
1. Decoding the meaning of the source text
2. Re-encoding this meaning in the target
language.
Machine translation can use a method
based on linguistic rules-
words will be translated in a linguistic way
the most suitable words of the target language
will replace the ones in the source language.
Rushdi Shams, Lecturer, Dept of CSE, KUET, Bangladesh 2
3. Translation Approach
The success of machine translation requires
the problem of natural language
understanding to be solved first.
Generally, rule-based methods
parse a text,
usually creating an intermediary, symbolic
representation,
from which the text in the target language is
generated.
Rushdi Shams, Lecturer, Dept of CSE, KUET, Bangladesh 3
4. Translation Approach
According to the nature of the intermediary
representation, an approach is described as
interlingual machine translation or
transfer-based machine translation.
These methods require extensive
lexicons with
morphological,
syntactic, and
semantic information, and
large sets of rules.
Rushdi Shams, Lecturer, Dept of CSE, KUET, Bangladesh 4
5. Translation Approach
Machine translation programs often work
well enough for
a native speaker of one language to get the
approximate meaning of what is written by the
other native speaker.
Rushdi Shams, Lecturer, Dept of CSE, KUET, Bangladesh 5
6. Translation Approach
the large multilingual corpus of data needed
for statistical methods to work is not
necessary for the grammar-based methods.
But then, the grammar methods need a
skilled linguist to carefully design the
grammar that they use.
Rushdi Shams, Lecturer, Dept of CSE, KUET, Bangladesh 6
7. Types of Machine Translation
Text
Generation
Syntactic
Parsing
Semantic
Analysis
Sentence
Planning
Source
(Arabic)
Target
(English)
Transfer
Rules
Direct: SMT, EBMT
Interlingua
8. Rule based MT
The rule-based machine translation
paradigm includes
1. transfer-based machine translation,
2. interlingual machine translation and
3. dictionary-based machine translation
Rushdi Shams, Lecturer, Dept of CSE, KUET, Bangladesh 8
10. Transfer based MT
Itis necessary to have an intermediate
representation that captures the "meaning"
of the original sentence in order to generate
the correct translation
In interlingua-based MT this intermediate
representation must be independent of the
languages in question, whereas in transfer-
based MT, it has some dependence on the
language pair involved.
Rushdi Shams, Lecturer, Dept of CSE, KUET, Bangladesh 10
11. Transfer based MT
The original text is first analyzed
morphologically and
syntactically
in order to obtain a syntactic
representation.
This representation can then be refined to a
more abstract level putting emphasis on the
parts relevant for translation and ignoring
other types of information.
Rushdi Shams, Lecturer, Dept of CSE, KUET, Bangladesh 11
12. Transfer based MT
The transfer process then converts this final
representation (still in the original
language) to a representation of the same
level of abstraction in the target language.
These two representations are referred to
as "intermediate" representations.
From the target language representation,
the stages are then applied in reverse.
Rushdi Shams, Lecturer, Dept of CSE, KUET, Bangladesh 12
14. Transformation process
Morphological analysis
Surface forms of the input text are classified as
○ to part-of-speech (e.g. noun, verb, etc.) and
○ sub-category (number, gender, tense, etc.)
Rushdi Shams, Lecturer, Dept of CSE, KUET, Bangladesh 14
15. Transformation process
Lexical categorization
In any given text some of the words may have
more than one meaning, causing ambiguity in
analysis.
Lexical categorization looks at the context of a
word to try and determine the correct meaning
in the context of the input.
Rushdi Shams, Lecturer, Dept of CSE, KUET, Bangladesh 15
16. Transformation process
Lexical transfer
This is basically dictionary translation
the source language lemma (perhaps with sense
information) is looked up in a bilingual
dictionary and the translation is chosen.
Rushdi Shams, Lecturer, Dept of CSE, KUET, Bangladesh 16
17. Transformation process
Structural transfer
While the previous stages deal with words, this
stage deals with larger constituents
Rushdi Shams, Lecturer, Dept of CSE, KUET, Bangladesh 17
18. Transformation process
Morphological generation
From the output of the structural transfer
stage, the target language surface forms are
generated.
Rushdi Shams, Lecturer, Dept of CSE, KUET, Bangladesh 18
19. Transfer Types
Superficial transfer (or syntactic)
This level is characterized by transferring
"syntactic structures" between the source and
target languages.
It is suitable for languages in the same family or
of the same type.
for example in the Romance languages between
Spanish, Catalan, French, Italian, etc.
Rushdi Shams, Lecturer, Dept of CSE, KUET, Bangladesh 19
20. Transfer Types
Deep transfer (or semantic)
This level constructs a semantic representation that is
dependent on the source language.
This representation can consist of a series of structures
which represent the meaning.
In these transfer systems predicates are typically
produced.
The translation also typically requires structural
transfer.
This level is used to translate between more distantly
related languages (e.g. Spanish-English or Spanish-
Basque, etc.)
Rushdi Shams, Lecturer, Dept of CSE, KUET, Bangladesh 20
24. Interlingual MT
the source language, i.e. the text to be
translated is transformed into an
interlingua, i.e., an abstract language-
independent representation.
The target language is then generated from
the interlingua.
Rushdi Shams, Lecturer, Dept of CSE, KUET, Bangladesh 24
25. Interlingual MT
In the direct approach, words are translated
directly without passing through an additional
representation.
In the transfer approach the source language is
transformed into an abstract, less language-
specific representation.
Rushdi Shams, Lecturer, Dept of CSE, KUET, Bangladesh 25
27. Advantage and disadvantage
The advantage in multilingual machine
translations is that no transfer component
has to be created for each language pair
The obvious disadvantage is that the
definition of an interlingua is difficult and
maybe even impossible for a wider domain.
Rushdi Shams, Lecturer, Dept of CSE, KUET, Bangladesh 27
28. Components
Dictionaries for analysis and generation
A conceptual lexicon, which is
the knowledge base about events and
entities known in the domain.
A set of projection rules (specific to the
domain and the languages).
Grammars for the analysis and generation
of the languages involved.
Rushdi Shams, Lecturer, Dept of CSE, KUET, Bangladesh 28
30. Dictionary-based MT
The words will be translated as a dictionary does
— word by word, usually without much
correlation of meaning between them
Dictionary lookups may be done with or without
morphological analysis or lemmatisation
used to expedite manual translation, if the person
carrying it out is fluent in both languages and
therefore capable of correcting syntax and
grammar.
Rushdi Shams, Lecturer, Dept of CSE, KUET, Bangladesh 30
34. Example-based MT
characterized by its use of a bilingual corpus
with parallel texts as its main knowledge
base
It is essentially a translation by analogy and
can be viewed as an implementation of
case-based reasoning approach of machine
learning
35. Example-based MT
characterized by its use of a bilingual corpus
with parallel texts as its main knowledge
base
It is essentially a translation by analogy and
can be viewed as an implementation of
case-based reasoning approach of machine
learning
37. Example-based MT
bilingual parallel corpora contain sentence
pairs like the example shown in the table.
How much is that X ? corresponds to Ano X
wa ikura desu ka.
red umbrella corresponds to akai kasa
small camera corresponds to chiisai kamera
38. Example-based MT
President Kennedy was shot dead during the
parade. and The convict escaped on July
15th. We could translate the sentence The
convict was shot dead during the parade. by
substituting the appropriate parts of the
sentences.
40. Statistical MT
The idea behind statistical machine
translation comes from information theory.
A document is translated according to the
probability distribution p(e | f) that a string
e in the target language (for example,
English) is the translation of a string f in the
source language (for example, French).
41. Statistical MT
The problem of modeling the probability
distribution p(e | f) has been approached in
a number of ways. One intuitive approach is
to apply Bayes Theorem
42. where the translation model p(f | e) is the probability that the
source string is the translation of the target string, and the
language model p(e) is the probability of seeing that target
language string string.
43. Statistical MT
Finding the best translation is done by picking up
the one that gives the highest probability