This document discusses a project to directly translate Hindi text to Tamil text without an intermediate language like English. It describes using techniques like part-of-speech tagging, statistical machine translation, word sense disambiguation using the Lesk algorithm, and morphological analysis. The goal is to build an architecture that can take Hindi input, perform the necessary NLP techniques, and output the translation in Tamil. References are provided for related work.
Machine Translation of Hindi to Tamil Text
1. Hindi–Tamil Text Translation
Submitted to: Mr. Vimal Kumar K.
Submitted by:
Vaibhav Agarwal (10103546)
Akash Singh (10103549)
2. Introduction
Natural Language Processing is a field of computer science,
artificial intelligence, and linguistics concerned with the
interactions between computers and human (natural) languages.
Traditionally, translators with deep knowledge of both the source and the target language have converted text from one to the other manually.
Machine translation is the branch of computational linguistics concerned with automatically converting text in one natural language into another, preserving the meaning of the input and producing fluent text in the output language.
3. Aim of Research
Hindi and Tamil are among the five most widely spoken languages in India, with roughly 41% and 5% of speakers respectively.
Translation services such as Google Translate still translate between such language pairs through an intermediate (pivot) language such as English.
In this project, we attempt to convert Hindi text directly into Tamil text, without any intermediate language.
4. Part-of-Speech Tagging
The process of marking each word in a text (corpus) with the part of speech it corresponds to, based on both its definition and its context.
A simplified form of this is identifying words as nouns, verbs, adjectives, adverbs, etc.
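Example:
सोना चांदी बहुत अनमोल धातु हैं । (Gold and silver are very precious metals: सोना is a noun, "gold".)
मुझे अभी सोना है । (I have to sleep now: सोना is a verb, "to sleep".)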
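The slides do not say which tagger the project uses. As an assumption, here is a minimal sketch of one common statistical choice, a bigram hidden Markov model tagger decoded with the Viterbi algorithm, applied to the two example sentences. All transition and emission probabilities (`TRANS`, `EMIT`) are hypothetical, hand-set values chosen only to show how the preceding tag resolves सोना; a real tagger would estimate them from a tagged corpus.

```python
from typing import Dict, List, Tuple

TAGS = ["NOUN", "VERB", "PRON", "ADV", "ADJ", "AUX"]

# Hypothetical, hand-set P(tag | previous tag); "<s>" marks the sentence start.
TRANS: Dict[Tuple[str, str], float] = {
    ("<s>", "NOUN"): 0.5, ("<s>", "PRON"): 0.3, ("<s>", "VERB"): 0.1,
    ("NOUN", "NOUN"): 0.3, ("NOUN", "ADV"): 0.2, ("NOUN", "AUX"): 0.3,
    ("PRON", "ADV"): 0.4,
    ("ADV", "ADJ"): 0.4, ("ADV", "VERB"): 0.4, ("ADV", "NOUN"): 0.1,
    ("ADJ", "NOUN"): 0.8,
    ("VERB", "AUX"): 0.6, ("VERB", "NOUN"): 0.1,
}
FLOOR = 1e-4  # small probability for transitions not listed above

# Hypothetical EMIT[word][tag], playing the role of P(word | tag).
# सोना is ambiguous: it can be emitted by NOUN ("gold") or VERB ("to sleep").
EMIT: Dict[str, Dict[str, float]] = {
    "सोना": {"NOUN": 0.1, "VERB": 0.1},
    "चांदी": {"NOUN": 0.1}, "धातु": {"NOUN": 0.1},
    "बहुत": {"ADV": 0.3}, "अभी": {"ADV": 0.3},
    "अनमोल": {"ADJ": 0.2}, "मुझे": {"PRON": 0.3},
    "है": {"AUX": 0.5}, "हैं": {"AUX": 0.5},
}


def trans(prev: str, tag: str) -> float:
    return TRANS.get((prev, tag), FLOOR)


def viterbi(words: List[str]) -> List[str]:
    # best[i][tag] = (score of the best path ending in `tag` at i, previous tag)
    best: List[Dict[str, Tuple[float, str]]] = []
    for i, word in enumerate(words):
        column: Dict[str, Tuple[float, str]] = {}
        for tag in TAGS:
            emit = EMIT.get(word, {}).get(tag, 0.0)
            if i == 0:
                column[tag] = (trans("<s>", tag) * emit, "<s>")
            else:
                prev_tag, prev_score = max(
                    ((p, best[i - 1][p][0] * trans(p, tag)) for p in TAGS),
                    key=lambda pair: pair[1],
                )
                column[tag] = (prev_score * emit, prev_tag)
        best.append(column)
    # Trace back from the highest-scoring final tag.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]
```

With these toy numbers, `viterbi("मुझे अभी सोना है".split())` tags सोना as `VERB` because an adverb followed by a verb and an auxiliary scores higher than the noun reading, while in the first sentence a following noun favours `NOUN`.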
5. Statistical Machine Translation – An Approach
Statistical Machine Translation (SMT) is an approach in which translations are generated from statistical models whose parameters are derived from the analysis of bilingual (parallel) text corpora.
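The slides do not name the statistical model. As an assumption, this sketch uses IBM Model 1, a standard word-alignment model, to show how lexical translation probabilities t(Tamil word | Hindi word) can be estimated by expectation-maximisation from sentence-aligned text alone. The three-pair corpus ("this book", "this house", "that book") is invented for illustration, and the NULL word of the full model is omitted; `train_ibm1` and `best_tamil` are hypothetical helper names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy parallel corpus: "this book", "this house", "that book".
CORPUS: List[Tuple[List[str], List[str]]] = [
    ("यह किताब".split(), "இந்த புத்தகம்".split()),
    ("यह घर".split(), "இந்த வீடு".split()),
    ("वह किताब".split(), "அந்த புத்தகம்".split()),
]


def train_ibm1(corpus: List[Tuple[List[str], List[str]]],
               iterations: int = 20) -> Dict[Tuple[str, str], float]:
    """Estimate t(ta | hi) with EM, following IBM Model 1 without the NULL word."""
    ta_vocab = {w for _, ta_sent in corpus for w in ta_sent}
    uniform = 1.0 / len(ta_vocab)
    t: Dict[Tuple[str, str], float] = defaultdict(lambda: uniform)  # uniform start
    for _ in range(iterations):
        count: Dict[Tuple[str, str], float] = defaultdict(float)  # expected c(ta, hi)
        total: Dict[str, float] = defaultdict(float)               # expected c(hi)
        for hi_sent, ta_sent in corpus:
            for ta in ta_sent:
                # E-step: split this Tamil word's alignment across the Hindi words
                norm = sum(t[(ta, hi)] for hi in hi_sent)
                for hi in hi_sent:
                    frac = t[(ta, hi)] / norm
                    count[(ta, hi)] += frac
                    total[hi] += frac
        # M-step: renormalise expected counts into probabilities per Hindi word
        t = {pair: c / total[pair[1]] for pair, c in count.items()}
    return t


def best_tamil(t: Dict[Tuple[str, str], float], hi_word: str) -> str:
    # Most probable Tamil word for a given Hindi word under the learned table.
    candidates = {ta: p for (ta, hi), p in t.items() if hi == hi_word}
    return max(candidates, key=candidates.get)
```

On this corpus, EM shifts probability mass towards word pairs that co-occur consistently across sentence pairs, so `best_tamil(train_ibm1(CORPUS), "घर")` ends up as வீடு rather than the incidental co-occurrence இந்த.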
It is based on the view that every sentence in one language has a possible translation in another language. Since a sentence can be translated in many possible ways, the system assigns each candidate translation a probability and outputs the most probable one.
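One standard decision rule for choosing among the many possible translations is the noisy-channel formulation of classic SMT; the slides do not state their decision rule, so this is offered as an assumption. With H the Hindi input sentence and T a candidate Tamil sentence, Bayes' rule gives:

```latex
\hat{T} = \arg\max_{T} P(T \mid H)
        = \arg\max_{T} \frac{P(H \mid T)\, P(T)}{P(H)}
        = \arg\max_{T} P(H \mid T)\, P(T)
```

Here P(H | T) is a translation model estimated from bilingual corpora and P(T) is a Tamil language model that rewards fluent output; P(H) is dropped because it does not depend on T.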
6. Word Sense Disambiguation (WSD) – A Challenge
Word sense disambiguation is the process of identifying which sense (meaning) of a word is used in a sentence when the word has multiple meanings.
For example:
मैं आम खा रहा हूँ । (I am eating a mango: आम means "mango".)
यह आम रास्ता नहीं है । (This is not a public road: आम means "common, public".)
7. Lesk Algorithm – An Approach to WSD
It selects a meaning for a target word by comparing the dictionary definitions of its possible senses with the definitions of the other content words in the surrounding context window.
It simply counts the words that overlap between the definition of each sense of the target word and the definitions of the other words in the sentence, and chooses the sense with the largest overlap.
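A minimal sketch of this overlap count on the आम examples, assuming a hypothetical three-entry dictionary (`DICTIONARY`) with English glosses for readability; a real system would draw Hindi glosses from a resource such as Hindi WordNet. Words missing from the dictionary (pronouns, auxiliaries) are ignored, which stands in for restricting the comparison to content words, and the context signature pools the glosses of every sense of each other content word.

```python
import re
from typing import Dict, List, Set

# Hypothetical mini-dictionary: Hindi word -> {sense label: gloss}.
DICTIONARY: Dict[str, Dict[str, str]] = {
    "आम": {
        "mango": "a sweet tropical fruit, eaten ripe",
        "common": "ordinary; belonging to or used by the public",
    },
    "खा": {"eat": "to take fruit or other food into the mouth and swallow it"},
    "रास्ता": {"road": "a way or path used by the public for travel"},
}
# Function words that should not count as overlaps.
STOPWORDS = {"a", "an", "the", "of", "to", "or", "by", "for", "and", "it", "into", "other"}


def bag(gloss: str) -> Set[str]:
    # Lower-cased content words of a gloss.
    return {w for w in re.findall(r"[a-z]+", gloss.lower()) if w not in STOPWORDS}


def lesk(target: str, sentence: List[str]) -> str:
    # Context signature: gloss words of every sense of each other known word.
    context: Set[str] = set()
    for word in sentence:
        if word != target and word in DICTIONARY:
            for gloss in DICTIONARY[word].values():
                context |= bag(gloss)
    # Choose the target sense whose gloss shares the most words with the context.
    senses = DICTIONARY[target]
    return max(senses, key=lambda s: len(bag(senses[s]) & context))
```

In the first sentence the gloss of खा shares "fruit" with the mango sense; in the second, the gloss of रास्ता shares "used" and "public" with the common sense, so `lesk` returns `"mango"` and `"common"` respectively.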
8. Architecture of Our Project
Hindi Input → Part-of-Speech Tagging → Apply WSD → Morphological Analysis → Local Word Grouping → Perform Translation & Produce Tamil Text
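A minimal sketch of how the five stages could be chained, assuming hypothetical toy tables in place of each real component (`POS_LEXICON`, `SENSE_IN_CONTEXT`, `AUX_FEATURES`, `TA_LEXICON` and the stage functions are illustrative names, not the project's code). It handles only the example sentence मैं आम खा रहा हूँ and shows why local word grouping matters: the Hindi verb group खा रहा हूँ maps onto a single agglutinated Tamil verb.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy tables covering one sentence; each stands in for a real component.
POS_LEXICON: Dict[str, str] = {"मैं": "PRON", "आम": "NOUN", "खा": "VERB", "रहा": "AUX", "हूँ": "AUX"}
SENSE_IN_CONTEXT: Dict[str, str] = {"आम": "mango"}          # e.g. the output of Lesk
AUX_FEATURES: Dict[str, str] = {"रहा": "PROG", "हूँ": "1SG"}  # progressive; 1st person singular
TA_LEXICON: Dict[Tuple[str, Optional[str]], str] = {
    ("मैं", None): "நான்",        # I
    ("आम", "mango"): "மாம்பழம்",  # mango
    ("आम", "common"): "பொது",     # common, public
    ("खा", None): "சாப்பிடு",     # eat (verb stem)
}
# Simplification: Tamil present tense + 1st person singular ending for PROG + 1SG.
TA_VERB_ENDING: Dict[frozenset, str] = {frozenset({"PROG", "1SG"}): "கிறேன்"}


@dataclass
class Token:
    word: str
    tag: str = ""
    sense: Optional[str] = None
    features: Set[str] = field(default_factory=set)


def pos_tag(words: List[str]) -> List[Token]:
    return [Token(w, POS_LEXICON[w]) for w in words]


def apply_wsd(tokens: List[Token]) -> List[Token]:
    for tok in tokens:
        tok.sense = SENSE_IN_CONTEXT.get(tok.word)
    return tokens


def morphological_analysis(tokens: List[Token]) -> List[Token]:
    for tok in tokens:
        if tok.word in AUX_FEATURES:
            tok.features.add(AUX_FEATURES[tok.word])
    return tokens


def local_word_grouping(tokens: List[Token]) -> List[Token]:
    # Fold auxiliaries into the preceding verb so "खा रहा हूँ" becomes one verb group.
    groups: List[Token] = []
    for tok in tokens:
        if tok.tag == "AUX" and groups and groups[-1].tag == "VERB":
            groups[-1].features |= tok.features
        else:
            groups.append(tok)
    return groups


def translate(groups: List[Token]) -> str:
    out = []
    for g in groups:
        ta = TA_LEXICON[(g.word, g.sense)]
        if g.tag == "VERB":
            ta += TA_VERB_ENDING[frozenset(g.features)]  # Tamil verbs agglutinate tense and person
        out.append(ta)
    return " ".join(out)


def hindi_to_tamil(sentence: str) -> str:
    words = sentence.replace("।", " ").split()
    return translate(local_word_grouping(morphological_analysis(apply_wsd(pos_tag(words)))))
```

Because Hindi and Tamil are both verb-final (SOV), this sketch keeps the Hindi word order; a fuller system would still need reordering and case-marking rules for other constructions.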
9. Resource References
Indian Language Technology Proliferation and Deployment Centre
http://tdil-dc.in/
Centre for Indian Language Technology (CFILT), IIT Bombay
http://www.cfilt.iitb.ac.in/