Machine Translation for Indic Languages using Apertium

Pranava Swaroop S
Malaviya National Institute of Technology, Jaipur
Conclusion!
●   Most machine translation (MT) systems are closed
●   They are rarely friendly towards their peers
●   Most Indian MT systems are stagnant, closed,
    or badly documented
●   Interchange is mostly a “Mission Impossible –
    5” ;-)
●   Integration !!!
Results (Indian perspective)
●   Indian MT engines are mostly stagnant
●   Undocumented
●   Coded by different people
●   No uniformity
●   Mostly CLOSED
●   Mostly undeployable
●   Versioning is a term that most Indian
    organizations producing MT have never heard
    of
So??
●   Need for an Open Collaboration
●   It would be great if a common base were readily
    available
●   Must be well documented
●   Must be portable
●   Must be active
Why all this?
●   India has more than 18 languages defined in
    the constitution
●   Very few literary resources are available in
    digital form
●   Need for rapid conversion and immediate
    generation of digital data
●   Need for collaboration
●   This holds even though accuracy may be low
    during the initial phases
And
●   Indic languages belong to two large families:
    ●   Indo-European, through its Indo-Iranian
        branch and the Indo-Aryan sub-branch
    ●   Dravidian
●   Some of the well-known languages from these
    groups already have well-formed corpora and
    translation rules.
The use?
●   Can we inherit some properties?
Any options?
●   Apertium!!
●   Apertium is an open-source machine translation
    toolbox (http://www.apertium.org) providing:
    ●   An open-source modular shallow-transfer
        machine translation engine with:
        ●   text format management
        ●   finite-state lexical processing
        ●   statistical lexical disambiguation
        ●   shallow transfer based on finite-state
            pattern matching
●   Spanish–Catalan (apertium-es-ca)
●   Spanish–Portuguese (apertium-es-pt)
●   Spanish–Galician (apertium-es-gl)
●   Occitan–Catalan (apertium-oc-ca)
●   French–Catalan (apertium-fr-ca)
●   English–Catalan (apertium-en-ca)
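
The engine stages named in the Apertium description (lexical processing, disambiguation, transfer) can be pictured with a toy pipeline. This is only a sketch under loud assumptions: the tiny Spanish-to-Catalan word lists, the tag strings, and all function names are invented for illustration, plain Python dictionaries stand in for finite-state transducers, "pick the first analysis" stands in for statistical disambiguation, and transfer is done word by word, whereas real shallow-transfer rules match patterns spanning several tagged words.

```python
# Toy data, invented for illustration; not taken from any Apertium package.
ANALYSES = {
    "el": [("el", "det.m.sg")],
    "la": [("el", "det.f.sg")],
    "perro": [("perro", "n.m.sg")],
    # "casa" is ambiguous: the noun "house" or a form of the verb "casar".
    "casa": [("casa", "n.f.sg"), ("casar", "v.pres.3sg")],
}
BILINGUAL = {"el": "el", "perro": "gos", "casa": "casa"}
GENERATION = {
    ("el", "det.m.sg"): "el",
    ("el", "det.f.sg"): "la",
    ("gos", "n.m.sg"): "gos",
    ("casa", "n.f.sg"): "casa",
}


def analyse(token):
    """Lexical processing: surface form -> list of (lemma, tags) candidates."""
    return ANALYSES.get(token.lower(), [(token, "unknown")])


def disambiguate(candidates):
    """Stand-in for statistical disambiguation: keep the first candidate."""
    return candidates[0]


def transfer(lemma, tags):
    """Lexical transfer: swap the lemma, keep the tags unchanged."""
    return BILINGUAL.get(lemma, lemma), tags


def generate(lemma, tags):
    """Generation: (lemma, tags) -> target surface form."""
    return GENERATION.get((lemma, tags), lemma)


def translate(sentence):
    """Run every whitespace-separated token through the four stages."""
    words = []
    for token in sentence.split():
        lemma, tags = disambiguate(analyse(token))
        words.append(generate(*transfer(lemma, tags)))
    return " ".join(words)


print(translate("el perro"))
print(translate("la casa"))
```

Because the tags travel unchanged from analysis to generation, closely related languages need little more than a lemma swap for many words, which is the property the following slides rely on.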
Do what with that?
●   Most of the Indic languages are well known to be
    close neighbours
●   Most of their grammatical constructs are almost
    the same.
●   Use Apertium to translate between close
    neighbours, although it is known to work for
    more distantly related languages as well
Introduction
●   Translation of closely related languages
●   The need to write specific translation rules
●   http://apertium.svn.sourceforge.net/viewvc/apertium

●   http://xixona.dlsi.ua.es/~fran/hindidict.txt
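
Writing translation rules for a new pair starts with dictionary entries. As far as I know, Apertium bilingual dictionaries are XML files in which an entry `<e>` holds a pair `<p>` with a left side `<l>` and a right side `<r>`, each carrying a lemma followed by `<s n="..."/>` tag symbols. The sketch below builds one such entry with Python's standard library; the helper name `bidix_entry`, the Spanish-Catalan word pair, and the single tag `n` are illustrative assumptions, and a real dictionary also needs the enclosing file structure and symbol definitions that are not shown here.

```python
import xml.etree.ElementTree as ET


def bidix_entry(source_lemma, target_lemma, tag):
    """Build one <e><p><l>..</l><r>..</r></p></e> entry (hypothetical helper)."""
    entry = ET.Element("e")
    pair = ET.SubElement(entry, "p")
    for side, lemma in (("l", source_lemma), ("r", target_lemma)):
        node = ET.SubElement(pair, side)
        node.text = lemma                      # lemma comes first
        ET.SubElement(node, "s", {"n": tag})   # then the tag symbol
    return entry


# Illustrative pair: Spanish "perro" and Catalan "gos", both nouns.
entry = bidix_entry("perro", "gos", "n")
print(ET.tostring(entry, encoding="unicode"))
```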
●   Please download Apertium and lttoolbox from
    http://www.apertium.org
●   Live demonstration
●   Extend it to different languages
●   Application to Urdu–Hindi translation
●   http://sanskrit.uohyd.ernet.in/~anusaaraka/urdu/Urdu-Hindi-Translation/
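
Once Apertium and a language pair are installed, the `apertium` command reads text on standard input and takes the translation direction as an argument, optionally with `-d` pointing at a data directory. A minimal Python wrapper might look like the sketch below. The function names are made up for this example, and the mode name `es-ca` is an assumption: the exact mode names depend on which language-pair packages are installed, so check your own installation before relying on it.

```python
import shutil
import subprocess


def build_command(mode, data_dir=None):
    """Assemble the argument list for the apertium command line tool."""
    command = ["apertium"]
    if data_dir is not None:
        command += ["-d", data_dir]   # directory holding the language data
    command.append(mode)              # translation direction, e.g. "es-ca"
    return command


def translate(text, mode, data_dir=None):
    """Return the translated text, or None when apertium is not installed."""
    if shutil.which("apertium") is None:
        return None
    result = subprocess.run(
        build_command(mode, data_dir),
        input=text,
        capture_output=True,
        text=True,
        check=True,   # raise if apertium exits with an error
    )
    return result.stdout


print(build_command("es-ca"))
```

Keeping the command construction separate from the call makes it easy to inspect what would be run before any language pair for Indic languages is available locally.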
Contribute
●   Mail me at pmadhyastha@acm.org
●   Join #apertium on irc.freenode.net
●   https://lists.sourceforge.net/lists/listinfo/apertium-stu
