A Survey of Arabic
   Question Answering
Challenges, Tasks, Approaches,
   Tools, and Future Trends


  Ahmed Magdy & Dr. Mohamed Shaheen
               ACIT 2012
Outline
●   Motivation
●   Question Answering Tasks
    - Question Analysis, Passage Retrieval, and Answer
    Extraction
●   Arabic Language Challenges
●   Approaches
    - Stemming, Named Entity Recognition, Language
    Resources
●   Tools
●   Future Trends And Open Issues
Motivation
●   Arabic is the 6th most important language
●   More than 300 million speakers
●   Growing amount of Arabic content on the
    Internet
●   Growing demand for information
●   No existing survey covers Arabic
    Question Answering
Question Answering Tasks
●   Question Analysis
●   Passage Retrieval
●   Answer Extraction
Question Analysis
●   Tokenization and normalization
●   Stop-word removal
●   Named Entity Recognition (gazetteer, MaxEnt model)
●   Stemming of all words except named entities
●   Question focus determination by extracting the main NE
●   Keyword extraction and expansion
●   Expected answer type detection from question words (Name, Place,
    Date, Quantity)
●   Query generation: combining keywords into a Boolean formula
●   Experiments with cross-language Arabic/English QA
    were not promising because of translation ambiguity
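A minimal Python sketch of a few of these steps (normalization, stop-word removal, answer type detection from the question word, and Boolean query generation) follows. The normalization rules (stripping diacritics and tatweel, unifying alef forms, mapping ى to ي and ة to ه) are a common Arabic IR convention assumed here, not necessarily the rules of the surveyed systems. The stop-word list and the question-word mapping are tiny hypothetical examples, and NER and stemming are left out.

```python
import re

# Tashkeel (short-vowel) marks U+064B..U+0652 plus tatweel U+0640.
_DIACRITICS = re.compile("[\u064B-\u0652\u0640]")

def normalize(text: str) -> str:
    """Orthographic normalization commonly used in Arabic IR (assumed rules)."""
    text = _DIACRITICS.sub("", text)
    text = re.sub("[أإآ]", "ا", text)                # unify hamzated alef forms
    return text.replace("ى", "ي").replace("ة", "ه")  # alef maqsura, ta marbuta

# Hypothetical, deliberately tiny lists; keys are normalized like the input.
ANSWER_TYPES = {normalize(w): t for w, t in
                {"من": "NAME", "أين": "PLACE", "متى": "DATE", "كم": "QUANTITY"}.items()}
STOP_WORDS = {normalize(w) for w in {"في", "على", "إلى", "عن", "هل", "هو", "هي"}}

def analyze(question: str) -> dict:
    """Return the expected answer type, the keywords, and a Boolean query."""
    tokens = re.findall(r"\w+", normalize(question))  # \w matches Arabic letters
    # The answer type is read from the leading question word.
    answer_type = ANSWER_TYPES.get(tokens[0], "UNKNOWN") if tokens else "UNKNOWN"
    keywords = [t for t in tokens if t not in STOP_WORDS and t not in ANSWER_TYPES]
    return {"answer_type": answer_type,
            "keywords": keywords,
            "query": " AND ".join(keywords)}  # conjunction of all keywords
```

For example, `analyze("متى تأسست جامعة القاهرة؟")` yields the answer type `DATE` and the query `تاسست AND جامعه AND القاهره`. A real system would also expand keywords and keep named entities unstemmed, as listed above.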
Passage Retrieval
●   Systems used:
    –   Systems based on Salton’s vector space model
    –   The JIRS passage retrieval system
●   Retrieved passages are ranked by:
    –   Count of question and answer words
    –   Association between question and answer words
    –   Query-word weights
    –   Cosine similarity between document words and
        question words
    –   Distance density n-gram model
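Cosine similarity, one of the ranking criteria above, can be sketched as follows. This assumes passages are already tokenized and normalized, and it uses raw term frequencies without IDF weighting, which is a simplification; vector space systems usually weight terms (e.g. tf-idf).

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_passages(question_terms: list, passages: list) -> list:
    """Passage indices by decreasing similarity; sorted() is stable, so ties keep input order."""
    q = Counter(question_terms)
    scores = [cosine(q, Counter(p)) for p in passages]
    return sorted(range(len(passages)), key=lambda i: -scores[i])
```

A passage containing all three question terms outranks one containing only two, and a passage sharing no terms scores 0.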
Answer Extraction
●   Candidate answers are ranked by:
    –   Hand-crafted lexical patterns
    –   Position of the answer snippet
    –   Frequency of question words in the answer
    –   N-gram matching
    –   Presence of NEs of the expected answer type
    –   Semantic similarity between the question’s focus and
        the answer
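The ranking criteria above are usually combined into a single score. The sketch below shows one hypothetical way to do it, mixing three of the listed signals: whether the candidate's NE type matches the expected answer type, the share of question keywords found in its snippet, and the snippet's retrieval rank. The `Candidate` structure and the weights are illustrative assumptions, not values taken from the surveyed systems.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str      # candidate answer string
    ne_type: str   # label from a prior NER step, e.g. "DATE" (assumed available)
    snippet: list  # tokens of the snippet containing the candidate
    rank: int      # rank of that snippet in passage retrieval (0 = best)

def score(c: Candidate, expected_type: str, keywords: list,
          w_type: float = 2.0, w_overlap: float = 1.0, w_rank: float = 0.5) -> float:
    """Weighted sum of type match, keyword overlap, and snippet position (hypothetical weights)."""
    type_match = 1.0 if c.ne_type == expected_type else 0.0
    overlap = (sum(k in c.snippet for k in keywords) / len(keywords)) if keywords else 0.0
    position = 1.0 / (1 + c.rank)
    return w_type * type_match + w_overlap * overlap + w_rank * position

def best_answer(candidates: list, expected_type: str, keywords: list) -> str:
    return max(candidates, key=lambda c: score(c, expected_type, keywords)).text
```

Giving the type match the largest weight reflects the idea that a candidate of the wrong NE type is rarely correct, however well its snippet matches.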
Challenges
●   Arabic morphology is highly inflectional
    –   Many attached affixes (articles, prepositions, pronouns, etc.)

●   Arabic morphology is highly derivational
    –   10,000 roots and 120 patterns for derivation

●   No capital letters to mark named entities
    –   Unlike Latin-script languages

●   Scarcity of Arabic language resources
    –   Corpora, lexicons, and machine-readable dictionaries
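The derivational challenge can be made concrete: Arabic words are formed by interdigitating a (usually triliteral) root into a pattern, traditionally written with the template letters ف ع ل. The sketch below assumes triliteral roots and patterns that use ف, ع and ل only as slot markers. It shows how one root fans out into many surface words, which is exactly what stemmers and root extractors have to undo.

```python
# Traditional template letters: ف marks the 1st root consonant, ع the 2nd, ل the 3rd.
SLOTS = ("ف", "ع", "ل")

def derive(root: str, pattern: str) -> str:
    """Interdigitate a triliteral root into a pattern written with ف ع ل slots.

    Assumes the pattern uses ف, ع and ل only as slot markers.
    """
    if len(root) != 3:
        raise ValueError("this sketch only handles triliteral roots")
    # str.translate substitutes all three slot letters in a single pass.
    return pattern.translate(str.maketrans(dict(zip(SLOTS, root))))
```

With the root كتب, the patterns فاعل, مفعول, مفعل and فعال give كاتب (writer), مكتوب (written), مكتب (office) and كتاب (book).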
Approaches
●   Stemming
    –   Removing prefixes, suffixes, and infixes from words
    –   Matching roots against patterns
    –   Language-dependent rules
    –   Identifying the most frequent affixes statistically
●   Named Entity Recognition
    –   Maximum entropy (MaxEnt) models or CRFs
    –   ANERcorp and ANERgazet
●   Language Resources
    –   Arabic WordNet
    –   Arabic Penn Treebank
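To illustrate rule-based affix removal, here is a minimal light stemmer: it strips at most one prefix and one suffix and does not attempt root extraction or infix removal. The affix lists and the minimum stem length are hypothetical choices for this sketch. It is not the Khoja or ISRI algorithm.

```python
# Illustrative affix lists, longest first so that e.g. "وال" wins over "و".
PREFIXES = ("وال", "بال", "كال", "فال", "لل", "ال", "و")
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي")
MIN_STEM = 3  # hypothetical: never leave fewer than three letters

def light_stem(word: str) -> str:
    """Strip at most one prefix and one suffix, keeping at least MIN_STEM letters."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= MIN_STEM:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= MIN_STEM:
            word = word[:-len(s)]
            break
    return word
```

`light_stem("المعلمون")` returns `معلم`, and the length check stops `ولد` from losing its first letter. The same rules reduce `المدرسة` (school) to `مدرس` (teacher), which shows how aggressive stemming can conflate different words.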
Tools
●   NooJ for Arabic NLP
    –   Freeware linguistic engineering development environment built on C# .NET
    –   Supports regular expressions and context-free grammars
    –   Includes Arabic language resources (sample text and a dictionary)
●   Amine Platform
    –   Java platform for intelligent systems and multi-agent systems
    –   Used for semantic analysis of questions and answers
    –   Uses conceptual graphs, knowledge bases, and ontologies
●   JIRS: a Java passage retrieval system
    –   Searches based on question n-grams
    –   Based on the vector space model
    –   Simple N-gram Model (SNM)
    –   Term-weight N-gram Model (TNM)
    –   Distance N-gram Model
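JIRS scores passages by how many of the question's n-grams they contain, favouring longer matches. The function below is a simplified sketch of that idea, not the actual SNM, TNM or distance model formulas: each question n-gram found in the passage counts with weight n, and the total is normalized by the maximum possible score.

```python
def ngrams(tokens: list, n: int) -> set:
    """Set of contiguous n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def ngram_score(question: list, passage: list) -> float:
    """Share of the question's n-grams (n = 1..len(question)) present in the passage,
    with each n-gram weighted by its length n so longer matches count more."""
    matched = total = 0
    for n in range(1, len(question) + 1):
        q_grams = ngrams(question, n)
        total += n * len(q_grams)
        matched += n * len(q_grams & ngrams(passage, n))
    return matched / total if total else 0.0
```

A passage that contains the whole question as a contiguous phrase scores 1.0, while one that shares only scattered words scores much lower, even if its word overlap is similar.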
Tools [continued]
●   Arabic Stemmers
    –   Khoja Arabic stemmer (with a root dictionary)
    –   AraMorph (uses transliteration into Latin letters)
    –   Information Science Research Institute’s (ISRI) stemmer
        (without a root dictionary)
●   GATE (General Architecture for Text Engineering)
    –   Java-based platform comprising a tokenizer, a
        gazetteer, a sentence splitter, a part-of-speech tagger, a
        named entity transducer, and a coreference tagger
    –   Plugins for machine learning and parsing (Weka, RASP,
        MaxEnt, SVM Light)
    –   Support for managing ontologies and lexical resources such as WordNet
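Gazetteer lookup, one of the GATE components listed above (and the idea behind ANERgazet), matches token sequences against lists of known names. This matters for Arabic because capitalization gives no clue. The following is a minimal greedy longest-match sketch in Python; the gazetteer entries are hypothetical, the tokens are assumed to be normalized, and this is not GATE's own API.

```python
def gazetteer_lookup(tokens: list, gazetteer: dict, max_len: int = 4) -> list:
    """Greedy longest-match lookup; returns (entry, label, start_index) tuples."""
    i, found = 0, []
    while i < len(tokens):
        # Try the longest span first so multi-word names win over their parts.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            if span in gazetteer:
                found.append((span, gazetteer[span], i))
                i += n
                break
        else:  # no entry starts at position i
            i += 1
    return found
```

With a toy gazetteer such as `{"جامعه القاهره": "ORGANIZATION", "القاهره": "LOCATION"}`, longest match tags جامعه القاهره as one organization instead of tagging القاهره alone as a location.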
Tools [continued]
●   OpenNLP
    –   Supports NLP tasks such as tokenization, sentence segmentation,
        part-of-speech tagging, named entity extraction, chunking, parsing,
        and coreference resolution, using maximum entropy and
        perceptron-based machine learning
●   Stanford NLP
    –   Java framework with many NLP modules, including:
    –   Dependency parsers and a lexicalized PCFG parser
    –   Part-of-speech (POS) tagger
    –   CRF-based Named Entity Recognizer
    –   CRF-based Word Segmenter
    –   MaxEnt Text Classifier
    –   TokensRegex: regular expressions over tokens
Future Trends and Open Issues
●   More research on Arabic restricted-domain QA
    – Makes semantic tasks like word sense disambiguation easier
    – Domain rules affect how questions are posed and how answers
      are formulated
    – A restricted domain should be circumscribed, complex, and practical
    – E.g., agriculture, architectural engineering, or any other field of science
    – But not news and current events, which have no such constraints


●   Use of deep, application-dependent approaches
    – Use application-dependent constraints and rules to guide question
      analysis, answer extraction, and answer validation
    – Depending on the available resources
Future Trends and Open Issues [continued]
●   Intensive use of semantics
    –   Arabic QA has focused on morpho-syntactic approaches
    –   Very few systems have used Arabic WordNet
    –   Much remains to be done in word sense
        disambiguation, coreference resolution, and ontology-based
        reasoning
●   Use of theorem proving and deep reasoning
●   Use of logic-based and inference-based approaches
Summary
●   Motivation
●   Question Answering Tasks
    - Question Analysis, Passage Retrieval, and Answer
    Extraction
●   Arabic Language Challenges
●   Approaches
    - Stemming, Named Entity Recognition, Language
    Resources
●   Tools
●   Future Trends And Open Issues
Thank You


The full paper is available in the ACIT 2012 proceedings.
