[ RMLL 2013, Bruxelles – Thursday 11th July 2013 ]
Presentation of OpenNLP
Presenter: Dr Ir Robert Viseur
  1. [ RMLL 2013, Bruxelles – Thursday 11th July 2013 ] Presentation of OpenNLP. Presenter: Dr Ir Robert Viseur
  2. What is OpenNLP?
     • Toolkit for processing natural language text.
     • Project of the Apache Software Foundation.
     • Developed in Java.
     • Released under the Apache License, Version 2.0.
     • Download and documentation: http://opennlp.apache.org/.
  3. What are the features?
     • Support for common NLP tasks:
       • tokenization,
       • sentence segmentation,
       • part-of-speech tagging,
       • named entity extraction,
       • chunking.
  4. What is part-of-speech tagging?
     • Example: (figure not preserved in this transcript)
     • See more: http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html.
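As a rough illustration of what part-of-speech tagging produces: the OpenNLP command-line tagger prints each token joined to its tag with an underscore (Penn Treebank tags for the English models). The Python sketch below only parses such a line into (token, tag) pairs. The sample line and the helper name `parse_tagged` are illustrative assumptions, not taken from the slides.

```python
def parse_tagged(line):
    """Split whitespace-separated 'token_TAG' items into (token, tag) pairs."""
    pairs = []
    for item in line.split():
        # rpartition splits on the LAST underscore, so underscores that
        # belong to the token itself are preserved.
        token, _, tag = item.rpartition("_")
        pairs.append((token, tag))
    return pairs


# Hypothetical tagger output for a short English phrase.
sample = "Pierre_NNP Vinken_NNP ,_, 61_CD years_NNS old_JJ"
pairs = parse_tagged(sample)
print(pairs)
```

Here NNP marks a proper noun, CD a cardinal number, NNS a plural noun and JJ an adjective.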
  5. What is named entity extraction?
     • Example: (figure not preserved in this transcript)
     • See more: http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html.
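To give an idea of what the output looks like: the OpenNLP name finder marks entities inline, wrapping each one in `<START:type> ... <END>` markers. The Python sketch below pulls the marked spans of one type out of such text. The sample sentence, the regular expression and the helper name `extract_entities` are assumptions made for illustration.

```python
import re

# Matches '<START:type> some tokens <END>' spans (non-greedy, one span at a time).
SPAN = re.compile(r"<START:(\w+)>\s+(.*?)\s+<END>")


def extract_entities(tagged_text, entity_type="person"):
    """Return the text of every marked span of the requested type."""
    return [text for kind, text in SPAN.findall(tagged_text)
            if kind == entity_type]


# Hypothetical name finder output.
sample = ("<START:person> Pierre Vinken <END> , 61 years old , will join "
          "the board . Mr . <START:person> Vinken <END> is chairman .")
people = extract_entities(sample)
print(people)
```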
  6. How does it work? (1/2)
     • Each feature relies on pre-trained models.
     • Each pre-trained model is built for one language and one type of use.
     • Supported languages: da, de, en, es, nl, pt, se.
     • Warnings:
       – The functional coverage varies between languages.
       – The French language is not supported!
     • See http://opennlp.sourceforge.net/models-1.5/.
     • Can be used from the command line or as a Java library.
     • Warning: model loading time when using the CLI.
  7. How does it work? (2/2)
     • Example (English vs Spanish languages): (figure not preserved in this transcript)
  8. What are the selection criteria?
     • Support for the product.
     • License.
     • Available languages.
     • Precision / Recall.
     • Speed of text processing.
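For the precision / recall criterion, a minimal sketch of how the two measures can be computed for an extractor, given the set of items it returned and a hand-made reference set. Precision is the share of returned items that are correct; recall is the share of expected items that were found. All names and data below are hypothetical.

```python
def precision_recall(predicted, expected):
    """Compute (precision, recall) for two collections of extracted items."""
    predicted, expected = set(predicted), set(expected)
    true_positives = len(predicted & expected)
    # Guard against empty sets to avoid dividing by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical reference annotation and extractor output.
expected = {"Alice Martin", "Bob Dupont", "Carol Smith", "Dan Jones"}
predicted = {"Alice Martin", "Bob Dupont", "Bruxelles"}
p, r = precision_recall(predicted, expected)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this made-up case two of the three returned items are correct (precision 2/3) and two of the four expected items are found (recall 1/2).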
  9. Are there free (as in freedom) alternative tools?
     • Other lightweight tools:
       • Stanford Log-linear Part-Of-Speech Tagger (POST),
       • Stanford Named Entity Recognizer (NER),
       • TagEN,
       • Java Automatic Term Extraction toolkit.
     • Frameworks:
       • In Java: UIMA, GATE.
       • In other languages: NLTK (Python).
  10. Example: tag cloud creation (1/6)
     • Starting point: a website.
     • Example: www.adacore.com.
     • What we want (from the website content):
       • common tag cloud,
       • circular tag cloud.
     • Main steps: crawl, cleaning of the HTML documents, named entity (person) extraction and terminology extraction (plus merge), and display (tag cloud).
  11. Example: tag cloud creation (2/6)
     • Cleaning:
       • Remove the HTML tags and keep only the useful content.
     • Warnings:
       • NLP tools are sensitive to noise in raw data.
       • Pay attention to the language of the document.
     • Use of an HTML boilerplate removal tool (HTML -> TXT).
       • Tool: Boilerpipe.
       • See http://code.google.com/p/boilerpipe/.
     • Next: normalization of the text.
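The cleaning step can be sketched with Python's standard `html.parser` module. This is not Boilerpipe: it only strips tags and skips `<script>` and `<style>` content, whereas a boilerplate removal tool also discards navigation, footers and similar noise. The whitespace collapsing at the end stands in for the normalization step, whose exact content the slides do not detail. The class name, function name and sample page are assumptions.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping <script> and <style> content."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Strip tags, then collapse runs of whitespace into single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


# Hypothetical page content.
page = ("<html><head><style>p {color: red}</style></head><body>"
        "<h1>News</h1><p>Robert  Viseur\n presented OpenNLP.</p>"
        "<script>var x = 1;</script></body></html>")
text = html_to_text(page)
print(text)
```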
  12. Example: tag cloud creation (3/6)
     • Named entity extraction.
       • Standard behaviour in OpenNLP: OpenNLP inserts tags into the text.
       • Here: extraction of Person named entities.
     • Terminology extraction.
       • First: part-of-speech tagging (POST).
       • Next: identification and filtering (with a threshold) of:
         • collocations (e.g. Noun_Noun, Adjective_Noun, ...),
         • proper names (often brands or people).
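The terminology extraction step can be sketched as follows, assuming the text has already been tagged into (token, tag) pairs with Penn Treebank style tags. Adjacent pairs matching a noun + noun or adjective + noun pattern are counted, and only those seen at least `threshold` times are kept. The patterns, the threshold value and the sample data are assumptions; the slides do not give the exact rules used.

```python
from collections import Counter

# Tag-prefix patterns kept as collocation candidates:
# noun + noun and adjective + noun. Matching on the first two letters
# means NNS, NNP, JJR, etc. are covered as well.
PATTERNS = {("NN", "NN"), ("JJ", "NN")}


def collocations(tagged_tokens, threshold=2):
    """Count adjacent word pairs matching PATTERNS; keep the frequent ones."""
    counts = Counter()
    for (w1, t1), (w2, t2) in zip(tagged_tokens, tagged_tokens[1:]):
        if (t1[:2], t2[:2]) in PATTERNS:
            counts[(w1.lower(), w2.lower())] += 1
    return {pair: n for pair, n in counts.items() if n >= threshold}


# Hypothetical tagged text.
tagged = [("Static", "JJ"), ("analysis", "NN"), ("finds", "VBZ"),
          ("bugs", "NNS"), (".", "."), ("Static", "JJ"), ("analysis", "NN"),
          ("tools", "NNS"), ("help", "VBP"), (".", ".")]
result = collocations(tagged)
print(result)
```

With the default threshold, "analysis tools" (seen once) is filtered out while "static analysis" (seen twice) is kept.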
  13. Example: tag cloud creation (4/6)
     • Process (diagram flattened in this transcript; labels listed in reading order):
       • Crawl: Website (Internet) -> Website (local).
       • Raw HTML document -> Conversion to text -> Normalization.
       • POS tagging -> Terminology extraction; NE extraction.
       • Tags -> Merge -> Tag cloud (for a website).
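The last two boxes of the process (merge, then tag cloud) can be sketched as below: the counts produced by the two extractors are summed, then each count is mapped linearly onto a font size. The linear scaling, the size bounds and the sample counts are assumptions; the slides do not say how the tag weights were computed.

```python
from collections import Counter


def merge_tags(*tag_counts):
    """Sum the counts of tags coming from several extractors."""
    merged = Counter()
    for counts in tag_counts:
        merged.update(counts)
    return merged


def font_sizes(tags, min_size=10, max_size=40):
    """Map each tag count linearly onto a font size between the two bounds."""
    if not tags:
        return {}
    low, high = min(tags.values()), max(tags.values())
    span = high - low or 1  # avoid dividing by zero when all counts are equal
    return {tag: round(min_size + (n - low) * (max_size - min_size) / span)
            for tag, n in tags.items()}


# Hypothetical outputs of the NE extraction and terminology extraction steps.
people = {"robert viseur": 3}
terms = {"static analysis": 9, "tag cloud": 6}
sizes = font_sizes(merge_tags(people, terms))
print(sizes)
```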
  14. Example: tag cloud creation (5/6)
     • Result: common tag cloud.
  15. Example: tag cloud creation (6/6)
     • Result: circular tag cloud.
  16. Thanks for your attention. Any questions?
  17. Contact
     Dr Ir Robert Viseur
     Email (@CETIC): robert.viseur@cetic.be
     Email (@UMONS): robert.viseur@umons.ac.be
     Phone: 0032 (0) 479 66 08 76
     Website: www.robertviseur.be
     This presentation is released under the CC-BY-ND license.
