[ RMLL 2013, Bruxelles – Thursday 11th July 2013 ]
Presentation of OpenNLP
Presenter: Dr Ir Robert Viseur

What is OpenNLP?
• Toolkit for processing natural language text.
• Project of the Apache Software Foundation.
• Developed in Java.
• Released under the Apache License, Version 2.0.
• Download and documentation: http://opennlp.apache.org/.

What are the features?
• Support for common NLP tasks:
  • tokenization,
  • sentence segmentation,
  • part-of-speech tagging,
  • named entity extraction,
  • chunking.

What is part-of-speech tagging?
• Example:
• See more: http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html.
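
A part-of-speech tagger assigns a grammatical category to each token. As a minimal sketch, assuming the `token_TAG` output format printed by the OpenNLP command-line tagger with Penn Treebank tags (the sample is the opening of the sentence used in the OpenNLP manual), the Python helper below splits such a line back into (token, tag) pairs. The function name `parse_tagged` is hypothetical.

```python
def parse_tagged(line):
    """Split 'token_TAG token_TAG ...' into a list of (token, tag) pairs."""
    pairs = []
    for item in line.split():
        # rpartition splits on the LAST underscore, so a token that
        # itself contains '_' keeps its full text.
        token, _, tag = item.rpartition("_")
        pairs.append((token, tag))
    return pairs


# Assumed sample output of the English POS model (Penn Treebank tags).
sample = "Pierre_NNP Vinken_NNP ,_, 61_CD years_NNS old_JJ"
tagged = parse_tagged(sample)
print(tagged)
```

Here NNP marks a proper noun, CD a number, NNS a plural noun and JJ an adjective.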

What is named entity extraction?
• Example:
• See more: http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html.
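
Named entity extraction marks spans of text that refer to people, places, organizations and so on. A minimal sketch, assuming the inline `<START:type> ... <END>` markup that the OpenNLP command-line name finder writes into its output: the Python helper below collects the marked spans. The names `NE_PATTERN` and `extract_entities` are hypothetical.

```python
import re

# Matches '<START:type> some tokens <END>' and captures type and text.
NE_PATTERN = re.compile(r"<START:(\w+)>\s*(.*?)\s*<END>")


def extract_entities(text, entity_type=None):
    """Return (type, text) pairs, optionally restricted to one entity type."""
    found = []
    for kind, name in NE_PATTERN.findall(text):
        if entity_type is None or kind == entity_type:
            found.append((kind, name))
    return found


# Assumed sample of name finder output for the person model.
sample = "<START:person> Pierre Vinken <END> , 61 years old , will join the board ."
people = extract_entities(sample, "person")
print(people)
```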

How does it work? (1/2)
• Each feature relies on a pre-trained model.
• Each pre-trained model is built for one language and one task.
• Supported languages: da, de, en, es, nl, pt, se.
• Warnings:
– Functional coverage varies from one language to another.
– French is not supported!
• See http://opennlp.sourceforge.net/models-1.5/.
• Can be used from the command line or as a Java library.
• Warning: model loading time is noticeable with the CLI.
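
Since the CLI has to load a model before it can process anything, one way to limit the cost is to send many lines through a single process. The Python sketch below is written under several assumptions: an `opennlp` launcher is on the PATH, the English model files from the models-1.5 page sit in the working directory, and the tool names are those shown in the 1.5 manual. The helpers `build_command` and `run_batch` are hypothetical.

```python
import subprocess

# CLI tool name -> English model file (assumed to be downloaded locally).
ENGLISH_MODELS = {
    "SentenceDetector": "en-sent.bin",
    "TokenizerME": "en-token.bin",
    "POSTagger": "en-pos-maxent.bin",
    "TokenNameFinder": "en-ner-person.bin",
}


def build_command(tool, models=ENGLISH_MODELS, launcher="opennlp"):
    """Build the argument list for one CLI tool and its model."""
    return [launcher, tool, models[tool]]


def run_batch(tool, lines):
    """Send all lines through ONE process so the model is loaded only once."""
    result = subprocess.run(
        build_command(tool),
        input="\n".join(lines) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


print(build_command("POSTagger"))
```

Calling `run_batch("POSTagger", sentences)` once for a whole document, rather than once per sentence, pays the model loading time a single time. Using OpenNLP as a Java library avoids the problem differently, since the model can stay in memory.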

How does it work? (2/2)
• Example (English vs Spanish):

What are the selection criteria?
• Support available for the product.
• License.
• Available languages.
• Precision / Recall.
• Speed of text processing.
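
Precision is the share of extracted items that are correct, and recall is the share of expected items that were actually extracted. A minimal Python sketch of how the two could be measured for an extraction tool against a hand-annotated reference; the function name and the sample names are invented for illustration.

```python
def precision_recall(predicted, gold):
    """Compare extracted items with a reference set; return (precision, recall)."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Made-up data: what a tool extracted vs. what a human annotated.
predicted = {"Alice Martin", "Bob Durand", "Paris"}
gold = {"Alice Martin", "Bob Durand", "Carol Petit", "Dan Leroy"}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this made-up case two of the three extracted names are correct (precision 2/3) and two of the four expected names are found (recall 1/2).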

Are there free (as in freedom) alternative tools?
• Other lightweight tools:
  • Stanford Log-linear Part-Of-Speech Tagger (POST),
  • Stanford Named Entity Recognizer (NER),
  • TagEN,
  • Java Automatic Term Extraction toolkit.
• Frameworks:
  • In Java: UIMA, GATE.
  • In other languages: NLTK (Python).

Example: tag cloud creation (1/6)
• Starting point: a website.
• Example: www.adacore.com.
• What we want (from the website content):
  • a common tag cloud,
  • a circular tag cloud.
• Main steps: crawl, cleaning of the HTML documents, named entity (person) and terminology extraction (+ merge), and display (tag cloud).

Example: tag cloud creation (2/6)
• Cleaning:
  • Remove the HTML tags and keep only the useful content.
• Warnings:
  • NLP tools are sensitive to noise in raw data.
  • Pay attention to the language of the document.
• Use a boilerplate-removal tool (HTML -> TXT).
  • Tool: Boilerpipe.
  • See http://code.google.com/p/boilerpipe/.
• Next: normalization of the text.
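
The cleaning and normalization steps can be sketched in Python as follows. This is not Boilerpipe: it is a much simpler stand-in built on the standard library that only strips tags and skips `script` and `style` content, without detecting navigation or footer boilerplate. The names `TextExtractor`, `html_to_text` and `normalize`, and the sample page, are hypothetical.

```python
import re
import unicodedata
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring the content of script and style tags."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(parser.parts)


def normalize(text):
    """Unify Unicode forms and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


page = ("<html><head><style>p {color: red}</style></head>"
        "<body><h1>News</h1><p>Ada  was\nreleased.</p></body></html>")
clean = normalize(html_to_text(page))
print(clean)
```

A dedicated boilerplate remover is still preferable on real pages, because menus and footers left in the text add exactly the kind of noise that NLP tools are sensitive to.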

Example: tag cloud creation (3/6)
• Named entity extraction.
  • Default OpenNLP behaviour: entities are marked by tags inserted into the text.
  • Here: extraction of Person named entities.
• Terminology extraction.
  • First: part-of-speech tagging (POST).
  • Next: identification and filtering (threshold) of:
    • collocations (e.g. Noun_Noun, Adjective_Noun, ...),
    • proper names (often brands or people).
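
The collocation step can be illustrated with a short Python sketch. It assumes (token, tag) pairs with Penn Treebank tags as input, keeps adjacent Adjective_Noun and Noun_Noun pairs, and drops those seen fewer times than a threshold. The function names, the threshold value and the sample tokens are assumptions made for illustration, not the presenter's implementation.

```python
from collections import Counter

# Coarse tag patterns to keep: adjective + noun, noun + noun.
# Proper nouns (NNP, NNPS) also fall under the 'NN' prefix.
PATTERNS = {("JJ", "NN"), ("NN", "NN")}


def coarse(tag):
    """Reduce a Penn Treebank tag to its two-letter family (NNS -> NN)."""
    return tag[:2]


def collocations(tagged_tokens, threshold=2):
    """Count adjacent pairs matching PATTERNS; keep those seen >= threshold times."""
    counts = Counter()
    for (w1, t1), (w2, t2) in zip(tagged_tokens, tagged_tokens[1:]):
        if (coarse(t1), coarse(t2)) in PATTERNS:
            counts[(w1.lower(), w2.lower())] += 1
    return {pair: n for pair, n in counts.items() if n >= threshold}


# Made-up tagged text.
tagged = [
    ("static", "JJ"), ("analysis", "NN"), ("finds", "VBZ"), ("bugs", "NNS"), (".", "."),
    ("Static", "JJ"), ("analysis", "NN"), ("tools", "NNS"), ("help", "VBP"), (".", "."),
]
terms = collocations(tagged, threshold=2)
print(terms)
```

Here "analysis tools" occurs only once and is filtered out, while "static analysis" occurs twice and is kept as a candidate term.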

Example: tag cloud creation (4/6)
• Process:
[Diagram] Website (Internet) -> Crawl -> Website (local) -> Raw HTML document -> Conversion to text -> Normalization -> POS tagging -> Terminology extraction and NE extraction -> Merge -> Tags -> Tag cloud (for a website)
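
The last stages of the process, merging the extracted tags and preparing the display, can be sketched in Python. The sketch assumes each extractor yields a tag-to-count mapping and that the cloud scales font size linearly with the count; the function names, the size range and the sample counts are all hypothetical.

```python
from collections import Counter


def merge_tags(*sources):
    """Add up the counts from several tag -> count mappings."""
    merged = Counter()
    for source in sources:
        merged.update(source)
    return merged


def font_sizes(tags, min_size=10, max_size=40):
    """Map each tag count linearly onto a font size range."""
    if not tags:
        return {}
    low, high = min(tags.values()), max(tags.values())
    span = high - low
    sizes = {}
    for tag, count in tags.items():
        ratio = (count - low) / span if span else 1.0
        sizes[tag] = round(min_size + ratio * (max_size - min_size))
    return sizes


# Made-up counts from the two extraction branches.
people = {"alice martin": 3}
terms = {"static analysis": 11, "code review": 7, "alice martin": 1}

cloud = font_sizes(merge_tags(people, terms))
print(cloud)
```

A tag found by both branches, such as the person name in this sample, has its counts added before the sizes are computed.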

Example: tag cloud creation (5/6)
• Result: common tag cloud.

Example: tag cloud creation (6/6)
• Result: circular tag cloud.

Thanks for your attention.
Any questions?

Contact
Dr Ir Robert Viseur
Email (@CETIC): robert.viseur@cetic.be
Email (@UMONS): robert.viseur@umons.ac.be
Phone: 0032 (0) 479 66 08 76
Website: www.robertviseur.be
This presentation is licensed under CC-BY-ND.
