SlideShare a Scribd company logo
1 of 21
Download to read offline
Giuseppe Rizzo <giuseppe.rizzo@eurecom.fr>
What is a Named Entity recognition task?
 A task that aims to locate and classify the name of a
  person or an organization, a location, a brand, a
  product, a numeric expression including time, date,
  money and percent in a textual document




    12 March 2012     Seminar @ Ecole Centrale, Paris   2/21
History of NER benchmarks
 CoNLL 2003 and CoNLL 2005
   schema (4 types): person, organization, location and miscellaneous
   language independent task

 ACE 2004, ACE 2005 and ACE 2007
   schema (7 types): person, organization, location, facility, weapon,
    vehicle and geo-political entity
   entity recognition, not just name (e.g. description, pronoun)
   find relationships among entities extracted

 TAC 2009 (Knowledge Base Track)
   schema (3 types): person, organization and location
   create a knowledge base from the named entities extracted

 ETAPE 2012 (Named Entity Task)
   schema: Quaero (7 main types, 32 sub-types)
    12/03/2012 -    Multimedia Semantics and Interaction - Séminaire Ecole Centrale   -3
NER Tools

 Standalone software
   GATE
   Stanford CoreNLP
   Temis

 Web APIs




   12/03/2012 -   Multimedia Semantics and Interaction - Séminaire Ecole Centrale   -4
Factual comparison of 10 Web NER tools
                   Alchemy   DBpe       Evri           Extr           Lup           Calais             Saplo    WM     Yahoo    Zemanta
                             dia
Granularity        OEN       OEN        OED            OEN            OEN           OEN                OED      OEN    OEN      OED

Language           EN        EN         EN             EN             EN            EN                 EN       EN     EN       EN
                   FR        GR*        IT                            FR            FR                 SW       FR
                   GR        PT*                                      IT            SP                          SP
                   IT        SP*
                   PT
                   RU
                   SP
                   SW
Quota              30000     unl        3000           3000           unl           50000              1333     unl    5000     10000
(calls/day)

Sample             C/C++     Java       AS             Java           N/A           Java               Java     Java   JS       C#
Clients            C#        JS         Java                                                           JS       Perl   PHP      Java
                   Java      PHP        PHP                                                            PHP                      JS
                   Perl                                                                                Python                   Perl
                   PHP5                                                                                                         PHP
                   Python                                                                                                       Python
                   Ruby                                                                                                         Ruby
Content            150KB     452KB      8KB            32KB           20KB          8KB                26KB     80KB   7769KB   970KB
chunk
              12/03/2012 -           Multimedia Semantics and Interaction - Séminaire Ecole Centrale             -5
Alchemy        DBpedia     Evri        Extr         Lup             Calais   Saplo   WM        Yahoo    Zemanta


Response      JSON           Factual comparison (II)
                             HTML HTM HTML HTML JSON JSON                                          JSON      JSON     XML
Format        MicroF         JSON        L           JSON         JSON            MicroF           XML       XML      JSON
              XML            RDF         JSO         RDF          RDFa                                                RDF
              RDF            XML         N           XML          XML
                                         RDF
Entity        324            320         5           34           319             95       5       7         13       81
type
number

Entity        N/A            char        N/A         word         range of        char     N/A     POS       range    N/A
position                     offset                  offset       chars           offset           offset    of
                                                                                                             chars

Classif.      Alchemy        DBpedia     Evri        DBpe         DBpedia         OpenC    N/A     ESTER     Yahoo    FreeBase
Ontologies                   FreeBase                dia          LinkedM         alais
                             Scema.org                            DB


Defer.        DBpeda         DBpedia     Evri        DBpe         DBpedia         OpenC    N/A     DBpedia   Wikipe   Wikipedia
Vocabulari    FreeBase                               dia          LinkedM         alais            Geonam    dia      IMDB
es            USCensus                                            DB                               es                 MusicBrai
              UMBEL                                                                                CIAFact            nz
              OpenCyc                                                                              book               Amazon
              YAGO                                                                                 Wikicom            YouTube
              MusicBrainz                                                                          panies             TechCrun
              CrunchBase                                                                                              ch
                                                                                                                      ...




             12 March 2012                      Seminar @ Ecole Centrale, Paris                    6/21
Human made benchmarks

 We performed two evaluation experiments:
        WEKEX 2011
        ISWC 2011


                                 t = (entity, type, URI, relevant)


        Each field has been rated by a Boolean value: true if
         correct, false otherwise
Rizzo G., Troncy R. (2011), NERD: A Framework for Evaluating Named Entity Recognition Tools in the Web of Data.
In: International Semantic Web Conference 2011 (ISWC'11), Bonn, Germany.



         12/03/2012 -              Multimedia Semantics and Interaction - Séminaire Ecole Centrale   -7
WEKEX 2011 Benchmark

  Controlled experiment
         4 human raters
         10 English news articles (5 from BBC and 5 from The
          New York Times)
         Each rater evaluated each article for 5 extractors
               200 total evaluations

  Fleiss's kappa score
         moderate agreement among raters



Rizzo G., Troncy R. (2011), NERD: Evaluating Named Entity Recognition Tools in the Web of Data.
In: (ISWC'11) Workshop on Web Scale Knowledge Extraction (WEKEX'11), Bonn, Germany.


          12/03/2012 -              Multimedia Semantics and Interaction - Séminaire Ecole Centrale   -8
Results




                                                                                          different behavior
                                                                                        for different sources




  12/03/2012 -   Multimedia Semantics and Interaction - Séminaire Ecole Centrale   -9
ISWC 2011 Benchmark

 Controlled experiment
   10 human raters
   2 English news articles from The New York Times
   each rater evaluated each article for 6 extractors
        120 total evaluations

 Fleiss's kappa score
   substantial
    agreement among
    raters




   12/03/2012 -      Multimedia Semantics and Interaction - Séminaire Ecole Centrale   - 10
Results




  12/03/2012 -   Multimedia Semantics and Interaction - Séminaire Ecole Centrale   - 11
What is NERD?
    ontology1            REST API2
                        UI3                                     The NERD ontology has been
                                                                 integrated in the NIF project,
                                                                a EU FP7 in the context of the
                                                                  LOD2: Creating Knowledge
                                                                     out of Interlinked Data

1 http://nerd.eurecom.fr/ontology
2 http://nerd.eurecom.fr/api/application.wadl
3 http://nerd.eurecom.fr/


        12 March 2012         Seminar @ Ecole Centrale, Paris           12/21
NERD Ontology




 Align the taxonomies used by the extractors


   12/03/2012 -   Multimedia Semantics and Interaction - Séminaire Ecole Centrale   - 13
Building the NERD ontology                                                         NERD type      Occurrence
                                                                                   Person                 10
                                                                                   Organization           10
                                                                                   Country                  6
                                                                                   Company                  6
                                                                                   Location                 6
                                                                                   Continent                5
                                                                                   City                     5
                                                                                   RadioStation             5
                                                                                   Album                    5
                                                                                   Product                  5
                                                                                   ...                     ...




  12/03/2012 -   Multimedia Semantics and Interaction - Séminaire Ecole Centrale         - 14
NERD REST API


   /document
   /user                                        GET,
   /annotation/{extractor}                     POST,                               JSON/RDF*
   /extraction                                  PUT,
   /evaluation                                DELETE
                                                                       “entities” : [{
   ...                                                                    “entity”: “Tim Berners-Lee” ,
                                                                          “type”: “Person” ,
                                                                          “uri”: "http://dbpedia.org/resource/Tim_berners_lee",
                                                                          “nerdType”: "http://nerd.eurecom.fr/ontology#Person",
                                                                          “startChar”: 30,
                                                                          “endChar”: 45,
                                                                          “confidence”: 1,
                                                                          “relevance”: 0.5
                                                                       }]


Rizzo G., Troncy R. (2012), NERD: A Framework for Unifying Named Entity Recognition and Disambiguation Web Extraction
Tools. In: European chapter of the Association for Computational Linguistics (EACL'12), Avignon, France.


          12/03/2012 -            Multimedia Semantics and Interaction - Séminaire Ecole Centrale   - 15
NIF: NLP Interchange Format Framework

 Different outputs for the NLP tools
  OpenCalais                                                              DBpedia Spotlight
  "_type": "Organization",                                                "@URI": "http://dbpedia.org/resource/DBpedia",
  “name": "North Atlantic Treaty Organization",                           "@types": "DBpedia:Software,DBpedia:Work”
  "organizationtype": "governmental civilian",                            "@surfaceForm": "dbpedia",
  "nationality": "N/A",                                                   "@offset": "0",
  "_typeReference":                                                       "@support": "11",
  http://s.opencalais.com/1/type/em/e/Organization",                      "@similarityScore": "0.2387271374464035",
  ...                                                                     …

 Manual effort required for integration or reuse
   time consuming
   need to capture the definition of the attributes used in the
    response format

 NIF uses RDF for representing NER results as
  Linked Data

     12/03/2012 -               Multimedia Semantics and Interaction - Séminaire Ecole Centrale      - 16
Named Entities as textual annotations

 Let's consider the document:
 http://www.w3.org/DesignIssues/LinkedData.html

   The Semantic Web isn't just about putting data on the web. It is about
   making links, so that a person or machine can explore the web of data.
   With linked data, when you have some of it, you can find other, related,
   data.….
   All the above plus, Use open standards from W3C (RDF and SPARQL) to
   identify things, so that people can point at your stuff
   ...

  entities: {
    …
    [entity: W3C, startChar: 23107, endChar: 23110],
    …
  }
    12/03/2012 -     Multimedia Semantics and Interaction - Séminaire Ecole Centrale   - 17
NERD meets NIF
                                                                      Model documents through a
                                                                      set of strings deferencable on
                                                                      the Web
                                                    : offset_23107_ 23110 a str:String ;
                                                         str:referenceContext :offset_0_26546 .

                                                                      Map string to entity
                                                    : offset_23107_ 23110 sso:oen dbpedia:W3C.

                                                                      Classification

                                                    dbpedia:W3C                            rdf:type   nerd:Organization .

Rizzo G, Troncy R., Hellmann S. and Bruemmer M. (2012), NERD meets NIF: Lifting NLP Extraction Results to the Linked
Data Cloud. In: (LDOW'12) Linked Data on the Web (WWW'12), Lyon, France.


          12/03/2012 -            Multimedia Semantics and Interaction - Séminaire Ecole Centrale     - 18
NERD Demo




  12/03/2012 -   Multimedia Semantics and Interaction - Séminaire Ecole Centrale   - 19
NERD Timeline and Future Work

  beginning                       Comparison of named entity extractors

                                                          NERD benchmarks

                             NERD REST API and NERD ontology

                        Lift NERD output results to the LOD cloud

          today
                      NERD “smart” service: combining the best of
                                    all NER tools
                          Dashboard for improving the NERD user
                                        experience


  12/03/2012 -    Multimedia Semantics and Interaction - Séminaire Ecole Centrale   - 20
http://nerd.eurecom.fr

                            @giusepperizzo @rtroncy #nerd


                 http://www.slideshare.net/giusepperizzo




12/03/2012 -     Multimedia Semantics and Interaction - Séminaire Ecole Centrale   - 21

More Related Content

What's hot

OWL: Yet to arrive on the Web of Data?
OWL: Yet to arrive on the Web of Data?OWL: Yet to arrive on the Web of Data?
OWL: Yet to arrive on the Web of Data?
Aidan Hogan
 
DZone%20-%20Essential%20Ruby
DZone%20-%20Essential%20RubyDZone%20-%20Essential%20Ruby
DZone%20-%20Essential%20Ruby
tutorialsruby
 
Sparq lreference 1.8-us
Sparq lreference 1.8-usSparq lreference 1.8-us
Sparq lreference 1.8-us
Ajay Ohri
 
Dependency Parsing-based QA System for RDF and SPARQL
Dependency Parsing-based QA System for RDF and SPARQLDependency Parsing-based QA System for RDF and SPARQL
Dependency Parsing-based QA System for RDF and SPARQL
Fariz Darari
 

What's hot (20)

The Gremlin in the Graph
The Gremlin in the GraphThe Gremlin in the Graph
The Gremlin in the Graph
 
Hummingbird - Open Source for Small Satellites - GSAW 2012
Hummingbird - Open Source for Small Satellites - GSAW 2012Hummingbird - Open Source for Small Satellites - GSAW 2012
Hummingbird - Open Source for Small Satellites - GSAW 2012
 
OWL: Yet to arrive on the Web of Data?
OWL: Yet to arrive on the Web of Data?OWL: Yet to arrive on the Web of Data?
OWL: Yet to arrive on the Web of Data?
 
RDF Tutorial - SPARQL 20091031
RDF Tutorial - SPARQL 20091031RDF Tutorial - SPARQL 20091031
RDF Tutorial - SPARQL 20091031
 
Contexts and Importing in RDF
Contexts and Importing in RDFContexts and Importing in RDF
Contexts and Importing in RDF
 
DZone%20-%20Essential%20Ruby
DZone%20-%20Essential%20RubyDZone%20-%20Essential%20Ruby
DZone%20-%20Essential%20Ruby
 
Semantic web assignment 3
Semantic web assignment 3Semantic web assignment 3
Semantic web assignment 3
 
Ontologies in RDF-S/OWL
Ontologies in RDF-S/OWLOntologies in RDF-S/OWL
Ontologies in RDF-S/OWL
 
Sparq lreference 1.8-us
Sparq lreference 1.8-usSparq lreference 1.8-us
Sparq lreference 1.8-us
 
Semantic Web(Web 3.0) SPARQL
Semantic Web(Web 3.0) SPARQLSemantic Web(Web 3.0) SPARQL
Semantic Web(Web 3.0) SPARQL
 
Natural Language Processing for Amazigh Language
Natural Language Processing for Amazigh LanguageNatural Language Processing for Amazigh Language
Natural Language Processing for Amazigh Language
 
Semantic web assignment 2
Semantic web assignment 2Semantic web assignment 2
Semantic web assignment 2
 
A Semantic Multimedia Web (Part 2)
A Semantic Multimedia Web (Part 2)A Semantic Multimedia Web (Part 2)
A Semantic Multimedia Web (Part 2)
 
Semantic web final assignment
Semantic web final assignmentSemantic web final assignment
Semantic web final assignment
 
OpenWN-PT: a Brazilian Wordnet for all
OpenWN-PT: a Brazilian Wordnet for allOpenWN-PT: a Brazilian Wordnet for all
OpenWN-PT: a Brazilian Wordnet for all
 
Dependency Parsing-based QA System for RDF and SPARQL
Dependency Parsing-based QA System for RDF and SPARQLDependency Parsing-based QA System for RDF and SPARQL
Dependency Parsing-based QA System for RDF and SPARQL
 
2010 10 provxg_datagovuk
2010 10 provxg_datagovuk2010 10 provxg_datagovuk
2010 10 provxg_datagovuk
 
Ist16-04 An introduction to RDF
Ist16-04 An introduction to RDF Ist16-04 An introduction to RDF
Ist16-04 An introduction to RDF
 
SPARQL - Basic and Federated Queries
SPARQL - Basic and Federated QueriesSPARQL - Basic and Federated Queries
SPARQL - Basic and Federated Queries
 
Rdf with contexts
Rdf with contextsRdf with contexts
Rdf with contexts
 

Viewers also liked (20)

NAME THAT NERD
NAME THAT NERDNAME THAT NERD
NAME THAT NERD
 
Journalism Today - update
Journalism Today - updateJournalism Today - update
Journalism Today - update
 
Double Vision1
Double Vision1Double Vision1
Double Vision1
 
Edisi 17 Feb Medan
Edisi 17 Feb MedanEdisi 17 Feb Medan
Edisi 17 Feb Medan
 
Edisi 6 Feb Nas
Edisi 6 Feb NasEdisi 6 Feb Nas
Edisi 6 Feb Nas
 
EARN\'s Vision For American Prosperity
EARN\'s  Vision For American ProsperityEARN\'s  Vision For American Prosperity
EARN\'s Vision For American Prosperity
 
19 Feb Nas
19 Feb Nas19 Feb Nas
19 Feb Nas
 
Edisi Nas 23 Jan
Edisi Nas 23 JanEdisi Nas 23 Jan
Edisi Nas 23 Jan
 
Edisi 3 Feb Aceh
Edisi 3 Feb AcehEdisi 3 Feb Aceh
Edisi 3 Feb Aceh
 
11jun nas
11jun nas11jun nas
11jun nas
 
To leave or not to leave
To leave or not to leaveTo leave or not to leave
To leave or not to leave
 
13mei nas
13mei nas13mei nas
13mei nas
 
Innoz Presentation on SMSGYAN at MIT-EmTech 2011,Bangalore.
Innoz Presentation on SMSGYAN at MIT-EmTech 2011,Bangalore.Innoz Presentation on SMSGYAN at MIT-EmTech 2011,Bangalore.
Innoz Presentation on SMSGYAN at MIT-EmTech 2011,Bangalore.
 
Edisi 26 Nov Aceh
Edisi 26 Nov AcehEdisi 26 Nov Aceh
Edisi 26 Nov Aceh
 
Als
AlsAls
Als
 
10jun nas
10jun nas10jun nas
10jun nas
 
Coches ColeccióN Mod
Coches ColeccióN ModCoches ColeccióN Mod
Coches ColeccióN Mod
 
Rm3-A Device
Rm3-A DeviceRm3-A Device
Rm3-A Device
 
Pembelajaran berbantukan web
Pembelajaran berbantukan webPembelajaran berbantukan web
Pembelajaran berbantukan web
 
Edisi 15 mei nas
Edisi 15 mei nasEdisi 15 mei nas
Edisi 15 mei nas
 

Similar to The NERD project

Semantic Technologies and Triplestores for Business Intelligence
Semantic Technologies and Triplestores for Business IntelligenceSemantic Technologies and Triplestores for Business Intelligence
Semantic Technologies and Triplestores for Business Intelligence
Marin Dimitrov
 
Turmeric SOA Cloud Mashups
Turmeric SOA Cloud MashupsTurmeric SOA Cloud Mashups
Turmeric SOA Cloud Mashups
kingargyle
 
Web standards, why care?
Web standards, why care?Web standards, why care?
Web standards, why care?
Thomas Roessler
 
Analysis Software Development
Analysis Software DevelopmentAnalysis Software Development
Analysis Software Development
Akira Shibata
 
gStore: A Graph-based SPARQL Query Engine
gStore: A Graph-based SPARQL Query EnginegStore: A Graph-based SPARQL Query Engine
gStore: A Graph-based SPARQL Query Engine
M. Tamer Özsu
 
PyCon UK 2008: Challenges for Dynamic Languages
PyCon UK 2008: Challenges for Dynamic LanguagesPyCon UK 2008: Challenges for Dynamic Languages
PyCon UK 2008: Challenges for Dynamic Languages
Ted Leung
 

Similar to The NERD project (20)

Semantic Technologies and Triplestores for Business Intelligence
Semantic Technologies and Triplestores for Business IntelligenceSemantic Technologies and Triplestores for Business Intelligence
Semantic Technologies and Triplestores for Business Intelligence
 
The RDFa, seo wave
The RDFa, seo waveThe RDFa, seo wave
The RDFa, seo wave
 
RDFa, SEO wave
RDFa, SEO waveRDFa, SEO wave
RDFa, SEO wave
 
Linking the world with Python and Semantics
Linking the world with Python and SemanticsLinking the world with Python and Semantics
Linking the world with Python and Semantics
 
A First Analysis of String APIs: the Case of Pharo
A First Analysis of String APIs: the Case of PharoA First Analysis of String APIs: the Case of Pharo
A First Analysis of String APIs: the Case of Pharo
 
Rise of the scientific database
Rise of the scientific databaseRise of the scientific database
Rise of the scientific database
 
Turmeric SOA Cloud Mashups
Turmeric SOA Cloud MashupsTurmeric SOA Cloud Mashups
Turmeric SOA Cloud Mashups
 
Web standards, why care?
Web standards, why care?Web standards, why care?
Web standards, why care?
 
Open Source Natural Language Processing - Francis Bond
Open Source Natural Language Processing - Francis BondOpen Source Natural Language Processing - Francis Bond
Open Source Natural Language Processing - Francis Bond
 
Rc173 010d-json 2
Rc173 010d-json 2Rc173 010d-json 2
Rc173 010d-json 2
 
LOD2 Webinar: The 2nd release of the LOD2 stack
LOD2 Webinar: The 2nd release of the LOD2 stackLOD2 Webinar: The 2nd release of the LOD2 stack
LOD2 Webinar: The 2nd release of the LOD2 stack
 
Ruby - The Hard Bits
Ruby - The Hard BitsRuby - The Hard Bits
Ruby - The Hard Bits
 
Language-Independent Detection of Object-Oriented Design Patterns
Language-Independent Detection of Object-Oriented Design PatternsLanguage-Independent Detection of Object-Oriented Design Patterns
Language-Independent Detection of Object-Oriented Design Patterns
 
Nate tech deck
Nate tech deckNate tech deck
Nate tech deck
 
Cshals Tech Talk
Cshals Tech TalkCshals Tech Talk
Cshals Tech Talk
 
Analysis Software Development
Analysis Software DevelopmentAnalysis Software Development
Analysis Software Development
 
gStore: A Graph-based SPARQL Query Engine
gStore: A Graph-based SPARQL Query EnginegStore: A Graph-based SPARQL Query Engine
gStore: A Graph-based SPARQL Query Engine
 
The Forces Driving Java
The Forces Driving JavaThe Forces Driving Java
The Forces Driving Java
 
PyCon UK 2008: Challenges for Dynamic Languages
PyCon UK 2008: Challenges for Dynamic LanguagesPyCon UK 2008: Challenges for Dynamic Languages
PyCon UK 2008: Challenges for Dynamic Languages
 
Tokyotextmining#1 kaneyama genta
Tokyotextmining#1 kaneyama gentaTokyotextmining#1 kaneyama genta
Tokyotextmining#1 kaneyama genta
 

More from Giuseppe Rizzo

Zenaminer: driving the SCORM tandard towards the Web of Data
Zenaminer: driving the SCORM tandard towards the Web of DataZenaminer: driving the SCORM tandard towards the Web of Data
Zenaminer: driving the SCORM tandard towards the Web of Data
Giuseppe Rizzo
 

More from Giuseppe Rizzo (20)

Artificial intelligence for social good
Artificial intelligence for social goodArtificial intelligence for social good
Artificial intelligence for social good
 
AI in 60 minutes
AI in 60 minutesAI in 60 minutes
AI in 60 minutes
 
COMPRENDE, PERSONALIZZA, INTERAGISCE E IMPARA: L’AI COGNITIVA PER L’HR
COMPRENDE, PERSONALIZZA, INTERAGISCE E  IMPARA: L’AI COGNITIVA PER L’HRCOMPRENDE, PERSONALIZZA, INTERAGISCE E  IMPARA: L’AI COGNITIVA PER L’HR
COMPRENDE, PERSONALIZZA, INTERAGISCE E IMPARA: L’AI COGNITIVA PER L’HR
 
Understand, Answer and Argument: Conversational Agents
Understand, Answer and Argument: Conversational AgentsUnderstand, Answer and Argument: Conversational Agents
Understand, Answer and Argument: Conversational Agents
 
AI For Profiling Your Customers
AI For Profiling Your CustomersAI For Profiling Your Customers
AI For Profiling Your Customers
 
AI for Personalized Chatbot
AI for Personalized ChatbotAI for Personalized Chatbot
AI for Personalized Chatbot
 
Tourist Knowledge Graph Creation to Automating Travel Bookings
Tourist Knowledge Graph Creation to Automating Travel BookingsTourist Knowledge Graph Creation to Automating Travel Bookings
Tourist Knowledge Graph Creation to Automating Travel Bookings
 
The SentiME System at the SSA Challenge Task 1
The SentiME System at the SSA Challenge Task 1The SentiME System at the SSA Challenge Task 1
The SentiME System at the SSA Challenge Task 1
 
Context-Enhanced Adaptive Entity Linking
Context-Enhanced Adaptive Entity LinkingContext-Enhanced Adaptive Entity Linking
Context-Enhanced Adaptive Entity Linking
 
From Data to Knowledge for Tourists
From Data to Knowledge for TouristsFrom Data to Knowledge for Tourists
From Data to Knowledge for Tourists
 
Enabling Visitors to Explore a Smart City
Enabling Visitors to Explore a Smart CityEnabling Visitors to Explore a Smart City
Enabling Visitors to Explore a Smart City
 
NEEL2015 challenge summary
NEEL2015 challenge summaryNEEL2015 challenge summary
NEEL2015 challenge summary
 
Inductive Entity Typing Alignment
Inductive Entity Typing AlignmentInductive Entity Typing Alignment
Inductive Entity Typing Alignment
 
Benchmarking the Extraction and Disambiguation of Named Entities on the Seman...
Benchmarking the Extraction and Disambiguation of Named Entities on the Seman...Benchmarking the Extraction and Disambiguation of Named Entities on the Seman...
Benchmarking the Extraction and Disambiguation of Named Entities on the Seman...
 
CrossLanguageSpotter: A Library for Detecting Relations in Polyglot Frameworks
CrossLanguageSpotter: A Library for Detecting Relations in Polyglot FrameworksCrossLanguageSpotter: A Library for Detecting Relations in Polyglot Frameworks
CrossLanguageSpotter: A Library for Detecting Relations in Polyglot Frameworks
 
Learning with the Web. Structuring data to ease machine understanding
Learning with the Web. Structuring data to ease  machine understandingLearning with the Web. Structuring data to ease  machine understanding
Learning with the Web. Structuring data to ease machine understanding
 
Learning with the Web: Spotting Named Entities on the intersection of NERD an...
Learning with the Web: Spotting Named Entities on the intersection of NERD an...Learning with the Web: Spotting Named Entities on the intersection of NERD an...
Learning with the Web: Spotting Named Entities on the intersection of NERD an...
 
L'enorme archivio di dati: il Web
L'enorme archivio di dati: il WebL'enorme archivio di dati: il Web
L'enorme archivio di dati: il Web
 
NERD: Evaluating Named Entity Recognition Tools in the Web of Data
NERD: Evaluating Named Entity Recognition Tools in the Web of DataNERD: Evaluating Named Entity Recognition Tools in the Web of Data
NERD: Evaluating Named Entity Recognition Tools in the Web of Data
 
Zenaminer: driving the SCORM tandard towards the Web of Data
Zenaminer: driving the SCORM tandard towards the Web of DataZenaminer: driving the SCORM tandard towards the Web of Data
Zenaminer: driving the SCORM tandard towards the Web of Data
 

Recently uploaded

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Recently uploaded (20)

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 

The NERD project

  • 2. What is a Named Entity recognition task?  A task that aims to locate and classify the name of a person or an organization, a location, a brand, a product, a numeric expression including time, date, money and percent in a textual document 12 March 2012 Seminar @ Ecole Centrale, Paris 2/21
  • 3. History of NER benchmarks  CoNLL 2003 and CoNLL 2005  schema (4 types): person, organization, location and miscellaneous  language independent task  ACE 2004, ACE 2005 and ACE 2007  schema (7 types): person, organization, location, facility, weapon, vehicle and geo-political entity  entity recognition, not just name (e.g. description, pronoun)  find relationships among entities extracted  TAC 2009 (Knowledge Base Track)  schema (3 types): person, organization and location  create a knowledge base from the named entities extracted  ETAPE 2012 (Named Entity Task)  schema: Quaero (7 main types, 32 sub-types) 12/03/2012 - Multimedia Semantics and Interaction - Séminaire Ecole Centrale -3
  • 4. NER Tools  Standalone software  GATE  Stanford CoreNLP  Temis  Web APIs 12/03/2012 - Multimedia Semantics and Interaction - Séminaire Ecole Centrale -4
  • 5. Factual comparison of 10 Web NER tools Alchemy DBpe Evri Extr Lup Calais Saplo WM Yahoo Zemanta dia Granularity OEN OEN OED OEN OEN OEN OED OEN OEN OED Language EN EN EN EN EN EN EN EN EN EN FR GR* IT FR FR SW FR GR PT* IT SP SP IT SP* PT RU SP SW Quota 30000 unl 3000 3000 unl 50000 1333 unl 5000 10000 (calls/day) Sample C/C++ Java AS Java N/A Java Java Java JS C# Clients C# JS Java JS Perl PHP Java Java PHP PHP PHP JS Perl Python Perl PHP5 PHP Python Python Ruby Ruby Content 150KB 452KB 8KB 32KB 20KB 8KB 26KB 80KB 7769KB 970KB chunk 12/03/2012 - Multimedia Semantics and Interaction - Séminaire Ecole Centrale -5
  • 6. Alchemy DBpedia Evri Extr Lup Calais Saplo WM Yahoo Zemanta Response JSON Factual comparison (II) HTML HTM HTML HTML JSON JSON JSON JSON XML Format MicroF JSON L JSON JSON MicroF XML XML JSON XML RDF JSO RDF RDFa RDF RDF XML N XML XML RDF Entity 324 320 5 34 319 95 5 7 13 81 type number Entity N/A char N/A word range of char N/A POS range N/A position offset offset chars offset offset of chars Classif. Alchemy DBpedia Evri DBpe DBpedia OpenC N/A ESTER Yahoo FreeBase Ontologies FreeBase dia LinkedM alais Scema.org DB Defer. DBpeda DBpedia Evri DBpe DBpedia OpenC N/A DBpedia Wikipe Wikipedia Vocabulari FreeBase dia LinkedM alais Geonam dia IMDB es USCensus DB es MusicBrai UMBEL CIAFact nz OpenCyc book Amazon YAGO Wikicom YouTube MusicBrainz panies TechCrun CrunchBase ch ... 12 March 2012 Seminar @ Ecole Centrale, Paris 6/21
  • 7. Human made benchmarks  We performed two evaluation experiments:  WEKEX 2011  ISWC 2011 t = (entity, type, URI, relevant)  Each field has been rated by a Boolean value: true if correct, false otherwise Rizzo G., Troncy R. (2011), NERD: A Framework for Evaluating Named Entity Recognition Tools in the Web of Data. In: International Semantic Web Conference 2011 (ISWC'11), Bonn, Germany. 12/03/2012 - Multimedia Semantics and Interaction - Séminaire Ecole Centrale -7
  • 8. WEKEX 2011 Benchmark  Controlled experiment  4 human raters  10 English news articles (5 from BBC and 5 from The New York Times)  Each rater evaluated each article for 5 extractors 200 total evaluations  Fleiss's kappa score  moderate agreement among raters Rizzo G., Troncy R. (2011), NERD: Evaluating Named Entity Recognition Tools in the Web of Data. In: (ISWC'11) Workshop on Web Scale Knowledge Extraction (WEKEX'11), Bonn, Germany. 12/03/2012 - Multimedia Semantics and Interaction - Séminaire Ecole Centrale -8
  • 9. Results different behavior for different sources 12/03/2012 - Multimedia Semantics and Interaction - Séminaire Ecole Centrale -9
  • 10. ISWC 2011 Benchmark  Controlled experiment  10 human raters  2 English news articles from The New York Times  each rater evaluated each article for 6 extractors 120 total evaluations  Fleiss's kappa score  substantial agreement among raters 12/03/2012 - Multimedia Semantics and Interaction - Séminaire Ecole Centrale - 10
  • 11. Results 12/03/2012 - Multimedia Semantics and Interaction - Séminaire Ecole Centrale - 11
  • 12. What is NERD? ontology1 REST API2 UI3 The NERD ontology has been integrated in the NIF project, a EU FP7 in the context of the LOD2: Creating Knowledge out of Interlinked Data 1 http://nerd.eurecom.fr/ontology 2 http://nerd.eurecom.fr/api/application.wadl 3 http://nerd.eurecom.fr/ 12 March 2012 Seminar @ Ecole Centrale, Paris 12/21
  • 13. NERD Ontology  Align the taxonomies used by the extractors 12/03/2012 - Multimedia Semantics and Interaction - Séminaire Ecole Centrale - 13
  • 14. Building the NERD ontology NERD type Occurrence Person 10 Organization 10 Country 6 Company 6 Location 6 Continent 5 City 5 RadioStation 5 Album 5 Product 5 ... ... 12/03/2012 - Multimedia Semantics and Interaction - Séminaire Ecole Centrale - 14
  • 15. NERD REST API /document /user GET, /annotation/{extractor} POST, JSON/RDF* /extraction PUT, /evaluation DELETE “entities” : [{ ... “entity”: “Tim Berners-Lee” , “type”: “Person” , “uri”: "http://dbpedia.org/resource/Tim_berners_lee", “nerdType”: "http://nerd.eurecom.fr/ontology#Person", “startChar”: 30, “endChar”: 45, “confidence”: 1, “relevance”: 0.5 }] Rizzo G., Troncy R. (2012), NERD: A Framework for Unifying Named Entity Recognition and Disambiguation Web Extraction Tools. In: European chapter of the Association for Computational Linguistics (EACL'12), Avignon, France. 12/03/2012 - Multimedia Semantics and Interaction - Séminaire Ecole Centrale - 15
  • 16. NIF: NLP Interchange Format Framework  Different outputs for the NLP tools OpenCalais DBpedia Spotlight "_type": "Organization", "@URI": "http://dbpedia.org/resource/DBpedia", “name": "North Atlantic Treaty Organization", "@types": "DBpedia:Software,DBpedia:Work” "organizationtype": "governmental civilian", "@surfaceForm": "dbpedia", "nationality": "N/A", "@offset": "0", "_typeReference": "@support": "11", http://s.opencalais.com/1/type/em/e/Organization", "@similarityScore": "0.2387271374464035", ... …  Manual effort required for integration or reuse  time consuming  need to capture the definition of the attributes used in the response format  NIF uses RDF for representing NER results as Linked Data 12/03/2012 - Multimedia Semantics and Interaction - Séminaire Ecole Centrale - 16
  • 17. Named Entities as textual annotations  Let's consider the document: http://www.w3.org/DesignIssues/LinkedData.html The Semantic Web isn't just about putting data on the web. It is about making links, so that a person or machine can explore the web of data. With linked data, when you have some of it, you can find other, related, data.…. All the above plus, Use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff ... entities: { … [entity: W3C, startChar: 23107, endChar: 23110], … } 12/03/2012 - Multimedia Semantics and Interaction - Séminaire Ecole Centrale - 17
  • 18. NERD meets NIF Model documents through a set of strings deferencable on the Web : offset_23107_ 23110 a str:String ; str:referenceContext :offset_0_26546 . Map string to entity : offset_23107_ 23110 sso:oen dbpedia:W3C. Classification dbpedia:W3C rdf:type nerd:Organization . Rizzo G, Troncy R., Hellmann S. and Bruemmer M. (2012), NERD meets NIF: Lifting NLP Extraction Results to the Linked Data Cloud. In: (LDOW'12) Linked Data on the Web (WWW'12), Lyon, France. 12/03/2012 - Multimedia Semantics and Interaction - Séminaire Ecole Centrale - 18
  • 19. NERD Demo 12/03/2012 - Multimedia Semantics and Interaction - Séminaire Ecole Centrale - 19
  • 20. NERD Timeline and Future Work beginning Comparison of named entity extractors NERD benchmarks NERD REST API and NERD ontology Lift NERD output results to the LOD cloud today NERD “smart” service: combining the best of all NER tools Dashboard for improving the NERD user experience 12/03/2012 - Multimedia Semantics and Interaction - Séminaire Ecole Centrale - 20
  • 21. http://nerd.eurecom.fr @giusepperizzo @rtroncy #nerd http://www.slideshare.net/giusepperizzo 12/03/2012 - Multimedia Semantics and Interaction - Séminaire Ecole Centrale - 21