Using entity extraction extension with OpenRefine and Dandelion API
food for thought
What we are talking about
OpenRefine www.openrefine.org
NER extension integrated with Dandelion API (dandelion.eu)
http://freeyourmetadata.org/named-entity-extraction/
What industries are using OpenRefine?
https://groups.google.com/d/msg/openrefine/vA75Ac_XODo/AfG8IRlEfSAJ
data journalists
metadata curators
museums
libraries
research labs
SEO folks
data scientists
enterprises
universities
patent attorneys
Open Data hackers
Social Media specialists
civil servants
What does OpenRefine offer that other data-parsing tools don't?
http://opendata.stackexchange.com/questions/515/what-does-openrefine-offer-that-other-data-parsing-tools-dont
reconciliation of text data against reference data services containing strong identifiers (Freebase, OpenCorporates, any SPARQL or RDF, etc.)
simple linking of reconciled entities to other info sources like Wikipedia, MusicBrainz, IMDB, etc.
[…]
[…]
How are we using it at SpazioDati?
OpenRefine is inside our data curation controller
normalize, clean and extract data from different sources
reconcile against internal reconciliation services (administrative regions, names and telephone numbers…)
apply rules and transformations to data, aligning it with our internal ontologies
A look at OpenRefine & reconciliation
Why is reconciliation useful?
Instruments
bla bla bla
bla bla bla bla
…
what kind of instruments?
reconciliation identifies keywords in flowing text and gives them a URL
from strings to things
instruments data column
musical instruments → URL
measuring instruments → URL
aeronautical instruments → URL
Instruments
bla bla bla
reconciliation works great for those fields in your dataset that contain single terms
names of people
countries
works of art
[…]
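The "strings to things" idea for single-term fields can be sketched as a lookup that returns candidate things, each with a URL, for a cell value. This is a minimal illustration only: the reference data, the `example.org` URLs and the `reconcile` helper are hypothetical, and a real reconciliation service would score and rank candidates rather than filter by a keyword hint.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    label: str
    url: str


# Hypothetical reference data; a real reconciliation service would hold this.
REFERENCE = {
    "instruments": [
        Candidate("musical instruments", "http://example.org/thing/musical-instruments"),
        Candidate("measuring instruments", "http://example.org/thing/measuring-instruments"),
        Candidate("aeronautical instruments", "http://example.org/thing/aeronautical-instruments"),
    ],
}


def reconcile(cell: str, hint: str = "") -> list[Candidate]:
    """Return candidate things for a cell value, narrowed by an optional hint."""
    candidates = REFERENCE.get(cell.strip().lower(), [])
    if hint:
        narrowed = [c for c in candidates if hint.lower() in c.label]
        # Fall back to the full candidate list when the hint matches nothing.
        return narrowed or candidates
    return candidates


all_matches = reconcile("Instruments")
best = reconcile("Instruments", hint="musical")
print(len(all_matches), best[0].url)
```

The ambiguous string "Instruments" yields three candidates; only extra context (here a hint) picks one thing and its URL.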
and what if we have a column with unstructured text, like this one?
we need a new step in the data curation workflow…
a new data column, labelled “dataTXT”
extract named entities using NER extension + Dandelion API
data column with some texts
in this column, there are named concepts, linked to Wikipedia
label + URI
“Collective action” + http://en.wikipedia.org/wiki/Collective_action
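Outside OpenRefine, the same label + URI pairs could be obtained by calling the entity extraction service directly. The sketch below only builds the request URL and parses a hand-written sample, so it runs without a network call. The endpoint, the `text`, `min_confidence` and `token` parameters, and the `annotations`, `label` and `uri` response fields are assumptions based on the public Dandelion API documentation and should be checked against the current docs; `YOUR_TOKEN` is a placeholder.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint and parameter names; verify against the Dandelion API docs.
ENDPOINT = "https://api.dandelion.eu/datatxt/nex/v1"


def build_request_url(text, token, min_confidence=0.6):
    """Build the GET URL for one extraction call (no request is sent here)."""
    params = {"text": text, "min_confidence": min_confidence, "token": token}
    return ENDPOINT + "?" + urlencode(params)


def label_uri_pairs(response_body):
    """Turn a JSON response body into (label, URI) pairs."""
    data = json.loads(response_body)
    return [(a["label"], a["uri"]) for a in data.get("annotations", [])]


# Hand-written sample shaped like the assumed response; not real API output.
SAMPLE = json.dumps({
    "annotations": [
        {
            "spot": "collective action",
            "label": "Collective action",
            "uri": "http://en.wikipedia.org/wiki/Collective_action",
        }
    ]
})

url = build_request_url("Unions rely on collective action.", token="YOUR_TOKEN")
pairs = label_uri_pairs(SAMPLE)
print(pairs)
```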
make a text filter
looking for a concept
classify and categorize the content
…
things, not strings
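Once each text carries its extracted concept URIs, filtering and categorizing reduce to plain lookups on those URIs. A minimal sketch, assuming hypothetical rows whose `concepts` field holds the URIs produced by the extraction step (the row layout and helper names are illustrative, not part of OpenRefine):

```python
from collections import defaultdict

# Hypothetical rows: each keeps its text and the concept URIs extracted from it.
ROWS = [
    {"text": "Unions rely on collective action.",
     "concepts": ["http://en.wikipedia.org/wiki/Collective_action"]},
    {"text": "The museum bought a new violin.",
     "concepts": ["http://en.wikipedia.org/wiki/Violin"]},
    {"text": "Collective action shaped the strike.",
     "concepts": ["http://en.wikipedia.org/wiki/Collective_action",
                  "http://en.wikipedia.org/wiki/Strike_action"]},
]


def filter_by_concept(rows, uri):
    """Keep only the rows whose extracted concepts include the given URI."""
    return [row for row in rows if uri in row["concepts"]]


def group_by_concept(rows):
    """Map each concept URI to the indexes of the rows that mention it."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        for uri in row["concepts"]:
            groups[uri].append(index)
    return dict(groups)


matches = filter_by_concept(ROWS, "http://en.wikipedia.org/wiki/Collective_action")
groups = group_by_concept(ROWS)
print(len(matches), sorted(groups))
```

Filtering on the URI rather than on the surface string is the point of "things, not strings": a text matches the concept regardless of how it was worded.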
some scenarios
Open Data community real issues
Using OpenRefine + NER extension with Dandelion API
extract meaningful information from some CVs, like names, organizations, skills, …
http://opendata.stackexchange.com/search?page=3&tab=relevance&q=extraction
normalize organization names cited in some texts
Data journalists
Using OpenRefine + NER extension with Dandelion API
extract news relevant to a specific topic (a person, a brand or a company)
write a summary of a politician's speech, starting from the main concepts extracted from the text
mine specific information in judicial decisions (judge's name, court, area of law and neutral citation number)
Social media specialists
Using OpenRefine + NER extension with Dandelion API
Text mining on tweets: extract brands, places and concepts easily from a Twitter stream related to an event
Text mining on website content: extract concepts and places easily from a webpage, to improve website SEO ranking
“Quantified Self” movement
Analytics on Personal Data
Using OpenRefine + NER extension with Dandelion API
Understand your own bank account statements: extract useful information, like brands and places, to categorize and classify your own expenses
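The bank statement scenario can be sketched as a mapping from extracted entities to expense categories, followed by a sum per category. Everything here is hypothetical: the brand names, the category mapping and the statement lines are made up, and the `entities` field stands in for whatever the extraction step returns. Amounts are kept in integer cents to avoid floating-point rounding.

```python
from collections import Counter

# Hypothetical mapping from extracted entity labels to expense categories.
CATEGORY_BY_ENTITY = {
    "ExampleMart": "groceries",
    "ExampleRail": "transport",
}

# Hypothetical statement lines; "entities" stands in for the extraction output.
STATEMENT = [
    {"description": "POS ExampleMart Trento", "cents": 2350,
     "entities": ["ExampleMart", "Trento"]},
    {"description": "ExampleRail ticket Trento-Milano", "cents": 1990,
     "entities": ["ExampleRail", "Trento", "Milano"]},
    {"description": "ATM withdrawal", "cents": 5000, "entities": []},
]


def categorize(entities):
    """Return the category of the first known entity, else 'uncategorized'."""
    for entity in entities:
        if entity in CATEGORY_BY_ENTITY:
            return CATEGORY_BY_ENTITY[entity]
    return "uncategorized"


def totals_by_category(statement):
    """Sum the amounts (in cents) of the statement lines per category."""
    totals = Counter()
    for line in statement:
        totals[categorize(line["entities"])] += line["cents"]
    return dict(totals)


totals = totals_by_category(STATEMENT)
print(totals)
```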
@dandelionapi #refine #ner
do you know other use cases? tell us on Twitter!
@spaziodati
dandelion.eu

Axa Assurance Maroc - Insurer Innovation Award 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 

Using entity extraction extension with OpenRefine and Dandelion API

  • 1. Using entity extraction extension with OpenRefine and Dandelion API: food for thought
  • 2. What we are talking about: OpenRefine (www.openrefine.org); NER extension integrated with Dandelion API (dandelion.eu), http://freeyourmetadata.org/named-entity-extraction/
  • 3. What industries are using OpenRefine? https://groups.google.com/d/msg/openrefine/vA75Ac_XODo/AfG8IRlEfSAJ
  • 4. data journalists, metadata curators, museums, libraries, research labs, SEO folks, data scientists, enterprises, universities, patent attorneys, Open Data hackers, Social Media specialists, civil servants
  • 5. What does OpenRefine offer that other data-parsing tools don't? http://opendata.stackexchange.com/questions/515/what-does-openrefine-offer-that-other-data-parsing-tools-dont
  • 6. reconciliation of text data against reference data services containing strong identifiers (Freebase, OpenCorporates, any SPARQL or RDF, etc.); simple linking of reconciled entities to other info sources like Wikipedia, MusicBrainz, IMDB, etc. […] […]
  • 7. How are we using it at SpazioDati?
  • 8. OpenRefine is inside our data curation controller
  • 9. normalize, clean and extract data from different sources; reconcile against internal reconciliation services (administrative regions, names and telephone numbers…); apply rules and transformations to data, aligning it with our internal ontologies
  • 10. A look at OpenRefine & reconciliation
  • 11. Why is reconciliation useful? A data column headed “Instruments”, filled with free text: what kind of instruments?
  • 12. reconciliation identifies keywords in flowing text and gives them a URL: from strings to things
  • 13. the “Instruments” data column could mean musical instruments, measuring instruments or aeronautical instruments; each meaning gets its own URL
  • 14. reconciliation works great for those fields in your dataset that contain single terms: names of people, countries, works of art, […]
  • 15. and what if we have a column with unstructured texts, like this one?
  • 16. we need a new step in the data curation workflow… start from a data column with some texts; extract named entities using the NER extension + Dandelion API; get a new data column, labelled “dataTXT”
  • 17. in this column, there are named concepts, linked to Wikipedia as label + URI: “Collective action” + http://en.wikipedia.org/wiki/Collective_action
  • 18. make a text filter looking for a concept; classify and categorize the content; … things, not strings
  • 20. Open Data community, real issues. Using OpenRefine + NER extension with Dandelion API: extract meaningful information from some CVs, like names, organizations, skills, … (http://opendata.stackexchange.com/search?page=3&tab=relevance&q=extraction); normalize organization names cited in some texts
  • 21. Data journalists. Using OpenRefine + NER extension with Dandelion API: extract news relevant to a precise topic (a person, a brand or a company); write a summary of a politician's speech, starting from the main concepts extracted from the text; mine specific information in judicial decisions (judge's name, court, area of law and neutral citation number)
  • 22. Social media specialists. Using OpenRefine + NER extension with Dandelion API: text mining on tweets (extract brands, places and concepts easily from a Twitter flow related to an event); text mining on website content (extract concepts and places easily from a webpage, to improve website SEO ranking)
  • 23. Analytics on Personal Data. Using OpenRefine + NER extension with Dandelion API: understand your own bank account statements by extracting useful information, like brands and places, to categorize and classify your own expenses (“Quantified Self” movement)
  • 24. Do you know other use cases? Tell us on Twitter! @dandelionapi @spaziodati #refine #ner dandelion.eu
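
As a rough illustration of slides 16 to 18 outside OpenRefine, the sketch below turns an entity-extraction response into label + URI pairs and then filters rows by concept. It makes no network call. The response shape (an "annotations" list whose items carry "label" and "uri"), the sample data, and the function names extract_entities and rows_with_concept are all assumptions made for this example, not the extension's actual code.

```python
from typing import Dict, List, Tuple

# Hypothetical sample shaped like an entity-extraction response.
# The field names "annotations", "label" and "uri" are assumptions.
SAMPLE_RESPONSE = {
    "annotations": [
        {"label": "Collective action",
         "uri": "http://en.wikipedia.org/wiki/Collective_action"},
        {"label": "Open data",
         "uri": "http://en.wikipedia.org/wiki/Open_data"},
        # The same concept spotted twice in one text.
        {"label": "Collective action",
         "uri": "http://en.wikipedia.org/wiki/Collective_action"},
    ]
}


def extract_entities(response: Dict) -> List[Tuple[str, str]]:
    """Return unique (label, uri) pairs in first-seen order."""
    seen = set()
    entities = []
    for annotation in response.get("annotations", []):
        pair = (annotation.get("label", ""), annotation.get("uri", ""))
        # Skip annotations without a URI and pairs already collected.
        if pair[1] and pair not in seen:
            seen.add(pair)
            entities.append(pair)
    return entities


def rows_with_concept(rows: List[Dict], uri: str) -> List[Dict]:
    """Keep rows whose extracted entities include the given URI."""
    return [
        row for row in rows
        if any(entity_uri == uri for _, entity_uri in row["entities"])
    ]


if __name__ == "__main__":
    # Two hypothetical rows: one with extracted concepts, one without.
    rows = [
        {"text": "first text", "entities": extract_entities(SAMPLE_RESPONSE)},
        {"text": "second text", "entities": []},
    ]
    wanted = "http://en.wikipedia.org/wiki/Collective_action"
    for row in rows_with_concept(rows, wanted):
        print(row["text"], row["entities"])
```

Filtering on the URI rather than on the label is the "things, not strings" idea: two texts that name the same concept differently still match the same identifier.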