2. What we are talking about
OpenRefine (www.openrefine.org)
the NER extension (http://freeyourmetadata.org/named-entity-extraction/),
integrated with the Dandelion API (dandelion.eu)
3. What industries are using OpenRefine?
https://groups.google.com/d/msg/openrefine/vA75Ac_XODo/AfG8IRlEfSAJ
5. What does OpenRefine offer that other
data-parsing tools don't?
http://opendata.stackexchange.com/questions/515/what-does-openrefine-offer-that-other-data-parsing-tools-dont
6. reconciliation of text data against reference data
services containing strong identifiers (Freebase,
OpenCorporates, any SPARQL or RDF, etc.)
simple linking of reconciled entities to other info
sources like Wikipedia, MusicBrainz, IMDB, etc.
[…]
[…]
9. normalize, clean and extract data from different
sources
reconcile against internal reconciliation services
(administrative regions, names and telephone
numbers…)
apply rules and transformations to data, aligning
it with our internal ontologies
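A minimal sketch of what an internal reconciliation service does, in Python: each query string is normalized and matched against a small reference table keyed by strong identifiers, and the answer is shaped like a reconciliation API response (a `result` list with `id`, `name`, `score`, `match`). The reference data, the identifiers and the exact-match-after-normalization rule are all hypothetical; real services usually return several scored candidates.

```python
import unicodedata

# Hypothetical internal reference data: strong identifier -> canonical name.
REGIONS = {
    "reg-001": "Lombardia",
    "reg-002": "Piemonte",
    "reg-003": "Emilia-Romagna",
}


def normalize(text: str) -> str:
    """Lowercase, strip accents, treat hyphens as spaces, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return " ".join(ascii_only.lower().replace("-", " ").split())


def reconcile(queries: dict) -> dict:
    """Answer a batch of {key: {"query": ...}} with matching reference entries."""
    index = {normalize(name): (rid, name) for rid, name in REGIONS.items()}
    response = {}
    for key, q in queries.items():
        hit = index.get(normalize(q["query"]))
        results = []
        if hit is not None:
            rid, name = hit
            results.append({"id": rid, "name": name, "score": 100, "match": True})
        response[key] = {"result": results}
    return response


if __name__ == "__main__":
    print(reconcile({"q0": {"query": "  emilia romagna "}, "q1": {"query": "Atlantis"}}))
```

Normalizing both sides before comparing is what lets messy cell values ("  emilia romagna ") resolve to the same strong identifier as the canonical name.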
14. reconciliation works great for those fields
in your dataset that contain single terms:
names of people
countries
works of art
[…]
15. and what if we have a column with
unstructured texts, like this one?
16. we need a new step in the data curation workflow…
data column with some texts
→ extract named entities using the NER extension + Dandelion API
→ a new data column, labelled “dataTXT”
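For a single cell, the extraction step can be pictured as one HTTP request to Dandelion's dataTXT-NEX service. This Python sketch only builds the request URL and sends nothing. The endpoint path and the `text`, `lang`, `min_confidence` and `token` parameters are assumptions based on Dandelion's public documentation (older versions authenticated with an app ID/key pair), and the NER extension may call the service differently.

```python
from urllib.parse import urlencode

# Assumed dataTXT-NEX endpoint; check Dandelion's docs for the current one.
NEX_ENDPOINT = "https://api.dandelion.eu/datatxt/nex/v1/"


def nex_request_url(text: str, token: str, lang: str = "en",
                    min_confidence: float = 0.6) -> str:
    """Build a GET URL asking dataTXT-NEX to annotate one cell's text."""
    params = {
        "text": text,
        "lang": lang,
        "min_confidence": min_confidence,
        "token": token,  # placeholder API token (assumed auth parameter)
    }
    return NEX_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    url = nex_request_url("Collective action in Milan", token="YOUR_TOKEN")
    print(url)
    # Actually sending it (e.g. urllib.request.urlopen(url)) needs a valid token.
```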
17. this column contains named concepts,
linked to Wikipedia, each as a
label + URI pair:
“Collective action” + http://en.wikipedia.org/wiki/Collective_action
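Each cell of the “dataTXT” column can be thought of as a list of label + URI pairs. Assuming a JSON response with an `annotations` array whose items carry `label`, `uri` and `confidence` fields (the sample below is illustrative, not real API output), this Python sketch turns it into that list, dropping low-confidence annotations.

```python
import json

# Illustrative response shape (assumed); real responses carry more fields.
SAMPLE = """
{"annotations": [
  {"spot": "collective action", "label": "Collective action",
   "uri": "http://en.wikipedia.org/wiki/Collective_action", "confidence": 0.83},
  {"spot": "strike", "label": "Strike action",
   "uri": "http://en.wikipedia.org/wiki/Strike_action", "confidence": 0.41}
]}
"""


def label_uri_pairs(raw: str, min_confidence: float = 0.5) -> list:
    """Return (label, uri) tuples for annotations at or above a confidence cut-off."""
    data = json.loads(raw)
    return [
        (a["label"], a["uri"])
        for a in data.get("annotations", [])
        if a.get("confidence", 0.0) >= min_confidence
    ]


if __name__ == "__main__":
    for label, uri in label_uri_pairs(SAMPLE):
        print(f"{label} + {uri}")
```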
18. make a text filter
that looks for a concept,
to classify and categorize
the content
…
things, not strings
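Filtering on the concept URI rather than on the surface text is what "things, not strings" means in practice: rows phrased differently can still share the same concept. This Python sketch uses made-up rows, a hypothetical `concepts` field standing in for the “dataTXT” column, and a hypothetical URI-to-category table.

```python
# Hypothetical rows: original text plus the concept URIs extracted into "dataTXT".
ROWS = [
    {"text": "Workers organised a collective action downtown.",
     "concepts": ["http://en.wikipedia.org/wiki/Collective_action"]},
    {"text": "The museum opened a new wing.",
     "concepts": ["http://en.wikipedia.org/wiki/Museum"]},
    {"text": "Citizens acted together to save the park.",
     "concepts": ["http://en.wikipedia.org/wiki/Collective_action"]},
]

# Hypothetical mapping from concept URIs to the categories we care about.
CATEGORIES = {
    "http://en.wikipedia.org/wiki/Collective_action": "civic engagement",
    "http://en.wikipedia.org/wiki/Museum": "culture",
}


def filter_by_concept(rows, uri):
    """Keep rows whose extracted concepts include the given URI."""
    return [row for row in rows if uri in row["concepts"]]


def categorize(row):
    """Return the sorted categories of all known concepts found in the row."""
    return sorted({CATEGORIES[u] for u in row["concepts"] if u in CATEGORIES})


if __name__ == "__main__":
    target = "http://en.wikipedia.org/wiki/Collective_action"
    for row in filter_by_concept(ROWS, target):
        print(row["text"], categorize(row))
```

The third row never contains the string "collective action", yet it is matched because its (hypothetical) annotation points to the same URI.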
20. Real issues from the Open Data community
Using OpenRefine + the NER extension with the
Dandelion API:
extract meaningful information from
CVs, such as names, organizations, skills, …
http://opendata.stackexchange.com/search?page=3&tab=relevance&q=extraction
normalize organization names cited in
texts
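One way to normalize organization names is key collision with a "fingerprint", the method OpenRefine offers in its clustering dialog: names that reduce to the same key are grouped as variants of one organization. This Python sketch approximates that keying (lowercase, strip accents and punctuation, sort and deduplicate tokens); the names are made up, and OpenRefine's own implementation differs in details.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(name: str) -> str:
    """Approximate key-collision fingerprint of a string."""
    text = unicodedata.normalize("NFKD", name.strip().lower())
    text = text.encode("ascii", "ignore").decode("ascii")
    text = re.sub(r"[^\w\s]", "", text)  # drop punctuation
    return " ".join(sorted(set(text.split())))


def cluster(names):
    """Group names sharing a fingerprint; keep only groups with variants."""
    groups = defaultdict(list)
    for name in names:
        groups[fingerprint(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    cited = ["Acme Corp.", "acme corp", "Corp Acme", "Globex Ltd", "Initech"]
    for group in cluster(cited):
        print(group)
```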
21. Data journalists
Using OpenRefine + the NER extension with the
Dandelion API:
extract news relevant to a specific topic
(a person, a brand or a company)
write a summary of a politician's speech, starting
from the main concepts extracted from the text
mine specific information from judicial decisions
(judge's name, court, area of law and neutral citation
number)
22. Social media specialists
Using OpenRefine + the NER extension with the
Dandelion API:
text mining on tweets: easily extract brands,
places and concepts from a Twitter stream
related to an event
text mining on website content: easily extract concepts and
places from a web page, to help improve the site's
SEO ranking
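Once each tweet carries its extracted entities, finding the brands and places that dominate an event is a counting problem. This Python sketch assumes the tweets have already been annotated with (label, kind) pairs; the tweets, labels and the "brand"/"place" kinds are invented for illustration and are not Dandelion's own type vocabulary.

```python
from collections import Counter

# Hypothetical per-tweet annotations: (entity label, entity kind).
ANNOTATED_TWEETS = [
    [("Milan", "place"), ("Nike", "brand")],
    [("Milan", "place"), ("Adidas", "brand")],
    [("Nike", "brand"), ("San Siro", "place")],
    [("Milan", "place")],
]


def top_entities(tweets, kind, n=3):
    """Count annotations of one kind per label and return the n most frequent."""
    counts = Counter(
        label for tweet in tweets for label, k in tweet if k == kind
    )
    return counts.most_common(n)


if __name__ == "__main__":
    print("places:", top_entities(ANNOTATED_TWEETS, "place"))
    print("brands:", top_entities(ANNOTATED_TWEETS, "brand"))
```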
23. “Quantified Self” movement: analytics on personal data
Using OpenRefine + the NER extension with the
Dandelion API:
understand your own bank account statements:
extract useful information, such as brands and places,
to categorize and classify your own expenses
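For bank statements, the extracted brands can drive a simple rule table that assigns each transaction to an expense category. This Python sketch uses invented transactions, hypothetical brand names and a hypothetical brand-to-category mapping; in practice the `brand` field would be filled by the NER step.

```python
from collections import defaultdict

# Hypothetical statement rows with the brand already extracted by NER.
TRANSACTIONS = [
    {"description": "POS 12/03 FRESHMART CENTRE", "brand": "FreshMart", "amount": -54.20},
    {"description": "CARD RAILCO ONLINE", "brand": "RailCo", "amount": -19.90},
    {"description": "POS 15/03 FRESHMART", "brand": "FreshMart", "amount": -23.10},
    {"description": "SEPA TRANSFER", "brand": None, "amount": -100.00},
]

# Hypothetical rules: brand -> expense category.
BRAND_CATEGORY = {"FreshMart": "groceries", "RailCo": "travel"}


def spending_by_category(rows):
    """Sum amounts per category; rows without a known brand go to 'uncategorized'."""
    totals = defaultdict(float)
    for row in rows:
        category = BRAND_CATEGORY.get(row["brand"], "uncategorized")
        totals[category] += row["amount"]
    return {cat: round(total, 2) for cat, total in totals.items()}


if __name__ == "__main__":
    print(spending_by_category(TRANSACTIONS))
```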