Presentation for one of the keynotes at EKAW2014, where I talked about the need to lower the barrier for ontology development for those who have no experience with ontologies.
EKAW2014 Keynote: Ontology Engineering for and by the Masses: are we already there?
1. Ontology Engineering
for and by the masses:
are we already there?
19th International Conference on Knowledge
Engineering and Knowledge Management
EKAW2014
27/11/2014
Oscar Corcho
ocorcho@fi.upm.es
@ocorcho
https://www.slideshare.com/ocorcho
2. License
• This work is licensed under the license
CC BY-NC-SA 4.0 International
• http://purl.org/NET/rdflicense/cc-by-nc-sa4.0
• You are free:
• to Share — to copy, distribute and transmit the work
• to Remix — to adapt the work
• Under the following conditions:
• Attribution — You must attribute the work by inserting
• “[source Oscar Corcho]” at the footer of each reused slide
• a credits slide stating: “These slides are partially based on
“Ontology Engineering for and by the masses: are we
already there?” by O. Corcho”
• Non-commercial
• Share-Alike
3. A walk through our Brave Little World of Ontology Engineering
The world is in year 21 After Gruber (A.G.)
All inhabitants of this world have this motto:
“Nothing is more beautiful than a formal,
explicit specification of a shared conceptualization”
repeated to them every night while they sleep (since year 5 A.G.)
Everybody loves Gruberliness, the property of creating models that are
SHARED, FORMAL and EXPLICIT
It is written on every single building and ontology repository
4. The world is divided into 10 regions, led by 10 world controllers who have dominated it so far (according to Google Scholar)
Oxford (Ian Horrocks, 35K cit.)
Milton Keynes (Enrico Motta, 11K cit.)
Buffalo (Barry Smith, 18K cit.)
Trento (Nicola Guarino, 17K cit.)
Stanford (Mark Musen, 25K cit.)
Karlsruhe (Rudi Studer, unknown cit.)
Madrid (Asunción Gómez-Pérez, 13K cit.)
Amsterdam (Frank van Harmelen, 25K cit.)
Toronto (Mark S. Fox, 13K cit.)
Osaka (Riichiro Mizoguchi, 7K cit.)
Disclaimer: This calculation is not exact. It only considers individuals, and favours geographical distribution.
5. There have been several wars in these 21 years of existence, which led to the current status:
• The “Language War”. It lasted 4 years. W3C treaty signed in year 11 A.G. Description logic won over frames, first order logic and semantic networks.
• The “Tool War”. It lasted 10 years. Tools like Protégé, OilEd, SWOOP, WebODE, OntoEdit, or NeOn Toolkit fought each other to get installed on the computers of our world citizens. Protégé won. No treaty signed.
7. Alphas
Average age: 50+
Number of individuals: 100+
Education: Formal logic and philosophy. Sometimes Computer Science
Contribution to the world:
Write formal upper-level ontologies
DOLCE, BFO, GFO, SUMO
Languages spoken:
They are polyglots
First order and many other logics
OWL, OBO
Secret meetings: FOIS
Daily routine:
Wake up
Write a new term on a whiteboard
Think about it carefully
Incorporate it into an upper-level ontology
Some days they don’t include new terms
8. Betas
Average age: 40+
Number of individuals: 1,000+
Education: Computer Science, Biology,
Geography. Some courses on logic
Contribution to the world:
Domain and application ontologies
Heavyweight, with many axioms, properties
and concepts
DOLCE, BFO, GFO, SUMO
Languages spoken:
Mostly OWL
Secret meetings: EKAW, KCAP
Daily routine:
Wake up
Open ontology design methodology book
Open ontology design pattern website
Open Protégé and activate reasoning
Work on 10 new terms
9. Gammas
Average age: 30+
Number of individuals: 10,000+
Education: Mostly Computer Science
Courses on ontologies as undergrads
Contribution to the world:
Write lightweight ontologies
Call them vocabularies
Create Linked (Open) Data
Languages spoken:
Native RDF Schema, and a bit of OWL
Secret meetings: ESWC, ISWC, WWW
Daily routine:
Wake up
Open LOV or prefix.cc
Look for vocabularies for their dataset
Select, extend and upload them to LOV
Update their dataset on datahub.io
10. Deltas
Average age: unknown
Number of individuals: 10,000+
Education: Library Science
Some courses in Computer Science
Contribution to the world:
Write codelists and thesauri
Languages spoken:
SKOS, RDF Schema
Secret meetings: Dublin Core Conference
Daily routine:
Wake up
Write a couple of new codelists/thesauri
Submit them to metadataregistry.org
Annotate some documents with them
11. Epsilons
Average age: 20+
Number of individuals: 100,000+
Education: Web development
Contribution to the world:
Write schema.org annotations
Some contributions to schema.org classes
and structured types
Languages spoken:
HTML, RDFa, JSON-LD
Secret meetings: Webinars, Hangouts, meetups
Daily routine:
Wake up
Check the ranking of their site in Google
Annotate it with more schema.org tags
Wait for Google/Yahoo! to crawl them
Go back to next step
12. Open vote
• Are you an alpha, a beta, a gamma, a delta or an epsilon?
• Or do you think that you can belong to several social classes?
https://es.surveymonkey.com/s/3933H7C
• There should be a recent tweet from me (@ocorcho,
#ekaw2014), with a link to the survey
13. SHARED, FORMAL and EXPLICIT
• A happy world where all sorts of ontologies, vocabularies and annotations are developed as efficiently as possible
• Everybody is happy in their social class
• And where Gruberliness is everywhere
[Diagram: Shared, Formal, Explicit; Alphas, Betas, Gammas, Deltas, Epsilons]
14. And if somebody is not happy… a few grammes of recognition
• Every social class can take drugs after (even during)
work, to get even happier.
• One gramme of the drug every time that…
• Alphas: a new term is included in an upper-level ontology, or a domain ontology is cleaned up with their well-founded terms
• Betas: an inconsistency is found in some data thanks to the logical axioms of their ontologies, or their ontology is included in BioPortal
• Gammas: their ontology is listed in the LOV repository; and ten grammes when it is used in some Linked Data dataset
• Deltas: the same for metadataregistry.org
• Epsilons: one gramme for every 10,000 new Web pages annotated according to schema.org
15. SHARED, FORMAL and EXPLICIT
But is our happy ontology engineering world big enough?
How many other people live outside of it?
There are savage reservations, where ugly non-ontologists live…
• They use relational databases
• Some of them are not even in normal form
• And “oh-my-God” CSVs
• And they communicate using natural language, HTML and UML class diagrams
However, sometimes they are visited by our inhabitants…
17. The savage reservation in Madrid
• AENOR PNE 178301
• Standard on Open Data for Smart Cities
• Organised by
• Spanish Ministry of Industry
• AENOR
• AENOR CTN 178 group
• Subcommittee 3 on Government and Mobility
• Workgroup on Government
• Subgroup on Open Data
18. Some of the individuals in the savage reservation
• Coordinator
• Esther Minguela (Localidata)
• 35 members who belong to…
• Medium and large cities (10), mostly city information managers
• Private companies working for the public sector (6)
• Regions (3), mostly regional information managers
• Ministries or similar bodies (3)
• Geographic sector (3)
• I visited them for six months (January-June 2014), trying to show the advantages of living in our Brave Little World of Ontology Engineering
• Did I succeed? … No vote now (don’t spoil my presentation)
20. Main objectives of the work being done
• Make open data projects from cities more systematic
• Provide reference guidelines for local administrations to define, document and develop open data projects
• Evaluate the maturity of open data projects (through indicators)
• Kickstarting them
• Continuous improvement
• Sustainability
• Quality and efficiency of the project
• Improve interoperability
• Decide on the 10 top-priority datasets to be opened
• Work on common data structures and vocabularies for these datasets
21. 37 metrics, grouped into domains (and dimensions)
Strategic Domain
1. Strategy
2. Leadership
3. Service-level agreement
4. Sustainability
Legal Domain
5. External and internal legal norms
6. Usage and licensing conditions
Organisational Domain
7. Responsible unit
8. Skilled team
9. Inventory of data
10. Priority
11. Measurement of the process
12. Measurements of usage and impact
Technical Domain
13. Catalogue
14. Available in the public sector catalogue
15. Documented datasets
16. Categories and search facilities
17. Availability
18. Persistent and friendly references
19. Accessibility
20. Access for free
21. Access systems in place
22. Primary data
23. Completeness
24. Documentation of data
25. Correctness
26. Geo-referencing
27. Linked Data
28. Update processes
29. Update frequency
30. New dataset inclusion
31. Data quantity
32. Data format
33. Vocabularies
Economic and Social Domain
34. Transparency, participation and collaboration
35. Complaint/Conflict management
36. Fostering reuse
37. Developed reuse initiatives
22. An indicator on the maturity of open data projects
• Each metric gets a number
• And each one has a weight, agreed by group members
• A final indicator is then calculated
Total value: 0-200 | 201-400 | 401-600 | 601-800 | 801-1000
Open data indicator: 1 | 2 | 3 | 4 | 5
Weights in the Strategy domain:
Strategy: 25 %
Leadership: 50 %
Service-level agreements: 10 %
Sustainability: 15 %
Scores per level achieved:
Level 0 (nothing): 0
Level 1 (you have started doing it): 1
Level 2 (you are good): 2
Level 3 (excellent): 3
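The indicator calculation above can be sketched as follows. The 0-3 levels, the Strategy-domain weights and the mapping of a 0-1000 total value to an indicator from 1 to 5 come from the slide; how the domains are combined into that total is not stated, so this sketch ASSUMES each domain is normalised to 0..1 and the total is the unweighted mean of the domains scaled to 0..1000. All function and constant names are hypothetical.

```python
# Sketch of the open data maturity indicator.
# From the slide: levels 0-3, per-metric weights within a domain, and bands
# mapping a 0-1000 total value to an indicator from 1 to 5.
# ASSUMPTION (not on the slide): domains are normalised to 0..1 and the
# total value is their unweighted mean, scaled to 0..1000.

MAX_LEVEL = 3  # Level 3 (excellent)

# Weights of the Strategy domain, as listed on the slide (they add up to 100 %).
STRATEGY_WEIGHTS = {
    "Strategy": 0.25,
    "Leadership": 0.50,
    "Service-level agreements": 0.10,
    "Sustainability": 0.15,
}

# Upper bound of each band of the total value, paired with its indicator.
BANDS = [(200, 1), (400, 2), (600, 3), (800, 4), (1000, 5)]


def domain_score(levels: dict[str, int], weights: dict[str, float]) -> float:
    """Weighted level of one domain, normalised to the range 0..1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("the weights of a domain must add up to 100 %")
    for metric, level in levels.items():
        if not 0 <= level <= MAX_LEVEL:
            raise ValueError(f"{metric}: level must be between 0 and {MAX_LEVEL}")
    # Metrics without a recorded level count as Level 0 (nothing).
    return sum(w * levels.get(metric, 0) for metric, w in weights.items()) / MAX_LEVEL


def total_value(domain_scores: list[float]) -> float:
    """ASSUMED aggregation: mean of the domain scores, scaled to 0..1000."""
    return 1000 * sum(domain_scores) / len(domain_scores)


def indicator(total: float) -> int:
    """Map a 0..1000 total value to the 1..5 open data indicator."""
    rounded = round(total)
    if not 0 <= rounded <= 1000:
        raise ValueError("the total value must be between 0 and 1000")
    return next(value for upper, value in BANDS if rounded <= upper)
```

For example, a city at Level 2 on Strategy, 3 on Leadership, 0 on service-level agreements and 1 on Sustainability gets (0.25 × 2 + 0.50 × 3 + 0.10 × 0 + 0.15 × 1) / 3 = 2.15 / 3 ≈ 0.72 in that domain; if Strategy were the only domain, the total would be about 717 and the indicator 4. With real data, the per-domain weights agreed by the group would replace the assumed unweighted mean.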
23. 10 Highest-Priority Datasets for 2015
• Listing based on the current inventories from all cities (and regions)
• Harmonisation
• Votes according to PSI-reuse requests
Datasets:
Cultural Agenda
Traffic
Population
Streets
Public Transport
Touristic Places and POIs
Budget
Shop Census
Air Quality
Contracts
Parkings
24. And now the meat…
• All that previous work may have been done even by our epsilons…
• Now it’s time to start working on common data structures and vocabularies…
• Did I tell you that these people were often visited by some of the people from our world?
• Before continuing, let’s see some of the conversations that we managed to get access to…
27. Cool, we have a methodology…
[Figure: ontology development scenarios, numbered 1 to 9. Inputs: knowledge resources, i.e. non-ontological resources (thesauri, dictionaries, glossaries, lexicons, taxonomies, classification schemas), ontological resources from ontology repositories and registries (RDF(S), OWL), ontology design patterns and alignments. Activities: non-ontological resource reuse and reengineering, ontological resource reuse and reengineering, ontology aligning and merging, ontology design pattern reuse, ontology restructuring (pruning, extension, specialization, modularization) and ontology localization, leading to ontology specification, conceptualization, formalization and implementation in RDF(S) or OWL, with scheduling]
28. However…
• Our methodologies do not explain much to domain experts about what they have to do at each step
• So we just gave them simple guidelines (as most of you normally do)
• Start with competency questions, with a few answers
• These must come from data reusers’ requests
• And we call them “user stories”
• Extract terms, and classify them into nouns, adjectives and verbs
• Organise them a little bit
• Find common data structures out there (vocabularies, ontologies) that use those terms or synonyms
• Decide which ones to use
• …
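The "extract terms" step in the guidelines above could be given a first push with a small script. This is a minimal sketch, assuming the competency questions are available as plain strings; the questions and the stopword list are hypothetical, and the classification into nouns, adjectives and verbs is deliberately left to the domain experts.

```python
import re
from collections import Counter

# Hypothetical competency questions ("user stories" from data reusers);
# illustrative only, not taken from the working group's backlog.
COMPETENCY_QUESTIONS = [
    "Which shops are located in a given street?",
    "Which economic activities does a shop have?",
    "Which neighbourhoods does a district contain?",
]

# A tiny hand-made stopword list (an assumption; a real one would be larger,
# and in Spanish for this group).
STOPWORDS = {"a", "are", "does", "given", "have", "in", "is", "of", "the", "what", "which"}


def candidate_terms(questions: list[str]) -> Counter:
    """Count candidate terms: lower-cased words that are not stopwords."""
    counts: Counter = Counter()
    for question in questions:
        for word in re.findall(r"\w+", question.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts


# Classifying the terms into nouns, adjectives and verbs, and organising them,
# is left to the domain experts (a part-of-speech tagger could help; not shown).
for term, count in candidate_terms(COMPETENCY_QUESTIONS).most_common():
    print(term, count)
```

Note that no stemming is done, so "shop" and "shops" stay separate: merging such variants is part of organising the terms.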
29. Savages working on their vocabularies…
• We used an agile-like method with a “competency question backlog” (first take some questions and go down the whole path, then take some others, etc.)
• And we used “common” tools: Google Docs, Excel, card sorting
• And now, let’s build the ontology
• Deadlock!!!
30. Deadlock 1. I have been told to reuse other ontologies
• We recommend reusing other ontological and non-ontological resources (well, except for epsilons)
• That’s one of the foundations of ontology engineering
• However, savages tend to do that at an early stage of ontology development
• It confuses them
• Should I use FOAF, or the Organization Ontology, or vCard, or schema.org?
• And it prevents people from being creative
• It causes endless discussions about terms (and lots of problems with translations)
• Rec1: tell them to forget about reuse. Let them start by providing their own (wrong?) definitions, and agree on those
31. Deadlock 2. I want my ontology to do inferences…
• A beta told me…
• OWL is fun to teach at university (especially for betas)
• It’s nice to see reasoning, consistency checking, OWA, etc.
• It is useful in many domains
• But developing such ontologies is not a task for our savages
• Rec2: just work with text patterns, and guide them to write good term definitions
• A district contains only neighbourhoods and census sections
• A shop can have at most three economic activities associated with it
Note: Rabbit may be useful here (although I did not have time to try it out with this group)
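To show why text patterns are enough for experts while still leaving a path to OWL, here is a minimal sketch that turns the two example definitions into OWL axioms in Manchester syntax. The two regular-expression patterns, the naive singularisation and the generated names (e.g. `hasEconomicActivity`) are hypothetical; this is not Rabbit or any existing controlled natural language tool, and the output is a fragment that would still need property declarations in a full ontology.

```python
import re

# Two hypothetical text patterns, modelled on the example definitions.
ONLY_PATTERN = re.compile(r"^An? (\w+) (\w+) only (.+)$")
MAX_PATTERN = re.compile(
    r"^An? (\w+) can have at most (\w+) (.+?)(?: associated (?:to|with) it)?$"
)
NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def class_name(phrase: str) -> str:
    """'census sections' -> 'CensusSection' (naive singular, CamelCase)."""
    words = phrase.lower().split()
    last = words[-1]
    if last.endswith("ies"):
        words[-1] = last[:-3] + "y"
    elif last.endswith("s"):
        words[-1] = last[:-1]
    return "".join(word.capitalize() for word in words)


def to_manchester(sentence: str) -> str:
    """Translate one definition that follows a known pattern into an axiom."""
    sentence = sentence.strip().rstrip(".")
    if match := ONLY_PATTERN.match(sentence):
        subject, prop, fillers = match.groups()
        # Here "and" lists the allowed fillers, so it becomes a union ("or").
        union = " or ".join(class_name(f) for f in fillers.split(" and "))
        return f"Class: {class_name(subject)} SubClassOf: {prop} only ({union})"
    if match := MAX_PATTERN.match(sentence):
        subject, number, filler = match.groups()
        count = int(number) if number.isdigit() else NUMBERS[number]
        target = class_name(filler)
        # Hypothetical naming convention: "has" + filler class.
        return f"Class: {class_name(subject)} SubClassOf: has{target} max {count} {target}"
    raise ValueError(f"no text pattern matches: {sentence!r}")


print(to_manchester("A district contains only neighbourhoods and census sections"))
print(to_manchester("A shop can have at most three economic activities associated with it"))
```

The first definition also illustrates a classic pitfall that the patterns hide from experts: the natural-language "and" in a list of allowed fillers must become a union, not an intersection.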
32. Deadlock 3. I want my ontology to be lightweight…
• A gamma told me…
• My ontology will be used for Linked Data publishing (so that I get 5 stars!!)
• I have been told not to add domains or ranges
• I have been told to create only light taxonomies
• I have been told to use only RDF Schema
• Rec3: again, text patterns are a good option here
• Don’t make your experts worry about languages or formal aspects
33. Deadlock 4. Which tool should I use?
• Hadn't the war already ended?
• Alphas and betas told me to use Protégé
• Some of them said that I could use a Web-based version
• A gamma told me to use Neologism
• And an epsilon called me and said that it was enough to use tables with attributes, as in schema.org
• And then I saw an old tool, no longer available, that used schema.org-like tabular descriptions and generated ontologies in different languages
• WebODE
• Rec4: use simple tools (e.g., Excel) that make discussion easy, without weird constraints
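The table-based approach in Rec4 can be kept without losing a formal output. This is a minimal sketch, assuming the spreadsheet is exported as CSV with one row per property in the schema.org style (class, property, expected type); the column names, rows and the `example.org` namespace are hypothetical, and the output is a small RDF Schema vocabulary in Turtle.

```python
import csv
import io

# Hypothetical spreadsheet export: one row per property, schema.org style.
TABLE = """class,property,expected_type,comment
Shop,name,xsd:string,Name of the shop
Shop,address,PostalAddress,Where the shop is located
PostalAddress,streetAddress,xsd:string,Street name and number
"""
BASE = "http://example.org/def/comercio#"  # hypothetical namespace


def turtle_string(text: str) -> str:
    """Quote a literal for Turtle, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def table_to_turtle(table: str, base: str) -> str:
    """Turn class/property rows into a small RDF Schema vocabulary in Turtle."""
    rows = list(csv.DictReader(io.StringIO(table)))
    lines = [
        f"@prefix : <{base}> .",
        "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
    ]
    # Every described class, and every non-datatype expected type, becomes an
    # rdfs:Class; xsd: datatypes are used as they are.
    classes = {row["class"] for row in rows}
    classes |= {r["expected_type"] for r in rows if not r["expected_type"].startswith("xsd:")}
    lines += [f":{name} a rdfs:Class ." for name in sorted(classes)]
    for row in rows:
        expected = row["expected_type"]
        rng = expected if expected.startswith("xsd:") else f":{expected}"
        lines.append(
            f":{row['property']} a rdf:Property ;"
            f" rdfs:domain :{row['class']} ;"
            f" rdfs:range {rng} ;"
            f" rdfs:comment {turtle_string(row['comment'])}@en ."
        )
    return "\n".join(lines) + "\n"


print(table_to_turtle(TABLE, BASE))
```

One design caveat: if the same property appeared in rows for two classes, this sketch would emit two `rdfs:domain` values, which RDFS interprets as an intersection; schema.org itself avoids this by using `schema:domainIncludes` and `schema:rangeIncludes`.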
34. Deadlock 5. But the ontologies to reuse are in English
• These developers and data reusers prefer Spanish terms
• We all know that identifiers are just symbols
• i.e., labels and comments in different languages should be enough
• However…
• Should we mix term identifiers in different languages?
• Do we translate all terms into our language?
• Rec5: no idea yet about what to do…
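The "identifiers are just symbols" option can be sketched as follows: each term keeps an opaque identifier and carries language-tagged labels, and a tool shows the label in the user's preferred language. The identifiers, labels and the `label` helper are hypothetical; the sketch illustrates only the labels-based option and does not settle the open questions about which language identifiers should use.

```python
# Hypothetical terms: opaque identifiers with labels per language tag,
# in the spirit of rdfs:label "..."@es / "..."@en.
LABELS = {
    "T0001": {"es": "tipo de vía", "en": "street type"},
    "T0002": {"es": "lugar turístico"},  # no English label yet
}


def label(term: str, preferred: tuple[str, ...] = ("es", "en")) -> str:
    """Return the label in the first preferred language that has one."""
    labels = LABELS.get(term, {})
    for lang in preferred:
        if lang in labels:
            return labels[lang]
    # No label in any preferred language: fall back to the bare identifier.
    return term


print(label("T0001"))                # Spanish reuser
print(label("T0001", ("en", "es")))  # English reuser
```

The fallback to the identifier shows the cost of this option: wherever a translation is missing, users end up seeing a meaningless symbol.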
35. The results so far…
Dataset: vocabulary
• General vocabularies
o Postal Address: http://vocab.linkeddata.es/datosabiertos/def/urbanismo-infraestructuras/direccionPostal
o Administrative: http://vocab.linkeddata.es/datosabiertos/def/sector-publico/territorio
• Streets: http://vocab.linkeddata.es/datosabiertos/def/urbanismo-infraestructuras/callejero
o SKOS: http://vocab.linkeddata.es/datosabiertos/kos/urbanismo-infraestructuras/tipo-via
• Tourism: http://vocab.linkeddata.es/datosabiertos/def/turismo/lugar
• Cultural Agenda: http://vocab.linkeddata.es/datosabiertos/def/cultura-ocio/agenda
• Shop Census: http://vocab.linkeddata.es/datosabiertos/def/comercio/tejidoComercial
o SKOS (NACE): http://vocab.linkeddata.es/datosabiertos/kos/comercio/cnae
• Population: http://www.w3.org/TR/vocab-data-cube/, with SKOS code lists for
o Age: http://eurostat.linked-statistics.org/dic/age.rdf
o Gender: http://eurostat.linked-statistics.org/dic/sex.rdf
o Geo: http://eurostat.linked-statistics.org/dic/geo.rdf
• Budget: http://vocab.linkeddata.es/datosabiertos/def/hacienda/presupuesto
• Contracts: http://contsem.unizar.es/def/sector-publico/pproc
• Air Quality: http://www.w3.org/2005/Incubator/ssn/ssnx/ssn
• Traffic: http://vocab.linkeddata.es/datosabiertos/def/transporte/trafico
• Public Transport: http://vocab.linkeddata.es/datosabiertos/def/transporte/transportePublico
• Parkings: http://vocab.linkeddata.es/datosabiertos/def/urbanismo-infraestructuras/aparcamiento
37. A walk through the Brave Little World of Ontology Engineering
Why are we still discussing what ontologies should be used for?
(see the recent thread in Google+’s LOV community, started by Bernard Vatant,
on the “intended and real usage of vocabularies in LOV”)
https://plus.google.com/u/0/+BernardVatant/posts/SDYTN3FGkEr
What will our world be like in year 25 After Gruber (A.G.)? And in year 50 A.G.?
Will there soon be a revolution led by epsilons to rule the world?
Are we the ones who live in a savage reservation in a larger world?
Or will we conquer the rest of the world?
38. Which social class do EKAW2014 participants belong to?
https://es.surveymonkey.com/results/SM-F75GGL2V/
39. Acknowledgements
• First of all, my acknowledgements go to Aldous Huxley for writing the always-inspiring “Brave New World” book.
• I would also like to thank some of those who have helped with comments on this presentation
• Asunción Gómez-Pérez and the whole Ontology Engineering Group team
• José Manuel Gómez-Pérez
• Those who provided some material (acknowledged in the corresponding slides)
• And all those with whom I have enjoyed building ontologies for so many years (far too many to enumerate here)
• Especially those from the savage reservation in Madrid
40. Disclaimers
• The contents of this slideset represent my own view on this topic
• Not necessarily the views of all members of the Ontology Engineering Group (UPM)
• They are based on some of my own experiences in ontology engineering
• These are not necessarily generalisable
• In particular, they are probably not valid in ontology-savvy domains
• And more importantly…
• I was trying to be provocative here, to generate discussion
Editor's Notes
Hello, I am an alpha, nice to meet you. Do you know what we do in our brave new ontology engineering world?
No, I am a savage. I only use relational databases, Excel, UML class diagrams
Oops, then you will not understand what “unity” means… See you…
Hello, I am a beta, nice to meet you. Have you ever heard the word ontology?
Yes, somebody told me that it is useful to publish my data, but I do not know how.
Well, yes, but that’s only for gammas. What is really useful is to do some reasoning with it.
Ohh, cool, will I be able to detect duplicates in my data?
Yes, sure, we need to create a few axioms, activate HermiT, then consider the Open World Assumption, go back and reason again…
Ufff, I do not understand a word…
Obviously, you are a savage…
Hello, I am a gamma, nice to meet you. Do you use ontologies for reasoning?
Well, somebody came to tell me about it, but I could not understand a word of what they were telling me.
Ahh, yes, that’s because you really need vocabularies for publishing, nothing else.
And how do I select those vocabularies?
Just go to LOV, look for terms that are used, and use them. And the more you import from others, the better.
Hello, I am a delta, nice to meet you. Have you annotated your datasets and Web pages with Dublin Core?
Yes, I do it all the time. It’s done by my content management system automatically.
And have you created codelists or SKOS thesauri already?
No, I was starting to build ontologies.
That’s going to hurt…
Hello, I am an epsilon, nice to meet you.
Hey, I was told that epsilons are not allowed to leave their world and visit us.
Of course, that’s why I am contacting you through Google Hangouts. Have you done your schema.org markup yet?
Yes, I want to be ranked first in Google searches. But I do not know the exact meaning of some properties.
Ahh, that doesn’t really matter. We just use tables to describe them.