Description:
We start this training with an exploration into what schema looks like within Grakn, starting with clarifying the motivation for schema, the conceptual schema of Grakn, and its relationship to the Enhanced Entity-Relationship model.
Then we break things down a bit more philosophically. What does it mean to model a knowledge domain, specifically when modelling in Grakn, which allows for a much closer representation of the true domain?
Takeaways:
- Be able to articulate why schema is so beneficial when using Grakn, why we use one and how it enables a more expressive model.
- Write a Grakn schema in Graql.
- Practice modelling one of your own domains and begin to write the model in Graql.
Knowledge Modelling Principles for Grakn Academy
1. Grakn Academy | Knowledge Modelling Principles
November 11th 2020
Tomás Sabat and Daniel Crowe
3. Agenda
a. Logistics and intros
b. The modelling philosophy, principles and techniques around Grakn’s knowledge schema
c. Best practices for designing schema
5. Domain Modelling - best practices
Here we set out the best practice guidelines for creating a Graql schema.
It’s possible to create a successful model that does not conform to the
guidelines.
These guidelines aim to help maximise:
• true-to-domain modelling
• flexibility
• and extensibility
6. Importance of Modelling Choices
We want to choose our data model carefully, that is, decide what should be an entity, a
relation, or an attribute.
We want this model to closely reflect the domain. That way, if new data becomes available
in the domain, we know it will fit into the data model.
Example
With this schema, when we later find there is information regarding a person’s
employment, how can we add it?
If the domain is modelled true-to-life, then extra schema (and therefore data) is added
trivially.
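As a minimal sketch of what "added trivially" can look like: the schema from the original slide is not reproduced here, so every type name below is a hypothetical stand-in. The syntax assumes Grakn 1.8-era Graql (has, plays, relates, value).

```graql
# Hypothetical starting schema: people and companies, no employment yet.
define

name sub attribute,
  value string;

person sub entity,
  has name;

company sub entity,
  has name;

# Later extension: employment arrives as a new relation.
# The existing types only gain the ability to play its roles.
define

employment sub relation,
  relates employee,
  relates employer;

person sub entity,
  plays employee;

company sub entity,
  plays employer;
```

Because person and company were already modelled as the things they are in the domain, the extension is purely additive and nothing already defined has to be reshaped.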
7. Importance of Naming - naming concepts
Naming and choosing your entities, relations and attributes are two issues that
are tied together.
The naming should closely link to typical domain terminology, since natural
language is the means to describe the domain.
Using the expected domain terminology is important; it lets another domain
expert easily:
• understand your model
• maintain it
• extend it
• adapt it.
Focussing on terminology to determine structure is a typical approach for
software architecture in general, not just for Graql schema.
8. Importance of Naming - naming conventions
• Naming is all in lower case
• Use hyphens to join multiple words
• Indent on newlines after declaring a type e.g.
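A short sketch of these conventions in practice. The type names are illustrative assumptions, and the syntax is assumed to be Grakn 1.8-era Graql.

```graql
define

# Lower case, hyphen-separated names; properties indented on new lines.
social-group sub entity,
  has group-name,
  plays member-group;

group-name sub attribute,
  value string;

group-membership sub relation,
  relates member-group,
  relates group-member;
```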
9. Choosing Type Names
A Type Name should:
• Ideally be a noun (except for attributes)
• Be singular; the presence of more than one instance lends plurality
• Be as context-specific as possible, using the exact word that describes the
concept in-context
• If you can’t find a noun specific enough then try concatenating two or more
nouns by hyphenating them:
• these could become too verbose, so try to find the balance between
specificity and verbosity!
10. Choosing Type Names
A Type Name should:
• If a noun can’t be found, then use a present or past participle of a verb as an
adjective to capture the context. Combine this with a noun that is not context-specific.
• e.g. authored-content, since content would be too generic, and authored-book
would be too specific (as it would rule out other types of content).
• Prepositions may also need to be inserted into the name if multiple words are
required.
• End the name with a noun, not a preposition
• i.e. avoid -to, -from, -by, -for as they tend to add confusion
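The authored-content naming can be sketched as follows. This assumes authored-content is used as a role name and that the surrounding types (authorship, person, book, article) are hypothetical; the original slide's example is not reproduced here.

```graql
define

authorship sub relation,
  relates author,
  relates authored-content;

person sub entity,
  plays author;

# Both kinds of content fit the role, which "authored-book" would rule out.
book sub entity,
  plays authored-content;

article sub entity,
  plays authored-content;
```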
11. Take a Look at Your Domain
Exercise - 3 minutes - individually:
• Identify one area or piece of your domain that includes at least one entity and one relation
• Draw this as a diagram, making the connections between concepts
• What naming decisions have you made vs. the existing naming you have? Be prepared
to talk about one of these
• Screenshot your diagram and drop it into the #academy channel in Discord.
12. Entities
Choosing things to model as entities
“An entity may be defined as a thing capable of an independent existence that can be
uniquely identified. An entity is an abstraction from the complexities of a domain. When
we speak of an entity, we normally speak of some aspect of the real world that can be
distinguished from other aspects of the real world.”
(https://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model)
Good choices for entities are:
• Any physical thing in the real world should be modelled as an entity (e.g. animal,
person, device, building)
• Anything that exists logically but doesn’t require involvement of other things in
order to exist, e.g. groups or collections of things.
13. Entities
Good choices for entities are:
• Use concrete/proper/common/abstract/collective nouns:
• proper – Homo sapiens (normally an attribute name of an instance)
• common, which covers:
• concrete – person, tree, car
• collective – family, government, team, orchestra, set
• abstract – religion, pain, principle
• To specialise a general noun, use a combination with another noun – social-group.
14. Relations
Choosing things to model as relations
It’s easy to point at certain concepts as definitely entities, e.g. car.
It is harder to decide whether the more conceptual things are entities or relations.
Relation categories
Binary relations should conform to the mathematical definitions. These definitions say that a
binary relation must either have the property or anti-property for each of the following cases:
Property   | Anti-property   | Property description
Symmetric  | Anti-symmetric  | Relation is the same in both directions
Transitive | Anti-transitive | Relation can be chained
Reflexive  | Anti-reflexive  | Roleplayer can be related to itself through that relation
This means that if a concept doesn’t have a property or anti-property, then it cannot be a binary
relation. Therefore it is likely an entity or a ternary or N-ary relation.
15. Relations
Choosing things to model as relations
Example 1
An employment relation, with roles employee and employer, is antisymmetric, antitransitive and
antireflexive.
We see that it is logically consistent to define it in these terms.
Example 2
A religion is neither:
• symmetric or antisymmetric
• transitive or antitransitive
• reflexive or antireflexive
Therefore we can conclude that a religion is not a binary relation
16. Relations
Choosing things to model as relations
In general, relations shouldn’t make sense without their roles. For example, a marriage can’t
logically exist without at least one spouse/husband/wife.
• In language, a relation cannot be referred to without the need to reference something else to
contextualise it. marriage is the go-to example here, since it loses context without referring to the
people that were married.
• Ideally, we are looking for the concept that connects two things, not a direct connection (often
those are role names, like employee)
• For instance, we don’t use “owns” (the edge you might use in a triple) anywhere here:
** Notice that ownership is the verbal noun of owns.
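A minimal sketch of the ownership idea, assuming hypothetical person and car types and Grakn 1.8-era syntax. The connecting concept is the relation ownership, with roles owner and property, and no "owns" edge appears in either the schema or the query.

```graql
define

ownership sub relation,
  relates owner,
  relates property;

person sub entity,
  plays owner;

car sub entity,
  plays property;

# Querying goes through the relation and its roles.
match
  $o (owner: $p, property: $c) isa ownership;
get;
```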
17. Relations
Choosing things to model as relations
• Gather together domain terminology that sounds similar to the concept you want to model. Then
determine which are candidate relation, role, and entity names (determining and naming
attributes is often not too hard).
• Remember that a role describes how a thing behaves in the scope of a relation. Examples with
roleplayer type, role, relation:
• a car behaves like property in an ownership
• a station behaves as a stop along a train-route
• a person behaves as an employee in an employment
18. Relations
Choosing things to model as relations
• Relation names could describe membership of a grouping/collection of things (component, group-
membership), an action/ongoing state (marriage, comparison, authorship, participation), or a
description of a direct interrelation between two or more things (friendship, parenthood,
association, drug-protein-interaction).
19. Relations
Choosing things to model as relations
• A relation is defined such that an instance should not be able to exist without relating at least
one instance for one of its roles. This is the idea that a relation is dependent upon the existence of
one or more roleplayers.
• A relation should still make sense even if any number of its roleplayers are missing
• The roles and roleplayers should make logical sense to be connected in any combination
20. Relations
Naming a relation
• You should find that you choose names from these categories:
• abstract nouns,
• transitive verbs that can accept 2 or more arguments – decide, agree, marry.
• their verbal nouns are preferable – decision, agreement, marriage.
• (https://www.grammar-monster.com/lessons/nouns_different_types.htm)
21. Relations
Naming a relation
Two naming styles are compared: ending with nouns (using nouns only) versus nouns, verbs and prepositions.
How does it look when querying?
Noun combinations can be more exact, but are more verbose. It’s your choice!
22. Relations
Naming a relation
• Wherever possible, relations should be named in such a way that the name doesn’t include a
‘reference’ to one of its roleplayers in particular.
• Parenthood is an example of a fairly unavoidable case, where the relation naming refers more
to the role of parent than the role of child .
• An example of the ideal case would be:
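The original example is not reproduced here. As an illustrative assumption, marriage is one candidate for the ideal case, since the name describes the connection itself rather than either roleplayer:

```graql
define

# "parenthood" leans towards the parent role.
parenthood sub relation,
  relates parent,
  relates child;

# "marriage" names the connection, not one roleplayer in particular.
marriage sub relation,
  relates spouse;
```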
23. Discussion
Let’s give it a try on our own domains
• Go back to your diagram and assess the naming choices for your relation(s)
• Add or update role players
• Be prepared to share what you found and why you made the decisions you did
24. Attributes
Choosing things to model as attributes
• Usually the easiest to choose, since they are the direct description of a set of values that we want
to model.
Naming an attribute
The name of an attribute should refer to a literal value.
• Make attributes context-specific
• Where necessary by concatenating words, ending with a noun (as for entities)
• Abstract nouns e.g. colour
• Adjectives e.g. friendly
• Intransitive verbs (no direct objects, can’t be followed by “who” or “what”) e.g. is-raining,
graduated.
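A sketch of attribute definitions following these naming guidelines. The value types chosen here and the birth-date and car types are assumptions for illustration, in Grakn 1.8-era syntax.

```graql
define

# Abstract noun
colour sub attribute,
  value string;

# Intransitive verbs, naturally boolean-valued
is-raining sub attribute,
  value boolean;

graduated sub attribute,
  value boolean;

# Context-specific name built by concatenation, ending with a noun
birth-date sub attribute,
  value datetime;

car sub entity,
  has colour;
```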
25. Composition vs. Inheritance
Composition replaces the temptation of multiple inheritance.
“Entity type Y is a subtype (subclass) of an entity type X if and only if every Y is necessarily an
X” (https://en.wikipedia.org/wiki/Enhanced_entity%E2%80%93relationship_model)
Therefore, defining customer sub person; is a bad idea, since:
• An organisation could be a customer, therefore customer is a behaviour (a role)
• A person who plays the role of a customer could play the role of many other things, e.g. teacher
Mindset
This requires a shift in mindset: instead, see roles as a way of composing the behaviours of a concept.
Try putting names in a context like this:
A [relation] has a [role] in the form of a [thing]
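A minimal sketch of composition through roles, with hypothetical relation and role names in Grakn 1.8-era syntax. Customer is a role that both a person and an organisation can play, instead of a subtype of person.

```graql
define

purchase sub relation,
  relates customer,
  relates seller;

teaching sub relation,
  relates teacher,
  relates student;

# A person composes several behaviours by playing several roles.
person sub entity,
  plays customer,
  plays teacher,
  plays student;

# An organisation can be a customer too, which "customer sub person" forbids.
organisation sub entity,
  plays customer,
  plays seller;
```

Read with the template above: a purchase has a customer in the form of a person, or in the form of an organisation.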
26. Benefits of Inheritance - Scoping Queries
Wide scope: Get all the posts
Narrow scope: Get all the comments
Intermediate scope: Get all media
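The three scopes can be sketched with a hypothetical type hierarchy. The subtypes of media are assumptions, and the syntax is Grakn 1.8-era Graql.

```graql
define

post sub entity;
comment sub post;
media sub post;
video sub media;
photo sub media;

# Wide scope: every post, including comments, media and its subtypes.
match $x isa post; get;

# Narrow scope: only comments.
match $x isa comment; get;

# Intermediate scope: media, including videos and photos.
match $x isa media; get;
```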
27. Benefits of Inheritance - better constraints
If we use this for companies owning offices:
And we also use this for people owning social groups.
We see a problem: now a person can own an
office, and a company can own a social
group.
With this schema, Grakn cannot enforce the
constraint that prevents this.
28. Benefits of Inheritance - better constraints
We have the constraints we want, and we can still retrieve the subtypes using:
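One way this can be sketched, assuming Grakn 1.8-era role overriding with `as` and hypothetical subtype and role names: each sub-relation specialises the roles of ownership, so only the intended types can play them.

```graql
define

ownership sub relation,
  relates owner,
  relates property;

office-ownership sub ownership,
  relates office-owner as owner,
  relates owned-office as property;

group-ownership sub ownership,
  relates group-owner as owner,
  relates owned-group as property;

company sub entity,
  plays office-owner;

office sub entity,
  plays owned-office;

person sub entity,
  plays group-owner;

social-group sub entity,
  plays owned-group;

# Both kinds of ownership are still retrieved through the parent type.
match $o isa ownership; get;
```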
29. Ternary and N-ary Relations
To help us see the use of ternary relations, consider someone buying a product.
Start with only binary relations: where do we add the value for the sale?
Ternary, since all 3 occur at the same time: this gives us the perfect way to add the value.
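A sketch of the ternary version, with hypothetical role and attribute names in Grakn 1.8-era syntax. The buyer, the seller and the product all take part in one transaction, and the value of the sale is an attribute owned by that relation.

```graql
define

transaction-value sub attribute,
  value double;

transaction sub relation,
  relates buyer,
  relates seller,
  relates merchandise,
  has transaction-value;

person sub entity,
  plays buyer,
  plays seller;

product sub entity,
  plays merchandise;
```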
30. Nested Relations
Now we can refer to the transaction in other relations.
Note that this can be favourable over adding another role to the existing relation.
This is better for:
• Consistency across schema
• Versatility: we can add more information to either of the two relations
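A minimal sketch of nesting, assuming a hypothetical delivery relation and Grakn 1.8-era syntax. The transaction relation itself plays a role in another relation, instead of gaining an extra role of its own.

```graql
define

transaction sub relation,
  relates buyer,
  relates seller,
  relates merchandise,
  plays delivered-transaction;

# The delivery refers to the whole transaction as one of its roleplayers.
delivery sub relation,
  relates delivered-transaction,
  relates courier;
```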
31. Optimisation
Schema design impacts query performance.
Use context-specific relation and role names; this allows the query planner to find a
good path (otherwise all data is homogeneous and looks the same).
32. Writing our Domain Schema
Exercise - 10 minutes - individually:
• Using Workbase or a text editor, try building a schema for your domain!
33. An added incentive to keep improving your schema
We’re going to give away a swag pack (t-shirt, stickers, etc.) for the best
schema posted to Twitter with the tag: @graknlabs + #GraknAcademy
You’ll have 7 days from now to post. Tomás and I will be picking the
winner(s) next Wednesday morning at 9am GMT+1.