Presentation given at the SIB training: Using the Semantic Web for faster (Bio-)Research
http://edu.isb-sib.ch/course/view.php?id=212
(http://sgtp.net/AndreaSplendiani)
2. Semantic Web @Novartis
Topics
§ Semantic Web @Novartis
• Context (Where in Novartis)
• Semantic Web in production
• Semantic Web in research
• Semantic Web under the hood
§ Semantic Web in “Real Life”: open questions
| Semantic Web technologies: experiences in Novartis| Andrea Splendiani | 2nd December 2015| Technology | Public use
3. Semantic Web uptake in time
Context
[Figure: timeline of Semantic Web uptake. Labels: Metastore/RDF; “Semantic Web in pubmed”; Query federation; Visualisation; Other semantic technologies; CTMF. Phases are marked as preparation and production.]
4. Semantic Web usage within the organization
Context
Activities of TMS:
§ Text mining
§ Ontology development
§ Ontology provision
§ Data curation
5. Semantic Web @Novartis
Topics
§ Semantic Web @Novartis
• Context (Where in Novartis)
• Semantic Web in production
• Semantic Web in research
• Semantic Web under the hood
§ Semantic Web in “Real Life”: open questions
6. Metastore: a central repository for ontologies
Semantic Web in production: Metastore
§ Consists of a semantic data federation layer based on controlled terminologies extracted from scientific data repositories
§ Organized around scientific concepts (Genes, Proteins, Indications, Anatomy, etc.), some of them hierarchically organized and classified
§ Complemented by referential knowledge (cross-references to internal and external knowledge repositories)
§ Supports different use cases, including text mining, data curation, data integration and search
§ Accessible through a SPARQL endpoint, a dedicated service layer and reusable widgets; a fully integrated application (MS Viewer) has been released to visualize all Metastore content
§ Based on an RDF data model
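As a minimal sketch of what client access through a SPARQL endpoint could look like, the snippet below builds a SPARQL Protocol GET URL for a label lookup. The endpoint address, the use of rdfs:label and the function name are assumptions made for illustration (the slides do not describe the Metastore schema or its service layer), and a SPARQL 1.1 endpoint is assumed for CONTAINS.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical address: the real Metastore endpoint is not given in the slides.
ENDPOINT = "http://metastore.example.org/sparql"


def label_search_url(text, limit=10, endpoint=ENDPOINT):
    """Build a SPARQL Protocol GET URL that looks up concepts by label substring."""
    # Escape backslashes and double quotes so the text is a valid SPARQL string literal.
    safe = text.replace("\\", "\\\\").replace('"', '\\"')
    query = (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?concept ?label WHERE {\n"
        "  ?concept rdfs:label ?label .\n"
        f'  FILTER(CONTAINS(LCASE(STR(?label)), LCASE("{safe}")))\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )
    # The SPARQL Protocol passes the query text in the "query" parameter.
    return endpoint + "?" + urlencode({"query": query})


url = label_search_url("kinase", limit=5)
decoded_query = parse_qs(urlparse(url).query)["query"][0]
```

The resulting URL can be fetched with any HTTP client; a widget or service layer would typically wrap a call of this kind rather than expose SPARQL to end users.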
7. Metastore: content and usage
Semantic Web in production: Metastore
More than 2M accesses per month (approximate figure)
March 2013
8. Metastore data model
Semantic Web in production: Metastore
9. Metastore technology I
Semantic Web in production: Metastore
10. Metastore technology II
Semantic Web in production: Metastore
[Figure: Metastore technology stack. Components: RDF/XML files; staging table; T_STABLE; relational tables (pointers, history, versions, logs, reference tables); materialized views; RDF triple store; Jena; Joseki SPARQL endpoint; SQL and PL/SQL query APIs; data services.]
11. Metastore Widgets (suggest example)
Semantic Web in production: Metastore
12. Metastore applications (Metastore viewer: summary)
Semantic Web in production: Metastore
13. Metastore applications (Metastore viewer: links)
Semantic Web in production: Metastore
14. Metastore applications (Metastore viewer: explorer)
Semantic Web in production: Metastore
15. Semantic Web @Novartis
Topics
§ Semantic Web @Novartis
• Context (Where in Novartis)
• Semantic Web in production
• Semantic Web in research
- Query federation
- Visualization/interaction
- Other projects
• Semantic Web under the hood
§ Semantic Web in “Real Life”: open questions
16. Query federation: why and how
Semantic Web in Research: query federation
Why?
• Internal and external data already in RDF
• Large datasets in relational systems
• Proprietary datasets with license restrictions (e.g. one server only)
How?
• Relational-to-RDF mapping (materialised and virtualised)
• Bridge ontologies (work in progress)
• Distributed queries (SERVICE)
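To make the idea of a materialised relational-to-RDF mapping concrete, here is a minimal hand-written sketch that turns rows of a relational table into N-Triples lines. The table name, columns and the example.org namespace are hypothetical; in practice such mappings are usually declared with tools like R2RML or Ontop rather than coded by hand, and the literal escaping below is deliberately minimal.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical namespace, not from the slides


def literal(value):
    """Serialise a value as a plain N-Triples string literal (minimal escaping only)."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


def materialise(conn, table, key, columns):
    """Yield one N-Triples line per non-NULL (row, column) pair of the table."""
    cols = ", ".join([key] + columns)
    for row in conn.execute(f"SELECT {cols} FROM {table} ORDER BY {key}"):
        # The primary key becomes part of the subject URI.
        subject = f"<{BASE}{table}/{row[0]}>"
        for column, value in zip(columns, row[1:]):
            if value is not None:  # NULL values produce no triple
                yield f"{subject} <{BASE}{table}#{column}> {literal(value)} ."


# Toy in-memory database standing in for a relational source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assay (id INTEGER PRIMARY KEY, name TEXT, target TEXT)")
conn.execute("INSERT INTO assay VALUES (1, 'binding assay', 'P12345')")
conn.execute("INSERT INTO assay VALUES (2, 'cell assay', NULL)")
triples = list(materialise(conn, "assay", "id", ["name", "target"]))
```

A virtualised mapping applies the same row-to-triple correspondence at query time, translating SPARQL into SQL instead of exporting triples.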
17. Data and systems architecture: example
Semantic Web in Research: query federation
Different arrangements possible (with caveats)
[Figure: example data and systems architecture. Components: NIBR Data Warehouse; Assay Repository (RDBMS); Ontop (API and SPARQL endpoint); AllegroGraph (triplestore and endpoint); UNIPROT/EBI SPARQL endpoint; Metastore (Oracle Spatial and Graph, R2RML + reasoning). Links between components: export triples, persist triples, dynamic translation, SERVICE.]
18. Federated query example
Semantic Web in Research: query federation
Assays
UNIPROT
Metastore
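The query shown on this slide is not preserved in the transcript. As a hedged sketch of what a federated query over the three sources (assays, UniProt, Metastore) might look like, the snippet below assembles a SPARQL 1.1 query with two SERVICE clauses. The assay vocabulary (ex:target, ex:gene) and the Metastore endpoint address are invented for illustration; only the UniProt endpoint and its up:Protein / up:mnemonic terms refer to a public resource.

```python
UNIPROT = "https://sparql.uniprot.org/sparql"      # public UniProt SPARQL endpoint
METASTORE = "http://metastore.example.org/sparql"  # hypothetical address


def federated_query(uniprot=UNIPROT, metastore=METASTORE):
    """Return a SPARQL query joining local assay data with two remote endpoints."""
    return f"""\
PREFIX ex:   <http://example.org/assay#>
PREFIX up:   <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?assay ?protein ?mnemonic ?geneLabel WHERE {{
  # Local part: assay data (hypothetical vocabulary)
  ?assay ex:target ?protein ;
         ex:gene   ?gene .
  # Remote part 1: protein details from UniProt
  SERVICE <{uniprot}> {{
    ?protein a up:Protein ;
             up:mnemonic ?mnemonic .
  }}
  # Remote part 2: terminology labels from Metastore
  SERVICE <{metastore}> {{
    ?gene rdfs:label ?geneLabel .
  }}
}}"""


query = federated_query()
```

The query would be sent to the endpoint holding the assay data, which then delegates each SERVICE block to the named remote endpoint.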
19. Federated queries: logical model
Semantic Web in Research: query federation
20. RDF virtualization via OnTop
Semantic Web in Research: query federation
21. Semantic Web @Novartis
Topics
§ Semantic Web @Novartis
• Context (Where in Novartis)
• Semantic Web in production
• Semantic Web in research
- Query federation
- Visualization/interaction
- Other projects
• Semantic Web under the hood
§ Semantic Web in “Real Life”: open questions
22. Visualization: why and how
Semantic Web in research: visualization and interaction
Why?
• Accessibility of RDF data by end users
• Complexity of (or unfamiliarity with) SPARQL
• General lack of knowledge of the structure of the data at query time
How?
• Visual, interactive environment
• Pre-configuration to optimize interaction styles
• Combination of tools and exploration paradigms
• Data access through SPARQL endpoints
23. RDF data explorer configuration
Semantic Web in research: visualization and interaction
§ Visualisation features are tuned to the datasets via a semi-automatic configuration.
§ Structure discovery:
• ontology
• queries
• sampling
• manual specification/overriding
§ Manual tuning of the ontology and other interaction parameters
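One way to picture structure discovery by sampling followed by manual overriding is the sketch below: it summarises which predicates are used by instances of each class in a sample of triples, then lets a hand-written configuration replace the discovered entries. The data, prefixes and function names are hypothetical; the actual explorer configuration format is not shown in the slides.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def discover_structure(triples):
    """Map each class to the sorted predicates used by its sampled instances."""
    types = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)
    structure = defaultdict(set)
    for s, p, o in triples:
        if p != RDF_TYPE:
            # Attribute the predicate to every class the subject belongs to.
            for cls in types.get(s, ()):
                structure[cls].add(p)
    return {cls: sorted(preds) for cls, preds in structure.items()}


def apply_overrides(structure, overrides):
    """Manual specification wins over what was discovered automatically."""
    merged = dict(structure)
    merged.update(overrides)
    return merged


# A small sample, as might be returned by a LIMIT-ed query against an endpoint.
sample = [
    ("ex:a1", "rdf:type", "ex:Assay"),
    ("ex:a1", "ex:target", "ex:p1"),
    ("ex:a1", "rdfs:label", "A1"),
    ("ex:p1", "rdf:type", "ex:Protein"),
    ("ex:p1", "rdfs:label", "P1"),
]
discovered = discover_structure(sample)
configured = apply_overrides(discovered, {"ex:Protein": ["rdfs:label", "ex:mnemonic"]})
```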
24. Data overview
Semantic Web in research: visualization and interaction
25. Interaction: query builder + suggest
Semantic Web in research: visualization and interaction
26. Interaction: path suggestions
Semantic Web in research: visualization and interaction
Assisted query formulation
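The slides do not say how path suggestions are computed. One plausible approach, sketched here purely as an assumption, is a breadth-first search over a class-level schema graph that proposes property paths between two classes, shortest first. The schema and all names are hypothetical.

```python
from collections import deque


def suggest_paths(schema, start, goal, max_hops=3):
    """Return paths [class, property, class, ...] from start to goal, shortest first."""
    paths = []
    queue = deque([(start, [start])])
    while queue:
        node, path = queue.popleft()
        if node == goal and len(path) > 1:
            paths.append(path)
            continue
        # Each hop adds two entries (property, class) to the path.
        if (len(path) - 1) // 2 >= max_hops:
            continue
        for prop, target in schema.get(node, []):
            if target not in path[::2]:  # do not revisit a class on this path
                queue.append((target, path + [prop, target]))
    return paths


# Hypothetical schema: class -> list of (property, target class).
schema = {
    "Assay": [("target", "Protein"), ("measures", "Compound")],
    "Compound": [("bindsTo", "Protein")],
    "Protein": [("encodedBy", "Gene")],
}
all_paths = suggest_paths(schema, "Assay", "Gene")
short_paths = suggest_paths(schema, "Assay", "Gene", max_hops=2)
```

Each suggested path could then be turned into a chain of triple patterns in the query builder.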
27. Visualization and graph navigation
Semantic Web in research: visualization and interaction
Detail, Augmentation, Filtering, query re-formulation
28. Exploration, layouts, graphic clues
Semantic Web in research: visualization and interaction
Detail, Augmentation, Filtering, query re-formulation
29. Multiple exports, sharing
Semantic Web in research: visualization and interaction
§ “Queries” can be saved and shared as files or links
§ Query history
§ Download of partial or total datasets
30. Semantic Web @Novartis
Topics
§ Semantic Web @Novartis
• Context (Where in Novartis)
• Semantic Web in production
• Semantic Web in research
- Query federation
- Visualization/interaction
- Other projects
• Semantic Web under the hood
§ Semantic Web in “Real Life”: open questions
31. Example: provision of “phenotype ontologies”
Semantic Web in Research: other projects
<owl:Class rdf:about="http://purl.obolibrary.org/obo/HP_0001636">
  <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Tetralogy of Fallot</rdfs:label>
  <owl:equivalentClass>
    <owl:Restriction>
      <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/BFO_0000051"/>
      <owl:someValuesFrom>
        <owl:Class>
          <owl:intersectionOf rdf:parseType="Collection">
            <rdf:Description rdf:about="http://purl.obolibrary.org/obo/PATO_0000001"/>
            <owl:Restriction>
              <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/BFO_0000051"/>
              <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/HP_0001629"/>
            </owl:Restriction>
            <owl:Restriction>
              <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/BFO_0000051"/>
              <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/HP_0001642"/>
            </owl:Restriction>
            …
What systems can understand:
HP_0001636 hasPart HP_0001629
32. Example: provision of “phenotype ontologies”
Semantic Web in Research: other projects
(Same OWL definition of HP_0001636, “Tetralogy of Fallot”, as on slide 31.)
What systems can understand:
HP_0001636 hasPart HP_0001629
Imports closure
Classification
Extraction
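To illustrate only the last step (extraction), the following sketch reads a closed-off version of the OWL fragment and flattens its nested has-part restrictions (BFO_0000051) into simple triples of the kind shown above. This is a naive structural shortcut written for illustration, not the pipeline used in the talk: it performs no imports closure or classification, which would require an OWL reasoner, and the parts elided in the slide are left out rather than reconstructed.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
OBO = "http://purl.obolibrary.org/obo/"
HAS_PART = OBO + "BFO_0000051"  # BFO "has part"

# Simplified, well-formed version of the slide's fragment (assumption: tags closed by hand).
OWL_DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:owl="{OWL}">
  <owl:Class rdf:about="{OBO}HP_0001636">
    <owl:equivalentClass>
      <owl:Restriction>
        <owl:onProperty rdf:resource="{HAS_PART}"/>
        <owl:someValuesFrom>
          <owl:Class>
            <owl:intersectionOf rdf:parseType="Collection">
              <rdf:Description rdf:about="{OBO}PATO_0000001"/>
              <owl:Restriction>
                <owl:onProperty rdf:resource="{HAS_PART}"/>
                <owl:someValuesFrom rdf:resource="{OBO}HP_0001629"/>
              </owl:Restriction>
              <owl:Restriction>
                <owl:onProperty rdf:resource="{HAS_PART}"/>
                <owl:someValuesFrom rdf:resource="{OBO}HP_0001642"/>
              </owl:Restriction>
            </owl:intersectionOf>
          </owl:Class>
        </owl:someValuesFrom>
      </owl:Restriction>
    </owl:equivalentClass>
  </owl:Class>
</rdf:RDF>"""


def extract_has_part(owl_xml):
    """Flatten nested has-part restrictions into (class, 'hasPart', filler) triples."""
    root = ET.fromstring(owl_xml)
    triples = []
    for cls in root.findall(f"{{{OWL}}}Class"):  # named top-level classes only
        subject = cls.get(f"{{{RDF}}}about")
        if subject is None:
            continue
        for restriction in cls.iter(f"{{{OWL}}}Restriction"):
            prop = restriction.find(f"{{{OWL}}}onProperty")
            filler = restriction.find(f"{{{OWL}}}someValuesFrom")
            if prop is None or filler is None:
                continue
            target = filler.get(f"{{{RDF}}}resource")
            # Keep only restrictions whose filler is a named class.
            if prop.get(f"{{{RDF}}}resource") == HAS_PART and target:
                triples.append((subject, "hasPart", target))
    return triples


triples = extract_has_part(OWL_DOC)
```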
33. Semantic Web @Novartis
Topics
§ Semantic Web @Novartis
• Context (Where in Novartis)
• Semantic Web in production
• Semantic Web in research
- Query federation
- Visualization/interaction
- Other projects
• Semantic Web under the hood
§ Semantic Web in “Real Life”: open questions
34. CTMF: Collaborative Terminology Management
Semantic web under the hood: CTMF
§ The CTMF is a system designed to allow distributed “editing of ontologies”.
§ Users can request new “terms” via a web interface or within an application.
§ “Content owners” can “assess” whether the requested terms are new concepts or synonyms (or errors!) and update the ontologies.
§ Resolution is asynchronous and the term request is non-blocking for applications
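The non-blocking request and asynchronous resolution described above can be sketched as a small in-memory registry: a request immediately returns a temporary identifier that an application can store, and a content owner later maps it to an existing concept, a new concept, or rejects it. Class, method and URI names are hypothetical; the real CTMF interface is not shown in the slides.

```python
import itertools


class TermRequests:
    """Toy model of a term-request workflow with temporary identifiers."""

    def __init__(self, base="http://example.org/ctmf/request/"):
        self.base = base
        self._counter = itertools.count(1)
        self.pending = {}   # temporary URI -> requested label
        self.resolved = {}  # temporary URI -> concept URI (None if rejected)
        self.labels = {}    # concept URI -> labels accepted for it

    def request(self, label):
        """Register a request and return a temporary URI right away (non-blocking)."""
        uri = f"{self.base}{next(self._counter)}"
        self.pending[uri] = label
        return uri

    def resolve(self, temp_uri, concept_uri):
        """Content owner decision: the request denotes this (existing or new) concept."""
        label = self.pending.pop(temp_uri)
        self.resolved[temp_uri] = concept_uri
        self.labels.setdefault(concept_uri, []).append(label)

    def reject(self, temp_uri):
        """Content owner decision: the request was an error."""
        self.pending.pop(temp_uri)
        self.resolved[temp_uri] = None

    def current_id(self, uri):
        """Identifier to use now: the resolved concept, or the temporary URI while pending."""
        return self.resolved.get(uri, uri)


requests = TermRequests()
tmp = requests.request("heart attack")
annotation = {"sample": "S1", "indication": tmp}  # the application carries on with the temporary ID
requests.resolve(tmp, "http://example.org/concept/myocardial_infarction")
final = requests.current_id(annotation["indication"])
```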
35. CTMF web application (new request form)
Semantic web under the hood: CTMF
36. CTMF: integration in applications
Semantic web under the hood: CTMF
37. CTMF: term status page and discussion
Semantic web under the hood: CTMF
38. CTMF: process (use of temporary ID)
Semantic web under the hood: CTMF
39. Under the hood
Semantic web under the hood: CTMF
§ Basic principle of the Semantic Web: identity comes first.
• What “people can talk about” is given a URI, and information is built around it.
§ The CTMF adopts the same approach:
• A “term” request in itself identifies a concept: what the requestor had in mind at the time of the request. We give this idea a URI (the term status page).
• Information is built around this request (clarification).
• A “content owner” can assess whether the concept is identical to something already in Metastore (most likely, what was requested was a synonym), or whether a new concept should be introduced.
40. Semantic Web @Novartis
Topics
§ Semantic Web @Novartis
• Context (Where in Novartis)
• Semantic Web in production
• Semantic Web in research
• Semantic Web under the hood
§ Semantic Web in “Real Life”: open questions
41. Semantic Web @Novartis
Topics
§ Semantic Web @Novartis
• Context (Where in Novartis)
• Semantic Web in production
• Semantic Web in research
- Query federation
- Visualization/interaction
- Other projects
• Semantic Web under the hood
§ Semantic Web in “Real Life”: open questions
42. Semantic Web in Real Life: Open questions
Data trumps everything
§ If there is a choice between better technology to access data, and better data, the latter prevails.
• Corollary: interest is often where there is little data, especially in the public domain.
43. Semantic Web in Real Life: Open questions
Industry (or real life) is big
§ Areas that look nearby on paper may be very distant organization-wise.
• Bench-to-bedside data integration
44. Semantic Web in Real Life: Open questions
You don’t know the semantics of your data
§ The semantic expressiveness of RDF may be too much for what is represented in your data.
• You don’t always make your data
45. Semantic Web in Real Life: Open questions
Is data integration really a shared goal?
§ Not all stakeholders have an interest in “opening” their data.
• When does a data producer gain from making its data more accessible?
46. Semantic Web in Real Life: Open questions
Many people are doing SemWeb without knowing it
§ “My project is not based on RDF, it is based on a graph with properties from controlled vocabularies.”
• Why not RDF?
- Too academic
- Need something that works
- URIs are too long
47. Semantic Web technologies: experiences in Novartis
§ Therese Vachon
§ Pierre Parisot
§ Katia Vella
§ Frederic Sutter
§ Daniel Cronenberger
§ Fatma Oezdemir-Zaech
§ Anosha Siripala
§ Olivier Kreim
§ Gilles Hubert
§ Laurentiu Stanculescu
§ Marc Lieber
§ Martin Rezk (OnTop)
§ Andrea Splendiani