Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Triples

© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Stephen Buxton, Senior Director, Product Management, MarkLogic
When to Use Documents vs Triples

[Figure: NoSQL database families: Key-Value, Column, Document, and Graph (Property Graphs, Triple Stores)]
A Database That Integrates Data Better, Faster, with Less Cost

Leading Organizations Using MarkLogic Semantics
- Intelligent Search
- Semantic Metadata Hub
- Dynamic Semantic Publishing
- Recommendation Engines
- Compliance
[Figure: Entertainment Company, Pharmaceutical Company]

Relational Databases
PROs
- Natural way to model strictly tabular data
- Mature technology with a rich ecosystem
Example table:
Ph_ID  Cus_ID  Type    Number
4001   2001    Home    555-6789
4002   2001    Cell    555-7238
4003   2002    Home    137-2859
4004   2003    Home    189-2212
4005   2003    Cell    199-2312
4006   2003    Office  444-1898
4007   2003    Main    199-2312
CONs
- Real-world entities require complex up-front modeling
- Brittle: changes require adding columns and tables
- No inherent semantics
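To make the table model and its CONs concrete, here is a minimal sketch using Python's standard sqlite3 module and the phone rows from the example table. The Customer table and the Extension column are hypothetical additions for illustration: reassembling one customer with all of its phones takes a join, and a new attribute cannot be stored until the schema itself is altered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical parent table; the slide only shows the phone table.
cur.execute("CREATE TABLE Customer (Cus_ID INTEGER PRIMARY KEY)")
cur.execute(
    "CREATE TABLE Phone (Ph_ID INTEGER PRIMARY KEY, "
    "Cus_ID INTEGER REFERENCES Customer(Cus_ID), Type TEXT, Number TEXT)"
)
cur.executemany("INSERT INTO Customer VALUES (?)", [(2001,), (2002,), (2003,)])
cur.executemany("INSERT INTO Phone VALUES (?, ?, ?, ?)", [
    (4001, 2001, "Home", "555-6789"),
    (4002, 2001, "Cell", "555-7238"),
    (4003, 2002, "Home", "137-2859"),
    (4004, 2003, "Home", "189-2212"),
    (4005, 2003, "Cell", "199-2312"),
    (4006, 2003, "Office", "444-1898"),
    (4007, 2003, "Main", "199-2312"),
])

def phones_for(cus_id):
    """Reassemble one real-world entity (a customer's phones) with a join."""
    rows = cur.execute(
        "SELECT p.Type, p.Number FROM Customer c JOIN Phone p "
        "ON c.Cus_ID = p.Cus_ID WHERE c.Cus_ID = ? ORDER BY p.Ph_ID",
        (cus_id,),
    )
    return rows.fetchall()

# A new attribute (hypothetical Extension) needs a schema change first.
cur.execute("ALTER TABLE Phone ADD COLUMN Extension TEXT")
cur.execute("UPDATE Phone SET Extension = '12' WHERE Ph_ID = 4006")
```

Every existing row now carries an empty Extension column, which is the kind of up-front, table-wide change the "Brittle" bullet refers to.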

Document Databases
PROs
- Natural way to model entities
- Schema is flexible within/across documents
- Self-describing
- Query and search immediately
- Handles hierarchical data
- Handles repeating elements
- Handles sparse data
- Joins can be denormalized away
{ "ID" : 1001,
  "Fname" : "Paul",
  "Lname" : "Jackson",
  "Phone" : "415-555-1212",
  "SSN" : "123-45-6789",
  "Addr" : "123 Avenue Road",
  "City" : "San Francisco",
  "State" : "CA",
  "Zip" : 94111
}
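A minimal sketch of the document-side PROs, written in plain Python rather than any MarkLogic API: the slide's document is parsed with no schema declared, a second hypothetical document has a repeating, nested Phone field and omits State (sparse data), and a field query works on both immediately. The second document and the helpers find and all_numbers are illustrative assumptions.

```python
import json

# The slide's example document, parsed as-is (no schema declared first).
paul = json.loads("""
{ "ID": 1001, "Fname": "Paul", "Lname": "Jackson", "Phone": "415-555-1212",
  "SSN": "123-45-6789", "Addr": "123 Avenue Road", "City": "San Francisco",
  "State": "CA", "Zip": 94111 }
""")

# Hypothetical second document: repeating, nested Phone values; no State or Lname.
other = {
    "ID": 1002,
    "Fname": "Ana",
    "Phone": [{"Type": "Home", "Number": "555-6789"},
              {"Type": "Cell", "Number": "555-7238"}],
    "City": "San Francisco",
}

docs = [paul, other]

def find(docs, **criteria):
    """Return documents whose top-level fields equal every given value.
    A document that lacks a field simply does not match on it."""
    return [d for d in docs
            if all(k in d and d[k] == v for k, v in criteria.items())]

def all_numbers(doc):
    """Normalize Phone, which may be absent, a string, or a list of objects."""
    phone = doc.get("Phone")
    if phone is None:
        return []
    if isinstance(phone, str):
        return [phone]
    return [p["Number"] for p in phone]
```

The helper all_numbers also shows the other side of schema flexibility: code that reads heterogeneous documents has to cope with each shape it may meet.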

Graph Databases – Triple Stores
PROs
- A triple defines a relationship:
  - Entity->Entity
  - Entity->Concept
  - Concept->Value
- Triples come together to form graphs
- Graphs can be easily shared and combined
- Graphs can be traversed
- New triples can be inferred using definitions (rules)
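These graph PROs can be sketched with triples as plain Python (subject, predicate, object) tuples, reusing node and edge names from the diagrams later in this deck. Combining two graphs is a set union, traversal is a breadth-first walk, and the dependsOn rule is a hypothetical definition used to infer new triples. Real triple stores express such rules with standards such as RDFS/OWL or SPARQL, not in this form.

```python
from collections import deque

# Two graphs as sets of (subject, predicate, object) triples.
apps = {
    ("App1", "runsOn", "Cluster1"),
    ("App1", "accesses", "Database1"),
    ("App1", "requires", "TopSecret"),
}
people = {
    ("User1", "role", "Compliance Officer"),
    ("User1", "basedIn", "Geneva"),
    ("User1", "runs", "App1"),
}

# Combining graphs is a set union: no shared table layout is needed.
graph = apps | people

# Hypothetical rule premise: these predicates imply a dependency.
DEPENDENCY_PREDICATES = {"runsOn", "accesses"}

def infer_depends_on(triples):
    """Hypothetical rule: (x runsOn y) or (x accesses y) implies (x dependsOn y).
    Returns the input triples plus every triple the rule derives."""
    derived = {(s, "dependsOn", o) for s, p, o in triples
               if p in DEPENDENCY_PREDICATES}
    return set(triples) | derived

def reachable(triples, start):
    """Breadth-first traversal following edges in their stated direction."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for s, _, o in triples:
            if s == node and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen - {start}
```

For example, reachable(graph, "User1") walks from User1 through App1 to Cluster1, Database1 and TopSecret, even though those facts came from the other graph.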

Hybrid Documents + Triple Store
PROs
- PROs of a Document Store
- PROs of a Triple Store
- Combination: Documents with semantic context
  - Define the semantics of your data
  - Richer search through context and facts
- Combination: Triples with document context
  - Arbitrary annotation of triples
  - Metadata, provenance, temporal, etc.
- Rich queries over rich data
- Fast, iterative development
- Query through a SQL lens where appropriate
{ "ID" : 1001,
  "Fname" : "Paul",
  "Lname" : "Jackson",
  "Phone" : "415-555-1212",
  "SSN" : "123-45-6789",
  "Addr" : "123 Avenue Road",
  "City" : "San Francisco",
  "State" : "CA",
  "Zip" : 94111
}

Sidetrack – Documents and Data
[Figure: two document trees: an Article (Title, Date, Abstract, Body, Sections, Paragraphs) and a Trade (Type, Date, Parties, Seller, Buyer, Channel, Amount, PaidBy, Affiliation, Name)]

Triples Alongside Documents
[Figure: graph of triples with the nodes User1, Senior Manager, Geneva, Compliance Officer, High risk person, App1, Cluster1, TopSecret and Database1, linked by the edges rank, basedIn, role, runs, runsOn, requires and accesses]
- Show me documents that mention App1 (or its dependencies)
  - … and "trades" or "markets"
  - … that were valid yesterday afternoon
  - … that were produced near HQ
  - see Intelligent Search, Infobox
- Show me instructions to access App1
  - App1 user guide
  - How to get TopSecret access
  - Scope of Database1
  - see Dynamic Semantic Publishing
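The first query on this slide can be sketched in plain Python: the graph expands App1 into its dependencies, and a word search over the documents then uses the expanded set of names. The document texts, and the choice of runsOn and accesses as the "dependency" edges, are hypothetical assumptions; MarkLogic's own search and SPARQL APIs are not shown here.

```python
import re

triples = {
    ("App1", "runsOn", "Cluster1"),
    ("App1", "accesses", "Database1"),
    ("App1", "requires", "TopSecret"),
}

# Hypothetical document collection: id -> text.
docs = {
    "d1": "Cluster1 maintenance window affects overnight trades.",
    "d2": "Database1 schema notes.",
    "d3": "App1 dashboards for emerging markets.",
    "d4": "Quarterly markets outlook.",
}

# Assumption: only these edges count as "dependencies".
DEPENDENCY_PREDICATES = {"runsOn", "accesses"}

def expand(entity):
    """The entity plus the direct dependencies the graph records for it."""
    return {entity} | {o for s, p, o in triples
                       if s == entity and p in DEPENDENCY_PREDICATES}

def mentions(text, term):
    """Case-insensitive whole-word match."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None

def search(entity, any_of):
    """Documents that mention the entity (or a dependency) AND any of the words."""
    names = expand(entity)
    return sorted(doc_id for doc_id, text in docs.items()
                  if any(mentions(text, n) for n in names)
                  and any(mentions(text, w) for w in any_of))
```

Here search("App1", ["trades", "markets"]) finds d1 only because the graph knows that App1 runs on Cluster1; a plain keyword search for "App1" would miss it.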

Documents as Part of the Graph
[Figure: the same graph, with document nodes (deep dive, license, user guide, tutorial, movie, order) attached to it]
- Document as opaque object
  - Show me all the instructional documents related to App1
- Search inside the document
  - Show me all the applications that managers use that expire in the next 6 months
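The second example on this slide combines both ideas: walk the graph to find what managers use, then look inside each app's license document for an expiry date. This is a minimal plain-Python sketch. The extra users and apps, the license document contents, reading "use" as the runs edge, and approximating six months as 180 days are all hypothetical assumptions.

```python
from datetime import date, timedelta

triples = {
    ("User1", "rank", "Senior Manager"),
    ("User1", "runs", "App1"),
    ("User1", "runs", "App3"),             # hypothetical
    ("User2", "rank", "Analyst"),          # hypothetical
    ("User2", "runs", "App2"),             # hypothetical
    ("App1", "license", "doc:lic-app1"),   # document ids are hypothetical
    ("App2", "license", "doc:lic-app2"),
    ("App3", "license", "doc:lic-app3"),
}

# Documents are graph nodes; their content is read only when the query needs it.
documents = {
    "doc:lic-app1": {"vendor": "Acme", "expires": "2016-09-30"},
    "doc:lic-app2": {"vendor": "Acme", "expires": "2016-08-01"},
    "doc:lic-app3": {"vendor": "Other", "expires": "2018-01-01"},
}

def objects(subject, predicate):
    return {o for s, p, o in triples if s == subject and p == predicate}

def apps_used_by_managers_expiring(today, months=6):
    horizon = today + timedelta(days=30 * months)  # assumption: 30-day months
    managers = {s for s, p, o in triples if p == "rank" and "Manager" in o}
    apps = {app for m in managers for app in objects(m, "runs")}
    result = set()
    for app in apps:
        for doc_id in objects(app, "license"):
            # Search inside the document: its expiry date is content, not a triple.
            expires = date.fromisoformat(documents[doc_id]["expires"])
            if today <= expires <= horizon:
                result.add(app)
    return sorted(result)
```

With today = date(2016, 6, 1) only App1 qualifies: App2 also expires soon, but its user is not a manager.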

Triples About Documents – Extended Metadata
[Figure: the same graph; the order document now has metadata triples: format JSON, language English, jurisdiction Delaware, expires 2016-12-31, and a Ts and Cs node]
- Triples are a natural way to represent metadata about documents
- The metadata is "extended" because it is part of the graph
- Example: show me all orders for a TopSecret app that will expire soon
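The example query joins facts about the app with metadata about the order document, which is what a SPARQL basic graph pattern plus a FILTER does. The sketch below is a toy plain-Python matcher (nested-loop join over triple patterns with ?variables), not a SPARQL engine. The order edge from app to document follows the diagram label, while the document ids and the second order are hypothetical.

```python
from datetime import date

triples = [
    ("App1", "requires", "TopSecret"),
    ("App1", "order", "doc:order-17"),          # document ids are hypothetical
    ("doc:order-17", "format", "JSON"),
    ("doc:order-17", "language", "English"),
    ("doc:order-17", "jurisdiction", "Delaware"),
    ("doc:order-17", "expires", "2016-12-31"),
    ("App2", "order", "doc:order-18"),          # hypothetical, not TopSecret
    ("doc:order-18", "expires", "2016-11-30"),
]

def is_var(term):
    return term.startswith("?")

def match(patterns, binding=None):
    """Yield every variable binding that satisfies all (s, p, o) patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        b = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if is_var(term):
                # Bind a fresh variable, or check it agrees with an earlier binding.
                if b.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, b)

def orders_expiring_soon(today, cutoff):
    rows = match([("?app", "requires", "TopSecret"),
                  ("?app", "order", "?doc"),
                  ("?doc", "expires", "?date")])
    # The FILTER step: keep only orders that expire between today and cutoff.
    return sorted(r["?doc"] for r in rows
                  if today <= date.fromisoformat(r["?date"]) <= cutoff)
```

orders_expiring_soon(date(2016, 12, 1), date(2017, 1, 31)) returns the order for App1 only; the order for App2 also expires in that window but App2 does not require TopSecret.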

Triples About Documents
- Data integration: dirty data
  - Show me license documents from vendor Acme
- Data integration: overlapping data
  - Show me all assets from vendor Acme
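One way to read these two integration examples, sketched in plain Python under stated assumptions: documents from different sources spell the vendor differently (dirty data) and store it under different field names (overlapping data), and sameAs-style triples map every spelling to one canonical vendor entity. All documents, spellings, the vendor: identifiers and the fields parameter are hypothetical.

```python
# Hypothetical documents from different sources, each naming the vendor its own way.
documents = {
    "lic-1": {"type": "license", "vendor": "Acme"},
    "lic-2": {"type": "license", "vendor": "ACME Corp."},
    "lic-3": {"type": "license", "vendor": "Globex"},
    "ord-1": {"type": "order", "vendor": "Acme Inc"},
    "hw-9": {"type": "server", "supplier": "acme"},
}

# sameAs-style triples map each raw name to one canonical vendor entity.
triples = {
    ("Acme", "sameAs", "vendor:Acme"),
    ("ACME Corp.", "sameAs", "vendor:Acme"),
    ("Acme Inc", "sameAs", "vendor:Acme"),
    ("acme", "sameAs", "vendor:Acme"),
    ("Globex", "sameAs", "vendor:Globex"),
}

def canonical(name):
    """Resolve a raw name through sameAs triples; unknown names stay as-is."""
    for s, p, o in triples:
        if s == name and p == "sameAs":
            return o
    return name

def from_vendor(vendor, doc_type=None, fields=("vendor", "supplier")):
    """Documents whose vendor-like field resolves to the canonical vendor."""
    hits = []
    for doc_id, doc in documents.items():
        if doc_type is not None and doc.get("type") != doc_type:
            continue
        if any(canonical(doc[f]) == vendor for f in fields if f in doc):
            hits.append(doc_id)
    return sorted(hits)
```

from_vendor("vendor:Acme", "license") answers the dirty-data question, and from_vendor("vendor:Acme") with no type answers the overlapping-data one. The documents themselves are never rewritten; the cleanup lives in the triples.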

Triples as Part of a Document
- Embed triples in a document
- Triples and document share the same security, transactions, backup, temporality, …
- Annotate triples in an entirely generic way (XML or JSON):
  - Provenance
  - Confidence
  - Bitemporal
- Query across triples and documents in the same query:
  - SPARQL, restricting results to some source, confidence range or bitemporal range
  - Search, restricting results to documents that contain some facts or metadata
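A minimal plain-Python sketch of embedded, annotated triples: each hypothetical document carries its own triples with provenance (source) and confidence fields, so a fact query can be restricted by those annotations, and a document query can be restricted to documents that contain a given fact. The document layout and field names are assumptions, not MarkLogic's embedded-triple format, and bitemporal annotation is left out for brevity.

```python
# Hypothetical documents, each embedding annotated triples alongside its content.
docs = {
    "news-1": {
        "text": "App1 now runs on Cluster1.",
        "triples": [
            {"s": "App1", "p": "runsOn", "o": "Cluster1",
             "source": "newsfeed", "confidence": 0.6},
        ],
    },
    "cmdb-7": {
        "text": "Configuration record for App1.",
        "triples": [
            {"s": "App1", "p": "runsOn", "o": "Cluster1",
             "source": "cmdb", "confidence": 0.95},
            {"s": "App1", "p": "accesses", "o": "Database1",
             "source": "cmdb", "confidence": 0.9},
        ],
    },
}

def facts(min_confidence=0.0, source=None):
    """Fact query restricted by annotations: a confidence floor and, optionally, a source."""
    out = set()
    for doc in docs.values():
        for t in doc["triples"]:
            if t["confidence"] >= min_confidence and source in (None, t["source"]):
                out.add((t["s"], t["p"], t["o"]))
    return out

def docs_with_fact(s, p, o, word=None):
    """Document search restricted to documents that contain a given fact."""
    hits = []
    for doc_id, doc in docs.items():
        has_fact = any((t["s"], t["p"], t["o"]) == (s, p, o) for t in doc["triples"])
        if has_fact and (word is None or word.lower() in doc["text"].lower()):
            hits.append(doc_id)
    return sorted(hits)
```

Because the triples live inside the document, deleting or securing a document removes or secures its facts in the same step, which is the point of the second bullet.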

Use Triples when you want to …
- Store and query hundreds of billions of facts and relationships
- Explore a graph
- Visualize a graph
- Leverage standards: data + query
- Infer new information
  - better insights
  - simpler data modeling
- Semantics of data
  - integration

Use Documents when you want to …
- Easily store heterogeneous data (transactional data, records, free text)
- Schema-agnostic
  - modeling freedom
  - integrate without ETL*
- Search flexibility and specificity
- Fast app development

Document Store and Triple Store Combined
All the benefits of each, plus:
- Docs can contain triples, triples can annotate docs, graphs can contain docs
  - Faster data integration, using semantics as the glue
  - Ideal model for reference data, metadata, provenance
  - Ability to run powerful queries across documents and triples together
- Massive speed and scale
- Simplicity of a single unified platform
- Enterprise features (security, HA/DR, ACID transactions, …)
