1. ©2015 Cambridge Semantics Inc. All rights reserved.
Enterprise Analytics at Scale Using
Graph Database
Cambridge Semantics Contacts:
Marty Loughlin
Vice President, Financial Services
marty@cambridgesemantics.com
(o) 617.855.9565
Barry Zane
Vice President, Engineering
barry@cambridgesemantics.com
2. ©2016 Cambridge Semantics Inc. All rights reserved. Page 2
Introduction to Cambridge Semantics Inc (CSI)
The Anzo Smart Data Platform is used to create data analytics and
management solutions with diverse data from varied sources
Company:
Founded in 2007 by a senior team from IBM’s Advanced Internet Technology Group
Privately Funded
Select customers:
Software:
Market-leading Anzo software suite is built on open Semantic Web standards
Currently 3rd generation of the product in production use
Business Intelligence / Analytics Solutions: 2013 (Winner), 2014 (Finalist), 2015 (Finalist)
2014 Innovation Showcase
3.
The State of the Data Lake
• Great way to rapidly and inexpensively assemble large volumes of unfiltered data
• However, challenging to identify and link data
• Getting value requires harmonization of meaning across diverse sources and making it
accessible to business users
• And, you also need good data governance, quality, lineage and security
Leading organizations are looking to Semantic Models and
Tools to address these challenges
Source: 2015 EDM Council Benchmarking Study
4.
The Anzo Smart Data Platform
• An agile, end-to-end platform for tackling
diverse information challenges
• Link and contextualize information for search,
analytics, visualization and collaboration
5.
On Tuesday, Drugs123 Inc. announced phase 1 development of their newest sleep aid therapeutic, Narcoleptol.
Linking and Contextualizing Information
Company    Website        Mkt Cap
Bio Corp   biocorp.com    $2.2B
Drugs123   drugs123.com   $930M
…          …              …
Competitive Intelligence database
[Figure: graph linking the three sources]
Competitive Intelligence database: Company with name "Drugs123", market cap 930,000,000, website drugs123.com
Web news: Drug Development activity with development stage 1, developing a Drug with indication Insomnia and brand name Narcoleptol
CRM System: Note (about) with when 3/7/2012 and note "Initial safety signals are …"
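The linking idea on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not Anzo's data model or API: the identifiers (`company:Drugs123`, `activity:1`, `note:1`), the predicate spellings, and the helper functions are all hypothetical, and the assumption that the CRM note is "about" the drug is a guess from the slide layout. Each source contributes (subject, predicate, object) triples, and shared identifiers are what connect them.

```python
# Hypothetical triples, one list per source named on the slide.
competitive_intel = [
    ("company:Drugs123", "name", "Drugs123"),
    ("company:Drugs123", "marketCap", 930_000_000),
    ("company:Drugs123", "website", "drugs123.com"),
]
web_news = [
    ("company:Drugs123", "activity", "activity:1"),
    ("activity:1", "type", "DrugDevelopment"),
    ("activity:1", "developmentStage", 1),
    ("activity:1", "developing", "drug:Narcoleptol"),
    ("drug:Narcoleptol", "brandName", "Narcoleptol"),
    ("drug:Narcoleptol", "indication", "Insomnia"),
]
crm = [
    # Assumption: the note is about the drug; the slide does not make this explicit.
    ("note:1", "about", "drug:Narcoleptol"),
    ("note:1", "when", "3/7/2012"),
    ("note:1", "note", "Initial safety signals are …"),
]

# Linking is just putting the triples in one graph; shared IDs do the rest.
graph = competitive_intel + web_news + crm


def objects(graph, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def subjects(graph, predicate, obj):
    """Return every subject s such that (s, predicate, obj) is in the graph."""
    return [s for s, p, o in graph if p == predicate and o == obj]


def notes_for_company(graph, company):
    """Walk company -> activity -> drug -> note, crossing all three sources."""
    results = []
    for activity in objects(graph, company, "activity"):
        for drug in objects(graph, activity, "developing"):
            for note in subjects(graph, "about", drug):
                when = objects(graph, note, "when")[0]
                text = objects(graph, note, "note")[0]
                results.append((drug, when, text))
    return results


if __name__ == "__main__":
    for row in notes_for_company(graph, "company:Drugs123"):
        print(row)
```

The query starts from a company in the competitive intelligence data and ends at a CRM note, which none of the three sources could answer alone.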
7.
• Business understandable models describe
data and transformations
• Searchable Catalog of Data Sources, Maps
& Metadata
• Query model for data lineage, impact
analysis, data quality
Anzo Smart Data Lake
Anzo Smart Data Integration Server
Anzo Enterprise Server
• Standardized reports and self-service data
discovery for diverse use cases
• Data curation, annotation and application
workflow
Anzo Graph Query Engine
• Load, transform and harmonize diverse
internal and external data sources
• Link to business meaning (e.g., FIBO)
Data Store
Third party BI/Analytics
Data Providers
Structured Sources
Unstructured Sources
8.
Graph Query Engine
• Designed for “big crunch” analytic queries
• Big data
• Complex queries in interactive time
• Not the “data of record”
9.
Cloud Deployment
• Runs on standard cloud configs
– Robust servers & network
• On-premises for sensitive data
– Ordinary rack-mount server farm
– Ethernet or InfiniBand
10.
Single Database Instance Across Many Nodes
• Behaves just like a single-node database, but faster
• More speed and more data by clustering
• Parallel, not distributed computing
• Just as easy to install 20 nodes as 2 nodes
• Management via SPARQL
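One way to picture "a single database across many nodes" is sketched below. It is not a description of how the Graph Query Engine is built: partitioning by a hash of the subject, the use of threads to stand in for nodes, and the function names are all assumptions made for illustration. The caller sees one function and one result; only the `num_nodes` argument changes between a small and a large cluster.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(triples, num_nodes):
    """Assign each triple to a shard by a stable hash of its subject (an assumption)."""
    shards = [[] for _ in range(num_nodes)]
    for triple in triples:
        index = zlib.crc32(triple[0].encode("utf-8")) % num_nodes
        shards[index].append(triple)
    return shards


def count_predicates(shard):
    """Work done independently on one node: count predicates in its own shard."""
    return Counter(predicate for _, predicate, _ in shard)


def cluster_count(triples, num_nodes):
    """Run the same count on every shard in parallel, then merge the partial results."""
    shards = partition(triples, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(count_predicates, shards))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


if __name__ == "__main__":
    triples = [
        ("company:Drugs123", "name", "Drugs123"),
        ("company:Drugs123", "website", "drugs123.com"),
        ("company:BioCorp", "name", "Bio Corp"),
        ("company:BioCorp", "website", "biocorp.com"),
    ]
    # Same call and same answer whether the "cluster" has 2 nodes or 20.
    print(cluster_count(triples, 2))
    print(cluster_count(triples, 20))
```

The answer does not depend on the node count, which is the sense in which the cluster behaves like a single-node database.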
11.
Data Lake subsets
• The lake is the “database”
• Multiple GCE instances
• Short term instances
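The pattern of loading a subset of the lake into a short-lived instance might look like the sketch below. The lake layout (a dictionary of named datasets), the `short_term_instance` helper, and the dataset names are hypothetical; the slides do not say how subsets are selected or loaded.

```python
from contextlib import contextmanager

# Hypothetical lake: named datasets of (subject, predicate, object) triples.
LAKE = {
    "competitive_intel": [("company:Drugs123", "website", "drugs123.com")],
    "web_news": [("company:Drugs123", "activity", "activity:1")],
    "crm": [("note:1", "when", "3/7/2012")],
}


@contextmanager
def short_term_instance(lake, dataset_names):
    """Load only the named datasets into an in-memory instance, then discard it."""
    instance = [triple for name in dataset_names for triple in lake[name]]
    try:
        yield instance
    finally:
        # The instance is disposable; the lake remains the store of record.
        instance.clear()


if __name__ == "__main__":
    # Two independent instances, each over a different subset of the same lake.
    with short_term_instance(LAKE, ["competitive_intel", "web_news"]) as a:
        print(len(a), "triples loaded for analysis A")
    with short_term_instance(LAKE, ["crm"]) as b:
        print(len(b), "triples loaded for analysis B")
```

Because each instance copies only what it needs and is thrown away afterwards, the lake itself is never modified.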
12.
Best Practices
• All nodes the same high-quality servers
• Fast private interconnect
• High speed access to data sources