1. Don’t Choose One Database Choose Them All!
Dave da Silva
Data Scientist, Capgemini UK
May 2017, Version 1.0
2. Capgemini & Our Challenge
Copyright © Capgemini 2017. All Rights Reserved
Don’t Choose One Database Choose Them All! – Version 1.0 | May 2017
• Big Data Discovery Service
• Insights in a Box
• Business Data Lake
• Assurance Scoring Service
• Insight Driven Operations
• Data Optimisation
• Data Warp
13,000 I&D practitioners globally, ~1,000 in the UK
• Embed insight at the heart of the business
• Large clients have multiple use cases
• No one size fits all
How do we enable business transformation?
• Bring all of their data together
3. Types of Database Technology – A Data Scientist View!
In-memory – fast ad-hoc querying and investigation
Graph – finding relationships between entities
Hadoop – accessing massive datasets
SQL – large complex queries
Lucene Based – complicated free text data discovery and retrieval
Database technologies through the eyes of a Data Scientist
4. Challenges of Using Wrong Database for Wrong Use Case
• Graph queries can take many lines of SQL and are slow to run
• Running free-text queries on SQL databases is often complicated and, again, can be slow
• NoSQL databases are often good at finding data by key but cannot provide the multi-field querying of an SQL database
• In-memory databases are fast so long as the data fits in memory! What about large datasets?
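To make the first point concrete, here is a minimal sketch of a multi-hop relationship query written in SQL, run through Python's built-in sqlite3 module. The `links` table, its columns and the sample rows are hypothetical and exist only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical edge table: each row says "src is linked to dst".
conn.execute("CREATE TABLE links (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO links VALUES (?, ?)",
    [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")],
)

# A recursive common table expression walks the links hop by hop.
# UNION (not UNION ALL) drops duplicates, so cycles cannot loop forever.
REACHABLE_SQL = """
WITH RECURSIVE reachable(person) AS (
    SELECT dst FROM links WHERE src = :start
    UNION
    SELECT links.dst FROM links JOIN reachable ON links.src = reachable.person
)
SELECT person FROM reachable ORDER BY person
"""

def reachable_from(start):
    """Return everyone reachable from `start` by following links forwards."""
    rows = conn.execute(REACHABLE_SQL, {"start": start}).fetchall()
    return [row[0] for row in rows]

print(reachable_from("A"))
```

Even this simple traversal needs a recursive construct in SQL, whereas graph query languages usually express it as a single path pattern.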
5. Why Not Just Use Multiple Databases?
Data Science Technologies
Database APIs
Data Layer
• Decouple the data and analytics layers for analytics tool flexibility.
• Most data science languages have good API support.
• Push as much data processing as possible down into the database.
• Have several slave databases.
Master database, with NoSQL, Graph and Lucene versions
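The advice to push processing down into the database can be sketched as follows, using sqlite3 as a stand-in for whichever database sits in the data layer. The `applications` table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table used only to contrast the two approaches.
conn.execute("CREATE TABLE applications (region TEXT, income REAL)")
conn.executemany(
    "INSERT INTO applications VALUES (?, ?)",
    [("North", 30000), ("North", 50000), ("South", 20000)],
)

def average_income_client_side():
    """Pull every row into the analytics layer, then aggregate in Python."""
    totals = {}
    for region, income in conn.execute("SELECT region, income FROM applications"):
        total, count = totals.get(region, (0.0, 0))
        totals[region] = (total + income, count + 1)
    return {region: total / count for region, (total, count) in totals.items()}

def average_income_pushed_down():
    """Let the database aggregate, so only one row per region is transferred."""
    rows = conn.execute(
        "SELECT region, AVG(income) FROM applications GROUP BY region"
    )
    return dict(rows.fetchall())

print(average_income_pushed_down())
```

Both functions give the same answer; the second moves far less data between the layers, which matters once the tables are large.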
6. A Car Insurance Fraud Example
SQL: Complex Joins
Credit Score | Bad Apps @ Address | Ave Annual Income
90 | 0 | 30,000
60 | 0 | 45,000
67 | 0 | 20,000
84 | 2 | 60,000
34 | 5 | 10,000
• Fast joins of multiple large tables
• Complex WHERE conditions on those joins
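As an illustration of a join with conditions, the sketch below derives a "bad applications at address" count with sqlite3. The schema, the address identifiers and the past-application rows are all hypothetical; they are chosen only so that the counts for Janet (2) and Jim Bob (5) match the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and rows, not taken from the original presentation.
conn.executescript("""
CREATE TABLE applicants (name TEXT, address_id INTEGER, credit_score INTEGER);
CREATE TABLE past_applications (address_id INTEGER, was_bad INTEGER);
INSERT INTO applicants VALUES ('Jon', 1, 90), ('Janet', 2, 84), ('Jim Bob', 3, 34);
INSERT INTO past_applications VALUES
    (1, 0), (2, 1), (2, 1), (2, 0),
    (3, 1), (3, 1), (3, 1), (3, 1), (3, 1);
""")

# Join applicants to earlier applications from the same address, filter on
# credit score, and keep only applicants with at least one bad application.
RISKY_SQL = """
SELECT a.name, a.credit_score, COALESCE(SUM(p.was_bad), 0) AS bad_apps
FROM applicants AS a
LEFT JOIN past_applications AS p ON p.address_id = a.address_id
WHERE a.credit_score < 95
GROUP BY a.name, a.credit_score
HAVING COALESCE(SUM(p.was_bad), 0) > 0
ORDER BY bad_apps DESC
"""

def risky_applicants():
    """Return (name, credit_score, bad_apps) for applicants at risky addresses."""
    return conn.execute(RISKY_SQL).fetchall()

print(risky_applicants())
```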
7. A Car Insurance Fraud Example
Graph
Applicant | Links To Bad Applicants
Jon | 0
Jim | 0
Joan | 2
Janet | 0
Jim Bob | 1
• Graph queries need less code and run faster than the equivalent SQL
• Out-of-the-box graph queries
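The shape of a "links to bad applicants" query can be sketched in plain Python with an adjacency structure. This is not a graph database, only an illustration of the traversal it would perform. The edges and the two "Bad Applicant" nodes are hypothetical, chosen so the counts match the table above.

```python
from collections import defaultdict

# Hypothetical links between people (e.g. a shared address or phone number).
LINKS = [
    ("Joan", "Bad Applicant A"),
    ("Joan", "Bad Applicant B"),
    ("Jim Bob", "Bad Applicant A"),
    ("Jon", "Jim"),
]
KNOWN_BAD = {"Bad Applicant A", "Bad Applicant B"}

def build_graph(edges):
    """Build an undirected adjacency map: person -> set of linked people."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def links_to_bad(graph, applicant, known_bad):
    """Count an applicant's direct neighbours that are known bad applicants."""
    return len(graph.get(applicant, set()) & known_bad)

graph = build_graph(LINKS)
for name in ["Jon", "Jim", "Joan", "Janet", "Jim Bob"]:
    print(name, links_to_bad(graph, name, KNOWN_BAD))
```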
8. A Car Insurance Fraud Example
NoSQL
• Process large unstructured web logs
• Extract data and apply behaviour model
192.168.192.01 - - [22/Dec/2015:21:10:20 -0400] "GET / HTTP/1.1" 200 6394 www.mysite.com/app_page1 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1...)" "-"
192.168.192.01 - - [22/Dec/2015:21:11:40 -0400] "GET /app1/section2 HTTP/1.1" 200 807 www.mysite.com/app_page2 "http://www.mysite.com/" "Mozilla/4.0 (compatible; MSIE 6...)" "-"
192.168.192.01 - - [22/Dec/2002:21:12:10 -0400] "GET /app1/section2 HTTP/1.1" 200 3500 www.mysite.com/app_page2 "http://www.mysite.com/" "Mozilla/4.0 (compatible; MSIE ...)" "-"
Applicant | Behaviour Normalcy
Jon | 0.99
Jim | 0.95
Joan | 0.87
Janet | 0.56
Jim Bob | 0.82
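The extraction step for web log lines like those above might look like the following sketch. The regular expression and field names are assumptions that cover only the leading fields of each line. The behaviour model itself is not described in the slides, so the code stops at one simple candidate feature, the gap between consecutive requests.

```python
import re
from datetime import datetime

# Assumed pattern for the leading fields: IP, timestamp, request, status, size.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+)'
)

def parse_line(line):
    """Turn one raw log line into a dict of typed fields, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    record["size"] = int(record["size"])
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record

# The first two sample lines from the slide, each joined onto a single line.
SAMPLE = [
    '192.168.192.01 - - [22/Dec/2015:21:10:20 -0400] "GET / HTTP/1.1" 200 6394 '
    'www.mysite.com/app_page1 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1...)" "-"',
    '192.168.192.01 - - [22/Dec/2015:21:11:40 -0400] "GET /app1/section2 HTTP/1.1" 200 807 '
    'www.mysite.com/app_page2 "http://www.mysite.com/" "Mozilla/4.0 (compatible; MSIE 6...)" "-"',
]

records = [parse_line(line) for line in SAMPLE]
# Seconds between the two requests: one possible input to a behaviour model.
gap_seconds = (records[1]["time"] - records[0]["time"]).total_seconds()
print(records[0]["path"], records[1]["path"], gap_seconds)
```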
9. A Car Insurance Fraud Example
Text
I am the best applicant ever, I promise, so no need to waste your time looking at my previous five convictions.
• Advanced text search
• Convert text into structured data
Applicant | Conviction | Fraud | Promise
Jon | 0 | 0 | 0
Jim | 0 | 0 | 1
Joan | 0 | 1 | 0
Janet | 1 | 1 | 1
Jim Bob | 0 | 0 | 0
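Converting free text into structured flags can be sketched with simple keyword patterns. This swaps in plain regular expressions for a Lucene-based engine, which would normally add analysers, stemming and relevance scoring. The patterns below are hypothetical.

```python
import re

# Hypothetical keyword patterns, one per output column.
FLAG_PATTERNS = {
    "Conviction": re.compile(r"\bconvict(?:ion|ions|ed)\b", re.IGNORECASE),
    "Fraud": re.compile(r"\bfraud\w*\b", re.IGNORECASE),
    "Promise": re.compile(r"\bpromis(?:e|es|ed)\b", re.IGNORECASE),
}

def text_to_flags(text):
    """Return a 0/1 flag per column, set when the pattern occurs in the text."""
    return {name: int(bool(pattern.search(text))) for name, pattern in FLAG_PATTERNS.items()}

statement = (
    "I am the best applicant ever, I promise, so no need to waste "
    "your time looking at my previous five convictions."
)
print(text_to_flags(statement))
```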
10. A Car Insurance Fraud Example – Bringing It Together
Applicant | Credit Score | Bad Apps @ Address | Ave Annual Income | Behaviour Normalcy | Conviction | Fraud | Promise
Jon | 90 | 0 | 30,000 | 0.99 | 0 | 0 | 0
Jim | 60 | 0 | 45,000 | 0.95 | 0 | 0 | 1
Joan | 67 | 0 | 20,000 | 0.87 | 0 | 1 | 0
Janet | 84 | 2 | 60,000 | 0.56 | 1 | 1 | 1
Jim Bob | 34 | 5 | 10,000 | 0.82 | 0 | 0 | 0
Much richer base for Insight Generation
Value of data substantially increased by using different database technologies.
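Bringing the per-database outputs together amounts to a merge keyed on applicant. The sketch below uses the values from the table above. The dictionary layout, field names and the `combine` helper are assumptions for illustration.

```python
# Feature sets as each store might return them (values from the slides).
SQL_FEATURES = {
    "Jon": {"credit_score": 90, "bad_apps_at_address": 0, "avg_annual_income": 30000},
    "Jim": {"credit_score": 60, "bad_apps_at_address": 0, "avg_annual_income": 45000},
    "Joan": {"credit_score": 67, "bad_apps_at_address": 0, "avg_annual_income": 20000},
    "Janet": {"credit_score": 84, "bad_apps_at_address": 2, "avg_annual_income": 60000},
    "Jim Bob": {"credit_score": 34, "bad_apps_at_address": 5, "avg_annual_income": 10000},
}
NOSQL_FEATURES = {
    "Jon": {"behaviour_normalcy": 0.99},
    "Jim": {"behaviour_normalcy": 0.95},
    "Joan": {"behaviour_normalcy": 0.87},
    "Janet": {"behaviour_normalcy": 0.56},
    "Jim Bob": {"behaviour_normalcy": 0.82},
}
TEXT_FEATURES = {
    "Jon": {"conviction": 0, "fraud": 0, "promise": 0},
    "Jim": {"conviction": 0, "fraud": 0, "promise": 1},
    "Joan": {"conviction": 0, "fraud": 1, "promise": 0},
    "Janet": {"conviction": 1, "fraud": 1, "promise": 1},
    "Jim Bob": {"conviction": 0, "fraud": 0, "promise": 0},
}

def combine(*sources):
    """Merge several {applicant: features} mappings into one row per applicant."""
    combined = {}
    for source in sources:
        for applicant, features in source.items():
            combined.setdefault(applicant, {}).update(features)
    return combined

table = combine(SQL_FEATURES, NOSQL_FEATURES, TEXT_FEATURES)
print(table["Janet"])
```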
11. Benefits & Costs
Benefits:
• Data Science potential
• Insights potential
• Productivity increases
Costs:
• Increases IT spend
• Governance
• Integration Complexity
• Diverse Skillsets
12. Summary
Multiple databases improve your ability to unlock business benefits from data, and make for happier:
• Data Scientists
• End Users
But there is a set-up and ongoing cost
• This may prohibit a multiple-DB approach for smaller projects
Mitigate by working iteratively
1. Start by selecting one database that partially meets all your analysis needs
2. This should demonstrate value, enabling greater investment, but also highlight bottlenecks
3. Now layer in additional databases to alleviate these bottlenecks as required
13. The information contained in this presentation is proprietary.
Copyright © 2016 Capgemini. All rights reserved.
Rightshore® is a trademark belonging to Capgemini.
To find out more visit us online at www.capgemini.com/insights-data
About Capgemini
With more than 180,000 people in over 40 countries, Capgemini is a global
leader in consulting, technology and outsourcing services. The Group reported
2015 global revenues of EUR 11.9 billion. Together with its clients, Capgemini
creates and delivers business, technology and digital solutions that fit their
needs, enabling them to achieve innovation and competitiveness. A deeply
multicultural organization, Capgemini has developed its own way of working,
the Collaborative Business Experience™, and draws on Rightshore®, its
worldwide delivery model.
Learn more about us at www.capgemini.com.
About Capgemini Insights & Data
In a world of connected people and connected things, organizations
need a better view of what’s happening on the outside and a faster
view of what’s happening on the inside. Data must be the
foundation of every decision, but more data simply creates more
questions. With over 11,000 professionals across 40 countries,
Capgemini’s Insights & Data global practice can help you find the
answers, by combining technology excellence, data science and
business expertise. Together we leverage the new data landscape
to create deep insights where it matters most – at the point of
action.
Editor's Notes
By using different database technologies, the value of the data is substantially increased. You are happier and faster when you avoid:
• “I can’t do that analysis as it’ll take too long to run the query”
• “Those libraries / functions aren’t available in that language / API”
• “It took me ages to build that RegEx in Cypher”