Don’t Choose One Database Choose Them All!
Dave da Silva
Data Scientist, Capgemini UK
May 2017, Version 1.0
Copyright © Capgemini 2017. All Rights Reserved
Don’t Choose One Database Choose Them All! – Version 1.0 | May 2017
Capgemini & Our Challenge
Big Data Discovery Service
Insights in a Box
Business Data Lake
Assurance Scoring Service
Insight Driven Operations
Data Optimisation
Data Warp
• 13,000 I&D practitioners globally, ~1,000 in the UK
• Embed insight at the heart of the business
• Large clients have multiple use cases
• No one size fits all
• How do we enable business transformation?
• Bring all of their data together
Types of Database Technology – A Data Scientist's View
In-memory – fast ad-hoc querying and investigation
Graph – finding relationships between entities
Hadoop – accessing massive datasets
SQL – large, complex queries
Lucene-based – complex free-text data discovery and retrieval
Database technologies through the eyes of a Data Scientist
Challenges of Using the Wrong Database for a Use Case
• Graph queries can take many lines of SQL and be slow to run
• Running free-text queries on SQL databases is often complicated and, again, can be slow
• NoSQL databases are often good at finding data based on a key, but cannot provide the multi-field querying of an SQL database
• In-memory DBs are fast so long as the data can be stored in memory! What about large datasets?
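To make the first point concrete, here is a minimal Python/SQLite sketch of a bounded "who is connected to whom" traversal written in SQL. The `link` table, the names and the `linked_within` helper are hypothetical. Even a two- or three-hop walk needs a recursive common table expression, whereas a graph database typically expresses the same traversal as a short path pattern with a hop limit.

```python
import sqlite3


def build_links() -> sqlite3.Connection:
    """Create an in-memory table of undirected links between applicants (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE link (a TEXT NOT NULL, b TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO link (a, b) VALUES (?, ?)",
        [("Joan", "Kim"), ("Kim", "Lee"), ("Lee", "Max")],
    )
    return conn


def linked_within(conn: sqlite3.Connection, start: str, max_hops: int) -> list[str]:
    """Return every applicant reachable from `start` in at most `max_hops` links."""
    sql = """
    WITH RECURSIVE reach(node, depth) AS (
        SELECT ?, 0                                            -- anchor: the starting applicant
        UNION
        SELECT CASE WHEN l.a = r.node THEN l.b ELSE l.a END,   -- follow a link in either direction
               r.depth + 1
        FROM link AS l
        JOIN reach AS r ON r.node IN (l.a, l.b)
        WHERE r.depth < ?                                      -- stop expanding after max_hops
    )
    SELECT DISTINCT node FROM reach WHERE node <> ? ORDER BY node
    """
    return [row[0] for row in conn.execute(sql, (start, max_hops, start))]
```

For example, `linked_within(build_links(), "Joan", 2)` returns `['Kim', 'Lee']`; each extra hop or edge type makes the SQL longer, which is the pain the slide describes.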
Why Not Just Use Multiple Databases?
Layers: Data Science Technologies / Database APIs / Data Layer
• Decouple the data and analytics layers for analytics tool flexibility; most data science (DS) languages have good API support
• Push as much data processing as possible down into the database
• Have several slave databases (NoSQL, graph and Lucene versions) fed from a master database
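One way to read the master/slave layout is that each specialised store is a derived view of a single master record set, rebuilt from it rather than written to directly. The sketch below is hypothetical and uses plain Python structures as stand-ins: a dict for a key-value (NoSQL) store, an adjacency map for a graph store and an inverted index for a Lucene-style search engine. A real deployment would rely on each product's own replication or ingestion tooling.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Application:
    """One row in the hypothetical master store."""
    app_id: int
    applicant: str
    address: str
    free_text: str


@dataclass
class Replicas:
    """In-memory stand-ins for the specialised 'slave' stores."""
    key_value: dict[int, Application] = field(default_factory=dict)
    graph: dict[str, set[str]] = field(default_factory=lambda: defaultdict(set))
    text_index: dict[str, set[int]] = field(default_factory=lambda: defaultdict(set))


def sync_from_master(master: list[Application]) -> Replicas:
    """Rebuild every replica from the master records (full refresh, no incremental sync)."""
    replicas = Replicas()
    for app in master:
        # Key-value view: fast lookup by primary key.
        replicas.key_value[app.app_id] = app
        # Graph view: applicant <-> address edges, so shared addresses become paths.
        replicas.graph[app.applicant].add(app.address)
        replicas.graph[app.address].add(app.applicant)
        # Inverted index: lower-cased word -> ids of applications containing it.
        for word in app.free_text.lower().split():
            replicas.text_index[word.strip(".,!?")].add(app.app_id)
    return replicas
```

Because the analytics code only talks to the replicas through their own access paths, a store can be swapped or added without touching the master, which is the decoupling the slide argues for.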
A Car Insurance Fraud Example
SQL: complex joins
Credit Score | Bad Apps @ Address | Ave Annual Income
90 | 0 | 30,000
60 | 0 | 45,000
67 | 0 | 20,000
84 | 2 | 60,000
34 | 5 | 10,000
• Fast joins of multiple large tables
• Complex WHERE conditions on those joins
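The SQL table above can be produced by a join-and-aggregate query. In this sketch the schema (`applicant`, `bad_application`, `income`) and the raw rows are hypothetical, chosen only so the output matches the slide's figures; "Bad Apps @ Address" is assumed to mean earlier bad applications at the applicant's address. The aggregation runs inside the database rather than in Python, and `risky` shows a compound WHERE condition on the joined result.

```python
import sqlite3

SCHEMA = """
CREATE TABLE applicant (id INTEGER PRIMARY KEY, name TEXT, credit_score INTEGER, address_id INTEGER);
CREATE TABLE bad_application (id INTEGER PRIMARY KEY, address_id INTEGER);
CREATE TABLE income (applicant_id INTEGER, year INTEGER, amount INTEGER);
"""

# Join applicants to per-address bad-application counts and per-applicant average income.
FEATURES_SQL = """
SELECT a.name,
       a.credit_score,
       COALESCE(b.bad_apps, 0) AS bad_apps_at_address,
       i.avg_income
FROM applicant AS a
LEFT JOIN (SELECT address_id, COUNT(*) AS bad_apps
           FROM bad_application GROUP BY address_id) AS b
       ON b.address_id = a.address_id
JOIN (SELECT applicant_id, AVG(amount) AS avg_income
      FROM income GROUP BY applicant_id) AS i
       ON i.applicant_id = a.id
"""


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO applicant VALUES (?, ?, ?, ?)", [
        (1, "Jon", 90, 10), (2, "Jim", 60, 20), (3, "Joan", 67, 30),
        (4, "Janet", 84, 40), (5, "Jim Bob", 34, 50),
    ])
    # Two earlier bad applications at Janet's address, five at Jim Bob's.
    conn.executemany("INSERT INTO bad_application (address_id) VALUES (?)",
                     [(40,)] * 2 + [(50,)] * 5)
    conn.executemany("INSERT INTO income VALUES (?, ?, ?)", [
        (1, 2015, 28000), (1, 2016, 32000), (2, 2016, 45000),
        (3, 2016, 20000), (4, 2016, 60000), (5, 2016, 10000),
    ])
    return conn


def features(conn: sqlite3.Connection) -> list[tuple]:
    """One row per applicant: name, credit score, bad apps at address, average income."""
    return conn.execute(FEATURES_SQL + " ORDER BY a.id").fetchall()


def risky(conn: sqlite3.Connection, score_below: int = 50, min_bad_apps: int = 2) -> list[str]:
    """Names matching a compound condition over the joined, aggregated columns."""
    sql = ("SELECT name FROM (" + FEATURES_SQL + ") AS f "
           "WHERE credit_score < ? OR bad_apps_at_address >= ? ORDER BY name")
    return [row[0] for row in conn.execute(sql, (score_below, min_bad_apps))]
```

SQLite's `AVG` returns a float, so Jon's average income comes back as `30000.0`; with the default thresholds `risky` returns `['Janet', 'Jim Bob']`.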
A Car Insurance Fraud Example
Graph
Applicant | Links To Bad Applicants
Jon | 0
Jim | 0
Joan | 2
Janet | 0
Jim Bob | 1
• Graph queries need less code and run faster than the equivalent SQL
• Out-of-the-box graph queries
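"Links to bad applicants" is a neighbourhood query: start at an applicant, walk through shared attributes (phone, address, email) and count the distinct known-bad applicants at the other end. This sketch does it with a plain adjacency map and breadth-first search; the entities, edges and the `links_to_bad` helper are hypothetical, chosen to reproduce the counts in the table above.

```python
from collections import deque

# Hypothetical undirected edges: applicant <-> shared attribute <-> applicant.
EDGES = [
    ("Joan", "phone:0111"), ("phone:0111", "Bad A"),
    ("Joan", "addr:2 Mill Ln"), ("addr:2 Mill Ln", "Bad B"),
    ("Jim Bob", "email:jb@example.com"), ("email:jb@example.com", "Bad C"),
    ("Jon", "addr:9 Elm Rd"), ("Jim", "addr:9 Elm Rd"),
]
BAD = {"Bad A", "Bad B", "Bad C"}


def adjacency(edges):
    """Build an undirected adjacency map from (a, b) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def links_to_bad(graph, start, max_hops=2):
    """Count distinct known-bad applicants within max_hops of start (breadth-first search)."""
    seen = {start}
    queue = deque([(start, 0)])
    found = set()
    while queue:
        node, depth = queue.popleft()
        if node in BAD and node != start:
            found.add(node)
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return len(found)
```

Two hops is the natural limit here (applicant, shared attribute, bad applicant); a graph database would express the same pattern declaratively instead of with a hand-written traversal.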
A Car Insurance Fraud Example
NoSQL
• Process large unstructured web logs
• Extract data and apply behaviour model
192.168.192.01 - - [22/Dec/2015:21:10:20 -0400] "GET / HTTP/1.1" 200 6394 www.mysite.com/app_page1 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1...)" "-"
192.168.192.01 - - [22/Dec/2015:21:11:40 -0400] "GET /app1/section2 HTTP/1.1" 200 807 www.mysite.com/app_page2 "http://www.mysite.com/" "Mozilla/4.0 (compatible; MSIE 6...)" "-"
192.168.192.01 - - [22/Dec/2002:21:12:10 -0400] "GET /app1/section2 HTTP/1.1" 200 3500 www.mysite.com/app_page2 "http://www.mysite.com/" "Mozilla/4.0 (compatible; MSIE ...)" "-"
Applicant | Behaviour Normalcy
Jon | 0.99
Jim | 0.95
Joan | 0.87
Janet | 0.56
Jim Bob | 0.82
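The deck does not describe the behaviour model behind the normalcy score, so this sketch stops at the "extract data" step: parse each log line, group requests by client IP and derive a few per-session features a model could consume. The regex assumes the layout shown above (Common Log Format fields, then a page field, a quoted referrer and a quoted user agent); the feature names are hypothetical. In practice this step would run inside the NoSQL/big-data platform rather than in a single Python process.

```python
import re
from collections import defaultdict
from datetime import datetime

# Layout assumed from the sample lines: ip ident user [timestamp] "request" status size page "referrer" "agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+) '
    r'(?P<page>\S+) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)
TS_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def session_features(lines):
    """Group parsed requests by client IP and derive simple per-session features."""
    hits = defaultdict(list)
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # skip malformed lines rather than failing the whole batch
        hits[m["ip"]].append((datetime.strptime(m["ts"], TS_FORMAT), m["page"]))
    features = {}
    for ip, events in hits.items():
        events.sort()  # chronological order
        times = [t for t, _ in events]
        gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
        features[ip] = {
            "requests": len(events),
            "distinct_pages": len({page for _, page in events}),
            "mean_gap_s": sum(gaps) / len(gaps) if gaps else 0.0,
        }
    return features
```

Fed the first two log lines above, it reports two requests to two distinct pages, 80 seconds apart, for `192.168.192.01`.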
A Car Insurance Fraud Example
Text: "I am the best applicant ever, I promise, so no need to waste your time looking at my previous five convictions."
• Advanced text search
• Convert text into structured data
Applicant | Conviction | Fraud | Promise
Jon | 0 | 0 | 0
Jim | 0 | 0 | 1
Joan | 0 | 1 | 0
Janet | 1 | 1 | 1
Jim Bob | 0 | 0 | 0
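Turning free text into the 0/1 columns above amounts to running one search per flag and recording whether it hit. In this sketch the term patterns are hypothetical, and hand-written regular expressions stand in for what a Lucene-based engine would do with analysers (stemming, synonyms, fuzzy matching).

```python
import re

# Hypothetical term patterns; a prefix match crudely approximates stemming.
FLAG_PATTERNS = {
    "Conviction": re.compile(r"\bconvict\w*", re.IGNORECASE),
    "Fraud": re.compile(r"\bfraud\w*", re.IGNORECASE),
    "Promise": re.compile(r"\bpromis\w*", re.IGNORECASE),
}


def text_flags(text: str) -> dict[str, int]:
    """Turn free text into 0/1 columns, one per flag pattern."""
    return {name: int(bool(pattern.search(text))) for name, pattern in FLAG_PATTERNS.items()}
```

Applied to the sample statement above, it yields `{'Conviction': 1, 'Fraud': 0, 'Promise': 1}`.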
A Car Insurance Fraud Example – Bringing It Together
Applicant | Credit Score | Bad Apps @ Address | Ave Annual Income | Behaviour Normalcy | Conviction | Fraud | Promise
Jon | 90 | 0 | 30,000 | 0.99 | 0 | 0 | 0
Jim | 60 | 0 | 45,000 | 0.95 | 0 | 0 | 1
Joan | 67 | 0 | 20,000 | 0.87 | 0 | 1 | 0
Janet | 84 | 2 | 60,000 | 0.56 | 1 | 1 | 1
Jim Bob | 34 | 5 | 10,000 | 0.82 | 0 | 0 | 0
Much richer base for insight generation
Value of the data substantially increased by using different database technologies.
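Bringing it together is, at its simplest, a join on the applicant key across the per-source outputs. This sketch copies the values from the earlier slides into per-source dicts and inner-joins them; the `combine` helper and the column names are hypothetical, and if two sources used the same column name, the later source would overwrite the earlier one.

```python
from typing import Any

Features = dict[str, dict[str, Any]]  # applicant -> {column: value}


def combine(*sources: Features) -> Features:
    """Inner-join per-source feature tables on applicant name."""
    common = set.intersection(*(set(source) for source in sources))
    combined: Features = {}
    for applicant in sorted(common):
        row: dict[str, Any] = {}
        for source in sources:
            row.update(source[applicant])  # later sources win on column-name clashes
        combined[applicant] = row
    return combined


# Values copied from the SQL, NoSQL and text slides.
SQL_ROWS = {"Jon": (90, 0, 30000), "Jim": (60, 0, 45000), "Joan": (67, 0, 20000),
            "Janet": (84, 2, 60000), "Jim Bob": (34, 5, 10000)}
NORMALCY = {"Jon": 0.99, "Jim": 0.95, "Joan": 0.87, "Janet": 0.56, "Jim Bob": 0.82}
TEXT_ROWS = {"Jon": (0, 0, 0), "Jim": (0, 0, 1), "Joan": (0, 1, 0),
             "Janet": (1, 1, 1), "Jim Bob": (0, 0, 0)}

sql_features = {n: dict(zip(("credit_score", "bad_apps_at_address", "avg_annual_income"), v))
                for n, v in SQL_ROWS.items()}
log_features = {n: {"behaviour_normalcy": v} for n, v in NORMALCY.items()}
text_features = {n: dict(zip(("conviction", "fraud", "promise"), v))
                 for n, v in TEXT_ROWS.items()}
```

`combine(sql_features, log_features, text_features)` reproduces the combined table; the graph slide's "links to bad applicants" column could be joined in the same way.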
Benefits & Costs
Benefits:
• Data Science potential
• Insights potential
• Productivity increases
Costs:
• Increased IT spend
• Governance
• Integration complexity
• Diverse skillsets
Summary
• Multiple databases improve your ability to unlock business benefits from data, while also making both data scientists and end users happier
• But there is a set-up and ongoing cost, which may prohibit a multiple-DB approach for smaller projects
• Mitigate this by working iteratively:
   1. Start by selecting one database that partially meets all your analysis needs
   2. This should demonstrate value, enabling greater investment, but will also highlight bottlenecks
   3. Then layer in additional databases to alleviate these bottlenecks as required
The information contained in this presentation is proprietary.
Copyright © 2016 Capgemini. All rights reserved.
Rightshore® is a trademark belonging to Capgemini.
To find out more, visit us online at www.capgemini.com/insights-data
About Capgemini
With more than 180,000 people in over 40 countries, Capgemini is a global leader in consulting, technology and outsourcing services. The Group reported 2015 global revenues of EUR 11.9 billion. Together with its clients, Capgemini creates and delivers business, technology and digital solutions that fit their needs, enabling them to achieve innovation and competitiveness. A deeply multicultural organization, Capgemini has developed its own way of working, the Collaborative Business Experience™, and draws on Rightshore®, its worldwide delivery model.
Learn more about us at www.capgemini.com.
About Capgemini Insights & Data
In a world of connected people and connected things, organizations need a better view of what’s happening on the outside and a faster view of what’s happening on the inside. Data must be the foundation of every decision, but more data simply creates more questions. With over 11,000 professionals across 40 countries, Capgemini’s Insights & Data global practice can help you find the answers, by combining technology excellence, data science and business expertise. Together we leverage the new data landscape to create deep insights where it matters most – at the point of action.


Editor's Notes

  1. By using different database technologies, the value of the data has substantially increased.
  2. You are happier and faster when you avoid:
     “I can’t do that analysis as it’ll take too long to run the query”
     “Those libraries / functions aren’t available in that language / API”
     “It took me ages to build that RegEx in Cypher”