
Don’t Choose One Database Choose Them All!, Capgemini

GraphConnect Europe 2017
Dave da Silva, Capgemini UK


  1. Don’t Choose One Database Choose Them All! Dave da Silva, Data Scientist, Capgemini UK. May 2017, Version 1.0
  2. Capgemini & Our Challenge
     Big Data Discovery Service Insights in a Box Business Data Lake Assurance Scoring Service Insight Driven Operations Data Optimisation Data Warp
     • 13,000 I&D Practitioners Globally
     • ~1,000 in UK
     • Embed insight at heart of business
     • Large clients have multiple use cases
     • No one size fits all
     • How do we enable business transformation?
     • Bring all of their data together
     Copyright © Capgemini 2017. All Rights Reserved. Don’t Choose One Database Choose Them All! – Version 1.0 | May 2017
  3. Types of Database Technology: A Data Scientist View!
     • In-memory: fast ad-hoc querying and investigation
     • Graph: finding relationships between entities
     • Hadoop: accessing massive datasets
     • SQL: large complex queries
     • Lucene Based: complicated free text data discovery and retrieval
     Database technologies through the eyes of a Data Scientist
  4. Challenges of Using Wrong Database for Wrong Use Case
     • Graph queries can take many lines of SQL and are slow to run.
     • Running free-text queries on SQL databases is often complicated and, again, can be slow.
     • NoSQL databases are often good at finding data by key but cannot provide the multi-field querying of an SQL database.
     • In-memory databases are fast so long as the data can be stored in memory! What about large datasets?
  5. Why Not Just Use Multiple Databases?
     Layers (from the diagram): Data Science Technologies; Database APIs; Data Layer (Master, with a NoSQL version, a Graph version and a Lucene version)
     • Decouple Data and Analytics layers for analytics tool flexibility.
     • Most DS languages have good API support.
     • Push as much data processing down into the database.
     • Have several slave databases.
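The master-plus-specialised-copies layout above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Master` class, the three sink functions and the sample record are hypothetical, and plain in-memory structures stand in for real graph, NoSQL and Lucene-based stores.

```python
from typing import Callable, Dict, List, Set, Tuple

Record = Dict[str, str]

# In-memory stand-ins for the specialised copies (assumptions, not real databases).
graph_edges: List[Tuple[str, str]] = []      # "graph version": applicant-address links
kv_store: Dict[str, Record] = {}             # "NoSQL version": lookup by key
inverted_index: Dict[str, Set[str]] = {}     # "Lucene version": word -> applicant names


def to_graph(record: Record) -> None:
    graph_edges.append((record["name"], record["address"]))


def to_kv(record: Record) -> None:
    kv_store[record["name"]] = record


def to_text(record: Record) -> None:
    for word in record["notes"].lower().split():
        inverted_index.setdefault(word, set()).add(record["name"])


class Master:
    """Holds the authoritative records and pushes every write to each copy."""

    def __init__(self) -> None:
        self.records: List[Record] = []
        self.sinks: Dict[str, Callable[[Record], None]] = {}

    def register(self, name: str, sink: Callable[[Record], None]) -> None:
        self.sinks[name] = sink

    def write(self, record: Record) -> None:
        self.records.append(record)
        for sink in self.sinks.values():
            sink(record)


master = Master()
master.register("graph", to_graph)
master.register("nosql", to_kv)
master.register("text", to_text)

# Hypothetical record, made up for this sketch.
master.write({"name": "Janet", "address": "7 Elm St", "notes": "previous convictions"})
```

Each analytics tool would then query whichever copy suits its question, while writes go through the master only.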
  6. A Car Insurance Fraud Example: SQL, Complex Joins
     Credit Score   Bad Apps @ Address   Ave Annual Income
     90             0                    30,000
     60             0                    45,000
     67             0                    20,000
     84             2                    60,000
     34             5                    10,000
     • Fast joins of multiple large tables
     • Complex WHERE conditions on those joins
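As a rough illustration of the join-heavy SQL step, the sketch below uses Python's built-in `sqlite3` module. The table names, column names, addresses and underlying rows are all assumptions made up for this example; they are chosen so that the output matches two rows of the slide's table (Jon and Janet, whose names appear in the combined table later in the deck).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE applicants (name TEXT PRIMARY KEY, address TEXT, credit_score INTEGER);
CREATE TABLE past_applications (address TEXT, is_bad INTEGER);
CREATE TABLE income (name TEXT, year INTEGER, amount INTEGER);

-- Hypothetical rows, invented for this sketch.
INSERT INTO applicants VALUES ('Jon', '1 Oak Rd', 90), ('Janet', '7 Elm St', 84);
INSERT INTO past_applications VALUES
    ('1 Oak Rd', 0), ('7 Elm St', 1), ('7 Elm St', 1), ('7 Elm St', 0);
INSERT INTO income VALUES
    ('Jon', 2015, 28000), ('Jon', 2016, 32000),
    ('Janet', 2015, 55000), ('Janet', 2016, 65000);
""")

# Aggregate each side before joining so the joins do not multiply rows.
query = """
SELECT a.name,
       a.credit_score,
       COALESCE(b.bad_apps, 0) AS bad_apps_at_address,
       i.avg_income
FROM applicants AS a
LEFT JOIN (SELECT address, SUM(is_bad) AS bad_apps
           FROM past_applications
           GROUP BY address) AS b ON b.address = a.address
JOIN (SELECT name, AVG(amount) AS avg_income
      FROM income
      GROUP BY name) AS i ON i.name = a.name
ORDER BY a.name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

A production query would add the WHERE conditions the slide mentions; they are left out here to keep the join structure visible.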
  7. A Car Insurance Fraud Example: Graph
     Applicant   Links To Bad Applicants
     Jon         0
     Jim         0
     Joan        2
     Janet       0
     Jim Bob     1
     • Graph queries are less code and faster than the same in SQL
     • Out of the box graph queries
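To make the "links to bad applicants" idea concrete, here is a minimal plain-Python sketch of the traversal a graph database would run natively. The edges, the shared attributes and the `Bad Bill` / `Bad Barb` names are hypothetical; they are picked so that the counts for Jon, Joan and Jim Bob agree with the slide.

```python
from collections import defaultdict

# Hypothetical edges: applicant <-> shared attribute (phone, address, bank account).
edges = [
    ("Joan", "phone:555-0101"), ("Bad Bill", "phone:555-0101"),
    ("Joan", "addr:7 Elm St"), ("Bad Barb", "addr:7 Elm St"),
    ("Jim Bob", "acct:0042"), ("Bad Bill", "acct:0042"),
    ("Jon", "addr:1 Oak Rd"),
]
known_bad = {"Bad Bill", "Bad Barb"}

# Build an undirected adjacency map.
graph = defaultdict(set)
for left, right in edges:
    graph[left].add(right)
    graph[right].add(left)


def links_to_bad(applicant):
    """Count distinct known-bad applicants reachable through one shared attribute."""
    two_hops = {node for attr in graph[applicant] for node in graph[attr]}
    return len((two_hops - {applicant}) & known_bad)


counts = {name: links_to_bad(name) for name in ["Jon", "Joan", "Jim Bob"]}
print(counts)
```

In SQL the same two-hop question needs a self-join through a link table for every hop, which is the extra code and run time the slide refers to.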
  8. A Car Insurance Fraud Example: NoSQL
     • Process large unstructured web logs
     • Extract data and apply behaviour model
     192.168.192.01 - - [22/Dec/2015:21:10:20 -0400] "GET / HTTP/1.1" 200 6394 www.mysite.com/app_page1 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1...)" "-"
     192.168.192.01 - - [22/Dec/2015:21:11:40 -0400] "GET /app1/section2 HTTP/1.1" 200 807 www.mysite.com/app_page2 "http://www.mysite.com/" "Mozilla/4.0 (compatible; MSIE 6...)" "-"
     192.168.192.01 - - [22/Dec/2002:21:12:10 -0400] "GET /app1/section2 HTTP/1.1" 200 3500 www.mysite.com/app_page2 "http://www.mysite.com/" "Mozilla/4.0 (compatible; MSIE ...)" "-"
     Applicant   Behaviour Normalcy
     Jon         0.99
     Jim         0.95
     Joan        0.87
     Janet       0.56
     Jim Bob     0.82
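The "extract data" step for log lines like these can be sketched with a regular expression. The pattern below is an assumption fitted to the first two sample lines on the slide, not a general log parser, and the deck does not describe the behaviour model itself, so the sketch stops at one simple feature: the time gap between requests.

```python
import re
from datetime import datetime

# First two sample lines from the slide, copied as shown (including the "..." truncation).
LOG_LINES = [
    '192.168.192.01 - - [22/Dec/2015:21:10:20 -0400] "GET / HTTP/1.1" 200 6394 '
    'www.mysite.com/app_page1 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1...)" "-"',
    '192.168.192.01 - - [22/Dec/2015:21:11:40 -0400] "GET /app1/section2 HTTP/1.1" 200 807 '
    'www.mysite.com/app_page2 "http://www.mysite.com/" "Mozilla/4.0 (compatible; MSIE 6...)" "-"',
]

PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+)'
)


def parse(line):
    """Return a dict of typed fields, or None if the line does not match."""
    match = PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["ts"] = datetime.strptime(record["ts"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    record["size"] = int(record["size"])
    return record


records = [r for r in map(parse, LOG_LINES) if r is not None]

# One simple behavioural feature: seconds between consecutive requests.
gap_seconds = (records[1]["ts"] - records[0]["ts"]).total_seconds()
print(records[0]["path"], records[1]["path"], gap_seconds)
```

Features such as this gap would feed whatever model produces the Behaviour Normalcy score.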
  9. A Car Insurance Fraud Example: Text
     "I am the best applicant ever, I promise, so no need to waste your time looking at my previous five convictions."
     • Advanced text search
     • Convert text into structured data
     Applicant   Conviction   Fraud   Promise
     Jon         0            0       0
     Jim         0            0       1
     Joan        0            1       0
     Janet       1            1       1
     Jim Bob     0            0       0
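Turning free text into the three flags can be approximated with keyword patterns. This is a deliberately crude stand-in for what a Lucene-based engine does with analysers and stemming; the patterns and the `text_flags` helper are assumptions made for illustration.

```python
import re

# One prefix pattern per flag; \w* lets "convictions" or "promised" match too.
FLAG_PATTERNS = {
    "Conviction": re.compile(r"\bconvict\w*", re.IGNORECASE),
    "Fraud": re.compile(r"\bfraud\w*", re.IGNORECASE),
    "Promise": re.compile(r"\bpromis\w*", re.IGNORECASE),
}


def text_flags(text):
    """Return a 0/1 flag per keyword family found in the text."""
    return {flag: int(bool(pattern.search(text))) for flag, pattern in FLAG_PATTERNS.items()}


statement = (
    "I am the best applicant ever, I promise, so no need to waste "
    "your time looking at my previous five convictions."
)
flags = text_flags(statement)
print(flags)
```

Applied to the statement quoted on the slide, this sets the Conviction and Promise flags and leaves Fraud at 0.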
  10. A Car Insurance Fraud Example: Bringing It Together
      Applicant   Credit Score   Bad Apps @ Address   Ave Annual Income   Behaviour Normalcy   Conviction   Fraud   Promise
      Jon         90             0                    30,000              0.99                 0            0       0
      Jim         60             0                    45,000              0.95                 0            0       1
      Joan        67             0                    20,000              0.87                 0            1       0
      Janet       84             2                    60,000              0.56                 1            1       1
      Jim Bob     34             5                    10,000              0.82                 0            0       0
      Much richer base for Insight Generation. Value of data substantially increased by using different database technologies.
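Bringing the per-database outputs together amounts to a join on the applicant name. The sketch below does this in plain Python using the figures from the slides; the dictionary layout and field names are assumptions. It also carries the graph count from the Graph slide, which the combined table on this slide does not show.

```python
APPLICANTS = ["Jon", "Jim", "Joan", "Janet", "Jim Bob"]

# Values copied from the slides: (credit score, bad apps @ address, ave annual income).
sql_rows = {
    "Jon": (90, 0, 30000), "Jim": (60, 0, 45000), "Joan": (67, 0, 20000),
    "Janet": (84, 2, 60000), "Jim Bob": (34, 5, 10000),
}
graph_rows = {"Jon": 0, "Jim": 0, "Joan": 2, "Janet": 0, "Jim Bob": 1}
nosql_rows = {"Jon": 0.99, "Jim": 0.95, "Joan": 0.87, "Janet": 0.56, "Jim Bob": 0.82}
# (conviction, fraud, promise) flags from the text slide.
text_rows = {
    "Jon": (0, 0, 0), "Jim": (0, 0, 1), "Joan": (0, 1, 0),
    "Janet": (1, 1, 1), "Jim Bob": (0, 0, 0),
}


def build_row(name):
    """Merge the outputs of every source into one feature row for an applicant."""
    credit_score, bad_apps, avg_income = sql_rows[name]
    conviction, fraud, promise = text_rows[name]
    return {
        "applicant": name,
        "credit_score": credit_score,
        "bad_apps_at_address": bad_apps,
        "avg_annual_income": avg_income,
        "links_to_bad_applicants": graph_rows[name],
        "behaviour_normalcy": nosql_rows[name],
        "conviction": conviction,
        "fraud": fraud,
        "promise": promise,
    }


combined = [build_row(name) for name in APPLICANTS]
for row in combined:
    print(row)
```

The resulting rows are the kind of wide feature table a fraud model or an analyst would work from.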
  11. Benefits & Costs
      Data Science potential; Insights potential; Increases IT spend; Governance; Productivity increases; Integration Complexity; Diverse Skillsets
  12. Summary
      • Multiple databases improve your ability to unlock business benefits from data, and also make for happier:
        • Data Scientists
        • End Users
      • But there is a set-up and ongoing cost
        • This may prohibit a multiple-DB approach for smaller projects
      • Mitigate by working iteratively
        1. Start by selecting one database that partially meets all your analysis needs
        2. This should demonstrate value, enabling greater investment, but also highlight bottlenecks
        3. Now layer in additional databases to alleviate these bottlenecks as required
  13. The information contained in this presentation is proprietary. Copyright © 2016 Capgemini. All rights reserved. Rightshore® is a trademark belonging to Capgemini.
      To find out more visit us online at www.capgemini.com/insights-data
      About Capgemini: With more than 180,000 people in over 40 countries, Capgemini is a global leader in consulting, technology and outsourcing services. The Group reported 2015 global revenues of EUR 11.9 billion. Together with its clients, Capgemini creates and delivers business, technology and digital solutions that fit their needs, enabling them to achieve innovation and competitiveness. A deeply multicultural organization, Capgemini has developed its own way of working, the Collaborative Business Experience™, and draws on Rightshore®, its worldwide delivery model. Learn more about us at www.capgemini.com.
      About Capgemini Insights & Data: In a world of connected people and connected things, organizations need a better view of what’s happening on the outside and a faster view of what’s happening on the inside. Data must be the foundation of every decision, but more data simply creates more questions. With over 11,000 professionals across 40 countries, Capgemini’s Insights & Data global practice can help you find the answers, by combining technology excellence, data science and business expertise. Together we leverage the new data landscape to create deep insights where it matters most: at the point of action.
