Data Vault: What’s Next? © Dan Linstedt, 2011-2012, all rights reserved 1
Agenda 2
Introduction: why are you here?
Short Data Vault Review
What’s Next? Advanced Architecture…
Defining Operational Data Warehousing
Why is Data Vault a Good Fit?
<BREAK>
Fundamental Paradigm Shift
Business Keys & Business Processes
Technical Review
Query Performance (PIT & Bridge)
What wasn’t covered in this presentation…
A bit about me… 3
Author, Inventor, Speaker, and part-time photographer…
25+ years in the IT industry
Worked in DoD, US Gov’t, Fortune 50, and so on…
Find out more about the Data Vault: http://YouTube.com/LearnDataVault and http://LearnDataVault.com
Slides available: http://SlideShare.net (search: “Advanced Architecture Data Vault”)
Full profile on http://www.LinkedIn.com/dlinstedt
Why Are You Here? 4 Your Expectations? Your Questions? Your Background? Areas of Interest? Biggest question: What are the top 3 pains your current EDW / BI solution is experiencing?
Short Data Vault Review What is it and where did it come from? 5
Data Warehousing Timeline
Mid 60’s: Dimension & Fact Modeling presented by General Mills and Dartmouth University
Early 70’s: Bill Inmon Began Discussing Data Warehousing
Mid 70’s: AC Nielsen Popularized Dimension & Fact Terms
1976: Dr Peter Chen Created E-R Diagramming
Mid 80’s: Bill Inmon Popularizes Data Warehousing
Mid to Late 80’s: Dr Kimball Popularizes Star Schema
Late 80’s: Barry Devlin and Dr Kimball Release “Business Data Warehouse”
1990: Dan Linstedt Begins R&D on Data Vault Modeling
2000: Dan Linstedt releases first 5 articles on Data Vault Modeling
2010 onward: DV Alive and Well Around the World
E.F. Codd invented relational modeling
Chris Date and Hugh Darwen Maintained and Refined Modeling
Data Vault Modeling… Took 10 years of Research and Design, including TESTING, to become flexible, consistent, and scalable 7
What IS a Data Vault? (Business Definition) 8
Data Vault Model
Detail oriented
Historical traceability
Uniquely linked set of normalized tables
Supports one or more functional areas of business
CMMI, Project Plan
Risk, Governance, Versioning
Peer Reviews, Release Cycles
Repeatable, Consistent, Optimized
Complete with Best Practices for BI/DW
Business Keys Span / Cross Lines of Business
Functional Area: Sales, Contracts, Planning, Delivery, Finance, Operations, Procurement
Supply Chain Analogy 9 Source Systems, Data Vault (EDW), Data Marts
What Does One Look Like? 10
Records a history of the interaction
[Diagram: Customer, Product, and Order, joined by a Link, with Satellites (Sat) attached; arrows labeled F(x)]
Elements:
Hub = List of Unique Business Keys
Link = List of Relationships, Associations
Satellite = Descriptive Data
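The three element types defined above can be sketched as plain data structures. This is only a minimal illustration under stated assumptions, not the author's implementation: the class names follow the slide, but the field layout, the `load` methods, the `CustomerProduct` link name, and the sample keys are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Hub:
    """Hub = list of unique business keys (storage layout is assumed)."""
    name: str
    keys: set = field(default_factory=set)

    def load(self, business_key: str) -> None:
        # A set keeps each business key only once, however often it arrives.
        self.keys.add(business_key)


@dataclass
class Link:
    """Link = list of relationships between business keys."""
    name: str
    rows: set = field(default_factory=set)

    def load(self, *business_keys: str) -> None:
        self.rows.add(tuple(business_keys))


@dataclass
class Satellite:
    """Satellite = descriptive data, kept as dated rows per parent key."""
    name: str
    rows: list = field(default_factory=list)

    def load(self, parent_key: str, load_date: date, attributes: dict) -> None:
        # Rows are only appended, so earlier descriptions stay available.
        self.rows.append((parent_key, load_date, dict(attributes)))


# Hypothetical sample data, for illustration only.
customer, product = Hub("Customer"), Hub("Product")
customer_product = Link("CustomerProduct")
customer_details = Satellite("CustomerDetails")

customer.load("C-100")
customer.load("C-100")  # duplicate arrival, still one key
product.load("P-7")
customer_product.load("C-100", "P-7")
customer_details.load("C-100", date(2012, 1, 1), {"name": "Acme"})
customer_details.load("C-100", date(2012, 6, 1), {"name": "Acme Corp"})
```

In this sketch the hub and link stay small and stable while the satellite accumulates one row per change, which is one way to picture how the model records a history of the interaction.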
Colorized Perspective… 11
Data Vault, 3rd NF & Star Schema (separation): Business Keys (HUB), Associations (LINK), Details (Satellite)
The Data Vault uniquely separates the Business Keys (Hubs) from the Associations (Links), and both of these from the Details that describe them and provide context (Satellites).
(Colors Concept Originated By: Hans Hultgren)
A Quick Look at Methodology Issues Business Rule Processing, Lack of Agility, and  Future proofing your new solution 12
EDW Architecture: Generation 1 13 Enterprise BI Solution Sales (batch) Staging (EDW) Star Schemas Complex Business Rules #2 Finance Conformed Dimensions Junk Tables Helper Tables Factless Facts Staging + History Complex Business Rules + Dependencies Contracts
Cross-system dependencies
Source data filtering
In-process data manipulation
High risk of incorrect data aggregation
Larger system = increased impact
Often re-engineered at the SOURCE
History can be destroyed (completely re-computed)
Re-Engineering Business Rules Data Flow (Mapping) Current Sources Sales Customer Source Join Finance Customer Transactions Customer Purchases IMPACT!! ** NEW SYSTEM** 15
Federated Star Schema Inhibiting Agility 16
[Chart: Effort & Cost (Low to High) over Time, from Start to "Maintenance Cycle Begins", with Data Mart 1, Data Mart 2, and Data Mart 3 marked on the curve]
Changing and adjusting conformed dimensions causes an exponential rise in the cost curve over time.
RESULT: Business builds their own Data Marts!
The main driver for this is the maintenance cost and the re-engineering of the existing system that occurs for each new “federated/conformed” effort. This increases delivery time, difficulty, and maintenance costs.
EDW Architecture: Generation 2 17
SOA Enterprise BI Solution Star Schemas (real-time) Sales (batch) EDW (Data Vault) (batch) Staging Error Marts Finance Contracts Complex Business Rules Report Collections Unstructured Data
FUNDAMENTAL GOALS
Consistent
Fault-tolerant
Supports phased release
Scalable
Auditable
The business rules are moved closer to the business, improving IT reaction time, reducing cost, and minimizing impacts to the enterprise data warehouse (EDW)
NO Re-Engineering Current Sources Data Vault Sales Stage Copy Hub Customer Customer Finance Stage Copy Link Transaction Customer Transactions Hub Acct Hub Product Customer Purchases Stage Copy NO IMPACT!!! NO RE-ENGINEERING! ** NEW SYSTEM** IMPACT!! 18
Progressive Agility and Responsiveness of IT 19
[Chart: Effort & Cost (Low to High) over Time, from Start to "Maintenance Cycle Begins"; curve labels: Foundational Base Built, New Functional Areas Added, Initial DV Build Out]
Re-Engineering does NOT occur with a Data Vault Model. This keeps costs down and maintenance easy. It also reduces the complexity of the existing architecture.
Why is Data Vault a Good Fit? 20
What are the top business obstacles in your data warehouse today? 21
Poor Agility Inconsistent Answer Sets Needs Accountability Demands Auditability Desires IT Transparency Are you feeling Pinned Down? 22
What are the top technology obstacles in your data warehouse today? 23
Complex Systems Real-Time Data Arrival Unimaginable Data Growth Master Data Alignment Bad Data Quality Late Delivery/Over Budget Are your systems CRUMBLING? 24
Yugo: World’s Worst Car. Existing Solutions have led you down a painful path… 25
Projects Cancelled & Restarted
Re-engineering required to absorb new systems
Complexity drives maintenance cost sky high
Disparate Silo Solutions provide inaccurate answers!
Severe lack of Accountability 26
How can you overcome these obstacles? There must be a better way… There IS a better way! 27
It’s Called the Data Vault Model and Methodology 28
What is it? It’s a simple Easy-to-use Plan To build your  valuable Data Warehouse! 29
What’s the Value? Painless Auditability  Understandable Standards Rapid Adaptability Simple Build-out Uncomplicated Design Effortless Scalability Pursue Your Goals! 30
Why Bother With Something New? Old Chinese proverb:  'Unless you change direction, you're apt to end up where you're headed.' 31
What Are the Issues? This is NOT what you want happening to your project! Business… Changes Frequently IT…. Needs Accountability Takes Too Long Demands Auditability Is Over-budget Has No Visibility Too Complex Wants More Control Can’t Sustain Growth THE GAP!! 32
What Are the Foundational Keys? Flexibility Scalability Productivity 33
Key: Flexibility Enabling rapid change on a massive scale without downstream impacts! 34
Key: Scalability Providing no foreseeable barrier to increased size and scope People, Process, & Architecture! 35
Key: Productivity Enabling low complexity systems with high value output at a rapid pace 36
How does it work? Bringing the Data Vault to Your Project 37
Key: Flexibility 38
No Re-Engineering! Adding new components to the EDW has NEAR ZERO impact to:
Existing Data Model
Existing Reporting & BI Functions
Existing Source Systems
Existing Star Schemas and Data Marts
Case In Point: The flexibility of the Data Vault Model allowed them to merge 3 companies in 90 days. That is ALL systems, ALL DATA! 39
Key: Scalability in Architecture 40
Scaling is easy; it’s based on the following principles:
MPP Shared-Nothing Architecture
Scale Free Networks
Case In Point: Result of scalability was to produce a Data Vault model that scaled to 3 Petabytes in size, and is still growing today! 41
Key: Scalability in Team Size You should be able to SCALE your TEAM as well! With the Data Vault methodology, you can: Scale your team when desired, at different points in the project! 42
Case In Point: (Dutch Tax Authority) Result of scalability was to increase ETL developers for each new source system, and reassign them when the system was completely loaded to the Data Vault 43
Key: Productivity 44
Increasing Productivity requires a reduction in complexity. The Data Vault Model simplifies all of the following:
Real-Time Ingestion of Data
Data Modeling for the EDW
Enhancing and Adapting for Change to the Model
Ease of Monitoring, managing and optimizing processes
Case in Point: 45
Result of Productivity was: 2 people in 2 weeks merged 3 systems, built a full Data Vault EDW, 5 star schemas and 3 reports. These individuals generated:
100% of the Staging Data Model
75% of the finished EDW data Model
75% of the star schema data model
The Competing Bid? The competition bid this with 15 people and 3 months to completion, at a cost of $250k! (they bid a Very complex system) Our total cost?  $30k and 2 weeks! 46
Results? Changing the direction of the river takes less effort than stopping the flow of water 47
< BREAK TIME > 48
What’s Next? A look at what’s around the corner for Data Warehousing and Business Intelligence, believe me, it’s going to get interesting fast. 49
Operational Data Vault 50
Data Co-Location:
Transactions & Transaction History
Master Data & Master Data History
Metadata & Metadata History
External Data & External Data History
Business Rules & Business Rule History
Security / Access data & History
Unstructured Data Ties & History
Real-time Data Feeds DIRECTLY into the data store
Operational Applications ON TOP of the warehouse!
Extreme Automation! 51
Automated Creation of Data Models:
Data Vault Models
Star Schema Models
Cube Models
Excel Models (spreadsheets)
Data Mining Models (table structures)
Automated Creation of ETL Processes:
Data Vault (Data Warehouse Loads)
Star Schema Loads (80% solutions)
Cube Loads (80% solutions)
Excel Loads / Queries (80% solutions)
Data Mining Queries (80% solutions)
Other Automated Components:
Initial Master Data Population
Generated Testing Scripts
http://www.jmorganmarketing.com/should-social-crm-be-automated/
Results of all of this? 52
EDW Will:
become BACK OFFICE!!
become SELF-RELIANT / SELF-HEALING
adapt to new structures, new hardware, and new data
automatically back up and remove old data
Self-Reliance
http://images.businessweek.com/ss/06/10/bestunder25/source/1.htm
How Long Will it Take? 53
My milestone predictions:
1 yr: Operational Data Vault
2 yrs: Beginning automation of business rules
3 yrs: Beginning dynamic restructuring in the DV
4 yrs: Oper Apps contain BI & metadata & Master data GUI’s in a single place
5 yrs: the “all-in-one” appliance, containing 75% of what we need at the firmware levels to do all these things
http://thypolarlife.wordpress.com/2011/08/02/this-moment-in-time/
Why Should I Care? 54
Because this technology is the heart of Data Warehousing!
Because the future is now
Because it will happen with or without you… You do want a job, right?
Who’s Tooling Today? 56 WhereScape Quipu AnalytixDS RapidACE Nexus BI-Ready Centennium
What Does It Add Up To? 57 Pervasive BI
What’s the Key Ingredient? 58 Ubiquitous A.I.
Defining Operational Data Warehousing What is an ODW and How did we get here? 59
What IS An Operational DW? 60 A raw, time-variant, integrated, non-volatile data warehouse, on top of which sits an operational application (“editing and changing data”). However, instead of updates and deletes in place, the data is “marked” deleted, and updates are turned into inserts, creating a delta audit trail along the way. Yes, it’s an operational application on top of the integrated data warehouse (or in this case, Data Vault model).
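The insert-only behavior described above can be sketched in a few lines. This is a rough illustration under assumptions, not the author's design: the `InsertOnlyStore` class, its method names, the row layout, and the sample customer data are all hypothetical. It shows an update becoming a new row and a delete becoming a marker row, so the full delta trail survives.

```python
from datetime import datetime


class InsertOnlyStore:
    """Append-only record store: no row is ever updated or removed in place."""

    def __init__(self):
        self.rows = []  # the delta audit trail, in arrival order

    def update(self, key, attributes, changed_at):
        # An "update" is recorded as a new row; older rows are left untouched.
        self.rows.append({"key": key, "changed_at": changed_at,
                          "deleted": False, "attributes": dict(attributes)})

    def delete(self, key, changed_at):
        # A "delete" only appends a marker row flagged as deleted.
        self.rows.append({"key": key, "changed_at": changed_at,
                          "deleted": True, "attributes": {}})

    def history(self, key):
        return [row for row in self.rows if row["key"] == key]

    def current(self, key):
        # The latest row wins; a deletion marker means "no current value".
        versions = self.history(key)
        if not versions or versions[-1]["deleted"]:
            return None
        return versions[-1]["attributes"]


# Hypothetical sample data, for illustration only.
store = InsertOnlyStore()
store.update("C-100", {"city": "Denver"}, datetime(2012, 1, 1))
store.update("C-100", {"city": "Boulder"}, datetime(2012, 3, 1))
before_delete = store.current("C-100")
store.delete("C-100", datetime(2012, 6, 1))
after_delete = store.current("C-100")
```

After the delete, the operational view reports no current record, yet `store.history("C-100")` still returns all three rows, which is the audit trail the slide refers to.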

More Related Content

What's hot

How to Use a Semantic Layer to Deliver Actionable Insights at Scale
How to Use a Semantic Layer to Deliver Actionable Insights at ScaleHow to Use a Semantic Layer to Deliver Actionable Insights at Scale
How to Use a Semantic Layer to Deliver Actionable Insights at ScaleDATAVERSITY
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureDmitry Anoshin
 
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)Kent Graziano
 
The ABCs of Treating Data as Product
The ABCs of Treating Data as ProductThe ABCs of Treating Data as Product
The ABCs of Treating Data as ProductDATAVERSITY
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshJeffrey T. Pollock
 
Introduction to Data Vault Modeling
Introduction to Data Vault ModelingIntroduction to Data Vault Modeling
Introduction to Data Vault ModelingKent Graziano
 
Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...
Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...
Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...Edureka!
 
Modern Data architecture Design
Modern Data architecture DesignModern Data architecture Design
Modern Data architecture DesignKujambu Murugesan
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...Databricks
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & DeltaDatabricks
 
Agile Data Mining with Data Vault 2.0 (english)
Agile Data Mining with Data Vault 2.0 (english)Agile Data Mining with Data Vault 2.0 (english)
Agile Data Mining with Data Vault 2.0 (english)Michael Olschimke
 
Data modeling star schema
Data modeling star schemaData modeling star schema
Data modeling star schemaSayed Ahmed
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake OverviewJames Serra
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...DATAVERSITY
 
Activate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogActivate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogDATAVERSITY
 

What's hot (20)

How to Use a Semantic Layer to Deliver Actionable Insights at Scale
How to Use a Semantic Layer to Deliver Actionable Insights at ScaleHow to Use a Semantic Layer to Deliver Actionable Insights at Scale
How to Use a Semantic Layer to Deliver Actionable Insights at Scale
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft Azure
 
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
 
The ABCs of Treating Data as Product
The ABCs of Treating Data as ProductThe ABCs of Treating Data as Product
The ABCs of Treating Data as Product
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
Introduction to Data Vault Modeling
Introduction to Data Vault ModelingIntroduction to Data Vault Modeling
Introduction to Data Vault Modeling
 
Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...
Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...
Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...
 
Modern Data architecture Design
Modern Data architecture DesignModern Data architecture Design
Modern Data architecture Design
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & Delta
 
Agile Data Mining with Data Vault 2.0 (english)
Agile Data Mining with Data Vault 2.0 (english)Agile Data Mining with Data Vault 2.0 (english)
Agile Data Mining with Data Vault 2.0 (english)
 
Data modeling star schema
Data modeling star schemaData modeling star schema
Data modeling star schema
 
Data Mesh
Data MeshData Mesh
Data Mesh
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
ETL
ETLETL
ETL
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
 
Activate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogActivate Data Governance Using the Data Catalog
Activate Data Governance Using the Data Catalog
 

Viewers also liked

Introduction To Data Vault - DAMA Oregon 2012
Introduction To Data Vault - DAMA Oregon 2012Introduction To Data Vault - DAMA Oregon 2012
Introduction To Data Vault - DAMA Oregon 2012Empowered Holdings, LLC
 
IRM UK - 2009: DV Modeling And Methodology
IRM UK - 2009: DV Modeling And MethodologyIRM UK - 2009: DV Modeling And Methodology
IRM UK - 2009: DV Modeling And MethodologyEmpowered Holdings, LLC
 
Présentation data vault et bi v20120508
Présentation data vault et bi v20120508Présentation data vault et bi v20120508
Présentation data vault et bi v20120508Empowered Holdings, LLC
 
Best Practices: Data Admin & Data Management
Best Practices: Data Admin & Data ManagementBest Practices: Data Admin & Data Management
Best Practices: Data Admin & Data ManagementEmpowered Holdings, LLC
 
CDC und Data Vault für den Aufbau eines DWH in der Automobilindustrie
CDC und Data Vault für den Aufbau eines DWH in der AutomobilindustrieCDC und Data Vault für den Aufbau eines DWH in der Automobilindustrie
CDC und Data Vault für den Aufbau eines DWH in der AutomobilindustrieAndreas Buckenhofer
 
Is it sensible to use Data Vault at all? Conclusions from a project.
Is it sensible to use Data Vault at all? Conclusions from a project.Is it sensible to use Data Vault at all? Conclusions from a project.
Is it sensible to use Data Vault at all? Conclusions from a project.Capgemini
 
AnalytiX DS - Master Deck
AnalytiX DS - Master DeckAnalytiX DS - Master Deck
AnalytiX DS - Master DeckAnalytiX DS
 
(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling
(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling
(OTW13) Agile Data Warehousing: Introduction to Data Vault ModelingKent Graziano
 
Shorter time to insight more adaptable less costly bi with end to end modelst...
Shorter time to insight more adaptable less costly bi with end to end modelst...Shorter time to insight more adaptable less costly bi with end to end modelst...
Shorter time to insight more adaptable less costly bi with end to end modelst...Daniel Upton
 
Lean Data Warehouse via Data Vault
Lean Data Warehouse via Data VaultLean Data Warehouse via Data Vault
Lean Data Warehouse via Data VaultDaniel Upton
 
Business Intelligence Overview
Business Intelligence OverviewBusiness Intelligence Overview
Business Intelligence Overviewnetpeachteam
 
Guru4Pro Data Vault Best Practices
Guru4Pro Data Vault Best PracticesGuru4Pro Data Vault Best Practices
Guru4Pro Data Vault Best PracticesCGI
 
Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)Andreas Buckenhofer
 

Viewers also liked (20)

Data Vault Overview
Data Vault OverviewData Vault Overview
Data Vault Overview
 
Introduction To Data Vault - DAMA Oregon 2012
Introduction To Data Vault - DAMA Oregon 2012Introduction To Data Vault - DAMA Oregon 2012
Introduction To Data Vault - DAMA Oregon 2012
 
IRM UK - 2009: DV Modeling And Methodology
IRM UK - 2009: DV Modeling And MethodologyIRM UK - 2009: DV Modeling And Methodology
IRM UK - 2009: DV Modeling And Methodology
 
Data vault: What's Next
Data vault: What's NextData vault: What's Next
Data vault: What's Next
 
Data vault what's Next: Part 2
Data vault what's Next: Part 2Data vault what's Next: Part 2
Data vault what's Next: Part 2
 
Présentation data vault et bi v20120508
Présentation data vault et bi v20120508Présentation data vault et bi v20120508
Présentation data vault et bi v20120508
 
Best Practices: Data Admin & Data Management
Best Practices: Data Admin & Data ManagementBest Practices: Data Admin & Data Management
Best Practices: Data Admin & Data Management
 
CDC und Data Vault für den Aufbau eines DWH in der Automobilindustrie
CDC und Data Vault für den Aufbau eines DWH in der AutomobilindustrieCDC und Data Vault für den Aufbau eines DWH in der Automobilindustrie
CDC und Data Vault für den Aufbau eines DWH in der Automobilindustrie
 
Data vault
Data vaultData vault
Data vault
 
Is it sensible to use Data Vault at all? Conclusions from a project.
Is it sensible to use Data Vault at all? Conclusions from a project.Is it sensible to use Data Vault at all? Conclusions from a project.
Is it sensible to use Data Vault at all? Conclusions from a project.
 
Etl testing
Etl testingEtl testing
Etl testing
 
AnalytiX DS - Master Deck
AnalytiX DS - Master DeckAnalytiX DS - Master Deck
AnalytiX DS - Master Deck
 
(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling
(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling
(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling
 
Shorter time to insight more adaptable less costly bi with end to end modelst...
Shorter time to insight more adaptable less costly bi with end to end modelst...Shorter time to insight more adaptable less costly bi with end to end modelst...
Shorter time to insight more adaptable less costly bi with end to end modelst...
 
Lean Data Warehouse via Data Vault
Lean Data Warehouse via Data VaultLean Data Warehouse via Data Vault
Lean Data Warehouse via Data Vault
 
Business Intelligence Overview
Business Intelligence OverviewBusiness Intelligence Overview
Business Intelligence Overview
 
Guru4Pro Data Vault Best Practices
Guru4Pro Data Vault Best PracticesGuru4Pro Data Vault Best Practices
Guru4Pro Data Vault Best Practices
 
Lambdaarchitektur für BigData
Lambdaarchitektur für BigDataLambdaarchitektur für BigData
Lambdaarchitektur für BigData
 
Why Data Vault?
Why Data Vault? Why Data Vault?
Why Data Vault?
 
Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
Part 4 - Data Warehousing Lecture at BW Cooperative State University (DHBW)
 

Similar to Data Vault: What's Next for Operational Data Warehousing

Day 02 sap_bi_overview_and_terminology
Day 02 sap_bi_overview_and_terminologyDay 02 sap_bi_overview_and_terminology
Day 02 sap_bi_overview_and_terminologytovetrivel
 
The Death of the Star Schema
The Death of the Star SchemaThe Death of the Star Schema
The Death of the Star SchemaDATAVERSITY
 
Agile Data Warehousing
Agile Data WarehousingAgile Data Warehousing
Agile Data WarehousingDavide Mauri
 
¿Cómo modernizar una arquitectura de TI con la virtualización de datos?
¿Cómo modernizar una arquitectura de TI con la virtualización de datos?¿Cómo modernizar una arquitectura de TI con la virtualización de datos?
¿Cómo modernizar una arquitectura de TI con la virtualización de datos?Denodo
 
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)Denodo
 
KASHTECH AND DENODO: ROI and Economic Value of Data Virtualization
KASHTECH AND DENODO: ROI and Economic Value of Data VirtualizationKASHTECH AND DENODO: ROI and Economic Value of Data Virtualization
KASHTECH AND DENODO: ROI and Economic Value of Data VirtualizationDenodo
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureJames Serra
 
Brighttalk converged infrastructure and it operations management - final
Brighttalk   converged infrastructure and it operations management - finalBrighttalk   converged infrastructure and it operations management - final
Brighttalk converged infrastructure and it operations management - finalAndrew White
 
OpenWorld: 4 Real-world Cloud Migration Case Studies
OpenWorld: 4 Real-world Cloud Migration Case StudiesOpenWorld: 4 Real-world Cloud Migration Case Studies
OpenWorld: 4 Real-world Cloud Migration Case StudiesDatavail
 
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...Databricks
 
Creating Your Data Governance Dashboard
Creating Your Data Governance DashboardCreating Your Data Governance Dashboard
Creating Your Data Governance DashboardTrillium Software
 
Webinar: BI Team Backlogged with Information Demands?
Webinar: BI Team Backlogged with Information Demands?Webinar: BI Team Backlogged with Information Demands?
Webinar: BI Team Backlogged with Information Demands?Balanced Insight, Inc.
 
Data Virtualization for Data Architects (New Zealand)
Data Virtualization for Data Architects (New Zealand)Data Virtualization for Data Architects (New Zealand)
Data Virtualization for Data Architects (New Zealand)Denodo
 
451 Research + NuoDB: What It Means to be a Container-Native SQL Database
451 Research + NuoDB: What It Means to be a Container-Native SQL Database451 Research + NuoDB: What It Means to be a Container-Native SQL Database
451 Research + NuoDB: What It Means to be a Container-Native SQL DatabaseNuoDB
 
Original: Lean Data Model Storming for the Agile Enterprise
Original: Lean Data Model Storming for the Agile EnterpriseOriginal: Lean Data Model Storming for the Agile Enterprise
Original: Lean Data Model Storming for the Agile EnterpriseDaniel Upton
 
DataOps: Nine steps to transform your data science impact Strata London May 18
DataOps: Nine steps to transform your data science impact  Strata London May 18DataOps: Nine steps to transform your data science impact  Strata London May 18
DataOps: Nine steps to transform your data science impact Strata London May 18Harvinder Atwal
 
Data Vault 2.0 Demystified: East Coast Tour
Data Vault 2.0 Demystified: East Coast TourData Vault 2.0 Demystified: East Coast Tour
Data Vault 2.0 Demystified: East Coast TourWhereScape
 
Informatica agile virtualization apr17 2012
Informatica agile virtualization apr17 2012Informatica agile virtualization apr17 2012
Informatica agile virtualization apr17 2012sahatwilliams
 
Tdwi march 2015 presentation
Tdwi march 2015 presentationTdwi march 2015 presentation
Tdwi march 2015 presentationAlison Macfie
 
Big Data's Impact on the Enterprise
Big Data's Impact on the EnterpriseBig Data's Impact on the Enterprise
Big Data's Impact on the EnterpriseCaserta
 

Data Vault: What's Next for Operational Data Warehousing

  • 1. Data Vault: What’s Next? © Dan Linstedt, 2011-2012 all rights reserved 1
  • 2. Agenda Introduction – why are you here? Short Data Vault Review What’s Next? Advanced Architecture… Defining Operational Data Warehousing Why is Data Vault a Good Fit? <BREAK> Fundamental Paradigm Shift Business Keys & Business Processes Technical Review Query Performance (PIT & Bridge) What wasn’t covered in this presentation… 2
  • 3. A bit about me… 3 Author, Inventor, Speaker – and part time photographer… 25+ years in the IT industry Worked in DoD, US Gov’t, Fortune 50, and so on… Find out more about the Data Vault: http://YouTube.com/LearnDataVault http://LearnDataVault.com Slides available: http://SlideShare.net Search: “Advanced Architecture Data Vault” Full profile on http://www.LinkedIn.com/dlinstedt
  • 4. Why Are You Here? 4 Your Expectations? Your Questions? Your Background? Areas of Interest? Biggest question: What are the top 3 pains your current EDW / BI solution is experiencing?
  • 5. Short Data Vault Review What is it and where did it come from? 5
  • 6. Data Warehousing Timeline: Mid 60’s – Dimension & Fact Modeling presented by General Mills and Dartmouth University; Early 70’s – Bill Inmon Began Discussing Data Warehousing; Mid 70’s – AC Nielsen Popularized Dimension & Fact Terms; 1976 – Dr Peter Chen Created E-R Diagramming; Mid 80’s – Bill Inmon Popularizes Data Warehousing; Mid to Late 80’s – Dr Kimball Popularizes Star Schema; Late 80’s – Barry Devlin and Dr Kimball Release “Business Data Warehouse”; 1990 – Dan Linstedt Begins R&D on Data Vault Modeling; 2000 – Dan Linstedt releases first 5 articles on Data Vault Modeling; 2010 onward – DV Alive and Well Around the World. Undated on the slide: E.F. Codd invented relational modeling; Chris Date and Hugh Darwen Maintained and Refined Modeling
  • 7. Data Vault Modeling… Took 10 years of Research and Design, including TESTING to become flexible, consistent, and scalable 7
  • 13. Complete with Best Practices for BI/DW. Business Keys Span / Cross Lines of Business. Functional Areas: Sales, Contracts, Planning, Delivery, Finance, Operations, Procurement
  • 14. Supply Chain Analogy 9 Source Systems Data Vault (EDW) Data Marts
  • 16. Link
  • 17. Satellite 10 Hub = List of Unique Business Keys; Link = List of Relationships, Associations; Satellites = Descriptive Data
  • 18. Colorized Perspective… Data Vault 3rd NF & Star Schema (separation) Business Keys Associations Details HUB Satellite The Data Vault uniquely separates the Business Keys (Hubs) from the Associations (Links) and both of these from the Details that describe them and provide context (Satellites). LINK Satellite (Colors Concept Originated By: Hans Hultgren) 11
  • 19. A Quick Look at Methodology Issues Business Rule Processing, Lack of Agility, and Future proofing your new solution 12
  • 24. High risk of incorrect data aggregation
  • 25. Larger system = increased impact
  • 28. Re-Engineering Business Rules Data Flow (Mapping) Current Sources Sales Customer Source Join Finance Customer Transactions Customer Purchases IMPACT!! ** NEW SYSTEM** 15
  • 29. Federated Star Schema Inhibiting Agility Data Mart 3 High Effort & Cost Data Mart 2 Data Mart 1 Changing and Adjusting conformed dimensions causes an exponential rise in the cost curve over time RESULT: Business builds their own Data Marts! Low Maintenance Cycle Begins Time Start 16 The main driver for this is the maintenance costs, and re-engineering of the existing system which occurs for each new “federated/conformed” effort. This increases delivery time, difficulty, and maintenance costs.
  • 35. Auditable. The business rules are moved closer to the business, improving IT reaction time, reducing cost and minimizing impacts to the enterprise data warehouse (EDW) 17
  • 36. NO Re-Engineering Current Sources Data Vault Sales Stage Copy Hub Customer Customer Finance Stage Copy Link Transaction Customer Transactions Hub Acct Hub Product Customer Purchases Stage Copy NO IMPACT!!! NO RE-ENGINEERING! ** NEW SYSTEM** IMPACT!! 18
  • 37. Progressive Agility and Responsiveness of IT High Effort & Cost Low Maintenance Cycle Begins Time Start 19 Foundational Base Built New Functional Areas Added Initial DV Build Out Re-Engineering does NOT occur with a Data Vault Model. This keeps costs down, and maintenance easy. It also reduces complexity of the existing architecture.
  • 38. Why is Data Vault a Good Fit? 20
  • 39. What are the top business obstacles in your data warehouse today? 21
  • 40. Poor Agility Inconsistent Answer Sets Needs Accountability Demands Auditability Desires IT Transparency Are you feeling Pinned Down? 22
  • 41. What are the top technology obstacles in your data warehouse today? 23
  • 42. Complex Systems Real-Time Data Arrival Unimaginable Data Growth Master Data Alignment Bad Data Quality Late Delivery/Over Budget Are your systems CRUMBLING? 24
  • 43. Existing Solutions Have led you down a painful path… (Yugo: World’s Worst Car) 25
  • 44. Projects Cancelled & Restarted Re-engineering required to absorb new systems Complexity drives maintenance cost Sky high Disparate Silo Solutions provide inaccurate answers! Severe lack of Accountability 26
  • 45. How can you overcome these obstacles? There must be a better way… There IS a better way! 27
  • 46. It’s Called the Data Vault Model and Methodology 28
  • 47. What is it? It’s a simple Easy-to-use Plan To build your valuable Data Warehouse! 29
  • 48. What’s the Value? Painless Auditability Understandable Standards Rapid Adaptability Simple Build-out Uncomplicated Design Effortless Scalability Pursue Your Goals! 30
  • 49. Why Bother With Something New? Old Chinese proverb: 'Unless you change direction, you're apt to end up where you're headed.' 31
  • 50. What Are the Issues? This is NOT what you want happening to your project! Business… Changes Frequently, Needs Accountability, Demands Auditability, Has No Visibility, Wants More Control. IT… Takes Too Long, Is Over-budget, Too Complex, Can’t Sustain Growth. THE GAP!! 32
  • 51. What Are the Foundational Keys? Flexibility Scalability Productivity 33
  • 52. Key: Flexibility Enabling rapid change on a massive scale without downstream impacts! 34
  • 53. Key: Scalability Providing no foreseeable barrier to increased size and scope People, Process, & Architecture! 35
  • 54. Key: Productivity Enabling low complexity systems with high value output at a rapid pace 36
  • 55. How does it work? Bringing the Data Vault to Your Project 37
  • 58. Existing Reporting & BI Functions
  • 60. Existing Star Schemas and Data Marts 38
  • 61. Case In Point: The flexibility of the Data Vault Model allowed them to merge 3 companies in 90 days – that is ALL systems, ALL DATA! 39
  • 65. Case In Point: Result of scalability was to produce a Data Vault model that scaled to 3 Petabytes in size, and is still growing today! 41
  • 66. Key: Scalability in Team Size You should be able to SCALE your TEAM as well! With the Data Vault methodology, you can: Scale your team when desired, at different points in the project! 42
  • 67. Case In Point: (Dutch Tax Authority) Result of scalability was to increase ETL developers for each new source system, and reassign them when the system was completely loaded to the Data Vault 43
  • 71. Enhancing and Adapting for Change to the Model
  • 72. Ease of Monitoring, managing and optimizing processes 44
  • 74. 100% of the Staging Data Model
  • 75. 75% of the finished EDW data Model
  • 76. 75% of the star schema data model 45
  • 77. The Competing Bid? The competition bid this with 15 people and 3 months to completion, at a cost of $250k! (they bid a Very complex system) Our total cost? $30k and 2 weeks! 46
  • 78. Results? Changing the direction of the river takes less effort than stopping the flow of water 47
  • 79. < BREAK TIME > 48
  • 80. What’s Next? A look at what’s around the corner for Data Warehousing and Business Intelligence. Believe me, it’s going to get interesting fast. 49
  • 81. Operational Data Vault 50 Data Co-Location: Transactions & Transaction History Master Data & Master Data History Metadata & Metadata History External Data & External Data History Business Rules & Business Rule History Security / Access data & History Unstructured Data Ties & History Real-time Data Feeds DIRECTLY into the data store Operational Applications ON TOP of the warehouse!
  • 88. Data Vault (Data Warehouse Loads)
  • 89. Star Schema Loads (80% solutions)
  • 90. Cube Loads (80% solutions)
  • 91. Excel Loads / Queries (80% solutions)
  • 93. Initial Master Data Population
  • 95. Results of all of this? 52 EDW Will: become BACK OFFICE!! become SELF-RELIANT / SELF-HEALING adapt to new structures, new hardware, and new data automatically backup and remove old data Self-Reliance http://images.businessweek.com/ss/06/10/bestunder25/source/1.htm
  • 96. How Long Will it Take? 53 My milestone predictions: 1 yr: Operational Data Vault; 2 yrs: Beginning automation of business rules; 3 yrs: Beginning dynamic restructuring in the DV; 4 yrs: Oper Apps contain BI & metadata & Master data GUIs in a single place; 5 yrs: the “all-in-one” appliance, containing 75% of what we need at the firmware levels to do all these things http://thypolarlife.wordpress.com/2011/08/02/this-moment-in-time/
  • 98. Because this technology is the heart of Data Warehousing!
  • 101. Who’s Tooling Today? 56 WhereScape Quipu AnalytixDS RapidACE Nexus BI-Ready Centennium
  • 102. What Does It Add Up To? 57 Pervasive BI
  • 103. What’s the Key Ingredient? 58 Ubiquitous A.I.
  • 104. Defining Operational Data Warehousing What is an ODW and How did we get here? 59
  • 105. What IS An Operational DW? A raw, time-variant, integrated, non-volatile data warehouse, on top of which sits an operational application – “editing and changing data”. However, instead of updates and deletes in place, the data is “marked” deleted, and updates are turned into Inserts, creating a delta audit trail along the way. Yes, it’s an operational application on top of the integrated data warehouse (or in this case, Data Vault model). 60
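The insert-only behaviour described on this slide can be sketched roughly as follows. This is a minimal illustration in Python, not the presenter's implementation: the class `InsertOnlySatellite`, the `is_deleted` flag, and the integer `load_seq` (standing in for a load date) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SatRow:
    parent_sqn: int              # sequence of the owning Hub row
    load_seq: int                # hypothetical stand-in for the load date
    attributes: Dict[str, str]   # descriptive data
    is_deleted: bool = False     # hypothetical "marked deleted" flag


class InsertOnlySatellite:
    """Toy satellite: rows are only ever appended, never changed in place."""

    def __init__(self) -> None:
        self.rows: List[SatRow] = []
        self._seq = 0

    def _append(self, parent_sqn: int, attributes: Dict[str, str],
                is_deleted: bool = False) -> None:
        self._seq += 1
        self.rows.append(SatRow(parent_sqn, self._seq, dict(attributes), is_deleted))

    def current(self, parent_sqn: int) -> Optional[SatRow]:
        """Latest row for a parent, or None if nothing was ever loaded."""
        for row in reversed(self.rows):
            if row.parent_sqn == parent_sqn:
                return row
        return None

    def upsert(self, parent_sqn: int, attributes: Dict[str, str]) -> bool:
        """An 'update' becomes an insert, and only when the data changed (a delta)."""
        latest = self.current(parent_sqn)
        if latest and not latest.is_deleted and latest.attributes == attributes:
            return False
        self._append(parent_sqn, attributes)
        return True

    def delete(self, parent_sqn: int) -> None:
        """A 'delete' inserts a marker row; earlier history stays intact."""
        latest = self.current(parent_sqn)
        if latest and not latest.is_deleted:
            self._append(parent_sqn, latest.attributes, is_deleted=True)
```

Because nothing is overwritten, the list of rows doubles as the delta audit trail the slide mentions.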
  • 106. Oper/Active DW Timeline 61 Real-Time & Oper BI Make the Scene (Users Want Direct Control & Up to the Minute Data); Teradata makes Real advances in Active DW; “Appliances” begin appearing On-scene; Data Warehouses Split From Operational Systems; 2002 – Cendant-TRG Creates World’s First Operational Data Vault; Mid 90’s – “Active” DW Becomes Important But has to wait for Technology To Catch Up!
  • 107. How Did We Get Here? 62 DDW ODW How do you dynamically adapt to business? Can you change what is happening? 7 6 Dynamic Alterations To Structure System Of Record Application Direct Edits to Data in the EDW Parts are © Teradata – Stephen Brobst, CTO
  • 108. ODV Overview 63 Applications (Direct edits) ODV Direct Inserts NO STAGING AREA Web-Services (Direct Feeds) Virtual Marts (Direct Access) Unstructured Feeds (Indirect Feeds) Metadata Rules (Direct Edits) Batch Loads (Direct Feeds)
  • 119. Why should I care? 66 TWO REASONS: CONVERGENCE and SELF-SERVICE BI
  • 120. Under the Covers… 67 Application: Presents Data to User in Conformed Screens. Data Access Control Layer. Steps: 1. Read Data for Edit; 2. Lock Business Key Rows; 3. Present in GUI; 4. Accept Ins, Upd, Del; 5. Perform Insert / Status change; 6. Release Lock On Business Key Rows. Operational Data Vault (ODW) Layer: Hub Parts, Hub Seller, Hub Product, Link, Link, Sat 1, Sat 2, Sat 3, Sat 4, Satellite, Satellite
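The read, lock, edit, insert, release cycle on this slide might look something like the sketch below. It is a simplification under stated assumptions: the class `DataAccessControlLayer` and its in-memory `history` dictionary are hypothetical, the "GUI" is reduced to a callback, and the lock is taken before the read rather than after it as the slide's numbering suggests.

```python
import threading
from collections import defaultdict
from typing import Callable, Dict, List


class DataAccessControlLayer:
    """Toy access layer: one lock per business key, every edit stored as an insert."""

    def __init__(self) -> None:
        self._guard = threading.Lock()  # protects the two dictionaries below
        self._locks: Dict[str, threading.Lock] = {}
        self.history: Dict[str, List[dict]] = defaultdict(list)

    def _lock_for(self, business_key: str) -> threading.Lock:
        with self._guard:
            self.history[business_key]  # make sure the row list exists
            return self._locks.setdefault(business_key, threading.Lock())

    def edit(self, business_key: str, change: Callable[[dict], dict]) -> dict:
        with self._lock_for(business_key):             # lock the business key rows
            rows = self.history[business_key]
            current = dict(rows[-1]) if rows else {}   # read data for edit
            edited = change(current)                   # user edits via the callback
            rows.append(edited)                        # insert, never update in place
            return edited                              # lock is released on exit
```

A call such as `dal.edit("ABC123", lambda row: {**row, "phone": "999-555-1212"})` appends a new version of the record while any other editor of the same business key waits for the lock.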
  • 121. Dropping by the Way-Side No… ETL BATCH DRIVEN PROCESSING “Synchronization” with the Source System missing source data No scalability problems No ODS needed! No “Master Data” system needed No Staging area needed 68
  • 122. Positives: Data in the ODW can be governed. Audit trail built in. Only deltas are stored. NEW applications can be created to “automatically” generate Cubes/Star Schemas – these apps can be run by the users… Self-Service BI is enabled! Master data can be “marked, scored, stored” in the same place as the EDW 69
  • 123. Old Components Still There? Staging areas will exist as long as there is external data to load and integrate ODS areas may still exist as long as there are other legacy applications existing as source systems Master Data areas may still exist as long as the logic is not built directly in to the “operational DW application” 70
  • 124. Secure ODV Technical Layers 71 Visible Objects Inbound API Outbound API Services Authentication API Master Data API Component Groups Packaging API Pedigree API Security Key Mgr API Transaction API Aggregation API File Management Interface Kit API Busn. Intelligence API Notification Interface Vault Accessibility Subject Area API Scheduling Interface Local DB Interface Global DB Interface Common Data Object Area Security Interface (Encryption Too) Format Interface Persistence Cache DB Interface Logging Interface Database Interface Web Server Locally Based Persistent DB Cache for Joining Global DB Local DB1 Local DB2
  • 125. What are the benefits? Simplified Architecture Single Copy of the data! No “intermediate” IT work to do Users become empowered, with direct access to data sets Of course, using the Data Vault model, you gain ALL the benefits of the Data Vault (Scalability, flexibility, etc…) NOTE: Two or more “users” can actually EDIT different parts of the same record at the same time! Integrating external data basically makes it all available to the application immediately! NO NEED TO BUILD A SEPARATE EDW!! 72
  • 126. What are the drawbacks? No current “application” is using the Data Vault for operational data In other words, off-the-shelf apps in this area do not yet exist – you have to “build it” yourself Self-Service BI application technology is nascent or non-existent today Master Data & Metadata Applications are not currently available on top of Data Vault 73
  • 127. Technical Review Hub, Link, Satellite - Definitions 74
  • 128. HUB Data Examples 75
    HUB_PART_NUMBER
      SQN  PART_NUM   LOAD_DTS    RECORD_SRC
      1    MFG-25862  10-14-2000  MANUFACT
      2    MFG*25266  10-14-2000  MANUFACT
      3    *P25862    10-14-2000  PLANNING
      4    MFG_25862  10-15-2000  DELIVERY
      5    CN*25266   10-16-2000  DELIVERY
    HUB_CUST_ACCT
      SQN  CUST_ACCT  LOAD_DTS    RECORD_SRC
      1    ABC123     10-14-2000  SALES
      2    ABC-123    10-14-2000  SALES
      3    *ABC-123   10-14-2000  FINANCE
      4    123,ABCD   10-15-2000  CONTRACTS
      5    PEF-2956   10-16-2000  CONTRACTS
    Hub Structure: SEQUENCE, <BUSINESS KEY>, {LAST SEEN DATE}, <LOAD DATE>, <RECORD SOURCE>. Unique Index on <BUSINESS KEY>; {LAST SEEN DATE} is Optional
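As a rough illustration of the Hub structure above, the sketch below loads business keys into a Hub and lets a dictionary play the role of the unique index. The names `Hub`, `HubRow`, and `load` are hypothetical, and the keys are stored exactly as they arrive (no cleansing), which is why `ABC123` and `ABC-123` end up as separate rows, as on the slide.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class HubRow:
    sqn: int            # SEQUENCE
    business_key: str   # <BUSINESS KEY>
    load_dts: str       # <LOAD DATE>, kept as the slide's MM-DD-YYYY text
    record_src: str     # <RECORD SOURCE>


class Hub:
    """Toy Hub: a list of unique business keys."""

    def __init__(self) -> None:
        self.rows: List[HubRow] = []
        self._by_key: Dict[str, HubRow] = {}  # stands in for the unique index

    def load(self, business_key: str, load_dts: str, record_src: str) -> HubRow:
        """Insert the key if it is new; otherwise return the row first seen."""
        existing = self._by_key.get(business_key)
        if existing is not None:
            return existing
        row = HubRow(len(self.rows) + 1, business_key, load_dts, record_src)
        self.rows.append(row)
        self._by_key[business_key] = row
        return row
```

Loading the same key a second time from another source leaves the Hub unchanged, so the first load date and record source are preserved.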
  • 129. Link Structures 76
    Link_Product_Supplier: LPS_SQN, PRODUCT_SQN, SUPPLIER_SQN, LPS_LOAD_DTS, LPS_REC_SOURCE, LPS_ENCR_KEY
    Link_Customer_Account_Employee: LCAE_SQN, CUSTOMER_SQN, ACCOUNT_SQN, EMPLOYEE_SQN, LCAE_LOAD_DTS, LCAE_REC_SOURCE
    Link Structure: SEQUENCE, <HUB KEY SQN 1>, <HUB KEY SQN 2>, <HUB KEY SQN N>, {LAST SEEN DATE}, {CONFIDENCE}, {STRENGTH}, <LOAD DATE>, <RECORD SOURCE>. Unique Index on the Hub key sequences; items in {braces} are Optional (Dynamic Link)
  • 130. Satellites Split By Source System SAT_FINANCE_CUST SAT_CONTRACTS_CUST SAT_SALES_CUST PARENT SEQUENCE LOAD DATE <LOAD-END-DATE> <RECORD-SOURCE> Contact Name Contact Email Contact Phone Number PARENT SEQUENCE LOAD DATE <LOAD-END-DATE> <RECORD-SOURCE> First Name Last Name Guardian Full Name Co-Signer Full Name Phone Number Address City State/Province Zip Code PARENT SEQUENCE LOAD DATE <LOAD-END-DATE> <RECORD-SOURCE> Name Phone Number Best time of day to reach Do Not Call Flag Satellite Structure PARENT SEQUENCE LOAD DATE <LOAD-END-DATE> <RECORD-SOURCE> {user defined descriptive data} {or temporal based timelines} Primary Key 77
  • 131. Why do we build Links this way? 78
  • 132. History Teaches Us… If we model for ONE relationship in the EDW, we BREAK the others! 79 Portfolio The EDW is designed to handle TODAY’S relationship, as soon as history is loaded, it breaks the model! 1 Today: M Customer Hub Portfolio X 1 Portfolio 5 years From now M M M Customer Hub Customer X Portfolio M 10 Years ago 1 This situation forces re-engineering of the model, load routines, and queries! Customer
  • 133. History Teaches Us… If we model with a LINK table, we can handle ALL the requirements! 80 Portfolio 1 Today: Hub Portfolio M Customer 1 M Portfolio LNK Cust-Port 5 years from now M M M Customer 1 Hub Customer Portfolio M 10 Years ago This design is flexible, handles past, present, and future relationship changes with NO RE-ENGINEERING! 1 Customer
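To make the point of these two slides concrete, here is a small sketch of a Link that simply records (customer, portfolio) pairs. Since it never assumes a cardinality, the same structure holds a one-to-many relationship today and a many-to-many relationship later. The class and method names are hypothetical and chosen only for the example.

```python
from typing import List, Set, Tuple


class CustomerPortfolioLink:
    """Toy Link: unique (customer_sqn, portfolio_sqn) pairs, cardinality-agnostic."""

    def __init__(self) -> None:
        # Each row is (link_sqn, customer_sqn, portfolio_sqn).
        self.rows: List[Tuple[int, int, int]] = []
        self._seen: Set[Tuple[int, int]] = set()  # stands in for the unique index

    def load(self, customer_sqn: int, portfolio_sqn: int) -> None:
        """Record the relationship once; repeated loads are ignored."""
        pair = (customer_sqn, portfolio_sqn)
        if pair not in self._seen:
            self._seen.add(pair)
            self.rows.append((len(self.rows) + 1, customer_sqn, portfolio_sqn))

    def portfolios_of(self, customer_sqn: int) -> List[int]:
        return sorted(p for _, c, p in self.rows if c == customer_sqn)

    def customers_of(self, portfolio_sqn: int) -> List[int]:
        return sorted(c for _, c, p in self.rows if p == portfolio_sqn)
```

Adding a second portfolio for an existing customer is just one more row; no table, load routine, or query has to be re-engineered.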
  • 134. Applying the Data Vault to Global DW2.0 Manufacturing EDW in China Planning in Brazil Hub Hub Link Sat Sat Link Sat Sat Link Hub Link Hub Hub Sat Sat Sat Sat Sat Sat Sat Sat Base EDW Created in Corporate Financials in USA 81
  • 135. 82 Extreme Data Vault Partitioning
  • 136. Query Performance Point-in-time and Bridge Tables, overcoming query issues 83
  • 137. Purpose Of PIT & Bridge: To reduce the number of joins, and to reduce the amount of data being queried for a given range of time. Together, these two allow “direct table match” as well as table elimination in queries. These tables are not necessary for the entire model; only when: massive amounts of data are found; large numbers of Satellites surround a Hub or Link; a large query across multiple Hubs & Links is necessary; real-time data is flowing in, uninterrupted. What are they? Snapshot tables, specifically built for query speed 84
  • 138. PIT Table Architecture Satellite: Point In Time Primary Key PARENT SEQUENCE LOAD DATE {Satellite 1 Load Date} {Satellite 2 Load Date} {Satellite 3 Load Date} {…} {Satellite N Load Date} PIT Sat Sat 1 Sat 2 Hub Order PIT Sat Sat 3 Sat 1 Sat 4 Sat 2 Sat 1 Hub Customer Hub Product Sat 2 Sat 3 Link Line Item Sat 4 Satellite Line Item 85
  • 139. PIT Table Example 86
    SAT_CUST_CONTACT_CELL
      SQN  LOAD_DTS    CELL
      1    10-14-2000  999-555-1212
      1    10-15-2000  999-111-1234
      1    10-16-2000  999-252-2834
      1    10-17-2000  999.257-2837
      1    10-18-2000  999-273-5555
    SAT_CUST_CONTACT_ADDR
      SQN  LOAD_DTS    ADDR
      1    08-01-2000  26 Prospect
      1    09-29-2000  26 Prosp St.
      1    12-17-2000  28 November
      1    01-01-2001  26 Prospect St
    SAT_CUST_CONTACT_NAME
      SQN  LOAD_DTS    NAME
      1    10-14-2000  Dan L
      1    11-01-2000  Dan Linedt
      1    12-31-2000  Dan Linstedt
    PIT table (LOAD_DTS is the Snapshot Date)
      SQN  LOAD_DTS    SAT_NAME_LDTS  SAT_CELL_LDTS  SAT_ADDR_LDTS
      1    08-01-2000  NULL           NULL           08-01-2000
      1    09-01-2000  NULL           NULL           08-01-2000
      1    10-01-2000  NULL           NULL           09-29-2000
      1    11-01-2000  11-01-2000     10-18-2000     09-29-2000
      1    12-01-2000  11-01-2000     10-18-2000     09-29-2000
      1    01-01-2001  12-31-2000     10-18-2000     01-01-2001
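The PIT rows in this example can be derived mechanically: for each snapshot date, take the most recent load date in each Satellite that is not later than the snapshot. The sketch below shows that rule in Python. The function names `parse`, `latest_load`, and `build_pit` are hypothetical, and `None` stands in for the slide's NULL.

```python
from datetime import date
from typing import Dict, List, Optional


def parse(mdy: str) -> date:
    """Parse the MM-DD-YYYY dates used on the slide."""
    month, day, year = (int(part) for part in mdy.split("-"))
    return date(year, month, day)


def latest_load(load_dates: List[date], snapshot: date) -> Optional[date]:
    """Most recent satellite load date at or before the snapshot, else None."""
    eligible = [d for d in load_dates if d <= snapshot]
    return max(eligible) if eligible else None


def build_pit(satellites: Dict[str, List[date]], snapshots: List[date]) -> List[dict]:
    """One PIT row per snapshot date, one load-date column per satellite."""
    return [
        {
            "snapshot": snap,
            **{name: latest_load(dates, snap) for name, dates in satellites.items()},
        }
        for snap in snapshots
    ]
```

A query for a given snapshot can then join each Satellite on an exact (parent sequence, load date) match, which is the "direct table match" the PIT table is built for.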
  • 140. Bridge Table Architecture Satellite: Bridge Primary Key UNIQUE SEQUENCE LOAD DATE {Hub 1 Sequence #} {Hub 2 Sequence #} {Hub 3 Sequence #} {Link 1 Sequence #} {Link 2 Sequence #} {…} {Link N Sequence #} {Hub 1 Business Key} {Hub 2 Business Key} {…} {Hub N Business Key} Bridge Sat 1 Sat 2 Hub Parts Hub Seller Hub Product Link Link Sat 3 Sat 4 Satellite Satellite 87
  • 141. Bridge Table Data Example 88
    Bridge Table: Seller by Product by Part (LOAD_DTS is the Snapshot Date)
      Columns: SQN LOAD_DTS SELL_SQN SELL_ID PROD_SQN PROD_NUM PART_SQN PART_NUM
      1 08-01-2000 15 NY*1 2756 ABC-123-9K 525 JK*2*4
      2 09-01-2000 16 CO*242654 DEF-847-0L 324 MN*5-2
      3 10-01-2000 16 CO*2482374 PPA-252-2A 9938 DD*2*3
      4 11-01-2000 24 AZ*2525222 UIF-525-88 7 UF*9*0
      5 12-01-2000 99 NM*581 DAN-347-7F 16 KI*9-2
      6 01-01-2001 99 NM*581 DAN-347-7F 24 DL*0-5
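A Bridge row like the first one above can be produced by walking the two Links and copying the Hub sequences and business keys into one flat row. The sketch below assumes a simple in-memory representation (dictionaries for Hubs, pairs for Links); `build_bridge` and its parameter names are hypothetical, not part of the presentation.

```python
from typing import Dict, List, Tuple


def build_bridge(
    sellers: Dict[int, str],
    products: Dict[int, str],
    parts: Dict[int, str],
    link_seller_product: List[Tuple[int, int]],
    link_product_part: List[Tuple[int, int]],
    load_dts: str,
) -> List[dict]:
    """Pre-join Hub sequences and business keys across two Links."""
    # Index the second Link by product so each seller/product pair finds its parts.
    parts_by_product: Dict[int, List[int]] = {}
    for prod_sqn, part_sqn in link_product_part:
        parts_by_product.setdefault(prod_sqn, []).append(part_sqn)

    rows: List[dict] = []
    for sell_sqn, prod_sqn in link_seller_product:
        for part_sqn in parts_by_product.get(prod_sqn, []):
            rows.append({
                "SQN": len(rows) + 1,
                "LOAD_DTS": load_dts,
                "SELL_SQN": sell_sqn, "SELL_ID": sellers[sell_sqn],
                "PROD_SQN": prod_sqn, "PROD_NUM": products[prod_sqn],
                "PART_SQN": part_sqn, "PART_NUM": parts[part_sqn],
            })
    return rows
```

Queries that would otherwise join three Hubs and two Links can read this one table instead, at the cost of rebuilding it for each snapshot.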
  • 142. What WASN’T Covered ETL Automation ETL Implementation SQL Query Logic Balanced MPP design Data Vault Modeling on Appliances Deep Dive on Structures (Hubs, Links, Satellites) What happens when you break the rules? Project management, Risk management & mitigation, methodology & approach Automation: Automated DV modeling, Automated ETL production Change Management Temporal Data Modeling Concerns… And so on… 89
  • 145. The Experts Say… “The Data Vault is the optimal choice for modeling the EDW in the DW 2.0 framework.” Bill Inmon “The Data Vault is foundationally strong and exceptionally scalable architecture.” Stephen Brobst “The Data Vault is a technique which some industry experts have predicted may spark a revolution as the next big thing in data modeling for enterprise warehousing....” Doug Laney
  • 146. More Notables… “This enables organizations to take control of their data warehousing destiny, supporting better and more relevant data warehouses in less time than before.” Howard Dresner “[The Data Vault] captures a practical body of knowledge for data warehouse development which both agile and traditional practitioners will benefit from..” Scott Ambler
  • 147. Where To Learn More The Technical Modeling Book: http://LearnDataVault.com The Discussion Forums & events: http://LinkedIn.com – Data Vault Discussions Contact me: http://DanLinstedt.com - web site, DanLinstedt@gmail.com - email World wide User Group (Free): http://dvusergroup.com Certification Training: Contact me, or learn more at: http://GeneseeAcademy.com 94
  • 148. ODV – Case Study Operational Data Vault – IN THE REAL WORLD! 95
  • 149. E-Pedigree, Drug Track & Trace 96 Product Returns And Recalls Product Packaging CorpSite Server Secure Integration Services Corporate Serialization Vault Serialization Analytics Engine Packaging Orders Product Authenticator 3rd Party Logistics Distribution Warehouse Secure Integration Services E-Pedigree Management Manufacturer Product Packager Supply Chain
  • 156. Physical Data Separation in Logical “Database” units
  • 157. No single login has 100% data access.
  • 159. Changes to the data model ripple (larger impacts) as more customers are signed up.
  • 160. Each “support call” requires separate login to see the data set. Data Exchange/Sharing Through Code Only Web-Services and Flat File Delivery Customer Login Corp Login Customer Login Corp Login Employee Validation Admin Login Encrypt Key Encrypt Key Encrypt Key Mart 1 Mart 2 Mart 3 Mart 1 Mart 2 Mart 3 Tracking # Machine Info SQL View Layer SQL View Layer Global Data Vault Data Vault Manufacturer Shipper 9/27/2011
  • 165. Customer Owned Key (Dictated by Customer)
  • 166. Corporate Owned Key (Encrypts data internally) Corp Managed / Owned Copy Web Services Customer Copy Customer Login Corp Login +HTTPS Corp Encrypt Key Web Services Encrypted Flat Files Decryption Key + SFTP Customer Local Copy
  • 167. Security: ODV Web Services 102 Corp Managed / Owned Copy Web Browser Web Site / Server Java Script Or PHP Web Services Customer Login Corp Login Corporate Encrypt Key Corporate Owned Encryption Key Global DB
  • 168. Inflow/Outflow Applications 103 Customer Corporation Corporation Customer Source Machine Encrypts Data Using Customer Key Corp Decrypts Data According to Customer Key Corp Re-Encrypts Data According to Internal Key For Specific Customer Corp Decrypts Data According to Internal Key For Specific Customer Corp Encrypts Data According to Customer Key Customer Decrypts Data According to Customer Key DB DB Transmit Encrypted Data over HTTPS Transmit Encrypted Data over HTTPS Web Service Sender Web Service Collector
  • 169. ODV: Secure File Request 104 Corporation Customer ** Note: Each Customer DB is encrypted via an internally owned Corp key which is unique to EACH customer. Customer Decrypts File According to Customer Key Transmit Encrypted Data over FTPS Encrypted File
  • 170. ODV: Front-End Ping Request 105 Corporation Customer Corp One-Way Hash of key Number To Execute Ping Web-Based PING Validation DBMS Unencrypted Data Transfer Login / Auth
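
The inflow/outflow key handling in the slides above (data encrypted with the customer key in transit, then re-encrypted with a corporate key that is unique to each customer at rest) can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the ODV implementation: the function names, the HMAC-based derivation of per-customer internal keys from a master key, and the toy SHA-256 keystream cipher are all hypothetical stand-ins. A real system would use a vetted authenticated cipher and a proper key management service.

```python
import hashlib
import hmac
import os

NONCE_LEN = 16


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream (SHA-256 over key + nonce + counter). Illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Placeholder cipher: prepend a random nonce, XOR the data with the keystream."""
    nonce = os.urandom(NONCE_LEN)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def decrypt(key: bytes, blob: bytes) -> bytes:
    """Inverse of encrypt(): split off the nonce, XOR with the same keystream."""
    nonce, body = blob[:NONCE_LEN], blob[NONCE_LEN:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


def internal_key_for(master_key: bytes, customer_id: str) -> bytes:
    """Assumption: derive one corporate-owned key per customer from a master key."""
    return hmac.new(master_key, customer_id.encode("utf-8"), hashlib.sha256).digest()


def inflow(customer_key: bytes, internal_key: bytes, wire_blob: bytes) -> bytes:
    """Corp side, inbound: decrypt with the customer key, re-encrypt with the internal key."""
    plaintext = decrypt(customer_key, wire_blob)
    return encrypt(internal_key, plaintext)


def outflow(customer_key: bytes, internal_key: bytes, stored_blob: bytes) -> bytes:
    """Corp side, outbound: decrypt with the internal key, re-encrypt with the customer key."""
    plaintext = decrypt(internal_key, stored_blob)
    return encrypt(customer_key, plaintext)
```

In this sketch the source machine would call `encrypt(customer_key, record)` before sending over HTTPS, the corporation would store the result of `inflow(...)`, and the customer would call `decrypt(customer_key, ...)` on whatever `outflow(...)` returns. Keeping the two keys separate means the customer can dictate its own key without the corporation changing how data is encrypted at rest.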

Editor's Notes

  1. Before we begin exploring how the Data Vault can help you, or even defining what a Data Vault is, we need to first understand some of the business problems that may be causing you heartburn on a daily basis.
  2. Everything from poor agility to a lack of IT transparency plagues today’s data warehouses. I can’t begin to tell you how much pain these businesses are suffering as a result of these problems. Inconsistent answer sets, lack of accountability, and inadequate auditability all play a part in data warehouses that are currently on the brink of falling apart. But it’s not just business issues; there are technical ones to cope with as well.
  3. There are always technology obstacles that we face in any data warehousing project. So the question is: what kinds of problems have you seen in your journey? Do they haunt you today?
  4. Complexity drives high cost, resulting in unnecessarily late delivery schedules and unsustainable business logic in the integration channels. Real-time data is flooding our data warehouses; has your architecture fallen down on the job? Unstructured data and legal requirements for auditability are bringing huge data volumes. Master data alignment is missing from our data warehouses, as they are split across disparate systems all over the world. Bad data quality is covered up through the transformation layers on the way IN to your EDW. Data warehouses grow so large and become so difficult to maintain that IT teams are often delivering late and over budget. The foundations of your data warehouse are probably crumbling under sheer weight and pressure.
  5. Disparate data marts, unmatched answer sets, geographical problems, and worse… Projects are under fire from a number of areas. Let’s take a look at what happens when a data warehouse project reaches the brick wall head-on, at 90 miles an hour.
  6. I think this says it all… Projects cancelled and restarted, re-engineering required to absorb changes, high complexity making it difficult to upgrade, change, and keep up at the speed of business. Disparate silo solutions screaming for consolidation, and of course a lack of accountability on BOTH sides of the fence… All signs of an ailing BI solution on the brink of being shut down.
  7. We have got to keep our focus on the prize. Business still wants a BI system backed by an enterprise EDW. IT still wants a manageable system that will grow and change without major re-engineering. There is a better way, and I can help you with it.
  8. The Data Vault model is really just another name for “Common foundational architecture and design”. It’s based on 10 years of research and design work, followed by 10 years of implementation best practices. It is architected to help you solve the problems!
  9. Put quite simply: it’s an easy-to-use architecture and plan, a guide-book for building a repeatable, consistent, and scalable data warehouse system. So just what is the value of the Data Vault?
  10. The Data Vault model and methodology provide: painless auditability, understandable standards, rapid adaptability, simple build-out, uncomplicated design, and effortless scalability. Go after your goals, build a wildly successful data warehouse just like I have.
  11. Beginning: 5 advanced ETL developers. By the 1st month, they had 5 advanced and 15 basic/intro. By the 6th month, they had 5 advanced but 50 basic. By the end of the 8th month they went to production with 10 MF sources, and their team size was 12 people (5 advanced, 7 basic for support).
  12. You’re not the first, nor will you be the last one to use it. Some of the world’s biggest companies are implementing Data Vaults, from Daimler Motors to Lockheed Martin to the Department of Defense. JPMorgan Chase used the Data Vault model to merge 3 companies in 90 days!