Jose Hernandez
Director of Business Intelligence
Dunn Solutions Group
Agenda
Introduction
What is a Data Warehouse?
Dimensional Modeling
Overview
Full-service IT consulting firm
Founded in 1988
Offices
 Chicago
 Minneapolis
 Raleigh
 Bangalore, India
Practice Areas
Business Intelligence
DI + EIM/Quality
Budgeting & Planning
End-to-End BI
Data Warehouse
Dashboards
Map Intelligence
Managed Services
Predictive Analytics
Training
Open-Enrollment
On-Site + Custom
Jumpstart/Mentoring
Packaged Solutions
Legal Dashboard
Visible Visitors
Application Development
Web Design
E-Commerce
Custom App Dev
Mobile App Dev
Portals
Selected Clients
City of Chicago
Partnerships
Introduction: The New Series
Focus on Data Warehousing
Tool Agnostic
Kimball Focus
Introduction: This Presentation
 We start with a 50,000-foot view
 Assuming you are new to data
warehousing
 Keep it fundamental
 Kimball point of view
 What, Why and How
Data Warehouse Back to Basics
Why Build a Data Warehouse
 We have mountains of data in this
company but we can’t access it!
 We need to slice and dice the data
in a variety of ways.
 You have to make it easy for
business people to get at the data.
 Two people present the same
business metrics and the numbers
are different!
 We want people to make decisions
based on facts.
Why Build a Data Warehouse
 Operational systems are not
integrated
• IDs and Codes not conformed
• Inconsistent format
• Data quality issues
 Operational systems generally
not ideal for reporting
• Lack history
• Complex data structure
• Moving target
• Poor query performance
Goals of a Data Warehouse
 Make an organization’s data easy
to access
 Present the organization’s data
consistently
 Be adaptive and resilient to
change
 Trusted and secure
 Serve as the foundation for
informed decisions
 Business community must accept
the warehouse if it is to be
successful
What is a Data Warehouse?
• A simple question
- does not seem
to have a simple
answer!
• Many definitions
• Two that you
should consider
• Ralph Kimball
• Bill Inmon
What is a Data Warehouse
“A data warehouse is a system that extracts,
cleans, conforms and delivers source data into a
dimensional data store and then supports and
implements querying and analysis for the purpose
of decision making...”
…“It’s the place where users go to get their data”
Ralph Kimball
What a Data Warehouse is NOT
It is NOT…
 A product
 A language
 A project
 A data model
 A copy of your transactional systems
*Note: There are bundled products that come close to covering many aspects of
a data warehouse!
Jose
The BI Stack
Source Systems
 Legacy mainframe systems
 Production databases
 Transactional systems
 Subscription data
 …
ETL System
 Extract
 Clean
 Conform
 Deliver
 ETL Management Services
 ETL Data Stores
Presentation Server
 Data Marts
 Stars & Snowflakes
 Conformed Dimensions
 Conformed Facts
BI Applications
 Reporting systems
 Ad hoc systems
 Dashboards
 Analytics systems
Back Room | Front Room
Metadata
Infrastructure and Security
The BI Stack: Our Focus Today
(The same BI stack diagram, with the area covered in this presentation highlighted.)
Dimensional Modeling
Dimensional modeling
is a technique which
allows you to design a
database that meets
the goals of a data
warehouse.
Steps
 Identify Business Process
 Identify Grain (level of
detail)
 Identify Dimensions
 Identify Facts
 Build Star
Identify the Business Process
Requirements + Data Availability
Determine discrete business
processes (e.g.)
 Sales
 Inventory
 Student Registration
Identify the Grain
 Grain is the level of detail
stored in the data
warehouse.
• Do we store all products, or
just product categories?
• Each month, week, day,
hour?
• Has a big impact on size of
database.
 Can be a different grain
for each fact
 Typically implement the
lowest possible
dimension grain:
• not because users need
individual records
• because they want to
aggregate in many different
ways
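A minimal sketch of why the lowest grain is usually stored: atomic rows can be rolled up along any dimension the user asks for, while rows that were summarized up front cannot be broken back down. The sales rows, the column names and the roll_up helper are hypothetical, made up for illustration only.

```python
from collections import defaultdict

# Hypothetical atomic sales rows: one row per product per day (the grain).
SALES = [
    {"date": "2024-01-01", "product": "Apple", "category": "Fruit", "amount": 10.0},
    {"date": "2024-01-01", "product": "Pear", "category": "Fruit", "amount": 5.0},
    {"date": "2024-01-02", "product": "Apple", "category": "Fruit", "amount": 7.0},
    {"date": "2024-02-01", "product": "Milk", "category": "Dairy", "amount": 3.0},
]


def roll_up(rows, key_fn):
    """Sum the amount of atomic rows, grouped by whatever key the caller picks."""
    totals = defaultdict(float)
    for row in rows:
        totals[key_fn(row)] += row["amount"]
    return dict(totals)


# The same atomic rows answer three different questions.
by_category = roll_up(SALES, lambda r: r["category"])
by_month = roll_up(SALES, lambda r: r["date"][:7])
by_product_month = roll_up(SALES, lambda r: (r["product"], r["date"][:7]))
```

Had only the by_category totals been stored, the monthly and per-product questions could not be answered at all.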
Identify Dimensions
 Selection Criteria (where Gender=“Female”)
 Row Headers (“College Name”, “Region”, …)
 How do you want to slice the data?
 What are the artifacts of your business?
 Time Dimension - Always present
 Conforming Dimensions – very important aspect
of a successful data warehouse!*
*More on this later
Identify the Facts
Facts are the storage place for the measurements
we take...
Flavors of Facts
 Counts, Sums
 Additive
 Non-Additive
 Semi-Additive
 Fact-less Facts
 Transaction Grain
 Periodic Snapshot Grain
 Accumulating Snapshot
Grain
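The additive, semi-additive and non-additive flavors can be sketched as follows. The account balances, the sales rows and the function names are assumptions chosen for illustration; they are not taken from the presentation.

```python
# Hypothetical daily snapshot rows: one row per account per day.
BALANCES = [
    {"day": 1, "account": "A", "balance": 100.0},
    {"day": 1, "account": "B", "balance": 50.0},
    {"day": 2, "account": "A", "balance": 120.0},
    {"day": 2, "account": "B", "balance": 50.0},
]

# Hypothetical sales rows. Revenue and cost are additive: they sum across everything.
SALES = [
    {"revenue": 100.0, "cost": 60.0},
    {"revenue": 300.0, "cost": 270.0},
]


def total_balance_on(day):
    """Semi-additive: balances may be summed across accounts for ONE day."""
    return sum(r["balance"] for r in BALANCES if r["day"] == day)


def average_daily_balance():
    """Across time a balance is averaged (or the last value taken), not summed."""
    days = sorted({r["day"] for r in BALANCES})
    return sum(total_balance_on(d) for d in days) / len(days)


def margin_pct(rows):
    """Non-additive: rebuild the ratio from its additive parts, never sum ratios."""
    revenue = sum(r["revenue"] for r in rows)
    cost = sum(r["cost"] for r in rows)
    return (revenue - cost) / revenue
```

Note that the two sales rows have margins of 40% and 10%, yet the overall margin is 17.5%, not their 25% average. That is why ratios are usually stored as their additive components.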
Dimensional Modeling - Stars
Why is it called a star?
Dimensional Modeling - Stars
Because it looks like a
star! (kinda)
 Fact Table in the center
 Dimension Tables
surrounding it
Dimensional Modeling - Constellation
Dimensional Modeling – Fact Tables
Fact Tables
 The center of the star
schema
 Based on a business
process
 Contains the business
process measures
 All measures in the fact are
of the same grain
 Fact tables are narrow but
deep
Dimensional Modeling – Dim Tables
Dimension Tables
 Business entities used to
slice up (determine the
grain of) the Facts
 Verbose and textual
 Should be conformed
across the organization
 Wide but shallow
 Always use surrogate
keys*
*exception for the Date Dimension
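A minimal sketch of a star, assuming a retail sales process and using Python's built-in sqlite3 module. All table and column names (dim_date, dim_product, fact_sales and so on) are hypothetical. The product dimension gets a meaningless surrogate key, while the date dimension uses a readable YYYYMMDD key, in line with the exception noted above.

```python
import sqlite3


def build_star(conn):
    """Create one fact table surrounded by two dimension tables and load sample rows."""
    conn.executescript("""
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY,      -- readable key, e.g. 20240101
            full_date TEXT, month_name TEXT
        );
        CREATE TABLE dim_product (
            product_key INTEGER PRIMARY KEY,   -- surrogate key
            product_id TEXT,                   -- natural key from the source system
            product_name TEXT, category TEXT
        );
        CREATE TABLE fact_sales (
            date_key INTEGER REFERENCES dim_date(date_key),
            product_key INTEGER REFERENCES dim_product(product_key),
            quantity INTEGER, sales_amount REAL
        );
    """)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [
        (20240101, "2024-01-01", "January"),
        (20240201, "2024-02-01", "February"),
    ])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?, ?)", [
        (1, "SKU-1", "Apple", "Fruit"),
        (2, "SKU-2", "Milk", "Dairy"),
    ])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
        (20240101, 1, 3, 6.0),
        (20240101, 2, 1, 2.5),
        (20240201, 1, 2, 4.0),
    ])


def sales_by_category(conn):
    """Slice the narrow fact table by a textual attribute of a wide dimension."""
    return conn.execute("""
        SELECT p.category, SUM(f.sales_amount)
        FROM fact_sales f
        JOIN dim_product p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()


conn = sqlite3.connect(":memory:")
build_star(conn)
```

Every query against the star has the same shape: join the fact table to the dimensions you want to slice by, group by dimension attributes, and aggregate the measures.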
Star Schema – Physical Model
Date Dimension (my favorite dimension)
The Basic Date Dimension
Date Dimension
Special Date Dimension Attributes
 In another language
 Semester (First Semester, Second
Semester, …)
 High Season (Y/N), Low Season (Y/N)
 Season (Winter, Spring, Summer, Fall)
 Reporting Day (CurrDay, CurrDay-1D,
CurrDay-2D)
 Reporting Month (CurrMonth,
CurrMonth-1M, …)
 Last Day of Quarter (Y/N)
 Last Day of Week (Y/N)
 American Holiday (Independence Day,
Christmas, …)
 Canadian Holiday
 And so many more!
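A date dimension is normally generated rather than extracted from a source system. The sketch below builds one row per day with a few of the attributes listed above. The column names are hypothetical, and two conventions are assumed: the week ends on Sunday, and seasons follow northern-hemisphere calendar months.

```python
from datetime import date, timedelta

DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")
# Assumption: northern-hemisphere seasons by calendar month (January first).
SEASONS = ("Winter", "Winter", "Spring", "Spring", "Spring", "Summer",
           "Summer", "Summer", "Fall", "Fall", "Fall", "Winter")


def date_row(d):
    """Build one date dimension row for the calendar day d."""
    next_day = d + timedelta(days=1)
    starts_quarter = next_day.day == 1 and next_day.month in (1, 4, 7, 10)
    return {
        "date_key": d.year * 10000 + d.month * 100 + d.day,  # e.g. 20240331
        "full_date": d.isoformat(),
        "day_name": DAY_NAMES[d.weekday()],
        "quarter": (d.month - 1) // 3 + 1,
        "season": SEASONS[d.month - 1],
        "last_day_of_week": "Y" if d.weekday() == 6 else "N",  # assumes Sunday ends the week
        "last_day_of_quarter": "Y" if starts_quarter else "N",
    }


def build_date_dimension(start, end):
    """Return one row per day from start to end, inclusive."""
    days = (end - start).days + 1
    return [date_row(start + timedelta(days=i)) for i in range(days)]


DIM_DATE = build_date_dimension(date(2024, 1, 1), date(2024, 12, 31))
```

Holidays, fiscal periods and reporting-day labels would be added as further columns in the same way, usually from a lookup table maintained by the business.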
Slowly Changing Dimensions
Known as SCDs
Dimensions change, how
do you handle this?
Three Basic Types
•Type 1
•Type 2
•Type 3
Hmmm.... these
are very
descriptive names.
Slowly Changing Dimensions (SCDs)
 Type 1:
• Do not preserve history
• Overwrite the record
 Type 2:
• Preserve all history
• Add a new record, indicate
current version
 Type 3:
• Preserve a point-in-time
history
• Add additional column(s)
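Type 1 and Type 3 are small enough to sketch in a few lines. The row layout and the previous_ column prefix are assumptions for illustration; a dimension row is represented here as a plain dictionary.

```python
def apply_type1(row, attr, new_value):
    """Type 1: overwrite in place. The old value is gone."""
    row[attr] = new_value
    return row


def apply_type3(row, attr, new_value):
    """Type 3: keep exactly one prior value in an extra 'previous_<attr>' column."""
    row["previous_" + attr] = row[attr]
    row[attr] = new_value
    return row


type1_row = apply_type1({"customer_key": 1, "city": "Chicago"}, "city", "Raleigh")
type3_row = apply_type3({"customer_key": 1, "city": "Chicago"}, "city", "Raleigh")
```

With Type 3, a second change pushes the first value out: only the current and the immediately previous value survive, which is why it preserves a point-in-time history rather than all history.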
Type 2
Slowly Changing Dimensions: Type 2
 SCD workhorse approach
 When a dimension
attribute changes, add a
new row and update
effective dates
 Old fact rows point to the
previous dimension row
 New fact rows point to the
current dimension row
 You can also use a flag to
mark the current row
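A minimal sketch of the Type 2 mechanics described above, with dimension rows held as dictionaries. The column names (effective_date, expiry_date, is_current), the far-future HIGH_DATE and the customer data are assumptions for illustration. A row is treated as valid from its effective date up to, but not including, its expiry date.

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # assumed convention for "no expiry yet"


def apply_type2_change(dim_rows, customer_id, new_attrs, change_date):
    """Expire the current row for customer_id and append a new current row.

    Returns the surrogate key that new fact rows should point to.
    """
    current = next(r for r in dim_rows
                   if r["customer_id"] == customer_id and r["is_current"])
    if all(current[k] == v for k, v in new_attrs.items()):
        return current["customer_key"]  # nothing changed, keep the current row
    current["expiry_date"] = change_date
    current["is_current"] = False
    new_row = {
        **current,
        **new_attrs,
        "customer_key": max(r["customer_key"] for r in dim_rows) + 1,  # new surrogate key
        "effective_date": change_date,
        "expiry_date": HIGH_DATE,
        "is_current": True,
    }
    dim_rows.append(new_row)
    return new_row["customer_key"]


dim_customer = [{
    "customer_key": 1, "customer_id": "C-100", "city": "Chicago",
    "effective_date": date(2020, 1, 1), "expiry_date": HIGH_DATE, "is_current": True,
}]
fact_orders = [{"customer_key": 1, "amount": 50.0}]  # old fact row keeps pointing at key 1

new_key = apply_type2_change(dim_customer, "C-100", {"city": "Raleigh"}, date(2024, 6, 1))
fact_orders.append({"customer_key": new_key, "amount": 75.0})  # new fact row uses the new key
```

The old order still reports under Chicago and the new order under Raleigh, without touching any fact rows. This only works because the fact table joins on the surrogate key, not on the customer ID.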
Other types of Dimensions
 Rapidly Changing
Dimensions
 Mini-dimensions
 Degenerate Dimension
 Junk Dimension
 Outrigger
Rapidly Changing Dimensions
AKA: Rapidly Changing
Monster Dimensions
 A dimension with
attributes that change
frequently is considered a
rapidly changing
dimension
 Produces very large
dimension tables
 Cannot be handled with
Type 2 approach (gets
too big)
Mini-dimensions
Technique for Rapidly
Changing Monster Dimension
 Use mini-dimensions
• Split up the rapidly changing
attributes to a mini-dimension
• Join the mini-dimension to the fact
table
 Use banded ranges
• Minimizes rows (no discrete values)
• A significant compromise
Customer Dimension
PK Customer Key
Customer ID
Name
Address
DoB
Date of First Order
-------
Age
Gender
Annual Income
Number of Children
Marital Status
Fact Table
FK1 Customer Key
More Foreign Keys
Facts...
New Customer Dimension
PK Customer Key
Customer ID
Name
Address
DoB
Date of First Order
Customer Demographics Dim
PK Customer Demo Key
Age Band
Gender
Annual Income Band
Num of Children Band
Marital Status
Fact Table 2
FK2 Customer Key
FK3 Customer Demo Key
More Foreign Keys
Facts...
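The split above can be sketched as follows. The band boundaries, the gender values and the helper names are hypothetical; a real design would agree the bands with the business. Because the mini-dimension holds one row per combination of banded values, it can be built once up front and stays small no matter how often a customer's age or income changes.

```python
from itertools import product

# Hypothetical bands: (low, high, label), both ends inclusive, whole numbers only.
AGE_BANDS = [(0, 17, "0-17"), (18, 34, "18-34"), (35, 54, "35-54"), (55, 200, "55+")]
INCOME_BANDS = [(0, 49_999, "<50K"), (50_000, 99_999, "50K-99K"), (100_000, 10**9, "100K+")]
GENDERS = ["Female", "Male", "Unknown"]


def band(value, bands):
    """Map a discrete value to the label of the band that contains it."""
    for low, high, label in bands:
        if low <= value <= high:
            return label
    raise ValueError(f"{value} falls outside every band")


def build_demographics_dim():
    """One row per combination of banded values, keyed by a surrogate key."""
    combos = product([b[2] for b in AGE_BANDS], GENDERS, [b[2] for b in INCOME_BANDS])
    return {combo: key for key, combo in enumerate(combos, start=1)}


DEMO_DIM = build_demographics_dim()


def demo_key(age, gender, income):
    """Look up the Customer Demo Key to store on a fact row."""
    return DEMO_DIM[(band(age, AGE_BANDS), gender, band(income, INCOME_BANDS))]
```

The compromise is visible in the lookup: a 30-year-old and a 34-year-old get the same key, so reports can no longer distinguish them by exact age.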
Degenerate Dimension
A dimension key that
has no attributes.
A dimensional attribute
stored in the fact table
Examples:
 Transaction Number
 Invoice Number
 Line Item Number
 Ticket Number
Junk Dimension
Do you have a drawer in
your kitchen that is a catch
all for stuff that you might
need...the junk drawer?
A collection of low
cardinality flags and
indicators that you might
need.
Examples: Payment Type,
Inbound/Outbound, Order
Type
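One way to load a junk dimension, sketched under assumptions: rows are added only for flag combinations that actually occur in the source data. The flag values, the order rows and the junk_key helper are hypothetical.

```python
def junk_key(junk_dim, payment_type, direction, order_type):
    """Return the surrogate key for this flag combination, adding a row on first sight."""
    combo = (payment_type, direction, order_type)
    if combo not in junk_dim:
        junk_dim[combo] = len(junk_dim) + 1
    return junk_dim[combo]


# Hypothetical source orders carrying three low-cardinality flags each.
SOURCE_ORDERS = [
    {"payment": "Credit", "direction": "Inbound", "order_type": "Web", "amount": 20.0},
    {"payment": "Cash", "direction": "Outbound", "order_type": "Phone", "amount": 5.0},
    {"payment": "Credit", "direction": "Inbound", "order_type": "Web", "amount": 12.0},
]

junk_dim = {}
fact_rows = [
    {
        "junk_key": junk_key(junk_dim, o["payment"], o["direction"], o["order_type"]),
        "amount": o["amount"],
    }
    for o in SOURCE_ORDERS
]
```

The fact table carries a single junk key instead of three separate flag columns or three tiny dimension tables.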
Outrigger
Exception, not the rule!
The start of snow-flaking
A secondary dimension table is
connected to a dimension table
(not via a fact).
Human Resource Fact
FK1 Employee Key
More FK
HR Fact 1
HR Fact 2
Employee Dimension
PK Employee Key
Employee Attributes
......
FK1 Emp Skill Key
Employee Skill Group (Outrigger)
PK Emp Skill Key
Emp Skill Description
Emp Skill Category
Just the Facts Tables
Home for the numerical measures
Typically Additive
Three types of Fact Tables
 Transactional Grain
 Periodic Snapshot Grain
 Accumulating Snapshot Grain
Comparison of Fact Table Types
Characteristic | Transaction Grain | Periodic Snapshot Grain | Accumulating Snapshot Grain
Time period represented | Point in time | Regular, predictable intervals | Indeterminate time span, typically short-lived
Grain | One row per transaction event | One row per period | One row per life
Fact table loads | Insert | Insert | Insert and update
Fact row updates | Not revisited | Not revisited | Revisited whenever activity
Date dimensions | Transaction date | End of period date | Multiple dates for standard milestones
Facts | Transaction activity | Performance for predefined time interval | Performance over finite lifetime
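The accumulating snapshot is the least intuitive of the three, since its rows are inserted once and then updated. A minimal sketch, assuming a hypothetical order pipeline with three milestones; the milestone names, column names and the record_milestone helper are illustrative only.

```python
from datetime import date

MILESTONES = ("ordered", "shipped", "delivered")  # hypothetical standard milestones


def record_milestone(snapshot, order_id, milestone, when):
    """Insert the order's row on first sight, then revisit it as activity occurs."""
    if milestone not in MILESTONES:
        raise ValueError(f"unknown milestone: {milestone}")
    # One row per order lifetime, with one date column per milestone.
    row = snapshot.setdefault(order_id, {m + "_date": None for m in MILESTONES})
    row[milestone + "_date"] = when
    # A lag fact can be filled in once both of its dates are known.
    if row["ordered_date"] and row["delivered_date"]:
        row["days_to_deliver"] = (row["delivered_date"] - row["ordered_date"]).days
    return row


snapshot = {}
record_milestone(snapshot, "SO-1", "ordered", date(2024, 3, 1))
record_milestone(snapshot, "SO-1", "shipped", date(2024, 3, 3))
record_milestone(snapshot, "SO-1", "delivered", date(2024, 3, 8))
```

Three events produce a single row with three dates. A transaction grain table would instead hold three rows, one per event, and would never update them.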
What makes it Enterprise?
Conformed Dimensions & Facts
 Common fields across the enterprise domains
 Common definition across the enterprise domains
The Bus Architecture
 Allows traversing across business processes
 Promotes conformity
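Traversing business processes is often called drilling across. A minimal sketch, assuming two hypothetical fact tables (sales and a single inventory snapshot) that share one conformed product dimension. Each fact table is aggregated separately to the shared attribute, and only then are the results lined up.

```python
from collections import defaultdict

# Conformed dimension: the same keys and category definitions for every process.
DIM_PRODUCT = {1: {"category": "Fruit"}, 2: {"category": "Fruit"}, 3: {"category": "Dairy"}}

FACT_SALES = [(1, 10.0), (2, 5.0), (3, 3.0)]  # (product_key, sales_amount)
FACT_INVENTORY = [(1, 40), (3, 12)]           # (product_key, units_on_hand), one snapshot


def by_category(fact_rows):
    """Aggregate one fact table up to the conformed category attribute."""
    totals = defaultdict(float)
    for product_key, measure in fact_rows:
        totals[DIM_PRODUCT[product_key]["category"]] += measure
    return totals


def drill_across():
    """Line up measures from both processes on the shared category values."""
    sales, stock = by_category(FACT_SALES), by_category(FACT_INVENTORY)
    return {
        category: {
            "sales_amount": sales.get(category, 0.0),
            "units_on_hand": stock.get(category, 0.0),
        }
        for category in sorted(set(sales) | set(stock))
    }
```

If sales and inventory each kept their own product table with different keys or category names, the final merge step would have nothing reliable to match on.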
Conformed Dimensions / Bus Architecture
Dimensional Modeling Embellishments
Snowflaking
 Normalizing a dimension
table
 OLTP modeler tendency
 Not optimal for query
performance
Outriggers
 A dimension table is
referenced in another
dimension (e.g. a hire
date)
Bridges
 Many to many
relationships not resolved
in fact tables
 Sits between a dimension
and a fact
 Ragged and variable
depth hierarchies
Snowflaking
What is Snowflaking?
 Normalizing in a star
schema
 Should be avoided
• Adds complexity to
presentation layer
• SQL is more complex
• Adds burden to database optimizers
• Very little space savings
• Impacts Bitmap indexes*
 Sometimes OK (Outriggers for low cardinality attributes)
*Bitmap indexes are good for low cardinality fields
DW Tips: Dimensional Modeling Myths
 Dimensional data warehouses
are appropriate for summary
level data only
 Dimensional models
presuppose the business
questions and therefore are
inflexible
 Dimensional models are
departmental
 Bringing a new data source into
a dimensional data warehouse
breaks existing schemas and
requires new fact tables
 A good way to narrow the
scope and manage risk is to
focus on delivering the report
most often requested
 Dimensional models are fully
de-normalized
 Ralph Kimball invented the fact
and dimension terminology
Kimball University White Paper
DW Tips: 10 Essential Dim Mod Rules
 Load detailed atomic data into
dimensional structures
 Structure dimensional models
around business processes
 Ensure every fact table has a
date dimension table
 Ensure all facts in a Fact table
are the same grain
 Resolve many-to-many
relationships in fact tables
 Resolve many-to-one
relationships in dimension
tables
 Store report labels and filter
domain values in dimension
tables
 Dimension tables should use
surrogate keys
 Create conformed dimensions
to integrate data across the
enterprise
 Continuously balance
requirements and realities to
deliver a DW/BI solution that’s
accepted by business users
and that supports their
decision making
Kimball University Article, Margy Ross, InformationWeek
Thank You
Future Webinars
 The ETL Process
 Stars in Motion
 Columnar and In-memory
databases
 Modeling Business Process
• Retail Sales
• Inventory
• CRM
• HR

Data Warehouse Back to Basics: Dimensional Modeling

  • 1. Jose Hernandez Director of Business Intelligence Dunn Solutions Group
  • 2. Agenda Introduction What is a Data Warehouse? Dimensional Modeling
  • 3. Full-service IT consulting firm Founded in 1988 Offices  Chicago  Minneapolis  Raleigh  Bangalore, India Overview Chicago Minneapolis Raleigh Bangalore
  • 4. Practice Areas Business Intelligence DI + EIM/Quality Budgeting & Planning End-to-End BI Data Warehouse Dashboards Map Intelligence Managed Services Predictive Analytics Training Open-Enrollment On-Site + Custom Jumpstart/Mentoring Packaged Solutions Legal Dashboard Visible Visitors Application Development Web Design E-Commerce Custom App Dev Mobile App Dev Portals
  • 7. Introduction: The New Series Focus on Data Warehousing Tool Agnostic Kimball Focus
  • 8. Introduction: This Presentation  We start with 50,000 foot view  Assuming you are new to data warehousing  Keep it fundamental  Kimball point of view  What, Why and How Data Warehouse Back to Basics
  • 9. Why Build a Data Warehouse  We have mountains of data in this company but we can’t access it!  We need to slice and dice the data in a variety of ways.  You have to make it easy for business people to get at the data.  Two people present the same business metrics and the numbers are different!  We want people to make decisions based on facts.
  • 10. Why Build a Data Warehouse  Operational systems are not integrated • IDs and Codes not conformed • Inconsistent format • Data quality issues  Operational systems generally not ideal for reporting • Lack history • Complex data structure • Moving target • Poor query performance
  • 11. Goals of a Data Warehouse  Make an organization’s data easy to access  Present the organization’s data consistently  Be adaptive and resilient to change  Trusted and secure  Serve as the foundation for informed decisions  Business community must accept the warehouse if it is to be successful
  • 12. Agenda Introduction What is a Data Warehouse? Dimensional Modeling
  • 13. What is a Data Warehouse? • A simple question - does not seem to have simple answer! • Many definitions • Two that you should consider • Ralph Kimball • Bill Inmon
  • 14. What is a Data Warehouse “A data warehouse is a system that extracts, cleans, conforms and delivers source data into a dimensional data store and then supports and implements querying and analysis for the purpose of decision making...” …“It’s the place where users go to get their data” Ralph Kimball
  • 15. What a Data Warehouse is NOT
      It is NOT…
      • A product
      • A language
      • A project
      • A data model
      • A copy of your transactional systems
      *Note: There are bundled products that come close to covering many aspects of a data warehouse! Jose
  • 16. The BI Stack
      Layers (back room to front room):
      • Source Systems: Legacy mainframe systems; Production databases; Transactional systems; Subscription data; …
      • ETL System: Extract; Clean; Conform; Deliver; ETL Management Services; ETL Data Stores
      • Presentation Server: Data Marts; Stars & Snowflakes; Conformed Dimensions; Conformed Facts
      • BI Applications: Reporting systems; Ad hoc systems; Dashboards; Analytics systems
      Across the stack: Metadata; Infrastructure and Security
  • 17. The BI Stack (same diagram as slide 16), with the callout “Our Focus Today”
  • 18. Agenda: Introduction; What is a Data Warehouse?; Dimensional Modeling
  • 19. Dimensional Modeling
      Dimensional modeling is a technique that allows you to design a database that meets the goals of a data warehouse.
      Steps
      • Identify Business Process
      • Identify Grain (level of detail)
      • Identify Dimensions
      • Identify Facts
      • Build Star
  • 20. Identify the Business Process
      Requirements + Data Availability
      Determine discrete business processes (e.g.)
      • Sales
      • Inventory
      • Student Registration
  • 21. Identify the Grain
      • Grain is the level of detail stored in the data warehouse.
          • Do we store all products, or just product categories?
          • Each month, week, day, hour?
          • Has a big impact on the size of the database.
      • Each fact table can have a different grain
      • Typically implement the lowest possible dimension grain:
          • not because users need individual records
          • because they want to aggregate in many different ways
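A small illustration of why the lowest grain is usually stored: atomic rows can be rolled up along any dimension afterwards, while rows stored only at category or month level cannot be split back out. The sample rows and the `roll_up` helper are hypothetical and not part of the presentation.

```python
from collections import defaultdict

# Hypothetical atomic-grain sales rows: one row per product per day.
sales = [
    {"date": "2024-01-01", "product": "Pen", "category": "Office", "amount": 10.0},
    {"date": "2024-01-01", "product": "Desk", "category": "Furniture", "amount": 200.0},
    {"date": "2024-01-02", "product": "Pen", "category": "Office", "amount": 5.0},
    {"date": "2024-02-01", "product": "Stapler", "category": "Office", "amount": 15.0},
]

def roll_up(rows, key_func):
    """Sum the amount measure for each group produced by key_func."""
    totals = defaultdict(float)
    for row in rows:
        totals[key_func(row)] += row["amount"]
    return dict(totals)

# The same atomic rows answer two different questions.
by_category = roll_up(sales, lambda r: r["category"])
by_month = roll_up(sales, lambda r: r["date"][:7])
```

Had only `by_category` been stored, the monthly view could not be derived from it.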
  • 22. Identify Dimensions
      • Selection Criteria (where Gender=“Female”)
      • Row Headers (“College Name”, “Region”, …)
      • How do you want to slice the data?
      • What are the artifacts of your business?
      • Time Dimension - Always present
      • Conforming Dimensions – very important aspect of a successful data warehouse!*
      *More on this later
  • 23. Identify the Facts
      Facts are the storage place for the measurements we take...
      Flavors of Facts
      • Counts, Sums
      • Additive
      • Non-Additive
      • Semi-Additive
      • Fact-less Facts
      • Transaction Grain
      • Periodic Snapshot Grain
      • Accumulating Snapshot Grain
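To make one of these flavors concrete, here is a minimal sketch of a semi-additive fact: an account balance can be summed across accounts for a single day, but summing it across days gives a meaningless number. The data and function names are hypothetical.

```python
# Hypothetical periodic snapshot: one balance per account per day.
balances = [
    {"day": 1, "account": "A", "balance": 100},
    {"day": 1, "account": "B", "balance": 50},
    {"day": 2, "account": "A", "balance": 120},
    {"day": 2, "account": "B", "balance": 50},
]

def total_balance_on(day):
    """Summing across accounts for one day is meaningful."""
    return sum(r["balance"] for r in balances if r["day"] == day)

def closing_balance(account):
    """Across time, take the latest balance instead of summing."""
    rows = [r for r in balances if r["account"] == account]
    return max(rows, key=lambda r: r["day"])["balance"]

# Summing account A over both days double counts the money.
misleading_sum = sum(r["balance"] for r in balances if r["account"] == "A")
```

A fully additive fact such as a sales amount would not need the special handling in `closing_balance`.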
  • 24. Dimensional Modeling - Stars Why is it called a star?
  • 25. Dimensional Modeling - Stars
      Because it looks like a star! (kinda)
      • Fact Table in the center
      • Dimension Tables surrounding it
  • 26. Dimensional Modeling - Constellation
  • 27. Dimensional Modeling – Fact Tables
      • The center of the star schema
      • Based on a business process
      • Contains the business process measures
      • All measures in the fact are of the same grain
      • Fact tables are narrow but deep
  • 28. Dimensional Modeling – Dim Tables
      • Business entities used to slice up (determine the grain of) the facts
      • Verbose and textual
      • Should be conformed across the organization
      • Wide but shallow
      • Always use surrogate keys*
      *exception for the Date Dimension
  • 29. Star Schema – Physical Model
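As a rough sketch of what a physical star might look like, the following uses Python's built-in `sqlite3` module with one fact table and two dimension tables. All table names, column names and rows are invented for illustration; the slide's actual model is an image and is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT, month_name TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
-- Narrow fact table: foreign keys to the dimensions plus numeric measures.
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity INTEGER,
    sales_amount REAL
);
INSERT INTO dim_date VALUES (20240101, '2024-01-01', 'January'), (20240201, '2024-02-01', 'February');
INSERT INTO dim_product VALUES (1, 'Pen', 'Office'), (2, 'Desk', 'Furniture');
INSERT INTO fact_sales VALUES (20240101, 1, 3, 6.0), (20240101, 2, 1, 200.0), (20240201, 1, 5, 10.0);
""")

# Typical star query: join the fact to its dimensions, group by dimension attributes.
rows = conn.execute("""
    SELECT d.month_name, p.category, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_date AS d ON d.date_key = f.date_key
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY d.month_name, p.category
    ORDER BY d.month_name, p.category
""").fetchall()
```

The date dimension here uses a readable `YYYYMMDD` integer key, in line with the surrogate-key exception noted for the Date Dimension; the product dimension uses a meaningless surrogate key.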
  • 30. Date Dimension (my favorite dimension) The Basic Date Dimension
  • 31. Date Dimension
      Special Date Dimension Attributes
      • In another language
      • Semester (First Semester, Second Semester, …)
      • High Season (Y/N), Low Season (Y/N)
      • Season (Winter, Spring, Summer, Fall)
      • Reporting Day (CurrDay, CurrDay-1D, CurrDay-2D)
      • Reporting Month (CurrMonth, CurrMonth-1M, …)
      • Last Day of Quarter (Y/N)
      • Last Day of Week (Y/N)
      • American Holiday (Independence Day, Christmas, …)
      • Canadian Holiday
      • And so many more!
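A date dimension is usually generated rather than loaded from a source system. The sketch below builds a few rows with the standard `datetime` module, including two of the special attributes listed above. The column names, the `YYYYMMDD` key format and the choice of Sunday as the last day of the week are assumptions.

```python
from datetime import date, timedelta

def build_date_dimension(start, end):
    """Return one dict per calendar day from start to end inclusive."""
    rows = []
    current = start
    while current <= end:
        next_day = current + timedelta(days=1)
        quarter = (current.month - 1) // 3 + 1
        next_quarter = (next_day.month - 1) // 3 + 1
        rows.append({
            "date_key": int(current.strftime("%Y%m%d")),
            "full_date": current.isoformat(),
            "month": current.month,
            "quarter": quarter,
            "year": current.year,
            # Last day of quarter: tomorrow falls in a different quarter.
            "last_day_of_quarter": "Y" if next_quarter != quarter else "N",
            # Assumes the week ends on Sunday; adjust to the business calendar.
            "last_day_of_week": "Y" if current.weekday() == 6 else "N",
        })
        current = next_day
    return rows

dim_date = build_date_dimension(date(2024, 3, 30), date(2024, 4, 1))
```

Attributes such as holidays or seasons would typically come from a lookup maintained by the business rather than from a calculation.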
  • 32. Slowly Changing Dimensions
      Known as SCDs
      Dimensions change; how do you handle this?
      Three Basic Types
      • Type 1
      • Type 2
      • Type 3
      Hmmm.... these are very descriptive names.
  • 33. Slowly Changing Dimensions (SCDs)
      • Type 1:
          • Do not preserve history
          • Overwrite the record
      • Type 2:
          • Preserve all history
          • Add a new record, indicate current version
      • Type 3:
          • Preserve a point-in-time history
          • Add additional column(s)
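Types 1 and 3 can be sketched on a single dimension row held as a Python dict. The row layout, the `previous_` column prefix and the function names are hypothetical.

```python
def apply_type1(row, attribute, new_value):
    """Type 1: overwrite the attribute; the old value is lost."""
    updated = dict(row)
    updated[attribute] = new_value
    return updated

def apply_type3(row, attribute, new_value):
    """Type 3: keep the prior value in an extra 'previous_<attribute>' column."""
    updated = dict(row)
    updated["previous_" + attribute] = row[attribute]
    updated[attribute] = new_value
    return updated

customer = {"customer_key": 1, "customer_id": "C100", "city": "Chicago"}
type1_row = apply_type1(customer, "city", "Raleigh")
type3_row = apply_type3(customer, "city", "Raleigh")
```

In both cases the surrogate key stays the same, so existing fact rows keep pointing at the same dimension row.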
  • 34. Slowly Changing Dimensions: Type 2
      • SCD workhorse approach
      • When a dimension attribute changes, add a new row and update effective dates
      • Old fact rows point to the previous dimension row
      • New fact rows point to the current dimension row
      • You can use a flag too
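The Type 2 steps above can be sketched as follows: expire the current row, then append a new row with a new surrogate key, new effective dates and a current flag. The column names, the `9999-12-31` open-ended end date and the `apply_type2` helper are assumptions chosen for the example.

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # assumed convention for "still current"

def apply_type2(dimension_rows, natural_key, changes, effective):
    """Expire the current row for natural_key and append a new version.

    dimension_rows is mutated in place; returns the new surrogate key.
    """
    current = next(
        r for r in dimension_rows
        if r["customer_id"] == natural_key and r["is_current"]
    )
    current["end_date"] = effective
    current["is_current"] = False
    new_row = {**current, **changes}
    new_row["customer_key"] = max(r["customer_key"] for r in dimension_rows) + 1
    new_row["start_date"] = effective
    new_row["end_date"] = HIGH_DATE
    new_row["is_current"] = True
    dimension_rows.append(new_row)
    return new_row["customer_key"]

dim_customer = [{
    "customer_key": 1, "customer_id": "C100", "city": "Chicago",
    "start_date": date(2020, 1, 1), "end_date": HIGH_DATE, "is_current": True,
}]
new_key = apply_type2(dim_customer, "C100", {"city": "Raleigh"}, date(2024, 6, 1))
```

Facts loaded before the change keep `customer_key` 1 and still report Chicago; facts loaded afterwards would use the returned key.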
  • 35. Other types of Dimensions
      • Rapidly Changing Dimensions
      • Mini-dimensions
      • Degenerate Dimension
      • Junk Dimension
      • Outrigger
  • 36. Rapidly Changing Dimensions
      AKA: Rapidly Changing Monster Dimensions
      • A dimension with attributes that change frequently is considered a rapidly changing dimension
      • Produces very large dimension tables
      • Cannot be handled with Type 2 approach (gets too big)
  • 37. Mini-dimensions
      Technique for Rapidly Changing Monster Dimension
      • Use mini-dimensions
          • Split up the rapidly changing attributes to a mini-dimension
          • Join the mini-dimension to the fact table
      • Use banded ranges
          • Minimizes rows (no discrete values)
          • A significant compromise
      Diagram, original design:
          • Customer Dimension: Customer Key (PK), Customer ID, Name, Address, DoB, Date of First Order, Age, Gender, Annual Income, Number of Children, Marital Status
          • Fact Table: Customer Key (FK1), More Foreign Keys, Facts...
      Diagram, with mini-dimension:
          • New Customer Dimension: Customer Key (PK), Customer ID, Name, Address, DoB, Date of First Order
          • Customer Demographics Dim: Customer Demo Key (PK), Age Band, Gender, Annual Income Band, Num of Children Band, Marital Status
          • Fact Table 2: Customer Key (FK2), Customer Demo Key (FK3), More Foreign Keys, Facts...
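Banding is what keeps the mini-dimension small: discrete values are mapped to a handful of ranges, so the table holds one row per combination of bands instead of one row per combination of exact values. The band edges, labels and helper names below are hypothetical.

```python
import bisect

# Hypothetical band edges; real bands come from the business.
AGE_EDGES = [18, 30, 45, 65]
AGE_LABELS = ["Under 18", "18-29", "30-44", "45-64", "65+"]
INCOME_EDGES = [25_000, 50_000, 100_000]
INCOME_LABELS = ["<25K", "25K-49K", "50K-99K", "100K+"]

def to_band(value, edges, labels):
    """Map a discrete value to the label of the band it falls in."""
    return labels[bisect.bisect_right(edges, value)]

def demographics_row(age, income):
    """Return the (age band, income band) pair a customer maps to."""
    return (
        to_band(age, AGE_EDGES, AGE_LABELS),
        to_band(income, INCOME_EDGES, INCOME_LABELS),
    )

# Upper bound on rows for these two attributes alone.
mini_dimension_size = len(AGE_LABELS) * len(INCOME_LABELS)
```

The compromise is that exact ages and incomes are no longer available for analysis through this dimension, and the bands are hard to change once facts reference them.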
  • 39. Other Dimensions: Degenerate Dimension
      • A dimension key that has no attributes.
      • A dimensional attribute stored in the fact table
      Examples:
          • Transaction Number
          • Invoice Number
          • Line Item Number
          • Ticket Number
  • 40. Other Dimensions: Junk Dimension
      • Do you have a drawer in your kitchen that is a catch-all for stuff that you might need... the junk drawer?
      • A collection of low cardinality flags and indicators that you might need.
      Examples: Payment Type, Inbound/Outbound, Order Type
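One common way to populate a junk dimension is to generate every combination of the flag values up front and give each combination a surrogate key. The attribute names follow the slide's examples, but the specific values (Cash, Rush and so on) are invented for illustration.

```python
from itertools import product

# Hypothetical domains for the low cardinality flags.
payment_types = ["Cash", "Credit", "Check"]
directions = ["Inbound", "Outbound"]
order_types = ["Standard", "Rush"]

# One row per combination, each with its own surrogate key.
junk_dimension = [
    {"junk_key": key, "payment_type": p, "direction": d, "order_type": o}
    for key, (p, d, o) in enumerate(
        product(payment_types, directions, order_types), start=1
    )
]

# Used at load time to turn a combination of flags into a single foreign key.
lookup = {
    (row["payment_type"], row["direction"], row["order_type"]): row["junk_key"]
    for row in junk_dimension
}
```

The fact table then carries one `junk_key` column instead of three separate flag columns.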
  • 41. Other Dimensions: Outrigger
      • Exception, not the rule!
      • The start of snowflaking
      • A secondary dimension table is connected to a dimension table (not via a fact).
      Diagram:
          • Human Resource Fact: Employee Key (FK1), More FK, HR Fact 1, HR Fact 2
          • Employee Dimension: Employee Key (PK), Employee Attributes ......, Emp Skill Key (FK1)
          • Employee Skill Group (Outrigger): Emp Skill Key (PK), Emp Skill Description, Emp Skill Category
  • 42. Just the Facts Tables
      Home for the numerical measures
      Typically Additive
      Three types of Fact Tables
      • Transactional Grain
      • Periodic Snapshot Grain
      • Accumulating Snapshot Grain
  • 43. Comparison of Fact Table Types
      Characteristic | Transaction Grain | Periodic Snapshot Grain | Accumulating Snapshot Grain
      Time period represented | Point in time | Regular, predictable intervals | Indeterminate time span, typically short-lived
      Grain | One row per transaction event | One row per period | One row per life
      Fact table loads | Insert | Insert | Insert and update
      Fact row updates | Not revisited | Not revisited | Revisited whenever activity
      Date dimensions | Transaction date | End of period date | Multiple dates for standard milestones
      Facts | Transaction activity | Performance for predefined time interval | Performance over finite lifetime
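The accumulating snapshot is the one type whose rows are revisited. A minimal sketch, assuming an order process with hypothetical order, ship and delivery milestones:

```python
from datetime import date

# One row per order "life"; milestone dates start empty and are filled in later.
order_fact = {}

def record_milestone(order_id, milestone, when):
    """Insert the row on first sight, then revisit it for later milestones."""
    row = order_fact.setdefault(order_id, {
        "order_date": None,
        "ship_date": None,
        "delivery_date": None,
        "days_order_to_ship": None,
    })
    row[milestone + "_date"] = when
    # Lag measures are recalculated whenever both dates are known.
    if row["order_date"] and row["ship_date"]:
        row["days_order_to_ship"] = (row["ship_date"] - row["order_date"]).days
    return row

record_milestone("SO-1", "order", date(2024, 1, 2))
record_milestone("SO-1", "ship", date(2024, 1, 5))
```

A transaction grain design would instead insert two separate rows, one per event, and never update either.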
  • 44. What makes it Enterprise?
      Conformed Dimensions & Facts
          • Common fields across the enterprise domains
          • Common definition across the enterprise domains
      The Bus Architecture
          • Allows traversing across business processes
          • Promotes conformity
  • 45. Conformed Dimensions / Bus Architecture
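Conformed dimensions are what make it possible to combine results from different business processes. The sketch below queries two hypothetical stars (sales and inventory) separately and merges the results on a shared product attribute, a pattern often called drilling across. All names and numbers are invented.

```python
from collections import defaultdict

# Conformed product dimension shared by both business processes.
dim_product = {1: "Pen", 2: "Desk"}

# Hypothetical facts from two separate stars.
fact_sales = [(1, 30), (2, 4), (1, 10)]    # (product_key, units_sold)
fact_inventory = [(1, 100), (2, 7)]        # (product_key, units_on_hand)

def sum_by_product(fact_rows):
    """Aggregate one fact table by the conformed product name."""
    totals = defaultdict(int)
    for product_key, measure in fact_rows:
        totals[dim_product[product_key]] += measure
    return totals

sold = sum_by_product(fact_sales)
on_hand = sum_by_product(fact_inventory)

# Drill across: query each star on its own, then merge on the conformed attribute.
drill_across = {
    name: {"units_sold": sold.get(name, 0), "units_on_hand": on_hand.get(name, 0)}
    for name in sorted(set(sold) | set(on_hand))
}
```

If each process kept its own product table with different keys or names, the merge step would have nothing reliable to match on.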
  • 46. Dimensional Modeling Embellishments
      Snowflaking
          • Normalizing a dimension table
          • OLTP modeler tendency
          • Not optimal for query performance
      Outriggers
          • A dimension table is referenced in another dimension (e.g. hire date example)
      Bridges
          • Many-to-many relationships not resolved in fact tables
          • Sits between a dimension and a fact
          • Ragged and variable depth hierarchies
  • 47. Snowflaking
      What is Snowflaking?
      • Normalizing in a star schema
      • Should be avoided
          • Adds complexity to presentation layer
          • SQL is more complex
          • Adds burden to database optimizers
          • Very little space savings
          • Impacts Bitmap indexes*
      • Sometimes OK (Outriggers for low cardinality attributes)
      *good for low cardinality fields
  • 49. DW Tips: Dimensional Modeling Myths
      • Dimensional data warehouses are appropriate for summary level data only
      • Dimensional models presuppose the business questions and therefore are inflexible
      • Dimensional models are departmental
      • Bringing a new data source into a dimensional data warehouse breaks existing schemas and requires new fact tables
      • A good way to narrow the scope and manage risk is to focus on delivering the report most often requested
      • Dimensional models are fully de-normalized
      • Ralph Kimball invented the fact and dimension terminology
      Kimball University White Paper
  • 50. DW Tips: 10 Essential Dim Mod Rules
      • Load detailed atomic data into dimensional structures
      • Structure dimensional models around business processes
      • Ensure every fact table has a date dimension table
      • Ensure all facts in a fact table are the same grain
      • Resolve many-to-many relationships in fact tables
      • Resolve many-to-one relationships in dimension tables
      • Store report labels and filter domain values in dimension tables
      • Dimension tables should use surrogate keys
      • Create conformed dimensions to integrate data across the enterprise
      • Continuously balance requirements and realities to deliver a DW/BI solution that’s accepted by business users and that supports their decision making
      Kimball University Article, Margy Ross, InformationWeek
  • 51. Thank You
      Future Webinars
      • The ETL Process
      • Stars in Motion
      • Columnar and In-memory databases
      • Modeling Business Process
          • Retail Sales
          • Inventory
          • CRM
          • HR