3. Full-service IT consulting firm
Founded in 1988
Offices
Chicago
Minneapolis
Raleigh
Bangalore, India
Overview
4. Practice Areas
Business Intelligence
DI + EIM/Quality
Budgeting & Planning
End-to-End BI
Data Warehouse
Dashboards
Map Intelligence
Managed Services
Predictive Analytics
Training
Open-Enrollment
On-Site + Custom
Jumpstart/Mentoring
Packaged Solutions
Legal Dashboard
Visible Visitors
Application Development
Web Design
E-Commerce
Custom App Dev
Mobile App Dev
Portals
8. Introduction: This Presentation
We start with a 50,000-foot view
Assuming you are new to data
warehousing
Keep it fundamental
Kimball point of view
What, Why and How
Data Warehouse Back to Basics
9. Why Build a Data Warehouse
We have mountains of data in this
company but we can’t access it!
We need to slice and dice the data
in a variety of ways.
You have to make it easy for
business people to get at the data.
Two people present the same
business metrics and the numbers
are different!
We want people to make decisions
based on facts.
10. Why Build a Data Warehouse
Operational systems are not
integrated
• IDs and Codes not conformed
• Inconsistent format
• Data quality issues
Operational systems generally
not ideal for reporting
• Lack history
• Complex data structure
• Moving target
• Poor query performance
11. Goals of a Data Warehouse
Make an organization’s data easy
to access
Present the organization’s data
consistently
Be adaptive and resilient to
change
Trusted and secure
Serve as the foundation for
informed decisions
Business community must accept
the warehouse if it is to be
successful
13. What is a Data Warehouse?
• A simple question that does not seem to have a simple answer!
• Many definitions
• Two that you
should consider
• Ralph Kimball
• Bill Inmon
14. What is a Data Warehouse
“A data warehouse is a system that extracts,
cleans, conforms and delivers source data into a
dimensional data store and then supports and
implements querying and analysis for the purpose
of decision making...”
…“It’s the place where users go to get their data”
Ralph Kimball
15. What a Data Warehouse is NOT
It is NOT…
A product
A language
A project
A data model
A copy of your transactional systems
*Note: There are bundled products that come close to covering many aspects of
a data warehouse!
19. Dimensional Modeling
Dimensional modeling
is a technique which
allows you to design a
database that meets
the goals of a data
warehouse.
Steps
Identify Business Process
Identify Grain (level of
detail)
Identify Dimensions
Identify Facts
Build Star
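As a rough illustration of the five steps, here is a minimal sketch using Python's built-in sqlite3 module. The Sales process, the chosen grain (one row per product per day), and every table, column, and sample value are assumptions made for this example, not part of the presentation.

```python
import sqlite3

# Hypothetical star for a "Sales" business process.
# Grain: one row per product per day. All names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (
    date_key   INTEGER PRIMARY KEY,   -- e.g. 20240115
    full_date  TEXT,
    month_name TEXT
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,  -- surrogate key
    product_id  TEXT,                 -- natural key from the source system
    category    TEXT
);
CREATE TABLE fact_sales (
    date_key     INTEGER REFERENCES dim_date(date_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    quantity     INTEGER,
    sales_amount REAL
);
""")
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [
    (20240115, "2024-01-15", "January"),
    (20240216, "2024-02-16", "February"),
])
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", [
    (1, "SKU-100", "Books"),
    (2, "SKU-200", "Games"),
])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
    (20240115, 1, 2, 30.0),
    (20240115, 2, 1, 50.0),
    (20240216, 1, 4, 60.0),
])

# Dimensions supply the row headers and filters; the fact supplies the measures.
sales_by_category = conn.execute("""
    SELECT p.category, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(sales_by_category)
```

The same fact table can be grouped by month_name instead of category without any schema change, which is the point of keeping descriptive attributes in the dimensions.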
20. Identify the Business Process
Requirements + Data Availability
Determine discrete business
processes (e.g.)
Sales
Inventory
Student Registration
21. Identify the Grain
Grain is the level of detail
stored in the data
warehouse.
• Do we store all products, or
just product categories?
• Each month, week, day,
hour?
• Has a big impact on size of
database.
Can be a different grain
for each fact
Typically implement the
lowest possible
dimension grain:
• not because users need
individual records
• because they want to
aggregate in many different
ways
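To make the last point concrete, the sketch below stores rows at an assumed atomic grain (one row per product per day) and rolls them up several ways. The sample rows and the roll_up helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical atomic-grain rows: one row per product per day.
atomic_rows = [
    {"day": "2024-01-15", "product": "Atlas", "category": "Books", "amount": 30.0},
    {"day": "2024-01-15", "product": "Chess", "category": "Games", "amount": 50.0},
    {"day": "2024-01-16", "product": "Atlas", "category": "Books", "amount": 15.0},
]

def roll_up(rows, key):
    """Sum the amount measure by any attribute available at the atomic grain."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["amount"]
    return dict(totals)

by_day = roll_up(atomic_rows, "day")
by_category = roll_up(atomic_rows, "category")
by_product = roll_up(atomic_rows, "product")
```

Had only category-level totals been stored, the by_product roll-up could not be produced afterwards.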
22. Identify Dimensions
Selection Criteria (where Gender=“Female”)
Row Headers (“College Name”, “Region”, …)
How do you want to slice the data?
What are the artifacts of your business?
Time Dimension - Always present
Conforming Dimensions – very important aspect
of a successful data warehouse!*
*More on this later
23. Identify the Facts
Facts are the storage place for the measurements
we take...
Flavors of Facts
Counts, Sums
Additive
Non-Additive
Semi-Additive
Fact-less Facts
Transaction Grain
Periodic Snapshot Grain
Accumulating Snapshot
Grain
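The slide lists the flavors without defining them. The sketch below uses the usual reading: additive facts can be summed across every dimension, semi-additive facts across some dimensions but not time, and non-additive facts (such as ratios) not at all. The balances and sales figures are made up for illustration.

```python
# Hypothetical daily account balance snapshots: (day, account, balance).
balances = [
    ("2024-01-01", "A", 100.0),
    ("2024-01-01", "B", 200.0),
    ("2024-01-02", "A", 150.0),
    ("2024-01-02", "B", 250.0),
]

# Semi-additive: summing across accounts for a single day is meaningful.
total_on_jan_2 = sum(bal for day, _, bal in balances if day == "2024-01-02")

# Summing one account's balance across days double-counts money,
# so average over time instead.
a_balances = [bal for _, acct, bal in balances if acct == "A"]
avg_balance_a = sum(a_balances) / len(a_balances)

# Non-additive: a ratio such as unit price is recomputed from additive parts.
sales = [(2, 30.0), (4, 50.0)]             # (quantity, sales_amount), both additive
total_qty = sum(q for q, _ in sales)
total_amount = sum(a for _, a in sales)
avg_unit_price = total_amount / total_qty  # not the sum of per-row prices
```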
27. Dimensional Modeling – Fact Tables
Fact Tables
The center of the star
schema
Based on a business
process
Contains the business
process measures
All measures in the fact are
of the same grain
Fact tables are narrow but
deep
28. Dimensional Modeling – Dim Tables
Dimension Tables
Business entities used to
slice up (determine the
grain of) the facts
Verbose and textual
Should be conformed
across the organization
Wide but shallow
Always use surrogate
keys*
*exception for the Date Dimension
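One way to picture surrogate keys: the warehouse hands out its own integer keys, so identical codes from different source systems never collide. The SurrogateKeyMap class and the source system names below are hypothetical.

```python
class SurrogateKeyMap:
    """Assigns warehouse-owned integer keys to natural keys from source systems."""

    def __init__(self):
        self._keys = {}
        self._next_key = 1

    def key_for(self, source_system, natural_key):
        lookup = (source_system, natural_key)
        if lookup not in self._keys:
            self._keys[lookup] = self._next_key
            self._next_key += 1
        return self._keys[lookup]

customer_keys = SurrogateKeyMap()
k1 = customer_keys.key_for("crm", "C-001")
k2 = customer_keys.key_for("billing", "C-001")  # same code, different system
k3 = customer_keys.key_for("crm", "C-001")      # repeat lookup returns the same key
```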
31. Date Dimension
Special Date Dimension Attributes
In another language
Semester (First Semester, Second
Semester, …)
High Season (Y/N), Low Season (Y/N)
Season (Winter, Spring, Summer, Fall)
Reporting Day (CurrDay, CurrDay-1D,
CurrDay-2D)
Reporting Month (CurrMonth,
CurrMonth-1M, …)
Last Day of Quarter (Y/N)
Last Day of Week (Y/N)
American Holiday (Independence Day,
Christmas, …)
Canadian Holiday
And so many more!
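A date dimension is usually generated rather than extracted. The sketch below builds a few of the attributes listed above. The season boundaries, the Sunday week end, the YYYYMMDD key, and the single hard-coded holiday are all assumptions for illustration.

```python
from datetime import date, timedelta

def season_of(d):
    """Northern Hemisphere seasons by whole month; an assumption for this sketch."""
    return {12: "Winter", 1: "Winter", 2: "Winter",
            3: "Spring", 4: "Spring", 5: "Spring",
            6: "Summer", 7: "Summer", 8: "Summer"}.get(d.month, "Fall")

def date_dimension_row(d):
    next_day = d + timedelta(days=1)
    # The quarter ends when tomorrow falls in a different quarter.
    last_day_of_quarter = (next_day.month - 1) // 3 != (d.month - 1) // 3
    return {
        "date_key": int(d.strftime("%Y%m%d")),   # readable key, e.g. 20240331
        "full_date": d.isoformat(),
        "season": season_of(d),
        "last_day_of_quarter": "Y" if last_day_of_quarter else "N",
        "last_day_of_week": "Y" if d.weekday() == 6 else "N",  # assumes weeks end on Sunday
        "american_holiday": "Independence Day" if (d.month, d.day) == (7, 4) else None,
    }

def build_date_dimension(start, end):
    """One row per calendar day from start to end, inclusive."""
    days = (end - start).days + 1
    return [date_dimension_row(start + timedelta(days=i)) for i in range(days)]

rows = build_date_dimension(date(2024, 3, 30), date(2024, 4, 1))
```

Because every attribute is computed once and stored, report writers filter on plain columns instead of repeating date arithmetic in each query.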
32. Slowly Changing Dimensions
Known as SCDs
Dimensions change, how
do you handle this?
Three Basic Types
•Type 1
•Type 2
•Type 3
Hmmm.... these
are very
descriptive names.
33. Slowly Changing Dimensions (SCDs)
Type 1:
• Do not preserve history
• Overwrite the record
Type 2:
• Preserve all history
• Add a new record, indicate
current version
Type 3:
• Preserve a point-in-time
history
• Add additional column(s)
Type 2
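Types 1 and 3 can be sketched on a single dimension row. The customer record, the city attribute, and the previous_ column naming are hypothetical.

```python
def apply_type1(row, attribute, new_value):
    """Type 1: overwrite in place; the old value is lost."""
    updated = dict(row)
    updated[attribute] = new_value
    return updated

def apply_type3(row, attribute, new_value):
    """Type 3: keep the prior value in an extra 'previous_<attribute>' column."""
    updated = dict(row)
    updated["previous_" + attribute] = row[attribute]
    updated[attribute] = new_value
    return updated

customer = {"customer_key": 1, "customer_id": "C-001", "city": "Chicago"}
type1_row = apply_type1(customer, "city", "Raleigh")
type3_row = apply_type3(customer, "city", "Raleigh")
```

In both cases the surrogate key is unchanged, so existing fact rows now report against the new value.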
34. Slowly Changing Dimensions: Type 2
SCD workhorse approach
When a dimension
attribute changes, add a
new row and update
effective dates
Old fact rows point to the
previous dimension row
New fact rows point to the
current dimension row
You can use a flag too
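A minimal Type 2 sketch, assuming effective-date columns plus a current-row flag. The column names, the 9999-12-31 open end date, and the sample data are assumptions for illustration.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # assumed convention for "still current"

def apply_type2(dimension_rows, natural_key, changes, change_date):
    """Expire the current row for natural_key and append a new current row."""
    current = next(r for r in dimension_rows
                   if r["customer_id"] == natural_key and r["is_current"])
    current["effective_to"] = change_date
    current["is_current"] = False
    new_row = {**current, **changes,
               "customer_key": max(r["customer_key"] for r in dimension_rows) + 1,
               "effective_from": change_date,
               "effective_to": OPEN_END,
               "is_current": True}
    dimension_rows.append(new_row)
    return new_row

dim_customer = [{"customer_key": 1, "customer_id": "C-001", "city": "Chicago",
                 "effective_from": date(2020, 1, 1), "effective_to": OPEN_END,
                 "is_current": True}]
fact_sales = [{"customer_key": 1, "amount": 40.0}]   # loaded before the move

new_row = apply_type2(dim_customer, "C-001", {"city": "Raleigh"}, date(2024, 6, 1))
fact_sales.append({"customer_key": new_row["customer_key"], "amount": 25.0})

# Old facts still resolve to Chicago, new facts to Raleigh.
city_by_key = {r["customer_key"]: r["city"] for r in dim_customer}
sales_by_city = {}
for f in fact_sales:
    city = city_by_key[f["customer_key"]]
    sales_by_city[city] = sales_by_city.get(city, 0.0) + f["amount"]
```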
35. Other types of Dimensions
Rapidly Changing
Dimensions
Mini-dimensions
Degenerate Dimension
Junk Dimension
Outrigger
36. Rapidly Changing Dimensions
AKA: Rapidly Changing
Monster Dimensions
A dimension with
attributes that change
frequently is considered a
rapidly changing
dimension
Produces very large
dimension tables
Cannot be handled with
Type 2 approach (gets
too big)
37. Mini-dimensions
Technique for Rapidly
Changing Monster Dimension
Use mini-dimensions
• Split up the rapidly changing
attributes to a mini-dimension
• Join the mini-dimension to the fact
table
Use banded ranges
• Minimizes rows (no discrete values)
• A significant compromise
Customer Dimension
PK Customer Key
Customer ID
Name
Address
DoB
Date of First Order
-------
Age
Gender
Annual Income
Number of Children
Marital Status
Fact Table
FK1 Customer Key
More Foreign Keys
Facts...
New Customer Dimension
PK Customer Key
Customer ID
Name
Address
DoB
Date of First Order
Customer Demographics Dim
PK Customer Demo Key
Age Band
Gender
Annual Income Band
Num of Children Band
Marital Status
Fact Table 2
FK2 Customer Key
FK3 Customer Demo Key
More Foreign Keys
Facts...
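The banding idea can be sketched as follows. The band boundaries and the demo_key_for helper are assumptions. Only three of the demographic attributes are shown to keep the example short.

```python
def age_band(age):
    if age < 25:
        return "Under 25"
    if age < 45:
        return "25-44"
    if age < 65:
        return "45-64"
    return "65+"

def income_band(income):
    if income < 50_000:
        return "Under 50K"
    if income < 100_000:
        return "50K-99K"
    return "100K+"

demographics_dim = {}   # band combination -> Customer Demo Key

def demo_key_for(age, gender, income):
    """Return the mini-dimension key for a profile, adding a row if it is new."""
    profile = (age_band(age), gender, income_band(income))
    if profile not in demographics_dim:
        demographics_dim[profile] = len(demographics_dim) + 1
    return demographics_dim[profile]

key_a = demo_key_for(31, "F", 62_000)
key_b = demo_key_for(38, "F", 75_000)   # same bands, so the same row is reused
key_c = demo_key_for(52, "F", 75_000)   # new age band, so a new row
```

The compromise is visible here: the exact age and income are no longer available from the mini-dimension, only the band.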
39. Other Dimensions
Degenerate Dimension
A dimension key that
has no attributes.
A dimensional attribute
stored in the fact table
Examples:
Transaction Number
Invoice Number
Line Item Number
Ticket Number
40. Other Dimensions
Junk Dimension
Do you have a drawer in
your kitchen that is a catch
all for stuff that you might
need...the junk drawer?
A collection of low
cardinality flags and
indicators that you might
need.
Examples: Payment Type,
Inbound/Outbound, Order
Type
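One common way to build a junk dimension is to pre-generate every combination of the flags. The value domains below are hypothetical. Only the three attribute names come from the slide.

```python
from itertools import product

# Hypothetical domains for the three flags named on the slide.
payment_types = ["Cash", "Credit", "Check"]
directions = ["Inbound", "Outbound"]
order_types = ["Standard", "Rush"]

# One row per combination, each with its own surrogate key.
junk_dim = [
    {"junk_key": key, "payment_type": p, "direction": d, "order_type": o}
    for key, (p, d, o) in enumerate(
        product(payment_types, directions, order_types), start=1)
]
junk_lookup = {(r["payment_type"], r["direction"], r["order_type"]): r["junk_key"]
               for r in junk_dim}

# The fact row carries a single junk_key instead of three flag columns.
fact_row = {"junk_key": junk_lookup[("Credit", "Outbound", "Rush")], "amount": 99.0}
```

With low cardinality flags the cross product stays small (3 x 2 x 2 = 12 rows here), which is why the technique suits flags and indicators rather than high cardinality attributes.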
41. Other Dimensions
Outrigger
Exception, not the rule!
The start of snowflaking
A secondary dimension table is
connected to a dimension table
(not via a fact).
Human Resource Fact
FK1 Employee Key
More FK
HR Fact 1
HR Fact 2
Employee Dimension
PK Employee Key
Employee Attributes
......
FK1 Emp Skill Key
Employee Skill Group (Outrigger)
PK Emp Skill Key
Emp Skill Description
Emp Skill Category
42. Just the Facts Tables
Home for the numerical measures
Typically Additive
Three types of Fact Tables
Transactional Grain
Periodic Snapshot Grain
Accumulating Snapshot Grain
43. Comparison of Fact Table Types
Characteristic | Transaction Grain | Periodic Snapshot Grain | Accumulating Snapshot Grain
Time period represented | Point in time | Regular, predictable intervals | Indeterminate time span, typically short-lived
Grain | One row per transaction event | One row per period | One row per life
Fact table loads | Insert | Insert | Insert and update
Fact row updates | Not revisited | Not revisited | Revisited whenever activity
Date dimensions | Transaction date | End of period date | Multiple dates for standard milestones
Facts | Transaction activity | Performance for predefined time interval | Performance over finite lifetime
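The "insert and update" behavior of an accumulating snapshot can be sketched with an order pipeline. The milestones (order, ship, delivery) and the days_to_ship lag fact are assumptions for illustration.

```python
from datetime import date

# One row per order "life"; milestone dates start empty and are filled in later.
order_pipeline = {}

def open_order(order_id, order_date):
    """Insert the row when the order first appears."""
    order_pipeline[order_id] = {"order_date": order_date, "ship_date": None,
                                "delivery_date": None, "days_to_ship": None}

def record_milestone(order_id, milestone, when):
    """Revisit the existing row instead of inserting a new one."""
    row = order_pipeline[order_id]
    row[milestone] = when
    if milestone == "ship_date":
        row["days_to_ship"] = (when - row["order_date"]).days

open_order("SO-1", date(2024, 6, 1))
record_milestone("SO-1", "ship_date", date(2024, 6, 4))
record_milestone("SO-1", "delivery_date", date(2024, 6, 7))
```

A transaction grain table would instead hold three separate rows for the same order, one per event.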
44. What makes it Enterprise?
Conformed Dimensions & Facts
Common fields across the enterprise domains
Common definition across the enterprise domains
The Bus Architecture
Allows traversing across business processes
Promotes conformity
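Traversing business processes is often called drilling across. A minimal sketch, assuming a product dimension shared by hypothetical sales and inventory facts:

```python
# Two business processes share one conformed product dimension.
dim_product = {1: "Books", 2: "Games"}    # product_key -> category
fact_sales = [(1, 10), (2, 4), (1, 5)]    # (product_key, units_sold)
fact_inventory = [(1, 40), (2, 6)]        # (product_key, units_on_hand)

def by_category(fact_rows):
    """Aggregate one fact table by the conformed category attribute."""
    totals = {}
    for product_key, measure in fact_rows:
        category = dim_product[product_key]
        totals[category] = totals.get(category, 0) + measure
    return totals

sold = by_category(fact_sales)
on_hand = by_category(fact_inventory)

# Drill across: query each fact separately, then merge on the conformed row header.
drill_across = {c: {"units_sold": sold.get(c, 0), "units_on_hand": on_hand.get(c, 0)}
                for c in sorted(set(sold) | set(on_hand))}
```

The merge only works because both facts use the same keys and the same definition of category. Without conformity the row headers would not line up.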
46. Dimensional Modeling Embellishments
Snowflaking
Normalizing a dimension
table
OLTP modeler tendency
Not optimal for query
performance
Outriggers
A dimension table is
referenced in another
dimension (i.e. hire date
example)
Bridges
Many to many
relationships not resolved
in fact tables
Sits between a dimension
and a fact
Ragged and variable
depth hierarchies
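A bridge can be sketched with a hypothetical banking case in which one account has several owners. The weighting factor column is a common companion technique and is an assumption here, not something stated on the slide.

```python
# Hypothetical: a bank account can have several owners.
fact_balance = [{"account_group_key": 1, "balance": 1000.0},
                {"account_group_key": 2, "balance": 500.0}]

# Bridge rows: (account_group_key, customer_key, weight).
# Weights sum to 1 within each group so totals are not double-counted.
bridge = [
    (1, 101, 0.5),
    (1, 102, 0.5),
    (2, 101, 1.0),
]

balance_by_customer = {}
for fact in fact_balance:
    for group_key, customer_key, weight in bridge:
        if group_key == fact["account_group_key"]:
            balance_by_customer[customer_key] = (
                balance_by_customer.get(customer_key, 0.0) + fact["balance"] * weight)
```

The allocated balances still add up to the 1500.0 held in the fact table, which would not be true if each owner were credited with the full balance.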
47. Snowflaking
What is Snowflaking?
Normalizing in a star
schema
Should be avoided
• Adds complexity to
presentation layer
• SQL is more complex
• Adds burden to database optimizers
• Very little space savings
• Impacts Bitmap indexes*
*good for low cardinality fields
Sometimes OK (Outriggers for low cardinality attributes)
49. DW Tips: Dimensional Modeling Myths
Dimensional data warehouses
are appropriate for summary
level data only
Dimensional models
presuppose the business
questions and therefore are
inflexible
Dimensional models are
departmental
Bringing a new data source into
a dimensional data warehouse
breaks existing schemas and
requires new fact tables
A good way to narrow the
scope and manage risk is to
focus on delivering the report
most often requested
Dimensional models are fully
de-normalized
Ralph Kimball invented the fact
and dimension terminology
Kimball University White Paper
50. DW Tips: 10 Essential Dim Mod Rules
Load detailed atomic data into
dimensional structures
Structure dimensional models
around business processes
Ensure every fact table has a
date dimension table
Ensure all facts in a Fact table
are the same grain
Resolve many-to-many
relationships in fact tables
Resolve many-to-one
relationships in dimension
tables
Store report labels and filter
domain values in dimension
tables
Dimension tables should use
surrogate keys
Create conformed dimensions
to integrate data across the
enterprise
Continuously balance
requirements and realities to
deliver a DW/BI solution that’s
accepted by business users
and that supports their
decision making
Kimball University Article, Margy Ross, InformationWeek
51. Thank You
Future Webinars
The ETL Process
Stars in Motion
Columnar and In-memory
databases
Modeling Business Process
• Retail Sales
• Inventory
• CRM
• HR