Synopsis:
[Video link: http://www.youtube.com/watch?v=ZNrTxSU5IQ0 ]
Jim Stagnitto and John DiPietro of consulting firm a2c will discuss Agile Data Warehouse Design - a step-by-step method for data warehousing / business intelligence (DW/BI) professionals to better collect and translate business intelligence requirements into successful dimensional data warehouse designs.
The method utilizes BEAM✲ (Business Event Analysis and Modeling) - an agile approach to dimensional data modeling that can be used throughout analysis and design to improve productivity and communication between DW designers and BI stakeholders. BEAM✲ builds upon the body of mature "best practice" dimensional DW design techniques, and collects "just enough" non-technical business process information from BI stakeholders to allow the modeler to slot their business needs directly and simply into proven DW design patterns.
BEAM✲ encourages DW/BI designers to move away from the keyboard and their entity relationship modeling tools and begin "white board" modeling interactively with BI stakeholders. With the right guidance, BI stakeholders can and should model their own BI data requirements, so that they can fully understand and govern what they will be able to report on and analyze.
The BEAM✲ method is fully described in Agile Data Warehouse Design - a text co-written by Lawrence Corr and Jim Stagnitto.
About the speaker:
Jim Stagnitto, Director of a2c Data Services Practice
Data Warehouse Architect: specializing in powerful designs that extract the maximum business benefit from Intelligence and Insight investments.
Master Data Management (MDM) and Customer Data Integration (CDI) strategist and architect.
Data Warehousing, Data Quality, and Data Integration thought-leader: co-author with Lawrence Corr of "Agile Data Warehouse Design", guest author of Ralph Kimball’s “Data Warehouse Designer” column, and contributing author to Ralph and Joe Caserta's latest book: “The DW ETL Toolkit”.
John DiPietro, Chief Technology Officer at A2C IT Consulting
John DiPietro is the Chief Technology Officer for a2c. Mr. DiPietro is responsible
for setting the vision, strategy, delivery, and methodologies for a2c’s Solution
Practice Offerings for all national accounts. The a2c CTO brings with him an
expansive depth and breadth of specialized skills in his field.
Sponsor Note:
Thanks to:
Microsoft NERD for providing an awesome venue for the event.
http://A2C.com IT Consulting for providing the food and drinks.
http://Cognizeus.com for providing a book to give away in the raffle.
2. Agenda
• Introduction / a2c Overview
• Modeling for End Users
• Role of Dimensional Models in Big Data
• Example: eCommerce
  • Structured Data: Sales
  • Semi-structured Data: Clickstream
• Agile Dimensional Modeling Overview
• Case Study Review
• Q&A
3. Introduction
• a2c
  • Boutique EDM (Enterprise Data Management) consultancy firm:
    • Data Warehousing
    • Master Data Management
    • Closed Loop Analytics and Visualization
    • Data & Application Architecture
• John DiPietro
  • Principal, Chief Technology Officer
• Jim Stagnitto
  • Data Warehouse & MDM Architect
5. Company Overview
• Technology Solution Consultancy headquartered in Philadelphia with regional offices in New York and Boston
• Servicing Healthcare, Life Science, Telecom and Financial Services industries, and recently obtained our GSA schedule to pursue Federal Government opportunities
• Consultant base of over 2500 proven IT professionals throughout the Northeast region with a recruiting network which provides national coverage
• Flexible approach to helping our clients with their initiatives
  • Project-based Solutions
  • Staff Augmentation
  • Managed Service Offerings – “On-Shore QA, Development & Application Support”
  • Executive & Professional Search
6. Competitive Advantage
• Founders of a2c were part of the fastest growing privately held IT consulting and staff augmentation firm in the US from 1994-2002. Our Executive Management Team has over 100 years of collective experience and has been responsible for delivering over a half-billion dollars of IT Consulting and staff augmentation revenue from 1994 through to the present day.
• a2c’s Recruiting Engine and Methodology is one of the best in the industry, capable of producing quality results, on-demand for our clients
• Resource Managers continually “Silo” disciplines with available candidates who have proven their abilities with us over the last 10 years
• Our solutions organization is instrumentally involved during the screening and selection process to ensure that candidates submitted to our clients are an ideal match
• a2c’s Culture provides an ability to attract and retain the best talent in the industry and fosters creativity, integrity, growth and teamwork
• a2c provides our clients with an alternative solution to a “Big 4” consultancy at substantial savings for projects that are between $500K and $5M due to our flexibility, agility and focus
9. a2c Solutions Capabilities
• Enterprise Data Management Practice helps clients manage their complete Information Lifecycle from their On-line Transactional systems to their Data Warehousing, Enterprise Reporting, Data Migration, Back-Up and Recovery Strategies (See Slide 7)
• Business Architecture & Optimization Practice utilizes “Six Sigma Lean” methodologies to analyze, re-engineer and automate our clients’ business processes, leveraging human workflow and business rules engine technologies to create efficiencies and provide business unit owners with the necessary metrics to continually improve performance
• Program Management Office oversees all aspects of solutions planning and delivery across client engagement teams and provides the methodology and frameworks, which are based on PMI® industry standards
• Application Development & Managed Services Practice helps clients architect, implement and deploy the latest Microsoft and Enterprise Java based applications, which are built on proven frameworks and architectures for the enterprise
• a2c's SDLC Delivery Model comprises over 20 years of collective best practices and industry proven methodologies that allow our delivery teams to rapidly design, develop and implement solutions. Our SDLC model has been designed to complement our project management methodology, utilizing iterative development cycles that enable project teams to provide consistently high quality, on-time deliverables, regardless of technology platform
11. Modeling for End Users
• How to Design to Answer Business Questions?
  • Think about how questions are articulated
  • And how the answers should be delivered
• Identify a common question framework
• Design an architecture that embraces and leverages this common question framework
• Utilize the best designs and technologies to:
  • (a) derive the answers
  • (b) present them in compelling ways that lead to the next interesting question!
12. How Do We Ask Questions?
“How do this quarter’s sales by sales rep of electronic products that we promoted to retail customers in the east compare with last year’s?”
[Slide callouts label the phrases of this question with interrogatives: Who, What, When, Where, Why]
13. How Do We Ask Questions?
• Events / Transactions
  • e.g. Sale
  • an immutable "fact" that occurs at a time and (typically) a place
• Interrogatives:
  • Who, What, When, Where, Why
  • Descriptive context that fully describes the event
  • a set of "dimensions" that describe events
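As a rough illustration of the event-plus-interrogatives idea, the sketch below models a single sale as an immutable record whose descriptive fields answer who, what, when, where and why. The class name, field names and sample values are assumptions made for this example and do not come from the presentation.

```python
from dataclasses import dataclass, asdict
from datetime import date


@dataclass(frozen=True)  # frozen: once recorded, the event cannot be changed
class SaleEvent:
    who: str        # customer (hypothetical value below)
    what: str       # product
    when: date      # date of the sale
    where: str      # region or store
    why: str        # promotion or other causal factor
    quantity: int   # measure
    revenue: float  # measure


sale = SaleEvent(
    who="Retail customer",
    what="Headphones",
    when=date(2013, 11, 5),
    where="East",
    why="Holiday promotion",
    quantity=2,
    revenue=59.98,
)

# The interrogative fields are the "dimensions"; the rest are measures.
dimensions = {k: v for k, v in asdict(sale).items() if k in ("who", "what", "when", "where", "why")}
```

Marking the dataclass as frozen is one simple way to express that a recorded event is a fact that should not be edited afterwards.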
14. Dimensional Value Proposition
• It makes sense to present answers to people using the same taxonomy of events and interrogatives (aka: facts and dimensions - dimensional structure) that they use when forming questions
• Events are instances of processes
• It’s best to present information to people who will ask the system questions in dimensional form
• This is true regardless of the type of information being interrogated, its source, or IT stuff (like database technologies utilized)
• It’s best to model this presentation layer based on the events (aka: business processes) that underlie the questions
16. Scenarios
• A brief discussion of how and where dimensional modeling and/or databases fit within common and emerging “big data” data warehousing architectures
17. Kimball Dimensional DW
[Layer diagram, top to bottom]
Dimensional BI Semantic Layer
Dimensional Data Warehouse
Data Movement / Integration
Source Data (Structured)
18. Kimball with Big Data
[Layer diagram, top to bottom; "|" separates side-by-side boxes]
Dimensional BI Semantic Layer
Dimensional Data Warehouse
Big Data Capture (e.g. HDFS) | Big Data Discovery (e.g. MR)
Data Movement / Integration Tier | Data Movement / Integration Tier
Source Data Tier (Un/Semi-Structured) | Source Data Tier (Structured)
19. Corporate Information Factory (CIF)
[Layer diagram, top to bottom]
Dimensional BI Semantic Layer
Dimensional Tier (Virtual or Physical)
Corporate Information Factory 3NF DW
Data Movement / Integration
Source Data (Structured)
20. CIF with Big Data
[Layer diagram, top to bottom; "|" separates side-by-side boxes]
Dimensional BI Semantic Layer
Dimensional Tier (Virtual or Physical)
Big Data Capture (e.g. HDFS) | Big Data Discovery (e.g. MR) | Corporate Information Factory 3NF DW
Data Movement / Integration Tier | Data Movement / Integration Tier
Source Data Tier (Un/Semi-Structured) | Source Data Tier (Structured)
21. Data Vault
[Layer diagram, top to bottom]
Dimensional BI Semantic Layer
Dimensional Tier (Virtual or Physical)
Data Vault
Data Movement / Integration
Source Data (Structured)
22. Data Vault with Big Data
[Layer diagram, top to bottom; "|" separates side-by-side boxes]
Dimensional BI Semantic Layer
Dimensional Tier (Virtual or Physical)
Big Data Capture (e.g. HDFS) | Big Data Discovery (e.g. MR) | Data Vault
Data Movement / Integration Tier | Data Movement / Integration Tier
Source Data Tier (Un/Semi-Structured) | Source Data Tier (Structured)
24. Common Framework
[Layer diagram, top to bottom; "|" separates side-by-side boxes]
Dimensional BI Semantic Layer
Dimensional Tier [Physical (Kimball) or Virtual (CIF or Data Vault)]
Persistent Un/Semi-Structured Staging Area | Unstructured -> Structured Data Discovery Processing | Persistent Structured Data Repository (not needed for Kimball)
Un/Semi-Structured Data Movement | Structured Data Movement
Un/Semi-Structured Source Data | Structured Source Data
Side box: Insight Generation / Data Mining
25. Common Framework
[Same layer diagram as slide 24, annotated with two zones and an eCommerce example]
Dining Room
• Readily Accessible to End Users (and BI Developers)
• Safe, Hospitable Environment
• Data Assets “Ready for Primetime”
• Dimensionally Structured
• Layers: Dimensional BI Semantic Layer; Dimensional Tier [Physical (Kimball) or Virtual (CIF or Data Vault)]
Kitchen
• Off Limits to End Users
• Data Professionals Only Please
• Dangerous / Inhospitable Environment
• Data Assets “Not Ready for Primetime”
• Structured Variably For Data Processing
• Layers: Persistent Un/Semi-Structured Staging Area; Unstructured -> Structured Data Discovery Processing; Persistent Structured Data Repository (not needed for Kimball); Un/Semi-Structured Data Movement; Structured Data Movement; Un/Semi-Structured Source Data; Structured Source Data
eCommerce Example
• Clickstream Data
• eCommerce Sale
32. I keep six honest serving-men
(They taught me all I knew);
Their names are What and Why and When
And How and Where and Who…
–Rudyard Kipling
43. DW Architectures: A Brief History
• Corporate Information Factory: Data-Driven Analysis
• Undisciplined Dimensional: Report-Driven Analysis
• Dimensional Bus Architecture: Process-Driven Analysis
44. 7Ws Dimensional Model
[Star diagram: facts in the center, surrounded by dimension types]
• How – Facts: Much, Many, Often, £$€
• When: Time, Day, Month, Fiscal Period
• Who: Customer, Employee, Third Party, Organization
• Where: Location, Store, Ship To, Hospital, Geographic
• What: Product, Service, Transactions
• Why: Causal, Promotion, Reason, Weather, Competition
47. Tech Design Artifacts?
[Star schema diagram: four dimension tables around one fact table]
CALENDAR: Date Key, Date, Day, Day in Week, Day in Month, Day in Qtr, Day in Year, Month, Qtr, Year, Weekday Flag, Holiday Flag
PRODUCT: Product Key, Product Code, Product Description, Product Type, Brand, Subcategory, Category
SALES FACT: Date Key, Product Key, Store Key, Promotion Key, Quantity Sold, Revenue, Cost, Basket Count
STORE: Store Key, Store Code, Store Name, URL, Store Manager, Region, Country
PROMOTION: Promotion Key, Promotion Code, Promotion Name, Promotion Type, Discount Type, Ad Type
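To make the star schema above more concrete, here is a minimal sketch using Python's built-in sqlite3 module. It creates trimmed-down versions of the five tables and answers a question like the one on slide 12 (electronics revenue in the East, this year versus last). The column subset, the sample rows and the choice of SQLite are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calendar  (date_key INTEGER PRIMARY KEY, calendar_date TEXT, qtr TEXT, year INTEGER);
CREATE TABLE product   (product_key INTEGER PRIMARY KEY, product_code TEXT, category TEXT);
CREATE TABLE store     (store_key INTEGER PRIMARY KEY, store_name TEXT, region TEXT);
CREATE TABLE promotion (promotion_key INTEGER PRIMARY KEY, promotion_name TEXT);
CREATE TABLE sales_fact (
    date_key      INTEGER REFERENCES calendar(date_key),
    product_key   INTEGER REFERENCES product(product_key),
    store_key     INTEGER REFERENCES store(store_key),
    promotion_key INTEGER REFERENCES promotion(promotion_key),
    quantity_sold INTEGER,
    revenue       REAL,
    cost          REAL
);
""")

# Hypothetical sample rows, invented for this sketch.
conn.executemany("INSERT INTO calendar VALUES (?, ?, ?, ?)",
                 [(1, "2012-11-05", "Q4", 2012), (2, "2013-11-05", "Q4", 2013)])
conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                 [(1, "P-100", "Electronics"), (2, "P-200", "Grocery")])
conn.executemany("INSERT INTO store VALUES (?, ?, ?)",
                 [(1, "Boston Store", "East"), (2, "Denver Store", "West")])
conn.executemany("INSERT INTO promotion VALUES (?, ?)",
                 [(1, "Holiday Promo"), (2, "No Promotion")])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, 1, 1, 1, 2, 100.0, 60.0),   # 2012, Electronics, East
    (2, 1, 1, 1, 3, 150.0, 90.0),   # 2013, Electronics, East
    (2, 1, 2, 1, 1, 50.0, 30.0),    # 2013, Electronics, West
    (2, 2, 1, 2, 5, 25.0, 10.0),    # 2013, Grocery, East
])

# Dimensions supply the filters and grouping; the fact table supplies the measure.
rows = conn.execute("""
    SELECT c.year, SUM(f.revenue)
    FROM sales_fact f
    JOIN calendar c ON c.date_key = f.date_key
    JOIN product  p ON p.product_key = f.product_key
    JOIN store    s ON s.store_key = f.store_key
    WHERE p.category = 'Electronics' AND s.region = 'East' AND c.qtr = 'Q4'
    GROUP BY c.year
    ORDER BY c.year
""").fetchall()
```

Each interrogative in the business question maps to a join against one dimension table, which is why the query reads much like the question itself.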
50. Waterfall BI/DW
[Timeline diagram spanning This Year and Next Year]
Phases: Analysis, Design, Development, Test, Release
Deliverables shown: Stakeholder Requirements Input, BDUF Data Model, ETL, BI
Labels: Limited Stakeholder interaction; DATA; VALUE?
51. Agile DW/BI Development
[Timeline diagram spanning This Year and Next Year]
Iterations: Iteration 1, Iteration 2, Iteration 3, Iteration …, Iteration n
Steps shown: JEDUF, ADM, BI Prototyping, ETL, BI, Review, Release
Labels: Stakeholder interaction; DATA; VALUE?, VALUE, VALUE!
52. State of The DW Field
Solid:
• Dimensional Data Warehouse Design is Mature
• Proven Design Patterns Exist for Common Requirements
Hit or Miss:
• Collecting Unambiguous and Thorough Requirements
• Slotting Requirements into Proven Design Patterns
• End-User Ownership and Validation
Too Often: Snatching Defeat from the Jaws of Victory
54. BEAM✲ Methodology
Structured, non-technical, collaborative working conversation directly with BI Users
Participants: Data Modeler, BI Stakeholders
Flowing into BEAM✲: BI User’s Business Process, Organizational, Hierarchical, and Data Knowledge
Flowing out of BEAM✲:
• Focused Data Profiling
• Logical and Physical (Kimball-esque) Dimensional Data Models
• Example data
• Detailed and Testable ETL Specification
• Instantiated DW Prototype
57. Agile Data Modeling Requirements
• Techniques for encouraging interaction
• Must use simple, inclusive notation and tools
• Must be quick: hours rather than days – modelstorming
• Balance ‘just in time’ (JIT) and ‘just enough design up front’ (JEDUF) to reduce design rework
• DW designers must embrace data model change, allow models to evolve, avoid generic data models; need design patterns they can trust to represent tomorrow’s BI requirements tomorrow
• ETL and BI developers must embrace database change; need tool support
60. [Star schema diagram repeated from slide 47: CALENDAR, PRODUCT, STORE and PROMOTION dimension tables around SALES FACT]
65. Collaborative / Conversational Design
BEAM✲ Modeler: “Who does what?”
BI Users: “Customers buy products”
Pattern: Subjects - Verb - Objects
66. Design Using Natural Language
• Verbs – Events – Relationships – Fact Tables
• Nouns – Details – Entities – Dimensions
• Main Clause – Subject-Verb-Object
• Prepositions – connect additional details to the main clause
• Interrogatives – The 7Ws – Dimension Types
• Business Vocabulary - no IT-Speak
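The mapping above from grammar to design can be sketched in a few lines of Python. The function below splits a three-word main clause such as "Customers buy products" into subject, verb and object, treats the verb as the event (a fact table candidate) and the nouns as dimension candidates, and lets prepositions attach further details. The function names and the dictionary layout are assumptions for this sketch, and the three-word split is deliberately naive; it is not the BEAM✲ notation itself.

```python
def parse_main_clause(clause: str) -> dict:
    """Split a three-word subject-verb-object clause into model candidates."""
    words = clause.strip().rstrip(".").split()
    if len(words) != 3:
        raise ValueError("expected a three-word subject-verb-object clause")
    subject, verb, obj = words
    return {
        "event_verb": verb.lower(),                    # verb -> event -> fact table
        "dimensions": [subject.upper(), obj.upper()],  # nouns -> dimensions
        "details": [],                                 # filled in via prepositions
    }


def add_detail(model: dict, preposition: str, detail: str, interrogative: str) -> None:
    """Attach an extra detail to the main clause, tagged with its 7W type."""
    model["details"].append((preposition, detail, interrogative))


model = parse_main_clause("Customers buy products")
add_detail(model, "on", "ORDER DATE", "when")
add_detail(model, "at", "STORE", "where")
```

In a real modelstorming session the stakeholders supply these sentences in their own vocabulary; the code only shows how little structure is needed to start a dimensional design from them.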
69. Capture Example Data
Column pattern: SUBJECT verb OBJECT on/at/every EVENT DATE
Example data themes by column type ("|" separates columns):
Theme | [who] | [what] | [when] | [where] | [how many] | [why] | [how]
Typical | Typical | Typical/Popular | Typical | Typical | Typical/Average | Typical/Normal | Typical/Normal
Different | Different | Different | Different | Different | Different | Different | Different
Repeat | Repeat | Repeat | Repeat | Repeat | Repeat | Repeat | Repeat
Missing | Missing | Missing | Missing | Missing | Missing | Missing | Missing
Partially recoverable rows (column alignment not fully clear in the source):
• Group, Multiple/Bundle
• Old, Low | Old, Low Value | Oldest needed | Near | Min, Negative, 0
• New, High | New, High | Most Recent, Future | Far | Max, Precision
• Multi-Level; Multiple Values; Exceptional; Exceptional
Purposes of example data:
• Engage business users
• Clarify definitions / Conform Dimensions
• Illustrate exceptions
• Drive out uniqueness
• “Show and tell”
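One of the purposes listed above, driving out uniqueness, can be illustrated with a small check over example rows: find the smallest combinations of columns whose values distinguish every row. The rows below deliberately include "repeat" values, as the themes suggest. The function name, column names and sample data are assumptions for this sketch.

```python
from itertools import combinations


def unique_column_sets(rows, columns):
    """Return the smallest column combinations whose values are unique across rows."""
    for size in range(1, len(columns) + 1):
        found = [
            combo
            for combo in combinations(columns, size)
            if len({tuple(row[c] for c in combo) for row in rows}) == len(rows)
        ]
        if found:
            return found
    return []


columns = ["customer", "product", "order_date"]
example_rows = [
    {"customer": "Alice", "product": "Laptop", "order_date": "2013-11-01"},  # typical
    {"customer": "Alice", "product": "Laptop", "order_date": "2013-11-02"},  # repeat who and what
    {"customer": "Bob",   "product": "Laptop", "order_date": "2013-11-01"},  # different who
    {"customer": "Alice", "product": "Phone",  "order_date": "2013-11-01"},  # different what
]

candidate_keys = unique_column_sets(example_rows, columns)
```

With these rows no single column or pair of columns is unique, so the check points to all three columns together, which is the kind of conclusion the repeat examples are meant to provoke in conversation with stakeholders.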
77. Model How Many Measures
• Additive – can be summed over any combination of dimensions. No special rules
• Non-additive – cannot be summed over any dimension, e.g. unit price or temperature
  • Must be aggregated in other ways, e.g. average, min, max
  • Degenerate Dimensions – transaction #, timestamps, flags
• Semi-additive – cannot be summed across at least one dimension, e.g. balances cannot be summed over time
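The three kinds of measures can be contrasted with a short Python sketch. Deposits are treated as additive, account balances as semi-additive (summable across accounts, but not over time), and unit price as non-additive. The sample figures and variable names are invented for illustration.

```python
from statistics import mean

# Hypothetical monthly account snapshots.
snapshots = [
    {"account": "A", "month": "Jan", "balance": 100.0, "deposits": 100.0},
    {"account": "A", "month": "Feb", "balance": 150.0, "deposits": 50.0},
    {"account": "B", "month": "Jan", "balance": 200.0, "deposits": 200.0},
    {"account": "B", "month": "Feb", "balance": 180.0, "deposits": 0.0},
]

# Additive: deposits can be summed over every dimension (accounts and months).
total_deposits = sum(row["deposits"] for row in snapshots)

# Semi-additive: balances can be summed across accounts within one month...
feb_total_balance = sum(row["balance"] for row in snapshots if row["month"] == "Feb")

# ...but not over time; across months, use an average (or the closing value) instead.
avg_balance_a = mean(row["balance"] for row in snapshots if row["account"] == "A")

# Non-additive: unit prices are never summed; aggregate with average, min or max.
unit_prices = [10.0, 20.0, 30.0]
avg_unit_price = mean(unit_prices)
```

Summing account A's balances over January and February would give 250.0, a figure that corresponds to no real amount of money, which is why the time dimension needs a different aggregation.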
87. Recap
• Collaborative and Agile
  • Data Sourcing
  • Data Modeling
  • Data Conformance
• Requirements = Design
  • Slots directly into proven and mature dimensional data warehousing design patterns
• Validation through Prototyping
  • Semi-automated build of dimensional data warehouse
• Perfect complement to Agile BI Tools and Methods (e.g. Pentaho)
88. If you have been affected by
any of the issues raised
in this presentation
89. Agile Data Warehouse Design
Lawrence Corr, Jim Stagnitto, Decision Press, November 2011