Why create a Data Mart with Dimensional Fact Model
1. Why create a Data Mart with Dimensional Fact Model
Almeria, 3/3/2017
Stefano Cazzella @StefanoCazzella
http://caccio.blogdns.net
http://bimodeler.com
stefano.cazzella{at}gmail.com
2. BI Trends
BI ACADEMY Conference - Almeria, 3/3/2017 - Stefano Cazzella 2
[Chart: Business Value over Time. Stages: Data Integration, Descriptive, Predictive, Prescriptive, Deep learning. Labels: Data Warehouse, Business Intelligence, Simulation & forecasting, Optimization & automation, Semantic & AI.]
Digital transformation of every market
Data explosion: exponential growth of digital data
3. The Data Mart role in the info pipeline
[Diagram: information pipeline. Sources: IoT, Social Media, Corporate Information Systems. Stores: Data Lake (structured & unstructured data), Enterprise Data Warehouse (structured & summarized data), Data Marts (star schemas), Data Labs (analytical tables). Consumers: Corporate BI, Self-service BI & Data Discovery, Machine Learning. Callout: driven by business user's needs.]
4. New processes? Roles?
Waterfall process: Business, Design, Build
Iterative process: Business, Design, Build
Roles: Business Analyst, Engineer, Technician, Data Scientist
5. Project Layers for Data Mart
Layers: Business, Design, Build
[Diagram: the three layers compared between civil engineering and software engineering.]
6. Why Dimensional Fact Model?
• Formal language: well-specified syntax and sound algebraic definition
• Simple and effective graphical notation (representation)
• Does not imply any technical/implementation choice
• Specifically designed to represent multi-dimensional models
7. Multi-dimensional model
The SALES event: on Nov. 25th, 2014, Store 2 sold 10 pieces of Product X for a total revenue of € 220.
[Diagram: 3-dimensional SALES hyper-space. Dimensions: Product (Product X, Product Y, Product Z), Store (Store 1, Store 2, Store 3), Day. Measures in the highlighted cell: Units sold: 10 pieces; Revenue: € 220. Hierarchy [State – City – Store]: State XYZ, City ABC, City DEF.]
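The SALES example can be sketched in a few lines of Python: each cell of the hyper-space is addressed by its dimension values and holds the measures, and the hierarchy lets the measures be rolled up from Store to City. This is only an illustration. The store-to-city assignment and every record except the Product X / Store 2 event are assumptions, since the slide does not give them.

```python
from collections import defaultdict

# Hypothetical assignment for the [State - City - Store] hierarchy; the slide
# names the cities and the state but not which store belongs to which city.
STORE_TO_CITY = {"Store 1": "City ABC", "Store 2": "City ABC", "Store 3": "City DEF"}

# Each cell is addressed by (product, store, day) and holds the measures
# (units_sold, revenue).
sales = {
    ("Product X", "Store 2", "2014-11-25"): (10, 220.0),  # the event on the slide
    ("Product X", "Store 1", "2014-11-25"): (5, 110.0),   # invented for illustration
    ("Product Y", "Store 3", "2014-11-25"): (4, 60.0),    # invented for illustration
}

def roll_up_to_city(cube):
    """Aggregate both measures from Store level to City level by summing."""
    totals = defaultdict(lambda: [0, 0.0])
    for (product, store, day), (units, revenue) in cube.items():
        key = (product, STORE_TO_CITY[store], day)
        totals[key][0] += units
        totals[key][1] += revenue
    return {key: tuple(value) for key, value in totals.items()}

by_city = roll_up_to_city(sales)
```

Summation works here because both measures are additive along the Store hierarchy; a measure such as an average would need a different aggregation rule.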
9. Data Mart building process
• Business user's needs: requirements definition produces the multidimensional data model (Dimensional Fact Model)
• Implementation strategy: model transformation produces the logical data model (relational model: tables, columns, etc.)
• Technical knowledge: model transformation produces the physical data model (DDL with indexes, partitions, etc.)
• Deployment produces the Data Mart
11. Business - From requirement to DFM
• Context: weblog analytics, i.e. the analysis of visits to several websites belonging to different domains (e.g. Google Analytics)
• Requirement: monitor and analyze the number of visits and their monthly and daily average duration for each page of the websites, or for each domain, distributed by the geographic region of the visitors' IP addresses.
[Diagram: DFM fact schema, plus domain definition, aggregation rules and optional dependencies.]
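To make the requirement concrete, the fact schema it implies can be written down as plain data, independent of any DBMS. The sketch below is an assumption about what the DFM could contain: the class names, the level names (Page, Domain, Day, Month, IP, Region) and the measure names are hypothetical and are not taken from the slide's diagram.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Measure:
    name: str
    domain: str        # business domain, not yet a SQL data type
    aggregation: str   # aggregation rule applied when rolling up

@dataclass(frozen=True)
class Hierarchy:
    name: str
    levels: tuple      # finest level first, coarsest last

@dataclass
class FactSchema:
    name: str
    measures: list = field(default_factory=list)
    hierarchies: list = field(default_factory=list)

    def grain(self):
        """The fact grain is the finest level of every hierarchy."""
        return tuple(h.levels[0] for h in self.hierarchies)

# Hypothetical reading of the weblog requirement.
visits = FactSchema(
    name="VISITS",
    measures=[
        Measure("Number of visits", "Number", "SUM"),
        Measure("Duration", "Number", "AVG"),
    ],
    hierarchies=[
        Hierarchy("Page", ("Page", "Domain")),
        Hierarchy("Time", ("Day", "Month")),
        Hierarchy("Viewer", ("IP", "Region")),
    ],
)
```

Nothing in this description mentions tables, keys or data types, which is the point of keeping the business layer separate from the design and build layers.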
12. Design choice
Reference ROLAP model:
• Star schema (denormalized dimension tables)
• Snowflake (hierarchies implemented by tables in 3NF)
Hierarchy implementation strategy (for every dimension):
• Use natural key (the dimension attribute PK column)
• Use surrogate key (add a new column with no business meaning)
• Use slowly changing dimension (SCD) of type 2
• Use implicit dimension (no dimension table, only a column in the fact table)
Domain / data type association:
• Text → VARCHAR(250); Currency → NUMBER(9,2); etc.
Standard naming conventions and abbreviations:
• Table name prefix (D for Dimensions, F for Facts); Number → NBR; etc.
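The last two design choices are simple lookups, so they are easy to sketch. The example below uses only the mappings named on the slide (Text, Currency, the D/F prefixes, Number as NBR); the function names and the underscore separator are assumptions made for illustration.

```python
# Domain -> data type association, as listed on the slide.
DOMAIN_TO_TYPE = {"Text": "VARCHAR(250)", "Currency": "NUMBER(9,2)"}

# Naming convention: table prefixes and standard abbreviations from the slide.
PREFIX = {"dimension": "D", "fact": "F"}
ABBREVIATIONS = {"NUMBER": "NBR"}

def column_name(business_name):
    """Upper-case the business name and apply the standard abbreviations."""
    words = [ABBREVIATIONS.get(w, w) for w in business_name.upper().split()]
    return "_".join(words)  # underscore separator is an assumption

def table_name(kind, business_name):
    """Prefix the table name with D (dimension) or F (fact)."""
    return f"{PREFIX[kind]}_{column_name(business_name)}"

def column_type(domain):
    """Translate a business domain into the associated SQL data type."""
    return DOMAIN_TO_TYPE[domain]
```

With these rules a fact named "Visits" becomes F_VISITS and a measure named "Number of visits" becomes NBR_OF_VISITS.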
13. Transform DFM into a Relational Model
Technical design choices:
• Reference ROLAP model: star schema
• Hierarchy Viewer: use surrogate key
• Hierarchy Page: SCD type 2
• Hierarchy Time: denormalized with natural key
[Diagram: model transformation from the DFM to the relational model, with callouts for the fact grain, the surrogate key, and the SCD-2 start date and end date columns.]
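The SCD type 2 choice for the Page hierarchy means that a changed page is not overwritten: the current row is closed with an end date and a new row with a new surrogate key is added. A minimal in-memory sketch follows, assuming hypothetical column names (page_sk, page_url, title) and a far-future date as the marker for the current version.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # assumed marker for "still valid"

def apply_scd2(rows, page_url, new_attrs, change_date):
    """Close the current version of page_url (if it changed) and append a new one.

    rows is a list of dicts and is modified in place. Validity is the
    half-open interval [start_date, end_date).
    """
    current = next(
        (r for r in rows if r["page_url"] == page_url and r["end_date"] == OPEN_END),
        None,
    )
    if current is not None:
        if all(current[k] == v for k, v in new_attrs.items()):
            return rows  # nothing changed, keep the current version
        current["end_date"] = change_date
    next_key = max((r["page_sk"] for r in rows), default=0) + 1  # surrogate key
    rows.append({
        "page_sk": next_key,
        "page_url": page_url,  # natural key
        **new_attrs,
        "start_date": change_date,
        "end_date": OPEN_END,
    })
    return rows

dim_page = []
apply_scd2(dim_page, "/home", {"title": "Home"}, date(2017, 1, 1))
apply_scd2(dim_page, "/home", {"title": "Welcome"}, date(2017, 3, 3))
apply_scd2(dim_page, "/home", {"title": "Welcome"}, date(2017, 4, 1))  # no change
```

Fact rows keep pointing at the surrogate key that was current when the visit happened, so history is preserved.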
14. Build choice
Choose the DBMS:
• Microsoft SQL Server, Oracle DBMS, SAP HANA, Apache Hive / Hadoop
Generate constraints?
• Generate unique keys / primary keys / integrity constraints (foreign keys)
Add specific indexes:
• Add clustered indexes / column-store indexes / bitmap indexes / etc.
Define table partitions:
• Organize fact tables in partitions (by hash, value, range, etc.)
Distribute data over multiple volumes:
• Define file groups / tablespaces for tables, partitions, indexes
15. SAP HANA Architecture
[Diagram: In-Memory Computing Engine. Components: Session management; Request Processing / Execution Control (SQL Parser, SQL Script, Calc. Engine, MDX); Transaction Manager; Metadata Manager; Authorization Manager; Relational Engines (Row Store, Column Store); Persistence Layer (Page Management, Logger); Disk Storage (Data Volumes, Log Volumes).]
Row tables versus Column tables
Partitioning by HASH, RANGE, ROUNDROBIN
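The three partitioning schemes differ only in how a row is assigned to a partition. The sketch below shows the general idea of each; it is not SAP HANA's internal implementation, and the function names and the use of CRC32 as the hash are assumptions.

```python
import bisect
import itertools
import zlib

def hash_partition(key, n):
    """HASH: rows with the same key value always land in the same partition."""
    # crc32 is deterministic across runs, unlike Python's built-in hash() for str.
    return zlib.crc32(str(key).encode("utf-8")) % n

def range_partition(key, upper_bounds):
    """RANGE: upper_bounds are sorted, exclusive upper limits.

    Keys at or beyond the last bound go to one extra catch-all partition.
    """
    return bisect.bisect_right(upper_bounds, key)

def round_robin_partitioner(n):
    """ROUNDROBIN: rows are spread evenly, regardless of their content."""
    counter = itertools.count()
    return lambda _row=None: next(counter) % n
```

Hash and range partitioning let the engine skip partitions when a query filters on the partitioning column; round-robin only balances the load.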
16. Physical model and DDL
Implementation choices & best practice:
• DBMS: SAP HANA
• All tables are column tables
• Fact F_VISITS partitioned by HASH on DAY
• Fact F_VISITS indexed by PAGE
[DDL script callouts: create a column table; partition by HASH; BTREE index; unload priority for memory optimization; preload columns for performance optimization.]
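The implementation choices above can be turned into DDL text mechanically. The generator below is a sketch under stated assumptions: the column list, the number of partitions, the unload priority value and the index name are hypothetical, and the statements follow my reading of SAP HANA SQL without having been run against a HANA system, so the clause order should be checked against the SAP HANA SQL reference. It is not the script shown on the slide.

```python
def create_column_table(name, columns, hash_column=None, partitions=None,
                        unload_priority=None):
    """Build a CREATE COLUMN TABLE statement as text (HANA-style, unverified)."""
    cols = ",\n".join(f"  {col} {sql_type}" for col, sql_type in columns)
    ddl = f"CREATE COLUMN TABLE {name} (\n{cols}\n)"
    if unload_priority is not None:
        ddl += f"\nUNLOAD PRIORITY {unload_priority}"
    if hash_column is not None:
        ddl += f"\nPARTITION BY HASH ({hash_column}) PARTITIONS {partitions}"
    return ddl + ";"

def create_btree_index(index_name, table, column):
    """Build a CREATE BTREE INDEX statement as text."""
    return f"CREATE BTREE INDEX {index_name} ON {table} ({column});"

# Hypothetical columns for F_VISITS; only DAY and PAGE are named on the slide.
table_ddl = create_column_table(
    "F_VISITS",
    [("DAY", "DATE"), ("PAGE", "INTEGER"), ("NBR_VISITS", "INTEGER")],
    hash_column="DAY",
    partitions=4,        # assumed value
    unload_priority=5,   # assumed value
)
index_ddl = create_btree_index("IDX_F_VISITS_PAGE", "F_VISITS", "PAGE")
```

Generating the script from the model keeps the physical choices (DBMS, partitions, indexes) in one place, so they can be changed without touching the logical model.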
17. BI Modeler
• To apply a model-driven approach, BI project teams need a software tool to:
Manage (draw) all the models: DFM, relational, etc.
Support (and drive) the model transformation process
• There were (and still are) not many tools able to do that, so in 2006 I started working on the development of …
http://bimodeler.com
18. DEMO
Create a DFM from scratch:
• Define the fact schema and its measures
• Add some dimensions / hierarchies
• Define and associate domains to attributes and measures
Transform a DFM into a relational data model:
• Define an implementation strategy for hierarchies
• Associate data types to domains
• Apply a naming convention
Add physical properties to the relational model:
• Choose a DBMS
• Create partitions
• Create indexes
• Generate DDL script