Why create a Data Mart with Dimensional Fact Model
Almeria, 3/3/2017
Stefano Cazzella @StefanoCazzella
http://caccio.blogdns.net
http://bimodeler.com
stefano.cazzella{at}gmail.com
BI Trends
BI ACADEMY Conference - Almeria, 3/3/2017 - Stefano Cazzella 2
[Chart: Business Value over Time]
Stages: Data Integration, Descriptive, Predictive, Prescriptive, Deep learning
Labels: Data Warehouse, Business Intelligence, Simulation & forecasting, Optimization & automation, Semantic & AI
Digital transformation of every market
Data explosion: exponential growth of digital data
Driven by business user’s needs
The Data Mart role in the info pipeline
[Diagram: information pipeline]
Sources: IoT, Social Media, Corporate Information Systems
Stores: Data Lake, Enterprise Data Warehouse, Data Marts, Data Labs
Uses: Corporate BI, Machine Learning, Self-service BI & Data Discovery
Labels: Analytical tables; Star schemas; Structured & unstructured data; Structured & summarized data
New processes? Roles?
Waterfall process
Business
Design
Build
Iterative process
Business
Design
Build
Business
Analyst
Engineer
Technician
Data
Scientist
Business
Analyst
Engineer
Technician
Project Layers for Data Mart
Business
Design
Build
Civil eng. Software eng.
Why Dimensional Fact Model?
1. Formal language → well-specified syntax and sound algebraic definition
2. Simple and effective graphical notation (representation)
3. Does not imply any technical/implementation choice
4. Specifically designed to represent multi-dimensional models
Multi-dimensional model
The SALES event: on Nov. 25th, 2014, Store 2 sold 10 pieces of Product X for a total revenue of € 220
[Diagram: 3-dimensional SALES hyper-space]
Dimensions: Product (Product X, Product Y, Product Z), Store (Store 1, Store 2, Store 3), Day
Measures: Units sold: 10 pieces; Revenue: € 220
Hierarchy: [State – City – Store] (State XYZ; City ABC, City DEF)
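The SALES event can be sketched as one cell of a three-dimensional cube, with a roll-up along the Store hierarchy. This is a minimal illustration, not something shown on the slide: the names `cube`, `store_to_city` and `roll_up_to_city` are hypothetical, the second cell is invented, and the assignment of stores to cities is an assumption.

```python
from collections import defaultdict
from datetime import date

# One cell of the SALES cube: coordinates (product, store, day) -> measures.
cube = {
    ("Product X", "Store 2", date(2014, 11, 25)): {"units_sold": 10, "revenue": 220.0},
    # Extra cell invented for illustration only.
    ("Product X", "Store 1", date(2014, 11, 25)): {"units_sold": 4, "revenue": 88.0},
}

# Hierarchy State - City - Store; which store belongs to which city is assumed.
store_to_city = {"Store 1": "City ABC", "Store 2": "City ABC", "Store 3": "City DEF"}


def roll_up_to_city(cells):
    """Aggregate additive measures from the Store level up to the City level."""
    result = defaultdict(lambda: {"units_sold": 0, "revenue": 0.0})
    for (product, store, day), measures in cells.items():
        key = (product, store_to_city[store], day)
        result[key]["units_sold"] += measures["units_sold"]
        result[key]["revenue"] += measures["revenue"]
    return dict(result)


by_city = roll_up_to_city(cube)
```

Both measures are summed here because units sold and revenue are additive along the Store hierarchy; a non-additive measure would need a different aggregation rule.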
DFM Notation Compendium
Hierarchy
Dimension
Dimensional attribute
Non-dimensional attribute
Measure
Fact schema SALES
Dependency
Data Mart building process
Business user’s needs → Requirements definition → Multidimensional data model (Dimensional Fact Model)
Multidimensional data model + Implementation strategy → Model transformation → Logical data model (Relational model: tables, columns, etc.)
Logical data model + Technical knowledge → Model transformation → Physical data model (DDL with indexes, partitions, etc.)
Physical data model → Deployment → Data Mart
Business - From requirement to DFM
• Context: weblog analytics - the analysis of the visits to several websites belonging to different domains (e.g. Google Analytics)
• Requirement: monitoring and analyzing the number of visits and their monthly and daily average duration for each page of the websites, or each domain, distributed by the geographic region of the visitors' IP addresses.
The DFM also captures:
• Domain definition
• Aggregation rules
• Optional dependencies
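One possible reading of this requirement as a fact schema is sketched below. It is only an illustration: the classes `Hierarchy`, `Measure` and `FactSchema`, the level names, and the choice of aggregation rules are assumptions, not the DFM drawn in the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Hierarchy:
    name: str
    levels: list  # ordered from the finest level (the dimension) to the coarsest


@dataclass
class Measure:
    name: str
    domain: str       # domain definition, e.g. "Number"
    aggregation: str  # aggregation rule, e.g. "SUM" or "AVG"


@dataclass
class FactSchema:
    name: str
    hierarchies: list = field(default_factory=list)
    measures: list = field(default_factory=list)

    def dimensions(self):
        """The dimensions are the finest level of each hierarchy."""
        return [h.levels[0] for h in self.hierarchies]


# Hypothetical VISITS fact: level names are assumed from the requirement text.
visits = FactSchema(
    name="VISITS",
    hierarchies=[
        Hierarchy("Page", ["page", "website", "domain"]),
        Hierarchy("Time", ["day", "month"]),
        Hierarchy("Viewer", ["ip", "region"]),
    ],
    measures=[
        Measure("number_of_visits", "Number", "SUM"),
        Measure("duration", "Duration", "AVG"),
    ],
)
```

Keeping the aggregation rule on each measure mirrors the requirement: visits are counted, while duration is averaged by day or by month.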
Design choice
Reference ROLAP model:
• Star-schema (denormalized dimension table)
• Snow-flake (hierarchies implemented by tables in 3NF)
Hierarchy implementation strategy (for every dimension):
• Use natural key (the dimension attribute → PK column)
• Use surrogate key (add a new column with no business meaning)
• Use slowly changing dimension (SCD) of type 2
• Use implicit dimension (no dimension table, only a column in the fact table)
Domain → Data type association:
• Text → VARCHAR(250) ; Currency → NUMBER(9,2) ; etc.
Standard naming conventions and abbreviations:
• Table name prefix (D for Dimensions, F for Facts) ; Number → NBR ; etc.
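The domain-to-data-type association and the naming conventions can be expressed as simple lookup tables. The sketch below uses only the two domains and the one abbreviation named on the slide; the function names and the underscore separator are assumptions made for illustration.

```python
# Domain -> data type association, taken from the slide's two examples.
DOMAIN_TYPES = {"Text": "VARCHAR(250)", "Currency": "NUMBER(9,2)"}

# Naming conventions from the slide: table prefixes and the NBR abbreviation.
TABLE_PREFIX = {"dimension": "D", "fact": "F"}
ABBREVIATIONS = {"NUMBER": "NBR"}


def table_name(kind, name):
    """Prefix the table name with D or F (underscore separator is assumed)."""
    return f"{TABLE_PREFIX[kind]}_{name.upper()}"


def column_name(attribute):
    """Upper-case the attribute name and apply the standard abbreviations."""
    words = attribute.upper().split()
    return "_".join(ABBREVIATIONS.get(w, w) for w in words)


def column_definition(attribute, domain):
    """Combine the conventional column name with the domain's data type."""
    return f"{column_name(attribute)} {DOMAIN_TYPES[domain]}"
```

For example, `table_name("fact", "visits")` gives `F_VISITS` and `column_definition("revenue", "Currency")` gives `REVENUE NUMBER(9,2)`.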
Transform a DFM into a Relational Model
Technical design choices:
• Reference ROLAP model → star-schema
• Hierarchy Viewer → surrogate key
• Hierarchy Page → SCD – Type 2
• Hierarchy Time → denormalized with natural key
[Diagram: model transformation]
Callouts: Fact grain; Surrogate key; SCD-2 (Start date, End date)
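The SCD type 2 choice means a changed dimension member gets a new row with its own surrogate key, while the old row is closed by setting its end date. A minimal sketch follows, assuming an in-memory list of rows; the `apply_scd2` function, the row layout and the 9999-12-31 open-end convention are hypothetical, not taken from the talk.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # conventional "still valid" end date (an assumption)


def apply_scd2(rows, natural_key, new_attrs, change_date):
    """Close the current version of a dimension member and append a new one.

    rows: list of dicts with keys sk, nk, attrs, start_date, end_date.
    """
    current = next(
        (r for r in rows if r["nk"] == natural_key and r["end_date"] == OPEN_END),
        None,
    )
    if current is not None:
        if current["attrs"] == new_attrs:
            return rows  # nothing changed, keep the current version
        current["end_date"] = change_date  # close the old version
    next_sk = max((r["sk"] for r in rows), default=0) + 1  # new surrogate key
    rows.append({"sk": next_sk, "nk": natural_key, "attrs": new_attrs,
                 "start_date": change_date, "end_date": OPEN_END})
    return rows


pages = []
apply_scd2(pages, "/home", {"title": "Home"}, date(2017, 1, 1))
apply_scd2(pages, "/home", {"title": "Welcome"}, date(2017, 3, 3))
```

After the second call the dimension holds two versions of the same page: facts recorded before the change keep pointing to surrogate key 1, later facts to surrogate key 2.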
Build choice
Choose the DBMS:
• Microsoft SQL Server – Oracle DBMS – SAP HANA – Apache Hive / Hadoop
Generate constraints?
• Generate unique keys / primary keys / integrity constraints (foreign keys)
Add specific indexes:
• Add clustered indexes / column-store indexes / bitmap indexes / etc.
Define table partitions:
• Organize fact tables in partitions (by hash, value, range, etc.)
Distribute data over multiple volumes:
• Define file groups / tablespaces for tables, partitions, indexes
SAP HANA Architecture
[Diagram: In-Memory Computing Engine]
• Session management
• Request Processing / Execution Control: SQL Parser, SQL Script, Calc. Engine, MDX
• Transaction Manager, Metadata Manager, Authorization Manager
• Relational Engines: Row Store, Column Store
• Persistence Layer: Page Management, Logger
• Disk Storage: Data Volumes, Log Volumes
Row tables versus Column tables
Partitioning by HASH, RANGE, ROUNDROBIN
DDL script
Physical model and DDL
Implementation choices & best practice:
• DBMS → SAP HANA
• All tables are column tables
• Fact F_VISITS partitioned by HASH on DAY
• Fact F_VISITS indexed by PAGE
Callouts: Create a column table; Partition by HASH; BTREE index; Unload priority for memory optimization; Preload columns for performance optimization
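The implementation choices for F_VISITS could be turned into a DDL script along the following lines. This is a sketch only: the generator function, the column list, the index name, the partition count and the unload priority value are all assumptions, and the HANA-style statements have not been run against a HANA instance, so they should be checked against the SAP HANA SQL reference (a column named DAY, for instance, may need quoting).

```python
def f_visits_ddl(partitions=4, unload_priority=5):
    """Return HANA-style DDL statements for the F_VISITS fact table."""
    columns = [
        ("DAY", "DATE"),            # partitioning column named on the slide
        ("PAGE", "INTEGER"),        # indexed column named on the slide
        ("VISITS_NBR", "INTEGER"),  # hypothetical measure column
    ]
    column_list = ",\n".join(f"  {name} {dtype}" for name, dtype in columns)
    return [
        # Create a column table, partitioned by HASH on DAY.
        f"CREATE COLUMN TABLE F_VISITS (\n{column_list}\n) "
        f"PARTITION BY HASH (DAY) PARTITIONS {partitions}",
        # BTREE index on PAGE.
        "CREATE BTREE INDEX IDX_F_VISITS_PAGE ON F_VISITS (PAGE)",
        # Unload priority for memory optimization.
        f"ALTER TABLE F_VISITS UNLOAD PRIORITY {unload_priority}",
        # Preload columns for performance optimization.
        "ALTER TABLE F_VISITS PRELOAD ALL",
    ]


statements = f_visits_ddl()
script = ";\n".join(statements) + ";"
```

Generating the statements from a function keeps the physical choices (partition count, unload priority) as parameters, which matches the idea of deriving the DDL from the model rather than writing it by hand.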
BI Modeler
• In order to apply a model-driven approach, BI project teams need a software tool to:
  - Manage (draw) all the models: DFM, relational, etc.
  - Support (and drive) the model transformation process
• There were (and still are) not many tools able to do that, so in 2006 I started working on the development of …
http://bimodeler.com
DEMO
1. Create a DFM from scratch
   • Define the fact schema and its measures
   • Add some dimensions / hierarchies
   • Define domains and associate them with attributes and measures
2. Transform a DFM into a relational data model
   • Define an implementation strategy for hierarchies
   • Associate data types with domains
   • Apply a naming convention
3. Add physical properties to the relational model
   • Choose a DBMS
   • Create partitions
   • Create indexes
   • Generate DDL script