SlideShare a Scribd company logo
1 of 43
Download to read offline
Data Warehousing
Concepts and Design
What is a Data Warehouse?
A Data Warehouse is a collection of data gathered and
organized so that it can easily by analyzed, extracted,
synthesized, and otherwise be used for the purposes of
further understanding the data.
A Data warehouse is a subject-oriented, integrated, time-
variant and non-volatile collection of data in support of
management's decision making process
THE ARCHITECTURE
A typical Data warehouse consists of
•Source
•Staging Area
•Data warehouse/ Data Mart
•End User
Data Warehouse Architecture (with a
Staging Area)
General Stages of Data Warehouse
 Off line Operational Database
o Data warehouses are developed by copying the data off
an operational system to another server where the
processing load of reporting against the copied data
does not impact the operational system's performance.
 Off line Data Warehouse
o Data warehouses are updated from data in the
operational systems on a regular basis and the data
warehouse data is stored in a data structure designed to
facilitate reporting.
5
 Real Time Data Warehouse
o Data warehouses at this stage are updated every time
an operational system performs a transaction
 Integrated Data Warehouse
o Data warehouses at this stage are updated every time
an operational system performs a transaction. The data
warehouses then generate transactions that are passed
back into the operational systems.
6
General Stages of Data Warehouse
Types of Data Warehouse
 Enterprise Data Warehouse
o provide a control Data Base for decision support through
out the enterprise.
 Operational data store
o has a broad enterprise under scope but unlike a real
enterprise DW. Data is refreshed in rare real time and
used for routine business activity.
 Data Mart
o is a sub part of Data Warehouse. It support a particular
reason or it is design for particular lines of business.
7
8
Data Mart
 A data mart is a smaller version of a data
warehouse
• Usually containing data related to a single unit of an
organization
 Usually data mart focuses on the requirements
of only one department or business function of a
company
 Data mart can be a useful first step to a full-
scale data warehouse
9
Why creating Data Mart?
 To provide data for users in a single department
 To improve end-user response time due to the reduction
in the volume of data
 Building a data mart is simpler compared with
establishing a corporate data warehouse
 The cost of implementing data marts is normally less
than implementing a data warehouse
 Users of a data mart are more clearly defined and can
be more easily targeted
10
Data Warehouse
Data Mart
Benefits of data warehousing
 A data warehouse provides a common data model for all
data of interest regardless of the data's source
 Prior to loading data into the data warehouse,
inconsistencies are identified and resolved.
 Information in the data warehouse is under the control of
data warehouse users so that, even if the source system
data is purged over time, the information in the
warehouse can be stored safely for extended periods of
time.
11
 Data warehouses can work in conjunction with and,
hence, enhance the value of operational business
applications, notably customer relationship management
(CRM) systems.
 Data warehouses facilitate decision support system
applications such as trend reports, exception reports,
and reports that show actual performance versus goals.
 Data warehouses provide retrieval of data without
slowing down operational systems.
12
Benefits of data warehousing
Disadvantages of data
warehouses
 The data warehouse is usually not static. Maintenance
costs are high.
 Data warehouses can get outdated relatively quickly.
 There is often a fine line between data warehouses and
operational systems. Duplicate, expensive functionality
may be developed. Or, functionality may be developed in
the data warehouse that, in retrospect, should have been
developed in the operational systems and vice versa..
13
Business intelligence
 Business intelligence (BI) refers to skills, technologies,
applications and practices used to help a business
acquire a better understanding of its commercial context.
Business intelligence may also refer to the collected
information itself.
 BI applications provide historical, current, and predictive
views of business operations. Common functions of
business intelligence applications are reporting, OLAP,
analytics, data mining, business performance
management, benchmarks, text mining, and predictive
analytics.
14
Types of business intelligence
tools
 Spreadsheets
 Reporting and querying software
 OLAP
 Digital Dashboards
 Data mining
 Process mining
 Business performance management
15
Data mining
 Data mining is the process of extracting hidden patterns
from large amounts of data. It is commonly used in a
wide range of profiling practices, such as marketing,
surveillance, fraud detection and scientific discovery.
 As more data is gathered, with the amount of data
doubling every three years, data mining is becoming an
increasingly important tool to transform this data into
information. It is commonly used in a wide range of
profiling practices, such as marketing, surveillance, fraud
detection and scientific discovery.
 Data mining can be applied to data sets of any size.
16
Data integration
 Data integration is the process of combining data
residing at different sources and providing the user with
a unified view of these data .
 Data integration appears with increasing frequency as
the volume and the need to share existing data
explodes.
 It has been the focus of extensive theoretical work and
numerous open problems remain to be solved. In
management practice, data integration is frequently
called Enterprise Information Integration.
17
OLAP & OLTP
OLAP OLTP database
Designed for analysis of business
measures by category and
attributes.
Designed for real time business
operations.
Optimized for bulk loads and large,
complex, unpredictable queries
that access many rows per table.
Optimized for a common set of
transactions, usually adding or
retrieving a single row at a time per
table.
Loaded with consistent, valid data;
requires no real time validation.
Optimized for validation of
incoming data during transactions;
uses validation data tables.
Supports few concurrent users
relative to OLTP.
Supports thousands of concurrent
users.
Analysis Services
A middle-tier server for OLAP and data
mining; manages multi-dimensional cubes of
data for analysis and provides rapid client
access; allows you to create data mining
models from both OLAP and relational data
sources
Data Warehousing Schemas
 Star Schemas
 Snowflake Schemas
Star Schema
 The star schema is perhaps the simplest
data warehouse schema. It is called a star
schema because the entity-relationship
diagram of this schema resembles a star,
with points radiating from a central table.
The center of the star consists of a large
fact table and the points of the star are the
dimension tables
22
Example: Star Schema
PropertySale
TimeID (FK)
PropertyID (FK)
BranchID (FK)
BuyerID (FK)
PromotionID (FK)
StaffID (FK)
OwnerID (FK)
OfferPrice
SellingPrice
SaleCommission
SaleRevenue
Property
PropertyID (PK)
Type
Street
City
Province
Country
Postcode
Buyer
BuyerID (PK)
Name
Type
City
Province
Country
Staff
StaffID (PK)
Name
Position
Gender
City
Province
Country
Owner
OwnerID (PK)
Name
Type
City
Province
Country
BranchID (PK)
Type
City
Province
Country
Branch
PromotionID (PK)
Name
Type
Promotion
TimeID (PK)
Day
Week
Month
Year
Time
Fact Table
Dimension Tables
Dimension Tables
23
Snowflake Schema
 Snowflake schema is a variant of the star
schema where dimension tables contain
normalized data
o e.g. ‘city’ and ‘province’ can be splitted as separated
tables to normalize dimension tables
o ‘Starflake’ schema is a hybrid structure that contains
a mixture of star (denormalized) and snowflake
(normalized) schemas
 This allows dimension tables to be present in both forms for
different query requirements
24
Example: Snowflake Schema
PropertySale
TimeID (FK)
PropertyID (FK)
BranchID (FK)
BuyerID (FK)
PromotionID (FK)
StaffID (FK)
OwnerID (FK)
OfferPrice
SellingPrice
SaleCommission
SaleRevenue
BranchID (PK)
Type
CityID (FK)
Branch
Fact Table
Province (PK)
Country
Province
CityID (PK)
CityName
Province (FK)
City
Dimension Tables
 In a star schema every dimension will have a primary key.
 In a star schema, a dimension table will not have any parent
table.
 Whereas in a snow flake schema, a dimension table will
have one or more parent tables.
 Hierarchies for the dimensions are stored in the
dimensional table itself in star schema.
 Whereas hierarchies are broken into separate tables in snow
flake schema. These hierarchies helps to drill down the data
from topmost hierarchies to the lowermost hierarchies.
Important aspects of Star
Schema & Snow Flake Schema
Dimension Modeling
 Dimensional modeling is a technique for conceptualizing
and visualizing data models as a set of measures that
are described by common aspects of the business.
 It is especially useful for summarizing and rearranging
the data and presenting views of the data to support data
analysis.
 Dimensional modeling focuses on numeric data, such as
values, counts, weights, balances, and occurrences.
27
Multidimensional Data Model
(MDM)
 Focused on a collection of numeric measures.
 Each measure depends on a set of dimensions
 Data are presented as multidimensional array
Unified Dimensional Model
(UDM)
 The role of a Unified Dimensional Model (UDM) is to
provide a bridge between the user and the data sources.
 A UDM is constructed over one or more physical data
sources, and then the end user issues queries against
the UDM using one of a variety of client tools.
 Advantages of creating UDM
 More readily understood model of the data
 Isolation from heterogeneous backend data sources
 Improved performance for summary type queries.
28
Basic Concepts
 Dimensional modeling has several basic
concepts:
· Facts
· Dimensions
· Measures (variables)
Fact Table
 Contain numeric measures of the business
 Contains facts and connected to dimensions
 two types of columns
facts or measures
foreign keys to dimension tables
 May contain date-stamped data
 A fact table might contain either detail level facts or facts
that have been aggregated
Steps in designing Fact Table
 Identify a business process for analysis(like sales).
 Identify measures or facts (sales dollar).
 Identify dimensions for facts(product dimension, location
dimension, time dimension, organization dimension).
 List the columns that describe each dimension.(region
name, branch name, region name).
 Determine the lowest level of summary in a fact
table(sales dollar).
Types of Facts (Measures)
 Additive - Measures that can be added across all
dimensions.
 Semi Additive - Measures that can be added across few
dimensions and not with others.
 Non Additive - Measures that cannot be added across
all dimensions.
Dimension Tables
 A dimension is a structure, often composed of one or
more hierarchies, that categorizes data. Dimensional
attributes help to describe the dimensional value. They
are normally descriptive, textual values. Several distinct
dimensions, combined with facts, enable you to answer
business questions. Commonly used dimensions are
customers, products, and time.
Dimension Tables
 Contain textual information that represents attributes
of the business
 Contain relatively static data
 Are joined to fact table through a foreign key
reference
 Are usually smaller than fact tables
35
Examples for
Dimensions and Facts
MEASURES
 A measure is a numeric attribute of a fact, representing
the performance or behavior of the business relative to
the dimensions. The actual numbers are called as
variables.
 A measure is determined by combinations of the
members of the dimensions and is located on facts.
Examples of Measures:
o Quantity Sold
o Unit Price
o Amount Sold
o Profit
CUBE
Cubes contain a set of data that is usually constructed
from a subset of a data warehouse and is organized and
summarized into a multidimensional structure defined by a
set of dimensions and measures.
Advantages of SSAS Cubes
 SSAS is fast even on a large volume of data
 SSAS calculated measures are fast execution-wise and easy
reusable
 They are defined centrally in the SSAS database, and the
reports pick and choose the calculated measures they want.
39
ROLAP: Relational OLAP
 Special schema design: star, snowflake
 Special indexes: bitmap, multi-table join
 Special tuning: maximize query throughput
 Proven technology (relational model, DBMS),
tend to outperform specialized MDDB
especially on large data sets
 Products
o IBM DB2, Oracle, Sybase IQ, RedBrick, Informix
Advantages:
o It can handle large amounts of data, ROLAP itself places
no limitation on data amount
 Disadvantages:
o Performance can be slow. Because each ROLAP report is
essentially a SQL query (or multiple SQL queries) in the
relational database, the query time can be long if the
underlying data size is large .
o It is difficult to perform complex calculations .
ROLAP: Relational OLAP
CS 33641
MOLAP: Multi Dimensional OLAP
 MDDB: a special-purpose data model
 Facts stored in multi-dimensional arrays
 Dimensions used to index array
 Sometimes on top of relational DB
 Products
o Pilot, Arbor Essbase, Gentia
Advantages
o Excellent performance
o The storage is not in the relational database, but in
proprietary formats.
o MOLAP cubes are built for fast data retrieval, and
are optimal for slicing and dicing operations.
Disadvantages:
o It is limited in the amount of data it can handle.
Because all calculations are performed when the
cube is built, it is not possible to include a large
amount of data in the cube itself.
o It requires an additional investment in human and
capital resources are needed.
MOLAP: Multi Dimensional OLAP
 Hybrid OLAP (HOLAP) refers to technologies that
combine MOLAP and ROLAP.
Advantages
o For summary-type information, HOLAP
leverages cube technology for faster
performance.
o When detail information is needed, HOLAP
can "drill through" from the cube into the
underlying relational data.
`
HOLAP: Hybrid OLAP

More Related Content

What's hot

What's hot (20)

Data Warehouse Basic Guide
Data Warehouse Basic GuideData Warehouse Basic Guide
Data Warehouse Basic Guide
 
Date warehousing concepts
Date warehousing conceptsDate warehousing concepts
Date warehousing concepts
 
Introduction to data warehousing
Introduction to data warehousing   Introduction to data warehousing
Introduction to data warehousing
 
Data mining & data warehousing (ppt)
Data mining & data warehousing (ppt)Data mining & data warehousing (ppt)
Data mining & data warehousing (ppt)
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 
Data Warehouse 101
Data Warehouse 101Data Warehouse 101
Data Warehouse 101
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Kdd process
Kdd processKdd process
Kdd process
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Data warehouse
Data warehouse Data warehouse
Data warehouse
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Dimensional Modeling
Dimensional ModelingDimensional Modeling
Dimensional Modeling
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Data Warehousing
Data WarehousingData Warehousing
Data Warehousing
 
Data cubes
Data cubesData cubes
Data cubes
 
Schemas for multidimensional databases
Schemas for multidimensional databasesSchemas for multidimensional databases
Schemas for multidimensional databases
 
Business intelligence and data warehouses
Business intelligence and data warehousesBusiness intelligence and data warehouses
Business intelligence and data warehouses
 
Data mart
Data martData mart
Data mart
 
Oltp vs olap
Oltp vs olapOltp vs olap
Oltp vs olap
 
Data warehousing ppt
Data warehousing pptData warehousing ppt
Data warehousing ppt
 

Viewers also liked

Star ,Snow and Fact-Constullation Schemas??
Star ,Snow and  Fact-Constullation Schemas??Star ,Snow and  Fact-Constullation Schemas??
Star ,Snow and Fact-Constullation Schemas??Abdul Aslam
 
Linux VMWare image with Informatica , Oracle and Rundeck scheduler
Linux VMWare image with Informatica , Oracle and Rundeck schedulerLinux VMWare image with Informatica , Oracle and Rundeck scheduler
Linux VMWare image with Informatica , Oracle and Rundeck schedulerpcherukumalla
 
Информационни системи
Информационни системиИнформационни системи
Информационни системиМони Мон
 
Use of Star Schema in Health Care
Use of Star Schema in Health CareUse of Star Schema in Health Care
Use of Star Schema in Health CareVINEETH M
 
Data Systems Integration & Business Value Pt. 1: Metadata
Data Systems Integration & Business Value Pt. 1: MetadataData Systems Integration & Business Value Pt. 1: Metadata
Data Systems Integration & Business Value Pt. 1: MetadataDATAVERSITY
 
Income From Other Sources
Income From Other SourcesIncome From Other Sources
Income From Other SourcesAmeet Patel
 
Multidimentional data model
Multidimentional data modelMultidimentional data model
Multidimentional data modeljagdish_93
 
Data warehouse concepts
Data warehouse conceptsData warehouse concepts
Data warehouse conceptsobieefans
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data WarehousingJason S
 
Introduction to metadata management
Introduction to metadata managementIntroduction to metadata management
Introduction to metadata managementOpen Data Support
 
Metadata in data warehouse
Metadata in data warehouseMetadata in data warehouse
Metadata in data warehouseSiddique Ibrahim
 

Viewers also liked (16)

Star ,Snow and Fact-Constullation Schemas??
Star ,Snow and  Fact-Constullation Schemas??Star ,Snow and  Fact-Constullation Schemas??
Star ,Snow and Fact-Constullation Schemas??
 
Linux VMWare image with Informatica , Oracle and Rundeck scheduler
Linux VMWare image with Informatica , Oracle and Rundeck schedulerLinux VMWare image with Informatica , Oracle and Rundeck scheduler
Linux VMWare image with Informatica , Oracle and Rundeck scheduler
 
Star schema
Star schemaStar schema
Star schema
 
Star schema
Star schemaStar schema
Star schema
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 
Data Warehouse Designing: Dimensional Modelling and E-R Modelling
Data Warehouse Designing: Dimensional Modelling and E-R ModellingData Warehouse Designing: Dimensional Modelling and E-R Modelling
Data Warehouse Designing: Dimensional Modelling and E-R Modelling
 
Информационни системи
Информационни системиИнформационни системи
Информационни системи
 
Use of Star Schema in Health Care
Use of Star Schema in Health CareUse of Star Schema in Health Care
Use of Star Schema in Health Care
 
Data Systems Integration & Business Value Pt. 1: Metadata
Data Systems Integration & Business Value Pt. 1: MetadataData Systems Integration & Business Value Pt. 1: Metadata
Data Systems Integration & Business Value Pt. 1: Metadata
 
Income From Other Sources
Income From Other SourcesIncome From Other Sources
Income From Other Sources
 
Metadata in Business Intelligence
Metadata in Business IntelligenceMetadata in Business Intelligence
Metadata in Business Intelligence
 
Multidimentional data model
Multidimentional data modelMultidimentional data model
Multidimentional data model
 
Data warehouse concepts
Data warehouse conceptsData warehouse concepts
Data warehouse concepts
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data Warehousing
 
Introduction to metadata management
Introduction to metadata managementIntroduction to metadata management
Introduction to metadata management
 
Metadata in data warehouse
Metadata in data warehouseMetadata in data warehouse
Metadata in data warehouse
 

Similar to Data warehousing

DataWarehousingandAbInitioConcepts.ppt
DataWarehousingandAbInitioConcepts.pptDataWarehousingandAbInitioConcepts.ppt
DataWarehousingandAbInitioConcepts.pptPurnenduMaity2
 
DWDM Unit 1 (1).pptx
DWDM Unit 1 (1).pptxDWDM Unit 1 (1).pptx
DWDM Unit 1 (1).pptxSalehaMariyam
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousingwork
 
Dw & etl concepts
Dw & etl conceptsDw & etl concepts
Dw & etl conceptsjeshocarme
 
SAP BODS -quick guide.docx
SAP BODS -quick guide.docxSAP BODS -quick guide.docx
SAP BODS -quick guide.docxKen T
 
DATAWAREHOUSE MAIn under data mining for
DATAWAREHOUSE MAIn under data mining forDATAWAREHOUSE MAIn under data mining for
DATAWAREHOUSE MAIn under data mining forAyushMeraki1
 
Data warehouse
Data warehouseData warehouse
Data warehouseRajThakuri
 
Dataware housing
Dataware housingDataware housing
Dataware housingwork
 
Informatica and datawarehouse Material
Informatica and datawarehouse MaterialInformatica and datawarehouse Material
Informatica and datawarehouse Materialobieefans
 
UNIT-5 DATA WAREHOUSING.docx
UNIT-5 DATA WAREHOUSING.docxUNIT-5 DATA WAREHOUSING.docx
UNIT-5 DATA WAREHOUSING.docxDURGADEVIL
 
Manish tripathi-ea-dw-bi
Manish tripathi-ea-dw-biManish tripathi-ea-dw-bi
Manish tripathi-ea-dw-biA P
 
Top 60+ Data Warehouse Interview Questions and Answers.pdf
Top 60+ Data Warehouse Interview Questions and Answers.pdfTop 60+ Data Warehouse Interview Questions and Answers.pdf
Top 60+ Data Warehouse Interview Questions and Answers.pdfDatacademy.ai
 
TOPIC 9 data warehousing and data mining.pdf
TOPIC 9 data warehousing and data mining.pdfTOPIC 9 data warehousing and data mining.pdf
TOPIC 9 data warehousing and data mining.pdfSCITprojects2022
 
data warehousing and data mining (1).pdf
data warehousing and data mining (1).pdfdata warehousing and data mining (1).pdf
data warehousing and data mining (1).pdfSCITprojects2022
 
Chapter 4. Data Warehousing and On-Line Analytical Processing.ppt
Chapter 4. Data Warehousing and On-Line Analytical Processing.pptChapter 4. Data Warehousing and On-Line Analytical Processing.ppt
Chapter 4. Data Warehousing and On-Line Analytical Processing.pptSubrata Kumer Paul
 

Similar to Data warehousing (20)

DataWarehousingandAbInitioConcepts.ppt
DataWarehousingandAbInitioConcepts.pptDataWarehousingandAbInitioConcepts.ppt
DataWarehousingandAbInitioConcepts.ppt
 
Date Analysis .pdf
Date Analysis .pdfDate Analysis .pdf
Date Analysis .pdf
 
CTP Data Warehouse
CTP Data WarehouseCTP Data Warehouse
CTP Data Warehouse
 
DWDM Unit 1 (1).pptx
DWDM Unit 1 (1).pptxDWDM Unit 1 (1).pptx
DWDM Unit 1 (1).pptx
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousing
 
Chapter 2
Chapter 2Chapter 2
Chapter 2
 
Dw & etl concepts
Dw & etl conceptsDw & etl concepts
Dw & etl concepts
 
Oracle sql plsql & dw
Oracle sql plsql & dwOracle sql plsql & dw
Oracle sql plsql & dw
 
SAP BODS -quick guide.docx
SAP BODS -quick guide.docxSAP BODS -quick guide.docx
SAP BODS -quick guide.docx
 
DATAWAREHOUSE MAIn under data mining for
DATAWAREHOUSE MAIn under data mining forDATAWAREHOUSE MAIn under data mining for
DATAWAREHOUSE MAIn under data mining for
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Dataware housing
Dataware housingDataware housing
Dataware housing
 
Informatica and datawarehouse Material
Informatica and datawarehouse MaterialInformatica and datawarehouse Material
Informatica and datawarehouse Material
 
UNIT-5 DATA WAREHOUSING.docx
UNIT-5 DATA WAREHOUSING.docxUNIT-5 DATA WAREHOUSING.docx
UNIT-5 DATA WAREHOUSING.docx
 
Manish tripathi-ea-dw-bi
Manish tripathi-ea-dw-biManish tripathi-ea-dw-bi
Manish tripathi-ea-dw-bi
 
Top 60+ Data Warehouse Interview Questions and Answers.pdf
Top 60+ Data Warehouse Interview Questions and Answers.pdfTop 60+ Data Warehouse Interview Questions and Answers.pdf
Top 60+ Data Warehouse Interview Questions and Answers.pdf
 
TOPIC 9 data warehousing and data mining.pdf
TOPIC 9 data warehousing and data mining.pdfTOPIC 9 data warehousing and data mining.pdf
TOPIC 9 data warehousing and data mining.pdf
 
data warehousing and data mining (1).pdf
data warehousing and data mining (1).pdfdata warehousing and data mining (1).pdf
data warehousing and data mining (1).pdf
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Chapter 4. Data Warehousing and On-Line Analytical Processing.ppt
Chapter 4. Data Warehousing and On-Line Analytical Processing.pptChapter 4. Data Warehousing and On-Line Analytical Processing.ppt
Chapter 4. Data Warehousing and On-Line Analytical Processing.ppt
 

Data warehousing

  • 2. What is a Data Warehouse? A Data Warehouse is a collection of data gathered and organized so that it can easily by analyzed, extracted, synthesized, and otherwise be used for the purposes of further understanding the data. A Data warehouse is a subject-oriented, integrated, time- variant and non-volatile collection of data in support of management's decision making process
  • 3. THE ARCHITECTURE A typical Data warehouse consists of •Source •Staging Area •Data warehouse/ Data Mart •End User
  • 4. Data Warehouse Architecture (with a Staging Area)
  • 5. General Stages of Data Warehouse  Off line Operational Database o Data warehouses are developed by copying the data off an operational system to another server where the processing load of reporting against the copied data does not impact the operational system's performance.  Off line Data Warehouse o Data warehouses are updated from data in the operational systems on a regular basis and the data warehouse data is stored in a data structure designed to facilitate reporting. 5
  • 6.  Real Time Data Warehouse o Data warehouses at this stage are updated every time an operational system performs a transaction  Integrated Data Warehouse o Data warehouses at this stage are updated every time an operational system performs a transaction. The data warehouses then generate transactions that are passed back into the operational systems. 6 General Stages of Data Warehouse
  • 7. Types of Data Warehouse  Enterprise Data Warehouse o provide a control Data Base for decision support through out the enterprise.  Operational data store o has a broad enterprise under scope but unlike a real enterprise DW. Data is refreshed in rare real time and used for routine business activity.  Data Mart o is a sub part of Data Warehouse. It support a particular reason or it is design for particular lines of business. 7
  • 8. 8 Data Mart  A data mart is a smaller version of a data warehouse • Usually containing data related to a single unit of an organization  Usually data mart focuses on the requirements of only one department or business function of a company  Data mart can be a useful first step to a full- scale data warehouse
  • 9. 9 Why creating Data Mart?  To provide data for users in a single department  To improve end-user response time due to the reduction in the volume of data  Building a data mart is simpler compared with establishing a corporate data warehouse  The cost of implementing data marts is normally less than implementing a data warehouse  Users of a data mart are more clearly defined and can be more easily targeted
  • 11. Benefits of data warehousing  A data warehouse provides a common data model for all data of interest regardless of the data's source  Prior to loading data into the data warehouse, inconsistencies are identified and resolved.  Information in the data warehouse is under the control of data warehouse users so that, even if the source system data is purged over time, the information in the warehouse can be stored safely for extended periods of time. 11
  • 12.  Data warehouses can work in conjunction with and, hence, enhance the value of operational business applications, notably customer relationship management (CRM) systems.  Data warehouses facilitate decision support system applications such as trend reports, exception reports, and reports that show actual performance versus goals.  Data warehouses provide retrieval of data without slowing down operational systems. 12 Benefits of data warehousing
  • 13. Disadvantages of data warehouses  The data warehouse is usually not static. Maintenance costs are high.  Data warehouses can get outdated relatively quickly.  There is often a fine line between data warehouses and operational systems. Duplicate, expensive functionality may be developed. Or, functionality may be developed in the data warehouse that, in retrospect, should have been developed in the operational systems and vice versa.. 13
  • 14. Business intelligence  Business intelligence (BI) refers to skills, technologies, applications and practices used to help a business acquire a better understanding of its commercial context. Business intelligence may also refer to the collected information itself.  BI applications provide historical, current, and predictive views of business operations. Common functions of business intelligence applications are reporting, OLAP, analytics, data mining, business performance management, benchmarks, text mining, and predictive analytics. 14
  • 15. Types of business intelligence tools  Spreadsheets  Reporting and querying software  OLAP  Digital Dashboards  Data mining  Process mining  Business performance management 15
  • 16. Data mining  Data mining is the process of extracting hidden patterns from large amounts of data. It is commonly used in a wide range of profiling practices, such as marketing, surveillance, fraud detection and scientific discovery.  As more data is gathered, with the amount of data doubling every three years, data mining is becoming an increasingly important tool to transform this data into information. It is commonly used in a wide range of profiling practices, such as marketing, surveillance, fraud detection and scientific discovery.  Data mining can be applied to data sets of any size. 16
  • 17. Data integration  Data integration is the process of combining data residing at different sources and providing the user with a unified view of these data .  Data integration appears with increasing frequency as the volume and the need to share existing data explodes.  It has been the focus of extensive theoretical work and numerous open problems remain to be solved. In management practice, data integration is frequently called Enterprise Information Integration. 17
  • 18. OLAP & OLTP OLAP OLTP database Designed for analysis of business measures by category and attributes. Designed for real time business operations. Optimized for bulk loads and large, complex, unpredictable queries that access many rows per table. Optimized for a common set of transactions, usually adding or retrieving a single row at a time per table. Loaded with consistent, valid data; requires no real time validation. Optimized for validation of incoming data during transactions; uses validation data tables. Supports few concurrent users relative to OLTP. Supports thousands of concurrent users.
  • 19. Analysis Services A middle-tier server for OLAP and data mining; manages multi-dimensional cubes of data for analysis and provides rapid client access; allows you to create data mining models from both OLAP and relational data sources
  • 20. Data Warehousing Schemas  Star Schemas  Snowflake Schemas
  • 21. Star Schema  The star schema is perhaps the simplest data warehouse schema. It is called a star schema because the entity-relationship diagram of this schema resembles a star, with points radiating from a central table. The center of the star consists of a large fact table and the points of the star are the dimension tables
  • 22. 22 Example: Star Schema PropertySale TimeID (FK) PropertyID (FK) BranchID (FK) BuyerID (FK) PromotionID (FK) StaffID (FK) OwnerID (FK) OfferPrice SellingPrice SaleCommission SaleRevenue Property PropertyID (PK) Type Street City Province Country Postcode Buyer BuyerID (PK) Name Type City Province Country Staff StaffID (PK) Name Position Gender City Province Country Owner OwnerID (PK) Name Type City Province Country BranchID (PK) Type City Province Country Branch PromotionID (PK) Name Type Promotion TimeID (PK) Day Week Month Year Time Fact Table Dimension Tables Dimension Tables
  • 23. 23 Snowflake Schema  Snowflake schema is a variant of the star schema where dimension tables contain normalized data o e.g. ‘city’ and ‘province’ can be splitted as separated tables to normalize dimension tables o ‘Starflake’ schema is a hybrid structure that contains a mixture of star (denormalized) and snowflake (normalized) schemas  This allows dimension tables to be present in both forms for different query requirements
  • 24. 24 Example: Snowflake Schema PropertySale TimeID (FK) PropertyID (FK) BranchID (FK) BuyerID (FK) PromotionID (FK) StaffID (FK) OwnerID (FK) OfferPrice SellingPrice SaleCommission SaleRevenue BranchID (PK) Type CityID (FK) Branch Fact Table Province (PK) Country Province CityID (PK) CityName Province (FK) City Dimension Tables
  • 25.  In a star schema every dimension will have a primary key.  In a star schema, a dimension table will not have any parent table.  Whereas in a snow flake schema, a dimension table will have one or more parent tables.  Hierarchies for the dimensions are stored in the dimensional table itself in star schema.  Whereas hierarchies are broken into separate tables in snow flake schema. These hierarchies helps to drill down the data from topmost hierarchies to the lowermost hierarchies. Important aspects of Star Schema & Snow Flake Schema
  • 26. Dimension Modeling  Dimensional modeling is a technique for conceptualizing and visualizing data models as a set of measures that are described by common aspects of the business.  It is especially useful for summarizing and rearranging the data and presenting views of the data to support data analysis.  Dimensional modeling focuses on numeric data, such as values, counts, weights, balances, and occurrences.
  • 27. 27 Multidimensional Data Model (MDM)  Focused on a collection of numeric measures.  Each measure depends on a set of dimensions  Data are presented as multidimensional array
  • 28. Unified Dimensional Model (UDM)  The role of a Unified Dimensional Model (UDM) is to provide a bridge between the user and the data sources.  A UDM is constructed over one or more physical data sources, and then the end user issues queries against the UDM using one of a variety of client tools.  Advantages of creating UDM  More readily understood model of the data  Isolation from heterogeneous backend data sources  Improved performance for summary type queries. 28
  • 29. Basic Concepts  Dimensional modeling has several basic concepts: · Facts · Dimensions · Measures (variables)
  • 30. Fact Table  Contain numeric measures of the business  Contains facts and connected to dimensions  two types of columns facts or measures foreign keys to dimension tables  May contain date-stamped data  A fact table might contain either detail level facts or facts that have been aggregated
  • 31. Steps in designing Fact Table  Identify a business process for analysis(like sales).  Identify measures or facts (sales dollar).  Identify dimensions for facts(product dimension, location dimension, time dimension, organization dimension).  List the columns that describe each dimension.(region name, branch name, region name).  Determine the lowest level of summary in a fact table(sales dollar).
  • 32. Types of Facts (Measures)  Additive - Measures that can be added across all dimensions.  Semi Additive - Measures that can be added across few dimensions and not with others.  Non Additive - Measures that cannot be added across all dimensions.
  • 33. Dimension Tables  A dimension is a structure, often composed of one or more hierarchies, that categorizes data. Dimensional attributes help to describe the dimensional value. They are normally descriptive, textual values. Several distinct dimensions, combined with facts, enable you to answer business questions. Commonly used dimensions are customers, products, and time.
  • 34. Dimension Tables  Contain textual information that represents attributes of the business  Contain relatively static data  Are joined to fact table through a foreign key reference  Are usually smaller than fact tables
  • 36. MEASURES  A measure is a numeric attribute of a fact, representing the performance or behavior of the business relative to the dimensions. The actual numbers are called as variables.  A measure is determined by combinations of the members of the dimensions and is located on facts. Examples of Measures: o Quantity Sold o Unit Price o Amount Sold o Profit
  • 37. CUBE Cubes contain a set of data that is usually constructed from a subset of a data warehouse and is organized and summarized into a multidimensional structure defined by a set of dimensions and measures.
  • 38. Advantages of SSAS Cubes  SSAS is fast even on a large volume of data  SSAS calculated measures are fast execution-wise and easy reusable  They are defined centrally in the SSAS database, and the reports pick and choose the calculated measures they want.
  • 39. 39 ROLAP: Relational OLAP  Special schema design: star, snowflake  Special indexes: bitmap, multi-table join  Special tuning: maximize query throughput  Proven technology (relational model, DBMS), tend to outperform specialized MDDB especially on large data sets  Products o IBM DB2, Oracle, Sybase IQ, RedBrick, Informix
  • 40. Advantages: o It can handle large amounts of data, ROLAP itself places no limitation on data amount  Disadvantages: o Performance can be slow. Because each ROLAP report is essentially a SQL query (or multiple SQL queries) in the relational database, the query time can be long if the underlying data size is large . o It is difficult to perform complex calculations . ROLAP: Relational OLAP
  • 41. CS 33641 MOLAP: Multi Dimensional OLAP  MDDB: a special-purpose data model  Facts stored in multi-dimensional arrays  Dimensions used to index array  Sometimes on top of relational DB  Products o Pilot, Arbor Essbase, Gentia
  • 42. Advantages o Excellent performance o The storage is not in the relational database, but in proprietary formats. o MOLAP cubes are built for fast data retrieval, and are optimal for slicing and dicing operations. Disadvantages: o It is limited in the amount of data it can handle. Because all calculations are performed when the cube is built, it is not possible to include a large amount of data in the cube itself. o It requires an additional investment in human and capital resources are needed. MOLAP: Multi Dimensional OLAP
  • 43.  Hybrid OLAP (HOLAP) refers to technologies that combine MOLAP and ROLAP. Advantages o For summary-type information, HOLAP leverages cube technology for faster performance. o When detail information is needed, HOLAP can "drill through" from the cube into the underlying relational data. ` HOLAP: Hybrid OLAP