This document provides an overview of data warehousing and related concepts. It defines a data warehouse as a centralized database for analysis and reporting that stores current and historical data from multiple sources. The document describes key elements of data warehousing including Extract-Transform-Load (ETL) processes, multidimensional data models, online analytical processing (OLAP), and data marts. It also outlines advantages such as enhanced access and consistency, and disadvantages like time required for data extraction and loading.
2. Presented By,
Neenu C. Paul(12120051)
CS B, S7
SOE, CUSAT
Guided By,
Dr. Sudheep Elayidom
Division of Computer Science
SOE, CUSAT
3. CONTENTS
• What is a data warehouse?
• What is data warehousing?
• Database vs Data warehouse
• OLTP & OLAP
• Data warehouse architecture
• Multidimensional data model
• Data Mart
• ETL
• Advantages of data warehouse
• Disadvantages of data warehouse
• S/W Solutions of data warehouse
• Conclusion
• References
4. A producer wants to know….
Which are our
lowest/highest margin
customers ?
Who are my customers
and what products
are they buying?
What is the most
effective distribution
channel?
What product promotions
have the biggest
impact on revenue?
What impact will
new products/services
have on revenue
and margins?
Which customers
are most likely to go
to the competition ?
5. What is a Data Warehouse?
• A data warehouse is a system for storing, analyzing, and reporting on
data.
• Central database that includes information from several different
sources.
• Keeps current as well as historical data.
• Used to produce reports to assist in decision-making and management.
6. “Data Warehouse is a subject-oriented,
integrated, time-variant
and non-volatile
collection of data in support of
management’s decision making
process.” – W. H. Inmon
[Figure: a data warehouse and its four properties: subject-oriented, integrated, time-variant, non-volatile]
7. What is Data Warehousing?
A process of transforming data
into information and making it
available to users in a timely
enough manner to make a
difference
[Figure: Data → Information]
8. Database vs Data Warehouse
Database
• Transaction oriented
• For storing online transaction data
• E-R modeling techniques are
used for designing
• Captures data
• Holds real-time information
Data Warehouse
• Subject oriented
• For storing historical data
• Data modeling techniques are
used for designing
• Analyzes data
• Holds the entire information
base for all time
9. Data Processing Technologies
• OLTP (On-Line Transaction Processing)
- The major task is to perform on-line
transaction and query processing. Covers
most of the day-to-day operations of an
organization.
• OLAP (On-Line Analytical Processing)
- Serves knowledge workers (users) in the
role of data analysis and decision making.
- Organizes and presents data in various
formats to accommodate the diverse needs
of the different users.
[Figure: Data Processing Technologies, split into OLTP and OLAP]
10. OLTP vs OLAP
                   | OLTP                            | OLAP
users              | clerk, IT professional          | knowledge worker
function           | day-to-day operations           | decision support
DB design          | application-oriented            | subject-oriented
data               | current, up-to-date;            | historical;
                   | detailed, flat relational;      | summarized, multidimensional;
                   | isolated                        | integrated, consolidated
usage              | repetitive                      | ad hoc
access             | read/write, dozens of records   | read, millions of records
unit of work       | short, simple transaction       | complex query
# records accessed | tens                            | millions
# users            | thousands                       | hundreds
DB size            | 100 MB-GB                       | 100 GB-TB
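The "unit of work" row can be made concrete with a small sketch. It uses Python's built-in sqlite3 module purely for illustration; the sales table, its columns and all values are made up and are not part of the original slides. The first statement is a short OLTP-style transaction on one record, the second an OLAP-style read that summarizes many rows.

```python
import sqlite3

# Hypothetical sales table; every name and value here is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (region, year, amount) VALUES (?, ?, ?)",
    [
        ("North", 2013, 100.0),
        ("North", 2014, 150.0),
        ("South", 2013, 80.0),
        ("South", 2014, 120.0),
    ],
)

# OLTP-style unit of work: a short transaction that reads and writes one record.
with conn:
    conn.execute("UPDATE sales SET amount = amount + 10.0 WHERE id = ?", (1,))

# OLAP-style unit of work: a read-only query that scans and summarizes all rows.
totals_by_region = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()
)
print(totals_by_region)
```

In a real deployment the two workloads would normally run on separate systems, as the table above suggests; a single in-memory database is used here only to keep the sketch short.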
11. To summarize ...
October 31, 2014
• OLTP systems are used to “run” a business.
• The data warehouse helps to “optimize” the business.
12. Typical DW Architecture
• Data Sources: System A, System B, System C, System D
• ETL: Extract, Transform, Load
• Data Store: The Data Warehouse
• Data Access: Business Model
• Presentation: Dashboards, Prompted Views, Scorecards, Ad-Hoc Reporting, Self Serve
13. Multidimensional data model
• Developed for implementing data warehouse and data marts.
• Provides both a mechanism to store data and a way for business
analysis.
• An alternative to entity-relationship (E/R) model
TYPES OF MULTIDIMENSIONAL DATA MODEL
• Data cube model
• Star schema model
• Snowflake schema model
• Fact constellations
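As an illustration of the star schema model, the sketch below builds one fact table surrounded by two dimension tables and answers a question by joining them. It uses Python's built-in sqlite3 module; the table names (fact_sales, dim_product, dim_date), columns and rows are hypothetical and chosen only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes used to group and filter.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);

-- Fact table: one row per sale, holding measures plus keys into each dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    units INTEGER,
    revenue REAL
);

-- Made-up sample rows.
INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Lamp', 'Home');
INSERT INTO dim_date VALUES (10, 2014, 'Q1'), (11, 2014, 'Q2');
INSERT INTO fact_sales VALUES (1, 10, 5, 10.0), (1, 11, 3, 6.0), (2, 10, 1, 25.0);
""")

# A typical star-schema query: join the fact table to a dimension, then aggregate.
revenue_by_category = dict(conn.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
""").fetchall())
print(revenue_by_category)
```

A snowflake schema would further split a dimension such as dim_product into normalized sub-tables (for example a separate category table), and a fact constellation would add more fact tables sharing the same dimensions.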
14. Data cubes
• A data warehouse is based on a multidimensional data model which views data in
the form of a data cube.
• Three important concepts are associated with data cubes
- Slicing
- Dicing
- Rotating
• In the cube given below we have the results of the 1991 Canadian Census, with
ethnic origin, age group and geography as the dimensions of the cube,
while 174 is a measure. A dimension is a category of data, and each
dimension includes different levels of categories. The measures are the actual data
values that occupy the cells defined by the selected dimensions.
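The relationship between dimensions and measures can be sketched in a few lines of Python. The cube is stored as a dictionary whose keys are (ethnic origin, age group, geography) coordinates and whose values are the measures. The records below are invented for illustration and are not real census figures; roll_up is a hypothetical helper name.

```python
from collections import defaultdict

# Made-up records: (ethnic origin, age group, geography, count). NOT real census data.
records = [
    ("Chinese", "0-14", "Ontario", 10),
    ("Chinese", "15-24", "Ontario", 7),
    ("Chinese", "0-14", "Quebec", 3),
    ("Italian", "0-14", "Ontario", 4),
    ("Italian", "15-24", "Quebec", 6),
    ("Chinese", "0-14", "Ontario", 2),
]

DIMENSIONS = ("ethnic_origin", "age_group", "geography")

# Build the cube: each cell is addressed by one value per dimension
# and holds the summed measure for that combination.
cube = defaultdict(int)
for origin, age, geo, count in records:
    cube[(origin, age, geo)] += count


def roll_up(cube, keep):
    """Sum the measure over every dimension that is not listed in `keep`."""
    idx = [DIMENSIONS.index(name) for name in keep]
    out = defaultdict(int)
    for coords, measure in cube.items():
        out[tuple(coords[i] for i in idx)] += measure
    return dict(out)


by_origin = roll_up(cube, ["ethnic_origin"])
print(cube[("Chinese", "0-14", "Ontario")])
print(by_origin)
```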
16. Slicing the Data Cube
• Figure 2 illustrates slicing the
cube on the ethnic origin Chinese.
Slicing the cube this way
yields the data for
Chinese origin across all geographies
and age groups.
• The data contained within
the cube has effectively been
filtered to display only the
measures associated with the
Chinese ethnic origin.
• From an end-user perspective, the
term slice most often refers to a
two-dimensional page selected
from the cube.
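A minimal sketch of slicing, assuming the cube is held as a Python dictionary keyed by (ethnic origin, age group, geography). The cell values are invented and slice_cube is a hypothetical helper, not something taken from the slides.

```python
# Made-up cube cells: (ethnic origin, age group, geography) -> measure.
cube = {
    ("Chinese", "0-14", "Ontario"): 12,
    ("Chinese", "15-24", "Ontario"): 7,
    ("Chinese", "0-14", "Quebec"): 3,
    ("Italian", "0-14", "Ontario"): 4,
    ("Italian", "15-24", "Quebec"): 6,
}


def slice_cube(cube, axis, value):
    """Fix one dimension to a single value and drop it from the coordinates."""
    return {
        coords[:axis] + coords[axis + 1:]: measure
        for coords, measure in cube.items()
        if coords[axis] == value
    }


# Slice on ethnic origin (axis 0): the result is a two-dimensional page
# of age group by geography for the Chinese ethnic origin only.
chinese_page = slice_cube(cube, 0, "Chinese")
print(chinese_page)
```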
17. Dicing and Rotating
• Dicing is an operation related to slicing
in which a sub-cube of the original
space is defined.
• Dicing provides the user with the
smallest available slice of data,
making it possible to examine each sub-cube
in greater detail.
• Rotating, which is sometimes called
pivoting, changes the dimensional
orientation of the report or page
display from the cube data. Rotating
may consist of swapping the rows and
columns, or moving one of the row
dimensions into the column
dimension.
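Dicing and rotating can be sketched the same way, again assuming a dictionary-based cube keyed by (ethnic origin, age group, geography). The data and the helper names dice and rotate are illustrative assumptions only.

```python
# Made-up cube cells: (ethnic origin, age group, geography) -> measure.
cube = {
    ("Chinese", "0-14", "Ontario"): 12,
    ("Chinese", "15-24", "Ontario"): 7,
    ("Chinese", "0-14", "Quebec"): 3,
    ("Italian", "0-14", "Ontario"): 4,
    ("Italian", "15-24", "Quebec"): 6,
}


def dice(cube, allowed):
    """Keep the cells whose coordinate on each listed axis is in the allowed set.

    `allowed` maps an axis index to the set of values kept on that axis,
    so the result is a sub-cube with the same number of dimensions.
    """
    return {
        coords: measure
        for coords, measure in cube.items()
        if all(coords[axis] in values for axis, values in allowed.items())
    }


def rotate(page):
    """Swap the row and column dimensions of a two-dimensional page."""
    return {(col, row): measure for (row, col), measure in page.items()}


# Dice: restrict geography to Ontario and age group to 0-14.
sub_cube = dice(cube, {1: {"0-14"}, 2: {"Ontario"}})

# Rotate: a page of (age group, geography) becomes (geography, age group).
page = {("0-14", "Ontario"): 12, ("15-24", "Ontario"): 7, ("0-14", "Quebec"): 3}
rotated = rotate(page)
print(sub_cube)
print(rotated)
```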
18. Data Mart
• Contains a subset of the data stored in the data warehouse that is of
interest to a specific business community, department, or set of users.
• E.g.: marketing promotions, finance, or account collections.
• Data marts are small slices of the data warehouse.
• Data marts improve end-user response time by allowing users to have
access to the specific type of data they need to view.
• A data mart is basically a condensed and more focused version of a
data warehouse.
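The idea of a data mart as a focused subset can be sketched as follows. The example uses Python's built-in sqlite3 module; the warehouse_facts table, the sales_mart table and all rows are hypothetical. In practice a data mart is often a separate database, while a single in-memory database is used here for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Stand-in for the warehouse: facts from several subject areas (made-up rows).
CREATE TABLE warehouse_facts (subject TEXT, department TEXT, month TEXT, amount REAL);
INSERT INTO warehouse_facts VALUES
    ('sales', 'Retail', '2014-01', 500.0),
    ('sales', 'Retail', '2014-02', 700.0),
    ('finance', 'Accounts', '2014-01', 300.0),
    ('marketing', 'Promotions', '2014-01', 90.0);

-- The data mart: only the rows and columns one group of users needs.
CREATE TABLE sales_mart AS
    SELECT department, month, amount
    FROM warehouse_facts
    WHERE subject = 'sales';
""")

mart_rows = conn.execute(
    "SELECT department, month, amount FROM sales_mart ORDER BY month"
).fetchall()
print(mart_rows)
```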
19. Data warehouse vs Data mart
DATA WAREHOUSE
• Holds multiple subject areas
• Holds very detailed information
• Works to integrate all data
sources
• Does not necessarily use a
dimensional model but feeds
dimensional models
DATA MART
• Often holds only one subject area, for
example Finance or Sales
• May hold more summarized data
(although many hold full detail)
• Concentrates on integrating
information from a given subject
area or set of source systems
• Is built focused on a dimensional
model using a star schema
20. Reasons for creating a data mart
• Easy access to frequently needed data
• Creates collective view by a group of users
• Improves end-user response time
• Ease of creation
• Lower cost than implementing a full data warehouse
• Potential users are more clearly defined than in a full data warehouse
• Contains only business essential data and is less cluttered.
21. Advantages & Disadvantages of data warehousing
Advantages
• Enhances end-user access to a wide variety of data.
• Increases data consistency.
• Increases productivity and decreases computing costs.
• Combines data from different sources in one place.
• Provides an infrastructure that could support changes to data and replication of the changed data
back into the operational systems.
Disadvantages
• Extracting, cleaning and loading data can be time consuming.
• Compatibility problems with systems already in place, e.g. transaction processing systems.
• End users need training, and some may still end up not using the data warehouse.
• Security can become a serious issue, especially if the data warehouse is web accessible.
22. Applications of data warehousing
Industry          | Application
Finance           | Credit card analysis
Insurance         | Claims, fraud analysis
Telecommunication | Call record analysis
Transport         | Logistics management
Consumer goods    | Promotion analysis
23. ETL
• Extract-Transform-Load
• Responsible for the operations taking place in the backstage of the data
warehouse architecture.
• Extract: Get the data from the source system as efficiently as possible
• Transform: Perform calculations on the data
• Load: Load the data into the target storage
ADVANTAGES OF ETL TOOL
• Simple, faster and cheaper
• Delivers good performance even for very large data sets
• Allows reuse of existing complex programs
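The three ETL steps above can be sketched as three small Python functions. This is a minimal illustration and not how any particular ETL tool works: the CSV text, the orders table and the cleaning rules are all assumptions made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical source data, standing in for an export from an operational system.
SOURCE_CSV = """order_id,customer,quantity,unit_price
1, alice ,2,3.50
2,BOB,1,10.00
3,alice,,4.00
"""


def extract(text):
    """Extract: read the raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop incomplete rows, normalize names, compute order totals."""
    cleaned = []
    for row in rows:
        if not row["quantity"].strip():
            continue  # skip rows with a missing quantity
        total = int(row["quantity"]) * float(row["unit_price"])
        cleaned.append((int(row["order_id"]), row["customer"].strip().lower(), total))
    return cleaned


def load(conn, rows):
    """Load: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
    )
    with conn:
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute(
    "SELECT order_id, customer, total FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

Keeping the three steps as separate functions mirrors the Extract, Transform, Load split and makes each stage easy to test or replace on its own.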
24. Popular ETL tools
Tool                                       | Company
Informix                                   | IBM
Oracle Warehouse Builder                   | Oracle
Microsoft SQL Server Integration Services  | Microsoft
25. IBM Informix
• Informix is one of the world’s most widely used database servers
• High levels of performance and availability, distinctive capabilities in data replication and
scalability, and minimal administrative overhead.
HIGHLIGHTS
• Real-time analytics: Informix is a single platform that can power OLTP and OLAP workloads
and successfully meet service-level agreements (SLAs) for each.
• Fast, always-on transactions: Provides one of the industry’s widest sets of options for keeping
data available at all times, including zero downtime for maintenance.
• Sensor data management: Solves the big data challenge of sensor data with unmatched
performance and scalability for managing time series data.
• Easy to use: Informix runs virtually unattended with self-configuring, self-managing and self-healing
capabilities.
• Best-of-breed embeddability: Provides a proven embedded data management platform for ISVs
and OEMs to deliver integrated, world-class solutions, enabling platform independence.
• NoSQL capability: IBM Informix gives you a way to combine unstructured and
structured data in a smart way, bringing NoSQL to your SQL database.
26. Conclusion
Data Warehousing is not a new phenomenon. All large
organizations already have data warehouses, but they are just not
managing them. Over the next few years, the growth of data
warehousing is going to be enormous with new products and
technologies coming out frequently. In order to get the most out of this
period, it is going to be important that data warehouse planners and
developers have a clear idea of what they are looking for and then
choose strategies and methods that will provide them with
performance today and flexibility for tomorrow.
27. References
1) Data Mining , Gupta
2) Data Warehousing , C.S.R. Prabhu
3) Jeff Lawyer and Shamsul Chowdhury “Best Practices in Data
Warehousing to Support Business Initiatives and Needs”, IEEE 2004
4) Ruilian Hou “Research and Analysis of Data Warehouse Technologies”,
IEEE 2011
5) S. Sai Sathyanarayana Reddy, Dr. L.S.S.Reddy, Dr.V.Khanna, A.Lavanya
“Advanced Techniques for Scientific Data Warehousing”, IEEE 2009
6) Murat Obali, Abdul Kadir Gorur, “A Real Time Data Warehouse
Approach for Data Processing”, IEEE 2013
7) Ruilian Hou “Analysis and research on the difference between data
warehouse and database”, IEEE 2011