1. Dr. Abdul Basit Siddiqui
Assistant Professor
FURC
(Lecture Slides Week # 2)
2. Why a Data Warehouse (DWH)?
Data recording and storage are growing:
Almost every industry has a huge amount of operational data.
Careful use and analysis of historical information can lead to
good predictions about the future:
Knowledge workers want to turn available data into useful
information.
They use this information to support strategic decision
making.
Gives a total view of the organization:
It is a platform of consolidated historical data for analysis.
It stores good-quality data so that knowledge workers can make
correct decisions.
Intelligent decision support is required for decision making.
Data Warehouse & Mining - Spring 2014, 04/19/15
3. Why a Data Warehouse? (Contd.)
From a business perspective:
It is the latest marketing weapon.
Helps to keep customers by learning more about
their needs.
A valuable tool in today’s competitive, fast-evolving
world.
4. Reason-1: Why a Data Warehouse (DWH)?
Data sets are growing:
How Much Data is that?
Size    Bytes                          Roughly equivalent to
1 MB    2^20 or 10^6 bytes             A small novel; one 3½-inch disk.
1 GB    2^30 or 10^9 bytes             Paper reams that could fill the back of a pickup van.
1 TB    2^40 or 10^12 bytes            50,000 trees chopped, converted into paper, and printed.
2 PB    1 PB = 2^50 or 10^15 bytes     Academic research libraries across the USA.
5 EB    1 EB = 2^60 or 10^18 bytes     All words ever spoken by human beings.
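The slide quotes each unit both as a power of 2 and as a power of 10. The two are close but not equal, and the gap widens with each unit. A minimal sketch of the arithmetic (the function name is made up for illustration):

```python
# Binary (2^n) and decimal (10^n) byte counts for each unit on the slide.
UNITS = {
    "MB": (2**20, 10**6),
    "GB": (2**30, 10**9),
    "TB": (2**40, 10**12),
    "PB": (2**50, 10**15),
    "EB": (2**60, 10**18),
}


def binary_excess_percent(unit: str) -> float:
    """Return how much larger the binary unit is than the decimal one, in percent."""
    binary, decimal = UNITS[unit]
    return (binary / decimal - 1) * 100


for unit in UNITS:
    print(f"1 {unit}: binary size is {binary_excess_percent(unit):.1f}% larger")
```

The difference is under 5% for a megabyte but over 15% for an exabyte, so the choice of convention matters more as data sets grow.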
5. Reason-1: Why a Data Warehouse (DWH)? (Contd.)
The size of data sets is going up.
The cost of data storage is coming down.
The amount of data the average business collects and stores is
doubling every year.
Total hardware and software cost to store and manage 1 MB of
data:
1990: $15
2002: 15¢ (down 100 times)
2010: < 1¢ (down 150 times)
A few examples:
Walmart: 24+ TB
France Telecom: 100+ TB
CERN: Up to 20 PB by 2006
Stanford Linear Accelerator Center (SLAC): 500 TB
Telenor, Ufone, Mobilink, Warid, Zong ???
6. Caution!
A Warehouse of Data
is NOT a
Data Warehouse.
8. Reason-2: Why a Data Warehouse (DWH)?
DBMS Approach
List of all items that were sold last month?
List of all makeup items purchased by Sassi?
The total sales of the last month grouped by branch?
How many sales transactions occurred during the month of January?
Intelligent Enterprise
Which items sell together? Which items to stock?
Where and how to place the items? What discounts to offer?
How best to target customers to increase sales at a branch?
Which customers are most likely to respond to my next promotional campaign, and why?
Businesses demand Intelligence (BI).
Complex questions from integrated data.
“Intelligent Enterprise”
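To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `sales` table, its columns, and the sample rows are assumptions invented for illustration; they are not from the slides. The first two queries are typical DBMS-approach questions. The third makes a first attempt at "which items sell together?" by counting item pairs that share a transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (txn_id INTEGER, branch TEXT, item TEXT, "
    "amount REAL, sold_on TEXT)"
)
# Hypothetical sample data: one row per item sold in a transaction.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Lahore", "lipstick", 500.0, "2014-01-05"),
        (1, "Lahore", "mascara", 700.0, "2014-01-05"),
        (2, "Karachi", "lipstick", 500.0, "2014-01-09"),
        (2, "Karachi", "mascara", 700.0, "2014-01-09"),
        (3, "Lahore", "shampoo", 300.0, "2014-01-20"),
    ],
)

# DBMS approach: total sales for January, grouped by branch.
sales_by_branch = conn.execute(
    "SELECT branch, SUM(amount) FROM sales "
    "WHERE sold_on BETWEEN '2014-01-01' AND '2014-01-31' "
    "GROUP BY branch ORDER BY branch"
).fetchall()

# DBMS approach: number of sales transactions in January.
january_txns = conn.execute(
    "SELECT COUNT(DISTINCT txn_id) FROM sales "
    "WHERE sold_on BETWEEN '2014-01-01' AND '2014-01-31'"
).fetchone()[0]

# Toward the intelligent enterprise: which item pairs appear in the
# same transaction, and how often?
pairs = conn.execute(
    "SELECT a.item, b.item, COUNT(*) AS times_together "
    "FROM sales a JOIN sales b "
    "  ON a.txn_id = b.txn_id AND a.item < b.item "
    "GROUP BY a.item, b.item ORDER BY times_together DESC"
).fetchall()

print(sales_by_branch, january_txns, pairs)
```

The first two questions have fixed, known answers. The pair count is only a starting point: deciding what to stock or which discount to offer needs integrated, historical data and analysis beyond a single query.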
9. Reason-3: Why a Data Warehouse (DWH)?
Businesses want much more …
What happened?
Why did it happen?
What will happen?
What is happening?
What do you want to happen?
10. What is a Data Warehouse?
A complete repository of historical
corporate data extracted from
transaction systems that is
available for ad-hoc access by
knowledge workers.
11. What is a Data Warehouse?
Transaction System:
Management Information System (MIS)
Could be typed sheets (NOT a transaction system)
Ad-Hoc Access:
Does not have a certain access pattern
Queries not known in advance
Difficult to write SQL in advance
Knowledge Workers:
Typically NOT IT literate (Executives, Analysts, Managers)
NOT clerical workers
Decision makers
12. What is a Data Warehouse?
Inmon’s Definition:
A Data Warehouse is:
Subject-oriented
Integrated
Time-variant
Nonvolatile
Collection of data in support of management’s
decision making process.
13. Another View of a DWH
[Figure: the four properties of a data warehouse: subject-oriented, integrated, time-variant, nonvolatile.]
14. Subject-oriented
A data warehouse is organized around subjects such as sales,
product, and customer.
It focuses on modeling and analysis of data for decision makers.
It excludes data that is not useful in the decision support process.
15. Integration
A data warehouse is constructed by integrating multiple
heterogeneous sources.
Data preprocessing is applied to ensure consistency.
[Figure: RDBMS, legacy system, and flat file sources feed the data warehouse through data processing and data transformation.]
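A minimal sketch of what integration can look like in practice. The three sources, their field names, and their encodings are assumptions invented for illustration. Each source writes gender and dates differently; preprocessing maps them onto one consistent format before the rows enter the warehouse.

```python
from datetime import datetime

# Hypothetical encodings seen across sources, mapped to one standard code.
GENDER_CODES = {"m": "M", "male": "M", "1": "M", "f": "F", "female": "F", "0": "F"}


def normalize_gender(raw) -> str:
    """Map any known source encoding to 'M' or 'F'; unknown values become 'U'."""
    return GENDER_CODES.get(str(raw).strip().lower(), "U")


def normalize_date(raw: str, fmt: str) -> str:
    """Parse a date in the source's own format and return it as ISO 8601."""
    return datetime.strptime(raw, fmt).date().isoformat()


# Three heterogeneous sources (sample rows are made up).
rdbms_rows = [{"cust": 101, "gender": "M", "joined": "2013-02-01"}]
legacy_rows = [{"cust": 102, "gender": 0, "joined": "01/03/2013"}]  # DD/MM/YYYY
flat_file_rows = [{"cust": 103, "gender": "female", "joined": "20130415"}]

SOURCES = [
    (rdbms_rows, "%Y-%m-%d"),
    (legacy_rows, "%d/%m/%Y"),
    (flat_file_rows, "%Y%m%d"),
]

# Integrate: every row ends up with the same codes and date format.
warehouse_rows = [
    {
        "cust": row["cust"],
        "gender": normalize_gender(row["gender"]),
        "joined": normalize_date(row["joined"], fmt),
    }
    for rows, fmt in SOURCES
    for row in rows
]

print(warehouse_rows)
```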
16. Time-variant
Provides information from a historical perspective, e.g., the
past 5-10 years.
Every key structure contains an element of time, either
implicitly or explicitly.
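One way to read "an element of time in every key" is a table whose primary key includes a date, so that each row describes the state at a particular moment. The table and column names below are hypothetical, chosen only to illustrate the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The snapshot date is part of the key: one row per account per date.
conn.execute(
    "CREATE TABLE account_balance ("
    "  account_id INTEGER,"
    "  snapshot_date TEXT,"
    "  balance REAL,"
    "  PRIMARY KEY (account_id, snapshot_date))"
)
conn.executemany(
    "INSERT INTO account_balance VALUES (?, ?, ?)",
    [
        (7, "2013-12-31", 1000.0),
        (7, "2014-01-31", 1250.0),
        (7, "2014-02-28", 900.0),
    ],
)


def balance_as_of(account_id: int, as_of_date: str):
    """Return the most recent recorded balance on or before as_of_date."""
    row = conn.execute(
        "SELECT balance FROM account_balance "
        "WHERE account_id = ? AND snapshot_date <= ? "
        "ORDER BY snapshot_date DESC LIMIT 1",
        (account_id, as_of_date),
    ).fetchone()
    return row[0] if row else None


print(balance_as_of(7, "2014-02-15"))
```

Because older snapshots are kept rather than overwritten, the same account can be examined at any past date.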
17. Nonvolatile
Data once recorded cannot be updated.
A data warehouse requires only two data-access
operations:
Initial loading of data
Access of data
[Figure: data enters the warehouse through load and leaves it through access.]
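The nonvolatile property can be sketched as a store that exposes only the two operations named on the slide. The class below is a hypothetical illustration, not a real warehouse API: it offers `load` and `access`, and deliberately has no update or delete.

```python
class NonvolatileStore:
    """Toy store with only the two warehouse operations: load and access."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        """Append new rows; existing rows are never modified."""
        self._rows.extend(dict(row) for row in rows)

    def access(self, predicate=lambda row: True):
        """Return copies of matching rows so callers cannot alter stored data."""
        return [dict(row) for row in self._rows if predicate(row)]


store = NonvolatileStore()
store.load([{"txn": 1, "amount": 100.0}, {"txn": 2, "amount": 250.0}])
large = store.access(lambda row: row["amount"] > 200)
print(large)
```

Returning copies from `access` is a design choice made for this sketch: a reader can change a result without changing what the store holds.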
18. Summary: What is a Data Warehouse?
It is a blend of many technologies, the basic
concept being:
Take all data from different operational systems
If necessary, add relevant data from industry
Transform all data and bring into a uniform format
Integrate all data as a single entity
Store data in a format supporting easy access for
decision support
Create performance-enhancing indices
Implement performance-enhancing joins
Run ad-hoc queries with low selectivity
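The steps above can be sketched end to end on a very small scale. Everything named here (the two source systems, the `fact_sales` table, the index name) is an assumption for illustration. SQLite stands in for the warehouse only because it ships with Python.

```python
import sqlite3

# Step 1: take data from different operational systems (made-up samples).
billing_system = [("C1", "lahore", 1200.0), ("C2", "KARACHI", 800.0)]
crm_system = [("C3", "Lahore ", 450.0)]


def transform(row):
    """Bring one row into a uniform format: trimmed, title-cased city names."""
    customer_id, city, amount = row
    return (customer_id, city.strip().title(), amount)


# Steps 2-3: integrate all rows into a single table built for querying.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (customer_id TEXT, city TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [transform(row) for row in billing_system + crm_system],
)

# Step 4: create an index to speed up lookups by city.
conn.execute("CREATE INDEX idx_fact_sales_city ON fact_sales (city)")


# Step 5: run an ad-hoc query; the city is not known in advance.
def total_for_city(city: str) -> float:
    return conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM fact_sales WHERE city = ?",
        (city,),
    ).fetchone()[0]


print(total_for_city("Lahore"))
```

Without the transform step, "lahore" and "Lahore " would count as different cities and the total would be wrong, which is why transformation comes before integration.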
19. Benefits of Data Warehouse
High returns on investment.
Substantial competitive advantage.
Increased productivity of corporate decision-makers.
Fast reporting for decision making process.
Reduced reporting load on transactional systems.
Makes institutional data more user-friendly and
accessible for knowledge workers.
Integrates data from different source systems.
Enables ‘point-in-time’ analysis and trending over time.
Helps in identifying and resolving data integrity issues,
either in the warehouse itself or in the source systems
that collect the data.
20. Data Warehouse: How is it Different?
1. Decision making is Ad-Hoc
21. Data Warehouse: How is it Different?
2. Different patterns of hardware utilization
Bus Service vs. Train
22. Data Warehouse: How is it Different?
3. Combines operational and historical data
Data is not entered directly into a DWH; OLTP or ERP systems are
the sources.
OLTP systems don’t keep history; e.g., you cannot get a balance
statement more than a year old.
A DWH keeps historical data, even of bygone customers.
Why?
In the context of a bank, we want to know why the customer
left.
What events led to his/her leaving? Why?
Customer retention
23. Data Warehouse: How is it Different?
How much history?
Depends on:
Industry
Cost of storing historical data
Economic value of historical data
Industry and history
Telecom calls far outnumber bank transactions
18 months
Retailers are interested in analyzing yearly seasonal patterns
65 weeks, why?
Insurance companies want to do actuarial analysis, using
historical data to predict risk
7 years
Hence a DWH is NOT a complete repository of data.
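A minimal sketch of how a per-industry retention window might be applied. The slide leaves the 65-week question open. One plausible reading, stated here as an assumption, is 52 + 13 weeks: a full year plus one extra quarter, so the latest quarter can be compared with the same quarter a year earlier. The week counts for 18 months and 7 years are approximations.

```python
from datetime import date, timedelta

# Retention windows in weeks (hypothetical mapping based on the slide's figures).
RETENTION_WEEKS = {
    "telecom": 78,     # 18 months, approximated as 78 weeks
    "retail": 65,      # assumed reading: 52 weeks + 13 weeks (one extra quarter)
    "insurance": 364,  # 7 years, approximated as 7 * 52 weeks
}


def retention_cutoff(industry: str, today: date) -> date:
    """Oldest date whose records are still kept for this industry."""
    return today - timedelta(weeks=RETENTION_WEEKS[industry])


def keep(record_date: date, industry: str, today: date) -> bool:
    """True if a record dated record_date falls inside the retention window."""
    return record_date >= retention_cutoff(industry, today)


today = date(2015, 4, 19)
print(retention_cutoff("retail", today))
print(keep(date(2014, 6, 1), "retail", today))
```

Anything older than the cutoff is a candidate for archiving or deletion, which is why the warehouse is not a complete repository.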
24. Data Warehouse: How is it Different?
How much history?
Economic value of data vs. storage cost
Is a data warehouse a complete repository of data?