This document describes a data warehouse and business intelligence project for analyzing Starbucks store data. It discusses extracting data from various structured, semi-structured, and unstructured sources, transforming the data using SQL and R, and loading it into a star schema data warehouse with fact and dimension tables. The data warehouse is then used for business queries and analysis in Tableau, with case studies examining city revenue, visitor and beverage sales by city, and city ratings based on food and beverage counts. The analysis finds that New York City generally has the highest revenue, visitor counts, and ratings.
Data warehouse and business intelligence project for the analysis of Starbucks
Student Name: Sonali Gupta
Student ID: x01527245
Course: MSc Data Analytics
Table of Contents
INTRODUCTION
DATA SOURCES
TECHNOLOGY USED
DATA WAREHOUSE DESIGN AND ARCHITECTURE
Design of Data Warehouse
Business Query
Case Study 1
Case Study 2
Case Study 3
Conclusion
INTRODUCTION
Grabbing a cup of coffee in the morning is always delightful because it gives a punch of energy
for the day, and when coffee comes with a sense of ownership and plenty of offers, the only name
that comes to my mind is Starbucks. What delights me even more is having a cup of coffee at
Starbucks and trying every new variety of coffee and other beverages. I am a big lover of
coffee, and when it comes to buying one I always look for Starbucks. It gives me a feeling of
joy: the way they present different varieties of coffee chosen from around the globe and the
service they provide are commendable.
Afterwards I used to wonder how Starbucks manages its inventory and how it handles its
business. This curiosity made me choose Starbucks as the topic of my data warehouse
project. This project is a working model of a data warehouse for Starbucks and shows its
business intelligence capabilities.
Information related to Starbucks:
It is an American coffee company that was started in Seattle, Washington, in 1971. At present
the CEO of Starbucks is Kevin Johnson, and the company has approximately 23,768 locations
worldwide. Starbucks is known as the third largest fast food restaurant chain.
DATA SOURCES
1. Structured data
I found this data set on the Kaggle website. It contains the details of Starbucks store
locations worldwide. Its columns are Brand, Store Number, Store Name,
Ownership Type, Street Address, City, State/Province, Country, Postcode, Phone Number,
Timezone, Longitude and Latitude. From this data I took the United States records, which I
used in the project.
Link to the source – https://www.kaggle.com/starbucks/store-locations/data
2. Semi-structured data
I generated this data set using the Mockaroo API. It represents the Starbucks sales
report, and its columns are year, month, revenue details, number of visitors,
food sales quantity and beverage sales quantity.
Link to the source - http://my.api.mockaroo.com/
3. Unstructured data
Unstructured data does not fit into a relational table. It is text-heavy data that can
include dates, points, ratings and comments. I generated this data set using the API of
Yelp.com. Yelp is used to publish reviews and ratings of local businesses (restaurants,
hotels). This data set contains the review ratings of Starbucks stores.
Link to the source - https://www.yelp.com/developers/documentation/v3/business_search
TECHNOLOGY USED
The different technologies used in this project are shown below:
Database Management
• SQL Server Management Studio (SSMS)
• SQL Server Integration Services (SSIS)
Programming Language
• R is used for Twitter sentiment analysis and data cleaning.
• SQL is used to build the dimension tables and the fact table for every component.
Additional Software
• Tableau for creating graphs.
DATA WAREHOUSE DESIGN AND ARCHITECTURE
Uses of a data warehouse:
1. Data from various sources is integrated in real time, which supports business decisions,
keeps the data accessible to users in future and saves time.
2. Historical data can be integrated in one place with common keys, common formats
and a common data model.
3. The quality of the data improves and reports are generated faster.
4. Business intelligence can be built on top of it, for example SSAS cubes.
When designing the structure and storage of a data warehouse for business intelligence
purposes, two methodologies are commonly used: Inmon and Kimball. Each approach has
its own advantages.
Kimball’s methodology uses a dimensional design approach and is also known as bottom-up
design. Here the data marts for reporting are created first and then integrated to form the
data warehouse. With this approach, star and snowflake schemas are easy to create. The
methodology delivers business value in a short span of time, which is why I decided to choose it.
Inmon’s methodology is used for the enterprise data warehouse and is also known as top-down
design. First the normalised data model is created, then the data marts are built with the data
required for specific business processes. This approach therefore takes a lot of time and
requires more ETL work.
The aim is to analyse Starbucks stores in different areas of the USA: how much revenue is
generated, the number of visitors, and in which month and year the sales of food and beverages
are at their maximum and minimum. Kimball’s approach is therefore used to build this data warehouse.
These are the four steps for the design of a dimensional data model.
1. Select the business process.
2. Declare the grain.
3. Identify the dimensions.
4. Identify the facts.
Before deciding on my dimensions and facts, I considered at a high level what my Starbucks
data warehouse would look like and what its performance metrics would be. In this project the
Starbucks data is kept at the atomic level; I then selected 3 dimensions as needed for filtering
and grouping the facts.
Fig.1
Star Schema:
The star schema is the simplest form of data warehouse schema and takes its name from its
diagram, which resembles a star. A star schema consists of a fact table and dimension tables:
the fact table is in the centre and the dimension tables are joined to it. In this schema,
data is organised into facts and dimensions.
Fact Table:
A fact table is a combination of foreign key columns and measure columns. Each foreign
key column corresponds to the primary key of a dimension table, and the measure columns
contain the data that is being analysed.
In the Starbucks data warehouse, the fact table contains store details, date, location, sales
report and Yelp rating data. All these details help in analysing the business queries.
[Architecture diagram: data sources (Kaggle structured data, Mockaroo API, Yelp API) → loading into staging area → Starbucks DW → DW cube → reports in BI]
Dimension Table:
In a data warehouse, a dimension table is used to define dimensions, keys, attributes and
values. Every dimension table has its own primary key, which is unique within the table. It
contains the details of each data object. The star schema of the dimension and fact tables is
shown in the figure below.
Fig.2
Benefits of Star Schema
A well-designed star schema makes large data sets easy to understand and analyse. The main
benefits are described below:
• The ETL process is easy to create
• Complexity is very low because the tables have direct relationships
• Every dimension is directly connected to the fact table
• Query Performance
• Load Performance and Administration
• Built in Referential Integrity
• Efficient Navigation through Data
Design of Data Warehouse
In this Starbucks data warehouse, three dimensions and one fact table have been created.
DimStoreDetails: The store details dimension consists of Store_id, Store_name, Store_number,
Ownership and Yelp rating. Store_id is the primary key in this dimension.
Dimlocation: The location dimension consists of Location_id, Latitude, Longitude, City, Country,
Postcode and Address. Location_id is the primary key in this dimension.
DimDate: The date dimension consists of Date_id, Year and Month. Date_id is the primary key
in this dimension.
Facttable: For the Starbucks data warehouse, one fact table was created, which is connected
to all the dimension tables through foreign key relationships. It has four measure
columns:
1. Visitor_count – It contains the number of visitors.
2. Revenue – It contains the store revenue.
3. Beverage_count – It contains the beverage sales count.
4. Food_count – It contains the food sales count.
Fig 3
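The design described above can be sketched as table definitions. This is a minimal sketch, not the project's actual SSMS script: it uses Python's built-in sqlite3 module instead of SQL Server so that it runs anywhere, the column names follow the dimension descriptions above, and the column types and the fact table name FactStarbucks are assumptions.

```python
import sqlite3

# Column names follow the text; the SQL types, the name FactStarbucks and
# the use of SQLite in place of SQL Server are assumptions for illustration.
SCHEMA = """
CREATE TABLE DimStoreDetails (
    Store_id     INTEGER PRIMARY KEY,
    Store_name   TEXT,
    Store_number TEXT,
    Ownership    TEXT,
    Yelp_rating  REAL
);
CREATE TABLE Dimlocation (
    Location_id INTEGER PRIMARY KEY,
    Latitude    REAL,
    Longitude   REAL,
    City        TEXT,
    Country     TEXT,
    Postcode    TEXT,
    Address     TEXT
);
CREATE TABLE DimDate (
    Date_id INTEGER PRIMARY KEY,
    Year    INTEGER,
    Month   INTEGER
);
CREATE TABLE FactStarbucks (
    Store_id       INTEGER NOT NULL REFERENCES DimStoreDetails(Store_id),
    Location_id    INTEGER NOT NULL REFERENCES Dimlocation(Location_id),
    Date_id        INTEGER NOT NULL REFERENCES DimDate(Date_id),
    Visitor_count  INTEGER,
    Revenue        REAL,
    Beverage_count INTEGER,
    Food_count     INTEGER
);
"""


def create_star_schema(conn):
    """Create the three dimension tables and the fact table."""
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_star_schema(conn)
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(sorted(tables))
```

Each foreign key in the fact table points at the primary key of one dimension, so a fact row cannot be inserted before its dimension rows exist.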
Extract Transform Load (ETL) Process
The main task of any data warehouse is to rearrange, integrate and consolidate data from
many systems. ETL means extracting data from different sources, transforming it in the
staging area and then loading it into the destination. The SSIS tool is used for the ETL
process. The first step extracts the data into the staging database, the next is the
transformation stage, and the last is the loading stage, where the data is loaded into the
fact table. At the end of the ETL process, the fact table and the dimension tables are populated.
Extraction: I extracted the data from three different sources. The first data set was loaded
directly as a flat file, and the other two were extracted using APIs and stored in CSV format.
The extraction step loads the data into the staging area through OLE DB connections.
In this stage, the Yelp and Mockaroo data sets are not structured, so they were generated
using scraping and the R language with the help of the APIs.
The staging tables are truncated before each load, so that no duplicate data sets are generated.
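One way to read the truncate step is that the staging table is emptied before each load, so that re-running the package does not duplicate rows. The sketch below illustrates that idea with Python's sqlite3 module rather than SSIS; the StagingSales table, its columns and the sample rows are hypothetical.

```python
import sqlite3


def reload_staging(conn, rows):
    """Empty the staging table, then load the freshly extracted rows."""
    # SQLite has no TRUNCATE TABLE; DELETE without WHERE plays that role here.
    conn.execute("DELETE FROM StagingSales")
    conn.executemany(
        "INSERT INTO StagingSales (City, Year, Month, Revenue) VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE StagingSales (City TEXT, Year INTEGER, Month INTEGER, Revenue REAL)")

# Made-up rows standing in for one extraction run.
extract = [("City A", 2017, 1, 1200.0), ("City B", 2017, 1, 1500.0)]
reload_staging(conn, extract)
reload_staging(conn, extract)  # a second run of the same load

row_count = conn.execute("SELECT COUNT(*) FROM StagingSales").fetchone()[0]
print(row_count)
```

Because the table is cleared first, the row count after the second run equals the size of one extract instead of doubling.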
Transformation: After extraction, the data is transformed. I used lookups,
joins and SQL queries to load the data into the dimension tables.
These are the three dimensions:
1. dbo.DimDate
2. dbo.Dimlocation
3. dbo.DimStoreDetails
Loading: After the dimensions are populated, the next step is to populate the fact table. The
fact table includes the primary keys of all the dimensions; lookups against the dimension
tables resolve these keys, and the measures are loaded into the fact table alongside them.
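The lookup-based load can be sketched as follows. This is an illustration in Python with sqlite3, not the SSIS package itself: a join against each dimension stands in for the SSIS lookup, and the StagingSales table, the fact table name FactStarbucks, the reduced column set and the sample rows are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dimlocation (Location_id INTEGER PRIMARY KEY, City TEXT);
CREATE TABLE DimDate (Date_id INTEGER PRIMARY KEY, Year INTEGER, Month INTEGER);
CREATE TABLE StagingSales (City TEXT, Year INTEGER, Month INTEGER,
                           Revenue REAL, Visitor_count INTEGER);
CREATE TABLE FactStarbucks (Location_id INTEGER, Date_id INTEGER,
                            Revenue REAL, Visitor_count INTEGER);
""")
# Made-up staging rows with placeholder city names.
conn.executemany("INSERT INTO StagingSales VALUES (?, ?, ?, ?, ?)", [
    ("City A", 2017, 1, 1500.0, 300),
    ("City A", 2017, 2, 1700.0, 320),
    ("City B", 2017, 1, 1200.0, 250),
])


def load_dimensions(conn):
    """Populate each dimension with the distinct values found in staging."""
    conn.execute(
        "INSERT INTO Dimlocation (City) SELECT DISTINCT City FROM StagingSales")
    conn.execute(
        "INSERT INTO DimDate (Year, Month) "
        "SELECT DISTINCT Year, Month FROM StagingSales")


def load_fact(conn):
    """Look up the surrogate keys and copy the measures into the fact table."""
    conn.execute("""
        INSERT INTO FactStarbucks (Location_id, Date_id, Revenue, Visitor_count)
        SELECT l.Location_id, d.Date_id, s.Revenue, s.Visitor_count
        FROM StagingSales AS s
        JOIN Dimlocation AS l ON l.City = s.City
        JOIN DimDate AS d ON d.Year = s.Year AND d.Month = s.Month
    """)


load_dimensions(conn)  # dimensions first, so the lookups can find their keys
load_fact(conn)
fact_rows = conn.execute("SELECT COUNT(*) FROM FactStarbucks").fetchone()[0]
distinct_locations = conn.execute(
    "SELECT COUNT(DISTINCT Location_id) FROM FactStarbucks").fetchone()[0]
print(fact_rows, distinct_locations)
```

The order matters: if the fact load ran before the dimension load, the joins would find no keys and no fact rows would be written.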
Deploying the cube: The cube is deployed with the help of SSAS, which is used to analyse the
data on the basis of the measures in the fact table and the textual attributes in the dimension
tables. Once the cube is deployed, the business queries can be run against the database.
[ETL flow diagram: External Source → Staging Database → Dimension table → Fact table]
Business Query
Case Study 1:
Which city has the maximum revenue?
This query uses store_sales_report and Store_details. The graph below shows how much
revenue is generated by each city.
Analysis:
From this bar chart, we can easily see that the maximum revenue is generated in New
York City, followed by Chicago.
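A query of this kind against the star schema might look like the sketch below. It is written in Python with sqlite3 for portability; the fact table name FactStarbucks is an assumption, and the rows use placeholder city names and made-up figures, not the project's data.

```python
import sqlite3

# Total revenue per city, highest first. Column names follow the design
# section; FactStarbucks is an assumed name for the fact table.
REVENUE_BY_CITY = """
SELECT l.City, SUM(f.Revenue) AS Total_revenue
FROM FactStarbucks AS f
JOIN Dimlocation AS l ON l.Location_id = f.Location_id
GROUP BY l.City
ORDER BY Total_revenue DESC
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dimlocation (Location_id INTEGER PRIMARY KEY, City TEXT);
CREATE TABLE FactStarbucks (Location_id INTEGER, Revenue REAL);
-- Made-up rows, only there to make the query runnable.
INSERT INTO Dimlocation VALUES (1, 'City A'), (2, 'City B'), (3, 'City C');
INSERT INTO FactStarbucks VALUES (1, 100.0), (1, 50.0), (2, 120.0), (3, 30.0);
""")

revenue_by_city = conn.execute(REVENUE_BY_CITY).fetchall()
for city, total in revenue_by_city:
    print(city, total)
```

The first row of the result is the city with the maximum revenue, which is the value the bar chart highlights.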
Case Study 2:
Which city has the maximum number of visitors and beverage sales?
This query uses Store_details and store_name. The bubble chart below shows the cities
grouped by visitors and beverages.
Analysis:
From this bubble chart representation, we can easily analyse the city
Case Study 3:
Which city has a high rating on the basis of food count and beverage count?
This data set contains yelp_Rating and store_sales_report. The bar graph below represents
this scenario:
Analysis:
The analysis clearly shows that New York has a high rating on the basis of food count and
beverage count.
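The text does not spell out how rating and counts are combined, so the sketch below shows only one plausible reading: the average Yelp rating per city next to the total food and beverage counts. It uses Python with sqlite3; the fact table name FactStarbucks, the column name Yelp_rating and the placeholder rows are assumptions.

```python
import sqlite3

# Average store rating per city alongside total food and beverage counts.
# Note that AVG runs over fact rows, so a store with more fact rows weighs more.
RATING_BY_CITY = """
SELECT l.City,
       AVG(s.Yelp_rating)    AS Avg_rating,
       SUM(f.Food_count)     AS Total_food,
       SUM(f.Beverage_count) AS Total_beverage
FROM FactStarbucks AS f
JOIN DimStoreDetails AS s ON s.Store_id = f.Store_id
JOIN Dimlocation AS l ON l.Location_id = f.Location_id
GROUP BY l.City
ORDER BY Avg_rating DESC
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimStoreDetails (Store_id INTEGER PRIMARY KEY, Yelp_rating REAL);
CREATE TABLE Dimlocation (Location_id INTEGER PRIMARY KEY, City TEXT);
CREATE TABLE FactStarbucks (Store_id INTEGER, Location_id INTEGER,
                            Food_count INTEGER, Beverage_count INTEGER);
-- Made-up rows with placeholder city names, not the project's data.
INSERT INTO DimStoreDetails VALUES (1, 4.0), (2, 3.0), (3, 5.0);
INSERT INTO Dimlocation VALUES (1, 'City A'), (2, 'City B');
INSERT INTO FactStarbucks VALUES (1, 1, 10, 20), (3, 1, 5, 15), (2, 2, 8, 12);
""")

rating_by_city = conn.execute(RATING_BY_CITY).fetchall()
for row in rating_by_city:
    print(row)
```

Joining through both dimensions keeps the rating (a store attribute) and the city (a location attribute) out of the fact table, which holds only keys and measures.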
Conclusion:
A data warehouse makes it easy to handle and analyse large amounts of data. Using the data
warehouse, we can easily find in which month Starbucks sales are high or low, which city has
the maximum revenue or rating, and much more. In the end, the analysis shows that New York
always gets a good rating and always maintains high revenue.