(Praxis Business School)
Data Mining Assignment
A report on
Sales forecasting for Walmart
Submitted to
Prof. Suman K Mazumdar
In partial fulfillment of the requirements of the subject
(iSAS)
On (26th September, 2015)
By
Anurag Mukherjee
Sales forecasting for Walmart
Table of Contents
Sl No  Topic
1      Cover Page
2      Title Page
3      Executive Summary
4      Background
5      Business Problem
6      Data Overview
7      Exploratory Analysis
8      Examining the final features dataset
9      Merging of train and features for the final data set creation
10     Model Building
Executive Summary :
Walmart is the world's largest company by revenue, according to the Fortune Global 500 list in
2014, as well as the biggest private employer in the world, with 2.2 million employees.
Walmart is a family-owned business, as the company is controlled by the Walton family. Sam
Walton's heirs own over 50 percent of Walmart through their holding company, Walton
Enterprises, and through their individual holdings. It is also one of the world's most valuable
companies by market value, and is the largest grocery retailer in the U.S. In 2009, it
generated 51 percent of its US$258 billion (equivalent to $284 billion in 2015) sales in the
U.S. from its grocery business.
We are provided with datasets containing weekly sales per store and per department.
We forecast sales for Walmart to help the company take better data-driven decisions
for inventory planning and channel optimization.
Background:
Wal-Mart Stores, Inc. is an American multinational retail corporation that operates a chain
of discount department stores and warehouse stores. Headquartered in Bentonville,
Arkansas, United States, the company was founded by Sam Walton in 1962 and incorporated on
October 31, 1969. It has over 11,000 stores in 28 countries, under a total of 65 banners. The
company operates under the Walmart name in the United States and Canada. It operates as Walmart
de México y Centroamérica in Mexico, as Asda in the United Kingdom, as Seiyu in Japan, and as Best
Price in India. It has wholly owned operations in Argentina, Brazil, and Canada. It also owns and
operates the Sam's Club retail warehouses.
Business Problem:
We are provided with historical sales data for 45 Walmart stores located in different regions.
Each store contains many departments, and the aim is to project the sales for each department
in each store. To add to the challenge, selected holiday markdown events are included in the
dataset. These markdowns are known to affect sales.
Data Overview :
train.csv
This is the historical training data, which covers 2010-02-05 to 2012-11-01. Within this file you will
find the following fields:
• Store - the store number
• Dept - the department number
• Date - the week
• Weekly_Sales - sales for the given department in the given store
• IsHoliday - whether the week is a special holiday week
features.csv
This file contains additional data related to the store, department, and regional activity for the given
dates. It contains the following fields:
• Store - the store number
• Date - the week
• Temperature - average temperature in the region
• Fuel_Price - cost of fuel in the region
• MarkDown1-5 - anonymized data related to promotional markdowns that Walmart is running.
MarkDown data is only available after Nov 2011, and is not available for all stores all the time. Any
missing value is marked with an NA.
• CPI - the consumer price index
• Unemployment - the unemployment rate
• IsHoliday - whether the week is a special holiday week
Exploratory Analysis :
1. train.csv
1.1 Importing the raw dataset :
proc import out=walmart_train datafile='/folders/myshortcuts/myfolder/train_walmart.csv'
dbms=csv replace;
getnames=yes;
run;
1.2 Checking the contents of train.csv :
proc contents data=walmart_train;
run;
Alphabetic List of Variables and Attributes
#  Variable      Type  Len  Format     Informat
3  Date          Num   8    DDMMYY10.  DDMMYY10.
2  Dept          Num   8    BEST12.    BEST32.
6  IsHoliday     Char  5    $5.        $5.
4  Month_Year    Num   8    DATETIME.  ANYDTDTM40.
1  Store         Num   8    BEST12.    BEST32.
5  Weekly_Sales  Num   8    BEST12.    BEST32.
1.3 Checking the basic statistical measures :
proc means data=walmart_train;
var Weekly_Sales;
run;
Analysis Variable : Weekly_Sales
N       Mean      Std Dev   Minimum   Maximum
421570  15981.26  22711.18  -4988.94  693099.36
Negative sales indicate returns.
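To see how many records the note on returns applies to, a query along these lines could count the negative rows. This is a minimal sketch and not part of the original analysis; the column aliases n_negative and min_sales are hypothetical names.

```sas
/* Sketch: count department-week records with negative Weekly_Sales (returns). */
proc sql;
  select count(*)          as n_negative,
         min(Weekly_Sales) as min_sales
  from walmart_train
  where Weekly_Sales < 0;
quit;
```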
1.4 Plot of Weekly_Sales vs Date :
1.5 Plotting Sales Year Wise :
proc sql;
create table walmart_train_data as
select Date, sum(Weekly_Sales) as Sales
from walmart_train
group by Date;
quit;
2010 Sales Report :
data Sales_2010;
set walmart_train_data(keep=Sales Date where=(Date between '05Feb2010'd and '31Dec2010'd));
run;
*plotting 2010 Sales by Date;
ods graphics / reset imagemap;
proc sgplot data=WORK.SALES_2010;
vbar Date / response=Sales stat=Mean name='Bar';
yaxis grid;
run;
ods graphics / reset;
proc print data=Sales_2010;
run;
Obs  Date        Sales
1    05/02/2010  49750740.50
2    12/02/2010  48336677.63
3    19/02/2010  48276993.78
4    26/02/2010  43968571.13
5    05/03/2010  46871470.30
6    12/03/2010  45925396.51
7    19/03/2010  44988974.64
8    26/03/2010  44133961.05
(First 8 sales figures for 2010, shown for convenience)
[Figure: bar chart of Sales (Mean) by Date, one bar per week from 05/02/2010 to 31/12/2010; y-axis from 0 to 80,000,000]
2011 Sales Report :
data Sales_2011;
set walmart_train_data(keep=Sales Date where=(Date between '07Jan2011'd and '30Dec2011'd));
run;
*plotting 2011 Sales by Date;
ods graphics / reset imagemap;
proc sgplot data=WORK.SALES_2011;
vbar Date / response=Sales stat=Mean name='Bar';
yaxis grid;
run;
ods graphics / reset;
proc print data=Sales_2011;
run;
[Figure: bar chart of Sales (Mean) by Date, one bar per week from 07/01/2011 to 30/12/2011; y-axis from 0 to 80,000,000]
Sales in tabular form - 2011
Obs  Date        Sales
1    07/01/2011  42775787.77
2    14/01/2011  40673678.04
3    21/01/2011  40654648.03
4    28/01/2011  39599852.99
5    04/02/2011  46153111.12
6    11/02/2011  47336192.79
7    18/02/2011  48716164.12
(First 7 sales figures for 2011, shown for convenience)
2012 Sales Report :
data Sales_2012;
set walmart_train_data(keep=Sales Date where=(Date between '06Jan2012'd and '26Oct2012'd));
run;
*plotting 2012 Sales by Date;
ods graphics / reset imagemap;
proc sgplot data=WORK.SALES_2012;
vbar Date / response=Sales stat=Mean name='Bar';
yaxis grid;
run;
ods graphics / reset;
proc print data=Sales_2012;
run;
[Figure: bar chart of Sales (Mean) by Date, one bar per week from 06/01/2012 to 26/10/2012; y-axis from 0 to 50,000,000]
Sales in tabular form - 2012
Obs  Date        Sales
1    06/01/2012  44955421.95
2    13/01/2012  42023078.48
3    20/01/2012  42080996.56
4    27/01/2012  39834974.67
5    03/02/2012  46085608.09
6    10/02/2012  50009407.92
7    17/02/2012  50197056.96
8    24/02/2012  45771506.57
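The three yearly reports above repeat the same subset, plot and print steps with hard-coded date ranges. As a hedged alternative, the year could be derived from Date and the chart drawn once per year with a BY statement. The dataset name sales_by_year and the variable Year are assumptions introduced for illustration only.

```sas
/* Sketch: one pass over the weekly totals instead of one data step per year. */
data sales_by_year;
  set walmart_train_data;
  Year = year(Date);          /* calendar year of the week's date */
run;

/* BY-group processing needs the data sorted by the BY variable */
proc sort data=sales_by_year;
  by Year Date;
run;

proc sgplot data=sales_by_year;
  by Year;                    /* one bar chart per year */
  vbar Date / response=Sales stat=mean;
  yaxis grid;
run;
```

Each date occurs once in the aggregated table, so stat=mean shows the weekly total itself, as in the charts above.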
1.6 Outlier Treatment for train.csv :
The data is a time series and shows seasonality: there is a sales spike during December.
This can be explained further by the markdowns.
Markdowns 1, 2, 4 and 5 do not seem to be as effective as Markdown 3.
As the spike in sales would affect the entire model, the excess sales have been
distributed across all the records.
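/* weeks whose total sales exceed 50000000, with their excess over 46243899.58 */
data wal;
set walmart_train_data;
where Sales > 50000000;
sales_diff=Sales-46243899.58;
run;
/* total excess across those weeks */
proc sql;
create table mapper as
select sum(sales_diff) as total_excess from
wal;
quit;
*total excess sales from weeks having > 50000000 = 181638262.18;
/* Note: Weekly_Sales in walmart_train is per store and department (maximum 693099.36
   in the proc means output), so the cap in the next step never fires on these rows;
   each record only receives the constant 181638262.18/421570. */
data walmart_final;
set walmart_train;
if Weekly_Sales > 50000000 then Weekly_Sales=46243899.58;
Weekly_Sales_new=Weekly_Sales+(181638262.18/421570);
run;
proc univariate data=walmart_final;
var Weekly_Sales;
run;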
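The constants 181638262.18 and 421570 are typed in by hand in the step above. A possible way to derive them from the data is sketched here; it is an illustration under the same thresholds, not the report's own code. The macro variable names excess_total and n_rows and the dataset name walmart_final_sketch are hypothetical.

```sas
/* Sketch: compute the excess total and the record count instead of typing them in. */
proc sql noprint;
  /* format=best32. avoids the default BEST8. rounding that PROC SQL applies
     when a numeric value is stored in a macro variable */
  select sum(Sales - 46243899.58) format=best32.
    into :excess_total trimmed
  from walmart_train_data
  where Sales > 50000000;

  select count(*)
    into :n_rows trimmed
  from walmart_train;
quit;

%put NOTE: excess_total=&excess_total n_rows=&n_rows;

data walmart_final_sketch;
  set walmart_train;
  /* same additive adjustment as above; with the figures quoted in the text,
     181638262.18 / 421570 is about 430.86 per record */
  Weekly_Sales_new = Weekly_Sales + (&excess_total / &n_rows);
run;
```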
2.features.csv
2.1 Importing raw data set :
proc import out=walmart_features datafile='/folders/myshortcuts/myfolder/features.csv'
dbms=csv replace;
getnames=yes;
guessingrows=200;
run;
2.2 Checking the contents of features.csv :
Alphabetic List of Variables and Attributes
#   Variable      Type  Len  Format     Informat
4   CPI           Char  11   $11.       $11.
2   Date          Num   8    YYMMDD10.  YYMMDD10.
6   Fuel_Price    Num   8    BEST12.    BEST32.
13  IsHoliday     Char  5    $5.        $5.
7   MarkDown1     Char  8    $8.        $8.
8   MarkDown2     Char  8    $8.        $8.
9   MarkDown3     Char  8    $8.        $8.
10  MarkDown4     Char  8    $8.        $8.
11  MarkDown5     Char  8    $8.        $8.
1   Store         Num   8    BEST12.    BEST32.
5   Temperature   Num   8    BEST12.    BEST32.
12  Unemployment  Char  5    $5.        $5.
14  VAR14         Char  1    $1.        $1.
3   Weekly_Sales  Char  8    $8.        $8.
2.3 Checking the basic statistical measures of features.csv :
proc means data=walmart_features;
run;
2.4 Outlier Treatment :
/* The MarkDown and Weekly_Sales columns are character, so the missing-value
   markers are replaced with the character value "0". */
data walmart_f;
set walmart_features;
format Date DDMMYY10.;
if MarkDown1="NA" or MarkDown1="#N/A" then MarkDown1="0";
if MarkDown2="NA" or MarkDown2="#N/A" then MarkDown2="0";
if MarkDown3="NA" or MarkDown3="#N/A" then MarkDown3="0";
if MarkDown4="NA" or MarkDown4="#N/A" then MarkDown4="0";
if MarkDown5="NA" or MarkDown5="#N/A" then MarkDown5="0";
if IsHoliday="TRUE" then IsHoliday_Yes=1;
else IsHoliday_Yes=0;
if Weekly_Sales="#N/A" then Weekly_Sales="0";
run;
/* multiplying by 1 converts the character columns to numeric */
data walmart_features_1(keep=Store Date Weekly_Sales_n Fuel_Price IsHoliday_Yes
MarkDown1_n MarkDown2_n MarkDown3_n MarkDown4_n MarkDown5_n Temperature
Unemployment CPI);
set walmart_f;
MarkDown1_n=MarkDown1*1;
MarkDown2_n=MarkDown2*1;
MarkDown3_n=MarkDown3*1;
MarkDown4_n=MarkDown4*1;
MarkDown5_n=MarkDown5*1;
Weekly_Sales_n=Weekly_Sales*1;
run;
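Multiplying a character variable by 1 relies on SAS's automatic character-to-numeric conversion, which writes conversion notes to the log. A more explicit variant is sketched here, assuming the MarkDown columns hold either a number or one of the strings "NA" and "#N/A". It uses the INPUT function, whose ?? modifier suppresses invalid-data messages and returns a missing value, and COALESCE to turn that missing value into 0. The dataset name walmart_features_sketch and the array names md_c and md_n are hypothetical.

```sas
/* Sketch: explicit character-to-numeric conversion of the five MarkDown columns. */
data walmart_features_sketch;
  set walmart_features;
  array md_c{5} $ MarkDown1-MarkDown5;   /* existing character columns */
  array md_n{5} MarkDown1_n MarkDown2_n MarkDown3_n MarkDown4_n MarkDown5_n;
  do i = 1 to 5;
    /* "NA" and "#N/A" cannot be read as numbers, so INPUT returns missing
       and COALESCE substitutes 0 */
    md_n{i} = coalesce(input(md_c{i}, ?? best32.), 0);
  end;
  /* a comparison evaluates to 1 when true and 0 when false */
  IsHoliday_Yes = (upcase(IsHoliday) = "TRUE");
  drop i;
run;
```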
Examining the final features dataset :
proc contents data=walmart_features_1;
run;
Alphabetic List of Variables and Attributes
#   Variable        Type  Len  Format     Informat
3   CPI             Char  11   $11.       $11.
2   Date            Num   8    DDMMYY10.  YYMMDD10.
5   Fuel_Price      Num   8    BEST12.    BEST32.
7   IsHoliday_Yes   Num   8
8   MarkDown1_n     Num   8
9   MarkDown2_n     Num   8
10  MarkDown3_n     Num   8
11  MarkDown4_n     Num   8
12  MarkDown5_n     Num   8
1   Store           Num   8    BEST12.    BEST32.
4   Temperature     Num   8    BEST12.    BEST32.
6   Unemployment    Char  5    $5.        $5.
13  Weekly_Sales_n  Num   8
Merging of train and features for the final data set creation :
proc sql;
create table walmart_final_1 as
select
a.*,b.CPI,b.Temperature,b.Fuel_Price,b.MarkDown1_n,b.MarkDown2_n,b.MarkDown3_n,
b.MarkDown4_n,b.MarkDown5_n,b.Unemployment,b.IsHoliday_Yes
from walmart_final as a left join walmart_features_1 as b on
a.Date=b.Date and a.Store=b.Store;
quit;
data walmart_final_2 (drop=IsHoliday Month_Year Unemployment Weekly_Sales);
set walmart_final_1;
run;
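Because the merge is a left join, any train row without a matching Store and Date in the features table is kept, with missing feature values. A quick check, offered as a sketch, counts such rows. It assumes Temperature is never missing in features.csv itself, so that a missing value after the merge signals an unmatched key. The alias n_unmatched is hypothetical.

```sas
/* Sketch: count train rows that found no matching Store/Date in the features table. */
proc sql;
  select count(*) as n_unmatched
  from walmart_final_1
  where Temperature is missing;
quit;
```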
proc contents data=walmart_final_2;
run;
Alphabetic List of Variables and Attributes
#   Variable          Type  Len  Format     Informat
5   CPI               Char  11   $11.       $11.
3   Date              Num   8    DDMMYY10.  DDMMYY10.
2   Dept              Num   8    BEST12.    BEST32.
7   Fuel_Price        Num   8    BEST12.    BEST32.
13  IsHoliday_Yes     Num   8
8   MarkDown1_n       Num   8
9   MarkDown2_n       Num   8
10  MarkDown3_n       Num   8
11  MarkDown4_n       Num   8
12  MarkDown5_n       Num   8
1   Store             Num   8    BEST12.    BEST32.
6   Temperature       Num   8    BEST12.    BEST32.
4   Weekly_Sales_new  Num   8
Printing the final dataset after merge :
proc print data=walmart_final_2(obs=10);
run;
Obs  Store  Dept  Date        Weekly_Sales_new  CPI          Temperature  Fuel_Price  MarkDown1_n  MarkDown2_n  MarkDown3_n  MarkDown4_n  MarkDown5_n  IsHoliday_Yes
1    1      45    05/02/2010  468.30            211.0963582  42.31        2.572       0            0            0            0            0            0
2    1      5     05/02/2010  32660.24          211.0963582  42.31        2.572       0            0            0            0            0            0
3    1      9     05/02/2010  17361.85          211.0963582  42.31        2.572       0            0            0            0            0            0
4    1      29    05/02/2010  7455.81           211.0963582  42.31        2.572       0            0            0            0            0            0
5    1      92    05/02/2010  140315.80         211.0963582  42.31        2.572       0            0            0            0            0            0
6    1      42    05/02/2010  8797.57           211.0963582  42.31        2.572       0            0            0            0            0            0
7    1      80    05/02/2010  16125.03          211.0963582  42.31        2.572       0            0            0            0            0            0
8    1      19    05/02/2010  2377.91           211.0963582  42.31        2.572       0            0            0            0            0            0
9    1      32    05/02/2010  12306.70          211.0963582  42.31        2.572       0            0            0            0            0            0
10   1      40    05/02/2010  67211.49          211.0963582  42.31        2.572       0            0            0            0            0            0
Model Building :
proc reg data=walmart_final_2;
model Weekly_Sales_new= Fuel_Price MarkDown3_n Temperature ;
run;
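The proc contents listing for walmart_final_2 shows CPI stored as a character variable, so it cannot enter proc reg as a regressor as is. Should CPI be wanted in the model, one possible approach is to convert it first. This is an assumption for illustration, not something the report does. The names walmart_final_3 and CPI_n are hypothetical.

```sas
/* Sketch: convert CPI to numeric so that it could be used as a regressor. */
data walmart_final_3;
  set walmart_final_2;
  /* ?? returns a missing value, without log messages, for unreadable entries */
  CPI_n = input(CPI, ?? best32.);
run;

proc reg data=walmart_final_3;
  model Weekly_Sales_new = Fuel_Price MarkDown3_n Temperature CPI_n;
run;
quit;
```

Rows where CPI_n is missing are left out of the fit, since proc reg drops observations with a missing regressor.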

More Related Content

What's hot

Market Segmentation Analysis Example PowerPoint Presentation Slides
Market Segmentation Analysis Example PowerPoint Presentation SlidesMarket Segmentation Analysis Example PowerPoint Presentation Slides
Market Segmentation Analysis Example PowerPoint Presentation Slides
SlideTeam
 
Product recommendation for Santander Bank customers
Product recommendation for Santander Bank customersProduct recommendation for Santander Bank customers
Product recommendation for Santander Bank customers
Sumit Saini
 
Walmart Presentation
Walmart Presentation Walmart Presentation
Walmart Presentation
fscjstdnt
 

What's hot (20)

Myntra strategy
Myntra strategyMyntra strategy
Myntra strategy
 
Target Corporation Market Analysis
Target Corporation Market AnalysisTarget Corporation Market Analysis
Target Corporation Market Analysis
 
Amazon Case Study
Amazon Case Study Amazon Case Study
Amazon Case Study
 
7 eleven
7 eleven7 eleven
7 eleven
 
Market Segmentation Analysis Example PowerPoint Presentation Slides
Market Segmentation Analysis Example PowerPoint Presentation SlidesMarket Segmentation Analysis Example PowerPoint Presentation Slides
Market Segmentation Analysis Example PowerPoint Presentation Slides
 
Case analysis walmart case group i
Case analysis walmart case group iCase analysis walmart case group i
Case analysis walmart case group i
 
Case study on amazon.com
Case study on amazon.comCase study on amazon.com
Case study on amazon.com
 
KPMG Data Analysis Project Keynote
KPMG Data Analysis Project KeynoteKPMG Data Analysis Project Keynote
KPMG Data Analysis Project Keynote
 
Amazon vs Wal-Mart
Amazon vs Wal-MartAmazon vs Wal-Mart
Amazon vs Wal-Mart
 
Amazon SWOT Analysis 2018
Amazon SWOT Analysis 2018Amazon SWOT Analysis 2018
Amazon SWOT Analysis 2018
 
Product recommendation for Santander Bank customers
Product recommendation for Santander Bank customersProduct recommendation for Santander Bank customers
Product recommendation for Santander Bank customers
 
E-Business analysis of Chaldal.com
E-Business analysis of Chaldal.com E-Business analysis of Chaldal.com
E-Business analysis of Chaldal.com
 
Telenor Case Study
Telenor Case StudyTelenor Case Study
Telenor Case Study
 
Aldi's Expansion Strategy
Aldi's Expansion StrategyAldi's Expansion Strategy
Aldi's Expansion Strategy
 
IT infrastructure at big bazaar
IT infrastructure at big bazaarIT infrastructure at big bazaar
IT infrastructure at big bazaar
 
Presentation on Walmart
Presentation on WalmartPresentation on Walmart
Presentation on Walmart
 
Ecommerce amazon.com
Ecommerce amazon.comEcommerce amazon.com
Ecommerce amazon.com
 
Walmart Presentation
Walmart Presentation Walmart Presentation
Walmart Presentation
 
Seven Eleven Store - Case study - Answers
Seven Eleven Store - Case study - AnswersSeven Eleven Store - Case study - Answers
Seven Eleven Store - Case study - Answers
 
Case no.6 (group)
Case no.6 (group)Case no.6 (group)
Case no.6 (group)
 

Similar to Walmart sales forecast

Week 1Instructions1. The purpose of this template is to gather dat.docx
Week 1Instructions1. The purpose of this template is to gather dat.docxWeek 1Instructions1. The purpose of this template is to gather dat.docx
Week 1Instructions1. The purpose of this template is to gather dat.docx
lillie234567
 
Wal Mart & Stockholder Analysis 1
Wal Mart & Stockholder Analysis 1Wal Mart & Stockholder Analysis 1
Wal Mart & Stockholder Analysis 1
uoni
 
Wal Mart & Stockholder Analysis 1
Wal Mart & Stockholder Analysis 1Wal Mart & Stockholder Analysis 1
Wal Mart & Stockholder Analysis 1
uoni
 
ChapterTool KitChapter 1212912Corporate Valuation and Financial .docx
ChapterTool KitChapter 1212912Corporate Valuation and Financial .docxChapterTool KitChapter 1212912Corporate Valuation and Financial .docx
ChapterTool KitChapter 1212912Corporate Valuation and Financial .docx
mccormicknadine86
 
Make-Up in Germany
Make-Up in GermanyMake-Up in Germany
Make-Up in Germany
ReportLinker.com
 

Similar to Walmart sales forecast (20)

IRJET- Retail Chain Sales Analysis and Forecasting
IRJET-  	  Retail Chain Sales Analysis and ForecastingIRJET-  	  Retail Chain Sales Analysis and Forecasting
IRJET- Retail Chain Sales Analysis and Forecasting
 
ggg
 ggg ggg
ggg
 
Week 1Instructions1. The purpose of this template is to gather dat.docx
Week 1Instructions1. The purpose of this template is to gather dat.docxWeek 1Instructions1. The purpose of this template is to gather dat.docx
Week 1Instructions1. The purpose of this template is to gather dat.docx
 
Wal mart strategic audit-- final edit
Wal mart strategic audit-- final editWal mart strategic audit-- final edit
Wal mart strategic audit-- final edit
 
Apple & Dell - Financial Analysis 2008 - 2011
Apple & Dell - Financial Analysis 2008 - 2011Apple & Dell - Financial Analysis 2008 - 2011
Apple & Dell - Financial Analysis 2008 - 2011
 
Strategic Analysis of Wal-Mart
Strategic  Analysis of Wal-MartStrategic  Analysis of Wal-Mart
Strategic Analysis of Wal-Mart
 
Supply Chain Metrics That Matter-A Focus on the Automotive Industry-8 OCT 2013
Supply Chain Metrics That Matter-A Focus on the Automotive Industry-8 OCT 2013Supply Chain Metrics That Matter-A Focus on the Automotive Industry-8 OCT 2013
Supply Chain Metrics That Matter-A Focus on the Automotive Industry-8 OCT 2013
 
Wal Mart & Stockholder Analysis 1
Wal Mart & Stockholder Analysis 1Wal Mart & Stockholder Analysis 1
Wal Mart & Stockholder Analysis 1
 
Wal Mart & Stockholder Analysis 1
Wal Mart & Stockholder Analysis 1Wal Mart & Stockholder Analysis 1
Wal Mart & Stockholder Analysis 1
 
Amazon’s Recession Strategy
Amazon’s Recession StrategyAmazon’s Recession Strategy
Amazon’s Recession Strategy
 
Data warehouse project on retail store
Data warehouse project on retail storeData warehouse project on retail store
Data warehouse project on retail store
 
Sample Report: Digital River Company Profile 2015: Online Payment Services
Sample Report: Digital River Company Profile 2015: Online Payment ServicesSample Report: Digital River Company Profile 2015: Online Payment Services
Sample Report: Digital River Company Profile 2015: Online Payment Services
 
ChapterTool KitChapter 1212912Corporate Valuation and Financial .docx
ChapterTool KitChapter 1212912Corporate Valuation and Financial .docxChapterTool KitChapter 1212912Corporate Valuation and Financial .docx
ChapterTool KitChapter 1212912Corporate Valuation and Financial .docx
 
Walmart SWOT analysis 2017
Walmart SWOT analysis 2017Walmart SWOT analysis 2017
Walmart SWOT analysis 2017
 
Examples of Key Sales Metrics: Examples of Sales Metrics, List of Sales Effec...
Examples of Key Sales Metrics: Examples of Sales Metrics, List of Sales Effec...Examples of Key Sales Metrics: Examples of Sales Metrics, List of Sales Effec...
Examples of Key Sales Metrics: Examples of Sales Metrics, List of Sales Effec...
 
U.S. Saw Market. Analysis And Forecast to 2020
U.S. Saw Market. Analysis And Forecast to 2020U.S. Saw Market. Analysis And Forecast to 2020
U.S. Saw Market. Analysis And Forecast to 2020
 
Make-Up in Germany
Make-Up in GermanyMake-Up in Germany
Make-Up in Germany
 
U23000754 data mining final project
U23000754 data mining final projectU23000754 data mining final project
U23000754 data mining final project
 
Microsoft display team 09
Microsoft display team 09Microsoft display team 09
Microsoft display team 09
 
Supply Chain - Automation - Solutions
Supply Chain - Automation - SolutionsSupply Chain - Automation - Solutions
Supply Chain - Automation - Solutions
 

Recently uploaded

Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
amitlee9823
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
only4webmaster01
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 

Recently uploaded (20)

Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 

Walmart sales forecast

  • 1. 1 (Praxis Business School) Data Mining Assignment A report on Sales forecasting for Walmart Submitted to Prof. Suman K Mazumdar In partial fulfillment of the requirements of the subject (iSAS) On (26th September, 2015) By Anurag Mukherjee
  • 3. 3 Table of Content Sl No Topic Page 1 Cover Page 1 2 Title Page 2 3 Executive Summary 3 4 Background 3 5 Business Problem 3 6 Data Overview 4 7 Exploratory Analysis 5 8 Examining the final features dataset : 19 9 Merging of train and features for the final data set creation 20 10 Model Building 23
  • 4. 4 Executive Summary : Walmart is the world'slargestcompanybyrevenue, according to the Fortune Global 500 list in 2014, as well as the biggestprivate employerin the world with 2.2 million employees. Walmart is a family-owned business, as the company is controlled by the Waltonfamily. Sam Walton's heirs own over 50 percent of Walmart through their holding company, Walton Enterprises, and through their individual holdings. It is also one of the world'smostvaluable companiesbymarketvalue,[10]and is also the largestgrocery retailer in the U.S. In 2009, it generated 51 percent of its US$258 billion (equivalent to $284 billion in 2015) sales in the U.S. from its grocery business. We are provided with datasets containing sales per store,per department on weekly basis.We are are about to forecast sales for Walmart to help the company in taking much better data driven decisions for inventory planning and channel optimization. Background: Wal-Mart Stores,Inc.isan Americanmultinational retailcorporation thatoperatesachain ofdiscountdepartmentstores andwarehousestores.Headquarteredin Bentonville, Arkansas,UnitedStates,the companywasfoundedby SamWaltonin1962 and incorporated on October31, 1969. It hasover11,000 storesin28 countries,underatotal of 65 banners.The companyoperatesunderthe Walmartname inthe UnitedStatesandCanada.It operatesasWalmart de Méxicoy CentroaméricainMexico,as Asdainthe UnitedKingdom, as SeiyuinJapan,andas Best Price inIndia.It has whollyownedoperationsinArgentina,Brazil,andCanada.Italsoownsand operatesthe Sam'sClubretail warehouses. Business Problem: Withhistorical salesdatafor45 Walmartstoreslocatedindifferentregions.Eachstore contains manydepartments,andthe aimisto projectthe salesfor eachdepartmentineachstore.To add to the challenge,selectedholidaymarkdowneventsare includedinthe dataset.These markdownsare knownto affectsales. Data Overview :
  • 5. 5 train.csv
    This is the historical training data, which covers 2010-02-05 to 2012-11-01. Within this file you will find the following fields:
        • Store - the store number
        • Dept - the department number
        • Date - the week
        • Weekly_Sales - sales for the given department in the given store
        • IsHoliday - whether the week is a special holiday week
    features.csv
    This file contains additional data related to the store, department, and regional activity for the given dates. It contains the following fields:
        • Store - the store number
        • Date - the week
        • Temperature - average temperature in the region
        • Fuel_Price - cost of fuel in the region
        • MarkDown1-5 - anonymized data related to promotional markdowns that Walmart is running. MarkDown data is only available after Nov 2011, and is not available for all stores all the time. Any missing value is marked with an NA.
        • CPI - the consumer price index
        • Unemployment - the unemployment rate
        • IsHoliday - whether the week is a special holiday week
  • 6. 6 Exploratory Analysis :
    1.train.csv
    1.1 Importing the raw dataset :
        proc import out=walmart_train
            datafile='/folders/myshortcuts/myfolder/train_walmart.csv'
            dbms=csv replace;
            getnames=yes;
        run;
    1.2 Checking the contents of train.csv :
        proc contents data=walmart_train;
        run;
    Alphabetic List of Variables and Attributes
        #  Variable      Type  Len  Format     Informat
        3  Date          Num   8    DDMMYY10.  DDMMYY10.
        2  Dept          Num   8    BEST12.    BEST32.
        6  IsHoliday     Char  5    $5.        $5.
        4  Month_Year    Num   8    DATETIME.  ANYDTDTM40.
        1  Store         Num   8    BEST12.    BEST32.
        5  Weekly_Sales  Num   8    BEST12.    BEST32.
    1.3 Checking the basic statistical measures
  • 7. 7
        proc means data=walmart_train;
            var Weekly_Sales;
        run;
    Analysis Variable : Weekly_Sales
        N       Mean      Std Dev   Minimum   Maximum
        421570  15981.26  22711.18  -4988.94  693099.36
    Negative sales indicate returns.
    1.4 Plot of Weekly_Sales Vs Date :
  • 8. 8 1.5 Plotting Sales Year Wise :
        proc sql;
            create table walmart_train_data as
            select Date, sum(Weekly_Sales) as Sales
            from walmart_train
            group by Date;
        quit;
    2010 Sales Report :
        data Sales_2010;
            set walmart_train_data(keep=Sales Date
                where=(Date between '05Feb2010'd and '31Dec2010'd));
        run;
        *plotting 2010 Sales by Date;
        ods graphics / reset imagemap;
        proc sgplot data=WORK.SALES_2010;
            vbar Date / response=Sales stat=Mean name='Bar';
            yaxis grid;
        run;
        ods graphics / reset;
        proc print data=Sales_2010;
        run;
  • 9. 9
        Obs  Date        Sales
        1    05/02/2010  49750740.50
        2    12/02/2010  48336677.63
        3    19/02/2010  48276993.78
        4    26/02/2010  43968571.13
        5    05/03/2010  46871470.30
        6    12/03/2010  45925396.51
        7    19/03/2010  44988974.64
        8    26/03/2010  44133961.05
    (First 8 Sales figures for 2010 for convenience)
    [Figure: bar chart of Sales (Mean) by Date, weekly from 05/02/2010 to 31/12/2010; y-axis from 0 to 80000000]
  • 10. 10 2011 Sales Report :
        data Sales_2011;
            set walmart_train_data(keep=Sales Date
                where=(Date between '07Jan2011'd and '30Dec2011'd));
        run;
        *plotting 2011 Sales by Date;
        ods graphics / reset imagemap;
        proc sgplot data=WORK.SALES_2011;
            vbar Date / response=Sales stat=Mean name='Bar';
            yaxis grid;
        run;
        ods graphics / reset;
        proc print data=Sales_2011;
        run;
    [Figure: bar chart of Sales (Mean) by Date, weekly from 07/01/2011 to 30/12/2011; y-axis from 0 to 80000000]
  • 11. 11 Sales in tabular form - 2011
        Obs  Date        Sales
        1    07/01/2011  42775787.77
        2    14/01/2011  40673678.04
        3    21/01/2011  40654648.03
        4    28/01/2011  39599852.99
        5    04/02/2011  46153111.12
        6    11/02/2011  47336192.79
        7    18/02/2011  48716164.12
    (First few Sales figures for 2011, for convenience)
  • 12. 12 2012 Sales Report :
        data Sales_2012;
            set walmart_train_data(keep=Sales Date
                where=(Date between '06Jan2012'd and '26Oct2012'd));
        run;
        *plotting 2012 Sales by Date;
        ods graphics / reset imagemap;
        proc sgplot data=WORK.SALES_2012;
            vbar Date / response=Sales stat=Mean name='Bar';
            yaxis grid;
        run;
        ods graphics / reset;
        proc print data=Sales_2012;
        run;
    [Figure: bar chart of Sales (Mean) by Date, weekly from 06/01/2012 to 26/10/2012; y-axis from 0 to 50000000]
  • 13. 13 Sales in tabular form - 2012
        Obs  Date        Sales
        1    06/01/2012  44955421.95
        2    13/01/2012  42023078.48
        3    20/01/2012  42080996.56
        4    27/01/2012  39834974.67
        5    03/02/2012  46085608.09
        6    10/02/2012  50009407.92
        7    17/02/2012  50197056.96
        8    24/02/2012  45771506.57
  • 14. 14 1.6 Outlier Treatment for train.csv :
    The data, being a time series, show some seasonality. There is a sales spike during the month of December, which can be explained further by the markdowns. Markdowns 1, 2, 4, and 5 do not seem to be as effective as Markdown 3.
  • 15. 15
  • 16. 16 As the spike in sales would affect the entire model, the excess sales have been distributed across all the records.
        data wal;
            set walmart_train_data;
            where Sales > 50000000;
            sales_diff=Sales-46243899.58;
        run;
        proc sql;
            create table mapper as
            select sum(Sales_diff) from wal;
        quit;
        *total excess sales from weeks having > 50000000 = 181638262.18;
        data walmart_final;
            set walmart_train;
            if Weekly_Sales > 50000000 then Weekly_Sales=46243899.58;
            Weekly_Sales_new=Weekly_Sales+(181638262.18/421570);
        run;
        proc univariate data=walmart_final;
            var Weekly_Sales;
        run;
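    The size of the per-record adjustment in the DATA step above can be checked directly. The sketch below only repeats the division using the two figures quoted in the report (181638262.18 and 421570); the variable names are illustrative and not part of the original program. The result is roughly 430.86 per record. Since PROC MEANS reports a per-record maximum of 693099.36, the test Weekly_Sales > 50000000 would not be expected to change any row, so the adjustment appears to reduce to adding this constant.

    ```sas
    /* Illustrative check of the per-record uplift; figures are those quoted in the report. */
    data _null_;
        excess_total = 181638262.18;   /* total excess sales above the cap */
        n_records    = 421570;         /* N reported by PROC MEANS for Weekly_Sales */
        per_record   = excess_total / n_records;
        put per_record= 12.2;          /* writes the value to the SAS log */
    run;
    ```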
  • 17. 17 2.features.csv
    2.1 Importing raw data set :
        proc import out=walmart_features
            datafile='/folders/myshortcuts/myfolder/features.csv'
            dbms=csv replace;
            getnames=yes;
            guessingrows=200;
        run;
    2.2 Checking the contents of features.csv :
    Alphabetic List of Variables and Attributes
        #   Variable      Type  Len  Format     Informat
        4   CPI           Char  11   $11.       $11.
        2   Date          Num   8    YYMMDD10.  YYMMDD10.
        6   Fuel_Price    Num   8    BEST12.    BEST32.
        13  IsHoliday     Char  5    $5.        $5.
        7   MarkDown1     Char  8    $8.        $8.
        8   MarkDown2     Char  8    $8.        $8.
        9   MarkDown3     Char  8    $8.        $8.
        10  MarkDown4     Char  8    $8.        $8.
        11  MarkDown5     Char  8    $8.        $8.
        1   Store         Num   8    BEST12.    BEST32.
        5   Temperature   Num   8    BEST12.    BEST32.
        12  Unemployment  Char  5    $5.        $5.
        14  VAR14         Char  1    $1.        $1.
        3   Weekly_Sales  Char  8    $8.        $8.
  • 18. 18 2.3 Checking the basic statistical measures of features.csv :
        proc means data=walmart_features;
        run;
    2.4 Outlier Treatment :
        data walmart_f;
            set walmart_features;
            format Date DDMMYY10.;
            if MarkDown1="NA" or MarkDown1="#N/A" then MarkDown1=0;
            if MarkDown2="NA" or MarkDown2="#N/A" then MarkDown2=0;
            if MarkDown3="NA" or MarkDown3="#N/A" then MarkDown3=0;
            if MarkDown4="NA" or MarkDown4="#N/A" then MarkDown4=0;
            if MarkDown5="NA" or MarkDown5="#N/A" then MarkDown5=0;
            if IsHoliday="TRUE" then IsHoliday_Yes=1;
            else IsHoliday_Yes=0;
            if Weekly_Sales="#N/A" then Weekly_Sales=0;
        run;
  • 19. 19
        data walmart_features_1(keep=Store Date Weekly_Sales_n Fuel_Price IsHoliday_Yes
                MarkDown1_n MarkDown2_n MarkDown3_n MarkDown4_n MarkDown5_n
                Temperature Unemployment CPI);
            set walmart_f;
            /* multiplying by 1 converts the character columns to numeric */
            MarkDown1_n=MarkDown1*1;
            MarkDown2_n=MarkDown2*1;
            MarkDown3_n=MarkDown3*1;
            MarkDown4_n=MarkDown4*1;
            MarkDown5_n=MarkDown5*1;
            Weekly_Sales_n=Weekly_Sales*1;
        run;
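    The two steps above (replacing NA markers with 0, then multiplying by 1) rely on SAS's implicit character-to-numeric conversion. As a minimal sketch of an alternative, assuming the same NA and #N/A markers, the INPUT function can do the conversion explicitly. The toy dataset, its single numeric value, and the names demo_raw and demo_num are hypothetical and used only for illustration.

    ```sas
    /* Hypothetical toy data standing in for one MarkDown column of features.csv */
    data demo_raw;
        length MarkDown1 $8;
        input MarkDown1 $;
        datalines;
    NA
    #N/A
    5166.35
    ;
    run;

    data demo_num;
        set demo_raw;
        /* INPUT with ?? returns missing for non-numeric text without log errors; */
        /* COALESCE then replaces that missing value with 0.                      */
        MarkDown1_n = coalesce(input(MarkDown1, ?? 32.), 0);
    run;

    proc print data=demo_num;
    run;
    ```

    This keeps the original character column untouched and avoids the conversion notes that implicit arithmetic on character variables writes to the log.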
  • 20. 20 Examining the final features dataset :
        proc contents data=walmart_features_1;
        run;
    Alphabetic List of Variables and Attributes
        #   Variable        Type  Len  Format     Informat
        3   CPI             Char  11   $11.       $11.
        2   Date            Num   8    DDMMYY10.  YYMMDD10.
        5   Fuel_Price      Num   8    BEST12.    BEST32.
        7   IsHoliday_Yes   Num   8
        8   MarkDown1_n     Num   8
        9   MarkDown2_n     Num   8
        10  MarkDown3_n     Num   8
        11  MarkDown4_n     Num   8
        12  MarkDown5_n     Num   8
        1   Store           Num   8    BEST12.    BEST32.
        4   Temperature     Num   8    BEST12.    BEST32.
        6   Unemployment    Char  5    $5.        $5.
        13  Weekly_Sales_n  Num   8
  • 21. 21 Merging of train and features for the final data set creation:
        proc sql;
            create table walmart_final_1 as
            select a.*, b.CPI, b.Temperature, b.Fuel_Price,
                   b.MarkDown1_n, b.MarkDown2_n, b.MarkDown3_n,
                   b.MarkDown4_n, b.MarkDown5_n,
                   b.Unemployment, b.IsHoliday_Yes
            from walmart_final as a
            left join walmart_features_1 as b
              on a.Date=b.Date and a.Store=b.Store;
        quit;
        data walmart_final_2 (drop=IsHoliday Month_Year Unemployment Weekly_Sales);
            set walmart_final_1;
        run;
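    Because the merge above is a left join on Date and Store, any training row without a matching features row keeps its sales figure but gets missing feature values. The sketch below shows one way such rows could be counted. It uses hypothetical toy tables (demo_train, demo_features, and their values) rather than the report's datasets, so it illustrates the check only and is not a step from the original analysis.

    ```sas
    /* Hypothetical toy tables standing in for walmart_final and walmart_features_1 */
    data demo_train;
        input Store Date :ddmmyy10. Weekly_Sales_new;
        format Date ddmmyy10.;
        datalines;
    1 05/02/2010 100
    1 12/02/2010 120
    2 05/02/2010 90
    ;
    run;

    data demo_features;
        input Store Date :ddmmyy10. Fuel_Price;
        format Date ddmmyy10.;
        datalines;
    1 05/02/2010 2.572
    1 12/02/2010 2.548
    ;
    run;

    proc sql;
        /* Count left-table rows that found no Store/Date match in the features table */
        select count(*) as unmatched_rows
        from demo_train as a
        left join demo_features as b
          on a.Date = b.Date and a.Store = b.Store
        where b.Store is null;
    quit;
    ```

    With these toy rows, only the Store 2 record has no counterpart in demo_features, so the query should report one unmatched row.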
  • 22. 22
        proc contents data=walmart_final_2;
        run;
    Alphabetic List of Variables and Attributes
        #   Variable          Type  Len  Format     Informat
        5   CPI               Char  11   $11.       $11.
        3   Date              Num   8    DDMMYY10.  DDMMYY10.
        2   Dept              Num   8    BEST12.    BEST32.
        7   Fuel_Price        Num   8    BEST12.    BEST32.
        13  IsHoliday_Yes     Num   8
        8   MarkDown1_n       Num   8
        9   MarkDown2_n       Num   8
        10  MarkDown3_n       Num   8
        11  MarkDown4_n       Num   8
        12  MarkDown5_n       Num   8
        1   Store             Num   8    BEST12.    BEST32.
        6   Temperature       Num   8    BEST12.    BEST32.
        4   Weekly_Sales_new  Num   8
  • 23. 23 Printing the final dataset after merge :
        proc print data=walmart_final_2(obs=10);
        run;
        Obs  Store  Dept  Date        Weekly_Sales_new  CPI          Temperature  Fuel_Price  MarkDown1_n  MarkDown2_n  MarkDown3_n  MarkDown4_n  MarkDown5_n  IsHoliday_Yes
        1    1      45    05/02/2010  468.30            211.0963582  42.31        2.572       0            0            0            0            0            0
        2    1      5     05/02/2010  32660.24          211.0963582  42.31        2.572       0            0            0            0            0            0
        3    1      9     05/02/2010  17361.85          211.0963582  42.31        2.572       0            0            0            0            0            0
        4    1      29    05/02/2010  7455.81           211.0963582  42.31        2.572       0            0            0            0            0            0
        5    1      92    05/02/2010  140315.80         211.0963582  42.31        2.572       0            0            0            0            0            0
        6    1      42    05/02/2010  8797.57           211.0963582  42.31        2.572       0            0            0            0            0            0
        7    1      80    05/02/2010  16125.03          211.0963582  42.31        2.572       0            0            0            0            0            0
        8    1      19    05/02/2010  2377.91           211.0963582  42.31        2.572       0            0            0            0            0            0
        9    1      32    05/02/2010  12306.70          211.0963582  42.31        2.572       0            0            0            0            0            0
        10   1      40    05/02/2010  67211.49          211.0963582  42.31        2.572       0            0            0            0            0            0
  • 24. 24 Model Building :
        proc reg data=walmart_final_2;
            model Weekly_Sales_new = Fuel_Price MarkDown3_n Temperature;
        run;
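    Written out, the MODEL statement above corresponds to an ordinary least squares regression with an intercept, which PROC REG includes by default. The coefficient symbols and the record index i below are notation introduced here for illustration; the report itself does not state the equation or any fitted values.

    ```latex
    \[
    \text{Weekly\_Sales\_new}_i
      = \beta_0
      + \beta_1\,\text{Fuel\_Price}_i
      + \beta_2\,\text{MarkDown3\_n}_i
      + \beta_3\,\text{Temperature}_i
      + \varepsilon_i
    \]
    ```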