1. Dashboard 1
Big Data Science
Online Retail Store Analysis
Submitted to: Dr. Jongwook Woo
24th Annual Student Symposium, CSULA
Submitted by: Rajeev Singh, Manvi Chandra, Richa Kankarej
California State University, Los Angeles
20. Introduction…
Industry: Online Retail (Canadian company)
In this foundational white paper, we used
Microsoft Azure, Hadoop, Hive, Spark and the Apriori
algorithm to model and analyze big data for
GroupX
Requirements: Analyzed 3 years of historical
data to identify peak sales and the customers
generating high net revenue
Submissions
21. Go Big OR Go Home….
Retail Industry – The market (online and offline) is huge, at approx.
8 trillion USD
Extrapolation – From weather patterns, search/browsing
trends, social networks, industry forecasts, existing
customer records
Predict – Predict in-store sales and product trends, forecast
demand, pinpoint customers, optimize pricing and
promotions
Leverage to Retailers – Fewer stockouts, higher visit-to-buy
ratio, better anticipation of and response to market
shifts.
Retail Oligopoly – A market structure where only a few
firms dominate.
Go Big …
28. Time comparison of Hive and Spark queries
Query                              In Hive      In Spark
Sales in Province                  56.01 sec    16.2 sec
Holiday Trend                      4.21 sec     2.4 sec
Product SubCatg and Sum(Sales)     62.8 sec     8.7 sec
Customer Segment and Sum(Sales)    74.14 sec    5.8 sec
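To make the comparison easier to read, the timings above can be turned into speedup ratios (Hive time divided by Spark time). The following is a minimal Python sketch that uses only the four measurements reported on this slide; the names `timings` and `speedup` are illustrative and not part of the original project.

```python
# Query timings in seconds, copied from the slide: (Hive, Spark).
timings = {
    "Sales in Province": (56.01, 16.2),
    "Holiday Trend": (4.21, 2.4),
    "Product SubCatg and Sum(Sales)": (62.8, 8.7),
    "Customer Segment and Sum(Sales)": (74.14, 5.8),
}


def speedup(hive_sec, spark_sec):
    """Return how many times faster Spark ran than Hive for one query."""
    return hive_sec / spark_sec


# Speedup per query; values above 1.0 mean Spark was faster.
speedups = {query: speedup(h, s) for query, (h, s) in timings.items()}

for query, ratio in speedups.items():
    print(f"{query}: {ratio:.1f}x")
```

On these four queries the ratio ranges from roughly 1.8x (Holiday Trend) to roughly 12.8x (Customer Segment and Sum(Sales)), so Spark's advantage appears largest on the aggregation queries.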
29. Apriori Algorithm / Predictive Analytics
Used for frequent itemset mining over transactional databases
Determines association rules that highlight general trends in
the database
Has applications in domains such as market basket analysis
SAP Predictive Analytics
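The idea behind Apriori can be sketched in a few lines of plain Python. This is a minimal illustration, not the SAP Predictive Analytics workflow used in the project: the function names `apriori` and `association_rules`, the thresholds, and the toy baskets are all hypothetical and chosen only to show the join, prune, and rule-generation steps.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item in the itemset.
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent single items.
    items = {item for b in baskets for item in b}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for rules meeting min_confidence."""
    rules = []
    for itemset, supp in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in combinations(itemset, r):
                a = frozenset(antecedent)
                confidence = supp / frequent[a]
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, confidence))
    return rules


# Hypothetical toy baskets, not GroupX data.
baskets = [
    ["paper", "pens", "binders"],
    ["paper", "pens"],
    ["paper", "binders"],
    ["pens", "binders"],
    ["paper", "pens", "labels"],
]

frequent = apriori(baskets, min_support=0.6)
rules = association_rules(frequent, min_confidence=0.7)
for antecedent, consequent, confidence in rules:
    print(sorted(antecedent), "->", sorted(consequent), f"confidence={confidence:.2f}")
```

In a market basket setting, each rule reads as "customers who buy the antecedent also tend to buy the consequent"; the support and confidence thresholds control how many rules survive.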
33. Conclusion/Learnings
• No correlation between the holiday season and an increase in sales for GroupX
• Spark is faster than Hive
• What steps can GroupX take to increase its YoY (Year on Year)
revenues?
34. References
What is Hive? http://www-01.ibm.com/software/data/infosphere/hadoop/hive/
Introduction to Hadoop in HDInsight: Big-data analysis and processing in the
cloud. https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-introductin