2. Summary
● Introduction
● Clickstream Source Data
● Clickstream Data Challenges
● Clickstream Dimensional Models
● Clickstream Session Fact Table
● Clickstream Page Event Fact Table
● Google Analytics
● Integrating Clickstream into Web Retailer’s Bus Matrix
● Profitability Across Channels Including Web
6. What is the Clickstream and how do you identify it?
The Clickstream is, by definition, every page event recorded
by a company’s web server. A page event is a user click
anywhere on a web page. These clicks are kept in
Clickstream data records.
The Clickstream contains four new dimensions
that are not found in other data sources: Page, Event,
Session and Referral.
Opodo - Spanish site for booking cheap flights, hotels and
package holidays.
7. Clickstream Source Data
● The Clickstream is an evolving collection of data sources,
● Clickstream data is captured in different server log file formats and by different physical servers
simultaneously; these log file formats have optional data components that can help in
identifying visitors, sessions, and the true meaning of behavior,
● Clickstream data comes from both internal and external parties,
● Some examples of external sources: referring partners, Internet Service Providers (ISPs), and the search
specification given to a search engine that then directs the visitor to the website,
● Two main disadvantages of clickstream data: it is stateless and its sessions are anonymous,
● Stateless means that the log shows an isolated page retrieval event but does not provide a
clear tie to other page events elsewhere in the log; without contextual help it is difficult to identify a
complete visitor session,
● Anonymous means that unless visitors agree to reveal their identity in some way,
you cannot be sure who they are.
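To make the idea of a stateless log record with optional components concrete, here is a minimal Python sketch that parses one line into an isolated page event. It assumes the server writes the NCSA Common or Combined Log Format, which the slides do not specify; the sample line, host and URLs are invented for illustration.

```python
import re
from datetime import datetime

# Assumed layout (NCSA Combined Log Format):
# host ident user [time] "request" status bytes "referrer" "user-agent"
# The trailing referrer and user-agent fields are optional components.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def parse_log_line(line):
    """Return one page event as a dict, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    event = match.groupdict()
    event["time"] = datetime.strptime(event["time"], "%d/%b/%Y:%H:%M:%S %z")
    # "-" (or an absent field) marks a missing optional component.
    for key in ("user", "referrer", "agent"):
        if event[key] in (None, "-"):
            event[key] = None
    return event

# Invented sample line, for illustration only.
sample = ('203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] '
          '"GET /products/42 HTTP/1.1" 200 2326 '
          '"https://www.example.com/search?q=shoes" "Mozilla/5.0"')
event = parse_log_line(sample)
```

Note that the parsed event carries a host and a timestamp but no session ID and no visitor identity (`user` is empty unless the visitor authenticated), which is exactly the stateless and anonymous character described above.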
9. Identifying the Visitor Origin
● Your website may be the default page for the visitor’s browser,
● A visitor may be directed to your site by an external referral, such as a search at a portal like Yahoo! or Google,
● Another common source of visitors is a browser bookmark,
● Your site may be reached as a result of a clickthrough - a deliberate click on a text or graphical link
from another site.
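The origins listed above can be roughly told apart from the referrer recorded with the first page event of a session. The following Python sketch is one possible classification, not a definitive rule set: `OWN_HOST` and `SEARCH_HOSTS` are hypothetical values, and a missing referrer cannot distinguish a default page from a bookmark, so both fall into one bucket.

```python
from urllib.parse import urlparse

# Hypothetical values, for illustration only.
OWN_HOST = "www.example-shop.com"
SEARCH_HOSTS = {"www.google.com", "search.yahoo.com"}

def classify_origin(referrer):
    """Map the referrer of a session's first page event to a coarse origin type."""
    if not referrer or referrer == "-":
        # No referrer: default page, bookmark or typed URL.
        # The log alone cannot tell these cases apart.
        return "direct"
    host = urlparse(referrer).netloc.lower()
    if host == OWN_HOST:
        return "internal"
    if host in SEARCH_HOSTS:
        return "search"
    # Any other external site: a link the visitor deliberately clicked.
    return "clickthrough"

origin = classify_origin("https://www.google.com/search?q=cheap+flights")
```

In a dimensional model, the result of such a classification would typically become an attribute of the Referral dimension.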
10. Identifying the Session
Condition for valid analysis: every visitor session (visit) on the website must have its own unique identity tag (session ID),
similar to a supermarket receipt number. If the log does not supply one, a session can be inferred or established by:
● Collating time-contiguous log entries (for example, within one hour) from the same host (IP address),
● Having the web server place a session-level cookie into the visitor’s web browser,
● Using HTTP’s Secure Sockets Layer (SSL), which may include a login action by the visitor and an exchange of
encryption keys,
● Placing a session ID in a hidden field of each page returned to the visitor,
● Establishing a persistent cookie on the visitor’s machine that is not deleted by the
browser when the session ends.
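The first technique in the list, collating time-contiguous log entries from the same host, can be sketched in Python as follows. This is a minimal illustration that reads "one hour" as the maximum allowed gap between two consecutive entries from one host; the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta

def assign_sessions(events, max_gap=timedelta(hours=1)):
    """Assign an inferred session ID to each (host, timestamp) log entry.

    A host's entry starts a new session when the host has not been seen
    before or when more than max_gap has passed since its previous entry.
    Returns a list of (host, timestamp, session_id) in time order.
    """
    last_seen = {}  # host -> timestamp of that host's previous entry
    current = {}    # host -> session ID currently open for that host
    next_id = 0
    result = []
    for host, ts in sorted(events, key=lambda e: e[1]):
        if host not in last_seen or ts - last_seen[host] > max_gap:
            next_id += 1
            current[host] = next_id
        last_seen[host] = ts
        result.append((host, ts, current[host]))
    return result

# Invented entries, for illustration only.
log = [
    ("203.0.113.7", datetime(2023, 10, 10, 9, 0)),
    ("198.51.100.2", datetime(2023, 10, 10, 9, 10)),
    ("203.0.113.7", datetime(2023, 10, 10, 9, 20)),
    ("203.0.113.7", datetime(2023, 10, 10, 11, 0)),
]
sessions = assign_sessions(log)
```

This heuristic is the weakest of the listed options: several visitors behind one proxy share an IP address and would be merged into one session, which is why cookies or an explicit session ID are preferable when available.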
11. Identifying the Visitor
This is a real problem for a site designer, webmaster or manager of the web analytics group, because:
● Web visitors want to be anonymous and may not want to provide personal identification or credit card
information, for example,
● If you demand a visitor’s identity, they may not provide accurate information,
● You can’t be sure which family member is visiting your site - a particular computer may be shared by
several people,
● You can’t assume an individual is always at the same computer - the same person can access the website
from an office computer, a home computer or a mobile device, and a different cookie is placed on
each machine.
14. Clickstream Dimensional Model
Only 4 dimensions are unique to the Clickstream:
Page Dimension
Event Dimension
Session Dimension
Referral Dimension
15. Page Dimension
The Page Dimension describes the page context for a web page (static or dynamic) event.
16. Event Dimension
The Event Dimension describes what happened on a particular page at a particular point in time.
17. Session Dimension
The Session Dimension provides one or more levels of diagnosis for the visitor’s session as a whole. For
example, one type of analysis answers the questions: How many customers did not finish ordering? Where did
they stop?
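The two questions above can be answered from page events once each event carries a session ID. The Python sketch below is an illustration under assumptions: the input is a list of (session ID, page) pairs already in time order within each session, and the page name `order_confirmation` is a hypothetical marker for a finished order.

```python
from collections import Counter

def abandonment_report(page_events, order_page="order_confirmation"):
    """Count, per last page visited, the sessions that never reached order_page.

    page_events: iterable of (session_id, page), time-ordered within a session.
    """
    pages_by_session = {}
    for session_id, page in page_events:
        pages_by_session.setdefault(session_id, []).append(page)
    # For each session without an order, record the page where it stopped.
    return Counter(
        pages[-1]
        for pages in pages_by_session.values()
        if order_page not in pages
    )

# Invented page events, for illustration only.
events = [
    (1, "home"), (1, "product"), (1, "cart"),
    (2, "home"), (2, "cart"), (2, "order_confirmation"),
    (3, "home"), (3, "cart"),
]
stopped_at = abandonment_report(events)
unfinished = sum(stopped_at.values())
```

Here `unfinished` answers "how many did not finish ordering" and the keys of `stopped_at` answer "where did they stop". In a warehouse, the same result would more likely be stored as a session-level diagnosis attribute than recomputed per query.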
25. Conclusion
● How many customers consulted your product information before ordering?
● How many customers looked at your product information and never ordered?
● How profitable is each channel (web sales, telesales and store sales)? Why?
● How profitable are your customer segments? Why?
● Which promotions work well on the web but do not work well in other channels? Why?
● When is your business most profitable? Why?
26. Resources
● Book: The Data Warehouse Toolkit, Third Edition - Ralph Kimball, Margy Ross, Wiley, 2013,
● https://www.safaribooksonline.com/library/view/designing-web-navigation/9780596528102/ch04.html,
● https://www.c-sharpcorner.com/UploadFile/225740/introduction-of-session-in-Asp-Net/Images/Session%20in%20ASP.NET17.PNG,
● http://www.vileda.com/media/wysiwyg/Webshop_AUS/FAQ/wow_r.jpg,
● https://www.jasondavies.com/wordcloud/,
● http://www.worldometers.info/,
● https://www.slideshare.net/itsmenaguda4others/final-ppt-e-commerce-1