The code is available on GitHub.
In this presentation, we will answer these questions:
How many valid users and active users are there on Steam?
How much time do Steam’s users spend on Steam?
How much money do Steam’s users spend on Steam?
What is the price–performance ratio (avg. cost per hour) of Steam's games?
4. Why Analyze Steam’s Data?
● Steam is the dominant platform, serving as the
de facto PC game distribution hub.
● Total games owned: 3 billion!
● Real-time statistics of Steam’s users and games
shed light on a market of billions.
5. Guiding Questions
1) How many valid users and active users are there on Steam?
2) How much time do Steam’s users spend on Steam?
3) How much money do Steam’s users spend on Steam?
4) What is the price–performance ratio (avg. cost per hour) of Steam's games?
6. Sampling Method
● Each Steam user is assigned a unique 64-bit Steam ID in a sequential manner
● The Steam ID space has a length of 462,322,592
● Querying the Steam Web API with a generated Steam ID returns that user’s summary and game list
● Question: how do we sample a given number of users for our analysis?
○ Simple Random Sampling?
○ Stratified Sampling?
○ Stratified Sampling with Random Sampled Strata?
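As a rough illustration of the stratified option, the sketch below splits the ID space into equal-length strata and draws the same number of random offsets from each. The function name `stratified_id_sample` and the seed are hypothetical, and the offsets are positions in the ID space only; mapping them to real 64-bit Steam IDs and querying the Web API is left out.

```python
import random

def stratified_id_sample(N, L, n_h, seed=0):
    """Draw n_h distinct offsets uniformly from each of L near-equal strata of range(N).

    Hypothetical helper for illustration; not taken from the project's code.
    """
    rng = random.Random(seed)
    samples = []
    for h in range(L):
        # Integer boundaries make the strata tile range(N) exactly,
        # even when N is not divisible by L.
        start = h * N // L
        end = (h + 1) * N // L
        samples.append(rng.sample(range(start, end), n_h))
    return samples

# Values from the slides: ID space length, number of strata, samples per stratum.
strata = stratified_id_sample(N=462_322_592, L=1000, n_h=150)
print(len(strata), sum(len(s) for s in strata))
```

Simple random sampling would instead draw all offsets from `range(N)` at once; sampling only a random subset of the strata gives the third design.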
7. Estimating Total Valid Users
● Let Yi = 1 denote that the i-th ID is a valid user (or active user, private user...) and Yi = 0 otherwise
● N is the total number of IDs in the ID space, so the population values are: Y1 ,...,YN
● Let n denote the number of samples, so the sampled values are: y1 ,..., yn
● For Stratified Sampling, let L denote the number of strata and let h denote a specific stratum
● For Stratified Sampling with Random Sampled Strata, let B denote the number of strata randomly chosen from the L strata
Estimators of the population total:
● Simple Random Sampling
● Stratified Sampling
● Stratified Sampling with Random Sampled Strata
In our case, the strata have equal length, and the number of
samples in each stratum (or chosen bucket) is also equal...
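For reference, the standard textbook estimators of a population total for these three designs can be written in the notation above. This is a sketch of the usual forms, not necessarily the exact expressions used in the slides; the symbols N_h, n_h, y_{hi} and the index set S of the B chosen strata are assumptions introduced here.

```latex
\begin{align*}
\hat{\tau}_{\mathrm{SRS}} &= N\,\bar{y} = \frac{N}{n}\sum_{i=1}^{n} y_i \\
\hat{\tau}_{\mathrm{st}}  &= \sum_{h=1}^{L} N_h\,\bar{y}_h,
  \qquad \bar{y}_h = \frac{1}{n_h}\sum_{i=1}^{n_h} y_{hi} \\
\hat{\tau}_{\mathrm{rs}}  &= \frac{L}{B}\sum_{h \in S} N_h\,\bar{y}_h,
  \qquad |S| = B
\end{align*}
```

With equal-length strata (N_h = N/L) and equal sample sizes (n_h = n/L), the stratified estimator reduces to (N/n) times the total count of sampled IDs with y = 1, the same point estimate as the simple-random-sampling form.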
8. Variance!
The variance of the estimator of the population total differs across the three designs:
Simple Random Sampling
Stratified Sampling
Stratified Sampling with Random Sampled Strata
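As a hedged reference, the usual textbook variance estimators for the three designs are sketched below. The sample variances s^2, s_h^2 and the between-strata variance s_u^2 are symbols introduced here, and the slides may have used a different but equivalent form.

```latex
\begin{align*}
\widehat{\mathrm{Var}}(\hat{\tau}_{\mathrm{SRS}})
  &= N^2\left(1-\frac{n}{N}\right)\frac{s^2}{n} \\
\widehat{\mathrm{Var}}(\hat{\tau}_{\mathrm{st}})
  &= \sum_{h=1}^{L} N_h^2\left(1-\frac{n_h}{N_h}\right)\frac{s_h^2}{n_h} \\
\widehat{\mathrm{Var}}(\hat{\tau}_{\mathrm{rs}})
  &= L^2\left(1-\frac{B}{L}\right)\frac{s_u^2}{B}
   + \frac{L}{B}\sum_{h \in S} N_h^2\left(1-\frac{n_h}{N_h}\right)\frac{s_h^2}{n_h},
  \qquad
  s_u^2 = \frac{1}{B-1}\sum_{h \in S}\Bigl(N_h\bar{y}_h - \frac{1}{B}\sum_{k \in S} N_k\bar{y}_k\Bigr)^2
\end{align*}
```

For a 0/1 variable, s_h^2 = n_h p_h (1 - p_h) / (n_h - 1), where p_h is the sampled proportion in stratum h. Sampling only B of the L strata adds the between-strata term, which is why that design typically has the largest variance of the three.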
9. Number of Invalid Users Sampled in Each Stratum
Split the ID space of 462,322,592 (N) into 1000 strata (L) and draw 150 samples (nh) in each stratum
10. Stratified Sampling and Estimation of Valid users
● The insight of Stratified Sampling
○ it may produce a smaller error of estimation than a simple random sample of the same size
○ estimates of population parameters may be desired for subgroups of the population
● For total valid users
○ ID samples: 150,000 = 150 samples (nh) × 1000 strata (L)
○ ID space length (Nh) in each stratum: 462,322
○ Number of invalid users in samples: 17,541
○ Total invalid users: 54,064,004
○ Total valid users: 408,258,588
● When the stratum sample sizes are at least 30, the z value can be used to approximate the t value of the t distribution
○ the 90% confidence interval is: 408,258,588 ± 559,047
○ if the samples came from simple random (uniform) sampling, it would be: 408,258,588 ± 799,945
The same analysis can be performed to estimate active users, private users ...
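The point estimates on this slide can be reproduced from the sample counts with the expansion estimator N × (invalid in sample / n). The sketch below assumes equal-length strata and equal samples per stratum, in which case the stratified estimate equals this pooled form; the function name `estimate_total` is hypothetical, and the confidence-interval half-widths are not recomputed here.

```python
def estimate_total(N, n, hits):
    """Expansion estimator of a population total: N * (hits / n)."""
    return N * hits / n

N = 462_322_592            # length of the Steam ID space
L, n_h = 1000, 150         # strata and samples per stratum
n = L * n_h                # 150,000 sampled IDs in total
invalid_in_sample = 17_541

total_invalid = round(estimate_total(N, n, invalid_in_sample))
total_valid = N - total_invalid
print(total_invalid, total_valid)  # 54064004 408258588
```

Replacing `invalid_in_sample` with the count of active or private users in the sample gives the corresponding totals in the same way.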
11. Computational Experiments
● Distribution of Game Price
● Distribution of Number of Games Purchased by Users
● Gamer Behavior
● Price Performance Ratio (Average Cost/Hour)
12. Distribution of Game Price
48,213 games on Steam
23.14% of them are free
85.23% of them cost less than $10
14. Number of Games Purchased by User
Most users have purchased fewer than 25 games
45.61% of users have not purchased any games
15. Game Behavior - Account Value
25,260 sample users
46% of them never paid for games → $0
99.45% of them have an account value of less than $3,000
16. Game Behavior - Average Play Time
11.09% of all games bought on Steam are never played
90.07% of them are played less than 10 hours
17. Cost/Hour for user: Is playing game expensive?
No. Most users (98%) spend 2.72$/hour on playing games.
Comparison: gaming at 2.72$/h versus two other leisure activities at 7.09$/h and 11.57$/h (ticket only).
Economical, and NOT a one-time consumption.
18. Hour/User for Game: “White Elephant” VS “Worth Buying”
The average playing time (hours) per game owner:
<1: Unsatisfying Games (36%)
85% cost <10$; average price is 7.47$
>=1: Satisfying Games (64%)
96% cost <10$; average price is 11.48$
A higher price cannot guarantee quality, but
satisfying games are, on average, more
expensive.
19. Cost/Hour,User for Game
Case Study: overall gameplay experience
16.489$/h
49.99$
4 hours to finish
Definitely not worth $50
20. Cost/Hour,User for Game
Case Study: overall gameplay experience
0.836$/h
59.99$
Urge to play again
Not a waste, even at full price.
21. Back To The Question:
Which game is more affordable?
Price could be misleading!
22. Cost VS Cost-Benefit - Cost per unit of time
Price:          59.99$     9.99$      4.99$
Cost per hour:  0.485$/h   1.463$/h   1.921$/h
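The slides do not spell out how cost per hour is computed. A minimal sketch, assuming it is simply the price divided by the average hours played per owner, shows how the three prices and rates above relate; the function names are hypothetical, and the implied hours are derived from the slide figures under that assumption rather than reported by the slides.

```python
def cost_per_hour(price, hours_played):
    """Assumed definition: price divided by average hours played per owner."""
    if hours_played <= 0:
        raise ValueError("hours_played must be positive")
    return price / hours_played

def implied_hours(price, rate):
    """Invert the assumed definition: hours that yield the given cost per hour."""
    return price / rate

# (price in $, cost per hour in $/h) for the three games on this slide
slide_22 = [(59.99, 0.485), (9.99, 1.463), (4.99, 1.921)]
for price, rate in slide_22:
    hours = implied_hours(price, rate)
    print(f"{price:6.2f}$ at {rate:.3f}$/h -> about {hours:.1f} hours per owner")
```

Under this assumption the most expensive game is also the one played the longest by far, which is why it ends up with the lowest cost per hour.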
23. Conclusion
● Applied stratified sampling to the Steam ID space and compared it with simple
random sampling and stratified sampling with random sampled strata
● Obtained an unbiased estimate of the total number of valid users and its confidence
interval
● 46% of users only play free games, or have abandoned the account and play no games.
● Playing computer games is economical compared to other leisure activities.
● Satisfying game ---> Higher price
● Higher price-performance ratio ---> Better overall gameplay experience
Let me start with this question:
which of these games is more affordable?
Any ideas?
What is missing to answer it?
The price.
Here is the price (click).
What do you think now?
For now, let's put this question on hold; we will come back to it at the end of the presentation.
In our project, we research these questions, which we think are important for measuring the size and impact of the video game industry.
We found that not many users are willing to try multiple games.
Account value: the total money spent on the games owned by each account.
46% of the sampled accounts (11,577) have a value of $0.