
Applying Machine Learning for Mobile Games by Neil Patrick Del Gallego

Presented during DevCon Summit 2016 #DevFutureForward on November 5-6, 2016 at SMX Convention Center Manila, Mall of Asia Complex, Pasay City.


  1. Applying Machine Learning to Mobile Games. Neil Patrick Del Gallego. An example of supervised learning for predictive game analytics.
  2. Who Am I: Software engineer and game developer for 5 years. Developed Dragon Cubes along with other team members; developed Bubble Cubes; currently developing Casino Slots.
  3. Who Am I: Also currently taking a Master of Science in Computer Science at DLSU.
  4. What is machine learning?
  5. Machine Learning: Field of study that gives the computer the ability to learn and recognize patterns. Training dataset: Image A (R 255, G 255, B 255) = Animal; B (255, 0, 0) = Animal; C (132, 122, 230) = Plant; D (89, 134, 200) = Plant. Training dataset → machine learning algorithm → prediction model(s) → Animal? Plant?
  6. Supervised Learning: The class to predict is explicitly given in the dataset. Training dataset: Image A (255, 255, 255) = Animal; B (255, 0, 0) = Animal; C (132, 122, 230) = Plant; D (89, 134, 200) = Plant.
  7. Supervised Learning: The prediction model attempts to predict the missing class value. Training dataset as above; new/unseen dataset: Image E (0, 0, 255) = ??; F (255, 0, 0) = ??; G (0, 122, 125) = ??; H (98, 7, 2) = ??
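To make the train-then-predict flow concrete, here is a minimal sketch of supervised learning on the slide's RGB toy dataset using WEKA's Java API (WEKA appears later in the deck for the real dataset). The choice of a J48 decision tree as the learner and the class and variable names are illustrative assumptions; with only four training rows the induced tree is trivial, so this only shows the mechanics.

```java
import java.util.ArrayList;
import java.util.Arrays;

import weka.classifiers.trees.J48;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;

public class AnimalOrPlant {
    public static void main(String[] args) throws Exception {
        // Attributes: R, G, B are numeric; Name is the nominal class to predict.
        ArrayList<Attribute> attrs = new ArrayList<>();
        attrs.add(new Attribute("R"));
        attrs.add(new Attribute("G"));
        attrs.add(new Attribute("B"));
        attrs.add(new Attribute("Name", Arrays.asList("Animal", "Plant")));

        // Training dataset from the slide (class encoded as 0 = Animal, 1 = Plant).
        Instances train = new Instances("images", attrs, 4);
        train.setClassIndex(3);
        double[][] rows = {
            {255, 255, 255, 0}, {255, 0, 0, 0}, {132, 122, 230, 1}, {89, 134, 200, 1}
        };
        for (double[] row : rows) {
            train.add(new DenseInstance(1.0, row));
        }

        // Learn a prediction model (J48 is used purely for illustration).
        J48 model = new J48();
        model.buildClassifier(train);

        // Predict the missing class value of a new/unseen image, e.g. E = (0, 0, 255).
        DenseInstance unseen = new DenseInstance(1.0, new double[]{0, 0, 255, 0});
        unseen.setDataset(train);
        unseen.setClassMissing();
        double predicted = model.classifyInstance(unseen);
        System.out.println("Predicted class: " + train.classAttribute().value((int) predicted));
    }
}
```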
  8. Data Mining: • Extract patterns or provide insights from a complex dataset. • Borrows techniques from machine learning and statistics.
  9. Data Mining: Patterns should be: • non-trivial (not normally extracted with an SQL statement) • unknown • unexpected • potentially useful • actionable.
  10. Why Data Mining? We are drowning in data but starved for knowledge: data is unstructured, or the knowledge is deeply buried.
  11. Predicting Daily Active Users for Match-3 Mobile Games: an application of supervised learning for Dragon Cubes and Jungle Cubes.
  12. It’s been published!
  13. Daily Active Users: A measure of application virality and a very important metric to gauge the success of an application.
  14. Motivation: Attempt to determine the amount of user activity X days ahead to assist in project planning. X = 7 in our study.
  15. Marketing expenses, daily active users, develop features to make users stay.
  16. Where data was extracted: marketing data.
  17. Dataset Overview: • Two match-3 games • JNC generates revenue; DNC does not • The two games differ only in game mechanics.
  18. General Methodology of Data Mining* (*how I performed data mining on our data; this applies to supervised learning, while unsupervised learning uses subjective evaluation to determine the reliability of a model). Pipeline: dataset → feature selection → refined dataset → machine learning → prediction models (Model A, B, C, D) → unseen data → predicted result → accuracy measure.
  19. Features in Dataset: Users are reached via advertising channels. MKTExpenses: the total amount of marketing expenses, in USD, spent to advertise the game. A high marketing expense means more advertising channels have been used to target more potential users to install the game.
  20. Advertising amount per user per country.* ● Users are reached via advertising channels. *Market insight taken from Chartboost: http://tinyurl.com/charboost
  21. Features in Dataset: How many users discovered our app on a given date? Install Date: calendar date of installation. Cohort Size: the total number of users who installed the application on the given install date. Session Count: the total number of play sessions on a given install date.
  22. Features in Dataset: How long do players play the game? AvgSessionSeconds: the arithmetic mean of the total amount of time users spend in the game. MedianSessionSeconds: the session length value where half of the sessions are longer and half are shorter.
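As a quick illustration of why both statistics are kept, here is a small sketch (with made-up session lengths, not taken from the deck) computing AvgSessionSeconds and MedianSessionSeconds: a few very long sessions pull the mean up while the median stays put.

```java
import java.util.Arrays;

public class SessionStats {
    public static void main(String[] args) {
        // Hypothetical session lengths, in seconds, for one install date.
        double[] sessionSeconds = {30, 45, 60, 90, 600};

        // AvgSessionSeconds: arithmetic mean of all session lengths.
        double avg = Arrays.stream(sessionSeconds).average().orElse(0);

        // MedianSessionSeconds: middle value after sorting (average of the two
        // middle values when the count is even).
        double[] sorted = sessionSeconds.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        double median = (n % 2 == 1) ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;

        // Prints AvgSessionSeconds=165.0 MedianSessionSeconds=60.0 for this sample.
        System.out.printf("AvgSessionSeconds=%.1f MedianSessionSeconds=%.1f%n", avg, median);
    }
}
```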
  23. Features in Dataset: Market impression of the game. DailyAverageRating.
  24. Features in Dataset: Market impression of the game. CrashesANRDay1.
  25. Features in Dataset: How many users are engaged? ActiveUsers: the total number of unique users who spent considerable time in the game on a given date. ActiveUsersDay7: similar to the ActiveUsers variable but offset 7 days after the install date. This is the variable to be predicted.
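Since ActiveUsersDay7 is the prediction target, each row has to be paired with the activity observed a week after its install date. The sketch below shows one way such an offset column could be built; the dates and counts are made up, and the deck does not show the actual preprocessing.

```java
import java.time.LocalDate;
import java.util.LinkedHashMap;
import java.util.Map;

public class ActiveUsersDay7 {
    public static void main(String[] args) {
        // Hypothetical daily ActiveUsers keyed by install date.
        Map<LocalDate, Integer> activeUsers = new LinkedHashMap<>();
        activeUsers.put(LocalDate.of(2016, 1, 1), 1200);
        activeUsers.put(LocalDate.of(2016, 1, 2), 1500);
        activeUsers.put(LocalDate.of(2016, 1, 8), 480);
        activeUsers.put(LocalDate.of(2016, 1, 9), 610);

        // ActiveUsersDay7: for each install date, look up the active users seen
        // exactly 7 days later. Rows without a value 7 days ahead cannot be
        // labelled yet and are skipped.
        for (Map.Entry<LocalDate, Integer> row : activeUsers.entrySet()) {
            Integer day7 = activeUsers.get(row.getKey().plusDays(7));
            if (day7 != null) {
                System.out.println(row.getKey() + " ActiveUsers=" + row.getValue()
                        + " ActiveUsersDay7=" + day7);
            }
        }
    }
}
```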
  26. Features in Dataset: The screen that triggers the events. LevelPlayedEvents, LevelSuccessEvents, LevelFailedEvents.
  27. Dataset.
  28. Applying Methodology: dataset → feature selection → refined dataset → machine learning → prediction models → unseen data → predicted result → accuracy measure.
  29. Feature Selection: Filter out unneeded attributes; determine which attributes matter.
  30. Correlation Analysis: Measures the relationship between two variables; as X grows, how fast does Y grow or decline? Range: -1.0 to +1.0. 0.0 means no relationship at all; +1.0 a strong positive relationship; -1.0 a strong negative relationship.
  31. Correlation Analysis: We use 0.7 as our threshold for a strong relationship; this is the basis for manual feature selection.
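The deck does not spell out which correlation measure is used; the sketch below assumes the Pearson correlation coefficient, computed for one hypothetical attribute against the prediction target, and applies the 0.7 threshold from the slide.

```java
public class Correlation {
    // Pearson correlation coefficient: ranges from -1.0 to +1.0,
    // with 0.0 meaning no linear relationship between x and y.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double sumX = 0, sumY = 0, sumXY = 0, sumX2 = 0, sumY2 = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXY += x[i] * y[i];
            sumX2 += x[i] * x[i];
            sumY2 += y[i] * y[i];
        }
        double numerator = n * sumXY - sumX * sumY;
        double denominator = Math.sqrt(n * sumX2 - sumX * sumX)
                           * Math.sqrt(n * sumY2 - sumY * sumY);
        return denominator == 0 ? 0 : numerator / denominator;
    }

    public static void main(String[] args) {
        // Hypothetical values: daily MKTExpenses vs. ActiveUsersDay7.
        double[] mktExpenses     = {100, 200, 300, 400, 500};
        double[] activeUsersDay7 = {110, 230, 260, 420, 480};

        double r = pearson(mktExpenses, activeUsersDay7);
        // Keep the attribute during manual feature selection only if |r| >= 0.7.
        System.out.printf("r = %.3f, strong relationship: %b%n", r, Math.abs(r) >= 0.7);
    }
}
```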
  32. Automatic Feature Selection. Flow: CSV file → automatic feature selection → filtered CSV → acceptable? If no, fall back to manual feature selection; if yes, use the final CSV for training. The wrapper scheme* algorithm is used for automatic feature selection. (*Wrappers for feature subset selection, Ron Kohavi and George H. John, 1995.)
  33. Automatic Feature Selection (same flow as above). ● Despite using the wrapper scheme, some selected features are considered noise. ● Evaluate whether the selected features are indeed valuable.
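A minimal sketch of the wrapper scheme using WEKA's Java attribute-selection API. The file name jnc_dataset.arff, the use of M5P as the wrapped learner, and BestFirst as the search strategy are assumptions for illustration; the deck only states that the wrapper scheme is applied before a manual review.

```java
import java.util.Arrays;

import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.BestFirst;
import weka.attributeSelection.WrapperSubsetEval;
import weka.classifiers.trees.M5P;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class WrapperSelection {
    public static void main(String[] args) throws Exception {
        // Load the exported dataset (hypothetical file name) and mark the
        // last column, assumed to be ActiveUsersDay7, as the class to predict.
        Instances data = new DataSource("jnc_dataset.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        // Wrapper scheme: candidate feature subsets are scored by running the
        // actual learner on them (M5P here), searched with best-first search.
        WrapperSubsetEval evaluator = new WrapperSubsetEval();
        evaluator.setClassifier(new M5P());

        AttributeSelection selection = new AttributeSelection();
        selection.setEvaluator(evaluator);
        selection.setSearch(new BestFirst());
        selection.SelectAttributes(data);

        // Indices of the automatically selected features (plus the class index).
        // These still need a manual sanity check, since some may be noise.
        System.out.println(Arrays.toString(selection.selectedAttributes()));
    }
}
```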
  34. Applying Methodology: dataset → feature selection → refined dataset → machine learning → prediction models → unseen data → predicted result → accuracy measure.
  35. Selected Attributes.
  36. Applying Methodology: dataset → feature selection → refined dataset → machine learning → prediction models → unseen data → predicted result → accuracy measure.
  37. Applying Machine Learning: Using M5Base, a decision tree with regression.
  38. Machine Learning Technique: Using M5Base (a decision tree with regression). WEKA demo (if possible).
  39. M5Base sample.
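A minimal sketch of building the M5Base-style model tree through WEKA's Java API (the deck demos this in the WEKA GUI). M5P is WEKA's concrete M5Base implementation; the ARFF file name and the assumption that ActiveUsersDay7 is the last column are illustrative.

```java
import weka.classifiers.trees.M5P;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class TrainM5 {
    public static void main(String[] args) throws Exception {
        // Refined dataset after feature selection (hypothetical file name).
        Instances train = new DataSource("jnc_refined.arff").getDataSet();
        // ActiveUsersDay7 is the numeric class to predict (assumed last column).
        train.setClassIndex(train.numAttributes() - 1);

        // M5P: a decision tree whose leaves hold linear regression models
        // over the numeric features.
        M5P model = new M5P();
        model.buildClassifier(train);

        // Prints the induced tree and the regression model at each leaf,
        // similar to the "M5Base sample" shown on the slide.
        System.out.println(model);
    }
}
```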
  40. Problem! We do not have enough unseen data yet! What do we do to test our model?
  41. K-fold cross-validation: Divide the dataset into K partitions (10 recommended). Test set, training set.
  42. K-fold cross-validation: Divide the dataset into K partitions (10 recommended). Test set, training set, training set.
  43. K-fold cross-validation: Divide the dataset into K partitions (10 recommended). Test set, training set, training set.
  44. K-fold cross-validation: Divide the dataset into K partitions (10 recommended). Test set, training set, training set.
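A minimal sketch of 10-fold cross-validation on the refined dataset with WEKA's Evaluation class; the file name and the random seed are assumptions. Each of the 10 partitions takes a turn as the test set while the remaining 9 form the training set, so no separate unseen data is needed yet.

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.M5P;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CrossValidate {
    public static void main(String[] args) throws Exception {
        // Hypothetical refined dataset; last column assumed to be ActiveUsersDay7.
        Instances data = new DataSource("jnc_refined.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        // 10-fold cross-validation of an M5P model tree.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new M5P(), data, 10, new Random(1));

        // Accuracy measures averaged over the 10 folds.
        System.out.println(eval.toSummaryString());
    }
}
```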
  45. Applying Methodology: dataset → feature selection → refined dataset → machine learning → prediction models → unseen data → predicted result → accuracy measure.
  46. Unseen Data Sample.
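Once genuinely unseen rows become available, the trained model can score them directly. A sketch using WEKA's classifyInstance follows; both file names are hypothetical, and the unseen file is assumed to carry the eventual actual values so the accuracy measures can be computed later.

```java
import weka.classifiers.trees.M5P;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class PredictUnseen {
    public static void main(String[] args) throws Exception {
        // Training data and later, truly unseen rows (hypothetical file names).
        Instances train = new DataSource("jnc_refined.arff").getDataSet();
        Instances unseen = new DataSource("jnc_unseen.arff").getDataSet();
        train.setClassIndex(train.numAttributes() - 1);
        unseen.setClassIndex(unseen.numAttributes() - 1);

        M5P model = new M5P();
        model.buildClassifier(train);

        // Predicted ActiveUsersDay7 for each unseen row, next to the actual value.
        for (int i = 0; i < unseen.numInstances(); i++) {
            double predicted = model.classifyInstance(unseen.instance(i));
            double actual = unseen.instance(i).classValue();
            System.out.printf("row %d: predicted=%.1f actual=%.1f%n", i, predicted, actual);
        }
    }
}
```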
  47. Applying Methodology: dataset → feature selection → refined dataset → machine learning → prediction models → unseen data → predicted result → accuracy measure.
  48. Accuracy Measure: A value close to 0.0 is not really significant.
  49. Accuracy Measure: The magnitude of error of the predicted value vs. the actual value. Lower is better.
  50. Accuracy Measure: The percentage of error of the predicted value vs. the actual value. Lower is better.
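The deck's screenshots do not name the exact formulas behind these measures. Assuming the usual regression metrics, mean absolute error for the "magnitude of error" slide and relative absolute error for the "percentage of error" slide, here is a small sketch with made-up predicted and actual values.

```java
public class ErrorMeasures {
    public static void main(String[] args) {
        // Hypothetical predicted vs. actual ActiveUsersDay7 values.
        double[] predicted = {480, 350, 610, 200};
        double[] actual    = {500, 330, 640, 180};

        double n = actual.length;
        double mean = 0;
        for (double a : actual) mean += a;
        mean /= n;

        double absErr = 0, baselineErr = 0;
        for (int i = 0; i < actual.length; i++) {
            absErr += Math.abs(predicted[i] - actual[i]);   // magnitude of error
            baselineErr += Math.abs(actual[i] - mean);      // error of always predicting the mean
        }

        // Mean absolute error: average magnitude of error; lower is better.
        double mae = absErr / n;
        // Relative absolute error: total error as a percentage of a naive
        // predict-the-mean baseline; lower is better.
        double rae = 100.0 * absErr / baselineErr;

        System.out.printf("MAE=%.2f RAE=%.2f%%%n", mae, rae);
    }
}
```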
  51. Interpretations: Using the results from M5Base. *More details available at: http://www.dlsu.edu.ph/conferences/dlsu-research-congress-proceedings/2016/GRC/GRC-HCT-I-001.pdf
  52. Interpretations: M5Base performed exceptionally well on JNC-Test (unseen data), which makes it a good candidate for real-world use.
  53. Interpretations: Jungle Cubes is potentially a predictable, scalable game.
  54. Interpretations: JNC and DNC have almost the same total advertising expense. But DNC did not gain enough daily active users. Positive correlation summary (JNC DAU-Day7 / DNC DAU-Day7): MKTExpenses: High / Low; SessionCount: High / Low; SessionLength: High / Low.
  55. Interpretations: Based on our study, we propose that MKTExpenses becomes highly correlated with DAU-Day7 once the game has enough enjoyable content to keep users engaged. Proposed loop: ensure high session length and promote replayability; SessionLength affects SessionCount; if this satisfies the business requirements, increase advertising campaigns; MKTExpenses affects DAU, which influences DAU-Day7.
  56. Conclusion: Did our study correctly predict the fate of JNC and DNC? Yes. Jungle Cubes has gained over 1M downloads as of October 2016 and is still profitable. Dragon Cubes was pulled from the market in September 2016.
  57. We thank Jakob Lykkegaard Pedersen and Thomas Andreasen for allowing us to use the datasets for Jungle Cubes and Dragon Cubes. We would also like to thank Suhana Chooli, the marketing manager of Playlab Inc., who provided the details about the marketing expenses. Thank you for listening!
