DeNA West was dealing with large amounts of raw log data from over 100 mobile game titles that was delaying analysis and causing other issues. They implemented BigQuery to instantly ingest streaming data, scale to support many analysts, and provide fast queries. This solved their problems by simplifying their technology stack, allowing data to be accessed in seconds instead of hours, and supporting everyone's analysis needs without performance issues. BigQuery provided a hands-off solution and the ability to easily explore and visualize their data.
3. Who is DeNA West?
• 1st Party: Developed in house
• 2nd-3rd Party: Developed externally with/without our help and published by DeNA
• JP 1st Party: Import hits made by DeNA in Japan
4. What data do we care about?
• Standard Game KPIs
• Marketing Data
• Custom Game Insight
5. What’s our data like?
~60MB of raw logs per minute
~50GB of raw logs per day
6. How were we dealing with it?
Log Table
Logs from over 100 titles and multiple studios
42TB raw log data from May 2011 to Jan 2015
Accessed via HiveQL
7. What were our data woes?
Data sources
ETL
Table Storage
Data access for ad hoc analysis
Visualization solutions
Visualization workflows
21. Let’s see some data!
Dropped entry price, increased conversion
Reduced revenue at start of funnel but increased revenue overall
22. Let’s see some data!
Ruby Medal Balance over Time
Tutorial Completion Rate
TCP Latency Distribution
Editor's Notes
Don’t plan on saying anything here. This is just the slide to bring up while you’re getting on stage. But you can certainly duplicate this if you want.
Hi everyone! I work at DeNA, a Japanese company growing its Western presence in gaming. Last year, we had just under 2 billion dollars’ worth of virtual currency spent in Japan and around 270 million dollars across the rest of the world.
We make games internally, like Blood Brothers 2, or with licensed IP like Transformers, Star Wars, and Marvel. And we’re bringing over Final Fantasy: Record Keeper, one of our biggest hits in Japan, which is making us over $10 million a month there.
We use common logs across all our games to compare KPIs. We get marketing data from ad vendors on install sources and spend. We implement custom logs in each game for design tuning.
<have time> On busy days, we get around 60 megabytes of raw player logs per minute and 50 gigabytes of raw logs per day. <struggled to process this volume>
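The two figures are easier to compare on a common scale. The sketch below assumes decimal units (1 GB = 1000 MB), which the talk does not specify, and the variable names are illustrative. If both numbers describe the same busy day, the per-minute figure reads as a peak rather than a sustained rate, though that is an inference and not something stated in the talk.

```python
MINUTES_PER_DAY = 24 * 60
MB_PER_GB = 1000  # assumption: decimal units; the talk does not say

peak_mb_per_minute = 60  # figure from the talk
daily_gb = 50            # figure from the talk

# Average per-minute rate implied by the daily total.
average_mb_per_minute = daily_gb * MB_PER_GB / MINUTES_PER_DAY

# Daily total that would result if the per-minute figure held all day.
sustained_peak_gb_per_day = peak_mb_per_minute * MINUTES_PER_DAY / MB_PER_GB

print(round(average_mb_per_minute, 1))      # 34.7
print(round(sustained_peak_gb_per_day, 1))  # 86.4
```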
Old solution: a Hadoop cluster accessed primarily via Hive. All of our player logs since May 2011, from over 100 titles across multiple studios, are stored in one big 42TB table.
Old infrastructure <pause> 15 seconds is definitely not enough time to describe it all. We had a pretty complicated setup before, and we would run into various bottlenecks and failure points that I’ll dive into next.
Our first big issue was a hefty delay between when a player triggered a log and when analysts could query the data. The log collection and ETL process took a while, and analysts would need to wait 3 or more hours for new data - not fun when your games run live events.
Next, as DeNA West grew our portfolio, we also grew our data users - our increasing team of analysts would clog our systems and we had issues controlling permissions, especially with external developers.
And - this one drove me crazy - queries took so long to run you’d forget what you were looking for, and exploratory analysis was clunky - just not as fun and intuitive as it should be.
<have time> So how did we decide to solve these issues? We simplified our games’ common tech in the West by using Google App Engine as a platform server and Google BigQuery for analysis.
This addressed a lot of the problems we used to deal with. For starters, we experimented with Google’s Streaming API, Cloud Logging, and other setups to get our logs almost instantly after they’re sent - not 3 hours later.
Gateway module receives ~35,000 records/min
Pull Tasks queue handles 8K~9K/min including retries
Hit URLFetch Quota on GAE
3 queries per minute per game
x 7 game projects
x 1440 minutes per day
= 30,240 queries per day
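The arithmetic in the note above can be checked with a short sketch. The function and parameter names are illustrative and not from the talk; only the three input figures come from the notes.

```python
MINUTES_PER_DAY = 24 * 60  # 1440


def queries_per_day(queries_per_minute_per_game: int, game_projects: int) -> int:
    """Total queries per day across all game projects at a fixed per-minute rate."""
    return queries_per_minute_per_game * game_projects * MINUTES_PER_DAY


total = queries_per_day(queries_per_minute_per_game=3, game_projects=7)
print(total)  # 30240
```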
Not to mention pager duty for our Hadoop cluster will soon be a thing of the past. Now if we have too many users clogging the system, it’s Google’s problem, not ours. BigQuery is built to scale, so if we launch 20 more titles, we don’t need to worry about stress on our cluster like we used to.
And we can much more flexibly control permissions - internally as well as externally. For example, we can easily share 3rd party game data with the partner developer, an issue we struggled with before.
<if time, our old solution for this was a nightmare>
And queries are SO much faster. In this video I’m getting tutorial completion rate by country, platform, and device. As the rules go, I only have 15 seconds to crunch over 150 gigs of data - and it’s done! In Hadoop, the same query over the same volume took 2 minutes.
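For a sense of the query’s shape, here is a minimal sketch that runs a tutorial completion rate aggregation against an in-memory SQLite table. The table name `player_logs`, its columns, the event names, and the sample rows are all hypothetical stand-ins; the talk does not show the actual BigQuery schema or SQL.

```python
import sqlite3

# Hypothetical schema and event names; the real log schema is not shown in the talk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE player_logs ("
    "player_id TEXT, country TEXT, platform TEXT, device TEXT, event TEXT)"
)
rows = [
    ("p1", "US", "iOS", "iPhone 6", "tutorial_start"),
    ("p1", "US", "iOS", "iPhone 6", "tutorial_complete"),
    ("p2", "US", "iOS", "iPhone 6", "tutorial_start"),
    ("p3", "JP", "Android", "Nexus 5", "tutorial_start"),
    ("p3", "JP", "Android", "Nexus 5", "tutorial_complete"),
]
conn.executemany("INSERT INTO player_logs VALUES (?, ?, ?, ?, ?)", rows)

# Count distinct players who started and who completed the tutorial per segment.
QUERY = """
SELECT
  country, platform, device,
  COUNT(DISTINCT CASE WHEN event = 'tutorial_start' THEN player_id END) AS started,
  COUNT(DISTINCT CASE WHEN event = 'tutorial_complete' THEN player_id END) AS completed
FROM player_logs
GROUP BY country, platform, device
ORDER BY country, platform, device
"""


def completion_rates(connection):
    """Return {(country, platform, device): completed / started} per segment."""
    result = {}
    for country, platform, device, started, completed in connection.execute(QUERY):
        if started:  # skip segments with no tutorial starts
            result[(country, platform, device)] = completed / started
    return result


rates = completion_rates(conn)
for segment, rate in rates.items():
    print(segment, f"{rate:.0%}")
```

Counting distinct players rather than raw events keeps repeated tutorial attempts by the same player from inflating either side of the ratio.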
Not to say it’s been completely smooth sailing in our transition, mostly because we’ve had to look at things differently. We’ve learned to keep an eye on hitting quotas, as well as query and storage costs. And given it’s a newer product, we’ve had some problems where things don’t work as expected. It’s been crucial to be able to iterate quickly and work with Google when we’ve noticed issues.
<time, sum bq has helped us deal with some headaches> We’ve felt that it’s been a good solution given that we like doing our analysis in house, but don’t want to maintain a cluster ourselves.
Before I go, let me show you some data with Tableau. In Blood Brothers 2, for example, we saw an issue where players lost easy missions early on, so we improved how we educate them, and by drilling into the data by day we could see our change helped.
We dropped the entry price for a live event’s step-up gacha, which is one that has unlocking steps increasing in price. Though this meant lower revenue early in the funnel, overall revenue (the green line) was up due to increased conversion to the end.
We always strive to drive actions with data - prioritizing engineering effort by monitoring game performance, optimizing tutorial and purchase funnels, tuning our games, and iterating on live events, which are our primary driver for monetization.