Video Games Presentation for Policymaking in the big data era conference
1. Using big data to map the UK
video games industry
Juan Mateos Garcia and Hasan Bakhshi, 16 June 2015
2. Talks about Nesta + Ukie’s research mapping the UK
games industry with web (biggish) data.
Focuses on data collection and compares results with
what one would obtain using standard (SIC-based)
approaches.
Less focused on reviewing all our findings. For that,
you can download the full report here:
https://www.nesta.org.uk/sites/default/files/map_uk_games_industry_wv.pdf
This presentation
3. Exam question
To measure and map a fast moving, innovative, entrepreneurial sector.
Opportunity
The ‘big data’ revolution:
Unstructured web inputs
Combining varied datasets
Open, interactive outputs (datasets + platforms)
Audiences:
• Policymakers
• Industry
• Other innovation agents
• Researchers
1. Context
4. 2. How do we FIND UK games companies?
Using official data
[Diagram labels: Business, Analyst, SIC Code, Data, Govt]
Do these SIC codes capture games companies?
Some issues:
1. Inadequate SIC codes: Games SIC codes only appeared in
2007.
2. Misclassification:
• Companies have no incentives to select the right SIC code.
• Companies straddle sectors (educational games, games app
developers etc.)
Is this data relevant?
Some issues:
1. It misses smaller companies
2. Lags in the publication of the data (~1-2 years)
3. Data only available in an aggregate way. Not possible to
identify companies (due to disclosure issues)
4. Data doesn’t include industry-relevant questions
5. [Diagram labels: Industry expert, Analyst, Domain knowledge, Survey, Sample]
Excellent source of data, tried and tested
methodology
• Used in many policy-relevant reports.
• Allows targeting existing companies, and obtaining
very relevant information.
Limitations:
• Very expensive
• Very low response rates
• Snapshot
2. How do we FIND UK games companies?
Using surveys
6. [Diagram labels: Business, Analyst, Activity, Data, Web]
Advantages
• Definition not based on SIC codes but on economic/creative activity
• ‘Real-time’ data
• Relevant data
Not a silver bullet… as we will see.
2. How do we FIND UK games companies?
Using web data (our approach)
7. An illustration of the pitfalls of web data
Several academic papers have used a similar approach to ours, but based on a
single data source (MobyGames). However, MobyGames is heavily skewed towards
older, niche gaming platforms vs. new, mainstream ones. This reflects biases
in the user-base of the platform.
8. 2. How do we FIND UK games companies?
Process
Data scraping carried out by external agency
with IT + domain expertise. Analysis in-house
9. 2. How do we FIND UK games companies?
Some observations
Not all observations are born equal:
• Matching companies from web sources with CH data is a probabilistic
process.
• False positives/negatives are costly not just in terms of accuracy, but also
in terms of perceptions.
• Strategies to address this:
– Manual (expensive, stringent) verification of companies using web
information: only 23% of companies verified (80% of those validated
were correct).
– Decision tree (CHAID) to identify groups of companies similar to
those verified positively: 546 companies added.
– Quality assurance with domain experts (Ukie):
• Remove 17 companies (BBC, gambling companies)
• Incorporate 184 companies with no web presence.
10. 4. Results
Coverage
We identify 1902 companies active in 2015 (cf. 1320 according to
IDBR in 2013, ~500 in most domain-expert generated company lists).
Just over a third of companies are covered by official SIC codes.
20% of the companies have no official SIC code yet, but are
identified by our approach.
12. 4. Results
Geography [1]
              breslq  ark.lq  idbrcount.lq
breslq          1.00    0.38          0.46
ark.lq          0.38    1.00          0.53
idbrcount.lq    0.46    0.53          1.00

Gini BRES  Gini IDBR  Gini WD
    0.929      0.898    0.801
Our approach shows a geography of the UK games industry
echoing official data sources, but with less concentration
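
As a rough illustration of the concentration measure reported above, here is a minimal sketch of a Gini coefficient computed over company counts per area. The function name `gini` and the toy counts are assumptions for illustration only; the slides do not say how the figures were actually calculated.

```python
def gini(values):
    """Gini coefficient of non-negative values.

    0 means the values are spread evenly across areas; values close to 1
    mean they are concentrated in very few areas.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted formula on sorted data:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with i starting at 1
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical company counts per area (not taken from the report)
even_spread = [5, 5, 5, 5]
one_hub = [0, 0, 0, 20]
print(gini(even_spread), gini(one_hub))
```

A lower value for the web-data column than for BRES or IDBR is what "less concentration" means here: companies are spread over more areas.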
13. 4. Results
Geography [2]
Differences in
“hotspots” when we
compare our data
and IDBR.
Conversations with
Ukie suggest the
extra hubs identified
by our analysis are
more credible than
those using IDBR
(Liverpool + Cardiff
vs. Hull + Reading)
14. 4. Results
Hub composition
One explanation
“New” games hubs with more diversified creative economies
tend to include fewer companies covered by official SIC codes,
compared with “longstanding” hubs.
15. 4. Results
Micro-geography
Our data allows us to map the games industry at the micro (company
address) level -> this is policy-relevant information.
16. 4. Results
Some issues
Poor availability of
financial data (only 6%
report it to CH) -> We
can’t produce estimates
of employment or value
added.
We rely on inaccurate
trading addresses for our
mapping. We know
there are issues here.
How many of our
companies specialise in
games vs. make some
games? What goes in
and what goes out?
17. Lessons learned
Structured domain-specific resources help: not available for all
sectors.
It’s not web vs SIC, but web + SIC
Combining automated data collection and matching with domain
knowledge is preferable.
Do not underestimate the risks of errors, or the costs of
minimising them.
Next steps
Our strategy to improve the quality of the data is to open it up for
the games industry by developing an interactive, dynamic
platform: Watch this space
5. Conclusions
The company dataset also includes SIC codes (3805 companies).
We collected data on 115 universities offering games education.
Matching uses the Open Corporates reconciliation API, based on several parameters including period of activity and SIC code. We keep matches with a score > 50.
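
The score cut-off described in this note could look something like the sketch below. It does not call the Open Corporates API; it assumes candidate matches have already been retrieved and carry a numeric score. The `Candidate` type, the `accept_matches` function, and the rule of keeping only the best candidate per web-sourced name are all hypothetical; only the "score > 50" rule comes from the note.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    web_name: str        # company name as found in a web source
    registry_name: str   # candidate name in the company registry
    score: float         # match score returned by the reconciliation step


def accept_matches(candidates, threshold=50.0):
    """Keep, per web-sourced name, the best candidate scoring above threshold."""
    best = {}
    for c in candidates:
        if c.score <= threshold:
            continue  # the note keeps only matches with a score > 50
        current = best.get(c.web_name)
        if current is None or c.score > current.score:
            best[c.web_name] = c
    return best


# Hypothetical candidates (names and scores invented for illustration)
candidates = [
    Candidate("Acme Games", "ACME GAMES LTD", 72.0),
    Candidate("Acme Games", "ACME LTD", 55.0),
    Candidate("Pixel Co", "PIXEL COMPANY LTD", 40.0),
]
matches = accept_matches(candidates)
print(sorted(matches))
```

Because matching is probabilistic, anything accepted this way would still go through the manual verification and expert quality assurance described in the earlier slides.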
NB the final dataset contains 2320 companies. 1902 active. 245 from CH, 1548 Internet, 107 Ukie.