These slides use concepts from my (Jeff Funk) course entitled Biz Models for Hi-Tech Products to analyze the business model for Kaggle’s crowdsourcing service for data analytics. Kaggle connects data scientists with organizations that have data analysis problems. Kaggle helps organizations define their data analytic problems, present them to data scientists, and organize and evaluate competitions between data analytic solutions. Its data ensemble technique also evaluates the effectiveness of the various solutions. These slides describe the specific value proposition for organizations and data scientists, as well as other aspects of the business model such as the method of value capture, scope of activities, and method of strategic control.
4. What is Data Science?
A newly emerging field dedicated to analysing and manipulating structured and unstructured raw data to derive insights, build processes and products, and alter existing business models or develop new ones.
The necessary skill sets range from computer science, to mathematics, to knowledge of the relevant field.
INTRODUCTION
Data science
5. How Does Kaggle Address Data Science?
It is almost never the case that any single organization has access to the advanced machine learning and statistical techniques that would allow them to extract maximum value from their data.
Meanwhile, data scientists crave real-world data to develop and refine their techniques.
Kaggle corrects this mismatch by offering companies a cost-effective way to harness the ‘cognitive surplus’ of the world's best data scientists.
What does Kaggle use to correct the mismatch?
Crowdsourcing: it shares real-world data with a specific group of users (data scientists), who come up with predictive models to solve the problems.
6. WHY DATA SCIENCE AND ANALYTICS?
Organizations are spending an average of 21% of their marketing budgets on analytics
http://blogs.osc-ib.com/2014/02/ib-student-blogs/data-is-the-new-oil/
7. DATA IS THE NEW OIL
http://blogs.osc-ib.com/2014/02/ib-student-blogs/data-is-the-new-oil/
8. HOW DOES KAGGLE WORK?
The competition host prepares the data and a description of the problem, and announces the prize pool for a proper solution together with a deadline for the challenge.
Participants experiment with different techniques and compete against each other to find the best models. After the deadline passes, the competition host pays the prize money to the winner.
Kaggle Connect is the consulting part of the platform, which connects companies to the elite of the Kaggle community, who provide solutions for different data science problems.
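The competition lifecycle described on this slide (the host posts a problem with a prize pool and deadline, participants submit, and the winner is paid after the deadline) can be sketched as a small Python model. This is a minimal illustration under stated assumptions, not Kaggle's implementation: the class names, the single numeric score per entry, the "higher score wins" rule, and the dates are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Submission:
    team: str
    score: float  # assumed: one numeric score per entry, higher is better
    submitted_on: date


@dataclass
class Competition:
    # Hypothetical model of a hosted competition (not Kaggle's actual code).
    problem: str
    prize_pool: float
    deadline: date
    submissions: List[Submission] = field(default_factory=list)

    def submit(self, entry: Submission) -> bool:
        """Accept an entry only if it arrives on or before the deadline."""
        if entry.submitted_on > self.deadline:
            return False
        self.submissions.append(entry)
        return True

    def winner(self, today: date) -> Optional[Submission]:
        """Return the best entry once the deadline has passed, else None."""
        if today <= self.deadline or not self.submissions:
            return None
        return max(self.submissions, key=lambda s: s.score)


comp = Competition("Predict tournament winners", 15_000.0, date(2014, 3, 1))
comp.submit(Submission("team_a", 0.71, date(2014, 2, 20)))
comp.submit(Submission("team_b", 0.74, date(2014, 2, 25)))
comp.submit(Submission("team_c", 0.90, date(2014, 3, 5)))  # rejected: after the deadline
winner = comp.winner(today=date(2014, 3, 2))
if winner is not None:
    print(f"{winner.team} receives ${comp.prize_pool:,.0f}")
```

The late entry from team_c is rejected even though its score is highest, which mirrors the rule that only entries made before the deadline compete for the prize.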
9. HOW DO THE COMPETITIONS WORK?
1. Company (customer with problems)
2. Kaggle
3. Organize data (Kaggle)
Data scientist registration
4. Understand (Data Scientist & Kaggle)
5. Collect (Data Scientist & Kaggle)
6. Data exploration (Data Scientist & Kaggle)
7. Plausibility check (Data Scientist & Kaggle)
8. Model (Data Scientist)
9. Validate (Kaggle: ensemble approach)
10. Communicating results
Deploy best solution
10. WHICH MODEL TO USE?
There are countless possible approaches to any data prediction problem. Which one should be chosen?
11. HOW DOES KAGGLE SELECT THE BEST?
Competitions are judged on predictive accuracy and objective criteria set by the competition host/company.
Kaggle compares techniques on a uniform dataset with a uniform evaluation algorithm that assigns points to each solution, and the results are categorized.
Kaggle uses an ensemble approach, which has proven to be a better way to assess predictive modelling solutions.
Ensemble approach
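To illustrate what "a uniform dataset with a uniform evaluation algorithm" means, the sketch below scores every submission against the same hidden answers with the same metric and ranks the results. Root-mean-squared error is used here purely as an assumed example metric (real competitions choose their own), and the answers, submission names, and predictions are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def rmse(y_true: List[float], y_pred: List[float]) -> float:
    # Root-mean-squared error: one possible uniform metric (assumption).
    if len(y_true) != len(y_pred):
        raise ValueError("prediction length must match the test set")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def leaderboard(y_true: List[float],
                submissions: Dict[str, List[float]]) -> List[Tuple[int, str, float]]:
    # Every submission is scored on the same hidden answers with the same metric,
    # then sorted so that rank 1 has the lowest error.
    scored = [(name, rmse(y_true, preds)) for name, preds in submissions.items()]
    scored.sort(key=lambda item: item[1])
    return [(rank, name, score) for rank, (name, score) in enumerate(scored, start=1)]


hidden_answers = [1.0, 0.0, 1.0, 1.0]  # hypothetical held-out answers
subs = {
    "model_a": [0.9, 0.2, 0.8, 0.7],
    "model_b": [0.6, 0.4, 0.5, 0.9],
    "model_c": [1.0, 0.0, 1.0, 0.0],
}
for rank, name, score in leaderboard(hidden_answers, subs):
    print(rank, name, round(score, 3))
```

Because every entry faces the same data and the same scoring rule, the ranking reflects differences between the models rather than differences in how they were tested.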
15. SAMPLE COMPETITION
Intel gathered data on previous NCAA tournament results and fixture match-ups, player data, and home and away wins over a period of two decades.
The first stage is to build a predictive model and compare it against previous tournaments.
The target is to use the model to predict the winners of the 2014 NCAA tournament.
Prize money: $15,000
Predictions:
id         pred1              pred2              name.x         name.y
S_507_509  0.245309234288291  0.708999299530187  ALBANY NY      AMERICAN UNIV
S_507_511  0.015245408147597  0.083965574256572  ALBANY NY      ARIZONA
S_509_511  0.044761732923018  0.041779131840498  AMERICAN UNIV  ARIZONA
S_509_512  0.282281213282214  0.185690215492044  AMERICAN UNIV  ARIZONA ST
S_507_512  0.114997411223728  0.324048786686369  ALBANY NY      ARIZONA ST
S_511_521  0.846952788682282  0.835060080083856  ARIZONA        BAYLOR
S_507_521  0.077615865041407  0.28593300082739   ALBANY NY      BAYLOR
S_509_536  0.304576324006342  0.187324294026667  AMERICAN UNIV  BYU
S_507_536  0.126407140118714  0.326412371166609  ALBANY NY      BYU
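The NCAA prediction table can also be read programmatically. The sketch below assumes, without confirmation from the slides, that an id such as S_507_509 encodes a season code followed by two team ids, and that each pred column is a submission's probability that the first-listed team wins. It scores pred1 and pred2 with log loss, a common metric for probabilistic predictions and here an assumption; the game outcomes are hypothetical, not real results.

```python
import math
from typing import List, Tuple

# Three rows copied from the prediction table: (id, pred1, pred2)
rows = [
    ("S_507_509", 0.245309234288291, 0.708999299530187),
    ("S_507_511", 0.015245408147597, 0.083965574256572),
    ("S_511_521", 0.846952788682282, 0.835060080083856),
]


def parse_id(game_id: str) -> Tuple[str, int, int]:
    # Assumed format: <season>_<first team id>_<second team id>
    season, first, second = game_id.split("_")
    return season, int(first), int(second)


def log_loss(outcomes: List[int], probs: List[float], eps: float = 1e-15) -> float:
    # outcome 1 means the first-listed team won; probabilities are clipped to avoid log(0)
    total = 0.0
    for y, p in zip(outcomes, probs):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(outcomes)


# HYPOTHETICAL outcomes, for illustration only (not real game results)
outcomes = [0, 0, 1]
for game_id, _, _ in rows:
    print(parse_id(game_id))
print("pred1:", round(log_loss(outcomes, [r[1] for r in rows]), 4))
print("pred2:", round(log_loss(outcomes, [r[2] for r in rows]), 4))
```

Log loss punishes confident wrong predictions heavily, so a submission that assigns 0.71 to an outcome that does not happen loses far more than one that assigns 0.25.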
16. TOP COMPANIES INVOLVED
Hundreds of competitions are hosted on Kaggle.
Competition topics range from biology to finance.
Organizations such as NASA and Microsoft, as well as medium-sized enterprises, host competitions.
Even universities such as Stanford and Harvard host competitions.
17. KAGGLE COMMUNITY
The Kaggle community is the place where data scientists and experts come together on a single platform to share ideas.
Kaggle runs a blog, “No Free Hunch”, where activity on Kaggle, best practices, conferences, and updates on recent developments are regularly posted.
The community also includes the top data scientists in the world, with whom companies can discuss their current models and the effects of the predictive models developed.
The Jobs Board is a new feature where companies/customers in need of data scientists can post an ad with their requirements.
18. CONTENTS
1. Introduction
2. Scope of Activities
3. Value Proposition
4. Customer Selection & Market
5. Value Capture
6. Competitor Analysis
7. Strategic Control
19. SCOPE OF ACTIVITIES
Actors: Kaggle, Open source, Investors & support, Companies, Data Scientist
Competition hosts: x
Data providers: x
Content development: x
Software: x x
Algorithm: x x x
Evaluation: x
Data storage: x
Marketing: x x
Licensing: x x
Reading material: x x x
Search: x
Terms: x x x
21. VALUE PROPOSITION –KAGGLE
Kaggle has two types of customer:
1. Data scientists (who work on the problems)
2. Companies/organizations (who pose the problems)
22. Participation by the world's leading data scientists
Many data scientists participate
Different minds give different solutions
Kaggle platform <<< data scientists
Ensemble approach
Signing of NDAs, background checks, exclusive sets of data scientists
VALUE PROPOSITION-COMPANIES
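The point that different minds give different solutions is what makes ensembling useful. The slides do not say which ensemble method Kaggle uses; the sketch below shows one simple form, an element-wise average of two models' predictions, using hypothetical numbers where the errors of the two models partly cancel.

```python
from statistics import mean
from typing import List


def average_ensemble(predictions: List[List[float]]) -> List[float]:
    # Element-wise mean of several models' predictions for the same items.
    return [mean(values) for values in zip(*predictions)]


def mse(y_true: List[float], y_pred: List[float]) -> float:
    # Mean squared error between true values and predictions.
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


truth = [1.0, 0.0, 1.0, 0.0]
model_a = [0.8, 0.4, 0.6, 0.0]  # hypothetical predictions from one data scientist
model_b = [0.6, 0.0, 1.0, 0.4]  # hypothetical predictions from another
blend = average_ensemble([model_a, model_b])
print(mse(truth, model_a), mse(truth, model_b), mse(truth, blend))
```

Averaging helps most when the models' errors are not strongly correlated; averaging two identical models changes nothing.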
23. VALUE PROPOSITION FOR DATA SCIENTISTS
Access to big organizations such as NASA, Facebook, and Microsoft
Highly paid jobs in big organizations
Signature track: data scientists who appear on the Kaggle leaderboard gain recognition in the field of predictive modelling.
26. END USERS
Corporations and Research Organizations
People
Kaggle
Trend Analytics on Stock Prices
Users Subscribe to services based on Kaggle Solutions
Direct
Indirect
32. KAGGLE’S MARKET
Sales Forecasting
Stock Forecasting
Risk Modelling & Pricing
Logistics optimisation
Best Process Prediction
Inventory Management
Traffic Forecasting
Energy demand
Crime Prediction
Tax/social fraud detection
Hospital Casualty Demand
Private sector
Public sector
33. MARKET DRIVERS
IT offers a definitive source of competitive advantage across all industries and will offer significant future value.
Data is increasingly considered to be the commodity of the future.
Individuals create 70% of data, while enterprises store 80% of it.
36. Kaggle Competition
Community Access
% from prize money
Company-Open Data
Data Scientists
Solution
Prize Money
CURRENT REVENUE STREAM -BUSINESS
37. Kaggle Connect
Top Data Scientist Access
Connect Fee
Company-Sensitive Data
Top 0.5% Data Scientists
Money
Solution
CURRENT REVENUE STREAM -BUSINESS
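The two current business revenue streams reduce to simple arithmetic: a percentage tied to the prize money for Kaggle Competition, and a connect fee for Kaggle Connect. The slides give no actual rates, so the 25% share, the $50,000 fee, and the per-engagement fee structure below are hypothetical placeholders; only the $15,000 prize comes from the sample competition.

```python
# HYPOTHETICAL rates and fees, for illustration only; the slides give no figures.


def competition_revenue(prize_money: float, kaggle_share: float) -> float:
    """Kaggle Competition: Kaggle earns a percentage tied to the prize money."""
    return prize_money * kaggle_share


def connect_revenue(connect_fee: float, engagements: int) -> float:
    """Kaggle Connect: an assumed flat connect fee per company engagement."""
    return connect_fee * engagements


total = competition_revenue(15_000, 0.25) + connect_revenue(50_000, 2)
print(total)
```

Under these placeholder numbers the Connect stream dominates, which illustrates why access to the top data scientists on sensitive data can be priced well above the open competition model.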
38. CURRENT REVENUE STREAM -EDUCATION
Kaggle corp
Assignments
% Revenue
Results in order of marks obtained
Students enrolled in the university
Question & Data
Data model
Top universities
39. PROPOSED REVENUE STREAM –EDUCATION
Contracts could be signed with online courseware websites such as Coursera and edX to provide data for students enrolled in specific courses.
The Singapore government has proposed introducing data science in high schools as part of the co-curriculum. Kaggle could enter this market by providing a tool for schools.
40. PROPOSED REVENUE STREAM –GOVERNMENT ALLIANCES
Kaggle corp
Kaggle competition
Kaggle connect
Government/
Customer
Local Data scientist
Data available online
Job offer
Brand value gained as a government recognised platform/organisation for Analytics
Prize money
% of Prize money
Job
Data model
Has knowledge about the local market
Data model
+ Trust/Privacy
Human
Resource
41. PROPOSED REVENUE STREAM –KAGGLE CONSULTANCY
Kaggle corp
Kaggle connect
Oil & Gas industries/
Customer
Raw Data + Challenge
Fee for consultancy
Top 0.5% of Data Scientist in relevant field
Work
Kaggle consultancy
Job offer
Structured data
Ownership
Data model
With strong brand value, trust, and adequate human resources, Kaggle could enter the field of analytics as a consulting firm.
The main field of interest could be oil and gas, as the data there is large, unstructured, and sensitive.
42. VALUE CAPTURE -KAGGLE PRODUCTS
Kaggle Public Competitions
Competitions allow organizations to post their data and a specific prediction problem to be answered competitively by the world's best data scientists.
Kaggle Masters Competitions
Kaggle provides the same platform as with its public competitions, except that access is limited to an elite group of Kaggle players.
Kaggle-in-Class
Kaggle-in-Class allows instructors to host data prediction competitions for their students.
45. KAGGLE VS INNOCENTIVE
For users
  Kaggle: Career choice with enough competitions
  Innocentive: Rewarding hobby
Platform
  Kaggle: Crowdsourcing, open innovation, predictive modelling
  Innocentive: Open innovation, research and development
Scope
  Kaggle: Problems involving data analytics
  Innocentive: R&D in various industries
Registered members
  Kaggle: 100,000 in 3 years
  Innocentive: 300,000 in 12 years
Max prize money
  Kaggle: $3 million
  Innocentive: $1 million
Number of competitions
  Kaggle: 311 (107/year)
  Innocentive: 1,650 (138/year)
https://www.kaggle.com/competitions
Kaggle focuses on problems related to data analytics. Kaggle’s data scientists use machine learning as their methodology for solving these problems.
Problems posted on Innocentive relate to R&D and generic product-development issues. Coding is usually the major part of the development.
These are two different organizations with different value propositions.
47. NETWORK EFFECT
More data scientists attract more clients.
More clients attract more data scientists.
First-mover advantages of internet platforms
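The two-sided network effect can be illustrated with a toy model in which each period's new clients are proportional to the number of data scientists, and each period's new data scientists are proportional to the number of clients. The coefficients a and b and the starting values are hypothetical; the sketch shows only that growth on each side accelerates growth on the other, which is the mechanism behind the first-mover advantage.

```python
from typing import List, Tuple


def simulate_network(clients: float, scientists: float, periods: int,
                     a: float = 0.001, b: float = 50.0) -> List[Tuple[float, float]]:
    # a: HYPOTHETICAL new clients attracted per data scientist per period
    # b: HYPOTHETICAL new data scientists attracted per client per period
    history = [(clients, scientists)]
    for _ in range(periods):
        # Both sides update simultaneously from the previous period's values.
        clients, scientists = clients + a * scientists, scientists + b * clients
        history.append((clients, scientists))
    return history


history = simulate_network(clients=10, scientists=1_000, periods=5)
for period, (c, s) in enumerate(history):
    print(period, round(c, 2), round(s))
```

A platform that starts with a larger base on either side gains more per period on the other side, so an early lead tends to widen rather than shrink.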
48. STRATEGIC PARTNERSHIP & COLLABORATION
Strong collaboration with big data companies and institutions: GE, Google, Facebook, Amazon, Walmart
Secure platform
49. BARRIER FOR ENTRY
Strengthening and establishing exclusive relationships with big data companies and world-class institutions will create a barrier for competitors entering the business.
The business model should be protected through patents or trade secrets.
50. IP MANAGEMENT
Kaggle has strong IP management.
IP-protected ranking software is used to choose the best model.
The ranking software is the key to appropriability.
Between the parties, Kaggle is the owner of all Intellectual Property Rights in and to the Website
The winning entry will be governed by a separate contract between the winner and the competition host.
All text, graphics, user interfaces, photographs, trademarks, logos and artwork, including the design, structure… licensed by or to Kaggle and is protected by applicable copyright, patent and trademark laws and various other intellectual property rights and unfair competition laws.
51. COMPLEMENTARY ASSETS
Job Opportunities
Data analysis courses and online support
Certificate/credit system: Kaggle could establish a credit system, similar to the leaderboard, that helps students gain admission to schools.
Complementary products such as T-shirts for non-profit competitions
53. 1. INTRODUCTION
“I keep saying the sexy job in the next ten years will be statisticians.”
Hal Varian
Google Chief Economist
2009
“Aim to make Data Science a Sport.”
Anthony Goldbloom
Kaggle Founder
2012