Notes on Machine Learning and Data-centric Startups
1. Jianqiang (Jay) Wang
Stitch Fix/Twitter/HP Labs
July 26, 2015
2. About me
B.S. in Management Science; Ph.D. in Statistics
Data scientist at Stitch Fix (retail recommendation)
Data scientist at Twitter (computational advertising algorithms)
HP Labs: business optimization (pricing and portfolio management, marketing)
Consulting:
SpotTrender (video pretesting)
Brilent (data science training, recruiter products)
Data-centric businesses (advertising, retail,...).
5. Sources of data
sold flag, survey ratings
Unstructured: feedback, request notes, style images
6. How should interact with algorithms to
Recommend clothes
Perform analytics
Medical diagnostics
Human-computer interaction
7. Data-centric startups
Jet.com: "Amazon killer"; subscription-based retail from Marc Lore (Diapers.com), $50/yr membership, 5-10% lower prices
Thumbtack: service provider referral (how to monetize?)
SpotTrender: pre-test video commercials
Sano: real-time news discovery from social networks (Twitter, Instagram, Weibo, VK, ...)
Common Crawl (non-profit): open repository of web crawl data, billions of pages each month
8. ML applications
Search engines
Computational advertising
Recommender systems
Adaptive websites (learn user preferences, personalize webpages)
Medical diagnosis
Human-computer interaction
Computational finance/stock market analysis
Computer vision, object recognition
Speech and handwriting recognition
Machine translation
Fraud detection (internet, credit card)
Game playing
Information retrieval
Natural language processing
14. Advertiser campaigns
Supply (platform users) vs demand (advertisers)
Creating your own campaign: choose an objective
Tweet engagement
Followers
App install
Website visits
Lead generation
15. Targeting
Targeting criteria
Keywords (tweet or tweet engagement)
Interests
Followers: (similar) followers of a handle
Tailored audiences
How to match users to targeting criteria
Interest/age prediction: we don’t ask users to state their interests or age explicitly; we infer them from the accounts they follow and what they tweet about.
Algorithm & analytics
Interest (NLP), age (classification)
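The slide does not say how users are matched to targeting criteria. The sketch below shows one simple way under stated assumptions: `HANDLE_INTERESTS` is a hypothetical table of interest labels for well-known handles, interests are inferred by counting how many followed handles carry each label (a stand-in for the NLP and classification models the slide mentions), and a user matches a campaign if any criterion holds (OR semantics, also an assumption).

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical lookup table: interest labels attached to well-known handles.
HANDLE_INTERESTS = {
    "@nasa": ["science"],
    "@espn": ["sports"],
    "@nba": ["sports", "basketball"],
    "@verge": ["technology"],
}


def infer_interests(followed: list[str], min_share: float = 0.5) -> set[str]:
    """Keep interests carried by at least `min_share` of the handles the user follows."""
    if not followed:
        return set()
    counts = Counter(
        interest for handle in followed for interest in HANDLE_INTERESTS.get(handle, [])
    )
    return {i for i, c in counts.items() if c / len(followed) >= min_share}


@dataclass
class Targeting:
    keywords: set[str] = field(default_factory=set)
    interests: set[str] = field(default_factory=set)
    follower_of: set[str] = field(default_factory=set)  # reach followers of these handles


def matches(t: Targeting, followed: list[str], recent_tweets: list[str]) -> bool:
    """OR semantics (an assumption): any satisfied criterion makes the user eligible."""
    words = {w.lower().strip("#.,!?") for tweet in recent_tweets for w in tweet.split()}
    if t.keywords & words:
        return True
    if t.interests & infer_interests(followed):
        return True
    # Exact follower match only; expanding to "similar" followers is not modeled here.
    return bool(t.follower_of & set(followed))
```

For a user following `@espn`, `@nba` and `@nasa`, only "sports" reaches the 50% share, so a sports-interest campaign matches while a science-interest one does not.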
16. Filtering ad candidates
Campaigns currently active with budget left
Same advertiser/tweet fatigue rules
How many times per week for the same user?
How to make such decisions?
Dismiss/block/spam filters
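The filtering rules above can be sketched as a chain of checks applied to each ad candidate. The weekly caps (`max_advertiser_per_week`, `max_tweet_per_week`) are hypothetical values, since the slide leaves "how many times per week" as an open question; the view-count dictionaries and all field names are assumptions, and spam filtering is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    campaign_id: str
    advertiser_id: str
    tweet_id: str
    active: bool
    budget_left: float  # remaining campaign budget, in currency units


def filter_candidates(
    candidates: list[Candidate],
    advertiser_views: dict[str, int],  # advertiser_id -> times this user saw it in the last 7 days
    tweet_views: dict[str, int],       # tweet_id -> times this user saw it in the last 7 days
    blocked_advertisers: set[str],
    dismissed_tweets: set[str],
    max_advertiser_per_week: int = 3,  # hypothetical fatigue caps
    max_tweet_per_week: int = 2,
) -> list[Candidate]:
    kept = []
    for c in candidates:
        if not c.active or c.budget_left <= 0:
            continue  # campaign paused/ended or budget exhausted
        if advertiser_views.get(c.advertiser_id, 0) >= max_advertiser_per_week:
            continue  # same-advertiser fatigue
        if tweet_views.get(c.tweet_id, 0) >= max_tweet_per_week:
            continue  # same-tweet fatigue
        if c.advertiser_id in blocked_advertisers or c.tweet_id in dismissed_tweets:
            continue  # user blocked the advertiser or dismissed the ad
        kept.append(c)
    return kept
```

Cheap checks (active, budget) run first so that the per-user lookups only touch candidates that could actually serve.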
17. Click through rate (CTR) prediction
How likely is the user to ...
Click on the url
Expand the image
Download the app
Online machine learning with 10k+ features
Features of the user request and the ad candidate
Request: user geo, user type, login frequency, interests, ...
Ad: advertiser vertical, popularity, tweet content
Model fitting & diagnostics
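The slide names online learning over 10k+ features but not the model. A common choice for CTR prediction, assumed here and not confirmed by the source, is logistic regression trained one impression at a time with SGD, using feature hashing so that request and ad features map into a fixed-size weight space. Feature names such as `vertical` and `geo` are hypothetical.

```python
import math
import zlib


class OnlineLogisticRegression:
    """Logistic regression updated per example with SGD on hashed sparse features."""

    def __init__(self, n_buckets: int = 2**20, lr: float = 0.1, l2: float = 1e-6):
        self.n = n_buckets
        self.lr = lr
        self.l2 = l2
        self.w: dict[int, float] = {}  # sparse weights: bucket -> weight
        self.bias = 0.0

    def _hash(self, key: str) -> int:
        # crc32 is stable across runs, unlike Python's salted str hash().
        return zlib.crc32(key.encode("utf-8")) % self.n

    def _featurize(self, features: dict) -> dict[int, float]:
        # Strings become one-hot "name=value" indicators; numbers keep their value.
        x: dict[int, float] = {}
        for name, value in features.items():
            key, v = (f"{name}={value}", 1.0) if isinstance(value, str) else (name, float(value))
            b = self._hash(key)
            x[b] = x.get(b, 0.0) + v
        return x

    def _prob(self, x: dict[int, float]) -> float:
        z = self.bias + sum(self.w.get(b, 0.0) * v for b, v in x.items())
        z = max(min(z, 35.0), -35.0)  # keep exp() in a safe range
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, features: dict) -> float:
        """Predicted engagement probability (pCTR) for one request/ad pair."""
        return self._prob(self._featurize(features))

    def update(self, features: dict, clicked: bool) -> None:
        """One SGD step on log loss for a single observed impression."""
        x = self._featurize(features)
        g = self._prob(x) - (1.0 if clicked else 0.0)  # d(log loss)/dz
        self.bias -= self.lr * g
        for b, v in x.items():
            w = self.w.get(b, 0.0)
            self.w[b] = w - self.lr * (g * v + self.l2 * w)
```

Hashing trades occasional collisions for constant memory and no feature dictionary, which suits a stream where new advertisers and tweets appear continuously.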
18. Ranking
Second-price auction on expected cost per impression (ECPI)
Advertisers bid for engagement (Bid)
Predicted engagement rate (pCTR)
Naïve ranking function: ECPI = Bid * pCTR
Pricing
Minimum bid required to win the auction
Winner has (bidCPE1, pCTR1), runner-up has (bidCPE2, pCTR2)
Winner pays paidCPE = bidCPE2 * pCTR2 / pCTR1, the smallest bid for which paidCPE * pCTR1 >= bidCPE2 * pCTR2
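The ranking and pricing rules can be checked with a small sketch: rank candidates by ECPI = bidCPE * pCTR, then charge the winner bidCPE2 * pCTR2 / pCTR1. The `reserve_cpe` floor for a lone bidder is a hypothetical parameter (assumed not to exceed the winner's bid); the slide does not discuss reserve prices.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdBid:
    ad_id: str
    bid_cpe: float  # advertiser's bid per engagement
    pctr: float     # predicted engagement rate for this request


def run_auction(bids: list[AdBid], reserve_cpe: float = 0.0) -> Optional[tuple[str, float]]:
    """Return (winning ad_id, price paid per engagement), or None if there are no bids."""
    if not bids:
        return None
    ranked = sorted(bids, key=lambda b: b.bid_cpe * b.pctr, reverse=True)  # by ECPI
    winner = ranked[0]
    if len(ranked) == 1:
        return winner.ad_id, reserve_cpe  # no runner-up: charge the hypothetical reserve
    runner_up = ranked[1]
    # Smallest CPE at which the winner's ECPI still matches the runner-up's ECPI.
    paid_cpe = runner_up.bid_cpe * runner_up.pctr / winner.pctr
    return winner.ad_id, max(paid_cpe, reserve_cpe)


bids = [AdBid("A", 2.00, 0.02), AdBid("B", 1.00, 0.05), AdBid("C", 3.00, 0.01)]
print(run_auction(bids))
```

Here B wins with ECPI 0.05 despite the lowest bid and pays 2.00 * 0.02 / 0.05 = 0.80 per engagement, below its 1.00 bid. Because the runner-up's ECPI never exceeds the winner's, paidCPE never exceeds bidCPE1.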