2. Disclaimer
• The opinions expressed here are mine and in no way
represent the official position of LinkedIn
3. Example of user interaction
ts, user-id, <items shown at various slots>, <what was clicked?>, <what happened after the click?>
user-id: covariates; item-id: covariates; user-id: social connections
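As a rough illustration of the record layout above, one logged interaction and its side tables might be represented as follows. This is only a sketch: the class name, field names, and sample values are hypothetical and not taken from any real logging schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class InteractionEvent:
    """One logged user interaction (all field names are illustrative)."""
    ts: int                            # event timestamp, e.g. Unix seconds
    user_id: str
    items_shown: List[str]             # item ids, in slot order
    clicked_item: Optional[str]        # None if nothing was clicked
    post_click_action: Optional[str]   # what happened after the click, if anything

    def clicked_slot(self) -> Optional[int]:
        """Return the 0-based slot of the clicked item, or None if no click."""
        if self.clicked_item is None:
            return None
        return self.items_shown.index(self.clicked_item)


# Side tables keyed by id, mirroring the second line of the slide.
# The keys and values are made up for illustration.
user_covariates: Dict[str, Dict[str, str]] = {"u1": {"industry": "software"}}
item_covariates: Dict[str, Dict[str, str]] = {"i7": {"category": "news"}}
user_connections: Dict[str, List[str]] = {"u1": ["u2", "u3"]}

event = InteractionEvent(
    ts=1350000000,
    user_id="u1",
    items_shown=["i3", "i7", "i9"],
    clicked_item="i7",
    post_click_action="share",
)
```

Keeping covariates and connections in separate tables keyed by id, rather than repeating them in every event, matches the way the slide lists them apart from the event stream.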
4. Statistical Challenges
• Exploratory Data Analysis (EDA), Visualization
– Retrospective (on Terabytes)
– More Real Time (every few minutes/hours)
• Statistical Modeling
– Scale (computational challenge)
– Dimensionality (a few categorical variables, each with a
massive number of levels, interacting)
– Temporal Effects
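To make the "More Real Time (every few minutes/hours)" item concrete, here is a minimal sketch of one such summary: click-through rate per fixed time window. It assumes events arrive as (timestamp in seconds, clicked?) pairs; the function name and the five-minute default are illustrative choices, not something the slides specify.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def ctr_by_window(events: Iterable[Tuple[int, bool]],
                  window_seconds: int = 300) -> Dict[int, float]:
    """Map each window start time to clicks / impressions in that window."""
    views: Dict[int, int] = defaultdict(int)
    clicks: Dict[int, int] = defaultdict(int)
    for ts, clicked in events:
        start = ts - ts % window_seconds   # floor the timestamp to its window
        views[start] += 1
        clicks[start] += int(clicked)
    return {start: clicks[start] / views[start] for start in sorted(views)}


# Made-up sample: three impressions in the first window, two in the second.
sample = [(0, False), (10, True), (299, False), (300, True), (450, True)]
rates = ctr_by_window(sample)
```

The same single pass over the events works on a retrospective dump or on a small recent batch; only the size of the input changes.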
5. Statistical Challenges continued
• Experiments
– To test new methods and test hypotheses using
randomized experiments
– Adaptive experiments
• Forecasting
– Planning, advertising
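For the randomized-experiment item above, one common way to compare two variants is a two-proportion z-test on click-through rates. The sketch below is a textbook version using only the standard library; it is not presented as the method used in practice by the author, and the function and argument names are hypothetical.

```python
import math
from typing import Tuple


def two_proportion_z_test(clicks_a: int, views_a: int,
                          clicks_b: int, views_b: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for H0: both variants have the same CTR."""
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    # Pooled rate under the null hypothesis of no difference.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    # Assumes 0 < pooled < 1; otherwise the standard error is zero.
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (p_b - p_a) / se
    # Two-sided tail of the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: control A versus treatment B.
z, p = two_proportion_z_test(clicks_a=100, views_a=1000,
                             clicks_b=130, views_b=1000)
```

This fixed-sample test does not cover the adaptive case, where allocation changes as data arrive and a plain z-test at the end is no longer valid without adjustment.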
6. My 2 cents
• BD problems are complex and messy; they are inherently multi-disciplinary
• Having a clear idea of the underlying scientific problem is important
• Systems, Algorithms, Statistics, Machine Learning, Optimization,…
• Statisticians could consume the wonderful tools created by our friends
and develop the statistical aspects
– Learn Hadoop and Pig; they have become easy to learn (like R)
• Emphasis on areas like sampling, design of experiments (DOE), and scalable model fitting
• More collaborative programs between academia/industry,
academia/government
– E.g. training programs for students working with problem owners
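As one small example of the sampling emphasis mentioned above, reservoir sampling (Algorithm R) draws a uniform random sample of fixed size from a stream too large to hold in memory. This is a standard technique chosen here for illustration; the slides do not name a specific sampling method, and the function name is hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     seed: Optional[int] = None) -> List[T]:
    """Return k items drawn uniformly at random from a stream of unknown length."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)   # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


picked = reservoir_sample(range(1_000_000), k=5, seed=42)
```

The sample uses a single pass and O(k) memory, which is why this style of algorithm suits log data at the scale described in the earlier slides.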