Open Source Framework for Deploying Data Science Models and Cloud Based Applications by Noelle Sio of Pivotal


Next-generation applications address more sophisticated questions that go beyond 'What happened?' by using machine learning and statistical modeling to answer 'Why?' and 'What will happen next?' Data insights can be deployed easily and delivered rapidly to decision makers via cloud-based applications. This framework covers technologies for the entire data workflow, from ingestion and modeling to cloud deployment: Hadoop, MADlib, Python, R, CloudFoundry, etc. The presentation also includes examples of how this framework and innovative data science techniques have been applied across diverse business units within Media, including pricing analyses for ad optimization and viewership prediction.

Published in: Technology

  1. 1. Open Source Framework for Deploying Data Science Models and Cloud Based Applications. Pivotal Data Science Team
  2. 2. What happened? What will happen next? What should I do about it? This is where Data Science comes in.
  3. 3. What Thought Leaders Have In Common • Large amounts of structured and unstructured data • Deep personal knowledge of their audience • Quantified understanding of their products • Data-driven culture • User experience optimized by data science
  4. 4. Internal Data Sources: Viewership, Advertisements, Merchandise, Sales & Finance. Typical External Sources: Market Research & Competitive Information, Audience Demographics. Semi/Unstructured Data: Clickstream, Social Media, Content.
  5. 5. Data Science Impact. Business Motivation: Increase Demand, Build Brand Equity, Increase Production Efficiency, Optimize Ad Spend Efficiency, Increase Customer Engagement. Data Science Opportunities: • Campaign Optimization • Marketing Mix Models • Customer segmentation • Affinity analysis • Social media analytics • Supply/Demand forecasting. Increase Revenue. Reduce Cost.
  6. 6. Example Use Case: Ratings Prediction. Use Case: Increase ratings across viewer demographics. How: • Data: Viewership, transcripts, and show data combined in a big data platform • Model: Machine learning used to identify the impact of production decisions on viewership. Insights
  7. 7. Models → Insights → Actions. Models are built to answer business questions, e.g. what makes viewers tune in and tune out? Data Scientists interpret models for answers, e.g. on-screen arguments make viewers tune out. Report, Dashboard, BI Tool, Email, Presentation, Cloud App. End User. A good insight drives action that will generate value for stakeholders.
  8. 8. Revisiting Rating Prediction Use Case. Model exposed to end users via a cloud application, allowing what-if scenario building.
  9. 9. Characteristics Of Actionable Insights: Real-time, Scalable, Social, Relevant, Accessible, Open
  10. 10. Benefits Of Cloud Based Applications. Service failure or data loss at scale; long innovation cycles; poor experience at scale. Resilient, scale-out messaging and processing; agile development with cloud based data services; low-latency, in-memory computing.
  11. 11. Open Source Analytics Ecosystem. Media companies benefit from algorithmic breadth and scalability for building and socializing data science models. MLlib, PL/X; Algorithms; Visualization. Best-of-breed in-memory and in-database tools for an MPP platform.
  12. 12. Example Scalable Open Source Platform. Hadoop++: Complementing the Hadoop platform are Data Science modeling tools: SQL on Hadoop (e.g. HAWQ), Python/R interfaces to SQL, Apache Spark, etc. Apps, Data, Analytics. Leading Media companies are moving towards a platform with Hadoop at the core.
  13. 13. Data Science Pipeline On Hadoop++. MLlib, PL/X. Data Lake (Hadoop++): Structured + Unstructured Data
  14. 14. Open Source Framework For Ratings Prediction. Data Lake (contains structured + unstructured data); MLlib, PL/X; Insights and Model Results; Ratings Predictions; Business Levers; What-if Scenario Application; Hosted on
  15. 15. Expanding The Framework To Include Impression Forecasting. Gather video ads impression stats; Data Lake; Ingest; Message Broker; Simulate Ad Server Behavior; Impression Forecasts; Business Levers; Business Metrics Dashboard; Modeling: MLlib, PL/X; Hosted on
  16. 16. Measuring Audience Engagement: Workflow. Twitter Decahose (~55 million tweets/day); Source: http, Sink: hdfs; HDFS; External Tables (PXF); Parallel Parsing of JSON (PL/Python); Nightly Cron Jobs; Topic Analysis through MADlib pLDA; Unsupervised Sentiment Analysis (PL/Python); Hosted on
  17. 17. Key Takeaways • Blended data sets lead to richer models and more valuable insights • Turn Data Science models and insights into value-generating actions through data-driven applications • Open source = power and flexibility • Platform extensibility is key to supporting Data Science • Turnkey PaaS is available through CloudFoundry, including infrastructure monitoring, server configuration, and scalability
  18. 18. THANK YOU!
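
The audience engagement workflow on slide 16 includes a step for parsing JSON tweets in parallel. The deck does not show the code for that step, so the following is only a minimal sketch of the idea in plain Python rather than the PL/Python used in the presentation. The function names `parse_tweet` and `parse_batch` are hypothetical, the field names `id_str`, `created_at`, and `text` are assumed to match the tweet payload, and the thread pool is a stand-in for the database's parallel execution.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def parse_tweet(line):
    """Parse one JSON-encoded tweet; return None for malformed or incomplete records."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    # Skip anything that is not a JSON object with a text field (assumed field name).
    if not isinstance(record, dict) or "text" not in record:
        return None
    return {
        "id": record.get("id_str"),
        "created_at": record.get("created_at"),
        "text": record["text"],
    }


def parse_batch(lines, workers=4):
    """Parse lines concurrently and drop the records that failed to parse."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parsed = pool.map(parse_tweet, lines)  # map preserves input order
        return [row for row in parsed if row is not None]


# Made-up sample input: two well-formed tweets and one corrupt line.
sample = [
    '{"id_str": "1", "created_at": "Mon Sep 01 12:00:00 +0000 2014", "text": "Great episode tonight"}',
    'not valid json',
    '{"id_str": "2", "created_at": "Mon Sep 01 12:05:00 +0000 2014", "text": "That finale was a letdown"}',
]
rows = parse_batch(sample)
```

In this sketch, malformed lines are dropped silently so that one corrupt record does not stop a nightly batch; a real pipeline might route them to a reject table instead. The parsed `text` values would then feed the topic and sentiment analysis steps named on the slide.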