Next-generation applications address more sophisticated questions that go beyond 'What happened?' by using machine learning and statistical modeling to answer 'Why?' and 'What will happen next?' Data insights can be deployed easily and delivered rapidly to decision makers via cloud-based applications. This framework focuses on the technologies available for the entire data workflow, from ingestion and modeling to cloud deployment: Hadoop, MADlib, Python, R, CloudFoundry, etc. This presentation will also include examples of how this framework and innovative Data Science techniques have been applied across diverse business units within Media, including pricing analyses for ad optimization and predicting viewership.
Open Source Framework for Deploying Data Science Models and Cloud Based Applications by Noelle Sio of Pivotal
Pivotal Data Science Team
What should I do about it?
This is where Data Science comes in
What will happen next?
What Thought Leaders Have In Common
Large amounts of structured and
Deep personal knowledge of their
Quantified understanding of their
User experience optimized by data
Sales & Finance
Market Research &
Internal Data Sources
Typical External Sources Semi/Unstructured Data
Data Science Impact
Build Brand Equity
• Marketing Mix
Data Science Opportunities
• Affinity analysis
• Social media analytics
Example Use Case: Ratings Prediction
Use Case: Increase ratings across viewer
• Data: Viewership, transcripts and show data combined in big data platform
• Model: Machine learning used to identify the impact of production decisions on viewership
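As a rough illustration of the modeling step, the sketch below fits an ordinary least squares model that relates two production decisions to a rating. This is not the model from the talk: the feature names (`on_screen_argument`, `celebrity_guest`), the data values and the helper `fit_linear_model` are all hypothetical, and a real pipeline would use a library such as MADlib or an R/Python package rather than hand-rolled linear algebra.

```python
def fit_linear_model(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    # Prepend a constant 1.0 so the first coefficient is the intercept.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    n = len(X[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; cannot fit")
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


# Hypothetical episode segments: (on_screen_argument, celebrity_guest).
segments = [(0, 0), (1, 0), (0, 1), (1, 1), (0, 0), (1, 1)]
# Hypothetical ratings for those segments (made-up numbers).
ratings = [5.0, 3.5, 5.8, 4.3, 5.0, 4.3]

intercept, argument_effect, guest_effect = fit_linear_model(segments, ratings)
print(f"baseline rating:           {intercept:.2f}")
print(f"effect of on-screen fight: {argument_effect:+.2f}")
print(f"effect of celebrity guest: {guest_effect:+.2f}")
```

The sign and size of each coefficient is what turns the model into a statement about production decisions: a negative coefficient on a feature suggests that segments with that feature rate lower, all else being equal.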
Models Insights Actions
Models are built to
e.g. what makes viewers tune in and tune out?
interpret models for
e.g. On-screen arguments make viewers tune out
A good insight drives action that will generate value for stakeholders
Revisiting Rating Prediction Use Case
Model exposed to end users via cloud application allowing what-if scenario building
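A what-if scenario builder can be sketched as a function that scores a baseline and a modified scenario with the same model and reports the difference. The coefficients, feature names and the functions `predict_rating` and `what_if` below are hypothetical placeholders, not values or APIs from the talk; in a deployed application this logic would sit behind a web endpoint.

```python
# Hypothetical model parameters; in practice these come from a fitted model.
INTERCEPT = 5.0
EFFECTS = {"on_screen_argument": -1.5, "celebrity_guest": 0.8}


def predict_rating(scenario, intercept=INTERCEPT, effects=EFFECTS):
    """Score one scenario, given as a mapping of feature name to value."""
    rating = intercept
    for feature, value in scenario.items():
        if feature not in effects:
            raise KeyError(f"unknown feature: {feature}")
        rating += effects[feature] * value
    return rating


def what_if(baseline, changes):
    """Compare a baseline scenario with the same scenario after changes."""
    scenario = {**baseline, **changes}
    before = predict_rating(baseline)
    after = predict_rating(scenario)
    return {"baseline": before, "scenario": after, "change": after - before}


# What happens to the predicted rating if the on-screen argument is cut?
result = what_if(
    baseline={"on_screen_argument": 1, "celebrity_guest": 0},
    changes={"on_screen_argument": 0},
)
print(result)
```

Keeping the scoring function separate from the comparison makes it straightforward to swap in a different model without changing what the end user sees.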
Characteristics Of Actionable Insights
Benefits Of Cloud Based Applications
Service failure or data loss at scale
Poor experience at
with cloud based
Open Source Analytics Ecosystem
Media companies benefit from algorithmic breadth and scalability for building and socializing data science models
Best of breed in-memory and in-database tools for an MPP platform
Example Scalable Open Source Platform
Hadoop++: Complementing the Hadoop platform are Data Science modeling tools.
SQL on Hadoop (e.g. HAWQ), Python/R interfaces to SQL, Apache Spark etc.
Leading Media companies are moving towards a platform with Hadoop at the core.
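The idea behind a Python interface to SQL is that the heavy aggregation runs where the data lives and only a small summary returns to the client. The sketch below shows that pattern using Python's built-in `sqlite3` module purely as a stand-in, since a SQL-on-Hadoop engine such as HAWQ is not available in a self-contained example. The table `viewership`, its columns and its rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for a SQL-on-Hadoop engine.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE viewership (show_name TEXT, minute INTEGER, viewers INTEGER)"
)
# Hypothetical minute-by-minute audience counts.
conn.executemany(
    "INSERT INTO viewership VALUES (?, ?, ?)",
    [
        ("show_a", 1, 1000),
        ("show_a", 2, 900),
        ("show_b", 1, 500),
        ("show_b", 2, 700),
    ],
)


def average_viewers_by_show(connection):
    """Aggregate inside the database and return only the per-show summary."""
    rows = connection.execute(
        "SELECT show_name, AVG(viewers) FROM viewership "
        "GROUP BY show_name ORDER BY show_name"
    ).fetchall()
    return dict(rows)


averages = average_viewers_by_show(conn)
print(averages)
```

With a real MPP or Hadoop-backed engine the Python side would look much the same; only the connection and the SQL dialect would differ.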
Data Science Pipeline On Hadoop++
Open Source Framework For Ratings Prediction
+ unstructured data
Gather video ads
Message Broker Simulate Ad
Expanding The Framework To Include Impression
Measuring Audience Engagement: Workflow
(~55 million tweets/day)
Nightly Cron Jobs
• Blended data sets lead to richer models and more
• Turn Data Science models and insights into value-generating actions through data-driven applications.
• Open source = power and flexibility
• Platform extensibility is key to supporting Data Science
• Turnkey PaaS is available through CloudFoundry, including infrastructure monitoring, server configuration and scalability.