(HLS201) Using AWS and Data Science to Analyze Vaccine Yield | AWS re:Invent 2014


Producing vaccines is a significant and complex effort that spans manufacturing, biological materials, streaming data, and difficult computational challenges. In this session, speakers from Merck and Booz Allen Hamilton discuss how they partnered to apply AWS and data science techniques to pioneer new approaches for analyzing vaccine production yields. The solution they created combines a shared data lake service built on AWS services (such as Amazon EC2 and Amazon VPC) with Hadoop MapReduce, HDFS, Hive, and R to implement the data science infrastructure and the analysis that produced models of complex biological processes. As a result of this project, Merck analyzed 12 years of vaccine manufacturing data from 16 data sources, conducted over 15 billion calculations, and was recognized with the InformationWeek Elite Business Innovation Award for the innovative application of data science toward enhancing vaccine yield rates and saving lives.

Technology



1. Making a difference with data
   11.018.14
   Brian Keller, Data Science Lead, Booz Allen Hamilton
   Jerry Megaro, Director, Advanced Analytics and Innovation, Merck Manufacturing
   Nic Perez, Cloud Architecture Lead, Booz Allen Hamilton

2. -George W. Merck (1950)

4. The Rotavirus Vaccine Disconnect
   5.6 billion people in the world do not have access to our products
   90% of RotaTeq sales are in the USA
   [Map legend: * = sales for RotaTeq®; • = 1,000 deaths]
   1 Broor S, Ghosh D, Mathur P. Molecular epidemiology of rotaviruses in India. Indian J Med Res 2003; 118:59-67.

5. BUSINESS KNOWLEDGE

6. Parametric models -> Let the data tell the story
   Input/Output modeling -> Data experiments to enable discovery
   Avoid failure -> Failure is powerful… learn fast and adjust
   Narrow scope of analysis -> Ask bigger questions using atypical data

7. Human Insight + Actions; Data Management; Infrastructure; Machine Learning Free-Computation Alerting; Geographic Language Translation Entity Relationship Event Grab; Dense/Sparse; Structured Unstructured Streaming; Provisioning Deployment Monitoring Workflow; Streaming Analytics; Streaming indexes; Services (SOA); Analytics and Discovery; Views and Indexes; HDFS/Data Lake; Metadata Tagging; Data Sources; Infrastructure/Management; Visualization, Reporting, Dashboards, and Query Interface

9. Resulted in…

10. Resulted in…

11. Resulted in… Winner of InformationWeek Business Innovation Award

12.

13.

14. Similarity Matrix
    [Figure: batch-by-batch similarity matrix; both axes labeled Batch 1 through Batch 5 with "Increasing yield" arrows; scale: Similarity Score from (low) to (high)]
    Clustering in this region indicates parameter similarity is associated with high yield
    Clustering in this region indicates parameter similarity is associated with low yield
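The similarity-matrix idea on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions, not the presenters' method: the parameter vectors, the yield values, and the choice of cosine similarity are all hypothetical, and the slide does not say which similarity measure was used.

```python
import math

# Hypothetical process parameters per batch (values invented for illustration).
batches = {
    "Batch 1": [0.90, 0.10, 0.30],
    "Batch 2": [0.85, 0.15, 0.35],
    "Batch 3": [0.20, 0.80, 0.60],
    "Batch 4": [0.25, 0.75, 0.55],
    "Batch 5": [0.50, 0.50, 0.50],
}

# Hypothetical yields, used only to order the axes by increasing yield.
yields = {
    "Batch 1": 0.62,
    "Batch 2": 0.58,
    "Batch 3": 0.81,
    "Batch 4": 0.93,
    "Batch 5": 0.74,
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length parameter vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(params, order):
    """Square matrix of pairwise similarities, rows and columns in `order`."""
    return [
        [cosine_similarity(params[row], params[col]) for col in order]
        for row in order
    ]


# Sort batches by increasing yield so that blocks of similar batches
# line up with the low-yield and high-yield ends of each axis.
order = sorted(batches, key=yields.get)
matrix = similarity_matrix(batches, order)

for name, row in zip(order, matrix):
    print(name, " ".join(f"{value:.2f}" for value in row))
```

With the axes ordered by yield, a block of high similarity scores near one corner would suggest that batches at that end of the yield range share similar parameter settings.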

15. Lots of Data Experiments (And Failures) That Lead to Final Predictive Model…
    (moderate similarity)
    (high similarity)

16. Business Decision Makers, Researchers, External Partners

17. Redshift-Based Data Marts; Amazon EC2; Elastic MapReduce; Hadoop, Solr Search Solution; Legacy Enterprise RDS; AES Encrypted S3 Data Lake; VPC; Enterprise Active Directory; JAX-RS/Tomcat-Based REST Services on Elastic Beanstalk; Insights Angular, D3.js Web UI; Accelerated Reasoning; Security; Cell-Level Visibility, Life Science Informatics via Custom Solr Plug-ins; Flexible Data Processing Pipelines; Business Users; Data Scientists

18.

19. Reference Architecture – Privileged Identity Management

20.

21. Reference Architecture – Identity Analytics

22. – Monitor, identify, and alert on abnormal user activity
    – Govern administrative rights; policy-based enforcement
    – Hardened virtual appliance; do not allow direct RDP/SSH access to management/security appliances
    – IA has purview into every log (firewall/router logs, crypto logs, application logs, systemd logs, OS logs, SCCM, etc.)
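As a rough illustration of the first point on this slide (alerting on abnormal user activity), the sketch below flags users whose event count is far above the average. Everything here is an assumption for illustration: the user names, the event tuples, the `flag_abnormal_users` helper, and the z-score threshold are hypothetical, and the slides do not describe how the identity analytics component actually detects anomalies.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical (user, action) events, as might be parsed from aggregated logs.
events = (
    [("alice", "login")] * 3
    + [("bob", "login")] * 4
    + [("carol", "login")] * 5
    + [("dave", "login")] * 4
    + [("eve", "login")] * 40
)


def flag_abnormal_users(events, z_threshold=1.5):
    """Return users whose event count has a z-score above z_threshold."""
    counts = Counter(user for user, _action in events)
    values = list(counts.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Every user has the same count, so nothing stands out.
        return []
    return sorted(
        user for user, n in counts.items() if (n - mu) / sigma > z_threshold
    )


for user in flag_abnormal_users(events):
    print(f"ALERT: unusual activity volume for {user}")
```

A simple count-based z-score is only one possible signal; a real deployment would likely combine several log sources and more robust statistics.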

23. Reference Architecture – Cryptography

24.

25. exploredatascience.com
    github.com/booz-allen-hamilton

26. http://bit.ly/awsevals
