© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Arnoud Otte, Assistant Director Cloud & Data Architecture, Cambia Health Solutions
Rich Uhl, CTO / Founder, 1Strategy
Ujjwal Ratan, Solutions Architect, AWS
November 28, 2016
HLC301
Data Science and Healthcare: Running Large Scale Analytics and Machine Learning on AWS
What to Expect from the Session
• Benefits from large-scale analytics with PHI - Arnoud
• Securing Amazon EMR & Elasticsearch - Rich
• Additional solution components for HIPAA compliance [demo] - Rich
• Reducing cost and improving quality of care with Amazon Machine Learning [demo] - Ujjwal
NOTE: This is a deep dive session on HOW rather than WHAT. We will show
implementation details.
• This session expects familiarity with:
• AWS services - EMR and S3
  - BDM401 - Deep Dive: Amazon EMR Best Practices & Design Patterns
  - BDA206 - Building Big Data Applications with the AWS Big Data Platform
• Encryption and distributed systems like Hadoop and Elasticsearch
Arnoud Otte
Assistant Director Cloud & Data Architecture
Arnoud.Otte@CambiaHealth.com
Cambia Health Solutions
Our Roots
Born from an inspired idea
Our Cause
Becoming catalysts
for transformation
Our Vision
Delivering a reimagined
health care experience
Requirements
HIPAA eligible
Scalable
Managed Service
Secure
Pay-as-we-go
Performance
Master Data Management
Data Science & Analytics
Architecture
[Diagram] Source: Cambia Data Center
[Diagram] Services: Amazon CloudWatch, AWS CloudTrail, AWS IAM, Amazon S3, Amazon DynamoDB, AWS Lambda, Amazon EMR, Amazon Elasticsearch Service, Amazon Redshift
[Diagram] Areas: Data Lake, Metadata, Security, Data Science & Analytics, Master Data Management
Master Data Management
Are these the same people?

Example 1     Source A       Source B
First Name    John           John
Last Name     Doe            Doe
DOB           1970-01-01     2016-11-28
Street        105 Main St    105 Main St
City          Portland       Portland
State         OR             OR
Answer: No. Father and son.

Example 2     Source A       Source B
First Name    Jillian        Jill
Last Name     Doe            Doe-Doe
SSN           123-45-6789    123-45-6789
Street        605 Oak Dr     105 Main Street
City          PDX            Portland
State         OR             Oregon
Answer: Yes. Married, changed name, and moved.

This is artificial data fabricated for illustration purposes only.
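The two examples show why names and addresses alone cannot decide a match. A minimal sketch of that idea follows, assuming a hypothetical `Person` record and a deliberately simple rule order (strong identifier first, then contradicting birth dates, then names). It is an illustration only, not the match-and-merge logic Cambia runs on Amazon EMR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    # Field names are hypothetical; they mirror the columns on the slide.
    first: str
    last: str
    street: str
    city: str
    state: str
    dob: Optional[str] = None
    ssn: Optional[str] = None


def same_person(a: Person, b: Person) -> bool:
    """Toy matching rule, assumed for illustration only."""
    # 1. A shared strong identifier decides the match on its own.
    if a.ssn and b.ssn:
        return a.ssn == b.ssn
    # 2. Contradicting birth dates rule a match out, even if names agree.
    if a.dob and b.dob and a.dob != b.dob:
        return False
    # 3. Otherwise fall back to a case-insensitive name comparison.
    return (a.first.lower(), a.last.lower()) == (b.first.lower(), b.last.lower())


john_a = Person("John", "Doe", "105 Main St", "Portland", "OR", dob="1970-01-01")
john_b = Person("John", "Doe", "105 Main St", "Portland", "OR", dob="2016-11-28")
jill_a = Person("Jillian", "Doe", "605 Oak Dr", "PDX", "OR", ssn="123-45-6789")
jill_b = Person("Jill", "Doe-Doe", "105 Main Street", "Portland", "Oregon", ssn="123-45-6789")

print(same_person(john_a, john_b))  # father and son
print(same_person(jill_a, jill_b))  # same person after marriage and a move
```

A production matcher would also normalize values such as "PDX" versus "Portland" and score partial agreement rather than return a hard yes or no.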
Master Data Management – Approach
Demographics
Laboratory
Pharmaceutics
Geography
Claims
Composite record of best values
Cambia Match and Merge on Amazon EMR
Master Data Management – Quality
Match Correctness: Vendor 98.50%, Cambia V1 99.90%, Cambia V1.1 99.99%
Match Completeness: Vendor 98.80%, Cambia V1 84.30%, Cambia V1.1 98.10%
Test set: 7,000+ records containing 1,600+ matches, manually checked and confirmed in the real world
Master Data Management – Performance
Run time: Vendor 2160 minutes (36 hours), Cambia V1 90 minutes, Cambia V1.1 40 minutes
Data set: 17.7M records containing 1.8M matches
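The run times translate into speedup and throughput figures with simple arithmetic. The sketch below assumes the chart values map to the legend in order (Vendor 2160 minutes, Cambia V1 90 minutes, Cambia V1.1 40 minutes) and that every run processed the same 17.7M records; neither assumption is stated explicitly on the slide.

```python
# Run times in minutes, as read from the slide (mapping to legend is assumed).
RUN_MINUTES = {"Vendor": 2160, "Cambia V1": 90, "Cambia V1.1": 40}
RECORDS = 17_700_000  # assumed to be the input size for every run


def speedup(baseline: str, candidate: str) -> float:
    """How many times faster the candidate finished than the baseline."""
    return RUN_MINUTES[baseline] / RUN_MINUTES[candidate]


def records_per_second(name: str) -> float:
    """Average throughput over the whole run."""
    return RECORDS / (RUN_MINUTES[name] * 60)


for name in ("Cambia V1", "Cambia V1.1"):
    print(f"{name}: {speedup('Vendor', name):.0f}x faster than Vendor, "
          f"{records_per_second(name):,.0f} records/s")
```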
Next Steps
• Scale in and out or up and down
• Build out healthcare data science models
• HIPAA compliant search on data
Services shown: Amazon Machine Learning, Amazon EMR, Amazon EC2
Security
Big Data
1Strategy.com | @1strategy_cloud | Booth #408
Rich Uhl
Founder & CTO
Rich@1Strategy.com
Definition of Terms
At Rest – when data is in a stored location
In Transit – when data is moved to and from storage
In Process – when data is in temporary space while it is being processed
Architecture
[Diagram] Source: Cambia Data Center
[Diagram] Services: Amazon CloudWatch, AWS CloudTrail, AWS IAM, Amazon S3, Amazon DynamoDB, AWS Lambda, Amazon EMR, Amazon Elasticsearch Service, Amazon Redshift
[Diagram] Areas: Data Lake, Metadata, Security, Data Science & Analytics, Master Data Management
Key Management
AWS KMS
Master Key, Encryption Keys, Exchanging Keys, Temporary Keys
Encryption at Rest
EMRFS on S3 – achieved via S3 client-side encryption with AWS KMS.
HDFS on EMR cluster – achieved via Hadoop Distributed File System (HDFS) transparent data encryption, as described in the Apache docs.
Config file: encrypted
Encryption at Rest - EMRFS on S3
{
  "Sid": "DenyUnEncryptedObjectUploads",
  "Effect": "Deny",
  "Principal": "*",
  "Action": "s3:PutObject",
  "Resource": "arn:aws:s3:::prd-datalake/*",
  "Condition": {
    "StringNotEquals": {
      "s3:x-amz-server-side-encryption": "AES256"
    }
  }
}
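The statement above is one element of a bucket policy. As a rough sketch, the Python below wraps it in a complete policy document and models, in a simplified way, which uploads the condition would reject. The `upload_is_denied` helper is a hypothetical local model for illustration; real evaluation happens inside S3 and IAM and considers far more than this one header.

```python
import json


def deny_unencrypted_uploads(bucket: str) -> dict:
    """Build a full bucket policy around the statement shown on the slide."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnEncryptedObjectUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {
                    "StringNotEquals": {
                        "s3:x-amz-server-side-encryption": "AES256"
                    }
                },
            }
        ],
    }


def upload_is_denied(policy: dict, request_headers: dict) -> bool:
    """Simplified model: StringNotEquals is true when the header is absent or different."""
    condition = policy["Statement"][0]["Condition"]["StringNotEquals"]
    for key, required in condition.items():
        header = key.split(":", 1)[1]  # "s3:x-amz-..." -> "x-amz-..."
        if request_headers.get(header) != required:
            return True
    return False


policy = deny_unencrypted_uploads("prd-datalake")
print(json.dumps(policy, indent=2))
print(upload_is_denied(policy, {}))
print(upload_is_denied(policy, {"x-amz-server-side-encryption": "AES256"}))
```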
Encryption at Rest - HDFS on EMR Cluster
Uses native Hadoop HDFS Transparent Data Encryption (DEK/EDEK)
• Data Encryption Key (DEK)
• Encrypted Data Encryption Key (EDEK)
• Hadoop KMS
• Bootstrap Script
Encryption at Rest - HDFS on EMR Cluster (Bootstrap Script)
{
  "Classification": "hdfs-site",
  "Properties": {
    "dfs.encryption.key.provider.uri": "kms://…",
    "dfs.namenode.name.dir": "file:///…",
    "dfs.name.dir": "/mnt/encrypted/…",
    "dfs.data.dir": "/mnt/encrypted/…",
    "dfs.datanode.data.dir": "file:///…"
  }
}
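Before launching a cluster it can help to confirm that a configuration list actually carries the encryption-related settings. The sketch below assumes the configuration is available as a Python list in the same Classification/Properties shape as the JSON above. The `missing_properties` helper and all sample values are hypothetical placeholders, not values from the deck.

```python
from typing import Dict, List

# Properties from the slide that the check looks for.
REQUIRED_HDFS = [
    "dfs.encryption.key.provider.uri",
    "dfs.namenode.name.dir",
    "dfs.datanode.data.dir",
]


def missing_properties(configurations: List[Dict], classification: str,
                       required: List[str]) -> List[str]:
    """Return the required property names that are absent or empty."""
    for block in configurations:
        if block.get("Classification") == classification:
            props = block.get("Properties", {})
            return [name for name in required if not props.get(name)]
    # The classification is not present at all, so everything is missing.
    return list(required)


# Hypothetical sample values, for illustration only.
sample = [
    {
        "Classification": "hdfs-site",
        "Properties": {
            "dfs.encryption.key.provider.uri": "kms://http@kms.example.internal:9600/kms",
            "dfs.namenode.name.dir": "file:///mnt/encrypted/namenode",
        },
    }
]

print(missing_properties(sample, "hdfs-site", REQUIRED_HDFS))
```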
Summary of Encryption at Rest
• EMRFS on S3
• HDFS on EMR Cluster
Encryption in Transit
• EMRFS on S3
• HDFS on EMR Cluster
Encryption in Transit
<configuration>
<!-- Client certificate Store -->
<property>
<name>ssl.client.keystore.type</name>
<value>jks</value>
</property>
<property>
<name>ssl.client.keystore.location</name>
<value>/etc/emr/security/ssl/keystore.jks</value>
</property>
<property>
<name>ssl.client.keystore.password</name>
<value>changeit</value>
</property>
<!-- Client Trust Store -->
<property>
<name>ssl.client.truststore.type</name>
<value>jks</value>
</property>
<property>
<name>ssl.client.truststore.location</name>
<value>/etc/emr/security/ssl/truststore.jks</value>
</property>
<property>
<name>ssl.client.truststore.password</name>
<value>changeit</value>
</property>
<property>
<name>ssl.client.truststore.reload.interval</name>
<value>10000</value>
</property>
</configuration>
Three areas to address
1. Hadoop RPC - Hadoop RPC is used by API clients of MapReduce
2. HDFS Data Transfer Protocol (DTP) - with HDFS transparent encryption enabled, this traffic is encrypted automatically
3. Hadoop MapReduce Shuffle - MapReduce shuffles and sorts the output of each map task to reducers on different nodes
Encryption in Transit - Cluster
Hadoop RPC - Hadoop RPC is used by API clients of MapReduce
[Diagram] RPC client, EMR Cluster, EMRFS on S3
<property>
<name>hadoop.security.service.user.name.key</name>
<value></value>
<description>
For those cases where the same RPC protocol is implemented by multiple
servers, this configuration is required for specifying the principal
name to use for the service when the client wishes to make an RPC call.
</description>
</property>
<property>
<name>hadoop.rpc.protection</name>
<value>authentication</value> <!-- default value; set to privacy to encrypt RPC traffic -->
<description>A comma-separated list of protection values for secured sasl
connections. Possible values are authentication, integrity and privacy.
authentication means authentication only and no integrity or privacy;
integrity implies authentication and integrity are enabled; and privacy
implies all of authentication, integrity and privacy are enabled.
hadoop.security.saslproperties.resolver.class can be used to override
the hadoop.rpc.protection for a connection at the server side.
</description>
</property>
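Because `authentication` is the default and only `privacy` encrypts RPC traffic, a quick check of the effective setting is useful. The sketch below parses a Hadoop-style XML configuration with the Python standard library and reports whether privacy is the only permitted protection level. The sample documents and helper names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def read_properties(xml_text: str) -> Dict[str, str]:
    """Collect name/value pairs from a Hadoop <configuration> document."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name"): (prop.findtext("value") or "")
        for prop in root.iter("property")
    }


def rpc_requires_privacy(props: Dict[str, str]) -> bool:
    """True only when every listed protection level is 'privacy'."""
    raw = props.get("hadoop.rpc.protection", "authentication")
    levels = {level.strip() for level in raw.split(",")}
    return levels == {"privacy"}


DEFAULT_CONF = """
<configuration>
  <property>
    <name>hadoop.rpc.protection</name>
    <value>authentication</value>
  </property>
</configuration>
"""

ENCRYPTED_CONF = DEFAULT_CONF.replace("authentication", "privacy")

print(rpc_requires_privacy(read_properties(DEFAULT_CONF)))
print(rpc_requires_privacy(read_properties(ENCRYPTED_CONF)))
```

The check is intentionally strict: a comma-separated list that still includes a weaker level is treated as not enforcing encryption.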
Encryption in Transit - Cluster
HDFS Data Transfer Protocol (DTP) – with HDFS transparent encryption enabled, this traffic is encrypted automatically.
Key components: Data Encryption Key (DEK), Encrypted Data Encryption Key (EDEK), Hadoop KMS
<property>
<name>dfs.encrypt.data.transfer</name>
<value>true</value>
<description>
Whether or not actual block data that is read/written from/to HDFS should
be encrypted on the wire. This only needs to be set on the NN and DNs,
clients will deduce this automatically. It is possible to override this setting
per connection by specifying custom logic via dfs.trustedchannel.resolver.class.
</description>
</property>
<property>
<name>dfs.encrypt.data.transfer.algorithm</name>
<value></value>
<description>
This value may be set to either "3des" or "rc4". If nothing is set, then
the configured JCE default on the system is used (usually 3DES.) It is
widely believed that 3DES is more cryptographically secure, but RC4 is
substantially faster.
</description>
</property>
HDFS Data Transfer Protocol (DTP) encryption is configured on startup with a bootstrap script.
Encryption in Transit - Cluster
Hadoop Encrypted Shuffle and Sort
Hadoop MapReduce Shuffle - In the shuffle phase, Hadoop MapReduce (MRv2) shuffles the output of each map task to reducers on different nodes using HTTP by default.
{
  "Classification": "mapred-site",
  "Properties": {
    "mapreduce.shuffle.ssl.enabled": "true",
    "mapred.local.dir": "/mnt/encrypted/mapred,/mnt1/encrypted/mapred",
    "mapreduce.cluster.local.dir": "/mnt/encrypted/mapred,/mnt1/encrypted/mapred",
    "mapreduce.application.classpath": "$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,/usr/lib/hadoop-lzo/lib/*,/usr/share/aws/emr/emrfs/conf,/usr/share/aws/emr/emrfs/lib/*,/usr/share/aws/emr/emrfs/auxlib/*,/usr/share/aws/emr/lib/*,/usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar,/usr/share/aws/emr/goodies/lib/emr-hadoop-goodies.jar,/usr/share/aws/emr/kinesis/lib/emr-kinesis-hadoop.jar,/usr/share/aws/emr/cloudwatch-sink/lib/*,/etc/emr/security/conf"
  }
}
Encryption in Transit - Cluster
Spark block transfer service – this can be encrypted using SASL encryption in Spark 1.5.1 and later.
{
  "Classification": "spark-defaults",
  "Properties": {
    "spark.authenticate.enableSaslEncryption": "true",
    "spark.network.sasl.serverAlwaysEncrypt": "true"
  }
}
Encryption in Transit
Encryption in Process
• Temporary Space on EBS Volumes
• Temporary Keys
• Bootstrap Script
Encryption in Process
Bootstrap Script
# Encrypts a block device with LUKS and remounts it at the given directory.
# Expects $PWD_FILE to point at the key file before it is called.
function encrypt_disk() {
  local dev="$1"    # block device to encrypt
  local dir="$2"    # mount point
  local cryptname="crypt_${dir:1}"    # mapper name: mount point without the leading slash
  # Unmount the drive
  sudo umount "$dev"
  # Encrypt the drive
  sudo cryptsetup luksFormat -q --key-file "$PWD_FILE" "$dev"
  sudo cryptsetup luksOpen -q --key-file "$PWD_FILE" "$dev" "$cryptname"
  # Format the drive
  sudo mkfs -t xfs "/dev/mapper/$cryptname"
  sudo mount -o defaults,noatime,inode64 "/dev/mapper/$cryptname" "$dir"
  sudo rm -rf "$dir/lost+found"
  sudo mkdir -p "$dir/encrypted"
  sudo chown -R hadoop:hadoop "$dir"
  # Persist the mount and the LUKS mapping
  echo "/dev/mapper/$cryptname $dir xfs defaults,noatime,inode64 0 0" |
    sudo tee -a /etc/fstab
  echo "$cryptname $dev $PWD_FILE" | sudo tee -a /etc/crypttab
}
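The function above reads its LUKS key from the file named by `$PWD_FILE`, but the deck does not show how that file is produced. One possible way to create a random, owner-only key file is sketched below; the file name, key length, and the choice of Python are all assumptions made for illustration, not part of the presented bootstrap script.

```python
import os
import secrets
import tempfile


def write_key_file(directory: str, num_bytes: int = 64) -> str:
    """Write a random key to a new file readable only by its owner."""
    path = os.path.join(directory, "disk.key")  # hypothetical file name
    # O_EXCL refuses to overwrite an existing key; 0o600 keeps it owner-only.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(secrets.token_bytes(num_bytes))
    return path


with tempfile.TemporaryDirectory() as workdir:
    key_path = write_key_file(workdir)
    key_size = os.path.getsize(key_path)
    print(key_path, key_size)
```

Because the key exists only for the life of the instance, data on the volume becomes unreadable once the cluster is terminated and the key is gone.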
Summary of the EMR Encryption Process
At rest: EMRFS on S3, HDFS on EMR cluster
In transit: RPC, Hadoop encrypted shuffle and sort, native DTP
In process: temporary space on EBS volumes
EMR Updates
1Strategy blog links
amzn.to/2g0JJIN
September 21st, 2016
bit.ly/1strategy_emr
AWS EMR Encryption Documentation
EMR Updates and how they play into this
Temporary
Space on EBS
Volumes
Elasticsearch for Healthcare
Encryption and Authentication
Elasticsearch on EC2 Instances
Elasticsearch Encryption Process Summary
• EMRFS on S3
• Temporary Space on EBS Volumes
• Elasticsearch on EC2 Instances
HIPAA is more than encryption
Auditing & custom tools:
• Audit script to show that only a limited set of users has access to encrypted S3 data
• Show that S3 buckets are encrypted
• Show that S3 objects are encrypted
*Working with Cambia to open source these tools
bit.ly/1strategy_emr_code
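As a rough idea of what an object-level audit check can look like, the sketch below scans object metadata and lists keys with no server-side encryption recorded. It assumes the metadata has already been fetched into dictionaries shaped like S3 HeadObject responses, where a `ServerSideEncryption` field is present for encrypted objects. The object keys and the helper name are hypothetical, and this is not the tooling referenced in the link above.

```python
from typing import Dict, List


def unencrypted_keys(objects: Dict[str, Dict]) -> List[str]:
    """Return object keys whose metadata shows no server-side encryption."""
    return sorted(
        key for key, meta in objects.items()
        if not meta.get("ServerSideEncryption")
    )


# Hypothetical metadata, shaped like HeadObject responses.
inventory = {
    "claims/2016/part-0000.csv": {"ServerSideEncryption": "AES256"},
    "claims/2016/part-0001.csv": {},
    "labs/2016/part-0000.csv": {"ServerSideEncryption": "aws:kms"},
}

print(unencrypted_keys(inventory))
```

Objects protected only by client-side encryption would need a different check, since they do not carry this field.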
Demo
Ujjwal Ratan
Solutions Architect, AWS
Ujjwalr@Amazon.com
Machine Learning inside Healthcare
Analyzing Medical Images
Prescription Compliance Prediction
Evidence Based & Precision Medicine
Text classification and mining
Medicare and Medicaid Fraud
Hospital Bed Utilization
Treatment Queries and Suggestions
Drug Discovery and Clinical Trials
Population Health
Vaccination and Immunization
Omics and Clinical Data Integration
Patient Outcomes
Patient Readmission
Prediction through risk
stratification
Real World Problem – Hospital Readmissions
• Hospital Readmission Reduction
Program (HRRP) part of the Affordable
Care Act.
• Centers for Medicare & Medicaid
Services (CMS) required to reduce
payments to hospitals with excess
readmissions.
• Not all readmissions can be prevented
• Facilities with high readmission rates
had their Medicare payment cut by 1%
in 2013 which rose to 2% in 2014.
Source - www.ncbi.nlm.nih.gov/pmc/articles/PMC3558794
Our Focus
Utilizing AWS For Machine Learning (ML)
Continuum of Machine Learning Solutions
Amazon Machine Learning
• Limited ML options
  • Binary
  • Multiclass
  • Regression
• Simple to train
• Easy to evaluate
• Quick to deploy
Amazon EMR + Spark ML
• Comprehensive ML options
• Requires work to train
• No support for evaluation
• Additional work to deploy
• Scalable
• Customizable
Introducing Amazon Machine Learning (AML)
• Easy to use, managed machine learning
service built for developers
• Robust, powerful machine learning
technology based on Amazon’s internal
systems
• Use your data already stored in the
AWS cloud
• Models in production within seconds
Machine Learning
Proactive Prediction of Readmission
Inputs: Patient Demographics, Patient History, Admission Attributes, Other features
Output per patient: High Risk Patient, Moderate Risk Patient, or Low Risk Patient
AML Application for Predicting Readmissions
[Architecture diagram, steps 1-5] CSV Files, Amazon S3, Amazon Redshift, Amazon Machine Learning, Amazon Cognito, S3 Static Website, Internet, users
Clinical Data Set
https://archive.ics.uci.edu/ml/datasets/Diabetes+130-US+hospitals+for+years+1999-2008
• 101,766 rows
• 10 years of clinical care
• 130 US hospitals
• 50+ attributes of diabetes patients and hospital outcomes
Ingesting Data into S3 - Staging
Table Name                   Table Type
admission_source.csv         Master
admission_type.csv           Master
discharge_disposition.csv    Master
Diabetic_data.csv            Transaction
aws s3 cp /tmp/foo/ s3://bucket/ --recursive
Schema in Redshift
-- Dimension tables (Dim1, Dim2, Dim3)
create table admission_type (
  admission_type_id INTEGER NOT NULL,
  description VARCHAR(100)
);
create table discharge_disposition (
  discharge_disposition_id INTEGER NOT NULL,
  description VARCHAR(500)
);
create table admission_source (
  admission_source_id INTEGER NOT NULL,
  description VARCHAR(500)
);
-- Fact table
create table diabetes_data (
  -- ~50 attributes
);
Data Load and Standardization
Data Load
COPY <Redshift_Table_Name> FROM 's3://<file_path.csv>' CREDENTIALS
'aws_access_key_id=<>;aws_secret_access_key=<>' DELIMITER ',' IGNOREHEADER 1;
Data Standardization
• Update NULL values
• Change attribute values that do not comply with standard patterns.
  • ex: Phone = (206) XXX-XXXX
• Complete geographical data where possible
• Include timeline values if possible
• Group granular attributes in sets.
  • ex: Ages 0 to 20 as youth, 20 to 40 as adult and so on.
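Grouping a granular attribute into sets can be as small as one function. The sketch below follows the age example, assuming half-open ranges (20 counts as adult) and a hypothetical "older" label for everything from 40 upward, since the slide only names the first two groups.

```python
def age_group(age: int) -> str:
    """Map an age in years to a coarse group; boundaries are assumptions."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age < 20:
        return "youth"
    if age < 40:
        return "adult"
    return "older"  # hypothetical label; the slide says only "and so on"


for sample_age in (5, 20, 39, 40, 72):
    print(sample_age, age_group(sample_age))
```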
Create AML Data Source with Redshift
CreateDataSourceFromRedshift API
Console
Real-time Predictions Using API
• Synchronous, low-latency, high-throughput prediction generation
• Request through service API or server or mobile SDKs
• Best for interactive applications that deal with individual data records
>>> import boto
>>> ml = boto.connect_machinelearning()
>>> ml.predict(
...     ml_model_id='my_model',
...     predict_endpoint='example_endpoint',
...     record={'key1': 'value1', 'key2': 'value2'})
{
  'Prediction': {
    'predictedValue': 13.284348,
    'details': {
      'Algorithm': 'SGD',
      'PredictiveModelType': 'REGRESSION'
    }
  }
}
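The Predict API takes the record as a map of attribute names to string values, so numeric fields from a database row need converting before the call. A minimal sketch of that preparation step follows; the helper name and the sample attribute names are illustrative assumptions, and dropping missing values instead of sending them is a design choice made here, not something the deck prescribes.

```python
from typing import Any, Dict


def to_aml_record(row: Dict[str, Any]) -> Dict[str, str]:
    """Convert a row to string values, leaving out attributes with no value."""
    return {
        str(name): str(value)
        for name, value in row.items()
        if value is not None
    }


# Hypothetical row; attribute names are for illustration only.
row = {"age": 54, "num_medications": 12, "race": None, "gender": "Female"}
record = to_aml_record(row)
print(record)
```

The resulting dictionary can be passed as the `record` argument in the call shown above.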
Application Website Hosted on S3
var machinelearning = new AWS.MachineLearning({apiVersion: '2014-12-12'});
var params = {
  MLModelId: '<AML Model ID>',
  PredictEndpoint: '<AML Model Real Time End Point>',
  Record: <Selected Attributes record set>
};
var request = machinelearning.predict(params);
The application calls the Predict() API with the necessary parameters.
Hosting the website in S3 without web servers eliminates the complexity of
scaling hardware based on the traffic routed to your application.
bit.ly/aml_demo - Demo
bit.ly/hcl301_blog - Blog
Expanded Architecture
[Architecture diagram] Components: Corporate Data Center, Amazon S3 (Data Lake), Amazon EMR, Amazon Redshift, Amazon Machine Learning, Amazon EC2, Amazon RDS, Amazon QuickSight, Internet, users
Inputs: DB schemas, CSV files, unstructured files
• Process unstructured and semi-structured data
• Make data suitable to act as an ML data source
• An ML model is created with Redshift as the data source
• EC2 as a front end for the AML endpoint
• Batch predictions are generated and stored in S3
• An RDS schema acts as a source for QuickSight
• QuickSight generates BI reports on prediction data
Thank you!
Join us tonight at the Health Care happy hour
sponsored by Cambia Health Solutions,
8KMiles.com and AWS at:
Japonais restaurant in the Mirage
on Monday 11/28 from 6-8 PM
AWS and Cambia are co-presenting:
SEC305 – Scaling Security Resources for
Your First 10 Million Customers
Tuesday, Nov 29, 12:30 PM - 1:30 PM
Do you want to know
more about how to
secure health data?
Remember to complete
your evaluations!

More Related Content

What's hot

The New Normal - AWSome Day Zurich 112016
The New Normal - AWSome Day Zurich 112016The New Normal - AWSome Day Zurich 112016
The New Normal - AWSome Day Zurich 112016Amazon Web Services
 
AWS Big Data and Analytics Services Speed Innovation | AWS Public Sector Summ...
AWS Big Data and Analytics Services Speed Innovation | AWS Public Sector Summ...AWS Big Data and Analytics Services Speed Innovation | AWS Public Sector Summ...
AWS Big Data and Analytics Services Speed Innovation | AWS Public Sector Summ...Amazon Web Services
 
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...Amazon Web Services
 
Building Serverless Web Applications - DevDay Los Angeles 2017
Building Serverless Web Applications - DevDay Los Angeles 2017Building Serverless Web Applications - DevDay Los Angeles 2017
Building Serverless Web Applications - DevDay Los Angeles 2017Amazon Web Services
 
ENT314 Automate Best Practices and Operational Health for Your AWS Resources
ENT314 Automate Best Practices and Operational Health for Your AWS ResourcesENT314 Automate Best Practices and Operational Health for Your AWS Resources
ENT314 Automate Best Practices and Operational Health for Your AWS ResourcesAmazon Web Services
 
Building your First Big Data Application on AWS
Building your First Big Data Application on AWSBuilding your First Big Data Application on AWS
Building your First Big Data Application on AWSAmazon Web Services
 
What's New with Big Data Analytics
What's New with Big Data AnalyticsWhat's New with Big Data Analytics
What's New with Big Data AnalyticsAmazon Web Services
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudAmazon Web Services
 
Structured, Unstructured and Streaming Big Data on the AWS
Structured, Unstructured and Streaming Big Data on the AWSStructured, Unstructured and Streaming Big Data on the AWS
Structured, Unstructured and Streaming Big Data on the AWSAmazon Web Services
 
How EIS Reduced Costs by 20% and Optimized SAP by Leveraging the Cloud PPT
How EIS Reduced Costs by 20% and Optimized SAP by Leveraging the Cloud PPTHow EIS Reduced Costs by 20% and Optimized SAP by Leveraging the Cloud PPT
How EIS Reduced Costs by 20% and Optimized SAP by Leveraging the Cloud PPTAmazon Web Services
 
Full Stack Analytics on AWS - AWS Summit Cape Town 2017
Full Stack Analytics on AWS - AWS Summit Cape Town 2017 Full Stack Analytics on AWS - AWS Summit Cape Town 2017
Full Stack Analytics on AWS - AWS Summit Cape Town 2017 Amazon Web Services
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924Amazon Web Services
 
February 2016 Webinar Series - 451 Research and AWS
February 2016 Webinar Series - 451 Research and AWSFebruary 2016 Webinar Series - 451 Research and AWS
February 2016 Webinar Series - 451 Research and AWSAmazon Web Services
 
Big Data on AWS - Toronto FSI Symposium - October 2016
Big Data on AWS - Toronto FSI Symposium - October 2016Big Data on AWS - Toronto FSI Symposium - October 2016
Big Data on AWS - Toronto FSI Symposium - October 2016Amazon Web Services
 
Building a Big Data & Analytics Platform using AWS
Building a Big Data & Analytics Platform using AWS Building a Big Data & Analytics Platform using AWS
Building a Big Data & Analytics Platform using AWS Amazon Web Services
 
ENT316 Keeping Pace With The Cloud: Managing and Optimizing as You Scale
ENT316 Keeping Pace With The Cloud: Managing and Optimizing as You ScaleENT316 Keeping Pace With The Cloud: Managing and Optimizing as You Scale
ENT316 Keeping Pace With The Cloud: Managing and Optimizing as You ScaleAmazon Web Services
 
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017Amazon Web Services
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
Partner webinar presentation aws pebble_treasure_data
Partner webinar presentation aws pebble_treasure_dataPartner webinar presentation aws pebble_treasure_data
Partner webinar presentation aws pebble_treasure_dataTreasure Data, Inc.
 

What's hot (20)

The New Normal - AWSome Day Zurich 112016
The New Normal - AWSome Day Zurich 112016The New Normal - AWSome Day Zurich 112016
The New Normal - AWSome Day Zurich 112016
 
AWS Big Data and Analytics Services Speed Innovation | AWS Public Sector Summ...
AWS Big Data and Analytics Services Speed Innovation | AWS Public Sector Summ...AWS Big Data and Analytics Services Speed Innovation | AWS Public Sector Summ...
AWS Big Data and Analytics Services Speed Innovation | AWS Public Sector Summ...
 
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...
 
Building Serverless Web Applications - DevDay Los Angeles 2017
Building Serverless Web Applications - DevDay Los Angeles 2017Building Serverless Web Applications - DevDay Los Angeles 2017
Building Serverless Web Applications - DevDay Los Angeles 2017
 
Securing Your Big Data on AWS
Securing Your Big Data on AWSSecuring Your Big Data on AWS
Securing Your Big Data on AWS
 
ENT314 Automate Best Practices and Operational Health for Your AWS Resources
ENT314 Automate Best Practices and Operational Health for Your AWS ResourcesENT314 Automate Best Practices and Operational Health for Your AWS Resources
ENT314 Automate Best Practices and Operational Health for Your AWS Resources
 
Building your First Big Data Application on AWS
Building your First Big Data Application on AWSBuilding your First Big Data Application on AWS
Building your First Big Data Application on AWS
 
What's New with Big Data Analytics
What's New with Big Data AnalyticsWhat's New with Big Data Analytics
What's New with Big Data Analytics
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS Cloud
 
Structured, Unstructured and Streaming Big Data on the AWS
Structured, Unstructured and Streaming Big Data on the AWSStructured, Unstructured and Streaming Big Data on the AWS
Structured, Unstructured and Streaming Big Data on the AWS
 
How EIS Reduced Costs by 20% and Optimized SAP by Leveraging the Cloud PPT
How EIS Reduced Costs by 20% and Optimized SAP by Leveraging the Cloud PPTHow EIS Reduced Costs by 20% and Optimized SAP by Leveraging the Cloud PPT
How EIS Reduced Costs by 20% and Optimized SAP by Leveraging the Cloud PPT
 
Full Stack Analytics on AWS - AWS Summit Cape Town 2017
Full Stack Analytics on AWS - AWS Summit Cape Town 2017 Full Stack Analytics on AWS - AWS Summit Cape Town 2017
Full Stack Analytics on AWS - AWS Summit Cape Town 2017
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
 
February 2016 Webinar Series - 451 Research and AWS
February 2016 Webinar Series - 451 Research and AWSFebruary 2016 Webinar Series - 451 Research and AWS
February 2016 Webinar Series - 451 Research and AWS
 
Big Data on AWS - Toronto FSI Symposium - October 2016
Big Data on AWS - Toronto FSI Symposium - October 2016Big Data on AWS - Toronto FSI Symposium - October 2016
Big Data on AWS - Toronto FSI Symposium - October 2016
 
Building a Big Data & Analytics Platform using AWS
Building a Big Data & Analytics Platform using AWS Building a Big Data & Analytics Platform using AWS
Building a Big Data & Analytics Platform using AWS
 
ENT316 Keeping Pace With The Cloud: Managing and Optimizing as You Scale
ENT316 Keeping Pace With The Cloud: Managing and Optimizing as You ScaleENT316 Keeping Pace With The Cloud: Managing and Optimizing as You Scale
ENT316 Keeping Pace With The Cloud: Managing and Optimizing as You Scale
 
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
 
Partner webinar presentation aws pebble_treasure_data
Partner webinar presentation aws pebble_treasure_dataPartner webinar presentation aws pebble_treasure_data
Partner webinar presentation aws pebble_treasure_data
 

Viewers also liked

Splunk for Enterprise Security featuring User Behavior Analytics
Splunk for Enterprise Security featuring User Behavior AnalyticsSplunk for Enterprise Security featuring User Behavior Analytics
Splunk for Enterprise Security featuring User Behavior AnalyticsSplunk
 
AWS re:Invent 2016: FINRA: Building a Secure Data Science Platform on AWS (BD...
AWS re:Invent 2016: FINRA: Building a Secure Data Science Platform on AWS (BD...AWS re:Invent 2016: FINRA: Building a Secure Data Science Platform on AWS (BD...
AWS re:Invent 2016: FINRA: Building a Secure Data Science Platform on AWS (BD...Amazon Web Services
 
AWS re:Invent 2016: Case Study: Data-Heavy Healthcare: UPMCe’s Transformative...
AWS re:Invent 2016: Case Study: Data-Heavy Healthcare: UPMCe’s Transformative...AWS re:Invent 2016: Case Study: Data-Heavy Healthcare: UPMCe’s Transformative...
AWS re:Invent 2016: Case Study: Data-Heavy Healthcare: UPMCe’s Transformative...Amazon Web Services
 
Log Mining: Beyond Log Analysis
Log Mining: Beyond Log AnalysisLog Mining: Beyond Log Analysis
Log Mining: Beyond Log AnalysisAnton Chuvakin
 
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...Amazon Web Services
 
AWS re:Invent 2016: Security Automation: Spend Less Time Securing Your Applic...
AWS re:Invent 2016: Security Automation: Spend Less Time Securing Your Applic...AWS re:Invent 2016: Security Automation: Spend Less Time Securing Your Applic...
AWS re:Invent 2016: Security Automation: Spend Less Time Securing Your Applic...Amazon Web Services
 
Cloud Connect 2013- Lock Stock and x Smoking EC2's
Cloud Connect 2013- Lock Stock and x Smoking EC2'sCloud Connect 2013- Lock Stock and x Smoking EC2's
Cloud Connect 2013- Lock Stock and x Smoking EC2'sHarish Ganesan
 
A Fast and Dirty Intro to NetworkX (and D3)
A Fast and Dirty Intro to NetworkX (and D3)A Fast and Dirty Intro to NetworkX (and D3)
A Fast and Dirty Intro to NetworkX (and D3)Lynn Cherny
 
AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)
AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)
AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)Amazon Web Services
 
AWS re:Invent 2016: Automating and Scaling Infrastructure Administration with...
AWS re:Invent 2016: Automating and Scaling Infrastructure Administration with...AWS re:Invent 2016: Automating and Scaling Infrastructure Administration with...
AWS re:Invent 2016: Automating and Scaling Infrastructure Administration with...Amazon Web Services
 
AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...
AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...
AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...Amazon Web Services
 
AWS re:Invent 2016: Datapipe Open Source: Image Development Pipeline (ARC319)
AWS re:Invent 2016: Datapipe Open Source:  Image Development Pipeline (ARC319)AWS re:Invent 2016: Datapipe Open Source:  Image Development Pipeline (ARC319)
AWS re:Invent 2016: Datapipe Open Source: Image Development Pipeline (ARC319)Amazon Web Services
 
AWS re:Invent 2016: Leveraging Amazon Machine Learning, Amazon Redshift, and ...
AWS re:Invent 2016: Leveraging Amazon Machine Learning, Amazon Redshift, and ...AWS re:Invent 2016: Leveraging Amazon Machine Learning, Amazon Redshift, and ...
AWS re:Invent 2016: Leveraging Amazon Machine Learning, Amazon Redshift, and ...Amazon Web Services
 
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...Amazon Web Services
 
AWS re:Invent 2016: Big Data Mini Con State of the Union (BDM205)
AWS re:Invent 2016: Big Data Mini Con State of the Union (BDM205)AWS re:Invent 2016: Big Data Mini Con State of the Union (BDM205)
AWS re:Invent 2016: Big Data Mini Con State of the Union (BDM205)Amazon Web Services
 
AWS re:Invent 2016: Extending Datacenters to the Cloud: Connectivity Options ...
AWS re:Invent 2016: Extending Datacenters to the Cloud: Connectivity Options ...AWS re:Invent 2016: Extending Datacenters to the Cloud: Connectivity Options ...
AWS re:Invent 2016: Extending Datacenters to the Cloud: Connectivity Options ...Amazon Web Services
 
AWS re:Invent 2016: Achieving Agility by Following Well-Architected Framework...
AWS re:Invent 2016: Achieving Agility by Following Well-Architected Framework...AWS re:Invent 2016: Achieving Agility by Following Well-Architected Framework...
AWS re:Invent 2016: Achieving Agility by Following Well-Architected Framework...Amazon Web Services
 
Turning Big Data Insights into Action through Advanced Analytics
Turning Big Data Insights into Action through Advanced AnalyticsTurning Big Data Insights into Action through Advanced Analytics
Turning Big Data Insights into Action through Advanced AnalyticsCraig Rhinehart Rhinehart
 
Healthcare 2.0: The Age of Analytics
Healthcare 2.0: The Age of AnalyticsHealthcare 2.0: The Age of Analytics
Healthcare 2.0: The Age of AnalyticsHealth Catalyst
 

AWS re:Invent 2016| HLC301 | Data Science and Healthcare: Running Large Scale Analytics and Machine Learning on AWS

  • 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Arnoud Otte, Assistant Director Cloud & Data Architecture, Cambia Health Solutions Rich Uhl, CTO / Founder, 1Strategy Ujjwal Ratan, Solutions Architect, AWS November 28, 2016 HLC301 Data Science and Healthcare: Running Large Scale Analytics and Machine Learning on AWS
  • 2. What to Expect from the Session • Benefits from large-scale analytics with PHI - Arnoud • Securing Amazon EMR & Elasticsearch - Rich • Additional solution components for HIPAA compliance [demo] - Rich • Reducing cost and improving quality of care with Amazon Machine Learning [demo] - Ujjwal NOTE: This is a deep dive session on HOW rather than WHAT. We will show implementation details. • This session expects familiarity with: • AWS services - EMR and S3 • BDM401 - Deep Dive: Amazon EMR Best Practices & Design Patterns • BDA206 - Building Big Data Applications with the AWS Big Data Platform • Encryption and distributed systems like Hadoop and Elasticsearch
  • 3. Arnoud Otte Assistant Director Cloud & Data Architecture Arnoud.Otte@CambiaHealth.com
  • 4. Cambia Health Solutions Our Roots Born from an inspired idea Our Cause Becoming catalysts for transformation Our Vision Delivering a reimagined health care experience
  • 6. Architecture Amazon CloudWatch AWS CloudTrail AWS IAM Cambia Data Center Amazon S3 Amazon DynamoDB AWS Lambda Amazon EMR Amazon Elasticsearch Service Data Lake Metadata Security Amazon Redshift Amazon EMR Data Science & Analytics Amazon EMR Master Data Management
  • 7. Master Data Management: Are these the same people? Example 1 (Source A / Source B): First Name John / John; Last Name Doe / Doe; DOB 1970-01-01 / 2016-11-28; Street 105 Main St / 105 Main St; City Portland / Portland; State OR / OR. Answer: No. Father and son. Example 2 (Source A / Source B): First Name Jillian / Jill; Last Name Doe / Doe-Doe; SSN 123-45-6789 / 123-45-6789; Street 605 Oak Dr / 105 Main Street; City PDX / Portland; State OR / Oregon. Answer: Yes. Married, changed name, and moved. This is artificial data fabricated for illustration purposes only.
  • 8. Master Data Management – Approach Demographics Laboratory Pharmaceutics Geography Claims Composite record of best values Cambia Match and Merge on Amazon EMR
  • 9. Master Data Management – Quality 98.50% 99.90% 99.99% 97.5% 98.0% 98.5% 99.0% 99.5% 100.0% Match Correctness Vendor Cambia V1 Cambia V1.1 98.80% 84.30% 98.10% 75.0% 80.0% 85.0% 90.0% 95.0% 100.0% Match Completeness Vendor Cambia V1 Cambia V1.1 7,000+ records containing 1,600+ matches Manually checked and confirmed in the real world
  • 10. Master Data Management – Performance 90 minutes 40 minutes 0 500 1000 1500 2000 2500 minutes Run time Vendor Cambia V1 Cambia V1.1 2160 minutes or 36 hours 17.7M records containing 1.8M matches
  • 11. Next Steps Scale in and out or up and down Amazon Machine Learning Amazon EMR Build out healthcare data science models HIPAA compliant search on data Amazon EC2
  • 12. Security Big Data 1Strategy.com | @1strategy_cloud | Booth #408 Rich Uhl Founder & CTO Rich@1Strategy.com
  • 13. At Rest – when data is in a stored location Definition of Terms In Transit – when data is moved to and from storage In Process – when data is in temporary space for processing state
  • 14. Architecture Amazon CloudWatch AWS CloudTrail AWS IAM Cambia Data Center Amazon S3 Amazon DynamoDB AWS Lambda Amazon EMR Amazon Elasticsearch Service Data Lake Metadata Security Amazon Redshift Amazon EMR Data Science & Analytics Amazon EMR Master Data Management
  • 15. AWS KMS Encryption Keys Exchanging Keys Temporary Keys Master Key Key Management
  • 17. EMRFS on S3 EMRFS on S3 – This is achieved via s3 client-side encryption with AWS KMS. HDFS – via Hadoop File System (HDFS) transparent data encryption as described in the Apache Docs. HDFS on EMR Cluster Config File Encrypted Encryption at Rest
  • 18. { "Sid": "DenyUnEncryptedObjectUploads", "Effect": "Deny", "Principal": "*", "Action": "s3:PutObject", "Resource": "arn:aws:s3:::prd-datalake/*", "Condition": { "StringNotEquals": { "s3:x-amz-server-side-encryption": "AES256" } } } EMRFS on S3 Encryption at Rest
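The Deny statement on slide 18 can be read as a predicate over the headers of a PutObject request. The following is a minimal Python sketch of that logic, assuming a simplified model in which `StringNotEquals` also matches when the header is missing. The function name `is_upload_denied` is hypothetical, and no AWS call is made.

```python
REQUIRED_HEADER = "x-amz-server-side-encryption"
REQUIRED_VALUE = "AES256"


def is_upload_denied(request_headers):
    """Return True when the slide's Deny statement would match a PutObject request.

    Simplified model (an assumption, not the S3 policy engine): the
    StringNotEquals condition matches when the header is absent or carries
    any value other than AES256.
    """
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in request_headers.items()}
    return normalized.get(REQUIRED_HEADER) != REQUIRED_VALUE


if __name__ == "__main__":
    for headers in ({}, {REQUIRED_HEADER: "AES256"}, {REQUIRED_HEADER: "aws:kms"}):
        print(headers, "->", "denied" if is_upload_denied(headers) else "allowed")
```

Under this model, an upload that requests `aws:kms` is also denied, because the statement as shown only lets `AES256` through.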
  • 19. Data Encryption Key (DEK) Envelope Data Encryption Key (EDEK) Hadoop KMS Bootstrap Script Uses native Hadoop HDFS Transparent Data Encryption (DEK/EDEK) HDFS on EMR Cluster Encryption at Rest
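The DEK/EDEK relationship on slide 19 (Hadoop's documentation expands EDEK as "encrypted data encryption key") can be sketched as follows. This is an illustration of the key flow only: `ToyKeyServer` is a hypothetical stand-in for a key server such as Hadoop KMS, and the XOR keystream cipher is a toy chosen to keep the sketch dependency-free. It is not secure and is not what HDFS uses.

```python
import hashlib
import secrets


def _toy_cipher(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256 counter keystream. NOT secure."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    # zip() stops at the shorter input, so extra keystream bytes are ignored.
    return bytes(a ^ b for a, b in zip(data, stream))


class ToyKeyServer:
    """Hypothetical key server: holds the zone key and never hands it out."""

    def __init__(self):
        self._zone_key = secrets.token_bytes(32)

    def generate_edek(self) -> bytes:
        dek = secrets.token_bytes(32)            # fresh data encryption key
        return _toy_cipher(self._zone_key, dek)  # only the wrapped form leaves

    def decrypt_edek(self, edek: bytes) -> bytes:
        return _toy_cipher(self._zone_key, edek)


def write_file(key_server: ToyKeyServer, plaintext: bytes) -> dict:
    edek = key_server.generate_edek()
    dek = key_server.decrypt_edek(edek)
    # The EDEK is stored next to the ciphertext; the DEK itself is not stored.
    return {"edek": edek, "ciphertext": _toy_cipher(dek, plaintext)}


def read_file(key_server: ToyKeyServer, stored: dict) -> bytes:
    dek = key_server.decrypt_edek(stored["edek"])
    return _toy_cipher(dek, stored["ciphertext"])
```

The point of the design is that stored data carries only the wrapped key, so reading a file requires a round trip to the key server, which is where access control can be enforced.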
  • 20. { "Classification": "hdfs-site", "Properties": { "dfs.encryption.key.provider.uri": "kms://…", "dfs.namenode.name.dir": "file:///…", "dfs.name.dir": "/mnt/encrypted/…", "dfs.data.dir": "/mnt/encrypted/…", "dfs.datanode.data.dir": "file:///…" } } Bootstrap Script HDFS on EMR Cluster Encryption at Rest
  • 21. EMRFS on S3 HDFS on EMR Cluster Summary of Encryption at Rest
  • 23. HDFS on EMR Cluster EMRFS on S3 Encryption in Transit
  • 24. EMRFS on S3 HDFS on EMR Cluster Encryption in Transit <!-- Client certificate Store --> <property> <name>ssl.client.keystore.type</name> <value>jks</value> </property> <property> <name>ssl.client.keystore.location</name> <value>/etc/emr/security/ssl/keystore.jks</value> </property> <property> <name>ssl.client.keystore.password</name> <value>changeit</value> </property> <!-- Client Trust Store --> <property> <name>ssl.client.truststore.type</name> <value>jks</value> </property> <property> <name>ssl.client.truststore.location</name> <value>/etc/emr/security/ssl/truststore.jks</value> </property> <property> <name>ssl.client.truststore.password</name> <value>changeit</value> </property> <property> <name>ssl.client.truststore.reload.interval</name> <value>10000</value> </property> </configuration>
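The keystore and truststore passwords on slide 24 are shown as `changeit`, the conventional Java default. A small check like the one below could flag such placeholders before a cluster goes live. This is a sketch under assumptions: the `SAMPLE` document is an abbreviated, self-contained stand-in for the slide's file, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Abbreviated stand-in for a Hadoop ssl-client style configuration file.
SAMPLE = """<configuration>
  <property>
    <name>ssl.client.keystore.location</name>
    <value>/etc/emr/security/ssl/keystore.jks</value>
  </property>
  <property>
    <name>ssl.client.keystore.password</name>
    <value>changeit</value>
  </property>
  <property>
    <name>ssl.client.truststore.password</name>
    <value>changeit</value>
  </property>
</configuration>"""

# Values treated as placeholders in this sketch (an assumption).
PLACEHOLDER_PASSWORDS = {"changeit", ""}


def load_properties(xml_text):
    """Parse <property><name>/<value> pairs into a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def weak_password_properties(props):
    """Return the names of password properties that still hold a placeholder."""
    return sorted(
        name
        for name, value in props.items()
        if name.endswith(".password") and value in PLACEHOLDER_PASSWORDS
    )


if __name__ == "__main__":
    print(weak_password_properties(load_properties(SAMPLE)))
```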
  • 25. Three areas to address 1. Hadoop RPC - Hadoop RPC is used by API clients of MapReduce 2. HDFS DTP - with HDFS transparent encryption enabled, this traffic is automatically encrypted 3. Hadoop MapReduce Shuffle - MapReduce shuffles and sorts the output of each map task to reducers on different nodes HDFS on EMR Cluster Encryption in Transit - Cluster
  • 26. RPC client Hadoop RPC - Hadoop RPC is used by API clients of MapReduce EMR Cluster EMRFS on S3 Encryption in Transit - Cluster
  • 27. RPC client <property> <name>hadoop.security.service.user.name.key</name> <value></value> <description> For those cases where the same RPC protocol is implemented by multiple servers, this configuration is required for specifying the principal name to use for the service when the client wishes to make an RPC call. </description> </property> <property> <name>hadoop.rpc.protection</name> <value>authentication</value> <description>A comma-separated list of protection values for secured sasl connections. Possible values are authentication, integrity and privacy. authentication means authentication only and no integrity or privacy; integrity implies authentication and integrity are enabled; and privacy implies all of authentication, integrity and privacy are enabled. hadoop.security.saslproperties.resolver.class can be used to override the hadoop.rpc.protection for a connection at the server side. </description> </property> Encryption in Transit - Cluster
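According to the property description quoted on slide 27, only the `privacy` level adds confidentiality, while the value shown (`authentication`) enables authentication alone. A small helper can make that distinction explicit. This is a sketch that assumes a connection may negotiate any level in the comma-separated list, so traffic is only guaranteed to be encrypted when every listed level is `privacy`. The function names are hypothetical.

```python
# Guarantees per level, as described in the hadoop.rpc.protection text.
PROTECTION_LEVELS = {
    "authentication": {"authentication"},
    "integrity": {"authentication", "integrity"},
    "privacy": {"authentication", "integrity", "privacy"},
}


def parse_levels(setting):
    """Split a comma-separated hadoop.rpc.protection value into known levels."""
    levels = [part.strip().lower() for part in setting.split(",") if part.strip()]
    for level in levels:
        if level not in PROTECTION_LEVELS:
            raise ValueError(f"unknown protection level: {level!r}")
    return levels


def rpc_is_always_encrypted(setting):
    """True only if every allowed level includes privacy (assumed negotiation model)."""
    levels = parse_levels(setting)
    return bool(levels) and all("privacy" in PROTECTION_LEVELS[lv] for lv in levels)


if __name__ == "__main__":
    for value in ("authentication", "integrity,privacy", "privacy"):
        print(value, "->", rpc_is_always_encrypted(value))
```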
  • 28. Data Encryption Key (DEK) Envelope Data Encryption Key (EDEK) Hadoop KMS HDFS Data Transfer Protocol (DTP) – Enabling HDFS transparent encryption ensures automatic encryption Encryption in Transit - Cluster EMRFS on S3 EMR Cluster
  • 29. <property> <name>dfs.encrypt.data.transfer</name> <value>true</value> <description> Whether or not actual block data that is read/written from/to HDFS should be encrypted on the wire. This only needs to be set on the NN and DNs, clients will deduce this automatically. It is possible to override this setting per connection by specifying custom logic via dfs.trustedchannel.resolver.class. </description> </property> <property> <name>dfs.encrypt.data.transfer.algorithm</name> <value></value> <description> This value may be set to either "3des" or "rc4". If nothing is set, then the configured JCE default on the system is used (usually 3DES.) It is widely believed that 3DES is more cryptographically secure, but RC4 is substantially faster. </description> </property> Data Encryption Key (DEK) Envelope Data Encryption Key (EDEK) Hadoop KMS Hadoop Data Transfer Protocol (DTP) configured on startup with a bootstrap script Encryption in Transit - Cluster
  • 30. Hadoop Encrypted Shuffle and Sort Hadoop MapReduce Shuffle - In the shuffle phase, Hadoop MapReduce (MRv2) shuffles the output of each map task to reducers on different nodes using HTTP by default. EMR Cluster Encryption in Transit - Cluster EMRFS on S3
  • 31. { "Classification": "mapred-site", "Properties": { "mapreduce.shuffle.ssl.enabled": "true", "mapred.local.dir": "/mnt/encrypted/mapred,/mnt1/encrypted/mapred", "mapreduce.cluster.local.dir": "/mnt/encrypted/mapred,/mnt1/encrypted/mapred", "mapreduce.application.classpath": "$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,/usr/lib/hadoop-lzo/lib/*,/usr/share/aws/emr/emrfs/conf,/usr/share/aws/emr/emrfs/lib/*,/usr/share/aws/emr/emrfs/auxlib/*,/usr/share/aws/emr/lib/*,/usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar,/usr/share/aws/emr/goodies/lib/emr-hadoop-goodies.jar,/usr/share/aws/emr/kinesis/lib/emr-kinesis-hadoop.jar,/usr/share/aws/emr/cloudwatch-sink/lib/*,/etc/emr/security/conf" } } Hadoop Encrypted Shuffle and Sort Encryption in Transit - Cluster
  • 32. EMRFS on S3 EMR Cluster Encryption in Transit - Cluster Spark block transfer service – This can be encrypted using SASL encryption in Spark 1.5.1 and later.
  • 33. { "Classification": "spark-env", "Properties": { "spark.authenticate.enableSaslEncryption": "true", "spark.network.sasl.serverAlwaysEncrypt": "true" } } Encryption in Transit
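The in-transit settings from slides 29, 31 and 33 can be gathered into one configuration document of the kind EMR accepts at cluster creation. The sketch below only builds and serializes that JSON and does not call AWS. The helper names are hypothetical, the property names are copied from the slides, and the classification names (including `spark-env` for the Spark properties) are kept as the slides show them, so they should be checked against the EMR release guide for the release in use.

```python
import json


def _to_config_string(value):
    """EMR configuration properties are strings; render booleans in lowercase."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def classification(name, properties):
    """Build one configuration object with string-valued properties."""
    return {
        "Classification": name,
        "Properties": {key: _to_config_string(val) for key, val in properties.items()},
    }


def build_in_transit_configurations():
    return [
        # Slide 29: encrypt HDFS block data on the wire.
        classification("hdfs-site", {"dfs.encrypt.data.transfer": True}),
        # Slide 31: encrypted MapReduce shuffle.
        classification("mapred-site", {"mapreduce.shuffle.ssl.enabled": True}),
        # Slide 33: classification name as it appears on the slide (unverified).
        classification(
            "spark-env",
            {
                "spark.authenticate.enableSaslEncryption": True,
                "spark.network.sasl.serverAlwaysEncrypt": True,
            },
        ),
    ]


if __name__ == "__main__":
    print(json.dumps(build_in_transit_configurations(), indent=2))
```

Serializing through `json.dumps` avoids the unbalanced braces and mismatched quotes that are easy to introduce when such documents are edited by hand.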
  • 36. Bootstrap Script function encrypt_disk() { local dev=$1 local dir=$2 local cryptname="crypt_${dir:1}" # Unmount the drive sudo umount "$dev" # Encrypt the drive sudo cryptsetup luksFormat -q --key-file "$PWD_FILE" "$dev" sudo cryptsetup luksOpen -q --key-file "$PWD_FILE" "$dev" "$cryptname" # Format the drive sudo mkfs -t xfs "/dev/mapper/$cryptname" sudo mount -o defaults,noatime,inode64 "/dev/mapper/$cryptname" "$dir" sudo rm -rf "$dir/lost+found" sudo mkdir -p "$dir/encrypted" sudo chown -R hadoop:hadoop "$dir" echo "/dev/mapper/$cryptname $dir xfs defaults,noatime,inode64 0 0" | sudo tee -a /etc/fstab echo "$cryptname $dev $PWD_FILE" | sudo tee -a /etc/crypttab } Temporary Space on EBS Volumes Encryption in Process
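To review what the `encrypt_disk` function on slide 36 would run for a given volume without touching a disk, its main steps can be mirrored as a dry-run planner. This is a sketch, not the presenters' tooling: the function names, the device `/dev/xvdb` and the key file path are hypothetical, and the `lost+found` cleanup and the `/etc/fstab` and `/etc/crypttab` entries are left out for brevity.

```python
import shlex


def plan_encrypt_disk(dev, mount_dir, key_file):
    """Return the commands the slide's function would run, without executing them."""
    crypt_name = f"crypt_{mount_dir[1:]}"  # mirrors the shell expansion ${dir:1}
    mapper = f"/dev/mapper/{crypt_name}"
    return [
        ["sudo", "umount", dev],
        ["sudo", "cryptsetup", "luksFormat", "-q", "--key-file", key_file, dev],
        ["sudo", "cryptsetup", "luksOpen", "-q", "--key-file", key_file, dev, crypt_name],
        ["sudo", "mkfs", "-t", "xfs", mapper],
        ["sudo", "mount", "-o", "defaults,noatime,inode64", mapper, mount_dir],
        ["sudo", "mkdir", "-p", f"{mount_dir}/encrypted"],
        ["sudo", "chown", "-R", "hadoop:hadoop", mount_dir],
    ]


def render(commands):
    """Quote each command so it can be pasted into a shell for review."""
    return [shlex.join(command) for command in commands]


if __name__ == "__main__":
    for line in render(plan_encrypt_disk("/dev/xvdb", "/mnt", "/tmp/pwd_file")):
        print(line)
```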
  • 37. HDFS on EMR Cluster EMRFS on S3 Temporary Space on EBS Volumes RPC Hadoop Encrypted Shuffle and Sort Native DTP Summary of the EMR Encryption Process
  • 38. EMR Updates 1Strategy blog links amzn.to/2g0JJIN September 21st, 2016 bit.ly/1strategy_emr AWS EMR Encryption Documentation
  • 39. EMR Updates and how they play into this
  • 40.
  • 41. Temporary Space on EBS Volumes ElasticSearch for HealthCare Encryption and Authentication ElasticSearch on EC2 Instances
  • 42. EMRFS on S3 Temporary Space on EBS Volumes ElasticSearch on EC2 Instances ElasticSearch Encryption Process Summary
  • 43. HIPAA is more than encryption Auditing & custom tools: • Audit script to show limited users have access to encrypted S3 data • S3 Buckets are encrypted • Show S3 Objects are encrypted *Working with Cambia to open source these tools bit.ly/1strategy_emr_code
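The "show S3 objects are encrypted" audit on slide 43 could be approached along the following lines. This is not the tool the presenters mention, only a sketch: it assumes per-object metadata has already been collected (for example from boto3 `head_object` responses, which include a `ServerSideEncryption` field for server-side encrypted objects) and it checks server-side encryption only. Objects encrypted client-side would need a different check. The function names and sample keys are hypothetical.

```python
def unencrypted_keys(object_metadata):
    """Return keys whose metadata reports no server-side encryption.

    object_metadata maps an object key to a head_object-style dict.
    """
    return sorted(
        key
        for key, metadata in object_metadata.items()
        if not metadata.get("ServerSideEncryption")
    )


def audit_summary(object_metadata):
    """Summarize the audit as counts plus the offending keys."""
    missing = unencrypted_keys(object_metadata)
    return {
        "total": len(object_metadata),
        "encrypted": len(object_metadata) - len(missing),
        "unencrypted_keys": missing,
    }


if __name__ == "__main__":
    sample = {
        "claims/2016/part-0000": {"ServerSideEncryption": "AES256"},
        "claims/2016/part-0001": {"ServerSideEncryption": "aws:kms"},
        "tmp/scratch.csv": {},
    }
    print(audit_summary(sample))
```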
  • 44. Demo
  • 45. Ujjwal Ratan Solutions Architect, AWS Ujjwalr@Amazon.com
  • 46. Machine Learning inside Healthcare Analyzing Medical Images Prescription Compliance Prediction Evidence Based & Precision Medicine Text classification and mining Medicare and Medicaid Fraud Hospital Bed Utilization Treatment Queries and Suggestions Drug Discovery and Clinical Trials Population Health Vaccination and Immunization Omics and Clinical Data Integration Patient Outcomes Patient Readmission Prediction through risk stratification
  • 47. Real World Problem – Hospital Readmissions • Hospital Readmission Reduction Program (HRRP) part of the Affordable Care Act. • Centers for Medicare & Medicaid Services (CMS) required to reduce payments to hospitals with excess readmissions. • Not all readmissions can be prevented • Facilities with high readmission rates had their Medicare payment cut by 1% in 2013 which rose to 2% in 2014. Source - www.ncbi.nlm.nih.gov/pmc/articles/PMC3558794
  • 48. Our Focus Utilizing AWS For Machine Learning (ML) Continuum of Machine Learning Solutions. Amazon Machine Learning: • Limited ML Options (Binary, Multiclass, Regression) • Simple to train • Easy to evaluate • Quick to deploy. Amazon EMR + Spark ML: • Comprehensive ML options • Requires work to train • No support for evaluation • Additional work to deploy • Scalable • Customizable
  • 49. Introducing Amazon Machine Learning (AML) • Easy to use, managed machine learning service built for developers • Robust, powerful machine learning technology based on Amazon’s internal systems • Use your data already stored in the AWS cloud • Models in production within seconds
  • 50. Machine Learning Proactive Prediction of Readmission Patient Demographics Patient History Admission Attributes Other features Patient High Risk Patient Low Risk Patient Moderate Risk Patient
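The last step on slide 50, turning a model output into a high, moderate or low risk label, might look like the sketch below. The cut-offs `moderate_at=0.3` and `high_at=0.6` are hypothetical values chosen for illustration. The deck does not state thresholds, and in practice they would be tuned against outcomes.

```python
def risk_tier(readmission_probability, moderate_at=0.3, high_at=0.6):
    """Map a predicted readmission probability to one of the slide's risk labels.

    The thresholds are illustrative assumptions, not values from the deck.
    """
    if not 0.0 <= readmission_probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if not moderate_at < high_at:
        raise ValueError("moderate_at must be below high_at")
    if readmission_probability >= high_at:
        return "High Risk"
    if readmission_probability >= moderate_at:
        return "Moderate Risk"
    return "Low Risk"


if __name__ == "__main__":
    for score in (0.05, 0.45, 0.9):
        print(score, "->", risk_tier(score))
```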
  • 51. Amazon S3 Amazon Redshift Amazon Machine Learning users Internet CSV Files 1 2 3 5 Amazon Cognito S3 Static Website Internet 4 AML Application for Predicting Readmissions
  • 52. Clinical Data Set https://archive.ics.uci.edu/ml/datasets/Diabetes+130-US+hospitals+for+years+1999-2008 • 101,766 rows • 10 years of clinical care • 130 US hospitals • 50+ attributes of diabetes patients and hospital outcomes
  • 53. Ingesting Data into S3 - Staging. Table Name (Table Type): admission_source.csv (Master), admission_type.csv (Master), discharge_disposition.csv (Master), Diabetic_data.csv (Transaction). Command: aws s3 cp /tmp/foo/ s3://bucket/ --recursive
  • 54. Schema in Redshift Fact create table admission_type ( admission_type_id INTEGER NOT NULL, description varchar(100) ); create table discharge_disposition ( discharge_disposition_id INTEGER NOT NULL, description VARCHAR(500) ); create table admission_source ( admission_source_id INTEGER NOT NULL, description VARCHAR(500) ); create table diabetes_data ( /* ~50 attributes */ ); Dim2 Dim3 Dim1
  • 55. Data Load and Standardization. Data Load: COPY <Redshift_Table_Name> FROM 's3://<file_path.csv>' CREDENTIALS 'aws_access_key_id=<>;aws_secret_access_key=<>' DELIMITER ',' IGNOREHEADER 1; Data Standardization: • Update NULL values • Change attribute values which do not comply with standard patterns. • ex: Phone = (206) XXX-XXXX • Complete geographical data where possible • Include timeline values if possible • Group granular attributes in sets. • ex: Ages 0 to 20 as youth, 20 to 40 as adult and so on.
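Two of the standardization steps on slide 55, pattern normalization and grouping granular values, can be sketched in a few lines. The assumptions here are that the age bands are half-open (20 falls in "adult"), that the label `40_plus` stands in for the slide's "and so on", and that phone numbers carry exactly ten digits. All function names are hypothetical.

```python
import re


def normalize_phone(raw):
    """Render a ten-digit number as (AAA) BBB-CCCC, or None if it does not fit."""
    digits = re.sub(r"\D", "", raw or "")
    if len(digits) != 10:
        return None
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"


def age_group(age):
    """Group an age in years into coarse bands; missing values become 'unknown'."""
    if age is None:
        return "unknown"
    if age < 0:
        raise ValueError("age cannot be negative")
    if age < 20:
        return "youth"
    if age < 40:
        return "adult"
    return "40_plus"  # placeholder label for the bands the slide leaves unnamed


if __name__ == "__main__":
    print(normalize_phone("206.555.0100"), age_group(19), age_group(20))
```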
  • 56. Create AML Data Source with Redshift CreateDataSourceFromRedshift API Console
  • 57. Real-time Predictions Using API • Synchronous, low-latency, high-throughput prediction generation • Request through service API or server or mobile SDKs • Best for interactive applications that deal with individual data records >>> import boto >>> ml = boto.connect_machinelearning() >>> ml.predict( ml_model_id='my_model', predict_endpoint='example_endpoint', record={'key1':'value1', 'key2':'value2'}) { 'Prediction': { 'predictedValue': 13.284348, 'details': { 'Algorithm': 'SGD', 'PredictiveModelType': 'REGRESSION' } } }
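The same call can be wrapped so that the record is prepared consistently before it is sent. The sketch below assumes a boto3-style client, whose `predict` takes `MLModelId`, `Record` and `PredictEndpoint` and expects the record as a string-to-string map. `StubClient` is a hypothetical stand-in that returns the response shown on the slide so the example runs without AWS access, and dropping missing values is an assumption about how gaps should be handled.

```python
def build_record(attributes):
    """Convert attribute values to strings and drop missing ones."""
    return {str(key): str(value) for key, value in attributes.items() if value is not None}


def predict_value(client, model_id, endpoint, attributes):
    """Request one real-time prediction and return its numeric value."""
    response = client.predict(
        MLModelId=model_id,
        Record=build_record(attributes),
        PredictEndpoint=endpoint,
    )
    return response["Prediction"]["predictedValue"]


class StubClient:
    """Hypothetical stand-in for boto3.client('machinelearning')."""

    def __init__(self):
        self.last_request = None

    def predict(self, MLModelId, Record, PredictEndpoint):
        self.last_request = {
            "MLModelId": MLModelId,
            "Record": Record,
            "PredictEndpoint": PredictEndpoint,
        }
        # Canned response copied from the slide's example output.
        return {
            "Prediction": {
                "predictedValue": 13.284348,
                "details": {"Algorithm": "SGD", "PredictiveModelType": "REGRESSION"},
            }
        }


if __name__ == "__main__":
    client = StubClient()
    print(predict_value(client, "my_model", "example_endpoint", {"key1": "value1"}))
```

With a real client in place of the stub, the wrapper stays unchanged, which keeps the string conversion in one place.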
  • 58.
  • 59. Application Website Hosted on S3 var machinelearning = new AWS.MachineLearning({apiVersion: '2014-12-12'}); var params = { MLModelId: '<AML Model ID>', PredictEndpoint: '<AML Model Real Time End Point>', Record: <Selected Attributes record set> }; var request = machinelearning.predict(params); Application calls the Predict() API using necessary parameters Website hosting in S3 without web servers eliminates complexities of scaling hardware based on traffic routed to your application. bit.ly/aml_demo - Demo bit.ly/hcl301_blog - Blog
  • 60. Expanded Architecture Amazon S3 Amazon Redshift Amazon Machine Learning Amazon EC2 Amazon EMR users Internet Corporate Data Center Make data suitable to act as an ML data source An ML model is created with Redshift as the data source EC2 as a frontend for AML end point Process unstructured and semi-structured data Data Lake Amazon S3 Amazon QuickSight Amazon RDS users Batch prediction generated and stored in S3 DB Schemas CSV Files Unstructured files QuickSight generates BI reports on prediction data. An RDS schema acts as a source for QuickSight
  • 62. Join us tonight at the Health Care happy hour sponsored by Cambia Health Solutions, 8KMiles.com and AWS at: Japonais restaurant in the Mirage on Monday 11/28 from 6-8 PM AWS and Cambia are co-presenting: SEC305 – Scaling Security Resources for Your First 10 Million Customers Tuesday, Nov 29, 12:30 PM - 1:30 PM Do you want to know more about how to secure health data?