Through Project Rhino, Intel is contributing to a common security framework for Apache Hadoop that enables Hadoop to run workloads without compromising performance or security. Join this session to learn how your enterprise can use the security capabilities of the Intel Data Platform running on AWS to analyze data while maintaining the technical safeguards that help you remain compliant.
3. Big Data Analytics in Health and Life Sciences
Now: Disparate streams of data
Next: Integrated computing and data
• Data sources: Genomics; Clinical; Claims & transactions; Meds & labs; Patient experience; Personal data
• Better decisions and outcomes at reduced cost
• Clinical Analysis and Genomic Analysis
• From population- to person-based treatment
4. Cost Savings via Big Data Analytics
Stakeholders: Provider, Patient, Payer, Producer, Regulator
• Personalized medicine
• Data-driven adherence
• Proven pathways of care
• Care coordinated across providers
• Shift volume to the right setting
• Reducing ER (re)admission rates
• Provider/performance transparency and payment innovation
• Accelerated approval
• Accelerated discovery
Estimated savings: $180B, $100B, $100B, $70B
6. Technical Safeguards
• Access Control: A covered entity must implement technical policies and procedures that allow only authorized persons to access electronic protected health information (e-PHI).
• Audit Controls: A covered entity must implement hardware, software, and/or procedural mechanisms to record and examine access and other activity in information systems that contain or use e-PHI.
• Integrity Controls: A covered entity must implement policies and procedures to ensure that e-PHI is not improperly altered or destroyed. Electronic measures must be put in place to confirm that e-PHI has not been improperly altered or destroyed.
• Transmission Security: A covered entity must implement technical security measures that guard against unauthorized access to e-PHI that is being transmitted over an electronic network.
18. Protect Hadoop APIs
• Enforces consistent security policies across all Hadoop services
• Serves as a trusted proxy to Hadoop, HBase, and WebHDFS APIs
• Common Criteria EAL4+, HSM, FIPS 140-2 certified
• Deploys as software, virtual appliance, or hardware appliance
• Available on AWS Marketplace
• Proxied APIs: HCatalog, Stargate (HBase REST), WebHDFS
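To make "consistent security policies across all Hadoop services" concrete, here is a hypothetical default-deny rule check of the kind a trusted proxy could apply before forwarding a request to WebHDFS, Stargate, or HCatalog. ApiGatewayPolicy and its rule format are invented for illustration; they do not describe the gateway product's actual configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical trusted-proxy policy: one rule table, checked the same way for
 * every proxied Hadoop API. Anything not explicitly allowed is denied.
 */
public final class ApiGatewayPolicy {

    /** One allow rule: roles that may use the given HTTP methods under a path prefix. */
    public static final class Rule {
        final String pathPrefix;
        final Set<String> methods;
        final Set<String> roles;

        public Rule(String pathPrefix, Set<String> methods, Set<String> roles) {
            this.pathPrefix = pathPrefix;
            this.methods = methods;
            this.roles = roles;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Adds an allow rule and returns this policy for chaining. */
    public ApiGatewayPolicy allow(String pathPrefix, Set<String> methods, Set<String> roles) {
        rules.add(new Rule(pathPrefix, methods, roles));
        return this;
    }

    /** Default-deny check: true only if some rule matches path, method, and role. */
    public boolean isAllowed(String role, String method, String path) {
        for (Rule r : rules) {
            if (path.startsWith(r.pathPrefix)
                    && r.methods.contains(method)
                    && r.roles.contains(role)) {
                return true;
            }
        }
        return false;
    }
}
```

Because the check is plain prefix matching, a rule for /webhdfs/v1/data/claims also covers /webhdfs/v1/data/claims-archive; a production policy engine would normalize paths and match on segment boundaries.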
19. Provide role-based access control
• Authorization (AuthZ): file-, table-, and cell-level access control in HBase
• JIRA HBASE-6222: Add per-KeyValue security
• Permissions stored in the HBase _acl_ table
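The difference between table-level and cell-level authorization can be modeled in a few lines. This is a hypothetical in-memory model, not the HBase AccessController or its API: a table grant covers every cell, while a cell grant (keyed here as row/family:qualifier, an assumed format) opens up only that one cell.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical model of layered authorization for one table:
 * table-level grants apply to every cell; cell-level grants apply to one cell.
 */
public final class CellAcl {
    public enum Action { READ, WRITE }

    // user -> actions allowed on the whole table
    private final Map<String, Set<Action>> tableGrants = new HashMap<>();
    // cell key -> (user -> actions allowed on that cell only)
    private final Map<String, Map<String, Set<Action>>> cellGrants = new HashMap<>();

    public void grantTable(String user, Action... actions) {
        tableGrants.computeIfAbsent(user, u -> EnumSet.noneOf(Action.class))
                   .addAll(Arrays.asList(actions));
    }

    public void grantCell(String cell, String user, Action... actions) {
        cellGrants.computeIfAbsent(cell, c -> new HashMap<>())
                  .computeIfAbsent(user, u -> EnumSet.noneOf(Action.class))
                  .addAll(Arrays.asList(actions));
    }

    /** Allows the action if either the table-level or the cell-level grant includes it. */
    public boolean canAccess(String user, String cell, Action action) {
        if (tableGrants.getOrDefault(user, Set.of()).contains(action)) {
            return true;
        }
        return cellGrants.getOrDefault(cell, Map.of())
                         .getOrDefault(user, Set.of())
                         .contains(action);
    }
}
```

Real cell-level ACL implementations must also define how table and cell permissions combine; this sketch simply allows access if either level grants it.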
20. Provide encryption for data at rest
Diagram: MapReduce data flow from HDFS through RecordReader → Map → Combiner → Partitioner → Local Merge & Sort → Reduce → RecordWriter and back to HDFS, with Decrypt, Encrypt, and Derivative steps marked along the pipeline
• Extends the compression codec into a crypto codec
• Provides an abstract API for general use
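In this flow, data is decrypted where it enters the job and encrypted wherever it is written, including intermediate output. A minimal sketch of that idea uses JDK AES-GCM as a stand-in for the crypto codec, with toy map (upper-casing) and reduce (joining) steps. EncryptedPipeline, the separate input/intermediate/output keys, and the IV-prefix layout are all assumptions for illustration; this is not Project Rhino's codec.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public final class EncryptedPipeline {
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final int IV_BYTES = 12;   // recommended GCM nonce length
    private static final int TAG_BITS = 128;  // GCM authentication tag length

    private EncryptedPipeline() {}

    /** Encrypts with AES-GCM and prepends the random IV so a reader can decrypt. */
    public static byte[] encrypt(SecretKey key, byte[] plain) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv);
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] body = c.doFinal(plain);
            return ByteBuffer.allocate(IV_BYTES + body.length).put(iv).put(body).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Splits off the IV and decrypts; throws if the key is wrong or data was altered. */
    public static byte[] decrypt(SecretKey key, byte[] blob) {
        try {
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, blob, 0, IV_BYTES));
            return c.doFinal(blob, IV_BYTES, blob.length - IV_BYTES);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /**
     * Toy job: decrypt each input record (reader side), map it to upper case,
     * encrypt the intermediate result under its own key (spill side), decrypt it
     * again for the reduce step that joins records, then encrypt the final output.
     */
    public static byte[] runJob(List<byte[]> encryptedInput, SecretKey inputKey,
                                SecretKey intermediateKey, SecretKey outputKey) {
        List<byte[]> spilled = new ArrayList<>();
        for (byte[] record : encryptedInput) {
            String line = new String(decrypt(inputKey, record), StandardCharsets.UTF_8);
            String mapped = line.toUpperCase(Locale.ROOT);
            spilled.add(encrypt(intermediateKey, mapped.getBytes(StandardCharsets.UTF_8)));
        }
        StringBuilder reduced = new StringBuilder();
        for (byte[] blob : spilled) {
            if (reduced.length() > 0) {
                reduced.append('\n');
            }
            reduced.append(new String(decrypt(intermediateKey, blob), StandardCharsets.UTF_8));
        }
        return encrypt(outputKey, reduced.toString().getBytes(StandardCharsets.UTF_8));
    }
}
```

GCM is used here because it also detects tampering; with random 12-byte IVs, the number of encryptions under one key should stay well below 2^32.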
22. Pig & Hive Encryption
• Pig Encryption Capabilities
– Support for text and Avro file formats
– Intermediate job output file protection
– Pluggable key retrieval and key resolution
– Protected key distribution within the cluster
• Hive Encryption Capabilities
– Support for RCFile and Avro file formats
– Intermediate and final output data encryption
– Encryption is transparent to end users; existing SQL needs no changes
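Pluggable key retrieval and key resolution separate two questions: which key protects a given file, and where that key's bytes come from. The hypothetical interfaces below (KeyRetriever, KeyResolver) show one way to keep the two independent; they are illustrative names, not Project Rhino's Pig or Hive APIs, and the in-memory retriever stands in for a real keystore or key management service.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public final class PluggableKeys {
    private PluggableKeys() {}

    /** Retrieves raw key material by key name (for example, from a keystore or external KMS). */
    public interface KeyRetriever {
        Optional<byte[]> retrieve(String keyName);
    }

    /** Decides which key name protects a given file path. */
    public interface KeyResolver {
        Optional<String> resolve(String path);
    }

    /** Resolver that returns the key of the first configured prefix matching the path. */
    public static KeyResolver prefixResolver(Map<String, String> prefixToKeyName) {
        Map<String, String> rules = new LinkedHashMap<>(prefixToKeyName); // preserves caller's order
        return path -> rules.entrySet().stream()
                .filter(e -> path.startsWith(e.getKey()))
                .map(Map.Entry::getValue)
                .findFirst();
    }

    /** In-memory retriever; returns copies so callers cannot mutate stored keys. */
    public static KeyRetriever inMemoryRetriever(Map<String, byte[]> keys) {
        Map<String, byte[]> copy = new HashMap<>(keys);
        return name -> Optional.ofNullable(copy.get(name)).map(byte[]::clone);
    }

    /** Combines both plug-ins: resolve the key name for a file, then retrieve its bytes. */
    public static Optional<byte[]> keyFor(String path, KeyResolver resolver, KeyRetriever retriever) {
        return resolver.resolve(path).flatMap(retriever::retrieve);
    }
}
```

Rules are tried in insertion order, so list the most specific prefix first. Swapping the in-memory retriever for one backed by an external key manager would not change any resolver code.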
23. Crypto Codec Framework
• Extends the compression codec
• Establishes a common API-level abstraction that all crypto codec implementations can share
// Instantiate the configured codec class; ReflectionUtils passes conf to Configurable classes
CryptoCodec cryptoCodec = (CryptoCodec) ReflectionUtils.newInstance(codecClass, conf);
// Context object handed to the codec; the elided lines set it up
CryptoContext cryptoContext = new CryptoContext();
// ...
cryptoCodec.setCryptoContext(cryptoContext);
// Wrap the raw stream; reads through it return decrypted data
CompressionInputStream input = cryptoCodec.createInputStream(inputStream);
// ...
• Provides a foundation for other Hadoop components, such as MapReduce or HBase, to support encryption features
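The codec idea, in which callers ask only for a wrapped input or output stream and the implementation chooses the cipher, can be sketched with standard JDK classes. StreamCryptoCodec and the 16-byte IV header are assumptions for illustration; this mirrors the shape of the snippet above without reproducing Project Rhino's CryptoCodec or CryptoContext.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

public final class StreamCodecs {
    private StreamCodecs() {}

    /** Hypothetical common API shared by every crypto codec implementation. */
    public interface StreamCryptoCodec {
        OutputStream createOutputStream(OutputStream out) throws IOException;
        InputStream createInputStream(InputStream in) throws IOException;
    }

    /** AES-CTR implementation; a fresh random IV is written as a 16-byte header. */
    public static StreamCryptoCodec aesCtr(SecretKey key) {
        return new StreamCryptoCodec() {
            private final SecureRandom random = new SecureRandom();

            @Override
            public OutputStream createOutputStream(OutputStream out) throws IOException {
                byte[] iv = new byte[16];
                random.nextBytes(iv);
                out.write(iv); // header: the reader needs the IV to decrypt
                return new CipherOutputStream(out, cipher(Cipher.ENCRYPT_MODE, iv));
            }

            @Override
            public InputStream createInputStream(InputStream in) throws IOException {
                byte[] iv = in.readNBytes(16);
                if (iv.length != 16) {
                    throw new IOException("truncated IV header");
                }
                return new CipherInputStream(in, cipher(Cipher.DECRYPT_MODE, iv));
            }

            private Cipher cipher(int mode, byte[] iv) throws IOException {
                try {
                    Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
                    c.init(mode, key, new IvParameterSpec(iv));
                    return c;
                } catch (GeneralSecurityException e) {
                    throw new IOException("cannot initialize AES/CTR", e);
                }
            }
        };
    }
}
```

AES-CTR needs no padding, so the stored size is the plaintext size plus the 16-byte header. CTR provides confidentiality only, so a real codec would pair it with an integrity check.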
24. Key Distribution
• Enables the crypto codec in MapReduce jobs
• Supports different key storage and management systems
• Allows different stages and files to use different keys
• Provides an API to integrate with external key management systems
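One way to let different stages and files use different keys without distributing many keys is to derive them from a single master key obtained from the external key manager. The sketch below computes one HKDF-Expand block (RFC 5869) with HMAC-SHA256, skipping the Extract step on the assumption that the master key is already uniformly random; StageKeys and the jobId/stage/file context string are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Mac;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

/**
 * Hypothetical per-stage key derivation: HKDF-Expand(masterKey, info, 16)
 * with HMAC-SHA256, where info identifies the job, stage, and file.
 */
public final class StageKeys {
    private StageKeys() {}

    public static SecretKey derive(byte[] masterKey, String jobId, String stage, String file) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(masterKey, "HmacSHA256"));
            // NUL separators keep ("a", "bc") distinct from ("ab", "c"), assuming IDs contain no NULs
            String info = jobId + '\0' + stage + '\0' + file;
            mac.update(info.getBytes(StandardCharsets.UTF_8));
            mac.update((byte) 0x01); // HKDF block counter for the first output block
            byte[] okm = mac.doFinal();
            // Keep the first 16 bytes of the 32-byte block as a 128-bit AES key
            return new SecretKeySpec(Arrays.copyOf(okm, 16), "AES");
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Only the master key has to be fetched from the key manager; each task can recompute the same stage key locally from the job ID, stage name, and file name, while a key derived for one stage reveals nothing practical about another.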
26. Intel® Data Protection Technology
AES-NI
• Processor assistance for performing AES encryption
• Makes encryption software that uses it faster and stronger
Secure Key (DRNG)
• Processor-based true random number generator
• More secure, standards-compliant, high performance
• Data in Motion: secure Internet transactions used pervasively in e-commerce, banking, etc.
• Data in Process: most enterprise and cloud applications offer encryption options to secure information and protect confidentiality
• Data at Rest: full-disk encryption software protects data while saving to disk
AES-NI: Advanced Encryption Standard New Instructions
Secure Key: previously known as Intel Digital Random Number Generator (DRNG)
27. Intel® AES-NI Accelerated Encryption
Chart: relative speed of crypto functions for encryption and decryption (higher is better), comparing without Intel® AES-NI, with Intel® AES-NI, and with Intel® AES-NI multi-buffer; labeled speedups 18.2x/19.8x and 5.3x/19.8x, summarized as 20X faster crypto. Based on Intel tests.
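The chart's figures come from Intel's own tests. For a rough local check of AES throughput from Java, the probe below times AES-CTR encryption with the standard javax.crypto API. On HotSpot JVMs, AES is typically accelerated with AES-NI intrinsics when the CPU supports them, so running once normally and once with -XX:-UseAES should give a crude with/without comparison; flag names and defaults vary by JDK version (check with java -XX:+PrintFlagsFinal -version). AesThroughput is a hypothetical name, and the all-zero key and IV are for benchmarking only.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Rough AES-CTR throughput probe; not a rigorous benchmark. */
public final class AesThroughput {
    private AesThroughput() {}

    /** Encrypts a buffer of the given size repeatedly and returns megabytes per second. */
    public static double measureMBps(int bufferBytes, int iterations) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            // All-zero key and IV: acceptable for timing, never for real data
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(new byte[16], "AES"),
                        new IvParameterSpec(new byte[16]));
            byte[] in = new byte[bufferBytes];
            byte[] out = new byte[bufferBytes];
            // Warm-up so the JIT compiles the hot path before timing starts
            for (int i = 0; i < Math.max(1, iterations / 10); i++) {
                cipher.update(in, 0, in.length, out, 0);
            }
            long start = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                cipher.update(in, 0, in.length, out, 0);
            }
            long elapsed = Math.max(1, System.nanoTime() - start);
            double megabytes = (double) bufferBytes * iterations / (1024.0 * 1024.0);
            return megabytes / (elapsed / 1e9);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.printf("AES-CTR: %.1f MB/s%n", measureMBps(1 << 20, 200));
    }
}
```

Run it as java AesThroughput and again as java -XX:-UseAES AesThroughput; for numbers worth publishing, a harness such as JMH handles warm-up and dead-code elimination far more carefully.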
28. Cloud Platform for secure Hadoop
Intel® Xeon® Processors
• E7 Family
• E5 Family
• E3 Family
Amazon
• EC2 Reserved Instances
• EC2 Dedicated Instances
29. Amazon EC2 Instances with AES-NI
20 more at aws.amazon.com/ec2/instance-types
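On a Linux instance, whether the CPU exposes AES-NI can be checked from the aes entry in the flags line of /proc/cpuinfo. The sketch below is Linux/x86-specific and assumes the hypervisor passes that CPU feature flag through to the guest; AesNiCheck is a hypothetical helper name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

/** Linux-only sketch: looks for the "aes" CPU flag in /proc/cpuinfo-formatted text. */
public final class AesNiCheck {
    private AesNiCheck() {}

    /** Returns true if any "flags" line lists the aes flag as a whole token. */
    public static boolean hasAesFlag(List<String> cpuinfoLines) {
        for (String line : cpuinfoLines) {
            if (!line.startsWith("flags")) {
                continue;
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            // Flags are whitespace-separated; match the token "aes" exactly, not e.g. "vaes"
            boolean found = Arrays.stream(line.substring(colon + 1).trim().split("\\s+"))
                    .anyMatch("aes"::equals);
            if (found) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get("/proc/cpuinfo"));
        System.out.println("AES-NI flag present: " + hasAesFlag(lines));
    }
}
```

Parsing is kept separate from file reading so the check can be exercised on sample text without a Linux host.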
31. For more information
• intel.com/bigdata
• intel.com/healthcare/bigdata
• github.com/intel-hadoop/project-rhino/
• aws.amazon.com/compliance/
• aws.amazon.com/ec2/instance-types/