Migrate and Modernize Hadoop-Based Security Policies for Databricks
Steve Touw
CTO, Immuta
Can I just migrate my Apache Ranger/Sentry
Policies Directly to [Databricks]?
[presto]
[synapse]
[snowflake]
[starburst]
[etc…]
Migrate: Yes!
Modernize: No!
How do I get to
Yes for both?
(that’s what this talk
is about…)
Why Modernize?
2012 - Development of
Cloudera Access (later
renamed to Sentry) starts
2013 - XA Secure created,
later acquired by Hortonworks
A lot has changed in 8 years...
Hadoop is No Longer The Center of the Universe
Multi-cloud, Multi-compute
Managing compute-specific controls across
more than one of these systems is not
feasible
Data Protection Laws of the World...Growing
https://www.dlapiperdataprotection.com/
WHY IMMUTA
The Data “Fuel” Crisis
Privacy Rules & Regulations driving data “fuel crisis”
[Chart, 1990 to 2025. Axis label: Data Legally Usable for Analytics. Callout: Compliant Data for Analytics]
HIPAA (1996)
GLBA (1999)
HITECH (2009)
GDPR (2018)
CCPA (2020)
350+ Privacy & Infosec Bills Proposed
LEGAL / COMPLIANCE: “We need to secure our data.”
DATA ANALYSTS & SCIENTISTS: “I need to use our data.”
So the data “tug of war” has begun…
[Diagram labels: DATA; DATA PLATFORM OWNER / DATA ENGINEERING]
More Complexity, Changing Definitions of Privacy Preservation
Language from CCPA (and other similar language in GDPR)
“1798.145(a)(5): The obligations imposed on businesses by this title shall not restrict a
business’ ability to collect, use, retain, sell, or disclose consumer information that is
deidentified or in the aggregate consumer information.”
Meaning, if you deidentify/anonymize the data, CCPA doesn’t apply, yay!
But, nothing in life is free…
PI is defined as information "that identifies, relates to, describes, is capable of being
associated with, or could reasonably be linked." !!!!!
How to balance the speed of the business with secure access to sensitive data?
The Privacy vs Utility Tradeoff
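[Diagram: THE RISK OF DATA USE, a spectrum from FULL PRIVACY (Closed) to FULL UTILITY (Open), with a “sweet spot” between them]
Annotation on the diagram: “More stringent definitions are swinging the pendulum here” (Momentum)
[Diagram labels: LEGAL / COMPLIANCE; DATA ANALYSTS & SCIENTISTS]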
The World has Changed.
We are in:
The “Cloud Private Data Era”
More regulatory and privacy
concerns
More stringent definitions of
privacy preservation
Complex data platform
ecosystem
The “Cloud Private Data Era” Has Created a Role Tidal Wave
Role Explosion Example (Real Customer Use Case)
Each row-level policy in
Ranger is tied to an
individual role - but they
are all doing the “same
thing”
If you want to show new
data, you need a new Role
and a new Policy
This isn’t just Ranger -
think AWS IAM Roles too!
[Screenshot: list of Ranger policies, details redacted]
Annotations on the screenshot:
● user associated to role
● the exact same policy written over and over again
● the only change: the role
Role-Based Access Control (RBAC) is Broken
▪ RBAC should really be named “Static-based Access Control”
▪ It’s like writing code without being
able to use variables!
2012 - Development of
Cloudera Access (later
renamed to Sentry) starts
2013 - XA Secure created,
later acquired by Hortonworks
Conceived Before the Cloud
Private Data Era
You Must Do Both…
If You Don’t, You Won’t Realize the Benefits of
the Cloud
Migrate: Yes!
Modernize: Yes!
Let’s Cover How To Fix Each of These...
More regulatory and privacy concerns → Attribute-based Access Control (ABAC)
More stringent definitions of privacy preservation → Privacy Enhancing Technologies (PETs)
Complex data platform ecosystem → Separation of Policy from Platform
Separate Policy from Platform
Just like the big data era required the separation of compute
from storage, the private data era requires the separation of
policy from platform.
This allows defining policy externally from the platform and
executing enforcement live in the platform without creating data
copies/views.
● Table access controls
● Column level controls
● Row level security
● Cell-level controls
In a consistent manner,
no matter your compute
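A minimal sketch of what "define the policy outside the platform, enforce it live at query time" could look like. Every name here (RowPolicy, render_predicate, secure_query, the orders table, the regions attribute) is hypothetical and is not Immuta's or Databricks' API; the point is only that the rule is applied when the query is issued, with no data copy or view created.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowPolicy:
    """Platform-neutral row-level rule (hypothetical structure)."""
    column: str          # column the rule filters on
    user_attribute: str  # user attribute that holds the allowed values


def render_predicate(policy, user_attributes):
    """Turn the policy plus the caller's attributes into a SQL predicate."""
    allowed = sorted(user_attributes.get(policy.user_attribute, ()))
    if not allowed:
        return "1 = 0"  # no matching attribute: the user sees no rows
    quoted = ", ".join("'" + v.replace("'", "''") + "'" for v in allowed)
    return f"{policy.column} IN ({quoted})"


def secure_query(table, policy, user_attributes):
    """Apply the rule when the query is issued; no copy or view is created.

    The table name is trusted input in this sketch and is not escaped.
    """
    predicate = render_predicate(policy, user_attributes)
    return f"SELECT * FROM {table} WHERE {predicate}"


policy = RowPolicy(column="region", user_attribute="regions")
print(secure_query("orders", policy, {"regions": {"EMEA", "APAC"}}))
print(secure_query("orders", policy, {}))
```

The same RowPolicy object could be rendered for any SQL-speaking compute engine, which is the sense in which the policy is kept separate from the platform.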
You Must Also Separate Policy from Physical
Thousands of
tables and
columns
Thousands of
policies
Abstract with
logical metadata
PII, PHI, Address, SSN, etc...
Very few,
understandable,
policies
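To make the logical metadata idea concrete, here is a small sketch under stated assumptions: the catalog, the tag names on each column, and the mask_row helper are all invented for illustration. One policy is written against tags such as PII and PHI, and it applies to every physical column that carries those tags.

```python
# Hypothetical catalog: physical (table, column) pairs mapped to logical tags.
COLUMN_TAGS = {
    ("patients", "ssn"): {"PII", "SSN"},
    ("patients", "diagnosis"): {"PHI"},
    ("patients", "visit_date"): set(),
    ("customers", "home_address"): {"PII", "Address"},
    ("customers", "plan"): set(),
}

# One policy, written against tags rather than against tables or columns.
MASKED_TAGS = {"PII", "PHI"}


def mask_row(table, row, cleared_tags=frozenset()):
    """Null out any column whose tags include one the user is not cleared for."""
    blocked = MASKED_TAGS - set(cleared_tags)
    masked = {}
    for column, value in row.items():
        tags = COLUMN_TAGS.get((table, column), set())
        masked[column] = None if tags & blocked else value
    return masked


patient = {"ssn": "000-00-0000", "diagnosis": "flu", "visit_date": "2020-01-02"}
print(mask_row("patients", patient))
print(mask_row("patients", patient, cleared_tags={"PHI"}))
```

Adding a new table only requires tagging its columns; the policy itself does not change.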
Remember This?
▪ RBAC should really be
named “Static-based
Access Control”
▪ It’s like writing code
without being able to use
variables!
Wouldn’t it have been nice to just
write this with a variable and have
the policy dynamically defined at
RUN TIME?
organization_name IN
(SELECT org_name FROM redacted
WHERE role IN (@role))
▪ This is ABAC and it really
should be called
“Dynamic-based Access
Control”
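The predicate above can be sketched in a few lines of Python to show why one policy is enough. The lookup rows, role names, and organization names below are hypothetical stand-ins for the redacted table; the user's roles play the part of @role and are supplied at run time.

```python
# Hypothetical stand-in for the redacted lookup table: (role, org_name) rows.
ROLE_ORGS = [
    ("emea_analyst", "Acme EU"),
    ("us_analyst", "Acme US"),
    ("us_analyst", "Acme Labs"),
]


def allowed_orgs(user_roles):
    """Mirror of: SELECT org_name FROM <lookup> WHERE role IN (@role)."""
    return {org for role, org in ROLE_ORGS if role in user_roles}


def apply_policy(rows, user_roles):
    """The single policy: organization_name IN (orgs allowed for this user)."""
    orgs = allowed_orgs(user_roles)
    return [row for row in rows if row["organization_name"] in orgs]


rows = [
    {"id": 1, "organization_name": "Acme EU"},
    {"id": 2, "organization_name": "Acme US"},
    {"id": 3, "organization_name": "Acme Labs"},
]

print(apply_policy(rows, {"emea_analyst"}))
print(apply_policy(rows, {"us_analyst"}))
```

Onboarding a new role means adding rows to the lookup table, not writing another policy.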
Ranger/Hortonworks Real Customer Example
They had 8 rules per
table times 12 tables
for a total of 96
rules!
With ABAC/Immuta, It’s a Single Policy!
This is because it separates the user
details from the policy and treats them as a
read-time variable. This also future-proofs
the policy.
We can also build the rule once and have it
apply to all 12 tables with our logical
metadata layer (discussed previously).
This also future-proofs adding new tables.
How Do We Hit The Privacy vs Utility Sweet Spot?
[Diagram: THE RISK OF DATA USE, a spectrum from FULL PRIVACY (Closed) to FULL UTILITY (Open), with a “sweet spot” between them]
[Diagram labels: LEGAL / COMPLIANCE; DATA ANALYSTS & SCIENTISTS]
I know stuff about Judd and Leslie
photo credit: Gawker
New York Taxi & Limousine Commission
• Data was released containing taxi pickups,
dropoffs, location, time, amount, and tip
amount, among others
• This seems pretty harmless?
Well, Judd and Leslie May Not Think It’s Harmless
• This photo was geotagged (with time), so
by simply querying by medallion and time,
we know how much Judd and Leslie tip!
Limit
Features
Limit
Records
Limit
Functions
Reduced specificity
Regular Expressions for strings
Rounding for numeric data
Column restriction
Hide or replace values with
NULL
Row restrictions
Restrict access to certain
types of rows
Differential Privacy
Inject noise into aggregate
measures based on privacy
guarantees
Hashing/Encryption
Local DP
Randomly alter a percentage
of data
Aggregate-Only
Only allow aggregate
functions on data
K-anonymization
Suppress values that can lead
to linkage attacks
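As one illustration of the techniques listed above, here is a minimal sketch of k-anonymization by suppression. The function name, the column names, and the choice to null out the indirect identifiers (rather than generalize them) are assumptions made for the example, not a description of any product's implementation.

```python
from collections import Counter


def k_anonymize(rows, quasi_identifiers, k):
    """Suppress quasi-identifier values that fewer than k rows share.

    A combination seen in fewer than k rows could be linked back to an
    individual, so those values are replaced with None. Input rows are
    not modified.
    """
    def key(row):
        return tuple(row[q] for q in quasi_identifiers)

    counts = Counter(key(row) for row in rows)
    result = []
    for row in rows:
        out = dict(row)
        if counts[key(row)] < k:
            for q in quasi_identifiers:
                out[q] = None
        result.append(out)
    return result


rides = [
    {"zip": "21201", "age": 34, "tip": 2.0},
    {"zip": "21201", "age": 34, "tip": 5.0},
    {"zip": "21230", "age": 71, "tip": 1.0},
]
for row in k_anonymize(rides, ["zip", "age"], k=2):
    print(row)
```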
Taxi data properly anonymized while
providing utility
Generalize: remove
precision from time
and space
Randomize: replace
with false but
legitimate values at
a specified rate
Mask:
using salted
deterministic hash
Column types shown: Direct Identifier, Indirect Identifiers, Sensitive
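The three operations named on this slide can be sketched as follows. This is an assumption-laden illustration: the salt handling, the rounding granularity (hour and two decimal places), and all function names are choices made for the example, and the medallion value is made up.

```python
import hashlib
import random
from datetime import datetime


def mask(value, salt):
    """Salted deterministic hash: same input and salt always give the same token."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


def generalize_time(ts):
    """Remove precision from time by truncating to the hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def generalize_location(lat, lon, places=2):
    """Remove precision from space by rounding the coordinates."""
    return (round(lat, places), round(lon, places))


def randomize(value, candidates, rate, rng):
    """Replace the value with a legitimate-looking alternative at the given rate."""
    if rng.random() < rate:
        return rng.choice(candidates)
    return value


rng = random.Random(42)
salt = b"keep-this-secret"  # hypothetical salt; a real one must be protected
ride = {
    "medallion": mask("9Y99", salt),
    "pickup_time": generalize_time(datetime(2013, 7, 8, 21, 47, 13)),
    "pickup_location": generalize_location(40.758896, -73.985130),
    "tip": randomize(2.5, [0.0, 1.0, 2.0, 3.0], rate=0.2, rng=rng),
}
print(ride)
```

Because the hash is deterministic, rides by the same medallion can still be grouped together, which preserves utility while hiding the identifier itself.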
Terminology (Background)
Attack occurs when the
potential for re-identification
exists. Factors include:
● Access
● External Knowledge
● Incentives
Attack Event (A) represents
the probability that an attack
occurs
Success Event (S)
represents the probability
that an attack is successful
[Diagram: Attack, showing events A and S]
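One common way to combine the two events, offered here as an assumption rather than as a formula taken from the slides, is to treat overall re-identification risk as the probability that an attack both occurs and succeeds:

```latex
\[
  \text{Risk} \;=\; P(A \cap S) \;=\; P(A)\, P(S \mid A)
\]
```

Under this reading, anything that makes an attack less likely lowers P(A), and anything that makes an attempted attack less likely to succeed lowers P(S | A).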
Data Risk
Risk
Mitigation: modify data to limit
the ability of an adversary to
make inferences
Inferences
● Record ownership
● Participation
● Attribute Values
Techniques
● k-Anon
● LDP
● DP
● Masking
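As a sketch of the DP entry in this list, the Laplace mechanism below adds calibrated noise to a count. It is a textbook construction under stated assumptions (a counting query with sensitivity 1, a hypothetical epsilon, and invented function names), not a description of how any particular product implements differential privacy.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(values, predicate, epsilon, rng):
    """Noisy count of matching values.

    Adding or removing one record changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)
tips = [1.0, 2.5, 6.0, 8.0, 0.5]
print(dp_count(tips, lambda t: t > 5.0, epsilon=1.0, rng=rng))
```

A smaller epsilon means more noise and a stronger privacy guarantee; a larger epsilon means answers closer to the true count.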
Context Risk
Risk
Mitigation “shrinks” the
attack surface.
Controls
● Limiting Access
● Limiting types of Queries
● Purpose Limitations
● Agreements
● Creating Disincentives
● Training
[Figure: three attack (A) and success (S) diagrams, each labeled with Risk and Utility]
Ok, but I put all this effort into
Sentry / Ranger, this seems
like a big change...
Migration Utility from Ranger/Sentry → Immuta
Migrate policies but
also modernize
DEMO...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...karishmasinghjnh
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...amitlee9823
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachBoston Institute of Analytics
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...only4webmaster01
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 

Recently uploaded (20)

(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning Approach
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 

Migrate and Modernize Hadoop-Based Security Policies for Databricks

  • 1. Migrate and Modernize Hadoop-Based Security Policies for Databricks. Steve Touw, CTO, Immuta
  • 2. Can I just migrate my Apache Ranger/Sentry Policies Directly to [Databricks]? [presto] [synapse] [snowflake] [starburst] [etc…]
  • 3. Can I just migrate my Apache Ranger/Sentry Policies Directly to [Databricks]? Migrate: Yes! Modernize: No! How do I get to Yes for both? (that’s what this talk is about…)
  • 5. 2012: Development of Cloudera Access (later renamed to Sentry) starts. 2013: XA Secure created, later acquired by Hortonworks. A lot has changed in 8 years...
  • 6. Hadoop is No Longer The Center of the Universe. Multi-cloud, Multi-compute. Managing compute-specific controls across more than one of these systems is not feasible.
  • 7. Data Protection Laws of the World...Growing https://www.dlapiperdataprotection.com/
  • 8. WHY IMMUTA: Privacy Rules & Regulations driving data “fuel crisis”. Timeline, 1990 to 2025: HIPAA (1996), GLBA (1999), HITECH (2009), GDPR (2018), CCPA (2020), 350+ Privacy & Infosec Bills Proposed. [Chart labels: Data Legally Usable for Analytics; Compliant Data for Analytics; The Data “Fuel” Crisis]
  • 9. WHY IMMUTA We need to secure our data. I need to use our data. LEGAL / COMPLIANCE DATA ANALYSTS & SCIENTISTS So the data “tug of war” has begun… DATA DATA PLATFORM OWNER / DATA ENGINEERING
  • 10. More Complexity, Changing Definitions of Privacy Preservation Language from CCPA (and other similar language in GDPR) “1798.145(a)(5): The obligations imposed on businesses by this title shall not restrict a business’ ability to collect, use, retain, sell, or disclose consumer information that is deidentified or in the aggregate consumer information.” Meaning, if you deidentify/anonymize the data, CCPA doesn’t apply, yay! But, nothing in life is free… PI is defined as information "that identifies, relates to, describes, is capable of being associated with, or could reasonably be linked." !!!!!
  • 11. The Privacy vs Utility Tradeoff: How to balance the speed of the business with secure access to sensitive data? [Diagram: THE RISK OF DATA USE, a spectrum from FULL PRIVACY (Closed) to FULL UTILITY (Open) with a “Sweet spot” marked. Labels: LEGAL / COMPLIANCE; DATA ANALYSTS & SCIENTISTS; Momentum. Callout: More stringent definitions are swinging the pendulum here.]
  • 12. The World has Changed. We are in: The “Cloud Private Data Era” More regulatory and privacy concerns More stringent definitions of privacy preservation Complex data platform ecosystem
  • 13. The “Cloud Private Data Era” Has Created a Role Tidal Wave More regulatory and privacy concerns More stringent definitions of privacy preservation Complex data platform ecosystem
  • 14. Role Explosion Example (Real Customer Use Case). Each row-level policy in Ranger is tied to an individual role, but they are all doing the “same thing”. If you want to show new data, you need a new Role and a new Policy. This isn’t just Ranger: think AWS IAM Roles too! [Redacted screenshot. Callouts: user associated to role; the exact same policy written over and over again; the only change: the role]
  • 15. Role-Based Access Control (RBAC) is Broken ▪ RBAC should really be named “Static- based Access Control” ▪ It’s like writing code without being able to use variables!
  • 16. Conceived Before the Cloud Private Data Era. 2012: Development of Cloudera Access (later renamed to Sentry) starts. 2013: XA Secure created, later acquired by Hortonworks.
  • 17. You Must Do Both… If You Don’t, You Won’t Realize the Benefits of the Cloud. Migrate: Yes! Modernize: Yes!
  • 18. Let’s Cover How To Fix Each of These... Attribute-based Access Control (ABAC) Privacy Enhancing Technologies (PETs) Separation of Policy from Platform More regulatory and privacy concerns More stringent definitions of privacy preservation Complex data platform ecosystem
  • 20. Separate Policy from Platform Just like the big data era required the separation of compute from storage, the private data era requires the separation of policy from platform. This allows defining policy externally from the platform and executing enforcement live in the platform without creating data copies/views. ● Table access controls ● Column level controls ● Row level security ● Cell-level controls In a consistent manner, no matter your compute
  • 21. You Must Also Separate Policy from Physical. Thousands of tables and columns → Policies: thousands of policies. Abstract with logical metadata (PII, PHI, Address, SSN, etc...) → Very few, understandable, policies.
  • 22. Let’s Cover How To Fix Each of These... Attribute-based Access Control (ABAC) Privacy Enhancing Technologies (PETs) Separation of Policy from Platform More regulatory and privacy concerns More stringent definitions of privacy preservation Complex data platform ecosystem
  • 23. Remember This? ▪ RBAC should really be named “Static-based Access Control” ▪ It’s like writing code without being able to use variables! Wouldn’t it have been nice to just write this with a variable and have the policy dynamically defined at RUN TIME? organization_name IN (SELECT org_name from redacted WHERE role IN (@role)) ▪ This is ABAC and it really should be called “Dynamic-based Access Control”
  • 24. Ranger/Hortonworks Real Customer Example. They had 8 rules per table times 12 tables for a total of 96 rules! [Redacted screenshot. Callouts: user associated to role; the exact same policy written over and over again; the only change: the role]
  • 25. With ABAC/Immuta, It’s a Single Policy! This is because it separates the user details from the policy and treats them as a read-time variable. This also future-proofs the policy. We can also build the rule once and have it apply to all 12 tables with our logical metadata layer (discussed previously). This also future-proofs adding new tables.
  • 26. Let’s Cover How To Fix Each of These... Attribute-based Access Control (ABAC) Privacy Enhancing Technologies (PETs) Separation of Policy from Platform More regulatory and privacy concerns More stringent definitions of privacy preservation Complex data platform ecosystem
  • 27. How Do We Hit The Privacy vs Utility Sweet Spot? How to balance the speed of the business with secure access to sensitive data? [Diagram: THE RISK OF DATA USE, a spectrum from FULL PRIVACY (Closed) to FULL UTILITY (Open) with a “Sweet spot” marked. Labels: LEGAL / COMPLIANCE; DATA ANALYSTS & SCIENTISTS]
  • 28. I know stuff about Judd and Leslie photo credit: Gawker
  • 29. New York Taxi & Limousine Commission • Data was released containing taxi pickups, dropoffs, location, time, amount, and tip amount, among others • This seems pretty harmless?
  • 30. Well, Judd and Leslie May Not Think It’s Harmless • This photo was geotagged (with time), so by simply querying by medallion and time, we know how much Judd and Leslie tip!
  • 31. Limit Features, Limit Records, Limit Functions. Reduced specificity: Regular Expressions for strings; Rounding for numeric data. Column restriction: Hide or replace values with NULL. Row restrictions: Restrict access to certain types of rows. Differential Privacy: Inject noise into aggregate measures based on privacy guarantees. Hashing/Encryption. Local DP: Randomly alter a percentage of data. Aggregate-Only: Only allow aggregate functions on data. K-anonymization: Suppress values that can lead to linkage attacks.
  • 32. Taxi data properly anonymized while providing utility. Generalize: remove precision from time and space. Randomize: replace with false but legitimate values at a specified rate. Mask: using salted deterministic hash. [Column labels: Direct Identifier; Indirect Identifiers; Sensitive]
  • 33. Terminology (Background). An attack occurs when the potential for re-identification exists. Factors include: ● Access ● External Knowledge ● Incentives. Attack Event (A) represents the probability that an attack occurs. Success Event (S) represents the probability that an attack is successful.
  • 34. Data Risk. Mitigation: modify data to limit the ability of an adversary to make inferences. Inferences: ● Record ownership ● Participation ● Attribute Values. Techniques: ● k-Anon ● LDP ● DP ● Masking. [Diagram labels: Risk, A, S]
  • 35. Context Risk. Mitigation “shrinks” the attack surface. Controls: ● Limiting Access ● Limiting types of Queries ● Purpose Limitations ● Agreements ● Creating Disincentives ● Training. [Diagram labels: Risk, A, S]
  • 36. [Diagram: three panels, each showing the Attack (A) and Success (S) events alongside Risk and Utility]
  • 37. Ok, but I put all this effort into Sentry / Ranger, this seems like a big change...
  • 38. Migration Utility from Ranger/Sentry → Immuta Migrate policies but also modernize
  • 40. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.
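
Slide 23's point, that the role should be a variable resolved at run time rather than hard-coded into one policy per role, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lookup table, role names, organization names, and function names are all hypothetical, and nothing here is Immuta's or Ranger's actual API. The sketch mirrors the slide's `organization_name IN (SELECT org_name ... WHERE role IN (@role))` filter with an in-memory dictionary standing in for the lookup table.

```python
# Minimal ABAC sketch. Every name and value below is a hypothetical
# illustration, not Immuta's or Ranger's API.

# Stand-in for the lookup table in slide 23's subquery: role -> org names.
ROLE_TO_ORGS = {
    "emea_analyst": {"Acme EU"},
    "us_analyst": {"Acme US", "Acme Canada"},
}


def allowed_orgs(user_roles):
    """Resolve the user's roles to organization names at read time."""
    orgs = set()
    for role in user_roles:
        orgs |= ROLE_TO_ORGS.get(role, set())
    return orgs


def apply_row_policy(rows, user_roles):
    """One policy for every user: keep rows whose organization_name is allowed."""
    allowed = allowed_orgs(user_roles)
    return [row for row in rows if row["organization_name"] in allowed]


rows = [
    {"organization_name": "Acme EU", "amount": 10},
    {"organization_name": "Acme US", "amount": 20},
    {"organization_name": "Acme Canada", "amount": 30},
]

# The same policy function serves both users; only the attribute changes.
emea_view = apply_row_policy(rows, {"emea_analyst"})
us_view = apply_row_policy(rows, {"us_analyst"})
```

Adding a new role in this sketch means adding one entry to the lookup table, not writing another policy, which is the contrast the deck draws with the 96 near-identical Ranger rules on slide 24.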