© 2015 MapR Technologies
Deploying a Governed Data Lake
Welcome
• Event will be recorded
• Ask your questions in the Q&A Panel in the lower right-hand
corner of your screen
• Tweet us @mapr during the event
Key Points
• The data lake is becoming a “real-time” shared service that provides data to the business for data science and big data analytics
• As the data lake becomes a trusted source of data for big data analytics, security and data governance have to be addressed
• Security and data governance policies need to be implemented in a way that still enables self-service and quick time to value, rather than creating 3-6 month delays
Deliver Data Discovery Agility with a Governed “Data Layer”
• Adhere to security, compliance and data governance policies
• Catalog data assets at scale, with secure provisioning to the business
• Find and understand best-suited and most trusted data
The danger of the data lake becoming a flea market
Photo: Botond Horvath / Shutterstock.com
• Big Data Architect: Can’t create and maintain an inventory fast enough
• Data Engineer/Data Scientist/Business Analyst: Can’t explore everything to find the best item
• CDO/Data Steward: Can’t tell what’s what and what can be trusted
Imagine shopping on Amazon.com
GOVERNANCE
Inventory
Find and Understand
Provision
Governed data lake is like Amazon.com for data in Hadoop
GOVERNANCE
Inventory
Find and Understand
Provision
The Governed Data Lake on Apache Hadoop
Sources
• Relational, SaaS, mainframe
• Documents, emails
• Log files, clickstreams
• Sensors
• Blogs, tweets, link data
Analytics
• Search
• Schema-less data exploration
• BI, reporting
• Ad-hoc integrated analytics
Operational Apps
• Recommendation
• Fraud Detection
• Logistics
MapR Data Platform (MapR-DB, MapR-FS)
Distribution including Apache Hadoop
Data Inventory: Find, understand and govern
The Governed Data Lake
Stages: Define, Ingest, Inventory, Explore, Provision, Wrangle/Model/Visualize
• Critical data elements
• Sensitive data elements
• Security and data governance policies
• Load
• Profile
• Automatic tagging
• Discover metadata and generate tags
• Discover data lineage
• Manage tags
• Browse/search inventory
• Inspect data quality
• Tag and annotate
• Bookmark
• Copy
• Authorized view
Governed data lake as a shared service
Data protection, authentication, authorization, auditing
Data Governance and Data Discovery Agility: can you achieve both?
Find, understand and govern data in Hadoop
Waterline Data is like Amazon.com for data in Hadoop
GOVERNANCE
Inventory
Find and Understand
Provision
Inventory
Find and Understand
Provision
Future: Generate Drill Views
Governance
The Governed Data Lake on Apache Hadoop with MapR
Sources
• Relational, SaaS, mainframe
• Documents, emails
• Log files, clickstreams
• Sensors
• Blogs, tweets, link data
Analytics
• Search
• Schema-less data exploration
• BI, reporting
• Ad-hoc integrated analytics
Operational Apps
• Recommendation
• Fraud Detection
• Logistics
MapR Data Platform (MapR-DB, MapR-FS)
Distribution including Apache Hadoop
Data Inventory: Find, understand and govern
Separate Distinct Data Sets via MapR Volumes
Volumes dramatically simplify management:
• Replication factor
• Scheduled mirroring
• Scheduled snapshots
• Data placement control
• User access and tracking
• Administrative permissions
Example volume layout:
/projects
  /tahoe
  /yosemite
/user
  /msmith
  /bjohnson
MapR Trust Model (Product Security)
1. Flexible Authentication
• Wire-level authentication for all services in the cluster
• NSA-level cryptographic algorithms
• Integration with LDAP, Active Directory and other third-party directory services
• Kerberos or username/password authentication
2. Granular Authorization
• Access Control Expressions
• Protect files, tables, column families, columns, and management objects
• Extend to role-based access control (RBAC) with custom role functions
• Drill Views
3. Robust Auditing
• All events recorded immediately in JSON log files
• Includes data access and administrative actions
• Ad-hoc queries and custom reports on audit logs via SQL and standard BI tools
4. Ubiquitous Data Protection
• Encryption for Data in Motion
  • Within a Cluster
  • Between Clusters
  • Between Client and Cluster
• Encryption for Data at Rest
  • LUKS
  • Self-Encrypting Disk
  • Partners
MapR Comprehensive Auditing
Serving Security Analysts…
Monitoring, Incident Response, Security
• Who touched customer records outside of business hours?
• What actions did users take in the days before leaving the company?
• What operations were performed without following change control?
• Are users accessing sensitive files from protected/secured source IPs?
• Why do my reports look different, despite sourcing from the same underlying data?
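The first question above can be sketched as a filter over parsed audit records. This is a minimal illustration, not MapR tooling: the field names follow the JSON sample on the "MapR Audits – Key Features" slide, while the record values, the 9:00 to 17:00 business-hours window, and the helper names are all assumptions made for the example.

```python
from datetime import datetime

# Hypothetical, already-parsed audit records. Field names mirror the sample
# record in this deck; users, paths and times are invented for illustration.
RECORDS = [
    {"user": "msmith", "operation": "GETATTR",
     "srcPath": "/projects/tahoe/customers.csv",
     "timestamp": datetime(2015, 6, 1, 23, 15)},
    {"user": "bjohnson", "operation": "GETATTR",
     "srcPath": "/projects/tahoe/customers.csv",
     "timestamp": datetime(2015, 6, 1, 10, 30)},
    {"user": "msmith", "operation": "GETATTR",
     "srcPath": "/projects/yosemite/logs",
     "timestamp": datetime(2015, 6, 2, 2, 5)},
]


def outside_business_hours(ts, start_hour=9, end_hour=17):
    """True on weekends, before start_hour, or at/after end_hour (assumed window)."""
    return ts.weekday() >= 5 or not (start_hour <= ts.hour < end_hour)


def after_hours_access(records, path_prefix):
    """Return the sorted users who touched paths under path_prefix after hours."""
    return sorted({
        r["user"] for r in records
        if r["srcPath"].startswith(path_prefix)
        and outside_business_hours(r["timestamp"])
    })


print(after_hours_access(RECORDS, "/projects/tahoe"))
```

The same filter shape would cover the other questions by swapping the predicate, for example matching on a source IP range instead of the hour of day.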
MapR Comprehensive Auditing (cont.)
…And Data Scientists Too
Predictive Analytics
• Which data is used most frequently? Implication: High Value; Share More Broadly
• Which data is least commonly used? Implication: Low Value; Candidate for Purge
• Which data should be used more? Implication: Underutilized; Increase Awareness
• What administrative actions are most commonly performed? Implication: Candidate for automation
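The usage questions above reduce to counting accesses per data set. The sketch below assumes the audit records have already been reduced to a list of accessed paths and that a catalog of known data sets exists; both lists and the function name are hypothetical.

```python
from collections import Counter

# Hypothetical srcPath values pulled from audit records, plus a hypothetical
# list of data sets known to the inventory.
ACCESSED_PATHS = [
    "/projects/tahoe/sales",
    "/projects/tahoe/sales",
    "/projects/tahoe/sales",
    "/projects/yosemite/clicks",
]
KNOWN_PATHS = [
    "/projects/tahoe/sales",
    "/projects/yosemite/clicks",
    "/user/msmith/scratch",
]


def usage_report(accessed_paths, known_paths):
    """Rank known data sets by access count and flag those never accessed."""
    counts = Counter(accessed_paths)
    # Most-used first; ties are broken alphabetically for a stable order.
    ranked = sorted(known_paths, key=lambda p: (-counts[p], p))
    used = [p for p in ranked if counts[p] > 0]
    return {
        "most_used": used[0] if used else None,      # high value, share more broadly
        "least_used": used[-1] if used else None,    # low value, candidate for purge
        "never_used": [p for p in ranked if counts[p] == 0],
    }


print(usage_report(ACCESSED_PATHS, KNOWN_PATHS))
```

Whether a never-used data set is a purge candidate or an underutilized asset is a judgment the counts alone cannot make; the report only surfaces the candidates.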
MapR Audits – Key Features
Data Access
• Files
• MapR-DB Tables
Cluster Operations
• Administrative Operations
• Maprcli commands
Authentication Requests
Secure
High Performance
Flexible
• Retention Period
• Maxsize
• Coalesce Interval
JSON Format
{"timestamp":"{$date=2015-06-01T05:24:58.231Z}","operation":"GETATTR","user":"root","uid":"0","ipAddress":"10.10.x.x","nfsServer":"10.10.x.x","srcPath":"/dbtest.0/","srcFid":"2147.16.2","VolumeName":"mktg_files","volumeId":"mktg_files","status":"0"}
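A record like the one above can be loaded with a standard JSON parser. The sketch below assumes one JSON object per log line. The slide's rendering of the timestamp is ambiguous, so the helper accepts both a nested {"$date": ...} object (an assumption about the on-disk shape) and the string form printed on the slide. All function names are hypothetical.

```python
import json
import re
from datetime import datetime, timezone

# Sample line modeled on the slide. The nested {"$date": ...} timestamp is an
# assumption; every other field is copied from the slide.
SAMPLE_LINE = (
    '{"timestamp":{"$date":"2015-06-01T05:24:58.231Z"},'
    '"operation":"GETATTR","user":"root","uid":"0",'
    '"ipAddress":"10.10.x.x","nfsServer":"10.10.x.x",'
    '"srcPath":"/dbtest.0/","srcFid":"2147.16.2",'
    '"VolumeName":"mktg_files","volumeId":"mktg_files","status":"0"}'
)

ISO_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z")


def parse_timestamp(raw):
    """Accept {"$date": "..."} or any string that embeds an ISO-8601 UTC time."""
    text = raw["$date"] if isinstance(raw, dict) else str(raw)
    match = ISO_RE.search(text)
    if match is None:
        raise ValueError(f"no ISO-8601 timestamp in {raw!r}")
    stamp = match.group(0)
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ" if "." in stamp else "%Y-%m-%dT%H:%M:%SZ"
    return datetime.strptime(stamp, fmt).replace(tzinfo=timezone.utc)


def parse_audit_line(line):
    """Decode one audit log line and replace its timestamp with a datetime."""
    record = json.loads(line)
    record["timestamp"] = parse_timestamp(record["timestamp"])
    return record


record = parse_audit_line(SAMPLE_LINE)
print(record["user"], record["operation"], record["srcPath"], record["timestamp"])
```

Once records are in this form, they can be filtered or aggregated with ordinary Python, or loaded into whatever SQL or BI tool is already in use.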
Access Control that Scales
• PAM Authentication + User Impersonation
• Fine-grained row and column level access control with Drill Views; no centralized security repository required
[Diagram: Files, HBase, Hive; Drill View 1, Drill View 2; Users]
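To make row and column level control concrete, the sketch below models views as plain Python functions over rows. In Drill these would be SQL views; Python is used here only as a stand-in. The rows and the masking pattern are copied from the raw-file and Data Scientist tables on the "Ownership Chaining" slide, and the per-state view is a hypothetical addition to show row filtering.

```python
# Rows copied from the raw-file table on the "Ownership Chaining" slide.
RAW_ROWS = [
    {"name": "Dave", "city": "San Jose", "state": "CA", "card": "1374-7914-3865-4817"},
    {"name": "John", "city": "Boulder", "state": "CO", "card": "1374-9735-1794-9711"},
]


def mask_card(card):
    """Keep the first group of digits and overwrite the remaining groups with 1s."""
    first, *rest = card.split("-")
    return "-".join([first] + ["1" * len(group) for group in rest])


def scientist_view(rows):
    """Column-level control: all columns visible, card number masked."""
    return [{**row, "card": mask_card(row["card"])} for row in rows]


def analyst_view(rows):
    """Column-level control: the card column is dropped entirely."""
    return [{k: v for k, v in row.items() if k != "card"} for row in rows]


def state_view(rows, state):
    """Row-level control (hypothetical): expose only the rows for one state."""
    return [row for row in analyst_view(rows) if row["state"] == state]


print(scientist_view(RAW_ROWS))
print(state_view(RAW_ROWS, "CO"))
```

Each view returns new rows and leaves the raw data untouched, which mirrors the idea that consumers are granted access to the view rather than to the underlying file.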
Ownership Chaining
Combine Self Service Exploration with Data Governance

Raw File (/raw/cards.csv): John (Owner)
Name  City      State  Credit Card #
Dave  San Jose  CA     1374-7914-3865-4817
John  Boulder   CO     1374-9735-1794-9711

Data Scientist (/views/V_Scientist): Jane (Read), John (Owner)
Name  City      State  Credit Card #
Dave  San Jose  CA     1374-1111-1111-1111
John  Boulder   CO     1374-1111-1111-1111

Analyst (/views/V_Analyst): Jack (Read), Jane (Owner)
Name  City      State
Dave  San Jose  CA
John  Boulder   CO

[Diagram: RAWFILE, V_Scientist, V_Analyst; legend: ownership chaining, access path]

Jack queries the view V_Analyst
• Does Jack have access to V_Analyst? -> YES
• Who is the owner of V_Analyst? -> Jane
• Drill accesses V_Analyst as Jane (Impersonation hop 1)
• Does Jane have access to V_Scientist? -> YES
• Who is the owner of V_Scientist? -> John
• Drill accesses V_Scientist as John (Impersonation hop 2)
• Does John have permissions on raw file? -> YES
• Who is the owner of raw file? -> John
• Drill accesses source file as John (no impersonation here)
*Ownership chain length (# hops) is configurable
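The access path above can be traced with a small simulation. This is a sketch of the checks as the slide describes them, not Drill's implementation: the DataObject class, the catalog dictionary, the resolve_access function and the max_hops default of 3 are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataObject:
    owner: str
    readers: frozenset            # users with read access besides the owner
    source: Optional[str] = None  # object this view reads from; None for a raw file


# Owners and readers as given on the slide.
CATALOG = {
    "V_Analyst": DataObject("Jane", frozenset({"Jack"}), "V_Scientist"),
    "V_Scientist": DataObject("John", frozenset({"Jane"}), "/raw/cards.csv"),
    "/raw/cards.csv": DataObject("John", frozenset()),
}


def resolve_access(user, name, catalog, max_hops=3):
    """Walk the ownership chain; return (object, accessed_as) pairs or raise."""
    accessed_as, hops, trail = user, 0, []
    while True:
        obj = catalog[name]
        if accessed_as != obj.owner and accessed_as not in obj.readers:
            raise PermissionError(f"{accessed_as} cannot read {name}")
        if obj.source is None:
            # Raw file: read with the current identity, no further hop.
            trail.append((name, accessed_as))
            return trail
        # A view is expanded under its owner's identity; switching to a
        # different user counts as one impersonation hop.
        if obj.owner != accessed_as:
            hops += 1
            if hops > max_hops:
                raise PermissionError(f"ownership chain exceeds {max_hops} hops at {name}")
            accessed_as = obj.owner
        trail.append((name, accessed_as))
        name = obj.source


for obj_name, identity in resolve_access("Jack", "V_Analyst", CATALOG):
    print(f"{obj_name} accessed as {identity}")
```

Lowering max_hops to 1 makes the same query fail at V_Scientist, which illustrates how a configurable chain length bounds how far access can be delegated through nested views.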
Find, Understand and Govern Data in Hadoop
At Scale and in Real-Time
• CDO/Data Steward: Discover and protect sensitive data, audit and authorize access to the data lake, discover data lineage, and provide data stewardship
• Big Data Architect: Automate cataloging of data assets at scale, with secure provisioning to business users
• Data Engineer/Data Scientist/Business Analyst: Find and understand best-suited and most trusted data without having to explore every file manually
Learn More
www.waterlinedata.com
• Watch the solution video
• Read analyst papers
• Download the free Waterline Data / MapR sandbox
• Request a demo
• Download and evaluate the product
www.mapr.com
• Get free On-Demand Training for Hadoop
• Download the free Waterline Data / MapR sandbox

Best Practices to Deploy a Governed Data Lake
