Apache Atlas: Data Governance for Hadoop
Strata London 2015
Sean Roberts
Partner Engineering
London & EMEA
@seano
Data Governance
Availability
Usability
Integrity
Security
Data Governance Technology
Transparency
Reproducibility
Auditability
Consistency
ETL/DQ
BPM
Business Analytics
Visualization & Dashboards
ERP
CRM
SCM
MDM
ARCHIVE
Common Governance Framework
Use Cases
Financial Reporting
Chain of custody, Lineage narratives
Healthcare
30 day measures reporting
Retail
Point of sale analysis, Price optimization
Telco
Device log management, Correlation, Analysis & Mitigation
Challenges in Hadoop ecosystem
Ecosystem
No holistic approach
Business Demand
Apache Atlas: Data Governance for Hadoop
Open & co-development with users!
wiki.apache.org/incubator/AtlasProposal
Apache Atlas
Atlas: Capabilities
● Data Classification
● Metadata Exchange
● Centralized Auditing
● Search & Lineage
● Policy Engine
● Security
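As an illustration of the lineage capability, lineage can be thought of as a graph walk from a dataset back to its sources. The sketch below is a minimal example under that assumption; the dataset names and the `upstream` helper are hypothetical and are not part of Atlas.

```python
from collections import deque

# Hypothetical lineage edges: each dataset maps to the datasets it was derived from.
derived_from = {
    "sales_report": ["sales_clean"],
    "sales_clean": ["pos_raw", "store_dim"],
    "pos_raw": [],
    "store_dim": [],
}


def upstream(dataset, edges):
    """Return every dataset the given dataset depends on, nearest first."""
    seen, order, queue = set(), [], deque(edges.get(dataset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        order.append(current)
        queue.extend(edges.get(current, []))
    return order


print(upstream("sales_report", derived_from))
```

A breadth-first walk is used here so that direct sources are listed before more distant ones, which suits a chain-of-custody style report.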
Apache Atlas
Knowledge Store
● Models
● Type-System
● Policy Rules
● Taxonomies
Audit Store
Data Lifecycle Management
Policy Engine
Security
REST API
Services
● Search
● Lineage
● Exchange
Healthcare
● HIPAA
● HL7
Financial
● SOX
● Dodd-Frank
Energy
● PPDM
Retail
● PCI
● PII
Other
● CWM
Certification
● Metadata exchange
● Stability
● Interoperability
○ Low cost to switch
● Fosters innovation
Discovery
Tagging
Prep / Cleanse
ETL
Governance
BPM
Self Service
Visualization
Apache Atlas
Components
Atlas: Knowledge Store
Metadata exchange
Flexible Taxonomy
● Data sets/objects
● Tables/Columns
● Logical Context
● Source/Destination
Tech: Titan with HBase
● Pluggable
Type System
Class
Struct
Trait
Primitives
Collections
● Map
● Array
Instances (Entity)
● Referenceable
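The vocabulary on this slide (class, struct, trait, primitives, collections, referenceable instances) can be illustrated with a small in-memory model. This is a sketch for illustration only and not the Atlas API: `TypeDef`, `Entity`, `check`, the `hive_table` type, the `PII` trait and the `array<...>` / `map<...>` naming are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Hypothetical primitive type names; the slide only says "Primitives".
PRIMITIVES = {"string": str, "int": int, "boolean": bool}


@dataclass
class TypeDef:
    """A named type; kind is 'class', 'struct' or 'trait'."""
    name: str
    kind: str
    attributes: Dict[str, str] = field(default_factory=dict)  # attribute -> type name


@dataclass
class Entity:
    """An instance of a class type. 'Referenceable' is modelled as carrying an id
    that other instances can point to."""
    id: str
    type_name: str
    values: Dict[str, Any]
    traits: List[str] = field(default_factory=list)


def check(entity, types):
    """Return a list of problems found when checking an entity against its type."""
    problems = []
    for attr, type_name in types[entity.type_name].attributes.items():
        value = entity.values.get(attr)
        if type_name.startswith("array<"):
            inner = PRIMITIVES[type_name[6:-1]]
            ok = isinstance(value, list) and all(isinstance(v, inner) for v in value)
        elif type_name.startswith("map<"):  # map<T>: string keys assumed, values of type T
            inner = PRIMITIVES[type_name[4:-1]]
            ok = isinstance(value, dict) and all(isinstance(v, inner) for v in value.values())
        else:
            ok = isinstance(value, PRIMITIVES[type_name])
        if not ok:
            problems.append(f"{attr}: expected {type_name}")
    for trait in entity.traits:
        if trait not in types or types[trait].kind != "trait":
            problems.append(f"unknown trait {trait}")
    return problems


types = {
    "hive_table": TypeDef("hive_table", "class",
                          {"name": "string", "columns": "array<string>"}),
    "PII": TypeDef("PII", "trait"),
}
table = Entity("t-1", "hive_table",
               {"name": "customers", "columns": ["id", "email"]}, traits=["PII"])
print(check(table, types))
```

The trait is attached to the instance rather than to the class, which is one way to read "Trait" as a tag that can be applied to individual entities.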
Atlas: Data Lifecycle Management
Focus on:
● Provenance
● Replication
● Data retention/eviction
● Late data handling
● Automation
Tech: Falcon
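Retention/eviction and late data handling both come down to comparing timestamps against a configured window. The sketch below shows that idea only; it does not use Falcon, and the function names, the 90-day retention period and the 4-hour cut-off are assumptions chosen for the example.

```python
from datetime import datetime, timedelta


def partitions_to_evict(partition_dates, retention, now):
    """Return the partitions whose age exceeds the retention period."""
    return [d for d in partition_dates if now - d > retention]


def accepts_late_data(partition_date, arrival, late_cutoff):
    """Late records are accepted only if they arrive within the cut-off
    window after the partition they belong to."""
    return arrival - partition_date <= late_cutoff


now = datetime(2015, 5, 6)
partitions = [datetime(2015, 1, 1), datetime(2015, 4, 1), datetime(2015, 5, 1)]

print(partitions_to_evict(partitions, timedelta(days=90), now))
print(accepts_late_data(datetime(2015, 5, 1), datetime(2015, 5, 1, 6), timedelta(hours=4)))
```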
Atlas: Audit Store
Historical repository
● Security & Operational
● Indexed
● Searchable (DSL)
Tech:
● YARN ATS, HBase, Hive
● Solr, Elasticsearch
○ Pluggable
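The combination "historical repository, indexed, searchable" can be sketched as an append-only event list with a lookup index per field value. This is a toy in-memory stand-in, not the Atlas audit store or its DSL; the `AuditStore` class and the event fields (`user`, `action`, `resource`) are assumptions.

```python
from collections import defaultdict


class AuditStore:
    """Append-only list of audit events with a simple index per field value."""

    def __init__(self):
        self._events = []
        self._index = defaultdict(list)  # (field, value) -> positions in _events

    def record(self, **event):
        """Append an event and index each of its fields."""
        position = len(self._events)
        self._events.append(event)
        for key, value in event.items():
            self._index[(key, value)].append(position)

    def search(self, **criteria):
        """Return events matching all field=value criteria, in recorded order."""
        matches = None
        for item in criteria.items():
            positions = set(self._index.get(item, []))
            matches = positions if matches is None else matches & positions
        return [self._events[i] for i in sorted(matches or [])]


store = AuditStore()
store.record(user="alice", action="read", resource="customers")
store.record(user="bob", action="write", resource="customers")
store.record(user="alice", action="write", resource="orders")

print(store.search(user="alice", action="write"))
```

In a real deployment the index would live in a search backend such as the ones listed on the slide; the in-memory dictionary only shows the shape of the lookup.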
Atlas: Policy Engine
Metadata driven
Rationalized at runtime
Geo/Time based rules
Prohibitions
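A metadata-driven rule with geo/time conditions and prohibitions might be evaluated at request time roughly as follows. This is a sketch under stated assumptions, not the Atlas policy engine: the `Rule` fields, the tag names, the default-deny behaviour and the "any matching prohibition wins" ordering are all choices made for this example.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass
class Rule:
    tag: str                                 # metadata tag the rule applies to
    effect: str                              # "allow" or "prohibit"
    countries: Optional[Set[str]] = None     # None means any location
    hours: Optional[Tuple[int, int]] = None  # [start, end) hour of day; None means any time

    def applies(self, tags, country, hour):
        if self.tag not in tags:
            return False
        if self.countries is not None and country not in self.countries:
            return False
        if self.hours is not None and not (self.hours[0] <= hour < self.hours[1]):
            return False
        return True


def decide(rules, tags, country, hour):
    """Evaluate rules at request time; a matching prohibition wins, default is deny."""
    effects = {r.effect for r in rules if r.applies(tags, country, hour)}
    if "prohibit" in effects:
        return "deny"
    return "allow" if "allow" in effects else "deny"


rules = [
    Rule(tag="PII", effect="allow", countries={"GB", "DE"}, hours=(8, 18)),
    Rule(tag="EXPORT_RESTRICTED", effect="prohibit"),
]

print(decide(rules, {"PII"}, "GB", 10))
print(decide(rules, {"PII"}, "US", 10))
print(decide(rules, {"PII", "EXPORT_RESTRICTED"}, "GB", 10))
```

Because the rules refer to tags rather than to specific tables or files, retagging a dataset changes the decision without editing any rule.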
Atlas: Security
Enforces policies
Metadata driven
ABAC (not simple RBAC)
● Attribute-based access control
Tech: Ranger
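The difference between RBAC and ABAC can be shown with two small checks: the role-based one looks only at the user's roles, while the attribute-based one also considers attributes of the resource and of the request. This sketch does not use Ranger; the attribute names (`pii_trained`, `region`, `allowed_purposes`) are hypothetical.

```python
def rbac_allows(user, required_role):
    """Role-based check: only the user's roles are considered."""
    return required_role in user["roles"]


def abac_allows(user, resource, context):
    """Attribute-based check: user, resource and request attributes all count."""
    if "PII" in resource["tags"] and not user["pii_trained"]:
        return False
    if resource["region"] != user["region"]:
        return False
    return context["purpose"] in resource["allowed_purposes"]


analyst = {"roles": {"analyst"}, "region": "EMEA", "pii_trained": False}
customers = {"tags": {"PII"}, "region": "EMEA", "allowed_purposes": {"billing"}}

print(rbac_allows(analyst, "analyst"))
print(abac_allows(analyst, customers, {"purpose": "billing"}))
```

The same analyst passes the role check but fails the attribute check, because the resource carries a PII tag and the user lacks the matching attribute.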
Atlas: RESTful Interface
API everything
Atlas: Metadata Exchange
Metadata
Apache Atlas
Now & Future
MVP: ASF Incubated
● REST API
● UI
● Centralized Taxonomy
● Import / Export Metadata
● Documentation
2015 mid-year GA
● Policy Rules Engine
● Real-time Access Control
● Column Level Tagging
● Audit Store
2015 2H
● Enhanced Audit Store
○ Immutable File Format
○ Event Metadata Tagging
○ Advanced Reporting
● Advanced Policy Engine
● Row / Column Masking
● 3rd Party Metadata Exchange

 
INTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingINTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processing
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfconvolutional neural network and its applications.pdf
convolutional neural network and its applications.pdf
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptx
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 

Apache Atlas. Data Governance for Hadoop. Strata London 2015

  • 1. Apache Atlas Data Governance for Hadoop Sean Roberts Partner Engineering London & EMEA @seano
  • 4. Use Cases: Financial Reporting (chain of custody, lineage narratives); Healthcare (30-day measures reporting); Retail (point-of-sale analysis, price optimization); Telco (device log management, correlation, analysis & mitigation)
  • 5. Challenges in Hadoop ecosystem Ecosystem No holistic approach Business Demand
  • 7. Open & co-development with users! wiki.apache.org/incubator/AtlasProposal Apache Atlas
  • 8. Atlas: Capabilities ● Data Classification ● Metadata Exchange ● Centralized Auditing ● Search & Lineage ● Policy Engine ● Security [Architecture diagram labels: Apache Atlas; Knowledge Store; Audit Store; Models; Type-System; Policy Rules; Taxonomies; Data Lifecycle Management; Policy Engine; Security; REST API; Services; Search; Lineage; Exchange; Healthcare (HIPAA, HL7); Financial (SOX, Dodd-Frank); Energy (PPDM); Retail (PCI, PII); Other (CWM)]
  • 9. Certification ● Metadata exchange ● Stability ● Interoperability ○ Low cost to switch ● Fosters innovation Discovery Tagging Prep / Cleanse ETL Governance BPM Self Service Visualization
  • 11. Atlas: Knowledge Store Metadata exchange Flexible Taxonomy ● Data sets/objects ● Tables/Columns ● Logical Context ● Source/Destination Tech: Titan with HBase ● Pluggable
  • 12. Type System Class Struct Trait Primitives Collections ● Map ● Array Instances (Entity) ● Referenceable
  • 14. Atlas: Data Lifecycle Management Focus on: ● Provenance ● Replication ● Data retention/eviction ● Late data handling ● Automation Tech: Falcon
  • 15. Atlas: Audit Store Historical repository ● Security & Operational ● Indexed ● Searchable (DSL) Tech: ● YARN ATS, HBase, Hive ● Solr, ElasticSearch ○ Pluggable
  • 16. Atlas: Policy Engine ● Metadata driven ● Rationalized at runtime ● Geo/Time based rules ● Prohibitions
  • 17. Atlas: Security ● Enforces policies ● Metadata driven ● ABAC (not simple RBAC): Attribute-based access control Tech: Ranger
  • 18. Atlas: RESTful Interface API everything
  • 19. Atlas: Metadata Exchange
  • 21. MVP: ASF Incubated ● REST API ● UI ● Centralized Taxonomy ● Import / Export Metadata ● Documentation
  • 22. 2015 mid-year GA ● Policy Rules Engine ● Real-time Access Control ● Column Level Tagging ● Audit Store
  • 23. 2015 2H ● Enhanced Audit Store ○ Immutable File Format ○ Event Metadata Tagging ○ Advanced Reporting ● Advanced Policy Engine ● Row / Column Masking ● 3rd Party Metadata Exchange
  • 24. Apache Atlas Data Governance for Hadoop Sean Roberts @seano
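
Slides 12, 16 and 17 describe typed metadata instances carrying traits, and metadata-driven, attribute-based policies with geo/time rules and prohibitions. The sketch below illustrates that idea in plain Python. It is not the Atlas or Ranger API: every name in it (`Entity`, `Request`, `Prohibition`, `is_allowed`, the `PII` trait, the `table` type) is a hypothetical stand-in, and the rule semantics (a tagged resource is blocked unless the request comes from an allowed country inside an allowed time window) are an assumption made for illustration.

```python
from dataclasses import dataclass, field
from datetime import time
from typing import Any, Dict, FrozenSet, List


@dataclass
class Entity:
    """Hypothetical metadata instance: a typed object that carries trait tags."""
    type_name: str
    values: Dict[str, Any]
    traits: List[str] = field(default_factory=list)


@dataclass(frozen=True)
class Request:
    """Attributes of an access request (assumed: caller country and time of day)."""
    user_country: str
    at: time


@dataclass(frozen=True)
class Prohibition:
    """Assumed rule shape: restricts entities tagged with `trait`."""
    trait: str
    allowed_countries: FrozenSet[str]
    start: time
    end: time

    def blocks(self, entity: Entity, req: Request) -> bool:
        # The rule only applies to entities that carry the trait.
        if self.trait not in entity.traits:
            return False
        in_region = req.user_country in self.allowed_countries
        in_hours = self.start <= req.at <= self.end
        # Block unless both the geo and the time condition hold.
        return not (in_region and in_hours)


def is_allowed(entity: Entity, req: Request, prohibitions: List[Prohibition]) -> bool:
    """Evaluate at request time: allowed only if no prohibition blocks it."""
    return not any(p.blocks(entity, req) for p in prohibitions)


orders = Entity("table", {"name": "orders"}, traits=["PII"])
rule = Prohibition("PII", frozenset({"GB"}), time(9), time(17))

print(is_allowed(orders, Request("GB", time(10)), [rule]))  # True
print(is_allowed(orders, Request("US", time(10)), [rule]))  # False
```

The decision depends on attributes of the resource (its traits) and of the request (country, time) rather than on a fixed role, which is the distinction slide 17 draws between ABAC and simple RBAC.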