1© Cloudera, Inc. All rights reserved.
Enterprise Metadata Integration
Mirko Kämpf | Cloudera
GraphConnect 2017 – London
Who is speaking?
Solutions Architect @ Cloudera
- time series analysis, network analysis, data enrichment pipelines
- personal interest: QA systems and semantic search
Data Science Activities
The Detection of Emerging Trends Using Wikipedia Traffic Data
and Context Networks (PLOS ONE, 2015)
Hadoop.TS (IJCA, 2013)
Fluctuations in Wikipedia Access-Rate and Edit-Event Data.
(Physica A, 2012).
Our Approach: Multilayer Metadata Integration …
• Status dashboards are provided per use case.
• Each dashboard offers facts from multiple layers:
- (L1) technical layer
- (L2) operational metadata (Hadoop-specific only)
- (L3) application-specific operational metadata
- (L4) quality metrics (second-order metadata)
• Our Achievements:
- Graph database (Neo4j) allows context exploration.
- Cluster-spanning metadata exploration is now possible.
- Exposing inherent but sometimes hidden facts becomes as easy as writing an email.
Integration of facts
to gain business
knowledge
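A per-use-case dashboard that draws facts from the four layers can be pictured as a simple grouping step. The following is a minimal sketch, not the implementation from the talk: the names `Layer`, `Fact`, and `dashboard`, as well as the sample use case and values, are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    """The four metadata layers named on the slide."""
    TECHNICAL = "L1"
    OPERATIONAL = "L2"
    APPLICATION = "L3"
    QUALITY = "L4"


@dataclass(frozen=True)
class Fact:
    """One metadata fact (hypothetical structure, for illustration only)."""
    use_case: str
    layer: Layer
    name: str
    value: str


def dashboard(facts: list[Fact], use_case: str) -> dict[str, dict[str, str]]:
    """Group all facts of one use case by the layer they come from."""
    view: dict[str, dict[str, str]] = defaultdict(dict)
    for fact in facts:
        if fact.use_case == use_case:
            view[fact.layer.value][fact.name] = fact.value
    return dict(view)


# Hypothetical sample facts from two use cases.
facts = [
    Fact("churn", Layer.TECHNICAL, "hdfs_path", "/data/churn"),
    Fact("churn", Layer.QUALITY, "null_ratio", "0.02"),
    Fact("fraud", Layer.OPERATIONAL, "last_job_state", "SUCCEEDED"),
]
churn_view = dashboard(facts, "churn")
```

Each dashboard then only needs the view for its own use case, while the underlying facts stay in one shared collection.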
Intro
People have been mining … for centuries!
http://www.montanregion-erzgebirge.de/welterbe-erleben/montanregion-fuer-bergbauspezialisten/geschichtliches.html
gold & diamonds,
ore & coal,
minerals,
oil …
The outcome drives the whole economy.
People have used computers … for decades!
1938
Z1: the world’s first freely programmable device, created by Konrad Zuse.
http://www.horst-zuse.homepage.t-online.de/z1.html
2015
The U.S. Department of Energy uses an Intel supercomputer at Argonne National Laboratory.
http://www.intel.com/content/dam/www/public/us/en/images/photography-business/RWD/aurora-aerial-reflection-floor-rwd.png
DATA
MINING
http://codecondo.com/9-free-books-for-learning-data-mining-data-analysis/
Blog: About Learning Data Mining & Data Analysis
If data is the new oil …
… metadata are nuggets
and brilliants of our age.
Screenshot taken from:
https://www.quora.com/Who-should-get-credit-for-the-quote-data-is-the-new-oil
Diamonds: beautiful even as raw material
Brilliant: result of an expert’s work
Even more exciting in combination
with other material and skills …
Success Factors:
• Idea & Vision
• Material
• Skills / Methods
• Tools
http://www.burkhard-beyer.net/Reportage_Goldschmied.html
Be very careful with initial success …
… work towards a professional level!
High quality and reproducibility are results of professional management.
It is hard to believe what you can get and which options arise …
Manage overwhelming excitement!
Do not start new activities at random …
Let’s Think Data Driven!
• Build a mid-term or, better, a long-term strategy.
• Try to stay independent of a particular technology or tool.
Not the fancy toolset but rather data is what matters most.
• After initial success you should slow down and control the speed of expansion.
• Focus on maximizing the accessibility of data.
Google’s goal was to make the data of the internet accessible.
You should become your own Google!
Dataset Profiles / Flow Descriptors
• Our material is data & metadata:
- Data about data: descriptive data, Dublin Core metadata model, …
- Derived data: statistics extracted from processes, documents, …
- Results of ML/AI procedures: extracted structure and learned models
- Outcome of crowd-based operations: Wikipedia with its inherent structure, communication logs, access and edit history.
Knowledge Extraction for
Better Data Science
Science:
According to Wikipedia:
Science is a systematic
enterprise that builds and
organizes knowledge in the
form of testable explanations
and predictions about
the universe.
https://en.wikipedia.org/wiki/Science
Data Science:
My observation:
Commercial Data Science
is a systematic enterprise
that builds and organizes
knowledge in the form of
testable explanations and
predictions about the
market / business context.
https://en.wikipedia.org/wiki/Infographic#/media/File:Gartner_Hype_Cycle_for_Emerging_Technologies.gif
Details
Look into nature ….
Context
Look into nature ….
Result: Visualization of Facts
• An image shows what the text says.
> Multi-channel communication
• Data Science benefits from such an approach.
> Today we still use infographics
Difference:
The biologist who created the one on the left observed by eye.
Today, we use more and more data analysis methods.
Process: Knowledge Extraction is a Natural Process
• Combine multiple sources
• Repeat observation
• Incorporate context to explain differences/variation
• Cross-check to identify anomalies
Process: Knowledge Extraction is a Natural Process
Knowledge
Facts
Data
How did we implement EMDM?
- Hadoop-based: for scalability
- Open graph data model: for flexibility and connectivity
- Data-centric: following the Big Data paradigm
Big Data Processing:
e.g., with Hadoop
Big Graph Processing on Hadoop:
e.g., with Giraph
Project Name should stand for:
Graphs, Hadoop, and the ecosystem …
Data Science Process Model (DSPM)
• DSPM defines core artifacts for knowledge management
• Describes analysis / transformation context
• Allows repeatable execution
• Process properties become measurable
• Supports comparison of results from multiple procedures
• All those facts are essential ingredients of business optimization.
• But: Logging & tracking should never block creativity!
• Remember: Scientists often act like artists.
Toolbox and
Management Methods
Data Science Process Model (DSPM)
Representation of domain knowledge (in our case, data science in general)
Human Interaction
Ontology
Toolbox and Management Methods
Ability to solve a problem using IT and data
Technology Aspects
- represent and interact with facts & data
Data Governance
Certified QM
Semantic Logging
• Property with name: (K,V) is a key-value pair.
• Property of a thing: S => (K,V), so (S,P,O) is a triple; K becomes P, V becomes O.
• Many such triples in one common context named G: G => (S,P,O) is called a quad or named graph.
• Log4j is the logging standard we build on.
• Using structured data instead of plain strings allows easy parsing (e.g., Apache log format).
• Triple representation avoids specific parsing and makes log data part of the linked data graph.
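The mapping from key-value pairs to triples and quads can be sketched in a few lines of Python. This is an illustration under assumptions, not the Etosha or Log4j implementation: the names `Quad`, `to_quads`, and `format_quad`, and the sample subject and graph names, are hypothetical.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    graph: str      # G: name of the common context (named graph)
    subject: str    # S: the thing the property belongs to
    predicate: str  # P: the former key K
    obj: str        # O: the former value V


def to_quads(graph: str, subject: str, properties: dict[str, str]) -> list[Quad]:
    """Turn the key-value properties of one subject into quads in graph G."""
    return [Quad(graph, subject, key, value) for key, value in properties.items()]


def format_quad(q: Quad) -> str:
    """Render a quad in the slide's notation: G => (S, P, O)."""
    return f"{q.graph} => ({q.subject}, {q.predicate}, {q.obj})"


# Hypothetical log event: two properties of one dataset, logged by one job.
quads = to_quads("job-42", "dataset:clicks", {"rows": "1000", "format": "parquet"})
for quad in quads:
    print(format_quad(quad))
```

Because every log entry already has the (S,P,O) shape, a consumer can load it into a graph store without a format-specific parser.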
Etosha Toolbox
Data extractors,
Data transformers,
Ontology-based orchestration,
People and machines contribute facts,
Iterative approach with closed feedback loops,
Scalable environment …
CONCEPT
Multi-layer metadata capturing
Operational metrics
Metrics about fast & static data
Business metrics
Contextualized presentation
Ad-hoc queries for exploration
Graph-analytics
> Knowledge exposure
> Self-Service DS and BI can speak the same language.
INITIAL IMPLEMENTATION
Results: Access Facts & Context of Critical Processes
DEMO of context exploration:
https://www.youtube.com/watch?v=ZE7Gcanv90s&feature=youtu.be
Results: Better Collaboration for (Hadoop) Knowledge Workers
• Our Achievements:
- The open graph model is language-, OS-, and hardware-independent.
- Merging of knowledge partitions enables cluster-spanning metadata exploration.
- Query beans expose facts from multiple stores to a web-based interface.
• Next Steps:
- Improve implicit triplification (query the Solr index and get RDF data).
- Standardize the process and integrate with existing ontologies.
- Grow a community … and enter the Apache Incubator.
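Merging knowledge partitions can be pictured as a union of fact sets collected on different clusters. The sketch below is a simplification under assumptions, not the approach used in the talk (which builds on Hadoop and Neo4j): it uses plain Python sets, and the names `merge_partitions` and `facts_about` and the sample facts are hypothetical.

```python
from collections import defaultdict

# A fact as (graph, subject, predicate, object); the graph names the source cluster.
Quad = tuple[str, str, str, str]


def merge_partitions(*partitions: set[Quad]) -> set[Quad]:
    """Union the fact sets of several clusters; identical facts collapse into one."""
    return set().union(*partitions)


def facts_about(quads: set[Quad], subject: str) -> dict[str, list[tuple[str, str]]]:
    """Collect (predicate, object) pairs for one subject, grouped by source graph."""
    by_graph: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for graph, subj, predicate, obj in sorted(quads):
        if subj == subject:
            by_graph[graph].append((predicate, obj))
    return dict(by_graph)


# Hypothetical partitions from two clusters that describe the same table.
cluster_a = {("cluster-a", "table:sales", "rows", "1000")}
cluster_b = {
    ("cluster-b", "table:sales", "owner", "bi-team"),
    ("cluster-a", "table:sales", "rows", "1000"),  # replicated fact
}
merged = merge_partitions(cluster_a, cluster_b)
sales_facts = facts_about(merged, "table:sales")
```

Keeping the source graph in every fact is what lets a merged view still answer which cluster a statement came from.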
Thank you!
mirko@cloudera.com
@semanpix

More Related Content

What's hot

Migrate and Modernize Hadoop-Based Security Policies for Databricks
Migrate and Modernize Hadoop-Based Security Policies for DatabricksMigrate and Modernize Hadoop-Based Security Policies for Databricks
Migrate and Modernize Hadoop-Based Security Policies for DatabricksDatabricks
 
Future of Data Platform in Cloud Native world
Future of Data Platform in Cloud Native worldFuture of Data Platform in Cloud Native world
Future of Data Platform in Cloud Native worldSrivatsan Srinivasan
 
Lambda Architecture 2.0 for Reactive AB Testing
Lambda Architecture 2.0 for Reactive AB TestingLambda Architecture 2.0 for Reactive AB Testing
Lambda Architecture 2.0 for Reactive AB TestingTrieu Nguyen
 
Nodes2020 | Graph of enterprise_metadata | NEO4J Conference
Nodes2020 | Graph of enterprise_metadata | NEO4J ConferenceNodes2020 | Graph of enterprise_metadata | NEO4J Conference
Nodes2020 | Graph of enterprise_metadata | NEO4J ConferenceDeepak Chandramouli
 
platform for Machine Learning
 platform for Machine Learning platform for Machine Learning
platform for Machine LearningSivapriyaS12
 
What’s New with Databricks Machine Learning
What’s New with Databricks Machine LearningWhat’s New with Databricks Machine Learning
What’s New with Databricks Machine LearningDatabricks
 
Why do the majority of Data Science projects never make it to production?
Why do the majority of Data Science projects never make it to production?Why do the majority of Data Science projects never make it to production?
Why do the majority of Data Science projects never make it to production?Itai Yaffe
 
Quantum Computing: The next new technology in computing
Quantum Computing: The next new technology in computingQuantum Computing: The next new technology in computing
Quantum Computing: The next new technology in computingData Con LA
 
Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes John Archer
 
Architecture of Big Data Solutions
Architecture of Big Data SolutionsArchitecture of Big Data Solutions
Architecture of Big Data SolutionsGuido Schmutz
 
Operational Machine Learning: Using Microsoft Technologies for Applied Data S...
Operational Machine Learning: Using Microsoft Technologies for Applied Data S...Operational Machine Learning: Using Microsoft Technologies for Applied Data S...
Operational Machine Learning: Using Microsoft Technologies for Applied Data S...Khalid Salama
 
Migration and Coexistence between Relational and NoSQL Databases by Manuel H...
 Migration and Coexistence between Relational and NoSQL Databases by Manuel H... Migration and Coexistence between Relational and NoSQL Databases by Manuel H...
Migration and Coexistence between Relational and NoSQL Databases by Manuel H...Big Data Spain
 
TechEvent Building a Data Lake
TechEvent Building a Data LakeTechEvent Building a Data Lake
TechEvent Building a Data LakeTrivadis
 
Next generation Polyglot Architectures using Neo4j by Stefan Kolmar
Next generation Polyglot Architectures using Neo4j by Stefan KolmarNext generation Polyglot Architectures using Neo4j by Stefan Kolmar
Next generation Polyglot Architectures using Neo4j by Stefan KolmarBig Data Spain
 
Summary introduction to data engineering
Summary introduction to data engineeringSummary introduction to data engineering
Summary introduction to data engineeringNovita Sari
 
Data Mesh @ Yelp - 2019
Data Mesh @ Yelp - 2019Data Mesh @ Yelp - 2019
Data Mesh @ Yelp - 2019Steven Moy
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta LakeDatabricks
 
Privacy-Preserving AI Network - PlatON 2.0
Privacy-Preserving AI Network - PlatON 2.0 Privacy-Preserving AI Network - PlatON 2.0
Privacy-Preserving AI Network - PlatON 2.0 ShiHeng1
 
Neo4j-Databridge: Enterprise-scale ETL for Neo4j
Neo4j-Databridge: Enterprise-scale ETL for Neo4jNeo4j-Databridge: Enterprise-scale ETL for Neo4j
Neo4j-Databridge: Enterprise-scale ETL for Neo4jNeo4j
 

What's hot (20)

Migrate and Modernize Hadoop-Based Security Policies for Databricks
Migrate and Modernize Hadoop-Based Security Policies for DatabricksMigrate and Modernize Hadoop-Based Security Policies for Databricks
Migrate and Modernize Hadoop-Based Security Policies for Databricks
 
Data engineering design patterns
Data engineering design patternsData engineering design patterns
Data engineering design patterns
 
Future of Data Platform in Cloud Native world
Future of Data Platform in Cloud Native worldFuture of Data Platform in Cloud Native world
Future of Data Platform in Cloud Native world
 
Lambda Architecture 2.0 for Reactive AB Testing
Lambda Architecture 2.0 for Reactive AB TestingLambda Architecture 2.0 for Reactive AB Testing
Lambda Architecture 2.0 for Reactive AB Testing
 
Nodes2020 | Graph of enterprise_metadata | NEO4J Conference
Nodes2020 | Graph of enterprise_metadata | NEO4J ConferenceNodes2020 | Graph of enterprise_metadata | NEO4J Conference
Nodes2020 | Graph of enterprise_metadata | NEO4J Conference
 
platform for Machine Learning
 platform for Machine Learning platform for Machine Learning
platform for Machine Learning
 
What’s New with Databricks Machine Learning
What’s New with Databricks Machine LearningWhat’s New with Databricks Machine Learning
What’s New with Databricks Machine Learning
 
Why do the majority of Data Science projects never make it to production?
Why do the majority of Data Science projects never make it to production?Why do the majority of Data Science projects never make it to production?
Why do the majority of Data Science projects never make it to production?
 
Quantum Computing: The next new technology in computing
Quantum Computing: The next new technology in computingQuantum Computing: The next new technology in computing
Quantum Computing: The next new technology in computing
 
Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes
 
Architecture of Big Data Solutions
Architecture of Big Data SolutionsArchitecture of Big Data Solutions
Architecture of Big Data Solutions
 
Operational Machine Learning: Using Microsoft Technologies for Applied Data S...
Operational Machine Learning: Using Microsoft Technologies for Applied Data S...Operational Machine Learning: Using Microsoft Technologies for Applied Data S...
Operational Machine Learning: Using Microsoft Technologies for Applied Data S...
 
Migration and Coexistence between Relational and NoSQL Databases by Manuel H...
 Migration and Coexistence between Relational and NoSQL Databases by Manuel H... Migration and Coexistence between Relational and NoSQL Databases by Manuel H...
Migration and Coexistence between Relational and NoSQL Databases by Manuel H...
 
TechEvent Building a Data Lake
TechEvent Building a Data LakeTechEvent Building a Data Lake
TechEvent Building a Data Lake
 
Next generation Polyglot Architectures using Neo4j by Stefan Kolmar
Next generation Polyglot Architectures using Neo4j by Stefan KolmarNext generation Polyglot Architectures using Neo4j by Stefan Kolmar
Next generation Polyglot Architectures using Neo4j by Stefan Kolmar
 
Summary introduction to data engineering
Summary introduction to data engineeringSummary introduction to data engineering
Summary introduction to data engineering
 
Data Mesh @ Yelp - 2019
Data Mesh @ Yelp - 2019Data Mesh @ Yelp - 2019
Data Mesh @ Yelp - 2019
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
 
Privacy-Preserving AI Network - PlatON 2.0
Privacy-Preserving AI Network - PlatON 2.0 Privacy-Preserving AI Network - PlatON 2.0
Privacy-Preserving AI Network - PlatON 2.0
 
Neo4j-Databridge: Enterprise-scale ETL for Neo4j
Neo4j-Databridge: Enterprise-scale ETL for Neo4jNeo4j-Databridge: Enterprise-scale ETL for Neo4j
Neo4j-Databridge: Enterprise-scale ETL for Neo4j
 

Similar to Enterprise Metadata Integration, Cloudera

Simplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduSimplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduCloudera, Inc.
 
Data Science in the Enterprise
Data Science in the EnterpriseData Science in the Enterprise
Data Science in the EnterpriseThe Hive
 
Machine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to ImplementationMachine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to ImplementationDataWorks Summit
 
The Vision & Challenge of Applied Machine Learning
The Vision & Challenge of Applied Machine LearningThe Vision & Challenge of Applied Machine Learning
The Vision & Challenge of Applied Machine LearningCloudera, Inc.
 
6 enriching your data warehouse with big data and hadoop
6 enriching your data warehouse with big data and hadoop6 enriching your data warehouse with big data and hadoop
6 enriching your data warehouse with big data and hadoopDr. Wilfred Lin (Ph.D.)
 
Part 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to EndPart 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to EndCloudera, Inc.
 
Artificial Intelligence and Machine Learning with the Oracle Data Science Cloud
Artificial Intelligence and Machine Learning with the Oracle Data Science CloudArtificial Intelligence and Machine Learning with the Oracle Data Science Cloud
Artificial Intelligence and Machine Learning with the Oracle Data Science CloudJuarez Junior
 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Cloudera, Inc.
 
From Insight to Action: Using Data Science to Transform Your Organization
From Insight to Action: Using Data Science to Transform Your OrganizationFrom Insight to Action: Using Data Science to Transform Your Organization
From Insight to Action: Using Data Science to Transform Your OrganizationCloudera, Inc.
 
Part 1: Introducing the Cloudera Data Science Workbench
Part 1: Introducing the Cloudera Data Science WorkbenchPart 1: Introducing the Cloudera Data Science Workbench
Part 1: Introducing the Cloudera Data Science WorkbenchCloudera, Inc.
 
Hadoop Application Architectures tutorial at Big DataService 2015
Hadoop Application Architectures tutorial at Big DataService 2015Hadoop Application Architectures tutorial at Big DataService 2015
Hadoop Application Architectures tutorial at Big DataService 2015hadooparchbook
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 
Analyzing Hadoop Data Using Sparklyr

Analyzing Hadoop Data Using Sparklyr
Analyzing Hadoop Data Using Sparklyr

Analyzing Hadoop Data Using Sparklyr
Cloudera, Inc.
 
Becoming Data-Driven Through Cultural Change
Becoming Data-Driven Through Cultural ChangeBecoming Data-Driven Through Cultural Change
Becoming Data-Driven Through Cultural ChangeCloudera, Inc.
 
Big Data Day LA 2015 - Brainwashed: Building an IDE for Feature Engineering b...
Big Data Day LA 2015 - Brainwashed: Building an IDE for Feature Engineering b...Big Data Day LA 2015 - Brainwashed: Building an IDE for Feature Engineering b...
Big Data Day LA 2015 - Brainwashed: Building an IDE for Feature Engineering b...Data Con LA
 
Data Science in Enterprise
Data Science in EnterpriseData Science in Enterprise
Data Science in EnterpriseJosh Yeh
 
Standing Up an Effective Enterprise Data Hub -- Technology and Beyond
Standing Up an Effective Enterprise Data Hub -- Technology and BeyondStanding Up an Effective Enterprise Data Hub -- Technology and Beyond
Standing Up an Effective Enterprise Data Hub -- Technology and BeyondCloudera, Inc.
 
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...DataStax Academy
 
Embedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern StaenderEmbedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern StaenderDataconomy Media
 

Similar to Enterprise Metadata Integration, Cloudera (20)

Simplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduSimplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache Kudu
 
Data Science in the Enterprise
Data Science in the EnterpriseData Science in the Enterprise
Data Science in the Enterprise
 
Machine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to ImplementationMachine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to Implementation
 
The Vision & Challenge of Applied Machine Learning
The Vision & Challenge of Applied Machine LearningThe Vision & Challenge of Applied Machine Learning
The Vision & Challenge of Applied Machine Learning
 
6 enriching your data warehouse with big data and hadoop
6 enriching your data warehouse with big data and hadoop6 enriching your data warehouse with big data and hadoop
6 enriching your data warehouse with big data and hadoop
 
Part 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to EndPart 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to End
 
Artificial Intelligence and Machine Learning with the Oracle Data Science Cloud
Artificial Intelligence and Machine Learning with the Oracle Data Science CloudArtificial Intelligence and Machine Learning with the Oracle Data Science Cloud
Artificial Intelligence and Machine Learning with the Oracle Data Science Cloud
 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 

 
From Insight to Action: Using Data Science to Transform Your Organization
From Insight to Action: Using Data Science to Transform Your OrganizationFrom Insight to Action: Using Data Science to Transform Your Organization
From Insight to Action: Using Data Science to Transform Your Organization
 
Part 1: Introducing the Cloudera Data Science Workbench
Part 1: Introducing the Cloudera Data Science WorkbenchPart 1: Introducing the Cloudera Data Science Workbench
Part 1: Introducing the Cloudera Data Science Workbench
 
Hadoop Application Architectures tutorial at Big DataService 2015
Hadoop Application Architectures tutorial at Big DataService 2015Hadoop Application Architectures tutorial at Big DataService 2015
Hadoop Application Architectures tutorial at Big DataService 2015
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 
Analyzing Hadoop Data Using Sparklyr

Analyzing Hadoop Data Using Sparklyr
Analyzing Hadoop Data Using Sparklyr

Analyzing Hadoop Data Using Sparklyr

 
Becoming Data-Driven Through Cultural Change
Becoming Data-Driven Through Cultural ChangeBecoming Data-Driven Through Cultural Change
Becoming Data-Driven Through Cultural Change
 
Big Data Day LA 2015 - Brainwashed: Building an IDE for Feature Engineering b...
Big Data Day LA 2015 - Brainwashed: Building an IDE for Feature Engineering b...Big Data Day LA 2015 - Brainwashed: Building an IDE for Feature Engineering b...
Big Data Day LA 2015 - Brainwashed: Building an IDE for Feature Engineering b...
 
Data Science in Enterprise
Data Science in EnterpriseData Science in Enterprise
Data Science in Enterprise
 
Big Data: Myths and Realities
Big Data: Myths and RealitiesBig Data: Myths and Realities
Big Data: Myths and Realities
 
Standing Up an Effective Enterprise Data Hub -- Technology and Beyond
Standing Up an Effective Enterprise Data Hub -- Technology and BeyondStanding Up an Effective Enterprise Data Hub -- Technology and Beyond
Standing Up an Effective Enterprise Data Hub -- Technology and Beyond
 
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
 
Embedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern StaenderEmbedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern Staender
 

More from Neo4j

QIAGEN: Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansQIAGEN: Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansNeo4j
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...Neo4j
 
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafosBBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafosNeo4j
 
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...Neo4j
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jNeo4j
 
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j
 
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdfRabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j
 
Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!Neo4j
 
IA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeIA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeNeo4j
 
Neo4j: Data Engineering for RAG (retrieval augmented generation)
Neo4j: Data Engineering for RAG (retrieval augmented generation)Neo4j: Data Engineering for RAG (retrieval augmented generation)
Neo4j: Data Engineering for RAG (retrieval augmented generation)Neo4j
 
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j
 
Enabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsEnabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsNeo4j
 
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdfNeo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdfNeo4j
 
Neo4j Jesus Barrasa The Art of the Possible with Graph
Neo4j Jesus Barrasa The Art of the Possible with GraphNeo4j Jesus Barrasa The Art of the Possible with Graph
Neo4j Jesus Barrasa The Art of the Possible with GraphNeo4j
 
SWIFT: Maintaining Critical Standards in the Financial Services Industry with...
SWIFT: Maintaining Critical Standards in the Financial Services Industry with...SWIFT: Maintaining Critical Standards in the Financial Services Industry with...
SWIFT: Maintaining Critical Standards in the Financial Services Industry with...Neo4j
 
Deloitte & Red Cross: Talk to your data with Knowledge-enriched Generative AI
Deloitte & Red Cross: Talk to your data with Knowledge-enriched Generative AIDeloitte & Red Cross: Talk to your data with Knowledge-enriched Generative AI
Deloitte & Red Cross: Talk to your data with Knowledge-enriched Generative AINeo4j
 

Enterprise Metadata Integration, Cloudera

  • 1. 1© Cloudera, Inc. All rights reserved. Enterprise Metadata Integration Mirko Kämpf | Cloudera GraphConnect 2017 – London
  • 2. Who is speaking?
      Solutions Architect @ Cloudera
      - time series analysis, network analysis, data enrichment pipelines
      - personal interest: QA systems and semantic search
      Data Science Activities
      - The Detection of Emerging Trends Using Wikipedia Traffic Data and Context Networks (PLOS ONE, 2015)
      - Hadoop.TS (IJCA, 2013)
      - Fluctuations in Wikipedia Access-Rate and Edit-Event Data (Physica A, 2012)
  • 3. Our Approach: Multilayer Metadata Integration … Integration of facts to gain business knowledge
      • Status dashboards are provided per use case.
      • Each dashboard offers facts from multiple layers:
        - (L1) technical layer
        - (L2) operational metadata (Hadoop-specific only)
        - (L3) application-specific operational metadata
        - (L4) quality metrics (second-order metadata)
      • Our achievements:
        - The graph database (Neo4j) allows context exploration.
        - Cluster-spanning metadata exploration is now possible.
        - Exposing inherent but sometimes hidden facts becomes as easy as writing an email.
  • 4. Intro
  • 5. People have been mining … for centuries! Gold & diamonds, ore & coal, minerals, oil … The outcome drives the whole economy. http://www.montanregion-erzgebirge.de/welterbe-erleben/montanregion-fuer-bergbauspezialisten/geschichtliches.html
  • 6. People have been using computers … for decades! 1938: Z1, the world’s first freely programmable device, created by Konrad Zuse. 2015: The U.S. Department of Energy uses an Intel supercomputer at Argonne National Laboratory. http://www.intel.com/content/dam/www/public/us/en/images/photography-business/RWD/aurora-aerial-reflection-floor-rwd.png http://www.horst-zuse.homepage.t-online.de/z1.html
  • 7. DATA MINING http://codecondo.com/9-free-books-for-learning-data-mining-data-analysis/ Blog: About Learning Data Mining & Data Analysis
  • 8. If data is the new oil … metadata are the nuggets and brilliants of our age. Screenshot taken from: https://www.quora.com/Who-should-get-credit-for-the-quote-data-is-the-new-oil
  • 9. Diamonds: beautiful even as raw material. Brilliant: the result of an expert’s work. Even more exciting in combination with other materials and skills …
  • 10. Success Factors: • Idea & Vision • Material • Skills / Methods • Tools http://www.burkhard-beyer.net/Reportage_Goldschmied.html
  • 11. Be very careful with initial success … work towards a professional level! High quality and reproducibility are the results of professional management. It is hard to believe what you can get and which options arise … Manage overwhelming excitement! Do not start new activities at random …
  • 12. Let’s Think Data Driven!
      • Build a mid-term or, better, a long-term strategy.
      • Try to stay independent of any particular technology or tool. Data, not the fancy toolset, is what matters most.
      • After initial success, slow down and control the speed of expansion.
      • Focus on maximizing the accessibility of data. Google’s goal was to make the data of the internet accessible. You should become your own Google!
  • 13. Dataset Profiles / Flow Descriptors
      • Our material is data & metadata:
        - Data about data: descriptive data, Dublin Core metadata model, …
        - Derived data: statistics extracted from processes, documents, …
        - Results of ML/AI procedures: extracted structure and learned models
        - Outcome of crowd-based operations: Wikipedia with its inherent structure, communication logs, access and edit history
  • 14. Knowledge Extraction for Better Data Science
  • 15. Science: According to Wikipedia: Science is a systematic enterprise that builds and organizes knowledge in the form of testable explanations and predictions about the universe. https://en.wikipedia.org/wiki/Science
  • 16. Data Science: My observation: Commercial data science is a systematic enterprise that builds and organizes knowledge in the form of testable explanations and predictions about the market / business context. https://en.wikipedia.org/wiki/Infographic#/media/File:Gartner_Hype_Cycle_for_Emerging_Technologies.gif
  • 17. Details: Look into nature …
  • 18. Context: Look into nature …
  • 19. Result: Visualization of Facts • An image shows what the text says. > Multi-channel communication • Data science benefits from such an approach. > Today we still use infographics. Difference: The biologist who created the image on the left observed by eye. Today, we increasingly use data analysis methods.
  • 20. Process: Knowledge Extraction is a Natural Process • Combine multiple sources • Repeat observations • Incorporate context to explain differences/variation • Cross-check to identify anomalies
  • 21. Process: Knowledge Extraction is a Natural Process (diagram labels: Knowledge, Facts, Data)
  • 22. How did we implement EMDM? - Hadoop based: for scalability - Open graph data model: for flexibility and connectivity - Data centric: following the Big Data paradigm
  • 23. Big Data Processing: e.g., with Hadoop
  • 24. Big Graph Processing on Hadoop: e.g., with Giraph
  • 25. Project Name should stand for: Graphs, Hadoop, and the ecosystem …
  • 26. Project Name should stand for: Graphs, Hadoop, and the ecosystem …
  • 27. Data Science Process Model (DSPM): Toolbox and Management Methods
      • DSPM defines core artifacts for knowledge management
      • Describes the analysis / transformation context
      • Allows repeatable execution
      • Process properties become measurable
      • Supports comparison of results from multiple procedures
      • All those facts are essential ingredients for business optimization.
      • But: Logging & tracking should never block creativity!
      • Remember: Scientists often act like artists.
  • 28. Data Science Process Model (DSPM) Representation of domain knowledge (in our case it is data science in general) Human Interaction Ontology Toolbox and Management Methods Ability to solve a problem using IT and data Technology Aspects - represent and interact with facts & data Data Governance Certified QM
  • 29. Semantic Logging
      • Property with a name: (K,V), a key-value pair
      • Property of a thing: S => (K,V), so (S,P,O) is a triple; K becomes P, V becomes O
      • Many such triples in one common context named G: G => (S,P,O) is called a quad or named graph
      • Log4J is the logging standard we build on.
      • Using structured data instead of plain strings allows easy parsing (e.g., Apache log format).
      • Triple representation avoids specific parsing and makes log data part of the linked data graph.
  • 30. Etosha Toolbox (CONCEPT): Data extractors; data transformers; ontology-based orchestration; people and machines contribute facts; iterative approach with closed feedback loops; scalable environment …
  • 31. Multi-layer metadata capturing (INITIAL IMPLEMENTATION): Operational metrics; metrics about fast & static data; business metrics; contextualized presentation; ad-hoc queries for exploration; graph analytics > Knowledge exposure > Self-service. DS and BI can speak the same language.
  • 32. Results: Access Facts & Context of Critical Processes. DEMO of context exploration: https://www.youtube.com/watch?v=ZE7Gcanv90s&feature=youtu.be
  • 33. Results: Better Collaboration for (Hadoop) Knowledge Workers
      • Our achievements:
        - The open graph model is language-, OS-, and hardware-independent.
        - Merging of knowledge partitions enables cluster-spanning metadata exploration.
        - Query beans expose facts from multiple stores to web-based interfaces.
      • Next steps:
        - Improve implicit triplification (query the Solr index and get RDF data)
        - Standardize the process and integrate with existing ontologies.
        - Grow a community … and enter the Apache Incubator.
  • 34. Thank you! mirko@cloudera.com @semanpix
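
The semantic logging idea from slide 29 and the merging of knowledge partitions from slide 33 can be sketched in a few lines of Python. This is only an illustrative sketch, not the Etosha implementation: the `Quad` type, the helper functions, the graph names, and the job properties are all hypothetical. It shows how key-value properties (K,V) of a subject S become triples (S,P,O), how a graph name G turns them into quads, and how quads logged on two clusters can be merged and queried together.

```python
from typing import Dict, List, NamedTuple


class Quad(NamedTuple):
    """One statement (subject, predicate, object) inside a named graph."""
    graph: str
    subject: str
    predicate: str
    obj: str


def to_quads(graph: str, subject: str, properties: Dict[str, object]) -> List[Quad]:
    """Turn the key-value properties of one subject into quads.

    Each key K becomes the predicate P and each value V becomes the object O;
    the graph name G records the common context of the statements.
    """
    return [Quad(graph, subject, key, str(value)) for key, value in properties.items()]


def facts_about(quads: List[Quad], subject: str) -> List[Quad]:
    """Return every statement about a subject, regardless of its graph."""
    return [q for q in quads if q.subject == subject]


# Hypothetical log events about the same job, recorded on two clusters.
cluster_a = to_quads(
    "cluster-a/logs", "job/etl-42", {"status": "SUCCEEDED", "durationSeconds": 118}
)
cluster_b = to_quads("cluster-b/logs", "job/etl-42", {"inputDataset": "dataset/clicks"})

# Merging the partitions is a plain concatenation, because every quad
# already carries its own context in the graph name.
merged = cluster_a + cluster_b
facts = facts_about(merged, "job/etl-42")
```

Because no statement depends on a log-line format, no format-specific parser is needed to combine the two partitions; a real deployment would presumably use URIs for subjects and predicates and a graph or RDF store instead of Python lists.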

Editor's Notes

  1. Results tell us about very specific properties of the system. Let's look at thermodynamics: http://images.google.de/imgres?imgurl=http%3A%2F%2F3.bp.blogspot.com%2F-tEkIR2kcyCY%2FVEcQJGrqb3I%2FAAAAAAAAABU%2F9Nj4hxeuqa0%2Fs1600%2FTHAI1.jpg&imgrefurl=http%3A%2F%2Fkonwersatorium1-ms-pjwstk.blogspot.com%2F2014%2F10%2Fthe-human-artificial-intelligence_22.html&h=958&w=965&tbnid=WscyQ01kH-s7CM%3A&docid=sGVehcJYs2-e1M&ei=gy6aV4zmJMX1UqSwsYAO&tbm=isch&iact=rc&uact=3&dur=774&page=1&start=0&ndsp=36&ved=0ahUKEwjMs_6BxpbOAhXFuhQKHSRYDOAQMwhEKAowCg&bih=1058&biw=1804 https://openclipart.org/download/242296/remix-fossasia-2016-contest4.svg
  2. http://www.montanregion-erzgebirge.de/welterbe-erleben/montanregion-fuer-bergbauspezialisten/geschichtliches.html
  3. http://www.horst-zuse.homepage.t-online.de/z1.html Z1: The Z1 is considered the world's first freely programmable computer. It was completed in 1938 and financed entirely from private funds. Konrad Zuse's first computer, the Z1, built between 1936 and 1938, was destroyed by the bombs of World War II, along with all of its construction documents. In 1986 Konrad Zuse decided to rebuild the Z1. The Z1 contains all the building blocks of a modern computer, such as a control unit, program control, memory, micro-sequences, and floating-point arithmetic. Konrad Zuse built the Z1 in his parents' apartment, where they let him use the living room. To build the Z1, Zuse gave up his job at the Henschel aircraft works in 1936 and set up his workshop in his parents' living room. Zuse's parents were not exactly enthusiastic about the project, but they supported him wherever they could.
  4. http://codecondo.com/9-free-books-for-learning-data-mining-data-analysis/
  5. https://www.quora.com/Who-should-get-credit-for-the-quote-data-is-the-new-oil
  6. TOP: https://pagewizz.com/edelsteine-1/
  7. http://www.burkhard-beyer.net/Reportage_Goldschmied.html
  8. This has to be managed or cultivated. Creativity is good but often not scalable!
  9. Wikipedia
  10. https://en.wikipedia.org/wiki/Infographic#/media/File:Gartner_Hype_Cycle_for_Emerging_Technologies.gif
  11. Wikipedia
  12. Wikipedia
  13. Wikipedia
  14. http://clipart-work.net/clipart/onion-clipart.html
  15. http://clipart-work.net/clipart/onion-clipart.html
  16. CSV => Neo4j: https://www.youtube.com/watch?v=Eh_79goBRUk https://blog.logentries.com/2016/06/self-describing-logging-using-log4j/ => JSON structure contains meaning => Using a standard format gives us semantic logging
  17. There are two key characteristics of RDF stores (aka triple stores): the first and by far the most relevant is that they represent, store and query data as a graph. The second is that they are semantic, which is a rather pompous way of saying that they can store not only data but also explicit descriptions of the meaning of that data. The RDF and linked data community often refer to these explicit descriptions as ontologies. In case you’re not familiar with the concept, an ontology is a machine readable description of a domain that typically includes a vocabulary of terms and some specification of how these terms inter-relate, imposing a structure on the data for such domain. This is also known as a schema. In this post both terms schema and ontology will be used interchangeably to refer to these explicitly described semantics. https://github.com/SciGraph/SciGraph/wiki/Neo4jMapping
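
Note 16's point that a JSON structure carries meaning can be illustrated with a small, hedged sketch. The deck builds on Log4J; the example below instead uses Python's standard `logging` module purely for illustration, and the `JsonFormatter` and `ListHandler` classes, the `facts` attribute, and the job fields are assumptions made for this sketch, not part of the original toolchain.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Format each log record as one JSON object instead of a plain string."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "logger": record.name,
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # 'facts' is a hypothetical attribute supplied via the `extra` argument.
        payload.update(getattr(record, "facts", {}))
        return json.dumps(payload, sort_keys=True)


class ListHandler(logging.Handler):
    """Collect formatted log lines in memory so they can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.lines = []

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(self.format(record))


handler = ListHandler()
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("etl.demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the demo output out of the root logger

logger.info("job finished", extra={"facts": {"job": "etl-42", "status": "SUCCEEDED"}})

# A consumer reads the fields back with a generic JSON parser;
# no log-format-specific regular expression is required.
event = json.loads(handler.lines[0])
```

Each key in the parsed `event` is a named property, so the record can be mapped to key-value pairs, and from there to triples, without format-specific parsing.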