SlideShare a Scribd company logo
1 of 41
Download to read offline
Neo4j Data Loading
with Kettle
Matt Casters
Chief Solutions Architect / Kettle Project Founder
Agenda
➢ What is Kettle?
➢ The Neo4j plugins
➢ Data loading performance tips
➢ Streaming data integration
➢ Metadata driven data possibilities
➢ Kettle Execution lineage in a graph
➢ Roadmap update
➢ Q&A
What is Kettle?
3
Kettle: Introduction
➢ Pentaho Data Integration from Hitachi Vantara
➢ One of the most widely used ETL tools
➢ Ready for the most demanding tasks
➢ Open source Apache Public License 2.0
➢ Well maintained
➢ Large community, marketplace, ...
➢ Easy to embed, install, package, rebrand
➢ Download : Sourceforge / Pentaho / 8.2 / PDI-CE
Kettle: where is it used?
➢ On tiny and enormous systems, real or virtual
➢ Very small computers, Raspberry Pie sized
➢ Your laptop or browser
➢ Locally or in the cloud
➢ On Hadoop clusters, VMs, Docker, Serverless,
➢ At large and small companies
➢ In government
➢ In education
➢ In the Neo4j Solutions Reference Architecture
Kettle: Why is it used?
➢ Reduce costs!
➢ Answers the “build or buy?” question
build
buy
Time
Accum.
Cost
Kettle
Kettle: Architecture
➢ Metadata driven, engine based :
○ No code generation
○ Define what you need to happen
→ GUI, Web, code, rules, …
○ Clear and transparent, self documenting
➢ Types of work:
○ Jobs for workflows
○ Transformations for parallel data streaming
Kettle: Design
➢ 100% Exposure of our engine through UI elements
➢ Everyone should be able to play along: plugins!
➢ We built integration points for others: run everywhere!
➢ Allow the user to avoid programming anything
➢ Allow the user to program anything: JavaScript, Java,
Groovy, RegEx, Rules, Python, Ruby, R, …
➢ Transparency wins: best in class logging, data lineage,
execution lineage, debugging, data previewing, row
sniff testing, …
Kettle: things of note
➢ SpoonGit: UI integration with git
➢ WebSpoon: web interface to the full Spoon UI
➢ Data Sets: build transformation unit tests
➢ Huge list of other plugins available, including from
Neo4j, on a marketplace, …
➢ Support for the latest technology stacks
➢ Project on github has over 1,000 forks
https://github.com/pentaho/pentaho-kettle
Kettle: The Toolset
➢ Spoon: GUI
➢ Scripts
➢ Server(s)
➢ Java API & SDK
➢ Standard file format
➢ Plugin ecosystem
➢ Docker image(s)
➢ Documentation, books, ...
Neo4j Kettle Plugins
11
Neo4j Plugins: where to find?
➢ Started by the community, extended by Neo4j
➢ Releases/Download shortcut:
○ http://neo4j.kettle.be
➢ Project:
○ https://github.com/knowbi/knowbi-pentaho-
pdi-neo4j-output
Give us feedback!
Neo4j Cypher
➢ For reading and writing
➢ Dynamic Cypher
➢ Batching and UNWIND
➢ Parameters
➢ Return values
➢ Helpers
Neo4j Output
➢ Easy node creation
➢ Create/Merge of ()-[]-()
➢ Batching and UNWIND
➢ Dynamic labels
Neo4j Graph Output
➢ Update (parts of) a graph
➢ Using a logical model
➢ Using field mapping
➢ Auto-generate Cypher
Check Neo4j Connection
➢ Job Entry (workflow)
➢ Validate DBs are up
➢ Used in error diagnostic
➢ Defensive setup
➢ Pessimistic approach
Neo4j Cypher Script
➢ Job Entry (workflow)
➢ Executes series of Cypher statements
Neo4j Kettle Plugins v4
18
Plugins v4
➢ Bulk loading steps
➢ Performance options
➢ Encrypted/obfuscated password in variables
➢ Bug fixes & UI improvements
Neo4j Generate CSVs
➢ Generate CSV files for Neo4j Import
➢ Generates appropriate header
➢ Handles escaping, quoting, …
➢ Outputs file names
Neo4j Split Graph
➢ Splits a graph field into nodes and relationships
➢ Used for unique value calculation
Neo4j Importer
➢ Runs a neo4j-import command
➢ Accepts the filenames of CSV files
Data loading
Performance tips
23
Pre-processing in Kettle
➢ Do work in Kettle that can be avoided in Neo4j
➢ Calculate unique nodes
➢ Do required data conversions
➢ Data cleaning
Parallel loading & batching
➢ Parallel node creation
➢ Limit high parallelism in the general case
➢ UNWIND in Neo4j Cypher step
➢ Create option in Neo4j Output step
➢ Use larger batch sizes (>1000)
➢ Create indexes up-front or with the options
Importing data
➢ Bulk loading with import is much faster
➢ A few orders of magnitude faster
➢ Collect all the data in CSV files
➢ Use the new steps to load
➢ Seamless path to incremental loads
Streaming data loads
27
Streaming options
➢ Micro-batching (every X minutes)
➢ Kafka, Event Hubs, Queues,... (never ending)
Streaming options
➢ Transformations can be never ending
➢ Any operation is possible
➢ Can collect data in other data platforms
➢ Is transactionally safe if it is supported (Kafka, …)
➢ Can be parallelized & scaled out
Metadata driven
Data possibilities
30
➢ Kettle transformations & jobs are metadata
➢ ETL Metadata Injection: transformation templates
➢ Neo4j is a great metadata database
➢ Kettle can make use of this
Metadata FTW
Metadata driven loads
➢ Loading hundreds of types of files
➢ Processing data from hundreds of databases
➢ Automatic data standardization and normalisation
→ Massive time gains!
Metadata driven extracts
➢ Without hardcoded sources, selections and targets
➢ Sourcing selections from users, processes, ...
➢ Using the possibilities of the Kettle engine
→ Flexibility, performance, without coding
Kettle Execution Lineage
34
Kettle Logging Architecture
➢ Unique ID per execution
➢ Precise sourcing of logging records
➢ Very “graphy” data
Execution
Metadata
Impact
Parent /
child
relation
Parent /
child
relation
The Kettle Neo4j Logging plugin
➢ Stores operational metadata in a graph
➢ https://github.com/mattcasters/kettle-neo4j-logging
➢ Tools
○ View execution information: log, duration, errors
○ Find error paths
○ Jump to error location
○ Find execution path of a step
○ Get time window: “since last succesful execution”
Execution lineage in a graph
➢ Documents the exection process
○ Log text, metadata, times, ...
Roadmap update
38
Roadmap Neo4j plugin
➢ 25 releases in 2018
➢ Major 4.0 release next week
➢ Then:
○ New Neo4j Output step
○ More graph data type operations
○ <Insert YOUR suggestion!>
➢ Tuning options for Neo4j steps running in initial
Kettle Apache Beam implementation:
→ DataFlow, Spark, Flink, …
Roadmap Neo4j Logging plugin
➢ Generic impact information logging
➢ Store data lineage in Neo4j
➢ Git revision graph loading (new step)
➢ Storing and viewing unit testing results
➢ Operational “dashboard”
Q&A
41

More Related Content

What's hot

Understanding the architecture of MariaDB ColumnStore
Understanding the architecture of MariaDB ColumnStoreUnderstanding the architecture of MariaDB ColumnStore
Understanding the architecture of MariaDB ColumnStoreMariaDB plc
 
Etsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureEtsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureDan McKinley
 
NoSQL databases - An introduction
NoSQL databases - An introductionNoSQL databases - An introduction
NoSQL databases - An introductionPooyan Mehrparvar
 
Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013Julien Le Dem
 
Intro to Neo4j presentation
Intro to Neo4j presentationIntro to Neo4j presentation
Intro to Neo4j presentationjexp
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Solving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowSolving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowWes McKinney
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Ryan Blue
 
Cours Big Data Chap6
Cours Big Data Chap6Cours Big Data Chap6
Cours Big Data Chap6Amal Abid
 
Introduction to NoSQL Databases
Introduction to NoSQL DatabasesIntroduction to NoSQL Databases
Introduction to NoSQL DatabasesDerek Stainer
 
Deep dive into LangChain integration with Neo4j.pptx
Deep dive into LangChain integration with Neo4j.pptxDeep dive into LangChain integration with Neo4j.pptx
Deep dive into LangChain integration with Neo4j.pptxTomazBratanic1
 
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Simplilearn
 
Low latency in java 8 by Peter Lawrey
Low latency in java 8 by Peter Lawrey Low latency in java 8 by Peter Lawrey
Low latency in java 8 by Peter Lawrey J On The Beach
 
Dongwon Kim – A Comparative Performance Evaluation of Flink
Dongwon Kim – A Comparative Performance Evaluation of FlinkDongwon Kim – A Comparative Performance Evaluation of Flink
Dongwon Kim – A Comparative Performance Evaluation of FlinkFlink Forward
 
Meetup - Brasil - Data In Motion - 2023 September 19
Meetup - Brasil - Data In Motion - 2023 September 19Meetup - Brasil - Data In Motion - 2023 September 19
Meetup - Brasil - Data In Motion - 2023 September 19Timothy Spann
 
Lessons for the optimizer from running the TPC-DS benchmark
Lessons for the optimizer from running the TPC-DS benchmarkLessons for the optimizer from running the TPC-DS benchmark
Lessons for the optimizer from running the TPC-DS benchmarkSergey Petrunya
 
Iceberg: a fast table format for S3
Iceberg: a fast table format for S3Iceberg: a fast table format for S3
Iceberg: a fast table format for S3DataWorks Summit
 
A look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutionsA look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutionsDatabricks
 

What's hot (20)

Understanding the architecture of MariaDB ColumnStore
Understanding the architecture of MariaDB ColumnStoreUnderstanding the architecture of MariaDB ColumnStore
Understanding the architecture of MariaDB ColumnStore
 
Etsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureEtsy Activity Feeds Architecture
Etsy Activity Feeds Architecture
 
NoSQL databases - An introduction
NoSQL databases - An introductionNoSQL databases - An introduction
NoSQL databases - An introduction
 
Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013
 
Intro to Neo4j presentation
Intro to Neo4j presentationIntro to Neo4j presentation
Intro to Neo4j presentation
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Solving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowSolving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache Arrow
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)
 
Cours Big Data Chap6
Cours Big Data Chap6Cours Big Data Chap6
Cours Big Data Chap6
 
Introduction to NoSQL Databases
Introduction to NoSQL DatabasesIntroduction to NoSQL Databases
Introduction to NoSQL Databases
 
Deep dive into LangChain integration with Neo4j.pptx
Deep dive into LangChain integration with Neo4j.pptxDeep dive into LangChain integration with Neo4j.pptx
Deep dive into LangChain integration with Neo4j.pptx
 
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
 
Key-Value NoSQL Database
Key-Value NoSQL DatabaseKey-Value NoSQL Database
Key-Value NoSQL Database
 
Low latency in java 8 by Peter Lawrey
Low latency in java 8 by Peter Lawrey Low latency in java 8 by Peter Lawrey
Low latency in java 8 by Peter Lawrey
 
Dongwon Kim – A Comparative Performance Evaluation of Flink
Dongwon Kim – A Comparative Performance Evaluation of FlinkDongwon Kim – A Comparative Performance Evaluation of Flink
Dongwon Kim – A Comparative Performance Evaluation of Flink
 
Meetup - Brasil - Data In Motion - 2023 September 19
Meetup - Brasil - Data In Motion - 2023 September 19Meetup - Brasil - Data In Motion - 2023 September 19
Meetup - Brasil - Data In Motion - 2023 September 19
 
Lessons for the optimizer from running the TPC-DS benchmark
Lessons for the optimizer from running the TPC-DS benchmarkLessons for the optimizer from running the TPC-DS benchmark
Lessons for the optimizer from running the TPC-DS benchmark
 
Gcp data engineer
Gcp data engineerGcp data engineer
Gcp data engineer
 
Iceberg: a fast table format for S3
Iceberg: a fast table format for S3Iceberg: a fast table format for S3
Iceberg: a fast table format for S3
 
A look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutionsA look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutions
 

Similar to Neo4j Data Loading with Kettle

GraphDay Paris - Intégrer des flux de données dans Neo4j avec l'ETL Open Sour...
GraphDay Paris - Intégrer des flux de données dans Neo4j avec l'ETL Open Sour...GraphDay Paris - Intégrer des flux de données dans Neo4j avec l'ETL Open Sour...
GraphDay Paris - Intégrer des flux de données dans Neo4j avec l'ETL Open Sour...Neo4j
 
Road to NODES - Handling Neo4j Data with Apache Hop
Road to NODES - Handling Neo4j Data with Apache HopRoad to NODES - Handling Neo4j Data with Apache Hop
Road to NODES - Handling Neo4j Data with Apache HopNeo4j
 
PaaSTA: Running applications at Yelp
PaaSTA: Running applications at YelpPaaSTA: Running applications at Yelp
PaaSTA: Running applications at YelpNathan Handler
 
Implementing Observability for Kubernetes.pdf
Implementing Observability for Kubernetes.pdfImplementing Observability for Kubernetes.pdf
Implementing Observability for Kubernetes.pdfJose Manuel Ortega Candel
 
DevSecOps - Security in DevOps
DevSecOps - Security in DevOpsDevSecOps - Security in DevOps
DevSecOps - Security in DevOpsAarno Aukia
 
Data Platform in the Cloud
Data Platform in the CloudData Platform in the Cloud
Data Platform in the CloudAmihay Zer-Kavod
 
ZaloPay Merchant Platform on K8S on-premise
ZaloPay Merchant Platform on K8S on-premiseZaloPay Merchant Platform on K8S on-premise
ZaloPay Merchant Platform on K8S on-premiseChau Thanh
 
Architecting The Future - WeRise Women in Technology
Architecting The Future - WeRise Women in TechnologyArchitecting The Future - WeRise Women in Technology
Architecting The Future - WeRise Women in TechnologyDaniel Barker
 
Fineo Technical Overview - NextSQL for IoT
Fineo Technical Overview - NextSQL for IoTFineo Technical Overview - NextSQL for IoT
Fineo Technical Overview - NextSQL for IoTJesse Yates
 
ODN - Technical introduction of the platform
ODN - Technical introduction of the platformODN - Technical introduction of the platform
ODN - Technical introduction of the platformComsode - FP7 project
 
Designing for operability and managability
Designing for operability and managabilityDesigning for operability and managability
Designing for operability and managabilityGaurav Bahrani
 
Containers for grownups migrating traditional &amp; existing applications[1...
Containers for grownups   migrating traditional &amp; existing applications[1...Containers for grownups   migrating traditional &amp; existing applications[1...
Containers for grownups migrating traditional &amp; existing applications[1...DevOps.com
 
Adding Support for Networking and Web Technologies to an Embedded System
Adding Support for Networking and Web Technologies to an Embedded SystemAdding Support for Networking and Web Technologies to an Embedded System
Adding Support for Networking and Web Technologies to an Embedded SystemJohn Efstathiades
 
Architecting the Future: Abstractions and Metadata - GlueCon
Architecting the Future: Abstractions and Metadata - GlueConArchitecting the Future: Abstractions and Metadata - GlueCon
Architecting the Future: Abstractions and Metadata - GlueConDaniel Barker
 
Kubernetes meetup bangalore december 2017 - v02
Kubernetes meetup bangalore   december 2017 - v02Kubernetes meetup bangalore   december 2017 - v02
Kubernetes meetup bangalore december 2017 - v02Kumar Gaurav
 
Thinking DevOps in the era of the Cloud - Demi Ben-Ari
Thinking DevOps in the era of the Cloud - Demi Ben-AriThinking DevOps in the era of the Cloud - Demi Ben-Ari
Thinking DevOps in the era of the Cloud - Demi Ben-AriDemi Ben-Ari
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixC4Media
 
Machine Learning , Analytics & Cyber Security the Next Level Threat Analytics...
Machine Learning , Analytics & Cyber Security the Next Level Threat Analytics...Machine Learning , Analytics & Cyber Security the Next Level Threat Analytics...
Machine Learning , Analytics & Cyber Security the Next Level Threat Analytics...PranavPatil822557
 
David Henthorn [Rose-Hulman Institute of Technology] | Illuminating the Dark ...
David Henthorn [Rose-Hulman Institute of Technology] | Illuminating the Dark ...David Henthorn [Rose-Hulman Institute of Technology] | Illuminating the Dark ...
David Henthorn [Rose-Hulman Institute of Technology] | Illuminating the Dark ...InfluxData
 

Similar to Neo4j Data Loading with Kettle (20)

GraphDay Paris - Intégrer des flux de données dans Neo4j avec l'ETL Open Sour...
GraphDay Paris - Intégrer des flux de données dans Neo4j avec l'ETL Open Sour...GraphDay Paris - Intégrer des flux de données dans Neo4j avec l'ETL Open Sour...
GraphDay Paris - Intégrer des flux de données dans Neo4j avec l'ETL Open Sour...
 
Road to NODES - Handling Neo4j Data with Apache Hop
Road to NODES - Handling Neo4j Data with Apache HopRoad to NODES - Handling Neo4j Data with Apache Hop
Road to NODES - Handling Neo4j Data with Apache Hop
 
PaaSTA: Running applications at Yelp
PaaSTA: Running applications at YelpPaaSTA: Running applications at Yelp
PaaSTA: Running applications at Yelp
 
Implementing Observability for Kubernetes.pdf
Implementing Observability for Kubernetes.pdfImplementing Observability for Kubernetes.pdf
Implementing Observability for Kubernetes.pdf
 
DevSecOps - Security in DevOps
DevSecOps - Security in DevOpsDevSecOps - Security in DevOps
DevSecOps - Security in DevOps
 
Data Platform in the Cloud
Data Platform in the CloudData Platform in the Cloud
Data Platform in the Cloud
 
ZaloPay Merchant Platform on K8S on-premise
ZaloPay Merchant Platform on K8S on-premiseZaloPay Merchant Platform on K8S on-premise
ZaloPay Merchant Platform on K8S on-premise
 
Architecting The Future - WeRise Women in Technology
Architecting The Future - WeRise Women in TechnologyArchitecting The Future - WeRise Women in Technology
Architecting The Future - WeRise Women in Technology
 
Fineo Technical Overview - NextSQL for IoT
Fineo Technical Overview - NextSQL for IoTFineo Technical Overview - NextSQL for IoT
Fineo Technical Overview - NextSQL for IoT
 
ODN - Technical introduction of the platform
ODN - Technical introduction of the platformODN - Technical introduction of the platform
ODN - Technical introduction of the platform
 
Designing for operability and managability
Designing for operability and managabilityDesigning for operability and managability
Designing for operability and managability
 
Containers for grownups migrating traditional &amp; existing applications[1...
Containers for grownups   migrating traditional &amp; existing applications[1...Containers for grownups   migrating traditional &amp; existing applications[1...
Containers for grownups migrating traditional &amp; existing applications[1...
 
Adding Support for Networking and Web Technologies to an Embedded System
Adding Support for Networking and Web Technologies to an Embedded SystemAdding Support for Networking and Web Technologies to an Embedded System
Adding Support for Networking and Web Technologies to an Embedded System
 
Architecting the Future: Abstractions and Metadata - GlueCon
Architecting the Future: Abstractions and Metadata - GlueConArchitecting the Future: Abstractions and Metadata - GlueCon
Architecting the Future: Abstractions and Metadata - GlueCon
 
Kubernetes meetup bangalore december 2017 - v02
Kubernetes meetup bangalore   december 2017 - v02Kubernetes meetup bangalore   december 2017 - v02
Kubernetes meetup bangalore december 2017 - v02
 
Thinking DevOps in the era of the Cloud - Demi Ben-Ari
Thinking DevOps in the era of the Cloud - Demi Ben-AriThinking DevOps in the era of the Cloud - Demi Ben-Ari
Thinking DevOps in the era of the Cloud - Demi Ben-Ari
 
Pentaho ppt up
Pentaho ppt upPentaho ppt up
Pentaho ppt up
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFix
 
Machine Learning , Analytics & Cyber Security the Next Level Threat Analytics...
Machine Learning , Analytics & Cyber Security the Next Level Threat Analytics...Machine Learning , Analytics & Cyber Security the Next Level Threat Analytics...
Machine Learning , Analytics & Cyber Security the Next Level Threat Analytics...
 
David Henthorn [Rose-Hulman Institute of Technology] | Illuminating the Dark ...
David Henthorn [Rose-Hulman Institute of Technology] | Illuminating the Dark ...David Henthorn [Rose-Hulman Institute of Technology] | Illuminating the Dark ...
David Henthorn [Rose-Hulman Institute of Technology] | Illuminating the Dark ...
 

More from Neo4j

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansQIAGEN: Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansNeo4j
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...Neo4j
 
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafosBBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafosNeo4j
 
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...Neo4j
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jNeo4j
 
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j
 
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdfRabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j
 
Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!Neo4j
 
IA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeIA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeNeo4j
 
Neo4j: Data Engineering for RAG (retrieval augmented generation)
Neo4j: Data Engineering for RAG (retrieval augmented generation)Neo4j: Data Engineering for RAG (retrieval augmented generation)
Neo4j: Data Engineering for RAG (retrieval augmented generation)Neo4j
 
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j
 
Enabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsEnabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsNeo4j
 
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdfNeo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdfNeo4j
 
Neo4j Jesus Barrasa The Art of the Possible with Graph
Neo4j Jesus Barrasa The Art of the Possible with GraphNeo4j Jesus Barrasa The Art of the Possible with Graph
Neo4j Jesus Barrasa The Art of the Possible with GraphNeo4j
 

More from Neo4j (20)

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansQIAGEN: Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...
 
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafosBBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
 
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
 
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
 
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdfRabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
 
Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!
 
IA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeIA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG time
 
Neo4j: Data Engineering for RAG (retrieval augmented generation)
Neo4j: Data Engineering for RAG (retrieval augmented generation)Neo4j: Data Engineering for RAG (retrieval augmented generation)
Neo4j: Data Engineering for RAG (retrieval augmented generation)
 
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
 
Enabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsEnabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge Graphs
 
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdfNeo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
 
Neo4j Jesus Barrasa The Art of the Possible with Graph
Neo4j Jesus Barrasa The Art of the Possible with GraphNeo4j Jesus Barrasa The Art of the Possible with Graph
Neo4j Jesus Barrasa The Art of the Possible with Graph
 

Recently uploaded

Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceanilsa9823
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 

Recently uploaded (20)

Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 

Neo4j Data Loading with Kettle

  • 1. Neo4j Data Loading with Kettle Matt Casters Chief Solutions Architect / Kettle Project Founder
  • 2. Agenda ➢ What is Kettle? ➢ The Neo4j plugins ➢ Data loading performance tips ➢ Streaming data integration ➢ Metadata driven data possibilities ➢ Kettle Execution lineage in a graph ➢ Roadmap update ➢ Q&A
  • 4. Kettle: Introduction ➢ Pentaho Data Integration from Hitachi Vantara ➢ One of the most widely used ETL tools ➢ Ready for the most demanding tasks ➢ Open source Apache Public License 2.0 ➢ Well maintained ➢ Large community, marketplace, ... ➢ Easy to embed, install, package, rebrand ➢ Download : Sourceforge / Pentaho / 8.2 / PDI-CE
  • 5. Kettle: where is it used? ➢ On tiny and enormous systems, real or virtual ➢ Very small computers, Raspberry Pie sized ➢ Your laptop or browser ➢ Locally or in the cloud ➢ On Hadoop clusters, VMs, Docker, Serverless, ➢ At large and small companies ➢ In government ➢ In education ➢ In the Neo4j Solutions Reference Architecture
  • 6. Kettle: Why is it used? ➢ Reduce costs! ➢ Answers the “build or buy?” question build buy Time Accum. Cost Kettle
  • 7. Kettle: Architecture ➢ Metadata driven, engine based : ○ No code generation ○ Define what you need to happen → GUI, Web, code, rules, … ○ Clear and transparent, self documenting ➢ Types of work: ○ Jobs for workflows ○ Transformations for parallel data streaming
  • 8. Kettle: Design ➢ 100% Exposure of our engine through UI elements ➢ Everyone should be able to play along: plugins! ➢ We built integration points for others: run everywhere! ➢ Allow the user to avoid programming anything ➢ Allow the user to program anything: JavaScript, Java, Groovy, RegEx, Rules, Python, Ruby, R, … ➢ Transparency wins: best in class logging, data lineage, execution lineage, debugging, data previewing, row sniff testing, …
  • 9. Kettle: things of note ➢ SpoonGit: UI integration with git ➢ WebSpoon: web interface to the full Spoon UI ➢ Data Sets: build transformation unit tests ➢ Huge list of other plugins available, including from Neo4j, on a marketplace, … ➢ Support for the latest technology stacks ➢ Project on github has over 1,000 forks https://github.com/pentaho/pentaho-kettle
  • 10. Kettle: The Toolset ➢ Spoon: GUI ➢ Scripts ➢ Server(s) ➢ Java API & SDK ➢ Standard file format ➢ Plugin ecosystem ➢ Docker image(s) ➢ Documentation, books, ...
  • 12. Neo4j Plugins: where to find? ➢ Started by the community, extended by Neo4j ➢ Releases/Download shortcut: ○ http://neo4j.kettle.be ➢ Project: ○ https://github.com/knowbi/knowbi-pentaho- pdi-neo4j-output Give us feedback!
  • 13. Neo4j Cypher ➢ For reading and writing ➢ Dynamic Cypher ➢ Batching and UNWIND ➢ Parameters ➢ Return values ➢ Helpers
  • 14. Neo4j Output ➢ Easy node creation ➢ Create/Merge of ()-[]-() ➢ Batching and UNWIND ➢ Dynamic labels
  • 15. Neo4j Graph Output ➢ Update (parts of) a graph ➢ Using a logical model ➢ Using field mapping ➢ Auto-generate Cypher
  • 16. Check Neo4j Connection ➢ Job Entry (workflow) ➢ Validate DBs are up ➢ Used in error diagnostic ➢ Defensive setup ➢ Pessimistic approach
  • 17. Neo4j Cypher Script ➢ Job Entry (workflow) ➢ Executes series of Cypher statements
  • 19. Plugins v4 ➢ Bulk loading steps ➢ Performance options ➢ Encrypted/obfuscated password in variables ➢ Bug fixes & UI improvements
  • 20. Neo4j Generate CSVs ➢ Generate CSV files for Neo4j Import ➢ Generates appropriate header ➢ Handles escaping, quoting, … ➢ Outputs file names
  • 21. Neo4j Split Graph ➢ Splits a graph field into nodes and relationships ➢ Used for unique value calculation
  • 22. Neo4j Importer ➢ Runs a neo4j-import command ➢ Accepts the filenames of CSV files
  • 24. Pre-processing in Kettle ➢ Do work in Kettle that can be avoided in Neo4j ➢ Calculate unique nodes ➢ Do required data conversions ➢ Data cleaning
  • 25. Parallel loading & batching ➢ Parallel node creation ➢ Limit high parallelism in the general case ➢ UNWIND in Neo4j Cypher step ➢ Create option in Neo4j Output step ➢ Use larger batch sizes (>1000) ➢ Create indexes up-front or with the options
  • 26. Importing data ➢ Bulk loading with import is much faster ➢ A few orders of magnitude faster ➢ Collect all the data in CSV files ➢ Use the new steps to load ➢ Seamless path to incremental loads
  • 28. Streaming options ➢ Micro-batching (every X minutes) ➢ Kafka, Event Hubs, Queues,... (never ending)
  • 29. Streaming options ➢ Transformations can be never ending ➢ Any operation is possible ➢ Can collect data in other data platforms ➢ Is transactionally safe if it is supported (Kafka, …) ➢ Can be parallelized & scaled out
  • 31. ➢ Kettle transformations & jobs are metadata ➢ ETL Metadata Injection: transformation templates ➢ Neo4j is a great metadata database ➢ Kettle can make use of this Metadata FTW
  • 32. Metadata driven loads ➢ Loading hundreds of types of files ➢ Processing data from hundreds of databases ➢ Automatic data standardization and normalisation → Massive time gains!
  • 33. Metadata driven extracts ➢ Without hardcoded sources, selections and targets ➢ Sourcing selections from users, processes, ... ➢ Using the possibilities of the Kettle engine → Flexibility, performance, without coding
  • 35. Kettle Logging Architecture ➢ Unique ID per execution ➢ Precise sourcing of logging records ➢ Very “graphy” data Execution Metadata Impact Parent / child relation Parent / child relation
  • 36. The Kettle Neo4j Logging plugin ➢ Stores operational metadata in a graph ➢ https://github.com/mattcasters/kettle-neo4j-logging ➢ Tools ○ View execution information: log, duration, errors ○ Find error paths ○ Jump to error location ○ Find execution path of a step ○ Get time window: “since last succesful execution”
  • 37. Execution lineage in a graph ➢ Documents the exection process ○ Log text, metadata, times, ...
  • 39. Roadmap Neo4j plugin ➢ 25 releases in 2018 ➢ Major 4.0 release next week ➢ Then: ○ New Neo4j Output step ○ More graph data type operations ○ <Insert YOUR suggestion!> ➢ Tuning options for Neo4j steps running in initial Kettle Apache Beam implementation: → DataFlow, Spark, Flink, …
  • 40. Roadmap Neo4j Logging plugin ➢ Generic impact information logging ➢ Store data lineage in Neo4j ➢ Git revision graph loading (new step) ➢ Storing and viewing unit testing results ➢ Operational “dashboard”