Neo4j-Databridge: Enterprise-scale ETL for Neo4j

Neo4j-Databridge is a fully-featured ETL tool specifically built for Neo4j, and designed for usability, expressive power and high performance. It has been created to help solve the most common problems faced by large enterprises when importing data into Neo4j - data locality, multiple data sources and formats, performance when loading very large data sets, bespoke data conversions, inclusion of non-tabular data, filtering, merging and de-duplication...

In this webinar, we’ll take a quick tour of the main features of Neo4j-Databridge and see how it can help solve these problems and make importing your data into Neo4j quick and easy.

  1. GraphAware® Neo4j-Databridge: Enterprise-scale ETL for Neo4j, from GraphAware (@graph_aware)
  2. About me: Vince Bickers, Principal Consultant @GraphAware; primary author of the Neo4j OGM library; ingesting data quickly and easily into Neo4j is an obsession :-)
  3. Agenda: Importing data - common challenges; Options?; Databridge core features; Demo!
  4. Common challenges: integrating multiple datasources; filtering, data conversion, non-tabular data…; update strategies / de-duplication; bulk load and incremental updates (offline / online); performance with large data sets; ease of use and configuration
  5. Options? Cypher LOAD CSV; Neo4j Batch Importer; Neo4j JDBC ETL / APOC; Talend; Apache Camel; DIY; Neo4j-Databridge
  6. LOAD CSV. For: extremely flexible. Against: CSV files only; not suitable for large datasets; need to know Cypher; relatively slow; difficult to manage complex imports
  7. Neo4j Batch Importer. For: extremely fast; can handle very large data sets. Against: requires data to be pre-processed first; inflexible; no “memory”
  8. Neo4j JDBC ETL / APOC. For: fast (JDBC ETL uses the batch importer under the hood); JDBC mapping files are configurable. Against: JDBC datasources only; using the APOC library requires you to write Cypher; no “memory”
  9. Talend. For: pretty interface; part of a full ETL platform. Against: Cypher queries over REST API; LOAD CSV is much better!; no “memory”
  10. Apache Camel. For: ???. Against: it’s not available; planning to use Spring Data Neo4j?
  11. DIY. For: total control! Against: time consuming; hard work
  12. Databridge core features: data resources and mappings defined using JSON; command-line shell; out-of-the-box adapters for common data formats; direct auto-import from JDBC databases; full-featured expression language for filtering and more; wide range of data converters; supports full and incremental imports; Bulk Load endpoint (offline) and Streaming endpoint (online); user-extensible with Groovy scripts and the Adapter API
  13. Live demo: satellites - tabular data (bulk load CSV); charities - tabular data (streaming Excel); olympics - structured data (JSON); empdb - auto-import from a MySQL database; iomtt - TomTom Itinerary (ITN); navaid - UK aviation waypoints (GPX); hawkeye - 9.4m nodes, 5m edges in 1 minute
  14. More info: Wiki; Email; Questions (ask on StackOverflow, using the tags neo4j databridge)
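
The slides list filtering, merging and de-duplication across multiple data sources among the common import challenges. As a rough illustration of what that involves, here is a minimal Python sketch. It is not Neo4j-Databridge code and does not use its API: the function name `merge_rows`, the `id` key and the sample rows are all hypothetical, and the "later source wins" rule is just one possible update strategy.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]


def merge_rows(sources: Iterable[Iterable[Row]], key: str) -> Tuple[List[Row], int]:
    """Merge rows from several sources into one record per key value.

    Hypothetical helper, for illustration only:
    - rows with a missing or empty key are filtered out and counted;
    - rows sharing a key are de-duplicated into a single record;
    - properties from later sources overwrite earlier ones.
    """
    merged: Dict[object, Row] = {}
    skipped = 0
    for source in sources:
        for row in source:
            identity = row.get(key)
            if identity is None or identity == "":
                skipped += 1  # filtering: no usable identity for this row
                continue
            # de-duplication + merging: one record per identity
            merged.setdefault(identity, {}).update(row)
    # dicts keep insertion order, so records come back in first-seen order
    return list(merged.values()), skipped


# Made-up sample rows standing in for two differently formatted sources.
csv_rows = [
    {"id": "S1", "name": "Hubble", "orbit": "LEO"},
    {"id": "S2", "name": "Landsat 8"},
    {"id": "", "name": "unknown"},
]
json_rows = [
    {"id": "S1", "launched": 1990},
    {"id": "S3", "name": "Envisat"},
]

nodes, skipped = merge_rows([csv_rows, json_rows], key="id")
```

Each resulting record could then be written as one node, in the same spirit as a Cypher MERGE on the key property. A real import tool would also have to handle type conversion, relationships and batching, which this sketch leaves out.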