
Graph and Amazon Neptune - Bill Baldwin

Speaker: Bill Baldwin - Global Enterprise Support Lead, AWS


  1. Graph and Amazon Neptune. Bill Baldwin (bbaldwin@amazon.com), Global Enterprise Support Leader. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  2. HIGHLY CONNECTED DATA: Retail Fraud Detection, Restaurant Recommendations, Social Networks
  3. USE CASES FOR HIGHLY CONNECTED DATA: Social Networking, Life Sciences, Network & IT Operations, Fraud Detection, Recommendations, Knowledge Graphs
  4. RECOMMENDATIONS BASED ON RELATIONSHIPS
  5. KNOWLEDGE GRAPH APPLICATIONS
      • What museums should Alice visit while in Paris?
      • Who painted the Mona Lisa?
      • What artists have paintings in The Louvre?
  6. NAVIGATE A WEB OF GLOBAL TAX POLICIES: “Our customers are increasingly required to navigate a complex web of global tax policies and regulations. We need an approach to model the sophisticated corporate structures of our largest clients and deliver an end-to-end tax solution. We use a microservices architecture approach for our platforms and are beginning to leverage Amazon Neptune as a graph-based system to quickly create links within the data,” said Tim Vanderham, chief technology officer, Thomson Reuters Tax & Accounting.
  7. RELATIONAL DATABASE CHALLENGES BUILDING APPS WITH HIGHLY CONNECTED DATA
      • Unnatural for querying graph
      • Inefficient graph processing
      • Rigid schema inflexible for changing data
  8. DIFFERENT APPROACHES FOR HIGHLY CONNECTED DATA
      • Purpose-built for a business process
      • Purpose-built to answer questions about relationships
  9. A GRAPH DATABASE IS OPTIMIZED FOR EFFICIENT STORAGE AND RETRIEVAL OF HIGHLY CONNECTED DATA
  10. LEADING GRAPH MODELS AND FRAMEWORKS
      • PROPERTY GRAPH: Open Source Apache TinkerPop, Gremlin Traversal Language
      • RESOURCE DESCRIPTION FRAMEWORK (RDF): W3C Standard, SPARQL Query Language
  11. CHALLENGES OF EXISTING GRAPH DATABASES
      • Difficult to maintain high availability
      • Difficult to scale
      • Limited support for open standards
      • Too expensive
  12. AMAZON NEPTUNE: Fully managed graph database
      • FAST: Query billions of relationships with millisecond latency
      • RELIABLE: 6 replicas of your data across 3 AZs with full backup and restore
      • EASY: Build powerful queries easily with Gremlin and SPARQL
      • OPEN: Supports Apache TinkerPop & W3C RDF graph models
  13. AMAZON NEPTUNE HIGH LEVEL ARCHITECTURE (diagram labels: Bulk load from Amazon S3; Database Mgmt.)
  14. PROPERTY GRAPH
      A property graph is a set of vertices and edges with respective properties (i.e. key/value pairs).
      • Vertices represent entities/domains
      • Edges represent directional relationships between vertices
      • Each edge has a label that denotes the type of relationship
      • Each vertex and edge has a unique identifier
      • Vertices and edges can have properties
      • Properties express non-relational information about the vertices and edges
      (Diagram: two User vertices, name: Bill and name: Sarah, joined by a FRIEND edge with the property Since 11/29/16.)
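The property graph model described on this slide can be sketched in a few lines of plain Python. This is only an illustrative in-memory structure, not how Neptune or TinkerPop store data; the class names `Vertex`, `Edge`, and `PropertyGraph` and the `out` helper are hypothetical, and the edge direction (Bill to Sarah) and the edge identifier 21 follow the Gremlin example on slide 16.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Vertex:
    id: int                      # unique identifier
    label: str                   # e.g. "User"
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    id: int                      # unique identifier
    label: str                   # type of relationship, e.g. "FRIEND"
    out_v: int                   # id of the vertex the edge leaves
    in_v: int                    # id of the vertex the edge enters
    properties: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical toy container: vertices and edges keyed by id."""

    def __init__(self) -> None:
        self.vertices: Dict[int, Vertex] = {}
        self.edges: Dict[int, Edge] = {}

    def add_vertex(self, vertex_id: int, label: str, **props: Any) -> Vertex:
        vertex = Vertex(vertex_id, label, dict(props))
        self.vertices[vertex_id] = vertex
        return vertex

    def add_edge(self, edge_id: int, label: str, out_v: int, in_v: int,
                 **props: Any) -> Edge:
        edge = Edge(edge_id, label, out_v, in_v, dict(props))
        self.edges[edge_id] = edge
        return edge

    def out(self, vertex_id: int, label: str) -> List[Vertex]:
        """Follow outgoing edges with the given label (directional)."""
        return [self.vertices[e.in_v] for e in self.edges.values()
                if e.out_v == vertex_id and e.label == label]


graph = PropertyGraph()
bill = graph.add_vertex(1, "User", name="Bill")
sarah = graph.add_vertex(2, "User", name="Sarah")
graph.add_edge(21, "FRIEND", bill.id, sarah.id, since="11/29/16")

friend_names = [v.properties["name"] for v in graph.out(bill.id, "FRIEND")]
```

Because the edge is directional, following FRIEND out of Bill reaches Sarah, while following it out of Sarah reaches nobody; a traversal that ignores direction would need to look at both endpoints.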
  15. PROPERTY GRAPH & APACHE TINKERPOP
      • Apache TinkerPop: open source graph computing framework for Property Graph
      • Gremlin: graph traversal language used to analyze the graph
      Amazon Neptune is fully compatible with TinkerPop Gremlin 3.3.0 (latest version released August 2018) and provides an optimized query execution engine for the Gremlin query language.
  16. CREATING A TINKERPOP GRAPH: Gremlin (Apache TinkerPop 3.3)
        // Connect to Neptune and receive a remote graph, g.
        user1 = g.addVertex(id, 1, label, "User", "name", "Bill");
        user2 = g.addVertex(id, 2, label, "User", "name", "Sarah");
        ...
        user1.addEdge("FRIEND", user2, id, 21);
      (Diagram: User (name: Bill) -[FRIEND]-> User (name: Sarah).)
  17. RDF GRAPHS
      • RDF graphs are described as a collection of triples: subject, predicate, and object.
      • Internationalized Resource Identifiers (IRIs) uniquely identify subjects.
      • The object can be an IRI or a literal.
      • A literal in RDF is like a property, and RDF supports the XML Schema data types.
      • When the object is an IRI, it forms an “edge” in the graph.
      Object is a literal (subject, predicate, object):
        <http://www.socialnetwork.com/person#1> rdf:type contacts:User; contacts:name "Bill" .
      Object is an IRI (subject, predicate, object):
        <http://www.socialnetwork.com/person#1> contacts:friend <http://www.socialnetwork.com/person#2> .
      (Diagram: User with name: Bill, identified by the IRI <http://www.socialnetwork.com/person#1>; #1 -[FRIEND]-> #2.)
  18. “THERE’S NO TROUBLE WITH TRIPLES”: RDF EXAMPLE, RDF (Turtle Serialization)
        @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
        @prefix contacts: <http://www.socialnetwork.com/people#> .
        <http://www.socialnetwork.com/person#1> rdf:type contacts:User; contacts:name "Bill" .
        <http://www.socialnetwork.com/person#1> contacts:friend <http://www.socialnetwork.com/person#2> .
        <http://www.socialnetwork.com/person#2> rdf:type contacts:User; contacts:name "Sarah" .
      (Diagram: User (name: Bill) -[FRIEND]-> User (name: Sarah).)
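To make the triple model concrete, the same Bill and Sarah data can be held as plain (subject, predicate, object) tuples and queried by pattern matching. This is a minimal sketch, assuming a hypothetical `Literal` wrapper to tell literals from IRIs and a hypothetical `match` helper; it is not an RDF library or a SPARQL engine.

```python
from typing import List, NamedTuple, Optional, Tuple, Union


class Literal(NamedTuple):
    """Hypothetical wrapper marking an object as a literal, not an IRI."""
    value: str


Node = Union[str, Literal]            # IRIs are plain strings in this sketch
Triple = Tuple[str, str, Node]        # (subject, predicate, object)

PERSON = "http://www.socialnetwork.com/person#"
CONTACTS = "http://www.socialnetwork.com/people#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

triples: List[Triple] = [
    (PERSON + "1", RDF_TYPE, CONTACTS + "User"),
    (PERSON + "1", CONTACTS + "name", Literal("Bill")),
    (PERSON + "1", CONTACTS + "friend", PERSON + "2"),   # IRI object: an edge
    (PERSON + "2", RDF_TYPE, CONTACTS + "User"),
    (PERSON + "2", CONTACTS + "name", Literal("Sarah")),
]


def match(store: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[Node] = None) -> List[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# Who are the friends of person #1, and what are their names?
friend_iris = [o for _, _, o in match(triples, s=PERSON + "1",
                                      p=CONTACTS + "friend")]
friend_names = [o.value
                for iri in friend_iris
                for _, _, o in match(triples, s=iri, p=CONTACTS + "name")
                if isinstance(o, Literal)]
```

The friend triple has an IRI as its object, so it behaves like an edge that can be followed to more triples; the name triples end in literals and behave like properties.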
  19. GRAPH VS. RELATIONAL DATABASE MODELING (relational model vs. graph model subset)
      Graph model subset shown in the diagram:
      • Customers (CompanyName: Acme …) -[PURCHASED]-> Order (OrderDate: 8/1/2018 …)
      • Order -[HAS_DETAILS]-> Order Details (UnitPrice: $179.99 …)
      • Order Details -[HAS_PRODUCT]-> Product (ProductName: “Echo” …)
      • Supplier (CompanyName: “Amazon” …) -[SUPPLIES]-> Product
      * Source: http://www.playnexacro.com/index.html#show:article
  20. SQL RELATIONAL DATABASE QUERY: Find the names of companies that purchased the ‘Echo’.
        SELECT DISTINCT c.CompanyName
        FROM customers AS c
        JOIN orders AS o            /* Join the customer from the order */
          ON (c.CustomerID = o.CustomerID)
        JOIN order_details AS od    /* Join the order details from the order */
          ON (o.OrderID = od.OrderID)
        JOIN products AS p          /* Join the products from the order details */
          ON (od.ProductID = p.ProductID)
        WHERE p.ProductName = 'Echo';  /* Find the product named 'Echo' */
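The query on this slide can be tried end to end with Python's built-in `sqlite3` module. This is a sketch under assumptions: the table and column names come from the slide, but the column types, the second customer ("Globex"), the second product ("Kindle"), and all row values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (CustomerID INTEGER PRIMARY KEY, CompanyName TEXT);
    CREATE TABLE products  (ProductID  INTEGER PRIMARY KEY, ProductName TEXT);
    CREATE TABLE orders    (OrderID    INTEGER PRIMARY KEY, CustomerID INTEGER);
    CREATE TABLE order_details (OrderID INTEGER, ProductID INTEGER);

    -- Illustrative rows only: Acme orders the Echo twice, Globex never does.
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO products  VALUES (1, 'Echo'), (2, 'Kindle');
    INSERT INTO orders    VALUES (10, 1), (11, 1), (12, 2);
    INSERT INTO order_details VALUES (10, 1), (11, 1), (12, 2);
""")

QUERY = """
    SELECT DISTINCT c.CompanyName
    FROM customers AS c
    JOIN orders AS o         ON (c.CustomerID = o.CustomerID)
    JOIN order_details AS od ON (o.OrderID = od.OrderID)
    JOIN products AS p       ON (od.ProductID = p.ProductID)
    WHERE p.ProductName = 'Echo';
"""

# Each hop in the relationship chain costs one JOIN in the relational model.
companies = [row[0] for row in conn.execute(QUERY)]
```

Three joins are needed to walk customer, order, order details, and product; the SPARQL and Gremlin versions on the next slides express the same chain as graph patterns or traversal steps.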
  21. SPARQL DECLARATIVE GRAPH QUERY: Find the names of companies that purchased the ‘Echo’.
        PREFIX sales_db: <http://sales.widget.com/>
        SELECT DISTINCT ?comp_name WHERE {
          ?customer sales_db:HAS_ORDER ?order ;          # customer graph pattern
                    sales_db:CompanyName ?comp_name .
          ?order sales_db:HAS_DETAILS ?order_d .         # orders graph pattern
          ?order_d sales_db:HAS_PRODUCT ?product .       # order details graph pattern
          ?product sales_db:ProductName "Echo" .         # products graph pattern
        }
      * Source: http://www.playnexacro.com/index.html#show:article
  22. GREMLIN IMPERATIVE GRAPH TRAVERSAL: Find the names of companies that purchased the ‘Echo’.
        g.V().hasLabel('Product').has('name', 'Echo').  /* All products named "Echo" */
          in('HAS_PRODUCT').                            /* Traverse to order details */
          in('HAS_DETAILS').                            /* Traverse to order */
          in('HAS_ORDER').                              /* Traverse to Customer */
          values('CompanyName').dedup()                 /* Unique Company Name */
  23. TRIADIC CLOSURE – CLOSING TRIANGLES (diagram: Terry, Bill, and Sarah joined by three FRIEND edges)
  24. RECOMMENDING NEW CONNECTIONS (diagram: Terry)
  25. IMMEDIATE FRIENDSHIPS (diagram: Terry and Bill joined by a FRIEND edge)
  26. MEANS AND MOTIVE (diagram: Terry, Bill, and Sarah with two FRIEND edges)
  27. RECOMMENDATION (diagram: Terry, Bill, and Sarah with two FRIEND edges)
  28. Recommend New Connections
        g = graph.traversal()
        g.V().has('name','Terry').as('user').
          both('FRIEND').aggregate('friends').
          both('FRIEND').
          where(neq('user')).where(without('friends')).
          groupCount().by('name').
          order(local).by(values, decr)
  29. FIND TERRY (repeats the traversal from slide 28)
  30. FIND TERRY’S FRIENDS (repeats the traversal from slide 28)
  31. AND THE FRIENDS OF THOSE FRIENDS (repeats the traversal from slide 28; diagram: user -[FRIEND]- friend -[FRIEND]- fof)
  32. ...WHO AREN’T TERRY AND AREN’T FRIENDS WITH TERRY (repeats the traversal from slide 28; diagram: user -[FRIEND]- friend -[FRIEND]- fof, with the direct user-to-fof link crossed out)
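The recommendation steps walked through on slides 28 to 32 can be mirrored in plain Python to see what the traversal computes. This is a sketch, not Gremlin and not Neptune: the `recommend` function is hypothetical, and the friendship list adds made-up people (Ann, Max) to the Terry, Bill, and Sarah example so the ranking is visible.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

# Illustrative FRIEND edges; Ann and Max are assumptions for this sketch.
friendships: List[Tuple[str, str]] = [
    ("Terry", "Bill"), ("Bill", "Sarah"),
    ("Terry", "Ann"), ("Ann", "Sarah"), ("Ann", "Max"), ("Bill", "Ann"),
]

# Treat FRIEND as undirected, like both('FRIEND') in the traversal.
adjacency: Dict[str, Set[str]] = defaultdict(set)
for a, b in friendships:
    adjacency[a].add(b)
    adjacency[b].add(a)


def recommend(user: str) -> List[Tuple[str, int]]:
    """Rank friends-of-friends by how many paths lead to them."""
    friends = adjacency[user]                     # find the user's friends
    counts: Counter = Counter()
    for friend in friends:
        for fof in adjacency[friend]:             # friends of those friends
            if fof != user and fof not in friends:  # not the user, not a friend
                counts[fof] += 1                  # one count per connecting path
    return counts.most_common()                   # highest count first


recommendations = recommend("Terry")
```

With this data Sarah is reachable through both Bill and Ann and Max only through Ann, so Sarah ranks first. Counting paths rather than distinct people is what makes the number of mutual friends usable as a ranking signal.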
  33. BENEFITS: Fully Managed Service
      • Easily configurable via the console
      • Multi-AZ high availability
      • Support for up to 15 read replicas
      • Supports encryption at rest
      • Supports encryption in transit (TLS)
      • Backup and restore, point-in-time recovery
  34. AMAZON NEPTUNE: VPC DEPLOYMENT
      • Secure deployment in a VPC
      • Increased availability through deployment in two subnets in two different Availability Zones (AZs)
      • Cluster volume always spans three AZs to provide durable storage
      • See the Amazon Neptune Documentation for VPC setup details
  35. BATTLE-TESTED CLOUD-NATIVE STORAGE ENGINE OVERVIEW
      • Data is replicated 6 times across 3 Availability Zones
      • Continuous backup to Amazon S3 (built for 11 9s durability)
      • Continuous monitoring of nodes and disks for repair
      • 10 GB segments as unit of repair or hotspot rebalance
      • Quorum system for read/write; latency tolerant
      • Quorum membership changes do not stall writes
      • Storage volume automatically grows up to 64 TB
      (Diagram: Amazon Neptune on top of six storage nodes spread across AZ 1, AZ 2, and AZ 3, with storage monitoring and backup to Amazon S3.)
  36. AMAZON NEPTUNE HIGH AVAILABILITY AND FAULT TOLERANCE (CLOUD-NATIVE STORAGE)
      What can fail?
      • Segment failures (disks)
      • Node failures (machines)
      • AZ failures (network or datacenter)
      Optimizations
      • 4 out of 6 write quorum
      • 3 out of 6 read quorum
      • Peer-to-peer replication for repairs
      (Diagram: Amazon Neptune with caching on top of storage in AZ 1, AZ 2, and AZ 3, shown twice.)
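The quorum numbers on this slide can be checked with a little arithmetic. The sketch below is an illustration of generic quorum reasoning, not Neptune's implementation: it assumes two copies per AZ (6 copies over 3 AZs, as slide 35 states), and the function names are hypothetical. The availability results are derived from the 4-of-6 and 3-of-6 figures, not quoted from the slides.

```python
from itertools import combinations

COPIES = 6          # copies of the data (from the slides)
COPIES_PER_AZ = 2   # assumption: copies spread evenly over 3 AZs
WRITE_QUORUM = 4    # from the slides
READ_QUORUM = 3     # from the slides


def every_read_overlaps_every_write(copies: int, write_q: int,
                                    read_q: int) -> bool:
    """Brute-force check that any read set shares a copy with any write set."""
    nodes = range(copies)
    return all(set(w) & set(r)
               for w in combinations(nodes, write_q)
               for r in combinations(nodes, read_q))


def availability(copies_lost: int) -> dict:
    """Which operations can still reach their quorum after losing copies."""
    surviving = COPIES - copies_lost
    return {"writes": surviving >= WRITE_QUORUM,
            "reads": surviving >= READ_QUORUM}


overlap = every_read_overlaps_every_write(COPIES, WRITE_QUORUM, READ_QUORUM)
az_failure = availability(COPIES_PER_AZ)            # one whole AZ is lost
az_plus_one = availability(COPIES_PER_AZ + 1)       # an AZ plus one more copy
```

Since 4 + 3 > 6, every read quorum includes at least one copy that saw the latest write. Under the even-spread assumption, losing one AZ leaves 4 copies, enough for both reads and writes; losing one more copy leaves 3, enough for reads only.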
  37. AMAZON NEPTUNE READ REPLICAS
      Availability
      • Failing database nodes are automatically detected and replaced
      • Failing database processes are automatically detected and recycled
      • Replicas are automatically promoted to primary if needed (failover)
      • Customer-specifiable failover order
      Performance
      • Customer applications can scale out read traffic across read replicas
      • Read balancing across read replicas
      (Diagram: a primary node and read replicas spread across AZ 1, AZ 2, and AZ 3, with cluster and instance monitoring.)
  38. AMAZON NEPTUNE FAILOVER TIMES ARE TYPICALLY < 30 SECONDS
      (Diagram: timeline for a replica-aware app: App Running, Database Failure, Failure Detection, DNS Propagation, Recovery, App Running; durations shown: 15-20 sec and 3-10 sec.)
  39. AMAZON NEPTUNE CONTINUOUS BACKUP (CLOUD-NATIVE STORAGE)
      • Take a periodic snapshot of each segment in parallel; stream the logs to Amazon S3
      • Backup happens continuously without performance or availability impact
      • At restore, retrieve the appropriate segment snapshots and log streams to storage nodes
      • Apply log streams to segment snapshots in parallel and asynchronously
      (Diagram: segment snapshots and log records for Segment 1, Segment 2, and Segment 3 plotted over time, with a recovery point.)
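The restore procedure in these bullets (pick a segment snapshot, then replay log records up to the recovery point, segment by segment in parallel) can be sketched as follows. This is a toy model under stated assumptions: segment state is a small dictionary, log records are `(sequence number, key, value)` tuples, and the names `restore_segment` and `restore_all` are hypothetical; Neptune's actual storage format is not described in the slides.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Snapshot = Tuple[int, Dict[str, int]]    # (log sequence number, segment state)
LogRecord = Tuple[int, str, int]         # (log sequence number, key, new value)


def restore_segment(snapshots: List[Snapshot], log: List[LogRecord],
                    recovery_point: int) -> Dict[str, int]:
    """Rebuild one segment as of recovery_point from a snapshot plus log."""
    usable = [s for s in snapshots if s[0] <= recovery_point]
    if not usable:
        raise ValueError("no snapshot at or before the recovery point")
    base_lsn, base_state = max(usable, key=lambda s: s[0])  # latest usable one
    state = dict(base_state)
    for lsn, key, value in sorted(log):          # replay in log order
        if base_lsn < lsn <= recovery_point:     # only records after snapshot
            state[key] = value
    return state


def restore_all(segments: Dict[str, Tuple[List[Snapshot], List[LogRecord]]],
                recovery_point: int) -> Dict[str, Dict[str, int]]:
    """Restore every segment independently, using a thread pool."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(restore_segment, snaps, log,
                                     recovery_point)
                   for name, (snaps, log) in segments.items()}
        return {name: fut.result() for name, fut in futures.items()}


# Illustrative data: two segments with snapshots taken at different times.
segments = {
    "segment-1": ([(0, {}), (2, {"a": 1, "b": 2})],
                  [(1, "a", 1), (2, "b", 2), (3, "a", 10), (4, "c", 3)]),
    "segment-2": ([(0, {})],
                  [(1, "x", 7), (5, "x", 8)]),
}
restored = restore_all(segments, recovery_point=3)
```

Each segment restores independently of the others, which is what allows the work to be spread out in parallel; the snapshot only shortens how much log has to be replayed.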
  40. AMAZON NEPTUNE ONLINE POINT-IN-TIME RESTORE (CLOUD-NATIVE STORAGE)
      Online point-in-time restore is a quick way to bring the database to a particular point in time without having to restore from backups.
      • Rewinding the database to quickly […]
      • Rewind multiple times to determine the desired point-in-time in the database state
      (Diagram: timeline t0 to t4 showing “Rewind to t1” and “Rewind to t3”; the rewound ranges are marked Invisible.)
  41. Everything and Anything Startups Need to Get Started on AWS: aws.amazon.com/activate
