
Gremlin is a Turing-complete, graph-based programming language developed for key/value-pair multi-relational graphs called property graphs. Gremlin makes extensive use of XPath 1.0 to support complex graph traversals. Connectors exist to various graph databases and frameworks. This language has application in the areas of graph query, analysis, and manipulation.


- 1. Gremlin, G = (V, E): A Graph-Based Programming Language. Marko A. Rodriguez, T-5, Center for Nonlinear Studies, Los Alamos National Laboratory. http://markorodriguez.com, http://gremlin.tinkerpop.com. February 25, 2010.
- 2. Abstract Gremlin is a Turing-complete, graph-based programming language developed for key/value-pair multi-relational graphs called property graphs. Gremlin makes extensive use of XPath 1.0 to support complex graph traversals. Connectors exist to various graph databases and frameworks. This language has application in the areas of graph query, analysis, and manipulation. Center for Nonlinear Studies PostDoc Seminar – Los Alamos National Laboratory – February 25, 2010
- 3. Acknowledgements
  • Marko A. Rodriguez [http://markorodriguez.com] designed, developed, tested, and documented Gremlin.
  • Peter Neubauer [http://www.linkedin.com/in/neubauer] aided in the design and the evangelizing of Gremlin.
  • Pavel Yaskevich [http://github.com/xedin] aided in the development of user defined functions in Gremlin.
  • Joshua Shinavier [http://fortytwo.net] provided initial conceptual support for Gremlin.
  • Ketrina Yim [http://csillustrated.berkeley.edu] designed the logo for Gremlin.
  • Gremlin-Users Group [http://groups.google.com/group/gremlin-users] provided much direction in the design and implementation of Gremlin.
- 4. Outline
  • Introduction to Graphs and Graph Software
  • Basic Gremlin Concepts
  • Gremlin Language Description
  • Advanced Gremlin Concepts
  • Conclusions
- 5. Outline (section: Introduction to Graphs and Graph Software)
- 6. What is a Graph?
  • A graph (network) is composed of a collection of vertices (dots) and edges (lines). There are many types of graphs: directed/undirected, weighted, attributed, etc.
  [Figure: examples of graph types: vertex-labeled, vertex-attributed, edge-labeled, edge-attributed, directed, undirected, weighted, multi, hyper, pseudo, regular, semantic, half-edge, resource description framework. Annotations shown in the figure: knows, hired, 0.2, created=2-01-09, modified=2-11-09, http://ex.com/123, type="person", name="emil".]
- 7. Why Use a Graph?
  • A graph is a very general data structure that can be used to model various systems. A graph can model the structure of transportation, technological, bibliographic, etc. systems. A graph can model a list, a map, a tree, etc.
  • There are numerous graph algorithms that are defined independent of the domain of the graph model.
  • There are numerous graph databases, frameworks, packages, etc. that aid in the creation, manipulation, and analysis of graphs.
- 8. Graph Databases, Frameworks, and Packages
  • Neo4j Graph Database [http://neo4j.org]
  • AllegroGraph Quad Store [http://www.franz.com/agraph]
  • HyperGraphDB [http://www.kobrix.com/hgdb.jsp]
  • Java Universal Network/Graph Framework [http://jung.sourceforge.net]
  • OpenRDF Sesame Framework [http://www.openrdf.org]
  • InfoGrid Graph Database [http://infogrid.org]
  • Filament Graph Toolkit [http://filament.sourceforge.net]
  • OWLim Semantic Repository [http://www.ontotext.com/owlim]
  • Sones Graph Database [http://www.sones.com]
  • NetworkX Graph Toolkit [http://networkx.lanl.gov]
  • iGraph Toolkit [http://igraph.sourceforge.net]
  • Blueprints Graph API [http://blueprints.tinkerpop.com]
  • ... and many more.
- 9. What Makes Gremlin Different?
  • Gremlin is a domain specific language for working with graphs.
  • Gremlin is not an application programming interface (API).
  • Gremlin makes use of various graph databases, frameworks, and packages.
  • Gremlin is a language that currently has a virtual machine implementation written in Java.
  • What can be succinctly expressed in Gremlin is verbose/clumsy to express in general purpose languages such as Java, Python, Ruby, etc.
  • Gremlin allows one to map single-relational graph analysis algorithms over to the multi-relational domain.
- 10. Single-Relational Graphs
  • In single-relational graphs, all edges have the same meaning (e.g. all edges are either friendship, kinship, worksWith, knows, etc.).
    G = (V, E ⊆ (V × V))
  • Most graph algorithms are defined for single-relational graphs (e.g. centrality/ranking, clustering/community detection, etc.).
  [Figure: vertices person-a, person-b, and person-c connected by unlabeled directed edges.]
  NOTE: These types of graphs are also known as directed, vertex-labeled graphs.
- 11. Multi-Relational Graphs
  • In multi-relational graphs, edges can have different meanings.
    G = (V, E ⊆ (V × V), ω : E → Σ*)
  • Most graph software is designed for multi-relational graphs (e.g. arbitrary objects as vertices and edges, knowledge-based reasoning systems, etc.).
  [Figure: vertices person-a, book-b, and book-c connected by edges labeled authored, read, and cites.]
  NOTE: These types of graphs are also known as directed, vertex/edge-labeled graphs.
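The difference between the two graph definitions on slides 10 and 11 can be sketched in a few lines of Python. This is only an illustration: the edge endpoints below are an assumption (the flattened figure keeps the labels but not which vertices each edge joins), and `project` and `labels` are hypothetical helper names, not part of Gremlin.

```python
# A multi-relational graph as a set of (tail, label, head) triples.
# ASSUMPTION: the endpoints are guessed; the slide figure only preserves
# the vertex names and the three edge labels.
EDGES = {
    ("person-a", "authored", "book-b"),
    ("person-a", "read", "book-c"),
    ("book-b", "cites", "book-c"),
}


def labels(edges):
    """Return the set of edge labels (the image of omega on E)."""
    return {label for (_tail, label, _head) in edges}


def project(edges, label):
    """Keep only edges with one label, yielding a single-relational graph."""
    return {(tail, head) for (tail, lbl, head) in edges if lbl == label}


print(sorted(labels(EDGES)))
print(project(EDGES, "cites"))
```

Projecting by a single label is the simplest way to hand a multi-relational graph to a single-relational algorithm; the later slides argue that path definitions give much finer control than this.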
- 12. Gremlin and Multi-Relational Graphs
  • Gremlin provides a means to elegantly map single-relational graph analysis algorithms over to the multi-relational graph domain.
  • Gremlin provides an elegant way to do automated reasoning in multi-relational graphs using path expressions.
  These two points form the primary thesis of this presentation.
  Rodriguez M.A., Shinavier, J., "Exposing Multi-Relational Networks to Single-Relational Network Analysis Algorithms," Journal of Informetrics, 4(1), 29–41, doi:10.1016/j.joi.2009.06.004, LA-UR-08-03931, http://arxiv.org/abs/0806.2274, December 2009.
- 13. Property Graphs
  • Gremlin works with a type of multi-relational graph called a property graph.
    - Vertices and edges are labeled with unique identifiers.
    - Edges are directed, labeled, and can form loops.
    - Multiple edges of the same label can exist for the same vertex pair.
    - Vertices and edges can have any number of key/value pair properties/attributes.
  Property graphs are a relatively general graph structure that can be constrained to model other graph structures, though a property-based hypergraph would be the most general (see HyperGraphDB and the JUNG API).
- 14. Property Graphs
  [Figure: the example property graph used throughout the deck.]
  Vertices: v[1] name="marko", age=29; v[2] name="vadas", age=27; v[3] name="lop", lang="java"; v[4] name="josh", age=32; v[5] name="ripple", lang="java"; v[6] name="peter", age=35.
  Edges: e[7] 1-knows->2, weight=0.5; e[8] 1-knows->4, weight=1.0; e[9] 1-created->3, weight=0.4; e[10] 4-created->5, weight=1.0; e[11] 4-created->3, weight=0.4; e[12] 6-created->3, weight=0.2.
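The property graph model from slides 13 and 14 can be sketched as two plain Python dictionaries. This is a minimal illustration, not how Blueprints or Gremlin store graphs. The property values and the endpoints of edges 7 to 9 come from the slides; the endpoints of edges 10 to 12 are an assumption, chosen to agree with the co-developer results shown later in the deck. `out_edges` is a hypothetical helper.

```python
# Vertices: id -> key/value properties.
VERTICES = {
    1: {"name": "marko", "age": 29},
    2: {"name": "vadas", "age": 27},
    3: {"name": "lop", "lang": "java"},
    4: {"name": "josh", "age": 32},
    5: {"name": "ripple", "lang": "java"},
    6: {"name": "peter", "age": 35},
}

# Edges: id -> (tail vertex, label, head vertex, properties).
# ASSUMPTION: endpoints of edges 10-12 are reconstructed, not read off the slide.
EDGES = {
    7: (1, "knows", 2, {"weight": 0.5}),
    8: (1, "knows", 4, {"weight": 1.0}),
    9: (1, "created", 3, {"weight": 0.4}),
    10: (4, "created", 5, {"weight": 1.0}),
    11: (4, "created", 3, {"weight": 0.4}),
    12: (6, "created", 3, {"weight": 0.2}),
}


def out_edges(vertex_id):
    """Return the ids of edges whose tail is vertex_id, in id order."""
    return sorted(eid for eid, (tail, _l, _h, _p) in EDGES.items() if tail == vertex_id)


print(out_edges(1))
print([EDGES[eid][3]["weight"] for eid in out_edges(1)])
```

Because every element carries an identifier and an open-ended property map, the same structure can hold edge weights, vertex types, or anything else without a schema change.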
- 15. Outline (section: Basic Gremlin Concepts)
- 16. Gremlin System Architecture
  • The Gremlin console is a scripting environment which allows for the dynamic evaluation of Gremlin code.
  • Gremlin implements JSR 223, which allows Gremlin to also be used within the Java language and thus as a virtual machine directly accessible to Java applications. Popular JSR 223 implementations include Jython, JRuby, and Groovy. For a fine list of implementations see https://scripting.dev.java.net.
  • Blueprints is a set of interfaces for abstract data structures such as graphs and documents. Implementations of these interfaces exist for various data management systems.
  • There exist many graph data management systems that span various graph data models (e.g. edge labeled graphs, RDF graphs, hypergraphs, etc.).
  [Figure: architecture diagram showing the Gremlin Console, the Gremlin ScriptEngine, Neo4j, NativeStore, and TinkerGraph.]
- 17. "Hello World" in the Gremlin Console
      marko$ ./gremlin.sh
               ,,,/
               (o o)
      -----oOOo-(_)-oOOo-----
      gremlin>
      gremlin> concat('goodbye', ' ', 'self')
      ==>goodbye self
- 18. Simple Traversals in Gremlin
  [Figure: the property graph from slide 14.]
      gremlin> $_ := g:key('name','marko')
      ==>v[1]
      gremlin> .
      ==>v[1]
      gremlin> ./outE
      ==>e[7][1-knows->2]
      ==>e[9][1-created->3]
      ==>e[8][1-knows->4]
      gremlin> ./outE/@weight
      ==>0.5
      ==>0.4
      ==>1.0
  ./outE/@weight: "Get the current object(s). Then get the outgoing edges of those objects. Then get the weights of those edges."
  $_ is a reserved variable meaning the root list of objects.
- 19. Simple Traversals in Gremlin
  [Figure: the property graph from slide 14.]
      gremlin> .
      ==>v[1]
      gremlin> ./outE[@label='created']/inV
      ==>v[3]
      gremlin> $_ := $_last
      ==>v[3]
      gremlin> ./@name
      ==>lop
      gremlin> g:map(.)
      ==>name=lop
      ==>lang=java
  ./outE[@label='created']/inV: "Get the current object(s). Then get the outgoing edges of those objects, where their labels equal 'created'. Then get the incoming vertices of those 'created' edges."
  $_last is a reserved variable meaning the last value evaluated.
- 20. Simple Traversals in Gremlin
  [Figure: the property graph from slide 14.]
      ./outE[@label='knows']/inV[matches(@name,'va.{3}') and @age > 21]/@name
      ==>vadas
- 21. Simple Traversals in Gremlin
      ./outE[@label='knows']/inV[matches(@name,'va.{3}') and @age > 21]/@name
  1. .: Get the current object(s).
  2. outE[@label='knows']: Get the outgoing edges of the current object(s), where their labels equal 'knows'.
  3. inV[matches(@name,'va.{3}') and @age > 21]: Get the incoming vertices of those 'knows' edges, where the names of those vertices are 5 characters long, start with 'va', and whose age is greater than 21.
  4. @name: Get the name of those particular incoming vertices.
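The four steps above can be mirrored in Python over the part of the example graph that vertex 1 can reach through 'knows' edges. This is a rough sketch of the semantics, not of Gremlin's evaluator. It assumes `matches` tests the whole string (hence `re.fullmatch`), which is what the slide's "5 characters long" reading implies; `knows_filtered` is a hypothetical name.

```python
import re

# Subset of the slide-14 graph that is relevant to this traversal.
VERTICES = {
    1: {"name": "marko", "age": 29},
    2: {"name": "vadas", "age": 27},
    4: {"name": "josh", "age": 32},
}
# (edge id, tail, label, head)
EDGES = [(7, 1, "knows", 2), (8, 1, "knows", 4)]


def knows_filtered(start):
    """Mimic ./outE[@label='knows']/inV[matches(@name,'va.{3}') and @age > 21]/@name."""
    names = []
    for _eid, tail, label, head in EDGES:
        # Step 2: outgoing edges of the current vertex labeled 'knows'.
        if tail != start or label != "knows":
            continue
        # Step 3: the head vertex, kept only if both conditions hold.
        vertex = VERTICES[head]
        # ASSUMPTION: whole-string match, as the slide's description implies.
        if re.fullmatch(r"va.{3}", vertex["name"]) and vertex["age"] > 21:
            # Step 4: project the name property.
            names.append(vertex["name"])
    return names


print(knows_filtered(1))  # ['vadas']
```

The nested loop and conditionals are the same verbosity that slide 9 contrasts with the one-line path expression.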
- 22. Knowledge-Based Reasoning
  • Blueprints implements the Sesame SAIL interfaces and thus, Gremlin can be used over the many Resource Description Framework (RDF) triple/quad stores. In such cases, RDF is modeled as a property graph where the named graph component is the @ng edge property.
  • Gremlin makes use of the Sesame SAIL SPARQL engine to allow for queries based on graph-pattern matching.
      gremlin> sail:sparql('SELECT ?x ?y WHERE { ?x foaf:knows ?y }')
      ==>{y=v[http://ex.com#2], x=v[http://ex.com#1]}
      ==>{y=v[http://ex.com#4], x=v[http://ex.com#1]}
  • Gremlin is useful for knowledge-based reasoning using path expressions.
- 23. Reasoning as Defining New Types of Adjacency
  • Graph-based reasoning is the process of making explicit what is implicit in the graph.
  • A reasoner takes a graph G and a collection of graph-patterns (i.e. transformation/rewrite rules) and creates a new graph G′ (usually, G ⊂ G′). G′ has new relationships/edges and thus, new definitions of vertex adjacency.
  • Example: The co-developers of person A are those people who have created the same software as person A and who are not themselves person A (as person A has created the same software as him or herself).
  For these "co-developer" examples, we will use vertex 1 (marko) as the source of the reasoning process.
  [Figure: the example graph (marko, vadas, josh, peter, lop, ripple) overlaid with inferred co-developer edges.]
- 24. The Co-Developers of Marko A. Rodriguez in SPARQL
      SELECT ?x WHERE {
        marko created ?y .
        ?z created ?y .
        ?z != marko .
        ?z name ?x
      }
  This query would return: josh and peter.
  [Figure: the example graph annotated with the query variables ?x, ?y, and ?z.]
- 25. The Co-Developers of Marko A. Rodriguez in Gremlin
  [Figure: the example graph overlaid with inferred co-developer edges.]
      gremlin> ./@name
      ==>marko
      gremlin> ./outE[@label='created']/inV/inE[@label='created']/outV[g:except($_)]/@name
      ==>josh
      ==>peter
- 26. The Co-Developers of Marko A. Rodriguez in Gremlin
      ./outE[@label='created']/inV/inE[@label='created']/outV[g:except($_)]/@name
  1. .: Get the current object(s) (i.e. vertex 1, denoting Marko).
  2. outE[@label='created']: Get the outgoing edges of the Marko vertex, where their labels equal 'created'.
  3. inV: Get the incoming (i.e. head) vertices of those 'created' edges.
  4. inE[@label='created']: Get the incoming edges of those vertices, where their labels equal 'created'.
  5. outV[g:except($_)]: Get the outgoing (i.e. tail) vertices of those 'created' edges, where those vertices are not the Marko vertex.
  6. @name: Get the name of those non-Marko vertices.
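The six steps above amount to a two-hop walk (out along 'created', back in along 'created') followed by excluding the start vertex. A compact Python sketch over the example graph follows. The endpoints of the edges leaving vertices 4 and 6 are an assumption reconstructed to match the josh/peter result on the slides, and `co_developers` is a hypothetical name.

```python
NAMES = {1: "marko", 2: "vadas", 3: "lop", 4: "josh", 5: "ripple", 6: "peter"}

# (tail, label, head) triples of the example graph.
# ASSUMPTION: the edges leaving vertices 4 and 6 are reconstructed, not read off the slide.
EDGES = [
    (1, "knows", 2),
    (1, "knows", 4),
    (1, "created", 3),
    (4, "created", 5),
    (4, "created", 3),
    (6, "created", 3),
]


def co_developers(start):
    """Names of vertices that created something `start` also created."""
    # Steps 2-3: follow outgoing 'created' edges to the created projects.
    projects = [head for (tail, label, head) in EDGES
                if tail == start and label == "created"]
    result = []
    for project in projects:
        # Steps 4-5: follow incoming 'created' edges back, excluding the start vertex.
        for (tail, label, head) in EDGES:
            if head == project and label == "created" and tail != start:
                # Step 6: project the name property.
                result.append(NAMES[tail])
    return result


print(co_developers(1))  # ['josh', 'peter']
```

Materializing each returned pair as a new 'co-developer' edge would give the enlarged graph G′ that the reasoning slide describes.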
- 27. Defining Co-Developers in Gremlin
      path co-developer
        ./outE[@label='created']/inV/inE[@label='created']/outV[g:except($_)]
      end
  Once defined, you can use it like any other path segment.
      gremlin> ./co-developer
      ==>v[4]
      ==>v[6]
      gremlin> ./co-developer/@name
      ==>josh
      ==>peter
- 28. Defining Co-Developers in Java
      public class CoDeveloperPath implements Path {
        public List invoke(Object root) {
          if(root instanceof Vertex) {
            List<Vertex> projects = new ArrayList<Vertex>();
            for(Edge edge : ((Vertex)root).getOutEdges()) {
              if(edge.getLabel().equals("created")) {
                projects.add(edge.getInVertex());
              }
            }
            List<Vertex> coDevelopers = new ArrayList<Vertex>();
            for(Vertex project : projects) {
              for(Edge edge : project.getInEdges()) {
                if(edge.getLabel().equals("created") && edge.getOutVertex() != root) {
                  coDevelopers.add(edge.getOutVertex());
                }
              }
            }
            return coDevelopers;
          } else {
            return null;
          }
        }
      }
- 29. Outline (section: Gremlin Language Description)
- 30. Gremlin Type System
  [Figure: type hierarchy. object is the root type, with subtypes element, graph, number, string, boolean, map, and list; element has subtypes vertex and edge.]
- 31. Predefined Paths and Properties
  [Figure: a fragment of the example graph (vertices 1, 3, 4; edges 8, 9, 11) annotated with: vertex 1 out edges, vertex 3 in edges, edge 9 out vertex, edge 9 in vertex, edge 9 label, edge 9 id, vertex 4 id, vertex 4 properties (name = "josh", age = 32).]
      object      | property | description                           | example
      graph       | V        | the vertex iterator of the graph      | $g/V
      graph       | E        | the edge iterator of the graph        | $g/E
      vertex/edge | @id      | the identifier of the element         | $v/@id
      vertex      | outE     | the outgoing edges of the vertex      | $v/outE
      vertex      | inE      | the incoming edges of the vertex      | $v/inE
      vertex      | bothE    | both in and out edges of the vertex   | $v/bothE
      edge        | outV     | the outgoing tail vertex of the edge  | $e/outV
      edge        | inV      | the incoming head vertex of the edge  | $e/inV
      edge        | bothV    | both in and out vertices of the edge  | $e/bothV
      edge        | @label   | the label of the edge                 | $e/@label
- 32. Predefined Functions
  Functions listed on the slide: g:assign(), g:unassign(), g:id(), g:key(), g:add-v(), g:add-e(), g:remove-ve(), g:idx-all(), g:add-idx(), g:remove-idx(), g:load(), g:save(), g:clear(), g:close(), g:keys(), g:values(), g:map(), g:get(), g:op-value(), g:list(), g:dedup(), g:union(), g:intersect(), g:difference(), g:retain(), g:except(), g:remove(), g:sort(), g:rand-nat(), g:rand-real(), g:prob(), g:cont(), g:halt(), g:type(), g:print(), g:time(), g:p(), g:to-json(), g:from-json(), ...
  There are over 70 predefined functions. See the following for a description of each.
  http://wiki.github.com/tinkerpop/gremlin/core-function-library
  http://wiki.github.com/tinkerpop/gremlin/gremlin-function-library
- 33. Working With Non-Graph Types
      gremlin> 1.2 + 6
      ==>7.2
      gremlin> 'this is a string'
      ==>this is a string
      gremlin> true() or false()
      ==>true
      gremlin> g:map('marko','lanl','peter','neotech','josh','rpi')
      ==>marko=lanl
      ==>peter=neotech
      ==>josh=rpi
      gremlin> g:list('graphs','hockey','motorcycles',6)
      ==>graphs
      ==>hockey
      ==>motorcycles
      ==>6.0
- 34. Working With Non-Graph Types
      gremlin> $m := g:map('hobbies',g:list('hockey','graphs'), 'location', g:map('state','new mexico', 'city', 'santa fe', 'zipcode', 87501), 'age', 30)
      ==>location={zipcode=87501.0, state=new mexico, city=santa fe}
      ==>age=30.0
      ==>hobbies=[hockey, graphs]
      gremlin> $m/@age
      ==>30.0
      gremlin> $m/@hobbies[2]
      ==>graphs
      gremlin> $m/@location/@city
      ==>santa fe
- 35. Variables
  • Variables in Gremlin are prefixed with a $ character.
  • There is a collection of reserved variables that all begin with $_.
    - $_ is the root list of objects.
    - $_last is the last result evaluated by the evaluator.
    - $_g is the "working graph" to reduce typing with graph functions.
      gremlin> $x := 1
      ==>1.0
      gremlin> $y := 2
      ==>2.0
      gremlin> $x + $y
      ==>3.0
- 36. Language Statements
  Variable Assignment
      gremlin> $i := 1 + 5
      ==>6.0
      gremlin> $i
      ==>6.0
  Repeat
      gremlin> $i := 0
      ==>0.0
      gremlin> repeat 10
                 $i := $i + 1
               end
      ==>10.0
  If/Else
      gremlin> if true()
                 $i := 1
               else
                 $i := 2
               end
      ==>1.0
  While
      gremlin> $i := 'g'
      ==>g
      gremlin> while not(matches($i, 'ggg'))
                 $i := concat($i,'g')
               end
      ==>ggg
- 37. Language Statements
  Foreach
      gremlin> $i := 0
      ==>0.0
      gremlin> foreach $j in 1 | 2 | 3
                 $i := $i + $j
               end
      ==>6.0
  Path
      gremlin> path friend_name
                 ./outE[@label='knows']/inV/@name
               end
      gremlin> ./friend_name
      ==>vadas
      ==>josh
  Function
      gremlin> func ex:hello($name)
                 concat('hello ', $name)
               end
      gremlin> ex:hello('pavel')
      ==>hello pavel
  You can define functions and paths in native Gremlin (as demonstrated above) or in Java.
- 38. XPath Filters
  • Use [ ] filters to filter objects in a path expression (i.e. "such that" or "where").
  • The evaluated result of [ ] must be a number or a boolean. If it is a number, it is treated as the position within an array (i.e. list). If it is a boolean, it is treated as whether to include or exclude the object from the next path in the sequence.
      gremlin> ./outE[@label='knows']
      ==>e[7][1-knows->2]
      ==>e[8][1-knows->4]
      gremlin> ./outE[@label='knows' and @weight>0.5]/inV[@age<21 or @name='josh'][true()][1]
      ==>v[4]
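The number-versus-boolean rule for [ ] filters can be sketched as a small Python function. This is an interpretation based on XPath 1.0 predicate semantics (a numeric result is compared with the 1-based position of the item), not Gremlin's actual implementation; `apply_filter` is a hypothetical name.

```python
def apply_filter(items, predicate):
    """Apply an XPath-style [ ] filter to a list.

    `predicate(item)` may return a bool (include/exclude the item) or a
    number (keep only the item at that 1-based position).
    """
    kept = []
    for position, item in enumerate(items, start=1):
        result = predicate(item)
        # bool must be tested first, because bool is a subclass of int in Python.
        if isinstance(result, bool):
            if result:
                kept.append(item)
        elif result == position:
            kept.append(item)
    return kept


weights = [0.5, 0.4, 1.0]
print(apply_filter(weights, lambda w: w > 0.45))   # boolean filter
print(apply_filter(weights, lambda w: 2))          # positional filter
# Filters chain left to right, as in [...][true()][1] on the slide.
print(apply_filter(apply_filter(weights, lambda w: w > 0.45), lambda w: 1))
```

Chaining matters: a positional filter applied after a boolean filter indexes into the already-filtered list, which is why the slide's trailing [1] returns the first surviving vertex.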
- 39. Outline (section: Advanced Gremlin Concepts)
- 40. A Grateful Dead Dataset: 2,500 concerts, 35,000 songs played, 600 songs, 30 years, 11 members, 1 band ... the Grateful Dead.
- 41. A Grateful Dead Dataset
  • Vertices denote songs and artists.
    - type: "song" or "artist".
    - name: name of song or artist.
    - performances: number of times song was played in concert.
    - song type: whether the song was a "cover" or "original".
  • Edges denote followed by, sung by, written by.
    - weight: number of times a song was followed by another song over all concerts played.
  Rodriguez, M.A., Gintautas, V., Pepe, A., "A Grateful Dead Analysis: The Relationship Between Concert and Listening Behavior," First Monday, 14(1), University of Illinois at Chicago Library, http://arxiv.org/abs/0807.2466, January 2009.
  NOTE: A portion of the raw dataset courtesy of Mark Leone http://www.cs.cmu.edu/~mleone/gdead/setlists.html
- 42. A Grateful Dead Dataset
  [Figure: a setlist excerpt (Stanley Theater, Pittsburgh, PA, 11/30/79, 2nd Set: Scarlet Begonias, Fire on the Mountain, Passenger, Terrapin Station, ...) next to its graph model. Song vertices (type="song"; name="Scarlet..", "Fire on..", "Pass..", "Terrap..") are linked by followed_by edges with weights (239, 1, and 2 in the example) and connect through sung_by and written_by edges to artist vertices (type="artist"; name="Hunter", "Garcia", "Lesh").]
- 43. A Grateful Dead Dataset – Load Data/Basic Stats
      gremlin> g:load('data/graph-example-2.xml')
      ==>true
      gremlin> count($_g/V)
      ==>809.0
      gremlin> count($_g/E)
      ==>8049.0
- 44. A Grateful Dead Dataset – Out-Degree of Each Vertex
      gremlin> $degrees := g:map()
      gremlin> foreach $v in $_g/V
                 $degrees[@name=$v/@name] := count($v/outE)
               end
- 45. A Grateful Dead Dataset – Out-Degree of Each Vertex
      gremlin> g:sort($degrees, 'value', true())
      ==>PLAYING IN THE BAND=96.0
      ==>SUGAR MAGNOLIA=92.0
      ==>PROMISED LAND=89.0
      ==>GOOD LOVING=87.0
      ==>NOT FADE AWAY=86.0
      ==>I KNOW YOU RIDER=85.0
      ==>CASSIDY=83.0
      ==>DEAL=82.0
      ==>JACK STRAW=81.0
      ==>ONE MORE SATURDAY NIGHT=81.0
      ==>EL PASO=80.0
      ==>MEXICALI BLUES=79.0
      ...
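The out-degree computation on slides 44 and 45 is a count of outgoing edges per vertex followed by a sort on the counts. A small Python sketch follows; the vertex names and edges are hypothetical toy data, not the Grateful Dead dataset, and the helper names are illustrative.

```python
# Hypothetical toy graph: (tail, label, head) triples.
VERTICES = ["song-a", "song-b", "song-c", "artist-x"]
EDGES = [
    ("song-a", "followed_by", "song-b"),
    ("song-a", "followed_by", "song-c"),
    ("song-b", "followed_by", "song-c"),
    ("song-a", "sung_by", "artist-x"),
]


def out_degrees(vertices, edges):
    """Count outgoing edges for every vertex, including those with none."""
    degrees = {vertex: 0 for vertex in vertices}
    for tail, _label, _head in edges:
        degrees[tail] += 1
    return degrees


def sort_by_value(mapping, descending=True):
    """Return (key, value) pairs ordered by value, like sorting a map by 'value'."""
    return sorted(mapping.items(), key=lambda pair: pair[1], reverse=descending)


degrees = out_degrees(VERTICES, EDGES)
for name, degree in sort_by_value(degrees):
    print(f"{name}={degree}")
```

Note that, as on the slide, the count covers every outgoing edge label, so sung_by and written_by edges contribute to a song's out-degree along with followed_by edges.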
- 46. A Grateful Dead Dataset – Inspecting Single Vertex
      gremlin> $v := g:key('name','CHINA DOLL')[1]
      ==>v[129]
      gremlin> g:map($v)
      ==>name=CHINA DOLL
      ==>song_type=original
      ==>performances=114
      ==>type=song
      gremlin> $v/outE[@label='sung_by']/inV/@name
      ==>Garcia
- 47. A Grateful Dead Dataset – Inspecting Single Vertex
      gremlin> $v/outE[@label='followed_by']/inV/@name
      ==>BIG RIVER
      ==>THROWING STONES
      ==>SAMSON AND DELILAH
      ==>TRUCKING
      ==>CASEY JONES
      ==>HIGH TIME
      ...
      gremlin> $v/outE[@label='followed_by']/@weight
      ==>2
      ==>8
      ==>1
      ==>2
      ==>1
      ==>1
      ...
- 48. Introduction to PageRank
  • The remainder of this section will discuss the PageRank algorithm and its application to multi-relational graphs.
  • The arguments made and the examples presented generalize to all other single-relational graph algorithms. However, for the sake of brevity and consistency, only PageRank will be discussed.
- 49. Introduction to Matrix-Based PageRank
  • PageRank is a centrality measure based on the primary eigenvector of a modified version of a graph. Let $A \in \mathbb{R}_+^{|V| \times |V|}$ denote the adjacency matrix representing the graph.
  • In order to ensure positive real values in the eigenvector, the graph must be strongly connected. PageRank induces strong connectivity by overlaying a low probability (defined by $\alpha \in [0, 1]$, usually 0.15) "teleportation" graph over the original graph. Let $B \in \left\{\tfrac{1}{|V|}\right\}^{|V| \times |V|}$ denote a teleportation adjacency matrix where every vertex is connected to every vertex with equal probability.
    $C = (1 - \alpha)A + \alpha B$, where $C \in \mathbb{R}_+^{|V| \times |V|}$
    $\lambda = \lambda C$, where $\lambda \in \mathbb{R}_+^{|V|}$ is the PageRank vector over $V$.
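One common way to find a vector satisfying λ = λC is power iteration: start from a uniform vector and repeatedly multiply it by C. The sketch below uses plain Python lists and makes two assumptions the slide does not state: the rows of A are normalized to sum to 1 so that C is a stochastic matrix, and a vertex with no outgoing edges is treated as linking to every vertex. The function name and the fixed iteration count are illustrative choices.

```python
def pagerank(adjacency, alpha=0.15, iterations=100):
    """Power iteration for lambda = lambda * C with C = (1 - alpha) * A + alpha * B."""
    n = len(adjacency)
    # ASSUMPTION: row-normalize A; rows without out-edges become uniform.
    a = []
    for row in adjacency:
        total = sum(row)
        a.append([x / total for x in row] if total else [1.0 / n] * n)
    # Every entry of the teleportation matrix B is 1/|V|.
    c = [[(1 - alpha) * a[i][j] + alpha / n for j in range(n)] for i in range(n)]
    rank = [1.0 / n] * n
    for _ in range(iterations):
        # Row vector times matrix: new_rank[j] = sum_i rank[i] * C[i][j].
        rank = [sum(rank[i] * c[i][j] for i in range(n)) for j in range(n)]
    return rank


# Edges: 0 -> 2, 1 -> 2, 2 -> 0. Vertex 1 has no incoming edges.
graph = [
    [0, 0, 1],
    [0, 0, 1],
    [1, 0, 0],
]
print(pagerank(graph))
```

In this small example vertex 1 receives rank only through teleportation, so it ends up with the lowest score, while vertex 2 (pointed to by both other vertices) ends up with the highest.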
- 50. Introduction to Random Walk-Based PageRank
  • PageRank can be implemented by a random walk.
  • Create a vertex counter map, $m : V \to \mathbb{N}^+$.
  • Place a walker on a random vertex in V. Denote the walker's current vertex i ∈ V.
    1. Increment the vertex counter by 1 (i.e. m(i) ← m(i) + 1).
    2. The walker chooses a random adjacent vertex with probability 1 − α.
    3. The walker chooses a random vertex in V with probability α (teleportation).
    4. Rinse and repeat until m reaches a stationary probability distribution (continually normalize m if you want a probability distribution).
  • We will use this random walk model in the Gremlin examples to follow.
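The random walk procedure can be sketched in Python as follows. The sketch takes α to be the teleportation probability (usually 0.15), in line with the matrix formulation and with the Gremlin script later in the deck; it also teleports whenever the walker reaches a vertex with no outgoing edges. The function name, the fixed step count, and the seeded generator are illustrative assumptions.

```python
import random


def random_walk_pagerank(neighbors, alpha=0.15, steps=20000, seed=0):
    """Approximate PageRank by counting the visits of a single random walker."""
    rng = random.Random(seed)          # seeded so the sketch is repeatable
    vertices = sorted(neighbors)
    counts = {vertex: 0 for vertex in vertices}
    current = rng.choice(vertices)
    for _ in range(steps):
        counts[current] += 1           # m(i) <- m(i) + 1
        if rng.random() < alpha or not neighbors[current]:
            current = rng.choice(vertices)             # teleport
        else:
            current = rng.choice(neighbors[current])   # follow an edge
    return counts


# Edges: 0 -> 2, 1 -> 2, 2 -> 0. Vertex 1 is reachable only by teleportation.
graph = {0: [2], 1: [2], 2: [0]}
counts = random_walk_pagerank(graph)
total = sum(counts.values())
print({vertex: count / total for vertex, count in counts.items()})
```

Dividing each count by the total number of steps gives the normalized distribution mentioned in step 4; with enough steps it approaches the eigenvector computed by the matrix method.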
- 51. PageRank over Multi-Relational Graphs
  • PageRank was designed for single-relational graphs (i.e. where all edges have the same meaning).
  • In a multi-relational graph, what does it mean to find the centrality of a vertex when vertices can be related by various types of edges? For example, if there exists "socializes with" and "met once", then the person who "met once" many people could be the most centrally located in the graph. Also, what if your graph has more than just "person"-type vertices (e.g. cars, pets, buildings, articles, etc.) and "person"-type edges (e.g. owns, walks, livesAt, cites, etc.)?
- 52. PageRank over Multi-Relational Graphs
  • Calculating single-relational PageRank would yield Person as the most central vertex.
  • You can boolean filter certain edge labels (e.g. ignore type edges; in such cases, you would have the centrality scores over the knows social graph).
  • However, what if you only wanted to traverse knows edges if and only if the adjacent vertex knows more than 10 other people?
  • In the end, you want complete control (universal computability) over the paths that the traverser/walker can take through a graph.
  [Figure: a Person vertex receiving type edges from many person vertices (Herbert, Johan, Marko, Josh, Jen, ...), which are connected to one another by knows edges.]
- 53. PageRank over Multi-Relational Graphs
  • In multi-relational graphs, the meaning of your graph algorithm's results is defined by your definition of adjacency.
  • With respect to random walk-based PageRank, define the path that the walker should take. That path is the definition of adjacency.
  • The stationary probability distribution created from this walk yields a path-dependent centrality.
  • Thus, in a multi-relational graph, there are many types of PageRanks that can be calculated, one for each type of path defined for a walker.
  Rodriguez, M.A., "Grammar-Based Random Walkers in Semantic Networks", Knowledge-Based Systems, 21(7), 727–739, http://arxiv.org/abs/0803.4355, October 2008.
- 54. PageRank over "Garcia Followed By" SubGraph
  • Define a path that will go from song-to-song by "followed by" edges and only traverse songs that are "sung by" Jerry Garcia.
      (./outE[@label='followed_by']/inV/outE[@label='sung_by']/inV[@name='Garcia']/../..)[g:rand-nat()]
  [Figure: the path applied to a current song (.) with three followed_by branches. Each branch continues over a sung_by edge to an artist (name="Garcia", name="Garcia", name="Weir"); the /../.. steps back up from the Garcia artist vertices to the songs, and g:rand-nat() selects one of them at random.]
- 55. PageRank over "Garcia Followed By" SubGraph
      path garcia-followed_by
        (./outE[@label='followed_by']/inV/outE[@label='sung_by']/inV[@name='Garcia']/../..)[g:rand-nat()]
      end
      $m := g:map()
      $alpha := 0.15
      $_ := g:key('type', 'song')[g:rand-nat()]
      repeat 2500
        $_ := ./garcia-followed_by
        if count($_) > 0
          g:op-value('+',$m,$_[1]/@name, 1.0)
        end
        if g:rand-real() < $alpha or count($_) = 0
          $_ := g:key('type', 'song')[g:rand-nat()]
        end
      end
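The structure of the script above (take one step along a custom path, count the vertex reached, and teleport with probability α or when the path yields nothing) can be mirrored in Python. The song names, edges, and singers below are hypothetical toy data rather than the Grateful Dead dataset, and the function names are illustrative.

```python
import random

# Hypothetical toy data: which songs follow which, and who sings each song.
FOLLOWED_BY = {
    "song-a": ["song-b", "song-c"],
    "song-b": ["song-c", "song-d"],
    "song-c": ["song-d"],
    "song-d": ["song-a"],
}
SUNG_BY = {"song-a": "Garcia", "song-b": "Garcia", "song-c": "Weir", "song-d": "Garcia"}


def garcia_followed_by(song):
    """The path as an adjacency definition: followed_by songs sung by Garcia."""
    return [nxt for nxt in FOLLOWED_BY.get(song, []) if SUNG_BY[nxt] == "Garcia"]


def path_pagerank(steps=2500, alpha=0.15, seed=0):
    """Random-walk PageRank where adjacency is defined by garcia_followed_by."""
    rng = random.Random(seed)
    songs = sorted(SUNG_BY)
    counts = {}
    current = rng.choice(songs)
    for _ in range(steps):
        candidates = garcia_followed_by(current)
        if candidates:
            current = rng.choice(candidates)           # one random result of the path
            counts[current] = counts.get(current, 0) + 1
        if not candidates or rng.random() < alpha:
            current = rng.choice(songs)                # teleport to a random song
    return counts


ranking = sorted(path_pagerank().items(), key=lambda pair: pair[1], reverse=True)
print(ranking)
```

Swapping in a different adjacency function changes what the resulting ranking means without touching the walk itself, which is the sense in which the centrality is path-dependent.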
- 56. PageRank over "Garcia Followed By" SubGraph
      gremlin> g:sort($m,'value',true())
      ==>CRAZY FINGERS=98.0
      ==>HES GONE=85.0
      ==>CHINA CAT SUNFLOWER=79.0
      ==>BERTHA=76.0
      ==>UNCLE JOHNS BAND=74.0
      ==>TERRAPIN STATION=72.0
      ==>GOING DOWN THE ROAD FEELING BAD=71.0
      ==>WHARF RAT=71.0
      ==>EYES OF THE WORLD=65.0
      ==>COLD RAIN AND SNOW=62.0
      ==>SHIP OF FOOLS=58.0
      ==>RAMBLE ON ROSE=53.0
      ==>CASEY JONES=51.0
      ==>DARK STAR=47.0
      ==>DEAL=46.0
      ...
- 57. Universal Computation in Paths
      path path-name
        # any arbitrary computation can occur here
      end
  • A path definition can be used to define adjacencies.
    - Adjacency can be expressed as anything that can be computed by a Turing machine.
    - Path definitions are used to create "semantically meaningful" results from single-relational graph algorithms applied to multi-relational graphs.
    - Path definitions make explicit what is implicit in the structure of the graph. This has applications to knowledge-based reasoning.
  • A path definition can perform any arbitrary computation.
    - Path definitions can check/set vertex/edge properties.
    - Path definitions can create new vertices and edges.
    - Path definitions can call/define functions.
  This allows fine grained control over how your traverser/walker moves through a graph.
- 58. Outline (section: Conclusions)
- 59. The Current Gremlin EcoSystems
  • Webling: Web console for Gremlin (developed by Pavel Yaskevich w/ funding from Neo Technology)
  • Project Gargamel: Distributed Graph Computing (uses Linked Process and Gremlin)
  • ReXster: A Graph-Based Recommender Engine
- 60. Thank You. Please enjoy Gremlin at http://gremlin.tinkerpop.com. My homepage is http://markorodriguez.com. Please feel free to contact me with any questions or comments.
