TITAN
BIG GRAPH DATA WITH CASSANDRA
#TITANDB #GRAPHDB #CASSANDRA12



Matthias Broecheler, CTO         AURELIUS
August VIII, MMXII               THINKAURELIUS.COM
Abstract
Titan is an open source distributed graph database built on top of
Cassandra that can power real-time applications with thousands of
concurrent users over graphs with billions of edges. Graphs are a versatile
data model for capturing and analyzing rich relational structures. Graphs
are an increasingly popular way to represent data in a wide range of
domains such as social networking, recommendation engines,
advertisement optimization, knowledge representation, health care,
education, and security.

This presentation discusses Titan's data model, query language, and novel
techniques in edge compression, data layout, and vertex-centric indices
which facilitate the representation and processing of Big Graph Data
across a Cassandra cluster. We demonstrate Titan's performance on a
large-scale benchmark evaluation using Twitter data.
Titan Graph Database
      supports real-time local traversals (OLTP)
      is highly scalable
         in the number of concurrent users
         in the size of the graph
      is open source under the Apache 2 license
      builds on top of Apache Cassandra for distribution and replication
I
The Graph Data Model


Hercules: demigod
Alcmene: human
Jupiter: god
Saturn: titan
Pluto: god
Neptune: god
Cerberus: monster
                     Entities
Name       Type
Hercules   demigod
Alcmene    human
Jupiter    god
Saturn     titan
Pluto      god
Neptune    god
Cerberus   monster

                       Table
Name: Hercules    Type: demigod
Name: Alcmene     Type: human
Name: Jupiter     Type: god
Name: Saturn      Type: titan
Name: Pluto       Type: god
Name: Neptune     Type: god
Name: Cerberus    Type: monster

                                      Documents
Hercules  ->  type:demigod
Alcmene   ->  type:human
Jupiter   ->  type:god
Saturn    ->  type:titan
Pluto     ->  type:god
Neptune   ->  type:god
Cerberus  ->  type:monster

                            Key->Value
[Figure: one vertex per entity (Neptune, Alcmene, Saturn, Jupiter, Hercules, Pluto, Cerberus), each carrying name and type properties; callouts mark a Vertex and a Property.]

                                                            Graph
[Figure: the same vertices connected by labeled edges (brother, mother, father, battled, pet); the battled edge carries the edge property time:12; callouts mark an Edge, an Edge Property, and an Edge Type.]

                                                                                   Graph
I
Graph = Agile Data Model
II
Graph Use Cases


Recommendations
Recommendation?
[Figure sequence: Hercules bought "Muscle building for beginners"; Newton bought the same book and also "How to deal with Father issues", so a traversal from Hercules through the shared purchase reaches that book as a recommendation. Later frames enrich the graph with more edge types (in-Cart, viewed, friends), more items ("Dancing with the Stars", a DVD; "Friends forever bracelet", an accessory), and time properties on the bought edges (time:20, time:22, time:24).]

Path Finding
[Figure, shown in two frames: the family graph from section I (Saturn, Jupiter, Hercules, Neptune, Alcmene, Pluto, Cerberus with their father, mother, brother, battled and pet edges) with two vertices marked X; the task is to find the paths connecting them.]
[Figure: the web pages yahoo.com, cnn.com and geocities.com/johnlittlesite, each an HTML document, raise the question "Credibility?". Modeled as a link graph, each page becomes a vertex with url and html properties and each hyperlink becomes an edge; a later frame labels the links with topics such as "elections", "funny cat" and "foreign policy".]
Link Graph
II
Graph = Milk Your Connections
III
The Titan Graph Database


Titan Features
  numerous concurrent users
  real-time traversals (OLTP)
  high availability
  dynamic scalability
  built on Apache Cassandra
Titan Ecosystem
  Native Blueprints implementation
  Gremlin query language
  Rexster server
    any Titan graph can be exposed as a REST endpoint
[Figure: the surrounding graph stack: Generic Graph API, Dataflow Processing, Traversal Language, Object-Graph Mapper, Graph Algorithms, Graph Server]
Titan Internals
I.   Data Management
II.  Edge Compression
III. Vertex-Centric Indices
IV
Rebuilding Twitter with Titan


[Figure: Twitter schema. User vertices have a name: string property; Tweet vertices have text: string and time: long. Edges: follows (User to User), tweets (User to Tweet) and stream (User to Tweet), each with a time: long property.]
Titan Storage Model
  Adjacency list in one column family
  Row key = vertex id
  Each property and edge in one column
    Edges are denormalized, i.e. stored twice (once in each endpoint's row)
  Direction and label/key as column prefix
    Use slice predicate for quick retrieval
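To make the row layout concrete, here is a minimal Python sketch, assuming a sorted in-memory list of column keys stands in for one Cassandra row. The `Row` class, the tuple column keys and the helper functions are hypothetical illustrations, not Titan's actual byte-level encoding. Because direction and label form the key prefix, all outgoing `follows` edges of a vertex sit next to each other and can be fetched with one slice.

```python
from bisect import bisect_left, insort


class Row:
    """One row of a hypothetical adjacency-list column family."""

    def __init__(self):
        self.keys = []    # column keys, kept sorted like Cassandra columns
        self.values = {}  # column key -> column value

    def put(self, key, value):
        if key not in self.values:
            insort(self.keys, key)
        self.values[key] = value

    def slice(self, prefix):
        """Return all columns whose key starts with `prefix` (a slice predicate)."""
        start = bisect_left(self.keys, prefix)
        result = []
        for key in self.keys[start:]:
            if key[:len(prefix)] != prefix:
                break  # keys are sorted, so no later key can match the prefix
            result.append((key, self.values[key]))
        return result


rows = {}  # row key = vertex id


def row(vertex_id):
    return rows.setdefault(vertex_id, Row())


def set_property(vertex_id, key, value):
    row(vertex_id).put(("property", key), value)


def add_edge(out_id, label, in_id, props):
    # Denormalized: the edge is written into both endpoints' rows,
    # with direction and label as the column-key prefix.
    row(out_id).put(("edge", "OUT", label, in_id), props)
    row(in_id).put(("edge", "IN", label, out_id), props)


set_property(5, "name", "Hercules")
set_property(9, "name", "Jupiter")
add_edge(5, "follows", 9, {"time": 2})
add_edge(5, "follows", 7, {"time": 3})
add_edge(5, "tweets", 12, {"time": 4})
print(row(5).slice(("edge", "OUT", "follows")))
```

The `tweets` column of vertex 5 and its `name` property are never touched by the `follows` slice; that locality is what the column prefix buys.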
Connecting Titan




titan$ bin/gremlin.sh

         \,,,/
         (o o)
-----oOOo-(_)-oOOo-----
gremlin> conf = new BaseConfiguration();
==>org.apache.commons.configuration.BaseConfiguration@763861e6
gremlin> conf.setProperty("storage.backend","cassandra");
gremlin> conf.setProperty("storage.hostname","77.77.77.77");
gremlin> g = TitanFactory.open(conf);
==>titangraph[cassandra:77.77.77.77]
gremlin>
Defining Property Keys

gremlin> time = g.makeType().name("time").
             dataType(Long.class).
             functional().
             makePropertyKey();
gremlin> g.makeType().name("text").dataType(String.class).
             functional().makePropertyKey();
gremlin> g.makeType().name("name").dataType(String.class).
             indexed().
             unique().
             functional().makePropertyKey();

  name(...)       each type has a unique name
  dataType(...)   the allowed data type
  functional()    if a key is functional, each vertex can have at most one property for this key
  indexed()       creates and maintains an index over property values
  unique()        ensures that each property value is uniquely associated with only one vertex by acquiring a lock
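The semantics of `dataType()`, `functional()` and `unique()` can be sketched in Python, purely as an illustration. `PropertyKey`, `SchemaGraph` and the in-process `threading.Lock` are hypothetical; in Titan the uniqueness lock is acquired against Cassandra, not inside a single process.

```python
import threading


class PropertyKey:
    """Hypothetical stand-in for a property key definition."""

    def __init__(self, name, data_type, functional=False, unique=False):
        self.name = name
        self.data_type = data_type
        self.functional = functional
        self.unique = unique


class SchemaGraph:
    def __init__(self, keys):
        self.keys = {k.name: k for k in keys}
        self.props = {}  # vertex id -> {key name -> [values]}
        self.owner = {}  # (key name, value) -> vertex id, for unique keys
        self.lock = threading.Lock()  # models the lock taken for unique()

    def add_property(self, vertex_id, key_name, value):
        key = self.keys[key_name]
        if not isinstance(value, key.data_type):  # dataType(...)
            raise TypeError(f"{key_name} expects {key.data_type.__name__}")
        values = self.props.setdefault(vertex_id, {}).setdefault(key_name, [])
        if key.functional and values:  # functional(): at most one value per vertex
            raise ValueError(f"{key_name} is functional: vertex {vertex_id} already has one")
        if key.unique:  # unique(): value may belong to only one vertex
            with self.lock:
                holder = self.owner.get((key_name, value))
                if holder is not None and holder != vertex_id:
                    raise ValueError(f"{key_name}={value!r} already belongs to vertex {holder}")
                self.owner[(key_name, value)] = vertex_id
        values.append(value)


g = SchemaGraph([
    PropertyKey("time", int, functional=True),
    PropertyKey("text", str, functional=True),
    PropertyKey("name", str, functional=True, unique=True),
])
g.add_property(5, "name", "Hercules")
```

With this schema, a second vertex named "Hercules" is rejected by the uniqueness check, and giving vertex 5 a second name is rejected by the functional check.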
Titan Indexing
  Vertices can be retrieved by property key + value
  Titan maintains the index in a separate column family as the graph is updated
  Only need to define a property key as .indexed()
[Figure: index entries name : Hercules -> vertex 5, name : Jupiter -> vertex 9]
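A minimal Python sketch of an index kept next to the graph, assuming a dictionary keyed by (property key, value) stands in for the separate index column family. `IndexedStore` and its methods are hypothetical; the point is that every property write also updates the index, so lookups by key + value need no scan.

```python
from collections import defaultdict


class IndexedStore:
    """Hypothetical sketch: vertex properties plus a separate index table
    that is updated in the same write as the property itself."""

    def __init__(self, indexed_keys):
        self.indexed_keys = set(indexed_keys)
        self.properties = defaultdict(dict)  # vertex id -> {key: value}
        self.index = defaultdict(set)        # (key, value) -> {vertex ids}

    def set_property(self, vertex_id, key, value):
        old = self.properties[vertex_id].get(key)
        self.properties[vertex_id][key] = value
        if key in self.indexed_keys:
            if old is not None:
                self.index[(key, old)].discard(vertex_id)  # drop the stale entry
            self.index[(key, value)].add(vertex_id)

    def vertices(self, key, value):
        """Look up vertices by property key + value without a full scan."""
        if key not in self.indexed_keys:
            raise KeyError(f"{key} is not indexed")
        return sorted(self.index.get((key, value), set()))


store = IndexedStore(indexed_keys=["name"])
store.set_property(5, "name", "Hercules")
store.set_property(9, "name", "Jupiter")
store.set_property(5, "type", "demigod")  # not indexed, so no index entry
```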
Titan Locking
  Locking ensures consistency when it is needed
  Titan uses timestamped quorum reads and writes on separate CFs for locking
  Uses
    Property uniqueness: .unique()
    Functional edges: .functional()
    Global ID management
[Figure: Hercules (vertex 5), Jupiter (vertex 9) and Pluto with father edges; a conflicting father edge is marked x.]
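A heavily simplified Python sketch of timestamped lock claims, assuming a scheme where each transaction writes a claim column carrying its timestamp, reads all claims back, and holds the lock only if its claim is the earliest unexpired one. `LockStore`, the expiry window and the claim format are hypothetical; a real implementation also has to handle clock skew, retries and crashed transactions, and would issue these reads and writes at quorum consistency.

```python
class LockStore:
    """Hypothetical in-memory stand-in for a separate locking column family.
    Each lock row holds timestamped claim columns."""

    def __init__(self, expiry=10):
        self.rows = {}        # lock key -> set of (timestamp, owner) claims
        self.expiry = expiry  # claims older than this are ignored

    def try_lock(self, lock_key, owner, now):
        claims = self.rows.setdefault(lock_key, set())
        claims.add((now, owner))  # 1. write a timestamped claim
        live = sorted(c for c in claims if now - c[0] < self.expiry)  # 2. read claims back
        if live[0][1] == owner:   # 3. the earliest live claim wins
            return True
        claims.discard((now, owner))  # lost the race: withdraw the claim
        return False

    def unlock(self, lock_key, owner):
        claims = self.rows.get(lock_key, set())
        for claim in [c for c in claims if c[1] == owner]:
            claims.discard(claim)


locks = LockStore()
key = ("name", "Hercules")  # e.g. the lock guarding a unique() property value
```

In this model a second transaction asking for the same key fails until the first one unlocks or its claim expires.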
Defining Edge Labels

gremlin> g.makeType().name("follows").
             primaryKey(time).
             makeEdgeLabel();
gremlin> g.makeType().name("tweets").
             primaryKey(time).makeEdgeLabel();
gremlin> g.makeType().name("stream").
             primaryKey(time).
             unidirected().
             makeEdgeLabel();

  primaryKey(time)   sort/index key for edges of this label (time is the variable holding the "time" property key)
  unidirected()      store edges of this label only in the outgoing direction
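In Python, purely as an illustration, the two options could behave like this: `primaryKey` keeps each vertex's edges of a label ordered by that property regardless of insertion order, and `unidirected()` skips the copy in the incoming vertex's row. `EdgeLabel`, `adjacency` and `add_edge` are hypothetical names, not Titan API.

```python
from bisect import insort
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeLabel:
    """Hypothetical stand-in for an edge label definition."""
    name: str
    primary_key: str           # property that orders this label's edges per vertex
    unidirected: bool = False  # if True, store only in the outgoing direction


adjacency = {}  # vertex id -> sorted list of (direction, label, sort value, other id)


def add_edge(label, out_id, in_id, props):
    sort_value = props[label.primary_key]
    insort(adjacency.setdefault(out_id, []), ("OUT", label.name, sort_value, in_id))
    if not label.unidirected:
        insort(adjacency.setdefault(in_id, []), ("IN", label.name, sort_value, out_id))


follows = EdgeLabel("follows", primary_key="time")
stream = EdgeLabel("stream", primary_key="time", unidirected=True)

add_edge(stream, 5, 102, {"time": 9})
add_edge(stream, 5, 101, {"time": 4})  # added later, but sorts first by time
add_edge(follows, 5, 7, {"time": 2})
```

Skipping the incoming copy for `stream` seems sensible here because the slides only ever read streams from the user's side (`hercules.outE('stream')`).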
Vertex-Centric Indices
  Sort and index edges per vertex by primary key
    Primary key can be composite
  Enables efficient focused traversals
    Only retrieve edges that matter
  Uses slice predicate for quick, index-driven retrieval

[Figure: vertex v with several follows edges and outgoing tweets edges carrying time: 123, 334, 624 and 1112; the query below is narrowed step by step until only the tweets edge with time: 1112 remains.]

v.query()
  .direction(OUT)
  .labels("tweets")
  .has("time",T.gt,1000)
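The final query can be mimicked in Python, purely as an illustration, assuming each vertex's edges are kept sorted by (direction, label, primary key) as the slides describe. `v_edges` and `query` are hypothetical; the point is that `direction`, `labels` and the `has(..., T.gt, ...)` condition together translate into one contiguous range that two binary searches can locate.

```python
from bisect import bisect_right

INF = float("inf")

# Hypothetical sorted adjacency of vertex v: (direction, label, time, other id),
# mirroring the figure's tweets edges plus some follows edges.
v_edges = sorted([
    ("OUT", "tweets", 123, 201), ("OUT", "tweets", 334, 202),
    ("OUT", "tweets", 624, 203), ("OUT", "tweets", 1112, 204),
    ("OUT", "follows", 50, 11), ("IN", "follows", 70, 12),
])


def query(edges, direction, label, time_gt):
    """Edges with the given direction and label whose time is > time_gt.
    Two binary searches bound the slice, so non-matching edges are skipped."""
    lo = bisect_right(edges, (direction, label, time_gt, INF))
    hi = bisect_right(edges, (direction, label, INF))
    return edges[lo:hi]


print(query(v_edges, "OUT", "tweets", 1000))
```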
Create Accounts
[Figure: two user vertices, name: Hercules and name: Pluto]

   gremlin> hercules = g.addVertex(['name':'Hercules']);
   gremlin> pluto = g.addVertex(['name':'Pluto']);
Add Followship
[Figure: Hercules follows Pluto (follows edge, time:2)]

   gremlin> g.addEdge(hercules,pluto,"follows",['time':2]);
Publish Tweet
[Figure: Pluto links to a new tweet vertex (text: A tweet!, time: 4) through a tweets edge with time:4]

   gremlin> tweet = g.addVertex(['text':'A tweet!','time':4]);
   gremlin> g.addEdge(pluto,tweet,"tweets",['time':4]);
Update Streams
[Figure: each follower of Pluto, here Hercules, gets a stream edge (time:4) to the new tweet]

   gremlin> pluto.in("follows").each{g.addEdge(it,tweet,"stream",['time':4])};
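The stream update is a fan-out on write: publishing costs one extra edge per follower, while reading a stream later only touches the reader's own edges. A minimal Python sketch of that idea, with hypothetical `follow` and `publish` helpers and plain dictionaries instead of a graph:

```python
from collections import defaultdict

followers = defaultdict(set)  # user -> users who follow them (reversed 'follows' edges)
streams = defaultdict(list)   # user -> [(time, tweet id)] ('stream' edges)


def follow(follower, followee):
    followers[followee].add(follower)


def publish(author, tweet_id, time):
    """Fan-out on write: add one stream entry per follower of `author`.
    Returns how many stream edges were written."""
    for follower in followers[author]:
        streams[follower].append((time, tweet_id))
    return len(followers[author])


follow("Hercules", "Pluto")
written = publish("Pluto", "tweet-1", time=4)
```

Under this design the cost of a tweet grows with the author's follower count, which looks like a reasonable trade when, as in the benchmark mix later in this deck, stream reads are far more frequent than new tweets.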
Read Stream
[Figure: Hercules follows Pluto (time:2); Pluto tweets "A tweet!" (time:4); the tweet appears in Hercules's stream (time:4)]

   gremlin> hercules.outE('stream')[0..9].inV.map

The stream edges come back sorted by time because time is the primary key of the 'stream' label.
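Because `time` is the primary key of `stream`, taking the first ten edges needs no sorting at read time. A Python sketch of the same idea, assuming newest-first order is what the reader wants (the slide does not say which end of the time order `[0..9]` returns); `stream`, `add_stream_edge` and `read_stream` are hypothetical.

```python
from bisect import insort

stream = []  # one user's stream edges, kept sorted by (time, tweet id)


def add_stream_edge(time, tweet_id):
    insort(stream, (time, tweet_id))


def read_stream(limit=10):
    """Newest `limit` tweet ids. The list is already ordered by the primary
    key, so this is a slice of the tail rather than a sort."""
    if limit <= 0:
        return []
    return [tweet_id for _, tweet_id in reversed(stream[-limit:])]


for t in range(1, 16):
    add_stream_edge(t, f"tweet{t}")
```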
Followship Recommendation
[Figure: follows edges among Hercules, Pluto and Neptune (time:2, time:9); Neptune is reached through an account Hercules already follows.]

   random = new Random()
   follows = g.V('name','Hercules').out('follows').toList()
   follows20 = follows[(0..19).collect{random.nextInt(follows.size())}]
   m = [:]
   follows20.each {
       it.outE('follows')[0..29].inV.except(follows).groupCount(m).iterate()
   }
   m.sort{a,b -> b.value <=> a.value}.take(5)
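Step by step, the query samples 20 of Hercules's followees (with replacement), looks at up to 30 accounts each of them follows, drops accounts Hercules already follows, counts how often each remaining account appears, and returns the five most frequent. The same logic in Python, purely as an illustration; `recommend` and the in-memory `follows` map are hypothetical, and unlike the slide the sketch also skips the user themself.

```python
import random
from collections import Counter


def recommend(follows, user, sample=20, per_followee=30, top=5, rng=None):
    """Python re-statement of the slide's Gremlin recommendation.
    `follows` maps a user to the list of users they follow."""
    rng = rng or random.Random()
    followed = follows.get(user, [])
    if not followed:
        return []
    # sample followees with replacement, like (0..19).collect{random.nextInt(...)}
    picked = [rng.choice(followed) for _ in range(sample)]
    counts = Counter()
    for followee in picked:
        for candidate in follows.get(followee, [])[:per_followee]:  # like [0..29]
            # like except(follows); additionally skip the user themself
            if candidate not in followed and candidate != user:
                counts[candidate] += 1  # like groupCount(m)
    return [name for name, _ in counts.most_common(top)]


follows = {
    "Hercules": ["Pluto", "Jupiter"],
    "Pluto": ["Neptune", "Cerberus", "Hercules"],
    "Jupiter": ["Neptune", "Pluto"],
}
print(recommend(follows, "Hercules", rng=random.Random(42)))
```

Neptune ranks first here because both of Hercules's followees lead to it, so it is counted on every sample.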
IV
Titan Performance Evaluation on
Twitter-like Benchmark

Twitter Benchmark
  1.47 billion followship edges and 41.7 million users
    Loaded into Titan using BatchGraph
    Twitter in 2009, crawled by Kwak et al.
  4 transaction types
    Create account (1%)
    Publish tweet (15%)
    Read stream (76%)
    Recommendation (8%)
      Follow recommended user (30%)

Kwak, H., Lee, C., Park, H., Moon, S., "What is Twitter, a Social Network or a News Media?," World Wide Web Conference, 2010.
Benchmark Setup
  6 cc1.4xl Cassandra nodes
     in one placement group
     Cassandra 1.10
  40 m1.small worker machines
     repeatedly running transactions
     simulating servers handling user
      requests
  EC2 cost: $11/hour
Benchmark Results

Transaction Type      Number of tx   Mean tx time   Std. dev. of tx time
Create account             379,019      115.15 ms                5.88 ms
Publish tweet            7,580,995       18.45 ms                6.34 ms
Read stream             37,936,184        6.29 ms                1.62 ms
Recommendation           3,793,863       67.65 ms               13.89 ms
Total                   49,690,061
Runtime: 2.3 hours      Throughput: 5,900 tx/sec
Peak Load Results

Transaction Type      Number of tx   Mean tx time   Std. dev. of tx time
Create account             374,860      172.74 ms               10.52 ms
Publish tweet            7,517,667       70.07 ms               19.43 ms
Read stream             37,618,648       24.40 ms                3.18 ms
Recommendation           3,758,266      229.83 ms               29.08 ms
Total                   49,269,441
Runtime: 1.3 hours      Throughput: 10,200 tx/sec
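A quick consistency check on the two result tables, in Python: the reported throughputs should reproduce the reported runtimes, and the measured transaction mix should match the configured split (1% / 15% / 76% / 8%). The dictionaries simply restate the table values; `summarize` is a hypothetical helper.

```python
# Transaction counts copied from the two result tables.
normal = {"Create account": 379_019, "Publish tweet": 7_580_995,
          "Read stream": 37_936_184, "Recommendation": 3_793_863}
peak = {"Create account": 374_860, "Publish tweet": 7_517_667,
        "Read stream": 37_618_648, "Recommendation": 3_758_266}


def summarize(counts, tx_per_sec):
    """Total transactions, runtime implied by the throughput (hours, 1 decimal),
    and the share of each transaction type in whole percent."""
    total = sum(counts.values())
    hours = total / tx_per_sec / 3600
    mix = {name: round(100 * n / total) for name, n in counts.items()}
    return total, round(hours, 1), mix


print(summarize(normal, 5_900))
print(summarize(peak, 10_200))
```

Both runs reproduce the stated totals and runtimes (2.3 and 1.3 hours) to one decimal, and in both the measured mix rounds to the configured percentages.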
Benchmark Conclusion
Titan can handle tens of thousands of concurrent users with short response times even for complex traversals on a simulated social networking application based on real-world network data with billions of edges and millions of users in a standard EC2 deployment.
For more information on the benchmark:
http://thinkaurelius.com/2012/08/06/titan-provides-real-time-big-graph-data/
Future Titan
  Titan+Cassandra embedding
    sending Gremlin queries into
     the cluster
  Graph partitioning together
   with ByteOrderedPartitioner
    data locality = better performance
  Let us know what you need!
Titan goes OLAP


[Figure: three stages linked by arrows labeled "Map/Reduce", "Load & Compress" and "Analysis results back into Titan": Titan stores a massive-scale property graph allowing real-time traversals and updates; batch processing of large graphs with Hadoop; global graph algorithms run on large, compressed, in-memory graphs.]
III
Graph = Scalable + Practical
TITAN
THINKAURELIUS.GITHUB.COM/TITAN

Titan NYC Meetup March 2014
 
Graph Computing @ Strangeloop 2013
Graph Computing @ Strangeloop 2013Graph Computing @ Strangeloop 2013
Graph Computing @ Strangeloop 2013
 
Titan - Graph Computing with Cassandra
Titan - Graph Computing with CassandraTitan - Graph Computing with Cassandra
Titan - Graph Computing with Cassandra
 
PMatch: Probabilistic Subgraph Matching on Huge Social Networks
PMatch: Probabilistic Subgraph Matching on Huge Social NetworksPMatch: Probabilistic Subgraph Matching on Huge Social Networks
PMatch: Probabilistic Subgraph Matching on Huge Social Networks
 
Budget-Match: Cost Effective Subgraph Matching on Large Networks
Budget-Match: Cost Effective Subgraph Matching on Large NetworksBudget-Match: Cost Effective Subgraph Matching on Large Networks
Budget-Match: Cost Effective Subgraph Matching on Large Networks
 
Probabilistic Soft Logic
Probabilistic Soft LogicProbabilistic Soft Logic
Probabilistic Soft Logic
 
Computing Marginal in CCMRFs - NIPS 2010
Computing Marginal in CCMRFs - NIPS 2010Computing Marginal in CCMRFs - NIPS 2010
Computing Marginal in CCMRFs - NIPS 2010
 
A Scalable Framework for Modeling Competitive Diffusion in Social Networks
A Scalable Framework for Modeling Competitive Diffusion in Social NetworksA Scalable Framework for Modeling Competitive Diffusion in Social Networks
A Scalable Framework for Modeling Competitive Diffusion in Social Networks
 
COSI: Cloud Oriented Subgraph Identification in Massive Social Networks
COSI: Cloud Oriented Subgraph Identification in Massive Social NetworksCOSI: Cloud Oriented Subgraph Identification in Massive Social Networks
COSI: Cloud Oriented Subgraph Identification in Massive Social Networks
 

Recently uploaded

Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxMatsuo Lab
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfDianaGray10
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintMahmoud Rabie
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfDaniel Santiago Silva Capera
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationIES VE
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6DianaGray10
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopBachir Benyammi
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsSeth Reyes
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?IES VE
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsSafe Software
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Will Schroeder
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1DianaGray10
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPathCommunity
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxGDSC PJATK
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...Aggregage
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URLRuncy Oommen
 

Recently uploaded (20)

Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptx
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 Workshop
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URL
 

Titan: Big Graph Data with Cassandra

  • 1. TITAN BIG GRAPH DATA WITH CASSANDRA #TITANDB #GRAPHDB #CASSANDRA12 Matthias Broecheler, CTO AURELIUS August VIII, MMXII THINKAURELIUS.COM
  • 2. Abstract Titan is an open source distributed graph database built on top of Cassandra that can power real-time applications with thousands of concurrent users over graphs with billions of edges. Graphs are a versatile data model for capturing and analyzing rich relational structures, and an increasingly popular way to represent data in a wide range of domains such as social networking, recommendation engines, advertisement optimization, knowledge representation, health care, education, and security. This presentation discusses Titan's data model, query language, and novel techniques in edge compression, data layout, and vertex-centric indices that facilitate the representation and processing of Big Graph Data across a Cassandra cluster. We demonstrate Titan's performance on a large-scale benchmark evaluation using Twitter data.
  • 3. Titan Graph Database: supports real-time local traversals (OLTP); is highly scalable, both in the number of concurrent users and in the size of the graph; is open source under the Apache 2 license; builds on top of Apache Cassandra for distribution and replication
  • 4. I The Graph Data Model
  • 5. Entities: Hercules: demigod; Alcmene: human; Jupiter: god; Saturn: titan; Pluto: god; Neptune: god; Cerberus: monster
  • 6. Table (columns Name | Type): Hercules | demigod; Alcmene | human; Jupiter | god; Saturn | titan; Pluto | god; Neptune | god; Cerberus | monster
  • 7. Documents: {Name: Hercules, Type: demigod}, {Name: Alcmene, Type: human}, {Name: Jupiter, Type: god}, {Name: Saturn, Type: titan}, {Name: Pluto, Type: god}, {Name: Neptune, Type: god}, {Name: Cerberus, Type: monster}
  • 8. Key->Value: Hercules -> type:demigod; Alcmene -> type:human; Jupiter -> type:god; Saturn -> type:titan; Pluto -> type:god; Neptune -> type:god; Cerberus -> type:monster
  • 9. Graph: each entity becomes a vertex, and each attribute a vertex property: {name: Hercules, type: demigod}, {name: Alcmene, type: human}, {name: Jupiter, type: god}, {name: Saturn, type: titan}, {name: Neptune, type: god}, {name: Pluto, type: god}, {name: Cerberus, type: monster}
  • 10. Graph: labeled edges connect the vertices, and edges can carry properties too: Hercules -father-> Jupiter; Hercules -mother-> Alcmene; Jupiter -father-> Saturn; Jupiter -brother-> Neptune; Jupiter -brother-> Pluto; Hercules -battled {time:12}-> Cerberus; Pluto -pet-> Cerberus (slide annotations: Edge, Property, Edge Type)
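The property graph model on slides 9 and 10 can be sketched as a tiny in-memory Java structure. This is an illustration only: `PropertyGraphSketch`, its nested classes, and its methods are hypothetical names, not Titan or Blueprints API. It shows the two ideas from the slides: vertices carry key/value properties, and edges are directed, labeled, and can carry properties of their own (such as `time: 12` on `battled`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory property graph, for illustration only (not Titan's API):
// vertices and edges both carry key/value properties, and every edge is
// directed and has a label.
class PropertyGraphSketch {
    static final class Vertex {
        final Map<String, Object> props;
        Vertex(Map<String, Object> props) { this.props = props; }
    }

    static final class Edge {
        final Vertex out;
        final String label;
        final Vertex in;
        final Map<String, Object> props;
        Edge(Vertex out, String label, Vertex in, Map<String, Object> props) {
            this.out = out; this.label = label; this.in = in; this.props = props;
        }
    }

    final List<Vertex> vertices = new ArrayList<>();
    final List<Edge> edges = new ArrayList<>();

    Vertex addVertex(String name, String type) {
        Vertex v = new Vertex(Map.of("name", name, "type", type));
        vertices.add(v);
        return v;
    }

    Edge addEdge(Vertex out, String label, Vertex in, Map<String, Object> props) {
        Edge e = new Edge(out, label, in, props);
        edges.add(e);
        return e;
    }

    // Names of the vertices reached from v over outgoing edges with this label.
    List<String> out(Vertex v, String label) {
        List<String> names = new ArrayList<>();
        for (Edge e : edges) {
            if (e.out == v && e.label.equals(label)) {
                names.add((String) e.in.props.get("name"));
            }
        }
        return names;
    }
}
```

Edges are followed only in their stored direction here, so asking Jupiter for outgoing `father` edges returns nothing unless a `Jupiter -father-> Saturn` edge was added.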
  • 11. I Graph = Agile Data Model
  • 12. II Graph Use Cases
  • 15. (figure) Hercules -bought-> {name: "Muscle building for beginners", type: book}
  • 16. (figure) adds Newton, who also bought "Muscle building for beginners"
  • 17. (figure) adds {name: "How to deal with Father issues", type: book}, bought by Newton
  • 18. Traversal (figure): the path Hercules -bought-> "Muscle building for beginners" <-bought- Newton -bought-> "How to deal with Father issues" yields a recommend edge from Hercules to "How to deal with Father issues"
  • 19. (figure) adds {name: "Dancing with the Stars", type: DVD} via an in-Cart edge and {name: "Friends forever bracelet", type: Accessory} via a viewed edge
  • 20. (figure) adds a friends edge
  • 21. (figure) the bought edges now carry timestamps: time:24, time:22, time:20
  • 22. Recommendations Path Finding
  • 23. Path Finding (figure: the Hercules, Alcmene, Jupiter, Saturn, Neptune, Pluto, and Cerberus graph from slide 10, with two edges marked X)
  • 26. Web pages yahoo.com, cnn.com, and geocities.com/johnlittlesite, each an <html> … </html> document. Credibility?
  • 27. Graph: vertices {url: yahoo.com, html: <html>…}, {url: cnn.com, html: <html>…}, and {url: geocities.com/johnlittlesite, html: <html>…}, connected by Link edges
  • 28. Graph: the same vertices and Link edges, annotated with topics: elections, funny cat, foreign policy
  • 29. II Graph = Milk Your Connections
  • 30. III The Titan Graph Database
  • 31. Titan Features: numerous concurrent users; real-time traversals (OLTP); high availability; dynamic scalability; built on Apache Cassandra
  • 32. Titan Ecosystem: native Blueprints implementation; Gremlin query language; Rexster server, so any Titan graph can be exposed as a REST endpoint (figure: TinkerPop stack with Graph Server, Graph Algorithms, Object-Graph Mapper, Traversal Language, Dataflow Processing, Generic Graph API)
  • 33. Titan Internals: I. Data Management; II. Edge Compression; III. Vertex-Centric Indices
  • 34. IV Rebuilding Twitter with Titan
  • 35. Schema: vertex types User {name: string} and Tweet {text: string, time: long}; edges User -follows {time: long}-> User and User -tweets {time: long}-> Tweet
  • 36. Schema, extended: adds User -stream {time: long}-> Tweet
  • 37. Titan Storage Model: adjacency list in one column family; row key = vertex id; each property and edge in one column; edges are denormalized, i.e. stored twice; direction and label/key as column prefix; use a slice predicate for quick retrieval
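The storage model above can be sketched with a sorted map standing in for each Cassandra row. Everything here is an assumption for illustration: `AdjacencyStoreSketch` is a hypothetical class, and the readable `OUT:`/`IN:`/`PROP:` column names replace Titan's real binary, compressed column encoding. What the sketch does show is why the direction-and-label prefix matters: all edges of one direction and label sit next to each other in the sorted row, so a single range read (the slice predicate) retrieves them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the slide's storage model, not Titan's code:
// one row per vertex (row key = vertex id), one column per property or edge,
// columns kept sorted by name as in a Cassandra row.
class AdjacencyStoreSketch {
    private final Map<Long, TreeMap<String, String>> rows = new HashMap<>();

    private TreeMap<String, String> row(long vertexId) {
        return rows.computeIfAbsent(vertexId, id -> new TreeMap<>());
    }

    void setProperty(long vertexId, String key, String value) {
        row(vertexId).put("PROP:" + key, value);
    }

    // Denormalized: the edge is written into both endpoint rows, with the
    // direction and label as a column-name prefix.
    void addEdge(long outId, String label, long inId) {
        row(outId).put("OUT:" + label + ":" + inId, "");
        row(inId).put("IN:" + label + ":" + outId, "");
    }

    // Slice predicate: columns sharing the prefix are contiguous in the sorted
    // row, so they are read as one range [prefix, prefix + '\uffff').
    List<Long> neighbors(long vertexId, String direction, String label) {
        String start = direction + ":" + label + ":";
        TreeMap<String, String> row = rows.getOrDefault(vertexId, new TreeMap<>());
        List<Long> ids = new ArrayList<>();
        for (String column : row.subMap(start, start + Character.MAX_VALUE).keySet()) {
            ids.add(Long.parseLong(column.substring(start.length())));
        }
        return ids;
    }
}
```

Writing each edge twice costs storage, but it lets a traversal start from either endpoint and still read only one row.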
  • 38. Connecting Titan
      titan$ bin/gremlin.sh
               \,,,/
               (o o)
      -----oOOo-(_)-oOOo-----
      gremlin> conf = new BaseConfiguration();
      ==>org.apache.commons.configuration.BaseConfiguration@763861e6
      gremlin> conf.setProperty("storage.backend","cassandra");
      gremlin> conf.setProperty("storage.hostname","77.77.77.77");
      gremlin> g = TitanFactory.open(conf);
      ==>titangraph[cassandra:77.77.77.77]
  • 39. Defining Property Keys
      gremlin> time = g.makeType().name("time").
                   dataType(Long.class).
                   functional().
                   makePropertyKey();
      gremlin> text = g.makeType().name("text").dataType(String.class).
                   functional().makePropertyKey();
      gremlin> name = g.makeType().name("name").dataType(String.class).
                   indexed().
                   unique().
                   functional().makePropertyKey();
  • 40. Defining Property Keys (annotated): name("time"): each type has a unique name; dataType(Long.class): the allowed data type; functional(): if a key is functional, each vertex can have at most one property for this key
  • 41. Defining Property Keys (annotated): indexed(): creates and maintains an index over property values; unique(): ensures that each property value is associated with only one vertex, by acquiring a lock
  • 42. Titan Indexing: vertices can be retrieved by property key + value (figure: name : Hercules -> vertex 5, name : Jupiter -> vertex 9); Titan maintains the index in a separate column family as the graph is updated; you only need to define a property key as .indexed()
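The index described above maps a (property key, value) pair to the vertices holding it, and a `unique()` key may map to at most one vertex. The sketch below models that with plain Java maps. It is a hypothetical illustration (`PropertyIndexSketch` and its methods are invented names): Titan keeps this index in its own column family and guards unique keys with distributed locks, which a simple in-process check only stands in for here.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a property index kept beside the adjacency rows:
// (property key, value) -> ids of the vertices holding that value.
class PropertyIndexSketch {
    private final Map<String, Map<Object, Set<Long>>> index = new HashMap<>();
    private final Set<String> uniqueKeys = new HashSet<>();

    void declareUnique(String key) {
        uniqueKeys.add(key);
    }

    // Called whenever a vertex property is written, so the index stays current.
    // (Removing the entry for an overwritten old value is omitted for brevity.)
    void onPropertySet(long vertexId, String key, Object value) {
        Map<Object, Set<Long>> byValue = index.computeIfAbsent(key, k -> new HashMap<>());
        Set<Long> ids = byValue.computeIfAbsent(value, v -> new HashSet<>());
        // Titan acquires a lock for unique() keys; a local check stands in for it.
        if (uniqueKeys.contains(key) && !ids.isEmpty() && !ids.contains(vertexId)) {
            throw new IllegalStateException(key + "=" + value + " is already taken");
        }
        ids.add(vertexId);
    }

    // Retrieval by property key + value, without scanning any vertex rows.
    Set<Long> lookup(String key, Object value) {
        return index.getOrDefault(key, Map.of()).getOrDefault(value, Set.of());
    }
}
```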
  • 43. Titan Locking: locking ensures consistency when it is needed; Titan uses timestamped quorum reads and writes on separate column families (CFs) for locking. Uses: property uniqueness, .unique(); functional edges, .functional(); global ID management (figure: index entry name : Hercules -> 5; a Hercules vertex with a father edge to Jupiter, and a second father edge to Pluto crossed out)
  • 44. Defining Edge Labels
      gremlin> g.makeType().name("follows").
                   primaryKey(time).
                   makeEdgeLabel();
      gremlin> g.makeType().name("tweets").
                   primaryKey(time).makeEdgeLabel();
      gremlin> g.makeType().name("stream").
                   primaryKey(time).
                   unidirected().
                   makeEdgeLabel();
  • 45. Defining Edge Labels (annotated): primaryKey(time): sort/index key for edges of this label
  • 46. Defining Edge Labels (annotated): unidirected(): store edges of this label only in the outgoing direction
  • 47. Vertex-Centric Indices: sort and index edges per vertex by primary key; the primary key can be composite; enables efficient, focused traversals that only retrieve the edges that matter; uses a slice predicate for quick, index-driven retrieval
  • 48. v.query() (figure: vertex v with outgoing tweets edges at time 123, 334, 624, and 1112, and several follows edges)
  • 49. v.query().direction(OUT)
  • 50. v.query().direction(OUT).labels("tweets")
  • 51. v.query().direction(OUT).labels("tweets").has("time",T.gt,1000) (figure: only the tweets edge with time 1112 remains)
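The query narrowing on slides 48 to 51 works because each vertex keeps its edges grouped by direction and label and sorted by the label's primary key (`time`). A minimal sketch of that idea, assuming a `TreeMap` as the per-vertex sorted structure: `VertexCentricIndexSketch` and `timeGreaterThan` are hypothetical names, not Titan API. The `has("time", T.gt, 1000)` condition turns into a range read (`tailMap`) instead of a filter over every incident edge.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of one vertex's vertex-centric index: edges grouped by
// direction and label, sorted by the label's primary key (time).
class VertexCentricIndexSketch {
    // "OUT:tweets" -> (time -> ids of adjacent vertices)
    private final Map<String, TreeMap<Long, List<Long>>> edges = new HashMap<>();

    void addEdge(String direction, String label, long time, long otherId) {
        edges.computeIfAbsent(direction + ":" + label, k -> new TreeMap<>())
             .computeIfAbsent(time, t -> new ArrayList<>())
             .add(otherId);
    }

    // Roughly v.query().direction(dir).labels(label).has("time", T.gt, minTime):
    // a range scan over the sorted edges, returned oldest first.
    List<Long> timeGreaterThan(String direction, String label, long minTime) {
        List<Long> result = new ArrayList<>();
        TreeMap<Long, List<Long>> byTime = edges.get(direction + ":" + label);
        if (byTime == null) {
            return result;
        }
        for (List<Long> ids : byTime.tailMap(minTime, false).values()) {
            result.addAll(ids);
        }
        return result;
    }
}
```

With the figure's edges loaded (tweets at 123, 334, 624, 1112 plus some follows edges), the query for time > 1000 touches only the single matching entry.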
  • 52. Create Accounts (vertices {name: Hercules} and {name: Pluto})
      gremlin> hercules = g.addVertex(['name':'Hercules']);
      gremlin> pluto = g.addVertex(['name':'Pluto']);
  • 53. Add Followship (Hercules -follows {time:2}-> Pluto); continues the session above:
      gremlin> g.addEdge(hercules,pluto,"follows",['time':2]);
  • 54. Publish Tweet (Pluto -tweets {time:4}-> {text: A tweet!, time: 4}); continues:
      gremlin> tweet = g.addVertex(['text':'A tweet!','time':4])
      gremlin> g.addEdge(pluto,tweet,"tweets",['time':4])
  • 55. Update Streams (each follower of Pluto, here Hercules, gets a stream {time:4} edge to the tweet); continues:
      gremlin> pluto.in("follows").each{g.addEdge(it,tweet,"stream",['time':4])}
  • 56. Read Stream; continues:
      gremlin> hercules.outE('stream')[0..9].inV.map
    The stream edges come back sorted by time because time is the primary key of the 'stream' label.
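Slides 52 to 56 implement fan-out on write: publishing a tweet copies a time-keyed `stream` entry to every follower, so reading a stream is a short, already-sorted read. A plain-Java sketch of the same flow follows. It is hypothetical (`StreamSketch`, `follow`, `publish`, and `readStream` are invented names) and keeps streams newest first in a `TreeMap` with a reversed comparator, which is one way to get time-ordered reads; it is not how Titan stores the `stream` edges.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical in-memory sketch of fan-out on write (not Titan API).
class StreamSketch {
    // followee -> followers, i.e. the incoming "follows" edges
    private final Map<String, Set<String>> followers = new HashMap<>();
    // user -> stream entries keyed by time, newest first
    private final Map<String, TreeMap<Long, List<String>>> streams = new HashMap<>();

    void follow(String follower, String followee) {
        followers.computeIfAbsent(followee, k -> new HashSet<>()).add(follower);
    }

    // Like pluto.in("follows").each{ g.addEdge(it, tweet, "stream", ['time': t]) }
    void publish(String author, String text, long time) {
        for (String f : followers.getOrDefault(author, Set.of())) {
            streams.computeIfAbsent(f, k -> new TreeMap<Long, List<String>>(Comparator.reverseOrder()))
                   .computeIfAbsent(time, t -> new ArrayList<>())
                   .add(text);
        }
    }

    // Like hercules.outE('stream')[0..9].inV with limit = 10: the newest entries first.
    List<String> readStream(String user, int limit) {
        List<String> result = new ArrayList<>();
        TreeMap<Long, List<String>> stream = streams.get(user);
        if (stream == null) {
            return result;
        }
        for (List<String> texts : stream.values()) {
            for (String text : texts) {
                if (result.size() == limit) {
                    return result;
                }
                result.add(text);
            }
        }
        return result;
    }
}
```

The trade-off matches the benchmark mix later in the talk: writes do more work (one entry per follower) so that the far more frequent stream reads stay cheap.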
  • 57. Followship Recommendation (figure: vertices Hercules, Pluto, and Neptune connected by follows edges with time:2 and time:9)
      follows = g.V('name','Hercules').out('follows').toList()
      random = new Random()
      follows20 = follows[(0..19).collect{random.nextInt(follows.size)}]
      m = [:]
      follows20.each {
        it.outE('follows')[0..29].inV.except(follows).groupCount(m).iterate()
      }
      m.sort{a,b -> b.value <=> a.value}.take(5)
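The recommendation traversal above samples 20 of the user's followees (with replacement), walks up to 30 outgoing `follows` edges from each, drops accounts the user already follows, counts how often each candidate is reached, and keeps the 5 most frequent. Here is a plain-Java rendering of those steps over an adjacency map, as a hypothetical sketch: `RecommendationSketch.recommend` is an invented name, and the graph is assumed to be a `Map` from user to followee list rather than a Titan graph.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical Java rendering of the followship recommendation (not Titan API).
class RecommendationSketch {
    static List<String> recommend(Map<String, List<String>> follows, String user, Random random) {
        List<String> followees = follows.getOrDefault(user, List.of());
        if (followees.isEmpty()) {
            return List.of();
        }
        Map<String, Integer> counts = new HashMap<>();                 // groupCount(m)
        for (int i = 0; i < 20; i++) {                                 // follows20, sampled with replacement
            String sampled = followees.get(random.nextInt(followees.size()));
            List<String> next = follows.getOrDefault(sampled, List.of());
            for (String candidate : next.subList(0, Math.min(30, next.size()))) {  // [0..29]
                if (!followees.contains(candidate)) {                  // except(follows)
                    counts.merge(candidate, 1, Integer::sum);
                }
            }
        }
        // Sort by count, descending, and keep the top 5.
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
        ranked.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        List<String> top = new ArrayList<>();
        for (Map.Entry<String, Integer> e : ranked.subList(0, Math.min(5, ranked.size()))) {
            top.add(e.getKey());
        }
        return top;
    }
}
```

Like the Gremlin version, this does not exclude the user himself from the candidates; a production version would probably filter that out as well.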
  • 58. V Titan Performance Evaluation on Twitter-like Benchmark
  • 59. Twitter Benchmark: 1.47 billion followship edges and 41.7 million users (Twitter in 2009, crawled by Kwak et al.), loaded into Titan using BatchGraph. 4 transaction types: Create Account (1%); Publish Tweet (15%); Read Stream (76%); Recommendation (8%), with Follow Recommended User (30%). Reference: Kwak, H., Lee, C., Park, H., Moon, S., "What is Twitter, a Social Network or a News Media?," World Wide Web Conference, 2010.
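One way a benchmark worker could follow the transaction mix above is to map a uniform random number in [0, 1) onto cumulative thresholds. The talk does not show its driver code, so this is a hypothetical sketch: `TransactionMixSketch`, `pick`, and `simulate` are invented names, and the 30% follow-up step after a recommendation is left out.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Random;

// Hypothetical benchmark-driver helper: choose a transaction type per the
// slide's mix of 1% / 15% / 76% / 8%.
class TransactionMixSketch {
    enum Tx { CREATE_ACCOUNT, PUBLISH_TWEET, READ_STREAM, RECOMMENDATION }

    // Cumulative thresholds: 0.01, 0.01 + 0.15 = 0.16, 0.16 + 0.76 = 0.92;
    // the remaining 8% is Recommendation.
    static Tx pick(double u) {
        if (u < 0.01) return Tx.CREATE_ACCOUNT;
        if (u < 0.16) return Tx.PUBLISH_TWEET;
        if (u < 0.92) return Tx.READ_STREAM;
        return Tx.RECOMMENDATION;
    }

    // Draw n transactions and count how often each type comes up.
    static Map<Tx, Integer> simulate(int n, Random random) {
        Map<Tx, Integer> counts = new EnumMap<>(Tx.class);
        for (int i = 0; i < n; i++) {
            counts.merge(pick(random.nextDouble()), 1, Integer::sum);
        }
        return counts;
    }
}
```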
  • 60. Benchmark Setup: 6 cc1.4xl Cassandra nodes in one placement group, running Cassandra 1.10; 40 m1.small worker machines repeatedly running transactions, simulating servers handling user requests; EC2 cost: $11/hour
  • 61. Benchmark Results
      Transaction type | Number of tx | Mean tx time | Std. dev. of tx time
      Create account | 379,019 | 115.15 ms | 5.88 ms
      Publish tweet | 7,580,995 | 18.45 ms | 6.34 ms
      Read stream | 37,936,184 | 6.29 ms | 1.62 ms
      Recommendation | 3,793,863 | 67.65 ms | 13.89 ms
      Total | 49,690,061
      Runtime: 2.3 hours, 5,900 tx/sec
  • 62. Peak Load Results
      Transaction type | Number of tx | Mean tx time | Std. dev. of tx time
      Create account | 374,860 | 172.74 ms | 10.52 ms
      Publish tweet | 7,517,667 | 70.07 ms | 19.43 ms
      Read stream | 37,618,648 | 24.40 ms | 3.18 ms
      Recommendation | 3,758,266 | 229.83 ms | 29.08 ms
      Total | 49,269,441
      Runtime: 1.3 hours, 10,200 tx/sec
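The throughput figures on the two results slides follow from throughput = total transactions / runtime in seconds. Because the runtimes are rounded to 0.1 hour, the small program below works backwards: it sums the per-type counts, computes the runtime implied by each stated tx/sec, and lets you check that it rounds to the stated hours. `ThroughputCheck` and its helpers are illustrative names only.

```java
// Worked check of the results slides: total = sum of per-type counts,
// implied runtime = total / (tx per second) / 3600 seconds per hour.
class ThroughputCheck {
    static long total(long... counts) {
        long sum = 0;
        for (long c : counts) {
            sum += c;
        }
        return sum;
    }

    // Runtime in hours implied by a total transaction count and a rate in tx/sec.
    static double impliedHours(long totalTx, double txPerSec) {
        return totalTx / txPerSec / 3600.0;
    }

    static double roundToTenth(double x) {
        return Math.round(x * 10) / 10.0;
    }

    public static void main(String[] args) {
        long normal = total(379_019, 7_580_995, 37_936_184, 3_793_863);
        long peak = total(374_860, 7_517_667, 37_618_648, 3_758_266);
        System.out.println("normal: " + normal + " tx, " + impliedHours(normal, 5_900) + " h");
        System.out.println("peak:   " + peak + " tx, " + impliedHours(peak, 10_200) + " h");
    }
}
```

The per-type counts add up exactly to the reported totals, and the implied runtimes come out at about 2.34 h and 1.34 h, which round to the 2.3 and 1.3 hours on the slides.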
  • 63. Benchmark Conclusion: Titan can handle tens of thousands of concurrent users with short response times, even for complex traversals, on a simulated social networking application based on real-world network data with billions of edges and millions of users, in a standard EC2 deployment. For more information on the benchmark: http://thinkaurelius.com/2012/08/06/titan-provides-real-time-big-graph-data/
  • 64. Future Titan: Titan+Cassandra embedding, i.e. sending Gremlin queries into the cluster; graph partitioning together with the ByteOrderedPartitioner, since data locality = better performance; let us know what you need!
  • 65. Titan goes OLAP (diagram): Titan stores a massive-scale property graph allowing real-time traversals and updates; Map/Reduce: batch processing of large graphs with Hadoop; Load & Compress: runs global graph algorithms on large, compressed, in-memory graphs; analysis results flow back into Titan
  • 66. III Graph = Scalable + Practical