SlideShare a Scribd company logo
1 of 27
Download to read offline
Building a Cassandra
                         Based Application
                                  From 0 to Deploy

                                   Patrick McFadin
                            Solution Architect at DataStax




Wednesday, November 7, 12
Me

                   • Solution Architect at DataStax, THE
                            Cassandra company
                   • Cassandra user since .7
                   • Follow me here: @PatrickMcFadin


Wednesday, November 7, 12
Goals

                   • Take a new application concept
                   • What is the data model??
                   • Express that in CQL 3
                   • Some sample code

Wednesday, November 7, 12
The Plan

                   • Conceptualize a new application
                   • Identify the entity tables
                   • Identify query tables
                   • Code. Rinse. Repeat.
                   • Deploy

Wednesday, November 7, 12
www.killrvideos.com

        Start with a                                           Video Title                       Username


          concept                               Recommended
                                                                             Description




                                                                    Meow
                                                                                                               Ads
                                                                                                            by Google
                                                              Text
       • Video
               sharing                                         Rating:                  Tags: Foo Bar
               website
                                                                                                            Upload New!
                                                                             Comments




                *Cat drawing by goodrob13 on Flickr


Wednesday, November 7, 12
Break down the
                               features
                   • Post a video
                   • View a video
                   • Add a comment
                   • Rate a video
                   • Tag a video

Wednesday, November 7, 12
Create Entity Tables


                                 Basic storage unit




Wednesday, November 7, 12
Users
                            firstname   lastname   email      password     created_date
            Username




    • Similar to a RDBMS table. Fairly fixed columns
    • Username is unique
    • Use secondary indexes on firstname and lastname for lookup
    • Adding columns with Cassandra is super easy
                                                           CREATE TABLE users (
                                                              username varchar,
                                                              firstname varchar,
                                                              lastname varchar,
                                                              email varchar,
                                                              password varchar,
                                                              created_date timestamp,
                                                              PRIMARY KEY (username)
                                                           );

Wednesday, November 7, 12
Users: The insert code
            static void setUser(User user, Keyspace keyspace) {

               // Create a mutator that allows you to talk to casssandra
               Mutator<String> mutator = HFactory.createMutator(keyspace,
            stringSerializer);

                  try {

                       // Use the mutator to insert data into our table
                       mutator.addInsertion(user.getUsername(), "users",
                          HFactory.createStringColumn("firstname", user.getFirstname()));
                       mutator.addInsertion(user.getUsername(), "users”,
                          HFactory.createStringColumn("lastname", user.getLastname()));
                       mutator.addInsertion(user.getUsername(), "users",
                          HFactory.createStringColumn("password", user.getPassword()));

                       // Once the mutator is ready, execute on cassandra
                       mutator.execute();

                  } catch (HectorException he) {
                     he.printStackTrace();
                  }
            }



Wednesday, November 7, 12
Videos         (one-to-many)

           VideoId          videoname     username   description        tags        upload_date
            <UUID>




      • Use a UUID as a row key for uniqueness
      • Allows for same video names
      • Tags should be stored in some sort of delimited format
      • Index on username may not be the best plan
                                                             CREATE TABLE videos (
                                                                videoid uuid,
                                                                videoname varchar,
                                                                username varchar,
                                                                description varchar,
                                                                tags varchar,
                                                                upload_date timestamp,
                                                                PRIMARY KEY (videoid,videoname)
                                                             );


Wednesday, November 7, 12
Videos: The get code
     static Video getVideoByUUID(UUID videoId, Keyspace keyspace){

         Video video = new Video();

         //Create a slice query. We'll be getting specific column names
         SliceQuery<UUID, String, String> sliceQuery =
            HFactory.createSliceQuery(keyspace, uuidSerializer, stringSerializer, stringSerializer);

         sliceQuery.setColumnFamily("videos");
         sliceQuery.setKey(videoId);

         sliceQuery.setColumnNames("videoname","username","description","tags");

         // Execute the query and get the list of columns
         ColumnSlice<String,String> result = sliceQuery.execute().get();

         // Get each column by name and add them to our video object
         video.setVideoName(result.getColumnByName("videoname").getValue());
         video.setUsername(result.getColumnByName("username").getValue());
         video.setDescription(result.getColumnByName("description").getValue());
         video.setTags(result.getColumnByName("tags").getValue().split(","));

         return video;
     }




Wednesday, November 7, 12
Comments                          (many-to-many)


             VideoId        username   comment_ts         comment
              <UUID>




     • Videos have many comments
     • Comments have many users
     • Order is as inserted
     • Use getSlice() to pull some or all of the comments
                                                    CREATE TABLE comments (
                                                       videoid uuid,
                                                       username varchar,
                                                       comment_ts timestamp,
                                                       comment varchar,
                                                       PRIMARY KEY (videoid,username,comment_ts)
                                                    );




Wednesday, November 7, 12
Comments... pt 2
                            VideoId   username:comment_ts   ..          username:comment_ts
                             <UUID>        comment          ..               comment


                                                                   Wide row
                                                                 Time ordered

            •       This is what’s really going on

            •       VideoID is the key

            •       Composite of username and comment_ts
                    are the column name

            •       1 column per comment


Wednesday, November 7, 12
Ratings
                                        rating_count    rating_total
                            VideoId
                             <UUID>      <counter>        <counter>




         • Use counter for single call update
         • rating_count is how many ratings were given
         • rating_total is the sum of rating
         • Ex: rating_count = 5, rating_total = 23, avg rating = 23/5 = 4.6
                                                       CREATE TABLE video_rating (
                                                          videoid uuid,
                                                          rating_counter counter,
                                                          rating_total counter,
                                                          PRIMARY KEY (videoid)
                                                       );




Wednesday, November 7, 12
Video Events
                                               start_<timestamp>   stop_<timestamp>    start_<timestamp>
                            VideoId:Username
                                                                   video_<timestamp>


                                                Latest .. Oldest


      • Track viewing events
      • Combine Video ID and Username for a unique row
      • Stop time can be used to pick up where they left off
      • Great for usage analytics later
      • Reverse comparator!


                        CREATE TABLE video_event (
                          videoid_username varchar,
                          event varchar,
                          event_timestamp timestamp,
                          video_timestamp bigint,
                          PRIMARY KEY (videoid_username, event_timestamp, event)
                        ) WITH CLUSTERING ORDER BY (event_timestamp DESC, event ASC);



Wednesday, November 7, 12
Create Query Tables
                            Indexes to support fast lookups




Wednesday, November 7, 12
Index table principles
                        Lookup5RowKey5


                                         RowKey1



                                                    • Lookup by rowkey
                                         RowKey2
                                         RowKey3
                                          RowKey4
                                         RowKey5

                                         RowKey6    • Indexed
                                                    • Cached (most times)
                                         RowKey7
                                         RowKey8
                                         RowKey9
                                         RowKey10
                                         RowKey11
                                         RowKey12




Wednesday, November 7, 12
Index table principles
                                         Col3      Col4      Col5       Col6




           GetSlice6Col37Col6



                                                              Sequential Read

                     RowKey5      Col1      Col2      Col3      Col4        Col5   Col6   Col7   Col8




                   • Get row by the key
                   • Slice. Get data in one pass
                   • Cached (sometimes)
Wednesday, November 7, 12
Video by Username
                                       VideoId:<timestamp>      ..         VideoId:<timestamp>
                            Username



                                                                         Wide row

  • Username is unique
  • One column for each new video uploaded
  • Column slice for time span. From x to y
  • VideoId is added the same time a Video record is added
                                                CREATE TABLE username_video_index (
                                                   username varchar,
                                                   videoid uuid,
                                                   upload_date timestamp,
                                                   video_name varchar,
                                                   PRIMARY KEY (username, videoid, upload_date)
                                                );




Wednesday, November 7, 12
Video by Tag
                                    VideoId     ..              VideoId
                            tag
                                    timestamp                  timestamp




         • Tag is unique regardless of video
         • Great for “List videos with X tag”
         • Tags have to be updated in Video and Tag at the same time
         • Index integrity is maintained in app logic
                                                     CREATE TABLE tag_index (
                                                        tag varchar,
                                                        videoid varchar,
                                                        timestamp timestamp,
                                                        PRIMARY KEY (tag, videoid)
                                                     );




Wednesday, November 7, 12
Deployment
             • Replication factor?
             • Multi-datacenter?
             • Cost?



Wednesday, November 7, 12
Deployment

                   • Today != tomorrow
                   • Scale when needed
                   • Have expansion plan ready


Wednesday, November 7, 12
DataStax Enterprise
          • Analytics - Hadoop
          • Search - Solr




Wednesday, November 7, 12
Hadoop
       • Embedded with Cassandra
       • No single point of failure
       • Use native c* data
       • Hive, Pig, Mahout



Wednesday, November 7, 12
Solr
         • Embeded with Cassandra
         • Fast reverse-index
         • Shards Solr by key range




Wednesday, November 7, 12
OpsCenter




Wednesday, November 7, 12
Thank you!

                            Connect with me at @PatrickMcFadin
                                       Or linkedIn




Wednesday, November 7, 12

More Related Content

What's hot

Java FX 2.0 - A Developer's Guide
Java FX 2.0 - A Developer's GuideJava FX 2.0 - A Developer's Guide
Java FX 2.0 - A Developer's GuideStephen Chin
 
What istheservicestack
What istheservicestackWhat istheservicestack
What istheservicestackDemis Bellot
 
What is the ServiceStack?
What is the ServiceStack?What is the ServiceStack?
What is the ServiceStack?Demis Bellot
 
Dom API In Java Script
Dom API In Java ScriptDom API In Java Script
Dom API In Java ScriptRajat Pandit
 
Node.js in action
Node.js in actionNode.js in action
Node.js in actionSimon Su
 
Groovy, Transforming Language
Groovy, Transforming LanguageGroovy, Transforming Language
Groovy, Transforming LanguageUehara Junji
 
Security and performance designs for client-server communications
Security and performance designs for client-server communicationsSecurity and performance designs for client-server communications
Security and performance designs for client-server communicationsWO Community
 
Introduction to SQL Server Cloud Storage Azure
Introduction to SQL Server Cloud Storage AzureIntroduction to SQL Server Cloud Storage Azure
Introduction to SQL Server Cloud Storage AzureEduardo Castro
 
Standing on the shoulders of giants with JRuby
Standing on the shoulders of giants with JRubyStanding on the shoulders of giants with JRuby
Standing on the shoulders of giants with JRubyTheo Hultberg
 
CapitalCamp Features
CapitalCamp FeaturesCapitalCamp Features
CapitalCamp FeaturesPhase2
 
A Groovy Kind of Java (San Francisco Java User Group)
A Groovy Kind of Java (San Francisco Java User Group)A Groovy Kind of Java (San Francisco Java User Group)
A Groovy Kind of Java (San Francisco Java User Group)Nati Shalom
 

What's hot (15)

Java FX 2.0 - A Developer's Guide
Java FX 2.0 - A Developer's GuideJava FX 2.0 - A Developer's Guide
Java FX 2.0 - A Developer's Guide
 
hibernate
hibernatehibernate
hibernate
 
What istheservicestack
What istheservicestackWhat istheservicestack
What istheservicestack
 
What is the ServiceStack?
What is the ServiceStack?What is the ServiceStack?
What is the ServiceStack?
 
Dom API In Java Script
Dom API In Java ScriptDom API In Java Script
Dom API In Java Script
 
Node.js in action
Node.js in actionNode.js in action
Node.js in action
 
Groovy, Transforming Language
Groovy, Transforming LanguageGroovy, Transforming Language
Groovy, Transforming Language
 
Security and performance designs for client-server communications
Security and performance designs for client-server communicationsSecurity and performance designs for client-server communications
Security and performance designs for client-server communications
 
Introduction to SQL Server Cloud Storage Azure
Introduction to SQL Server Cloud Storage AzureIntroduction to SQL Server Cloud Storage Azure
Introduction to SQL Server Cloud Storage Azure
 
Brubeck
BrubeckBrubeck
Brubeck
 
Nio
NioNio
Nio
 
Phactory
PhactoryPhactory
Phactory
 
Standing on the shoulders of giants with JRuby
Standing on the shoulders of giants with JRubyStanding on the shoulders of giants with JRuby
Standing on the shoulders of giants with JRuby
 
CapitalCamp Features
CapitalCamp FeaturesCapitalCamp Features
CapitalCamp Features
 
A Groovy Kind of Java (San Francisco Java User Group)
A Groovy Kind of Java (San Francisco Java User Group)A Groovy Kind of Java (San Francisco Java User Group)
A Groovy Kind of Java (San Francisco Java User Group)
 

Viewers also liked

Building Antifragile Applications with Apache Cassandra
Building Antifragile Applications with Apache CassandraBuilding Antifragile Applications with Apache Cassandra
Building Antifragile Applications with Apache CassandraPatrick McFadin
 
Toronto jaspersoft meetup
Toronto jaspersoft meetupToronto jaspersoft meetup
Toronto jaspersoft meetupPatrick McFadin
 
Cassandra 2.0 better, faster, stronger
Cassandra 2.0   better, faster, strongerCassandra 2.0   better, faster, stronger
Cassandra 2.0 better, faster, strongerPatrick McFadin
 
Owning time series with team apache Strata San Jose 2015
Owning time series with team apache   Strata San Jose 2015Owning time series with team apache   Strata San Jose 2015
Owning time series with team apache Strata San Jose 2015Patrick McFadin
 
The world's next top data model
The world's next top data modelThe world's next top data model
The world's next top data modelPatrick McFadin
 
Cassandra 3.0 advanced preview
Cassandra 3.0 advanced previewCassandra 3.0 advanced preview
Cassandra 3.0 advanced previewPatrick McFadin
 
Cassandra Virtual Node talk
Cassandra Virtual Node talkCassandra Virtual Node talk
Cassandra Virtual Node talkPatrick McFadin
 
Understanding How CQL3 Maps to Cassandra's Internal Data Structure
Understanding How CQL3 Maps to Cassandra's Internal Data StructureUnderstanding How CQL3 Maps to Cassandra's Internal Data Structure
Understanding How CQL3 Maps to Cassandra's Internal Data StructureDataStax
 
The data model is dead, long live the data model
The data model is dead, long live the data modelThe data model is dead, long live the data model
The data model is dead, long live the data modelPatrick McFadin
 
Cassandra Data Model
Cassandra Data ModelCassandra Data Model
Cassandra Data Modelebenhewitt
 
Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3Eric Evans
 
Advanced data modeling with apache cassandra
Advanced data modeling with apache cassandraAdvanced data modeling with apache cassandra
Advanced data modeling with apache cassandraPatrick McFadin
 
Cassandra 2.0 and timeseries
Cassandra 2.0 and timeseriesCassandra 2.0 and timeseries
Cassandra 2.0 and timeseriesPatrick McFadin
 
Apache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series dataApache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series dataPatrick McFadin
 

Viewers also liked (16)

Building Antifragile Applications with Apache Cassandra
Building Antifragile Applications with Apache CassandraBuilding Antifragile Applications with Apache Cassandra
Building Antifragile Applications with Apache Cassandra
 
Toronto jaspersoft meetup
Toronto jaspersoft meetupToronto jaspersoft meetup
Toronto jaspersoft meetup
 
Cassandra 2.0 better, faster, stronger
Cassandra 2.0   better, faster, strongerCassandra 2.0   better, faster, stronger
Cassandra 2.0 better, faster, stronger
 
Cassandra at scale
Cassandra at scaleCassandra at scale
Cassandra at scale
 
Owning time series with team apache Strata San Jose 2015
Owning time series with team apache   Strata San Jose 2015Owning time series with team apache   Strata San Jose 2015
Owning time series with team apache Strata San Jose 2015
 
The world's next top data model
The world's next top data modelThe world's next top data model
The world's next top data model
 
Cassandra 3.0 advanced preview
Cassandra 3.0 advanced previewCassandra 3.0 advanced preview
Cassandra 3.0 advanced preview
 
Become a super modeler
Become a super modelerBecome a super modeler
Become a super modeler
 
Cassandra Virtual Node talk
Cassandra Virtual Node talkCassandra Virtual Node talk
Cassandra Virtual Node talk
 
Understanding How CQL3 Maps to Cassandra's Internal Data Structure
Understanding How CQL3 Maps to Cassandra's Internal Data StructureUnderstanding How CQL3 Maps to Cassandra's Internal Data Structure
Understanding How CQL3 Maps to Cassandra's Internal Data Structure
 
The data model is dead, long live the data model
The data model is dead, long live the data modelThe data model is dead, long live the data model
The data model is dead, long live the data model
 
Cassandra Data Model
Cassandra Data ModelCassandra Data Model
Cassandra Data Model
 
Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3
 
Advanced data modeling with apache cassandra
Advanced data modeling with apache cassandraAdvanced data modeling with apache cassandra
Advanced data modeling with apache cassandra
 
Cassandra 2.0 and timeseries
Cassandra 2.0 and timeseriesCassandra 2.0 and timeseries
Cassandra 2.0 and timeseries
 
Apache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series dataApache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series data
 

Similar to Cassandra data modeling talk

Build and Deploy Sites Using Features
Build and Deploy Sites Using Features Build and Deploy Sites Using Features
Build and Deploy Sites Using Features Phase2
 
Building Killr Applications with DataStax Enterprise
Building Killr Applications with  DataStax EnterpriseBuilding Killr Applications with  DataStax Enterprise
Building Killr Applications with DataStax EnterpriseDataStax
 
Building Killr Applications with DSE
Building Killr Applications with DSEBuilding Killr Applications with DSE
Building Killr Applications with DSEDataStax
 
Cassandra Intro -- TheEdge2012
Cassandra Intro -- TheEdge2012Cassandra Intro -- TheEdge2012
Cassandra Intro -- TheEdge2012robosonia Mar
 
Building Atlassian Plugins with Groovy - Atlassian Summit 2010 - Lightning Talks
Building Atlassian Plugins with Groovy - Atlassian Summit 2010 - Lightning TalksBuilding Atlassian Plugins with Groovy - Atlassian Summit 2010 - Lightning Talks
Building Atlassian Plugins with Groovy - Atlassian Summit 2010 - Lightning TalksAtlassian
 
C fowler azure-dojo
C fowler azure-dojoC fowler azure-dojo
C fowler azure-dojosdeconf
 
Modeling Tricks My Relational Database Never Taught Me
Modeling Tricks My Relational Database Never Taught MeModeling Tricks My Relational Database Never Taught Me
Modeling Tricks My Relational Database Never Taught MeDavid Boike
 
Puppet and Apache CloudStack
Puppet and Apache CloudStackPuppet and Apache CloudStack
Puppet and Apache CloudStackPuppet
 
Infrastructure as code with Puppet and Apache CloudStack
Infrastructure as code with Puppet and Apache CloudStackInfrastructure as code with Puppet and Apache CloudStack
Infrastructure as code with Puppet and Apache CloudStackke4qqq
 
Puppet and CloudStack
Puppet and CloudStackPuppet and CloudStack
Puppet and CloudStackke4qqq
 
Groovy - Grails as a modern scripting language for Web applications
Groovy - Grails as a modern scripting language for Web applicationsGroovy - Grails as a modern scripting language for Web applications
Groovy - Grails as a modern scripting language for Web applicationsIndicThreads
 
Puppetpreso
PuppetpresoPuppetpreso
Puppetpresoke4qqq
 
Atlassian Groovy Plugins
Atlassian Groovy PluginsAtlassian Groovy Plugins
Atlassian Groovy PluginsPaul King
 
Creating Reusable Puppet Profiles
Creating Reusable Puppet ProfilesCreating Reusable Puppet Profiles
Creating Reusable Puppet ProfilesBram Vogelaar
 
AWS user group Serverless in September - Chris Johnson Bidler "Go Serverless ...
AWS user group Serverless in September - Chris Johnson Bidler "Go Serverless ...AWS user group Serverless in September - Chris Johnson Bidler "Go Serverless ...
AWS user group Serverless in September - Chris Johnson Bidler "Go Serverless ...AWS Chicago
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraDataStax Academy
 
Gianluca Varisco - DevOoops (Increase awareness around DevOps infra security)
Gianluca Varisco - DevOoops (Increase awareness around DevOps infra security)Gianluca Varisco - DevOoops (Increase awareness around DevOps infra security)
Gianluca Varisco - DevOoops (Increase awareness around DevOps infra security)Codemotion
 
Introduction to YouDebug - Scriptable Java Debugger
Introduction to YouDebug - Scriptable Java DebuggerIntroduction to YouDebug - Scriptable Java Debugger
Introduction to YouDebug - Scriptable Java DebuggerWolfgang Schell
 
Road to Opscon (Pisa '15) - DevOoops
Road to Opscon (Pisa '15) - DevOoopsRoad to Opscon (Pisa '15) - DevOoops
Road to Opscon (Pisa '15) - DevOoopsGianluca Varisco
 

Similar to Cassandra data modeling talk (20)

Build and Deploy Sites Using Features
Build and Deploy Sites Using Features Build and Deploy Sites Using Features
Build and Deploy Sites Using Features
 
Building Killr Applications with DataStax Enterprise
Building Killr Applications with  DataStax EnterpriseBuilding Killr Applications with  DataStax Enterprise
Building Killr Applications with DataStax Enterprise
 
Building Killr Applications with DSE
Building Killr Applications with DSEBuilding Killr Applications with DSE
Building Killr Applications with DSE
 
Cassandra Intro -- TheEdge2012
Cassandra Intro -- TheEdge2012Cassandra Intro -- TheEdge2012
Cassandra Intro -- TheEdge2012
 
Building Atlassian Plugins with Groovy - Atlassian Summit 2010 - Lightning Talks
Building Atlassian Plugins with Groovy - Atlassian Summit 2010 - Lightning TalksBuilding Atlassian Plugins with Groovy - Atlassian Summit 2010 - Lightning Talks
Building Atlassian Plugins with Groovy - Atlassian Summit 2010 - Lightning Talks
 
C fowler azure-dojo
C fowler azure-dojoC fowler azure-dojo
C fowler azure-dojo
 
Modeling Tricks My Relational Database Never Taught Me
Modeling Tricks My Relational Database Never Taught MeModeling Tricks My Relational Database Never Taught Me
Modeling Tricks My Relational Database Never Taught Me
 
Puppet and Apache CloudStack
Puppet and Apache CloudStackPuppet and Apache CloudStack
Puppet and Apache CloudStack
 
Infrastructure as code with Puppet and Apache CloudStack
Infrastructure as code with Puppet and Apache CloudStackInfrastructure as code with Puppet and Apache CloudStack
Infrastructure as code with Puppet and Apache CloudStack
 
Puppet and CloudStack
Puppet and CloudStackPuppet and CloudStack
Puppet and CloudStack
 
Groovy - Grails as a modern scripting language for Web applications
Groovy - Grails as a modern scripting language for Web applicationsGroovy - Grails as a modern scripting language for Web applications
Groovy - Grails as a modern scripting language for Web applications
 
Puppetpreso
PuppetpresoPuppetpreso
Puppetpreso
 
Atlassian Groovy Plugins
Atlassian Groovy PluginsAtlassian Groovy Plugins
Atlassian Groovy Plugins
 
Creating Reusable Puppet Profiles
Creating Reusable Puppet ProfilesCreating Reusable Puppet Profiles
Creating Reusable Puppet Profiles
 
AWS user group Serverless in September - Chris Johnson Bidler "Go Serverless ...
AWS user group Serverless in September - Chris Johnson Bidler "Go Serverless ...AWS user group Serverless in September - Chris Johnson Bidler "Go Serverless ...
AWS user group Serverless in September - Chris Johnson Bidler "Go Serverless ...
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
 
Gianluca Varisco - DevOoops (Increase awareness around DevOps infra security)
Gianluca Varisco - DevOoops (Increase awareness around DevOps infra security)Gianluca Varisco - DevOoops (Increase awareness around DevOps infra security)
Gianluca Varisco - DevOoops (Increase awareness around DevOps infra security)
 
Introduction to YouDebug - Scriptable Java Debugger
Introduction to YouDebug - Scriptable Java DebuggerIntroduction to YouDebug - Scriptable Java Debugger
Introduction to YouDebug - Scriptable Java Debugger
 
Road to Opscon (Pisa '15) - DevOoops
Road to Opscon (Pisa '15) - DevOoopsRoad to Opscon (Pisa '15) - DevOoops
Road to Opscon (Pisa '15) - DevOoops
 
Dbdeployer
DbdeployerDbdeployer
Dbdeployer
 

More from Patrick McFadin

Successful Architectures for Fast Data
Successful Architectures for Fast DataSuccessful Architectures for Fast Data
Successful Architectures for Fast DataPatrick McFadin
 
Open source or proprietary, choose wisely!
Open source or proprietary,  choose wisely!Open source or proprietary,  choose wisely!
Open source or proprietary, choose wisely!Patrick McFadin
 
An Introduction to time series with Team Apache
An Introduction to time series with Team ApacheAn Introduction to time series with Team Apache
An Introduction to time series with Team ApachePatrick McFadin
 
Laying down the smack on your data pipelines
Laying down the smack on your data pipelinesLaying down the smack on your data pipelines
Laying down the smack on your data pipelinesPatrick McFadin
 
Help! I want to contribute to an Open Source project but my boss says no.
Help! I want to contribute to an Open Source project but my boss says no.Help! I want to contribute to an Open Source project but my boss says no.
Help! I want to contribute to an Open Source project but my boss says no.Patrick McFadin
 
Analyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraAnalyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraPatrick McFadin
 
Storing time series data with Apache Cassandra
Storing time series data with Apache CassandraStoring time series data with Apache Cassandra
Storing time series data with Apache CassandraPatrick McFadin
 
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax EnterpriseA Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax EnterprisePatrick McFadin
 
Introduction to data modeling with apache cassandra
Introduction to data modeling with apache cassandraIntroduction to data modeling with apache cassandra
Introduction to data modeling with apache cassandraPatrick McFadin
 
Apache cassandra and spark. you got the the lighter, let's start the fire
Apache cassandra and spark. you got the the lighter, let's start the fireApache cassandra and spark. you got the the lighter, let's start the fire
Apache cassandra and spark. you got the the lighter, let's start the firePatrick McFadin
 
Nike Tech Talk: Double Down on Apache Cassandra and Spark
Nike Tech Talk:  Double Down on Apache Cassandra and SparkNike Tech Talk:  Double Down on Apache Cassandra and Spark
Nike Tech Talk: Double Down on Apache Cassandra and SparkPatrick McFadin
 
Real data models of silicon valley
Real data models of silicon valleyReal data models of silicon valley
Real data models of silicon valleyPatrick McFadin
 
Introduction to cassandra 2014
Introduction to cassandra 2014Introduction to cassandra 2014
Introduction to cassandra 2014Patrick McFadin
 
Making money with open source and not losing your soul: A practical guide
Making money with open source and not losing your soul: A practical guideMaking money with open source and not losing your soul: A practical guide
Making money with open source and not losing your soul: A practical guidePatrick McFadin
 
Time series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long versionTime series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long versionPatrick McFadin
 
Time series with apache cassandra strata
Time series with apache cassandra   strataTime series with apache cassandra   strata
Time series with apache cassandra strataPatrick McFadin
 
Cassandra EU - Data model on fire
Cassandra EU - Data model on fireCassandra EU - Data model on fire
Cassandra EU - Data model on firePatrick McFadin
 

More from Patrick McFadin (17)

Successful Architectures for Fast Data
Successful Architectures for Fast DataSuccessful Architectures for Fast Data
Successful Architectures for Fast Data
 
Open source or proprietary, choose wisely!
Open source or proprietary,  choose wisely!Open source or proprietary,  choose wisely!
Open source or proprietary, choose wisely!
 
An Introduction to time series with Team Apache
An Introduction to time series with Team ApacheAn Introduction to time series with Team Apache
An Introduction to time series with Team Apache
 
Laying down the smack on your data pipelines
Laying down the smack on your data pipelinesLaying down the smack on your data pipelines
Laying down the smack on your data pipelines
 
Help! I want to contribute to an Open Source project but my boss says no.
Help! I want to contribute to an Open Source project but my boss says no.Help! I want to contribute to an Open Source project but my boss says no.
Help! I want to contribute to an Open Source project but my boss says no.
 
Analyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraAnalyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and Cassandra
 
Storing time series data with Apache Cassandra
Storing time series data with Apache CassandraStoring time series data with Apache Cassandra
Storing time series data with Apache Cassandra
 
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax EnterpriseA Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
 
Introduction to data modeling with apache cassandra
Introduction to data modeling with apache cassandraIntroduction to data modeling with apache cassandra
Introduction to data modeling with apache cassandra
 
Apache cassandra and spark. you got the the lighter, let's start the fire
Apache cassandra and spark. you got the the lighter, let's start the fireApache cassandra and spark. you got the the lighter, let's start the fire
Apache cassandra and spark. you got the the lighter, let's start the fire
 
Nike Tech Talk: Double Down on Apache Cassandra and Spark
Nike Tech Talk:  Double Down on Apache Cassandra and SparkNike Tech Talk:  Double Down on Apache Cassandra and Spark
Nike Tech Talk: Double Down on Apache Cassandra and Spark
 
Real data models of silicon valley
Real data models of silicon valleyReal data models of silicon valley
Real data models of silicon valley
 
Introduction to cassandra 2014
Introduction to cassandra 2014Introduction to cassandra 2014
Introduction to cassandra 2014
 
Making money with open source and not losing your soul: A practical guide
Making money with open source and not losing your soul: A practical guideMaking money with open source and not losing your soul: A practical guide
Making money with open source and not losing your soul: A practical guide
 
Time series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long versionTime series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long version
 
Time series with apache cassandra strata
Time series with apache cassandra   strataTime series with apache cassandra   strata
Time series with apache cassandra strata
 
Cassandra EU - Data model on fire
Cassandra EU - Data model on fireCassandra EU - Data model on fire
Cassandra EU - Data model on fire
 

Recently uploaded

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 

Recently uploaded (20)

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 

Cassandra data modeling talk

  • 1. Building a Cassandra Based Application From 0 to Deploy Patrick McFadin Solution Architect at DataStax Wednesday, November 7, 12
  • 2. Me • Solution Architect at DataStax, THE Cassandra company • Cassandra user since .7 • Follow me here: @PatrickMcFadin Wednesday, November 7, 12
  • 3. Goals • Take a new application concept • What is the data model?? • Express that in CQL 3 • Some sample code Wednesday, November 7, 12
  • 4. The Plan • Conceptualize a new application • Identify the entity tables • Identify query tables • Code. Rinse. Repeat. • Deploy Wednesday, November 7, 12
  • 5. www.killrvideos.com Start with a Video Title Username concept Recommended Description Meow Ads by Google Text • Video sharing Rating: Tags: Foo Bar website Upload New! Comments *Cat drawing by goodrob13 on Flickr Wednesday, November 7, 12
  • 6. Break down the features • Post a video • View a video • Add a comment • Rate a video • Tag a video Wednesday, November 7, 12
  • 7. Create Entity Tables Basic storage unit Wednesday, November 7, 12
  • 8. Users firstname lastname email password created_date Username • Similar to a RDBMS table. Fairly fixed columns • Username is unique • Use secondary indexes on firstname and lastname for lookup • Adding columns with Cassandra is super easy CREATE TABLE users ( username varchar, firstname varchar, lastname varchar, email varchar, password varchar, created_date timestamp, PRIMARY KEY (username) ); Wednesday, November 7, 12
  • 9. Users: The insert code static void setUser(User user, Keyspace keyspace) { // Create a mutator that allows you to talk to casssandra Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer); try { // Use the mutator to insert data into our table mutator.addInsertion(user.getUsername(), "users", HFactory.createStringColumn("firstname", user.getFirstname())); mutator.addInsertion(user.getUsername(), "users”, HFactory.createStringColumn("lastname", user.getLastname())); mutator.addInsertion(user.getUsername(), "users", HFactory.createStringColumn("password", user.getPassword())); // Once the mutator is ready, execute on cassandra mutator.execute(); } catch (HectorException he) { he.printStackTrace(); } } Wednesday, November 7, 12
  • 10. Videos (one-to-many) VideoId videoname username description tags upload_date <UUID> • Use a UUID as a row key for uniqueness • Allows for same video names • Tags should be stored in some sort of delimited format • Index on username may not be the best plan CREATE TABLE videos ( videoid uuid, videoname varchar, username varchar, description varchar, tags varchar, upload_date timestamp, PRIMARY KEY (videoid,videoname) ); Wednesday, November 7, 12
  • 11. Videos: The get code static Video getVideoByUUID(UUID videoId, Keyspace keyspace){ Video video = new Video(); //Create a slice query. We'll be getting specific column names SliceQuery<UUID, String, String> sliceQuery = HFactory.createSliceQuery(keyspace, uuidSerializer, stringSerializer, stringSerializer); sliceQuery.setColumnFamily("videos"); sliceQuery.setKey(videoId); sliceQuery.setColumnNames("videoname","username","description","tags"); // Execute the query and get the list of columns ColumnSlice<String,String> result = sliceQuery.execute().get(); // Get each column by name and add them to our video object video.setVideoName(result.getColumnByName("videoname").getValue()); video.setUsername(result.getColumnByName("username").getValue()); video.setDescription(result.getColumnByName("description").getValue()); video.setTags(result.getColumnByName("tags").getValue().split(",")); return video; } Wednesday, November 7, 12
  • 12. Comments (many-to-many) VideoId username comment_ts comment <UUID> • Videos have many comments • Comments have many users • Order is as inserted • Use getSlice() to pull some or all of the comments CREATE TABLE comments ( videoid uuid, username varchar, comment_ts timestamp, comment varchar, PRIMARY KEY (videoid,username,comment_ts) ); Wednesday, November 7, 12
  • 13. Comments... pt 2 VideoId username:comment_ts .. username:comment_ts <UUID> comment .. comment Wide row Time ordered • This is what’s really going on • VideoID is the key • Composite of username and comment_ts are the column name • 1 column per comment Wednesday, November 7, 12
  • 14. Ratings rating_count rating_total VideoId <UUID> <counter> <counter> • Use counter for single call update • rating_count is how many ratings were given • rating_total is the sum of rating • Ex: rating_count = 5, rating_total = 23, avg rating = 23/5 = 4.6 CREATE TABLE video_rating ( videoid uuid, rating_counter counter, rating_total counter, PRIMARY KEY (videoid) ); Wednesday, November 7, 12
  • 15. Video Events start_<timestamp> stop_<timestamp> start_<timestamp> VideoId:Username video_<timestamp> Latest .. Oldest • Track viewing events • Combine Video ID and Username for a unique row • Stop time can be used to pick up where they left off • Great for usage analytics later • Reverse comparator! CREATE TABLE video_event ( videoid_username varchar, event varchar, event_timestamp timestamp, video_timestamp bigint, PRIMARY KEY (videoid_username, event_timestamp, event) ) WITH CLUSTERING ORDER BY (event_timestamp DESC, event ASC); Wednesday, November 7, 12
  • 16. Create Query Tables Indexes to support fast lookups Wednesday, November 7, 12
  • 17. Index table principles Lookup5RowKey5 RowKey1 • Lookup by rowkey RowKey2 RowKey3 RowKey4 RowKey5 RowKey6 • Indexed • Cached (most times) RowKey7 RowKey8 RowKey9 RowKey10 RowKey11 RowKey12 Wednesday, November 7, 12
  • 18. Index table principles Col3 Col4 Col5 Col6 GetSlice6Col37Col6 Sequential Read RowKey5 Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 • Get row by the key • Slice. Get data in one pass • Cached (sometimes) Wednesday, November 7, 12
  • 19. Video by Username VideoId:<timestamp> .. VideoId:<timestamp> Username Wide row • Username is unique • One column for each new video uploaded • Column slice for time span. From x to y • VideoId is added the same time a Video record is added CREATE TABLE username_video_index ( username varchar, videoid uuid, upload_date timestamp, video_name varchar, PRIMARY KEY (username, videoid, upload_date) ); Wednesday, November 7, 12
  • 20. Video by Tag VideoId .. VideoId tag timestamp timestamp • Tag is unique regardless of video • Great for “List videos with X tag” • Tags have to be updated in Video and Tag at the same time • Index integrity is maintained in app logic CREATE TABLE tag_index ( tag varchar, videoid varchar, timestamp timestamp, PRIMARY KEY (tag, videoid) ); Wednesday, November 7, 12
  • 21. Deployment • Replication factor? • Multi-datacenter? • Cost? Wednesday, November 7, 12
  • 22. Deployment • Today != tomorrow • Scale when needed • Have expansion plan ready Wednesday, November 7, 12
  • 23. DataStax Enterprise • Analytics - Hadoop • Search - Solr Wednesday, November 7, 12
  • 24. Hadoop • Embedded with Cassandra • No single point of failure • Use native c* data • Hive, Pig, Mahout Wednesday, November 7, 12
  • 25. Solr • Embeded with Cassandra • Fast reverse-index • Shards Solr by key range Wednesday, November 7, 12
  • 27. Thank you! Connect with me at @PatrickMcFadin Or linkedIn Wednesday, November 7, 12