Cassandra data modeling talk
1. Building a Cassandra-Based Application
From 0 to Deploy
Patrick McFadin
Solution Architect at DataStax
Wednesday, November 7, 12
2. Me
• Solution Architect at DataStax, THE Cassandra company
• Cassandra user since 0.7
• Follow me here: @PatrickMcFadin
3. Goals
• Take a new application concept
• What is the data model??
• Express that in CQL 3
• Some sample code
4. The Plan
• Conceptualize a new application
• Identify the entity tables
• Identify query tables
• Code. Rinse. Repeat.
• Deploy
5. www.killrvideos.com
Start with a concept
• Video sharing website
[Page mockup: video title, username, description, video player, rating, tags (Foo, Bar), recommended videos, ads, comments, and an "Upload New!" button]
*Cat drawing by goodrob13 on Flickr
6. Break down the features
• Post a video
• View a video
• Add a comment
• Rate a video
• Tag a video
8. Users
Row key: Username
Columns: firstname, lastname, email, password, created_date
• Similar to an RDBMS table. Fairly fixed columns
• Username is unique
• Use secondary indexes on firstname and lastname for lookup
• Adding columns with Cassandra is super easy
CREATE TABLE users (
username varchar,
firstname varchar,
lastname varchar,
email varchar,
password varchar,
created_date timestamp,
PRIMARY KEY (username)
);
9. Users: The insert code
static void setUser(User user, Keyspace keyspace) {
    // Create a mutator that allows you to talk to Cassandra
    Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);
    try {
        // Use the mutator to insert data into our table
        mutator.addInsertion(user.getUsername(), "users",
            HFactory.createStringColumn("firstname", user.getFirstname()));
        mutator.addInsertion(user.getUsername(), "users",
            HFactory.createStringColumn("lastname", user.getLastname()));
        mutator.addInsertion(user.getUsername(), "users",
            HFactory.createStringColumn("password", user.getPassword()));
        // Once the mutator is ready, execute on Cassandra
        mutator.execute();
    } catch (HectorException he) {
        he.printStackTrace();
    }
}
10. Videos (one-to-many)
Row key: VideoId <UUID>
Columns: videoname, username, description, tags, upload_date
• Use a UUID as a row key for uniqueness
• Allows for same video names
• Tags should be stored in some sort of delimited format
• Index on username may not be the best plan
CREATE TABLE videos (
videoid uuid,
videoname varchar,
username varchar,
description varchar,
tags varchar,
upload_date timestamp,
PRIMARY KEY (videoid,videoname)
);
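The delimited tags column can be sketched in plain Java as follows. This is only an illustration: the class name TagCodec and the choice of a comma as delimiter are assumptions (the comma matches the split(",") call in the get code on the next slide), and nothing here talks to Cassandra.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not from the talk): packs tags into one delimited column value.
final class TagCodec {
    // Assumed delimiter; tags containing it are rejected rather than escaped.
    private static final String DELIMITER = ",";

    private TagCodec() {}

    // Joins tags into the single string that would be stored in videos.tags.
    static String encode(List<String> tags) {
        List<String> cleaned = new ArrayList<String>();
        for (String tag : tags) {
            String trimmed = tag.trim();
            if (trimmed.contains(DELIMITER)) {
                throw new IllegalArgumentException("Tag must not contain the delimiter: " + tag);
            }
            if (!trimmed.isEmpty()) {
                cleaned.add(trimmed);
            }
        }
        return String.join(DELIMITER, cleaned);
    }

    // Splits the stored string back into tags; an empty value means no tags.
    static List<String> decode(String stored) {
        if (stored == null || stored.isEmpty()) {
            return new ArrayList<String>();
        }
        return new ArrayList<String>(Arrays.asList(stored.split(DELIMITER)));
    }
}
```

Usage: TagCodec.encode(Arrays.asList("Foo", "Bar")) produces the value to write, and TagCodec.decode(...) reverses it on read. Rejecting tags that contain the delimiter keeps the round trip lossless without an escaping scheme.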
11. Videos: The get code
static Video getVideoByUUID(UUID videoId, Keyspace keyspace) {
    Video video = new Video();
    // Create a slice query. We'll be getting specific column names
    SliceQuery<UUID, String, String> sliceQuery =
        HFactory.createSliceQuery(keyspace, uuidSerializer, stringSerializer, stringSerializer);
    sliceQuery.setColumnFamily("videos");
    sliceQuery.setKey(videoId);
    sliceQuery.setColumnNames("videoname", "username", "description", "tags");
    // Execute the query and get the list of columns
    ColumnSlice<String, String> result = sliceQuery.execute().get();
    // Get each column by name and add them to our video object
    video.setVideoName(result.getColumnByName("videoname").getValue());
    video.setUsername(result.getColumnByName("username").getValue());
    video.setDescription(result.getColumnByName("description").getValue());
    video.setTags(result.getColumnByName("tags").getValue().split(","));
    return video;
}
12. Comments (many-to-many)
Row key: VideoId <UUID>
Columns: username, comment_ts, comment
• Videos have many comments
• Comments have many users
• Order is as inserted
• Use getSlice() to pull some or all of the comments
CREATE TABLE comments (
videoid uuid,
username varchar,
comment_ts timestamp,
comment varchar,
PRIMARY KEY (videoid,username,comment_ts)
);
13. Comments... pt 2
Row key: VideoId <UUID>
Column names: username:comment_ts .. username:comment_ts (wide row, time ordered)
Column values: comment .. comment
• This is what’s really going on
• VideoID is the key
• The composite of username and comment_ts is the column name
• 1 column per comment
14. Ratings
Row key: VideoId <UUID>
Columns: rating_count <counter>, rating_total <counter>
• Use counters for a single-call update
• rating_count is how many ratings were given
• rating_total is the sum of all ratings
• Ex: rating_count = 5, rating_total = 23, avg rating = 23/5 = 4.6
CREATE TABLE video_rating (
videoid uuid,
rating_count counter,
rating_total counter,
PRIMARY KEY (videoid)
);
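The arithmetic behind the two counters can be sketched in plain Java. This is an in-memory stand-in for one row, not Cassandra client code; the class name VideoRating and the 1 to 5 star range are assumptions made for the illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory stand-in for one video_rating row.
final class VideoRating {
    private final AtomicLong ratingCount = new AtomicLong(); // how many ratings were given
    private final AtomicLong ratingTotal = new AtomicLong(); // sum of all ratings

    // Records one rating by incrementing both counters, mirroring a counter update.
    void rate(int stars) {
        // Assumed scale: 1 to 5 stars.
        if (stars < 1 || stars > 5) {
            throw new IllegalArgumentException("stars must be between 1 and 5");
        }
        ratingCount.incrementAndGet();
        ratingTotal.addAndGet(stars);
    }

    // Average rating, or 0.0 when nobody has rated the video yet.
    double average() {
        long count = ratingCount.get();
        return count == 0 ? 0.0 : (double) ratingTotal.get() / count;
    }
}
```

With ratings of 5, 5, 5, 4 and 4, the counters hold rating_count = 5 and rating_total = 23, so average() gives 23/5 = 4.6, matching the example on the slide. The average is computed at read time; only the two increments are written.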
15. Video Events
Row key: VideoId:Username
Column names: start_<timestamp>, stop_<timestamp>, start_<timestamp> (latest .. oldest)
Column values: video_<timestamp>
• Track viewing events
• Combine Video ID and Username for a unique row
• Stop time can be used to pick up where they left off
• Great for usage analytics later
• Reverse comparator!
CREATE TABLE video_event (
videoid_username varchar,
event varchar,
event_timestamp timestamp,
video_timestamp bigint,
PRIMARY KEY (videoid_username, event_timestamp, event)
) WITH CLUSTERING ORDER BY (event_timestamp DESC, event ASC);
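The effect of the reverse comparator can be sketched with a sorted map in plain Java. This is an in-memory model only: the class name VideoEventLog, the ':' separator in the row key, and the reading of video_timestamp as a playback position in milliseconds are all assumptions. For brevity it keys on event_timestamp alone, so it assumes one event per timestamp.

```java
import java.util.Collections;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for one video_event row (one video and one user).
final class VideoEventLog {
    private static final class Event {
        final String type;          // "start" or "stop"
        final long videoTimestamp;  // assumed: playback position in milliseconds

        Event(String type, long videoTimestamp) {
            this.type = type;
            this.videoTimestamp = videoTimestamp;
        }
    }

    // Keyed by event_timestamp with a reverse comparator, so iteration is newest first.
    private final TreeMap<Long, Event> events =
        new TreeMap<Long, Event>(Collections.<Long>reverseOrder());

    // Builds the combined row key; the ':' separator is an assumption.
    static String rowKey(String videoId, String username) {
        return videoId + ":" + username;
    }

    void record(long eventTimestamp, String type, long videoTimestamp) {
        events.put(eventTimestamp, new Event(type, videoTimestamp));
    }

    // Walks events newest first and returns the position of the latest stop, or 0 if none.
    long resumePosition() {
        for (Event event : events.values()) {
            if ("stop".equals(event.type)) {
                return event.videoTimestamp;
            }
        }
        return 0L;
    }
}
```

Because the newest event comes first, finding where a viewer left off only needs the head of the row rather than a scan from the oldest event.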
16. Create Query Tables
Indexes to support fast lookups
18. Index table principles
GetSlice(Col3 - Col6)
RowKey: Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8
Sequential read returns: Col3 Col4 Col5 Col6
• Get row by the key
• Slice. Get data in one pass
• Cached (sometimes)
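The slice idea can be sketched with a sorted map in plain Java. This models a single row in memory and is not the Hector or Cassandra API; the class name WideRow and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical in-memory model of one wide row: columns kept sorted by name.
final class WideRow {
    private final TreeMap<String, String> columns = new TreeMap<String, String>();

    void put(String columnName, String value) {
        columns.put(columnName, value);
    }

    // Returns the values of all columns named from..to (both inclusive), in column order.
    List<String> getSlice(String from, String to) {
        return new ArrayList<String>(columns.subMap(from, true, to, true).values());
    }
}
```

Because the columns are stored in sorted order, a slice from Col3 to Col6 is one contiguous range, which is why it can be read in a single sequential pass after the row is located by its key.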
19. Video by Username
Row key: Username
Column names: VideoId:<timestamp> .. VideoId:<timestamp> (wide row)
• Username is unique
• One column for each new video uploaded
• Column slice for time span. From x to y
• VideoId is added at the same time a Video record is added
CREATE TABLE username_video_index (
username varchar,
videoid uuid,
upload_date timestamp,
video_name varchar,
PRIMARY KEY (username, videoid, upload_date)
);
20. Video by Tag
Row key: tag
Column names: VideoId .. VideoId
Column values: timestamp .. timestamp
• Tag is unique regardless of video
• Great for “List videos with X tag”
• Tags have to be updated in Video and Tag at the same time
• Index integrity is maintained in app logic
CREATE TABLE tag_index (
tag varchar,
videoid uuid,
timestamp timestamp,
PRIMARY KEY (tag, videoid)
);
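Keeping the index consistent in application logic can be sketched in plain Java. This is an in-memory illustration under stated assumptions: the class TaggedVideoStore and its two maps are hypothetical stand-ins for the tags on a video record and for tag_index. In a real deployment these would be separate writes that the application has to issue together.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical in-memory model: one map per table, updated together by the application.
final class TaggedVideoStore {
    private final Map<UUID, Set<String>> videoTags = new HashMap<>(); // stands in for videos.tags
    private final Map<String, Set<UUID>> tagIndex = new HashMap<>();  // stands in for tag_index

    // Replaces a video's tags and updates the index in the same call.
    void setTags(UUID videoId, Set<String> newTags) {
        // Remove stale index entries left by the previous tag set.
        Set<String> oldTags = videoTags.get(videoId);
        if (oldTags != null) {
            for (String tag : oldTags) {
                Set<UUID> ids = tagIndex.get(tag);
                if (ids != null) {
                    ids.remove(videoId);
                    if (ids.isEmpty()) {
                        tagIndex.remove(tag);
                    }
                }
            }
        }
        // Write the video's tags, then one index entry per tag.
        videoTags.put(videoId, new HashSet<String>(newTags));
        for (String tag : newTags) {
            Set<UUID> ids = tagIndex.get(tag);
            if (ids == null) {
                ids = new HashSet<UUID>();
                tagIndex.put(tag, ids);
            }
            ids.add(videoId);
        }
    }

    // "List videos with X tag": a single lookup by tag.
    Set<UUID> videosWithTag(String tag) {
        Set<UUID> ids = tagIndex.get(tag);
        return ids == null ? new HashSet<UUID>() : new HashSet<UUID>(ids);
    }
}
```

The point of routing every tag change through one method is that the record and the index cannot drift apart through a forgotten second write; removing the stale entries on retagging is the part that is easiest to miss.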