Launching PS4 with Cassandra
Introduction 
• Alexander Filipchik 
– Staff Software Engineer at SNEI 
• Dustin Pham 
– Staff Software Engineer at SNEI
Agenda 
• Journey towards Cassandra 
• Cassandra-backed PS4 Features 
• Ops-y Stuff 
• Lessons learned
Journey towards Cassandra
Challenges 
• Small Team 
• Legacy Support 
• Hardware Deadline 
• Scaling @ Peak Time
Why Cassandra 
• Strong community 
• Horizontally scalable architecture 
• Good performance 
• Cost effective 
• New adventure :)
PS4 Features backed by Cassandra
Cassandra-backed PS4 features 
• What’s New 
• Video Library 
• My Library 
• PS Now 
• Notifications 
• LiveArea 
• Store catalog 
• Pre-order 
• PS Plus 
• Recommendations 
• Remote Download 
• Share 
• Authentication 
• +more
Ops-y Stuff
Infrastructure 
• Hosted in cloud and physical DCs 
• Several hundred nodes and growing 
• Cluster by feature 
• Both vnode clusters and assigned-token clusters
• Astyanax client
Stats for PS4 cloud nodes
• Data throughput: gigabytes/sec
• Cassandra reads/writes: > 200,000/sec
• Data size: tens of terabytes
• 10M PS4 and 80M PS3 sold
Clusters
• Initially, one cluster per read/write pattern
• Now one cluster per feature
• Seeds referenced by DNS names
• Size-tiered compaction
• Manual compactions for some column families (CFs)
A typical node
• m2.4xl and i2.2xl instances
• 2 ephemeral disks (~2 × 800 GB)
• Commit log on the root partition
• Topology defined in the topology file, managed by Chef
AWS
• Nodes are interleaved between AZs
– Replication factor spreads data across AZs
– Minimizes downtime due to an AZ outage
[Figure: nodes interleaved between Availability Zone A and Availability Zone C]
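The interleaving can be illustrated with a toy ring. This is a minimal sketch, assuming SimpleStrategy-style placement (the owning node plus the next RF-1 nodes clockwise) and made-up tokens and node names. In practice Ec2Snitch reports each AZ as a rack and NetworkTopologyStrategy is rack-aware; the sketch deliberately ignores that to show why interleaving alone keeps consecutive replicas in different AZs.

```python
from typing import List, Set, Tuple

# (token, node, availability zone); tokens and names are hypothetical
Ring = List[Tuple[int, str, str]]

def replicas(token: int, ring: Ring, rf: int) -> Ring:
    """SimpleStrategy-style placement: the owning node plus the next rf-1 nodes clockwise."""
    ring = sorted(ring)
    # the owner is the first node whose token is >= the key's token, wrapping to the start
    start = next((i for i, (t, _, _) in enumerate(ring) if t >= token), 0)
    return [ring[(start + k) % len(ring)] for k in range(rf)]

def azs_for(token: int, ring: Ring, rf: int) -> Set[str]:
    """The set of AZs holding a copy of the key with this token."""
    return {az for _, _, az in replicas(token, ring, rf)}

interleaved = [(0, "n1", "A"), (25, "n2", "C"), (50, "n3", "A"), (75, "n4", "C")]
grouped = [(0, "n1", "A"), (25, "n2", "A"), (50, "n3", "C"), (75, "n4", "C")]

for name, ring in [("interleaved", interleaved), ("grouped", grouped)]:
    single_az = [t for t in range(0, 100, 5) if len(azs_for(t, ring, rf=2)) == 1]
    print(name, "tokens whose 2 replicas share one AZ:", single_az)
```

With the grouped ring, some keys keep both copies in one AZ and become unavailable if that AZ goes down; with the interleaved ring, every key spans both AZs.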
Disk Layout
Pre-Launch: AWS m2.4xl, RAID 0 (Eph0 + Eph1)
• 2 ephemerals in a RAID 0
• Higher throughput (I/O spreads across 2 devices for reading and writing)
• If you lose 1 device, you lose the array!
Launch: AWS m2.4xl, RAID 1 (Eph0 + Eph1)
• 2 ephemerals in a RAID 1
• Higher throughput for reading (I/O spreads across 2 devices), but not for writing
• If you lose 1 device, the array stays up in degraded mode
• ½ the available space
Current: AWS m2.4xl, no RAID (Eph0 and Eph1 used individually)
• 2 individual ephemerals
• Higher throughput (I/O spreads across 2 devices for reading and writing)
• If you lose 1 device, Cassandra stops (configurable via disk_failure_policy)
• No RAID overhead
Cluster Resizing
Thrift Payload Size
thrift_framed_transport_size_in_mb
thrift_max_message_length_in_mb
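A request larger than the configured Thrift frame size is refused, so large batches have to be split on the client. This is a minimal sketch, assuming values are already serialized to bytes; the function name, the 15 MB frame size and the 64 KB overhead allowance are hypothetical choices (check thrift_framed_transport_size_in_mb in your own cassandra.yaml), not Astyanax settings.

```python
from typing import List

MB = 1024 * 1024

def split_into_batches(payloads: List[bytes], frame_size_mb: int = 15,
                       overhead_bytes: int = 64 * 1024) -> List[List[bytes]]:
    """Greedily group serialized payloads so each group fits in one Thrift frame.

    frame_size_mb stands in for thrift_framed_transport_size_in_mb (assumed value);
    overhead_bytes is a hypothetical allowance for Thrift and mutation framing.
    """
    budget = frame_size_mb * MB - overhead_bytes
    batches: List[List[bytes]] = []
    current: List[bytes] = []
    current_size = 0
    for payload in payloads:
        if len(payload) > budget:
            # a single value this large can never fit; compress or chunk it first
            raise ValueError(f"payload of {len(payload)} bytes exceeds frame budget of {budget}")
        if current and current_size + len(payload) > budget:
            batches.append(current)
            current, current_size = [], 0
        current.append(payload)
        current_size += len(payload)
    if current:
        batches.append(current)
    return batches

batches = split_into_batches([b"x" * (4 * MB)] * 5, frame_size_mb=10)
print([len(b) for b in batches])
```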
Bouncing Nodes 
phi_convict_threshold
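Nodes that bounce between up and down are being convicted by Cassandra's phi accrual failure detector, which marks a node down once phi exceeds phi_convict_threshold (default 8). A minimal sketch of the arithmetic, assuming gossip heartbeats arrive roughly once per second and that intervals are exponentially distributed (the approximation Cassandra's detector is generally described as making): raising the threshold lengthens the silence needed before conviction, tolerating noisy networks at the cost of slower failure detection.

```python
import math

def phi(elapsed_s: float, mean_interval_s: float) -> float:
    """Suspicion level for a node that has been silent for elapsed_s seconds.

    With exponentially distributed intervals, P(silence >= t) = exp(-t / mean),
    so phi = -log10(P) = t / (mean * ln 10).
    """
    return elapsed_s / (mean_interval_s * math.log(10))

def is_convicted(elapsed_s: float, mean_interval_s: float,
                 phi_convict_threshold: float = 8.0) -> bool:
    return phi(elapsed_s, mean_interval_s) > phi_convict_threshold

def seconds_until_conviction(mean_interval_s: float,
                             phi_convict_threshold: float = 8.0) -> float:
    """Silence needed before a node is marked down at the given threshold."""
    return phi_convict_threshold * mean_interval_s * math.log(10)

for threshold in (8, 10, 12):
    print(threshold, round(seconds_until_conviction(1.0, threshold), 1))
```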
Inter-DC Latency
Monitor system health
• Nagios
• Kibana/Elasticsearch
• Graphite
• AWS CloudWatch
• App-level monitoring
• OpsCenter
App-level metrics
Lessons Learned
Fun with Astyanax Client
• Cross-DC latencies
– Latencies of several seconds in the JP and EE data centers
– Astyanax configured to ensure the local data center is used
• Imbalanced node traffic
– Hashing algorithm (MD5 vs Murmur3)
• DNS caching in the JVM
– Stale seed nodes
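One way a hashing mismatch can skew traffic: RandomPartitioner tokens are MD5-based values in [0, 2^127], while Murmur3Partitioner tokens are signed 64-bit values in [-2^63, 2^63). If a token-aware client computed MD5 tokens against a Murmur3 ring, nearly every token would land past the highest node token and wrap to the same node. The slide does not say this is exactly what happened; this is a hypothetical illustration with made-up keys and an evenly spaced four-node ring.

```python
import hashlib
from bisect import bisect_left
from collections import Counter
from typing import List

def md5_token(key: bytes) -> int:
    """RandomPartitioner-style token: MD5 digest as a signed big-endian integer, made non-negative."""
    return abs(int.from_bytes(hashlib.md5(key).digest(), "big", signed=True))

def owner(token: int, ring_tokens: List[int]) -> int:
    """Node token that owns `token`: the first node token >= it, wrapping around the ring."""
    i = bisect_left(ring_tokens, token)
    return ring_tokens[i % len(ring_tokens)]

# Four evenly spaced Murmur3Partitioner node tokens (hypothetical cluster)
murmur3_ring = [-(2**63), -(2**62), 0, 2**62]

# A client that routes with MD5 tokens against this ring
traffic = Counter(owner(md5_token(f"user:{n}".encode()), murmur3_ring) for n in range(1000))
print(traffic)
```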
A tale of 2 Nodes
Cluster lessons
• A single bad node can raise app latencies significantly
• Taking out an entire Cassandra cluster is easy (not so fun)
– Compressing data before sending it to Cassandra helps a lot
• A corrupted SSTable resulted in a cascading failure
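Compressing before sending can be sketched as a client-side codec: serialize the value, compress it, and store the bytes in a blob column. A minimal sketch using Python's standard json and zlib modules; the JSON encoding, compression level 6 and the sample "library" record are assumptions for illustration. Smaller values mean fewer bytes on the wire, in memtables and in SSTables, at the cost of some client CPU.

```python
import json
import zlib

def encode_value(obj) -> bytes:
    """Serialize to compact JSON, then zlib-compress before writing to a blob column."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw, 6)

def decode_value(blob: bytes):
    """Reverse of encode_value: decompress, then parse the JSON."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))

# Hypothetical per-user record with many similar entries
library = {
    "user": "example",
    "items": [{"title_id": f"T{i:05d}", "owned": True, "platform": "PS4"} for i in range(500)],
}
raw_size = len(json.dumps(library, separators=(",", ":")).encode("utf-8"))
blob = encode_value(library)
print(f"raw: {raw_size} bytes, compressed: {len(blob)} bytes")
```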
• Monitoring 
– Memtable flush frequency 
– Hinted handoffs 
– Garbage collection 
– Compactions 
– Histograms
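Latency histograms such as the ones nodetool reports are bucketed: each row is a bucket offset and a count. A minimal sketch of reading a percentile from such buckets, assuming offsets are sorted upper bounds (for example in microseconds); the function name and sample data are hypothetical. The result is the upper bound of the bucket containing the percentile, not an exact value.

```python
import math
from typing import List, Optional

def percentile_from_buckets(offsets: List[int], counts: List[int], pct: float) -> Optional[int]:
    """Return the upper bound (offset) of the bucket that contains the pct-th percentile."""
    total = sum(counts)
    if total == 0:
        return None
    target = math.ceil(total * pct / 100)
    running = 0
    for offset, count in zip(offsets, counts):
        running += count
        if running >= target:
            return offset
    return offsets[-1]

offsets = [100, 200, 500, 1000]  # bucket upper bounds, e.g. microseconds
counts = [50, 30, 15, 5]         # requests observed in each bucket
for pct in (50, 95, 99):
    print(f"p{pct}: <= {percentile_from_buckets(offsets, counts, pct)} us")
```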
• VPNs are a dangerous bottleneck
• Easier to rebuild a node than to fix it
• Back up data
– Replication factor helps, but does not account for data corruption
• Denormalization has costs
• Disk is cheap, but EC2 instances are not
• TTL on almost everything
• Adjust gc_grace_seconds based on TTL times
• Transactions? Be creative
• Load test with real data
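Why gc_grace_seconds matters for TTL data: an expired cell is treated like a tombstone and is kept for another gc_grace_seconds (default 864,000 s, 10 days) before compaction may drop it. A minimal sketch of that lower bound; it is only a lower bound because the cell is purged only when a compaction actually includes its SSTable. Lowering gc_grace_seconds is commonly considered safe only for data that is written with a TTL and not explicitly deleted; otherwise it must stay longer than the interval between repairs.

```python
DAY = 86_400
DEFAULT_GC_GRACE_SECONDS = 10 * DAY  # 864,000 s, Cassandra's default gc_grace_seconds

def earliest_purge_time(write_time_s: int, ttl_s: int,
                        gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS) -> int:
    """Earliest time (in seconds) at which compaction may drop an expired cell.

    The cell expires at write_time + ttl and is then retained like a tombstone
    for gc_grace_seconds more.
    """
    return write_time_s + ttl_s + gc_grace_seconds

for gc_grace in (DEFAULT_GC_GRACE_SECONDS, DAY):
    days = earliest_purge_time(0, ttl_s=DAY, gc_grace_seconds=gc_grace) / DAY
    print(f"gc_grace_seconds={gc_grace}: 1-day TTL data stays on disk for at least {days:.0f} days")
```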
• Choose the replication strategy based on:
– Read/write pattern
– Whether the data is the source of truth
– Data locality
– User-level data vs app-level data
• Cluster-wide commands should be staggered
– Global repair :(
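Staggering a cluster-wide command such as repair means running it one node at a time rather than everywhere at once. A minimal sketch, assuming nodetool is on the PATH and can reach each host with `-h`; the host list, keyspace and five-minute pause are hypothetical. `repair -pr` repairs only each node's primary range, so running it once on every node covers the ring without repairing each range RF times.

```python
import shlex
import subprocess
import time
from typing import Callable, List, Optional

def repair_commands(hosts: List[str], keyspace: Optional[str] = None) -> List[List[str]]:
    """One `nodetool repair -pr` command per host (primary range only)."""
    commands = []
    for host in hosts:
        cmd = ["nodetool", "-h", host, "repair", "-pr"]
        if keyspace:
            cmd.append(keyspace)
        commands.append(cmd)
    return commands

def run_staggered(hosts: List[str], keyspace: Optional[str] = None, pause_s: float = 300,
                  dry_run: bool = True, runner: Callable = subprocess.run) -> None:
    """Repair one node at a time, pausing between nodes, instead of all at once."""
    commands = repair_commands(hosts, keyspace)
    for i, cmd in enumerate(commands):
        if dry_run:
            print(shlex.join(cmd))
            continue
        runner(cmd, check=True)  # blocks until this node's repair finishes
        if i < len(commands) - 1:
            time.sleep(pause_s)  # let streaming and compaction settle before the next node

run_staggered(["10.0.0.1", "10.0.0.2", "10.0.0.3"], keyspace="example_ks")
```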
Tokens
• Vnodes vs assigned tokens
– Increased gossip chattiness with vnodes
– Perceived slowness of repair and cleanup operations on vnode-enabled clusters
– Astyanax client does not like vnodes…
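With assigned tokens, each node gets a single initial_token in cassandra.yaml and the operator spaces them evenly around the ring; with vnodes, each node picks num_tokens random tokens automatically. A sketch of the usual evenly-spaced-token arithmetic for both partitioners; the function names are illustrative. Growing an assigned-token cluster means recomputing tokens and moving nodes (for example with nodetool move), which is part of the trade-off against vnodes.

```python
from typing import List

def murmur3_tokens(node_count: int) -> List[int]:
    """Evenly spaced initial_token values for Murmur3Partitioner (range -2**63 .. 2**63 - 1)."""
    return [-(2**63) + (i * 2**64) // node_count for i in range(node_count)]

def random_partitioner_tokens(node_count: int) -> List[int]:
    """Evenly spaced initial_token values for RandomPartitioner (range 0 .. 2**127)."""
    return [(i * 2**127) // node_count for i in range(node_count)]

for token in murmur3_tokens(6):
    print(token)
```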
Compactions
• Compactions are your worst enemy
– Larger disk usage = higher CPU and longer compactions
• Leveled vs size-tiered compaction
– Startup time
– CPU tradeoff
– I/O tradeoff
• Updates + removals eat up disk
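Size-tiered compaction groups SSTables of similar size into buckets and compacts a bucket once it holds at least min_threshold tables, which is why updates and deletes linger on disk until enough similarly sized tables accumulate. A simplified sketch of that bucketing, using the STCS option names bucket_low, bucket_high, min_sstable_size, min_threshold and max_threshold with values assumed to match their usual defaults (0.5, 1.5, 50 MB, 4, 32); Cassandra's own implementation differs in details. Because a compaction rewrites a whole bucket, it can temporarily need free space comparable to the tables it merges.

```python
from typing import List

def stcs_buckets(sizes_mb: List[float], bucket_low: float = 0.5, bucket_high: float = 1.5,
                 min_sstable_size_mb: float = 50) -> List[List[float]]:
    """Group SSTable sizes into buckets of similar size (simplified size-tiered logic)."""
    buckets: List[List[float]] = []
    for size in sorted(sizes_mb):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            similar = avg * bucket_low <= size <= avg * bucket_high
            both_small = size < min_sstable_size_mb and avg < min_sstable_size_mb
            if similar or both_small:
                bucket.append(size)
                break
        else:
            buckets.append([size])
    return buckets

def compaction_candidates(sizes_mb: List[float], min_threshold: int = 4,
                          max_threshold: int = 32) -> List[List[float]]:
    """Buckets holding at least min_threshold SSTables, capped at max_threshold each."""
    return [b[:max_threshold] for b in stcs_buckets(sizes_mb) if len(b) >= min_threshold]

sizes = [100, 110, 90, 105, 1000, 1050]  # hypothetical SSTable sizes in MB
print(stcs_buckets(sizes))
print(compaction_candidates(sizes))
```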
We are hiring… 
sonyentertainmentnetwork.com/careers

Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

  • 1. Launching PS4 with Cassandra
  • 2. Introduction • Alexander Filipchik – Staff Software Engineer at SNEI • Dustin Pham – Staff Software Engineer at SNEI
  • 3. Agenda • Journey towards Cassandra • Cassandra-backed PS4 Features • Ops-y Stuff • Lessons learned
  • 5. Challenges • Small Team • Legacy Support • Hardware Deadline • Scaling @ Peak Time
  • 6. Why Cassandra • Strong community • Horizontally scalable architecture • Good performance • Cost effective • New adventure J 6
  • 7. PS4 Features backed by Cassandra
  • 8. Cassandra-backed PS4 features • What’s New • Video Library • My Library • PS Now • Notifications • LiveArea • Store catalog • Pre-order • PS Plus • Recommendations • Remote Download • Share • Authentication • +more
  • 23. Infrastructure • Hosted in cloud and physical DCs • Several hundred nodes and growing • Cluster by feature • Vnode and assigned-token clusters • Astyanax client
  • 24. Stats for PS4 cloud nodes • Data throughput: gigabytes/sec • Cassandra reads/writes: > 200,000/sec • Data size: tens of terabytes • 10M PS4s and 80M PS3s sold
  • 25. Clusters • Initially, one cluster per read/write pattern • Now, one cluster per feature • Seeds referenced by DNS names • Size-tiered compaction • Manual compactions for some CFs
  • 26. A typical node • m2.4xl + i2.2xl • 2 ephemeral disks (~2 x 800 GB) • Commit log on the root partition • Topology defined in a topology file managed by Chef
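A Chef recipe ultimately has to render that topology file. The sketch below assumes the file is `cassandra-topology.properties` in the `IP=DC:RACK` format read by PropertyFileSnitch; the slides do not say which snitch was used, and the IPs, DC and rack names are hypothetical.

```python
from typing import Dict, Tuple


def render_topology(nodes: Dict[str, Tuple[str, str]], default: Tuple[str, str]) -> str:
    """Render cassandra-topology.properties content, one `IP=DC:RACK` line per node.

    `nodes` maps a node IP to its (data_center, rack); `default` is the
    placement the snitch falls back to for IPs missing from the file.
    """
    # Sort (lexicographically) so repeated Chef runs produce an identical file.
    lines = [f"{ip}={dc}:{rack}" for ip, (dc, rack) in sorted(nodes.items())]
    lines.append(f"default={default[0]}:{default[1]}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical inventory: two nodes in one DC, racks named after AZs.
    inventory = {
        "10.0.3.12": ("us-east", "az-c"),
        "10.0.1.11": ("us-east", "az-a"),
    }
    print(render_topology(inventory, default=("us-east", "az-a")), end="")
```

Rendering deterministically means the file only changes on disk when the inventory actually changes.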
  • 27. AWS • Nodes are interleaved between AZs – Replication factor spreads data across AZs – Minimizes downtime due to an AZ outage • (Diagram: nodes alternating between Availability Zone A and Availability Zone C)
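To see why interleaving matters, the sketch below orders nodes round-robin across two AZs and checks that every replica set spans both. It uses a simplified ring walk (the next RF nodes clockwise, as SimpleStrategy does), not the rack-aware placement of NetworkTopologyStrategy; node and AZ names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def interleave(nodes_per_az: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Order nodes round-robin across AZs (A1, C1, A2, C2, ...) as (node, az) pairs."""
    azs = sorted(nodes_per_az)
    ring: List[Tuple[str, str]] = []
    for i in range(max(len(nodes) for nodes in nodes_per_az.values())):
        for az in azs:
            if i < len(nodes_per_az[az]):
                ring.append((nodes_per_az[az][i], az))
    return ring


def replica_azs(ring: List[Tuple[str, str]], start: int, rf: int) -> Set[str]:
    """AZs covered by the rf consecutive ring positions starting at `start`."""
    return {ring[(start + k) % len(ring)][1] for k in range(rf)}


if __name__ == "__main__":
    ring = interleave({"us-east-1a": ["n1", "n2"], "us-east-1c": ["n3", "n4"]})
    print([name for name, _ in ring])  # ['n1', 'n3', 'n2', 'n4']
    # With RF=2, every replica set spans both AZs, so losing one AZ leaves a copy.
    print(all(len(replica_azs(ring, s, 2)) == 2 for s in range(len(ring))))  # True
```

Placing the nodes contiguously instead (both `us-east-1a` nodes next to each other) would leave some ranges with both replicas in a single AZ.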
  • 28. Disk Layout (AWS m2.4xl, ephemeral disks Eph0 and Eph1) • Pre-Launch: 2 ephemerals in RAID 0 – Higher throughput (I/O spreads across 2 devices for reading and writing) – If you lose 1 device, you lose the array! • Launch: 2 ephemerals in RAID 1 – Higher throughput for reading (I/O spreads across 2 devices), but not for writing – If you lose 1 device, the array keeps running in degraded mode – Half the available space • Current: 2 individual ephemerals – Higher throughput (I/O spreads across 2 devices for reading and writing) – If you lose 1 device, Cassandra stops (configurable) – No RAID overhead
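The trade-off on this slide is mostly arithmetic. A small sketch, assuming two equal disks of about 800 GB as listed for the typical node; in the two-disk case, the node's reaction to a failed disk is what we believe the `disk_failure_policy` setting in cassandra.yaml controls.

```python
from typing import Dict


def layout_summary(disk_gb: int = 800) -> Dict[str, Dict[str, object]]:
    """Usable space and single-disk-failure outcome for two equal ephemeral disks."""
    return {
        # Striped: full capacity, but one failed disk takes down the whole array.
        "raid0": {"usable_gb": 2 * disk_gb, "all_data_survives_one_failure": False},
        # Mirrored: half the capacity; the array keeps running in degraded mode.
        "raid1": {"usable_gb": disk_gb, "all_data_survives_one_failure": True},
        # Two separate data directories: full capacity and no RAID layer, but the
        # SSTables on the failed disk are gone and must come back from replicas.
        "two_disks": {"usable_gb": 2 * disk_gb, "all_data_survives_one_failure": False},
    }


if __name__ == "__main__":
    for name, info in layout_summary().items():
        print(f"{name:9} {info['usable_gb']:5} GB  "
              f"survives one failure: {info['all_data_survives_one_failure']}")
```

Since Cassandra already keeps copies on other nodes, the two-disk layout trades per-node redundancy for capacity and throughput.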
  • 30. Thrift Payload Size • thrift_framed_transport_size_in_mb • thrift_max_message_length_in_mb
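Requests larger than the configured frame size fail on the server side, so it can help to enforce the same limit in the client before issuing a write. A minimal sketch; the 15 MB figure and the 10% headroom are illustrative values, not taken from the slides, so match them to your own cassandra.yaml.

```python
MB = 1024 * 1024


def check_payload(payload: bytes, frame_limit_mb: int, headroom: float = 0.9) -> None:
    """Raise ValueError if `payload` would not comfortably fit in one Thrift frame.

    `frame_limit_mb` should mirror the server's thrift_framed_transport_size_in_mb;
    `headroom` (an assumption) leaves room for Thrift/RPC framing overhead.
    """
    budget = int(frame_limit_mb * MB * headroom)
    if len(payload) > budget:
        raise ValueError(
            f"payload is {len(payload)} bytes; budget is {budget} bytes "
            f"({headroom:.0%} of a {frame_limit_mb} MB frame)"
        )


if __name__ == "__main__":
    check_payload(b"x" * 1024, frame_limit_mb=15)  # well under the limit
    try:
        check_payload(b"x" * (15 * MB), frame_limit_mb=15)
    except ValueError as err:
        print("rejected:", err)
```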
  • 33. Monitor system health • Nagios • Kibana/Elasticsearch • Graphite • AWS CloudWatch • App-level monitoring • OpsCenter
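For the Graphite piece, Carbon's plaintext protocol accepts one `<metric.path> <value> <timestamp>` line per data point, by default on TCP port 2003. A minimal sketch of an app-level reporter; the metric names and the Graphite host are hypothetical.

```python
import socket
import time
from typing import Iterable, Optional


def graphite_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Format one data point in Carbon's plaintext protocol."""
    ts = int(time.time()) if timestamp is None else timestamp
    return f"{path} {value} {ts}\n"


def send_to_graphite(lines: Iterable[str], host: str, port: int = 2003) -> None:
    """Push already-formatted lines to a Carbon plaintext listener over TCP."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))


if __name__ == "__main__":
    # Hypothetical app-level metrics; pass them to send_to_graphite("graphite.example", ...).
    batch = [
        graphite_line("app.library.cassandra.read_latency_ms", 4.2, 1400000000),
        graphite_line("app.library.cassandra.write_errors", 0, 1400000000),
    ]
    print("".join(batch), end="")
```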
  • 37. Fun with Astyanax Client • Cross-DC latencies – Latencies of several seconds in the JP and EE data centers – Astyanax configured to ensure local data centers are used • Imbalanced node traffic – Hashing algorithm (MD5 vs Murmur3) • DNS caching in the JVM – Stale seed nodes
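The cross-DC fix amounts to a load-balancing rule: try local-DC hosts first and treat remote DCs only as a fallback. The sketch below shows that rule in isolation; it is not Astyanax's implementation, and the class name, host names and DC names are hypothetical.

```python
from itertools import count
from typing import Dict, List


class LocalDcFirstPolicy:
    """Round-robin over local-DC hosts; remote-DC hosts are appended as a fallback."""

    def __init__(self, hosts: Dict[str, str], local_dc: str):
        self.local = sorted(h for h, dc in hosts.items() if dc == local_dc)
        self.remote = sorted(h for h, dc in hosts.items() if dc != local_dc)
        self._counter = count()

    def query_plan(self) -> List[str]:
        """Hosts to try, in order, for the next request."""
        n = next(self._counter)
        rotated: List[str] = []
        if self.local:
            k = n % len(self.local)
            rotated = self.local[k:] + self.local[:k]
        return rotated + self.remote


if __name__ == "__main__":
    policy = LocalDcFirstPolicy({"us1": "us", "us2": "us", "jp1": "jp"}, local_dc="us")
    print(policy.query_plan())  # ['us1', 'us2', 'jp1']
    print(policy.query_plan())  # ['us2', 'us1', 'jp1']
```

Spreading requests across local hosts while keeping remote hosts last avoids paying cross-DC latency except when the whole local DC is unavailable.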
  • 38. A tale of 2 Nodes
  • 39. Cluster lessons • A single bad node can raise app latencies significantly • Taking out an entire Cassandra cluster is easy (and not fun) – Compressing data before sending it to Cassandra helps a lot • A corrupted SSTable resulted in a cascading failure
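A minimal sketch of client-side compression, assuming values are JSON documents stored in a blob column; JSON and zlib are our choices for illustration, not necessarily what the team used.

```python
import json
import zlib
from typing import Any


def encode_value(obj: Any, level: int = 6) -> bytes:
    """Serialize to compact JSON, then zlib-compress before writing to Cassandra."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw, level)


def decode_value(blob: bytes) -> Any:
    """Inverse of encode_value: decompress, then parse JSON."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical, highly repetitive payload (e.g. a per-user library listing).
    library = {"user": 42, "titles": [{"sku": f"SKU-{i % 20}", "owned": True} for i in range(500)]}
    raw_size = len(json.dumps(library, separators=(",", ":")).encode("utf-8"))
    blob = encode_value(library)
    assert decode_value(blob) == library
    print(f"{raw_size} bytes raw -> {len(blob)} bytes compressed")
```

Smaller values mean less network traffic, smaller SSTables and less compaction work, at the cost of some client CPU.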
  • 41. • Monitoring – Memtable flush frequency – Hinted handoffs – Garbage collection – Compactions – Histograms
  • 42. • VPNs are a dangerous bottleneck • It is easier to rebuild a node than to fix it • Back up data – Replication factor helps but does not protect against data corruption
  • 43. • Denormalization has costs • Disk is cheap, but EC2 instances are not • TTL on almost everything • Adjust gc_grace_seconds based on TTL times • Transactions? Be creative • Load test with real data
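The TTL/gc_grace bullet follows from how expiry works: an expired cell is treated as a tombstone, and compaction may only purge it once gc_grace_seconds have passed after expiry. The sketch does that arithmetic. Lowering gc_grace_seconds is generally considered safe only when repair runs more often than that window, because otherwise explicit deletes can be resurrected; and compaction still has to actually run for space to be reclaimed.

```python
from datetime import datetime, timedelta, timezone


def earliest_purge(write_time: datetime, ttl_seconds: int, gc_grace_seconds: int) -> datetime:
    """Earliest moment a compaction may drop a TTL'd cell from disk.

    The cell stops being returned at write_time + ttl; from then on it is
    treated like a tombstone and only becomes purgeable gc_grace_seconds later.
    """
    expires_at = write_time + timedelta(seconds=ttl_seconds)
    return expires_at + timedelta(seconds=gc_grace_seconds)


if __name__ == "__main__":
    written = datetime(2014, 1, 1, tzinfo=timezone.utc)
    one_day = 24 * 3600
    # 864000 s (10 days) is Cassandra's default gc_grace_seconds.
    print(earliest_purge(written, one_day, 864000))   # 2014-01-12 00:00:00+00:00
    print(earliest_purge(written, one_day, one_day))  # 2014-01-03 00:00:00+00:00
```

With a one-day TTL and the default grace period, expired data can occupy disk for roughly eleven times its useful life.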
  • 44. • Replication strategy considerations: – Read/write pattern – Whether the data is the source of truth – Data locality – User-level data vs app-level data • Cluster-wide commands should be staggered – Global repair :(
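One way to stagger a cluster-wide repair is to run `nodetool repair -pr` on one node at a time. The sketch below only builds a fixed-interval schedule; the function name, host names and 4-hour gap are hypothetical, and in practice waiting for each repair to finish before starting the next is safer than a fixed gap.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def staggered_repairs(hosts: List[str], start: datetime, gap: timedelta) -> List[Tuple[datetime, str]]:
    """Schedule one primary-range repair per host, `gap` apart, never all at once."""
    # `repair -pr` repairs only each node's primary range, so running it on every
    # node in turn covers the ring without repairing each range RF times.
    return [(start + i * gap, f"nodetool -h {host} repair -pr") for i, host in enumerate(hosts)]


if __name__ == "__main__":
    schedule = staggered_repairs(
        ["cass-01", "cass-02", "cass-03"], datetime(2014, 1, 1, 2, 0), timedelta(hours=4)
    )
    for when, cmd in schedule:
        print(when.isoformat(), cmd)
```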
  • 45. Tokens • Vnodes vs assigned tokens – Increased gossip chattiness with vnodes – Perceived slowness of repair and cleanup operations on vnode-enabled clusters – Astyanax client does not like vnodes…
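With assigned tokens, each node gets a single `initial_token` in cassandra.yaml, and the operator is responsible for spacing them evenly around the ring. A minimal sketch of the usual even-spacing calculation for both partitioners mentioned on these slides (Murmur3Partitioner and the MD5-based RandomPartitioner); the node count is illustrative.

```python
from typing import List


def murmur3_tokens(n: int) -> List[int]:
    """Evenly spaced initial_token values for Murmur3Partitioner (ring -2**63 .. 2**63 - 1)."""
    return [-(2**63) + (i * 2**64) // n for i in range(n)]


def random_partitioner_tokens(n: int) -> List[int]:
    """Evenly spaced initial_token values for RandomPartitioner (MD5, ring 0 .. 2**127)."""
    return [(i * 2**127) // n for i in range(n)]


if __name__ == "__main__":
    for node, token in enumerate(murmur3_tokens(4)):
        print(f"node{node}: initial_token: {token}")
```

Adding nodes to an assigned-token cluster means recomputing and moving tokens, which is part of the operational cost weighed against vnodes above.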
  • 46. Compactions • Compactions are your worst enemy – Larger disk usage = higher CPU and longer compactions • Leveled compaction vs size-tiered compaction – Startup time – CPU tradeoff – I/O tradeoff • Updates + removals eat up disk
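To illustrate how size-tiered compaction picks work, here is a simplified version of its bucketing: SSTables whose size falls strictly between 0.5x and 1.5x of a bucket's average are grouped, and a bucket becomes a compaction candidate once it holds min_threshold (4) tables. These values mirror Cassandra's size-tiered defaults as we understand them; the real strategy also lumps very small SSTables together, which is omitted here, and the function names are hypothetical.

```python
from typing import List


def stcs_buckets(sizes_mb: List[float], bucket_low: float = 0.5,
                 bucket_high: float = 1.5) -> List[List[float]]:
    """Group SSTable sizes into buckets of similar size (simplified size-tiered logic)."""
    buckets: List[List[float]] = []
    for size in sorted(sizes_mb):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if avg * bucket_low < size < avg * bucket_high:
                bucket.append(size)
                break
        else:
            # No bucket of similar size: start a new tier.
            buckets.append([size])
    return buckets


def compaction_candidates(sizes_mb: List[float], min_threshold: int = 4) -> List[List[float]]:
    """Buckets holding enough similar-sized SSTables to be worth compacting."""
    return [b for b in stcs_buckets(sizes_mb) if len(b) >= min_threshold]


if __name__ == "__main__":
    print(compaction_candidates([100, 110, 120, 130, 1000]))  # [[100, 110, 120, 130]]
```

Each merge produces a larger SSTable that later joins a bigger tier, so the same data is rewritten repeatedly as volume grows, and overwritten or deleted data keeps occupying disk until those merges happen.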
  • 47. We are hiring… sonyentertainmentnetwork.com/careers