Agenda
• What do we mean by synergy?
• Storm
• Shark / Spark
• Redis
• Elasticsearch
• Hadoop
What do we mean by Synergy?
• synergy (noun): the interaction of two or more agents or forces so that their combined effect is greater than the sum of their individual effects.
What do we mean by Synergy?
• Cassandra is excellent for:
  • Fast read and write performance
  • Scalability on commodity hardware
  • Reliable cross-DC replication
  • Robust persistence for high-volume data
• Needs some special sauce for:
  • Real-time calculations on high-volume streams
  • Complex search functions (free text, etc.)
  • MapReduce-style processing on RDDs
Twitter Storm
Storm
• Open sourced by Twitter in 2011
• Distributed event processor
• Operates on unbounded streams of tuples
• Getting started in the Apache Incubator
• Can persist to and read from Cassandra (C*)
• Great for high-volume, real-time (complex) calculations on streamed data
Storm
• Is a complex event processing (CEP) architecture
• Spout – collects and submits tuples for processing
• Bolt – processes tuples and emits new tuples
• Tuple – a collection of data passed through Storm
• Stream – identifies the outputs of a spout or bolt and enforces tuple structure
• Uses ZooKeeper for coordination and ZeroMQ for message passing
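To make the spout / bolt / tuple vocabulary concrete, here is a minimal sketch in plain Python. It does not use Storm's API at all: the function names (sentence_spout, split_bolt, count_bolt, run_topology) are hypothetical, and Python generators stand in for streams. It only illustrates how tuples flow from a spout through a chain of bolts.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def sentence_spout(messages: Iterable[str]) -> Iterator[Tuple[str]]:
    """Spout stand-in: turns an input source into a stream of one-field tuples."""
    for message in messages:
        yield (message,)


def split_bolt(stream: Iterable[Tuple[str]]) -> Iterator[Tuple[str]]:
    """Bolt stand-in: consumes (sentence,) tuples and emits one (word,) tuple per word."""
    for (sentence,) in stream:
        for word in sentence.lower().split():
            yield (word,)


def count_bolt(stream: Iterable[Tuple[str]]) -> Iterator[Tuple[str, int]]:
    """Bolt stand-in: keeps a running count and emits (word, count) tuples."""
    counts: Dict[str, int] = {}
    for (word,) in stream:
        counts[word] = counts.get(word, 0) + 1
        yield (word, counts[word])


def run_topology(messages: Iterable[str]) -> List[Tuple[str, int]]:
    """Wires spout -> split bolt -> count bolt, like a very small linear topology."""
    return list(count_bolt(split_bolt(sentence_spout(messages))))


if __name__ == "__main__":
    for emitted in run_topology(["storm meets cassandra", "storm again"]):
        print(emitted)
```

In a real topology each stage would run in parallel across worker processes; this single-process chain only shows the shape of the data passed between stages.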
Example Topology
Synergy?
• Can use Cassandra as the input data source
• Can write tuples into Cassandra
• Example project here…
  • https://github.com/tjake/stormscraper/
  • See CassandraWriterBolt.java for a simple example of a CQL-based bolt that uses the Java Driver to write to Cassandra
  • Good as an example application, but not production ready
Use Case
• Top N words for popularity tracking
• Input: a constant stream of messages into the system
• Count occurrences of each word in a message
• Store raw messages in Cassandra
• Use a bolt to break up messages and maintain a sorted list of the top N words
• Persist the top N words and their counts periodically in Cassandra
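The counting step of this use case can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the storm-starter implementation: the TopWords class and its tokenising regular expression are hypothetical, and the counts are cumulative rather than computed over a rolling time window.

```python
import re
from collections import Counter
from typing import List, Tuple


class TopWords:
    """Keeps cumulative word counts and reports the N most frequent words."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.counts: Counter = Counter()

    def add_message(self, message: str) -> None:
        # Break the message into lower-case words (assumed tokenisation rule).
        self.counts.update(re.findall(r"[a-z0-9']+", message.lower()))

    def top(self) -> List[Tuple[str, int]]:
        # Sort by count descending, then alphabetically so ties are deterministic.
        ranked = sorted(self.counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[: self.n]


if __name__ == "__main__":
    tracker = TopWords(2)
    tracker.add_message("the cat and the hat")
    tracker.add_message("The hat")
    print(tracker.top())
```

Sorting on every call is fine for a sketch; a bolt handling a high-volume stream would more likely maintain the ranking incrementally.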
Use Case
CREATE TABLE messages (
    date_hour  TIMESTAMP,
    message_id TIMEUUID,
    message    VARCHAR,
    PRIMARY KEY (date_hour, message_id)
);
CREATE TABLE top_words (
    date_hour TIMESTAMP,
    position  INT,
    word      VARCHAR,
    PRIMARY KEY (date_hour, position)
);
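Both tables are partitioned by date_hour, so a writer has to truncate each timestamp to the hour before inserting. The sketch below shows one way to build rows for top_words in plain Python. The helper names (hour_bucket, top_words_rows) are hypothetical, numbering positions from 1 is an assumption, and no Cassandra driver is used; the INSERT string only shows where the values would go.

```python
from datetime import datetime, timezone
from typing import List, Sequence, Tuple

# Parameterised CQL a driver could prepare; shown for illustration only.
INSERT_TOP_WORD = "INSERT INTO top_words (date_hour, position, word) VALUES (?, ?, ?)"


def hour_bucket(ts: datetime) -> datetime:
    """Truncates a timestamp to the start of its hour (the partition key)."""
    return ts.replace(minute=0, second=0, microsecond=0)


def top_words_rows(ts: datetime, ranked_words: Sequence[str]) -> List[Tuple[datetime, int, str]]:
    """Builds (date_hour, position, word) rows; position 1 is the most popular word."""
    bucket = hour_bucket(ts)
    return [(bucket, position, word) for position, word in enumerate(ranked_words, start=1)]


if __name__ == "__main__":
    now = datetime(2014, 1, 15, 10, 42, 7, tzinfo=timezone.utc)
    for row in top_words_rows(now, ["the", "hat"]):
        print(INSERT_TOP_WORD, row)
```

Because position is the clustering column, reading one date_hour partition returns the words already in rank order.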
Use Case
• https://github.com/nathanmarz/storm-starter/
• Use RollingTopWords.java as the base
• Integrate CassandraWriterBolt into the use case
• Add a spout for input messages
• Add a bolt for persisting messages and writing the top N words
• Reference: http://www.michael-noll.com/blog/2013/01/18/implementing-real-time-trending-topics-in-storm/
Storm: Conclusion
• Powerful architecture
• Lots of potential as an Apache project
• Nice abstractions to simplify development (Trident)
• Great for operating on high-velocity, high-volume streams
• Not prohibitively difficult to integrate with other systems for input and output
• Lots of people experimenting with it!
Spark & Shark
Lightning fast cluster computing
Apache Spark
• Claims up to 100x faster than Hadoop MapReduce for in-memory workloads
• Faster in-memory MapReduce-style operations
• Integration with Cassandra either via:
  • https://github.com/tuplejump/calliope-release
  • Or via Cassandra's Hadoop support
• Combines SQL, streaming and complex analytics
Apache Spark
• Can read from and write to Cassandra…
• Reading from a CF / table into an RDD via Calliope (Scala):
// Assumes sc is a SparkContext with Calliope's implicits in scope.
val cas = CasBuilder.cql3.withColumnFamily("casDemo", "Words")
  .where("book = 'The Three Musketeers'")
val rdd = sc.cql3Cassandra[Map[String, String], Map[String, String]](cas)
* The where clause can use the partition key or a secondary index; CasBuilder also supports paging.
Shark
• With Spark we can achieve very fast in-memory queries on subsets of data in Cassandra
• Effectively all the features of Hive, running on RDDs rather than on MapReduce over HDFS
• Uses HiveQL queries
• Includes machine learning algorithms out of the box
• CqlStorageHandler provided to read RDDs from Cassandra, or read SSTables directly:
  • https://github.com/richardalow/cassowary
Spark / Shark: Conclusion
• Need resource isolation if running directly on Cassandra nodes
  • Otherwise you are dealing with higher latency, but not affecting cluster resources
• Impressive possibilities for machine learning algorithms as well as more basic Hive queries
• Introduces possibilities for JOINs on hot data!
REDIS
What is it?
“Redis is an open source, BSD licensed, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets.”
Synergy?
• Good for…
  • Sorting sets & lists
  • Pub/sub messaging
  • (More) accurate counters
  • Merging sets
  • Transactions!
• Works in memory, can serve data fast based on key
• Good for runtime storage of aggregate data
• Could use shared resources on Cassandra nodes (could populate most recent data via triggers, which is naughty)
Elastic Search
Distributed real-time search engine based
What is it?
• Distributed real-time search engine
• Built from the ground up for reliability and scalability
• Supports lots of other features as well as free-text search:
  • Spatial
  • Query by arbitrary fields
  • Facets
• Multi-lingual query support
Synergy?
• Although external to Cassandra, it can provide rich query capabilities over the same data
• Simplify data models in Cassandra to maximise storage
• Separate read and write workloads (read from ES, write to Cassandra)
• Some integration with Storm for writing records to Elasticsearch and Cassandra as data enters the system
• Again… spatial!
Hadoop
Batch Analytics
What is it?
• Open source under the Apache License 2.0
• Top-level Apache project
• Runs on commodity hardware
• Used for storage and large-scale processing of data sets
• Lots of complementary tools… Impala, Mahout, etc.
Some terms…
• HDFS – a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster
• Hadoop MapReduce – a programming model for large-scale data processing
• Hive – a SQL-like abstraction over MapReduce jobs
• Pig – a procedural-style language for expressing MapReduce jobs
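The MapReduce programming model named above can be illustrated without Hadoop. The following is a single-process Python sketch of the three phases (map, shuffle, reduce) applied to a word count; the function names are hypothetical and nothing here is Hadoop's API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(record: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word, 1) for word in record.lower().split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework would between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def map_reduce(records: Iterable[str]) -> Dict[str, int]:
    """Runs map over every record, shuffles the pairs, then reduces each group."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(map_reduce(["cassandra and hadoop", "hadoop and hive"]))
```

On a cluster the map and reduce calls run on many machines and the shuffle moves data over the network; Hive and Pig generate jobs of this shape from higher-level queries and scripts.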
Synergy?
• Multiple ways to use it with Cassandra
• DataStax Enterprise supports Hadoop on top of a Cassandra File System
  • Replication managed in-cluster (efficient)
  • Full Hadoop toolset available
• Some Hadoop support in the vanilla distribution
  • Limited support for efficient querying
Questions?

Cassandra synergy

  • 2. Agenda  What do we mean by synergy?  Storm  Shark / Spark  Redis  ElasticSearch  Hadoop
  • 3. What do we mean by Synergy?  synergy  1. The interaction of two or more agents or forces so that their combined effect is greater than the sum of their individual effects.
  • 4. What do we mean by Synergy?  Cassandra is excellent for: fast read or write performance; scalable, runs on commodity hardware; reliable cross-DC replication; robust persistence for high volume data  Needs some special sauce for: real-time calculations for high volume streams; complex search functions (free-text etc.); MapReduce on RDDs
  • 6. Storm  Open sourced by Twitter in 2011  Distributed event processor  Operates on unbounded streams of tuples  Getting started in Apache Incubator  Can persist to and read from C*  Great for high volume, real time (complex) calculations on streamed data
  • 8. Storm  Is a CEP architecture  Spout – collects and submits tuples for processing  Bolt – processes tuples and emits new tuples  Tuple – a collection of data passed through Storm  Stream – identifies outputs from a spout / bolt and enforces tuple structure  Uses ZooKeeper for coordination and ZeroMQ for message passing
  • 10. Synergy?  Can use Cassandra as the input data source  Can write tuples into Cassandra  Example project here…  https://github.com/tjake/stormscraper/  See CassandraWriterBolt.java for simple example of a Java Driver CQL based bolt that writes to Cassandra.  Good as an example application, but not production ready
  • 11. Use Case  Top N words for popularity tracking  Input: a constant stream of messages into the system  Count occurrences of each word in a message  Store raw messages in Cassandra  Use a bolt to break up messages and maintain sorted list of top N words  Persist the Top N words and their counts periodically in Cassandra
  • 12. Use Case
      -- Raw messages, partitioned by hour and ordered by arrival within the hour
      CREATE TABLE messages (date_hour TIMESTAMP, message_id TIMEUUID, message VARCHAR,
        PRIMARY KEY (date_hour, message_id));
      -- Top N words for each hour, ordered by rank
      CREATE TABLE top_words (date_hour TIMESTAMP, position INT, word VARCHAR,
        PRIMARY KEY (date_hour, position));
  • 13. Use Case  https://github.com/nathanmarz/storm-starter/  Use RollingTopWords.java as base  Integrate CassandraWriterBolt into use case  Add spout for input messages  Add bolt for persisting messages & writing Top N words  Reference: http://www.michael-noll.com/blog/2013/01/18/implementing-real-time-trending-topics-in-storm/
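
The use case above can be sketched without a Storm cluster. The following is a minimal, plain-Python illustration, not the storm-starter implementation: the function names (`date_hour`, `split_words`, `top_n_rows`) are hypothetical, the two functions only stand in for bolts, and the emitted rows follow the shape of the `top_words` table with the count added as an extra field.

```python
from collections import Counter
from datetime import datetime
import re


def date_hour(ts):
    """Truncate a timestamp to the hour, mirroring the date_hour partition key."""
    return ts.replace(minute=0, second=0, microsecond=0)


def split_words(message):
    """Stand-in for a splitter bolt: break a message into lowercase words."""
    return re.findall(r"[a-z0-9']+", message.lower())


def top_n_rows(messages, n):
    """Stand-in for a ranking bolt.

    Takes (timestamp, text) pairs and returns (date_hour, position, word, count)
    rows, with position starting at 1 within each hour.
    """
    counts = {}
    for ts, text in messages:
        counts.setdefault(date_hour(ts), Counter()).update(split_words(text))

    rows = []
    for bucket in sorted(counts):
        # Highest count first; ties are broken alphabetically so output is stable.
        ranked = sorted(counts[bucket].items(), key=lambda kv: (-kv[1], kv[0]))
        for position, (word, count) in enumerate(ranked[:n], start=1):
            rows.append((bucket, position, word, count))
    return rows
```

In a real topology the counts would be kept in a rolling window and the rows written periodically by a Cassandra-writing bolt; here everything is computed in one pass to keep the sketch short.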
  • 14. Storm: Conclusion  Powerful Architecture  Lots of potential as an Apache project  Nice abstractions to simplify development (Trident)  Great for operating on high velocity, high volume streams  Not prohibitively difficult to integrate with other systems for input and output  Lots of people experimenting with it!
  • 15. Spark & Shark Lightning fast cluster computing
  • 16. Apache Spark  Up to 100x faster than Hadoop MapReduce for in-memory workloads, according to the project's own benchmarks  Faster in-memory MapReduce-style operations  Integration with Cassandra either via:  https://github.com/tuplejump/calliope-release  Or via Cassandra’s Hadoop support  Combines SQL, Streaming and Complex Analytics
  • 17. Apache Spark  Can read and write to Cassandra…  Reading from CF / Table into RDD via Calliope (Scala):
      val cas = CasBuilder.cql3.withColumnFamily("casDemo", "Words").where("book = 'The Three Musketeers'")
      val rdd = sc.cql3Cassandra[Map[String, String], Map[String, String]](cas)
      * The where clause can use the partition key or a secondary index; CasBuilder also supports paging
  • 18. Shark  With Spark we can achieve super fast in-memory queries on subsets of data in Cassandra  Effectively all the features of Hive running on RDD not HDFS  Uses HiveQL queries  Includes machine learning algorithms out of the box  CqlStorageHandler provided to read RDD from Cassandra or read SSTables directly  https://github.com/richardalow/cassowary
  • 19. Spark / Shark: Conclusion  Need resource isolation if running directly on Cassandra nodes  Otherwise dealing with higher latency but not affecting cluster resources  Impressive possibilities for machine learning algorithms as well as more basic Hive queries  Introduces possibilities for JOINs on hot data!
  • 20. REDIS
  • 21. What is it? “Redis is an open source, BSD licensed, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets.”
  • 22. Synergy?  Good for: sorting sets & lists; pub/sub messaging; (more) accurate counters; merging sets; transactions!  Works in memory, can serve data fast based on key  Good for runtime storage of aggregate data  Could use shared resources on Cassandra nodes (could populate most recent data via triggers (naughty))
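
To make the counter and sorted-set points concrete, here is a small in-memory stand-in for a single Redis sorted set. It is not a Redis client: the class name `SortedSetSketch` is hypothetical, and the two methods only imitate the behaviour of the real `ZINCRBY` and `ZREVRANGE` commands (inclusive stop index, ties ordered in reverse lexicographical order).

```python
class SortedSetSketch:
    """In-memory imitation of one Redis sorted set, for illustration only."""

    def __init__(self):
        self._scores = {}  # member -> score

    def zincrby(self, amount, member):
        """Add amount to the member's score (starting from 0) and return the new score."""
        self._scores[member] = self._scores.get(member, 0) + amount
        return self._scores[member]

    def zrevrange(self, start, stop):
        """Return members from highest to lowest score.

        As in Redis, stop is inclusive and -1 means "through the last element".
        """
        ranked = sorted(
            self._scores,
            key=lambda member: (self._scores[member], member),
            reverse=True,
        )
        end = None if stop == -1 else stop + 1
        return ranked[start:end]
```

A running "top words" aggregate then reduces to one `zincrby` per word seen and one `zrevrange(0, n - 1)` per read, which is the kind of runtime aggregate the slide suggests keeping out of Cassandra.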
  • 24. What is it? (ElasticSearch)  Distributed real-time search engine  Built from the ground up for reliability and scalability  Supports lots of other features as well as free-text search  Spatial  Query by arbitrary fields  Facets  Multi-lingual query support
  • 25. Synergy?  Although external to Cassandra it can provide rich query capabilities over the same data  Simplify data models in Cassandra to maximise storage  Separate read and write workloads (read from ES, write to Cassandra)  Some integration with Storm for writing records to both ElasticSearch and Cassandra as data enters the system  Again… Spatial!
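
The "write once, search elsewhere" idea can be sketched in a few lines. This is an assumption-laden toy, not ElasticSearch or Cassandra code: `DualWriteSketch` is a hypothetical name, a dict stands in for the Cassandra table and a term-to-ids map stands in for the search index.

```python
from collections import defaultdict


class DualWriteSketch:
    """Toy model of writing each record to a key-value store and a search index."""

    def __init__(self):
        self.rows = {}                 # stands in for a Cassandra table keyed by id
        self.index = defaultdict(set)  # stands in for a search index: term -> ids

    def write(self, doc_id, text):
        """Store the record by key and index each of its terms.

        Note: rewriting an existing id does not remove its old terms here.
        """
        self.rows[doc_id] = text
        for term in text.lower().split():
            self.index[term].add(doc_id)

    def search(self, query):
        """Return the ids of records that contain every term in the query."""
        terms = query.lower().split()
        if not terms:
            return set()
        return set.intersection(*(self.index.get(term, set()) for term in terms))
```

Reads by key go to `rows`, free-text reads go to `search`, which is the split of workloads the slide describes.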
  • 27. What is it? (Hadoop)  Open Source under Apache License 2.0  Top Level Apache project  Runs on commodity hardware  Used for storage and large scale processing of data-sets  Lots of complementary tools… Impala, Mahout etc.
  • 28. Some terms…  HDFS - a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster.  Hadoop MapReduce - a programming model for large scale data processing.  Hive - an SQL-like abstraction for MapReduce jobs  Pig - a procedural-style language for expressing MapReduce jobs
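
As a reminder of what the MapReduce programming model looks like, here is the classic word count written as three plain-Python phases. This is a single-process sketch with hypothetical function names, not Hadoop API code; on a cluster the map and reduce phases would run in parallel across machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word, 1) for word in line.lower().split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Hive and Pig, mentioned above, generate jobs of this shape from higher-level queries and scripts.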
  • 29. Synergy?  Multiple ways to use it with Cassandra  DataStax Enterprise supports Hadoop on top of a Cassandra File System  Replication managed in-cluster (efficient)  Full Hadoop toolset available  Some Hadoop support in vanilla distribution.  Limited support for efficient querying