Stratio Meta 
An efficient distributed datahub with batch and 
streaming query capabilities 
Daniel Higuero 
Alvaro Agea 
dhiguero@stratio.com 
alvaro@stratio.com 
#CassandraSummit 2014
Stratio Crossdata 
An efficient distributed datahub with batch and 
streaming query capabilities 
Daniel Higuero 
Alvaro Agea 
dhiguero@stratio.com 
alvaro@stratio.com 
Who are we? 
STRATIO 
• Stratio is a Big Data company
• Founded in 2013
• Commercially launched in 2014
• 50+ employees in Madrid
• Office in San Francisco
• Certified Spark distribution
We love… 
Cassandra 
• P2P architecture
• Read/write performance
• Fault tolerance
• Easy to deploy
• CQL
Agenda
• Introduction
• Crossdata architecture
• Metadata management
• Streaming sources
• Full text search
• Spark and Crossdata
• ODBC
• The future
Introduction 
o Big Data analysis is commonly associated with batch processing
• Users aiming to combine batch and stream processing have to rely on tailor-made architectures
o Users buy Big Data platforms, but
• How do I start?
• What is my entry point to the platform?
What do our clients demand?
o Easy deployment
o Easy administration
o Read/write performance
o Easy-to-learn query language
o Integration with BI tools
o Join operations
o Support for streaming sources
o Integration with other data stores
o Ability to query data without thinking about the schema (non-indexed data)
What do our clients demand?
✓ Easy deployment
✓ Easy administration
✓ Read/write performance
✓ Easy-to-learn query language
o Integration with BI tools
o Join operations
o Support for streaming sources
o Integration with other data stores
o Ability to query data without thinking about the schema (non-indexed data)
What do our clients demand?
✓ Easy deployment
✓ Easy administration
✓ Read/write performance
✓ Easy-to-learn query language
✓ Integration with BI tools
✓ Join operations
✓ Support for streaming sources
✓ Integration with other data stores
✓ Ability to query data without thinking about the schema (non-indexed data)
Crossdata 
o A new technology that:
• Is not limited by the underlying datastore capabilities
• Leverages Spark to perform non-natively supported operations
• Supports batch and streaming queries
• Supports multiple clusters and technologies
Our architecture 
Connecting to the outside world 
o Crossdata defines an IConnector extension interface
o Users can easily add new connectors to support
• Different datastores
• Different processing engines
• Different versions
o Each connector defines its capabilities
Our planner will choose the best connector for each query
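The idea that each connector declares its capabilities and the planner picks one per query can be sketched as follows. This is a minimal illustration in Python, not Crossdata's actual IConnector API: the `Connector` class, the capability names, and the `cost` field used for ranking are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Connector:
    """Hypothetical connector description: a name, the operations it
    supports, and a relative cost used to rank candidates."""
    name: str
    capabilities: FrozenSet[str]
    cost: int


def choose_connector(required: Iterable[str],
                     connectors: Iterable[Connector]) -> Optional[Connector]:
    """Return the cheapest connector that supports every required operation,
    or None when no connector qualifies."""
    needed = frozenset(required)
    candidates = [c for c in connectors if needed <= c.capabilities]
    return min(candidates, key=lambda c: c.cost, default=None)


# Illustrative registry: a native connector for simple operations and a
# Spark-backed one that also handles joins and aggregations.
CONNECTORS = [
    Connector("native-cassandra", frozenset({"select", "filter_pk"}), cost=1),
    Connector("spark-cassandra",
              frozenset({"select", "filter_pk", "join", "group_by"}), cost=10),
]
```

With this registry, a plain select resolves to the native connector, while a query that needs a join falls through to the Spark-backed one. A real planner would likely weigh more than a single static cost.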
Query execution 
[Diagram: pipeline of Parsing, Validation, Planning, and Execution stages, followed by Connector1, Connector2, and Connector3, and a C* datastore]
Our planner will choose the best connector for each query
Multi-cluster support 
o Stratio Crossdata offers the possibility of accessing a single catalog across a set of datastores.
• Multiple clusters can coexist to optimize platform performance
- E.g., production cluster, test cluster, write-optimized cluster, read-optimized cluster, etc.
• Each table is stored in a single datastore
Logical and physical mapping 
SELECT * FROM app.users;
[Diagram labels: App catalog; Users table, Test table, old_users table; C* production, C* development, Other datastores]
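One way to picture the logical-to-physical mapping is a catalog that records which cluster stores each table, so that a query on `app.users` resolves to one datastore. The sketch below is only an illustration: the `Catalog` class is hypothetical, and the cluster assigned to each table is invented for the example, since the slide does not state which table lives where.

```python
class Catalog:
    """Hypothetical logical catalog: maps each table name to the single
    cluster that physically stores it."""

    def __init__(self, name):
        self.name = name
        self._table_to_cluster = {}

    def attach(self, table, cluster):
        # A table is stored in exactly one datastore, so re-attaching it
        # to a different cluster is rejected.
        current = self._table_to_cluster.get(table)
        if current is not None and current != cluster:
            raise ValueError(f"{self.name}.{table} already stored in {current}")
        self._table_to_cluster[table] = cluster

    def resolve(self, qualified_name):
        """Return the cluster for a name such as 'app.users'."""
        catalog, _, table = qualified_name.partition(".")
        if catalog != self.name:
            raise KeyError(f"unknown catalog: {catalog}")
        return self._table_to_cluster[table]


app = Catalog("app")
# Placement below is assumed for illustration only.
app.attach("users", "cassandra-production")
app.attach("test", "cassandra-development")
app.attach("old_users", "other-datastore")
```

The client only ever names `app.users`; where the rows physically live is a lookup the platform performs on its behalf.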
Metadata Management
Metadata in the era of Schemaless NoSQL datastores 
o Some datastores are schemaless but our applications are not!
• Flexible schemas vs schemaless
• Crossdata provides a Metadata manager that stores schemas for any datasource
- Remember ODBC and those BI tools
Metadata management 
[Diagram labels: Metadata Manager, Metadata Store, Infinispan, Connector, C* production]
o If the connector does not support metadata operations, those are skipped
o Updated metadata information is maintained among Crossdata servers using Infinispan
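The behavior described here, where schemas are always kept by the manager and metadata operations are skipped for connectors that cannot handle them, might look roughly like the sketch below. All class and method names are hypothetical, and a plain dictionary stands in for the Infinispan-backed store; nothing here reflects Crossdata's real classes.

```python
class MetadataManager:
    """Hypothetical manager: keeps schemas in its own store and forwards
    metadata operations only to connectors that support them."""

    def __init__(self):
        # A plain dict stands in for the replicated metadata store.
        self.store = {}

    def create_table(self, table, schema, connector):
        self.store[table] = dict(schema)
        if connector.supports_metadata:
            connector.create_table(table, schema)
            return "forwarded"
        return "skipped"


class RecordingConnector:
    """Test double that records the tables it was asked to create."""

    def __init__(self, supports_metadata):
        self.supports_metadata = supports_metadata
        self.created = []

    def create_table(self, table, schema):
        self.created.append(table)
```

Because the schema is recorded before the connector is consulted, a schemaless datastore still presents a schema to clients such as BI tools.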
Streaming sources 
Managing streaming sources 
o Today's use cases expect some type of streaming datasource
• Streaming data has an ephemeral nature
• In Stratio Crossdata we defined the ephemeral table abstraction to work with streaming sources as classical RDBMS tables
[Diagram: a streaming source described by {schema:{col1:…},…} is exposed as a table with columns col1:text, col2:int, col3:int, col4:text, read by Streaming_query0 … Streaming_queryn]
Streaming queries 
o Streaming queries are infinite by definition
• A time window is defined to create a batch-like view of the rows ingested by the system in that period
• The user launches queries specifying a processing time window
- Crossdata provides methods to list and stop running streaming queries
Streaming queries: windows syntax 
SELECT fieldGroup, avg(field2)
FROM eph_table 
WITH WINDOW 5 minutes 
WHERE field1=100 AND field2>100 
GROUP BY fieldGroup;
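To make the window semantics concrete, the sketch below buckets incoming rows into fixed 5-minute windows and evaluates the query above on each bucket. It assumes tumbling (non-overlapping) windows and rows carrying a `ts` ingestion timestamp in seconds; the slides do not say whether windows tumble or slide, and the row layout and function names are invented for the example.

```python
from collections import defaultdict

WINDOW_SECONDS = 5 * 60  # WITH WINDOW 5 minutes


def split_into_windows(rows, window_seconds=WINDOW_SECONDS):
    """Group rows into tumbling windows keyed by window start time.
    Each row is a dict with a 'ts' field (seconds); this layout is assumed."""
    windows = defaultdict(list)
    for row in rows:
        start = (row["ts"] // window_seconds) * window_seconds
        windows[start].append(row)
    return dict(windows)


def run_query(window_rows):
    """Evaluate the slide's query on the rows of a single window:
    filter, group by fieldGroup, then average field2 per group."""
    groups = defaultdict(list)
    for row in window_rows:
        if row["field1"] == 100 and row["field2"] > 100:
            groups[row["fieldGroup"]].append(row["field2"])
    return {group: sum(vals) / len(vals) for group, vals in groups.items()}


rows = [
    {"ts": 10, "fieldGroup": "a", "field1": 100, "field2": 150},
    {"ts": 20, "fieldGroup": "a", "field1": 100, "field2": 250},
    {"ts": 30, "fieldGroup": "b", "field1": 100, "field2": 50},    # fails field2 > 100
    {"ts": 310, "fieldGroup": "a", "field1": 100, "field2": 300},  # next window
]
results = {start: run_query(win) for start, win in split_into_windows(rows).items()}
```

Each window yields its own finite result set, which is what lets an otherwise infinite streaming query be treated like a sequence of batch queries.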
Joining batch and streaming 
SELECT * FROM demo.temporal
WITH WINDOW 10 secs
INNER JOIN demo.users
ON users.name = temporal.name;
[Diagram: the query split into three parts]
o SELECT * FROM demo.temporal WITH WINDOW 10 secs
o SELECT * FROM demo.users
o INNER JOIN ON users.name = temporal.name
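The three parts of this query can be pictured as: collect one window of rows from the streaming table, read the batch table, and join the two on `name`. The sketch below shows that last step as a simple hash join. The function name, the sample rows, and the column prefixing are assumptions for illustration, not how Crossdata or Spark actually executes the join.

```python
def inner_join_on_name(window_rows, user_rows):
    """Hash join: index the batch table by name, then probe it with each
    row from the current streaming window."""
    users_by_name = {}
    for user in user_rows:
        users_by_name.setdefault(user["name"], []).append(user)

    joined = []
    for event in window_rows:
        for user in users_by_name.get(event["name"], []):
            # Prefix columns with their table name to avoid collisions.
            merged = {f"temporal.{k}": v for k, v in event.items()}
            merged.update({f"users.{k}": v for k, v in user.items()})
            joined.append(merged)
    return joined


# Illustrative data: one 10-second window of demo.temporal and the
# full contents of demo.users.
window = [{"name": "ana", "clicks": 3}, {"name": "bob", "clicks": 1}]
users = [{"name": "ana", "email": "ana@example.com"}]
joined = inner_join_on_name(window, users)
```

Streaming rows with no matching user, such as `bob` here, are dropped, as expected for an inner join.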
Full text search 
Full text search with Lucene
o Clients request the ability to perform full text searches
o We have developed an integration between Lucene and Cassandra
o C* users can now enjoy all Lucene features:
• Full text searches, range queries, fuzzy queries…
https://github.com/Stratio/stratio-cassandra
Stratio Lucene 2i 
[Diagram: five C* nodes, each with its own Lucene index]
Full text search queries 
o With Crossdata, we simplify:
• The creation syntax
• The query syntax using the match operator
CREATE FULLTEXT INDEX ON app.users(name,email);
SELECT * FROM app.users
WHERE email MATCH '*@stratio.com';
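As a rough illustration of what the wildcard pattern in the query above selects, the sketch below filters rows with Python's `fnmatchcase`. This is only a stand-in: the real MATCH operator is backed by the Lucene index and its matching rules may differ, and the `match_email` helper and sample rows are invented for the example.

```python
from fnmatch import fnmatchcase


def match_email(rows, pattern):
    """Return rows whose 'email' column matches a shell-style wildcard
    pattern. fnmatchcase is a stand-in for the MATCH operator here."""
    return [row for row in rows if fnmatchcase(row["email"], pattern)]


users = [
    {"name": "Daniel", "email": "dhiguero@stratio.com"},
    {"name": "Alvaro", "email": "alvaro@stratio.com"},
    {"name": "Other", "email": "someone@example.org"},
]
matches = match_email(users, "*@stratio.com")
```

Unlike this linear scan, an index-backed search avoids reading every row, which is the point of pushing the pattern down to the datastore.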
Spark & Stratio Crossdata
Why Spark? 
o Stratio Crossdata uses Spark to perform non-natively supported operations
o Spark brings several benefits over Hadoop
o In-memory processing
o RDD abstraction
o Simpler API
o Increased flexibility (e.g., no need for identity mapping)
What about Spark SQL? 
o Different approach to query execution
• We only use Spark when it speeds up queries
- Native drivers are faster for simple queries
- Spark SQL has limited RDD sources
• Avoid some Spark limitations
• Several batch and streaming contexts in a single JVM (SPARK-2243)
Query approach 
[Diagram comparing two stacks]
o SparkSQL approach: SparkSQL, then Spark, then Cassandra
o Crossdata approach: Stratio Crossdata, then either Spark or the native driver, then Cassandra
Our Cassandra-Spark integration 
o Project started in June 2013
- With the objective of providing a method to interact with Cassandra from Spark
- Initial approach based on the HadoopInputFormat interface
- Current version uses the native Datastax Java driver
https://github.com/Stratio/stratio-deep
Our Cassandra-Spark integration 
o Benchmark in progress comparing our solution with the Datastax Spark driver
• Results highly influenced by the split size
• Initial results are promising for Stratio Spark Integration using Datastax default values
• Group by: up to 40% faster
• Join: up to 17% faster
• Stay tuned for the benchmark publication!
Spark vs Lucene 2i 
[Chart: time versus records per node, comparing Spark and Lucene 2i]
ODBC 
Stratio Crossdata ODBC 
o Well-known interface standard (for BI tools, external apps, …)
o We have implemented it for Crossdata using the Simba SDK
o ODBC opens the full potential of Stratio Crossdata to the external world
o Currently tested with Tableau, Qlikview and MS Excel
One ODBC for all datastores!
The future 
o Security
o Query optimizer and smart query planner
o Leverage system statistics
o Support for UDFs
o Become an Apache project
https://github.com/Stratio/stratio-meta
We are looking for an Apache Champion 
Can you help us?
A wish list for Cassandra 
o Ability to stop running queries
o Interactive users are unpredictable
o Some exception paths are not clear or defined (e.g., secondary indexes)
o Distribute some of the operations currently performed on the coordinator
• E.g., aggregations like count(*)

More Related Content

What's hot

BI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache CassandraBI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache CassandraVictor Coustenoble
 
Managing your black friday logs Voxxed Luxembourg
Managing your black friday logs Voxxed LuxembourgManaging your black friday logs Voxxed Luxembourg
Managing your black friday logs Voxxed LuxembourgDavid Pilato
 
Spark Cassandra Connector Dataframes
Spark Cassandra Connector DataframesSpark Cassandra Connector Dataframes
Spark Cassandra Connector DataframesRussell Spitzer
 
Introduction to Real-Time Analytics with Cassandra and Hadoop
Introduction to Real-Time Analytics with Cassandra and HadoopIntroduction to Real-Time Analytics with Cassandra and Hadoop
Introduction to Real-Time Analytics with Cassandra and HadoopPatricia Gorla
 
HBaseCon 2015: S2Graph - A Large-scale Graph Database with HBase
HBaseCon 2015: S2Graph - A Large-scale Graph Database with HBaseHBaseCon 2015: S2Graph - A Large-scale Graph Database with HBase
HBaseCon 2015: S2Graph - A Large-scale Graph Database with HBaseHBaseCon
 
Strata London 16: sightseeing, venues, and friends
Strata  London 16: sightseeing, venues, and friendsStrata  London 16: sightseeing, venues, and friends
Strata London 16: sightseeing, venues, and friendsNatalino Busa
 
Hadoop + Cassandra: Fast queries on data lakes, and wikipedia search tutorial.
Hadoop + Cassandra: Fast queries on data lakes, and  wikipedia search tutorial.Hadoop + Cassandra: Fast queries on data lakes, and  wikipedia search tutorial.
Hadoop + Cassandra: Fast queries on data lakes, and wikipedia search tutorial.Natalino Busa
 
Spark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational DataSpark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational DataVictor Coustenoble
 
Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...
Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...
Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...Sumeet Singh
 
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...Brian O'Neill
 
Cassandra Meetup: Real-time Analytics using Cassandra, Spark and Shark at Ooyala
Cassandra Meetup: Real-time Analytics using Cassandra, Spark and Shark at OoyalaCassandra Meetup: Real-time Analytics using Cassandra, Spark and Shark at Ooyala
Cassandra Meetup: Real-time Analytics using Cassandra, Spark and Shark at OoyalaDataStax Academy
 
Feeding Cassandra with Spark-Streaming and Kafka
Feeding Cassandra with Spark-Streaming and KafkaFeeding Cassandra with Spark-Streaming and Kafka
Feeding Cassandra with Spark-Streaming and KafkaDataStax Academy
 
DataStax and Esri: Geotemporal IoT Search and Analytics
DataStax and Esri: Geotemporal IoT Search and AnalyticsDataStax and Esri: Geotemporal IoT Search and Analytics
DataStax and Esri: Geotemporal IoT Search and AnalyticsDataStax Academy
 
Cassandra + Spark (You’ve got the lighter, let’s start a fire)
Cassandra + Spark (You’ve got the lighter, let’s start a fire)Cassandra + Spark (You’ve got the lighter, let’s start a fire)
Cassandra + Spark (You’ve got the lighter, let’s start a fire)Robert Stupp
 
Analyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraAnalyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraPatrick McFadin
 
Cascading introduction
Cascading introductionCascading introduction
Cascading introductionAlex Su
 
Real time data processing with spark & cassandra @ NoSQLMatters 2015 Paris
Real time data processing with spark & cassandra @ NoSQLMatters 2015 ParisReal time data processing with spark & cassandra @ NoSQLMatters 2015 Paris
Real time data processing with spark & cassandra @ NoSQLMatters 2015 ParisDuyhai Doan
 
Apache Cassandra for Timeseries- and Graph-Data
Apache Cassandra for Timeseries- and Graph-DataApache Cassandra for Timeseries- and Graph-Data
Apache Cassandra for Timeseries- and Graph-DataGuido Schmutz
 

What's hot (20)

BI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache CassandraBI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache Cassandra
 
Managing your black friday logs Voxxed Luxembourg
Managing your black friday logs Voxxed LuxembourgManaging your black friday logs Voxxed Luxembourg
Managing your black friday logs Voxxed Luxembourg
 
Spark Cassandra Connector Dataframes
Spark Cassandra Connector DataframesSpark Cassandra Connector Dataframes
Spark Cassandra Connector Dataframes
 
Introduction to Real-Time Analytics with Cassandra and Hadoop
Introduction to Real-Time Analytics with Cassandra and HadoopIntroduction to Real-Time Analytics with Cassandra and Hadoop
Introduction to Real-Time Analytics with Cassandra and Hadoop
 
HBaseCon 2015: S2Graph - A Large-scale Graph Database with HBase
HBaseCon 2015: S2Graph - A Large-scale Graph Database with HBaseHBaseCon 2015: S2Graph - A Large-scale Graph Database with HBase
HBaseCon 2015: S2Graph - A Large-scale Graph Database with HBase
 
Strata London 16: sightseeing, venues, and friends
Strata  London 16: sightseeing, venues, and friendsStrata  London 16: sightseeing, venues, and friends
Strata London 16: sightseeing, venues, and friends
 
Hadoop + Cassandra: Fast queries on data lakes, and wikipedia search tutorial.
Hadoop + Cassandra: Fast queries on data lakes, and  wikipedia search tutorial.Hadoop + Cassandra: Fast queries on data lakes, and  wikipedia search tutorial.
Hadoop + Cassandra: Fast queries on data lakes, and wikipedia search tutorial.
 
Spark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational DataSpark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational Data
 
Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...
Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...
Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...
 
Cassandra & Spark for IoT
Cassandra & Spark for IoTCassandra & Spark for IoT
Cassandra & Spark for IoT
 
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
 
Cassandra Meetup: Real-time Analytics using Cassandra, Spark and Shark at Ooyala
Cassandra Meetup: Real-time Analytics using Cassandra, Spark and Shark at OoyalaCassandra Meetup: Real-time Analytics using Cassandra, Spark and Shark at Ooyala
Cassandra Meetup: Real-time Analytics using Cassandra, Spark and Shark at Ooyala
 
Feeding Cassandra with Spark-Streaming and Kafka
Feeding Cassandra with Spark-Streaming and KafkaFeeding Cassandra with Spark-Streaming and Kafka
Feeding Cassandra with Spark-Streaming and Kafka
 
DataStax and Esri: Geotemporal IoT Search and Analytics
DataStax and Esri: Geotemporal IoT Search and AnalyticsDataStax and Esri: Geotemporal IoT Search and Analytics
DataStax and Esri: Geotemporal IoT Search and Analytics
 
Cassandra + Spark (You’ve got the lighter, let’s start a fire)
Cassandra + Spark (You’ve got the lighter, let’s start a fire)Cassandra + Spark (You’ve got the lighter, let’s start a fire)
Cassandra + Spark (You’ve got the lighter, let’s start a fire)
 
Analyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraAnalyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and Cassandra
 
Cascading introduction
Cascading introductionCascading introduction
Cascading introduction
 
Real time data processing with spark & cassandra @ NoSQLMatters 2015 Paris
Real time data processing with spark & cassandra @ NoSQLMatters 2015 ParisReal time data processing with spark & cassandra @ NoSQLMatters 2015 Paris
Real time data processing with spark & cassandra @ NoSQLMatters 2015 Paris
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
Apache Cassandra for Timeseries- and Graph-Data
Apache Cassandra for Timeseries- and Graph-DataApache Cassandra for Timeseries- and Graph-Data
Apache Cassandra for Timeseries- and Graph-Data
 

Viewers also liked

Storm – Streaming Data Analytics at Scale - StampedeCon 2014
Storm – Streaming Data Analytics at Scale - StampedeCon 2014Storm – Streaming Data Analytics at Scale - StampedeCon 2014
Storm – Streaming Data Analytics at Scale - StampedeCon 2014StampedeCon
 
Introduction to Streaming Analytics
Introduction to Streaming AnalyticsIntroduction to Streaming Analytics
Introduction to Streaming AnalyticsGuido Schmutz
 
Big Data Architectures
Big Data ArchitecturesBig Data Architectures
Big Data ArchitecturesGuido Schmutz
 
Introduction to Streaming Analytics
Introduction to Streaming AnalyticsIntroduction to Streaming Analytics
Introduction to Streaming AnalyticsGuido Schmutz
 

Viewers also liked (6)

Big Data Technology
Big Data TechnologyBig Data Technology
Big Data Technology
 
Storm – Streaming Data Analytics at Scale - StampedeCon 2014
Storm – Streaming Data Analytics at Scale - StampedeCon 2014Storm – Streaming Data Analytics at Scale - StampedeCon 2014
Storm – Streaming Data Analytics at Scale - StampedeCon 2014
 
Introduction to Streaming Analytics
Introduction to Streaming AnalyticsIntroduction to Streaming Analytics
Introduction to Streaming Analytics
 
Big Data Architectures
Big Data ArchitecturesBig Data Architectures
Big Data Architectures
 
Importance of Big Data Analytics
Importance of Big Data AnalyticsImportance of Big Data Analytics
Importance of Big Data Analytics
 
Introduction to Streaming Analytics
Introduction to Streaming AnalyticsIntroduction to Streaming Analytics
Introduction to Streaming Analytics
 

Similar to Cassandra Summit 2014: META — An Efficient Distributed Data Hub with Batch and Streaming Query Capabilities

Cassandra Summit 2014: Apache Cassandra at Telefonica CBS
Cassandra Summit 2014: Apache Cassandra at Telefonica CBSCassandra Summit 2014: Apache Cassandra at Telefonica CBS
Cassandra Summit 2014: Apache Cassandra at Telefonica CBSDataStax Academy
 
.Net development with Azure Machine Learning (AzureML) Nov 2014
.Net development with Azure Machine Learning (AzureML) Nov 2014.Net development with Azure Machine Learning (AzureML) Nov 2014
.Net development with Azure Machine Learning (AzureML) Nov 2014Mark Tabladillo
 
An efficient data mining solution by integrating Spark and Cassandra
An efficient data mining solution by integrating Spark and CassandraAn efficient data mining solution by integrating Spark and Cassandra
An efficient data mining solution by integrating Spark and CassandraStratio
 
Advanced search and Top-K queries in Cassandra
Advanced search and Top-K queries in CassandraAdvanced search and Top-K queries in Cassandra
Advanced search and Top-K queries in CassandraStratio
 
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by ScyllaScylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by ScyllaScyllaDB
 
Incrementalism: An Industrial Strategy For Adopting Modern Automation
Incrementalism: An Industrial Strategy For Adopting Modern AutomationIncrementalism: An Industrial Strategy For Adopting Modern Automation
Incrementalism: An Industrial Strategy For Adopting Modern AutomationSean Chittenden
 
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Anant Corporation
 
Solution Brief: Real-Time Pipeline Accelerator
Solution Brief: Real-Time Pipeline AcceleratorSolution Brief: Real-Time Pipeline Accelerator
Solution Brief: Real-Time Pipeline AcceleratorBlueData, Inc.
 
Cassandra and Spark, closing the gap between no sql and analytics codemotio...
Cassandra and Spark, closing the gap between no sql and analytics   codemotio...Cassandra and Spark, closing the gap between no sql and analytics   codemotio...
Cassandra and Spark, closing the gap between no sql and analytics codemotio...Duyhai Doan
 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkTimothy Spann
 
Kafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around KafkaKafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around KafkaGuido Schmutz
 
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...DataStax
 
Building a Business Logic Translation Engine with Spark Streaming for Communi...
Building a Business Logic Translation Engine with Spark Streaming for Communi...Building a Business Logic Translation Engine with Spark Streaming for Communi...
Building a Business Logic Translation Engine with Spark Streaming for Communi...Spark Summit
 
All Streams Ahead! ksqlDB Workshop ANZ
All Streams Ahead! ksqlDB Workshop ANZAll Streams Ahead! ksqlDB Workshop ANZ
All Streams Ahead! ksqlDB Workshop ANZconfluent
 
Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014
Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014
Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014dhiguero
 
Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014
Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014
Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014Andrés de la Peña
 
Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014
Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014
Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014Johnny Miller
 
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezMultiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezBig Data Spain
 

Similar to Cassandra Summit 2014: META — An Efficient Distributed Data Hub with Batch and Streaming Query Capabilities (20)

Presentation
PresentationPresentation
Presentation
 
Cassandra Summit 2014: Apache Cassandra at Telefonica CBS
Cassandra Summit 2014: Apache Cassandra at Telefonica CBSCassandra Summit 2014: Apache Cassandra at Telefonica CBS
Cassandra Summit 2014: Apache Cassandra at Telefonica CBS
 
.Net development with Azure Machine Learning (AzureML) Nov 2014
.Net development with Azure Machine Learning (AzureML) Nov 2014.Net development with Azure Machine Learning (AzureML) Nov 2014
.Net development with Azure Machine Learning (AzureML) Nov 2014
 
An efficient data mining solution by integrating Spark and Cassandra
An efficient data mining solution by integrating Spark and CassandraAn efficient data mining solution by integrating Spark and Cassandra
An efficient data mining solution by integrating Spark and Cassandra
 
Advanced search and Top-K queries in Cassandra
Advanced search and Top-K queries in CassandraAdvanced search and Top-K queries in Cassandra
Advanced search and Top-K queries in Cassandra
 
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by ScyllaScylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
 
Incrementalism: An Industrial Strategy For Adopting Modern Automation
Incrementalism: An Industrial Strategy For Adopting Modern AutomationIncrementalism: An Industrial Strategy For Adopting Modern Automation
Incrementalism: An Industrial Strategy For Adopting Modern Automation
 
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
 
Stratio big data spain
Stratio   big data spainStratio   big data spain
Stratio big data spain
 
Solution Brief: Real-Time Pipeline Accelerator
Solution Brief: Real-Time Pipeline AcceleratorSolution Brief: Real-Time Pipeline Accelerator
Solution Brief: Real-Time Pipeline Accelerator
 
Cassandra and Spark, closing the gap between no sql and analytics codemotio...
Cassandra and Spark, closing the gap between no sql and analytics   codemotio...Cassandra and Spark, closing the gap between no sql and analytics   codemotio...
Cassandra and Spark, closing the gap between no sql and analytics codemotio...
 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
 
Kafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around KafkaKafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around Kafka
 
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
 
Building a Business Logic Translation Engine with Spark Streaming for Communi...
Building a Business Logic Translation Engine with Spark Streaming for Communi...Building a Business Logic Translation Engine with Spark Streaming for Communi...
Building a Business Logic Translation Engine with Spark Streaming for Communi...
 
All Streams Ahead! ksqlDB Workshop ANZ
All Streams Ahead! ksqlDB Workshop ANZAll Streams Ahead! ksqlDB Workshop ANZ
All Streams Ahead! ksqlDB Workshop ANZ
 
Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014
Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014
Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014
 
Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014
Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014
Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014
 
Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014
Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014
Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014
 
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezMultiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier Dominguez
 

Cassandra Summit 2014: META — An Efficient Distributed Data Hub with Batch and Streaming Query Capabilities

  • 1. Stratio Meta: An efficient distributed datahub with batch and streaming query capabilities. Daniel Higuero (dhiguero@stratio.com), Alvaro Agea (alvaro@stratio.com). #CassandraSummit-2014
  • 2. Stratio Crossdata: An efficient distributed datahub with batch and streaming query capabilities. Daniel Higuero (dhiguero@stratio.com), Alvaro Agea (alvaro@stratio.com).
  • 3. Who are we? STRATIO
      • Stratio is a Big Data company
      • Founded in 2013
      • Commercially launched in 2014
      • 50+ employees in Madrid
      • Office in San Francisco
      • Certified Spark distribution
  • 4. We love… Cassandra
      • P2P architecture
      • Read/write performance
      • Fault tolerance
      • Easy to deploy
      • CQL
  • 5. Agenda
      • Introduction
      • Crossdata architecture
      • Metadata management
      • Streaming sources
      • Full text search
      • Spark and Crossdata
      • ODBC
      • The future
  • 6. Introduction
      o Big Data analysis is commonly associated with batch processing
          • Users aiming to combine batch and stream processing have to rely on tailor-made architectures
      o Users buy Big Data platforms, but
          • How do I start?
          • What is my entry point to the platform?
  • 7. What do our clients demand?
      o Easy deployment
      o Easy administration
      o Read/write performance
      o Easy-to-learn query language
      o Integration with BI tools
      o Join operations
      o Support for streaming sources
      o Integration with other data stores
      o Ability to query data without thinking about the schema (non-indexed data)
  • 8. What do our clients demand?
      ✓ Easy deployment
      ✓ Easy administration
      ✓ Read/write performance
      ✓ Easy-to-learn query language
      o Integration with BI tools
      o Join operations
      o Support for streaming sources
      o Integration with other data stores
      o Ability to query data without thinking about the schema (non-indexed data)
  • 9. What do our clients demand?
      ✓ Easy deployment
      ✓ Easy administration
      ✓ Read/write performance
      ✓ Easy-to-learn query language
      ✓ Integration with BI tools
      ✓ Join operations
      ✓ Support for streaming sources
      ✓ Integration with other data stores
      ✓ Ability to query data without thinking about the schema (non-indexed data)
  • 10. Crossdata
      o A new technology that:
          • Is not limited by the underlying datastore capabilities
          • Leverages Spark to perform non-natively supported operations
          • Supports batch and streaming queries
          • Supports multiple clusters and technologies
  • 12. Connecting to the outside world
      o Crossdata defines an IConnector extension interface
      o Users can easily add new connectors to support
          • Different datastores
          • Different processing engines
          • Different versions
      o Each connector defines its capabilities
      Our planner will choose the best connector for each query
  • 13. Query execution
      [Diagram: Parsing → Validation → Planning → Execution, with execution dispatched to C*, Connector1, Connector2 or Connector3]
      Our planner will choose the best connector for each query
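The planning step on slides 12 and 13 (each connector declares its capabilities, and the planner picks one per query) could look roughly like the sketch below. The capability names, the `Connector` class, the `cost` field and `choose_connector` are all hypothetical illustrations; the slides do not show the real IConnector API or how the planner ranks connectors.

```python
from dataclasses import dataclass

# Hypothetical capability names; the slides do not list the real ones.
SELECT, FILTER, JOIN, GROUP_BY = "select", "filter", "join", "group_by"


@dataclass(frozen=True)
class Connector:
    name: str
    capabilities: frozenset
    cost: int  # assumed relative cost; lower is preferred


def choose_connector(required, connectors):
    """Return the cheapest connector whose capabilities cover the query."""
    candidates = [c for c in connectors if required <= c.capabilities]
    if not candidates:
        raise LookupError(f"no connector supports {sorted(required)}")
    return min(candidates, key=lambda c: c.cost)


connectors = [
    Connector("cassandra-native", frozenset({SELECT, FILTER}), cost=1),
    Connector("spark", frozenset({SELECT, FILTER, JOIN, GROUP_BY}), cost=10),
]

# A simple lookup fits the native connector; a join needs the Spark one.
simple = choose_connector({SELECT, FILTER}, connectors)
complex_ = choose_connector({SELECT, JOIN}, connectors)
print(simple.name, complex_.name)
```

Ranking by a single static cost is only one possible policy, chosen here to keep the sketch short.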
  • 14. Multi-cluster support
      o Stratio Crossdata offers the possibility of accessing a single catalog across a set of datastores.
          • Multiple clusters can coexist to optimize platform performance
              ▪ E.g., production cluster, test cluster, write-optimized cluster, read-optimized cluster, etc.
          • A table is stored in a single datastore
  • 15. Logical and physical mapping
      SELECT * FROM app.users;
      [Diagram: the App catalog contains the Users, Test and old_users tables, which are mapped onto C* production, C* development and other datastores]
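One way to picture the logical-to-physical mapping on slide 15 is a catalog lookup that resolves a qualified table name to the one cluster that stores it. The assignment of tables to clusters below is an assumption made for illustration (the extracted slide text names the tables and clusters but not which table lives where), and `resolve` is a hypothetical helper, not a Crossdata function.

```python
# Assumed mapping; the slide names these tables and clusters but the
# extracted text does not say which table lives where.
CATALOG = {
    "app": {
        "users": "cassandra_production",
        "test": "cassandra_development",
        "old_users": "other_datastore",
    }
}


def resolve(qualified_name, catalog=CATALOG):
    """Map a logical 'catalog.table' name to the single cluster storing it."""
    catalog_name, table = qualified_name.split(".", 1)
    try:
        return catalog[catalog_name][table]
    except KeyError:
        raise LookupError(f"unknown table: {qualified_name}") from None


# SELECT * FROM app.users; would be routed to this cluster.
print(resolve("app.users"))
```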
  • 17. Metadata in the era of Schemaless NoSQL datastores
      o Some datastores are schemaless but our applications are not!
          • Flexible schemas vs Schemaless
          • Crossdata provides a Metadata manager that stores schemas for any datasource
              ▪ Remember ODBC and those BI tools
  • 18. Metadata management
      [Diagram: Connector, C* production, Metadata Manager, Metadata Store, Infinispan]
      • Updated metadata information is maintained among Crossdata servers using Infinispan
      • If the connector does not support metadata operations, those are skipped
  • 20. Managing streaming sources
      o Today's use cases expect some type of streaming datasource
          • Streaming data has an ephemeral nature
          • In Stratio Crossdata we defined the ephemeral table abstraction to work with streaming sources as classical RDBMS tables
      [Diagram: a streaming source with {schema:{col1:…},…} feeds an ephemeral table (col1:text, col2:int, col3:int, col4:text) read by Streaming_query0 … Streaming_queryn]
  • 21. Streaming queries
      o Streaming queries are infinite by definition
          • A time window is defined to create a batch-like view of the rows ingested by the system in that period
          • The user launches queries specifying a processing time window
              ▪ Crossdata provides methods to list and stop running streaming queries
  • 22. Streaming queries: windows syntax
      SELECT fieldGroup, avg(field2) FROM eph_table WITH WINDOW 5 minutes WHERE field1 = 100 AND field2 > 100 GROUP BY fieldGroup;
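To make the batch-like view of slides 21 and 22 concrete, the sketch below evaluates the same filter, grouping and average over fixed five-minute windows in plain Python. It assumes non-overlapping (tumbling) windows keyed by an event timestamp `ts` in seconds; the slides do not say which window semantics Crossdata uses, and `windowed_avg` is a hypothetical name.

```python
from collections import defaultdict

WINDOW_SECONDS = 5 * 60  # WITH WINDOW 5 minutes


def windowed_avg(events, window=WINDOW_SECONDS):
    """Group events into fixed windows, then average field2 per fieldGroup.

    Each event is a dict with keys: ts (seconds), fieldGroup, field1, field2.
    Returns {window_start: {fieldGroup: average of field2}}.
    """
    sums = defaultdict(lambda: defaultdict(lambda: [0.0, 0]))
    for e in events:
        # WHERE field1 = 100 AND field2 > 100
        if e["field1"] != 100 or e["field2"] <= 100:
            continue
        window_start = e["ts"] - e["ts"] % window
        acc = sums[window_start][e["fieldGroup"]]
        acc[0] += e["field2"]
        acc[1] += 1
    return {
        start: {group: total / count for group, (total, count) in groups.items()}
        for start, groups in sums.items()
    }


events = [
    {"ts": 10, "fieldGroup": "a", "field1": 100, "field2": 200},
    {"ts": 20, "fieldGroup": "a", "field1": 100, "field2": 400},
    {"ts": 30, "fieldGroup": "b", "field1": 100, "field2": 50},    # filtered out
    {"ts": 310, "fieldGroup": "a", "field1": 100, "field2": 150},  # next window
]
result = windowed_avg(events)
print(result)
```

Each window yields an independent, finite result set, which is what lets an otherwise infinite streaming query return rows.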
  • 23. Joining batch and streaming
      SELECT * FROM demo.temporal WITH WINDOW 10 secs INNER JOIN demo.users ON users.name = temporal.name;
      The query is split into:
          • Streaming part: SELECT * FROM demo.temporal WITH WINDOW 10 secs
          • Batch part: SELECT * FROM demo.users
          • Join: INNER JOIN ON users.name = temporal.name
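The decomposition on slide 23 amounts to joining the rows collected in one window with the rows of a batch table. The sketch below uses an in-memory hash join, which is a technique chosen here for illustration and not necessarily how Crossdata or Spark executes the join; the sample rows and the `inner_join` helper are hypothetical.

```python
def inner_join(window_rows, users, key="name"):
    """Hash join: index the batch table by key, then probe with window rows."""
    index = {}
    for user in users:
        index.setdefault(user[key], []).append(user)
    joined = []
    for row in window_rows:
        for user in index.get(row[key], []):
            # Prefix columns with their table name, as in users.name / temporal.name.
            merged = {f"users.{k}": v for k, v in user.items()}
            merged.update({f"temporal.{k}": v for k, v in row.items()})
            joined.append(merged)
    return joined


# Batch side: SELECT * FROM demo.users
users = [{"name": "ana", "age": 30}, {"name": "luis", "age": 41}]
# Streaming side: rows seen by demo.temporal during one 10-second window
window_rows = [{"name": "ana", "clicks": 3}, {"name": "eva", "clicks": 1}]

rows = inner_join(window_rows, users)
print(rows)
```

Running the join once per window keeps the streaming side finite, so the batch table can be indexed a single time and reused.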
  • 25. Full text search with
      o Clients request the ability to perform full text searches
      o We have developed an integration between Lucene and Cassandra
      o C* users can now enjoy all Lucene features:
          • Full text searches, range queries, fuzzy queries…
      https://github.com/Stratio/stratio-cassandra
  • 26. Stratio Lucene 2i
      [Diagram: a ring of C* nodes, each with its own Lucene index]
  • 27. Full text search queries
      o With Crossdata, we simplify:
          • The creation syntax
          • The query syntax using the match operator
      CREATE FULLTEXT INDEX ON app.users(name,email);
      SELECT * FROM app.users WHERE email MATCH '*@stratio.com';
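As a rough intuition for what the MATCH query on slide 27 returns, the sketch below filters rows with a shell-style wildcard from Python's standard `fnmatch` module. This only approximates the wildcard behaviour: the real operator is backed by a Lucene index whose query semantics differ, and the `match` helper and the third sample row are hypothetical.

```python
from fnmatch import fnmatchcase


def match(rows, column, pattern):
    """Keep rows whose column value matches a '*' / '?' wildcard pattern."""
    return [r for r in rows if fnmatchcase(r[column], pattern)]


users = [
    {"name": "Daniel", "email": "dhiguero@stratio.com"},
    {"name": "Alvaro", "email": "alvaro@stratio.com"},
    {"name": "Guest", "email": "guest@example.org"},  # hypothetical row
]

# SELECT * FROM app.users WHERE email MATCH '*@stratio.com';
hits = match(users, "email", "*@stratio.com")
print([u["name"] for u in hits])
```

A linear scan like this touches every row; the point of the index on slide 26 is to avoid that.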
  • 29. Why Spark?
      o Stratio Crossdata uses Spark to perform non-natively supported operations
      o Spark brings several benefits over Hadoop
          • In-memory processing
          • RDD abstraction
          • Simpler API
          • Increased flexibility (e.g., no need for identity mapping)
  • 30. What about Spark SQL?
      o Different approach to query execution
          • We only use Spark when it speeds up queries
              ▪ Native drivers are faster for simple queries
              ▪ Spark SQL has limited RDD sources
          • Avoid some Spark limitations
          • Several batch and streaming contexts in a single JVM (SPARK-2243)
  • 31. Query approach
      [Diagram: SparkSQL approach: SparkSQL, Spark, Cassandra. Crossdata approach: Stratio Crossdata, Spark, Native driver, Cassandra]
  • 32. Our Cassandra-Spark integration
      o Project started in June 2013
          ▪ With the objective of providing a method to interact with Cassandra from Spark
          ▪ Initial approach based on the HadoopInputFormat interface
          ▪ Current version uses the native DataStax Java driver
      https://github.com/Stratio/stratio-deep
  • 33. Our Cassandra-Spark integration
      o Benchmark in progress comparing our solution with the DataStax Spark driver
          • Results highly influenced by the split size
          • Initial results are promising for Stratio Spark Integration using DataStax default values
          • Group by: up to 40% faster
          • Join: up to 17% faster
          • Stay tuned for the benchmark publication!
  • 34. Spark vs Lucene 2i
      [Plot: Time against Records/node, with one series for Spark and one for Lucene 2i]
  • 36. Stratio Crossdata ODBC
      o Well-known interface standard (for BI tools, external apps, …)
      o We have implemented it for Crossdata using the Simba SDK
      o ODBC opens the full potential of Stratio Crossdata to the external world
      o Currently tested with Tableau, QlikView and MS Excel
      One ODBC for all datastores!
  • 38. The future
      o Security
      o Query optimizer and smart query planner
      o Leverage system statistics
      o Support for UDFs
      o Become an Apache project
      https://github.com/Stratio/stratio-meta
  • 39. We are looking for an Apache Champion. Can you help us?
  • 40. A wish list for Cassandra
      o Ability to stop running queries
      o Interactive users are unpredictable
      o Some exception paths are not clear or defined (e.g., secondary indexes)
      o Distribute some of the operations currently performed on the coordinator
          • E.g., aggregations like count(*)
  • 41. Stratio Crossdata: An efficient distributed datahub with batch and streaming query capabilities. Daniel Higuero (dhiguero@stratio.com), Alvaro Agea (alvaro@stratio.com).