Top 5 Factors to Consider When Choosing a Big Data Solution
 Robin Schumacher, VP Products


©2012 DataStax                   1
•  VP Products, DataStax
    •  Director of Product Management at MySQL, then EnterpriseDB
    •  VP Product Management at Embarcadero Technologies
    •  DBA with Oracle, Teradata, SQL Server, DB2, others…
    •  Database software reviewer for various magazines
    •  Author of 3 database books

       •  Define big data
       •  Identify the “must-haves” of a big data solution
       •  Discuss the difficulty of getting all of them from a business and technical perspective
       •  Brief tour of NoSQL, Cassandra, and DataStax Enterprise




What big data is and the domains of data that need to be considered.




“Big data technologies describe a new generation of technologies and
    architectures, designed to economically extract value from very large volumes of a
    wide variety of data, by enabling high-velocity capture, discovery, and/or analysis.”



     "Big data is data that exceeds the processing capacity of conventional database
     systems. The data is too big, moves too fast, or doesn't fit the strictures of your
     database architectures. To gain value from this data, you must choose an
     alternative way to process it."



    “Datasets whose size is beyond the ability of typical database software tools to
    capture, store, manage, and analyze”



      * All definitions have one thing in common: new technology is needed for big data…

    1.  Real-time – transactional, online, streaming, low-latency data
    2.  Analytic – aggregated data from real-time feeds or other sources; often batch in nature
    3.  Search – supporting data, both external and internal, used for locating desired information and/or objects (e.g. products, documents)




Research done by McKinsey & Company shows the eye-opening, 10-year
          category growth rate differences between businesses that smartly use
          their big data and those that do not.


What are the top five things to consider in a big data solution?




The characteristics that define big data are:

    1.  Velocity – includes the speed at which data comes in and the number of events/elements being stored
    2.  Variety – involves structured, semi-structured, and unstructured data
    3.  Volume – can equate to terabytes or petabytes of data
    4.  Complexity – typically entails the difficulty of distributing the data (e.g. across multiple data centers or the cloud) and managing the data traffic/movement (e.g. ETL, migrations)




         •  Data has a high rate of input
         •  Data has a large quantity of elements/events



                 • Sensor data
                 • Media streaming
                 • Mobile devices
                 • Financial streams
                 • Web clickstream
                 • Traffic monitoring
                 • Patient care




         •  Includes structured, semi-structured, and unstructured data
         •  Necessitates new data models and file formats
         •  Involves real-time, analytic, and search data




         •  Terabytes to petabytes
         •  Also involves data maintenance functions (e.g. purging)




The McKinsey report found that the average investment firm with fewer than 1,000 employees has 3.8 petabytes of data stored, experiences a data growth rate of 40 percent per year, and stores structured, semi-structured, and unstructured data. Overall, McKinsey found that 15 out of 17 industry sectors in the United States have more data stored per company than the U.S. Library of Congress (which had 235 terabytes of information at the time of McKinsey’s study).
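
A rough sketch of what a 40 percent annual growth rate implies for the 3.8 PB figure above, assuming simple annual compounding. The function name and the five-year horizon are illustrative choices, not taken from the McKinsey report.

```python
def project_volume(start_pb, annual_growth, years):
    """Return the projected volume in PB at the end of each year.

    Assumes the growth rate compounds once per year (an assumption
    made for illustration only).
    """
    volumes = []
    current = start_pb
    for _ in range(years):
        current *= 1.0 + annual_growth
        volumes.append(current)
    return volumes


if __name__ == "__main__":
    # 3.8 PB and 40% per year are the figures quoted above.
    for year, pb in enumerate(project_volume(3.8, 0.40, 5), start=1):
        print(f"year {year}: {pb:.1f} PB")
```

At this rate the stored volume nearly doubles every two years (1.4 x 1.4 = 1.96).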

•  Typically involves data distribution, movement,
            etc., across multiple data centers and
            geographies
         •  Can be on-premise, cloud, or hybrid




Finding a big data technology that provides two out of three (low cost, scalable performance, and operational ease) can be challenging; finding one that supplies all three can be very hard.

NoSQL, Cassandra, and DataStax Enterprise for big data.




NoSQL is a broad class of next-generation database management systems that differ from the classic relational database management system (RDBMS) model in some significant ways. Most importantly, they:

         •  Sport a less rigid, more dynamic data model
         •  Look to provide user-controlled trade-offs under the CAP theorem
         •  Do not support ANSI SQL or operations such as joins
         •  Attempt to solve some or all of the challenges of big data
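
One common form of user-controlled trade-off in replicated stores is quorum tuning: with N replicas, a write acknowledged by W replicas and a read that consults R replicas are guaranteed to overlap on at least one replica when R + W > N. The sketch below only checks that condition. The function and parameter names are hypothetical and do not correspond to any product's API.

```python
def read_overlaps_latest_write(n_replicas, write_acks, read_replicas):
    """Return True when every read set must intersect every write set.

    This is the quorum overlap condition R + W > N. It says nothing
    about latency or availability, which move in the opposite direction.
    """
    if not (1 <= write_acks <= n_replicas):
        raise ValueError("write_acks must be between 1 and n_replicas")
    if not (1 <= read_replicas <= n_replicas):
        raise ValueError("read_replicas must be between 1 and n_replicas")
    return write_acks + read_replicas > n_replicas


if __name__ == "__main__":
    # Three replicas, illustrative settings only.
    print(read_overlaps_latest_write(3, 2, 2))  # quorum writes and reads
    print(read_overlaps_latest_write(3, 1, 1))  # fastest, weakest setting
```

Lower values of R or W favor latency and availability; raising them until R + W > N favors consistency.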




A NoSQL solution like Apache Cassandra:
         •  Handles high-velocity data with ease
         •  Uses schemas that support broad varieties of data
         •  Scales from gigabytes to petabytes with linear performance capabilities
         •  Is built to handle multi-location/multi-data-center use cases
         •  Is designed for continuous availability
         •  Offers quick installation and configuration for multi-node clusters
         •  Is open source and/or costs 80-90% less than an RDBMS




Overview of DataStax
        •  Founded in April 2010
        •  Commercial leader in Apache Cassandra™, the popular open-source “big data” database
        •  140+ customers
        •  40+ employees
        •  Home to the Apache Cassandra chair and most committers
        •  Headquartered in the San Francisco Bay Area
        •  Funded by prominent venture firms




* Uses Cassandra and Hadoop for data management
Cassandra is:
    Nearly 4x better in writes
    Nearly 2x better in reads
    Over 12x better in reads/updates




    YCSB Benchmark
    Source: http://blog.cubrid.org/dev-platform/nosql-benchmarking/?utm_source=NoSQL+Weekly+List&utm_campaign=143fae86b2-NoSQL_Weekly_Issue_41_September_8_2011&utm_medium=email

Stores financial options tick data in Cassandra, using a very fluid data model for storage and analysis.



“The hundreds of millions of web pages that contain this information are
                 stored in a multi-terabyte cache that grows continually as we crawl the
                 web, analyzing new pages and finding new versions of existing pages.” –
                 Zoominfo Architect on using Cassandra

“I can create a Cassandra cluster in any region of the world in 10
                 minutes. When marketing guys decide we want to move into a
                 certain part of the world, we’re ready.” - Netflix architect

         •  Fully integrated smart big data platform
         •  Production-certified Cassandra
         •  Continuously available analytics with Hadoop
         •  Scalable enterprise search with Solr
         •  Built-in workload isolation
         •  No costly and error-prone ETL operations
         •  Easy migration of RDBMS and log data
         •  Simple to install and grow
         •  OpsCenter management solution
         •  80-90% less cost than RDBMS vendors




        •  DataStax OpsCenter is a visual management and monitoring solution for DataStax Enterprise
        •  Manage and monitor all Cassandra, Hadoop, and Solr operations
        •  Visual alerts and notifications




        1.  Does it handle high data velocity?
        2.  Can it tackle all types of data?
        3.  How well does it perform with large data volumes?
        4.  Can it handle complex distribution and implementation use cases (e.g. on-premise/cloud, multi-geo)?
        5.  How does it stack up in hitting the big data “bull’s-eye” (i.e. cost, scalable performance, and operational ease)?
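
One way to apply the five questions when comparing candidates is a simple weighted scorecard. This is only a sketch: the criterion keys, the 1-5 scale, and the sample weights and scores are hypothetical placeholders, not values from this presentation.

```python
from typing import Mapping

# One key per question in the list above (names are illustrative).
CRITERIA = ("velocity", "variety", "volume", "complexity", "bulls_eye")


def weighted_score(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Return the weighted average of the scores across all five criteria."""
    missing = [c for c in CRITERIA if c not in scores or c not in weights]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    total_weight = sum(weights[c] for c in CRITERIA)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[c] * weights[c] for c in CRITERIA) / total_weight


if __name__ == "__main__":
    # Placeholder numbers for a single hypothetical candidate.
    weights = {"velocity": 2, "variety": 1, "volume": 2, "complexity": 1, "bulls_eye": 3}
    candidate = {"velocity": 4, "variety": 3, "volume": 5, "complexity": 4, "bulls_eye": 3}
    print(f"{weighted_score(candidate, weights):.2f}")
```

The weights let a team state up front which of the five questions matters most for its workload before scoring any product.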




DataStax Enterprise is tailor-made for high-velocity, multi-variety, large-volume, and complex-deployment use cases that involve big data.




Recommended Reading




                 http://www.datastax.com/resources/whitepapers

Next Steps
         Download DataStax Enterprise and try it in your
         own environment.

          ›  Go to www.datastax.com/software
          ›  Download a copy of DataStax Enterprise
          ›  Installs and configures in minutes
          ›  Completely free for development use




For More Information




Move Faster.





More Related Content

What's hot

SteelEye 표준 제안서
SteelEye 표준 제안서SteelEye 표준 제안서
SteelEye 표준 제안서Yong-uk Choe
 
Hadoop Architecture and HDFS
Hadoop Architecture and HDFSHadoop Architecture and HDFS
Hadoop Architecture and HDFSEdureka!
 
Hadoop et son écosystème - v2
Hadoop et son écosystème - v2Hadoop et son écosystème - v2
Hadoop et son écosystème - v2Khanh Maudoux
 
Introducing MongoDB Atlas
Introducing MongoDB AtlasIntroducing MongoDB Atlas
Introducing MongoDB AtlasMongoDB
 
Presentation of Apache Cassandra
Presentation of Apache Cassandra Presentation of Apache Cassandra
Presentation of Apache Cassandra Nikiforos Botis
 
High Performance Mysql
High Performance MysqlHigh Performance Mysql
High Performance Mysqlliufabin 66688
 
Hardware planning & sizing for sql server
Hardware planning & sizing for sql serverHardware planning & sizing for sql server
Hardware planning & sizing for sql serverDavide Mauri
 
Big data بزرگ داده ها
Big data بزرگ داده هاBig data بزرگ داده ها
Big data بزرگ داده هاOmid Sohrabi
 
하이퍼커넥트 데이터 팀이 데이터 증가에 대처해온 기록
하이퍼커넥트 데이터 팀이 데이터 증가에 대처해온 기록하이퍼커넥트 데이터 팀이 데이터 증가에 대처해온 기록
하이퍼커넥트 데이터 팀이 데이터 증가에 대처해온 기록Jaehyeuk Oh
 
Introducing MongoDB Atlas
Introducing MongoDB AtlasIntroducing MongoDB Atlas
Introducing MongoDB AtlasMongoDB
 
Big Data & Hadoop Tutorial
Big Data & Hadoop TutorialBig Data & Hadoop Tutorial
Big Data & Hadoop TutorialEdureka!
 
How SAP uses Flowable as its BPMN engine for SAP CP Workflow
How SAP uses Flowable as its BPMN engine for SAP CP WorkflowHow SAP uses Flowable as its BPMN engine for SAP CP Workflow
How SAP uses Flowable as its BPMN engine for SAP CP WorkflowFlowable
 
MySQL_MariaDB-성능개선-202201.pptx
MySQL_MariaDB-성능개선-202201.pptxMySQL_MariaDB-성능개선-202201.pptx
MySQL_MariaDB-성능개선-202201.pptxNeoClova
 
Evolution of Big Data Messaging
Evolution of Big Data Messaging Evolution of Big Data Messaging
Evolution of Big Data Messaging Kartik Paramasivam
 
Upgrade from MySQL 5.7 to MySQL 8.0
Upgrade from MySQL 5.7 to MySQL 8.0Upgrade from MySQL 5.7 to MySQL 8.0
Upgrade from MySQL 5.7 to MySQL 8.0Olivier DASINI
 
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on KubernetesApache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on KubernetesDataWorks Summit
 
Hadoop Presentation - PPT
Hadoop Presentation - PPTHadoop Presentation - PPT
Hadoop Presentation - PPTAnand Pandey
 

What's hot (20)

Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
 
금융산업의 빅데이터 활용 및 이슈
금융산업의 빅데이터 활용 및 이슈금융산업의 빅데이터 활용 및 이슈
금융산업의 빅데이터 활용 및 이슈
 
SteelEye 표준 제안서
SteelEye 표준 제안서SteelEye 표준 제안서
SteelEye 표준 제안서
 
Hadoop Architecture and HDFS
Hadoop Architecture and HDFSHadoop Architecture and HDFS
Hadoop Architecture and HDFS
 
Hadoop et son écosystème - v2
Hadoop et son écosystème - v2Hadoop et son écosystème - v2
Hadoop et son écosystème - v2
 
Introducing MongoDB Atlas
Introducing MongoDB AtlasIntroducing MongoDB Atlas
Introducing MongoDB Atlas
 
Presentation of Apache Cassandra
Presentation of Apache Cassandra Presentation of Apache Cassandra
Presentation of Apache Cassandra
 
High Performance Mysql
High Performance MysqlHigh Performance Mysql
High Performance Mysql
 
Hardware planning & sizing for sql server
Hardware planning & sizing for sql serverHardware planning & sizing for sql server
Hardware planning & sizing for sql server
 
Big data بزرگ داده ها
Big data بزرگ داده هاBig data بزرگ داده ها
Big data بزرگ داده ها
 
하이퍼커넥트 데이터 팀이 데이터 증가에 대처해온 기록
하이퍼커넥트 데이터 팀이 데이터 증가에 대처해온 기록하이퍼커넥트 데이터 팀이 데이터 증가에 대처해온 기록
하이퍼커넥트 데이터 팀이 데이터 증가에 대처해온 기록
 
Introducing MongoDB Atlas
Introducing MongoDB AtlasIntroducing MongoDB Atlas
Introducing MongoDB Atlas
 
Big Data & Hadoop Tutorial
Big Data & Hadoop TutorialBig Data & Hadoop Tutorial
Big Data & Hadoop Tutorial
 
How SAP uses Flowable as its BPMN engine for SAP CP Workflow
How SAP uses Flowable as its BPMN engine for SAP CP WorkflowHow SAP uses Flowable as its BPMN engine for SAP CP Workflow
How SAP uses Flowable as its BPMN engine for SAP CP Workflow
 
Introducción a Hadoop
Introducción a HadoopIntroducción a Hadoop
Introducción a Hadoop
 
MySQL_MariaDB-성능개선-202201.pptx
MySQL_MariaDB-성능개선-202201.pptxMySQL_MariaDB-성능개선-202201.pptx
MySQL_MariaDB-성능개선-202201.pptx
 
Evolution of Big Data Messaging
Evolution of Big Data Messaging Evolution of Big Data Messaging
Evolution of Big Data Messaging
 
Upgrade from MySQL 5.7 to MySQL 8.0
Upgrade from MySQL 5.7 to MySQL 8.0Upgrade from MySQL 5.7 to MySQL 8.0
Upgrade from MySQL 5.7 to MySQL 8.0
 
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on KubernetesApache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
 
Hadoop Presentation - PPT
Hadoop Presentation - PPTHadoop Presentation - PPT
Hadoop Presentation - PPT
 

Similar to Top 5 Considerations for a Big Data Solution

The Top 5 Factors to Consider When Choosing a Big Data Solution
The Top 5 Factors to Consider When Choosing a Big Data SolutionThe Top 5 Factors to Consider When Choosing a Big Data Solution
The Top 5 Factors to Consider When Choosing a Big Data SolutionDATAVERSITY
 
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)Denodo
 
Big data - Cassandra
Big data - CassandraBig data - Cassandra
Big data - CassandraJen Wei Lee
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization Denodo
 
Getting Big Value from Big Data
Getting Big Value from Big DataGetting Big Value from Big Data
Getting Big Value from Big DataDataStax
 
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)Denodo
 
Unlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data VirtualizationUnlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data VirtualizationDenodo
 
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...DataStax
 
Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?Denodo
 
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)Denodo
 
Bridging the Last Mile: Getting Data to the People Who Need It
Bridging the Last Mile: Getting Data to the People Who Need ItBridging the Last Mile: Getting Data to the People Who Need It
Bridging the Last Mile: Getting Data to the People Who Need ItDenodo
 
LinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbenchLinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbenchSheetal Pratik
 
Data Virtualization: An Essential Component of a Cloud Data Lake
Data Virtualization: An Essential Component of a Cloud Data LakeData Virtualization: An Essential Component of a Cloud Data Lake
Data Virtualization: An Essential Component of a Cloud Data LakeDenodo
 
Introduction to Bigdata and NoSQL
Introduction to Bigdata and NoSQLIntroduction to Bigdata and NoSQL
Introduction to Bigdata and NoSQLTushar Shende
 
Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)Denodo
 
Best Practices in the Cloud for Data Management (US)
Best Practices in the Cloud for Data Management (US)Best Practices in the Cloud for Data Management (US)
Best Practices in the Cloud for Data Management (US)Denodo
 
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...Denodo
 
Big Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKBig Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKRajesh Jayarman
 
Data Ninja Webinar Series: Realizing the Promise of Data Lakes
Data Ninja Webinar Series: Realizing the Promise of Data LakesData Ninja Webinar Series: Realizing the Promise of Data Lakes
Data Ninja Webinar Series: Realizing the Promise of Data LakesDenodo
 

Similar to Top 5 Considerations for a Big Data Solution (20)

The Top 5 Factors to Consider When Choosing a Big Data Solution
The Top 5 Factors to Consider When Choosing a Big Data SolutionThe Top 5 Factors to Consider When Choosing a Big Data Solution
The Top 5 Factors to Consider When Choosing a Big Data Solution
 
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)
 
Big data - Cassandra
Big data - CassandraBig data - Cassandra
Big data - Cassandra
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
 
Getting Big Value from Big Data
Getting Big Value from Big DataGetting Big Value from Big Data
Getting Big Value from Big Data
 
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
 
Unlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data VirtualizationUnlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data Virtualization
 
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
 
Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?
 
Speak to Your Data
Speak to Your DataSpeak to Your Data
Speak to Your Data
 
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
 
Bridging the Last Mile: Getting Data to the People Who Need It
Bridging the Last Mile: Getting Data to the People Who Need ItBridging the Last Mile: Getting Data to the People Who Need It
Bridging the Last Mile: Getting Data to the People Who Need It
 
LinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbenchLinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbench
 
Data Virtualization: An Essential Component of a Cloud Data Lake
Data Virtualization: An Essential Component of a Cloud Data LakeData Virtualization: An Essential Component of a Cloud Data Lake
Data Virtualization: An Essential Component of a Cloud Data Lake
 
Introduction to Bigdata and NoSQL
Introduction to Bigdata and NoSQLIntroduction to Bigdata and NoSQL
Introduction to Bigdata and NoSQL
 
Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)
 
Best Practices in the Cloud for Data Management (US)
Best Practices in the Cloud for Data Management (US)Best Practices in the Cloud for Data Management (US)
Best Practices in the Cloud for Data Management (US)
 
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
 
Big Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKBig Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RK
 
Data Ninja Webinar Series: Realizing the Promise of Data Lakes
Data Ninja Webinar Series: Realizing the Promise of Data LakesData Ninja Webinar Series: Realizing the Promise of Data Lakes
Data Ninja Webinar Series: Realizing the Promise of Data Lakes
 

More from DataStax

Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?DataStax
 
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...DataStax
 
Running DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid EnvironmentsRunning DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid EnvironmentsDataStax
 
Best Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise GraphBest Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise GraphDataStax
 
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step JourneyWebinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step JourneyDataStax
 
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...DataStax
 
Webinar | Better Together: Apache Cassandra and Apache Kafka
Webinar  |  Better Together: Apache Cassandra and Apache KafkaWebinar  |  Better Together: Apache Cassandra and Apache Kafka
Webinar | Better Together: Apache Cassandra and Apache KafkaDataStax
 
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax EnterpriseTop 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax EnterpriseDataStax
 
Introduction to Apache Cassandra™ + What’s New in 4.0
Introduction to Apache Cassandra™ + What’s New in 4.0Introduction to Apache Cassandra™ + What’s New in 4.0
Introduction to Apache Cassandra™ + What’s New in 4.0DataStax
 
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...DataStax
 
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud RealitiesWebinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud RealitiesDataStax
 
Designing a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for DummiesDesigning a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for DummiesDataStax
 
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid CloudHow to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid CloudDataStax
 
How to Evaluate Cloud Databases for eCommerce
How to Evaluate Cloud Databases for eCommerceHow to Evaluate Cloud Databases for eCommerce
How to Evaluate Cloud Databases for eCommerceDataStax
 
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...DataStax
 
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...DataStax
 
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...DataStax
 
Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)DataStax
 
An Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking ApplicationsAn Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking ApplicationsDataStax
 
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design ThinkingBecoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design ThinkingDataStax
 




Top 5 Considerations for a Big Data Solution

  • 1. Top 5 Factors to Consider When Choosing a Big Data Solution Robin Schumacher, VP Products ©2012 DataStax 1
  • 2. VP Products, DataStax; Director of Product Management at MySQL, then EnterpriseDB; VP Product Management at Embarcadero Technologies; DBA with Oracle, Teradata, SQL Server, DB2, and others; database software reviewer for various magazines; author of 3 database books
  • 3. Define big data; identify the “must-haves” of a big data solution; discuss the difficulty of getting all of them from a business and technical perspective; take a brief tour of NoSQL, Cassandra, and DataStax Enterprise
  • 4. What big data is and the domains of data that need to be considered.
  • 6. “Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis.” “Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn't fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.” “Datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.” All definitions have one thing in common: new technology is needed for big data.
  • 7. 1. Real-time – transactional, online, streaming, low-latency data. 2. Analytic – aggregated data from real-time feeds or other sources, often batch in nature. 3. Search – supporting data, both external and internal, used for locating desired information and/or objects (e.g. products, documents).
  • 8. Research done by McKinsey & Company shows the eye-opening, 10-year category growth rate differences between businesses that smartly use their big data and those that do not.
  • 9. What are the top five things to consider in a big data solution?
  • 11. The characteristics that define big data are: 1. Velocity – the speed at which data comes in and the number of events/elements being stored. 2. Variety – structured, semi-structured, and unstructured data. 3. Volume – can equate to terabytes or petabytes of data. 4. Complexity – typically entails the difficulty of distributing the data (e.g. multiple data centers, cloud) and of managing data traffic/movement (e.g. ETL, migrations).
  • 12. Data has a high rate of input and a large quantity of elements/events, e.g. sensor data, media streaming, mobile devices, financial streams, web clickstream, traffic monitoring, patient care
  • 13. Includes structured, semi-structured, and unstructured data; necessitates new data models and file formats; involves real-time, analytic, and search data
  • 14. Terabytes to petabytes of data; also involves data maintenance functions (e.g. purging)
  • 15. The McKinsey report found that the average investment firm with fewer than 1,000 employees has 3.8 petabytes of data stored, experiences a data growth rate of 40 percent per year, and stores structured, semi-structured, and unstructured data. Overall, McKinsey found that 15 out of 17 industry sectors in the United States have more data stored per company than the U.S. Library of Congress (which had 235 terabytes of information at the time of McKinsey’s study).
  • 16. Typically involves data distribution and movement across multiple data centers and geographies; can be on-premises, cloud, or hybrid
  • 17. Getting a big data technology that provides two out of the three (low cost, scalable performance, operational ease) can be challenging; finding one that supplies all three can be very hard.
  • 18. NoSQL, Cassandra, and DataStax Enterprise for big data.
  • 19. NoSQL is a broad class of next-generation database management systems that differ from the classic model of the relational database management system (RDBMS) in some significant ways, the most important being that they: sport a less rigid, more dynamic data model; look to provide user-controlled trade-offs within the CAP theorem; do not support ANSI SQL or operations such as joins; attempt to solve some or all of the challenges of big data
  • 20. A NoSQL solution like Apache Cassandra: handles high-velocity data with ease; uses schemas that support broad varieties of data; scales from gigabytes to petabytes with linear performance capabilities; is built to handle multi-location/data center use cases; is designed for continuous availability; offers quick installation and configuration for multi-node clusters; is open source and/or costs 80-90% less than RDBMSs
  • 21. Overview of DataStax: founded in April 2010; commercial leader in Apache Cassandra™, the popular open-source “big data” database; 140+ customers; 40+ employees; home to the Apache Cassandra Chair and most committers; headquartered in the San Francisco Bay area; funded by prominent venture firms
  • 22. * Uses Cassandra and Hadoop for data management
  • 23. YCSB benchmark: Cassandra is nearly 4x better in writes, nearly 2x better in reads, and over 12x better in reads/updates. Source: http://blog.cubrid.org/dev-platform/nosql-benchmarking/?utm_source=NoSQL+Weekly+List&utm_campaign=143fae86b2-NoSQL_Weekly_Issue_41_September_8_2011&utm_medium=email
  • 24. Stores financial options tick data in Cassandra, using a very fluid data model, for storage and analysis.
  • 25. “The hundreds of millions of web pages that contain this information are stored in a multi-terabyte cache that grows continually as we crawl the web, analyzing new pages and finding new versions of existing pages.” – Zoominfo architect on using Cassandra
  • 26. “I can create a Cassandra cluster in any region of the world in 10 minutes. When marketing guys decide we want to move into a certain part of the world, we’re ready.” – Netflix architect
  • 27. Fully integrated smart big data platform; production-certified Cassandra; continuously available analytics with Hadoop; scalable enterprise search with Solr; built-in workload isolation; no costly and error-prone ETL operations; easy migration of RDBMS and log data; simple to install and grow; OpsCenter management solution; 80-90% less cost than RDBMS vendors
  • 28. DataStax OpsCenter is a visual management and monitoring solution for DataStax Enterprise; it manages and monitors all Cassandra, Hadoop, and Solr operations; it provides visual alerts and notifications
  • 29. 1. Does it handle high data velocity? 2. Can it tackle all types of data? 3. How well does it perform with large data volumes? 4. Can it handle complex distribution and implementation use cases (e.g. on-premises/cloud, multi-geo)? 5. How does it stack up in hitting the big data “bull's-eye” (i.e. where cost, scalable performance, and operational ease are concerned)?
  • 30. DataStax Enterprise is tailor-made for high-velocity, multi-variety, large-volume, and complex deployment use cases that involve big data.
  • 31. Recommended reading: http://www.datastax.com/resources/whitepapers
  • 32. Next steps: download DataStax Enterprise and try it in your own environment. Go to www.datastax.com/software; download a copy of DataStax Enterprise; it installs and configures in minutes; it is completely free for development use