10: Taxonomy of Data and Storage
Zubair Nabi
zubair.nabi@itu.edu.pk
April 20, 2013
Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 1 / 27
Outline
1 Datasets
2 Storage
3 Beyond RDBMS
4 NoSQL Taxonomy
5 NewSQL
1 Datasets
Introduction
Data is everywhere and is the driving force behind our lives
The address book on your phone is data
So is the newspaper that you read every morning
Everything you see around you is a potential source of data which
might be useful for a certain application
We use this data to share information and to make more informed
decisions about events
Datasets can easily be classified on the basis of their structure
1 Structured
2 Unstructured
3 Semi-structured
Structured Data
Formatted in a universally understandable and identifiable way
In most cases, structured data is formally specified by a schema
Your phone's address book is structured because it has a schema
consisting of name, phone number, address, email address, etc.
Most traditional databases contain structured data revolving around
data laid out across columns and rows
Each field also has an associated type
Possible to search for items based on their data types
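As a minimal sketch of the address book example, the schema can be written down as a typed record. The class name, field names, and sample contacts below are assumptions made for illustration, not part of the original slides.

```python
from dataclasses import dataclass, fields

# Hypothetical address-book schema: each field has a name and a declared type.
@dataclass
class Contact:
    name: str
    phone: str
    email: str
    birth_year: int

# The schema can be inspected independently of any data.
schema = {f.name: f.type for f in fields(Contact)}

# Searching the schema by data type: which fields hold integers?
integer_fields = [name for name, typ in schema.items() if typ is int]

contacts = [
    Contact("Ayesha", "555-0101", "ayesha@example.com", 1990),
    Contact("Bilal", "555-0102", "bilal@example.com", 1985),
]

# Because birth_year is known to be an integer, numeric comparison is safe.
born_before_1988 = [c for c in contacts if c.birth_year < 1988]
print(integer_fields, [c.name for c in born_before_1988])
```

The point of the sketch is that both the search by type and the numeric filter rely only on the declared schema, not on inspecting each value.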
Unstructured Data
Data without any conceptual definition or type
Can vary from raw text to binary data
Processing unstructured data requires parsing and tagging on the fly
In most cases, consists of simple log files
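Parsing and tagging on the fly can be sketched as follows. The log line format and the regular expression are assumptions chosen for illustration; real logs vary widely, which is exactly why lines that match no pattern are kept as raw text.

```python
import re

# Assumed (hypothetical) log layout: "<date> <time> <LEVEL> <message>".
LINE = re.compile(r"^(?P<ts>\S+ \S+) (?P<level>[A-Z]+) (?P<msg>.*)$")

def tag(line):
    """Attach field names to a raw line, or keep it untagged if it does not match."""
    match = LINE.match(line)
    if match is None:
        return {"raw": line}
    return match.groupdict()

raw_log = [
    "2013-04-20 10:15:02 ERROR disk /dev/sda1 is 95% full",
    "free-form text that matches no pattern",
]
records = [tag(line) for line in raw_log]
for record in records:
    print(record)
```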
Semi-structured Data
Occupies the middle of the spectrum between structured and
unstructured data
For instance, while binary data has no structure, audio and video files
have meta-data which has structure, such as author, time of creation,
etc.
Can also be described as having a self-describing structure
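A small sketch of what "self-describing" can mean in practice, using JSON metadata for two media files. The file names and field names are hypothetical; the binary payload of each file would remain opaque.

```python
import json

# Hypothetical metadata records for two media files; field names are assumed.
raw = [
    '{"file": "talk.mp3", "author": "A. Author", "created": "2013-04-20"}',
    '{"file": "demo.avi", "created": "2013-04-21", "duration_s": 95}',
]
records = [json.loads(line) for line in raw]

# Each record names its own fields, so no schema is declared up front.
fields_per_record = [sorted(r) for r in records]

# Fields shared by every record form the (weak) common structure.
common = set(records[0]).intersection(*records[1:])
print(fields_per_record, sorted(common))
```

The two records carry different attributes, yet each can be interpreted on its own because the field names travel with the data.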
2 Storage
Database Management Systems (DBMS)
Used to store and manage data
Support for large amounts of data
Ensure concurrency, sharing, and locking
Provide security, for example through fine-grained access control
Ability to keep working in the face of failure
Relational Database Management Systems (RDBMS)
The most popular and predominant storage system in use
Data in different files is connected by using a key field
Data is laid out in different tables, with a key field that identifies each
row
The same key field is used to connect one table to another
For instance, a relation might have customer ID as key and her details
as data; another table might have the same key but different data, say
her purchases; yet another table with the same key might have a
breakdown of her preferences
Examples include Oracle Database, MS SQL Server, MySQL, IBM
DB2, and Teradata
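The customer example above can be sketched with SQLite from the Python standard library. The table names, column names, and rows are assumptions made for illustration; any relational system would express the same idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE purchases (
    customer_id INTEGER REFERENCES customers(customer_id),
    item TEXT
);
INSERT INTO customers VALUES (1, 'Sara'), (2, 'Omar');
INSERT INTO purchases VALUES (1, 'laptop'), (1, 'mouse'), (2, 'phone');
""")

# The shared key field (customer_id) connects one table to the other.
rows = conn.execute(
    "SELECT c.name, p.item FROM customers c "
    "JOIN purchases p ON p.customer_id = c.customer_id "
    "ORDER BY c.name, p.item"
).fetchall()
print(rows)
```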
Structured Query Language (SQL)
Non-procedural language used for data retrieval and manipulation in
RDBMS
Adds a layer of abstraction over relational algebra, which enables set
operations, selections, etc.
Due to its declarative nature, users operate in terms of their expected
output while the underlying system decides the actual query execution
plan
Instructions consist of a specific SQL statement and additional
parameters and operands
For instance, the SELECT statement retrieves the records that match a
condition, INSERT adds a record, and so on
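To make the declarative point concrete, the sketch below computes the same result twice: once by stating the desired output in SQL, and once by spelling out the steps by hand. The table and data are hypothetical, and SQLite stands in for any RDBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
# INSERT statement with parameters (the "?" placeholders are the operands).
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 1.5), ("b", 4.0), ("a", 2.5)],
)

# Declarative: describe the expected output; the engine picks the execution plan.
declarative = conn.execute(
    "SELECT SUM(value) FROM readings WHERE sensor = ?", ("a",)
).fetchone()[0]

# Procedural equivalent: the caller decides how to scan, filter, and add.
procedural = 0.0
for sensor, value in conn.execute("SELECT sensor, value FROM readings"):
    if sensor == "a":
        procedural += value

print(declarative, procedural)
```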
RDBMS and Structured Data
As structured data follows a predefined schema, it naturally maps on to
a relational database system
The schema defines the type and structure of the data and its relations
Schema design is an arduous process and needs to be done before
the database can be populated
Another consequence of a strict schema is that it is non-trivial to
extend it
For instance, adding a new attribute to an existing row necessitates
adding a new column to the entire table
This can be very expensive for tables with millions of rows
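A sketch of what extending a strict schema looks like, again using SQLite with hypothetical table and column names. Note that the cost of such a change depends on the engine; the sketch only shows that the new column applies to every row, not how expensive it is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("Sara",), ("Omar",)])

# Only one user has the new attribute, but the column is added table-wide.
conn.execute("ALTER TABLE users ADD COLUMN twitter TEXT")
conn.execute("UPDATE users SET twitter = '@sara' WHERE name = 'Sara'")

# Every other row now carries an empty (NULL) value for the new column.
rows = conn.execute("SELECT name, twitter FROM users ORDER BY id").fetchall()
print(rows)
```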
RDBMS and Semi- and Un-structured Data
Unstructured data has no notion of schema while semi-structured data
only has a weak one
Data within such datasets also has an associated type
In fact, types are application-centric: It might be possible to interpret a
field as a float in one application and as a string in another
While it is possible, with human intervention, to glean structure from
unstructured data, it is an extremely expensive task
Structureless data generated by real-time sources can change the
number of attributes and their types on the fly
An RDBMS would require altering the table, or creating a new one, each
time such a change takes place
Therefore, unstructured and semi-structured data do not fit the
relational model well
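The two observations above, that types are application-centric and that attributes can change on the fly, can be sketched with a hypothetical stream of sensor events. The event layout and helper names are assumptions for illustration.

```python
# Hypothetical events from a real-time source: the second event adds an
# attribute and changes the type of "temp" from string to integer.
events = [
    {"sensor": "t1", "temp": "21.5"},
    {"sensor": "t1", "temp": 22, "humidity": 40},
]

def as_float(event):
    """One application interprets the field as a number."""
    return float(event["temp"])

def as_label(event):
    """Another application interprets the same field as text."""
    return str(event["temp"])

numbers = [as_float(e) for e in events]
labels = [as_label(e) for e in events]
attributes_seen = [sorted(e) for e in events]
print(numbers, labels, attributes_seen)
```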
3 Beyond RDBMS
Motivation
Different semantics:
RDBMS provide ACID semantics:
1 Atomic: A transaction either succeeds completely or fails completely
2 Consistent: Data within the database remains consistent after each
transaction
3 Isolated: Transactions are sandboxed from each other
4 Durable: Committed transactions persist across failures and restarts
These guarantees are overkill for many user-facing applications
Many applications care more about availability and are willing to
relax consistency, settling for eventual consistency
This basically available, soft state, eventually consistent (BASE) model
enables applications to function even in the face of partial failure
High Throughput: Most NoSQL databases sacrifice consistency for
availability leading to higher throughput (in some cases an order of
magnitude)
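The atomicity guarantee listed above can be sketched with SQLite transactions. The accounts table, the CHECK constraint, and the transfer helper are hypothetical; the sketch only illustrates that a failed transaction leaves no partial update behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money in one transaction; return False if it had to be rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "alice", "bob", 30)       # both updates are applied
failed = transfer(conn, "alice", "bob", 500)  # the credit to bob is undone
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

In the second call the credit succeeds but the debit violates the constraint, so the whole transaction is rolled back and neither balance changes.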
Motivation (2)
Horizontal Scalability: To cater for more data, NoSQL stores can be
scaled out by just adding more machines, and the underlying system
automatically re-distributes the data
Commodity Hardware: A large number of RDBMS require specialized
and proprietary hardware for operation. In contrast, NoSQL databases
function over commodity off-the-shelf hardware
Programming Language Support: Over the years programming
languages have started providing abstractions for database support
(LINQ, etc.) while bypassing SQL. NoSQL databases provide
abstractions that directly map onto the language abstractions leading
to tighter coupling
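Automatic re-distribution when a machine is added can be sketched with simple hash-based placement. This is an illustrative assumption, not how any particular store works; with plain modulo placement many keys move when the node count changes, which is why systems such as Dynamo use consistent hashing instead.

```python
import hashlib

def node_for(key, num_nodes):
    """Pick a node from a stable hash of the key (modulo placement, for illustration)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes

def place(keys, num_nodes):
    """Return a mapping of node index to the keys stored on that node."""
    layout = {n: [] for n in range(num_nodes)}
    for key in keys:
        layout[node_for(key, num_nodes)].append(key)
    return layout

keys = [f"user:{i}" for i in range(1000)]
before = place(keys, 3)   # three machines
after = place(keys, 4)    # one machine added

# Keys whose owner changed must be shipped to another machine.
moved = sum(1 for k in keys if node_for(k, 3) != node_for(k, 4))
print({n: len(v) for n, v in after.items()}, "moved:", moved)
```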
Motivation (3)
The Rise of Cloud Computing: Cloud Computing applications require
horizontal scalability and low administration overhead. Both
requirements are naturally satisfied by NoSQL stores
4 NoSQL Taxonomy
Introduction
NoSQL databases can be classified on the basis of:
1 Data Model: How data is represented
2 Scalability: How scalable the system is
3 Query Model: What type of API it exposes
4 Persistence: How persistent the data is
Classification by Data Model
Based on the data model, NoSQL databases can roughly be grouped into
three categories:
1 Key/value Stores: A map/dictionary allowing put/get semantics per
key
2 Document Stores: Key/value pairs in which each value is a structured
document
3 Column-Oriented Stores: Data laid out by column
Key/value Stores
Data is stored within a large hash map
Simple get/put API
Favour scalability over consistency
Limit on the size of the key
Examples include Amazon’s Dynamo, LinkedIn’s Voldemort, Redis,
and Memcached
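A minimal in-memory sketch of the get/put interface, including a limit on key size. The class name, method names, and the 256-byte default are assumptions for illustration and do not reflect the API of any of the systems named above.

```python
class KeyValueStore:
    """Toy key/value store: one large hash map behind a get/put API."""

    def __init__(self, max_key_bytes=256):
        self._data = {}
        self._max_key_bytes = max_key_bytes  # hypothetical limit

    def put(self, key, value):
        if len(key.encode("utf-8")) > self._max_key_bytes:
            raise ValueError("key exceeds the size limit")
        self._data[key] = value  # the value is opaque to the store

    def get(self, key, default=None):
        return self._data.get(key, default)

store = KeyValueStore(max_key_bytes=16)
store.put("user:42", {"name": "Sara"})
print(store.get("user:42"), store.get("user:99"))
```

The store never looks inside the value, which is what keeps the interface simple and easy to partition across machines.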
Document Stores
Key/value semantics but based on documents
A document encapsulates data in a standard format, such as JSON,
XML, PDF, etc.
Documents themselves can be heterogeneous
Documents can also be retrieved based on their content
Examples include Apache CouchDB and MongoDB
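The difference from a plain key/value store, retrieval by content over heterogeneous documents, can be sketched as follows. The class and its find method are hypothetical and far simpler than the query interfaces of CouchDB or MongoDB.

```python
class DocumentStore:
    """Toy document store: documents are dictionaries looked up by id or by content."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_id, document):
        self._docs[doc_id] = document

    def get(self, doc_id):
        return self._docs.get(doc_id)

    def find(self, **criteria):
        """Return ids of documents whose fields match all given values."""
        return [
            doc_id
            for doc_id, doc in self._docs.items()
            if all(doc.get(field) == value for field, value in criteria.items())
        ]

store = DocumentStore()
# Heterogeneous documents: the two records do not share the same fields.
store.put("d1", {"name": "Sara", "city": "Lahore"})
store.put("d2", {"name": "Omar", "city": "Lahore", "tags": ["admin"]})
store.put("d3", {"title": "Meeting notes"})
print(store.find(city="Lahore"))
```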
Column-Oriented Stores
Data is stored and processed by column
Useful for read-mostly and read-intensive data
Data within the same column is of the same type enabling
opportunities for efficient compression
Columns are stored separately so they can be loaded in parallel
Examples include Google's Bigtable (Apache HBase is an open-source
system modelled on it) and Apache Cassandra (originally developed at
Facebook)
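The layout and compression points can be sketched by pivoting a few rows into columns. The sample data is hypothetical, and run-length encoding is used only as one simple example of a compression scheme that benefits from same-typed, repetitive columns.

```python
from itertools import groupby

# Hypothetical click log in row-oriented form: (day, country, clicks).
rows = [
    ("2013-04-20", "PK", 3),
    ("2013-04-20", "PK", 5),
    ("2013-04-20", "US", 2),
    ("2013-04-21", "US", 7),
]
names = ("day", "country", "clicks")

# Column-oriented layout: one list per column, all values of the same type.
columns = {name: [row[i] for row in rows] for i, name in enumerate(names)}

def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

encoded_day = run_length_encode(columns["day"])

# An aggregate over one column touches only that column's data.
total_clicks = sum(columns["clicks"])
print(encoded_day, total_clicks)
```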
5 NewSQL
Introduction
A hybrid of traditional RDBMS and NoSQL
Combines the scalability and performance of NoSQL with the ACID
guarantees of RDBMS
Use SQL as the primary language
Ability to scale out and run over commodity hardware
Classified into:
1 New Databases: Designed from scratch
2 New MySQL Storage Engines: Keep MySQL as interface but replace
the storage engine
3 Transparent Clustering: Add pluggable features to existing databases
to ensure scalability
New Databases
1. Query Distribution:
   Each node holds a subset of the data
   Queries are split and shipped to nodes that own the data
   Examples include Google’s Spanner and NuoDB
2. Pull Data:
   A central node (possibly replicated) holds all data
   A set of processing nodes receives queries and pulls in required data from the central node
   Examples include VMware’s SQLFire
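
The two architectures above can be contrasted with a toy in-memory sketch. Everything here is an assumption made for illustration: the class names ShardedStore and PullDataCluster, the use of a CRC32 hash to decide which node owns a key, and plain dictionaries standing in for nodes. It does not describe how Spanner, NuoDB, or SQLFire are actually implemented.

```python
import zlib


def owner(key, num_nodes):
    """Map a key to the node that owns it (hash partitioning, an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


class ShardedStore:
    """Query distribution: each node holds a subset of the data."""

    def __init__(self, num_nodes):
        self.shards = [{} for _ in range(num_nodes)]

    def put(self, key, value):
        self.shards[owner(key, len(self.shards))][key] = value

    def get(self, key):
        # A point query is shipped only to the node that owns the key.
        return self.shards[owner(key, len(self.shards))].get(key)

    def sum_where(self, predicate):
        # The query is split: every node computes a partial result over
        # its local subset, and the partial results are then combined.
        partials = [
            sum(v for v in shard.values() if predicate(v))
            for shard in self.shards
        ]
        return sum(partials)


class PullDataCluster:
    """Pull data: one central node holds all data; processing nodes pull."""

    def __init__(self):
        self.central = {}      # the central node's copy of every row
        self.rows_pulled = 0   # how many rows crossed the "network"

    def put(self, key, value):
        self.central[key] = value

    def sum_where(self, keys, predicate):
        # The processing node first pulls the rows it needs from the
        # central node, then evaluates the query locally.
        pulled = {k: self.central[k] for k in keys if k in self.central}
        self.rows_pulled += len(pulled)
        return sum(v for v in pulled.values() if predicate(v))
```

In the first design the computation travels to the data, so only partial results move between nodes. In the second the data travels to the computation, which keeps the processing nodes stateless but makes the central node the point every query depends on.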
References
1. NoSQL Databases: https://oak.cs.ucla.edu/cs144/handouts/nosqldbs.pdf
2. NewSQL – The New Way to Handle Big Data: http://www.linuxforu.com/2012/01/newsql-handle-big-data/
Topic 10: Taxonomy of Data and Storage

  • 1. 10: Taxonomy of Data and Storage Zubair Nabi zubair.nabi@itu.edu.pk April 20, 2013 Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 1 / 27
  • 2. Outline 1 Datasets 2 Storage 3 Beyond RDBMS 4 NoSQL Taxonomy 5 NewSQL Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 2 / 27
  • 3. Outline 1 Datasets 2 Storage 3 Beyond RDBMS 4 NoSQL Taxonomy 5 NewSQL Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 3 / 27
  • 4. Introduction Data is everywhere and is the driving force behind our lives Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 4 / 27
  • 5. Introduction Data is everywhere and is the driving force behind our lives The address book on your phone is data Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 4 / 27
  • 6. Introduction Data is everywhere and is the driving force behind our lives The address book on your phone is data So is the newspaper that you read every morning Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 4 / 27
  • 7. Introduction Data is everywhere and is the driving force behind our lives The address book on your phone is data So is the newspaper that you read every morning Everything you see around you is a potential source of data which might be useful for a certain application Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 4 / 27
  • 8. Introduction Data is everywhere and is the driving force behind our lives The address book on your phone is data So is the newspaper that you read every morning Everything you see around you is a potential source of data which might be useful for a certain application We use this data to share information and make a more informed decision about different events Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 4 / 27
  • 9. Introduction Data is everywhere and is the driving force behind our lives The address book on your phone is data So is the newspaper that you read every morning Everything you see around you is a potential source of data which might be useful for a certain application We use this data to share information and make a more informed decision about different events Datasets can easily be classified on the basis of their structure Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 4 / 27
  • 10. Introduction Data is everywhere and is the driving force behind our lives The address book on your phone is data So is the newspaper that you read every morning Everything you see around you is a potential source of data which might be useful for a certain application We use this data to share information and make a more informed decision about different events Datasets can easily be classified on the basis of their structure 1 Structured 2 Unstructured 3 Semi-structured Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 4 / 27
  • 11. Structured Data Formatted in a universally understandable and identifiable way Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 5 / 27
  • 12. Structured Data Formatted in a universally understandable and identifiable way In most cases, structured data is formally specified by a schema Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 5 / 27
  • 13. Structured Data Formatted in a universally understandable and identifiable way In most cases, structured data is formally specified by a schema Your phone address phone is structured because it has a schema consisting of name, phone number, address, email address, etc. Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 5 / 27
  • 14. Structured Data Formatted in a universally understandable and identifiable way In most cases, structured data is formally specified by a schema Your phone address phone is structured because it has a schema consisting of name, phone number, address, email address, etc. Most traditional databases contain structured data revolving around data laid out across columns and rows Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 5 / 27
  • 15. Structured Data Formatted in a universally understandable and identifiable way In most cases, structured data is formally specified by a schema Your phone address phone is structured because it has a schema consisting of name, phone number, address, email address, etc. Most traditional databases contain structured data revolving around data laid out across columns and rows Each field also has an associated type Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 5 / 27
  • 16. Structured Data Formatted in a universally understandable and identifiable way In most cases, structured data is formally specified by a schema Your phone address phone is structured because it has a schema consisting of name, phone number, address, email address, etc. Most traditional databases contain structured data revolving around data laid out across columns and rows Each field also has an associated type Possible to search for items based on their data types Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 5 / 27
  • 17. Unstructured Data Data without any conceptual definition or type Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 6 / 27
  • 18. Unstructured Data Data without any conceptual definition or type Can vary from raw text to binary data Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 6 / 27
  • 19. Unstructured Data Data without any conceptual definition or type Can vary from raw text to binary data Processing unstructured data requires parsing and tagging on the fly Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 6 / 27
  • 20. Unstructured Data Data without any conceptual definition or type Can vary from raw text to binary data Processing unstructured data requires parsing and tagging on the fly In most cases, consists of simple log files Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 6 / 27
  • 23. Semi-structured Data: Occupies the space between the structured and unstructured ends of the data spectrum. For instance, while binary data has no structure, audio and video files have metadata which has structure, such as author, time of creation, etc. Such data can also be labelled as self-describing.
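A small sketch of the "self-describing" idea, assuming a media file whose metadata happens to be available as JSON. The field names and values below are invented for illustration.

```python
import json

# Opaque binary payload: the bytes themselves carry no schema.
payload = b"\x00\x01\x02\x03"

# Hypothetical metadata attached to the file. The field names travel
# with the data, so no external schema is needed to interpret it.
metadata_text = '{"author": "A. Khan", "created": "2013-04-20", "duration_s": 184}'
metadata = json.loads(metadata_text)

# The structure can be discovered from the record itself.
field_names = sorted(metadata)
```

The payload stays uninterpreted, while the metadata can be queried by name without agreeing on a schema in advance.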
  • 29. Database Management Systems (DBMS): Used to store and manage data. They support large amounts of data; ensure concurrency, sharing, and locking; provide security to enable fine-grained access control; and keep working in the face of failure.
  • 35. Relational Database Management Systems (RDBMS): The most popular and predominant storage system in use. Data is laid out in different tables, with a key field that identifies each row, and the same key field is used to connect one table to another. For instance, one table might have customer ID as the key and her details as data; another table might have the same key but different data, say her purchases; yet another table with the same key might have a breakdown of her preferences. Examples include Oracle Database, Microsoft SQL Server, MySQL, IBM DB2, and Teradata.
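The customer example can be sketched with Python's built-in `sqlite3` module. SQLite stands in for the RDBMS here only because it needs no setup; the table names, column names, and rows are assumptions chosen for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,   -- key field identifying each row
        name        TEXT
    );
    CREATE TABLE purchases (
        customer_id INTEGER REFERENCES customers(customer_id),
        item        TEXT
    );
    INSERT INTO customers VALUES (1, 'Sara'), (2, 'Omar');
    INSERT INTO purchases VALUES (1, 'laptop'), (1, 'mouse'), (2, 'phone');
    """
)

# The shared key field connects one table to the other.
rows = conn.execute(
    """
    SELECT c.name, p.item
    FROM customers AS c
    JOIN purchases AS p ON p.customer_id = c.customer_id
    ORDER BY c.customer_id, p.item
    """
).fetchall()
```

Each table stores a different aspect of the customer, and `customer_id` is the only thing they need to share.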
  • 40. Structured Query Language (SQL): A non-procedural language used for data retrieval and manipulation in RDBMS. It adds a layer of abstraction over relational algebra, which enables set operations, selections, etc. Due to its declarative nature, users describe their expected output while the underlying system decides the actual query execution plan. Instructions consist of a specific SQL statement and additional parameters and operands. For instance, the SELECT statement retrieves certain records, INSERT adds a record, and so on.
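A minimal sketch of the declarative style, again using `sqlite3` as a stand-in engine. The `contacts` table and its rows are hypothetical. The statements say what is wanted, and `EXPLAIN QUERY PLAN` (an SQLite-specific command) shows that the engine, not the user, picks the execution strategy; its output format varies between SQLite versions, so it is only collected here, not interpreted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, city TEXT)")

# INSERT adds records; the values are passed as parameters.
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?)",
    [("Sara", "Lahore"), ("Omar", "Karachi"), ("Hina", "Lahore")],
)

# SELECT states the desired result, not how to compute it.
names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM contacts WHERE city = ? ORDER BY name", ("Lahore",)
    )
]

# Ask the engine which plan it chose for the same query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM contacts WHERE city = ?", ("Lahore",)
).fetchall()
```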
  • 46. RDBMS and Structured Data: As structured data follows a predefined schema, it naturally maps onto a relational database system. The schema defines the type and structure of the data and its relations. Schema design is an arduous process and needs to be done before the database can be populated. Another consequence of a strict schema is that it is non-trivial to extend it. For instance, adding a new attribute to a single existing row necessitates adding a new column to the entire table, which is extremely suboptimal in tables with millions of rows.
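The schema-extension point can be illustrated as follows. The table and the `twitter_handle` attribute are made up for the sketch. It shows only the schema-wide effect of adding one attribute; the actual cost of `ALTER TABLE` depends on the engine and is not measured here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers (name) VALUES (?)", [("Sara",), ("Omar",)])

# Only one customer has the new attribute, yet the column is added
# to the whole table, so every other row now carries a NULL.
conn.execute("ALTER TABLE customers ADD COLUMN twitter_handle TEXT")
conn.execute("UPDATE customers SET twitter_handle = '@sara' WHERE name = 'Sara'")

rows = conn.execute(
    "SELECT name, twitter_handle FROM customers ORDER BY customer_id"
).fetchall()
```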
  • 53. RDBMS and Semi- and Un-structured Data: Unstructured data has no notion of schema, while semi-structured data only has a weak one. Data within such datasets also has an associated type, but types are application-centric: it might be possible to interpret a field as a float in one application and as a string in another. While it is possible, with human intervention, to glean structure from unstructured data, it is an extremely expensive task. Structureless data generated by real-time sources can change the number of attributes and their types on the fly, and an RDBMS would require the creation of a new table each time such a change takes place. Therefore, unstructured and semi-structured data does not fit the relational model.
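A tiny sketch of application-centric typing, assuming a hypothetical log line from a sensor. The same raw field is read as a float by one consumer and kept as a string by another, and a later line from the same source can carry an extra attribute.

```python
# Hypothetical raw log lines; nothing in the data declares a type.
first_line = "sensor-7 21.50"
later_line = "sensor-7 21.75 celsius"  # a new attribute appears on the fly

device, reading = first_line.split()

# Application A treats the field as a number (e.g. for averaging).
as_number = float(reading)

# Application B treats the same field as text (e.g. for display).
as_text = reading

# The number of attributes is not fixed either.
attribute_counts = [len(first_line.split()), len(later_line.split())]
```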
  • 62. Motivation: Different semantics: RDBMS provide ACID semantics: (1) Atomic: the entire transaction either succeeds or fails as a unit; (2) Consistent: data within the database remains consistent after each transaction; (3) Isolated: transactions are sandboxed from each other; (4) Durable: committed transactions persist across failures and restarts. This is overkill for most user-facing applications, which are more interested in availability and are willing to sacrifice strict consistency, settling for eventual consistency. This basically available, soft state, eventually consistent (BASE) model enables applications to function even in the face of partial failure. High throughput: most NoSQL databases sacrifice consistency for availability, leading to higher throughput (in some cases by an order of magnitude).
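Atomicity can be sketched with `sqlite3`, whose connection object works as a context manager that commits on success and rolls back on an exception. The `accounts` table, the `transfer` helper, and the balances are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "owner TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"  # consistency rule
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("sara", 100), ("omar", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE owner = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE owner = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# The debit would make the balance negative, so the whole transfer fails
# and the credit that was already applied is rolled back with it.
ok = transfer(conn, "sara", "omar", 500)
balances = dict(conn.execute("SELECT owner, balance FROM accounts"))
```

The failed transfer leaves both balances untouched, which is the all-or-nothing behaviour the Atomic property describes.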
  • 65. Motivation (2): Horizontal Scalability: To cater for more data, NoSQL stores can be scaled out by just adding more machines, and the underlying system automatically re-distributes the data. Commodity Hardware: A large number of RDBMS require specialized and proprietary hardware for operation. In contrast, NoSQL databases function over commodity off-the-shelf hardware. Programming Language Support: Over the years, programming languages have started providing abstractions for database support (LINQ, etc.) while bypassing SQL. NoSQL databases provide abstractions that directly map onto the language abstractions, leading to tighter coupling.
  • 66. Motivation (3): The Rise of Cloud Computing: Cloud computing applications require horizontal scalability and low administration overhead. Both requirements are naturally satisfied by NoSQL stores.
  • 71. Introduction: NoSQL databases can be classified on the basis of: (1) Data Model: how data is represented; (2) Scalability: how scalable the system is; (3) Query Model: what type of API it exposes; (4) Persistence: how persistent the data is.
  • 74. Classification by Data Model: Based on the data model, NoSQL databases can roughly be grouped into three categories: (1) Key/value Stores: a map/dictionary allowing put/get semantics per key; (2) Document Stores: complex data structures that encapsulate key/value pairs within documents; (3) Column-Oriented Stores: data laid out by column.
  • 79. Key/value Stores: Data is stored within a large hash map and accessed through a simple get/put API. These stores favour scalability over consistency and typically place a limit on the size of the key. Examples include Amazon's Dynamo, LinkedIn's Voldemort, Redis, and Memcached.
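The get/put model, together with the idea of scaling out by adding machines, can be sketched as a hash-partitioned in-memory store. The `ShardedKV` class is hypothetical, and each "node" is just a dictionary. It uses naive modulo placement, which moves most keys when a node is added; production systems such as Dynamo use consistent hashing to limit that movement.

```python
import hashlib


class ShardedKV:
    """Toy key/value store that spreads keys over several in-memory nodes."""

    def __init__(self, num_nodes):
        self.nodes = [{} for _ in range(num_nodes)]

    def _node_for(self, key):
        # Stable hash of the key decides which node owns it.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def put(self, key, value):
        self.nodes[self._node_for(key)][key] = value

    def get(self, key, default=None):
        return self.nodes[self._node_for(key)].get(key, default)

    def add_node(self):
        # Scale out: add one node and re-distribute existing data.
        items = [kv for node in self.nodes for kv in node.items()]
        self.nodes = [{} for _ in range(len(self.nodes) + 1)]
        for key, value in items:
            self.put(key, value)


store = ShardedKV(num_nodes=3)
store.put("user:1", "Sara")
store.put("user:2", "Omar")
store.add_node()
```

Callers only ever see `put` and `get`; which node holds a key, and how many nodes exist, stays hidden behind that API.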
  • 84. Document Stores: Key/value semantics, but based on documents. A document encapsulates data in a standard format, such as JSON, XML, PDF, etc. Documents themselves can be heterogeneous, and they can also be retrieved based on their content. Examples include Apache CouchDB and MongoDB.
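A minimal sketch of heterogeneous documents and content-based retrieval, using plain dictionaries in place of a real document store. The `find` helper and the sample documents are assumptions, not the API of CouchDB or MongoDB.

```python
# Documents are addressed by key, but need not share the same fields.
documents = {
    "doc1": {"type": "post", "title": "Hello", "tags": ["intro"]},
    "doc2": {"type": "user", "name": "Sara", "email": "sara@example.com"},
    "doc3": {"type": "post", "title": "NoSQL", "tags": ["db", "intro"]},
}


def find(docs, **criteria):
    """Return the keys of documents whose fields match all criteria."""
    return sorted(
        key
        for key, doc in docs.items()
        if all(doc.get(field) == value for field, value in criteria.items())
    )


# Retrieval by key (key/value semantics) ...
by_key = documents["doc2"]

# ... and retrieval by content.
posts = find(documents, type="post")
```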
  • 89. Column-Oriented Stores: Data is stored and processed by column, which is useful for read-mostly and read-intensive data. Data within the same column is of the same type, enabling opportunities for efficient compression. Columns are stored separately, so they can be loaded in parallel. Examples include Google's Bigtable (Apache HBase is its open source clone) and Apache Cassandra (originally developed at Facebook).
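The column layout and its compression opportunity can be sketched in a few lines. The rows are invented, and run-length encoding is used only as one simple example of a scheme that benefits from same-typed, repetitive column values; the slides do not name a specific compression method.

```python
from itertools import groupby

# Row-oriented layout: one tuple per record.
rows = [
    ("Sara", "Lahore", 30),
    ("Omar", "Lahore", 25),
    ("Hina", "Karachi", 41),
]

# Column-oriented layout: one list per attribute.
column_names = ("name", "city", "age")
columns = {name: [row[i] for row in rows] for i, name in enumerate(column_names)}

# An aggregate over one attribute touches only that column.
average_age = sum(columns["age"]) / len(columns["age"])


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


encoded_city = run_length_encode(columns["city"])
```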
  • 97. Introduction: NewSQL is a hybrid of traditional RDBMS and NoSQL. It combines the scalability and performance of NoSQL with the ACID guarantees of RDBMS, uses SQL as the primary language, and is able to scale out and run over commodity hardware. NewSQL systems are classified into: (1) New Databases: designed from scratch; (2) New MySQL Storage Engines: keep MySQL as the interface but replace the storage engine; (3) Transparent Clustering: add pluggable features to existing databases to ensure scalability.
  • 103. New Databases: (1) Query Distribution: each node holds a subset of the data, and queries are split and shipped to the nodes that own the data. Examples include Google's Spanner and NuoDB. (2) Pull Data: a central node (possibly replicated) holds all data, and a set of processing nodes receives queries and pulls in the required data from the central node. Examples include VMware's SQLFire.
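The query-distribution approach can be sketched as a scatter/gather over partitions. The data, the `local_query` and `distributed_query` helpers, and the sequential loop are all illustrative assumptions; a real system would run the per-node queries in parallel over the network.

```python
# Each "node" owns a subset of the (item, price) records.
partitions = [
    [("laptop", 900), ("mouse", 20)],
    [("phone", 600)],
    [("monitor", 250), ("cable", 5)],
]


def local_query(partition, min_price):
    """Runs on the node that owns the partition."""
    return [item for item, price in partition if price >= min_price]


def distributed_query(parts, min_price):
    """Ship the query to every node, then merge the partial results."""
    results = []
    for partition in parts:
        results.extend(local_query(partition, min_price))
    return sorted(results)


expensive_items = distributed_query(partitions, min_price=200)
```

In the pull-data approach the roles are reversed: the processing node would fetch the records it needs from the central node and filter them locally.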
  • 104. References: (1) NoSQL Databases: https://oak.cs.ucla.edu/cs144/handouts/nosqldbs.pdf (2) NewSQL – The New Way to Handle Big Data: http://www.linuxforu.com/2012/01/newsql-handle-big-data/