AZURE Data Related Services

Azure Data services overview, scale options

WHAT IS MICROSOFT AZURE?
Your app. Your framework. Your platform. All welcome.
Microsoft Azure is a rapidly growing collection of integrated cloud services—analytics,
computing, database, mobile, networking, storage, and web—for moving faster,
achieving more, and saving money.

WHY PUBLIC CLOUD?
Cost savings:
• Lower TCO
• Pay for usage; avoid over-provisioning capacity
Scalability:
• Rapid expansion, local and global
• DR (no need to pay up front for an event that may never happen)
Flexibility:
• Change hardware configuration on the fly, or with a reboot at most
• Adapt the platform to the baseline dynamically
• Easily integrate systems in the cloud
Training:
• Set up a lab instantly
• Try new features and technologies
https://azure.microsoft.com/en-us/documentation/ - list of all services

AZURE SQL DATABASE
Azure SQL Database – managed relational SQL database (PaaS)
• SQL Server database in the cloud
• Usual management tools can be used: SSMS, Visual Studio
• Fully compatible with Azure services (Data, Storage, Web)
• Easily scalable: single databases and elastic pools
• SLA of at least 99.99%
• Learns, adapts, and grows with your application: Database Advisor, automatic tuning (since V12)
• Threat detection and alerts, auditing
• Security, encryption, and compliance requirements can all be met

SQL DATABASE OPTIONS AND PERFORMANCE TIERS
Basic
• Best suited for a small database, supporting typically one single active operation at a given time.
Examples include databases used for development or testing, or small-scale infrequently used
applications.
Standard
• The go-to option for most cloud applications, supporting multiple concurrent queries. Examples
include workgroup or web applications.
Premium
• Designed for high transactional volume, supporting a large number of concurrent users and
requiring the highest level of business continuity capabilities. Examples are databases
supporting mission critical applications.

UNDERSTANDING DTU
The Database Transaction Unit (DTU) is the unit of measure in SQL Database that represents
the relative power of databases based on a real-world measure: the database transaction.
Microsoft took a set of operations that are typical for an online transaction processing (OLTP)
request and measured how many transactions could be completed per second under fully
loaded conditions (that is the short version; details are in the Benchmark overview).

PERFORMANCE BENCHMARKS FOR DTU
Benchmarks overview:
https://azure.microsoft.com/en-us/documentation/articles/sql-database-benchmark-overview/
• Read Lite [35%] - SELECT; in-memory; read-only
• Read Medium [20%] - SELECT; mostly in-memory; read-only
• Read Heavy [5%] - SELECT; mostly not in-memory; read-only
• Update Lite [20%] - UPDATE; in-memory; read-write
• Update Heavy [3%] - UPDATE; mostly not in-memory; read-write
• Insert Lite [3%] - INSERT; in-memory; read-write
• Insert Heavy [2%] - INSERT; mostly not in-memory; read-write
• Delete [2%] - DELETE; mix of in-memory and not in-memory; read-write
• CPU Heavy [10%] - SELECT; in-memory; relatively heavy CPU load; read-only
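The transaction mix above can be sketched as a weighted sampler. This is only an illustration of how the percentages combine into a workload, not the actual benchmark driver; the function names and the fixed seed are assumptions made for the example.

```python
import random

# Transaction mix from the list above: (transaction type, share in percent).
TRANSACTION_MIX = [
    ("Read Lite", 35),
    ("Read Medium", 20),
    ("Read Heavy", 5),
    ("Update Lite", 20),
    ("Update Heavy", 3),
    ("Insert Lite", 3),
    ("Insert Heavy", 2),
    ("Delete", 2),
    ("CPU Heavy", 10),
]

# Types marked read-only in the list above.
READ_ONLY_TYPES = {"Read Lite", "Read Medium", "Read Heavy", "CPU Heavy"}


def total_share(mix):
    """Sum of all shares; a complete mix should add up to 100."""
    return sum(share for _, share in mix)


def read_only_share(mix):
    """Share of the workload that never writes."""
    return sum(share for name, share in mix if name in READ_ONLY_TYPES)


def sample_workload(mix, n, seed=0):
    """Draw n transaction types at random, weighted by their share.

    The seed is a hypothetical default so that runs are repeatable.
    """
    rng = random.Random(seed)
    names = [name for name, _ in mix]
    weights = [share for _, share in mix]
    return rng.choices(names, weights=weights, k=n)


if __name__ == "__main__":
    print("total share:", total_share(TRANSACTION_MIX))
    print("read-only share:", read_only_share(TRANSACTION_MIX))
    print("first five sampled transactions:", sample_workload(TRANSACTION_MIX, 5))
```

The shares add up to 100, and 70 percent of the mix is read-only, which is why the benchmark behaves like a read-dominated OLTP workload.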

PERFORMANCE BENCHMARKS FOR DTU
Tiers requirements

SQL DATABASE PERFORMANCE TIERS

GEO REPLICATION
Standard Geo-Replication - will be retired in April, 2017
• Standard geo-replication creates an offline secondary database in a pre-paired Azure region
within the same geographic area that is at least 500 miles away.
• Secondary standard geo-replication databases are priced at 0.75x of primary database prices
Active Geo-Replication
• Active geo-replication creates up to 4 online (readable) secondaries in any Azure region
• Secondary active geo-replication databases are priced at 1x of primary database prices
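The pricing multipliers above translate into simple arithmetic. The sketch below assumes a hypothetical primary price of 100 per month and models standard geo-replication with exactly one secondary, which is how the slide reads; actual prices are on the pricing page.

```python
# Multipliers quoted above: a secondary costs this fraction of the primary.
STANDARD_SECONDARY_MULTIPLIER = 0.75
ACTIVE_SECONDARY_MULTIPLIER = 1.0
MAX_ACTIVE_SECONDARIES = 4


def geo_replicated_cost(primary_price, mode, secondaries=1):
    """Total price of a primary database plus its geo-replicated secondaries.

    primary_price is whatever the primary costs per billing period
    (a hypothetical input, not a real Azure price).
    """
    if mode == "standard":
        # Assumption: standard geo-replication has a single offline secondary.
        if secondaries != 1:
            raise ValueError("standard geo-replication is modelled with one secondary")
        multiplier = STANDARD_SECONDARY_MULTIPLIER
    elif mode == "active":
        if not 1 <= secondaries <= MAX_ACTIVE_SECONDARIES:
            raise ValueError("active geo-replication allows 1 to 4 secondaries")
        multiplier = ACTIVE_SECONDARY_MULTIPLIER
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return primary_price * (1 + multiplier * secondaries)


if __name__ == "__main__":
    print(geo_replicated_cost(100.0, "standard"))
    print(geo_replicated_cost(100.0, "active", secondaries=4))
```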

SQL DATABASE – ELASTIC POOL
Elastic pool characteristics:
• It is given a set number of eDTUs, for a set price
• Within the pool, individual databases are given the flexibility to auto-scale within set parameters
• Under heavy load a database can consume more eDTUs to meet demand
• Databases under light loads consume less
• Databases under no load don’t consume any eDTUs
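A toy model can make the sharing behaviour above concrete. It assumes that each database is capped at a per-database maximum and that, when the pool is oversubscribed, capacity is shared in proportion to demand. The proportional rule is an assumption chosen for the illustration, not a description of how the service actually throttles.

```python
def allocate_edtus(pool_edtus, demands, per_db_max):
    """Split a pool's eDTUs among databases according to their demand.

    demands maps a database name to the eDTUs it would like to use right now.
    Idle databases (demand 0) receive nothing, busy ones receive more.
    """
    # Cap every database at the per-database maximum.
    capped = {db: min(max(demand, 0), per_db_max) for db, demand in demands.items()}
    total = sum(capped.values())
    if total <= pool_edtus:
        # The pool has enough capacity: everyone gets what they asked for.
        return capped
    # Oversubscribed pool: scale all databases down proportionally
    # (hypothetical policy, used only to keep the sketch simple).
    scale = pool_edtus / total
    return {db: demand * scale for db, demand in capped.items()}


if __name__ == "__main__":
    quiet_day = allocate_edtus(100, {"a": 80, "b": 10, "c": 0}, per_db_max=100)
    busy_day = allocate_edtus(100, {"a": 150, "b": 50, "c": 50}, per_db_max=100)
    print(quiet_day)
    print(busy_day)
```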

ELASTIC POOL AND ELASTIC DB OPTIONS
https://azure.microsoft.com/en-us/documentation/learning-paths/sql-database-elastic-scale/

SQL DATABASE V12
Increased application compatibility with SQL Server
A key goal for SQL Database V12 (compatibility level 130) was to improve the compatibility with
Microsoft SQL Server 2014, and to maintain the compatibility as new versions of SQL Server are
released. Among other areas, V12 achieves parity with SQL Server in the important area of
programmability.
For example:
• Built-in JSON support
• Window functions, with OVER
• XML indexes and selective XML indexes
• Change tracking
• SELECT...INTO
• Full-text search
• ALTER DATABASE SCOPED CONFIGURATION (Transact-SQL)
See the linked documentation for the small set of features not yet supported in Azure SQL Database.

SCALING WITH AZURE SQL DATABASE
Sharding
A technique to distribute large amounts of identically-structured data across a number of
independent databases.
• The total amount of data is too large to fit within the constraints of a single database
• The transaction throughput of the overall workload exceeds the capabilities of a single database
• Tenants may require physical isolation from each other, so separate databases are needed for each tenant
• Different sections of a database may need to reside in different geographies for compliance, performance or
geopolitical reasons.

HORIZONTAL AND VERTICAL SCALING
Scaling options
• Horizontal scaling (“scaling out”): sharding partitions the data across a collection of identically
structured databases. It is managed using the Elastic Database client library.
• Vertical scaling is accomplished using Azure PowerShell cmdlets to change the service tier, or by
placing databases in an elastic pool.

Elastic Database tools
1. A set of Azure SQL databases is hosted on Azure using a sharding architecture.
2. The Elastic Database client library is used to manage a shard set.
3. A subset of the databases is put into an Elastic Database pool.
4. An Elastic Database job runs T-SQL scripts against all databases.
5. The Split-merge tool is used to move data from one shard to another.
6. The Elastic Database query allows you to write a query that spans all databases in the shard set.
7. Elastic transactions allow you to run transactions that span several databases.
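The idea behind a job or query that spans all databases in a shard set can be sketched with in-memory SQLite databases standing in for shards. This is not the Elastic Database API; the `orders` table, its columns, and the helper names are hypothetical and exist only for the illustration.

```python
import sqlite3


def make_shard(rows):
    """Create one identically structured shard and load its rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (tenant_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn


def fan_out(shards, sql, params=()):
    """Run the same statement on every shard and concatenate the results."""
    results = []
    for conn in shards:
        results.extend(conn.execute(sql, params).fetchall())
    return results


# Two shards with the same schema, each holding a different tenant's data.
shards = [
    make_shard([(1, 10.0), (1, 5.0)]),
    make_shard([(2, 7.5)]),
]

# Each shard returns a partial aggregate; the caller combines them.
per_shard = fan_out(shards, "SELECT COUNT(*), SUM(amount) FROM orders")
total_rows = sum(count for count, _ in per_shard)
total_amount = sum(amount for _, amount in per_shard)

if __name__ == "__main__":
    print(total_rows, total_amount)
```

The design point is that every shard has the same schema, so one statement can be sent to all of them and the partial results merged on the client side.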

Shard map manager
The shard map manager is a special database that maintains global mapping information
about all shards (databases) in a shard set.
More details:
https://azure.microsoft.com/en-us/documentation/articles/sql-database-elastic-scale-shard-map-management/
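What a shard map stores can be illustrated with a small range-based lookup table. This is a minimal sketch assuming integer sharding keys and half-open key ranges; the class and method names are hypothetical and are not the Elastic Database client library API.

```python
import bisect


class RangeShardMap:
    """Maps half-open key ranges [low, high) to shard names."""

    def __init__(self):
        self._lows = []     # sorted lower bounds, parallel to _entries
        self._entries = []  # (low, high, shard) tuples sorted by low

    def add_range(self, low, high, shard):
        """Register a key range for a shard; overlapping ranges are rejected."""
        if low >= high:
            raise ValueError("low must be smaller than high")
        i = bisect.bisect_left(self._lows, low)
        if i > 0 and self._entries[i - 1][1] > low:
            raise ValueError("range overlaps the previous range")
        if i < len(self._entries) and self._entries[i][0] < high:
            raise ValueError("range overlaps the next range")
        self._lows.insert(i, low)
        self._entries.insert(i, (low, high, shard))

    def lookup(self, key):
        """Return the shard that owns the key, or raise KeyError."""
        i = bisect.bisect_right(self._lows, key) - 1
        if i >= 0:
            low, high, shard = self._entries[i]
            if low <= key < high:
                return shard
        raise KeyError(key)


if __name__ == "__main__":
    shard_map = RangeShardMap()
    shard_map.add_range(0, 100, "shard0")
    shard_map.add_range(100, 200, "shard1")
    print(shard_map.lookup(42), shard_map.lookup(150))
```

An application would consult such a map first and then open a connection to the shard it names, which is the routing role the shard map manager plays for a shard set.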

PRICING
https://azure.microsoft.com/en-us/pricing/details/sql-database/

Azure SQL Server Stretch Database

SQL Server Stretch Database
Dynamically stretch SQL Server databases to Azure
• Scale SQL Server 2016 using bottomless cloud storage
• Make warm and cold data available to users at low cost
• Access and query stretched data online
• Move data easily, with no query or application changes required
• Use with advanced security features like Always Encrypted
• Reduce maintenance and storage costs for on-premises data

AZURE STRETCH DATABASE
https://azure.microsoft.com/en-us/pricing/details/sql-server-stretch-database/ - Pricing

AZURE DOCUMENT DB
DocumentDB – NoSQL DBaaS (designed to leverage the programming standards JSON and JavaScript)
In DocumentDB, you can store and query schema-less JSON documents with order-of-millisecond response times
at any scale. DocumentDB provides containers for storing data called collections.
Key features:
• Schema-free and highly scalable
• Allows querying with a SQL-like syntax
• Allows JavaScript programming to execute transactional application logic using JavaScript-based triggers, UDFs, and stored procedures
• Data is always indexed automatically
• Easily integrates with Azure HDInsight, Azure Search, and other Azure services

AZURE DOCUMENT DB
RU
• Request Units (RU) per second is the unit of throughput measurement.
• A single request unit represents the processing capacity required to read a single 1 KB document.
• When you query a collection, Azure returns the request charge in the portal or, in code, through the
x-ms-request-charge response header, so you can gauge the cost of your queries.
• Many factors affect the request unit charge, such as the number of document properties, indexes,
document size, and data consistency level. The RU cost therefore differs from one application to another.
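A rough throughput estimate follows from multiplying each operation's rate by its request charge. In the sketch below only the 1 RU cost of reading a 1 KB document and the x-ms-request-charge header name come from the text above; the write charge, the operation rates, and the function names are hypothetical.

```python
def request_charge(headers):
    """Read the RU charge of one request from its response headers."""
    return float(headers["x-ms-request-charge"])


def required_throughput(workload):
    """Estimate the RU/s to provision.

    workload is a list of (operations_per_second, ru_charge_per_operation).
    """
    return sum(ops_per_second * charge for ops_per_second, charge in workload)


if __name__ == "__main__":
    # Charge observed on a single response (the value is made up for the example).
    observed = request_charge({"x-ms-request-charge": "2.46"})
    print("observed charge:", observed)

    workload = [
        (500, 1.0),  # 500 reads/s of 1 KB documents at 1 RU each
        (100, 5.0),  # 100 writes/s at a hypothetical 5 RU each
    ]
    print("RU/s to provision:", required_throughput(workload))
```

In practice the per-operation charges would be measured from real responses rather than assumed, since they depend on document size, indexing, and consistency level.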

AZURE DOCUMENT DB
Simplified structure

AZURE DATA WAREHOUSE
SQL Data Warehouse
• Petabyte scale with massively parallel processing
• Independent scaling of compute and storage—in seconds
• Transact-SQL queries across relational and non-relational data
• Full enterprise-class SQL Server experience
• Works seamlessly with Power BI, Machine Learning, HDInsight, and Data Factory
• Combines the proven SQL Server relational database with Azure cloud scale-out capabilities.
You can increase, decrease, pause, or resume compute in seconds.
MPP architecture spreads data across 60 shared-nothing storage and processing units. The data is
stored in Premium locally redundant storage and linked to compute nodes for query execution.

DWUs
Data Warehouse Unit (DWU) is a measure of three precise metrics that are highly correlated with data warehousing workload
performance:
• Scan/Aggregation: This workload metric takes a standard data warehousing query that scans a large number of rows and then
performs a complex aggregation. This is an I/O- and CPU-intensive operation.
• Load: This metric measures the ability to ingest data into the service. Loads are completed with PolyBase loading a
representative dataset from an Azure Storage Blob. This metric is designed to stress Network and CPU aspects of the service.
• CREATE TABLE AS SELECT (CTAS): CTAS measures the ability to create a copy of a table. This involves reading data from storage,
distributing it across the nodes of the appliance, and writing it to storage again. It is a CPU- and network-intensive operation.
Pricing
https://azure.microsoft.com/en-us/pricing/details/sql-data-warehouse/

MPP architecture
• Grow or shrink storage independent of compute
• Grow or shrink compute without moving data
• Pause compute capacity while keeping data intact
• Resume compute capacity at a moment's notice
Control node: The Control node manages and optimizes queries. It coordinates all
of the data movement and computation required to run parallel queries on your
distributed data.
Compute Nodes: The Compute nodes serve as the power behind SQL Data
Warehouse. They are SQL Databases which store your data and process your
query.
Storage: Data is stored in Azure Storage Blobs. When Compute nodes interact with
data, they write and read directly to and from blob storage. Since Azure storage
expands transparently and limitlessly, SQL Data Warehouse can do the same.
Data Movement Service: Data Movement Service (DMS) is Microsoft technology
for moving data between the nodes. DMS gives the Compute nodes access to data
they need for joins and aggregations. DMS is not an Azure service. It is a Windows
service that runs alongside SQL Database on all the nodes.
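The separation of storage and compute described above can be sketched as two independent mappings: rows to distributions and distributions to compute nodes. The number 60 comes from the earlier slide; the CRC32 hash and the round-robin assignment are stand-ins chosen for the example, since the slides do not say which functions the service uses.

```python
import zlib

NUM_DISTRIBUTIONS = 60  # storage and processing units mentioned earlier


def distribution_for(key):
    """Map a distribution key to one of the 60 distributions.

    CRC32 is a stand-in hash; it is stable across runs, unlike Python's hash().
    """
    return zlib.crc32(str(key).encode("utf-8")) % NUM_DISTRIBUTIONS


def assign_distributions(num_compute_nodes):
    """Attach distributions to compute nodes round-robin (assumed policy)."""
    if num_compute_nodes <= 0:
        raise ValueError("at least one compute node is required")
    mapping = {node: [] for node in range(num_compute_nodes)}
    for distribution in range(NUM_DISTRIBUTIONS):
        mapping[distribution % num_compute_nodes].append(distribution)
    return mapping


if __name__ == "__main__":
    # The row-to-distribution mapping does not depend on the node count,
    # so scaling compute only re-attaches distributions to nodes.
    print("customer 42 lives in distribution", distribution_for(42))
    for nodes in (1, 6, 60):
        per_node = len(assign_distributions(nodes)[0])
        print(nodes, "compute nodes ->", per_node, "distributions on node 0")
```

Because rows never change distribution when the node count changes, the sketch mirrors the claim that compute can grow or shrink without moving data.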

AZURE DATA WAREHOUSE LOAD DATA
Load options/utilities:
Load from Azure blob storage
• PolyBase - load in parallel using MPP architecture
• Azure Data Factory - pipeline that uses PolyBase to load data from Azure blob storage into SQL Data Warehouse
Load from SQL Server
• SSIS – does not perform the load in parallel. Unsupported data types must be converted.
• AzCopy – move flat files to Blob Storage (CLI). Consider if data size is < 10 TB.
• bcp – if you have a small amount of data, you can use bcp to load directly into Azure SQL Data
Warehouse.
• Disk shipping service Import/Export (recommended for > 10 TB data)
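The size guidance above can be written down as a small helper. The 10 TB boundary comes from the list; the cutoff for "a small amount of data" is not given, so `small_gb` is a hypothetical parameter, and treating exactly 10 TB as a disk-shipping case is an arbitrary choice.

```python
TEN_TB_IN_GB = 10 * 1024


def suggest_transfer(size_gb, small_gb=1.0):
    """Suggest how to move data from SQL Server toward SQL Data Warehouse.

    small_gb is an assumed threshold for data small enough to load with bcp.
    """
    if size_gb < 0:
        raise ValueError("size cannot be negative")
    if size_gb <= small_gb:
        return "bcp"
    if size_gb < TEN_TB_IN_GB:
        return "AzCopy"
    return "Import/Export"


if __name__ == "__main__":
    for size in (0.5, 500, 20 * 1024):
        print(size, "GB ->", suggest_transfer(size))
```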

AZURE STORAGE TYPES
Blob - For users with large amounts of unstructured object data to store in the cloud
• Good choice for storing documents, media files, backups etc.
Table - Is a key-attribute store, meaning that every value in a table is stored with a typed property name
• Table storage can be used to store flexible datasets, such as user data for web applications, address books, device
information, and any other type of metadata that your service requires
Queues - Provides a reliable messaging solution for asynchronous communication between application components
• Queue storage also supports managing asynchronous tasks and building process workflows
Files - cloud-based SMB file shares
• Applications running in Azure virtual machines or cloud services can mount a file share in the cloud. Data in the share can
be accessed via file system I/O APIs in the cloud.
• On-premises applications can call the File storage REST API to access data in a file share.

AZURE REDIS CACHE
Redis Cache – High throughput, low latency data access to build fast and scalable apps
• Advanced key-value store in memory
• Secure, dedicated open source Redis cache, managed by Microsoft
• Helps your application become more responsive even as user load increases
Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache, and
message broker.
It supports data structures such as strings, hashes, lists, sets, sorted sets with range
queries, bitmaps, HyperLogLogs, and geospatial indexes with radius queries. Redis has built-in replication, Lua
scripting, LRU eviction, transactions, and different levels of on-disk persistence.
It provides high availability via Redis Sentinel and automatic partitioning with Redis Cluster.
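How a cache with LRU eviction keeps an application responsive can be sketched without a Redis server. The class below is an in-process stand-in, not a Redis client, and it evicts the exact least recently used key, whereas Redis approximates LRU by sampling; `get_with_cache` and the loader function are hypothetical names.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny in-memory key-value store that evicts the least recently used key."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        """Return the cached value, or None on a miss."""
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def set(self, key, value):
        """Store a value and evict the oldest entry if the cache is full."""
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)


def get_with_cache(cache, key, load):
    """Cache-aside read: try the cache first, fall back to the slow source."""
    value = cache.get(key)
    if value is None:
        value = load(key)
        cache.set(key, value)
    return value


if __name__ == "__main__":
    calls = []

    def slow_lookup(key):
        calls.append(key)  # record every trip to the slow backing store
        return key.upper()

    cache = LRUCache(capacity=2)
    get_with_cache(cache, "a", slow_lookup)
    get_with_cache(cache, "a", slow_lookup)  # served from the cache
    print("backing store was hit", len(calls), "time(s)")
```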

AZURE DATA FACTORY
Azure Data Factory - cloud-based data integration service that orchestrates and automates the movement
and transformation of data
Data Factory is priced by the frequency of activities (high or low) and where the activities run (cloud or
on-premises).
A low-frequency activity occurs once a day or less; a high-frequency activity occurs more than once a
day. Charges for copying activities are based on source of data and calculated as per the data movement meters.
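The frequency rule above reduces to one comparison. A minimal sketch, assuming the schedule is expressed as an interval in hours; the function names are hypothetical and no prices are modelled.

```python
def runs_per_day(interval_hours):
    """Convert a schedule interval in hours to runs per day."""
    if interval_hours <= 0:
        raise ValueError("interval must be positive")
    return 24 / interval_hours


def activity_frequency(daily_runs):
    """Classify an activity: once a day or less is low, more is high."""
    if daily_runs < 0:
        raise ValueError("runs per day cannot be negative")
    return "low" if daily_runs <= 1 else "high"


if __name__ == "__main__":
    for hours in (1, 24, 168):
        print("every", hours, "h ->", activity_frequency(runs_per_day(hours)))
```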

AZURE DATA FACTORY
Pricing
https://azure.microsoft.com/en-us/pricing/details/data-factory/

AZURE DATA LAKE STORE
Azure Data Lake Store is designed to be an enterprise-wide, hyper scale repository for big data analytic
workloads.
In the data lake, you can easily capture data of any size, type and speed in a single place for the purposes of
operational and exploratory analytics.
• Built for Hadoop: A Hadoop Distributed File System for the Cloud
• Unlimited storage: No fixed limits on file size, account size, or the number of files
• Performance Tuned for Big Data: Optimized for massive throughput to query and analyze any amount of data
• Enterprise Grade Security: Azure Active Directory authentication and role-based access control
• All Data: Store data in its native format without prior transformation

AZURE DATA LAKE ANALYTICS
Azure U-SQL Job

AZURE DATA LAKE ANALYTICS
Pricing
https://azure.microsoft.com/en-us/pricing/details/data-lake-analytics/

PRICING CALCULATOR
Discount areas:
• Startup offers
• Visual Studio licensed developers
• Prepaid 12-month subscription
https://azure.microsoft.com/en-us/pricing/calculator/

AZURE SQL GOVERNMENT
Azure SQL Government
• The cloud platform designed to meet US government demands
• Physical and logical network-isolated instance of Azure
• Dedicated to US government with all data, applications, and hardware residing in the continental United
States
• Broad range of compliance certifications critical to US government
• US datacenters located more than 500 miles apart, providing true geographic redundancy
• Support for hybrid scenarios, as well as a vast array of services, programming languages, and tools
• Part of the complete Microsoft Cloud for Government solution
Compliant with
• FedRAMP certification
• DISA certification
• Support to enable IRS 1075 compliance
• Ability to issue HIPAA Business Associate Agreements
• Criminal Justice Information Services (CJIS)–capable Platform

The end
Thanks for listening!

RESOURCES USED
https://www.youtube.com/watch?v=AicqMIPpZKc - Hybrid Cloud Solutions with Microsoft Azure - For Architects (2015)
https://azure.microsoft.com/en-us/documentation/services/sql-database/ - Azure SQL Database documentation
https://www.youtube.com/watch?v=mi-lilKoYok – Elastic Scale Azure Databases (2016)
https://www.youtube.com/watch?v=N2N5TbWmCcU - Azure SQL Database for Business-Critical Cloud Applications (2016)
https://channel9.msdn.com/events/Ignite/Microsoft-Ignite-New-Zealand-2015/M378 - Elastic for SQL – shards, pools, stretch
https://azure.microsoft.com/en-us/documentation/videos/azurecon-2015-overview-of-azure-sql-data-warehouse/ - Overview of Azure SQL Data Warehouse
https://www.youtube.com/watch?v=mSDz6O0bhyc - Azure Data Lake Deep Dive
Azure Documentation and Videos: https://azure.microsoft.com/en-us/
