Rodan Zadeh, Director of Product Management at Attunity, talks about how to optimize data for the logical data warehouse, for the Cisco Virtual Tradeshow.
Optimize Data for the Logical Data Warehouse
1. Optimize Data for the Logical Data Warehouse
www.attunity.com
Get more value from your data!
2. Enterprise Data Challenges Today
[Diagram: data sources (ERP, CRM, POS, legacy systems, logs, sensor files) feeding data warehouses and databases across multiple platforms, serving several business lines (marketing, operations, sales). Challenges: exploding data, escalating costs, lack of visibility, increasing complexity.]
4. The Need for Data Warehouse Optimization
• A significant amount of data in the data warehouse is unused/dormant
• ETL/ELT processes for unused data unnecessarily consume CPU capacity
• Dormant data consumes unnecessary storage capacity
[Chart: hot/warm/cold data; transformations (ELT) of unused data and storage capacity for dormant data. System resources split 65% on transformations/data loads vs. 35% on analytical queries.]
5. Attunity Visibility: Enterprise Data Usage Analytics
• Single console across platforms: Teradata | Exadata | DB2 | Netezza | Hadoop
• Modernize the data warehouse with Hadoop by identifying intensive workloads and unused data
• Optimize storage by identifying frequently and infrequently used data
• Improve performance by diagnosing bottlenecks based on data usage
• Charge-back / show-back activity and usage by business lines and departments
• Track user activity on sensitive data for audit and compliance
6. Customer Success

Fortune 50 Bank
• Data warehouse at 600+ TB
• Data growth of 50% every year
• Cost prohibitive, poor performance
With Attunity Visibility:
• Offloaded to a 300-node Hadoop cluster
• Saved over $15 million in data warehouse costs over 3 years

Online Travel Site
• Data warehouse at 300+ TB
• System at maximum capacity
• No visibility into business use of data
With Attunity Visibility:
• Offloaded to a 500-node Hadoop cluster
• Saved over $6 million in data warehouse costs in less than 2 years
7. Data Usage
• Unused data (e.g. tables with no ‘SELECT’ statements)
• Unused databases with the largest number of tables by size; drill down to identify specific tables
• 70 terabytes found in unused databases
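Attunity Visibility's actual detection logic is proprietary, but the idea on this slide, flagging tables that are loaded yet never read, can be illustrated with a minimal Python sketch. The table names, sizes, log format, and regex heuristic below are invented for the example; a real tool would use the warehouse's query log and catalog.

```python
import re

# Hypothetical inventory: table name -> size in GB (illustrative values only)
TABLE_SIZES_GB = {
    "sales.orders": 1200,
    "sales.order_history": 48000,
    "staging.clickstream_raw": 22000,
    "finance.ledger": 900,
}

def referenced_tables(query_log):
    """Collect tables that appear in at least one SELECT statement."""
    used = set()
    pattern = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)
    for sql in query_log:
        if not sql.lstrip().upper().startswith("SELECT"):
            continue  # loads (INSERT/UPDATE) don't count as business use
        used.update(t.lower() for t in pattern.findall(sql))
    return used

def unused_tables_by_size(query_log):
    """Tables with zero SELECT references, largest first: offload candidates."""
    used = referenced_tables(query_log)
    unused = {t: s for t, s in TABLE_SIZES_GB.items() if t not in used}
    return sorted(unused.items(), key=lambda kv: kv[1], reverse=True)

log = [
    "SELECT * FROM sales.orders WHERE order_date > '2015-01-01'",
    "INSERT INTO sales.order_history SELECT * FROM staging.orders_delta",
    "SELECT sum(amount) FROM finance.ledger",
]
for table, size in unused_tables_by_size(log):
    print(table, size)
```

Note that `sales.order_history` is written to daily by ELT but never queried, which is exactly the pattern the slide describes: storage and load capacity spent on data the business does not read.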
8. Data Usage
• History of data used in a large “fact” table
• Queries go back only 2 years, yet the table maintains 8 years of data
14. Attunity Visibility Architecture
EDW database platforms → Visibility processes (Collector, Cataloger, Analyzer, Populator, Purger) → Central Repository → web application with dashboards & analytics (user activity, data usage, workload, performance)

Key Components
• Repository - Centrally stores analyzed queries & performance metrics
• Cataloger - Takes a snapshot of DW metadata/schema
• Collector - Collects information from query logs
• Analyzer - Analyzes and parses collected data; builds & stores a full parse tree
• Populator - Aggregates & moves parsed data from the Target Schema into the Reporting Schema
• Purger - Removes old data from the Repository
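The component flow above can be sketched as a small pipeline. This is purely illustrative; the record shapes, stage interfaces, and log format are assumptions for the sketch, not Attunity's actual implementation (in particular, the real Analyzer builds a full SQL parse tree, where here we only extract the statement type as a stand-in).

```python
from collections import Counter

def collect(raw_log_lines):
    """Collector stage: pull raw entries from query logs (assumed 'user|cpu_ms|sql')."""
    for line in raw_log_lines:
        user, cpu_ms, sql = line.split("|", 2)
        yield {"user": user, "cpu_ms": int(cpu_ms), "sql": sql}

def analyze(records):
    """Analyzer stage: parse each query; here just the statement type."""
    for rec in records:
        rec["stmt_type"] = rec["sql"].lstrip().split()[0].upper()
        yield rec

def populate(records):
    """Populator stage: aggregate parsed records into a reporting summary."""
    cpu_by_type = Counter()
    for rec in records:
        cpu_by_type[rec["stmt_type"]] += rec["cpu_ms"]
    return dict(cpu_by_type)

log = [
    "etl_svc|900|INSERT INTO dw.fact SELECT * FROM stage.fact",
    "analyst|120|SELECT region, sum(rev) FROM dw.fact GROUP BY region",
    "etl_svc|750|UPDATE dw.dim SET flag = 1",
]
summary = populate(analyze(collect(log)))
print(summary)  # CPU time split between loads (INSERT/UPDATE) and queries (SELECT)
```

Aggregating CPU time by statement type is what makes the earlier transformations-vs-queries split measurable, and attributing it by user is the basis for the charge-back/show-back reporting.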
15. Attunity and Cisco: Solutions for the Logical Data Warehouse
• Build the ROI case for the logical data warehouse with Attunity Visibility
• Ingest data to fill and expand the logical data warehouse with Attunity Replicate
Start getting more value from your Big Data today!
16. Thank you!
For more information, send an e-mail to sales@attunity.com or go to www.attunity.com
Editor's Notes
Today IT is faced with enormous challenges in delivering data to the enterprise.
Data is growing exponentially but IT budgets are staying flat. You cannot continue to invest in infrastructure at the same rate of data growth.
In addition, data is increasingly being delivered through multiple platforms, making it very complex to efficiently manage and optimize the environment.
It is also very difficult for IT to prioritize and justify investments without the ability to charge back or show back utilization (of data and system resources) by business line.
On the other hand, the business expects data in real time, at the right place at the right time, and wants to extract value from it as quickly as possible.
ELT processes are driving up data warehousing costs.
Our experience in analyzing data usage at large organizations shows that a significant amount of data is not being used – but is continuously loaded on a daily basis.
Dormant data not only is taking up storage capacity, but the bigger impact is the processing capacity in terms of CPU and I/O that is wasted on running ELT on the data warehouse - to load data that the business does not actively use.
Admittedly, in many situations – organizations are required by regulatory reasons to maintain a history of data – even if it is not being used.
So the best approach to significantly cut data warehousing costs is twofold: eliminate batch loads for data that is not used and not needed, and, more importantly, offload the ELT processes for unused data that must be maintained to Hadoop, actively archiving that unused data there. This way you can recover the wasted capacity from your expensive data warehouse systems.