Big Data Management: What's New, What's Different, and What You Need To Know


Today’s Featured Presenter
Matt Aslett
Research Director,
Data Platforms and Analytics
451 Research
As Research Director, Matt has overall responsibility for the data platforms and
analytics research coverage, which includes operational and analytic databases,
Hadoop, grid/cache, stream processing, search-based data platforms, data
integration, data quality, data management, analytics, and advanced analytics.
Matt's own primary area of focus includes data management, reporting and
analytics, and exploring how the various data platform and analytics technology
sectors are converging in the form of next-generation data platforms.

Agenda
• Big Data Management
– Matt Aslett, 451 Research
• SnapLogic Overview
• SnapLogic Demonstration
– Ravi Dharnikota, Head of SnapLogic Enterprise Architecture
• Q&A

Copyright (C) 2016 451 Research LLC
Big Data Management
Matt Aslett, Research Director

451 Research is a leading IT research & advisory company
Founded in 2000
250+ employees, including over 100 analysts
1,000+ clients: Technology & Service providers, corporate
advisory, finance, professional services, and IT decision makers
50,000+ IT professionals, business users and consumers in our research
community
Over 52 million data points published each quarter and 4,500+ reports
published each year
2,000+ technology & service providers under coverage
451 Research and its sister company, Uptime Institute, are the two divisions
of The 451 Group
Headquartered in New York City, with offices in London, Boston, San
Francisco, Washington DC, Mexico, Costa Rica, Brazil, Spain, UAE, Russia,
Taiwan, Singapore and Malaysia
Research & Data
Advisory
Events
Go 2 Market

Big data and beyond
• V is for various things…
but does not define big data

Big data and beyond
• To understand the trends driving ‘big data’, 451 Research looked beyond the nature of the data to what enterprises wanted to do with it:
• Totality – storing and processing all data (or as much as is economically viable)
• Exploration – schema-free approaches to analyzing data to identify new patterns
• Frequency – more frequent analysis of data to enable real-time decision making

‘Big data’ is primarily driven by economics, not data
• ‘Big Data’ is the realization of competitive advantage based on the fact that it is now
more economically feasible to store and process data that was previously ignored due
to the cost and functional limitations of traditional data management technologies to
handle its volume, velocity and variety

“Big data is what happened when the cost of keeping information became less than the cost of throwing
it away.”
George Dyson

• Moved from storing 1% of data for 60 days in EDW @ $100,000/TB
• To 100% of data for a year in Hadoop @ $900/TB
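The economics behind these two figures can be sketched with simple arithmetic. The example below is only an illustration: the daily data volume (`DAILY_TB`) and the assumption that the $/TB figures apply to stored capacity are hypothetical and do not come from the slide.

```python
# Rough comparison of the two retention strategies quoted above.
# DAILY_TB is a hypothetical ingest rate; only the percentages,
# retention periods and $/TB figures are taken from the slide.

DAILY_TB = 1.0  # assumed: 1 TB of new data generated per day


def stored_tb(daily_tb: float, fraction_kept: float, retention_days: int) -> float:
    """Capacity needed to hold `fraction_kept` of the data for `retention_days`."""
    return daily_tb * fraction_kept * retention_days


def storage_cost(tb: float, cost_per_tb: float) -> float:
    """Total cost, assuming the $/TB figure applies to stored capacity."""
    return tb * cost_per_tb


edw_tb = stored_tb(DAILY_TB, 0.01, 60)      # 1% of data for 60 days
hadoop_tb = stored_tb(DAILY_TB, 1.00, 365)  # 100% of data for a year

edw_cost = storage_cost(edw_tb, 100_000)
hadoop_cost = storage_cost(hadoop_tb, 900)

print(f"EDW:    {edw_tb:8.1f} TB stored, ${edw_cost:,.0f}")
print(f"Hadoop: {hadoop_tb:8.1f} TB stored, ${hadoop_cost:,.0f}")
print(f"Data retained: {hadoop_tb / edw_tb:,.0f}x more")
print(f"Cost per TB:   {100_000 / 900:,.0f}x lower")
```

Under these assumptions the Hadoop deployment holds several hundred times more data for a total cost within the same order of magnitude, which is the point of the Dyson quote.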

The evolution of enterprise analytics
[Figure: stages of enterprise analytics plotted against complexity]
• Reporting – What happened?
• Analysis – Why did it happen?
• Descriptive – What is happening?
• Predictive – What will happen?
• Prescriptive – Influence what happens
• Techniques: visualization, statistical modeling, machine learning
• Who drives it: IT-driven, user-driven, automated
• Data sources: from structured (RDBMS, historical) to multi-structured (RDBMS, Hadoop, NoSQL, stream processing, historical and real-time)
Source: 451 Research, Total Data Analytics 2016

EDW vs Hadoop (Schema-on-write vs schema-on-read)
Source: https://www.flickr.com/photos/wbaiv/16510090506/ Source: https://www.flickr.com/photos/notbrucelee/5696238930/

Schema-on-write
• Pre-prepared
• Single-purpose
• Some assembly required
• Inflexible

Schema-on-read
• Flexible
• Reusable
• Some imagination required*
• Multi-purpose
• *Instructions available if desired
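The difference between the two approaches can be illustrated with a small sketch using only the Python standard library. The sample records, the `sales` table and the `read_with_schema` helper are all hypothetical; SQLite stands in for a warehouse and a list of raw JSON lines stands in for files landed in Hadoop.

```python
import json
import sqlite3

# Hypothetical raw events, as they might arrive from a source system.
raw_events = [
    '{"customer": "a", "amount": 10.0}',
    '{"customer": "b", "amount": 5.5, "coupon": "SPRING"}',
]

# Schema-on-write: the structure is fixed before loading, so any field
# that is not part of the table (here "coupon") is dropped at load time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT NOT NULL, amount REAL NOT NULL)")
for line in raw_events:
    rec = json.loads(line)
    conn.execute(
        "INSERT INTO sales (customer, amount) VALUES (?, ?)",
        (rec["customer"], rec["amount"]),
    )
total_on_write = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]


# Schema-on-read: the raw data is kept as-is and each consumer applies
# the structure it needs when it reads.
def read_with_schema(lines, fields):
    """Project each raw record onto the fields requested by the reader."""
    for line in lines:
        rec = json.loads(line)
        yield {field: rec.get(field) for field in fields}


total_on_read = sum(r["amount"] for r in read_with_schema(raw_events, ["amount"]))
coupons = [
    r["coupon"]
    for r in read_with_schema(raw_events, ["customer", "coupon"])
    if r["coupon"] is not None
]

print(total_on_write, total_on_read, coupons)
```

Both approaches answer the original revenue question, but only the schema-on-read copy can still answer a later question about coupons, because nothing was discarded at load time.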

Hadoop-based data lakes
• The concept of the data lake has taken off in recent years, with the Apache Hadoop data-processing framework serving as the unified repository into which raw data is landed from multiple sources and made available to multiple users for multiple purposes.
Photo: Myrabella / Wikimedia Commons, CC BY-SA 3.0,
https://commons.wikimedia.org/w/index.php?curid=11263585

Hadoop-based data lakes
• Beware the data swamp
https://www.flickr.com/photos/lofink/4501610335/

Data governance, data preparation and the data lake
• Data needs to be filtered, processed, treated and managed to make it suitable for multiple analytics use cases.
• Data governance
– Data catalog
– Data security
– Data lineage
– Data inventory
– Data quality
– Data pipelines
• Data preparation
– Data discovery
– Data cleansing
– Data harmonization
– Data enrichment
– Data matching
– Collaboration

Data governance, data preparation and the data lake
[Diagram: the data lake with its surrounding functions and roles]
• Data governance: data lineage, data inventory, data catalog, data security, data quality, data pipelines
• Self-service data preparation: data cleansing, data harmonization, data discovery, collaboration, data matching, data enrichment
• Analytics: advanced analytics, self-service analytics
• Connected sources and consumers: data-as-a-service, partners, suppliers, applications
• Roles: IT, data stewards, data scientists, senior executives, business analysts, data analysts

Hadoop and other animals

Recommendations
• Enterprises should seriously consider the data governance and management requirements before
embarking on data lake projects to ensure that the functionality is available to turn the concept into
reality.
• For flexibility and agility, employ data management approaches and technologies that abstract data
processing pipelines from the execution environment.
• Look for data integration and transformation technologies that execute natively, taking advantage of
the underlying engine (e.g. Spark, YARN).
• Seek out data management and integration technologies that enable consumption and
transformation of large volumes of structured and unstructured data.
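The second recommendation, abstracting the pipeline from the execution environment, can be sketched as follows. This is a minimal illustration rather than any vendor's design: the `Pipeline` class, the two runner functions and the sample steps are hypothetical, and a thread pool merely stands in for a distributed engine such as Spark.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

Step = Callable[[dict], dict]


class Pipeline:
    """A pipeline is only an ordered list of record-level steps.

    It knows nothing about where or how it will be executed.
    """

    def __init__(self, steps: List[Step]):
        self.steps = steps

    def apply(self, record: dict) -> dict:
        for step in self.steps:
            record = step(record)
        return record


def run_local(pipeline: Pipeline, records: Iterable[dict]) -> List[dict]:
    """Execution environment 1: a plain sequential loop."""
    return [pipeline.apply(r) for r in records]


def run_parallel(pipeline: Pipeline, records: Iterable[dict], workers: int = 4) -> List[dict]:
    """Execution environment 2: a thread pool (stand-in for a cluster engine)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(pipeline.apply, records))


# Hypothetical transformation steps.
def normalize_name(record: dict) -> dict:
    return {**record, "name": record["name"].strip().title()}


def add_total(record: dict) -> dict:
    return {**record, "total": record["qty"] * record["price"]}


pipeline = Pipeline([normalize_name, add_total])
records = [
    {"name": " alice ", "qty": 2, "price": 3.0},
    {"name": "BOB", "qty": 1, "price": 4.5},
]

local_result = run_local(pipeline, records)
parallel_result = run_parallel(pipeline, records)
print(local_result == parallel_result)
```

Because the pipeline definition never refers to a runner, moving to a different engine means adding another runner function, not rewriting the transformation logic.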

Thank You!
matthew.aslett@451research.com
@maslett
www.451research.com

SnapLogic Elastic Integration
Accelerate Your Integration. Accelerate Your Business
“We can do more in two hours with SnapLogic than we could in two days with traditional solutions.”

Big Data and hybrid cloud environments are making
yesterday’s approaches to integration obsolete

SnapLogic in the Modern Data Fabric: Ingest, Transform, Deliver
[Diagram: data flows from Source through Store & Process to Consume]
• Source: on-prem applications, relational databases, cloud applications, NoSQL databases, web logs, Internet of Things
• Ingest: data integration and transformation
• Store & Process: HANA, data warehouses & data marts, big data and data lakes
• Deliver: to Consume

Modern Architecture: Hybrid and Elastic Execution
• Streams: No data is stored/cached
• Secure: 100% standards-based
• Elastic: Scales out & handles data and app integration use cases
[Diagram labels: Cloud-Based Designer, Manager, Dashboard; Execution; Firewall; Metadata; Data; Databases; On Prem Apps; Big Data; Cloud Apps and Data]
SnapLogic “respects data’s gravity.”

Discussion
Matt Aslett
Research Director,
Data Platforms and Analytics
451 Research
Ravi Dharnikota
Head of Enterprise Architecture
SnapLogic

Integrate at the speed of
modern business
+1 888-494-1570
sales@snaplogic.com
@SnapLogic
www.snaplogic.com
