This presentation is from a recorded webinar in which 451 Research analyst and thought leader Matt Aslett discusses the growing importance of sound data management practices and techniques for delivering on the promise of big data in the enterprise. Matt reviews the big data landscape, how the data lake complements and competes with the data warehouse, and key takeaways for moving from big data test and development environments to production. You can watch the webinar here: http://bit.ly/25ShiQu
2. Today’s Featured Presenter
Matt Aslett
Research Director,
Data Platforms and Analytics
451 Research
As Research Director, Matt has overall responsibility for the data platforms and
analytics research coverage, which includes operational and analytic databases,
Hadoop, grid/cache, stream processing, search-based data platforms, data
integration, data quality, data management, analytics, and advanced analytics.
Matt's own primary area of focus includes data management, reporting and
analytics, and exploring how the various data platform and analytics technology
sectors are converging in the form of next-generation data platform
3. Agenda
• Big Data Management
– Matt Aslett, 451 Research
• SnapLogic Overview
• SnapLogic Demonstration
– Ravi Dharnikota, Head of SnapLogic Enterprise Architecture
• Q&A
4. Copyright (C) 2016 451 Research LLC
Big Data Management
Matt Aslett, Research Director
5. 451 Research is a leading IT research & advisory company
• Founded in 2000
• 250+ employees, including over 100 analysts
• 1,000+ clients: Technology & Service providers, corporate advisory, finance, professional services, and IT decision makers
• 50,000+ IT professionals, business users and consumers in our research community
• Over 52 million data points published each quarter and 4,500+ reports published each year
• 2,000+ technology & service providers under coverage
• 451 Research and its sister company, Uptime Institute, are the two divisions of The 451 Group
• Headquartered in New York City, with offices in London, Boston, San Francisco, Washington DC, Mexico, Costa Rica, Brazil, Spain, UAE, Russia, Taiwan, Singapore and Malaysia
• Research & Data | Advisory | Events | Go 2 Market
8. Big data and beyond
• V is for various things… but does not define big data
• To understand the trends driving ‘big data’, 451 Research looked beyond the nature of the data to what enterprises wanted to do with it
• Totality – storing and processing all data (or as much as is economically viable)
• Exploration – schema-free approaches to analyzing data to identify new patterns
• Frequency – more frequent analysis of data to enable real-time decision making
11. ‘Big data’ is primarily driven by economics, not data
“Big data is what happened when the cost of keeping information became less than the cost of throwing
it away.”
George Dyson
• ‘Big Data’ is the realization of competitive advantage based on the fact that it is now
more economically feasible to store and process data that was previously ignored due
to the cost and functional limitations of traditional data management technologies to
handle its volume, velocity and variety
• Moved from storing 1% of data for 60 days in EDW @ $100,000/TB
• To 100% of data for a year in Hadoop @ $900/TB
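The two figures above can be turned into a rough cost comparison. The sketch below is illustrative only: the constant ingest rate of 1 TB per day and the simple steady-state retention model are assumptions that do not appear in the presentation. Only the percentages, retention periods and per-TB prices come from the slide.

```python
# Rough storage-cost comparison using the slide's figures.
# ASSUMPTIONS (not from the presentation): a constant ingest rate of
# 1 TB/day, and a steady-state model in which the retained volume is
# daily volume * fraction kept * retention days.

def retained_tb(daily_tb: float, fraction_kept: float, retention_days: int) -> float:
    """Volume held at steady state under a given retention policy."""
    return daily_tb * fraction_kept * retention_days


def storage_cost(tb: float, cost_per_tb: float) -> float:
    """Total cost of holding `tb` terabytes at a flat per-TB price."""
    return tb * cost_per_tb


DAILY_TB = 1.0  # hypothetical ingest volume

# Slide figures: 1% of data for 60 days in the EDW at $100,000/TB.
edw_tb = retained_tb(DAILY_TB, 0.01, 60)
edw_cost = storage_cost(edw_tb, 100_000)

# Slide figures: 100% of data for a year in Hadoop at $900/TB.
hadoop_tb = retained_tb(DAILY_TB, 1.00, 365)
hadoop_cost = storage_cost(hadoop_tb, 900)

print(f"EDW:    {edw_tb:.1f} TB retained, ${edw_cost:,.0f}")
print(f"Hadoop: {hadoop_tb:.1f} TB retained, ${hadoop_cost:,.0f}")
print(f"Per-TB price ratio: {100_000 / 900:.0f}x")
```

Under these assumed volumes, the Hadoop policy retains roughly 600 times as much data for about five and a half times the spend, which is the economic shift the slide describes.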
13. The evolution of enterprise analytics
Source: 451 Research, Total Data Analytics 2016
• Stages:
– REPORTING - What happened?
– ANALYSIS - Why did it happen?
– DESCRIPTIVE - What is happening?
– PREDICTIVE - What will happen?
– PRESCRIPTIVE - Influence what happens
• Techniques: VISUALIZATION, STATISTICAL MODELING, MACHINE LEARNING
• Axis labels: Complexity; Automated, User-driven, IT-driven
• Data sources: Structured, RDBMS, historical
• Data sources: Multi-structured RDBMS, Hadoop, NoSQL, stream processing, historical and real-time
14. EDW vs Hadoop (Schema-on-write vs schema-on-read)
Source: https://www.flickr.com/photos/wbaiv/16510090506/
Source: https://www.flickr.com/photos/notbrucelee/5696238930/
16. Schema-on-read
Source: https://www.flickr.com/photos/notbrucelee/5696238930/
• Flexible
• Reusable
• Some imagination required*
• Multi-purpose
• *Instructions available if desired
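The contrast between schema-on-write and schema-on-read can be illustrated with a minimal Python sketch. It is not how an EDW or Hadoop implements either approach. The records, field names and helper functions are hypothetical, chosen only to show a fixed schema enforced before storage versus a schema applied when the data is read.

```python
import json

# Hypothetical raw events; field names and values are illustrative only.
RAW_EVENTS = [
    '{"user": "a1", "amount": "19.99", "channel": "web"}',
    '{"user": "b2", "amount": "5.00"}',
    '{"user": "c3", "clicks": 7, "channel": "mobile"}',
]

# Schema-on-write: a fixed schema is enforced before anything is stored.
WRITE_SCHEMA = ("user", "amount", "channel")


def write_with_schema(lines):
    """Store only records that match the fixed schema; reject the rest."""
    table, rejected = [], []
    for line in lines:
        record = json.loads(line)
        if set(record) == set(WRITE_SCHEMA):
            table.append(tuple(record[field] for field in WRITE_SCHEMA))
        else:
            rejected.append(line)
    return table, rejected


# Schema-on-read: raw lines are kept as-is and a schema (here, a simple
# projection of fields) is applied only when the data is queried.
def read_with_schema(lines, fields, default=None):
    """Project each raw record onto the fields a given use case needs."""
    for line in lines:
        record = json.loads(line)
        yield {field: record.get(field, default) for field in fields}


table, rejected = write_with_schema(RAW_EVENTS)
spend_view = list(read_with_schema(RAW_EVENTS, ("user", "amount")))
clicks_view = list(read_with_schema(RAW_EVENTS, ("user", "clicks")))

print(len(table), "stored,", len(rejected), "rejected on write")
print(spend_view)
print(clicks_view)
```

The same raw lines serve two different views on read (flexible, reusable, multi-purpose), while the write-time schema keeps only the one record that fits the predefined shape.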
17. Hadoop-based data lakes
• The concept of the data lake has taken off in recent years, with the Apache Hadoop data-processing framework serving as the unified repository into which raw data is landed from multiple sources and made available to multiple users for multiple purposes.
Photo: Myrabella / Wikimedia Commons, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=11263585
18. Hadoop-based data lakes
• Beware the data swamp
https://www.flickr.com/photos/lofink/4501610335/
19. Data governance, data preparation and the data lake
• Data needs to be filtered, processed, treated and managed to make it suitable for multiple analytics use cases.
• Data governance
– Data catalog
– Data security
– Data lineage
– Data inventory
– Data quality
– Data pipelines
• Data preparation
– Data discovery
– Data cleansing
– Data harmonization
– Data enrichment
– Data matching
– Collaboration
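A few of the terms on this slide (data cleansing, data harmonization, data matching and data lineage) can be made concrete with a small sketch. Everything in it is hypothetical: the two sources, the records, the country mapping and the function names are invented for illustration and do not represent any particular product's behavior.

```python
# Hypothetical customer records from two sources; all values are invented.
crm_rows = [
    {"name": "  Alice Smith ", "country": "UK"},
    {"name": "Bob Jones", "country": "United Kingdom"},
]
web_rows = [
    {"name": "alice smith", "country": "GB"},
    {"name": "Carol Diaz", "country": "es"},
]

# Illustrative mapping of country variants to a single code.
COUNTRY_CODES = {"uk": "GB", "united kingdom": "GB", "gb": "GB", "es": "ES"}


def cleanse(row):
    """Data cleansing: collapse stray whitespace and normalise name case."""
    name = " ".join(row["name"].split()).title()
    return {"name": name, "country": row["country"].strip()}


def harmonize(row):
    """Data harmonization: map country variants onto one representation."""
    code = COUNTRY_CODES.get(row["country"].lower(), row["country"].upper())
    return {**row, "country": code}


def prepare(rows, source):
    """Apply the steps and record lineage (where a row came from, what ran)."""
    prepared_rows = []
    for row in rows:
        prepared = harmonize(cleanse(row))
        prepared["lineage"] = {"source": source, "steps": ["cleanse", "harmonize"]}
        prepared_rows.append(prepared)
    return prepared_rows


def match(left, right):
    """Data matching: pair records that agree on name and country."""
    index = {(r["name"], r["country"]): r for r in left}
    pairs = []
    for r in right:
        key = (r["name"], r["country"])
        if key in index:
            pairs.append((index[key], r))
    return pairs


crm = prepare(crm_rows, "crm")
web = prepare(web_rows, "web")
matches = match(crm, web)

for left, right in matches:
    print(left["name"], left["country"], "<->", right["lineage"]["source"])
```

Without the cleansing and harmonization steps, "  Alice Smith " / "UK" and "alice smith" / "GB" would not match, which is one small-scale way a data lake drifts toward a data swamp.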
20. Data governance, data preparation and the data lake
Diagram labels:
• DATA-AS-A-SERVICE, PARTNERS, SUPPLIERS, APPLICATIONS
• DATA LAKE
• DATA GOVERNANCE: Data lineage, Data inventory, Data catalog, Data security, Data quality, Data pipelines
• SELF-SERVICE DATA PREPARATION: Data cleansing, Data harmonization, Data discovery, Collaboration, Data matching, Data enrichment
• ADVANCED ANALYTICS: DATA SCIENTISTS
• SELF-SERVICE ANALYTICS: SENIOR EXECUTIVES, BUSINESS ANALYSTS, DATA ANALYSTS
• Other roles shown: IT, DATA STEWARDS
22. Recommendations
• Enterprises should seriously consider the data governance and management requirements before
embarking on data lake projects to ensure that the functionality is available to turn the concept into
reality.
• For flexibility and agility, employ data management approaches and technologies that abstract data
processing pipelines from the execution environment.
• Look for data integration and transformation technologies that execute natively, taking advantage of
the underlying engine (e.g. Spark, YARN).
• Seek out data management and integration technologies that enable consumption and
transformation of large volumes of structured and unstructured data.
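The second recommendation, abstracting data processing pipelines from the execution environment, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `Pipeline`, `LocalRunner` and `ThreadedRunner` names are hypothetical, and the thread pool from the standard library stands in for a real engine such as Spark. It does not use or imitate any Spark, YARN or SnapLogic API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Step = Callable[[dict], dict]


class Pipeline:
    """Engine-agnostic pipeline: an ordered list of record-level steps."""

    def __init__(self, steps: List[Step]):
        self.steps = steps

    def apply(self, record: dict) -> dict:
        for step in self.steps:
            record = step(record)
        return record


class LocalRunner:
    """Hypothetical runner: executes the pipeline sequentially in-process."""

    def run(self, pipeline: Pipeline, records: List[dict]) -> List[dict]:
        return [pipeline.apply(record) for record in records]


class ThreadedRunner:
    """Hypothetical runner: a thread pool standing in for a parallel engine."""

    def __init__(self, workers: int = 4):
        self.workers = workers

    def run(self, pipeline: Pipeline, records: List[dict]) -> List[dict]:
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            # Executor.map preserves input order in its results.
            return list(pool.map(pipeline.apply, records))


# Illustrative steps; the field names are invented for this example.
def add_total(record: dict) -> dict:
    return {**record, "total": record["qty"] * record["price"]}


def flag_large(record: dict) -> dict:
    return {**record, "large": record["total"] >= 100}


pipeline = Pipeline([add_total, flag_large])
orders = [{"qty": 2, "price": 10.0}, {"qty": 5, "price": 30.0}]

local_result = LocalRunner().run(pipeline, orders)
threaded_result = ThreadedRunner().run(pipeline, orders)
print(local_result == threaded_result)
```

Because the pipeline definition knows nothing about how it is run, swapping the runner changes the execution environment without touching the transformation logic, which is the flexibility the recommendation is after.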
24. SnapLogic Elastic Integration
Accelerate Your Integration. Accelerate Your Business
“We can do more in two hours with SnapLogic than we could in two days with traditional solutions.”
25. Big Data and hybrid cloud environments are making yesterday’s approaches to integration obsolete
26. SnapLogic: Unified Platform for Data and Application Integration
• Anything: apps | data | APIs | things
• Anytime: batch | streaming | real-time
• Anywhere: on prem | cloud | hybrid
27. SnapLogic in the Modern Data Fabric: Ingest, Transform, Deliver
• Source: On Prem Applications, Relational Databases, Cloud Applications, NoSQL Databases, Web Logs, Internet of Things
• Store & Process: HANA, Data Warehouses & Data Marts, Big Data and Data Lakes
• Consume
• INGEST, Data Integration and Transformation, DELIVER
28. Modern Architecture: Hybrid and Elastic Execution
• Streams: No data is stored/cached
• Secure: 100% standards-based
• Elastic: Scales out & handles data and app integration use cases
• Diagram labels: Cloud-Based Designer, Manager, Dashboard; Execution (×3); Firewall; Metadata; Data; Databases; On Prem Apps; Big Data; Cloud Apps and Data
SnapLogic “respects data’s gravity.”
31. Integrate at the speed of modern business
+1 888-494-1570
sales@snaplogic.com
@SnapLogic
www.snaplogic.com
Editor's Notes
Cast your mind back to 2010/11 – everyone is trying to define ‘big data’ with words beginning with V. 451 Research took a different tack
Connecting applications or data from multiple sources is not new – ESB, SOA, ETL have been around for a long time. But the old ways are not keeping up with today’s realities…
Leading enterprises choose SnapLogic because we help them connect data and applications faster.
We connect anything: sources including applications, APIs, things, or data
We connect anytime: in batches, streaming, or in real time
And we connect anywhere: on premises, in the cloud or a combination of both
Here is an example of a SnapLogic deployment.
The SnapLogic control plane, including the Designer, Manager and Dashboard, does not store your data. It’s metadata only.
Once a pipeline is executed, it looks for the associated Snaplex or Hadooplex. The plex dynamically scales out, adding more nodes as needed.
We like to say that SnapLogic “respects data gravity” and runs as close to the data as need be. If you are integrating only cloud applications, it would make no sense to run your integrations behind the firewall. Similarly, if you’re doing ground-to-ground or cloud-to-ground integration, you may want to run your Snaplex on Windows or Linux servers.
Note that the dotted line is sending instructions via metadata to the plex, which is waiting to run. The solid line indicates how data moves bi-directionally between systems.