SlideShare a Scribd company logo
1 of 45
Download to read offline
The Shifting
Landscape of Data
Integration
Presented by: William McKnight
“#1 Global Influencer in Data Warehousing” Onalytica
President, McKnight Consulting Group Inc 5000
@williammcknight
www.mcknightcg.com
(214) 514-1444
© All rights reserved. Matillion 2021
A vendor perspective from Matillion
The Shifting Landscape of
Data Integration
© All rights reserved. Matillion 2021
2
Paul Lacey
Sr. Director Product Marketing
Matillion
Speakers
© All rights reserved. Matillion 2021
New Challenges in the Cloud
From 3 Vs to 3 Ds
© All rights reserved. Matillion 2021
© All rights reserved. Matillion 2021
Cloud Data
Challenges
Source: Calming the Storm with Cloud Data Management
- Webinar - Matillion & IDC, April 2021, n=401
© All rights reserved. Matillion 2021
© All rights reserved. Matillion 2021
IDC Sees Shift From the 3 Vs to the 3 Ds
Velocity
Volume
Variety
Cloud Migration
Dynamic
Distributed
Diverse
Source: Calming the Storm with Cloud Data Management
- Webinar - Matillion & IDC, April 2021
© All rights reserved. Matillion 2021
About Matillion
A Modern Solution to Modern Challenges
© All rights reserved. Matillion 2021
Bringing the power of the cloud to
modern data challenges with the
flexibility of cloud-native data
integration
Low-Code,
No-Compromise
7 © All rights reserved. Matillion 2021
© All rights reserved. Matillion 2021
Delivering transformative value
Pay only for what you use:
Achieve rapid returns on cloud
investments that can be paid forward
across the business
Easy to use
Get things done faster in the cloud
with low-code, visual, repeatable
design patterns
Built for the enterprise
Easily scale for massive data volumes
and stay within enterprise security
requirements
Built for the cloud
Leverage the full power of the cloud
with push-down ELT and native cloud
platform integrations
About
Matillion
8
Cloud data platforms we support:
© All rights reserved. Matillion 2021
9
Our
Products
© All rights reserved. Matillion 2021
© All rights reserved. Matillion 2021
More Info:
matillion.com
© All rights reserved. Matillion 2021
Matillion #TeamGreen Matillion #teamgreen
matillion.com
Thank You!
William McKnight
• Dog owner
• Fitness
– Current National Age Group Champion
Deka.Fit and Hyrox Pro
• Piano
Reports
Spreadsheets
Dashboards
Transform
Tools/Scripts
Cubes or
Marts
Enterprise
Analytic Data
Operational
Systems
Data Science
Artificial
Intelligence
Executives
Compliance Audit
IT Staff
Governance Finance
Stewardship
Cloud Stores
Cloud Apps
Process Steward
Data Steward
Data
Owner
Business Terms
Ops Documents
Managers
Users
COBOL
Reports
Data, Processes, People, Privacy, Problems, and
Projects Over Time
Why So Many Data Stores?
4
Performance
• Performance is a critical point of interest when it comes to
selecting an analytics platform
• To measure data warehouse performance, we use similarly
priced specifications across data warehouse competitors.
• Usually when people say they care about performance, it is
the ultimate metric of price/performance
• The realities of creating fair tests can be overwhelming to
many shops, and is a task usually underestimated.
5
Cost Predictability and Transparency
• The cost profile options for cloud databases are straightforward
if you accept the defaults for simple workload or proof-of-
concept (POC) environments
• Initial entry costs and inadequately scoped environments can
artificially lower expectations of the true costs of jumping into a
cloud data warehouse environment.
• For some, you pay for compute resources as a function of time,
but you also choose the hourly rate based on certain enterprise
features you need.
• With some platforms, you pay for bytes processed and the
underlying architecture is unknown. The environment is scaled
automatically without affecting price. There is also a cost-per-
hour flat rate where you would need to calculate how long it
would take to run your queries to completion to predict costs.
• Customers need to analyze current workloads, performance,
and concurrency and project those into realistic pricing in
alternative platforms.
6
Administration
• Overall costs, time, as well as storage and compute resources
are affected by the simplicity of configurability and overall use.
• The platform should have embraced a self-sufficiency model
for its customers and be well into the process of automating
repetitive tasks.
• Easy administration starts with setup that is a simple process of
asking basic information and providing helpful information for
selecting the storage and node configurations.
• The data store should support mission-critical business
applications with minimal downtime.
7
Optimizer
• You may need:
• Conditional parallelism and what the causes are of
variations in the parallelism deployed.
• Dynamic and controllable prioritization of resources for
queries.
• Time and requirements for optimal queries, such as
compiling indexes or updating statistics.
• Workload isolation capabilities.
8
Concurrency Scaling
• If the database has concurrency limitations, designing
around them is difficult at best, and limiting to effective
data usage.
• If the data store automatically scales up to overcome
concurrency limitations, this may be costly if the data
warehouse charges by compute node.
• If the data store charges per user, costs will also increase as
the data warehouse is put to more use in the company.
• The workload may need linear scaling in overall query
workload performance as concurrent users are added.
9
Resource Elasticity
• You may need the ability to scale up and down and
take advantage of the elastic compute and storage
capabilities in the cloud, public or private, without
disruption or delay.
• The more the customer needs to be involved in
resource determination and provisioning, the less
elastic, and less modern, the solution is.
• One thing to watch for in elasticity scaling is keeping
the amount of money spent by the customer under the
customer’s control.
10
Machine Learning
• Today, some data query languages need to be extended to include machine learning, or
firms may find the programming required will be too challenging to keep pace.
• Data stores today may need to weave machine learning into their data processing
workflows
• Vendors must accommodate and extend SQL to include machine learning functions and
algorithms to expand the capabilities of those tools and users.
• If your database does not include machine learning, there are many extra things to be
concerned with.
• Other components will be needed to complete the toolbox and get the job done.
• Ideally, security for machine learning will be the same as database security.
• The data warehouse also needs to be able to operate at scale, beyond sampling
11
Data Storage Format Alternatives
• Cloud object storage is relatively inexpensive making data storage at high scale
affordable.
• On-premises, specialized private cloud storage options such as Pure Flashblades
tend to offer similar data type storage flexibility
• To take full advantage of the elasticity of the cloud without driving up costs, data
warehouse compute and storage need to be scaled separately.
• To take full advantage of the many types of data available, such as Apache ORC,
Apache Parquet, JSON, Apache Avro, etc., modern data warehouses need to be
able to analyze that data without moving or altering it.
• A unified analytics warehouse that supports these various data formats means you
have the benefit of querying them directly, without greatly expanding the
hierarchical complex data types to a standard tabular data structure for analysis.
• You should also be able to import data directly from these formats
• The ability to join data for analysis between the various internal and external data
formats provides the highest level of analytic flexibility.
12
Challenges of Today’s Environment
vComplexity
vContext
vCost
Data Integration Product #Goals
• Cloud Native
• Intelligently driven automation suggests and generates new data pipelines between
source and target without manually mapping or design
• Data Orchestration
– Managing the ebb and flow of data throughout the ecosystem,
– Determining what data is analyzed/stored at the origin,
– Determining what data is moved upstream and at what granularity and state
– Moving, integrating and updating data, metadata and master data and machine learning models as
evolution happens at the core
• Trust created through transparency and understanding
– Modern data management and integration platforms give organizations holistic views of their enterprise
data and deep, thorough lineage paths to show how critical data traces back to a trusted, primary source
• Able to dynamically, and even automatically, scale to meet the increasing complexity and
concurrency demands of query executions.
14
Capabilities for Cloud Data Integration
Data Lineage Stakeholders and Use Cases
Data Lineage Stakeholders and Use Case
• Legislative Bodies and Compliance
• Data Discovery Initiatives
• Impact Analysis
• Audit Functions
• Data Quality Programs
• Project Sponsors and Managers
• Data Architects
• Training and Onboarding Staff
• Root Cause Analysis
Data Lineage Requirements
Data Lineage Requirements
• Graphical Representation
• Impact Analysis
• Root Cause Analysis
• Extend to Non-Standard or Custom Sources
• General Accessibility to Lineage and Metadata
Data Integration Requirements
• Comprehensive Native Connectivity
• Multi-Latency Data Ingestion
• Data Integration Patterns
• Data Quality and Data Governance
• Data Cataloging and Metadata Management
• Enterprise Trust
• Artificial Intelligence and Automation
• Ecosystem and Multi-Cloud
18
Data Prep Requirements
• Designed as a low setup, no configuration, and no maintenance
data pipeline to lift data from operational sources and deliver it
to a modern cloud data warehouse
• Well-suited for popular cloud applications like Salesforce, Stripe,
and Marketo
• Transformations are SQL-based rules written by business users
and set to run on a schedule or as new data becomes available
• Look for
– Migrate function to move or clone business logic and resources, change data capture,
queue messaging, and pub-sub
– Dynamic ETL (such as job variables and iterators) plus data orchestration and staging
mechanisms (such as functions to make real-time adjustments if jobs run long or fail)
– Automation in pipeline building
19
Application Programming Interfaces
• A ubiquitous method and de facto standard of communication among modern information
technologies.
• APIs have begun to replace older, more cumbersome methods of information sharing with
lightweight, endpoints.
• Due to the popularity and proliferation of APIs and microservices, the need has arisen to manage the
multitude of services a company relies on—both internal and external.
– APIs themselves vary greatly in protocols, methods, authorization/authentication schemes, and usage patterns.
– Additionally, IT needs greater control over their hosted APIs, such as rate limiting, quotas, policy enforcement, and user
identification, to ensure high availability and prevent abuse and security breaches.
– Exposing APIs opens the door to many partners who can co- create and expand the core platform without even knowing
anything about the underlying technology.
• Organizations depend on these services to be properly managed, with high performance and
availability.
– Many companies experience workloads of more than 1,000 transactions per second on their API endpoints.
– For these organizations, their need for performance is tantamount to their need for management because they rely on these
API transaction rates to keep up with the speed of their business.
– On the contrary, many companies are looking for a solution to load balance across redundant API endpoints and enable high
transaction volumes.
– A financial institution with 1,000 transactions happening per second translates to 86 million API calls in a single 24-hour day.
20
API & Microservices Ecosystem
Public Private - External Private - Internal
Over 20,000 public APIs*
*according to https://www.programmableweb.com/apis/directory
External Partners Connected Apps & Data
Platform Architecture
Load
Balancer
(Nginx,
HAProxy,
ELB, etc.)
API Nodes
Postgres or Cassandra
API Endpoint 1
API Endpoint 2
API Endpoint…n
Client 1
Client 2
Client …n
API Requirements
• Performance: Good for high performance
workloads (>1,000TPS)
• Reliability: All workloads completed with
100% message completion
• Complexity: Multiple plugins enabled
Streaming Solutions
ETL is Insufficient for this combination
• Data platforms operating at an enterprise-wide scale
• A high variety of data sources
• Real-time/streaming data
• ETL forces either real-time loading without being scalable or
scalability with batch loading
– Data, produced from numerous sources, is a torrent of flowing
information, needing to be timestamped, dispatched, and even
duplicated (to protect against data loss)
– A postman is needed to distribute data from message senders to
receivers at the right place at the right time.
25
Real-Time Data
• A.k.a. messaging, live feeds, real-time, event-driven
• Comes in continuously and often quickly, so we call it
streaming data.
• Needs special attention and can be of immense value, but
only if we are alerted in time.
• Foundation for Artificial Intelligence
– Stream data forms the core of data for artificial intelligence
26
Enter Message-Oriented Middleware aka Streaming and
message queuing technology
• Messages can be any kind of data wrapped in a neat package
with a very simple header as a bow on top.
• Messages are sent by “producers”—systems, sensors, or devices
that generate the messages—toward a “broker.”
• A broker does not process the messages, but instead routes
them into queues according to the information enclosed in the
message header or its own routing process.
• Then “consumers” retrieve the messages from the queues to
which they subscribe (although sometimes messages are pushed
to consumers rather than pulled).
• The consumers open the messages and perform some kind of
action on them.
27
Performance and scalability in streaming
28
Storage
Ability to retain varying volumes of
messages for varying lengths of time
Throughput
High, sustainable rate of message
processing
Latency
Fast, consistent responsiveness for
publishing and consumption
Operations
Minimizing operational burden for scaling,
tuning, and monitoring
Streaming Architecture
Apps
29
Streaming
Platform
Change logs
Streaming data pipelines
Messaging or
Stream processing
Request - Response
DW
Technical
Support
Web Services
API
Big Data
Analysis
IDE / Developer
GUI
Data Lake
Parallel
Tools
Multi-Threaded
Math Libraries
Cluster
support
Apache Kafka
• Open source streaming platform developed at LinkedIn
• A distributed publish-subscribe messaging system that maintains feeds of
messages called topics
– Publishers write data to topics and subscribers read from topics
– Kafka topics are partitioned and replicated across multiple nodes in your Hadoop
cluster
• Enables “source to sink” data pipelines
• Kafka messages are simple, byte-long arrays that can store objects in
virtually any format with a key attached to each message; often in JSON
• E&L in ETL through Kafka Connect API
• T in ETL through Kafka Streams API
• Fault-tolerant
• DIY
30
Apache Pulsar
• Originally developed at Yahoo
• Began its incubation at Apache in late 2016
• Has been in production as Yahoo for 3 years prior—utilized in popular services and
applications like Yahoo! Mail, Finance, Sports, Flickr, Gemini Ads, and Sherpa
• Follows the publisher-subscriber model (pub-sub), and has the same producers,
topics, and consumers as some of the aforementioned technologies
• Uses built-in multi-datacenter replication
• Architected for multi-tenancy and uses concepts of properties and namespaces
– A property could represent all the messaging for a particular team, application, product vertical, etc.
– Namespaces is the administrative unit where security, replication, configurations, and subscriptions are managed
– At the topic level, messages are partitioned and managed by brokers using a user-defined routing policy—such as
single, round robin, hash, or custom—thus granting further transparency and control in the process
31
Workloads are Distinguished by
• The number of topics
• The size of the messages being produced and consumed
• The number of subscriptions per topic
• The number of producers per topic
• The rate at which producers produce messages (per
second)
• The size of the consumer’s backlog (in gigabytes)
• The total duration of the test (in minutes)
32
Key Takeaways & Application to the Enterprise
• An enterprise has many different types of data stores as well as many data stores of the same type
– This extends to clouds
• The reasons for this are many
– Performance
– Cost Predictability and Transparency
– Administration
– Optimizer
– Concurrency
– Resource Elasticity
– Machine Learning Capabilities
– Data Storage Format Alternatives
• It is fully expected that enterprise environments would have a heterogenous vendor as well as other
vendors
• APIs have begun to replace older, more cumbersome methods of information sharing with
lightweight, endpoints.
• Streaming and message queuing have lasting value to organizations.
• Streaming and messaging will be able to meet the real-time data volume, variety, and timing
requirements of the coming years.
The Shifting
Landscape of Data
Integration
Presented by: William McKnight
“#1 Global Influencer in Data Warehousing” Onalytica
President, McKnight Consulting Group Inc 5000
@williammcknight
www.mcknightcg.com
(214) 514-1444

More Related Content

What's hot

The Death of the Star Schema
The Death of the Star SchemaThe Death of the Star Schema
The Death of the Star SchemaDATAVERSITY
 
Using Data Platforms That Are Fit-For-Purpose
Using Data Platforms That Are Fit-For-PurposeUsing Data Platforms That Are Fit-For-Purpose
Using Data Platforms That Are Fit-For-PurposeDATAVERSITY
 
ADV Slides: 2021 Trends in Enterprise Analytics
ADV Slides: 2021 Trends in Enterprise AnalyticsADV Slides: 2021 Trends in Enterprise Analytics
ADV Slides: 2021 Trends in Enterprise AnalyticsDATAVERSITY
 
The Heart of Data Modeling: 7 Ways Your Agile Project is Managing Data Wrong
The Heart of Data Modeling: 7 Ways Your Agile Project is Managing Data WrongThe Heart of Data Modeling: 7 Ways Your Agile Project is Managing Data Wrong
The Heart of Data Modeling: 7 Ways Your Agile Project is Managing Data WrongDATAVERSITY
 
Advanced Analytics: Analytic Platforms Should Be Columnar Orientation
Advanced Analytics: Analytic Platforms Should Be Columnar OrientationAdvanced Analytics: Analytic Platforms Should Be Columnar Orientation
Advanced Analytics: Analytic Platforms Should Be Columnar OrientationDATAVERSITY
 
Building an Effective Data & Analytics Operating Model A Data Modernization G...
Building an Effective Data & Analytics Operating Model A Data Modernization G...Building an Effective Data & Analytics Operating Model A Data Modernization G...
Building an Effective Data & Analytics Operating Model A Data Modernization G...Mark Hewitt
 
Slides: Migrate BI Dashboards to Run Directly on a Cloud Data Lake in Five Ea...
Slides: Migrate BI Dashboards to Run Directly on a Cloud Data Lake in Five Ea...Slides: Migrate BI Dashboards to Run Directly on a Cloud Data Lake in Five Ea...
Slides: Migrate BI Dashboards to Run Directly on a Cloud Data Lake in Five Ea...DATAVERSITY
 
ADV Slides: How to Improve Your Analytic Data Architecture Maturity
ADV Slides: How to Improve Your Analytic Data Architecture MaturityADV Slides: How to Improve Your Analytic Data Architecture Maturity
ADV Slides: How to Improve Your Analytic Data Architecture MaturityDATAVERSITY
 
Measuring Data Quality Return on Investment
Measuring Data Quality Return on InvestmentMeasuring Data Quality Return on Investment
Measuring Data Quality Return on InvestmentDATAVERSITY
 
Data-Ed Slides: Data Modeling Strategies - Getting Your Data Ready for the Ca...
Data-Ed Slides: Data Modeling Strategies - Getting Your Data Ready for the Ca...Data-Ed Slides: Data Modeling Strategies - Getting Your Data Ready for the Ca...
Data-Ed Slides: Data Modeling Strategies - Getting Your Data Ready for the Ca...DATAVERSITY
 
MLOps - Getting Machine Learning Into Production
MLOps - Getting Machine Learning Into ProductionMLOps - Getting Machine Learning Into Production
MLOps - Getting Machine Learning Into ProductionMichael Pearce
 
Estimating the Total Costs of Your Cloud Analytics Platform
Estimating the Total Costs of Your Cloud Analytics PlatformEstimating the Total Costs of Your Cloud Analytics Platform
Estimating the Total Costs of Your Cloud Analytics PlatformDATAVERSITY
 
Data-Ed Online: Data Architecture Requirements
Data-Ed Online: Data Architecture RequirementsData-Ed Online: Data Architecture Requirements
Data-Ed Online: Data Architecture RequirementsDATAVERSITY
 
RWDG Slides: Governing Your Data Catalog, Business Glossary, and Data Dictionary
RWDG Slides: Governing Your Data Catalog, Business Glossary, and Data DictionaryRWDG Slides: Governing Your Data Catalog, Business Glossary, and Data Dictionary
RWDG Slides: Governing Your Data Catalog, Business Glossary, and Data DictionaryDATAVERSITY
 
Data Management Meets Human Management - Why Words Matter
Data Management Meets Human Management - Why Words MatterData Management Meets Human Management - Why Words Matter
Data Management Meets Human Management - Why Words MatterDATAVERSITY
 
Data Quality Best Practices
Data Quality Best PracticesData Quality Best Practices
Data Quality Best PracticesDATAVERSITY
 
DAS Slides: Data Quality Best Practices
DAS Slides: Data Quality Best PracticesDAS Slides: Data Quality Best Practices
DAS Slides: Data Quality Best PracticesDATAVERSITY
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureDATAVERSITY
 
Why Data Modeling Is Fundamental
Why Data Modeling Is FundamentalWhy Data Modeling Is Fundamental
Why Data Modeling Is FundamentalDATAVERSITY
 

What's hot (20)

The Death of the Star Schema
The Death of the Star SchemaThe Death of the Star Schema
The Death of the Star Schema
 
Using Data Platforms That Are Fit-For-Purpose
Using Data Platforms That Are Fit-For-PurposeUsing Data Platforms That Are Fit-For-Purpose
Using Data Platforms That Are Fit-For-Purpose
 
ADV Slides: 2021 Trends in Enterprise Analytics
ADV Slides: 2021 Trends in Enterprise AnalyticsADV Slides: 2021 Trends in Enterprise Analytics
ADV Slides: 2021 Trends in Enterprise Analytics
 
The Heart of Data Modeling: 7 Ways Your Agile Project is Managing Data Wrong
The Heart of Data Modeling: 7 Ways Your Agile Project is Managing Data WrongThe Heart of Data Modeling: 7 Ways Your Agile Project is Managing Data Wrong
The Heart of Data Modeling: 7 Ways Your Agile Project is Managing Data Wrong
 
Advanced Analytics: Analytic Platforms Should Be Columnar Orientation
Advanced Analytics: Analytic Platforms Should Be Columnar OrientationAdvanced Analytics: Analytic Platforms Should Be Columnar Orientation
Advanced Analytics: Analytic Platforms Should Be Columnar Orientation
 
Building an Effective Data & Analytics Operating Model A Data Modernization G...
Building an Effective Data & Analytics Operating Model A Data Modernization G...Building an Effective Data & Analytics Operating Model A Data Modernization G...
Building an Effective Data & Analytics Operating Model A Data Modernization G...
 
Enterprise Services Solutions
Enterprise Services SolutionsEnterprise Services Solutions
Enterprise Services Solutions
 
Slides: Migrate BI Dashboards to Run Directly on a Cloud Data Lake in Five Ea...
Slides: Migrate BI Dashboards to Run Directly on a Cloud Data Lake in Five Ea...Slides: Migrate BI Dashboards to Run Directly on a Cloud Data Lake in Five Ea...
Slides: Migrate BI Dashboards to Run Directly on a Cloud Data Lake in Five Ea...
 
ADV Slides: How to Improve Your Analytic Data Architecture Maturity
ADV Slides: How to Improve Your Analytic Data Architecture MaturityADV Slides: How to Improve Your Analytic Data Architecture Maturity
ADV Slides: How to Improve Your Analytic Data Architecture Maturity
 
Measuring Data Quality Return on Investment
Measuring Data Quality Return on InvestmentMeasuring Data Quality Return on Investment
Measuring Data Quality Return on Investment
 
Data-Ed Slides: Data Modeling Strategies - Getting Your Data Ready for the Ca...
Data-Ed Slides: Data Modeling Strategies - Getting Your Data Ready for the Ca...Data-Ed Slides: Data Modeling Strategies - Getting Your Data Ready for the Ca...
Data-Ed Slides: Data Modeling Strategies - Getting Your Data Ready for the Ca...
 
MLOps - Getting Machine Learning Into Production
MLOps - Getting Machine Learning Into ProductionMLOps - Getting Machine Learning Into Production
MLOps - Getting Machine Learning Into Production
 
Estimating the Total Costs of Your Cloud Analytics Platform
Estimating the Total Costs of Your Cloud Analytics PlatformEstimating the Total Costs of Your Cloud Analytics Platform
Estimating the Total Costs of Your Cloud Analytics Platform
 
Data-Ed Online: Data Architecture Requirements
Data-Ed Online: Data Architecture RequirementsData-Ed Online: Data Architecture Requirements
Data-Ed Online: Data Architecture Requirements
 
RWDG Slides: Governing Your Data Catalog, Business Glossary, and Data Dictionary
RWDG Slides: Governing Your Data Catalog, Business Glossary, and Data DictionaryRWDG Slides: Governing Your Data Catalog, Business Glossary, and Data Dictionary
RWDG Slides: Governing Your Data Catalog, Business Glossary, and Data Dictionary
 
Data Management Meets Human Management - Why Words Matter
Data Management Meets Human Management - Why Words MatterData Management Meets Human Management - Why Words Matter
Data Management Meets Human Management - Why Words Matter
 
Data Quality Best Practices
Data Quality Best PracticesData Quality Best Practices
Data Quality Best Practices
 
DAS Slides: Data Quality Best Practices
DAS Slides: Data Quality Best PracticesDAS Slides: Data Quality Best Practices
DAS Slides: Data Quality Best Practices
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data Architecture
 
Why Data Modeling Is Fundamental
Why Data Modeling Is FundamentalWhy Data Modeling Is Fundamental
Why Data Modeling Is Fundamental
 

Similar to The Shifting Landscape of Data Integration

ADV Slides: Data Pipelines in the Enterprise and Comparison
ADV Slides: Data Pipelines in the Enterprise and ComparisonADV Slides: Data Pipelines in the Enterprise and Comparison
ADV Slides: Data Pipelines in the Enterprise and ComparisonDATAVERSITY
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...DATAVERSITY
 
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)Denodo
 
Data Mesh using Microsoft Fabric
Data Mesh using Microsoft FabricData Mesh using Microsoft Fabric
Data Mesh using Microsoft FabricNathan Bijnens
 
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Nathan Bijnens
 
Govern and Protect Your End User Information
Govern and Protect Your End User InformationGovern and Protect Your End User Information
Govern and Protect Your End User InformationDenodo
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise AnalyticsDATAVERSITY
 
Achieve data democracy in data lake with data integration
Achieve data democracy in data lake with data integration Achieve data democracy in data lake with data integration
Achieve data democracy in data lake with data integration Saurabh K. Gupta
 
ADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data LakesADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data LakesDATAVERSITY
 
Harness the power of Data in a Big Data Lake
Harness the power of Data in a Big Data LakeHarness the power of Data in a Big Data Lake
Harness the power of Data in a Big Data LakeSaurabh K. Gupta
 
Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?Denodo
 
Data Warehouse Optimization
Data Warehouse OptimizationData Warehouse Optimization
Data Warehouse OptimizationCloudera, Inc.
 
IBM Relay 2015: Open for Data
IBM Relay 2015: Open for Data IBM Relay 2015: Open for Data
IBM Relay 2015: Open for Data IBM
 
Is your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and ClouderaIs your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and ClouderaCloudera, Inc.
 
Big Data Case study - caixa bank
Big Data Case study - caixa bankBig Data Case study - caixa bank
Big Data Case study - caixa bankChungsik Yun
 
Foundational Strategies for Trusted Data: Getting Your Data to the Cloud
Foundational Strategies for Trusted Data: Getting Your Data to the CloudFoundational Strategies for Trusted Data: Getting Your Data to the Cloud
Foundational Strategies for Trusted Data: Getting Your Data to the CloudPrecisely
 
Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...
Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...
Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...Denodo
 

Similar to The Shifting Landscape of Data Integration (20)

ADV Slides: Data Pipelines in the Enterprise and Comparison
ADV Slides: Data Pipelines in the Enterprise and ComparisonADV Slides: Data Pipelines in the Enterprise and Comparison
ADV Slides: Data Pipelines in the Enterprise and Comparison
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
 
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)
 
Data Mesh
Data MeshData Mesh
Data Mesh
 
Data Mesh using Microsoft Fabric
Data Mesh using Microsoft FabricData Mesh using Microsoft Fabric
Data Mesh using Microsoft Fabric
 
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)
 
Govern and Protect Your End User Information
Govern and Protect Your End User InformationGovern and Protect Your End User Information
Govern and Protect Your End User Information
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data Architecture
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics
 
Achieve data democracy in data lake with data integration
Achieve data democracy in data lake with data integration Achieve data democracy in data lake with data integration
Achieve data democracy in data lake with data integration
 
ADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data LakesADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data Lakes
 
Harness the power of Data in a Big Data Lake
Harness the power of Data in a Big Data LakeHarness the power of Data in a Big Data Lake
Harness the power of Data in a Big Data Lake
 
Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?
 
Data Warehouse Optimization
Data Warehouse OptimizationData Warehouse Optimization
Data Warehouse Optimization
 
IBM Relay 2015: Open for Data
IBM Relay 2015: Open for Data IBM Relay 2015: Open for Data
IBM Relay 2015: Open for Data
 
Is your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and ClouderaIs your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and Cloudera
 
Big Data Case study - caixa bank
Big Data Case study - caixa bankBig Data Case study - caixa bank
Big Data Case study - caixa bank
 
Foundational Strategies for Trusted Data: Getting Your Data to the Cloud
Foundational Strategies for Trusted Data: Getting Your Data to the CloudFoundational Strategies for Trusted Data: Getting Your Data to the Cloud
Foundational Strategies for Trusted Data: Getting Your Data to the Cloud
 
Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...
Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...
Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...
 

More from DATAVERSITY

Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...DATAVERSITY
 
Data at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceData at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceDATAVERSITY
 
Exploring Levels of Data Literacy
Exploring Levels of Data LiteracyExploring Levels of Data Literacy
Exploring Levels of Data LiteracyDATAVERSITY
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsDATAVERSITY
 
Make Data Work for You
Make Data Work for YouMake Data Work for You
Make Data Work for YouDATAVERSITY
 
Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?DATAVERSITY
 
Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?DATAVERSITY
 
Data Modeling Fundamentals
Data Modeling FundamentalsData Modeling Fundamentals
Data Modeling FundamentalsDATAVERSITY
 
Showing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectShowing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectDATAVERSITY
 
How a Semantic Layer Makes Data Mesh Work at Scale
How a Semantic Layer Makes  Data Mesh Work at ScaleHow a Semantic Layer Makes  Data Mesh Work at Scale
How a Semantic Layer Makes Data Mesh Work at ScaleDATAVERSITY
 
Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?DATAVERSITY
 
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...DATAVERSITY
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?DATAVERSITY
 
Data Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and ForwardsData Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and ForwardsDATAVERSITY
 
Data Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayData Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayDATAVERSITY
 
2023 Trends in Enterprise Analytics
2023 Trends in Enterprise Analytics2023 Trends in Enterprise Analytics
2023 Trends in Enterprise AnalyticsDATAVERSITY
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best PracticesDATAVERSITY
 
Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?DATAVERSITY
 
Data Management Best Practices
Data Management Best PracticesData Management Best Practices
Data Management Best PracticesDATAVERSITY
 
MLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageMLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageDATAVERSITY
 

More from DATAVERSITY (20)

Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
 
Data at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceData at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and Governance
 
Exploring Levels of Data Literacy
Exploring Levels of Data LiteracyExploring Levels of Data Literacy
Exploring Levels of Data Literacy
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business Goals
 
Make Data Work for You
Make Data Work for YouMake Data Work for You
Make Data Work for You
 
Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?
 
Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?
 
Data Modeling Fundamentals
Data Modeling FundamentalsData Modeling Fundamentals
Data Modeling Fundamentals
 
Showing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectShowing ROI for Your Analytic Project
Showing ROI for Your Analytic Project
 
How a Semantic Layer Makes Data Mesh Work at Scale
How a Semantic Layer Makes  Data Mesh Work at ScaleHow a Semantic Layer Makes  Data Mesh Work at Scale
How a Semantic Layer Makes Data Mesh Work at Scale
 
Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?
 
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?
 
Data Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and ForwardsData Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and Forwards
 
Data Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayData Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement Today
 
2023 Trends in Enterprise Analytics
2023 Trends in Enterprise Analytics2023 Trends in Enterprise Analytics
2023 Trends in Enterprise Analytics
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best Practices
 
Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?
 
Data Management Best Practices
Data Management Best PracticesData Management Best Practices
Data Management Best Practices
 
MLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageMLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive Advantage
 

Recently uploaded

CI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionCI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionajayrajaganeshkayala
 
YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.JasonViviers2
 
Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Vladislav Solodkiy
 
MEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .pptMEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .pptaigil2
 
Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...PrithaVashisht1
 
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Guido X Jansen
 
ChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics InfrastructureChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics Infrastructuresonikadigital1
 
AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)Data & Analytics Magazin
 
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityStrategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityAggregage
 
Virtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product IntroductionVirtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product Introductionsanjaymuralee1
 
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxTINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxDwiAyuSitiHartinah
 
5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best PracticesDataArchiva
 
Mapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxMapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxVenkatasubramani13
 
The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerPavel Šabatka
 
Master's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationMaster's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationGiorgio Carbone
 
SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024Becky Burwell
 
How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?sonikadigital1
 

Recently uploaded (17)

CI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionCI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual intervention
 
YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.
 
Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023
 
MEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .pptMEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .ppt
 
Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...
 
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
 
ChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics InfrastructureChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics Infrastructure
 
AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)
 
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityStrategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
 
Virtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product IntroductionVirtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product Introduction
 
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxTINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
 
5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices
 
Mapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxMapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptx
 
The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayer
 
Master's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationMaster's Thesis - Data Science - Presentation
Master's Thesis - Data Science - Presentation
 
SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024
 
How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?
 

The Shifting Landscape of Data Integration

  • 1. The Shifting Landscape of Data Integration Presented by: William McKnight “#1 Global Influencer in Data Warehousing” Onalytica President, McKnight Consulting Group Inc 5000 @williammcknight www.mcknightcg.com (214) 514-1444
  • 2. © All rights reserved. Matillion 2021 A vendor perspective from Matillion The Shifting Landscape of Data Integration
  • 3. © All rights reserved. Matillion 2021 2 Paul Lacey Sr. Director Product Marketing Matillion Speakers
  • 4. © All rights reserved. Matillion 2021 New Challenges in the Cloud From 3 Vs to 3 Ds
  • 5. © All rights reserved. Matillion 2021 © All rights reserved. Matillion 2021 Cloud Data Challenges Source: Calming the Storm with Cloud Data Management - Webinar - Matillion & IDC, April 2021, n=401
  • 6. © All rights reserved. Matillion 2021 © All rights reserved. Matillion 2021 IDC Sees Shift From the 3 Vs to the 3 Ds Velocity Volume Variety Cloud Migration Dynamic Distributed Diverse Source: Calming the Storm with Cloud Data Management - Webinar - Matillion & IDC, April 2021
  • 7. © All rights reserved. Matillion 2021 About Matillion A Modern Solution to Modern Challenges
  • 8. © All rights reserved. Matillion 2021 Bringing the power of the cloud to modern data challenges with the flexibility of cloud-native data integration Low-Code, No-Compromise 7 © All rights reserved. Matillion 2021
  • 9. © All rights reserved. Matillion 2021 Delivering transformative value Pay only for what you use: Achieve rapid returns on cloud investments that can be paid forward across the business Easy to use Get things done faster in the cloud with low-code, visual, repeatable design patterns Built for the enterprise Easily scale for massive data volumes and stay within enterprise security requirements Built for the cloud Leverage the full power of the cloud with push-down ELT and native cloud platform integrations About Matillion 8 Cloud data platforms we support:
  • 10. © All rights reserved. Matillion 2021 9 Our Products © All rights reserved. Matillion 2021
  • 11. © All rights reserved. Matillion 2021 More Info: matillion.com
  • 12. © All rights reserved. Matillion 2021 Matillion #TeamGreen Matillion #teamgreen matillion.com Thank You!
  • 13. William McKnight • Dog owner • Fitness – Current National Age Group Champion Deka.Fit and Hyrox Pro • Piano
  • 14. Reports Spreadsheets Dashboards Transform Tools/Scripts Cubes or Marts Enterprise Analytic Data Operational Systems Data Science Artificial Intelligence Executives Compliance Audit IT Staff Governance Finance Stewardship Cloud Stores Cloud Apps Process Steward Data Steward Data Owner Business Terms Ops Documents Managers Users COBOL Reports Data, Processes, People, Privacy, Problems, and Projects Over Time
  • 15. Why So Many Data Stores? 4
  • 16. Performance • Performance is a critical point of interest when it comes to selecting an analytics platform • To measure data warehouse performance, we use similarly priced specifications across data warehouse competitors. • Usually when people say they care about performance, it is the ultimate metric of price/performance • The realities of creating fair tests can be overwhelming to many shops, and is a task usually underestimated. 5
  • 17. Cost Predictability and Transparency • The cost profile options for cloud databases are straightforward if you accept the defaults for simple workload or proof-of- concept (POC) environments • Initial entry costs and inadequately scoped environments can artificially lower expectations of the true costs of jumping into a cloud data warehouse environment. • For some, you pay for compute resources as a function of time, but you also choose the hourly rate based on certain enterprise features you need. • With some platforms, you pay for bytes processed and the underlying architecture is unknown. The environment is scaled automatically without affecting price. There is also a cost-per- hour flat rate where you would need to calculate how long it would take to run your queries to completion to predict costs. • Customers need to analyze current workloads, performance, and concurrency and project those into realistic pricing in alternative platforms. 6
  • 18. Administration • Overall costs, time, as well as storage and compute resources are affected by the simplicity of configurability and overall use. • The platform should have embraced a self-sufficiency model for its customers and be well into the process of automating repetitive tasks. • Easy administration starts with setup that is a simple process of asking basic information and providing helpful information for selecting the storage and node configurations. • The data store should support mission-critical business applications with minimal downtime. 7
  • 19. Optimizer • You may need: • Conditional parallelism and what the causes are of variations in the parallelism deployed. • Dynamic and controllable prioritization of resources for queries. • Time and requirements for optimal queries, such as compiling indexes or updating statistics. • Workload isolation capabilities. 8
  • 20. Concurrency Scaling • If the database has concurrency limitations, designing around them is difficult at best, and limiting to effective data usage. • If the data store automatically scales up to overcome concurrency limitations, this may be costly if the data warehouse charges by compute node. • If the data store charges per user, costs will also increase as the data warehouse is put to more use in the company. • The workload may need linear scaling in overall query workload performance as concurrent users are added. 9
  • 21. Resource Elasticity • You may need the ability to scale up and down and take advantage of the elastic compute and storage capabilities in the cloud, public or private, without disruption or delay. • The more the customer needs to be involved in resource determination and provisioning, the less elastic, and less modern, the solution is. • One thing to watch for in elasticity scaling is keeping the amount of money spent by the customer under the customer’s control. 10
  • 22. Machine Learning • Today, some data query languages need to be extended to include machine learning, or firms may find the programming required will be too challenging to keep pace. • Data stores today may need to weave machine learning into their data processing workflows • Vendors must accommodate and extend SQL to include machine learning functions and algorithms to expand the capabilities of those tools and users. • If your database does not include machine learning, there are many extra things to be concerned with. • Other components will be needed to complete the toolbox and get the job done. • Ideally, security for machine learning will be the same as database security. • The data warehouse also needs to be able to operate at scale, beyond sampling 11
  • 23. Data Storage Format Alternatives • Cloud object storage is relatively inexpensive making data storage at high scale affordable. • On-premises, specialized private cloud storage options such as Pure Flashblades tend to offer similar data type storage flexibility • To take full advantage of the elasticity of the cloud without driving up costs, data warehouse compute and storage need to be scaled separately. • To take full advantage of the many types of data available, such as Apache ORC, Apache Parquet, JSON, Apache Avro, etc., modern data warehouses need to be able to analyze that data without moving or altering it. • A unified analytics warehouse that supports these various data formats means you have the benefit of querying them directly, without greatly expanding the hierarchical complex data types to a standard tabular data structure for analysis. • You should also be able to import data directly from these formats • The ability to join data for analysis between the various internal and external data formats provides the highest level of analytic flexibility. 12
  • 24. Challenges of Today’s Environment vComplexity vContext vCost
  • 25. Data Integration Product #Goals • Cloud Native • Intelligently driven automation suggests and generates new data pipelines between source and target without manually mapping or design • Data Orchestration – Managing the ebb and flow of data throughout the ecosystem, – Determining what data is analyzed/stored at the origin, – Determining what data is moved upstream and at what granularity and state – Moving, integrating and updating data, metadata and master data and machine learning models as evolution happens at the core • Trust created through transparency and understanding – Modern data management and integration platforms give organizations holistic views of their enterprise data and deep, thorough lineage paths to show how critical data traces back to a trusted, primary source • Able to dynamically, and even automatically, scale to meet the increasing complexity and concurrency demands of query executions. 14
  • 26. Capabilities for Cloud Data Integration
  • 27. Data Lineage Stakeholders and Use Cases Data Lineage Stakeholders and Use Case • Legislative Bodies and Compliance • Data Discovery Initiatives • Impact Analysis • Audit Functions • Data Quality Programs • Project Sponsors and Managers • Data Architects • Training and Onboarding Staff • Root Cause Analysis
  • 28. Data Lineage Requirements Data Lineage Requirements • Graphical Representation • Impact Analysis • Root Cause Analysis • Extend to Non-Standard or Custom Sources • General Accessibility to Lineage and Metadata
  • 29. Data Integration Requirements • Comprehensive Native Connectivity • Multi-Latency Data Ingestion • Data Integration Patterns • Data Quality and Data Governance • Data Cataloging and Metadata Management • Enterprise Trust • Artificial Intelligence and Automation • Ecosystem and Multi-Cloud 18
  • 30. Data Prep Requirements • Designed as a low setup, no configuration, and no maintenance data pipeline to lift data from operational sources and deliver it to a modern cloud data warehouse • Well-suited for popular cloud applications like Salesforce, Stripe, and Marketo • Transformations are SQL-based rules written by business users and set to run on a schedule or as new data becomes available • Look for – Migrate function to move or clone business logic and resources, change data capture, queue messaging, and pub-sub – Dynamic ETL (such as job variables and iterators) plus data orchestration and staging mechanisms (such as functions to make real-time adjustments if jobs run long or fail) – Automation in pipeline building 19
  • 31. Application Programming Interfaces • A ubiquitous method and de facto standard of communication among modern information technologies. • APIs have begun to replace older, more cumbersome methods of information sharing with lightweight, endpoints. • Due to the popularity and proliferation of APIs and microservices, the need has arisen to manage the multitude of services a company relies on—both internal and external. – APIs themselves vary greatly in protocols, methods, authorization/authentication schemes, and usage patterns. – Additionally, IT needs greater control over their hosted APIs, such as rate limiting, quotas, policy enforcement, and user identification, to ensure high availability and prevent abuse and security breaches. – Exposing APIs opens the door to many partners who can co- create and expand the core platform without even knowing anything about the underlying technology. • Organizations depend on these services to be properly managed, with high performance and availability. – Many companies experience workloads of more than 1,000 transactions per second on their API endpoints. – For these organizations, their need for performance is tantamount to their need for management because they rely on these API transaction rates to keep up with the speed of their business. – On the contrary, many companies are looking for a solution to load balance across redundant API endpoints and enable high transaction volumes. – A financial institution with 1,000 transactions happening per second translates to 86 million API calls in a single 24-hour day. 20
  • 32. API & Microservices Ecosystem Public Private - External Private - Internal Over 20,000 public APIs* *according to https://www.programmableweb.com/apis/directory External Partners Connected Apps & Data
  • 33. Platform Architecture Load Balancer (Nginx, HAProxy, ELB, etc.) API Nodes Postgres or Cassandra API Endpoint 1 API Endpoint 2 API Endpoint…n Client 1 Client 2 Client …n
  • 34. API Requirements • Performance: Good for high performance workloads (>1,000TPS) • Reliability: All workloads completed with 100% message completion • Complexity: Multiple plugins enabled
  • 36. ETL is Insufficient for this combination • Data platforms operating at an enterprise-wide scale • A high variety of data sources • Real-time/streaming data • ETL forces either real-time loading without being scalable or scalability with batch loading – Data, produced from numerous sources, is a torrent of flowing information, needing to be timestamped, dispatched, and even duplicated (to protect against data loss) – A postman is needed to distribute data from message senders to receivers at the right place at the right time. 25
  • 37. Real-Time Data • A.k.a. messaging, live feeds, real-time, event-driven • Comes in continuously and often quickly, so we call it streaming data. • Needs special attention and can be of immense value, but only if we are alerted in time. • Foundation for Artificial Intelligence – Stream data forms the core of data for artificial intelligence 26
  • 38. Enter Message-Oriented Middleware aka Streaming and message queuing technology • Messages can be any kind of data wrapped in a neat package with a very simple header as a bow on top. • Messages are sent by “producers”—systems, sensors, or devices that generate the messages—toward a “broker.” • A broker does not process the messages, but instead routes them into queues according to the information enclosed in the message header or its own routing process. • Then “consumers” retrieve the messages from the queues to which they subscribe (although sometimes messages are pushed to consumers rather than pulled). • The consumers open the messages and perform some kind of action on them. 27
  • 39. Performance and scalability in streaming 28 Storage Ability to retain varying volumes of messages for varying lengths of time Throughput High, sustainable rate of message processing Latency Fast, consistent responsiveness for publishing and consumption Operations Minimizing operational burden for scaling, tuning, and monitoring
  • 40. Streaming Architecture Apps 29 Streaming Platform Change logs Streaming data pipelines Messaging or Stream processing Request - Response DW Technical Support Web Services API Big Data Analysis IDE / Developer GUI Data Lake Parallel Tools Multi-Threaded Math Libraries Cluster support
  • 41. Apache Kafka • Open source streaming platform developed at LinkedIn • A distributed publish-subscribe messaging system that maintains feeds of messages called topics – Publishers write data to topics and subscribers read from topics – Kafka topics are partitioned and replicated across multiple nodes in your Hadoop cluster • Enables “source to sink” data pipelines • Kafka messages are simple, byte-long arrays that can store objects in virtually any format with a key attached to each message; often in JSON • E&L in ETL through Kafka Connect API • T in ETL through Kafka Streams API • Fault-tolerant • DIY 30
  • 42. Apache Pulsar • Originally developed at Yahoo • Began its incubation at Apache in late 2016 • Has been in production as Yahoo for 3 years prior—utilized in popular services and applications like Yahoo! Mail, Finance, Sports, Flickr, Gemini Ads, and Sherpa • Follows the publisher-subscriber model (pub-sub), and has the same producers, topics, and consumers as some of the aforementioned technologies • Uses built-in multi-datacenter replication • Architected for multi-tenancy and uses concepts of properties and namespaces – A property could represent all the messaging for a particular team, application, product vertical, etc. – Namespaces is the administrative unit where security, replication, configurations, and subscriptions are managed – At the topic level, messages are partitioned and managed by brokers using a user-defined routing policy—such as single, round robin, hash, or custom—thus granting further transparency and control in the process 31
  • 43. Workloads are Distinguished by • The number of topics • The size of the messages being produced and consumed • The number of subscriptions per topic • The number of producers per topic • The rate at which producers produce messages (per second) • The size of the consumer’s backlog (in gigabytes) • The total duration of the test (in minutes) 32
  • 44. Key Takeaways & Application to the Enterprise • An enterprise has many different types of data stores as well as many data stores of the same type – This extends to clouds • The reasons for this are many – Performance – Cost Predictability and Transparency – Administration – Optimizer – Concurrency – Resource Elasticity – Machine Learning Capabilities – Data Storage Format Alternatives • It is fully expected that enterprise environments would have a heterogenous vendor as well as other vendors • APIs have begun to replace older, more cumbersome methods of information sharing with lightweight, endpoints. • Streaming and message queuing have lasting value to organizations. • Streaming and messaging will be able to meet the real-time data volume, variety, and timing requirements of the coming years.
  • 45. The Shifting Landscape of Data Integration Presented by: William McKnight “#1 Global Influencer in Data Warehousing” Onalytica President, McKnight Consulting Group Inc 5000 @williammcknight www.mcknightcg.com (214) 514-1444