How to Use the Right Tools for
Operational Data Integration
Mark R. Madsen – March, 2009
http://ThirdNature.net




                     Attribution-NonCommercial-No Derivative
                     http://creativecommons.org/licenses/by-nc-nd/3.0/us/
What We’re Asked For




                  (simulation)
                                      Slide 2
March 2009           Mark R. Madsen
How It Makes Us Feel




How We Want to Feel




Spending Priorities in IT




    In 2007 and 2008 this is where the money went…
       but you can’t do most of these without data integration.
Sources: CIO Insight
Technology Priorities in IT




  Data integration moved up to #3 spot for CIOs in 2008

Sources: CIO Insight
The Cost Problem Management Reacts To




                                      Source: IDC

Where We Often Are Today: Point to Point

    Typical scenario:
        • Disparate data
        • Heterogeneous sources
        • Point integration
        • Minimal reuse
        • No tools

    Source environments: databases, documents, flat files, XML, services, ERP, applications
The Desired Future State
    “Data as a platform” provides:
        • Standards-based interfaces
        • Single views of disparate source data
        • Single point of access / integration
        • Reuse of data

    …but you can’t achieve this by writing more application code.

    (Diagram: a data platform over the source environments: databases, documents, flat files, XML, services, ERP, applications)
Application versus Data Integration
    Application Integration                             Data Integration
    Managing the flow of events                         Managing the flow of data and access
    Standardizes the transaction or service             Standardizes the data
    Tools abstract the transport and system endpoints   Tools abstract the transport, system, representation and manipulation
    Must write code at endpoints to manipulate data     Data structure, format and manipulation are abstracted
    Focus on code - data as a byproduct                 Focus on data - data as the product
    Reusable functions, not data                        Reusable data, not functions


Analytic versus Operational Data Integration

   Analytic                                 Operational
   Most of a BI project’s effort is         Most of an application project
   spent on data integration                is focused on features, not DI
   Many disparate sources                   One or a few sources
   Generally unidirectional                 One-way or bidirectional
   Large data volumes                       Large data volume for some,
                                            small volume for others
   Usually loaded daily                     Often loaded more frequently,
                                            varies based on project type
   Low concurrency                          Low to high concurrency
   High latency                             Low to high latency

Architectural Models for Data Integration



    (Diagram: integration models positioned by data access model, physical to virtual, on one axis and by control, distributed to centralized, on the other)
Consolidation
    Common operational DI scenarios
    where this model is appropriate:
        • Migrations
        • Upgrades
        • Consolidations
        • Managing master / reference data

    Characteristics:
        • Large data volumes to move or access
        • One-time data movement
        • Usually unidirectional
        • Transformation or cleansing required
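
The characteristics above (one-time, unidirectional, with cleansing) can be sketched in a few lines. This is only an illustration, not a recommendation to hand-code: the table names, column names, and cleansing rules are hypothetical, and in-memory SQLite databases stand in for the real source and target.

```python
import sqlite3


def consolidate(source, target):
    """One-time, one-way copy of customer rows, cleansing them on the way."""
    target.execute(
        "CREATE TABLE IF NOT EXISTS customer "
        "(id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    rows = source.execute(
        "SELECT cust_id, cust_name, country FROM legacy_customer ORDER BY cust_id"
    ).fetchall()
    # Illustrative cleansing rules: trim and title-case names,
    # upper-case country codes, and flag missing countries.
    cleaned = [
        (cust_id, name.strip().title(), (country or "UNKNOWN").strip().upper())
        for cust_id, name, country in rows
    ]
    with target:  # commit all rows together, or roll back on error
        target.executemany(
            "INSERT INTO customer (id, name, country) VALUES (?, ?, ?)", cleaned
        )
    return len(cleaned)


# Hypothetical legacy system and consolidated target, both in memory.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE legacy_customer (cust_id INTEGER, cust_name TEXT, country TEXT)"
)
source.executemany(
    "INSERT INTO legacy_customer VALUES (?, ?, ?)",
    [(1, "  alice smith ", "us"), (2, "BOB JONES", None)],
)
target = sqlite3.connect(":memory:")

moved = consolidate(source, target)
consolidated = target.execute(
    "SELECT id, name, country FROM customer ORDER BY id"
).fetchall()
```

Even this small case needs decisions about cleansing rules and error handling, which is where tools tend to help.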

Propagation
  Common scenarios:
        • Copying data that can’t be accessed
          directly / remotely
        • Synchronizing data
        • Data cross-referencing
        • Infrequent / one-time extracts


  Characteristics:
        • Can be one-way or bi-directional
        • Often repetitive data movement
        • Medium to large data volume (but not
          always)
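
A minimal sketch of repetitive, one-way propagation, assuming the source keeps a per-row change counter. The `product` table, its `version` column, and the watermark approach are assumptions made for illustration. Deletes and bi-directional conflicts are not handled here.

```python
import sqlite3

SCHEMA = "CREATE TABLE product (id INTEGER PRIMARY KEY, price REAL, version INTEGER)"


def propagate(source, replica, last_version):
    """Copy rows changed since last_version and return the new watermark."""
    changed = source.execute(
        "SELECT id, price, version FROM product WHERE version > ? ORDER BY version",
        (last_version,),
    ).fetchall()
    with replica:  # apply the batch atomically
        replica.executemany(
            "INSERT OR REPLACE INTO product (id, price, version) VALUES (?, ?, ?)",
            changed,
        )
    # Keep the old watermark when nothing changed.
    return max((version for _, _, version in changed), default=last_version)


source = sqlite3.connect(":memory:")
replica = sqlite3.connect(":memory:")
source.execute(SCHEMA)
replica.execute(SCHEMA)
source.executemany(
    "INSERT INTO product VALUES (?, ?, ?)", [(1, 9.99, 1), (2, 5.0, 2)]
)

# First run copies everything; the second run copies only the changed row.
first_watermark = propagate(source, replica, 0)
source.execute("UPDATE product SET price = 4.5, version = 3 WHERE id = 2")
second_watermark = propagate(source, replica, first_watermark)
replica_rows = replica.execute(
    "SELECT id, price, version FROM product ORDER BY id"
).fetchall()
```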
Federation
    Common scenarios:
      • Real-time / low latency data access
      • Security / regulatory requirements that
        prevent copying data
      • Impractical to create a central
        database (e.g. # sources, latency)
      • Centralized data services


    Characteristics:
      • One-way
      • Lower data volumes
      • Higher concurrency
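
The federation idea can be sketched as a read-only view assembled at access time, with no data copied between systems. The two sources (`crm` and `billing`), their tables, and the function name are hypothetical, and in-memory SQLite databases stand in for remote systems.

```python
import sqlite3


def federated_customer_view(crm, billing, customer_id):
    """Build a single customer view by querying both sources on demand."""
    name_row = crm.execute(
        "SELECT name FROM customer WHERE id = ?", (customer_id,)
    ).fetchone()
    if name_row is None:
        return None
    balance_row = billing.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM invoice WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    # Nothing is stored: the combined record exists only for this request.
    return {"id": customer_id, "name": name_row[0], "open_balance": balance_row[0]}


crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
crm.execute("INSERT INTO customer VALUES (7, 'Acme')")

billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE invoice (customer_id INTEGER, amount REAL)")
billing.executemany("INSERT INTO invoice VALUES (?, ?)", [(7, 100.0), (7, 50.5)])

before = federated_customer_view(crm, billing, 7)

# A new invoice is visible on the next read, with no load step in between.
billing.execute("INSERT INTO invoice VALUES (7, 10.0)")
after = federated_customer_view(crm, billing, 7)
missing = federated_customer_view(crm, billing, 99)
```

Each request reaches the live sources, so the data is current, but every request also pays the cost of querying them.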

Choosing Models
                      There are some basic criteria
                      and tradeoffs to consider:
                      •   Data currency vs. latency
                      •   Diversity of data sources
                      •   Data cleansing & transformation
                      •   Predictability of performance
                      •   Access to the same data is
                          needed via different interfaces
                      •   Non-relational sources
                      •   Frequency of access
                      •   Data volumes
                      •   And more…
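
As a rough illustration, a few of these criteria can be turned into a rule of thumb that points toward one of the three models. This is a deliberate simplification built on the scenario lists for consolidation, propagation, and federation. The `Requirements` fields and the order of the checks are assumptions, and real decisions weigh many more factors.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical summary of a project's integration needs."""
    copying_allowed: bool = True     # False if security or regulation forbids copies
    needs_low_latency: bool = False  # real-time access to current data
    bidirectional: bool = False      # data must flow both ways
    repetitive: bool = False         # movement recurs rather than happening once


def suggest_model(req):
    """Return a starting-point model name; not a substitute for analysis."""
    if not req.copying_allowed or req.needs_low_latency:
        return "federation"
    if req.bidirectional or req.repetitive:
        return "propagation"
    # One-time, one-way movement such as a migration or upgrade.
    return "consolidation"


regulated = suggest_model(Requirements(copying_allowed=False))
nightly_sync = suggest_model(Requirements(repetitive=True))
migration = suggest_model(Requirements())
```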


A Handy Comparison Chart

                                                         Consolidation Model
                            Criteria                     Physical    Virtual
             Data currency
             Query performance / latency
             Frequency of access
             Diversity of data sources
             Diversity of data types
             Non-relational data sources
             Transformation and cleansing
             Predictability of performance
             Multiple interfaces to same data
             Large query / data volume
             Need for history / aggregation



Three Implementation Choices

    • Write code! It’s fun! It’s easy! At first.
    • Buy proprietary data integration tools
    • Use available open source tools




Hand-coded Integration
    Why is this so common?
      •   DI is an afterthought on application projects
      •   It’s just data
      •   It’s hard to justify expensive tools for ODI
      •   Developers and DBAs don’t talk

    The market is changing:
      • Lower tolerance for the high cost of
        custom DI development and maintenance
      • External data challenges
      • Bad fit for consolidation projects


    Products get better over time. Hand-written
    code gets worse.


Buying Data Integration Tools
                       Buying is the usual alternative,
                       mostly ETL tools.
                           • ETL vendors are branching out
                           • Many companies have ETL for BI
                       But…
                           • Poor fit for propagation and
                             synchronization tasks
                           • Centralized servers
                           • Licensing costs / problems for
                             consolidation tasks or broad use

                       Integration code is single-purpose; tools are
                       multi-purpose. You should always go with
                       tools – when you can afford them.


Use of Tools vs. Hand Coding
    (Chart: percentages, 0% to 60%, by level of use (High, Medium, Low, None) for ETL, EDR, EII and EAI)

    Source: TDWI, 2006

Open Source: End of Buy vs. Build
                     Open source avoids the pitfalls
                     of coding and gains the
                     advantages of using tools.
                      • Tools can be distributed with little
                        to no license restrictions
                      • Application projects budget for
                        features, not glue
                      • Even basic tools have obvious
                        operational advantages over
                        hand-coding

                     Why build custom code when there are
                     comparable tools available?

Benefits Reported
    After your organization adopted open source
    software, what was the primary benefit of its use?
        • Flexibility: 31%
        • Lower cost: 31%
        • Reduced dependence on vendors: 15%
        • Performance: 10%
        • Reliability: 7%
        • Security: 4%
        • Other: 3%

    Source: The 451 Group

A Side Benefit of Flexibility
    Comparison of time taken to evaluate tools




                                                 Source: Yankee Group

Recommendations
1. Differentiate between analytic
   data integration and operational
   data integration
2. Stop hand-coding unless the
   problem really is trivial, and this
   includes table replication and
   DBA SQL scripts
3. Use the right data integration
   model for the problem
4. Augment existing data
   integration infrastructure with
   open source
5. Make open source the default
   option for data integration tools

Creative Commons
    Thanks to the people who made their images available via creative commons:
    red pill blue pill - http://www.flickr.com/photos/rcrowley/2540057217/
    red pill blue pill2 - http://www.flickr.com/photos/thomasthomas/258931782/
    happy dog jumping in meadow - http://flickr.com/photos/cenz/16128560/
    Writing code – http://flickr.com/photos/cdm/72250667/
    Woodworking – http://flickr.com/photos/rigoletto/126367565/
    Febo – http://flickr.com/photos/jshyun/1573065713/
    open_air_market_bologn - http://flickr.com/photos/pattchi/181259150/




Thanks




Creative Commons
    This work is licensed under the Creative Commons
    Attribution-Noncommercial-No Derivative Works 3.0 United
    States License. To view a copy of this license, visit
    http://creativecommons.org/licenses/by-nc-nd/3.0/us/ or send
    a letter to Creative Commons, 543 Howard Street, 5th Floor,
    San Francisco, California, 94105, USA.




 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfDianaGray10
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsSeth Reyes
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1DianaGray10
 
GenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncGenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncObject Automation
 

Recently uploaded (20)

Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 Workshop
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond Ontologies
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
GenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncGenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation Inc
 

How to Use the Right Tools for Operational Data Integration

  • 1. How to Use the Right Tools for Operational Data Integration Mark R. Madsen – March, 2009 http://ThirdNature.net Attribution-NonCommercial-No Derivative http://creativecommons.org/licenses/by-nc-nd/3.0/us/
  • 2. What We’re Asked For (simulation) Slide 2 March 2009 Mark R. Madsen
  • 3. How It Makes Us Feel
  • 4. How We Want to Feel
  • 5. Spending Priorities in IT: In 2007 and 2008 this is where the money went… but you can’t do most of these without data integration. Source: CIO Insight
  • 6. Technology Priorities in IT: Data integration moved up to the #3 spot for CIOs in 2008. Source: CIO Insight
  • 7. The Cost Problem Management Reacts To (Source: IDC)
  • 8. Where We Often Are Today: Point to Point
    Typical scenario:
      • Disparate data
      • Heterogeneous sources
      • Point integration
      • Minimal reuse
      • No tools
    Source environments: Databases, Documents, Flat Files, XML, Services, ERP, Applications
  • 9. The Desired Future State
    “Data as a platform” provides:
      • Standards-based interfaces
      • Single views of disparate source data
      • Single point of access / integration
      • Reuse of data
    …but you can’t achieve this by writing more application code.
    Diagram labels: Data Platform; source environments: Databases, Documents, Flat Files, XML, Services, ERP, Applications
  • 10. Application versus Data Integration
    Application Integration | Data Integration
    Managing the flow of events | Managing the flow of data and access
    Standardizes the transaction or service | Standardizes the data
    Tools abstract the transport and system endpoints | Tools abstract the transport, system, representation and manipulation
    Must write code at endpoints to manipulate data | Data structure, format and manipulation is abstracted
    Focus on code - data as a byproduct | Focus on data - data as the product
    Reusable functions, not data | Reusable data, not functions
  • 11. Analytic versus Operational Data Integration
    Analytic | Operational
    Most of a BI project’s effort is spent on data integration | Most of an application project is focused on features, not DI
    Many disparate sources | One or a few sources
    Generally unidirectional | One-way or bidirectional
    Large data volumes | Large data volume for some, small volume for others
    Usually loaded daily | Often loaded more often, varies based on project type
    Low concurrency | Low to high concurrency
    High latency | Low to high latency
  • 12. Architectural Models for Data Integration (diagram axes: Data Access Model, from Physical to Virtual; Control, from Distributed to Centralized)
  • 13. Consolidation
    Common operational DI scenarios where this model is appropriate:
      • Migrations
      • Upgrades
      • Consolidations
      • Managing master / reference data
    Characteristics:
      • Large data volumes to move or access
      • One-time data movement
      • Usually unidirectional
      • Transformation or cleansing required
  • 14. Propagation
    Common scenarios:
      • Copying data that can’t be accessed directly / remotely
      • Synchronizing data
      • Data cross-referencing
      • Infrequent / one-time extracts
    Characteristics:
      • Can be one-way or bi-directional
      • Often repetitive data movement
      • Medium to large data volume (but not always)
  • 15. Federation
    Common scenarios:
      • Real-time / low latency data access
      • Security / regulatory requirements that prevent copying data
      • Impractical to create a central database (e.g. # sources, latency)
      • Centralized data services
    Characteristics:
      • One-way
      • Lower data volumes
      • Higher concurrency
  • 16. Choosing Models
    There are some basic criteria and tradeoffs to consider:
      • Data currency vs. latency
      • Diversity of data sources
      • Data cleansing & transformation
      • Predictability of performance
      • Access to the same data is needed via different interfaces
      • Non-relational sources
      • Frequency of access
      • Data volumes
      • And more…
  • 17. A Handy Comparison Chart
    Chart columns: Criteria; Consolidation Model (Physical, Virtual)
    Criteria compared:
      • Data currency
      • Query performance / latency
      • Frequency of access
      • Diversity of data sources
      • Diversity of data types
      • Non-relational data sources
      • Transformation and cleansing
      • Predictability of performance
      • Multiple interfaces to same data
      • Large query / data volume
      • Need for history / aggregation
  • 18. Three Implementation Choices
      • Write code! It’s fun! It’s easy! At first.
      • Buy proprietary data integration tools
      • Use available open source tools
  • 19. Hand-coded Integration
    Why is this so common?
      • DI is an afterthought on application projects
      • It’s just data
      • It’s hard to justify expensive tools for ODI
      • Developers and DBAs don’t talk
    The market is changing:
      • Lower tolerance for the high cost of custom DI development and maintenance
      • External data challenges
      • Bad fit for consolidation projects
    Products get better over time. Hand-written code gets worse.
  • 20. Buying Data Integration Tools
    Buying is the usual alternative, mostly ETL tools.
      • ETL vendors are branching out
      • Many companies have ETL for BI
    But…
      • Poor fit for propagation and synchronization tasks
      • Centralized servers
      • Licensing costs / problems for consolidation tasks or broad use
    Integration code is single-purpose, tools are multi-purpose. You should always go with tools – when you can afford them.
  • 21. Use of Tools vs. Hand Coding (bar chart: share of respondents, 0% to 60%, reporting High Use, Medium Use, Low Use, or None for each of ETL, EDR, EII, and EAI). Source: TDWI, 2006
  • 22. Open Source: End of Buy vs. Build
    Open source avoids the pitfalls of coding and gains the advantages of using tools.
      • Tools can be distributed with little to no license restrictions
      • Application projects budget for features, not glue
      • Even basic tools have obvious operational advantages over hand-coding
    Why build custom code when there are comparable tools available?
  • 23. Benefits Reported
    After your organization adopted open source software, what was the primary benefit of its use?
      • Flexibility: 31%
      • Lower cost: 31%
      • Reduced dependence on vendors: 15%
      • Performance: 10%
      • Reliability: 7%
      • Security: 4%
      • Other: 3%
    Source: The 451 Group
  • 24. A Side Benefit of Flexibility: comparison of time taken to evaluate tools. Source: Yankee Group
  • 25. Recommendations
      1. Differentiate between analytic data integration and operational data integration
      2. Stop hand-coding unless the problem really is trivial, and this includes table replication and DBA SQL scripts
      3. Use the right data integration model for the problem
      4. Augment existing data integration infrastructure with open source
      5. Make open source the default option for data integration tools
  • 26. Creative Commons
    Thanks to the people who made their images available via Creative Commons:
      • red pill blue pill - http://www.flickr.com/photos/rcrowley/2540057217/
      • red pill blue pill2 - http://www.flickr.com/photos/thomasthomas/258931782/
      • happy dog jumping in meadow - http://flickr.com/photos/cenz/16128560/
      • Writing code - http://flickr.com/photos/cdm/72250667/
      • Woodworking - http://flickr.com/photos/rigoletto/126367565/
      • Febo - http://flickr.com/photos/jshyun/1573065713/
      • open_air_market_bologn - http://flickr.com/photos/pattchi/181259150/
  • 27. Thanks
  • 28. Creative Commons This work is licensed under the Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/us/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.
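
Illustration (not part of the original deck): the physical and virtual data access models described in slides 13 to 17 can be contrasted with a minimal Python sketch. Everything named here is hypothetical: the two in-memory "source systems", their field names, and the functions consolidate and federated_customers. The sketch only illustrates the data currency tradeoff from slide 16. A consolidated copy is a snapshot taken at load time, while a federated read goes back to the sources on every query.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]

# Two hypothetical source systems holding customer data in different shapes.
crm_source: List[Record] = [{"id": "1", "name": "Acme"}]
erp_source: List[Record] = [{"cust_no": "2", "cust_name": "Globex"}]


def read_crm() -> List[Record]:
    """Map CRM rows to a shared record shape."""
    return [{"id": r["id"], "name": r["name"], "origin": "crm"} for r in crm_source]


def read_erp() -> List[Record]:
    """Map ERP rows to the same shared record shape."""
    return [
        {"id": r["cust_no"], "name": r["cust_name"], "origin": "erp"}
        for r in erp_source
    ]


READERS: List[Callable[[], List[Record]]] = [read_crm, read_erp]


def consolidate() -> List[Record]:
    """Physical model: copy all source data into one store at load time."""
    store: List[Record] = []
    for reader in READERS:
        store.extend(reader())
    return store


def federated_customers() -> List[Record]:
    """Virtual model: read the sources at query time, copying nothing ahead of time."""
    return [row for reader in READERS for row in reader()]


# Load the consolidated store once, then let a source change afterwards.
snapshot = consolidate()
crm_source.append({"id": "3", "name": "Initech"})

stale_names = sorted(r["name"] for r in snapshot)
current_names = sorted(r["name"] for r in federated_customers())
print(stale_names)    # the consolidated copy does not see the new customer
print(current_names)  # the federated read does
```

The two functions do the same mapping work; what differs is when they run. That timing is the tradeoff the comparison criteria point at: the snapshot gives predictable query performance but ages between loads, while the federated read is always current but pays the cost of reaching every source on each query.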