SlideShare a Scribd company logo
1 of 13
Download to read offline
White Paper



Data Warehousing 2.0
Modeling and Metadata Strategies for Next Generation
Architectures

By Bill H. Inmon
Forest Rim Technology, LLC

April 2010




Corporate Headquarters              EMEA Headquarters         Asia-Pacific Headquarters
100 California Street, 12th Floor   York House                L7. 313 La Trobe Street
San Francisco, California 94111     18 York Road              Melbourne VIC 3000
                                    Maidenhead, Berkshire     Australia
                                    SL6 1SF, United Kingdom
Data Warehousing 2.0 – Bill Inmon




INTRODUCTION
Data warehousing has undergone a constant state of evolution since the beginning. Just when
you think that everything has been discovered and developed, data warehousing evolves once
again, mutating into a new form and structure.


EVOLUTION OF THE DATA WAREHOUSE
From the beginning the evolution of data warehousing has been shaped by powerful forces.
First there was the need for access to data. Then there was the need for integration. Then came
the need for a single version of the truth. Then different departmental needs for looking at the
same foundation of data arose.

The general evolution of the first stages of the data warehouse environment is shown by Fig 1.




                                                                                edw




                 First there were applications, then there was a
                 data warehouse, then there was an infrastructure
                 surrounding the data warehouse

                                             Figure 1

In the beginning there were applications. Applications grew so quickly that there developed
what was termed the “spider web environment” where the same data was scattered all over the
landscape. The frustration with the spider web environment led to the creation of the first data
warehouse. The simple and singular data warehouse addressed many of the problems of the
spider web environment.

But soon it was discovered that other components of the data warehouse infrastructure were
needed. There was a need for ETL, the process that allows data to be read in an unintegrated
application format and written out in a corporate, integrated format. Then there ware data
marts, where different departments had their own version of the base data found in the data
warehouse environment. Soon the ODS appeared where there was a need for high
performance, transaction processing on integrated data.

An entire infrastructure grew up around the world of the simple data warehouse. An
architecture called the “cif”, or the corporate information factory, grew up around the data
warehouse.

But the evolution of the data warehouse did not stop there. Soon the evolution of the data
warehouse grew to include a broader set of requirements. Soon the notion of a data warehouse
expanded to include a full set of data fulfilling a newly discovered set of requirements that
reached well beyond the original notion of what a data warehouse should be.


Embarcadero Technologies, Inc.                                                          Page - 1 -
Data Warehousing 2.0 – Bill Inmon



DW 2.0: ARCHITECTURE FOR THE NEXT
GENERATION OF DATA WAREHOUSING
This new architecture was called “DW 2.0”. Fig 2 shows the basic description of DW 2.0.

                                    DW 2.0
                                 Architecture for the next
                                 generation of data warehousing
                                                                                        Transaction
                                                                                        data

                   Interactive                                                           A      A       A
                     Very
                                                                                         p      p       p
                     current                                                             p      p       p
                                                                                         l      l       l




                                  Textual                             Detailed        Continuous
                                  subjects                                            snapshot
                                  Internal, external                  S   S   S   S   data
                                                       Simple         u   u   u   u
                                                       pointer        b   b   b   b
                                  Captured                                            Profile
                  Integrated      text                                j   j   j   j
                                                                                      data
                                                                                                S
                                                                                                u
                                                                                                    j j
                                                                                                    b b
                                                                                                    u u
                                                                                                    S S

                                                                                                b
                   Current++      Text id ......                                                j


                                  Linkage

                                   Text to subj                       Summary




                                  Textual                             Detailed        Continuous
                                  subjects                                            snapshot
                                  Internal, external                  S   S   S   S   data
                                                       Simple         u   u   u   u
                                                       pointer        b   b   b   b
                                  Captured                                            Profile
                  Near line       text
                                                                      j   j   j   j
                                                                                      data
                                                                                                S
                                                                                                u
                                                                                                    j j
                                                                                                    b b
                                                                                                    u u
                                                                                                    S S

                                                                                                b
                   Less than
                   current
                                   Text id ......                                               j


                                  Linkage
                                   Text to subj                       Summary




                                  Textual                             Detailed        Continuous
                                  subjects                                            snapshot
                                                                      S   S   S   S   data
                                  Internal, external                  u   u   u   u
                                                       Simple
                                                                      b   b   b   b
                                                       pointer                        Profile
                   Archival       Captured                            j   j   j   j
                                                                                      data
                                                                                                S
                                                                                                u
                                                                                                    j
                                                                                                    b
                                                                                                    u
                                                                                                    S   j
                                                                                                        b
                                                                                                        u
                                                                                                        S
                                  text                                                          b
                                                                                                j
                     Older         Text id ......
                                  Linkage
                                                                      Summary
                                   Text to subj




                                      Then there was DW 2.0,
                                      architecture for the next
                                      generation of data
                                      warehousing
                                                                 Figure 2

Like the earlier renditions of data warehouse architecture that preceded DW 2.0, DW 2.0 was
shaped by powerful evolutionary forces.


THE LIFE CYCLE OF DATA
The first powerful evolutionary force that shaped DW 2.0 was the recognition that the data
within the data warehouse contained its own life cycle. It was not enough to merely place data


Embarcadero Technologies, Inc.                                                                              Page - 2 -
Data Warehousing 2.0 – Bill Inmon


on disk storage and call it a data warehouse. As time passed that data began to exhibit its own
characteristics. The first manifestation of the life cycle of the data within the data warehouse was
that over time, the probability of access to data in the data warehouse dropped. The older data
became, the less that data was accessed. A second manifestation of the lifecycle of data with
the data warehouse was that over time the volumes of data in the data warehouse grew rapidly.
This led to a paradox: the larger a data warehouse became and the older the data in the data
warehouse, the smaller percentage of data was being used.


UNSTRUCTURED DATA
The second manifestation of evolutionary forces in the data warehouse was the realization that
unstructured, textual data belonged in the data warehouse. The original data that was placed in
the data warehouse was transaction based, repetitive data. Many corporate decisions were
based on this kind of data. But there is much important data that is not transaction based that
belongs in the data warehouse.


METADATA
A third realization was that metadata belonged in the data warehouse as a formal and integral
component. In first generation data warehouses, metadata was an afterthought. But there are
powerful reasons why metadata needs to become an integral and formal part of the data
warehouse environment.

Fig 3 shows the evolutionary forces and their effect on the DW 2.0 environment.
                                     DW 2.0
                                  Architecture for the next
                                  generation of data warehousing
                                                                               Transaction
                                                                               data

                    Interactive                                                 A A A
                      Very
                                                                                p p p
                      current                                                   p p p
                                                                                l l l




                                   Textual                        Detailed   Continuous
                                   subjects
                                   Internal, external
                                                        Simple
                                                        pointer
                                                                  S S S S
                                                                  u u u u
                                                                  b b b b
                                                                             snapshot
                                                                             data
                                                                                                          Recognition
                                   Captured                                  Profile
                    Integrated
                    Current++
                                   text
                                   Text id ......
                                                                  j j j j
                                                                             data
                                                                                       S S S
                                                                                       u
                                                                                       b
                                                                                       j
                                                                                         j j
                                                                                         b b
                                                                                         u u

                                                                                                          of the need
                                   Linkage
                                                                                                          for a formal
A recognition                       Text to subj                  Summary

                                                                                                          metadata
of the life cycle                  Textual
                                   subjects
                                                                  Detailed   Continuous
                                                                             snapshot                     infrastructure
of data within      Near line
                                   Internal, external

                                   Captured
                                                        Simple
                                                        pointer
                                                                  S S S S
                                                                  u u u u
                                                                  b b b b
                                                                  j j j j
                                                                             data

                                                                             Profile   S S S
                                                                                         j j
                                                                                         b b
                                                                                         u u


the data
                                   text                                      data      u
                                                                                       b
                    Less than
                    current
                                    Text id ......                                     j


                                   Linkage
 warehouse                          Text to subj                  Summary




                                   Textual                        Detailed   Continuous
                                   subjects                                  snapshot
                                                                  S S S S    data
                                   Internal, external             u u u u
                                                        Simple
                                                                  b b b b
                                                        pointer              Profile
                    Archival       Captured                       j j j j
                                                                             data
                                                                                       S S S
                                                                                       u
                                                                                         j j
                                                                                         b b
                                                                                         u u
                                   text                                                b
                                                                                       j
                      Older         Text id ......
                                   Linkage
                                                                  Summary
                                    Text to subj




                                  Recognition of the need to integrate
                                  both structured and unstructured data
                                  in the data warehouse

                                                                                               Figure 3

DW 2.0 has become the architectural paradigm for modern data warehouses. For a detailed
description of DW 2.0 refer to the book DW 2.0 ARCHITECTURE FOR THE NEXT GENERATION
OF DATA WAREHOUSING, Morgan Kaufman, 2008.

Embarcadero Technologies, Inc.                                                                                             Page - 3 -
Data Warehousing 2.0 – Bill Inmon


Like most large architectures, DW 2.0 is not build all at once. Instead an organization builds first
one component of DW 2.0 then another component. Indeed some components of the data
warehouse are optional, such as the near line sector.


COMPARTMENTS OF PROCESSING
In any case, the different components of DW 2.0 consist of sectors that perform one basic
function. These sectors are in a sense their own small operating environments. Each sector has
its own purpose and its own functionality.

Fig 4 shows that each sector is neatly compartmentalized.
                  Interactive
                    Very
                    current
                                                                                 Transaction
                                                                                 data

                                                                                   A    A      A
                                                                                   p    p      p
                                                                                   p    p      p
                                                                                   l    l      l




             Integrated
             Current++
                              Textual                        Detailed        Continuous
                              subjects                                       snapshot
                                                             S   S   S   S   data
                              Internal, external
                                                   Simple    u   u   u   u
                                                   pointer   b   b   b   b
                              Captured                                       Profile   S S S
                                                             j   j   j   j
                              text                                           data      u u u
                                                                                       b b b
                                                                                       j j j
                              Text id ......
                              Linkage
                                                             Summary
                               Text to subj




             Near line                                                                             DW 2.0 is made up of
             Less than
             current          Textual
                              subjects
                                                              Detailed

                                                              S S S S
                                                                              Continuous
                                                                              snapshot
                                                                              data
                                                                                                   different components
                              Internal, external              u u u u
                                                   Simple
                                                   pointer    b b b b
                              Captured                                       Profile   S S S
                                                              j j j j
                              text                                           data      u u u
                                                                                       b b b
                                                                                       j j j
                               Text id ......
                              Linkage
                                                              Summary
                               Text to subj




             Archival
                                                             Detailed        Continuous
                              Textual
                Older         subjects                                       snapshot
                                                             S S S S         data
                              Internal, external             u u u u
                                                   Simple    b b b b
                              Captured             pointer                   Profile   S S S
                                                             j j j j
                              text                                           data      u u u
                                                                                       b b b
                                                                                       j j j
                               Text id ......
                              Linkage
                                                             Summary
                               Text to subj




                                                                                       Figure 4

In light of the individual compartmentalization of the sectors of DW 2.0, the question then
becomes – how is work distributed and coordinated across the different components? The
answer is that work is distributed and coordinated by means of the passage of metadata from
one component to the next.


METADATA AS THE GLUE
Fig 5 shows that metadata is the “glue” that binds the different operating components of DW
2.0 together.




Embarcadero Technologies, Inc.                                                                                            Page - 4 -
Data Warehousing 2.0 – Bill Inmon


                                                             Interactive
                                                                                Transaction
                                                              Very
                                                                                data
                                                              current
                                                                                  A    A      A
                                                                                  p    p      p
                                                                                  p    p      p
                                                                                  l    l      l




                 Integrated
                 Current++
                              Textual                            Detailed   Continuous
                              subjects                                      snapshot
                                                                 S S S S    data
                              Internal, external
                                                   Simple        u u u u
                                                   pointer       b b b b
                              Captured                                      Profile   S S S
                                                                 j j j j
                              text                                          data      u u u
                                                                                      b b b
                                                                                      j j j
                              Text id ......
                              Linkage

                               Text to subj                      Summary
                                                                                                  There is a
                                                                                                  need for tight
                Near line                                                                         communications
                 Less than
                 current      Textual
                              subjects
                              Internal, external
                                                                 Detailed

                                                                 S S S S
                                                                             Continuous
                                                                             snapshot
                                                                             data
                                                                                                  and tight
                                                                 u u u u
                                                                                                  coordination
                                                   Simple
                                                   pointer       b b b b
                              Captured                                      Profile   S S S
                                                                 j j j j
                              text                                          data      u u u
                                                                                      b b b
                               Text id ......
                              Linkage
                                                                                      j j j

                                                                                                  among the
                               Text to subj
                                                                 Summary
                                                                                                  different
                 Archival
                                                                                                  components of
                    Older
                              Textual
                              subjects
                                                                Detailed

                                                                S S S S
                                                                            Continuous
                                                                            snapshot
                                                                            data
                                                                                                  DW 2.0
                              Internal, external                u u u u
                                                   Simple       b b b b
                              Captured             pointer                  Profile   S S S
                                                                j j j j
                              text                                          data      u u u
                                                                                      b b b
                                                                                      j j j
                               Text id ......
                              Linkage
                                                                Summary
                               Text to subj




                                                                                Figure 5

In a sense, metadata forms a lattice work that holds the components of DW 2.0 together. It is
the passage of metadata that allows the different components of DW 2.0 to work in cooperation
and in coordination with each other.
Stated differently, without metadata, the different components of DW 2.0 would not be able to
achieve a coordinated and cohesive work flow.

Consider a symphony orchestra. What kind of music would be created if the violins were playing
Beethoven’s Fifth, the cellos were playing “Hotel California”, the drums were playing Aretha
Franklin’s “Respect”, the flutes were playing “Happy Birthday”, and the trumpets were playing
“Jingle Bells”. The result would be a bunch of noise – nothing that anyone would want to listen
to.

But now suppose a conductor steps in and gets everyone to play DeBussy’s Le Mer. Now,
suddenly the very same components that just a few seconds ago sounded awful are making very
beautiful music. What is needed is a conductor, and it is metadata that plays the role of the
conductor. Metadata of course does not direct a symphony orchestra. Metadata instead directs
the DW 2.0 environment, with all of its different components and different technologies.

Fig 6 shows the role of metadata and the role of a conductor.




Embarcadero Technologies, Inc.                                                                                     Page - 5 -
Data Warehousing 2.0 – Bill Inmon




               The orchestra with no conductor                The orchestra with a conductor

                                                 Figure 6

So what kind of metadata is used to be passed from one DW 2.0 component to another? In
truth there are all sorts of metadata that are passed from one DW 2.0 component to another.


DIFFERENT TYPES OF METADATA
Typical types of metadata that are passed include:

 •   data descriptions
 •   data definitions
 •   formula and algorithms
 •   the timing of processing
 •   description of assorted volumes of data
 •   operating parameters
 •   encoded values
 •   processed/calculated/derived data
 •   ratios and statistics
 •   status codes, and the like


Fig 7 shows the different kinds of metadata that can be passed from one component to the
next. (Note: this list is merely a representative sample of the different types of metadata that
can be passed from one component to the next.)




Embarcadero Technologies, Inc.                                                                 Page - 6 -
Data Warehousing 2.0 – Bill Inmon


                 Interactive
                                          Transaction
                  Very
                                          data
                  current
                                            A    A      A
                                            p    p      p
                                            p    p      p
                                            l    l      l




                     Detailed         Continuous
                                      snapshot
                     S   S    S   S   data
                     u
                     b
                     j
                         u
                         b
                         j
                              u
                              b
                              j
                                  u
                                  b
                                  j
                                      Profile   S S S
                                                            The data that is passed from one
                                      data      u u u
                                                b b b
                                                j j j
                                                            component to the next includes -
                     Summary
                                                              - data descriptions
                                                              - parameters
                        Detailed       Continuous
                                                              - processed data
                     S
                     u
                     b
                          S
                          u
                          b
                              S
                              u
                              b
                                  S
                                  u
                                  b
                                       snapshot
                                       data                   - ratios and statistics
                     j    j   j   j
                                      Profile
                                      data
                                                S S S
                                                u u u
                                                b b b
                                                j j j
                                                              - status codes
                     Summary
                                                              - and so forth

                    Detailed          Continuous
                                      snapshot
                    S    S    S   S   data
                    u    u    u   u
                    b    b    b   b
                                      Profile   S S S
                    j    j    j   j
                                      data      u u u
                                                b b b
                                                j j j




                    Summary




                                                                 Figure 7

The data descriptions that can be passed from one component of DW 2.0 to another often
reflect on the data model that exists for each component of DW 2.0. Indeed, there is a data
model for each of the different components of the DW 2.0 environment.




Embarcadero Technologies, Inc.                                                                 Page - 7 -
Data Warehousing 2.0 – Bill Inmon



DIFFERENT DATA MODELS
Fig 8 shows the different data models that exist for the different components of DW 2.0.


                                    Interactive
                                                       Transaction
                                     Very
                                                       data
                                     current
                                                         A
                                                         p
                                                              A
                                                              p
                                                                     A
                                                                     p              Operational,
                                                         p    p      p
                                                         l    l      l
                                                                                    application
                                                                                    data model

                                        Detailed

                                        S S S S
                                        u u u u
                                                    Continuous
                                                    snapshot
                                                    data
                                                                                    Integrated,
                                        b b b b
                                        j j j j
                                                   Profile
                                                   data
                                                             S S S
                                                             u u u
                                                             b b b
                                                                                    corporate
                                                                                    data model
                                                             j j j




                                        Summary




                                        Detailed

                                        S S S S
                                                    Continuous
                                                    snapshot
                                                    data
                                                                                    See integrated,
                                        u u u u
                                        b b b b
                                        j j j j
                                                   Profile
                                                   data
                                                             S S S
                                                             u u u
                                                             b b b
                                                                                    corporate
                                                             j j j
                                                                                    data model
                                        Summary




                                       Detailed

                                       S S S S
                                                   Continuous
                                                   snapshot
                                                   data
                                                                                    Archival data
                                       u u u u
                                       b b b b
                                       j j j j
                                                   Profile
                                                   data
                                                             S S S
                                                             u u u
                                                                                    model
                                                             b b b
                                                             j j j




                                       Summary




                                      There is a data model for each
                                      sector of the DW 2.0 environment

                                                                         Figure 8

At the operational level (i.e., the interactive sector of DW 2.0) is an application oriented data
model. This data model reflects the transactional nature of the work that is done here. While
integration may occur here, it often doesn’t. In addition, this level of data often includes the
reception of data from applications that lie outside of it.

The second data model that is found is that of the data model for the integrated sector. The
integrated sector is the place where corporate integration occurs. The integrated data model is
a classical subject oriented, non volatile model.

The third data model that (optionally) appears is the data model for the near line sector. If the
near line sector is part of the data warehouse environment, then the near line data model is
IDENTICAL to the integrated data model. Stated differently, if the near line sector appears at
all, then the data model for the near line sector is IDENTICAL, in every way, to the integrated
data model.

The fourth data model found in the DW 2.0 environment is the data model for the archival
environment. The archival data model is the data model that reflects several important aspects
of design:

 •   The need to look at and measure data over lengthy periods of time
 •   the need to need to reflect a changing metadata structure over time
 •   The need to store metadata directly in the same physical volume of data as the actual data
     itself,
 •   The need to store calculation algorithms along with summarized or aggregated data
 •   The need to be able to free data from its software structuring


Embarcadero Technologies, Inc.                                                                        Page - 8 -
Data Warehousing 2.0 – Bill Inmon


 •   The need to be able to further normalize data as it is placed in the archival environment,
     and so forth.
There are then some distinctive needs for a data model for each of the different layers
processing that occur in the DW 2.0 environment.


A COMMON THEME
It is of interest that the different data models for the different sectors of DW 2.0 are in fact very
different. Having stated that, there never the less is a common theme running through each of
the different data models. For example, the integrated data model is undeniably akin to its
interactive data model. And the data model found at the archival environment is undeniably
related to the integrated data model.

The similarities of the different data models to each other are somewhat akin to looking at a
grandfather, his daughter, and the child of the daughter. There will be certain facial and body
similarities throughout the family. But no one will mistake a grandfather for his daughter, and no
one will mistake a daughter for her child.

Fig 9 shows that there are definite familial similarities running one data model and the next.
                              Interactive
                                                      Transaction
                               Very
                                                      data
                               current
                                                        A    A      A
                                                        p    p      p
                                                        p    p      p
                                                        l    l      l




                                  Detailed         Continuous
                                                   snapshot
                                  S   S   S   S    data
                                  u   u   u   u
                                  b   b   b   b
                                                  Profile   S S S
                                  j   j   j   j
                                                  data      u u u
                                                            b b b
                                                            j j j




                                  Summary




                                  Detailed         Continuous
                                                   snapshot
                                  S   S   S   S    data
                                  u   u   u   u
                                  b   b   b   b
                                                  Profile   S S S
                                  j   j   j   j
                                                  data      u u u
                                                            b b b
                                                            j j j




                                  Summary




                                 Detailed         Continuous
                                                  snapshot
                                 S S S S          data
                                 u u u u
                                 b b b b
                                                  Profile   S S S
                                 j j j j
                                                  data      u u u
                                                            b b b
                                                            j j j




                                 Summary




                             There is a great deal of commonality
                             from one type of data model to the next
                                                                        Figure 9




Embarcadero Technologies, Inc.                                                               Page - 9 -
Data Warehousing 2.0 – Bill Inmon



THE TECHNICAL COMMUNITY
So who do the metadata and the data model aspects of the DW 2.0 environment help? The first
set of people that are helped by the data models and the metadata is the development and
maintenance organizations. The technicians of the organization find that data models and
metadata become the Bible of development and maintenance. The data models and the
metadata become the true “intellectual roadmap” for the ongoing development and
maintenance of the data warehouse environment. Stated differently, without metadata and a
data model, the technician is like a mechanic working on a Yugo where the Yugo is no longer
produced and where the manuals describing the Yugo are either all lost or all in Serbian, where
the mechanic only reads and speaks English. Trying to do a repair job on a Yugo under those
circumstances is questionable under the best of circumstances.

Fig 10 shows the use of the metadata and data models by the technical community.

                     Interactive
                                             Transaction
                      Very
                                             data
                      current
                                               A    A      A
                                               p    p      p
                                               p    p      p
                                               l    l      l




                         Detailed        Continuous
                                         snapshot
                         S   S   S   S   data
                         u   u   u   u
                         b   b   b   b
                                         Profile   S S S
                         j   j   j   j
                                         data      u u u
                                                   b b b
                                                   j j j




                         Summary




                         Detailed         Continuous
                                          snapshot
                         S   S   S S      data
                         u   u   u u
                         b   b   b b
                                         Profile   S S S
                         j   j   j j
                                         data      u u u
                                                   b b b
                                                   j j j




                         Summary




                        Detailed         Continuous
                                         snapshot
                        S S S S          data
                        u u u u
                        b b b b
                                         Profile   S S S
                        j j j j
                                         data      u u u
                                                   b b b
                                                   j j j




                        Summary




              The data model for each component of DW 2.0
              becomes the intellectual road map for the
              building and the maintenance the data
              structures found in the component
                                                               Figure 10


THE END USER ANALYTICAL COMMUNITY
But there is another audience that is well served by metadata and data models, and that
audience is the end user analyst.

Consider a newly hired end user analyst that has just received an office right across from you.
The end user analyst has just received an MBA and wants to show his/her stuff. The end user


Embarcadero Technologies, Inc.                                                          Page - 10 -
Data Warehousing 2.0 – Bill Inmon


analyst starts to build an analysis and becomes perplexed. There are so many types of data, so
much similar data, so much data that is of different ages that the analyst does not know where
to start. Merely biting off a chunk of data may be very misleading if the analyst doesn’t chose
the proper data. And there is a lot to choose from.

So the analyst comes to your office and asks –“how do I find my way around this environment?
There is a lot to choose from and I don’t want to start with the wrong data.”

It is at this point that the metadata and the data models become invaluable. Without the
metadata and the data models the end user analyst may wander around the maze of data for a
long time. But with the metadata and the data models, the end user analyst can determine what
data is where and what data provides the best source for the analysis at hand.

Fig 11 shows that metadata and the data models are extremely useful to the analytical
community as well as the technical community.



      Interactive
                              Transaction
       Very
                              data
       current
                                A    A      A
                                p    p      p
                                p    p      p
                                l    l      l




          Detailed        Continuous
                          snapshot
          S   S   S   S   data
          u   u   u   u
          b   b   b   b   Profile   S S S
          j   j   j   j
                          data      u u u
                                    b b b
                                    j j j




          Summary




          Detailed         Continuous
                           snapshot
          S   S   S   S    data
          u   u   u   u
          b   b   b   b   Profile
          j   j                     S S S
                  j   j
                          data      u u u
                                    b b b
                                    j j j




          Summary




         Detailed         Continuous
                          snapshot
                                                Another group of people who are greatly helped
         S S S S
         u u u u
         b b b b
                          data

                          Profile   S S S
                                                by metadata and data models are the end user
         j j j j
                          data      u u u
                                    b b b
                                    j j j       analysts who have to find what data there is to
         Summary                                use as they do their analysis


                                                Figure 11


SUMMARY
In this paper we have discussed DW 2.0 and the evolution of the architecture surrounding data
warehouse. DW 2.0 has different sectors. There is a need for metadata to control the activities in
each of these sectors. In addition there is a data model for each of the sectors. Each data model
is different and yet there is a common theme running through each data model.

The data models and the metadata serve two communities – the technical community and the
end user analysis community. For the technical community the metadata and the data models
act as an intellectual roadmap. For the end user analytical community, the data models and the


Embarcadero Technologies, Inc.                                                             Page - 11 -
Data Warehousing 2.0 – Bill Inmon


metadata serve as a guidepost for where the end user analyst should go to in order to find the
proper data for analysis.


REFERENCES
Inmon, W H - DW 2.0 ARCHITECTURE FOR THE NEXT GENERATION OF DATA
WAREHOUSING, Morgan – Kaufman, 2008.


ABOUT THE AUTHOR
Bill Inmon, “the father of data warehousing”, has written 50 books translated into 9 languages.
Bill founded and took public the world’s first ETL software company. Bill has written over 1000
articles and published in most major trade journals.

Bill has conducted seminars and spoken at conferences on every continent except Antarctica.
Bill holds three software patents. Bill’s latest company is Forest Rim Technology, a company
dedicated to the access and integration of unstructured data into the structured world. Bill’s
website – inmoncif.com - has attracted over 1,000,000 visitors a month. Bill’s weekly newsletter
in b-eye-network.com is one of the most widely read in the industry and goes out to 75,000
subscribers each week.




Embarcadero Technologies, Inc. is a leading provider of award-winning tools for application
developers and database professionals so they can design systems right, build them faster and
run them better, regardless of their platform or programming language. Ninety of the Fortune
100 and an active community of more than three million users worldwide rely on Embarcadero
products to increase productivity, reduce costs, simplify change management and compliance
and accelerate innovation. The company’s flagship tools include: Embarcadero® RAD Studio,
DBArtisan®, Delphi®, ER/Studio®, JBuilder® and Rapid SQL®. Founded in 1993, Embarcadero is
headquartered in San Francisco, with offices located around the world. Embarcadero is online
at www.embarcadero.com.




Embarcadero Technologies, Inc.                                                          Page - 12 -

More Related Content

What's hot

Similarity encoding for learning on dirty categorical variables
Similarity encoding for learning on dirty categorical variablesSimilarity encoding for learning on dirty categorical variables
Similarity encoding for learning on dirty categorical variablesGael Varoquaux
 
Tableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaTableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaEdureka!
 
Supply Chain and Logistics Management with Graph & AI
Supply Chain and Logistics Management with Graph & AISupply Chain and Logistics Management with Graph & AI
Supply Chain and Logistics Management with Graph & AITigerGraph
 
Exploratory data analysis
Exploratory data analysisExploratory data analysis
Exploratory data analysisGramener
 
Data Governance and Metadata Management
Data Governance and Metadata ManagementData Governance and Metadata Management
Data Governance and Metadata Management DATAVERSITY
 
Neo4j Graph Data Science Training - June 9 & 10 - Slides #6 Graph Algorithms
Neo4j Graph Data Science Training - June 9 & 10 - Slides #6 Graph AlgorithmsNeo4j Graph Data Science Training - June 9 & 10 - Slides #6 Graph Algorithms
Neo4j Graph Data Science Training - June 9 & 10 - Slides #6 Graph AlgorithmsNeo4j
 
Build Data Lakes and Analytics on AWS: Patterns & Best Practices
Build Data Lakes and Analytics on AWS: Patterns & Best PracticesBuild Data Lakes and Analytics on AWS: Patterns & Best Practices
Build Data Lakes and Analytics on AWS: Patterns & Best PracticesAmazon Web Services
 
Unsupervised Anomaly Detection with Isolation Forest - Elena Sharova
Unsupervised Anomaly Detection with Isolation Forest - Elena SharovaUnsupervised Anomaly Detection with Isolation Forest - Elena Sharova
Unsupervised Anomaly Detection with Isolation Forest - Elena SharovaPyData
 
Outlier analysis and anomaly detection
Outlier analysis and anomaly detectionOutlier analysis and anomaly detection
Outlier analysis and anomaly detectionShantanuDeosthale
 
Benefits of data visualization
Benefits of data visualizationBenefits of data visualization
Benefits of data visualizationinfographic_art
 
Iris data analysis example in R
Iris data analysis example in RIris data analysis example in R
Iris data analysis example in RDuyen Do
 
Machine Learning and Data Mining: 10 Introduction to Classification
Machine Learning and Data Mining: 10 Introduction to ClassificationMachine Learning and Data Mining: 10 Introduction to Classification
Machine Learning and Data Mining: 10 Introduction to ClassificationPier Luca Lanzi
 
Scaling Big Data Cleansing
Scaling Big Data CleansingScaling Big Data Cleansing
Scaling Big Data CleansingZuhair khayyat
 
Gathering Business Requirements for Data Warehouses
Gathering Business Requirements for Data WarehousesGathering Business Requirements for Data Warehouses
Gathering Business Requirements for Data WarehousesDavid Walker
 
Snowflake Data Governance
Snowflake Data GovernanceSnowflake Data Governance
Snowflake Data Governancessuser538b022
 
Case Study: Loan default prediction
Case Study: Loan default predictionCase Study: Loan default prediction
Case Study: Loan default predictionALTEN Calsoft Labs
 

What's hot (20)

Similarity encoding for learning on dirty categorical variables
Similarity encoding for learning on dirty categorical variablesSimilarity encoding for learning on dirty categorical variables
Similarity encoding for learning on dirty categorical variables
 
Tableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaTableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | Edureka
 
Supply Chain and Logistics Management with Graph & AI
Supply Chain and Logistics Management with Graph & AISupply Chain and Logistics Management with Graph & AI
Supply Chain and Logistics Management with Graph & AI
 
Exploratory data analysis
Exploratory data analysisExploratory data analysis
Exploratory data analysis
 
Data Governance and Metadata Management
Data Governance and Metadata ManagementData Governance and Metadata Management
Data Governance and Metadata Management
 
Neo4j Graph Data Science Training - June 9 & 10 - Slides #6 Graph Algorithms
Neo4j Graph Data Science Training - June 9 & 10 - Slides #6 Graph AlgorithmsNeo4j Graph Data Science Training - June 9 & 10 - Slides #6 Graph Algorithms
Neo4j Graph Data Science Training - June 9 & 10 - Slides #6 Graph Algorithms
 
Build Data Lakes and Analytics on AWS: Patterns & Best Practices
Build Data Lakes and Analytics on AWS: Patterns & Best PracticesBuild Data Lakes and Analytics on AWS: Patterns & Best Practices
Build Data Lakes and Analytics on AWS: Patterns & Best Practices
 
Unsupervised Anomaly Detection with Isolation Forest - Elena Sharova
Unsupervised Anomaly Detection with Isolation Forest - Elena SharovaUnsupervised Anomaly Detection with Isolation Forest - Elena Sharova
Unsupervised Anomaly Detection with Isolation Forest - Elena Sharova
 
Outlier analysis and anomaly detection
Outlier analysis and anomaly detectionOutlier analysis and anomaly detection
Outlier analysis and anomaly detection
 
Benefits of data visualization
Benefits of data visualizationBenefits of data visualization
Benefits of data visualization
 
Iris data analysis example in R
Iris data analysis example in RIris data analysis example in R
Iris data analysis example in R
 
Machine Learning and Data Mining: 10 Introduction to Classification
Machine Learning and Data Mining: 10 Introduction to ClassificationMachine Learning and Data Mining: 10 Introduction to Classification
Machine Learning and Data Mining: 10 Introduction to Classification
 
Scaling Big Data Cleansing
Scaling Big Data CleansingScaling Big Data Cleansing
Scaling Big Data Cleansing
 
Gathering Business Requirements for Data Warehouses
Gathering Business Requirements for Data WarehousesGathering Business Requirements for Data Warehouses
Gathering Business Requirements for Data Warehouses
 
Knn
KnnKnn
Knn
 
Snowflake Data Governance
Snowflake Data GovernanceSnowflake Data Governance
Snowflake Data Governance
 
K Nearest Neighbor Algorithm
K Nearest Neighbor AlgorithmK Nearest Neighbor Algorithm
K Nearest Neighbor Algorithm
 
Case Study: Loan default prediction
Case Study: Loan default predictionCase Study: Loan default prediction
Case Study: Loan default prediction
 
Architecting a datalake
Architecting a datalakeArchitecting a datalake
Architecting a datalake
 
Data anonymization
Data anonymizationData anonymization
Data anonymization
 

Viewers also liked

Warehouse components
Warehouse componentsWarehouse components
Warehouse componentsganblues
 
Data warehouse inmon versus kimball 2
Data warehouse inmon versus kimball 2Data warehouse inmon versus kimball 2
Data warehouse inmon versus kimball 2Mike Frampton
 
Business Intelligence - A Management Perspective
Business Intelligence - A Management PerspectiveBusiness Intelligence - A Management Perspective
Business Intelligence - A Management Perspectivevinaya.hs
 
Information And Decision Support System
Information And Decision Support SystemInformation And Decision Support System
Information And Decision Support Systemmegat zainurul anuar
 
Business Intelligence 3.0 Revolution
Business Intelligence 3.0 RevolutionBusiness Intelligence 3.0 Revolution
Business Intelligence 3.0 Revolutionwww.panorama.com
 
Business Intelligence - Intro
Business Intelligence - IntroBusiness Intelligence - Intro
Business Intelligence - IntroDavid Hubbard
 
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...Rittman Analytics
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSINGKing Julian
 
Data warehouse architecture
Data warehouse architectureData warehouse architecture
Data warehouse architecturepcherukumalla
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data WarehousingJason S
 
Components of a Data-Warehouse
Components of a Data-WarehouseComponents of a Data-Warehouse
Components of a Data-WarehouseAbdul Aslam
 

Viewers also liked (20)

HR Business Inteligence
HR Business InteligenceHR Business Inteligence
HR Business Inteligence
 
Business Inteligence
Business InteligenceBusiness Inteligence
Business Inteligence
 
Enterprise business Inteligence
Enterprise business Inteligence Enterprise business Inteligence
Enterprise business Inteligence
 
Business intelligence
Business intelligenceBusiness intelligence
Business intelligence
 
080827 abramson inmon vs kimball
080827 abramson   inmon vs kimball080827 abramson   inmon vs kimball
080827 abramson inmon vs kimball
 
Inmon & kimball method
Inmon & kimball methodInmon & kimball method
Inmon & kimball method
 
Warehouse components
Warehouse componentsWarehouse components
Warehouse components
 
Data warehouse inmon versus kimball 2
Data warehouse inmon versus kimball 2Data warehouse inmon versus kimball 2
Data warehouse inmon versus kimball 2
 
Business Intelligence - A Management Perspective
Business Intelligence - A Management PerspectiveBusiness Intelligence - A Management Perspective
Business Intelligence - A Management Perspective
 
Data Warehouse 101
Data Warehouse 101Data Warehouse 101
Data Warehouse 101
 
Information And Decision Support System
Information And Decision Support SystemInformation And Decision Support System
Information And Decision Support System
 
Decision support system
Decision support systemDecision support system
Decision support system
 
Business Intelligence 3.0 Revolution
Business Intelligence 3.0 RevolutionBusiness Intelligence 3.0 Revolution
Business Intelligence 3.0 Revolution
 
Business Intelligence - Intro
Business Intelligence - IntroBusiness Intelligence - Intro
Business Intelligence - Intro
 
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 
Data warehouse architecture
Data warehouse architectureData warehouse architecture
Data warehouse architecture
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data Warehousing
 
Components of a Data-Warehouse
Components of a Data-WarehouseComponents of a Data-Warehouse
Components of a Data-Warehouse
 
Data mining
Data miningData mining
Data mining
 

More from Embarcadero Technologies

PyTorch for Delphi - Python Data Sciences Libraries.pdf
PyTorch for Delphi - Python Data Sciences Libraries.pdfPyTorch for Delphi - Python Data Sciences Libraries.pdf
PyTorch for Delphi - Python Data Sciences Libraries.pdfEmbarcadero Technologies
 
Android on Windows 11 - A Developer's Perspective (Windows Subsystem For Andr...
Android on Windows 11 - A Developer's Perspective (Windows Subsystem For Andr...Android on Windows 11 - A Developer's Perspective (Windows Subsystem For Andr...
Android on Windows 11 - A Developer's Perspective (Windows Subsystem For Andr...Embarcadero Technologies
 
Linux GUI Applications on Windows Subsystem for Linux
Linux GUI Applications on Windows Subsystem for LinuxLinux GUI Applications on Windows Subsystem for Linux
Linux GUI Applications on Windows Subsystem for LinuxEmbarcadero Technologies
 
Python on Android with Delphi FMX - The Cross Platform GUI Framework
Python on Android with Delphi FMX - The Cross Platform GUI Framework Python on Android with Delphi FMX - The Cross Platform GUI Framework
Python on Android with Delphi FMX - The Cross Platform GUI Framework Embarcadero Technologies
 
Introduction to Python GUI development with Delphi for Python - Part 1: Del...
Introduction to Python GUI development with Delphi for Python - Part 1:   Del...Introduction to Python GUI development with Delphi for Python - Part 1:   Del...
Introduction to Python GUI development with Delphi for Python - Part 1: Del...Embarcadero Technologies
 
FMXLinux Introduction - Delphi's FireMonkey for Linux
FMXLinux Introduction - Delphi's FireMonkey for LinuxFMXLinux Introduction - Delphi's FireMonkey for Linux
FMXLinux Introduction - Delphi's FireMonkey for LinuxEmbarcadero Technologies
 
Python for Delphi Developers - Part 1 Introduction
Python for Delphi Developers - Part 1 IntroductionPython for Delphi Developers - Part 1 Introduction
Python for Delphi Developers - Part 1 IntroductionEmbarcadero Technologies
 
RAD Industrial Automation, Labs, and Instrumentation
RAD Industrial Automation, Labs, and InstrumentationRAD Industrial Automation, Labs, and Instrumentation
RAD Industrial Automation, Labs, and InstrumentationEmbarcadero Technologies
 
Embeddable Databases for Mobile Apps: Stress-Free Solutions with InterBase
Embeddable Databases for Mobile Apps: Stress-Free Solutions with InterBaseEmbeddable Databases for Mobile Apps: Stress-Free Solutions with InterBase
Embeddable Databases for Mobile Apps: Stress-Free Solutions with InterBaseEmbarcadero Technologies
 
Rad Server Industry Template - Connected Nurses Station - Setup Document
Rad Server Industry Template - Connected Nurses Station - Setup DocumentRad Server Industry Template - Connected Nurses Station - Setup Document
Rad Server Industry Template - Connected Nurses Station - Setup DocumentEmbarcadero Technologies
 
Move Desktop Apps to the Cloud - RollApp & Embarcadero webinar
Move Desktop Apps to the Cloud - RollApp & Embarcadero webinarMove Desktop Apps to the Cloud - RollApp & Embarcadero webinar
Move Desktop Apps to the Cloud - RollApp & Embarcadero webinarEmbarcadero Technologies
 
Getting Started Building Mobile Applications for iOS and Android
Getting Started Building Mobile Applications for iOS and AndroidGetting Started Building Mobile Applications for iOS and Android
Getting Started Building Mobile Applications for iOS and AndroidEmbarcadero Technologies
 
ER/Studio 2016: Build a Business-Driven Data Architecture
ER/Studio 2016: Build a Business-Driven Data ArchitectureER/Studio 2016: Build a Business-Driven Data Architecture
ER/Studio 2016: Build a Business-Driven Data ArchitectureEmbarcadero Technologies
 
The Secrets of SQL Server: Database Worst Practices
The Secrets of SQL Server: Database Worst PracticesThe Secrets of SQL Server: Database Worst Practices
The Secrets of SQL Server: Database Worst PracticesEmbarcadero Technologies
 
Driving Business Value Through Agile Data Assets
Driving Business Value Through Agile Data AssetsDriving Business Value Through Agile Data Assets
Driving Business Value Through Agile Data AssetsEmbarcadero Technologies
 
Troubleshooting Plan Changes with Query Store in SQL Server 2016
Troubleshooting Plan Changes with Query Store in SQL Server 2016Troubleshooting Plan Changes with Query Store in SQL Server 2016
Troubleshooting Plan Changes with Query Store in SQL Server 2016Embarcadero Technologies
 

More from Embarcadero Technologies (20)

PyTorch for Delphi - Python Data Sciences Libraries.pdf
PyTorch for Delphi - Python Data Sciences Libraries.pdfPyTorch for Delphi - Python Data Sciences Libraries.pdf
PyTorch for Delphi - Python Data Sciences Libraries.pdf
 
Android on Windows 11 - A Developer's Perspective (Windows Subsystem For Andr...
Android on Windows 11 - A Developer's Perspective (Windows Subsystem For Andr...Android on Windows 11 - A Developer's Perspective (Windows Subsystem For Andr...
Android on Windows 11 - A Developer's Perspective (Windows Subsystem For Andr...
 
Linux GUI Applications on Windows Subsystem for Linux
Linux GUI Applications on Windows Subsystem for LinuxLinux GUI Applications on Windows Subsystem for Linux
Linux GUI Applications on Windows Subsystem for Linux
 
Python on Android with Delphi FMX - The Cross Platform GUI Framework
Python on Android with Delphi FMX - The Cross Platform GUI Framework Python on Android with Delphi FMX - The Cross Platform GUI Framework
Python on Android with Delphi FMX - The Cross Platform GUI Framework
 
Introduction to Python GUI development with Delphi for Python - Part 1: Del...
Introduction to Python GUI development with Delphi for Python - Part 1:   Del...Introduction to Python GUI development with Delphi for Python - Part 1:   Del...
Introduction to Python GUI development with Delphi for Python - Part 1: Del...
 
FMXLinux Introduction - Delphi's FireMonkey for Linux
FMXLinux Introduction - Delphi's FireMonkey for LinuxFMXLinux Introduction - Delphi's FireMonkey for Linux
FMXLinux Introduction - Delphi's FireMonkey for Linux
 
Python for Delphi Developers - Part 2
Python for Delphi Developers - Part 2Python for Delphi Developers - Part 2
Python for Delphi Developers - Part 2
 
Python for Delphi Developers - Part 1 Introduction
Python for Delphi Developers - Part 1 IntroductionPython for Delphi Developers - Part 1 Introduction
Python for Delphi Developers - Part 1 Introduction
 
RAD Industrial Automation, Labs, and Instrumentation
RAD Industrial Automation, Labs, and InstrumentationRAD Industrial Automation, Labs, and Instrumentation
RAD Industrial Automation, Labs, and Instrumentation
 
Embeddable Databases for Mobile Apps: Stress-Free Solutions with InterBase
Embeddable Databases for Mobile Apps: Stress-Free Solutions with InterBaseEmbeddable Databases for Mobile Apps: Stress-Free Solutions with InterBase
Embeddable Databases for Mobile Apps: Stress-Free Solutions with InterBase
 
Rad Server Industry Template - Connected Nurses Station - Setup Document
Rad Server Industry Template - Connected Nurses Station - Setup DocumentRad Server Industry Template - Connected Nurses Station - Setup Document
Rad Server Industry Template - Connected Nurses Station - Setup Document
 
TMS Google Mapping Components
TMS Google Mapping ComponentsTMS Google Mapping Components
TMS Google Mapping Components
 
Move Desktop Apps to the Cloud - RollApp & Embarcadero webinar
Move Desktop Apps to the Cloud - RollApp & Embarcadero webinarMove Desktop Apps to the Cloud - RollApp & Embarcadero webinar
Move Desktop Apps to the Cloud - RollApp & Embarcadero webinar
 
Useful C++ Features You Should be Using
Useful C++ Features You Should be UsingUseful C++ Features You Should be Using
Useful C++ Features You Should be Using
 
Getting Started Building Mobile Applications for iOS and Android
Getting Started Building Mobile Applications for iOS and AndroidGetting Started Building Mobile Applications for iOS and Android
Getting Started Building Mobile Applications for iOS and Android
 
Embarcadero RAD server Launch Webinar
Embarcadero RAD server Launch WebinarEmbarcadero RAD server Launch Webinar
Embarcadero RAD server Launch Webinar
 
ER/Studio 2016: Build a Business-Driven Data Architecture
ER/Studio 2016: Build a Business-Driven Data ArchitectureER/Studio 2016: Build a Business-Driven Data Architecture
ER/Studio 2016: Build a Business-Driven Data Architecture
 
The Secrets of SQL Server: Database Worst Practices
The Secrets of SQL Server: Database Worst PracticesThe Secrets of SQL Server: Database Worst Practices
The Secrets of SQL Server: Database Worst Practices
 
Driving Business Value Through Agile Data Assets
Driving Business Value Through Agile Data AssetsDriving Business Value Through Agile Data Assets
Driving Business Value Through Agile Data Assets
 
Troubleshooting Plan Changes with Query Store in SQL Server 2016
Troubleshooting Plan Changes with Query Store in SQL Server 2016Troubleshooting Plan Changes with Query Store in SQL Server 2016
Troubleshooting Plan Changes with Query Store in SQL Server 2016
 

Bill inmon-data-warehousing-2-0-whitepaper

  • 1. White Paper Data Warehousing 2.0 Modeling and Metadata Strategies for Next Generation Architectures By Bill H. Inmon Forest Rim Technology, LLC April 2010 Corporate Headquarters EMEA Headquarters Asia-Pacific Headquarters 100 California Street, 12th Floor York House L7. 313 La Trobe Street San Francisco, California 94111 18 York Road Melbourne VIC 3000 Maidenhead, Berkshire Australia SL6 1SF, United Kingdom
  • 2. Data Warehousing 2.0 – Bill Inmon INTRODUCTION Data warehousing has undergone a constant state of evolution since the beginning. Just when you think that everything has been discovered and developed, data warehousing evolves once again, mutating into a new form and structure. EVOLUTION OF THE DATA WAREHOUSE From the beginning the evolution of data warehousing has been shaped by powerful forces. First there was the need for access to data. Then there was the need for integration. Then came the need for a single version of the truth. Then different departmental needs for looking at the same foundation of data arose. The general evolution of the first stages of the data warehouse environment is shown by Fig 1. edw First there were applications, then there was a data warehouse, then there was an infrastructure surrounding the data warehouse Figure 1 In the beginning there were applications. Applications grew so quickly that there developed what was termed the “spider web environment” where the same data was scattered all over the landscape. The frustration with the spider web environment led to the creation of the first data warehouse. The simple and singular data warehouse addressed many of the problems of the spider web environment. But soon it was discovered that other components of the data warehouse infrastructure were needed. There was a need for ETL, the process that allows data to be read in an unintegrated application format and written out in a corporate, integrated format. Then there ware data marts, where different departments had their own version of the base data found in the data warehouse environment. Soon the ODS appeared where there was a need for high performance, transaction processing on integrated data. An entire infrastructure grew up around the world of the simple data warehouse. An architecture called the “cif”, or the corporate information factory, grew up around the data warehouse. But the evolution of the data warehouse did not stop there. Soon the evolution of the data warehouse grew to include a broader set of requirements. Soon the notion of a data warehouse expanded to include a full set of data fulfilling a newly discovered set of requirements that reached well beyond the original notion of what a data warehouse should be. Embarcadero Technologies, Inc. Page - 1 -
  • 3. Data Warehousing 2.0 – Bill Inmon DW 2.0: ARCHITECTURE FOR THE NEXT GENERATION OF DATA WAREHOUSING This new architecture was called “DW 2.0”. Fig 2 shows the basic description of DW 2.0. DW 2.0 Architecture for the next generation of data warehousing Transaction data Interactive A A A Very p p p current p p p l l l Textual Detailed Continuous subjects snapshot Internal, external S S S S data Simple u u u u pointer b b b b Captured Profile Integrated text j j j j data S u j j b b u u S S b Current++ Text id ...... j Linkage Text to subj Summary Textual Detailed Continuous subjects snapshot Internal, external S S S S data Simple u u u u pointer b b b b Captured Profile Near line text j j j j data S u j j b b u u S S b Less than current Text id ...... j Linkage Text to subj Summary Textual Detailed Continuous subjects snapshot S S S S data Internal, external u u u u Simple b b b b pointer Profile Archival Captured j j j j data S u j b u S j b u S text b j Older Text id ...... Linkage Summary Text to subj Then there was DW 2.0, architecture for the next generation of data warehousing Figure 2 Like the earlier renditions of data warehouse architecture that preceded DW 2.0, DW 2.0 was shaped by powerful evolutionary forces. THE LIFE CYCLE OF DATA The first powerful evolutionary force that shaped DW 2.0 was the recognition that the data within the data warehouse contained its own life cycle. It was not enough to merely place data Embarcadero Technologies, Inc. Page - 2 -
  • 4. Data Warehousing 2.0 – Bill Inmon on disk storage and call it a data warehouse. As time passed that data began to exhibit its own characteristics. The first manifestation of the life cycle of the data within the data warehouse was that over time, the probability of access to data in the data warehouse dropped. The older data became, the less that data was accessed. A second manifestation of the lifecycle of data with the data warehouse was that over time the volumes of data in the data warehouse grew rapidly. This led to a paradox: the larger a data warehouse became and the older the data in the data warehouse, the smaller percentage of data was being used. UNSTRUCTURED DATA The second manifestation of evolutionary forces in the data warehouse was the realization that unstructured, textual data belonged in the data warehouse. The original data that was placed in the data warehouse was transaction based, repetitive data. Many corporate decisions were based on this kind of data. But there is much important data that is not transaction based that belongs in the data warehouse. METADATA A third realization was that metadata belonged in the data warehouse as a formal and integral component. In first generation data warehouses, metadata was an afterthought. But there are powerful reasons why metadata needs to become an integral and formal part of the data warehouse environment. Fig 3 shows the evolutionary forces and their effect on the DW 2.0 environment. DW 2.0 Architecture for the next generation of data warehousing Transaction data Interactive A A A Very p p p current p p p l l l Textual Detailed Continuous subjects Internal, external Simple pointer S S S S u u u u b b b b snapshot data Recognition Captured Profile Integrated Current++ text Text id ...... j j j j data S S S u b j j j b b u u of the need Linkage for a formal A recognition Text to subj Summary metadata of the life cycle Textual subjects Detailed Continuous snapshot infrastructure of data within Near line Internal, external Captured Simple pointer S S S S u u u u b b b b j j j j data Profile S S S j j b b u u the data text data u b Less than current Text id ...... j Linkage warehouse Text to subj Summary Textual Detailed Continuous subjects snapshot S S S S data Internal, external u u u u Simple b b b b pointer Profile Archival Captured j j j j data S S S u j j b b u u text b j Older Text id ...... Linkage Summary Text to subj Recognition of the need to integrate both structured and unstructured data in the data warehouse Figure 3 DW 2.0 has become the architectural paradigm for modern data warehouses. For a detailed description of DW 2.0 refer to the book DW 2.0 ARCHITECTURE FOR THE NEXT GENERATION OF DATA WAREHOUSING, Morgan Kaufman, 2008. Embarcadero Technologies, Inc. Page - 3 -
  • 5. Data Warehousing 2.0 – Bill Inmon Like most large architectures, DW 2.0 is not build all at once. Instead an organization builds first one component of DW 2.0 then another component. Indeed some components of the data warehouse are optional, such as the near line sector. COMPARTMENTS OF PROCESSING In any case, the different components of DW 2.0 consist of sectors that perform one basic function. These sectors are in a sense their own small operating environments. Each sector has its own purpose and its own functionality. Fig 4 shows that each sector is neatly compartmentalized. Interactive Very current Transaction data A A A p p p p p p l l l Integrated Current++ Textual Detailed Continuous subjects snapshot S S S S data Internal, external Simple u u u u pointer b b b b Captured Profile S S S j j j j text data u u u b b b j j j Text id ...... Linkage Summary Text to subj Near line DW 2.0 is made up of Less than current Textual subjects Detailed S S S S Continuous snapshot data different components Internal, external u u u u Simple pointer b b b b Captured Profile S S S j j j j text data u u u b b b j j j Text id ...... Linkage Summary Text to subj Archival Detailed Continuous Textual Older subjects snapshot S S S S data Internal, external u u u u Simple b b b b Captured pointer Profile S S S j j j j text data u u u b b b j j j Text id ...... Linkage Summary Text to subj Figure 4 In light of the individual compartmentalization of the sectors of DW 2.0, the question then becomes – how is work distributed and coordinated across the different components? The answer is that work is distributed and coordinated by means of the passage of metadata from one component to the next. METADATA AS THE GLUE Fig 5 shows that metadata is the “glue” that binds the different operating components of DW 2.0 together. Embarcadero Technologies, Inc. Page - 4 -
  • 6. Data Warehousing 2.0 – Bill Inmon Interactive Transaction Very data current A A A p p p p p p l l l Integrated Current++ Textual Detailed Continuous subjects snapshot S S S S data Internal, external Simple u u u u pointer b b b b Captured Profile S S S j j j j text data u u u b b b j j j Text id ...... Linkage Text to subj Summary There is a need for tight Near line communications Less than current Textual subjects Internal, external Detailed S S S S Continuous snapshot data and tight u u u u coordination Simple pointer b b b b Captured Profile S S S j j j j text data u u u b b b Text id ...... Linkage j j j among the Text to subj Summary different Archival components of Older Textual subjects Detailed S S S S Continuous snapshot data DW 2.0 Internal, external u u u u Simple b b b b Captured pointer Profile S S S j j j j text data u u u b b b j j j Text id ...... Linkage Summary Text to subj Figure 5 In a sense, metadata forms a lattice work that holds the components of DW 2.0 together. It is the passage of metadata that allows the different components of DW 2.0 to work in cooperation and in coordination with each other. Stated differently, without metadata, the different components of DW 2.0 would not be able to achieve a coordinated and cohesive work flow. Consider a symphony orchestra. What kind of music would be created if the violins were playing Beethoven’s Fifth, the cellos were playing “Hotel California”, the drums were playing Aretha Franklin’s “Respect”, the flutes were playing “Happy Birthday”, and the trumpets were playing “Jingle Bells”. The result would be a bunch of noise – nothing that anyone would want to listen to. But now suppose a conductor steps in and gets everyone to play DeBussy’s Le Mer. Now, suddenly the very same components that just a few seconds ago sounded awful are making very beautiful music. What is needed is a conductor, and it is metadata that plays the role of the conductor. Metadata of course does not direct a symphony orchestra. Metadata instead directs the DW 2.0 environment, with all of its different components and different technologies. Fig 6 shows the role of metadata and the role of a conductor. Embarcadero Technologies, Inc. Page - 5 -
  • 7. Data Warehousing 2.0 – Bill Inmon The orchestra with no conductor The orchestra with a conductor Figure 6 So what kind of metadata is used to be passed from one DW 2.0 component to another? In truth there are all sorts of metadata that are passed from one DW 2.0 component to another. DIFFERENT TYPES OF METADATA Typical types of metadata that are passed include: • data descriptions • data definitions • formula and algorithms • the timing of processing • description of assorted volumes of data • operating parameters • encoded values • processed/calculated/derived data • ratios and statistics • status codes, and the like Fig 7 shows the different kinds of metadata that can be passed from one component to the next. (Note: this list is merely a representative sample of the different types of metadata that can be passed from one component to the next.) Embarcadero Technologies, Inc. Page - 6 -
  • 8. Data Warehousing 2.0 – Bill Inmon Interactive Transaction Very data current A A A p p p p p p l l l Detailed Continuous snapshot S S S S data u b j u b j u b j u b j Profile S S S The data that is passed from one data u u u b b b j j j component to the next includes - Summary - data descriptions - parameters Detailed Continuous - processed data S u b S u b S u b S u b snapshot data - ratios and statistics j j j j Profile data S S S u u u b b b j j j - status codes Summary - and so forth Detailed Continuous snapshot S S S S data u u u u b b b b Profile S S S j j j j data u u u b b b j j j Summary Figure 7 The data descriptions that can be passed from one component of DW 2.0 to another often reflect on the data model that exists for each component of DW 2.0. Indeed, there is a data model for each of the different components of the DW 2.0 environment. Embarcadero Technologies, Inc. Page - 7 -
  • 9. Data Warehousing 2.0 – Bill Inmon DIFFERENT DATA MODELS Fig 8 shows the different data models that exist for the different components of DW 2.0. Interactive Transaction Very data current A p A p A p Operational, p p p l l l application data model Detailed S S S S u u u u Continuous snapshot data Integrated, b b b b j j j j Profile data S S S u u u b b b corporate data model j j j Summary Detailed S S S S Continuous snapshot data See integrated, u u u u b b b b j j j j Profile data S S S u u u b b b corporate j j j data model Summary Detailed S S S S Continuous snapshot data Archival data u u u u b b b b j j j j Profile data S S S u u u model b b b j j j Summary There is a data model for each sector of the DW 2.0 environment Figure 8 At the operational level (i.e., the interactive sector of DW 2.0) is an application oriented data model. This data model reflects the transactional nature of the work that is done here. While integration may occur here, it often doesn’t. In addition, this level of data often includes the reception of data from applications that lie outside of it. The second data model that is found is that of the data model for the integrated sector. The integrated sector is the place where corporate integration occurs. The integrated data model is a classical subject oriented, non volatile model. The third data model that (optionally) appears is the data model for the near line sector. If the near line sector is part of the data warehouse environment, then the near line data model is IDENTICAL to the integrated data model. Stated differently, if the near line sector appears at all, then the data model for the near line sector is IDENTICAL, in every way, to the integrated data model. The fourth data model found in the DW 2.0 environment is the data model for the archival environment. The archival data model is the data model that reflects several important aspects of design: • The need to look at and measure data over lengthy periods of time • the need to need to reflect a changing metadata structure over time • The need to store metadata directly in the same physical volume of data as the actual data itself, • The need to store calculation algorithms along with summarized or aggregated data • The need to be able to free data from its software structuring Embarcadero Technologies, Inc. Page - 8 -
  • 10. Data Warehousing 2.0 – Bill Inmon • The need to be able to further normalize data as it is placed in the archival environment, and so forth. There are then some distinctive needs for a data model for each of the different layers processing that occur in the DW 2.0 environment. A COMMON THEME It is of interest that the different data models for the different sectors of DW 2.0 are in fact very different. Having stated that, there never the less is a common theme running through each of the different data models. For example, the integrated data model is undeniably akin to its interactive data model. And the data model found at the archival environment is undeniably related to the integrated data model. The similarities of the different data models to each other are somewhat akin to looking at a grandfather, his daughter, and the child of the daughter. There will be certain facial and body similarities throughout the family. But no one will mistake a grandfather for his daughter, and no one will mistake a daughter for her child. Fig 9 shows that there are definite familial similarities running one data model and the next. Interactive Transaction Very data current A A A p p p p p p l l l Detailed Continuous snapshot S S S S data u u u u b b b b Profile S S S j j j j data u u u b b b j j j Summary Detailed Continuous snapshot S S S S data u u u u b b b b Profile S S S j j j j data u u u b b b j j j Summary Detailed Continuous snapshot S S S S data u u u u b b b b Profile S S S j j j j data u u u b b b j j j Summary There is a great deal of commonality from one type of data model to the next Figure 9 Embarcadero Technologies, Inc. Page - 9 -
  • 11. Data Warehousing 2.0 – Bill Inmon THE TECHNICAL COMMUNITY So who do the metadata and the data model aspects of the DW 2.0 environment help? The first set of people that are helped by the data models and the metadata is the development and maintenance organizations. The technicians of the organization find that data models and metadata become the Bible of development and maintenance. The data models and the metadata become the true “intellectual roadmap” for the ongoing development and maintenance of the data warehouse environment. Stated differently, without metadata and a data model, the technician is like a mechanic working on a Yugo where the Yugo is no longer produced and where the manuals describing the Yugo are either all lost or all in Serbian, where the mechanic only reads and speaks English. Trying to do a repair job on a Yugo under those circumstances is questionable under the best of circumstances. Fig 10 shows the use of the metadata and data models by the technical community. Interactive Transaction Very data current A A A p p p p p p l l l Detailed Continuous snapshot S S S S data u u u u b b b b Profile S S S j j j j data u u u b b b j j j Summary Detailed Continuous snapshot S S S S data u u u u b b b b Profile S S S j j j j data u u u b b b j j j Summary Detailed Continuous snapshot S S S S data u u u u b b b b Profile S S S j j j j data u u u b b b j j j Summary The data model for each component of DW 2.0 becomes the intellectual road map for the building and the maintenance the data structures found in the component Figure 10 THE END USER ANALYTICAL COMMUNITY But there is another audience that is well served by metadata and data models, and that audience is the end user analyst. Consider a newly hired end user analyst that has just received an office right across from you. The end user analyst has just received an MBA and wants to show his/her stuff. The end user Embarcadero Technologies, Inc. Page - 10 -
  • 12. Data Warehousing 2.0 – Bill Inmon analyst starts to build an analysis and becomes perplexed. There are so many types of data, so much similar data, so much data that is of different ages that the analyst does not know where to start. Merely biting off a chunk of data may be very misleading if the analyst doesn’t chose the proper data. And there is a lot to choose from. So the analyst comes to your office and asks –“how do I find my way around this environment? There is a lot to choose from and I don’t want to start with the wrong data.” It is at this point that the metadata and the data models become invaluable. Without the metadata and the data models the end user analyst may wander around the maze of data for a long time. But with the metadata and the data models, the end user analyst can determine what data is where and what data provides the best source for the analysis at hand. Fig 11 shows that metadata and the data models are extremely useful to the analytical community as well as the technical community. Interactive Transaction Very data current A A A p p p p p p l l l Detailed Continuous snapshot S S S S data u u u u b b b b Profile S S S j j j j data u u u b b b j j j Summary Detailed Continuous snapshot S S S S data u u u u b b b b Profile j j S S S j j data u u u b b b j j j Summary Detailed Continuous snapshot Another group of people who are greatly helped S S S S u u u u b b b b data Profile S S S by metadata and data models are the end user j j j j data u u u b b b j j j analysts who have to find what data there is to Summary use as they do their analysis Figure 11 SUMMARY In this paper we have discussed DW 2.0 and the evolution of the architecture surrounding data warehouse. DW 2.0 has different sectors. There is a need for metadata to control the activities in each of these sectors. In addition there is a data model for each of the sectors. Each data model is different and yet there is a common theme running through each data model. The data models and the metadata serve two communities – the technical community and the end user analysis community. For the technical community the metadata and the data models act as an intellectual roadmap. For the end user analytical community, the data models and the Embarcadero Technologies, Inc. Page - 11 -
  • 13. Data Warehousing 2.0 – Bill Inmon metadata serve as a guidepost for where the end user analyst should go to in order to find the proper data for analysis. REFERENCES Inmon, W H - DW 2.0 ARCHITECTURE FOR THE NEXT GENERATION OF DATA WAREHOUSING, Morgan – Kaufman, 2008. ABOUT THE AUTHOR Bill Inmon, “the father of data warehousing”, has written 50 books translated into 9 languages. Bill founded and took public the world’s first ETL software company. Bill has written over 1000 articles and published in most major trade journals. Bill has conducted seminars and spoken at conferences on every continent except Antarctica. Bill holds three software patents. Bill’s latest company is Forest Rim Technology, a company dedicated to the access and integration of unstructured data into the structured world. Bill’s website – inmoncif.com - has attracted over 1,000,000 visitors a month. Bill’s weekly newsletter in b-eye-network.com is one of the most widely read in the industry and goes out to 75,000 subscribers each week. Embarcadero Technologies, Inc. is a leading provider of award-winning tools for application developers and database professionals so they can design systems right, build them faster and run them better, regardless of their platform or programming language. Ninety of the Fortune 100 and an active community of more than three million users worldwide rely on Embarcadero products to increase productivity, reduce costs, simplify change management and compliance and accelerate innovation. The company’s flagship tools include: Embarcadero® RAD Studio, DBArtisan®, Delphi®, ER/Studio®, JBuilder® and Rapid SQL®. Founded in 1993, Embarcadero is headquartered in San Francisco, with offices located around the world. Embarcadero is online at www.embarcadero.com. Embarcadero Technologies, Inc. Page - 12 -