DS-620 Data Visualization
Chapter 4 Summary
Valerii Klymchuk
May 23, 2015
0. EXERCISE 0
4 The Visualization Pipeline.
4.1 Conceptual Perspective
The visualization process can be seen as a pipeline consisting of several stages, each modeled by a specific
data transformation operation. This sequence of data transformations is called a visualization pipeline;
it usually has four stages: data importing, data filtering and enrichment, data mapping, and data rendering.
In full detail, insight maps from the produced images back to the actual questions the user has about the
raw data, which are not necessarily one-to-one with the data itself. The process of gaining insight runs in
the inverse direction of the visualization pipeline itself.
Some applications provide computational steering - a complete round trip in which the user steers data
generated by a process onto a given path by changing its parameters by means of visual feedback. It is
implemented by software systems such as SCIRun, CUMULVS, and CSE.
The various steps of the visualization pipeline correspond to specific sub-functions, each addressing a
specific concern. The concatenation, or composition, of these sub-functions yields the desired visualization
Vis. Domain-specific knowledge is another factor establishing qualitative and quantitative dependencies
between these elements, along with user interaction (tuning the parameters of the model).
4.1.1 Importing Data
Importing data into the visualization process implies mapping the raw information DI to a dataset D ∈ 𝒟.
Here 𝒟 represents the set of all supported datasets of a given visualization process.
The data importing step should try to preserve as much of the available input information as possible, and
make as few assumptions as possible about what is important and what is not. The choices made during data
importing determine the quality of the resulting images, and thus the effectiveness of the visualization.
4.1.2 Data Filtering and Enrichment
Usually, raw data do not directly model the aspects targeted by our questions. Visualization is useful when
the subject of our questions involves more complex features than those directly modeled by the input data.
A process called data filtering or data enriching is used to distill our raw data into more appropriate
representations, also called enriched datasets. It performs two tasks:
• data is filtered to extract relevant information
• data is enriched with higher level information that supports a given task.
See what is relevant. Often we are not interested in the properties of the complete input dataset, but
only in those of a specific subset of interest that is relevant for a given task.
Handle large data. A fundamental problem related to size is the limited output resolution of the typical
computer screens used by visualization applications. One solution used in practice is zooming, i.e.,
subsampling the input image and displaying only a subset that captures the overall characteristics of the
complete dataset. A complementary solution is panning, i.e., selecting a subset of the input image at its
original resolution. Both can be seen as forms of data filtering.
Ease of use - a convenience. Datasets are usually transformed from one form to another during the
visualization process, so that they fit the data representation required by the processing operations we
want to apply.
4.1.3 Mapping Data
The filtering operation produces an enriched dataset that directly represents the features of interest for a
specific exploration task. Mapping is associating elements of the visual domain with the data elements
present in the enriched dataset. Map : D → DV . The visual domain DV is a multidimensional space whose
axes, or dimensions, are those elements that we perceive as quasi-independent visual attributes like: shape,
position, size, color, texture, shading, and motion. Hence, a visual feature is a colored, shaded, textured,
and animated 2D or 3D shape. For the height-plot example, we mapped the actual dataset extent to the
xy-coordinates of a polygonal surface, and the height attribute of the dataset points to the z-coordinate. The
3D coordinates of the polygonal surface are the visual features that encode our dataset extent and height-scalar
attribute.
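As a concrete illustration, the height-plot mapping can be sketched in a few lines of code. This is a minimal sketch with hypothetical types (Sample, Vertex3D), not code from the book:

```cpp
#include <vector>

// A sample point of a 2D scalar dataset: grid position plus a height attribute.
struct Sample   { float x, y, height; };
// A 3D vertex of the polygonal surface produced by the mapping step.
struct Vertex3D { float x, y, z; };

// Height-plot mapping: the dataset extent becomes the xy-coordinates of the
// surface, and the scalar attribute becomes the z-coordinate.
std::vector<Vertex3D> mapHeightPlot(const std::vector<Sample>& data) {
    std::vector<Vertex3D> scene;
    scene.reserve(data.size());
    for (const Sample& s : data)
        scene.push_back({s.x, s.y, s.height});
    return scene;
}
```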
Reasons for splitting mapping and rendering:
• Purpose: mapping encodes explicit design decisions about what, and how, we want to visualize. Mapping
specifies those visual attributes that encode actual data, whereas rendering specifies the remaining
visual attributes that users can tune to their taste to examine the 3D scene (e.g., lighting).
• Modularity: separating the two operations modularizes the visualization pipeline and favors a clean design
based on separation of concerns and software reuse. It allows a given visualization pipeline to use
different back-end renderers, such as OpenGL, DirectX, or render-to-file formats.
Desirable mapping properties.
Data mapping targets the specific visualization task of making the invisible and multi-dimensional data
visible and low-dimensional, respectively. To do this, the data mapping function Map should try to satisfy
several desirable properties. Map should preferably be injective. That is, different values x1 ≠ x2 should
be mapped to different visual attribute values Map(x1) ≠ Map(x2) in the visual feature dataset.
Inverting the mapping
Having Map invertible from a purely mathematical point of view is sometimes not enough. We must
know how to do the inversion, and be able to do it mentally, when we look at the pictures. For this we must
know how the visual attributes used in the mapping (color, shape, icon size, orientation, position, texture)
relate to the data attributes of interest. Most visualizations that map numerical attributes to color display
a color legend, which explains how colors correspond to values and assists users in the mental color-to-value
inverse mapping. Other widely used conventions include the orientation of cartographic maps, where north
is on top, and the specific color maps used to indicate relief forms on these maps: blue for water, green for
fields, light brown for medium heights, dark brown for mountains, and white for peaks; similar conventions
exist for traffic signs.
Distance preservation. A strong and useful property in visualization applications is that the function
Map tries to preserve distances when mapping from the data to the visual domain. The simplest way to do
this is to use a direct proportionality relationship between the two. This is useful when we are interested in
visually comparing relative values rather than assessing absolute attribute values. In practice, visualization
applications often use linear mapping functions to map numerical attributes to height, position, luminance,
or hue.
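For instance, a linear luminance mapping fits in a few lines. A minimal sketch (the function name and range conventions are assumptions, not the book's API):

```cpp
#include <algorithm>

// Linear (distance-preserving) mapping of a numerical attribute value in
// [minVal, maxVal] to a luminance in [0, 1]: equal attribute differences map
// to equal luminance differences, which supports relative comparisons.
float mapToLuminance(float value, float minVal, float maxVal) {
    float t = (value - minVal) / (maxVal - minVal);
    return std::clamp(t, 0.0f, 1.0f);  // guard against out-of-range samples
}
```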
Mapping functions used in visualizations where data is to be measured are sometimes called measure-
ment mappings. Those measurement mapping functions fulfill the representation condition: mapping
maps entities into numbers and empirical relations into numerical relations in such a way that the empirical
relations preserve and are preserved by the numerical relations. In practice, many Map functions are not
invertible over their entire domain, and also do not preserve distances. Evaluation of the effectiveness of a
mapping function can only be made with respect to a concrete application domain, task, and user group.
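For an ordered attribute, the representation condition can be stated compactly. The following formalization is ours (the symbol ⪯ for the perceived order of visual attribute values is an assumption, not the book's notation):

```latex
% Representation condition for an ordered data attribute: the empirical
% (data) order and the perceived (visual) order must agree, in both directions.
x_1 \le x_2 \iff \mathrm{Map}(x_1) \preceq \mathrm{Map}(x_2)
```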
Organization levels characterize the types of operations that one can visually perform with ease on such
variables. A visual variable v ∈ DV is said to be:
• Associative if v allows a categorical attribute mapped by v to be perceived independently of the
presence of other visual variables in the same image. For instance, shape is associative, since we can
easily distinguish different shapes even when they are colored or positioned differently. A variable that
is not associative is called dissociative;
• Selective if v allows a categorical value mapped by v to be (nearly) instantaneously perceived as
different from other values mapped by v;
• Ordinal if v allows one to pre-attentively compare different values mapped by v. For example, size
is ordinal, since we can easily see if two objects have the same size or if one object is larger than the
other one;
• Quantitative if v allows one to visually compute ratios between different values mapped by v; size is,
again, an example. The following table lists the most common visual variables and their so-called
organization levels.
Visual variable   Quantitative   Ordinal   Selective   Associative/Dissociative
Position          yes            yes       yes         A
Size              yes            yes       yes         D
Brightness                       yes       yes         D
Texture                          yes       yes         A
Color (hue)                                yes         A
Orientation                                yes         A
Shape                                                  A
The main value of this classification is to make designers aware of the inherent limitations that visual
variables have in well-designed visualizations.
The main task of visualization is to derive information, i.e., useful facts that lead to conclusions about a
certain problem, from data (recorded signal samples on a grid). The mapping function should allow retrieval
of information, and not just raw data, from the produced images.
Effective mapping is at the core of designing successful visualizations. There are many more aspects that
contribute to a successful design: the choice of visual encodings as a function of the display medium; the
accepted conventions of the target user group; knowledge of perceptual, cognitive, and human vision factors;
and the aesthetic principles driving the overall visualization design.
4.1.4 Rendering Data
The rendering operation is the final step of the visualization process. Rendering takes the 3D scene created
by the mapping operation, together with several viewing parameters such as viewpoint and lighting, and
renders it to produce the desired images: Render : DV → I. Considering the viewing parameters to be part
of the rendering operation allows users to “cheaply” render and examine the 3D scene anew from any
viewpoint without having to recompute the mapping operation.
4.2 Implementation Perspective
We can describe the visualization pipeline as a composition of functions:
Vis = F1 ◦ F2 ◦ ... ◦ Fn, where Fi : 𝒟 → 𝒟.
The various functions Fi perform the data rendering, mapping, filtering, and importing. The input of Fn is
the application’s raw data, and the output of F1 is the final image. We first choose the right operations Fi;
second, we choose the right dataset implementations Dj ∈ 𝒟 to connect the pipeline functions Fi, which
can be implemented as classes having three properties:
• They read one or more input datasets D_i^inp.
• They write one or more output datasets D_j^out.
• They have an execute() operation that computes D_j^out given D_i^inp.
The setInput() and getOutput() methods are simple accessors to the input and output datasets D_i^inp and
D_j^out, respectively. The accessors simply store references to the input and output datasets in the local
inputs and outputs vectors of the F class, as in the sketch below.
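A minimal C++ sketch of such an operator class; the Dataset type is a placeholder, and the bounds handling is an assumption of ours:

```cpp
#include <vector>

struct Dataset { /* grid, cells, attributes, ... */ };

// Base class for a pipeline operation F: reads input datasets, writes output
// datasets, and computes the latter from the former in execute().
// Subclasses are assumed to allocate their output datasets on construction.
class F {
public:
    virtual ~F() = default;

    // Simple accessors: store references to the input/output datasets in the
    // local inputs and outputs vectors.
    void setInput(int i, Dataset* d) { grow(inputs, i); inputs[i] = d; }
    Dataset* getOutput(int j) const  { return outputs[j]; }

    // Computes the output datasets given the input datasets.
    virtual void execute() = 0;

protected:
    std::vector<Dataset*> inputs;
    std::vector<Dataset*> outputs;

private:
    static void grow(std::vector<Dataset*>& v, int i) {
        if ((int)v.size() <= i) v.resize(i + 1, nullptr);
    }
};
```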
The sequence of operation executions in this application model follows the “flow” of data from the
importing operation to the final rendering operation. For this reason, this design is often called a dataflow
application model.
Several professional visualization frameworks implement these features. The Visualization Toolkit
(VTK) stands out as a framework based on the dataflow model. The same architectural and design principles
are used in the Insight Toolkit (ITK). While VTK addresses general-purpose data visualization, ITK
focuses on the more specific field of image segmentation, processing, and registration. ITK can handle
multidimensional images and offers algorithms for thresholding, edge detection, smoothing, de-noising,
distance computations, segmentation, and registration.
A visualization application can be implemented as a network of operation objects that have dataset
objects as inputs and outputs. To execute the application, the operations are invoked in the dataflow order,
starting with the data importing and ending with the rendering.
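Using the F class sketched above, wiring and running a small linear network could look as follows. Importer, Filter, Mapper, and Renderer are hypothetical subclasses, each assumed to allocate its output dataset on construction:

```cpp
// One concrete operator; the other stages would look analogous.
class Importer : public F {
public:
    Importer() { outputs.push_back(new Dataset); }
    void execute() override { /* read raw data into *outputs[0] */ }
};

// Invoke an import -> filter -> map -> render chain in dataflow order.
void runPipeline(F& importer, F& filter, F& mapper, F& renderer) {
    // Connect each operator's output to the next operator's input.
    filter.setInput(0, importer.getOutput(0));
    mapper.setInput(0, filter.getOutput(0));
    renderer.setInput(0, mapper.getOutput(0));

    // Execute in dataflow order: importing first, rendering last.
    importer.execute();
    filter.execute();
    mapper.execute();
    renderer.execute();
}
```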
Visual application building is implemented by several visual programming environments, where the
user constructs the dataflow application network by assembling iconic representations of the visualization
operations. Graphical user interfaces (GUIs) are provided by the environment to let users control parameters
of the various operations to achieve interactive data exploration.
Applications such as VISSION, MeVisLab, and ParaView use the VTK library and its machinery. ParaView
features a more beginner-friendly end-user interface. The main attraction of visual programming environments
is that they allow rapid prototyping of visualization applications by users who have no programming
skills.
4.3 Algorithm Classification
Visualization algorithms are specific visualization techniques. Most existing classifications are based on the
way the visualization techniques interact with each other as parts of the same visualization pipeline, and on
the types of attributes these techniques work with. We talk about scalar, vector, and tensor visualization
methods, color visualization (rendering) methods, and image processing methods, while non-numeric attribute
types are covered by information visualization (infovis) methods.
Domain modeling methods - visualization methods that deal with the underlying sampling domain
representation rather than with attributes. Examples: grid warping (change the location of the sample
points), cutting and selection (extract a subset of a sampling domain as a separate dataset), resampling
(change the cells and/or the basis functions to reconstruct the data).
Alternative structural classification groups visualization techniques by the type of dataset ingredient
they change: geometric techniques (alter geometry, or locations, of sample points), topological techniques
(alter the grid cells), attribute techniques (alter the attributes only), and combined techniques (alter several
of a dataset’s ingredients). Yet another type of classification is based on the following dimensions:
• Task: What is the task to be completed?
• Audience: Who are the users?
• Target: What is the data to visualize?
• Medium: What is the rendering (drawing) support?
• Representation: What are the graphical attributes (shapes, colors, textures) used?
4.4 Conclusion
There is no clear-cut separation of the visualization stages of data importing, filtering, mapping, and ren-
dering. Actual applications can separate and structure the pipeline in different ways, depending on design
and implementation considerations that go beyond the topic of this general discussion.
From an implementation point of view, the elements that are assembled to form the pipeline should meet
the usual requirements of software components: modularity, reusability, simplicity, extensibility, minimality,
and generality, which is a daunting task.
The effectiveness of a given visualization is critically determined by the mapping function. It should be
invertible, so we can grasp the data properties by looking at their visual mapping, and unambiguous, so
we have no doubts about what we see. Aesthetics is also essential to a good visualization, as its users must
be attracted to spend the effort to study and work with it.
What have you learned in this chapter?
This chapter presented the structure of a complete visualization application. It introduced the four main
ingredients of such an application: data importing, data filtering and enrichment, data mapping, and data
rendering, where the visualization process is seen as a composition of functions. The chapter touched upon
several implementation considerations of this conceptual structure and discussed classifications of the various
algorithms used in the visualization process. The visualization pipeline is described from both conceptual
and implementation points of view, and the VTK and ITK toolkits are introduced.
What surprised you the most?
I was surprised to find out about so many visualization programming environments, as well as to learn
about multiple classifications of visualization methods. It is also surprising to realize that there is no clear-cut
separation of the visualization stages of data importing, filtering, mapping, and rendering in real-world
applications.
What applications not mentioned in the book you could imagine for the techniques ex-
plained in this chapter?
I can imagine a visual application builder which is fully compatible with most textual programming
languages, to provide the best combination of both. This would allow the fast prototyping and advanced data
manipulation needed for most real-world applications. I can also imagine applying the dataflow application
model to the analysis of big data.
1. EXERCISE 1
The visualization pipeline offers an intuitive architectural model to design complex data processing and/or
data-visualization applications by combining lower-level functionality in a so-called dataflow graph. Think
of two visualization applications related to your own professional or daily experience. Describe these appli-
cations in terms of a dataflow graph. For each graph node, explain what the functionality of the respective
node is, and also the kinds of datasets it reads and writes. Try to be as specific as possible.
Let’s consider visualizing heart rate data. This visualization application can be described as a
dataflow graph with the following nodes:
• Import
The function of this node is to import raw data into the pipeline from an external source (csv files). The
purpose of this node is to sample the continuous signal coming from the device, to translate the data storage
format (csv files) into a desired discrete domain dataset (time series), and to resample data from one
resolution/grid type to another (long-term or short-term analysis). It reads input data from external storage -
files or a database - and writes the imported dataset into the memory of the computer, making it available
for later processing.
• Filter
The function of this node is to distill the imported raw data (input dataset) into an appropriate representation
- an enriched dataset that encodes the features of interest. The imported data gets filtered to extract relevant
information. The data is enriched with higher-level information (it gets labeled with a confidence score based
on levels of noise, activity levels, and other heart rate laws); based on that score and the data itself, other
characteristics of interest are estimated to support a given task (creating a vital sign classification and
monitoring dashboard). It reads the time series from the computer’s main memory, manipulates them, and
writes the enriched dataset back for the next visualization step.
• Map
The function of this step is to take the features of interest from the enriched dataset and to associate them
with the elements of the visual domain (visual attributes, such as shape, color, size, position, texture, shading,
etc.). This step reads the enriched dataset from the computer’s main memory, creates an appropriate grid,
and links to it the visual attributes that depict the features of interest; then it creates and writes a 2D/3D
scene dataset back into the computer’s memory to be used in the rendering step.
• Render
The function of this step is to simulate the physical process of lighting a visible 2D/3D scene. The color of
the plot, the viewpoint, and the lighting parameters do not encode actual data, so the user can tune them
while examining and navigating the 3D scene. The rendering operation renders the scene to produce the
desired images. This function reads the scene dataset from memory and produces the final image; it has no
output other than the image itself.
2. EXERCISE 2
The visualization pipeline is often implemented in software as a set of data-processing modules that
are connected in a directed graph (the dataflow graph). Here, each graph node is such a module, and each
(directed) edge is the connection of a module’s output to another module’s input. Can you imagine such a
graph that would contain loops (cycles)? If so, sketch a conceptual visualization application represented
by such a graph, and explain why a loop would be useful. If not, explain which problems would occur if
loops were present in the dataflow graph.
If the application graph is acyclic, the execution is equivalent to calling the execute() method of all
operations Fi in the order of the topological sorting of the graph. This ensures that an operation is executed
only when all its inputs are available and up-to-date. Cyclic application graphs can also be accommodated but
require more complex update mechanisms, for which reason they are less used in practice. Cyclic application
graphs might need advanced reference counting to ensure that inputs and outputs are compatible with
dataset-operations at each step.
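A sketch of this execution scheme for the acyclic case, using Kahn's topological-sort algorithm over the dataflow graph; the graph representation and the F class from the summary above are assumptions of ours:

```cpp
#include <queue>
#include <vector>

// Execute an acyclic dataflow graph in topological order. succ[u] lists the
// operations that consume the output of ops[u] (the directed edges u -> v).
void executeAcyclic(std::vector<F*>& ops,
                    const std::vector<std::vector<int>>& succ) {
    std::vector<int> indegree(ops.size(), 0);
    for (const auto& edges : succ)
        for (int v : edges) ++indegree[v];

    std::queue<int> ready;                    // operations with all inputs ready
    for (int u = 0; u < (int)ops.size(); ++u)
        if (indegree[u] == 0) ready.push(u);  // importers have no inputs

    while (!ready.empty()) {
        int u = ready.front(); ready.pop();
        ops[u]->execute();                    // inputs are available and up-to-date
        for (int v : succ[u])
            if (--indegree[v] == 0) ready.push(v);
    }
}
```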
Figure 1: Example of data flow graph with loops.
Let’s assume that node xin imports data; then node x1 filters and cleans the data with clustering or
classification based on basic data integrity rules; then it passes the improved dataset to node x4, which
performs another type of clustering and classification and perhaps maps the data to the features of interest.
The output of x4 might affect the dataset in a way that alters data integrity, so it needs to be passed back
to x1 for one more stage of cleaning. The data flows in the loop between x1 and x4 until the integrity of
x4’s output meets certain criteria to be considered final. After that, the scene produced by x4 is rendered
by node xout, as sketched below.
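A sketch of how this loop could be driven; integrityScore and threshold are hypothetical stand-ins for the acceptance criterion, and the nodes follow the operator interface sketched earlier:

```cpp
// Iterate cleaning (x1) and clustering/mapping (x4) until the integrity of
// x4's output meets the acceptance criterion, then render via xout.
xin.execute();
x1.setInput(0, xin.getOutput(0));
do {
    x1.execute();                      // clean / re-clean the dataset
    x4.setInput(0, x1.getOutput(0));
    x4.execute();                      // cluster, classify, map
    x1.setInput(0, x4.getOutput(0));   // feed the result back for more cleaning
} while (integrityScore(*x4.getOutput(0)) < threshold);
xout.setInput(0, x4.getOutput(0));
xout.execute();                        // render the final scene
```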
3. EXERCISE 3
The dataflow model used for constructing visualization applications is often supported by so-called visual
builders, where users can interactively construct a visualization application by placing modules on a canvas
and connecting their inputs and outputs to form a dataflow graph. Describe another application domain
(apart from data visualization) where you know that, or imagine that, this kind of visual programming would
be an effective paradigm. For that domain, give a few examples of modules by describing their functionality,
inputs, and outputs.
I can think of a real-time sound control system. The sound signal from a musical instrument (or microphone)
gets imported and sampled by a computer; then it is analyzed and filtered by an algorithm to get rid of noise
and perhaps some “false” notes; then a global control unit enriches the signal by replacing the “faulty” notes
with proper harmonics and adds another soundtrack on top of it, synchronizing the music with the singer’s
voice. The combined signal can now be amplified and is ready to be sent to the output speakers for the
audience to enjoy.
4. EXERCISE 4
Visual application builders (VABs) offer an alternative to classical textual programming (CTP) for
constructing dataflow applications such as those present in visualization contexts. However, there are also
contexts in which VABs are less effective and/or efficient to use than CTP. Consider the VAB examples
illustrated in Chapter 4. Based on this information and/or your concrete experience with a VAB:
• Enumerate four advantages of VAB vs CTP
• Enumerate four advantages of CTP vs VAB
• Present a possible system design which would combine the advantages of VAB and CTP while limiting
their separate disadvantages.
Hints: First, consider the tasks that a programmer or end-user would like to accomplish by using both
application-building paradigms.
Advantages of VAB vs CTP:
• Visual programming environments provide a way to rapidly prototype and develop a program by users
who have no programming experience, which speeds up the development cycle.
• They provide simpler and more intuitive application construction mechanisms that visualize program
flow (blocks and wires instead of lines of code); visual tools can be more useful for non-programmers
to set up some logic.
• They make it easy to manage different things going on in parallel.
• They grant users interactive control over the parameters of the operations, allow interactive data
exploration, are easy to use and learn, and offer a more beginner-friendly end-user interface.
• They are great for data acquisition and signal processing with some minor manipulations in between,
though not so great for experimental programming. They allow a computer program to be built through
an interface, as opposed to writing programming code manually.
Advantages of CTP vs VAB:
• In visual programming you can’t have more than about 50 visual primitives on the screen at the same
time. Text is more concise and takes less space; graphical elements take up valuable screen space, and
it might be difficult to understand the method.
• Programming complicated algorithms is easier with CTP and more difficult with VAB, since with CTP
you have absolute control over the textual code.
• Many real-world applications have hundreds of operations and contain intricate control flow that cannot
be easily modeled using the dataflow paradigm, and they need complex custom code.
• The structure of an application rarely changes once it is in the development stage, so visual programming
environments are less suited for the creation of final applications.
Let’s consider a hypothetical system which uses both visual and textual programming. Custom libraries
can be created in textual languages like Python or C++ to implement sophisticated algorithms that improve
the quality of datasets and provide the required data manipulations. Those libraries and methods can be
invoked at each stage of the application’s data pipeline, and can also run in parallel if needed. The pipeline
as an application can be prototyped and modeled in a visual application builder.
5. EXERCISE 5
Visualization techniques and tools can be classified using the five-element model of Marcus et al. (task,
audience, target, medium, and representation) described in Section 4.3, Chapter 4. Give two examples of
visualization applications of your choice, and explain, for each example, which are the five elements of the
above model.
Visualization of human brain tissue from CT-scan data.
• Task: visualizing digital images of brain scans to support the diagnosis and treatment of brain injuries.
• Audience: health care practitioners, patients who had brain injuries.
• Target: data coming from X-rays, MRI, or CT scans.
• Medium: rendering supports picturing the structure of a human brain by concentrating on various
features of interest: tissue type, tissue density, liquid flow, etc.
• Representation: various graphical attributes are used, such as color to represent tissue type, textures
to depict tissue density, and shapes to depict fiber cells.
Visualization of vital sign data from a portable monitoring device.
• Task: visualizing a digital signal containing the vital signs of a person, in support of a classification and
monitoring dashboard.
• Audience: health care practitioners, insurance companies, patients who monitor their health.
• Target: csv data coming from a portable heart beat monitor placed on a person’s wrist.
• Medium: rendering supports a picture containing various features of interest: the current mode/regime
of the heart beat, the class of heart performance, estimated health risks against possible complications,
and historical trends.
• Representation: various graphical attributes are used, such as color to represent class categories, textures
to depict the measure of deviation from the norm, and shapes to depict indicators of risks.
6. EXERCISE 6
Visual programming is a useful tool for quick prototyping of relatively simple visualization applications.
However, building a large and complex application whose dataflow graph consists of hundreds of modules, and
where each module has many inputs and outputs, can be challenging in terms of the manual effort required
to find the right modules, place them at good positions on the canvas, and connect the right inputs and
outputs. If you were the designer of the next-generation visual programming tool, propose three functions
you would add to a visual-programming tool in order to speed up the building process. Hints: Think where
the bottlenecks are for a beginner user in terms of the operations needed to construct the right dataflow
graph. Think also about the repetitive actions an advanced user needs to do.
Three functions:
• Access control: datasets should only be accessible when they contain appropriate (updated) data.
• Direct linkage with other languages such as Python, R, and C/C++, to allow advanced algorithms to
work with the datasets.
• A time dimension for debugging.
7. EXERCISE 7
Consider the conceptual data visualization pipeline. Here, data read from an input source is transformed
by various filters, next it is mapped to geometric primitives, which are finally rendered on the screen.
Consider now that the user is interested in selecting any visible element in the final image, e.g., a polygon or
vertex, and asking the visualization system “From which raw data elements has this element come? And via
which operations?” Your visualization system is implemented based on the operator-dataset model outlined
in Figure 4.5 (also shown below). That is, the pipeline consists of a sequence of computational functions, or
operators, that read, respectively write, dataset objects. How would you implement the above ‘back-tracing’
functionality in such a system?
Figure 2: Visualization pipeline as a directed graph of datasets and operators.
Hints: Start from the end towards the beginning. Any visible element that the user can select is, in
essence, a geometric primitive coming from the last dataset object that the visualization pipeline produces
and feeds to the rendering operator. Think of how you can augment the dataset representation with
back-tracing information that encodes, at a low level (cells and vertices), both the origins of these data
elements and the operations they were generated by.
We can implement the back-tracing functionality by forcing each of the operators/functions to attach a tag
to the data output they produce. The tag must contain references to the original (raw) data elements altered
by the operator, as well as the list of operations performed by every operator so far. At the end of the
mapping stage we will have a dataset that allows us to trace any visible element back, through all the
operations, to the raw data from which this element comes.
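One possible realization of such tags is sketched below; the types and the merging policy are assumptions of ours, not the book's:

```cpp
#include <string>
#include <vector>

// Provenance tag carried by every cell/vertex of a dataset: the IDs of the raw
// data elements it derives from, and the chain of operators applied so far.
struct ProvenanceTag {
    std::vector<int>         rawElementIds;  // origins in the raw dataset
    std::vector<std::string> operatorChain;  // e.g. {"import", "filter", "map"}
};

// Each operator builds the tag of an output element from the tags of the input
// elements it consumed, appending its own name to the operator chain.
ProvenanceTag propagate(const std::vector<ProvenanceTag>& sources,
                        const std::string& opName) {
    ProvenanceTag out;
    for (const ProvenanceTag& s : sources)
        out.rawElementIds.insert(out.rawElementIds.end(),
                                 s.rawElementIds.begin(),
                                 s.rawElementIds.end());
    if (!sources.empty())                 // simplistic merge: keep the first chain
        out.operatorChain = sources.front().operatorChain;
    out.operatorChain.push_back(opName);
    return out;
}
// Answering the user's query then reduces to reading the tag of the selected
// primitive: rawElementIds gives the origins, operatorChain the operations.
```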

More Related Content

More from Valerii Klymchuk

04 Classification in Data Mining
04 Classification in Data Mining04 Classification in Data Mining
04 Classification in Data MiningValerii Klymchuk
 
Crime Analysis based on Historical and Transportation Data
Crime Analysis based on Historical and Transportation DataCrime Analysis based on Historical and Transportation Data
Crime Analysis based on Historical and Transportation DataValerii Klymchuk
 
Artificial Intelligence for Automated Decision Support Project
Artificial Intelligence for Automated Decision Support ProjectArtificial Intelligence for Automated Decision Support Project
Artificial Intelligence for Automated Decision Support ProjectValerii Klymchuk
 

More from Valerii Klymchuk (9)

04 Classification in Data Mining
04 Classification in Data Mining04 Classification in Data Mining
04 Classification in Data Mining
 
03 Data Representation
03 Data Representation03 Data Representation
03 Data Representation
 
05 Scalar Visualization
05 Scalar Visualization05 Scalar Visualization
05 Scalar Visualization
 
06 Vector Visualization
06 Vector Visualization06 Vector Visualization
06 Vector Visualization
 
07 Tensor Visualization
07 Tensor Visualization07 Tensor Visualization
07 Tensor Visualization
 
Crime Analysis based on Historical and Transportation Data
Crime Analysis based on Historical and Transportation DataCrime Analysis based on Historical and Transportation Data
Crime Analysis based on Historical and Transportation Data
 
Artificial Intelligence for Automated Decision Support Project
Artificial Intelligence for Automated Decision Support ProjectArtificial Intelligence for Automated Decision Support Project
Artificial Intelligence for Automated Decision Support Project
 
Data Warehouse Project
Data Warehouse ProjectData Warehouse Project
Data Warehouse Project
 
Database Project
Database ProjectDatabase Project
Database Project
 

Recently uploaded

RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
While-For-loop in python used in college
While-For-loop in python used in collegeWhile-For-loop in python used in college
While-For-loop in python used in collegessuser7a7cd61
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.pptamreenkhanum0307
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 

Recently uploaded (20)

RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
While-For-loop in python used in college
While-For-loop in python used in collegeWhile-For-loop in python used in college
While-For-loop in python used in college
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.ppt
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 

04 The Visualization Pipeline

  • 1. DS-620 Data Visualization Chapter 4 Summary Valerii Klymchuk May 23, 2015 0. EXERCISE 0 4 The Visualization Pipeline. 4.1 Conceptual Perspective The visualization process can be seen as a pipeline consisting of several stages, each modeled by a specific data transformation operation. This sequence of data transformations is called a visualization pipeline, it usually has four stages: data importing, data filtering and enrichment, data mapping, and data rendering. In full detail, the insight maps from the produced images to the actual questions the user has about the raw data, which are not necessarily one-to-one with the data itself. The process of getting insight goes in an inverse direction to the visualization pipeline itself. Some applications provide computational steering - complete round trip of steering data generated by a process onto a given path by changing its parameters by means of visual feedback. It is implemented by software applications such as SciRUN, CUMULVS and CSE. The various steps of the visualization pipeline correspond to a specific sub-functions, each taking a specific concern. The concatenation, or composition, of these sub-functions yields the desired visualization V is. Domain specific knowledge is another factor establishing qualitative and quantitative dependencies between these elements along with user interaction (tuning the parameters of the model). 4.1.1 Importing Data Importing data into the visualization process implies mapping the raw information DI to a dataset D ∈ D. Here D represents the set of all supported datasets of a given visualization process. Data importing step should try to preserve as much of the available input information as possible, and make as few assumptions as possible about what is important and what is not. The choices made during data importing determine the quality of the resulting images, and thus the effectiveness of a the visualization. 4.1.2 Data Filtering and Enrichment Usually, raw data do not model directly the aspects targeted by our questions. Visualization is useful when the subject of our questions involves more complex features that directly modeled by the input data. A process called data filtering or data enriching is used to distill our raw data into more appropriate representations, also called enriched datasets. It performs two tasks: • data is filtered to extract relevant information • data is enriched with higher level information that supports a given task. See what is relevant. Often we are not interested in the properties of the complete input dataset, but only in those of a specific subset of interest that is relevant for a given task. Handle large data. A fundamental problem related to size is the limited output resolution of the typical computer screens used by visualization applications. One solution used in practice is zooming, i.e., subsam- pling the input image and displaying only a subset that captures the overall characteristics of the complete 1
  • 2. dataset. A complimentary solution is panning, i.e., selecting a subset of the input image at its original resolution, can also be seen as a form of data filtering. Ease of use - a convenience. Datasets are usually transformed from one form to the other during the visualization process, such that they fit the data representation required by the processing operations we want to apply. 4.1.3 Mapping Data The filtering operation produces an enriched dataset that directly represents the features of interest for a specific exploration task. Mapping is associating elements of the visual domain with the data elements present in the enriched dataset. Map : D → DV . The visual domain DV is a multidimensional space whose axes, or dimensions, are those elements that we perceive as quasi-independent visual attributes like: shape, position, size, color, texture, shading, and motion. Hence, a visual feature is a colored, shaded, textured, and animated 2D or 3D shape. For height plot example we mapped the actual dataset extent to the xy- coordinates of a polygonal surface, and the height attribute of the dataset points to the z-coordinate. The 3D coordinates of the polygonal surface are the visual features that encode our dataset extent and height-scalar attribute. Reasons for splitting mapping and rendering: • Purpose: mapping encodes explicit design decisions about what, and how* we want to visualize. Map- ping specifies those visual attributes that encode actual data, whereas rendering specifies the remaining visual attributes that users can tune to their taste to examine 3D scene (e.g., lighting). • Modularity: Separating two operations modulizes the visualization pipeline and favors a clean design based on separation of concerns and software reuse. It allows a given visualization pipeline to use different back-end renderers, such as OpenGL, DirectX, or render-to-file formats. Desirable mapping properties. Data mapping targets the specific visualization task of making the invisible and multi-dimensional data visible and low-dimensional, respectively. To do this, the data mapping function Map should try to satisfy several desirable properties. Map should preferably be injective. That is, different values x1 = x2 should be mapped to different visual attribute values Map(x1) = Map(x2) in the visual feature dataset. Inverting the mapping Having Map invertible from a purely mathematical point of view is sometimes not enough. We must know how, and be able to do the inversion mentally when we look at the pictures. For this we must know how visual attributes: color, shape, icon size orientation; position, texture used in mapping relate to data attributes of interest. Most visualizations that map numerical attributes to color display a color legend, which explains how colors correspond to values and assists users in the mental color-to-value inverse mapping. Other widely used conventions include orientations of cartographic maps, where north is on top, and the specific color maps used to indicate relief forms on these maps: blue - for water, green for fields, light brown - for medium heights, dark brown - for mountains, white - for peaks, and similarly - for traffic signs. Distance preservation. A strong and useful property used in visualization applications, is that the function Map tries to preserve distances when mapping from the data to the visual domain. The simplest way to do this is to use a direct proportionality relationship between the two. 
This is useful when we are interested in visually comparing relative values rather than assessing absolute attribute values. In practice, visualization applications often use linear mapping functions to map numerical attribute to height, position, luminance or hue. Mapping functions used in visualizations where data is to be measured are sometimes called measure- ment mappings. Those measurement mapping functions fulfill the representation condition: mapping maps entities into numbers and empirical relations into numerical relations in such a way that the empirical relations preserve and are preserved by the numerical relations. In practice, many Map functions are not invertible over their entire domain, and also do not preserve distances. Evaluation of the effectiveness of a mapping function can only be made with respect to a concrete application domain, task, and user group. Organization levels characterize the types of operations that one can visually perform with ease on such variables. A visual variable v ∈ DV is said to be: 2
  • 3. • Associative if v allows a categorical attribute mapped by v to be perceived independently on the presence of other visual variables in the same image. For instance, shape is associative, since we can easily distinguish different shapes even when colored or positioned differently. A variable that is not associative is called dissociative; • Selective if v allows a categorical value mapped by v to be (nearly) instantaneously perceived as different from other values mapped by v; • Ordinal if v allows one to pre-attentively compare different values mapped by v. For example, size is ordinal, since we can easily see if two objects have the same size or if one object is larger than the other one; • Quantitative if v allows one to visually compute ratios between different values mapped by v. For ex- ample, size. The following table lists the most common visual variables and their so-called organization levels. Visual variable Quantitative Ordinal Selective Associative/Dissociative Position A Size D Brightness D Texture A Color (hue) A Orientation A Shape A The main value of this classification is to make designers aware of the inherent limitations that visual variables have in well-designed visualizations. The main task of visualization is to derive information, i.e., useful facts that lead to conclusions about a certain problem, from data (recorded signal samples on a grid). The mapping function should allow retrieval of information, and not just raw data, from the produced images. Effective mapping is at the core of success of designing effective visualizations. There are many more aspects that contribute to a successful design: choice of visual encodings as a function of the display medium; accepted conventions of the target user group; knowledge of perceptual, cognitive, and human vision factors; and aesthetic principles driving the overall visualization design. 4.1.4 Rendering Data The rendering operation is the final step of the visualization process. Rendering takes 3D scene created by mapping operation, together with several viewing parameters such as viewpoint and lighting, and renders it to produce the desired images: Render : DV → I. Considering viewing parameters to be part of the rendering operation allows users to “cheaply” render and examine 3D scene anew for any viewpoint without having to recompute the mapping operation. 4.2 Implementation Perspective We can describe the visualization pipeline as a composition of functions: V is = Fi ◦ F2 ◦ ... ◦ Fn, whereFi : D → D. The various functions Fi perform the data rendering, mapping, filtering, and importing. The input of Fn is the application’s raw data, and the output of F1 is the final image. We firs choose the right operations Fi, second, we we choose the right dataset implementations Dj ∈ D to connect the pipeline functions Fi which can be implemented as classes having three properties: • They read one or more input datasets Dinp i . • They write one or more output datasets Dout j . 3
  • 4. • They have an execute() operation that computes Dout j given Dinp i . The setInput() and getOutput() methods are simple accesors to the input and output datasets Dinp i and Dout j , respectively. The accessors simply store references to the input and output datasets in the local inputs and outputs vectors of the F class. The sequence of operation executions in this application model follow the “flow” of data from the im- porting operation to the final rendering operation. For this reason, this design is often called a dataflow application model. Several professional visualization frameworks implement these features. The Visualization Toolkit (VTK) stands out as a framework, based on dataflow model. Same architectural and design principles are used in the Insight Toolkit (ITK). While VTK addresses general-purpose data visualization, ITK focuses on the more specific field of image segmentation, processing, and registration. ITK can handle mul- tidimensional images and offers algorithms for thresholding, edge detection, smoothing, de-noising, distance computations, segmentation, and registration. A visualization application can be implemented as a network of operation objects that have dataset objects as inputs and outputs. To execute the application, the operations are invoked in the dataflow order, starting with the data importing and ending with the rendering. Visual application building is implemented in by several visual programming environments, where user constructs the dataflow application network by assembling iconic representations of the visualization operations. Graphical user interfaces (GUIs) are provided by the environment to let users control parameters of various operations to achieve interactive data exploration. Such applications as VISSION, MeVisLab and ParaView use VTK library and it’s machinery. ParaView features a more beginner-friendly end-user interface. The main attraction of visual programming environ- ments is that they allow rapid prototyping of visualization applications by users who have no programming skills. 4.3 Algorithm Classification Visualization algorithms are specific visualization techniques. Most existing classification is base on the way the visualization techniques interact with each other as parts of the same visualization pipeline, on types of attributes these techniques work with. We talk about scalar, vector, and tensor visualization methods, color visualization (rendering) methods, image processing methods, while non-numeric attribute types are covered by information visualization (infovis) methods. Domain modeling methods - visualization methods that deal with the underlying sampling domain representation rather than with attributes. Examples: grid warping (change the location of the sample points), cutting and selection (extract a subset of a sampling domain as a separate dataset), resampling (change the cells and/or the basis functions to reconstruct the data). Alternative structural classification groups visualization techniques by the type of dataset ingredient they change: geometric techniques (alter geometry, or locations, of sample points), topological techniques (alter the grid cells), attribute techniques (alter the attributes only), and combined techniques (alter several of a dataset’s ingredients). Yet, another type of classification is based on following dimensions: • Task: What is the task to be completed? • Audience: Which are the users? • Target: What is the data to visualize? 
• Medium: What is rendering (drawing) support? • Representation: What are the graphical attributes (shapes, colors, textures) used? 4.4 Conclusion There is no clear-cut separation of the visualization stages of data importing, filtering, mapping, and ren- dering. Actual applications can separate and structure the pipeline in different ways, depending on design and implementation considerations that go beyond the topic of this general discussion. 4
  • 5. From an implementation point of view, the elements that are assembled to form the pipeline should meet the usual requirements of software components: modularity, reusability, simplicity, extensibility, minimality, and generality, which is a daunting task. Effectiveness of a given visualization is critically determined by the mapping function. It should be invertible so we can grasp the data properties by looking at its visual mapping, and unambiguous, so we do not doubt about what we see. Aesthetics is essential to a good visualization, as its users must be attracted to spend effort to study and work with it. What have you learned in this chapter? This chapter presented the structure of a complete visualization application, it introduces the four main ingredients of such an application: data importing, data filtering and enrichment, data mapping, and data rendering, where visualization process is seen as a composition of functions. Chapter touches upon sev- eral implementation considerations of this conceptual structure, talks about classification of the various algorithms used in the visualization process. Visualization pipeline is described from both conceptual and implementation point of view, VTK and ITK toolkits are introduces. What surprised you the most? I was surprising to find out about so many visualization programming environments, as well as to learn about multiple classifications of visualization methods. It is also surprising to realize that there is no clear- cut separation of the visualization stages of data importing, filtering, mapping, and rendering in real world applications. What applications not mentioned in the book you could imagine for the techniques ex- plained in this chapter? I can imagine a Visual Application Builder, which is fully compatible with most textual programming languages to provide best combination of both. This would allow fast prototyping and advanced data manipulation needed for most real world applications. I can also imagine applying dataflow application model for analysis of big data. 1. EXERCISE 1 The visualization pipeline offers an intuitive architectural model to design complex data processing and/or data-visualization applications by combining lower-level functionality in a so-called dataflow graph. Think of two visualization applications related to your own professional or daily experience. Describe these appli- cations in terms of a dataflow graph. For each graph node, explain what the functionality of the respective node is, and also the kinds of datasets it reads and writes. Try to be as specific as possible. Let’s consider visualizing heart rate/bit data. This visualization application can be described as a dataflow graph with following nodes: • Import Function of this node is to import raw data into the pipeline from an external source (csv files). Purpose of this node is to sample continuous signal coming from the device and to translate data storage format (csv files) into a desired discrete domain dataset (time series), and to resample data from one resolution/grid type to another (long term or short term analysis). It reads input data from external storage - files or a database and writes imported dataset into the memory of a computer making it available for later processing. • Filter Function of this node is to distill importet raw data (input dataset) into appropriate representation - an enriched dataset to encode the features of interest. Imported data gets filtered to extract relevant information. 
1. EXERCISE 1

The visualization pipeline offers an intuitive architectural model to design complex data-processing and/or data-visualization applications by combining lower-level functionality in a so-called dataflow graph. Think of two visualization applications related to your own professional or daily experience. Describe these applications in terms of a dataflow graph. For each graph node, explain what the functionality of the respective node is, and also the kinds of datasets it reads and writes. Try to be as specific as possible.

Let's consider visualizing heart-rate data. This visualization application can be described as a dataflow graph with the following nodes:

• Import
The function of this node is to import raw data into the pipeline from an external source (csv files). Its purpose is to sample the continuous signal coming from the device, to translate the storage format (csv files) into the desired discrete-domain dataset (a time series), and to resample the data from one resolution/grid type to another (for long-term or short-term analysis). It reads input data from external storage - files or a database - and writes the imported dataset into the memory of the computer, making it available for later processing.

• Filter
The function of this node is to distill the imported raw data (input dataset) into a more appropriate representation - an enriched dataset that encodes the features of interest. The imported data is filtered to extract relevant information and enriched with higher-level information: each sample is labeled with a confidence score based on noise levels, activity levels, and other heart-rate characteristics. Based on that score and the data itself, other characteristics of interest are estimated to support a given task (creating a vital-sign classification and monitoring dashboard). This node reads the time series from the computer's memory, manipulates it, and writes the enriched dataset back for the next visualization step.

• Map
The function of this node is to take the features of interest from the enriched dataset and to associate them with elements of the visual domain (visual attributes such as shape, color, size, position, texture, and shading). It reads the enriched dataset from memory, creates an appropriate grid, links to it the visual attributes that depict the features of interest, and then writes a 2D/3D scene dataset back into memory to be used by the rendering step.

• Render
The function of this node is to simulate the physical process of lighting the visible 2D/3D scene. The color of the plot, the viewpoint, and the lighting parameters do not encode actual data, so the user can tune them while examining and navigating the scene. The rendering operation renders the scene to produce the desired images: it reads the scene dataset from memory and produces the final image. It has no output other than the image itself.
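The first two nodes above can be prototyped directly in code. Below is a minimal Python sketch of the Import and Filter nodes; the csv layout (timestamp, bpm) and the confidence heuristic are assumptions made for illustration, not taken from any real device:

import csv

# Illustrative sketch of the Import and Filter nodes described above.

class ImportNode:
    def __init__(self, path):
        self.path = path

    def execute(self):
        # Reads raw csv rows, writes a time series: a list of (t, bpm).
        with open(self.path) as f:
            return [(float(t), float(bpm)) for t, bpm in csv.reader(f)]

class FilterNode:
    def execute(self, series):
        # Enriches each sample with a crude confidence score: samples
        # that jump far from their predecessor are likely noise.
        enriched = []
        prev = series[0][1]
        for t, bpm in series:
            confidence = 1.0 if abs(bpm - prev) < 20.0 else 0.2
            enriched.append((t, bpm, confidence))
            prev = bpm
        return enriched

series = ImportNode("heart_rate.csv").execute()
enriched = FilterNode().execute(series)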
2. EXERCISE 2

The visualization pipeline is often implemented in software as a set of data-processing modules that are connected in a directed graph (the dataflow graph). Here, each graph node is such a module, and each (directed) edge is the connection of a module's output to another module's input. Can you imagine such a graph which would contain loops (cycles)? If so, sketch a conceptual visualization application represented by such a graph, and explain why a loop would be useful. If not, explain which problems would occur if loops were present in the dataflow graph.

If the application graph is acyclic, the execution is equivalent to calling the execute() method of all operations Fi in the order of a topological sort of the graph. This ensures that an operation is executed only when all its inputs are available and up-to-date. Cyclic application graphs can also be accommodated, but they require more complex update mechanisms, for which reason they are less used in practice. Cyclic application graphs might need advanced reference counting to ensure that inputs and outputs are compatible with the dataset operations at each step.

Figure 1: Example of a dataflow graph with loops.

Let's assume that node x_in imports data; node x_1 then filters and cleans the data, using clustering or classification based on basic data-integrity rules, and passes the improved dataset to node x_4, which performs another type of clustering and classification and perhaps maps the data to the features of interest. The output of x_4 might alter the dataset in a way that affects its integrity, so it needs to be passed back to x_1 for one more stage of cleaning. Data flows in the loop between x_1 and x_4 until the integrity of x_4's output meets certain criteria to be considered final. After that, the scene produced by x_4 is rendered by node x_out.
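A hedged sketch of how such a loop could be executed in practice: iterate the clean-and-map pair until an integrity criterion is met, with an iteration cap as a safeguard against non-termination. The integrity test and all function names are illustrative assumptions:

# Sketch of executing the x_1 <-> x_4 loop above.

def clean(dataset):            # node x_1: enforce basic integrity rules
    return [v for v in dataset if v is not None]

def cluster_and_map(dataset):  # node x_4: may perturb integrity again
    return [round(v, 1) for v in dataset]

def integrity_ok(dataset):     # loop-exit criterion
    return all(v is not None for v in dataset)

def run_loop(dataset, max_iterations=10):
    # Cap the iterations so a loop that never converges cannot hang the
    # whole pipeline - one reason cyclic graphs need special care.
    for _ in range(max_iterations):
        dataset = cluster_and_map(clean(dataset))
        if integrity_ok(dataset):
            break
    return dataset

scene = run_loop([72.0, None, 75.3, 74.1])  # then rendered by x_out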
3. EXERCISE 3

The dataflow model used for constructing visualization applications is often supported by so-called visual builders, where users can interactively construct a visualization application by placing modules on a canvas and connecting their inputs and outputs to form a dataflow graph. Describe another application domain (apart from data visualization) where you know that, or imagine that, this kind of visual programming would be an effective paradigm. For that domain, give a few examples of modules by describing their functionality, inputs, and outputs.

I can think of a real-time sound-control system. The sound signal from a musical instrument (microphone) is imported and sampled by a computer; it is then analyzed and filtered by an algorithm to get rid of noise and perhaps some "false" notes; next, a global control unit enriches the signal by replacing the "faulty" notes with proper harmonics and adds another soundtrack on top of it, synchronizing the music with the singer's voice. The combined signal can then be amplified and sent to the output speakers for the audience to enjoy.

4. EXERCISE 4

Visual application builders (VABs) offer an alternative to classical textual programming (CTP) for constructing dataflow applications such as those present in visualization contexts. However, there are also contexts in which VABs are less effective and/or efficient to use than CTP. Consider the VAB examples illustrated in Chapter 4. Based on this information and/or your concrete experience with a VAB:

• Enumerate four advantages of VAB vs CTP.

• Enumerate four advantages of CTP vs VAB.

• Present a possible system design which would combine the advantages of VAB and CTP while limiting their separate disadvantages.

Hints: First, consider the tasks that a programmer or end-user would like to accomplish by using both application-building paradigms.

Advantages of VAB vs CTP:

• Visual programming environments provide a way to rapidly prototype and develop a program by users who have no programming experience, which speeds up the development cycle.

• They provide simpler and more intuitive application-construction mechanisms that visualize the program flow (blocks and wires instead of lines of code); such visual tools can be more useful for non-programmers setting up logic.

• They make it easy to manage several operations running in parallel.

• They grant users interactive control over the parameters of the operations and allow interactive data exploration through an end-user interface that is easy to use, easy to learn, and beginner friendly. Building a program through an interface, rather than writing code manually, works especially well for data acquisition and signal processing with minor manipulations in between.

Advantages of CTP vs VAB:

• A visual program becomes unreadable once more than a few dozen visual primitives are on the screen at the same time: graphical elements take up valuable screen space, whereas text is more concise.

• Programming complicated algorithms, and experimental programming in general, is easier with CTP than with VAB, since you have complete control over the textual code.

• Many real-world applications have hundreds of operations and contain intricate control flow that cannot be easily modeled using the dataflow paradigm; they need complex custom code.

• The structure of an application changes rarely once it is past the prototyping stage, so visual programming environments are less suited for the creation of final applications.

Let's consider a hypothetical system which uses both visual and textual programming.
Custom libraries can be created in textual languages like Python or C++ to implement sophisticated algorithms that improve the quality of the datasets and provide the required data manipulations. Those libraries and methods can be invoked at each stage of the application's data pipeline, and can also run in parallel if needed. The pipeline itself, as an application, can be prototyped and modeled in a visual application builder.
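A minimal sketch of what the textual half of such a hybrid system could look like: a custom algorithm written in Python is wrapped as a node with declared inputs and outputs, so a visual builder could place it on a canvas and wire it up. The decorator, its port metadata, and the registry are hypothetical, not the API of any existing builder:

# Hypothetical node registry for a hybrid visual/textual builder.

NODE_REGISTRY = {}

def node(inputs, outputs):
    # Decorator that records a Python function, plus the port metadata
    # a visual builder would need to draw and connect it on the canvas.
    def wrap(func):
        NODE_REGISTRY[func.__name__] = {
            "inputs": inputs, "outputs": outputs, "run": func,
        }
        return func
    return wrap

@node(inputs=["time_series"], outputs=["enriched_series"])
def denoise(series):
    # The sophisticated algorithm lives in plain text; here, a simple
    # moving average stands in for it.
    smoothed = []
    for i in range(len(series)):
        window = series[max(0, i - 2): i + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed

# The visual builder would look the node up by name and call it:
print(NODE_REGISTRY["denoise"]["run"]([70.0, 71.0, 90.0, 72.0]))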
5. EXERCISE 5

Visualization techniques and tools can be classified using the five-element model of Marcus et al. (task, audience, target, medium, and representation) described in Section 4.3, Chapter 4. Give two examples of visualization applications of your choice and explain, for each example, which are the five elements of the above model.

Visualization of human brain tissue from CT-scan data:

• Task: visualizing digital images of brain scans to support the diagnosis and treatment of brain injuries.

• Audience: health-care practitioners and patients who have had brain injuries.

• Target: data coming from X-ray, MRI, or CT scans.

• Medium: rendering that pictures the structure of a human brain, concentrating on various features of interest: tissue type, tissue density, liquid flow, etc.

• Representation: various graphical attributes are used, such as color to represent tissue type, textures to depict tissue density, and shapes to depict fiber cells.

Visualization of vital-sign data from a portable monitoring device:

• Task: visualizing the digital signal containing a person's vital signs in support of a classification and monitoring dashboard.

• Audience: health-care practitioners, insurance companies, and patients who monitor their health.

• Target: csv data coming from a portable heart-rate monitor placed on a person's wrist.

• Medium: rendering that pictures various features of interest: the current heart-rate mode/regime, the class of heart performance, estimated health risks against possible complications, and historical trends.

• Representation: various graphical attributes are used, such as color to represent class categories, textures to depict the measure of deviation from the norm, and shapes to depict indicators of risk.

6. EXERCISE 6

Visual programming is a useful tool for quick prototyping of relatively simple visualization applications. However, building a large and complex application whose dataflow graph consists of hundreds of modules, where each module has many inputs and outputs, can be challenging in terms of the manual effort required to find the right modules, place them at good positions on the canvas, and connect the right inputs and outputs. If you were the designer of the next-generation visual programming tool, propose three functions you would add to a visual-programming tool in order to speed up the building process.

Hints: Think where the bottlenecks are for a beginner user in terms of the operations needed to construct the right dataflow graph. Think also about the repetitive actions an advanced user needs to do.

Three functions:

• Access control: datasets should be accessible only when they contain appropriate (up-to-date) data, so that a module can never consume a stale input (a small sketch of this idea follows the list).

• Direct linkage with other languages such as Python, R, and C/C++, to allow advanced algorithms to work with the datasets.

• A time dimension for debugging: the ability to step through, record, and replay the dataflow execution to see how datasets change between modules.
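A hedged sketch of the access-control idea from the first bullet: wrap each dataset in a holder that tracks whether it is up-to-date and refuses reads of stale data. The class and its methods are assumptions for illustration:

# Sketch of the "access control" proposal above: a dataset holder that
# refuses to serve stale data.

class DatasetSlot:
    def __init__(self):
        self._data = None
        self._up_to_date = False

    def write(self, data):
        self._data = data
        self._up_to_date = True

    def invalidate(self):
        # Called when any upstream module's parameters change.
        self._up_to_date = False

    def read(self):
        if not self._up_to_date:
            raise RuntimeError("dataset is stale; re-run upstream modules")
        return self._data

slot = DatasetSlot()
slot.write([72.0, 75.3])
print(slot.read())   # fine
slot.invalidate()
# slot.read()        # would raise: upstream must execute first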
7. EXERCISE 7

Consider the conceptual data-visualization pipeline. Here, data read from an input source is transformed by various filters; next, it is mapped to geometric primitives, which are finally rendered on the screen. Consider now that the user is interested to select any visible element in the final image, e.g., a polygon or a vertex, and ask the visualization system: "From which raw data elements has this element come? And via which operations?" Your visualization system is implemented based on the operator-dataset model outlined in Figure 4.5 (also shown below). That is, the pipeline consists of a sequence of computational functions, or operators, that read, respectively write, dataset objects. How would you implement the above back-tracing functionality in such a system?
Figure 2: Visualization pipeline as a directed graph of datasets and operators.

Hints: Start from the end and work towards the beginning. Any visible element that the user can select is, in essence, a geometric primitive coming from the last dataset object that the visualization pipeline produces and feeds to the rendering operator. Think of how you can augment the dataset representation with back-tracing information that encodes, at a low level (cells and vertices), both the origins of these data elements and the operations they were generated by.

We can implement the back-tracing functionality by requiring each operator/function to attach a tag to the data output it produces. The tag must contain references to the original (raw) data elements altered by this operator, as well as the list of operations performed so far. At the end of the mapping stage we will then have a dataset that allows us to trace any visible element back through all the operations to the raw data from which this element came.
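As a hedged sketch of this tagging scheme (the tag layout and operator names are assumptions): each operator wraps its output elements together with the indices of the raw elements they came from and the chain of operations applied so far.

# Sketch of per-element provenance tags for back-tracing. The tag
# layout (value, raw_indices, operations) is an illustrative assumption.

def import_data(raw):
    # Each element starts with a tag pointing at its own raw index.
    return [(v, {i}, ["import"]) for i, v in enumerate(raw)]

def smooth(tagged):
    # Each output element derives from a window of inputs, so its tag
    # is the union of the input tags plus this operation's name.
    out = []
    for i in range(len(tagged)):
        window = tagged[max(0, i - 1): i + 2]
        value = sum(v for v, _, _ in window) / len(window)
        indices = set().union(*(idx for _, idx, _ in window))
        out.append((value, indices, window[0][2] + ["smooth"]))
    return out

scene = smooth(import_data([70.0, 71.0, 90.0, 72.0]))
# Back-trace the element the user picked, e.g. the third primitive:
value, raw_indices, ops = scene[2]
print(value, sorted(raw_indices), ops)  # raw elements 1,2,3 via import, smooth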