03 Data Representation

Reading Report 3. Data Representation
Valerii Klymchuk
April 3, 2015
0. EXERCISE 0
Chapter 3. Summary
0.1 Continuous Data
Many phenomena are modeled in terms of physical quantities. In data representation, these quantities fall into two fundamentally different categories: intrinsically continuous and intrinsically discrete ones. Computers usually manipulate continuous data in some finite, approximate form. Sampled continuous data are also discrete, since they consist of a finite set of data elements. However, in contrast to intrinsically discrete data, sampled data always originate from, and are intended to approximate, a continuous quantity. Intrinsically discrete data have no counterpart in the continuous world, as is the case for a page of text, for example. This is the fundamental difference between continuous (sampled) and discrete data.
Mathematically, continuous data can be modeled as a function
\[ f : D \subset \mathbb{R}^d \to C \subset \mathbb{R}^c \]
between a domain $D$ and a codomain $C$. $f$ is called a d-dimensional, or d-variate, c-value function. In visualization, $f$ is sometimes called a field.
Intuitively, $f$ is continuous if its graph is a connected surface without "holes" or "jumps." The Cauchy $\epsilon$-$\delta$ criterion states that $f$ is continuous if, for every point $p \in D$, the following holds:
\[ \forall \epsilon > 0,\ \exists \delta > 0 : \ \|x - p\| < \delta,\ x \in D \ \Rightarrow\ \|f(x) - f(p)\| < \epsilon. \]
Also, $f$ is continuous of order $k$ if the function itself and all its derivatives up to and including order $k$ are continuous in this sense. This is denoted as $f \in C^k$.
Functions $f$ whose derivatives are continuous on compact intervals are called piecewise continuous.
The triplet $\mathcal{D} = (D, C, f)$ defines a continuous dataset. The dimension $d$ of the space $\mathbb{R}^d$ in which the function's domain is embedded is called the geometrical dimension. The topological dimension of the dataset is the dimension $s \le d$ of the function domain $D$ itself, that is, the number of independent variables needed to represent $D$. For a line or curve in Euclidean space $\mathbb{R}^3$ we have $s = 1$ and $d = 3$; if $D$ is a plane or curved surface, then $s = 2$. We assume that the geometrical dimension is always fixed to $d = 3$. Hence, the only dimension that varies between datasets is the topological dimension $s$, which is therefore often called the dataset dimension in practice.
The co-dimension of an object of topological dimension s and geometrical dimension d is the difference
d − s.
The function values are usually called dataset attributes. The dimensionality c of the function codomain
C is also called the attribute dimension (usually ranges from 1 to 4).
0.2 Sampled Data
Two operations relate sampled data and continuous data:
• sampling: given a continuous dataset, we have to be able to produce sampled data from it;
• reconstruction: given a sampled dataset, we have to be able to recover an (approximated) version of
the original continuous data.
Reconstruction specifies the value of the function between its sample points from the sample values, using a technique called interpolation. The reconstruction quality depends on the number and distribution of the sample points used.
To be used in practice, a sampled dataset should comply with several requirements: it should be accurate, minimal, generic, efficient, and simple.
• Accurate: we can control the production of a sampled dataset $D_s$ from a continuous one $D$ such that $D$ can be reconstructed from $D_s$ with a small, user-specified error.
• Minimal: $D_s$ contains the least number of sample points needed to ensure a reconstruction with the desired error.
• Generic: we can easily replace the various data processing operations we had for the continuous $D$ with equivalent counterparts for the sampled $D_s$.
• Efficient: both the reconstruction operation and the data processing operations we wish to perform on $D_s$ can be done efficiently from an algorithmic point of view.
• Simple: we can design a reasonably simple software implementation of both $D_s$ and the operations we want to perform on it.
We define reconstruction as follows: given a sampled dataset $\{p_i, f_i\}$ consisting of $N$ sample points $p_i \in D$ and sample values $f_i \in C$, we want to produce a continuous function $\hat{f} : D \to C$ that approximates the original $f$. The reconstructed function should equal the original one at all sample points, i.e., $\hat{f}(p_i) = f(p_i) = f_i$. One way to define a reconstructed function that satisfies this property is to set
\[ \hat{f} = \sum_{i=1}^{N} f_i \phi_i, \]
where $\phi_i : D \to C$ are called basis functions or interpolation functions. In other words, we define the reconstruction as a weighted sum of a given set of basis functions $\phi_i$, where the weights are exactly our sample values $f_i$. Since we want $\hat{f}(p_j) = f_j$ for all sample points $p_j$, we get
\[ \sum_{i=1}^{N} f_i \phi_i(p_j) = f_j, \quad \forall j. \]
This equation must hold for any function $f$. Considering a function that is 1 at a single sample point and 0 at all others, we obtain
\[ \phi_i(p_j) = \begin{cases} 1, & i = j, \\ 0, & i \ne j. \end{cases} \]
This property is sometimes referred to as the orthogonality of basis functions. Let us now consider the constant function $g(x) = 1$ for any $x \in D$. We obtain $\sum_{i=1}^{N} \phi_i(p_j) = 1$ for all sample points $p_j \in D$, or, for every point of the domain,
\[ \sum_{i=1}^{N} \phi_i(x) = 1, \quad \forall x \in D. \]
The property described by this equation is called the normality of basis functions. Basis functions that are both orthogonal and normal are called orthonormal. To reconstruct a sampled function, we can use different orthonormal basis functions.
A grid, sometimes also called a mesh, is a subdivision of a given domain $D \subset \mathbb{R}^d$ into a collection of cells, sometimes also called elements, denoted $c_i$. The union of the cells completely covers the sampled domain, i.e., $\bigcup_i c_i = D$, and the cells are non-overlapping, i.e., $c_i \cap c_j = \emptyset, \ \forall i \ne j$.
We can now define the simplest set of basis functions, the constant basis functions. These approximate a given function by the piecewise, per-cell, constant sample value $f_i$: every point $x \in D$ is assigned the sample value of the nearest cell center. For this reason, piecewise constant interpolation is also called nearest-neighbor interpolation. Constant basis functions are simple to implement, have virtually no computational cost, and work for any cell shape in any dimension. However, they provide a poor, staircase-like approximation $\hat{f}$ of the original $f$: the visualization shows a visible discontinuity at every cell boundary.
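As a minimal sketch of piecewise constant reconstruction, the following Python snippet evaluates a 1D sampled function by nearest-neighbor lookup. The function name `nearest_neighbor` and the sample data are illustrative assumptions, not part of the chapter.

```python
def nearest_neighbor(sample_points, sample_values, x):
    """Piecewise constant reconstruction: value of the sample closest to x."""
    # Index of the sample point closest to x (ties go to the lower index).
    i = min(range(len(sample_points)), key=lambda k: abs(sample_points[k] - x))
    return sample_values[i]


# Hypothetical 1D dataset: four sample points with one scalar value each.
points = [0.0, 1.0, 2.0, 3.0]
values = [10.0, 20.0, 15.0, 5.0]

left = nearest_neighbor(points, values, 0.4)   # nearest sample point is 0.0
right = nearest_neighbor(points, values, 0.6)  # nearest sample point is 1.0
print(left, right)
```

The jump between `left` and `right` around x = 0.5 is the staircase discontinuity mentioned above.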
By using higher-order basis functions, we can provide a better and more continuous reconstruction. The next-simplest basis functions beyond the constant ones are the linear basis functions. To use these, however, we need to make some assumptions about the cell types used in the grid. Let us consider a single quadrilateral cell $c$ with vertices $(v_1, v_2, v_3, v_4)$, where $v_1 = (0, 0)$, $v_2 = (1, 0)$, $v_3 = (1, 1)$ and $v_4 = (0, 1)$, i.e., an axis-aligned square of edge size 1 with the origin as its first vertex. We call this the reference cell in $\mathbb{R}^2$. Coordinates in the reference cell $[0, 1]^d$ are called reference coordinates $r_1, \ldots, r_d$ (or $r, s, t$ for $d = 3$). We now define four local basis functions $\Phi^1_1, \Phi^1_2, \Phi^1_3, \Phi^1_4$, with $\Phi^1_i : [0, 1]^2 \to \mathbb{R}$, as follows:
\[
\begin{aligned}
\Phi^1_1(r, s) &= (1 - r)(1 - s), \\
\Phi^1_2(r, s) &= r(1 - s), \\
\Phi^1_3(r, s) &= rs, \\
\Phi^1_4(r, s) &= (1 - r)s.
\end{aligned}
\]
These basis functions are indeed orthonormal. For any point $(r, s)$ in the reference cell, we can now use them to define the function
\[ \hat{f}(r, s) = \sum_{i=1}^{4} f_i \Phi^1_i(r, s) \]
as a sum of linear basis functions, which makes it a first-order continuous reconstruction of the four sample values $f_1, f_2, f_3, f_4$ defined at the cell vertices. For an arbitrary quadrilateral cell $c$ in $\mathbb{R}^3$, we can define a coordinate transformation $T : [0, 1]^2 \to \mathbb{R}^3$ that maps our reference cell to $c$. We want to map the reference cell vertices $v_i$ to the corresponding world cell vertices $p_i$, so $T(v_i) = p_i$. We define $T$ using the reference basis functions, mapping a point $(r, s, t)$ in the reference cell coordinate system to a point $(x, y, z)$ in the actual cell:
\[ (x, y, z) = T(r, s, t) = \sum_{i=1}^{n} p_i \Phi^1_i(r, s, t), \]
where $n$ is the number of cell vertices.
If $T$ maps the reference cell to the world cell, then its inverse $T^{-1}$ maps points $(x, y, z)$ in the world cell to points $(r, s, t)$ in the reference cell, where our basis functions $\Phi^1_i$ are defined. Using $T^{-1}$, we can rewrite Equation (3.2) for our quad cell $c$:
\[ \hat{f}(x, y) = \sum_{i=1}^{4} f_i \Phi^1_i(T^{-1}(x, y)). \]
In order to compute the inverse transformation $T^{-1}$, we must invert the expression given by Equation (3.8). This inversion depends on the actual cell type.
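The reference-cell machinery can be sketched in a few lines of Python. The snippet below evaluates the four quad basis functions, interpolates vertex values in reference coordinates, and applies the forward transform $T$ to a world cell. All function names and the example cell are assumptions made for illustration.

```python
def quad_basis(r, s):
    """The four linear basis functions of the reference quad at (r, s)."""
    return [(1 - r) * (1 - s), r * (1 - s), r * s, (1 - r) * s]


def interpolate(values, r, s):
    """Reconstruct f at reference coordinates (r, s) from four vertex values."""
    return sum(f * phi for f, phi in zip(values, quad_basis(r, s)))


def to_world(vertices, r, s):
    """Forward transform T: reference coordinates -> world coordinates."""
    phi = quad_basis(r, s)
    dim = len(vertices[0])
    return tuple(sum(p[k] * w for p, w in zip(vertices, phi)) for k in range(dim))


# Hypothetical world quad in R^3 and sample values at its vertices.
world_quad = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 2.0, 1.0), (0.0, 2.0, 1.0)]
vertex_values = [1.0, 2.0, 3.0, 4.0]

center_value = interpolate(vertex_values, 0.5, 0.5)
center_point = to_world(world_quad, 0.5, 0.5)
print(center_value, center_point)
```

Evaluating `quad_basis` at the reference vertices returns one weight equal to 1 and three equal to 0 (orthogonality), and the four weights sum to 1 at any point (normality).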
We now have a way to reconstruct a piecewise linear function $\hat{f}$ from samples on any quad grid: for every cell $c$ in the grid, we simply apply Equation (3.9). We can now finally define our piecewise linear reconstruction in terms of a set of global basis functions $\phi$, just like we did for piecewise constant reconstruction (Equation (3.6)). Given a grid with sample points $p_i$ and quad cells $c_i$, we can define our grid-wise linear basis functions $\phi^1_i$ as follows:
\[
\phi^1_i(x, y) =
\begin{cases}
0, & \text{if } (x, y) \notin \text{cells}(p_i), \\
\Phi^1_j(T^{-1}(x, y)), & \text{if } (x, y) \in c = \{v_1, v_2, v_3, v_4\}, \text{ where } v_j = p_i,
\end{cases}
\]
where $\text{cells}(p_i)$ denotes the cells that have $p_i$ as a vertex. Sampling the continuous signal $f$ produces a set of samples $f_i$. Multiplying the samples by the global basis functions $\phi_i$, obtained from the reference basis functions $\Phi_j$ via the transform $T$, and summing the results, we obtain the reconstructed signal $\hat{f}$.
We can apply the basis-function machinery and the sampling and reconstruction mechanisms to more data attributes than surface geometry alone, e.g., to shading. Gouraud shading produces a smooth illumination over a polygon by reconstructing the original continuous surface using piecewise linear interpolation for both the geometry and the illumination.
0.3 Discrete Datasets
In summary, given:
• a grid in terms of a set of cells defined by a set of sample points,
• some sampled values at the cell centers or cell vertices,
• a set of basis functions,
we can define a piecewise continuous reconstruction of the sampled signal on this grid and work with it.
We defined a continuous dataset for a function $f : D \to C$ as the triplet $\mathcal{D} = (D, C, f)$. In the discrete case, we replace the function domain $D$ by the sampling grid $(p_i, c_i)$, and the continuous function $f$ by its piecewise k-order continuous reconstruction $\hat{f}$ computed using the grid, the sample values $f_i$, and a set of basis functions $\{\Phi^k_i\}$. Hence, the discrete (sampled) dataset counterpart of $(D, C, f)$ is the tuple $D_s = (\{p_i\}, \{c_i\}, \{f_i\}, \{\Phi^k_i\})$: grid points, grid cells, sample values, and reference basis functions.
Replacing a continuous dataset $\mathcal{D}$ with its discrete counterpart $D_s$ means working with a piecewise k-order continuous function $\hat{f}$ instead of a potentially higher-order continuous function $f$. For a discrete dataset, the dataset requirements (accurate, minimal, generic, efficient, and simple) translate to constraints on the number and position of the sample points $p_i$, the shape of the cells $c_i$, the type of the reference basis functions $\Phi_i$, and the number and type of the sample values $f_i$. These constraints determine specific implementation solutions as follows. The cell shapes, together with the basis functions, determine different cell types. The number and type of sample values $f_i$ determine the attribute types.
0.4 Cell Types
A grid is a collection of cells ci, whose vertices are the grid sample points pi. Given some data sampled at
the points pi, the cells are used to define supports for the basis functions φi used to interpolate the data
between the sample points.
The dimensionality $d$ of the cells $c_i$ has to be the same as the topological dimension of the sampled domain $D$ if we want to approximate $D$ by the union of all cells $\bigcup_i c_i$. For example, if $D$ is a plane ($d = 2$), we must use planar cells, such as polygons. If $D$ is a volume ($d = 3$), we must use volumetric cells, such as tetrahedra. For each cell type we present the linear basis functions it supports, as well as the coordinate transformation $T^{-1}$ that maps from locations $(x, y, z)$ in the actual world cell to locations $(r, s, t)$ in the reference cell.
0.4.1 Vertex
The simplest cell type, of dimension $d = 0$, is identical to its single vertex, $c = (v_1)$. The vertex has a single, constant basis function $\Phi^0_1(r) = 1$. In practice, there is no distinction between sample points and vertex cells.
0.4.2 Line
Line cells have dimension $d = 1$ and two vertices, $c = (v_1, v_2)$. Line cells are used to interpolate along any kind of curve embedded in any dimension. Given the reference line cell defined by the points $v_1 = 0$, $v_2 = 1$, the two linear basis functions are
\[
\begin{aligned}
\Phi^1_1(r) &= 1 - r, \\
\Phi^1_2(r) &= r.
\end{aligned}
\]
The transformation $T^{-1}$ for line cells is simply the dot product between the position vector of the desired point $p = (x, y, z)$ with respect to the first cell vertex $p_1$ and the cell vector $p_2 - p_1$, normalized by the squared cell length so that $r = 1$ at $p_2$:
\[ T^{-1}_{line}(x, y, z) = r = \frac{(p - p_1) \cdot (p_2 - p_1)}{\|p_2 - p_1\|^2}. \]
0.4.3 Triangle
The simplest cell type in dimension $d = 2$ is the triangle, i.e., $c = (v_1, v_2, v_3)$. Triangles can be used to interpolate along any kind of surface, planar or curved, embedded in any dimension. Given the reference triangle cell defined by the points $v_1 = (0, 0)$, $v_2 = (1, 0)$, $v_3 = (0, 1)$, the three linear basis functions are
\[
\begin{aligned}
\Phi^1_1(r, s) &= 1 - r - s, \\
\Phi^1_2(r, s) &= r, \\
\Phi^1_3(r, s) &= s.
\end{aligned}
\]
The transformation $T^{-1}$ for triangular cells is
\[ T^{-1}_{tri}(x, y, z) = (r, s) = \left( \frac{\|(p - p_1) \times (p_3 - p_1)\|}{\|(p_2 - p_1) \times (p_3 - p_1)\|},\ \frac{\|(p - p_1) \times (p_2 - p_1)\|}{\|(p_3 - p_1) \times (p_2 - p_1)\|} \right). \]
It is computed from vector products between the position vector $p - p_1$ of the point $p$ in the world cell, taken with respect to the world cell's first vertex $p_1$, and the world cell edges $p_2 - p_1$ and $p_3 - p_1$.
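The triangle transformation can be checked with a short Python sketch. It computes $(r, s)$ as ratios of cross-product norms and then interpolates vertex values with the three linear basis functions. The helper names and the example triangle are illustrative assumptions, and the formula is only meaningful for points inside the triangle and in its plane.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    return math.sqrt(sum(x * x for x in a))


def triangle_reference_coords(p, p1, p2, p3):
    """Inverse transform for a triangle; assumes p lies inside the cell."""
    d, e2, e3 = sub(p, p1), sub(p2, p1), sub(p3, p1)
    r = norm(cross(d, e3)) / norm(cross(e2, e3))
    s = norm(cross(d, e2)) / norm(cross(e3, e2))
    return r, s


def triangle_interpolate(values, r, s):
    """Linear reconstruction using the basis functions 1 - r - s, r, s."""
    return values[0] * (1 - r - s) + values[1] * r + values[2] * s


# Hypothetical world triangle and a query point inside it.
p1, p2, p3 = (0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0)
r, s = triangle_reference_coords((0.5, 1.0, 0.0), p1, p2, p3)
value = triangle_interpolate([0.0, 4.0, 8.0], r, s)
print(r, s, value)
```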
0.4.4 Quad
Another possibility to interpolate over two-dimensional surfaces is to use quadrilateral cells, or quads. The reference quad is defined by the points $v_1 = (0, 0)$, $v_2 = (1, 0)$, $v_3 = (1, 1)$ and $v_4 = (0, 1)$ and is an axis-aligned square of edge size 1. On this reference quad the basis functions are
\[
\begin{aligned}
\Phi^1_1(r, s) &= (1 - r)(1 - s), \\
\Phi^1_2(r, s) &= r(1 - s), \\
\Phi^1_3(r, s) &= rs, \\
\Phi^1_4(r, s) &= (1 - r)s.
\end{aligned}
\]
A good trade-off between flexibility and simplicity is to support quad cells as input data, but transform them internally into triangle cells by dividing every quad into two triangles along one of its two diagonals.
The transformation $T^{-1}_{quad}$ for a general quad cell involves bilinear basis functions and cannot be easily inverted. We can only solve it numerically for $r, s$ as functions of $x, y, z$. If our actual cells are rectangular instead of arbitrary quads, as in a uniform or rectilinear grid, we can do better. In this case the transformation $T^{-1}_{rect}$ is
\[ T^{-1}_{rect}(x, y, z) = (r, s) = \left( \frac{(p - p_1) \cdot (p_2 - p_1)}{\|p_2 - p_1\|^2},\ \frac{(p - p_1) \cdot (p_4 - p_1)}{\|p_4 - p_1\|^2} \right). \]
0.4.5 Tetrahedron
The simplest cell type in dimension $d = 3$ is the tetrahedron, defined by its four vertices $c = (v_1, v_2, v_3, v_4)$. On the reference tetrahedron defined by the points $v_1 = (0, 0, 0)$, $v_2 = (1, 0, 0)$, $v_3 = (0, 1, 0)$ and $v_4 = (0, 0, 1)$, the four linear basis functions are
\[
\begin{aligned}
\Phi^1_1(r, s, t) &= 1 - r - s - t, \\
\Phi^1_2(r, s, t) &= r, \\
\Phi^1_3(r, s, t) &= s, \\
\Phi^1_4(r, s, t) &= t.
\end{aligned}
\]
Given a tetrahedral cell with vertices $p_1, p_2, p_3, p_4$, the transformation $T^{-1}_{tet}(x, y, z) = (r, s, t)$ follows the same pattern:
\[
\begin{aligned}
r &= \frac{|(p - p_4) \cdot ((p_1 - p_4) \times (p_3 - p_4))|}{|(p_1 - p_4) \cdot ((p_2 - p_4) \times (p_3 - p_4))|}, \\
s &= \frac{|(p - p_4) \cdot ((p_1 - p_4) \times (p_2 - p_4))|}{|(p_1 - p_4) \cdot ((p_2 - p_4) \times (p_3 - p_4))|}, \\
t &= \frac{|(p - p_3) \cdot ((p_1 - p_3) \times (p_2 - p_3))|}{|(p_1 - p_4) \cdot ((p_2 - p_4) \times (p_3 - p_4))|}.
\end{aligned}
\]
Some applications also use pyramid cells and prism cells to discretize volumetric domains. Pyramid and prism cells can be split into tetrahedral cells.
0.4.6 Hexahedron
The next $d = 3$ dimensional cell type is the hexahedron, or hex, defined by its eight vertices $c = (v_1, \ldots, v_8)$. The reference hexahedron is the axis-aligned cube of unit edge length, with $v_1$ at the origin. On this cell the eight linear basis functions are
\[
\begin{aligned}
\Phi^1_1(r, s, t) &= (1 - r)(1 - s)(1 - t), \\
\Phi^1_2(r, s, t) &= r(1 - s)(1 - t), \\
\Phi^1_3(r, s, t) &= rs(1 - t), \\
\Phi^1_4(r, s, t) &= (1 - r)s(1 - t), \\
\Phi^1_5(r, s, t) &= (1 - r)(1 - s)t, \\
\Phi^1_6(r, s, t) &= r(1 - s)t, \\
\Phi^1_7(r, s, t) &= rst, \\
\Phi^1_8(r, s, t) &= (1 - r)st.
\end{aligned}
\]
We can split hexahedral cells into six tetrahedra each and then use only tetrahedra as 3D cell types, which simplifies software implementation and maintenance. $T^{-1}_{hex}$ for general hexahedral cells cannot be computed analytically and must be determined using numerical methods. However, if our actual hex cells are parallelepipeds with orthogonal edges, they can be called box cells. In this case, $T^{-1}$ can be computed by taking dot products of the position vector $p - p_1$ with the cell edges. For a box cell with vertices $p_1, \ldots, p_8$, we obtain:
\[ T^{-1}_{box}(x, y, z) = (r, s, t) = \left( \frac{(p - p_1) \cdot (p_2 - p_1)}{\|p_2 - p_1\|^2},\ \frac{(p - p_1) \cdot (p_4 - p_1)}{\|p_4 - p_1\|^2},\ \frac{(p - p_1) \cdot (p_5 - p_1)}{\|p_5 - p_1\|^2} \right). \]
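The box-cell case lends itself to a compact sketch: compute $(r, s, t)$ by projecting $p - p_1$ onto the three edges leaving $p_1$, then evaluate the eight hexahedron basis functions there. The function names and the example box are assumptions chosen for illustration.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def box_reference_coords(p, p1, p2, p4, p5):
    """Inverse transform for a box cell; p2, p4, p5 are the neighbors of p1."""
    d = sub(p, p1)
    coords = []
    for q in (p2, p4, p5):
        e = sub(q, p1)                      # cell edge starting at p1
        coords.append(dot(d, e) / dot(e, e))
    return tuple(coords)


def hex_basis(r, s, t):
    """The eight linear basis functions of the reference hexahedron."""
    return [(1 - r) * (1 - s) * (1 - t), r * (1 - s) * (1 - t),
            r * s * (1 - t), (1 - r) * s * (1 - t),
            (1 - r) * (1 - s) * t, r * (1 - s) * t,
            r * s * t, (1 - r) * s * t]


def trilinear(values, r, s, t):
    return sum(f * phi for f, phi in zip(values, hex_basis(r, s, t)))


# Hypothetical axis-aligned box with edge lengths 2, 4, and 1.
p1, p2, p4, p5 = (1.0, 1.0, 1.0), (3.0, 1.0, 1.0), (1.0, 5.0, 1.0), (1.0, 1.0, 2.0)
rst = box_reference_coords((2.0, 2.0, 1.5), p1, p2, p4, p5)

# Vertex values equal to the r coordinate of each reference vertex.
r_values = [0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]
print(rst, trilinear(r_values, *rst))
```

Because the vertex values here equal the reference coordinate $r$ of each vertex, the interpolated value reproduces $r$ itself, which is a quick sanity check of the basis functions.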
Software packages sometimes offer more cell types, such as squares and pixels (essentially identical to rectangle cells), triangle strips (a memory-efficient way to store sequences of triangle cells that share edges), polygons in 2D, and cubes and voxels in 3D (which play the same role as squares and pixels in 2D). Some applications use quadratic cells, which support quadratic basis functions and provide a piecewise quadratic (smoother) reconstruction of the data, which is $C^2$ continuous. Besides their vertices, quadratic cells also contain the midpoints of their edges and, for 3D cells, the centers of their faces. They are often used in numerical simulation applications such as finite element methods.
In general, you should add new cell types to your application's data representation only if these allow you to implement some particular visualization or data processing algorithm much more easily and/or efficiently than the cell types your software already supports.
0.5 Grid Types
0.5.1 Uniform Grids
In a uniform grid, the domain $D$ is an axis-aligned box, e.g., a line segment for $d = 1$, a rectangle for $d = 2$, or a parallelepiped for $d = 3$. On a uniform grid, the sample points $p_i \in D \subset \mathbb{R}^d$ are equally spaced along the $d$ axes of the domain $D$. Hence, a sample point is described by its $d$ integer coordinates $n_1, \ldots, n_d$. These integer coordinates are sometimes called structured coordinates. A simple example of a uniform grid is a 2D pixel image, where every pixel $p_i$ is located by two integer coordinates. This regular point ordering allows us to define the grid cells implicitly by using the point indices.
The major advantages of uniform grids are their simple implementation and very low storage requirements. Regardless of its size, storing a d-dimensional grid itself takes $3d$ floating-point values, i.e., only $12d$ bytes of memory when each value takes 4 bytes. Storing the actual sample values at the grid points takes storage proportional to the number of sample points.
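To illustrate implicit cell definition, here is a minimal 2D uniform grid sketch in Python. The class name `UniformGrid2D`, the row-major point numbering, and the counterclockwise vertex order are assumptions; the chapter does not prescribe a layout. Note that the grid stores only an origin, a spacing, and a sample count per axis, i.e., three values per dimension.

```python
class UniformGrid2D:
    """Uniform grid storing only origin, spacing, and point counts per axis."""

    def __init__(self, origin, spacing, counts):
        self.origin = origin      # (m1, m2): position of the first point
        self.spacing = spacing    # (delta1, delta2): distance between points
        self.counts = counts      # (N1, N2): number of points per axis

    def point_index(self, n1, n2):
        """Structured coordinates -> flat point index (row-major, assumed)."""
        return n1 + n2 * self.counts[0]

    def point_position(self, n1, n2):
        """Structured coordinates -> world position, computed on the fly."""
        return (self.origin[0] + n1 * self.spacing[0],
                self.origin[1] + n2 * self.spacing[1])

    def cell_vertices(self, c1, c2):
        """Point indices of the quad cell whose lower-left point is (c1, c2)."""
        return [self.point_index(c1, c2), self.point_index(c1 + 1, c2),
                self.point_index(c1 + 1, c2 + 1), self.point_index(c1, c2 + 1)]


grid = UniformGrid2D(origin=(0.0, 0.0), spacing=(0.5, 0.5), counts=(4, 3))
print(grid.point_index(1, 2), grid.point_position(1, 2), grid.cell_vertices(0, 0))
```

No point coordinates or cell lists are stored; both are derived from the structured coordinates when needed.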
0.5.2 Rectilinear Grids
Uniform grids are simple and efficient, but have limited modeling power. To accurately represent a function with a non-uniform variation rate, we need either to use a high sampling density on a uniform grid, or to use a grid with non-uniform sample density. Rectilinear grids relax the constraint of equal sampling distances along a given axis, but keep the axis-aligned, matrix-like point ordering and implicit cell definition. These grids are similar to uniform ones, except that the distances $\delta_{i,j}$ between the sample points are no longer equal along the grid axes. Implementing a rectilinear grid implies storing the grid origin $m_i$ and the sample count $N_i$ for every dimension $i$, as for the uniform grid. Additionally, we must store the sample steps. In total, the storage requirements are $2d + \sum_{i=1}^{d} N_i$ values.
0.5.3 Structured Grids
In rectilinear grids the sampled domain is still a rectangular box, and the sample point density can be changed only one axis at a time. Rectilinear grids, for example, do not allow us to place more sample points only in the central peak region of an exponential function.
Structured grids allow explicit placement of every sample point $p_i = (x_{i1}, \ldots, x_{id})$. The user can freely specify the coordinates $x_{ij}$ of all points. At the same time, structured grids preserve the matrix-like ordering of the sample points. Implementing a structured grid implies storing the coordinates of all grid sample points $p_i$ and the number of points $N_1, \ldots, N_d$ per dimension. Structured grids can represent a large number of shapes.
0.5.4 Unstructured Grids
Structured grids can be seen as a deformation of uniform grids, where the topological ordering of the points (cells) stays the same, but their geometrical position is allowed to vary freely.
There are, however, shapes that cannot be effectively modeled by structured grids. Unstructured grids address this by allowing both the sample points and the cells to be defined explicitly. An unstructured grid can be modeled as a collection of sample points $p_i, i \in [0, N]$ and cells $c_i = (v_{i1}, \ldots, v_{iC_i})$. The values $v_{ij} \in [0, N]$ are called cell vertices and refer to the sample points $p_{v_{ij}}$ used by the cell. A cell is thus an ordered list of sample point indices. This model allows us to define every cell separately and independently of the other cells. Also, cells of different type and even dimensionality can be freely mixed in the same grid, if desired. If cells share the same sample points as their vertices, this can be directly expressed, which is useful in several contexts:
• Storing an index represented by an integer is usually cheaper than storing a d-dimensional coordinate (d floating-point numbers).
• We can process the grid geometry (the positions of the sample points $p_i$) independently of the grid topology, i.e., the cell definitions.
In practice, it is preferable to use unstructured grids containing a single cell type, as these are simpler to implement and can also lead to faster application code. The cost of storing an unstructured grid depends on the types of cells used and on the actual grid. For example, a grid of $C$ d-dimensional cells with $V$ vertices per cell and $N$ sample points would require $dN + CV$ values.
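A small Python sketch of this point-plus-cell model follows. The data and the helper names `storage_values` and `cells_using_point` are hypothetical; the sketch only illustrates that cells are index lists into a shared point array and that the storage cost matches $dN + CV$.

```python
# Geometry: N = 5 sample points in d = 2 dimensions (hypothetical data).
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (2.0, 0.5)]

# Topology: C = 3 triangle cells, each an ordered list of V = 3 point indices.
cells = [(0, 1, 2), (0, 2, 3), (1, 4, 2)]


def storage_values(points, cells):
    """Number of stored values: d*N coordinates plus one index per cell vertex."""
    d = len(points[0])
    return d * len(points) + sum(len(cell) for cell in cells)


def cells_using_point(cells, point_index):
    """Indices of all cells that have the given sample point as a vertex."""
    return [k for k, cell in enumerate(cells) if point_index in cell]


print(storage_values(points, cells))   # 2*5 + 3*3 values
print(cells_using_point(cells, 2))     # point 2 is shared by all three cells
```

Moving a point changes the geometry of every cell that references it without touching the cell lists, which is the geometry/topology separation described above.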
0.6 Attributes
In visualization, the set of sample values of a sampled dataset is usually called attribute data. Attribute data can be characterized by their dimension $c$, as well as by the semantics of the data they represent. This gives rise to several attribute types.
0.6.1 Scalar Attributes
Scalar attributes are $c = 1$ dimensional. They are represented by plain real numbers. They encode various physical quantities such as temperature, concentration, pressure, or density, or geometrical measures such as length or height (e.g., an elevation plot of a function $f : \mathbb{R}^2 \to \mathbb{R}$).
0.6.2 Vector Attributes
Vector attributes are usually $c = 2$ or $c = 3$ dimensional. They can encode position, direction, force, or gradients of scalar functions. Usually vectors have an orientation and a magnitude, also called length or norm.
0.6.3 Color Attributes
Color attributes are usually $c = 3$ dimensional and represent the displayable colors on a computer screen. The three components of a color attribute can have different meanings, depending on the color system in use. The most common one is the RGB system. RGB is an additive system, since every color is represented as a mix of pure red, green, and blue in different amounts. Equal amounts of these colors determine gray shades, whereas other combinations determine various hues.
Another popular color representation system is the HSV system, where the three color components specify the hue, saturation, and value of a given color. The advantage of the HSV system is that it is more intuitive for the human user. Hue distinguishes between colors of different wavelengths, such as red, yellow, and blue. Saturation represents the color purity. A saturation of 1 corresponds to the pure, undiluted color, whereas a saturation of 0 corresponds to white. Value represents the brightness, or luminance, of a given color. A value of 0 is always black, whereas a value of 1 is the brightest color of a given hue and saturation that can be represented on a given system. The value, or luminance, component of an HSV color is equal to the maximum of the R, G, and B components.
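The relation between the two color systems can be tried out with Python's standard `colorsys` module, which works on components in the range [0, 1]. The sample colors below are arbitrary choices for illustration.

```python
import colorsys

# An arbitrary RGB color with components in [0, 1].
red, green, blue = 0.2, 0.6, 0.4
hue, saturation, value = colorsys.rgb_to_hsv(red, green, blue)

# The value component equals the largest of the R, G, B components.
print(value, max(red, green, blue))

# Equal amounts of R, G, and B give a gray shade: saturation is zero.
gray = colorsys.rgb_to_hsv(0.5, 0.5, 0.5)
print(gray)

# Converting back recovers the original RGB components (up to rounding).
round_trip = colorsys.hsv_to_rgb(hue, saturation, value)
print(round_trip)
```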
0.6.4 Tensor Attributes
Tensor attributes are high-dimensional generalizations of vectors and matrices. We can compute the curvature of a planar curve using its second derivative $\frac{d^2 f}{dx^2}$, and the curvature of a 3D surface in a given direction using its Hessian matrix $H$ of partial derivatives. The Hessian matrix is also called the curvature tensor of the given surface.
Besides curvature, tensors can describe other physical quantities that depend on direction, such as water diffusivity or stress and strain in materials. Tensors are characterized by their rank. Scalars are tensors of rank 0. Vectors are tensors of rank 1. The Hessian curvature tensor is a rank 2 symmetric tensor, since it is expressed by a symmetric matrix.
0.6.5 Non-Numerical Attributes
Examples of possible non-numerical attribute types are text, images, file names, or even sound samples. The main requirement on $D_s$ is that it permits us to reconstruct some piecewise, k-order continuous function $\hat{f} : D \to C$, given the sample values $f_i \in C$. For non-numerical attributes this raises a question: what should be the meaning of the multiplication between sample values $f_i$ and real-valued basis functions $\Phi_i$, and of the addition of sample values $f_i$, in Equation (3.9)?
0.6.6 Properties of Attribute Data
The main purpose of attribute data is to allow a reconstruction $\hat{f}$ of the sampled information $f_i$. Attribute data has several general properties:
• Attribute data, i.e., the sample values $f_i$, must be defined for all sample points $p_i$ of a dataset $D_s$. If samples at some points $p_i$ are missing, there are several solutions: (1) remove these points completely from the grid; (2) replace the missing values $f_i$ with some special value (like 0); (3) define the missing values from existing values, using some more complex interpolation scheme.
• Cells can contain any number of attributes, of any type, as long as these are defined for all data points. We can choose whether to model our data as a single c-value dataset or as c one-value datasets. The answer is to treat all attributes that have a related meaning as a single higher-dimensional attribute, and to keep attributes with different meanings separate.
Operations on color attributes must consider all color components simultaneously, as the color components R, G, B have a related meaning.
Some data visualization applications classify attribute data into:
• node or vertex attributes, defined at the vertices of the grid cells, which correspond to a sampled dataset, and
• cell attributes, defined at the center points of the grid cells, which correspond to a sampled dataset that uses constant basis functions.
Vertex attributes can be converted to cell attributes and conversely by resampling.
The attribute components are sometimes related by some constraints. This happens for normal attributes $n \in \mathbb{R}^3$, where the three components are constrained to yield unit-length normals, i.e., $|n| = \sqrt{n_x^2 + n_y^2 + n_z^2} = 1$. Depending on the choice of the basis functions, interpolating these components separately as scalar values may not preserve the unit length of the interpolated normal $n$. A first solution is to interpolate the components separately and then enforce the desired constraint on the result by normalizing it, i.e., replacing $n$ with $n/|n|$ (this works when the sample values do not vary too strongly across a grid cell). A second solution is to represent the constraint directly in the data attributes, rather than enforcing it after interpolation. For normal attribute types, this means representing 3D normals as two independent orientations, e.g., using polar coordinates $\alpha, \beta$, instead of using the three $x, y, z$ components, which are dependent via the unit-length constraint. We can now interpolate the normal orientations $\alpha, \beta$ using the desired basis functions, and will always obtain the correct result.
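The first solution (interpolate, then normalize) can be sketched as follows. The function name `interpolate_normal` and the use of linear interpolation along a line cell are assumptions for illustration; the sketch does not handle the degenerate case of exactly opposite normals, where the interpolated vector has zero length.

```python
import math


def length(v):
    return math.sqrt(sum(c * c for c in v))


def interpolate_normal(n0, n1, t):
    """Linearly interpolate two unit normals, then re-normalize the result.

    Returns both the raw interpolated vector and the normalized one.
    Assumes n0 and n1 are not opposite (the raw vector must be nonzero).
    """
    raw = tuple((1 - t) * a + t * b for a, b in zip(n0, n1))
    scale = length(raw)
    return raw, tuple(c / scale for c in raw)


raw, unit = interpolate_normal((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), 0.5)
print(length(raw))    # shorter than 1: the constraint is violated
print(length(unit))   # restored to unit length by normalization
```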
0.7 Computing Derivatives of Sampled Data
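One of the requirements for a sampled dataset $D_s = (p_i, c_i, f_i, \Phi_i)$ is that it should be generic: we can easily replace the various data processing operations available for its continuous counterpart with equivalent operations on $D_s$.
Since $\hat{f} = \sum_{i=1}^{N} f_i \phi_i$, we have
\[ \frac{\partial \hat{f}}{\partial x_i} = \sum_{j=1}^{N} f_j \frac{\partial \phi_j}{\partial x_i}. \]
Using the expressions of the reference basis functions, this becomes
\[ \frac{\partial \hat{f}}{\partial x_i} = \sum_{j=1}^{N} f_j \frac{\partial \Phi_j}{\partial x_i}(r). \]
We now use the chain rule,
\[ \frac{\partial \Phi_j}{\partial x_i} = \sum_{k=1}^{d} \frac{\partial \Phi_j}{\partial r_k} \frac{\partial r_k}{\partial x_i}, \]
to obtain
\[ \frac{\partial \hat{f}}{\partial x_i} = \sum_{j=1}^{N} f_j \sum_{k=1}^{d} \frac{\partial \Phi_j}{\partial r_k} \frac{\partial r_k}{\partial x_i}. \]
Finally, we can rewrite the last equation in a convenient matrix form, as follows:
\[
\begin{pmatrix}
\frac{\partial \hat{f}}{\partial x_1} \\
\frac{\partial \hat{f}}{\partial x_2} \\
\vdots \\
\frac{\partial \hat{f}}{\partial x_d}
\end{pmatrix}
=
\sum_{j=1}^{N} f_j
\underbrace{
\begin{pmatrix}
\frac{\partial r_1}{\partial x_1} & \frac{\partial r_2}{\partial x_1} & \cdots & \frac{\partial r_d}{\partial x_1} \\
\frac{\partial r_1}{\partial x_2} & \frac{\partial r_2}{\partial x_2} & \cdots & \frac{\partial r_d}{\partial x_2} \\
\vdots & \vdots & & \vdots \\
\frac{\partial r_1}{\partial x_d} & \frac{\partial r_2}{\partial x_d} & \cdots & \frac{\partial r_d}{\partial x_d}
\end{pmatrix}
}_{\text{inverse Jacobian matrix } J^{-1}}
\begin{pmatrix}
\frac{\partial \Phi_j}{\partial r_1} \\
\frac{\partial \Phi_j}{\partial r_2} \\
\vdots \\
\frac{\partial \Phi_j}{\partial r_d}
\end{pmatrix}
\]
The matrix above is called the inverse Jacobian matrix $J^{-1} = (\partial r_i / \partial x_j)_{ij}$. This matrix is the inverse of the Jacobian matrix $J = (\partial x_i / \partial r_j)_{ij}$. Using $T^{-1}$, we can rewrite the inverse Jacobian as
\[ J^{-1} = \left( \frac{\partial T^{-1}_i(x_1, \ldots, x_d)}{\partial x_j} \right)_{ij}, \]
where $T^{-1}_i$ denotes the i-th component of the function $T^{-1}$. Putting it all together, we get the formula for computing the partial derivatives of a sampled dataset $\hat{f}$ with respect to all coordinates $x_i$:
\[ \left( \frac{\partial \hat{f}}{\partial x_i} \right)_i = \sum_{k=1}^{N} f_k \left( \frac{\partial T^{-1}_i}{\partial x_j} \right)_{ij} \left( \frac{\partial \Phi_k}{\partial r_i} \right)_i. \]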
To use this equation in practice, we need to evaluate the derivatives of both the reference basis functions $\Phi_k$ and $T^{-1}$ for every cell type. Alternatively, we can evaluate the Jacobian matrix instead of its inverse, using the reference-cell to world-cell coordinate transform $T$ instead of $T^{-1}$, then numerically invert $J$, and finally apply Equation (3.33). For all cells described in Section 3.4, the coordinate transformations $T^{-1}$ are linear functions of the arguments $x_i$, so their derivatives are constant. Hence, the derivatives of $\hat{f}$ are of the same order as those of the basis functions $\Phi_k$ we choose to use.
Partial derivatives of ˆf inside a given cell are computed by linearly interpolating the 1D derivatives of ˆf
along opposite cell edges. A similar result can be obtained for rectilinear grids as well as for hexahedral cells.
If a dataset is noisy, the computed derivatives tend to exhibit even stronger noise than the original data. A
simple method to limit these problems is to pre-filter the input dataset in order to eliminate high frequency
noise, using methods such as the Laplacian smoothing described in Section 8.4. However, smoothing can
also eliminate important information from the dataset together with the noise.
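As a worked instance of the Jacobian-based formula, the sketch below computes the gradient of the linear reconstruction inside a single 2D triangle cell. It follows the alternative route described above: build $J$ from the forward transform $T(r, s) = p_1 + r(p_2 - p_1) + s(p_3 - p_1)$, invert it, and apply the chain rule. The function name `triangle_gradient` and the test function are assumptions for illustration.

```python
def triangle_gradient(p1, p2, p3, f1, f2, f3):
    """Gradient (df/dx, df/dy) of the linear reconstruction in a 2D triangle."""
    # Jacobian J = (dx_i / dr_j) of the forward transform T.
    j11, j12 = p2[0] - p1[0], p3[0] - p1[0]
    j21, j22 = p2[1] - p1[1], p3[1] - p1[1]
    det = j11 * j22 - j12 * j21
    if det == 0:
        raise ValueError("degenerate triangle: Jacobian is not invertible")

    # Entries of the inverse Jacobian (dr_i / dx_j), constant over the cell.
    dr_dx, dr_dy = j22 / det, -j12 / det
    ds_dx, ds_dy = -j21 / det, j11 / det

    # Derivatives in reference coordinates, from the basis functions
    # 1 - r - s, r, s: df/dr = f2 - f1 and df/ds = f3 - f1.
    df_dr, df_ds = f2 - f1, f3 - f1

    # Chain rule: combine reference derivatives with the inverse Jacobian.
    return (df_dr * dr_dx + df_ds * ds_dx,
            df_dr * dr_dy + df_ds * ds_dy)


def f(x, y):
    # Hypothetical linear test function with known gradient (2, 3).
    return 2.0 * x + 3.0 * y + 1.0


a, b, c = (0.0, 0.0), (2.0, 0.0), (1.0, 3.0)
gradient = triangle_gradient(a, b, c, f(*a), f(*b), f(*c))
print(gradient)
```

Since the test function is itself linear, the reconstruction is exact and the computed gradient should match the analytic one up to rounding.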
0.8 Implementation
0.8.1 Grid Implementation
0.9 Advanced Data Representation
Sometimes more advanced forms of data manipulation and representation are needed. We will describe the
task of data resampling, which is used in the process of converting information between different types of
datasets that have different sample points, cells or basis functions.
0.9.1 Data Resampling
Let us consider piecewise constant normals, i.e., the polygon normals themselves. These are discontinuous at the polygon vertices and, in fact, along the complete polygon edges, so we cannot use them directly as approximations of the vertex normals. How can we compute vertex normal values from the known polygon normals? The answer is provided by an operation called resampling.
Resampling computes the values $f'_i$ of the target dataset as a function of the values $f_i$ of the source dataset. For simplicity, we assume that both datasets use the same set of basis functions $\Phi_i$.
Let us now consider a common resampling operation in data visualization: converting cell attributes ($f_i$) to vertex attributes ($f'_i$). Cell attributes imply the use of constant basis functions $\Phi_i$. Vertex attributes, in contrast, imply the use of higher-order basis functions, such as linear ones. In addition, we want the sample points of the target grid cells (the target grid vertices) to be identical to the source vertices, so that the two grids match.
Vertex data is the area-weighted average of the cell data in the cells that use a given vertex. Conversely, cell attributes are the average of the cell's vertex attributes.
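Both conversions can be sketched for a 2D triangle grid as follows. The function names and the two-triangle example are illustrative assumptions; points that no cell uses are given the value `None` in this sketch.

```python
def triangle_area(a, b, c):
    return 0.5 * abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))


def cells_to_vertices(points, cells, cell_values):
    """Vertex value = area-weighted average of the cells using that vertex."""
    weighted = [0.0] * len(points)
    total_area = [0.0] * len(points)
    for cell, value in zip(cells, cell_values):
        area = triangle_area(*(points[i] for i in cell))
        for i in cell:
            weighted[i] += area * value
            total_area[i] += area
    return [w / a if a > 0 else None for w, a in zip(weighted, total_area)]


def vertices_to_cells(cells, vertex_values):
    """Cell value = plain average of the values at the cell's vertices."""
    return [sum(vertex_values[i] for i in cell) / len(cell) for cell in cells]


# Hypothetical unit square split into two triangles along a diagonal.
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
cells = [(0, 1, 2), (0, 2, 3)]

vertex_values = cells_to_vertices(points, cells, [10.0, 20.0])
cell_values = vertices_to_cells(cells, vertex_values)
print(vertex_values, cell_values)
```

Converting to vertices and back does not recover the original cell values, which illustrates that resampling generally changes the data.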
Resampling data from cells to vertices increases the assumed continuity. If our original sampled data
were indeed continuous of that order, no problem appears. However if the original data contained, e.g.,
zero-order discontinuities, such as jumps or holes, resampling it to a higher-continuity grid also throws
away discontinuities which might have been a feature of the data and not a sampling artifact. In contrast,
resampling from a higher continuity (vertex data) to a lower continuity (cell data) has fewer side effects:
overall, the smoothness of the data decreases globally.
Two other frequently used resampling operations are subsampling and supersampling. Subsampling
reduces the number of sample points, keeping a subset of the original dataset points, which speeds up
processing and lowers memory demands. After eliminating some points, subsampling
operations can also redistribute the remaining points in order to obtain a better approximation of the
original data. Subsampling implementations can take advantage of dataset topology. A desirable property
of subsampling is to keep most samples in the regions of rapid data variation and cull most samples from
the regions of slow data variation. A technique called uniform subsampling is simple and effective when
the original dataset is densely sampled. It is used on uniform, rectilinear, and structured grids to keep every
k-th point along every dimension and discard the remaining ones.
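Uniform subsampling of a 2D grid can be sketched with plain list slicing. The representation of the grid as a list of rows and the name `uniform_subsample` are assumptions for illustration only.

```python
def uniform_subsample(grid, k):
    """Keep every k-th sample along both axes of a 2D grid (list of rows)."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return [row[::k] for row in grid[::k]]


# 5 x 5 grid whose value at (row i, column j) is 10*i + j.
grid = [[10 * i + j for j in range(5)] for i in range(5)]
coarse = uniform_subsample(grid, 2)  # keeps rows and columns 0, 2, 4
```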
Supersampling or refinement is the inverse of subsampling: more data points are created from an
existing dataset. It is useful in situations when we try to create or manipulate information on a dataset at
a level of detail, or scale, that is below the one captured by the sampling frequency. Uniform supersampling
introduces k points into every cell of the original dataset. An efficient supersampling implementation usually
inserts extra points only in those regions where we need to further add extra information.
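A minimal sketch of uniform supersampling on a 1D grid follows. It assumes linear basis functions, so the k new points in each cell take linearly interpolated values; the function name `uniform_supersample` is hypothetical.

```python
def uniform_supersample(values, k):
    """Insert k linearly interpolated samples into every cell of a 1D grid."""
    if not values:
        return []
    refined = []
    for left, right in zip(values, values[1:]):
        refined.append(left)
        for j in range(1, k + 1):
            t = j / (k + 1)  # parametric position of the new point in the cell
            refined.append((1 - t) * left + t * right)
    refined.append(values[-1])
    return refined


fine = uniform_supersample([0.0, 4.0, 2.0], 1)  # one new point per cell
```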
0.9.2 Scattered Point Interpolation
There are situations when we would like to avoid constructing and storing a grid of cells to represent
the data domain. A 3D scanner, for example, delivers a scattered 3D point set, also called a point cloud:
points and their corresponding data values (pi, fi). For a scanner, the data values fi are the surface
normals and/or colors measured by the device.
How do we reconstruct a continuous surface from such a set of points and normals?
One way is to construct a grid from the scattered points (triangulation): an unstructured grid with 2D
cells, e.g., triangles, which have the pi as vertices and approximate the surface as closely as possible.
A second way is gridless interpolation, which avoids storing cell information; storing it can double the
amount of memory required in the worst case. To reconstruct a continuous function from a scattered point
set, we need a set of gridless basis functions. There are several ways to construct such functions; a
frequently used choice is radial basis functions, or RBFs. These functions depend only on the distance
between the current point and the origin:
\[ r = |x| = \sqrt{\sum_{i=1}^{d} x_i^2}. \]
RBFs smoothly drop from 1 at their origin (r = 0) to a vanishing value for large values of the distance
r. To limit the effect of a basis function to its immediate neighborhood, we specify a radius of influence R,
or support radius, beyond which Φ is equal to zero. In this setup a common RBF is the Gaussian function.
\[
\Phi(x) =
\begin{cases}
e^{-k r^2}, & r < R, \\
0, & r \ge R,
\end{cases}
\qquad \text{where } r = |x|.
\]
The parameter k ≥ 0 controls the decay speed, or the shape, of the radial basis functions. Setting k = 0
yields constant, cylinder-shaped radial functions, which are equivalent to the constant basis functions for
grid-based datasets. Another popular choice is the inverse distance function, defined as
\[
\Phi(x) =
\begin{cases}
\dfrac{1}{1 + r^2}, & r < R, \\
0, & r \ge R,
\end{cases}
\qquad \text{where } r = |x|.
\]
The radius values Ri control the influence of the sample data value of a point pi. Higher values of Ri yield
smoother reconstructions at higher computational cost; lower values of Ri yield less smooth reconstructions
but higher performance. In practice, setting Ri to the average inter-point distance in the neighborhood of
point pi gives a good balance between smoothness and efficiency.
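The two radial basis functions described above can be written down directly. The function and parameter names below are chosen for this sketch only.

```python
import math


def radius(x):
    """Euclidean distance of point x from the origin."""
    return math.sqrt(sum(c * c for c in x))


def gaussian_rbf(r, k, R):
    """Gaussian RBF: exp(-k r^2) inside the support radius R, zero outside."""
    return math.exp(-k * r * r) if r < R else 0.0


def inverse_distance_rbf(r, R):
    """Inverse distance RBF: 1 / (1 + r^2) inside the support radius R."""
    return 1.0 / (1.0 + r * r) if r < R else 0.0
```

With k = 0 the Gaussian variant returns 1 everywhere inside the support radius, which is the constant, cylinder-shaped case mentioned in the text.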
Given a point p, we shall sum only those basis functions φk that are nonzero at p. In case of radial
basis functions, we must find the k nearest sample points p1, . . . , pk to p so that |p − pk| < Rk. One way
to accomplish this is to store all sample points pi in a spatial search structure such as a kd-tree. Spatial
search structures provide efficient retrieval of the k nearest neighbors at any given location. A good, scalable
implementation of such a search structure is provided by the Approximate Nearest Neighbor (ANN) library.
Scattered point data sets sometimes are called unstructured point datasets, however, if the function
of a dataset is to provide a piecewise continuous reconstruction of its data samples, we need to specify
also a choice for the basis functions Φi to have a complete dataset (pi, fi, Φi). To effectively perform the
reconstruction, searching methods are needed that return the sample points pi located in the neighborhood
of a given point p.
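A minimal sketch of such a reconstruction is given here under two stated assumptions. First, a brute-force loop over all samples stands in for the spatial search structure; a kd-tree would return the same neighbors faster. Second, the weights are normalized so that they sum to one, which is a choice of this sketch rather than something stated in the text. The name `reconstruct` is hypothetical.

```python
import math


def gaussian_rbf(r, k, R):
    """Gaussian RBF with support radius R."""
    return math.exp(-k * r * r) if r < R else 0.0


def reconstruct(p, points, values, radii, k=1.0):
    """Estimate f(p) from scattered samples (p_i, f_i) with support radii R_i.

    Only samples whose basis function is nonzero at p contribute.
    Returns None when p lies outside the support of every sample.
    """
    num = 0.0
    den = 0.0
    for p_i, f_i, R_i in zip(points, values, radii):
        w = gaussian_rbf(math.dist(p, p_i), k, R_i)
        num += w * f_i
        den += w
    return num / den if den > 0.0 else None


points = [(0.0, 0.0), (2.0, 0.0)]
values = [1.0, 3.0]
radii = [1.5, 1.5]
midpoint_value = reconstruct((1.0, 0.0), points, values, radii)
```

At the midpoint both samples are equally far away, so they contribute equally; at a sample location that lies outside the other sample's support radius, the sample value itself is returned.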
What have you learned in this chapter?
This chapter discusses discrete data representation, continuous data sampling, and reconstruction.
It outlines the fundamental differences between continuous (sampled) and discrete data. It
introduces basis functions, discrete meshes, and cells as means of constructing piecewise continuous
approximations from sampled data. I learned about various types of datasets commonly used in
visualization practice: their advantages, limitations, and constraints. The chapter gives an understanding of
the various trade-offs involved in the choice of a dataset for a given visualization application, while focusing
on the efficiency of implementing the most commonly used datasets, presented with cell types in
d ∈ [0, 3] dimensions.
What surprised you the most? I was surprised to find out that there are several representations of
colors, and mappings between the RGB and HSV spaces.
I was surprised to find out that gridless interpolation exists and how it works. Also, that reconstruction
of scattered/unstructured point datasets requires using searching methods to locate the nearest sample points
in the neighborhood of a given point.
I was surprised that datasets with attributes such as text, images, or relations form the target of
information visualization applications, since they are purely discrete and often not defined on a spatial domain.
What applications not mentioned in the book could you imagine for the techniques explained
in this chapter? I can only imagine a dataset that stores high-dimensional attributes in order
to allow just enough continuity to perform various types of resampling between target and source grids of
a certain type. Selecting a set of useful grids and proper resampling might improve the original visualization
model so that it focuses more on the nature of the signal and depends less on the structure or representation
of its sampled data.
1. EXERCISE 1
Consider the following datasets:
• The evolution in time of the prices of N different stock-exchange shares, recorded at one-second intervals
over the period of one hour.
• The paths covered by all cars driving through a given city, recorded at one-minute intervals over the
period of one hour. For each record, we store the car ID, the car’s position, and the car’s speed.
• The amount of rainfall and the air temperature, recorded at a given time instant at N
given weather stations over some geographical area.
Describe the kind of grid, grid cells, and data attributes that you would use to store such a dataset.
Argue your proposal by considering the kind of data to store, and the locations at which data is recorded
(sampled).
• grid: uniform 1D grid with one-second intervals; grid cells: line segments of length one second; data
attributes: the price of each of the N shares (3600 samples per hour times N shares = 3600N values to store).
• grid: a data structure with spatial search that uses the average inter-point distance between points;
cells: gridless radial basis functions with compact support; data attributes: car ID, car’s position, car’s
speed, basis functions.
• grid: rectilinear structured grid with specified sampling locations; cells: quads; data attributes:
amount of rainfall, temperature, location.
2. EXERCISE 2
Sampling and reconstruction are closely related operations which reduce a function y = f(x) to a finite
set of sample points (xi, yi) and, respectively, reconstruct an approximation $\hat{y} = \hat{f}(x)$ of f(x) from the
sample points. Consider an application where you have to perform the above reconstruction $\hat{f}(x)$, but you
are only allowed to use a fixed finite number N of sample points xi. How would you place these sample
points over the domain of definition of x so that the reconstruction error $\|\hat{f} - f\|$ is equally well minimized
over the entire range of x?
Hints: first, consider the kinds of basis functions you want to use (e.g., constant or linear). Next, consider
how you can minimize the reconstruction error by shifting the points xi around the x axis.
• In the case of constant basis functions, we can use a uniform sampling density, with N points placed at
equal distances from each other.
• In the case of linear basis functions, we can use a non-uniform sampling density, in order to assign more
sample points to those areas of the domain where the function’s higher-order derivatives change fast.
3. EXERCISE 3
In Figure 3.10 in Chapter 3 (also displayed below), it is shown that we can use structured grids to cover
a 2D disk shape. Now, consider an arbitrary convex 2D shape of genus 0 (that is, without holes). The
2D shape is specified by means of its contour, which is given as a closed 2D polyline of N points.
• Can we always construct a structured grid so that all points of this polyline will be also points on the
grid’s boundary? If not, sketch a simple counter-example.
• Can we always construct a structured grid with the conditions listed in the point above and the
additional condition that no grid-boundary point exists which is not a polyline point? If not, sketch a
simple counter-example.
Hints: Think about the number of points on the boundary of a structured grid.
• Yes, we can always construct a structured grid of N points and N − 2 triangular cells. Since all internal
angles of the shape are less than 180 degrees, it is always possible to take one vertex and connect
it to the remaining vertices, which yields N − 2 triangles that form a triangular structured grid.
• Yes, for a convex 2D polyline it is always possible to use triangles as described above, so that all
polyline points are also grid-boundary points.
4. EXERCISE 4
As shown in Figure 3.11 in Chapter 3 (also shown below), not all 2D shapes can be covered by structured
grids. Consider now a 3D (curved) surface of a half sphere. Can we cover this surface with a structured
grid? Argue your answer.
Yes, we can cover such a half sphere with a structured grid consisting of tetrahedral cells. The shape
consists of only one component, and I assume that the genus of the domain equals 0.
5. EXERCISE 5
Consider the 2D cells in the figure below. For each cell, scalar data values vi are indicated at its sample
points (vertices). Additionally, a separate point p inside the cell is indicated. If bilinear interpolation is
used, compute the interpolated value v(p) of the vertex data values vi at the point p. Detail your answer by
explaining how you computed the interpolated value.
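• For rectangular quad:
\[
T^{-1}_{\mathrm{rect}} = (r, s) =
\left(
\frac{(p - p_1) \cdot (p_2 - p_1)}{\|p_2 - p_1\|^2},\;
\frac{(p - p_1) \cdot (p_4 - p_1)}{\|p_4 - p_1\|^2}
\right),
\]
where:
\[
\begin{aligned}
p - p_1 &= (4 - x_1,\, 3 - y_1) = (3, 1), \\
p_2 - p_1 &= (x_2 - x_1,\, y_2 - y_1) = (4, 0), \\
p_4 - p_1 &= (x_4 - x_1,\, y_4 - y_1) = (0, 3), \\
\|p_2 - p_1\|^2 &= (x_2 - x_1)^2 = 4^2 = 16, \\
\|p_4 - p_1\|^2 &= (y_4 - y_1)^2 = 3^2 = 9.
\end{aligned}
\]
\[
T^{-1}_{\mathrm{rect}} = (r, s) =
\left( \frac{(3, 1) \cdot (4, 0)}{16},\; \frac{(3, 1) \cdot (0, 3)}{9} \right)
= \left( \frac{12}{16},\; \frac{3}{9} \right)
= \left( \frac{3}{4},\; \frac{1}{3} \right).
\]
Calculating the 4 basis functions as follows:
\[
\begin{aligned}
\Phi^1_1(r, s) &= (1 - r)(1 - s) = (1 - 3/4)(1 - 1/3) = 1/6, \\
\Phi^1_2(r, s) &= r(1 - s) = (3/4)(1 - 1/3) = 3/4 \times 2/3 = 1/2, \\
\Phi^1_3(r, s) &= rs = 3/4 \times 1/3 = 3/12 = 1/4, \\
\Phi^1_4(r, s) &= (1 - r)s = (1 - 3/4)(1/3) = 1/4 \times 1/3 = 1/12.
\end{aligned}
\]
Finally, we calculate the value:
\[
v(p) = \sum_{i=1}^{4} v_i \Phi^1_i
= 3 \cdot \tfrac{1}{6} + 1 \cdot \tfrac{1}{2} + 4 \cdot \tfrac{1}{4} + 0 \cdot \tfrac{1}{12}
= \tfrac{1}{2} + \tfrac{1}{2} + 1 + 0 = 2.
\]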
Answer: v(p) = 2.
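The bilinear computation above can be checked with exact rational arithmetic. In this sketch the corner positions p1 = (1, 2), p2 = (5, 2), p4 = (1, 5) are inferred from the difference vectors listed above, and the function name `bilinear` is hypothetical.

```python
from fractions import Fraction


def dot(a, b):
    """Dot product of two 2D vectors."""
    return a[0] * b[0] + a[1] * b[1]


def bilinear(p, p1, p2, p4, v):
    """Bilinear interpolation in an axis-aligned rectangle.

    p1, p2, p4 are the corners spanning the r and s axes;
    v = (v1, v2, v3, v4) are the vertex values.
    """
    d = (p[0] - p1[0], p[1] - p1[1])
    e_r = (p2[0] - p1[0], p2[1] - p1[1])
    e_s = (p4[0] - p1[0], p4[1] - p1[1])
    r = Fraction(dot(d, e_r), dot(e_r, e_r))
    s = Fraction(dot(d, e_s), dot(e_s, e_s))
    phi = ((1 - r) * (1 - s), r * (1 - s), r * s, (1 - r) * s)
    return sum(v_i * phi_i for v_i, phi_i in zip(v, phi))


value = bilinear((4, 3), (1, 2), (5, 2), (1, 5), (3, 1, 4, 0))  # equals 2
```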
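• For triangular cell:
\[
T^{-1}_{\mathrm{tri}} = (r, s) =
\left(
\frac{(p - p_1) \times (p_3 - p_1)}{(p_2 - p_1) \times (p_3 - p_1)},\;
\frac{(p - p_1) \times (p_2 - p_1)}{(p_3 - p_1) \times (p_2 - p_1)}
\right),
\]
where $a \times b = a_x b_y - a_y b_x$ is the 2D cross product and:
\[
\begin{aligned}
p - p_1 &= (3 - x_1,\, 3 - y_1) = (2, 1), \\
p_2 - p_1 &= (x_2 - x_1,\, y_2 - y_1) = (4, 1), \\
p_3 - p_1 &= (x_3 - x_1,\, y_3 - y_1) = (0, 3).
\end{aligned}
\]
\[
T^{-1}_{\mathrm{tri}} = (r, s) =
\left( \frac{(2, 1) \times (0, 3)}{(4, 1) \times (0, 3)},\; \frac{(2, 1) \times (4, 1)}{(0, 3) \times (4, 1)} \right)
= \left( \frac{6}{12},\; \frac{-2}{-12} \right)
= \left( \frac{1}{2},\; \frac{1}{6} \right).
\]
\[
\begin{aligned}
\Phi^1_1(r, s) &= 1 - r - s = 1 - 1/2 - 1/6 = 1/3, \\
\Phi^1_2(r, s) &= r = 1/2, \\
\Phi^1_3(r, s) &= s = 1/6.
\end{aligned}
\]
\[
\sum_{i=1}^{3} v_i \Phi^1_i = 3 \cdot \tfrac{1}{3} + 1 \cdot \tfrac{1}{2} + 4 \cdot \tfrac{1}{6}
= 1 + \tfrac{1}{2} + \tfrac{2}{3} = \tfrac{13}{6}.
\]
Answer: v(p) = 13/6 ≈ 2.17.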
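The triangle formula can be evaluated exactly in a few lines, which makes the reference coordinates and the interpolated value easy to check independently. This sketch assumes that × denotes the 2D cross product a_x b_y − a_y b_x and that the vertex positions are p1 = (1, 2), p2 = (5, 3), p3 = (1, 5), as implied by the difference vectors listed above; the function names are hypothetical.

```python
from fractions import Fraction


def cross(a, b):
    """2D cross product (z component of the 3D cross product)."""
    return a[0] * b[1] - a[1] * b[0]


def triangle_interpolate(p, p1, p2, p3, v):
    """Linear interpolation in a triangle; returns (r, s, interpolated value)."""
    d = (p[0] - p1[0], p[1] - p1[1])
    e2 = (p2[0] - p1[0], p2[1] - p1[1])
    e3 = (p3[0] - p1[0], p3[1] - p1[1])
    r = Fraction(cross(d, e3), cross(e2, e3))
    s = Fraction(cross(d, e2), cross(e3, e2))
    phi = (1 - r - s, r, s)
    return r, s, sum(v_i * phi_i for v_i, phi_i in zip(v, phi))


r, s, value = triangle_interpolate((3, 3), (1, 2), (5, 3), (1, 5), (3, 1, 4))
```

Because p lies inside the cell, all three weights 1 − r − s, r, and s should come out between 0 and 1.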
6. EXERCISE 6
Consider the 2D cells in the figures below. For each cell, vector data values vi are indicated at its sample
points (vertices). Additionally, a separate point p inside the cell is indicated. If bilinear interpolation is
used, compute the interpolated value v(p) of the vertex data values vi at the point p. Detail your answer by
explaining how you computed the interpolated value.
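• For rectangular quad:
\[
T^{-1}_{\mathrm{rect}} = (r, s) =
\left(
\frac{(p - p_1) \cdot (p_2 - p_1)}{\|p_2 - p_1\|^2},\;
\frac{(p - p_1) \cdot (p_4 - p_1)}{\|p_4 - p_1\|^2}
\right),
\]
where:
\[
\begin{aligned}
p - p_1 &= (4 - x_1,\, 3 - y_1) = (3, 1), \\
p_2 - p_1 &= (x_2 - x_1,\, y_2 - y_1) = (4, 0), \\
p_4 - p_1 &= (x_4 - x_1,\, y_4 - y_1) = (0, 3), \\
\|p_2 - p_1\|^2 &= (x_2 - x_1)^2 = 4^2 = 16, \\
\|p_4 - p_1\|^2 &= (y_4 - y_1)^2 = 3^2 = 9.
\end{aligned}
\]
\[
T^{-1}_{\mathrm{rect}} = (r, s) =
\left( \frac{(3, 1) \cdot (4, 0)}{16},\; \frac{(3, 1) \cdot (0, 3)}{9} \right)
= \left( \frac{12}{16},\; \frac{3}{9} \right)
= \left( \frac{3}{4},\; \frac{1}{3} \right).
\]
\[
\begin{aligned}
\Phi^1_1(r, s) &= (1 - r)(1 - s) = (1 - 3/4)(1 - 1/3) = 1/6, \\
\Phi^1_2(r, s) &= r(1 - s) = (3/4)(1 - 1/3) = 3/4 \times 2/3 = 1/2, \\
\Phi^1_3(r, s) &= rs = 3/4 \times 1/3 = 3/12 = 1/4, \\
\Phi^1_4(r, s) &= (1 - r)s = (1 - 3/4)(1/3) = 1/4 \times 1/3 = 1/12.
\end{aligned}
\]
\[
\begin{aligned}
\sum_{i=1}^{4} v_i \Phi^1_i
&= (1, 0) \cdot \tfrac{1}{6} + (0, 1) \cdot \tfrac{1}{2} + (1, 1) \cdot \tfrac{1}{4} + (2, 1) \cdot \tfrac{1}{12} \\
&= (\tfrac{1}{6}, 0) + (0, \tfrac{1}{2}) + (\tfrac{1}{4}, \tfrac{1}{4}) + (\tfrac{1}{6}, \tfrac{1}{12})
= (7/12,\; 5/6).
\end{aligned}
\]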
Answer: v(p) ≈ (0.58, 0.83).
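• For triangular cell:
\[
T^{-1}_{\mathrm{tri}} = (r, s) =
\left(
\frac{(p - p_1) \times (p_3 - p_1)}{(p_2 - p_1) \times (p_3 - p_1)},\;
\frac{(p - p_1) \times (p_2 - p_1)}{(p_3 - p_1) \times (p_2 - p_1)}
\right),
\]
where $a \times b = a_x b_y - a_y b_x$ is the 2D cross product and:
\[
\begin{aligned}
p - p_1 &= (3 - x_1,\, 3 - y_1) = (2, 1), \\
p_2 - p_1 &= (x_2 - x_1,\, y_2 - y_1) = (4, 1), \\
p_3 - p_1 &= (x_3 - x_1,\, y_3 - y_1) = (0, 3).
\end{aligned}
\]
\[
T^{-1}_{\mathrm{tri}} = (r, s) =
\left( \frac{(2, 1) \times (0, 3)}{(4, 1) \times (0, 3)},\; \frac{(2, 1) \times (4, 1)}{(0, 3) \times (4, 1)} \right)
= \left( \frac{6}{12},\; \frac{-2}{-12} \right)
= \left( \frac{1}{2},\; \frac{1}{6} \right).
\]
\[
\begin{aligned}
\Phi^1_1(r, s) &= 1 - r - s = 1 - 1/2 - 1/6 = 1/3, \\
\Phi^1_2(r, s) &= r = 1/2, \\
\Phi^1_3(r, s) &= s = 1/6.
\end{aligned}
\]
\[
\begin{aligned}
\sum_{i=1}^{3} v_i \Phi^1_i
&= (0, -1) \cdot \tfrac{1}{3} + (1, 0) \cdot \tfrac{1}{2} + (1, 1) \cdot \tfrac{1}{6} \\
&= (0, -\tfrac{1}{3}) + (\tfrac{1}{2}, 0) + (\tfrac{1}{6}, \tfrac{1}{6})
= (2/3,\; -1/6).
\end{aligned}
\]
Answer: v(p) ≈ (0.67, −0.17).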
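For vector data, the same basis-function weights are applied to each component separately. The sketch below evaluates both cells of this exercise with exact fractions, using the vertex vectors and difference vectors listed above; the 2D cross product convention a_x b_y − a_y b_x and the helper names are assumptions of this example.

```python
from fractions import Fraction


def cross(a, b):
    """2D cross product (z component of the 3D cross product)."""
    return a[0] * b[1] - a[1] * b[0]


def weighted_sum(vectors, weights):
    """Component-wise sum of w_i * v_i for 2D vectors."""
    return tuple(sum(w * v[c] for v, w in zip(vectors, weights)) for c in range(2))


# Rectangular quad: reference coordinates from the projection formulas.
r, s = Fraction(12, 16), Fraction(3, 9)
quad_phi = ((1 - r) * (1 - s), r * (1 - s), r * s, (1 - r) * s)
quad_v = weighted_sum([(1, 0), (0, 1), (1, 1), (2, 1)], quad_phi)

# Triangle: reference coordinates from ratios of 2D cross products.
d, e2, e3 = (2, 1), (4, 1), (0, 3)  # p - p1, p2 - p1, p3 - p1
r = Fraction(cross(d, e3), cross(e2, e3))
s = Fraction(cross(d, e2), cross(e3, e2))
tri_phi = (1 - r - s, r, s)
tri_v = weighted_sum([(0, -1), (1, 0), (1, 1)], tri_phi)
```

In both cells the weights sum to one, so each result is a weighted average of the vertex vectors.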
7. EXERCISE 7
Color selection, by end users, is typically done by various widgets which represent the space of available
colors, such as the color wheel, color hexagon, or three separate color sliders for the R, G, and B (or
alternatively H, S, and V ) color components. Assume, now, that we want to select only colors present in a
given subset of the entire color space. Concretely, we have a large set of color photographs, and we next want
to select only colors predominantly present in these photographs, rather than any possible color. Sketch and
argue for a color-selection widget that would optimally help users to select only these specific colors. Hints:
Think how to modify any of the existing color-selection widgets to ‘focus’ on a specific color range where
many samples exist.
We can specify the subset of colors we are interested in by recording the R, G, B values of each color in
our sample. This gives a scattered field of dots inside the color cube. We can then
supersample by adding even more dots in the neighbourhood of each specified color. Finally, we can
interpolate a 3D surface along each axis and project the result onto the color cube facets, or onto the RGB hexagon.
Alternatively, we can modify the HSV color wheel by cutting out the segments that correspond to the lowest
density of our color sample, so that the remaining colors represent the majority of our color samples.
8. EXERCISE 8
Consider a grid where we have color data values recorded at its cell vertices. We would like to use linear
interpolation to compute colors at all points inside the grid cells. We can do this by interpolating colors
represented as RGB triplets or, alternatively, colors represented as HSV triplets. Discuss the advantages
and disadvantages of both schemes. Can you imagine a situation where the RGB interpolation would be
arguably preferable to HSV interpolation? Can you imagine a situation when the converse (HSV interpolation
is preferable to RGB interpolation) is true? Describe such situations or alternatively argue for the fact that
they do not exist.
HSV interpolation gives better results, since hue is separated from luminance and saturation. In the RGB color
scheme we need to interpolate along all 3 components.
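The two schemes can be compared on one concrete pair of colors using Python's standard `colorsys` module. This is only an illustration for a single pair (pure red and pure blue), and the hue is interpolated as a plain number without wrap-around, which is an assumption of this sketch.

```python
import colorsys


def lerp(a, b, t):
    """Component-wise linear interpolation between two triplets."""
    return tuple((1 - t) * x + t * y for x, y in zip(a, b))


def lerp_rgb(c0, c1, t):
    """Interpolate two RGB colors directly in RGB space."""
    return lerp(c0, c1, t)


def lerp_hsv(c0, c1, t):
    """Interpolate two RGB colors in HSV space and convert back to RGB."""
    h, s, v = lerp(colorsys.rgb_to_hsv(*c0), colorsys.rgb_to_hsv(*c1), t)
    return colorsys.hsv_to_rgb(h, s, v)


red, blue = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
mid_rgb = lerp_rgb(red, blue, 0.5)  # a darker purple
mid_hsv = lerp_hsv(red, blue, 0.5)  # a fully saturated green
```

For this pair, the RGB midpoint is darker than both endpoints, while the HSV midpoint keeps full saturation and value but passes through a hue that neither endpoint contains.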
9. EXERCISE 9
Consider a grid cell, such as a 1D line, 2D triangle or quad, or 3D parallelepiped or cube, and some
scalar values vi recorded at the cell vertices. Consider that we are using linear interpolation to reconstruct
the sampled scalar signal v(x) at any point x inside the cell. Does a cell shape exist, and a point x in that
cell, so that v(x) is larger than the maximum of vi over all cell vertices? Does a cell shape exist, and a point
x in that cell, so that v(x) is smaller than the minimum of vi over all cell vertices? Argue your answers.
No
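One way to support this answer: inside a cell the linear basis functions are non-negative and sum to one, so v(x) is a weighted average of the vi and cannot leave the range between their minimum and maximum. A small numerical check for the bilinear quad case is sketched below; the helper names and the sampling approach are assumptions of this example.

```python
import random


def bilinear_weights(r, s):
    """The four bilinear basis functions at reference coordinates (r, s)."""
    return ((1 - r) * (1 - s), r * (1 - s), r * s, (1 - r) * s)


def stays_in_range(values, trials=10_000, seed=0):
    """Sample points inside the cell and test min(v_i) <= v(x) <= max(v_i)."""
    rng = random.Random(seed)
    lo, hi = min(values), max(values)
    eps = 1e-12  # tolerance for floating-point rounding
    for _ in range(trials):
        phi = bilinear_weights(rng.random(), rng.random())
        v = sum(w * v_i for w, v_i in zip(phi, values))
        if not lo - eps <= v <= hi + eps:
            return False
    return True


in_range = stays_in_range([3.0, 1.0, 4.0, 0.0])
```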
10. EXERCISE 10
Consider a grid-cell like in the Exercise 9, and some color values vi recorded at the cell vertices. Consider
that we are using linear interpolation to compute a color v(x) at any point x inside the cell. Does a point
x exist so that v(x) is brighter than any of the colors vi? Does a point x exist so that v(x) is darker than
any of the colors vi? Do the answers to the above two sub-questions depend on the choice of the system, or
space, used to represent colors (RGB or HSV )? Explain your answer.
No.