1. Data Structure
and
Algorithms
Lecturer: CHHAY Nuppakun
E-mail: nuppakunc@yahoo.com
Department of Computer Studies
Norton University - 2013
2. Chapter 1
Fundamental ideas
of
data structure
and
algorithm
3. Read Ahead
You are expected to read the lecture notes
before the lecture.
This will facilitate more productive discussion
during class.
Also, please proofread your work,
as you would for English class
assignments and tests.
4. Programs and programming
What is a program?
A set of instructions working with data
designed to accomplish a specific task
The “recipe” analogy
Ingredients are the Data
Directions are the Program Statements
What is programming?
The art and craft of writing programs
The art of controlling these “idiot servants” and
“naïve children”
5. Introduction to Programming
Programming is to solve problems using computers
How to do it at all ?
How to do it robustly ?
How to do it effectively ?
Programming consists of two steps:
Algorithmic design (the architects)
Coding (the construction workers)
Programming requires:
A programming language (C/C++/C#) to express your ideas
A set of tools to design, edit, and debug your code
A compiler to translate your programs into machine code
A machine to run the executable code
6. Crafting Programs Effectively
Program design
design process
stepwise refinement & top-down design
bottom-up design
modularization, interfaces
use of abstractions
Programming style
structured programming
readable code
effective use of language constructs
“formatting”
software organization
Documentation and comments
7. Good Programs
There are a number of facets to good
programs: they must
run correctly
run efficiently
be easy to read and understand
be easy to debug and
be easy to modify
Better running times will generally be
obtained from use of the most appropriate
data structures and algorithms
8. Why Data Structure and Algorithms
Computers are becoming ubiquitous …
programming gets you more out of computer
learn how to solve problems
dealing with abstractions
be more precise
Unfortunately, most people
know little about Computer Science
know little about Programming
write bad or buggy programs
become lost when writing large programs
9. Algorithms and Data Structures
Algorithm: a strategy for computing something, e.g.,
sorting: putting data in order by key
searching: finding data in some kind of index
finding primes and generating random numbers
string processing
graphics: drawing lines, arcs, and other geometric
objects
Data structure: a way to store data, e.g.,
arrays and vectors
linked lists
The two are related:
data structures organize data
algorithms use that organization
10. What are computers?
“idiot servants” that can do simple operations
incredibly fast if you tell them every step to do
like little children in their need for specific and
detailed instruction
computers are not “brains” & are not “smart” -
they are only as good as the program they are
running
11. Computer Environment: Hardware
Hardware
the physical, tangible parts of a computer
E.g., CPU, storage, keyboard, monitor
Central Processing Unit (CPU)
chip that executes program commands
e.g., Intel Pentium IV, Sun Sparc, Transmeta
Main Memory (also called RAM)
primary storage area for programs and data
Secondary storage
e.g., hard disk, CD-ROM
Input/output devices
e.g., keyboard, monitor
12. Computer Environment: Software
Operating System
E.g., Linux, Mac OS X, Windows 2000, Windows XP
manages resources such as CPU, memory, and disk
controls all machine activities
Application programs
generic term for any other kind of software
compiler, word processors, missile control systems,
games
13. Operating System
What does an OS do?
hides low level details of bare machine
arbitrates competing resource demands
Useful attributes
multi-user
multi-tasking
(Diagram: user programs reach the CPU, disk, and network
through the operating system.)
15. Main Program and Library Files
<preprocessor directives>
<global data and function declarations>
int main( )
{
<local data declarations>
<statements>
return 0;
}
<main program function implementation>
20. Expressions and Assignment
Operators: +, -, *, /, %, =, <, <=, >=, ==, !=,
&&, ||, !, ( )
Examples:
a = b = c = 5;
((a = b) = c) = 5; //?
a == 0; // comparison
a = 0; // assignment
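A small sketch of how these operators behave (the variable and function names here are illustrative, not from the slides):

```cpp
// '=' is right-associative: a = b = c = 5 parses as a = (b = (c = 5)),
// so all three variables end up holding 5.
int chain_demo()
{
    int a, b, c;
    a = b = c = 5;
    return a + b + c;   // 15
}

// A classic pitfall: '=' assigns, '==' compares.  "a = 0" stores 0
// in a, while "a == 0" merely tests a and changes nothing.
bool is_zero(int a)
{
    return a == 0;      // comparison, not assignment
}
```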
21. Type Conversion
<type name> (<expression>) or
(<type name>) <expression>
Example:
int (3.14) returns 3
(float) 3 returns 3.0
22. Interactive I/O
cout << "Enter an int, a float, and a string, "
     << "separated by spaces";
cin >> int_value >> float_value >> string;
23. Functions
double pow(double base, double exponent);
cout << pow(5,2) << endl; // function call
*********************************
void hello_world( )
{
cout << "Hello World" << endl;
}
hello_world( ); //call to a void function
24. Selection
if
if … else
switch
Iteration
for (<initialization>; <termination>; <update>)
while (<condition>) { <statements> }
do { <statements> } while (<condition>);
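A minimal sketch putting these constructs together (the grading cutoffs and function names are made up for illustration):

```cpp
// if / else if chooses among ranges.
char grade(int score)
{
    if (score >= 80)      return 'A';
    else if (score >= 60) return 'B';
    else                  return 'C';
}

// switch selects on a discrete value; cases can share a branch.
bool is_weekend(int day)            // 0 = Sunday ... 6 = Saturday
{
    switch (day) {
        case 0: case 6: return true;
        default:        return false;
    }
}

// The three iteration forms compute the same sum 1 + 2 + ... + n.
int sum_for(int n)
{
    int s = 0;
    for (int i = 1; i <= n; i++) s += i;
    return s;
}

int sum_while(int n)
{
    int s = 0, i = 1;
    while (i <= n) { s += i; i++; }
    return s;
}

int sum_do(int n)                   // body executes at least once
{
    int s = 0, i = 1;
    do { s += i; i++; } while (i <= n);
    return s;
}
```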
25. User Defined Type
Using typedef
typedef int boolean;
Using enum
enum weekday {MON, TUE, WED, THUR, FRI};
enum primary_color {RED, YELLOW, BLUE};
weekday day = MON;
primary_color color = RED;
27. Why do we need an array?
#include <iostream.h>
int value0;
int value1;
int value2;
…
int value999;

cin >> value0;
cin >> value1;
…
cin >> value999;
cout << value0;
cout << value1;
cout << value2;
…
cout << value999;
28. Array Declaration
<type> <ArrayName>[Size];
Example int value[1000];
Multidimensional Array
Declaration
<type> <ArrayName> [index0][...][indexN]
Example
int hiTemp[52][7];
int ThreeD[10][10][5];
29. Accessing an Array
Array initialization
for (i = 0; i <= 999; i++)
value[i] = 2*i - 1;
Each of an array’s elements can be accessed
in sequence by varying an array index
variable within a loop
Multidimensional arrays can be accessed with
nested loops.
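For instance, a nested pair of loops can initialize and then total a two-dimensional array (a small made-up variant of the hiTemp declaration from the previous slide):

```cpp
const int WEEKS = 4, DAYS = 7;

// Fill a 2-D array row by row with nested loops, then sum it the
// same way; each pass visits every element exactly once.
int fill_and_total()
{
    int hiTemp[WEEKS][DAYS];
    for (int w = 0; w < WEEKS; w++)        // outer loop: rows
        for (int d = 0; d < DAYS; d++)     // inner loop: columns
            hiTemp[w][d] = 1;              // placeholder reading
    int total = 0;
    for (int w = 0; w < WEEKS; w++)
        for (int d = 0; d < DAYS; d++)
            total += hiTemp[w][d];
    return total;                          // WEEKS * DAYS
}
```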
31. Algorithm
Definition
A step-by-step procedure for solving a
problem in a finite amount of time
Pseudo-code
is a compact and informal high-level
description of a computer programming
algorithm that uses the structural conventions
of a programming language
32. Algorithms (Continue)
Algorithm is used in computer science to describe a
problem-solving method suitable for implementation as a
computer program:
1. Most algorithms of interest involve methods of organizing the data
involved in the computation. Objects created in this way are called
data structures => algorithms and data structures go hand in hand
2. Whether we use a computer to solve small or huge problems, we
quickly become motivated to devise methods that use time or space as
efficiently as possible.
3. Careful algorithm design is an extremely effective part of the
process of solving a huge problem, whatever the application area
33. Algorithms (Continue)
4. When a huge or complex computer program is to be developed, a great
deal of effort must go into understanding and defining the problem to be
solved. In most cases, however, there are a few algorithms whose choice
is critical because most of the system resources will be spent running
those algorithms.
5. The sharing of programs in computer systems is becoming more
widespread, so the need to reimplement basic algorithms arises
frequently: we are often faced with completely new computing
environments (hardware and software) whose new features old
implementations may not use to best advantage. Careful design makes our
solutions more portable and longer lasting.
6. The choice of the best algorithm for a particular task can be a
complicated process, perhaps involving sophisticated mathematical
analysis. The branch of computer science that comprises the study of
such questions is called analysis of algorithms.
34. Analysis of Algorithms
Analysis is the key to understanding algorithms well enough to
use them effectively
Analysis plays a role at every point in the process of designing
and implementing algorithms
Mathematical analysis can play a role in comparing the
performance of algorithms
The following are among the reasons that we perform
mathematical analysis of algorithms:
To compare different algorithms for the same task
To predict performance in a new environment
To set values of algorithm parameters
35. Growth of Functions
Most algorithms have a primary parameter N that affects the
running time most significantly:
The parameter N might be the degree of a polynomial
the size of a file to be sorted or searched
the number of characters in a text string
or some other abstract measure of the size of the
problem being considered
We analyze running time by using mathematical formulas that are
as simple as possible and that are accurate for large values of
the parameters
36. Growth of Functions (Continue)
The algorithms we consider typically have running times
proportional to one of the following functions:
1 Most instructions of most programs are executed once or at most
only a few times, so the program's running time is constant
log N When the running time of a program is logarithmic, the
program gets slightly slower as N grows. This running time commonly
occurs in programs that solve a big problem by transformation into a
series of smaller problems
N When the running time of a program is linear, it is generally the
case that a small amount of processing is done on each input
element
N log N The N log N running time arises when algorithms solve a
problem by breaking it up into smaller subproblems, solving them
independently, and then combining the solutions
37. Growth of Functions (Continue)
N² When the running time of an algorithm is quadratic, that
algorithm is practical for use on only relatively small problems
N³ Similarly, an algorithm that processes triples of data items
(perhaps in a triply nested loop) has a cubic running time and is
practical for use on only small problems
2^N Few algorithms with exponential running time are likely to be
appropriate for practical use, even though such algorithms arise
naturally as brute-force solutions to problems.
The running time of a particular program is likely to be
some constant multiplied by one of these terms (the
leading term) plus some smaller terms.
39. Running Time
Most algorithms transform input objects into
output objects.
The running time of an algorithm typically
grows with the input size.
Average case time is often difficult to
determine.
We focus on the worst case running time.
Easier to analyze
Crucial to applications such as games, finance and
robotics
40. Experimental Studies
Write a program implementing the algorithm
Run the program with inputs of varying size and
composition
Use a function, like the built-in clock() function, to
get an accurate measure of the actual running time
Plot the results
41. Limitations of Experiments
It is necessary to implement the algorithm, which
may be difficult
Results may not be indicative of the running time on
other inputs not included in the experiment.
In order to compare two algorithms, the same
hardware and software environments must be used
42. Algorithm Analysis
c = a + b;
Operands: c, a, b
Operators: +, =
Simple model of computation steps:
- load operands (fetch time for c, a, b)
- perform operations (operation time for + and =)
- so the above instruction needs 3Tfetch + 1T+ + 1Tstore
43. Algorithm Analysis
int num= 25;
Operands: num, constant: 25, operator: =
Time needed: 1Tfetch + 1Tstore
n >= i;
Operands: n, i; operator: >=
Time needed: 2Tfetch + 1T>=
++i; // same as i = i + 1
Time needed: 2Tfetch + 1T+ + 1Tstore
45. Computing running time
Arithmetic series summation (e.g.)

1  unsigned int Sum (unsigned int n)
2  {
3    unsigned int result = 0;
4    for (int i = 0; i <= n; i++)
5      result += i;
6    return result;
7  }

Statement | Time                          | Code
3         | Tfetch + Tstore               | result = 0;
4a        | Tfetch + Tstore               | i = 0;
4b        | (2Tfetch + T<) * (n+1)        | i <= n;
4c        | (2Tfetch + T+ + Tstore) * n   | i++;
5         | (3Tfetch + T+ + Tstore) * n   | result += i;
6         | Tfetch + Treturn              | return result;
Total     | (7Tfetch + 2T+ + 2Tstore + T<) * n
          | + (5Tfetch + 2Tstore + T< + Treturn)

Computing the running time of the program
46. Big-Oh Notation
The mathematical artifact that allows us to suppress detail when we
are analyzing algorithms is called the O-notation, or "big-Oh
notation,"
Definition 1. A function g(N) is said to be O(f(N)) if there
exist constants c0 and N0 such that g(N) < c0 f(N) for all N > N0.
We use the O-notation for three distinct purposes:
To bound the error that we make when we ignore small
terms in mathematical formulas
To bound the error that we make when we ignore parts of a
program that contribute a small amount to the total being
analyzed
To allow us to classify algorithms according to upper bounds
on their total running times
47. Big-Oh Notation (Continue)
Often, the results of a mathematical analysis are not exact, but rather
are approximate in a precise technical sense
The O-notation allows us to keep track of the leading terms while
ignoring smaller terms when manipulating approximate mathematical
expressions
For example, if we expand the expression
(N + O(1)) (N + O(log N) + O(1)),
we get six terms:
N² + O(N) + O(N log N) + O(log N) + O(N) + O(1),
but can drop all but the largest O-term, leaving the approximation
N² + O(N log N).
That is, N² is a good approximation to this expression when N is
large.
48. Another Example
What if the input size is 10,000
Algorithm 1: 1,000,000
Algorithm 2: 100,000,000
Conclusion
Algorithm 1 is better!
Question:
Which one is REALLY better?
Confused!
Reason
Too precise!
Solution
Big-O notation – the order of the algorithm
A rougher measurement
Measures the rate of growth, ignoring constants and smaller terms
Better algorithms have a lower rate of growth
Remember
The order of an algorithm is generally more important than the
speed of the processor (CPU)
Why?
51. Data Structure
Definition
A data structure is a collection of data, generally
organized so that items can be stored and retrieved
by some fixed techniques
Example
An array
Stored and retrieved based on an index assigned to
each item
52. Data Structures vs. Software
How They Are Related
Software is designed to help people solve problems in
reality
To solve the problems, there are some THINGS, or
INFOs in reality to be processed
Those THINGS or INFOs are called DATA
DATA and their RELATIONS can be complicated
53. Data Structures vs. Software
How They Are Related
Reasonable organization of DATA helps improve software
efficiency and decrease software design difficulty
Experience accumulated in the past will be presented in this
course as certain DATA STRUCTURES, such as the linked list
and the binary tree
A DATA STRUCTURE is a smart way to organize DATA; it
depends on the features of the DATA and on how the DATA are
processed
54. Phases of Software Development
Phases
Specification of the task
Design of a solution
Implementation of the solution
Analysis of the solution
Testing and debugging
Maintenance and evolution of the system
Obsolescence
55. Phases of Software Development
Features of the Phases
NOT a fixed sequence
For example, in a widely used OO DESIGN method, the
Unified Process (UP), there are many iterations, and each
iteration involves specification, design, implementation,
and testing. Feedback from the previous iteration helps
improve the next iteration
You can find other examples in the textbook
Most phases are independent of programming languages
We will use Java for IMPLEMENTATION
However, most of what we learn in this course applies to
other languages
56. Arrays
The most fundamental data structure is the array
An array is a fixed number of data items that are stored
contiguously and that are accessible by an index
A simple example of the use of an array, which prints out all the
prime numbers less than 1000.
const int N = 1000;
main( )
{
  int i, j, a[N+1];
  for (a[1] = 0, i = 2; i <= N; i++) a[i] = 1;
  for (i = 2; i <= N/2; i++)
    for (j = 2; j <= N/i; j++) a[i*j] = 0;
  for (i = 1; i <= N; i++)
    if (a[i]) cout << i << ' ';
  cout << '\n';
}
57. Arrays (Continue)
The primary feature of arrays is that if the index is
known, any item can be accessed in constant time
The size of the array must be known beforehand, though it is
possible to declare the size of an array at execution time
Arrays are fundamental data structures in that they have a
direct correspondence with memory systems on virtually all
computers
We can view the entire computer memory as an array, with
memory addresses corresponding to array indices
58. Linked Lists
The second elementary data structure to
consider is the linked list
The primary advantage of linked lists
over arrays is that:
linked lists can grow and shrink in size during their
lifetime
their maximum size need not be known in advance
they make it possible to have several data structures share
the same space
59. Linked Lists (Continue)
A second advantage of linked lists is that:
they provide flexibility in allowing the items to be
rearranged efficiently
This flexibility is gained at the expense of quick
access to any arbitrary item in the list
A linked list is a set of items organized
sequentially, just like an array
(Figure: a linked list holding A, L, I, S, T.)
60. Linked Lists (Continue)
Flexible space use
Dynamically allocate space for each element as
needed
Include a pointer to the next item
Each node of the list contains
the data item (an object pointer in our ADT)
a pointer to the next node object
(Figure: a linked-list node with Data and Next fields.)
61. Linked Lists (Continue)
Collection structure has a pointer to the list head
Initially NULL
Add first item
Allocate space for node
Set its data pointer to object
Set Next to NULL
Set Head to point to new node
(Figure: Collection → Head → node [Data | Next] → object.)
62. Linked Lists (Continue)
Add second item
Allocate space for node
Set its data pointer to object
Set Next to current Head
Set Head to point to new node
(Figure: Collection → Head → node [object2] → node [object].)
63. Linked Lists (Continue)
A linked list with its dummy nodes
(figure: head → A, L, I, S, T → z).
Rearranging a linked list
(figure: head → T, A, L, I, S → z).
64. Linked Lists (Continue)
Insertion into and deletion from a linked list
(figure: X inserted into, then removed from, the list
head → A, L, I, S, T → z).
65. Linked Lists - LIFO and FIFO
Single Linked List
One-way cursor
Only can move forward
Simplest implementation
Add to head
Last-In-First-Out (LIFO) semantics
Modifications
First-In-First-Out (FIFO)
Keep a tail pointer
(Figure: singly linked list with head and tail pointers.)
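The two insertion policies above can be sketched as follows (the node and list layouts here are my own, not from the slides):

```cpp
// A node holding an int and a link to the next node.
struct Node {
    int   data;
    Node* next;
};

// List keeps both a head and a tail pointer, as described above.
struct List {
    Node* head;
    Node* tail;
};

void init(List& L) { L.head = L.tail = 0; }

// LIFO: the new node becomes the head (add-to-head).
void push_front(List& L, int x)
{
    Node* n = new Node;
    n->data = x;
    n->next = L.head;
    L.head = n;
    if (L.tail == 0) L.tail = n;   // first node is also the tail
}

// FIFO: with a tail pointer, appending is O(1).
void push_back(List& L, int x)
{
    Node* n = new Node;
    n->data = x;
    n->next = 0;
    if (L.tail == 0) L.head = L.tail = n;
    else { L.tail->next = n; L.tail = n; }
}
```

With push_back, items come off the head in arrival order (FIFO); with push_front, the most recently added item is at the head (LIFO).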
66. Linked Lists - Doubly linked
Doubly linked lists
Can be scanned in both directions
Two-way cursor
Can move forward and backward
(Figure: doubly linked list with head and tail; each node has
next and prev links.)
67. Linked List vs. Array
Arrays are better at random access
What is the 4th element in the list?
Arrays need O(1) time
Linked lists need O(n) time in the worst case
Linked lists are better at additions and
removals at a cursor
Operations at the cursor need O(1) time
Arrays have no cursor, so addition and removal
operations need O(n) time in the worst case
68. Linked List vs. Array
Resizing can be inefficient for an array
For arrays, capacity must be maintained in an inefficient
way
For linked lists, no problem
Summary
Array
Frequent random access operations
Linked lists
Operations occur at a cursor
Frequent capacity changes
Operations occur at a two-way cursor (DLL)
69. Storage Allocation
arrays are a rather direct representation of the
memory of the computer
a direct-array representation of linked lists
is to use "parallel arrays"
The advantage of using parallel arrays is that the structure
can be built on top of the data: the array key contains data,
and only data; all the structure is in the parallel array next
more data can be added with more parallel arrays
70. Pushdown Stacks
The most important restricted-access data structure is the
pushdown stack. Items are added and removed in a Last-In-
First-Out (LIFO) manner
two basic operations are involved: one can push an item
onto the stack (insert it at the beginning) and pop an item
(remove it from the beginning)
pushdown stacks appear as the fundamental data structure
for many algorithms
The stack is represented with an array stack and a pointer p
to the top of the stack; the functions push, pop, and empty
are straightforward implementations of the basic stack
operations
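A minimal sketch of that array representation (fixed capacity, no overflow checking):

```cpp
// The stack is an array plus a pointer p one past the top item,
// as described above.  push/pop/empty are the basic operations.
const int MAXSTACK = 100;
int stack_[MAXSTACK];   // named stack_ to avoid clashing with std::stack
int p = 0;

void push(int v) { stack_[p++] = v; }   // insert at the top
int  pop()       { return stack_[--p]; } // remove from the top
int  empty()     { return p == 0; }
```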
71. Stack Example – Math Parser
Define Parser
9 * ( 3 + 5 ) * (4 + 2) = ?
Why not 10 ?
In INFIX notation
Convert to Postfix using a STACK
9 3 5 + * 4 2 + *
Then compute using a STACK
Answer:
72. Infix -> Postfix Algorithm
9 * ( 3 + 5 ) * (4 + 2) = ?
Only worrying about +, *, and ( )
Initialize an empty stack
If you get a number, output it
If you get an operator, pop and output entries until
you reach one of lower priority, then push the new
operator
If you get a ‘)’, pop and output operators until
you clear a ‘(‘
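The steps above can be sketched directly (handling only single-digit numbers, +, *, and parentheses; the function and variable names are mine, not from the lecture):

```cpp
#include <string>

// Priority: '*' binds tighter than '+'; '(' has the lowest
// priority so it never gets popped by an operator.
static int prio(char op) { return op == '*' ? 2 : op == '+' ? 1 : 0; }

std::string to_postfix(const std::string& infix)
{
    std::string out;
    char stk[100];                  // operator stack
    int  top = 0;
    for (std::string::size_type i = 0; i < infix.size(); i++) {
        char c = infix[i];
        if (c >= '0' && c <= '9') out += c;   // number: output it
        else if (c == '(') stk[top++] = c;
        else if (c == ')') {                  // pop until '('
            while (top > 0 && stk[top-1] != '(') out += stk[--top];
            if (top > 0) --top;               // discard the '('
        } else if (c == '+' || c == '*') {    // operator
            while (top > 0 && prio(stk[top-1]) >= prio(c))
                out += stk[--top];            // pop higher/equal priority
            stk[top++] = c;
        }                                     // anything else is ignored
    }
    while (top > 0) out += stk[--top];        // flush remaining operators
    return out;
}
```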
90. Queues
Another fundamental restricted-access data structure
is called the queue
two basic operations are involved: one can insert (put) an
item into the queue at one end and remove (get) an item
from the other end
queues obey a "first in, first out" (FIFO) discipline
There are three class variables: the size of the queue
and two indices, one to the beginning of the queue
(head) and one to the end (tail)
If head and tail are equal, then the queue is defined to
be empty; but if a put would make them equal, then it is
defined to be full
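A sketch of that convention with a circular array (the head/tail names follow the slide; the capacity and wrap-around details are my own):

```cpp
// head == tail means empty; a put that would make them equal
// reports full, so the array holds at most MAXQ - 1 items.
const int MAXQ = 4;
int q[MAXQ];
int head = 0, tail = 0;

bool qempty() { return head == tail; }

bool put(int v)                       // insert at the tail
{
    int next = (tail + 1) % MAXQ;     // index after tail, wrapping
    if (next == head) return false;   // queue is full
    q[tail] = v;
    tail = next;
    return true;
}

int get()                             // remove from the head
{
    int v = q[head];
    head = (head + 1) % MAXQ;
    return v;
}
```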
91. Applications of Queues
Direct applications
Waiting lines
Access to shared resources (e.g., printer)
Multiprogramming
Indirect applications
Auxiliary data structure for algorithms
Component of other data structures
92. Queue Example
You: Bank of America employee
Boss: How many tellers do I need?
How do you go about solving this
problem?
Simulations!
What are the parameters?
93. Bank Teller Example
Classes
Data structures
Input
Time step = 5 sec
Transaction = 2 minutes
Customer Frequency = 50% chance every 15
seconds
What questions do we want to know?
Average wait time
Average line length
How a simulation would work
94. More Queue examples
Networking: Router
Computer Architecture: Execution Units
Printer queues
File systems
Wal-Mart checkout lines
Disney entrance
95. Recursion
Two Necessary Parts
Recursive calls
Stopping or base cases
Infinite recursion
Every recursive call produces another recursive call
Stopping case not well defined, or not reached
Very useful technique
Definition of mathematical functions
Definition of data structures
Recursive structures are naturally processed by
recursive functions!
Recursively defined functions
factorial
Fibonacci
GCD by Euclid’s algorithm
Games
Towers of Hanoi
96. Recurrences
The factorial function is defined by the formula
N! = N * (N-1)!, for N >= 1, with 0! = 1.
This corresponds directly to the following simple recursive program:
int factorial(int N)
{ if (N == 0) return 1;
return N * factorial(N-1);
}
This program illustrates the basic features of a recursive
program: it calls itself and it has a termination condition in which it
directly computes its result
97. Recurrences (Continue)
Well-known recurrence relation is the one that defines the Fibonacci
numbers:
F(N) = F(N-1) + F(N-2), for N >= 2, with F(0) = F(1) = 1
The recurrence corresponds directly to the simple recursive program:
int fibonacci(int N)
{ if (N <= 1) return 1;
return fibonacci(N-1) + fibonacci(N-2);
}
This is a less convincing example of the "power" of recursion: the
recursive calls compute F(N-1) and F(N-2) independently, so much of
the work is repeated.
98. Recurrences (Continue)
The relationship between recursive programs and
recursively defined functions is often more
philosophical than practical
The factorial function really could be implemented with a loop,
and the Fibonacci function is better handled by storing all
precomputed values in an array
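The array-based Fibonacci the slide suggests might look like this (a sketch; each value is computed once, so the running time is linear rather than exponential):

```cpp
// Fibonacci with F(0) = F(1) = 1, storing precomputed values in
// an array.  Valid for 0 <= N < MAXN; F(45) still fits in a
// 32-bit int.
int fibonacci_table(int N)
{
    const int MAXN = 46;
    int F[MAXN];
    F[0] = F[1] = 1;
    for (int i = 2; i <= N; i++)
        F[i] = F[i-1] + F[i-2];   // each term computed exactly once
    return F[N];
}
```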
99. Divide-and-Conquer
Most of the recursive programs use two recursive
calls, each operating on about half the input -
called "divide and conquer " paradigm for
algorithm design
Divide-and conquer is a general algorithm design
paradigm:
Divide: divide the input data S into two or more disjoint
subsets S1, S2, …
Recur: solve the subproblems recursively
Conquer: combine the solutions for S1, S2, …, into a
solution for S
100. Divide-and-Conquer (Continue)
A divide-and-conquer recursive program is a straightforward
way to accomplish our objective:
void rule(int l, int r, int h)
{ int m = (l+r)/2;
  if (h > 0)
  { rule(l, m, h-1);
    mark(m, h);
    rule(m, r, h-1);
  }
}
The idea behind the method is the following: to make the marks
in an interval, first make the long mark in the middle
101. Drawing a ruler (Preorder)
In detail, the list of procedure calls and marks resulting from
the call rule(0, 8, 3). We mark the middle and call rule for the
left half, then do the same for the left half, and so forth,
until a mark of length 0 is called for. Eventually we return from
rule and mark right halves in the same way.

rule (0,8,3)
  mark (4,3)
  rule (0,4,2)
    mark (2,2)
    rule (0,2,1)
      mark (1,1)
      rule (0,1,0)
      rule (1,2,0)
    rule (2,4,1)
      mark (3,1)
      rule (2,3,0)
      rule (3,4,0)
  rule (4,8,2)
    mark (6,2)
    rule (4,6,1)
      mark (5,1)
      rule (4,5,0)
      rule (5,6,0)
    rule (6,8,1)
      mark (7,1)
      rule (6,7,0)
      rule (7,8,0)
102. Drawing a ruler (Inorder version)
In general, divide-and-conquer algorithms involve doing some work
to split the input into two pieces, or to merge the results of
processing two independent "solved" portions of the input, or to
help things along after half of the input has been processed.

rule (0,8,3)
  rule (0,4,2)
    rule (0,2,1)
      rule (0,1,0)
      mark (1,1)
      rule (1,2,0)
    mark (2,2)
    rule (2,4,1)
      rule (2,3,0)
      mark (3,1)
      rule (3,4,0)
  mark (4,3)
  rule (4,8,2)
    rule (4,6,1)
      rule (4,5,0)
      mark (5,1)
      rule (5,6,0)
    mark (6,2)
    rule (6,8,1)
      rule (6,7,0)
      mark (7,1)
103. Divide-and-Conquer (Continue)
A nonrecursive algorithm, which does not correspond to any
recursive implementation, is to draw the shortest marks first,
then the next shortest, and so on:
void rule(int l, int r, int h)
{ int i, j, t;
  for (i = 1, j = 1; i <= h; i++, j += j)
    for (t = 0; l+j+t*(j+j) <= r; t++)
      mark(l+j+t*(j+j), i);
}
Combine and conquer: a method of algorithm design where we
solve a problem by first solving trivial subproblems, then
combining those solutions to solve slightly bigger subproblems,
and so on, until the whole problem is solved.
105. TREES GLOSSARY
So far we have considered linear structures, in which one item
follows another; we now consider two-dimensional linked
structures called trees
Trees are encountered frequently in everyday life
A tree is a nonempty collection of vertices and edges:
A vertex is a simple object (also referred to as a node)
An edge is a connection between two vertices
A path in a tree is a list of distinct vertices in which
successive vertices are connected by edges in the tree
One node in the tree is designated as the root; the defining
property of a tree is that there is exactly one path between
the root and each of the other nodes
If there is more than one path between the root and some node,
or if there is no path between the root and some node, then
what we have is a graph, not a tree
106. TREES
In computer science, a tree is an abstract model of a
hierarchical structure
Nodes with no children are sometimes called leaves, or
terminal nodes
Nodes with at least one child are sometimes called
nonterminal nodes
Nonterminal nodes are also referred to as internal nodes,
and terminal nodes as external nodes
Applications:
Organization charts
File systems
Programming environments
(Figure: a sample tree.)
107. TREES (Continue)
The nodes in a tree divide themselves into levels - the
level of a node is the number of nodes on the path from
the node to the root
The height of a tree is the maximum level among all
nodes in the tree (or the maximum distance to the root
from any node)
The path length of a tree is the sum of the levels of all
the nodes in the tree (or the sum of the lengths of the
paths from each node to the root)
The sample tree shown earlier has height 3 and path length 21
108. Binary Trees
A binary tree has nodes , similar to nodes in a
linked list structure.
Data of one sort or another may be stored at each
node.
But it is the connections between the nodes
which characterize a binary tree.
111. A Binary Tree of States
In this example, the data contained at each node is one of
the 50 states.
Each tree has a special node called its root, usually drawn
at the top.
112. A Binary Tree of States
Each node is permitted to have two links to other nodes,
called the left child and the right child.
Arkansas has a left child, but no right child.
Some nodes have only one child.
113. A Binary Tree of States
Washington is the parent of Arkansas and Colorado.
Each node is called the parent of its children.
A node with no children is called a leaf.
114. A Binary Tree of States
Two rules about parents:
The root has no
parent.
Every other node
has exactly one
parent.
115. A Binary Tree of States
Two nodes with the same parent are called siblings.
Arkansas and Colorado are siblings.
116. Complete Binary Trees
A complete binary tree is a special kind of binary tree
which will be useful to us.
When a complete binary tree is built, its first node must
be the root.
The second node of a complete binary tree is always the
left child of the root...
117. Complete Binary Trees
The second node of a complete binary tree is always the
left child of the root...
... and the third node is always the right child of the root.
The next nodes must always fill the next level from left
to right.
118. Binary Tree
Consists of
a node
left and right sub-trees
Each sub-tree is itself a binary tree
119. Trees - Performance
Find
Complete Tree
Height, h
Nodes traversed in a path from the root to a leaf
Number of nodes, n:
n = 1 + 2^1 + 2^2 + … + 2^h = 2^(h+1) - 1
h = floor( log2 n )
120. Trees - Performance
Find
Complete Tree
Since we need at most h+1 comparisons,
find in O(h+1) or O(log n)
Same as binary search
121. Summary
Binary trees contain nodes.
Each node may have a left child and a right child.
If you start from any node and move upward, you will
eventually reach the root.
Every node except the root has one parent. The root has no
parent.
Complete binary trees require the nodes to fill in each level
from left-to-right before starting the next level.
122. PROPERTIES
Property 1 - There is exactly one path connecting
any two nodes in a tree:
Any two nodes have a least common ancestor
It follows that any node can be the root: each node in a tree
has the property that there is exactly one path connecting that
node with every other node in the tree
Property 2 - A tree with N nodes has N - 1 edges
each node, except the root, has a unique parent, and every edge
connects a node to its parent
Property 3 - A binary tree with N internal nodes has N
+ 1 external nodes
A binary tree with no internal nodes has one external node
if the left subtree has k internal nodes, it has k + 1 external
nodes and the right subtree has N - k external nodes, for a
total of N + 1
123. PROPERTIES ( Continue )
Property 4 - The external path length of any binary
tree with N internal nodes is 2N greater than the
internal path length
start with the binary tree consisting of one external node
The process starts with a tree with internal and external
path length both 0 and, for each of N steps, increases the
external path length by 2 more than the internal path length
Property 5 - The height of a full binary tree with N
internal nodes is about log2 N
if the height is n, then we must have 2^(n-1) < N+1 <= 2^n,
since there are N + 1 external nodes
124. Representing Binary Trees
The most prevalent representation of binary trees is a
straightforward use of records with two links per node
For some representations it is appropriate to have two different
types of records, one for internal nodes and one for external
nodes; for others, it may be appropriate to use just one type
of node and to use the links in external nodes for some other
purpose
The parse tree for an expression is defined by the simple
recursive rule: "put the operator at the root and then put
the tree for the expression corresponding to the first
operand on the left and the tree corresponding to the
expression for the second operand on the right"
125. Representing Binary Trees ( Continue )
The parse tree for A B C + D E * * F + * (the same expression
in postfix); infix and postfix are two ways to represent
arithmetic expressions, and parse trees are a third
[Figure: Parse tree for A * ( ( ( B + C ) * ( D * E ) ) + F ), with * at the
root, A as its left child, and the subtree for ( ( B + C ) * ( D * E ) ) + F
on the right]
125
126. Representing Binary Trees ( Continue )
There are two other commonly used solutions. One option is to use a
different type of node for external nodes, one with no links. Another
option is to mark the links in some way (to distinguish them from
other links in the tree), then have them point elsewhere in the tree.
[Figure: Building the parse tree for A B C + D E * * F + * — the tree is
assembled bottom-up, one subtree per operator]
126
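The bottom-up construction in the figure can be sketched with a stack, following the postfix rule quoted earlier: push a leaf for each operand, and for each operator pop the right and left subtrees and make them children of a new root. A minimal C++ sketch, not the lecture's own code (pnode is a hypothetical node type):

```cpp
#include <stack>
#include <string>

// Hypothetical parse-tree node; 'info' holds one character of the expression.
struct pnode { char info; pnode *l; pnode *r; };

// Build a parse tree from a postfix expression such as "A B C + D E * * F + *":
// push a leaf for each operand; for each operator, pop the right and left
// subtrees and push a new node with the operator at the root.
pnode *build(const std::string &postfix)
{
    std::stack<pnode*> s;
    for (char c : postfix)
    {
        if (c == ' ') continue;                 // skip spacing
        pnode *t = new pnode{c, nullptr, nullptr};
        if (c == '+' || c == '-' || c == '*' || c == '/')
        {
            t->r = s.top(); s.pop();            // second operand
            t->l = s.top(); s.pop();            // first operand
        }
        s.push(t);
    }
    return s.top();
}
```

Running build("A B C + D E * * F + *") produces exactly the tree in the figure: * at the root, A on the left, and the + subtree on the right.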
127. TRAVERSING TREES
How to traverse a tree and systematically visit every node
- there are a number of different ways to proceed
The first method to consider is preorder traversal - the method
is defined by the simple recursive rule: "Visit the root, then
visit the left subtree, then visit the right subtree."
traverse(struct node *t)
{ stack.push(t);
while ( !stack.empty ( ) )
{ t = stack.pop(); visit(t);
if (t->r != z) stack.push(t->r);
if (t->l != z) stack.push(t->l);
}
}
127
129. TRAVERSING TREES (Continue)
The second method to consider is inorder traversal - it is
defined by the recursive rule "visit the left subtree,
then visit the root, then visit the right subtree,"
sometimes called symmetric order
The implementation of a stack-based program for inorder
is almost identical to the above program.
This method of traversal is probably the most widely
used
129
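The slides say the stack-based inorder program is close to the preorder one; one common way to write it pushes the left spine of the tree before visiting. A minimal C++ sketch, not the lecture's own code (the node layout mirrors the slides' struct node, and collecting items into a vector stands in for visit):

```cpp
#include <stack>
#include <vector>

// Node layout assumed from the slides' struct node.
struct node { int item; node *l; node *r; };

// Iterative inorder: descend the left spine pushing nodes, then visit
// the top of the stack and turn to its right subtree.
std::vector<int> inorder(node *t)
{
    std::vector<int> out;
    std::stack<node*> s;
    while (t != nullptr || !s.empty())
    {
        while (t != nullptr) { s.push(t); t = t->l; }   // left subtree first
        t = s.top(); s.pop();
        out.push_back(t->item);                         // visit the root
        t = t->r;                                       // then the right subtree
    }
    return out;
}
```

On a binary search tree this visits the keys in sorted order, which is one reason inorder is the most widely used traversal.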
131. TRAVERSING TREES (Continue)
The third method to consider is postorder traversal - it is
defined by the recursive rule "visit the left subtree,
then visit the right subtree, then visit the root."
Implementation of a stack-based program for postorder
is more complicated than for the other two because one
must arrange for the root and the right subtree to be
saved while the left subtree is visited and for the root to
be saved while the right subtree is visited.
131
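One way to arrange the bookkeeping the slide describes is the two-stack trick: the first stack emits nodes in root-right-left order, and the second stack reverses that into left-right-root. This is a hedged sketch, not the lecture's own code:

```cpp
#include <stack>
#include <vector>

// Node layout assumed from the slides' struct node.
struct node { int item; node *l; node *r; };

// Two-stack postorder: s1 produces root-right-left order; pushing each
// popped node onto s2 and then emptying s2 yields left-right-root.
std::vector<int> postorder(node *t)
{
    std::vector<int> out;
    if (t == nullptr) return out;
    std::stack<node*> s1, s2;
    s1.push(t);
    while (!s1.empty())
    {
        node *n = s1.top(); s1.pop();
        s2.push(n);                         // n is saved until its subtrees are done
        if (n->l) s1.push(n->l);
        if (n->r) s1.push(n->r);
    }
    while (!s2.empty()) { out.push_back(s2.top()->item); s2.pop(); }
    return out;
}
```

The second stack is exactly the "save the root while the subtrees are visited" arrangement the slide calls for.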
133. TRAVERSING TREES (Continue)
The fourth method to consider is level-order traversal - it is
not defined recursively at all - simply visit the nodes as they
appear on the page, reading down from top to bottom
and from left to right, because all the nodes on each level
appear together.
level-order traversal can be achieved by using the program above for
preorder, with a queue instead of a stack:
traverse(struct node *t)
{ queue.put(t);
while ( !queue.empty( ) )
{ t = queue.get( ); visit(t);
if (t->l != z) queue.put(t->l);
if (t->r != z) queue.put(t->r);
}
} 133
135. Heaps
A heap is a certain kind of complete binary tree.
When a complete binary tree is built, its first node must be the root.
135
136. Heaps
The second node is always the left child of the root.
The third node is always the right child of the root.
The next nodes always fill the next level from left-to-right.
136
137. Heaps
A heap is a certain kind of complete binary tree.
Each node in a heap contains a key that can be compared to other nodes' keys.
The "heap property" requires that each node's key is >= the keys of its children.
[Figure: a heap with 45 at the root, children 35 and 23, then 27, 21, 22, 4,
and 19 on the bottom level]
137
138. Adding a Node to a Heap
Put the new node in the next available spot.
Push the new node upward, swapping with its parent
until the new node reaches an acceptable location.
[Figure: 42 is added in the next available spot, below 27,
and then pushed upward]
138
139. Adding a Node to a Heap
The pushing upward stops when:
The parent has a key that is >= the new node, or
The node reaches the root.
The process of pushing the new node upward is
called reheapification upward.
[Figure: the heap after 42 has been pushed up to become a child of the root 45]
139
140. Removing the Top of a Heap
Move the last node onto the root.
Push the out-of-place node downward, swapping with its
larger child until the new node reaches an acceptable location.
[Figure: the last node, 27, is moved onto the root]
140
141. Removing the Top of a Heap
The pushing downward stops when:
The children all have keys <= the out-of-place node, or
The node reaches a leaf.
The process of pushing the new node downward is called
reheapification downward.
[Figure: the heap after 27 has been pushed down, with 42 at the root]
141
142. Implementing a Heap
Data from the root goes in the first location of the array.
Data from the next row goes in the next two array locations.
[Figure: the heap 42 / 35 23 / 27 21 stored as the array 42 35 23]
142
143. Implementing a Heap
Data from the next row goes in the next two array locations.
We don't care what's in the unused part of the array.
[Figure: the array 42 35 23 27 21]
143
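The array layout above can be sketched as a small max-heap class. The index arithmetic — children of the node at index i at 2*i+1 and 2*i+2, parent at (i-1)/2 — assumes the root is stored at index 0, a convention not spelled out on the slides; add performs reheapification upward and removeTop performs reheapification downward:

```cpp
#include <utility>
#include <vector>

// Array-based max-heap: the root lives at index 0, the children of the
// node at index i live at 2*i+1 and 2*i+2, and its parent at (i-1)/2.
struct Heap
{
    std::vector<int> a;

    // "Reheapification upward": place the key in the next available
    // spot, then swap it with its parent until the heap property holds.
    void add(int key)
    {
        a.push_back(key);
        int i = (int)a.size() - 1;
        while (i > 0 && a[(i - 1) / 2] < a[i])
        {
            std::swap(a[(i - 1) / 2], a[i]);
            i = (i - 1) / 2;
        }
    }

    // "Reheapification downward": move the last node onto the root,
    // then swap it with its larger child until it fits.
    int removeTop()
    {
        int top = a[0];
        a[0] = a.back();
        a.pop_back();
        int i = 0, n = (int)a.size();
        while (2 * i + 1 < n)
        {
            int c = 2 * i + 1;
            if (c + 1 < n && a[c] < a[c + 1]) c++;   // pick the larger child
            if (a[i] >= a[c]) break;
            std::swap(a[i], a[c]);
            i = c;
        }
        return top;
    }
};
```

Because the tree is complete, the array has no gaps, which is what makes this representation work.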
144. Summary
A heap is a complete binary tree, where the entry at
each node is greater than or equal to the entries in
its children.
To add an entry to a heap, place the new entry at the
next available spot, and perform a reheapification
upward.
To remove the biggest entry, move the last node
onto the root, and perform a reheapification
downward.
144
146. Sorting
In numerous sorting applications, a simple algorithm may
be the method of choice
often use a sorting program only once, or just a few
times
elementary methods are always suitable for small
files
As a rule, the elementary methods take time proportional
to N^2 to sort N randomly arranged items. If N is small, this
running time may be perfectly adequate
146
147. SELECTION SORT
find the smallest element in the array, and exchange it with
the element in the first position
find the second smallest element and exchange it with the
element in the second position
Continue in this way until the entire array is sorted
- It works by repeatedly selecting the smallest remaining
element
- A disadvantage of selection sort is that its running time
depends only slightly on the amount of order already in the
file.
147
148. Selection sort
For each i from l to r-1, exchange a[i]
with the minimum element in a[i], ..., a[r].
As the index i travels from left to right,
the elements to its left are in their final
position in the array (and will not be
touched again), so the array is fully sorted
when i reaches the right end.
template <class Item>
void selection(Item a[], int l, int r)
{ for (int i = l; i < r; i++)
{ int min = i;
for (int j = i+1; j <= r; j++)
if (a[j] < a[min]) min = j;
exch(a[i], a[min]);
}
}
148
149. INSERTION SORT
The method often used to sort bridge hands is to
consider the elements one at a time,
inserting each into its proper place
need to make space for the element being
inserted by moving larger elements one
position to the right
then inserting the element into the vacated
position
149
150. Insertion sort example
During the first pass of insertion
sort, the S in the second position is
larger than the A, so it does not
have to be moved. On the second
pass, when the O in the third
position is encountered, it is exchanged
with the S to put A O S in sorted order,
and so forth. Unshaded elements that are
not circled are those that were moved
one position to the right.
The running time of insertion sort primarily depends on the initial order
of the keys in the input. For example, if the file is large and the keys are
already in order (or even are nearly in order), then insertion sort is quick
and selection sort is slow.
150
151. Insertion sort
First puts the smallest element in the array into the
first position, so that that element can serve as a
sentinel;
For each i, it sorts the elements a[1], ..., a[i] by
moving one position to the right the elements in the sorted
list a[1], ..., a[i-1] that are larger than a[i],
then putting a[i] into its proper position.
template <class Item>
void insertion(Item a[], int l, int r)
{ int i;
for (i = r; i > l; i--) compexch(a[i-1], a[i]);
for (i = l+2; i <= r; i++)
{ int j = i; Item v = a[i];
while (v < a[j-1])
{ a[j] = a[j-1]; j--; }
a[j] = v;
}
}
151
152. BUBBLE SORT
Keep passing through the file
exchanging adjacent elements that are out of order
continuing until the file is sorted
whether it is actually easier to implement than insertion or
selection sort is arguable
Bubble sort generally will be slower than the other two
methods
152
153. Bubble Sort (Continue)
/* Bubble sort for integers */
#define SWAP(a,b) { int t; t=a; a=b; b=t; }
void bubble( int a[], int n )
{ int i, j;
for(i=0;i<n;i++)
{ /* n passes thru the array */
/* From start to the end of unsorted part */
for(j=1;j<(n-i);j++)
{/* If adjacent items out of order, swap */
if( a[j-1]>a[j] ) SWAP(a[j-1],a[j]);
}
}
}
153
154. Bubble sort example
Small keys percolate over to the left in
bubble sort. As the sort moves from right
to left, each key is exchanged with the one
on its left until a smaller one is
encountered. On the first pass, the E is
exchanged with the L, the P, and the M
before stopping at the A on the right; then
the A moves to the beginning of the file,
stopping at the other A, which is already
in position. The ith smallest key reaches its
final position after the ith pass, just as in
selection sort, but other keys are moved
closer to their final position, as well.
Bubble sort: O(n^2) - very simple code
Insertion sort: slightly better than bubble sort;
fewer comparisons - also O(n^2)
154
159. Searching
The goal of the search is to find all records with keys
matching a given search key
Applications of searching are widespread, and involve a
variety of different operations
Two common terms often used to describe data structures
for searching are dictionaries and symbol tables
Searching programs are in widespread and
frequent use; we study a variety of methods that store
records in arrays that are either searched with key
comparisons or indexed by key value.
159
160. Searching (Continue)
We regard search algorithms as belonging to packages
implementing a variety of generic operations that can
be separated from particular implementations, so
that alternate implementations can be substituted
easily. The operations of interest include:
Initialize the data structure.
Search for a record (or records) having a given key.
Insert a new record.
Delete a specified record.
Join two dictionaries to make a large one.
Sort the dictionary; output all the records in sorted order.
160
161. Searching (Continue)
A combined search-and-insert operation is often included for efficiency in
situations where records with duplicate keys are not to be kept within
the data structure
Records with duplicate keys can be handled in several ways:
to have the primary searching data structure contain only
records with distinct keys
to leave records with equal keys in the primary searching data
structure and return any record with the given key for a search
to assume that each record has a unique identifier (apart from
the key) and require that a search find the record with a given
identifier, given the key
to arrange for the search program to call a specified function
for each record with the given key
161
162. Sequential Searching
The simplest method for searching is to store
the records in an array:
When a new record is to be inserted, we put it
at the end of the array
When a search is to be performed, we look through
the array sequentially
162
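The array scheme just described can be sketched as a tiny symbol table; this is a hedged sketch with integer keys assumed, not the lecture's own code:

```cpp
#include <vector>

// Minimal array-based table: insert appends at the end of the array;
// search scans the array sequentially.
struct SeqTable
{
    std::vector<int> a;

    void insert(int key) { a.push_back(key); }

    // Returns the index of the key, or -1 for an unsuccessful search.
    int search(int key) const
    {
        for (int i = 0; i < (int)a.size(); i++)
            if (a[i] == key) return i;
        return -1;
    }
};
```

An unsuccessful search examines every record, which is exactly the N-comparison behavior that Property 1 below counts.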
163. Sequential Searching (Continue)
Property 1 - Sequential search (array implementation) uses
N + 1 comparisons for an unsuccessful search (always) and
about N/2 comparisons for a successful search (on the
average)
For unsuccessful search, this property follows directly
from the code: each record must be examined to decide
that a record with any particular key is absent. For
successful search, if we assume that each record is
equally likely to be sought, then the average number of
comparisons is (1 + 2 +…+ N)/N = (N + 1)/2, exactly
half the cost of unsuccessful search
163
164. Sequential Searching (Continue)
Property 2 - Sequential search (sorted list implementation)
uses about N/2 comparisons for both successful and
unsuccessful search (on the average)
For successful search, the situation is the same as
before. For unsuccessful search, if we assume that the
search is equally likely to be terminated by the tail node
z or by each of the elements in the list (which is the
case for a number of "random" search models), then the
average number of comparisons is the same as for
successful search in a table of size N + 1, or (N + 2)/2
164
165. Binary Search
Binary Search is an incredibly powerful technique for
searching an ordered list
The basic algorithm is to find the middle element of
the list
compare it against the key
decide which half of the list must contain the key
and repeat with that half
Two requirements to support binary search:
Random access of the list elements, so we need arrays
instead of linked lists.
The array must contain elements in sorted order by the
search key
165
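The halving scheme above can be sketched directly; a minimal C++ version on a sorted array of integers (returning an index, or -1 for an unsuccessful search — a convention assumed here, not stated on the slides):

```cpp
#include <vector>

// Binary search on a sorted array: compare the key against the middle
// element, then repeat on the half that can still contain the key.
int binsearch(const std::vector<int> &a, int key)
{
    int lo = 0, hi = (int)a.size() - 1;
    while (lo <= hi)
    {
        int mid = lo + (hi - lo) / 2;       // avoids overflow of lo + hi
        if (a[mid] == key) return mid;
        if (a[mid] < key) lo = mid + 1;     // key can only be in the right half
        else              hi = mid - 1;     // key can only be in the left half
    }
    return -1;
}
```

Each iteration at least halves the interval, which gives the lg N + 1 bound of Property 3 below.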
166. Binary Search (Continue)
Property 3 - Binary search never uses more than lg N + 1
comparisons for either successful or unsuccessful search
This follows from the fact that the subfile size is at least halved at
each step: an upper bound on the number of comparisons satisfies
the recurrence C_N = C_(N/2) + 1 with C_1 = 1, which implies the stated
result.
It is important to note that the time required to insert new records is
high for binary search
Property 4 - Interpolation search uses fewer than lg lgN + 1
comparisons for both successful and unsuccessful search, in files of
random keys
This function is a very slowly growing one, which can be thought of
as a constant for practical purposes: if N is one billion, lg lgN < 5.
Thus, any record can be found using only a few accesses (on the
average), a substantial improvement over binary search
166
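Interpolation search, as in Property 4, replaces the midpoint probe with a position estimated from the key values at the ends of the interval. A hedged sketch with integer keys assumed, not the lecture's own code:

```cpp
#include <vector>

// Interpolation search: estimate where the key should fall between
// a[lo] and a[hi], probe there, and shrink the interval as in binary search.
int interpolation(const std::vector<int> &a, int key)
{
    int lo = 0, hi = (int)a.size() - 1;
    while (lo <= hi && key >= a[lo] && key <= a[hi])
    {
        if (a[hi] == a[lo])                  // avoid division by zero
            return (a[lo] == key) ? lo : -1;
        int mid = lo + (int)((long long)(key - a[lo]) * (hi - lo)
                             / (a[hi] - a[lo]));
        if (a[mid] == key) return mid;
        if (a[mid] < key) lo = mid + 1;
        else              hi = mid - 1;
    }
    return -1;
}
```

On uniformly distributed keys the estimate lands very close to the target, which is where the lg lg N behavior comes from; on skewed keys it can degrade.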
167. Binary Tree Search
Binary tree search is a simple, efficient
dynamic searching method that qualifies as one of
the most fundamental algorithms in computer
science
The defining property of a binary search tree is that each
node has left and right links, with keys smaller than the
node's key in its left subtree and larger keys in its right
[Figure: a binary search tree]
167
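A minimal sketch of binary-search-tree insertion and search in C++, assuming the usual convention that smaller keys go left and larger keys go right; this is an illustration, not the lecture's own code:

```cpp
// Binary search tree node: smaller keys live in the left subtree,
// larger keys in the right subtree.
struct bnode { int key; bnode *l; bnode *r; };

// Insert a key by walking down the tree and attaching a new leaf.
bnode *insert(bnode *t, int key)
{
    if (t == nullptr) return new bnode{key, nullptr, nullptr};
    if (key < t->key) t->l = insert(t->l, key);
    else              t->r = insert(t->r, key);
    return t;
}

// Search by comparing the key at each node and branching left or right.
bool search(bnode *t, int key)
{
    while (t != nullptr)
    {
        if (key == t->key) return true;
        t = (key < t->key) ? t->l : t->r;
    }
    return false;
}
```

Both operations follow one root-to-node path, so their cost is the node's depth — which is what Properties 5 and 6 below average and bound.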
168. Binary Tree Search (Continue)
Property 5 - A search or insertion in a binary search tree requires
about 2 ln N comparisons, on the average, in a tree built from N
random keys.
For each node in the tree, the number of comparisons used for
a successful search to that node is the distance to the root. The
sum of these distances for all nodes is called the internal path
length of the tree. Dividing the internal path length by N, we get
the average number of comparisons for successful search. But
if CN denotes the average internal path length of a binary
search tree of N nodes, we have the recurrence
Property 6 - In the worst case, a search in a binary search tree with
N keys can require N comparisons.
For example, when the keys are inserted in order (or in reverse
order), the binary-tree search method is no better than the
sequential search method that we saw at the beginning of this
chapter
168
Editor's Notes
Give an example, such as database of a company. Or more examples, sound, video games, …..
But, unlike a linked list, the connections between the nodes are more than a simple one-to-another progression. An example can illustrate the connections in a binary tree.
This is an example of a binary tree with nine nodes. Presumably each node contains information about one of the 50 states. In this example, the states are not arranged in any particular order, except insofar as I need to illustrate the different special kinds of nodes and connections in a binary tree.
Each node in a binary tree is permitted to have two links downward to other nodes, called the left child and the right child .
Some nodes have no children, and those nodes are called leaves . In this example, there are four leaves: Massachusetts, Oklahoma, New Hampshire (or is that Vermont?) and Nebraska. (Yes, that really is Nebraska. Either the author ran out of room on the slide and had to shrink it, or the author is from rival state Colorado.)
There are two rules about parents in any tree: 1. The root never has a parent. 2. Every other node has exactly one parent. There is also a related rule which is not written here, but is part of the definition of a tree: If you start at the root, there is always one way to get from the root to any particular node by following a sequence of downward links (from a parent to its child).
Two nodes that have the same parent are called siblings , as shown here. In a binary tree, a node has at most one sibling.
When a complete binary tree is built, its nodes are generally added one at a time. As with any tree, the first node must be the root.
The next node must be the right child of the root.
A quick summary . . .
The first node of a complete binary tree is always the root...
...then the right child of the root...
So, a heap is a complete binary tree. Each node in a heap contains a key, and these keys must be organized in a particular manner. Notice that this is not a binary search tree, but the keys do follow some semblance of order. Can you see what rule is being enforced here?
We can add new elements to a heap whenever we like. Because the heap is a complete binary tree, we must add the new element at the next available location, filling in the levels from left-to-right. In this example, I have just added the new element with a key of 42. Of course, we now have a problem: The heap property is no longer valid. The 42 is bigger than its parent 27. To fix the problem, we will push the new node upwards until it reaches an acceptable location.
In general, there are two conditions that can stop the pushing upward: 1. We reach a spot where the parent is >= the new node, or 2. We reach the root. This process is called reheapification upward (I didn't just make up that name, really).
We'll fix the problem by pushing the out-of-place node downward. Perhaps you can guess what the downward pushing is called.... reheapification downward .
Reheapification downward can stop under two circumstances: 1. The children all have keys that are <= the out-of-place node. 2. The out-of-place node reaches a leaf.
Following the usual technique for implementing a complete binary tree, the data from the root is stored in the first entry of the array.
As with any partially-filled array, we are only concerned with the front part of the array. If the tree has five nodes, then we are only concerned with the entries in the first five components of the array.