V
CS 241: Data Structures—Final Exam Review Notes
V
Important notes!
*
click on the triangles to the left to expand or contract sections!
*
use the text size feature in your browser to make this text larger and more readable!
V
About the exam
*
the final exam will take place on Monday, May 13, 2019, from 8-11 am, in the usual lecture room (Ford 202)
*
the exam will be designed so that you should be able to complete it in one-and-a-half to two hours
*
you will not have access to books, notes, computers, networks or neighbors during the exam
(you may use scratch paper and it will be supplied if needed; you are encouraged to ask your instructor for clarifications during the exam if necessary)
V
In preparing for the exam, you should consult:
*
your textbook (see especially chapters and sections mentioned below)
*
your notes from lecture and lab—these are likely to be especially helpful
*
the home page for the course and various links, examples, etc., provided there
*
… and also especially the lab assignments linked from the home page, as well as your own programs
*
the sample quizzes on the homepage
V
Format: the exam will consist of several kinds of questions; typically:
*
10-15 True-or-False questions at 1-2 points each
*
8-10 short answer questions at 4-6 points each
*
2-4 longer answer questions at 8-15 points each
*
These review notes are not exhaustive, but are meant as a guide to coverage—see other sources for details!
V
The final exam is comprehensive!
*
But there will naturally be some emphasis on material from after the midterm
*
For more information on the midterm, see the Midterm Review Notes
V
… but here are the main subject areas again (with book chapters) for the midterm material:
V
General background to Data Structures and modularity
*
Chapter 1, section 1.1; also Chapter 3, section 3.1
V
Math review and recursion
*
Chapter 1, sections 1.2-1.3
V
Java review and new features
*
Chapter 1, sections 1.4-1.5
V
Review—see external notes!
V
Algorithm analysis and asymptotics (O notation)
*
Chapter X, sections y-z
V
Arrays, ArrayLists and linked lists
*
Chapter 3, sections 3.2-3.5
V
Stacks and Queues
*
Chapter 3, sections 3.6-3.7
V
Infix and Postfix expressions
*
Chapter 3, sections 3.6.3
V
Trees
*
See Chapter 4 of the text, especially sections 4.1-4.4 and 4.6 (peruse also splay trees in 4.5)
V
Tree definitions and terminology
*
A tree is a collection structure which can hold elements in positions called nodes;
a tree is a linked structure, but unlike a linked list, each node may have several linked nodes
*
nodes: the “places” where data items are kept in the tree
*
links: the connections between nodes (represented as references/“pointers” in Java)
*
path: a sequence of links between nodes (and the nodes along the way, depending on context)
*
root: the top node of a tree
*
sub-tree: a node in a tree considered as a root of its own tree
*
children: the nodes immediately below a given node (similarly for grandchildren, descendants, etc.)
*
parent: the node immediately above a given node (similarly for grandparent, ancestors, etc.)
*
height: the number of links between the root of a tree and its furthest descendant
*
internal nodes: nodes which have a child
*
leaf or external node: nodes which have no children (or all null children, in a Java implementation)
*
fringe: the set of all leaves of a tree, considered as a sequence
*
ordered versus unordered: we usually, but not necessarily, think of the children of a node as ordered, from left to right, say
*
sharing and cycles: in a proper tree, no node had two parents (no sharing) and no links go back “up” from a node to one of its ancestors (or cousins, etc.): such paths would be called cycles
*
in a graph, by contrast, nodes are connected by links in arbitrary ways: possibly no root, possibly sharing and cycles, possibly even multiple sets of disconnected nodes (sub-graphs)
V
General trees
*
in a general tree, any node may have an arbitrary number of children
*
we could represent children in a general tree using any (ordered) collection, but a linked list might be best
*
we can also use a first-child/next-sibling representation, which shows that a binary-tree structure can be used to represent a general tree (in this case the “next sibling” links form a kind of linked list)
V
Binary trees
*
when every node in a tree has at most two children and they are distinguished as left- and right-children
*
in other words, when every node has a left and right child, either of which may be null
V
Expression trees
*
since arithmetic is built mostly out of binary operators, we can represent arithmetic expressions as
binary trees, where the internal nodes hold operators and the leaves are numeric constants (literals)
V
expressions can be written in various systematic styles: infix with parentheses (the usual), postfix or prefix
*
postfix puts the arguments first, then the operator afterwards (post-position); prefix is the opposite
*
postfix and prefix forms never need parentheses
*
you should be able to convert between different forms of expressions for the exam!!
V
various tree traversals (see below) yield similar representations of expressions
*
except for the parentheses of inorder/infix, which must be generated separately, before and after the left and right subtrees
V
Tree traversals
*
a tree traversal is a process visiting all the nodes (or elements thereof) in a tree
*
typical traversals are implemented recursively: we “visit” the node and recursively traverse the sub-trees, in some order
V
by varying the relative order of node visitation and sub-tree traversal, we get different standard orders:
*
left-right pre-order: visit node, traverse left, traverse right
*
left-right inorder: traverse left, visit node, traverse right
*
left-right postorder: traverse left, traverse right, visit node
*
… and so on, for three others but right-to-left (6 possible permutations all together)
*
if we want to implement the Iterator interface, we can’t use recursion: use an explicit stack instead
*
if we substitute a queue for the stack, we get breadth-first traversals (also known as level-order)
V
Binary search trees
V
a binary search tree is a binary tree whose elements support some ordering, and where:
*
all the nodes to the left of the root hold values that are less than the the one in the root
*
all the nodes to the right of the root hold values that are greater than the the one in the root
*
similarly for all sub-trees (i.e., same rule throughout the tree)
*
equal values can be handled in several ways (keep counts in nodes, or just ignore if duplication is not important)
*
binary search trees (BSTs) implement the Dictionary<K,V> interface, allowing values to be looked up by key
V
insertion and lookup in a BST can be very efficient (O(log n)) if the tree is sufficiently “bushy”:
*
compare value being inserted (or sought) to the element in the root, go left or right as appropriate
*
height of a bushy tree is O(log n) where n is the total number of elements
*
int the worst case, successive insertion of items already in order (or in reverse order) will result in a long, skinny, “tilted” tree with O(n) lookups
V
Deletion from BSTs
V
deletion from a BST depends on some cases:
*
deletion of a leaf is easy: just set the reference in the parent to null
*
deletion of a node with a single child is almost as easy: just re-set the reference int he parent to circumvent the deleted node
*
if a node has two children, look for the “adjacent node” (next smaller or next larger) in the tree;
this will be in the “inside corner” of one of the sub-trees (extreme left of right sub-tree or vice versa);
replace the element in the deleted node with that from the insider corner node, then delete that node (which is easier)
V
Balanced trees (AVL)
*
since “bushy” or balanced trees are important to make BSTs efficient, it is worth considering how to keep trees balanced
*
many possible definitions of balance, but it is not enough that the tree “balances” around the root like a coat hanger
*
for AVL trees, the difference in height between sub-trees must be at most one (throughout the tree)
*
for the exam, you should be able to recognize this property of a tree
*
when inserting a node, an AVL tree may get out of balance: we correct this by performing rotations
*
there are several cases to consider, with symmetries, some requiring single and some requiring double rotations
*
see the textbook pages 123-136 for details, examples and pictures
V
Splay trees
*
See text section 4.5 on splay trees
*
a splay tree is a binary search tree without a special notion of balance
*
instead, we always bring the accessed item to the root (top) when we search for something
*
the splaying operation, which brings the item to the root, is similar to a series of tree rotations along the path
*
the splaying operation takes O(log n) time, amortized over many operations
*
splay trees provide an amortized good behavior: over many operations, the average is guaranteed
*
you are not responsible for the details of the “zig-zig” or “zig-zag” cases of the splay operation
V
Other kinds of trees
*
there are many other kinds of trees that maintain balance using rotations: red-black trees, B-trees, 2-3 trees, etc.
*
you don’t need to know the details of any other kinds of trees, but should be aware of the general notion of tree, and that there are many variations
V
Graphs
*
we did not study graphs in detail, but remember that they are a more generalized version of trees
*
a graph consists of a collection of nodes and edges between them
*
a graph need not be completely connected, have a root, or be “downward”-directed
*
a tree is a: rooted, connected, directed graph, often with ordered children, but with no sharing or cycle
V
Priority queues and binary heaps
*
See Chapter 6 of the text (esp. 6.1through 6.4)
V
Abstraction levels
*
a priority queue is an ADT (abstract data type) supporting insertion of prioritized items, and removal of the highest priority item
*
(priority queues based on integer orderings can either be based on minimum or maximum items as “high priority”)
*
a binary heap is an implementation idea for priority queues using left-complete binary trees
*
a binary heap may be implemented in an array, giving especially fast access to the children and parents of a “node”
V
Linear implementations
*
there are obvious, but ultimately naïve, implementations of a priority queue using a linked list (say)
*
like a normal queue, we add to one end (the tail of the list) and remove from the other
V
we can either choose to honor the priority ordering on:
*
the way in (search for proper position on insertion)
*
or the way our (search for highest priority item on removal)
V
either way, the naïve implementations lead to an O(1) versus O(n) split for insertion and removal
*
(in other words, one of the two operations will be especially expensive in time)
*
the binary heap, by contrast, provides both operations in O(log n) time: faster and more consistent
V
Binary heaps (as trees)
*
the binary heap implementation has both a structural aspect and an ordering aspect
*
we can store the items in a left-complete binary tree (filled up all the way to the bottom row and then from the left)
V
the heap order: the values in the descendants of every node are lower priority than the value in the node
*
NOTE: there is no left/right aspect; as long as all descendants are lower priority, the heap order is satisfied
*
this is a “looser” order than in a binary search tree, whence the name “heap”
*
for both operations: restore structure first, then heap order
*
to insert: add at next position in bottom row (last in level-order traversal), then “bubble up” to right place
*
to remove(Min): take from root, then move last place item to root & bubble down
*
both operations take O(log n) time (with good constants) since the tree is guaranteed to be a minimum height
V
in both cases, we can save even more time by swapping “holes” rather than actual value
*
(this saves time because we make only one re-assignment per move until the end)
*
to build the original heap (buildHeap): start bubble up each node starting from first with child (in reverse level-order traversal)
V
Binary heap in an array
*
we can avoid allocating actual tree nodes by making a left-complete binary tree in an array
*
store the items in positions indexed starting from 1 in level-order
*
every node’s parent is in index n/2 (rounding down)
*
every node’s children are in indices 2n and 2n+1
*
these operations (involving multiplication and division by 2) can be implemented very quickly as bit shifts
V
Applications of priority queues
*
printer queues: in the old days, printers were shared by so many that priority was used to determine printing order
*
priority queues are used as “internal” structures in the implementation of many advanced algorithms (esp. for graphs)
*
sorting! (priority queues are used in the so-called “heapsort” algorithm)
V
Other variations on priority queues and heaps
*
you are not responsible for the exam for variations, but you should know there are many (d-heaps, leftist heaps, binomial queues, fibonacci heaps, etc.)
*
most variations support extra operations well, such as merging two priority queues
V
Hashing and hash tables
*
See Chapter 5 of the text, especially sections 5.1-5.4
V
Introduction
V
Dictionary interface: look values up by key
*
often the key is a field in a larger value object
*
can be implemented in O(log n) time using a BST (see above)
*
… but it would be better to have O(1)!
*
solution: keep items in an array
*
but under what index?
*
basic idea: chop up and mash around the keys (i.e., as numbers, using their bits)
*
we want a “messy”, random-seeming (but repeatable!) process which distributes keys well
*
unfortunately, it may happen that two keys collide under the hash function
*
use the modulus operator (%) to limit the hash results to the table size (-1)
V
what do we want from hash function?
*
speed!
*
coverage: all slots (not limited in magnitude or ability to hit), even distribution
*
lack of collisions (as much as possible)
V
collision resolution strategies
*
separate chaining: use a list of items hanging off each array entry
*
internal addressing/probing: try to insert at nearby slots if the preferred one is filled
*
… but there are problems due to clustering (“clumps” of filled entries, which tend to grow in size)
*
linear probing: try successive entries (but primary clustering)
*
quadratic probing: square the “attempt number” to get an offset (but secondary clustering)
V
best bets for hashing
*
well-chosen table size (about twice as much as there is data, and use a prime number)
*
use a well-distributed hash function
V
Sorting
*
See Chapter 7 of the text, especially sections 7.1-7.8
V
Introduction to sorting
*
one of the most common operations
*
ordered data supports binary search (fast!)
V
comparison interfaces in Java
*
Comparable—internal/“natural”: this.compareTo(that)
*
Comparator—external: compare(this, that)
*
both return an integer: negative (this is less), zero, positive
V
Naïve sorting
*
selection sort: next ordered item selected and added physically next in the result
*
insertion sort: next physical item inserted into proper place in the ordered result
*
bubble sort: repeatedly swap items, “bubbling up” to the boundary
*
average- vs. best- vs. worst-case
*
time bounds: O(n2)
V
Better sorting
*
tree sort: put items into a binary search tree, then traverse
*
heap sort: put items into a heap, then pull them all out (with repeated deleteMin)
*
quick sort: split items using pivot, then recursively sort the two parts
*
merge sort: split items into tow halves, recursively sort, then merge back together
*
time bounds: O(n log n) in average case (but can be bad for sorted data)
*
hard bounds on comparison-based sorting: no comparison-based sort can do better than O(n log n)
V
A rational reconstruction of (many) sorts
*
hard-easy division: either hard-split/easy-join or easy-split/hard-join (you have to do the work somewhere)
*
one-many versus half-and-half split: the latter is better, since we get the power of divide-and conquer (logarithmic times)