V
CS 241: Data Structures—Final Exam Review Notes
V
Important notes!
*
click on the triangles to the left to expand or contract sections!
*
use the text size feature in your browser to make this text larger and more readable!
V
About the exam
*
the final exam will take place on Wednesday, December 11, 2019, from 2-5 pm, in the usual lecture room (Ford 204)
*
the exam will be designed so that you should be able to complete it in one-and-a-half to two hours
*
you will not have access to books, notes, computers, networks or neighbors during the exam
(you may use scratch paper and it will be supplied if needed; you are encouraged to ask your instructor for clarifications during the exam if necessary)
V
In preparing for the exam, you should consult:
*
your textbook (see especially chapters and sections mentioned below)
*
your notes from lecture and lab—these are likely to be especially helpful
*
the home page for the course and various links, examples, etc., provided there
*
… and also especially the lab assignments linked from the home page, as well as your own programs
*
the sample quizzes on the homepage
V
Format: the exam will consist of several kinds of questions; typically:
*
10-15 True-or-False questions at 1-2 points each
*
8-10 short answer questions at 4-6 points each
*
2-4 longer answer questions at 8-15 points each
*
These review notes are not exhaustive, but are meant as a guide to coverage—see other sources for details!
V
The final exam is comprehensive!
*
But there will naturally be some emphasis on material from after the midterm
*
For more information on the midterm, see the Midterm Review Notes
V
… but here are the main subject areas again (with book chapters) for the midterm material:
V
General background to Data Structures and modularity
*
Chapter 1, section 1.1; also Chapter 3, section 3.1
V
Math review and recursion
*
Chapter 1, sections 1.2-1.3
V
Java review and new features
*
Chapter 1, sections 1.4-1.5
V
Review—see external notes!
V
Algorithm analysis and asymptotics (O notation)
*
Chapter 2, sections 2.1-2.4
V
Arrays, ArrayLists and linked lists
*
Chapter 3, sections 3.2-3.5
V
Stacks and Queues
*
Chapter 3, sections 3.6-3.7
V
Infix and Postfix expressions
*
Chapter 3, section 3.6.3
V
Priority queues and binary heaps
*
See Chapter 6 of the text (esp. 6.1 through 6.4)
V
Abstraction levels
*
a priority queue is an ADT (abstract data type) supporting insertion of prioritized items, and removal of the highest priority item
*
(priority queues based on integer orderings can either be based on minimum or maximum items as “high priority”)
*
a binary heap is an implementation idea for priority queues using left-complete binary trees
*
a binary heap may be implemented in an array, giving especially fast access to the children and parents of a “node”
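*
To make the ADT view above concrete, here is a minimal sketch using java.util.PriorityQueue (Java's built-in min-priority queue); the class name PQDemo and the sample values are just for illustration:
```java
import java.util.PriorityQueue;

// The priority-queue ADT in action: java.util.PriorityQueue is a min-priority
// queue, so poll() always removes the smallest (highest-priority) remaining item.
public class PQDemo {
    public static void main(String[] args) {
        PriorityQueue<Integer> pq = new PriorityQueue<>();
        pq.add(5);                      // insert prioritized items in any order
        pq.add(1);
        pq.add(3);
        System.out.println(pq.poll());  // prints 1
        System.out.println(pq.poll());  // prints 3
    }
}
```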
V
Linear implementations
*
there are obvious, but ultimately naïve, implementations of a priority queue using a linked list (say)
*
like a normal queue, we add to one end (the tail of the list) and remove from the other
V
we can either choose to honor the priority ordering on:
*
the way in (search for proper position on insertion)
*
or the way out (search for highest priority item on removal)
V
either way, the naïve implementations lead to an O(1) versus O(n) split for insertion and removal
*
(in other words, one of the two operations will be especially expensive in time)
*
the binary heap, by contrast, provides both operations in O(log n) time: faster and more consistent
V
Binary heaps (as trees)
*
the binary heap implementation has both a structural aspect and an ordering aspect
*
we can store the items in a left-complete binary tree (filled up all the way to the bottom row and then from the left)
V
the heap order: the values in the descendants of every node are lower priority than the value in the node
*
NOTE: there is no left/right aspect; as long as all descendants are lower priority, the heap order is satisfied
*
this is a “looser” order than in a binary search tree, whence the name “heap”
*
for both operations: restore structure first, then heap order
*
to insert: add at next position in bottom row (last in level-order traversal), then “bubble up” to right place
*
to remove(Min): take from root, then move last place item to root & bubble down
*
both operations take O(log n) time (with good constants) since the tree is guaranteed to be of minimum height
V
in both cases, we can save even more time by moving a “hole” rather than swapping actual values
*
(this saves time because we make only one re-assignment per move until the end)
*
to build the original heap (buildHeap): bubble down each node, starting from the first node that has a child (in reverse level-order traversal)
V
Binary heap in an array
*
we can avoid allocating actual tree nodes by making a left-complete binary tree in an array
*
store the items in positions indexed starting from 1 in level-order
*
the parent of the node at index n is at index n/2 (rounding down)
*
the children of the node at index n are at indices 2n and 2n+1
*
these operations (involving multiplication and division by 2) can be implemented very quickly as bit shifts
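*
Here is a minimal sketch of the array-based min-heap described above (level-order storage starting at index 1, using the parent/child index arithmetic); the class and method names are illustrative, not from the text:
```java
import java.util.Arrays;

// Array-based binary min-heap: items stored in level-order starting at index 1,
// so the parent of index i is at i/2 and the children are at 2*i and 2*i + 1.
public class MinHeap {
    private int[] heap = new int[16];            // index 0 unused
    private int size = 0;

    public void insert(int x) {                  // O(log n): add at the end, bubble up
        if (size + 1 == heap.length) heap = Arrays.copyOf(heap, heap.length * 2);
        int hole = ++size;
        while (hole > 1 && heap[hole / 2] > x) { // move the "hole" up past larger parents
            heap[hole] = heap[hole / 2];
            hole /= 2;
        }
        heap[hole] = x;
    }

    public int deleteMin() {                     // O(log n): take root, move last item up, bubble down
        int min = heap[1];
        heap[1] = heap[size--];
        if (size > 0) percolateDown(1);
        return min;
    }

    private void percolateDown(int hole) {
        int tmp = heap[hole];
        while (2 * hole <= size) {
            int child = 2 * hole;                // left child
            if (child < size && heap[child + 1] < heap[child]) child++;   // pick the smaller child
            if (heap[child] >= tmp) break;
            heap[hole] = heap[child];
            hole = child;
        }
        heap[hole] = tmp;
    }
    // buildHeap would simply call percolateDown(i) for i = size/2 down to 1.
}
```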
V
Applications of priority queues
*
printer queues: in the old days, printers were shared by so many that priority was used to determine printing order
*
priority queues are used as “internal” structures in the implementation of many advanced algorithms (esp. for graphs)
*
sorting! (priority queues are used in the so-called “heapsort” algorithm)
V
Other variations on priority queues and heaps
*
you are not responsible on the exam for these variations, but you should know there are many (d-heaps, leftist heaps, binomial queues, Fibonacci heaps, etc.)
*
most variations support extra operations well, such as merging two priority queues
V
Hashing and hash tables
*
See Chapter 5 of the text, especially sections 5.1-5.4
V
Introduction
V
Dictionary interface: look values up by key
*
often the key is a field in a larger value object
*
can be implemented in O(log n) time using a BST (see above)
*
… but it would be better to have O(1)!
*
solution: keep items in an array
*
but under what index?
*
basic idea: chop up and mash around the keys (i.e., as numbers, using their bits)
*
we want a “messy”, random-seeming (but repeatable!) process which distributes keys well
*
unfortunately, it may happen that two keys collide under the hash function
*
use the modulus operator (%) to limit hash results to valid table indices (0 through tableSize - 1)
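*
As a sketch of the “chop up and mash around” idea for String keys, here is a common multiply-and-add style hash function (the class name HashDemo is just an illustrative wrapper):
```java
// Mix each character of the key into a running value, then use % to land
// on a valid table index (0 through tableSize - 1).
public class HashDemo {
    public static int hash(String key, int tableSize) {
        int h = 0;
        for (int i = 0; i < key.length(); i++)
            h = 37 * h + key.charAt(i);   // "mash around" the key's bits
        h %= tableSize;
        if (h < 0) h += tableSize;        // Java's % can go negative after int overflow
        return h;
    }
}
```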
V
what do we want from hash function?
*
speed!
*
coverage: able to reach all slots (not limited in magnitude or in which slots it can hit), with an even distribution
*
lack of collisions (as much as possible)
V
collision resolution strategies
*
separate chaining: use a list of items hanging off each array entry
*
open addressing (probing): try to insert at nearby slots if the preferred one is filled
*
… but there are problems due to clustering (“clumps” of filled entries, which tend to grow in size)
*
linear probing: try successive entries (but primary clustering)
*
quadratic probing: square the “attempt number” to get an offset (but secondary clustering)
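*
A minimal sketch of one of these strategies (linear probing over a fixed-size String table); the names are illustrative, and a real table would rehash to a larger prime before getting too full:
```java
// Open addressing with linear probing: if the preferred slot is taken,
// try successive slots (wrapping around) until an empty one is found.
public class LinearProbingSet {
    private String[] table = new String[11];      // a prime size, kept roughly half full

    public void insert(String key) {
        int i = hash(key);
        while (table[i] != null && !table[i].equals(key))
            i = (i + 1) % table.length;           // linear probing
        table[i] = key;
    }

    public boolean contains(String key) {
        int i = hash(key);
        while (table[i] != null) {
            if (table[i].equals(key)) return true;
            i = (i + 1) % table.length;
        }
        return false;
    }

    private int hash(String key) {
        return Math.abs(key.hashCode() % table.length);
    }
}
```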
V
best bets for hashing
*
well-chosen table size (roughly twice the amount of data, and a prime number)
*
use a well-distributed hash function
V
Sorting
*
See Chapter 7 of the text, especially sections 7.1-7.8
V
Introduction to sorting
*
one of the most common operations
*
ordered data supports binary search (fast!)
V
comparison interfaces in Java
*
Comparable—internal/“natural”: this.compareTo(that)
*
Comparator—external: compare(this, that)
*
both return an integer: negative (this is less), zero, positive
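*
A small sketch contrasting the two interfaces, using an assumed Student class (the name and GPA fields are just for illustration):
```java
import java.util.Arrays;
import java.util.Comparator;

// Comparable supplies the internal ("natural") ordering; a Comparator is external.
class Student implements Comparable<Student> {
    String name;
    double gpa;
    Student(String name, double gpa) { this.name = name; this.gpa = gpa; }

    public int compareTo(Student that) {           // negative, zero, or positive
        return this.name.compareTo(that.name);     // natural order: by name
    }
}

class ComparisonDemo {
    public static void main(String[] args) {
        Student[] a = { new Student("Bo", 3.2), new Student("Al", 3.9) };
        Arrays.sort(a);                                                    // Comparable: name order
        Arrays.sort(a, Comparator.comparingDouble((Student s) -> s.gpa));  // Comparator: GPA order
    }
}
```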
V
Naïve sorting
*
selection sort: repeatedly select the next item in order and place it physically next in the result
*
insertion sort: take the next physical item and insert it into its proper place in the ordered result
*
bubble sort: repeatedly swap adjacent out-of-order items, so each pass “bubbles” the largest remaining item up to the boundary of the sorted part
*
average- vs. best- vs. worst-case
*
time bounds: O(n^2)
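*
A minimal sketch of one of the O(n^2) sorts above (insertion sort):
```java
// Insertion sort: each item is inserted into its proper place among the
// already-sorted items to its left. O(n^2) in the worst and average cases.
public class InsertionSort {
    public static void sort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int tmp = a[i];
            int j = i;
            while (j > 0 && a[j - 1] > tmp) {   // shift larger items one slot right
                a[j] = a[j - 1];
                j--;
            }
            a[j] = tmp;                         // drop the item into the gap
        }
    }
}
```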
V
Better sorting
*
tree sort: put items into a binary search tree, then traverse
*
heap sort: put items into a heap, then pull them all out (with repeated deleteMin)
*
quick sort: split items using pivot, then recursively sort the two parts
*
merge sort: split items into two halves, recursively sort, then merge back together
*
time bounds: O(n log n) in the average case (but quick sort with a naïve pivot choice, and tree sort with an unbalanced tree, can degrade to O(n^2) on already-sorted data)
*
hard bounds on comparison-based sorting: no comparison-based sort can do better than O(n log n)
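*
A minimal merge sort sketch (split into two halves, recursively sort, merge); the half-open range convention a[lo..hi) is just one way to write it:
```java
// Merge sort: split the range into two halves, recursively sort each, then
// merge them back together. O(n log n) in all cases, with O(n) extra space.
public class MergeSort {
    public static void sort(int[] a, int lo, int hi) {        // sorts a[lo..hi)
        if (hi - lo <= 1) return;
        int mid = (lo + hi) / 2;
        sort(a, lo, mid);
        sort(a, mid, hi);
        int[] merged = new int[hi - lo];
        int i = lo, j = mid, k = 0;
        while (i < mid && j < hi)                             // take the smaller front item
            merged[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
        while (i < mid) merged[k++] = a[i++];
        while (j < hi)  merged[k++] = a[j++];
        System.arraycopy(merged, 0, a, lo, merged.length);
    }
}
```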
V
A rational reconstruction of (many) sorts
*
hard-easy division: either hard-split/easy-join or easy-split/hard-join (you have to do the work somewhere)
*
one-many versus half-and-half split: the latter is better, since we get the power of divide-and-conquer (logarithmically many levels, hence O(n log n) times)
V
Disjoint Sets
*
See Chapter 8 of the text (esp. 8.1 through 8.5)
V
Relations and equivalence relations
*
a relation R is just a set of pairs of elements of some common set S
*
when a pair is in the set, we say that the elements x, y of S are related by it: x R y
*
for example, in a set of numbers, n < m would be a relation, so e.g. 2 < 3
V
an equivalence relation is one which has these properties:
*
reflexive: a R a for every element a of S
*
symmetric: if a R b, then b R a
*
transitive: if a R b and b R c, then a R c
V
Partitions
V
a partition of a set S is just a way of “breaking up” S into a collection of subsets
*
(the subsets are disjoint and together completely “cover” S)
*
every equivalence relation determines a partition: related elements are placed in the same subset
*
every partition determines an equivalence relation: elements in the same subset are deemed related
*
the subset in which a given element sits (in a partition) is called its equivalence class
V
Dynamic equivalence problem
*
we imagine an unknown equivalence relation (or partition)
*
over time, we receive both queries (called “find”s) and new information (called “union”s)
V
find asks which subset (equivalence class) an element is in
*
find takes an element and returns a “name” for its subset
*
union tells us that (new information) two elements are in fact related (so in the same subset)
*
the algorithm is dynamic and on-line: information arrives over time, and each answer must be accurate for what is known at that moment
*
initially, no elements are related (so each is in its own, lonely little subset 😢 )
V
Simple algorithm
*
use a forest (collection) of “upward-pointing” trees
*
for a find, proceed up the tree to root, and report root’s name
*
for a union, join one tree’s root to the other’s
*
rather than actual trees (with pointers), we can just use an array of indices
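*
Here is a minimal sketch of the array-of-indices forest just described (class and method names are illustrative):
```java
import java.util.Arrays;

// parent[i] holds i's parent; a root holds -1, and the root's index
// serves as the "name" of the whole subset.
public class DisjointSets {
    private int[] parent;

    public DisjointSets(int numElements) {
        parent = new int[numElements];
        Arrays.fill(parent, -1);               // initially each element is its own lonely subset
    }

    public int find(int x) {                   // climb up the tree and report the root
        while (parent[x] >= 0) x = parent[x];
        return x;
    }

    public void union(int a, int b) {          // join one tree's root to the other's
        int rootA = find(a), rootB = find(b);
        if (rootA != rootB) parent[rootB] = rootA;
    }
}
```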
V
Smart unions and path compression
V
rather than choosing which root to merge “into” randomly, we can
*
perform unions by size: make the smaller tree a sub-tree of the larger (keeps tree depth O(log N), so each find is O(log N))
*
perform unions by height: make the shallower tree a sub-tree of the deeper one
*
with path compression, as we climb the “tree” on a find operation, we re-point each visited node directly at the root
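*
A sketch of the smarter variant, adding union-by-size and path compression; it uses the common trick of storing the negated tree size in each root's slot (again, names are illustrative):
```java
import java.util.Arrays;

// Each root slot stores the negated size of its tree; union attaches the
// smaller tree under the larger, and find compresses paths as it climbs.
public class DisjointSetsFast {
    private int[] parent;

    public DisjointSetsFast(int numElements) {
        parent = new int[numElements];
        Arrays.fill(parent, -1);                 // -1 = a root of a tree of size 1
    }

    public int find(int x) {
        if (parent[x] < 0) return x;             // a negative entry marks a root
        return parent[x] = find(parent[x]);      // path compression: re-point x at the root
    }

    public void union(int a, int b) {            // union by size
        int rootA = find(a), rootB = find(b);
        if (rootA == rootB) return;
        if (parent[rootA] > parent[rootB]) {     // rootB's tree is larger (more negative size)
            parent[rootB] += parent[rootA];
            parent[rootA] = rootB;               // smaller tree becomes a subtree of the larger
        } else {
            parent[rootA] += parent[rootB];
            parent[rootB] = rootA;
        }
    }
}
```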
V
Graphs and graph algorithms
*
See Chapter 9 of the text (esp. 9.1 through 9.3)
V
basic definitions
V
vertices and edges
*
may keep extra info in nodes or on edges (often “weights” or “costs”, also colors, etc.)
*
adjacency
*
directed and undirected graphs
*
paths and path lengths
*
loops (self), and cycles
V
connected graph
*
strongly and weakly connected for digraphs
*
complete graph
V
Graph representation
*
as an adjacency matrix O(|V|^2)
*
… but the graph is probably sparse, so use an adjacency list: O(|V|+|E|) space
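*
A minimal adjacency-list sketch (vertices numbered 0 to |V|-1; the class name Graph is illustrative):
```java
import java.util.ArrayList;
import java.util.List;

// Adjacency-list representation: O(|V| + |E|) space, vs. O(|V|^2) for a matrix.
public class Graph {
    private List<List<Integer>> adj = new ArrayList<>();   // adj.get(v) = vertices adjacent to v

    public Graph(int numVertices) {
        for (int v = 0; v < numVertices; v++) adj.add(new ArrayList<>());
    }

    public void addEdge(int from, int to) {    // directed edge; add both directions if undirected
        adj.get(from).add(to);
    }

    public List<Integer> neighbors(int v) {
        return adj.get(v);
    }
}
```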
V
Graph applications
V
airport travel
*
may be non-symmetric
*
may want to optimize relative to weights, etc.
*
(also see, e.g., colorings of nodes)
V
Topological sort
V
an ordering of the vertices that respects the edges (for a directed acyclic graph)
*
i.e., if there is a path from vi to vj, then vj appears after vi in the ordering
*
example of pre-requisite structure
V
doesn’t work with cycles, of course, since there is no way to respect the ordering
*
(no good starting point for cycle)
V
recursive algorithm:
*
start with “root” (no incoming edges)
*
print & remove
*
repeat
V
better:
*
keep “root”-ish vertices (those with indegree 0) in a “box” (a queue)
*
when deleting edge, decrement the indegree of adjacent nodes
*
see pseudocode in Weiss p. 365
*
note: graph and in-degree computations already in place (is this fair?)
*
note how the comparison of O(|V|^2) to O(|V|+|E|) depends on the sparseness of the graph
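*
Here is a sketch along the lines of that queue-based pseudocode, assuming an adjacency-list graph and a precomputed indegree array (both are assumed inputs here, not from the text):
```java
import java.util.*;

// Keep indegree-0 vertices in a "box" (queue); removing a vertex decrements
// the indegree of each adjacent vertex. O(|V| + |E|) overall.
public class TopoSort {
    public static List<Integer> order(List<List<Integer>> adj, int[] indegree) {
        Deque<Integer> box = new ArrayDeque<>();
        for (int v = 0; v < indegree.length; v++)
            if (indegree[v] == 0) box.add(v);            // start with the "roots"

        List<Integer> order = new ArrayList<>();
        while (!box.isEmpty()) {
            int v = box.remove();
            order.add(v);                                // "print" v
            for (int w : adj.get(v))                     // "remove" v's outgoing edges
                if (--indegree[w] == 0) box.add(w);
        }
        if (order.size() != indegree.length)
            throw new IllegalStateException("graph has a cycle");
        return order;
    }
}
```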
V
Shortest paths; technically: single-source shortest path, for weighted (the usual case) or unweighted graphs
V
weighted path length (sum of edge weights) vs. unweighted (just the number of edges)
*
the edge (vi,vj) has cost ci,j
*
note possibility of negative edge weights and thus negative-cost cycles
V
interesting fact (Weiss p. 367):
*
“Currently there are no algorithms in which finding the path from s to one vertex is any faster (by more than a constant factor) than finding the path from s to all vertices.”
V
unweighted version: O(|E|+|V|)
*
ignore actual path, focus on length only: can recover path just by keeping track as we go
*
start by marking (shortest path to) source as length 0
*
use a breadth-first traversal to assign increasing (but minimal) distance markers to nodes, starting with those adjacent to s
V
keep a table with three pieces of info per node
*
boolean known
*
distance from s (starts at infinity, hopefully goes down)
*
bookkeeping for final path emission (printing)
*
weighted version: O(|E| log |V|)
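*
Here is a sketch of the unweighted version using a breadth-first traversal; only the distance table is kept (a prev entry per node, as noted above, would support printing the paths). The weighted O(|E| log |V|) bound comes from Dijkstra's algorithm with a priority queue, not shown here:
```java
import java.util.*;

// Unweighted single-source shortest paths: dist[s] = 0, and each newly
// reached neighbor is one step farther. O(|V| + |E|) overall.
public class UnweightedShortestPaths {
    public static int[] distances(List<List<Integer>> adj, int s) {
        int[] dist = new int[adj.size()];
        Arrays.fill(dist, Integer.MAX_VALUE);            // "infinity" until reached
        dist[s] = 0;
        Deque<Integer> queue = new ArrayDeque<>();
        queue.add(s);
        while (!queue.isEmpty()) {
            int v = queue.remove();
            for (int w : adj.get(v)) {
                if (dist[w] == Integer.MAX_VALUE) {      // first visit gives the shortest distance
                    dist[w] = dist[v] + 1;
                    queue.add(w);                        // recording prev[w] = v here would recover paths
                }
            }
        }
        return dist;
    }
}
```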