V
CS 241: Data Structures—Final Exam Review—Fall 2020
V
Important notes!
*
click on the triangles to the left to expand or contract sections!
*
use the text size feature in your browser to make this text larger and more readable!
V
About the exam
*
the final exam will take place on Thursday, December 3rd, from 2-5pm
*
the exam will be held by a combination of email (to send the exam) and Zoom link (for proctoring/questions)
*
the exam will be designed so that you should be able to complete it in less than the three hours allotted
*
you will not have access to books, notes, computers (except to Zoom), networks, or friends during the exam
(you are encouraged to ask your instructor for clarifications during the exam if necessary)
V
In preparing for the exam, you should consult:
*
your textbook (see especially chapters and sections mentioned below)
*
your notes from lecture and lab—these are likely to be especially helpful
*
the home page for the course and various links, examples, etc., provided there
*
… and also especially the lab assignments linked from the home page, as well as your own programs
V
Format: the exam will consist of several kinds of questions; typically:
*
10-15 True-or-False questions at 1-2 points each
*
8-10 short answer questions at 4-6 points each
*
2-4 longer answer questions at 8-15 points each
*
These review notes are not exhaustive, but are meant as a guide to coverage—see other sources for details!
V
The final exam is comprehensive!
*
But there will naturally be some emphasis on material from after the midterm
*
For more information on the midterm, see the Midterm Review Notes
V
… but here are the main subject areas again (with book chapters) for the midterm material:
*
Course introduction, basic concepts [see textbook Sections 1.1-1.3]
*
Modularity, abstraction, and reuse [see textbook Section 1.3.2 and also those for objects below]
*
Review of Python [see textbook Section 1.4]
*
Classes and methods in Python [see textbook Section 1.4.6]
*
Algorithm analysis and Big-O notation [see textbook Section 2.2]
*
Efficiency of built-in Python data structures [see textbook Section 2.3]
*
Stacks and (linear) abstract data types [see textbook Sections 3.1-3.3]
*
Expressions, bracketing, and conversion [see textbook Sections 3.3.4-3.3.7]
*
Numbers, numeral bases, and conversion [see textbook Section 3.3.6]
V
Linked lists, queues, and deques [see textbook Sections 3.4-3.6]
V
Linked lists
V
linked lists are based on nodes with references
*
a node is an object which holds some data and also some references to other objects
*
remember that when we change an object, all the references to it “see” the changes (aliasing)
*
rather than Python’s built-in lists (based on resizable arrays), we can “roll our own” lists
*
array-based lists allow for fast indexing (O(1) time)
*
… but arrays must be reallocated and copied whenever they run out of room
*
linked lists can be extended or trimmed in O(1) time
*
… but traversing to an index, or searching for an item, takes O(n) time
*
the run-time behavior of the two kinds of lists can be different for some operations
V
we also usually have a container class of objects representing the list (or other structure) itself
*
for lists, we might have an object with a head reference, or perhaps both head and tail references
*
we can also keep useful information such as the size of the list, a boolean for emptiness, etc.
(these should be updated as necessary whenever the list is)
V
for linked lists, each node also holds a reference to the next node in the list
*
note that “local” changes to a few pointers are still O(1) (since they don’t affect/inspect the whole list, or (pro)portions thereof)
*
for circularly-linked lists, the final item points to the first (rather than to None)
*
for doubly-linked lists, there is also a previous reference
*
keeping all these links straight (coherent) can be difficult and error-prone
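*
to make this concrete, here is a minimal sketch of a node class and a container class for a singly-linked list (names are illustrative, not the textbook’s):
    class Node:
        """One node: holds a data item and a reference to the next node."""
        def __init__(self, data):
            self.data = data
            self.next = None            # None marks the end of the list

    class LinkedList:
        """Container object: tracks head, tail, and size of the whole list."""
        def __init__(self):
            self.head = None
            self.tail = None
            self.size = 0               # must be updated on every change

        def is_empty(self):
            return self.size == 0

        def prepend(self, data):
            """O(1): just a few 'local' pointer changes, however long the list."""
            node = Node(data)
            node.next = self.head
            self.head = node
            if self.tail is None:       # list was empty: new node is also the tail
                self.tail = node
            self.size += 1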
V
Queues
*
a queue supports insertion (add or enqueue) and deletion (remove or dequeue), but with a specific “discipline”
V
add and remove behave in a FIFO way: first in, first out
*
(think of a line of people waiting at the bank or grocery store)
V
applications of queues:
*
round robin allocation of resources (fair scheduling)
*
holding a buffer of items that cannot be processed (and released) as quickly as they arrive
*
traversing the nodes of a tree in breadth-first order (i.e., by levels)
V
implementing a queue with a (linked) list
*
keep pointer to first and last nodes
*
add at the last (tail) end; remove from the first (head) end (the other way around won’t work as well: removing at the tail of a singly-linked list takes O(n) time to find the new last node)
*
the empty queue is a special case
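*
a sketch of this implementation (reusing the Node class from the linked-list sketch above; the empty-queue checks are the special case just mentioned):
    class Queue:
        """FIFO queue backed by a singly-linked list with first/last pointers."""
        def __init__(self):
            self.first = None
            self.last = None

        def enqueue(self, data):
            node = Node(data)
            if self.last is None:       # special case: the queue is empty
                self.first = self.last = node
            else:
                self.last.next = node   # add at the last end ...
                self.last = node

        def dequeue(self):
            if self.first is None:
                raise IndexError("dequeue from empty queue")
            data = self.first.data      # ... remove from the first
            self.first = self.first.next
            if self.first is None:      # the queue just became empty
                self.last = None
            return data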
V
Deques
*
deques (pronounced like “decks”) are doubly-ended queues
*
we can add or remove (enqueue or dequeue) from either end of the structure
*
no specific implementations will be referenced on the exam
V
Recursion [see textbook Chapter 4]
*
an algorithm is recursive if it refers to itself (calls itself) in its own definition
V
recursion can be concise and clear, but watch out for:
*
multiple recursive calls (as in the Fibonacci example)
*
a missing base case or lack of progress, each of which can make the recursion go forever
V
recursion is often an alternative to iteration (loops)
*
for non-linear structures (e.g., trees), we may need an auxiliary stack to hold pending nodes while we iterate
V
recursion is implemented using a stack, one item (called a stack frame) for each method call
*
(actually, all method calls are implemented this way, partly out of anticipation of recursion)
*
it can help to visualize recursion by way of a tree of recursive calls
V
examples of recursion
*
Towers of Hanoi
*
maze search algorithm
*
fractals in graphics (e.g., fractal trees)
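*
for instance, Towers of Hanoi fits in a few lines (the peg names here are arbitrary):
    def hanoi(n, source, target, spare):
        """Move n disks from source to target; note the two recursive calls."""
        if n == 0:                      # base case: nothing to move
            return
        hanoi(n - 1, source, spare, target)   # move n-1 disks out of the way
        print("move disk", n, "from", source, "to", target)
        hanoi(n - 1, spare, target, source)   # move them back on top

    hanoi(3, "A", "C", "B")             # prints the 7 moves for 3 disks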
V
Dynamic programming [see textbook Section 4.7]
*
recursion often makes excessive calls by calling repeatedly with the same argument
*
rather than re-calling the function/method (which can be expensive), we can store the results
*
a basic approach, called memoization, just saves the results in a structure (like a Python list or dict)
*
dynamic programming is a more advanced approach in which we calculate all results from the bottom up
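*
a sketch of both approaches for Fibonacci (the classic example of excessive recursive calls):
    def fib_memo(n, cache=None):
        """Memoization: save results in a dict so each n is computed only once."""
        if cache is None:
            cache = {}
        if n < 2:
            return n
        if n not in cache:
            cache[n] = fib_memo(n - 1, cache) + fib_memo(n - 2, cache)
        return cache[n]

    def fib_dp(n):
        """Dynamic programming: build all the results from the bottom up."""
        table = [0, 1]                  # results for 0 and 1
        for i in range(2, n + 1):
            table.append(table[i - 1] + table[i - 2])
        return table[n]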
V
Searching and hashing [see textbook Sections 5.1-5.2]
V
Searching in lists
*
unordered lists: searching can take O(n) in the expected/average and worst cases
*
ordered lists: searching can be reduced to O(log n) using a divide-and-conquer strategy (binary search)
*
linked lists (as opposed to Python’s built-in, array-based lists) do not support divide-and-conquer (no fast indexing)
*
(note that making them doubly-linked doesn’t help)
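*
a sketch of the divide-and-conquer strategy as an iterative binary search (on a sorted, array-based list):
    def binary_search(items, target):
        """Return an index of target in sorted items, or -1: O(log n) steps."""
        low, high = 0, len(items) - 1
        while low <= high:
            mid = (low + high) // 2     # O(1) indexing is what linked lists lack
            if items[mid] == target:
                return mid
            elif items[mid] < target:
                low = mid + 1           # discard the lower half
            else:
                high = mid - 1          # discard the upper half
        return -1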
V
Hashing and hash tables
V
Introduction
*
basic idea: chop up and mash around the keys (i.e., as numbers, using their bits)
*
we want a “messy”, random-seeming (but repeatable!) process which distributes keys well
*
solution: keep items in an array … but under what index?
*
use the modulus operator (%) to reduce a hash result to a valid index (0 through table size - 1)
*
unfortunately, it may happen that two keys collide under the hash function
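*
in Python terms (using the built-in hash function; the table size is an arbitrary example):
    table_size = 11                     # a prime, larger than the expected data
    slot = hash("some key") % table_size    # always in the range 0 .. table_size - 1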
V
Dictionary interface: look values up by key
*
often the key is a field in a larger value object
V
what do we want from hash function?
*
speed!
*
coverage: the function should be able to reach all slots (not limited in magnitude or in which slots it can hit), with an even distribution
*
lack of collisions (as much as possible)
V
collision resolution strategies
*
open addressing/probing: try to insert at nearby slots if the preferred one is filled
*
… but there are problems due to clustering (“clumps” of filled entries, which tend to grow in size)
*
linear probing: try successive entries (but primary clustering)
*
quadratic probing: square the “attempt number” to get an offset (but secondary clustering)
*
separate chaining: use a list of items hanging off each array entry
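*
a minimal separate-chaining sketch (class and method names are illustrative, following the dictionary interface above):
    class ChainedHashTable:
        """Each slot holds a list (a "chain") of (key, value) pairs."""
        def __init__(self, size=11):    # a prime table size helps distribution
            self.slots = [[] for _ in range(size)]

        def put(self, key, value):
            chain = self.slots[hash(key) % len(self.slots)]
            for i, (k, _) in enumerate(chain):
                if k == key:            # key already present: replace its value
                    chain[i] = (key, value)
                    return
            chain.append((key, value))  # colliding keys just extend the chain

        def get(self, key):
            chain = self.slots[hash(key) % len(self.slots)]
            for k, v in chain:
                if k == key:
                    return v
            raise KeyError(key)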
V
best bets for hashing
*
well-chosen table size (about twice the amount of data, and preferably a prime number)
*
use a well-distributed hash function
V
Sorting [see textbook Section 5.3]
V
Introduction to sorting
*
one of the most common operations
*
ordered data often supports binary search (fast!)
*
requires some notion of ordering: compare two items for being {less, equal, greater}
V
Naïve sorting
*
time bounds: O(n²)
*
insertion sort: next physical item inserted into proper place in the ordered result
*
selection sort: next ordered item selected and added physically next in the result
*
bubble sort: repeatedly swap adjacent out-of-order items, “bubbling” the largest up to the boundary of the sorted part
*
average- vs. best- vs. worst-case
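*
a sketch of insertion sort, for concreteness (in-place, on a Python list):
    def insertion_sort(items):
        """O(n²) in the average and worst cases, but O(n) on sorted input."""
        for i in range(1, len(items)):
            current = items[i]          # next physical item to place
            j = i - 1
            while j >= 0 and items[j] > current:
                items[j + 1] = items[j] # shift larger items one slot right
                j -= 1
            items[j + 1] = current      # insert into its proper place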
V
Better sorting
*
quick sort: split items using pivot, then recursively sort the two parts
*
time bounds: O(n log n) in the average case (but O(n²) for already-sorted data with a naïve pivot choice)
*
hard bounds on comparison-based sorting: no comparison-based sort can do better than O(n log n)
*
merge sort: split items into two halves, recursively sort, then merge back together
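*
a sketch of merge sort (an easy split and a hard join: the merge does the real work):
    def merge_sort(items):
        """Return a new sorted list: O(n log n) in all cases."""
        if len(items) <= 1:             # base case: already sorted
            return items
        mid = len(items) // 2
        left = merge_sort(items[:mid])  # recursively sort the two halves
        right = merge_sort(items[mid:])
        merged, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:     # repeatedly take the smaller front item
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
        return merged + left[i:] + right[j:]   # append whatever remains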
V
A rational reconstruction of (many) sorts
*
hard-easy division: either hard-split/easy-join or easy-split/hard-join (you have to do the work somewhere)
*
one-many versus half-and-half split: the latter is better, since we get the power of divide-and-conquer (logarithmic times)
V
Trees [see textbook Sections 6.1-6.4, and 6.5.2]
V
Tree definitions and terminology
*
A tree is a collection structure which can hold elements in positions called nodes;
a tree is a linked structure, but unlike a linked list, each node may have several linked nodes
*
nodes: the “places” where data items are kept in the tree
*
links: the connections between nodes (represented as references/“pointers” in Python)
*
root: the top node of a tree
*
parent: the node immediately above a given node (similarly for grandparent, ancestors, etc.)
*
children: the nodes immediately below a given node (similarly for grandchildren, descendants, etc.)
*
leaf or external node: nodes which have no children (or all None children, in a Python implementation)
*
internal nodes: nodes which have at least one child
*
path: a sequence of links between nodes (and the nodes along the way, depending on context)
*
height: the number of links between the root of a tree and its furthest descendant
*
sub-tree: a node in a tree considered as a root of its own tree
*
fringe: the set of all leaves of a tree, considered as a sequence
*
ordered versus unordered: we usually, but not necessarily, think of the children of a node as ordered, from left to right, say
*
sharing and cycles: in a proper tree, no node has two parents (no sharing) and no links go back “up” from a node to one of its ancestors (or cousins, etc.): such paths would be called cycles
*
in a graph, by contrast, nodes are connected by links in arbitrary ways: possibly no root, possibly sharing and cycles, possibly even multiple sets of disconnected nodes (sub-graphs)
V
General trees
*
in a general tree, any node may have an arbitrary number of children
*
we could represent children in a general tree using any (ordered) collection, but a linked list might be best
*
we can also use a first-child/next-sibling representation, which shows that a binary-tree structure can be used to represent a general tree (in this case the “next sibling” links form a kind of linked list)
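*
a sketch of the first-child/next-sibling representation (names are illustrative):
    class GeneralTreeNode:
        """Binary-shaped node that can represent a node of a general tree."""
        def __init__(self, data):
            self.data = data
            self.first_child = None     # link “down” to the leftmost child
            self.next_sibling = None    # link “across”: siblings form a linked list

    root = GeneralTreeNode("root")      # a root with two children, "a" and "b"
    root.first_child = GeneralTreeNode("a")
    root.first_child.next_sibling = GeneralTreeNode("b")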
V
Binary trees
*
when every node in a tree has at most two children and they are distinguished as left- and right-children
*
in other words, when every node has a left and a right child, either of which may be None
V
Tree traversals
*
a tree traversal is a process visiting all the nodes (or elements thereof) in a tree
*
typical traversals are implemented recursively: we “visit” the node and recursively traverse the sub-trees, in some order
V
by varying the relative order of node visitation and sub-tree traversal, we get different standard orders:
*
left-right preorder: visit node, traverse left, traverse right
*
left-right inorder: traverse left, visit node, traverse right
*
left-right postorder: traverse left, traverse right, visit node
*
… and similarly for the right-to-left versions of each (six possible orders altogether)
*
if we don’t want to use recursion, we can use an explicit stack to hold pending nodes instead
*
if we substitute a queue for the stack (implicit or explicit), we get breadth-first traversals (also known as level-order)
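*
a sketch of a recursive inorder traversal and a queue-based level-order traversal (the node class is assumed to have data, left, and right fields):
    from collections import deque

    def inorder(node, visit):
        """Left-right inorder: traverse left, visit the node, traverse right."""
        if node is not None:
            inorder(node.left, visit)
            visit(node.data)
            inorder(node.right, visit)

    def level_order(root, visit):
        """Breadth-first (level-order): a queue holds the pending nodes."""
        pending = deque([root] if root is not None else [])
        while pending:
            node = pending.popleft()
            visit(node.data)
            if node.left is not None:
                pending.append(node.left)
            if node.right is not None:
                pending.append(node.right)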
V
Binary search trees
*
binary search trees (BSTs) implement a dictionary (map) interface, allowing values to be looked up by key
V
insertion and lookup in a BST can be very efficient (O(log n)) if the tree is sufficiently “bushy”:
*
compare value being inserted (or sought) to the element in the root, go left or right as appropriate
*
height of a bushy tree is O(log n) where n is the total number of elements
V
a binary search tree is a binary tree whose elements support some ordering, and where:
*
all the nodes to the left of the root hold values that are less than the one in the root
*
all the nodes to the right of the root hold values that are greater than the one in the root
*
similarly for all sub-trees (i.e., same rule throughout the tree)
*
equal values can be handled in several ways (keep counts in nodes, or just ignore if duplication is not important)
*
in the worst case, successive insertion of items already in order (or in reverse order) will result in a long, skinny, “tilted” tree with O(n) lookups
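*
a sketch of lookup and insertion on BST nodes (each node holds a key, a value, and left/right children; equal keys here just replace the value, one of the policies above):
    class BSTNode:
        def __init__(self, key, value):
            self.key, self.value = key, value
            self.left = None            # sub-tree of strictly smaller keys
            self.right = None           # sub-tree of strictly larger keys

    def bst_get(node, key):
        """Compare with the node's key, then go left or right: O(height)."""
        if node is None:
            raise KeyError(key)
        if key < node.key:
            return bst_get(node.left, key)
        elif key > node.key:
            return bst_get(node.right, key)
        return node.value

    def bst_put(node, key, value):
        """Insert, returning the (possibly new) root of this sub-tree."""
        if node is None:
            return BSTNode(key, value)
        if key < node.key:
            node.left = bst_put(node.left, key, value)
        elif key > node.key:
            node.right = bst_put(node.right, key, value)
        else:
            node.value = value          # equal key: replace the stored value
        return node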