V
CS 241: Data Structures—Fall 2019 Midterm Exam Review Notes
V
Important note!
*
click on the triangles to the left to expand or contract sections!
*
use the text size feature in your browser to make this text larger and more readable!
V
About the exam
*
the midterm exam will take place on Friday, October 25th, at the usual lecture time and place (Ford 204 at 1:50pm)
*
the exam will be designed so that you should be able to complete it in less than an hour
*
you will not have access to books, notes, computers, networks or neighbors during the exam
(you may use scratch paper and it will be supplied if needed; you are encouraged to ask your instructor for clarifications during the exam if necessary)
V
In preparing for the exam, you should consult:
*
your textbook (see especially chapters and sections mentioned below)
*
your notes from lecture and lab—these are likely to be especially helpful
*
the home page for the course and various links, examples, etc., provided there
*
… and also especially the lab assignments linked from the home page, as well as your programs
V
Format: the exam will consist of several kinds of questions; typically:
*
10-12 True-or-False questions at 1-2 points each
*
8-10 short answer questions at 4-6 points each
*
1-2 longer answer questions at 8-15 points each
*
These review notes are not exhaustive, but are meant as a guide to coverage—see other sources for details!
V
General background to Data Structures and modularity
*
Chapter 1, section 1.1; also Chapter 3, section 3.1
*
large software projects are often written and maintained by groups of people
*
programming language features can allow us to manage complexity by circumscribing various tasks
*
different classes which implement the same interface can be "plugged" and "unplugged" (replaced) at will
*
ideally, we unplug a slower implementation and plug in a faster one, each implementing the same interface
*
a common language of specification, implementation and analysis (O notation) supports communication
*
the use of standard techniques is valued so that people can understand and modify each other's code
*
standard solutions to standard problems often involve "tricks" (innovations) that would be hard to come up with on the spot
V
Math review and recursion
*
Chapter 1, sections 1.2-1.3
V
Functions and algorithms
V
a (mathematical) function is a set of input-output pairs, without regard to how they are computed
*
the inputs and outputs may be numbers, but also data structures, etc.
*
an algorithm is a step-by-step technique for computing a function; more concrete than the function itself, but still somewhat abstract
*
a program is a very specific realization of an algorithm in a particular language, with choices of variable names, etc.
*
these three ideas form a kind of hierarchy, with functions the most abstract and programs the most concrete
*
the relationships are many-to-one between algorithms and functions, and between programs and algorithms
*
algorithms have (abstract, O-notation style) running times; functions do not (and programs have real-world running times)
V
Exponents and logarithms
*
the logarithm is a very slowly growing function of its argument
*
technically, the log is the power to which you must raise the base to equal the argument number
V
roughly, a log in "base b" is the number of digits required to write the argument in base b
*
so, in base 10, the log of one thousand is 3, and of one million is 6
*
at an exact whole power of the base, there is actually one more digit, as the "odometer" turns over (e.g., 1000 has 4 digits, though its base-10 log is 3)
V
logarithms obey certain mathematical laws similar to those for exponents
*
example: log (a·b) = log(a) + log(b)
*
example: log(a^k) = k·log(a)
*
log functions sit below linear functions, and (n log n) sits between linear and squaring in the hierarchy of O-notation (see below)
V
O notation eliminates the need for specifying the base of the logarithm, since it ignores constant factors
*
(but many logs in computing are really base 2)
V
Basics of recursion
*
an algorithm is recursive if it refers to itself (calls itself) in its own definition
V
recursion is often an alternative to iteration (loops)
*
for non-linear structures (e.g., trees), we may need an auxiliary stack to hold pending nodes while we iterate
V
recursion is implemented using a stack, one item (called a stack frame) for each method call
*
(actually, all method calls are implemented this way, partly out of anticipation of recursion)
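*
as a minimal illustrative sketch (the factorial example and names here are my own, not necessarily from lecture):

    // computes n! recursively; each pending call occupies one stack frame
    static long factorial(int n) {
        if (n <= 1) return 1;            // base case: stops the recursion
        return n * factorial(n - 1);     // recursive case: the method calls itself
    }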
V
Java review and new features
*
Chapter 1, sections 1.4-1.5
V
Review—see external notes!
V
Interfaces (and abstract classes)
V
signatures include the name of a method, the type of the result and the number and type of the arguments
*
(argument names don't matter for matching a signature)
*
(signatures with different result types, but the same name and argument types, conflict)
*
interfaces are collections of method names and signatures: no method bodies!
*
an interface allows us to specify a set of capabilities without (yet) specifying how they will be implemented
V
interfaces support a notion of contractual agreement about how a class can be accessed and used, and how it will behave
*
but Java does not allow behavioral specification beyond signatures
*
one interface may be declared as extending another, so as to provide some (hierarchical) structure
V
a class can be declared as implementing an interface: class A implements I { ... }
*
but note that only the signature, not the behavior, is specified in Java or checked by the compiler!
V
abstract classes are "halfway between" a class and an interface
*
they may have some signatures and some full definitions
*
this allows a form of "default implementation" when a full method is defined
*
classes may extend abstract classes, forming a hierarchy
*
you may not instantiate (with new) an abstract class!
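*
a small sketch tying these together (Shape, AbstractShape and Circle are hypothetical names, not from the textbook):

    // an interface: method signatures only, no bodies
    interface Shape {
        double area();
    }

    // an abstract class: may mix full definitions with still-abstract signatures
    abstract class AbstractShape implements Shape {
        public String describe() {               // a "default implementation"
            return "a shape of area " + area();
        }
    }

    // a concrete class: must supply bodies for all remaining signatures
    class Circle extends AbstractShape {
        private final double r;
        Circle(double r) { this.r = r; }
        public double area() { return Math.PI * r * r; }
    }

    // note: new AbstractShape() would not compile, but new Circle(2.0) is fine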
V
Exceptions
*
rationale for exceptions: allow for unexpected circumstances without littering your code
*
the try { ... } catch (...) { ... } statement wraps around code, minimizing the cognitive interference of error-checking
V
methods must be "branded" with the "throws Exception" decoration if they might throw an uncaught exception
*
some exceptions are exempt: standard runtime exceptions such as array bounds and null references
*
exceptions themselves are declared like other classes, but they extend Exception
*
exceptions are created and thrown as in: throw new Exception(...)
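*
a compact sketch (UnderflowException and the method names are illustrative, not from the course):

    // a custom checked exception: declared like any class, but it extends Exception
    class UnderflowException extends Exception {
        public UnderflowException(String msg) { super(msg); }
    }

    public class Demo {
        // "branded" with throws, since it may throw an uncaught checked exception
        static int checkedPop(java.util.ArrayDeque<Integer> s) throws UnderflowException {
            if (s.isEmpty()) throw new UnderflowException("pop from empty stack");
            return s.pop();
        }

        public static void main(String[] args) {
            try {                                    // try/catch keeps error handling out of the main logic
                checkedPop(new java.util.ArrayDeque<>());
            } catch (UnderflowException e) {
                System.out.println("caught: " + e.getMessage());
            }
        }
    }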
V
Wrapper classes
*
for each primitive type (int, boolean, char) there is a wrapper class (Integer, Boolean, etc.)
V
the wrapper class provides a kind of a "box" which holds values of the associated type
*
the box object is immutable: the value inside cannot be changed, so "changing" a value means creating a new box object
*
the wrapper class also supplies a "home" for static methods related to primitive types (e.g., Integer.parseInt(…))
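*
a few illustrative lines (to be read as the body of some method):

    int n = Integer.parseInt("42");   // static helper "homed" in the wrapper class
    Integer boxed = n;                // autoboxing wraps the int in an Integer object
    int unboxed = boxed;              // auto-unboxing recovers the primitive value
    // note: Integer objects are immutable; "changing" the value means making a new box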
V
Generics
*
in order to avoid the problem of Object casting, we can specify an element type for a structure
*
the <T> type variable is affixed to the end of the class name in declarations of the class and of variable types
*
once in scope, the T type variable can be used as the type of variables, etc.
*
see Generics page at Oracle for a tutorial or section 1.5 in the textbook
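*
a minimal sketch of a generic class (Box is a hypothetical example, not the course's code):

    // T is the type variable, affixed to the end of the class name
    public class Box<T> {
        private T item;
        public void set(T item) { this.item = item; }
        public T get() { return item; }

        public static void main(String[] args) {
            Box<String> b = new Box<>();   // element type specified at the point of use
            b.set("hello");
            String s = b.get();            // no Object cast needed
            System.out.println(s);
        }
    }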
V
Algorithm analysis and asymptotics (O notation)
*
Chapter 2, sections 2.1-2.4.5
V
Context and goals
V
we want a way to describe and compare the running times of algorithms, but:
*
it should be insensitive to particular hardware/machines
*
it should emphasize the behavior as problems (or data sets) grow large
V
we abstract away the detail of a function (e.g., a polynomial) representing the running time
*
we eliminate low order terms: just keep the highest one
*
we eliminate constant factors (this has a downside, though)
V
technically, we say that f is O(g) when there are a constant factor c and an initial point n0 such that:
*
f(n) is less than or equal to c·g(n) for all n > n0
*
idea: we can pick a constant that, when multiplied by the high-order term, will dominate all low order terms
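*
a quick worked example (the numbers are chosen just for illustration): f(n) = 3n^2 + 5n + 2 is O(n^2), since with c = 4 and n0 = 5 we have 3n^2 + 5n + 2 ≤ 4n^2 for all n > 5 (the extra n^2 in 4n^2 absorbs the low-order terms 5n + 2 once n is large enough)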
V
the hierarchy of O-notation functions typically seen is:
*
log(n) < n < n·log(n) < n^2 < n^3 < ... < c^n
V
how do we measure times for O-notation?
*
count individual operations as steps
*
ignore constant numbers of steps: look for loops and traversals whose repetition counts depend on n somehow
V
Watch out for this!
*
when f is O(g), f may still grow much more slowly than g: O-notation gives only an upper bound, not an exact growth rate
V
Arrays, ArrayLists and linked lists
*
Chapter 3, sections 3.2-3.5
V
Arrays
*
see your introductory Java text for review and examples
V
ArrayLists
*
ArrayLists provide a number of useful methods for getting, adding and removing items from the collection
*
however, indiscriminate use of these features without understanding how they are implemented can lead to bad performance
*
in general, recall that some operations may require up to O(n) time in order to move or copy other elements
*
specifically, adding and removing items at the end of the ArrayList is cheap (amortized O(1)), but adding and removing at the front is expensive (O(n))
V
Linked Lists
*
basic tasks: insertion, deletion, search, indexing, containment, size, check for emptiness
*
singly-linked lists: links run only in one direction (can't easily go backwards)
*
doubly-linked lists: links run in both directions (takes more space, more complex operations)
*
circularly-linked lists: single or double links, no first node
*
be careful of the order when re-linking list nodes on insertion and deletion!
V
Stacks and Queues
*
Chapter 3, sections 3.6-3.7
V
Stacks
*
a stack supports insertion (push) and deletion (pop), but with a specific "discipline"
V
push and pop behave in a LIFO way: last in, first out
*
(think of a stack of dinner plates on a spring at Goudy)
V
applications of stacks:
*
keeping track of pending method calls
*
matching parentheses (and brackets and braces) in program syntax
*
traversing the nodes of a tree in depth-first order (e.g., pre-order, in-order or post-order)
V
implementing a stack with an array
*
keep an index of the top item (or the next top: this is different!)
*
increment on add, decrement on remove
*
remember to check for empty (or full!)
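*
a minimal sketch of the array implementation (the fixed capacity and the names are illustrative assumptions):

    public class ArrayStack {
        private int[] items = new int[100];   // assumed fixed capacity
        private int top = -1;                 // index of the current top item; -1 means empty

        public boolean isEmpty() { return top == -1; }

        public void push(int x) {
            if (top == items.length - 1) throw new RuntimeException("stack is full");
            items[++top] = x;                 // increment on add
        }

        public int pop() {
            if (isEmpty()) throw new RuntimeException("stack is empty");
            return items[top--];              // decrement on remove
        }
    }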
V
implementing a stack with a list
*
keep a reference to (the node for) the top item
*
add a new node at the front to push a new item (set the next reference before changing the top reference)
*
remove a node from the front to pop an item (hold onto the item while re-setting the top reference)
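*
a minimal sketch of the linked implementation (the Node class and names are illustrative):

    public class ListStack {
        private static class Node {
            int item;
            Node next;
            Node(int item, Node next) { this.item = item; this.next = next; }
        }
        private Node top = null;              // reference to the node for the top item

        public void push(int x) {
            top = new Node(x, top);           // the next reference is set before top changes
        }

        public int pop() {
            if (top == null) throw new RuntimeException("stack is empty");
            int item = top.item;              // hold onto the item ...
            top = top.next;                   // ... while re-setting the top reference
            return item;
        }
    }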
V
Queues
*
a queue supports insertion (add) and deletion (remove), but with a specific "discipline"
V
add and remove behave in a FIFO way: first in, first out
*
(think of a line of people waiting at the bank or grocery store)
V
applications of queues:
*
round robin allocation of resources (fair scheduling)
*
holding a buffer of items that cannot be processed (and released) as quickly as they arrive
*
traversing the nodes of a tree in breadth-first order (i.e., by levels)
V
implementing a queue with an array
*
keep indices for the front and the back: increment the appropriate index upon add or remove
*
(but do the indices "point" to current or next items?)
*
wrap around to the front when you reach the end (use modulus operator % after addition)
*
note that the full part of the queue can "wrap around" from the end to the beginning of the array
*
keep a separate size variable to make it easier to distinguish empty from full
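*
a minimal sketch of the circular-array implementation (the capacity and names are illustrative assumptions):

    public class ArrayQueue {
        private int[] items = new int[100];
        private int front = 0;                // index of the current front item
        private int back = 0;                 // index of the next free slot
        private int size = 0;                 // separate size distinguishes empty from full

        public void add(int x) {
            if (size == items.length) throw new RuntimeException("queue is full");
            items[back] = x;
            back = (back + 1) % items.length; // wrap around with the modulus operator
            size++;
        }

        public int remove() {
            if (size == 0) throw new RuntimeException("queue is empty");
            int item = items[front];
            front = (front + 1) % items.length;
            size--;
            return item;
        }
    }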
V
implementing a queue with a list
*
keep pointer to first and last nodes
*
add at the last (tail) end; remove from the first (head) end (the other way around won't work as well, since backing up to a new last node is hard with singly-linked nodes)
*
the empty queue is a special case
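*
a minimal sketch of the linked implementation, showing the empty-queue special case (the names are illustrative):

    public class ListQueue {
        private static class Node {
            int item;
            Node next;
            Node(int item) { this.item = item; }
        }
        private Node first = null, last = null;

        public void add(int x) {              // add at the last (tail) end
            Node n = new Node(x);
            if (last == null) first = last = n;   // empty queue: both references change
            else { last.next = n; last = n; }
        }

        public int remove() {                 // remove from the first (head) end
            if (first == null) throw new RuntimeException("queue is empty");
            int item = first.item;
            first = first.next;
            if (first == null) last = null;   // queue became empty again
            return item;
        }
    }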
V
Infix and Postfix expressions
*
Chapter 3, section 3.6.3
V
Expression trees
*
since arithmetic is built mostly out of binary operators, we can represent arithmetic expressions as
binary trees, where the internal nodes hold operators and the leaves are numeric constants (literals)
V
expressions can be written in various systematic styles: infix with parentheses (the usual), postfix or prefix
*
postfix puts the arguments first, then the operator afterwards (post-position); prefix is the opposite
*
postfix and prefix forms never need parentheses
*
you should be able to convert between different forms of expressions for the exam!!
V
various tree traversals (see below) yield similar representations of expressions
*
except for the parentheses of inorder/infix, which must be generated separately, before and after the left and right subtrees
V
Infix form
*
algebraic expressions are usually written with binary operators (like +, - and *), constants (numerals) and variables
*
when we write expressions, we can use parentheses to show structure
V
we leave excess parentheses out according to the usual "order of operations"
*
multiplication and division have higher precedence than addition and subtraction
*
when it matters, we associate to the left: a+b+c really means (a+b)+c, even though the value is unaffected by the order
*
… but sometimes we need to use parentheses, when the order does not match our intent
*
we can always put extra parentheses in for emphasis
*
this usual way of writing expressions is called infix form
V
Postfix form
*
an alternative style is to put the operators after their arguments: 2 3 +
*
in this case, called postfix form, we never need parentheses
*
postfix form is based on the ideas of Polish logicians and is sometimes called reverse Polish notation (prefix form is the one usually called Polish notation)
V
Evaluating postfix
V
we can use a stack to evaluate (variable-free) postfix forms:
*
proceed through the expression from left to right
*
for a constant, push it on the stack
*
for an operator, pop off two arguments and apply the operator, then push the result back on
*
be careful of the argument order: first one popped is the right argument!
*
when done, we should always have exactly one number on the stack (and have never failed to pop 2)
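*
a sketch of the stack-based evaluator for well-formed, space-separated postfix expressions (the class and method names are illustrative):

    import java.util.ArrayDeque;
    import java.util.Deque;

    public class Postfix {
        // evaluates a well-formed, space-separated postfix expression, e.g. "2 3 + 4 *"
        static int eval(String expr) {
            Deque<Integer> stack = new ArrayDeque<>();
            for (String tok : expr.split(" ")) {
                if (tok.matches("-?\\d+")) {
                    stack.push(Integer.parseInt(tok));    // constants are pushed
                } else {
                    int right = stack.pop();              // first one popped is the RIGHT argument
                    int left = stack.pop();
                    switch (tok) {
                        case "+": stack.push(left + right); break;
                        case "-": stack.push(left - right); break;
                        case "*": stack.push(left * right); break;
                        case "/": stack.push(left / right); break;
                        default: throw new IllegalArgumentException("bad token: " + tok);
                    }
                }
            }
            return stack.pop();   // exactly one number should remain
        }

        public static void main(String[] args) {
            System.out.println(eval("2 3 + 4 *"));   // prints 20
        }
    }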
V
Trees
*
See Chapter 4 of the text, especially sections 4.1-4.4 and 4.6 (peruse also splay trees in 4.5)
V
Tree definitions and terminology
*
A tree is a collection structure which can hold elements in positions called nodes;
a tree is a linked structure, but unlike a linked list, each node may have several linked nodes
*
nodes: the “places” where data items are kept in the tree
*
links: the connections between nodes (represented as references/“pointers” in Java)
*
path: a sequence of links between nodes (and the nodes along the way, depending on context)
*
root: the top node of a tree
*
sub-tree: a node in a tree, together with all its descendants, considered as a tree in its own right
*
children: the nodes immediately below a given node (similarly for grandchildren, descendants, etc.)
*
parent: the node immediately above a given node (similarly for grandparent, ancestors, etc.)
*
height: the number of links between the root of a tree and its furthest descendant
*
internal nodes: nodes which have at least one child
*
leaf or external node: nodes which have no children (or all null children, in a Java implementation)
*
fringe: the set of all leaves of a tree, considered as a sequence
*
ordered versus unordered: we usually, but not necessarily, think of the children of a node as ordered, from left to right, say
*
sharing and cycles: in a proper tree, no node has two parents (no sharing) and no links go back “up” from a node to one of its ancestors (or cousins, etc.): such paths would be called cycles
*
in a graph, by contrast, nodes are connected by links in arbitrary ways: possibly no root, possibly sharing and cycles, possibly even multiple sets of disconnected nodes (sub-graphs)
V
General trees
*
in a general tree, any node may have an arbitrary number of children
*
we could represent children in a general tree using any (ordered) collection, but a linked list might be best
*
we can also use a first-child/next-sibling representation, which shows that a binary-tree structure can be used to represent a general tree (in this case the “next sibling” links form a kind of linked list)
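*
a tiny sketch of a first-child/next-sibling node (the class name is hypothetical):

    class GTNode {
        String element;
        GTNode firstChild;    // leftmost child (the root of a list of children)
        GTNode nextSibling;   // next child of the same parent; these links form a linked list
    }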
V
Binary trees
*
when every node in a tree has at most two children and they are distinguished as left- and right-children
*
in other words, when every node has a left and right child, either of which may be null
V
Binary search trees
V
a binary search tree is a binary tree whose elements support some ordering, and where:
*
all the nodes to the left of the root hold values that are less than the one in the root
*
all the nodes to the right of the root hold values that are greater than the one in the root
*
similarly for all sub-trees (i.e., same rule throughout the tree)
*
equal values can be handled in several ways (keep counts in nodes, or just ignore if duplication is not important)
*
binary search trees (BSTs) implement the Dictionary<K,V> interface, allowing values to be looked up by key
V
insertion and lookup in a BST can be very efficient (O(log n)) if the tree is sufficiently “bushy”:
*
compare value being inserted (or sought) to the element in the root, go left or right as appropriate
*
height of a bushy tree is O(log n) where n is the total number of elements
*
in the worst case, successive insertion of items already in order (or in reverse order) will result in a long, skinny, “tilted” tree with O(n) lookups
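*
a recursive sketch for int elements, with duplicates ignored (BSTNode and the method names are illustrative, not the textbook's code):

    class BSTNode {
        int key;
        BSTNode left, right;
        BSTNode(int key) { this.key = key; }
    }

    // returns the (possibly new) root of the sub-tree after insertion
    static BSTNode insert(BSTNode root, int key) {
        if (root == null) return new BSTNode(key);          // found the insertion point
        if (key < root.key) root.left = insert(root.left, key);
        else if (key > root.key) root.right = insert(root.right, key);
        return root;                                        // duplicates fall through unchanged
    }

    static boolean contains(BSTNode root, int key) {
        if (root == null) return false;
        if (key < root.key) return contains(root.left, key);    // go left ...
        if (key > root.key) return contains(root.right, key);   // ... or right, as appropriate
        return true;
    }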
V
Deletion from BSTs
V
deletion from a BST depends on some cases:
*
deletion of a leaf is easy: just set the reference in the parent to null
*
deletion of a node with a single child is almost as easy: just re-set the reference in the parent to circumvent the deleted node
*
if a node has two children, look for the “adjacent node” (next smaller or next larger) in the tree;
this will be in the “inside corner” of one of the sub-trees (the extreme left of the right sub-tree, or the extreme right of the left sub-tree);
replace the element in the deleted node with that from the inside-corner node, then delete that node (which is easier, since it has at most one child)
V
Tree traversals
*
a tree traversal is a process visiting all the nodes (or elements thereof) in a tree
*
typical traversals are implemented recursively: we “visit” the node and recursively traverse the sub-trees, in some order
V
by varying the relative order of node visitation and sub-tree traversal, we get different standard orders:
*
left-right pre-order: visit node, traverse left, traverse right
*
left-right inorder: traverse left, visit node, traverse right
*
left-right postorder: traverse left, traverse right, visit node
*
… and so on for the three right-to-left variants (6 possible permutations altogether)
*
if we want to implement the Iterator interface, we can’t use recursion: use an explicit stack instead
*
if we substitute a queue for the stack, we get breadth-first traversals (also known as level-order)
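*
two of the recursive traversals, reusing the hypothetical BSTNode class from the sketch above:

    static void preOrder(BSTNode t) {
        if (t == null) return;
        System.out.println(t.key);   // visit the node first ...
        preOrder(t.left);            // ... then traverse left ...
        preOrder(t.right);           // ... then right
    }

    static void inOrder(BSTNode t) {
        if (t == null) return;
        inOrder(t.left);
        System.out.println(t.key);   // visiting between sub-trees yields sorted order for a BST
        inOrder(t.right);
    }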
V
Balanced trees (AVL)
*
since “bushy” or balanced trees are important to make BSTs efficient, it is worth considering how to keep trees balanced
*
many possible definitions of balance, but it is not enough that the tree “balances” around the root like a coat hanger
*
for AVL trees, the difference in height between sub-trees must be at most one (throughout the tree)
*
for the exam, you should be able to recognize this property of a tree
*
when inserting a node, an AVL tree may get out of balance: we correct this by performing rotations
*
there are several cases to consider, with symmetries, some requiring single and some requiring double rotations
*
see the textbook pages 123-136 for details, examples and pictures
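*
as one illustrative case, a single right rotation for the left-left situation, reusing the hypothetical BSTNode class above (height bookkeeping and the other cases are omitted; see the textbook for the full algorithm):

    // k2 is the out-of-balance node; returns the new root of this sub-tree
    static BSTNode rotateRight(BSTNode k2) {
        BSTNode k1 = k2.left;
        k2.left = k1.right;   // k1's right sub-tree moves across to k2
        k1.right = k2;        // the old root becomes the right child
        return k1;            // k1 is the new root; re-attach it in the parent
    }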