V
CS 241: Data Structures—Fall 2019 Midterm Exam Review Notes
V
Important note!
*
click on the triangles to the left to expand or contract sections!
*
use the text size feature in your browser to make this text larger and more readable!
V
About the exam
*
the midterm exam will take place on Friday, October 25th, at the usual lecture time and place (Ford 204 at 1:50pm)
*
the exam will be designed so that you should be able to complete it in less than an hour
*
you will not have access to books, notes, computers, networks or neighbors during the exam
(you may use scratch paper and it will be supplied if needed; you are encouraged to ask your instructor for clarifications during the exam if necessary)
V
In preparing for the exam, you should consult:
*
your textbook (see especially chapters and sections mentioned below)
*
your notes from lecture and lab—these are likely to be especially helpful
*
the home page for the course and various links, examples, etc., provided there
*
… and also especially the lab assignments linked from the home page, as well as your programs
V
Format: the exam will consist of several kinds of questions; typically:
*
10-12 True-or-False questions at 1-2 points each
*
8-10 short answer questions at 4-6 points each
*
1-2 longer answer questions at 8-15 points each
*
These review notes are not exhaustive, but are meant as a guide to coverage—see other sources for details!
V
General background to Data Structures and modularity
*
Chapter 1, section 1.1; also Chapter 3, section 3.1
*
large software projects are often written and maintained by groups of people
*
programming language features can allow us to manage complexity by circumscribing various tasks
*
different classes which implement the same interface can be "plugged" and "unplugged" (replaced) at will
*
ideally, we unplug a slower implementation and plug in a faster one, each implementing the same interface
*
a common language of specification, implementation and analysis (O notation) supports communication
*
the use of standard techniques is valued so that people can understand and modify each other's code
*
standard solutions to standard problems often involve "tricks" (innovations) that would be hard to come up with on the spot
V
Math review and recursion
*
Chapter 1, sections 1.2-1.3
V
Functions and algorithms
V
a (mathematical) function is a set of input-output pairs, without regard to how they are computed
*
the inputs and outputs may be numbers, but also data structures, etc.
*
an algorithm is a step-by-step technique for computing a function; more concrete than the function itself, but still somewhat abstract
*
a program is a very specific realization of an algorithm in a particular language, with choices of variable names, etc.
*
these three ideas form a kind of hierarchy, with functions the most abstract and programs the most concrete
*
the relationships are many-to-one between algorithms and functions, and between programs and algorithms
*
algorithms have (abstract, O-notation style) running times; functions do not (and programs have real-world running times)
V
Exponents and logarithms
*
the logarithm is a very slowly growing function of its argument
*
technically, the log is the power to which you must raise the base to equal the argument number
V
roughly, a log in "base b" is the number of digits required to write the argument in base b
*
so, in base 10, the log of one thousand is 3, and of one million is 6
*
at an exact whole power of the base, there is actually one more digit, as the "odometer" turns over (e.g., 1000 has 4 digits, though its base-10 log is 3)
V
logarithms obey certain mathematical laws similar to those for exponents
*
example: log (a·b) = log(a) + log(b)
*
example: log(a^k) = k·log(a)
*
log functions sit below linear functions, and (n log n) sits between linear and squaring in the hierarchy of O-notation (see below)
V
O notation eliminates the need for specifying the base of the logarithm, since it ignores constant factors
*
(but many logs in computing are really base 2)
V
Basics of recursion
*
an algorithm is recursive if it refers to itself (calls itself) in its own definition
V
recursion is often an alternative to iteration (loops)
*
for non-linear structures (e.g., trees), we may need an auxiliary stack to hold pending nodes while we iterate
V
recursion is implemented using a stack, one item (called a stack frame) for each method call
*
(actually, all method calls are implemented this way, partly out of anticipation of recursion)
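*
as a minimal illustrative sketch (the factorial example and names here are my own, not necessarily from lecture):

    // computes n! recursively; each pending call occupies one stack frame
    static long factorial(int n) {
        if (n <= 1) return 1;            // base case: stops the recursion
        return n * factorial(n - 1);     // recursive case: the method calls itself
    }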
V
Java review and new features
*
Chapter 1, sections 1.4-1.5
V
Review—see external notes!
V
Interfaces (and abstract classes)
V
signatures include the name of a method, the type of the result and the number and type of the arguments
*
(argument names don't matter for matching a signature)
*
(signatures with different result types, but the same name and argument types, conflict)
*
interfaces are collections of method names and signatures: no method bodies!
*
an interface allows us to specify a set of capabilities without (yet) specifying how they will be implemented
V
interfaces support a notion of contractual agreement about how a class can be accessed and used, and how it will behave
*
but Java does not allow behavioral specification beyond signatures
*
one interface may be declared as extending another, so as to provide some (hierarchical) structure
V
a class can be declared as implementing an interface: class A implements I { ... }
*
but note that only the signature, not the behavior, is specified in Java or checked by the compiler!
V
abstract classes are "halfway between" a class and an interface
*
they may have some signatures and some full definitions
*
this allows a form of "default implementation" when a full method is defined
*
classes may extend abstract classes, forming a hierarchy
*
you may not instantiate (with new) an abstract class!
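*
a small sketch tying these together (Shape, AbstractShape and Circle are hypothetical names, not from the textbook):

    // an interface: method signatures only, no bodies
    interface Shape {
        double area();
    }

    // an abstract class: may mix full definitions with still-abstract signatures
    abstract class AbstractShape implements Shape {
        public String describe() {               // a "default implementation"
            return "a shape of area " + area();
        }
    }

    // a concrete class: must supply bodies for all remaining signatures
    class Circle extends AbstractShape {
        private final double r;
        Circle(double r) { this.r = r; }
        public double area() { return Math.PI * r * r; }
    }

    // note: new AbstractShape() would not compile, but new Circle(2.0) is fine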
V
Exceptions
*
rationale for exceptions: allow for unexpected circumstances without littering your code
*
the try { ... } catch (...) { ... } statement wraps around code, minimizing the cognitive interference of error-checking
V
methods must be "branded" with the "throws Exception" decoration if they might throw an uncaught exception
*
some exceptions are exempt: standard runtime exceptions such as array bounds and null references
*
exceptions themselves are declared like other classes, but they extend Exception
*
exceptions are created and thrown as in: throw new Exception(...)
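*
a compact sketch (UnderflowException and the method names are illustrative, not from the course):

    // a custom checked exception: declared like any class, but it extends Exception
    class UnderflowException extends Exception {
        public UnderflowException(String msg) { super(msg); }
    }

    public class Demo {
        // "branded" with throws, since it may throw an uncaught checked exception
        static int checkedPop(java.util.ArrayDeque<Integer> s) throws UnderflowException {
            if (s.isEmpty()) throw new UnderflowException("pop from empty stack");
            return s.pop();
        }

        public static void main(String[] args) {
            try {                                    // try/catch keeps error handling out of the main logic
                checkedPop(new java.util.ArrayDeque<>());
            } catch (UnderflowException e) {
                System.out.println("caught: " + e.getMessage());
            }
        }
    }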
V
Wrapper classes
*
for each primitive type (int, boolean, char) there is a wrapper class (Integer, Boolean, etc.)
V
the wrapper class provides a kind of a "box" which holds values of the associated type
*
the box object is immutable: the value inside cannot be changed, so "changing" a value means creating a new box object
*
the wrapper class also supplies a "home" for static methods related to primitive types (e.g., Integer.parseInt(…))
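*
a few illustrative lines (to be read as the body of some method):

    int n = Integer.parseInt("42");   // static helper "homed" in the wrapper class
    Integer boxed = n;                // autoboxing wraps the int in an Integer object
    int unboxed = boxed;              // auto-unboxing recovers the primitive value
    // note: Integer objects are immutable; "changing" the value means making a new box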
V
Generics
*
in order to avoid the problem of Object casting, we can specify an element type for a structure
*
the <T> type variable is affixed to the end of the class name in declarations of the class and of variable types
*
once in scope, the T type variable can be used as the type of variables, etc.
*
see Generics page at Oracle for a tutorial or section 1.5 in the textbook
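*
a minimal sketch of a generic class (Box is a hypothetical example, not the course's code):

    // T is the type variable, affixed to the end of the class name
    public class Box<T> {
        private T item;
        public void set(T item) { this.item = item; }
        public T get() { return item; }

        public static void main(String[] args) {
            Box<String> b = new Box<>();   // element type specified at the point of use
            b.set("hello");
            String s = b.get();            // no Object cast needed
            System.out.println(s);
        }
    }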
V
Algorithm analysis and asymptotics (O notation)
*
Chapter 2, sections 2.1-2.4.5
V
Context and goals
V
we want a way to describe and compare the running times of algorithms, but:
*
it should be insensitive to particular hardware/machines
*
it should emphasize the behavior as problems (or data sets) grow large
V
we abstract away the detail of a function (e.g., a polynomial) representing the running time
*
we eliminate low order terms: just keep the highest one
*
we eliminate constant factors (this has a downside, though)
V
technically, we say that f is O(g) when there are a constant factor c and an initial point n0 such that:
*
f(n) is less than or equal to c·g(n) for all n > n0
*
idea: we can pick a constant that, when multiplied by the high-order term, will dominate all low order terms
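*
a quick worked example (the numbers are chosen just for illustration): f(n) = 3n^2 + 5n + 2 is O(n^2), since with c = 4 and n0 = 5 we have 3n^2 + 5n + 2 ≤ 4n^2 for all n > 5 (the extra n^2 in 4n^2 absorbs the low-order terms 5n + 2 once n is large enough)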
V
the hierarchy of O-notation functions typically seen is:
*
log(n) < n < n·log(n) < n^2 < n^3 < ... < c^n
V
how do we measure times for O-notation?
*
count individual operations as steps
*
ignore constant numbers of steps: look for loops and traversals whose repetition counts depend on n somehow
V
Watch out for this!
*
when f is O(g), f may still grow much more slowly than g: O-notation gives only an upper bound, not an exact growth rate
V
Arrays, ArrayLists and linked lists
*
Chapter 3, sections 3.2-3.5
V
Arrays
*
see your introductory Java text for review and examples
V
ArrayLists
*
ArrayLists provide a number of useful methods for getting, adding and removing items from the collection
*
however, indiscriminate use of these features without understanding how they are implemented can lead to bad performance
*
in general, recall that some operations may require up to O(n) time in order to move or copy other elements
*
specifically, adding and removing items at the end of the ArrayList is cheap (amortized O(1)), but adding and removing at the front is expensive (O(n))
V
Linked Lists
*
basic tasks: insertion, deletion, search, indexing, containment, size, check for emptiness
*
singly-linked lists: links run only in one direction (can't easily go backwards)
*
doubly-linked lists: links run in both directions (takes more space, more complex operations)
*
circularly-linked lists: single or double links, no first node
*
be careful of the order when re-linking list nodes on insertion and deletion!
V
Stacks and Queues
*
Chapter 3, sections 3.6-3.7
V
Stacks
*
a stack supports insertion (push) and deletion (pop), but with a specific "discipline"
V
push and pop behave in a LIFO way: last in, first out
*
(think of a stack of dinner plates on a spring at Goudy)
V
applications of stacks:
*
keeping track of pending method calls
*
matching parentheses (and brackets and braces) in program syntax
*
traversing the nodes of a tree in depth-first order (e.g., pre-order, in-order or post-order)
V
implementing a stack with an array
*
keep an index of the top item (or the next top: this is different!)
*
increment on add, decrement on remove
*
remember to check for empty (or full!)
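*
a minimal sketch of the array implementation (the fixed capacity and the names are illustrative assumptions):

    public class ArrayStack {
        private int[] items = new int[100];   // assumed fixed capacity
        private int top = -1;                 // index of the current top item; -1 means empty

        public boolean isEmpty() { return top == -1; }

        public void push(int x) {
            if (top == items.length - 1) throw new RuntimeException("stack is full");
            items[++top] = x;                 // increment on add
        }

        public int pop() {
            if (isEmpty()) throw new RuntimeException("stack is empty");
            return items[top--];              // decrement on remove
        }
    }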
V
implementing a stack with a list
*
keep a reference to (the node for) the top item
*
add a new node at the front to push a new item (set the next reference before changing the top reference)
*
remove a node from the front to pop an item (hold onto the item while re-setting the top reference)
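*
a minimal sketch of the linked implementation (the Node class and names are illustrative):

    public class ListStack {
        private static class Node {
            int item;
            Node next;
            Node(int item, Node next) { this.item = item; this.next = next; }
        }
        private Node top = null;              // reference to the node for the top item

        public void push(int x) {
            top = new Node(x, top);           // the next reference is set before top changes
        }

        public int pop() {
            if (top == null) throw new RuntimeException("stack is empty");
            int item = top.item;              // hold onto the item ...
            top = top.next;                   // ... while re-setting the top reference
            return item;
        }
    }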
V
Queues
*
a queue supports insertion (add) and deletion (remove), but with a specific "discipline"
V
add and remove behave in a FIFO way: first in, first out
*
(think of a line of people waiting at the bank or grocery store)
V
applications of queues:
*
round robin allocation of resources (fair scheduling)
*
holding a buffer of items that cannot be processed (and released) as quickly as they arrive
*
traversing the nodes of a tree in breadth-first order (i.e., by levels)
V
implementing a queue with an array
*
keep indices for the front and the back: increment the appropriate index upon add or remove
*
(but do the indices "point" to current or next items?)
*
wrap around to the front when you reach the end (use modulus operator % after addition)
*
note that the full part of the queue can "wrap around" from the end to the beginning of the array
*
keep a separate size variable to make it easier to distinguish empty from full
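*
a minimal sketch of the circular-array implementation (the capacity and names are illustrative assumptions):

    public class ArrayQueue {
        private int[] items = new int[100];
        private int front = 0;                // index of the current front item
        private int back = 0;                 // index of the next free slot
        private int size = 0;                 // separate size distinguishes empty from full

        public void add(int x) {
            if (size == items.length) throw new RuntimeException("queue is full");
            items[back] = x;
            back = (back + 1) % items.length; // wrap around with the modulus operator
            size++;
        }

        public int remove() {
            if (size == 0) throw new RuntimeException("queue is empty");
            int item = items[front];
            front = (front + 1) % items.length;
            size--;
            return item;
        }
    }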
V
implementing a queue with a list
*
keep pointer to first and last nodes
*
add at the last (tail) end; remove from the first (head) end (the other way around won't work as well, since backing up to a new last node is hard with singly-linked nodes)
*
the empty queue is a special case
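*
a minimal sketch of the linked implementation, showing the empty-queue special case (the names are illustrative):

    public class ListQueue {
        private static class Node {
            int item;
            Node next;
            Node(int item) { this.item = item; }
        }
        private Node first = null, last = null;

        public void add(int x) {              // add at the last (tail) end
            Node n = new Node(x);
            if (last == null) first = last = n;   // empty queue: both references change
            else { last.next = n; last = n; }
        }

        public int remove() {                 // remove from the first (head) end
            if (first == null) throw new RuntimeException("queue is empty");
            int item = first.item;
            first = first.next;
            if (first == null) last = null;   // queue became empty again
            return item;
        }
    }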
V
Infix and Postfix expressions
*
Chapter 3, section 3.6.3
V
Expression trees
*
since arithmetic is built mostly out of binary operators, we can represent arithmetic expressions as
binary trees, where the internal nodes hold operators and the leaves are numeric constants (literals)
V
expressions can be written in various systematic styles: infix with parentheses (the usual), postfix or prefix
*
postfix puts the arguments first, then the operator afterwards (post-position); prefix is the opposite
*
postfix and prefix forms never need parentheses
*
you should be able to convert between different forms of expressions for the exam!!
V
various tree traversals (see below) yield similar representations of expressions
*
except for the parentheses of inorder/infix, which must be generated separately, before and after the left and right subtrees
V
Infix form
*
algebraic expressions are usually written with binary operators (like +, - and *), constants (numerals) and variables
*
when we write expressions, we can use parentheses to show structure
V
we leave excess parentheses out according to the usual "order of operations"
*
multiplication and division have higher precedence than addition and subtraction
*
when it matters, we associate to the left: a+b+c really means (a+b)+c, even though the value is unaffected by the order
*
… but sometimes we need to use parentheses, when the order does not match our intent
*
we can always put extra parentheses in for emphasis
*
this usual way of writing expressions is called infix form
V
Postfix form
*
an alternative style is to put the operators after their arguments: 2 3 +
*
in this case, called postfix form, we never need parentheses
*
postfix form is based on the ideas of Polish logicians and is sometimes called reverse Polish notation (prefix form is the one usually called Polish notation)
V
Evaluating postfix
V
we can use a stack to evaluate (variable-free) postfix forms:
*
proceed through the expression from left to right
*
for a constant, push it on the stack
*
for an operator, pop off two arguments and apply the operator, then push the result back on
*
be careful of the argument order: first one popped is the right argument!
*
when done, we should always have exactly one number on the stack (and have never failed to pop 2)
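*
a sketch of the stack-based evaluator for well-formed, space-separated postfix expressions (the class and method names are illustrative):

    import java.util.ArrayDeque;
    import java.util.Deque;

    public class Postfix {
        // evaluates a well-formed, space-separated postfix expression, e.g. "2 3 + 4 *"
        static int eval(String expr) {
            Deque<Integer> stack = new ArrayDeque<>();
            for (String tok : expr.split(" ")) {
                if (tok.matches("-?\\d+")) {
                    stack.push(Integer.parseInt(tok));    // constants are pushed
                } else {
                    int right = stack.pop();              // first one popped is the RIGHT argument
                    int left = stack.pop();
                    switch (tok) {
                        case "+": stack.push(left + right); break;
                        case "-": stack.push(left - right); break;
                        case "*": stack.push(left * right); break;
                        case "/": stack.push(left / right); break;
                        default: throw new IllegalArgumentException("bad token: " + tok);
                    }
                }
            }
            return stack.pop();   // exactly one number should remain
        }

        public static void main(String[] args) {
            System.out.println(eval("2 3 + 4 *"));   // prints 20
        }
    }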
V
Trees
*
See Chapter 4 of the text, especially sections 4.1-4.4 and 4.6 (peruse also splay trees in 4.5)
V
Tree definitions and terminology
*
A tree is a collection structure which can hold elements in positions called nodes;
a tree is a linked structure, but unlike a linked list, each node may have several linked nodes
*
nodes: the “places” where data items are kept in the tree
*
links: the connections between nodes (represented as references/“pointers” in Java)
*
path: a sequence of links between nodes (and the nodes along the way, depending on context)
*
root: the top node of a tree
*
sub-tree: a node in a tree, together with all its descendants, considered as a tree in its own right
*
children: the nodes immediately below a given node (similarly for grandchildren, descendants, etc.)
*
parent: the node immediately above a given node (similarly for grandparent, ancestors, etc.)
*
height: the number of links between the root of a tree and its furthest descendant
*
internal nodes: nodes which have at least one child
*
leaf or external node: nodes which have no children (or all null children, in a Java implementation)
*
fringe: the set of all leaves of a tree, considered as a sequence
*
ordered versus unordered: we usually, but not necessarily, think of the children of a node as ordered, from left to right, say
*
sharing and cycles: in a proper tree, no node has two parents (no sharing) and no links go back “up” from a node to one of its ancestors (or cousins, etc.): such paths would be called cycles
*
in a graph, by contrast, nodes are connected by links in arbitrary ways: possibly no root, possibly sharing and cycles, possibly even multiple sets of disconnected nodes (sub-graphs)
V
General trees
*
in a general tree, any node may have an arbitrary number of children
*
we could represent children in a general tree using any (ordered) collection, but a linked list might be best
*
we can also use a first-child/next-sibling representation, which shows that a binary-tree structure can be used to represent a general tree (in this case the “next sibling” links form a kind of linked list)
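*
a tiny sketch of a first-child/next-sibling node (the class name is hypothetical):

    class GTNode {
        String element;
        GTNode firstChild;    // leftmost child (the root of a list of children)
        GTNode nextSibling;   // next child of the same parent; these links form a linked list
    }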
V
Binary trees
*
when every node in a tree has at most two children and they are distinguished as left- and right-children
*
in other words, when every node has a left and right child, either of which may be null
V
Binary search trees
V
a binary search tree is a binary tree whose elements support some ordering, and where:
*
all the nodes to the left of the root hold values that are less than the one in the root
*
all the nodes to the right of the root hold values that are greater than the one in the root
*
similarly for all sub-trees (i.e., same rule throughout the tree)
*
equal values can be handled in several ways (keep counts in nodes, or just ignore if duplication is not important)
*
binary search trees (BSTs) implement the Dictionary<K,V> interface, allowing values to be looked up by key
V
insertion and lookup in a BST can be very efficient (O(log n)) if the tree is sufficiently “bushy”:
*
compare value being inserted (or sought) to the element in the root, go left or right as appropriate
*
height of a bushy tree is O(log n) where n is the total number of elements
*
in the worst case, successive insertion of items already in order (or in reverse order) will result in a long, skinny, “tilted” tree with O(n) lookups
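*
a recursive sketch for int elements, with duplicates ignored (BSTNode and the method names are illustrative, not the textbook's code):

    class BSTNode {
        int key;
        BSTNode left, right;
        BSTNode(int key) { this.key = key; }
    }

    // returns the (possibly new) root of the sub-tree after insertion
    static BSTNode insert(BSTNode root, int key) {
        if (root == null) return new BSTNode(key);          // found the insertion point
        if (key < root.key) root.left = insert(root.left, key);
        else if (key > root.key) root.right = insert(root.right, key);
        return root;                                        // duplicates fall through unchanged
    }

    static boolean contains(BSTNode root, int key) {
        if (root == null) return false;
        if (key < root.key) return contains(root.left, key);    // go left ...
        if (key > root.key) return contains(root.right, key);   // ... or right, as appropriate
        return true;
    }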
V
Deletion from BSTs
V
deletion from a BST depends on some cases:
*
deletion of a leaf is easy: just set the reference in the parent to null
*
deletion of a node with a single child is almost as easy: just re-set the reference in the parent to circumvent the deleted node
*
if a node has two children, look for the “adjacent node” (next smaller or next larger) in the tree;
this will be in the “inside corner” of one of the sub-trees (the extreme left of the right sub-tree, or the extreme right of the left sub-tree);
replace the element in the deleted node with that from the inside-corner node, then delete that node (which is easier, since it has at most one child)
V
Tree traversals
*
a tree traversal is a process visiting all the nodes (or elements thereof) in a tree
*
typical traversals are implemented recursively: we “visit” the node and recursively traverse the sub-trees, in some order
V
by varying the relative order of node visitation and sub-tree traversal, we get different standard orders:
*
left-right pre-order: visit node, traverse left, traverse right
*
left-right inorder: traverse left, visit node, traverse right
*
left-right postorder: traverse left, traverse right, visit node
*
… and so on for the three right-to-left variants (6 possible permutations altogether)
*
if we want to implement the Iterator interface, we can’t use recursion: use an explicit stack instead
*
if we substitute a queue for the stack, we get breadth-first traversals (also known as level-order)
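*
two of the recursive traversals, reusing the hypothetical BSTNode class from the sketch above:

    static void preOrder(BSTNode t) {
        if (t == null) return;
        System.out.println(t.key);   // visit the node first ...
        preOrder(t.left);            // ... then traverse left ...
        preOrder(t.right);           // ... then right
    }

    static void inOrder(BSTNode t) {
        if (t == null) return;
        inOrder(t.left);
        System.out.println(t.key);   // visiting between sub-trees yields sorted order for a BST
        inOrder(t.right);
    }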
V
Balanced trees (AVL)
*
since “bushy” or balanced trees are important to make BSTs efficient, it is worth considering how to keep trees balanced
*
many possible definitions of balance, but it is not enough that the tree “balances” around the root like a coat hanger
*
for AVL trees, the difference in height between sub-trees must be at most one (throughout the tree)
*
for the exam, you should be able to recognize this property of a tree
*
when inserting a node, an AVL tree may get out of balance: we correct this by performing rotations
*
there are several cases to consider, with symmetries, some requiring single and some requiring double rotations
*
see the textbook pages 123-136 for details, examples and pictures
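*
as one illustrative case, a single right rotation for the left-left situation, reusing the hypothetical BSTNode class above (height bookkeeping and the other cases are omitted; see the textbook for the full algorithm):

    // k2 is the out-of-balance node; returns the new root of this sub-tree
    static BSTNode rotateRight(BSTNode k2) {
        BSTNode k1 = k2.left;
        k2.left = k1.right;   // k1's right sub-tree moves across to k2
        k1.right = k2;        // the old root becomes the right child
        return k1;            // k1 is the new root; re-attach it in the parent
    }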