midterm review notes 2019

CS 241: Data Structures—Spring 2019 Midterm Exam Review Notes

Important note!

click on the triangles to the left to expand or contract sections!

use the text size feature in your browser to make this text larger and more readable!

About the exam

the midterm exam will take place on Friday, March 22nd, at the usual lecture time and place

the exam will be designed so that you should be able to complete it in less than an hour

you will not have access to books, notes, computers, networks or neighbors during the exam

(you may use scratch paper and it will be supplied if needed; you are encouraged to ask your instructor for clarifications during the exam if necessary)

In preparing for the exam, you should consult:

your textbook (see especially chapters and sections mentioned below)

your notes from lecture and lab—these are likely to be especially helpful

the home page for the course and various links, examples, etc., provided there

… and also especially the lab assignments linked from the home page, as well as your programs

Format: the exam will consist of several kinds of questions; typically:

10-12 True-or-False questions at 1-2 points each

8-10 short answer questions at 4-6 points each

1-2 longer answer questions at 8-15 points each

These review notes are not exhaustive, but are meant as a guide to coverage—see other sources for details!

General background to Data Structures and modularity

Chapter 1, section 1.1; also Chapter 3, section 3.1

large software projects are often written and maintained by groups of people

programming language features can allow us to manage complexity by circumscribing various tasks

different classes which implement the same interface can be "plugged" and "unplugged" (replaced) at will

ideally, we unplug a slower implementation and plug in a faster one, each implementing the same interface

a common language of specification, implementation and analysis (O notation) supports communication

the use of standard techniques is valued so that people can understand and modify each other's code

standard solutions to standard problems often involve "tricks" (innovations) that would be hard to come up with on the spot

Math review and recursion

Chapter 1, sections 1.2-1.3

Functions and algorithms

a (mathematical) function is a set of input-output pairs, without regard to how they are computed

the inputs and outputs may be numbers, but also data structures, etc.

an algorithm is a technique for computing a function, step by step method, but still somewhat abstract

a program is a very specific realization of an algorithm in a particular language, with choices of variable names, etc.

these three ideas form a kind of hierarchy, with functions the most abstract and programs the most concrete

the relationships are many-to-one between algorithms and functions, and between programs and algorithms

algorithms have (abstract, O-notation style) running times; functions do not (and programs have real-world running times)

Exponents and logarithms

the logarithm is a very slowly growing function of its argument

technically, the log is the power to which you must bring the base to equal the argument number

roughly, a log in "base b" is the number of digits required to write the argument in base b

so, in base 10, the log of one thousand is 3, and of one million is 6

at the whole power of the base, there is actually one more digit, as the "odometer" turns over

logarithms obey certain mathematical laws similar to those for exponents

example: log (a·b) = log(a) + log(b)

example: log (ak) = k·log(a)

log functions sit below linear functions, and (n log n) sits between linear and squaring in the hierarchy of O-notation (see below)

O notation eliminates the need for specifying the bas of the logarithm, since it ignores constant factors

(but many logs in computing are really base 2)

Basics of recursion

an algorithm is recursive if it refers to itself (calls itself) in its own definition

recursion is often an alternative to iteration (loops)

for non-linear structures (e.g., trees), we may need an auxiliary stack to hold pending nodes while we iterate

recursion is implemented using a stack, one item for each method call

(actually, all method calls are implemented this way, partly out of anticipation of recursion)

Java review and new features

Chapter 1, sections 1.4-1.5

Review—see external notes!

the Java ontology page linked from the home page

Interfaces (and abstract classes)

signatures include the name of a method, the type of the result and the number and type of the arguments

(argument names don't matter for matching a signature)

(signatures with different result types, but the same name and argument types, conflict)

interfaces are collections of method names and signatures: no method bodies!

an interface allows us to specify a set of capabilities without (yet) specifying how they will be implemented

interfaces support a notion of contractual agreement about how a class can be accesses and used, how it will behave

but Java does not allow behavioral specification beyond signatures

one interface may be declared as extending another, so as to provide some (hierarchical) structure

a class can be declared as implementing an interface: class A implements I { ... }

but note that only the signature, not the behavior, is specified in Java or checked by the compiler!

abstract classes are "halfway between" a class and an interface

they may have some signatures and some full definitions

this allows a form of "default implementation" when a full method is defined

classes may extend abstract classes, forming a hierarchy

you may not instantiate (with new) an abstract class!

Wrapper classes

for each primitive type (int, boolean, char) there is a wrapper class (Integer, Boolean, etc.)

the wrapper class provides a kind of a "box" which holds values of the associated type

the value can be changed, but the box object remains the same

the wrapper class also supplies a "home" for static methods related to primitive types (e.g., Integer.parseInt(…))

Generics

in order to avoid the problem of Object casting, we can specify an element type for a structure

the <T> type variable is affixed to the end of the class name in declarations of the class and of variable types

once in scope, the T type variable can be used as the type of variables, etc.

see Generics page at Oracle for a tutorial or section 1.5 in the textbook

Algorithm analysis and asymptotics (O notation)

Chapter 2, sections 2.1-2.4.5

Context and goals

we want a way to describe and compare the running times of algorithms, but:

it should be insensitive to particular hardware/machines

it should emphasize the behavior as problems (or data sets) grow large

we abstract away the detail of a function (e.g., a polynomial) representing the running time

we eliminate low order terms: just keep the highest one

we eliminate constant factors (this has a downside, though)

technically, we say that f is O(g) when there is a constant factor c and an initial point n0

f(n) is less than or equal to c·g(n) for all n > n0

idea: we can pick a constant that, when multiplied by the high-order term, will dominate all low order terms

the hierarchy of O-notation functions typically seen is:

log(n) < n < n log(n) < n2 < n3 < ... < cn

how do we measure times for O-notation?

count individual operations as steps

ignore constant amounts of steps: look for loops, traversals that depend on n somehow

Watch out for this!

when f is O(g), it may be that f is less than g

Arrays, ArrayLists and linked lists

Chapter 3, sections 3.2-3.5

Arrays

see your introductory Java text for review and examples

or see the Oracle Java arrays tutorial page

ArrayLists

ArrayLists provide a number of useful methods for getting, adding and removing items from the collection

however, indiscriminate use of these features without understanding how they are implemented can lead to bad performance

in general, recall that some operations may require up to O(n) time in order to move or copy other elements

specifically, adding and removing items from the end of the ArrayList is cheap (O(1)), but adding and removing from the front is expensive (O(n))

Linked Lists

basic tasks: insertion, deletion, search, indexing, containment, size, check for emptiness

singly-linked lists: links run only in one direction (can't easily go backwards)

doubly-linked lists: links run in both directions (takes more space, more complex operations)

circularly-linked lists: single or double links, no first node

be careful of the order when re-linking list nodes on insertion and deletion!

Stacks and Queues

Chapter 3, sections 3.6-3.7

Stacks

a stack supports insertion (push) and deletion (pop), but with a specific "discipline"

push and pop behave in a LIFO way: last in, first out

(think of a stack of dinner plates on a spring at Goudy)

applications of stacks:

keeping track of pending method calls

matching parentheses (and brackets and braces) in program syntax

traversing the nodes of a tree in depth-first order (e.g., pre-order, in-order or post-order)

implementing a stack with an array

keep an index of the top item (or the next top: this is different!)

increment on add, decrement on remove

remember to check for empty (or full!)

implementing a stack with a list

keep a reference to (the node for) the top item

add a new node at the front to push a new item (set the next reference before changing the top reference)

remove a node from the front top pop an item (hold onto the item while re-setting the top reference)

Queues

a queue supports insertion (add) and deletion (remove), but with a specific "discipline"

add and remove behave in a FIFO way: first in, first out

(think of a line of people waiting at the bank or grocery store)

applications of queues:

round robin allocation of resources (fair scheduling)

holding a buffer of items that cannot be treated (and released) as quickly as they arrive

traversing the nodes of a tree in breadth-first order (i.e., by levels)

implementing a queue with an array

keep indices for the front and the back: increment the appropriate index upon add or remove

(but do the indices "point" to current or next items?)

wrap around to the front when you reach the end (use modulus operator % after addition)

note that the full part of the queue can "wrap around" from the end to the beginning of the array

keep a separate size variable to make it easier to distinguish empty from full

implementing a queue with a list

keep pointer to first and last nodes

add at the last end; remove from the first (the other way won't work as well)

the empty queue is a special case

Infix and Postfix expressions

Chapter 3, sections 3.6.3

Expression trees

since arithmetic is built mostly out of binary operators, we can represent arithmetic expressions as

binary trees, where the internal nodes hold operators and the leaves are numeric constants (literals)

expressions can be written in various systematic styles: infix with parentheses (the usual), postfix or prefix

postfix puts the arguments first, then the operator afterwards (post-position); prefix is the opposite

postfix and prefix forms never need parentheses

you should be able to convert between different forms of expressions for the exam!!

various tree traversals (see below) yield similar representations of expressions

except for the parentheses of inorder/infix, which must be generated separately, before and after the left and right subtrees

Infix form

algebraic expressions are usually written with binary operators (like +, - and *), constants (numerals) and variables

when we write expressions, we can use parentheses to show structure

we leave excess parentheses out according to the usual "order of operations"

multiplication and division have higher precedence than addition and subtraction

when it matters, we associate to the left: a+b+c really means (a+b)+c, even though the value is unaffected by the order

… but sometimes we need to use parentheses, when the order does not match our intent

we can always put extra parentheses in for emphasis

this usual way of writing expressions is called infix form

Postfix form

an alternative style is to put the operators after their arguments: 2 3 +

in this case, called postfix form, we never need parentheses

postfix form is based on the ideas of Polish logicians and is sometimes called Polish notation

Evaluating postfix

we can use a stack to evaluate (variable-free) postfix forms:

proceed through the expression from left to right

for a constant, push it on the stack

for an operator, pop off two arguments and apply the operator, then push the result back on

be careful of the argument order: first one popped is the right argument!

when done, we should always have exactly one number on the stack (and have never failed to pop 2)