CS 241 final review notes 2019

CS 241: Data Structures—Final Exam Review Notes

Important notes!

click on the triangles to the left to expand or contract sections!

use the text size feature in your browser to make this text larger and more readable!

About the exam

the final exam will take place on Monday, May 13, 2019, from 8-11 am, in the usual lecture room (Ford 202)

the exam will be designed so that you should be able to complete it in one-and-a-half to two hours

you will not have access to books, notes, computers, networks or neighbors during the exam

(you may use scratch paper and it will be supplied if needed; you are encouraged to ask your instructor for clarifications during the exam if necessary)

In preparing for the exam, you should consult:

your textbook (see especially chapters and sections mentioned below)

your notes from lecture and lab—these are likely to be especially helpful

the home page for the course and various links, examples, etc., provided there

… and also especially the lab assignments linked from the home page, as well as your own programs

the sample quizzes on the homepage

Format: the exam will consist of several kinds of questions; typically:

10-15 True-or-False questions at 1-2 points each

8-10 short answer questions at 4-6 points each

2-4 longer answer questions at 8-15 points each

These review notes are not exhaustive, but are meant as a guide to coverage—see other sources for details!

The final exam is comprehensive!

But there will naturally be some emphasis on material from after the midterm

For more information on the midterm, see the Midterm Review Notes

… but here are the main subject areas again (with book chapters) for the midterm material:

General background to Data Structures and modularity

Chapter 1, section 1.1; also Chapter 3, section 3.1

Math review and recursion

Chapter 1, sections 1.2-1.3

Java review and new features

Chapter 1, sections 1.4-1.5

Review—see external notes!

the Java ontology page linked from the home page

Algorithm analysis and asymptotics (O notation)

Chapter X, sections y-z

Arrays, ArrayLists and linked lists

Chapter 3, sections 3.2-3.5

Stacks and Queues

Chapter 3, sections 3.6-3.7

Infix and Postfix expressions

Chapter 3, sections 3.6.3

Trees

See Chapter 4 of the text, especially sections 4.1-4.4 and 4.6 (peruse also splay trees in 4.5)

Tree definitions and terminology

A tree is a collection structure which can hold elements in positions called nodes;

a tree is a linked structure, but unlike a linked list, each node may have several linked nodes

nodes: the “places” where data items are kept in the tree

links: the connections between nodes (represented as references/“pointers” in Java)

path: a sequence of links between nodes (and the nodes along the way, depending on context)

root: the top node of a tree

sub-tree: a node in a tree considered as a root of its own tree

children: the nodes immediately below a given node (similarly for grandchildren, descendants, etc.)

parent: the node immediately above a given node (similarly for grandparent, ancestors, etc.)

height: the number of links between the root of a tree and its furthest descendant

internal nodes: nodes which have a child

leaf or external node: nodes which have no children (or all null children, in a Java implementation)

fringe: the set of all leaves of a tree, considered as a sequence

ordered versus unordered: we usually, but not necessarily, think of the children of a node as ordered, from left to right, say

sharing and cycles: in a proper tree, no node had two parents (no sharing) and no links go back “up” from a node to one of its ancestors (or cousins, etc.): such paths would be called cycles

in a graph, by contrast, nodes are connected by links in arbitrary ways: possibly no root, possibly sharing and cycles, possibly even multiple sets of disconnected nodes (sub-graphs)

General trees

in a general tree, any node may have an arbitrary number of children

we could represent children in a general tree using any (ordered) collection, but a linked list might be best

we can also use a first-child/next-sibling representation, which shows that a binary-tree structure can be used to represent a general tree (in this case the “next sibling” links form a kind of linked list)

Binary trees

when every node in a tree has at most two children and they are distinguished as left- and right-children

in other words, when every node has a left and right child, either of which may be null

Expression trees

since arithmetic is built mostly out of binary operators, we can represent arithmetic expressions as

binary trees, where the internal nodes hold operators and the leaves are numeric constants (literals)

expressions can be written in various systematic styles: infix with parentheses (the usual), postfix or prefix

postfix puts the arguments first, then the operator afterwards (post-position); prefix is the opposite

postfix and prefix forms never need parentheses

you should be able to convert between different forms of expressions for the exam!!

various tree traversals (see below) yield similar representations of expressions

except for the parentheses of inorder/infix, which must be generated separately, before and after the left and right subtrees

Tree traversals

a tree traversal is a process visiting all the nodes (or elements thereof) in a tree

typical traversals are implemented recursively: we “visit” the node and recursively traverse the sub-trees, in some order

by varying the relative order of node visitation and sub-tree traversal, we get different standard orders:

left-right pre-order: visit node, traverse left, traverse right

left-right inorder: traverse left, visit node, traverse right

left-right postorder: traverse left, traverse right, visit node

… and so on, for three others but right-to-left (6 possible permutations all together)

if we want to implement the Iterator interface, we can’t use recursion: use an explicit stack instead

if we substitute a queue for the stack, we get breadth-first traversals (also known as level-order)

Binary search trees

a binary search tree is a binary tree whose elements support some ordering, and where:

all the nodes to the left of the root hold values that are less than the the one in the root

all the nodes to the right of the root hold values that are greater than the the one in the root

similarly for all sub-trees (i.e., same rule throughout the tree)

equal values can be handled in several ways (keep counts in nodes, or just ignore if duplication is not important)

binary search trees (BSTs) implement the Dictionary<K,V> interface, allowing values to be looked up by key

insertion and lookup in a BST can be very efficient (O(log n)) if the tree is sufficiently “bushy”:

compare value being inserted (or sought) to the element in the root, go left or right as appropriate

height of a bushy tree is O(log n) where n is the total number of elements

int the worst case, successive insertion of items already in order (or in reverse order) will result in a long, skinny, “tilted” tree with O(n) lookups

Deletion from BSTs

deletion from a BST depends on some cases:

deletion of a leaf is easy: just set the reference in the parent to null

deletion of a node with a single child is almost as easy: just re-set the reference int he parent to circumvent the deleted node

if a node has two children, look for the “adjacent node” (next smaller or next larger) in the tree;

this will be in the “inside corner” of one of the sub-trees (extreme left of right sub-tree or vice versa);

replace the element in the deleted node with that from the insider corner node, then delete that node (which is easier)

Balanced trees (AVL)

since “bushy” or balanced trees are important to make BSTs efficient, it is worth considering how to keep trees balanced

many possible definitions of balance, but it is not enough that the tree “balances” around the root like a coat hanger

for AVL trees, the difference in height between sub-trees must be at most one (throughout the tree)

for the exam, you should be able to recognize this property of a tree

when inserting a node, an AVL tree may get out of balance: we correct this by performing rotations

there are several cases to consider, with symmetries, some requiring single and some requiring double rotations

see the textbook pages 123-136 for details, examples and pictures

Splay trees

See text section 4.5 on splay trees

a splay tree is a binary search tree without a special notion of balance

instead, we always bring the accessed item to the root (top) when we search for something

the splaying operation, which brings the item to the root, is similar to a series of tree rotations along the path

the splaying operation takes O(log n) time, amortized over many operations

splay trees provide an amortized good behavior: over many operations, the average is guaranteed

you are not responsible for the details of the “zig-zig” or “zig-zag” cases of the splay operation

Other kinds of trees

there are many other kinds of trees that maintain balance using rotations: red-black trees, B-trees, 2-3 trees, etc.

you don’t need to know the details of any other kinds of trees, but should be aware of the general notion of tree, and that there are many variations

Graphs

we did not study graphs in detail, but remember that they are a more generalized version of trees

a graph consists of a collection of nodes and edges between them

a graph need not be completely connected, have a root, or be “downward”-directed

a tree is a: rooted, connected, directed graph, often with ordered children, but with no sharing or cycle

Priority queues and binary heaps

See Chapter 6 of the text (esp. 6.1through 6.4)

Abstraction levels

a priority queue is an ADT (abstract data type) supporting insertion of prioritized items, and removal of the highest priority item

(priority queues based on integer orderings can either be based on minimum or maximum items as “high priority”)

a binary heap is an implementation idea for priority queues using left-complete binary trees

a binary heap may be implemented in an array, giving especially fast access to the children and parents of a “node”

Linear implementations

there are obvious, but ultimately naïve, implementations of a priority queue using a linked list (say)

like a normal queue, we add to one end (the tail of the list) and remove from the other

we can either choose to honor the priority ordering on:

the way in (search for proper position on insertion)

or the way our (search for highest priority item on removal)

either way, the naïve implementations lead to an O(1) versus O(n) split for insertion and removal

(in other words, one of the two operations will be especially expensive in time)

the binary heap, by contrast, provides both operations in O(log n) time: faster and more consistent

Binary heaps (as trees)

the binary heap implementation has both a structural aspect and an ordering aspect

we can store the items in a left-complete binary tree (filled up all the way to the bottom row and then from the left)

the heap order: the values in the descendants of every node are lower priority than the value in the node

NOTE: there is no left/right aspect; as long as all descendants are lower priority, the heap order is satisfied

this is a “looser” order than in a binary search tree, whence the name “heap”

for both operations: restore structure first, then heap order

to insert: add at next position in bottom row (last in level-order traversal), then “bubble up” to right place

to remove(Min): take from root, then move last place item to root & bubble down

both operations take O(log n) time (with good constants) since the tree is guaranteed to be a minimum height

in both cases, we can save even more time by swapping “holes” rather than actual value

(this saves time because we make only one re-assignment per move until the end)

to build the original heap (buildHeap): start bubble up each node starting from first with child (in reverse level-order traversal)

Binary heap in an array

we can avoid allocating actual tree nodes by making a left-complete binary tree in an array

store the items in positions indexed starting from 1 in level-order

every node’s parent is in index n/2 (rounding down)

every node’s children are in indices 2n and 2n+1

these operations (involving multiplication and division by 2) can be implemented very quickly as bit shifts

Applications of priority queues

printer queues: in the old days, printers were shared by so many that priority was used to determine printing order

priority queues are used as “internal” structures in the implementation of many advanced algorithms (esp. for graphs)

sorting! (priority queues are used in the so-called “heapsort” algorithm)

Other variations on priority queues and heaps

you are not responsible for the exam for variations, but you should know there are many (d-heaps, leftist heaps, binomial queues, fibonacci heaps, etc.)

most variations support extra operations well, such as merging two priority queues

Hashing and hash tables

See Chapter 5 of the text, especially sections 5.1-5.4

Introduction

Dictionary interface: look values up by key

often the key is a field in a larger value object

can be implemented in O(log n) time using a BST (see above)

… but it would be better to have O(1)!

solution: keep items in an array

but under what index?

basic idea: chop up and mash around the keys (i.e., as numbers, using their bits)

we want a “messy”, random-seeming (but repeatable!) process which distributes keys well

unfortunately, it may happen that two keys collide under the hash function

use the modulus operator (%) to limit the hash results to the table size (-1)

what do we want from hash function?

speed!

coverage: all slots (not limited in magnitude or ability to hit), even distribution

lack of collisions (as much as possible)

collision resolution strategies

separate chaining: use a list of items hanging off each array entry

internal addressing/probing: try to insert at nearby slots if the preferred one is filled

… but there are problems due to clustering (“clumps” of filled entries, which tend to grow in size)

linear probing: try successive entries (but primary clustering)

quadratic probing: square the “attempt number” to get an offset (but secondary clustering)

best bets for hashing

well-chosen table size (about twice as much as there is data, and use a prime number)

use a well-distributed hash function

Sorting

See Chapter 7 of the text, especially sections 7.1-7.8

Introduction to sorting

one of the most common operations

ordered data supports binary search (fast!)

comparison interfaces in Java

Comparable—internal/“natural”: this.compareTo(that)

Comparator—external: compare(this, that)

both return an integer: negative (this is less), zero, positive

Naïve sorting

selection sort: next ordered item selected and added physically next in the result

insertion sort: next physical item inserted into proper place in the ordered result

bubble sort: repeatedly swap items, “bubbling up” to the boundary

average- vs. best- vs. worst-case

time bounds: O(n2)

Better sorting

tree sort: put items into a binary search tree, then traverse

heap sort: put items into a heap, then pull them all out (with repeated deleteMin)

quick sort: split items using pivot, then recursively sort the two parts

merge sort: split items into tow halves, recursively sort, then merge back together

time bounds: O(n log n) in average case (but can be bad for sorted data)

hard bounds on comparison-based sorting: no comparison-based sort can do better than O(n log n)

A rational reconstruction of (many) sorts

hard-easy division: either hard-split/easy-join or easy-split/hard-join (you have to do the work somewhere)

one-many versus half-and-half split: the latter is better, since we get the power of divide-and conquer (logarithmic times)

see the Visual Guide to Sorting Algorithms