V
CS 241: Data Structures—Final Exam Review—Fall 2020
V
Important notes!
*
click on the triangles to the left to expand or contract sections!
*
use the text size feature in your browser to make this text larger and more readable!
V
About the exam
*
the final exam will take place on Thursday, December 3rd, from 2-5pm
*
the exam will be held by a combination of email (to send the exam) and Zoom link (for proctoring/questions)
*
the exam will be designed so that you should be able to complete it in less than the three hours allotted
*
you will not have access to books, notes, computers (except to Zoom), networks, or friends during the exam
(you are encouraged to ask your instructor for clarifications during the exam if necessary)
V
In preparing for the exam, you should consult:
*
your textbook (see especially chapters and sections mentioned below)
*
your notes from lecture and lab—these are likely to be especially helpful
*
the home page for the course and various links, examples, etc., provided there
*
… and also especially the lab assignments linked from the home page, as well as your own programs
V
Format: the exam will consist of several kinds of questions; typically:
*
10-15 True-or-False questions at 1-2 points each
*
8-10 short answer questions at 4-6 points each
*
2-4 longer answer questions at 8-15 points each
*
These review notes are not exhaustive, but are meant as a guide to coverage—see other sources for details!
V
The final exam is comprehensive!
*
But there will naturally be some emphasis on material from after the midterm
*
For more information on the midterm, see the Midterm Review Notes
V
… but here are the main subject areas again (with book chapters) for the midterm material:
*
Course introduction, basic concepts [see textbook Sections 1.1-1.3]
*
Modularity, abstraction, and reuse [see textbook Section 1.3.2 and also those for objects below]
*
Review of Python [see textbook Section 1.4]
*
Classes and methods in Python [see textbook Section 1.4.6]
*
Algorithm analysis and Big-O notation [see textbook Section 2.2]
*
Efficiency of built-in Python data structures [see textbook Section 2.3]
*
Stacks and (linear) abstract data types [see textbook Sections 3.1-3.3]
*
Expressions, bracketing, and conversion [see textbook Sections 3.3.4-3.3.7]
*
Numbers, numeral bases, and conversion [see textbook Section 3.3.6]
V
Linked lists, queues, and deques [see textbook Sections 3.4-3.6]
V
Linked lists
V
linked lists are based on nodes with references
*
a node is an object which holds some data and also some references to other objects
*
remember that when we change an object, all the references to it “see” the changes (aliasing)
*
rather than Python’s built-in lists (based on resizable arrays), we can “roll our own” lists
*
array-based lists allow for fast indexing (O(1) time)
*
… but arrays must be reallocated and copied whenever they run out of room
*
linked lists can be extended or trimmed in O(1) time
*
… but traversing to an index, or searching for an item, takes O(n) time
*
the run-time behavior of the two kinds of lists can be different for some operations
V
we also usually have a container class of objects representing the list (or other structure) itself
*
for lists, we might have an object with a head reference, or perhaps both head and tail references
*
we can also keep useful information such as the size of the list, a boolean for emptiness, etc.
(these should be updated as necessary whenever the list is)
V
for linked lists, each node also holds a reference to the next node in the list
*
note that “local” changes to a few pointers are still O(1) (since they don’t affect/inspect the whole list, or (pro)portions thereof)
*
for circularly-linked lists, the final item points to the first (rather than to None)
*
for doubly-linked lists, there is also a previous reference
*
keeping all these links straight (coherent) can be difficult and error-prone
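*
to make this concrete, here is a minimal sketch of a node class and a container class for a singly-linked list (names are illustrative, not the textbook’s):
    class Node:
        """One node: holds a data item and a reference to the next node."""
        def __init__(self, data):
            self.data = data
            self.next = None            # None marks the end of the list

    class LinkedList:
        """Container object: tracks head, tail, and size of the whole list."""
        def __init__(self):
            self.head = None
            self.tail = None
            self.size = 0               # must be updated on every change

        def is_empty(self):
            return self.size == 0

        def prepend(self, data):
            """O(1): just a few 'local' pointer changes, however long the list."""
            node = Node(data)
            node.next = self.head
            self.head = node
            if self.tail is None:       # list was empty: new node is also the tail
                self.tail = node
            self.size += 1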
V
Queues
*
a queue supports insertion (add or enqueue) and deletion (remove or dequeue), but with a specific “discipline”
V
add and remove behave in a FIFO way: first in, first out
*
(think of a line of people waiting at the bank or grocery store)
V
applications of queues:
*
round robin allocation of resources (fair scheduling)
*
holding a buffer of items that cannot be processed (and released) as quickly as they arrive
*
traversing the nodes of a tree in breadth-first order (i.e., by levels)
V
implementing a queue with a (linked) list
*
keep pointer to first and last nodes
*
add at the last (tail) end; remove from the first (head) end (the other way around won’t work as well: removing at the tail of a singly-linked list takes O(n) time to find the new last node)
*
the empty queue is a special case
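*
a sketch of this implementation (reusing the Node class from the linked-list sketch above; the empty-queue checks are the special case just mentioned):
    class Queue:
        """FIFO queue backed by a singly-linked list with first/last pointers."""
        def __init__(self):
            self.first = None
            self.last = None

        def enqueue(self, data):
            node = Node(data)
            if self.last is None:       # special case: the queue is empty
                self.first = self.last = node
            else:
                self.last.next = node   # add at the last end ...
                self.last = node

        def dequeue(self):
            if self.first is None:
                raise IndexError("dequeue from empty queue")
            data = self.first.data      # ... remove from the first
            self.first = self.first.next
            if self.first is None:      # the queue just became empty
                self.last = None
            return data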
V
Deques
*
deques (pronounced like “decks”) are doubly-ended queues
*
we can add or remove (enqueue or dequeue) from either end of the structure
*
no specific implementations will be referenced on the exam
V
Recursion [see textbook Chapter 4]
*
an algorithm is recursive if it refers to itself (calls itself) in its own definition
V
recursion can be concise and clear, but watch out for:
*
multiple recursive calls (as in the Fibonacci example)
*
a missing base case or lack of progress, each of which can make the recursion go forever
V
recursion is often an alternative to iteration (loops)
*
for non-linear structures (e.g., trees), we may need an auxiliary stack to hold pending nodes while we iterate
V
recursion is implemented using a stack, one item (called a stack frame) for each method call
*
(actually, all method calls are implemented this way, partly out of anticipation of recursion)
*
it can help to visualize recursion by way of a tree of recursive calls
V
examples of recursion
*
Towers of Hanoi
*
maze search algorithm
*
fractals in graphics (e.g., fractal trees)
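*
for instance, Towers of Hanoi fits in a few lines (the peg names here are arbitrary):
    def hanoi(n, source, target, spare):
        """Move n disks from source to target; note the two recursive calls."""
        if n == 0:                      # base case: nothing to move
            return
        hanoi(n - 1, source, spare, target)   # move n-1 disks out of the way
        print("move disk", n, "from", source, "to", target)
        hanoi(n - 1, spare, target, source)   # move them back on top

    hanoi(3, "A", "C", "B")             # prints the 7 moves for 3 disks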
V
Dynamic programming [see textbook Section 4.7]
*
recursion often makes excessive calls by calling repeatedly with the same argument
*
rather than re-calling the function/method (which can be expensive), we can store the results
*
a basic approach, called memoization, just saves the results in a structure (like a Python list or dict)
*
dynamic programming is a more advanced approach in which we calculate all results from the bottom up
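*
a sketch of both approaches for Fibonacci (the classic example of excessive recursive calls):
    def fib_memo(n, cache=None):
        """Memoization: save results in a dict so each n is computed only once."""
        if cache is None:
            cache = {}
        if n < 2:
            return n
        if n not in cache:
            cache[n] = fib_memo(n - 1, cache) + fib_memo(n - 2, cache)
        return cache[n]

    def fib_dp(n):
        """Dynamic programming: build all the results from the bottom up."""
        table = [0, 1]                  # results for 0 and 1
        for i in range(2, n + 1):
            table.append(table[i - 1] + table[i - 2])
        return table[n]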
V
Searching and hashing [see textbook Sections 5.1-5.2]
V
Searching in lists
*
unordered lists: searching can take O(n) in the expected/average and worst cases
*
ordered lists: searching can be reduced to O(log n) using a divide-and-conquer strategy (binary search)
*
linked lists (as opposed to Python’s built-in, array-based lists) do not support divide-and-conquer (no fast indexing)
*
(note that making them doubly-linked doesn’t help)
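*
a sketch of the divide-and-conquer strategy as an iterative binary search (on a sorted, array-based list):
    def binary_search(items, target):
        """Return an index of target in sorted items, or -1: O(log n) steps."""
        low, high = 0, len(items) - 1
        while low <= high:
            mid = (low + high) // 2     # O(1) indexing is what linked lists lack
            if items[mid] == target:
                return mid
            elif items[mid] < target:
                low = mid + 1           # discard the lower half
            else:
                high = mid - 1          # discard the upper half
        return -1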
V
Hashing and hash tables
V
Introduction
*
basic idea: chop up and mash around the keys (i.e., as numbers, using their bits)
*
we want a “messy”, random-seeming (but repeatable!) process which distributes keys well
*
solution: keep items in an array … but under what index?
*
use the modulus operator (%) to reduce a hash result to a valid index (0 through table size - 1)
*
unfortunately, it may happen that two keys collide under the hash function
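*
in Python terms (using the built-in hash function; the table size is an arbitrary example):
    table_size = 11                     # a prime, larger than the expected data
    slot = hash("some key") % table_size    # always in the range 0 .. table_size - 1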
V
Dictionary interface: look values up by key
*
often the key is a field in a larger value object
V
what do we want from hash function?
*
speed!
*
coverage: the function should be able to reach all slots (not limited in magnitude or in which slots it can hit), with an even distribution
*
lack of collisions (as much as possible)
V
collision resolution strategies
*
open addressing/probing: try to insert at nearby slots if the preferred one is filled
*
… but there are problems due to clustering (“clumps” of filled entries, which tend to grow in size)
*
linear probing: try successive entries (but primary clustering)
*
quadratic probing: square the “attempt number” to get an offset (but secondary clustering)
*
separate chaining: use a list of items hanging off each array entry
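*
a minimal separate-chaining sketch (class and method names are illustrative, following the dictionary interface above):
    class ChainedHashTable:
        """Each slot holds a list (a "chain") of (key, value) pairs."""
        def __init__(self, size=11):    # a prime table size helps distribution
            self.slots = [[] for _ in range(size)]

        def put(self, key, value):
            chain = self.slots[hash(key) % len(self.slots)]
            for i, (k, _) in enumerate(chain):
                if k == key:            # key already present: replace its value
                    chain[i] = (key, value)
                    return
            chain.append((key, value))  # colliding keys just extend the chain

        def get(self, key):
            chain = self.slots[hash(key) % len(self.slots)]
            for k, v in chain:
                if k == key:
                    return v
            raise KeyError(key)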
V
best bets for hashing
*
well-chosen table size (about twice the amount of data, and preferably a prime number)
*
use a well-distributed hash function
V
Sorting [see textbook Section 5.3]
V
Introduction to sorting
*
one of the most common operations
*
ordered data often supports binary search (fast!)
*
requires some notion of ordering: compare two items for being {less, equal, greater}
V
Naïve sorting
*
time bounds: O(n²)
*
insertion sort: next physical item inserted into proper place in the ordered result
*
selection sort: next ordered item selected and added physically next in the result
*
bubble sort: repeatedly swap adjacent out-of-order items, “bubbling” the largest up to the boundary of the sorted part
*
average- vs. best- vs. worst-case
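*
a sketch of insertion sort, for concreteness (in-place, on a Python list):
    def insertion_sort(items):
        """O(n²) in the average and worst cases, but O(n) on sorted input."""
        for i in range(1, len(items)):
            current = items[i]          # next physical item to place
            j = i - 1
            while j >= 0 and items[j] > current:
                items[j + 1] = items[j] # shift larger items one slot right
                j -= 1
            items[j + 1] = current      # insert into its proper place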
V
Better sorting
*
quick sort: split items using pivot, then recursively sort the two parts
*
time bounds: O(n log n) in the average case (but O(n²) for already-sorted data with a naïve pivot choice)
*
hard bounds on comparison-based sorting: no comparison-based sort can do better than O(n log n)
*
merge sort: split items into two halves, recursively sort, then merge back together
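*
a sketch of merge sort (an easy split and a hard join: the merge does the real work):
    def merge_sort(items):
        """Return a new sorted list: O(n log n) in all cases."""
        if len(items) <= 1:             # base case: already sorted
            return items
        mid = len(items) // 2
        left = merge_sort(items[:mid])  # recursively sort the two halves
        right = merge_sort(items[mid:])
        merged, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:     # repeatedly take the smaller front item
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
        return merged + left[i:] + right[j:]   # append whatever remains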
V
A rational reconstruction of (many) sorts
*
hard-easy division: either hard-split/easy-join or easy-split/hard-join (you have to do the work somewhere)
*
one-many versus half-and-half split: the latter is better, since we get the power of divide-and-conquer (logarithmic times)
V
Trees [see textbook Sections 6.1-6.4, and 6.5.2]
V
Tree definitions and terminology
*
A tree is a collection structure which can hold elements in positions called nodes;
a tree is a linked structure, but unlike a linked list, each node may have several linked nodes
*
nodes: the “places” where data items are kept in the tree
*
links: the connections between nodes (represented as references/“pointers” in Python)
*
root: the top node of a tree
*
parent: the node immediately above a given node (similarly for grandparent, ancestors, etc.)
*
children: the nodes immediately below a given node (similarly for grandchildren, descendants, etc.)
*
leaf or external node: nodes which have no children (or all None children, in a Python implementation)
*
internal nodes: nodes which have at least one child
*
path: a sequence of links between nodes (and the nodes along the way, depending on context)
*
height: the number of links between the root of a tree and its furthest descendant
*
sub-tree: a node in a tree considered as a root of its own tree
*
fringe: the set of all leaves of a tree, considered as a sequence
*
ordered versus unordered: we usually, but not necessarily, think of the children of a node as ordered, from left to right, say
*
sharing and cycles: in a proper tree, no node has two parents (no sharing) and no links go back “up” from a node to one of its ancestors (or cousins, etc.): such paths would be called cycles
*
in a graph, by contrast, nodes are connected by links in arbitrary ways: possibly no root, possibly sharing and cycles, possibly even multiple sets of disconnected nodes (sub-graphs)
V
General trees
*
in a general tree, any node may have an arbitrary number of children
*
we could represent children in a general tree using any (ordered) collection, but a linked list might be best
*
we can also use a first-child/next-sibling representation, which shows that a binary-tree structure can be used to represent a general tree (in this case the “next sibling” links form a kind of linked list)
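*
a sketch of the first-child/next-sibling representation (names are illustrative):
    class GeneralTreeNode:
        """Binary-shaped node that can represent a node of a general tree."""
        def __init__(self, data):
            self.data = data
            self.first_child = None     # link “down” to the leftmost child
            self.next_sibling = None    # link “across”: siblings form a linked list

    root = GeneralTreeNode("root")      # a root with two children, "a" and "b"
    root.first_child = GeneralTreeNode("a")
    root.first_child.next_sibling = GeneralTreeNode("b")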
V
Binary trees
*
when every node in a tree has at most two children and they are distinguished as left- and right-children
*
in other words, when every node has a left and a right child, either of which may be None
V
Tree traversals
*
a tree traversal is a process visiting all the nodes (or elements thereof) in a tree
*
typical traversals are implemented recursively: we “visit” the node and recursively traverse the sub-trees, in some order
V
by varying the relative order of node visitation and sub-tree traversal, we get different standard orders:
*
left-right preorder: visit node, traverse left, traverse right
*
left-right inorder: traverse left, visit node, traverse right
*
left-right postorder: traverse left, traverse right, visit node
*
… and similarly for the right-to-left versions of each (six possible orders altogether)
*
if we don’t want to use recursion, we can use an explicit stack to hold pending nodes instead
*
if we substitute a queue for the stack (implicit or explicit), we get breadth-first traversals (also known as level-order)
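*
a sketch of a recursive inorder traversal and a queue-based level-order traversal (the node class is assumed to have data, left, and right fields):
    from collections import deque

    def inorder(node, visit):
        """Left-right inorder: traverse left, visit the node, traverse right."""
        if node is not None:
            inorder(node.left, visit)
            visit(node.data)
            inorder(node.right, visit)

    def level_order(root, visit):
        """Breadth-first (level-order): a queue holds the pending nodes."""
        pending = deque([root] if root is not None else [])
        while pending:
            node = pending.popleft()
            visit(node.data)
            if node.left is not None:
                pending.append(node.left)
            if node.right is not None:
                pending.append(node.right)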
V
Binary search trees
*
binary search trees (BSTs) implement a dictionary (map) interface, allowing values to be looked up by key
V
insertion and lookup in a BST can be very efficient (O(log n)) if the tree is sufficiently “bushy”:
*
compare value being inserted (or sought) to the element in the root, go left or right as appropriate
*
height of a bushy tree is O(log n) where n is the total number of elements
V
a binary search tree is a binary tree whose elements support some ordering, and where:
*
all the nodes to the left of the root hold values that are less than the one in the root
*
all the nodes to the right of the root hold values that are greater than the one in the root
*
similarly for all sub-trees (i.e., same rule throughout the tree)
*
equal values can be handled in several ways (keep counts in nodes, or just ignore if duplication is not important)
*
in the worst case, successive insertion of items already in order (or in reverse order) will result in a long, skinny, “tilted” tree with O(n) lookups
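*
a sketch of lookup and insertion on BST nodes (each node holds a key, a value, and left/right children; equal keys here just replace the value, one of the policies above):
    class BSTNode:
        def __init__(self, key, value):
            self.key, self.value = key, value
            self.left = None            # sub-tree of strictly smaller keys
            self.right = None           # sub-tree of strictly larger keys

    def bst_get(node, key):
        """Compare with the node's key, then go left or right: O(height)."""
        if node is None:
            raise KeyError(key)
        if key < node.key:
            return bst_get(node.left, key)
        elif key > node.key:
            return bst_get(node.right, key)
        return node.value

    def bst_put(node, key, value):
        """Insert, returning the (possibly new) root of this sub-tree."""
        if node is None:
            return BSTNode(key, value)
        if key < node.key:
            node.left = bst_put(node.left, key, value)
        elif key > node.key:
            node.right = bst_put(node.right, key, value)
        else:
            node.value = value          # equal key: replace the stored value
        return node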