Topic
V CS 241: Data Structures—Final Exam Review Notes
V Important notes!
* click on the triangles to the left to expand or contract sections!
* use the text size feature in your browser to make this text larger and more readable!
V About the exam
* the final exam will take place on Saturday, May 3, 2014, from 2-5 pm, in the usual lecture room (Ford 204)
* the exam will be designed so that you should be able to complete it in one-and-a-half to two hours
* you will not have access to books, notes, computers, networks or neighbors during the exam
(you may use scratch paper and it will be supplied if needed; you are encouraged to ask your instructor for clarifications during the exam if necessary)
V In preparing for the exam, you should consult:
* your textbook (see especially chapters and sections mentioned below)
* your notes from lecture and lab—these are likely to be especially helpful
* the home page for the course and various links, examples, etc., provided there
* … and also especially the lab assignments linked from the home page, as well as your own programs
* graded quizzes you completed during lecture
V Format: the exam will consist of several kinds of questions; typically:
* 10-15 True-or-False questions at 1-2 points each
* 8-10 short answer questions at 4-6 points each
* 2-4 longer answer questions at 8-15 points each
* These review notes are not exhaustive, but are meant as a guide to coverage—see other sources for details!
V The final exam is comprehensive!
* But there will be some emphasis on material from after the midterm
* For more information on the midterm, see the Midterm Review Notes
V … but here are the main subject areas again (with book chapters) for the midterm material:
V General background to Data Structures and modularity
* Chapter 1, section 1.1; also Chapter 3, section 3.1
V Math review and recursion
* Chapter 1, sections 1.2-1.3
V Java review and new features
* Chapter 1, sections 1.4-1.5
V Review—see external notes!
V Algorithm analysis and asymptotics (O notation)
* Chapter X, sections y-z
V Arrays, ArrayLists and linked lists
* Chapter 3, sections 3.2-3.5
V Stacks and Queues
* Chapter 3, sections 3.6-3.7
V Infix and Postfix expressions
* Chapter 3, section 3.6.3
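The classic use of a stack here is evaluating a postfix expression: push operands, and on an operator pop two operands, apply it, and push the result. A minimal sketch (class and method names are illustrative, not from the text):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Evaluate a space-separated postfix expression using a stack.
public class PostfixEval {
    public static int eval(String expr) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String tok : expr.trim().split("\\s+")) {
            switch (tok) {
                case "+": { int b = stack.pop(), a = stack.pop(); stack.push(a + b); break; }
                case "-": { int b = stack.pop(), a = stack.pop(); stack.push(a - b); break; }
                case "*": { int b = stack.pop(), a = stack.pop(); stack.push(a * b); break; }
                case "/": { int b = stack.pop(), a = stack.pop(); stack.push(a / b); break; }
                default:  stack.push(Integer.parseInt(tok));   // operand
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        // infix (3 + 4) * 2 is "3 4 + 2 *" in postfix
        System.out.println(eval("3 4 + 2 *")); // 14
    }
}
```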
V Trees
* See Chapter 4 of the text, especially sections 4.1-4.4 and 4.6
V More about trees
V Splay trees
* See text section 4.5 on splay trees
* a splay tree is a binary search tree without a special notion of balance
* instead, we always bring the accessed item to the root (top) when we search for something
* the splaying operation, which brings the item to the root, is similar to a series of tree rotations along the path
* the splaying operation takes O(log n) time, amortized over many operations
* splay trees provide good amortized behavior: over any long sequence of operations, the average cost per operation is guaranteed to be O(log n), even though a single operation may be slow
* you are not responsible for the details of the “zig-zig” or “zig-zag” cases of the splay operation
V Other kinds of trees
* there are many other kinds of trees that maintain balance using rotations: red-black trees, B-trees, 2-3 trees, etc.
* you don’t need to know the details of any other kinds of trees, but should be aware of the general notion of tree, and that there are many variations
V Graphs
* we did not study graphs in detail, but remember that they are a generalization of trees
* a graph consists of a collection of nodes and edges between them
* a graph need not be completely connected, have a root, or be “downward”-directed
* a tree is a rooted, connected, directed graph, often with ordered children, but with no sharing or cycles
V Priority queues and binary heaps
* See Chapter 6 of the text (esp. 6.1 through 6.4)
V Abstraction levels
* a priority queue is an ADT (abstract data type) supporting insertion of prioritized items, and removal of the highest priority item
* (priority queues based on integer orderings can either be based on minimum or maximum items as “high priority”)
* a binary heap is an implementation idea for priority queues using left-complete binary trees
* a binary heap may be implemented in an array, giving especially fast access to the children and parents of a “node”
V Linear implementations
* there are obvious, but ultimately naïve, implementations of a priority queue using a linked list (say)
* like a normal queue, we add to one end (the tail of the list) and remove from the other
V we can either choose to honor the priority ordering on:
* the way in (search for proper position on insertion)
* or the way out (search for the highest-priority item on removal)
V either way, the naïve implementations lead to an O(1) versus O(n) split for insertion and removal
* (in other words, one of the two operations will be especially expensive in time)
* the binary heap, by contrast, provides both operations in O(log n) time: faster and more consistent
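The O(1)/O(n) split can be seen in a naive unsorted-list priority queue: add appends in O(1), but deleteMin must scan the whole list. A sketch (names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Naive min-priority queue on an unsorted list:
// add is O(1), deleteMin is an O(n) linear scan.
public class NaivePQ {
    private final List<Integer> items = new ArrayList<>();

    public void add(int x) { items.add(x); }   // O(1): append at the tail

    public int deleteMin() {                    // O(n): scan for the minimum
        int best = 0;
        for (int i = 1; i < items.size(); i++)
            if (items.get(i) < items.get(best)) best = i;
        return items.remove(best);
    }

    public static void main(String[] args) {
        NaivePQ pq = new NaivePQ();
        pq.add(5); pq.add(1); pq.add(3);
        System.out.println(pq.deleteMin()); // 1
        System.out.println(pq.deleteMin()); // 3
    }
}
```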
V Binary heaps (as trees)
* the binary heap implementation has both a structural aspect and an ordering aspect
* we can store the items in a left-complete binary tree (filled up all the way to the bottom row and then from the left)
V the heap order: the values in the descendants of every node are lower priority than the value in the node
* NOTE: there is no left/right aspect; as long as all descendants are lower priority, the heap order is satisfied
* this is a “looser” order than in a binary search tree, whence the name “heap”
* for both operations: restore structure first, then heap order
* to insert: add at next position in bottom row (last in level-order traversal), then “bubble up” to right place
* to remove(Min): take from root, then move last place item to root & bubble down
* both operations take O(log n) time (with good constants) since the tree is guaranteed to be of minimum height
V in both cases, we can save even more time by moving a “hole” rather than swapping actual values
* (this saves time because we make only one re-assignment per move until the end)
* to build the original heap (buildHeap): bubble down each node in turn, starting from the first node with a child in reverse level-order (i.e., the last internal node) and working back to the root
V Binary heap in an array
* we can avoid allocating actual tree nodes by making a left-complete binary tree in an array
* store the items in positions indexed starting from 1 in level-order
* the parent of the node at index i is at index i/2 (rounding down)
* the children of the node at index i are at indices 2i and 2i+1
* these operations (involving multiplication and division by 2) can be implemented very quickly as bit shifts
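The mechanics above (bubble up on insert, bubble down on deleteMin, the “hole” trick, and the 1-based index arithmetic) can be sketched as a small array-based min-heap; class and method names here are illustrative, not from the text:

```java
import java.util.Arrays;

// Min-heap in an array, 1-indexed: parent of i is i/2,
// children of i are 2i and 2i+1.  a[0] is unused.
public class MinHeap {
    private int[] a = new int[8];
    private int n = 0;              // number of items

    public void insert(int x) {                 // O(log n)
        if (++n == a.length) a = Arrays.copyOf(a, 2 * a.length);
        int i = n;
        // bubble the hole up while the parent is lower priority
        while (i > 1 && x < a[i / 2]) { a[i] = a[i / 2]; i /= 2; }
        a[i] = x;
    }

    public int deleteMin() {                    // O(log n)
        int min = a[1];
        a[1] = a[n--];                          // move last item to the root
        bubbleDown(1);
        return min;
    }

    private void bubbleDown(int i) {
        int x = a[i];
        while (2 * i <= n) {
            int child = 2 * i;                  // pick the smaller child
            if (child < n && a[child + 1] < a[child]) child++;
            if (a[child] >= x) break;
            a[i] = a[child];                    // move child up into the hole
            i = child;
        }
        a[i] = x;
    }

    public static void main(String[] args) {
        MinHeap h = new MinHeap();
        for (int x : new int[]{5, 3, 8, 1}) h.insert(x);
        System.out.println(h.deleteMin()); // 1
        System.out.println(h.deleteMin()); // 3
    }
}
```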
V Applications of priority queues
* printer queues: in the old days, printers were shared by so many that priority was used to determine printing order
* priority queues are used as “internal” structures in the implementation of many advanced algorithms (esp. for graphs)
* sorting! (priority queues are used in the so-called “heapsort” algorithm)
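The heapsort idea can be sketched with the library’s own heap, java.util.PriorityQueue (a stand-in for the book’s binary heap, not the in-place heapsort of Chapter 7): build a heap from everything, then repeatedly deleteMin.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Heapsort idea: insert all items into a min-heap, then
// repeated deleteMin (poll) yields them in sorted order.
public class HeapSortDemo {
    public static List<Integer> heapSort(List<Integer> items) {
        PriorityQueue<Integer> pq = new PriorityQueue<>(items); // buildHeap
        List<Integer> sorted = new ArrayList<>();
        while (!pq.isEmpty()) sorted.add(pq.poll());            // n deleteMins
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(heapSort(Arrays.asList(5, 1, 4, 2, 3)));
        // [1, 2, 3, 4, 5]
    }
}
```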
V Other variations on priority queues and heaps
* you are not responsible on the exam for these variations, but you should know there are many (d-heaps, leftist heaps, binomial queues, Fibonacci heaps, etc.)
* most variations support extra operations well, such as merging two priority queues
V Hashing and hash tables
* See Chapter 5 of the text, especially sections 5.1-5.4
V Introduction
V Dictionary interface: look values up by key
* often the key is a field in a larger value object
* can be implemented in O(log n) time using a BST (see above)
* … but it would be better to have O(1)!
* solution: keep items in an array
* but under what index?
* basic idea: chop up and mash around the keys (i.e., as numbers, using their bits)
* we want a “messy”, random-seeming (but repeatable!) process which distributes keys well
* unfortunately, it may happen that two keys collide under the hash function
* use the modulus operator (%) to limit hash results to valid table indices (0 through table size − 1)
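In Java the compression step can be sketched like this; Math.floorMod (rather than a bare %) keeps negative hash codes in range, a detail the % above glosses over. The helper name is illustrative:

```java
// Map a key's hash code to a table index in 0 .. tableSize-1.
public class HashIndex {
    public static int indexFor(Object key, int tableSize) {
        // floorMod, unlike %, never returns a negative index
        return Math.floorMod(key.hashCode(), tableSize);
    }

    public static void main(String[] args) {
        int size = 11;  // a prime table size
        System.out.println(indexFor("apple", size));
    }
}
```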
V what do we want from a hash function?
* speed!
* coverage: every slot should be reachable (not limited in magnitude or ability to hit), with an even distribution
* lack of collisions (as much as possible)
V collision resolution strategies
* separate chaining: use a list of items hanging off each array entry
* open addressing/probing: try to insert at nearby slots if the preferred one is filled
* … but there are problems due to clustering (“clumps” of filled entries, which tend to grow in size)
* linear probing: try successive entries (but primary clustering)
* quadratic probing: square the “attempt number” to get an offset (but secondary clustering)
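Separate chaining, the first strategy above, can be sketched as a small set of strings; the class name and structure are illustrative, not the text’s implementation:

```java
import java.util.LinkedList;

// Separate chaining: each table slot holds a list of the keys
// that hash to that slot; colliding keys simply share the list.
public class ChainedSet {
    private final LinkedList<String>[] table;

    @SuppressWarnings("unchecked")
    public ChainedSet(int size) {
        table = new LinkedList[size];
        for (int i = 0; i < size; i++) table[i] = new LinkedList<>();
    }

    private int slot(String key) {
        return Math.floorMod(key.hashCode(), table.length);
    }

    public void add(String key) {
        LinkedList<String> chain = table[slot(key)];
        if (!chain.contains(key)) chain.add(key);
    }

    public boolean contains(String key) {
        return table[slot(key)].contains(key);
    }
}
```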
V best bets for hashing
* well-chosen table size (about twice the number of items, and a prime number)
* use a well-distributed hash function
V Sorting
* See Chapter 7 of the text, especially sections 7.1-7.8
V Introduction to sorting
* one of the most common operations
* ordered data supports binary search (fast!)
V comparison interfaces in Java
* Comparable—internal/“natural”: this.compareTo(that)
* Comparator—external: compare(this, that)
* both return an integer: negative (this is less), zero, positive
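The two interfaces side by side, in a sketch (the Student class and its fields are made up for illustration):

```java
import java.util.Arrays;
import java.util.Comparator;

// Comparable: the class's own ("natural") order via compareTo.
// Comparator: a separate object supplying an external order.
public class CompareDemo {
    static class Student implements Comparable<Student> {
        final String name;
        final double gpa;
        Student(String name, double gpa) { this.name = name; this.gpa = gpa; }

        // natural order: alphabetical by name
        public int compareTo(Student that) { return this.name.compareTo(that.name); }
    }

    public static void main(String[] args) {
        Student[] s = { new Student("Pat", 3.9), new Student("Alex", 3.1) };

        Arrays.sort(s);                 // uses compareTo (natural order)
        System.out.println(s[0].name);  // Alex

        // external order: by GPA, highest first
        Arrays.sort(s, Comparator.comparingDouble((Student x) -> x.gpa).reversed());
        System.out.println(s[0].name);  // Pat
    }
}
```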
V Naïve sorting
* selection sort: next ordered item selected and added physically next in the result
* insertion sort: next physical item inserted into proper place in the ordered result
* bubble sort: repeatedly swap items, “bubbling up” to the boundary
* average- vs. best- vs. worst-case
* time bounds: O(n²)
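Insertion sort, the second of the naïve sorts above, in a short sketch; it also shows the best/worst-case split, since an already-sorted array makes the inner loop exit immediately:

```java
import java.util.Arrays;

// Insertion sort: grow a sorted prefix; each next item is slid
// left into its proper place.  O(n^2) worst/average case,
// O(n) best case (already-sorted input).
public class InsertionSort {
    public static void sort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int x = a[i];
            int j = i;
            while (j > 0 && a[j - 1] > x) {   // shift larger items right
                a[j] = a[j - 1];
                j--;
            }
            a[j] = x;                         // drop x into its place
        }
    }

    public static void main(String[] args) {
        int[] a = {5, 2, 4, 1, 3};
        sort(a);
        System.out.println(Arrays.toString(a)); // [1, 2, 3, 4, 5]
    }
}
```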
V Better sorting
* tree sort: put items into a binary search tree, then traverse
* heap sort: put items into a heap, then pull them all out (with repeated deleteMin)
* quick sort: split items using pivot, then recursively sort the two parts
* merge sort: split items into two halves, recursively sort, then merge back together
* time bounds: O(n log n) on average (but quick sort and tree sort can degrade to O(n²) on already-sorted data)
* hard bounds on comparison-based sorting: no comparison-based sort can do better than O(n log n)
V A rational reconstruction of (many) sorts
* hard-easy division: either hard-split/easy-join or easy-split/hard-join (you have to do the work somewhere)
* one-many versus half-and-half split: the latter is better, since we get the power of divide-and-conquer (logarithmic times)
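Merge sort illustrates the easy-split/hard-join, half-and-half pattern above: the split is trivial (two halves), and the work happens in the merge. A sketch (not the text’s in-place version; this one allocates fresh arrays for simplicity):

```java
import java.util.Arrays;

// Merge sort: easy split (two halves), hard join (merge).
public class MergeSort {
    public static int[] sort(int[] a) {
        if (a.length <= 1) return a;                       // base case
        int mid = a.length / 2;
        int[] left  = sort(Arrays.copyOfRange(a, 0, mid));
        int[] right = sort(Arrays.copyOfRange(a, mid, a.length));
        return merge(left, right);
    }

    private static int[] merge(int[] l, int[] r) {
        int[] out = new int[l.length + r.length];
        int i = 0, j = 0, k = 0;
        while (i < l.length && j < r.length)               // take smaller head
            out[k++] = (l[i] <= r[j]) ? l[i++] : r[j++];
        while (i < l.length) out[k++] = l[i++];            // drain leftovers
        while (j < r.length) out[k++] = r[j++];
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sort(new int[]{5, 2, 4, 1, 3})));
        // [1, 2, 3, 4, 5]
    }
}
```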
V Aside on priority queues and heaps
* a priority queue is like a queue, but we can remove items based on highest priority (deleteMin)
* it’s easy to make either add or deleteMin O(1) and the other O(n), using a linear list
* but we can get O(log n) worst-case for both add and deleteMin by using a tree-like structure
* … stored in an array!