Topic
 CS 241: Data Structures—Final Exam Review Notes
 Important notes!
 the final exam will take place on Saturday, May 3, 2014, from 2-5 pm, in the usual lecture room (Ford 204)
 the exam will be designed so that you should be able to complete it in one-and-a-half to two hours
 you will not have access to books, notes, computers, networks or neighbors during the exam (you may use scratch paper, which will be supplied if needed; you are encouraged to ask your instructor for clarifications during the exam if necessary)
 In preparing for the exam, you should consult:
 your textbook (see especially chapters and sections mentioned below)
 your notes from lecture and lab—these are likely to be especially helpful
 graded quizzes you completed during lecture
 Format: the exam will consist of several kinds of questions; typically:
 10-15 True-or-False questions at 1-2 points each
 8-10 short answer questions at 4-6 points each
 2-4 longer answer questions at 8-15 points each
 These review notes are not exhaustive, but are meant as a guide to coverage—see other sources for details!
 The final exam is comprehensive!
 But there will be some emphasis on material from after the midterm
 … but here are the main subject areas again (with book chapters) for the midterm material:
 General background to Data Structures and modularity
 Chapter 1, section 1.1; also Chapter 3, section 3.1
 Math review and recursion
 Chapter 1, sections 1.2-1.3
 Java review and new features
 Chapter 1, sections 1.4-1.5
 Review—see external notes!
 Algorithm analysis and asymptotics (O notation)
 Chapter 3, sections 3.2-3.5
 Stacks and Queues
 Chapter 3, sections 3.6-3.7
 Infix and Postfix expressions
 Chapter 3, section 3.6.3
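As a concrete illustration of the stack connection, here is a minimal sketch (not from the text) of evaluating a postfix expression with a stack; the class and method names are made up for this example, and it handles only space-separated integers and the four basic operators:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: evaluate a postfix expression using a stack.
// Operands are pushed; each operator pops two operands and pushes the result.
public class PostfixEval {
    public static int eval(String expr) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String tok : expr.trim().split("\\s+")) {
            switch (tok) {
                case "+": case "-": case "*": case "/": {
                    int right = stack.pop();   // note: operands pop in reverse order
                    int left = stack.pop();
                    switch (tok) {
                        case "+": stack.push(left + right); break;
                        case "-": stack.push(left - right); break;
                        case "*": stack.push(left * right); break;
                        default:  stack.push(left / right); break;
                    }
                    break;
                }
                default:
                    stack.push(Integer.parseInt(tok)); // operand: push it
            }
        }
        return stack.pop(); // the single remaining value is the answer
    }

    public static void main(String[] args) {
        // the infix expression (3 + 4) * 5 is "3 4 + 5 *" in postfix
        System.out.println(PostfixEval.eval("3 4 + 5 *")); // prints 35
    }
}
```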
 Trees
 See Chapter 4 of the text, especially sections 4.1-4.4 and 4.6
 Splay trees
 See text section 4.5 on splay trees
 a splay tree is a binary search tree without a special notion of balance
 instead, we always bring the accessed item to the root (top) when we search for something
 the splaying operation, which brings the item to the root, is carried out as a series of tree rotations along the access path
 the splaying operation takes O(log n) time, amortized over many operations
 splay trees thus provide good amortized behavior: any single operation may be slow, but the average cost over a long sequence of operations is guaranteed to be O(log n)
 you are not responsible for the details of the “zig-zig” or “zig-zag” cases of the splay operation
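Since splaying is built out of rotations, it may help to see one. The sketch below (hypothetical class and field names, not from the text) shows a single right rotation, the simple "zig" case where the accessed node is a child of the root; the zig-zig and zig-zag cases, which you are not responsible for, combine two such rotations:

```java
// Hypothetical node class and a single BST rotation, the building block of splaying.
public class SplayRotation {
    static class Node {
        int key;
        Node left, right;
        Node(int key, Node left, Node right) {
            this.key = key; this.left = left; this.right = right;
        }
    }

    // Rotate the root's left child up into the root position ("zig" to the right).
    // BST order is preserved: the child's right subtree moves under the old root.
    static Node rotateRight(Node root) {
        Node child = root.left;
        root.left = child.right;
        child.right = root;
        return child;   // the child is the new root
    }

    public static void main(String[] args) {
        //    5              3
        //   /       =>       \
        //  3                  5
        Node root = rotateRight(new Node(5, new Node(3, null, null), null));
        System.out.println(root.key); // prints 3
    }
}
```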
 Other kinds of trees
 there are many other kinds of trees that maintain balance using rotations: red-black trees, B-trees, 2-3 trees, etc.
 you don’t need to know the details of any other kinds of trees, but should be aware of the general notion of tree, and that there are many variations
 Graphs
 we did not study graphs in detail, but remember that they are a more generalized version of trees
 a graph consists of a collection of nodes and edges between them
 a graph need not be completely connected, have a root, or be “downward”-directed
 a tree is a rooted, connected, directed graph, often with ordered children, but with no sharing or cycles
 Priority queues and binary heaps
 See Chapter 6 of the text (esp. 6.1 through 6.4)
 Abstraction levels
 a priority queue is an ADT (abstract data type) supporting insertion of prioritized items, and removal of the highest priority item
 (priority queues based on integer orderings can either be based on minimum or maximum items as “high priority”)
 a binary heap is an implementation idea for priority queues using left-complete binary trees
 a binary heap may be implemented in an array, giving especially fast access to the children and parents of a “node”
 Linear implementations
 there are obvious, but ultimately naïve, implementations of a priority queue using a linked list (say)
 like a normal queue, we add to one end (the tail of the list) and remove from the other
 we can either choose to honor the priority ordering on:
 the way in (search for proper position on insertion)
 or the way out (search for highest priority item on removal)
 either way, the naïve implementations lead to an O(1) versus O(n) split for insertion and removal
 (in other words, one of the two operations will be especially expensive in time)
 the binary heap, by contrast, provides both operations in O(log n) time: faster and more consistent
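To make the O(1)/O(n) split concrete, here is a sketch (not from the text; the class name is made up) of the "pay on the way out" naïve implementation using a linked list: add is O(1) at the tail, but deleteMin must scan the entire list:

```java
import java.util.LinkedList;

// Hypothetical naïve priority queue: cheap insertion, expensive removal.
public class NaivePQ {
    private final LinkedList<Integer> items = new LinkedList<>();

    // O(1): just append at the tail, ignoring priority order.
    public void add(int x) {
        items.addLast(x);
    }

    // O(n): must scan the whole list to find the highest-priority (minimum) item.
    public int deleteMin() {
        int best = items.getFirst();
        for (int x : items)
            if (x < best) best = x;
        items.removeFirstOccurrence(best);
        return best;
    }

    public static void main(String[] args) {
        NaivePQ pq = new NaivePQ();
        pq.add(5); pq.add(1); pq.add(3);
        System.out.println(pq.deleteMin()); // prints 1
    }
}
```

Honoring the order on the way in instead (a sorted list) simply flips the costs: O(n) add, O(1) deleteMin.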
 Binary heaps (as trees)
 the binary heap implementation has both a structural aspect and an ordering aspect
 we can store the items in a left-complete binary tree (filled up all the way to the bottom row and then from the left)
 the heap order: the values in the descendants of every node are lower priority than the value in the node
 NOTE: there is no left/right aspect; as long as all descendants are lower priority, the heap order is satisfied
 this is a “looser” order than in a binary search tree, whence the name “heap”
 for both operations: restore structure first, then heap order
 to insert: add at next position in bottom row (last in level-order traversal), then “bubble up” to right place
 to remove(Min): take from root, then move last place item to root & bubble down
 both operations take O(log n) time (with good constants) since the tree is guaranteed to be of minimum height
 in both cases, we can save even more time by moving a “hole” rather than repeatedly swapping actual values
 (this saves time because we make only one re-assignment per move until the end)
 to build the original heap (buildHeap): bubble down each node that has a child, starting from the last such node (i.e., working in reverse level-order traversal)
 Binary heap in an array
 we can avoid allocating actual tree nodes by making a left-complete binary tree in an array
 store the items in positions indexed starting from 1 in level-order
 the node at index n has its parent at index n/2 (rounding down)
 the node at index n has its children at indices 2n and 2n+1
 these operations (involving multiplication and division by 2) can be implemented very quickly as bit shifts
 Applications of priority queues
 printer queues: in the old days, printers were shared by so many that priority was used to determine printing order
 priority queues are used as “internal” structures in the implementation of many advanced algorithms (esp. for graphs)
 sorting! (priority queues are used in the so-called “heapsort” algorithm)
 Other variations on priority queues and heaps
 you are not responsible on the exam for these variations, but you should know there are many (d-heaps, leftist heaps, binomial queues, Fibonacci heaps, etc.)
 most variations support extra operations well, such as merging two priority queues
 Hashing and hash tables
 See Chapter 5 of the text, especially sections 5.1-5.4
 Introduction
 Dictionary interface: look values up by key
 often the key is a field in a larger value object
 can be implemented in O(log n) time using a BST (see above)
 … but it would be better to have O(1)!
 solution: keep items in an array
 but under what index?
 basic idea: chop up and mash around the keys (i.e., as numbers, using their bits)
 we want a “messy”, random-seeming (but repeatable!) process which distributes keys well
 unfortunately, it may happen that two keys collide under the hash function
 use the modulus operator (%) to limit the hash results to valid indices (0 through table size - 1)
 what do we want from hash function?
 speed!
 coverage: all slots (not limited in magnitude or ability to hit), even distribution
 lack of collisions (as much as possible)
 collision resolution strategies
 separate chaining: use a list of items hanging off each array entry
 open addressing/probing: try to insert at nearby slots if the preferred one is filled
 … but there are problems due to clustering (“clumps” of filled entries, which tend to grow in size)
 linear probing: try successive entries (but primary clustering)
 quadratic probing: square the “attempt number” to get an offset (but secondary clustering)
 best bets for hashing
 well-chosen table size (about twice as much as there is data, and use a prime number)
 use a well-distributed hash function
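Here is a minimal sketch of separate chaining (all names are hypothetical, not from the text): each array slot holds a list of entries, and the key's hash code modulo the table size picks the slot, so colliding keys simply share a chain:

```java
import java.util.LinkedList;

// Hypothetical sketch: a hash table with separate chaining, String keys/values.
public class ChainedTable {
    private static class Entry {
        final String key;
        String value;
        Entry(String key, String value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry>[] slots;

    @SuppressWarnings("unchecked")
    public ChainedTable(int tableSize) {     // ideally a prime about 2x the data size
        slots = new LinkedList[tableSize];
        for (int i = 0; i < tableSize; i++) slots[i] = new LinkedList<>();
    }

    // floorMod keeps the index in 0..tableSize-1 even for negative hash codes
    private int slotFor(String key) {
        return Math.floorMod(key.hashCode(), slots.length);
    }

    public void put(String key, String value) {
        for (Entry e : slots[slotFor(key)])
            if (e.key.equals(key)) { e.value = value; return; } // update in place
        slots[slotFor(key)].add(new Entry(key, value));         // else chain it on
    }

    public String get(String key) {
        for (Entry e : slots[slotFor(key)])     // scan only this key's chain
            if (e.key.equals(key)) return e.value;
        return null;                            // not found
    }

    public static void main(String[] args) {
        ChainedTable t = new ChainedTable(11);
        t.put("cat", "meow");
        t.put("dog", "woof");
        System.out.println(t.get("cat")); // prints meow
    }
}
```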
 Sorting
 See Chapter 7 of the text, especially sections 7.1-7.8
 Introduction to sorting
 one of the most common operations
 ordered data supports binary search (fast!)
 comparison interfaces in Java
 Comparable—internal/“natural”: this.compareTo(that)
 Comparator—external: compare(this, that)
 both return an integer: negative (this is less), zero, positive
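A small sketch contrasting the two interfaces (the Student class and its fields are made up for illustration): Comparable fixes the type's one "natural" internal order, while a Comparator is an external, swappable order you pass in:

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical example contrasting Comparable (internal) and Comparator (external).
public class Orders {
    static class Student implements Comparable<Student> {
        final String name;
        final double gpa;
        Student(String name, double gpa) { this.name = name; this.gpa = gpa; }

        // natural order: alphabetical by name; negative/zero/positive result
        @Override public int compareTo(Student that) {
            return this.name.compareTo(that.name);
        }
    }

    public static void main(String[] args) {
        Student[] roster = { new Student("Bea", 3.9), new Student("Al", 3.2) };

        Arrays.sort(roster);                 // uses compareTo (natural order)
        System.out.println(roster[0].name);  // prints Al

        // external order: highest GPA first, supplied as a Comparator
        Comparator<Student> byGpaDesc = (a, b) -> Double.compare(b.gpa, a.gpa);
        Arrays.sort(roster, byGpaDesc);
        System.out.println(roster[0].name);  // prints Bea
    }
}
```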
 Naïve sorting
 selection sort: next ordered item selected and added physically next in the result
 insertion sort: next physical item inserted into proper place in the ordered result
 bubble sort: repeatedly swap adjacent out-of-order items, “bubbling” the largest up to the sorted boundary
 average- vs. best- vs. worst-case
 time bounds: O(n²)
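As one concrete example of the naïve sorts, here is a sketch of insertion sort (hypothetical class name): each next physical item is inserted into its proper place in the already-sorted prefix by shifting larger items right:

```java
// Hypothetical sketch of insertion sort: O(n^2) worst case,
// but O(n) when the input is already sorted.
public class InsertionSort {
    public static void sort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int item = a[i];                 // next physical item
            int j = i - 1;
            while (j >= 0 && a[j] > item) {  // shift larger items right
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = item;                 // drop it into its proper place
        }
    }

    public static void main(String[] args) {
        int[] a = {5, 2, 4, 1, 3};
        sort(a);
        System.out.println(java.util.Arrays.toString(a)); // prints [1, 2, 3, 4, 5]
    }
}
```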
 Better sorting
 tree sort: put items into a binary search tree, then traverse
 heap sort: put items into a heap, then pull them all out (with repeated deleteMin)
 quick sort: split items using pivot, then recursively sort the two parts
 merge sort: split items into two halves, recursively sort, then merge back together
 time bounds: O(n log n) in the average case (but quick sort can degrade to O(n²) on already-sorted data if the pivot is chosen naïvely)
 hard bounds on comparison-based sorting: no comparison-based sort can do better than O(n log n)
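Merge sort is a good representative of the O(n log n) family; this sketch (hypothetical class name) shows the easy-split/hard-join shape: splitting in half is trivial, and the real work happens in the merge:

```java
import java.util.Arrays;

// Hypothetical sketch of merge sort: easy split, hard join, O(n log n) always.
public class MergeSort {
    public static int[] sort(int[] a) {
        if (a.length <= 1) return a;                        // base case
        int mid = a.length / 2;
        int[] left = sort(Arrays.copyOfRange(a, 0, mid));   // easy split: halves
        int[] right = sort(Arrays.copyOfRange(a, mid, a.length));
        return merge(left, right);                          // hard join
    }

    // Merge two sorted arrays by repeatedly taking the smaller front item.
    private static int[] merge(int[] left, int[] right) {
        int[] out = new int[left.length + right.length];
        int i = 0, j = 0;
        for (int k = 0; k < out.length; k++) {
            if (j >= right.length || (i < left.length && left[i] <= right[j]))
                out[k] = left[i++];
            else
                out[k] = right[j++];
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sort(new int[]{5, 2, 4, 1, 3})));
        // prints [1, 2, 3, 4, 5]; unlike naïve quick sort, sorted input is no worse
    }
}
```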
 A rational reconstruction of (many) sorts
 hard-easy division: either hard-split/easy-join or easy-split/hard-join (you have to do the work somewhere)
 one-many versus half-and-half split: the latter is better, since we get the power of divide-and-conquer (logarithmic times)
 Aside on priority queues and heaps
 a priority queue is like a queue, but we can remove items based on highest priority (deleteMin)
 it’s easy to make either add or deleteMin O(1) and the other O(n), using a linear list
 but we can get O(log n) for both add and deleteMin by using a tree-like structure
 … stored in an array!