

CS 241: Data Structures—Midterm Exam Review Notes





About the exam




the final exam will take place on Saturday, May 3, 2014, from 2–5 pm, in the usual lecture room (Ford 204)




the exam will be designed so that you should be able to complete it in one-and-a-half to two hours




you will not have access to books, notes, computers, networks or neighbors during the exam (you may use scratch paper and it will be supplied if needed; you are encouraged to ask your instructor for clarifications during the exam if necessary)




In preparing for the exam, you should consult:




your textbook (see especially chapters and sections mentioned below)




your notes from lecture and lab—these are likely to be especially helpful




the home page for the course and various links, examples, etc., provided there




… and also especially the lab assignments linked from the home page, as well as your own programs




graded quizzes you completed during lecture




Format: the exam will consist of several kinds of questions; typically:




10–15 True-or-False questions at 1–2 points each




8–10 short answer questions at 4–6 points each




2–4 longer answer questions at 8–15 points each




These review notes are not exhaustive, but are meant as a guide to coverage—see other sources for details!




The final exam is comprehensive!




But there will be some emphasis on material from after the midterm




… but here are the main subject areas again (with book chapters) for the midterm material:




General background to Data Structures and modularity




Chapter 1, section 1.1; also Chapter 3, section 3.1




Math review and recursion




Chapter 1, sections 1.2–1.3




Java review and new features




Chapter 1, sections 1.4–1.5




Review—see external notes!




Algorithm analysis and asymptotics (O notation)




Chapter X, sections y–z




Arrays, ArrayLists and linked lists




Chapter 3, sections 3.2–3.5




Stacks and Queues




Chapter 3, sections 3.6–3.7




Infix and Postfix expressions




Chapter 3, section 3.6.3




Trees




See Chapter 4 of the text, especially sections 4.1–4.4 and 4.6




More about trees




Splay trees




See text section 4.5 on splay trees




a splay tree is a binary search tree without a special notion of balance




instead, we always bring the accessed item to the root (top) when we search for something




the splaying operation, which brings the item to the root, is similar to a series of tree rotations along the path




the splaying operation takes O(log n) time, amortized over many operations




splay trees provide good amortized behavior: over a long sequence of operations, the average cost per operation is guaranteed, even though any single operation may be slow




you are not responsible for the details of the “zig-zig” or “zig-zag” cases of the splay operation




Other kinds of trees




there are many other kinds of trees that maintain balance using rotations: red-black trees, B-trees, 2-3 trees, etc.




you don’t need to know the details of any other kinds of trees, but should be aware of the general notion of tree, and that there are many variations




Graphs




we did not study graphs in detail, but remember that they are a generalization of trees




a graph consists of a collection of nodes and edges between them




a graph need not be completely connected, have a root, or be “downward”directed




a tree is a rooted, connected, directed graph, often with ordered children, but with no sharing or cycles




Priority queues and binary heaps




See Chapter 6 of the text (esp. 6.1 through 6.4)




Abstraction levels




a priority queue is an ADT (abstract data type) supporting insertion of prioritized items, and removal of the highest priority item




(priority queues based on integer orderings can either be based on minimum or maximum items as “high priority”)




a binary heap is an implementation idea for priority queues using left-complete binary trees




a binary heap may be implemented in an array, giving especially fast access to the children and parent of a “node”




Linear implementations




there are obvious, but ultimately naïve, implementations of a priority queue using a linked list (say)




like a normal queue, we add to one end (the tail of the list) and remove from the other




we can choose to honor the priority ordering on either:




the way in (search for proper position on insertion)




or the way out (search for the highest priority item on removal)




either way, the naïve implementations lead to an O(1) versus O(n) split for insertion and removal




(in other words, one of the two operations will be especially expensive in time)




the binary heap, by contrast, provides both operations in O(log n) time: faster and more consistent
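The O(1)/O(n) split can be sketched in code (a minimal illustration under my own naming, not the text's code): keeping the list in priority order makes insert pay the O(n) scan, so removeMin becomes O(1) from the front.

```java
import java.util.ArrayList;

// Sketch of a naive priority queue as a sorted list (min = highest priority).
// insert() does an O(n) search for the proper position; removeMin() is O(1).
public class SortedListPQ {
    private final ArrayList<Integer> items = new ArrayList<>();

    public void insert(int x) {
        int i = 0;
        while (i < items.size() && items.get(i) < x) i++;  // O(n) position search
        items.add(i, x);
    }

    public int removeMin() {
        return items.remove(0);  // O(1)-ish: the minimum is always at the front
    }
}
```

Honoring the ordering on the way out instead would flip the costs: O(1) insert at the tail, O(n) search in removeMin.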




Binary heaps (as trees)




the binary heap implementation has both a structural aspect and an ordering aspect




we can store the items in a left-complete binary tree (filled up all the way to the bottom row and then from the left)




the heap order: the values in the descendants of every node are lower priority than the value in the node




NOTE: there is no left/right aspect; as long as all descendants are lower priority, the heap order is satisfied




this is a “looser” order than in a binary search tree, whence the name “heap”




for both operations: restore structure first, then heap order




to insert: add at the next position in the bottom row (last in a level-order traversal), then “bubble up” to the right place




to removeMin: take the value from the root, then move the last item (in level order) to the root and “bubble down”




both operations take O(log n) time (with good constants) since the tree is guaranteed to be of minimum height




in both cases, we can save even more time by moving a “hole” rather than repeatedly swapping actual values




(this saves time because we make only one reassignment per move until the end)




to build the original heap (buildHeap): bubble down each node, starting from the first node with a child and working backward (a reverse level-order traversal)




Binary heap in an array




we can avoid allocating actual tree nodes by making a left-complete binary tree in an array




store the items in positions indexed starting from 1, in level-order




every node’s parent is in index n/2 (rounding down)




every node’s children are in indices 2n and 2n+1




these operations (involving multiplication and division by 2) can be implemented very quickly as bit shifts
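Putting the pieces together, here is a minimal sketch (my own code, not the text's) of a min-heap in an array indexed from 1, so that a node at index i has its parent at i/2 and children at 2i and 2i+1; insert bubbles a hole up, removeMin bubbles a hole down:

```java
import java.util.Arrays;

// Sketch of an array-based binary min-heap, indexed from 1 (slot 0 unused).
public class ArrayMinHeap {
    private int[] heap = new int[16];
    private int size = 0;

    public void insert(int x) {
        if (size + 1 == heap.length) heap = Arrays.copyOf(heap, heap.length * 2);
        int hole = ++size;                       // structure first: next bottom-row slot
        while (hole > 1 && heap[hole / 2] > x) { // then order: bubble the hole up
            heap[hole] = heap[hole / 2];
            hole /= 2;
        }
        heap[hole] = x;
    }

    public int removeMin() {
        int min = heap[1];
        int last = heap[size--];                 // structure first: remove last slot
        int hole = 1;
        while (2 * hole <= size) {               // then order: bubble the hole down
            int child = 2 * hole;
            if (child < size && heap[child + 1] < heap[child]) child++;
            if (heap[child] < last) {
                heap[hole] = heap[child];
                hole = child;
            } else break;
        }
        heap[hole] = last;
        return min;
    }
}
```

Note that only the hole moves: each step is one assignment, not a full swap, which is the time savings mentioned above.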




Applications of priority queues




printer queues: in the old days, printers were shared by so many users that priority was used to determine printing order




priority queues are used as “internal” structures in the implementation of many advanced algorithms (esp. for graphs)




sorting! (priority queues are used in the so-called “heapsort” algorithm)




Other variations on priority queues and heaps




you are not responsible on the exam for these variations, but you should know there are many (d-heaps, leftist heaps, binomial queues, Fibonacci heaps, etc.)




most variations support extra operations well, such as merging two priority queues




Hashing and hash tables




See Chapter 5 of the text, especially sections 5.1–5.4




Introduction




Dictionary interface: look values up by key




often the key is a field in a larger value object




can be implemented in O(log n) time using a BST (see above)




… but it would be better to have O(1)!




solution: keep items in an array




but under what index?




basic idea: chop up and mash around the keys (i.e., as numbers, using their bits)




we want a “messy”, randomseeming (but repeatable!) process which distributes keys well




unfortunately, it may happen that two keys collide under the hash function




use the modulus operator (%) to limit the hash results to the table size (yielding indices 0 through size − 1)




what do we want from hash function?




speed!




coverage: able to hit all slots (not limited in magnitude or range), with an even distribution




lack of collisions (as much as possible)
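A sketch of the basic pipeline (an illustrative hash, not the one from the text): mix the key's character bits into an int, then use % to fold the result into the table's index range.

```java
// Sketch: a "messy" but repeatable hash, limited to [0, tableSize) with %.
public class HashDemo {
    public static int hash(String key, int tableSize) {
        int h = 0;
        for (int i = 0; i < key.length(); i++) {
            h = 31 * h + key.charAt(i);  // mix each character's bits in
        }
        // (h % tableSize + tableSize) % tableSize keeps the index
        // non-negative even if h overflowed to a negative int
        return (h % tableSize + tableSize) % tableSize;
    }
}
```

The same key always hashes to the same slot (repeatable), but two different keys may still land on the same index, which is exactly the collision problem.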




collision resolution strategies




separate chaining: use a list of items hanging off each array entry




open addressing (probing): try to insert at nearby slots if the preferred one is filled




… but there are problems due to clustering (“clumps” of filled entries, which tend to grow in size)




linear probing: try successive entries (but primary clustering)




quadratic probing: square the “attempt number” to get an offset (but secondary clustering)
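Linear probing can be sketched as follows (a minimal illustration, assuming the table never fills; the class name is my own):

```java
// Sketch of open addressing with linear probing: on a collision,
// try successive slots (wrapping around) until an empty one appears.
public class LinearProbingTable {
    private final String[] slots;

    public LinearProbingTable(int size) { slots = new String[size]; }

    private int hash(String key) {
        return (key.hashCode() % slots.length + slots.length) % slots.length;
    }

    public void insert(String key) {  // assumes the table is not full
        int i = hash(key);
        while (slots[i] != null && !slots[i].equals(key)) {
            i = (i + 1) % slots.length;  // linear probe: try the next slot
        }
        slots[i] = key;
    }

    public boolean contains(String key) {
        int i = hash(key);
        while (slots[i] != null) {       // stop at the first empty slot
            if (slots[i].equals(key)) return true;
            i = (i + 1) % slots.length;
        }
        return false;
    }
}
```

Notice how consecutive filled slots force later insertions to probe further, which is the primary clustering problem.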




best bets for hashing




well-chosen table size (about twice as large as the amount of data, and a prime number)




use a well-distributed hash function




Sorting




See Chapter 7 of the text, especially sections 7.1–7.8




Introduction to sorting




one of the most common operations




ordered data supports binary search (fast!)




comparison interfaces in Java




Comparable—internal/“natural”: this.compareTo(that)




Comparator—external: compare(this, that)




both return an integer: negative (this is less), zero, positive
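The two styles can be sketched side by side (the Student class and BY_NAME comparator are my own illustrations): Comparable gives a class its one "natural" internal order, while a Comparator supplies an alternative order externally.

```java
import java.util.Arrays;
import java.util.Comparator;

public class CompareDemo {
    public static class Student implements Comparable<Student> {
        public final int id;
        public final String name;
        public Student(int id, String name) { this.id = id; this.name = name; }

        // Comparable: internal "natural" order, here by id
        @Override public int compareTo(Student that) {
            return Integer.compare(this.id, that.id);
        }
    }

    // Comparator: external order, here by name instead
    public static final Comparator<Student> BY_NAME =
        (a, b) -> a.name.compareTo(b.name);

    public static void main(String[] args) {
        Student[] s = { new Student(2, "Ada"), new Student(1, "Bo") };
        Arrays.sort(s);           // uses compareTo (by id)
        Arrays.sort(s, BY_NAME);  // uses the Comparator (by name)
    }
}
```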




Naïve sorting




selection sort: next ordered item selected and added physically next in the result




insertion sort: next physical item inserted into proper place in the ordered result




bubble sort: repeatedly swap items, “bubbling up” to the boundary




average vs. best vs. worst-case




time bounds: O(n²)
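As one representative of the naïve sorts, here is a sketch of insertion sort (a standard formulation, not necessarily the text's exact code): each next item is inserted into its proper place among the already-sorted prefix.

```java
// Sketch of insertion sort: O(n²) average and worst case,
// but O(n) best case on already-sorted input.
public class InsertionSort {
    public static void sort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int x = a[i];                  // next physical item
            int j = i - 1;
            while (j >= 0 && a[j] > x) {   // shift larger items right
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = x;                  // drop x into its proper place
        }
    }
}
```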




Better sorting




tree sort: put items into a binary search tree, then traverse




heap sort: put items into a heap, then pull them all out (with repeated deleteMin)




quick sort: split items using pivot, then recursively sort the two parts




merge sort: split items into two halves, recursively sort, then merge back together




time bounds: O(n log n) in the average case (but naïve quicksort can degrade to O(n²) on already-sorted data)




hard bounds on comparison-based sorting: no comparison-based sort can do better than O(n log n)
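Merge sort is a good one to have in mind concretely; this sketch (a standard recursive formulation, not necessarily the text's) shows the easy split into halves and the hard join in the merge step:

```java
import java.util.Arrays;

// Sketch of merge sort: split in half, recursively sort, merge.
// O(n log n) in every case, matching the comparison-sort lower bound.
public class MergeSortDemo {
    public static int[] sort(int[] a) {
        if (a.length <= 1) return a;                    // base case
        int mid = a.length / 2;                         // easy split
        int[] left = sort(Arrays.copyOfRange(a, 0, mid));
        int[] right = sort(Arrays.copyOfRange(a, mid, a.length));
        int[] out = new int[a.length];                  // hard join: merge
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length)
            out[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
        while (i < left.length) out[k++] = left[i++];
        while (j < right.length) out[k++] = right[j++];
        return out;
    }
}
```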




A rational reconstruction of (many) sorts




hard/easy division: either hard-split/easy-join or easy-split/hard-join (you have to do the work somewhere)




one-many versus half-and-half split: the latter is better, since we get the power of divide-and-conquer (logarithmic times)




Aside on priority queues and heaps




a priority queue is like a queue, but we can remove items based on highest priority (deleteMin)




it’s easy to make either add or deleteMin O(1) and the other O(n), using a linear list




but we can get O(log n) for both add and deleteMin by using a tree-like structure




… stored in an array!


