

CS 241: Data Structures—Final Exam Review Notes





About the exam




the final exam will take place on Friday, May 4th, from 2-5pm, in the usual lecture room (Ford 204)




the exam will be designed so that you should be able to complete it in one-and-a-half to two hours




you will not have access to books, notes, computers, networks or neighbors during the exam (you may use scratch paper and it will be supplied if needed; you are encouraged to ask your instructor for clarifications during the exam if necessary)




In preparing for the exam, you should consult:




your textbook (see especially chapters and sections mentioned below)




your notes from lecture and lab—these are likely to be especially helpful




the home page for the course and various links, examples, etc., provided there




… and also especially the lab assignments linked from the home page, as well as your own programs




graded quizzes you completed during lecture (3 of them, hopefully)




Format: the exam will consist of several kinds of questions; typically:




10-15 True-or-False questions at 1-2 points each




8-10 short-answer questions at 4-6 points each




2-4 longer-answer questions at 8-15 points each




These review notes are not exhaustive, but are meant as a guide to coverage—see other sources for details!




The final exam is comprehensive!




But the emphasis will be on material from after the midterm




… but here are the main subject areas again (with book chapters):




General background to Data Structures and modularity




Chapter 1, section 1.1; also Chapter 3, section 3.1




Math review and recursion




Chapter 1, sections 1.2-1.3




Java review and new features




Chapter 1, sections 1.4-1.5




Review—see external notes!




Algorithm analysis and asymptotics (O notation)




Chapter X, sections y-z




Arrays, ArrayLists and linked lists




Chapter 3, sections 3.2-3.5




Stacks and Queues




Chapter 3, sections 3.6-3.7




Infix and Postfix expressions




Chapter 3, section 3.6.3




Introduction to Applets and GUIs




See online examples and resources




online examples at course homepage, near the top




Trees




See Chapter 4 of the text, especially sections 4.1-4.4 and 4.6 (peruse also splay trees in 4.5)




Tree definitions and terminology




A tree is a collection structure which can hold elements in positions called nodes; a tree is a linked structure, but unlike a linked list, each node may have several linked nodes




nodes: the “places” where data items are kept in the tree




links: the connections between nodes (represented as references/“pointers” in Java)




path: a sequence of links between nodes (and the nodes along the way, depending on context)




root: the top node of a tree




subtree: a node in a tree considered as a root of its own tree




children: the nodes immediately below a given node (similarly for grandchildren, descendants, etc.)




parent: the node immediately above a given node (similarly for grandparent, ancestors, etc.)




height: the number of links between the root of a tree and its furthest descendant




internal nodes: nodes which have at least one child




leaf or external node: nodes which have no children (or all null children, in a Java implementation)




fringe: the set of all leaves of a tree, considered as a sequence




ordered versus unordered: we usually, but not necessarily, think of the children of a node as ordered, from left to right, say




sharing and cycles: in a proper tree, no node has two parents (no sharing) and no links go back “up” from a node to one of its ancestors (or cousins, etc.): such paths would be called cycles




in a graph, by contrast, nodes are connected by links in arbitrary ways: possibly no root, possibly sharing and cycles, possibly even multiple sets of disconnected nodes (subgraphs)




General trees




in a general tree, any node may have an arbitrary number of children




we could represent children in a general tree using any (ordered) collection, but a linked list might be best




we can also use a first-child/next-sibling representation, which shows that a binary-tree structure can be used to represent a general tree (in this case the “next sibling” links form a kind of linked list)




Binary trees




when every node in a tree has at most two children and they are distinguished as left and right children




in other words, when every node has a left and right child, either of which may be null




Expression trees




since arithmetic is built mostly out of binary operators, we can represent arithmetic expressions as binary trees, where the internal nodes hold operators and the leaves are numeric constants (literals)




expressions can be written in various systematic styles: infix with parentheses (the usual), postfix or prefix




postfix puts the arguments first, then the operator afterwards (postposition); prefix is the opposite




postfix and prefix forms never need parentheses




you should be able to convert between different forms of expressions for the exam!!




various tree traversals (see below) yield similar representations of expressions




except for the parentheses of inorder/infix, which must be generated separately, before and after the left and right subtrees
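To make the postfix idea concrete, here is a small stack-based postfix evaluator (a sketch only: the class name PostfixEval and the space-separated token format are illustrative assumptions, not from the text or textbook):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Evaluates a space-separated postfix expression such as "3 4 + 5 *"
// using an explicit stack -- no parentheses needed.
public class PostfixEval {
    public static double eval(String postfix) {
        Deque<Double> stack = new ArrayDeque<>();
        for (String tok : postfix.trim().split("\\s+")) {
            if (tok.equals("+") || tok.equals("-") || tok.equals("*") || tok.equals("/")) {
                double right = stack.pop();   // note the order: right operand popped first
                double left  = stack.pop();
                switch (tok) {
                    case "+": stack.push(left + right); break;
                    case "-": stack.push(left - right); break;
                    case "*": stack.push(left * right); break;
                    case "/": stack.push(left / right); break;
                }
            } else {
                stack.push(Double.parseDouble(tok)); // operand: just push it
            }
        }
        return stack.pop(); // the single remaining value is the result
    }

    public static void main(String[] args) {
        // (3 + 4) * 5 in infix is "3 4 + 5 *" in postfix
        System.out.println(PostfixEval.eval("3 4 + 5 *")); // prints 35.0
    }
}
```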




Tree traversals




a tree traversal is a process that visits all the nodes (or the elements they hold) in a tree




typical traversals are implemented recursively: we “visit” the node and recursively traverse the subtrees, in some order




by varying the relative order of node visitation and subtree traversal, we get different standard orders:




leftright preorder: visit node, traverse left, traverse right




leftright inorder: traverse left, visit node, traverse right




leftright postorder: traverse left, traverse right, visit node




… and so on for three others, but right-to-left (6 possible permutations altogether)
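The three left-right orders differ only in where the “visit” step goes relative to the two recursive calls; a minimal sketch (the TreeNode class and the string-building visit are illustrative assumptions):

```java
// Minimal binary-tree node plus the three recursive left-right traversals.
public class Traversals {
    static class TreeNode {
        int element;
        TreeNode left, right;
        TreeNode(int element, TreeNode left, TreeNode right) {
            this.element = element; this.left = left; this.right = right;
        }
    }

    static String preorder(TreeNode t) {   // visit node, traverse left, traverse right
        return (t == null) ? "" : t.element + " " + preorder(t.left) + preorder(t.right);
    }

    static String inorder(TreeNode t) {    // traverse left, visit node, traverse right
        return (t == null) ? "" : inorder(t.left) + t.element + " " + inorder(t.right);
    }

    static String postorder(TreeNode t) {  // traverse left, traverse right, visit node
        return (t == null) ? "" : postorder(t.left) + postorder(t.right) + t.element + " ";
    }

    public static void main(String[] args) {
        //      1
        //     / \
        //    2   3
        TreeNode root = new TreeNode(1, new TreeNode(2, null, null),
                                        new TreeNode(3, null, null));
        System.out.println(preorder(root).trim());  // 1 2 3
        System.out.println(inorder(root).trim());   // 2 1 3
        System.out.println(postorder(root).trim()); // 2 3 1
    }
}
```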




if we want to implement the Iterator interface, we can’t use recursion: use an explicit stack instead




if we substitute a queue for the stack, we get breadth-first traversals (also known as level-order)
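The stack-versus-queue point shows up directly in code: the visit loop below is the same one an iterative depth-first traversal would use, just with a queue in the stack's place (the TreeNode class is an illustrative assumption):

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Breadth-first (level-order) traversal using an explicit queue.
public class LevelOrder {
    static class TreeNode {
        int element;
        TreeNode left, right;
        TreeNode(int element, TreeNode left, TreeNode right) {
            this.element = element; this.left = left; this.right = right;
        }
    }

    static String traverse(TreeNode root) {
        StringBuilder out = new StringBuilder();
        Queue<TreeNode> queue = new ArrayDeque<>();
        if (root != null) queue.add(root);
        while (!queue.isEmpty()) {
            TreeNode t = queue.remove();             // dequeue the front node
            out.append(t.element).append(' ');       // visit it
            if (t.left != null) queue.add(t.left);   // enqueue children for later
            if (t.right != null) queue.add(t.right);
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        //        1
        //       / \
        //      2   3
        //     / \
        //    4   5
        TreeNode root = new TreeNode(1,
                new TreeNode(2, new TreeNode(4, null, null), new TreeNode(5, null, null)),
                new TreeNode(3, null, null));
        System.out.println(traverse(root)); // 1 2 3 4 5
    }
}
```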




Binary search trees




a binary search tree is a binary tree whose elements support some ordering, and where:




all the nodes to the left of the root hold values that are less than the one in the root




all the nodes to the right of the root hold values that are greater than the one in the root




similarly for all subtrees (i.e., same rule throughout the tree)




equal values can be handled in several ways (keep counts in nodes, or just ignore if duplication is not important)




binary search trees (BSTs) implement the Dictionary<K,V> interface, allowing values to be looked up by key




insertion and lookup in a BST can be very efficient (O(log n)) if the tree is sufficiently “bushy”:




compare value being inserted (or sought) to the element in the root, go left or right as appropriate




height of a bushy tree is O(log n) where n is the total number of elements




in the worst case, successive insertion of items already in order (or in reverse order) will result in a long, skinny, “tilted” tree with O(n) lookups
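The compare-and-descend logic can be sketched in a few lines (the class name BST and its method names are illustrative, not necessarily the textbook's):

```java
// Sketch of BST insertion and lookup; compareTo drives the left/right choice.
public class BST<E extends Comparable<E>> {
    private static class Node<E> {
        E element;
        Node<E> left, right;
        Node(E element) { this.element = element; }
    }

    private Node<E> root;

    public void insert(E x) { root = insert(root, x); }

    private Node<E> insert(Node<E> t, E x) {
        if (t == null) return new Node<>(x);       // found the insertion point
        int cmp = x.compareTo(t.element);
        if (cmp < 0)      t.left  = insert(t.left, x);
        else if (cmp > 0) t.right = insert(t.right, x);
        // equal: ignore the duplicate (one of the policies mentioned above)
        return t;
    }

    public boolean contains(E x) {
        Node<E> t = root;
        while (t != null) {
            int cmp = x.compareTo(t.element);
            if (cmp == 0) return true;
            t = (cmp < 0) ? t.left : t.right;      // go left or right as appropriate
        }
        return false;
    }

    public static void main(String[] args) {
        BST<Integer> t = new BST<>();
        t.insert(5); t.insert(3); t.insert(8);
        System.out.println(t.contains(3)); // true
        System.out.println(t.contains(7)); // false
    }
}
```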




Deletion from BSTs




deletion from a BST breaks down into several cases:




deletion of a leaf is easy: just set the reference in the parent to null




deletion of a node with a single child is almost as easy: just reset the reference in the parent to circumvent the deleted node




if a node has two children, look for the “adjacent node” (next smaller or next larger) in the tree; this will be in the “inside corner” of one of the subtrees (extreme left of right subtree or vice versa); replace the element in the deleted node with that from the inside corner node, then delete that node (which is easier)
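A sketch of the three cases, using int-element nodes for brevity (the class and helper names are assumptions). For the two-children case it copies in the minimum of the right subtree, the “inside corner”, then deletes that node, which by construction has at most one child:

```java
// Sketch of BST deletion covering the leaf, one-child, and two-children cases.
public class BSTDelete {
    static class Node {
        int element;
        Node left, right;
        Node(int element) { this.element = element; }
    }

    static Node insert(Node t, int x) {
        if (t == null) return new Node(x);
        if (x < t.element)      t.left  = insert(t.left, x);
        else if (x > t.element) t.right = insert(t.right, x);
        return t;
    }

    static Node delete(Node t, int x) {
        if (t == null) return null;                   // value not found
        if (x < t.element)      t.left  = delete(t.left, x);
        else if (x > t.element) t.right = delete(t.right, x);
        else if (t.left != null && t.right != null) { // two children:
            t.element = min(t.right);                 //   copy in next-larger value
            t.right = delete(t.right, t.element);     //   then delete it (easier case)
        } else {                                      // zero or one child:
            t = (t.left != null) ? t.left : t.right;  //   circumvent the node
        }
        return t;
    }

    static int min(Node t) {                          // extreme left of a subtree
        while (t.left != null) t = t.left;
        return t.element;
    }

    static String inorder(Node t) {
        return (t == null) ? "" : inorder(t.left) + t.element + " " + inorder(t.right);
    }

    public static void main(String[] args) {
        Node root = null;
        for (int x : new int[]{5, 3, 8, 2, 4}) root = insert(root, x);
        root = delete(root, 3);                       // node 3 has two children
        System.out.println(inorder(root).trim());     // 2 4 5 8
    }
}
```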




Balanced trees (AVL)




since “bushy” or balanced trees are important to make BSTs efficient, it is worth considering how to keep trees balanced




many possible definitions of balance, but it is not enough that the tree “balances” around the root like a coat hanger




for AVL trees, the difference in height between subtrees must be at most one (throughout the tree)




for the exam, you should be able to recognize this property of a tree




when inserting a node, an AVL tree may get out of balance: we correct this by performing rotations




there are several cases to consider, with symmetries, some requiring single and some requiring double rotations




see the textbook pages 123-136 for details, examples and pictures




Other kinds of trees




there are many other kinds of trees that maintain balance using rotations: red-black trees, splay trees, etc.




splay trees in particular use a policy of bringing sought items to the root using rotations (with the understanding that they are likely to be sought more often later)




Sorting




See Chapter 7 of the text, especially sections 7.1-7.8




Introduction to sorting




one of the most common operations




ordered data supports binary search (fast!)




comparison interfaces in Java




Comparable—internal/“natural”: this.compareTo(that)




Comparator—external: compare(this, that)




both return an integer: negative (this is less), zero, positive
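Both interfaces in action (the length-based Comparator is just an illustrative choice):

```java
import java.util.Arrays;
import java.util.Comparator;

// Comparable (internal/"natural" order) vs. Comparator (external order).
public class CompareDemo {
    public static void main(String[] args) {
        String[] words = { "pear", "fig", "banana" };

        // Comparable: String implements compareTo, giving alphabetical order
        Arrays.sort(words);
        System.out.println(Arrays.toString(words)); // [banana, fig, pear]

        // Comparator: an external comparison, here by string length
        Arrays.sort(words, Comparator.comparingInt(String::length));
        System.out.println(Arrays.toString(words)); // [fig, pear, banana]

        // both styles return a negative, zero, or positive integer
        System.out.println("apple".compareTo("banana") < 0); // true
    }
}
```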




Naïve sorting




selection sort: next ordered item selected and added physically next in the result




insertion sort: next physical item inserted into proper place in the ordered result




bubble sort: repeatedly swap adjacent out-of-order items, so each item “bubbles up” to the boundary of the sorted portion




average vs. best vs. worstcase




time bounds: O(n²)




Better sorting




tree sort: put items into a binary search tree, then traverse




heap sort: put items into a heap, then pull them all out (with repeated deleteMin)




quick sort: split items using pivot, then recursively sort the two parts




merge sort: split items into two halves, recursively sort, then merge back together




time bounds: O(n log n) in the average case (but quick sort can degrade on sorted data with a naive pivot choice)




hard bounds on comparison-based sorting: no comparison-based sort can do better than O(n log n)




A rational reconstruction of (many) sorts




hard-easy division: either hard-split/easy-join or easy-split/hard-join (you have to do the work somewhere)




one-many versus half-and-half split: the latter is better, since we get the power of divide-and-conquer (logarithmic times)
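Merge sort is the half-and-half, easy-split/hard-join case; a sketch (the array copies are for clarity, not efficiency):

```java
import java.util.Arrays;

// Merge sort: easy split (two halves), hard join (the merge) -- O(n log n).
public class MergeSortDemo {
    static int[] mergeSort(int[] a) {
        if (a.length <= 1) return a;                          // base case
        int mid = a.length / 2;                               // easy split: halves
        int[] left  = mergeSort(Arrays.copyOfRange(a, 0, mid));
        int[] right = mergeSort(Arrays.copyOfRange(a, mid, a.length));
        return merge(left, right);                            // hard join: merge
    }

    static int[] merge(int[] left, int[] right) {
        int[] out = new int[left.length + right.length];
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length)           // take the smaller front item
            out[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
        while (i < left.length)  out[k++] = left[i++];        // drain leftovers
        while (j < right.length) out[k++] = right[j++];
        return out;
    }

    public static void main(String[] args) {
        int[] sorted = mergeSort(new int[]{5, 2, 8, 1, 9, 3});
        System.out.println(Arrays.toString(sorted)); // [1, 2, 3, 5, 8, 9]
    }
}
```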




Aside on priority queues and heaps




a priority queue is like a queue, but we can remove items based on highest priority (deleteMin)




it’s easy to make either add or deleteMin O(1) and the other O(n), using a linear list




but we can get O(log n) add (O(1) on average) and O(log n) deleteMin by using a tree-like structure




… stored in an array!
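A sketch of that array-stored tree as a binary min-heap: the children of index i live at 2i+1 and 2i+2 (0-based), so no explicit links are needed (the class and method names here are illustrative assumptions):

```java
import java.util.ArrayList;
import java.util.List;

// Array-based binary min-heap: parent of i is (i-1)/2, children are 2i+1, 2i+2.
public class MinHeap {
    private final List<Integer> a = new ArrayList<>();

    public void add(int x) {                     // percolate up: O(log n) worst case
        a.add(x);
        int i = a.size() - 1;
        while (i > 0 && a.get(i) < a.get((i - 1) / 2)) {
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    public int deleteMin() {                     // percolate down: O(log n)
        int min = a.get(0);
        a.set(0, a.get(a.size() - 1));           // move last element to the root
        a.remove(a.size() - 1);
        int i = 0;
        while (2 * i + 1 < a.size()) {
            int child = 2 * i + 1;               // pick the smaller child
            if (child + 1 < a.size() && a.get(child + 1) < a.get(child)) child++;
            if (a.get(i) <= a.get(child)) break; // heap order restored
            swap(i, child);
            i = child;
        }
        return min;
    }

    private void swap(int i, int j) {
        int tmp = a.get(i);
        a.set(i, a.get(j));
        a.set(j, tmp);
    }

    public static void main(String[] args) {
        MinHeap h = new MinHeap();
        h.add(5); h.add(1); h.add(3);
        System.out.println(h.deleteMin()); // 1
        System.out.println(h.deleteMin()); // 3
        System.out.println(h.deleteMin()); // 5
    }
}
```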




Hashing




See Chapter 5 of the text, especially sections 5.1-5.4




Introduction




Dictionary interface: look values up by key




often the key is a field in a larger value object




can be implemented in O(log n) time using a BST (see above)




… but it would be better to have O(1)!




solution: keep items in an array




but under what index?




basic idea: chop up and mash around the keys (i.e., as numbers, using their bits)




we want a “messy”, randomseeming (but repeatable!) process which distributes keys well




unfortunately, it may happen that two keys collide under the hash function




use the modulus operator (%) to limit the hash results to the table size (giving indices from 0 up to size minus 1)
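In Java terms, hashCode does the “chopping and mashing”, and % maps the result into the table's index range; a sketch (the sign mask is needed because hashCode may return a negative int):

```java
// Turning an arbitrary key into a table index: hash, mask the sign bit, mod.
public class HashIndex {
    static int indexFor(Object key, int tableSize) {
        int h = key.hashCode();                 // "messy" but repeatable bit-mashing
        return (h & 0x7fffffff) % tableSize;    // clear sign bit, limit to table size
    }

    public static void main(String[] args) {
        int size = 11;                          // a prime table size
        System.out.println(HashIndex.indexFor("hello", size));
        // the same key always lands in the same slot (repeatable!)
        System.out.println(HashIndex.indexFor("hello", size)
                        == HashIndex.indexFor("hello", size)); // true
    }
}
```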




what do we want from a hash function?




speed!




coverage: ability to hit all slots (not limited in magnitude or in which indices it can reach), with an even distribution




lack of collisions (as much as possible)




collision resolution strategies




separate chaining: use a list of items hanging off each array entry




internal addressing/probing: try to insert at nearby slots if the preferred one is filled




… but there are problems due to clustering (“clumps” of filled entries, which tend to grow in size)




linear probing: try successive entries (but primary clustering)




quadratic probing: square the “attempt number” to get an offset (but secondary clustering)
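The two probe sequences side by side (a sketch; the method names are illustrative): given a home slot h and attempt number i, linear probing tries h+i while quadratic probing tries h+i², both mod the table size:

```java
// Probe sequences for open slots, starting from a key's home slot.
public class Probing {
    static int linearProbe(int home, int attempt, int tableSize) {
        return (home + attempt) % tableSize;              // h, h+1, h+2, ...
    }

    static int quadraticProbe(int home, int attempt, int tableSize) {
        return (home + attempt * attempt) % tableSize;    // h, h+1, h+4, h+9, ...
    }

    public static void main(String[] args) {
        // successive attempts from home slot 3 in a table of size 11
        for (int i = 0; i < 4; i++)
            System.out.print(linearProbe(3, i, 11) + " ");    // 3 4 5 6
        System.out.println();
        for (int i = 0; i < 4; i++)
            System.out.print(quadraticProbe(3, i, 11) + " "); // 3 4 7 1
        System.out.println();
    }
}
```

Linear probing's consecutive slots are what produce primary clustering; quadratic probing spreads the attempts out, but keys sharing a home slot still follow the identical sequence (secondary clustering).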




best bets for hashing




well-chosen table size (about twice as much as there is data, and use a prime number)




use a well-distributed hash function


