CS 241 Lab 2: Timing Strings and Collections

In this lab, you will explore the running time behavior of certain algorithms involving common data structures; you will implement a StopWatch class for this purpose, as well as to facilitate similar time studies in the future. More specifically, we are interested in testing the behavior of the String and StringBuffer classes and the ArrayList data structure (this last we will compare against a simple mock implementation using an internal array, as was discussed in lecture).

 

Introduction

The Data Structures class is all about how to code algorithms more efficiently, in both time and space used. In this lab we will focus on time issues (in fact, time will receive more attention than space throughout the semester). We heard in lecture (and you may have also read on Joel Spolsky’s blog) that the String class does not behave well under iterated appends; that is to say, you shouldn’t write a loop in which you repeatedly replace a string by appending onto itself with new Strings, like this:

for(int i=0; i<n; i++) {
    myString = myString + newString;
}
(Here you should imagine that newString is something more realistic, such as a String read in from input, from a file, or perhaps computed somehow, rather than just a fixed String value ... but then see below!)

Why shouldn’t you write code like this? Well, what looks like an innocent addition onto the end of the String will actually be implemented by allocating a new String object (of greater size) and then copying both the original and new Strings into it, and finally replacing the reference to the old one with a reference to the new one ... and this will happen every time around the loop! All this allocation and especially copying will take an increasing amount of time with each iteration and, as we saw in lecture, when you add up amounts of time that increase with each step, the total time starts to go up like the square of the number of iterations.

(In Joel Spolsky’s blog, the old joke about Schlemiel the painter illustrates this idea: his longer and longer walks back and forth to the paint can make his progress painting road stripes seem more and more dismal.)

But what if you want to write that looks like the loop above? Well, the Java standard libraries will come to your rescue: the StringBuffer class looks roughly like the String class, and supports many of the same operations, but it implments repeated appending in a much more efficient way. Essentially, it keeps track of what needs to be appended over time, but doesn't actually run down the characters and put them all together until it’s needed “in the end”, i.e., when you actually extract a String from the StringBuffer. (This is in fact a common technique used by a lot of libraries and languages: it’s not specific to Java.)

Another possible source for slow-downs that we’ve discussed in lecture is the phenomenon of shifting blocks of data around in arrays in order to make room for new items (additions) or to close up gaps after items are taken out “from the middle” (removals). You have already been introduced to the ArrayList class, which in some sense serves the same role with respect to plain old arrays that StringBuffers serve with respect to Strings. That is, ArrayLists are to be preferred to arrays because they allow additions and removals to the front, middle or back end of an array without (we are told!) such high penalties as we would see if we shuffled the same items around in an array.

Well, caveat emptor! Do we really know that these claims are true? How could we tell? Maybe ArrayLists are really just arrays in disguise (a claim that was put forth in lecture, at least tentatively, in order to explain how they might work). In order to find out, we can implement a class similar to ArrayLists, but using an internal array, which supports the same add and remove methods, and then simply test them against real ArrayLists. Likewise, we can write code that will test StringBuffers against Strings to see how they compare.

In order to facilitate the timing comparisons, we will also implement a (very small, very simple!) class for StopWatches: this will make it especially easy to do cumulative timing tests (just start the watch before the relevant code, then stop it again right afterwards, and finally read off the accumulated time to see how long the task took). We will also implement a reset feature so that we can use the same watch to record multiple trials.

 

Methodology

Your main method (in the main project class) should perform tests of both Strings versus StringBuffers and your own “mock” array lists versus the real ones (let’s agree to call the one you write the MockList class). In each case, we’d like to test some operations by doing them in a loop for a specific number of iterations, recording the accumulated time it takes. Then, we can increase the number of iterations and see how the two different implementations react to these increases in size (i.e., loop iterations). In my own tests, I used iteration counts of 100, then 1,000, then 10,000 and even 100,000 ... although the String tests were so slow that I had to cut them off above the 10,000 mark: I just couldn’t wait that long!

What sorts of tests should we perform? Well, for Strings versus StringBuffers, a loop very much like the one above will make for a nice test. For testing purposes, it won’t matter if we actually do just use a fixed String to append on the right: in fact, it will help focus the test, since we won’t have to worry about how long it takes to get input, or to compute something. In the case of the StringBuffer, you cannot just “add” the second String using the plus (+) operator: you need to call the append(...) method instead. I used a fixed piece of text of about 40 characters in length for my tests and that worked just fine. Note that we will not actually do anything with the huge accumulated Strings that we build: in particular, we won’t print them out, but rather just time how long it takes to build them (I don’t want to print out a 4,000,000-character String!).

For the other set of tests, you will need to implement a class similar to ArrayLists, but which uses a plain old array internally. Let’s agree to implement just the add(...) and remove(...) methods, along with a necessary constructor (we can pass it a maximum capacity as an int) and also a size() method (returning the current size: not the capacity, but just the actual number of elements in the structure at the time). During debugging it will be important to check for errors that might occur if an addition of items causes the structure to go over capacity, but once your code is working, you can just request a suitable capacity in the constructor call, i.e., one that will fit all the test values you need. (Internally, you can just dynamically allocate an array of the given size in the constructor: the array variable itself should be declared outside the constructor, in the class as a whole, so that all methods have access to it ... but you can initialize it in the constructor method to the right size.)

We have agreed that we will test mainly the appending of Strings for the first part of the lab: what paces should we put our MockLists and the standard Java ArrayLists through? Let’s agree to take a given structure (starting empty) and first add a bunch of items at the front of the structure (i.e., at index 0), one after the another, for a whole slew of iterations (10, 100, 1,000, etc., as above). We can measure the total time for each count as we did above, by starting a StopWatch right before the addition loop and stopping it right after. Then, in a separate test (report the accumulated time separately), we can remove the items from the front as well (i.e., with a call to remove(0).

This suggests that we could write a method to do the testing, and pass it the number of iterations to test.

Abstracting out a method could be done for the String, StringBuffer and ArrayList cases as well. If you really want to get fancy, you could try to use generics or interfaces in order to unify the methods for the similar cases, but this is not a requirement of the assignment!

Overall, then, you should run the following tests:

In my own testing, I also printed out the average time per operation or iteration for each of the n-iterated loops, since that was an easier result to interpret. My output looked something like this, although I have blocked out the actual result numbers in order to not spoil any “surprises” that may come up:

Testing Strings ...
	* for      10 iterations, time =  ***; avg =  *** per string appended.
	* for   1,000 iterations, time =  ***; avg =  *** per string appended.
	* for  10,000 iterations, time =  ***; avg =  *** per string appended.
	*** 100,000 NOT SUPPORTED FOR STRINGS! ***

Testing StringBuffers ...
	* for      10 iterations, time =  ***; avg =  *** per string appended.
	* for     100 iterations, time =  ***; avg =  *** per string appended.
	* for   1,000 iterations, time =  ***; avg =  *** per string appended.
	* for  10,000 iterations, time =  ***; avg =  *** per string appended.
	* for 100,000 iterations, time =  ***; avg =  *** per string appended.

Testing ArrayLists ...
	* for      10 additions, time =  ***; avg =  *** per addition.
	* for      10 removals,  time =  ***; avg =  *** per removal.

	* for     100 additions, time =  ***; avg =  *** per addition.
	* for     100 removals,  time =  ***; avg =  *** per removal.

	* for   1,000 additions, time =  ***; avg =  *** per addition.
	* for   1,000 removals,  time =  ***; avg =  *** per removal.

	* for  10,000 additions, time =  ***; avg =  *** per addition.
	* for  10,000 removals,  time =  ***; avg =  *** per removal.

	* for 100,000 additions, time =  ***; avg =  *** per addition.
	* for 100,000 removals,  time =  ***; avg =  *** per removal.

Testing MockLists ...
	* for      10 additions, time =  ***; avg =  *** per addition.
	* for      10 removals,  time =  ***; avg =  *** per removal.

	* for     100 additions, time =  ***; avg =  *** per addition.
	* for     100 removals,  time =  ***; avg =  *** per removal.

	* for   1,000 additions, time =  ***; avg =  *** per addition.
	* for   1,000 removals,  time =  ***; avg =  *** per removal.

	* for  10,000 additions, time =  ***; avg =  *** per addition.
	* for  10,000 removals,  time =  ***; avg =  *** per removal.

	* for 100,000 additions, time =  ***; avg =  *** per addition.
	* for 100,000 removals,  time =  ***; avg =  *** per removal.

(Hey, Fritz: how did you get all those columns of numbers to line up so neatly? Why, I used the handy-dandy Java printf method, which you can read about at this page provided to the web by Jack Wilson from Cerritos College; thanks, Jack!)

Classes and methods

So, to summarize, you should implement at least the following classes and methods (you may also implement others, of course, if it makes your job easier, and you might even consolidate things if you are handy with sub-classes, interfaces and generics):

Demo and other conclusions

For your demo, you should be able to show your code, run the tests using the iteration values you chose, and also run the tests with new values at demo time. If you want to get fancy, you might want to gather your data up and present it in a spreadsheet or graph: students have done this sort of thing to great effect in the past (*cough* Adam Bolen, Lab Assistant *cough*).

You should in any case be able to discuss briefly the results you saw and speak to what might explain them.