javaContainers.html

Herbert Spencer wrote, "Science is organized knowledge." The corollary might be that applications are organized objects. Let's take a moment to turn to some aspects of Java that are critical for developing applications rather than applets.

Of those who have heard of Java, the majority has learned about the language through the popular press. The statement that crops up a lot is that Java is for "programming small applications, or applets, that can be embedded on a Web page." While correct, this definition conveys only one aspect of the new language; it doesn't describe the whole picture. Perhaps Java can be better described as a language designed to build systems -- large systems -- of well-understood portable pieces of executable code that can be combined, in whole or in part, to produce a desirable whole.

In this column I will begin to look at the various tools you can use to build in Java. I will demonstrate how these tools can be combined to make a larger application, and how, once you have an application, you can further aggregate the application into still larger systems -- all possible because in Java there is no distinction between a complete application and a simple subroutine.

To provide source code fodder for this and past columns, I chose to build a BASIC interpreter. "Why BASIC?" you might ask, thinking no one uses BASIC anymore. This is not entirely true. BASIC lives on in Visual BASIC and in other scripting languages. But more importantly, many people have been exposed to it and can make the following conceptual leap: If "applications" are programmed in BASIC, and BASIC can be written in Java, then applications can be written in Java. BASIC is just another interpreted language; the tools that we will be building can be modified to use any language syntax, thus the core concepts are the focus of these articles. Therefore, what starts as an application becomes a component of other applications -- even applets perhaps.

Generic classes and containers
Building generic classes is particularly relevant when creating applications because reusing classes provides tremendous leverage in reducing both complexity and time to market. In an applet, the value of a generic class is mitigated by the requirement of loading it over the network. The negative impact of loading generic classes over the network is demonstrated by Sun's Java Workshop (JWS). JWS augments the standard version of the abstract windowing toolkit (AWT) by using some very elegant "shadow" classes. The advantage is that applets are easy to develop and are rich in features; the downside is that loading these classes can take a lot of time on a slow network link. While this disadvantage will disappear eventually, what we find is that a system perspective on class development is often required to achieve the best solution.

Since we're starting to look a bit more seriously at application development, we will assume we have already determined that generic classes are a valid solution.

Java, like many general-purpose languages, provides several tools for creating generic classes. Different requirements will necessitate using different tools. In this column I will use the development of a container class as an example since it can accommodate nearly all of the tools a user might wish to use.

Containers: a definition
For those of you who are not yet familiar with things object-oriented, a container is a class that organizes other objects. Common containers are binary trees, queues, lists, and stacks. Java supplies three container classes with the JDK 1.0.2 release: java.util.Hashtable, java.util.Stack, and java.util.Vector.

Containers have both an organizing principle and an interface. Stacks, for example, may be organized as "first in, last out" (FILO), and their interface may be defined to have two methods -- push() and pop(). Simple containers can be thought of as having the standard methods add and remove. Further, they will have a means to enumerate the entire container, to check to see if a candidate object is already in the container and to test the number of elements being held by the container.

The Java container classes demonstrate some of the problems with containers, especially keyed containers (those containers that use a key to locate an object). The non-keyed containers like Stack and Vector simply stuff objects in and pull objects out. The keyed container Hashtable uses a key object to locate a data object. For the keying function to work, the key object must support a method HashCode that returns a unique hash code for every object. This keying ability works because the Object class defines a HashCode method and thus is inherited by all objects, but it isn't always what you want. For example, if you are putting objects into your hash table container and you are indexing them with String objects, the default HashCode method simply returns a unique integer based on the object reference value. For strings, you really want the hash code to be a function of the string value, so String overrides HashCode and supplies its own version. This means that for any object you develop, and want to store in a hash table using an instance of the object as the key, you must override the HashCode method. This insures that identically constructed objects hash to the same code.

But what about sorted containers? The only sorting interface provided in the Object class is equals(), and it is constrained to equating two objects as having the same reference, not having the same value. This is why, in Java, you can't write the following code:

The above code compares the object references, notes that there are two different objects here, and returns false. You have to write the code as follows:

This latter test uses knowledge encapsulated in the compareTo method of String to compare two string objects and return an indication of equality.

Using the tools in the box
As I mentioned earlier, generic program developers have two primary tools available to them: implementation inheritance (extending) and behavioral inheritance (implementing).

To use implementation inheritance, you extend (subclass) an existing class. By extension, all subclasses of the base class have the same capabilities as the root class. This is the basis for the HashCode method in the Object class. As all objects inherit from the java.lang.Object class, all objects have a method HashCode that returns a unique hash for that object. If you wish to use your objects as keys, however, keep in mind the caveat mentioned earlier about overriding HashCode.

In addition to implementation inheritance, there is behavioral inheritance (implementing), which is achieved by specifying that an object implements a particular Java interface. An object that implements an interface can be cast to an object reference of that interface type. Then that reference can be used to invoke the methods specified by that interface. Typically, interfaces are used when a class may need to process several objects of different types in a common way. For example, Java defines the Runnable interface that is used by the thread classes to work with classes in their own thread.

Building a container
To demonstrate the tradeoffs in writing generic code, I will walk you through the design and implementation of a sorted container class.

As I mentioned earlier, in the development of general-purpose applications, in many cases a good container would be useful. In my example application I needed a container that was both keyed, meaning that I wanted to retrieve contained objects by using a simple key, and sorted so that I could retrieve the contained objects in a specific order based on the key values.

When designing systems, it is important to keep in mind what parts of the system use a particular interface. In the case of containers, there are two critical interfaces -- the container itself and the keys that index the container. User programs use the container to store and organize objects; the containers themselves use the key interfaces to help them organize themselves. When designing containers, we strive to make them easy to use and to store a wide variety of objects (thus increasing their utility). We design the keys to be flexible so that a wide variety of container implementation can use the same key structures.

To solve my behavioral requirements, keying and sorting, I turn to a useful tree data structure called a binary search tree (BST). Binary trees have the useful property of being sorted, so they can be efficiently searched and can be dumped out in sorted order. The actual BST code is an implementation of the algorithms published in the book Introduction to Algorithms, by Thomas Cormen, Charles Leiserson, and Ron Rivest.

java.util.Dictionary
The Java standard classes have taken a first step toward generic keyed containers with the definition of an abstract class named java.util.Dictionary. If you look at the source code that comes with the JDK, you will see that Hashtable is a subclass of Dictionary.

The Dictionary class attempts to define the methods common to all keyed containers. Technically, what is being described could more properly be called a store as there is no required binding between the key and the object it indexes. However, the name is appropriate as nearly everyone understands the basic operation of a dictionary. An alternative name might be KeyedContainer, but that title gets tedious pretty quickly. The point is that the common superclass of a set of generic classes should express the core behavior being factored out by that class. The Dictionary methods are as follows:

size( )	This method returns the number of objects currently being held by the container.
isEmpty( )	This method returns true if the container has no elements.
keys( )	Return the list of keys in the table as an Enumeration.
elements( )	Return the list of contained objects as an Enumeration.
get(Object k)	Get an object, given a particular key k.
put(Object k, Object o)	Store an object o using key k.
remove(Object k)	Remove an object that is indexed by key k.

By subclassing Dictionary, we use the tool of implementation inheritance to create an object that can be used by a wide variety of clients. These clients need know only how to use a Dictionary, and we can then substitute our new BST or a Hashtable without the client noticing. It is this property of abstracting out the core interface into the superclass that is crucial to reusability, general-purpose function, expressed cleanly.

Basically, Dictionary gives us two groups of behavior, accounting and administration -- accounting in the form of how many objects we've stored and bulk reading of the store, and administration in the form of get, put, and remove.

If you look at the Hashtable class source (it is included with all versions of the JDK in a file named src.zip), you will see that this class extends Dictionary and has two private internal classes, one named HashtableEntry and one named HashtableEnumerator. The implementation is straightforward. When put is called, objects are placed into a HashtableEntry object and stored into a hash table. When get is called, the key passed is hashed and the hashcode is used to locate the desired object in the hash table. These methods keep track of how many objects have been added or removed, and this information is returned in response to a size request. The HashtableEnumerator class is used to return results of the elements method or keys method.

First cut at a generic keyed container
The BinarySearchTree class is an example of a generic container that subclasses Dictionary but uses a different organizing principle. As in the Hashtable class, I've added a couple of classes to support holding the stored objects and keys and for enumerating the table.

The first is BSTNode, which is equivalent to a HashtableEntry. It is defined as shown in the code outline below. You can also look at the source.

Let's take a look at this code in order to clarify two things. First, there's the null-protected constructor, which is present so that subclasses of this class do not have to declare a constructor that overrides one of this class' constructors. Second, the methods successor, predecessor, min, max, and print are very short and merely call the same private equivalent so as to conserve memory space.

The next special-purpose class is the class BSTEnumerator, which provides an implementation of a java.util.Enumeration that can be used to get each key or each object out of the tree in order. You can follow along with the code below.

As you can see, we extend BSTNode, which gives us access to the methods for moving through a tree. Further, we can look into the protected parts of the object -- the key and the payload instance variables. To implement the Enumeration interface we need only a hasMoreElements method and a nextElement method.

The hasMoreElements method returns true if the node we are holding is non-null. This design allows us to create an enumerator without having to check if there are any objects in our container; the enumerator will always provide the correct response. The second method, nextElement, simply returns the node we are holding and replaces that node with the next node in the tree (known as the successor node). What is actually returned is either the key value or the payload, depending on the state of the key's boolean. By implementing the system in this way, only these two classes -- BSTNode and BSTEnumerator -- need to know the particulars of how an internal node is constructed. Keep in mind that with the BSTEnumerator class, the reference in currentNode is in fact also currently stored inside the tree. If another thread is manipulating the tree, something unexpected may result. If this potential for multiple threads changing the tree during an enumeration is a problem, the enumerator could create a quick copy of the tree during its construction. For our purposes, however, we simply don't change the tree while we are enumerating its elements or keys.

Finally, after the node and enumerator are designed, we put together our subclass of Dictionary to be the container object. The first half of the code for BinarySearchTree is shown below in outline form.

This first section of the code above shows the two state variables that our tree class maintains. These variables hold the number of elements currently held and the root node for the binary tree. The variables are followed by three methods that manipulate the contents of the tree -- search, insert, and remove . Remember that trees consist of BSTNode objects. At this point, we must implement the abstract methods of Dictionary in our class. Java requires that concrete subclasses define implementations for all of the abstract methods in their superclass. Starting with the easy ones, we have elements, keys, size, and isEmpty, as shown below.

The key and elements methods simply create a new BSTEnumerator with the keys flag appropriately set, and pass it the logical first element in the tree. Later, when nextElement is called, this node will be used to find successive nodes until all nodes have been visited. The isEmpty and size methods simply return the state of elementCount to the user. This variable is maintained in the insert and delete methods of the tree.

The next method from Dictionary that we implement is get, which we do by calling the search method -- starting at the root node and using the key the user passed us. This code is shown below, and all it does is first check to make sure there are some nodes to search and then search them. If the search is successful, it returns the payload from the node it found.

Next we have the dictionary administration methods remove and put. The implementation of these methods is shown below. The only subtle implementation detail is that, implicitly, keyed containers cannot contain multiple objects for the same key. This is simply an implementation decision that Sun made in the specification of the language. It is straightforward to implement multiple objects per key but get and remove get a bit harder to write.

The remove method checks to see if the node exists. If it does, it gets back its payload, then removes it while returning the payload to the caller. The put method removes an object associated with key if it exists and then adds a new node with the new value into the tree.

And with that a complete container is created to go along with Hashtable. To test it I wrote a simple test program that fills the container with some strings and then enumerates them out. Next it removes one of the objects and prints out the complete list again. At about line 32 of the test program there is the statement Dictionary d = new XXX(); where the XXX is replaced with either BinarySearchTree or Hashtable. This test program inserts the following colors into the dictionary: red, green, orange, purple, chartreuse, gray, magenta, cyan, yellow, tangerine, turquoise, brown, black, white, silver, gold, pink, bronze, beige, blue, and aquamarine. And it sets the stored object value to be a string consisting of the number of the color in the list, as well as the color's value. This gives us an idea of how the colors get rearranged when they are in the Dictionary object. The complete contents of the dictionary are enumerated with the following code.

Hashtable contains :	BinarySearchTree contains :
`14: silver`	`20: aquamarine`
`17: bronze`	`18: beige`
`5: gray`	`12: black`
`8: yellow`	`19: blue`
`1: green`	`17: bronze`
`10: turquoise`	`11: brown`
`18: beige`	`4: chartreuse`
`11: brown`	`7: cyan`
`7: cyan`	`15: gold`
`6: magenta`	`5: gray`
`2: orange`	`1: green`
`12: black`	`6: magenta`
`4: chartreuse`	`2: orange`
`3: purple`	`16: pink`
`16: pink`	`3: purple`
`19: blue`	`0: red`
`0: red`	`14: silver`
`20: aquamarine`	`9: tangerine`
`13: white`	`10: turquoise`
`15: gold`	`13: white`
`9: tangerine`	`8: yellow`

As you can see from the above table, the objects from the BinarySearchTree-based dictionary come back in sorted order, whereas the Hashtable entries come back in an unpredictable order (unless you read the source to Hashtable, of course).

The use of String in the binary search tree container
Our first cut at designing a sorted container works fine; it implements all of the abstract methods of Dictionary. We've written a test program to show that at least in one case it is interchangeable with Hashtable. Unfortunately there is a lurking problem that we haven't yet addressed: The limitation that the key used in this container must be of type String. Let's take a moment to investigate why strings are used, as well as what the limitations of that design choice are. In the next column we will look at eliminating strings completely.

For any container to work, the key must identify an object. In our BinarySearchTree container we use a String object to identify the contained object. The object identified by the key is located using a binary search algorithm, which uses the relative value of the key you are holding versus the key you are currently looking at, to determine which key to look at next. This is implemented in the search method in BinarySearchTree shown below.

If you recall the way that a binary tree works, each node optionally has two child nodes (hence the term "binary search tree"). The node to the left points to a node that is "less than" the current node, and the node to the right points to a node that is "greater than" the current node. The search method uses the equals method to see if the key of this node is equal to the passed key. If the keys are not equal, then search uses the compareTo method to decide if the left branch or the right branch of the tree should be followed. Once the branch determination is made, the method is recursively called on that node. If the node is null, as it will be when the key is not present in the tree, search simply returns null, indicating that the key is not present.

As you can see, the algorithm is simple and efficient, however it depends on the key object having a method compareTo to determine if the left or right branch of the tree should be followed. Unfortunately, the base class of Object, which defined hashCode for the hash table algorithm, does not define compareTo for all objects. Because of this, we cannot truly meet Dictionary's contract that a key can be any object.

Wrapping up
In this column I've walked you through the building of a container based on the Java container classes. In the process we have uncovered what is clearly a somewhat "uncomfortable" place. Next month I plan to take the BinarySearchTree class through several revisions to illustrate ways to work around these issues in Java and ways to build generic containers in particular. Until next time.

About the author
Chuck McManis is currently the director of system software at FreeGate Corp. FreeGate is a venture-funded start-up that is exploring opportunities in the Internet marketplace. Before joining FreeGate, McManis was a member of the Java group. He joined the Java group just after the formation of FirstPerson Inc. and was a member of the portable OS group (the group responsible for the OS portion of Java). Later, when FirstPerson was dissolved, he stayed with the group through the development of the alpha and beta versions of the Java platform. He created the first "all Java" home page on the Internet when he did the programming for the Java version of the Sun home page in May 1995. He also developed a cryptographic library for Java and versions of the Java class loader that could screen classes based on digital signatures. Before joining FirstPerson, Chuck worked in the operating systems area of SunSoft developing networking applications, where he did the initial design of NIS+. Reach him at chuck.mcmanis@javaworld.com. Or check out his home page.

If you have problems with this magazine, contact webmaster@javaworld.com
URL: http://www.javaworld.com/javaworld/jw-11-1996/jw-11-indepth.html
Last updated: 15 October 1996

Related Web Works

Container support for objects in Java 1.0.2

Organizing objects is easy when you put them into containers