University of Sheffield   

    The Simons
    Component Library

The Streams Hierarchy

This document describes the design rationale used in developing the Stream class hierarchy in the Simons Component Library. Streams are a fundamental kind of component in any software library, used for basic input and output, storage and retrieval from files and encoding and decoding of object data. A fundamental design decision was taken to make basic input and output as simple and symmetrical as possible. File streams are merged with standard input and output streams. An advisory file locking protocol may be used to restrict access to shared files. Formatting issues are separated from basic data transfer issues. All kinds of data encoding, such as XML, CSV, HTML or CGI may be handled by wrapping basic streams with encoding and decoding streams. Object serialisation and restoration is handled by the same mechanisms.

Classification of the Streams

From the outset, it was decided that the streams should be provided in the same way as other classes, related in a class hierarchy according to their similarities and differences. The design of the streams hierarchy is inspired by Java, in that basic data transfer is separated from all encoding and decoding in different formats; and the four most important interfaces are provided by the abstract classes Input, Output, Reader and Writer. These are provided in a single classification hierarchy, whose root is the Stream class.

Stream is the abstract superclass of all data streams and pipes. The Stream class is the root of the streams hierarchy, which includes the FileStream classes, which connect to the underlying filesystem and perform basic read and write operations, and the FormatStream classes, which wrap around the FileStream classes and provide data encoding and decoding in different formats. Further PipeStream subclasses may act as data connections between objects that produce or consume data. A Stream must always be explicitly opened before it is used; and it may be closed after use (otherwise, it will be closed automatically when it is no longer reachable). Stream provides a protocol to test the state of the Stream, which is implemented in descendant classes.

Input

Input is the superclass of all input file streams. The Input class provides the ANSI standard C++ implementation of an input filesystem. It implements the FileStream protocol for opening and closing a filesystem. It implements the Stream protocol for detecting the state of a Stream. Operations that connect to the filesystem may raise a NotFound exception if the filesystem cannot be found and a NoResponse exception if the filesystem fails, or is already in use. The parent of TextInput and ByteInput, the Input class defines the overloaded, abstract get() protocols for reading values of all the basic types Boolean, Character, Natural, Integer and Decimal. It also defines the get() protocols for reading Character[] array and String objects.

Output

Output is the superclass of all output file streams. The Output class provides the ANSI standard C++ implementation of an output filesystem. It implements the FileStream protocol for opening and closing a filesystem. It implements the Stream protocol for detecting the state of a Stream. Operations that connect to the filesystem may raise a NotFound exception if the filesystem cannot be found and a NoResponse exception if the filesystem fails, or is already in use. The parent of TextOutput and ByteOutput, the Output class defines the abstract, overloaded put() protocols for writing values of all the basic types Boolean, Character, Natural, Integer and Decimal. It also defines the put() protocols for writing Character[] array and String objects.

Reader

Reader is the abstract superclass of all format decoding streams. The Reader class is an intermediate class in the streams hierarchy that serves as the main interface for decoding Object data. The descendants of Reader include XMLReader, CSVReader and CGIReader. Note: some of these specialisations are still under development. A Reader wraps up an Input stream, from which it obtains its encoded data. It defines the interface for restoring root and branch Objects from their encoded forms, providing get() to reconstruct a root Object and all Objects reachable from it; and getField() to reconstruct a branch Object stored as the named field of some other Object.

Writer

Writer is the superclass of all format encoding streams. The Writer class is an intermediate class in the streams hierarchy that serves as the main interface for encoding Object data. The descendants of Writer include XMLWriter, CSVWriter and CGIWriter. Note: some of these specialisations are still under development. A Writer wraps up an Output stream, to which it sends its encoded data. It defines the interface for serialising root and branch Objects in an encoded form, providing put() to serialise a root object and all Objects reachable from it; and putField() to serialise a branch Object stored as the named field of some other Object.

Introducing the Stream Protocols

Stream objects are first-class objects in the Simons Component Library. This means that they can be passed by reference and shared by more than one variable; or cloned such that two independent streams refer to the same logical filesystem. For this reason, a particular policy was devised to minimise ownership conflicts in the management of the underlying filesystem resources.

Streams and Filesystem Resource Management

Streams can exist in either an open or closed state. When a stream is created, it is initially closed. Likewise, if a stream is cloned, the copy is initially closed, even if the original stream was open at the time. This policy is the opposite of standard C++, which opens streams upon construction and closes streams upon destruction. There were two reasons behind this design decision.

Firstly, there is a resource management issue. Every time a connection is made to a filesystem, the operating system must allocate resources. A program must explicitly open() a stream to make the connection with the underlying filesystem. A program must then close() a stream to release the underlying filesystem resources. However, this action is also accomplished automatically when a stream object is released and deleted. This ensures that all streams are exception-safe and release their filesystem resources under both normal and abnormal termination.

Secondly, there is a file sharing issue. After cloning, a new stream cannot automatically be used as an alias for reading or writing to the same filesystem, since the stream is closed. Attempting to do this will raise an exception. However, if the program explicitly opens the cloned copy, this is understood as a deliberate request to connect with the same filesystem. Under Unix, it is possible to open multiple handles on the same file, whether for reading or writing. File sharing can be prevented explicitly by turning on the FileStream locking protocol.
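
The open and closed policy described above can be illustrated with a minimal sketch. SketchStream is a hypothetical stand-in written for this page, not an SCL class; the counter liveConnections merely stands in for operating system resources, and copy construction is assumed to model cloning.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for an SCL stream; not the library's real class.
class SketchStream {
public:
    explicit SketchStream(std::string path = "") : path_(std::move(path)) {}

    // Copying models cloning: the path is copied, the open state is not.
    SketchStream(const SketchStream& other) : path_(other.path_), open_(false) {}
    SketchStream& operator=(const SketchStream&) = delete;

    // The destructor releases the connection if the program did not.
    ~SketchStream() { close(); }

    void open()  { if (!open_) { open_ = true;  ++liveConnections; } }
    void close() { if (open_)  { open_ = false; --liveConnections; } }
    bool isOpen() const { return open_; }

    // Any data transfer on a closed stream is an error.
    void put(const std::string&) {
        if (!open_) throw std::runtime_error("stream is closed");
    }

    static int liveConnections;   // stands in for operating system resources

private:
    std::string path_;
    bool open_ = false;
};

int SketchStream::liveConnections = 0;

// Returns the number of connections still held after the scope ends.
int demoLifecycle() {
    {
        SketchStream original("data.txt");
        original.open();
        SketchStream copy(original);      // clone of an open stream
        assert(!copy.isOpen());           // is nevertheless closed
        copy.open();                      // deliberate second connection
        assert(SketchStream::liveConnections == 2);
    }                                     // both are released automatically here
    return SketchStream::liveConnections;
}
```

In this sketch, using the clone before opening it throws, which mirrors the rule that a cloned stream cannot silently act as an alias for the original.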

Opening, Closing and Redirecting Streams

All streams may connect either to the terminal, or to a filesystem. If a default stream is created, without supplying a file path name, using open() will connect the stream to the terminal (standard input, or standard output).

If a path name is supplied at construction, using open() will attempt to connect the stream to the named filesystem. Alternatively, a path name can be supplied later, using openOn(), which first stores the new path name and then calls open() internally to connect to the filesystem.

A stream can be closed explicitly using close(), after which no data can be passed; attempting to do so will raise an exception. If the stream is reopened with open(), it will connect to the filesystem named by the existing stored path name. A stream can be redirected to a different source or sink by reopening it using openOn() with the new path name. If a null path name is supplied, the stream will reconnect to the terminal.
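
The opening and redirection rules above can be traced with a small sketch. SketchOutput is a hypothetical stand-in that only records where a stream would connect; the method names open(), openOn() and close() follow the text, while target(), the use of an empty string for the null path name, and the file name results.txt are assumptions made for illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in that only records where a stream would connect.
class SketchOutput {
public:
    SketchOutput() = default;                                  // no path: terminal
    explicit SketchOutput(std::string path) : path_(std::move(path)) {}

    void open()  { open_ = true; }
    // Stores the new path name first, then opens, as the text describes.
    void openOn(const std::string& path) { path_ = path; open(); }
    void close() { open_ = false; }
    bool isOpen() const { return open_; }

    // An empty stored path stands in for the null path name.
    std::string target() const { return path_.empty() ? "<terminal>" : path_; }

private:
    std::string path_;
    bool open_ = false;
};

// Records the target after each step of a typical redirection sequence.
std::string demoRedirect() {
    SketchOutput out;                   // default stream, no path name
    out.open();
    std::string trace = out.target();   // connected to the terminal
    out.openOn("results.txt");          // store the path, then open
    trace += "," + out.target();
    out.close();
    out.open();                         // reopens on the stored path
    assert(out.isOpen());
    trace += "," + out.target();
    out.openOn("");                     // empty path: back to the terminal
    trace += "," + out.target();
    return trace;
}
```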

Stream Failure and Exceptions

Streams connect to external physical devices, so there is the possibility of device failure during use; or failure to connect if the path to the device was incorrectly specified; or refusal to connect if the device was already in use. Also, reading and writing may fail if an attempt is made to read or write data values of an unexpected type or format. Because of this, stream operations may raise a number of exceptions. Programs can choose to catch these exceptions and recover. Certain failures require closing and reopening the stream.

Some of these exceptions may be chained together; for example a ReadFailure may arise because the stream has not been opened, therefore a NotFound exception is the prior cause. ReadFailure may also be caused by NoElements, if the input is exhausted, or NoResponse if the opened filesystem fails. Similarly a WriteFailure may arise because of a NotFound, or a NoResponse exception. Chained exceptions can be retrieved using the cause() method.
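
Exception chaining of this kind can be sketched as follows. SketchException is a hypothetical stand-in and the real SCL exception classes may be shaped quite differently; only the names ReadFailure and NotFound and the cause() accessor are taken from the text.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in; the SCL exception classes may differ.
class SketchException {
public:
    explicit SketchException(std::string name,
                             std::shared_ptr<const SketchException> cause = nullptr)
        : name_(std::move(name)), cause_(std::move(cause)) {
        assert(!name_.empty());
    }
    const std::string& name() const { return name_; }
    // Returns the prior cause, or a null pointer at the end of the chain.
    const SketchException* cause() const { return cause_.get(); }

private:
    std::string name_;
    std::shared_ptr<const SketchException> cause_;
};

// Simulates reading from a stream that was never opened.
void readFromUnopened() {
    auto prior = std::make_shared<const SketchException>("NotFound");
    throw SketchException("ReadFailure", prior);
}

// Walks the chain from the outermost failure to its root cause.
std::string describeFailure() {
    try {
        readFromUnopened();
    } catch (const SketchException& failure) {
        std::string chain;
        for (const SketchException* e = &failure; e != nullptr; e = e->cause()) {
            if (!chain.empty()) chain += " <- ";
            chain += e->name();
        }
        return chain;
    }
    return "no failure";
}
```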

Programs should not rely on exceptions during normal use. For example, rather than wait for a NoElements exception, a program should test the input stream using empty() to detect whether the end of an input stream has been reached. See the Stream class protocols for methods that inspect the stream state.
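
A reading loop that tests the stream state, rather than waiting for an exception, might look like the sketch below. SketchInput is a hypothetical stand-in built on std::istringstream; only the names empty(), get() and NoElements come from the text, and the exact meaning of empty() in the SCL is an assumption.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an input stream over in-memory text.
class SketchInput {
public:
    explicit SketchInput(const std::string& text) : source_(text) {}

    // True when only white space (or nothing) remains to be read.
    bool empty() {
        source_ >> std::ws;
        return source_.peek() == std::char_traits<char>::eof();
    }

    // Reads one integer; running out of input is reported as an exception.
    SketchInput& get(int& value) {
        if (!(source_ >> value)) throw std::runtime_error("NoElements");
        return *this;
    }

private:
    std::istringstream source_;
};

// Sums every integer in the text, stopping on the state test alone.
int sumAll(const std::string& text) {
    SketchInput in(text);
    int total = 0;
    int next = 0;
    while (!in.empty()) {     // test first, so get() never runs out of input
        in.get(next);
        total += next;
    }
    assert(in.empty());
    return total;
}
```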

Using the File Streams

The FileStream classes are those streams which connect directly to the underlying filesystem. The descendants of FileStream are Input and Output, which define the abstract protocols for reading and writing values of all the basic types Boolean, Character, Natural, Integer and Decimal; and for reading and writing Character[] array and String objects. The concrete descendants of Input are TextInput, which reads data in plain text format, and ByteInput, which reads data in binary format. Likewise, the concrete descendants of Output are TextOutput, which writes data in plain text format, and ByteOutput, which writes data in binary format.

File Streams are also Standard Streams

From the outset, it was decided that there was no compelling reason to distinguish the standard input and output streams, which connect with the keyboard and monitor, from the input and output file streams that connect to the underlying filesystem. No such distinction is maintained by the underlying operating system kernel, which also allows easy redirection of standard input and output from the terminal to the filesystem. In the Simons Component Library, every FileStream can choose to connect either to a standard stream, or to a filesystem. See the opening and closing protocols.

This contrasts with standard C++, in which the file streams are kept conceptually distinct from the standard streams, and are only partly compatible with them. Sometimes this is annoying: while the uni-directional input and output file streams ifstream and ofstream are respectively compatible with the standard streams istream and ostream, the reverse is not the case. Neither is it possible to relate the bi-directional file stream fstream with the cognate standard stream iostream, because of the topology of the class hierarchy.

Basic Input and Output

From the outset, it was decided to make basic input and output as simple as possible (much like C++) and for the stream construction and IO programming idioms to be as symmetrical as possible (unlike Java). All Input streams support the get() protocol to read basic values; and all Output streams support the put() protocol to write basic values. In both cases, the read or written value is the argument and the result of the method is the stream, which allows operations to be chained together.

The arguments to put() may be literal values, or variables containing values. The arguments to get() must be variables, which will have values placed in them if the reading operation succeeds. A program can control how much text goes into a String variable by reading up to a specified terminator character, such as '=', which is then explicitly skipped. This is more general than the C++ extraction operator, which only reads strings up to the next white space. All reading methods that require terminators will default to the newline character. Apart from those methods that read strings, input methods skip leading white space.
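
The chaining idiom and the terminator-controlled string read described above can be sketched as follows. SketchTextOutput and SketchTextInput are hypothetical stand-ins over in-memory text; the get() and put() names follow the text, while skip(), text() and the exact parameter lists are assumptions rather than the SCL signatures.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for a text output stream.
class SketchTextOutput {
public:
    // Each put() returns the stream, so calls can be chained.
    SketchTextOutput& put(const std::string& s) { sink_ << s; return *this; }
    SketchTextOutput& put(char c)               { sink_ << c; return *this; }
    SketchTextOutput& put(int value)            { sink_ << value; return *this; }
    std::string text() const { return sink_.str(); }

private:
    std::ostringstream sink_;
};

// Hypothetical stand-in for a text input stream.
class SketchTextInput {
public:
    explicit SketchTextInput(const std::string& text) : source_(text) {}

    // Reads text up to, but not including, the terminator (newline by default).
    SketchTextInput& get(std::string& value, char terminator = '\n') {
        typedef std::char_traits<char> Traits;
        value.clear();
        while (source_.peek() != Traits::eof() &&
               source_.peek() != Traits::to_int_type(terminator)) {
            value += static_cast<char>(source_.get());
        }
        return *this;
    }

    // Numeric reads skip leading white space.
    SketchTextInput& get(int& value) { source_ >> value; return *this; }

    // Discards one character, such as a terminator left in the stream.
    SketchTextInput& skip() { source_.get(); return *this; }

private:
    std::istringstream source_;
};

// Writes "width=42" with chained puts, then reads it back with chained gets.
std::string demoRoundTrip() {
    SketchTextOutput out;
    out.put("width").put('=').put(42);

    SketchTextInput in(out.text());
    std::string key;
    int value = 0;
    in.get(key, '=').skip().get(value);   // read to '=', skip it, read the number
    assert(key == "width");
    return key + ":" + std::to_string(value);
}
```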

Reopening, Rewriting and Appending

The open() and openOn() methods always reopen a file from the beginning. For an Input stream, this means that data will be read from the beginning of the file. For an Output stream, this means that every time a file is opened, it is overwritten and the old contents are lost. Output streams also support openLog() and openLogOn(), which open a file at the end, ready for appending data to an existing file, which is assumed to be a log file.

If openLog() is invoked on a default Output stream, this will connect to the standard error stream, rather than the standard output stream. All Exception classes do this when they report a failure using their report() method.
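
The difference between overwriting and appending can be demonstrated with the standard C++ file streams, used here only as an analogy: truncating on open corresponds to the behaviour described for open(), and appending corresponds to openLog(). The helper names and the scratch file name scl_sketch_log.txt are assumptions for this sketch, which does not use the SCL classes.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

// Writes text to path, either replacing the contents or appending to them.
void writeText(const std::string& path, const std::string& text, bool append) {
    std::ofstream out(path.c_str(), append ? std::ios::app : std::ios::trunc);
    assert(out.is_open());
    out << text;
}

// Returns the whole contents of the file at path.
std::string readAll(const std::string& path) {
    std::ifstream in(path.c_str());
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Shows that a truncating open loses old data and an appending open keeps it.
std::string demoAppend() {
    const std::string path = "scl_sketch_log.txt";   // hypothetical scratch file
    writeText(path, "old contents\n", false);
    writeText(path, "first\n", false);    // like open(): old contents are lost
    writeText(path, "second\n", true);    // like openLog(): data is appended
    std::string result = readAll(path);
    std::remove(path.c_str());            // tidy up the scratch file
    return result;
}
```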

The File Stream Locking Protocol

The operating system may be quite liberal in allowing file sharing. In Unix, for example, multiple processes may freely read from the same file, but only one process may gain write-access to the file at any time. While this will protect a file from mutually-inconsistent updates by concurrent processes, this will not prevent the same process from opening multiple streams onto the same file. The SCL offers some protection against accidental file sharing with the explicit open() and close() protocol. However, if a stronger form of protection against file sharing is desired, the FileStream locking protocol may be engaged. This is an advisory protocol, meaning that all concurrent processes must agree to use it. Once the protocol is turned on, every stream within that process can only connect to a file if it has exclusive access to it.

The locking protocol is controlled by the static method setLocking() in the class FileStream. It is turned on and off by supplying a Boolean argument to the method. For example, the locking protocol is turned on by calling setLocking() with the argument true.

Once a stream has exclusive access to a file, any attempt by another stream to connect to the same file will raise a DeviceBusy exception. This applies whether the stream is in the same process or a different process (using the file locking protocol), and whether it wishes to read or write (there is only a single permission). Engaging the file locking protocol will also prevent multiple streams from connecting to standard input, or standard output. The locking policy uses a portable technique, which creates a lockfile for each opened file. The lockfile is created using atomic operating system functions, so that any stream may detect whether a lock exists on a data file, before opening it. The lockfiles are deleted when the stream with exclusive access is closed.
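
One common way to create a lockfile atomically is the POSIX open() call with the O_CREAT and O_EXCL flags, which fails if the file already exists. The sketch below uses that technique as an assumption; the text does not say which operating system functions the SCL relies on, and the helper names, the ".lock" suffix and the scratch file name are all hypothetical.

```cpp
#include <cassert>
#include <fcntl.h>     // POSIX open() and its flags
#include <unistd.h>    // POSIX close() and unlink()
#include <string>

// Hypothetical helper: tries to take an advisory lock on dataPath by creating
// a companion lockfile. O_CREAT | O_EXCL makes the creation fail atomically
// if the lockfile already exists.
bool acquireLock(const std::string& dataPath) {
    assert(!dataPath.empty());
    const std::string lockPath = dataPath + ".lock";
    int fd = ::open(lockPath.c_str(), O_CREAT | O_EXCL | O_WRONLY, 0644);
    if (fd < 0) return false;   // could not create it, typically because it exists
    ::close(fd);
    return true;
}

// Hypothetical helper: removes the lockfile, as happens when the owner closes.
void releaseLock(const std::string& dataPath) {
    ::unlink((dataPath + ".lock").c_str());
}

// Checks that a second attempt fails while the lock is held,
// and that the lock can be taken again once it has been released.
bool demoLocking() {
    const std::string data = "scl_sketch_data.txt";   // hypothetical data file
    releaseLock(data);                  // clear any stale lock from an earlier run
    bool first  = acquireLock(data);    // succeeds: no lockfile yet
    bool second = acquireLock(data);    // fails: the lockfile exists
    releaseLock(data);
    bool third  = acquireLock(data);    // succeeds again after release
    releaseLock(data);
    return first && !second && third;
}
```

Because the lock is only a file that cooperating programs agree to check, it remains advisory: a program that opens the data file without looking for the lockfile is not prevented from doing so.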

Using the Format Streams

The FormatStream classes are those streams which perform some kind of encoding and decoding of data. The descendants of FormatStream are Reader and Writer, which define the abstract protocols for reading and writing values of various Object-types. The concrete descendants of Reader include XMLReader, CGIReader and CSVReader. The concrete descendants of Writer are XMLWriter, CGIWriter and CSVWriter. These perform different kinds of object encoding and decoding. Some of these specialisations are still under development.

Object Serialisation in XML

Reading and writing object data requires a more sophisticated strategy than simply printing out the fields of each object. This is because an object may in general be the root of a graph of objects; and this graph may contain cycles. The strategy when writing out a general object graph is called serialisation. This writes out a serial representation of the object graph (without cycles); and, when reading, reconstructs an isomorphic graph from the serialised data. An object and all of its connected dependents may be serialised by the put() method of XMLWriter; and an isomorphic (equal) structure may be reconstructed by the get() method of XMLReader. The latter must be supplied with a null variable of a suitable type, in which the root object of the graph will be returned.

The native format for serialising object data is XML. This was chosen over other proprietary formats, since it offered the prospect of portable exchange of object data. A particular style of XML encoding was chosen, to store the persistent attribute data that would allow instances to be quickly reconstructed. For example, a Vector storing a String, an Integer and the identical String at indices 0, 1, and 2 is written out as a Vector element enclosing one element for each of the stored objects.

In this encoding, the XML tags take the class names of each object to describe the principal XML elements. Then, XML attributes are used to describe the name of the field in which the current object is stored (usually, an attribute of the enclosing object); and, for elements which are Object-types, a unique identifier for the object; and, for elements which are Collection-types, the size of the collection. If an object is encountered more than once (as determined from the unique identifier), a short self-closing XML tag is used, which only contains the field and identifier data.
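
The encoding rules above can be sketched with a miniature writer that tracks object identity. Everything here is hypothetical: SketchObject and SketchXMLWriter are not SCL classes, and the attribute names field, id and size, the field name "root", the use of indices as field names and the sample values are assumptions chosen to match the description, not the library's actual output. Text values are not escaped in this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical miniature object model, for illustration only.
struct SketchObject {
    std::string className;          // becomes the XML tag
    std::string value;              // text content for leaf objects
    bool isCollection = false;      // collections also report their size
    std::vector<std::shared_ptr<SketchObject>> elements;
};

class SketchXMLWriter {
public:
    // Serialises root and every object reachable from it.
    std::string put(const std::shared_ptr<SketchObject>& root) {
        ids_.clear();
        out_.str("");
        write(root, "root");
        return out_.str();
    }

private:
    void write(const std::shared_ptr<SketchObject>& obj, const std::string& field) {
        assert(obj != nullptr);
        std::map<const SketchObject*, int>::const_iterator found = ids_.find(obj.get());
        if (found != ids_.end()) {
            // Seen before: emit only a short self-closing reference.
            out_ << "<" << obj->className << " field=\"" << field
                 << "\" id=\"" << found->second << "\"/>";
            return;
        }
        // Register the identifier before descending, so cycles terminate.
        const int id = static_cast<int>(ids_.size()) + 1;
        ids_[obj.get()] = id;
        out_ << "<" << obj->className << " field=\"" << field
             << "\" id=\"" << id << "\"";
        if (obj->isCollection) out_ << " size=\"" << obj->elements.size() << "\"";
        out_ << ">" << obj->value;
        for (std::size_t i = 0; i < obj->elements.size(); ++i) {
            write(obj->elements[i], std::to_string(i));
        }
        out_ << "</" << obj->className << ">";
    }

    std::map<const SketchObject*, int> ids_;
    std::ostringstream out_;
};

// Builds a Vector holding a String, an Integer and the identical String.
std::string demoSharedString() {
    auto text = std::make_shared<SketchObject>();
    text->className = "String";
    text->value = "hello";
    auto number = std::make_shared<SketchObject>();
    number->className = "Integer";
    number->value = "42";
    auto vec = std::make_shared<SketchObject>();
    vec->className = "Vector";
    vec->isCollection = true;
    vec->elements = {text, number, text};
    return SketchXMLWriter().put(vec);
}
```

In this sketch the second occurrence of the String is written as a self-closing element carrying the same identifier as the first, which is what would let a reader rebuild a graph in which both indices refer to one object.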

Web-based Programming with CGI

Some information will eventually appear here about the use of the classes CGIReader and CGIWriter.

Database Programming with CSV

Some information will eventually appear here about the use of the classes CSVReader and CSVWriter.




This documentation was created by Dr Anthony J H Simons, Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello Street, Sheffield S1 4DP, United Kingdom.


Generated on Fri May 05 16:57:34 2006 for The Simons Component Library by doxygen 1.3