The Simons
This document describes the design rationale used in developing the Stream class hierarchy in the Simons Component Library. Streams are a fundamental kind of component in any software library, used for basic input and output, storage and retrieval from files and encoding and decoding of object data. A fundamental design decision was taken to make basic input and output as simple and symmetrical as possible. File streams are merged with standard input and output streams. An advisory file locking protocol may be used to restrict access to shared files. Formatting issues are separated from basic data transfer issues. All kinds of data encoding, such as XML, CSV, HTML or CGI may be handled by wrapping basic streams with encoding and decoding streams. Object serialisation and restoration is handled by the same mechanisms.
From the outset, it was decided that the streams should be provided
in the same way as other classes, related in a class hierarchy according
to their similarities and differences. The design of the streams hierarchy
is inspired by Java, in that basic data transfer is separated from all
encoding and decoding in different formats; and the four most important
interfaces are provided by the abstract classes Input, Output, Reader and
Writer.
These are provided in a single classification hierarchy:
Stream is the abstract superclass of all data streams and pipes. The Stream class is the root of the streams hierarchy, which includes the FileStream classes, which connect to the underlying filesystem and perform basic read and write operations, and the FormatStream classes, which wrap around the FileStream classes and provide data encoding and decoding in different formats. Further PipeStream subclasses may act as data connections between objects that produce or consume data. A Stream must always be explicitly opened before it is used; and it may be closed after use (otherwise, it will be closed automatically when it is no longer reachable). Stream provides a protocol to test the state of the Stream, which is implemented in descendant classes.
Input is the superclass of all input file streams. The Input class
provides the ANSI standard C++ implementation of an input filesystem.
It implements the FileStream protocol for opening and closing a filesystem.
It implements the Stream protocol for detecting the state of a Stream.
Operations that connect to the filesystem may raise a NotFound exception if
the filesystem cannot be found and a NoResponse exception if the filesystem
fails, or is already in use. The parent of TextInput and ByteInput, Input
defines the overloaded, abstract get() protocols for reading values of all
the basic types Boolean, Character, Natural, Integer and Decimal. It also
defines the get() protocols for reading Character[] array and String objects.
Output is the superclass of all output file streams.
The Output class provides the ANSI standard C++ implementation of an
output filesystem. It implements the FileStream protocol for opening
and closing a filesystem. It implements the Stream protocol for detecting
the state of a Stream. Operations that connect to the filesystem may
raise a NotFound exception if the filesystem cannot be found and a
NoResponse exception if the filesystem fails, or is already in use.
The parent of TextOutput and ByteOutput, the Output class
defines the abstract, overloaded put() protocols for writing values of all
the basic types Boolean, Character, Natural, Integer and Decimal. It also
defines the put() protocols for writing Character[] array and String objects.
Reader is the abstract superclass of all format decoding streams.
The Reader class is an intermediate class in the streams hierarchy that
serves as the main interface for decoding Object data. The descendants
of Reader include XMLReader, CSVReader and CGIReader.
Note: some of these specialisations are still under development.
A Reader wraps up an Input stream, from which it obtains its encoded data.
It defines the interface for restoring root and branch Objects from their
encoded forms, providing get() to reconstruct a root Object and all
Objects reachable from it; and getField() to reconstruct a
branch Object stored as the named field of some other Object.
Writer is the superclass of all format encoding streams.
The Writer class is an intermediate class in the streams hierarchy that
serves as the main interface for encoding Object data. The descendants
of Writer include XMLWriter, CSVWriter and CGIWriter.
Note: some of these specialisations are still under development.
A Writer wraps up an Output stream, to which it sends its encoded data.
It defines the interface for serialising root and branch Objects in an
encoded form, providing put() to serialise a root Object and all Objects
reachable from it; and putField() to serialise a branch Object
stored as the named field of some other Object.
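The wrapping relationship between an encoding stream and a basic stream can be illustrated with a minimal sketch in standard C++. This is not SCL code: the names SketchSink, MemorySink and SketchCsvWriter are hypothetical, and the CSV quoting rule is an assumption chosen only to show an encoder that delegates all raw output to the stream it wraps.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical basic sink, playing the role of an Output stream.
class SketchSink {
public:
    virtual ~SketchSink() = default;
    virtual void put(const std::string& text) = 0;
};

// Concrete sink that collects output in memory, so the sketch needs no files.
class MemorySink : public SketchSink {
public:
    void put(const std::string& text) override { buffer_ += text; }
    const std::string& contents() const { return buffer_; }
private:
    std::string buffer_;
};

// Hypothetical encoder, playing the role of a Writer: it formats the data
// and passes the encoded text on to the wrapped sink.
class SketchCsvWriter {
public:
    explicit SketchCsvWriter(SketchSink& sink) : sink_(sink) {}

    void putRecord(const std::vector<std::string>& fields) {
        for (std::size_t i = 0; i < fields.size(); ++i) {
            if (i > 0) sink_.put(",");
            sink_.put(quote(fields[i]));
        }
        sink_.put("\n");
    }

private:
    // Assumed quoting rule: quote fields containing a comma, quote or newline.
    static std::string quote(const std::string& field) {
        if (field.find_first_of(",\"\n") == std::string::npos) return field;
        std::string quoted = "\"";
        for (char c : field) {
            if (c == '"') quoted += '"';  // double any embedded quote
            quoted += c;
        }
        quoted += '"';
        return quoted;
    }

    SketchSink& sink_;
};

// Usage: the writer encodes, the sink only transfers characters.
void demoWrapping() {
    MemorySink sink;
    SketchCsvWriter writer(sink);
    writer.putRecord({"total", "4,2"});
    assert(sink.contents() == "total,\"4,2\"\n");
}
```

Keeping the encoder ignorant of where the characters end up is what allows the same format to be written to a file, the terminal or any other basic stream.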
Stream objects are first-class objects in the Simons Component Library. This means that they can be passed by reference and shared by more than one variable; or cloned such that two independent streams refer to the same logical filesystem. For this reason, a particular policy was devised to minimise ownership conflicts in the management of the underlying filesystem resources.
Streams can exist in either an open or closed state. When a stream is created, it is initially closed. Likewise, if a stream is cloned, the copy is initially closed, even if the original stream was open at the time. This policy is the opposite of standard C++, which opens streams upon construction and closes streams upon destruction. There were two reasons behind this design decision.
Firstly, there is a resource management issue. Every time a connection
is made to a filesystem, the operating system must allocate resources.
A program must explicitly open() a stream to make the connection
with the underlying filesystem. A program must then close()
a stream to release the underlying filesystem resources. However, this
action is also accomplished automatically when a stream object is released
and deleted. This ensures that all streams are exception-safe and
release their filesystem resources under both normal and abnormal
termination.
Secondly, there is a file sharing issue. After cloning, a new stream cannot automatically be used as an alias for reading or writing to the same filesystem, since the stream is closed. Attempting to do this will raise an exception. However, if the program explicitly opens the cloned copy, this is understood as a deliberate request to connect with the same filesystem. Under Unix, it is possible to open multiple handles on the same file, whether for reading or writing. File sharing can be prevented explicitly by turning on the FileStream locking protocol.
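The open and closed policy can be sketched in standard C++ as follows. This is an illustration under stated assumptions, not the SCL implementation: the class name LifecycleSketch, its isOpen() method and the file name are hypothetical, and the standard exceptions stand in for the library's own.

```cpp
#include <cassert>
#include <cstdio>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for an output stream; not the SCL implementation.
class LifecycleSketch {
public:
    explicit LifecycleSketch(std::string path) : path_(std::move(path)) {}

    // A copy keeps the path name but not the connection, so it starts closed.
    LifecycleSketch(const LifecycleSketch& other) : path_(other.path_) {}
    LifecycleSketch& operator=(const LifecycleSketch&) = delete;

    // Destruction releases the handle even if close() was never called.
    ~LifecycleSketch() { close(); }

    void open() {
        if (file_ != nullptr) return;  // already connected
        file_ = std::fopen(path_.c_str(), "w");
        if (file_ == nullptr) throw std::runtime_error("cannot open " + path_);
    }

    void close() {
        if (file_ != nullptr) {
            std::fclose(file_);
            file_ = nullptr;
        }
    }

    bool isOpen() const { return file_ != nullptr; }

    void put(const std::string& text) {
        if (file_ == nullptr) throw std::logic_error("stream is closed");
        std::fputs(text.c_str(), file_);
    }

private:
    std::string path_;
    std::FILE* file_ = nullptr;
};

// Usage: the original is open, its copy is not until opened explicitly.
void demoLifecycle() {
    LifecycleSketch original("lifecycle_sketch.txt");
    assert(!original.isOpen());  // created closed
    original.open();
    LifecycleSketch copy(original);
    assert(original.isOpen() && !copy.isOpen());
    copy.open();                 // deliberate request to share the file
    assert(copy.isOpen());
}                                // both handles are released here
```

Because the copy constructor never duplicates the handle, sharing a file always requires a visible open() call in the program text.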
All streams may connect either to the terminal, or to a filesystem. If a
default stream is created, without supplying a file path name, using
open() will connect the stream to the terminal (standard input,
or standard output):
OutputID out = new TextOutput;
out->open(); // connect to standard output
If a stream is created with a file path name, open() will
attempt to connect the stream to the named filesystem:
OutputID out = new TextOutput("textdocs/myfile.txt");
out->open(); // connect to an output file
A default stream may also be connected to a file using openOn(),
which first stores the new path name and then calls open()
internally to connect to the filesystem:
OutputID out = new TextOutput;
out->openOn("textdocs/myfile.txt"); // connect to an output file
A stream is disconnected using close(), after which
no data can be passed; attempting to do so will raise an exception.
If the stream is reopened with open(), it will connect to the
filesystem named by the existing stored path name. A stream can be
redirected to a different source or sink by reopening it using
openOn() with the new path name. If a null
path name is supplied, the stream will reconnect to the terminal.
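The connection rules above can be sketched with standard C++ streams. This is a hedged illustration rather than SCL code: RedirectSketch, isOpen() and onTerminal() are hypothetical names, an empty string stands in for the null path name, and only the output side is shown.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <iostream>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical output stream that targets the terminal or a file.
class RedirectSketch {
public:
    RedirectSketch() = default;  // no path name: terminal
    explicit RedirectSketch(std::string path) : path_(std::move(path)) {}

    // Connect to the stored path name, or to standard output if there is none.
    void open() {
        close();
        if (!path_.empty()) {
            file_.open(path_, std::ios::out | std::ios::trunc);
            if (!file_.is_open()) throw std::runtime_error("cannot open " + path_);
        }
        open_ = true;
    }

    // Store a new path name first, then connect as open() does.
    void openOn(const std::string& path) {
        path_ = path;
        open();
    }

    void close() {
        if (file_.is_open()) file_.close();
        open_ = false;
    }

    bool isOpen() const { return open_; }
    bool onTerminal() const { return open_ && path_.empty(); }

    RedirectSketch& put(const std::string& text) {
        if (!open_) throw std::logic_error("stream is closed");
        if (path_.empty()) std::cout << text; else file_ << text;
        return *this;
    }

private:
    std::string path_;
    std::ofstream file_;
    bool open_ = false;
};

// Usage: terminal, then a file, then back to the terminal.
void demoRedirect() {
    RedirectSketch out;
    out.open();
    assert(out.onTerminal());
    out.openOn("redirect_sketch.txt");
    assert(out.isOpen() && !out.onTerminal());
    out.put("hello");
    out.close();

    std::ifstream check("redirect_sketch.txt");
    std::string word;
    check >> word;
    assert(word == "hello");
    check.close();
    std::remove("redirect_sketch.txt");

    out.openOn("");  // empty path name models the null path name
    assert(out.onTerminal());
}
```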
Streams connect to external physical devices, so there is the possibility of device failure during use; or failure to connect if the path to the device was incorrectly specified; or refusal to connect if the device was already in use. Also, reading and writing may fail if an attempt is made to read or write data values of an unexpected type or format. Because of this, stream operations may raise a number of exceptions. Programs can choose to catch these exceptions and recover. Certain failures require closing and reopening the stream.
ReadFailure is raised if an input operation fails;
WriteFailure is raised if an output operation fails;
NotFound is raised if the connection to the filesystem cannot be made;
NoResponse is raised if an existing filesystem connection breaks during usage;
DeviceBusy is raised if an attempt is made to open a connection to a filesystem that is already open;
NoElements is raised if an attempt is made to read past the end of input.
Each of these exceptions provides a cause() method.
Programs should not rely on exceptions during normal use. For example,
rather than wait for a NoElements exception, a program should
test the input stream using empty() to detect whether the end
of an input stream has been reached. See the Stream class protocols for
methods that inspect the stream state.
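The difference between testing the stream state and relying on exceptions can be sketched as follows. The exception names are borrowed from the list above, but the class layout, the common base StreamFailure, the class InputSketch and its getInteger() method are all assumptions made for illustration; the SCL's real declarations are not shown in this document.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical exception classes named after those described in the text.
struct StreamFailure : std::runtime_error {
    using std::runtime_error::runtime_error;
};
struct NoElements : StreamFailure {
    using StreamFailure::StreamFailure;
};
struct ReadFailure : StreamFailure {
    using StreamFailure::StreamFailure;
};

// Hypothetical input stream reading integers from a string.
class InputSketch {
public:
    explicit InputSketch(const std::string& text) : in_(text) {}

    // True once only white space (or nothing) remains to be read.
    bool empty() {
        in_ >> std::ws;
        return in_.peek() == std::char_traits<char>::eof();
    }

    int getInteger() {
        if (empty()) throw NoElements("read past the end of input");
        int value = 0;
        if (!(in_ >> value)) {
            in_.clear();  // leave the stream usable after the failure
            throw ReadFailure("expected an Integer");
        }
        return value;
    }

private:
    std::istringstream in_;
};

// Preferred style: the loop ends by testing empty(), not by catching.
int sumAll(InputSketch& in) {
    int total = 0;
    while (!in.empty()) total += in.getInteger();
    return total;
}

void demoEmptyTest() {
    InputSketch in("3 4 5");
    assert(sumAll(in) == 12);
    assert(in.empty());
}
```

The exceptions remain available for the genuinely unexpected cases, such as input of the wrong type.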
The FileStream classes are those streams which connect directly to the
underlying filesystem. The descendants of FileStream are Input and
Output, which define the abstract protocols for reading and writing
values of all the basic types Boolean, Character, Natural, Integer and
Decimal; and for reading and writing Character[] array and String
objects. The concrete descendants of Input are TextInput, which reads data
in plain text format, and ByteInput, which reads data in binary format.
Likewise, the concrete descendants of Output are TextOutput, which writes
data in plain text format, and ByteOutput, which writes data in binary
format.
From the outset, it was decided that there was no compelling reason to distinguish the standard input and output streams, which connect with the keyboard and monitor, from the input and output file streams that connect to the underlying filesystem. No such distinction is maintained by the underlying operating system kernel, which also allows easy redirection of standard input and output from the terminal to the filesystem. In the Simons Component Library, every FileStream can choose to connect either to a standard stream, or to a filesystem. See the opening and closing protocols.
This contrasts with standard C++, in which the file streams are
kept conceptually distinct from the standard streams, and are only partly
compatible with them. Sometimes this is annoying: while the uni-directional
input and output file streams ifstream and ofstream
are respectively compatible with the standard streams istream and
ostream, the reverse is not the case. Neither is it possible
to relate the bi-directional file stream fstream with the
cognate standard stream iostream, because of the topology of
the class hierarchy.
From the outset, it was decided to make basic input and output as simple
as possible (much like C++) and for the stream construction and IO
programming idioms to be as symmetrical as possible (unlike Java).
All Input streams support the get() protocol to read basic values; and all
Output streams support the put() protocol to write basic values. In both
cases, the read or written value is the argument and the result of the
method is the stream, which allows operations to be chained together:
OutputID out = new TextOutput("data.txt");
out->open();
out->put("total")->put('=')->put(42)->line();
... // wrote Character[], Character, Integer
out->close();
StringID label; // null variable
Integer value;
InputID in = new TextInput("data.txt");
in->open();
in->get(label, '=')->skip('=')->get(value)->skip();
... // read String, skipped Character, read Integer
in->close();
The arguments to put() may be literal values, or variables
containing values. The arguments to get() must be variables,
which will have values placed in them if the reading operation succeeds.
The example shows how you can control how much text should go into a
String variable, up to a specified terminator character '=', which is then
explicitly skipped. This is more general than the C++ extraction operator,
which only reads strings up to the next white space. All reading methods
that require terminators will default to the newline character. Apart from
those methods that read strings, input methods skip leading white space.
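The chaining idiom works because every operation returns the stream itself. A minimal sketch over standard C++ streams is given below. FluentWriter and FluentReader are hypothetical names, and the behaviour of skip() without an argument (consume one newline, if present) is an assumption, since the document does not define it.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical writer: each put() returns the writer so calls can be chained.
class FluentWriter {
public:
    explicit FluentWriter(std::ostream& out) : out_(out) {}
    FluentWriter& put(const std::string& text) { out_ << text; return *this; }
    FluentWriter& put(char c) { out_ << c; return *this; }
    FluentWriter& put(int n) { out_ << n; return *this; }
    FluentWriter& line() { out_ << '\n'; return *this; }
private:
    std::ostream& out_;
};

// Hypothetical reader: values are returned through reference arguments.
class FluentReader {
public:
    explicit FluentReader(std::istream& in) : in_(in) {}

    // Read text up to, but not including, the terminator (default newline).
    FluentReader& get(std::string& text, char terminator = '\n') {
        text.clear();
        const int stop = std::char_traits<char>::to_int_type(terminator);
        while (in_.peek() != std::char_traits<char>::eof() && in_.peek() != stop) {
            text.push_back(static_cast<char>(in_.get()));
        }
        return *this;
    }

    // Read an integer, skipping leading white space.
    FluentReader& get(int& value) { in_ >> value; return *this; }

    // Consume one expected character if it is next (assumed behaviour).
    FluentReader& skip(char expected = '\n') {
        if (in_.peek() == std::char_traits<char>::to_int_type(expected)) in_.get();
        return *this;
    }

private:
    std::istream& in_;
};

// Usage mirroring the example above, with a string buffer in place of a file.
void demoChaining() {
    std::stringstream buffer;
    FluentWriter out(buffer);
    out.put("total").put('=').put(42).line();
    assert(buffer.str() == "total=42\n");

    std::string label;
    int value = 0;
    FluentReader in(buffer);
    in.get(label, '=').skip('=').get(value).skip();
    assert(label == "total");
    assert(value == 42);
}
```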
The open() and openOn() methods always reopen a
file from the beginning. For an Input stream, this means that data will be
read from the beginning of the file. For an Output stream, this means that
every time a file is opened, it is overwritten and the old contents are lost.
Output streams also support openLog() and openLogOn(),
which open a file at the end, ready for appending data to an existing file,
which is assumed to be a log file:
OutputID log = new TextOutput;
log->openLogOn("textdocs/mylog.txt");
... // append data to the log
If openLog() is invoked on a default Output stream,
this will connect to the standard error stream, rather than the standard
output stream. All Exception classes do this when they report a failure
using their report() method.
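The difference between opening from the beginning and opening at the end corresponds to the truncate and append modes of standard C++ file streams. The sketch below only illustrates that distinction; the helper names writeFresh, appendLog and readAll are hypothetical, and it is an assumption that the SCL maps onto these modes.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <sstream>
#include <string>

// Open at the beginning: any previous contents are lost.
void writeFresh(const std::string& path, const std::string& text) {
    std::ofstream out(path, std::ios::out | std::ios::trunc);
    out << text;
}

// Open at the end: previous contents are kept and new data is appended.
void appendLog(const std::string& path, const std::string& text) {
    std::ofstream out(path, std::ios::out | std::ios::app);
    out << text;
}

// Helper to read a whole file back for checking.
std::string readAll(const std::string& path) {
    std::ifstream in(path);
    std::ostringstream all;
    all << in.rdbuf();
    return all.str();
}

void demoLogging() {
    const std::string path = "log_sketch.txt";
    writeFresh(path, "first\n");
    appendLog(path, "second\n");
    assert(readAll(path) == "first\nsecond\n");  // append kept the old data
    writeFresh(path, "third\n");
    assert(readAll(path) == "third\n");          // reopening overwrote it
    std::remove(path.c_str());
}
```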
The operating system may be quite liberal in allowing file sharing. In
Unix, for example, multiple processes may freely read from the same file, but
only one process may gain write-access to the file at any time. While this
will protect a file from mutually-inconsistent updates by concurrent
processes, this will not prevent the same process from opening
multiple streams onto the same file. The SCL offers some protection against
accidental file sharing with the explicit open() and close() protocol.
However, if a stronger form of protection against file sharing is desired,
the FileStream locking protocol may be engaged. This is an advisory
protocol, meaning that all concurrent processes must agree to use it. Once
the protocol is turned on, every stream within that process can only connect
to a file if it has exclusive access to it.
The locking protocol is controlled by the static method
setLocking() in the class FileStream. It is turned on and off
by supplying Boolean arguments to the method. For example, to turn on
the locking protocol:
FileStream::setLocking(true);
...
OutputID out = new TextOutput("textdocs/myfile.txt");
...
// out has exclusive access to "textdocs/myfile.txt"
Any other stream that attempts to connect to the same file will raise a
DeviceBusy exception.
This applies whether the stream is in the same process or a different process
(using the file locking protocol),
and whether it wishes to read or write (there is only a single permission).
Engaging the file locking protocol will also prevent multiple streams
from connecting to standard input, or standard output.
The locking policy uses a portable technique, which creates a lockfile
for each opened file. The lockfile is created using atomic operating system
functions, so that any stream may detect whether a lock exists on a data
file, before opening it. The lockfiles are deleted when the stream with
exclusive access is closed.
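An advisory lockfile of the kind described can be sketched with the POSIX open() call, whose O_CREAT | O_EXCL flags make creation fail atomically if the file already exists. This is an assumption-laden illustration: the document does not say which system functions the SCL uses, and the ".lock" suffix and the function names below are hypothetical.

```cpp
#include <cassert>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Hypothetical naming rule; the SCL's real lockfile names are not documented here.
std::string lockNameFor(const std::string& dataPath) {
    return dataPath + ".lock";
}

// Try to take the lock. Creation with O_EXCL is atomic, so exactly one
// caller can succeed while the lockfile exists.
bool acquireLock(const std::string& dataPath) {
    const int fd = ::open(lockNameFor(dataPath).c_str(),
                          O_CREAT | O_EXCL | O_WRONLY, 0644);
    if (fd < 0) return false;  // typically EEXIST: someone else holds the lock
    ::close(fd);               // the file's existence is the lock
    return true;
}

// Release the lock by deleting the lockfile.
void releaseLock(const std::string& dataPath) {
    ::unlink(lockNameFor(dataPath).c_str());
}

void demoLocking() {
    const std::string data = "lock_sketch_data.txt";
    releaseLock(data);            // start from a clean state
    assert(acquireLock(data));    // first stream gains exclusive access
    assert(!acquireLock(data));   // second attempt is refused
    releaseLock(data);            // closing the stream deletes the lockfile
    assert(acquireLock(data));    // the file can now be locked again
    releaseLock(data);
}
```

Since the lock is only a file that cooperating programs agree to check, a process that ignores the convention can still open the data file, which is what makes the protocol advisory.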
The FormatStream classes are those streams which perform some kind of encoding and decoding of data. The descendants of FormatStream are Reader and Writer, which define the abstract protocols for reading and writing values of various Object-types. The concrete descendants of Reader include XMLReader, CGIReader and CSVReader. The concrete descendants of Writer are XMLWriter, CGIWriter and CSVWriter. These perform different kinds of object encoding and decoding. Some of these specialisations are still under development.
Reading and writing object data requires a more sophisticated strategy
than simply printing out the fields of each object. This is because an
object may in general be the root of a graph of objects; and this graph
may contain cycles. The strategy when writing out a general object graph
is called serialisation. This writes out a serial representation
of the object graph (without cycles); and, when reading, reconstructs an
isomorphic graph from the serialised data. An object and all of its
connected dependents may be serialised by the put() method
of XMLWriter; and an isomorphic (equal) structure may be reconstructed by
the get() method of XMLReader. The latter must be supplied
with a null variable of a suitable type, in which the root
object of the graph will be returned.
VectorID vec1 = new Vector;
... // fill the Vector
WriterID out = new XMLWriter("Vector.xml");
out->open();
out->put(vec1); // serialise Vector
out->close();
VectorID vec2; // null variable
ReaderID in = new XMLReader("Vector.xml");
in->open();
in->get(vec2); // reconstruct Vector
in->close();
The native format for serialising object data is in XML. This was chosen over other proprietary formats, since it offered the prospect of portable exchange of object data. A particular style of XML encoding was chosen, to store the persistent attribute data that would allow instances to be quickly reconstructed. For example, a Vector storing a String, an Integer and the identical String at indices 0, 1, and 2 is written out as:
<?xml version="1.0" encoding="ISO-8859-1"?>
<Vector field="root" id="523368" size="3">
<String field="0" id="523400" size="5">total</String>
<Integer field="1">42</Integer>
<String field="2" id="523400"/>
</Vector>
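In the XML above, the second occurrence of the shared String is written as an empty element carrying only the id of the first. One way to obtain this effect is to remember each object already written and emit a back-reference when it is met again. The sketch below illustrates that idea only: the names Str and serialise are hypothetical, ids are small counters rather than the values shown above, and no XML escaping is performed.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical string object that may be shared between container slots.
struct Str {
    std::string text;
};
using StrPtr = std::shared_ptr<Str>;

// Write a vector of possibly shared strings. An object is written in full
// the first time it is met and as an id-only element thereafter.
std::string serialise(const std::vector<StrPtr>& items) {
    std::map<const Str*, int> ids;  // objects already written, with their ids
    std::ostringstream xml;
    xml << "<Vector field=\"root\" size=\"" << items.size() << "\">\n";
    for (std::size_t i = 0; i < items.size(); ++i) {
        const Str* object = items[i].get();
        const auto found = ids.find(object);
        if (found == ids.end()) {
            const int id = static_cast<int>(ids.size()) + 1;
            ids[object] = id;
            xml << "<String field=\"" << i << "\" id=\"" << id
                << "\" size=\"" << object->text.size() << "\">"
                << object->text << "</String>\n";  // no escaping in this sketch
        } else {
            xml << "<String field=\"" << i << "\" id=\"" << found->second
                << "\"/>\n";                       // back-reference only
        }
    }
    xml << "</Vector>\n";
    return xml.str();
}

void demoSharedReference() {
    const StrPtr shared = std::make_shared<Str>(Str{"total"});
    const StrPtr equalButDistinct = std::make_shared<Str>(Str{"total"});
    const std::string xml = serialise({shared, equalButDistinct, shared});
    // The identical object is referenced by id; the equal one is written out.
    assert(xml.find("<String field=\"2\" id=\"1\"/>") != std::string::npos);
    assert(xml.find("<String field=\"1\" id=\"2\" size=\"5\">total</String>")
           != std::string::npos);
}
```

A reader can rebuild the same sharing by keeping the inverse table, from id to reconstructed object, which is also what allows cyclic graphs to be restored.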
Some information will eventually appear here about the use of the classes CGIReader and CGIWriter.
Some information will eventually appear here about the use of the classes CSVReader and CSVWriter.
This documentation was created by Dr Anthony J H Simons, Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello Street, Sheffield S1 4DP, United Kingdom.