This package contains the JAST 1.1 Custom AST Toolkit, © Anthony J H Simons, 2010-2015. This software is currently on experimental alpha release, and is offered as-is, under a free experimental license (see full terms below).

org.jast.ast contains tools for mapping XML files to user-defined Java syntax trees, and vice-versa.
org.jast.xml contains tools for mapping XML files to JAST's standard XML memory model, and vice-versa.
org.jast.xpath contains an XPath search engine for use with the standard XML model.
org.jast.dtd contains a document validation engine for use with the standard XML model.
org.jast.filter contains filters for searching and validating the standard XML model.

This alpha-release software is free to use by academic and commercial users. The terms of the license are that you are free to use the software in any product (whether free or commercial), provided that any usage is acknowledged by citing "© Anthony J H Simons" as the copyright holder and referring to the JAST website "http://staffwww.dcs.shef.ac.uk/people/A.Simons/jast/" as the source. While this alpha license is perpetual and not subject to any restriction, we reserve the right to change the licensing terms of subsequent releases. The software is offered as-is, without any implied warranty for fitness of purpose. Please refer to the JAST website for further details.
The following assumes that you, the developer, wish to build a third-party
application, which incorporates the JAST 1.1 custom AST
processing tools. The components for mapping between XML files and arbitrary
user-defined Java abstract syntax trees are to be found in this package,
org.jast.ast. There is support both for unmarshalling XML files
into abstract syntax trees, and for marshalling abstract syntax trees back to
XML files. What you do with your Java memory-model is up to you, and depends
on the methods of the various AST classes that constitute the nodes of your
abstract syntax tree.
The first thing you will need to do is decide what kind of data you wish to model. Having done this, you will develop an XML markup scheme, using a mixture of XML elements and attributes to describe and encode the data. For example, a catalogue that stores information about films and TV shows might look like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <Catalogue>
      <Film year="1976" rating="PG">
        <Title>Star Wars</Title>
        <Director>George Lucas</Director>
      </Film>
      <TVShow year="1965">
        <Title>Thunderbirds</Title>
        <Director>Gerry Anderson</Director>
      </TVShow>
      <Film year="2007" rating="15">
        <Title>Transformers</Title>
        <Director>Michael Bay</Director>
      </Film>
    </Catalogue>

So, the main XML nodes are called Catalogue, Film, TVShow, Title and
Director; and the attributes year and rating are used in some nodes.
Nodes like Title and Director are also known as leaf-nodes, because they
are terminal nodes containing no further descendants, but only textual data
(and possibly attributes). Other nodes are known as branch-nodes and contain
descendants; in particular, one branch-node, Catalogue, is the root-node for
the whole tree.
Once you have a stable XML model, you can consider developing the Java AST
model. The basic notion is that, for each differently-named XML element, you
will provide a Java class with the same name that stores the information held
by this element. So, an XML element named Film will be mapped to a Java
class of the same name. Since XML permits more liberal identifiers than
Java, some XML names must be normalised; for example, the XML name
cat:TV-show will be mapped to a Java class TVShow.
The normalising algorithm removes namespace prefixes and all internal
punctuation, capitalising the letter following each removed punctuation mark,
on the assumption that this occurred at a word boundary. The resulting
concatenated Java type identifier is in capital case.
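The normalising rule described above can be sketched in a few lines of Java. This is only an illustration of the rule as stated here, not JAST's own implementation: the class name NameNormaliser and its normalise method are hypothetical, and the handling of edge cases (such as names that begin with a digit) is not covered.

```java
// Hypothetical helper: a sketch of the normalisation rule described above,
// not the code that JAST itself uses.
final class NameNormaliser {

    static String normalise(String xmlName) {
        // Drop any namespace prefix, e.g. "cat:" in "cat:TV-show".
        int colon = xmlName.lastIndexOf(':');
        String local = (colon >= 0) ? xmlName.substring(colon + 1) : xmlName;

        StringBuilder result = new StringBuilder();
        boolean capitaliseNext = true;  // capital case: the first letter is upper case
        for (char c : local.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                result.append(capitaliseNext ? Character.toUpperCase(c) : c);
                capitaliseNext = false;
            } else {
                // Punctuation is removed and treated as a word boundary.
                capitaliseNext = true;
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalise("cat:TV-show"));  // prints TVShow
    }
}
```

Under these assumptions, a lower-case element name such as film would also map to the class name Film.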
The mapping between Java and XML is automatic, controlled by reflection
and certain coding conventions. The AST node classes are written in a
style similar to that used with Java Beans, where abstract properties are
given standard get- and set-methods, by which the properties may be
manipulated through reflection. Some aspects of the mapping between Java
and XML may be customised by the end-user.
Initially, the AST node designer must determine what information each node should store. Each AST class may choose to store in its fields information that corresponds to XML attributes, or dependent XML nodes, or even plain textual data. The fields of the AST class must be declared in the order in which they expect to be serialised in the XML, for example:

    public class Film {
        // fields storing attribute properties
        private int year;
        private String rating;
        // fields storing dependent elements
        private Title title;
        private Director director;
        ...
    }

indicates that the Film class will map two fields, year and rating, to XML
attributes, and two fields, title and director, to dependent XML elements,
which will be serialised in the given order of declaration. At this stage,
almost any field name can be used (see below); and the stored value may be
of a simple type (typically, for property fields) or a class type
(typically, for dependent fields), or a collection type (a List, Set or
Map) for storing collections of dependent elements.
One field name, content, is reserved to store any terminal information
content. This is often a text string, but could be any other simple or
class type, as desired. In our example, leaf-node classes like Director
that contain terminal textual information will indicate this by supplying
a content field:

    public class Director {
        // field storing textual information
        private String content;
        ...
    }

It would also be possible for Director to inherit from a class that
declares the content field. JAST is able to treat
each declared field as either a property, a dependent node, or terminal
content, based on the public interface supplied to access the field. It is
also possible for AST node classes to declare other fields, which are
accessed internally by the class, and will not be serialised as XML.
Once you have decided how to map elements, attributes and textual data,
you can create the standard construction API for the class. This is used
by the JAST tools to unmarshal an object from the XML input file.
Each class must provide a default constructor; standard add-methods for
adding dependent nodes; and standard set-methods for setting attribute
properties or textual information. For example, the Film class might
provide the following API:

    public class Film {
        private int year;
        private String rating;
        private Title title;
        private Director director;

        // default constructor
        public Film() {}

        // methods to add dependents
        public Film addTitle(Title title) {
            this.title = title;
            return this;
        }
        public Film addDirector(Director director) {
            this.director = director;
            return this;
        }

        // methods to set properties
        public Film setYear(int year) {
            this.year = year;
            return this;
        }
        public Film setRating(String rating) {
            this.rating = rating;
            return this;
        }
    }

Notice how all methods have conventional names that are derived from the
types of node that you are adding (dependent fields) or the names of the
properties you are setting (property fields). These conventional names are
discovered automatically in your AST node classes by JAST, through
reflection. All add-methods accept an argument of the node-type. All
set-methods accept an argument of the field-type. Their result-type is
arbitrary, and all these methods could simply return void. However, in this
example, all methods in the construction API have been designed to return
this, the current object, since this will allow AST nodes to support
sequences of nested calls, when building a syntax tree.
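As an illustration of why returning this is convenient, the following sketch builds one Film node by hand in a single expression. The node classes are cut-down stand-ins for the ones described above; the Title class with a content field, the get-methods and the ChainedBuildDemo class are assumptions added only so that the example is self-contained.

```java
// Cut-down stand-ins for the AST node classes described above.
// They are package-private only so that the sketch fits in one source file.
class Title {
    private String content;
    public Title setContent(String text) { this.content = text; return this; }
    public String getContent() { return content; }
}

class Director {
    private String content;
    public Director setContent(String name) { this.content = name; return this; }
    public String getContent() { return content; }
}

class Film {
    private int year;
    private String rating;
    private Title title;
    private Director director;

    public Film addTitle(Title title) { this.title = title; return this; }
    public Film addDirector(Director director) { this.director = director; return this; }
    public Film setYear(int year) { this.year = year; return this; }
    public Film setRating(String rating) { this.rating = rating; return this; }

    public Title getTitle() { return title; }
    public Director getDirector() { return director; }
    public int getYear() { return year; }
    public String getRating() { return rating; }
}

// Hypothetical demo class: builds one node using a sequence of calls.
final class ChainedBuildDemo {
    static Film starWars() {
        // Each call returns the receiver, so the next call can follow directly.
        return new Film()
                .setYear(1976)
                .setRating("PG")
                .addTitle(new Title().setContent("Star Wars"))
                .addDirector(new Director().setContent("George Lucas"));
    }

    public static void main(String[] args) {
        Film film = starWars();
        System.out.println(film.getTitle().getContent() + " (" + film.getYear() + ")");
    }
}
```

Had the methods returned void, the same tree would need a local variable and one statement per call.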
Similarly, leaf-node classes must supply a default constructor, and a
set-method to set the content field. This field may be of any primitive or
String type, according to the information stored; and the set-method must
accept an argument of the same type. For example, in the Director class:

    public class Director {
        private String content;

        // default constructor
        public Director() {}

        // method to store textual information
        public Director setContent(String name) {
            this.content = name;
            return this;
        }
    }

the method setContent(String) is determined by reflection, based on the
discovery of a field String content. The types of all set-methods are
determined in this way.
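The kind of lookup described above can be approximated with the standard java.lang.reflect API. The sketch below is an assumption about how such a discovery might work, not JAST's actual code: the SetterDiscovery class and its methods are hypothetical, and only the Director class follows the conventions given in this document.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

// Leaf-node class following the conventions described above.
class Director {
    private String content;
    public Director() {}
    public Director setContent(String name) { this.content = name; return this; }
    public String getContent() { return content; }
}

// Hypothetical helper: derives a set-method from a declared field.
final class SetterDiscovery {

    // For a field "content" of type String, looks up setContent(String).
    static Method findSetter(Class<?> nodeClass, String fieldName) {
        try {
            Field field = nodeClass.getDeclaredField(fieldName);
            String methodName = "set"
                    + Character.toUpperCase(fieldName.charAt(0))
                    + fieldName.substring(1);
            // The parameter type is taken from the field's declared type.
            return nodeClass.getMethod(methodName, field.getType());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No set-method for field " + fieldName, e);
        }
    }

    // Creates a node through its default constructor and sets its content.
    static Director build(String text) {
        try {
            Director node = Director.class.getDeclaredConstructor().newInstance();
            findSetter(Director.class, "content").invoke(node, text);
            return node;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not construct Director", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("George Lucas").getContent());
    }
}
```

This also shows why the default constructor is required: the node is created before any of its data is known.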
Next, you should provide access methods for your AST classes. These are
also used by the JAST tools to marshal every object back to XML.
When marshalling an AST object, the JAST tools expect every serialised field
to be provided with a suitable get-method, that returns a value of the same
type as the field. Note that methods should return strongly-typed results.
For example, the Film class should provide the following API:

    public class Film {
        private int year;
        private String rating;
        private Title title;
        private Director director;

        ... // everything else as above

        // methods to access dependent nodes
        public Director getDirector() { return director; }
        public Title getTitle() { return title; }

        // methods to access properties
        public int getYear() { return year; }
        public String getRating() { return rating; }
    }

Notice how getDirector() returns an instance of Director as you would
expect, in strongly-typed Java. The other methods may return primitive
types or object types, according to how they were stored as fields in the
Film object. The same applies when accessing the content-field in an AST
class. For example, the Director class should provide the following:

    public class Director {
        private String content;

        ... // everything else as above

        // method to access textual information
        public String getContent() { return content; }
    }

to return the information stored in content. The idea is that, whenever
you access your data, you get back the obvious strongly-typed Java objects
that you can use directly in your Java programs. In some AST nodes, the
content could be stored as an integer, or some other type.
Apart from this, there are very few restrictions. A property- or
content-field should store a single value, which may be converted to a
String. A dependent-field may either store a single
dependent object, or a list of dependents (see below), which will then be
marshalled recursively as XML elements.
Sometimes, different AST classes may end up looking quite similar, and
it would be a chore to have to repeat similar coding for several classes.
For example, the Film and TVShow classes overlap considerably, in terms of
their dependent- and property-fields, and their associated get- and
set-methods. Fortunately, you may arrange your AST classes in a hierarchy,
according to their similarities. The following Show class is intended as
the abstract superclass of both Film and TVShow:

    public abstract class Show {
        private int year;
        private String rating;
        private Title title;
        private Director director;

        // default constructor
        public Show() { ... }

        // methods to add dependents
        public Show addTitle(Title title) { ... }
        public Show addDirector(Director director) { ... }

        // methods to access dependents
        public Title getTitle() { ... }
        public Director getDirector() { ... }

        // methods to set properties
        public Show setYear(int year) { ... }
        public Show setRating(String rating) { ... }

        // methods to access properties
        public int getYear() { ... }
        public String getRating() { ... }
    }

All the common fields and methods needed are defined in one place. Now, it
is very easy to define the AST classes for Film and TVShow as subclasses of
Show, using Java inheritance, and inherit all fields and construction
methods from the superclass:

    public class Film extends Show {
        public Film() {}  // only needs a default constructor
    }

    public class TVShow extends Show {
        public TVShow() {}  // only needs a default constructor
    }

Furthermore, the JAST tools make it easy to manipulate polymorphic lists
of AST nodes having heterogeneous types. For example, let us assume that, in
the root node Catalogue, we do not care to distinguish between the action
of adding a Film and that of adding a TVShow. Instead, we are only
interested in adding polymorphic Show objects. Accordingly, we can design
the construction API for Catalogue in the following way:

    public class Catalogue {
        // field to store heterogeneous dependents
        private List<Show> shows;

        // default constructor creates the list field
        public Catalogue() {
            shows = new ArrayList<Show>();
        }

        // methods required to add/access dependents
        public Catalogue addShow(Show show) {
            shows.add(show);
            return this;
        }
        public List<Show> getShows() {
            return shows;
        }

        // other user-methods added for convenience
        public Show getShow(int i) throws IndexOutOfBoundsException {
            return shows.get(i);
        }
        public List<Film> getFilms() {
            ... // return the subset of Shows that are Films
        }
        ...
    }

Two things have happened here. Firstly, rather than providing Catalogue
with separate add-methods addFilm(Film) and addTVShow(TVShow), we have
decided that a Catalogue need not distinguish the two, and have simply
provided addShow(Show) that accepts a polymorphic Show argument. The JAST
reflection tools will automatically discover this more general method, if
you don't supply the more specific methods.
Secondly, the get-method getShows() will now be used to access the
heterogeneous list of films and TV shows. This method will be detected
automatically, by reflecting the name of the field.
Notice how, in contrast to earlier examples, this dependent-field's
get-method returns a list of objects. These will be marshalled in the same
order that they were added to the list, as XML elements of mixed kinds.
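One plausible way to find the more general add-method is to try the node's own class first and then walk up its superclass chain. The sketch below illustrates that idea with the standard reflection API; it is an assumption about the search strategy, not a description of JAST's internals, and the AddMethodLookup class is hypothetical. The node classes are reduced to empty stand-ins.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Empty stand-ins for the node classes described above.
abstract class Show {}
class Film extends Show {}
class TVShow extends Show {}

class Catalogue {
    private final List<Show> shows = new ArrayList<Show>();
    public Catalogue addShow(Show show) { shows.add(show); return this; }
    public List<Show> getShows() { return shows; }
}

// Hypothetical helper: looks for addFilm(Film), then falls back to addShow(Show).
final class AddMethodLookup {

    static Method findAddMethod(Class<?> owner, Class<?> nodeType) {
        for (Class<?> type = nodeType;
                type != null && type != Object.class;
                type = type.getSuperclass()) {
            try {
                return owner.getMethod("add" + type.getSimpleName(), type);
            } catch (NoSuchMethodException e) {
                // No add-method for this type; try its superclass next.
            }
        }
        throw new IllegalStateException("No add-method in "
                + owner.getSimpleName() + " for " + nodeType.getSimpleName());
    }

    // Adds a mixed sequence of nodes through whichever method is found.
    static Catalogue sample() {
        Catalogue catalogue = new Catalogue();
        try {
            for (Show show : new Show[] { new Film(), new TVShow(), new Film() }) {
                findAddMethod(Catalogue.class, show.getClass()).invoke(catalogue, show);
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        return catalogue;
    }

    public static void main(String[] args) {
        System.out.println(findAddMethod(Catalogue.class, Film.class).getName());  // prints addShow
    }
}
```

Because the specific type is tried first, a Catalogue that did declare addFilm(Film) would have that method chosen in preference to addShow(Show) under this strategy.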
Although storing dependent nodes in a List is the most common case, it
is also possible to store them in a Set or a Map. In the case of a Map,
the dependent node should be stored as a value in the Map, indexed against
some key attribute accessed from the stored node. The JAST tools will seek
to discover a suitable add-method for the type of node
stored in any collection-typed field, and from this will determine that
the collection can be serialised as XML. Note that if unordered Set or
Map implementations are chosen, the order of saved nodes may not be
stable.
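The following sketch shows one way that a Map-typed field might be filled by an add-method, using a LinkedHashMap so that iteration follows insertion order. Several details are assumptions made for the example: the choice of the title as the key, the simplified Show class holding its title as a plain String, the Map return type of getShows(), and the MapCatalogueDemo class. How JAST itself reads back a Map-typed field is not specified here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified node: the title is held as a plain String for brevity.
class Show {
    private String title;
    public Show setTitle(String title) { this.title = title; return this; }
    public String getTitle() { return title; }
}

class Catalogue {
    // LinkedHashMap iterates in insertion order, unlike HashMap.
    private final Map<String, Show> shows = new LinkedHashMap<String, Show>();

    // The node is stored as a value, indexed by a key read from the node itself.
    public Catalogue addShow(Show show) {
        shows.put(show.getTitle(), show);
        return this;
    }

    public Map<String, Show> getShows() { return shows; }
}

// Hypothetical demo class.
final class MapCatalogueDemo {
    static List<String> titlesInOrder() {
        Catalogue catalogue = new Catalogue()
                .addShow(new Show().setTitle("Star Wars"))
                .addShow(new Show().setTitle("Thunderbirds"))
                .addShow(new Show().setTitle("Transformers"));
        return new ArrayList<String>(catalogue.getShows().keySet());
    }

    public static void main(String[] args) {
        System.out.println(titlesInOrder());
    }
}
```

Swapping in a HashMap would still store every node, but the order in which they are visited would no longer be guaranteed.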
You may design arbitrary Java classes to represent the nodes of the
abstract syntax tree, so long as they provide the construction and access
API described above. It is normally expected that these nodes will be used
to model only the XML elements, and that the XML attributes will be modelled
using other simple Java types, such as String, int or double. However, you
may also design an AST class that can be used as the type of a basic
property. Assume that we now wish for the Title class to be used as a
property of Film, rather than as a dependent:

    public class Title {
        private String text;

        // constructor building this from a String
        public Title(String text) {
            this.text = text;
        }

        // method converting this back to a String
        public String toString() {
            return text;
        }
    }

The class must provide a constructor that accepts a String argument, and
also must provide a toString method that converts an object of this type
back to a String. The JAST reflection tools assume that if a property-type
is not one of the basic types, it must supply a String constructor.
Likewise, all property values must be convertible to a String. Internally,
the Title class may store the data in any way it wishes.
Naturally, if you wish to represent a Title as a property, then the owning
class must provide a suitable set-method (rather than the add-method for a
dependent):

    public class Film extends Show {
        ...
        public Film setTitle(Title title) { ... }
    }

This Film class now expects to find the XML information about a title in an
attribute called title, rather than in a dependent element, but will use
the Title class to model the property.
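The round trip between attribute text and a property object can be sketched with the standard reflection API. This is an illustration of the convention described above (a String constructor in one direction, toString in the other), not JAST's own code; the PropertyConversion class and its method names are hypothetical.

```java
import java.lang.reflect.Constructor;

// Property class following the convention described above.
class Title {
    private final String text;
    public Title(String text) { this.text = text; }
    @Override public String toString() { return text; }
}

// Hypothetical helper: converts between attribute text and property values.
final class PropertyConversion {

    // Attribute text to property value, through the type's String constructor.
    static <T> T fromAttribute(Class<T> type, String value) {
        try {
            Constructor<T> constructor = type.getConstructor(String.class);
            return constructor.newInstance(value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                    "No usable String constructor in " + type.getName(), e);
        }
    }

    // Property value back to attribute text, through toString().
    static String toAttribute(Object property) {
        return String.valueOf(property);
    }

    public static void main(String[] args) {
        Title title = fromAttribute(Title.class, "Star Wars");
        System.out.println(toAttribute(title));  // prints Star Wars
    }
}
```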
The main class to use for unmarshalling an XML file is ASTReader. This can
be used to read the XML File using either the default, or a specified,
character set. The result returned is always an instance of your root
class, here an instance of Catalogue, but the reader only knows that it has
the base type Object, the ancestor of all your AST nodes. You may downcast
the result to your chosen type. The default reader is used like this:

    File xmlFile = new File("my/xml/input.xml");  // Or whatever file
    ASTReader reader = new ASTReader(xmlFile);
    Catalogue root = (Catalogue) reader.readDocument();
    reader.close();

This reads the file using the UTF-8 character set and discards all XML
metadata and any extra formatting whitespace surrounding nodes. An
alternative character set may be specified:

    File xmlFile = new File("my/xml/input.xml");
    ASTReader reader = new ASTReader(xmlFile, "ISO-8859-1");  // Latin-1
    Catalogue root = (Catalogue) reader.readDocument();
    reader.close();

in which case the reader verifies that the XML file is also encoded in the
named character set, before proceeding.
Further constructors of ASTReader support unmarshalling XML from an
endpoint specified by a public URL, or unmarshalling from data supplied via
a basic InputStream. The following gives an example of the former:

    URL url = new URL("http://www.my.site/input.xml");  // Any URL
    ASTReader reader = new ASTReader(url, "UTF-8");
    Catalogue root = (Catalogue) reader.readDocument();
    reader.close();

This constructor always requires both the URL and the character encoding,
since it is unsafe to assume UTF-8; the HTTP/1.1 standard specifies using
ISO-8859-1 by default, if no character encoding is known. Similarly, the
character encoding should be supplied when unmarshalling from a basic
InputStream.
All of the above assume that your Java AST node classes are defined
in the default top-level package (the unnamed package, the working directory).
If you place your own AST classes in a named package, you will need to tell
the ASTReader where to find them:

    File xmlFile = new File("my/xml/input.xml");
    ASTReader reader = new ASTReader(xmlFile);
    reader.usePackage("org.my.catalogue");  // Where your nodes are
    Catalogue root = (Catalogue) reader.readDocument();
    reader.close();

The reader will now understand that your classes have fully qualified names
like org.my.catalogue.Film. In all of these examples, we have assumed that
the returned root node is of the type Catalogue, but you should replace
this by whatever root node type you have in your AST.
The main class to use for marshalling an AST in memory is ASTWriter. This
can be used to write the whole abstract syntax tree to an XML File using
either the default, or a specified, character set. The mapping from Java to
XML identifiers is handled as follows: if the same types were previously
unmarshalled in the current session, the original XML identifiers will be
recovered and used during marshalling; otherwise the simple class names
will be used as XML identifiers. The default writer:

    File xmlFile = new File("my/xml/output.xml");  // Or whatever file
    ASTWriter writer = new ASTWriter(xmlFile);
    writer.writeDocument(root);  // Created previously
    writer.close();

writes the file using XML version 1.0 and the UTF-8 character set; and it
pretty-prints the XML file according to a standard layout. An alternative
character set may be specified:

    File xmlFile = new File("my/xml/output.xml");
    ASTWriter writer = new ASTWriter(xmlFile, "ISO-8859-1");  // Latin-1
    writer.writeDocument(root);  // Created previously
    writer.close();

in which case the writer uses this character set and ensures that the XML
document declares the same character set encoding.
Further constructors of ASTWriter support marshalling to a PrintWriter, or
to a basic OutputStream. The former is useful when the marshalled XML is to
be written by a Java Servlet, using an HttpServletResponse response object,
as defined in the Java package javax.servlet.http, to provide the output
stream and character encoding:

    response.setCharacterEncoding("UTF-8");  // UTF-8
    ASTWriter writer = new ASTWriter(response.getWriter(),
            response.getCharacterEncoding());
    writer.writeDocument(root);  // Created previously
    writer.close();

in which case the same character encoding will be declared in the XML
document as that used by the output stream. The response object's character
encoding may be reset to any character set, but this must be done before
accessing the response's writer stream. General marshalling to a basic
OutputStream is also supported.
While the above facilities will be adequate for all examples where the
XML elements have the same names as the Java classes, if you wish to have
variable mapping from elements to different Java packages, or mappings
from Java packages to different XML namespaces, then you will need to
use Metadata, a class storing all of these mappings. It is possible to
access the metadata saved during unmarshalling:

    File xmlFile = new File("my/xml/input.xml");
    ASTReader reader = new ASTReader(xmlFile);
    reader.usePackage("org.my.catalogue");
    Catalogue root = (Catalogue) reader.readDocument();
    Metadata metadata = reader.getMetadata();  // Save this for later
    reader.close();

The Metadata object stores all the mappings from Java class names to XML
element names, and all the XML metadata that does not form part of the AST
model. To reuse the same mappings (and XML file properties) when writing an
AST model back to XML, you need to tell the writer to use the same
metadata:

    File xmlFile = new File("my/xml/output.xml");
    ASTWriter writer = new ASTWriter(xmlFile);
    writer.setMetadata(metadata);  // Use the same metadata
    writer.writeDocument(root);
    writer.close();

This will ensure that, if an XML element named TV-show is mapped to a Java
class org.my.catalogue.TVShow, then this class will be mapped back to
TV-show when it is written. It will also ensure that the output XML file
will have the same XML version and encoding, etc., as used in the input
file. If the metadata is not transferred, then the output XML file will use
default settings and the element will be serialised as TVShow, the simple
name of the Java class.
It is possible to ask JAST to use several different Java packages to
unmarshal XML elements coming from different XML namespaces. This is done
using the usePackage() method:

    File xmlFile = new File("my/xml/input.xml");
    ASTReader reader = new ASTReader(xmlFile);
    reader.usePackage("org.my.home");
    reader.usePackage("org.my.catalogue", "xmlns:cat");
    Catalogue root = (Catalogue) reader.readDocument();
    reader.close();

This tells the reader to use classes in the package org.my.home for XML
elements in the default namespace, and to use classes in the package
org.my.catalogue for XML elements in the xmlns:cat namespace. Any XML
element having the prefix cat, such as cat:catalogue or cat:film, would
then be mapped to classes from this package. You may also specify these
mappings in an ASTWriter, to ensure that the classes from different
packages are marshalled back to XML elements in the desired namespaces.
Apart from this, it is possible to access Metadata directly,
using its own API. This allows you to set XML file properties, specify XML
namespaces and their base URIs, packages and their mapped XML namespaces,
and Java class identifiers and their mapped XML identifiers. If no URI
is specified for a namespace, the Java package ID is used instead. This
level of detail is only required if you generate an AST in memory, and want
the XML output to correspond to some Java to XML mapping that was not
previously read.
Notification of Exceptions
ASTReader and ASTWriter may raise various kinds of IOException, if a
problem occurs with the underlying file system. Ill-formed XML syntax is
reported through ASTError, whereas the inability to construct or manipulate
an AST node class is reported through NodeError. This covers a variety of
errors, including missing constructors, missing methods, or failing
methods. In summary, faulty user code may raise the following:
FileNotFoundException - raised if the specified file cannot be found (wrong pathname given)
UnsupportedEncodingException - raised if the character set encodings are inconsistent
IOException - raised if a fault in the filesystem occurs while reading an XML input file
ASTError - raised if a syntax error is detected while parsing an XML input file
NodeError - raised if any required construction or access method is not found, or fails
The latter two, ASTError and NodeError, are styled as errors, rather than
exceptions, since the W3C standard requires malformed XML to be rejected
outright, and not handled by exception-tolerant software.