This package contains the JAST 1.1 Custom AST Toolkit, © Anthony J H Simons, 2010-2015. This software is currently on experimental alpha release, and is offered as-is, under a free experimental license (see full terms below).

Java Abstract Syntax Trees, v1.1

If you wish to use this software, please refer to the brief instructions immediately below, and also to the documentation on the JAST website for more details:

Licensing Terms

This alpha-release software is free to use by academic and commercial users. The terms of the license are that you are free to use the software in any product (whether free or commercial), provided that any usage is acknowledged by citing "©Anthony J H Simons" as the copyright holder and referring to the JAST website "http://staffwww.dcs.shef.ac.uk/people/A.Simons/jast/" as the source. While this alpha license is perpetual and not subject to any restriction, we reserve the right to change the licensing terms of subsequent releases. The software is offered as-is, without any implied warranty for fitness of purpose. Please refer to the JAST website for further details:

The Custom AST Toolkit

The following assumes that you, the developer, wish to build a third-party application, which incorporates the JAST 1.1 custom AST processing tools. The components for mapping between XML files and arbitrary user-defined Java abstract syntax trees are to be found in this package org.jast.ast. There is support both for unmarshalling XML files into abstract syntax trees, and for marshalling abstract syntax trees back to XML files. What you do with your Java memory-model is up to you, and depends on the methods of the various AST classes that constitute the nodes of your abstract syntax tree.

Designing an XML Data Model

The first thing you will need to do is decide what kind of data you wish to model. Having done this, you will develop an XML markup scheme, using a mixture of XML elements and attributes to describe and encode the data. For example, a catalogue that stores information about films and TV shows might look like this:

	<?xml version="1.0" encoding="UTF-8"?>
	<Catalogue>
	  <Film year="1976" rating="PG">
	    <Title>Star Wars</Title>
	    <Director>George Lucas</Director>
	  </Film>
	  <TVShow year="1965">
	    <Title>Thunderbirds</Title>
	    <Director>Gerry Anderson</Director>
	  </TVShow>
	  <Film year="2007" rating="15">
	    <Title>Transformers</Title>
	    <Director>Michael Bay</Director>
	  </Film>
	</Catalogue>
So, the main XML nodes are called Catalogue, Film, TVShow, Title and Director; and the attributes year and rating are used in some nodes. Nodes like Title and Director are also known as leaf-nodes, because they are terminal nodes containing no further descendants, but only textual data (and possibly attributes). Other nodes are known as branch-nodes and contain descendants; in particular, one branch-node Catalogue is the root-node for the whole tree.
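The distinction between branch-nodes and leaf-nodes can be checked mechanically. The following is a minimal sketch that uses only the JDK's standard DOM parser (not JAST) on a shortened copy of the catalogue; the class name CatalogueShape and its methods are hypothetical, introduced purely for illustration:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class CatalogueShape {
    // A shortened copy of the catalogue above, written without layout whitespace.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<Catalogue>"
        + "<Film year=\"1976\" rating=\"PG\">"
        + "<Title>Star Wars</Title><Director>George Lucas</Director>"
        + "</Film>"
        + "<TVShow year=\"1965\">"
        + "<Title>Thunderbirds</Title><Director>Gerry Anderson</Director>"
        + "</TVShow>"
        + "</Catalogue>";

    /** Parses the sample document and returns its root element. */
    static Element parseRoot() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the sample XML", e);
        }
    }

    /** A leaf-node has no child elements, only text (and possibly attributes). */
    static boolean isLeaf(Element element) {
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                return false;
            }
        }
        return true;
    }

    /** Appends one line per element, indented by depth, marking leaves and branches. */
    static void describe(Element element, String indent, StringBuilder out) {
        out.append(indent).append(element.getTagName())
           .append(isLeaf(element) ? " (leaf)" : " (branch)").append('\n');
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) {
                describe((Element) child, indent + "  ", out);
            }
        }
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        describe(parseRoot(), "", out);
        System.out.print(out);
    }
}
```

Running main lists Catalogue, Film and TVShow as branches, and each Title and Director as a leaf.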

Designing Abstract Syntax Tree Nodes

Once you have a stable XML model, you can consider developing the Java AST model. The basic notion is that, for each differently-named XML element, you will provide a Java class with the same name that stores the information held by this element. So, an XML element named Film will be mapped to a Java class of the same name. Since XML permits more liberal identifiers than Java, some XML names must be normalised; for example, the XML name cat:TV-show will be mapped to a Java class TVShow. The normalising algorithm removes namespace prefixes and all internal punctuation, capitalising the letter following each removed punctuation mark, on the assumption that this occurred at a word boundary. The resulting concatenated Java type identifier is in capital case.
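The normalising rule described above can be sketched as follows. This is an illustrative reconstruction from the description, not JAST's actual implementation; the class XmlNames and the method toClassName are hypothetical names, and the sketch assumes that every character other than a letter or digit counts as punctuation:

```java
class XmlNames {
    /**
     * Converts an XML element name into a capital-case Java type identifier:
     * drops any namespace prefix, removes punctuation, and capitalises the
     * first letter and the letter following each removed punctuation mark.
     */
    static String toClassName(String xmlName) {
        // Drop the namespace prefix, if any ("cat:TV-show" becomes "TV-show").
        String local = xmlName.substring(xmlName.lastIndexOf(':') + 1);
        StringBuilder result = new StringBuilder();
        boolean capitaliseNext = true;   // the first letter is always capitalised
        for (char c : local.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                result.append(capitaliseNext ? Character.toUpperCase(c) : c);
                capitaliseNext = false;
            } else {
                // Punctuation is removed and marks a word boundary.
                capitaliseNext = true;
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(toClassName("cat:TV-show"));
    }
}
```

Under these assumptions, cat:TV-show yields TVShow, and a lower-case name such as cat:film yields Film.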

The mapping between Java and XML is automatic, controlled by reflection and certain coding conventions. The AST node classes are written in a style similar to that used with Java Beans, where abstract properties are given standard get- and set-methods, by which the properties may be manipulated through reflection. Some aspects of the mapping between Java and XML may be customised by the end-user.

Initially, the AST node designer must determine what information each node should store. Each AST class may choose to store in its fields information that corresponds to XML attributes, or dependent XML nodes, or even plain textual data. The fields of the AST class must be declared in the order in which they expect to be serialised in the XML, for example:

public class Film {
	// fields storing attribute properties
    private int year;
    private String rating;
	// fields storing dependent elements
    private Title title;
    private Director director;
    ...
 }
indicates that the Film class will map two fields year and rating to XML attributes, and two fields title and director to dependent XML elements, which will be serialised in the given order of declaration. At this stage, almost any field name can be used (see below); and the stored value may be of a simple type (typically, for property fields) or a class type (typically, for dependent fields), or a collection type (a List, Set or Map) for storing collections of dependent elements.

One field name, content, is reserved to store any terminal information content. This is often a text string, but could be any other simple, or class type, as desired. In our example, leaf-node classes like Director that contain terminal textual information will indicate this by supplying a content field:

public class Director {
	// field storing textual information
    private String content;
    ...
 }
It would also be possible for Director to inherit from a class that declares the content field. JAST is able to treat each declared field as either a property, a dependent node, or terminal content, based on the public interface supplied to access the field. It is also possible for AST node classes to declare other fields, which are accessed internally by the class, and will not be serialised as XML.

Building Abstract Syntax Tree Nodes

Once you have decided how to map elements, attributes and textual data, you can create the standard construction API for the class. This is used by the JAST tools to unmarshal an object from the XML input file. Each class must provide a default constructor; and standard add-methods for adding dependent nodes; and standard set-methods for setting attribute properties or textual information. For example, the Film class might provide the following API:

public class Film {
    private int year;
    private String rating;
    private Title title;
    private Director director;
	// default constructor
    public Film() {}
	// methods to add dependents
    public Film addTitle(Title title) {
        this.title = title;
        return this;
    }
    public Film addDirector(Director director) {
        this.director = director;
        return this;
    }
	// methods to set properties
    public Film setYear(int year) {
        this.year = year;
        return this;
    }
    public Film setRating(String rating) {
        this.rating = rating;
        return this;
    }
 }
Notice how all methods have conventional names that are derived from the types of node that you are adding (dependent fields) or the names of the properties you are setting (property fields). These conventional names are discovered automatically in your AST node classes by JAST, through reflection. All add-methods accept an argument of the node-type. All set-methods accept an argument of the field-type. Their result-type is arbitrary, and all these methods could simply return void. However, in this example, all methods in the construction API have been designed to return this, the current object, since this will allow AST nodes to support sequences of nested calls, when building a syntax tree.
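To see why returning this is convenient, here is a minimal self-contained sketch that builds one Film by hand using chained calls. The outer class FilmBuilderDemo and the helper starWars() are hypothetical; the Title and Director classes are assumed to be simple leaf-node classes with a content field, and the node classes are nested only to keep the example in a single file:

```java
class FilmBuilderDemo {
    // Leaf-node classes: each stores its text in the reserved content field.
    public static class Title {
        private String content;
        public Title() {}
        public Title setContent(String content) { this.content = content; return this; }
        public String getContent() { return content; }
    }

    public static class Director {
        private String content;
        public Director() {}
        public Director setContent(String content) { this.content = content; return this; }
        public String getContent() { return content; }
    }

    // Branch-node class following the construction and access conventions.
    public static class Film {
        private int year;
        private String rating;
        private Title title;
        private Director director;
        public Film() {}
        public Film addTitle(Title title) { this.title = title; return this; }
        public Film addDirector(Director director) { this.director = director; return this; }
        public Film setYear(int year) { this.year = year; return this; }
        public Film setRating(String rating) { this.rating = rating; return this; }
        public int getYear() { return year; }
        public String getRating() { return rating; }
        public Title getTitle() { return title; }
        public Director getDirector() { return director; }
    }

    /** Builds a complete Film node in a single expression. */
    static Film starWars() {
        return new Film()
            .setYear(1976)
            .setRating("PG")
            .addTitle(new Title().setContent("Star Wars"))
            .addDirector(new Director().setContent("George Lucas"));
    }

    public static void main(String[] args) {
        Film film = starWars();
        System.out.println(film.getTitle().getContent() + " (" + film.getYear() + ")");
    }
}
```

If the methods returned void instead, the same tree would need one statement per call and a local variable for every intermediate node.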

Similarly, leaf-node classes must supply a default constructor, and a set-method to set the content field. This field may be of any primitive or String type, according to the information stored; and the set-method must accept an argument of the same type. For example, in the Director class:

public class Director {
    private String content;
	// default constructor
    public Director() {}
	// method to store textual information
    public Director setContent(String name) {
        this.content = name;
        return this;
    }
 }
the method setContent(String) is determined by reflection, based on the discovery of a field String content. The types of all set-methods are determined in this way.
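The general idea of finding a set-method from a field can be sketched with plain Java reflection. This is only an illustration of the convention, and JAST's own lookup may well differ in detail; the class SetterDiscovery and its methods convert and setProperty are hypothetical names, and only a few simple types are handled:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

class SetterDiscovery {
    // A cut-down node class with two property fields.
    public static class Film {
        private int year;
        private String rating;
        public Film() {}
        public Film setYear(int year) { this.year = year; return this; }
        public Film setRating(String rating) { this.rating = rating; return this; }
        public int getYear() { return year; }
        public String getRating() { return rating; }
    }

    /** Converts attribute text to the declared type of the field. */
    static Object convert(String text, Class<?> type) {
        if (type == String.class) return text;
        if (type == int.class || type == Integer.class) return Integer.valueOf(text);
        if (type == double.class || type == Double.class) return Double.valueOf(text);
        if (type == boolean.class || type == Boolean.class) return Boolean.valueOf(text);
        throw new IllegalArgumentException("Unsupported property type: " + type);
    }

    /**
     * Finds the field called name, derives the set-method name and argument
     * type from it, and invokes that method with the converted text.
     */
    static void setProperty(Object node, String name, String text) {
        try {
            Field field = node.getClass().getDeclaredField(name);
            String setter = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
            Method method = node.getClass().getMethod(setter, field.getType());
            method.invoke(node, convert(text, field.getType()));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No usable set-method for " + name, e);
        }
    }

    public static void main(String[] args) {
        Film film = new Film();
        setProperty(film, "year", "1976");
        setProperty(film, "rating", "PG");
        System.out.println(film.getYear() + " " + film.getRating());
    }
}
```

Note that this sketch only looks at fields declared directly in the node's own class; a fuller version would also search the superclasses.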

Accessing Abstract Syntax Tree Nodes

Next, you should provide access methods for your AST classes. These are also used by the JAST tools to marshal every object back to XML. When marshalling an AST object, the JAST tools expect every serialised field to be provided with a suitable get-method, that returns a value of the same type as the field. Note that methods should return strongly-typed results. For example, the Film class should provide the following API:

public class Film {
    private int year;
    private String rating;
    private Title title;
    private Director director;
    ...  // everything else as above
    
	// methods to access dependent nodes
    public Director getDirector() {
        return director;
    }
    public Title getTitle() {
        return title;
    }
	// methods to access properties
    public int getYear() {
        return year;
    }
    public String getRating() {
        return rating;
    }
 }
Notice how getDirector() returns an instance of Director as you would expect, in strongly-typed Java. The other methods may return primitive types or object types, according to how they were stored as fields in the Film object. The same applies when accessing the content-field in an AST class. For example, the Director class should provide the following:
public class Director {
    private String content;
    ...  // everything else as above

	// method to access textual information
    public String getContent() {
        return content;
    }
 }
to return the information stored in content. The idea is that, whenever you access your data, you get back the obvious strongly-typed Java objects that you can use directly in your Java programs. In some AST nodes, the content could be stored as an integer, or some other type.

Apart from this, there are very few restrictions. A property-, or content-field should store a single value, which may be converted to a String. A dependent-field may either store a single dependent object, or a list of dependents (see below), which will then be marshalled recursively as XML elements.

Factoring Common Behaviour in AST Nodes

Sometimes, different AST classes may end up looking quite similar, and it would be a chore to have to repeat similar coding for several classes. For example, the Film and TVShow classes overlap considerably, in terms of their dependent- and property-fields, and their associated get- and set-methods. Fortunately, you may arrange your AST classes in a hierarchy, according to their similarities. The following Show class is intended as the abstract superclass of both Film and TVShow:

public abstract class Show {
    private int year;
    private String rating;
    private Title title;
    private Director director;
	// default constructor
    public Show() { ... }
	// methods to add dependents
    public Show addTitle(Title title) { ... }
    public Show addDirector(Director director) { ... }
	// methods to access dependents
    public Title getTitle() { ... }
    public Director getDirector() { ... }
	// methods to set properties
    public Show setYear(int year) { ... }
    public Show setRating(String rating) { ... }
	// methods to access properties
    public int getYear() { ... }
    public String getRating() { ... }
}
All the common fields and methods needed are defined in one place. Now, it is very easy to define the AST classes for Film and TVShow as subclasses of Show, using Java inheritance, and inherit all fields and construction methods from the superclass:
public class Film extends Show {
    public Film() {}      // only needs a default constructor
}

public class TVShow extends Show {
    public TVShow() {}    // only needs a default constructor
}

Handling Polymorphic Lists of AST Nodes

Furthermore, the JAST tools make it easy to manipulate polymorphic lists of AST nodes having heterogeneous types. For example, let us assume that, in the root node Catalogue, we do not care to distinguish between the action of adding a Film and that of adding a TVShow. Instead, we are only interested in adding polymorphic Show objects. Accordingly, we can design the construction API for Catalogue in the following way:

public class Catalogue {
	// field to store heterogeneous dependents
    private List<Show> shows;
	// default constructor creates the list field
    public Catalogue() {
        shows = new ArrayList<Show>();
    } 
	// methods required to add/access dependents
    public Catalogue addShow(Show show) {
        shows.add(show);
        return this;
    }
    public List<Show> getShows() {
        return shows;
    }
	// other user-methods added for convenience
    public Show getShow(int i) throws IndexOutOfBoundsException {
        return shows.get(i);
    }
    public List<Film> getFilms() {
       ...  // return the subset of Shows that are Films
    }
    ...
}
Two things have happened here. Firstly, rather than providing Catalogue with separate add-methods addFilm(Film) and addTVShow(TVShow), we have decided that a Catalogue need not distinguish the two, and have simply provided addShow(Show) that accepts a polymorphic Show argument. The JAST reflection tools will automatically discover this more general method, if you don't supply the more specific methods. Secondly, the get-method getShows() will now be used to access the heterogeneous list of films and TV shows. This method will be detected automatically, by reflecting the name of the field. Notice how, in contrast to earlier examples, this dependent-field's get-method returns a list of objects. These will be marshalled in the same order that they were added to the list, as XML elements of mixed kinds.
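One way such a lookup could work is to try the most specific add-method first, and then walk up the superclass chain. The sketch below illustrates this with plain Java reflection; it is an assumption about the general approach rather than a description of JAST's internals, and the names AddMethodDiscovery, findAddMethod and addDependent are hypothetical. It also shows one possible body for the convenience method getFilms():

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

class AddMethodDiscovery {
    public static abstract class Show {}
    public static class Film extends Show {}
    public static class TVShow extends Show {}

    public static class Catalogue {
        private final List<Show> shows = new ArrayList<>();
        public Catalogue() {}
        public Catalogue addShow(Show show) { shows.add(show); return this; }
        public List<Show> getShows() { return shows; }
        // One possible way to return the subset of Shows that are Films.
        public List<Film> getFilms() {
            List<Film> films = new ArrayList<>();
            for (Show show : shows) {
                if (show instanceof Film) films.add((Film) show);
            }
            return films;
        }
    }

    /** Tries addFilm(Film) first, then addShow(Show), and so on upwards. */
    static Method findAddMethod(Class<?> owner, Class<?> nodeType) {
        for (Class<?> type = nodeType; type != null && type != Object.class;
                type = type.getSuperclass()) {
            try {
                return owner.getMethod("add" + type.getSimpleName(), type);
            } catch (NoSuchMethodException e) {
                // Not found: fall through and try the next, more general type.
            }
        }
        throw new IllegalStateException("No add-method in " + owner.getSimpleName()
            + " for " + nodeType.getSimpleName());
    }

    /** Adds node to owner through whichever add-method was discovered. */
    static void addDependent(Object owner, Object node) {
        try {
            findAddMethod(owner.getClass(), node.getClass()).invoke(owner, node);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("The add-method failed", e);
        }
    }

    public static void main(String[] args) {
        Catalogue catalogue = new Catalogue();
        addDependent(catalogue, new Film());
        addDependent(catalogue, new TVShow());
        System.out.println(catalogue.getShows().size() + " shows, "
            + catalogue.getFilms().size() + " film");
    }
}
```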

Although storing dependent nodes in a List is the most common case, it is also possible to store them in a Set or a Map. In the case of a Map, the dependent node should be stored as a value in the Map, indexed against some key attribute accessed from the stored node. The JAST tools will seek to discover a suitable add-method for the type of node stored in any collection-typed field, and from this will determine that the collection can be serialised as XML. Note that if unordered Set or Map implementations are chosen, the order of saved nodes may not be stable.
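As an illustration of the Map case, the following sketch stores shows against their title. It assumes, for simplicity, that the title is a plain String property that has already been set when addShow is called; the class MapCatalogueDemo and this simplified Show are hypothetical. A LinkedHashMap is chosen so that the iteration order matches the insertion order:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MapCatalogueDemo {
    // Simplified node: the title is held as a String property (an assumption).
    public static class Show {
        private String title;
        public Show() {}
        public Show setTitle(String title) { this.title = title; return this; }
        public String getTitle() { return title; }
    }

    public static class Catalogue {
        // LinkedHashMap remembers insertion order, unlike HashMap.
        private final Map<String, Show> shows = new LinkedHashMap<>();
        public Catalogue() {}
        // The add-method indexes each node against a key taken from the node.
        public Catalogue addShow(Show show) {
            shows.put(show.getTitle(), show);
            return this;
        }
        public Map<String, Show> getShows() { return shows; }
    }

    /** Returns the titles in the order in which the map iterates over them. */
    static List<String> titlesInOrder(Catalogue catalogue) {
        return new ArrayList<>(catalogue.getShows().keySet());
    }

    public static void main(String[] args) {
        Catalogue catalogue = new Catalogue()
            .addShow(new Show().setTitle("Star Wars"))
            .addShow(new Show().setTitle("Thunderbirds"))
            .addShow(new Show().setTitle("Transformers"));
        System.out.println(titlesInOrder(catalogue));
    }
}
```

With a plain HashMap or HashSet in place of the LinkedHashMap, the same program would still work, but the iteration order would no longer be guaranteed to follow the order of insertion.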

Distinguishing Dependent and Property Nodes

You may design arbitrary Java classes to represent the nodes of the abstract syntax tree, so long as they provide the construction and access API described above. It is normally expected that these nodes will be used to model only the XML elements, and that the XML attributes will be modelled using other simple Java types, such as String, int or double. However, you may also design an AST class that can be used as the type of a basic property. Assume that we now wish for the Title class to be used as a property of Film, rather than as a dependent:

public class Title {
    private String text;
	// constructor building this from a String
    public Title(String text) {
        this.text = text;
    }
	// method converting this back to a String
    public String toString() {
        return text;
    }
}
The class must provide a constructor that accepts a String argument, and also must provide a toString method that converts an object of this type back to a String. The JAST reflection tools assume that if a property-type is not one of the basic types, it must supply a String constructor. Likewise, all property values must be convertible to a String. Internally, the Title class may store the data in any way it wishes. Naturally, if you wish to represent a Title as a property, then the owning class must provide a suitable set-method (rather than the add-method for a dependent):
public class Film extends Show {
    ...
    public Film setTitle(Title title) { ... }
}
This Film class now expects to find the XML information about a title in an attribute called title, rather than in a dependent element, but will use the Title class to model the property.
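The String-constructor convention makes the conversion in both directions straightforward. Here is a minimal sketch of how attribute text might be turned into a property object and back, using plain reflection; the helper names PropertyConversion, fromString and toXmlValue are hypothetical and are not part of JAST:

```java
import java.lang.reflect.Constructor;

class PropertyConversion {
    public static class Title {
        private final String text;
        // Constructor building this from a String
        public Title(String text) { this.text = text; }
        // Method converting this back to a String
        @Override
        public String toString() { return text; }
    }

    /** Builds a property value of the given type through its String constructor. */
    static <T> T fromString(String text, Class<T> type) {
        try {
            Constructor<T> constructor = type.getConstructor(String.class);
            return constructor.newInstance(text);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                type.getSimpleName() + " needs a public String constructor", e);
        }
    }

    /** Converts any property value back to the text of an XML attribute. */
    static String toXmlValue(Object value) {
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        Title title = fromString("Star Wars", Title.class);
        System.out.println(toXmlValue(title));
    }
}
```

A class without a public String constructor, such as Object, is rejected by fromString, which mirrors the requirement stated above.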

Unmarshalling from an XML File to a Java AST

The main class to use for unmarshalling an XML file is ASTReader. This can be used to read the XML File using either the default, or a specified, character set. The result returned is always an instance of your root class, here an instance of Catalogue, but the reader only knows that it has the base type Object, the ancestor of all your AST nodes. You may downcast the result to your chosen type. The default reader is used like this:

    File xmlFile = new File("my/xml/input.xml");  // Or whatever file
    ASTReader reader = new ASTReader(xmlFile);
    Catalogue root = (Catalogue) reader.readDocument();
    reader.close();
This reads the file using the UTF-8 character set and discards all XML metadata and any extra formatting whitespace surrounding nodes. An alternative character set may be specified:
    File xmlFile = new File("my/xml/input.xml");  
    ASTReader reader = new ASTReader(xmlFile, "ISO-8859-1");  // Latin-1
    Catalogue root = (Catalogue) reader.readDocument();
    reader.close();
in which case the reader verifies that the XML file is also encoded in the named character set, before proceeding.

Further constructors of ASTReader support unmarshalling XML from an endpoint specified by a public URL, or unmarshalling from data supplied via a basic InputStream. The following gives an example of the former:

    URL url = new URL("http://www.my.site/input.xml");  // Any URL
    ASTReader reader = new ASTReader(url, "UTF-8");
    Catalogue root = (Catalogue) reader.readDocument();
    reader.close();
This constructor always requires both the URL and the character encoding, since it is unsafe to assume UTF-8; the HTTP/1.1 standard specifies using ISO-8859-1 by default, if no character encoding is known. Similarly, the character encoding should be supplied when unmarshalling from a basic InputStream.
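To see why guessing the encoding is unsafe, consider what happens when UTF-8 bytes are decoded as ISO-8859-1. The following sketch uses only the JDK; the class name CharsetMismatch and the sample title are illustrative, and non-ASCII characters are written as Unicode escapes so that the source file itself is encoding-neutral:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatch {
    /** Decodes raw bytes as text, using the given character set. */
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "Amelie" with an e-acute (U+00E9), as it might appear in a Title.
        String title = "Am\u00e9lie";
        byte[] utf8 = title.getBytes(StandardCharsets.UTF_8);

        // Decoding with the right character set recovers the title.
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(title));

        // Decoding with the wrong one turns the single accented letter
        // into two unrelated characters (U+00C3 followed by U+00A9).
        String garbled = decode(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.length() - title.length());
    }
}
```

The accented letter occupies two bytes in UTF-8, and ISO-8859-1 treats each byte as a separate character, so the wrongly decoded title is one character longer than the original.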

All of the above assume that your Java AST node classes are defined in the default top-level package (the unnamed package, the working directory). If you place your own AST classes in a named package, you will need to tell the ASTReader where to find them:

    File xmlFile = new File("my/xml/input.xml");
    ASTReader reader = new ASTReader(xmlFile);
    reader.usePackage("org.my.catalogue");   // Where your nodes are
    Catalogue root = (Catalogue) reader.readDocument();
    reader.close();
The reader will now understand that your classes have fully qualified names like: org.my.catalogue.Film. In all of these examples, we have assumed that the returned root node is of the type Catalogue, but you should replace this by whatever root node type you have in your AST.

Marshalling from a Java AST to an XML File

The main class to use for marshalling an AST in memory is ASTWriter. This can be used to write the whole abstract syntax tree to an XML File using either the default, or a specified, character set. The mapping from Java to XML identifiers is handled as follows: if the same types were previously unmarshalled in the current session, the original XML identifiers will be recovered and used during marshalling, otherwise the simple class names will be used as XML identifiers. The default writer:

    File xmlFile = new File("my/xml/output.xml");  // Or whatever file
    ASTWriter writer = new ASTWriter(xmlFile);
    writer.writeDocument(root);                // Created previously
    writer.close();
writes the file using XML version 1.0 and the UTF-8 character set; and it pretty-prints the XML file according to a standard layout. An alternative character set may be specified:
    File xmlFile = new File("my/xml/output.xml");
    ASTWriter writer = new ASTWriter(xmlFile, "ISO-8859-1");  // Latin-1
    writer.writeDocument(root);                // Created previously
    writer.close();
in which case the writer uses this character set and ensures that the XML document declares the same character set encoding.

Further constructors of ASTWriter support marshalling to a PrintWriter, or to a basic OutputStream. The former is useful when the marshalled XML is to be written by a Java Servlet, using an HttpServletResponse response object, as defined in the Java package javax.servlet.http, to provide the output stream and character encoding:

    response.setCharacterEncoding("UTF-8");		// UTF-8
    ASTWriter writer = new ASTWriter(response.getWriter(), 
        response.getCharacterEncoding());
    writer.writeDocument(root);                // Created previously
    writer.close();
in which case the same character encoding will be declared in the XML document as that used by the output stream. The response object's character encoding may be reset to any character set, but this must be done before accessing the response's writer stream. General marshalling to a basic OutputStream is also supported.

Controlling the Java to XML Mapping

While the above facilities will be adequate for all examples where the XML elements have the same names as the Java classes, if you wish to have variable mapping from elements to different Java packages, or mappings from Java packages to different XML namespaces, then you will need to use Metadata, a class storing all of these mappings. It is possible to access the metadata saved during unmarshalling:

    File xmlFile = new File("my/xml/input.xml");
    ASTReader reader = new ASTReader(xmlFile);
    reader.usePackage("org.my.catalogue");
    Catalogue root = (Catalogue) reader.readDocument();
    Metadata metadata = reader.getMetadata();  // Save this for later
    reader.close();
The Metadata object stores all the mappings from Java class names to XML element names, and all the XML metadata that does not form part of the AST model. To reuse the same mappings (and XML file properties) when writing an AST model back to XML, you need to tell the writer to use the same metadata:
    File xmlFile = new File("my/xml/output.xml");
    ASTWriter writer = new ASTWriter(xmlFile);
    writer.setMetadata(metadata);              // Use the same metadata
    writer.writeDocument(root);
    writer.close();
This will ensure that, if an XML element named TV-show is mapped to a Java class org.my.catalogue.TVShow, then this class will be mapped back to TV-show when it is written. It will also ensure that the output XML file will have the same XML version and encoding, etc., as used in the input file. If the metadata is not transferred, then the output XML file will use default settings and the element will be serialised as TVShow, the simple name of the Java class.
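The fallback behaviour described here can be pictured as a two-way lookup table. The sketch below is a conceptual model only and does not reproduce the API of the Metadata class; the class NameMappings and its methods record and xmlNameFor are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

class NameMappings {
    private final Map<String, String> xmlToJava = new HashMap<>();
    private final Map<String, String> javaToXml = new HashMap<>();

    /** Remembers that an XML element name was mapped to a Java class name. */
    NameMappings record(String xmlName, String className) {
        xmlToJava.put(xmlName, className);
        javaToXml.put(className, xmlName);
        return this;
    }

    /** Returns the Java class name recorded for an XML name, or null if none. */
    String classNameFor(String xmlName) {
        return xmlToJava.get(xmlName);
    }

    /**
     * Returns the original XML name if one was recorded; otherwise falls
     * back to the simple name of the class (the part after the last dot).
     */
    String xmlNameFor(String className) {
        String simpleName = className.substring(className.lastIndexOf('.') + 1);
        return javaToXml.getOrDefault(className, simpleName);
    }

    public static void main(String[] args) {
        NameMappings saved = new NameMappings().record("TV-show", "org.my.catalogue.TVShow");
        NameMappings fresh = new NameMappings();
        System.out.println(saved.xmlNameFor("org.my.catalogue.TVShow"));
        System.out.println(fresh.xmlNameFor("org.my.catalogue.TVShow"));
    }
}
```

With the saved mappings the class is written back as TV-show; with a fresh, empty table it is written as TVShow.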

It is possible to ask JAST to use several different Java packages to unmarshal XML elements coming from different XML namespaces. This is done using the usePackage() method:

    File xmlFile = new File("my/xml/input.xml");
    ASTReader reader = new ASTReader(xmlFile);
    reader.usePackage("org.my.home");
    reader.usePackage("org.my.catalogue", "xmlns:cat");
    Catalogue root = (Catalogue) reader.readDocument();
    reader.close();
This tells the reader to use classes in the package org.my.home for XML elements in the default namespace, and to use classes in the package org.my.catalogue for XML elements in the xmlns:cat namespace. Any XML element having the prefix cat, such as cat:catalogue, or cat:film would then be mapped to classes from this package. You may also specify these mappings in an ASTWriter, to ensure that the classes from different packages are marshalled back to XML elements in the desired namespaces.
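Conceptually, the reader keeps a table from namespace prefixes to package names, and consults it for every element. The following sketch models that idea; it is not JAST's implementation, and the class PackageResolver is hypothetical, although its usePackage methods are deliberately shaped like the calls shown above. For brevity, it capitalises only the first letter of the local name, rather than applying the full normalising algorithm:

```java
import java.util.HashMap;
import java.util.Map;

class PackageResolver {
    // Maps a namespace prefix ("" for the default namespace) to a package.
    private final Map<String, String> packages = new HashMap<>();

    /** Registers the package used for elements in the default namespace. */
    PackageResolver usePackage(String packageName) {
        packages.put("", packageName);
        return this;
    }

    /** Registers the package used for a namespace such as "xmlns:cat". */
    PackageResolver usePackage(String packageName, String namespace) {
        String prefix = namespace.startsWith("xmlns:")
            ? namespace.substring("xmlns:".length()) : namespace;
        packages.put(prefix, packageName);
        return this;
    }

    /** Returns the qualified class name expected for an XML element name. */
    String classNameFor(String xmlName) {
        int colon = xmlName.indexOf(':');
        String prefix = colon < 0 ? "" : xmlName.substring(0, colon);
        String local = xmlName.substring(colon + 1);
        String simpleName = Character.toUpperCase(local.charAt(0)) + local.substring(1);
        String packageName = packages.getOrDefault(prefix, "");
        return packageName.isEmpty() ? simpleName : packageName + "." + simpleName;
    }

    public static void main(String[] args) {
        PackageResolver resolver = new PackageResolver()
            .usePackage("org.my.home")
            .usePackage("org.my.catalogue", "xmlns:cat");
        System.out.println(resolver.classNameFor("cat:film"));
        System.out.println(resolver.classNameFor("garden"));
    }
}
```

Here cat:film resolves to org.my.catalogue.Film, while the unprefixed (and purely illustrative) element garden resolves to org.my.home.Garden.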

Apart from this, it is possible to access Metadata directly, using its own API. This allows you to set XML file properties, specify XML namespaces and their base URIs, packages and their mapped XML namespaces, and Java class identifiers and their mapped XML identifiers. If no URI is specified for a namespace, the Java package ID is used instead. This level of detail is only required if you generate an AST in memory, and want the XML output to correspond to some Java to XML mapping that was not previously read.

Notification of Exceptions

ASTReader and ASTWriter may raise various kinds of IOException, if a problem occurs with the underlying file system. Ill-formed XML syntax is reported through ASTError, whereas the inability to construct or manipulate an AST node class is reported through NodeError. This covers a variety of errors, including missing constructors, missing methods, or failing methods. In summary, faulty user code may raise the following:

The latter are styled as errors, rather than exceptions, since the W3C standard requires malformed XML to be rejected outright, and not handled by exception-tolerant software.