JAST: Java Abstract Syntax Trees

Natural Java idioms for processing XML data

Department of Computer Science

The JAST User Guide

The Java Abstract Syntax Trees toolkit provides a complete Java library for processing XML files. The various components support reading XML files into DOM-trees, writing DOM-trees to XML files, scanning large XML files to build Java structures on demand, marshalling bespoke Java models to serial XML, and unmarshalling XML to bespoke Java models consisting of the programmer's own classes. Further components exist for checking document validity according to DTD or XSD specifications, and for searching or filtering XML DOM-trees according to the XPath abbreviated syntax. The most frequently used readers and writers may be found in the top-level package: uk.ac.sheffield.jast. A number of demonstration programs are supplied in the package: uk.ac.sheffield.jast.test; these require the XML files found in the unzipped download bundle. Please refer to the Download Guide.

Parsing Document Object Models

The standard XML toolkit allows Java programs to read and write XML files to and from a standard Document Object Model (DOM), a syntax tree that corresponds exactly to the hierarchical structure of the XML document. The nodes of the DOM-tree have obvious Java class names, such as Declaration, Instruction, Element, Attribute, Text, Data and Comment, inspired by the W3C XML specification. The top-level package: uk.ac.sheffield.jast contains XMLReader for reading XML files into DOM-trees, and XMLWriter for writing DOM-trees as XML files. Please refer to the DOM Processing Guide.

Once read into memory, the DOM-tree is returned as an instance of the type Document, from which the root Element may be extracted. The root Element and all of its descendant nodes may be inspected and manipulated using methods of the relevant nodes. The APIs of all the nodes used for building a DOM-tree are described in the package: uk.ac.sheffield.jast.xml. By default, XML documents are only checked for well-formedness. It is also possible to validate a document against a Document Type Definition (DTD), or alternatively against an XML Schema Definition (XSD). Tools for doing this are provided in the package: uk.ac.sheffield.jast.valid; and validation can also be triggered automatically when reading with XMLReader. Please refer to the XML Validation Guide.
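
The read-and-inspect cycle described above can be sketched with the DOM parser bundled with the JDK (javax.xml.parsers and org.w3c.dom). This is an analogue only: JAST's own XMLReader, Document and Element classes have their own API, described in the DOM Processing Guide, which is not reproduced here. The class name DomSketch and the sample catalogue are illustrative assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative only: uses the JDK's DOM parser, not JAST's XMLReader.
class DomSketch {

    // Parse XML text into a DOM-tree; by default only well-formedness is checked.
    static Document read(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Collect the tag names of the element children of a node, in document order.
    static List<String> childNames(Element parent) {
        List<String> names = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(child.getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<catalogue><film year=\"1982\"/><film year=\"1999\"/></catalogue>";
        Document doc = read(xml);
        Element root = doc.getDocumentElement();   // extract the root element
        System.out.println(root.getTagName() + " " + childNames(root));
    }
}
```

The same pattern applies whichever DOM implementation is used: obtain the document, extract the root element, then navigate to its descendants through the node methods.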

Marshalling Bespoke Java Models

The custom AST toolkit allows Java programs to marshal a custom Abstract Syntax Tree (AST) to a serialised XML file and to unmarshal the XML file back to an in-memory AST. The nodes of the AST are provided as custom Java classes designed by the programmer. The Java AST model may be a simple tree, or a cyclic and re-entrant graph of arbitrarily-connected Java objects. Marshalling will write such structures to serial XML files without duplication, writing references when an object is encountered more than once. The top-level package: uk.ac.sheffield.jast contains ASTReader for unmarshalling XML files into Java ASTs, and ASTWriter for marshalling ASTs as XML files. Please refer to the Java Binding Guide.
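
The idea of writing a reference when an object is encountered more than once can be illustrated with a small hand-rolled marshaller. This is a sketch of the general technique, not of ASTWriter itself: the Person class, the id and ref attribute names, and the output layout are all assumptions made for the example, and JAST's actual XML format may differ.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a hand-rolled marshaller, not JAST's ASTWriter.
class RefMarshalSketch {

    // A hypothetical model class whose objects may form cycles.
    static class Person {
        final String name;
        final List<Person> friends = new ArrayList<>();
        Person(String name) { this.name = name; }
    }

    // Objects already written, keyed by identity and mapped to their id.
    private final Map<Object, Integer> seen = new IdentityHashMap<>();
    private final StringBuilder out = new StringBuilder();

    static String marshal(Person root) {
        RefMarshalSketch marshaller = new RefMarshalSketch();
        marshaller.write(root);
        return marshaller.out.toString();
    }

    private void write(Person p) {
        Integer id = seen.get(p);
        if (id != null) {
            // Second and later encounters: emit a reference only.
            out.append("<Person ref=\"").append(id).append("\"/>");
            return;
        }
        id = seen.size() + 1;
        seen.put(p, id);   // register before recursing, so that cycles terminate
        // Note: names are not XML-escaped in this sketch.
        out.append("<Person id=\"").append(id)
           .append("\" name=\"").append(p.name).append("\">");
        for (Person friend : p.friends) {
            write(friend);
        }
        out.append("</Person>");
    }

    public static void main(String[] args) {
        Person ann = new Person("Ann");
        Person bob = new Person("Bob");
        ann.friends.add(bob);
        bob.friends.add(ann);   // cycle: Ann -> Bob -> Ann
        System.out.println(marshal(ann));
    }
}
```

Keying the table by object identity rather than by equals() is the important design choice: two distinct objects that happen to be equal are still written separately, while a genuinely shared object is written once.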

The AST node classes supplied by the programmer are designed according to simple API conventions, rather like Java Beans, and do not require complicated Java annotations or XML mapping files to support conversion to and from XML. Data is stored in these classes and accessed using the usual strongly-typed setter- and getter-methods familiar to the programmer; these methods are automatically discovered by the marshalling framework through Java reflection. All textual data is converted into suitable strongly-typed values, before being stored in the programmer's own classes. By way of example, a collection of AST classes for modelling a film catalogue is provided in the package: uk.ac.sheffield.jast.ast. Please refer to the Java Binding Guide.
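
How a framework might discover getter- and setter-methods by reflection, and convert text into strongly-typed values, can be sketched with plain java.lang.reflect. This shows the general technique only and is not JAST's implementation: the Film class here is a hypothetical stand-in (not the class in uk.ac.sheffield.jast.ast), and the helper names readProperties, writeProperty and convert are invented for the example.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: reflection over bean-style methods, not JAST's marshalling code.
class BeanReflectionSketch {

    // A hypothetical node class following getter/setter conventions.
    public static class Film {
        private String title;
        private int year;
        public String getTitle() { return title; }
        public void setTitle(String title) { this.title = title; }
        public int getYear() { return year; }
        public void setYear(int year) { this.year = year; }
    }

    // Discover getX()/setX(...) pairs and read each property as text.
    static Map<String, String> readProperties(Object bean) throws Exception {
        Map<String, String> result = new TreeMap<>();
        for (Method getter : bean.getClass().getMethods()) {
            String name = getter.getName();
            if (name.length() > 3 && name.startsWith("get")
                    && getter.getParameterCount() == 0) {
                String property = name.substring(3);
                // Only keep properties with a matching setter (this skips getClass()).
                if (findSetter(bean.getClass(), property, getter.getReturnType()) != null) {
                    result.put(property, String.valueOf(getter.invoke(bean)));
                }
            }
        }
        return result;
    }

    static Method findSetter(Class<?> type, String property, Class<?> paramType) {
        try {
            return type.getMethod("set" + property, paramType);
        } catch (NoSuchMethodException e) {
            return null;
        }
    }

    // Convert text to the setter's declared parameter type, then store it.
    static void writeProperty(Object bean, String property, String text) throws Exception {
        for (Method m : bean.getClass().getMethods()) {
            if (m.getName().equals("set" + property) && m.getParameterCount() == 1) {
                m.invoke(bean, convert(text, m.getParameterTypes()[0]));
                return;
            }
        }
        throw new IllegalArgumentException("No setter for " + property);
    }

    // A deliberately small set of conversions; a real framework would cover more types.
    static Object convert(String text, Class<?> type) {
        if (type == String.class) return text;
        if (type == int.class || type == Integer.class) return Integer.valueOf(text);
        if (type == boolean.class || type == Boolean.class) return Boolean.valueOf(text);
        if (type == double.class || type == Double.class) return Double.valueOf(text);
        throw new IllegalArgumentException("Unsupported type: " + type);
    }
}
```

For instance, calling writeProperty(film, "Year", "1982") on a Film stores the int value 1982 through setYear, without any annotation or mapping file on the Film class.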

Scanning Very Large XML Streams

The Simple API for XML (SAX) allows Java programs to scan very large XML files and perform programmer-defined building-actions when specific XML events are encountered. This is a suitable strategy when the XML files to be processed are simply too large to hold in memory (although JAST outclasses all other DOM-tree builders in its tree-storing capacity). The components for building are provided in the package: uk.ac.sheffield.jast.build. The programmer must supply the scanning XMLParser with a builder-class that conforms to the Builder interface, and which defines how to respond to specific events scanned by the streaming XMLParser. Please refer to the SAX Builder Guide.

The programmer's builder-class may be provided more quickly and simply by inheriting from BasicBuilder, which defines empty responses to each event. The structures created by a builder are arbitrary, depending on how the programmer defines the methods of the builder-class. There is no corresponding method for inserting the extracted data back into the XML file. However, two builders are provided, called XMLBuilder and ASTBuilder, which mimic exactly the behaviour of XMLReader and ASTReader, and whose data can be serialised using the corresponding writers. Please refer to the SAX Builder Guide.
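
The pattern of inheriting empty event responses and overriding only those of interest also exists in the JDK's own SAX support, where org.xml.sax.helpers.DefaultHandler plays a role comparable to that described for BasicBuilder. The sketch below uses the JDK classes as an analogue; it does not show JAST's Builder interface or XMLParser, whose method names are documented in the SAX Builder Guide. The TitleCollector class and the title element are assumptions for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative only: uses the JDK's SAX parser, not JAST's XMLParser or Builder.
class SaxSketch {

    // Overrides three events; DefaultHandler supplies empty responses for the rest.
    static class TitleCollector extends DefaultHandler {
        final List<String> titles = new ArrayList<>();
        private StringBuilder text;   // non-null only while inside a <title> element

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            if (qName.equals("title")) {
                text = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text may arrive in several chunks, so accumulate it.
            if (text != null) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("title")) {
                titles.add(text.toString());
                text = null;
            }
        }
    }

    // Scan the input once, keeping only the extracted titles in memory.
    static List<String> titles(String xml) throws Exception {
        TitleCollector handler = new TitleCollector();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.titles;
    }
}
```

What the handler builds is entirely up to the programmer; here it is a flat list of strings, so the memory used depends on the number of titles rather than on the size of the document.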

Searching and Filtering XML Documents

The XPath search engine is an accompaniment to the standard XML toolkit. It implements a subset of the W3C XPath standard, supporting the abbreviated syntax for XPath searching. A search query is represented by an XPath object, which compiles the query-string into a collection of rules that filter and navigate through the XML memory tree. A search is initiated by matching an XPath against a starting point in a DOM-tree, and returns a list of matching nodes. The XPath search engine is provided in the package: uk.ac.sheffield.jast.xpath. Please refer to the XPath Query Guide.
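
The compile-then-match workflow, and the abbreviated syntax itself, can be tried out with the XPath engine bundled with the JDK (javax.xml.xpath). This is offered as an analogue only: the classes in uk.ac.sheffield.jast.xpath have their own API, described in the XPath Query Guide, and JAST supports a subset of the standard. The class name XPathSketch and the sample query are assumptions for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative only: uses the JDK's XPath engine, not JAST's XPath class.
class XPathSketch {

    // Compile a query, match it from the document root, and return
    // the text content of every matching node.
    static List<String> match(String xml, String query) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        XPathExpression compiled =
            XPathFactory.newInstance().newXPath().compile(query);
        NodeList nodes = (NodeList) compiled.evaluate(doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<catalogue>"
            + "<film year=\"1982\"><title>Blade Runner</title></film>"
            + "<film year=\"1999\"><title>The Matrix</title></film>"
            + "</catalogue>";
        // Abbreviated syntax: // for descendants, @ for attributes, [] for predicates.
        System.out.println(match(xml, "//film[@year='1982']/title"));
    }
}
```

In the query, //film is the abbreviated form of /descendant-or-self::node()/child::film, and @year abbreviates attribute::year.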

The XML tree filtering kit is an accompaniment to the standard XML toolkit. It provides a useful set of Filter classes that may be used to construct arbitrarily complex criteria for matching different kinds of XML node in a DOM-tree. Different filters can test the content-type, the name, the value, the attributes or the children of different kinds of node. The tree filtering kit is also an integral component of the XPath search engine and the document validation engine. The XML filtering kit is provided in the package: uk.ac.sheffield.jast.filter. Please refer to the XML Filter Guide.
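
The notion of combining simple filters into more complex criteria can be sketched with java.util.function.Predicate over the JDK's org.w3c.dom nodes. This illustrates the composition idea only: the Filter classes in uk.ac.sheffield.jast.filter are not shown, and the helper names isElement, named, hasAttribute and collect are invented for the example.

```java
import java.io.StringReader;
import java.util.List;
import java.util.function.Predicate;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative only: predicate composition over JDK DOM nodes, not JAST's Filter classes.
class FilterSketch {

    // Tests the content-type of a node.
    static Predicate<Node> isElement() {
        return n -> n.getNodeType() == Node.ELEMENT_NODE;
    }

    // Tests the name of a node.
    static Predicate<Node> named(String name) {
        return n -> name.equals(n.getNodeName());
    }

    // Tests that an element carries an attribute with the given value.
    static Predicate<Node> hasAttribute(String name, String value) {
        return n -> n instanceof Element
            && value.equals(((Element) n).getAttribute(name));
    }

    // Depth-first walk, collecting every node accepted by the filter.
    static void collect(Node node, Predicate<Node> filter, List<Node> out) {
        if (filter.test(node)) {
            out.add(node);
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), filter, out);
        }
    }

    // Parse XML text and return its root element.
    static Element parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)))
            .getDocumentElement();
    }
}
```

Filters built this way combine with and(), or() and negate(); for example, isElement().and(named("film")).and(hasAttribute("year", "1982")) accepts only film elements from 1982.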

Regent Court, 211 Portobello, Sheffield S1 4DP, United Kingdom