This package contains the JAST 1.1 Standard XML Toolkit, © Anthony J H Simons, 2010-2015. This software is currently on experimental alpha release, and is offered as-is, under a free experimental license (see full terms below).

Java Abstract Syntax Trees, v1.1

If you are seeking to use any of the above software, please refer to the brief instructions immediately below and also the documentation on the JAST website for more details:

     http://staffwww.dcs.shef.ac.uk/people/A.Simons/jast/.

Licensing Terms

This alpha-release software is free to use by academic and commercial users. The terms of the license are that you are free to use the software in any product (whether free or commercial), provided that any usage is acknowledged by citing "©Anthony J H Simons" as the copyright holder and referring to the JAST website "http://staffwww.dcs.shef.ac.uk/people/A.Simons/jast/" as the source. While this alpha license is perpetual and not subject to any restriction, we reserve the right to change the licensing terms of subsequent releases. The software is offered as-is, without any implied warranty for fitness of purpose. Please refer to the JAST website for further details:

     http://staffwww.dcs.shef.ac.uk/people/A.Simons/jast/.

The Standard XML Toolkit

The following assumes that you, the developer, wish to build a third-party application, which incorporates the JAST 1.1 standard XML processing tools. The components for reading and writing XML files, and for manipulating in-memory XML trees using the standard memory model, are to be found in this package org.jast.xml. The components for filtering XML memory-trees are to be found in another package org.jast.filter. The components for conducting XPath searches in XML memory-trees are to be found in another package org.jast.xpath.

Designing an XML Data Model

The first thing you will need to do is decide what kind of data you wish to model. Having done this, you will develop an XML markup scheme, using a mixture of XML elements and attributes to describe and encode the data. For example, a document that stores information about people in a family might look like this:

	<?xml version="1.0" encoding="UTF-8"?>
	<Family>
	  <!-- The Smith family -->
	  <Person role="father" age="45">
	    John Smith
	  </Person>
	  <Person role="mother" age="41">
	    Mary Smith
	  </Person>
	  <Person role="son" age="16">
	    Ben Smith
	  </Person>
	  <Person role="daughter" age="14">
	    Alice Smith
	  </Person>
	</Family>

So, the main XML element nodes used for markup are called Family and Person; there is an XML declaration; and there is a comment. The Person element also has attributes called role and age. We assume that information like this is stored in a text file.

Reading an XML File into a Memory Document

The main API class to use is XMLReader. This can be used to read an XML File, using the default, or a chosen, character set and preserving or discarding extra formatting whitespace. The default reader:

	File xmlFile = new File("my/xml/input.xml");  // Or whatever file
	XMLReader reader = new XMLReader(xmlFile);
	Document document = reader.readDocument();
	reader.close();

reads the file using the UTF-8 character set and discards extra formatting whitespace surrounding nodes. An alternative character set may be specified:

	File xmlFile = new File("my/xml/input.xml");  
	XMLReader reader = new XMLReader(xmlFile, "ISO-8859-1");  // Latin-1
	Document document = reader.readDocument();
	reader.close();

in which case the reader verifies that the XML file is also encoded in the named character set, before proceeding.
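
How JAST performs this verification is not described here. As a rough illustration only, the following sketch compares the encoding named in an XML declaration against an expected character set, using nothing but the JDK. The class EncodingCheck and its methods are hypothetical names, not part of JAST, and the sketch assumes that the first characters of the file have already been read into a String:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a JAST class.
class EncodingCheck {
    // Captures the value of the encoding pseudo-attribute in an XML declaration.
    private static final Pattern DECL = Pattern.compile(
        "^<\\?xml[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    // Returns the declared charset, or UTF-8 if no encoding is declared.
    static Charset declaredCharset(String head) {
        Matcher m = DECL.matcher(head);
        return m.find() ? Charset.forName(m.group(1)) : StandardCharsets.UTF_8;
    }

    // True if the declaration agrees with the expected charset name.
    // Charset.forName resolves case differences and aliases to one canonical charset.
    static boolean matches(String head, String expected) {
        return declaredCharset(head).equals(Charset.forName(expected));
    }

    public static void main(String[] args) {
        String head = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>";
        System.out.println(matches(head, "ISO-8859-1"));  // true
        System.out.println(matches(head, "UTF-8"));       // false
    }
}
```

Comparing Charset objects rather than raw strings means that "utf-8" and "UTF-8" are treated as the same encoding.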

Further constructors of XMLReader support reading the XML file from an endpoint specified by a public URL, or reading data supplied via a basic InputStream. The following gives an example of the former:

	URL url = new URL("http://www.my.site/input.xml");  // Any URL
	XMLReader reader = new XMLReader(url, "UTF-8");
	Document document = reader.readDocument();
	reader.close();

This constructor always requires both the URL and the character encoding, since it is unsafe to assume UTF-8; the HTTP/1.1 standard specifies using ISO-8859-1 by default, if no character encoding is known. Similarly, the character set must be given when reading from an InputStream.
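
To see why guessing the character set is unsafe, the following sketch encodes a short string in one character set and decodes it in the other. It uses only the JDK; the class CharsetMismatch is a hypothetical name chosen for this illustration and is not part of JAST:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demonstration class, not a JAST class.
class CharsetMismatch {
    // Encodes text as UTF-8, then wrongly decodes the bytes as ISO-8859-1.
    static String utf8ReadAsLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Encodes text as ISO-8859-1, then wrongly decodes the bytes as UTF-8.
    static String latin1ReadAsUtf8(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "caf\u00e9";  // "cafe" with an acute accent on the e
        // The two UTF-8 bytes C3 A9 become two separate Latin-1 characters.
        System.out.println(utf8ReadAsLatin1(name).length());  // 5
        // The lone Latin-1 byte E9 is not valid UTF-8 and is replaced by U+FFFD.
        System.out.println(latin1ReadAsUtf8(name).contains("\ufffd"));  // true
    }
}
```

Plain ASCII text survives either mistake unchanged, which is why a wrong guess often goes unnoticed until the first accented character appears.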

By default, a compact document tree is built in memory. Alternatively, all formatting whitespace in the original XML document may be preserved:

	File xmlFile = new File("my/xml/input.xml");
	XMLReader reader = new XMLReader(xmlFile);
	reader.setCompactFormat(false);		     // Keep all layout text
	Document document = reader.readDocument();
	reader.close();

by invoking setCompactFormat(false). This preserves all formatting whitespace as extra Text nodes in the memory-tree. To undo this, invoke setCompactFormat(true) to discard all formatting whitespace and build a more compact memory-tree (this is the default setting). If compact format and pretty-printing (see below) are both disabled, the XML file may be re-written with exactly the same layout as it was read.
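
The effect of keeping or discarding layout text can be seen by counting nodes. The sketch below does not use JAST; it uses the JDK's built-in DOM parser as a stand-in, which keeps whitespace-only text nodes by default. The class LayoutNodes and its methods are hypothetical names. Note that the sketch only drops text nodes consisting entirely of whitespace, which may be a narrower rule than the one JAST applies:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical demonstration class built on the JDK DOM, not on JAST.
class LayoutNodes {
    // Removes every whitespace-only text node below the given node.
    static void dropLayoutText(Node node) {
        NodeList kids = node.getChildNodes();
        // Iterate backwards, because the NodeList shrinks as nodes are removed.
        for (int i = kids.getLength() - 1; i >= 0; i--) {
            Node kid = kids.item(i);
            if (kid.getNodeType() == Node.TEXT_NODE
                    && kid.getNodeValue().trim().isEmpty()) {
                node.removeChild(kid);
            } else {
                dropLayoutText(kid);
            }
        }
    }

    // Counts the direct children of the root, with or without layout text.
    static int childCount(String xml, boolean compact) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
            if (compact) {
                dropLayoutText(root);
            }
            return root.getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Family>\n  <Person>John</Person>\n"
                   + "  <Person>Mary</Person>\n</Family>";
        System.out.println(childCount(xml, false));  // 5: two elements, three layout texts
        System.out.println(childCount(xml, true));   // 2: the two elements only
    }
}
```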

Writing a Memory Document out to an XML File

The main API class to use is XMLWriter. This can be used to write an XML File, using the default, or a chosen, character set and preserving the existing format, or pretty-printing the document. The default writer:

	File xmlFile = new File("my/xml/output.xml");  // Or whatever file
	XMLWriter writer = new XMLWriter(xmlFile);
	writer.writeDocument(document);                // Created previously
	writer.close();

writes the file using the UTF-8 character set and pretty-prints the XML file according to a standard layout. An alternative character set may be specified:

	File xmlFile = new File("my/xml/output.xml");
	XMLWriter writer = new XMLWriter(xmlFile, "ISO-8859-1");  // Latin-1
	writer.writeDocument(document);
	writer.close();

in which case the writer verifies that the XML document declares the same character set encoding, before proceeding.

Further constructors of XMLWriter support writing XML to a PrintWriter, or to a basic OutputStream. The former is useful when the XML DOM-tree is to be written by a Java Servlet, using an HttpServletResponse object, as defined in the Java package javax.servlet.http, to provide the output stream and character encoding:

	response.setCharacterEncoding("UTF-8");			// UTF-8
	XMLWriter writer = new XMLWriter(response.getWriter(), 
		response.getCharacterEncoding());
	writer.writeDocument(document);
	writer.close();

in which case you should always ensure that the XML declaration at the head of the document also uses the same character set. The response object's character encoding may be reset to any character set, but this must be done before accessing the response's writer stream. The character encoding must also be supplied when writing to a basic OutputStream.

By default, the XML file is pretty-printed using a standard layout with newlines and indentation. Alternatively, the original layout of the memory-tree can be preserved in the output:

	File xmlFile = new File("my/xml/output.xml");
	XMLWriter writer = new XMLWriter(xmlFile);
	writer.setPrettyFormat(false);		      // Write native layout
	writer.writeDocument(document);
	writer.close();

by invoking setPrettyFormat(false) to disable pretty-printing. The XML file will then be laid out using whatever whitespace Text nodes are present in the memory-tree. To undo this, invoke setPrettyFormat(true) to enable pretty-printing again (this is the default setting). If compact format (see above) and pretty-printing are both disabled, the XML file may be re-written with exactly the same layout as it was read.

Accessing the Contents of the XML Memory-Tree

The main classes of interest are Document, Element, Attribute and Text. There are other types of node in the default memory model. The JAST XML memory-tree nodes are designed according to the Composite Design Pattern, that is, everything in the memory-tree is some kind of Content and respects a common API. The more specific kinds of node extend this API in different ways. Please see the full API descriptions for each of these types.

The following is just an example of how the nodes of the XML memory-tree can be accessed within a Java program:

	Directive header = document.getDeclaration();    // XML declaration.
	Doctype doctype = document.getDoctype();    // Optional doctype.
	Comment comment = document.getComment();    // Optional comment
	Element root = document.getRootElement();   // Root element.
	List<Content> contents = document.getContents(); // All subnodes.
	Content node = document.getContent(2);      // Third sub-node.

	String name = root.getName();               // Element name.
	int contentType = root.getType();           // Bitmask type.
	List<Element> allChildren = root.getChildren();
	List<Element> someChildren = root.getChildren("Person");
	Element child = root.getChild("Person");    // First so-named.
	Element parent = child.getParent();         // Same as root.
	String text = child.getText();              // Textual content.

	List<Attribute> properties = child.getAttributes();
	Attribute property = child.getAttribute("age");
	String ageStr = property.getValue();	// If property != null
	int age = property.intValue();	        // If property != null
	String value = child.getValue("age");   // Access value directly

In addition, there is provision to iterate over all nodes in a memory-tree. The iteration may include the starting node, or just all of its descendants. For further access to different kinds of Content node, such as Text, Data and Comment nodes, you must use Filter and its subclasses to filter the contents of a given node. Please see the package org.jast.filter.
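
The difference between the two kinds of iteration can be sketched as a depth-first walk with a flag that decides whether the starting node is visited. The JAST iteration API is not shown in this overview, so the sketch below uses the JDK's built-in DOM as a stand-in; the class TreeWalk and its methods are hypothetical names:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical demonstration class built on the JDK DOM, not on JAST.
class TreeWalk {
    // Collects element names in document order (pre-order, depth-first).
    static List<String> elementNames(Node start, boolean includeStart) {
        List<String> names = new ArrayList<>();
        walk(start, includeStart, names);
        return names;
    }

    // Visits the node if requested, then always visits every descendant.
    private static void walk(Node node, boolean visit, List<String> names) {
        if (visit && node.getNodeType() == Node.ELEMENT_NODE) {
            names.add(node.getNodeName());
        }
        for (Node kid = node.getFirstChild(); kid != null; kid = kid.getNextSibling()) {
            walk(kid, true, names);
        }
    }

    // Parses the XML text and walks the tree from its root element.
    static List<String> namesIn(String xml, boolean includeRoot) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
            return elementNames(root, includeRoot);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Family><Person>John</Person><Person>Mary</Person></Family>";
        System.out.println(namesIn(xml, true));   // [Family, Person, Person]
        System.out.println(namesIn(xml, false));  // [Person, Person]
    }
}
```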

Creating the Contents of the XML Memory-Tree

The main classes of interest are Document, Element, Attribute and Text. All construction methods are designed to nest, so that the Java code looks somewhat like the structure of the XML file being created. Please see the full API descriptions for how to construct each of these node types.

The following is just an example of how the nodes of the XML memory-tree can be created within a Java program, using the return value of the previous setter as the target of the next setter (suitably nested):

	Document document = new Document();           // Default encoding.
	Element root = new Element("Family")
		.addContent(new Comment("The Smith family"))
		.addContent(new Element("Person")
			.setText("John Smith")        // Sets all text.
			.setValue("role", "father")   // Sets attribute.
			.setValue("age", "45"))       // End of add John
		.addContent(new Element("Person")
				// Another way to add text content.
			.addContent(new Text("Mary Smith"))
			.setValue("role", "mother")
			.setValue("age", "41"))       // End of add Mary
		.addContent(new Element("Person")
			.setText("Ben Smith")
				// Another way to set an attribute.
			.setAttribute(new Attribute("role", "son"))
			.setValue("age", "16"))       // End of add Ben
		.addContent(new Element("Person")
				// Another way to add text incrementally.
			.addContent(new Text("Alice"))
			.addContent(new Text(" Smith"))
			.setValue("role", "daughter")
			.setValue("age", "14")));     // End of Family
	document.setRootElement(root);

In addition, there is provision to remove specific nodes, or all nodes of a given type, or nodes at an index. All Content nodes may have at most one parent node. If you wish to manipulate parts of an XML memory-tree, you must remove nodes from the source tree before adding them to the destination. Alternatively, you may clone() the part of the source tree and add this subtree to the destination.
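
The single-parent rule, and the two ways of working with it, can be illustrated with a small self-contained model. The class TreeNode below is a hypothetical stand-in written for this example; it is not a JAST class, and the exception it throws is an assumption rather than the error that JAST reports:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a tree node; not a JAST class.
class TreeNode {
    final String name;
    private TreeNode parent;
    private final List<TreeNode> children = new ArrayList<>();

    TreeNode(String name) { this.name = name; }

    TreeNode getParent() { return parent; }
    int size() { return children.size(); }

    // Adds a child, refusing any node that still belongs to another tree.
    TreeNode add(TreeNode child) {
        if (child.parent != null) {
            throw new IllegalStateException(child.name + " already has a parent");
        }
        child.parent = this;
        children.add(child);
        return this;  // Returning this lets calls nest, as in the example above
    }

    // Detaches a child so that it may be added elsewhere.
    TreeNode remove(TreeNode child) {
        if (children.remove(child)) {
            child.parent = null;
        }
        return this;
    }

    // Deep copy of this subtree; the copy itself has no parent.
    TreeNode copy() {
        TreeNode twin = new TreeNode(name);
        for (TreeNode child : children) {
            twin.add(child.copy());
        }
        return twin;
    }

    public static void main(String[] args) {
        TreeNode source = new TreeNode("Family");
        TreeNode dest = new TreeNode("Archive");
        TreeNode person = new TreeNode("Person");
        source.add(person);

        dest.add(person.copy());    // Option 1: add a copy, source is unchanged
        source.remove(person);      // Option 2: detach first...
        dest.add(person);           // ...then add to the destination
        System.out.println(source.size() + " " + dest.size());  // 0 2
    }
}
```

Calling dest.add(person) while person is still attached to source would throw in this model, which mirrors the requirement to remove or clone a node before adding it to another tree.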

Notification of Exceptions

Both XMLReader and XMLWriter may raise various kinds of IOException, if a problem occurs with the underlying file system. Ill-formed XML syntax is reported through XMLError, whereas attempting to construct an illegal memory-tree is reported through ContentError. In general, faulty user code may raise the following:

The latter are styled as errors, rather than exceptions, since the W3C standard requires malformed XML to be rejected outright, and not handled by exception-tolerant software.