This package contains the JAST 1.1 Standard XML Toolkit, © Anthony J H Simons, 2010-2015. This software is currently on experimental alpha release, and is offered as-is, under a free experimental license (see full terms below).
org.jast.ast contains tools for mapping XML files to user-defined Java syntax trees, and vice-versa.
org.jast.xml contains tools for mapping XML files to JAST's standard XML memory model, and vice-versa.
org.jast.xpath contains an XPath search engine for use with the standard XML model.
org.jast.dtd contains a document validation engine for use with the standard XML model.
org.jast.filter contains filters for searching and validating the standard XML model.
If you are seeking to use any of the above software, please refer to the brief instructions immediately below and also the documentation on the JAST website for more details:
http://staffwww.dcs.shef.ac.uk/people/A.Simons/jast/.
This alpha-release software is free to use by academic and commercial users. The terms of the license are that you are free to use the software in any product (whether free or commercial), provided that any usage is acknowledged by citing "©Anthony J H Simons" as the copyright holder and referring to the JAST website "http://staffwww.dcs.shef.ac.uk/people/A.Simons/jast/" as the source. While this alpha license is perpetual and not subject to any restriction, we reserve the right to change the licensing terms of subsequent releases. The software is offered as-is, without any implied warranty for fitness of purpose. Please refer to the JAST website for further details:
http://staffwww.dcs.shef.ac.uk/people/A.Simons/jast/.
The following assumes that you, the developer, wish to build a third-party application, which incorporates the JAST 1.1 standard XML processing tools. The components for reading and writing XML files, and for manipulating in-memory XML trees using the standard memory model, are to be found in this package, org.jast.xml. The components for filtering XML memory-trees are to be found in another package, org.jast.filter. The components for conducting XPath searches in XML memory-trees are to be found in another package, org.jast.xpath.
The first thing you will need to do is decide what kind of data you wish to model. Having done this, you will develop an XML markup scheme, using a mixture of XML elements and attributes to describe and encode the data. For example, a document that stores information about people in a family might look like this:
    <?xml version="1.0" encoding="UTF-8"?>
    <Family>
      <!-- The Smith family -->
      <Person role="father" age="45"> John Smith </Person>
      <Person role="mother" age="41"> Mary Smith </Person>
      <Person role="son" age="16"> Ben Smith </Person>
      <Person role="daughter" age="14"> Alice Smith </Person>
    </Family>
So, the main XML element nodes used for markup are called Family and Person; there is an XML declaration; and there is a comment. The Person element also has attributes called role and age. We assume that information like this is stored in a text file.
The main API class to use is XMLReader. This can be used to read an XML File, using the default, or a chosen, character set and preserving or discarding extra formatting whitespace. The default reader:
    File xmlFile = new File("my/xml/input.xml");  // Or whatever file
    XMLReader reader = new XMLReader(xmlFile);
    Document document = reader.readDocument();
    reader.close();
reads the file using the UTF-8 character set and discards extra formatting whitespace surrounding nodes. An alternative character set may be specified:
    File xmlFile = new File("my/xml/input.xml");
    XMLReader reader = new XMLReader(xmlFile, "ISO-8859-1");  // Latin-1
    Document document = reader.readDocument();
    reader.close();
in which case the reader verifies that the XML file is also encoded in the named character set, before proceeding.
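One way to picture the verification step just described is to compare the requested character set with the encoding named in the XML declaration. The following is a minimal sketch using only the JDK, not JAST's own implementation. The class DeclaredEncoding and its methods are hypothetical names, and falling back to UTF-8 when no encoding is declared is an assumption of the sketch.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of JAST: compares a requested character
// set with the one named in the XML declaration at the head of a file.
final class DeclaredEncoding {

    // Matches the encoding pseudo-attribute of a leading XML declaration.
    private static final Pattern ENCODING = Pattern.compile(
            "^<\\?xml[^>]*\\bencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    // Returns the declared encoding name; assumes UTF-8 if none is declared.
    static String declaredEncoding(byte[] head) {
        // The declaration itself only uses ASCII characters.
        String text = new String(head, StandardCharsets.US_ASCII);
        Matcher matcher = ENCODING.matcher(text);
        return matcher.find() ? matcher.group(1) : "UTF-8";
    }

    // True if the requested name denotes the same charset as the declared one.
    static boolean matches(byte[] head, String requested) {
        Charset declared = Charset.forName(declaredEncoding(head));
        return declared.equals(Charset.forName(requested));
    }
}
```

Comparing Charset objects rather than raw strings means that differences in letter case between the two names do not cause a spurious mismatch.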
Further constructors of XMLReader support reading the XML file from an endpoint specified by a public URL, or reading data supplied via a basic InputStream. The following gives an example of the former:
    URL url = new URL("http://www.my.site/input.xml");  // Any URL
    XMLReader reader = new XMLReader(url, "UTF-8");
    Document document = reader.readDocument();
    reader.close();
This constructor always requires both the URL and the character encoding, since it is unsafe to assume UTF-8; the HTTP/1.1 standard specifies using ISO-8859-1 by default, if no character encoding is known. Similarly, the character set must be given when reading from an InputStream.
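To see why the character set cannot safely be guessed for a raw byte stream, it may help to decode the same bytes in two ways. The sketch below uses only the JDK; StreamDecoding and readAll are hypothetical names and are not part of JAST.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical helper, not part of JAST: decodes a byte stream using an
// explicitly named character set.
final class StreamDecoding {

    // Reads the whole stream and decodes it with the given charset.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buffer = new char[1024];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }
}
```

If the bytes of a UTF-8 document containing an accented letter are passed to readAll with ISO-8859-1, each two-byte sequence is decoded as two separate characters, so the text is silently corrupted rather than rejected.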
By default, a compact document tree is built in memory. Alternatively, all formatting whitespace in the original XML document may be preserved:
    File xmlFile = new File("my/xml/input.xml");
    XMLReader reader = new XMLReader(xmlFile);
    reader.setCompactFormat(false);  // Keep all layout text
    Document document = reader.readDocument();
    reader.close();
by invoking setCompactFormat(false). This preserves all formatting whitespace as extra Text nodes in the memory-tree. To undo this, invoke setCompactFormat(true) to discard all formatting whitespace and build a more compact memory-tree (this is the default setting). If compact format and pretty-printing (see below) are both disabled, the XML file may be re-written with exactly the same layout as it was read.
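The difference between a compact tree and one that keeps layout text can be illustrated with the JDK's own DOM parser, which keeps whitespace-only text nodes by default. This is only an analogy for the behaviour described above and does not use JAST. WhitespaceNodes and its methods are hypothetical names, and the Node type here is org.w3c.dom.Node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper using the JDK DOM (not JAST): shows how layout
// whitespace appears as extra text nodes between elements.
final class WhitespaceNodes {

    // Parses an XML string and returns its root element.
    static Node parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    // Counts direct children; in compact mode, whitespace-only text is skipped.
    static int countChildren(Node parent, boolean compact) {
        int count = 0;
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            boolean layoutOnly = child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty();
            if (!(compact && layoutOnly)) {
                count++;
            }
        }
        return count;
    }
}
```

For a Family element whose two Person children are each placed on their own indented line, the non-compact count includes the three runs of layout whitespace as well as the two elements.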
The main API class to use is XMLWriter. This can be used to write an XML File, using the default, or a chosen, character set and preserving the existing format, or pretty-printing the document. The default writer:
    File xmlFile = new File("my/xml/output.xml");  // Or whatever file
    XMLWriter writer = new XMLWriter(xmlFile);
    writer.writeDocument(document);  // Created previously
    writer.close();
writes the file using the UTF-8 character set and pretty-prints the XML file according to a standard layout. An alternative character set may be specified:
    File xmlFile = new File("my/xml/output.xml");
    XMLWriter writer = new XMLWriter(xmlFile, "ISO-8859-1");  // Latin-1
    writer.writeDocument(document);
    writer.close();
in which case the writer verifies that the XML document declares the same character set encoding, before proceeding.
Further constructors of XMLWriter support writing XML to a PrintWriter, or to a basic OutputStream. The former is useful when the XML DOM-tree is to be written by a Java Servlet, using a HttpServletResponse response object, as defined in the Java package javax.servlet.http, to provide the output stream and character encoding:
    response.setCharacterEncoding("UTF-8");  // UTF-8
    XMLWriter writer = new XMLWriter(response.getWriter(),
        response.getCharacterEncoding());
    writer.writeDocument(document);
    writer.close();
in which case you should always ensure that the XML declaration at the head of the document also uses the same character set. The response object's character encoding may be reset to any character set, but this must be done before accessing the response's writer stream. The character encoding must also be supplied when writing to a basic OutputStream.
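A simple way to keep the XML declaration and the byte encoding in agreement is to derive both from a single Charset value. The following JDK-only sketch shows the idea; DeclaredOutput and write are hypothetical names, and this is not how JAST's XMLWriter is implemented.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical helper, not part of JAST: the declared encoding and the
// actual byte encoding both come from the same Charset argument.
final class DeclaredOutput {

    // Encodes an XML declaration plus the given body using one charset.
    static byte[] write(String body, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, charset)) {
            out.write("<?xml version=\"1.0\" encoding=\""
                    + charset.name() + "\"?>\n");
            out.write(body);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Because the declaration is generated from charset.name(), it cannot drift out of step with the encoder that produces the bytes.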
By default, the XML file is pretty-printed using a standard layout with newlines and indentation. Alternatively, the original layout of the memory-tree can be preserved in the output:
    File xmlFile = new File("my/xml/output.xml");
    XMLWriter writer = new XMLWriter(xmlFile);
    writer.setPrettyFormat(false);  // Write native layout
    writer.writeDocument(document);
    writer.close();
by invoking setPrettyFormat(false) to disable pretty-printing. The XML file will be formatted using whatever Text whitespace layout text is present in the memory-tree. To undo this, invoke setPrettyFormat(true) to enable pretty-printing again (this is the default setting). If compact format (see above) and pretty-printing are both disabled, the XML file may be re-written with exactly the same layout as it was read.
The main classes of interest are Document, Element, Attribute and Text. There are other types of node in the default memory model. The JAST XML memory-tree nodes are designed according to the Composite Design Pattern, that is, everything in the memory-tree is some kind of Content and respects a common API. The more specific kinds of node extend this API in different ways. Please see the full API descriptions for each of these types.
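For readers unfamiliar with the Composite Design Pattern, the following is a generic, minimal sketch of the idea: every node shares one API, and container nodes delegate to their contents. The classes CompositeSketch, Node, TextNode and ElementNode are hypothetical and deliberately simplified (no attributes, no escaping of text); they are not the JAST classes.

```java
import java.util.ArrayList;
import java.util.List;

// Generic illustration of the Composite pattern; not the JAST classes.
final class CompositeSketch {

    // Common API shared by every kind of node.
    abstract static class Node {
        abstract String toXml();

        // A leaf counts only itself; containers override this.
        int countNodes() {
            return 1;
        }
    }

    // Leaf node holding character data.
    static final class TextNode extends Node {
        private final String text;

        TextNode(String text) {
            this.text = text;
        }

        @Override
        String toXml() {
            return text;
        }
    }

    // Container node: extends the common API with add().
    static final class ElementNode extends Node {
        private final String name;
        private final List<Node> contents = new ArrayList<>();

        ElementNode(String name) {
            this.name = name;
        }

        // Returns this node so that calls can be nested.
        ElementNode add(Node child) {
            contents.add(child);
            return this;
        }

        @Override
        String toXml() {
            StringBuilder out = new StringBuilder("<" + name + ">");
            for (Node child : contents) {
                out.append(child.toXml());
            }
            return out.append("</").append(name).append(">").toString();
        }

        @Override
        int countNodes() {
            int total = 1;
            for (Node child : contents) {
                total += child.countNodes();
            }
            return total;
        }
    }
}
```

Client code can call toXml() or countNodes() on any node without knowing whether it is a leaf or a container, which is the property the pattern is meant to provide.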
The following is just an example of how the nodes of the XML memory-tree can be accessed within a Java program:
    Directive header = document.getDeclaration();  // XML declaration.
    Doctype doctype = document.getDoctype();  // Optional doctype.
    Comment comment = document.getComment();  // Optional comment.
    Element root = document.getRootElement();  // Root element.
    List<Content> contents = document.getContents();  // All subnodes.
    Content node = document.getContent(2);  // Third sub-node.

    String name = root.getName();  // Element name.
    int contentType = root.getType();  // Bitmask type.
    List<Element> allChildren = root.getChildren();
    List<Element> someChildren = root.getChildren("Person");
    Element child = root.getChild("Person");  // First so-named.
    Element parent = child.getParent();  // Same as root.
    String text = child.getText();  // Textual content.
    List<Attribute> properties = child.getAttributes();
    Attribute property = child.getAttribute("age");
    String ageStr = property.getValue();  // If property != null
    int age = property.intValue();  // If property != null
    String value = child.getValue("age");  // Access value directly
In addition, there is provision to iterate over all nodes in a memory-tree. The iteration may include the starting node, or just all of its descendants. For further access to different kinds of Content node, such as Text, Data and Comment nodes, you must use Filter and its subclasses to filter the contents of a given node. Please see the package org.jast.filter.
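The two styles of iteration mentioned above (including the starting node, or visiting only its descendants) can be sketched as a depth-first walk with a flag. The example below uses the JDK DOM rather than JAST, so Node here is org.w3c.dom.Node. TreeWalk and its methods are hypothetical names, and visiting in document order is an assumption of the sketch.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper using the JDK DOM (not JAST): depth-first iteration
// that optionally includes the starting node.
final class TreeWalk {

    // Parses an XML string and returns its root element.
    static Node parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    // Collects element names in document order.
    static List<String> elementNames(Node start, boolean includeStart) {
        List<String> names = new ArrayList<>();
        walk(start, includeStart, names);
        return names;
    }

    // Visits the node itself only if asked to; descendants are always visited.
    private static void walk(Node node, boolean visit, List<String> names) {
        if (visit && node.getNodeType() == Node.ELEMENT_NODE) {
            names.add(node.getNodeName());
        }
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            walk(child, true, names);
        }
    }
}
```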
The main classes of interest are Document, Element, Attribute and Text. All construction methods are designed to nest, so that the Java code looks somewhat like the structure of the XML file being created. Please see the full API descriptions for how to construct each of these node types.
The following is just an example of how the nodes of the XML memory-tree can be created within a Java program, using the return value of the previous setter as the target of the next setter (suitably nested):
    Document document = new Document();  // Default encoding.
    Element root = new Element("Family")
        .addContent(new Comment("The Smith family"))
        .addContent(new Element("Person")
            .setText("John Smith")  // Sets all text.
            .setValue("role", "father")  // Sets attribute.
            .setValue("age", "45"))  // End of add John
        .addContent(new Element("Person")
            // Another way to add text content.
            .addContent(new Text("Mary Smith"))
            .setValue("role", "mother")
            .setValue("age", "41"))  // End of add Mary
        .addContent(new Element("Person")
            .setText("Ben Smith")
            // Another way to set an attribute.
            .setAttribute(new Attribute("role", "son"))
            .setValue("age", "16"))  // End of add Ben
        .addContent(new Element("Person")
            // Another way to add text incrementally.
            .addContent(new Text("Alice"))
            .addContent(new Text(" Smith"))
            .setValue("role", "daughter")
            .setValue("age", "14"));  // End of Family
    document.setRootElement(root);
In addition, there is provision to remove specific nodes, or all nodes of a given type, or nodes at an index. All Content nodes may have at most one parent node. If you wish to manipulate parts of an XML memory-tree, you must remove nodes from the source tree before adding them to the destination. Alternatively, you may clone() the part of the source tree and add this subtree to the destination.
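The single-parent rule and its two workarounds (remove then add, or copy then add) can be sketched with a small stand-alone tree class. SingleParent.Node is a hypothetical class, not a JAST type. JAST is described as reporting illegal tree construction through ContentError, whereas this sketch simply throws IllegalStateException as a stand-in.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone tree (not JAST): each node has at most one parent.
final class SingleParent {

    static final class Node {
        private final String name;
        private Node parent;
        private final List<Node> children = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }

        Node parent() {
            return parent;
        }

        int childCount() {
            return children.size();
        }

        // Refuses a child that still belongs to another parent.
        Node add(Node child) {
            if (child.parent != null) {
                throw new IllegalStateException(
                        child.name + " already has a parent");
            }
            child.parent = this;
            children.add(child);
            return this;
        }

        // Detaches the child so that it may be added elsewhere.
        Node remove(Node child) {
            if (children.remove(child)) {
                child.parent = null;
            }
            return this;
        }

        // Deep copy: the duplicate subtree has no parent yet.
        Node copy() {
            Node duplicate = new Node(name);
            for (Node child : children) {
                duplicate.add(child.copy());
            }
            return duplicate;
        }
    }
}
```

Adding a copy leaves the source tree untouched, while remove followed by add moves the original node.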
Both XMLReader and XMLWriter may raise kinds of IOException, if a problem occurs with the underlying file system. Ill-formed XML syntax is reported through XMLError, whereas attempting to construct an illegal memory-tree is reported through ContentError. In general, faulty user code may raise the following:
FileNotFoundException - raised if the specified file cannot be found (wrong pathname given)
UnsupportedEncodingException - raised if the character set encodings are inconsistent
IOException - raised if a fault in the filesystem occurs while reading an XML input file
XMLError - raised if a syntax error is detected while parsing an XML input file
ContentError - raised if any construction method violates XML memory-tree rules
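Since FileNotFoundException and UnsupportedEncodingException are both subclasses of IOException in the JDK, handlers for them must come before a general IOException handler. The sketch below shows that ordering using JDK classes only; FaultKinds and tryOpen are hypothetical names. XMLError and ContentError are left out, because their place in the exception hierarchy is not stated here.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UnsupportedEncodingException;

// Hypothetical helper, not part of JAST: orders catch clauses from the
// most specific I/O fault to the most general.
final class FaultKinds {

    // Opens a file with a named charset and reports what went wrong, if anything.
    static String tryOpen(String path, String charsetName) {
        try (InputStream in = new FileInputStream(path);
             Reader reader = new InputStreamReader(in, charsetName)) {
            reader.read();  // Touch the stream to show that it is usable.
            return "opened";
        } catch (FileNotFoundException e) {
            return "file not found";        // Wrong pathname given.
        } catch (UnsupportedEncodingException e) {
            return "unsupported encoding";  // Unknown charset name.
        } catch (IOException e) {
            return "other I/O fault";       // Any remaining filesystem fault.
        }
    }
}
```

Placing the IOException clause first would not compile, because the later, more specific clauses would then be unreachable.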