This package contains the JAST 1.1 Document Validation Engine, © Anthony J H Simons, 2010-2015. This software is currently in experimental alpha release, and is offered as-is, under a free experimental license (see full terms below).

Java Abstract Syntax Trees, v1.1

If you are seeking to use this software, please refer to the brief instructions below and to the documentation on the JAST website for more details.

Licensing Terms

This alpha-release software is free to use by academic and commercial users. The terms of the license are that you are free to use the software in any product (whether free or commercial), provided that any usage is acknowledged by citing "©Anthony J H Simons" as the copyright holder and referring to the JAST website "http://staffwww.dcs.shef.ac.uk/people/A.Simons/jast/" as the source. While this alpha license is perpetual and not subject to any restriction, we reserve the right to change the licensing terms of subsequent releases. The software is offered as-is, without any implied warranty for fitness of purpose. Please refer to the JAST website for further details:

The DTD and XSD Document Validation Engine

This package provides a common engine for performing XML document validation using either the older DTD specification (Document Type Definition) or the newer, but more complex, XSD specification (XML Schema Definition) syntax. The validation engine may be invoked after an XML document has been parsed and judged to be well-formed. The main class is XMLValidator, which may be supplied at construction with either a Doctype node, or the root schema Element of an XML schema document. The validator compiles a graph of grammar rules from the doctype or schema, and applies these rules to the XML document, to judge its validity, when one of its accept(Document) or validate(Document) methods is invoked.

In principle, a doctype will generate a single-rooted tree of grammar rules; whereas a schema may possibly generate a multiple-rooted graph, since there is no obligation to specify a single root element in an XML schema. The compiled grammar is constructed according to BNF productions of sequence, selection, iteration, unordered and other kinds of pattern. Specific sets of attributes may be allowed, or required in each element. Attribute values and textual element contents may be weakly or strongly typed, with specific constraints placed on the value or range of allowed values. So long as the XMLValidator exists, it may be used to validate multiple documents of the same type.
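
To make the idea of a compiled grammar more concrete, the following is a minimal sketch of how sequence, selection and iteration productions could be composed and matched against the names of an element's children. The types Rule, Name, Seq, Choice and Star are hypothetical stand-ins written for this illustration only; they are not JAST's GrammarRule classes, and JAST's matching strategy may well differ.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical rule type: not part of JAST.
interface Rule {
    // Returns every position at which this rule can stop, having started at 'pos'.
    Set<Integer> match(List<String> names, int pos);

    // A rule accepts the child names if it can consume all of them.
    default boolean accepts(List<String> names) {
        return match(names, 0).contains(names.size());
    }
}

// Matches exactly one child element with the given name.
final class Name implements Rule {
    private final String name;
    Name(String name) { this.name = name; }
    public Set<Integer> match(List<String> names, int pos) {
        Set<Integer> ends = new TreeSet<>();
        if (pos < names.size() && names.get(pos).equals(name)) ends.add(pos + 1);
        return ends;
    }
}

// Sequence: every part must match, one after the other.
final class Seq implements Rule {
    private final Rule[] parts;
    Seq(Rule... parts) { this.parts = parts; }
    public Set<Integer> match(List<String> names, int pos) {
        Set<Integer> current = new TreeSet<>(Collections.singleton(pos));
        for (Rule part : parts) {
            Set<Integer> next = new TreeSet<>();
            for (int p : current) next.addAll(part.match(names, p));
            current = next;
        }
        return current;
    }
}

// Selection: any one of the options may match.
final class Choice implements Rule {
    private final Rule[] options;
    Choice(Rule... options) { this.options = options; }
    public Set<Integer> match(List<String> names, int pos) {
        Set<Integer> ends = new TreeSet<>();
        for (Rule option : options) ends.addAll(option.match(names, pos));
        return ends;
    }
}

// Iteration: zero or more repetitions of the body.
final class Star implements Rule {
    private final Rule body;
    Star(Rule body) { this.body = body; }
    public Set<Integer> match(List<String> names, int pos) {
        Set<Integer> ends = new TreeSet<>(Collections.singleton(pos)); // zero repetitions
        Deque<Integer> work = new ArrayDeque<>(Collections.singleton(pos));
        while (!work.isEmpty()) {
            int p = work.pop();
            for (int q : body.match(names, p)) {
                if (ends.add(q)) work.push(q); // explore only new positions
            }
        }
        return ends;
    }
}

final class GrammarSketch {
    // book ::= title, author+, (chapter | appendix)*
    static Rule bookRule() {
        Rule author = new Name("author");
        return new Seq(new Name("title"), author, new Star(author),
                new Star(new Choice(new Name("chapter"), new Name("appendix"))));
    }
}
```

In this sketch, one-to-many iteration is expressed as one mandatory occurrence followed by a zero-to-many iteration, so no separate rule type is needed for it.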

Validating an XML document from a DTD

To validate an XML document from a DTD specification, you use an instance of XMLValidator. At construction, you specify the Doctype that you want it to use. Thereafter, the XMLValidator may validate multiple Document objects, to see if they conform to the specified doctype. The client code for validating a document is simply:

	Document document = ...  // obtained somehow
	XMLValidator validator = new XMLValidator(document.getDoctype());
	validator.validate(document);

XMLValidator offers two validation methods. The stricter validate(Document) succeeds silently when the document is valid and raises a ContentError when it is invalid, whereas the weaker accept(Document) merely reports whether the document is valid, without raising exceptions:

	public void validate(Document document) throws ContentError;
	public boolean accept(Document document);
	ContentError getError();

If using the weaker validation approach, it is still possible to access the last mismatch error using the getError() method.
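
The relationship between the strict and the weak calling styles can be sketched as follows. This is an illustration under stated assumptions, not JAST code: ContentError is redefined here as a simple checked exception, SketchValidator is a hypothetical class that validates a plain root-element name rather than a Document, and the decision to clear the stored error after a successful check is a choice made for this sketch.

```java
// Stand-in for illustration only; JAST's real ContentError may differ.
class ContentError extends Exception {
    ContentError(String message) { super(message); }
}

// Hypothetical validator that mirrors the three method shapes listed above.
class SketchValidator {
    private ContentError lastError;

    // Strict style: silent on success, throws on the first mismatch.
    public void validate(String rootName) throws ContentError {
        if (!"book".equals(rootName)) {
            throw new ContentError("expected <book> but found <" + rootName + ">");
        }
    }

    // Weak style: never throws, but remembers the mismatch for later inspection.
    public boolean accept(String rootName) {
        try {
            validate(rootName);
            lastError = null;   // assumption of this sketch: success clears the error
            return true;
        } catch (ContentError error) {
            lastError = error;
            return false;
        }
    }

    // Returns the mismatch recorded by the most recent failed accept(), if any.
    public ContentError getError() {
        return lastError;
    }
}
```

A caller using the weak style would test the boolean result first, and consult getError() only when that result is false.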

Precedence of Internal over External DTDs

It is common for a doctype declaration to refer to an external file (with the suffix .dtd), which stores the DTD grammar, also known as the external subset. It is also possible for a doctype to declare an internal grammar, also known as the internal subset. If the doctype declares both an internal and external subset, then rules in the internal subset take precedence over rules in the external subset, if they define a common element or attribute. In this way, it is possible to specialise the standard DTD by overriding definitions for certain elements or attributes in the internal subset.
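
The override behaviour described above amounts to a keyed merge in which internal declarations win. A minimal sketch, assuming declarations are held as strings keyed by element name (the class SubsetMerge and this representation are hypothetical, and are not how JAST stores its rules):

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SubsetMerge {

    // Merges declarations keyed by element name.
    // Entries from the internal subset replace entries from the external subset.
    static Map<String, String> merge(Map<String, String> external,
                                     Map<String, String> internal) {
        Map<String, String> merged = new LinkedHashMap<>(external);
        merged.putAll(internal);
        return merged;
    }

    // Example external subset, as it might appear in a .dtd file.
    static Map<String, String> demoExternal() {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("note", "(to, from, body)");
        rules.put("to", "(#PCDATA)");
        return rules;
    }

    // Example internal subset that specialises the 'note' element.
    static Map<String, String> demoInternal() {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("note", "(to, body)");
        return rules;
    }
}
```

Here the merged grammar keeps the external definition of "to", but uses the internal definition of "note".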

Level of Compliance to W3C Standards for DTD

Currently, JAST supports the creation of DOCTYPE nodes that contain a sequence of ELEMENT and ATTLIST definitions. JAST does not yet support the definition of arbitrary ENTITY nodes. Instead, a standard set of entities is pre-defined; these are parsed by XMLReader and are output by XMLWriter when escaping is needed.

The full BNF syntax for ELEMENT definitions is supported, including sequence, selection and iteration of single items or bracketed structures. Multiplicity markers may specify optional, zero-to-many or one-to-many occurrences. Definitions may contain the EMPTY or ANY category markers. See the W3C specification for further details.
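
Since DTD content models are regular expressions over element names, the effect of the multiplicity markers can be illustrated by translating a content model into a java.util.regex pattern. This is a sketch of a different technique from the grammar rules that JAST compiles, shown only to clarify the notation. The class ContentModelSketch is hypothetical, and it assumes simple names (letters, digits, underscore) and pure element content, with no #PCDATA, EMPTY or ANY.

```java
import java.util.List;
import java.util.regex.Pattern;

final class ContentModelSketch {

    // Translates a DTD element-content model into a regular expression.
    // The markers ?, * and +, the selection bar | and the brackets carry over unchanged.
    static Pattern compile(String model) {
        String regex = model
                .replaceAll("\\s+", "")                            // ignore layout
                .replaceAll("[A-Za-z_][A-Za-z0-9_]*", "(?:$0 )")   // a name matches "name "
                .replace(",", "");                                 // sequence is concatenation
        return Pattern.compile(regex);
    }

    // Checks a list of child element names against the content model.
    static boolean accepts(String model, List<String> children) {
        StringBuilder joined = new StringBuilder();
        for (String child : children) {
            joined.append(child).append(' ');   // same "name " encoding as above
        }
        return compile(model).matcher(joined).matches();
    }
}
```

For example, the model (title, author+, (chapter | appendix)*) accepts the children title, author, author, chapter, but rejects title, chapter because at least one author is required.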

ATTLIST definitions may declare single, or multiple attributes for each element within the same declaration. Each defined attribute has a name, an attribute type, and either an occurrence specifier or a default value. Attribute types may be symbolic types such as ID, NMTOKEN or CDATA; or they may be enumerated selections. An occurrence specifier is either #REQUIRED (compulsory) or #IMPLIED (optional). The specifier #FIXED must be followed by a fixed value. Any other value is interpreted as a default value.
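
The four possibilities for the final part of an attribute definition can be sketched as a small classifier. The class AttDefault and its Kind enumeration are hypothetical names introduced for this illustration; they do not describe JAST's internal representation.

```java
final class AttDefault {

    enum Kind { REQUIRED, IMPLIED, FIXED, DEFAULT }

    final Kind kind;
    final String value;   // null for REQUIRED and IMPLIED

    private AttDefault(Kind kind, String value) {
        this.kind = kind;
        this.value = value;
    }

    // Classifies the text that follows the attribute type in an ATTLIST definition.
    static AttDefault parse(String spec) {
        String s = spec.trim();
        if (s.equals("#REQUIRED")) return new AttDefault(Kind.REQUIRED, null);
        if (s.equals("#IMPLIED")) return new AttDefault(Kind.IMPLIED, null);
        if (s.startsWith("#FIXED")) {
            // #FIXED must be followed by a quoted fixed value
            return new AttDefault(Kind.FIXED, unquote(s.substring("#FIXED".length())));
        }
        // anything else is treated as a quoted default value
        return new AttDefault(Kind.DEFAULT, unquote(s));
    }

    // Strips one pair of matching single or double quotes.
    private static String unquote(String text) {
        String t = text.trim();
        if (t.length() >= 2) {
            char first = t.charAt(0);
            char last = t.charAt(t.length() - 1);
            if ((first == '"' || first == '\'') && first == last) {
                return t.substring(1, t.length() - 1);
            }
        }
        throw new IllegalArgumentException("expected a quoted value: " + text);
    }
}
```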

Validating an XML document from an XSD

To validate an XML document from an XSD specification, you use an instance of XMLValidator. At construction, you specify the root schema Element that you want it to use. Thereafter, the XMLValidator may validate multiple Document objects, to see if they conform to any of the grammars defined in the schema. The client code for validating a document is simply:

	Document document = ...  // obtained somehow
	Document schema = ...  // obtained somehow
	XMLValidator validator = new XMLValidator(schema.getRootElement());
	validator.validate(document);

XMLValidator offers two validation methods. The stricter validate(Document) succeeds silently when the document is valid and raises a ContentError when it is invalid, whereas the weaker accept(Document) merely reports whether the document is valid, without raising exceptions:

	public void validate(Document document) throws ContentError;
	public boolean accept(Document document);
	ContentError getError();

If using the weaker validation approach, it is still possible to access the last mismatch error using the getError() method.

Level of Compliance to W3C Standards for XSD

Currently, JAST supports the analysis of XML schemas written in a variety of styles, and containing a wide range of W3C XSD constructions. In terms of style, it accepts any of the Russian Doll, Salami Slice or Venetian Blind conventions for presenting a schema and also handles mixtures of these styles. It supports simple types, complex types, groups, attributes and attribute groups. It supports sequence, choice, all and any (element) specifiers. It supports the iterative constructs minOccurs and maxOccurs. It supports extension and restriction of simple content and complex content. Currently, the anyAttribute construction is not supported.

The majority of the W3C XSD type system is implemented in terms of filters that can be used to constrain the values of attributes or elements. All IEEE numerical types are supported. Most XSD basic types are supported, except for NOTATION and base64Binary, which were not considered worth the extra effort. XSD simple types are typically one of these types, or a restriction on one of these types, expressed either as an enumeration, a regular expression, or a numerical subrange. Both subranges and field-widths may be specified. See the W3C specification for further details.
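
The notion of a filter that constrains a value can be sketched with standard Java predicates. The class ValueFilters below is hypothetical and does not reproduce JAST's filter classes; it only illustrates the three kinds of restriction mentioned above. Note also that XSD pattern syntax differs in places from java.util.regex syntax, so the pattern filter is an approximation.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;
import java.util.regex.Pattern;

final class ValueFilters {

    // Enumeration: the value must be one of a fixed set of strings.
    static Predicate<String> enumeration(String... allowed) {
        Set<String> values = new HashSet<>(Arrays.asList(allowed));
        return values::contains;
    }

    // Regular expression: the whole value must match, as XSD patterns are anchored.
    static Predicate<String> pattern(String regex) {
        Pattern compiled = Pattern.compile(regex);
        return text -> compiled.matcher(text).matches();
    }

    // Numerical subrange with inclusive bounds; non-numeric text is rejected.
    static Predicate<String> range(String minInclusive, String maxInclusive) {
        BigDecimal min = new BigDecimal(minInclusive);
        BigDecimal max = new BigDecimal(maxInclusive);
        return text -> {
            try {
                BigDecimal value = new BigDecimal(text.trim());
                return value.compareTo(min) >= 0 && value.compareTo(max) <= 0;
            } catch (NumberFormatException notANumber) {
                return false;
            }
        };
    }
}
```

Filters of this shape can be combined with Predicate.and(), which corresponds loosely to applying several restrictions to the same base type.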

JAST supports validation against a single XML Schema, although the schema document may contain multiple, overlapping grammars. The validation process requires a single root GrammarRule, so, if the schema happens implicitly to contain multiple root GrammarRules, these are determined by analysis and each GrammarRule tree is attempted in turn. JAST does not currently support validation against multiple XML Schemas, which would require a further mechanism for automatic schema inclusion and resolution.
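
This document does not describe how JAST determines the root rules. One plausible analysis, sketched here purely as an assumption, treats as a candidate root every declared element that is never referenced as the child of another element. The class RootFinder and the map-based schema representation are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class RootFinder {

    // Candidate roots are the declared elements that no other element refers to.
    // Limitation of this sketch: an element that only refers to itself is not reported.
    static Set<String> findRoots(Map<String, List<String>> childrenOf) {
        Set<String> roots = new TreeSet<>(childrenOf.keySet());
        for (List<String> children : childrenOf.values()) {
            roots.removeAll(children);
        }
        return roots;
    }

    // Example schema outline with two overlapping grammars that share 'book'.
    static Map<String, List<String>> demoSchema() {
        Map<String, List<String>> schema = new LinkedHashMap<>();
        schema.put("library", Arrays.asList("book"));
        schema.put("catalogue", Arrays.asList("book"));
        schema.put("book", Arrays.asList("title", "author"));
        schema.put("title", new ArrayList<>());
        schema.put("author", new ArrayList<>());
        return schema;
    }
}
```

With the example outline, the candidate roots are catalogue and library; a validator following the approach described above would then try the grammar tree under each candidate in turn.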

Visualisation of Compiled Grammars

Behind the scenes, XMLValidator delegates to other components to analyse the grammar implicit in a DTD or XSD specification. The class DTDReader converts a doctype definition into a grammar rule tree. The class XSDReader converts an XML schema into a grammar rule graph. XMLValidator stores the compiled grammar, or grammars, for later application.

For convenience, a GrammarViewer class is provided, to allow end-users to visualise the grammars compiled from doctype or schema nodes. GrammarViewer may be supplied with the single root of the grammar rule tree:

	GrammarRule grammar = validator.getGrammar(document);
	GrammarViewer viewer = new GrammarViewer(grammar);
	viewer.display();

In this case, the validator returns the single ElementRule whose name matches the name of the root element in the document. Alternatively, multiple overlapping grammars may be displayed:

	GrammarViewer viewer = new GrammarViewer(validator.getGrammars());
	viewer.display();

In this case, the grammar rule graph is resolved into a set of trees, which are each displayed in turn, starting with one ElementRule root.