This package contains the JAST 1.1 Document Validation Engine, © Anthony J H Simons, 2010-2015. This software is currently on experimental alpha release, and is offered as-is, under a free experimental license (see full terms below).
org.jast.ast contains tools for mapping XML files to user-defined Java syntax trees, and vice-versa.
org.jast.xml contains tools for mapping XML files to JAST's standard XML memory model, and vice-versa.
org.jast.xpath contains an XPath search engine for use with the standard XML model.
org.jast.dtd contains a document validation engine for use with the standard XML model.
org.jast.filter contains filters for searching and validating the standard XML model.
This alpha-release software is free to use by academic and commercial users. The terms of the license are that you are free to use the software in any product (whether free or commercial), provided that any usage is acknowledged by citing "©Anthony J H Simons" as the copyright holder and referring to the JAST website "http://staffwww.dcs.shef.ac.uk/people/A.Simons/jast/" as the source. While this alpha license is perpetual and not subject to any restriction, we reserve the right to change the licensing terms of subsequent releases. The software is offered as-is, without any implied warranty for fitness of purpose. Please refer to the JAST website for further details.
This package provides a common engine for performing XML document
validation using either the older DTD specification (Document Type
Definition) or the newer, but more complex XSD specification (XML Schema
Definition) syntax. The validation engine may be invoked after an XML
document has been parsed and judged to be well-formed. The main class is
XMLValidator, which may be supplied at construction with either a Doctype
node, or the root schema Element of an XML schema document. The validator
compiles a graph of grammar rules from the doctype or schema, and applies
these rules to the XML document, to judge its validity, when one of its
accept(Document) or validate(Document) methods is invoked.
In principle, a doctype will generate a single-rooted tree of grammar
rules; whereas a schema may possibly generate a multiple-rooted graph, since
there is no obligation to specify a single root element in an XML schema.
The compiled grammar is constructed according to BNF productions of sequence,
selection, iteration, unordered and other kinds of pattern. Specific sets of
attributes may be allowed, or required in each element. Attribute values and
textual element contents may be weakly or strongly typed, with specific
constraints placed on the value or range of allowed values. So long as the
XMLValidator exists, it may be used to validate multiple documents of the
same type.
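The way such productions combine can be illustrated with a minimal sketch. It does not reproduce JAST's GrammarRule classes; the names Rule, Rules, element, sequence, choice, repeat and accepts are all hypothetical, and the sketch only checks the order of child element names, ignoring attributes and text.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical rule type: returns every position at which a match starting at 'from' may end. */
interface Rule {
    Set<Integer> match(List<String> names, int from);
}

final class Rules {
    private Rules() {}

    /** Matches exactly one child element with the given name. */
    static Rule element(String name) {
        return (names, from) -> {
            Set<Integer> ends = new TreeSet<>();
            if (from < names.size() && names.get(from).equals(name)) ends.add(from + 1);
            return ends;
        };
    }

    /** Sequence: every part must match, one after another. */
    static Rule sequence(Rule... parts) {
        return (names, from) -> {
            Set<Integer> current = new TreeSet<>();
            current.add(from);
            for (Rule part : parts) {
                Set<Integer> next = new TreeSet<>();
                for (int pos : current) next.addAll(part.match(names, pos));
                current = next;
            }
            return current;
        };
    }

    /** Selection: any one of the alternatives may match. */
    static Rule choice(Rule... options) {
        return (names, from) -> {
            Set<Integer> ends = new TreeSet<>();
            for (Rule option : options) ends.addAll(option.match(names, from));
            return ends;
        };
    }

    /** Iteration: the body repeats between min and max times (a negative max means unbounded). */
    static Rule repeat(Rule body, int min, int max) {
        return (names, from) -> {
            // More than min + remaining repetitions can never reach a new end position.
            int remaining = names.size() - from;
            int limit = (max < 0) ? min + remaining : Math.min(max, min + remaining);
            Set<Integer> ends = new TreeSet<>();
            if (min == 0) ends.add(from);
            Set<Integer> current = new TreeSet<>();
            current.add(from);
            for (int count = 1; count <= limit && !current.isEmpty(); count++) {
                Set<Integer> next = new TreeSet<>();
                for (int pos : current) next.addAll(body.match(names, pos));
                if (count >= min) ends.addAll(next);
                current = next;
            }
            return ends;
        };
    }

    /** A content model accepts the children only if some match consumes all of them. */
    static boolean accepts(Rule rule, List<String> names) {
        return rule.match(names, 0).contains(names.size());
    }

    public static void main(String[] args) {
        // Corresponds to the DTD content model (title, (para | list)*, footer?)
        Rule model = sequence(element("title"),
                repeat(choice(element("para"), element("list")), 0, -1),
                repeat(element("footer"), 0, 1));
        System.out.println(accepts(model, Arrays.asList("title", "para", "list")));
        System.out.println(accepts(model, Arrays.asList("para", "title")));
    }
}
```

Each rule returns a set of end positions rather than a single one, so that a selection or iteration which could stop in several places does not commit too early. This is one simple design among several; whether JAST resolves alternatives in the same way is not stated here.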
To validate an XML document from a DTD specification, you use an
instance of XMLValidator. At construction, you specify the Doctype
that you want it to use. Thereafter, the XMLValidator may validate
multiple Document
objects, to see if they conform to the specified doctype. The client
code for validating a document is simply:
    Document document = ... // obtained somehow
    XMLValidator validator = new XMLValidator(document.getDoctype());
    validator.validate(document);
Altogether, XMLValidator offers two validation methods. The stricter
validate(Document) succeeds silently when the document is valid, but raises
a ContentError if the document is invalid, whereas the weaker
accept(Document) merely reports whether the document is valid or not,
without raising exceptions:
    public void validate(Document document) throws ContentError;
    public boolean accept(Document document);
    ContentError getError();
If using the weaker validation approach, it is still possible to access the
last mismatch error using the getError() method.
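The relationship between the strict and weak methods can be sketched as follows. This is a toy model, not JAST's implementation: SketchValidator and MismatchError are hypothetical names, the "document" is reduced to a root element name, and clearing the stored error on each new check is an assumption.

```java
/** Hypothetical stand-in for a validation failure; JAST's own ContentError is not reproduced. */
final class MismatchError extends RuntimeException {
    MismatchError(String message) {
        super(message);
    }
}

/** Toy validator over root element names, used only to show the strict and weak pairing. */
final class SketchValidator {
    private final String expectedRoot;
    private MismatchError lastError;   // remembered so the weak form can report it later

    SketchValidator(String expectedRoot) {
        this.expectedRoot = expectedRoot;
    }

    /** Strict form: returns silently on success, throws on a mismatch. */
    void validate(String rootName) {
        lastError = null;   // assumption: each check starts with a clean slate
        if (!expectedRoot.equals(rootName)) {
            lastError = new MismatchError(
                    "expected <" + expectedRoot + "> but found <" + rootName + ">");
            throw lastError;
        }
    }

    /** Weak form: converts the exception into a boolean result. */
    boolean accept(String rootName) {
        try {
            validate(rootName);
            return true;
        } catch (MismatchError error) {
            return false;
        }
    }

    /** The most recent mismatch, or null if the last check succeeded. */
    MismatchError getError() {
        return lastError;
    }

    public static void main(String[] args) {
        SketchValidator validator = new SketchValidator("book");
        if (!validator.accept("article")) {
            System.out.println(validator.getError().getMessage());
        }
    }
}
```

Writing the weak form in terms of the strict one keeps a single place where mismatches are detected, so the two methods cannot disagree.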
It is common for a doctype declaration to refer to an external file
(with the suffix .dtd), which stores the DTD grammar, also
known as the external subset. It is also possible for a
doctype to declare an internal grammar, also known as the internal
subset. If the doctype declares both an internal and external
subset, then rules in the internal subset take precedence over rules in
the external subset, if they define a common element or attribute. In
this way, it is possible to specialise the standard DTD by overriding
definitions for certain elements or attributes in the internal subset.
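The precedence rule can be pictured as a merge of two tables of declarations. The sketch below is illustrative only: SubsetMerge and merge are hypothetical names, and each declaration is reduced to a content model string keyed by element name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SubsetMerge {
    private SubsetMerge() {}

    /** Combines declarations keyed by element name; internal entries replace external ones. */
    static Map<String, String> merge(Map<String, String> external, Map<String, String> internal) {
        Map<String, String> merged = new LinkedHashMap<>(external);
        merged.putAll(internal);   // a shared key keeps the internal definition
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> external = new LinkedHashMap<>();
        external.put("note", "(to, from, body)");
        external.put("body", "(#PCDATA)");

        Map<String, String> internal = new LinkedHashMap<>();
        internal.put("note", "(to, from, heading, body)");   // specialises the standard rule

        System.out.println(merge(external, internal));
    }
}
```

Here the internal definition of note replaces the external one, while body is inherited unchanged from the external subset.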
Currently, JAST supports the creation of DOCTYPE nodes that contain
a sequence of ELEMENT and ATTLIST definitions. JAST does not yet
support the definition of arbitrary ENTITY nodes. Instead, a standard
set of entities is pre-defined; these are parsed by XMLReader and are
output by XMLWriter when escaping is needed.
The full BNF syntax for ELEMENT definitions is supported, including sequence, selection and iteration of single items or bracketed structures. Multiplicity markers may specify optional, zero-to-many or one-to-many occurrences. Definitions may contain the EMPTY or ANY category markers. See the W3C specification for further details.
ATTLIST definitions may declare single, or multiple attributes for each element within the same declaration. Each defined attribute has a name, an attribute type, and either an occurrence specifier or a default value. Attribute types may be symbolic types such as ID, NMTOKEN or CDATA; or they may be enumerated selections. An occurrence specifier is either #REQUIRED (compulsory) or #IMPLIED (optional). The specifier #FIXED must be followed by a fixed value. Any other value is interpreted as a default value.
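The effect of the occurrence specifiers and default values can be sketched as a check over an element's attributes. The names AttributeCheck, Occurrence, Decl and apply are hypothetical, and the sketch makes two assumptions that are not stated above: a #FIXED value is supplied when the attribute is absent, and attribute types and undeclared attributes are not checked.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AttributeCheck {
    private AttributeCheck() {}

    /** The four ways in which a declaration may qualify an attribute. */
    enum Occurrence { REQUIRED, IMPLIED, FIXED, DEFAULT }

    /** One declared attribute; 'value' is only meaningful for FIXED and DEFAULT. */
    static final class Decl {
        final String name;
        final Occurrence occurrence;
        final String value;

        Decl(String name, Occurrence occurrence, String value) {
            this.name = name;
            this.occurrence = occurrence;
            this.value = value;
        }
    }

    /** Returns the effective attributes of an element, or throws if a declaration is violated. */
    static Map<String, String> apply(List<Decl> decls, Map<String, String> actual) {
        Map<String, String> effective = new LinkedHashMap<>(actual);
        for (Decl decl : decls) {
            String given = actual.get(decl.name);
            switch (decl.occurrence) {
                case REQUIRED:
                    if (given == null) {
                        throw new IllegalArgumentException("missing required attribute: " + decl.name);
                    }
                    break;
                case IMPLIED:
                    break;   // optional, and no value is supplied when absent
                case FIXED:
                    if (given != null && !given.equals(decl.value)) {
                        throw new IllegalArgumentException("fixed attribute changed: " + decl.name);
                    }
                    effective.put(decl.name, decl.value);
                    break;
                case DEFAULT:
                    if (given == null) {
                        effective.put(decl.name, decl.value);
                    }
                    break;
            }
        }
        return effective;
    }

    public static void main(String[] args) {
        List<Decl> decls = Arrays.asList(
                new Decl("id", Occurrence.REQUIRED, null),
                new Decl("lang", Occurrence.DEFAULT, "en"),
                new Decl("version", Occurrence.FIXED, "1.0"));
        Map<String, String> actual = new LinkedHashMap<>();
        actual.put("id", "n1");
        System.out.println(apply(decls, actual));
    }
}
```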
To validate an XML document from an XSD specification, you use an instance
of XMLValidator. At construction, you specify the root schema Element that
you want it to use. Thereafter, the XMLValidator may validate multiple
Document objects, to see if they conform to any of the grammars defined in
the schema.
The client code for validating a document is simply:
    Document document = ... // obtained somehow
    Document schema = ... // obtained somehow
    XMLValidator validator = new XMLValidator(schema.getRootElement());
    validator.validate(document);
Altogether, XMLValidator offers two validation methods. The stricter
validate(Document) succeeds silently when the document is valid, but raises
a ContentError if the document is invalid, whereas the weaker
accept(Document) merely reports whether the document is valid or not,
without raising exceptions:
    public void validate(Document document) throws ContentError;
    public boolean accept(Document document);
    ContentError getError();
If using the weaker validation approach, it is still possible to access the
last mismatch error using the getError() method.
Currently, JAST supports the analysis of XML schemas written in a variety
of styles, and containing a wide range of W3C XSD constructions. In terms
of style, it accepts any of the Russian Doll, Salami Slice or
Venetian Blind conventions for presenting a schema and also handles
mixtures of these styles. It supports simple types, complex types, groups,
attributes and attribute groups.
It supports sequence, choice, all and any (element) specifiers. It supports
the iterative constructs minOccurs and maxOccurs. It supports extension and
restriction of simple content and complex content. Currently, the
anyAttribute construction is not supported.
The majority of the W3C XSD type system is implemented in terms of filters
that can be used to constrain the values of attributes or elements. All IEEE
numerical types are supported. Most XSD basic types are supported, except for
NOTATION and base64Binary, which were not considered worth the extra
effort. XSD simple types are typically one of these types, or
a restriction on one of these types, expressed either as an enumeration, a
regular expression, or a numerical subrange. Both subranges and field-widths
may be specified. See the W3C specification for further details.
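The three kinds of restriction can be sketched as predicates over the text of an attribute or element. This does not reproduce the classes in org.jast.filter; ValueFilters, enumeration, pattern and decimalRange are hypothetical names. The sketch also assumes that java.util.regex and BigDecimal are close enough to the XSD pattern and decimal syntaxes for illustration, which is not exactly true (BigDecimal, for instance, accepts exponent notation).

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;
import java.util.regex.Pattern;

final class ValueFilters {
    private ValueFilters() {}

    /** Enumeration: the value must be one of a fixed set of strings. */
    static Predicate<String> enumeration(String... allowed) {
        Set<String> values = new HashSet<>(Arrays.asList(allowed));
        return values::contains;
    }

    /** Regular expression: the whole value must match the pattern. */
    static Predicate<String> pattern(String regex) {
        Pattern compiled = Pattern.compile(regex);
        return text -> compiled.matcher(text).matches();
    }

    /** Numerical subrange: the value must parse as a decimal within inclusive bounds. */
    static Predicate<String> decimalRange(String minInclusive, String maxInclusive) {
        BigDecimal min = new BigDecimal(minInclusive);
        BigDecimal max = new BigDecimal(maxInclusive);
        return text -> {
            try {
                BigDecimal value = new BigDecimal(text.trim());
                return value.compareTo(min) >= 0 && value.compareTo(max) <= 0;
            } catch (NumberFormatException notNumeric) {
                return false;   // text that is not a number can never be in range
            }
        };
    }

    public static void main(String[] args) {
        Predicate<String> colour = enumeration("red", "green", "blue");
        Predicate<String> code = pattern("[A-Z]{2}\\d{3}");
        Predicate<String> percent = decimalRange("0", "100");
        System.out.println(colour.test("green"));
        System.out.println(code.test("AB123"));
        System.out.println(percent.test("100.01"));
    }
}
```

Because each filter is an ordinary Predicate, several restrictions on one type can be combined with the standard and(...) method.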
JAST supports validation against a single XML Schema, although the schema document may contain multiple, overlapping grammars. The validation process requires a single root GrammarRule, so, if the schema happens implicitly to contain multiple root GrammarRules, these are determined by analysis and each GrammarRule tree is attempted in turn. JAST does not currently support validation against multiple XML Schemas, which would require a further mechanism for automatic schema inclusion and resolution.
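Attempting each root grammar in turn can be sketched as a simple search. RootSelection and firstAccepting are hypothetical names, and each root rule is reduced to a predicate over the document; how JAST orders the roots that it discovers is not stated here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class RootSelection {
    private RootSelection() {}

    /** Returns the index of the first root rule that accepts the document, or -1 if none does. */
    static <D> int firstAccepting(List<Predicate<D>> roots, D document) {
        for (int i = 0; i < roots.size(); i++) {
            if (roots.get(i).test(document)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Two stand-in root rules that only inspect the name of the root element.
        Predicate<String> bookRule = "book"::equals;
        Predicate<String> articleRule = "article"::equals;
        List<Predicate<String>> roots = Arrays.asList(bookRule, articleRule);
        System.out.println(firstAccepting(roots, "article"));
    }
}
```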
Behind the scenes, XMLValidator delegates to other components to analyse
the grammar implicit in a DTD or XSD specification. The class DTDReader
converts a doctype definition into a grammar rule tree. The class XSDReader
converts an XML schema into a grammar rule graph. XMLValidator stores the
compiled grammar, or grammars, for later application.
For convenience, a GrammarViewer class is provided, to allow end-users to
visualise the grammars compiled from doctype or schema nodes. GrammarViewer
may be supplied with the single root of the grammar rule tree:
    GrammarRule grammar = validator.getGrammar(document);
    GrammarViewer viewer = new GrammarViewer(grammar);
    viewer.display();
In this case, the validator returns the single ElementRule whose name
matches the name of the root element in the document. Alternatively,
multiple overlapping grammars may be displayed:
    GrammarViewer viewer = new GrammarViewer(validator.getGrammars());
    viewer.display();
In this case, the grammar rule graph is resolved into a set of trees, which
are each displayed in turn, starting with one ElementRule root.