This package contains the JAST 1.1 XPath Search Engine, © Anthony J H Simons, 2010-2015. This software is currently an experimental alpha release, and is offered as-is, under a free experimental license (see full terms below).
org.jast.ast contains tools for mapping XML files to user-defined Java syntax trees, and vice versa.
org.jast.xml contains tools for mapping XML files to JAST's standard XML memory model, and vice versa.
org.jast.xpath contains an XPath search engine for use with the standard XML model.
org.jast.dtd contains a document validation engine for use with the standard XML model.
org.jast.filter contains filters for searching and validating the standard XML model.
This alpha-release software is free for use by academic and commercial users. The terms of the license are that you are free to use the software in any product (whether free or commercial), provided that any usage is acknowledged by citing "©Anthony J H Simons" as the copyright holder and referring to the JAST website "http://staffwww.dcs.shef.ac.uk/people/A.Simons/jast/" as the source. While this alpha license is perpetual and not subject to any restriction, we reserve the right to change the licensing terms of subsequent releases. The software is offered as-is, without any implied warranty of fitness for purpose. Please refer to the JAST website for further details.
The following assumes that you, the developer, wish to build a third-party application that incorporates the JAST 1.1 default XML processing tools. The components for reading and writing XML files, and for manipulating in-memory XML trees using the default memory model, are found in another package, org.jast.xml. The components for conducting XPath searches in XML memory-trees are found here, in this package, org.jast.xpath. The basic components for filtering XML memory-trees are found in another package, org.jast.filter.
The World-Wide Web Consortium (W3C) specifies a standard for conducting searches in XML trees (whether in memory or on disk) called XPath. This standard specifies the syntax of a language for constructing pattern expressions, which denote paths through an XML tree. A pattern expression is matched against an XML tree and, if the match succeeds, selects the nodes that fit the pattern. Depending on the pattern, the result may be a set of elements, of attributes, or of other kinds of content.
The full XPath pattern language syntax is quite rich, including specifiers for which axis to explore, what kinds of node to select and what predicates to apply to the nodes. The full syntax also provides a large library of functions of the kind normally found in a programming language. The W3C also defines an abbreviated syntax, which focuses on matching elements and attributes by their names and values. Examples of this more convenient abbreviated syntax include the following:
. | selects the current node, known as the context node |
.. | selects the parent node of the context node (a relative path) |
/. | selects the document node containing the context node (an absolute path, starting from the document root) |
//. | selects all nodes in the current document (an absolute path, starting from the document root) |
Film | selects the Film children of the context node (a relative path) |
Film/Director | selects the Director children of the Film children of the context node (a relative path) |
/Catalogue/Film | selects the Film children of the Catalogue root element of the document node (an absolute path) |
@year | selects the year attribute of the context node (a relative path) |
Film/@year | selects the year attributes of the Film children of the context node (a relative path) |
//Film | selects all Film descendants of the document node (an absolute path) |
Catalogue//Title | selects all Title descendants of the Catalogue child of the context node (a relative path) |
Film[@year=1976] | selects all Film children of the context node whose year attribute has the value 1976 |
Film[Director='George Lucas'] | selects all Film children of the context node that have a Director child with the value "George Lucas" |
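The table turns on two distinctions: whether a path is absolute (it starts with /) or relative, and whether consecutive steps are joined by / (child) or // (descendants-or-self). As a rough illustration, the sketch below splits an abbreviated pattern into location steps. It is a hypothetical helper (AbbreviatedPathSplitter, Step, isAbsolute and split are invented names), not JAST's XPathReader, and it only tokenizes; it does not check that each step is valid XPath.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not JAST's XPathReader): splits an abbreviated path into steps. */
public class AbbreviatedPathSplitter {

    /** One location step; descendant is true when the step follows "//". */
    record Step(boolean descendant, String nodeTest) {
        @Override
        public String toString() {
            return (descendant ? "//" : "") + nodeTest;
        }
    }

    /** A path is absolute when it starts from the document root. */
    static boolean isAbsolute(String path) {
        return path.startsWith("/");
    }

    /** Splits on "/" and "//", ignoring slashes inside [...] predicates or quoted strings. */
    static List<Step> split(String path) {
        List<Step> steps = new ArrayList<>();
        boolean descendant = false;
        int i = 0;
        while (i < path.length()) {
            if (path.startsWith("//", i)) {          // descendant-or-self separator
                descendant = true;
                i += 2;
            } else if (path.charAt(i) == '/') {      // child separator, or the leading root
                i += 1;
            } else {
                int start = i, depth = 0;
                char quote = 0;                       // 0 when not inside a string literal
                while (i < path.length()) {
                    char c = path.charAt(i);
                    if (quote != 0) {
                        if (c == quote) quote = 0;
                    } else if (c == '\'' || c == '"') {
                        quote = c;
                    } else if (c == '[') {
                        depth++;
                    } else if (c == ']') {
                        depth--;
                    } else if (c == '/' && depth == 0) {
                        break;                        // end of this step
                    }
                    i++;
                }
                steps.add(new Step(descendant, path.substring(start, i)));
                descendant = false;
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        for (String p : List.of("/Catalogue/Film", "Catalogue//Title",
                                "Film[Director='George Lucas']", "//Film/@year")) {
            System.out.println(p + (isAbsolute(p) ? " is absolute: " : " is relative: ") + split(p));
        }
    }
}
```

Predicates are kept attached to their step, since a slash inside [...] belongs to the predicate rather than to the path.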
The main class of interest is XPath, which represents a compiled XPath pattern and also contains the search engine for matching a pattern against an XML memory tree. JAST only supports matching XPaths against memory-trees; it does not support matching against XML files on disk. An XPath object is created using the constructor XPath(String), supplying the XPath pattern string as the argument. Behind the scenes, this invokes a parser called XPathReader, which compiles the pattern string into a sequence of navigation rules and filters, known as XPathRule objects. When you perform an XPath search, these rules are applied, one at a time, to the current context, each yielding a new context. The matching process is top-down and breadth-first, finishing when the rules are exhausted or no further nodes match the pattern.
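The rule-by-rule matching described above can be sketched as a pipeline over context lists. In this hypothetical sketch, Node, Rule, child and match are invented names standing in for JAST's memory model and XPathRule classes. Each rule maps the whole current context to the next one, which is what makes the process breadth-first, and matching stops when the rules run out or the context becomes empty.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch (not JAST's XPathRule classes): a compiled pattern as a list of rules. */
public class RulePipelineSketch {

    /** Minimal stand-in element node: a name and its children. */
    record Node(String name, List<Node> children) {
        Node(String name, Node... children) { this(name, List.of(children)); }
    }

    /** A rule consumes one whole context list and yields the next one. */
    interface Rule extends Function<List<Node>, List<Node>> {}

    /** Child step with a name test, like "Film" in "/Catalogue/Film". */
    static Rule child(String name) {
        return context -> {
            List<Node> next = new ArrayList<>();
            for (Node n : context)
                for (Node c : n.children())
                    if (c.name().equals(name)) next.add(c);
            return next;
        };
    }

    /** Applies the rules in order; stops when they are exhausted or the context is empty. */
    static List<Node> match(List<Rule> rules, List<Node> context) {
        for (Rule rule : rules) {
            if (context.isEmpty()) break;   // no further nodes can match
            context = rule.apply(context);
        }
        return context;
    }

    /** A tiny document: a document node holding a Catalogue of two Films. */
    static Node sampleDocument() {
        return new Node("#document",
            new Node("Catalogue",
                new Node("Film", new Node("Title"), new Node("Director")),
                new Node("Film", new Node("Title"), new Node("Director"))));
    }

    public static void main(String[] args) {
        List<Rule> catalogueFilms = List.of(child("Catalogue"), child("Film"));
        System.out.println(match(catalogueFilms, List.of(sampleDocument())).size() + " Film nodes");
    }
}
```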
To initiate an XPath search, a program need only construct an XPath instance and invoke one of its match() methods on a single node, or on a list of nodes, which serves as the starting context for the match. For example:
```java
import java.util.List;
import org.jast.xml.*;        // assumed location of the memory model (Content, Document)
import org.jast.xpath.XPath;

// document: an in-memory XML document, e.g. read using the tools in org.jast.xml
XPath findFilms = new XPath("/Catalogue/Film");
List<Content> films = findFilms.match(document);    // Film elements under the root
XPath findYears = new XPath("@year");
List<Content> years = findYears.match(films);       // year attributes of those Films
XPath findText = new XPath("//Director/text()");
List<Content> names = findText.match(document);     // Text nodes of all Director elements
```
The first XPath searches for all Film elements under the root element Catalogue of the document. This returns a list of nodes, which becomes the new context for the second XPath search; it returns all year attributes of the Film elements found by the first search. The third example shows how to return a list of Text nodes storing the values of all the Director elements anywhere in the document.
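For comparison, the same three patterns can be run with the JDK's built-in javax.xml.xpath engine, a separate W3C DOM-based implementation that is not part of JAST. This is a sketch on an invented two-film catalogue (JdkXPathComparison and its helpers are hypothetical names). Note that the JDK's own javax.xml.xpath.XPath interface shares its simple name with JAST's XPath class, so code using both must fully qualify one of them; the sketch avoids importing it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Runs the three example patterns with the JDK's javax.xml.xpath engine (not JAST). */
public class JdkXPathComparison {

    static final String CATALOGUE = "<Catalogue>"
        + "<Film year='1976'><Title>Taxi Driver</Title><Director>Martin Scorsese</Director></Film>"
        + "<Film year='1977'><Title>Star Wars</Title><Director>George Lucas</Director></Film>"
        + "</Catalogue>";

    // Each pattern is compiled once and reused, mirroring the advice given for JAST.
    static final XPathExpression FIND_FILMS = compile("/Catalogue/Film");
    static final XPathExpression FIND_YEARS = compile("@year");
    static final XPathExpression FIND_TEXT = compile("//Director/text()");

    static XPathExpression compile(String pattern) {
        try {
            return XPathFactory.newInstance().newXPath().compile(pattern);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad pattern: " + pattern, e);
        }
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Bad XML", e);
        }
    }

    /** Evaluates a compiled pattern against one context node, returning the selected nodes. */
    static NodeList select(XPathExpression pattern, Object context) {
        try {
            return (NodeList) pattern.evaluate(context, XPathConstants.NODESET);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Collects the string values of attribute or text nodes. */
    static List<String> values(NodeList nodes) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) out.add(nodes.item(i).getNodeValue());
        return out;
    }

    public static void main(String[] args) {
        Document document = parse(CATALOGUE);
        NodeList films = select(FIND_FILMS, document);
        List<String> years = new ArrayList<>();
        for (int i = 0; i < films.getLength(); i++)   // one context node per evaluation
            years.addAll(values(select(FIND_YEARS, films.item(i))));
        List<String> names = values(select(FIND_TEXT, document));
        System.out.println(films.getLength() + " films, years " + years + ", directors " + names);
    }
}
```

Unlike JAST's match(), which may be given a list of nodes as its context, XPathExpression.evaluate takes a single context item, which is why the year attributes are collected in a loop.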
If the same XPath search pattern needs to be used many times, it is most efficient to create the XPath object once and reuse it; otherwise the XPathReader will be invoked again each time to recompile the pattern string.
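When patterns are only known at run time (for example, read from configuration), one way to follow this advice is a small memoizing cache keyed by the pattern string. PatternCache is a hypothetical helper, not part of JAST. The compiler function is a parameter, so the demonstration uses a stand-in; with JAST one might pass the XPath(String) constructor as XPath::new.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical helper: compiles each distinct pattern string once, then reuses the result. */
public class PatternCache<T> {
    private final Map<String, T> compiled = new ConcurrentHashMap<>();
    private final Function<String, T> compiler;
    private final AtomicInteger compilations = new AtomicInteger();  // for illustration only

    public PatternCache(Function<String, T> compiler) {
        this.compiler = compiler;
    }

    /** Returns the compiled form of the pattern, compiling it only on first request. */
    public T get(String pattern) {
        return compiled.computeIfAbsent(pattern, p -> {
            compilations.incrementAndGet();
            return compiler.apply(p);
        });
    }

    public int compilationCount() {
        return compilations.get();
    }

    public static void main(String[] args) {
        // Stand-in "compiler" that just splits the path; with JAST this might be XPath::new.
        PatternCache<String[]> cache = new PatternCache<>(p -> p.split("/"));
        for (int i = 0; i < 3; i++) cache.get("/Catalogue/Film");
        cache.get("@year");
        System.out.println(cache.compilationCount() + " compilations");  // 2 compilations
    }
}
```

ConcurrentHashMap.computeIfAbsent applies the compiler at most once per distinct key, even when several threads call get() concurrently.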
Although you need not be concerned with the various XPathRule classes, you may like to know that these rules use exactly the same kinds of Filter as those supplied in another JAST package, org.jast.filter. In general, a rule advances one step in the XPath search and then applies one or more filters to the resulting context, such as testing its node type or its name, or applying a constraint to its value, its attribute's value, or its child's value. Only those nodes passing the filters are returned in the next cycle.
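A rule of this kind can be pictured as one step followed by a list of filters. The sketch below is hypothetical: Element, el, named, attributeEquals, childEquals and childStep are invented names, not JAST's Filter or XPathRule API. It shows Film[@year=1976] as a child step filtered by name and attribute value, and Film[Director='George Lucas'] as a child step filtered by name and a child's value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch: one rule = a child step followed by filters (not JAST's API). */
public class StepWithFilters {

    /** Minimal stand-in element: name, attributes, text value and child elements. */
    record Element(String name, Map<String, String> attributes, String text, List<Element> children) {}

    static Element el(String name, Map<String, String> attributes, String text, Element... children) {
        return new Element(name, attributes, text, List.of(children));
    }

    /** Filter on the element name. */
    static Predicate<Element> named(String name) {
        return e -> e.name().equals(name);
    }

    /** Filter on an attribute's value (a string comparison in this sketch). */
    static Predicate<Element> attributeEquals(String attribute, String value) {
        return e -> value.equals(e.attributes().get(attribute));
    }

    /** Filter that passes if some child with the given name has the given text value. */
    static Predicate<Element> childEquals(String childName, String value) {
        return e -> e.children().stream()
                .anyMatch(c -> c.name().equals(childName) && value.equals(c.text()));
    }

    /** Advances one step along the child axis, keeping only children that pass every filter. */
    static List<Element> childStep(List<Element> context, List<Predicate<Element>> filters) {
        List<Element> next = new ArrayList<>();
        for (Element parent : context)
            for (Element child : parent.children())
                if (filters.stream().allMatch(f -> f.test(child))) next.add(child);
        return next;
    }

    static Element sampleCatalogue() {
        return el("Catalogue", Map.of(), "",
            el("Film", Map.of("year", "1976"), "",
                el("Title", Map.of(), "Taxi Driver"), el("Director", Map.of(), "Martin Scorsese")),
            el("Film", Map.of("year", "1977"), "",
                el("Title", Map.of(), "Star Wars"), el("Director", Map.of(), "George Lucas")));
    }

    public static void main(String[] args) {
        List<Element> context = List.of(sampleCatalogue());
        List<Element> of1976 = childStep(context, List.of(named("Film"), attributeEquals("year", "1976")));
        List<Element> byLucas = childStep(context, List.of(named("Film"), childEquals("Director", "George Lucas")));
        System.out.println(of1976.size() + " film(s) from 1976, " + byLucas.size() + " by George Lucas");
    }
}
```

Because the filters are combined with allMatch, a list of several filters behaves like a sequence of predicates, which JAST treats as implicitly conjoined. The attribute test here compares strings, whereas W3C XPath compares @year=1976 numerically.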
The JAST implementation of XPath supports the W3C abbreviated syntax for most simple XPath patterns. It supports searching along the self, parent, child, attribute and descendant-or-self axes. It supports absolute paths starting from the root and relative paths starting from the context node. It supports predicates testing the self axis (by value), the attribute axis (by name and by value) and the child axis (by name and by value), and it supports position selection at the positions n, last() and last()-n, where n is an integer. Predicates on values may use any of the six usual comparison operators (=, !=, <, <=, >, >=). JAST supports the node(), text(), comment() and processing-instruction() content selectors in addition to the default child selector. The wildcard * may be given for attribute or element names as a whole, but not as part of a name.
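The position forms and comparison operators can be illustrated with a hypothetical sketch; at, fromLast and Comparison are invented names, not JAST methods. Positions are 1-based, as in XPath, and are applied here to one list of candidates (for a child step, the matching children of a single parent).

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of positions n, last(), last()-n and the six comparison operators. */
public class PositionSelection {

    /** [n]: the candidate at 1-based position n, if there is one. */
    static <T> Optional<T> at(List<T> candidates, int n) {
        return (n >= 1 && n <= candidates.size())
                ? Optional.of(candidates.get(n - 1))
                : Optional.empty();
    }

    /** [last()-n]: n = 0 selects the last candidate, n = 1 the one before it, and so on. */
    static <T> Optional<T> fromLast(List<T> candidates, int n) {
        return at(candidates, candidates.size() - n);
    }

    /** The six comparison operators usable in value predicates, applied here to numbers. */
    enum Comparison {
        EQ("="), NE("!="), LT("<"), LE("<="), GT(">"), GE(">=");

        final String symbol;

        Comparison(String symbol) { this.symbol = symbol; }

        boolean test(double left, double right) {
            return switch (this) {
                case EQ -> left == right;
                case NE -> left != right;
                case LT -> left < right;
                case LE -> left <= right;
                case GT -> left > right;
                case GE -> left >= right;
            };
        }

        static Comparison fromSymbol(String symbol) {
            for (Comparison c : values())
                if (c.symbol.equals(symbol)) return c;
            throw new IllegalArgumentException("Unknown operator: " + symbol);
        }
    }

    public static void main(String[] args) {
        List<String> films = List.of("Taxi Driver", "Star Wars", "Alien");
        System.out.println(at(films, 2).orElse("none"));                    // Star Wars
        System.out.println(fromLast(films, 0).orElse("none"));              // Alien
        System.out.println(Comparison.fromSymbol(">=").test(1977, 1976));   // true
    }
}
```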
The JAST implementation does not support the W3C full syntax for XPath. It does not yet support the preceding, following, preceding-sibling, following-sibling, ancestor, ancestor-or-self or (plain) descendant axes, which have no short form in the W3C abbreviated syntax. It only supports predicates defined on the context node, its immediate children or its attributes (not on general nested XPaths). It does not support the position()<n predicate. It does not yet support alternative paths using |, predicates combined with the explicit operators "and" and "or", or the not() function (although a sequence of predicates is implicitly conjoined). Arbitrary arithmetic and string functions are not supported. These restrictions were chosen to ensure optimum efficiency in the majority of cases where XPath searches on an in-memory document are appropriate.