This package contains the JAST 1.1 XML Filter Kit, © Anthony J H Simons, 2010-2015. This software is currently on experimental alpha release, and is offered as-is, under a free experimental license (see full terms below).

Java Abstract Syntax Trees, v1.1

If you are seeking to use any of the above software, please refer to the brief instructions immediately below and also the documentation on the JAST website for more details: http://staffwww.dcs.shef.ac.uk/people/A.Simons/jast/

Licensing Terms

This alpha-release software is free to use by academic and commercial users. The terms of the license are that you are free to use the software in any product (whether free or commercial), provided that any usage is acknowledged by citing "©Anthony J H Simons" as the copyright holder and referring to the JAST website "http://staffwww.dcs.shef.ac.uk/people/A.Simons/jast/" as the source. While this alpha license is perpetual and not subject to any restriction, we reserve the right to change the licensing terms of subsequent releases. The software is offered as-is, without any implied warranty for fitness of purpose. Please refer to the JAST website for further details.

The XML Filter Kit

The following assumes that you, the developer, wish to build a third-party application, which incorporates the JAST 1.1 default XML processing tools. The components for reading and writing XML files, and for manipulating in-memory XML trees using the default memory model, are to be found in another package org.jast.xml. The components for filtering XML memory-trees are to be found here, in this package org.jast.filter. The components for conducting XPath searches in XML memory-trees are to be found in another package org.jast.xpath.

Filtering the XML Memory Model

Filters can be used to select subsets of nodes from an XML memory tree. Every XML memory tree is rooted in some node that conforms to the base type Content. This class provides an API for retrieving all the children, or subsets of the children of a node. Relevant methods in the API of Content include:

	public List<Content> getContents();		// all contents
	public List<Content> getContents(Filter);	// a filtered subset
	public Content getContent(int);			// node at index
	public Content getContent(Filter, int);		// filtered at index
where the kind of Filter supplied determines what types of node are returned. Filtering may return nodes of one or more specific types, or with a given name, or constraint on the value, or any combination of these. The Content API also includes methods for removing single or multiple children.
	public List<Content> removeContents(Filter);	// remove a subset
	public Content removeContent(Content);		// remove a node
	public Content removeContent(int);		// remove at index
Removing a node, or list of nodes, returns the node, or node list, that was removed. Removed nodes no longer have a parent node, so may be reattached elsewhere, if desired.
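
The remove-and-reattach behaviour described above can be sketched without the JAST library itself. The following is a minimal illustration only: the Node class, its fields and the use of java.util.function.Predicate in place of Filter are all hypothetical stand-ins, not the real Content API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class RemoveSketch {
    // Hypothetical stand-in for a tree node; NOT the real JAST Content class.
    static class Node {
        final String name;
        Node parent;                                  // null while detached
        private final List<Node> contents = new ArrayList<>();

        Node(String name) { this.name = name; }

        Node addContent(Node child) {
            child.parent = this;
            contents.add(child);
            return this;                              // allow chained calls
        }

        // Select the children accepted by the filter; the tree is unchanged.
        List<Node> getContents(Predicate<Node> filter) {
            List<Node> result = new ArrayList<>();
            for (Node child : contents) {
                if (filter.test(child)) result.add(child);
            }
            return result;
        }

        // Detach the children accepted by the filter and return them.
        List<Node> removeContents(Predicate<Node> filter) {
            List<Node> removed = getContents(filter);
            contents.removeAll(removed);
            for (Node child : removed) child.parent = null;
            return removed;
        }

        int size() { return contents.size(); }
    }

    public static void main(String[] args) {
        Node family = new Node("Family")
                .addContent(new Node("Person"))
                .addContent(new Node("Pet"))
                .addContent(new Node("Person"));
        Node archive = new Node("Archive");

        // Removed nodes come back detached, so they can be reattached elsewhere.
        List<Node> people = family.removeContents(n -> n.name.equals("Person"));
        for (Node person : people) archive.addContent(person);

        System.out.println(family.size() + " left, " + archive.size() + " moved");
    }
}
```

The point of returning the removed list is that selection and detachment happen in one call, leaving the caller free to discard the nodes or move them to another parent.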

Filters can be used freely by the programmer. They provide the most efficient Java implementation of how to filter subsets of nodes. They are used internally within the JAST software. For example, the methods of the Element class that return named children are encoded using NameFilter objects to select contents that are nodes of the type Element that have the desired name. The following calls are nearly identical:

	List<Content> contents = element.getContents(
			new NameFilter("Person", Content.ELEMENT));
	List<Element> children = element.getChildren("Person");
except that the second expression returns a list of Element, rather than a list of Content, for convenience.
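
To illustrate why the two calls differ only in their return type, here is a small self-contained sketch in which a typed convenience method delegates to the general filtered method. The Content, Text and Element classes and the named() helper are hypothetical stand-ins written for this example; they are not the JAST classes, and a Predicate takes the place of NameFilter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class ChildrenSketch {
    // Hypothetical node types standing in for JAST's Content, Text and Element.
    static class Content { }

    static class Text extends Content {
        final String text;
        Text(String text) { this.text = text; }
    }

    static class Element extends Content {
        final String name;
        private final List<Content> contents = new ArrayList<>();

        Element(String name) { this.name = name; }

        Element addContent(Content child) { contents.add(child); return this; }

        // General form: the caller gets back a list of the base type.
        List<Content> getContents(Predicate<Content> filter) {
            List<Content> result = new ArrayList<>();
            for (Content child : contents) {
                if (filter.test(child)) result.add(child);
            }
            return result;
        }

        // Convenience form: the same selection, but typed as Element.
        List<Element> getChildren(String childName) {
            List<Element> result = new ArrayList<>();
            for (Content child : getContents(named(childName))) {
                result.add((Element) child);   // safe: the filter admits only Elements
            }
            return result;
        }
    }

    // A filter accepting only Element nodes with the given name.
    static Predicate<Content> named(String name) {
        return c -> c instanceof Element && ((Element) c).name.equals(name);
    }

    public static void main(String[] args) {
        Element family = new Element("Family")
                .addContent(new Element("Person"))
                .addContent(new Text("Person"))    // text node: rejected by the filter
                .addContent(new Element("Pet"));

        List<Content> contents = family.getContents(named("Person"));
        List<Element> children = family.getChildren("Person");
        System.out.println(contents.size() + " " + children.size());
    }
}
```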

The Filter Class Library

The main API class to use is Filter and its descendants. All Filter subclasses provide a method:

	public boolean accept(Content);  // does the filter accept the node?
which returns true if the particular filter accepts the node and false otherwise. This method is implemented differently in each subclass of Filter, according to the test that the filter applies. Filters may be used individually, or in combination, using methods that logically combine filters in the most efficient way. The most commonly used filters include TypeFilter, NameFilter and ValueFilter.

Filters use the most efficient algorithms for matching nodes. For example, TypeFilter uses a bitmask pattern to select nodes whose content-type matches the pattern; the comparison uses fast bitwise arithmetic on content-type identifiers. NameFilter compiles the name pattern to decide whether to match against a short name, or against a full identifier including the XML namespace prefix. All symbols are interned and compared using basic equality for speed. ValueFilter compiles a comparison operator and a reference value into a closure that may be applied to Element or Attribute values, which are converted to the correct types before comparison. The closure may then be applied efficiently, many times over, to many nodes.
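
Two of the techniques mentioned above, bitmask type matching and compiling a comparison into a reusable closure, can be sketched as follows. This is an illustration under stated assumptions rather than the JAST implementation: the type constants and their bit values, the acceptsType() and compileValueTest() helpers, and the choice of numeric conversion are all hypothetical.

```java
import java.util.function.Predicate;

class MatchSketch {
    // Hypothetical content-type identifiers, one bit each (not JAST's actual values).
    static final int ELEMENT = 1, ATTRIBUTE = 2, TEXT = 4, COMMENT = 8;

    // Type test: a node type matches when it shares a bit with the mask.
    static boolean acceptsType(int mask, int nodeType) {
        return (mask & nodeType) != 0;
    }

    // Resolve the operator once, returning a closure over the reference value.
    // The closure converts each candidate value to a number before comparing.
    static Predicate<String> compileValueTest(String operator, double reference) {
        switch (operator) {
            case "<":  return text -> Double.parseDouble(text) < reference;
            case ">":  return text -> Double.parseDouble(text) > reference;
            case "==": return text -> Double.parseDouble(text) == reference;
            default:
                throw new IllegalArgumentException("Unknown operator: " + operator);
        }
    }

    public static void main(String[] args) {
        int mask = ELEMENT | TEXT;                     // accept elements or text
        System.out.println(acceptsType(mask, TEXT));
        System.out.println(acceptsType(mask, COMMENT));

        // The operator string is inspected once; the closure is reused per node.
        Predicate<String> overSeventeen = compileValueTest(">", 17);
        System.out.println(overSeventeen.test("42"));
        System.out.println(overSeventeen.test("12"));
    }
}
```

In both cases the work of interpreting the pattern is done once, when the filter is built, so that each per-node test reduces to a single bitwise operation or a single comparison.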

Creating Compound Filters

Filters may be combined with other filters using logical combination methods that return the most suitable filter. The root class Filter declares the following API, which is suitably implemented in each descendant:

	public Filter and(Filter other); 	// Satisfy this and other
	public Filter or(Filter other); 	// Satisfy this or other
	public Filter not(); 			// Satisfy the complement
Sometimes, these methods construct a compound filter; but frequently they merge the constraints of the pair of filters, returning a more efficient filter than any manually created compound filter.

For example, whereas and() will by default return the AndFilter compound filter, combining any filter with a TypeFilter simply results in a specialised version of the same filter, accepting a more restricted type. Similarly, whereas not() will by default return a NotFilter, asking for the complement of a NotFilter will return the original filter. Similar rules apply De Morgan's laws to compound filters, reducing the levels of nesting, and merge child and attribute constraints when multiple ChildFilter or PropertyFilter filters are combined.
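
The simplification rules can be pictured with a small sketch. The class names below echo those used in this document, but the code is a hypothetical re-creation written for illustration: the fields, the type-bit values and the exact merging rule (shown here only for the simplest case of two type constraints) are assumptions, not the JAST source.

```java
class CombineSketch {
    // Hypothetical content-type bits; JAST's real identifiers may differ.
    static final int ELEMENT = 1, ATTRIBUTE = 2, TEXT = 4;

    static class Node {
        final int type;
        final String name;
        Node(int type, String name) { this.type = type; this.name = name; }
    }

    static abstract class Filter {
        abstract boolean accept(Node node);
        Filter and(Filter other) { return new AndFilter(this, other); } // default: compound
        Filter not() { return new NotFilter(this); }                     // default: compound
    }

    static class TypeFilter extends Filter {
        final int mask;
        TypeFilter(int mask) { this.mask = mask; }
        @Override boolean accept(Node node) { return (mask & node.type) != 0; }
        // Two type constraints merge into one filter with the intersected mask.
        @Override Filter and(Filter other) {
            if (other instanceof TypeFilter) {
                return new TypeFilter(mask & ((TypeFilter) other).mask);
            }
            return super.and(other);
        }
    }

    static class NameFilter extends Filter {
        final String name;
        NameFilter(String name) { this.name = name; }
        @Override boolean accept(Node node) { return name.equals(node.name); }
    }

    static class AndFilter extends Filter {
        final Filter left, right;
        AndFilter(Filter left, Filter right) { this.left = left; this.right = right; }
        @Override boolean accept(Node node) { return left.accept(node) && right.accept(node); }
    }

    static class NotFilter extends Filter {
        final Filter inner;
        NotFilter(Filter inner) { this.inner = inner; }
        @Override boolean accept(Node node) { return !inner.accept(node); }
        // The complement of a complement is the original filter, not a nested NotFilter.
        @Override Filter not() { return inner; }
    }

    public static void main(String[] args) {
        Filter elementsOrText = new TypeFilter(ELEMENT | TEXT);
        Filter merged = elementsOrText.and(new TypeFilter(ELEMENT | ATTRIBUTE));
        Filter named = new NameFilter("Person");

        System.out.println(merged instanceof TypeFilter);    // merged, no AndFilter built
        System.out.println(named.not().not() == named);      // double negation cancels
        System.out.println(named.and(merged).accept(new Node(ELEMENT, "Person")));
    }
}
```

Overriding and() and not() in the subclasses keeps the simplification local to each filter type, so callers always use the same three methods and never need to know whether a compound object was created.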