public class DTDReader extends BasicReader
uk.ac.sheffield.jast.valid
As well as ELEMENT and ATTLIST declarations, DTDReader also parses ENTITY declarations. These may define additional character entities, which denote special encoded characters, or string entities, which denote boilerplate text to be expanded in the main XML document. The new entities are added to the Lexicon supplied by the reader or builder that requested this DTDReader, so that this reader or builder can then read a main XML document containing references to them.
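The effect of ENTITY declarations on a shared lexicon can be illustrated with a minimal sketch. It assumes the lexicon behaves like a map from entity names to replacement text (a hypothetical stand-in for JAST's Lexicon class), handles only internal general entities with a quoted literal value, and uses a regular expression instead of DTDScanner tokens. EntityDeclSketch is a hypothetical name.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: a Map<String, String> stands in for JAST's Lexicon.
class EntityDeclSketch {
    // <!ENTITY name "value"> or <!ENTITY name 'value'>; parameter (%) and SYSTEM entities do not match.
    private static final Pattern ENTITY = Pattern.compile(
            "<!ENTITY\\s+([A-Za-z_:][\\w.:-]*)\\s+(?:\"([^\"]*)\"|'([^']*)')\\s*>");

    // Adds each internal general entity declared in the DTD text to the lexicon.
    static void addEntities(String dtd, Map<String, String> lexicon) {
        Matcher m = ENTITY.matcher(dtd);
        while (m.find()) {
            String value = m.group(2) != null ? m.group(2) : m.group(3);
            // XML 1.0: the first declaration of an entity is binding, so later ones are ignored.
            // The value is stored as written; character references are not expanded here.
            lexicon.putIfAbsent(m.group(1), value);
        }
    }
}
```

The sketch uses putIfAbsent because XML 1.0 makes the first declaration of an entity the binding one; whether DTDReader applies the same rule is not stated here.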
DTDReader obtains its input from the low-level token scanner DTDScanner, which supplies integer tokens (members of the Tokens class) representing the recognised DTD events. DTDReader consumes these events and, where an event is associated with segmented text, also consumes that text. The resulting grammar model is returned as the top ElementRule, which is typically stored in a Doctype node.

The grammar can also be accessed as a list of ElementRule productions. This is used to transfer the external grammar subset from one DTDReader to another, when the second reader needs to merge it with the internal grammar subset that it is about to read. DTDReader implements the Closeable interface by virtue of inheriting from BasicReader.
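Transferring productions from one reader to another can be pictured with a small sketch. RuleSketch and ProductionStore are hypothetical stand-ins for ElementRule and a reader's dictionary. The store keeps productions in declaration order, as the "ordered list" wording of setProductions suggests, and it assumes that a repeated element declaration is an error, following the XML 1.0 Unique Element Type Declaration constraint; how DTDReader itself resolves such duplicates is not stated here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for ElementRule: an element name and its content model as text.
final class RuleSketch {
    final String name;
    final String body;

    RuleSketch(String name, String body) {
        this.name = name;
        this.body = body;
    }
}

// Hypothetical stand-in for the grammar state held by one DTDReader.
class ProductionStore {
    // Insertion-ordered, so productions come back in declaration order.
    private final Map<String, RuleSketch> dictionary = new LinkedHashMap<>();

    // Installs productions handed over by another reader, e.g. the external subset.
    void setProductions(List<RuleSketch> productions) {
        dictionary.clear();
        for (RuleSketch rule : productions) {
            add(rule);
        }
    }

    // Records one newly parsed ELEMENT rule; assumes redeclaring an element is an error.
    void add(RuleSketch rule) {
        if (dictionary.putIfAbsent(rule.name, rule) != null) {
            throw new IllegalStateException("Element declared twice: " + rule.name);
        }
    }

    // Returns a copy of the recorded productions, in declaration order.
    List<RuleSketch> getProductions() {
        return new ArrayList<>(dictionary.values());
    }
}
```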
Modifier and Type | Field and Description |
---|---|
private java.util.Map<java.lang.String,ElementRule> | dictionary - The dictionary mapping from Element names to the ElementRule rules defining the elements concerned. |
private static java.util.Set<java.lang.String> | legalTypes - Secret set of legal types for ATTLIST attributes. |
Fields inherited from BasicReader: lastToken, lexicon, scanner, validation
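The legalTypes field can be pictured as a fixed keyword set that is consulted while parsing ATTLIST declarations. In this sketch the set's contents are an assumption, taken from the XML 1.0 string and tokenized attribute types; enumerated types such as (yes | no) are matched separately, and NOTATION types are not covered. AttTypeSketch is a hypothetical name.

```java
import java.util.Set;

// Hypothetical sketch of a legal-types check for ATTLIST attribute declarations.
class AttTypeSketch {
    // Assumed contents: the XML 1.0 string type and tokenized types (NOTATION not covered).
    private static final Set<String> LEGAL_TYPES = Set.of(
            "CDATA", "ID", "IDREF", "IDREFS", "ENTITY", "ENTITIES", "NMTOKEN", "NMTOKENS");

    // Accepts a keyword type (case-sensitive, as in XML), or an enumeration like "(a | b | c)".
    static boolean isLegalType(String type) {
        return LEGAL_TYPES.contains(type)
                || type.matches("\\(\\s*[\\w.:-]+(\\s*\\|\\s*[\\w.:-]+)*\\s*\\)");
    }
}
```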
Constructor and Description |
---|
DTDReader(java.io.File file, Lexicon lexicon, java.lang.String encoding) - Creates a DTDReader to read an external grammar from a file, updating a Lexicon supplied by some other XML or AST reader. |
DTDReader(java.io.File file, java.lang.String encoding) - Creates a DTDReader to read an external grammar from a file, updating an internal Lexicon. |
DTDReader(java.io.Reader reader, Lexicon lexicon, java.lang.String encoding) - Creates a DTDReader to read an internal grammar from a character Reader, updating a Lexicon supplied by some other XML or AST reader. |
DTDReader(java.net.URL url, Lexicon lexicon, java.lang.String encoding) - Creates a DTDReader to read an external grammar from a URL, updating a Lexicon supplied by some other XML or AST reader. |
DTDReader(java.net.URL url, java.lang.String encoding) - Creates a DTDReader to read an external grammar from a URL, updating an internal Lexicon. |
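The checked exceptions declared by the constructors are consistent with building a character stream from the DTD source using standard java.io classes: FileInputStream throws FileNotFoundException, the charset-name constructor of InputStreamReader throws UnsupportedEncodingException, and URL.openStream() throws IOException. The sketch below assumes the input is opened this way; DtdSources and its methods are hypothetical helpers, not part of JAST.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.net.URL;

// Hypothetical helpers: one way to turn a DTD file or URL into a buffered character stream.
class DtdSources {
    // Declares the same checked exceptions as the File-based DTDReader constructors.
    static BufferedReader open(File file, String encoding)
            throws UnsupportedEncodingException, FileNotFoundException {
        // FileInputStream throws FileNotFoundException if the file cannot be opened.
        return wrap(new FileInputStream(file), encoding);
    }

    // URL variant: openStream() may fail with any IOException.
    static BufferedReader open(URL url, String encoding) throws IOException {
        return wrap(url.openStream(), encoding);
    }

    // Decodes bytes with the named charset, closing the stream if the charset is unsupported.
    private static BufferedReader wrap(InputStream in, String encoding)
            throws UnsupportedEncodingException {
        try {
            return new BufferedReader(new InputStreamReader(in, encoding));
        } catch (UnsupportedEncodingException e) {
            try {
                in.close();
            } catch (IOException closeFailure) {
                e.addSuppressed(closeFailure);
            }
            throw e;
        }
    }
}
```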
Modifier and Type | Method and Description |
---|---|
private ElementRule | getGrammar() - Returns a tree of grammar rules, rooted in the top ElementRule that was parsed. |
java.util.List<ElementRule> | getProductions() - Returns the list of ElementRule grammar productions recorded by this DTDReader. |
private void | parseAnyRule() - Parses any single DTD grammar rule. |
private void | parseAttlistRule() - Parses an ATTLIST definition. |
private void | parseComment() - Parses an XML comment. |
private GrammarRule | parseCompoundBody() - Parses a compound grammar rule body term. |
private GrammarRule | parseElementBody() - Parses an element grammar rule body term. |
private void | parseElementRule() - Parses an ELEMENT definition. |
private void | parseEntityRule() - Parses an ENTITY definition. |
private GrammarRule | parseMixedBody() - Parses a mixed-content compound grammar rule body. |
private GrammarRule | parseMultiplicity(GrammarRule rule) - Parses an optional multiplicity mark adjoining a grammar rule body term. |
private java.lang.String | readExternalEntity(java.lang.String uri) - Reads the value of an external entity from a URI. |
ElementRule | readGrammar() - Parses the DTD grammar, compiling a tree of GrammarRules and updating a Lexicon with new entity definitions. |
void | setProductions(java.util.List<ElementRule> productions) - Installs an ordered list of ElementRule grammar productions in this DTDReader. |
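The private methods parseCompoundBody, parseMixedBody, parseElementBody and parseMultiplicity suggest a recursive-descent treatment of ELEMENT content models. The following hypothetical sketch illustrates that technique under several assumptions: it scans a raw string rather than DTDScanner tokens, returns a normalised string rather than GrammarRule objects, does not handle EMPTY or ANY, and follows XML 1.0 in accepting a bare (#PCDATA) without '*'. ContentModelSketch and parseTerm are invented names; the other method names only loosely echo those listed above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical recursive-descent sketch for DTD content models; returns a normalised text form.
class ContentModelSketch {
    private final String src;
    private int pos = 0;

    private ContentModelSketch(String src) { this.src = src; }

    // Parses a parenthesised content model such as "(title, (para | list)*, note?)".
    static String parse(String model) {
        ContentModelSketch p = new ContentModelSketch(model);
        String tree = p.parseCompoundBody(true);
        p.skipSpace();
        if (p.pos != p.src.length()) throw p.error("unexpected trailing text");
        return tree;
    }

    // '(' term (sep term)* ')' plus multiplicity; sep is all ',' (sequence) or all '|' (choice).
    private String parseCompoundBody(boolean top) {
        expect('(');
        skipSpace();
        // #PCDATA may only open the outermost group.
        if (top && src.startsWith("#PCDATA", pos)) return parseMixedBody();
        List<String> terms = new ArrayList<>();
        terms.add(parseTerm());
        char sep = 0;
        skipSpace();
        while (peek() == ',' || peek() == '|') {
            char c = src.charAt(pos++);
            if (sep != 0 && c != sep) throw error("',' and '|' mixed in one group");
            sep = c;
            terms.add(parseTerm());
            skipSpace();
        }
        expect(')');
        return parseMultiplicity((sep == '|' ? "choice" : "seq") + terms);
    }

    // '#PCDATA' ('|' name)* ')' with a '*' required whenever element names are listed.
    private String parseMixedBody() {
        pos += "#PCDATA".length();
        List<String> names = new ArrayList<>();
        skipSpace();
        while (peek() == '|') {
            pos++;
            names.add(parseName());
            skipSpace();
        }
        expect(')');
        boolean star = peek() == '*';
        if (star) pos++;
        else if (!names.isEmpty()) throw error("mixed content with element names needs '*'");
        return "mixed" + names + (star ? "*" : "");
    }

    // One child term: a nested group, or an element name with an optional multiplicity mark.
    private String parseTerm() {
        skipSpace();
        return peek() == '(' ? parseCompoundBody(false) : parseMultiplicity(parseName());
    }

    // Appends an optional '?', '*' or '+' that directly follows a term.
    private String parseMultiplicity(String rule) {
        char c = peek();
        if (c == '?' || c == '*' || c == '+') { pos++; return rule + c; }
        return rule;
    }

    private String parseName() {
        skipSpace();
        int start = pos;
        while (pos < src.length()
                && (Character.isLetterOrDigit(src.charAt(pos)) || "_-.:".indexOf(src.charAt(pos)) >= 0)) pos++;
        if (start == pos) throw error("element name expected");
        return src.substring(start, pos);
    }

    private void expect(char c) {
        skipSpace();
        if (peek() != c) throw error("'" + c + "' expected");
        pos++;
    }

    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

    private void skipSpace() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    private IllegalArgumentException error(String message) {
        return new IllegalArgumentException(message + " at offset " + pos);
    }
}
```

For example, (title, (para | list)*, note?) parses to seq[title, choice[para, list]*, note?], while (a, b | c) is rejected for mixing separators.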
Methods inherited from BasicReader: checkEncoding, close, endOfStream, getContext, getEncoding, getLexicon, getLineNumber, parseQuotedValue, setLexicon, setValidation
Other inherited methods: encodingError, semanticError, syntaxError
private static final java.util.Set<java.lang.String> legalTypes
private java.util.Map<java.lang.String,ElementRule> dictionary

public DTDReader(java.io.File file, java.lang.String encoding) throws java.io.UnsupportedEncodingException, java.io.FileNotFoundException
Parameters:
file - the DTD file, from the local filesystem.
encoding - the expected character encoding, typically "UTF-8".
Throws:
java.io.UnsupportedEncodingException - if the DTD file's character encoding is not supported.
java.io.FileNotFoundException - if the file cannot be found.

public DTDReader(java.io.File file, Lexicon lexicon, java.lang.String encoding) throws java.io.UnsupportedEncodingException, java.io.FileNotFoundException
Parameters:
file - the DTD file, from the local filesystem.
lexicon - the Lexicon from the XML or AST reader.
encoding - the expected character encoding, typically "UTF-8".
Throws:
java.io.UnsupportedEncodingException - if the DTD file's character encoding is not supported.
java.io.FileNotFoundException - if the file cannot be found.

public DTDReader(java.net.URL url, java.lang.String encoding) throws java.io.UnsupportedEncodingException, java.io.IOException
Parameters:
url - the URL describing a path to a DTD file.
encoding - the expected character encoding is "ISO-8859-1".
Throws:
java.io.UnsupportedEncodingException - if the DTD file's character encoding is not supported.
java.io.IOException - if the URL is malformed or cannot be opened as a file.

public DTDReader(java.net.URL url, Lexicon lexicon, java.lang.String encoding) throws java.io.UnsupportedEncodingException, java.io.IOException
Parameters:
url - the URL describing a path to a DTD file.
lexicon - the Lexicon from the XML or AST reader.
encoding - the expected character encoding is "ISO-8859-1".
Throws:
java.io.UnsupportedEncodingException - if the DTD file's character encoding is not supported.
java.io.IOException - if the URL is malformed or cannot be opened as a file.

public DTDReader(java.io.Reader reader, Lexicon lexicon, java.lang.String encoding) throws java.io.UnsupportedEncodingException
Parameters:
reader - a StringReader scanning the internal grammar subset.
lexicon - the Lexicon from the XML or AST reader.
encoding - the expected character encoding, typically "UTF-8".
Throws:
java.io.UnsupportedEncodingException - if the DTD file's character encoding is not supported.

public ElementRule readGrammar() throws java.io.IOException, SyntaxError, SemanticError
Throws:
java.io.IOException - if the DTD grammar cannot be read.
SyntaxError - if the DTD syntax is faulty.
SemanticError - if the grammar is not complete.

private ElementRule getGrammar()
public java.util.List<ElementRule> getProductions()
public void setProductions(java.util.List<ElementRule> productions)
Parameters:
productions - a list of ElementRule productions.

private void parseAnyRule() throws java.io.IOException, SyntaxError
Throws:
java.io.IOException - if the token stream fails.
SyntaxError - if the DTD syntax is faulty.

private void parseComment() throws java.io.IOException, SyntaxError
Throws:
java.io.IOException - if the token stream fails.
SyntaxError - if the XML syntax is faulty.

private void parseEntityRule() throws java.io.IOException, SyntaxError
Throws:
java.io.IOException - if the token stream fails.
SyntaxError - if the DTD syntax is faulty.

private void parseAttlistRule() throws java.io.IOException, SyntaxError
Throws:
java.io.IOException - if the token stream fails.
SyntaxError - if the XML syntax is faulty.

private void parseElementRule() throws java.io.IOException, SyntaxError
Throws:
java.io.IOException - if the token stream fails.
SyntaxError - if the XML syntax is faulty.

private GrammarRule parseCompoundBody() throws java.io.IOException, SyntaxError
Throws:
java.io.IOException - if the token stream fails.
SyntaxError - if the XML syntax is faulty.

private GrammarRule parseMixedBody() throws java.io.IOException, SyntaxError
Parses a mixed-content compound grammar rule body. Expects the separators to be '|' and the compound body to have a '*' zero-to-many multiplicity marker, and returns a MixedContentRule.
Throws:
java.io.IOException - if the token stream fails.
SyntaxError - if the XML syntax is faulty.

private GrammarRule parseElementBody() throws java.io.IOException, SyntaxError
Throws:
java.io.IOException - if the token stream fails.
SyntaxError - if the XML syntax is faulty.

private GrammarRule parseMultiplicity(GrammarRule rule) throws java.io.IOException, SyntaxError
Parameters:
rule - a GrammarRule.
Throws:
java.io.IOException - if the token stream fails.
SyntaxError - if the XML syntax is faulty.

private java.lang.String readExternalEntity(java.lang.String uri) throws java.io.IOException
The security risks associated with external entities are well known. They include attempts to access privileged files on the host machine (such as password or startup files) and denial-of-service attacks that open file resources without releasing them. As safeguards, external entities must be stored in plain ".txt" text files, and file reading times out after one second.
Parameters:
uri - the URI of the external entity's expansion.
Throws:
java.io.IOException - if the external file cannot be read.
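The two safeguards described for readExternalEntity can be sketched with standard Java APIs. This is one possible approach, not JAST's code: ExternalEntitySketch is hypothetical, the URI is assumed to be a file: URI, the entity text is assumed to be UTF-8, and the one-second limit is enforced with Future.get on a single-thread executor.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of the ".txt only" and one-second-timeout safeguards.
class ExternalEntitySketch {
    static String readExternalEntity(String uri) throws IOException {
        Path path = Paths.get(URI.create(uri)); // assumes a file: URI
        Path fileName = path.getFileName();
        // Safeguard 1: only plain ".txt" files may supply entity text.
        if (fileName == null || !fileName.toString().endsWith(".txt")) {
            throw new IOException("External entity is not a .txt file: " + uri);
        }
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Callable<String> task = () -> new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
            Future<String> reading = executor.submit(task);
            // Safeguard 2: abandon the read if it takes longer than one second.
            return reading.get(1, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            throw new IOException("Timed out reading external entity: " + uri, e);
        } catch (ExecutionException e) {
            throw new IOException("Cannot read external entity: " + uri, e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("Interrupted reading external entity: " + uri, e);
        } finally {
            executor.shutdownNow(); // interrupts a read that is still running
        }
    }
}
```

A timed-out Future does not by itself stop the read; shutdownNow() interrupts the worker thread so that an interruptible read can be abandoned and the file released.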