Brill-style transformation-based learning is one of the few ML methods in NLP to have been applied well beyond the part-of-speech tagging origins of virtually all ML in NLP. Brill's original application triggered only on POS tags; he later added lexical triggers [8]. Since then the method has been extended successfully to, e.g., speech act determination [39], and a template learning application was designed by Vilain [54].
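To make the distinction between POS triggers and lexical triggers concrete, the following is a minimal sketch of how a single transformation rule might be applied, together with one greedy selection step of the kind used in transformation-based learning. The `Rule` class, its field names, the helper functions and the toy sentence are all hypothetical illustrations, not Brill's implementation; only a "previous token" context is modelled, whereas real rule templates cover many more contexts.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (word, tag)


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: rewrite from_tag as to_tag if the previous token matches."""
    from_tag: str
    to_tag: str
    prev_tag: Optional[str] = None   # POS trigger on the preceding tag
    prev_word: Optional[str] = None  # lexical trigger on the preceding word


def rule_fires(rule: Rule, tokens: List[Token], i: int) -> bool:
    """True if the rule's triggers are satisfied at position i."""
    _, tag = tokens[i]
    if i == 0 or tag != rule.from_tag:
        return False
    prev_word, prev_tag = tokens[i - 1]
    if rule.prev_tag is not None and prev_tag != rule.prev_tag:
        return False
    if rule.prev_word is not None and prev_word.lower() != rule.prev_word:
        return False
    return True


def apply_rule(rule: Rule, tokens: List[Token]) -> List[Token]:
    """Apply the rule left to right; changes take effect immediately."""
    out = list(tokens)
    for i in range(len(out)):
        if rule_fires(rule, out, i):
            out[i] = (out[i][0], rule.to_tag)
    return out


def errors(tokens: List[Token], gold: List[Token]) -> int:
    """Count positions whose tag differs from the gold-standard tag."""
    return sum(1 for (_, t), (_, g) in zip(tokens, gold) if t != g)


def best_rule(candidates: List[Rule], tokens: List[Token], gold: List[Token]) -> Rule:
    """Greedy step: choose the candidate leaving the fewest errors after application."""
    return min(candidates, key=lambda r: errors(apply_rule(r, tokens), gold))


# Toy data (assumed for illustration): initial tagging versus gold tagging.
initial = [("to", "TO"), ("race", "NN"), ("the", "DT"), ("race", "NN")]
gold = [("to", "TO"), ("race", "VB"), ("the", "DT"), ("race", "NN")]

pos_rule = Rule("NN", "VB", prev_tag="TO")    # triggered by a POS tag
lex_rule = Rule("NN", "VB", prev_word="the")  # triggered by a word form

chosen = best_rule([pos_rule, lex_rule], initial, gold)
retagged = apply_rule(chosen, initial)
```

In this toy case the POS-triggered rule corrects the first "race" and leaves the second untouched, so it is the one selected. A full learner would repeat the selection step, appending each chosen rule to an ordered list until no candidate reduces the error count.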
A fast implementation based on the compilation of Brill-style rules to deterministic automata was developed at Mitsubishi labs [51] (see also [20]). The quality of the transformation rules learned depends on factors such as:
The accepted wisdom of the machine learning community is that it is very hard to predict which learning algorithm will produce optimal performance, so it is advisable to experiment with a range of algorithms running on real data. There have as yet been no systematic comparisons between these initial efforts and other conventional machine learning algorithms applied to learning extraction rules for IE data structures (e.g. example-based systems such as TiMBL [23] and ILP [44]).
Such experiments should be considered as strongly interacting with the issues discussed below (section 3 on the lexicon), where we propose extensions to earlier work, done by us and others [4], on unsupervised learning of the surface forms (subcategorization patterns) of a set of root template verbs. That work sought to cover the range of corpus forms under which a significant verb's NEs might appear in text. Such information might or might not be available in a given set of (document, template) pairs; for example, it would NOT be if the verbs appeared in sentences only in canonical forms. Investigation is still needed on the trade-off between the corpus-intensive and the (document, filled template) pair methods, if templates have not been pre-provided for a very large corpus selection (for, if they had, the methodology above could subsume the subcategorization work below). It will be, in practice, a matter of training sample size and richness.