Brill-style transformation-based learning is one of the few ML methods in NLP to have been applied well beyond part-of-speech tagging, where virtually all ML in NLP originated. Brill's original application triggered only on POS tags; later he added the possibility of lexical triggers. Since then the method has been extended successfully to, e.g., speech act determination, and a template learning application was designed by Vilain.
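The distinction between POS-tag triggers and lexical triggers can be illustrated with a minimal sketch of transformation-based learning. This is not Brill's implementation: the `Rule` class, the two trigger fields, the hand-written candidate list and the toy sentence are all hypothetical, and triggers are tested against the tags as they stood before the rule was applied, which is a simplifying assumption. Each round greedily picks the candidate rule that fixes the most tagging errors net of the errors it introduces.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    """Rewrite from_tag to to_tag when the (hypothetical) triggers match."""
    from_tag: str
    to_tag: str
    prev_tag: Optional[str] = None  # POS trigger: tag of the preceding token
    word: Optional[str] = None      # lexical trigger: the current word itself

    def fires(self, words: List[str], tags: List[str], i: int) -> bool:
        if tags[i] != self.from_tag:
            return False
        if self.prev_tag is not None and (i == 0 or tags[i - 1] != self.prev_tag):
            return False
        if self.word is not None and words[i].lower() != self.word:
            return False
        return True


def apply_rule(rule: Rule, words: List[str], tags: List[str]) -> List[str]:
    # Assumption: triggers are checked against the input tags, not the
    # partially rewritten sequence.
    return [rule.to_tag if rule.fires(words, tags, i) else t
            for i, t in enumerate(tags)]


def score(rule: Rule, words: List[str], tags: List[str], gold: List[str]) -> int:
    # Net gain: errors corrected minus errors introduced.
    new = apply_rule(rule, words, tags)
    return sum((n == g) - (t == g) for n, t, g in zip(new, tags, gold))


def learn(words: List[str], tags: List[str], gold: List[str],
          candidates: List[Rule], max_rules: int = 10) -> Tuple[List[Rule], List[str]]:
    learned: List[Rule] = []
    tags = list(tags)
    for _ in range(max_rules):
        best = max(candidates, key=lambda r: score(r, words, tags, gold))
        if score(best, words, tags, gold) <= 0:
            break  # no candidate improves the tagging any further
        tags = apply_rule(best, words, tags)
        learned.append(best)
    return learned, tags


# Toy data (invented for illustration): every "run" starts out tagged NN.
words = ["to", "run", "the", "run"]
initial = ["TO", "NN", "DT", "NN"]
gold = ["TO", "VB", "DT", "NN"]

candidates = [
    Rule("NN", "VB", prev_tag="TO"),  # POS-triggered
    Rule("NN", "VB", word="run"),     # lexically triggered
    Rule("NN", "VB", prev_tag="DT"),  # POS-triggered, harmful here
]

learned, final = learn(words, initial, gold, candidates)
```

On this toy input the lexical rule rewrites both occurrences of "run" and so gains nothing overall, while the POS-triggered rule corrects only the token after "to". A real learner would generate its candidates from rule templates over a training corpus rather than from a hand-written list.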
A fast implementation based on the compilation of Brill-style rules to deterministic automata was developed at Mitsubishi labs (see also ). The quality of the transformation rules learned depends on factors such as:
The accepted wisdom of the machine learning community is that it is very hard to predict which learning algorithm will produce optimal performance, so it is advisable to experiment with a range of algorithms running on real data. There have as yet been no systematic comparisons between these initial efforts and other conventional machine learning algorithms applied to learning extraction rules for IE data structures (e.g. example-based systems such as TiMBL, and ILP).
Such experiments interact strongly with the issues discussed below (section 3 on the lexicon), where we propose extensions to earlier work, by us and others, on unsupervised learning of the surface forms (subcategorization patterns) of a set of root template verbs. That work sought to cover the range of corpus forms under which a significant verb's NEs might appear in text. Such information might or might not be available in a given set of (document, template) pairs; e.g. it would NOT be if the verbs appeared in sentences only in canonical forms. Investigation is still needed into the trade-off between the corpus-intensive method and the (document, filled template) pair method when templates have not been pre-provided for a very large corpus selection (if they had been, the methodology above could subsume the subcategorization work below). In practice, it will be a matter of training sample size and richness.