
Sentence Splitter:

Based on the sentence-splitting algorithm of POETIC [13], the Sussex MUC-5 system, this module identifies the start and end byte offsets of each sentence, making use of SGML sentence markup if it is present.
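The text does not describe POETIC's splitting rules, so the following is only a minimal sketch of the general technique, not the LaSIE or POETIC implementation. It assumes SGML sentence markup takes the form of hypothetical `<s>...</s>` tags. When no such tags are found, it falls back to a simple punctuation heuristic with a small, hypothetical abbreviation list (`ABBREVIATIONS`). Offsets are computed on the raw bytes, so they remain byte offsets even for multi-byte encodings.

```python
import re
from typing import List, Tuple

# Hypothetical abbreviation list; a real splitter would need a much larger lexicon.
ABBREVIATIONS = {b"Mr.", b"Mrs.", b"Dr.", b"Inc.", b"Co.", b"e.g.", b"i.e."}

# Assumed SGML sentence tags; the markup LaSIE actually accepts may differ.
SGML_SENTENCE = re.compile(rb"<s>(.*?)</s>", re.DOTALL)

# Candidate boundary: terminal punctuation plus any closing quotes/brackets,
# provided it is followed by whitespace and then an (ASCII) uppercase letter,
# optionally preceded by an opening quote or bracket.
BOUNDARY = re.compile(rb"[.!?][\"')\]]*(?=\s+[\"'(\[]?[A-Z])")


def split_sentences(text: bytes) -> List[Tuple[int, int]]:
    """Return (start, end) byte offsets of each sentence; end is exclusive."""
    # Prefer explicit SGML sentence markup when the document contains it.
    marked = [(m.start(1), m.end(1)) for m in SGML_SENTENCE.finditer(text)]
    if marked:
        return marked

    spans = []
    # Skip leading whitespace so the first sentence starts at real text.
    start = len(text) - len(text.lstrip())
    for m in BOUNDARY.finditer(text):
        # The token ending at this punctuation mark, e.g. b"Mr." or b"arrived."
        token = text[start:m.start() + 1].split()[-1]
        if token in ABBREVIATIONS:
            continue  # not a sentence end, keep scanning
        end = m.end()
        spans.append((start, end))
        # The next sentence begins at the first non-whitespace byte.
        start = end
        while start < len(text) and text[start:start + 1].isspace():
            start += 1

    # Whatever remains (minus trailing whitespace) is the final sentence.
    tail_end = len(text.rstrip())
    if start < tail_end:
        spans.append((start, tail_end))
    return spans
```

For example, `split_sentences(b"Mr. Smith arrived. He left! Did he?")` yields the spans `(0, 18)`, `(19, 27)` and `(28, 35)`: the period after "Mr." is rejected as an abbreviation, and the final sentence is closed at the end of the input. Returning offsets rather than copied strings keeps the output aligned with the original document, which is what later stages need in order to refer back to it.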



Gillian Callaghan 2000-03-29