Tools/Resources for NLP Projects at Sheffield
- Stemmers
- Taggers
  - Simple NVX tagger, in Perl
- Text Categorisation
  - Reuters 21578 collection
  - Preprocessing / topic-splitting script (Perl) for the Reuters collection
- Document frequencies
  - Document frequency counts for BNC Section A
  - Document frequency counts for BNC Section A, with Porter stemming
- Dot Plotting
- Corpora
  - Penn Treebank III
  - BNC
  - ? METER corpus
  - Reuters 21578 collection
  - Project Gutenberg - good source of texts
  - ?? sources for essays/articles in (or nearly in) plain text, for summarisation
- Some good collections / link pages
Last modified Tuesday October 30 2001 by Mark Hepple