R. Gaizauskas, M. Hepple, and C. Huyck. 1998. Modifying Existing Annotated Corpora for General Comparative Evaluation of Parsing. In Proceedings of the First International Conference on Language Resources and Evaluation (LREC'98) Workshop on Evaluation of Parsing Systems, pp. 21-28, Granada, Spain.


Abstract:

We argue that the current dominant paradigm in parser evaluation work, which combines use of the Penn Treebank reference corpus and the Parseval scoring metrics, is not well suited to the task of general comparative evaluation of diverse parsing systems. In Gaizauskas et al. (1998), we propose an alternative approach with two key components. Firstly, we propose parsed corpora for testing that are much flatter than those currently used, whose "gold standard" parses encode only those grammatical constituents upon which there is broad agreement across a range of grammatical theories. Secondly, we propose modified evaluation metrics that require parser outputs to be 'faithful to', rather than mimic, the broadly agreed structure encoded in the flatter gold standard analyses. This paper addresses a crucial issue for the approach of Gaizauskas et al. (1998), namely the creation of the evaluation resources that the approach requires, i.e. annotated corpora recording the flatter parse analyses. We argue that, given the nature of the resources required, they can be derived comparatively inexpensively from existing parse-annotated resources, where available.
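For readers unfamiliar with the Parseval scheme the abstract contrasts against, the Python sketch below illustrates the standard labelled-bracket precision, recall, and crossing-brackets measures. It is not from the paper; the (label, start, end) span representation and the function names are illustrative assumptions. It also shows why a flat gold standard penalizes Parseval precision for a parser that supplies extra, non-conflicting structure, which is the kind of case the proposed 'faithfulness' metrics are designed to treat more fairly.

    # Minimal sketch of Parseval bracket matching; span encoding and
    # names are assumptions for illustration, not the paper's notation.

    def parseval_scores(gold, candidate):
        """Labelled bracket precision, recall, and crossing-bracket
        count for one sentence, with constituents as (label, start, end)."""
        gold_set = set(gold)
        cand_set = set(candidate)
        matched = len(gold_set & cand_set)
        precision = matched / len(cand_set) if cand_set else 0.0
        recall = matched / len(gold_set) if gold_set else 0.0

        # Two spans "cross" when they overlap without either one
        # containing the other.
        def crosses(a, b):
            return (a[0] < b[0] < a[1] < b[1]) or (b[0] < a[0] < b[1] < a[1])

        crossing = sum(
            any(crosses((cs, ce), (gs, ge)) for _, gs, ge in gold)
            for _, cs, ce in candidate
        )
        return precision, recall, crossing

    # Example: the gold analysis is flat (only broadly agreed brackets);
    # the candidate adds one extra internal NP that contradicts nothing.
    gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5)]
    cand = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
    print(parseval_scores(gold, cand))  # (0.75, 1.0, 0)

Note how the extra, non-crossing bracket lowers Parseval precision from 1.0 to 0.75 even though it is compatible with the flat gold standard; a metric requiring outputs merely to be faithful to the agreed brackets would not penalize it.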