YW: If automatic acquisition of content is possible to any degree from a Machine-Readable Dictionary (MRD) or corpus, then, since those are plainly in NL, does this suggest that NL is, in some form, a representation language for information about language, and that this settles the issue raised earlier?
SN: First of all, I think that this premise is itself moot at the moment, because automatic acquisition of content can be considered possible only if the content is plainly trivial. Any success in the automatic acquisition of content is predicated on the developers' ability to model (in the weak sense, with no claim of similarity between the model and the modeled other than at output!) the disambiguation and other meaning-assignment processes of humans. More concretely, this modeling involves overt, human-directed formulation, at acquisition time, of the background knowledge and processes that support the automatic assignment of meaning at processing time.
But even if the premise of your argument is granted, the argument itself still seems to be a bit of a sleight of hand. It is methodologically rather similar to the way our colleagues at USC ISI, for example, exploited the fact that the ontology in the DARPA-funded Pangloss MT project used English as its metalanguage: the Spanish lexicon in that project explained the meanings of Spanish words in terms of an ontology whose atoms were homographs of English words and expressions.
YW: Well, if they can do it, I might want to say it is not a sleight of hand but proof of my NL-RL point. I also want to use the metaphor of a dictionary as containing a lexicographer's ``conscious, explicit knowledge'', which is what we might extract by these processes; other computations over the result, however, could yield meaning connections no lexicographer had actually seen (and which might be said to model the lexicographer's unconscious).
SN: A representation needs to be reformulated and fleshed out for machines. Lexicographers, in writing (printed) dictionary entries, rely heavily (if subconsciously!) on the fact that their representations, such as definitions, will be processed by a high-quality language processor, namely, the human! This may be the very crux of our disagreement. The task of NLP knowledge acquirers is to use their language-processing capacity to state information as overtly as possible at a desired grain size of description AND in a format which facilitates access by machine (e.g., frames). The latter condition is, of course, of secondary importance: it is a convenience consideration only. The former condition is contentful in that it presumes that a definition is not complete by itself but only together with the human understander of that definition. This could be proven wrong, incidentally, if it were shown that dictionary entries do not, in fact, rely on extraneous human knowledge in specifying definitions. But if that were so, why do lexicographers say that if you do not already know a meaning, you will not come to understand it from the dictionary? Is this just frivolity?
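[The frame format SN mentions can be sketched as follows; this is a hypothetical toy illustration, not the format of any particular system, and all slot names and fillers are invented for the example.]

```python
# A toy frame for one sense of the verb "bake", illustrating the kind of
# overt, machine-accessible structure contrasted with a printed definition.
# Slot names, fillers, and the ontology labels are hypothetical.
bake_v1 = {
    "pos": "verb",
    "sem": "COOK-EVENT",        # link into an ontology of concepts
    "agent": "HUMAN",           # selectional restriction on the subject
    "theme": "FOOD",            # selectional restriction on the object
    "instrument": "OVEN",
    "gloss": "cook (food) by dry heat in an oven",  # the NL definition
}

def selectional_fit(frame, slot, candidate):
    """Crude check: does a candidate filler satisfy the slot's restriction?
    A real system would consult an ontology's subsumption hierarchy."""
    return frame.get(slot) == candidate

print(selectional_fit(bake_v1, "theme", "FOOD"))   # True
print(selectional_fit(bake_v1, "theme", "ROCK"))   # False
```

The point of the contrast: the `gloss` slot still presupposes a human reader, while the other slots state the same constraints overtly enough for a program to test them.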
YW: It is frivolity and, if true in general, would make their products useless. I still think it is an open question whether structures derived pretty much automatically from MRDs can be useful for NLP. If they are, your position weakens. Is our difference really one of bottom-up versus top-down approaches to the same information? You believe that the acquisition of the core of these knowledge resources can be done only semi-automatically, under human supervision: for instance, in the automatic production of lexicon entries through lexical rules.
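[The kind of lexical rule alluded to here can be sketched as a function mapping one lexicon entry to a candidate entry, with a human supervisor accepting or vetoing the output; the rule, its slots, and the morphology are hypothetical illustrations, not the mechanism of any actual project.]

```python
# Hypothetical agentive lexical rule: from a verb entry ("bake"),
# propose a noun entry ("baker") meaning "one who bakes".
# A human acquirer would review each proposed entry before it enters
# the lexicon -- the "semi-automatic, under human supervision" step.
def agentive_rule(verb_entry):
    """Produce a candidate agentive-noun entry from a verb entry."""
    lemma = verb_entry["lemma"]
    # Naive English morphology: "bake" -> "baker", "paint" -> "painter".
    derived = lemma + "r" if lemma.endswith("e") else lemma + "er"
    return {
        "lemma": derived,
        "pos": "noun",
        "sem": "HUMAN",                 # agentive nouns denote persons
        "gloss": f"one who {lemma}s",
    }

bake = {"lemma": "bake", "pos": "verb", "sem": "COOK-EVENT"}
baker = agentive_rule(bake)
print(baker["lemma"], "-", baker["gloss"])   # baker - one who bakes
```

The rule multiplies a hand-built core lexicon cheaply, but its output is only a proposal: overgeneration ("cooker" as a person) is exactly why the human supervisor remains in the loop.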