Chapter 7
NLP for resource building
We evaluate several lexicon-based and
corpus-based methods to automatically induce new lexical units for
Swedish FrameNet, and we see that the best-performing setup uses a
combination of both types of methods. A particular challenge for
Swedish is the absence of a lexical resource such as WordNet;
however, we show that the semantic network Saldo, which is organized
according to lexicographical principles quite different from those
of WordNet, is very useful for our purposes.
Article outline
- 1. Introduction
- 1.1 Frame semantics and frame-semantic lexicons
- 2. Computational representation of the meaning of words
- 2.1 The semantic network Saldo
- 2.2 Semantic representations induced from corpora
- 2.2.1 Word representations from a class-based n-gram model
- 2.2.2 Geometric word representations from co-occurrences
- 2.2.3 Geometric representations from contextual classifiers
- 3. From word meaning to frame meaning
- 3.1 Methods based on distance and similarity measures
- 3.2 Classification-based methods
- 3.2.1 Representing the meaning of a word using Saldo
- 4. Quantitative evaluation
- 4.1 Evaluation metrics
- 4.2 Which way is the best to make use of the Saldo lexicon?
- 4.3 Which corpus-based semantic representations are most effective?
- 4.4 Combining lexicon-based and corpus-based classifiers
- 4.5 For which frames are our methods successful?
- 4.6 Use by lexicographers
- 5. Conclusion
- Acknowledgements
- Notes
- References