A new perspective on the unit of analysis in collostructional analysis
Thomas Proisl | Friedrich-Alexander-Universität Erlangen-Nürnberg
The aim of collostructional analysis or, more precisely, simple collexeme analysis, is to quantify the statistical association between a construction c and a lexeme l that occurs in a particular slot of the construction. The analysis is based on 2×2 contingency tables that ought to represent a cross-classification of the units of analysis. So far, the units of analysis have been identified either as all constructions in the corpus or as all instances of a class C of constructions to which construction c belongs. In practice, it is often not possible or feasible to identify these constructions, so the sample size is typically approximated by heuristic estimates. The bottom-right cell of the contingency table is most affected by these approximations. I suggest that the units of analysis instead be defined at the word level, as the class W of word forms that satisfy the restrictions on the collexeme slot of c.
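To illustrate why the choice of sample size matters, the following is a minimal sketch of how the 2×2 contingency table can be derived from three observed frequencies and an assumed sample size n. The function names, the row and column layout, and all frequencies are invented for illustration and are not taken from the article. The log-likelihood ratio G² is used here only as one possible association measure; the abstract does not name a specific one.

```python
import math


def contingency_table(f_lc, f_l, f_c, n):
    """Build the 2x2 table for lexeme l (rows) and construction c (columns).

    f_lc: occurrences of l in the collexeme slot of c
    f_l:  occurrences of l among the units of analysis
    f_c:  occurrences of c
    n:    assumed total number of units of analysis (the sample size)
    """
    a = f_lc                    # l in c
    b = f_l - f_lc              # l outside c
    c = f_c - f_lc              # other lexemes in c
    d = n - f_l - f_c + f_lc    # bottom-right cell: neither l nor c
    if min(a, b, c, d) < 0:
        raise ValueError("frequencies are inconsistent with the sample size")
    return ((a, b), (c, d))


def g_squared(table):
    """Log-likelihood ratio G^2 for a 2x2 table (illustrative measure only)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            if observed > 0:    # a cell with 0 observations contributes 0
                total += observed * math.log(observed / expected)
    return 2.0 * total


# Invented frequencies: only the assumed sample size differs between the tables.
small = contingency_table(f_lc=30, f_l=500, f_c=1000, n=100_000)
large = contingency_table(f_lc=30, f_l=500, f_c=1000, n=1_000_000)

print(small)  # ((30, 470), (970, 98530))
print(large)  # ((30, 470), (970, 998530))
print(g_squared(small), g_squared(large))
```

The three observed cells are identical in both tables. Only the bottom-right cell changes with the assumed sample size, and the association score changes with it, which is why the definition of the units of analysis has practical consequences.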