Use words, not constructions!: A new perspective on the unit of analysis in collostructional analysis

Proisl, Thomas

doi:10.1075/ijcl.20072.pro

Article published in:

International Journal of Corpus Linguistics
Vol. 27:3 (2022), pp. 349–379

Thomas Proisl | Friedrich-Alexander-Universität Erlangen-Nürnberg

The aim of collostructional analysis or, more precisely, simple collexeme analysis, is to quantify the statistical association between a construction c and a lexeme l that occurs in a particular slot of the construction. The analysis is based on 2×2 contingency tables that ought to represent a cross-classification of the units of analysis. So far, the units of analysis have been identified either as all constructions in the corpus or as all instances of a class C of constructions to which construction c belongs. In practice, it is often neither possible nor feasible to identify these constructions, so the sample size is typically approximated by heuristic estimates. The bottom-right cell of the contingency table is the one most affected by these approximations. I suggest instead that the units of analysis be defined at the word level, as the class W of word forms that satisfy the restrictions on the collexeme slot of c.

Keywords: collostructional analysis, collexeme analysis, contingency table, unit of analysis, sample size
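
To make the role of the sample size concrete, the 2×2 table described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and all frequencies are invented for the example, and the log-likelihood ratio is used merely as one common association measure, not necessarily the one applied in the article. The sketch shows that changing the sample size N alters only the bottom-right cell, yet still changes the resulting association score.

```python
from math import log


def contingency_table(f_lc, f_l, f_c, n):
    """Build the 2x2 table for lexeme l and construction c.

    f_lc: occurrences of l in the collexeme slot of c
    f_l:  total occurrences of l among the units of analysis
    f_c:  total occurrences of c
    n:    sample size, i.e. the number of units of analysis
          (hypothetically, the size of a word class W)
    Returns (a, b, c, d): a = l in c, b = other lexemes in c,
    c = l outside c, d = neither (the bottom-right cell).
    """
    a = f_lc
    b = f_c - f_lc
    c = f_l - f_lc
    d = n - f_l - f_c + f_lc
    if min(a, b, c, d) < 0:
        raise ValueError("frequencies are inconsistent with the sample size")
    return a, b, c, d


def log_likelihood(a, b, c, d):
    """Log-likelihood ratio (G2) of a 2x2 table; 0 means independence."""
    n = a + b + c + d
    observed = (a, b, c, d)
    expected = (
        (a + b) * (a + c) / n,
        (a + b) * (b + d) / n,
        (c + d) * (a + c) / n,
        (c + d) * (b + d) / n,
    )
    # Cells with an observed count of 0 contribute nothing to the sum.
    return 2 * sum(o * log(o / e) for o, e in zip(observed, expected) if o > 0)


# Invented frequencies: l occurs 100 times, c 50 times, together 10 times.
small = contingency_table(f_lc=10, f_l=100, f_c=50, n=1_000)
large = contingency_table(f_lc=10, f_l=100, f_c=50, n=10_000)

print(small, round(log_likelihood(*small), 2))
print(large, round(log_likelihood(*large), 2))
```

The first three cells are identical in both tables; only the bottom-right cell follows the choice of N, which is why the definition of the unit of analysis matters for the outcome.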

Article outline

1. Introduction
- 1.1 Analysis of cooccurrence data with contingency tables
- 1.2 Collostructional analysis
2. The unit of analysis in simple collexeme analysis
- 2.1 Constructions as the unit of analysis
- 2.2 Approximating the sample size
- 2.3 Problems arising from approximating the sample size
- 2.4 Practical impact of the approximations
3. Suggested solution
4. Discussion
- 4.1 Methodological advantages
- 4.2 Accidental application of word-based simple collexeme analysis
- 4.3 Change of interpretative perspective
5. Word-based vs. heuristic simple collexeme analysis – Case studies
- 5.1 The [N waiting to happen] construction
- 5.2 The [X think nothing of V-ing] construction
6. Conclusion
Notes
References

Published online: 25 May 2022

https://doi.org/10.1075/ijcl.20072.pro
