Publication details [#60845]

Publication type
Article in book
Publication language
English

Annotation

Collocation and colligation are two closely related concepts linked to the distributional properties of linguistic items in actual language use, referring to the likelihood of co-occurrence of (two or more) lexical items and grammatical categories, respectively. Both terms have been attributed to J. R. Firth (see Östman and Simon-Vandenbergen 2005 and Shore 2010 for a summary of Firth’s work). Collocation in particular has become a fundamental concept in usage-based studies in many linguistic fields, most notably lexical syntax and semantics. Typically, collocations and colligations are studied in large electronic corpora, which allow for statistical analyses of the co-occurrence patterns of linguistic items.

Collocation refers to the syntagmatic attraction between two (or more) lexical items: morphemes, words, phrases or utterances. Collocation analyses have mainly been conducted at the word level. The syntagmatic attraction, or collocation strength, between two words (a node and its collocate) is calculated from four observed absolute frequencies in the data, using statistical association measures that take into account the uneven distribution of words in the data. What counts as a co-occurrence depends on how the collocation window is defined, i.e. how far apart the node and the collocate can be and still be considered co-occurring. Most studies on collocations do not take clause or sentence boundaries into consideration. The frequent co-occurrence of more than two words has often been treated as a separate phenomenon (referred to as e.g. lexical bundles, clusters and multi-word strings). In some cases, such sequences are reducible to binary collocations.

The term collocation has been used with slightly varying meanings in the linguistic literature. Following the Firthian tradition, many scholars assume a purely statistical definition of collocation, a view labelled empirical collocation by Evert (2009).
Within the tradition of phraseology, however, collocation is often defined as a word combination that has been lexicalized to at least some extent, which Evert (2009) calls lexical collocation. Finally, in computational linguistics, collocation is traditionally defined as a word combination with idiosyncratic semantic or syntactic properties, which Sag et al. (2002) and Evert (2009) refer to as multiword expressions.

Collocation analysis is one of the most extensively used methods in corpus linguistics today, especially for comparing the meanings of near-synonyms. It has also been an invaluable tool for lexicographers. Furthermore, it has been widely used within the theoretical approach of frame semantics and the FrameNet project (e.g. Fillmore, Johnson and Petruck 2003). During the last two decades, collocation analysis has also increasingly been applied in computational linguistics for the purposes of machine translation and natural language processing, as well as vector-space modeling in the field of distributional semantics.

The term colligation has been used in a large number of different senses. Firth (1968, 181) used the term to refer to the syntagmatic attraction between grammatical categories, e.g. parts of speech or syntactic functions. The most common use of the term today, however, is to designate the attraction between a lexical item and a grammatical category (e.g. Tognini-Bonelli 2001, 163). The concept has also been applied to multi-word phrases. To date, the most extensive treatment of the concept of colligation has been presented by Hoey (2005). In his theory of lexical priming, a statistically based theory of linguistic competence, colligations play a crucial part in what it means to know a language. Hoey uses colligation as a cover term which encompasses both grammatical patterns and patterns of information structure associated with a lexical item.
It is important to note that all of the relationships above can be positive as well as negative. The main area of study in which colligation analyses have been employed is the comparative study of near-synonyms. At a methodological-theoretical level, the findings of colligation analyses, as well as collocation analyses, suggest that different word forms of the same lexeme often have noticeably different distributional patterns. This effect has been demonstrated in a large variety of languages. Finally, it is worth stressing that neither the collocational nor the colligational preferences of a word form are constant across all domains and registers of language; they vary significantly between different types of text and different subcorpora.

Many scholars have stressed a close interrelationship between collocation and colligation; what follows is a selective overview of the main theoretical and methodological frameworks combining them. Much of the foundational work on collocations was done by John Sinclair. His classic book Corpus, Concordance, Collocation (1991) constitutes a powerful argument for placing the interrelations between different levels of linguistic analysis at the center of (corpus) linguistic research. According to Sinclair, the meaning of a word is heavily context-dependent and is to be analyzed via the lexical and grammatical elements with which the word co-occurs. However, the relationship between a word and its lexical and grammatical context is reciprocal. The knowledge of the collocational and colligational preferences of individual word forms is part of the speaker’s mental representation of the grammar of his or her language.
In his later work, Sinclair referred to these mental representations as extended lexical units or extended units of meaning, encompassing information about four types of contextual parameters: (i) collocation, (ii) colligation, (iii) semantic preference, and (iv) semantic prosody (for a summary of Sinclair’s model, see Stubbs 2009, 123–126).

Another comprehensive theoretical model founded on the concepts of collocation and colligation is Michael Hoey’s (1997, 2005) theory of lexical priming. In Hoey’s view, collocation is primarily a psychological association between two word forms which is evidenced by their statistically significant tendency to co-occur in corpus data. Every item in a speaker’s mental lexicon is cumulatively loaded with all the linguistic and extralinguistic contexts in which it has been encountered and thus becomes primed for use in certain contexts over others. Since every speaker has an at least partially different experience of language use, words can be differently primed for each person. Hence, the collocational and colligational preferences for individual word forms vary depending on the speaker (Hoey 2005, 8). According to Hoey, colligation encompasses not one but three particular aspects of statistical attraction between lexical items and grammatical categories. In addition, Hoey introduces three types of priming that refer to statistical associations at the level of discourse (Hoey 2005, 13): textual collocation, textual semantic association and textual colligation. The parameters of priming at all levels of abstraction affect each other and are thus interrelated in a highly intricate manner.

Both Sinclair’s and Hoey’s work, as well as the majority of corpus linguistic research in general, have taken the lexical item as a starting point and investigated the contexts and structures in which it occurs.
Over the last few decades, however, it has become increasingly popular to study the association of lexical items with specific grammatical constructions; the currently most widely used approach here is probably collostruction analysis (for an overview, see Stefanowitsch and Gries 2009). This approach extends both collocation and colligation analyses by combining corpus linguistic methods with the theory of construction grammar (e.g. Fillmore 1988, Goldberg 2006). To date, perhaps the most detailed method of analysis of the interrelations between the distributional properties and preferences of lexical items is the so-called behavioral profiles analysis (e.g. Gries and Otani 2010). This method has been primarily used for cognitive semantic studies of near-synonyms and antonyms. The numerous approaches combining collocational and colligational data (as well as data on other types of lexical properties) have had notable influence on the development of linguistic methodology in general.
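
As an illustration of how collocation strength between a node and its collocate might be computed from four observed frequencies within a collocation window, the following minimal Python sketch may be helpful. It rests on several assumptions that are not stated in the annotation: the four frequencies are read as the cells of a 2x2 contingency table over the tokens inside and outside the window around the node, the window is symmetric and ignores sentence boundaries, and log-likelihood and pointwise mutual information are used as two example association measures. The function names and the toy token sequence are hypothetical.

```python
import math


def contingency_table(tokens, node, collocate, window=2):
    """Return (o11, o12, o21, o22) for a node/collocate pair.

    Assumed layout (not taken from the annotation):
      o11: collocate tokens inside a window around some node occurrence
      o12: other tokens inside such a window
      o21: collocate tokens outside every window
      o22: other tokens outside every window
    Node tokens themselves are not counted as candidate collocates.
    """
    n = len(tokens)
    in_window = [False] * n
    for i, tok in enumerate(tokens):
        if tok == node:
            # Mark every position at most `window` tokens from the node.
            for j in range(max(0, i - window), min(n, i + window + 1)):
                in_window[j] = True

    o11 = o12 = o21 = o22 = 0
    for i, tok in enumerate(tokens):
        if tok == node:
            continue
        if in_window[i]:
            if tok == collocate:
                o11 += 1
            else:
                o12 += 1
        else:
            if tok == collocate:
                o21 += 1
            else:
                o22 += 1
    return o11, o12, o21, o22


def association(o11, o12, o21, o22):
    """Compute two example association measures from the four frequencies."""
    total = o11 + o12 + o21 + o22
    row1, row2 = o11 + o12, o21 + o22
    col1, col2 = o11 + o21, o12 + o22
    observed = [o11, o12, o21, o22]
    # Expected frequencies under independence of window and collocate.
    expected = [row1 * col1 / total, row1 * col2 / total,
                row2 * col1 / total, row2 * col2 / total]
    # Log-likelihood ratio (G2); empty cells contribute nothing.
    g2 = 2 * sum(o * math.log(o / e)
                 for o, e in zip(observed, expected) if o > 0)
    # Pointwise mutual information for the co-occurrence cell.
    pmi = math.log2(o11 / expected[0]) if o11 > 0 else float("-inf")
    return {"expected_o11": expected[0], "log_likelihood": g2, "pmi": pmi}


# Toy data, invented for illustration only.
tokens = ("strong tea and strong coffee but powerful computer "
          "and strong tea").split()
table = contingency_table(tokens, node="strong", collocate="tea", window=1)
print(table)  # (2, 3, 0, 3)
print(association(*table))
```

Widening or narrowing the `window` parameter changes which tokens count as co-occurring and therefore changes all four frequencies, which is why results from studies using different window sizes are not directly comparable.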