Corpus linguists often attempt to avoid assumptions imported from pre-corpus studies, by using methods which could be called “inductive”, in so far as they proceed from observations about textual sequences to generalizations about order in the system. However, induction has been questioned for over 400 years (by Bacon, Hume, Popper and others), and the possibility of rigorous, theory-free induction is now generally rejected. One major phraseological model, proposed by Sinclair in the late 1990s, is certainly not a purely inductive generalization from raw corpus data. I will discuss this model using attested data on a particular construction and a distinction proposed by Firth, Halliday and Palmer between “sequence” (an observable feature of texts) and “order” (a feature of linguists’ models).
Departing from Benor and Levy’s approach to binomials (2006), this study investigated the sequencing of word pairs by controlling grammatical, geographical, and semantic variables. Accordingly, 59 sex-determined noun pairs commonly actualized in American English were examined. The preferred sequencing of 56 of these pairs is predicted by a heuristic that applies three constraints sequentially: (1) the metrical constraint – if the pair’s syllables are asymmetrical, the noun with fewer syllables comes first; (2) the family relationship constraint (discovered in this study) – if the pair’s syllables are symmetrical and the pair expresses a family relationship, the feminine term precedes the masculine term; and (3) the power constraint – in the remaining symmetrical pairs, the masculine term precedes the feminine term.
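The three-constraint heuristic can be sketched in a few lines of Python. This is only an illustration of the ordering logic as the abstract states it, not the study's own implementation: the `NounPair` type, the hand-supplied syllable counts, the `family` flag and the example pairs are all assumptions introduced here, and the pairs are not taken from the study's 59 items.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NounPair:
    """Hypothetical record for one sex-determined noun pair."""
    feminine: str
    masculine: str
    feminine_syllables: int   # supplied by hand (assumption)
    masculine_syllables: int  # supplied by hand (assumption)
    family: bool              # True if the pair expresses a family relationship


def predict_order(pair: NounPair) -> tuple[str, str]:
    """Return the predicted (first, second) order for a noun pair."""
    # (1) Metrical constraint: with unequal syllable counts,
    #     the noun with fewer syllables comes first.
    if pair.feminine_syllables != pair.masculine_syllables:
        if pair.feminine_syllables < pair.masculine_syllables:
            return (pair.feminine, pair.masculine)
        return (pair.masculine, pair.feminine)
    # (2) Family relationship constraint: equal syllable counts and a
    #     family relationship put the feminine term first.
    if pair.family:
        return (pair.feminine, pair.masculine)
    # (3) Power constraint: in the remaining symmetrical pairs,
    #     the masculine term comes first.
    return (pair.masculine, pair.feminine)


if __name__ == "__main__":
    # Illustrative pairs only; not data from the study.
    examples = [
        NounPair("aunt", "uncle", 1, 2, family=True),
        NounPair("mom", "dad", 1, 1, family=True),
        NounPair("queen", "king", 1, 1, family=False),
    ]
    for pair in examples:
        print(" and ".join(predict_order(pair)))
```

Because the constraints are checked in a fixed sequence, a pair with unequal syllable counts is settled by the metrical constraint and never reaches the two semantic constraints.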
This paper reports on a study of attributive adjective sequences belonging to the semantic field of size, examples of which are ‘enormous great’ and ‘wee little’. It takes as its starting point a brief outline of the phenomenon provided by the Cambridge Grammar of the English Language (Huddleston & Pullum, 2002), in which it is referred to as ‘intensificatory tautology’. The paper begins by defining the lexical set to be investigated, and thereafter provides details of the relevant adjectival sequences found in the British National Corpus. Particular attention is paid to the relatively frequent pairs great big, tiny little and little tiny. Information is also given with regard to other semantic fields which corpus data suggests could usefully be investigated.
The present paper aims to shed light on the diachronic evolution of two death-related intensifiers, dead and deadly, showing their subjectification and grammaticalisation over time. Data from the Middle English Dictionary, the Oxford English Dictionary, and three electronic databases (Early English Books Online, Eighteenth Century Fiction, and Online Books Page) are used to carry out a collocational analysis of both adverbial forms. A detailed study of the collocations of dead and deadly reveals different contexts of variation between the zero and the -ly counterparts. The paper additionally argues that these contexts of variation are not always random, and in certain cases are due to semantic considerations, while other occurrences of dead and deadly seem to point towards highly fossilised uses.
Contemporary spoken American English prefers go-V to go-and-V. However, this is only a synchronic snapshot. Using the Corpus of Historical American English, the present empirical study of the diachronic development of go-and-V and go-V in 19th- and 20th-century American English texts shows that the two constructions developed in remarkably divergent ways. Whereas go-V only started to rise significantly in frequency at the turn of the 20th century, displaying a more or less steady increase up to today’s norm, go-and-V dropped in frequency after peaking in the second half of the 19th century. A close look at the grammatical context shows that, depending on the verb form, go-V took over from go-and-V at different stages.
This paper describes the history and present status of a family of constructions containing two older (obsolescent and recessive) members, cannot choose but + bare infinitive and cannot but + bare infinitive, and two younger ones, cannot help -ing and cannot help but + bare infinitive. It is shown that cannot help but + bare infinitive constitutes an American-led innovation and that even today the type is distinctly more common and versatile in American than British English. In addition, the paper explores some major distributional constraints distinguishing between cannot help -ing and the three but-types. These involve differences between individual text types, the lexical diversity of the non-finite verb, and certain non-basic, especially Low Transitivity structures.
In this paper I propose an emancipation effect that may follow from the ‘reducing effect’ of frequency (Bybee 2006): if a reduced realization of an item gains in frequency, it will become conceptually independent from the full form. In a context of grammaticalization, I show that this is the case for the form gonna, which is becoming emancipated from its source form going to. I use corpus data of spoken American English to trace the process of emancipation as gonna sheds the features of phonetic reduction and acquires those of a lexical variant.
This study looks at the variations within preposition + noun + preposition (PNP) sequences such as at (the) risk of, commonly classified as complex prepositions (CPs). The current literature suggests that the more indivisible the structure, the more grammaticalised the unit. Representations of complex prepositions within contemporary grammars indicate that the most common intruder within the fixed PNP sequence is the definite article. Synchronic and diachronic corpus studies were carried out to assess how fixed the form with the definite article is, and whether any CPs have shown a recent tendency to lose it. Decategorialisation was found to be only a minor factor for the CPs investigated, with a combination of semantic and grammatical factors featuring in the grammaticalisation process.
Neology can be identified in a text corpus at surface level by automatic means (Renouf 1993a). In a diachronic corpus of journalism a lexical neologism can be found by comparing each word in a stream of data with a baseline index. A semantic neologism is identifiable through the change in the word’s collocational environment (Renouf 1993b). In this paper, we examine the changing status of neologisms across time, tracking the ‘life-cycle’ of a word (Renouf 2007), from its first appearance in our text, through its fluctuations in frequency and popularity, to its possible assimilation into mainstream language, and its possible death and re-birth. The study is based on a corpus of 1.2 billion words of UK mainstream newspaper text spanning 1989–2011.
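The surface-level detection of lexical neologisms described above, comparing each word in a chronological stream with a baseline index, might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' system: the function name `find_new_words`, the simple regex tokenizer and the toy data are all hypothetical.

```python
import re
from typing import Iterable, Iterator

# Assumed tokenizer: lowercase alphabetic words, allowing internal
# hyphens and apostrophes. A real system would need far more care.
TOKEN = re.compile(r"[a-z]+(?:['-][a-z]+)*")


def find_new_words(
    stream: Iterable[tuple[str, str]],
    baseline: Iterable[str],
) -> Iterator[tuple[str, str]]:
    """Yield (period, word) for each word not yet in the index.

    `stream` is a chronologically ordered sequence of (period, text)
    items; `baseline` is the set of words already known.
    """
    index = set(baseline)
    for period, text in stream:
        for word in TOKEN.findall(text.lower()):
            if word not in index:
                # First appearance: report it, then add it to the index
                # so that later occurrences are no longer counted as new.
                index.add(word)
                yield period, word


if __name__ == "__main__":
    # Toy data for illustration only.
    baseline = {"the", "minister", "resigned", "today"}
    stream = [
        ("period-1", "The minister resigned today"),
        ("period-2", "The minister zorbed today"),
        ("period-3", "The minister zorbed and unresigned"),
    ]
    for period, word in find_new_words(stream, baseline):
        print(period, word)
```

A sketch like this only flags first appearances of unseen word forms. Semantic neologisms, which the abstract identifies through changes in collocational environment, would need a separate comparison of a word's collocates across time.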
This paper is a work-in-progress report which examines the gender assignment of 950 recently borrowed English nouns manually extracted from lists of anglicism candidates from 2008–2010 in the Norwegian Newspaper Corpus. A corpus-based approach was applied in the search for the distribution of gender in the same corpus, as well as in other available corpora of Norwegian. In addition to presenting some data pertaining to gender assignment and possible assignment rules or principles, the paper also briefly addresses methodological issues such as the suitability of corpora for loanword identification and extraction.
The paper explores how verb-particle combinations have changed with the increased use of online real-time short communication forms. Following up on earlier research (Diemer 2008b & 2009), the study discusses examples of new prefix verbs from a web-based corpus of blogs, providing evidence that the long-term decline of this verb form in English has been reversed in computer-mediated communication, which facilitates the creation and increasingly flexible use of previously non-standard prefix verbs like inbe, oncome and atstand. Proposed reasons for this change are the influence of other languages on English, analogy with existing prefix verbs, special-purpose use, playful use of language, facilitation of syntax and demands of brevity.
Pattern Grammar (Hunston & Francis 2000) has focused mainly on complementation patterns, but Hunston (2003, 2011) has speculated on whether Pattern Grammar should incorporate the study of modality by considering the interaction between modal meaning and particular patterns. This paper presents a quantitative study that finds an association between verbs followed by interrogative clauses (the V wh pattern) and modal verbs/the to-infinitive. This is followed by a qualitative investigation that classifies, according to meaning, phrases centred on the sequence to V wh: four main groups (purpose, difficulty, deontic meaning, volition/intention) are found. The findings raise the prospect of both broadening the scope of Pattern Grammar and improving our understanding of modality from a phraseological perspective.
This paper sets out to explore and evaluate several corpus search methods that are applied to uncover linguistic devices expressing ‘quantity approximation’ in a corpus of business English from an onomasiological perspective. The study is carried out within the framework of a project exploring quantity approximation in various business genres using a contrastive, corpus-driven approach (in Dutch, English and French). The paper sheds light on the advantages and disadvantages of using annotated corpora (part-of-speech and semantic tagging) and automatically extracted word lists for onomasiological investigations. The analysis of the results provides valuable insights into the way these methods might successfully complement each other to uncover a wide variety of linguistic devices expressing a specific notion, in this case quantity approximation.