The use of vague language is one of the most common features of everyday spoken English. Speakers regularly use vague expressions to project shared knowledge (e.g., pens, books, and that sort of thing) as well as to make approximations (e.g. around sevenish; he’s sort of tall). Research shows that many of the most common single word items in a core vocabulary form part of vague language fixed expressions (e.g. thing in that kind of thing). This chapter will address the use of vague language in a new corpus of academic English, the Limerick-Belfast Corpus of Academic Spoken English (LIBEL CASE). The LIBEL corpus consists of one million words of spoken data collected in two universities on the island of Ireland, one in the Republic of Ireland and one in Northern Ireland. Analysis of the LIBEL corpus identified forms and functions of vague language in an academic context and these findings are compared with two corpora of everyday spoken language from the Republic of Ireland and the United Kingdom, the Limerick Corpus of Irish English (LCIE) and the Cambridge and Nottingham Corpus of Discourse in English (CANCODE). Cross-corpora comparison allowed us to look at how forms and frequencies of certain vague language expressions vary across casual and formal/institutional contexts. Within the academic data we build on Walsh’s work (see for example Walsh 2002, 2006) to show how vague language use is relative to mode of discourse at any given stage of classroom interaction. We suggest that these qualitative differences are a valuable means of understanding the complex relationship between language and learning.
The role played by mitigation in academic discourse has been widely debated in the literature, but little attention has been paid to emphatics, expressions used to intensify the degree of certainty of an utterance and to increase its illocutionary force. Focusing on the use of adverbs in journal articles and on their evaluative orientations/parameters, the chapter looks at how their frequencies, meanings and uses vary across two “soft” disciplines: history and economics. The study combines a corpus and a discourse perspective, and shows that emphatics signal “engagement” as well as “stance”, by positioning research in the context of disciplinary debate, highlighting the significance of the data or the conclusions produced, and negotiating convergent or conflicting positions with the reader.
This chapter aims to illustrate one way in which corpus-linguistic methods and specialised corpora can be combined in work on academic discourse. It reports selected findings from a study of social interaction in research articles written by German, British and US-American humanities academics, based on the 1-million-word SCEGAD corpus. While the main interest of the project was in possible cultural differences in academic discourse, statistical analysis was also used to examine the influence of linguistic background, discipline, author age, status and gender on the construction of identity and the encoding of social relations in academic writing. The findings reveal significant cultural differences, but also demonstrate the influence of variables such as discipline, gender and academic status on author-reader interaction and identity construction in scholarly texts.
This chapter brings instances of humour and laughter into relief using a corpus of authentic institutional interaction of English language teachers in school staff meetings. Humour is used within the meetings as a means of showing mutual support and creating solidarity. The corpus also contains a large proportion of subversive humour, or humour which is directed against the institution, individuals in the group, the group itself and the students. Identifying humour in the data is not a simple case of finding instances of laughter or assuming that it signifies either the intention of the speaker to elicit laughter, or to be humorous. However, wherever humour is manifested, laughter frequently occurs. The methodological issue of identifying and transcribing humour is discussed.
This chapter reports on a study which combines corpus-based and genre-based approaches to the analysis of a 225,000-word corpus of 60 environmental recommendation-based reports. I first describe the discourse-based coding system, which draws on the concept of genre move structure analysis, accounting for three different, but inter-related levels of text: (i) macrostructure; (ii) genre structure, and (iii) textual patterning, i.e. elements of the Problem-Solution pattern. I then describe the keyword analysis for the corpus as a whole and the key-key word analysis for each individual report. These keyword analyses provide internal linguistic evidence for classifying the reports as Problem-Solution based. An analysis of selected words (problem / problems and impact / impacts) reveals that their collocational behavior and involvement in certain causative phrases are related to specific discourse-based move structures.
This study examines the relationship between the phraseological characteristics of language and the communicative role of discourse intonation (Brazil 1997). The findings are based on one of the four sub-corpora of the one-million-word Hong Kong Corpus of Spoken English (HKCSE), which has been prosodically transcribed. A number of studies have looked at word associations, but this is the first corpus-based study of speakers’ discourse intonation choices for these patterns. The intonational features, viz. tone unit boundaries and prominences, of the ten most frequent 3- and 4-lexically-rich word associations and the ten most frequent grammatically-rich word associations in the sub-corpus of public discourse, which forms 25% of the HKCSE, were examined to determine the extent to which this patterning also reveals patterns of discourse intonation. The findings suggest that discourse intonation patterns do exist in terms of tone unit boundaries and the distribution of prominence. However, while discourse intonation patterns are discernible, speakers may, and indeed do, deviate from them in order to alter their discourse-specific communicative role.
We examine a corpus of texts drawn from 11 US newspapers and related to the 2004 US presidential election, focusing on hearsay evidentiality, the reporting of what one has heard from others. Motivated by the general question of whether bias exists in news reporting, we analyze the sources to whom statements in the corpus are attributed, in order to determine who gets to speak through the press, and whether there is balance between the two sides in this election. We also examine the ways in which speech is reported, asking questions about the use of direct vs. indirect speech, the explicitness of source identification, and the effects that the choice of reporting word can have on the portrayal of a source. Although we find slight evidence of an apparent preference for one candidate or the other in certain papers, overall we find no statistically significant differences that could be construed as bias.
Motivated by ESL (English as a Second Language) concerns, this study compares the language of a U.S. situation comedy, Friends, with natural conversation. A corpus of transcripts of the television show and the American conversation subcorpus of the Longman Grammar Corpus are used for analysis. This data-driven investigation combines multidimensional (MD) methodology (Biber 1988) with a frequency-based analysis of a large number of linguistic features associated with the typical characteristics of face-to-face conversation. The results of the MD analysis indicate that Friends shares the core linguistic characteristics of face-to-face conversation, thus constituting a fairly accurate representation of natural conversation for ESL purposes. However, a closer look at the linguistic features revealed interesting functional differences between the two corpora. These differences pointed to distinct functional patterns (e.g., vagueness, emotional language) suggested by the association of linguistic features sharing similar discourse functions.
This chapter is an analysis of a 100,000-word corpus consisting of message-board postings on hip-hop websites. A discourse analysis of this corpus reveals three strategies employed by the posters to identify themselves as members of the hip-hop community in the otherwise anonymous setting of the internet: (1) defined openings and closings, (2) repeated use of slang and taboo terms, and (3) performance of verbal art. Each strategy is characterized by the codification of non-standard grammar and pronunciations characteristic of speech, as well as by the use of non-standard orthography. The purpose of the discourse is shown to be a performance of identity, whereby language is used and recognized as the discursive construction of one’s hip-hop identity.
This chapter offers a new description of the use of the it-cleft construction in nineteenth-century English. The data for the present study are primarily from historical corpora (a corpus of nineteenth-century English, CONCE, and the Helsinki Corpus of English Texts), but findings from modern corpora and studies of cleft constructions in present-day English (e.g. Collins 1991) are also presented. The results show that it-clefts become more frequent in the 19th century and particularly in speech-related texts, such as trials. This is contrary to both earlier and later periods of English, where it-clefts are more common in written English. The chapter discusses how the structure of the it-cleft and its thematic organisation may have contributed to its increased frequency in 19th-century English. An in-depth analysis of the forms and functions of it-clefts in trials, the genre that most closely represents spoken English of the period, is provided.
This chapter builds on previous research that has established the spoken nature of learner writing by providing quantitative and qualitative accounts of time and place adverbs in student writing in comparison to published academic English writing and native English conversation. The chapter shows that the frequency differences among learner groups are not nearly as great as the frequency differences between student writing and conversation. The qualitative analyses point to some L1-L2 differences, particularly with respect to here. The other most pronounced differences were not found as L1-L2 differences but instead showed evidence of divergence due to language background.