The fastest-growing use of globalised English is among speakers for whom it is not a first language, that is, English used as a lingua franca (ELF). Keeping up with developments in the language under such varying circumstances poses a challenge to research: how can we access reliable data that captures new directions in this expanding use of English? How should we go about securing enough data in a new area of language use, where variability is highly unpredictable and change is likely to be fast? Clearly, corpus methods have a lot to offer in teasing out the big picture and emergent patterning from the bewildering detail in which small-scale studies easily drown. ELF has established itself particularly in two influential and inherently international domains: science and business. Both are high-stakes domains where language plays an important role. It makes sense to pay close attention to the ways English works in them and how it takes shape. This paper looks into the scientific sphere, and draws on the experience of compiling and analysing the first ELF corpus, comprising academic speech (ELFA: www.eng.helsinki.fi/elfa). It will tackle issues of data selection, relevance, and meaningful combinations of analytical methods.
This study aims to investigate how speakers employ personal pronouns (we, you, I) in two types of monologic academic speech, undergraduate lectures and public lectures, through analysis of the Michigan Corpus of Academic Spoken English (MICASE). Not only the frequency of instances of personal pronouns but also two linguistic environments were examined: words placed before and after the pronoun. The results show both common features and variations in the two types of academic speech. “You” was the most common personal pronoun in both undergraduate and public lectures. Variations seem to be related to the purpose of the speech and the relationship between the speaker and the audience.
Rapid developments in technology, together with dependence on electronically-mediated communication, are providing international business with new opportunities and challenges. This paper focuses on the attempts of a multinational corporation to achieve the goals of internal “employer branding” through a communications network which makes use of a variety of sub-genres and registers, written, spoken and multimodal. Three corpora will be compared and contrasted: oral presentations in audio-conference mode, the accompanying PowerPoint slides, and a series of e-newsletters to employees. Using the textware of corpus linguistics, the mutual influence and multimodal blending between the spoken and written sub-genres will be traced, in terms of metadiscoursal structure and organization, interactional strategies, and terminological and metaphorical usage, in order to identify the written constraints on semi-formal speech production, and conversely the “speech in writing” typical of much e-mediated communication.
This paper illustrates an application of grammatical tagging as a methodological tool for the investigation of small specialized corpora. A contrastive analysis was performed on two tagged corpora that represent genres used for the purpose of financial disclosure: spoken earnings presentations and written earnings releases. The analysis focused on two key features that could be studied systematically and comprehensively thanks to grammatical tagging: lexical density and evaluative adjectives. The results revealed interesting differences between the two corpora that appeared to be influenced by mode, interactional setting, and role/status of speakers and writers. The study shows how grammatical tagging offers new ways to integrate quantitative and qualitative methods in order to better understand discourse used in specific communicative contexts.
This paper reports on the intonation analysis of yes-no questions in 15 regional varieties of Italian. The study was carried out on a section of the CLIPS corpus consisting of a collection of Map Task dialogues in Northern, Central, and Southern accents considered representative of Italian regional variation. Results show that, contrary to what is generally assumed in the literature, the most widespread intonation pattern for questions is rising-falling (not falling-rising), and the distribution of the rising-falling and falling-rising contour types across varieties is not regionally conditioned. Interestingly, for some varieties the yes-no question intonation found here differs from that reported in previous studies based on laboratory speech only. These findings confirm the fundamental importance of speaking style when analysing Italian intonation (especially where questions are involved), and make it clear that attention needs to be paid to elicitation methodology when acquiring or building corpora of spoken Italian with the aim of investigating intonation.
The Estonian Emotional Speech Corpus was created as an acoustic basis for synthesis of emotional speech. The present state of the Corpus enables an assessment of the choices made for its underlying theoretical model: whether it was justified to use non-acted speech and whether a difference has been established between the sentences where emotion is carried by voice only and those where sentence content may have influenced emotion identification.
Face-to-face and movie conversation are usually claimed to differ: the first is often described as the quintessence of spontaneity, the second as the quintessence of artificiality. In fact, few empirical studies demonstrate this and, in spite of what is generally maintained in the literature, the empirical data investigated here by applying Biber’s (1988) Multi-Dimensional approach prove that the involved production typical of face-to-face conversation also characterizes movie conversation. This resemblance has interesting implications for the teaching of spoken discourse, as movies may be used effectively as a valid source of material. The present research also illustrates an experiment with third-year Italian students of English that proves this potential, especially in the learning of elisions, blends, repetitions, false starts, reformulations, discourse markers, and interjections.
This paper considers the ways in which geographical variation can be explored both quantitatively and qualitatively using the Scottish Corpus of Texts & Speech (SCOTS). The resource is a freely accessible online corpus of written and spoken texts in the Germanic languages of Scotland (Scottish English and Scots), which can offer an insight into current Scots usage and speaker attitudes towards language. The paper introduces the corpus, giving an overview of the geographically-defined varieties of Scots represented in it, and demonstrates how the complex web of variation can be analysed quantitatively using integrated corpus tools. It then begins to explore qualitatively the ways in which participants in spoken documents talk about geographical, and closely related social, language variation in Scotland.
This paper discusses the possibilities for research with the Zurich English Newspaper Corpus (ZEN) and ways of expanding this corpus. The usefulness of text classes, and the problems with them, are shown. The text class Foreign News was the most prominent one in the early 18th century. The empirical study deals with a special collection of newspapers within the ZEN Corpus, the papers of January 1701. Six newspapers are examined and some aspects of variation (morphological and text-linguistic) are investigated. The need for newspaper corpora covering the same month is shown by comparing the treatment of the same topic in different papers. Besides the study of grammatical variation, this will give linguistic answers to the classification of early English newspapers.
A concern of Ruskin, guidebook writing, has remained relatively marginal to critical discourse. Yet he produced a well-known work addressed to travellers to Italy, Mornings in Florence, that can be termed a ‘guidebook’. The paper analyses this text with a view to investigating how heritage sites and places are construed from the writer’s point of view in the context of the development of modern travel guides from diaries and personal notes to works addressing a wide audience of tourists. It is here assumed that the perception and textual construction of space vary in accordance with shifting cultural frameworks and world views. To investigate the text in electronic form, the phraseological approach developed by Francis and Hunston (2000), Stubbs (2001) and Hunston (2008) has been adopted.
In The Uses of Argument (1958) Toulmin illustrated the concepts of field-invariant/dependent argumentation citing, among others, the discourse of art criticism, without specifying in detail how this instantiates the model there presented. This chapter tests the model’s applicability to aesthetic discourse by examining a small historical corpus of exhibition reviews. The analysis shows that, as prescribed by the model, claims are there supported by arguments whose relevance is underwritten by warrants, though mostly these are tacitly invoked. It further reveals synchronic and diachronic variation in the kind of warrant invoked, in apparent correspondence to a historical shift in the kind of statement prevalently used to make aesthetic claims.
The paper examines a small corpus of texts from biostatistics, a discipline whose discourse has not yet been explored, from the point of view of its evolution in terms of textual organisation and models. The research aims to explore the diachronic variations in the conceptual encoding of the discipline, its methodology and the grammatical structures used in the presentation, argumentation and interpretation of numerical data applied to the biosciences. It contrastively examines texts from three historical periods, focussing in particular on evolutions in foregrounding structures (morphological and syntactic arrangements), figurative language and the typical characteristics of scientific registers (depersonalization and thematizing). The approach is both qualitative (semantic, pragmatic and rhetorical characteristics) and quantitative (keywords, phraseology and collocation) and signals the similarities and differences over time in the texts in terms of conceptual and lexical choices, and the discursive construction of the identity of the scientific community in communicating disciplinary theories. ConcApp software has been used for the quantitative analysis.
This paper explores discoursal change in research article abstracts in tourism studies. Based on a corpus of research article abstracts published over a span of thirty years in three prominent academic journals, changes in the patterns of use of the demonstrative ‘this’ are investigated. Findings show that it is increasingly used with a narrow range of lexical items which seem to signal change in the way authors introduce their research to the discourse community and persuade readers to continue to read the research article.
This paper describes changes in the frequency and use of some selected linguistic features in the language of Italian printed news: left dislocations, sentence-initial connectives, sentence length, lexical density and subordinating conjunctions. The study adopts a diachronic approach and relies on a corpus-based methodology. Language change was measured between 1985 and 2000 using two sub-sections of the Repubblica corpus. The main hypothesis is that these sixteen years highlighted significant changes emerging in newspaper language, partly due to the competition of commercial television, which entered a phase of exponential development in this period. More specifically, these changes reflect the attempt of written news to reproduce communicative styles more suitable to orality and to real-time events.
Corpora with temporal information enable us to observe and analyze time-related aspects of language. Here we will discuss two types of such case studies involving the Japanese language. In the first half of the article, we will analyze diachronic changes of the grammar and expressions of contemporary Japanese based upon the texts of the minutes of the National Diet of Japan. In the second half, we will use texts of daily newspapers in order to observe the ways in which the use of vocabulary items may change during the course of the year.
The present diachronic study compares two large contemporary corpora of British quality newspapers with the aim of investigating the increased popularisation of newspaper register. The study focuses on those examples which are highlighted by a quantitative comparative overview of the two corpora based on a series of analyses using keyword and concordancing tools. Results show that a shift in presentation and style is indeed present, with an increased ‘familiarisation’ of language, in particular the use of spoken forms. Given the high keyness of these words, and their preference for direct speech, it was expected that this increase would be reflected in a similar increase in frequency of quotes. However, this was not found.