Studies in Corpus Linguistics, volume 82 (ISSN 1388-0373)
Applications of Pattern-driven Methods in Corpus Linguistics
Edited by Joanna Kopaczyk (University of Glasgow) and Jukka Tyrkkö (Linnaeus University, Växjö)
John Benjamins Publishing Company
E-book: ISBN 9789027264565; DOI 10.1075/scl.82; LCCN 2017050052
Language: English. Extent: vii, 313 pp.
Subjects: Computational & corpus linguistics; Corpus linguistics; Theoretical linguistics (BISAC LAN009000; BIC CFX)

The use of corpora has conventionally been envisioned as either corpus-based or corpus-driven. While the formal definition of the latter term has been widely accepted since it was established by Tognini-Bonelli (2001), it is often applied to studies that do not, in fact, fulfil the fundamental requirement of a theory-neutral starting point. This volume proposes the term pattern-driven as a more precise alternative. The chapters illustrate a variety of methods that fall under this broad methodology, such as the extraction of lexical bundles, POS-grams and semantic frames, and demonstrate how these approaches can uncover new understandings of both synchronic and diachronic linguistic phenomena.
Contents
Acknowledgements, p. vii
Chapter 1. Present applications and future directions in pattern-driven approaches to corpus linguistics. Jukka Tyrkkö (Linnaeus University) and Joanna Kopaczyk (University of Glasgow), pp. 1–12
Part I. Methodological explorations
Chapter 2. From lexical bundles to surprisal and language models: Measuring the idiom principle in native and learner language. Gerold Schneider (University of Zurich and University of Konstanz) and Gintare Grigonyte (University of Stockholm), pp. 15–56
Chapter 3. Fine-tuning lexical bundles: A methodological reflection in the context of describing drug-drug interactions. Łukasz Grabowski (University of Opole), pp. 57–80
Chapter 4. Lexical obsolescence and loss in English: 1700–2000. Ondřej Tichý (Charles University in Prague), pp. 81–103
Part II. Patterns in utilitarian texts
Chapter 5. Constance and variability: Using PoS-grams to find phraseologies in the language of newspapers. Antonio Pinna and David Brett (Università degli Studi di Sassari), pp. 107–130
Chapter 6. Between corpus-based and corpus-driven approaches to textual recurrence: Exploring semantic sequences in judicial discourse. Stanisław Goźdź-Roszkowski (University of Łódź), pp. 131–158
Chapter 7. Lexical bundles in Early Modern and Present-day English Acts of Parliament. Anu Lehto (University of Helsinki), pp. 159–186
Part III. Patterns in online texts
Chapter 8. Lexical bundles in Wikipedia articles and related texts: Exploring disciplinary variation. Turo Hiltunen (University of Helsinki), pp. 189–212
Chapter 9. Join us for this: Lexical bundles and repetition in email marketing texts. Joseph McVeigh (University of Jyväskylä), pp. 213–250
Chapter 10. I don’t want to and don’t get me wrong: Lexical bundles as a window to subjectivity and intersubjectivity in American blogs. Federica Barbieri (Swansea University), pp. 251–276
Chapter 11. Blogging around the world: Universal and localised patterns in Online Englishes. Joanna Kopaczyk (University of Glasgow) and Jukka Tyrkkö (Linnaeus University, Växjö), pp. 277–310
Index, p. 311

Publisher: John Benjamins Publishing Company, Amsterdam (https://benjamins.com). Published 13 March 2018; © 2018 John Benjamins.
E-book edition: based on the hardback edition (ISBN 9789027200136); sales rights worldwide; available from Google Play (https://play.google.com/store/books); price EUR 99.00 / GBP 83.00 / USD 149.00.
Hardback edition: ISBN 9789027200136; DOI 10.1075/scl.82; LCCN 2017045531; weight 730 g. Dewey 410.1/88; LC classification P128.C68 2018; subject headings: Corpora (Linguistics)--Congresses; Applied linguistics--Congresses.
Chapter abstracts

Part I. Methodological explorations

Chapter 2. From lexical bundles to surprisal and language models: Measuring the idiom principle in native and learner language
Gerold Schneider (University of Zurich and University of Konstanz) and Gintare Grigonyte (University of Stockholm)
pp. 15–56; DOI 10.1075/scl.82.02sch

We exploit the information-theoretic measure of surprisal to analyze the formulaicity of lexical sequences. We first show the prevalence of individual lexical bundles and then argue that surprisal, as an information-theoretic measure of lexical bundleness, formulaicity and non-creativity, is an appropriate measure of the idiom principle, because it expresses reader expectations and text entropy. Since strong and gradient formulaic, idiomatic and selectional preferences prevail at all levels, we argue for abstracting from individual bundles to measures of bundleness. We use surprisal to analyze differences between genres of native language use and learner language at different levels: (a) spoken and written genres of native language (L1); (b) spoken and written learner language (L2), across selected written genres; (c) learner language compared with native language (L1). We thus test Pawley and Syder’s (1983) hypothesis that native speakers know best how to play the tug-of-war between formulaicity (Sinclair’s idiom principle) and expressiveness (Sinclair’s open-choice principle), which can be measured with Levy and Jaeger’s (2007) uniform information density (UID), a principle of minimizing comprehension difficulty. Our goal of abstracting away from word sequences also leads us to language models as models of processing, first in the form of a part-of-speech tagger and then in the form of a syntactic parser. While our hypotheses are largely confirmed, we also observe that advanced learners bundle the most, and that scientific language may show lower surprisal than spoken language.
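As a minimal illustration of the measure itself, the surprisal of a word is S(w_i) = -log2 P(w_i | w_{i-1}) bits. The sketch below assumes a toy training text and a simple add-one-smoothed word bigram model; it is not the tagger- and parser-based language models the chapter uses. It reports mean surprisal per sequence and, as an assumed rough proxy for how evenly information is spread (the idea behind UID), the variance of surprisal. A formulaic sequence seen in training should come out with lower mean surprisal than a scrambled one.

```python
import math
from collections import Counter


def train_bigram(tokens):
    """Count unigrams and bigrams for an add-one-smoothed bigram model."""
    return Counter(tokens), Counter(zip(tokens, tokens[1:]))


def surprisal(prev, word, unigrams, bigrams, vocab_size):
    """Surprisal in bits, -log2 P(word | prev), with add-one (Laplace) smoothing."""
    p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
    return -math.log2(p)


def surprisal_profile(tokens, unigrams, bigrams, vocab_size):
    """Surprisal of every token given its predecessor (the first token is skipped)."""
    return [surprisal(p, w, unigrams, bigrams, vocab_size)
            for p, w in zip(tokens, tokens[1:])]


def mean_and_variance(values):
    """Mean surprisal and its variance (an assumed, rough proxy for UID)."""
    mean = sum(values) / len(values)
    return mean, sum((v - mean) ** 2 for v in values) / len(values)


# Toy training text (hypothetical); a real study would train on a large corpus.
train = "on the other hand it is the case that on the other hand".split()
unigrams, bigrams = train_bigram(train)
vocab_size = len(unigrams) + 1  # one extra slot for unseen word types

formulaic = surprisal_profile("on the other hand".split(), unigrams, bigrams, vocab_size)
scrambled = surprisal_profile("hand other the on".split(), unigrams, bigrams, vocab_size)
print("formulaic mean/variance:", mean_and_variance(formulaic))
print("scrambled mean/variance:", mean_and_variance(scrambled))
```

Swapping the bigram model for a stronger language model changes only how P(w_i | context) is estimated; the surprisal and variance calculations stay the same.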

Chapter 3. Fine-tuning lexical bundles: A methodological reflection in the context of describing drug-drug interactions
Łukasz Grabowski (University of Opole)
pp. 57–80; DOI 10.1075/scl.82.03gra

This chapter has two major aims. First, it attempts to extend earlier research on recurrent phraseologies in the pharmaceutical field (Grabowski 2015) by exploring the use, distribution and functions of lexical bundles in English texts describing drug-drug interactions. Conducted from an applied perspective, the study uses 300 text samples extracted from the DrugDDI Corpus, whose texts were originally collected from the DrugBank database (Segura-Bedmar et al. 2010). Second, apart from presenting new descriptive data, the chapter reflects on the ways lexical bundles have typically been explored across different text types and genres. The problems discussed concern the methods used to deal with structurally incomplete bundles, to filter out overlapping bundles, and to select, for qualitative analysis, a representative sample of bundles other than the most frequent ones. The chapter is therefore meant to help researchers fine-tune the methods used to explore lexical bundles according to the specific research material, research questions and scope of the analysis.
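One common way to handle overlapping bundles, sketched here as an assumption rather than as the chapter's own procedure, is to extract bundles of several lengths and discard a shorter bundle when it is contained in a longer bundle with almost the same frequency, since the shorter one then rarely occurs outside the longer sequence. The minimum frequency of 2 and the 0.8 overlap ratio are hypothetical thresholds, and the text is an invented miniature.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_bundles(tokens, sizes=(3, 4), min_freq=2):
    """Map each n-gram of the given sizes to its frequency, keeping those >= min_freq."""
    bundles = {}
    for n in sizes:
        for gram, freq in Counter(ngrams(tokens, n)).items():
            if freq >= min_freq:
                bundles[gram] = freq
    return bundles


def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous subsequence of `longer`."""
    k = len(shorter)
    return any(longer[i:i + k] == shorter for i in range(len(longer) - k + 1))


def filter_overlaps(bundles, ratio=0.8):
    """Drop a bundle if a longer bundle containing it has >= ratio of its frequency."""
    kept = {}
    for gram, freq in bundles.items():
        subsumed = any(
            len(other) > len(gram) and contains(other, gram) and other_freq >= ratio * freq
            for other, other_freq in bundles.items()
        )
        if not subsumed:
            kept[gram] = freq
    return kept


# Hypothetical toy text, lower-cased and stripped of punctuation.
text = ("may increase the plasma concentration of drug a "
        "may increase the plasma concentration of drug b "
        "may increase the risk of bleeding")
kept = filter_overlaps(extract_bundles(text.split()))
for gram, freq in sorted(kept.items(), key=lambda kv: -kv[1]):
    print(freq, " ".join(gram))
```

Here "may increase the" survives because it also occurs outside the longer bundle, while three-word fragments of "may increase the plasma concentration of drug" are absorbed by their four-word containers.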

Chapter 4. Lexical obsolescence and loss in English: 1700–2000
Ondřej Tichý (Charles University in Prague)
pp. 81–104; DOI 10.1075/scl.82.04tic

This paper explores a new methodology for extracting from large corpora forms that were once common but are now obsolete. It proceeds from the relatively under-researched problem of lexical mortality, or obsolescence in general, to the formulation of two closely related procedures for querying the n-gram data of the Google Books project. These procedures aim to identify the best candidate words and lexical expressions that may have become lost or obsolete over the last three centuries, from the Late Modern era to Present-day English (1700–2000). After describing the techniques used to process large unigram and trigram datasets, the chapter offers a selective analysis of the results and proposes ways in which the methodology may help corpus linguists as well as historical lexicographers.
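The abstract does not spell out the two procedures, so the following is only a hedged illustration of the general idea: flag words whose normalized frequency (per million words) was substantial in an early period but has fallen close to zero in a late period. The yearly totals, counts, word labels, and the `min_early` and `max_ratio` thresholds are all hypothetical; real Google Books n-gram files would first have to be aggregated into this shape.

```python
def per_million(count, total):
    """Normalize a raw count to occurrences per million words."""
    return count / total * 1_000_000


def period_rate(word_counts, totals, years):
    """Mean normalized frequency of one word over the given years (missing years count as 0)."""
    rates = [per_million(word_counts.get(y, 0), totals[y]) for y in years]
    return sum(rates) / len(rates)


def obsolescence_candidates(counts, totals, early, late, min_early=10.0, max_ratio=0.05):
    """Words common in `early` whose `late` rate fell to <= max_ratio of the early rate."""
    candidates = []
    for word, word_counts in counts.items():
        early_rate = period_rate(word_counts, totals, early)
        late_rate = period_rate(word_counts, totals, late)
        if early_rate >= min_early and late_rate <= max_ratio * early_rate:
            candidates.append((word, round(early_rate, 2), round(late_rate, 2)))
    # Largest absolute decline first.
    return sorted(candidates, key=lambda c: c[1] - c[2], reverse=True)


# Hypothetical yearly totals and counts; real input would be aggregated n-gram data.
totals = {1750: 1_000_000, 1760: 2_000_000, 1990: 50_000_000, 2000: 60_000_000}
counts = {
    "obsolete_form": {1750: 400, 1760: 700, 1990: 100, 2000: 80},
    "stable_form": {1750: 300, 1760: 600, 1990: 15_000, 2000: 18_000},
    "rare_form": {1750: 2, 1760: 3},
}
result = obsolescence_candidates(counts, totals, early=[1750, 1760], late=[1990, 2000])
print(result)
```

Normalizing by yearly totals matters because the volume of digitized text grows enormously across the period; raw counts alone would make almost every word look like it was rising.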

Part II. Patterns in utilitarian texts

Chapter 5. Constance and variability: Using PoS-grams to find phraseologies in the language of newspapers
Antonio Pinna and David Brett (Università degli Studi di Sassari)
pp. 107–130; DOI 10.1075/scl.82.05pin

This paper describes the use of a corpus-driven methodology, the retrieval of part-of-speech-grams (PoS-grams), which is extremely effective for discovering phraseologies that might otherwise remain hidden. A PoS-gram is a string of part-of-speech categories (Stubbs 2007: 91); its tokens are strings of words annotated with those PoS tags. A list of PoS-grams retrieved from a sample corpus can be compared with a list from a reference corpus, and statistically significant items are then analysed further to identify recurrent patterns and potential phraseologies. The utility of PoS-grams is illustrated through an analysis of the Sassari Newspaper Article Corpus (SNAC), a one-million-token corpus of texts from ten sections of The Guardian.
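The comparison step can be sketched as follows, assuming tagged input as (word, tag) pairs and fixed-length PoS-grams. The abstract does not name the significance statistic, so Dunning's log-likelihood (G2, with 3.84 as the p < 0.05 cut-off for one degree of freedom) is an assumption, as are the TreeTagger-style tags and the miniature corpora. Each PoS-gram keeps its word-level realizations so that the phraseology behind a significant tag string can be inspected.

```python
import math
from collections import Counter, defaultdict


def pos_grams(tagged, n):
    """Yield (tag sequence, word sequence) for every window of n tagged tokens."""
    for i in range(len(tagged) - n + 1):
        window = tagged[i:i + n]
        yield tuple(t for _, t in window), tuple(w for w, _ in window)


def log_likelihood(a, b, c, d):
    """Dunning's G2 for frequency a in a corpus of size c vs. b in a corpus of size d."""
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def key_pos_grams(target, reference, n=3, threshold=3.84):
    """PoS-grams over-represented in target relative to reference, with example strings."""
    t_counts, realisations = Counter(), defaultdict(Counter)
    for tags, words in pos_grams(target, n):
        t_counts[tags] += 1
        realisations[tags][" ".join(words)] += 1
    r_counts = Counter(tags for tags, _ in pos_grams(reference, n))
    c, d = sum(t_counts.values()), sum(r_counts.values())
    results = []
    for tags, a in t_counts.items():
        b = r_counts[tags]
        g2 = log_likelihood(a, b, c, d)
        if g2 >= threshold and a / c > b / d:  # keep only over-use in the target
            results.append((tags, round(g2, 2), realisations[tags].most_common(3)))
    return sorted(results, key=lambda r: r[1], reverse=True)


# Hypothetical miniature corpora.
target = [("the", "DT"), ("minister", "NN"), ("said", "VVD"), ("on", "IN"), ("Monday", "NP")] * 10
reference = [("the", "DT"), ("cat", "NN"), ("sat", "VVD"), ("on", "IN"), ("the", "DT"), ("mat", "NN")] * 10
results = key_pos_grams(target, reference)
for tags, g2, examples in results:
    print(" ".join(tags), g2, examples)
```

Tag strings shared by both corpora at similar rates (DT NN VVD here) drop out, while the newspaper-like VVD IN NP surfaces together with its realization "said on Monday".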

Chapter 6. Between corpus-based and corpus-driven approaches to textual recurrence: Exploring semantic sequences in judicial discourse
Stanisław Goźdź-Roszkowski (University of Łódź)
pp. 131–158; DOI 10.1075/scl.82.06goz

The article investigates the link between lexical and meaning patterns in the specialized discourse of judicial opinions. It analyses the N that pattern in a corpus of US Supreme Court opinions, looking at how a selection of nouns found in the pattern are distributed across discourse functions. Judicial opinions are shown to use a range of status-indicating nouns in the N that pattern to perform five main functions: evaluation, cause, result, confirmation and existence. Of these, evaluation plays a central role in judicial writing, and most status-indicating nouns are used to signal sites of contention, i.e. challenged propositions are likely to be labelled as arguments, assumptions, notions or suggestions. Drawing on the concept of semantic sequence (Hunston 2008), the analysis illustrates how corpus-based and corpus-driven approaches can complement one another to build a picture of common epistemological practices in the corpus of legal texts.
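Retrieving candidates for the N that pattern from a POS-tagged corpus can be sketched as a search for a common noun immediately followed by that. The sketch assumes (word, tag) input with Penn Treebank tags, where the complementizer that is tagged IN and relative-pronoun that is tagged WDT, so the tag filter removes many relative clauses; tagger errors mean the counts would still need manual checking, and the sentences are invented.

```python
from collections import Counter

NOUN_TAGS = {"NN", "NNS"}  # Penn Treebank common-noun tags (assumed tag set)


def n_that_pattern(tagged):
    """Count nouns immediately followed by complementizer 'that' (Penn tag IN)."""
    counts = Counter()
    for (word, tag), (next_word, next_tag) in zip(tagged, tagged[1:]):
        if tag in NOUN_TAGS and next_word.lower() == "that" and next_tag == "IN":
            counts[word.lower()] += 1
    return counts


# Hypothetical tagged sentences.
tagged = [
    ("The", "DT"), ("argument", "NN"), ("that", "IN"), ("the", "DT"), ("statute", "NN"),
    ("is", "VBZ"), ("void", "JJ"), ("fails", "VBZ"), (".", "."),
    ("The", "DT"), ("case", "NN"), ("that", "WDT"), ("we", "PRP"), ("decided", "VBD"),
    ("controls", "VBZ"), (".", "."),
    ("The", "DT"), ("assumption", "NN"), ("that", "IN"), ("Congress", "NNP"),
    ("intended", "VBD"), ("this", "DT"), ("is", "VBZ"), ("wrong", "JJ"), (".", "."),
]
counts = n_that_pattern(tagged)
print(counts.most_common())
```

"case that we decided" is excluded because its that introduces a relative clause; assigning the remaining nouns to discourse functions is an interpretive step the code does not attempt.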

Chapter 7. Lexical bundles in Early Modern and Present-day English Acts of Parliament
Anu Lehto (University of Helsinki)
pp. 159–186; DOI 10.1075/scl.82.07leh

This chapter analyses three-word sequences in Early Modern and Present-day English legal writing by describing their grammatical and functional distribution in Acts of Parliament. The method follows a corpus-driven approach: lexical bundles are retrieved automatically from the corpus using frequency as the criterion. The results indicate that lexical bundles in acts extend to the textual level and reveal consistent word combinations at the lexical level. The acts emerge as an established genre: the overall distribution of both the grammatical types and the functions of bundles is rather similar across all the periods analysed. Nevertheless, textual organisation is more important in contemporary acts, and textual links become more specific, although early modern bundles already show textual patterning. Bundles based on noun phrases and prepositional phrases also increase in contemporary acts, indicating a shift towards nominal writing conventions.
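A minimal version of frequency-based retrieval of three-word bundles: every contiguous three-word sequence is counted within each text, and sequences whose normalized frequency (per million words) reaches a cutoff are kept. The default cutoff of 20 per million is a placeholder, not the chapter's threshold; the miniature "acts" are invented, and because per-million rates are hugely inflated in such a tiny corpus the usage example raises the cutoff accordingly.

```python
from collections import Counter


def trigrams(tokens):
    """Contiguous three-word sequences within one text."""
    return zip(tokens, tokens[1:], tokens[2:])


def lexical_bundles(texts, cutoff_per_million=20.0):
    """Three-word bundles whose normalized frequency meets the cutoff.

    `texts` is a list of token lists, so no bundle spans two texts.
    Returns {bundle: frequency per million words}, most frequent first.
    """
    counts = Counter()
    total_words = 0
    for tokens in texts:
        total_words += len(tokens)
        counts.update(trigrams(tokens))
    bundles = {}
    for gram, freq in counts.items():
        rate = freq / total_words * 1_000_000
        if rate >= cutoff_per_million:
            bundles[" ".join(gram)] = round(rate, 1)
    return dict(sorted(bundles.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical miniature "acts"; the high cutoff compensates for the tiny corpus.
texts = [
    "be it enacted by the king".split(),
    "be it enacted that".split(),
    "it is hereby enacted".split(),
]
result = lexical_bundles(texts, cutoff_per_million=100_000)
print(result)
```

Counting within texts rather than over one concatenated token stream keeps spurious sequences that straddle two documents out of the bundle list.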

Part III. Patterns in online texts

Chapter 8. Lexical bundles in Wikipedia articles and related texts: Exploring disciplinary variation
Turo Hiltunen (University of Helsinki)
pp. 189–212; DOI 10.1075/scl.82.08hil

Wikipedia is widely used by academics and students in higher education, but research on the linguistic characteristics of this genre is scarce (Kuteeva 2016). This paper explores the usefulness of lexical bundles as an analytical tool for describing disciplinary variation within Wikipedia articles and for contrasting Wikipedia writing with two neighbouring genres, student essays and research articles. The results indicate that the occurrence of lexical bundles in Wikipedia varies between disciplines, in broad agreement with previous studies of other academic genres. The analysis of bundles also suggests that a credible authorial persona is less crucial in Wikipedia articles, as indicated by the low frequency of bundles expressing stance and engagement, which are characteristic of professional academic writing (e.g. Hyland 2008a).

Chapter 9. Join us for this: Lexical bundles and repetition in email marketing texts
Joseph McVeigh (University of Jyväskylä)
pp. 213–250; DOI 10.1075/scl.82.09mcv

This paper examines lexical bundles in email marketing texts targeted at lawyers, with the goal of describing the repetitive nature of email marketing. The study uses a corpus of three text types: email marketing texts targeted at lawyers, legal case decisions, and blog posts written by and for labor and employment lawyers. The results show that the email marketing texts do not borrow lexical bundles from either of the other text types and that much of their language is predetermined by a template. The paper also discusses the advantages of using range rather than frequency to analyze lexical bundles.
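The difference between frequency and range can be shown with a small sketch: frequency counts every occurrence of a bundle, whereas range counts the number of distinct texts it occurs in. A phrase repeated by a template inside a single email can therefore top a frequency list while having a range of 1. The four-word bundle length and the toy emails are illustrative assumptions.

```python
from collections import Counter


def four_grams(tokens):
    """Contiguous four-word sequences in one text, as strings."""
    return [" ".join(tokens[i:i + 4]) for i in range(len(tokens) - 3)]


def frequency_and_range(texts):
    """Return (frequency, range) Counters for four-word sequences across texts."""
    freq, rng = Counter(), Counter()
    for tokens in texts:
        grams = four_grams(tokens)
        freq.update(grams)      # every occurrence counts
        rng.update(set(grams))  # each text counts at most once
    return freq, rng


# Hypothetical emails: the first repeats a template phrase three times.
texts = [
    "click here to register click here to register click here to register".split(),
    "join us for this event".split(),
    "please join us for this webinar".split(),
]
freq, rng = frequency_and_range(texts)
print("top by frequency:", freq.most_common(1))
print("top by range:", rng.most_common(1))
```

Ranking by range favours sequences that recur across many texts, which makes it less sensitive to a single highly repetitive document.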

Chapter 10. I don’t want to and don’t get me wrong: Lexical bundles as a window to subjectivity and intersubjectivity in American blogs
Federica Barbieri (Swansea University)
pp. 251–276; DOI 10.1075/scl.82.10bar

Blogs are one of the most prominent genres of Web 2.0, yet research on their linguistic characteristics is limited. This study helps address this gap by investigating lexical bundles in American blogs. Lexical bundles are units of discourse structure that can reveal a great deal about the unique linguistic characteristics and communicative functions that shape registers. The extraction of four-word bundles from a corpus of American blogs reveals, first, that lexical bundles are relatively uncommon in blog writing. Analyses of discourse functions and grammatical patterns show that blogs rely mainly on stance expressions, which often encapsulate first-person reference (e.g., I don’t want to), reflecting the focus on self-expression and subjectivity that characterizes this register. As in conversation, bundles in blogs tend to be verb-phrase based. But blogs also rely substantially on referential (e.g., a lot of people) and narrative expressions (e.g., I got to see), and thus share characteristics of literate registers and fiction writing. In sum, lexical bundles in blog writing are characterized by a unique combination of features reflecting two underlying forces: mode and communicative purpose.

Chapter 11. Blogging around the world: Universal and localised patterns in Online Englishes
Joanna Kopaczyk (University of Glasgow) and Jukka Tyrkkö (Linnaeus University, Växjö)
pp. 277–310; DOI 10.1075/scl.82.11kop

The borderless nature of blogging raises the question of whether traditional, regionally defined varieties of English still hold (see Crystal 2011). To investigate how similar language published online without external intervention is around the world, this chapter examines repetitive patterns, or 3-grams, found in blogs in the 583-million-word GloWbE corpus (Davies 2013). The data show two types of repetitive word sequences: universal ones, which are frequent in all or most of the nineteen geographic locations represented in the corpus, and localised ones, which are unique to specific regions. We explore multiple ways of approaching the regional distribution of universal and localised 3-grams, including statistical similarity measures (the Jaccard coefficient and hierarchical clustering) and network visualisations. The study addresses three related research issues: (1) the ratio of 3-grams in blogs from various World Englishes, which will shed light on the degree of formulaicity in Web Englishes around the world; (2) the overlaps between locations in terms of preferred sequences, which may point to local or global standardization hubs at the level of sentence and text construction; and (3) the status of model-providing varieties for internet communication, especially American English, in view of the most frequent 3-grams from other locations (cf. Mair 2013).
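The Jaccard coefficient between two regions is the size of the intersection of their sets of frequent 3-grams divided by the size of their union, J(A, B) = |A ∩ B| / |A ∪ B|. The sketch computes it over each region's top-k 3-grams and then merges regions bottom-up with average-linkage clustering on the distance 1 - J. The choice of k, average linkage, and the toy region sets are assumptions, not the chapter's settings.

```python
from collections import Counter
from itertools import combinations


def top_trigrams(tokens, k):
    """The k most frequent 3-grams of one region's text, as a set."""
    counts = Counter(zip(tokens, tokens[1:], tokens[2:]))
    return {gram for gram, _ in counts.most_common(k)}


def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B|; 1.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def average_linkage(sets):
    """Agglomerative clustering with distance 1 - Jaccard; returns the merge history."""
    clusters = {name: [name] for name in sets}
    merges = []

    def dist(c1, c2):
        # Average distance over all pairs of member regions.
        pairs = [(x, y) for x in clusters[c1] for y in clusters[c2]]
        return sum(1 - jaccard(sets[x], sets[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(sorted(clusters), 2), key=lambda p: dist(*p))
        d = dist(c1, c2)
        clusters[f"({c1}+{c2})"] = clusters.pop(c1) + clusters.pop(c2)
        merges.append((c1, c2, round(d, 3)))
    return merges


# Hypothetical region sets of frequent 3-grams (not GloWbE results).
regions = {
    "US": {("a", "lot", "of"), ("one", "of", "the"), ("i", "have", "been")},
    "CA": {("a", "lot", "of"), ("one", "of", "the"), ("as", "well", "as")},
    "IN": {("a", "lot", "of"), ("in", "order", "to"), ("so", "many", "things")},
}
print(top_trigrams("i think that i think that you know".split(), 1))
merges = average_linkage(regions)
print(merges)
```

Each merge records the two clusters joined and their average distance, which is the information a dendrogram would plot.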

Hardback ordering: John Benjamins Publishing Company, Amsterdam (https://benjamins.com); catalog page https://benjamins.com/catalog/scl.82. Published 13 March 2018; © 2018 John Benjamins.
Worldwide except the US, Canada and Mexico: John Benjamins Publishing Company, tel. +31 20 6304747, fax +31 20 6739773, bookorder@benjamins.nl; EUR 99.00 / GBP 83.00.
US, Canada and Mexico: John Benjamins Publishing Company, tel. +1 800 562-5666, fax +1 703 661-1501, benjamins@presswarehouse.com; USD 149.00.
The e-book edition (ISBN 9789027264565) is also included in the publisher's e-book collections: Full EBA collection (ca. 4,200 titles); Compact EBA Collection 2023 (ca. 700 titles, starting 2018); 2018 collection (152 titles).
This volume proposes the term pattern-driven as a more precise alternative. The chapters illustrate a variety of methods that fall under this broad methodology, such as the extraction of lexical bundles, POS-grams and semantic frames, and demonstrate how these approaches can uncover new understandings of both synchronic and diachronic linguistic phenomena. 01 00 03 01 01 D503 https://benjamins.com/covers/475/scl.82.png 01 01 D502 https://benjamins.com/covers/475_jpg/9789027200136.jpg 01 01 D504 https://benjamins.com/covers/475_tif/9789027200136.tif 01 01 D503 https://benjamins.com/covers/1200_front/scl.82.hb.png 01 01 D503 https://benjamins.com/covers/125/scl.82.png 02 00 03 01 01 D503 https://benjamins.com/covers/1200_back/scl.82.hb.png 03 00 03 01 01 D503 https://benjamins.com/covers/3d_web/scl.82.hb.png 01 01 JB code scl.82.ack 06 10.1075/scl.82.ack vii vii 1 Miscellaneous 1 01 04 Acknowledgements Acknowledgements 01 01 JB code scl.82.01tyr 06 10.1075/scl.82.01tyr 1 12 12 Chapter 2 01 04 Chapter 1. Present applications and future directions in pattern-driven approaches to corpus linguistics Chapter 1. Present applications and future directions in pattern-driven approaches to corpus linguistics 1 A01 01 JB code 95320518 Jukka Tyrkkö Tyrkkö, Jukka Jukka Tyrkkö Linnaeus University 07 https://benjamins.com/catalog/persons/95320518 2 A01 01 JB code 353320519 Joanna Kopaczyk Kopaczyk, Joanna Joanna Kopaczyk University of Glasgow 07 https://benjamins.com/catalog/persons/353320519 01 01 JB code scl.82.p1 06 10.1075/scl.82.p1 Section header 3 01 04 Part I. Methodological explorations Part I. Methodological explorations 01 01 JB code scl.82.02sch 06 10.1075/scl.82.02sch 15 56 42 Chapter 4 01 04 Chapter 2. From lexical bundles to surprisal and language models Chapter 2. 
From lexical bundles to surprisal and language models 01 04 Measuring the idiom principle in native and learner language Measuring the idiom principle in native and learner language 1 A01 01 JB code 662320520 Gerold Schneider Schneider, Gerold Gerold Schneider University of Zurich and University of Konstanz 07 https://benjamins.com/catalog/persons/662320520 2 A01 01 JB code 901320521 Gintare Grigonyte Grigonyte, Gintare Gintare Grigonyte University of Stockholm 07 https://benjamins.com/catalog/persons/901320521 03 00

We exploit the information-theoretic measure of surprisal to analyze the formulaicity of lexical sequences. We first show the prevalence of individual lexical bundles; we then argue that surprisal, as an information-theoretic measure of lexical bundleness, formulaicity and non-creativity, is an appropriate measure of the idiom principle, since it expresses reader expectations and text entropy. Because strong and gradient formulaic, idiomatic and selectional preferences prevail at all levels, we argue for abstracting from individual bundles to measures of bundleness. We use surprisal to analyze differences between genres of native language use and learner language at different levels: (a) spoken and written genres of native language (L1); (b) spoken and written learner language (L2), across selected written genres; and (c) learner language as compared with native language. We thus test Pawley and Syder’s (1983) hypothesis that native speakers know best how to play the tug-of-war between formulaicity (Sinclair’s idiom principle) and expressiveness (Sinclair’s open-choice principle), a balance that can be measured with Levy and Jaeger’s (2007) uniform information density (UID), a principle of minimizing comprehension difficulty. Our goal of abstracting away from word sequences also leads us to language models as models of processing, first in the form of a part-of-speech tagger and then in the form of a syntactic parser. While our hypotheses are largely confirmed, we also observe that advanced learners bundle most, and that scientific language may show lower surprisal than spoken language.
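To make the notion of per-token surprisal concrete, here is a minimal Python sketch. It assumes a simple word-bigram model with add-one smoothing, which is a stand-in of our own: the chapter works with a part-of-speech tagger and a syntactic parser as language models, and the function names, toy training text and smoothing choice below are all hypothetical. The surprisal of a token is -log2 P(w_i | w_{i-1}); lower mean surprisal indicates a more predictable, more formulaic sequence, and the variance of surprisal is one crude way to look at how evenly information is spread, which is the intuition behind UID.

```python
import math
from collections import Counter

def train_bigram_model(tokens):
    """Count unigrams and bigrams in a training token list (hypothetical helper)."""
    return Counter(tokens), Counter(zip(tokens, tokens[1:]))

def surprisals(tokens, unigrams, bigrams):
    """Per-token surprisal -log2 P(w_i | w_{i-1}) with add-one smoothing.

    The first token has no left context and is skipped.
    """
    vocab_size = len(unigrams) + 1  # +1 reserves probability mass for unseen words
    values = []
    for prev, word in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
        values.append(-math.log2(p))
    return values

def mean_and_variance(values):
    """Mean surprisal (lower = more formulaic) and its variance (a crude UID proxy)."""
    mean = sum(values) / len(values)
    return mean, sum((v - mean) ** 2 for v in values) / len(values)

# Toy illustration: a sequence seen in training vs. the same words reshuffled.
training = "the cat sat on the mat the cat sat".split()
uni, bi = train_bigram_model(training)
formulaic = mean_and_variance(surprisals("the cat sat".split(), uni, bi))[0]
novel = mean_and_variance(surprisals("mat sat cat".split(), uni, bi))[0]
print(f"formulaic: {formulaic:.2f} bits, novel: {novel:.2f} bits")
# prints: formulaic: 1.50 bits, novel: 2.90 bits
```

A real study would train on a large reference corpus and use a far stronger model; the bigram model is chosen here only because it fits in a few lines.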

Chapter 3. Fine-tuning lexical bundles: A methodological reflection in the context of describing drug-drug interactions
Łukasz Grabowski (University of Opole)
pp. 57–80. DOI: 10.1075/scl.82.03gra

This chapter has two major aims. First, it attempts to extend earlier research on recurrent phraseologies used in the pharmaceutical field (Grabowski 2015) by exploring the use, distribution and functions of lexical bundles in English texts describing drug-drug interactions. Conducted from an applied perspective, the study uses 300 text samples extracted from the DrugDDI Corpus, whose texts were originally collected from the DrugBank database (Segura-Bedmar et al. 2010). Second, apart from presenting new descriptive data, the chapter reflects on the ways lexical bundles have typically been explored across different text types and genres. The problems discussed concern the methods used to deal with structurally incomplete bundles, to filter out overlapping bundles, and to select, for the purposes of qualitative analysis, a representative sample of bundles other than the most frequent ones. The chapter is therefore meant to help researchers fine-tune the methodologies used to explore lexical bundles according to the specificity of the research material, the research questions and the scope of the analysis.
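The two filtering problems mentioned above, frequency/range cut-offs and overlapping bundles, can be illustrated with a minimal Python sketch. The thresholds and the overlap rule are hypothetical choices made for illustration and are not the chapter's own procedure: two bundles that overlap by n-1 words (such as in the case of and the case of the) are treated as fragments of one longer sequence when the (n+1)-gram joining them accounts for most occurrences of the weaker bundle, and that bundle is then dropped.

```python
from collections import Counter

def ngram_counts(texts, n):
    """Frequency and range (number of texts containing it) of every n-gram."""
    freq, rng = Counter(), Counter()
    for tokens in texts:
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        freq.update(grams)
        rng.update(set(grams))
    return freq, rng

def extract_bundles(texts, n=4, min_freq=2, min_range=2):
    """Keep n-grams that meet (hypothetical) frequency and range cut-offs."""
    freq, rng = ngram_counts(texts, n)
    return {g: f for g, f in freq.items() if f >= min_freq and rng[g] >= min_range}

def filter_overlapping(texts, bundles, n=4, threshold=0.8):
    """Drop one bundle of each pair that chains into a single (n+1)-gram.

    If two kept bundles overlap by n-1 words and the (n+1)-gram joining them
    accounts for at least `threshold` of the less frequent bundle's
    occurrences, that bundle is removed; on ties the right-hand one goes.
    """
    longer, _ = ngram_counts(texts, n + 1)
    kept = dict(bundles)
    for gram, f_long in longer.items():
        left, right = gram[:-1], gram[1:]
        if left in kept and right in kept and left != right:
            weaker = right if kept[right] <= kept[left] else left
            if f_long / kept[weaker] >= threshold:
                del kept[weaker]
    return kept

texts = [
    "in the case of the drug".split(),
    "in the case of the patient".split(),
]
bundles = extract_bundles(texts)
print(sorted(bundles))                   # both overlapping 4-word bundles pass
print(filter_overlapping(texts, bundles))  # only ('in', 'the', 'case', 'of') is kept
```

The tie-break toward keeping the left-hand bundle is arbitrary; a study concerned with structural completeness might instead prefer whichever bundle forms a complete phrase.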

Chapter 4. Lexical obsolescence and loss in English: 1700–2000
Ondřej Tichý (Charles University in Prague)
pp. 81–104. DOI: 10.1075/scl.82.04tic

This paper explores a new methodology for extracting from large corpora forms that were once common but are now obsolete. It proceeds from the relatively under-researched problem of lexical mortality, or obsolescence in general, to the formulation of two closely related procedures for querying the n-gram data of the Google Books project in order to identify the best candidate words and lexical expressions that may have become lost or obsolete over the last three centuries, from the Late Modern era to Present-day English (1700–2000). After describing the techniques used to process the large unigram and trigram datasets, the chapter offers a selective analysis of the results and proposes ways in which the methodology may help corpus linguists as well as historical lexicographers.
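The general idea of querying yearly n-gram counts for forms whose relative frequency has collapsed can be sketched in Python as follows. This is not the chapter's procedure: the 50-year periods, the peak and decline thresholds, and the function names are hypothetical. The sketch also assumes tab-separated input lines that begin with the n-gram, the year and the match count, plus a separate table of total words per year for normalisation; check the layout of the actual dataset release before relying on it.

```python
from collections import defaultdict

def parse_counts(lines):
    """Map n-gram -> {year: match_count} from tab-separated lines.

    Assumed line layout: ngram, year, match_count, then any further columns.
    """
    counts = defaultdict(dict)
    for line in lines:
        ngram, year, match_count = line.rstrip("\n").split("\t")[:3]
        y = int(year)
        counts[ngram][y] = counts[ngram].get(y, 0) + int(match_count)
    return counts

def per_million_by_period(year_counts, totals, start, end, step):
    """Relative frequency per million words in each period [p, p + step)."""
    series = {}
    for p in range(start, end, step):
        hits = sum(year_counts.get(y, 0) for y in range(p, p + step))
        words = sum(totals.get(y, 0) for y in range(p, p + step))
        series[p] = 1e6 * hits / words if words else 0.0
    return series

def obsolescence_candidates(counts, totals, start=1700, end=2000, step=50,
                            min_peak=10.0, max_ratio=0.05):
    """Forms that once reached `min_peak` per million but whose frequency in
    the last period is at most `max_ratio` of that peak (hypothetical cut-offs).
    Results are sorted from the steepest decline downwards."""
    candidates = []
    for ngram, year_counts in counts.items():
        series = per_million_by_period(year_counts, totals, start, end, step)
        peak = max(series.values())
        last = series[max(series)]  # the most recent period (1950-1999 by default)
        if peak >= min_peak and last <= max_ratio * peak:
            candidates.append((ngram, peak, last))
    return sorted(candidates, key=lambda c: c[2] / c[1])

# Toy data: one million words per year, two forms with different histories.
totals = {y: 1_000_000 for y in range(1700, 2000)}
lines = ["thee\t1710\t1000\t5", "thee\t1990\t1\t1",
         "the\t1710\t50000\t9", "the\t1990\t60000\t9"]
print(obsolescence_candidates(parse_counts(lines), totals))
```

Normalising by the yearly totals matters because the Google Books collection grows sharply over time; raw counts alone would make almost every form look like it is rising.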

Part II. Patterns in utilitarian texts
DOI: 10.1075/scl.82.p2

Chapter 5. Constance and variability: Using PoS-grams to find phraseologies in the language of newspapers
Antonio Pinna and David Brett (Università degli Studi di Sassari)
pp. 107–130. DOI: 10.1075/scl.82.05pin

This paper describes the use of a corpus-driven methodology, the retrieval of part-of-speech-grams (PoS-grams), which is extremely effective for discovering phraseologies that might otherwise remain hidden. A PoS-gram is a string of part-of-speech categories (Stubbs 2007: 91); its tokens are strings of words annotated with these PoS tags. A list of PoS-grams retrieved from a sample corpus can be compared with a list from a reference corpus, and statistically significant items are then analysed further to identify recurrent patterns and potential phraseologies. The utility of PoS-grams is illustrated through an analysis of the Sassari Newspaper Article Corpus (SNAC), a one-million-token corpus of texts from ten sections of The Guardian.
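The workflow described above (count PoS-grams, compare them with a reference corpus, then look at the word strings behind the significant ones) can be sketched in Python. The abstract does not name the significance test, so the log-likelihood (G2) statistic with the conventional 3.84 cut-off (p < 0.05, one degree of freedom) is our assumption, as are the function names and the use of (word, tag) pairs as input.

```python
import math
from collections import Counter

def pos_grams(tagged_sentences, n=3):
    """Count PoS-grams: n-grams over the tag sequence of (word, tag) pairs."""
    counts = Counter()
    for sent in tagged_sentences:
        tags = [tag for _, tag in sent]
        counts.update(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))
    return counts

def log_likelihood(a, b, c, d):
    """G2 for frequency a in a corpus of size c vs. frequency b in size d."""
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a:
        ll += a * math.log(a / e1)
    if b:
        ll += b * math.log(b / e2)
    return 2 * ll

def key_pos_grams(study, reference, critical=3.84):
    """PoS-grams over-represented in the study corpus, strongest first."""
    size_s, size_r = sum(study.values()), sum(reference.values())
    keys = []
    for gram, a in study.items():
        b = reference.get(gram, 0)
        if a / size_s > b / size_r:  # keep only positive keyness
            ll = log_likelihood(a, b, size_s, size_r)
            if ll >= critical:
                keys.append((gram, ll))
    return sorted(keys, key=lambda k: -k[1])

def realisations(tagged_sentences, gram):
    """Word strings (the tokens) that instantiate a given PoS-gram."""
    n = len(gram)
    found = Counter()
    for sent in tagged_sentences:
        for i in range(len(sent) - n + 1):
            window = sent[i:i + n]
            if tuple(tag for _, tag in window) == gram:
                found[" ".join(word for word, _ in window)] += 1
    return found

sent = [("the", "DT"), ("new", "JJ"), ("law", "NN")]
print(pos_grams([sent]))                        # one DT JJ NN PoS-gram
print(realisations([sent], ("DT", "JJ", "NN")))  # realised as 'the new law'
```

In practice the tag set depends on the tagger used; the Penn-style tags here are only illustrative.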

Chapter 6. Between corpus-based and corpus-driven approaches to textual recurrence: Exploring semantic sequences in judicial discourse
Stanisław Goźdź-Roszkowski (University of Łódź)
pp. 131–158. DOI: 10.1075/scl.82.06goz

The article investigates the link between lexical and meaning patterns in the specialized discourse of judicial opinions. It presents an analysis of the N that pattern in a corpus of US Supreme Court opinions, looking at the distribution of a selection of nouns found in the pattern across different discourse functions. Judicial opinions are shown to use a range of status-indicating nouns in the N that pattern to perform five main functions: evaluation, cause, result, confirmation and existence. Evaluation, however, plays a central role in judicial writing, and most status-indicating nouns are used to signal sites of contention; that is, challenged propositions are likely to be labelled as arguments, assumptions, notions or suggestions. Drawing on the concept of semantic sequence (Hunston 2008), the analysis illustrates how corpus-based and corpus-driven approaches can complement one another to build a picture of common epistemological practices in the corpus of legal texts.
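A first retrieval pass for the N that pattern can be sketched in Python over POS-tagged text. This is a hypothetical sketch, not the chapter's method: it assumes (word, tag) pairs with Penn-style noun tags and simply counts nouns immediately followed by that. Such a pass also matches relative-clause that (as in the argument that he made), so the hits would still need manual or parser-based filtering before nouns can be assigned to discourse functions.

```python
from collections import Counter

NOUN_TAGS = {"NN", "NNS"}  # assumed Penn Treebank tags for common nouns

def n_that_nouns(tagged_sentences):
    """Count nouns immediately followed by 'that' (candidate N that pattern).

    Relative-clause 'that' is matched too, so results are candidates only.
    """
    counts = Counter()
    for sent in tagged_sentences:
        for (word, tag), (next_word, _) in zip(sent, sent[1:]):
            if tag in NOUN_TAGS and next_word.lower() == "that":
                counts[word.lower()] += 1
    return counts

sent = [("The", "DT"), ("argument", "NN"), ("that", "IN"),
        ("the", "DT"), ("statute", "NN"), ("applies", "VBZ")]
print(n_that_nouns([sent]))  # Counter({'argument': 1})
```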

Chapter 7. Lexical bundles in Early Modern and Present-day English Acts of Parliament
Anu Lehto (University of Helsinki)
pp. 159–186. DOI: 10.1075/scl.82.07leh

This chapter analyses three-word sequences in Early Modern and Present-day English legal writing by examining their grammatical and functional distribution in Acts of Parliament. The method follows a corpus-driven approach: the lexical bundles are retrieved automatically from the corpus using frequency as the criterion. The study indicates that lexical bundles in acts extend to the textual level, and it reveals consistent word combinations at the level of lexis. The study also shows that the acts are an established genre: the overall distribution of both the grammatical types and the functions of bundles is rather similar in all the periods analysed. Nevertheless, textual organisation is more important in contemporary acts, and textual links become more specific, although early modern bundles already show textual patterning. Noun phrase and prepositional phrase bundles also increase in contemporary acts, indicating a shift towards nominal writing conventions.

Part III. Patterns in online texts
DOI: 10.1075/scl.82.p3

Chapter 8. Lexical bundles in Wikipedia articles and related texts: Exploring disciplinary variation
Turo Hiltunen (University of Helsinki)
pp. 189–212. DOI: 10.1075/scl.82.08hil

Wikipedia is widely used by academics and students in higher education, but research on the linguistic characteristics of this genre is scarce (Kuteeva 2016). This paper explores the usefulness of lexical bundles as an analytical tool for describing disciplinary variation within Wikipedia articles and for contrasting Wikipedia writing with two neighbouring genres, student essays and research articles. The results indicate that the occurrence of lexical bundles in Wikipedia varies between disciplines, in broad agreement with previous studies of other academic genres. The analysis also suggests that a credible authorial persona is less crucial to Wikipedia articles: bundles indicating stance and engagement, which are characteristic of professional academic writing (e.g. Hyland 2008a), are infrequent.

Chapter 9. Join us for this: Lexical bundles and repetition in email marketing texts
Joseph McVeigh (University of Jyväskylä)
pp. 213–250. DOI: 10.1075/scl.82.09mcv

This paper investigates lexical bundles in email marketing texts targeted at lawyers, with the aim of examining the repetitive nature of email marketing. The study draws on a corpus of three text types: email marketing texts targeted at lawyers, legal case decisions, and blog posts written by and for labor and employment lawyers. The results show that the email marketing texts do not borrow lexical bundles from either of the other text types and that much of their language is predetermined by a template. The paper also presents the advantages of using range rather than frequency to analyze lexical bundles.
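Why range can be more informative than frequency for template-heavy texts is easy to show with a small Python sketch. The toy emails and function names are invented for illustration: a string repeated several times inside a single email tops the frequency list, whereas ranking by range (the number of texts containing a bundle) surfaces the string that is actually shared across emails.

```python
from collections import Counter

def frequency_and_range(texts, n=4):
    """Raw frequency and range (number of texts containing it) of each n-gram."""
    freq, rng = Counter(), Counter()
    for tokens in texts:
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        freq.update(grams)
        rng.update(set(grams))
    return freq, rng

def rank_by_frequency(freq, top=10):
    """Order n-grams by raw frequency, breaking ties alphabetically."""
    return sorted(freq, key=lambda g: (-freq[g], g))[:top]

def rank_by_range(freq, rng, top=10):
    """Order n-grams by range first, then frequency, then alphabetically."""
    return sorted(freq, key=lambda g: (-rng[g], -freq[g], g))[:top]

texts = [
    "click here to register click here to register click here to register".split(),
    "join us for this event".split(),
    "please join us for this webinar".split(),
]
freq, rng = frequency_and_range(texts)
print(rank_by_frequency(freq, top=1))   # the string repeated inside one email
print(rank_by_range(freq, rng, top=1))  # the string shared across emails
```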

Chapter 10. I don’t want to and don’t get me wrong: Lexical bundles as a window to subjectivity and intersubjectivity in American blogs
Federica Barbieri (Swansea University)
pp. 251–276. DOI: 10.1075/scl.82.10bar

Blogs are one of the most prominent genres of Web 2.0, yet research on their linguistic characteristics is limited. This study helps to address this gap by investigating lexical bundles in American blogs. Lexical bundles are units of discourse structure that can reveal a great deal about the unique linguistic characteristics and communicative functions that shape registers. The extraction of four-word bundles from a corpus of American blogs first reveals that lexical bundles are relatively uncommon in blog writing. Analyses of discourse function and grammatical patterns show that blogs rely mainly on stance expressions, which often encapsulate first-person reference (e.g., I don’t want to), thus reflecting the focus on self-expression and subjectivity that characterizes this register. As in conversation, bundles in blogs tend to be verb-phrase based. However, blogs also rely substantially on referential expressions (e.g., a lot of people) and narrative expressions (e.g., I got to see), and thus share characteristics of literate registers and fiction writing. In sum, lexical bundles in blog writing are characterized by a unique combination of features that reflect two underlying forces: mode and communicative purpose.

Chapter 11. Blogging around the world: Universal and localised patterns in Online Englishes
Joanna Kopaczyk (University of Glasgow) and Jukka Tyrkkö (Linnaeus University Växjö)
pp. 277–310. DOI: 10.1075/scl.82.11kop

The borderless nature of blogging raises the question of whether the traditional, regionally defined varieties of English continue to hold true (see Crystal 2011). To investigate the extent to which language published online without external intervention is similar around the world, this chapter examines repetitive patterns, or 3-grams, found in blogs in the 583-million-word GloWbE corpus (Davies 2013). The data show two types of repetitive word sequences: universal ones, which are frequent in all or most of the nineteen geographic locations represented in the corpus, and localised ones, which are unique to specific regions. We explore multiple ways of approaching the regional distribution of universal and localised 3-grams, such as statistical similarity measures (the Jaccard coefficient and hierarchical clustering) and network visualisations. The study addresses three interrelated research issues: (1) the ratio of 3-grams in blogs from various World Englishes, which sheds light on the degree of formulaicity in Web Englishes around the world; (2) the overlaps between locations in terms of preferred sequences, which may point to local or global standardization hubs at the level of sentence and text construction; and (3) the status of model-providing varieties for internet communication, especially American English, in view of the most frequent 3-grams from other locations (cf. Mair 2013).
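The Jaccard coefficient and the universal/localised split can be sketched in a few lines of Python. The top-k cut-off, the 90% share used to define "universal" and the function names are hypothetical choices, not the chapter's settings. Each region is represented by the set of its most frequent 3-grams; the pairwise Jaccard scores (or 1 minus them, as distances) could then be fed into a hierarchical clustering routine such as SciPy's linkage function, which is not shown here.

```python
from collections import Counter
from itertools import combinations

def top_trigrams(tokens, k=100):
    """The k most frequent 3-grams in one region's token list."""
    counts = Counter(zip(tokens, tokens[1:], tokens[2:]))
    return {g for g, _ in counts.most_common(k)}

def jaccard(a, b):
    """|A & B| / |A | B|; 1.0 for two empty sets by convention."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def pairwise_jaccard(region_sets):
    """Jaccard similarity for every pair of regions."""
    return {(r1, r2): jaccard(region_sets[r1], region_sets[r2])
            for r1, r2 in combinations(sorted(region_sets), 2)}

def universal_and_localised(region_sets, min_share=0.9):
    """Universal: in at least min_share of regions; localised: in exactly one."""
    presence = Counter(g for s in region_sets.values() for g in s)
    n = len(region_sets)
    universal = {g for g, c in presence.items() if c >= min_share * n}
    localised = {g for g, c in presence.items() if c == 1}
    return universal, localised

region_sets = {
    "GB": {("a", "b", "c"), ("x", "y", "z")},
    "US": {("a", "b", "c"), ("p", "q", "r")},
}
print(pairwise_jaccard(region_sets))         # one shared trigram out of three
print(universal_and_localised(region_sets))
```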

Index
p. 311. DOI: 10.1075/scl.82.index

Publisher: John Benjamins Publishing Company, Amsterdam, NL (https://benjamins.com)
Catalog page: https://benjamins.com/catalog/scl.82
Publication date: 13 March 2018
Copyright © 2018 John Benjamins
Related print edition: ISBN 9789027200136
Available on the John Benjamins e-Platform: https://jbe-platform.com/content/books/9789027264565
Sales territory: world
List prices: EUR 99.00, GBP 83.00, USD 149.00
ONIX contact: Marketing Department / Karin Plijnaar, Pieter Lamers (onix@benjamins.nl)