Multi-dimensional register classification using bigrams

Crossley, Scott; Louwerse, Max M.

doi:10.1075/ijcl.12.4.02cro

Article published in:

International Journal of Corpus Linguistics
Vol. 12:4 (2007) ► pp.453–478

Scott Crossley | Mississippi State University

Max M. Louwerse | University of Memphis

A corpus linguistic analysis investigated register classification using frequency of bigrams in nine spoken and two written corpora. Four dimensions emerged from a factor analysis using bigram frequencies shared across corpora: (1) Scripted vs. Unscripted Discourse, (2) Deliberate vs. Unplanned Discourse, (3) Spatial vs. Non-Spatial Discourse, and (4) Directional vs. Non-Directional Discourse. These findings were replicated in a second analysis. Both analyses demonstrate the strength of bigrams for classifying spoken and written registers, especially in locating distinct collocations among spoken corpora, as well as revealing syntactic and discourse features through a data-driven approach.
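
As a rough illustration of the kind of input the abstract describes (frequencies of bigrams shared across corpora), the sketch below counts adjacent word pairs in each corpus and keeps only those that occur in all of them, normalized per 1,000 bigrams. It is not the authors' procedure: the whitespace tokenization, the normalization base, the toy texts, and the names `bigram_counts` and `shared_bigram_table` are all assumptions made for the example, and the factor analysis step itself is not shown.

```python
from collections import Counter


def bigram_counts(tokens):
    """Count adjacent word pairs (bigrams) in a list of tokens."""
    return Counter(zip(tokens, tokens[1:]))


def shared_bigram_table(corpora, per=1000):
    """Return {bigram: {corpus_name: rate}} for bigrams found in every corpus.

    `corpora` maps a corpus name to its raw text. Tokenization is a plain
    lowercase whitespace split (an assumption made for this sketch).
    Rates are occurrences per `per` bigrams in that corpus.
    """
    counts = {
        name: bigram_counts(text.lower().split())
        for name, text in corpora.items()
    }
    # Keep only bigrams that appear at least once in every corpus.
    shared = set.intersection(*(set(c) for c in counts.values()))
    table = {}
    for bigram in sorted(shared):
        table[bigram] = {
            name: per * c[bigram] / sum(c.values())
            for name, c in counts.items()
        }
    return table


# Toy data, invented for illustration only.
toy = {
    "spoken": "you know I mean you know it is fine",
    "written": "it is clear that you know the result",
}
table = shared_bigram_table(toy)
for bigram, rates in table.items():
    print(" ".join(bigram), rates)
```

Each row of the resulting table is one shared bigram and each column one corpus, which is the general shape of data a factor analysis would take as input.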

Keywords: register variation, multi-dimensional analysis, bigrams, collocations

Published online: 20 December 2007

https://doi.org/10.1075/ijcl.12.4.02cro

Cited by 21 other publications

Alzetta, Chiara, Felice Dell'Orletta, Alessio Miaschi, Elena Prat & Giulia Venturi

2024. Tell me how you write and I'll tell you what you read: a study on the writing style of book reviews. Journal of Documentation 80:1 ► pp. 180 ff.

Bender, Michael, Noah Bubenhofer & Nina Janich

2024. Die öffentliche Aushandlung von Expertise: Wissenschaftsblogs als Ort eristischer Verständigung? Exploratorischer Einstieg in ein Forschungsprojekt. Zeitschrift für germanistische Linguistik 52:1 ► pp. 183 ff.

Giampieri, Patrizia

2024. Key n-Grams in EU Directives and in the UK National Legislation on Consumer Contracts. International Journal for the Semiotics of Law - Revue internationale de Sémiotique juridique 37:1 ► pp. 59 ff.

Berber Sardinha, Tony

2023. Corpus linguistics and historiography. Journal of Research Design and Statistics in Linguistics and Communication Science 7:1 ► pp. 69 ff.

Fitzsimmons-Doolan, Shannon & Jennifer Beseres Pollack

2023. Shifting linguistic patterns in oyster restoration news articles surrounding the Deepwater Horizon disaster. Frontiers in Conservation Science 4

Chen, Dongyan & Gengxin Sun

2022. Constructing a Data-Driven Model of English Language Teaching with a Multidimensional Corpus. Mathematical Problems in Engineering 2022 ► pp. 1 ff.

Gries, Stefan Th.

2022. Multi-word units (and tokenization more generally): a multi-dimensional and largely information-theoretic approach. Lexis 19

Kim, Hyojung, Inho Cho & Minjung Park

2022. Analyzing genderless fashion trends of consumers’ perceptions on social media: using unstructured big data analysis through Latent Dirichlet Allocation-based topic modeling. Fashion and Textiles 9:1

Pincemin, Bénédicte, Alexei Lavrentiev & Céline Guillot-Barbance

2020. Using the First Axis of a Correspondence Analysis as an Analytic Tool. In Text Analytics [Studies in Classification, Data Analysis, and Knowledge Organization], ► pp. 127 ff.

Biber, Douglas & Susan Conrad

2019. Register, Genre, and Style.

Hassan, Nik Rushdi & Lars Mathiassen

2018. Distilling a body of knowledge for information systems development. Information Systems Journal 28:1 ► pp. 175 ff.

Zhang, Panpan, Tingwen Liu, Yang Zhang, Jing Ya, Jinqiao Shi & Yubin Wang

2017. Domain Watcher: Detecting Malicious Domains Based on Local and Global Textual Features. Procedia Computer Science 108 ► pp. 2408 ff.

Berber Sardinha, Tony, Carlos Kauffmann & Cristina Mayer Acunzo

2014. Chapter 1.2 Dimensions of register variation in Brazilian Portuguese. In Multi-Dimensional Analysis, 25 years on [Studies in Corpus Linguistics, 60], ► pp. 35 ff.

Sardinha, Tony Berber, Carlos Kauffmann & Cristina Mayer Acunzo

2014. A multi-dimensional analysis of register variation in Brazilian Portuguese. Corpora 9:2 ► pp. 239 ff.

Bernardini, Silvia & Adriano Ferraresi

2013. Old Needs, New Solutions: Comparable Corpora for Language Professionals. In Building and Using Comparable Corpora, ► pp. 303 ff.

Cao, Yan & Richard Xiao

2013. A multi-dimensional contrastive study of English abstracts by native and non-native writers. Corpora 8:2 ► pp. 209 ff.

Lin, Yen-Liang

2013. Discourse Functions of Recurrent Multi-word Sequences in Online and Spoken Intercultural Communication. In Yearbook of Corpus Linguistics and Pragmatics 2013 [Yearbook of Corpus Linguistics and Pragmatics, 1], ► pp. 105 ff.

Spina, Stefania & Elena Tanganelli

2012. Les collocations comme indice pour distinguer les genres textuels. Corpus 11

Crossley, Scott & Thomas Lee Salsbury

2011. The development of lexical bundle accuracy and production in English second language speakers. IRAL - International Review of Applied Linguistics in Language Teaching 49:1 ► pp. 1 ff.

Sharoff, Serge

2010. In the Garden and in the Jungle. In Genres on the Web [Text, Speech and Language Technology, 42], ► pp. 149 ff.

Louwerse, Max M., Scott A. Crossley & Patrick Jeuniaux

2008. What if? Conditionals in educational registers. Linguistics and Education 19:1 ► pp. 56 ff.

This list is based on CrossRef data as of 11 September 2024. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.