A corpus linguistic analysis investigated register classification using bigram frequencies in nine spoken and two written corpora. A factor analysis of the bigrams shared across corpora yielded four dimensions: (1) Scripted vs. Unscripted Discourse, (2) Deliberate vs. Unplanned Discourse, (3) Spatial vs. Non-Spatial Discourse, and (4) Directional vs. Non-Directional Discourse. These findings were replicated in a second analysis. Both analyses demonstrate the strength of bigrams for classifying spoken and written registers, particularly for locating distinct collocations among the spoken corpora and for revealing syntactic and discourse features through a data-driven approach.
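The input to such a factor analysis can be sketched as a text-by-bigram frequency matrix that keeps only bigrams attested in every corpus. The sketch below is illustrative and rests on assumptions the abstract does not state. The function names `bigrams` and `shared_bigram_matrix` are hypothetical. Texts are assumed to arrive already tokenized. Each row is taken to be one text, and frequencies are normalized per 1,000 bigrams. The factor extraction itself, which would normally be done in a statistics package, is not shown.

```python
from collections import Counter


def bigrams(tokens):
    """Return a Counter of adjacent token pairs (bigrams) in one text."""
    return Counter(zip(tokens, tokens[1:]))


def shared_bigram_matrix(corpora, per=1000):
    """Build a text-by-bigram matrix as input for a factor analysis.

    Assumption: `corpora` maps a corpus name to a list of texts, each
    text being a list of tokens. Only bigrams that occur somewhere in
    every corpus are kept as variables. Each cell is that bigram's
    frequency in one text, normalized per `per` bigrams of that text.
    """
    if not corpora:
        return [], []

    # Bigram inventory of each corpus (union over all of its texts).
    inventories = []
    for texts in corpora.values():
        inventory = set()
        for tokens in texts:
            inventory.update(bigrams(tokens))
        inventories.append(inventory)

    # Variables: bigrams shared across all corpora, in a stable order.
    columns = sorted(set.intersection(*inventories))

    # Observations: one row per text, labelled (corpus name, text index).
    rows = []
    for name, texts in corpora.items():
        for i, tokens in enumerate(texts):
            counts = bigrams(tokens)
            total = sum(counts.values()) or 1  # guard against one-token texts
            rows.append(((name, i), [counts[bg] / total * per for bg in columns]))
    return columns, rows
```

For example, with one spoken corpus containing "you know what I mean" and "I mean you know", and one written corpus containing "as you know the results", only the bigram ("you", "know") survives as a shared variable. The resulting rows could then be passed to a factor-analysis routine.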