Multi-dimensional register classification using bigrams
A corpus linguistic analysis investigated register classification using bigram frequencies in nine spoken and two written corpora. Four dimensions emerged from a factor analysis of the frequencies of bigrams shared across corpora: (1) Scripted vs. Unscripted Discourse, (2) Deliberate vs. Unplanned Discourse, (3) Spatial vs. Non-Spatial Discourse, and (4) Directional vs. Non-Directional Discourse. These findings were replicated in a second analysis. Both analyses demonstrate the strength of bigrams for classifying spoken and written registers, especially for locating distinct collocations among spoken corpora, and for revealing syntactic and discourse features through a data-driven approach.
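To make the input to such an analysis concrete, the following is a minimal sketch of how a matrix of shared bigram frequencies could be assembled. It is an illustration under stated assumptions, not the study's actual pipeline: the three toy corpora, the function names `bigram_counts` and `shared_bigram_matrix`, and the normalization per 1,000 bigrams are all hypothetical. The factor analysis itself is not shown.

```python
from collections import Counter


def bigram_counts(tokens):
    """Count adjacent token pairs in one tokenized corpus."""
    return Counter(zip(tokens, tokens[1:]))


def shared_bigram_matrix(corpora, per=1000):
    """Build relative frequencies for bigrams that occur in every corpus.

    Returns (bigrams, rows): a sorted list of shared bigrams and, for each
    corpus name, a list of frequencies per `per` bigrams in the same order.
    The normalization base is an assumption made for this sketch.
    """
    counts = {name: bigram_counts(toks) for name, toks in corpora.items()}
    # Keep only the bigrams attested in all corpora.
    shared = set.intersection(*(set(c) for c in counts.values()))
    bigrams = sorted(shared)
    rows = {}
    for name, c in counts.items():
        total = sum(c.values())
        rows[name] = [per * c[b] / total for b in bigrams]
    return bigrams, rows


# Hypothetical toy corpora, not data from the study.
corpora = {
    "spoken_a": "you know i think you know it is over there".split(),
    "spoken_b": "i think it is over there you know".split(),
    "written_a": "it is clear that i think you know the result".split(),
}

bigrams, rows = shared_bigram_matrix(corpora)
for name, freqs in rows.items():
    print(name, [round(f, 1) for f in freqs])
```

Each row of the resulting matrix describes one corpus and each column one shared bigram. A matrix of this shape is what a factor analysis would take as input to extract dimensions of variation.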
Keywords: register variation, multi-dimensional analysis, bigrams, collocations
Published online: 20 December 2007