Identifying aboutgrams in engineering texts

Warren, Martin

doi:10.1075/scl.41.09war

Part of: Keyness in Texts, edited by Marina Bondi and Mike Scott
Studies in Corpus Linguistics 41, 2010, pp. 113–126

This paper uses a new computer-mediated methodology, concgramming, to identify the aboutness of a text. Concgrams are the raw products of the concgramming process: sets of up to five co-occurring words, identified irrespective of whether constituency variation (i.e. AB, A*B, where * represents an intervening word) and/or positional variation (i.e. AB, BA) is present. Two engineering research articles are concgrammed to identify their most frequently occurring two-word lexical concgrams, which are then examined to determine whether the words merely co-occur or are meaningfully associated. On this basis, a provisional list of “aboutgrams” is drawn up for each text and tentatively taken to represent its aboutness. Each list is then checked first against a specialised corpus of engineering texts and then against a general reference corpus. Those aboutgrams that are consistently more frequent in the text than in both corpora are put forward as representing the aboutness of the text. The lists of aboutgrams are also compared with single-word frequency lists to evaluate the advantages of determining aboutness through phraseology rather than through key words. The study concludes that aboutgrams are a better means of uncovering the aboutness of the texts.
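The procedure summarised in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ConcGram software or the author's exact method. Lexical words are picked out with a hypothetical toy stopword list; two words count as co-occurring when they are at most `span` tokens apart (the default of 4 is an assumption); and "consistently more frequent" is read simply as a higher relative frequency in the text than in every comparison corpus, rather than as a statistical keyness test. The manual step of judging whether the words are meaningfully associated is not modelled. All names (`STOPWORDS`, `tokenize`, `two_word_concgrams`, `single_word_list`, `aboutgrams`) are hypothetical, and every corpus is assumed to be non-empty.

```python
import re
from collections import Counter

# Hypothetical stopword list: a toy stand-in for whatever filter decides
# which words count as "lexical"; it is not taken from the paper.
STOPWORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "was", "with",
})


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; internal hyphens and apostrophes are kept."""
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())


def two_word_concgrams(tokens: list[str], span: int = 4) -> Counter:
    """Count unordered pairs of distinct lexical words at most `span` tokens apart.

    Sorting each pair makes AB and BA the same concgram (positional variation);
    pairing with every word in the window lets other words intervene
    (constituency variation). Stopwords still occupy positions in the window.
    The default span of 4 is an assumption, not a value from the paper.
    """
    counts: Counter = Counter()
    for i, first in enumerate(tokens):
        if first in STOPWORDS:
            continue
        for second in tokens[i + 1 : i + 1 + span]:
            if second in STOPWORDS or second == first:
                continue
            counts[tuple(sorted((first, second)))] += 1
    return counts


def single_word_list(tokens: list[str], top_n: int = 10) -> list[tuple[str, int]]:
    """Most frequent lexical single words, for comparison with the aboutgrams."""
    return Counter(t for t in tokens if t not in STOPWORDS).most_common(top_n)


def aboutgrams(
    text_tokens: list[str],
    corpora: list[list[str]],
    top_n: int = 10,
    span: int = 4,
) -> list[tuple[str, str]]:
    """Keep the text's top-n concgrams whose relative frequency in the text
    is higher than in every comparison corpus (e.g. specialised, then reference)."""
    text_counts = two_word_concgrams(text_tokens, span)
    corpus_profiles = [(two_word_concgrams(c, span), len(c)) for c in corpora]
    kept = []
    for pair, n in text_counts.most_common(top_n):
        text_rate = n / len(text_tokens)
        # A Counter returns 0 for pairs that never occur in a corpus.
        if all(text_rate > counts[pair] / size for counts, size in corpus_profiles):
            kept.append(pair)
    return kept


if __name__ == "__main__":
    text = tokenize("Heat transfer: the transfer of heat.")
    reference = tokenize("the cat sat on the mat " * 10)
    print(two_word_concgrams(text))       # Counter({('heat', 'transfer'): 4})
    print(aboutgrams(text, [reference]))  # [('heat', 'transfer')]
```

Because each pair is stored in sorted order, "heat transfer" and "transfer of heat" feed the same concgram, which is the combination of positional and constituency variation the abstract describes. Comparing the output of `aboutgrams` with `single_word_list` mirrors, in miniature, the paper's comparison of phraseological and single-word views of aboutness.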

Published online: 11 November 2010
