Chapter 5
Constance and variability
Using PoS-grams to find phraseologies in the language of newspapers
This paper describes a corpus-driven methodology, the retrieval of part-of-speech-grams (PoS-grams), which is highly effective for discovering phraseologies that might otherwise remain hidden. A PoS-gram is a string of part-of-speech categories (Stubbs 2007: 91); its tokens are the strings of words that have been annotated with those PoS tags. A list of PoS-grams retrieved from a sample corpus can be compared with the corresponding list from a reference corpus. Statistically significant items are then analysed further to identify recurrent patterns and potential phraseologies. The utility of PoS-grams is illustrated through an analysis of the Sassari Newspaper Article Corpus (SNAC), a one-million-token corpus composed of texts from ten sections of The Guardian.
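The procedure outlined in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the input is assumed to be a list of (word, tag) pairs produced by some PoS tagger, the tag labels are placeholders, and the log-likelihood statistic with a 3.84 cut-off (p < 0.05, one degree of freedom) is an assumed choice, since the abstract does not name the significance test used. The function names `pos_grams`, `log_likelihood` and `key_pos_grams` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def pos_grams(tagged, n):
    """Count PoS-grams of length n and record the word strings realising each one.

    `tagged` is a list of (word, tag) pairs (assumed input format).
    """
    counts = Counter()
    tokens = defaultdict(Counter)
    for i in range(len(tagged) - n + 1):
        window = tagged[i:i + n]
        gram = tuple(tag for _, tag in window)
        counts[gram] += 1
        tokens[gram][" ".join(word for word, _ in window)] += 1
    return counts, tokens


def log_likelihood(a, b, total_a, total_b):
    """Two-corpus log-likelihood for observed frequencies a and b."""
    expected_a = total_a * (a + b) / (total_a + total_b)
    expected_b = total_b * (a + b) / (total_a + total_b)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / expected_a)
    if b > 0:
        ll += b * math.log(b / expected_b)
    return 2 * ll


def key_pos_grams(sample, reference, n, threshold=3.84):
    """Return PoS-grams over-represented in `sample` relative to `reference`.

    Each result is (PoS-gram, log-likelihood, three most frequent word strings).
    The threshold is an assumption, not a value taken from the chapter.
    """
    s_counts, s_tokens = pos_grams(sample, n)
    r_counts, _ = pos_grams(reference, n)
    s_total, r_total = sum(s_counts.values()), sum(r_counts.values())
    if s_total == 0 or r_total == 0:
        return []
    results = []
    for gram, a in s_counts.items():
        b = r_counts.get(gram, 0)
        ll = log_likelihood(a, b, s_total, r_total)
        # Keep only grams that are significant AND relatively more frequent in the sample.
        if ll >= threshold and a / s_total > b / r_total:
            results.append((gram, ll, s_tokens[gram].most_common(3)))
    return sorted(results, key=lambda r: r[1], reverse=True)
```

Calling `key_pos_grams(section_tokens, rest_of_corpus_tokens, n=4)` on one newspaper section against the remaining sections would yield a ranked list of candidate PoS-grams, each with its most frequent word sequences, which can then be inspected manually for recurrent patterns.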
Article outline
- 1. Introduction
- 2. Materials and methods
- 3. Results and discussion
- 3.1 Travel PoS-grams
- 3.2 Crime PoS-grams
- 3.3 Obituaries PoS-grams
- 4. Conclusions
- Notes
- References
- Appendix
References
Baron, Alistair, Rayson, Paul, & Archer, Dawn
2009 Word frequency and key word statistics in historical corpus linguistics. Anglistik: International Journal of English Studies 20(1): 41–67.

Bednarek, Monika
2008 Emotion Talk across Corpora. Houndmills: Palgrave Macmillan.


Bednarek, Monika & Caple, Helen
2012 News Discourse. London: Continuum.

Biber, Douglas & Barbieri, Federica
2007 Lexical bundles in university spoken and written registers. English for Specific Purposes 26(3): 263–286.


Biber, Douglas & Conrad, Susan
2009 Register, Genre, and Style. Cambridge: CUP.


Biber, Douglas, Conrad, Susan & Cortes, Viviana
2004 If you look at…: Lexical bundles in university teaching and textbooks. Applied Linguistics 25(3): 371–405.


Biber, Douglas, Johansson, Stig, Leech, Geoffrey, Conrad, Susan & Finegan, Edward
1999 Longman Grammar of Spoken and Written English. Harlow: Pearson Education.

Cheng, Winnie, Greaves, Chris, Sinclair, John McH., & Warren, Martin
2009 Uncovering the extent of the phraseological tendency: Towards a systematic analysis of Concgrams. Applied Linguistics 30(2): 236–252.


Cheng, Winnie, Greaves, Chris & Warren, Martin
Carter, Ronald & McCarthy, Michael
2006 Cambridge Grammar of English. Cambridge: CUP.

D’hondt, Eva K. L., Verberne, Suzan, Weber, Niklas, Koster, Kees & Boves, Lou
2012 Using skipgrams and PoS-based feature selection for patent classification. Computational Linguistics in the Netherlands Journal 2: 52–70.

Fletcher, William
2002–2007 kfNgram. Annapolis MD: USNA. [URL] (10 June 2016).
Francis, Gill, Hunston, Susan & Manning, Elizabeth
1998 Grammar Patterns, 2: Nouns and Adjectives. London: HarperCollins.

Gray, Bethany & Biber, Douglas
Greaves, Chris & Warren, Martin
2010 What can a corpus tell us about multi-word units? In The Routledge Handbook of Corpus Linguistics, Anne O’Keeffe & Michael McCarthy (eds), 212–226. London: Routledge.


Hunston, Susan
2011 Corpus Approaches to Evaluation. London: Routledge.

Hunston, Susan & Francis, Gill
Hunston, Susan & Sinclair, John McH.
2000 A local grammar of evaluation. In Evaluation in Text. Authorial Stance and the Construction of Discourse, Susan Hunston & Geoff Thompson (eds), 74–101. Oxford: OUP.

Hyland, Ken
2008 As can be seen: Lexical bundles and disciplinary variation. English for Specific Purposes 27: 4–21.


Kopaczyk, Joanna
2013 The Legal Language of Scottish Burghs. Standardization and Lexical Bundles 1380–1560. Oxford: OUP.


Martin, Jim R. & White, Peter R. R.
2005 The Language of Evaluation. Houndmills: Palgrave Macmillan.


Morley, Barry & Sift, Patricia
2006 Towards the automatic identification of directive speech acts. In Corpus-based Studies of Diachronic English [Linguistic Insights 31], Roberta Facchinetti & Matti Rissanen (eds), 95–112. Bern: Peter Lang.

Reyes, Antonio & Rosso, Paolo
2012 Making objective decisions from subjective data: Detecting irony in customer reviews. Decision Support Systems 53: 754–760.


Spiccia, Carmelo, Augello, Agnese & Pilato, Giovanni
2015 Posgram driven word prediction. In Proceedings of the 7th International Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, Vol. 1 [IC3K 2015], Ana Fred, Jan Dietz, David Aveiro, Kecheng Liu & Joaquim Filipe (eds), 589–596. Lisbon.


Stubbs, Michael
2007 An example of frequent English phraseology: Distributions, structures and functions. In Corpus Linguistics 25 Years on, Roberta Facchinetti (ed.), 89–105. Amsterdam: Rodopi.


Stubbs, Michael & Barth, Isabel