Tagging a Corpus of Spoken Swedish

Nivre, Joakim; Grönqvist, Leif

doi:10.1075/ijcl.6.1.03niv

Article published in:

International Journal of Corpus Linguistics
Vol. 6:1 (2001), pp. 47–78

Joakim Nivre | Department of Linguistics, Göteborg University, Sweden

Leif Grönqvist | Department of Linguistics, Göteborg University, Sweden

In this article, we present and evaluate a method for training a statistical part-of-speech tagger on data from written language and then adapting it to the requirements of tagging a corpus of transcribed spoken language, in our case spoken Swedish. This is currently a significant problem for many research groups working with spoken language, since the availability of tagged training data from spoken language is still very limited for most languages. The overall accuracy of the tagger developed for spoken Swedish is quite respectable, varying from 95% to 97% depending on the tagset used. In conclusion, we argue that the method presented here gives good tagging accuracy with relatively little effort.

Keywords: statistical part-of-speech tagging, spoken language corpora

Published online: 17 December 2001
