Tagging a Corpus of Spoken Swedish
Joakim Nivre | Department of Linguistics, Göteborg University, Sweden
Leif Grönqvist | Department of Linguistics, Göteborg University, Sweden
In this article, we present and evaluate a method for training a statistical part-of-speech tagger on written-language data and then adapting it to tag a corpus of transcribed spoken language, in our case spoken Swedish. Tagging spoken-language corpora is currently a significant problem for many research groups, since tagged training data from spoken language is still very limited for most languages. The tagger developed for spoken Swedish achieves an overall accuracy of 95% to 97%, depending on the tagset used. We conclude that the method presented here gives good tagging accuracy with relatively little effort.
Published online: 17 December 2001
https://doi.org/10.1075/ijcl.6.1.03niv