From a linguistic point of view, spoken corpora should be primary for research, but so far this has not been the case. Hence, the question of what such corpora should include has hardly ever been considered; it often appears that anything spoken is included on an ad hoc basis. The need for, and scarcity of, genuinely prototypical spoken corpora point to the necessity of mapping the field in its entirety and identifying its relevant parameters. To this end, the present paper translates the major differences between spoken and written texts into usable parameters. Ultimately, this could enable the construction of a representative spoken corpus with a clear core of real and typical spoken language.