Drawbacks and Pitfalls of Machine-Readable Texts for Linguistic Research
This paper discusses practical drawbacks and pitfalls of computerised texts, concerning both the databases themselves and the software employed to codify and search them. In the first place, some corpora and databases are compiled in such a way that they can be searched and analysed only with tools that allow specific kinds of search. This often prevents scholars from exploring the data freely, thus hindering an effective, targeted analysis. Moreover, in some cases the need for comprehensiveness leads to the codification and classification of subjective aspects, such as text difficulty and the participants' social level. This subjectivity of interpretation may mislead researchers carrying out a socially orientated analysis. Finally, despite being highly sophisticated, the techniques employed for automated grammatical and part-of-speech tagging, as well as for semantic and prosodic parsing, appear not to be totally reliable, since mistakes in the codification of simple items are likely to occur. Each of these issues, together with some other minor matters, is illustrated with instances drawn from the author's personal linguistic research on a variety of synchronic and diachronic corpora and databases.
Cited by one other publication
Usoniene, Aurelija, Linas Butenas, Birute Ryvityte, Jolanta Sinkuniene, Erika Jasionyte & Algimantas Juozapavicius. 2011. Corpus Academicum Lithuanicum: Design Criteria, Methodology, Application. In Human Language Technology. Challenges for Computer Science and Linguistics [Lecture Notes in Computer Science, 6562], pp. 412 ff.
This list is based on CrossRef data as of 5 August 2024. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.