Article published in:
Corpus approaches to telecinematic language
Edited by Monika Bednarek, Valentin Werner and Marcia Veirano Pinto
[International Journal of Corpus Linguistics 26:1] 2021
► pp. 10–37
The TV and Movies corpora
Design, construction, and use
Mark Davies | Brigham Young University
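This paper discusses the creation and use of the TV Corpus (subtitles from 75,000 episodes, 325 million words, 6
English-speaking countries, 1950s–2010s) and the Movies Corpus (subtitles from 25,000 movies, 200 million words, 6 English-speaking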
countries, 1930s–2010s), which are available at English-Corpora.org. The corpora compare
well to the BNC-Conversation data in terms of informality, lexis, phraseology, and syntax. But at 525 million words in total size, they are
more than 30 times as large as BNC-Conversation (both BNC1994 and BNC2014 combined), which means that they can be used to look at a wide
range of linguistic phenomena. The TV and Movies corpora also allow useful comparisons of very informal language across time (containing
texts from the 1930s and later for the movies, and from the 1950s onwards for TV shows) and between dialects of English (such as British and
American English).
Keywords: TV, movies, diachronic, dialects, speech
Published online: 17 November 2020
https://doi.org/10.1075/ijcl.00035.dav