Mark Davies
List of John Benjamins publications for which Mark Davies plays a role.
The Coronavirus Corpus: Design, construction, and use Language and Covid-19, Mahlberg, Michaela and Gavin Brookes (eds.), pp. 583–598 | Article
2021 This paper discusses the creation and use of the Coronavirus Corpus, which is currently (March 2021) 900 million words in size, and which will probably be about one billion words in size by May–June 2021. The Coronavirus Corpus is a subset of the NOW Corpus (News on the Web), which is currently…
The TV and Movies corpora: Design, construction, and use Corpus approaches to telecinematic language, Bednarek, Monika, Valentin Werner and Marcia Veirano Pinto (eds.), pp. 10–37 | Article
2021 This paper discusses the creation and use of the TV Corpus (subtitles from 75,000 episodes, 325 million words, 6 English-speaking countries, 1950s–2010s) and the Movies Corpus (subtitles from 25,000 movies, 200 million words, 6 English-speaking countries, 1930s–2010s), which are available at…
Chapter 6. Semantic and lexical shifts with the “into-causative” construction in American English Explorations in English Historical Syntax, Cuyckens, Hubert, Hendrik De Smet, Liesbet Heyvaert and Charlotte Maekelberghe (eds.), pp. 159–178 | Chapter
2018 In this paper, we consider several lexical and semantic shifts with the “into-causative” construction (e.g. Sue talked them into leaving) in American English since the early 1800s. The study is based on more than 11,000 tokens (including 680 different matrix verbs) in several large corpora,…
A reply English World-Wide 36:1, pp. 45–47 | Commentary
2015 A reply to the commentaries by Christian Mair (DOI:10.1075/eww.36.1.02mai), Joybrato Mukherjee (DOI:10.1075/eww.36.1.02muk), Gerald Nelson (DOI:10.1075/eww.36.1.02nel), and Pam Peters (DOI:10.1075/eww.36.1.02pet).
Expanding horizons in the study of World Englishes with the 1.9 billion word Global Web-based English Corpus (GloWbE) English World-Wide 36:1, pp. 1–28 | Article
2015 In this paper, we provide an overview of the new GloWbE Corpus, the Corpus of Global Web-based English. GloWbE is based on 1.9 billion words in 1.8 million web pages from 20 different English-speaking countries. Approximately 60 percent of the corpus comes from informal blogs, and the rest from a…
Making Google Books n-grams useful for a wide range of research on language change International Journal of Corpus Linguistics 19:3, pp. 401–416 | Article
2014 The “standard” Google Books n-grams were released by Google in 2010, and they include more than 155 billion words of data for the American English data alone. Unfortunately, the standard interface is far too simplistic to allow many types of useful research on this massive dataset. In this paper,…
The 400 million word Corpus of Historical American English (1810–2009) English Historical Linguistics 2010: Selected Papers from the Sixteenth International Conference on English Historical Linguistics (ICEHL 16), Pécs, 23–27 August 2010, Hegedűs, Irén and Alexandra Fodor (eds.), pp. 231–262 | Article
2012 The 400 million word Corpus of Historical American English (1810–2009) provides researchers with an extremely robust set of data for Late Modern English. The corpus is composed of fiction, magazines, newspapers, and nonfiction books, and its genre balance stays roughly the same from decade to…
Synchronic and diachronic uses of corpora Perspectives on Corpus Linguistics, Viana, Vander, Sonia Zyngier and Geoff Barnbrook (eds.), pp. 63–80 | Article
2011 In this interview, Mark Davies, Professor of (Corpus) Linguistics at Brigham Young University (United States), shows his interest in languages such as English, Spanish and Portuguese. This interest is revealed in his involvement with corpora compilation (Corpus of Historical American English,…
More than a peephole: Using large and diverse online corpora The Bootcamp Discourse and Beyond, Worlock Pope, Caty (ed.), pp. 412–418 | Article
2010
The 385+ million word Corpus of Contemporary American English (1990–2008+): Design, architecture, and linguistic insights International Journal of Corpus Linguistics 14:2, pp. 159–190 | Article
2009 The Corpus of Contemporary American English (COCA), which was released online in early 2008, is the first large and diverse corpus of American English. In this paper, we first discuss the design of the corpus, which contains more than 385 million words from 1990–2008 (20 million words each year),…
The advantage of using relational databases for large corpora: Speed, advanced queries, and unlimited annotation International Journal of Corpus Linguistics 10:3, pp. 307–334 | Article
2005 Relational databases can be used to create large corpora that provide both very good search performance and a wide range of queries. This paper outlines how this approach has been used to create the Corpus del Español, which contains 100 million words of Spanish text from the 1200s–1900s.…
Student use of large, annotated corpora to analyze syntactic variation Corpora and Language Learners, Aston, Guy, Silvia Bernardini and Dominic Stewart (eds.), pp. 259–269 | Article
2004
Syntactic Diffusion in Spanish and Portuguese Infinitival Complements New Approaches to Old Problems: Issues in Romance historical linguistics, Dworkin, Steven N. and Dieter Wanner (eds.), pp. 109–128 | Chapter
2000
A Computer Corpus-Based Study of Subject Raising in Modern Portuguese Lingvisticæ Investigationes 21:2, pp. 379–400 | Article
1997 This study is the first comprehensive, data-based examination of subject raising in Portuguese, and is based on 4500+ tokens in more than 26,500,000 words of text from both the written and spoken registers of Brazilian and European Portuguese. We have suggested that there are important differences…
The evolution of causative constructions in Spanish and Portuguese Contemporary Research in Romance Linguistics: Papers from the XXII Linguistic Symposium on Romance Languages, El Paso/Juárez, February 22–24, 1992, Amastae, Jon, Grant Goodall, M. Montalbetti and M. Phinney (eds.), pp. 105–122 | Article
1995