Article published in:
International Journal of Corpus Linguistics
Vol. 24:2 (2019), pp. 202–228


Abou-Saad, A.
(1987) A Dictionary of Arabic Idiomatic Expressions. Beirut: Dar El-Ilm Lilmalayin.
Alfaifi, A., Atwell, E. & Hedaya, I.
(2014) Arabic Learner Corpus (ALC) v2: A new written and spoken corpus of Arabic learners. In S. Ishikawa (Ed.), Proceedings of Learner Corpus Studies in Asia and the World (pp. 77–89). Kobe: Kobe University. Retrieved from http://eprints.whiterose.ac.uk/79561/ (last accessed April 2019).
Alghamdi, A., Atwell, E., & Brierley, C.
(2016) An empirical study of Arabic formulaic sequence extraction methods. In N. Calzolari, K. Choukri, T. Declerck, S. Goggi, M. Grobelnik, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk & S. Piperidis (Eds.), Proceedings of LREC’2016 Language Resources and Evaluation Conference (pp. 502–506). Portoroz: LREC. Retrieved from http://www.lrec-conf.org/proceedings/lrec2016/pdf/126_Paper.pdf (last accessed April 2019).
Alghamdi, A., & Atwell, E.
(2017) نحو معجم حاسوبي للمتلازمات اللفظية في اللغة العربية المعاصرة. [Towards a Computational Lexicon for Arabic Formulaic Sequences]. In Proceedings of TICAM The International Conference on Information and Communication Technologies. Retrieved from http://event.ircam.ma/data/papers2016/18.pdf (last accessed April 2019).
Alrabiah, M., Al-Salman, A., Atwell, E., & Alhelewh, N.
(2014) KSUCCA: A key to exploring Arabic historical linguistics. International Journal of Computational Linguistics, 5 (2), 27–36.
Alrehaili, S., & Atwell, E.
(2017) Extraction of multi-word terms and complex terms from the Classical Arabic text of the Quran. International Journal on Islamic Applications in Computer Science and Technology, 5 (3), 15–27.
Alshutayri, A., Atwell, E., Alosaimy, A., Dickins, J., Ingleby, M., & Watson, J.
(2016) Arabic language WEKA-based dialect classifier for Arabic automatic speech recognition transcripts. In P. Nakov, M. Zampieri, L. Tan, N. Ljubešić, J. Tiedemann & S. Malmasi (Eds.), Proceedings of VarDial’2016 Third Workshop on NLP for Similar Languages, Varieties and Dialects (pp. 204–211). Osaka: COLING. Retrieved from https://aclanthology.info/papers/W16-4826/w16-4826 (last accessed April 2019).
Alshutayri, A., & Atwell, E.
(2017) Exploring Twitter as a source of an Arabic dialect corpus. International Journal of Computational Linguistics, 8 (2), 37–44.
(2019) A social media corpus of Arabic dialect text. In C. Wigham & E. Stemle (Eds.), Computer-Mediated Communication and Social Media Corpora. Clermont-Ferrand: Presses Universitaires Blaise Pascal.
Al-Sulaiti, L., Abbas, N., Brierley, C., Atwell, E., & Alghamdi, A.
(2016) Compilation of an Arabic children’s corpus. In N. Calzolari, K. Choukri, T. Declerck, S. Goggi, M. Grobelnik, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk & S. Piperidis (Eds.), Proceedings of LREC’2016 Language Resources and Evaluation Conference (pp. 1808–1812). Portoroz: LREC. Retrieved from http://www.lrec-conf.org/proceedings/lrec2016/summaries/142.html (last accessed April 2019).
Attia, M. A.
(2006) Accommodating multiword expressions in an Arabic LFG grammar. In T. Salakoski, F. Ginter, S. Pyysalo & T. Pahikkala (Eds.), Advances in Natural Language Processing (pp. 87–98). Berlin: Springer.
Atwell, E.
(1982) LOB Corpus Tagging Project: Manual Post-edit Handbook. Research report, University of Lancaster. Retrieved from https://www.researchgate.net/publication/246707360_LOB_Corpus_tagging_project_post-edit_handbook (last accessed April 2019).
Baldwin, T., Bannard, C., Tanaka, T., & Widdows, D.
(2003) An empirical model of multiword expression decomposability. In F. Bond, A. Korhonen, D. McCarthy & A. Villavicencio (Eds.), Proceedings of the ACL 2003 Workshop on Multiword Expressions: Analysis, Acquisition and Treatment (pp. 89–96). Sapporo: Association for Computational Linguistics. Retrieved from https://aclanthology.info/papers/W03-1812/w03-1812 (last accessed April 2019).
Baldwin, T., & Kim, S. N.
(2010) Multiword expressions. In N. Indurkhya & F. J. Damerau (Eds.), Handbook of Natural Language Processing (2nd ed., pp. 267–292). Boca Raton, FL: Chapman and Hall/CRC.
Biber, D., Conrad, S., & Cortes, V.
(2004) If you look at…: Lexical bundles in university teaching and textbooks. Applied Linguistics, 25 (3), 371–405.
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E.
(1999) Longman Grammar of Spoken and Written English. Harlow: Longman.
Boers, F., Eyckmans, J., Kappel, J., Stengers, H., & Demecheleer, M.
(2006) Formulaic sequences and perceived oral proficiency: Putting a lexical approach to the test. Language Teaching Research, 10 (3), 245–261.
Capel, A.
(2010) A1–B2 vocabulary: Insights and issues arising from the English Profile Wordlists project. English Profile Journal, 1, e3.
Church, K. W., & Hanks, P.
(1990) Word association norms, mutual information, and lexicography. Computational Linguistics, 16 (1), 22–29.
Coulmas, F.
(1979) On the sociolinguistic relevance of routine formulae. Journal of Pragmatics, 3 (3–4), 239–266.
Coxhead, A.
(2000) A new academic wordlist. TESOL Quarterly, 34 (2), 213–238.
Davies, M., & Gardner, D.
(2010) A Frequency Dictionary of American English: Word Sketches, Collocates and Thematic Lists. Abingdon: Routledge.
Dawood, M.
(2003) A Dictionary of Arabic Contemporary Idioms. Cairo: Dar Ghareeb.
Dorgeloh, H., & Wanner, A.
(2009) Formulaic argumentation in scientific discourse. In R. Corrigan, E. A. Moravcsik, H. Ouali, & K. Wheatley (Eds.), Formulaic Language Volume 2. Acquisition, Loss, Psychological Reality, and Functional Explanations (pp. 523–544). Amsterdam/Philadelphia, PA: John Benjamins.
Dukes, K., & Atwell, E.
(2012) LAMP: A multimodal web platform for collaborative linguistic analysis. In N. Calzolari, K. Choukri, T. Declerck, M. Dogan, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of LREC’2012 Language Resources and Evaluation Conference (pp. 3268–3275). Istanbul: LREC. Retrieved from http://www.lrec-conf.org/proceedings/lrec2012/pdf/646_Paper.pdf (last accessed April 2019).
Durrant, P.
(2009) Investigating the viability of a collocation list for students of English for academic purposes. English for Specific Purposes, 28 (3), 157–169.
Erman, B., & Warren, B.
(2000) The idiom principle and the open choice principle. Text, 20 (1), 29–62.
Fayed, W. K.
(2007) A Dictionary of Arabic Contemporary Idioms. Cairo: Abu Elhoul.
Fellbaum, C.
(1998) WordNet. Cambridge: MIT Press.
Firth, J. R.
(1957) Papers in Linguistics 1934–1951. London: Oxford University Press.
Gralinski, F., Savary, A., Czerepowicka, M., & Makowiecki, F.
(2010) Computational lexicography of multi-word units: How efficient can it be? In É. Laporte, P. Nakov, C. Ramisch & A. Villavicencio (Eds.), Proceedings of MWE’2010 Workshop on Multiword Expressions: From Theory to Applications (pp. 19–27). Beijing: COLING. Retrieved from https://www.aclweb.org/anthology/W10-3702 (last accessed April 2019).
Habash, N., & Rambow, O.
(2005) Arabic tokenization, morphological analysis, and part-of-speech tagging in one fell swoop. In K. Knight, H. T. Ng & K. Oflazer (Eds.), Proceedings of the Conference of American Association for Computational Linguistics (pp. 578–580). Ann Arbor, MI: ACL. Retrieved from https://aclanthology.info/papers/P05-1071/p05-1071 (last accessed April 2019).
Hassan, H., Daud, N., & Atwell, E.
(2013) Connectives in the World Wide Web Arabic corpus. World Applied Sciences Journal (Special Issue of Studies in Language Teaching and Learning), 21, 67–72.
Hawwari, A., Attia, M., & Diab, M.
(2014) A framework for the classification and annotation of multiword expressions in dialectal Arabic. In N. Habash & S. Vogel (Eds.), Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP) (pp. 48–56). Retrieved from https://aclanthology.info/papers/W14-3606/w14-3606 (last accessed April 2019).
Hawwari, A., Bar, K., & Diab, M.
(2012) Building an Arabic multiword expressions repository. Paper presented at the ACL 2012 joint workshop on statistical parsing and semantic processing of morphologically rich languages, Jeju.
Hunston, S.
(2002) Corpora in Applied Linguistics. Cambridge: Cambridge University Press.
Hyland, K.
(2008) As can be seen: Lexical bundles and disciplinary variation. English for Specific Purposes, 27 (1), 4–21.
Kilgarriff, A., Baisa, V., Bušta, J., Jakubíček, M., Kovář, V., Michelfeit, J., Rychlý, P. & Suchomel, V.
(2014) The Sketch Engine: Ten years on. Lexicography, 1 (1), 7–36.
Kjellmer, G.
(1990) A mint of phrases. In K. Aijmer & B. Altenberg (Eds.), English Corpus Linguistics: Studies in Honour of Jan Svartvik (pp. 111–127). London: Longman.
Leech, G. N., Rayson, P., & Wilson, A.
(2001) Word Frequencies in Written and Spoken English: Based on the British National Corpus. Harlow: Longman.
Leech, G., Garside, R., & Atwell, E. S.
(1983) The automatic grammatical tagging of the LOB corpus. ICAME Journal, 7, 13–33.
Li, W., Zhang, X., Niu, C., Jiang, Y., & Srihari, R.
(2003) An expert lexicon approach to identifying English phrasal verbs. In E. Hinrichs & D. Roth (Eds.), Proceedings of the 41st Annual Meeting on Association for Computational Linguistics – Volume 1 (pp. 513–520). Sapporo: Association for Computational Linguistics. Retrieved from https://aclanthology.info/papers/P03-1065/p03-1065 (last accessed April 2019).
Martinez, R.
(2011) The Development of a Corpus-informed List of Formulaic Sequences for Language Pedagogy (Unpublished doctoral dissertation). University of Nottingham, Nottingham.
Martinez, R., & Murphy, V. A.
(2011) Effect of frequency and idiomaticity on second language reading comprehension. TESOL Quarterly, 45 (2), 267–290.
Martinez, R., & Schmitt, N.
(2012) A phrasal expressions list. Applied Linguistics, 33 (3), 299–320.
Meghawry, S., Elkorany, A., Salah, A., & Elghazaly, T.
(2015) Semantic extraction of Arabic multiword expressions. Computer Science & Information Technology, 5 (2), 21–31.
Mel’čuk, I.
(1998) Collocations and lexical functions. In A. Cowie (Ed.), Phraseology: Theory, Analysis, and Applications (pp. 23–53). Oxford: Clarendon Press.
Milton, J.
(2009) Measuring Second Language Vocabulary Acquisition. Bristol: Multilingual Matters.
Nation, I. S. P.
(2001) Learning Vocabulary in Another Language. Cambridge: Cambridge University Press.
Nation, P., & Waring, R.
(1997) Vocabulary size, text coverage and word lists. Vocabulary: Description, Acquisition and Pedagogy, 14, 6–19.
Nerima, L., Seretan, V., & Wehrli, E.
(2003) Creating a multilingual collocation dictionary from large text corpora. In A. Copestake & J. Hajic (Eds.), Proceedings of the Tenth Conference on European chapter of the Association for Computational Linguistics – Volume 2 (pp. 131–134). Stroudsburg, PA: Association for Computational Linguistics. Retrieved from https://aclweb.org/anthology/E03-1022 (last accessed April 2019).
Ohlrogge, A.
(2009) Formulaic expressions in intermediate EFL writing assessment. In R. Corrigan, E. A. Moravcsik, H. Ouali, & K. Wheatley (Eds.), Formulaic Language Volume 2. Acquisition, Loss, Psychological Reality, and Functional Explanations (pp. 387–404). Amsterdam/Philadelphia, PA: John Benjamins.
O’Keeffe, A., McCarthy, M., & Carter, R.
(2007) From Corpus to Classroom: Language Use and Language Teaching. Cambridge: Cambridge University Press.
Omar, A.
(2007) Arabic Multi-word Expressions and Language Resources. Tunis: National Publishing Complex.
Pasha, A., Al-Badrashiny, M., Diab, M. T., El Kholy, A., Eskander, R., Habash, N., Pooleery, M., Rambow, O., & Roth, R.
(2014) MADAMIRA: A Fast, Comprehensive Tool for Morphological Analysis and Disambiguation of Arabic. In N. Calzolari, K. Choukri, T. Declerck, H. Loftsson, B. Maegaard, J. Mariani, A. Moreno, J. Odijk & S. Piperidis (Eds.), Proceedings of LREC’2014 Ninth International Conference on Language Resources and Evaluation (pp. 1094–1101). Reykjavik: LREC. Retrieved from http://www.lrec-conf.org/proceedings/lrec2014/pdf/593_Paper.pdf (last accessed April 2019).
Pawley, A., & Syder, F. H.
(1983) Two puzzles for linguistic theory: Nativelike selection and nativelike fluency. In J. Richards & R. Schmidt (Eds.), Language and Communication (pp. 191–227). London: Longman.
Peters, A. M.
(1983) The Units of Language Acquisition. Cambridge: Cambridge University Press.
Ramisch, C.
(2015) State of the art in MWE processing. In C. Ramisch (Ed.), Multiword Expressions Acquisition (pp. 53–102). Berlin: Springer.
Ramisch, C., De Araujo, V., & Villavicencio, A.
(2012) A broad evaluation of techniques for automatic acquisition of multiword expressions. In J. Cheung, J. Hatori, C. Henriquez & A. Irvine (Eds.), Proceedings of ACL 2012 Student Research Workshop (pp. 1–6). Jeju: Association for Computational Linguistics.
Sag, I. A., Baldwin, T., Bond, F., Copestake, A., & Flickinger, D.
(2002) Multiword expressions: A pain in the neck for NLP. In A. Gelbukh (Ed.), Proceedings of CICLing’2002 Computational Linguistics and Intelligent Text Processing (pp. 1–15). Berlin: Springer.
Sawalha, M., & Atwell, E.
(2013) Accelerating the processing of large corpora: Using grid computing for lemmatizing the 176 million words Arabic internet corpus. In E. Atwell (Ed.), Proceedings of WACL-2 – 2nd Workshop of Arabic Corpus Linguistics. Lancaster: Lancaster University. Retrieved from http://eprints.whiterose.ac.uk/81622/ (last accessed April 2019).
Schmitt, N.
(2010) Researching Vocabulary: A Vocabulary Research Manual. Basingstoke: Palgrave Macmillan.
Schmitt, N., & Martinez, R.
(2012) A phrasal expressions list. Applied Linguistics, 33 (3), 299–320.
Schneider, N.
(2014) Lexical Semantic Analysis in Natural Language Text (Unpublished doctoral dissertation). University of Melbourne, Melbourne.
Scott, M.
(2016) WordSmith Tools (Version 6) [Computer software]. Stroud: Lexical Analysis Software.
Seeny, M., Mokhtar, A., & Sayyed, A.
(1996) A Contextual Dictionary of Idioms [almu’jm alsyaqi lelta’birat alastlahiah]. Beirut: Librairie du Liban Publishers.
Sharoff, S.
(2006) Creating general-purpose corpora using automated search engine queries. In M. Baroni & S. Bernardini (Eds.), WaCky Working Papers on the Web as Corpus (pp. 63–98). Bologna: GEDIT.
Siyanova-Chanturia, A., Conklin, K., & Schmitt, N.
(2011) Adding more fuel to the fire: An eye-tracking study of idiom processing by native and non-native speakers. Second Language Research, 27 (2), 251–272.
Smadja, F., McKeown, K. R., & Hatzivassiloglou, V.
(1996) Translating collocations for bilingual lexicons: A statistical approach. Computational Linguistics, 22 (1), 1–38.
Stubbs, M.
(1995) Collocations and semantic profiles: On the cause of the trouble with quantitative studies. Functions of Language, 2 (1), 23–55.
Taylor, J.
(2006) Polysemy and the lexicon. In G. Kristiansen, M. Achard, R. Dirven & F. Ruiz de Mendoza Ibáñez (Eds.), Cognitive Linguistics: Current Applications and Future Perspectives (pp. 51–80). Berlin: Mouton de Gruyter.
Underwood, G., Schmitt, N., & Galpin, A.
(2004) The eyes have it: An eye-movement study into the processing of formulaic sequences. In N. Schmitt (Ed.), Formulaic Sequences: Acquisition, Processing and Use (pp. 153–172). Amsterdam/Philadelphia, PA: John Benjamins.
West, M.
(1953) A General Service List of English Words. London: Longman.
Wood, D.
(2010) Formulaic Language and Second Language Speech Fluency: Background, Evidence, and Classroom Applications. London/New York, NY: Continuum.
(2015) Fundamentals of Formulaic Language: An Introduction. London: Bloomsbury Academic.
Wray, A.
(2002) Formulaic language in computer-supported communication: Theory meets reality. Language Awareness, 11 (2), 114–131.
(2009) Identifying formulaic language: Persistent challenges and new opportunities. In R. Corrigan, E. A. Moravcsik, H. Ouali, & K. Wheatley (Eds.), Formulaic Language Volume 1. Distribution and Historical Change (pp. 27–51). Amsterdam/Philadelphia, PA: John Benjamins.
(2013) Formulaic language. Language Teaching, 46 (3), 316–334.
Wray, A., & Namba, K.
(2003) Use of formulaic language by a Japanese-English bilingual child: A practical approach to data analysis. Japan Journal of Multilingualism and Multiculturalism, 9, 24–51.
Wulff, S., Swales, J. M., & Keller, K.
(2009) “We have about seven minutes for questions”: The discussion sessions from a specialized conference. English for Specific Purposes, 28 (2), 79–92.
Yang, D., Lee, I., & Cantos, P.
(2002) On the corpus size needed for compiling a comprehensive computational lexicon by automatic lexical acquisition. Computers and the Humanities, 36 (2), 171–190.