Surveying native speakers to find the proportions of registers used in Levantine Arabic

Flinn, Andrea

doi:10.1075/rs.25002.fli

Article In: Register Studies: Online-First Articles

Andrea Flinn | Iowa State University

This content is being prepared for publication; it may be subject to changes.

Abstract

Corpora consisting of Levantine Arabic, the dialects spoken in Jordan, Lebanon, Palestine, and Syria, include a narrow range of registers and are rarely based on a careful domain description, limiting their ability to represent the target domain. The purpose of this study is to describe the proportions of registers used within Levantine Arabic by conducting a Parameters of Language Use Survey, so that subsequent corpora can better represent the Levantine dialects. The registers used (e.g., conversations, song lyrics, audio/video sharing) and their frequencies were identified. As expected for a traditionally oral variety of Arabic, much language use consisted of conversation (61.1%). Another 16.4% consisted of digital language use, most of which was written, reflecting a noteworthy change precipitated by the advent of Web 2.0. Results can be used to compare varieties of Arabic, including Modern Standard Arabic (MSA), and to inform research and corpus design.

Keywords: Register, representative, survey, Levantine Arabic

Article outline

1. Background
2. The Process of Describing the Target Domain of Levantine Arabic
- 2.1 Determining a Language’s Most Frequent Registers: Diary- and Survey-based Studies
- 2.2 The Target Domain of Levantine Arabic
- 2.3 The Importance of Representativeness and the Proportions of Registers in a Domain
- 2.4 Research Offering Insight into the Most Frequent Registers in Levantine Arabic
- 2.5 Rationale
3. Methodology
- 3.1 Identifying the Domain Boundaries of Levantine Arabic
- 3.2 Research Site, Sampling Procedure, and Participants for the Survey
- 3.3 Data Collection Procedure
4. Results
- 4.1 Results from the Main Survey
- 4.2 Results from the Final Survey
- 4.3 Results from the Observations
- 4.4 Results from the Application Tracker
5. Discussion
- 5.1 Insights into Levantine Arabic
- 5.2 How Results can Guide Corpus Creation, Selection, and Evaluation
- 5.3 How Results could be Used to Compare Varieties of Arabic
6. Conclusion
Notes
Author queries
References
