Quantifying lexical and pronunciation variation between three Arabic varieties
This paper reports on computational measures that quantify the lexical and pronunciation variation between three varieties of Arabic: Moroccan Arabic, Egyptian Arabic, and Gulf Arabic. We provide three measures of linguistic variation, all computed from elicitations of the Swadesh list. The first is a lexical measure based on the percentage of noncognate words. The second is another lexical measure that takes pronunciation into account by considering the IPA transcriptions of the same word list. The third is a pronunciation measure that computes the variation in the IPA transcriptions of the cognate words in the Swadesh list. The results of all three measures show that geographically proximate varieties are also linguistically closer to each other.
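To make the three measures concrete, here is a minimal sketch in Python. It rests on assumptions the abstract does not confirm: it uses plain Levenshtein (edit) distance, normalized by the longer transcription, as the pronunciation distance, and it takes cognate judgments as given input. The function names and the toy word pairs are hypothetical placeholders, not data or code from the study.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[Sequence[str], Sequence[str]]


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Unit-cost edit distance between two sequences of segments."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Edit distance divided by the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def percent_noncognate(cognate_flags: Sequence[bool]) -> float:
    """Measure 1 (sketch): share of word-list items that are not cognate."""
    return 100.0 * sum(not f for f in cognate_flags) / len(cognate_flags)


def mean_distance_all(pairs: Sequence[Pair]) -> float:
    """Measure 2 (sketch): mean transcription distance over the whole list."""
    return sum(normalized_distance(a, b) for a, b in pairs) / len(pairs)


def mean_distance_cognates(pairs: Sequence[Pair],
                           cognate_flags: Sequence[bool]) -> float:
    """Measure 3 (sketch): mean transcription distance over cognates only."""
    dists: List[float] = [normalized_distance(a, b)
                          for (a, b), f in zip(pairs, cognate_flags) if f]
    return sum(dists) / len(dists) if dists else 0.0


# Invented placeholder strings, NOT real Arabic transcriptions.
# A Python str is compared code point by code point, so real IPA with
# diacritics should be passed as lists of pre-segmented symbols instead.
toy_pairs = [("tak", "tak"), ("tak", "tek"), ("mun", "sol")]
toy_flags = [True, True, False]

print(percent_noncognate(toy_flags))
print(mean_distance_all(toy_pairs))
print(mean_distance_cognates(toy_pairs, toy_flags))
```

Applying these functions to each pair of varieties would give one distance matrix per measure. Replacing the unit substitution cost with a feature-based cost is a common refinement, but whether the study does so cannot be determined from the abstract.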