Readability for foreign language learning: The importance of cognates

Beinborn, Lisa; Zesch, Torsten; Gurevych, Iryna

doi:10.1075/itl.165.2.02bei

Article published in:

Recent Advances in Automatic Readability Assessment and Text Simplification
Edited by Thomas François and Delphine Bernhard
[ITL - International Journal of Applied Linguistics 165:2] 2014
pp. 136–162

In this paper, we analyse the differences between L1 acquisition and L2 learning and identify four main aspects: input quality and quantity, mapping processes, cross-lingual influence, and reading experience. Because of these differences, we conclude that L1 readability measures cannot be directly transferred to L2 readability. We propose calculating L2 readability along several dimensions and for smaller units. It is particularly important to account for cross-lingual influence from the learner’s L1 and other previously acquired languages, and for the learner’s greater reading experience.

In our analysis, we focus on lexical readability, as it has been found to be the most influential dimension for L2 reading comprehension. We discuss the features frequency, lexical variation, concreteness, polysemy, and context specificity, and analyse their impact on L2 readability. As a new feature specific to L2 readability, we propose the cognateness of words with words in languages the learner already knows. A pilot study confirms our assumption that learners can deduce the meaning of new words from their cognateness with words in other languages.
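The abstract does not say how cognateness is computed. As a rough illustration only, one could approximate it by the highest normalised string similarity between a target word and any word the learner already knows. The sketch below assumes plain Levenshtein edit distance as the similarity measure; the function names `levenshtein` and `cognateness` and the example word lists are hypothetical and are not taken from the paper.

```python
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def cognateness(word: str, known_words: Iterable[str]) -> float:
    """Hypothetical cognateness score in [0, 1].

    Assumption: the score is the best normalised edit-distance similarity
    between `word` and any word in `known_words`, ignoring case. This is an
    illustrative proxy, not the measure used in the paper.
    """
    best = 0.0
    for known in known_words:
        longest = max(len(word), len(known))
        if longest == 0:
            continue  # two empty strings carry no evidence
        distance = levenshtein(word.lower(), known.lower())
        best = max(best, 1.0 - distance / longest)
    return best


# Illustrative use: a German-speaking learner meeting English words.
known_german = ["Nacht", "Hotel", "Wasser"]
for target in ["night", "hotel", "cloud"]:
    print(target, round(cognateness(target, known_german), 2))
```

A purely orthographic score like this cannot tell true cognates from false friends, so in practice it would need to be combined with some check on meaning.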

Keywords: cognates, readability measures, second language learning, language transfer

Published online: 23 January 2015

https://doi.org/10.1075/itl.165.2.02bei


Cited by (4)

Zhang, Haomin, Yuting Han, Xi Cheng, Jie Sun & Shoran Ohara

2024. Unpacking cross-linguistic similarities and differences in third language Japanese vocabulary acquisition among Chinese college students. Journal of Multilingual and Multicultural Development 45:2, pp. 101 ff.

Zhang, Haomin, Jie Sun, Yuting Han & Song Yin

2024. Morphological and cognate awareness in L2 Japanese word learning: evidence from Chinese-speaking learners. International Journal of Bilingual Education and Bilingualism 27:1, pp. 83 ff.

ALTUNTAŞ GÜRSOY, İlke & Mehmet ÇEVİK

2023. Türkçenin Yabancı Dil Olarak Öğretimi İçin Hazırlanmış Yardımcı Okuma Kitaplarının Okunabilirliklerinin İncelenmesi. Korkut Ata Türkiyat Araştırmaları Dergisi :13, pp. 1227 ff.

Zhang, Haomin, Yuting Han, Xing Zhang & Liuran Cui

2022. Frequency, Dispersion and Abstractness in the Lexical Sophistication Analysis of A Learner-Based Word Bank: Dimensionality Reduction and Identification. Journal of Quantitative Linguistics 29:2, pp. 195 ff.

This list is based on CrossRef data as of 4 July 2024. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.