Article published in: Recent Advances in Automatic Readability Assessment and Text Simplification
Edited by Thomas François and Delphine Bernhard
[ITL - International Journal of Applied Linguistics 165:2] 2014
pp. 223–258
Associative lexical cohesion as a factor in text complexity
In this paper we present an application of associative lexical cohesion to the analysis of text complexity as determined by expert-assigned US school grade levels. Lexical cohesion in a text is represented as a distribution of pairwise positive normalized mutual information values. Our quantitative measure of lexical cohesion is Lexical Tightness (LT), computed as the average of such values per text. It represents the degree to which a text tends to use words that are highly inter-associated in the language. LT is inversely correlated with grade levels and adds significantly to the amount of explained variance when estimating grade level with a readability formula. In general, simpler texts are more lexically cohesive and complex texts are less cohesive. We further demonstrate that lexical tightness is a very robust measure. We compute lexical tightness for a whole text and also across segmental units of a text. While texts are more cohesive at the sentence level than at the paragraph or whole-text levels, the same systematic variation of lexical tightness with grade level is observed for all levels of segmentation. Measuring text cohesion at various levels uncovers a specific genre effect: informational texts are significantly more cohesive than literary texts, across all grade levels.
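The abstract describes LT only at a high level, so the following Python sketch illustrates the computation under several assumptions that the abstract does not state. Association statistics come from a toy co-occurrence table (the `AssocStats` class and all counts are hypothetical; the study itself relies on a large reference corpus). Normalized PMI is taken as PMI divided by -log p(a,b), so values lie in [-1, 1]. "Positive" values are obtained by clipping negative NPMI to zero, so unrelated word pairs pull the average down. Pairs are formed over distinct word types rather than tokens, and segment-level LT simply pools the within-segment pairs. This is an illustration of the idea, not the authors' implementation.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List


class AssocStats:
    """Hypothetical stand-in for corpus co-occurrence statistics.

    word_counts[w]       windows in a reference corpus that contain w
    pair_counts[{a, b}]  windows that contain both a and b
    n_windows            total number of windows
    """

    def __init__(self, word_counts: Dict[str, int],
                 pair_counts: Dict[FrozenSet[str], int], n_windows: int):
        self.word_counts = word_counts
        self.pair_counts = pair_counts
        self.n = n_windows

    def npmi(self, a: str, b: str) -> float:
        """Normalized PMI: log(p(a,b) / (p(a) p(b))) / -log p(a,b)."""
        p_a = self.word_counts[a] / self.n
        p_b = self.word_counts[b] / self.n
        p_ab = self.pair_counts.get(frozenset((a, b)), 0) / self.n
        if p_ab == 0.0:
            return -1.0  # never co-occur: lower bound of NPMI
        if p_ab == 1.0:
            return 1.0   # co-occur in every window: avoid 0/0
        return math.log(p_ab / (p_a * p_b)) / -math.log(p_ab)


def _clipped_pair_values(words: Iterable[str], stats: AssocStats) -> List[float]:
    """max(NPMI, 0) for every pair of distinct word types found in the table."""
    types = sorted({w for w in words if w in stats.word_counts})
    return [max(stats.npmi(a, b), 0.0) for a, b in combinations(types, 2)]


def lexical_tightness(words: Iterable[str], stats: AssocStats) -> float:
    """Whole-text LT (assumed form): mean clipped NPMI over all pairs."""
    values = _clipped_pair_values(words, stats)
    return sum(values) / len(values) if values else 0.0


def segmented_lexical_tightness(segments: List[List[str]],
                                stats: AssocStats) -> float:
    """LT with pairs restricted to words sharing a segment (sentence or
    paragraph); within-segment values are pooled (an assumed aggregation)."""
    values: List[float] = []
    for segment in segments:
        values.extend(_clipped_pair_values(segment, stats))
    return sum(values) / len(values) if values else 0.0


# Hypothetical counts from a 100-window reference corpus.
stats = AssocStats(
    word_counts={"doctor": 10, "nurse": 10, "hospital": 20, "guitar": 10},
    pair_counts={
        frozenset(("doctor", "nurse")): 5,
        frozenset(("doctor", "hospital")): 8,
        frozenset(("nurse", "hospital")): 8,
        frozenset(("hospital", "guitar")): 1,
    },
    n_windows=100,
)

tight = "the doctor asked the nurse to call the hospital".lower().split()
loose = tight + ["guitar"]
print(round(lexical_tightness(tight, stats), 3))  # 0.545
print(round(lexical_tightness(loose, stats), 3))  # 0.272
print(round(segmented_lexical_tightness([tight, ["guitar"]], stats), 3))  # 0.545
```

In this toy example, adding one weakly associated word ("guitar") halves the whole-text score, while restricting pairs to a shared segment keeps the higher value. This is a miniature analogue of the abstract's observation that texts are more cohesive at the sentence level than at the whole-text level, not a reproduction of the reported results.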
Keywords: text complexity, lexical cohesion, readability, word associations, lexical tightness
Published online: 23 January 2015