Introduction published in:
Natural language processing for learner corpus research
Edited by Kristopher Kyle
[International Journal of Learner Corpus Research 7:1] 2021
pp. 1–16
Introduction
Natural language processing for learner corpus research
Article outline
- 1. Introduction to NLP
- The role of training corpora in NLP
- Tokenization
- Lemmatization
- Part of speech annotation
- Constituency parse annotation
- Dependency relation annotation
- 2. Some specific challenges for calculating accuracy in LCR research
- 3. The present issue
- Notes
- References
This article is available free of charge.
Published online: 01 March 2021
https://doi.org/10.1075/ijlcr.00019.int