Vol. 24:2 (2019) ► pp.202–228
Constructing a corpus-informed list of Arabic formulaic sequences (ArFSs) for language pedagogy and technology
This study aims to construct a corpus-informed list of Arabic Formulaic Sequences (ArFSs) for use in language pedagogy (LP) and Natural Language Processing (NLP) applications. A hybrid mixed-methods model, combining automatic and manual extraction methods, was adopted for extracting ArFSs from a corpus, based on well-established quantitative and qualitative criteria that are relevant from the perspective of LP and NLP. The pedagogical implications of this list are examined to facilitate the inclusion of ArFSs in the learning and teaching of Arabic, particularly for non-native speakers. The computational implications of the ArFSs list relate to the key role of ArFSs as a novel language resource for improving various Arabic NLP tasks.
Article outline
- 1. Introduction
- 2. Formulaic Sequences in language pedagogy and technology
- 2.1 Corpus-informed pedagogical formulaic sequences
- 2.2 Arabic computational MWEs research
- 3. Methodology: A hybrid model for FSs extraction
- 3.1 Issues of frequency, extent and identification
- 3.2 The corpus source of the language data
- 3.3 The selection criteria
- 3.4 Stages of constructing the FSs list
- 3.4.1 Statistical phase
- 3.4.2 Qualitative phase
- 3.4.3 Linguistic analysis and classification phase
- 4. Results and discussion
- 5. Conclusions
- Acknowledgements
- Note
- References
https://doi.org/10.1075/ijcl.16088.alg