PolylexFLE
A MWE database for French L2 language learners
Knowledge of multiword expressions (MWEs) is key to learning a foreign language, but its teaching remains hindered by the lack
of lists of expressions tied to pedagogical aims. In this paper, we present an extended version of the PolylexFLE database,
containing 4,525 French MWEs of three types: idioms, collocations, and fixed expressions. In order to propose
exercises that follow the difficulty scale of the Common European Framework of Reference for Languages (CEFR), we used a mixed approach
(manual and automatic) to annotate 1,186 expressions with CEFR levels. The paper focuses mostly on the automatic
procedure that first identifies the expressions from the PolylexFLE database (and their variants) in a corpus of pedagogical texts
(with CEFR labels) using a pattern-based system. In a second step, their distribution in this corpus is estimated and transformed
into a single CEFR level. Finally, the proposed automatic approach is evaluated by 52 learners of French as a foreign language.
Article outline
- 1. Introduction
- 2. Related work
- 3. MWE: Definitions and classification criteria
- 4. Building the PolylexFLE database
- 4.1 Data collection
- 4.2 Linguistic information
- 5. Identification of CEFR level
- 5.1 The manual approach
- 5.2 Corpus description
- 5.3 MWE extraction: PolyExtractor
- 5.3.1 Method description
- 5.3.2 Evaluation of PolyExtractor
- 5.3.2.1 Reference corpus manually annotated with MWE
- 5.3.2.2 Results of PolyExtractor
- 5.4 From distribution to a single CEFR level
- 6. Evaluating the quality of the CEFR annotation
- 7. Conclusion and further work
- Notes
- Bibliography