Data-driven learning (DDL) typically involves language learners consulting corpus data, either directly or via prepared materials, to answer questions about language. The approach has been mooted since the beginning of the modern era of corpus linguistics and has come to be associated with the work of Tim Johns, who coined the term in print in 1990. Since then, hundreds of studies have attempted to evaluate some aspect of DDL, giving rise to several reviews and syntheses. This paper introduces DDL and discusses the syntheses to date, before analysing a rigorously compiled collection of 351 studies published up to and including 2018. While previous syntheses have evaluated the field, the objective here is to provide an overview of how researchers see DDL across the board, to identify more clearly what DDL actually looks like today and how it has evolved from its beginnings in the 1980s, and to suggest avenues for future research in underexplored areas.
This chapter focuses on the need to address both theories of learning and theories of language acquisition in data-driven learning (DDL) research. While it recognises that a substantial body of worthwhile research has shed light on the value of DDL, DDL is still not a mainstream methodology. The chapter argues that by better understanding the variations in pedagogical underpinnings and ontologies, DDL research can better pinpoint what works within specified variables. Additionally, the chapter argues strongly for engagement with ongoing research in second language acquisition (SLA), especially from a usage-based perspective, because DDL resonates with that perspective's emphasis on the central role of frequently experienced syntactic regularities in learning.
This chapter is the result of a conversation between Professors Tony McEnery and Michael McCarthy, two of the greatest names in the fields of corpus linguistics and the corpus-based analysis and teaching of the English language. They share their views and experiences of the areas discussed in this volume, including, among others, the relationship between DDL and SLA research; the role of formulaic language; spoken language and pragmatics as areas of potential application of learner corpus research (LCR) methods; the role of frequency in language learning and teaching; and annotation.
Although -ing clauses are frequent in English, their acquisition has not received much attention, and there is a lack of longitudinal studies and detailed explorations of cross-linguistic influence. This longitudinal case study of five young Norwegian students reveals a developmental sequence for the syntactic roles of -ing clauses: complements of aspectual verbs > complements of other verbs and prepositions > bare adjuncts and postmodifiers of nouns > subjects. The sequence may arise from a combination of frequencies in the input and grammatical selection. Syntactic restrictions on Norwegian present participle clauses are not mirrored in the acquisition of -ing clauses, indicating that the students do not make an interlingual identification. Cross-linguistic influence is evident mainly in late acquisition and infrequent use of -ing clauses.
The present paper analyses the use of V-N collocations with the four ‘light’ verbs do, give, make and take in a corpus representing longitudinal written data from 83 German learners of English in their four final years of secondary school. The paper focuses on the development of correct and incorrect collocation use, the development of L1 interference, and intersubjective variability regarding these aspects. Corroborating previous studies, the paper shows that roughly one quarter of collocations are atypical, with no significant changes over the four years. Around half of all erroneous collocations can be interpreted as the result of L1 interference. As regards individual paths of acquisition, the paper shows that students display a large degree of variability. The paper argues that collocations need to be taught explicitly, focusing on frequent collocations of medium restrictedness that show incongruity between L1 and L2.
This proof-of-concept study presents a novel way of analysing learner language based on 20 randomly selected interviews from the Chinese part of the Louvain International Database of Spoken English Interlanguage (LINDSEI) (Gilquin et al., 2010), annotated both pragmatically and using a newly devised error analysis scheme. Annotation and initial analyses were primarily conducted in the Dialogue Annotation and Research Tool (DART) (Weisser, 2016b), and the relevant features were extracted and normed by the number of functional (speech act) units to obtain meaningfully comparable results across speakers. The results indicate that the majority of errors affect the coherence of the learners’ narrative and that the communicative strategies used by the learners exhibit a high number of discourse markers or response-signals used (em)phatically, apparently serving only as ‘planning facilitators’ rather than as genuine structural or interactional devices.
While much of the research into the efficacy of data-driven learning (DDL) has focused on productive skills such as writing, less attention has been paid to how DDL affects receptive skills such as reading. This chapter presents a four-year mixed-methods study investigating the impact of DDL on the extensive reading (ER) proficiency of second language learners at a Japanese national university. Conducted in three stages and under a wide range of class conditions, the study concludes that explicit forms of DDL are most successful with small groups and one-to-one instruction, where personal feedback and less pressure are key. Implicit forms of DDL tend to work best with larger class sizes, where they result in faster reading speeds and greater amounts of material read.
This chapter investigates the effect of data-driven learning (DDL) by comparing the use of a new corpus tool, #LancsBox, with the use of a corpus-based collocations dictionary for academic collocation learning. Learners in the study improved their collocational knowledge by using #LancsBox, although not significantly. No improvement was found through consulting the collocations dictionary. The majority of learners in the #LancsBox group believed that direct corpus consultation using concordance lines and collocation graphs facilitated their collocation learning, whereas learners in the dictionary group considered the use of a dictionary to be more beneficial for writing than for learning collocations. Interestingly, learners from both groups reported that they would continue using the assigned tools in future language learning and teaching.
The purpose of Scoledit is to build a computer-aided longitudinal corpus of texts written by pupils aged between 6 and 11, together with associated automatic processing tools. The project seeks to produce linguistic descriptions of pupils’ writing and to facilitate the teaching of spelling and writing. Currently, an increasing number of projects aim to create large primary school corpora of French (Elalouf, 2005; Garcia-Debanc & Bonnemaison, 2014; David & Doquet, 2016). However, these corpora are neither longitudinal nor associated with natural language processing (NLP) tools (Wolfarth, 2017). This chapter discusses some of the automated tools developed for linguistic analysis and the advantages of the Scoledit project in the context of language teaching.
This chapter discusses the development of multilingual pedagogical resources that make use of functional descriptors such as vocabulary and grammar items aligned to the CEFR levels in English. CEFR-J × 28 is a project that sets out to create multilingual pedagogical resources for 28 languages based upon the resources developed for the CEFR-J, a localised version of the CEFR for English language teaching in Japan. The chapter discusses the original CEFR-J project and its accompanying resources, such as word/phrase lists and grammar and text profiles. It describes how to convert English resources into multiple languages using machine translation and multilingual corpora, and considers the advantages of this approach as well as future work. The chapter also discusses a series of e-learning tools developed to support learners’ vocabulary and grammar learning, as well as a web-based learner corpus collection tool for multilingual spoken and written production.