Development of the use of discourse markers across different fluency levels of CEFR: A learner corpus analysis

Lan-fen Huang, Yen-liang Lin and Tomáš Gráf
Abstract

Fluent L2 English speakers frequently use discourse markers (DMs) as a speech management strategy, but research has largely ignored how this use develops across proficiency levels and how it relates to immersive experience. This study examines the developmental patterns of three DMs – well, you know and like – in the speech of learners at CEFR levels A2 to C1, with and without immersive experience in target language environments. The fluency-rated LINDSEI corpus (173 learners) and a parallel native corpus (50 speakers) provided approximately 350,000 tokens and 3,395 instances of the analyzed DMs. Overall, DM frequency (especially of well and you know) increases with rising fluency levels, reaching almost native-like levels among C1 speakers. Immersive experience correlates positively with overall and individual DM frequency (except for like). As the skillful use of DMs results in more fluent speech production, the didactic implications for L2 instructors should be developed.

1. Introduction

Fluency is generally recognized as a multidimensional construct (Housen et al. 2012) and the literature abounds in ways of defining and operationalizing it. One of the most common findings is that, in order to achieve fluent performance, speakers deploy various strategies to buy time for planning subsequent utterances. One such strategy is the use of discourse markers (DMs) [1] as compensatory fillers (Hedge 1993), a non-intrusive strategy, since DMs naturally and frequently occur in native spoken English (Carter and McCarthy 2006) and contribute to pragmatically effective communication (Polat 2011). Although previous research has shown that fluent speakers tend to use more, and more varied, DMs (e.g., Hasselgren 2002; Götz 2013; Crible 2017), few empirical studies have investigated the developmental patterns of DMs in learner corpora across fluency levels in an internationally recognized system, namely the Common European Framework of Reference for Languages (CEFR; Council of Europe 2001, 2018). The CEFR provides a comprehensive description of increasing language proficiency from A1 to C2 levels. [2]

[1] The definition of DMs is still open to debate. Varying approaches have been adopted to develop criteria for determining DMs (e.g., Schiffrin 1987; Fraser 1990). Based on work by Schourup (1999) and Fung and Carter (2007), a DM is determined by the possession of five characteristics: (1) optionality, (2) flexibility of position, (3) prosodic independence, (4) connectivity and (5) multi-grammaticality.

[2] The CEFR, released in 2001, was developed to accommodate all languages by describing competences of listening, reading, spoken interaction, spoken production and writing. See https://www.coe.int/en/web/common-european-framework-reference-languages/home.

The scarcity of multi-level learner speech data has resulted in a dearth of studies describing the process of acquiring DMs. The present study examines three typical DMs (well, like and you know) in the speech of learners, empirically testing the use of DMs at different perceived fluency levels of CEFR, and shows how the use of DMs develops as perceived fluency levels increase.

Studies of second language (L2) acquisition have also reported that learners’ exposure to target language environments and regular contact with native speakers (NSs) facilitate L2 development, especially in the use of DMs (e.g., Müller 2005; Hellermann and Vergun 2007; Gilquin 2016; Götz and Mukherjee 2018). The concept of this type of acquisition derives in part from the socio-cultural perspective of Vygotsky (1978), which identifies the role of social interaction in creating an environment that provides L2 learners with abundant opportunities to engage in target-language use.
It is therefore reasonable to argue that L2 learners with immersive experience are more exposed than other L2 learners to the natural production of DMs, resulting in an enhanced competence to comprehend and produce fluent L2 speech (Gilquin 2016). Hence, we aim to explore how far the acquisition of DMs relates to such immersive experiences. [3]

[3] One item of metadata in our learner corpus records the duration of stay in countries where English is spoken, which includes any form of activity from attending formal instruction to sojourns of different kinds.

It has been argued that, since DMs can be used as strategies for enhancing fluency, learners should be encouraged to use them in order to reduce the incidence of more disfluent features, such as (un)filled pauses (Götz 2013), because this could lead to improvements in a speaker’s perceived fluency. The present study thus provides evidence for the pedagogical implications of the use of DMs in learning English, raising awareness of the developmental pattern of DMs in learners’ spoken English and helping to benchmark learner fluency against CEFR levels, which are applicable to English speaking tests.

2. Discourse markers in speech

DMs are ubiquitous in spoken discourse and have many roles in spontaneous speech, which is constructed in real time and involves, among other things, the immediate social and interpersonal situation. Primarily, DMs function to “signal transitions in the evolving process of the conversation, index the relation of an utterance to the preceding context and indicate an interactive relationship between speaker, hearer, and message” (Fung and Carter 2007, 401). In this regard, DMs act as “punctuation for speech”, signaling and signposting for the speaker (Carter 2008, 15). This coherence-based view is concordant with Schiffrin’s (1987, 31) definition of DMs as “sequentially dependent elements which bracket units of talk”. DMs usually work at a discourse level to maintain coherence by providing contextual coordinates for ongoing discourse and acting as linking devices, which reflect choices in monitoring, organizing and managing discourse. DMs also help to organize utterances for the listener, which serves to make the structure and main points of the speech more readily apparent.
In addition to maintaining discourse cohesiveness, DMs have important interpersonal functions in spoken discourse: they enable speakers to project interactive understanding in face-to-face communication, signaling, for example, politeness, shared knowledge, turn-taking, emotional engagement and responses such as agreement, confirmation and acknowledgement (Carter and McCarthy 2017). This reflects the listener-sensitive function of DMs: to indicate the attitudes of the speaker and his/her stance vis-à-vis the information conveyed.

2.1 Discourse markers and speech fluency

DMs have sometimes been negatively characterized as “a sign of disfluency and carelessness” (Brinton 1996, 33). Nevertheless, Tottie (2011, 193) argues that the term disfluency is based on an idealized conception of fluent speech production and is “a rather negative and uninformative default term that says nothing about the discourse functions”. She then proposes the more positive term “planners”. When employed as a speech planning and monitoring strategy, DMs in spoken discourse have been shown to be helpful in spontaneous speech production; they contribute to speech fluency and smoother communication (Götz 2013; Crible et al. 2017; Rühlemann 2019; Wolk et al. 2021), especially when a lexical gap or speech difficulty emerges.
In this regard, DMs may fulfill “potentially disfluent functions” by monitoring (checking for understanding, calling for help), punctuation (stalling, planning) and reformulation (paraphrase and actual corrective relations; Crible 2017, 80).

As House (2009, 187) notes, DMs can be a “gap-filler”, in the form of “a stock phrase mainly used to help speakers process and plan their output, and link spans of discourse”, which may fill pauses and replace disfluencies. Tsai and Chu (2017) examine the DMs used by Chinese-speaking teachers and learners and report that speakers who use DMs very often display fluency in the target language and produce fewer incomplete utterances (false starts) per turn. Like filled pauses, DMs serve as “elegant fillers” or “fluencemes” that occur frequently and can easily be used to fill a silence during speech processing and planning in a natural-sounding way, increasing “the length of a speech run (and thus the overall productive fluency) as well as the degree of naturalness of the output (and thus perceptive fluency)” (Götz 2013, 40). Such fluency-enhancing uses of DMs can serve as equivalent strategies from which speakers choose when coping with planning phases in spontaneous speech (Wolk et al. 2021). Learning how to use DMs can facilitate listeners’ interpretation of English.
If speakers are able to produce DMs appropriately, the English that they produce will be more natural and fluent (Hoey 2002).

2.2 Discourse markers in native and learner discourse

DMs have been investigated extensively and intensively in the speech of English NSs and learners across numerous L1s or cultural backgrounds, such as Chinese (Liao 2009), Dutch (Buysse 2012), German (Müller 2005; Götz 2013), Spanish (Romero-Trillo 2002), Swedish (Aijmer 2011), French and Polish (Gilquin and Granger 2015) and Taiwanese (Lin 2016; Huang 2019). Müller (2005), for example, investigates the use of DMs so, like, well, and you know by NSs and German-speaking learners. It has been reported that overall in learner discourse DMs are under-represented, [4] with the exception of well. Aijmer (2011), for example, reports an over-representation of well by Swedish learners in several different functional roles, with learners tending to use it as a fluency device mostly to cope with speech management problems and rarely to indicate attitude (e.g., by mitigating disagreement). Fung and Carter’s (2007, 410) examination of learners in Hong Kong, based on a pedagogic sub-corpus from CANCODE, shows evidence that DMs serve as “useful interactional manoeuvres” to organize and structure speech on interpersonal, referential, structural and cognitive levels. Employing the same analytical scheme, Lin (2016) examines the speech of British NSs and Taiwanese learners of English, based on a specialized corpus derived from an adolescent intercultural exchange program. Both studies show that NSs used a wider range of DMs for discourse pragmatic functions, whereas L2 learners’ use of DMs was more restricted. Although the above studies found significant differences between NSs and non-NSs, it remains unclear how the proficiency levels of learners influence the use of DMs. Learners’ proficiency levels remain a “fuzzy variable” in learner corpus research (Carlsen 2012).

[4] For language learning, it is reasonable to assume that the language produced by native speakers is taken as the target norm. The terms underuse and overuse are generally adopted by most learner corpus studies, implying that learners use a given target item too little or too much to sound like a native speaker. In studies of DMs, we would suggest as alternatives the neutral terms under- and over-representation for discussing differences in frequency across corpora. The aim of using under- and over-representation is to keep frequency information as linguistic evidence in focus and avoid over-generalizing differences to learners’ performance; in particular, the use of DMs is contextually dependent and syntactically and semantically optional.
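
The neutral terms under- and over-representation rest on comparing relative frequencies across corpora of different sizes. The following is a minimal sketch of that comparison, not a procedure taken from any of the studies cited here: the function names are hypothetical, the counts are invented purely for illustration, and no significance test is applied.

```python
def per_thousand(dm_count: int, corpus_tokens: int) -> float:
    """Relative frequency of a discourse marker per 1,000 tokens."""
    if corpus_tokens <= 0:
        raise ValueError("corpus_tokens must be positive")
    return 1000 * dm_count / corpus_tokens


def representation(learner_rate: float, native_rate: float) -> str:
    """Label the direction of a frequency difference (descriptive only)."""
    if learner_rate > native_rate:
        return "over-represented"
    if learner_rate < native_rate:
        return "under-represented"
    return "equally represented"


# Hypothetical counts, invented for illustration (not data from any cited study).
learner_rate = per_thousand(dm_count=120, corpus_tokens=40_000)
native_rate = per_thousand(dm_count=90, corpus_tokens=15_000)
print(learner_rate, native_rate, representation(learner_rate, native_rate))
```

Normalizing by corpus size before comparing keeps the focus on frequency as linguistic evidence; whether a given difference is meaningful would still require a statistical test and attention to context.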

Several studies document the different uses of DMs across proficiency levels in L2 speech. Dumont (2018) finds that C1 learners use significantly more DMs than do B2 learners. Jones et al. (2018) show that more proficient learners (equivalent to C1 in CEFR in the UCLan Speaking Test Corpus) used well more frequently than lower-level learners, whereas B1 learners employed the DM you know significantly more than B2 and C1 learners. Neary-Sundquist (2014) reports that DM use is positively correlated with proficiency, but that even highly proficient learners could not reach native-like patterns of variation, and certain DMs, such as I think, were over-represented. Although some studies do not clarify whether a wider variety or higher frequency of DMs was predictive of higher proficiency levels (Wei 2011), many studies have found that DMs are used more frequently by more advanced learners of a language (Müller 2005; Hellermann and Vergun 2007) and the rate of DM use by advanced learners is similar to that by NSs (Neary-Sundquist 2014). This implies that the use of DMs can be a mark of both NSs and successful users of English (Prodromou 2008). DMs indicate speakers’ membership “within cultural communities and project a ‘deep commonality’ amongst interlocutors” (O’Keeffe et al. 2007, 76). It is further suggested that learners who wish to advance toward near-native fluency should be exposed to and practice these distinctive features of spoken grammar (ibid.).

However, neither learners’ speaking proficiency levels nor their fluency levels are clearly or uniformly defined in these studies, making it difficult to compare them in order to understand the development of DM use. Only Dumont (2018) and Jones et al. (2018) used the CEFR to quantify learner levels, while other studies used a graded language course (Hellermann and Vergun 2007), investigator assessment (Fung and Carter 2007), a pre-study speaking proficiency test (Neary-Sundquist 2014), or lacked a defined proficiency variable (Müller 2005). In light of these ongoing discrepancies in the literature, the present study uses a multi-level learner corpus, rated post hoc using CEFR fluency descriptors. This allows findings on the development of DM use to be readily applied in such practical contexts as language classrooms and assessments.

2.3 Discourse markers and immersive experience

Socio-cultural integration and exposure to the target language environment have been shown to influence thinking, speaking and the acquisition of a new language. Previous L2 research has provided strong evidence that learners’ immersive experience, with rich L2 input and opportunities for interaction in natural communicative contexts, has a positive impact on the development of oral fluency. Mora and Valls-Ferrer (2012), for example, investigated 30 learners at an upper-intermediate level (B2) who had failed to improve in fluency after six months of formal instruction in their home country but who, after a three-month study-abroad program, improved on all fluency measures, though not in accuracy or complexity. These observed fluency gains may have been due to the frequent employment of various fluency devices in speech, such as DMs.

Immersive experience may also promote the acquisition of DMs. Hellermann and Vergun (2007) find that L2 learners who had more contact with NSs acquired three DMs (you know, well and like), whereas students with little or no use of DMs in their classroom talk all reported speaking their first language at least 50 per cent of the time spent outside the classroom. Liu (2016) notes that, for Chinese learners of English as a foreign language (EFL) who lived in the United States, both increased exposure and increased socialization had significant positive effects on the frequency and variety of the DMs produced. Gilquin (2016) examines 554 EFL learners from the Louvain International Database of Spoken English Interlanguage (LINDSEI) and finds a general, significant increase in the use of DMs with increased length of time spent studying abroad in an English-speaking country. Götz and Mukherjee (2018) report a significant positive effect of immersive experiences lasting more than one year on the use of DMs, showing how the duration of English instruction and a period of stay abroad lead to an increased use of DMs and a reduced use of (un)filled pauses, resulting in speech that is more fluent. These studies demonstrate how increased exposure and socio-cultural integration through a study-abroad experience have measurable effects on the L2 development of DMs.

Although the above studies show a relationship between immersive experience, fluency and the use of DMs in learner groups of a particular level or L1, the present study proposes to further investigate the effect of immersive experience on learners across four CEFR fluency levels.

2.4 Focal discourse markers

A wide variety of words or multiword units considered as DMs have been investigated, but some may be taken as uncontroversial and are classified as central DMs, such as well, you know and like. These three DMs are the most frequently used and have been widely selected for analysis in both NS and non-NS corpora (e.g., Müller 2005; Hellermann and Vergun 2007; Polat 2011; Götz 2013; Dumont 2018). The distinction between non-DM and DM uses of well and like is straightforward to make by referring to the parts of speech that they represent. For you know, the criterion was syntactic necessity.

DM well serves multiple functions in spontaneous speech; one of its major functions is to indicate that the speaker is thinking about things (Carter and McCarthy 2006). Aijmer (2011, 235) describes it as “primarily a ‘mental state’ interjection”, which can be associated with the speaker’s deliberation. Biber et al. (1999, 1086) also state that well “appears to have the general function of a ‘deliberation signal’, indicating the speaker’s need to give (brief) thought or consideration to the point at issue”. In such cases, the use of well can provide cognitive benefits that allow speakers to buy time for planning, processing and searching for alternative expressions. As Fung and Carter (2007) note, DM well also serves the interpersonal function of indicating the speaker’s attitude and the structural function of indicating the shift to a new topic. It can be taken as a mitigator to soften disagreements, dispreferred points, unexpected answers, etc. Some studies have reported an over-representation of well by French-speaking (Gilquin 2008) and Swedish-speaking (Aijmer 2011) learners, compared to their native-speaking counterparts, while others report an under-representation of DM well, for instance by Chinese-speaking learners from Hong Kong (Fung and Carter 2007) and Taiwan (Lin 2016; Huang 2019).

You know is commonly considered an interpersonal DM, signaling that speakers are sensitive to the needs of their listeners and are monitoring the state of shared knowledge in the conversation (O’Keeffe et al. 2007). But, while you know functions as an interpersonal DM, it may not always be the case that speakers and hearers have shared knowledge. Speakers occasionally use it for reformulating, repairing and exemplifying in a way that may replace pauses and disfluency and provide a coherence function in discourse (Polat 2011; Lin 2016). It can also be used to launch a new topic (O’Keeffe et al. 2007) and to highlight a particular point in an utterance (Fox Tree and Schrock 2002). However, House (2009) argues that the functional use of you know by EFL learners and NSs is markedly different in that EFL speakers use you know predominantly as a self-serving strategy to improve coherence rather than inviting addressee inferences or cooperating with their interlocutors.

Like has been reported as the most prevalent DM in casual spoken interaction (Lin 2016). The DM like in spoken discourse is interpreted as serving several functions. One of the most frequent is to preface new information (Fuller 2003; Hellermann and Vergun 2007). Studies have identified significant differences in the use of like between NSs and non-NSs, and have identified like as having the greatest disparity in usage between these two groups (Müller 2005; Lin 2016). When like occurs with numeral expressions, it often serves as a vagueness marker, denoting the approximate nature of the quantity and purposely suggesting uncertainty. O’Keeffe et al. (2007, 177) state that “speakers frequently introduce approximators to downtone what might otherwise sound overly precise”. Like can also function as a filler, hesitation marker or discourse linking device, indicating the need for speech planning without giving up the floor (Polat 2011). Speakers search for the content or appropriate lexical information while thinking and speaking. This use of like occurs commonly with false starts, pauses and self-repairs, especially in language learners’ discourse.

3.Research questions

This study addresses two research questions:

First, what are the developmental patterns of the three DMs well, you know and like in learner speech across fluency levels in CEFR? The developmental patterns are shown with the relative frequencies of DMs and the proportion of non-users of DMs in each speaker group, using native norms as a benchmark.

Second, what is the effect of immersive experience on the learners’ acquisition of DMs? The participants were divided into three groups (learners without immersive experience, learners with such experience and their native-speaker counterparts, “native counterparts” below) to examine whether DM use differed significantly between the groups.

4.Methodology

The corpus data are introduced in the first subsection below. In addition to describing the unified structure of the corpora, this subsection briefly reports how learners’ speaking fluency levels were assigned. The second subsection reports the research methods adopted.

4.1Corpus data under investigation

The learner corpus data were derived from 183 interviews, collected by two of the three authors. One hundred interviews were held with university English majors from the Czech (Gráf 2017) and Taiwanese (Huang 2014) sub-corpora of LINDSEI [note 5] (Gilquin et al. 2010). To expand our learner corpus data to lower proficiency levels, 83 interviews came from the supplemented version of LINDSEI (see Huang and Gráf 2021 for more detail), which were collected in various university departments (English, Chinese, Business Management, Financial Management, International Trade, Tourism Management and Information Technology) in Taiwan and Finland. The interviewees’ ages ranged between 19 and 26 years, with an average age of 22.5 years in the Czech sub-corpus, 21.7 in the Taiwanese one and 20.7 in the supplementary data.

Note 5: The first 11 sub-corpora were published in LINDSEI version 1 (Gilquin et al. 2010). At the time of writing, there are 24 sub-corpora; see LINDSEI Partners on https://uclouvain.be/en/research-institutes/ilc/cecl/lindsei-partners.html. One of the criteria for selecting eligible participants was majoring in English.

The LINDSEI interviews called for three major tasks. The first was a monologue on a set topic. There were three set topics in LINDSEI: (a) An experience you have had which has taught you an important lesson; (b) A country you have visited which has impressed you; (c) A film/play you’ve seen which you thought was particularly good/bad (Gilquin et al. 2010, 8). The participants chose one of the set topics. The lower-level students in the supplemented corpus were given ten simple topics (hobbies, school/major, daily routine, plans, family, a person you admire, good friends, favorite food, leisure activities and travel experience) and were asked to talk about three of them. The second task was a dialogue about topics of general interest. The third task was to reconstruct a narrative on the basis of four sequential pictures. Teachers of English conducted the interviews and the learners participated voluntarily. Each interview lasted approximately 15 minutes.

The 183 learners were grouped into six fluency levels according to the aural evaluations of their sample performances by trained examiners. Learners’ performances under the headings of range, fluency, accuracy, phonological control and coherence were rated according to the descriptors of CEFR (Council of Europe 2018, 171–172). The 100 learners in LINDSEI-Czech and LINDSEI-Taiwanese were assessed independently by two qualified Cambridge IELTS examiners, who had previously been trained in CEFR rater standardization. The rating results from the two raters correlated closely with each other (ρ = .893), as did the rating of the 83 supplementary corpus items (ρ = .84) [note 6]. Cases showing a discrepancy were sent to a third rater for adjudication. In the present study, the learners’ levels of fluency in CEFR were used for grouping because we were investigating the relationship between fluency and the use of DMs. The qualitative features of fluency operationalized in CEFR are presented in the Appendix.

Note 6: Huang et al. (2018) documented the details of rating LINDSEI-Czech and LINDSEI-Taiwanese. The rating of the supplementary data is reported in Huang and Gráf (2021).

As shown in Table 1, the post hoc assessment resulted in a division into six groups: A1 (n = 5), A2 (n = 23), B1 (n = 33), B2 (n = 69), C1 (n = 48) and C2 (n = 5). This study focused on the development of DMs from A2 to C1; the sample sizes of A1 and C2 were too small to represent these two levels and therefore they were not included in this study. To compare the use of DMs by speakers at the four different fluency levels (n = 173) with that of their native counterparts, 50 interviews with British university students from the Louvain Corpus of Native English Conversation (LOCNEC; De Cock 2004) were also examined. The construction of the LOCNEC corpus followed the same structure as LINDSEI, making them directly comparable.

Table 1.Distribution of speakers across fluency levels in CEFR
| Speaker groups | Number of speakers | Tokens |
|---|---|---|
| A1 | 5 | 1,551 |
| A2 | 23 | 16,146 |
| B1 | 33 | 31,027 |
| B2 | 69 | 98,314 |
| C1 | 48 | 84,649 |
| C2 | 5 | 12,661 |
| British native speakers of English | 50 | 122,049 |
| Total | 233 | 366,397 |

To investigate the relationship between the learners’ immersive experiences (whatever their form) and their use of DMs, the 173 learners were further grouped on the basis of experience: learners with no immersive experience in an English-speaking country (n = 101), and those with immersive experience (n = 72), ranging from 0.2 months to 167.8 months (SD = 22). Table 2 below presents the distribution of learners based on their stay-abroad experiences and fluency levels. The effect of this experience is analyzed and discussed in Section 5.3 below.

Table 2.Distribution of learner speakers with or without immersive experiences
| Speaker groups | Number of speakers | Distribution of fluency levels |
|---|---|---|
| Learners with immersive experiences | 72 | A2 = 5 (7%); B1 = 6 (8%); B2 = 25 (35%); C1 = 36 (50%) |
| Learners without immersive experiences | 101 | A2 = 18 (18%); B1 = 27 (27%); B2 = 44 (44%); C1 = 12 (12%) |
| Total | 173 | 173 |

4.2Data analysis

The first part of the analysis examined the corpus data quantitatively to measure the overall frequencies of the three DMs (well, like and you know) across each fluency level of CEFR. The three DMs were retrieved with the Concord tool in WordSmith 7 (Scott 2016) and one of the present authors manually disambiguated the instances between their discourse and non-discourse use, as defined in the Cambridge Grammar of English (Carter and McCarthy 2006). The classification was then double-checked by a research assistant with a master’s degree in English language.
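
The retrieval step can be illustrated with a minimal keyword-in-context sketch. It is not the WordSmith Concord procedure used in this study; the function name kwic, the context width and the simplified turn (adapted from Example (2), with the mark-up removed) are assumptions made for illustration. The pattern only finds candidate forms, so the manual disambiguation between discourse and non-discourse use described above would still be needed.

```python
import re

# Candidate forms only: whether a hit is a discourse marker still needs manual coding.
CANDIDATES = re.compile(r"\b(you know|well|like)\b", re.IGNORECASE)

def kwic(turn, width=30):
    """Return (left context, hit, right context) for each candidate form in a turn."""
    rows = []
    for match in CANDIDATES.finditer(turn):
        left = turn[max(0, match.start() - width):match.start()]
        right = turn[match.end():match.end() + width]
        rows.append((left, match.group(0).lower(), right))
    return rows

# Simplified learner turn adapted from Example (2); mark-up and numbering removed.
turn = ("yeah . (eh) . (mm) well . (er) superficially they are polite "
        "but actually you know it's another way")

hits = kwic(turn)
for left, hit, right in hits:
    print(f"{left:>30} [{hit}] {right}")
```

Sorting such rows by their left or right context is one simple way to bring typical co-occurrence patterns together before manual coding.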

The frequencies of DMs were normalized as the number of instances per hundred words (phw). The resulting relative frequencies in the learner data were compared to those produced by their native counterparts. The relationships between the relative frequencies of DMs and (i) learner fluency levels, (ii) speaker group (with or without immersive experience) and (iii) the duration of learners’ immersive experiences were then evaluated using Spearman rank order correlation [note 7]. A Kruskal-Wallis test [note 8] and Dunn’s pairwise comparisons were then used to determine whether the differences were statistically significant.

Note 7: Spearman rank order correlation is used when one of the variables consists of non-parametric ranked data (e.g., CEFR fluency levels and speaker groups; Pallant 2011).

Note 8: A Shapiro-Wilk normality test (p < 0.05, except for well in the native-speaker group) showed that the frequencies were not normally distributed; therefore, a Kruskal-Wallis test, an alternative to ANOVA for non-parametric data, was conducted (Leech et al. 2005).
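
As a minimal sketch of the normalization, the function below converts a raw count into instances per hundred words and averages the per-speaker rates for a group. The function name and the three speakers are hypothetical, and averaging per-speaker rates is an assumption about how the group means were obtained.

```python
from statistics import mean

def per_hundred_words(raw_count, tokens):
    """Relative frequency: instances per hundred words (phw)."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return 100 * raw_count / tokens

# Hypothetical speakers: (DM instances, tokens produced in the interview).
speakers = [(12, 1500), (3, 1200), (0, 900)]

rates = [per_hundred_words(count, tokens) for count, tokens in speakers]
group_mean = mean(rates)  # mean of per-speaker rates, not a pooled rate
print([round(r, 2) for r in rates], round(group_mean, 2))
```

Normalizing per speaker before averaging keeps a single long interview from dominating the group figure.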

The corpus methods allowed us to quantitatively analyze the distribution and developmental patterns of DMs across CEFR levels, but we were unable to explain adequately the observed use of the DMs in a particular speaker group unless we explored the co-texts. For this reason, we then conducted a largely qualitative immediate context analysis to exemplify and explain how DMs were produced by a speaker at a certain fluency level. All the instances of the three DMs were re-arranged to identify typical instances and their broader co-texts according to the immediate co-occurring items to the left and right of each DM.

5.Corpus analysis results

5.1Overall frequencies of the three discourse markers

A total of 3,395 instances of the three DMs were identified, comprising 1,280 instances of well, 853 of you know and 1,262 of like. The descriptive statistical information is presented in Table 3. On average, the learners at the A2, B1, B2 and C1 levels produced 0.19, 0.27, 0.63 and 1.28 DMs phw, respectively. The 48 C1-level learners used them almost as often as their native counterparts did (1.3 DMs phw).

Table 3.Descriptive statistical information on discourse markers across learner and native groups
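| CEFR fluency levels | Number of speakers | Well: raw freq. | Well: mean (phw) | You know: raw freq. | You know: mean (phw) | Like: raw freq. | Like: mean (phw) | Three DMs: mean (phw) | Three DMs: min (phw) | Three DMs: max (phw) | Three DMs: SD |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A2 | 23 | 2 | 0.01 | 5 | 0.03 | 22 | 0.14 | 0.19 | 0 | 0.53 | 0.18 |
| B1 | 33 | 41 | 0.1 | 21 | 0.04 | 43 | 0.13 | 0.27 | 0 | 2.87 | 0.52 |
| B2 | 69 | 202 | 0.18 | 116 | 0.11 | 334 | 0.34 | 0.63 | 0 | 3 | 0.77 |
| C1 | 48 | 450 | 0.56 | 128 | 0.14 | 460 | 0.58 | 1.28 | 0.11 | 3.25 | 0.76 |
| Native speakers | 50 | 585 | 0.5 | 583 | 0.47 | 403 | 0.33 | 1.3 | 0.06 | 4.01 | 0.68 |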
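
As a rough cross-check of Table 3, the sketch below recomputes the total number of instances and a pooled rate per group from the raw frequencies in Table 3 and the token counts in Table 1. It assumes that the Table 1 token counts are the relevant denominators. A pooled rate is not the same quantity as the mean of per-speaker rates reported in the table, so the two can differ; for the native group, for example, the pooled rate comes out at about 1.29 phw against a reported mean of 1.3.

```python
# Tokens per group (Table 1) and raw counts of well, you know, like (Table 3).
tokens = {"A2": 16146, "B1": 31027, "B2": 98314, "C1": 84649, "NS": 122049}
raw = {
    "A2": (2, 5, 22),
    "B1": (41, 21, 43),
    "B2": (202, 116, 334),
    "C1": (450, 128, 460),
    "NS": (585, 583, 403),
}

# Pooled rate: all DM instances in a group divided by all tokens in that group.
pooled = {group: 100 * sum(raw[group]) / tokens[group] for group in tokens}
total_instances = sum(sum(counts) for counts in raw.values())

for group, rate in pooled.items():
    print(group, round(rate, 2))
print(total_instances)
```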

To yield a better visual display, the relative frequencies of the three DMs are presented in boxplots. Figure 1 shows that the use of DMs develops linearly with fluency levels. A Spearman’s correlation was run to determine the relationship between the relative frequencies of DMs and the fluency levels. Overall, there was a strong, positive correlation (rs (223) = 0.65, p < 0.0001). The relationship between the relative frequencies of individual DMs and fluency levels was also strongly positive for well (rs (223) = 0.674, p < 0.0001) and you know (rs (223) = 0.576, p < 0.0001), but weakly positive [note 9] for like (rs (223) = 0.243, p < 0.0001).

Note 9: The strength of the correlation is interpreted following the guide that Cohen (1988, 79–81) suggests for the absolute value of correlation coefficients: 0.10 to 0.29 ‘small’; 0.30 to 0.49 ‘medium’; 0.50 to 1.0 ‘large’.
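
For readers unfamiliar with the statistic, Spearman’s coefficient can be sketched as a Pearson correlation computed on ranks, with tied values sharing their average rank. The data below are hypothetical, and coding the native group as the highest level (A2 = 1 to native = 5) is an assumption about how the groups were ranked, not a detail taken from the analysis reported here.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical data: fluency coded A2=1, B1=2, B2=3, C1=4, native=5.
levels = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
dm_phw = [0.0, 0.2, 0.1, 0.4, 0.3, 0.9, 1.0, 1.4, 1.1, 1.6]
rho = spearman(levels, dm_phw)
print(round(rho, 3))
```

Because fluency level takes only a few values, ties are unavoidable, which is why the rank-based Pearson form is used here rather than the shortcut formula that assumes no ties.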

In Figure 1, the boxplot for the advanced C1 level is broadly similar to that for the native counterparts. The medians for C1 and native counterparts are 1.04 and 1.13 instances of DMs phw, respectively. In the native data, four outliers are shown. Since the native counterparts were taken as a benchmark and the frequencies of the DMs used by the high users reflect natural speech, we decided not to reject the outliers. In the learner groups, one outlier in C1, nine in B2 and two in B1 were identified. A closer look at these outliers revealed their preference for a particular DM. The only outlier at C1 (FI112) produced 2.55, 0 and 0.7 instances phw of well, you know and like respectively. Six of the nine outliers in the B2 group were frequent users of like, of whom one (TW003) employed like only. In the B1 group, one outlier (TW040) was a high user of well and the other (TW106) was a high user of you know. It is not easy from the current data to explain why a speaker uses a given DM more frequently than other DMs. Possible interpretations could be the speakers’ idiosyncrasies, or perhaps that the immediate contexts where the DMs occurred required their use.
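
How a boxplot flags a speaker as an outlier can be sketched as follows, assuming the conventional rule that a value lies more than 1.5 interquartile ranges beyond the quartiles. The rule applied by the plotting software behind Figure 1 is not stated, so the multiplier, the quartile method and the data are all assumptions.

```python
from statistics import quantiles

def tukey_outliers(values, k=1.5):
    """Values lying more than k interquartile ranges outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical relative frequencies (phw) for one speaker group.
group = [0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.6, 2.9]
flagged = tukey_outliers(group)
print(flagged)
```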

Figure 1.Boxplots of relative frequencies of the three focal discourse markers for learner and native groups

To test if there was a statistically significant difference between speaker groups, the data from learners at four levels (A2, B1, B2 and C1) and those of their native counterparts were used to run a Kruskal-Wallis test. It showed that the frequencies of DMs differed significantly across the five groups, with a strong effect size [note 10], χ²(4, n = 223) = 99.065, p < 0.0001, ε² = 0.446. Dunn’s post hoc tests were then conducted on each pair of groups [note 11]. It was found that the A2 group was significantly different from the C1 (p < 0.0001) and native counterpart groups (p < 0.0001), but not from the B1 (p = 1) and B2 (p = 0.064) groups. A statistically significant difference was found between the B1 and B2 groups (p < 0.043), between the B1 and C1 (p < 0.0001), between the B2 and C1 (p < 0.0001) and between the B2 and the native counterpart groups (p < 0.0001). As suggested earlier, the C1 learners performed similarly to their native counterparts: no statistically significant difference was found between these two groups (p = 1).

Note 10: Lomax and Hahs-Vaughn (2012) suggest that effect size values measured with epsilon squared can be interpreted similarly to those of eta squared; therefore, a value of 0.01 is considered a small effect, 0.06 a medium effect and 0.14 a large effect (Cohen 1988, 284–287).

Note 11: In order to preserve a family-wise 0.05 significance level, we applied the Bonferroni adjustment by dividing the alpha equally across the ten tests (Pallant 2011; Tabachnick and Fidell 2012).
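
The effect size and the number of post hoc comparisons can be checked with a few lines of arithmetic. The sketch assumes the common definition of epsilon squared for the Kruskal-Wallis test, H / ((n² − 1) / (n + 1)), which simplifies to H / (n − 1); the formula is not spelled out in the text, but with H = 99.065 and n = 223 it gives the reported 0.446. The function names are illustrative.

```python
from math import comb

def epsilon_squared(h, n):
    """Kruskal-Wallis effect size: H / ((n**2 - 1) / (n + 1)), i.e. H / (n - 1)."""
    return h / ((n ** 2 - 1) / (n + 1))

def bonferroni_alpha(groups, family_alpha=0.05):
    """Per-test alpha when every pair of groups is compared once."""
    return family_alpha / comb(groups, 2)

eps = epsilon_squared(99.065, 223)  # H and n as reported for the five groups
pairs = comb(5, 2)                  # pairwise comparisons among five groups
alpha = bonferroni_alpha(5)
print(round(eps, 3), pairs, alpha)
```

Five groups yield ten pairwise comparisons, which is where the ten tests mentioned for the Bonferroni adjustment come from.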

5.2Three focal discourse markers in learner and native groups

Almost all the 50 British native university students under investigation used DMs well and you know and only 9 (18%) of them did not use DM like (see Table 4). All of the C1 speakers were users of DMs, which means that their speech was closer to native norms. The percentages of non-users of the three DMs at A2 and B1 were higher (39% and 42% respectively) than those in the B2, C1 and native data (13%, 0% and 0% respectively), suggesting that these three common DMs may develop in line with speakers’ fluency levels. This trend was particularly marked in the use of DMs well and you know (see Figure 2). Their frequencies started to increase from B1 to B2. The use of DM like appeared distinct from the use made of the other two DMs. Nearly half of the speakers at A2 and B1 and four-fifths at B2 and C1 used like as a DM.

Table 4.Proportions of non-users of discourse markers in learner and native groups
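| CEFR fluency levels | Number of speakers | Non-users of well | % | Non-users of you know | % | Non-users of like | % | Non-users of three discourse markers | % |
|---|---|---|---|---|---|---|---|---|---|
| A2 | 23 | 21 | 91 | 20 | 87 | 12 | 52 | 9 | 39 |
| B1 | 33 | 28 | 85 | 27 | 82 | 17 | 52 | 14 | 42 |
| B2 | 69 | 42 | 61 | 38 | 55 | 15 | 22 | 9 | 13 |
| C1 | 48 | 5 | 10 | 20 | 42 | 10 | 21 | 0 | 0 |
| Native speakers | 50 | 0 | 0 | 2 | 4 | 9 | 18 | 0 | 0 |
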
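
The percentages in Table 4 follow directly from the counts, as the sketch below shows; the dictionary layout and the function name are illustrative only.

```python
# (speakers, non-users of well, you know, like, all three DMs) from Table 4.
table4 = {
    "A2": (23, 21, 20, 12, 9),
    "B1": (33, 28, 27, 17, 14),
    "B2": (69, 42, 38, 15, 9),
    "C1": (48, 5, 20, 10, 0),
    "NS": (50, 0, 2, 9, 0),
}

def percentages(row):
    """Percentage of non-users for each count, rounded to whole numbers."""
    n, *non_users = row
    return [round(100 * k / n) for k in non_users]

pct = {group: percentages(row) for group, row in table4.items()}
for group, values in pct.items():
    print(group, values)
```

Subtracting a non-user percentage from 100 gives the share of users, which is how figures such as the proportion of like users at each level can be read off the table.
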
Figure 2.Boxplots of relative frequencies of the discourse markers well, you know and like for learner and native groups

5.2.1Discourse marker well

Among the three DMs for analysis, well is the only one used by over 90% of the advanced C1 learners and native counterparts, while only about two-fifths of the B2 learners (39%; see Table 4) use it. The high frequency of the use of well is thus a characteristic feature of the more fluent C1 speakers, who may have adopted it as one of their strategies for maintaining fluency.

In the extract, the mark-up <A> refers to the turn produced by the interviewer and <B> to that of the learner. In order to discuss possible reasons why well is used and demonstrate how the use of well relates to fluency, we selected a learner at C1. Speaker B, a C1 speaker, is one of the outliers (i.e., a user who used DMs relatively frequently) cited in Figure 1 above. In Example (1) below, the first instance, produced by Speaker A, follows a disagreement with the other speaker. The second instance may act as a “frame” for introducing clarifications or for self-repairs (Svartvik 1980, 175). The third instance, which follows a short silence, might indicate Speaker B’s attempt to compensate for disfluency.

(1)

Example of DM well at fluency level C1 (File: FI112, LINDSEI-supplemented)

<A> I really want to learn that’s my goal in spring is it difficult </A>

<B> I don’t think so </B>

<A> well (1) you started at two </A>

<B> it’s (eh) yeah . well (2) it’s difficult if you want to be good at it . but </B>

<A> (mm) </A>

<B> I think you can just be an amateur and try and just . well (3) slide down you’re not gonna be good at it at first but </B>

Based on the qualitative features of C1, “only a conceptually difficult subject can hinder a natural, smooth flow of language” (Council of Europe 2018, 171–172). Speaker B is able to speak fluently and, when reformulation is needed, appropriately employs DM well to perform pragmatic functions rather than producing long pauses, which would probably have resulted in an impression of disfluency. If the instances of well in Example (1) above had been taken out, the speakers would still have been able to get their messages across, but they would have produced (un)filled pauses and their speech would have lacked the pragmatic functions that well serves.

5.2.2Discourse marker you know

In the learner data a linear developmental pattern can be seen. The proportions of non-users declined from 87% at A2 to 42% at C1 (see Table 4). In Example (2), Speaker B at B2 produced three instances of you know. The first one co-occurred with an intensifier, actually, signaling the introduction of key information. The second instance prefaced a re-start and the third one followed the vague item kind of. These two instances may indicate that the speaker is searching for content or appropriate words. This use of you know, not surprisingly, is common in learner data because learners need more time to formulate what they say in a foreign language (Polat 2011). Other evidence suggesting disfluency was seen in the hesitation markers (eh, mm, er and em, underlined) and silent pauses (transcribed into one, two and three periods). In the speech produced by a B2 speaker, “there are few noticeably long pauses” (Council of Europe 2018, 171–172; see Appendix). If Speaker B had not employed DMs, there would have been more evident pausing, which could possibly have downgraded Speaker B’s fluency level to B1.

(2)

Example of DM you know at fluency level B2 (File: TW035, LINDSEI-Taiwanese)

<B> in Korea . well . Korea (eh) re= (eh) really really . I mean . polite </B>

<A> the Korean people right </A>

<B> yeah . (eh) . (mm) well . (er) superficially they are polite but actually you know (1) <overlap /> <starts laughing> it’s another way <stops laughing> it’s another </B>

<A> <overlap /> <laughs> oh okay </A>

<B> you know (2) another .. things . (em) </B>

<A> were they friendly . do you think they were friendly to you </A>

<B> they are friendly to me cos they are kind of you know (3) association . to: greet foreign students <overlap /> so: .. it’s the job </B>

Like DM well, you know can be employed as a compensatory filler. This function could be found in the speech of nearly half the B2 learners, while most of the learners below B2 could have used silent pauses and such filled pauses as er and mm to serve the same function. The latter group was more likely to be seen as disfluent, distinct from the B2 speakers. In CEFR, the fluency at B1 featured evident pausing (see Appendix).

5.2.3Discourse marker like

Of the three prominent DMs under investigation, well was used by all 50 British native university students in LOCNEC and you know by all but two of them, whereas 9 (18%) of them did not produce like as a DM (see Table 4). In terms of relative frequencies, like was least frequently used (0.33 instances phw), compared to well (0.5 instances phw) and you know (0.47 instances phw) by the native group. In contrast, the analysis of learner data shows that like was most often used by learners at all four levels. The proportion of its users in each learner group was also higher than for the other two DMs: 48% of A2, 48% of B1, 78% of B2 and 79% of C1 learners used like as a DM. It was also found that the relationship between the relative frequencies and fluency levels was weak. In other words, compared to well and you know, like was the least likely to be a distinguishing feature between fluency levels. It is therefore interesting to examine how lower-level speakers use like.

Of the three DMs, like was the most popular among the 23 A2 learners, who produced 0.14 instances phw of like (see Table 3 above). Like often co-occurred with (un)filled pauses and false starts (73% of the 22 instances), as demonstrated in Example (3) by an A2 learner, and was thus clearly due to hesitation, exemplifying the qualitative features of fluency at A2, where a speaker “can make him/herself understood in very short utterances, even though pauses, false starts and reformulation are very evident” (Council of Europe 2018, 171–172; see Appendix). Although the development of like starts early at A2 (see Figure 2 above) compared to well and you know, Example (3) shows A2-level learners’ excessive use of like as a hesitation marker, which may be evaluated to signify a lower fluency level.

(3)

Example of DM like at fluency level A2 (File: JP101, LINDSEI-supplemented)

<A> (mhm) okay (em) how is your Chinese .. ho= how is your Chinese </A>

<B> (erm) <laughs> .. (em) .. (er) I studied Chinese .. four year </B>

<A> (mhm) </A>

<B> (er) . (erm) . I . can speak .. (erm) . like like . conversation . and . I can listen the class Chinese class </B>

<A> okay . okay so it’s good </A>

The empirical corpus data show that speakers at higher fluency levels use more DMs to enhance fluency as well as for their pragmatic functions. The pragmatic functions of the DMs are discussed in the above examples in their immediate co-texts, demonstrating the contribution of DMs in the framework of fluency in CEFR.

5.3Use of discourse markers and learners’ immersive experience

In order to analyze the relationship between the use of DMs and learners’ immersive experience, the learners were divided into two groups: those with immersive experience in an English-speaking country (n = 72) and those without (n = 101; see Table 2 above for more detail). The native counterparts (n = 50) served as a benchmark. A Spearman’s correlation test indicated that the relationship between the relative frequencies of the three DMs as a grouped category and the three speaker groups was moderately positive (rs (223) = 0.506, p < 0.0001). The same positive relationship was found individually between the three speaker groups and the use of well (rs (223) = 0.54, p < 0.0001), you know (rs (223) = 0.535, p < 0.0001) and like (rs (223) = 0.196, p = 0.003).

Native students used more DMs overall and with greater within-group consistency than those with or without immersive experience (Figure 3). A Kruskal-Wallis test found that the relative frequency of DM use by any speaker group was significantly different from its use by the others (χ²(2, n = 223) = 56.948, p < 0.0001, ε² = 0.257). Dunn’s post hoc pairwise comparisons revealed significant differences between the three groups (the no-experience group and those who had had such experience, p < 0.0001; the no-experience and native groups, p < 0.0001; and the immersive experience and native groups, p = 0.006). These results suggest that immersive experience may positively influence the acquisition of DMs.

Figure 3.Boxplots of the relative frequencies of the three focal discourse markers for learners with and without immersive experience and native speakers

This study further compared learners with and without immersive experience at each level individually. The comparison revealed a significant difference at the B2 level: B2 learners with immersive experience (Mdn = 0.55) produced significantly more DMs than those at the same level without such experience (Mdn = 0.18; U = 779.5, p = 0.004, r = 0.35).
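An effect size r for a Mann-Whitney U test is often derived as r = |Z| / √N, where Z is the standardized test statistic and N the total number of observations. The sketch below illustrates that convention under stated assumptions: it uses the normal approximation without tie or continuity correction, the function name `mann_whitney_r` is hypothetical, and the group sizes and U value in the example are invented for illustration because the B2 subgroup sizes are not given in this passage. It is not a reconstruction of the authors' calculation.

```python
import math


def mann_whitney_r(u: float, n1: int, n2: int) -> float:
    """Effect size r = |Z| / sqrt(N) from a Mann-Whitney U statistic.

    Z is obtained from the normal approximation without tie or
    continuity correction, so values may differ slightly from those
    reported by statistical packages.
    """
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return abs(z) / math.sqrt(n1 + n2)


# Hypothetical U value and group sizes, for illustration only
r = mann_whitney_r(u=100, n1=10, n2=10)
print(round(r, 3))  # prints 0.845
```

A value of U equal to n1 × n2 / 2 gives r = 0, that is, no tendency for either group to outrank the other.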

6.Discussion

This section addresses the two research questions regarding (i) the development of three focal DMs across the fluency levels of CEFR and (ii) the effects that immersive experiences in English-speaking countries have on DM use.

6.1Developmental pattern of discourse markers across fluency levels in CEFR

Analysis of learner data reveals that the use of DMs correlated positively with fluency level. On the whole, it is reasonable to conclude that the use of DMs was indicative of perceived fluency. As mentioned earlier, instead of silent pauses and filled pauses (e.g., er and mm), DMs can be employed to increase fluency (Hedge 1993; Götz 2013).

It is worth noting, however, that the use of DMs does not develop steadily with fluency levels. In the cases of well and you know, the breakthrough is made between B1 and B2, and then the maximum use is observed on attaining C1, when most learners adopt these two DMs as frequently as their native counterparts do. This phenomenon was previously reported in Neary-Sundquist (2014, 652), who identified a sudden rise in the frequency of a wider range of pragmatic markers “between Levels 5 and 6”, Level 6 being the highest. However, we cannot infer their equivalent levels in CEFR.

The DM like displays a different developmental pattern. It is least influenced by learners’ fluency levels, possibly because the frequency of like was found to be particularly high in A2 learners. They often use like together with (un)filled pauses and false starts when a lexical gap or speech difficulty emerges, thus clearly indicating hesitation. This confirms that the DM like might fulfill “potentially disfluent functions” by monitoring, punctuating and reformulating (Crible 2017).

While the fluency levels of the learner data in this study had been assigned on the basis of CEFR, previous studies (e.g., Neary-Sundquist 2014) did not offer sufficient information on their learners’ proficiency or fluency levels. Although a few studies (e.g., Jones et al. 2018) also adopted CEFR to describe this variable, the data collection methods and contexts differ from each other. These discrepancies make it problematic to compare the use of DMs across datasets.

6.2Effects of immersive experience on the use of discourse markers

This paper reports a strong and positive relationship between the frequencies of DMs in general and learners’ immersive experiences, suggesting that DMs can be acquired incidentally when learners are given the chance to stay in a country where the target language is used, a finding that is in line with previous studies (e.g., Polat 2011; Gilquin 2016; Liu 2016; Götz and Mukherjee 2018). Such cultural and linguistic immersion may provide L2 learners with naturalistic input, increased opportunities for social interaction and repeated exposure to the DMs used in naturally-occurring contexts, which may serve as a driver of L2 DM development and further enhance oral fluency. Although Götz and Mukherjee (2018) report that a significant positive effect on the use of DMs was found only when the immersive experience continued for more than one year, this study finds a positive relationship after a shorter interval (average 9.25 months) among learners across CEFR levels.

The DMs you know and well, in particular, were found to be influenced by learners’ immersive experience. This may perhaps be explained by the interpersonal nature of these DMs. For example, you know is commonly considered an interpersonal DM, signaling that speakers are sensitive to the needs of their listeners and are monitoring the state of shared knowledge in the conversation (O’Keeffe et al. 2007). Exposure to the target language in its natural environment encourages learners to understand and use the language for these interpersonal purposes, which spurs the development of native-like patterns of language use (Liu 2016). Similarly, well as an interpersonal DM is often under-represented in EFL learner speech due to the lack of both exposure in authentic contexts and social interaction (Huang 2019). The present study finds that well can develop rather sooner when L2 learners have natural exposure, which is consistent with Liu (2016).

7.Conclusion

This study examined the developmental patterns of three DMs – well, you know and like – and revealed their frequency ranges in learner groups at four different fluency levels, based on the CEFR-aligned sub-corpora of LINDSEI and NS data as a benchmark. A strong positive correlation between perceived fluency and the overall frequency of DMs was found, suggesting that the use of DMs in learner data develops linearly with the increasing fluency levels; the higher-level learners (C1) use DMs almost as frequently as their native counterparts do. Similar patterns were also found in the use of you know and well, but not with like. Learners’ immersive experience was also found to positively correlate with overall and individual DM frequency, especially in B2 learners.

Although the present study has shed light on the development of DMs, a number of limitations should be noted. One is that immersion was operationalized simply as each individual’s stay-abroad experience, without measuring or controlling the quality or quantity of socio-cultural exposure. Such differences in exposure would be expected to influence the development of fluency and DMs, and thereby the results. Some learners went abroad for formal education and some for other purposes; only the duration of their stay was recorded in the metadata of the corpus under investigation. Another limitation is the unequal distribution of learners with different L1s across CEFR levels, which is especially important when analyzing immersive experience at different levels. In this situation, some subsets (e.g., A1 and B1) might not be large enough to be properly reflected in the results. Furthermore, the functions of DMs in the corpora under investigation are not quantified. Although categorizing the 3,395 instances would be manageable and would provide additional information on how each DM is actually used and how its functions are distributed, this study focuses on developmental patterns and on how DMs serve fluency-enhancing purposes; consequently, a qualitative analysis of typical instances was sufficient to answer the research questions.

Further research may consider the above issues. Rather than large-scale corpus data, smaller datasets might provide qualitative information, such as students’ prior learning contexts and levels of immersive experience, which would help validate the contribution of immersion in the target-language environment. The use of DMs in different age groups could also be investigated. The current study examined university students’ development of DMs across CEFR fluency levels; older or younger cohorts may adopt DMs differently. In addition, for the purposes of language assessment, interviews with CEFR examiners may reveal their insights into the roles of DMs in fluency.

This analysis of a multi-level learner corpus has important pedagogical implications. First, given the beneficial effect of immersive experiences on the acquisition of DMs, it is suggested that seeking opportunities for repeated exposure to naturally occurring and spontaneous target-language use, as well as socio-cultural interaction, would allow learners to learn language, learn about language and learn through language. Second, learners in the classroom could be instructed to become aware of the use of DMs in order to improve fluency and interaction in meaningful dialogues. Data in a corpus of naturally occurring discourse can therefore provide an empirical basis for language description by showing how DMs as a speech management strategy are used in natural contexts. In addition to awareness-raising, these features can be encouraged in the speech of learners, from which they can see how fluency is enhanced using DMs in a given context.

The results of the current study also have implications for spoken English assessment in the global context. DMs, along with many other variables, may play a part in the assessment of speakers’ proficiency. DMs not only serve pragmatic functions, as discussed in the literature, but could also affect the way that fluency is rated when judged by the CEFR scales, as has been demonstrated in the present study, where the relationship between the frequencies of DMs and fluency levels is shown to be positively linear. In a global context where English is used as a lingua franca, English users can be aware of the presence or absence of DMs across fluency levels and acquire the ability to manipulate DMs for greater fluency.

Funding

This work was supported by the research project entitled “Fluency and accuracy in spoken English interlanguage across CEFR levels and mother tongues”, sponsored by the Ministry of Science and Technology, Taiwan, under grant number MOST108-2410-H-012-002-MY2.

Acknowledgements

We would like to express our gratitude to the two anonymous reviewers for their thoughtful comments and constructive suggestions. We would also like to thank Dr. Eve Richards for proofreading this manuscript. Any errors or omissions are ours.

Notes

1.The definition of DMs is still open to debate. Varying approaches have been adopted to develop criteria for determining DMs (e.g., Schiffrin 1987; Fraser 1990). Based on work by Schourup (1999) and Fung and Carter (2007), a DM is determined by the possession of five characteristics: (1) optionality, (2) flexibility of position, (3) prosodic independence, (4) connectivity and (5) multi-grammaticality.
2.The CEFR, released in 2001, was developed to accommodate all languages by describing competences of listening, reading, spoken interaction, spoken production and writing. See https://www.coe.int/en/web/common-european-framework-reference-languages/home.
3.One metadata field in our learner corpus records the duration of stay in countries where English is spoken, which includes any form of activity from attending formal instruction to sojourns of different kinds.
4.For language learning, it is reasonable to assume that the language produced by native speakers is taken as the target norm. The terms underuse and overuse are generally adopted by most learner corpus studies, implying that learners use a given target item too much or too little to sound like a native speaker. In studies of DMs, we would suggest as alternatives the neutral terms under- and over-representation for discussing differences in frequency across corpora. The underlying assumption of under- and over-representation is to keep frequency information as linguistic evidence in focus and avoid over-generalizing differences to learners’ performance; in particular, the use of DMs is contextually dependent and syntactically and semantically optional.
5.The first 11 sub-corpora were published in LINDSEI version 1 (Gilquin et al. 2010). At the time of writing, there are 24 sub-corpora; see LINDSEI Partners on https://uclouvain.be/en/research-institutes/ilc/cecl/lindsei-partners.html. One of the criteria for selecting eligible participants was majoring in English.
6.Huang et al. (2018) documented the details of rating LINDSEI-Czech and LINDSEI-Taiwanese. The rating of the supplementary data is reported in Huang and Gráf (2021).
7.Spearman rank order correlation is used when one of the variables consists of non-parametric ranked data (e.g., CEFR fluency levels and speaker groups; Pallant 2011).
8.A Shapiro-Wilk normality test (p < 0.05, except for well in the native-speaker group) showed that the frequencies were not normally distributed; therefore, a Kruskal-Wallis test, an alternative to ANOVA for non-parametric data, was conducted (Leech et al. 2005).
9.The strength of the correlation is interpreted following the guide that Cohen (1988, 79–81) suggests for the absolute value of correlation coefficients: 0.10 to 0.29 ‘small’; 0.30 to 0.49 ‘medium’; 0.50 to 1.0 ‘large’.
10.Lomax and Hahs-Vaughn (2012) suggest that effect size values measured with epsilon squared can be interpreted similarly to those of eta squared; therefore, a value of 0.01 is considered a small effect, 0.06 a medium effect and 0.14 a large effect (Cohen 1988, 284–287).
11.In order to preserve a family-wise 0.05 significance level, we applied the Bonferroni adjustment by dividing the alpha equally across the ten tests (Pallant 2011; Tabachnick and Fidell 2012).

References

Aijmer, Karin
2011 “Well I’m Not Sure I Think… The Use of Well by Non-Native Speakers.” International Journal of Corpus Linguistics 16 (2): 231–254.
Biber, Douglas, Edward Finegan, Stig Johansson, Susan Conrad, and Geoffrey Leech
1999 Longman Grammar of Spoken and Written English. Essex: Pearson Education Limited.
Brinton, Laurel J.
1996 Pragmatic Markers in English: Grammaticalization and Discourse Functions. New York: Mouton de Gruyter.
Buysse, Lieven
2012 “So as a Multifunctional Discourse Marker in Native and Learner Speech.” Journal of Pragmatics 44 (13): 1764–1782.
Carlsen, Cecilie
2012 “Proficiency Level – A Fuzzy Variable in Computer Learner Corpora.” Applied Linguistics 33 (2): 161–183.
Carter, Ronald
2008 “Right, Well, OK, So, It’s Like, You Know, Isn’t It, I Suppose: Spoken Words, Written Words and Why Speaking Is Different.” In The Sound and the Silence: Key Perspectives on Speaking and Listening and Skills for Life, ed. by Caroline Hudson, 11–23. Coventry: Quality Improvement Agency.
Carter, Ronald, and Michael McCarthy
2006 Cambridge Grammar of English. Cambridge: Cambridge University Press.
2017 “Spoken Grammar: Where Are We and Where Are We Going?” Applied Linguistics 38 (1): 1–20.
Cohen, Jacob
1988 Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
Council of Europe
2001 Common European Framework of Reference for Languages: Learning, Teaching, Assessment. Cambridge: Cambridge University Press.
2018 Common European Framework of Reference for Languages: Learning, Teaching, Assessment. Companion Volume with New Descriptors. Strasbourg Cedex: Council of Europe.
Crible, Ludivine
2017 “Discourse Markers and (Dis)fluency across Registers: A Contrastive Usage-based Study in English and French.” PhD diss., Université de Berne.
Crible, Ludivine, Liesbeth Degand, and Gaëtanelle Gilquin
2017 “The Clustering of Discourse Markers and Filled Pauses: A Corpus-Based French-English Study of (Dis)fluency.” Languages in Contrast 17 (1): 69–95.
De Cock, Sylvie
2004 “Preferred Sequences of Words in NS and NNS Speech.” Belgian Journal of English Language and Literatures New Series 2: 225–246.
Dumont, Amandine
2018 Fluency and Disfluency: A Corpus Study of Non-Native and Native Speaker (Dis)fluency Profiles. PhD thesis, Université Catholique de Louvain, Louvain-la-Neuve.
Fox Tree, Jean E., and Josef C. Schrock
2002 “Basic Meanings of You Know and I Mean.” Journal of Pragmatics 34 (6): 727–747.
Fraser, Bruce
1990 “An Approach to Discourse Markers.” Journal of Pragmatics 14 (3): 383–395.
Fuller, Janet M.
2003 “Use of the Discourse Marker Like in Interviews.” Journal of Sociolinguistics 7 (3): 365–377.
Fung, Loretta, and Ronald Carter
2007 “Discourse Markers and Spoken English: Native and Learner Use in Pedagogic Settings.” Applied Linguistics 28 (3): 410–439.
Gilquin, Gaëtanelle
2008 “Hesitation Markers among EFL Learners: Pragmatic Deficiency or Difference?” In Pragmatics and Corpus Linguistics: A Mutualistic Entente, ed. by Jesús Romero-Trillo, 119–149. Berlin, Heidelberg and New York: Mouton de Gruyter.
2016 “Discourse Markers in L2 English: From Classroom to Naturalistic Input.” In New Approaches to English Linguistics: Building Bridges, ed. by Olga Timofeeva, Anne-Christine Gardner, Alpo Honkapohja, and Sarah Chevalier, 213–249. Amsterdam: John Benjamins.
Gilquin, Gaëtanelle, and Sylviane Granger
2015 “Learner Language.” In The Cambridge Handbook of English Corpus Linguistics, ed. by Douglas Biber and Rand Reppen, 418–435. Cambridge: Cambridge University Press.
Gilquin, Gaëtanelle, Sylvie De Cock, and Sylviane Granger
(eds.) 2010 LINDSEI Louvain International Database of Spoken English Interlanguage. Handbook and CD-ROM. Louvain-la-Neuve: Presses Universitaires de Louvain.
Götz, Sandra
2013 Fluency in Native and Non-Native English Speech. Amsterdam: John Benjamins.
Götz, Sandra, and Joybrato Mukherjee
2018 “Investigating the Effect of the Study Abroad Variable on Learner Output: A Pseudo-Longitudinal Study on Spoken German Learner English.” In Learner Corpus Research, ed. by Vaclav Brezina, and Lynne Flowerdew, 47–65. London: Bloomsbury.
Gráf, Tomáš
2017 “The Story of the Learner Corpus LINDSEI_CZ.” Studie z Aplikované Lingvistiky [Studies in Applied Linguistics] 8 (2): 22–35.
Hasselgren, Angela
2002 “Learner Corpora and Language Testing: Smallwords as Markers of Learner Fluency.” In Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching, ed. by Sylviane Granger, Joseph Hung, and Stephanie Petch-Tyson, 143–173. Amsterdam: John Benjamins.
Hedge, Tricia
1993 “Key Concepts in ELT.” ELT Journal 47 (3): 275–277.
Hellermann, John, and Andrea Vergun
2007 “Language Which Is Not Taught: The Discourse Marker Use of Beginning Adult Learners of English.” Journal of Pragmatics 39: 157–179.
Hoey, Michael
2002 “Spoken Discourse.” In Macmillan English Dictionary for Advanced Learners of American English, ed. by Michael Rundell, LA16–LA17. Oxford: Macmillan Education.
House, Juliane
2009 “Subjectivity in English as Lingua Franca Discourse: The Case of You Know.” Intercultural Pragmatics 6 (2): 171–193.
Housen, Alex, Folkert Kuiken, and Ineke Vedder
(eds.) 2012 Dimensions of L2 Performance and Proficiency: Complexity, Accuracy and Fluency in SLA. Amsterdam: John Benjamins.
Huang, Lan-fen
2014 “Constructing the Taiwanese Component of the Louvain International Database of Spoken English Interlanguage (LINDSEI).” Taiwan Journal of TESOL 11 (1): 31–74.
2019 “A Corpus-Based Exploration of the Discourse Marker Well in Spoken Interlanguage.” Language and Speech 62 (3): 570–593.
Huang, Lan-fen, and Tomáš Gráf
2021 “Expanding LINDSEI to Spoken Learner English from Several L1s across CEFR Levels.” Corpora 16 (2): 271–285.
Huang, Lan-fen, Simon Kubelec, Nicole Keng, and Lung-hsun Hsu
2018 “Evaluating CEFR Rater Performance through the Analysis of Spoken Learner Corpora.” Language Testing in Asia 8 (14): 1–17.
Jones, Christian, Shelley Byrne, and Nicola Halenko
2018 Successful Spoken English: Findings from Learner Corpora. Oxon: Routledge.
Leech, Nancy L., Karen C. Barrett, and George A. Morgan
2005 SPSS for Intermediate Statistics: Use and Interpretation (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Liao, Silvie
2009 “Variation in the Use of Discourse Markers by Chinese Teaching Assistants in the US.” Journal of Pragmatics 41 (7): 1313–1328.
Lin, Yen-liang
2016 “Discourse Marking in Spoken Intercultural Communication between British and Taiwanese Adolescent Learners.” Pragmatics 26 (2): 221–245.
Liu, Binmei
2016 “Effect of L2 Exposure: From a Perspective of Discourse Markers.” Applied Linguistics Review 7 (1): 73–98.
Lomax, Richard G., and Debbie L. Hahs-Vaughn
2012 Statistical Concepts – A Second Course (4th ed.). New York: Routledge.
Mora, Joan C., and Margalida Valls-Ferrer
2012 “Oral Fluency, Accuracy, and Complexity in Formal Instruction and Study Abroad Learning Contexts.” TESOL Quarterly 46 (4): 610–641.
Müller, Simone
2005 Discourse Markers in Native and Non-Native English Discourse (Vol. 138). Amsterdam: John Benjamins.
Neary-Sundquist, Colleen
2014 “The Use of Pragmatic Markers across Proficiency Levels in Second Language Speech.” Studies in Second Language Learning and Teaching 4 (4): 637–663.
O’Keeffe, Anne, Michael McCarthy, and Ronald Carter
2007 From Corpus to Classroom: Language Use and Language Teaching. Cambridge: Cambridge University Press.
Pallant, Julie
2011 SPSS Survival Manual (4th ed.). Crows Nest NSW: Allen and Unwin.
Polat, Brittany
2011 “Investigating Acquisition of Discourse Markers through a Developmental Learner Corpus.” Journal of Pragmatics 43: 3745–3756.
Prodromou, Luke
2008 English as a Lingua Franca: A Corpus-Based Analysis. London: Continuum.
Romero-Trillo, Jesús
2002 “The Pragmatic Fossilization of Discourse Markers in Non-Native Speakers of English.” Journal of Pragmatics 34 (6): 769–784.
Rühlemann, Christoph
2019 Corpus Linguistics for Pragmatics. Oxon: Routledge.
Schiffrin, Deborah
1987 Discourse Markers. Cambridge: Cambridge University Press.
Schourup, Lawrence
1999 “Discourse Markers.” Lingua 107: 227–265.
Scott, Mike
2016 WordSmith Tools (Version 7). Stroud: Lexical Analysis Software.
Svartvik, Jan
1980 “Well in Conversation.” In Studies in English Linguistics for Randolph Quirk, ed. by Sidney Greenbaum, Geoffrey Leech, and Jan Svartvik, 167–177. London: Longman.
Tabachnick, Barbara G., and Linda S. Fidell
2012 Using Multivariate Statistics (6th ed.). Boston: Pearson.
Tottie, Gunnel
2011 “Uh and Um as Sociolinguistic Markers in British English.” International Journal of Corpus Linguistics 16 (2): 173–197.
Tsai, Pei-shu, and Wo-hsin Chu
2017 “The Use of Discourse Markers among Mandarin Chinese Teachers, and Chinese as a Second Language and Chinese as a Foreign Language Learners.” Applied Linguistics 38 (5): 638–665.
Vygotsky, Lev Semenovich
1978 Mind in Society: The Development of Higher Psychological Processes. Cambridge, MA: Harvard University Press.
Wei, Ming
2011 “A Comparative Study of the Oral Proficiency of Chinese Learners of English across Task Functions: A Discourse Marker Perspective.” Foreign Language Annals 44 (4): 674–691.
Wolk, Christoph, Sandra Götz, and Katja Jäschke
2021 “Possibilities and Drawbacks of Using an Online Application for Semi-Automatic Corpus Analysis to Investigate Discourse Markers and Alternative Fluency Variables.” Corpus Pragmatics 5: 1–30.

Appendix.Qualitative features of spoken fluency operationalized in the scales of the CEFR (Council of Europe 2018, 171–172)

C2 Can express him/herself spontaneously at length with a natural colloquial flow, avoiding or backtracking around any difficulty so smoothly that the interlocutor is hardly aware of it.
C1 Can express him/herself fluently and spontaneously, almost effortlessly. Only a conceptually difficult subject can hinder a natural, smooth flow of language.
B2+
B2 Can produce stretches of language with a fairly even tempo; although he/she can be hesitant as he or she searches for patterns and expressions, there are few noticeably long pauses.
B1 Can keep going comprehensibly, even though pausing for grammatical and lexical planning and repair is very evident, especially in longer stretches of free production.
A2+
A2 Can make him/herself understood in very short utterances, even though pauses, false starts and reformulation are very evident.
A1 Can manage very short, isolated, mainly pre-packaged utterances, with much pausing to search for expressions, to articulate less familiar words, and to repair communication.
Pre‑A1

Address for correspondence

Lan-fen Huang

Republic of China Naval Academy

P. O. Box 90175 Kaohsiung Zuoying

Kaohsiung 81300

Taiwan

[email protected]

Biographical notes

Lan-fen Huang is currently an Associate Professor at the Department of Foreign Languages, Republic of China Naval Academy, Taiwan. She received her PhD degree in English Language from the University of Birmingham, UK. Her research interests center on the corpus-based study of the English language and the application of corpus linguistics to issues in discourse analysis and English language teaching.

Yen-liang (Eric) Lin is currently an Associate Professor in the Department of English, also serving as the Director of the Foreign Language Center, at National Taipei University of Technology, Taiwan. He received his PhD in Applied Linguistics in 2013 from the University of Nottingham, UK. His research interests include corpus linguistics, discourse analysis, speech and gesture, and language teaching research.

Tomáš Gráf is a senior lecturer at the Department of English Language and ELT Methodology, Faculty of Arts, Charles University in Prague, Czech Republic, where he received his PhD in English Language Teaching Methodology. His research interests include L2 fluency and accuracy, pedagogical implications of learner corpus research, and teacher-training processes for secondary-school teachers of EFL.