Corpus linguistics has made great strides in language research and teaching, but it is not yet widely known to many African academics and linguistic communities, and its potential is therefore lost to them. The aim of this chapter is to introduce corpus linguistics to African researchers and others who are not yet familiar with the field, or have limited knowledge of it, and who are interested in using this method for linguistic analysis. The chapter introduces the concept of corpus linguistics (Section 1), explains some of its key terms and concepts (Section 2), and considers the types of corpora as well as the scope and applications of corpus linguistics (Section 3).
This chapter provides linguists and students not yet familiar with corpus-based research on varieties of English in Africa with a practical introduction to the field. After explaining the rationale and aims of corpus-based research on varieties of English (in Africa), we introduce the methods, tools and resources commonly used in the field in order to give readers a point of entry. Most of the corpora and software that are introduced can be obtained free of charge. The software is introduced in a practical way so that readers can use it in their own research. The application and value of corpus linguistics are exemplified with three case studies. These examples are based in part on previous research, retracing its methodological steps, but are also expanded with more data from across Africa. Case study 1 shows how corpora allow researchers to investigate lexical differences between African varieties of English, arguably an area that is amenable to scholarly inquiry with relatively limited methodological means. Case study 2 considers a grammatical phenomenon, the present perfect in African Englishes, and demonstrates how a corpus tagged for parts of speech permits syntactic analyses. Case study 3 illustrates the analysis of a phonological corpus with an investigation of the optional deletion of the phoneme /h/ in words such as "house" in Nigerian English. The chapter concludes with recommendations for further reading, allowing readers to explore selected topics in more depth according to their interests.
This chapter reports on a new learner corpus project, the error-annotated Corpus of Nigerian and Cameroonian English Learner Language (Conacell) of 442,939 words (Esimaje 2016), describing its purpose and design and demonstrating its use. The aim of the project is to produce a resource for measuring learners’ language development and to enhance the teaching-learning process. The corpus data comprise the language output of 998 students: 383 university and 615 secondary students. The chapter shows specific uses of the corpus to explore lexical form and tense usage by learners in Nigeria and Cameroon. Corpus analysis reveals, for instance, that capitalisation, an aspect of lexical form, and tense are hard to learn and therefore remain learning needs in these contexts.
The possibility of an evolving “Namibian English” was already suggested more than 30 years ago (Chamberlain 1981: 46). However, detailed analyses of the English(es) used in Namibia were initiated only recently (e.g. Otaala 2006; Buschfeld & Kautzsch 2014; Kautzsch & Schröder 2016; Steigertahl 2017). The present chapter adds to previous research on morphosyntactic structures by introducing the Corpus of English(es) Spoken by Black Namibians post Independence (ESBNaPI). The chapter first introduces the Namibian linguistic situation and then presents the procedures of data collection and the methodology. Afterwards, morphosyntactic examples from the corpus are given and compared to South African English(es) (SAE) to address potential generalizations about post-Independence English(es). The overall goal is to raise awareness of corpus resources in southern Africa.
This chapter discusses various aspects relevant to the compilation of the Historical Corpus of English in Ghana (HiCE Ghana), a 600,000-word corpus of written Ghanaian English (GhE) from the period 1966 to 1975. The corpus captures written GhE in the early years of its nativization phase and, by comparison with the relevant sections of the Ghanaian component of ICE, makes it possible to trace structural nativization in real time. The paper addresses the motivation for such a corpus and shows how methodological and theoretical concerns and challenges have affected the final corpus design. The paper is rounded off with contemporary examples of GhE.
This paper illustrates the uses of a tagged corpus of spoken Cameroon Pidgin English (CPE), which has recently been finalised (Ozón et al. 2017) and made available online (Green et al. 2016). The corpus consists of 240,000 words, with mark-up and part-of-speech tagging. Text categories and proportions of monologue/dialogue are guided by those of the ICE project (Nelson 1996), making the CPE corpus comparable with existing corpora of post-colonial Englishes. This tagged corpus offers an invaluable resource for the investigation of CPE, particularly in addressing issues of multifunctionality in pidgin or creole languages. We introduce the dataset and present case studies illustrating its potential uses, in order to highlight the usefulness of this freely accessible resource for research on African languages.
One major contribution of the corpus-based approach to language study is the ease with which linguists can access large amounts of data and search for regularities across text types and varieties of English on the basis of observed frequencies. Such regularities of features, if based on well-designed and/or specialised corpora, can be useful for classroom activities and material design. This chapter describes the basic steps involved in corpus design and corpus exploitation. Section 1 defines the corpus and briefly states the relevance of investigating English as a second language (ESL) varieties via a corpus. Section 2 describes the key steps involved in simple corpus compilation and how students can compile their own corpora for research. Section 3 presents the results of two studies on the frequency and use of modals in the corpus of Cameroonian English. These results are intended to illustrate the point that the corpus approach is indispensable if certain types of linguistic information are to be investigated in a particular variety.
Previous research suggests that some varieties of English in Africa do not mark the past tense consistently with an inflectional suffix. Various explanations are offered for this state of affairs, including the option of the historical present tense, non-marking because the context already makes clear that the verb denotes a past event, phonological reduction, and limited English language proficiency. This chapter reports on a corpus analysis of spoken conversation in Nigerian English and Black South African English, which indicates that the non-marking of the past tense occurs in about one in every five contexts where an event in the past is represented. However, no convincing support is found for any of the explanations in the previous research on the topic.
This chapter investigates the extent of similarity in the use of stance markers in two national varieties of West African English, Nigerian English and Ghanaian English, and compares them to British English. The frequency and stylistic variability of four semantic groups of stance markers were examined in ICE-Nigeria and ICE-Ghana and compared with ICE-Great Britain. The results are mixed: the two West African varieties show an overall lower frequency of stance markers than British English, but their speakers do not demonstrate lower stylistic variability in the use of stance markers across different text types. Nevertheless, there are systematic differences in stance marker usage between the two West African English varieties.
The present chapter introduces the Corpus of Namibian Online Newspapers (CNamON), which encompasses the contents of seventeen news sources as available on the Internet from May to June 2016. These sources add up to roughly 44 million words of text. The corpus was compiled to facilitate a systematic account of the lexical and structural properties of Namibian English, which have not been investigated to date. This chapter focusses on the technical details of the compilation process of CNamON as well as its set-up and its usefulness for linguistic research. To demonstrate the corpus's potential for analyses on the linguistic levels of lexis and (morpho-)syntax, both a qualitative stock-taking and exemplary quantitative analyses of structural characteristics of English in Namibia are provided.
This chapter provides a real-time structural and semantic analysis of lexical expansion in the Nativization phase of Ghanaian English based on the Historical Corpus of English in Ghana (HiCE Ghana) and the written-printed sections of the Ghanaian component of ICE. Taking a comprehensive list of previously attested ‘Ghanaianisms’ – innovative lexical items of English and local origin – as a starting point, the paper shows that traditional word-formation processes like derivation or compounding play only a subordinate role and that semantic shift is the most important process used in both periods. While the corpora are comparatively small for lexical research, the results still provide a useful starting point to better our understanding of how Ghanaian English has evolved over the past 40 years.
This paper discusses corpus linguistics as one method to investigate the lexicon of Ugandan English, which is characterised by borrowing, calquing, semantic extension, narrowing, and shift. It documents how analysing a well-balanced corpus, such as the Uganda component of the International Corpus of English, allows for a contextualisation of observations made from current uses of English and for an assessment of the textual genres in which such innovations occur. At the same time, it critically discusses the limitations and biases associated with a comparatively small corpus and argues for a multi-method approach that involves using larger, though less well controlled, corpora as well as supplementing corpus analyses with experimental data, which tap into the spread of lexical innovations.
The study examines the use of conjunctions in written texts by Nigerian university learners. It uses corpus-based methods and instruments to compare the use of conjunctions in the Nigerian Learner English Corpus (NLEC) with that of native-speaker counterparts in the Louvain Corpus of Native English Essays (LOCNESS). The analysis shows that additive conjunctions have the highest frequency of use in both groups, while causal, adversative and temporal conjunctions show below-average usage. The learners repeatedly used particular conjunctions and underused others within the same category. The study concludes that the advanced learners do not display optimal awareness of the various alternative conjunctive items available within the different groups to create stylistic variation in their texts for enhanced cohesion and overall coherence.
This contribution proposes to replace the traditional native-speaker model at African universities with a sophisticated and stratified corpus model of nation-, university- and department-specific usage. It illustrates that the long-discussed realistic “national standard” may be possible – at least for restricted domains at advanced levels such as postgraduate studies. A qualitative and quantitative analysis of prepositions and their variation in three case studies shows that there is some flexibility in preposition usage: cases where different preposition choices can be explained by equally valid cognitive principles, cases where the addition of prepositions may be acceptable because this adds explicitness (which may be preferred in non-native contexts) and cases where prepositions may appear redundant since there is no choice and no semantic opposition. Although sociolinguistic values and attitudes may be necessary complements to linguistic frequency analyses, a careful corpus-linguistic study of prepositional choices irrespective of standardised native conventions is the basis for all discussions of new functional standards for African English.
Corpus analysis has become established as an approach to language description and to applied pursuits such as language teaching and terminology. However, because of the social indexicalities of language use, corpora can also inform studies of social phenomena. This chapter draws on social semiotics to argue that, in the analysis of social phenomena, meanings that are socially significant can be read not only from what is said in corpora, but also from a range of other resources, such as names of persons and places as well as language choices made in texts. This chapter thus uses two heuristics, onomastics and discursive mono-/multilingualism, to query a diachronic corpus associated with a South African political party for evidence of whether or not the party has over time become more inclusive, contrary to its discursive positioning by a rival party as an untransformed organisation. The analysis shows evidence of the party opening up to diversity in terms of race, gender, geography, and language choice, but the finding raises the question of the relationship between semiotic evidence and reality.