Named entities as potentially problematic items in corpora
This chapter discusses problems in the interpretation of
corpus data arising from insufficiencies in the annotation of named
entities. Many corpora still do not adequately enable users to set up
queries that exclude items appearing in names when this is needed to
improve the precision of searches. Through an examination of case studies in
major English-language corpora, the chapter highlights the need to carefully
post-process search results, as irrelevant occurrences of named entities
may pose challenges in the analysis of word frequencies and collocational
behaviour. The chapter calls for more detailed annotation of named entities
in large linguistic corpora that are already available and reminds readers of
the importance of closely inspecting the search hits.
Article outline
- 1. Introduction
- 2. Background
- 2.1 The concepts of proper nouns and proper names
- 2.2 Annotation of named entities
- 3. Case studies
- 3.1 Common nouns used as (parts of) proper nouns: Lifespan and samurai
- 3.2 Near-synonymous adjectives in named entities: Limited/restricted, royal/regal and fantastic/fabulous
- 4. Discussion and conclusion
- Notes
- References