Using data-mining to identify and study patterns in lexical innovation on the web: The NeoCrawler
This paper presents the NeoCrawler, a tailor-made web crawler that identifies
and retrieves neologisms from the Internet and systematically monitors their
use on the web by means of weekly searches. It enables researchers to use the
web as a corpus in order to investigate the dynamics of lexical innovation on a
large scale and on a systematic basis. As a web-mining tool, the NeoCrawler
opens up new opportunities for linguists to tackle a number of unresolved and
under-researched issues in the field of lexical innovation. The paper describes
the design and the most important characteristics of two modules, the
Discoverer and the Observer, with regard to the usage-based study of lexical
innovation and diffusion.
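The general idea behind dictionary-based neologism detection can be illustrated with a minimal sketch. It is not the NeoCrawler's actual implementation: the function name `candidate_neologisms`, the tokenisation rule, and the toy reference list are all assumptions made for illustration. The sketch simply flags every token that is absent from a reference word list as a candidate, which would still require manual evaluation.

```python
import re


def candidate_neologisms(text, reference_words):
    """Return tokens from `text` that do not occur in `reference_words`.

    Hypothetical helper for illustration only: tokens are lower-cased
    alphabetic strings (optionally hyphenated), and each candidate is
    reported once, in order of first appearance.
    """
    known = {word.lower() for word in reference_words}
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    seen = set()
    candidates = []
    for token in tokens:
        if token not in known and token not in seen:
            seen.add(token)
            candidates.append(token)
    return candidates


# Toy reference list (assumption); a real one would be a large dictionary.
reference = {"the", "new", "trend", "is", "called", "a", "on", "web"}
print(candidate_neologisms("The new trend is called a detweet on the web", reference))
```

A filter of this kind over-generates, since typos, proper names, and foreign words are also missing from any reference list, which is why a manual evaluation step remains necessary in a semi-automatic workflow.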
Article outline
- 1. Introduction
- 2. The Discoverer
- 2.1 Source material and pre-processing
- 2.2 String matching procedure
- 2.3 Reference dictionary
- 2.4 Manual evaluation
- 3. The Observer
- 3.1 Architecture of the Observer
- 3.2 The NeoCrawler database
- 3.3 The Observer interface
- 4. Summary and future work
- Notes
- References