Utilising heterogeneous language resources for term extraction in maritime domains

Andersen, Gisle

doi:10.1075/term.20024.and

Article published in:

Terminology
Vol. 28:1 (2022), pp. 1–36

Gisle Andersen | Norwegian School of Economics

The development of terminologies for domains where these are lacking is a time-consuming and costly task. This article addresses a general methodological question: how can we, with limited funding, make maximal use of existing language resources to create a terminology at relatively low cost? Although Norway has been an important player in the maritime industries for many centuries, it has not prioritised the systematic development of an official maritime terminology. The article therefore focuses specifically on efforts to develop a national resource for maritime domains. It describes the creation of a corpus of popular science and a parallel corpus of technical texts. Six different term extraction methods are applied. These include corpus-based statistical analyses of frequency, collocation and keyness, as well as bilingual term extraction. Finally, the pros and cons of each method are evaluated by means of a cost-benefit analysis.

Keywords: Term extraction, Norwegian language, Language for Specific Purposes (LSP), corpus linguistics, natural language processing (NLP), marine and maritime domains

Article outline

1. Introduction
2. Historical and theoretical background
3. Methods and criteria for term extraction in maritime domains
- 3.1 Maritime domains
- 3.2 Overview of term extraction methods
- 3.3 Criteria for unithood and termhood
4. Methodological specifics and results from the various term extraction methods
- 4.1 Method 1: Frequency analysis of domain-specific corpus
- 4.2 Method 2: Keyness analysis of domain-specific vs. general corpus
- 4.3 Method 3: Collocation analysis of domain-specific corpus
- 4.4 Method 4: Chunking of aligned sentences from a parallel domain-specific corpus
- 4.5 Method 5: Retrieval of terms from domain-specific lexical resources
- 4.6 Method 6: Retrieval of domain-specific entries in bilingual general dictionary
5. Results
6. Concluding remarks
Acknowledgements
Notes
References

Published online: 10 September 2021

https://doi.org/10.1075/term.20024.and
