Measuring the degree of specialisation of sub-technical legal terms through corpus comparison
A domain-independent method
One of the most remarkable features of the legal English lexicon is its use of sub-technical vocabulary, that is, words frequently shared by the general and specialised fields that either retain a legal meaning in general English or acquire a specialised one in the legal context. Testing showed that almost 50% of the terms extracted from BLaRC, an 8.85-million-word legal corpus, were found among the 2,000 most frequent word families of West’s (1953) GSL, Coxhead’s (2000) AWL or the BNC (2007), which shows how relevant this type of vocabulary is in this variety of English. Because of their peculiar statistical behaviour in both contexts, sub-technical terms are particularly difficult to identify, and their termhood is hard to measure with parameters such as their frequency or distribution in general and specialised corpora. This research proposes a novel termhood measure intended to quantify this lexical phenomenon objectively by applying Williams’ (2001) lexical network model, which incorporates contextual information to compute the degree of specialisation of sub-technical terms.
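The method is described here only in outline. As a rough illustration of how contextual information can separate a word's legal use from its general use, the following sketch compares the collocates of a node word in a specialised corpus and in a general corpus. It is a hypothetical, deliberately simplified stand-in, not Williams' (2001) full collocational network and not the measure proposed in this article: the function names `collocates` and `specialisation`, the window size, the frequency and pointwise mutual information thresholds (PMI in the style of Church and Hanks 1990) and the Jaccard-based score are all assumptions made for illustration, and the toy sentences are invented.

```python
# Hypothetical sketch: a first-order collocate-overlap score, NOT Williams' (2001)
# full collocational network and NOT the measure proposed in the article.
import math
from collections import Counter
from typing import Iterable, Optional, Sequence, Set


def collocates(tokens: Sequence[str], node: str, window: int = 3,
               min_freq: int = 2, min_pmi: float = 1.0,
               ignore: Iterable[str] = ()) -> Set[str]:
    """Words that co-occur with `node` at least `min_freq` times within +/- `window`
    tokens and whose pointwise mutual information with `node` is >= `min_pmi` bits."""
    skip = set(ignore) | {node}
    n = len(tokens)
    freq = Counter(tokens)
    pair = Counter()  # co-occurrence counts of the node with each context word
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        for j in range(max(0, i - window), min(n, i + window + 1)):
            if j != i and tokens[j] not in skip:
                pair[tokens[j]] += 1
    found = set()
    for word, f_xy in pair.items():
        if f_xy < min_freq:
            continue
        # PMI = log2(P(x,y) / (P(x) P(y))), probabilities estimated over n tokens;
        # no correction for window size (an assumed simplification)
        pmi = math.log2(f_xy * n / (freq[node] * freq[word]))
        if pmi >= min_pmi:
            found.add(word)
    return found


def specialisation(node: str, specialised: Sequence[str], general: Sequence[str],
                   **params) -> Optional[float]:
    """Hypothetical score: 1 - Jaccard overlap of the node's collocate sets in the
    specialised and general corpora (1.0 = no shared contexts, 0.0 = identical sets).
    Returns None when the node has no collocates in either corpus."""
    spec = collocates(specialised, node, **params)
    gen = collocates(general, node, **params)
    union = spec | gen
    if not union:
        return None
    return 1.0 - len(spec & gen) / len(union)


# Toy usage with invented sentences (not corpus data)
STOP = {"the", "a", "an", ".", "for"}
legal = "the court dismissed the appeal . the court allowed the appeal . the court heard the appeal .".split()
general = "she made an appeal for calm . the charity appeal raised money . a charity appeal for calm .".split()
print(sorted(collocates(legal, "appeal", ignore=STOP)))    # ['court']
print(sorted(collocates(general, "appeal", ignore=STOP)))  # ['calm', 'charity']
print(specialisation("appeal", legal, general, ignore=STOP))  # 1.0
```

In the toy data, *appeal* shares no collocates across the two corpora and scores 1.0, whereas a word used in the same contexts in both corpora would score 0.0. Applying the idea to corpora the size of BLaRC and the BNC would also require tokenisation, lemmatisation and threshold tuning, none of which this sketch attempts.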
References
Ahmad, Khurshid, Andrea Davies, Heather Fulford, and Monika Rogers

Alcaraz Varó, Enrique
1994 El Inglés Jurídico: Textos y Documentos. Madrid: Derecho.

Alcaraz Varó, Enrique
2000 El Inglés Profesional y Académico. Madrid: Alianza Editorial.

Ananiadou, Sofia
1988 A Methodology for Automatic Term Recognition. PhD Thesis, University of Manchester Institute of Science and Technology, United Kingdom.

Aronson, Alan, and François-Michel Lang
2010 “An Overview of MetaMap: Historical Perspective and Recent Advances.” Journal of the American Medical Informatics Association 17 (3): 229–236.

Baker, Mona
1988 “Sub-technical Vocabulary and the ESP Teacher: An Analysis of some Rhetorical Items in Medical Journal Articles.” Reading in a Foreign Language 4 (2): 91–105.

Barrón-Cedeño, Alberto, Gerardo Sierra, Patrick Drouin, and Sofia Ananiadou
2009 “An Improved Automatic Term Recognition Method for Spanish.” In Proceedings of the 10th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2009), ed. by A. Gelbukh, 125–136. Berlin: Springer-Verlag. ([URL]). Accessed January 2016.

Borja Albí, Anabel
2000 El Texto Jurídico en Inglés y su Traducción. Barcelona: Ariel.

Bourigault, Didier
1992 “Surface Grammatical Analysis for the Extraction of Terminological Noun Phrases.” In Proceedings of the 5th International Conference on Computational Linguistics, 977–981. Nantes, France.

Cabré, María Teresa, Rosa Estopà, and Jorge Vivaldi

Chung, Teresa M., and Paul Nation
2003 “Technical Vocabulary in Specialised Texts.” Reading in a Foreign Language 15 (2): 103–116.

Church, Kenneth W., and Patrick Hanks
1990 “Word Association Norms, Mutual Information, and Lexicography.” Computational Linguistics 16 (1): 22–29.

Church, Kenneth W., and William Gale
1995 “Inverse Document Frequency (IDF): A Measure of Deviations from Poisson.” In Proceedings of the Third Workshop on Very Large Corpora, ed. by D. Yarowsky and K. Church, 121–130. Cambridge: Massachusetts Institute of Technology Press.

Cowan, Ronayne
1974 “Lexical and Syntactic Research for the Design of EFL.” TESOL Quarterly 8: 389–399.

Coxhead, Averyl
2000 “A New Academic Word List.” TESOL Quarterly 34 (2): 213–238.

Dagan, Ido, and Kenneth Church
1994 “TERMIGHT: Identifying and Translating Technical Terminology.” In Proceedings of the 4th Conference on Applied Natural Language Processing, 34–40. Stuttgart, Germany. ([URL]). Accessed January 2016.

Daille, Béatrice
1996 “Study and Implementation of Combined Techniques for Automatic Extraction of Terminology.” In The Balancing Act: Combining Symbolic and Statistical Approaches to Language, ed. by J.L. Klavans and P. Resnik, 29–36. Cambridge: Massachusetts Institute of Technology Press.

David, Sophie, and Pierre Plante
1990 Termino 1.0. Research Report of Centre d’Analyse de Textes par Ordinateur. Université du Québec, Montréal.

Dunning, Ted
1993 “Accurate Methods for the Statistics of Surprise and Coincidence.” Computational Linguistics 19 (1): 61–74.

Fahmi, Ismail, Gosse Bouma, and Lonneke van der Plas
2007 “Improving Statistical Method Using Known Terms for Automatic Term Extraction.” In Proceedings of Computational Linguistics in the Netherlands (CLIN 17), ed. by F. van Eynde, P. Dirix, I. Schuurman, and V. Vandeghinste, 1–8. Belgium: University of Leuven.

Farrell, Paul
1990 Vocabulary in ESL: A Lexical Analysis of the English of Electronics and a Study of Semi-technical Vocabulary. Dublin: Centre for Language and Communication Studies.

Frantzi, Katerina T., and Sophia Ananiadou
1999 “The C/NC Value Domain Independent Method for Multi-word Term Extraction.” Journal of Natural Language Processing 3 (2): 115–127.

Frantzi, Katerina, Sofia Ananiadou, and Hideki Mima
2000 “Automatic Recognition of Multi-Word Terms: The C-value/NC-value Method.” International Journal on Digital Libraries 3 (2): 115–130.

Geffet, Maayan, and Ido Dagan
2005 “The Distributional Inclusion Hypotheses and Lexical Entailment.” In Proceedings of the Annual Meeting of the ACL, 107–114. Michigan, USA.

Heatley, Alex, and Paul Nation
2002 Range. Computer software. Wellington, New Zealand: Victoria University of Wellington.

Jacquemin, Christian
2001 Spotting and Discovering Terms through NLP. Cambridge: Massachusetts Institute of Technology Press.

Joslyn, Cliff, Patrick Paulson, and Karin Verspoor
2008 “Exploiting Term Relations for Semantic Hierarchy Construction.” In Proceedings of the International Conference of Semantic Computing IEEE, 42–49. Santa Clara (CA), USA.

Justeson, John S., and Slava M. Katz
1995 “Technical Terminology: Some Linguistic Properties and an Algorithm for Identification in Text.” Natural Language Engineering 1 (1): 9–27.

Kit, Chunyu, and Xiaoyue Liu

Lemay, Chantal, Marie-Claude L’Homme, and Patrick Drouin

Loginova, Elizabeta, Anita Gojun, Helena Blancafort, María Guegan, Tatiana Gornostay, and Ulrich Heid
2012 “Reference Lists for the Evaluation of Term Extraction Tools.” In Proceedings of TKE 2012: Terminology and Knowledge Engineering, 177–192. Madrid: Universidad Politécnica de Madrid. ([URL]). Accessed January 2016.

Marín, María José
2014 “Evaluation of Five Single-word Term Recognition Methods on a Legal Corpus.” Corpora 9 (1): 83–107.

Marín, María José, and Camino Rea
2012 “Structure and Design of the BLRC: A Legal Corpus of Judicial Decisions from the UK.” Journal of English Studies 10: 131–145.

Maynard, Diana, and Sofia Ananiadou
2000 “TRUCKS: A Model for Automatic Multi-word Term Recognition.” Journal of Natural Language Processing 8 (1): 101–125.

Mellinkoff, David
1963 The Language of the Law. Boston: Little, Brown & Co.

Nakagawa, Hiroshi, and Tatsunori Mori
2002 “A Simple but Powerful Automatic Term Extraction Method.” In COLING-02 on COMPUTERM: Proceedings of the Second International Workshop on Computational Terminology, 1–7. Taipei, Taiwan.

Nazar, Rogelio, and María Teresa Cabré
2012 “Supervised Learning Algorithms Applied to Terminology Extraction.” In Proceedings of the 10th Terminology and Knowledge Engineering Conference TKE 2012, ed. by G. Aguado de Cea, M.C. Suárez-Figueroa, R. García-Castro, and E. Montiel-Ponsoda, 209–217. Madrid: Ontology Engineering Group, Association for Terminology and Knowledge Transfer.

Orts, María Ángeles
2006 Aproximación al Discurso Jurídico en Inglés: Las Pólizas de Seguro Marítimo de Lloyd’s. Madrid: Edisofer.

Pazienza, Maria Teresa, Marco Pennacchiotti, and Fabio Massimo Zanzotto
2005 “Terminology Extraction: An Analysis of Linguistic and Statistical Approaches.” Studies in Fuzziness and Soft Computing 185: 225–279.

Park, Younja, Roy Byrd, and Branimir Boguraev
2002 “Automatic Glossary Extraction: Beyond Terminology Association.” In Proceedings of COLING’02: 19th International Conference on Computational Linguistics, ed. by S.C. Zeng, 1–7. Taipei, Taiwan.

Sclano, Francesco, and Paola Velardi
2007 “A Web Application to Learn the Common Terminology of Interest Groups and Research Communities.” In Proceedings of the Conference TIA-2007, ed. by C. Engehard and R.D. Kuntz, 85–94. Grenoble: Presses Universitaires de Grenoble.

Scott, Mike
2008 WordSmith Tools Version 5. Liverpool: Lexical Analysis Software.

Sparck-Jones, Kathleen
1972 “A Statistical Interpretation of Term Specificity and its Application in Retrieval.” Journal of Documentation 28 (1): 11–21.

Tiersma, Peter
1999 Legal Language. Chicago: The University of Chicago Press.

Trimble, Louis
1985 English for Science & Technology: A Discourse Approach. Cambridge: Cambridge University Press.

Vivaldi, Jorge
2001 Extracción de Candidatos a Término mediante Combinación de Estrategias Heterogéneas. PhD Thesis. Universidad Politécnica de Cataluña.

Vivaldi, Jorge, Diego Cabrera, Luis Adrián, Gerardo Sierra, and María Pozzi
2012 “Using Wikipedia to Validate the Terminology Found in a Corpus of Basic Textbooks.” In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), 3820–3827. Istanbul: Istanbul Lütfi Kırdar Convention and Exhibition Centre. ([URL]). Accessed January 2016.

Wang, Karen, and Paul Nation
2004 “Word Meaning in Academic English: Homography in the Academic Word List.” Applied Linguistics 25 (3): 291–314.

Weeds, Julie, David Weir, and Diana McCarthy
2004 “Characterising Measures of Lexical Distributional Similarity.” In Proceedings of Coling-04, 1–7. Geneva, Switzerland.

West, Michael
1953 A General Service List of English Words. London: Longman.

Williams, Geoffrey
2001 “Mediating between Lexis and Texts: Collocational Networks in Specialised Corpora.” ASp, la Revue du GERAS 31–33: 63–76.