Building representative multi-genre corpora for legal and institutional translation research: The LETRINT approach to text categorization and stratified sampling

Prieto Ramos, Fernando; Cerutti, Giorgina; Guzmán, Diego

doi:10.1075/ts.00014.pri

Article published in:

Corpus-Based Research in Legal and Institutional Translation
Edited by Fernando Prieto Ramos
[Translation Spaces 8:1] 2019
pp. 93–116

Fernando Prieto Ramos | University of Geneva

Giorgina Cerutti | University of Geneva

Diego Guzmán | University of Geneva

Exploring questions of representativeness, balance and comparability is essential to tailoring corpus design and compilation to research goals, and to ensuring the validity of research results. This is especially true when the target population of texts under examination is very large and transcends a restricted area of specialization and/or covers multiple genres, as in the case of texts translated in institutional settings. This paper describes the multilayered sequential approach to corpus building applied in a comparative study on legal translation in three of these settings. The approach is based on a full mapping and categorization of institutional texts from a legal perspective; it applies an innovative combination of stratified sampling techniques integrating quantitative and qualitative criteria adapted to the research aims. The resulting corpora, categorization matrix and selection records, together with the methodological detail provided, can be useful for building other multi-genre corpora in translation studies and further afield.

Keywords: corpus, representativeness, text categorization, stratified sampling, genre, balance, legal translation, institutional translation
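
The stratified systematic sampling mentioned in the abstract can be illustrated with a minimal sketch. This is not the LETRINT procedure itself, which the article describes as combining quantitative and qualitative criteria; it only shows the generic technique. The record fields (`id`, `setting`, `genre`), the use of setting and genre as strata, the sampling fraction and the per-stratum minimum are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def systematic_sample(units, n, rng):
    """Pick n units at a fixed interval, starting from a random offset."""
    if n <= 0 or not units:
        return []
    if n >= len(units):
        return list(units)
    step = len(units) / n          # sampling interval (> 1 here)
    start = rng.random() * step    # random start within the first interval
    last = len(units) - 1
    return [units[min(int(start + i * step), last)] for i in range(n)]


def stratified_systematic_sample(texts, fraction, min_per_stratum=1, seed=0):
    """Group texts into strata, then sample each stratum systematically.

    Assumption: each text is a dict with hypothetical keys
    'id', 'setting' and 'genre'; strata are (setting, genre) pairs.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for text in texts:
        strata[(text["setting"], text["genre"])].append(text)

    sample = {}
    for key, units in strata.items():
        units.sort(key=lambda t: t["id"])  # fixed order before interval selection
        n = max(min_per_stratum, round(len(units) * fraction))
        sample[key] = systematic_sample(units, n, rng)
    return sample


# Hypothetical population: three settings with one genre each.
texts = (
    [{"id": i, "setting": "A", "genre": "resolution"} for i in range(40)]
    + [{"id": i, "setting": "B", "genre": "judgment"} for i in range(10)]
    + [{"id": i, "setting": "C", "genre": "report"} for i in range(3)]
)
result = stratified_systematic_sample(texts, fraction=0.1)
```

Sampling each stratum separately keeps small genres in the sample (here through the assumed `min_per_stratum` floor), whereas a single random draw over the whole population could miss them entirely.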

Article outline

1. Representativeness and research needs: LETRINT’s corpus-building sequence
2. Mapping institutional text production: The LINST corpora set
3. Categorization of institutional texts: From LINST to LETRINT 0
4. Stratified systematic sampling: From LETRINT 0 to LETRINT 1
- 4.1 Selection of genres
- 4.2 Tailored selection of textual units by stratum
5. Concluding remarks
Notes
References

Available under the Creative Commons Attribution (CC BY) 4.0 license.

For any use beyond this license, please contact the publisher.

Published online: 26 June 2019

Cited by (5)

Guzmán, Diego & Fernando Prieto Ramos. 2021. Assessing legal terminological variation in institutional translation. Translation and Translanguaging in Multilingual Contexts 7:2, pp. 224 ff.

Prieto Ramos, Fernando & Giorgina Cerutti. 2021. Terminology as a source of difficulty in translating international legal discourses: an empirical cross-genre study. International Journal of Legal Discourse 6:2, pp. 155 ff.

Prieto Ramos, Fernando & Giorgina Cerutti. 2023. Terminological hybridity in institutional legal translation. Terminology. International Journal of Theoretical and Applied Issues in Specialized Communication 29:1, pp. 45 ff.

Prieto Ramos, Fernando & Diego Guzmán. 2021. Examining institutional translation through a legal lens. Target. International Journal of Translation Studies 33:2, pp. 254 ff.

Prieto Ramos, Fernando & Diego Guzmán. 2023. Measuring the quality of legal terminological decisions in institutional translation. In Handbook of Terminology [Handbook of Terminology, 3], pp. 375 ff.

This list is based on CrossRef data as of 19 July 2024. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.