Terminology management and terminology quality assurance in the European Commission’s Directorate-General for Translation
1. Introduction
Language is not only a means to perceive and describe reality, it also creates it. The performative function of language is particularly visible in law: in legal texts language is used not only to report about doing something, but also to do things (Fiorito 2006). This feature of legal discourse explains the high expectations for the quality of legal translations, as translation errors may have serious consequences for individuals and businesses, as well as undermine the trust in the legal system or institutions in general (European Commission 2012; see also e.g. Byrne 2007; Matulewska 2016; Scott and O’Shea 2021).
The EU is an international legal entity, and EU law is an independent supranational legal system (Wessel 2000). Hence, translation of EU legislation and case law is also legal translation.[1] It is, however, neither translation within one legal system, nor translation across legal systems, but a special sub-genre of institutional legal translation, where the major challenge is to achieve uniform interpretation and application of legislation in all 24 languages (Biel 2007). To that effect all language versions of EU legal acts form a single legal instrument and are considered equally authentic, i.e. presumed to convey the same meaning, have the same intent and produce identical legal effects (Šarčević 1997, 2000).
[1] Legal translation is only part, though an important one, of EU translation, i.e. translation “rendered by and for European Union institutions” (Biel 2017, 32). In the context of this article, however, the reference to “EU translation” is meant to be understood as reference to “EU legal translation”.
Still, 23 of these language versions are a product of translation. As Strandvik (2012, 48) points out, law-making is a complicated process even in a monolingual setting, and the difficulties are exacerbated when it takes place in a multilingual drafting environment, such as the EU. The sheer number of language versions not only makes the process of translation more complex, but also increases the risk of errors and discrepancies between them. Although the instruments of EU law derive their meaning from the same (EU) legal system, which in theory should ease interpretation, they are built on concepts originating in various other legal systems. Thus, ensuring multilingual concordance, as the correspondence across all language versions is referred to, is an enormous undertaking.
Multilingual concordance is achieved by ensuring legislative, linguistic and terminological quality of the translation (Strandvik 2012, 35). Legislative quality entails compliance with EU legislative drafting rules and conventions. Since EU law is a separate legal order different from the legal orders of its Member States, specific drafting guidelines that govern the presentation of EU legal acts have been developed. These guidelines (e.g. Interinstitutional Style Guide, Joint Handbook for the Presentation and Drafting of Acts subject to the Ordinary Legislative Procedure, Joint Practical Guide of the European Parliament, the Council and the Commission for persons involved in the drafting of European Union legislation) are to be uniformly applied by drafters and translators in all language versions.[2] Linguistic quality is achieved by adhering to the linguistic and textual drafting conventions in a given language. This is because EU law has to be integrated in the national legal systems of the Member States, and it should – while keeping its distinctive character – also ‘fit’ in these systems.[3] Finally, terminological quality above all consists in expressing EU concepts in a uniform way by using accurate and adequate terms.
[2] More on the role of EU style guides can be found in Strandvik (2017) and Drugan, Strandvik, and Vuorinen (2018). An analysis of the EU Interinstitutional Style Guide is presented in Svoboda (2013), and the results of a study on translation guidelines made available for DGT’s external contractors of all EU languages in Svoboda (2017).
[3] See Biel (2014) on the concept of textual fit.
2. Terminology in legal translation
2.1 Terminology accuracy in EU legal acts
Terminology is a distinctive feature of all languages for special purposes (LSP) and legal discourse is no exception. This does not mean legal terminology only. Law is by definition interdisciplinary, as it regulates many different areas of human activity. Thus, legal texts are hybrid texts in that they follow certain legal linguistic conventions specific to their function (i.e. constituting or applying instruments governing public or private legal relations), while also containing specialized terms from fields other than law, which may even vastly outnumber legal terms (Prieto Ramos 2014b, 264–265).
This is the case of EU legal texts. Legal terms constitute only a part of terminology present in EU legal acts, but a particularly challenging one. This is because the EU legal system develops under the influence of several European legal traditions, as well as international law, and the vast majority of EU legal concepts and terms are borrowed from one or more national legal systems (Robertson 2012, 4; Šarčević 2015, 186). In this process, the original meaning of the borrowed concept may be retained or modified, and the foreign term denoting that concept may also be retained, or replaced with a neutral term (Šarčević 2015, 186). The second approach is preferred to avoid any misleading connotations, as EU terms should be transparent and easily recognizable as such, as well as to facilitate translation. When new terms are formed rather than borrowed, this happens in two ways: either a neologism is proposed to designate the new concept, or an existing word or phrase from general language is taken and then defined (ten Hacken 2010, 421). As it happens, the EU legislator often makes use of general language words that are then assigned a new EU meaning. On the one hand, this favors transparency and comprehensibility; on the other hand, such words usually have legal meaning also in national laws, which increases the risk that they will not be recognized as EU terms. Thus, whenever possible, neologisms are created, which have the advantage of being easily identifiable as EU terms and distinguishable from national terms (Šarčević 2015, 188).
When creating EU legal terminology, EU translators can also choose between reusing terms that already exist in their language and just assigning new meaning to them, or creating a new term – either with domestic roots or via calques and borrowings. As Šarčević points out (2015, 193), using functional equivalents from national law satisfies target user expectations and increases the textual fit of EU law within the national legal systems, but at the same time it creates multiple references and poses a threat to the uniform interpretation of EU law if the choice is made without a proper comparative analysis of the European and the national concept.[4] On the other hand, relying on literal translation may help promote a uniform interpretation of EU terms, but it exposes the translation to criticism as being alienating and unnatural. Moreover, such literal equivalents may be less productive, that is, less capable of generating derivatives. In any case, when determining the adequacy of a particular translation strategy in a legal context, legal criteria must prevail over linguistic preferences (Šarčević 2015, 193).
[4] Matulewska (2016) gives an example of the Polish term wyrok, which may be translated into English as judgment, sentence, decree, conviction, verdict or acquittal, and the decision cannot be made without a proper analysis of the semantic and systemic relations binding these terms, which is a time-consuming task, hardly possible under real-life conditions.
The choice of the drafting language matters, too. It has been many years now since English replaced French as the main drafting language of EU legislation, not without consequences. Čavoški (2017, 61) argues that having English as the drafting language for the vast majority of EU legislation leaves translators with the difficult task of reconciling common law traditions with civil law traditions, a task she calls “mission impossible”. Otero Fernández (2020, 88) claims that English is more indeterminate than French, having fewer grammatical markers, such as gender, which leaves more scope for divergences in translation, and Šarčević (2015, 196) points out that in many cases EU terms in English are borrowed from French, and that many errors can be avoided by consulting other language versions.[5]
[5] This advice, although sound, is difficult to follow, as all language versions, other than the one serving as the source text, are drafted at the same time.
Still, purely legal terms tend to be rather rare in EU legal texts, when compared to the amount of specialized terminology from the domains under regulation. This comes with its own challenges. To begin with, the EU may regulate areas that are relatively new and for which specialized terminology does not exist yet in many national languages. International norms and standards, which are often incorporated directly into the EU secondary legislation, present another difficulty. Such norms and standards may contain terminology that has already been used in EU legislation, but they may also have been translated and incorporated into national legislation. This creates a problem of consistency. Even dealing with technical terms from long-established domains, for example chemistry or medicine, can be complicated. Especially in such broad fields of knowledge there exist numerous terminological variants, affected by different traditions, geographical variations, degree of specialization and the progress of knowledge, leading to the co-existence of “old” and “new” terms (Freixa 2006, 55), which in turn makes it difficult for a translator to decide on the most adequate equivalent for a given term.
Finding accurate and adequate equivalents of both legal and specialized terms requires effective information mining skills from the translator. As confirmed by Prieto Ramos (2020a), the first and most important resources for EU institutional translators are in-house termbases and parallel corpora of already translated legislation and case law, together with internal guidelines. This is because their usage is mandatory and the content is binding, provided of course that they refer to the same communicative situation. Otherwise, other specific legal and technical sources need to be consulted. Bilingual glossaries and language resources serve as a good starting point, but they need to be regarded critically and the usage of terms thus mined needs to be confirmed, ideally in primary legal sources and scholarly texts. Finally, domain experts, including lawyer-linguists from the translator’s institution, may be consulted, too.
Thanks to their linguistic expertise and knowledge of institutional text production, EU translators are best placed to choose an adequate term. However, it is not always they who have the final say. It is not uncommon for EU translators to carefully select a term only to have it changed to a different one by the national administrations in their Member States in the comitology procedure or during the transposition of the EU legal act into national laws. Sometimes it is also the author of the text who wishes a particular term to be translated in a particular way (or left untranslated).[6] In the end, translators can only suggest an equivalent, but it is the final users of their translations – directorates-general, national administrations and legal practitioners – who have the power to accept or reject it,[7] and who ultimately decide on the interpretation of a particular term.
[6] For example, at the beginning of 2020, when documents on Covid-19 started to be translated, DGT language departments received instructions to translate “Covid-19 outbreak” using the equivalent of “pandemic” and not “epidemic” or any other equivalent.
[7] For example, Member States often request a corrigendum to replace an EU equivalent with a national law term (see Biel and Pytel 2020, 164–165).
2.2 EU guidelines on terminological consistency
Consistency of terminology is critical to ensure univocity in institutional translation in general (Prieto Ramos 2020b, 136), and is also a matter of uniform application and interpretation of EU legislation (see e.g. Baaij 2012; Mišćenić 2016; Pacho Aljanati 2017; Prieto Ramos 2014a; Šarčević 2012). It is therefore prominently featured in translation quality requirements for the various text categories set up by DGT (Directorate-General for Translation 2015). The translation specifications for legal texts require compliance with EU drafting rules, standardized formulations and templates, faithful rendering of quotations and indirect quotations from the EUR-Lex database or from authoritative national legal databases, and terminological consistency within the act itself (internal consistency) and with the legal basis and other related acts (external consistency).
However, translators can only do their part if the source text is drafted clearly and precisely. Thus, terminological consistency is also one of the guiding principles of the Joint Practical Guide of the European Parliament, the Council and the Commission for persons involved in the drafting of European Union legislation (2015). The Joint Practical Guide (JPG) firstly distinguishes between substantive consistency, which concerns the logic of the legal act as a whole, and formal consistency, i.e. terminological consistency (point 6.1). It then defines terminological consistency as using the same terms to express the same concepts and refraining from using identical terms to express different concepts (point 6.2). Synonyms are to be avoided (point 1.4.1). Using terms in a uniform manner serves to avoid ambiguities or contradictions and to leave no doubt as to the meaning of the term. For the same reason, the JPG advises defining terms (point 6.2.3), especially when a term has several meanings or if the meaning of a term – for the purpose of the given act – has to be limited or extended with respect to the usual meaning attributed to that term (point 14.1). It follows that any terms not defined in a legal act should be understood as used in the given legal or specialized domain. Definitions are also used to avoid unintended references to concepts or terms, especially legal terms, from a particular national legal system (point 5.3.2).
Regarding translation, the JPG recommends that drafters make sure that translators can identify the sources drawn on in the original text, either through the wording of the text itself, or indirectly by other means (specific instructions to translators come to mind) (point 5.5.1). Additionally, the JPG advises drafters to take comments from translators regarding the original text seriously, and to alter the text rather than have the translators follow an unclear or poorly drafted original (point 5.5.2).
What seems easy and straightforward in theory, is not as easy in practice. Errors in texts sent for translation still occur, despite the existence of numerous safeguards. In the European Commission, drafters are assisted by lawyer-linguists in the Quality of Legislation team and have at their disposal an electronic drafting aid based on the JPG, namely the Drafters’ Assistance Package (DAP), which offers access to legislative drafting rules, provides guidance on how to draft various parts of legal acts and suggests standard formulations. However, the tool does not check for compliance with existing terminology, nor does it control terminological consistency. In case of inconsistencies, the translator needs to contact the author of the text, and if the inconsistency is not intended, i.e. if the different term does not refer to a different concept, the author should make appropriate changes to the original text and send it again for translation, as a new version. This takes time and impedes the translation process, but guarantees a uniform approach across all language departments. Without a new version, translators cannot correct errors and inconsistencies in individual language versions since this would lead to divergences between them and ultimately create legal uncertainty.
Correct terminology is particularly important for the autonomous acts of the European Commission,[8] as these texts do not undergo any further quality checks after translation. Only in some cases are such texts sent to Member States for linguistic proofreading before adoption, so that any changes and corrections suggested by national experts can be incorporated in translation. For legal acts adopted jointly by the Council and the European Parliament, terminological quality matters, too, as the Commission’s translation of the draft legislation serves as a basis on which the translators of these two institutions work. On the one hand, this opens up the possibility to remove errors and inconsistencies, if present in the draft Commission proposal. On the other hand, as the Commission’s proposal undergoes numerous amendments in the process of negotiation and adoption, new errors may be introduced, or the amendments may cause inconsistencies with other acts already adopted by the Commission.[9]
[8] These are non-legislative acts adopted by the Commission, for which it is solely responsible. They fall into two categories: delegated acts and implementing acts.
[9] Drugan, Strandvik, and Vuorinen (2018, 57) give the following example: the name of an EU initiative in the Commission’s proposal was later changed by the lawyer-linguists in the Council and in the Parliament, at the request of national authorities. In the meantime, however, more than 300 other documents featuring the now outdated name had been translated in the Commission.
Maintaining consistency in translation is also difficult because it is not uncommon for one concept to be expressed by more than one term, a phenomenon called denominative variation (Freixa 2006). Denominative variation exists for a number of reasons. Firstly, terms not only have a referential meaning, but also trigger certain connotations, which is why in certain contexts some terms may be preferred over others. Secondly, the meaning of a term is not given but negotiated by members of a discursive community. The bigger and more diverse the community, the more variation there is in terms used to refer to specific concepts. And thirdly, several contextual factors, like text type, communicative intention, intended audience, register, etc., also affect term use (Freixa 2006).[10]
[10] This variation is also visible in the interinstitutional termbank IATE. The structure of the termbank is concept-oriented: one entry deals with one concept only. However, IATE is also a descriptive database and records the usage of terms referring to a given concept in the given language. If authors or translators use different terms for the same concept, or if different terms for the same concept are used by domain experts in a given language, this is reflected in IATE. To facilitate the choice of an adequate term, these terms may be given labels, such as “Preferred”, “Admitted”, “Deprecated” or “Obsolete”. A preferred term is the best equivalent in the EU context or a term chosen to ensure consistency in EU texts. An admitted term is a term that is correct, but for which better synonyms exist. A deprecated term is a term that should be avoided both in base texts and in translations, because it is not correct or fit for use in EU texts. An obsolete term is a term that was used to denote the concept, but is no longer in use (IATE User’s Handbook 2020).
Although variation and synonymy are recognized as natural, necessary and functional aspects of specialized terminology (Bowker 2020, 269), maintaining the internal and external consistency of terminology in legal acts is still necessary for the sake of clarity and legal certainty.
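The concept-oriented structure and the usage labels described for IATE can be pictured with a small data model. The sketch below is only an illustration: the class names, the function usable_terms, the ranking rule (preferred terms first, deprecated and obsolete terms dropped) and the sample terms are all assumptions made here, not IATE's actual schema or content.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Labels that signal a term should not be used in new texts
# (assumption of this sketch, following the label descriptions above).
AVOID = {"Deprecated", "Obsolete"}


@dataclass
class Term:
    text: str
    language: str
    label: Optional[str] = None  # "Preferred", "Admitted", "Deprecated", "Obsolete" or None


@dataclass
class ConceptEntry:
    """One entry per concept; several terms per language may denote it."""
    concept_id: str
    terms: List[Term] = field(default_factory=list)


def usable_terms(entry: ConceptEntry, language: str) -> List[str]:
    """Return usable terms for one language, preferred terms first."""
    candidates = [
        t for t in entry.terms
        if t.language == language and t.label not in AVOID
    ]
    # Stable sort: False (preferred) sorts before True (everything else).
    candidates.sort(key=lambda t: t.label != "Preferred")
    return [t.text for t in candidates]


# Invented sample entry, for illustration only.
entry = ConceptEntry(
    concept_id="demo-001",
    terms=[
        Term("widget charge", "en", "Admitted"),
        Term("widget tax", "en", "Deprecated"),
        Term("widget levy", "en", "Preferred"),
    ],
)
print(usable_terms(entry, "en"))
```

Keeping all variants in one concept entry, rather than one record per term, is what allows denominative variation to be recorded while still steering translators towards a single preferred term.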
2.3 Machine Translation and terminology
Machine translation (MT) has received relatively little attention in legal translation studies, as it is assumed that legal texts are characterized by features that make them unsuitable for machine translation, like sentence length, syntactic complexity, lexical and syntactic ambiguity, phraseology and divergences at lexical and structural level (Matthiesen 2017, 44–46). Wiesman (2019, 121) adds to this list five more features: terminology, abbreviations, formulaic usage, ellipsis, and text type-specific deviations from normal language usage. Hoefler and Bünzli (2010) also point to ambiguity and underspecification as barriers to natural language processing by machines.
While many of these concerns are legitimate, there are too few studies in this area to make any definite assertions, and none of these studies manages to actually support the claim that legal texts as a genre are unsuitable for machine translation. For example, Wiesman (2019) analyzed errors in several legal texts representing different genres, translated from Italian into German. The translations were produced by the NMT engine DeepL Translator, available online and for free. Based on that analysis she concluded that the quality of machine translation was insufficient and that machine translation could not be used for the translation of legal texts without significant post-editing effort. However, these conclusions seem to be too generalized and miss several important points. Firstly, MT quality still very much depends on the grammatical similarity between the languages concerned. It follows that even with the same MT engine, each language pair will produce different results, and one cannot conclude that poor results will be achieved for all language pairs simply because poor results have been observed for one of them. Secondly, although the basic technology is the same, each engine is different, so good or poor results produced by one engine do not necessarily mean that equally good or equally poor results will be obtained with another engine. And thirdly and most importantly, MT engines are trained on a large database of parallel sentences, so it is the quantity and the quality of data that determines the quality of the raw MT output (and the subsequent post-editing effort). It follows that an MT engine trained on general texts cannot be expected to perform well when translating specialized texts, just as an MT engine trained on medical texts will be useless when faced with legal texts. In other words, if the text for translation does not resemble the texts in the training data, one should not expect a good outcome.
Often it is argued that complex layers of meaning requiring in-depth interpretation make machine translation unsuitable for translating legal texts (Prieto Ramos 2014b, 271) and that legal translation will remain an essentially human activity for the foreseeable future (Mattila 2013, 22). Indeed, machines do not interpret or even understand texts. As Pym (2019, 441) puts it: machines do not translate, they only search for the optimal translations previously done by a human translator and put those previous translations together in various ways. However, this mindless process can still bring surprisingly good results. Legal texts are in principle good candidates for machine translation, as they are produced according to well-defined drafting rules and guidelines, contain formulaic language, and are repetitive and standardized, all of which reduces ambiguity and variability. When trained on an appropriate corpus, MT is often able to produce a text of sufficient quality to be suitable for cost-effective post-editing.[11]
[11] As demonstrated by Farzindar and Lapalme (2009), who studied the possibility to train a statistical translation system on the domain of legal texts to increase the efficacy of translating Canadian Court judgments between English and French.
Machine translation quality has increased substantially since neural machine translation (NMT) replaced phrase-based statistical machine translation (PB-SMT) as the mainstream technology. Because NMT takes into account the whole sentence, i.e. all words in the sentence and their context, when generating the final translation (Way 2019, 317), it usually produces fewer lexical, inflection and word order errors than PB-SMT (see e.g. Bentivogli et al. 2016; Castilho et al. 2017; Popović 2017 and Killman, this volume). However, it still makes errors that impact accuracy, and what is more, these errors are often difficult to detect and only a careful comparison with the original can reveal the problem (Stefaniak 2020).
Regarding the accuracy of translating terminology, there have been very few studies, and the results they reported were inconclusive and did not always support the claim of NMT’s superiority over phrase-based statistical machine translation (PB-SMT) (see e.g. Chen and Kageura 2019; Haque, Hasanuzzaman, and Way 2020; Vintar 2018). In particular, NMT may omit whole terms or their parts, translate ambiguous terms in the wrong sense and make errors in the translation of terms containing proper names. On the other hand, it can also be ‘creative’ in coining translations of unknown terms, producing sometimes correct translations, and sometimes strange or non-existent words. Moreover, NMT creates inconsistencies, translating the same terms in different ways. This is to be expected, as NMT still operates at the sentence level. In legal translation, where both accuracy and terminological consistency are of particular importance, mistakes of this kind adversely affect the quality of the final text (for an overview of the performance of DGT’s NMT system, eTranslation, in translating terms in all EU official languages see Stefaniak 2023).
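The kind of inconsistency mentioned above, where the same source term is rendered in different ways across sentences, can be flagged with a simple check over aligned segments. The following is a minimal sketch under strong assumptions: the function find_inconsistencies and its inputs are hypothetical, matching is plain case-insensitive substring search (so inflected forms would be missed in many EU languages), and the list of known renderings must be supplied by hand. It is not a description of any DGT tool.

```python
from collections import defaultdict


def find_inconsistencies(segments, renderings):
    """Flag source terms rendered in more than one way.

    segments:   list of (source_sentence, target_sentence) pairs
    renderings: dict mapping a source term to the target renderings to look for
    """
    used = defaultdict(set)
    for source, target in segments:
        src, tgt = source.lower(), target.lower()
        for term, candidates in renderings.items():
            if term.lower() not in src:
                continue  # term does not occur in this segment
            hits = [c for c in candidates if c.lower() in tgt]
            # Record which renderings were found, or a marker if none was.
            used[term].update(hits or ["<no known rendering>"])
    # Keep only terms with more than one distinct outcome.
    return {term: sorted(found) for term, found in used.items() if len(found) > 1}


# Illustrative English-French segments; the second rendering is a
# deliberately inconsistent variant invented for this example.
segments = [
    ("The Commission shall adopt an implementing act.",
     "La Commission adopte un acte d'exécution."),
    ("The implementing act shall enter into force.",
     "L'acte de mise en œuvre entre en vigueur."),
]
renderings = {"implementing act": ["acte d'exécution", "acte de mise en œuvre"]}

report = find_inconsistencies(segments, renderings)
print(report)
```

A check of this kind works across sentence boundaries, which is exactly the level at which a sentence-based MT engine has no view of its own earlier choices.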
In the end, the usefulness of any MT engine for any given translation task must be evaluated in view of the requirements that the final translation has to fulfil. Highly perishable content of little value can be translated with the help of machine translation without much consideration, but the longer the translation is supposed to be used, the broader the audience and the higher the risk of negative consequences from translation errors, the more caution should be applied when using MT. It is not the technology itself that is risky, but the lack of awareness of what this technology can or cannot do (Nunes Vieira, O’Hagan, and O’Sullivan 2020, 13).[12] In view of the above, applying neural machine translation to the translation of legal texts must still be done with caution.
[12] Nunes Vieira, O’Hagan, and O’Sullivan (2020) present a meta-analysis of published research on the use of NMT in two critical settings: medical and legal. They found that MT is considered an easy alternative when no professional human translator is available, but without appropriate risk awareness as to how MT can influence decision-making, e.g. in immigration applications or court judgments. And because MT is used for already under-resourced languages, it may actually increase inequalities, instead of promoting inclusion.
3. Ensuring translation quality through terminology management
3.1 Terminology work in DGT
DGT, like all other EU language services, views terminological quality as one of the key aspects of translation quality and places great emphasis on terminology work and on integrating terminology in its translation process (see e.g. Stefaniak 2017; Drugan, Strandvik, and Vuorinen 2018). This means developing tools and workflows that support the identification of new terms, their collection, storage and distribution, and subsequently their enforcement during and verification after translation.
Actors
In DGT, the organization of terminology work is based on the DGT Terminology Framework (Directorate-General for Translation 2021). The DGT Terminology Framework acknowledges that terminology is crucial for the efficiency of the translation process and the quality of the final translations. The main actors – beside the translation managers, whose responsibility is to allocate sufficient resources and assign tasks – are terminologists and translators. Terminologists provide terminological support, e.g. search for terminology equivalents at the request of their fellow translators and make sure that the results of their searches are registered either in the IATE termbank or in departmental language-specific termbases. This support may also include, time permitting, extracting terms from source texts before translation and providing translators with reliable equivalents. It also comprises contacts with experts. All language departments in DGT cooperate with external experts in their Member States and with their national administrations in matters of terminology. This cooperation may be more or less formalized. For example, the Swedish and Polish translators can benefit from a network of contact persons in different ministries and bodies, who answer their terminology questions or convey them to other experts in the field. Similarly, the Romanian Language Department formed the RO+ Excellence network to consult Romanian specialists in different areas whenever it needs an opinion concerning terms for which a clear-cut Romanian equivalent is not easy to find or when opinions of various specialists diverge. Probably the most institutionalized cooperation exists for the Italian language. In 2005, the Italian Language Department set up the REII network (Rete per l’eccellenza dell’italiano istituzionale), bringing together translation agencies, terminology and language associations, academic institutions, public administration, and translators from other European institutions, mainly to consult and validate terminology in Italian, to work out best practices in the field of terminology, and to promote plain and clear language in administration.
Main terminologists in language departments also participate in terminology work organized at the multilingual level by the Terminology Coordination (TC) Unit. The role of the TC Unit is to ensure a harmonized methodological approach of all DGT actors to terminology work as well as to the feeding (and weeding) of the IATE interinstitutional termbank.[13] To this end, it also represents DGT in interinstitutional cooperation bodies on IATE and terminology matters, and acts as the central contact point for external bodies and organizations in the field of terminology. Translators, too, can propose new terms to be introduced to IATE or thematic termbases, but their main role is to ensure terminological adequacy and consistency in the documents they translate.
[13] A termbank is a large-scale collection of electronic records containing information about terms and concepts they represent, usually developed by an institution and maintained by dedicated terminologists to serve as a resource for in-house translators or translation departments. A termbase is a smaller and more personalized resource produced by an individual translator usually on an ad-hoc basis and for the needs of a particular translation project (Bowker 2015).
Term identification
Terminology work starts with term identification. Texts to be translated usually contain terminology from previous legal acts; they may also contain legal terms defined for the purpose of the text under translation. Terminologists are not informed in advance of new or essential terminology, so terms extracted from texts under translation need to be compared with the existing terminology in IATE to identify what is new. If possible, such extraction is done before translation, usually from the first draft of the source text, as part of the TC Unit’s pro-active projects. These extractions are done semi-automatically, because tools for extracting terms are still quite unreliable, either producing too much noise or missing rare terms. Candidate terms extracted by an automatic term extractor always need to be further processed and validated by a terminologist.
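The comparison step described above, checking extracted candidates against terminology that is already recorded, can be sketched in a few lines. This is a hedged illustration only: the function names, the normalization rule (lower-casing and collapsing whitespace) and the sample terms are assumptions, and the known terms are passed in as a plain list rather than queried from IATE.

```python
def normalise(term):
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(term.lower().split())


def split_candidates(candidates, known_terms):
    """Separate extracted candidate terms into new and already recorded ones.

    Duplicate candidates (after normalization) are reported only once.
    The 'new' list still needs validation by a terminologist.
    """
    known = {normalise(t) for t in known_terms}
    new, existing, seen = [], [], set()
    for candidate in candidates:
        key = normalise(candidate)
        if key in seen:
            continue
        seen.add(key)
        (existing if key in known else new).append(candidate)
    return new, existing


# Illustrative input: output of a hypothetical term extractor.
candidates = [
    "Implementing act",
    "delegated  act",
    "carbon border adjustment",
    "implementing act",
]
known_terms = ["implementing act", "delegated act"]

new, existing = split_candidates(candidates, known_terms)
print("new:", new)
print("already recorded:", existing)
```

Exact matching after light normalization keeps the sketch simple; it would not catch inflected or reordered variants, which is one reason the human validation step remains necessary.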
Term storage
Terms harvested by means of term extraction are entered into IATE and department terminologists have the possibility to update their respective languages. Again, this may be done before the actual translation starts, time permitting, but most of the time terminology work and translation run in parallel. IATE is considered the central hub for reliable, lasting terminology, and is available to in-house translators of all EU institutions as well as to external translators, drafters and the general public.[14] However, it has some serious drawbacks. Firstly, it is not a tool for working on terminology. The actual terminology work is done outside IATE and only the results of this work are stored in IATE. The terminology work is done locally, often using Excel tables, which is a convenient format that enables easy editing and can effortlessly be converted into a termbase readable by DGT’s CAT tool. Many language departments also use wiki pages to collaborate on terminology. For example, the Hungarian Language Department has developed a tool to handle terminology helpdesk requests from translators. Translators are notified of any changes concerning their request via an alert mechanism, which sends updates by e-mail, once subscribed to. Previous requests are handled and stored within the tool, and it is possible to search them using various criteria and categories, as well as to export the results into a termbase.
[14] See Zorrilla-Agut and Fontenelle (2019) for a detailed description of IATE architecture, data structure and functionalities.
Term enforcement
Once correct terms are found and stored in IATE, it is necessary to make sure that they are used in translation. To this end, the content of IATE is extracted and made available as a termbase to be used in the CAT tool. One drawback of this solution is timing. Extractions are done only periodically, usually every one or two months, which means that all changes made in IATE in the meantime will not be reflected in such a termbase.[15] Another drawback of IATE is its size. For many language pairs, extracting the full content of IATE into a termbase results in a file so big that it cannot be reliably handled by the CAT tool. That is why terms in IATE are also organized in thematic collections, for example “Aviation” or “Energy Codex”, which can be extracted independently of other areas.[16] Translators can also compare terms in a source document with the terminology stored in IATE and retrieve a termbase containing the relevant IATE entries (thanks to IATE’s Term Recognition Module, TRM). A full IATE extract, where possible, or extracts of chosen IATE collections, or termbases retrieved via the TRM can then be added to the project to facilitate the use of correct terminology during translation: in the CAT tool, terms or phrasemes in the active segment that are present in the termbases are marked in red (based on automatic term recognition), and displayed in the term recognition window, so that the translator is alerted and can either type the suggested term or insert it into the translation segment via the Autosuggest function.
[15] The IATE live plugin, which enables real-time queries of the IATE termbank and is intended for use with DGT’s CAT tool, is currently being tested.
[16] Terms can also be extracted from IATE based on other criteria, like domain or reliability. However, collections are usually more precise and offer more targeted hits.
IATE termbases have a limited functionality, as they are designed to be used in read-only mode, that is, translators cannot add new terms to them. That is why on many occasions it is recommended to add another, project-specific termbase to the translation project, in order to collect and share project-specific terms and phrasemes, especially for handling packages and lengthy documents, which are often translated by more than one translator. Such termbases can then also be used for the purpose of terminology verification, to check the accuracy and consistency of terms in a given text. Additionally, they may be used to import new terms to IATE, thus preserving the translator’s work and saving them time in the future if the same terms appear again in another text.
3.2Tools for ensuring terminological consistency
Having correct, project-specific terminology available during translation is only the halfway point to achieving terminological quality. When finished, the translation needs to be verified for the consistency of terminological solutions it contains. This is where human translators may sometimes fail. The processing capacity of a human brain is limited, so that when cognitive resources are allocated to a particular task (e.g. solving a translation problem), attention to other tasks (like spelling) may be lowered (Muñoz Martín 2009). The same phenomenon may account for terminology mistakes: when a translator focuses on understanding a long and complex sentence, fewer cognitive resources may be available for ensuring consistency with previously used terms. Another reason for mistakes may be reading a text on screen and under time pressure, which decreases the ability to focus on the task at hand and reduces text comprehension (Delgado and Salmerón 2021). Both circumstances constitute the usual working conditions for EU translators.
What is more, the very translation tools that are used to increase the efficiency and/or consistency of translation may be a source of errors. As Okoniewski (2019, 63) puts it, “[t]ranslating with a CAT tool is a great help to the translator, but it is also a source of new types of mistakes”. An internal analysis of errors from corrigenda requested and processed by DGT in 2017 in Bulgarian, Czech, German and Polish highlighted, among others, the fact that, while IT tools had the potential to reduce many errors, concerning in particular terminology, their inappropriate use could also facilitate mistakes (Directorate-General for Translation 2019). This is because today the basic translation unit is no longer the text, but a segment (usually a sentence). At the segment level, a minor change, e.g. a change from intralaboratory to interlaboratory, may have negligible impact on the match, but nevertheless it changes the meaning of the translation completely. Such differences are very difficult to detect by a human eye. The introduction of machine translation into the translation process, particularly neural machine translation, has only exacerbated this problem.
To reduce some basic and most common mistakes, many CAT tools have built-in quality assurance modules. They can help detect e.g. spelling mistakes, mistakes in numbers, in the names of months or days. Other types of mistakes can be dealt with by adding appropriate rules to the QA module of the CAT tool, using regular expressions (Kotwicki 2018). Regular expressions are advanced text search patterns that enable performing search or find and replace actions in programs which support them; they constitute a rule together with a condition for triggering a message and a description (Okoniewski 2019, 65–66). The rules can be used simply to alert translators if they make spelling errors undetectable by a spellchecker (like daft regulation) or if they use deprecated terms or expressions prohibited by the in-house style guides, but a skillfully written rule can also help detect some types of terminology mistakes, for example in names of legal instruments (directive instead of regulation) or names of countries (Russia instead of China).
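The structure of such a rule (a search pattern, a triggering condition and a description) can be sketched in Python. This is an illustration only: the class, the function and the two sample rules are hypothetical and do not reproduce the rule format of any particular CAT tool or QA module.

```python
import re
from dataclasses import dataclass


@dataclass
class QARule:
    description: str      # message shown to the translator
    source_pattern: str   # condition: regex that must match the source segment
    target_pattern: str   # regex searched for in the target segment
    expected: bool        # True: target must match; False: target must not match


def check_segment(rules, source, target):
    """Return the descriptions of all rules violated by a source/target pair."""
    warnings = []
    for rule in rules:
        if not re.search(rule.source_pattern, source, re.IGNORECASE):
            continue  # triggering condition not met, rule does not apply
        found = bool(re.search(rule.target_pattern, target, re.IGNORECASE))
        if found != rule.expected:
            warnings.append(rule.description)
    return warnings


RULES = [
    # Typo that a spellchecker accepts (rule for English target text);
    # the empty source pattern means the rule applies to every segment.
    QARule("'daft' instead of 'draft'?", r"", r"\bdaft\s+regulation\b", False),
    # Name of a legal instrument (illustrative EN-DE rule).
    QARule("'Directive' should be 'Richtlinie'", r"\bdirective\b", r"\bRichtlinie", True),
]

print(check_segment(RULES, "This Directive shall apply.", "Diese Verordnung gilt."))
```

Keeping the description, the condition and the pattern together in one record mirrors the three components of a rule named above and makes it easy to add or disable individual checks.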
One difficulty in using regular expressions for detecting terminology errors is accounting for the context. Computers are good at finding errors, providing that the context is unambiguous, which is not always the case (Okoniewski 2019, 64). For example, in the case of EU legal acts the word regulation should be translated into German as Verordnung, but in the case of ECE (Economic Commission for Europe) regulations, it needs to be rendered as Regelung. To account for this difference, the rule needs to be very complex, which often leads to increased ‘noise’, i.e. signaling errors where there are none.
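A simplified Python sketch of such a context-sensitive check is given below. It assumes, purely for illustration, that ECE regulations can be recognized by an "ECE" or "UNECE" prefix directly before the word; real texts refer to them in more varied ways, which is exactly why such rules grow complex and noisy. All names in the sketch are hypothetical.

```python
import re

# Assumption for illustration: an ECE regulation is introduced by "ECE" or "UNECE".
ECE_REG = re.compile(r"\b(?:UN)?ECE\s+[Rr]egulations?\b")
ANY_REG = re.compile(r"\b[Rr]egulations?\b")


def missing_german_terms(source, target):
    """Return the German equivalents of 'regulation' expected but not found."""
    ece_count = len(ECE_REG.findall(source))
    eu_count = len(ANY_REG.findall(source)) - ece_count
    expected = []
    if ece_count:
        expected.append("Regelung")
    if eu_count:
        expected.append("Verordnung")
    # Substring search on case-folded text also covers plurals and compounds.
    folded = target.casefold()
    return [term for term in expected if term.casefold() not in folded]


source = "This Regulation applies to vehicles approved under ECE Regulation No 13."
wrong = "Diese Verordnung gilt für Fahrzeuge, die nach der ECE-Verordnung Nr. 13 genehmigt wurden."
print(missing_german_terms(source, wrong))
```

The sketch only checks that each expected equivalent occurs somewhere in the target segment; it cannot tell which occurrence corresponds to which source word, and any lower-case use of regulation as a general noun would be treated as a reference to an EU legal act.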
Another problem lies in the ambiguity of language. It seems straightforward to write a rule that notifies the German translator when the month May was not translated as Mai, until one remembers that May can also be a verb (May be sold, May only be fished in Skagerrak) or a proper name (Mrs May said…). Constraining the rule so that it only works when a number precedes or follows the word May will not produce false positives, but will also not detect errors in strings like From May to April (false negatives). And a rule which does not detect errors is not optimal. Hence, writing rules based on regular expressions requires a balance between the number of errors the rule can detect, number of errors it might omit (false negatives) and number of error messages where no error has been made (false positives) (Okoniewski 2019, 67).
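The trade-off can be made concrete with a small Python sketch that applies a naive and a constrained rule to four hand-made segment pairs based on the examples above. The German target sentences and the two patterns are illustrative assumptions, not rules taken from an actual QA profile.

```python
import re

NAIVE = re.compile(r"\bMay\b")
# Constrained: "May" counts only when a number precedes or follows it.
CONSTRAINED = re.compile(r"\b\d{1,2}\s+May\b|\bMay\s+\d{1,4}\b")


def flags(pattern, source, target):
    """Warn when the source matches the pattern and the target lacks 'Mai'."""
    return bool(pattern.search(source)) and "Mai" not in target


# (source, target, is_real_error): all targets deliberately lack "Mai".
SAMPLES = [
    ("It applies from 1 May 2020.", "Es gilt ab dem 1. März 2020.", True),
    ("May only be fished in Skagerrak.", "Darf nur im Skagerrak gefischt werden.", False),
    ("Mrs May said so.", "Frau May sagte dies.", False),
    ("From May to April.", "Von März bis April.", True),
]


def evaluate(pattern):
    """Return (true positives, false positives, false negatives)."""
    tp = fp = fn = 0
    for source, target, is_error in SAMPLES:
        flagged = flags(pattern, source, target)
        if flagged and is_error:
            tp += 1
        elif flagged:
            fp += 1
        elif is_error:
            fn += 1
    return tp, fp, fn


print("naive:", evaluate(NAIVE))
print("constrained:", evaluate(CONSTRAINED))
```

On these samples the naive rule finds both real errors but also raises two false warnings, while the constrained rule raises no false warning but misses the error in From May to April.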
Verifying terminology via regular expression rules written manually for each and every translation is, of course, not feasible. Such a verification needs to target only terms present in the translation and is usually based on a project-specific termbase. This is the concept behind dedicated modules for terminology verification available in most QA tools, both those which are part of a CAT tool and the independent QA tools supporting terminology checks. The principle is easy: the tool is instructed to look for terms from an attached bilingual termbase in the source language text and to notify the user whenever the target language term from the database is absent in the target language text. However, such terminology verifiers have a number of limitations. Above all they generate an excessive number of false positives, usually because of differing grammatical rules of the source and target language (van der Meer 2019, 295). Therefore, DGT has supported the development of its own tool, which automatically converts any terminology list to regular expressions and exports it as a QA Checker profile. Although this idea may seem to be simple, it proves to be very powerful. 
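The general idea of converting a term list into regular expressions can be sketched in Python as follows. This is not the implementation of DGT's tool, whose internals are not described here; the function names, the tolerance of up to three extra letters per word and the sample term pairs are assumptions made for illustration.

```python
import re


def term_to_regex(term, max_suffix=3):
    """Build a regex for a term that tolerates short endings on each word."""
    words = [re.escape(word) + rf"\w{{0,{max_suffix}}}" for word in term.split()]
    return re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)


def verify(term_pairs, source, target):
    """Return the pairs whose source term occurs but whose target term does not."""
    problems = []
    for source_term, target_term in term_pairs:
        if (term_to_regex(source_term).search(source)
                and not term_to_regex(target_term).search(target)):
            problems.append((source_term, target_term))
    return problems


# Illustrative EN-DE term list (hypothetical, not an IATE export).
TERMS = [("light source", "Lichtquelle"), ("control gear", "Betriebsgerät")]

source = "Light sources and separate control gears shall be tested."
target = "Lichtquellen und separate Vorschaltgeräte werden geprüft."
print(verify(TERMS, source, target))
```

Allowing a few extra word characters after each word is a crude way of coping with inflection: it accepts Lichtquellen for Lichtquelle without a warning, but it would also accept unrelated longer words, so the tolerance has to be tuned per language.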
Compared to the terminology verifier built into DGT’s CAT tool (which operates based on a termbase), the checks performed by DGT’s tool produce better results in less time: the tool works faster, gives considerably fewer false positives and detects more errors.[17] The tool can also extract legal definitions from published legal acts, creating a bilingual list, which can also be converted into a termbase and added to a translation project, and it can compare term lists from different sources against terminology stored in IATE. In this way, DGT’s terminology verifier also supports collecting relevant terminology before translation and enforcing it at the end of the translation process.
[17] An internal test on the consistency of legally defined terms from Commission Regulation (EU) 2019/2020 laying down ecodesign requirements for light sources and separate control gears in translations from EN into six EU languages from different language families (DA, DE, FR, PL, PT, SK) revealed that DGT’s terminology verifier produced in total only 31 warnings, correctly identifying 30 errors and omitting no errors, in one tenth of the time taken by the CAT tool’s terminology verifier, which produced 129 warnings, identifying only 16 errors, omitting 14 errors and giving 112 false warnings.
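The figures reported for the internal test can be restated as precision (the share of warnings that point to a real error) and recall (the share of real errors that were found). The short Python sketch below performs this arithmetic; the two measures are standard, but applying them here is an addition for illustration, and the reported totals are taken at face value.

```python
def precision(errors_found, warnings):
    """Share of warnings that correspond to a real error."""
    return errors_found / warnings


def recall(errors_found, errors_missed):
    """Share of real errors that the tool detected."""
    return errors_found / (errors_found + errors_missed)


# Figures from the internal test: (warnings, errors found, errors missed).
RESULTS = {
    "DGT terminology verifier": (31, 30, 0),
    "CAT tool terminology verifier": (129, 16, 14),
}

for tool, (warnings, found, missed) in RESULTS.items():
    print(f"{tool}: precision {precision(found, warnings):.1%}, "
          f"recall {recall(found, missed):.1%}")
```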
The tool still has limitations. As it is based on regular expressions, and the rules are created automatically, not manually, the context cannot be accounted for properly. Therefore, it will not give reliable results for single-word terms, especially short ones, which have homonyms that are translated differently. However, this is a common issue for all terminology verification tools. Another problem is catering for embedded terms, for example when the terms freezer and ice-cream freezer occur in the same text, but freezer as a single term is translated differently into the target language than freezer in the embedded term. Such terms can be taken care of by forbidding embedded term matching in the settings: if the term list contains both terms, freezer and ice-cream freezer, the checks for freezer will ignore all instances of ice-cream freezer.
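One way of forbidding embedded term matching is to look for longer terms first and to mask their occurrences before shorter terms are checked. The Python sketch below illustrates this idea under that assumption; it is not a description of how the setting is implemented in DGT's tool, and the function name is hypothetical.

```python
import re


def count_terms(text, terms):
    """Count each term, ignoring occurrences embedded in a longer listed term."""
    counts = {}
    masked = text
    # Longest terms first, so that their spans are masked before shorter checks.
    for term in sorted(terms, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        counts[term] = len(pattern.findall(masked))
        # Replace each match with placeholder characters of the same length.
        masked = pattern.sub(lambda m: "\x00" * len(m.group()), masked)
    return counts


text = "An ice-cream freezer is not a freezer for household use."
print(count_terms(text, ["freezer", "ice-cream freezer"]))
```

With masking, the single term freezer is counted once in the sample sentence rather than twice, so a check on its target equivalent is applied only to the standalone occurrence.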
Last but not least, differing linguistic structures pose a difficulty for regular expressions. Consider the following example:
Where the cargo area of a complete or completed vehicle of category N or O is modified.
Lorsque la zone de chargement d’un véhicule complet ou complété de catégorie N ou O est modifiée.
Here, a simple rule would produce a false positive message: it would find completed vehicle in the source, but would be unable to find véhicule complété in the target, because of the extra words between véhicule and complété. Thanks to customized settings, DGT’s terminology verifier can also resolve such issues.
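The text does not specify which settings are involved. One conceivable approach, sketched below in Python as an assumption rather than as the tool's actual method, is to allow a limited number of intervening words between the components of a multi-word term.

```python
import re


def strict_regex(term):
    """Components of the term must follow each other directly."""
    words = [re.escape(word) for word in term.split()]
    return re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)


def gap_regex(term, max_gap=3):
    """Allow up to max_gap extra words between the components of the term."""
    words = [re.escape(word) for word in term.split()]
    gap = rf"(?:\W+\w+){{0,{max_gap}}}?\W+"
    return re.compile(r"\b" + gap.join(words) + r"\b", re.IGNORECASE)


target = ("Lorsque la zone de chargement d’un véhicule complet ou complété "
          "de catégorie N ou O est modifiée.")

print(bool(strict_regex("véhicule complété").search(target)))  # strict rule fails
print(bool(gap_regex("véhicule complété").search(target)))     # gap-tolerant rule matches
```

The permitted gap has to stay small: the wider it is, the more likely the rule is to join words that belong to different phrases, which trades false positives for false negatives.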
Despite many limitations, QA tools are irreplaceable, no matter how experienced the translator may be, because some checks are simply impossible to do manually in a reasonable time (van der Meer 2019, 295). They are not meant to replace manual quality control measures, but they can free translators from some tedious and repetitive tasks and at the same time enhance their abilities, stepping in where humans tend to fail.
4.Terminology and multilingual concordance
In spite of the numerous safeguards in place, inconsistencies and errors in translations, as well as discrepancies between language versions, cannot be fully avoided. No quality assurance system is infallible, and no workflow or tool can guarantee error-free multilingual text production (see e.g. Drugan 2013). Such errors are a potential threat to the predictability and foreseeability of EU law and need to be rectified. Requests to correct published EU legal acts are dealt with by the EU institution that adopted a given act. Corrections to the acts of the European Parliament and the Council are handled by lawyer-linguists (in the legal services of these institutions); errors in the language version that served as a source text for the translated language versions of autonomous acts of the European Commission are handled by the authoring Directorate-General of the Commission; and errors in the translated language versions are corrected by the Directorate-General for Translation under what is known as empowerment SEC(2008)2397 (European Commission 2008). It needs to be stressed that under this empowerment not all errors qualify for a corrigendum, but only clear and obvious translation errors that do not affect the substance of the adopted act and which are identifiable beyond doubt in comparison with the source text. If this is not the case, the errors need to be corrected via a correcting act, which is a procedure corresponding to the adoption of the initial act.
Terminology errors are the second most frequent type of errors corrected by DGT via a corrigendum.[18] In 2020, terminology errors accounted for 22% of all the errors, in 2021 – 17%, and in 2022 – 24%. An increasing focus on terminology in corrigenda was observed by Biel and Pytel (2020), who analyzed corrigenda to Polish language versions of EU legal acts in two periods: 2004–2006 and 2015–2017. They detected the following trends in corrections of terminology errors: standardization of EU institutional terminology, stabilization of equivalents, domestication with terms of national law, replacement of an equivalent which triggers an inadequate concept, domestication of term-embedding collocations and elimination of intra- and intertextual variants, with the latter being the most common cause of corrigenda in the second analyzed period (Biel and Pytel 2020, 161–162). An increase in the number of corrigenda and corrections in EU legal acts was also observed by Prieto Ramos (2020c), who investigated corrigenda to French and Spanish language versions from 2005, 2010 and 2015, with incorrect terminology being again the second most frequent error category after mistranslations.
[18] Until recently, errors in corrigenda have been categorized into one of the following: mistranslation, terminology, omission, excess, references, clarity, grammar, spelling and punctuation. The last three types of errors could only be corrected in conjunction with corrections of the other types in the same text passage. As of January 2022 the error typology looks as follows: mistranslations, terminology, linguistic norm, general style, job-specific style and design.
This increase may be explained in part by closer cooperation between DGT language departments and the national authorities, and in part by encouraging feedback mechanisms in DGT and the development of tools enabling easier checks on terminological consistency.[19]
[19] In 2020, 42% of correction requests came from Member State authorities and 32% were triggered by DGT. Feedback from other EU institutions accounted for 8% of accepted corrigendum requests, from author DGs – 7%, Publications Office – 4%, Commission’s Legal Service – 3%, stakeholders, private companies or citizens – 4%. In 2021, 34% of correction requests came from DGT and 30% were initiated at the request of Member States. Feedback from other EU institutions accounted for 7% of corrigenda, from author DGs – 9%, Publications Office – 9%, Commission’s Legal Service – 5%, stakeholders, private companies or citizens – 5%, Secretariat General – 1%.
A terminology error occurs when the appropriate terminology has not been used in translation. In other words, to qualify as a terminology error, the error needs to concern a term which is translated with a term or lexeme other than the one expected within the domain or otherwise specified. In particular, terminology errors are considered to include: a failure to use appropriate domain-specific terminology, e.g. EU terminology available in IATE, a failure to adhere to the defined terms from the underlying or related legal acts, or the terminology of reference documents, and an inconsistent use of terminology within the text. For instance, the translation of annual energy consumption into Italian as consumo energetico annuo qualifies as a terminology error, because this term is defined in a basic legal act, where it is translated as consumo annuo di energia. In cases where the error concerns a term but where there is also a semantic distortion (different meaning), e.g. when household washing machine is translated into French as appareil de réfrigération, this should be categorized as a mistranslation.
Not all errors found in translations of EU legal acts are translators’ errors.[20] Some of them stem from errors present already in the source text. Other errors result from the ambiguity in the original text. For example, consider the sentence A preparation of sorbitan monolaurate containing > 95% of a mixture of sorbitol, sorbitan, and isosorbide esters, esterified with fatty acids derived from coconut oil, where it is not clear whether esters should refer to sorbitol, sorbitan and isosorbide or to isosorbide only. Yet other errors could have been avoided only thanks to a very thorough domain competence, going beyond the knowledge of terminology. For instance, Commission Regulation (EU) 2016/631 contains the phrase activating the provision of active power frequency response, occurring twice in that Regulation, which in the Polish language version was consistently translated in both locations as aktywowanie rezerwy mocy czynnej w odpowiedzi na wzrost częstotliwości (back translation: activating the reserve of active power in response to frequency increase). In 2019, however, a request for a corrigendum was received to change the word wzrost (increase) to spadek (decrease) in one of the two locations. An expert was consulted and confirmed that there was indeed an error in this particular location and the equivalent should be spadek (decrease).
It was the context of this provision and a picture attached to it that made it clear to the expert that the frequency response in the English version had to be interpreted as a decrease, not an increase.
[20] For example, an internal analysis by the German Language Department of 2019 on the causes of errors in view of their avoidability revealed that only ca. 40% of errors could have been avoided by the use of appropriate QA tools or termbases, or following recommended workflows. By comparison, ca. 5% of the errors were caused by errors in the original, and ca. 45% could be attributed to insufficient subject domain knowledge, i.e. these were errors that could have been avoided only by means of a revision by an expert in the field.
Not all discrepancies between language versions are errors, either. In fact, no two texts in different languages can ever have the same meaning (Schilling 2010, 50). A telling example is the equivalent of wildlife in Regulation (EU) No 139/2014. In 2018, the Polish authorities asked DGT to correct the translation of wildlife in the Polish language version of this Regulation. The Polish authorities argued that the main purpose of the Regulation was to provide adequate requirements for the safety of aircraft operations in the event of the presence of any animal on the aerodrome and its surroundings. Historically, this issue applied only to birds, but in view of observed and recorded collisions it was decided by the legislator to extend this requirement to all other animals. It was argued that the Polish equivalent of wildlife, dzika zwierzyna (back translation: wild game), did not reflect the intended meaning of the English term and narrowed it to wild animals only. According to the Polish authorities, it was not important for the safety of aircraft, passengers or ground staff whether the animal was living in the wild or was kept, and the wording did not give aerodrome operators the possibility to act towards all animals, e.g. pigeons, dogs or cats. Moreover, the word zwierzyna was a poor choice, as it usually refers only to animals that can be hunted (game).
A corrigendum was drafted in line with the Polish authorities’ request, changing dzika zwierzyna to zwierzęta (animals). The French language version supported this change, as it used the word animaux. Because such a change would have been substantial, the Directorate-General for Mobility and Transport (DG MOVE), responsible for the original text, was consulted. It confirmed that the English term wildlife was correct and that it included birds, but did not cover domestic animals, such as cats or dogs. Additionally, DG MOVE referred to the origin of this provision, namely ICAO (International Civil Aviation Organization) Annex 14. However, in the French version of this Annex the term wildlife was translated as oiseaux/animaux, which raised concern as to the adequacy of the French translation of the term in Regulation (EU) No 139/2014. The subsequent analysis of other language versions revealed some other possible discrepancies: the German and Dutch versions have “wild animals” (Wildtiere, wilde dieren); the Italian and Portuguese versions feature “forest animals” (fauna selvatica, animais selvagens); the Czechs and Slovaks opted for “free living animals” (volně žijící živočich, voľne žijúce zviera); the Spanish version, like the French one, refers only to “animals” (fauna); and the Swedish version has “wild animals and birds” (vilt och fågel).
In the end, the Polish corrigendum was only partially accepted by the Legal Service with the argument that the majority of language versions of Regulation (EU) No 139/2014 did not use the term as broad as the French animaux. Only the change to dzikie zwierzęta (wild animals) was accepted to remove the unintended reference to wild game, present in dzika zwierzyna. The discrepancy between the English and the French version was deemed unproblematic, as it existed at the international level and had apparently not raised any issue.
Accurate and consistent terminology is of primary importance for the quality of translation and ultimately for the uniform interpretation of EU law and legal certainty. However, while terminological errors in EU legal acts are a serious concern, they are not necessarily a result of inadequate translation. By the same token, a corrigendum should not always be regarded as an indication of quality assurance failure. As Biel and Pytel (2020, 161) correctly observe, a corrigendum is above all “a measure of actions taken to correct errors rather than a measure of error incidence in itself”. Prieto Ramos (2020c, 129) even considers the growing number of corrigenda and corrections to EU legal texts as a sign of effectiveness of the system as a whole, which prevents more serious consequences at a later stage. On the other hand, Bobek points to the risks of using corrigenda, many of which he considers meaning-changing, as an “ex post catching up on translation work which should have been done at the drafting stage” (2009, 957); these risks include retroactive consequences to the acquired rights and legitimate expectations of those concerned. So even if the number of corrigenda is not worrying yet, taking into account the volume of texts translated yearly by DGT under growing time constraints and staff cuts, it should serve as a warning of what happens when time, not quality, becomes the most important variable in the translation process.
5.Conclusions
EU texts are “clearly LSP texts, but with added idiosyncrasies” (Sosoni 2012, 80). These idiosyncrasies stem from the fact that these texts do not belong to one specific language or culture, but reflect textual and legal traditions of all EU official languages. EU texts are produced not by one author, but collectively, through numerous drafts and versions, by means of multilingual negotiation and translation taking place in several institutions. This makes the translation of EU legislation a special sub-genre of legal translation, with its own unique challenges and implications (Biel 2007).
One of these challenges is translating EU terminology accurately and uniformly. In this regard, EU translators and terminologists face a constant tension between creativity and conformity, between ensuring coherence within the EU legal system as a whole and at the same time providing compatibility with their national terms and concepts (Šarčević 2015). Ultimately, however, the goal of EU translators is to produce a text that conforms to the inter-institutional drafting rules and templates, so that in the end all language versions can be considered equally authentic. Equivalence with the source text in terms of accuracy and consistency takes precedence over the readability and clarity of translation. In other words, “legal consistency cannot be sacrificed in the name of readability whenever this might affect legal certainty” (Prieto Ramos 2015, 20).
To achieve this goal, DGT has developed a translation-oriented terminology workflow that supports terminology management, enforcement and verification at all stages of the translation process. This workflow clearly defines the roles of all actors, enables cooperation between DGT and other law-making EU institutions on the one hand, and between language departments and national administrations on the other, and fosters creativity and bottom-up innovations, such as the development of an in-house tool for verifying terminological consistency.