Machine translation and legal terminology: Data-driven approaches to contextual accuracy
1. Introduction
This chapter discusses what can be expected from data-driven machine translation (MT), namely statistical machine translation (SMT) and neural machine translation (NMT), in the area of legal terminology, which is often considered one of the foremost difficulties faced by legal translators and a primary reason why legal translation itself is regarded as one of the most challenging areas of contemporary translation practice. SMT and NMT are both statistical systems that make extensive use of voluminous corpora and are credited with significantly improving MT output quality in the past couple of decades. Though SMT marked a paradigm shift from rule-based MT, which relies on linguistic information input, and NMT is currently being implemented to overcome SMT weaknesses, ambiguity remains a challenge for natural language processing with computers (e.g. Arnold 2003; Bar-Hillel 1960; Forcada 2010; Killman 2015; Koehn 2010; Koehn 2020).
Legal terminology, for its part, is subject to a variety of textual and extratextual factors or constraints which this chapter regards as context. On the one hand, context can be seen as having a bearing on how a term or phraseme should be understood in a particular situation when it is possible that the item may be interpreted differently in another situation or set of circumstances. In this regard, legal terminology may be particularly prone to different forms of ambiguity (e.g. Alcaraz Varó and Hughes 2002; Chromá 2011; Duro Moreno 2012; Glanert 2014; Killman 2014; Killman 2017; Prieto Ramos 2014; Simonnæs 2016). Linguistic concept designations may have more than one meaning depending on aspects of context such as legal and non-legal meanings, while phraseological or other lexical combinations may contain ambiguous words or need to be interpreted as a whole in order to be rendered adequately across languages.
On the other hand, context may prioritize how certain translation renditions should be drafted when the meanings or functions they convey may be written in a variable way depending on the situation or circumstances. Translators might tailor terminological and phraseological translation solutions according to specific legal traditions, systems, genres, stylistic expectations, among others. For example, legal terminology is often system-bound and cannot be translated straightforwardly into another language with a different legal system, resulting in translators producing different translation solutions according to specific contextual parameters on a case-by-case basis.
While these contextual constraints (in terms of how legal terminology should be interpreted and how translations of terms and phrasemes should be worded) may very well pose significant challenges for MT, corpus-based approaches have significantly made MT’s contextual Achilles’ heel less vulnerable and mark its most significant gains in accuracy. Such gains stem from how systems analyse source text (ST) and draw on corpora to provide translation renditions, as well as from the degree of relatedness of the sources of corpora themselves. It remains to be seen, however, if or to what extent NMT advances over SMT can be specifically attributed to terminological accuracy, especially in a discourse domain as specialized as the law and according to human evaluations and not automatic metrics (e.g. Castilho et al. 2017a; Toral et al. 2018).
This chapter discusses how the features of these data-driven approaches to MT may or may not ensure legal terminology output that is semantically and lexically suitable depending on various textual or extratextual circumstances. Section 2 discusses the various sources of legal translation challenge at the terminological and phraseological levels with an eye to different areas of contextual constraint. Section 3 reviews basic SMT and NMT architectures and contextual concerns, while Section 4 reviews studies on MT and legal texts and the extent to which and how legal translation, terminology, and phraseology have been addressed. Section 5 provides some conclusions and possible future avenues.
2. Translation of legal terminology
The translation of legal terminology is considered an area of specific challenge for legal translators when it comes to establishing equivalents, understanding ST, and drafting target text (TT).
Target language (TL) equivalents
In the case of translation across different legal systems, references abound emphasizing the inter-systemic conceptual incongruity occurring at the terminological level as a primary source of translation challenge (e.g. Alcaraz Varó and Hughes 2002, 47; Biel 2017; Borja Albi 2005; Cao 2007; Chromá 2011; Chromá 2014; Duro Moreno 2012; Harvey 2002; Orts Llopis 2007; Matulewska 2013; Šarčević 1997; Way 2016). The difficulty of translating system-bound terms without stable TL equivalents may lead to the tailoring of translation solutions on a case-by-case basis depending on factors including genre and the prescriptive or descriptive nature of the ST and TT. These types of terms may include any variety of legal concepts, procedures, names of specific laws, institutions, legal professions, instruments, among others. In the US context, for example, larceny is more encompassing than hurto, to which it is often equivalent in Spanish. In cases where larceny may, however, refer to forced entry into a building or vehicle, its equivalent in Spain would be robo instead of hurto, since the key distinction between these two terms is whether the theft occurs with or without force (con o sin fuerza en las cosas) (Bestué and Orozco Jutorán 2010).
ST comprehension
Legal terminology can also involve considerable lexical ambiguity (e.g. Alcaraz Varó and Hughes 2002; Chromá 2011; Duro Moreno 2012; Glanert 2014; Killman 2014; Killman 2017; Prieto Ramos 2014; Simonnæs 2016). The meaning of lexical units may vary. For example, so-called “semi-technical” terms are an example of this type of legal terminology in that they “have one meaning (or more than one) in the everyday world and another in the field of law” (Alcaraz Varó and Hughes 2002, 158–159). Furthermore, legal terms may have more than one legal meaning depending on the context. Complaint is a good example of an ambiguous term. Its everyday meaning, of course, is an expression that something is wrong or causes dissatisfaction, and the term has not one but two legal meanings: (1) the initial document or pleading filed by a plaintiff against a defendant in a lawsuit or (2) a document submitted by a lead investigator in an alleged commission of a federal crime to establish probable cause. This criminal complaint, like an information (a related semi-technical term denoting a document in which a federal prosecutor may file criminal charges), may be filed in offenses where it is allowable and doing so would be speedier than obtaining an indictment from a grand jury.
While lexical ambiguity is more prevalent in the case of single-word terms, it can also involve multiword semi-technical terms and phraseology. Spanish examples include causas de (grounds for), acción infundada (unmeritorious proceedings), or acción ejecutiva (enforcement proceedings), which in non-specialized contexts could easily be translated as “causes of,” “unfounded action,” and “executive action,” respectively (Killman 2017, 866–867). Ambiguity can also go beyond legal and non-legal variance, as “many multi-word terminological phrases have more than one legal meaning and their exact meaning in a particular context is sometimes quite hard to identify” (Chromá 2011, 37). Chromá (2011, 37) illustrates this point with legal remedy, which she argues must be translated differently according to whether the context is general legal texts or the law of equity. The frozen multiword structures of these terms may lead translators to believe these terms are specific to a particular legal context or likely do not present semantic variance in another legal context. Whatever the source of ambiguity may be, Alcaraz Varó and Hughes (2002, 17) assert that semi-technical terms “are more difficult to recognize and assimilate than wholly technical terms,” and Chromá (2011, 37) claims that “vocabulary acquiring its precise legal meaning in a particular legal context is the most difficult for a translator to tackle and transmit into the target text properly”. There is a risk that these terms are not interpreted according to their intended legal meanings.
Ambiguity can remain problematic even in cases where multiword terms and phraseology are not entirely polysemous but nonetheless contain words which are. Such terms or phrasemes may also need to be translated as single units (e.g. complex prepositions) or with very limited room for decomposition (e.g. collocations). De conformidad con (pursuant to), en lo referente a (concerning), and con arreglo a (under) are examples of complex prepositions in Spanish that are best translated as single units, while file/lodge/bring an appeal or enter into a contract are collocations with their own stock phraseology in Spanish (interponer un recurso/una apelación and celebrar/formalizar un contrato). Legal phraseology can be particularly formulaic and linked to specific co-text patterns, as well as different aspects of extratextual context that translators must be aware of (Kjaer 1990; Vanallemeersch and Kockaert 2010).
TT drafting
In terms of TT drafting, the same lexical unit conveying the same meaning but in different contexts may also have to be translated in an accordingly variable way. Alcaraz Varó (2009, 192) views this type of phenomenon as stemming from linguistic anisomorphism in the translation of legal texts, which he stresses “cannot, at any rate, be reduced to a simple question of polysemy or of false friends; it is more complex than that”. Linguistic anisomorphism can be understood as asymmetry “based on the fact that languages are not objective correlates of the real world and each one structures and divides reality in a different way” (Franco Aixelá 2022). One of the examples Alcaraz Varó (2009, 186–187) provides is responsable, a technical term in Spanish that may be translated to English as “answerable,” “accountable,” “liable,” or “responsible”. The first two terms are the closest synonyms, whereas “liable” is most often legal, and “responsible,” mostly moral in nature. The Spanish term ajenidad presents similar challenges, in that adequate translations of ajenidad may range from “(paid) employment” and “work as an employee/employed person” to “individual/person working as an employee/under the employ of another,” depending on the circumstances.
Also involving drafting challenges are so-called everyday terms, terms in general use with considerable frequency in legal texts (Alcaraz Varó and Hughes 2002, 18). These “terms are easier to understand than to translate, precisely because they tend to be contextually bound” (Alcaraz Varó and Hughes 2002, 162). For example, appear and appearance in everyday contexts may, respectively, be aparecer and aparición in Spanish, but in the context of court, they may be comparecer and comparecencia instead. A Spanish phraseological example is situarse en la misma línea, which may be translated, for example, as “follow the same course” or “be along the same lines,” but not as “situate/position in the same line,” a word-for-word rendition that would be unidiomatic or contextually out of place.
A final terminological category in this section, supranational or international legal terminology, may involve both source ambiguity and target drafting peculiarity. There is the possibility that concepts come from specific national legal systems and undergo semantic adaptation (Prieto Ramos 2014, 318). EU legal instruments may borrow terms and phrasemes from national institutions and systems (e.g. Glanert 2008; McAuliffe 2009; Prieto Ramos 2014; Šarčević 2018) or from international organizations such as the UN or WTO (e.g. Robertson 2011; Šarčević 2000). These interconnected contexts marked by increasing interplay between national, international, and supranational systems give rise to overlap, polysemy, and interpretation and/or translation challenges (McAuliffe 2009, 107; Prieto Ramos 2014, 318; Robertson 2011, 53). Moreover, national systems incorporate EU legal terminology (e.g. Biel 2007; Garrido Nombela 1996; Killman 2017; Pym 2000), which adds another layer of complication when translating national legal texts. Whatever the case may be, translators may often have to resort to official institutional TL designations or specific phraseological patterns that must be consistently adhered to in the TL instead of any other semantically plausible renditions or at least distinguish in which cases specific institutional terminological or phraseological wording is not required in the translation.
The legal terminological and phraseological challenges discussed in this section, as well as other such challenges, have motivated translation studies researchers to contemplate the importance of a series of “entornos” (dimensions) (Duro Moreno 2012), “dimensions” (Matulewska 2013), “parameters” (Prieto Ramos 2016), “context” (Alcaraz Varó and Hughes 2002, 37; Kjaer 1990; Vanallemeersch and Kockaert 2010), or “anisomorphism” (Alcaraz Varó 2009). These criteria tend to weigh the importance that textual or extratextual factors may have on informing translation decisions. In particular, the importance of co-text or surrounding text is emphasized to disambiguate or distinguish different senses or usage (Alcaraz Varó and Hughes 2002, 37; Duro Moreno 2012; Kjaer 1990; Vanallemeersch and Kockaert 2010). Another textual element is intertextuality or a text’s relationship with other texts (Duro Moreno 2012), which is an important consideration not only when grasping the semantics of source terminology but also when deciding on adequate ways of rendering translation equivalents according to relevant extratextual context such as a “communicative community” (Matulewska 2013).
Drawing on this complex array of contextual parameters has always been a considerable source of challenge for human translators of legal texts. The extent to which machines can do so will be limited by the relatedness of the textual resources they rely on and by the resourcefulness with which they process them.
3. Data-driven MT
SMT and NMT
By the end of the first period of MT development during the late 1950s, optimism was fading. Early research challenges underscored how systems at the time had serious contextual blind spots. The main issue, as pointed out by Yehoshua Bar-Hillel, was that the real-world knowledge of humans could never be replicated by artificial intelligence (Bar-Hillel 1960). The relevant contexts of words being translated were often missed. The systems at the time relied on limited data sets such as dictionaries or grammars, and their heuristic capabilities were constrained.
Fast forward to the 2000s and SMT (Koehn, Och, and Marcu 2003) enters the scene in earnest, marking a paradigm shift from rule-based MT to MT trained on large amounts of corpus data, which may comprise millions of sentences in one language that have been aligned with equivalent sentences in another language. SMT was the state of the art until not long ago (Forcada 2017, 292). For example, at the end of 2016, Google Translate transitioned to a neural system, after having operated as a phrase-based statistical system since 2007 when it replaced its previous rule-based system (provided by Systran). It is important to note that the parallel corpora a system draws on consist of past translation work completed by humans, rendering the system “essentially a tool for massive sophisticated plagiarism” (Bendana and Melby 2012, 45). A good deal of the bilingual data on which systems are trained comes from supranational or international organizations with an abundance of documentation concerning laws, justice, or legal matters, such as the European Union and the United Nations (e.g. Crego et al. 2016; Junczys-Dowmunt, Dwojak, and Hoang 2016; Koehn 2005; Koehn 2010, 53; Koehn and Knowles 2017).
Regardless of the MT system, it is important to be aware of where computers and human language might very well run into problems when it comes to natural language processing for translation purposes. Arnold (2003) divides problematic areas according to two stages: analysis and transfer. A particularly relevant area noted by Arnold (2003), Forcada (2010), and Koehn (2010, 2020) covers the analysis problem of ambiguity, which is frequently an issue in legal translation and especially challenging as described in the previous section. According to Koehn (2020, 5), ambiguity is the:
one word that encapsulates the challenge of natural language processing with computers […] Natural language is ambiguous on every level: word meaning, morphology, syntactic properties and roles, and relationships between different parts of a text. Humans are able to deal with this ambiguity somewhat by taking in the broader context and background knowledge, but even among humans there is a lot of misunderstanding.
While humans may be touted as superior by default, they too are prone to misunderstanding. Another relevant challenge, pointed out by Arnold (2003) and Forcada (2010), refers to the transfer problem when two languages do not structure meaning in the same ways. As Arnold (2003, 122) explains, when straightforward correspondence is undesirable, the transfer problem can give way to “translationese”. For the purposes of legal terms and phrases, this issue might refer to departure from authentic phrasing according to an area of law or legal genre or context area, contextually tailored translation solutions in cases of legal conceptual incongruence, and/or time-tested or “established” translation solutions (Molina and Hurtado 2002, 510), which often exist in certain legal domains (Biel 2008).
A mechanical approach to language is limited to programmable and computable processes that completely rely on textual resources (Forcada 2010, 216). Unlike humans, computers cannot draw on relevant extralinguistic context or on an actual understanding of text (Melby and Foster 2010, 11). While a computer is at an unfair disadvantage in this regard, data-driven MT quickly processes large amounts of potentially relevant data in increasingly sophisticated ways that cannot be replicated by humans. For these reasons, it is now more important than ever to clearly delimit the division of translation labour between humans and machines in a way that takes full advantage of the former’s greater ability to understand translation needs and the latter’s greater ability to quickly process large amounts of data.
In the case of both SMT and NMT, the basic premise is that “a target sentence is a translation of a source sentence with a certain probability of likelihood” (Forcada 2017, 300). Nevertheless, how each type of system determines this probability varies and may well affect terminological accuracy in as contextually peculiar a domain as the law, especially if corpora are not (entirely) homogeneous. SMT, for its part, constructs translations by linking translations of phrases that need not necessarily be categorizable as constituents in the syntax of a sentence (Kenny and Doherty 2014, 284; Forcada 2017, 301; Koehn 2010, 127). Both the source phrase and the target phrase are the result of chunking source content into multiword subsegments that are selected as the SMT system is trained on the parallel corpora. As Forcada explains (2017, 301), first the system aligns source words to the target words of these phrases depending on probabilities acquired from the bilingual corpus, then it identifies source and target phrases that are compatible with these individual word alignments and, in what is referred to as a translation table, it assigns scores to these phrase pairs. These phrase pair scores, in a process known as tuning (Kenny and Doherty 2014, 283), are combined with TL probabilities that are computed from very large corpora in the TL to select those phrases which are “best”. If the system “has the choice of using a longer phrase translation, it tends to use it. This is preferable because longer phrases include more context. Of course, longer phrases are less frequent and hence less statistically reliable” (Koehn 2010, 141). Thus, even if a relevant longer phrase translation exists, it will not be selected by the system if it is not statistically reliable enough.
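The selection logic just described can be sketched in a few lines of toy code. Everything below (the phrases, the probabilities, the two-candidate table) is invented purely for illustration and is not drawn from any real SMT system; the point is only to show how a phrase translation score and a target-language model score combine so that a legally trained language model can tip the balance toward the legal rendition:

```python
import math

# Hypothetical translation table: candidate renditions of a Spanish
# subsegment with phrase translation probabilities (invented values).
phrase_table = {
    "acción ejecutiva": [
        ("executive action", 0.60),        # frequent in general corpora
        ("enforcement proceedings", 0.40), # rarer, legal-specific
    ],
}

# Toy target-language model scores: how probable each rendition is in
# the (legal) target-language corpus the system was trained on.
lm_score = {
    "executive action": 0.05,
    "enforcement proceedings": 0.30,
}

def best_translation(source_phrase):
    """Pick the candidate maximizing log p(phrase) + log p(LM)."""
    candidates = phrase_table[source_phrase]
    return max(candidates,
               key=lambda c: math.log(c[1]) + math.log(lm_score[c[0]]))[0]

print(best_translation("acción ejecutiva"))  # → enforcement proceedings
```

With a general-domain language model instead, the scores in lm_score would favour “executive action,” which is precisely how heterogeneous training corpora can undermine legal terminological accuracy.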
What systems can do to mitigate this issue of statistical reliability is referred to as lexical weighting, which means decomposing a rare phrase pair into its word translations to check how well they coincide. For instance:
if an English word is aligned to multiple foreign words, the average of the corresponding word translation probabilities is taken. If an English word is not aligned to any foreign word, we say it is aligned to the NULL word, which is also factored in as a word translation probability. (Koehn 2010, 139)
Lexical weighting is basically a discounting method to smooth the phrase translation probability by relying on probability distributions supported by “richer statistics and hence more reliable probability estimates” (Koehn 2010, 139). It may, however, discount a statistically improbable yet desirable translation phrase as a bad phrase. On the whole, it is important to comprehend that in the case of SMT, the probability of a TL sentence is a calculation that is based on the joint probabilities of the phrase-pairs obtained (Forcada 2017, 301; Kenny and Doherty 2014, 281).
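The decomposition Koehn describes can be sketched as follows. All word-level probabilities here are invented stand-ins for what a system would estimate from frequent word alignments in its training corpus; the sketch only shows the mechanics of averaging aligned word translation probabilities (with a NULL token for unaligned target words) and multiplying the averages:

```python
# Hypothetical word translation probabilities p(target_word | source_word),
# of the kind estimated from frequent word alignments (invented values).
word_prob = {
    ("enforcement", "ejecutiva"): 0.15,
    ("proceedings", "acción"): 0.10,
    ("proceedings", "ejecutiva"): 0.02,
    ("the", "NULL"): 0.30,
}

def lexical_weight(alignment):
    """alignment: list of (target_word, [aligned source words])."""
    score = 1.0
    for tgt, srcs in alignment:
        srcs = srcs or ["NULL"]  # unaligned target words align to NULL
        avg = sum(word_prob.get((tgt, s), 0.0) for s in srcs) / len(srcs)
        score *= avg
    return score

# "enforcement" aligned to one source word; "proceedings" aligned to two,
# so its two word probabilities are averaged before multiplying.
w = lexical_weight([
    ("enforcement", ["ejecutiva"]),
    ("proceedings", ["acción", "ejecutiva"]),
])
print(round(w, 4))  # → 0.009
```

Because the score is built entirely from word-level statistics, a rare but contextually correct legal phrase pair whose words seldom align this way elsewhere ends up discounted, which is exactly the risk noted above.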
NMT, for its part:
uses a completely different computational approach: neural networks […] composed of thousands of artificial units that resemble neurons in that their output or activation […] depends on the stimuli they receive from other neurons and the strength of the connections along which these stimuli are passed. (Forcada 2017, 292)
The activations of individual neural units in most systems only make sense when joined with the activations of other neural units in layers. The hundreds of neural units that these layers often contain are connected by weights, which connect all the units of one layer with all those in the following layer (Forcada 2017, 295). These groups of neural units “build distributed representations of words and their contexts, both in the context of the source sentence being processed and in the context of the target sentence being produced” (Forcada 2017, 293–294). It is important to note as well that representations tend to be “deep”; they are built in stages or in layers of less profound representations, with each layer giving rise to thousands of connections (Forcada 2017, 295). These representations of knowledge are more multidimensional or deeper than in the case of SMT. Simply put, subsegments and their translations are not identified in a direct way, as the translation output “is produced word by word taking the whole source segment into account” (Forcada 2017, 301). The probability of the target sentence is computed by examining the probability of each target word, considering both the source sentence and the preceding words in the target sentence (Forcada 2017, 300–301).
While SMT builds translations of sentences in piecemeal fashion by training subsegments independently, NMT “attempts to build and train a single, large neural network” (Bahdanau, Cho, and Bengio 2015, 1), “whose connection weights are all jointly trained” (Forcada 2017, 301). The NMT systems that tend to be considered optimal combine encoder-decoder architectures (Sutskever, Vinyals, and Le 2014) with attention models (Bahdanau, Cho, and Bengio 2015). In this set-up:
the full context of the sentence [i.e.] all source words and their content […] are encoded in a single numerical representation […] which is sent to the decoder to generate a target-language string. […] Rather than accepting that all source words are equally important in suggesting target-language words, the attention model (similar to word and phrase alignments in SMT) demonstrates which source words are most relevant when it comes to hypothesizing target-language equivalents. In practice this means that each translation is generated from specific encoder states, with information which is much less relevant from other words–perhaps some distance away from the current word or focus and of little or no relevance to its translation–being ignored. (Way 2020, 317)
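The weighting of encoder states that Way describes can be sketched with a minimal dot-product attention mechanism. The two-dimensional vectors below are invented toy values, not learned representations; the sketch only shows how a softmax over similarity scores lets the most relevant source position dominate the context vector while distant, less relevant positions contribute little:

```python
import math

def softmax(xs):
    # numerically stable softmax: subtract the max before exponentiating
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(decoder_state, encoder_states):
    # similarity (dot product) of the decoder state to each encoded
    # source word, turned into weights that sum to 1
    scores = [sum(d * e for d, e in zip(decoder_state, enc))
              for enc in encoder_states]
    weights = softmax(scores)
    # context vector: weighted sum of the encoder states
    context = [sum(w * enc[i] for w, enc in zip(weights, encoder_states))
               for i in range(len(decoder_state))]
    return weights, context

# three encoded source words; the second is most similar to the
# decoder's current state and so receives the largest weight
enc = [[1.0, 0.0], [0.9, 0.9], [0.0, 1.0]]
weights, context = attend([1.0, 1.0], enc)
print([round(w, 2) for w in weights])  # → [0.24, 0.53, 0.24]
```

Note that even the least relevant source positions retain small non-zero weights; attention softly down-weights rather than discards them, which is one way the mechanism balances holistic and locally relevant co-text.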
On the basis of this NMT configuration, it appears a sort of tightrope is walked between a more holistic and an immediately relevant prioritization of co-text, the source of context on which systems rely. An SMT system, by only being able to prioritize more immediately surrounding co-text, risks faltering when further away co-text could be more relevant. While a state-of-the-art NMT system tries to balance the two, it risks swinging too far on either side of the pendulum. Given this dilemma, the question remains as to which approach might perform more consistently given the various contextual patterns and constraints surrounding legal terminology.
In non-legal domains, NMT output is generally found to be higher in quality, especially when it comes to fluency (e.g. Bentivogli et al. 2016; Bojar et al. 2016; Castilho et al. 2017b; Forcada 2017, 305; Moorkens 2018; Toral and Sánchez-Cartagena 2017). However, there are exceptions, especially in the case of adequacy or semantic accuracy (e.g. Castilho et al. 2017a; Koehn and Knowles 2017), an area where terminology, regardless of domain area, is indeed an important concern. Furthermore, NMT quality may suffer when faced with translating rare or infrequent words (Sennrich, Haddow, and Birch 2016; Wu et al. 2016), which are also a concern when it comes to domain-specific terminology, such as in the legal domain.
In any event, legal terms and phrasemes remain vulnerable in MT. As pointed out, legal terminology is especially prone to lexical ambiguity that may not only be difficult for humans to deal with, but especially challenging for natural language processing with computers relying only on written resources. Moreover, legal TL drafting challenges are such that terms and phrasemes may need to vary according to a variety of circumstances such as the legal area, genre, system, tradition, or stylistics. Such factors, in addition to the potential rarity of legal terminology, may indeed have a special effect on MT, perhaps more so than in other specialized translation domain areas, and are indeed worthy of study in their own right.
4. Research on machine translation of legal texts
Legal terminology
Though the present chapter focuses on data-driven MT, a study by Yates (2006) tested the accuracy of Babel Fish translating portions of civil codes from Mexico and Germany and press releases from the foreign ministries in these countries. Babel Fish was a direct rule-based MT system that was freely available on the Web and provided by Systran (like the rule-based Google MT system). While results were mostly considered poor in this study, the German results were less poor than the Spanish ones. Below one can appreciate an output example from Yates (2006, 495), which is a translation of Article 2226 of the Mexican Civil Code (Código Civil Federal: Nuevo Código publicado en el Diario Oficial de la Federación en cuatro partes los días 26 de mayo, 14 de julio, 3 y 31 de agosto de 1928; https://www.diputados.gob.mx/LeyesBiblio/pdf/2_110121.pdf) and is also accompanied by a professional translation of the same sentence, which she used as a gold standard or reference translation:
La nulidad absoluta por regla general no impide que el acto produzca provisionalmente sus efectos, los cuales serán destruidos retroactivamente cuando se pronuncie por el juez la nulidad.
The absolute invalidity as a rule does not prevent that the act produces provisionally its effects, which will be destroyed retroactively when the invalidity is pronounced by the judge.
Absolute nullity, as a general rule, does not prevent an act from having provisional consequences, which can be retroactively abolished upon an adjudication of nullity by a judge. (Código Civil Federal: Nuevo Código publicado en el Diario Oficial de la Federación en cuatro partes los días 26 de mayo, 14 de julio, 3 y 31 de agosto de 1928, trans. Abraham Eckstein and Enrique Zepeda Trujillo. St. Paul, Minnesota: West Pub. Co., 1996.)
This is one of the Spanish outputs that was translated best despite its various errors (Yates 2006, 495). As we can see, phraseology is particularly problematic, as with producir provisionalmente efectos and pronunciar la nulidad, which were unsuccessfully handled by this rule-based system. The first would be an example of an everyday phrase which should be rendered in a contextually unique way, such as “have provisional consequences,” and the second, an ambiguous technical phrase that should be translated appropriately as “adjudication of nullity”. While some may prefer “nullity” instead of “invalidity,” both terms are likely fine in this instance. Finally, to illustrate progress made by data-driven MT, the following are a couple of renditions produced by DeepL, an NMT system, and Google Translate’s current NMT system:
Absolute nullity as a general rule does not prevent the act from provisionally producing its effects, which will be destroyed retroactively when the nullity is pronounced by the judge.
The absolute nullity as a general rule does not prevent the act from provisionally producing its effects, which will be retroactively destroyed when the nullity is pronounced by the judge.
As can be observed, there are considerable improvements in fluency, but the phraseology issues remain unresolved. This underscores the fact that data-driven MT or NMT is by no means perfect, especially when the corpora on which such a system draws are not highly specifically related.
Jumping to the era of SMT, there are several studies involving SMT and legal translation (Farzindar and Lapalme 2009; García 2010, 2011; Gotti et al. 2008; Killman 2014; Şahin and Dungan 2014). Gotti et al. (2008) present results from an SMT system they designed called TransLI (Translation of Legal Information) to assist the Canadian federal courts with their requirement to produce English and French translations of judgments. The system was trained on corpora from the same courts and thus attained positive results according to various automatic metrics and outperformed the open-domain Google Translate, which was an SMT system at the time. It also helped that judgments tend to have repetitive features (Gotti et al. 2008, 4). Farzindar and Lapalme (2009) follow up on this study with a pilot project where they post-edit TransLI output. The following is an example of how few edits needed to be made in the translation to English of the context section of a French judgment (Kouka v. Canada (Citizenship and Immigration), 2008 FC 1224 (2008); https://decisions.fct-cf.gc.ca/fc-cf/decisions/en/item/55992/index.do?q=%282008fc1224) in their study (Farzindar and Lapalme 2009, 70):
Le 13 avril 2007, le demandeur s’est prévalu d’un Examen des risques Avant renvoi (« ERAR ») et, le 16 mai 2007, il présentait une deuxième demande de résidence permanente pour raisons humanitaires. Ces deux dernières demandes furent entendues par le même agent i.e. Patricia Rousseau, laquelle, par décision du 31 juillet 2008, rejetait les deux demandes.
On April 13, 2007, the Applicant availed of a pre-removal risk assessment (“PRRA”) and, on May 16, 2007, he submitted a second application for permanent residence on humanitarian and compassionate grounds. These last two applications were heard by the same officer Patricia Rousseau, i.e. that, by decision dated July 31, 2008, dismissed both applications.
On April 13, 2007, the Applicant availed himself of a pre-removal risk assessment (“PRRA”) and, on May 16, 2007, he submitted a second application for permanent residence on humanitarian and compassionate grounds. These last two applications were heard by the same officer, i.e. Patricia Rousseau, who, by decision dated July 31, 2008, dismissed both applications.
In this example the only issues are a missing reflexive pronoun in one case (“himself”), the placement of “i.e.” in another, and an incorrect relative pronoun (“that” instead of “who”). The terminology was appropriately rendered according to context. For example, examen was rendered as “assessment” and not as “exam” or “examination,” the abbreviation PRRA was rendered correctly according to its official designation in Canadian federal courts, demande was rendered as “application” and not as “request” or “demand,” raisons was rendered as “grounds” and not as “reasons,” and rejetait as “dismissed” and not “rejected”. The example highlights how remarkably or unremarkably plagiaristic (Bendana and Melby 2012, 45) an SMT system can become when training corpora and texts being translated are highly similar or related.
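Post-editing effort in examples like this one is often quantified with word-level edit distance, the intuition behind metrics such as HTER. The following is a minimal, illustrative sketch (not the measure used by Farzindar and Lapalme); the two example strings are excerpts of the raw and post-edited output above:

```python
def word_edit_distance(a, b):
    """Word-level Levenshtein distance: the minimum number of insertions,
    deletions, and substitutions turning one token sequence into the other."""
    a, b = a.split(), b.split()
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        curr = [i]
        for j, wb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (wa != wb)))  # substitution
        prev = curr
    return prev[-1]

raw = "heard by the same officer Patricia Rousseau, i.e. that, by decision"
edited = "heard by the same officer, i.e. Patricia Rousseau, who, by decision"
print(word_edit_distance(raw, edited))  # prints 4
```

The low count relative to the segment length reflects the same point made above: only a handful of edits were needed.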
In the context of an open-domain system, García (2010, 2011) and Şahin and Dungan (2014) carried out post-editing studies comparing translating from scratch with post-editing output from Google Translate when it was an SMT system. García (2010, 2011), who looked at English-Chinese, found that legal passages yielded the worst scores in two of the three sets of tests covered by his two related studies; nevertheless, post-editing helped increase quality somewhat (García 2011, 227). These increases in quality were accompanied by minor gains in speed in one of the legal passages and a minor decrease in the other (García 2011, 223). Though quality term and phraseme suggestions may have contributed to these increases in quality, one wonders whether the participants may have been bogged down by assessing various term and phraseme suggestions of varying complexity in the passage where speed suffered. By contrast, Şahin and Dungan (2014, 76), who looked at English-Turkish, found a slight quality advantage when legal texts were translated from scratch by the participants in the test where they were allowed access to just the Internet.
Killman (2014) conducted a different type of study that also involved Google Translate when it was an SMT system. The study presents the results of a human evaluation of the quality of English machine translations produced for a set of over 600 legal terms (n = 421) and phrasemes (n = 200) that originate from a 12,000+ word text of civil judgment summaries produced by the Supreme Court of Spain: The Civil Division (Sala de lo Civil/Sala Primera) section of the Crónica de la Jurisprudencia del Tribunal Supremo: 2005–2006 (Reports of Cases before the Supreme Court: 2005–2006). These terms and phrasemes were selected because they were considered challenging enough to be researched when the judgments were translated in a translation commission before the study was carried out. The entire text was fed to Google Translate to produce the most contextually adequate output possible, but only the 621 terms and phrasemes were assessed for quality. The terms and phrasemes themselves could be categorized as: functional (n = 7), such as complex conjunctions or prepositional phrases; purely legal (n = 331); semi-technical (n = 126); everyday terminology or phraseology frequently found in legal texts (n = 118); and as official (n = 39), i.e., national and/or supranational laws, conventions, titles of legal professions or documents. In terms of contextual sensitivity, all the semi-technical items are, of course, contextually sensitive due to their inherent ambiguity, and so are the functional items, either because they needed to be translated in a legally peculiar way or are non-compositional. Nevertheless, many other terms and phrasemes in the sample are also contextually sensitive for these same reasons or because they also include lexical ambiguity in multiword terms and phrasemes. According to these contextual parameters, 60% of the sample is contextually sensitive (n = 370). 
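The figures reported for the sample are internally consistent, as a quick arithmetic check shows (category counts as listed above):

```python
# Category counts from Killman (2014), as reported above
categories = {"functional": 7, "purely legal": 331, "semi-technical": 126,
              "everyday": 118, "official": 39}
total = sum(categories.values())
print(total)  # prints 621, matching the 421 terms plus 200 phrasemes

contextually_sensitive = 370
print(round(100 * contextually_sensitive / total))  # prints 60, i.e. 60% of the sample
```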
According to the results of the study, a little over 64% of all 621 terms and phrasemes were translated appropriately (n = 400) and of the 370 contextually sensitive terms and phrasemes, 52% were translated appropriately (n = 191). These results indicate open-domain-MT accuracy in over half of the cases of an authentic sample of legal terminology, even when context is an issue, as is often the case in legal translation. Nevertheless, 81% of the 221 total incorrectly translated terms and phrasemes are contextually sensitive (n = 179), which clearly indicates MT vulnerability to legal translation aspects of context. To illustrate the SMT output in context, Killman (2014, 88) provides the following judgment example, Sentencia del Tribunal Supremo 28–9–2005, Sala Primera, from the Crónica de la jurisprudencia del Tribunal Supremo: 2005–2006 (https://www.poderjudicial.es/cgpj/es/Poder-Judicial/Tribunal-Supremo/Actividad-del-TS/Cronica-de-Jurisprudencia/Cronica-de-la-jurisprudencia-del-Tribunal-Supremo-2005-2006), accompanied by its Google Translate SMT output and a post-edited version thereof:
La STS 28–9–2005 (RC 769/2005) destaca porque en ella, al examinar un supuesto de responsabilidad por abordaje, diferenciando sus distintas clases, se declara que, sin perjuicio de que las disposiciones contenidas en el Convenio de Bruselas de 23 de noviembre de 1910 sobre unificación de ciertas reglas en materia del abordaje, formen parte del ordenamiento jurídico español y sean de aplicación directa, resulta aplicable la legislación interna, con exclusión de cualquier otra, cuando los buques implicados son de nacionalidad española y el abordaje ha tenido lugar en aguas jurisdiccionales españolas.
The STS 28.09.2005 (RC 769/2005) stands out because in it, to consider a theory of liability for collision, differentiating their various classes, states that, without prejudice to the provisions of the Brussels Convention of 23 November 1910 on the unification of certain rules relating to the collision, part of the Spanish legal system and have a direct, domestic law applies to the exclusion of any other, when the vessels involved are of Spanish nationality and the approach has taken place in Spanish waters.
The Judgment of the Supreme Court of 28–9–2005 (Appeal 769/2005) stands out because in it, when it considers liability for collision by differentiating its various classes, it states that even though the provisions in the Brussels Convention of 23 November 1910 for the Unification of Certain Rules of Law with respect to Collision between Vessels are part of the Spanish legal system and are directly applicable, domestic law applies, to the exclusion of any other law when the vessels involved are of Spanish nationality and the collision has taken place in Spanish waters.
This SMT example shows more fluency issues than terminological or phraseological problems. The few terminological issues include untranslated abbreviations STS (sentencia del tribunal supremo) and RC (recurso de casación), the contextually incorrect rendition (“theory”) of the ambiguous supuesto (a translation of which may be omitted), partially inaccurate wording of the supranational Brussels Convention (e.g. “rules” and “collision,” which are technically plausible translations of the ambiguous reglas and abordaje, but not the official wording), the contextually incorrect translation (“other”) of the anaphor otra, and the contextually incorrect rendition (“approach”) of the third and final occurrence of abordaje. The phrase sin perjuicio de, technically speaking, was not rendered incorrectly as “without prejudice to”. In the context of this long judgment sentence, however, “even though” works better, making it easier for the sentence to flow naturally. “Relating to” is technically an acceptable translation of the complex preposition en materia de, but not the official wording in the Brussels Convention (“with respect to”). In any event, there are more accurate than inaccurate term and phraseme translations, all of which contain elements of ambiguity: responsabilidad (“liability”), abordaje (“collision”), declara (“states”), disposiciones (“provisions”), ordenamiento jurídico (“legal system”), de aplicación directa (“directly applicable”), legislación interna (“domestic law”), con exclusión de (“to the exclusion of”), and aguas jurisdiccionales españolas (“Spanish waters”). The SMT version of Google Translate remarkably did not grind out word-for-word translations of these items such as “responsibility,” “approach,” “declares,” “dispositions,” “legal code,” “of direct application,” “internal legislation,” “with the exclusion of,” or “Spanish jurisdictional waters” (“Spanish territorial waters” would have been suitable, for example).
For comparative purposes, two neural machine translations of this same judgment summary (Killman 2014, 88) are provided from DeepL and Google Translate’s current neural MT system:
The STS 28–9–2005 (RC 769/2005) stands out because in it, when examining a case of liability for collision, differentiating its different types, it states that, without prejudice to the fact that the provisions contained in the Brussels Convention of 23 November 1910 on the unification of certain rules on collision form part of the Spanish legal system and are directly applicable, the domestic legislation is applicable, to the exclusion of any other, when the ships involved are of Spanish nationality and the collision has taken place in Spanish jurisdictional waters.
STS 9–28–2005 (RC 769/2005) stands out because in it, when examining an assumption of collision liability, differentiating its different classes, it is declared that, without prejudice to the fact that the provisions contained in the Brussels Convention of November 23, 1910 on the unification of certain rules regarding boarding, form part of the Spanish legal system and are directly applicable, internal legislation is applicable, to the exclusion of any other, when the vessels involved are of Spanish nationality and the boarding has taken place in Spanish jurisdictional waters.
In terms of terminological accuracy, the NMT output from these two systems could be considered more or less equal in the case of DeepL or somewhat inferior in that of Google Translate (NMT). DeepL does manage to translate the ambiguous supuesto adequately as “case,” which neither Google Translate system managed (“theory” and “assumption”). It also translates the third and final instance of abordaje correctly as “collision,” unlike the two Google systems with “approach” and “boarding”. Nevertheless, “ship” is provided instead of “vessel,” which has a broader semantic range than “ship”. This may not be a serious issue, however, since ships will typically be concerned under the Brussels Convention in real-world applications. “Spanish jurisdictional waters” instead of “Spanish waters” may be considered a very minor collocation concern. The NMT Google output does not show any improvements. In terms of issues, it also features “Spanish jurisdictional waters,” as well as others such as “declare” instead of “state,” institutionally incompatible dating (“November 23, 1910”), “boarding” instead of “collision” in the second instance, and the collocation “internal legislation,” though this last criticism may be nit-picky given its appearance in certain EU documentation. Whatever the case may be, these NMT results do not show terminological or phraseological improvement and may even reveal a slight decline in the case of Google Translate.
Comparing NMT and SMT with automatic metrics (BLEU scores), Koehn and Knowles (2017) conducted a study assessing German-English output quality in five domain areas by training systems with corpora from each of these areas, including the legal domain (acquis). BLEU stands for bilingual evaluation understudy (Papineni et al. 2002); a widely used automatic metric for evaluating the quality of machine-translated text, it measures overlap between reference translations and output to assign a quality score between 0 and 1. Acquis, for its part, is shorthand for the Union acquis, the total body of European Union law applicable to EU Member States; voluminous bilingual corpora from the acquis can be used to train data-driven MT in many of the official EU languages. BLEU scores were found to be similar in the in-domain portion of tests, i.e. tests drawn from data sub-sampled from the same corpora used to train the systems, with the sub-sampled data themselves held out from training. Nevertheless, the BLEU scores reflect that SMT performed better in the legal, medical and religious domains (i.e. Quran) and that NMT fared better in the case of IT and subtitles. The out-of-domain performances, yielded using test sets obtained from data on which the system was not trained, show that NMT systems were “worse in almost all cases, sometimes dramatically so” (Koehn and Knowles 2017, 29–30). According to the tests run on the NMT and SMT systems trained on all five corpora, SMT was superior in the case of law and IT only, while in the case of subtitles and the medical domain NMT performed better. Both NMT and SMT trained on all five corpora were equal in the case of the Quran. It is particularly noteworthy that the BLEU legal domain score of this SMT system trained on all five corpora proved to be slightly higher than that of the legal in-domain trained NMT system.
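To make the metric concrete, the following is a minimal sentence-level BLEU sketch (clipped n-gram precision, geometric mean, and brevity penalty, per Papineni et al. 2002). Production evaluations such as Koehn and Knowles's use corpus-level BLEU with tokenization and smoothing, so this should be read only as an illustration of the mechanics:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(reference, hypothesis, max_n=4):
    """Sentence-level BLEU sketch: clipped n-gram precisions (1..max_n)
    combined by geometric mean, scaled by a brevity penalty."""
    ref, hyp = reference.split(), hypothesis.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams, ref_ngrams = ngrams(hyp, n), ngrams(ref, n)
        # Clip hypothesis n-gram counts by their counts in the reference
        overlap = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        if overlap == 0:
            return 0.0  # any zero precision drives the geometric mean to 0
        log_precisions.append(math.log(overlap / max(sum(hyp_ngrams.values()), 1)))
    # Brevity penalty punishes hypotheses shorter than the reference
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(sum(log_precisions) / max_n)

ref = "domestic law applies to the exclusion of any other law"
print(bleu(ref, ref))                                   # identical output scores 1.0
print(bleu(ref, "internal legislation is applicable"))  # no overlap scores 0.0
```

A truncated but otherwise verbatim hypothesis would score between 0 and 1, reduced by the brevity penalty.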
Such a finding may lend some credence to the earlier SMT and NMT observations reflected on in this section with regard to the study conducted by Killman (2014).
In any event, several recent studies focus on domain-specific MT use by translators at the Directorate-General for Translation (DGT) (Arnejšek and Unk 2020; Cadwell et al. 2016; Desmet 2021; Lesznyák 2019; Macken, Prou, and Tezcan 2020; Rossi and Chevrot 2019; Stefaniak 2020; Vardaro, Schaeffer, and Hansen-Schirra 2019). DGT translators are given the option of being provided with MT output via the predictive typing feature in their translation memory tool or having MT output populate empty segments when a translation memory match is not found.
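The workflow just described, translation memory first with MT as a fallback for segments lacking a match, can be sketched in a few lines. The function names, the toy memory, and the similarity threshold below are hypothetical illustrations, not the DGT's actual tooling:

```python
from difflib import SequenceMatcher

def fuzzy_match(segment, memory, threshold=0.75):
    """Return the best translation-memory hit at or above the threshold,
    using a simple character-ratio similarity (real CAT tools use richer
    matching, but the fallback logic is the same)."""
    best_score, best_target = 0.0, None
    for source, target in memory.items():
        score = SequenceMatcher(None, segment, source).ratio()
        if score > best_score:
            best_score, best_target = score, target
    return (best_target, best_score) if best_score >= threshold else (None, best_score)

def suggest(segment, memory, mt_engine):
    """Consult the TM first; only when no sufficiently close match exists
    is the segment routed to the MT engine."""
    target, score = fuzzy_match(segment, memory)
    if target is not None:
        return ("TM", target, score)
    return ("MT", mt_engine(segment), 0.0)

# Toy memory and a stand-in MT engine (placeholders, not a real system)
memory = {"Ces deux demandes furent rejetées.": "Both applications were dismissed."}
mt = lambda s: f"<machine translation of: {s}>"
print(suggest("Ces deux demandes furent rejetées.", memory, mt)[0])    # prints TM
print(suggest("Le demandeur a présenté une demande.", memory, mt)[0])  # prints MT
```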
MT@EC, the now defunct SMT system in use from 2011 to 2017, is included in a few of these studies (Cadwell et al. 2016; Macken, Prou, and Tezcan 2020; Rossi and Chevrot 2019). Cadwell et al. (2016) carried out a focus group with DGT translators from all 24 language departments to understand their reasons for choosing whether to use MT@EC in their work. The majority of these translators reported using MT on a daily basis and perceiving it as useful. Both those who perceived it as useful and those who did not emphasized output quality as a primary reason, while the former group also emphasized speed or productivity gains. Rossi and Chevrot (2019) surveyed DGT translators from 15 language departments and found differing MT adoption and response rates in these departments, but an overall high adoption rate. The translators who reported choosing to use MT indicated doing so primarily to save time or in some cases to receive terminology suggestions or assistance with meaning or grammar structures. Macken, Prou, and Tezcan (2020) collected data from translation and post-editing tasks carried out by translators in the French and Finnish departments, who respectively used MT@EC and eTranslation, the NMT system that replaced MT@EC in 2017. On average, in the case of both systems and groups of translators, post-editing was somewhat faster than translating. Moreover, MT output quality was rated similarly, mostly in the 3–4 range, by both groups of translators on a five-point rating scale. Nevertheless, the French translators mentioned fluency problems as the main problems with the SMT output, while the Finnish translators working with the NMT output mostly noted accuracy problems, which is in alignment with previous SMT-NMT comparative findings.
The remaining DGT studies focus exclusively on the neural eTranslation (Arnejšek and Unk 2020; Desmet 2021; Lesznyák 2019; Stefaniak 2020; Vardaro, Schaeffer, and Hansen-Schirra 2019). Lesznyák (2019), who interviewed DGT translators from the Hungarian department, found that while many of the translators consider MT useful for saving time or for inspiration (e.g. the possibility of eloquent solutions), or use it to avoid having to translate from scratch, the majority have reservations. Lesznyák (2019) posits that these reservations might have to do with DGT translators’ documentation burden to make their translations consistent with other related texts in the TL, which Stefaniak (2020) also suggests is a concern in DGT workflows with MT in her post-editing study with translators from the DGT Polish department. While post-editing speed varied on an individual basis, post-editing was, on average, faster than translating. A lack of consistency in the output at the document and/or sentence level was observed by the Polish translators, as well as terminological issues such as the contextual appropriateness of a term suggestion or imprecise wording of titles of legal acts needing to be rendered in an official or specific way. Nevertheless, the MT output provided in the case of legislative texts involved fewer edits than that of the non-legislative texts that are also, though less frequently, translated at the DGT, which speaks to the domain-specific nature of eTranslation. Vardaro, Schaeffer, and Hansen-Schirra (2019) and Arnejšek and Unk (2020), in their respective studies on German and Slovene output errors, found that errors frequently related to terminology, register, polysemy, function words, and omissions, among others. Finally, Desmet (2021), in her study on post-edits carried out in Dutch, finds that changes made by translators were primarily related to style, register, or semantics.
To conclude, this section refers to recent MT legal translation studies (Dik 2020; Heiss and Soffritti 2018; Mileto 2019; Roiss 2021; Wiesmann 2019), which assess in one way or another the quality or potential quality of NMT in most cases and cover several language pairs: German-Italian, English-Italian, Dutch-English, and German-Spanish. All but one of these studies weigh or discuss incorporating MT in the legal translation classroom (Heiss and Soffritti 2018; Mileto 2019; Roiss 2021; Wiesmann 2019). Systems include DeepL in all but one case (Dik 2020; Heiss and Soffritti 2018; Roiss 2021; Wiesmann 2019). Wiesmann (2019) also looked at MateCat, a translator workbench drawing on DeepL, the NMT version of Google Translate, and Microsoft Translator (still SMT at the time). Mileto (2019), for her part, looked at MT@EC, as well as SDL Language Cloud and Google Translate via SDL Trados Studio (which one presumes were SMT systems when the study was carried out, since the defunct MT@EC was used in the study as well). In all cases, terminology errors were noted, often being the most prominent category of errors (Dik 2020, 48; Wiesmann 2019, 140) or category of errors specifically focused on (Roiss 2021) in studies where they were systematically and/or empirically analysed. DeepL appears to be regarded as having potential (Dik 2020; Heiss and Soffritti 2018) or as just another available tool given the lexical, terminological, and register errors that DeepL produces (Roiss 2021, 503). While Wiesmann (2019), for her part, finds DeepL better than the other systems in her Italian-to-German study, her determination is that MT is not at a point where legal texts may be translated without serious post-editing effort. Mileto (2019) does not directly compare the quality of systems in her study.
The research reviewed in this section on the machine translation of legal texts relates to the creation and evaluation of different systems and to the productivity of using different systems. The systems themselves range from open-domain to in-domain and out-of-domain. Of course, in-domain is ideal, but open-domain does not appear far-fetched, especially in the case of SMT. Finally, the specific issue of legal terminology is covered to different extents in the studies but is often alluded to or posited as an underlying factor of quality and usability. In all cases, the studies concern what can be expected from MT in terms of productivity and quality in the legal context, which the current study and various others consider to be a terminologically challenging area.
5.Conclusions
This chapter has shed light on what might be expected from data-driven MT when it comes to the translation of potentially complex legal terminology and phraseology, depending on technical aspects of a system’s architecture and its corpus resources. Though traditionally regarded as problematic, the stability, frozenness, or repetitive nature of legal terms and phrasemes may make them compatible with corpus-based approaches to natural language processing, provided that the corpora are sufficiently related and the analysis and transfer capabilities of systems can adequately respond to the situationally dependent legal translation task. Nevertheless, context remains the Achilles’ heel of MT, and legal terminology can be considerably complex in this regard and particularly prone to error under mechanical approaches to language.
To continue answering the question of what one might expect, quality studies should be carried out to further understand whether there is a measurable trade-off between terminological/phraseological output quality, on the one hand, and fluency or morphosyntactic quality, on the other, which might help determine if mixed approaches to systems might be ideal instead. In other contexts, SMT and NMT systems have been found to be complementary (e.g. Popović 2017). Given the static nature of legal language, ongoing legal translation demand, and continuous translation needs in international or supranational institutions, there appears to be value in highly developed, legal-specific MT systems, hybrid or otherwise.
Moreover, more studies are necessary to empirically understand how translators may or may not be served by MT in the legal domain, particularly to answer the question of whether terminological/phraseological benefits or distractions while post-editing are more of a concern than morphosyntactic assistance or interruptions from the output. Though the literature review in this chapter reveals progress, more studies focusing on MT productivity gains in legal translation could help Legal Translation Studies reach a threshold of statistical data which might strengthen descriptive conclusions or even allow for inferential conclusions to be drawn. Further research that builds systematically on previous research could contribute to data reliability.
Studies could attempt to test which areas of output quality are most important in legal translation or how output might best be provided to legal translators. In institutional contexts, output tends to be made available to translators via a feed in a translator workbench, not only in the case of the DGT (as noted in the previous section) but also in that of the United Nations (e.g. Juncal 2009; Pasteur 2013), where the output will invariably interface in a variety of ways with other sources of content. If MT tends to be used to translate subsegments or segments of text for which institutional translation memories or document repositories are unable to provide translation suggestions, then it follows that the terminological and/or phraseological output that MT systems can provide, and translators might rely on, becomes all the more important in these types of legal settings, perhaps more so than the fluency capabilities which appear to comprise the bulk of NMT progress.
Studies should also be conducted to determine uptake among legal translators who are freelancers. Do they choose to use MT? Why, or why not? Do they use any particular systems (e.g. DeepL, Google Translate, eTranslation)? How do they use them? For entire texts, for empty segments in a workbench, via predictive typing, to repair fuzzy matches, or for general drafting, terminological, or phraseological suggestions, as the case may be? Another area of enquiry might address the sectors or settings in which legal freelance translators work, and whether MT use is compatible with the culture or prestige of the entities for which they provide legal translation services and the translation rates they may command (Borja Albi 2013). Is legal translation a niche area where language services remain “more artisanal (‘hand-made,’ even more erroneously dubbed ‘fully human’), where presumed quality can justify the price-tag of luxury” (Pym 2011, 5)? Or are there other sectors in the legal freelance market where automation or post-editing are tolerated or even required? Do clients or translation companies require legal translators to use specific systems, in-house or otherwise? In what specific sectors of legal translation practice might text granularity or singularity be considered incompatible with MT (García 2010, 6)? What is it about these texts? Or what is it about the specific settings where such perceptions are supported?
Legal translation is an area where reliable resources have traditionally been hard to come by, given the frequency of complex terminology and phraseology. For example, dictionaries and termbases have often been deemed of limited or questionable value (e.g. de Groot and van Laer 2008; Kim-Prieto 2008; Kockaert, Vanallermeersch, and Steurs 2008; Prieto Ramos 2016; Prieto Ramos and Orozco Jutorán 2015; Thiry 2009). Legal Translation Studies should continue to address MT and other emerging or existing translation technologies. It must neither disregard MT nor contemplate it in absolutist terms or in ways that are merely convenient according to dominant or traditional ideologies. As Pym (2011, 5) posits, “[t]he technology, for better or for worse, is here to stay”. We must continue to take stock of its progression and provide input from a Legal Translation Studies perspective so that the dominant forces in its inevitable progression are not solely driven from an industrial perspective, but also take into consideration the needs, perspectives, and viewpoints of the legal translators themselves.