Part of

Parallel Corpora for Contrastive and Translation Studies: New resources and applications
Edited by Irene Doval and M. Teresa Sánchez Nieto
[Studies in Corpus Linguistics 90] 2019

Index

A

–ment adverb 50

addition 111, 146, 148

agreement metric 169

agreement study 163, 164, 169

Aleuska 198, 233, 238–239

alignment 6, 67, 84–89, 101–103, 108–113, 144–146, 204–205, 221–222, 239–240; see also word alignment, text-to-video alignment

anchor 70, 256

annotation
- discourse phenomena 160, 179
- layer 65, 72, 82, 165, 166
- scheme 156, 163, 165–166, 170, 176; see also corpus annotation, automatic annotation

AntPConc 67

Apertium 258

automatic annotation 80, 162

automatic speech recognition (=ASR) 282, 283

B

base 268–269

bilingual cognates 13, 252, 253, 256–258, 262, 263

bilingual equivalence 275

bilingual extraction 251, 252

bilingual lexicon induction 59, 62

bilingual lexicon 59, 62, 109, 251

bilingual word embeddings 282, 286, 287

bitext 15, 70, 108–112, 121

Bizimena 235

BleuAlign 6, 85

British National Corpus 234

Brown Corpus 1, 39, 233

C

Canadian Hansard Corpus 2

causative construction 100

code-switching 79–81, 89

cognate extraction 251, 256–257, 263

collocate 268, 271

collocation 269, 270
- equivalent 267–269
- extraction 268, 269

COMPARA 40, 124, 138, 197

comparable corpus 19, 20, 44, 198, 251, 259

comparable parallel corpus 11, 20, 21

continuous bag-of-words (=CBOW) 286

copyright 63, 144

corpus analysis 208, 211, 219

corpus annotation 81–84, 86, 163

corpus compilation 106, 192, 197, 198, 200, 201, 217, 218, 220, 238

Corpus Query Processor (=CQP) 198, 200, 203–205, 206, 207; see also CQPweb 198, 201, 203

Corpus Workbench (=CWB) 198, 200, 212, 213, 219, 220, 223, 231

cosine distance 271, 272, 274

cosine similarity score 285, 286

CREA 20, 29, 30, 69

culture-specific item 46, 49

Czech National Corpus 93, 94

D

degree of comparability 257

Déjà Vu 201

dependency parsing 83, 267, 269, 270

DepPattern 259

Dice Coefficient 268

discourse phenomenon 160, 162, 179

distributional hypothesis 254

distributional model 270, 272, 273

distributional semantics 269, 271, 285

distributional similarity 253, 254, 256

dynamic corpus 193, 238

E

Edit Distance 252, 257

Egungo Testuen Corpusa 235

EHUskaratuak 235, 237

elliptical compound 84, 88

empirical turn 34

encoding 153–156, 203, 204

engagement marker 176

English progressive 22, 24, 25

English-Norwegian Parallel Corpus (=ENPC) 2, 40, 67, 221, 227

English-Swedish Parallel Corpus (=ESPC) 2, 40, 227

Eroski Consumer corpus 235, 236

Europarl 2, 60, 79

evidentiality 159, 160

explicitation 22, 44

extended tagset 163, 170

extraction method 270

F

finite state machines (=FSM) 283

Foma 243

FreeLing 117

functional-semantic tagset 170

G

Galnet 153, 158

genre 28, 59, 66, 107, 109, 186, 218

gerund 25, 31–33, 226

GIZA++ 99, 288

granularity 68, 163

gravitational pull hypothesis 23, 24, 31–33

H

Hizkuntzen arteko corpusa 235

Hunalign 109, 131

I

indexation 7, 197, 200

inter-annotator agreement 179

interference 50, 54, 135

interlanguage 64, 71

InterLingual Index 152

intermodal corpus 124, 125, 138

InterText 94, 95

intra-annotator agreement 164

IULA 198, 212

IXA pipe tools 246

J

JRC Acquis 79

K

Kappa coefficient 164, 169

Key Word in Context (=KWIC) 114, 115

Klasikoen gordailua 235

Kontext 94, 98

L

Lancaster-Oslo/Bergen Corpus 39

language identification 80, 81

language learning 10, 43, 64, 104, 105, 151

language variety 26, 233

lemmatization 84, 114, 132

lexical semantics 141, 142, 285

LF Aligner 109, 201

LinguaKit 274

Linguee 2, 106

loan word 133–135

log-likelihood 51, 52

M

machine learning 72

MaltParser 83, 274

manner adverb 50, 54

META-NORD 30

metadata 108, 118, 129–130, 204, 208

metadiscourse marker 175

metatextual tagging 201

minority language 9, 253

modality 171, 172

mode of translation 238, 241

monitor corpus 27, 65, 238

moses 290

Multilingwis 86, 104

multimedia parallel corpus 147, 155

multivec 286

mutual information 271

multi-word expression (=MWE) 243

N

naïve bilingual distributional model 272

named entity recognition (=NER) 82, 241

Natural Language Processing Toolkit (=NLTK) 288

negative sampling 286

neural network 271, 286

normalization 52, 135–136; see also shorthand form normalization

Norwegian Spanish Parallel Corpus (=NSPC) 28

O

onomatopoeic expression 243, 246

OpenSubtitles 2016 274, 276

OPUS 60, 109, 114, 143

overrepresentation hypothesis 22, 24

P

parallel concordancing 126, 183

part-of-speech tagging (=POS tagging) 81, 82, 84, 117, 124, 127, 132, 162, 241–244, 246; see also universal POS tag

PETRA 1.0 69–70

phonetic sequence 284

pivot language 94–100, 252, 255

post-editing tool 69

prototypicality 23, 31–33

pseudo-parallel text 256

R

re-attachment of German verb prefixes 84

reciprocal corpus 124

register 142, 160–162

regularity of patterns 218

reliability 58, 99
- of the annotation scheme and guidelines 163–164

reordering 145

replicable corpus building protocols 65

replicability 11, 21

reusable parallel resources 65

reusability 5, 30, 60, 65

S

sampling frame 21, 25–26, 28–29

seed context 256

segmentation 112, 144, 241, 283, 289, 290, 295
- of spoken language 128, 131

semantic annotation 82, 163
- semantically annotated (corpus) 152

semantic distance 272

semantic mirror image 49

semantic network 152

semantic tagging 153

SemCor Corpus 152–153

SensoGal Corpus 152–154

sentence alignment 6, 95, 108, 298

sentence division 221

sentiment annotation 83

shining-through 50

shorthand form 281
- normalization 283, 290

similarity measure 109, 268, 271

SketchEngine 60–61, 86

skewedness 188

skip-gram architecture 272

sms4science 283, 287

statistical machine translation (=SMT) 282–284, 289

Solr 116–119

specialized translation 142

spelling similarity 256

spelling variant 84, 287–294

standardization 58, 65

State treaty 184, 187, 191

Statistical Corpus of the Twentieth Century 234, 235

stylistic aspect of translation 145, 148

subtitling 132, 147, 150

synonym detection 87

synset 152

T

tagset 81, 132, 176; see also extended tagset

tagger 61

TAligner 239

Translation Corpus Aligner (=TCA) 67, 221

text message 281–284, 287–289

text-to-video alignment 131

TextHammer 186

textual/audiovisual interface 148

textual mark-up 108

thematization 164

TMX 144–146, 150, 153

tokenization 80, 97, 162, 242

training corpus 113, 117, 162, 176

training materials 60, 290, 294

transcription 128

translation candidate 99, 252, 254, 259, 284

translation direction 161–162

translation equivalent 40, 104, 231, 253, 259, 263, 269

translation error detection 87

translation memory 60, 144, 201

translation norm 43

translation problem 45, 49, 53

translation process 22, 32, 109, 127

translation universal 19, 22, 41

Translational Database of the Gipuzkoan Provincial Government 235

translationese 40, 44

TreeTagger 68, 82, 98, 201, 222

tweet 284

tweet284

U

UNESCO Index Translationum 217

unique items hypothesis 22, 24, 32

Universal Dependencies 270
- universal dependency label 83

universal of translation 21

Unix pipes metaphor 242

universal POS tag 81

usability 60, 65, 95, 227

usefulness 65, 71, 227

user group 80, 114

user interface 198, 224

V

validation 65, 253, 255, 260

variety 218, 233, 276
- of Norwegian 27
- of Italian 135
- of French 9

vector 271, 285
- representation 285–286

VEIGA 147–151

verb-object collocation 275

W

Web Corpusen Ataria 237

web interface 216

word alignment 80, 85, 119, 272

word boundary 288, 290

word embedding 271, 274, 285

word2vec 286

WordNet 152

X

XML 144, 153, 221

Z

Zientzia eta teknologia corpusa 235

Zientzia Irakurle Ororentzat 235

Zuzenbide corpusa 235