Automatic extraction of specialized verbal units
A comparative study on Arabic, English and French
Nizar Ghazzawi | Université de Montréal
Benoît Robichaud | Université de Montréal
Patrick Drouin | Université de Montréal
Fatiha Sadat | Université du Québec à Montréal
This paper presents a methodology for the automatic extraction of specialized
Arabic, English and French verbs in the field of computing. Since nominal terms
are predominant in terminology, our interest is to explore to what extent verbs
can also be part of a terminological analysis. Our objective is therefore to
assess how an existing extraction tool performs on specialized verbs in a given
specialized domain. Furthermore, we investigate any particularities that a
language may present regarding verbal terms from the perspective of automatic
extraction. We work on three languages in order to see whether the chosen tool
performs better on one language than on the others. Moreover, given that Arabic
is a morphologically rich and complex language, we pay particular attention to
the results the extraction tool yields for it. The extractor used for our
experiment is TermoStat (Drouin 2003). So far, our results show that the
extraction of verbs of computing differs across the languages under study, both
in the quality of the results and in the particularities of these units in this
specialized domain.
Keywords: specialized verbs, verbal terminological units, Arabic, French, English, terms extraction, terminology, corpus linguistics
Article outline
- 1. Introduction
- 2. Previous work
- 3. Methodology
- 3.1 TermoStat: A general overview
- 3.2 Integrating Arabic language to TermoStat
- 3.3 Compiling specialized corpora
- 3.4 Pre-processing
- 3.5 Managing specificity
- 4. Results
- 4.1 Filtering results
- 4.1.1 Tagging errors
- 4.1.2 General language units
- 4.1.3 Concordance list
- 4.2 Results of filtering
- For Arabic
- For English
- For French
- 4.2.1 Specificity
- 4.2.2 Certain particularities
- 4.3 VTU validation criterion
- 4.3.1 Some particularities regarding validation: English and French
- 4.3.2 Some particularities regarding validation: Arabic
- 5. Evaluation and discussion
- 5.1 Precision
- 5.2 Comparison between VTUs and NTUs
- 6. Conclusion
- Notes
- References
Published online: 19 January 2018
https://doi.org/10.1075/term.00002.gha
References
Abed, A. M., S. Tiun, and M. Abared
Ahmad, K., A. Davies, H. Fulford, and M. Rogers
Almaany
Attia, M., P. Pecina, A. Toral, L. Tounsi, and J. van Genabith
2011 “A Lexical Database for Modern Standard Arabic Interoperable with a Finite State Morphological Transducer.” In Proceedings Systems and Frameworks for Computational Morphology: Second International Workshop, SFCM 2011, Zurich, Switzerland, August 26, 2011, ed. by M. Cerstin and M. Piotrowski, 98–118. Zurich, Switzerland: Springer Berlin Heidelberg.
Attia, M., P. Pecina, A. Toral, and J. van Genabith
Chung, T. M.
Church, K., and P. Hanks
Déjean, H., and E. Gaussier
DiCoInfo
2017 http://olst.ling.umontreal.ca/cgi-bin/dicoinfo/search2.cgi?ui=fr. Accessed 30 March 2017.
Drouin, P.
Fung, P.
Galisson, R.
Ghazzawi, N.
Guilbert, L.
Habash, N., and F. Sadat
Habash, N., O. Rambow, and R. Roth
Habash, N.
Lemay, C., M.-C. L’Homme, and P. Drouin
L’Homme, M.-C.
Lorente, M.
2007 “Les unitats lèxiques verbals dels textos especialitzats. Redefinició d’una proposta de classificació.” In Estudis de lingüística i de lingüística aplicada en honor de M. Teresa Cabré Castellví. Volum II: De deixebles, ed. by M. Lorente, R. Estopà, J. Freixa, J. Martí, and C. Tebé, 365–380. Barcelona: Institut Universitari de Lingüística Aplicada de la Universitat Pompeu Fabra.
Mel’čuk, I., A. Clas, and A. Polguère
Meyer, I.
2000 “Computer Words in Our Everyday Lives: How are They Interesting for Terminography and Lexicography?” In Proceedings of the Ninth EURALEX International Congress, EURALEX 2000, ed. by H. Ulrich, S. Evert, E. Lehmann, and C. Rohrer, 39–58. Stuttgart, Germany: Institut für Maschinelle Sprachverarbeitung.
Meyer, I., and K. Mackintosh
Monsonego, S.
Muller, C.
Nelson, M. B.
2000 Corpus-based Study of the Lexis of Business English and Business English Teaching Materials. Unpublished Ph.D. Thesis, University of Manchester, Manchester.
Rapp, R.
1999 “Automatic Identification of Word Translations from Unrelated English and German Corpora.” In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics, ed. by R. Dale and K. Church, 519–526. Stroudsburg, PA, USA: Association for Computational Linguistics.
Rayson, P., and R. Garside
Reppen, R.
Rey, A.
Teubert, W.
Toutanova, K., and C. Manning
Toutanova, K., D. Klein, C. D. Manning, and Y. Singer
Xu, F., D. Kurz, J. Piskorski, and S. Schmeier
2002 “A Domain Adaptive Approach to Automatic Acquisition of Domain Relevant Terms and their Relations with Bootstrapping.” In Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02), ed. by M. González Rodríguez and C. Paz Suarez Araujo, 134–145. Las Palmas, Canary Islands, Spain: European Language Resources Association (ELRA).