Collier, Nigel, Chikashi Nobata and Junichi Tsujii. 2001. Automatic acquisition and classification of terminology using a tagged corpus in the molecular biology domain. Terminology 7(2): 239–257.
This article describes a method for identifying and classifying terms in the molecular biology domain from examples in a corpus of abstracts taken from the Medline database. Automatic acquisition of biomedical term lists has so far been slow because of high variability both in the terms themselves and in their classification scheme. Nevertheless, the explosive growth of online molecular biology literature makes a persuasive case for automating many tasks. These include acquiring records for gene-product databases such as SwissProt, which are currently updated by human experts, a task that is both time-consuming and often highly idiosyncratic. This article reports results from a tool based on a hidden Markov model for extracting and classifying terms, which can serve as a key component of an information extraction system. The results are discussed in light of the lexical, syntactic and semantic properties of terms revealed by the study.
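The abstract does not describe the structure of the hidden Markov model, so the following is only a generic illustration of how an HMM tagger can mark term boundaries and classes: each token is assigned a hidden state (a term class or "outside"), and Viterbi decoding picks the most probable state sequence. The state names (`O`, `PROTEIN`), all probabilities, and the `unk` smoothing constant for unseen words are invented toy values, not the authors' model, features, or data.

```python
import math

def viterbi(tokens, states, start_p, trans_p, emit_p, unk=1e-6):
    """Return the most probable hidden-state sequence for `tokens` (log-space Viterbi)."""
    # best[t][s] = (log prob of best path ending in state s at position t, previous state)
    best = [{}]
    for s in states:
        e = emit_p[s].get(tokens[0], unk)  # unseen words get a small fixed probability
        best[0][s] = (math.log(start_p[s]) + math.log(e), None)
    for t in range(1, len(tokens)):
        best.append({})
        for s in states:
            e = math.log(emit_p[s].get(tokens[t], unk))
            # choose the predecessor state that maximizes the path score into s
            prev, score = max(
                ((p, best[t - 1][p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda x: x[1],
            )
            best[t][s] = (score + e, prev)
    # trace back from the highest-scoring final state
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for t in range(len(tokens) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    return list(reversed(path))

# Hypothetical toy model: "O" = outside any term, "PROTEIN" = inside a protein name.
states = ["O", "PROTEIN"]
start_p = {"O": 0.7, "PROTEIN": 0.3}
trans_p = {"O": {"O": 0.8, "PROTEIN": 0.2},
           "PROTEIN": {"O": 0.4, "PROTEIN": 0.6}}
emit_p = {"O": {"activation": 0.3, "of": 0.4, "the": 0.3},
          "PROTEIN": {"NF-kappa": 0.5, "B": 0.5}}

tokens = ["activation", "of", "NF-kappa", "B"]
tags = viterbi(tokens, states, start_p, trans_p, emit_p)
print(list(zip(tokens, tags)))
```

With these toy numbers the decoder labels "NF-kappa B" as a two-token `PROTEIN` term and the preceding words as `O`. A real system would estimate the probabilities from a tagged corpus and would typically use richer features than the bare word form.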
