The authors present a system that they entered in the term recognition competition, one of the subtasks covered by the NTCIR tmrec group, and evaluate its term recognition results. They regard terms as lexical items characteristic of a field, with three features: (1) they appear frequently in documents of the target field; (2) they are not common words in the target field; and (3) they appear less frequently in the corpora of other fields. The system draws on corpora from several fields and applies these features to recognize terms. The authors then analyze the differences between their term list and the manual candidates list produced by the NTCIR tmrec group, and from this analysis identify features that are important for automatic term recognition. Furthermore, through comparative experiments based on the manual candidates, they establish the importance of indices in extracting a term list.
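The three features listed above can be illustrated with a minimal sketch. This is not the authors' system: the function names, the `min_count` threshold, the externally supplied `common_words` set, and the ratio-based score are all assumptions made for illustration, and single whitespace-separated tokens stand in for term candidates.

```python
from collections import Counter


def relative_freq(tokens):
    """Map each token to its share of all tokens in the list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def score_terms(target_tokens, other_tokens, common_words, min_count=2):
    """Rank candidates from the target field (hypothetical scoring scheme).

    Feature 1: keep tokens seen at least `min_count` times in the target field.
    Feature 2: drop tokens in `common_words` (assumed to be supplied by the caller).
    Feature 3: favor tokens that are rarer in the other-field corpus.
    """
    target_counts = Counter(target_tokens)
    target_rf = relative_freq(target_tokens)
    other_rf = relative_freq(other_tokens)
    # Smoothing avoids division by zero for tokens unseen in other fields.
    smoothing = 1.0 / (len(other_tokens) + 1)

    scores = {}
    for word, count in target_counts.items():
        if count < min_count:
            continue
        if word in common_words:
            continue
        scores[word] = target_rf[word] / (other_rf.get(word, 0.0) + smoothing)
    # Highest-scoring candidates first.
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    target = "the parser builds the parse tree and the parser reads tokens".split()
    other = "the market fell and the bank raised the rates".split()
    print(score_terms(target, other, common_words={"the", "and"}))
```

In this sketch a single ratio combines features (1) and (3), while feature (2) is a hard filter; other combinations are equally plausible, and the abstract does not say which indices the authors actually use.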
