This article describes a method for extracting terms that combines term frequency with a novel measure of term representativeness (i.e., informativeness or domain specificity). The measure is defined as the normalized distance between the word distribution in the documents that contain the term and the word distribution in the whole corpus. The measure is particularly effective at discarding frequent but uninformative terms and has a well-defined threshold value for judging the representativeness of a term. We combined the new measure with term frequency and applied it to the extraction of terms from abstracts of artificial intelligence papers. This article introduces the measure and reports on its effectiveness in term extraction.
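The idea of a normalized distance between two word distributions can be illustrated with a minimal Python sketch. The abstract does not specify the distance function, the normalization, or how the measure is combined with term frequency, so everything below is an assumption made for illustration: Kullback-Leibler divergence stands in for the distance, the normalizer is the mean distance of randomly drawn document sets of the same size, and the hypothetical `rank_terms` helper simply filters by a caller-supplied threshold and sorts by frequency.

```python
import math
import random
from collections import Counter


def word_distribution(docs):
    """Relative word frequencies over a list of tokenized documents."""
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def kl_divergence(p, q):
    """KL(p || q); assumes every word in p also occurs in q."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items())


def representativeness(term, corpus, n_samples=20, seed=0):
    """Distance of the term's documents from the corpus, divided by a baseline.

    Assumption: the baseline is the mean distance of random document sets
    of the same size as the set of documents containing the term.
    Returns 0.0 when the term is absent or the baseline is zero.
    """
    corpus_dist = word_distribution(corpus)
    term_docs = [doc for doc in corpus if term in doc]
    if not term_docs:
        return 0.0
    rng = random.Random(seed)
    observed = kl_divergence(word_distribution(term_docs), corpus_dist)
    baseline = sum(
        kl_divergence(
            word_distribution(rng.sample(corpus, len(term_docs))), corpus_dist
        )
        for _ in range(n_samples)
    ) / n_samples
    return observed / baseline if baseline > 0 else 0.0


def rank_terms(candidates, corpus, threshold):
    """Hypothetical combination: keep terms above the threshold, then
    order the survivors by raw corpus frequency (highest first)."""
    freq = Counter(w for doc in corpus for w in doc)
    kept = [t for t in candidates if representativeness(t, corpus) > threshold]
    return sorted(kept, key=lambda t: freq[t], reverse=True)
```

In this sketch a word that occurs in every document gets a distance of zero, because its document set is the whole corpus, which is the behaviour one would want for frequent but uninformative words. The threshold is left as a required argument because the abstract does not state its value.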