Question terminology is a set of terms that appear in the keywords, idioms, and fixed expressions commonly observed in questions. This paper investigates ways to automatically extract question terminology from a corpus of questions and to represent it for the purpose of classifying questions by type. The key interest is whether semantic features can enhance the representation of question sentences, which are strongly lexical in nature. The author compares two feature sets: one with lexical features only, and another with a mixture of lexical and semantic features. For evaluation, the author measures the classification accuracy achieved by two machine learning algorithms, C5.0 and PEBLS, using a procedure called domain cross-validation, which effectively measures the domain transferability of features.
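The abstract does not spell out how domain cross-validation works. One plausible reading, assumed here and not taken from the paper, is a leave-one-domain-out scheme: questions are grouped by source domain, each domain is held out in turn as the test set, and the classifier is trained on all remaining domains, so accuracy reflects how well features transfer to an unseen domain. The sketch below illustrates that reading. The learner `overlap_nn_learner` is a hypothetical stand-in (1-nearest-neighbour by feature overlap), not C5.0 or PEBLS, and the feature strings and domain names in any usage are invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

# A labeled example: (set of features, question type).
# Features are plain strings; lexical terms and semantic classes
# would both be encoded this way (an assumption for illustration).
Example = Tuple[frozenset, str]
Predictor = Callable[[frozenset], str]
Learner = Callable[[List[Example]], Predictor]


def overlap_nn_learner(train: List[Example]) -> Predictor:
    """Hypothetical stand-in for C5.0/PEBLS: 1-nearest-neighbour by feature overlap."""
    def predict(features: frozenset) -> str:
        # Pick the training example sharing the most features (first one wins ties).
        best = max(train, key=lambda ex: len(ex[0] & features))
        return best[1]
    return predict


def domain_cross_validation(data: Dict[str, List[Example]],
                            learner: Learner) -> Dict[str, float]:
    """Assumed leave-one-domain-out scheme; returns accuracy per held-out domain.

    Assumes every domain has at least one example and there are >= 2 domains.
    """
    accuracies: Dict[str, float] = {}
    for held_out, test in data.items():
        # Train only on questions from the other domains.
        train = [ex for dom, exs in data.items() if dom != held_out for ex in exs]
        predict = learner(train)
        correct = sum(predict(feats) == label for feats, label in test)
        accuracies[held_out] = correct / len(test)
    return accuracies
```

Comparing the two feature sets under this reading would amount to running `domain_cross_validation` twice on the same questions, once with lexical features only and once with lexical plus semantic features, and comparing the per-domain accuracies.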
