Publication details [#464]

Kim, Sung-Dong, Byoung-Tak Zhang and Yung Taek Kim. 2001. Learning-based intrasentence segmentation for efficient translation of long sentences. Machine Translation 16 (3) : 151–174.
Publication type
Article in journal/book
Publication language


Long-sentence analysis has been a critical problem in machine translation because of its high complexity. Intra-sentence segmentation has been proposed as a method for reducing parsing complexity. This paper presents a two-step segmentation method: (1) identifying potential segmentation positions in a sentence and (2) selecting an actual segmentation position from among them. The authors apply two machine-learning techniques to the segmentation task: “concept learning” and “genetic learning”. The rules for identifying potential segmentation positions are derived by learning the “SegmentablePosition” concept. The selection of the actual segmentation position is based on a function whose parameters are determined by genetic learning. Experimental results are presented which illustrate the effectiveness of the approach to long-sentence parsing for MT. The results also show improved segmentation performance in comparison with other existing methods.
Source : Based on abstract in journal
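
Illustrative sketch (not from the paper): the two-step method summarised in the abstract can be pictured roughly as follows. Everything specific here is an assumption made for illustration: the trigger-word list stands in for the learned “SegmentablePosition” rules, the two features (split balance and presence of a comma) and the weights `w_balance` and `w_comma` are hypothetical, and the weights are fixed by hand, whereas the paper determines its function parameters by genetic learning.

```python
# Hypothetical trigger tokens. The paper learns its rules for potential
# segmentation positions; this hand-written set is only a stand-in.
TRIGGERS = {",", "and", "but", "because", "which", "that"}


def candidate_positions(tokens):
    """Step 1: indices i where the sentence might be split before tokens[i]."""
    return [
        i
        for i, tok in enumerate(tokens)
        if tok.lower() in TRIGGERS and 0 < i < len(tokens) - 1
    ]


def score(tokens, i, w_balance=1.0, w_comma=0.5):
    """Score one candidate with placeholder features and weights (assumed)."""
    n = len(tokens)
    balance = 1.0 - abs(2 * i - n) / n  # 1.0 when the split is at the midpoint
    comma = 1.0 if tokens[i] == "," else 0.0
    return w_balance * balance + w_comma * comma


def select_position(tokens, **weights):
    """Step 2: pick the highest-scoring candidate, or None if there is none."""
    candidates = candidate_positions(tokens)
    if not candidates:
        return None
    return max(candidates, key=lambda i: score(tokens, i, **weights))


tokens = (
    "the parser failed on the long sentence , "
    "but the segmenter split it into two parts"
).split()
best = select_position(tokens)
```

In a learned version, the weights passed to `select_position` would be the quantities tuned by the search procedure rather than constants.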