SHOTGUN: converting words into triplets
A hybrid approach to grapheme-phoneme conversion in Dutch
Software systems convert between graphemes and phonemes using lexicon-based, rule-based or data-driven techniques. SHOTGUN combines these techniques in a hybrid system that converts between graphemes and phonemes bi-directionally, adds linguistic and educational information about the relationships between graphemes and phonemes, and estimates the likelihood that the generated output is correct. We describe the components from which SHOTGUN is built and determine its accuracy by running tests on two data sources, the BasisSpellingBank and CELEX, and by comparing the results with those of Nunn’s (1998) rule-based conversion system. SHOTGUN converts phonemes to graphemes and vice versa with a precision of 81% and 86%, respectively, when tested on the BasisSpellingBank, and 80% and 81% when tested on CELEX. SHOTGUN proves to be a powerful new conversion tool.
Keywords: automatic bi-directional grapheme-phoneme conversion, Dutch language, grapheme-phoneme relationship, overlap algorithm, triplet analysis
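To make the idea of a hybrid converter concrete, the following is a minimal sketch of a lexicon lookup with a rule-based fallback and a rough confidence score. It is not SHOTGUN's implementation: the toy lexicon, the rewrite rules, the transcription symbols, the function names and the confidence formula are all assumptions chosen for illustration only.

```python
from typing import List, Tuple

# Hypothetical toy lexicon (spelling -> phoneme string); illustrative entries only.
LEXICON = {"kat": "kAt", "boek": "buk"}

# Hypothetical grapheme -> phoneme rules, sorted so longer graphemes match first.
RULES = sorted(
    [("oe", "u"), ("k", "k"), ("a", "A"), ("t", "t"), ("b", "b")],
    key=lambda rule: len(rule[0]),
    reverse=True,
)


def rule_convert(word: str) -> List[Tuple[str, str]]:
    """Greedy left-to-right segmentation into (grapheme, phoneme) pairs."""
    pairs = []
    i = 0
    while i < len(word):
        for grapheme, phoneme in RULES:
            if word.startswith(grapheme, i):
                pairs.append((grapheme, phoneme))
                i += len(grapheme)
                break
        else:
            # No rule applies: mark the letter as unknown and move on.
            pairs.append((word[i], "?"))
            i += 1
    return pairs


def convert(word: str) -> Tuple[str, float, str]:
    """Return (phonemes, confidence, source) for a spelled word."""
    if word in LEXICON:
        # Lexicon hits are treated as fully trusted in this sketch.
        return LEXICON[word], 1.0, "lexicon"
    pairs = rule_convert(word)
    phonemes = "".join(phoneme for _, phoneme in pairs)
    known = sum(1 for _, phoneme in pairs if phoneme != "?")
    # Arbitrary placeholder score: rule output is capped at 0.5.
    confidence = 0.5 * known / len(pairs) if pairs else 0.0
    return phonemes, confidence, "rules"
```

Calling `convert("kat")` takes the lexicon path, while `convert("tak")` falls back to the rules and receives the lower, capped confidence. A real system would replace the placeholder score with an estimate calibrated on test data.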
Published online: 01 June 2017
https://doi.org/10.1075/wll.19.2.02bee
References
Busser, Bertjan, Walter Daelemans & Antal van den Bosch
Cranshoff, Betty & Johan Zuidema
Daelemans, Walter
Daelemans, Walter & Antal van den Bosch
Daelemans, Walter, Antal van den Bosch & Ton Weijters
Daelemans, Walter & Helmer Strik
Damper, Robert & John Eastmond
Decadt, Bart, Jacques Duchateau, Walter Daelemans & Patrick Wamback
Galescu, Lucian & James F. Allen
Geeraerts, Dirk
Hamming, Richard
Hoste, Veronique, Steven Gillis & Walter Daelemans
Jongenburger, Willy & Vincent J. van Heuven
Levenshtein, Vladimir
Marchand, Yannick & Robert Damper
Nunn, Anneke M.
Nunn, Anneke M. & Vincent J. van Heuven
Santen, Jan P.H. van, Richard W. Sproat, Joseph P. Olive & Julia Hirschberg
Zuidema, Johan & Anneke Neijt
(2012) Verkennend onderzoek naar de wenselijkheid en de haalbaarheid van een verrijking van de Woordenlijst Nederlandse Taal ten behoeve van spellingonderwijs. Nijmegen: Radboud Universiteit Nijmegen. Available online: http://taalunieversum.org/sites/tuv/files/downloads/rapport%20VWS%2015022013.pdf.
(to appear). The BasisSpellingBank – spelling knowledge stored in a lexicon of triplets.