Abstract
This study examines methods for recognizing native and accented voiceless stops based on voice onset time (VOT). The methods are tested on data from the Tball corpus of early elementary school children, which includes both native English speakers and Spanish speakers learning English and is transcribed to highlight pronunciation variation. We examine the English voiceless stop series, which has long VOT and aspiration, and the corresponding voiceless stops in Spanish-accented English, which have short VOT and little aspiration. Three methods are tested: (1) training hidden Markov models (HMMs) on native speech and then extracting VOT by post-processing phone-level alignments, (2) training HMMs with explicit aspiration models, and (3) training, for each phoneme, separate HMMs for the native and accented variants. Error rates for distinguishing phone VOT characteristics are 23%–53% for the first method, 5%–57% for the second, and 0%–36% for the third. Error rates varied by phone; in general, results for /p/ and /k/ varied more than those for /t/. These results are discussed in light of each method’s usefulness and ease of implementation, and possible improvements are proposed.
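As an illustration of the post-processing idea behind the first method, the following is a minimal sketch and not the procedure used in the study. It assumes a TIMIT-style phone alignment in which each stop is split into a closure segment (e.g. "pcl") and a release segment (e.g. "p"), and it treats the duration of the release segment as a proxy for VOT. The `AlignedPhone` type, the `classify_stops` function, and the 30 ms threshold are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class AlignedPhone:
    label: str     # phone label from the aligner, e.g. "pcl", "p", "ae"
    start_ms: int  # segment start time in milliseconds
    end_ms: int    # segment end time in milliseconds

VOICELESS_STOPS = {"p", "t", "k"}
VOT_THRESHOLD_MS = 30  # hypothetical cutoff, not a value from the study

def classify_stops(alignment: List[AlignedPhone],
                   threshold_ms: int = VOT_THRESHOLD_MS
                   ) -> List[Tuple[str, int, str]]:
    """Label each voiceless stop release as long-lag or short-lag.

    The release segment duration is used as an approximation of VOT
    (assumption: the aligner separates closure from release).
    """
    results = []
    for seg in alignment:
        if seg.label in VOICELESS_STOPS:
            vot_ms = seg.end_ms - seg.start_ms
            kind = "long-lag" if vot_ms >= threshold_ms else "short-lag"
            results.append((seg.label, vot_ms, kind))
    return results

# Toy alignment (invented numbers, for illustration only).
example = [
    AlignedPhone("pcl", 100, 160),
    AlignedPhone("p", 160, 230),
    AlignedPhone("ae", 230, 350),
    AlignedPhone("tcl", 350, 400),
    AlignedPhone("t", 400, 412),
    AlignedPhone("ih", 412, 500),
]
labels = classify_stops(example)
```

In this sketch, a long-lag release would correspond to the native English pattern and a short-lag release to the Spanish-accented pattern described above; in practice the threshold would need to be tuned per phone.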