Traditionally, historical phonologists have relied on tedious manual derivations to sequence the sound changes that have shaped the phonological evolution of languages. However, humans are prone to error and cannot efficiently track thousands of parallel derivations. We demonstrate computerized forward reconstruction (CFR), in which each etymon is derived in parallel, both as a task with metrics to optimize and as a tool that drastically facilitates inquiry. To this end we present DiaSim, an application that simulates “cascades” of diachronic developments over a language’s lexicon and provides various diagnostics for “debugging” those cascades. We test our method on a Latin-to-French reflex prediction task, using a newly compiled, publicly available dataset
FLLex consisting of 1368 paired Latin and Modern French forms. We also introduce a second dataset, FLLAPS, which maps 310 reflexes from Latin through five attested intermediate stages up to Modern French, derived from Pope’s (1934) periodic development tables. We present publicly available rule cascades: the baseline BaseCLEF and BaseCLEF* cascades, based on Pope’s (1934) widely cited view of French development, and DiaCLEF, made from incremental corrections to
BaseCLEF aided by DiaSim’s diagnostics. DiaCLEF outperforms the baselines by large margins, improving raw accuracy on FLLex from 3.2% to 84.9% of etyma, with similarly large improvements for each of FLLAPS’ periods. The changes that produced DiaCLEF were made considering only the baseline and DiaSim’s diagnostics, yet they often independently reproduced past work in French diachronic phonology, corroborating both our procedure and those past endeavors; we discuss the implications of some of our findings in detail.
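
To make the idea of forward reconstruction concrete, the following minimal Python sketch applies an ordered cascade of rewrite rules to every etymon in a small lexicon, scores the predicted reflexes against gold forms, and lists the mismatches as a rudimentary diagnostic. It is an illustration under stated assumptions, not DiaSim's implementation: the rule format (regular-expression pairs), the function names, and the toy forms are all hypothetical, and the toy rules are not real Latin-to-French sound changes.

```python
import re
from typing import List, Tuple

# Hypothetical rule format: (regex pattern, replacement). Not DiaSim's format.
Rule = Tuple[str, str]


def apply_cascade(etymon: str, cascade: List[Rule]) -> str:
    """Apply every rule in order; each rule sees the output of the previous one."""
    form = etymon
    for pattern, replacement in cascade:
        form = re.sub(pattern, replacement, form)
    return form


def accuracy(lexicon: List[Tuple[str, str]], cascade: List[Rule]) -> float:
    """Fraction of etyma whose predicted reflex equals the gold reflex exactly."""
    hits = sum(apply_cascade(etymon, cascade) == gold for etymon, gold in lexicon)
    return hits / len(lexicon)


def mismatches(lexicon: List[Tuple[str, str]], cascade: List[Rule]):
    """Return (etymon, predicted, gold) triples for every wrong prediction."""
    errors = []
    for etymon, gold in lexicon:
        predicted = apply_cascade(etymon, cascade)
        if predicted != gold:
            errors.append((etymon, predicted, gold))
    return errors


# Toy cascade over invented forms (assumption: purely illustrative).
TOY_CASCADE: List[Rule] = [
    (r"(?<=[aeiou])t(?=[aeiou])", "d"),  # intervocalic t becomes d
    (r"a$", "e"),                        # word-final a becomes e
]

# Invented (etymon, gold reflex) pairs; the last one deliberately disagrees
# with the cascade so that the diagnostic has something to report.
TOY_LEXICON = [("pata", "pade"), ("tita", "tide"), ("kota", "kote")]

if __name__ == "__main__":
    print(accuracy(TOY_LEXICON, TOY_CASCADE))
    print(mismatches(TOY_LEXICON, TOY_CASCADE))
```

Inspecting the mismatch list is the step that suggests a correction: an etymon that is derived wrongly points to a rule that is missing, too general, or misordered in the cascade.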