yeah, the lpc-10 voice codec i mentioned uses the same linear predictive coding approach as the chip in the speak & spell. it needs about 10 multiplications per output sample, which at 8000 samples per second is about 80000 multiplications per second. that may be challenging on an attiny2313 or attiny45, where there's no hardware multiplier and you have to do the multiplications with repeated shifts and adds, but i think it ought to be doable on the atmega328, which does have one
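
to make the arithmetic concrete, here's a rough sketch in c of one output sample of a 10th-order all-pole synthesis filter, with the multiply done by shifts and adds the way a core without a hardware multiplier would have to do it. this is not the actual lpc-10 decoder or the speak & spell chip's filter; the names (`mul_shift_add`, `lpc_state`, `lpc_synth_sample`), the q12 coefficient format, and the sign convention y[n] = e[n] + sum of a[k]*y[n-k] are all assumptions i picked for illustration.

```c
#include <stdint.h>

#define LPC_ORDER   10      /* ten predictor coefficients */
#define SAMPLE_RATE 8000L   /* 8 kHz; long so the product below doesn't overflow a 16-bit int */
#define MULTS_PER_SECOND (LPC_ORDER * SAMPLE_RATE)   /* 10 * 8000 = 80000 */

/* signed 16x16 -> 32 bit multiply using only shifts and adds.
   up to 16 loop iterations per call, which is why this hurts
   on a chip with no hardware multiplier. */
static int32_t mul_shift_add(int16_t a, int16_t b)
{
    int neg = (a < 0) != (b < 0);
    uint32_t ua = (uint32_t)(a < 0 ? -(int32_t)a : (int32_t)a);
    uint32_t ub = (uint32_t)(b < 0 ? -(int32_t)b : (int32_t)b);
    uint32_t acc = 0;

    while (ub) {
        if (ub & 1)
            acc += ua;      /* add the shifted multiplicand for each set bit */
        ua <<= 1;
        ub >>= 1;
    }
    return neg ? -(int32_t)acc : (int32_t)acc;
}

typedef struct {
    int16_t coef[LPC_ORDER];   /* predictor coefficients in q12 (assumed format) */
    int16_t hist[LPC_ORDER];   /* previous outputs, hist[0] is the most recent */
} lpc_state;

/* compute one output sample: excitation plus the weighted sum of
   the last LPC_ORDER outputs. that's LPC_ORDER multiplies per sample. */
static int16_t lpc_synth_sample(lpc_state *s, int16_t excitation)
{
    int32_t acc = excitation;
    int k;

    for (k = 0; k < LPC_ORDER; k++) {
        /* scale each product back from q12 before accumulating, so the
           32-bit sum can't overflow; this costs a little precision */
        acc += mul_shift_add(s->coef[k], s->hist[k]) / 4096;
    }

    /* saturate to the 16-bit output range */
    if (acc > 32767)  acc = 32767;
    if (acc < -32768) acc = -32768;

    /* shift the history along and store the new output */
    for (k = LPC_ORDER - 1; k > 0; k--)
        s->hist[k] = s->hist[k - 1];
    s->hist[0] = (int16_t)acc;

    return (int16_t)acc;
}
```

on the atmega328 you'd presumably just write `(int32_t)a * b` and let the compiler use the hardware multiply instead of `mul_shift_add`; the loop version is only there to show what the attiny parts would be stuck doing 80000 times a second.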