Presumably text-to-speech systems have signatures: for example, how many milliseconds a given engine takes to pronounce each syllable of a particular word, say "watermelon". If the timings match for a whole sentence, this machine would probably be able to recognize that it is hearing a TTS engine. An easy defeat, though, would be a TTS engine that adds a random number of milliseconds to each syllable.
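To make the idea concrete, here is a rough sketch of that kind of timing check and of the jitter that defeats it. Everything in it is an assumption for illustration: the syllable durations for "watermelon" are made up, the 5 ms tolerance is arbitrary, and `matches_signature` and `add_jitter` are hypothetical helpers, not part of any real TTS engine or detector.

```python
import random

# Hypothetical per-syllable durations (ms) for "wa-ter-mel-on".
# These numbers are invented for illustration, not measured from any engine.
SIGNATURE = [180.0, 140.0, 160.0, 210.0]


def matches_signature(observed, signature, tolerance_ms=5.0):
    """Return True if every syllable duration is within tolerance of the signature."""
    if len(observed) != len(signature):
        return False
    return all(abs(o - s) <= tolerance_ms for o, s in zip(observed, signature))


def add_jitter(durations, rng, min_ms=10.0, max_ms=40.0):
    """Add a random delay to each syllable (assumed range: 10 to 40 ms)."""
    return [d + rng.uniform(min_ms, max_ms) for d in durations]


if __name__ == "__main__":
    rng = random.Random()
    clean = [181.0, 139.0, 162.0, 208.0]  # close to the signature
    print(matches_signature(clean, SIGNATURE))                   # detected
    print(matches_signature(add_jitter(clean, rng), SIGNATURE))  # evades the check
```

A real detector would likely compare timing distributions rather than exact values, so jitter this simple might only raise the bar instead of defeating it outright.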