If you pad your output with something that doesn't represent silence, then any output whose length isn't an exact multiple of the block size (i.e. nearly all outputs) will end with whatever sound your padding bits represent in the model's embedding space. If "0000" represents "whoosh," then most of your outputs will end in "whoosh."
Here's a non-AI example: If all HN comments had to be some multiple of 50 characters long and comments were padded with the letter "A," then most HN comments would look like the user was screaming at the end. AAAAAAAAAAAAAAAAAA
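To make the HN analogy concrete, here's a rough sketch in Python. The function name `pad_to_multiple` and the sample comment are made up for illustration; the only things taken from the example above are the block size of 50 and the "A" padding. It pads the same comment once with "A" and once with a space, which plays the role of "silence" here.

```python
def pad_to_multiple(text: str, block: int = 50, pad_char: str = "A") -> str:
    """Right-pad text so its length is a multiple of `block` (hypothetical helper)."""
    remainder = len(text) % block
    if remainder == 0:
        # Already an exact multiple: no padding needed.
        return text
    return text + pad_char * (block - remainder)


comment = "Padding should be silent."

# Padding with a character that "means something" leaks into the output.
screaming = pad_to_multiple(comment, pad_char="A")

# Padding with a neutral character (a space, standing in for silence) does not.
quiet = pad_to_multiple(comment, pad_char=" ")

print(repr(screaming))
print(repr(quiet))
```

Both results have the same padded length, but only the space-padded one reads the same as the original once the padding is stripped.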