To summarize what I put in the writeup: the 7-bit PCM audio was streamed in at approximately 25 kHz (reading from the controller and writing to address $4011 every 71 CPU cycles), occasionally dipping to 9 kHz while streaming in the graphics data.
Incredible work! I recently gave a talk about TASing to an audience of CS grad students, and of course I had to mention your SMB3 runs. Your videos are phenomenal at making this stuff accessible outside of hardcore gaming circles.
I never really got the part about the audio conversion; are you just rounding it from 16 to 7 bits, or are you doing dithering + noise shaping (as you definitely should at such low bit depths)? And similarly, how are you downsampling from 44 kHz to a time-varying sample rate; are you properly filtering, or are you getting tons of aliasing?
The conversion from 16 bits to 7 bits was using rounding.
My method of downsampling was complicated. Since the creation of the TAS was being automated, and since I also needed to stream in graphics data occasionally, I ran into the issue of needing to know exactly which byte to read from the .wav file at any given moment. I used a custom NES emulator to emulate the generated inputs, and I had it count CPU cycles so I could convert that into seconds, then parse the .wav file with that info.
To be completely honest, this project was my first time directly reading the contents of a .wav file like this, and I had no prior experience writing code for audio conversion or playback. If I were to do this project again, I'd look into dithering + noise shaping, as well as filtering methods. I know that at the very end of the TAS there are certainly some weird audio artifacts that I couldn't figure out how to fix at the time.
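A minimal Python sketch of the cycle-count-to-sample lookup described above, not the author's actual code. It assumes an NTSC console (CPU clock of about 1,789,773 Hz) and a 44.1 kHz source file; the function name `wav_index_for_cycle` is made up for illustration.

```python
NTSC_CPU_HZ = 1_789_773  # assumption: NTSC NES CPU clock, in Hz

def wav_index_for_cycle(cpu_cycle, wav_rate_hz=44_100, cpu_hz=NTSC_CPU_HZ):
    """Nearest-neighbor lookup: index of the .wav sample playing at cpu_cycle.

    Integer arithmetic (with half a divisor added for rounding) avoids
    floating-point drift over a long recording.
    """
    return (cpu_cycle * wav_rate_hz + cpu_hz // 2) // cpu_hz

# One write every 71 cycles gives the "approximately 25 kHz" playback rate.
writes_per_second = NTSC_CPU_HZ / 71

# Source sample index for the first few $4011 writes.
indices = [wav_index_for_cycle(n * 71) for n in range(4)]
```

Because the write rate is lower than the source rate, some source samples are simply skipped, which is where the aliasing discussed further down comes from.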
> The conversion from 16 bits to 7 bits was using rounding.
As a very quick fix, you can dither by just adding a random value from [-0.5, +0.5] before rounding (to -64..+63 or whatever your range is). It will give you a dither, and probably sound slightly better; a bit more noise for much less distortion. Noise shaping is left as an exercise for the reader :-) (It is probably nontrivial to get perfect with variable sample rate anyway.)
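A rough Python sketch of that quick fix, assuming signed 16-bit input and an unsigned 0..127 output for $4011; the function name `quantize_7bit` and the divide-by-512 scaling are my own choices for illustration.

```python
import math
import random

def quantize_7bit(sample16, dither=True, rng=random):
    """Map a signed 16-bit sample (-32768..32767) to an unsigned 7-bit value."""
    scaled = sample16 / 512.0             # 65536 / 128 = 512, so about -64..+64
    if dither:
        scaled += rng.uniform(-0.5, 0.5)  # random offset of up to half a step
    level = math.floor(scaled + 0.5)      # round to the nearest integer
    level = max(-64, min(63, level))      # clamp to the signed 7-bit range
    return level + 64                     # shift to 0..127
```

Without dither, an input of 128 (a quarter of one output step) always comes out as 64; with dither it comes out as 65 roughly a quarter of the time, so the average still carries the information that plain rounding throws away.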
> I used a custom NES emulator to emulate the generated inputs, and I had it count CPU cycles so I can convert that into seconds, then parse the .wav file with that info.
It sounds like you are just picking one sample without any filtering/averaging/anything (nearest neighbor); this will cause aliasing, which is another part of the reason for the “roughness” you may hear in the sound. You can do a very cheap trick here as well: take some audio software you trust (say, Audacity) and convert the .wav file to 25208 Hz. This means that you'll get good filtering for most of your audio, and less-bad filtering for the 13.85 kHz parts.
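If you would rather do it in code, here is a crude Python sketch of the "averaging" option: a box filter over the source samples between consecutive writes. This is not the properly designed low-pass a real resampler applies, only a step up from nearest neighbor. The function name `box_downsample` and the NTSC clock value are assumptions for illustration.

```python
NTSC_CPU_HZ = 1_789_773  # assumption: NTSC NES CPU clock, in Hz

def box_downsample(samples, wav_rate_hz, write_cycles, cpu_hz=NTSC_CPU_HZ):
    """Average the source samples that fall between consecutive writes.

    write_cycles holds the CPU cycle count of each $4011 write, in order.
    Returns one averaged value per interval (one fewer than the writes).
    """
    out = []
    for start, end in zip(write_cycles, write_cycles[1:]):
        lo = start * wav_rate_hz // cpu_hz
        hi = max(lo + 1, end * wav_rate_hz // cpu_hz)  # at least one sample
        window = samples[lo:hi]
        if not window:       # ran past the end of the audio
            break
        out.append(sum(window) / len(window))
    return out
```

Since the window follows the actual write times, it automatically gets wider during the slower stretches, which is where nearest neighbor drops the most samples.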
I'm not the author, but these video-in-game projects typically work with a few phases:
1. Get the game into a specific state by performing specific actions, moving to specific positions, performing specific inputs, etc. so that a portion of the game state in RAM happens to be an executable program.
2. Jump to that executable code, for example by corrupting the return address on the stack with a buffer overflow.
3. (optional) The program from 1 may be a simple "bootstrap" program which lets the player directly write a new, larger program using controller inputs then jumps to the new program.
4. The program reads the video and audio from the stream of controller inputs, decodes them, and displays them. The encoding is usually an ad-hoc scheme designed to take advantage of the available hardware. The stream of replayed inputs is computed directly from the media files.
Specifically, this TAS abuses the fact that SMB1 doesn't clear RAM on bootup: it uses SMB3 to write $16 to the "continue world" RAM location, and then hotswaps to SMB1 to start the game in World N-1, which makes the rest of the TAS possible. If you download the TAS and use it with SMB1 on an emulator, the included base savestate will already have $16 in that RAM location, for convenience. The main HN link for this submission has the full technical writeup.
I share the full assembly code in the tasvideos writeup: https://tasvideos.org/8991S#HereSTheAsmCode