We had to ETL .csv data that must have originated in SQLServer.
The UTF-16 fact about Windows was apparently unknown to my predecessor, who wrote some nasty C binary to copy the data, knock the upper byte off of each character as it was read, and save the now-ASCII text to a new file for the MySQL load.
The encoding='utf-16' argument was all that was needed.
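Roughly what the replacement looked like in Python (a minimal sketch; the file names are made up, and I'm assuming the usual UTF-16 text export from SQL Server):

    import csv

    # Tell open() the real encoding instead of stripping bytes in C,
    # then re-emit plain UTF-8 rows for the MySQL load.
    with open("sqlserver_export.csv", encoding="utf-16", newline="") as src, \
         open("mysql_load.csv", "w", encoding="utf-8", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            writer.writerow(row)  # UTF-8 output, ready for LOAD DATA INFILE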
I've had to fix this before. A co-worker handling data from a third-party supplier had gone "Oh, this input data is mangled with stray zero bytes, I'll fix that", which of course destroys any non-ASCII input. Eventually I was told that the import sometimes failed. I investigated, realised the "mangled" input was just UTF-16 encoded, conditionally removed the "strip zero bytes" hack, told the decoder it was UTF-16, and it just worked.
The "maybe strip null bytes" code lived for years "just in case" after I fixed that because people couldn't believe that's all that was ever "wrong" with the data.
For want of a nail...