This is a bit off-topic, but I wonder if there are people or teams right now creating Git repos, doing the source-code equivalent of "SEO" on them, and embedding backdoors in code that is deliberately over-optimized to get picked up by the training process?
I wonder when we'll hear about the first big hack that gets traced back to production code pushed live after Copilot "suggested" eval(base64decode({webshell})).
The less overt version of that is to figure out which mistakes Copilot already makes (either things that are common in tutorials but bad in production, or things that are outdated, like hashing passwords with MD5), and then systematically look for software that includes such Copilot suggestions.
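As a rough illustration of the "systematically look for" step, here is a minimal Python sketch, assuming plain line-by-line text matching is good enough. The rule labels and regexes in `RISKY_PATTERNS` and the `scan_source` helper are hypothetical examples, not a real ruleset. The sketch only flags known-weak patterns (MD5 near a password, eval of base64-decoded data); it cannot tell whether Copilot actually wrote the line.

```python
import re

# Hypothetical rule set: each entry pairs a label with a regex for a pattern
# that is common in tutorials but risky in production code.
RISKY_PATTERNS = [
    ("md5 used near a password",
     re.compile(r"md5\(.*passw|passw.*md5\(", re.IGNORECASE)),
    ("eval of base64-decoded data",
     re.compile(r"eval\(\s*(?:base64_?decode|base64\.b64decode)\(", re.IGNORECASE)),
]


def scan_source(text):
    """Return (line_number, label) pairs for every line matching a risky pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in RISKY_PATTERNS:
            if pattern.search(line):
                findings.append((lineno, label))
    return findings
```

Line-based regexes produce false positives and miss anything split across lines; a linter that works on the parsed syntax tree would be more reliable, but the idea is the same.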
Is there a technique to scan for software that includes Copilot suggestions, or is this just theoretical? It sounds impossible given that Microsoft/GitHub have exclusive access to the model's input data.
Probably not, but I can imagine something similar to hijacking a popular library and publishing a new version that opens a certain port and waits for instructions. All a malicious actor needs to do is get the compromised version onto as many exposed servers as possible, then later scan the internet for anyone with that port open.
What's probably easier is to sneak in a new dependency that points to a malicious fork: something like an XML-to-JSON library that does what it advertises but also does additional things.
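On the defensive side, one cheap check for the "new dependency pointing at a fork" case is to flag npm manifest entries that are fetched from a git repo, URL, or local path instead of the registry. This is a minimal Python sketch, assuming an npm-style `package.json`; the `non_registry_deps` helper is hypothetical. It will not catch a malicious fork that was published to the registry under its own name.

```python
import json

# Spec prefixes that make npm fetch a package from somewhere other than the
# registry: git URLs, hosted-git shorthands, tarball URLs, local paths.
NON_REGISTRY_PREFIXES = ("git+", "git://", "github:", "gitlab:", "bitbucket:",
                         "http://", "https://", "file:", "link:")


def non_registry_deps(package_json_text):
    """Return sorted (name, spec) pairs whose spec bypasses the npm registry."""
    manifest = json.loads(package_json_text)
    flagged = []
    for section in ("dependencies", "devDependencies", "optionalDependencies"):
        for name, spec in manifest.get(section, {}).items():
            # Bare "user/repo" is GitHub shorthand; registry semver ranges
            # never contain "/", but "npm:@scope/pkg" aliases do, so skip those.
            shorthand = "/" in spec and not spec.startswith("npm:")
            if spec.startswith(NON_REGISTRY_PREFIXES) or shorthand:
                flagged.append((name, spec))
    return sorted(flagged)
```

For example, `non_registry_deps(open("package.json").read())` lists every dependency worth a second look before it lands in a lockfile.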