As someone looking to build AI features into my application, I definitely want to avoid this kind of jailbreak in my app.
Right now, there is no good way to guard against this other than removing free-form text inputs and taking user input through a more form-driven approach.
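To make the form-driven idea concrete, here is a minimal sketch, assuming the app only needs a couple of dropdown-style choices. The `Topic` and `Tone` options, the `PROMPT_TEMPLATE` wording, and `build_prompt` are all hypothetical names made up for illustration. The point is that the user never writes any part of the prompt; they only pick from a fixed set of values.

```python
from enum import Enum


class Topic(Enum):
    # Hypothetical dropdown options; a real app would define its own.
    BILLING = "billing"
    SHIPPING = "shipping"
    RETURNS = "returns"


class Tone(Enum):
    FORMAL = "formal"
    FRIENDLY = "friendly"


# Hypothetical fixed template: the only variable parts are enum values.
PROMPT_TEMPLATE = "Write a {tone} help message about our {topic} policy."


def build_prompt(topic: str, tone: str) -> str:
    """Build a prompt from form fields, rejecting anything off the list."""
    # Enum lookup by value raises ValueError for any string that is not
    # one of the allowed options, so arbitrary text never reaches the model.
    topic_choice = Topic(topic)
    tone_choice = Tone(tone)
    return PROMPT_TEMPLATE.format(tone=tone_choice.value, topic=topic_choice.value)
```

The trade-off is the obvious one: the app loses open-ended questions entirely, which is why this only suits features where the set of requests is known up front.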
Absolutely agree. I’m creating a chatbot for my website, and while it primarily uses old-fashioned pattern matching, it does send unrecognized patterns to a stronger AI to get help forming a proper response, and I certainly don’t want it offending my visitors!
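A setup like this could be sketched roughly as follows, assuming the stronger AI is reachable through some function that takes the message and returns text. Everything here is hypothetical: the `PATTERNS` table, the canned replies, the placeholder `BLOCKLIST` words, and the `ask_model` callable are stand-ins, not any real API. The word check on the model's reply is a crude last line of defense and would not stop a determined jailbreak.

```python
import re
from typing import Callable

# Hypothetical canned replies handled by plain pattern matching.
PATTERNS = [
    (re.compile(r"\b(hi|hello|hey)\b", re.IGNORECASE), "Hello! How can I help?"),
    (re.compile(r"\b(hours|open)\b", re.IGNORECASE), "Please see our opening hours page."),
]

# Placeholder words; a real list would be chosen by the site owner.
BLOCKLIST = {"darn", "heck"}

SAFE_FALLBACK = "Sorry, I didn't understand that. Could you rephrase?"


def looks_safe(text: str) -> bool:
    """Return True if the text contains none of the blocklisted words."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return words.isdisjoint(BLOCKLIST)


def respond(message: str, ask_model: Callable[[str], str]) -> str:
    """Answer from known patterns first; otherwise ask the model and screen it."""
    for pattern, reply in PATTERNS:
        if pattern.search(message):
            return reply
    # Unrecognized input goes to the stronger model (assumed callable).
    candidate = ask_model(message)
    # Never show a reply that fails the screen; use a canned message instead.
    return candidate if looks_safe(candidate) else SAFE_FALLBACK
```

Passing `ask_model` in as a parameter keeps the sketch independent of any particular provider and makes it easy to test with a stub.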