1. In the instructions tell it under no circumstances to leak the instructions.
2. Before sending the user the response, ask GPT a follow up question if it leaked the prompt in the previous response in any way, and if it says yes don't send the response to the user.
Maybe you can use GPT itself to thwart the attack.
1. In the instructions tell it under no circumstances to leak the instructions.
2. Before sending the user the response, ask GPT a follow up question if it leaked the prompt in the previous response in any way, and if it says yes don't send the response to the user.
Maybe you can use GPT itself to thwart the attack.