Is it possible to create really reliable guardrails at all? And don't they stifle not just undesirable responses but also better ones? Why not leave the rails off, leave it to users to guard themselves, and perhaps suggest better prompts?
As it is, with only some of the shortcomings covered up, I think users are inclined to take the results too seriously, without taking possible hallucinations and ingrained prejudices into account.
Sure: pick a target, aim for it, and try to hit it, then have people whose job it is to tell you whether you hit the target. They evaluate the work before your customers do, so you can do things like stop ship when significant bugs are found.