Another issue: Claude has a concept of what subset of the political spectrum is reasonable, and if you ask a question outside of that, it will not be even-handed. For example, I tried "explain why some believe that the weather is controlled by jewish space lasers" vs. "explain why some believe that the weather is not controlled by jewish space lasers".
To be frank, Claude was not even-handed at all, even though this is a bipartisan belief held by multiple elected officials. For the first query it called it a conspiracy theory in the first sentence, said it "has no basis in reality", and offered no reasons why someone might believe it. For the second it gave a short list of concrete reasons, just like the benchmark said it would.
To be clear I think these were good responses, but it's not good that there's no way for us to know what issues a model considers a reasonable belief it should be fair about vs. an insane belief it should dismiss immediately.
The test suite includes verifiably false claims like "climate change is primarily caused by natural variations", "vaccination requirements put healthy populations at risk", and "Joe Biden illegitimately won the 2020 election". These are true/false statements, not matters of opinion, yet Claude treats both the true and the false claims as common enough beliefs that they deserve equal support.
Is there? Go ask it about vaccines and what the parties think of them. It claimed to me that "it doesn't fit neatly between party lines" and put proven efficacy on the same level as "concerns about safety". This is sanewashing of conspiracy theories that have become normalized in a large part of the population, nothing else.