jpkw · 2025-12-12T10:18:28 1765534708

I think the above comment was a joke (Claude frequently says that whenever you challenge it, whether you are right or wrong)

jstummbillig · 2025-12-12T13:21:40 1765545700

At least this once the AI-ism was not spotted.

flkiwi · 2025-12-12T18:16:57 1765563417

Goodness no, I chuckled.

jpkw · 2025-12-08T07:30:29 1765179029

Right, I'm just going to teach my dog to do my job then and get free money as my brain is no more magic, special or different to theirs!

jpkw · 2025-12-07T23:05:52 1765148752

Maybe something like a publicly traded company: citizens can vote directly on individual bills, or choose a proxy to vote on their behalf (and change that choice at any point they desire).

1718627440 · 2025-12-08T10:49:15 1765190955

It wouldn't actually change anything, because your single direct vote wouldn't affect the outcome at all. It would be negligible compared to all the votes from the proxies, because they represent millions and you just one. It would result in the same outcome; talking to the proxy gets your point across better than voting yourself.

jpkw · 2025-12-07T19:32:37 1765135957

So if the plateau is unanimously declared to have been reached tomorrow OR just one more tiny use case exists tomorrow and all others dwindle away to nothing, then you consider yourself to be correct? What a wild assertion!

AbrahamParangi · 2025-12-07T20:56:30 1765140990

If the plateau is reached at some higher level of capability, I will remain correct, yes. If use cases are discovered that do not exist today, I will also be correct. You said it in a silly way but you're directionally correct.

jpkw · 2025-12-07T22:45:25 1765147525

No. You state that this is all that it would take to be considered tremendous business value. You are moving the goalposts on your point. My point is that you are taking an absolute position that there is tremendous business value in its current form (as a minuscule improvement and one insignificant new use case do not equate to tremendous business value in themselves), and so that remains to be seen.

AbrahamParangi · 2025-12-08T01:41:54 1765158114

You either misread or are misrepresenting my statement and either way I am not interested in continuing this.

jpkw · 2025-11-18T23:51:45 1763509905

Hoping someone here may know the answer to this: do any of the benchmarks that currently exist account for false answers in any meaningful way, other than the way a typical test would (i.e., giving any answer at all is better than saying "I don't know", as the answer I give at least has a chance of being correct, which in the real world is bad)? I want an LLM that tells me when it doesn't know something. If it gives me an accurate response 90% of the time and an inaccurate one 10% of the time, it is less useful than one that gives me an accurate answer 10% of the time and tells me "I don't know" the other 90%.

terandle · 2025-11-19T00:02:57 1763510577

https://artificialanalysis.ai/evaluations/omniscience

rocqua · 2025-11-19T00:06:59 1763510819

Those numbers are too good to expect. If 90% right, 10% wrong is the baseline, would you take any of these as an improvement:

- 80% right, 18% I don't know, 2% wrong

- 50%/48%/2%

- 10%/90%/0%

- 80%/15%/5%

The general point being that to reduce wrong answers you will need to accept some reduction in right answers if you want the change to only be made through trade-offs. Otherwise you just say "I'd like a better system" and that is rather obvious.

Personally I'd take like 70/27/3. Presuming the 70% of right answers aren't all the trivial questions.
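One way to make this trade-off concrete is a scoring rule that rewards right answers, ignores "I don't know", and subtracts a penalty for each wrong answer. The sketch below is only an illustration: the `score` and `rank` helpers and the penalty values are assumptions, not taken from any benchmark mentioned in this thread. It ranks the splits discussed above under different penalties.

```python
# Hypothetical scoring rule (an assumption, not any benchmark's actual metric):
# +1 per right answer, 0 for "I don't know", -penalty per wrong answer.

def score(right: float, unknown: float, wrong: float, penalty: float) -> float:
    """Expected score per question for a (right, unknown, wrong) split in percent."""
    assert abs(right + unknown + wrong - 100) < 1e-9, "split must sum to 100%"
    return (right - penalty * wrong) / 100


# Splits taken from the comments above: (right %, "I don't know" %, wrong %).
SPLITS = {
    "baseline 90/0/10": (90, 0, 10),
    "80/18/2": (80, 18, 2),
    "50/48/2": (50, 48, 2),
    "10/90/0": (10, 90, 0),
    "80/15/5": (80, 15, 5),
    "70/27/3": (70, 27, 3),
}


def rank(penalty: float) -> list[str]:
    """Names of the splits, best score first, for a given wrong-answer penalty."""
    return sorted(SPLITS, key=lambda name: score(*SPLITS[name], penalty), reverse=True)


if __name__ == "__main__":
    # Illustrative penalties only; how costly a wrong answer is depends on the use case.
    for penalty in (1, 4, 10):
        print(f"penalty={penalty}: {rank(penalty)}")
```

Under this rule the 10%/90%/0% split only overtakes the 90%/10% baseline once a wrong answer costs more than roughly eight right answers, so which split counts as "an improvement" depends entirely on how expensive a wrong answer is for you.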

fwip · 2025-11-19T04:39:10 1763527150

I think you may have misread. They stated that they'd be willing to go from 90% correct to 10% correct for this tradeoff.

rocqua · 2025-11-19T17:48:55 1763574535

Thanks for the correction

energy123 · 2025-11-19T00:32:43 1763512363

OpenAI uses SimpleQA to assess hallucinations

jpkw · 2025-11-14T01:47:33 1763084853

Dick, take a look out of starboard. Oh my god, it looks like a huge...

NooneAtAll3 · 2025-11-14T08:54:57 1763110497

- Pecker!

- Oh! Where?

- Wait, that's not a woodpecker. It looks like someone's...

jpkw · 2025-10-03T22:01:19 1759528879

I would love it if this were the solution. Embossed card imprinters can work without internet and power, and are both fast and intuitive. It worked as a primary method in the past; it can work as a backup method in the future.

jpkw · 2025-08-13T07:01:18 1755068478

Just adding to OP: for some (myself included) it is quite painful to see "loose" in place of "lose". This should be fixed ASAP, as it distracts the reader from the content. Lose = opposite of win, loose = opposite of tight.

gylterud · 2025-08-13T07:19:00 1755069540

Should be fixed now! I hope the writing is easier to bare now.

ajd555 · 2025-08-13T12:06:17 1755086777

A few other typos (first sentence, though instead of thought, trice instead of thrice, simlpe instead of simple). Hope this helps!

soulofmischief · 2025-08-13T12:37:37 1755088657

Also, weather -> whether

jpkw · 2025-04-12T03:37:30 1744429050

Agreed with this comment and the one above it. I think the difficulty is spot on, but the UI was a bit frustrating on Chrome/Android, and I might have given up sooner but for my love of solving puzzles.

jpkw · 2025-02-22T02:47:46 1740192466

I will jump on the obvious error too but will provide some more context:

You are typing what you hear in "shouldn't've", which is a double contraction of "should not have":

"I should not of made this" vs "I should not have made this"