Hacker News | jpkw's comments

I think the above comment was a joke (Claude frequently says that whenever you challenge it, whether you are right or wrong)

At least this once the AI-ism was not spotted.

Goodness no, I chuckled.

Right, I'm just going to teach my dog to do my job then and get free money as my brain is no more magic, special or different to theirs!

Maybe something like a publicly traded company: citizens can vote directly on individual bills, or choose a proxy to vote on their behalf (and change that choice at any point they desire).

It wouldn't actually change anything, because your single direct vote wouldn't affect the outcome at all. It would be negligible compared to all the votes from the proxies, because they represent millions and you just one. It would result in the same outcome; talking to the proxy gets your point across better than voting yourself.

So if the plateau is unanimously declared to have been reached tomorrow OR just one more tiny use case exists tomorrow and all others dwindle away to nothing, then you consider yourself to be correct? What a wild assertion!

If the plateau is reached at some higher level of capability, I will remain correct, yes. If use cases are discovered that do not exist today, I will also be correct. You said it in a silly way but you're directionally correct.

No. You state that this is all that it would take to be considered tremendous business value. You are moving the goalposts on your point. My point is that you are taking an absolute position that there is tremendous business value in its current form (as a minuscule improvement and one insignificant new use case do not equate to tremendous business value in themselves), and so that remains to be seen.

You either misread or are misrepresenting my statement and either way I am not interested in continuing this.

Hoping someone here may know the answer to this: do any of the benchmarks that currently exist account for false answers in any meaningful way, other than the way a typical test would (i.e., giving any answer at all is better than saying "I don't know", since the answer at least has a chance of being correct, which in the real world is bad)? I want an LLM that tells me when it doesn't know something. If it gives me an accurate response 90% of the time and an inaccurate one 10% of the time, it is less useful than one that gives me an accurate answer 10% of the time and tells me "I don't know" the other 90%.



Those numbers are too good to expect. If 90% right, 10% wrong is the baseline, which of these would you take as an improvement:

- 80% right, 18% I don't know, 2% wrong
- 50%/48%/2%
- 10%/90%/0%
- 80%/15%/5%

The general point being that to reduce wrong answers you will need to accept some reduction in right answers if you want the change to only be made through trade-offs. Otherwise you just say "I'd like a better system" and that is rather obvious.

Personally I'd take like 70/27/3. Presuming the 70% of right answers aren't all the trivial questions.
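
To make the trade-off concrete, here is a rough sketch of one way to score these splits. The scoring rule (+1 for a right answer, 0 for "I don't know", minus a penalty for a wrong answer) is purely an assumption for illustration, not how any particular benchmark works, and `score`, `rank`, and `wrong_penalty` are hypothetical names. Under this rule the 10%/90%/0% split only beats the 90%/10% baseline once a wrong answer costs more than 8 right answers.

```python
# Hypothetical scoring rule: +1 per right answer, 0 per "I don't know",
# -wrong_penalty per wrong answer. Percentages are per 100 questions.
def score(right, unknown, wrong, wrong_penalty):
    assert right + unknown + wrong == 100
    return right - wrong_penalty * wrong

# Splits mentioned in this thread, as (right, I don't know, wrong).
candidates = {
    "baseline 90/0/10": (90, 0, 10),
    "80/18/2": (80, 18, 2),
    "50/48/2": (50, 48, 2),
    "10/90/0": (10, 90, 0),
    "80/15/5": (80, 15, 5),
    "70/27/3": (70, 27, 3),
}

def rank(wrong_penalty):
    """Return candidate names ordered from best to worst score."""
    return sorted(
        candidates,
        key=lambda name: score(*candidates[name], wrong_penalty),
        reverse=True,
    )

# Try a mild, a moderate, and a harsh penalty for wrong answers.
for penalty in (1, 4, 9):
    print(f"penalty={penalty}: {rank(penalty)}")
```

How much a wrong answer costs relative to a right one is the real disagreement here; once you pick that number, the ranking follows.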


I think you may have misread. They stated that they'd be willing to go from 90% correct to 10% correct for this tradeoff.


Thanks for the correction


OpenAI uses SimpleQA to assess hallucinations


Dick, take a look out of starboard. Oh my god, it looks like a huge...


- Pecker!

- Oh! Where?

- Wait, that's not a woodpecker. It looks like someone's...


I would love it if this were the solution. Embossed card imprinters work without internet or power and are both fast and intuitive. They worked as a primary method in the past; they can work as a backup method in the future.


Just adding to OP: for some (myself included) it is quite painful to see "loose" in place of "lose". This should be fixed ASAP, as it distracts the reader from the content. Lose = opposite of win, loose = opposite of tight.


Should be fixed now! I hope the writing is easier to bare now.


A few other typos (first sentence, though instead of thought, trice instead of thrice, simlpe instead of simple). Hope this helps!


Also, weather -> whether


Agreed with this comment and the one above it - I think the difficulty is spot on, but the UI was a bit frustrating on Chrome/Android and I might have given up sooner but for my love of solving puzzles.


I will jump on the obvious error too but will provide some more context:

You are typing what you hear in: "shouldn't've" which is a double contraction of "should not have"

"I should not of made this" vs "I should not have made this"

