Hacker News | jampekka's comments

LLMs can interact with the world via e.g. function calling.
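The core of function calling is just a dispatch loop outside the model: the model emits a structured (usually JSON) tool call, and the host program executes the matching function and feeds the result back. A minimal sketch of that dispatch step, with a hypothetical `get_weather` tool (the names here are illustrative, not any particular vendor's API):

```python
import json

# Hypothetical tool the model is allowed to call.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def dispatch(tool_call_json: str) -> str:
    """Parse a model-emitted tool call like
    {"name": "get_weather", "arguments": {"city": "Oslo"}}
    and run the matching Python function."""
    call = json.loads(tool_call_json)
    func = TOOLS[call["name"]]
    return func(**call["arguments"])

print(dispatch('{"name": "get_weather", "arguments": {"city": "Oslo"}}'))
# prints "Sunny in Oslo"
```

In a real agent loop this string comes from the model's output and the return value is appended to the conversation, which is how the LLM "interacts with the world".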

A good heuristic is that if an argument resorts to "actually not doing <something complex sounding>" or "just doing <something simple sounding>" etc, it is not a rigorous argument.

That seems somewhat similar to perplexity-based detection, although you can just read off the probability of each token directly instead of picking the n-best, and you don't have to generate.

It kinda works, but is not very reliable and is quite sensitive to which model the text was generated with.

This page has nice explanations:

https://www.pangram.com/blog/why-perplexity-and-burstiness-f...
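The perplexity computation itself is simple once you have per-token log-probabilities from a language model (scoring only, no generation). A toy sketch with made-up log-probs standing in for real LM scores:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(-mean log-probability per token).
    Low perplexity means the scoring model finds the text predictable,
    which is the signal perplexity-based AI-text detectors rely on."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Toy stand-in values; in practice these come from scoring the text
# with an actual LM, one log-prob per token.
predictable = [math.log(0.9)] * 10   # model assigns high probability
surprising  = [math.log(0.1)] * 10   # model assigns low probability

print(perplexity(predictable))  # ≈ 1.11
print(perplexity(surprising))   # = 10.0
```

The model-sensitivity problem mentioned above falls out of this directly: the perplexity is a property of the *scoring* model, so text generated by a different model (or sampled at a different temperature) can score very differently.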


These figures seem to include ownership of mutual funds.

https://fred.stlouisfed.org/series/WFRBST01122


1491 vs 1418 Elo means the stronger model wins about 60% of the time.

Probably naive questions:

Does that also mean that Gemini-3 (the top-ranked model) loses to Mistral 3 40% of the time?

Does that make Gemini 1.5x better, or Mistral two-thirds as good as Gemini, or can we not quantify the difference like that?
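Under the Elo model the expected score depends only on the rating *difference*, via 1 / (1 + 10^((R_b − R_a)/400)), so it gives a win probability, not a multiplicative "1.5x better" ratio. A quick check of the 60% figure for these two ratings:

```python
def elo_expected_score(r_a, r_b):
    """Expected score (win probability, ignoring draws) of player A
    against player B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

p = elo_expected_score(1491, 1418)
print(round(p, 2))  # ≈ 0.60: a 73-point gap -> ~60/40
```

So yes, to the extent the model holds (and ignoring ties/draws), the lower-rated model would be expected to win the remaining ~40% of head-to-head comparisons.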


Yes, of course.

Wow. If all the trillions only produce that small of a diff... that's shocking. That's the sort of knowledge that could pop the bubble.

I wouldn't trust LMArena results much. They measure user preference and users are highly skewed by style, tone etc.

You can literally "improve" your model on LMArena by just adding a bunch of emojis.


> Have you tried Polars? It really discourages the inefficient creation of intermediate boolean arrays such as in the code that you are showing.

The problem is not usually inefficiency but syntactic noise. Polars does remove that in some cases, but in general is even more verbose (apparently by design), which gets annoying fast when doing explorative data analysis.
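To make the trade-off concrete, here is a small pandas filter written with the usual boolean-mask style, with the rough Polars equivalent shown in a comment (kept as a comment so the example only assumes pandas; the Polars line is my paraphrase of its expression API, not taken from the parent comments):

```python
import pandas as pd

df = pd.DataFrame({"species": ["dog", "cat", "dog"],
                   "legs": [4, 4, 5]})

# pandas: intermediate boolean arrays, with `df[...]` repeated for
# every condition -- the syntactic noise complained about above.
subset = df[(df["species"] == "dog") & (df["legs"] == 4)]

# Rough Polars equivalent (assumes polars is installed): it skips the
# intermediate masks but repeats pl.col() for every column instead:
#   df.filter((pl.col("species") == "dog") & (pl.col("legs") == 4))

print(subset)  # one row: the 4-legged dog
```

Neither version is obviously terser; which reads better for quick exploratory work is largely taste, which is the disagreement in this subthread.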


> And pandas is essentially a separate programming language.

I'd say dplyr/tidyverse is much more of a separate programming language relative to R than pandas is relative to Python.


I wonder what the last example of "logistics without libraries" would look like in R. Based on my experience of having to do "low-level" R, it's gonna be a true horror show.

In R, things for which there are ready-made libraries and recipes tend to be easy, but when those don't exist, things become extremely hard. And the usual approach is that if something isn't easy with a library recipe, it just doesn't get done.


Python: easy things are easy, hard things are hard.

R: easy things are hard, hard things are easy.


The way you describe it, can we say that R was AI-first without even knowing it?


R is overtly and heavily inspired by Lisp which was a big deal in AI at one point. They knew what they were doing.


Segmentation doesn't need to count legs. I'd guess something like YOLO could segment five-legged dogs too.


YOLO is not a segmentation model.



Thanks! TIL there's a class of segmentation models with the YOLO naming scheme.


I thought it was a joke about YAML


Lol you obviously haven't seen what cheats for FPS games look like in the last 3 years.

https://github.com/Babyhamsta/Aimmy


> over a 100€ out-of-pocket

My understanding is that you typically pay something like this in the US for a specialist visit even if you have insurance, especially if you haven't already paid the year's deductibles.

