Hacker News | jampekka's comments

LLMs can interact with the world via e.g. function calling.
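The core of function calling is just a dispatch loop outside the model: the model emits a structured (usually JSON) tool call, and the host program executes the matching function and feeds the result back. A minimal sketch of that dispatch step, with a hypothetical `get_weather` tool (the names here are illustrative, not any particular vendor's API):

```python
import json

# Hypothetical tool the model is allowed to call.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def dispatch(tool_call_json: str) -> str:
    """Parse a model-emitted tool call like
    {"name": "get_weather", "arguments": {"city": "Oslo"}}
    and run the matching Python function."""
    call = json.loads(tool_call_json)
    func = TOOLS[call["name"]]
    return func(**call["arguments"])

print(dispatch('{"name": "get_weather", "arguments": {"city": "Oslo"}}'))
# prints "Sunny in Oslo"
```

In a real agent loop this string comes from the model's output and the return value is appended to the conversation, which is how the LLM "interacts with the world".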

A good heuristic is that if an argument resorts to "actually not doing <something complex sounding>" or "just doing <something simple sounding>" etc, it is not a rigorous argument.

That seems somewhat similar to perplexity-based detection, although you can just read off the probability of each token directly instead of picking the n-best, and you don't have to generate.

It kinda works, but is not very reliable and is quite sensitive to which model the text was generated with.

This page has nice explanations:

https://www.pangram.com/blog/why-perplexity-and-burstiness-f...
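The perplexity computation itself is simple once you have per-token log-probabilities from a language model (scoring only, no generation). A toy sketch with made-up log-probs standing in for real LM scores:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(-mean log-probability per token).
    Low perplexity means the scoring model finds the text predictable,
    which is the signal perplexity-based AI-text detectors rely on."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Toy stand-in values; in practice these come from scoring the text
# with an actual LM, one log-prob per token.
predictable = [math.log(0.9)] * 10   # model assigns high probability
surprising  = [math.log(0.1)] * 10   # model assigns low probability

print(perplexity(predictable))  # ≈ 1.11
print(perplexity(surprising))   # = 10.0
```

The model-sensitivity problem mentioned above falls out of this directly: the perplexity is a property of the *scoring* model, so text generated by a different model (or sampled at a different temperature) can score very differently.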


These figures seem to include ownership of mutual funds.

https://fred.stlouisfed.org/series/WFRBST01122


1491 vs 1418 Elo means the stronger model wins about 60% of the time.

Probably naive questions:

Does that also mean that Gemini-3 (the top-ranked model) loses to Mistral 3 40% of the time?

Does that make Gemini 1.5x better, or Mistral two-thirds as good as Gemini, or can we not quantify the difference like that?
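Under the Elo model the expected score depends only on the rating *difference*, via 1 / (1 + 10^((R_b − R_a)/400)), so it gives a win probability, not a multiplicative "1.5x better" ratio. A quick check of the 60% figure for these two ratings:

```python
def elo_expected_score(r_a, r_b):
    """Expected score (win probability, ignoring draws) of player A
    against player B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

p = elo_expected_score(1491, 1418)
print(round(p, 2))  # ≈ 0.60: a 73-point gap -> ~60/40
```

So yes, to the extent the model holds (and ignoring ties/draws), the lower-rated model would be expected to win the remaining ~40% of head-to-head comparisons.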


Yes, of course.

Wow. If all the trillions only produce that small of a diff... that's shocking. That's the sort of knowledge that could pop the bubble.

I wouldn't trust LMArena results much. They measure user preference and users are highly skewed by style, tone etc.

You can literally "improve" your model on LMArena by just adding a bunch of emojis.


> Have you tried Polars? It really discourages the inefficient creation of intermediate boolean arrays such as in the code that you are showing.

The problem is not usually inefficiency but syntactic noise. Polars does remove that in some cases, but in general is even more verbose (apparently by design), which gets annoying fast when doing explorative data analysis.
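To make the trade-off concrete, here is a small pandas filter written with the usual boolean-mask style, with the rough Polars equivalent shown in a comment (kept as a comment so the example only assumes pandas; the Polars line is my paraphrase of its expression API, not taken from the parent comments):

```python
import pandas as pd

df = pd.DataFrame({"species": ["dog", "cat", "dog"],
                   "legs": [4, 4, 5]})

# pandas: intermediate boolean arrays, with `df[...]` repeated for
# every condition -- the syntactic noise complained about above.
subset = df[(df["species"] == "dog") & (df["legs"] == 4)]

# Rough Polars equivalent (assumes polars is installed): it skips the
# intermediate masks but repeats pl.col() for every column instead:
#   df.filter((pl.col("species") == "dog") & (pl.col("legs") == 4))

print(subset)  # one row: the 4-legged dog
```

Neither version is obviously terser; which reads better for quick exploratory work is largely taste, which is the disagreement in this subthread.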


> And pandas is essentially a separate programming language.

I'd say dplyr/tidyverse is much more of a separate programming language relative to R than pandas is relative to Python.


I wonder what the last example of "logistics without libraries" would look like in R. Based on my experience of having to do "low-level" R, it's gonna be a true horror show.

In R, things for which there are ready-made libraries and recipes tend to be easy, but when those don't exist, things become extremely hard. And the usual approach is that if something isn't easy with a library recipe, it just doesn't get done.


Python: easy things are easy, hard things are hard.

R: easy things are hard, hard things are easy.


The way you describe it, can we say that R was AI-first without even knowing it?


R is overtly and heavily inspired by Lisp which was a big deal in AI at one point. They knew what they were doing.


Segmentation doesn't need to count legs. I'd guess something like YOLO could segment five-legged dogs too.


YOLO is not a segmentation model.



Thanks! TIL there's a class of segmentation models with the YOLO naming scheme.


I thought it was a joke about YAML


Lol you obviously haven't seen what cheats for FPS games look like in the last 3 years.

https://github.com/Babyhamsta/Aimmy


> over a 100€ out-of-pocket

My understanding is that you typically pay something like this in the US for a specialist visit even if you have insurance, especially if you haven't already paid the year's deductibles.

