It's not totally novel, but it's very cool to see the continued simplification of protein folding models - AF2 -> AF3 was a reduction in model architecture complexity, and this is another step in the direction of the bitter lesson.
I’m not sure AF3’s performance would hold up if it hadn’t been trained on data from AF2, which itself bakes in a lot of inductive bias, like equivariance.
Protein folding is in no way "solved". AlphaFold dramatically improved the state-of-the-art, and works very well for monomeric protein chains with structurally resolved nearest neighbors. It abjectly fails on the most interesting proteins - just go check out any of the industry's hottest undrugged targets (e.g. transcription factors)
> When it comes to very complicated things, physics tends to fall down and we need to try non-physics modeling, and/or come up with non-physics abstraction.
"When things are complicated, if I just dream that it is not complicated and solve another problem than the one I have, I find a great solution!"
Joking apart, models that can narrow the search to a potentially very interesting sub-phase-space, much smaller than the original one, are incredibly useful. But fundamental understanding of the underlying principles, which lets you make very educated guesses about what can and cannot be ignored, usually wins against throwing everything at the wall...
And as you are pointing out, when complex reality comes knocking, it usually is much, much messier...
I have your spherical cow standing on a frictionless surface right here, sir. If you act quickly, I can include the "spherical gaussian sphere" addon with it, at no extra cost.
As someone who loves SML/OCaml and has written primarily Rust over the past ~10 years, I totally agree - I use it as a modern and ergonomic ML with best-in-class tooling, libraries, and performance. Lifetimes are cool, and I use them when needed, but they aren't the reason I use Rust at all. I would use Rust with a GC instead of lifetimes too.
Either a lot of clones or a lot of reference-counted pointers. Especially if your point of comparison is a GC language, this is much less of a crime than some people think.
When I say "use" them, I mean make heavy use of them: structs or functions annotated with multiple lifetimes, data flows designed around borrowing, etc. You can often get by with just `clone` and lifetime elision, and if you don't need to eke out that last bit of performance, that's fine.
I looked through their torch implementation and noticed that they are applying RoPE to both query and key matrices in every layer of the transformer - is this standard? I thought positional encodings were usually just added once at the first layer
All the Llamas have done it (well, 2 and 3, and I believe 1, I don't know about 4). I think they have a citation for it, though it might just be the RoPE paper (https://arxiv.org/abs/2104.09864).
I'm not actually aware of any model that doesn't do positional embeddings on a per-layer basis (excepting BERT and the original transformer paper, and I haven't read the GPT2 paper in a while, so I'm not sure about that one either).
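Right - per-layer application is the standard way to use RoPE: instead of adding a positional vector to the token embeddings once at the input, the rotation is applied to the query and key projections inside every attention layer (values are left untouched). A minimal NumPy sketch of the rotation itself (names and shapes are illustrative, not taken from their implementation):

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply rotary position embedding to x of shape (seq_len, d).

    Each pair of dimensions (x1[i], x2[i]) is rotated by an angle
    proportional to the token's position, with a different frequency
    per pair - so q.k dot products depend on relative position.
    """
    seq_len, d = x.shape
    half = d // 2
    # one rotation frequency per pair of dimensions
    freqs = base ** (-np.arange(half) / half)           # (half,)
    angles = np.arange(seq_len)[:, None] * freqs[None]  # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # 2D rotation applied pairwise
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

# Inside *every* attention layer, after the q/k projections:
# q, k = rope(q), rope(k)   # v is not rotated
```

Because it's a pure rotation, it preserves the norm of each q/k vector, and there is no positional vector to "wash out" as activations pass through layers - which is part of why re-applying it at every layer works well.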
I think you cropped out the important part of the quote:
> It’s rare that I have more than a drink or two in one night.
I don't drink that often anymore, but 2-3 drinks in a night, done occasionally, is not a problem. I've had weeks where I drink a beer (or two!) every night, and I don't struggle with any alcohol problems either.
2 drinks every single night? That's leaning that way - and not great for you just from a health/caloric perspective either.
I always wonder why people would make such obvious selective edits that completely change the meaning of a sentence and quote it as if it was what the author intended.
Do they not think people will notice? Or do they not notice that they've even done it?
One possibility is trying a different form of cardio. I personally don't enjoy running at all... but I love cycling. Running for 30 minutes is super boring, but I can go do a 4-hour ride no problem. If you can't go outside at all, then this won't really help you though.
Same here: I discovered the fun of rollerblading in skate parks at 34.
Never did any sport in my whole life, officially obese, but now I’m taking a group class at a skatepark every week and I’m having so much fun that I’m pushing myself to do more sessions even when I don’t feel like it. And even if I’m still pretty "bad" at it, it’s just amazingly liberating.