More

bgirard · 2025-11-04T18:04:26 1762279466

I'm legitimately curious what his overall performance is. Are those just cherry picked trades?

embedding-shape · 2025-11-04T18:23:57 1762280637

As far as I can tell, he been doing lots of puts and "winning" maybe 1/8 of the ones I've known about, so pretty hit and miss history so far.

Havoc · 2025-11-04T18:09:27 1762279767

His wins were quite big so I’d imagine he’s doing really well overall

bgirard · 2025-11-04T18:13:46 1762280026

Is there anything more concrete than that? Large wins on their own aren't meaningful if they aren't good risk adjusted trades or repeatable. I've made big wins but I don't consider myself a good trader.

zipy124 · 2025-11-04T18:49:54 1762282194

From may 2020 to may 2023 they did 56% annualised. Performance apart from that is unreported, so likely lower or negative.

lotsofpulp · 2025-11-04T20:37:32 1762288652

If this is the source, it does not seem like an objective, audited figure:

https://edition.cnn.com/2023/08/15/investing/michael-burry-s...

>Traders following the investments disclosed by Scion’s over the last 3 years (between May of 2020 and May 2023) would have made annualized returns of 56% according to an analysis by Sure Dividend

Seems like Scion Capital could have just disclosed winning trades, that they may or may not have made?

bgirard · 2025-10-28T16:48:03 1761670083

> Another was a code that was inpatient only and because it was an emergency he had never been admitted.

The threads says this was 4 hours of work and they billed for things that weren't even used.

bgirard · 2025-10-28T16:41:39 1761669699

I used AI to deal with customer support when a company tried to assign me the rental contract from the previous owner. ChatGPT correctly quoted the relevant Ontario Consumer Protection Act sections that applied. I just did quick verifications to make sure it wasn't hallucinating (it didn't). They tried to push back, but I had ChatGPT write a few responses standing first and they relented after a few exchanges.

bgirard · 2025-10-21T19:39:24 1761075564

That makes sense. But then the idea of a simultaneous failure preventing, say AWS and GCP/Azure, from restarting due to accidental circular dependencies in the init sequence needed to reboot the two competing cloud infrastructure is interesting.

bgirard · 2025-10-20T21:30:08 1760995808

The big issue I have with this experience is that you don't get a clear charge price before you leave. So you have to check a page either some minutes or hours later and hope that the total is correct. Like the article said, I don't love the idea of being charged for 3 overpriced bottles of water when I only took two. I'd rather just settle my transactions in the moment than try to remember what my total was and dispute things later from memory on the occasional times it's wrong.

teeray · 2025-10-21T04:22:42 1761020562

> you don't get a clear charge price before you leave. So you have to check a page either some minutes or hours later and hope that the total is correct

Oh, I’m very much sure this is a feature. Because, you see, only some percentage of people will actually look at the receipt. Some fraction of them will notice the error. Some fraction of those people will actually be motivated to spend their time on the phone clawing back an extra $8 water. The complement of that small percentage is a lucrative chance to sell the same overpriced water more than once.

bgirard · 2025-10-20T21:22:44 1760995364

> The person in front of me bought two items and saw she got charged for three. Since there were no paper receipts, she took a photo of the machine before going to the guest services to complain. I missed ten minutes of the game getting water.

I wish payment processors / consumer protection would have a significant penalty for sloppy overcharges. I've had to deal with sloppy overcharges like this (one for over $1,000) and you lose a lot of time and the outcome is just 'oppsies, my bad'. There's very little repercussion for sloppy overcharges so it's easy for them to perpetuate.

Doxin · 2025-10-21T07:06:35 1761030395

Back in the olden times overcharging like that would be dealt with the same way as theft would. I'm not entirely convinced it's the wrong way to go about things. It's how the concept of a bakers dozen came to be. Better give everyone 13 in a dozen, just in case you ever miscount.

bgirard · 2025-10-20T20:41:41 1760992901

I have a very similar experience. I was heavily invested in Anthropic/Claude Code, and even after Sonnet 4.5, I'm finding that Codex is performing much better for my game development project.

mmaunder · 2025-10-20T21:54:18 1760997258

It seems particularly good at high performance programming in low level languages.

bgirard · 2025-10-20T20:39:55 1760992795

Looks promising.

I got my environment working well with Codex's Cloud Task. Trying to same repo with Claude Code Web (which started off with Claude Code CLI mind you), and the yarn install just hangs with no debuggable output.

bgirard · 2025-09-26T15:34:34 1758900874

I think that's the real issue. If the LLM spends a lot of context investigating a bad solution and you redirect it, I notice it has trouble ignoring maybe 10K tokens of bad exploration context against my 10 line of 'No, don't do X, explore Y' instead.

rco8786 · 2025-09-26T15:51:25 1758901885

I think the general term for this is "context poisoning" and is related but slightly different to what the poster above you is saying. Even with a "perfect" context, the LLM still can't infer intent.

dingnuts · 2025-09-26T15:47:39 1758901659

that's because a next token predictor can't "forget" context. That's just not how it works.

You load the thing up with relevant context and pray that it guides the generation path to the part of the model that represents the information you want and pray that the path of tokens through the model outputs what you want

That's why they have a tendency to go ahead and do things you tell them not to do..

also IDK about you but I hate how much praying has become part of the state of the art here. I didn't get into this career to be a fucking tech priest for the machine god. I will never like these models until they are predictable, which means I will never like them.

dragonwriter · 2025-09-26T15:57:53 1758902273

This is where the distinction between “an LLM” and “a user-facing system backed by an LLM” becomes important; the latter is often much more than a naive system for maintaining history and reprompting the LLM with added context from new user input, and could absolutely incorporate a step which (using the same LLM with different prompting or completely different tooling) edited the context before presenting it to the LLM to generate the response to the user. And such a system could, by that mechanism, “forget” selected context in the process.

yggdrasil_ai · 2025-09-26T16:24:26 1758903866

I have been building Yggdrasil for that exact purpose - https://github.com/zayr0-9/Yggdrasil

PantaloonFlames · 2025-09-26T17:18:20 1758907100

At least a few of the current coding agents have mechanisms that do what you describe.

keeda · 2025-09-26T16:48:35 1758905315

> I didn't get into this career to be a fucking tech priest for the machine god.

You may appreciate this illustration I made (largely with AI, of course): https://imgur.com/a/0QV5mkS

The context (heheheh) is a long-ass article on coding with AI I wrote eons ago that nobody ever read, if anybody is curious: https://news.ycombinator.com/item?id=40443374

Looking back at it, I was off on a few predictions but a number of them are coming true.

davedx · 2025-09-26T15:59:45 1758902385

Yeah I start a new session to mitigate this. Don’t keep hammering away - close the current chat/session whatever and restate the problem carefully in a new one.

sethhochberg · 2025-09-26T16:59:38 1758905978

I've had great luck with asking the current session to "summarize our goals, conversation, and other relevant details like git commits to this point in a compact but technically precise way that lets a new LLM pick up where we're leaving off".

The new session throws away whatever behind-the-scenes context was causing problems, but the prepared prompt gets the new session up and running more quickly especially if picking up in the middle of a piece of work that's already in progress.

DenisM · 2025-09-26T17:34:00 1758908040

Wow, I had useless results asking “please summarize important points of the discussion” from ChatGPT. It just doesn’t understand what’s important, and instead of highlighting pivoting moments of the conversation it produce a high level introduction for a non-practitioner.

Can you share you prompt?

sethhochberg · 2025-09-26T19:06:59 1758913619

Honestly, I just type out something by hand that is roughly like what I quoted above - I'm not big on keeping prompt libraries.

I think the important part is to give it (in my case, these days "it" is gpt-5-codex) a target persona, just like giving it a specific problem instead of asking it to be clever or creative. I've never asked it for a summary of a long conversation without the context of why I want the summary and who the intended audience is, but I have to imagine that helps it frame its output.

cjbgkagh · 2025-09-26T16:03:52 1758902632

There should be a simple button that allows you refine the context. A fresh LLM could generate a new context from the input and outputs of the chat history, then another fresh LLM can start over with that context.

pulvinar · 2025-09-26T16:23:26 1758903806

It's easy to miss: ChatGPT now has a "branch to new chat" option to branch off from any reply.

PantaloonFlames · 2025-09-26T17:20:13 1758907213

You are saying “fresh LLM” but really I think you’re referring to a curated context. The existing coding agents have mechanisms to do this. Saving context to a file. Editing the file. Clearing all context except for the file. It’s sort of clunky now but it will get better and slicker.

cjbgkagh · 2025-09-26T17:52:15 1758909135

It seems that I have missed this existing feature, I’m only a light user of LLMs, I’ll keep an eye out for it.

fzzzy · 2025-09-26T23:04:07 1758927847

some sibling comments mentioned Claude code has this

adastra22 · 2025-09-26T16:17:55 1758903475

/compact in Claude Code.

moffkalast · 2025-09-26T16:20:07 1758903607

That's not how attention works though, it should be perfectly able to figure out which parts are important and which aren't, but the problem is that it doesn't really scale beyond small contexts and works on a token to token basis instead of being hierarchical with sentences, paragraphs and sections. The only models that actually do long context do so by skipping attention layers or doing something without attention or without positional encodings, all leading to shit performance. Nobody pretrains on more than like 8k, except maybe Google who can throw TPUs at the problem.

spyder · 2025-09-26T17:23:01 1758907381

This is false:

"that's because a next token predictor can't "forget" context. That's just not how it works."

An LSTM is also a next token predictor and literally have a forget gate, and there are many other context compressing models too which remember only the what it thinks is important and forgets the less important, like for example: state-space models or RWKV that work well as LLMs too. But even just a the basic GPT model forgets old context since it's gets truncated if it cannot fit, but that's not really the learned smart forgetting the other models do.

victorbjorklund · 2025-09-26T15:51:45 1758901905

You can rewrite the history (but there are issues with that too). So an agent can forget context. Simply dont feed in part of the context on the next run.

Mikhail_Edoshin · 2025-09-26T20:27:41 1758918461

Well, "a sufficiently advanced technology is indistinguishable from magic". It's just that it is same in a bad way, not a good way.

jofla_net · 2025-09-26T16:21:26 1758903686

Relax friend! I can't see why you'd be peeved in the slightest! Remember, the CEOs have it all figured out and have 'determined' that we don't need all those eyeballs on the code anymore. You can simply 'feed' the machine and do the work of forty devs! This is the new engineering! /s

ericmcer · 2025-09-26T19:44:36 1758915876

It seems possible for openAI/Anthropic to rework their tools so they discard/add relevant context on the fly, but it might have some unintended behaviors.

The main thing is people have already integrated AI into their workflows so the "right" way for the LLM to work is the way people expect it to. For now I expect to start multiple fresh contexts while solving a single problem until I can setup a context that gets the result I want. Changing this behavior might mess me up.

vel0city · 2025-09-26T21:54:23 1758923663

A number of agentic coding tools do this. Upon an initial request for a larger set of actions, it will write a markdown file with its "thoughts" on its plan to do something, and keep notes as it goes. They'll then automatically compact their contexts and re-read their notes to keep "focused" while still having a bit of insight on what it did previously and what the original ask was.

cvzakharchenko · 2025-09-26T22:02:16 1758924136

Interesting. I know people do this manually. But are there agentic coding tools that actually automate this approach?

sshine · 2025-09-26T22:58:21 1758927501

Claude Code has /init and /compact that do this. It doesn’t recreate the context as-is, but creates a context that is presumed to be functionally equivalent. I find that’s not the case and that building up from very little stored context and a lot of specialised dialogue works better.

vel0city · 2025-09-27T02:16:20 1758939380

I've seen this behavior with Cursor, Windsurf, and Amazon Q. It normally only does it for very large requests from what I've seen.

tom_m · 2025-09-28T13:14:38 1759065278

This does help, yes. Todo lists are important. They also reinforce order of operations.

sshine · 2025-09-26T21:48:20 1758923300

> rework their tools so they discard/add relevant context on the fly

That may be the foundation for an innovation step in model providers. But you can achieve a poor man’s simulation if you can determine, in retrospect, when a context was at peak for taking turns, and when it got too rigid, or too many tokens were spent, and then simply replay the context up until that point.

I don’t know if evaluating when a context is worth duplicating is a thing; it’s not deterministic, and it depends on enforcing a certain workflow.

Nerdx86 · 2025-09-27T09:01:13 1758963673

So this is where having subagents fed specific curated context is a help.. As long as the "poisoned" agent can focus long enough to generate a clean request to the subagent, the subagent works posion-free. This is much more likely than a single agent setup with the token by token process of a transformer.

The same protection works in reverse, if a subagent goes off the rails and either self aborts or is aborted, that large context is truncated to the abort response which is "salted" with the fact that this was stopped. Even if the subagent goes sideways and still returns success (Say separate dev, review, and test subagents) the main agent has another opportunity to compare the response and the product against the main context or to instruct a subagent to do it in a isolated context..

Not perfect at all, but better than a single context.

One other thing, there is some consensus that "don't" "not" "never" are not always functional in context. And that is a big problem. Anecdotally and experimental, many (including myself) have seen the agent diligently performing the exact thing following a "never" once it gets far enough back in the context. Even when it's a less common action.

cma · 2025-09-26T16:43:19 1758904999

Not that this shouldn't be fixed in the model, but you can jump to an earlier point in claude code and on web chat interfaces to get it out of the context, just sometimes you have other important stuff you don't want it to lose.

Nerdx86 · 2025-09-27T09:06:09 1758963969

The other issue with this is that if you jump back and it has edited code, it loses the context of those edits.. It may have previous versions of the code in memory and no knowledge of the edits leading to other edits that no longer align.. Often it's better to just /clear.. :/

PantaloonFlames · 2025-09-26T17:17:05 1758907025

Likewise Gemini CLI. There’s a way to backup to a prior state in the dialogue.

reissbaker · 2025-09-26T20:17:38 1758917858

IMO specifically OpenAI's models are really bad at being steered once they've decided to do something dumb. Claude and OSS models tend to take feedback better.

GPT-5 is brilliant when it oneshots the right direction from the beginning, but pretty unmanageable when it goes off the rails.

bgirard · 2025-09-08T17:46:02 1757353562

Did you attach the debugger and see what it was crashing on?

From when I used to work on performance and reliability at Mozilla, these types of user specific crashes were often caused by faulty system library or anti-virus like software doing unstable injections/hooks. Any kind of frequent crashes were also easier to reproduce and as a result fix.

ars · 2025-09-08T20:39:05 1757363945

Here is one of the crash reports: https://crash-stats.mozilla.org/report/index/15999dc2-9465-4...

(I happened to have an un-subitted one which I just submitted, all the other ones I submitted are older than 6 months and have been purged.)

It would crash in random spots, but usually some kind of X, GLX, EGL or similar library.

But I don't think it was GLX, etc, because it also didn't save my profile except very very rarely, which was actually a much worse problem!!

(This crash is from a long time ago, hence the old Firefox version.)

sfink · 2025-09-09T19:56:24 1757447784

I understand this might seem unlikely given that it's working fine as 64-bit, but that crash dump makes me want to suggest running a memory tester. It has "Possible bit flips max confidence" of 25%. Ignore the exact percentage, I don't think it means much, but nonzero is atypical and concerning. (Definitely not a proof of anything!)

"It would crash in random spots" is another piece of evidence. Some legitimate problems really do show up in a variety of ways, but it's way more common for "real" problems to show up in recognizably similar ways.

And having a cluster in graphics stuff can sometimes mean it's triggered by heat or high bus traffic.

I'll admit I'm biased: I work at Mozilla, and I have looked at quite a few bugs now where there were good reasons on the reporter's side as to why it couldn't possibly be bad RAM, and yet the symptoms made no sense, and it ended up being bad RAM. Our hardware is surprisingly untrustworthy these days. But I also happen to be working on things where bad RAM problems are likely to show up, so my priors are not your priors.

capitainenemo · 2025-09-08T23:28:57 1757374137

Is that proprietary nvidia driver stuff in the stack trace?

ars · 2025-09-09T05:16:31 1757394991

Yes, without it YouTube plays at like 5fps. If it matters, that proprietary nvidia stuff is still there with the 64 bit which is rock solid.

Also, I tried disabling EGL in the 32 bit with no effect on the crashing.

capitainenemo · 2025-09-09T23:32:30 1757460750

hm. would be nice to have one of the traces with egl disabled. but I guess this is not really my problem to debug, esp since it's such an edge case (32 bit on 64 bit machine + nvidia blob) and given mozilla is abandoning support anyway.