The proxy pattern here is clever - essentially treating the LLM context window as an untrusted execution environment and doing credential injection at a layer it can't touch.
One thing I've noticed building with Claude Code is that it's pretty aggressive about reading .env files and config when it has access. The proxy approach sidesteps that entirely since there's nothing sensitive to find in the first place.
Wonder if the Anthropic team has considered building something like this into the sandbox itself - a secrets store that the model can "use" but never "read".
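For anyone curious what that layer looks like, here's a minimal sketch in TypeScript (not Anthropic's implementation - the upstream host and UPSTREAM_TOKEN are made up). The credential lives only in the proxy process; the agent is pointed at localhost and never sees it:

    // Local proxy that injects a secret the agent can "use" but never read.
    import http from "node:http";
    import https from "node:https";

    const UPSTREAM_HOST = "api.example.com";    // hypothetical upstream API
    const TOKEN = process.env.UPSTREAM_TOKEN!;  // lives only in the proxy's env

    http.createServer((req, res) => {
      const upstream = https.request(
        {
          host: UPSTREAM_HOST,
          path: req.url,
          method: req.method,
          headers: { ...req.headers, host: UPSTREAM_HOST,
                     authorization: `Bearer ${TOKEN}` },  // injected here, outside the context window
        },
        (up) => { res.writeHead(up.statusCode ?? 502, up.headers); up.pipe(res); }
      );
      req.pipe(upstream);
    }).listen(8899);  // the agent only ever talks to http://localhost:8899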
The accounting pain nicbou mentioned is real. Bank reconciliation seems simple - two lists, match them - but then you hit timing differences where something cleared on different dates in each system, or description mismatches where the bank shows "PAYPAL *ACME" but you recorded "Acme Ltd - Invoice 4521".
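To make it concrete, a toy matching rule ends up looking something like this sketch (the 3-day window and shared-token heuristic are illustrative, not any real accounting tool's logic), and it still misses plenty of real cases:

    type Txn = { date: string; amount: number; desc: string };

    // Normalise "PAYPAL *ACME" and "Acme Ltd - Invoice 4521" into comparable tokens.
    const normalise = (s: string) =>
      s.toLowerCase().replace(/[^a-z0-9 ]/g, " ").replace(/\s+/g, " ").trim();

    const daysBetween = (a: string, b: string) =>
      Math.abs(Date.parse(a) - Date.parse(b)) / 86_400_000;

    function matches(bank: Txn, ledger: Txn): boolean {
      if (bank.amount !== ledger.amount) return false;             // amounts must agree exactly
      if (daysBetween(bank.date, ledger.date) > 3) return false;   // timing tolerance
      const bankTokens = normalise(bank.desc).split(" ");
      const ledgerDesc = normalise(ledger.desc);
      return bankTokens.some((t) => t.length > 3 && ledgerDesc.includes(t));
    }

    matches(
      { date: "2024-05-03", amount: -120, desc: "PAYPAL *ACME" },
      { date: "2024-05-01", amount: -120, desc: "Acme Ltd - Invoice 4521" }
    ); // true - cleared two days apart, matched on the shared "acme" token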
Transaction categorisation is arguably worse because there's no universal standard. What one accountant calls "Office Expenses" another puts in "General Admin" - both correct for their context. Any automation that works for one client's books tends to break when you switch to another.
The timezone handling alone makes Temporal worth the wait. I've lost count of how many bugs I've shipped because Date silently converts everything to local time when you least expect it.
The ZonedDateTime type is the real win here - finally a way to say "this is 3pm in New York" and have it stay 3pm in New York when you serialize and deserialize it. With Date you'd have to store the timezone separately and reconstruct it yourself, which everyone gets wrong eventually.
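A quick sketch with the @js-temporal/polyfill (Temporal is still a TC39 proposal, so the import changes once it ships natively) of what "stays 3pm in New York" looks like in practice:

    import { Temporal } from "@js-temporal/polyfill";

    const meeting = Temporal.ZonedDateTime.from(
      "2025-03-14T15:00:00[America/New_York]"
    );

    const wire = meeting.toString();
    // "2025-03-14T15:00:00-04:00[America/New_York]" - offset AND zone survive serialisation

    const restored = Temporal.ZonedDateTime.from(wire);
    console.log(restored.hour, restored.timeZoneId);  // 15 "America/New_York"

    // The old way: the zone is gone, and getHours() answers in whatever
    // timezone the reading machine happens to be in.
    new Date("2025-03-14T15:00:00-04:00").getHours();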
Only downside I can see is the learning curve. Date was bad but it was consistently bad in ways we all memorized. Temporal is better but also much larger - lots of types to choose between.
The DoorDash pizza arbitrage comparison is apt. Both cases expose the same fundamental thing: venture-subsidised pricing creates artificial market conditions that clever people will exploit.
What I find interesting is how long these windows stay open. You'd think someone at Stamps.com or UPS would notice the pricing anomaly, but large organisations are often too siloed. The team setting international rates probably doesn't talk to whoever monitors small parcel economics.
The author mentions making a few hundred dollars - but the real question is scalability. At what volume does this become attractive enough for the postal services to close the loophole? There's probably a sweet spot between "not worth their attention" and "actually profitable."
Building CodeIQ - an AI tool that automates transaction coding for accountants and bookkeepers.
The interesting technical bit: it analyses your historic general ledger to reverse-engineer how you specifically categorise transactions. So instead of generic rules, it learns your firm's actual patterns - "oh, they always code Costa Coffee to Staff Welfare, not Refreshments" - that kind of thing.
Posts directly to Xero, QuickBooks, Sage, and Pandle. The VAT handling turned out to be surprisingly gnarly (UK tax rules are... something).
Been working on it about 6 months now. Still figuring out the right balance between automation confidence and "just flag this for human review".
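The core idea, stripped of everything that makes it actually work (illustrative sketch only, not the real pipeline): count how each payee was historically coded, suggest the majority account, and only auto-post above a confidence threshold.

    type LedgerLine = { payee: string; account: string };

    function learnPatterns(history: LedgerLine[]) {
      // payee -> account -> number of times the firm coded it that way
      const counts = new Map<string, Map<string, number>>();
      for (const { payee, account } of history) {
        const perPayee = counts.get(payee) ?? new Map<string, number>();
        perPayee.set(account, (perPayee.get(account) ?? 0) + 1);
        counts.set(payee, perPayee);
      }
      return (payee: string) => {
        const perPayee = counts.get(payee);
        if (!perPayee) return { account: null, confidence: 0 };
        const total = [...perPayee.values()].reduce((a, b) => a + b, 0);
        const [account, n] = [...perPayee.entries()].sort((a, b) => b[1] - a[1])[0];
        return { account, confidence: n / total };
      };
    }

    const suggest = learnPatterns([
      { payee: "Costa Coffee", account: "Staff Welfare" },
      { payee: "Costa Coffee", account: "Staff Welfare" },
      { payee: "Costa Coffee", account: "Refreshments" },
    ]);
    suggest("Costa Coffee");
    // account "Staff Welfare", confidence ~0.67 - below a 0.9 threshold, so flag for review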
The point about grid size assumptions is interesting. I studied physics and one of the first things you learn is that your choice of coordinate system can make a problem trivially easy or impossibly hard. Same underlying reality, wildly different solution paths.
Reminded me of a pattern I keep seeing in business software - teams spend months optimizing the wrong abstraction. They'll build an incredibly efficient data pipeline that turns out to process information nobody actually needs, or an algorithm that minimizes compute time when the real bottleneck is waiting for a human approval that takes 3 days.
The simulated annealing approach wasn't wrong per se - it's just that "minimise distance walked" was never actually the objective function that mattered to the humans doing the walking.
The measurement problem here is real. "10x faster" compared to what exactly? Your best day or your average? First-time implementation or refactoring familiar code?
I've noticed my own results vary wildly depending on whether I'm working in a domain where the LLM has seen thousands of similar examples (standard CRUD stuff, common API patterns) versus anything slightly novel or domain-specific. In the former case, it genuinely saves time. In the latter, I spend more time debugging hallucinated approaches than I would have spent just writing it myself.
The atrophy point is interesting though. I wonder if it's less about losing skills and more about never developing them in the first place. Junior developers who lean heavily on these tools might never build the intuition that comes from debugging your own mistakes for years.
The quality variation from month to month has been my experience too. I've noticed the models seem to "forget" conventions they used to follow reliably - like proper error handling patterns or consistent variable naming.
What's strange is sometimes a fresh context window produces better results than one where you've been iterating. Like the conversation history is introducing noise rather than helpful context. Makes me wonder if there's an optimal prompt length beyond which you're actually degrading output quality.
Remember that the entire conversation is literally the query you're making, so the longer it gets, the more you're relying on the model to comprehend the whole thing and work out what's actually relevant.
The benchmark point is interesting but I think it undersells what the complexity buys you in practice. Yes, a minimal loop can score similarly on standardised tasks - but real development work has this annoying property of requiring you to hold context across many files, remember what you already tried, and recover gracefully when a path doesn't work out.
The TODO injection nyellin mentions is a good example. It's not sophisticated ML - it's bookkeeping. But without it, the agent will confidently declare victory three steps into a ten-step task. Same with subagents - they're not magic, they're just a way to keep working memory from getting polluted when you need to go investigate something.
The 200-line version captures the loop. The production version captures the paperwork around the loop. That paperwork is boring but turns out to be load-bearing.
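As a sketch of what that paperwork looks like (illustrative only - not nyellin's code, not the 200-line version, not anyone's production agent): keep the task list outside the model and re-inject the open items every turn, so the loop can't declare victory three steps in.

    type Todo = { description: string; done: boolean };

    async function runAgent(
      todos: Todo[],
      step: (openItems: string[]) => Promise<{ completed: string[] }>  // one model/tool turn
    ) {
      for (let turn = 0; turn < 50 && todos.some((t) => !t.done); turn++) {
        const open = todos.filter((t) => !t.done).map((t) => t.description);
        // The bookkeeping: remind the model of every open item, every turn.
        const { completed } = await step(open);
        for (const t of todos) {
          if (completed.includes(t.description)) t.done = true;
        }
      }
      return todos.every((t) => t.done) ? "done" : "gave up with items still open";
    }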
This site has gone full Tower of Babel. I've seen at least a thousand "AI comment" callouts on this site in the last month and at this point I'm pretty sure 99% of them are wrong.
In fact, can someone link me to a disputed comment that the consensus ends up being it's actually AI? I don't think I've seen one.
You know how the chicken sexers do their thing, but can't explain it? Like they can't write a list of things they check for. And when they want to train new people they have them watch (apprentice style) the current ones, and eventually they also become good at doing it themselves?
It's basically that. I can't explain it (I tried listing the tells in a comment below), but it's not just a list of things you notice. You notice the whole message, the cadence, the phrases that "add nothing". You play with enough models, you see enough generations and you start to "see it".
If you'd like to check for yourself, check that user's comment history. It will become apparent after a few messages. They all have these tells. I don't know how else to explain it, but it's there.
Yeah on a second look GP might actually be on to something here. Jackfranklyn only makes top-level comments, never replies to anyone, and I count at least 3 instances of "as someone who does this for a living" that are too separated in scope to be plausibly realistic.
You might notice I wasn't responding to your specific claim about a particular comment but to a later post by a different poster commenting on a wider phenomenon. Perhaps stop trying so hard to insert the idea you want to argue against into posts where it doesn't actually exist just so you can have something to argue about. (Especially given there are many direct responses to your post actually arguing with your claim that you could instead argue with.)
The tells are in the cadence. And the not x but y. And the last line that basically says nothing, while using big words. It's like "In conclusion", but worded differently. Enough tells for me to click on their history. They have the exact same cadence on every comment. It's a bit more sophisticated than "chatgpt write a reply", but it's still 100% aigen. Check it out, you'll see it after a few messages in their history.
No, it doesn't. The "I'm an expert at AI detection" crowd likes to cite things like "It's not X, it's Y" and other expression patterns without stopping to think that perhaps LLMs regurgitate those patterns because they are frequently used in written speech.
I assign a <5% probability that GP comment was AI written. It's easy to tell, because AI writing has no soul.
The message is 100% AI written. And if you click on their username and check their comment history you'll see that ALL their comments are "identical". Just do it, you'll see it by the 5th message. No one talks like that. No one talks like that on every message.
Exactly - if a comment just feels a little off but you're unsure, do a quick scan of the profile; it takes 15-30 seconds at most to get sufficient signal.
If it's actually AI, the pattern becomes extremely obvious reading them back-to-back. If no clear pattern, I'll happily give them the benefit of the doubt at that point. I don't particularly care if someone occasionally cleans up a post with an LLM as long as there is a real person driving it and it's not overused.
The other day on Reddit I saw a post in r/sysadmin that absolutely screamed karma farming AI and it was really depressing seeing a bunch of people defending them as the victim of an anti-AI mob without noticing the entire profile was variations of generic "Does anyone else dislike [Tool X], am I alone? [generic filler] What does everyone else think?" posts.
Looking at their profile I'm inclined to agree. But I think in isolation, this one post isn't setting off enough red flags for me. At the very least, they aren't just using default prompts.
I think at this point it's not easy to accurately detect whether or not something is AI written. A real person can definitely write like this. In fact, that's probably where the LLMs got their writing style from.
This resonates with something I noticed in client work. A surprising number of "urgent" requests resolve themselves if you wait a day - the person either figures it out, realises they asked the wrong question, or the underlying situation changes.
The tricky part is building enough trust that people don't feel ignored. I've started replying with "I'll look at this tomorrow" rather than going silent. Same delay, but it signals intentionality. People seem fine waiting when they know you've acknowledged the request.
Though I'll admit the line between strategic delay and just being slow is thin when you're managing multiple things at once.