Your post made me curious to try a problem I have been coming back to ever since ChatGPT was first released: https://open.kattis.com/problems/low
I had no success using LLMs to solve this particular problem until trying Gemini 3 just now, despite solutions to it existing in the training data. This has been my personal litmus test for LLM programming capabilities, and a model finally passed.
Just went to the comments searching for a comment like yours and I'm surprised it seems to be the only one calling this out. My take on this is also that "Skills" is just detailed documentation, which, as you correctly point out, basically never exists for any project. Maybe LLM skills will be the thing that finally makes us all write detailed documentation, but I kind of doubt it.
I generally find the aversion to documentation comes from one of three places:
* A belief that sufficient documentation means their job is at risk (which, to be fair, is 100% correct in this Capitalist hellscape - ask me how I know first-hand)
* A belief that it’s irrelevant, since the code will change again in a short amount of time
* A fierce protection over one’s output, sometimes manifesting as a belief that nobody but you could ever understand what you created
Sure, sometimes there are wholly incompetent developers who can’t even tell you their own dependencies, but I’d like to believe they’re still the exception rather than the rule. As for the value proposition, collaborators and cooperators understand the immense value of good, thorough documentation; those who don’t see the value, at least in my experience, are often adversarial instead of cooperative.
Can we really? All the reporting on climate change definitely has me thinking otherwise. There are options more respectful to our planet than digging tunnels, like planting trees to help moderate temperatures.
Great resource, definitely a good place to take the next step. As I looked into the details, a natural question came up (based on software development experience): how do I evaluate the correctness of the output an LLM produces for a given input? Clearly, unit tests with fixed input/output pairs won't help, so learning methods to evaluate as we develop iteratively will be very useful.
While I agree with you in principle, give Claude 4 a try on something like: https://open.kattis.com/problems/low
I would expect this to have been included in the training material, along with solutions found on GitHub. I've tried providing the problem description and asking Claude Sonnet 4 to solve it, and so far it hasn't been successful.
Just remembered some more details. The speaker covers the difference between applying something akin to the scientific method and jumping to conclusions based on previous encounters with the same or similar issues.
The example here is basically an 8-fold memory saving going from `long[]` to `byte[]` - while still retaining polymorphism (whereas in Java the two are unrelated types).
Hard to say exactly how much performance one would get, as that depends on access patterns.
The reason that a byte array is in reality laid out as a (mostly empty) long array in Java is actually performance.
Computers tend to align memory at 8-byte boundaries, and accessing such an address is faster than accessing an address that isn't 8-byte aligned.
Of course, it depends on your use case; in some cases a compact byte array performs better anyway, for instance because it now fits in your CPU cache.
> a byte array is in reality laid out as a (mostly empty) long array in Java
Are you saying each byte takes up a word? That is the case for `char array` in OCaml, but not for Java's `byte[]`. AFAIK the size of a byte array is rounded up to whole words: byte arrays of length 1-8 all have the same size on a 64-bit machine, then lengths 9-16 take up one more word.
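If you want to check this on your own JVM, here's a minimal sketch assuming the OpenJDK JOL library (`org.openjdk.jol:jol-core`) is on the classpath; the exact numbers depend on your JVM and settings such as compressed object headers:

```java
import org.openjdk.jol.vm.VM;

public class ByteArraySizes {
    public static void main(String[] args) {
        // sizeOf reports the full object size: header + data + padding to a word boundary.
        for (int len : new int[]{0, 1, 8, 9, 16, 17}) {
            System.out.printf("byte[%d] -> %d bytes%n", len, VM.current().sizeOf(new byte[len]));
        }
        // For comparison: a long[] needs 8 bytes per element, so there's no rounding to exploit.
        System.out.printf("long[8] -> %d bytes%n", VM.current().sizeOf(new long[8]));
    }
}
```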
Lamport's website has his collected works. The paper to start with is "Time, clocks, and the ordering of events in a distributed system." Read it closely all the way to the end. Everyone seems to miss the last couple sections for some reason.
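For a sense of the core mechanism before diving in, here's a minimal sketch (my own illustration, not code from the paper) of the logical-clock rule it introduces:

```java
// Each process keeps a counter: bump it on every local event and send,
// and on receive take the max of the local counter and the message's
// timestamp before bumping. This yields the "happened-before" ordering
// the paper is built around.
final class LamportClock {
    private long time = 0;

    // Local event or message send: advance the clock and return the new timestamp.
    synchronized long tick() {
        return ++time;
    }

    // Message receive: merge the sender's timestamp, then advance.
    synchronized long receive(long senderTimestamp) {
        time = Math.max(time, senderTimestamp) + 1;
        return time;
    }
}
```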