It should work with any type of model, though longer chains of thought will be harder for the evaluation model to analyse, since there are more reasoning steps to identify and separate. The quality of the output depends heavily on the model you choose for insights. We tested with Llama3-70B and it worked smoothly most of the time.
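To make the step-splitting concrete, here is a minimal sketch of what the evaluation model has to do per trace; the `split_steps` heuristic and the prompt wording are illustrative assumptions, not the library's actual implementation:

```python
import re

def split_steps(chain_of_thought: str) -> list[str]:
    """Naive heuristic: blank lines or numbered prefixes mark step boundaries.
    Longer chains yield more steps, so the evaluator has more work per trace."""
    parts = re.split(r"\n\s*\n|\n(?=\d+[.)]\s)", chain_of_thought)
    return [p.strip() for p in parts if p.strip()]

def build_eval_prompt(steps: list[str]) -> str:
    """Ask the insight model to judge each extracted step individually."""
    numbered = "\n".join(f"Step {i + 1}: {s}" for i, s in enumerate(steps))
    return (
        "For each step below, say whether it is correct, redundant, or a "
        "reasoning error, with a one-line justification.\n\n" + numbered
    )
```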
We currently give broad suggestions through an insight model that can be chosen during setup. We will keep improving the suggestion prompt/code to make the suggestions more granular in future releases.
Unfortunately, LLMs are a gigantic monster to understand. We were considering the same sliding-window approach you describe, and we will try to keep the library updated with better and more reliable approaches based on new research papers and our internal tests.
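For anyone curious, a sliding-window pass over a long reasoning trace could look roughly like this; the window and stride sizes are placeholders, and the scoring step is left as a comment:

```python
def sliding_windows(tokens: list[str], size: int = 512, stride: int = 256):
    """Yield overlapping windows so every region of a long chain of thought
    is evaluated with some local context on both sides."""
    for start in range(0, max(1, len(tokens) - size + stride), stride):
        yield start, tokens[start:start + size]

# Usage: score each window independently, then aggregate, e.g.
# scores = [evaluate(window) for _, window in sliding_windows(tokens)]
```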
Exactly! Uncertainty is critical for correctly evaluating LLM performance, and we don't need reasoning models spending thousands of tokens on simple questions.
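One cheap uncertainty proxy along these lines (my reading of the comment, not necessarily what the library does) is self-consistency: sample a small model a few times at non-zero temperature and only escalate to an expensive reasoning model when the samples disagree. A sketch:

```python
from collections import Counter

def agreement_uncertainty(answers: list[str]) -> float:
    """0.0 = all samples agree; approaches 1.0 as samples fully disagree.
    `answers` are the normalized final answers from k cheap samples."""
    counts = Counter(a.strip().lower() for a in answers)
    most_common = counts.most_common(1)[0][1]
    return 1.0 - most_common / len(answers)

# Routing idea: cheap model first, reasoning model only when unsure.
# if agreement_uncertainty(samples) > 0.3:
#     answer = reasoning_model(question)  # hypothetical fallback call
```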
Not everyone (and not in every development phase) breaks commits down into something easily describable beyond “update code”. Having mass changes or stream-of-consciousness refactorings in a single commit is absolutely normal.
An author doesn’t need to please a repo’s readers until they see a good reason to do so.
Indeed, that's what most of my projects' commit logs look like in the startup phase. Eventually I make a commit with an "MVP" message and then try to go from there with meaningful messages.
Agree! The 'clean commit' is an ideal, not a reality. Looking back on some of my own repos, I know I should've included a little more reasoning context, if only intermittently.