Do you use a spell checker? We’ll guess you do. Would you use a button that just said “correct all spelling errors in document?” Hopefully not. Your word processor probably doesn’t even offer that as an option. Why? Because a spell checker will reject things not in its dictionary (like Hackaday, maybe). It may pick the wrong word as the correction. Of course, it also may miss things like “too” vs. “two.” So why would you just blindly accept AI code review? You wouldn’t, and that’s [Bill Mill’s] point with his recent tool made to help him do better code reviews.
He points out that he ignores most of the suggestions the tool outputs, but that it has saved him from some errors. Like a spellcheck, sometimes you just hit ignore. But at least you don’t have to check every single word.
The basic use case is to evaluate PRs (pull requests) before sending them or when receiving them. He does mention that it would be rude to simply dump the tool’s comments into your comments on a PR. The tool really just flags places a human should look at with more discernment.
The program uses a command-line interface to your choice of LLM. You can use local models or select among remote models if you have a key. For example, you can get a free key for Google Gemini and set it up according to the instructions for the llm program. Many people, though, will be more interested in running it locally so they don’t share their code with the AI’s corporate overlords. And if you don’t mind sharing, there are plenty of tools like GitHub Copilot that will happily do the same thing for you.
The review tool is just a bash script, so it is easy to change, including the system prompt, which you could tweak to your liking:
Please review this PR as if you were a senior engineer.
## Focus Areas
– Architecture and design decisions
– Potential bugs and edge cases
– Performance considerations
– Security implications
– Code maintainability and best practices
– Test coverage
## Review Format
– Start with a brief summary of the PR purpose and changes
– List strengths of the implementation
– Identify issues and improvement opportunities (ordered by priority)
– Provide specific code examples for suggested changes where applicable
Please be specific, constructive, and actionable in your feedback. Output the review in markdown format.
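To make the idea concrete, here is a minimal sketch of how a script like this could be wired together. It is not [Bill Mill’s] actual script. It assumes the llm command-line tool is installed and configured with a default model, and it uses that tool’s -s option to pass a system prompt. The function names, the shortened prompt, and the LLM_CMD override are all hypothetical, made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a PR review helper; not the original tool.

# Shortened version of the system prompt quoted above (first and last lines only).
SYSTEM_PROMPT='Please review this PR as if you were a senior engineer.
Please be specific, constructive, and actionable in your feedback. Output the review in markdown format.'

# review_diff: read a diff on stdin and write the model's review to stdout.
# LLM_CMD is an assumed hook (not part of any real tool) that lets you swap in
# a different command, for example a local model wrapper or a stub for testing.
review_diff() {
  if [ -n "${LLM_CMD:-}" ]; then
    $LLM_CMD
  else
    # llm reads the prompt body from stdin; -s sets the system prompt.
    llm -s "$SYSTEM_PROMPT"
  fi
}

# review_branch [BASE]: review what the current branch changed relative to BASE.
# The three-dot range diffs HEAD against its merge base with BASE (default: main).
review_branch() {
  git diff "${1:-main}...HEAD" | review_diff
}
```

With the functions sourced, something like `review_branch main > review.md` would leave the model’s notes in a file for a human to read through, ignoring most of them and following up on the few that matter.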
Will you use a tool like this? Will you change the prompt? Let us know in the comments. If you want to play more with local LLMs (and you have a big graphics card), check out msty.
Regarding local models, does anyone have recent experience with whether they are actually usable for tasks like code review now? I tried some Llama variant a year ago, and the quality was useless, and on a low-tier GPU it was way too slow.
It very much depends on the size of the model you are using in addition to the model itself. Personally, I find that any model smaller than 24B simply produces so much garbage that it’s completely unusable, and even with 24B models you still have to pay a ton of attention to what, exactly, it’s doing or producing.
Running a 24B model or bigger, let alone at a decent speed? Yeah, you’re gonna need some beefy (and expensive) hardware for that.
We haven’t developed a ton of more or less sophisticated tools that deterministically check our code according to rules we devised (spell checkers, style checkers, static code analysers, CI test pipelines) just to replace them with some vibing parrot without any regard for the rules whatsoever.
Inability to follow rules of logic is what makes most of these AI/LLM tools useless in our industries, which were built from the ground up on logic. From a shoemaker’s shop to a rocket factory, we strive for repeatability by following rules/recipes. These tools don’t fit that paradigm.
I’ll just leave this here,
https://www.calendar.com/blog/claude-opus-4-achieves-record-performance-in-ai-coding-capabilities/
That’s a puff piece of the highest order, and reads as though it was AI generated itself. I’m with steel man on this one.
I also stand with steel man
I can see the shadows of a dystopian future where programming will be reduced to mere commands to an AI… and the future is nigh.