It span across 9000 LOC, 63 new files, including a DSL parser and much more.
How would you go about reviewing a PR like this?
It span across 9000 LOC, 63 new files, including a DSL parser and much more.
How would you go about reviewing a PR like this?
76 comments
If expectations have been shared and these changes contradict them, you can quickly close the PR, explain why it's not acceptable, and ask them to redo it.
If you don't have clear guidelines on AI usage or haven't shared your expectations, you'll need to review the PR more carefully. First, verify whether your assumption that it’s a simple service is accurate. If it is, talk to the author and point out that it's more complicated than necessary. You can also ask if they used AI and warn them about the complexities it can introduce.
For work? Close it and remind them that their AI velocity doesn't save the company time if it takes me many hours (or even days depending on the complexity of the 9k lines) to review something intended to be merged into an important service. Ask them to resubmit a smaller one and justify the complexity of things like a DSL if they wanted it included. If my boss forces me to review it, then I do so and start quietly applying for new jobs where my job isn't to spend 10x (or 100x) more time reviewing code than my coworkers did "writing" it.
Another equally correct approach (given the circumstances of the organisation) is to get a different AISlopBot to do the review for you, so that you spend as much time reviewing as the person who submitted the PR did coding.
In this job market? And where pretty much every company seems to be following the top-down push for AI-driven "velocity"?
If AI slop infiltrates projects enterprises are built upon, its likely companies and their customers are metaphorically hurt too, because of a spike in outages etc... (which already happens given AWS got like 7000 outage reports after getting rid of another 14000 employees).
Yes AI can be cool, but can we stop being this blind regarding its limitations, usecases, how its actually used, how it actually benefits humanity, and so on? Like give me a valid reason for Sora existing (except for monetizing attentionspans of humans, which I consider highly unethical).
Otherwise you need to be the person at the company who cuts through the bullshit and saves it from when the VibeCodeTechDebt is popping the industry.
Depends on the context. Is this from:
1. A colleague in your workplace. You go "Hey ____, That's kind of a big PR, I am not sure I can review this in a reasonable time frame can you split it up to more manageable pieces? PS: Do we really need a DSL for this?"
2. A new contributor to your open source project. You go "Hey ____, Thanks for your interest in helping us develop X. Unfortunately we don't have the resources to go over such a large PR. If you are still interested in helping please consider taking a swing at one of our existing issues that can be found here."
3. A contributor you already know. You go "Hey I can't review this ___, its just too long. Can we break it up to smaller parts?"
Regardless of the situation be honest, and point out you just can't review that long a PR.
Edit: left out that the user got flamed by non contributors for their apparently AI generated PR and description (rude), in defense of which they did say they were using several AI tools to drive the work. :
We have a performance working group which is the venue for discussing perf based work. Some of your ideas have come up in that venue, please go make issues there to discuss your ideas
my 2 cents on AI output: these tools are very useful, please wield them in such a way that it respects the time of the human who will be reading your output. This is the longest PR description I have ever read and it does not sound like a human wrote it, nor does it sound like a PR description. The PR also does multiple unrelated things in a single 1k line changeset, which is a nonstarter without prior discussion.
I don't doubt your intention is pure, ty for wanting to contribute.
There are norms in open source which are hard to learn from the outside, idk how to fix that, but your efforts here deviate far enough from them in what I assume is naivety that it looks like spam.
“AI Slop attacks on the curl project” https://youtu.be/6n2eDcRjSsk
Much more so than before, I'll comfortably reject a PR that is hard to follow, for any reason, including size. IMHO, the biggest change that LLMs have brought to the table is that clean code and refactoring are no longer expensive, and should no longer be bargained for, neglected or given the lip service that they have received throughout most of my career. Test suites and documentation, too.
(Given the nature of working with LLMs, I also suspect that clean, idiomatic code is more important than ever, since LLMs have presumably been trained on that, but this is just a personal superstition, that is probably increasingly false, but also feels harmless)
The only time I think it is appropriate to land a large amount of code at once is if it is a single act of entirely brain dead refactoring, doing nothing new, such as renaming a single variable across an entire codebase, or moving/breaking/consolidating a single module or file. And there better be tests. Otherwise, get an LLM to break things up and make things easier for me to understand, for crying out loud: there are precious few reasons left not to make reviewing PRs as easy as possible.
So, I posit that the emotional reaction from certain audiences is still the largest, most exhausting difference.
Are you contending that LLMs produce clean code?
I 100% know what you mean, and largely agree, but you should check out the guidelines, specifically:
> Don't be curmudgeonly. Thoughtful criticism is fine, but please don't be rigidly or generically negative.
And like, the problem _is_ *bad*. A fun, on-going issue at work is trying to coordinate with a QA team who believe chatgpt can write css selectors for HTML elements that are not yet written.
That same QA team deeply care about the spirit of their work, and are motivated by, the _very_ relatable sentiment of, you DONT FUCKING BREAK USER SPACE.
Yeah, in the unbridled, chaotic, raging plasma that is our zeitgeist at the moment, I'm lucky enough to have people dedicating a significant portion of their life to trying to do quality assurance in the idiomatic, industry best-standard way. Blame the FUD, not my team.
I would put to you that the observation that they do not (yet) grok what, for lack of a more specific universally understood term we are calling, "AI" (or LLMs if you are Fancy. But of course none of these labels are quite right). People need time to observe, and learn. And people are busy with /* gestures around vaguely at everything /*.
So yes, we should acknowledge that long-winded trash PRs from AI are a new emergent problem, and yes, if we study the specific problem more closely we will almost certainly find ever more optimal approaches.
Writing off the issue as "stupidity" is mean. In both senses.
No. We need spam filters for this stuff. If it isn't obvious to you yet, it will be soon. (Or else you're one of the spammers.)
If PR author can satisfy it - I'm fine with it.
Possibly you reject it with "this seems more suitable for a fork than a contribution to the existing project". After all there's probably at least some reason they want all that complexity and you don't.
"review it like it wasn't AI generated" only applies if you can't tell, which wouldn't be relevant to the original question that assumes it was instantly recognisable as AI slop.
If you use AI and I can't tell you did, then you're using it effectively.
After pointing out 2-3 things, you can just say that the quality seems too low and to come back once it meets standards. Which can include PR size for good measure.
If the author can't explain what the code does, make an explicit standard that PR authors must be able to explain their code.
While it would be nice to ship this kind of thing in smaller iterative units, that doesn’t always make sense from a product perspective. Sometimes version 0 has bunch of requirements that are non-negotiable and simply need a lot of code to implement. Do you just ask for periodic reviews of the branch along the way?
Are you hiding them from CIA or Al-Qaeda?
Feature toggles, or just plain Boolean flag are not rocket science.
- Chains of manageable, self-contained PRs each implementing a limited scope of functionality. “Manageable” in this context means at most a handful of commits, and probably no more than a few hundred lines of code (probably less than a hundred tbh).
- The main branch holds the latest version of the code, but that doesn’t mean it’s deployed to production as-is. Releases are regularly cut from stable points of this branch.
- The full “product” or feature is disabled by a false-by-default flag until it’s ready for production.
- Enablement in production is performed in small batches, rolling back to disabled if anything breaks.
If it's a newcomer to the project, a large self contained review is more likely to contain malware than benefits. View with suspicion.
That alone should be the reason to block it. But LLM generated code is not protected by law, and by extension you can damage your code base.
My company does not allow LLM generated code into anything that is their IP. Generic stuff outside of IP is fine, but every piece has to flagged that it is created by an LLM.
In short, these are just the next evolution of low quality PRs.
Accepting code into the project when only one person (the author) knows what it does is a very bad idea. That's why reviews exist. Accepting code that zero persons know what it does is sheer screaming insanity.
that's the point though, if they can't do it, then you close the ticket and tell them to fork off.
“This PR is really long and I’m having a hard time finding the energy to review it all. My brains gets full before I get to the end. Does it need to be this long?”
Force them to make a case for it. Then see how they respond. I’d say good answers could include:
- “I really trieeld to make it smaller, but I couldn’t think of a way, here’s why…”
- “Now that I think about it, 95% of this code could be pushed into a separate library.”
- “To be honest, I vibe coded this and I don’t understand all of it. When I try to make it smaller, I can’t find a way. Can we go through it together?”
In your case, I'd just reject it and ensure repo merges require your approval.
Personally I think it's difficult to address these kinds of PR's but I also think that git is terrible at providing solutions to this problem.
The concept of stacked PR's are fine up to the point where you need to make changes throughout all yours branches, then it becomes a mess. If you (like me) might have a tendency to rewrite your solution several times before ending up with the final result, then having to split this into several PR's does not help anyone. The first PR will likely be outdated the moment I begin working on the next.
Open source is also more difficult in this case because contrary to working for a company with a schedule, deadlines etc... you can't (well you shouldn't) rush a review when it's on your own time. As such PR's can sit for weeks or months without being addressed. When you eventually need to reply to comments about how, why etc.. you have forgotten most of it and needs to read the code yourself to re-claim the reasoning. At that time it might be easier to re-read a 9000 lines PR over time rather than reading 5-10 PR's with maybe meaningful descriptions and outcome.
Also, if it's from a new contributor, I wouldn't accept such a PR, vibe coded or not.
One suggestion that possibly is not covered is that you/we can document clearly how AI generated PRs will be handled, make it easy for contributors to discover it and if/when such PR shows up refer the documented section to save yourself time.
"Thanks for the effort, but my time and energy is limited and I can't practically review this much code, so I'm closing this PR. We are interested in performance improvements, so you are welcome to pick out your #1 best idea for performance improvement, discuss it with the maintainers via ..., and then (possibly) open a focused PR which implements that improvement only."
As a reviewer or as a submitter?
So to me, it's less about being ridiculous (and "ridiculous" is a fighting word) and more a simple "that's not how this team does things because we don't have the resources to work that way."
Mildly hurt feelings in the most likely worst case (no food for a viral overtop tweet). At best recruitment of someone with cultural fit.
My experience is “too large/complex” provides an opening for arguementivenes and/or drama.
“We don’t do it like this” does not so much. It is social, sufficient and not a matter of opinion (“too” is a matter of opinion).
The PR would then be split into small ones up to 400 lines long.
In truth, such a big PR is an indicator that either (a) the original code is a complete mess and needs reengineering or more likely (b) the PR is vibe coded and is making lots of very poor engineering decisions and goes in the bin.
We don’t use AI agents for coding. They’re not ready. Autocomplete is fine. Agents don’t reason like engineers, they make crap PRs.
10 lines of code = 10 issues.
500 lines of code = "looks fine."
Code reviews.
Personally, I've felt drained dealing with small PRs fixing actual bugs by enthusiastic students new to projects in the pre-slop era.
Particularly if I felt they were doing it more to say they'd done it, rather than to help the project.
I imagine that motive might help drive an increase in this kind of thing.
The question itself doesn't matter. Just ask something. If their answer is genuine and making sense you deal with it like a normal PR. If their answer is LLM-generated too then block.
The you can say (and this is hard), this looks like it is vibe code and misses that first human pass we want to see in these situations (link), please review and afterwards feel free to (re)submit.
In my experience they'll go away. Or they come back with something that isn't cleaned up and you point out just one thing. Or sometimes! they actually come back with the right thing.
How does one handle that with tact and not lose their minds?
If you don't have test coverage, or if the "refactor" is also changing behaviour, that project is probably dead. Make sure there's a copy of the codebase from before the new lead joined so there's a damage mitigation roll back option available.
State the PR is too large to be reviewed, and ask the author to break it down into self-contained units.
Also, ask which functional requirements the PR is addressing.
Ask for a PR walkthrough meeting to have the PR author explain in detail to an audience what they did and what they hope to achieve.
Establish max diff size for PRs to avoid this mess.
This doesn't help all the time. There are those people who still keep finding things they want you to change a week after they first reviewed the code. I try to avoid including them in the code review. The alternative is to talk to your manager about making some rules, like giving reviewers only a day or two to review new code. It's easy to argue for that because those late comments really hinder productivity.
That data point is waaaaaay more important than any other when considering if you should think about reviewing it or not.
I would ask them to break it up into smaller chunks.
If you are lucky, they will also vibe fix it.
If I can write 8 9k line PRs everyday and open them against open source projects, even closing them let alone engaging with them in good faith is an incredible time drain vs the time investment to create them.
The only exception is some large migration or version upgrade that required lots of files to change.
As far it goes for Vibe coded gigantic PRs It's a straight reject from me.
In my eyes, there really shouldn't be more than 2-3 "full" files worth of LOC for any given PR (which should aim to to address 1 task/bug each. If not, maybe 2-3 at most), and general wisdom is to aim to keep "full" files around 600 LOC each (For legacy code, this is obviously very flexible, if not infeasible. But it's a nice ideal to keep in mind).
An 1800-2000 LOC PR is already pushing what I'd want to review, but I've reviewed a few like that when laying scaffolding for a new feature. Most PR's are usually a few dozen lines in 4-5 files each, so it's far below that.
9000 just raises so many red flags. Do they know what problem they are solving? Can they explain their solution approach? Give general architectual structure to their implementation? And all that is before asking the actual PR concerns of performance, halo effects, stakeholders, etc.
I'll just assume good intent first of all. Second, 9000 LOC spanning 63 lines is not necessarily an AI generated code. It could be a code mod. It could be a prolific coder. It could be a lot of codegen'd code.
Finally, the fact that someone is sending you 9000 LOC code hints that they find this OK, and this is an opportunity to align on your values. If you find it hard to review, tell them that I find it hard to review, I can't follow the narrative, its too risky, etc. etc.
Code review is almost ALWAYS an opportunity to have a conversation.
- Large prs - vibe coding - development quality”
Was your project asking for all this? No? Reject.
As someone on the "senior" side AI has been very helpful in speeding up my work. As I work with many languages, many projects I haven't touched in months and while my code is relatively simple the underlying architecture is rather complex. So where I do use AI my prompts are very detailed. Often I spot mistakes that get corrected etc. With this I still see a big speedup (at least 2x,often more). The quality is almost the same.
However, I noticed many "team leads" try to use the AI as an excuse to push too difficult tasks onto "junior" people. The situation described by the OP is what happens sometimes.
Then when I go to the person and ask for some weird thing they are doing I get "I don't know, copilot told me"...
Many times I tried to gently steer such AI users towards using it as a learning tool. "Ask it to explain to you things you don't understand" "Ask questions about why something is written this way" and so on. Not once I saw it used like this.
But this is not everyone. Some people have this skill which lets them get a lot more out of pair programming and AI. I had a couple trainees in the current team 2 years ago that were great at this. This way as "pre-AI" in this company, but when I was asked to help them they were asking various questions and 6 months later they were hired on permanent basis. Contrast this with: - "so how should I change this code"? - You give them a fragment, they go put it in verbatim and come back via teams with a screenshot of an error message...
Basically expecting you will do the task for them. Not a single question. No increased ability to do it on their own.
This is how they try to use AI as well. And it's a huge time waster.
As example, you have made summarization app. User is try to upload 1 TB file. What you do? Reject request.
You have made summarization app. User is try upload 1 byte file 1000 times. What you do? Reject request.
However, this is for accidental or misconfigured user. What if you have malicious user? There are many technique for this as well: hell-ban, tarpit, limp.
For hell-ban simply do not handle request. It appear to be handled but is not.
For tarpit, raise request maker difficulty. e.g. put Claude Code with Github MCP on case, give broad instructions to be very specific and request concise code and split etc. etc. then put subsequent PRs also into CC with Github MCP.
For limp, provide comment slow using machine.
Assuming you're not working with such person. If working with such person, email boss and request they be fired. For good of org, you must kill the demon.
A simple email could tell the difference.
Unless you really trust them, it's up to the contributor to make their reasoning work for the target. Else, they are free to fork it if it's open source :).
I am a believer in using llm codegen as a ride along expert, but it definitely triggers my desire to over test software. I treat most codegen as the most junior coder had written it, and set up guardrails against as many things llm and I can come up with.
I once had someone submit a patch (back in the SVN days), that was massive, and touched everything in my system. I applied it, and hundreds of bugs popped up.
I politely declined it, but the submitter got butthurt, anyway. He put a lot of work into it.
PRs should be under 1000 lines.
The alternative is to sit down with them and ask what they're trying to accomplish and solve the problem from that angle.
(ctrl-v)
If they don't bother writing the code, why should you bother reading it? Use an LLM to review it, and eventually approve it. Then of course, wait for the customer to complain, and feed the complaint back to the LLM. /s
Large LLM generated PRs are not a solution. They just shift the problem to the next person in the chain.
And if they did in fact spend 6 months painstakingly building it, it wouldn't hurt to break it down into multiple PRs. There is just so much room for error reviewing such a giant PR.
Sometimes it doesn't split it among optimal boundaries, but it's usually good enough to help. There's probably room for improvement and extension (eg. re-splitting a branch containing many not-logical commits, moving changes between commits, merging commits, ...) – contributions welcome!
You can install it as a Claude Code plugin here: https://github.com/KevinWuWon/kww-claude-plugins (or just copy out the prompt from the repo into your agent of choice)
Congratulations, you discovered that generating code is only part of software development process. If you don't understand what the code is actually doing, good luck maintaining it. If it's never reviewed, how do you know these tests even test anything? Because they say "test passed"? I can write you a script that prints "test passed" a billion times - would you believe it is a billion unit tests? If you didn't review them, you don't have tests. You have a pile of code that looks like tests. And "it takes too long to review" is not an excuse - it's like saying "it's too hard to make a car, so I just took a cardboard box, wrote FERRARI on it and sit inside it making car noises". Fine, but it's not a car. It's just pretending. If it's not properly verified, what you have is not tests, it's just pretending.