grim_io 15 hours ago

I've tried some of those "frameworks" for Claude Code, but it's difficult to measure any objective improvement.

I tend to lean towards them being snake oil. A lot of process and ritual around using them, but for what?

I don't think the models themselves are a good fit for the way these frameworks are being used. It probably goes against their training.

Now we try to poison the context with lots of information that is useless for the actual task at hand, just so the model can conform to a superficial song-and-dance process? This seems backwards.

I would argue that we need less context poisoning with useless information. Give the model the most precise information for the actual work to be done and iterate on that. The song-and-dance process should happen outside of the context-constrained agent.

  • bicx 14 hours ago

    I adopted a couple practices (using dev containers and worktrees) just to make life a little easier. I also built my own shell script “framework” to help manage the worktrees and create project files. However, that took me just a couple days to do on my own (also using CC), and it didn’t lock me into a specific tool.
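
    A minimal sketch of the kind of helper I mean (the `.trees/` layout, branch naming, and seeded file names here are simplified illustrations, not my actual script):

    ```python
    #!/usr/bin/env python3
    """Minimal worktree helper: one git worktree per task, seeded with local files."""
    import subprocess
    import sys
    from pathlib import Path

    TREES = Path(".trees")  # illustrative location for per-task worktrees

    def new_worktree(task: str) -> Path:
        """Create a task branch plus worktree and copy local-only files into it."""
        path = TREES / task
        subprocess.run(
            ["git", "worktree", "add", "-b", f"task/{task}", str(path)],
            check=True,
        )
        # Seed files the agent expects that git doesn't carry over (hypothetical names).
        for name in ("CLAUDE.local.md", ".env"):
            if Path(name).exists():
                (path / name).write_bytes(Path(name).read_bytes())
        return path

    if __name__ == "__main__":
        print(new_worktree(sys.argv[1]))
    ```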

    I do agree that context poisoning is a real thing to watch out for. Coincidentally, I’d noticed MCP endpoint definitions had started taking a substantial block of context for me (~20k tokens), and that’s now something I consider when adopting any MCP.

    • grim_io 13 hours ago

      I'm considering removing the Serena MCP, since CC has gotten better with its own tools.

      The new CC /context command is great for visualizing how much of the context each thing uses.

      On the other hand, I'm curious about Dagger's container-use MCP. https://container-use.com/agent-integrations

  • nicwolff 8 hours ago

    This article doesn't mention "subagents" (https://docs.anthropic.com/en/docs/claude-code/sub-agents), which makes me wonder when it was written. I'm finding that just delegating "scan the memory bank for information relevant to the current task" and "run the unit and functional tests and report back only the failures or coverage" to subagents does a lot to keep the main agent's context from filling up.
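
    For reference, such a subagent is just a markdown file with YAML frontmatter under `.claude/agents/` (per the docs linked above); the wording below is a sketch, not my production version:

    ```markdown
    ---
    name: test-runner
    description: Runs the unit and functional tests and reports back only failures or coverage.
    tools: Bash, Read
    ---

    Run the project's unit and functional test suites. Report back only the
    failing tests (name, file, error message) and the coverage summary.
    Do not echo passing tests or full logs into the main conversation.
    ```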

    • 3uler 3 hours ago

      Many of the frameworks mentioned leverage subagents

    • bdangubic 8 hours ago

      > which makes me wonder when it was written

      On Sept 6, 2025

  • montroser 15 hours ago

    It's kinda the same as real-life project managers.

musbemus 21 hours ago

One thing I hope to see included is a precursor step when constructing specs, where Claude is used to intelligently inquire about the gaps whose answers would disambiguate the implementation. If you told an engineer to do something with a set of requirements and outcomes, they'd naturally have follow-up questions to ensure alignment before executing.
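
One way to approximate this today is a custom slash command that forces a question phase before any spec work; a sketch, with a hypothetical file name and wording:

```markdown
<!-- .claude/commands/spec.md (hypothetical) -->
Before writing any spec or code for: $ARGUMENTS

1. List every ambiguity, unstated assumption, and missing requirement you find.
2. Ask me numbered clarification questions, then STOP and wait.
3. Only after I answer, produce the spec with my answers folded in.
```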

  • joshstrange 20 hours ago

    Yes, kind of like OpenAI's deep research tool. I often find that a number of mistakes are made because no clarification questions are asked or even considered.

CompoundEyes a day ago

For anyone that's applied one of these: at what level of autonomy are you using it? And in what setting? Greenfield?

I see one mention brownfield development. Has anyone with experience using these frameworks fired up Claude Code on enterprise software and had confident results? I have unchecked access to Claude Code at work, and based on personal agentic coding I'm sure they do aid it. I have decent but not consistent results with my own “system” in our code base, at least until front-end UI components are involved, even with Playwright. But I'm curious: how much litter is left behind? How is your coworker tolerance? How large are your pull requests? What is your inference cost? How do these manage parallel work?

The README documentation for many is a mix of fevered infomercial, system-specific jargon, emoji splatter, and someone's dad's very specific toolbox-organization approach that only he understands. Some feel like they're setting the stage to sell something…trademarked!? Won't Anthropic and others just incorporate the best of the bunch into their CLI tools in time?

Outside of work I've regularly used a reasoning model to produce a ten-page spec, wired my project with the strictest lint, type-check, and formatter settings plus hooks, and instructed it to check items off as it goes and do red-green TDD. I can tell gpt-5 in Cursor to “go”, occasionally nudge it to stay on task with an “ok next”, and I'll end up with what I wanted in time, plus gold plating. The last one was a CLI tool for my agents to invoke and track their own work. Anyone with the same tools can just roll their own.
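
To illustrate, that whole gate can be one script the agent must run green before each "ok next" (the tool choices below are illustrative and assume a Python project, not my actual setup):

```python
#!/usr/bin/env python3
"""Hypothetical quality gate: every check must pass before the agent continues."""
import subprocess
import sys

CHECKS = [
    ["ruff", "check", "."],              # lint
    ["ruff", "format", "--check", "."],  # formatting
    ["mypy", "--strict", "."],           # type check
    ["pytest", "-q"],                    # red/green TDD loop
]

for cmd in CHECKS:
    if subprocess.run(cmd).returncode != 0:
        sys.exit(f"FAILED: {' '.join(cmd)}")
print("all checks green")
```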

  • dsiegel2275 20 hours ago

    I'm only three weeks into using Claude Code but I'm now seeing impressive results using a structured, "role" or "persona" based approach in a large (500K+ SLOC) Elixir / Phoenix codebase. I'm using the $200 Max plan - so my inference costs are fixed.

    For certain, the results are better when I use it to build new features into our platform - as opposed to making complicated refactors or other deep changes to existing parts of the system. But even in the latter case, if we have good technical documentation capturing the design and how parts of the system work (which we don't in many places), Claude Code can make good progress.

    At first I was seeing a fair amount of what I would consider "bad code": implementations that either didn't follow accepted coding style and patterns or simply weren't structured for reusability and maintainability. But after strengthening the CLAUDE.md file and adding an "elixir-code-reviewer" subagent, which the "developer" persona had to use, the quality of the code improved significantly.
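
    To give a flavor of what "strengthening" means here, a few rules in the spirit of our file (an illustrative excerpt, not a verbatim copy):

    ```markdown
    ## Elixir rules

    - Elixir lists do not support index-based access via the access syntax;
      use Enum.at/2 instead.
    - Never use `else if` or `elseif` in Elixir; always use `cond` or `case`
      for multiple conditionals.
    - Before marking any task complete, the developer persona must run the
      implementation past the elixir-code-reviewer subagent.
    ```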

    Our platform is open source, you can see our current Claude commands and subagents here: https://github.com/Simon-Initiative/oli-torus/tree/master/.c...

    • fny 20 hours ago

      I didn't realize just how bad LLMs are with unpopular languages until now:

      "Elixir lists do not support index based access via the access syntax"

      "Never use else if or elseif in Elixir, always use cond or case for multiple conditionals."

      • LiamPowell 19 hours ago

        Interestingly, I've found that Claude is very good at writing valid Ada; it just writes complete garbage that doesn't follow the specification at all. As an example, I asked it to change the handling of command-line arguments in a program so that duplicates would cause an error rather than being ignored. If I recall correctly, it took 6 course corrections to get to what I asked for, but each time it wrote valid code that just didn't complete the task. One attempt I remember arbitrarily limited the length of an argument to 4096 characters and then only accepted arguments that were exactly 4096 characters.

        Here is the relevant change, it didn't have any sort of hidden complexity: https://github.com/Prunt3D/prunt/commit/b4d7f5e35be6017846b8...

        • fny 18 hours ago

          I'm pretty convinced the only developers who think we're on the cusp of AGI code exclusively in Python or JavaScript.

          • 3uler 2 hours ago

            The problem is that there is a lot of bad Python and TypeScript/JavaScript out there, and I similarly find myself having to define my coding style in context files in order to get decent results in newer code bases without a lot of examples to follow. And even then you need to say "do it like @example.py" all the time.

            Maybe the future is models fine-tuned on specific coding styles?

          • Leynos 18 hours ago

            GPT-5 is a dab hand at Rust

            • evolve2k 10 hours ago

              I was hearing that it was good at writing Ruby on Rails, as the Rails community is already so structured about where things go and there's sort of a "Rails way" of writing the code. Anyone have experience with this?

    • iandanforth 20 hours ago

      Thanks for sharing that extensive documentation!

stocksinsmocks 20 hours ago

I got a strong sense of LLM style in the blog. Interesting information, but the fact that I'm learning about AI from AI is amusing.

  • stingraycharles 20 hours ago

    Yeah, unfortunately this is very often the case with articles about AI.

    In my own experience, this type of stuff is just wishful thinking right now: for anything non-trivial, you still need to monitor Claude Code closely and interrupt when you see it going down the wrong train of thought.

    Additionally, for security reasons, you don't want to give it too many permissions, and you do want to actually see which commands it's executing.

    The “frameworks” OP talks about are still far away. Right now the best way to think about it is as an intern which is usually wrong but can crank out code at lightning speed.

  • Szpadel 19 hours ago

    Looks like either the author didn't check the linked repos very well, or it is indeed lightly edited deep-research output.

    Examples: SuperClaude is not an MCP server at all; MetaGPT looks like it's not compatible with Claude Code at all.

matt3D a day ago

Pretty surprised BMAD-method wasn't mentioned.

For my money it's by far the best Claude Code complement.

  • touristtam 35 minutes ago

    BMAD is mentioned in the QA part, FWIW.

  • 3uler a day ago

    The BMAD system seems similar to the AgentOS mentioned in the post.

    This way of context engineering has definitely been the way to go for me, although I've just implemented it myself: using Claude to help generate commands and agents and tweaking them to my liking. Lately I've been using JSON as well as Markdown to share context between steps.

  • CGamesPlay a day ago

    What is this? Just a system prompt? What makes it so good for you?

    https://github.com/bmad-code-org/BMAD-METHOD

    • matt3D 21 hours ago

      It manifests as a sort of extension for Claude Code.

      When I'm in the terminal I can call on Agents who can create standardised documents so there is a memory of the product management side of things that extends beyond the context window of Claude.

      It guides you through the specification process so that you have extremely tight tasks for Claude to churn through, with all the necessary context, documentation, and acceptance criteria.

      Perhaps there are others similar, but I have found it completely transformative.

    • 3uler a day ago

      It’s basically a set of commands and agents and a way to structure context.

      • touristtam 33 minutes ago

        It is Agile for interactive sessions with an LLM.

  • MarcelOlsz a day ago

    Same with taskmaster, also not there.

    • 3uler a day ago

      I never found taskmaster that useful, something about how it forced you to work didn’t click with me…

      • MarcelOlsz 13 hours ago

        Yeah, that's fair, it doesn't feel great. It does work if you have something very concrete you want to make and know how to do it, so it's pretty easy to scope out into tasks and subtasks; but for working on something that you generate as you go and that requires editing tasks, it's pretty bad.

faangguyindia 10 hours ago

I often wonder why the agent is not allowed to manage its own context like humans do.

Why recycle the full history into every future turn until you run out of context window?

Perhaps if the agent managed its own context, knowing what makes a context effective, what harm going over the limit causes, and how to make that tradeoff smartly, it could navigate tasks better.
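
A toy sketch of the idea, with a hypothetical llm() call and a crude token count, where the agent folds its oldest turns into a summary instead of recycling the full history:

```python
# Toy sketch: the agent compacts its own history near the context limit.
# llm() is a hypothetical stand-in for a real completion call.

BUDGET = 100_000   # context-window budget in tokens
KEEP = 10          # recent turns kept verbatim

def tokens(messages):
    return sum(len(m["content"]) // 4 for m in messages)  # rough heuristic

def compact(history, llm):
    """Fold everything but the last KEEP turns into one summary turn."""
    if tokens(history) <= BUDGET or len(history) <= KEEP:
        return history
    old, recent = history[:-KEEP], history[-KEEP:]
    summary = llm(
        "Summarize these turns, keeping only the facts, decisions, and "
        "open TODOs relevant to the current task:\n"
        + "\n".join(m["content"] for m in old)
    )
    return [{"role": "system", "content": "Earlier context (summarized): " + summary}] + recent
```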

vivzkestrel 5 hours ago

What is your opinion of the research paper that was published recently claiming that AI coding actually slows you down by 20%?

  • 3uler 2 hours ago

    That’s like saying a vim expert would be slower in VS Code - technically the IDE does more, but expertise with your existing tools often beats learning new ones.

    Also, that study was from early 2025, before Claude 4, which to me was a big breakthrough in productivity; I did not really find these tools all that useful before Sonnet 4.

    • vivzkestrel 39 minutes ago

      But has that cut down your prompting time? I assume an AI agent takes a roughly fixed amount of time to generate N lines of code, so constructing effective prompts is probably where most time is spent. Has that time been cut down with newer releases, or has it been shown somehow that we need fewer prompts to achieve the same result with newer AI models?

anonyfox 13 hours ago

Isn't this the textbook "bitter lesson" playing out here again? Whatever "frameworks" people try to build, the next generation of models will make them obsolete, no?

---

link: http://www.incompleteideas.net/IncIdeas/BitterLesson.html

  • touristtam 5 minutes ago

    The models are still ingesting text, are they not? Those frameworks are providing textual guidance about what the task at hand should aim for. They formalise part of the context passed to the LLM, regardless of the model itself.

g42gregory 14 hours ago

I tried the BMAD framework and it was like night and day. Can't work without it. I'd like to see more frameworks like that.

sharts 11 hours ago

Context management seems like low-level programming, where you need to carefully place the right things in CPU registers to perform the operation you want correctly.

One difference is that we have less control over the context for adding/removing things as each task requires.

AbuAssar a day ago

Did anyone try any of these so-called frameworks? Do they deliver, or are they just riding the hype wagon?

  • stuartjohnson12 20 hours ago

    Having occasionally looked into these, the overwhelming issue is that pretty much all of them are built with themselves, and the output is what you'd expect: vast quantities of untested functionality, no documentation beyond a wall of Claude-isms about how next-generation and feature-rich it is, and no thought to applicability beyond the narrow set of projects that interest the author.

troupo a day ago

> a set of rules, roles, and workflows that make its output predictable and valuable.

Let me stop you right there. Are you seriously talking about predictable when talking about a non-deterministic black box over which you have no control?

  • andsoitis a day ago

    > Are you seriously talking about predictable when talking about a non-deterministic

    Predictability and determinism are related but different concepts.

    A system can be predictable in a probabilistic sense, rather than an exact, deterministic one. This means that while you may not be able to predict the precise outcome of a single event, you can accurately forecast the overall behavior of the system and the likelihood of different outcomes.

    https://philosophy.stackexchange.com/questions/96145/determi...

    Similarly, a system can be deterministic yet unpredictable due to practical limitations like sensitivity to initial conditions (chaos theory), lack of information, or the inability to compute predictions in time.

    • sublinear a day ago

      The topic of chaos is underrated when people talk about deterministic systems, but I think it's at least (usually?/always?) a tractable problem to draw up a fractal or something and find the non-chaotic regions of a solution space. You have nice variables to work with when you draw up a model of the problem.

      Maybe someone can elaborate better, but it seems there is no such luck trying to map probability onto problems the way "AI" is being used today. It's not just a matter of feeding it more data, but finding what data you haven't fed it or in some cases knowing you can't feed it some data because we have no known way to represent what is obvious to humans.

    • troupo a day ago

      From the discussion in the link: "Predictability means that you can figure out what will happen next based on what happened previously."

      Having used nearly all of the methods in the original article, I can predict that the output of the model is nearly indistinguishable from a coin toss for many, many, many rather obvious reasons.

  • raincole a day ago

    Yes, and there is absolutely nothing wrong with that. Living creatures are mostly black boxes. It doesn't mean we don't aim for making medicine with predictable effects (and side effects).

    • sublinear a day ago

      I don't think that's accurate.

      Don't most people working in medicine usually have biology and chemistry degrees? Are you saying those sciences are dark arts?

      • raincole a day ago

          Having biology degrees doesn't make you understand every detail of the human body. There are many, many drugs that are known to work (by double-blind testing) but we don't know exactly how.

          The details of how penicillin kills bacteria were only discovered in the 2000s, about half a century after its commercial production began. And I'm quite sure we'll still see some more missing puzzle pieces in the future.

        • sublinear a day ago

          Yes, but I think we want to know how they work? Not knowing "exactly how" but having a good ballpark idea is not equivalent to letting AI throw stuff at the wall to see what sticks.

    • troupo a day ago

      Medicine that can either kill you, cure you, or have no effect at any given time for the same disease is quite unlikely to even pass certification.

      Do you know why?

      • raincole a day ago

        That is exactly my point.

        • troupo 17 hours ago

          Then no one understands your point

          • raincole 17 hours ago

            Maybe you can try reading the other comments below your original comment, as they mostly share the same point, and I won't bother repeating what everyone else has said.

            I'll put it concisely:

            Trying to build predictable results upon unpredictable, not fully understood mechanisms is an extremely common practice in every single field.

            But anyway, you think the LLM is just a coin toss, so I won't engage with this sub-thread anymore.

            • troupo 16 hours ago

              And you should read replies to those replies, including yours.

              Nothing in the current AI world is as predictable as, say, the medicine you can buy or get prescribed. None of the shamanic "just one more prompt bro" rituals have the predictive power of physical laws. Etc.

              You could reflect on that.

              > But anyway you think LLM is just coin toss

              A person telling me to "try to read comments" couldn't read and understand my comment.

              • touristtam 12 minutes ago

                > Nothing in the current AI world is as predictable as, say, the medicine you can buy or you get prescribed.

                Do you know that there are approved drugs that were put on the market to treat one ailment and have since proven to have an effect on another, or have been shown to have unwanted side effects, and have therefore been repurposed? The whole drug _market_ is full of them, and all that is needed is enough trials to prove the desired effect...

                The LLM output is yours to judge as relevant to your work or not, but it seems that your experience is consistently subpar compared with what others have reported.

  • brookst 18 hours ago

    Would you say a top tier human developer produces predictable output? I would, in the sense that it will be well designed and implemented code that meets the requirements. Can we guess every variable name and logic choice? Probably not.

    • troupo 17 hours ago

      > Would you say a top tier human developer produces predictable output?

      First you'd have to prove that LLMs can be equated to a "top tier human developer"

      > I would, in the sense that it will be well designed and implemented code that meets the requirements.

      Indeed. Something LLMs can or cannot do with all the predictability of a coin toss.

  • signatoremo 17 hours ago

    Huh? I guarantee you, if you give two different developers the exact same set of requirements, you'd get two very different programs. Try it. They'd likely also differ performance- or resource-wise.

    Would you still call that predictable? Of course you would, as long as they meet your requirements. Put another way, anything is unpredictable depending on your level of scrutiny. AI is likely less predictable than a human; that doesn't mean it isn't helpful. You are free to dismiss it, of course.

    • troupo 2 hours ago

      > Of course you would, as long as they meet your requirements.

      Key word: "as long as they meet your requirements".

      I've yet to meet an LLM that can predictably do that. Even on the same code with the same tools/prompt/rituals a few hours apart.

      > AI is likely less predictable than human, doesn’t mean it isn’t helpful.

      I'm struggling to see where I said they weren't helpful or that I dismissed them.

  • dist-epoch a day ago

    Non-deterministic does not mean not-predictable.

    Quantum mechanics is non-deterministic, yet you can predict the motion of objects with exquisite precision.

    All these "non-deterministic boxes" will give the same answer to the question "What is the capital of France".

    • sublinear a day ago

      Yes, but the "exquisite precision" comes from the deterministic parts of physics.

      • baq 21 hours ago

        Nah. The only thing we can establish precisely at the lowest levels is probability. We can and do engineer systems to maximize probabilities of desired outcomes and minimize probabilities of undesirable ones.

        Frankly, I don't understand how software engineers (not coders, mind you) can have issues with non-deterministic tools while browsing the web on a network which can stop working at any time for any reason.

        • lbreakjai 18 hours ago

          Because the failure modes are deterministic. An API can be down, which you can easily plan for, but if it's up and returning a 200, you can reasonably expect it to return what it's supposed to.

d_503 20 hours ago

Clown wars