Agentic coding is clearly changing how software teams work; tools and models are getting better with every release.
Agentic coding helps in a simple way: it parallelizes work. One person can ask an agent to draft tests while another agent refactors a module and a third updates documentation. It also reduces the need to be fully “in the zone” for every task. That matters because keeping deep focus over a long period of time is exhausting, and sometimes simply not possible.
But there is a catch - even with SOTA models and agents, engineers still need to stay engaged when the work touches real production code. Setting aside the fact that agents write bugs just like humans do, the reason is fundamental: today’s agents struggle to build and maintain a durable mental model and memory of a codebase and its evolution, a limitation rooted in the underlying architecture of LLMs.
A useful analogy: imagine hiring a very fast contractor who starts each day with no memory of the building. They can install doors quickly, run cables, and paint walls, but you have to explain the building to them from scratch every morning. One day you forget to mention what is already behind the drywall, so they run a second set of wiring next to the existing one, or install a second pair of pipes, this time not at the optimal angle the first pass used. Maybe they also make a mistake in that second run of wiring - everyone makes mistakes, even agents.
Code agents work with a limited “context window” (read: the prompt and conversation history), meaning they can only look at a small slice of the code and related documents at once. You can try to feed them more context, but that hits practical limits:
- Larger context is more expensive and slower, because the model has to do more math to process it.
- Larger context makes it harder for the model to separate what matters from what is noise (“attention” problems; see Drew Breunig’s How Long Contexts Fail and Vatsal Bajpai’s Understanding Attention: Coherency in LLMs).
- The agent still needs to discover the right files, the right library docs, and the right existing implementation.
The discovery step is where things break down in large codebases.
Coding agents search in surprisingly simple ways: a file system search, a text search, or traversing the AST using language-server features. That often fails in large systems because the “right” solution might be:
- in another package,
- behind an abstraction,
- documented on a website somewhere,
- in a shared library,
- documented in a place the agent did not open,
- or implemented in a way that does not match the keywords the agent searched for.
When discovery fails, the agent tends to do the reasonable thing from its point of view: it writes new code. And that is where the long-term risk piles up.
Writing new code is not free.
It creates more things that need to be maintained for a long time. The new code does not know all the constraints and context of the existing code, and it lacks the history of the original, which may already have five bug fixes baked in. This kind of duplication is subtle because it often passes tests and quality control in the short term: the product still ships and the demo still works. But the subtle bugs accumulate over time, and the system becomes exponentially harder to change safely because implementation behavior diverges between code paths. Engineers used to hate spaghetti code for a reason: over time it turns into tech debt. The familiar symptoms follow (a concrete sketch comes after this list):
- Bugs get fixed in one place but not the other.
- Behavior drifts between implementations.
- Engineers lose confidence in refactors because they are not sure what depends on what.
- Eventually, changes slow down because every change risks breaking something surprising.
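To make the divergence concrete, here is a minimal, hypothetical sketch; the module paths and function names (`normalize_email`, `clean_email`) are invented for illustration. An existing helper has absorbed earlier bug fixes, while the agent’s near-duplicate, written because discovery failed, has not - so the two code paths quietly disagree.

```python
# Hypothetical sketch of duplicated code paths drifting apart.

# utils/email.py - the existing helper, hardened by earlier bug fixes
def normalize_email(address: str) -> str:
    """Canonical form used for account lookups: trim, lowercase, strip sub-addressing."""
    address = address.strip().lower()
    local, _, domain = address.partition("@")
    local = local.split("+", 1)[0]  # earlier bug fix: "a+tag@x.com" is the same account as "a@x.com"
    return f"{local}@{domain}"


# signup/helpers.py - the agent's near-duplicate, written because it never found the original
def clean_email(address: str) -> str:
    """Looks equivalent and passes the new feature's tests, but misses the earlier fixes."""
    return address.strip().lower()


# The two paths now disagree, so the same user can end up with two accounts:
assert normalize_email("Ada+news@example.com") == "ada@example.com"
assert clean_email("Ada+news@example.com") == "ada+news@example.com"
```

Neither function is wrong in isolation; the problem is that a future fix to one will not reach the other.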
This effect gets worse as the codebase grows. In a small, new codebase, “vibe-coded” systems work great. There is not much existing functionality to rediscover, so the agent’s tendency to create new code does not hurt as much. As complexity increases, the probability of “reinventing something we already have” rises. More code means more places to miss. More ways to be “almost right.”
If not controlled, this problem is self-reinforcing. As more code gets added, AI agents have an even harder time finding the “correct place” to make a change, which leads to even more new code instead of reusing what is there. Agents are also cautious about breaking existing code: if a change breaks tests, they sometimes choose the safer local action, which is to write a new function, wire it into the new feature, and avoid touching the existing code. That reduces the chance of immediate breakage, but it increases long-term complexity. Additionally, code agents are prone to creating code that is never reached in production code paths, i.e. dead code. Hopefully we get better tooling to address this specific issue soon.
Similarly, when writing tests the agent sometimes reimplements the functionality it is supposed to test, either because it was “too hard” to write the test against the existing code or because it did not find that code in the first place. The result is tests that are useless but give engineers a false sense of security (“I added tests for that yesterday and they looked good when I skimmed them”). This has happened to me multiple times, and once you know about it, it is much easier to spot.
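A hypothetical, simplified sketch of the pattern (the function name `parse_price` and the module `billing` are made up for the example): the “test” below re-derives the expected behavior with a local copy of the logic instead of importing the production implementation, so it keeps passing even if the real code is broken.

```python
# Hypothetical example: a "test" that reimplements the logic under test.
# It never imports the real implementation (e.g. billing.parse_price),
# so it verifies its own copy rather than the production code path.

def test_parse_price_cents():
    def parse_price(value: str) -> int:  # local reimplementation, not the real code
        dollars, _, cents = value.strip("$").partition(".")
        return int(dollars) * 100 + int(cents or 0)

    # Asserting the copy against itself: this passes no matter what production does.
    assert parse_price("$12.50") == 1250


# A useful test exercises the production code path instead, e.g.:
# from billing import parse_price
# def test_parse_price_cents():
#     assert parse_price("$12.50") == 1250
```

The green checkmark looks identical in both cases, which is exactly why this failure mode is easy to miss in review.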
Human involvement is critical
This is why engineers still need to “be in the zone” and spend attention at the points where architecture and reuse matter. Humans have an advantage that is easy to underestimate: they do not start from an empty memory state each time. Engineers accumulate a mental model of programming in general and of the system specifically. They know the “shape” of the code architecture. They know roughly where to look when something breaks. They remember that a similar feature was built last quarter and where it lives. They know which library exists for the task, which module is the source of truth, and what patterns the team expects. And they know why things broke in the past.
An agent does not have that stable mental model. The engineer has to rebuild that context for the agent repeatedly, especially when the conversation resets or the tool loses state. Agentic coding tries to patch this with memory files or repository rules (for example, a claude.md-style document). Those help, but they are not a full solution:
- They are still just fragments added to the context window, so they suffer from the same “attention” problem.
- They are static summaries, not a live understanding of the codebase.
- They cannot capture all the nuance that matters in real systems.
- As new models get released, their behavior changes, and existing rules may no longer be needed - or may even become counter-productive.
- You cannot write a rule for everything; it is simply too hard.
So what should non-developers take away from this?
- Agentic coding is a productivity multiplier for skilled engineers. Teams should use it, and used responsibly it makes engineers genuinely faster.
- The main risk is that unsupervised agentic coding increases tech debt. More tech debt means higher maintenance cost for both humans and AI agents, and potentially revenue shortfalls when scalability and stability suffer.
- Code is not an asset in the way a feature list is an asset. Code is a liability the business carries. Every line should be needed, tested, secured, operated, and maintained. Less code is usually cheaper, safer, and faster to change.
Concrete ways teams can prepare:
- Put “reuse before write” into the workflow
Make it an explicit step: “Where is the existing implementation?” Agents should be asked to prove they searched and to cite where they looked (see the sketch of a rules-file entry after this list).
- Invest in better discovery
If tools only do shallow search, you get shallow reuse. Better code search, better cross-repo indexing, and stronger navigation help both humans and agents.
- Make code cleanup part of delivery, not a separate project
If you do not pay the cleanup cost continuously, it compounds. The bill arrives later with interest.
- Track “cost of change,” not just velocity
If sprint output looks great but every change gets harder each month, you are borrowing speed from the future.
- Engineers must write code
Fully autonomous, unsupervised vibe coding is too risky and becomes expensive to maintain if you are building business-critical software. Critical business software requires engineers to get their hands dirty and understand the codebase. Full stop.
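One lightweight way to put “reuse before write” into practice is to encode it in the repository rules the agent already reads (a claude.md-style document, as mentioned above). A hypothetical excerpt might look like the sketch below; the filename and exact wording are assumptions, and rules like these help but do not replace human review.

```md
<!-- Hypothetical excerpt from a claude.md-style rules file -->
## Reuse before write
- Before adding a new function, search the repo and shared libraries for an
  existing implementation, and list the files and symbols you checked.
- If you still add new code, state why the existing code could not be reused.
- Never copy logic from a production module into a test; import it instead.
- Prefer small changes to existing code paths over parallel new ones.
```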
Agentic coding will keep improving, and memory and context handling will too. Today and probably tomorrow, the best results come from a partnership between agentic coding systems and humans.