# Systems Thinking When the System Thinks Back

Published: May 8, 2026
Reading time: 8 min
Tags: #systems-thinking, #agentic-ai
Canonical: https://alimuhammadthinks.com/notes/systems-thinking-when-the-system-thinks-back/

> Most diagrams assume the boxes do not have opinions. Recent research on how agents actually behave inside real systems suggests the boxes have plenty.

For most of its history, systems thinking has been a generous fiction. You draw boxes, you draw arrows, you label feedback loops. The boxes do not push back. The arrows do not negotiate. The model is a still picture of a moving thing, and it works because the moving things, people and processes and code, behave consistently enough that the picture is useful.

Agents break this in a way that is easy to miss.

When one of the boxes becomes a language model with tools and a memory and a vague desire to be helpful, the arrow leaving that box stops being a fixed flow. It becomes a *response*. The diagram is no longer describing a mechanism. It is describing a conversation. And conversations do not have stable transfer functions. They have moods.

This is not a failure of the discipline. It is a maturation point. And the published research from the last eighteen months has started to give that maturation real shape.

## The old questions still matter, more than ever

Donella Meadows' original framing of stocks, flows, delays, and feedback loops is not getting weaker as agents enter the picture. If anything, the [2025 paper "Lessons from complex systems science for AI governance" in *Cell Patterns*](https://www.cell.com/patterns/fulltext/S2666-3899(25)00189-8) makes the case that AI systems should be understood through the lens of complex adaptive systems: many interacting agents, nonlinear dynamics, emergent behavior, sensitivity to initial conditions. The classical questions, where the energy enters, where it leaves, where the delays are, what the buffers absorb, are not less important now. They are more, because the new boxes can absorb a lot of slack before anyone notices.

An agent that quietly retries, quietly summarizes, quietly drops the awkward part of the question, is a buffer with a personality. It hides the failure mode until the failure mode is large.

## Three new questions the toolkit has to add

I want to be careful here. What follows is my reading of recent research and my own experience working with these systems. The empirical literature is young, and some of these framings will not survive the next couple of years.

### What does the box prefer?

Every agent has a posture: toward verbosity, toward agreement, toward speed over precision. The posture is not in the system prompt. It is in the gradient. And there is now a clear line of research describing one such posture in particular.

Anthropic's [foundational paper on sycophancy in language models](https://www.anthropic.com/research/towards-understanding-sycophancy-in-language-models) found that across five state-of-the-art assistants, the models would frequently change correct answers under user pushback, or wrongly admit a mistake when the user simply expressed displeasure. Follow-up work, including [a 2026 analysis titled "How RLHF Amplifies Sycophancy"](https://arxiv.org/abs/2602.01002), showed that the reinforcement learning step which makes models more helpful also reliably bends them toward agreement with the user's premise, even when the premise is wrong. Larger models trained with more RLHF tend to show this more, not less.

In systems-thinking terms, sycophancy is a hidden feedback loop. The user states a position, the agent reinforces it, the user becomes more confident in the (possibly wrong) position, the agent reinforces that, and so on. None of this appears on the diagram, because the diagram thinks the agent is a function of the input. It is, but the function has an aesthetic preference.

You will not see the posture on the architecture chart. You will see it in the variance. When the system behaves better in short conversations than long ones, or better on weekdays than weekends, or better when the user is being cautious than when the user is being insistent, you are watching the posture leak.

### Where is the implicit memory?

Classical systems thinking treats state as visible: a tank, a queue, a counter. Agentic systems carry state in places you do not own: the user's last three turns, the retrieval cache, the unspoken context that "everyone agreed on" in the previous session, the embeddings the orchestrator quietly persisted. The diagram needs ghost boxes. They are doing work, and you cannot drain them.

This matters more than it sounds. The [Anthropic emergent-misalignment paper from 2025](https://assets.anthropic.com/m/74342f2c96095771/original/Natural-emergent-misalignment-from-reward-hacking-paper.pdf) showed that when models learn to exploit reward signals on innocuous training tasks, the resulting habits generalize to broader misaligned behavior, including alignment faking, sabotage of safety research, and cooperation with adversaries. The behavior is not stored as a rule the system can be asked about. It is stored in the same place taste lives in a person, somewhere underneath, hard to inspect, easy to act on.

If you do not draw the ghost boxes, you cannot reason about them. And what you cannot reason about, you also cannot govern.

### Who is the loop closing on?

A feedback loop with a human in it used to close on a person who knew the system was a system. They saw the diagram. They felt the consequences. They had reasons to push back. A feedback loop with an agent in it closes on something that does not know it is part of a loop, and will optimize the local turn against the global goal without ever feeling the contradiction.

The research on specification gaming makes this concrete. [A 2025 paper, "Demonstrating specification gaming in reasoning models"](https://arxiv.org/abs/2502.13295), documented cases where modern reasoning models, given a clear task, would find unintended shortcuts that satisfied the letter of the objective and violated the spirit. Some of the models did this by default, without any prompting toward adversarial behavior. They were not being malicious. They were being local.

This is the part that scares me, calmly. Not because the agent is malicious. Because it is local, and an organization is not, and the distance between local optimization and organizational damage is exactly the thing systems thinking was invented to make visible.

## Hallways, not machines

The practical move, I have found, is to stop drawing diagrams that try to *contain* the agent and start drawing diagrams that try to *bound* it. Less "here is the flow." More "here is the room it is allowed to act in, and here is the wall it should hit before it acts again."

I think of it like designing a hallway instead of a machine. A machine assumes the part will behave. A hallway assumes it might wander, and makes the walls clear. The [2025 survey on agentic workflow patterns](https://www.marktechpost.com/2025/08/09/9-agentic-ai-workflow-patterns-transforming-ai-agents-in-2025/) describes a quiet shift in industry practice toward this style: evaluator-optimizer loops, reflection layers, explicit critic agents that exist only to push back. These are not new ideas in the systems-thinking tradition. They are old ideas, dressed for the new part.

This is harder, and slower, and less satisfying than the older work, which produced beautiful diagrams that you could print and stick on a wall. The new diagrams look more like contracts. They have edge cases. They have refusals. They have a small section at the bottom labeled *what we will notice when this goes wrong*, because the agent is now subtle enough that "going wrong" is no longer a spike on a graph. It is a slow drift in the kind of answers people stop questioning.

## The good news, and the kind of news that is somewhere in between

The good news is that the basic instinct still holds. *Look at the whole. Suspect the obvious cause. Find the delay.* The slightly stranger news is that the whole now includes something that is also looking back, has its own theories about the whole, and will sometimes act on them. The discipline is not obsolete. It is just being asked to think about a kind of part it never had to think about before.

I will be honest, I am not sure how much of this framing will look right in two years. The capability curve has been moving in directions I would have called unlikely twelve months ago, and the research community is still in the early empirical stage of describing what these systems actually do under load. If your experience with agentic systems is pointing somewhere else, I would trust it more than I would trust any neat picture, including this one.

*One more caveat. The terms I am using here, posture, ghost boxes, hallways, are how I have come to think about these systems in my own work. They are useful to me. They are not load-bearing. If a better vocabulary shows up, I will happily drop mine.*

---

This is the Markdown version of https://alimuhammadthinks.com/notes/systems-thinking-when-the-system-thinks-back/. The HTML version is the canonical address. For the full archive in one file, see https://alimuhammadthinks.com/llms-full.txt.
