
A Robot Is a System, Not a Pipeline

A robot is not governed by a pipeline of components, but by a closed loop linking world state, measurement, estimation, action, and world response.

5/29/2026
By Max Lungarella

The previous essay argued that robotics is not the assembly of advanced parts, but the engineering of coherence under feedback. This essay makes that claim more precise.

A robot is not a pipeline that transforms sensor input into motor output. It is a closed-loop dynamical system whose behavior is governed by the recursive relation among state, observation, estimation, action, and world response. That is the point of the formalism here. It is not mathematical decoration. It is a way of saying, with minimal technical language, what actually governs the machine.

The pipeline picture is attractive because it is simple. Sensors produce data. Software interprets it. A planner selects an action. A controller executes the command. Motors move. The description is not useless. It corresponds to a real engineering decomposition, and decomposition is necessary if complex robotic systems are to be designed at all.

But it is incomplete in a way that matters. It presents the robot as though information moved forward through a chain. In reality, the chain bends back on itself. Every action changes the world. That changed world alters the next observation. The next observation alters the estimate. The estimate alters the next action. The governing object is not the sequence of modules. It is the loop.

That difference is not semantic. It is the difference between a software stack that computes over representations and a machine whose own actions continuously change the conditions of its next computation. A robot does not merely process inputs and emit outputs. By acting, it partly produces the future situation from which the next round of sensing and inference must begin.

The Formal Question

If the robot is a system rather than a pipeline, then the formal question is simple: what variables and relations actually determine its behavior?

At the lightest useful level, four objects and four functions are enough.

  • $x_t$: the relevant state of the robot and environment at time $t$
  • $z_t$: the measurement available to the robot at that time
  • $\hat{x}_t$: the robot's internal estimate of the state
  • $u_t$: the action applied to the system

We also need four functions:

  • $f$: the state-transition dynamics, which describe how the world evolves
  • $h$: the observation model, which describes how state becomes measurement
  • $g$: the estimation process, which turns measurements and past actions into a usable internal state
  • $\pi$: the policy or control law, which selects an action from the current estimate

Written minimally, the loop looks like this:

$$
\begin{aligned}
x_{t+1} &= f(x_t, u_t, w_t) \\
z_t &= h(x_t) + v_t \\
\hat{x}_t &= g(z_{1:t}, u_{1:t-1}) \\
u_t &= \pi(\hat{x}_t)
\end{aligned}
$$
Figure: The closed loop. f evolves the world; h produces a measurement; g produces an estimate; π selects an action, and the action returns to the world; w and v enter as disturbance and noise.

Here $w_t$ collects disturbances and unmodeled effects, while $v_t$ collects measurement noise. The meaning of the equations is straightforward. The world evolves according to its own dynamics. The robot does not read that state directly; it receives measurements. From those measurements and its recent history, it forms an estimate. From that estimate, it selects an action. That action then changes the world again.
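The four relations can be run as a literal loop. The sketch below is a minimal illustration under assumptions that are not from the essay: a hypothetical one-dimensional world in which the action is a velocity command, a simple blend of prediction and measurement standing in for g (written recursively, so the previous estimate summarizes the earlier measurements and actions), and a proportional rule standing in for π. All gains and noise levels are arbitrary.

```python
import random

def f(x, u, w, dt=0.1):
    """State transition: position advances by the commanded velocity plus a disturbance."""
    return x + dt * u + w

def h(x):
    """Observation model: the sensor reports position (noise is added by the caller)."""
    return x

def g(x_hat_prev, z, u_prev, dt=0.1, alpha=0.5):
    """Estimator: predict from the last estimate and last action, then blend in the measurement."""
    predicted = x_hat_prev + dt * u_prev
    return predicted + alpha * (z - predicted)

def pi(x_hat, target=1.0, gain=2.0):
    """Policy: proportional command toward the target, computed from the estimate only."""
    return gain * (target - x_hat)

def run_loop(steps=200, seed=0):
    rng = random.Random(seed)
    x, x_hat, u = 0.0, 0.0, 0.0
    history = []
    for _ in range(steps):
        z = h(x) + rng.gauss(0.0, 0.02)      # z_t = h(x_t) + v_t
        x_hat = g(x_hat, z, u)               # estimate from new measurement and previous action
        u = pi(x_hat)                        # u_t = pi(x_hat_t)
        x = f(x, u, rng.gauss(0.0, 0.005))   # x_{t+1} = f(x_t, u_t, w_t)
        history.append((x, z, x_hat, u))
    return history

if __name__ == "__main__":
    final_x, _, final_x_hat, _ = run_loop()[-1]
    print(f"true state {final_x:.3f}, estimate {final_x_hat:.3f}")
```

Note that `pi` never receives `x`. The true state appears only inside `f` and `h`, which is the structural point: the policy sees the world only through the estimate.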

The estimate therefore is not a direct sensory readout. It is an inference built from partial measurements, recent action history, and usually some model or prior belief about how the state evolves over time. That matters because the robot does not act on reality in finished form. It acts on a constructed internal state that may be good enough, poor, stale, biased, or uncertain.
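One common concrete choice for the estimator is a Kalman filter, which makes the "inference, not readout" point explicit by carrying a variance alongside the estimate. The sketch below assumes a scalar linear model that is not from the essay (the state changes by the last action plus noise, and the sensor reports the state plus noise), with hypothetical noise variances `q` and `r`.

```python
def kalman_step(mean, var, u_prev, z, q=0.01, r=0.25):
    """One predict/update cycle of a scalar Kalman filter.

    Assumed model (illustrative only):
        x_next = x + u + w,  w ~ N(0, q)
        z      = x + v,      v ~ N(0, r)
    """
    # Predict: push the belief forward through the assumed dynamics.
    mean_pred = mean + u_prev
    var_pred = var + q
    # Update: weight the measurement by how much the prediction is trusted.
    gain = var_pred / (var_pred + r)
    mean_new = mean_pred + gain * (z - mean_pred)
    var_new = (1.0 - gain) * var_pred
    return mean_new, var_new

if __name__ == "__main__":
    mean, var = kalman_step(mean=0.0, var=1.0, u_prev=0.5, z=0.7)
    print(mean, var)
```

The updated mean lands between the prediction and the measurement, and the variance shrinks after each measurement. If `q` or `r` misstate the real disturbance or sensor noise, the filter still returns a number, but its confidence no longer means what the downstream policy assumes it means.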

This notation is intentionally spare. It is not meant to capture every detail of robotics, still less every special case. It is meant to identify the irreducible structure that remains whether the machine is a mobile base, a manipulator, a humanoid, or an autonomous vehicle.

It already tells us something important. The robot never acts on the world as such. It acts on an estimate, and the quality of its behavior depends on how the whole loop holds together.

That point is easy to understate. A robot is not merely a device that senses and then acts. It is a device whose sensing, estimation, and action are recursively entangled with the evolution of the world. Once the loop is written down, the old picture of components passing messages in one direction no longer describes the real causal structure.
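The recursion becomes visible when the four relations are substituted into one another. The composition below only restates the loop equations in a single line and adds no new assumptions:

$$
x_{t+1} = f\Big(x_t,\ \pi\big(g(z_{1:t},\, u_{1:t-1})\big),\ w_t\Big),
\qquad z_s = h(x_s) + v_s \quad \text{for } s = 1, \dots, t.
$$

The next state depends on the past in two ways: directly through $f$, and indirectly through every measurement that earlier states produced and every action that earlier estimates selected. No single arrow in a left-to-right diagram carries that dependence.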

It also clarifies something else. The point of formalization is not to replace engineering language with symbols. It is to prevent the argument from dissolving into metaphor. As long as the robot is described only as a collection of capabilities, the reader can keep imagining that better modules simply accumulate into better machines. The notation forces a different conclusion. Whatever components the robot contains, behavior is generated by the closed relation among state, observation, estimation, and action.

Figure: A robot arm reaches for a ceramic mug on a worktable while a depth camera observes. Even an ordinary grasp already contains the whole systems problem: partial measurement, internal estimate, action, and the world that has now changed.

A Mug, A Camera, And An Estimate

Consider a robot reaching for a mug on a table.

The mug has a real state at time $t$: position, orientation, perhaps motion, perhaps contact with some clutter nearby. Call that state $x_t$. The robot does not receive $x_t$ itself. Its camera returns an image, $z_t$, which contains partial evidence about the mug but not the mug's pose in the exact form the gripper needs.

The estimator $g$ combines that image with the robot's own kinematics, recent action history, calibration assumptions, and a working model of how the scene is likely to evolve in order to form $\hat{x}_t$: an internal estimate of where the mug is and how the arm is situated relative to it. The policy $\pi$ then chooses an action $u_t$, perhaps a reach trajectory or the next task-space command. That action moves the arm, disturbs the scene, changes the camera view, and produces the next measurement.

Nothing about this example is especially exotic. That is precisely why it is useful. Even a mundane reach already contains the whole systems problem. The robot must infer enough about the world to act, act on that inference, and then live with the consequences of having acted under approximation.

Even in this simple example, the point is clear. The robot is not executing a one-way chain from image to command. It is participating in a loop in which acting changes the very conditions of the next inference.

That is why the equations matter. They are not there to make robotics sound formal. They make explicit what the machine is actually doing. They show why a robot can fail even when no single stage is obviously broken. If the estimate is slightly wrong, or the selected action slightly mismatched to the real scene, the error does not remain a bookkeeping problem. It changes the next state of the world.

The same structure reappears across the field. A mobile robot localizing in a hallway, a drone stabilizing against a gust, and a legged system recovering from a disturbance all differ in mechanics and scale, but not in the underlying logic. There is a real state. There are measurements. There is an estimate. There is an action. And the action feeds back into the next state.

That is why this example is better than a more glamorous one. It keeps the structure visible. The reader does not need a humanoid, a warehouse, or a giant policy model to see the systems fact. The ordinary grasp is enough. If the structure is already there in the simple case, then the more elaborate cases are extensions of the same loop, not exceptions to it.

What Actually Governs The Robot

The answer is now visible. What governs the robot is not the isolated quality of perception, planning, or control. What governs it is whether the closed loop remains valid as a whole.

A pipeline describes decomposition. A loop describes causation.

This distinction is easy to miss because modular engineering is indispensable. We have perception modules, estimators, planners, controllers, and hardware interfaces for good reason. But local competence inside those modules does not guarantee global competence of the machine. A perception system can be accurate according to its benchmark. A planner can be optimal under its model. A controller can be stable in isolation. The robot can still fail.

Why? Because behavior is produced by interaction. The estimate must be good enough for the policy that consumes it. The action must be feasible for the hardware that executes it. The resulting motion must keep the next observation interpretable enough for the loop to continue. If those relations hold, the machine behaves coherently. If they drift apart, the system degrades even when individual parts remain locally correct.

This is the formal point of saying that a robot is a system. The system is not the list of components. It is the set of causal dependencies that make one component's output usable by the next under real operating conditions. If the estimate arrives too late, if the planner assumes dynamics the plant cannot realize, or if the controller acts on a state that has already changed, the robot fails as a system even when no individual module is obviously broken.

The difference matters because decomposition is an organizational convenience, not a law of nature. Software teams draw boundaries between perception, estimation, planning, and control because humans need manageable abstractions. The robot does not honor those boundaries. It behaves as one coupled machine in continuous exchange with a world that does not care which subsystem produced which value.

For that reason, a module output is not useful in isolation. A pose estimate matters only if it is still timely, referenced in the right frame, accompanied by a realistic uncertainty account, and compatible with the dynamics assumed by the component that will consume it. A planned trajectory matters only if the machine can still execute it before the world has moved on. A control command matters only if the hardware can realize it under the actual load, friction, compliance, and contact conditions present at that moment. The loop is where those claims are tested.
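The conditions on a pose estimate can be written as an explicit check at the point of consumption. The sketch below is one hedged way to do that; the `StampedPose` type, the `usable_for_grasp` function, the `"base_link"` frame label, and every threshold are hypothetical and chosen only for illustration. It can test whether the reported uncertainty is small enough, but not whether that report is realistic, which needs evidence from outside the message.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StampedPose:
    x: float
    y: float
    frame_id: str          # coordinate frame the numbers are expressed in
    stamp_s: float         # time the underlying measurement was taken, in seconds
    position_std_m: float  # reported 1-sigma position uncertainty, in metres

def usable_for_grasp(pose, now_s, expected_frame="base_link",
                     max_age_s=0.05, max_std_m=0.01):
    """Return a list of reasons the pose should not be consumed (empty means usable)."""
    problems = []
    if pose.frame_id != expected_frame:
        problems.append(f"frame is {pose.frame_id!r}, expected {expected_frame!r}")
    age_s = now_s - pose.stamp_s
    if age_s > max_age_s:
        problems.append(f"estimate is {age_s:.3f} s old, limit is {max_age_s:.3f} s")
    if pose.position_std_m > max_std_m:
        problems.append(f"reported std {pose.position_std_m:.3f} m exceeds {max_std_m:.3f} m")
    return problems

if __name__ == "__main__":
    pose = StampedPose(x=0.42, y=-0.10, frame_id="camera", stamp_s=10.00, position_std_m=0.004)
    for reason in usable_for_grasp(pose, now_s=10.20):
        print(reason)
```

Returning reasons rather than a bare boolean is a design choice: when the grasp is refused, the log says which assumption broke (frame, age, or uncertainty) instead of leaving that to be reconstructed later.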

This is why the formal description is clarifying rather than decorative. It shifts attention away from the inventory of modules and toward the causal relations that determine whether those modules can behave as one machine.

Another way to say the same thing is that the pipeline view answers the wrong question. It answers: which stage comes next? The systems view answers: what makes the next stage usable at all? A planner output is not valuable because it sits downstream of perception. It is valuable only if it remains connected to a state estimate that is still good enough, a plant that can still realize it, and a world that has not already invalidated its assumptions.

That is the stronger claim this essay needs to establish. The robot is not governed by the order in which modules are arranged on a diagram. It is governed by whether their assumptions remain mutually compatible while the loop is running.

Layering Helps, But It Does Not Govern

Because the loop is too complex to design as a single monolith, robotic systems are almost always layered. That is not a flaw. It is a practical necessity.

Perception turns raw sensor streams into usable structure. Estimation fuses measurements into a working internal state. Planning selects actions over some horizon. Control translates those intended actions into commands that actuators can realize. Hardware then meets friction, compliance, backlash, saturation, and contact.

The decomposition helps engineers think locally. But it also encourages a dangerous inference: if each layer works well in isolation, the whole machine should work well too. In robotics, that inference is often false.

The reason is straightforward. The layers are not joined by abstract function calls alone. They are joined by assumptions about timing, frames, uncertainty, actuation, contact, and model validity. If those assumptions remain compatible, layering is powerful. If they drift apart, the robot becomes incoherent even when every subsystem can defend its own output.

This is one place where the pipeline picture quietly misleads. It makes it seem as if the main task were to pass information downstream. In reality, the harder task is to preserve the validity of that information as it moves through the loop and returns to action.

That is why robotics is full of failures that seem embarrassing in retrospect. A transform is correct but old. A trajectory is elegant but infeasible for the actual plant. A controller is stable under one set of assumptions and brittle under the ones the robot actually encounters. A perception stack reports an object accurately enough for a benchmark image, yet not accurately enough, or not quickly enough, for a grasp that must close in the next fraction of a second. None of these failures requires a dramatic bug. They arise because the robot is governed not by local reasonableness, but by loop-level compatibility.
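The "elegant but infeasible" trajectory is the easiest of these to test mechanically. The sketch below assumes a single joint sampled at a fixed period and checks finite-difference velocity and acceleration against limits; the function name and the limit values are hypothetical, and a real plant would also have torque, jerk, and contact constraints that this ignores.

```python
def limit_violations(positions, dt, v_max, a_max):
    """Return (bad_velocity_indices, bad_acceleration_indices) for a sampled 1-D trajectory.

    Velocity i is the finite difference between samples i and i+1;
    acceleration i is the finite difference between velocities i and i+1.
    """
    velocities = [(b - a) / dt for a, b in zip(positions, positions[1:])]
    accelerations = [(b - a) / dt for a, b in zip(velocities, velocities[1:])]
    bad_v = [i for i, v in enumerate(velocities) if abs(v) > v_max]
    bad_a = [i for i, a in enumerate(accelerations) if abs(a) > a_max]
    return bad_v, bad_a

if __name__ == "__main__":
    # A step from 0 to 1 in one sample: short on paper, impossible for a bounded actuator.
    step = [0.0, 0.0, 1.0, 1.0]
    print(limit_violations(step, dt=0.1, v_max=2.0, a_max=5.0))
```

A planner that scores this step trajectory only on path length would rate it highly. The check expresses the plant's assumptions at the interface, where the planner's assumptions would otherwise go unchallenged.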

Where The System Actually Resides

This point is worth making explicit. The system does not reside in the perception stack, the planner, the controller, or the hardware taken one by one. Nor does it reside in a diagram that lists those parts in sequence.

It resides in whether their assumptions remain jointly valid while the loop is running.

That sentence sounds abstract until one notices its consequences. If two layers disagree about what state is current, the robot is already in trouble. If a perception module reports a pose in one frame while the planner interprets it in another, the robot is already in trouble. If a command is dynamically valid only under assumptions the actuator cannot satisfy, the robot is already in trouble. If an estimate is geometrically plausible but no longer meaningful by the time it is used, the robot is already in trouble.

This is the sense in which the robot is a system rather than a pile of technologies. The machine succeeds or fails according to whether these relations continue to hold under feedback. What matters is not only what each component computes, but what the components jointly make true in time.

How The Loop Fails

Failure becomes easier to understand once the loop is written down.

Return to the mug. Suppose the estimate $\hat{x}_t$ is slightly stale. The robot therefore chooses an action that would have been correct a moment earlier. The gripper arrives a few centimeters off target and brushes the mug instead of enclosing it cleanly. That contact shifts the mug, which changes the next image and makes the scene harder to interpret. The next estimate is now worse, not better. Error has entered the loop and begun to circulate.

This is the characteristic failure logic of robotics. A small mismatch in state, interpretation, or execution does not remain local. It alters the next conditions of sensing and action. Under feedback, mistakes propagate through consequences.
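A toy simulation shows how staleness alone can turn a correcting loop into an amplifying one. The sketch below is not a model of the mug scene. It assumes a scalar error that the robot tries to cancel with a proportional correction, where the only thing that varies is how many steps old the estimate is; the gain and step count are arbitrary.

```python
def simulate(delay_steps, gain=0.9, steps=60, initial_error=1.0):
    """Drive an error toward zero using an estimate that is delay_steps old."""
    # Pad the past with the initial error so a delayed lookup is always defined.
    errors = [initial_error] * (delay_steps + 1)
    for _ in range(steps):
        stale_estimate = errors[-1 - delay_steps]   # what the robot believes the error is
        correction = -gain * stale_estimate         # action chosen from that belief
        errors.append(errors[-1] + correction)      # the world responds to the action
    return errors[delay_steps:]                     # drop the padding

if __name__ == "__main__":
    fresh = simulate(delay_steps=0)
    stale = simulate(delay_steps=3)
    print("fresh estimate, final |error|:", abs(fresh[-1]))
    print("stale estimate, peak |error|:", max(abs(e) for e in stale))
```

With a current estimate the error shrinks by a factor of ten each step. With the same gain and a three-step-old estimate, each correction answers an error that no longer exists, and the oscillation grows. Nothing in the controller changed; only the age of its input did.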

That is why local correctness is not enough. The question is never only whether one stage was reasonable. The question is whether the loop, as a whole, stayed behaviorally valid while the robot interacted with the world.

Notice what is doing the real damage here. The first error need not be dramatic. It may be a slightly biased estimate, a slightly simplified model, a slightly overconfident plan, or a slightly imperfect actuation response. In ordinary software, such discrepancies often remain informational. In robotics, they become dynamical. They change the future state from which the next decision must begin.

This is why robotic failures so often appear to live between modules rather than inside them. The visible problem may show up at execution, but the underlying inconsistency may have entered much earlier, when the estimate was formed, when a model assumption stopped matching the plant, or when one layer quietly interpreted a quantity under different conditions than another.

Failure, in other words, often lives in the interfaces: a stale transform, an underestimated covariance, an elegant trajectory that ignores actuator saturation, a control command computed for a state that no longer exists. These are not ornamental implementation details. They are precisely the places where loop-level coherence is either maintained or lost.

This is also why debugging robotic systems is so notoriously difficult. The first visible failure may not be the first real one. The grasp slips at the end of the loop, but the more consequential mistake may have been made when an earlier estimate was accepted too confidently, when a model mismatch went unmodeled, or when a downstream stage treated a quantity as current that had already begun to age. The system hides its own causes by distributing them across time and interaction.

Why This Still Matters Now

The closed-loop picture is not a relic of classical robotics. It remains the right way to think even at the current frontier of physical AI.

A frontier model may extract more structure from images, compress richer regularities from data, or generate more capable policies than earlier systems could. But once such a model is embedded in a machine, it enters the same causal circuit as everything else. Its outputs must still connect to an estimate of state, survive latency, remain compatible with actuator limits, and continue to make sense as the world changes under action.

This point matters because modern discussion of robotics is easily pulled back toward the language of components. The emphasis shifts to foundation models, diffusion policies, world models, multimodal reasoning, simulation stacks, or learned planners. Those advances may be real, and many are also fashionable. But if the reader loses sight of the governing structure, the field begins to look once again like an assembly of impressive parts.

That is why the formalization is worth doing. It gives the reader a way to hold onto the systems fact beneath the changing inventory of algorithms. From the systems view, these are not escapes from the loop. They are changes in what can be computed inside it.

A learned perception model still has to deliver information that is timely enough, calibrated enough, and operationally compatible with the rest of the machine. A world model still has to remain coupled to what the sensors and actuators can actually support. A diffusion policy still has to produce actions that survive delay, contact, uncertainty, and execution error. The names change. The governing structure does not.

The Real Lesson

This is the main formal lesson of the essay. A robot is not governed by a pipeline of components, but by a closed loop linking world state, measurement, estimation, action, and response.

Once that is clear, two deeper questions follow. One concerns time: how quickly must the loop close, and what happens when the world evolves faster than the system can keep its own estimates and actions current? The other concerns state: what does it mean to act well when the world the robot must respond to is never directly given in finished form?

This essay has only established the formal frame in which those questions arise. It has not yet treated either of them in depth.

The next step is the first of the two. Which variable most often determines whether the loop stays intact or falls apart?

The answer is usually treated too late.

Time.