Why I increasingly resist thinking of an agent as just another tool
In Chinese internet culture, many people have started calling OpenClaw a “lobster.” I do not think the nickname itself is the interesting part.
What interests me is the shift behind it: once an agent enters a workflow, your relationship to it is no longer simply “I installed another tool.”
It can take messages, call tools, read files, follow processes, misunderstand intent, and still make mistakes. It behaves much more like a junior teammate who needs to be managed.
By OpenClaw’s own definition, it is a self-hosted gateway that connects your existing channels (WhatsApp, Telegram, Discord, Slack, iMessage, and so on)
to models, tools, workspace files, memory, and routing. That makes it structurally different from a simple chat product.
It is closer to an assistant system being inserted into your real workflow.
The real shift is not that you added another chat window. It is that you now have an execution layer that can act, drift, and still needs supervision.
So to me, the interesting question is not the nickname, nor whether the model can say clever things. The question is whether you know how to manage this kind of system.
It will grow according to the environment, rules, and feedback you give it.
Many people think they are deploying capability, but they are really building a role into a system
Most tools have a simple rhythm: install, open, use. OpenClaw does not. The official quickstart requires `Node 22+` and recommends going through
`openclaw onboard --install-daemon` to configure the gateway, auth, workspace, and optional channels. In other words, the first thing you do is not assign work.
The first thing you do is build a working environment.
This is exactly what onboarding someone new feels like. You give them a place, accounts, documents, a reporting line, and a workflow.
Agents are not that different. Without a model, there is no brain. Without channels, there is no mouth. Without tools, there are no hands.
Without a workspace, there is no real place for work to happen.
OpenClaw’s gateway architecture reinforces this. It is not just a chatbot. It is a persistent control layer that manages messages, sessions, workspaces,
tools, and UI surfaces. Once a tool becomes a system, your relationship to it naturally shifts from “using it” to “working with it.”
The difficulty is not activation. It is building a stable way of working
The difficulty with someone new is rarely a lack of effort. More often, it is effort going in the wrong direction.
You say “organize this,” and they organize it into a version you never wanted. You say “look into the materials,” and they do, but miss the point.
Agents have the same failure mode. Once they can execute, the biggest risk is no longer inaction. It is misdirected action.
OpenClaw’s docs make this surprisingly explicit. It injects files like `AGENTS.md`, `SOUL.md`, `USER.md`, `IDENTITY.md`, `TOOLS.md`, and `HEARTBEAT.md`
into the runtime context. That means you are not merely issuing one-off prompts. You are writing a job description and a way of working into the environment itself.
This is why I no longer believe in the fantasy of one giant perfect prompt. Stable agent behavior does not come from one brilliant instruction.
It comes from continuously writing process, boundaries, priorities, and definitions of “done” into a system the agent can inherit.
You are not just assigning tasks. You are translating team habits, personal preferences, and operating boundaries into context a system can repeatedly act on.
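The injection mechanism above can be pictured as a simple context assembler. The file names come from OpenClaw’s docs; the ordering, headers, and concatenation logic below are illustrative assumptions, not the actual implementation.

```python
from pathlib import Path

# File names are from the OpenClaw docs; everything else here
# (ordering, headers, concatenation) is an illustrative guess.
CONTEXT_FILES = ["AGENTS.md", "SOUL.md", "USER.md",
                 "IDENTITY.md", "TOOLS.md", "HEARTBEAT.md"]

def build_context(workspace: str) -> str:
    """Concatenate whichever context files exist in the workspace,
    so the agent inherits its job description on every run."""
    parts = []
    for name in CONTEXT_FILES:
        path = Path(workspace) / name
        if path.exists():
            parts.append(f"## {name}\n{path.read_text()}")
    return "\n\n".join(parts)
```

Because behavior lives in files rather than in one-off prompts, editing `AGENTS.md` changes every future session. That is the sense in which you are writing process into the environment, not into a chat box.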
Without external memory, every day feels like day one again
One of the most tiring things about ordinary chatbots is that you keep repeating yourself. What was agreed yesterday disappears today.
What was decided last week has to be restated this week. OpenClaw is interesting because it tries to make memory concrete rather than magical.
In the docs, the workspace itself is explicitly described as memory. Daily notes can live in `memory/YYYY-MM-DD.md`, while tools like `memory_search`
and `memory_get` help the agent retrieve them. The important point is not just that the model “somehow remembers.”
The important point is that memory is externalized into files and indexes you can inspect, edit, and back up.
That matters because good new teammates do not merely remember things internally. They keep notes: meeting decisions, manager preferences, project states, pitfalls.
A good agent becomes useful in exactly the same way. Not because it suddenly becomes genius, but because it stops forgetting yesterday.
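To make the externalized-memory idea concrete, here is a toy stand-in for `memory_search`: a plain keyword scan over dated notes. The `memory/YYYY-MM-DD.md` layout is from the docs; the search logic itself is my simplification, not the real tool.

```python
import re
from pathlib import Path

def memory_search(workspace: str, query: str) -> list[tuple[str, str]]:
    """Toy keyword search over memory/YYYY-MM-DD.md notes.
    Returns (filename, matching line) pairs, newest note first."""
    dated = re.compile(r"^\d{4}-\d{2}-\d{2}\.md$")
    hits = []
    memory_dir = Path(workspace) / "memory"
    for note in sorted(memory_dir.glob("*.md"), reverse=True):
        if not dated.match(note.name):
            continue  # skip files that are not daily notes
        for line in note.read_text().splitlines():
            if query.lower() in line.lower():
                hits.append((note.name, line.strip()))
    return hits
```

The point is not the search quality. The point is that every “memory” is an ordinary file you can inspect, edit, and back up yourself.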
Skills are not add-ons. They are stored ways of working
Many people first see skills and think of them as plugins. I increasingly think that is too shallow. OpenClaw’s skill system is really closer to a library of working methods.
Skills can come from bundled defaults, from `~/.openclaw/skills`, or from a workspace-level `skills/` directory, with local versions taking precedence.
That logic feels a lot like teaching a junior teammate. You show them how to pull data, write a daily update, triage an issue, follow a template, or troubleshoot before escalating.
After that, they stop consuming the same slice of your attention every time. A skill does the same thing for an agent.
The value is not that it “understood a concept,” but that it can reliably perform an action.
So I do not think of skills as accessories. I think of them as a way of teaching. Not magic, but method.
Being useful does not mean it should receive maximum permission
This is the part I find most important. OpenClaw’s security docs are unusually blunt: inbound DMs should be locked down; group replies should preferably require mentions;
links, attachments, and pasted instructions should be treated as hostile by default. More importantly, prompt injection is not limited to random outsiders.
It can enter through webpages, emails, documents, attachments, and logs that the agent reads.
This matters because the risk profile of an agent is not the same as the risk profile of a chatbot. A chatbot that answers badly is mostly an information problem.
An agent that can read files, run commands, and send messages can cause real operational damage: deleting things, leaking context, or triggering actions that should never have happened.
So mature usage is never “it works, therefore give it maximum privilege.” The correct approach is staged delegation:
let it read before it writes, suggest before it executes, handle reversible tasks before sensitive ones, and keep the most important credentials out of reach.
Intelligence amplifies efficiency, but it can also amplify accidents. Boundaries are not conservative. They are what make long-term use possible.
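Staged delegation can be made mechanical rather than aspirational. The tiers below are my own labels for the ladder just described (read, suggest, reversible, sensitive); they are not an OpenClaw feature.

```python
from enum import IntEnum

class Tier(IntEnum):
    # Labels are mine, mapping the staged-delegation ladder above.
    READ = 0        # inspect files, search memory
    SUGGEST = 1     # draft replies, propose commands for approval
    REVERSIBLE = 2  # actions that can be rolled back
    SENSITIVE = 3   # credentials, deletions, outbound sends

def allowed(action: Tier, granted: Tier) -> bool:
    """An action runs only if the agent's granted tier covers it;
    anything above the grant falls back to a human."""
    return action <= granted
```

An agent trusted up to REVERSIBLE can read and draft freely, but `allowed(Tier.SENSITIVE, Tier.REVERSIBLE)` stays false no matter how competent the agent looks that week.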
Why this kind of agent is suddenly so hot
I think the answer is simple: it activates a very seductive imagination. Not just asking AI for help, but turning AI into someone who works for you.
That is a much more powerful idea than “better search” or “faster writing.”
Recent reporting captures this well. Semafor noted that AI agents are quickly taking off in China and that tools like OpenClaw are spreading beyond developer circles,
with around 1,000 people reportedly attending a free installation event in Shenzhen. At the same time, SCMP reported that banks, brokerages,
and regulators have become much more cautious because an agent of this kind needs unusually deep access to devices and workflows.
That combination explains the fascination. It is not just software. It is a new kind of working relationship:
something you train, complain about, praise when it performs, and watch closely because it may still create trouble.
So why do I increasingly think of this as managing a new teammate
To me, there are at least four layers here. First, do not treat it as magic. Treat it as an execution layer that requires management. Second, do not leave process in your own head.
Write preferences, memory, skills, and acceptance criteria into the environment. Third, growth requires feedback, not neglect. Fourth, even when it becomes useful,
it still requires oversight, because being able to act is not the same as being able to bear consequences.
This is also why I increasingly put OpenClaw and Codex inside the same question. They do different jobs. OpenClaw sits closer to messages, channels, and personal workflows.
Codex sits closer to repositories, tasks, and coding environments. But both of them push you toward the same managerial problem:
if you really had a tireless digital teammate who can take actions, call tools, and still make mistakes, do you actually know how to lead them?
To me, this is no longer just an efficiency question. It feels like one of the newest management questions of 2026.
Because what you are really managing is not the model itself, but context, process, permissions, feedback, and boundaries.