Field notes · Agentic AI

The coworker, the lobster, and the scout

The AI on our desktops has quietly stopped chatting and started doing — opening files, clicking buttons, running commands, finishing the job. A field guide to the new agentic assistants, and the landscape forming around them.

5 June 2026 · current to here, unfinished by design

This page began as a one-page table a friend shared with me — Claude Cowork vs OpenClaw vs Microsoft Scout, three columns, nine rows. It was a good table. But the space underneath it has been moving so fast in 2026 that the table felt like a postcard of a city that is still being built. So I went deeper, and wider. What follows is the longer version: what these tools actually are, how they differ, the field around them, and the part nobody likes to print on a feature comparison — what can go wrong.

I'll keep my own bias in the open. We use Claude Cowork across our team, and I've written before about why I point non-engineers there first. But this is meant as a map, not a sales pitch. Everything here is sourced; the links are at the foot of each section, and a fuller list at the end.

The one-line version

A new category arrived in 2026: the agentic desktop assistant — an AI that doesn't just answer, it acts on your actual computer. They sit on a spectrum from managed and low-setup (Claude Cowork) through deeply enterprise-integrated (Microsoft Scout) to fully self-hosted and customisable (OpenClaw). The capability is real. So is the new attack surface.

01 · The shift

From chatbot to coworker

For three years the dominant shape of AI was the chat box: you type, it types back, the window forgets you when you close it. Useful, but passive. A chatbot is, by design, a little amnesiac and entirely dependent on what you paste into it.

The 2026 shift is that the assistant has been given hands. It can read and write the files in a folder you point it at, run shell commands, drive a browser, click through an app, and string those actions into a multi-step task it carries out with light supervision. Anthropic frames Cowork as a third mode beside Chat and Code; Microsoft calls Scout the first of a new class it names “Autopilots” — “always-on agents that work autonomously, with their own identity, and act on your behalf.” Different words, same threshold being crossed: from advice to action.

That single change — the ability to act — is what makes the category genuinely new, genuinely useful, and genuinely risky, all at once. Hold those three together; the rest of this page is really about the trade-offs between them.

02 · The three anchors

Cowork, OpenClaw, Scout — side by side

The original table, expanded. Read it as a spectrum: lowest setup and most managed on the left, most open and self-hosted in the middle, most enterprise-locked on the right.

Capability | Claude Cowork | OpenClaw | Microsoft Scout
Made by | Anthropic | Open-source community (orig. P. Steinberger) | Microsoft
Primary audience | Knowledge workers | Power users / tinkerers | Microsoft 365 enterprises
Setup effort | Very low | High | Low–medium (if you're on M365)
How it acts | Mounted local folders + MCP connectors, OS-sandboxed | Local gateway, shell, browser, chat-app channels, skills | Files, shell, browser (Playwright) + native M365 data
Controls apps / browser | Yes | Yes | Yes
Email / calendar | Via connectors | Via channels / skills | Native Outlook / Teams
Best / default model | Claude Opus (latest) | Your choice (BYO key) | Microsoft MAI + Copilot stack
Self-hostable | No | Yes | No
Enterprise governance | Sandbox strong; audit gap | DIY (near-absent by default) | Strong (Purview, Entra, Intune)
Platforms | macOS, Windows | macOS, Linux, Windows | Windows, macOS
Works outside MS world | Excellent | Excellent | Reasonable

As of mid-2026. Fast-moving — treat specifics as a snapshot, not a contract.

Claude Cowork — the low-friction coworker

Cowork turns the Claude desktop app into a “digital coworker” for non-developers — researchers, analysts, ops, legal, finance. It's built on the same agent architecture as Claude Code, but wrapped in the familiar chat UI; the origin story is literally that non-coders kept using Claude Code, so Anthropic productised it for knowledge work. You grant it specific folders — read-only, read-write, or read-write-but-no-delete — and it works inside a local container, reaching other apps through authenticated MCP connectors (Google Workspace, Slack, Notion) rather than blindly grabbing your screen.

Under the hood it leans on the Claude Code sandboxing primitives — macOS Seatbelt and Linux bubblewrap enforcing file and network boundaries at the OS level, so a stray command can only write to the working directory and a new network domain triggers a prompt. It runs on the latest Claude Opus, with a model selector and “effort” control. The strengths: easiest to adopt, strongest single model, excellent outside the Microsoft ecosystem. The honest weaknesses: no Linux; sessions die when the machine sleeps; complex multi-app workflows still only land maybe half the time; and — the one that matters for regulated work — Cowork activity is reportedly excluded from Anthropic's Audit Logs, Compliance API and Data Exports across every tier, a real blind spot you have to compensate for with your own telemetry. Not self-hostable, though enterprises can run the underlying API in their own cloud boundary.
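To make the three folder-grant modes concrete, here is a minimal sketch of how such a check could work. It is my own illustration, not Anthropic's implementation: the names Mode, ALLOWED and is_allowed are hypothetical, and a real sandbox enforces this at the OS level rather than in application code.

```python
from enum import Enum
from pathlib import Path


class Mode(Enum):
    READ_ONLY = "read-only"
    READ_WRITE = "read-write"
    READ_WRITE_NO_DELETE = "read-write-no-delete"


# Hypothetical mapping from grant mode to the operations it permits.
ALLOWED = {
    Mode.READ_ONLY: {"read"},
    Mode.READ_WRITE: {"read", "write", "delete"},
    Mode.READ_WRITE_NO_DELETE: {"read", "write"},
}


def is_allowed(grants: dict[Path, Mode], target: Path, op: str) -> bool:
    """Return True if `op` on `target` is permitted by a granted folder."""
    # Resolve first so that "../" tricks cannot escape a granted folder.
    target = target.resolve()
    for folder, mode in grants.items():
        folder = folder.resolve()
        # Covered if the target is the folder itself or lives beneath it.
        if target == folder or folder in target.parents:
            return op in ALLOWED[mode]
    return False  # Anything outside the granted folders is denied.
```

With a grant of /work/reports in read-write-no-delete mode, a write to /work/reports/q2.md passes, a delete of the same file is refused, and /work/reports/../secrets is refused because it resolves outside the grant. The sketch takes the first matching grant, so nested grants would need an explicit precedence rule.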

🦞OpenClaw — the self-hosted lobster

OpenClaw is the opposite philosophy: an open-source, self-hosted “personal AI OS” that lives on your hardware and answers you inside the chat apps you already use — WhatsApp, Telegram, Slack, Discord, Signal, iMessage, Teams and a long tail beyond. Its gateway, tools and memory all run locally; it stores conversations, memory and skills as plain Markdown and YAML under ~/.openclaw. A one-line installer sets it up on macOS, Linux or Windows; you bring your own API key, so it's model-agnostic and the software itself is free. It went from a one-person project (launched late 2025 as Clawdbot, renamed Moltbot, then OpenClaw) to one of the most-starred repos on GitHub within months — the lobster mascot, “Molty,” and all.

The same openness that makes it powerful makes it dangerous in default dress. Security researchers point out it can hit Simon Willison's “lethal trifecta” at once: access to private data, exposure to untrusted content, and the ability to act externally — file access, autonomous web reading, and the power to send email, run terminal commands, even move money. There are reports of credentials stored in plaintext and at least one viral incident of an instance “speed-running” the deletion of someone's inbox with no easy stop button. The answer the ecosystem reached for is hardening: NVIDIA's NemoClaw (announced at GTC 2026) wraps OpenClaw in an OpenShell sandbox that enforces policy out-of-process — outside the agent, so the agent can't override its own guardrails — with filesystem and network isolation, operator-approved egress, and local inference on NVIDIA's Nemotron models so sensitive data never leaves the device. Open and self-hostable is a real superpower; it just hands you the governance problem too.
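The trifecta is easy to state as a checklist, so here is a small sketch of one. The class and field names are hypothetical and this is not part of OpenClaw or NemoClaw; it only shows that the danger comes from the three capabilities being present together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    reads_private_data: bool         # e.g. mail, files, credentials
    ingests_untrusted_content: bool  # e.g. web pages, inbound messages
    can_act_externally: bool         # e.g. send email, run commands, pay


def trifecta_legs(caps: AgentCapabilities) -> list[str]:
    """List which legs of the trifecta this configuration has."""
    legs = []
    if caps.reads_private_data:
        legs.append("private data")
    if caps.ingests_untrusted_content:
        legs.append("untrusted content")
    if caps.can_act_externally:
        legs.append("external action")
    return legs


def has_lethal_trifecta(caps: AgentCapabilities) -> bool:
    """True only when all three legs are present at once."""
    return len(trifecta_legs(caps)) == 3
```

Removing any one leg, for example by putting external actions behind operator approval, makes has_lethal_trifecta return False. That is the shape of the hardening described above.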

Microsoft Scout — the governed scout

Scout, unveiled at Build 2026 (2 June), is Microsoft's answer for organisations already living in Microsoft 365. The twist that surprised a lot of people: it's built on OpenClaw. Microsoft's own words are “powered by OpenClaw open-source technology,” and it says it's contributing its policy-conformance work back upstream. In effect Scout is OpenClaw, wrapped in enterprise security and tuned for M365 — an always-on agent that runs across cloud, desktop and web, reads and writes files, runs shell commands, drives a browser via Playwright, and reaches natively into Teams, Outlook, OneDrive and SharePoint. It builds long-term context through something Microsoft calls Work IQ, and can act proactively — prepping for meetings, blocking calendar time, flagging stalled decisions.

Where it genuinely separates from the pack is governance. Each Scout agent runs under its own governed Entra identity (not a shared service account), with credentials scoped to the task and redacted from logs. Microsoft Purview — sensitivity labels and data-loss prevention — is enforced in the moment, before anything is written or sent. Sensitive actions can require a human to sign off, and shell access uses a three-tier auto-approve / prompt / block model inside a sandboxed workspace. The trade-offs: it's gated behind the Microsoft Frontier program and a GitHub Copilot licence (preview only, no consumer GA pricing yet), it's not self-hostable, and as Forrester's Jeff Pollard noted, an agent that can act on sensitive data “amplifies whatever data-governance problems already exist.” Less compelling the further you get from the Microsoft world.
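The three-tier shell model is simple enough to sketch. What follows is an assumption-laden illustration, not Microsoft's code: the program lists, the Decision names and the classify function are all hypothetical, and a production system would not rely on string matching alone.

```python
import shlex
from enum import Enum


class Decision(Enum):
    AUTO_APPROVE = "auto-approve"
    PROMPT = "prompt"
    BLOCK = "block"


# Hypothetical policy lists, chosen only for illustration.
SAFE_PROGRAMS = {"ls", "cat", "grep", "pwd"}
BLOCKED_PROGRAMS = {"rm", "dd", "mkfs", "shutdown"}
# Characters that chain, redirect or expand; their presence rules out auto-approval.
SHELL_META = ";|&<>$`"


def classify(command: str) -> Decision:
    """Sort a shell command into auto-approve, prompt, or block."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return Decision.BLOCK  # Unbalanced quotes: refuse to guess.
    if not tokens:
        return Decision.BLOCK
    program = tokens[0]
    if program in BLOCKED_PROGRAMS:
        return Decision.BLOCK
    has_meta = any(ch in command for ch in SHELL_META)
    if program in SAFE_PROGRAMS and not has_meta:
        return Decision.AUTO_APPROVE
    # Unknown programs and chained commands go to a human.
    return Decision.PROMPT
```

Here "ls -la" is auto-approved, "rm -rf /tmp/x" is blocked, and "ls && rm -rf /" falls to a prompt, because chaining disqualifies it from the safe tier. The default is deliberately the middle tier: anything the policy does not recognise asks a person.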

03 · The layer above

Paperclip, and the “harness” idea

When my friend added two more names — Paperclip and Harness — they turned out not to be more desktop agents at all, but the layer that sits above them. That distinction is worth drawing, because 2026 is the year the field stopped being about single agents and started being about orchestrating them.

📎Paperclip — managing a team of agents

Paperclip (paperclip.ing) calls itself “the app people use to manage AI agents for work.” It's an open-source, MIT-licensed Node + React control plane — launched March 2026 by a pseudonymous developer, tens of thousands of GitHub stars within weeks — for running a team of agents toward business goals. You bring your own agents (Claude Code, Codex, Cursor, Bash, even OpenClaw), assign goals, and watch every instruction, tool call and decision in an immutable, append-only audit log. It owns no capability of its own; it conducts. The problem it explicitly solves is a telling one: companies were losing money not from agents failing but from agents succeeding too well — running autonomously until the API bill hit five figures overnight. So Paperclip makes task checkout and budget enforcement atomic.

It is also a deliberate wink. The name nods to Nick Bostrom's paperclip maximizer — the thought experiment about an agent that optimises one goal so relentlessly it converts the world into paperclips. Marketing itself around running “zero-human companies” while naming yourself after the canonical cautionary tale of over-autonomy is, depending on your mood, either honest or alarming. Probably both.
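What "atomic task checkout and budget enforcement" means is easier to see in code. This is a sketch under my own assumptions, not Paperclip's implementation: the Ledger class, its fields and the cents-based budget are hypothetical. The point is that claiming a task and reserving its cost happen inside one critical section, so two agents cannot both pass the budget check and then overspend together.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Ledger:
    budget_cents: int
    spent_cents: int = 0
    claimed: dict[str, str] = field(default_factory=dict)  # task_id -> agent
    # Append-only by convention: entries are added, never edited or removed.
    audit: list[tuple[str, str, str]] = field(default_factory=list)
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def checkout(self, task_id: str, agent: str, est_cost_cents: int) -> bool:
        """Claim a task and reserve its estimated cost in one atomic step."""
        with self._lock:
            if task_id in self.claimed:
                self.audit.append(("denied:already-claimed", task_id, agent))
                return False
            if self.spent_cents + est_cost_cents > self.budget_cents:
                self.audit.append(("denied:over-budget", task_id, agent))
                return False
            self.claimed[task_id] = agent
            self.spent_cents += est_cost_cents
            self.audit.append(("claimed", task_id, agent))
            return True
```

With a budget of 100 and ten agents each asking for 30 at the same moment, exactly three checkouts succeed and the other seven are logged as over-budget. Without the lock, several agents could read the same spent total and all proceed.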

Harness — the word, and the company

“Harness” is the slipperier of the two, and worth being precise about rather than inventing a product. It means two real things in 2026, neither of which is a consumer desktop agent:

  • The generic term. An “agent harness” is the scaffolding around a model — the tools, memory, guardrails, permissions and observability that turn a raw LLM into a working agent. In 2026 the industry's emphasis visibly shifted from the model to the harness; as Martin Fowler's site put it, the harness is increasingly the product.
  • The company. Harness.io is an enterprise software-delivery firm whose AI agents (DevOps, SRE, release, security) live inside the CI/CD pipeline — generating pipeline config, triaging incidents, writing postmortems. Its DevOps agent now runs on Claude Opus. It acts within the delivery toolchain, not on your desktop.

I'm flagging this plainly because it's exactly the kind of name that an over-confident write-up would turn into a fake desktop assistant. It isn't one.
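For the generic sense of the word, a toy example helps. The sketch below is a bare-bones harness of my own invention, not any vendor's design: run_harness, the action dictionaries and the scripted stand-in for a model are all hypothetical. It shows the four ingredients named above wrapped around a model call: tools, memory, a permission guardrail, and a trace for observability.

```python
from typing import Callable, Optional


def run_harness(
    model: Callable[[list[str]], dict],
    tools: dict[str, Callable[[str], str]],
    allowed: set[str],
    task: str,
    max_steps: int = 5,
) -> tuple[Optional[str], list[tuple[str, str]]]:
    """Loop: ask the model for an action, gate it, run it, remember it."""
    memory = [f"task: {task}"]          # What the model sees each turn.
    trace: list[tuple[str, str]] = []   # Observability: every step is recorded.
    for _ in range(max_steps):          # Guardrail: hard cap on steps.
        action = model(memory)
        if "final" in action:
            trace.append(("final", action["final"]))
            return action["final"], trace
        name = action["tool"]
        if name not in allowed or name not in tools:
            # Permission guardrail: refuse, and tell the model it was refused.
            trace.append(("blocked", name))
            memory.append(f"blocked: {name}")
            continue
        result = tools[name](action["arg"])
        trace.append((name, result))
        memory.append(f"{name} -> {result}")
    return None, trace  # Ran out of steps without a final answer.


def scripted_model(memory: list[str]) -> dict:
    """Stand-in for an LLM: tries a forbidden tool, then an allowed one."""
    if len(memory) == 1:
        return {"tool": "delete_file", "arg": "notes.txt"}
    if len(memory) == 2:
        return {"tool": "word_count", "arg": "the harness is the product"}
    return {"final": memory[-1]}


TOOLS = {
    "word_count": lambda text: str(len(text.split())),
    "delete_file": lambda path: "deleted",
}
```

Calling run_harness(scripted_model, TOOLS, allowed={"word_count"}, task="count words") blocks the delete, runs the word count, and returns the final answer along with a three-entry trace. Swap the scripted function for a real model call and the loop stays the same, which is roughly why the harness is where the product differences now live.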

04 · The wider field

Everyone else at the table

The three anchors aren't alone. The same “stop chatting, start doing” move is happening at every major lab, mostly through two doors — a computer-use agent and an agentic browser.

  • OpenAI: ChatGPT Agent. The old Operator was folded into a single agentic mode with a visual browser, a text browser, a terminal and an API, running on OpenAI's own cloud “virtual computer.” It reasons, acts, observes and loops, and asks before consequential steps. Paid only (Plus through the $200 Pro tier), not self-hostable. Broadest integrated toolset; practical usage caps below the top tier. (Source: openai.com)
  • Google: Gemini, after Mariner. Project Mariner, DeepMind's browser agent, was shut down in May 2026 and its capability scattered into Gemini Agent Mode and Chrome's auto-browse, which lets Chrome scroll, click and type autonomously on Gemini 3. Native to the dominant browser; confusingly spread across subscription tiers. (Source: TechCrunch)
  • Manus. One of the most autonomous of all: give it a one-line brief and it researches, plans, executes and delivers a finished artifact in a cloud sandbox, now backed by a persistent 24/7 “Manus Cloud Computer.” Credit-based pricing; hosted, not self-hostable. (Notably, China's regulator blocked Meta's attempt to acquire it in April 2026.) (Source: Wikipedia)
  • Devin (Cognition). The autonomous software engineer: it plans, codes, tests and ships in its own cloud workspace, deployable into a customer VPC. Now on Cognition's own SWE-1.x models, raised $1B at a ~$26B valuation, real enterprise adoption (Goldman, Citi, and others). Consumption-priced by “ACU”; can get pricey at scale. (Source: devin.ai)
  • Amazon: Nova Act. Less a consumer assistant, more an AWS SDK for building fleets of reliable browser-automation agents, with human escalation built in and Bedrock AgentCore underneath. Production-and-reliability oriented; AWS-hosted. (Source: labs.amazon.science)
  • The agentic browsers. A whole front opened in the browser itself: Perplexity's Comet, OpenAI's Atlas, Opera Neon, The Browser Company's Dia, and Copilot Mode in Edge. The browser, it turns out, is the most natural place to let an agent loose, because so much work already happens there.

05 · How to actually tell them apart

The axes that matter

Strip away the branding and the same handful of dials explain every product on this page. When you evaluate one, these are the questions worth asking:

  • Setup effort. Managed SaaS at one pole, self-hosted at the other — but the honest 2026 answer is hybrid: a managed control plane over a private data plane. The clean binary is gone.
  • Depth of control. From true OS-level computer-use (mouse, keyboard, files, browser) down to orchestration-only. More control means more capability and more blast radius.
  • Ecosystem lock-in. Lock-in compounds across the model API, the framework, your data gravity, and the surrounding ecosystem. Scout is the clearest example; OpenClaw the clearest escape from it.
  • Self-hostability. Decisive for data-residency and compliance. Open-source + embedded-DB tools at one end, vendor SaaS at the other.
  • Model choice. Bring-your-own-model, kept honest by open standards — MCP (now under a Linux Foundation umbrella) and agent-to-agent protocols — versus a single fixed model.
  • Autonomy level. From read-only “observe” agents to fully autonomous executors. Governance should scale with the level of autonomy rather than apply uniformly; Gartner's warning is that applying one governance standard to every agent is itself a cause of failure.
  • Safety & permissioning. The mature pattern is tiered human-in-the-loop: read-only auto-proceeds, writes need a click, destructive or irreversible actions need a real preview and review.
  • Cost shape. Subscription, credits, or compute-units, and crucially whether per-task budgets are enforced atomically, because a successful runaway agent is the expensive failure mode now.

06 · The part they don't print

What can go wrong

An agent that can act is an agent that can act wrongly, at machine speed, on your real data. This isn't hypothetical hand-wringing; it's the defining engineering problem of the category right now. Four risks recur:

  • Prompt injection. Now a tier-one, reliably exploitable attack. When the agent reads a malicious web page or document, a hidden instruction can become an action, not just bad text. Computer-use agents face a visual variant — instructions smuggled into what the agent sees on screen.
  • Over-permissioning. Machine identities are over-permissioned at strikingly high rates. A minor misconfiguration that a human would shrug off becomes, for an autonomous agent, a machine-speed compromise.
  • Data exfiltration through legitimate tools. A perimeter gateway can't stop an over-permissioned agent from leaking data through tool calls that all look allowed. The calls are individually legitimate; the pattern is not.
  • Over-autonomy: the paperclip problem. The “zero-human company” framing is genuinely exciting, and it is also exactly the scenario the paperclip maximizer was coined to warn about. The mitigation is unglamorous: proportional governance, and a human in the loop on anything irreversible.

The rule of thumb I use

Give an agent the narrowest folder, the fewest credentials, and the smallest blast radius that still lets it do the job — and keep a human gate on anything you can't undo. Capability is cheap now; reversibility is the thing worth paying for.
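One natural place for that human gate is the moment a session that has touched sensitive data tries to send something out, which is the exfiltration risk listed above. The sketch below is a simple taint-tracking illustration under my own assumptions: ToolCall, its two flags and flag_exfil_pattern are hypothetical, and real labelling would come from something like sensitivity labels rather than hand-set booleans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    tool: str
    touches_sensitive: bool = False  # Read something labelled sensitive.
    leaves_boundary: bool = False    # Sends data outside the organisation.


def flag_exfil_pattern(calls: list[ToolCall]) -> list[int]:
    """Return indices of outbound calls made after a sensitive read."""
    flagged = []
    tainted = False  # Becomes True once the session has seen sensitive data.
    for i, call in enumerate(calls):
        if call.touches_sensitive:
            tainted = True
        if call.leaves_boundary and tainted:
            flagged.append(i)  # Individually allowed, suspicious in sequence.
    return flagged
```

In a session that searches the web, reads a sensitive file, summarises it and then sends an email, only the final email is flagged. The earlier web search is outbound too, but it happened before anything sensitive was read. Each call would pass a per-call allowlist; it is the order that gives the game away.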

07 · Where it's heading

The next turn

A few currents seem clear enough to name:

  • From one agent to many. If 2025 was the AI employee, 2026 is the AI company — orchestration layers like Paperclip turning a pile of agents into a managed, budgeted team.
  • The harness becomes the product. The competitive edge is moving off the raw model and onto the wrapper — tools, guardrails, observability, permissions.
  • On-device and always-on go mainstream. Scout's always-on personal agent and NVIDIA's local NemoClaw stack arrived in the same season — the first time every layer of a local agentic stack has both an open-source and a vendor-backed option shipping at once.
  • Interop over monoliths. Open protocols (MCP, agent-to-agent) are pushing the market toward assembled, swappable agent ecosystems rather than one vendor's all-in-one.
  • Security becomes the gate. Gartner expects task-specific agents to be embedded in a large share of enterprise apps this year, and names their security as the defining challenge of 2026. The winners won't be whoever is most capable; they'll be whoever is capable and governable.

So — back to that one-page table. It wasn't wrong. It was just early. The real shape of this space is a spectrum with a governance problem running down the middle of it: the more you let an agent do, the more carefully you have to watch it. Cowork optimises for getting out of your way; Scout for never leaving the enterprise's sight; OpenClaw for putting the whole thing in your own hands — keys, risks and all. Which one is “best” is really a question about how much control you want to hold versus how much you want handled for you.

Current to here, and unfinished by design. The city is still being built.