What is a COLLEAGUE.SKILL skill package?

A self-contained, versioned text unit distilled automatically from a recorded expert trace. It holds two coordinated tracks — a capability track (what to do) and a bounded behavior track (how to do it) — and can be inspected, invoked, corrected in natural language, rolled back, and installed across different agent hosts.

What's the difference between the capability track and the behavior track?

The capability track encodes practices, mental models, and decision heuristics — the procedure. The bounded behavior track encodes communication style, interaction rules, and the correction history — the presentation. Keeping them on separate tracks lets you tune one without disturbing the other.

How is this different from fine-tuning or a system prompt?

Fine-tuning bakes expertise into opaque weights you can't inspect or roll back; a monolithic system prompt tangles what-to-do and how-to-say-it into one prose blob with no real version history. A skill package keeps the two tracks separate, in plain text, versioned, and installable unchanged across hosts.

COLLEAGUE.SKILL — Capability vs behavior skill tracks

COLLEAGUE.SKILL — two-track skill packages from one expert trace

Agent

learnaivisually.com/ai-explained/colleague-skill-capability-vs-behavioral-track

Jargon

Expert trace: A recorded run of a person or role doing real work — the messages, tool calls, decisions, and corrections produced along the way. The raw material COLLEAGUE.SKILL distills from. It is heterogeneous: the "what" and the "how" are tangled together, not written as clean instructions.
Capability track: The half of a skill package that encodes what to do — practices, mental models, and decision heuristics. In the recipe metaphor, the ordered steps and the doneness checks.
Bounded behavior track: The half that encodes how to do it — communication style, interaction rules, and the correction history. "Bounded" because it captures a limited slice of a person's style, not a full clone of them.
Skill package: A self-contained, versioned unit holding both tracks. It can be inspected, invoked, updated through natural-language feedback, rolled back, and installed across different agent hosts. Linked module: AI Agents → Tool Use → Skills.
Agent host: The runtime an agent runs inside — a CLI harness, an IDE assistant, a server-side orchestrator. "Cross-host" means the same package drops into any of them unchanged.
Rollback / versioning: Because the package is plain text under version control, an edit produces a new version (v1 → v2) and a bad edit can be reverted — the same discipline you get with code, applied to expertise.

The news. On May 29, 2026, a team posted COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation. The pitch: agents are increasingly asked to carry "bounded representations of human expertise, judgment, and interaction style," but that knowledge lives in messy traces, not clean instructions. The system distills those traces into a versioned skill package with two coordinated tracks — a capability track for practices and decision heuristics, and a bounded behavior track for communication style and correction history — that you can inspect, invoke, correct in natural language, roll back, and install across agent hosts. The authors point to a public skill gallery (reportedly ~215 skills from ~165 contributors) as evidence the packaging format already has traction. Read the paper →

Picture the master chef working the line for one dinner service. They taste, they adjust the salt, they plate fast, they tell the runner exactly how to describe the dish. If you only filmed it once, you'd have everything you need to reproduce the meal — but it's all tangled together. The recipe (sear three minutes a side, rest until it springs back) is mixed in with the plating (sauce under, not over; one sprig, not three) and the table-side patter. A new cook can't just replay the tape; they need it written down. And crucially, the steps and the style are different kinds of knowledge — one is procedure, the other is presentation.

That split is the whole idea. COLLEAGUE.SKILL reads the expert trace and sorts it onto two cards-in-one: the capability track is the recipe — what to do, the heuristics and decision rules ("check the tax line before you trust the total"). The behavior track is the plating and patter — how to do it, the tone and interaction style ("lead with the bottom line, stay terse, don't hedge"). Keeping them on separate tracks is what lets you tune one without disturbing the other: you can sharpen a decision rule without rewriting the voice, or soften the tone without touching the logic. This is exactly the "skills as a primitive" framing the Tool Use module's Skills step builds toward — and it changes what you decide to load into an agent's context, because a package is a single, nameable thing to install rather than a wall of prose to paste.

The payoff is portability. Because the package is plain text rather than weights, the same card cooks in any kitchen — you install it unchanged on a CLI harness, an IDE assistant, or a server-side orchestrator. When the chef changes their mind, you don't re-shoot the film: you scribble a fix in the margin in plain English, the package rolls from v1 to v2, and a bad edit reverts like any other commit. Compared with the usual options, that's a real shift in where expertise lives:

Approach	Where expertise lives	Inspect & edit?	Versioned / rollback?	Portable across hosts?
Fine-tuning	Model weights	No — opaque	Only by retraining	No — tied to that model
Monolithic system prompt	One prose blob	Yes, but what/how are tangled	Ad hoc (copy-paste)	Partly — must hand-port
Skill package (capability + behavior)	Versioned text file	Yes — two clean tracks	Yes — v1 → v2, revert	Yes — install unchanged

Where does that portability earn its keep? Say a team has 5 expert workflows they want running across 4 agent hosts (illustrative). Hand-porting means writing and re-syncing a bespoke prompt for every pairing — 5 × 4 = 20 bespoke prompts, and every time one expert refines a rule you re-edit it in four places. As skill packages it's 5 portable packages, each installed unchanged on all four hosts, and a refinement is one margin edit, versioned once — not four silent re-syncs that drift out of agreement. The win isn't a benchmark number (the paper is a systems contribution, not an accuracy result); it's that expertise stops being copy-pasted prose and starts behaving like code.

Goes deeper in: AI Agents → Tool Use & Function Calling → Skills

Related explainers

This was a single-concept pick, so it has no sibling pages from the same news item — but these existing explainers sit close by:

EnvFactory — synthesizing tool environments — generating the environments agents practice their skills in.
Tool Router — a contextual bandit over tools — choosing which packaged capability to invoke at run time.

Frequently Asked Questions

Check what you knowMap your AI & GPU knowledge across every track — free, role-based