COLLEAGUE.SKILL — two-track skill packages from one expert trace

Agent
L
One expert trace → a portable, versioned skill any agent can runone expert trace (a recorded run)openscan rowscheck taxflag gapdraftsenddistilSkill packagev1capability track — what to docheck the tax line firstflag if total ≠ line sumbehavior track — how to do itlead with the bottom linecalm, terse — no hedging✎ corrected in plain English → rolled to v1agent hosts (any runtime)host 1empty — re-learns from scratchhost 2empty — re-learns from scratchhost 3empty — re-learns from scratchhosts running this skill: 0 / 3capability + behavior, one portable package1 trace → every agentinspect · edit in natural language · version · install across hosts
learnaivisually.com/ai-explained/colleague-skill-capability-vs-behavioral-track

The news. On May 29, 2026, a team posted COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation. The pitch: agents are increasingly asked to carry "bounded representations of human expertise, judgment, and interaction style," but that knowledge lives in messy traces, not clean instructions. The system distills those traces into a versioned skill package with two coordinated tracks — a capability track for practices and decision heuristics, and a bounded behavior track for communication style and correction history — that you can inspect, invoke, correct in natural language, roll back, and install across agent hosts. The authors point to a public skill gallery (reportedly ~215 skills from ~165 contributors) as evidence the packaging format already has traction. Read the paper →

Picture the master chef working the line for one dinner service. They taste, they adjust the salt, they plate fast, they tell the runner exactly how to describe the dish. If you only filmed it once, you'd have everything you need to reproduce the meal — but it's all tangled together. The recipe (sear three minutes a side, rest until it springs back) is mixed in with the plating (sauce under, not over; one sprig, not three) and the table-side patter. A new cook can't just replay the tape; they need it written down. And crucially, the steps and the style are different kinds of knowledge — one is procedure, the other is presentation.

That split is the whole idea. COLLEAGUE.SKILL reads the expert trace and sorts it onto two cards-in-one: the capability track is the recipe — what to do, the heuristics and decision rules ("check the tax line before you trust the total"). The behavior track is the plating and patter — how to do it, the tone and interaction style ("lead with the bottom line, stay terse, don't hedge"). Keeping them on separate tracks is what lets you tune one without disturbing the other: you can sharpen a decision rule without rewriting the voice, or soften the tone without touching the logic. This is exactly the "skills as a primitive" framing the Tool Use module's Skills step builds toward — and it changes what you decide to load into an agent's context, because a package is a single, nameable thing to install rather than a wall of prose to paste.

The payoff is portability. Because the package is plain text rather than weights, the same card cooks in any kitchen — you install it unchanged on a CLI harness, an IDE assistant, or a server-side orchestrator. When the chef changes their mind, you don't re-shoot the film: you scribble a fix in the margin in plain English, the package rolls from v1 to v2, and a bad edit reverts like any other commit. Compared with the usual options, that's a real shift in where expertise lives:

ApproachWhere expertise livesInspect & edit?Versioned / rollback?Portable across hosts?
Fine-tuningModel weightsNo — opaqueOnly by retrainingNo — tied to that model
Monolithic system promptOne prose blobYes, but what/how are tangledAd hoc (copy-paste)Partly — must hand-port
Skill package (capability + behavior)Versioned text fileYes — two clean tracksYes — v1 → v2, revertYes — install unchanged

Where does that portability earn its keep? Say a team has 5 expert workflows they want running across 4 agent hosts (illustrative). Hand-porting means writing and re-syncing a bespoke prompt for every pairing — 5 × 4 = 20 bespoke prompts, and every time one expert refines a rule you re-edit it in four places. As skill packages it's 5 portable packages, each installed unchanged on all four hosts, and a refinement is one margin edit, versioned once — not four silent re-syncs that drift out of agreement. The win isn't a benchmark number (the paper is a systems contribution, not an accuracy result); it's that expertise stops being copy-pasted prose and starts behaving like code.

Goes deeper in: AI Agents → Tool Use & Function Calling → Skills

Related explainers

This was a single-concept pick, so it has no sibling pages from the same news item — but these existing explainers sit close by:

Frequently Asked Questions