Overview

When working with AI coding agents, two components work together:

LLM (Language Model)

The underlying AI model—Claude, GPT, etc.—that reasons about code and generates responses.

Harness

The coding agent CLI: tools, commands, agentic loop, permissions, and all the features that let the model interact with your codebase.

The Harness

The harness is the coding agent CLI that wraps the model. It’s what transforms a language model into a functional coding assistant. Key components include:
  • Tools: read/write files, run bash commands, search code, web fetch
  • Agentic loop: multi-turn execution that lets the model plan, act, observe, and iterate
  • Permission system: controls what the agent can modify, execute, or access
  • Sub-agents: specialized agents for planning, exploration, verification
  • MCP integrations: external tool servers (GitHub, Linear, databases)
  • Context management: what files, history, and information the model sees
Different coding agents have different harnesses. Claude Code, Cursor, Cline, Aider—each has its own tool set, permission model, and execution flow.
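The agentic loop from the component list above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical `model` callable that returns either a tool call or a final answer; real harnesses layer permissions, context management, and streaming on top of this core.

```python
# Minimal sketch of a harness's agentic loop. The `model` callable and
# the reply format are assumptions for illustration, not any real API.

def read_file(path: str) -> str:
    """Tool: return a file's contents."""
    with open(path) as f:
        return f.read()

TOOLS = {"read_file": read_file}

def agentic_loop(model, task: str, max_turns: int = 10) -> str:
    """Plan -> act -> observe -> iterate until the model gives a final answer."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        reply = model(history)                      # model plans / decides
        if reply["type"] == "tool_call":
            tool = TOOLS[reply["name"]]
            try:
                result = tool(**reply["args"])      # act
            except Exception as e:
                result = f"error: {e}"              # surface errors to the model
            history.append({"role": "tool", "content": str(result)})  # observe
        else:
            return reply["content"]                 # final answer
    return "max turns reached"
```

Everything a harness adds (permission prompts, sub-agents, MCP tools) hooks into some variant of this loop, which is why the loop's exact shape matters to a model trained on it.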

Why Model Providers Train on Specific Harnesses

Model providers typically use reinforcement learning (RL) to improve their models for coding tasks. This training happens on a specific harness:
  • Anthropic trains Claude models using the Claude Code harness
  • OpenAI trains models using their own coding agent infrastructure
  • The model learns tool-calling patterns, when to ask for permission, how to handle errors
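One concrete way harnesses diverge is in how they define the same tool. The two schemas below are hypothetical, with made-up names and parameters; the point is only that a model trained on one shape must adapt its learned habits to the other.

```python
# Hypothetical tool schemas for the same file-read capability in two
# different harnesses. Names and parameters are illustrative only.

harness_a_tool = {
    "name": "read_file",
    "description": "Read a file from the workspace",
    "parameters": {"path": {"type": "string"}},
}

harness_b_tool = {
    "name": "fs.read",
    "description": "Return file contents",
    "parameters": {
        "file_path": {"type": "string"},
        "offset": {"type": "integer", "default": 0},  # extra knob harness A lacks
    },
}

# A model that learned via RL to call read_file(path=...) can usually
# generalize to fs.read(file_path=...), but its habits -- when to read,
# how much, how to recover from errors -- were shaped by harness A.
```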

What This Means

A model optimized for one harness may behave differently in another:
  • Claude in Claude Code: well-optimized, since the model was trained on this harness
  • Claude in a different agent: generally works well, but may show subtle differences in tool use
  • GPT in Claude Code's harness: works, but tool-calling patterns may differ from its native environment
Modern models are designed to generalize across tool definitions, but they often perform best in their native environment where they were trained to use specific tool patterns.

Delegate Tasks to Native Harnesses

This is where Twill comes in. Instead of forcing models to work in unfamiliar harnesses, Twill delegates coding tasks to agents running in their native environments. Each model operates with the tools, patterns, and execution flows it was trained on.
By matching models to their native harnesses, you get optimal tool-calling behavior, better error handling, and the full capabilities each provider intended.
This approach means you’re not limited to one agent or one model. Use Claude Code for complex refactoring where Claude excels. Use Codex for tasks where GPT’s strengths shine. Twill handles the orchestration—you get the best of each native harness.
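As a rough sketch of what native-harness delegation can look like: route each task to the CLI agent whose model was trained on it, and run that agent non-interactively. The routing table and the specific CLI invocations below are assumptions for illustration, not Twill's actual interface.

```python
# Hypothetical delegation sketch: run each task in its model's native
# harness CLI. Commands shown (claude -p, codex exec) are the agents'
# non-interactive modes; the routing table itself is made up.
import subprocess

NATIVE_AGENTS = {
    "claude": ["claude", "-p"],   # Claude Code, headless prompt mode
    "gpt": ["codex", "exec"],     # Codex CLI, non-interactive mode
}

def delegate(model_family: str, task: str) -> str:
    """Run the task in the chosen model's native harness, return its output."""
    cmd = NATIVE_AGENTS[model_family] + [task]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout
```

The orchestrator's job is then choosing `model_family` per task, so a refactoring task can land in Claude Code while another lands in Codex, each with its native tools.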