The 8-Component Agent Skeleton: why every working AI agent looks the same.
After shipping 100+ AI agents at agency-network scale and reading every public system prompt PromptLeadz could get its hands on (Claude Code's March 2026 source-map leak, the asgeirtj/system_prompts_leaks repository, Anthropic's open Skills standard, the PromptLeadz role packs across Sales / Support / Marketing / Founder / Recruiter), one pattern keeps appearing. Every working agent has the same eight components. Same names. Same order. Different settings. The agents that miss components fail in predictable ways. The agents that include all eight keep working even when the underlying model changes. Here is the framework, the empirical proof, and the failure mode each component prevents.
PromptLeadz has spent the last 18 months inside large-scale AI agent deployment, building agents at scale across nine operating units and 200+ markets. More than 100 agents have shipped, been debugged, retired, or pulled across that footprint. Some shipped and stayed. Some shipped and got pulled. Some never made it past staging because they failed the sniff test in the first review.
The pattern that took longest to see: the agents that worked all looked the same underneath. Not in topic, not in tone, not in deployment surface — but in structure. Eight blocks, every time. The agents that failed had the same blocks missing every time. After enough cycles, the right move was to stop arguing with the pattern and start building against it explicitly.
Then in March 2026 the entire Claude Code source code leaked through an npm source map, and within 72 hours the AI engineering community had reconstructed the system prompt. The night it dropped, the same eight components were visible in the same order, with the same patterns that had been emerging in production agents for a year. The asgeirtj/system_prompts_leaks repository, which catalogs extracted system prompts from GPT-5.5, Gemini 3.1, Grok 4.3, Perplexity, and others, shows the same eight components. Different brands, different surfaces, different model providers, all converging on the same architecture.
This post names that architecture. The 8-Component Agent Skeleton. It is portable, it is empirical, and once you see it you cannot unsee it.
The eight components.
Every working agent shipped or seen in the wild contains these eight blocks. They appear in the same order. They have the same job. They prevent the same failure modes. The framework is descriptive — it does not prescribe additions; it names what already exists in agents that work and what is missing in agents that don't.
Role
Establishes who the agent is.
The Role block answers the first question every model needs answered: who am I? It sets the agent's seniority (junior / peer / senior), its register (corporate / direct / technical / casual), and what it explicitly refuses to do. A vague Role produces vague output across every other component. A sharp Role anchors everything that follows.
The tell: When the role block reads 'You are a helpful assistant', the agent will be helpful at everything and excellent at nothing. When it reads 'You are a senior B2B sales operator embedded in [Your Company]'s revenue team, peer to a 5+ year sales professional', the agent has somewhere to stand.
Capabilities
Names what the agent does.
Capabilities scopes the agent to a specific list of task families. The list is short (5-10 items typically), each item is a category not a specific task, and the items have implicit boundaries. When a request maps to a capability, the agent runs the procedure for that capability. When a request falls outside, the agent surfaces that.
The tell: Good Capabilities reads like a job description with verbs. Bad Capabilities reads like 'I can help with anything related to sales.' One scopes; the other invites scope creep.
Constraints
Defines what the agent will not do.
Constraints is the most-skipped component and the most important one. It contains banned phrases (specific words and phrases the agent will not produce), banned tactics (specific patterns the agent will not use), and length caps (specific word counts per task family). Constraints is where you encode the failure modes you have already seen and want to prevent permanently.
The tell: If the Constraints block does not list specific banned phrases that are coded for your domain (sales packs ban 'circling back', recruiter packs ban 'rockstar', founder packs ban 'pleased to announce'), the agent will produce those phrases.
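A Constraints block is easier to trust when its rules can also be checked mechanically outside the model. The following is a minimal sketch, not part of any PromptLeadz pack: the phrase list and the 80-word cap are copied from the Sales pack shown later in this post, while `check_constraints` and its return shape are hypothetical names chosen for illustration.

```python
import re

# Banned phrases and the cold-email length cap, copied from the Sales pack
# in this post. Swap in the list that is coded for your own domain.
BANNED_PHRASES = [
    "i hope this finds you well",
    "quick question",
    "just following up",
    "circling back",
    "bumping this",
    "synergy",
    "best-in-class",
]
MAX_WORDS = 80  # upper bound for an opening cold email


def check_constraints(text: str, max_words: int = MAX_WORDS) -> list[str]:
    """Return a list of human-readable violations (empty means clean)."""
    violations = []
    lowered = text.lower()
    for phrase in BANNED_PHRASES:
        # Word boundaries keep a banned phrase from matching inside
        # a longer, unrelated word.
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            violations.append(f"banned phrase: {phrase!r}")
    word_count = len(text.split())
    if word_count > max_words:
        violations.append(f"too long: {word_count} words (cap {max_words})")
    return violations


draft = "Just circling back on my last note. Worth 20 minutes this week?"
print(check_constraints(draft))
```

A check like this does not replace the Constraints block in the prompt. It only gives the operator a way to confirm the block is being honored.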
Output Format
Locks the structure of every response.
Output Format provides templates per task type. Cold emails get the subject-body-CTA structure with explicit length per part. Reports get headline-numbers-progress-asks. Code review gets overall-bugs-tests-questions. Without Output Format, every response is a different shape and the operator spends time reformatting instead of using the work.
The tell: When the agent's first response to a task has wildly different structure than its third response to the same task, Output Format is missing or weak.
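One way to notice a missing or weak Output Format is to parse each response against the template and see whether it holds. The sketch below assumes the cold-email template from the Sales pack later in this post (SUBJECT, BODY, CTA, in that order); `parse_cold_email` and `format_problems` are hypothetical helpers, and the body length rule is left to a separate constraints check.

```python
import re

# Labels from the cold-email template in the Sales pack later in this post.
LABELS = ["SUBJECT", "BODY", "CTA"]


def parse_cold_email(text: str) -> dict[str, str]:
    """Split a response into its labeled parts; raise if the shape is off."""
    pattern = r"^(SUBJECT|BODY|CTA):[ \t]*"
    pieces = re.split(pattern, text.strip(), flags=re.MULTILINE)
    # With one capture group, re.split yields
    # [prefix, label, value, label, value, ...].
    found = pieces[1::2]
    if pieces[0].strip() or found != LABELS:
        raise ValueError(f"expected parts {LABELS} in order, got {found}")
    return {label: value.strip() for label, value in zip(found, pieces[2::2])}


def format_problems(parts: dict[str, str]) -> list[str]:
    """Check the per-part rules the template states."""
    problems = []
    if len(parts["SUBJECT"].split()) >= 7:
        problems.append("subject must be under 7 words")
    if parts["SUBJECT"] != parts["SUBJECT"].lower():
        problems.append("subject must be lowercase")
    if parts["CTA"].count("?") != 1:
        problems.append("CTA must be a single question")
    return problems
```

Running the same task three times and parsing each response is a cheap test of the tell above: if any run raises, the template is not holding.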
Examples
Anchors the format with worked examples.
Examples shows the agent what good output looks like for each major task family. 2-3 examples per family, each one demonstrating the format applied to a specific scenario. Examples do more work than Output Format alone because they make the abstract template concrete, and concrete patterns are what models match against.
The tell: An agent with strong Output Format and weak Examples will produce output that follows the template but misses the spirit. An agent with strong Examples will produce output that captures the spirit even when the template breaks.
Context
Carries the user-specific information.
Context is the only component that is actually unique to your company, your customer, your codebase. Voice samples, named customer references, claim policy, banned topics, ICP description, brand voice, prior conversation history — all of these live in Context. Two operators with the same Role, Capabilities, Constraints, Output Format, and Examples will produce identical generic output until Context is filled in. Context is where the agent becomes yours.
The tell: If the agent's output sounds like it could apply to any company in your industry, Context is underfilled. If the output references your specific customers, your specific differentiation, your specific voice — Context is doing its job.
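Underfilled Context is often literal: template slots that nobody replaced. As a rough sketch, assuming the Context block is kept as a template with `{UPPER_SNAKE}` slots like the Sales pack later in this post, the helpers below fill known values and report what is still empty. The function names are hypothetical, and bracket-style placeholders such as `[Your Company]` are not handled here.

```python
import re

# Matches template slots such as {STAGE} or {VOICE_SAMPLE_1}.
PLACEHOLDER = re.compile(r"\{([A-Z][A-Z0-9_]*)\}")


def unfilled_slots(context_block: str) -> list[str]:
    """Return placeholder names still present, in order, without duplicates."""
    seen = []
    for name in PLACEHOLDER.findall(context_block):
        if name not in seen:
            seen.append(name)
    return seen


def fill_context(template: str, values: dict[str, str]) -> str:
    """Substitute known values; leave unknown slots visible rather than blank."""
    return PLACEHOLDER.sub(
        lambda m: values.get(m.group(1), m.group(0)), template
    )


template = "We sell {WHAT_YOU_SELL} to {ICP_DESCRIPTION}. ICP: {ICP_DETAIL}"
filled = fill_context(template, {"WHAT_YOU_SELL": "billing software"})
print(unfilled_slots(filled))
```

Leaving unknown slots in place is a deliberate choice in this sketch: a visible `{ICP_DETAIL}` is easier to catch in review than a silently blank line.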
Escalation
Names when the agent stops and asks for human review.
Escalation lists specific triggers that should halt the agent and surface a flag. Pricing commitments outside the published price book. Legal language. Specific revenue projections without case study evidence. Negative competitor mentions. Reference customer invention. Each trigger names a specific failure mode and a specific flag the operator should see. Escalation is the agent's humility check: things it should not decide alone.
The tell: An agent without Escalation will confidently generate content that should have stopped for review. An agent with Escalation will surface a flag and let the operator decide.
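The trigger-to-flag pairing can be made concrete as a small lookup. This is an illustrative sketch only: the flag names come from the Escalation block of the Sales pack later in this post, the trigger keys and the `Decision` type are hypothetical, and detecting whether a trigger actually applies to a draft (the hard part) is assumed to happen elsewhere.

```python
from dataclasses import dataclass

# Trigger -> flag pairs. Flags are from the Sales pack in this post;
# the trigger keys are hypothetical identifiers.
ESCALATION_FLAGS = {
    "pricing_outside_price_book": "PRICING REVIEW",
    "custom_contract_terms": "LEGAL REVIEW",
    "projection_without_case_study": "METRIC PROJECTION REVIEW",
    "negative_competitor_mention": "COMPETITIVE REVIEW",
    "unsupported_product_claim": "PRODUCT CLAIM REVIEW",
}


@dataclass
class Decision:
    proceed: bool      # False means stop drafting and hand off
    flags: list[str]   # what the operator should see


def decide(triggers_hit: set[str]) -> Decision:
    """Map detected triggers to review flags; any flag halts the draft."""
    unknown = triggers_hit - set(ESCALATION_FLAGS)
    if unknown:
        raise KeyError(f"unrecognized triggers: {sorted(unknown)}")
    flags = [flag for key, flag in ESCALATION_FLAGS.items() if key in triggers_hit]
    return Decision(proceed=not flags, flags=flags)
```

The point of the structure is that every trigger names exactly one flag, so the operator always knows which review a halted draft is waiting on.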
Self-Check
Verifies the output before it ships.
Self-Check is a checklist the agent runs before returning any output. 5-8 items typically. Length within limits? No banned phrases? Format matches template? Specific signals used, not generic? No claims requiring escalation slipped through? Voice matches samples? If any check fails, the agent revises before returning. Self-Check is what makes the agent self-correcting rather than first-draft-shipping.
The tell: An agent without Self-Check ships drafts. An agent with Self-Check ships near-final output and catches its own drift before the operator sees it.
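The "revise before returning" behavior can be sketched as a bounded loop. Everything named here is an assumption for illustration: `self_check`, the `checks` dictionary, and the `revise` callback (which in a real agent would ask the model for a rewrite) are hypothetical, and the cap on rounds is a choice made for this sketch so a draft that cannot pass does not loop forever.

```python
from typing import Callable

Check = Callable[[str], bool]               # True when the draft passes
Reviser = Callable[[str, list[str]], str]   # gets the draft and failed check names


def self_check(draft: str, checks: dict[str, Check], revise: Reviser,
               max_rounds: int = 3) -> tuple[str, list[str]]:
    """Run every check; revise and retry while any fail.

    Returns the final draft and the names of checks still failing
    (an empty list means the draft passed).
    """
    failed = [name for name, check in checks.items() if not check(draft)]
    rounds = 0
    while failed and rounds < max_rounds:
        draft = revise(draft, failed)
        failed = [name for name, check in checks.items() if not check(draft)]
        rounds += 1
    return draft, failed


checks = {
    "length within limit": lambda d: len(d.split()) <= 80,
    "no banned phrases": lambda d: "circling back" not in d.lower(),
}
# Stand-in reviser: a real agent would ask the model to rewrite the draft.
revise = lambda d, failed: d.replace("Circling back: ", "")
final, failing = self_check("Circling back: worth 20 minutes?", checks, revise)
```

If `failing` is non-empty after the last round, the draft should be surfaced to the operator rather than shipped.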
Six free role packs. Same eight components. Six cost-of-mistake calibrations.
Sales pack: cost of mistake is ignored email, agent calibrates for engagement. Support pack: cost is wrongful refund, agent calibrates for resolution within policy. Marketing pack: cost is brand drift, agent calibrates for voice consistency. Founder pack: cost is wrong hat in wrong room, agent calibrates for context-switching. Recruiter pack: cost is biased screen ending in EEOC complaint, agent calibrates for candidate dignity. Same skeleton, six different settings. Free.
Five public agents. Same eight components.
The framework is not a guess. It is what the field has converged on. Here are five public artifacts checked component by component. The methodology was simple: read the public system prompt or specification, ask whether each of the eight components is present, mark it. Three of the five had all eight components explicit. The other two had 6-7 explicit and 1-2 implicit.
Claude Code (Anthropic, March 2026 leak)
The Claude Code source code shipped with a debug-only source map in npm package @anthropic-ai/claude-code 2.1.88, and within hours the engineering community had reverse-engineered the system prompt and published analyses. The most thorough public breakdown is dbreunig.com's component-by-component visualization, which lists each piece of the assembled system prompt and which conditions trigger it.
All eight components present. Role: "you are an interactive agent that helps users with software engineering tasks". Capabilities: the tool catalog (file operations, shell, agents, task management, web, MCP, scheduling, utility). Constraints: explicit length limits ("keep text between tool calls to under 25 words"), security guardrails, banned tactics. Output Format: structured thinking blocks and tool-call patterns. Examples: documented bash-tool examples and tool-use scaffolding. Context: working directory, git repository state, platform, OS, model name. Escalation: the six-mode permission system with denial tracking. Self-Check (implicit but present): the parallel-tool-execution rules and output-quality verifications.
GPT-5.5 Thinking (OpenAI, April 2026 extracted prompt)
OpenAI does not publish its system prompts officially, but the asgeirtj/system_prompts_leaks repository catalogs extracted prompts from GPT-5.5, GPT-5.4, Codex, and others. The GPT-5.5 Thinking system prompt (extracted April 26, 2026) has all eight components, with examples implicit rather than explicit and self-check distributed across the body rather than collected in one place. The shape is recognizable as the same skeleton.
Gemini 3.1 Pro (Google, March 2026 extracted prompt)
The same repository covers Google's Gemini 3.1 Pro. All eight components present; examples and escalation are partial rather than fully explicit. The convergence is striking given OpenAI, Google, and Anthropic do not coordinate on architecture and have shipped these systems independently.
Anthropic's open Skills standard (SKILL.md, 2026)
Anthropic published the Agent Skills format as an open standard at agentskills.io. A SKILL.md file contains YAML frontmatter (name, description, optional fields) and markdown instructions. The frontmatter is Role + Capabilities (collapsed). The markdown body explicitly contains the other six components. Anthropic's public skills repository contains examples that follow this structure. The Skills format is the eight-component skeleton compressed into a portable file.
PromptLeadz role packs (the franchise, 2026)
The six role packs in the PromptLeadz franchise (Sales, Support, Marketing, Founder, Recruiter, Microsoft Copilot) are explicitly built on this framework. Every pack has all eight components in the same order with different settings tuned for the role's cost-of-mistake. They serve as worked examples of how the framework adapts.
The Sales pack, components labeled.
Here is the Sales agent pack from the PromptLeadz franchise, with each of the eight components called out explicitly. This is what the skeleton looks like in production for a real B2B sales agent. The same eight components appear in the Support pack, the Marketing pack, the Founder pack, the Recruiter pack, and the Microsoft Copilot pack, with different settings calibrated for the role's specific cost-of-mistake.
┌─ COMPONENT 01: ROLE ─────────────────────────────────────────────────┐
You are a senior B2B sales operator embedded in [Your Company]'s revenue
team. You support an AE running outbound, expansion, and renewal motions
across the customer lifecycle. You operate as a peer to a 5+ year sales
professional, not a junior or a generic chatbot.
You write the way a top sales operator writes. Specific over abstract.
Numbers over adjectives. Buyer-respect over volume. You refuse pattern-
match shortcuts that look efficient on the rep's screen and feel spammy
on the buyer's.
└──────────────────────────────────────────────────────────────────────┘
┌─ COMPONENT 02: CAPABILITIES ─────────────────────────────────────────┐
You handle these task families:
1. Cold outbound drafting
2. Discovery call prep and post-call synthesis
3. Demo preparation
4. Proposal drafting and pricing conversations
5. Multi-stakeholder follow-up
6. Deal review and pipeline inspection
7. Renewal and expansion conversations
When a request maps to one of these families, follow the structure for
that family. When a request falls outside, surface that and ask for the
right routing.
└──────────────────────────────────────────────────────────────────────┘
┌─ COMPONENT 03: CONSTRAINTS ★ ────────────────────────────────────────┐
You will not violate these under any condition.
Length per task family.
- Cold email: 60-80 words for opens, under 40 for follow-ups
- Discovery prep brief: under 600 words
- Demo prep: under 800 words
Banned phrases. Do not use:
- "I hope this finds you well"
- "Quick question", "thoughts?", any vague single-word CTA
- "Just following up", "circling back", "bumping this"
- "Synergy", "leverage", "transform", "unlock", "best-in-class"
Banned tactics. Do not invent buyer titles, do not fabricate competitor
intel, do not promise specific outcomes without [Your Company]'s written
case studies as evidence.
└──────────────────────────────────────────────────────────────────────┘
┌─ COMPONENT 04: OUTPUT FORMAT ────────────────────────────────────────┐
For cold emails:
SUBJECT: [under 7 words, lowercase, names a specific signal]
BODY: [60-80 words, references the specific signal]
CTA: [single binary question]
For discovery briefs:
ACCOUNT: [name and dollar potential]
WHY NOW: [signals from this period]
WHO IS ON THE CALL: [per attendee, role and signal]
RECOMMENDED OPENING: [specific positioning]
[Format templates for the other 5 task families follow this same shape.]
└──────────────────────────────────────────────────────────────────────┘
┌─ COMPONENT 05: EXAMPLES ─────────────────────────────────────────────┐
EXAMPLE 1 — cold email (specific signal: shipped infrastructure project)
SUBJECT: shipped your event-pipeline migration
BODY: saw your post on switching from segment to a custom event pipeline.
we shipped the same migration 6 months ago and the bit nobody warned us
about was the order-of-magnitude cost on schema validation. wrote up our
patterns: [link]. happy to share the dirty parts if useful.
CTA: worth 20 minutes to compare notes?
[2-3 more examples per task family follow this same anchor pattern.]
└──────────────────────────────────────────────────────────────────────┘
┌─ COMPONENT 06: CONTEXT ★ ────────────────────────────────────────────┐
[Your Company] is a {STAGE} {COMPANY_DESCRIPTION}. We sell
{WHAT_YOU_SELL} to {ICP_DESCRIPTION}.
ICP: {ICP_DETAIL}
Differentiation: {SPECIFIC_DIFFERENTIATION}
Brand voice: {BRAND_VOICE_DESCRIPTION}
Voice samples: {VOICE_SAMPLE_1} / {VOICE_SAMPLE_2} / {VOICE_SAMPLE_3}
Named customer references: {NAMED_CUSTOMERS_WITH_OUTCOMES}
Banned topics: {BANNED_TOPICS}
Claim policy: We can claim {CLAIMS_WE_CAN_MAKE}.
We never claim {CLAIMS_WE_CANNOT_MAKE}.
This block is the highest-leverage in the agent. Vague entries here
produce generic output regardless of how good the rest of the agent is.
└──────────────────────────────────────────────────────────────────────┘
┌─ COMPONENT 07: ESCALATION ───────────────────────────────────────────┐
Defer to a human review (and stop drafting) when any of these are true:
1. Pricing commitments outside published price book — flag: PRICING REVIEW
2. Custom contract terms (MSA, DPA, SLA modifications) — flag: LEGAL REVIEW
3. Specific revenue or savings projections without case study evidence —
flag: METRIC PROJECTION REVIEW
4. Negative competitor mentions beyond fair-comparison framing —
flag: COMPETITIVE REVIEW
5. Technical claims about product capabilities not in {PRODUCT_CONTEXT} —
flag: PRODUCT CLAIM REVIEW
└──────────────────────────────────────────────────────────────────────┘
┌─ COMPONENT 08: SELF-CHECK ───────────────────────────────────────────┐
Before returning any output, verify:
1. Length is within the task family limit
2. No banned phrases appeared
3. The output matches the format template for this task family
4. Specific signals or named references are used, not generic
5. No claims requiring escalation slipped through
6. The voice matches the voice samples in context
7. The output respects the buyer's time and attention
If any check fails, revise before returning.
└──────────────────────────────────────────────────────────────────────┘
How to use the skeleton when building your next agent.
The framework is portable IP. The strongest way to use it is as a checklist when designing or auditing an AI agent. Three patterns to apply.
Pattern 1: Build new agents against the skeleton.
Open a doc. Paste the eight component names as headers in order. Fill each one for your specific use case. Role first (who is the agent, what is its seniority, what does it refuse). Capabilities second (which task families). Constraints third (banned phrases, banned tactics, length caps — spend disproportionate time here). Output Format fourth (template per task family). Examples fifth (2-3 worked examples per family). Context sixth (the company-specific block — voice, claims, named references). Escalation seventh (the triggers for human review). Self-Check eighth (5-8 verifications). When all eight are filled, you have the system prompt. When any are vague or missing, the agent will fail in the predictable failure mode for that component.
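The "paste the eight headers, fill each one" procedure can be enforced in a few lines. A minimal sketch, assuming each component is kept as a separate string: the component names and order are from this post, while `build_system_prompt` and the `## NN NAME` header style are hypothetical choices, not a format any vendor requires.

```python
# The eight components, in the order this post lists them.
COMPONENTS = [
    "Role", "Capabilities", "Constraints", "Output Format",
    "Examples", "Context", "Escalation", "Self-Check",
]


def build_system_prompt(sections: dict[str, str]) -> str:
    """Join the eight sections in skeleton order; refuse if any is missing or empty."""
    missing = [name for name in COMPONENTS if not sections.get(name, "").strip()]
    if missing:
        raise ValueError(f"missing or empty components: {missing}")
    extra = sorted(set(sections) - set(COMPONENTS))
    if extra:
        raise ValueError(f"unknown components: {extra}")
    blocks = [
        f"## {i:02d} {name.upper()}\n{sections[name].strip()}"
        for i, name in enumerate(COMPONENTS, start=1)
    ]
    return "\n\n".join(blocks)
```

This only catches sections that are absent or blank. A section that is present but vague still needs a human read.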
Pattern 2: Audit existing agents against the skeleton.
Take any agent that is misbehaving and read it against the eight components. Find the missing or weak component. Fix that one. The drift the agent was producing will go away. The most common findings: Context is underfilled (output is generic), Constraints is missing banned phrases (output uses AI-tell language), Self-Check is missing (drift goes unnoticed). The audit takes about 20 minutes for a well-written agent and surfaces the highest-leverage fix every time.
Pattern 3: Compress the skeleton into other formats.
The eight components compress cleanly into Anthropic's SKILL.md format, into Cursor's .cursorrules or modular .cursor/rules/, into Claude Code's CLAUDE.md, into ChatGPT Custom GPT instructions, into Gemini Gem instructions, and into direct API system prompts. Different formats, same skeleton. Different deployments, same architecture. The framework is the agent; the format is the deployment.
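As one example of compression, here is a rough sketch of rendering a SKILL.md file. It relies only on what this post states about the format (YAML frontmatter with `name` and `description`, followed by markdown instructions). `to_skill_md` is a hypothetical helper, and any naming rules or optional fields should be checked against the specification at agentskills.io rather than inferred from this code.

```python
import json


def to_skill_md(name: str, description: str, body: str) -> str:
    """Render a SKILL.md string: YAML frontmatter, then markdown instructions."""
    # json.dumps yields a double-quoted string, which YAML also accepts,
    # so colons or quotes in the text cannot break the frontmatter.
    frontmatter = "\n".join([
        "---",
        f"name: {json.dumps(name)}",
        f"description: {json.dumps(description)}",
        "---",
    ])
    return f"{frontmatter}\n\n{body.strip()}\n"


skill = to_skill_md(
    "sales-operator",                             # hypothetical skill name
    "Drafts cold outbound: emails and briefs.",   # Role + Capabilities, collapsed
    "## Constraints\nNo banned phrases.",         # the other components go in the body
)
print(skill)
```

Following the mapping described earlier in this post, the frontmatter carries Role and Capabilities in collapsed form and the markdown body carries the remaining six components.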
Sources and references
- Claude Code source map leak (March 31, 2026), npm package @anthropic-ai/claude-code 2.1.88. Component-by-component analysis at dbreunig.com and varonis.com.
- Extracted system prompts from GPT-5.5 Thinking, GPT-5.4, Claude Opus 4.7, Claude Opus 4.6, Gemini 3.1 Pro, Gemini 3 Flash, Grok 4.3 Beta, Perplexity, and others. Catalog at github.com/asgeirtj/system_prompts_leaks. Mirror at github.com/elder-plinius/CL4R1T4S.
- Anthropic Agent Skills overview and best practices at platform.claude.com. Open standard at agentskills.io. Public skills repository at github.com/anthropics/skills.
- Anthropic prompt engineering documentation at docs.anthropic.com.
- OpenAI GPTs documentation at help.openai.com.
- PromptLeadz franchise of 6 role packs and 4 platform-specific posts as worked examples of the skeleton. Sales, Support, Marketing, Founder, Recruiter, Microsoft Copilot.
Questions people ask.
Is the 8-component skeleton a published standard?
It is not a formally published standard yet. The 8-component skeleton is the framework distilled from building 100+ AI agents at agency-network scale and analyzing public artifacts: leaked system prompts from Claude Code (March 2026 npm source map leak), the asgeirtj/system_prompts_leaks GitHub repository (which catalogs prompts from GPT-5.5, Claude Opus 4.7, Gemini 3.1, Grok 4.3, Perplexity), Anthropic's open Skills format published at agentskills.io, and the PromptLeadz franchise of 6 role packs (Sales, Support, Marketing, Founder, Recruiter, Microsoft Copilot). Every one of these converges on the same eight components. The framework names what the field has already settled on.
Why do agents that miss components fail?
Each missing component creates a specific drift pattern. No Role and the agent has no seniority, no register, no anchor for what it refuses to do. No Capabilities and the agent accepts any task and drifts on scope. No Constraints (the most common omission) and the agent uses banned phrases, invents claims, breaks length limits. No Output Format and every response has a different shape. No Examples and the agent hallucinates the format. No Context and the output is generic. No Escalation and the agent ships content that should have stopped for human review. No Self-Check and drift goes unnoticed until it costs something. The skeleton is a forcing function: skip a component, get the failure mode that component prevents.
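The mapping in the paragraph above works as a lookup table for audits. The sketch below paraphrases those failure modes; `audit` is a hypothetical helper that takes the set of components an agent actually has and returns what to expect from the ones it lacks.

```python
# Component -> failure mode, paraphrased from the paragraph above.
FAILURE_MODES = {
    "Role": "no seniority, register, or anchor for refusals",
    "Capabilities": "accepts any task and drifts on scope",
    "Constraints": "uses banned phrases, invents claims, breaks length limits",
    "Output Format": "every response has a different shape",
    "Examples": "hallucinates the format",
    "Context": "output is generic",
    "Escalation": "ships content that needed human review",
    "Self-Check": "drift goes unnoticed",
}


def audit(present: set[str]) -> dict[str, str]:
    """Return the expected failure mode for each component that is absent."""
    return {name: mode for name, mode in FAILURE_MODES.items() if name not in present}


# An agent with everything except Constraints and Self-Check.
gaps = audit({"Role", "Capabilities", "Output Format",
              "Examples", "Context", "Escalation"})
print(gaps)
```

Read in reverse, the same table supports diagnosis: start from the symptom the operator is seeing and look up which component to inspect first.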
Which two components matter most?
Constraints (component 03) and Context (component 06). Constraints is the most-skipped because operators want the agent to be helpful and constraints feel like the opposite of helpful. But constraints prevent the failure modes that destroy trust: stock phrases that make output sound generic, tactics that cross policy lines, and runaway output that ignores length caps. Context is the highest-leverage because it is the only component that is actually company-specific. Voice samples, claim policy, named customer references, banned topics: these turn a generic agent into one that sounds like your company. Most operators underfill Context and then wonder why the agent's output is generic.
Does this apply to coding agents like Cursor and Claude Code?
Yes, with the same eight components but different settings. Claude Code's leaked system prompt (March 2026 source map leak, analyzed by dbreunig.com) has all eight components: a role block ("you are an interactive agent that helps users with software engineering tasks"), capabilities (the tool catalog: file ops, shell, agents, task management, web, MCP, scheduling, utility), constraints (length limits like "keep text between tool calls to 25 words", security guardrails, banned tactics), output format (structured thinking blocks and tool calls), examples (in the documented bash-tool examples), context (working directory, git repo state, platform, OS, model), escalation (the permission system with six modes plus denial tracking), and self-check (implicit, in the output-quality verifications and the parallel-tool-execution rules). The same eight components, tuned for engineering work.
Can I use this framework for my own agents?
Yes. The framework is meant as portable IP. Build your next agent against the eight components and you avoid the most common failure modes. The PromptLeadz role packs (Sales, Support, Marketing, Founder, Recruiter, Microsoft Copilot) are worked examples of the framework applied to specific B2B operating roles; you can use them as references when building your own. The free SKILL.md pack shows how the framework compresses into Anthropic's Skills format, the free Cursor Rules and CLAUDE.md templates show how it adapts to IDE-native engineering work, and the free platform-specific posts (Claude Projects for PM, Custom GPTs for RevOps, Gemini Gems for Finance) show how the framework deploys across each major chat platform. All of this is free to use as reference material when you build your own agents.
How does this relate to MCP, Skills, and tool-use frameworks?
MCP (Model Context Protocol) is plumbing for connecting agents to external systems. Skills (Anthropic's open standard) is a packaging format for procedural knowledge. Tool-use frameworks describe how the agent calls and chains external functions. None of those replace the 8-component skeleton; all of them sit on top of it. An MCP-enabled agent still has a role, capabilities, constraints, and so on. A skill is an 8-component skeleton compressed into a SKILL.md file. The skeleton is the agent architecture; MCP, Skills, and tool-use are how the agent connects to the world. Different layers, all complementary.
The framework is free. The Vault is for what the framework cannot reach.
Six free role packs, four free platform-specific posts, one free SKILL.md pack, one free Cursor Rules pack — all worked examples of the 8-component skeleton applied to specific B2B operating roles and deployment surfaces. The Vault is 50 specialist B2B sales prompts for the deals where the framework ends and the depth begins. Framework + Vault stack: skeleton for the architecture, prompts for the scenarios. One-time $99.99.