The Design Principles

These are the twelve ideas that govern every decision in CocoPlus. They are worth understanding not just as rules, but for the reasoning behind them.

A principle without a reason is just a constraint. With a reason, it becomes a compass.

1. Native First, Always

CocoPlus uses only what Coco already provides: skills, subagents, hooks, slash commands, and AGENTS.md. Nothing external. No third-party services bolted on, no external databases, no runtime dependencies beyond the Snowflake connection Coco already manages.

The reason for this is not purity — it is reliability. Every external dependency is a failure vector that lies outside your control. A cloud service rate-limits under load. A database pool exhausts at midnight. An API changes its response format without notice. When any of these happen inside CocoPlus, the developer has to debug a distributed system problem to understand why their AI plugin failed.

Native constructs fail in ways you can read and understand. A hook script that fails leaves an error message in a format you recognize. A skill file that has a logic error is a file you can open and read. The failure modes are local, legible, and fixable without calling anyone.

If a feature requires breaking this principle, the feature does not belong in CocoPlus.

2. Structure Enables Flow

The CocoBrew phases feel like overhead — until the moment you realize you haven't written a spec for your last three AI sessions and the outputs are inconsistent in ways you can't trace.

Structure is not the opposite of speed. Unstructured development feels faster at first because you're skipping steps. Then you spend twice as long on the steps you skipped — debugging misaligned expectations, re-running tests you didn't design, shipping code that doesn't match what the stakeholder asked for.

The phases in CocoBrew are cognitive handrails. Each phase enforces a mental discipline that the next phase depends on. You can't write a good test if you don't have a spec defining success. You can't do a meaningful review if you don't have tested code. You can't ship responsibly if your review is open.

This sequencing is not bureaucratic. It is the shape that good development naturally takes when you're paying attention.

3. Inspect Before Act

Acting on stale context is one of the most reliable ways to produce wrong outputs.

Before any agent executes SQL, it should understand the schema it's operating on. Before any deployment, it should understand the target environment. Before any migration, it should understand the current state of what it's migrating from.

The Environment Inspector embodies this as a feature. But the principle applies everywhere in the system. CocoHarvest reads the approved plan before decomposing it. The Safety Gate reads the SQL before allowing it to execute. Skills read the current state from .cocoplus/ before taking action.

Inspection has a cost. A 30-second environment scan uses tokens and time. But the cost of inspection is fixed. The cost of acting on wrong assumptions is variable, and the upper end of that range is "you dropped the wrong table."

The inspection habit is cheap discipline that prevents expensive mistakes.

4. Evidence Over Assertion

A subagent that reports completion is not done. A subagent whose checkpoint files exist and contain valid data — that one is done.

This distinction matters because language models are very good at describing what they intended to do and somewhat less reliable at verifying that they actually did it. The CocoFlow checkpoint system is designed to make completion measurable rather than merely reportable.

Stages emit specific files as their completion evidence. Downstream stages read from those files, not from the previous stage's session context. The Safety Gate re-validates before allowing Ship. The Review phase signs off on concrete artifacts.
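A minimal sketch of what "completion as evidence" could look like in practice. It assumes checkpoints are JSON objects on disk, which this document does not specify; the function name and the required-keys idea are hypothetical, chosen only for illustration.

```python
import json
from pathlib import Path


def checkpoint_is_valid(path: Path, required_keys: set[str]) -> bool:
    """Return True only if the checkpoint exists, parses as a JSON object,
    and contains every required key. (JSON is an assumed format.)"""
    if not path.is_file():
        # No file means no evidence, whatever the agent reported.
        return False
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (json.JSONDecodeError, UnicodeDecodeError):
        # The file exists but its content is not usable evidence.
        return False
    return isinstance(data, dict) and required_keys.issubset(data)
```

A downstream stage would call a check like this on the upstream checkpoint before reading it, rather than trusting a completion message.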

The system is designed to make assertions costly and evidence cheap. Not because any individual agent is untrustworthy — but because the incentive structure of a well-designed system should reward producing evidence, not reporting claims.

5. Hard Gates for Irreversible Operations

Some mistakes cannot be undone. Dropped schemas. Truncated tables. Rows deleted without a WHERE clause.

For these operations, a warning is insufficient. A language model that is prompted carefully follows its instructions most of the time. Most of the time is not acceptable for irreversible operations on production data.

Hard gates are technical enforcements that operate before any reasoning happens. The Safety Gate's PreToolUse interceptor fires before the SQL reaches Snowflake. It does not ask the language model for permission. It reads the SQL, recognizes the pattern, and blocks the call. This cannot be bypassed with a clever prompt.
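The pattern check at the heart of such a gate might be sketched as follows. This is not the Safety Gate's actual rule set, which this document does not list: the patterns, function names, and the naive split on semicolons are all assumptions, and how the check is wired into a PreToolUse hook is deliberately left out.

```python
import re

# Hypothetical patterns for illustration only; the real Safety Gate's
# rules are not given in this document.
_DROP_OR_TRUNCATE = re.compile(
    r"^\s*(DROP\s+(TABLE|SCHEMA|DATABASE)|TRUNCATE)\b", re.IGNORECASE
)
_DELETE = re.compile(r"^\s*DELETE\s+FROM\b", re.IGNORECASE)
_WHERE = re.compile(r"\bWHERE\b", re.IGNORECASE)


def is_irreversible(statement: str) -> bool:
    """Classify one SQL statement by pattern alone, with no model in the loop."""
    if _DROP_OR_TRUNCATE.search(statement):
        return True
    # A DELETE is treated as irreversible only when it has no WHERE clause.
    return bool(_DELETE.search(statement)) and not _WHERE.search(statement)


def gate(sql: str) -> None:
    """Raise before execution if any statement matches a blocked pattern.

    Splitting on ';' is a simplification: it ignores semicolons inside
    string literals and comments.
    """
    for statement in sql.split(";"):
        if is_irreversible(statement):
            raise PermissionError(
                f"Blocked irreversible statement: {statement.strip()!r}"
            )
```

The point of the sketch is the control flow: the decision is made by code that reads the SQL text, so no prompt can change the outcome.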

For recoverable mistakes (wrong logic, suboptimal queries, missing edge cases), advisory guidance is appropriate. Soft guidance informs; it does not block.

The distinction between hard and soft is architectural. Confusing them — making hard things soft, or making soft things hard — degrades the system in both directions. Soft gates on irreversible operations give false confidence. Hard gates on recoverable mistakes create friction that serves nobody.

6. Progressive Disclosure

Load the minimum context needed for the current task. Not slightly less. Not roughly the right amount. Exactly the minimum.

AGENTS.md is capped at 200 lines because every character in it is loaded into every session. Skills are loaded on demand because they contain detailed implementation logic that most sessions don't need. Monitors run per-stage because global monitors create context overhead that accumulates across every stage.

The temptation is always to add more "just in case." Just-in-case context becomes noise. Noise makes it harder for the agent to reason about the signal. A skill that loads 600 lines of context to execute a 30-line workflow is a skill whose design should be questioned.

The hard limit is a forcing function for clear boundaries. If AGENTS.md can only hold 200 lines, you have to decide what belongs there and what belongs in the warm layer. That decision-making process is valuable in itself — it forces you to articulate what the ambient context of a session actually needs to contain.
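One way to keep the cap honest is to measure it mechanically. The sketch below assumes nothing about how CocoPlus itself enforces the limit; the function name is hypothetical, and only the 200-line figure comes from this document.

```python
from pathlib import Path

AGENTS_MD_MAX_LINES = 200  # the cap stated in this document


def lines_over_budget(path: Path, limit: int = AGENTS_MD_MAX_LINES) -> int:
    """Return how many lines the file exceeds the limit by (0 if within budget)."""
    line_count = len(path.read_text(encoding="utf-8").splitlines())
    return max(0, line_count - limit)
```

A nonzero result is the signal to move something out of AGENTS.md and into the warm layer.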

Progressive disclosure is how the system scales. At session 10, it works. At session 200, it still works, because the context budget is managed rather than exhausted.

7. Classify Failure Before Retry

When a stage fails, the failure must be understood before it is addressed.

Blind retry is not a strategy — it is an absence of strategy dressed up as action. Running the same failing stage again without understanding why it failed produces the same failure, or a different failure, unpredictably. Either way, you learn nothing.

Failure classification is the first step. Did the checkpoint file not get created? Did it get created with invalid content? Did the SQL error? Did the test fail? Each of these failure types has a different root cause and a different repair.

The CocoFlow loop construct exists for structured retry after classification. A loop that retries up to three times is appropriate after you've identified that the failure is transient — a timeout, a connection issue. A loop that retries three times on a logic error produces three documented failures instead of one.
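The classify-then-retry idea can be sketched in a few lines. This is not the CocoFlow loop construct itself: the failure categories follow the questions above, but the mapping from Python exception types to categories, and every name in the block, are assumptions made for illustration.

```python
from enum import Enum
from typing import Callable


class FailureKind(Enum):
    TRANSIENT = "transient"                    # timeout, connection issue
    CHECKPOINT_MISSING = "checkpoint_missing"
    CHECKPOINT_INVALID = "checkpoint_invalid"
    LOGIC = "logic"                            # everything else: do not retry


class StageFailed(Exception):
    def __init__(self, log: list[FailureKind]):
        super().__init__(f"stage failed: {[kind.value for kind in log]}")
        self.log = log


def classify(error: Exception) -> FailureKind:
    """Hypothetical mapping from exception type to failure class."""
    if isinstance(error, (TimeoutError, ConnectionError)):
        return FailureKind.TRANSIENT
    if isinstance(error, FileNotFoundError):
        return FailureKind.CHECKPOINT_MISSING
    if isinstance(error, ValueError):
        return FailureKind.CHECKPOINT_INVALID
    return FailureKind.LOGIC


def run_stage(stage: Callable[[], None], max_attempts: int = 3) -> list[FailureKind]:
    """Run a stage, retrying only transient failures.

    Returns the classifications logged before success; raises StageFailed
    (carrying the same log) if the stage never succeeds.
    """
    log: list[FailureKind] = []
    for _ in range(max_attempts):
        try:
            stage()
            return log
        except Exception as error:
            kind = classify(error)
            log.append(kind)
            if kind is not FailureKind.TRANSIENT:
                break  # retrying a non-transient failure would only repeat it
    raise StageFailed(log)
```

Either way the caller ends up with a list of classifications, which is the raw material a pattern-finder needs.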

Logging the classification is what makes the Memory Engine valuable. CocoCupper can identify patterns in failure classifications across sessions. "Stages that involve NotebookExecute fail on checkpoint validation 40% of the time due to notebook output formatting" is a pattern worth knowing. Unclassified retries produce no learning.

8. Institutional Memory Over Rediscovery

Every session should leave the project in a better-documented state than it found it.

This is not a nice-to-have. It is the difference between a project that gets smarter with time and one that resets with every new session.

The Memory Engine captures decisions as they happen. CocoCupper identifies patterns after sessions complete. CocoGrove stores the patterns worth keeping permanently. The three-layer memory architecture exists because different knowledge has different durability: some decisions are relevant for today's session, some for the next month of sessions, and some indefinitely.

An error that was encountered and resolved is worth recording — not because you are proud of the error, but because the next developer (or the next AI agent, or you in six months) should not have to rediscover the same solution. A schema pattern that worked well is worth promoting to CocoGrove so it becomes the default approach, not the hard-won exception.

The cost of documentation is paid once. The benefit is collected every time something would otherwise have been rediscovered.

9. Token Economics as Architecture

Token budgets are not constraints to work around. They are parameters that shape the system's design.

CocoPlus makes explicit model assignments across the system: Haiku for pattern recognition (CocoCupper), Haiku with Sonnet escalation for analytical tasks (Data Analyst, BI Analyst), Sonnet for execution and reasoning (most personas), Opus for strategic decisions (CDO). These are not preferences — they are architectural decisions about where depth is required and where efficiency is sufficient.
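Written down as data, the assignments above might look like the sketch below. The persona names and model tiers come from the paragraph above; the table structure, the `needs_depth` escalation flag, and the function name are hypothetical, since this document does not describe how escalation is triggered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelAssignment:
    default: str
    escalation: Optional[str] = None  # tier to move to when more depth is needed


ASSIGNMENTS = {
    "CocoCupper": ModelAssignment("haiku"),
    "Data Analyst": ModelAssignment("haiku", escalation="sonnet"),
    "BI Analyst": ModelAssignment("haiku", escalation="sonnet"),
    "CDO": ModelAssignment("opus"),
}

# "Most personas" use Sonnet, so it serves as the fallback here.
FALLBACK = ModelAssignment("sonnet")


def model_for(persona: str, needs_depth: bool = False) -> str:
    """Pick the model tier for a persona; `needs_depth` is an assumed signal."""
    assignment = ASSIGNMENTS.get(persona, FALLBACK)
    if needs_depth and assignment.escalation:
        return assignment.escalation
    return assignment.default
```

Keeping the table explicit makes the cost decision reviewable: changing a tier is a visible edit, not a side effect.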

The AGENTS.md limit, the on-demand skill loading, the per-stage monitor activation — all of these are token economics expressed as design constraints. They ensure that the system is as useful at session 200 as it is at session 1, because context budget is managed at every layer.

When designing a new feature, the first question is not "what should it do" — it is "what does it cost to do that, and is the cost justified at scale?" A feature that works beautifully for the first twenty iterations and breaks the context budget by iteration thirty is not a well-designed feature.

10. Isolation Prevents Contamination

CocoHarvest agents run in isolated git worktrees. CocoFlow stages marked as isolated start with clean context. Unverified output from one stage does not become assumed truth in the next.

Isolation is not just about parallelism. It is about correctness.

When one stage contaminates the next with incomplete or incorrect assumptions, the error propagates. By the time you notice it (at Stage 4, or Stage 6, or during Review), the contamination has compounded. Debugging requires tracing back through multiple stages to find where the wrong assumption was introduced.

Isolated stages prevent this. Each stage has explicit inputs (the files and checkpoints from prior stages) and explicit outputs (the checkpoint files it produces). The interface is clear. A failure is localized — it is in this stage, not diffused through the system.
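The explicit-interface idea can be sketched as a stage that declares its input and output files and is checked against both. This is an illustration under assumptions, not CocoFlow's stage format: the `Stage` type, its fields, and `execute` are hypothetical names.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass(frozen=True)
class Stage:
    name: str
    inputs: tuple[Path, ...]    # files this stage is allowed to depend on
    outputs: tuple[Path, ...]   # checkpoint files it promises to produce
    run: Callable[[], None]


def execute(stage: Stage) -> None:
    """Run a stage only if its declared inputs exist, then verify its outputs."""
    missing = [p for p in stage.inputs if not p.is_file()]
    if missing:
        raise FileNotFoundError(
            f"{stage.name}: missing inputs {[str(p) for p in missing]}"
        )
    stage.run()
    unproduced = [p for p in stage.outputs if not p.is_file()]
    if unproduced:
        raise RuntimeError(
            f"{stage.name}: did not produce {[str(p) for p in unproduced]}"
        )
```

Because both checks name the stage, a failure points at one place instead of surfacing several stages later.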

The cost of isolation is real: more explicit file-passing, more checkpoint definitions, more deliberate context management. It is always lower than the cost of debugging a contaminated failure.

11. Personas Are Contracts

A persona is not a suggestion for how to approach a task. It is a specification of who handles the task, what tools they have, how they operate, and what model reasons on their behalf.

These specifications are locked. The Data Steward operates in plan mode because autonomous execution defeats the purpose of governance review: someone who can take action without prompting is not a check but a participant. The CDO uses Opus because the cost of Opus is appropriate for decisions at the level of strategic architecture.

Persona contracts matter because CocoPlus is a team system. When you assign a task to a persona, you know exactly what will happen: which tools will be available, which model will reason, which output format to expect. That predictability is the value of the contract.
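A locked contract can be modeled as an immutable record. In the sketch below only the Data Steward's plan mode comes from this document; the model value, the tool name, and the other mode name are illustrative placeholders, and the type itself is hypothetical.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class PersonaContract:
    name: str
    mode: str                # "plan" is from the text; other mode names are assumed
    model: str               # placeholder value below, not taken from CocoPlus
    tools: tuple[str, ...]   # placeholder tool names, not taken from CocoPlus


DATA_STEWARD = PersonaContract(
    name="Data Steward",
    mode="plan",
    model="sonnet",
    tools=("read_schema",),
)


def try_override(contract: PersonaContract, field_name: str, value: object) -> bool:
    """Attempt to change a field; return False because frozen contracts refuse."""
    try:
        setattr(contract, field_name, value)
    except FrozenInstanceError:
        return False
    return True
```

Here an attempted override fails loudly instead of silently turning the specialist into something else.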

A persona whose contract can be arbitrarily overridden is a persona without identity. Identity in a specialist agent is not a personality quirk; it is the source of the specialist's value. Override the contract, and you have a generalist again.

12. The Plugin Serves the Developer

Every feature must produce visible, direct value to the developer working in the terminal.

This sounds obvious. It is often violated.

Features that add process overhead without developer benefit are not features — they are compliance theater. A feature that ensures the AI's session is auditable but provides the developer nothing concrete is a feature that serves a process, not a person.

Before any feature is built, the test is: does the developer experience value directly? Does it save them time? Does it prevent an error they would have made? Does it produce an artifact they actually want? If the answer is "it helps governance" or "it satisfies the audit trail," the next question is: does the developer experience that help, or does it happen invisibly behind them?

CocoPlus is not a governance layer. It is a productivity layer. The difference is where the value lands. Governance lands in reports. Productivity lands in the developer's experience of doing the work.

If CocoPlus ever starts to feel like something to work around rather than something to work with, this principle has been violated.

Principles are only principles when they are applied in the cases where violating them would be convenient.