Core Concepts — CocoPlus

Understanding CocoPlus means understanding a set of core ideas. Not features — ideas. The features are just these ideas made concrete.

The Lifecycle

The oldest insight in software engineering is that code produced without process is code produced twice. Once when you write it wrong, and once when you fix it.

CocoPlus calls its lifecycle CocoBrew. Six phases: Spec, Plan, Build, Test, Review, Ship. The naming is intentional — brewing takes time, has stages, and you can't rush fermentation without ruining the batch.

Each phase has a specific job:

Spec gets you to write down what you're actually building before you build it. Simple in theory. Almost never done in practice. The spec file is not a contract — it is a mirror. You see your own assumptions, written down, and immediately notice the gaps. After Spec, the optional CocoDiscuss phase locks implementation decisions — model selection, evaluation methodology, accuracy thresholds, scope boundaries — before the planning agent can make those choices silently. The CocoSpec quality gate then scores the combined specification on five dimensions (Value, Scope, Acceptance, Boundaries, Risk) before Plan is permitted to run. A specification that scores below nine is a specification with named, implicit assumptions — and the gate names them.
Plan decomposes the spec into work. CocoHarvest does this automatically, mapping task types to specialist personas and generating the execution pipeline. Plan ends with a human approval gate — and optionally a SecondEye pass: three models (Haiku, Sonnet, Opus) critiquing the plan simultaneously from different angles. Efficiency. Completeness. Risk. If SecondEye finds Critical issues, it creates a soft gate you must acknowledge before Build begins. Not a block — but not a whisper either.
Build executes the plan in isolated stages. Parallel where possible. Sequential where dependencies demand it. Every stage commits its artifacts before the next stage begins.
Test validates against the spec's success criteria. Not against the developer's mental model of what success looks like — against what was written down in the Spec phase.
Review is a structured quality pass. Not opinion — checklist. Snowflake anti-patterns, spec compliance, CocoCupper findings from the session.
Ship is gated on Review. If there are open must-fix items, Ship does not run. This is not negotiable.
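
The ordering and the Ship gate described above can be sketched in a few lines. This is a minimal illustration, not CocoPlus's implementation: the `ReviewResult` shape and the function names are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Phase order as listed in the text.
PHASES = ["Spec", "Plan", "Build", "Test", "Review", "Ship"]


@dataclass
class ReviewResult:
    # Hypothetical shape: must-fix items still open after the Review phase.
    open_must_fix: List[str] = field(default_factory=list)


def next_phase(current: str) -> Optional[str]:
    """Return the phase that follows `current`, or None after Ship."""
    i = PHASES.index(current)
    return PHASES[i + 1] if i + 1 < len(PHASES) else None


def can_ship(review: ReviewResult) -> bool:
    """Ship runs only when Review left no open must-fix items."""
    return len(review.open_must_fix) == 0
```

With this sketch, `can_ship(ReviewResult(["unbounded scan in stage 3"]))` is False, which mirrors the rule that Ship does not run while must-fix items remain open.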

The phases are not bureaucracy. They are the difference between developing in straight lines and developing in spirals.

CocoFlow

A CocoFlow is a pipeline definition — a JSON file that describes stages, their dependencies, their assigned personas, their checkpoints, and what constitutes completion.

Think of it like a directed graph of work. Stage B cannot start until Stage A's checkpoint files exist. Stage C can run in parallel with Stage D if neither depends on the other. Each stage knows which persona executes it, which model to use, and what files to produce.
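
The dependency rule in the paragraph above can be illustrated with a small scheduler sketch. The `depends_on` field name and the in-memory dictionary are assumptions for the example; the actual flow.json schema is not shown in this document.

```python
from typing import Dict, List, Set


def ready_stages(stages: Dict[str, dict], done: Set[str]) -> List[str]:
    """Return ids of stages that have not run yet and whose dependencies are all done."""
    return [
        stage_id
        for stage_id, spec in stages.items()
        if stage_id not in done and all(dep in done for dep in spec["depends_on"])
    ]


# Hypothetical pipeline: B, C and D depend on A; E depends on C and D.
stages = {
    "A": {"depends_on": []},
    "B": {"depends_on": ["A"]},
    "C": {"depends_on": ["A"]},
    "D": {"depends_on": ["A"]},
    "E": {"depends_on": ["C", "D"]},
}

print(ready_stages(stages, set()))   # ['A']
print(ready_stages(stages, {"A"}))   # ['B', 'C', 'D']
```

Stages returned together have no dependency on each other that is still unmet, so they are the candidates for parallel execution.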

CocoFlow is generated by CocoHarvest at the end of the Plan phase. The developer can read it, understand it, and edit it if needed. It is a transparent artifact — not a black box orchestration layer.

The flow file lives at .cocoplus/flow.json and is updated in real time as execution proceeds. $flow status reads this file and renders it as a human-readable report. $flow view goes further — it generates an interactive directed acyclic graph of the entire pipeline in your browser, with task cards, dependency edges, group bounding boxes, and live status badges. If a stage fails, the failure reason is recorded in the file. If you need to resume from a specific stage, $flow resume [stage-id] validates that all prior stages' checkpoints are intact before proceeding.
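
The resume check described above might look roughly like the following. This is a sketch under stated assumptions: "prior stages" is read as every transitive dependency of the target stage, "intact" is simplified to "the file exists", and the `depends_on` and `checkpoints` field names are hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Set


def upstream(stages: Dict[str, dict], stage_id: str) -> Set[str]:
    """Collect every stage that `stage_id` depends on, directly or transitively."""
    seen: Set[str] = set()
    stack = list(stages[stage_id]["depends_on"])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(stages[current]["depends_on"])
    return seen


def missing_checkpoints(stages: Dict[str, dict], stage_id: str, root: str) -> List[str]:
    """List checkpoint files of upstream stages that are absent under `root`."""
    missing = []
    for prior in sorted(upstream(stages, stage_id)):
        for rel_path in stages[prior]["checkpoints"]:
            if not (Path(root) / rel_path).is_file():
                missing.append(rel_path)
    return missing
```

A resume would proceed only when `missing_checkpoints(...)` returns an empty list; a real check would probably also verify file contents rather than existence alone.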

Every task in a CocoFlow pipeline is classified as HITL (Human In The Loop) or AFK (Away From Keyboard) by CocoLens during the CocoHarvest decomposition step. HITL tasks pause the pipeline for developer approval before their subagent spawns. AFK tasks execute autonomously. The developer reviews every classification during decomposition and can flip any task before spawning begins. The flow.json file records the classification as "hitl": true|false per stage, and $flow status shows the Type column so the boundary between supervised and autonomous work is always visible.

CocoFlow is not required for simple tasks. A single-developer, single-persona task goes straight to execution without creating a formal pipeline. The complexity of the tooling should match the complexity of the work.

The Persona System

The deepest design decision in CocoPlus is the one that looks the most obvious: use specialists, not generalists.

A generalist AI model will write you a Snowflake stored procedure. It will write a semantic model. It will write a notebook. It will review governance. It will do all of these things adequately. CocoPlus takes the position that "adequately" is not what you want from an AI assistant — you want depth, and depth requires specialization.

CocoPlus ships with eight specialists:

Data Engineer ($de) — schema design, SQL optimization, stored procedures, pipelines. The person who knows why that query is doing a full table scan and what to do about it.
Analytics Engineer ($ae) — semantic models, dbt-style transformations, metric definitions. The bridge between raw data and business questions.
Data Scientist ($ds) — notebooks, ML pipelines, feature engineering, Cortex ML functions. Comfortable with iteration and experimentation.
Data Analyst ($da) — query writing, exploration, Cortex Analyst integration. Fast, answers-first orientation.
BI Analyst ($bi) — dashboards, semantic layer refinement, presentation-ready outputs.
Data Product Manager ($dpm) — requirements, documentation, stakeholder communication. Always in plan mode — never executes autonomously.
Data Steward ($dst) — governance, data quality, access policy review. Always in plan mode.
Chief Data Officer ($cdo) — strategic architecture, system-level decisions, trade-off analysis. Uses Opus because the decisions it makes warrant the cost.

Each persona has a locked tool set. The Data Steward cannot run arbitrary SQL against production. The Data Scientist can use NotebookExecute; the Data Engineer cannot. These locks are not limitations — they are the persona's identity.
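
One way to picture a locked tool set is an allow-list consulted before any tool call. In the sketch below only `NotebookExecute` comes from the text; the other tool names and the table layout are hypothetical.

```python
from typing import Dict, FrozenSet

# Hypothetical allow-lists. Only NotebookExecute is named in the text.
PERSONA_TOOLS: Dict[str, FrozenSet[str]] = {
    "$ds": frozenset({"SqlRead", "NotebookExecute"}),
    "$de": frozenset({"SqlRead", "SqlWrite"}),
    "$dst": frozenset({"PolicyRead"}),
}


def is_allowed(persona: str, tool: str) -> bool:
    """A tool call is allowed only if the tool is on the persona's allow-list."""
    return tool in PERSONA_TOOLS.get(persona, frozenset())
```

Because the sets are frozen and unknown personas get an empty set, the default answer is "no", which matches the idea that the lock is part of the persona's definition rather than a setting to relax at runtime.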

You invoke them directly ($de Review this procedure for performance issues) or let CocoHarvest assign them automatically based on task type. Either way, you know exactly what expertise is being applied to the problem.

One agent in the system is never invoked directly: CocoScout. It runs before every Build stage and every persona invocation — quietly, in the background, on Haiku. Its job is to scan the available context sources (warm memory, CocoGrove patterns, CocoContext standards, Environment Inspector snapshots) and rank them by relevance to the current task, so the executing agent begins with the most useful framing, not just the most recent. CocoScout doesn't reason about the task — it filters the environment for the agent that will. The practical effect is that each specialist begins its work with a context window shaped to the work at hand rather than the full undifferentiated heap of everything CocoPlus has accumulated.
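
To make "rank by relevance" concrete, here is a deliberately naive sketch that scores each context source by word overlap with the task description. CocoScout is described as a Haiku-backed agent, so this keyword scoring is a stand-in technique, not its method; the source names and the scoring rule are assumptions.

```python
import re
from typing import Dict, List, Set


def tokens(text: str) -> Set[str]:
    """Lowercase word tokens of a text."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def rank_sources(task: str, sources: Dict[str, str], top_k: int = 3) -> List[str]:
    """Return up to `top_k` source names, most relevant first, dropping zero-overlap sources."""
    task_tokens = tokens(task)
    scored = [
        (len(task_tokens & tokens(body)) / (len(task_tokens) or 1), name)
        for name, body in sources.items()
    ]
    # Highest score first; ties broken by name for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:top_k] if score > 0]
```

The point of the sketch is the shape of the step: the ranking looks only at the context sources and the task description, and it hands a short ordered list to the agent that will do the actual reasoning.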

Memory

Three layers. Each layer serves a different time horizon.

The hot layer is AGENTS.md. 200 lines, always loaded, always current. This is what Coco sees at the start of every session. It knows the current phase, the active modes, the three most recent key decisions. It is a summary, not an archive.

The warm layer is three files: decisions, patterns, errors. Richer than AGENTS.md. Consulted when a skill needs more context than the hot layer provides. Updated by hooks throughout the session.

The cold layer is CocoGrove — the curated pattern library. Not everything that happens in a session is worth keeping forever. CocoGrove contains only what has been promoted: patterns that worked, anti-patterns that burned you, reusable templates that earned their place.

These layers are a design about forgetting. Not everything deserves to be remembered. The hot layer forgets by compressing. The warm layer retains selectively. The cold layer requires deliberate promotion. This is how human institutional memory works in high-functioning teams, and it is how CocoPlus models it.

The benefit of this structure compounds. By session thirty, your CocoGrove has thirty sessions of project-specific wisdom. The AI assistant is not starting fresh — it is operating in an environment that knows your schemas, your patterns, your anti-patterns, your decisions. It is contextually smarter than any single session could make it.

There is a fourth layer, orthogonal to the three above: CocoContext. Where the session layers capture what happened during your specific project, CocoContext stores your team's standing organizational standards — approved models, quality thresholds, PII handling policy, warehouse allocation rules, naming conventions, governance gates. These six files are set once and consulted across every session and every persona invocation. They are not memory of events; they are the ambient constraints that define how your team works. A new developer on the project, a new persona invoked mid-sprint — both begin with the same organizational ground truth, because it lives in files that travel with the project and are committed to git.

The Safety Gate

There are two kinds of mistakes: recoverable and unrecoverable.

Recoverable mistakes are fine. They are how you learn. Write the wrong SQL, catch the error, fix it. No problem.

Unrecoverable mistakes are a different category. Drop the wrong production table. Truncate the wrong schema. Delete without a WHERE clause on a table with three million rows. These mistakes are expensive. Sometimes very expensive.

CocoPlus treats these two categories differently. For recoverable mistakes, the system offers guidance — suggestions, warnings, quality checks. Advisory. For unrecoverable mistakes, the system offers a hard gate — an interceptor that fires before the tool call executes and blocks it entirely.

This distinction matters because language model behavior is probabilistic. A strongly-worded prompt that says "never drop production tables" will be followed most of the time. Most of the time is not good enough for irreversible operations. The hard gate cannot be prompted around. It fires at the infrastructure level, before any reasoning happens.

The Safety Gate has two modes you can configure (strict, normal) and one you should never use in production environments (off). The default is normal — warnings on, blocks for the most severe patterns. Strict is for environments where even warnings feel too close to the edge.
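
The three modes can be sketched as a function that classifies a SQL statement before it runs. This is an illustration under assumptions: the pattern lists are invented for the example (the document does not enumerate the real ones), strict mode is assumed to escalate warnings to blocks, and the regexes handle a single statement without comments.

```python
import re

FLAGS = re.IGNORECASE | re.DOTALL

# Assumed "most severe" patterns: irreversible operations named in the text.
SEVERE = [
    re.compile(r"^\s*DROP\s+(TABLE|SCHEMA|DATABASE)\b", FLAGS),
    re.compile(r"^\s*TRUNCATE\b", FLAGS),
    re.compile(r"^\s*DELETE\s+FROM\b(?!.*\bWHERE\b)", FLAGS),
]

# Assumed lower-severity pattern: an UPDATE with no WHERE clause.
RISKY = [
    re.compile(r"^\s*UPDATE\b(?!.*\bWHERE\b)", FLAGS),
]


def check(sql: str, mode: str = "normal") -> str:
    """Return 'allow', 'warn', or 'block' for one SQL statement."""
    if mode not in ("strict", "normal", "off"):
        raise ValueError(f"unknown mode: {mode}")
    if mode == "off":
        return "allow"
    if any(p.search(sql) for p in SEVERE):
        return "block"
    if any(p.search(sql) for p in RISKY):
        return "block" if mode == "strict" else "warn"
    return "allow"
```

The decision depends only on the statement text and the configured mode, never on model output, which is the property that makes the gate impossible to prompt around.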

CocoCupper and CocoGrove

After every session, something happens quietly in the background.

CocoCupper — a lightweight analyst agent — reads what just happened. The SQL that was executed. The files that were created. The decisions that were made. It identifies patterns: things that worked consistently, things that failed in consistent ways, approaches that were reinvented unnecessarily. It writes its findings to a file.

Those findings are raw material for CocoGrove. CocoGrove is the pattern library. It is not automatically populated by CocoCupper — that would create noise. Instead, developers promote findings they consider durable: a particular way of handling Snowflake pagination that keeps working, a schema naming convention that everyone agreed on, a query pattern that consistently avoids the performance pitfall.

CocoCupper runs on Haiku. It doesn't need to reason deeply — it needs to recognize and categorize. CocoGrove holds the results of that recognition, curated by a human who decided something was worth keeping.

The combination is institutional memory with a quality filter. The AI captures. The developer curates. The library compounds.

Token Economics

Every tool invocation, every loaded file, every active monitor, every character in AGENTS.md — it all costs context window space.

CocoPlus takes this seriously in a way most AI plugins don't. The AGENTS.md hard limit of 200 lines is not a convention — it is enforced. Skills are loaded on demand, not always active. Monitors run per-stage, not globally. The background analyst uses Haiku. The CDO uses Opus. These choices are made deliberately and documented explicitly.

Why does this matter? Because context explosion is a slow, invisible failure mode. A system that works perfectly at session five begins to degrade at session twenty as accumulated context starts crowding out useful reasoning. CocoPlus is designed to remain useful at session two hundred.

The principle it embodies: the right model for the right task, with the minimum context required to do that task well. Surgical, not comprehensive. The temptation is always to add more context "just in case." CocoPlus resists that temptation architecturally.

CocoPod

CocoPod is the container that holds everything together.

Before any CocoPlus feature can run, you initialize a CocoPod: $pod init. This creates the .cocoplus/ directory with the complete file structure that all features read from and write to. Memory files, lifecycle artifacts, flow definitions, monitor configurations, the pattern library — all of it lives here.

CocoPod is project-scoped. It lives next to your code. It goes into version control. It is the persistent state that transforms a stateless AI session into a stateful development environment.

When you come back after a week away, $pod resume reads the current state and tells you exactly where you left off. It is not summarizing from the AI's memory — it is reading from files that were written during your last session. The state is durable because files are durable.

CocoPod now integrates with CocoHealth — the context utilization monitor that watches the session's working memory and warns before reasoning degrades. When context utilization reaches the critical threshold, CocoHealth recommends running $pod checkpoint, which writes a structured recovery snapshot to lifecycle/checkpoint.md. Running $clear after a checkpoint, then $pod resume, is a first-class recovery path: not a workaround, but a designed workflow.

This is a choice about trust. CocoPlus doesn't trust the AI's session memory to persist important state. It trusts files. Files are readable, diffable, committable, inspectable. A file-based state store is a state store you can audit.

The Knowledge Graph

As a project accumulates Cortex AI functions, it outgrows the developer's ability to hold the whole thing in their head. Which functions share an evaluation dataset? Which capability depends on three others? Which function has no accuracy baseline? The answers exist in the codebase — scattered across SQL files, spec documents, prompt files, and evaluation configs.

CocoMap assembles these answers into a committed artifact. Five analysis agents work in parallel, each contributing a slice: function inventory, dependency edges, business capability mappings, evaluation infrastructure, and gap detection. Their results merge into coco-map.json — a knowledge graph with two views: a Structural view for technical dependency reasoning and a Domain view for business intent understanding.

The knowledge graph is not a diagram. It is a queryable data file that other CocoPlus features can read. CocoScout consults it when ranking context for build tasks. SecondEye can receive it as an artifact for impact analysis. The value is not in the graph itself — it is in what becomes visible when relationships are explicit: the hidden dependencies, the missing evaluations, the orphaned functions that nothing currently depends on.
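
Because the graph is a data file, questions like "which functions does nothing depend on?" reduce to short queries. The sketch below assumes a hypothetical shape for coco-map.json (a `nodes` list with `id` and `accuracy_baseline`, and an `edges` list where `from` depends on `to`); the real schema may differ.

```python
from typing import Dict, List


def orphaned_functions(graph: Dict) -> List[str]:
    """Functions that no other function depends on."""
    depended_on = {edge["to"] for edge in graph["edges"]}
    return sorted(n["id"] for n in graph["nodes"] if n["id"] not in depended_on)


def missing_baselines(graph: Dict) -> List[str]:
    """Functions that have no recorded accuracy baseline."""
    return sorted(n["id"] for n in graph["nodes"] if n.get("accuracy_baseline") is None)


# Hypothetical graph: classify_ticket depends on extract_entities.
graph = {
    "nodes": [
        {"id": "classify_ticket", "accuracy_baseline": 0.91},
        {"id": "extract_entities", "accuracy_baseline": None},
        {"id": "summarize_thread"},
    ],
    "edges": [{"from": "classify_ticket", "to": "extract_entities"}],
}

print(orphaned_functions(graph))  # ['classify_ticket', 'summarize_thread']
print(missing_baselines(graph))   # ['extract_entities', 'summarize_thread']
```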

Deferred Decisions

Development sessions produce two kinds of output: work product and insight. Work product is the SQL, the specs, the evaluation results. Insight is the realization mid-session that there's a better approach — but not right now, because right now you're building something else and the realization would derail the current work.

CocoSeed is the formalization of "not now, but not never." An idea stored with a trigger condition. The trigger is a statement about project state: "when CocoMap is committed," "when the evaluation baseline exceeds 90%," "when the Build phase completes." The idea is kept in .cocoplus/seeds/. The condition is evaluated at every session start. When it fires, the idea surfaces in the startup summary. The developer decides whether to promote it to the spec backlog or discard it.
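
The mechanism above, an idea paired with a trigger that is checked at session start, can be sketched as follows. Representing the trigger as a Python callable over a project-state dictionary is a simplification for the example; how seeds are stored in .cocoplus/seeds/ and the state keys used here are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Seed:
    idea: str
    # Trigger: a predicate over project state (hypothetical representation).
    trigger: Callable[[Dict], bool]


def fired_seeds(seeds: List[Seed], state: Dict) -> List[str]:
    """Return the ideas whose trigger condition holds for the current state."""
    return [seed.idea for seed in seeds if seed.trigger(state)]


seeds = [
    Seed("Add impact analysis to review",
         lambda st: st.get("cocomap_committed", False)),
    Seed("Tighten the accuracy threshold",
         lambda st: st.get("eval_baseline", 0.0) > 0.90),
]

state = {"cocomap_committed": True, "eval_baseline": 0.85}
print(fired_seeds(seeds, state))  # ['Add impact analysis to review']
```

Ideas that fire would surface in the startup summary; promoting or discarding them stays a developer decision and is not modeled here.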

This is a small feature with a precise purpose: reducing the cognitive load of remembering "we should eventually..." while eliminating the premature action that derails focused work. The trigger condition is the discipline. An idea without a trigger is just a note. A note is easily lost, easily forgotten, or — worse — acted on at the wrong time.

Working Backwards

There is an ancient design discipline in product development: write the press release before you write the specification. If you cannot describe the completed thing in a paragraph that someone would want to read, you do not yet understand what you are building.

CocoBloom encodes this discipline as an optional pre-spec phase. Four questions. Fifteen minutes. The output is a committed document that $spec reads as its first context — anchoring the specification to the original intent before the requirements dialogue begins. The bloom document stays in version control for the entire lifecycle. At review time and at ship time, it is still readable. The question "is this still the thing we said we were building?" has a written answer.

This is not bureaucracy. It is the difference between a specification that drifts gradually toward what is technically convenient and a specification that stays anchored to what was originally committed. The drift happens without the anchor. The anchor costs fifteen minutes.

Genuine Independence

When one model is asked to play multiple roles, it does not produce multiple independent perspectives. It produces one perspective expressed in multiple voices. The Skeptic it voices shares the same prior as the Optimist it voices. Structured disagreement, same underlying judgment.

CocoKlatch addresses this not by improving the simulation but by abandoning it. Separate subagents are spawned simultaneously. Each receives only the topic brief. They have no knowledge of each other's assignment, no knowledge of each other's analysis. They produce independent conclusions. When the synthesis agent reads their outputs, it is reading genuinely different reasoning, not performed diversity.
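
Structurally, the pattern is a fan-out in which every worker receives the same brief and nothing else, followed by a synthesis step that reads all results. The sketch below uses plain callables as stand-ins for subagents; the function names and the "confirmed" / "divergent" labels are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_klatch(topic_brief: str, agents: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Run every agent concurrently. Each one receives only the topic brief."""
    if not agents:
        raise ValueError("at least one agent is required")
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        futures = {role: pool.submit(fn, topic_brief) for role, fn in agents.items()}
        # Results are collected only after submission, so no agent sees another's output.
        return {role: future.result() for role, future in futures.items()}


def synthesize(outputs: Dict[str, str]) -> str:
    """Agreement from independent runs confirms; disagreement surfaces an issue."""
    return "confirmed" if len(set(outputs.values())) == 1 else "divergent"
```

In a real run the callables would be separate subagent invocations; the sketch only shows that independence comes from what each worker is given, not from how it is instructed to behave.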

The kaffeeklatsch metaphor is precise: an informal gathering where people speak freely rather than performing for a record. The value is not in the ritual — it is in the actual independence. Two agents who arrive at the same conclusion from different reasoning have confirmed something. Two agents who arrive at different conclusions have surfaced something. Neither result is possible when one agent plays both roles.

Distillation, Not Summarization

Summarization loses information. It compresses by discarding — keeping what seems important and dropping what doesn't. The problem is that "what seems important" is determined at compression time, before the reader's actual information need is known. A summary that was perfectly adequate for one question may omit exactly the fact needed for another.

CocoPull uses distillation instead. The distinction is meaningful: distillation reorganizes content to concentrate decision-relevant information without discarding it. Every accuracy threshold is preserved verbatim. Every schema name is preserved verbatim. Narrative explanation is compressed to its conclusion — "the analysis found that..." — but the conclusion is intact. The compression ratio reflects structural reorganization, not information loss.

The round-trip validation ($pull --validate) tests this precisely: does a reader of the distilled artifact answer the same probe questions as a reader of the original? A pull file that scores below 85% is flagged as low-reliability and will not be used as an agent input. The compression must earn its usage.
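
The scoring idea can be sketched as an agreement rate over probe questions. How $pull --validate actually generates probes and compares answers is not described here, so exact-match comparison and the function names below are assumptions; only the 85% threshold comes from the text.

```python
from typing import Dict

RELIABILITY_THRESHOLD = 0.85  # from the text: below 85% is low-reliability


def round_trip_score(original_answers: Dict[str, str],
                     distilled_answers: Dict[str, str]) -> float:
    """Fraction of probe questions answered identically from both artifacts."""
    if not original_answers:
        raise ValueError("at least one probe question is required")
    matches = sum(
        1 for question, answer in original_answers.items()
        if distilled_answers.get(question) == answer
    )
    return matches / len(original_answers)


def is_reliable(score: float) -> bool:
    """A pull file is usable as agent input only at or above the threshold."""
    return score >= RELIABILITY_THRESHOLD
```

With four probes and one mismatch the score is 0.75, which falls below the threshold and would flag the pull file as low-reliability.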

Engagement Observability

A system can be technically used without being meaningfully supervised. Approvals click through in rapid succession. Decompositions are accepted without modification. BLOCKING findings are acknowledged faster than they could plausibly have been read. The system is running correctly. The supervision is nominal.

CocoWatch makes this pattern visible. It does not interrupt the work — it is permanently non-blocking by architectural constraint. It observes engagement signals throughout the session and synthesizes them into a three-zone classification at the end: over-delegation, healthy collaboration, or under-utilization. The summary surfaces at $cocoplus ship and at FULL checkpoints.

The purpose is not to shame the developer or add friction. It is to make the question "am I actually supervising this, or am I just clicking through?" into a question with a data-grounded answer. The developer who looks at a Zone 1 (over-delegation) summary and decides to review two specific findings before shipping has added ten minutes and potentially prevented a consequential error. The developer who looks at the same Zone 1 summary and ships anyway has made an informed choice. Both outcomes are better than shipping without knowing.

Quality Gates as a Bracket

SecondEye and CocoSentinel are not variations of the same check. They are different checks at different moments, covering different failure modes.

SecondEye evaluates before execution: is this plan sound? Are the assumptions valid? Will the evaluation methodology actually detect the failures that matter? SecondEye's critics are independent — five subagents, each seeing only the plan, each reasoning separately before any output is shared. The synthesis reads their verdicts only after all five are complete.

CocoSentinel evaluates after execution: is this artifact sound? Security, correctness, performance, resilience, maintainability, compliance — seven dimensions, seven parallel sub-agents, same structural independence principle. Before the seven even run, Dimension G performs a binary check: did this artifact arrive with genuine evidence of its own verification, or only with claims? A submission that passes evidence criteria gets the seven-dimension review. A submission that fails it gets BLOCKED immediately. Reward-hacking — shrinking evaluation sets, skipping test blocks, empty assertions — is detected at the gate.

The bracket these two gates form means that both planning failures and implementation failures are caught before they advance through the lifecycle. You cannot slip a bad plan past SecondEye and ship a bad artifact past CocoSentinel. The only path forward is a plan that survives scrutiny and an artifact that deserves its approval.

Institutional Memory as a Quality Lever

Quality gates fire on individual artifacts in individual sessions. But the patterns that generate failures are not individual — they are institutional. A team that consistently writes vague acceptance criteria will fail the SecondEye spec gate repeatedly until someone names the pattern and changes the practice.

CocoWisdom is the mechanism for that naming. Every time a gate blocks — SecondEye at a spec, CocoSentinel at an artifact, the Devil's Advocate with an unrebutted concern — a structured record is appended to the project's wisdom store. The store accumulates across sessions. It is committed to git. It is read by SecondEye and CocoSentinel before they spawn their critic populations, surfacing the three (for SecondEye) or two (for CocoSentinel) most recent rejection records per dimension as prior context.
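
An append-only store with a "most recent N per dimension" read can be sketched with a JSON Lines file. The file format, field names, and function names are assumptions; the limits of three for SecondEye and two for CocoSentinel come from the text.

```python
import json
from pathlib import Path
from typing import Dict, List

# From the text: three prior records for SecondEye, two for CocoSentinel.
PRIOR_CONTEXT_LIMIT = {"SecondEye": 3, "CocoSentinel": 2}


def append_record(store: str, gate: str, dimension: str, reason: str) -> None:
    """Append one rejection record. Existing lines are never rewritten."""
    record = {"gate": gate, "dimension": dimension, "reason": reason}
    with Path(store).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")


def recent_rejections(store: str, dimension: str, reader: str) -> List[Dict]:
    """Most recent records for a dimension, capped by the reading gate's limit."""
    path = Path(store)
    if not path.exists():
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    records = [json.loads(line) for line in lines if line.strip()]
    matching = [r for r in records if r["dimension"] == dimension]
    return matching[-PRIOR_CONTEXT_LIMIT[reader]:]
```

Opening the file in append mode is what keeps the sketch honest about the invariant: the code has no path that edits or removes an earlier record.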

This does not constrain how critics evaluate the current artifact — independence is structural, not behavioral. What it does is prime pattern recognition. A critic that has seen the last three acceptance criteria rejections is more likely to notice the fourth occurrence of the same gap than a critic seeing the dimension for the first time. The project's quality gates become progressively better calibrated to the team's specific recurring failure patterns.

The append-only invariant matters philosophically as well as practically: the record of what failed cannot be erased. A team can observe its quality trajectory improving. It cannot rewrite the history of where it struggled.

Review as Knowledge Transfer

Code review in most systems is a gate — a binary approval decision that either passes or blocks. The reviewer finds problems. Problems are fixed. The artifact advances.

CocoReview encodes a different philosophical starting point: review is an act of knowledge transfer. A reviewer who finds only problems and surfaces no positive findings has given the author no information about what to continue doing. A reviewer who names what is well-constructed alongside what needs correction produces a richer signal and a better engineering culture over time.

This is encoded structurally, not aspirationally. The praise finding label is mandatory: not "try to find something nice to say" but a hard requirement that every review name at least one well-constructed pattern. If no praise finding is emitted, the review skill rejects the output and requests a revision that includes one.

The progressive disclosure architecture matters here too. Deep reference material — language-specific guides, security review guides, architecture review guides — loads only when the review scope requires it. A routine SQL review does not pay the context cost of loading an OWASP-aligned security review guide. A $cocoplus review --security does. Depth is available on demand without taxing every invocation.

Evidence-First Analytics

Generic operational advice — "your pipelines are slow," "you should write more tests," "your deployment frequency is low" — is structurally unfalsifiable. It cannot be wrong because it names no specific thing. A data engineering team receiving that advice has learned nothing actionable.

CocoOps is built on an alternative principle: every metric from a named, traceable source; every recommendation citing the specific pipeline, task, or PR that produced it. Pipeline Run Frequency is computed from Snowflake task execution history. Failure Recovery Time is measured from failure event to successful retry completion. The DORA tier thresholds are sourced from published DORA research benchmarks, not inferred. The narrative synthesis adds interpretation on top of computed values — it does not substitute for them.
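
As an illustration of a metric computed from a traceable source, Failure Recovery Time can be derived from an ordered event log. The sketch assumes a simple in-memory list of `(timestamp, task, status)` tuples and measures from the first failure in a streak to the next success; how CocoOps reads Snowflake task history is not shown here.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Event = Tuple[datetime, str, str]  # (timestamp, task name, "FAILED" or "SUCCEEDED")


def recovery_times(events: List[Event]) -> List[Tuple[str, timedelta]]:
    """Time from the first failure of each streak to the next success, per task."""
    first_failure: Dict[str, datetime] = {}
    results = []
    for timestamp, task, status in sorted(events, key=lambda e: e[0]):
        if status == "FAILED":
            # Keep the earliest failure of the current streak.
            first_failure.setdefault(task, timestamp)
        elif status == "SUCCEEDED" and task in first_failure:
            results.append((task, timestamp - first_failure.pop(task)))
    return results
```

Each result names the task that produced it, so a recommendation built on top of these values can cite the specific pipeline rather than offer general advice.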

The DORA framework was chosen specifically because it was built from empirical research across thousands of software delivery teams. Its four metrics — deployment frequency, lead time, mean time to recovery, change failure rate — have been shown to correlate with both organizational performance and developer wellbeing. CocoOps adapts these metrics for the data engineering context: pipeline run frequency rather than deployment frequency; data availability lead time rather than general lead time; quality gate failures after modifications rather than change failure rate.

Demo mode exists because evidence-first analytics cannot be evaluated hypothetically. The demo dataset contains realistic failure patterns and a 30-day metric improvement progression — enough to demonstrate what CocoOps surfaces before a team commits to production integration. Experienced evaluators can tell immediately whether the insights are genuine or theatrical.

The Four-Tier Boundary Framework

Autonomy without boundaries is liability. The practical question every AI-assisted system must answer is not whether to grant autonomy but where to draw the lines — which actions can proceed without human input, which require a prompt, which require explicit confirmation, and which may never be delegated to an agent under any condition.

CocoPlus answers this question with a named, cross-cutting classification that applies to every autonomous action in every CocoPod: ALWAYS / ASK FIRST / HUMAN REQUIRED / NEVER. ALWAYS actions are safe, reversible, and routine — read operations, generating reports, appending to logs. ASK FIRST actions have consequences worth naming before proceeding — schema creation, service deployment, sending external communications. HUMAN REQUIRED actions are irreversible or high-stakes enough that no amount of agent confidence is sufficient justification for proceeding without explicit developer confirmation — schema drops, production data modifications, security policy changes. NEVER actions are categorically off-limits regardless of instruction or context.

These tiers are not soft guidance encoded in AGENTS.md prose. They are operational configuration, expressed in cocoplus.toml under the [security] section. The Safety Gate's PreToolUse interceptor reads this configuration before every tool call. If a tool call's classification is HUMAN REQUIRED and no explicit developer approval token is present in the current session context, the interceptor blocks the call. The classification is structural: it cannot be reasoned around, argued against, or overridden by clever prompting.
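
The interceptor logic described above can be sketched as a lookup followed by a fixed decision per tier. The action names, the in-memory table, and the conservative default for unclassified actions are assumptions for the example; the sketch does not parse cocoplus.toml, whose schema is not given in this document.

```python
from enum import Enum
from typing import Dict


class Tier(Enum):
    ALWAYS = "ALWAYS"
    ASK_FIRST = "ASK FIRST"
    HUMAN_REQUIRED = "HUMAN REQUIRED"
    NEVER = "NEVER"


# Hypothetical classification table; the NEVER entry is invented for illustration.
ACTION_TIERS: Dict[str, Tier] = {
    "read_table": Tier.ALWAYS,
    "create_schema": Tier.ASK_FIRST,
    "drop_schema": Tier.HUMAN_REQUIRED,
    "disable_audit_log": Tier.NEVER,
}


def decide(action: str, approval_token_present: bool = False) -> str:
    """Return 'allow', 'prompt', or 'block' for a proposed action."""
    # Assumption: unclassified actions are treated as HUMAN REQUIRED.
    tier = ACTION_TIERS.get(action, Tier.HUMAN_REQUIRED)
    if tier is Tier.NEVER:
        return "block"
    if tier is Tier.HUMAN_REQUIRED:
        return "allow" if approval_token_present else "block"
    if tier is Tier.ASK_FIRST:
        return "prompt"
    return "allow"
```

Note that an approval token changes the outcome only for HUMAN REQUIRED actions. A NEVER action is blocked with or without one, which is what makes the tier categorical.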

The framework provides a shared vocabulary for a question that every developer working with an autonomous system eventually confronts: "I know I approved this pipeline — but did I understand what it was going to do?" The answer CocoPlus gives is: "You saw the classification. You saw what required your explicit sign-off. The rest was under the ALWAYS or ASK FIRST tier — which you configured, not the agent."

A system is only as good as what it remembers and what it refuses to forget.