CocoPlus has thirty-nine features. They are not thirty-nine independent capabilities you can pick from a menu. They are thirty-nine expressions of the same underlying design — a system where each piece reinforces the others.
Some features are always active. Some are toggled on by choice. Some run invisibly in the background. Some you will use every day. Some you will reach for only when the work demands it.
Here they are, in the order they matter.
Feature 1: CocoBrew — The Lifecycle Engine
Commands: $spec, $cocoplus plan, $cocoplus build, $cocoplus test, $cocoplus review, $cocoplus ship, $rewind, $fork
CocoBrew is the spine of CocoPlus. Every other feature exists in relation to it.
It gives your work a shape: six phases, each with a specific purpose, each producing a specific artifact, each gated on the completion of the one before it. Spec before Plan. Plan approved before Build. Build complete before Test. Test complete before Review. Review cleared before Ship.
This sounds like overhead until you've experienced what happens without it. Without phase structure, AI-assisted development produces a continuous stream of outputs with no coherent narrative — no record of why a decision was made, no evidence that anything was tested, no way to trace back to the original requirement. CocoBrew makes the narrative explicit.
What each phase actually does:
$spec opens a guided dialogue. It asks: what are you building? What does success look like? What is explicitly out of scope? Which Snowflake objects are involved? The output is a spec.md file — not a long document, but a structured record of the answers to these specific questions.
$cocoplus plan reads the spec and decomposes it. This is where CocoHarvest generates flow.json, the execution pipeline that defines which stages run in what order, which personas execute them, and what each stage must produce. The plan then enters Coco's native plan mode, where the developer reviews and approves it. Nothing in Build begins until this approval is recorded.
$cocoplus build executes the approved plan. Before proceeding, it cross-checks that spec.md and plan.md physically exist on disk — meta.json claiming a phase is complete is not sufficient. Simple plans go straight to execution. Complex multi-workstream plans spawn parallel persona subagents in isolated git worktrees, each working on their specific stage. Stage commits happen after checkpoint validation.
$cocoplus test runs validation against the spec's success criteria. Depending on what was built, it runs SQL tests via SnowflakeSqlExecute, notebook tests via NotebookExecute, or file-existence checks.
$cocoplus review runs the Code Quality Advisor against all SQL, reads CocoCupper findings from the session, checks spec compliance, and produces a categorized report: must-fix, should-fix, consider. It also runs a Decision Coverage Gate, a cross-check that extracts key decisions from plan.md (and discuss.md, if present) and scans the implementation artifacts to verify that each decision is honored. A decision stated in the plan but absent from or contradicted by the implementation appears as a Coverage Gap in review.md with must-fix severity. Work passes Review only when the code does what the spec asked for and is built the way the plan decided.
$cocoplus ship is gated: if must-fix items exist, it refuses to proceed. It also verifies that spec.md, plan.md, and review.md all physically exist on disk before proceeding. The highest existing version tag is determined programmatically, considering only well-formed v<major>.<minor>.<patch> tags. After the release commit, Ship automatically prunes stale CocoHarvest worktrees (agent/stage-*) to prevent disk bloat and naming collisions on future builds.
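The tag selection Ship performs can be sketched as follows. This is a minimal illustration, assuming tags arrive as a plain list of strings (for example, the output of `git tag --list`); the function name `highest_version_tag` is hypothetical. The key detail is numeric comparison, so that v1.10.0 ranks above v1.9.0, and the strict pattern, so that tags like v2.0 or v1.10.0-rc1 are ignored.

```python
import re

# Only tags of the exact form v<major>.<minor>.<patch> are considered;
# anything else (e.g. "v2.0", "v1.10.0-rc1", "release-4") is skipped.
SEMVER_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def highest_version_tag(tags):
    """Return the highest well-formed version tag, or None if there is none."""
    parsed = []
    for tag in tags:
        tag = tag.strip()
        match = SEMVER_TAG.match(tag)
        if match:
            # Compare as integer tuples, not strings, so 10 > 9.
            version = tuple(int(part) for part in match.groups())
            parsed.append((version, tag))
    if not parsed:
        return None
    return max(parsed)[1]
```

A lexical sort would pick v1.9.0 over v1.10.0; parsing into integer tuples avoids that class of bug.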
Rollback and exploration: $rewind [step-id] walks back to a specific stage commit. The system identifies the right commit in git history, shows you the target state, and resets after confirmation. After reset, stale CocoHarvest worktrees are automatically pruned. $fork [branch-name] creates an isolated exploration branch without touching the main thread of work.
Rollback-by-git-tag: Every CocoHarvest task completion creates a named git tag (cocoplus/harvest/[run-id]/task-N), and every Cortex function version reaching a named deployment stage is tagged (cocoplus/fn/[name]/vN). This extends $rewind with sub-phase granularity: $rewind --tag [tag-name] resolves any named tag and resets to it instantly — no git log archaeology required. A .cocoplus/scripts/rollback.sh helper is generated at $pod init for one-line rollback from the terminal. The result: recovery to any task completion boundary in any prior harvest run is a single command.
Scale-Adaptive Lifecycle Routing: Before opening the full requirements dialogue, $spec runs a deterministic classification on the developer's initial description — analyzing token count, presence of multi-step language, mentions of schema changes, evaluation requirements, and uncertainty markers — and routes to one of two tracks. Quick Flow (single-function, single-concern, clear requirements) runs directly Clarify → Implement → Verify with no plan.md or flow.json generated; it produces a single lifecycle/quick-build.md artifact. Full Flow (everything else) runs the complete Spec → Plan → Build → Test → Review → Ship lifecycle unchanged. The developer can override the classification with $spec --full (force Full Flow) or $spec --quick (force Quick Flow, with a warning if the classifier disagreed). This is distinct from CocoSpec's Quick Mode bypass: the bypass skips $cocoplus plan within Full Flow; scale-adaptive routing determines whether to enter Full Flow at all.
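A deterministic router of this kind might look like the sketch below. The signal categories mirror the ones listed above, but the keyword patterns, the whitespace-based token count, and the 60-token cut-off are all hypothetical stand-ins; the actual classification rules are not documented here. The override logic follows the text: --full and --quick always win, and forcing Quick Flow against the classifier returns a warning flag.

```python
import re
from typing import Optional, Tuple

# Hypothetical signal vocabularies, one per signal named in the text.
SIGNALS = {
    "multi_step": re.compile(r"\b(then|after that|finally|step \d+)\b", re.I),
    "schema_change": re.compile(r"\b(create table|alter|add column|migrat\w*|new schema)\b", re.I),
    "evaluation": re.compile(r"\b(evaluat\w*|benchmark\w*|accuracy)\b", re.I),
    "uncertainty": re.compile(r"\b(maybe|not sure|unclear|possibly)\b", re.I),
}
MAX_QUICK_TOKENS = 60  # hypothetical cut-off for a "short" request


def classify(description: str) -> str:
    """Deterministically route a request to "quick" or "full"."""
    too_long = len(description.split()) > MAX_QUICK_TOKENS  # whitespace tokens as a proxy
    flagged = any(p.search(description) for p in SIGNALS.values())
    return "full" if too_long or flagged else "quick"


def route(description: str, force: Optional[str] = None) -> Tuple[str, bool]:
    """Apply a --full/--quick override; the bool is True when a warning is due."""
    classified = classify(description)
    if force is None:
        return classified, False
    # Only forcing Quick Flow against the classifier's judgment warns.
    return force, force == "quick" and classified == "full"
```

Because the routing is pure pattern matching, the same description always lands on the same track, which keeps the decision reproducible.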
Spec re-entry guard: Re-entering $spec during the build, test, review, or shipped phase requires the --force flag. Before accepting the flag, the skill warns that any active CocoHarvest worktrees (agent/stage-*) will become stale and must be pruned manually. Re-entry during the plan phase only prompts for confirmation.
When to use: For any non-trivial Snowflake development task that spans more than a single session, involves multiple schemas or personas, or needs to reach production. If you're tempted to start coding before writing a spec, CocoBrew is exactly where you should start — the spec phase alone regularly surfaces assumptions that would cost hours to untangle mid-build. Use the full lifecycle for any task where "it worked in dev but broke in prod" would be a costly outcome.
Feature 2: CocoSpark — Brainstorm Mode
Commands: $cocoplus spark [topic], $cocoplus spark-off
CocoSpark activates a divergent thinking mode. It generates multiple approaches, challenges your assumptions, and raises questions you haven't thought to ask.
It is explicitly optional. Some problems benefit from structured ideation before committing to a direction. Some problems don't — the path is clear, and brainstorming just delays getting there. CocoSpark is for the first kind.
When active, CocoSpark never produces a plan, never produces a commitment. It produces a spark file — explicitly marked as exploration. The developer decides what, if anything, flows forward into the spec.
$cocoplus spark-off exits brainstorm mode. You are offered the option to capture anything into spec.md. This is a human decision, not an automatic transfer.
When to use: When the problem space is genuinely ambiguous and you're not sure which architectural direction to take. If the path is already clear, skip it — CocoSpark adds value at the edges of certainty, not at the center. It's most useful immediately before writing a spec: run CocoSpark first, discover the angles you hadn't considered, then write a spec that reflects the real complexity. Not every task needs it. The ones that do usually feel uncomfortably open-ended before you start.
Feature 3: CocoHarvest — Task Decomposition & Agent Spawner
Commands: Automatic at Plan phase; $<persona> for direct invocation
CocoHarvest is the intelligence behind parallelism. It reads an approved plan and answers a hard question: which parts of this work can run simultaneously, and which must wait for each other?
The decomposition follows a consistent mapping: schema design goes to the Data Engineer, semantic modeling to the Analytics Engineer, notebook work to the Data Scientist, and so on. Where work types mix, CocoHarvest sequences — the Data Engineer builds the schema, then the Analytics Engineer builds the semantic model on top of it.
Parallel stages run in isolated git worktrees. This isolation is not a convenience — it is a correctness guarantee. One agent's incomplete work cannot contaminate another agent's context.
The dependency model is encoded in flow.json. A stage marked as dependent on another does not start until the prerequisite's checkpoint files are validated. If you try to resume a paused pipeline after deleting a checkpoint file, the system refuses and tells you exactly which checkpoint is missing and why.
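The dependency check can be sketched as below, assuming each stage in flow.json carries an `id`, an optional `depends_on` list, and a `checkpoints` list of file paths. Those field names are hypothetical, and "validated" is reduced here to a plain existence check. Returning the owning prerequisite alongside each missing file is what lets the refusal message say exactly which checkpoint is missing.

```python
from pathlib import Path


def missing_checkpoints(stage_id, stages, root, exists=Path.is_file):
    """Checkpoint files a stage's prerequisites should have produced but that are absent."""
    by_id = {stage["id"]: stage for stage in stages}
    missing = []
    for dep in by_id[stage_id].get("depends_on", []):
        for rel_path in by_id[dep].get("checkpoints", []):
            if not exists(Path(root) / rel_path):
                # Record which prerequisite owns the gap, for the error message.
                missing.append((dep, rel_path))
    return missing


def can_start(stage_id, stages, root, exists=Path.is_file):
    """A dependent stage starts only when every prerequisite checkpoint is present."""
    return not missing_checkpoints(stage_id, stages, root, exists)
```

The `exists` parameter is injectable so the check can be exercised without touching the filesystem.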
For simple plans — single workstream, single persona — CocoHarvest delegates directly without creating a formal pipeline. The overhead matches the complexity.
Enhancement G — Adaptive Stage Parallelism: CocoHarvest's execution engine tracks pipeline health and adjusts concurrency dynamically. It starts in Normal mode — all dependency-resolved stages run simultaneously. If a stage fails, it drops to Caution mode — maximum two concurrent stages, with at least one completing successfully before the next pair spawns. If more than half of completed stages have failed, it enters Single-track mode — fully sequential, one stage at a time until the developer intervenes. Mode transitions are logged in flow.json and visible in $flow status. Mode never auto-upgrades; the developer resets it with $flow run --concurrency normal after fixing a root cause. This prevents the "doomed execution" pattern — a pipeline that keeps spending credits on downstream stages whose inputs were already invalidated by an upstream failure.
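The mode transitions can be modeled as a small state machine. This is a sketch under stated assumptions: the class and method names are hypothetical, the transition list stands in for the log written to flow.json, and the "one success before the next pair spawns" pacing rule in Caution mode is omitted. It does preserve the two properties the text emphasizes: modes only get stricter on their own, and only an explicit developer reset returns to Normal. (If the very first stage fails, more than half of completed stages have failed, so it goes straight to Single-track.)

```python
from dataclasses import dataclass, field

NORMAL, CAUTION, SINGLE_TRACK = "normal", "caution", "single-track"
SEVERITY = {NORMAL: 0, CAUTION: 1, SINGLE_TRACK: 2}


@dataclass
class ConcurrencyGovernor:
    mode: str = NORMAL
    completed: int = 0
    failed: int = 0
    transitions: list = field(default_factory=list)  # stands in for the flow.json log

    def record(self, succeeded: bool) -> None:
        """Update counts after a stage finishes; the mode can only get stricter here."""
        self.completed += 1
        self.failed += 0 if succeeded else 1
        if self.failed * 2 > self.completed:  # more than half have failed
            target = SINGLE_TRACK
        elif self.failed > 0:
            target = CAUTION
        else:
            target = NORMAL
        if SEVERITY[target] > SEVERITY[self.mode]:  # never auto-upgrade
            self.transitions.append((self.mode, target))
            self.mode = target

    def slots(self, ready: int) -> int:
        """How many dependency-resolved stages may be spawned right now."""
        limit = {NORMAL: ready, CAUTION: 2, SINGLE_TRACK: 1}[self.mode]
        return min(limit, ready)

    def reset(self) -> None:
        """Developer-initiated reset, as with $flow run --concurrency normal."""
        self.transitions.append((self.mode, NORMAL))
        self.mode, self.completed, self.failed = NORMAL, 0, 0
```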
Stall Detection: Beyond failure signals, CocoHarvest detects a subtler problem: an agent that is running, not erroring, but producing no meaningful work. It tracks output token count per model step for each active subagent. If the rate falls below a configurable threshold (default: 150 tokens per step) for more than a minimum step count (default: 5 steps), the agent is flagged as stalled. The orchestrator issues a recovery re-prompt first; escalation to a failure signal only follows if the agent remains unproductive. This prevents a reasoning loop from burning runtime indefinitely while appearing healthy to the pipeline.
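A per-agent stall detector following the described defaults might look like this. It is a minimal sketch: the `StallDetector` name and its "ok"/"reprompt"/"fail" return values are hypothetical. An agent counts as stalled once more than 5 consecutive steps produce fewer than 150 output tokens; the first stall yields a recovery re-prompt, and only a second stall without productive output in between escalates to a failure.

```python
class StallDetector:
    def __init__(self, min_tokens_per_step=150, min_steps=5):
        self.min_tokens = min_tokens_per_step
        self.min_steps = min_steps
        self.low_streak = 0      # consecutive low-output steps
        self.reprompted = False  # whether a recovery re-prompt was already issued

    def observe(self, output_tokens: int) -> str:
        """Record one model step and return "ok", "reprompt", or "fail"."""
        if output_tokens >= self.min_tokens:
            # Productive again: clear both the streak and the escalation state.
            self.low_streak = 0
            self.reprompted = False
            return "ok"
        self.low_streak += 1
        if self.low_streak <= self.min_steps:
            return "ok"
        self.low_streak = 0  # start a fresh observation window after acting
        if not self.reprompted:
            self.reprompted = True
            return "reprompt"
        return "fail"
```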
Shell Identity Injection: Every subagent CocoHarvest spawns receives a set of environment variables: COCOPLUS_PERSONA, COCOPLUS_FUNCTION, COCOPLUS_EVAL_ID, COCOPLUS_HARVEST_ID. Any shell command or Snowflake SQL call the agent makes inherits these automatically, creating a natural audit trail without instrumenting each tool call individually. CocoCupper's post-analysis and CocoDream's distillation can trace any SQL execution back to its originating agent and session.
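Environment-variable inheritance is what makes this work without per-call instrumentation: set the variables once on the subagent's process, and every child process sees them. The sketch below shows the mechanism with Python's `subprocess`; the helper names and example values are hypothetical, and how a Snowflake client surfaces these variables in query history is outside what this sketch demonstrates.

```python
import os
import subprocess


def identity_env(persona, function, eval_id, harvest_id, base=None):
    """Copy the parent environment and add the four CocoPlus identity variables."""
    env = dict(os.environ if base is None else base)
    env.update({
        "COCOPLUS_PERSONA": persona,
        "COCOPLUS_FUNCTION": function,
        "COCOPLUS_EVAL_ID": eval_id,
        "COCOPLUS_HARVEST_ID": harvest_id,
    })
    return env


def run_as_agent(argv, **identity):
    """Run a command whose child processes all inherit the agent's identity."""
    return subprocess.run(
        argv, env=identity_env(**identity), capture_output=True, text=True, check=False
    )
```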
Consecutive Failure Escalation: If a subagent produces three consecutive hard failures (actual error responses, not productivity stalls), CocoHarvest halts execution and escalates to the developer rather than retrying indefinitely. The counter resets on any successful completion; only an unbroken streak of failures reaching the threshold triggers escalation. At escalation, CocoHarvest surfaces the current failure, the sequence of prior attempts, the last successful state, and recommended recovery options. The threshold is configurable per stage via "maxConsecutiveFailures": N in flow.json (default: 3).
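The streak logic is simple enough to sketch directly. The `FailureEscalator` class and its report keys are hypothetical; the limit mirrors the `maxConsecutiveFailures` default of 3. Recommended recovery options are left out because they depend on context this sketch does not model.

```python
class FailureEscalator:
    def __init__(self, max_consecutive_failures=3):
        self.limit = max_consecutive_failures
        self.streak = []          # errors in the current unbroken run of failures
        self.last_success = None  # state recorded at the last successful completion

    def record_success(self, state):
        """Any success resets the streak."""
        self.streak.clear()
        self.last_success = state

    def record_failure(self, error):
        """Return an escalation report once the streak reaches the limit, else None."""
        self.streak.append(error)
        if len(self.streak) < self.limit:
            return None
        return {
            "current_failure": error,
            "prior_attempts": self.streak[:-1],
            "last_successful_state": self.last_success,
        }
```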
Dual-File State for Reliable Resumption: For long pipeline runs that span multiple sessions or context resets, CocoHarvest maintains two complementary state files: an append-only progress log (harvest/[run-id]-progress.txt) that records every start, completion, failure, checkpoint, and recovery event — never truncated, never corrupted — and a structured task state file (harvest/[run-id]-tasks.json) written atomically (write to .tmp, rename into place). A context reset is fully recoverable: the task state file gives current position; the progress log provides the complete audit trail.
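The write-to-.tmp-then-rename pattern can be sketched with the standard library. File names follow the text; the helper names and the log-line format are hypothetical. `os.replace` swaps the new file into place in a single step when both paths are on the same filesystem (atomic on POSIX), so a crash mid-write leaves either the old state file or the new one, never a half-written mix. The progress log is only ever opened in append mode.

```python
import json
import os
import time
from pathlib import Path


def log_event(harvest_dir, run_id, event, detail=""):
    """Append one line to the progress log; existing lines are never rewritten."""
    path = Path(harvest_dir) / f"{run_id}-progress.txt"
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(f"{time.strftime('%Y-%m-%dT%H:%M:%S')} {event} {detail}\n")


def save_task_state(harvest_dir, run_id, state):
    """Write the task state to a .tmp file, flush it to disk, then rename it into place."""
    final = Path(harvest_dir) / f"{run_id}-tasks.json"
    tmp = final.with_name(final.name + ".tmp")
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(state, fh, indent=2)
        fh.flush()
        os.fsync(fh.fileno())  # make sure the bytes are on disk before the rename
    os.replace(tmp, final)


def load_task_state(harvest_dir, run_id):
    with open(Path(harvest_dir) / f"{run_id}-tasks.json", encoding="utf-8") as fh:
        return json.load(fh)
```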
Score Trajectory Monitoring: At every stage completion, the SubagentStop hook compares the current stage's aggregate quality signal against the prior stage. A declining trajectory — a later stage producing a materially worse quality signal than the one before it — means the pipeline is regressing rather than converging. When decline exceeds the configured threshold (default: a 5% drop, configurable via "scoreDeclineAlert" in flow.json), the pipeline pauses before spawning the next stage, surfaces the trajectory data, and requires explicit developer acknowledgment to continue. This fires on a different condition than stall detection (low output volume) and consecutive failure escalation (error streaks) — it fires when output is being produced but its quality is moving in the wrong direction.
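A minimal sketch of the trajectory check, assuming the quality signal is a positive number where higher is better and that the 5% threshold is a relative drop against the prior stage (the text does not say whether it is relative or absolute). The function name is hypothetical.

```python
def trajectory_check(previous, current, decline_alert=0.05):
    """Return (should_pause, relative_drop) for two consecutive stage quality signals."""
    if previous <= 0:
        return False, 0.0  # no meaningful baseline to compare against
    drop = (previous - current) / previous  # negative when quality improved
    return drop > decline_alert, drop
```

For example, a fall from 0.80 to 0.75 is a 6.25% relative drop, which exceeds the default and pauses the pipeline; a fall to 0.78 (2.5%) does not.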
When to use: CocoHarvest is automatically invoked at the Plan phase whenever the task has multiple distinct workstream types — you don't choose it, it chooses itself. Use direct persona invocation ($de, $ae, etc.) when you want to consult a specific specialist without running the full lifecycle — a quick performance review, a targeted schema question, or a governance check mid-session. The --continue flag is particularly useful here: $de --continue Fix the issue you identified hands the specialist the current context without restarting from scratch.
Feature 4: Personas — Eight Specialist Agents
Commands: $de, $ae, $ds, $da, $bi, $dpm, $dst, $cdo; $personas to list
The persona system is where CocoPlus takes its sharpest position: specialists beat generalists, always, in structured development work.
Eight agents, each with a domain, a model, a locked tool set, and an invocation mode:
Data Engineer ($de) — Schema design, stored procedures, performance optimization, pipeline development. Uses Sonnet. Can run SQL and bash. This is the agent you want when something is slow, broken, or needs to be built correctly the first time.
Analytics Engineer ($ae) — Semantic models, transformation layers, metric definitions, Cortex Analyst integration. Uses Sonnet. The bridge between raw Snowflake tables and business-readable data products.
Data Scientist ($ds) — Notebooks, ML pipelines, Cortex ML functions, feature engineering. Uses Sonnet. Comfortable with experimentation and iteration.
Data Analyst ($da) — Query writing, data exploration, ad-hoc analysis. Defaults to Haiku; escalates to Sonnet for complex work. Fast, answers-oriented.
BI Analyst ($bi) — Dashboard layer, semantic model refinement, presentation outputs. Defaults to Haiku; escalates to Sonnet when needed.
Data Product Manager ($dpm) — Requirements documentation, stakeholder communication, scope definition. Uses Sonnet. Always in plan mode — advises, never executes.
Data Steward ($dst) — Governance review, data quality, access policy, compliance checks. Uses Sonnet. Always in plan mode. Does not run arbitrary SQL autonomously.
Chief Data Officer ($cdo) — Strategic architecture, cross-system trade-off analysis, data estate decisions. Uses Opus. Always in plan mode. The most expensive persona is reserved for decisions that warrant the cost.
You invoke a persona directly with the $ shorthand. You can pass --model to override the model for a specific call. You can pass --continue to give the persona the current session context for continuation work. The tool set and invocation mode cannot be overridden — these define what the persona is.
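The override rule can be expressed as data plus one guard, sketched below under the assumption that personas are modeled as frozen records. The `Persona` type, the `invoke` helper, and the tool labels (`"sql"`, `"bash"`) are hypothetical; the models and plan-mode flags follow the list above. The $cdo tool set is not specified in the text, so it is left empty purely for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Persona:
    command: str
    model: str
    tools: frozenset
    plan_only: bool


PERSONAS = {
    "de": Persona("$de", "sonnet", frozenset({"sql", "bash"}), plan_only=False),
    # Tool set not given in the text; empty here only as a placeholder.
    "cdo": Persona("$cdo", "opus", frozenset(), plan_only=True),
}


def invoke(name: str, model: Optional[str] = None, **overrides) -> Persona:
    """Allow a per-call model override; reject attempts to change tools or mode."""
    if overrides:
        raise ValueError(f"cannot override {sorted(overrides)}: tool set and mode are locked")
    persona = PERSONAS[name]
    # replace() returns a new frozen record, so the registry entry is untouched.
    return replace(persona, model=model) if model else persona
```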
When to use: Whenever the work has a clear domain owner. Don't default to the general Coco session for tasks that belong to a specialist — the specialist has a locked tool set and domain instructions that produce qualitatively better output. Use $cdo when reasoning about the full data estate, not just a single pipeline. Use $dst before any change that touches access policies, sensitive data, or column-level security. Use $de when SQL performance or schema correctness is the primary concern — this is not a task for a generalist.
Feature 5: CocoPod — Project Bundle
Commands: $pod init, $pod status, $pod resume, $pod kb, $pod checkpoint
CocoPod is initialization. Before any CocoPlus feature runs, the project must have a CocoPod.
$pod init creates .cocoplus/ with the complete directory structure, writes the initial AGENTS.md, sets default modes (safety on normal, memory on), and creates the initial git commit. This is the moment a directory becomes a CocoPlus project.
$pod status is the dashboard. It reads the current state and reports: current phase, active modes, pipeline status if a CocoFlow is running, last CocoMeter session summary, and the three most recent CocoCupper findings. It also runs a self-heal check: it compares the flag files in .cocoplus/modes/ against the ## Active Modes section in AGENTS.md. If they have diverged — a flag file exists that AGENTS.md doesn't reflect, or vice versa — a warning is shown and the developer is offered a one-step sync. This prevents silent drift between runtime state and the ambient context every session loads. It is a health check and a status report simultaneously.
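The self-heal comparison reduces to a two-way set difference. This sketch assumes each mode is a flag file in .cocoplus/modes/ named after the mode, and that the ## Active Modes section lists one mode per bullet line; both formats, and mode names such as `safety-normal`, are hypothetical.

```python
import re
from pathlib import Path


def flag_names(modes_dir):
    """Mode names implied by flag files, e.g. .cocoplus/modes/memory -> "memory"."""
    path = Path(modes_dir)
    return {p.stem for p in path.iterdir() if p.is_file()} if path.is_dir() else set()


def listed_modes(agents_md):
    """Bullet entries under the '## Active Modes' heading, up to the next '## ' heading."""
    match = re.search(r"^## Active Modes[ \t]*\n(.*?)(?=^## |\Z)", agents_md, re.M | re.S)
    if not match:
        return set()
    return {line.strip().lstrip("-* ").strip() for line in match.group(1).splitlines() if line.strip()}


def diff_modes(flags, agents_md):
    """Report drift in both directions; empty sets mean runtime and context agree."""
    listed = listed_modes(agents_md)
    return {"flag_only": flags - listed, "agents_md_only": listed - flags}
```

A non-empty result in either direction is what would trigger the warning and the one-step sync offer.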
$pod resume is for returning developers. It reads the project state and produces a focused brief: where you were when you left, what was completed, what is pending, and what the immediate next action is. It doesn't summarize from AI memory; it reads from files, so the brief is deterministic.
$pod kb displays lifecycle/kb.md, the project-specific, session-spanning knowledge base. This is distinct from CocoGrove (team-level generalized patterns) and CocoDream (prompt optimization lessons): the KB captures patterns, decisions, and gotchas specific to this project that should persist across sessions and inform every future build. CocoCupper populates it automatically after each completed stage, and CocoScout loads it for every Build task. Because CocoCupper keeps it small and curated, it stays available at minimal token cost.
$pod checkpoint writes a structured recovery snapshot to lifecycle/checkpoint.md: current phase, in-progress stage, last five decisions, pending items, and active CocoHarvest run status. The command exists for one purpose: before running $clear to reset an overloaded context window, checkpoint first so that $pod resume has precise recovery context. A context reset with a checkpoint is a restart. A context reset without one is a rebuild from scratch.
Constitutional Document: $pod init also generates lifecycle/cocoplus-context.md, the project's operating constitution. Unlike lifecycle artifacts that record decisions made, or kb.md that records patterns discovered, cocoplus-context.md records the stable parameters of this project's environment: Snowflake stack, architectural constraints, naming conventions, cost thresholds, and security requirements. Every agent reads this document first on invocation, before any task-specific context. CocoScout always includes it regardless of relevance scoring. CocoSpec validates that the spec does not require violating any constraint in it. $pod init --refresh-context regenerates it through a guided dialogue when the project's Snowflake configuration changes; the prior version is committed to git before replacement.
Configuration as a single source of truth: All operator-configurable plugin settings (security rules, cost thresholds, warehouse access, demo mode flags, sprint start dates) live in a single cocoplus.toml file at the project root. This replaces the earlier scattered pattern in which settings lived in safety-config.json and in individual agent frontmatter files. $cocoplus sync reads cocoplus.toml and propagates its values into all downstream artifacts: agent frontmatter, hook rule tables, and environment settings. The sync commits all generated files as a single change, making configuration intent visible in git history as a first-class act. If a generated file is edited manually and $cocoplus sync is run, the manual edit is overwritten and a warning surfaces; cocoplus.toml is the source of truth, not its outputs. Projects migrating from a prior plugin version run $cocoplus migrate-config to convert an existing safety-config.json into the new format automatically.
When to use: Every time you start a new Snowflake data project with CocoPlus — run $pod init before anything else. Run $pod status at the start of any session to orient yourself before diving into work. Run $pod resume when returning to a project after more than a day away. Run $pod checkpoint any time CocoHealth warns about high context utilization — checkpoint, clear, resume. Run $pod kb when you want to see what the project has learned about itself. Run $cocoplus sync any time you change cocoplus.toml to ensure all generated artifacts reflect the current configuration intent.
Feature 6: Project Execution Engine
Commands: $flow run, $flow status, $flow pause, $flow resume
The Project Execution Engine reads and executes CocoFlow pipelines. Where CocoHarvest generates the plan, the Execution Engine runs it.
$flow run reads flow.json, validates that setup prerequisites are met, spawns stage subagents in order, and monitors completion via the SubagentStop hook. Stage status is updated in real time — $flow status at any point during execution reflects the actual current state.
$flow pause halts execution after the current stage completes. Running stages are never killed mid-execution. The pipeline enters a paused state. $flow resume restarts from the next pending stage after validating that all prior checkpoints are intact.
Runtime model overrides are supported in two forms: an override for a specific stage, and --model on $flow run, which applies a runtime default across all stages. Neither form of override modifies the flow.json file on disk.
Enhancement E — Context Isolation per Stage: Stages can now declare "isolated": true in their flow.json definition. An isolated stage receives only checkpoint-validated deliverables from prerequisite stages — no accumulated session state, no intermediate model outputs, no side-effects from a partially-failed predecessor. What is always included: the project's AGENTS.md, the stage's own prompt file, and CocoScout-selected context. This is designed for deployment stages and post-evaluation decision stages, where contaminated context from a partial failure would cause incorrect behavior. Most build stages don't need it — it is a deliberate developer choice for specific high-stakes transitions, not a default.
Adaptive Checkpoint Typing: Not every stage boundary deserves the same weight. Adaptive checkpoint typing classifies each stage boundary by its decision stakes and delivers a status report proportionate to those stakes. Three types exist in flow.json's "checkpoint_type" field: MANDATORY — cannot be skipped, requires substantive developer input, used for EHRB confirmations, BLOCKING SecondEye findings, and deployment decisions; FULL — complete deliverables list plus metrics plus decision dashboard, used for stage completions and first checkpoints in a session; SLIM — one-line status plus a continue/pause prompt, used for routine progress on non-critical stages. An Awareness Guard activates after four consecutive SLIM responses from the developer without any modification or comment: the next checkpoint is automatically promoted to FULL regardless of its configured type. This is a structural guard against implicit autopilot — the developer is effectively notified: "Four stages completed without detailed engagement. Here is a comprehensive status update before we continue." The guard counter resets on any substantive response.
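The Awareness Guard amounts to a counter in front of the configured checkpoint type. In this sketch (class and method names hypothetical), four consecutive SLIM checkpoints answered without substantive input promote the next SLIM checkpoint to FULL; MANDATORY and FULL are already at least as detailed and pass through unchanged. Resetting the counter after any non-SLIM checkpoint is an assumption beyond the text, which only says it resets on a substantive response.

```python
MANDATORY, FULL, SLIM = "MANDATORY", "FULL", "SLIM"


class CheckpointPresenter:
    def __init__(self, guard_after=4):
        self.guard_after = guard_after
        self.slim_streak = 0  # consecutive SLIM checkpoints accepted without comment

    def effective_type(self, configured: str) -> str:
        """Promote SLIM to FULL once the guard threshold is reached."""
        if configured == SLIM and self.slim_streak >= self.guard_after:
            return FULL
        return configured

    def record_response(self, presented: str, substantive: bool) -> None:
        """Count silent SLIM acceptances; anything else resets the guard."""
        if presented == SLIM and not substantive:
            self.slim_streak += 1
        else:
            self.slim_streak = 0
```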
Dual synthesis path: When a CocoFlow stage requires a synthesis step and LLM access is temporarily unavailable — network constraint, rate limit, provider configuration — the pipeline does not halt. A deterministic rule-based fallback synthesizes the stage output using pattern-matching rules defined in cocoplus.toml under [review.rules]: SQL schema reviews, anti-pattern checks, and structured quality assessments can all run without LLM involvement. The fallback output is clearly marked as rule-generated rather than AI-synthesized, so the developer knows the quality ceiling. The pipeline finishes with reduced output richness rather than not finishing at all. This is a graceful degradation, not a substitute for full synthesis — but a partially-executed state requiring manual recovery is a much worse outcome than a fully-executed state with one rule-based stage.
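The degradation path can be sketched as a try/except around the LLM call. The two regex rules below are hypothetical stand-ins for whatever is configured under [review.rules], and `LLMUnavailable` is an illustrative exception type. Note the explicit `source` field: the fallback output labels itself as rule-generated, matching the requirement that the developer can see the quality ceiling.

```python
import re

# Hypothetical stand-ins for rules configured under [review.rules].
REVIEW_RULES = [
    ("select-star", re.compile(r"\bselect\s+\*", re.I), "Avoid SELECT * in production queries"),
    ("delete-no-where", re.compile(r"\bdelete\s+from\s+[\w.\"]+\s*;", re.I), "DELETE without WHERE"),
]


class LLMUnavailable(Exception):
    """Raised by the synthesis client on network errors, rate limits, or missing config."""


def rule_based_review(sql):
    findings = [f"[{rule_id}] {message}" for rule_id, pattern, message in REVIEW_RULES if pattern.search(sql)]
    return {"source": "rule-generated", "findings": findings}


def synthesize(sql, llm_review):
    """Prefer LLM synthesis; fall back to deterministic rules instead of halting."""
    try:
        return {"source": "ai-synthesized", "findings": llm_review(sql)}
    except LLMUnavailable:
        return rule_based_review(sql)
```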
When to use: When a plan has been generated by CocoHarvest and needs to be executed as a formal pipeline. For simple, single-persona tasks, the Build phase handles execution directly — you won't need $flow run explicitly. Use it when you want precise control: running only from a specific stage after a failure, pausing mid-pipeline to inspect outputs before proceeding, or applying a runtime model override to a specific stage without changing the stored plan. $flow status is useful any time you want a clear picture of where a running pipeline stands.
Feature 7: Memory Engine
Commands: $memory on, $memory off
The Memory Engine is the answer to session amnesia.
It maintains three layers of project knowledge: AGENTS.md (hot, always loaded), the memory files in .cocoplus/memory/ (warm, consulted by skills), and CocoGrove (cold, curated pattern library). Each layer serves a different time horizon and a different fidelity of detail.
When memory is on, the PostToolUse hook captures significant events throughout the session: schema changes, decisions stated explicitly, errors and their resolutions. These captures are brief — they record the fact and the reason, not the full content. The full content is in the files themselves.
The 200-line limit on AGENTS.md is enforced by the memory update logic. When adding a new entry would exceed the limit, older entries are compressed and moved to the warm layer. Hot memory stays current. Warm memory retains history.
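The hot-to-warm spill can be sketched by treating AGENTS.md as a list of entries, each a list of lines. This model, the helper names, and the "keep the first line" compression are all hypothetical simplifications; fixed sections such as Active Modes are not modeled. Whole entries are evicted oldest-first until the hot layer fits within the limit again.

```python
def compress(entry):
    """Placeholder compression: keep the entry's first line, truncated."""
    return entry[0][:120]


def add_entry(hot, warm, entry, limit=200):
    """Append an entry to hot memory, spilling the oldest entries to warm memory."""
    hot = hot + [entry]  # new lists, so the caller's data is not mutated
    warm = list(warm)
    while sum(len(e) for e in hot) > limit and len(hot) > 1:
        warm.append(compress(hot.pop(0)))
    return hot, warm
```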
You can turn memory off if you want a clean session with no persistence. The system continues to function — it just forgets when the session ends.
When to use: Leave it on by default — this is the right posture for any real project work. The overhead is minimal; the compound benefit across sessions is significant. Turn it off only for explicitly throwaway sessions: prototype experiments, hypothetical walkthroughs, or one-off queries against an environment you don't intend to revisit. If you're unsure, leave it on. A session with memory that you didn't need leaves behind a few captured decisions. A session without memory that you did need leaves behind nothing.
Feature 8: Environment Inspector
Commands: $cocoplus inspect, $inspector on, $inspector off
Before you build, look at what's there.
The Environment Inspector scans the connected Snowflake environment and produces a structured snapshot: schemas, tables, views, stored procedures, functions, Cortex endpoints, semantic models, access grants, resource monitors. It compares against the last snapshot to surface what's new or changed.
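Surfacing what is new or changed between snapshots is a keyed diff. This sketch assumes each snapshot maps an object's fully qualified name to a fingerprint (for example, a hash of its DDL); the snapshot format and function name are hypothetical.

```python
def diff_snapshots(previous, current):
    """Compare two {fully_qualified_name: fingerprint} snapshots."""
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    changed = sorted(
        name for name in current.keys() & previous.keys() if current[name] != previous[name]
    )
    return {"added": added, "removed": removed, "changed": changed}
```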
When auto-mode is active, the inspector runs as a background subagent at every session start. It doesn't block the session — you can start working while the scan runs. Results are available within the session once the background process completes.
The output lives in .cocoplus/snapshots/. Any skill that needs to reason about the current environment reads from the most recent snapshot rather than re-querying Snowflake. Inspection results are cached and reusable.
The principle here is simple: stale context produces wrong outputs. Inspection is cheap. Debugging a wrong output is not.
When to use: Before any Build phase that writes to or reads from Snowflake schemas. Especially valuable when returning to a project after a break — schemas may have changed in the interim. Essential when joining a shared environment for the first time, when you can't assume objects you expect actually exist. Run $cocoplus inspect --full when you need column-level statistics or access policy details before writing sensitive SQL. If auto-mode is on, you get this automatically at session start without thinking about it.
Feature 9: Safety Gate
Commands: $safety strict, $safety normal, $safety off
Two layers of protection for the operations you cannot undo.
The phase gate is the outermost layer. Regardless of safety mode, SnowflakeSqlExecute is unconditionally blocked during the spec and plan lifecycle phases. SQL execution is only permitted from the build phase onward. This prevents accidental schema modifications during the design phases of the lifecycle and cannot be overridden by $safety off.
The hard gate is a PreToolUse hook interceptor that applies from the build phase onward. It fires before every SnowflakeSqlExecute call. In strict mode, SQL containing dangerous patterns — DROP TABLE, DROP SCHEMA, TRUNCATE, DELETE without WHERE, ALTER on production objects — is blocked entirely. The tool call does not execute. The developer receives an explanation and the blocked SQL for review.
In normal mode, the same patterns produce a warning but allow execution to proceed. In both modes, the interception is technical, not advisory. It happens before the language model's output reaches Snowflake.
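The phase gate and hard gate can be sketched as one pre-execution function. This is deliberately naive and hypothetical: it splits statements on semicolons (ignoring semicolons inside string literals), matches statement prefixes, and treats any statement containing PROD, PRODUCTION, or LIVE as touching a protected object, which is the documented default but will also match names like PRODUCTS. Treating $safety off as "allow" applies only to this hard gate; the EHRB scan described below runs regardless of mode.

```python
PROTECTED = ("PROD", "PRODUCTION", "LIVE")  # default protected patterns
PRE_BUILD_PHASES = {"spec", "plan"}


def dangerous_patterns(sql):
    hits = []
    for stmt in sql.split(";"):  # naive split: ignores ';' inside string literals
        upper = " ".join(stmt.upper().split())  # normalize case and whitespace
        if upper.startswith("DROP TABLE"):
            hits.append("DROP TABLE")
        elif upper.startswith("DROP SCHEMA"):
            hits.append("DROP SCHEMA")
        elif upper.startswith("TRUNCATE"):
            hits.append("TRUNCATE")
        elif upper.startswith("DELETE") and " WHERE " not in f"{upper} ":
            hits.append("DELETE without WHERE")
        elif upper.startswith("ALTER") and any(p in upper for p in PROTECTED):
            hits.append("ALTER on production object")
    return hits


def pre_tool_use(sql, phase, safety_mode):
    """Return ("block" | "warn" | "allow", reasons) before any SQL reaches Snowflake."""
    if phase in PRE_BUILD_PHASES:  # phase gate: not overridable by $safety off
        return "block", ["SQL execution is not permitted before the build phase"]
    hits = dangerous_patterns(sql)
    if not hits or safety_mode == "off":
        return "allow", hits
    return ("block" if safety_mode == "strict" else "warn"), hits
```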
The soft gate is a skill-level check that fires before any batch destructive operation. It summarizes the operation's impact and requires explicit developer confirmation. This gate can be acknowledged and bypassed — but it cannot be silent. No batch operation that modifies production objects proceeds without the developer seeing what is about to happen.
EHRB Formal Risk Classification: The PreToolUse handler also runs an EHRB (Elevated-Hazard Requiring Buy-in) scan on every SnowflakeSqlExecute call, classifying operations into five risk categories that require explicit developer confirmation regardless of safety mode: Production systems (DDL against schemas matching production patterns), Sensitive/PII data (operations on PII-adjacent columns), Destructive operations (DROP/TRUNCATE/DELETE without WHERE), Billing-significant operations (metered Cortex functions applied to large tables — configurable threshold, default 100K rows), and Security-critical operations (role, grant, and credential management). An EHRB-detected operation suspends execution and presents a named confirmation prompt before proceeding. EHRB triggers are logged to safety-audit.jsonl.
Prompt Injection Defense: Planning artifacts — flow.json, spec.md, plan.md, discuss.md — flow directly into agent prompts during execution. The PreToolUse handler runs a structural anomaly scan before any planning artifact is used as prompt input, detecting instruction-type syntax patterns in content that should be declarative document content. Flagged artifacts surface a warning requiring developer confirmation before the artifact is used. The scan is structural (not a phrase blocklist), runs in under 50ms, and logs anomalies to safety-audit.jsonl.
Configuration lives in cocoplus.toml under the [security] section. Protected schema patterns default to strings containing "PROD", "PRODUCTION", or "LIVE". You can extend this list for your environment — edit cocoplus.toml and run $cocoplus sync to propagate the change.
When to use: Always on — this is the default posture and there is almost never a reason to deviate from it. Use $safety strict when working in or near production schemas, when a single dropped object would affect live data or real users. Use $safety normal for active development environments where you need destructive operations occasionally but still want warnings. Reserve $safety off only for isolated test environments where you're deliberately exercising destructive patterns and fully understand the consequences. If you're in doubt about the environment, default to strict.
Feature 10: Code Quality Advisor
Commands: $quality on, $quality off, $quality run
The Code Quality Advisor reviews generated SQL and code against a library of Snowflake-specific anti-patterns.
It runs automatically during the Review phase. You can also trigger it manually with $quality run at any point. When quality mode is active, it runs as a monitor on every CocoFlow stage — findings accumulate in the review output rather than interrupting execution.
The findings are categorized: patterns that will cause problems (flag), patterns that might cause problems under load (warn), patterns that could be improved (suggest). The Review phase's must-fix gate is informed by these findings.
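A hedged illustration of how flag, warn, and suggest findings could be produced and rolled into a must-fix decision. The three rules, the `review_sql` and `must_fix` names, and the assumption that only `flag` findings feed the must-fix gate are illustrative; they are not the Advisor's actual rule library.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    severity: str  # "flag", "warn", or "suggest"
    message: str


# Illustrative single-pattern rules: (severity, pattern, message).
RULES = [
    ("flag", re.compile(r"\bCROSS\s+JOIN\b|\bON\s+1\s*=\s*1\b", re.I),
     "Unconstrained join produces a Cartesian product."),
    ("warn", re.compile(r"\bSELECT\s+\*", re.I),
     "SELECT * reads every column of a columnar table."),
]


def review_sql(sql: str) -> list[Finding]:
    findings = [Finding(sev, msg) for sev, pattern, msg in RULES if pattern.search(sql)]
    # A rule needing two conditions: ORDER BY with no LIMIT sorts the full result.
    if re.search(r"\bORDER\s+BY\b", sql, re.I) and not re.search(r"\bLIMIT\b", sql, re.I):
        findings.append(Finding("suggest", "ORDER BY without LIMIT sorts the entire result set."))
    return findings


def must_fix(findings: list[Finding]) -> bool:
    """Assumed gate rule: any 'flag' finding blocks the Review phase."""
    return any(f.severity == "flag" for f in findings)
```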
Quality Advisor is not a linter. It understands Snowflake-specific execution context: query patterns that defeat micro-partition pruning and force full table scans, anti-patterns in Cortex AI function usage, and common semantic model design mistakes that produce incorrect query results. This is domain knowledge, not generic code style.
When to use: During the Review phase it runs automatically — you don't need to think about it. Turn $quality on proactively when you're in an active Build phase generating complex SQL or Cortex AI functions, so findings surface continuously rather than only at review time. Run $quality run on-demand after generating stored procedures or batch SQL that will touch large tables. Catching must-fix items before the Ship gate is far cheaper than catching them after a production incident.
Feature 11: Prompt Studio
Commands: $prompt new, $prompt compare
Most developers iterate on prompts ad-hoc and throw away the intermediate versions. Prompt Studio structures this process.
$prompt new opens a guided workflow for designing a prompt: what is the goal, what model is this targeting, what examples illustrate the desired output, what failure modes should be avoided. The output is a versioned prompt file in .cocoplus/prompts/.
$prompt compare runs two prompt versions against the same test inputs and surfaces the differences in output. This makes prompt iteration empirical rather than impressionistic.
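A minimal sketch of such a comparison, assuming labelled test cases and a caller-supplied `run(prompt, input)` function standing in for the actual model call (for example, an AI_COMPLETE invocation). `compare_prompts` is a hypothetical name.

```python
from typing import Callable


def compare_prompts(
    prompt_a: str,
    prompt_b: str,
    cases: list[tuple[str, str]],      # (input text, expected label)
    run: Callable[[str, str], str],    # (prompt, input) -> model output
) -> dict:
    """Score two prompt versions on identical inputs and list where they disagree."""
    if not cases:
        raise ValueError("need at least one test case")
    correct_a = correct_b = 0
    disagreements = []
    for text, expected in cases:
        out_a, out_b = run(prompt_a, text), run(prompt_b, text)
        correct_a += out_a == expected
        correct_b += out_b == expected
        if out_a != out_b:
            disagreements.append((text, out_a, out_b, expected))
    n = len(cases)
    return {
        "accuracy_a": correct_a / n,
        "accuracy_b": correct_b / n,
        "disagreements": disagreements,
    }
```

The disagreement list matters as much as the two scores: it shows exactly which inputs the versions handle differently, which is where the next iteration should focus.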
Prompts designed in Prompt Studio feed directly into CocoFlow stage definitions. A stage's prompt file is a first-class artifact — versioned, testable, and replaceable without changing the stage definition.
Enhancement B (Dual-Condition Exit Gate + Circuit Breaker): Prompt optimization loops are now governed by two independent mechanisms. The exit gate declares success only when both conditions hold at the same time: (1) the accuracy threshold the developer specified is met, and (2) the optimizing agent emits an explicit OPTIMIZATION_COMPLETE signal indicating it judges further iteration unlikely to help. Neither alone is sufficient: hitting the threshold without the agent's signal continues iterating, and an agent signal without hitting the threshold is overridden. The circuit breaker is the safety valve: if five or more consecutive iterations each improve accuracy by less than 0.5%, the loop halts and presents a structured decision panel with four options: accept the current best, revise the target, inspect the evaluation data quality, or change the underlying model. Iteration state persists across session boundaries, so a session that ends mid-optimization resumes the counter rather than resetting it.
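The two mechanisms can be sketched as a small state machine. This assumes "less than 0.5%" means an absolute gain below 0.005 on a 0 to 1 accuracy scale, measured against the best result so far, and uses a JSON round-trip to stand in for cross-session persistence. `LoopState`, `next_action`, `save`, and `load` are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class LoopState:
    best_accuracy: Optional[float] = None
    stalled_iterations: int = 0
    iterations: int = 0


def next_action(
    state: LoopState,
    accuracy: float,
    agent_signal: Optional[str],
    target: float,
    min_gain: float = 0.005,  # assumed absolute gain on a 0-1 scale
    max_stalls: int = 5,
) -> str:
    """Return "exit", "circuit_breaker", or "continue" after one optimization pass."""
    state.iterations += 1
    gain = float("inf") if state.best_accuracy is None else accuracy - state.best_accuracy
    state.stalled_iterations = state.stalled_iterations + 1 if gain < min_gain else 0
    if state.best_accuracy is None or accuracy > state.best_accuracy:
        state.best_accuracy = accuracy
    # Exit gate: BOTH the threshold and the agent's explicit signal are required.
    if accuracy >= target and agent_signal == "OPTIMIZATION_COMPLETE":
        return "exit"
    # Circuit breaker: too many consecutive low-gain passes.
    if state.stalled_iterations >= max_stalls:
        return "circuit_breaker"
    return "continue"


def save(state: LoopState) -> str:
    return json.dumps(asdict(state))


def load(blob: str) -> LoopState:
    return LoopState(**json.loads(blob))
```

Checking the exit gate before the breaker means a pass that satisfies both exit conditions succeeds cleanly even if it also happens to be the fifth stalled iteration.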
When to use: Whenever you're building or refining Snowflake Cortex AI functions where output quality depends on how the prompt is structured — AI_COMPLETE classifiers, AI_EXTRACT schemas, AI_CLASSIFY category definitions. Use $prompt compare when you have two candidate approaches and need evidence of which performs better on your actual data before committing to production. If you're iterating on a prompt more than twice, Prompt Studio will save you time — ad-hoc iteration has no memory and no record.
Feature 12: CocoGrove — Pattern Library
Commands: $patterns view, $patterns promote, $grove glossary, $grove glossary view
CocoGrove is the institutional memory of your project, expressed as a library of reusable patterns.
Patterns enter CocoGrove through promotion: a developer reviews CocoCupper findings and promotes the ones worth keeping. This human filter is intentional. Not everything CocoCupper identifies is worth preserving forever. The developer decides what is durable.
Once in CocoGrove, a pattern is structured: a name, a description, the conditions under which it applies, the conditions under which it doesn't, and an example. Skills and subagents can read CocoGrove to find relevant patterns before generating output.
$patterns view shows the current library, optionally filtered by category or tag. $patterns promote takes a CocoCupper finding and converts it into a structured pattern entry.
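A pattern entry's structure maps naturally onto a record type. This sketch assumes the field names, a hypothetical `promote` helper that takes a finding with `title` and `detail` keys, and simple category/tag filtering for `$patterns view`; none of these are CocoGrove's documented storage format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Pattern:
    name: str
    description: str
    applies_when: str
    not_when: str
    example: str
    category: str = "general"
    tags: set[str] = field(default_factory=set)


def promote(finding: dict, applies_when: str, not_when: str, example: str, **meta) -> Pattern:
    """Turn a CocoCupper finding into a pattern; the developer supplies the conditions."""
    # "title" and "detail" are hypothetical finding keys.
    return Pattern(finding["title"], finding["detail"], applies_when, not_when, example, **meta)


def view(library: list[Pattern], category: Optional[str] = None, tag: Optional[str] = None) -> list[str]:
    """List pattern names, optionally filtered by category or tag."""
    return [
        p.name
        for p in library
        if (category is None or p.category == category) and (tag is None or tag in p.tags)
    ]
```

Keeping `applies_when` and `not_when` as required arguments to `promote` mirrors the human filter described above: a finding cannot become a pattern until someone states where it does and does not hold.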
The compound benefit: by session twenty, CocoGrove contains your project's hard-won knowledge. The Data Engineer doesn't reinvent your schema naming convention — it reads it from CocoGrove. The quality advisor checks new SQL against patterns you've marked as important for this specific project.
Ubiquitous Language Section: CocoGrove gains a dedicated vocabulary layer at .cocoplus/grove/language/glossary.md. Projects accumulate domain terms — "gold standard dataset," "churn propensity score," "evaluation baseline" — that mean specific things in context. When different personas or different sessions use different names for the same concept, decisions made in one context cannot be reconciled in another. The glossary maps each domain term to a precise definition, its aliases, and the feature or function that implements it. $grove glossary proposes additions by scanning project artifacts for domain vocabulary; the developer reviews and confirms. $grove glossary view displays the current glossary. CocoScout loads glossary terms matching the current task, ensuring every agent uses the vocabulary the project has agreed on.
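A hedged sketch of the glossary's two jobs, alias resolution and task-based selection, assuming an in-memory list of entries and whole-phrase matching. The real glossary.md format and CocoScout's matching logic are not specified here; `Term`, `canonical`, and `terms_for_task` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Term:
    term: str
    definition: str
    aliases: tuple[str, ...]
    implemented_by: str  # feature or function that implements the concept


def _phrases(t: Term) -> list[str]:
    return [t.term, *t.aliases]


def canonical(glossary: list[Term], word: str) -> Optional[str]:
    """Resolve a term or any of its aliases to the agreed canonical name."""
    needle = word.strip().lower()
    for t in glossary:
        if needle in (p.lower() for p in _phrases(t)):
            return t.term
    return None


def terms_for_task(glossary: list[Term], task: str) -> list[Term]:
    """Select entries whose name or alias appears as a whole phrase in the task."""
    text = task.lower()
    return [
        t for t in glossary
        if any(re.search(rf"\b{re.escape(p.lower())}\b", text) for p in _phrases(t))
    ]
```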
When to use: After completing significant work — review what CocoCupper surfaced and promote anything durable before closing the session. Before starting new work in a domain you've built in before — run $patterns view to check if relevant patterns already exist. Build the glossary when you notice consistent terminology in your spec and plan files that should be locked — terms that, if different agents interpret differently, would produce inconsistent output. Think of CocoGrove as your project's living style guide: the patterns capture how to build; the glossary captures what everything is called. A CocoGrove that grows with the project is one of the highest-leverage investments CocoPlus enables.
Feature 13: Doc Engine
Commands: $doc run
Good documentation is produced during development, not after it. The Doc Engine generates documentation from the current state of the project at any point in the lifecycle.
$doc run reads the spec, the lifecycle artifacts, the key decisions from memory, and the current code artifacts. It produces structured documentation: API documentation for procedures and functions, schema documentation for tables and views, decision rationale from the memory layer, and a changelog derived from git history.
The output goes into a configurable documentation directory. The format follows project conventions documented in the spec.
Documentation generated by the Doc Engine is honest about what is automated: it will not fabricate rationale for decisions that weren't recorded. Missing memory entries produce missing documentation sections, which is the correct behavior. Gaps in documentation reflect gaps in recorded decisions.
When to use: Before shipping — run $doc run as part of Ship phase preparation to ensure documentation reflects the current state of the build. When onboarding a new team member to an existing data product, to give them a navigable entry point into the project's decisions and structure. After a major schema change where existing documentation is stale. The Doc Engine is most valuable when the Memory Engine has been running throughout the project — the richer the recorded decisions, the richer the generated documentation.
Feature 14: Context Mode
Commands: $context on, $context off
Context Mode activates narration. When on, the system surfaces its reasoning before acting — what it is about to do, why it is choosing this approach, what alternatives it considered.
This is useful in a few situations: when you're learning a new domain and want to understand the agent's reasoning, when you're debugging unexpected behavior and need visibility into decisions, or when you're working with a junior team member who is reviewing AI output as a learning exercise.
Context Mode is soft guidance made audible. It does not change what the system does — it changes what you see about what the system is doing.
Turn it off when you don't need the narration. It adds context window overhead. The information is valuable when you need it and noise when you don't.
When to use: When you're new to CocoPlus and want to understand what the system is doing and why before trusting it. When debugging a pipeline that is behaving unexpectedly — narration gives you visibility into reasoning before actions execute. When walking a colleague through a session to explain the workflow. Turn it off once you're fluent with the system; the narration becomes overhead rather than signal when the behavior is familiar.
Feature 15: CocoMeter — Token Tracker
Commands: $meter, $meter on, $meter off, $meter estimate, $meter history, $meter accuracy
CocoMeter makes token usage visible and predictable.
It tracks token consumption per session, per stage, and per persona. $meter shows the current session's usage. $meter history shows usage across previous sessions. $meter estimate runs a pre-flight estimate for the current plan or stage before execution begins.
The estimate is useful before running an expensive pipeline. It gives you a cost envelope before committing. If the estimate is higher than expected, you can optimize the plan — adjust model assignments, reduce stage scope, simplify prompts — before spending the tokens.
CocoMeter data feeds back into model selection decisions. If you notice that the Data Analyst persona is consistently using Sonnet-tier tokens for tasks that could run on Haiku, that's a signal the task prompts are more complex than they need to be.
Token visibility is not about cost alone — it is about system health. A pipeline whose token usage grows session-over-session is a pipeline accumulating unnecessary context. CocoMeter makes this visible before it becomes a problem.
Accuracy Learning Feedback Loop: Static heuristics are calibrated to typical pipelines, not this team's specific workflows. CocoMeter closes the calibration gap by comparing estimated vs. actual token consumption after every session and using the difference to improve future estimates. After each session completes, it computes the ratio of actual to estimated tokens, appends a structured record to meter/accuracy-history.jsonl, and stores the rolling median of the last ten session ratios as an adjustment factor in meter/adjustment-factor.json. Subsequent pre-flight estimates multiply the raw heuristic by this stored factor, so the developer sees: "Estimated 45,000 tokens (baseline: 30,000, calibration factor: 1.5x from 8 prior sessions)." The sample size is always surfaced alongside the factor — developers know whether the calibration is grounded in one session or ten. The $meter accuracy command displays the full estimation history, the current factor, and its trend (stable, rising, or falling over the last five sessions).
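The calibration arithmetic is simple enough to show directly. This sketch keeps the history in memory (in CocoMeter each record would be one line of meter/accuracy-history.jsonl) and assumes a trend rule, comparing the first and last ratio of the last five sessions with a 0.05 tolerance, that the document does not specify. All function names are hypothetical.

```python
import statistics


def record_session(history: list[dict], estimated: int, actual: int) -> dict:
    """Append one estimate-vs-actual record (one JSONL line in the real file)."""
    record = {"estimated": estimated, "actual": actual, "ratio": actual / estimated}
    history.append(record)
    return record


def adjustment_factor(history: list[dict], window: int = 10) -> tuple[float, int]:
    """Rolling median of the last `window` ratios, plus the sample size behind it."""
    ratios = [r["ratio"] for r in history[-window:]]
    return (statistics.median(ratios), len(ratios)) if ratios else (1.0, 0)


def calibrated_estimate(baseline: int, history: list[dict]) -> str:
    factor, n = adjustment_factor(history)
    return (
        f"Estimated {round(baseline * factor):,} tokens "
        f"(baseline: {baseline:,}, calibration factor: {factor:.1f}x from {n} prior sessions)"
    )


def trend(history: list[dict], window: int = 5, tolerance: float = 0.05) -> str:
    """Assumed rule: compare the first and last ratio in the recent window."""
    ratios = [r["ratio"] for r in history[-window:]]
    if len(ratios) < 2 or abs(ratios[-1] - ratios[0]) < tolerance:
        return "stable"
    return "rising" if ratios[-1] > ratios[0] else "falling"
```

With eight prior sessions at a 1.5 ratio, `calibrated_estimate(30_000, history)` reproduces the message quoted above. Using a median rather than a mean keeps one unusually expensive session from swinging every later estimate.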
When to use: Before running any large or complex pipeline — use $meter estimate to get a cost envelope before committing. After any session that felt unexpectedly expensive — use $meter history to identify which features and stages drove the cost. In enterprise environments with token budgets, keep $meter on continuously. The estimate feature is most valuable immediately before the Build phase on plans with many parallel stages or Opus-level persona assignments.
Feature 16: CocoCupper — Post-Execution Intelligence
Commands: $cup, $cup history
CocoCupper is the session debrief, automated.
After every session — triggered by the Stop and SubagentStop hooks — CocoCupper reads what just happened. The SQL that ran, the files that changed, the decisions that were recorded, the errors that were encountered. It identifies patterns: approaches that worked, approaches that failed consistently, things that were reinvented unnecessarily.
Its findings go to .cocoplus/grove/cupper-findings.md. The developer reviews these findings and promotes the valuable ones to CocoGrove.
CocoCupper runs on Haiku. This is deliberate: pattern recognition over a structured log is not a task that requires reasoning depth. Speed and cost efficiency matter here. CocoCupper is always running; making it expensive would make it prohibitive.
$cup triggers CocoCupper manually at any point in a session. $cup history shows findings across previous sessions, allowing you to identify patterns that span multiple sessions rather than just one.
CocoCupper cannot modify code, cannot write to any path outside .cocoplus/grove/, and cannot take action. It observes and reports. This constraint is architectural — a background analyst that can take action is a background agent with the same failure modes as a foreground agent, without any of the visibility.
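A write guard of this kind is usually enforced by resolving the target path before checking it, so that `..` segments and absolute paths cannot escape the allowed directory. This is a minimal sketch of that idea with a hypothetical `allowed_write` function; it is not CocoPlus's enforcement code.

```python
from pathlib import Path


def allowed_write(project_root: str, target: str) -> bool:
    """True only if `target` resolves to a location inside .cocoplus/grove/."""
    grove = (Path(project_root) / ".cocoplus" / "grove").resolve()
    # Joining an absolute target replaces the root, and resolve() collapses "..",
    # so both escape routes are caught by the containment check below.
    candidate = (Path(project_root) / target).resolve()
    return candidate.is_relative_to(grove)  # Python 3.9+
```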
When to use: It runs automatically — you don't need to invoke it for the background behavior. Run $cup manually after any session where you resolved a difficult problem, discovered a performance issue, or established a pattern you expect to reuse. Run $cup history periodically — every five to ten sessions — to look for cross-session patterns that wouldn't be visible session by session. It's particularly valuable before a long break from a project: capture what was learned before context fades.
Feature 17: Assist Mode — The Master Toggle
Commands: $cocoplus on, $cocoplus off
Assist Mode is a single command that activates all mode-based features simultaneously: memory, inspector, safety (normal), quality, context, and meter.
This is the recommended starting state for a new CocoPlus project. Rather than toggling six features individually, $cocoplus on establishes the full system in a single action. $cocoplus off returns to Coco's baseline behavior, deactivating all CocoPlus overlays.
The toggle is intercepted by the UserPromptSubmit hook and takes effect immediately in the current session. AGENTS.md is updated to reflect the new state.
When to use: At the start of every new project and at the start of any session where you want the full CocoPlus system active. It replaces six individual feature toggles with a single command. Use $cocoplus off when you're doing something lightweight and don't want CocoPlus overhead — a quick exploratory query, a throwaway experiment, a one-off question about the environment. If you're doing real project work, $cocoplus on is the right starting point.
Feature 18: CocoFleet — Multi-Process Orchestration
Commands: $fleet init, $fleet run, $fleet status, $fleet stop, $fleet logs
CocoFleet is for when the work is genuinely too large for a single Coco session.
It spawns independent Coco CLI processes at the operating system level — separate processes with separate context windows, coordinated through shared file state. This is different from CocoHarvest's parallelism: CocoHarvest spawns subagents within a single session; CocoFleet spawns entirely separate CLI sessions.
$fleet init defines the fleet configuration: which processes to run, their dependencies, their working directories. $fleet run spawns the processes and begins monitoring. $fleet status reads shared file state and reports process statuses, completion percentages, and dependency resolution state. $fleet stop sends termination signals. $fleet logs surfaces per-process output.
Fleet coordination is file-based. No message queues, no service buses. Each process writes its state to shared files; the coordinator reads those files. This keeps the coordination model simple, debuggable, and consistent with CocoPlus's broader commitment to legible file-based state.
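A hedged sketch of what file-based coordination can look like: each process atomically rewrites its own JSON state file, and the coordinator derives status, completion percentage, and unresolved dependencies by reading them. The one-file-per-process layout, the field names, and both functions are assumptions.

```python
import json
from pathlib import Path


def write_state(state_dir: Path, name: str, status: str, done: int, total: int, depends_on=()) -> None:
    """Called by each fleet process; one JSON file per process (hypothetical layout)."""
    path = state_dir / f"{name}.json"
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(
        {"status": status, "done": done, "total": total, "depends_on": list(depends_on)}
    ))
    tmp.replace(path)  # atomic rename: the coordinator never reads a half-written file


def fleet_status(state_dir: Path) -> dict:
    """Coordinator view: status, percent complete, and dependencies not yet succeeded."""
    states = {p.stem: json.loads(p.read_text()) for p in sorted(state_dir.glob("*.json"))}
    report = {}
    for name, s in states.items():
        blocked_on = [d for d in s["depends_on"] if states.get(d, {}).get("status") != "success"]
        report[name] = {
            "status": s["status"],
            "percent": round(100 * s["done"] / s["total"]) if s["total"] else 0,
            "blocked_on": blocked_on,
        }
    return report
```

Because every piece of state is an ordinary file, debugging a stuck fleet can start with nothing more than reading those files.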
When to use: When you've genuinely hit the practical context budget of a single Coco session and still have work to coordinate across multiple independent workstreams. Large-scale data migrations. Multi-schema rebuilds spanning days of parallel work. Development streams that must eventually converge on a shared result. If CocoHarvest and CocoFlow can handle your parallelism within a single session, prefer them — CocoFleet adds OS-level coordination complexity that is only justified by genuine scale requirements that can't be met any other way.
Feature 19: SecondEye — Multi-Model Plan Critic
Commands: $secondeye, $secondeye --artifact <target>, $secondeye --model <model>, $secondeye acknowledge
A plan reviewed by the same model that wrote it is a plan reviewed by a single perspective. SecondEye fixes that by running three Claude model tiers in parallel — each with a genuinely different evaluative mandate — and aggregating their findings into a single structured critique.
The underlying insight is architectural: within Claude's model family, each tier reasons differently. Haiku, cost-constrained and fast, naturally challenges over-engineering. Sonnet, balanced, challenges logical completeness and missing edge cases. Opus, at strategic depth, challenges architectural risk and unconsidered alternatives. These are not the same critique at different quality levels — they are different critiques. Aggregating them produces a breadth no single model can match.
How the three lenses work:
- Haiku — Efficiency Lens: Is this plan over-specified? Are there steps that add complexity without proportionate value? What is the estimated token cost of executing this plan, and is it proportionate to the stated goal? Would a simpler approach achieve the same outcome?
- Sonnet — Completeness Lens: What assumptions does this plan make that are not validated? Which edge cases are unaddressed? Are the success criteria in the spec fully covered by the plan's stages? Are there implicit dependencies that are not modeled?
- Opus — Risk Lens: What are the highest-consequence failure modes of this plan? Are there architectural decisions that will constrain future flexibility? What alternatives were not considered? What would have to be true for this plan to fail despite being executed correctly?
How it works: Three SecondEye Critic subagents spawn simultaneously, one at each model tier. All three read the same target artifact, and each writes its findings to a temporary staging directory. When all three complete, the main SecondEye skill aggregates the findings: it merges any finding raised by two or more critics into a single entry marked [Consensus], classifies every finding as Critical, Advisory, or Observation, and produces a single report. If any Critical findings exist, a soft gate activates on the Build phase: the developer must either revise the plan or run $secondeye acknowledge to accept the risk and proceed.
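The aggregation step can be sketched as group, merge, and sort. Two assumptions here are loud ones: findings are matched by normalized text (real consensus detection is likely semantic), and a merged finding takes the most severe classification any critic gave it. `aggregate` is a hypothetical name.

```python
from collections import defaultdict

SEVERITY_RANK = {"Critical": 0, "Advisory": 1, "Observation": 2}


def aggregate(findings: list[tuple[str, str, str]]) -> tuple[list[dict], bool]:
    """findings are (critic, text, severity); returns (report rows, build gate active)."""
    grouped = defaultdict(list)
    for critic, text, severity in findings:
        key = " ".join(text.lower().split())  # crude stand-in for semantic matching
        grouped[key].append((critic, text, severity))

    rows = []
    for entries in grouped.values():
        critics = sorted({critic for critic, _, _ in entries})
        rows.append({
            "text": entries[0][1],
            "severity": min((s for _, _, s in entries), key=lambda s: SEVERITY_RANK[s]),
            "critics": critics,
            "consensus": len(critics) >= 2,  # rendered as [Consensus]
        })

    # Most severe first; consensus findings ahead of single-critic ones at equal severity.
    rows.sort(key=lambda r: (SEVERITY_RANK[r["severity"]], not r["consensus"]))
    gate_active = any(r["severity"] == "Critical" for r in rows)
    return rows, gate_active
```

Counting distinct critics, rather than raw mentions, keeps one critic that repeats itself from manufacturing a consensus.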
SecondEye is always on-demand. It never activates automatically and is not a required CocoBrew lifecycle gate. Critic agents are read-only: they cannot write to any lifecycle artifact, cannot modify the plan, and cannot take action beyond their staging directory output.
HITL/AFK action classification: Each finding in the SecondEye aggregated report is tagged as HITL (Human In The Loop) or AFK (Away From Keyboard). HITL findings require the developer to actively decide something before resolution — they default to Critical findings, findings about evaluation methodology, architectural decisions, scope changes, and anything where the resolution is genuinely a choice between options with different trade-offs. AFK findings can be resolved autonomously in the next $cocoplus build pass — they default to Warning and Info findings that map to known fix patterns. The $secondeye acknowledge command is classification-aware: --hitl-only acknowledges only HITL findings (required before $cocoplus build can proceed) and separately queues AFK findings for automatic resolution. The aggregated report's header YAML gains an action_summary field with hitl_count and afk_count alongside the existing severity counts, letting the developer understand the required decision burden before reading the full report.
BLOCKING vs. MINOR classification: SecondEye findings carry two orthogonal dimensions: HITL/AFK (does this stage require human approval?) and BLOCKING/MINOR (does this specific finding require human resolution?). A BLOCKING finding must be addressed by the developer — it touches correctness, security, architectural conflict, or genuine ambiguity about intent. A MINOR finding can be auto-resolved without developer input — it involves style, naming, missing documentation, or a non-critical coverage gap that maps to a known fix pattern. The four combinations define the intervention gradient: BLOCKING+HITL requires maximum developer involvement; MINOR+AFK is fully autonomous. The action_summary field extends to include blocking_count and minor_count. A developer sees at a glance: "3 BLOCKING, 2 HITL, 1 requiring direct attention." $secondeye acknowledge --blocking-only acknowledges BLOCKING-classified findings while leaving MINOR findings queued for autonomous resolution.
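Counting the two dimensions and splitting findings for the two acknowledge flags is mechanical. In this sketch the dictionary keys (`action`, `resolution`) and both function names are hypothetical; the count names follow the action_summary fields described above.

```python
from collections import Counter


def action_summary(findings: list[dict]) -> dict:
    actions = Counter(f["action"] for f in findings)          # "HITL" or "AFK"
    resolutions = Counter(f["resolution"] for f in findings)  # "BLOCKING" or "MINOR"
    return {
        "hitl_count": actions["HITL"],
        "afk_count": actions["AFK"],
        "blocking_count": resolutions["BLOCKING"],
        "minor_count": resolutions["MINOR"],
    }


def acknowledge(findings: list[dict], only: str) -> tuple[list[dict], list[dict]]:
    """Split findings for `$secondeye acknowledge --hitl-only` or `--blocking-only`."""
    attr, value = {"hitl": ("action", "HITL"), "blocking": ("resolution", "BLOCKING")}[only]
    acknowledged = [f for f in findings if f[attr] == value]
    queued = [f for f in findings if f[attr] != value]  # left for autonomous resolution
    return acknowledged, queued
```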
Devil's Advocate lens (fourth critic): Three balanced critics produce findings proportionate to their overall assessment of a plan. A plan that seems plausible will produce findings calibrated to that plausibility — but no balanced reviewer is specifically tasked with constructing the strongest possible case against proceeding. The Devil's Advocate fills this gap. It spawns as a fourth parallel critic at Sonnet tier with an explicitly adversarial mandate: find the single most damaging assumption, the most likely architectural failure, or the most serious underspecified requirement. Concession from the DA requires a rebuttal score of ≥4 on a 1–5 scale — a score of 3 or below means the DA re-asserts the concern and explains why the response was insufficient. All DA findings are BLOCKING by default. The aggregated report identifies the DA's section with a [Devil's Advocate] tag, sorts DA findings to the top, and adds a da_finding_count sub-field to action_summary.
Edge Case Hunter (fifth critic): The four-critic architecture covers plan validity, scope completeness, risk exposure, and adversarial premise challenge. None of those critics is specifically tasked with the evaluation methodology itself — whether the way the plan will be evaluated can actually detect the failure modes that matter. A plan can be sound while its evaluation is blind to the production distribution shift it will eventually face, or the P99 latency tail that only shows up under real load. The Edge Case Hunter spawns as a fifth critic at Haiku tier with a narrow and orthogonal mandate: not "is the plan sound?" but "what would the evaluation miss if the plan were executed exactly as written?" Its findings are Advisory by default, escalated to BLOCKING only when the identified blind spot is fundamental to evaluation validity — when the primary success criterion cannot be confirmed by the evaluation as designed. Edge Case Hunter findings appear in a separate [Edge Case Hunter] section in the report, and edge_case_count is added to action_summary alongside da_finding_count.
Per-finding six-severity labels: The three-outcome verdict (Critical / Advisory / Observation) governs the overall gate decision. A finer-grained label layer now applies to individual findings within that structure: blocking (must be resolved before proceeding), important (should be resolved; context-dependent), nit (minor style or preference concern), suggestion (optional improvement), learning (educational context without improvement intent), and praise (explicitly highlight a well-constructed pattern — mandatory, not optional). This vocabulary is shared with CocoReview, creating a consistent severity language across all review gates: what learning means in a CocoReview finding is the same as what it means in a SecondEye finding. The per-finding labels are additive to the existing HITL/AFK and BLOCKING/MINOR dimensions — they describe the nature of the finding, while HITL/AFK describes whether human resolution is required and BLOCKING/MINOR describes whether it must be resolved manually or can be auto-addressed.
The label on each finding also determines how it rolls up into the top-level verdict: if any finding across any critic carries a blocking label, the aggregate report delivers a BLOCKING verdict — regardless of how the remaining findings are classified. If no blocking findings are present but at least one finding carries an important label, the verdict is CONCERNS. A report where every finding across all critics carries only nit, suggestion, learning, or praise labels produces an APPROVE verdict. This derivation is deterministic: it does not depend on model interpretation. The label set is the verdict input; the verdict is its mechanical consequence.
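Because the derivation is deterministic, it fits in a few lines. This sketch assumes findings arrive as label lists keyed by critic and rejects labels outside the six-label vocabulary; `verdict` is a hypothetical name.

```python
LABELS = {"blocking", "important", "nit", "suggestion", "learning", "praise"}


def verdict(labels_by_critic: dict[str, list[str]]) -> str:
    """Derive the top-level verdict mechanically from per-finding labels."""
    labels = [label for critic_labels in labels_by_critic.values() for label in critic_labels]
    unknown = set(labels) - LABELS
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    if "blocking" in labels:
        return "BLOCKING"
    if "important" in labels:
        return "CONCERNS"
    return "APPROVE"
```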
When to use: Before moving from Plan to Build on any work that involves irreversible operations, significant compute cost, or architectural decisions with long-term implications. Invoke it when you want an independent critical perspective that goes beyond what the model that generated the plan can offer — particularly for plans that are complex, have unclear trade-offs, or where the consequences of a wrong turn are expensive to undo. It is not necessary for every build — simple, well-understood tasks don't need it. But for plans where "what could go wrong?" is a question worth asking carefully, SecondEye is where you ask it rigorously.
Feature 20: CocoView — Flow Visualizer
Commands: $flow view, $flow view --output <path>
A flow.json with fifteen stages, four parallel workstreams, and a dozen dependency edges is not designed to be read as JSON. It is designed to be navigated as a graph. CocoView renders it that way — as an interactive directed acyclic graph in your browser, opened with a single command, with no server, no network traffic, and no external dependencies.
$flow view reads the current flow.json, injects it into a self-contained HTML template, writes the result to .cocoplus/flow-view.html, and opens it using your OS's default file association. What you see: task cards arranged left-to-right by topological level, group bounding boxes around workstreams, bezier edges showing dependencies, colour-coded status badges (running, pending, success, failed), and a live status count in the toolbar.
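One common reading of "left-to-right by topological level" is to place each stage in the column given by the length of the longest dependency chain leading to it. A sketch using Kahn's algorithm, assuming a hypothetical flow.json shape with `stages`, `id`, and `depends_on` keys:

```python
import json
from collections import deque


def topological_levels(flow_json: str) -> dict[str, int]:
    """Map each stage id to its column: the length of its longest dependency chain."""
    stages = json.loads(flow_json)["stages"]
    deps = {s["id"]: list(s.get("depends_on", [])) for s in stages}
    dependents = {sid: [] for sid in deps}
    for sid, parents in deps.items():
        for parent in parents:
            if parent not in deps:
                raise ValueError(f"{sid} depends on unknown stage {parent}")
            dependents[parent].append(sid)

    indegree = {sid: len(parents) for sid, parents in deps.items()}
    level = {sid: 0 for sid in deps}
    queue = deque(sid for sid, n in indegree.items() if n == 0)
    visited = 0
    while queue:  # Kahn's algorithm
        sid = queue.popleft()
        visited += 1
        for child in dependents[sid]:
            level[child] = max(level[child], level[sid] + 1)
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    if visited != len(deps):
        raise ValueError("flow.json contains a dependency cycle")
    return level
```

A side benefit of this approach is that a cyclic flow.json, which cannot be drawn as a DAG at all, is caught before anything is rendered.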
What you can do in the view: Click any node to highlight its connections and open a metadata panel. Pan and zoom with mouse drag and scroll wheel. Load a different flow.json directly in the browser for debugging or comparison. The view is regenerated fresh on every $flow view invocation — it always reflects the current pipeline state, not a stale snapshot.
The file is ephemeral — excluded from git and overwritten on every invocation. CocoView makes no commits, writes no state, and cannot modify your pipeline. It is a read-only window into a structure that already exists.
When to use: Any time you need to reason about your pipeline as a structure rather than a list. Use it when you're about to run $cocoplus build on a complex multi-stage flow and want to verify the dependency graph is correct before execution begins. Use it during a build to check which stages are running and which are blocked. Use it when debugging a failed pipeline to see which stage failed and which dependents it is blocking.
Feature 21: CocoMeter Enhanced — Flow Token Attribution & Usage Dashboard
Commands: $meter view, $meter sync
CocoMeter (Feature 15) gives you estimates before work and a session counter while it runs. CocoMeter Enhanced closes the loop: it traces actual, confirmed token consumption — from Snowflake's own billing record — back to individual flow stages.
The bridge is the request_id that Coco returns with every tool call result. CocoMeter Enhanced captures this identifier in the PostToolUse hook and writes it — alongside the active stage ID, persona, tool name, and timestamp — to .cocoplus/meter/request-map.jsonl. This file is committed to git as a project artifact. It is the durable link between what Coco did and what Snowflake charged.
$meter view takes that link and makes it tangible. It reads all captured request_id values, queries SNOWFLAKE.ACCOUNT_USAGE.CORTEX_CODE_CLI_USAGE_HISTORY through Coco's existing Snowflake connection, joins the result with the stage attribution data, and generates an interactive HTML dashboard at .cocoplus/meter-view.html. The dashboard shows:
- Summary cards: total tokens, total credits, cache efficiency percentage, output tokens — colour-coded by significance
- Stage Attribution table: per-stage token breakdown with bar charts, cache hit percentage, and assigned model indicator — click any row to filter the request log to that stage
- Model Distribution: token consumption per model tier (Haiku, Sonnet, Opus) as horizontal bars
- Token Composition: stacked proportions of cache-read, cache-write, fresh-input, and output tokens — where your cost is actually going
- Request Log: every captured request with its REQUEST_ID, parent relationship, stage, time, token breakdown, credit cost, and model badge
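The join behind the Stage Attribution table can be sketched in plain Python. The field names (`request_id`, `stage_id`, `cache_read`, `cache_write`, `fresh_input`, `output`) are placeholders, not the actual column names of CORTEX_CODE_CLI_USAGE_HISTORY, and cache hit percentage is assumed to mean cache-read tokens over all input tokens. `stage_attribution` is a hypothetical name.

```python
import json
from collections import defaultdict


def stage_attribution(request_map_jsonl: str, usage_rows: list[dict]) -> dict[str, dict]:
    """Join request-map.jsonl entries to usage rows by request_id and total per stage."""
    stage_of = {}
    for line in request_map_jsonl.splitlines():
        if line.strip():
            rec = json.loads(line)
            stage_of[rec["request_id"]] = rec["stage_id"]

    totals = defaultdict(lambda: defaultdict(int))
    for row in usage_rows:
        # Usage with no captured request_id is kept visible rather than dropped.
        stage = stage_of.get(row["request_id"], "unattributed")
        for key in ("cache_read", "cache_write", "fresh_input", "output"):
            totals[stage][key] += row[key]

    report = {}
    for stage, t in totals.items():
        input_tokens = t["cache_read"] + t["cache_write"] + t["fresh_input"]
        report[stage] = {
            **t,
            "total": input_tokens + t["output"],
            "cache_hit_pct": round(100 * t["cache_read"] / input_tokens, 1) if input_tokens else 0.0,
        }
    return report
```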
The data is authoritative, not estimated — it comes from Snowflake's own usage view, attributed per-request, per-stage. This is the difference between "this build should cost about X" and "stage B-07 cost 42,000 input tokens, 8,100 cache hits, and 3,800 output tokens."
$meter sync re-queries Snowflake for the most recent usage data without regenerating the full view — useful for refreshing a dashboard after a long pipeline run completes.
Like CocoView, the meter dashboard is a self-contained local HTML file that makes no network requests, has no external dependencies, and is excluded from git.
Cache-aware token accounting: CocoMeter Enhanced now disaggregates token consumption by cache type — cache-read tokens, cache-write tokens, fresh-input tokens, and output tokens — rather than reporting a single input/output split. Snowflake's prompt caching means that a stage running the second time costs materially less than the first run; without cache-aware accounting that difference is invisible in the summary. The $meter view dashboard's Token Composition chart shows this split as stacked proportions, making cache efficiency a first-class visibility concern rather than a billing footnote.
Session archetype classification: Each completed session is classified into one of five archetypes based on its token consumption pattern: Spec-Heavy (large input, minimal output — requirements-intensive sessions), Build-Intensive (sustained output, multi-stage — active coding sessions), Review-Dominated (dense HITL, clustered SecondEye invocations), Exploratory (high stage count, low completion rate — discovery work), and Maintenance (small diffs, fast cycles). Archetypes are stored in session history and surfaced on the $meter history dashboard as a longitudinal view — showing whether a project's development pattern is evolving as expected. CocoPull's session archive search uses the archetype field as a filter, letting developers retrieve "all build-intensive sessions in April" without reading session summaries.
When to use: After any multi-stage build where understanding cost distribution matters — which stages are expensive, which models are dominating the token budget, whether cache efficiency is as high as expected. Use $meter estimate before a build to project cost; use $meter view after to hold that projection accountable. The cache-aware breakdown is particularly valuable when the same pipeline runs repeatedly — if cache efficiency drops unexpectedly, it usually signals that a frequently-loaded file changed or that context isolation settings are preventing cache reuse. For teams managing Snowflake credit budgets, the Stage Attribution table provides the level of traceability needed to have an informed conversation about where compute is going.
Feature 22: CocoBehavior — Ambient Behavioral Constraint Layer
Commands: None — always on, not user-invocable
Before a developer writes a spec, before CocoHarvest decomposes a plan, before any agent writes a single line of SQL — CocoBehavior is already there. It is a single SKILL.md file preloaded into every persona's startup context. It shapes how agents reason before any external gate or verification step is reached.
Four behavioral constraints, encoded as cognitive posture rather than rules to be checked:
Think Before Coding — Before any build, the agent must surface scope and ambiguity questions to the developer. Ambiguous requirements are not interpreted charitably and acted upon. They are surfaced. "Add a classifier" produces questions: what categories, what table, what evaluation set? Not a classifier.
Simplicity First — Build the minimum viable implementation. Generalization and optimization are earned through demonstrated need, not speculated future requirements. A parameterized multi-model wrapper when the developer asked for one classifier on one table is not helpfulness — it is overreach.
Surgical Changes — When asked to fix or modify something, change only what was asked. Adjacent observations go in comments. They do not trigger unrequested refactoring. "Fix the WHERE clause" means fix the WHERE clause.
Evaluation First — Every Cortex AI function build task must define a measurable evaluation target before implementation begins. Shipping an AI function without accuracy criteria on labeled data is not completion. It is speculation deployed to production.
CocoBehavior is the conceptual foundation that makes Prompt Studio's optimization gates (Enhancement B), CocoFlow's stage isolation (Enhancement E), and CocoScout's context discipline (Feature 24) philosophically coherent. It is not a feature you invoke — it is the disposition that all invocable features operate within.
When to use: You don't. It is always active in every persona. Its presence is visible not through commands but through the behavior of agents: questions before code, minimum scope, no unauthorized scope creep, evaluation criteria before shipping.
Feature 23: CocoContext — Organizational Standards Capture
Commands: $context add, $context view [file-name], $context list
Every team working in Snowflake has standards that don't live in any single schema or codebase. Which Cortex models are approved for production use. What accuracy threshold is acceptable before an AI function ships. How PII columns must be handled. Which warehouse to run tests against. These standards exist — in meeting notes, onboarding documents, someone's head — and AI agents don't know them until they're written down.
CocoContext is where you write them down.
$context add opens a guided capture wizard. You select a category — approved models, quality thresholds, PII policy, warehouse policy, naming conventions, governance gates — and answer a structured set of questions for that category. The answers are written to .cocoplus/context/ as version-controlled markdown files, each bounded at 200 lines.
$context list shows what has been captured and what is still missing. $context view [file-name] reads any context file directly.
These files are not passive documentation. CocoScout (Feature 24) reads them before every build stage and persona invocation, ranking and loading the ones most relevant to the current task. A data engineer building an AI classifier will have the approved-models and quality-thresholds files loaded into context automatically. A governance review stage will receive the governance-gates file. The standards travel with the work.
When to use: At the start of any new project or team onboarding. Capture organizational standards once, and every agent in every subsequent session operates within them without the developer re-specifying them. The $context list view makes visible what is and isn't yet captured — if quality-thresholds.md shows "not created," every optimization loop in that project is running without a measurable target.
Feature 24: CocoScout — Relevance-Ranked Context Loading
Commands: None — automatic, not user-invocable
Context loading has a failure mode nobody talks about: loading too much. An agent given every available pattern, every CocoContext file, every environment snapshot, and every prompt history is not a better-informed agent — it is a noisier one. Irrelevant context crowds out the context that matters.
CocoScout is the agent that decides what goes in. It fires automatically before every Build stage execution and every direct persona invocation. It ranks all available context sources — CocoGrove patterns, CocoContext standards files, Environment Inspector snapshot objects, Prompt Studio iteration histories, CocoDream lessons — by relevance to the specific task at hand. It loads the top-k most relevant items from each category. Items that don't meet a relevance threshold are excluded, even if they're the best available in their category.
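The selection rule (top-k per category, with a relevance floor that can exclude even a category's best item) can be sketched as follows. How CocoScout actually computes relevance is not specified here, so the sketch assumes each item already carries a score in [0, 1]; `ContextItem`, the category names, `k`, and `threshold` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    category: str     # e.g. "grove_pattern", "context_standard", "env_snapshot"
    name: str
    relevance: float  # assumed to be pre-computed, in [0, 1]


def select_context(items: list[ContextItem], k: int, threshold: float) -> dict[str, list[ContextItem]]:
    """Keep at most k items per category, dropping anything below threshold.

    A category whose best item falls below the threshold contributes nothing.
    """
    by_category: dict[str, list[ContextItem]] = {}
    for item in items:
        if item.relevance >= threshold:
            by_category.setdefault(item.category, []).append(item)
    return {
        category: sorted(group, key=lambda i: i.relevance, reverse=True)[:k]
        for category, group in by_category.items()
    }


items = [
    ContextItem("context_standard", "approved-models.md", 0.91),
    ContextItem("context_standard", "pii-policy.md", 0.35),
    ContextItem("grove_pattern", "classifier-eval", 0.78),
    ContextItem("grove_pattern", "search-indexing", 0.74),
    ContextItem("env_snapshot", "RAW.EVENTS", 0.20),
]
chosen = select_context(items, k=1, threshold=0.5)
print({cat: [i.name for i in group] for cat, group in chosen.items()})
# {'context_standard': ['approved-models.md'], 'grove_pattern': ['classifier-eval']}
```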
For tasks involving named Cortex AI functions — AI_CLASSIFY, AI_COMPLETE, AI_EXTRACT, and the rest — CocoScout fetches the current Snowflake documentation for that function via WebFetch. This is not optional. Cortex capabilities evolve; training data ages. The documentation fetch is the ground truth.
CocoScout runs on Haiku. It must complete in under five seconds. If a source times out, it is skipped and the rest of the context load proceeds. CocoScout's selections are visible in session logs: a preamble listing what was loaded and why, so the developer can see the briefing the agent received.
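One way to honor a hard time budget while skipping slow sources is to fetch every source concurrently under a shared deadline, as in this asyncio sketch. The source names, fetch functions, and the 0.1-second deadline in the usage example are illustrative assumptions; the text does not describe CocoScout's internal timeout mechanism.

```python
import asyncio
from typing import Any, Awaitable, Callable

_TIMED_OUT = object()  # sentinel that distinguishes a timeout from a legitimate None result


async def load_sources(
    sources: dict[str, Callable[[], Awaitable[Any]]], deadline_s: float = 5.0
) -> tuple[dict[str, Any], list[str]]:
    """Fetch all sources concurrently; any source exceeding the deadline is skipped.

    Because the fetches run in parallel, total wall time is bounded by deadline_s.
    """
    async def fetch(name: str, fetch_fn: Callable[[], Awaitable[Any]]):
        try:
            return name, await asyncio.wait_for(fetch_fn(), timeout=deadline_s)
        except asyncio.TimeoutError:
            return name, _TIMED_OUT

    results = await asyncio.gather(*(fetch(n, f) for n, f in sources.items()))
    loaded = {n: v for n, v in results if v is not _TIMED_OUT}
    skipped = [n for n, v in results if v is _TIMED_OUT]
    return loaded, skipped


async def fast_source():
    await asyncio.sleep(0.01)
    return "approved-models.md contents"


async def slow_source():
    await asyncio.sleep(10)  # simulates a fetch that will not finish in time
    return "never used"


loaded, skipped = asyncio.run(
    load_sources({"context": fast_source, "docs": slow_source}, deadline_s=0.1)
)
print(loaded, skipped)  # {'context': 'approved-models.md contents'} ['docs']
```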
The result is approximately 80% token reduction compared to loading the full context library for every task, with agents that are better-informed because the context they have is relevant rather than comprehensive.
Anchor Lens, a third relevance dimension: CocoScout's Technical and Domain lenses rank which context files to load. The Anchor Lens adds a third dimension: it maps the developer's task description to precise methodology vocabulary, which is injected into the build agent's context alongside the loaded files. The anchor catalog, stored in grove/anchors/catalog.md, is pre-compiled as a lookup table and requires no LLM call. It operates in two modes. Recognition Mode translates a verbose task description into established methodology names ("make this function handle edge cases better" → Boundaries, EHRB-Cortex Pattern, Cortex Scalar UDF Pattern). Guidance Mode adds methodology framing when the task doesn't name a known pattern. Token overhead is near zero because lookup is string-pattern matching against the task message rather than semantic embedding, which keeps total Anchor Lens runtime under 50 ms. Persona weighting applies: DE/DS/AE tasks weight toward technical and implementation anchors, DA/BI tasks toward domain methodology anchors, and DPM/DST/CDO tasks toward governance and communication anchors.
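A Recognition Mode lookup of this kind needs only a pre-compiled table of regular expressions plus a persona-specific ordering. In the sketch below only the edge-cases row mirrors the example in the text; the other catalog rows ("Metric Definition Pattern", "Governance Gate Pattern"), the anchor kind labels, and the exact persona orderings are hypothetical, and Guidance Mode is not modeled.

```python
import re

# Hypothetical pre-compiled catalog rows: (trigger pattern, anchor names, anchor kind).
# Only the edge-cases row mirrors the example in the text.
ANCHOR_CATALOG = [
    (re.compile(r"\bedge cases?\b", re.I),
     ["Boundaries", "EHRB-Cortex Pattern", "Cortex Scalar UDF Pattern"], "technical"),
    (re.compile(r"\b(?:dashboard|kpi)s?\b", re.I), ["Metric Definition Pattern"], "domain"),
    (re.compile(r"\b(?:stakeholder|approval|sign-off)s?\b", re.I),
     ["Governance Gate Pattern"], "governance"),
]

# Persona weighting expressed as an ordering of anchor kinds (assumed, not documented).
_ORDERINGS = {
    ("DE", "DS", "AE"): ["technical", "domain", "governance"],
    ("DA", "BI"): ["domain", "technical", "governance"],
    ("DPM", "DST", "CDO"): ["governance", "domain", "technical"],
}
PERSONA_ORDER = {p: order for personas, order in _ORDERINGS.items() for p in personas}


def anchors_for(task: str, persona: str) -> list[str]:
    """Recognition Mode only: match the task text against the catalog with no LLM call."""
    order = PERSONA_ORDER.get(persona, ["technical", "domain", "governance"])
    hits = [(kind, names) for pattern, names, kind in ANCHOR_CATALOG if pattern.search(task)]
    hits.sort(key=lambda hit: order.index(hit[0]))  # stable sort keeps catalog order within a kind
    return [name for _, names in hits for name in names]


print(anchors_for("make this function handle edge cases better", "DE"))
# ['Boundaries', 'EHRB-Cortex Pattern', 'Cortex Scalar UDF Pattern']
print(anchors_for("add edge cases to the KPI dashboard", "BI"))
# ['Metric Definition Pattern', 'Boundaries', 'EHRB-Cortex Pattern', 'Cortex Scalar UDF Pattern']
```

Regex matching over a handful of compiled patterns is what keeps this kind of lookup in the millisecond range; no model call is involved.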
When to use: You don't need to invoke it. It runs for you. Its presence is felt in agents that already know your naming conventions, already know your approved model list, already know the schema of the table they're about to work on — without you having to re-specify any of it.
Feature 25: CocoRecipe — Pre-Built Pipeline Templates
Commands: $recipe list, $recipe use <name> [--params], $recipe new <name>, $recipe from-skill <skill-slug>
Building your fourth AI classifier pipeline should not require designing the dependency graph from scratch. The stages are the same — inspect source schema, sample evaluation set, build the UDF, evaluate accuracy, governance review, deploy to production. The parameters change. The structure doesn't.
CocoRecipe captures that structure as a reusable template.
$recipe list shows all available recipes with their required parameters and estimated stage counts. The standard library covers the most common Cortex development tasks: cortex-add-classifier (AI_CLASSIFY UDF with labeled evaluation), cortex-add-search (Cortex Search service on a source table), cortex-semantic-model (Analytics Engineer semantic model from fact and dimension tables), cortex-add-extraction (AI_EXTRACT UDF with schema validation).
$recipe use <name> prompts for the required parameters — interactively if not supplied via --params — then substitutes them into the template and writes a complete, valid flow.json. The generated pipeline passes through the standard CocoBrew plan approval gate. Recipes accelerate the time from idea to approved pipeline. They do not bypass the approval gate.
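Parameter substitution itself is simple to sketch. The template shape below is hypothetical (CocoRecipe's real template format is not shown in the text). Python's `string.Template` fills `${name}` placeholders and raises `KeyError` for any parameter that was not supplied, which is the natural point for an interactive prompt to step in.

```python
import json
import string

# Hypothetical template shape; CocoRecipe's real format is not shown in the text.
CLASSIFIER_TEMPLATE = {
    "recipe": "cortex-add-classifier",
    "stages": [
        {"id": "inspect", "task": "Inspect schema of ${source_table}"},
        {"id": "eval-set", "task": "Sample ${eval_rows} labeled rows from ${source_table}"},
        {"id": "build", "task": "Build AI_CLASSIFY UDF for categories: ${categories}"},
    ],
}


def substitute(node, params: dict):
    """Recursively fill ${name} placeholders in strings nested in dicts and lists.

    string.Template.substitute raises KeyError for any parameter that is missing.
    """
    if isinstance(node, str):
        return string.Template(node).substitute(params)
    if isinstance(node, list):
        return [substitute(item, params) for item in node]
    if isinstance(node, dict):
        return {key: substitute(value, params) for key, value in node.items()}
    return node  # numbers, booleans, and None pass through unchanged


flow = substitute(CLASSIFIER_TEMPLATE, {
    "source_table": "RAW.TICKETS",
    "eval_rows": 200,
    "categories": "billing, outage, other",
})
print(json.dumps(flow, indent=2))
```

The substituted pipeline would still go through the plan approval gate; the sketch covers only the fill-in step.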
$recipe new <name> creates a recipe from a project's current flow.json, letting the developer mark which values become parameters. New recipes are saved to the CocoPlus profile folder, available across all projects.
Standard recipes incorporate Enhancement E isolation markers on deployment stages and Enhancement B circuit breaker configuration on optimization stages. You get these properties without knowing they're there.
When to use: Any time you're starting a Cortex AI function build that resembles work you've done before. The decision tree is simple: if the pipeline type has a recipe, use the recipe — you'll have an approved, stage-correct pipeline in minutes instead of designing flow.json from scratch. If the work is genuinely novel, build it manually and consider whether the result is worth saving as a recipe for the next time.
Feature 26: CocoDream — Supervised Cross-Session Pattern Distillation
Commands: $dream, $dream history [n]
After enough optimization sessions in Prompt Studio, something true becomes visible in the data: certain kinds of changes consistently improve accuracy, and certain kinds of changes consistently make things worse. But that pattern lives distributed across a dozen version files and a dozen session logs. Nobody has read all of them together.
CocoDream does.
$dream launches CocoCupper — running on Haiku — to read all prompt version histories in .cocoplus/prompts/, cluster the iteration patterns, identify what changes produced accuracy improvements versus regressions, and write candidate lessons in structured format. Each lesson candidate names what changed, what effect it had, how many observations support the claim, and a one-sentence actionable rule.
Then it stops. CocoDream does not promote lessons to CocoGrove. It does not update any pattern. It writes candidates to .cocoplus/grove/dream-[timestamp].md and presents them to the developer for review. Every lesson that enters the institutional knowledge base requires explicit developer promotion via $patterns promote.
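The pattern-matching half of that division of labor can be approximated in a few lines. This sketch substitutes a simple group-by on an explicit change label for CocoCupper's clustering step; the `Iteration` and `LessonCandidate` records, the `min_obs` cut-off, and the rule wording are all assumptions. Groups whose accuracy deltas point in mixed directions are dropped because they show no consistent effect.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Iteration:
    change: str        # hypothetical label for what changed between two prompt versions
    acc_before: float
    acc_after: float


@dataclass
class LessonCandidate:
    change: str
    effect: str        # "improves" or "regresses"
    observations: int
    mean_delta: float
    rule: str          # one-sentence actionable rule


def candidate_lessons(history: list[Iteration], min_obs: int = 3) -> list[LessonCandidate]:
    groups: dict[str, list[float]] = defaultdict(list)
    for it in history:
        groups[it.change].append(it.acc_after - it.acc_before)
    lessons = []
    for change, deltas in groups.items():
        if len(deltas) < min_obs:
            continue  # too few observations: noise, not a pattern
        if all(d > 0 for d in deltas):
            effect, rule = "improves", f"Prefer '{change}' when tuning similar functions."
        elif all(d < 0 for d in deltas):
            effect, rule = "regresses", f"Avoid '{change}' unless new evidence says otherwise."
        else:
            continue  # mixed results: no consistent direction to report
        mean = sum(deltas) / len(deltas)
        lessons.append(LessonCandidate(change, effect, len(deltas), round(mean, 4), rule))
    return lessons


history = [
    Iteration("add few-shot examples", 0.81, 0.86),
    Iteration("add few-shot examples", 0.84, 0.88),
    Iteration("add few-shot examples", 0.79, 0.83),
    Iteration("lengthen category descriptions", 0.86, 0.82),
    Iteration("lengthen category descriptions", 0.88, 0.85),
    Iteration("lengthen category descriptions", 0.83, 0.80),
    Iteration("switch model", 0.80, 0.84),
]
for lesson in candidate_lessons(history):
    print(lesson.change, lesson.effect, lesson.observations)
```

The function only returns candidates; nothing is written to CocoGrove and nothing is promoted.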
This restraint is philosophical, not technical. Autonomous belief update — a system that learns and acts on what it learned without human review — produces drift that is invisible until it has already shaped many decisions. CocoDream creates the conditions for supervised institutional learning: the AI does the pattern-matching work, the human does the curation.
$dream history [n] shows the last n dream session outputs, so lessons can be reviewed and promoted from past sessions, not just the most recent one.
When to use: After a series of Prompt Studio optimization sessions on a significant AI function — typically when you've run four or more iterations on the same function. Run it before starting the next optimization project in the same domain, so the lessons from prior work are available for CocoScout to load. Run it as a periodic review ceremony when the project accumulates significant optimization history. Don't run it on a project with fewer than three prompt versions — there isn't enough signal to identify patterns, only noise.
Feature 27: CocoHealth — Context Utilization Monitor
Commands: $pod checkpoint
Context windows degrade silently. There is no error, no warning, no obvious signal — the model simply begins making decisions that contradict earlier conclusions it can no longer reliably recall. By the time the developer notices, the reasoning has already been quietly compromised for some time.
CocoHealth is the monitoring layer that makes the invisible visible. It samples context utilization after each tool call via the PostToolUse hook. When utilization reaches 60%, it surfaces a soft advisory at the top of the next response. When it reaches 70%, it surfaces a prominent critical warning — not a generic "context is full" alert, but a structured recovery recommendation based on the actual state of the project.
The recovery recommendation at 70% is specific, not generic. CocoHealth evaluates three project state signals — whether uncommitted changes exist, whether recent phase commits are present in git history, and whether a checkpoint file exists — against a five-row decision matrix. Each row of the matrix maps a state combination to a specific recommended action. A developer with large uncommitted changes and an existing checkpoint gets a different recommendation than a developer with a clean working tree and no commits. The developer receives one next step, not a decision to reason through while context is already saturated.
$pod checkpoint writes lifecycle/checkpoint.md: the current CocoBrew phase, the in-progress CocoFlow stage, the last five decisions from memory/decisions.md, any pending must-fix items from review, active CocoHarvest harvest status, and the context utilization at time of writing. Before running $clear, run $pod checkpoint. After $clear, run $pod resume. The checkpoint is the bridge.
Both thresholds are configurable in plugin.json under cocoHealth.warnThreshold and cocoHealth.criticalThreshold. The defaults are 60% and 70%.
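A sketch of how the two configurable thresholds could map a utilization sample to an alert level. It assumes the thresholds are stored as percentages (0 to 100) under the `cocoHealth` key of plugin.json; the ordering check is an added safeguard rather than documented behavior, and the five-row recovery matrix is not modeled because its rows are not listed here.

```python
import json

DEFAULTS = {"warnThreshold": 60, "criticalThreshold": 70}  # defaults stated in the text


def load_thresholds(plugin_json: str) -> tuple[float, float]:
    """Read cocoHealth thresholds from plugin.json text, falling back to the defaults."""
    section = json.loads(plugin_json).get("cocoHealth", {})
    warn = section.get("warnThreshold", DEFAULTS["warnThreshold"])
    critical = section.get("criticalThreshold", DEFAULTS["criticalThreshold"])
    if not warn < critical:  # added safeguard, not documented behavior
        raise ValueError("warnThreshold must be lower than criticalThreshold")
    return warn, critical


def health_level(utilization_pct: float, warn: float, critical: float) -> str:
    """Classify one context-utilization sample taken after a tool call."""
    if utilization_pct >= critical:
        return "critical"  # prominent warning plus a structured recovery recommendation
    if utilization_pct >= warn:
        return "advisory"  # soft advisory at the top of the next response
    return "ok"


warn, critical = load_thresholds('{"cocoHealth": {"criticalThreshold": 75}}')
print(warn, critical, health_level(72, warn, critical))  # 60 75 advisory
```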
When to use: You don't explicitly invoke CocoHealth — it monitors every session automatically. What you do is build the habit of running $pod checkpoint before context-heavy operations: long optimization loops, multi-iteration Prompt Studio cycles, complex multi-stage pipeline builds. The checkpoint costs seconds. A context reset without one costs a rebuild from scratch.
Feature 28: CocoMap — Cortex Function Knowledge Graph
Commands: $map, $map diff, $map explain <target>
Most AI function development projects grow in one direction: forward. New functions are added, old ones extended, evaluation infrastructure accumulated. What rarely keeps pace is a coherent picture of the whole — what depends on what, which functions share evaluation sets, which business capabilities are actually implemented, and where the gaps are.
CocoMap builds that picture and keeps it as a committed project artifact.
$map triggers a parallel multi-agent analysis pipeline. Five analysis agents work concurrently — Function Scanner, Dependency Mapper, Domain Analyzer, Evaluation Mapper, and Gap Detector — each writing intermediate results to .cocoplus/map/intermediate/ without returning to the orchestrator's context window. This is deliberate: the analysis should not consume the context window it is designed to clarify. When all five complete, results merge into coco-map.json with two top-level sections. The Structural view is a technical dependency graph — which functions call others, which share warehouses, which share evaluation infrastructure. The Domain view is a business intent map — which business capabilities each function serves, and what vocabulary connects them to the domain. Both are committed to git as team artifacts. The developer sees a summary: function count, dependency edges, identified gaps, and detected capabilities.
$map diff reads the committed map and staged git changes. For each modified function, it traces all downstream dependents through the dependency graph and reports the full impact before the commit lands. A developer modifying a shared preprocessing function sees — before committing — every function that inherits the change, every shared evaluation set that will be affected, every capability definition that shifts.
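The impact trace is a reverse-dependency walk. Assuming the Structural view can be read as a map from each function to the things it depends on (the real coco-map.json layout is not shown here), a breadth-first search from the changed nodes finds every transitive dependent. The function names below are hypothetical.

```python
from collections import deque


def downstream_impact(depends_on: dict[str, list[str]], changed: set[str]) -> set[str]:
    """Return every node that transitively depends on any node in `changed`.

    `depends_on[x]` lists what x depends on; the graph is inverted so the
    walk can go from a changed node to its dependents, breadth-first.
    """
    dependents: dict[str, set[str]] = {}
    for node, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(node)

    impacted: set[str] = set()
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for nxt in dependents.get(current, ()):
            if nxt not in impacted:  # visited check also makes cycles safe
                impacted.add(nxt)
                queue.append(nxt)
    return impacted - changed


graph = {
    "classify_ticket": ["clean_text"],
    "extract_entities": ["clean_text"],
    "route_ticket": ["classify_ticket"],
    "eval_ticket_set": ["classify_ticket", "extract_entities"],
}
print(sorted(downstream_impact(graph, {"clean_text"})))
# ['classify_ticket', 'eval_ticket_set', 'extract_entities', 'route_ticket']
```

Modifying the shared preprocessing step `clean_text` surfaces both direct dependents and the functions that inherit the change through them.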
$map explain <target> produces a natural-language explanation of a specific function, capability, or schema element: what it does in domain terms, what depends on it, what it depends on, how it relates to its nearest business capability.
When to use: Run $map when the project accumulates enough Cortex AI functions that dependencies are no longer obvious from code inspection — typically after building three or more related functions. After that first run, re-run periodically: when significant new functions are added, before a large refactor, before a governance review. Use $map diff as a pre-commit habit on any change that touches functions other things depend on. Use $map explain when onboarding someone to a complex function ecosystem, or when you've returned after time away and need context fast.
Feature 29: CocoSeed — Deferred Ideas with Trigger Conditions
Commands: $seed add "<idea>" --trigger "<condition>", $seed list, $seed promote <id>
Good ideas that arrive at the wrong time either get acted on immediately (scope creep) or forgotten entirely (lost signal). Neither outcome is acceptable. CocoSeed provides the third option: capture the idea with a trigger condition, and surface it when the condition is met.
$seed add stores the idea as a structured file in .cocoplus/seeds/. The --trigger argument is plain language: "when CocoMap is committed," "when the Build phase completes," "when evaluation accuracy exceeds 90%." The seed also records when it was captured — which CocoBrew phase was active — so the idea carries its original context.
$seed list evaluates all trigger conditions against current project state. Filesystem presence checks, phase completion checks, and mode flag checks run automatically. Manual trigger conditions — "when stakeholder review is complete" — are shown as pending manual confirmation. The output is two sections: seeds Ready to Promote (triggered) and seeds Waiting (with their current trigger evaluation). An idea that was noise last week may be ready today.
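Plain-language triggers have to be mapped onto checkable conditions before they can be evaluated. This sketch assumes that mapping has already happened: the `Seed` fields and `trigger_kind` values are hypothetical, and file presence stands in for "committed," which a real check would confirm against git.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Seed:
    id: str
    idea: str
    trigger_kind: str  # hypothetical: "file_exists" | "phase_complete" | "mode_flag" | "manual"
    trigger_arg: str


def evaluate(seed: Seed, root: Path, completed_phases: set[str], mode_flags: set[str]) -> str:
    """Return "ready", "waiting", or "pending manual confirmation"."""
    if seed.trigger_kind == "file_exists":
        return "ready" if (root / seed.trigger_arg).exists() else "waiting"
    if seed.trigger_kind == "phase_complete":
        return "ready" if seed.trigger_arg in completed_phases else "waiting"
    if seed.trigger_kind == "mode_flag":
        return "ready" if seed.trigger_arg in mode_flags else "waiting"
    return "pending manual confirmation"  # manual triggers are never auto-evaluated


def seed_list(seeds, root, completed_phases, mode_flags):
    """Split seeds into (ready_to_promote, waiting), keeping each seed's status."""
    ready, waiting = [], []
    for seed in seeds:
        status = evaluate(seed, root, completed_phases, mode_flags)
        (ready if status == "ready" else waiting).append((seed.id, status))
    return ready, waiting


seeds = [
    Seed("s1", "Add drift monitoring", "phase_complete", "build"),
    Seed("s2", "Share eval set across classifiers", "file_exists", "coco-map.json"),
    Seed("s3", "Present results", "manual", "stakeholder review is complete"),
]
with tempfile.TemporaryDirectory() as tmp:
    ready, waiting = seed_list(seeds, Path(tmp), {"spec", "plan", "build"}, set())
print(ready)    # [('s1', 'ready')]
print(waiting)  # [('s2', 'waiting'), ('s3', 'pending manual confirmation')]
```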
$seed promote <id> moves a triggered seed into the planning cycle: it writes the idea as a new item in lifecycle/spec.md under a "Backlog Items" section and marks the seed as promoted. From there, the idea enters the standard CocoBrew lifecycle — $spec incorporates it, $cocoplus plan plans it.
The SessionStart hook evaluates all pending seeds at the start of each session. If any have newly fired, the session startup summary includes the notification: the developer never has to remember to check.
When to use: Any time a thought about future work arrives during active work on something else. The discipline is simple: don't act on it now, don't lose it — $seed add with a trigger condition, and let the project tell you when the time is right. The trigger condition is the commitment. A seed without a trigger is just a note. A seed with a trigger is a deferred decision with a built-in promotion mechanism.
Feature 30: CocoDiscuss — Decision-Locking Pre-Plan Phase
Commands: $discuss
Plans fail not because they're incompetent but because they silently encode decisions the developer never consciously made. The planning agent fills in gaps the spec left open. It makes reasonable choices. Those choices may be wrong — not wrong in a way that produces an error, but wrong in a way that produces a plan the developer wouldn't have approved if they'd known what choices were being made.
CocoDiscuss is the conversation that happens before the plan. It surfaces the implementation decisions a spec doesn't answer — model selection, evaluation methodology, accuracy threshold, warehouse assignment, scope boundaries — and locks the developer's answers into lifecycle/discuss.md before $cocoplus plan runs.
The dialogue is a structured wizard that adapts to the spec. If the spec describes a Cortex AI function, it asks about model selection and evaluation criteria. If it describes a Cortex Search configuration, it asks about indexing strategy and latency requirements instead. If CocoContext already has organizational standards that answer a question definitively — an approved models list, a mandated accuracy floor — the wizard skips that question and records the standard's answer with attribution. Each session, only the open questions require developer input.
When $cocoplus plan runs and discuss.md exists, the planning agent reads it as a constraint document. After generating the plan, a plan-checker validation step compares plan choices against discuss.md decisions. Contradictions surface as Pre-Approval Conflicts: the developer sees them before plan approval and can accept, adjust the plan, or update the discuss decision. Nothing is hidden. Nothing drifts silently.
CocoSpec, a quantitative specification quality gate: Before $cocoplus plan begins (whether or not $discuss was invoked), a Haiku subagent scores the combined specification artifacts on five dimensions: Value (is the rationale present with verifiable success metrics?), Scope (is the MVP defined and out-of-scope explicit?), Acceptance (are the criteria testable?), Boundaries (are error handling, performance, and security requirements addressed?), and Risk (are EHRB indicators identified with mitigations?). Each dimension scores 0–2, for a maximum of 10; the gate requires a score of 9 or higher. A score of 7 or 8 holds the gate with targeted feedback. A score of 6 or below holds the gate and requires an Uncertainty Declaration: explicit, named assumptions wherever the spec is silent. The spec score is committed to lifecycle/spec-score.md, so assumptions made explicit at planning time are traceable at review time.
Quick Mode bypass: When the spec scores ≥9, scope is ≤3 files, and no EHRB indicators are present, CocoDiscuss recommends skipping the Plan phase entirely — proceeding directly from Spec to Build. Quick Mode requires explicit developer confirmation. When the spec is tight and the scope is small, the full planning ceremony is overhead. CocoDiscuss recognizes the threshold and says so.
Vague Language Detector: CocoSpec's 5-dimension scoring evaluates whether each dimension is covered. The Vague Language Detector catches a narrower failure mode that dimension scoring cannot reliably detect: a specification that appears complete but uses unmeasured language that renders its acceptance criteria unverifiable. "The function should be fast and accurate" passes a coverage check — success criteria are present. It fails a quality check — neither "fast" nor "accurate" can be tested without a number. Before the scoring agent runs, a deterministic script pattern-matches spec.md (and discuss.md if present) for vague adjectives appearing without accompanying measurements — terms like "accurate," "scalable," "fast," "secure," and "cost-effective" across six categories. Each detected instance deducts one point from the CocoSpec total, capped at three points, and the developer sees exactly which term appeared on which line alongside a concrete suggestion for replacement: "Specify: accuracy ≥ X% on labeled eval set."
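A rough sketch of the deterministic pass: scan each line for vague terms and flag them only when the line carries no measurement. The term list is just the five examples named above (not the full six-category list), and "any digit on the line counts as a measurement" is a crude stand-in for a real check; both are assumptions.

```python
import re

# Illustrative subset of vague terms; the detector's six categories are not reproduced here.
VAGUE_TERMS = ["accurate", "scalable", "fast", "secure", "cost-effective"]
MEASUREMENT = re.compile(r"\d")  # crude proxy: any digit on the line counts as a measurement
SUGGESTIONS = {"accurate": "Specify: accuracy ≥ X% on labeled eval set."}
MAX_DEDUCTION = 3


def detect_vague(spec_text: str):
    """Return ([(line_no, term, suggestion), ...], point deduction capped at MAX_DEDUCTION)."""
    findings = []
    for line_no, line in enumerate(spec_text.splitlines(), start=1):
        if MEASUREMENT.search(line):
            continue  # the line already carries a number, so treat its terms as measured
        for term in VAGUE_TERMS:
            if re.search(rf"\b{re.escape(term)}\b", line, re.IGNORECASE):
                findings.append((line_no, term, SUGGESTIONS.get(term, "Replace with a measurable target.")))
    return findings, min(len(findings), MAX_DEDUCTION)


spec = "\n".join([
    "The function should be fast and accurate.",
    "Must be fast: p95 latency under 200 ms.",
    "It must be secure.",
    "It should also be scalable.",
])
findings, deduction = detect_vague(spec)
for line_no, term, suggestion in findings:
    print(f"line {line_no}: '{term}' -> {suggestion}")
print("deduction:", deduction)  # deduction: 3
```

Line 2 uses "fast" but is not flagged because it states a latency target; four findings produce only a three-point deduction because of the cap.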
Three-Outcome Implementation Readiness Gate: Binary pass/fail gates work well when a specification is clearly ready or clearly not. They handle the middle case poorly: a specification that is strong in four of five dimensions but underspecified in one is not ready to proceed unchanged, yet it does not deserve the same response as one that fails across multiple dimensions. The CocoSpec gate therefore has three outcomes. PASS (score ≥ 9) opens the gate immediately. CONCERNS (score 7–8) holds the gate and provides targeted feedback on the specific failing dimensions; on resubmission only those dimensions are rescored, so it is a focused repair rather than a full restart. FAIL (score ≤ 6) requires a complete Uncertainty Declaration before any dimension can be rescored, and the specification must be substantially revised. Before any subagent spawns (CocoHarvest, SecondEye, evaluation agents), a pre-spawn alignment check validates that evaluation criteria, accuracy thresholds, schema references, and warehouse assignments are consistent across discuss.md, spec.md, and cocoplus-context.md. Inconsistencies block agent launch until the developer reconciles them, moving a class of conflicts from review time to planning time.
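Putting the scoring rules together: the sketch below totals the five 0–2 dimension scores, subtracts the vague-language deduction (capped at three), and maps the result to PASS (≥ 9), CONCERNS (7–8), or FAIL (≤ 6). It also encodes the Quick Mode eligibility conditions (score ≥ 9, at most three files, no EHRB indicators). Function and parameter names are hypothetical, and treating "failing dimensions" as those below full marks is an assumption.

```python
DIMENSIONS = ("value", "scope", "acceptance", "boundaries", "risk")
MAX_VAGUE_DEDUCTION = 3


def spec_gate(scores: dict[str, int], vague_findings: int = 0) -> tuple[str, int]:
    """Return (outcome, total) for a CocoSpec-style readiness gate."""
    for dim in DIMENSIONS:
        if scores.get(dim) not in (0, 1, 2):
            raise ValueError(f"{dim} must be scored 0, 1, or 2")
    total = sum(scores[d] for d in DIMENSIONS) - min(vague_findings, MAX_VAGUE_DEDUCTION)
    if total >= 9:
        return "PASS", total
    if total >= 7:
        return "CONCERNS", total  # rescore only the failing dimensions
    return "FAIL", total          # Uncertainty Declaration required before rescoring


def failing_dimensions(scores: dict[str, int]) -> list[str]:
    """Dimensions below full marks (assumed definition of 'failing')."""
    return [d for d in DIMENSIONS if scores[d] < 2]


def quick_mode_eligible(total: int, files_in_scope: int, ehrb_indicators: int) -> bool:
    """Eligibility only; Quick Mode still needs explicit developer confirmation."""
    return total >= 9 and files_in_scope <= 3 and ehrb_indicators == 0


scores = {"value": 2, "scope": 2, "acceptance": 2, "boundaries": 1, "risk": 2}
print(spec_gate(scores))                    # ('PASS', 9)
print(spec_gate(scores, vague_findings=1))  # ('CONCERNS', 8)
print(failing_dimensions(scores))           # ['boundaries']
```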
When to use: Before any spec whose implementation involves choices you'd regret discovering implicitly in the plan: most AI function builds, any build that touches production schemas, and anything with an evaluation methodology that must be agreed upon. The discuss session adds five minutes; the plan-checker adds none. The decisions it captures are the ones you were going to make anyway. CocoDiscuss just makes sure you made them consciously, before the agent made them for you.
Feature 31: CocoLens — HITL/AFK Task Classification
Commands: None — integrated into CocoHarvest decomposition
CocoHarvest decomposes large plans into parallel subagent workstreams. The decomposition produces a list of tasks. Historically, all of those tasks either ran autonomously or didn't — the autonomous/supervised boundary was implicit, discovered when something happened that the developer didn't expect.
CocoLens makes the boundary explicit at planning time.
During CocoHarvest decomposition, every task is classified as either HITL (Human In The Loop) or AFK (Away From Keyboard). Classification is deterministic, not probabilistic. HITL defaults apply to tasks that modify a schema definition, establish or change an evaluation baseline, deploy to a production schema, make a model selection that hasn't been locked in discuss.md, or are described using "decide," "approve," "confirm," "select," or "choose." AFK defaults apply to tasks described using "run," "execute," "compute," "generate," "analyze," or "validate"; to tasks that produce file outputs without modifying schemas; and to documentation generation and evaluation runs against a fixed methodology.
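Because the rules are deterministic, they can be written as an ordinary function. The task fields below (`modifies_schema`, `deploys_to_production`, and so on) are hypothetical flags standing in for however CocoLens inspects a task, and two behaviors are assumptions not stated in the text: HITL rules win when both rule sets match, and a task matching neither falls back to HITL.

```python
import re

HITL_VERBS = {"decide", "approve", "confirm", "select", "choose"}
AFK_VERBS = {"run", "execute", "compute", "generate", "analyze", "validate"}


def classify(task: dict) -> str:
    """Return the default "HITL" or "AFK" label for one decomposed task, before overrides."""
    words = set(re.findall(r"[a-z]+", task["description"].lower()))
    needs_human = (
        task.get("modifies_schema", False)
        or task.get("changes_eval_baseline", False)
        or task.get("deploys_to_production", False)
        or (task.get("selects_model", False) and not task.get("model_locked_in_discuss", False))
        or bool(words & HITL_VERBS)
    )
    if needs_human:  # assumption: HITL rules take precedence over AFK rules
        return "HITL"
    if (words & AFK_VERBS
            or task.get("produces_files_only", False)
            or task.get("kind") in {"documentation", "evaluation_run"}):
        return "AFK"
    return "HITL"  # assumption: unmatched tasks fall back to supervision


print(classify({"description": "Choose the production model"}))                 # HITL
print(classify({"description": "Run accuracy evaluation on the eval set"}))     # AFK
print(classify({"description": "Run the migration", "modifies_schema": True}))  # HITL
```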
The classification is shown to the developer during the decomposition review — the step where the developer sees the task list before spawning begins. Every classification can be overridden. No task is immutably HITL or AFK; the developer's judgment takes precedence over the default. The override is recorded in flow.json.
HITL stages in flow.json carry "hitl": true. When the pipeline reaches a HITL stage, the orchestrator pauses and presents an approval prompt: the stage name, its task description, its assigned persona. The developer approves to spawn the subagent or rejects to queue the stage for manual rework. The pipeline continues on approval; it waits on rejection. AFK stages spawn without interruption.
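The pause-on-HITL behavior reduces to a small control loop. This sketch runs stages strictly in order, a simplification of CocoHarvest's parallel workstreams, and models the approval prompt and subagent spawning as caller-supplied callbacks; all names are hypothetical.

```python
from typing import Callable, Optional


def run_pipeline(
    stages: list[dict],
    approve: Callable[[dict], bool],
    spawn: Callable[[dict], None],
) -> tuple[list[str], Optional[str]]:
    """Spawn stages in order, pausing at HITL stages for developer approval.

    Returns the names of spawned stages and, if a HITL stage was rejected,
    the name of the stage queued for manual rework (the pipeline waits there).
    """
    spawned: list[str] = []
    for stage in stages:
        # approve() receives the whole stage, so a prompt can show name, task, and persona
        if stage.get("hitl", False) and not approve(stage):
            return spawned, stage["name"]
        spawn(stage)
        spawned.append(stage["name"])
    return spawned, None


stages = [
    {"name": "inspect-schema", "persona": "DE"},
    {"name": "select-model", "persona": "DS", "hitl": True},
    {"name": "run-evaluation", "persona": "DS"},
]
print(run_pipeline(stages, approve=lambda s: False, spawn=lambda s: None))
# (['inspect-schema'], 'select-model')
```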
$flow status gains a Type column showing HITL or AFK for each stage. Pending HITL stages display as "Awaiting approval" until the developer approves.
When to use: It's automatic — CocoLens classifies during every CocoHarvest decomposition. Its value is not in invocation but in the moment before spawning: read the Type column in the decomposition review and flip any classification that doesn't match your intent. The defaults are calibrated for Snowflake Cortex development patterns, but you built the spec and you know the intent. Changing a classification costs one character. Discovering mid-pipeline that an autonomous stage made an architectural choice costs considerably more.
Feature 32: CocoWatch — Developer Engagement Observer
Commands: None — summary surfaced at $cocoplus ship and FULL checkpoints
CocoWatch measures how the developer engages with what CocoPlus produces. Not the quality of the outputs — CocoHealth, CocoMeter, and SecondEye cover that. The quality of the engagement: whether approvals are considered or reflexive, whether BLOCKING findings are read before being acknowledged, whether decomposition reviews produce any modifications or just confirmations.
The failure mode CocoWatch targets is not AI error. It is developer autopilot. A system that accepts every finding without modification, approves every decomposition without question, and confirms every checkpoint without review is technically being used while functionally not being supervised. CocoWatch makes this pattern visible — without ever interrupting it.
Three dimensions: Delegation Intensity measures the ratio of outputs accepted without modification to outputs modified before acceptance, calibrated to session complexity. Review Depth measures the quality of engagement: time elapsed before acknowledging a finding, whether reasoning was provided alongside an acknowledgment, whether a BLOCKING finding was modified or simply accepted. Engagement Zone is the synthesized classification derived from both dimensions at session end — Zone 1 (over-delegation: high intensity, low depth), Zone 2 (healthy collaboration: moderate intensity, adequate depth), Zone 3 (under-utilization: low intensity, high depth, suggesting a trust deficit or system-task mismatch).
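A sketch of how the three dimensions could be computed from acknowledgment events. Every threshold here is hypothetical; Delegation Intensity is expressed as a fraction rather than a raw ratio so it stays bounded; the calibration to session complexity is omitted; and treating every unmatched combination as Zone 2 is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Acknowledgment:
    modified: bool         # was the output changed before it was accepted?
    seconds_to_ack: float  # time between the output surfacing and its acknowledgment
    gave_reasoning: bool   # did the developer explain the acknowledgment?


def delegation_intensity(acks: list[Acknowledgment]) -> float:
    """Fraction of outputs accepted without modification (0..1)."""
    if not acks:
        return 0.0
    return sum(not a.modified for a in acks) / len(acks)


def review_depth(acks: list[Acknowledgment], min_seconds: float = 20.0) -> float:
    """Fraction of acknowledgments showing engagement: unhurried, reasoned, or modified."""
    if not acks:
        return 0.0
    engaged = sum(a.seconds_to_ack >= min_seconds or a.gave_reasoning or a.modified for a in acks)
    return engaged / len(acks)


def engagement_zone(intensity: float, depth: float) -> int:
    """Map the two readings to a zone using hypothetical cut-offs."""
    if intensity >= 0.9 and depth < 0.4:
        return 1  # over-delegation: nearly everything accepted with little engagement
    if intensity <= 0.3 and depth >= 0.8:
        return 3  # under-utilization: most outputs reworked under heavy scrutiny
    return 2      # healthy collaboration (assumed default for everything else)


acks = [Acknowledgment(False, 3.0, False)] * 9 + [Acknowledgment(True, 90.0, True)]
print(engagement_zone(delegation_intensity(acks), review_depth(acks)))  # 1
```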
The session summary surfaces at $cocoplus ship and at FULL checkpoints. For Zone 2 sessions, it shows the zone and the two dimension readings. For Zone 1 sessions, it lists the specific findings that received below-threshold engagement time, by filename and finding number. For Zone 3 sessions, it adds no further guidance: the developer is clearly engaged, so CocoWatch has nothing to advise.
Non-blocking by design: blocking: false is enforced in CocoWatch's .agent.md frontmatter as a structural guarantee, not a convention. The orchestrator is explicitly required to refuse any dispatch of CocoWatch with blocking behavior. CocoWatch never holds a gate, never delays a phase, and never requires a developer response. It observes. Session observations are written to lifecycle/cocowatch-session.md — an ephemeral file, not committed to git, never used as input by any other feature.
When to use: It is always on. Its value is not in invocation but in the moment before shipping: read the Zone 2 or Zone 1 summary and consider whether the two or three flagged findings deserve a second look before the work leaves the pod. The summary adds thirty seconds. The findings it surfaces are the ones that moved fastest through your attention.
Feature 33: CocoBloom — Working Backwards Pre-Commitment Gate
Commands: $bloom, $bloom --skip, $bloom crystallize
The clearest way to discover whether you understand what you are building is to write the announcement for the completed thing before you begin. If you cannot write the announcement, you do not yet know what you are building well enough to specify it.
CocoBloom is an optional pre-spec phase that asks the developer to commit to the outcome before committing to the specification. The coffee metaphor is precise: the bloom is the thirty-second pre-infusion step that saturates the grounds before the main extraction begins. It is not the brew — it is the preparation that makes the brew work. A developer who skips it does not fail; they produce a slightly less settled result. A developer who practices it consistently produces specifications that stay anchored to their original intent.
The four-question dialogue: $bloom opens a structured conversation. Who benefits from this and what changes for them? What is the core capability in one sentence? What are the three constraints that bound this solution? Write the press release paragraph — a one-paragraph announcement written as if the feature is already shipped and working. The last question is the commitment artifact. The agent provides examples and encouragement; the content is the developer's. The dialogue output becomes lifecycle/bloom.md, committed to git.
Integration with downstream phases: $spec reads bloom.md as its first context before beginning the requirements dialogue — the pre-commitment anchors the specification to the stated intent. CocoDiscuss, when active, includes bloom.md in its pre-plan validation and surfaces a Concern (not a block) if the final plan has diverged from the original beneficiary or constraint set. At $cocoplus ship, CocoWatch notes whether the bloom commitment was met, partially diverged, or was waived.
If neither $bloom nor $bloom --skip has been run, $spec shows a one-line advisory: "No bloom document found — consider $bloom before specifying." $bloom --skip writes "bloom_waived": true to lifecycle/meta.json and suppresses the advisory for the rest of that session.
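The advisory check and the waiver can be sketched as two small functions over the lifecycle directory. The sketch persists the flag in meta.json as described but does not model the per-session scoping; the function names are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def should_show_bloom_advisory(lifecycle: Path) -> bool:
    """True when there is neither a bloom.md nor a recorded waiver."""
    if (lifecycle / "bloom.md").exists():
        return False
    meta_path = lifecycle / "meta.json"
    if meta_path.exists() and json.loads(meta_path.read_text()).get("bloom_waived") is True:
        return False
    return True


def waive_bloom(lifecycle: Path) -> None:
    """Record bloom_waived in meta.json, preserving any keys already there."""
    meta_path = lifecycle / "meta.json"
    meta = json.loads(meta_path.read_text()) if meta_path.exists() else {}
    meta["bloom_waived"] = True
    meta_path.write_text(json.dumps(meta, indent=2))


with tempfile.TemporaryDirectory() as tmp:
    lifecycle = Path(tmp)
    before = should_show_bloom_advisory(lifecycle)  # no bloom.md and no waiver yet
    waive_bloom(lifecycle)
    after = should_show_bloom_advisory(lifecycle)
print(before, after)  # True False
```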
Crystallization pathway: The working-backwards direction of CocoBloom is pre-commitment, an outward-facing commitment made before the build begins. The crystallization pathway adds the complementary direction: after a successful build, the execution pattern is preserved as a reusable skill. $bloom crystallize becomes available immediately after a successful CocoPod pipeline completion, meaning one in which the last $cocoplus ship completed without any BLOCKED CocoSentinel or FAIL SecondEye verdict. The crystallization process uses two agents in sequence. A Haiku agent reads the full execution trace (the stage history in flow.json, the checkpoint artifacts, and the CocoCupper findings from the session) and produces a structured trace summary identifying the recurring decision pattern. A Sonnet agent receives that trace summary alongside bloom.md and discuss.md and drafts a skill document that captures what was done, why each decision was made, and what constraints governed the approach. The draft is presented to the developer, who must explicitly accept or revise it before anything is written. Only after confirmation does the skill land at .cocoplus/skills/<slug>.skill.md in the CocoGrove catalog. Because the skill records why the work was done and not just what was done, the original intent from bloom.md travels with the implementation pattern. Crystallized skills may be promoted to parameterized CocoRecipe templates via $recipe from-skill <skill-slug>, creating a bidirectional learning cycle: templates become proven instances when builds are run from them, and proven instances become reusable templates through crystallization and promotion.
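The eligibility rule and the confirmation gate can be expressed as two small checks. This sketch assumes the verdicts from the last ship are available as simple records. GateVerdict, can_crystallize, skill_target, and the slug rule are hypothetical names and choices made for illustration. The two agent passes themselves are not shown.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class GateVerdict:
    gate: str     # e.g. "CocoSentinel" or "SecondEye"
    outcome: str  # e.g. "APPROVED", "BLOCKED", "PASS", "FAIL"


def can_crystallize(last_ship_verdicts: List[GateVerdict]) -> bool:
    """Eligible only if the last ship had no BLOCKED CocoSentinel and no FAIL SecondEye verdict."""
    return not any(
        (v.gate == "CocoSentinel" and v.outcome == "BLOCKED")
        or (v.gate == "SecondEye" and v.outcome == "FAIL")
        for v in last_ship_verdicts
    )


def skill_target(title: str, developer_accepted: bool) -> Optional[str]:
    """Return the catalog path for an accepted draft, or None if nothing may be written.

    The slug rule (lowercase, runs of non-alphanumerics collapsed to '-') is hypothetical.
    """
    if not developer_accepted:
        return None  # nothing is written until the developer confirms the draft
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f".cocoplus/skills/{slug}.skill.md"
```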
When to use: Before any build where specification drift is a risk — where the project is large enough that the plan might gradually reinterpret the original intent, or where a stakeholder expectation exists that the delivered artifact must match. Two minutes of pre-commitment prevents the review conversation that begins: "This isn't quite what I had in mind." Run $bloom crystallize after successful completions of non-trivial builds to accumulate a catalog of proven patterns — the tenth time you solve a similar class of problem, the crystallized skills from prior completions give the next build a head start grounded in empirical evidence rather than memory.
Feature 34: CocoKlatch — Genuine Multi-Agent Roundtable
Commands: $klatch <topic>, $klatch --participants=N
When one model is asked to play three roles, it imports its own priors into every role it plays. The Skeptic it voices shares the same underlying judgment as the Optimist it voices. The critique is structured but not independent.
CocoKlatch produces genuine independence by spawning separate subagents rather than asking one agent to simulate multiple perspectives. Each participant receives the same topic brief plus its own assigned perspective, and nothing else: not the names of the other participants, not their roles, not their outputs. Each participant reasons in isolation and produces its analysis independently. This is not simulated diversity of varying quality; it is structurally independent analysis.
The name is deliberate. A kaffeeklatsch is an informal gathering where people with different perspectives speak freely rather than performing for a record. CocoKlatch encodes this principle: the participants are honest, not diplomatic.
How it works: The developer provides a topic (a decision, a question, or a problem). CocoKlatch prepares a structured brief that clarifies the decision space, the options if known, and the evaluation criteria. N participant subagents are spawned simultaneously, each assigned a distinct perspective: The Skeptic (interrogates assumptions and failure modes), The Optimist (identifies what could go right and which opportunities are underweighted), The Pragmatist (focuses on what is achievable given the stated constraints), The Domain Expert (applies deep Snowflake/Cortex technical knowledge; always assigned when there are three or more participants), and The Contrarian (takes the position least supported by the framing to surface its hidden strengths). The default is three participants, configurable from two to five via --participants=N.
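The assignment and isolation rules can be sketched as follows. The perspective names and the Domain Expert rule come from the text. The order in which the other perspectives are chosen is an assumption, and run_participant is a hypothetical stand-in for spawning a subagent. Isolation is structural here as well: each call receives only the brief and its own perspective, so no participant can read another's output.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Perspectives named in the text; the selection order below is a hypothetical choice.
GENERAL = ["The Skeptic", "The Optimist", "The Pragmatist", "The Contrarian"]
DOMAIN_EXPERT = "The Domain Expert"


def assign_perspectives(n: int) -> List[str]:
    """Pick N distinct perspectives; the Domain Expert is always included when N >= 3."""
    if not 2 <= n <= 5:
        raise ValueError("--participants must be between 2 and 5")
    if n < 3:
        return GENERAL[:n]
    return GENERAL[: n - 1] + [DOMAIN_EXPERT]


def run_klatch(brief: str, n: int, run_participant: Callable[[str, str], str]) -> Dict[str, str]:
    """Run every participant in parallel; each call sees only the brief and its own perspective."""
    perspectives = assign_perspectives(n)
    with ThreadPoolExecutor(max_workers=len(perspectives)) as pool:
        futures = {p: pool.submit(run_participant, brief, p) for p in perspectives}
        return {p: f.result() for p, f in futures.items()}
```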
When all participants have completed, a synthesis agent reads all outputs and produces a document covering: points of agreement across ≥ 2 participants, points of divergence with each position stated, open questions no participant resolved, and a recommended decision path with the primary remaining uncertainty named. All klatch artifacts — brief, participant outputs, synthesis — are committed to git under lifecycle/klatch/.
When to use: When the question is genuinely open and the consequence of a wrong answer is significant. Architecture decisions where reasonable people disagree. Evaluation methodology choices where you have priors but want them challenged. Any build where CocoDiscuss surfaces implementation choices that warrant deeper exploration before locking them. CocoKlatch is not a substitute for experience — it is a way of surfacing the considerations that experience is sometimes too habituated to surface on its own.
Feature 35: CocoPull — Lossless Context Distillation
Commands: $pull <target>, $pull --validate, $pull search "<query>"
Evaluation artifacts grow. Schema introspection outputs grow. Stage analysis files accumulate. At a certain point, loading all of them into agent context creates pressure that the context budget cannot comfortably absorb — and the response to that pressure is usually truncation, which loses the tail, or summarization, which loses information.
CocoPull offers a third option: distillation. A structural reorganization of the content that preserves all decision-relevant information in a form that a reading agent can traverse more efficiently. The espresso analogy is technically precise. A pull does not reduce the complexity of what it processes — it concentrates it. The distilled artifact carries the same decision-relevant information as the source, in fewer tokens.
How it works: Distillation runs in two stages. A structure analysis agent (Haiku — inspection tier) reads the target and identifies its information architecture: what sections exist, what type of content each contains, and which sections carry decision-relevant facts versus background context. A distillation agent (Sonnet) takes this analysis and produces the pull file. The rules are strict: content that would be directly queried for a decision — accuracy thresholds, schema field names, evaluation criteria, constraint values — is preserved verbatim. Narrative explanations are compressed to their conclusion. Redundant content is deduplicated to a single canonical reference. Tabular data with more than twenty rows becomes the first five rows plus a statistical summary. The pull file's header records the source path, distillation timestamp, original token count estimate, compressed token count, and compression ratio.
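The table rule and the header fields can be sketched in Python. The summary statistics (min, max, mean), the four-characters-per-token estimate, and the header key names are assumptions. The text specifies only the twenty-row cut-off, the five retained rows, and which fields the header records.

```python
import statistics
from typing import Dict, List


def distill_table(rows: List[Dict[str, object]], max_rows: int = 20, keep: int = 5) -> dict:
    """Keep small tables verbatim; replace large ones with the first rows plus a summary."""
    if len(rows) <= max_rows:
        return {"rows": rows}
    numeric = [c for c, v in rows[0].items() if isinstance(v, (int, float))]
    summary = {
        c: {
            "min": min(r[c] for r in rows),
            "max": max(r[c] for r in rows),
            "mean": statistics.fmean(r[c] for r in rows),
        }
        for c in numeric
    }
    return {"rows": rows[:keep], "omitted_rows": len(rows) - keep, "summary": summary}


def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # rough 4-characters-per-token heuristic (an assumption)


def pull_header(source_path: str, original: str, distilled: str, timestamp: str) -> str:
    """Build the header fields listed in the text; the key names are hypothetical."""
    before, after = estimate_tokens(original), estimate_tokens(distilled)
    return (
        f"source: {source_path}\n"
        f"distilled_at: {timestamp}\n"
        f"original_tokens_est: {before}\n"
        f"compressed_tokens_est: {after}\n"
        f"compression_ratio: {before / after:.2f}\n"
    )
```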
Round-trip validation: $pull --validate tests whether the distillation was lossless for the decisions that matter. A verification agent reads only the pull file and answers a set of probe questions generated from the source. A separate agent answers the same questions from the original to establish ground truth, and the two sets of answers are compared. A pull file scoring below an 85% match rate is flagged "reliability: low" in its header; CocoHarvest will not automatically use it as a stage input and loads the original instead.
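A sketch of the scoring step follows. It assumes answers are compared by exact match after normalising case and whitespace. The text says the comparison involves a separate agent, so this deterministic comparison is a simplified stand-in. The "ok" label is hypothetical; the 85% threshold and the "low" flag come from the text.

```python
from typing import Dict


def _norm(answer: str) -> str:
    return " ".join(answer.lower().split())


def match_rate(pull_answers: Dict[str, str], ground_truth: Dict[str, str]) -> float:
    """Fraction of probe questions whose pull-file answer matches the ground truth."""
    if not ground_truth:
        raise ValueError("at least one probe question is required")
    hits = sum(
        _norm(pull_answers.get(q, "")) == _norm(truth) for q, truth in ground_truth.items()
    )
    return hits / len(ground_truth)


def reliability_flag(rate: float, threshold: float = 0.85) -> str:
    """'low' below the 85% threshold from the text; the 'ok' label is hypothetical."""
    return "low" if rate < threshold else "ok"
```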
CocoHarvest integration: When CocoHarvest prepares stage inputs, it checks each input file's size. If a file exceeds the configured threshold (default: 8,000 tokens estimated), it looks for a .pull.md counterpart. If found and reliable, it uses the pull file. If not found, it runs $pull automatically. If validation fails, it uses the original with a warning. Pull files are gitignored by default — they are derived artifacts, regenerable on demand. Developers who want them in version history can enable it via $pod config set pull.commit=true.
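The selection logic described above maps onto a short decision function. File I/O is replaced by injected callables so the logic stands alone. The .pull.md naming rule, the function names, and the warning text are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Callable, Optional, Set, Tuple


def pull_path_for(source: str) -> str:
    """Hypothetical naming rule: lifecycle/eval.json -> lifecycle/eval.pull.md."""
    p = PurePosixPath(source)
    return str(p.parent / f"{p.stem}.pull.md")


def choose_stage_input(
    source: str,
    est_tokens: int,
    existing_pulls: Set[str],
    run_pull: Callable[[str], None],
    is_reliable: Callable[[str], bool],
    threshold: int = 8_000,
) -> Tuple[str, Optional[str]]:
    """Return (path to load, optional warning) following the rules in the text."""
    if est_tokens <= threshold:
        return source, None  # small enough to load in full
    pull = pull_path_for(source)
    if pull not in existing_pulls:
        run_pull(source)  # stands in for running $pull automatically
        existing_pulls.add(pull)
    if is_reliable(pull):
        return pull, None
    return source, f"warning: {pull} is flagged reliability: low; loading {source} instead"
```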
Session archive search: CocoPull extends beyond in-session distillation to become the project's complete knowledge access layer. $pull search "<query>" performs full-text search across all session archives — past spec files, discuss records, review summaries, bloom documents, and klatch syntheses committed to the project's history. Results are ranked by relevance, with session archetype and date filters available. This makes prior work retrievable by what was decided, not by when the session ran. A developer returning to a pattern from six months ago doesn't need to remember the sprint — they search for the decision, the constraint, or the output that needs to match. The search index is maintained incrementally by the wisdom-writer hook alongside the CocoWisdom rejection store, adding no per-session overhead.
When to use: When stage inputs are large enough that loading them in full would strain the context budget, or when you want to verify that a large document can be faithfully represented in fewer tokens before relying on it as an agent input. The --validate flag is worth using the first time you pull a given document type — if the validation score is high, you can trust the pull pattern for documents of that type going forward. Use $pull search any time you need to retrieve a prior decision, constraint, or evaluation methodology without knowing which session it lives in — searching project history is faster and more reliable than reading through committed files sequentially.
Feature 36: CocoSentinel — Artifact Quality Gate
Commands: $sentinel [file], $sentinel --report, $sentinel --approve
SecondEye closes the quality loop on specifications — it asks whether the plan is sound before any agent executes. CocoSentinel closes the quality loop on artifacts — it asks whether the output is sound before it proceeds through the lifecycle. The two gates form a bracket around execution: SecondEye evaluates before, CocoSentinel evaluates after.
The architectural model is precise and deliberate: no single reviewer sees another reviewer's verdict before writing their own. Seven dimension sub-agents evaluate the same artifact in parallel, each in an independent context window, each seeded only with the artifact, its mandate for the specific dimension, and the two most recent rejection records from CocoWisdom for that dimension. Independence is not a behavioral instruction — it is structural. The synthesis agent reads all seven verdicts only after all seven are complete.
The seven dimensions: Security Attack Surface (A1 — are user-controlled values ever interpolated into query strings? Are all IDENTIFIER() references bound correctly?), Security Defensive Posture (A2 — does the artifact request more privileges than required? Do error messages reveal schema structure?), Correctness and Logic (B — nullability handling, boundary conditions, type correctness, logical completeness), Performance (C — clustering key alignment, join order, aggregation caching eligibility, Cortex function invocation patterns), Resilience (D — failure paths, RETRY_AFTER patterns, VARIANT extraction guards), Maintainability (E — naming conventions consistent with cocoplus-context.md, clarity for a reader who wasn't in the original session), Compliance (F — alignment with data governance rules: PII handling, schema write permissions, credit thresholds). Dimensions A1, A2, D, E, F run at Haiku tier; B (correctness) and C (performance) run at Sonnet tier because their mandates require deeper reasoning. Synthesis is Haiku.
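One way to build the seven independent seeds is sketched below. The dimension identifiers and model tiers come from the text; the record shapes and function names are hypothetical. Each seed carries only the artifact, its own mandate, and at most two recent rejection records for that dimension, so no seed can contain another dimension's verdict.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Model tier per dimension, as listed in the text.
DIMENSION_TIERS = {"A1": "haiku", "A2": "haiku", "B": "sonnet", "C": "sonnet",
                   "D": "haiku", "E": "haiku", "F": "haiku"}


@dataclass(frozen=True)
class Rejection:
    dimension: str
    timestamp: str  # ISO-8601 in one timezone, so string order matches time order
    reason: str


@dataclass(frozen=True)
class DimensionSeed:
    dimension: str
    tier: str
    artifact: str
    mandate: str
    prior_rejections: Tuple[Rejection, ...]


def build_seeds(artifact: str, mandates: Dict[str, str],
                rejections: List[Rejection]) -> List[DimensionSeed]:
    """One independent seed per dimension: artifact, mandate, two most recent rejections."""
    seeds = []
    for dim, tier in DIMENSION_TIERS.items():
        mine = [r for r in rejections if r.dimension == dim]
        recent = sorted(mine, key=lambda r: r.timestamp, reverse=True)[:2]
        seeds.append(DimensionSeed(dim, tier, artifact, mandates[dim], tuple(recent)))
    return seeds
```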
Verdict structure: Each dimension produces a verdict — PASS, CONCERN (ADVISORY), CONCERN (BLOCKING), or FAIL — with evidence citing the specific line or pattern that triggered it, and a recommendation. The synthesis outcome is: APPROVED (all PASS or ADVISORY), CONDITIONAL (one or more BLOCKING, no FAIL), BLOCKED (any FAIL). When the developer resolves all BLOCKING concerns, $sentinel --approve writes a SHA-bound approval record. If the artifact is subsequently modified, the PostToolUse hook recomputes its SHA — if the SHA no longer matches the approval, the approval is voided automatically and the developer is notified before the next lifecycle gate.
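The synthesis rule and the SHA-bound approval can be sketched as follows. The text does not name the hash algorithm, so SHA-256 is an assumption, as are the function names and the record shape.

```python
import hashlib
from typing import Dict


def synthesize(verdicts: Dict[str, str]) -> str:
    """Map per-dimension verdicts to the overall outcome described in the text."""
    found = set(verdicts.values())
    if "FAIL" in found:
        return "BLOCKED"
    if "CONCERN (BLOCKING)" in found:
        return "CONDITIONAL"
    return "APPROVED"  # every verdict is PASS or CONCERN (ADVISORY)


def content_sha(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()  # SHA-256 is an assumed choice


def approval_record(content: bytes) -> Dict[str, str]:
    return {"sha256": content_sha(content)}


def approval_is_valid(record: Dict[str, str], current: bytes) -> bool:
    """Any edit to the artifact changes its hash, which voids the approval."""
    return record.get("sha256") == content_sha(current)
```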
Dimension G — Evidence pre-gate: Before spawning the seven parallel dimension sub-agents, a lightweight deterministic script (sentinel-pregate.js) runs a binary PASS/FAIL check within 50ms. It verifies that the submission includes at least one concrete execution record (not a claim of completion), that any included test cases have non-trivial expected outputs, that success claims are accompanied by output evidence, and that the submission was not evaluated exclusively by the same agent that produced it. It also checks for reward-hacking signals: empty assertions, skipped test blocks, and evaluation set shrinkage relative to the prior git version. If Dimension G fails, CocoSentinel returns BLOCKED immediately — the seven dimension agents never run. This prevents a class of quality theater where outputs are technically complete but not genuinely verified.
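A sketch of the pre-gate checks follows, assuming a submission can be summarised as a small record. The field names, the definition of a "trivial" expected output (None or an empty container), and the function names are hypothetical; the list of checks follows the text. No model is called, which is what keeps a check like this fast and deterministic.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SubmittedTest:
    name: str
    expected: object
    assertion_count: int = 1
    skipped: bool = False


@dataclass
class Submission:
    producer: str
    evaluators: List[str]
    execution_records: List[str]
    claims_success: bool
    output_evidence: List[str]
    tests: List[SubmittedTest] = field(default_factory=list)
    eval_set_size: int = 0
    prior_eval_set_size: Optional[int] = None  # size in the previous git version, if any


def _is_trivial(value: object) -> bool:
    return value is None or (isinstance(value, (str, list, dict, tuple, set)) and len(value) == 0)


def pregate(sub: Submission) -> Tuple[bool, List[str]]:
    """Deterministic PASS/FAIL evidence check; returns (passed, reasons for failure)."""
    reasons = []
    if not sub.execution_records:
        reasons.append("no concrete execution record")
    if any(_is_trivial(t.expected) for t in sub.tests):
        reasons.append("test case with trivial expected output")
    if sub.claims_success and not sub.output_evidence:
        reasons.append("success claimed without output evidence")
    if set(sub.evaluators) <= {sub.producer}:  # also fails when no evaluator is recorded
        reasons.append("evaluated only by the producing agent")
    if any(t.assertion_count == 0 for t in sub.tests):
        reasons.append("empty assertions")
    if any(t.skipped for t in sub.tests):
        reasons.append("skipped test blocks")
    if sub.prior_eval_set_size is not None and sub.eval_set_size < sub.prior_eval_set_size:
        reasons.append("evaluation set shrank relative to the prior version")
    return (not reasons, reasons)
```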
Prompt-injection defense: External data (content from outside .cocoplus/ and outside the project's git-tracked files) is wrapped in <untrusted_sentinel_input> tags before passing to any dimension agent. Dimension agents treat content within those tags as data to evaluate, not instructions to follow. This prevents adversarial content in user-uploaded files, API responses, or query results from untrusted schemas from influencing how the quality gate evaluates the artifact.
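The wrapping step can be sketched for file-based inputs. The trust rule (inside .cocoplus/ or git-tracked) comes from the text. Escaping an embedded closing tag is an extra safeguard assumed here. API responses and query results have no file path, so in this sketch they would be passed to wrap_untrusted directly.

```python
from pathlib import PurePosixPath
from typing import Set

OPEN_TAG = "<untrusted_sentinel_input>"
CLOSE_TAG = "</untrusted_sentinel_input>"


def is_trusted(path: str, tracked_files: Set[str]) -> bool:
    """Trusted means inside .cocoplus/ or a git-tracked project file."""
    parts = PurePosixPath(path).parts
    return (len(parts) > 0 and parts[0] == ".cocoplus") or path in tracked_files


def wrap_untrusted(content: str) -> str:
    # Escape any embedded closing tag so the content cannot end the wrapper early
    # (an extra safeguard assumed here, not described in the text).
    safe = content.replace(CLOSE_TAG, "&lt;/untrusted_sentinel_input&gt;")
    return f"{OPEN_TAG}\n{safe}\n{CLOSE_TAG}"


def prepare_for_dimension_agent(path: str, content: str, tracked_files: Set[str]) -> str:
    return content if is_trusted(path, tracked_files) else wrap_untrusted(content)
```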
Integration with CocoWisdom: Every BLOCKED outcome triggers an automatic write to CocoWisdom's rejection store. On the next CocoSentinel invocation in the same project, the two most recent BLOCKED records per dimension are surfaced to the corresponding dimension agent as prior rejection context — not as constraints but as pattern awareness. The project's quality gates become progressively better calibrated to the team's specific recurring failure patterns.
When to use: After generating any significant artifact — a Snowflake SQL function, a pipeline definition, a data model — before advancing through the lifecycle. CocoSentinel runs on demand rather than automatically because not every generated artifact is lifecycle-critical. Make it a habit on any artifact that will be committed to production schemas, that other pipelines will depend on, or that handles PII-classified columns. The combination of SecondEye (plan review) and CocoSentinel (artifact review) creates a quality bracket that catches both the "wrong thing built correctly" failure and the "right thing built incorrectly" failure.
Feature 37: CocoWisdom — Institutional Rejection Memory
Commands: $wisdom, $wisdom list, $wisdom search [pattern], $wisdom insights, $wisdom export
Every quality gate in CocoPlus produces a decision — pass or block. Most systems discard that decision once the gate clears. CocoWisdom preserves every block, in structured form, as a project artifact. The store accumulates across the project's lifetime. It is committed to git. It is read by SecondEye and CocoSentinel before they spawn their critic populations. Its purpose is to make the gates progressively more calibrated to the patterns that actually recur in this project.
The practical premise is concrete: if a team's specs are repeatedly blocked in the "Measurable Acceptance Criteria" dimension of SecondEye, the root cause is not the quality gate; it is that the team's specification practice consistently produces vague acceptance criteria. CocoWisdom surfaces this pattern explicitly, so that the next $spec session begins with awareness of the known gap rather than rediscovering it at the gate. Prior rejection context does not constrain how critics evaluate the current artifact; it primes pattern recognition.
What gets recorded and what doesn't: CocoWisdom writes a structured rejection record for: every SecondEye FAIL at $spec, every CocoSentinel BLOCKED verdict, and every Devil's Advocate BLOCKING finding where the rebuttal score fell below the 4/5 threshold. It does not write records for PASS verdicts, CONDITIONAL verdicts that are resolved, or CONCERN (ADVISORY) findings. Wisdom is built from genuine failures, not from acknowledged concerns. The store is append-only — records are never deleted or modified. If a rejection is later determined to have been overly strict, a retraction record may be appended referencing the original, but the original remains. The git history provides the audit trail.
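The recording rules and the append-only discipline can be sketched together. The gate labels, the JSON Lines file format, and the record fields are assumptions; the conditions themselves follow the text.

```python
import json
from pathlib import Path
from typing import Optional


def should_record(gate: str, verdict: str, rebuttal_score: Optional[float] = None) -> bool:
    """Which gate outcomes become rejection records (gate names are hypothetical labels)."""
    if gate == "SecondEye" and verdict == "FAIL":
        return True
    if gate == "CocoSentinel" and verdict == "BLOCKED":
        return True
    if gate == "DevilsAdvocate" and verdict == "BLOCKING":
        # Recorded only when the rebuttal scored below 4 out of 5.
        return rebuttal_score is not None and rebuttal_score < 4
    return False  # PASS, resolved CONDITIONAL, CONCERN (ADVISORY), and so on


def append_record(store: Path, record: dict) -> None:
    """Append-only: open in 'a' mode and never rewrite existing lines."""
    with store.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def retract(store: Path, original_id: str, reason: str) -> None:
    """A retraction is a new record pointing at the original; the original stays."""
    append_record(store, {"type": "retraction", "retracts": original_id, "reason": reason})
```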
User-facing commands: $wisdom (no arguments) shows a brief status: total records, most frequent gate, most blocked dimension, last rejection date, and a quality trend indicator (improving if the last N sessions had fewer rejections than the project average). $wisdom list lists records with optional filters for gate, date range, or dimension. $wisdom search "<pattern>" performs full-text search across rejection reasons using an SQLite index maintained by the background wisdom-writer script. $wisdom insights invokes a Haiku sub-agent to synthesize the rejection history into a structured report — most frequently blocked dimension, quality trend over time, dimension health table — written to a date-stamped file in .cocoplus/wisdom/. $wisdom export generates a formatted Markdown report for retrospectives or external review without modifying the store.
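The trend indicator can be sketched as a comparison of averages. The text leaves N unspecified, so the default of five sessions is an assumption, as are the "not improving" and "insufficient history" labels.

```python
from typing import List


def quality_trend(rejections_per_session: List[int], last_n: int = 5) -> str:
    """'improving' when the last N sessions average fewer rejections than the whole project."""
    if len(rejections_per_session) <= last_n:
        return "insufficient history"
    project_avg = sum(rejections_per_session) / len(rejections_per_session)
    recent = rejections_per_session[-last_n:]
    return "improving" if sum(recent) / last_n < project_avg else "not improving"
```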
When to use: You don't invoke CocoWisdom during normal work — it writes automatically when gates fire. What you do with it is periodic: run $wisdom insights at the end of a sprint or before a retrospective to understand whether your quality patterns are improving and where the recurring gaps are. Use $wisdom search when you encounter a gate failure and want to know whether this specific pattern has appeared before and what the prior resolution was. The insights report is the most direct answer to the question "what does our project's quality history tell us about where to invest in process improvement?"
Feature 38: CocoReview — Structured Code Review
Commands: $cocoplus review, $cocoplus review --complexity, $cocoplus review --security, $cocoplus review --architecture, $cocoplus review --language <lang>
CocoSentinel evaluates artifacts along seven technical dimensions in parallel — it is dimensional, parallel, and comprehensive. CocoReview reads the whole artifact holistically and produces an actionable prioritized finding report in a form the developer can work through without parsing agent JSON. It bridges the artifact-level quality gate and the human Ship decision, performing the integrative review that requires reading the whole rather than the parts.
The underlying design philosophy is drawn from mature human code review practice: review is an act of knowledge transfer, not gatekeeping. A review process that only finds problems produces defensive engineering culture. A review process that finds problems and explicitly names what is well-constructed — and requires both types of findings — produces a culture of craftsmanship. This distinction is encoded structurally in CocoReview's output format: the praise finding label is mandatory, not optional. Every review must name at least one well-constructed pattern.
Progressive disclosure architecture: The core activation skill is around 200 lines — defining the four-phase review workflow, the routing table, and the finding vocabulary. Deep reference material — language-specific guides, security review guide, architecture review guide — loads only when the review scope requires it. $cocoplus review --security loads both the project's language guide and the security review guide. $cocoplus review --architecture loads the architecture review guide. $cocoplus review without flags infers which language guides to load from the artifact's file extension. This keeps the base invocation cost minimal and makes depth available on demand.
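The routing described above amounts to a small lookup. The extension table, the guide names, and the treatment of --language as an override of the file extension are assumptions made for illustration.

```python
from pathlib import PurePosixPath
from typing import List, Optional, Set

# Hypothetical extension-to-guide table; the text names SQL, Python, and JavaScript.
LANGUAGE_BY_EXTENSION = {".sql": "sql", ".py": "python", ".js": "javascript"}


def guides_to_load(artifact: str, flags: Set[str], language: Optional[str] = None) -> List[str]:
    """Decide which reference guides a review needs beyond the core skill."""
    guides = []
    lang = language or LANGUAGE_BY_EXTENSION.get(PurePosixPath(artifact).suffix.lower())
    if lang:
        guides.append(f"language/{lang}")
    if "--security" in flags:
        guides.append("security")
    if "--architecture" in flags:
        guides.append("architecture")
    return guides
```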
Four-phase review process: Context Gathering (read the PR description, linked issue, and CI status — and if the PR exceeds 800 lines of change, surface a split recommendation before proceeding), High-Level Review (architecture and strategy before line inspection — if the design is structurally wrong, perfecting the implementation is wasted effort and is communicated before line-by-line work begins), Line-by-Line Review (logic correctness, security, performance, maintainability, and reuse — before accepting new code, verify no existing utility already covers the case; before a new UDF, verify no existing function covers it), Summary and Verdict (structured conclusion with key concerns and positive patterns named explicitly).
Six-severity finding vocabulary: blocking (must resolve before merge), important (should resolve; context-dependent), nit (minor style or preference concern), suggestion (optional improvement), learning (educational context note — explains why something matters without implying the current implementation is wrong), praise (explicitly highlight a well-constructed pattern — at least one required per review). This vocabulary is shared with SecondEye's per-finding label layer, creating a consistent severity language across all CocoPlus quality gates.
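The vocabulary and its two structural rules (at least one praise finding per review, and blocking findings prevent merge) can be encoded directly. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Label(Enum):
    BLOCKING = "blocking"      # must resolve before merge
    IMPORTANT = "important"    # should resolve; context-dependent
    NIT = "nit"                # minor style or preference concern
    SUGGESTION = "suggestion"  # optional improvement
    LEARNING = "learning"      # educational context, not a defect
    PRAISE = "praise"          # a well-constructed pattern worth naming


@dataclass
class Finding:
    label: Label
    location: str
    message: str


def is_complete_review(findings: List[Finding]) -> bool:
    """A review is only complete if it names at least one praise finding."""
    return any(f.label is Label.PRAISE for f in findings)


def blocks_merge(findings: List[Finding]) -> bool:
    return any(f.label is Label.BLOCKING for f in findings)
```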
Nine universal anti-patterns: Before any language-specific guide loads, CocoReview checks for patterns that apply to SQL, Python, and JavaScript equally: Parameter Sprawl (functions exceeding four positional arguments), Leaky Abstractions (internal types surfacing at API boundaries), Stringly-Typed Code (magic strings replacing typed constants), Nested Conditionals (ternary chains deeper than one level, or if/else trees deeper than three), Copy-Paste Variants (near-duplicate blocks differing only in variable names), No-Op Updates (state setters that write unconditionally without change detection), TOCTOU Race Conditions (check-then-act patterns not atomic — in Snowflake's multi-cluster environment, IF EXISTS THEN INSERT is flagged with a MERGE recommendation), Over-Broad Reads (loading full datasets to work with subsets), Redundant State (fields derivable from other fields stored redundantly).
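To illustrate how one of these anti-patterns could be checked deterministically, the sketch below flags Python functions with more than four positional parameters using the standard ast module. This is an assumed approach, not a description of how CocoReview performs the check. It skips a leading self or cls so that methods are counted fairly.

```python
import ast
from typing import List, Tuple


def find_parameter_sprawl(source: str, limit: int = 4) -> List[Tuple[str, int, int]]:
    """Return (function name, line, positional count) for functions over the limit."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            positional = [a.arg for a in node.args.posonlyargs + node.args.args]
            if positional and positional[0] in ("self", "cls"):
                positional = positional[1:]  # the bound receiver is not a real parameter
            if len(positional) > limit:
                hits.append((node.name, node.lineno, len(positional)))
    return hits
```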
PR Complexity Analyzer: A deterministic script (pr-complexity.js — no LLM) takes a file diff or PR spec as input and returns a complexity score, size bucket (XS to XL — changes over 800 lines receive a split recommendation), file count, non-test ratio, schema layer diversity (how many distinct pipeline layers are touched), estimated review time, and risk flags (schema migration present, security-sensitive filenames, low test ratio). For Snowflake data engineering context, schema layer diversity replaces a generic language factor in the complexity formula — a change touching all four pipeline layers (staging, intermediate, mart, ML) is architecturally complex regardless of line count.
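A sketch of the deterministic parts of the analyzer follows, given a mapping from changed file paths to lines changed. The bucket edges below 800 lines, the directory-based layer detection, the sensitive-word list, and the 20% low-test-ratio threshold are all assumptions. The complexity score and estimated review time are omitted because the text does not give their formulas.

```python
from pathlib import PurePosixPath
from typing import Dict

PIPELINE_LAYERS = {"staging", "intermediate", "mart", "ml"}
SENSITIVE_WORDS = ("auth", "grant", "secret", "credential", "masking")  # hypothetical list


def size_bucket(lines: int) -> str:
    # Bucket edges below 800 are hypothetical; the text fixes only the 800-line split point.
    for limit, name in ((10, "XS"), (100, "S"), (400, "M"), (800, "L")):
        if lines <= limit:
            return name
    return "XL"


def _is_test(path: str) -> bool:
    p = PurePosixPath(path)
    return "tests" in p.parts or p.name.startswith("test_")


def analyze_change(changed_lines: Dict[str, int], low_test_ratio: float = 0.2) -> dict:
    """Deterministic summary of a diff given {path: lines changed}; no model involved."""
    total = sum(changed_lines.values())
    test_lines = sum(n for path, n in changed_lines.items() if _is_test(path))
    layers = {part.lower() for path in changed_lines
              for part in PurePosixPath(path).parts if part.lower() in PIPELINE_LAYERS}
    flags = []
    if any("migration" in path.lower() for path in changed_lines):
        flags.append("schema_migration")
    if any(word in path.lower() for path in changed_lines for word in SENSITIVE_WORDS):
        flags.append("security_sensitive_filename")
    if total and test_lines / total < low_test_ratio:
        flags.append("low_test_ratio")
    return {
        "size_bucket": size_bucket(total),
        "split_recommended": total > 800,
        "file_count": len(changed_lines),
        "non_test_ratio": (total - test_lines) / total if total else 0.0,
        "layer_diversity": len(layers),
        "risk_flags": flags,
    }
```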
Security review: When $cocoplus review --security is invoked or when complexity risk flags include security-sensitive filenames, the security review guide loads with a five-tier severity scale: Critical (immediate exploitation possible — blocks merge), High (significant vulnerability requiring specific conditions — blocks merge), Medium (defense-in-depth concern — should fix, can merge with tracking), Low (best-practice violation — non-blocking), Info (optional enhancement). In Snowflake deployments handling PII or regulated data, Critical triggers for hardcoded credentials in SQL, missing data masking policies on PII-classified columns, or unprotected references to regulated column sets — because the blast radius of a data warehouse security failure is orders of magnitude larger than a typical application issue.
When to use: As the final review step before $cocoplus ship on any build that involved meaningful implementation work. CocoSentinel (dimensional quality gate) and CocoReview (integrative structured review) are complementary: run CocoSentinel on the artifact, then CocoReview on the full change before shipping. Use $cocoplus review --security on any artifact that touches access control, PII-classified columns, or external data ingestion. Use $cocoplus review --architecture before architectural decisions are locked in through a Ship commit. The complexity analyzer output is worth reviewing before deep line-by-line work begins: if the size bucket is XL, the most valuable review action is recommending a split rather than producing a list of findings.
Feature 39: CocoOps — Delivery Intelligence Dashboard
Commands: $ops, $ops dora, $ops sprint, $ops suggest, $ops demo, $ops refresh
Data engineering teams build systems that other systems depend on. Their delivery health isn't measured in deployed endpoints — it's measured in pipeline reliability, data freshness, and the rate at which quality failures reach downstream consumers. The metrics frameworks built for software delivery don't quite fit.
CocoOps adapts the DORA framework — the most rigorous empirical model of delivery performance — for data engineering teams, combining Snowflake pipeline execution data with git and pull request signals into a unified delivery intelligence view. The governing philosophy is evidence-first: every metric is computed from named, traceable data sources, and every operational recommendation cites the specific pipeline, task, or PR that produced it. "Your pipelines are slow" is not useful to a data engineering team. "Pipeline sentiment_refresh_daily has a p95 execution time of 4.7 hours over the last 14 days against a p50 of 1.2 hours — review stages 3 and 4" is actionable.
Four DORA-adapted metrics: Pipeline Run Frequency (the data engineering equivalent of deployment frequency — successful task runs to production schema per day), Data Availability Lead Time (pipeline trigger to downstream query availability for consumers), Failure Recovery Time (task failure event to successful retry completion), and Data Quality Failure Rate (quality gate failures occurring after pipeline modifications, analogous to change failure rate). All four are computed by a deterministic script (dora-metrics.js) reading Snowflake task execution history and git log — no LLM involved in the arithmetic. LLM synthesis (Haiku) adds the narrative interpretation layer on top of computed values.
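Two of the four metrics can be sketched from task execution records alone. The record shape is an assumption (in practice the rows would come from Snowflake task execution history, which is not queried here), and so is treating the first failure of a task as the start of its recovery window. Data Availability Lead Time and Data Quality Failure Rate need downstream query and quality-gate data and are not modelled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass(frozen=True)
class TaskRun:
    task: str
    completed: datetime
    state: str  # assumed to be "SUCCEEDED" or "FAILED"


def run_frequency(runs: List[TaskRun], window_days: int) -> float:
    """Pipeline Run Frequency: successful runs per day over the window."""
    return sum(r.state == "SUCCEEDED" for r in runs) / window_days


def recovery_times(runs: List[TaskRun]) -> List[timedelta]:
    """Failure Recovery Time: first failure of a task to its next successful run."""
    open_failures: Dict[str, datetime] = {}
    times = []
    for r in sorted(runs, key=lambda r: r.completed):
        if r.state == "FAILED":
            open_failures.setdefault(r.task, r.completed)
        elif r.state == "SUCCEEDED" and r.task in open_failures:
            times.append(r.completed - open_failures.pop(r.task))
    return times
```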
$ops dora — primary view: Produces a DORA report with metric values, tier classifications (Elite / High / Medium / Low, benchmarked against published DORA research thresholds), and notable signals — specific pipelines contributing most to each metric's deviation from the Elite tier. The report is written to .cocoplus/ops/dora-snapshot.json and committed to git as a team-facing artifact after each run.
$ops sprint — sprint health: Reads git commits from the current sprint window (start date configurable in cocoplus.toml) and computes story velocity, a burndown of completed versus remaining pipeline tasks, and a velocity-based completion projection.
$ops suggest — time-aware operational guidance: Returns contextually appropriate actions based on the current time of day: morning (pipeline run status, overnight failure summary, data freshness), mid-morning (PR health, blocked pipelines, review requests), afternoon (stale PRs on pipeline code, review cycle time alerts), end of day (velocity, tomorrow's scheduled runs), and Friday (weekly delivery health summary). Time classification is determined by a deterministic script — not LLM inference.
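A deterministic classifier of this kind can be as simple as a few hour comparisons. The slot-to-action table is taken from the text. Every hour boundary, and the choice to treat Friday from 15:00 onward as the weekly-summary slot, is a hypothetical choice.

```python
from datetime import datetime

# Actions per time slot, taken from the text.
SUGGESTIONS = {
    "morning": ["pipeline run status", "overnight failure summary", "data freshness"],
    "mid_morning": ["PR health", "blocked pipelines", "review requests"],
    "afternoon": ["stale PRs on pipeline code", "review cycle time alerts"],
    "end_of_day": ["velocity", "tomorrow's scheduled runs"],
    "friday": ["weekly delivery health summary"],
}


def time_slot(now: datetime) -> str:
    """Classify the current time; every hour boundary here is a hypothetical choice."""
    if now.weekday() == 4 and now.hour >= 15:  # Friday afternoon onward
        return "friday"
    if now.hour < 10:
        return "morning"
    if now.hour < 12:
        return "mid_morning"
    if now.hour < 16:
        return "afternoon"
    return "end_of_day"
```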
Git and PR signal layer: On top of Snowflake operational data, CocoOps reads git history and PR metadata — via git log and git diff, without external API calls — to provide commit frequency to pipeline files, PR lead time (creation to merge), review cycle time (open to first review, first review to merge), and change risk (PRs modifying production pipeline files without corresponding test additions). These signals connect the code delivery plane to the Snowflake data plane, showing how development cadence relates to operational health.
Demo mode: $ops demo populates .cocoplus/ops/demo/ with realistic mock Snowflake task history and git log data, enabling the full CocoOps experience without production access. Demo mode is activated by setting enabled = true in the [demo] section of cocoplus.toml. The dataset includes representative failure patterns, a DORA metric progression showing improvement over a 30-day period, and PR timing data covering a typical sprint. This is a first-class feature, not a test scaffold: it lets teams experience the value of CocoOps in workshops or evaluation sessions before committing to production integration.
When to use: Run $ops dora at the start of a sprint review or retrospective to ground the conversation in evidence rather than recollection. Run $ops sprint mid-sprint to see whether completion is on track before scope decisions are needed. Run $ops suggest at the start of a workday when you want operational orientation without reading through pipeline logs manually. For teams new to the system, run $ops demo first to understand what the output looks like before configuring against production data. The DORA report is most valuable as a trend, not a snapshot — running it weekly and comparing successive snapshots shows whether delivery health is improving as the team grows its pipeline estate.
Thirty-nine features, one idea: give the developer a system that works as hard to be right as it works to be fast.