## What To Measure
Track every agent run against five stages:

| Stage | Success Signal | Failure Stage |
|---|---|---|
| Install | Client discovers /mcp/agent and completes dynamic registration or manual install | install |
| Auth | OAuth authorization and token exchange succeed for the selected workspace | auth |
| Input | The agent supplies enough brief, brand, and asset context to start a run | input |
| Runtime | Lamina queues and completes the selected creative workflow | runtime |
| Output | The agent receives usable final assets or structured outputs | output |
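One way to roll a run's telemetry up into this funnel is to record the first stage whose success signal never fired. A minimal sketch (the helper name and input shape are assumptions, not part of the instrumentation):

```python
# Stage order matters: the failure stage is the first stage whose
# success signal was never observed for the run.
STAGES = ["install", "auth", "input", "runtime", "output"]

def failure_stage(observed):
    """observed: set of stage names whose success signal fired."""
    for stage in STAGES:
        if stage not in observed:
            return stage
    return None  # the run cleared all five stages

# A run that installed and authed but never supplied a usable brief
# fails at the input stage.
failure_stage({"install", "auth"})  # -> "input"
```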
Emit these events to PostHog server-side using `POSTHOG_SERVER_API_KEY`:
| Event | When It Fires |
|---|---|
| `agent_runtime.install.discovery_challenged` | A remote MCP client discovers that /mcp/agent requires OAuth |
| `agent_runtime.install.client_registered` | Dynamic MCP OAuth registration succeeds |
| `agent_runtime.install.failed` | Dynamic registration fails |
| `agent_runtime.auth.authorize_redirected` | The OAuth authorize request validates and redirects to consent |
| `agent_runtime.auth.approved` | A signed-in user approves workspace access |
| `agent_runtime.auth.token_issued` | Authorization code or refresh-token exchange succeeds |
| `agent_runtime.auth.succeeded` | A bearer token is accepted for /mcp/agent |
| `agent_runtime.auth.failed` | OAuth authorization, token exchange, or bearer validation fails |
| `agent_runtime.tool_call.completed` | A call to one of the five MCP tools completes and is classified |
The `agent_runtime.tool_call.completed` event includes the `tool_name` and the classified outcome (for example `needs_input`).
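Building the property payload for the server-side capture call might look like the sketch below; only `tool_name` and the outcome classification come from this document, and the rest of the shape is an assumption:

```python
def tool_call_completed(tool_name, outcome):
    # Builds the event payload handed to the PostHog server-side
    # client (authenticated with POSTHOG_SERVER_API_KEY).
    return {
        "event": "agent_runtime.tool_call.completed",
        "properties": {
            "tool_name": tool_name,  # e.g. "lamina_create"
            "outcome": outcome,      # e.g. "success", "needs_input", "error"
        },
    }

payload = tool_call_completed("lamina_create", "needs_input")
```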
## Benchmark Scenarios
Run these scenarios for each supported MCP client before declaring the distribution ready.

| Scenario | Required Proof |
|---|---|
| Hosted OAuth install | The client discovers metadata, registers or uses its configured client, shows consent, receives tokens, and lists exactly five tools |
| First image run | The agent calls lamina_create from a short prompt and obtains at least one image output through lamina_status |
| First video run | The agent calls lamina_create for a video task and can wait or poll until a terminal result |
| Brand-aware planning | The agent calls lamina_brand, uses the returned guidance, and starts a run with the same workspace context |
| Batch creative | The agent calls lamina_batch with 3 to 10 related briefs and receives per-item runId values or actionable item errors |
| Clarification loop | An intentionally underspecified request returns needsInput with missing fields, examples, and a follow-up prompt |
| Auth recovery | An expired or insufficient-scope token yields a clear OAuth error and the client can reauthorize |
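The clarification-loop scenario can be asserted mechanically. A sketch, assuming `missingFields`, `examples`, and `followUp` as the field names on the `needsInput` result (the exact response shape is not specified here):

```python
def passes_clarification_check(result):
    # A needsInput result must tell the agent what was missing and
    # how to retry: missing fields, examples, and a follow-up prompt.
    return (
        result.get("status") == "needsInput"
        and bool(result.get("missingFields"))
        and bool(result.get("examples"))
        and bool(result.get("followUp"))
    )

passes_clarification_check({
    "status": "needsInput",
    "missingFields": ["aspectRatio"],
    "examples": ["a 9:16 teaser for the spring drop"],
    "followUp": "Which aspect ratio should the video use?",
})  # -> True
```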
## Pass Criteria

Use these thresholds for the preferred-runtime scorecard:

| Metric | Target |
|---|---|
| Install success rate | 95% or higher per supported client |
| First successful generation time | Under 5 minutes from clean client install |
| Tool call auth failure rate | Under 1% after successful install |
| Clarification-loop rate | Tracked separately from hard failures |
| Run completion rate | 90% or higher for benchmark workflows |
| Webhook or polling delivery success | 99% for terminal run visibility |
| Output usability rate | 95% of completed runs have at least one usable output |
Do not count `needs_input` as a runtime failure. It is an input-stage clarification outcome and should be reduced by improving discovery, examples, and prompt mapping.
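Scoring a client against these thresholds from collected counters can be sketched as follows (the counter names on `stats` are assumptions):

```python
def scorecard(stats):
    # Rates derived from per-client counters; thresholds mirror the
    # install-success and run-completion targets above.
    install_rate = stats["installs_succeeded"] / stats["installs_started"]
    completion_rate = stats["runs_completed"] / stats["runs_started"]
    return {
        "install_ok": install_rate >= 0.95,
        "completion_ok": completion_rate >= 0.90,
    }

scorecard({
    "installs_started": 100, "installs_succeeded": 96,
    "runs_started": 50, "runs_completed": 44,
})  # -> {"install_ok": True, "completion_ok": False}
```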
## Suggested Benchmark Record

Store one record per client, scenario, and run.

## Dashboard Breakdown
At minimum, build dashboard cards for:

- Install starts, successful registrations, and failed registrations by MCP client
- OAuth approvals, token issues, bearer-token failures, and insufficient-scope failures
- Tool-call success rate by `tool_name`
- `needs_input` rate by `tool_name` and requested modality
- Runtime failure rate by workflow/app when available
- Empty-output and failed-output rate after terminal `completed` status
- Time from first install event to first completed output
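The last card, time from first install event to first completed output, can be computed from raw event rows. A sketch assuming each row carries a `name`, a numeric `ts`, and (for tool calls) an `outcome`, and using a successful `agent_runtime.tool_call.completed` as a proxy for the first completed output:

```python
def install_to_first_output_seconds(events):
    # Earliest install-stage event vs. earliest successful tool call.
    installs = [e["ts"] for e in events
                if e["name"].startswith("agent_runtime.install.")]
    outputs = [e["ts"] for e in events
               if e["name"] == "agent_runtime.tool_call.completed"
               and e.get("outcome") == "success"]
    if not installs or not outputs:
        return None  # the funnel never completed for this client
    return min(outputs) - min(installs)
```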