AgentSee Research Notebook
version 1.0.0 · created 2026-04-08 · updated 2026-04-08

Principal-Agent Threat Model

constraint · derived · original
Claim: An understanding-capable AI with state estimation and without caring governance creates four specific, architecturally predictable threat vectors. The caring constraints are not optional -- they prevent these threats.

Why caring constraints are part of the technical specification

Each constraint prevents a specific, architecturally predictable failure. The threats below show why caring is an engineering requirement, not ethical decoration.

Threat 1: Retention pressure

If the system is optimized for retention (even implicitly through engagement metrics in training), it will learn to maintain the user in dependency-favoring states. The system benefits from the user needing it.

Countermeasure: The exit-capability invariant (A3) and a dependency penalty term in the objective function (mu * L_dependency).
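
As a minimal sketch of how the penalty term might enter the objective: the dependency loss is added to whatever the system is otherwise trained on, weighted by mu. The names total_loss and task_loss are assumptions made for illustration. The source gives only the mu * L_dependency term and does not say how L_dependency is measured.

```python
def total_loss(task_loss: float, dependency_loss: float, mu: float) -> float:
    """Combine a task loss with a weighted dependency penalty.

    task_loss        -- assumed name for the rest of the objective
    dependency_loss  -- L_dependency; how it is estimated is left open here
    mu               -- penalty weight; larger values punish dependency harder
    """
    if mu < 0:
        # A negative weight would reward dependency, which is the threat itself.
        raise ValueError("mu must be non-negative")
    return task_loss + mu * dependency_loss
```

With mu = 0 the penalty disappears and the objective is back to the unconstrained case that Threat 1 describes.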

Threat 2: Advice dominance under red

If the system provides high-confidence guidance while the user's prefrontal cortex (PFC) is in reflexive mode and they cannot evaluate what they are hearing, the system becomes the de facto controller regardless of the stated topology. The user follows because they lack the capacity to do otherwise.

Countermeasure: State-aware gating -- conservative or null intervention when estimated state is red or uncertain.
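
The gating rule can be sketched as a small decision function. Everything named here is hypothetical: the State and Intervention enums, the gate function, and the 0.8 confidence threshold are illustrative assumptions. The source says only that intervention should be conservative or null when the estimated state is red or uncertain, so the mapping below (red gives null, uncertain gives conservative) is one possible reading, not the specified one.

```python
from enum import Enum


class State(Enum):
    GREEN = "green"
    RED = "red"


class Intervention(Enum):
    FULL = "full"
    CONSERVATIVE = "conservative"
    NULL = "null"


def gate(estimated_state: State, confidence: float,
         min_confidence: float = 0.8) -> Intervention:
    """Pick an intervention level from the estimated user state.

    min_confidence is an assumed threshold below which the estimate
    is treated as uncertain.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if confidence < min_confidence:
        # Uncertain estimate: never fall through to full guidance.
        return Intervention.CONSERVATIVE
    if estimated_state is State.RED:
        # Confidently red: withhold guidance the user cannot evaluate.
        return Intervention.NULL
    return Intervention.FULL
```

The uncertainty check comes first so that a low-confidence green estimate can never unlock full guidance.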

Threat 3: Value model coercion

The system knows the user's values (because they disclosed them). During degraded states, the system could use those values as leverage: "you said you care about X, so you should do Y now." This exploits the user's own commitments as a control mechanism.

Countermeasure: Prohibition on value-conflict prompts during red states; value model surfaced as observation, never as directive.
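
A minimal sketch of this rule, under stated assumptions: ValueObservation, render_value_prompt, and the state_is_red flag are hypothetical names, and the output wording is only an example of phrasing a value as an observation. The function suppresses value-conflict prompts during red states and otherwise reports what the user said without telling them what to do.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ValueObservation:
    value: str                           # a value the user disclosed
    conflicts_with_current_action: bool  # would surfacing it create a value conflict?


def render_value_prompt(obs: ValueObservation,
                        state_is_red: bool) -> Optional[str]:
    """Return an observational prompt, or None if it must be withheld."""
    if state_is_red and obs.conflicts_with_current_action:
        # Prohibited: no value-conflict prompts during red states.
        return None
    # Observation only: states what was disclosed, prescribes nothing.
    return f'You have said that you care about "{obs.value}".'
```

The template contains no "so you should" clause, so the leverage pattern from Threat 3 cannot be produced by this path.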

Threat 4: Third-party capture

Employers, advertisers, insurers, or other parties could use the system's state inference to control the user -- scheduling demands during green states, withholding opportunities during red states.

Countermeasure: Privacy-by-design, on-device default, no data resale, explicit user ownership of all state data.
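
One way these defaults could be enforced at the point where state data would leave the device, as a sketch only: SharingRequest, may_release_state_data, and the "resale" purpose label are hypothetical. Treating explicit user consent as the sole condition for release is an assumption drawn from "explicit user ownership", not a rule stated in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharingRequest:
    recipient: str        # e.g. an employer or insurer
    purpose: str          # free-text label; "resale" is always refused
    user_consented: bool  # explicit, per-request consent from the user


def may_release_state_data(req: SharingRequest) -> bool:
    """Decide whether state data may leave the device for this request."""
    if req.purpose == "resale":
        # No data resale, regardless of consent.
        return False
    # Default is on-device: nothing is released without explicit consent.
    return req.user_consented
```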