Wrong Understanding Failure Mode
The asymmetry
The two layers (state-estimation and understanding) fail asymmetrically. A missing layer degrades the system; a wrong understanding layer can do harm:
- Without state-estimation: The system defaults to interacting as if the person is always in a green state. Suboptimal but not actively harmful.
- Without understanding: The system defaults to generic responses. Limited but not dangerous.
- Wrong understanding: A system with an incorrect model of what the person values can be actively harmful by surfacing the wrong observation at the wrong time, reinforcing a commitment the person has actually abandoned, or failing to recognize a value conflict because its model is stale or mistaken.
Deployed during a degraded state, when the person's capacity to catch and correct the error is reduced, a confidently wrong value model could produce worse outcomes than no model at all.
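The asymmetry can be sketched as fallback logic. This is a minimal illustration, not part of the architecture as specified: the `State` enum, the `plan_interaction` function, and the `"generic"` / `"personalized"` mode labels are all hypothetical names, and a plain `dict` stands in for the value model.

```python
from enum import Enum
from typing import Optional, Tuple


class State(Enum):
    # Hypothetical state labels taken from the wording in the text.
    GREEN = "green"
    DEGRADED = "degraded"


def plan_interaction(
    estimated_state: Optional[State],
    value_model: Optional[dict],
) -> Tuple[State, str]:
    """Return (assumed state, response mode) given which layers are present."""
    # Missing state-estimation: fall back to treating the person as green.
    assumed_state = estimated_state if estimated_state is not None else State.GREEN
    # Missing understanding: fall back to generic responses.
    mode = "generic" if value_model is None else "personalized"
    return assumed_state, mode
```

In this sketch, both missing-layer cases land on a safe default. A wrong value model does not: it takes the same `"personalized"` path as a correct one, so no fallback fires, which is the case the text identifies as potentially worse than having no model.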
Current relevance
Current LLMs (Claude, ChatGPT, and similar systems) are already used in quasi-therapeutic roles. They perform somewhere between a poor therapist and a decent one: useful for many people, but without real-time state awareness, a deep model of the individual, or a governed relationship. The gap between what these systems do and what this architecture specifies is the engineering work ahead.
Partial protections
- I1 (no hidden goal substitution): every suggested action cites the stored commitments it serves
- I6 (bounded actuation): the system defaults to silence under high uncertainty
Whether these invariants are sufficient against a confidently wrong value model is an open question that E3 (semantic layer ablation) must address alongside its primary question of whether the understanding layer adds value.
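One way to picture I1 and I6 is as a gate that every suggestion must pass before it is shown. The sketch below is an assumption-laden illustration, not the architecture's actual mechanism: the `Suggestion` type, the `gate` function, the 0-to-1 uncertainty scale, and the `max_uncertainty` threshold of 0.5 are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass(frozen=True)
class Suggestion:
    text: str
    # Ids of the stored commitments this suggestion claims to serve (hypothetical field).
    cited_commitments: Tuple[str, ...]
    # Hypothetical scale: 0.0 = fully confident, 1.0 = no confidence.
    uncertainty: float


def gate(
    suggestion: Suggestion,
    stored_commitments: Set[str],
    max_uncertainty: float = 0.5,  # assumed threshold, not from the source
) -> Optional[Suggestion]:
    """Return the suggestion if it passes both checks, else None (silence)."""
    # I6 (bounded actuation): stay silent under high uncertainty.
    if suggestion.uncertainty > max_uncertainty:
        return None
    # I1 (no hidden goal substitution): require at least one citation,
    # and every citation must refer to a commitment that is actually stored.
    if not suggestion.cited_commitments:
        return None
    if any(c not in stored_commitments for c in suggestion.cited_commitments):
        return None
    return suggestion
```

Under these assumptions the gate blocks uncited and uncertain suggestions, but a suggestion that cites a stored commitment the person has since abandoned, with low reported uncertainty, passes both checks. That is the confidently wrong case, which is why the sufficiency of these invariants remains open.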