Value Calibration Over Time
The problem
The understanding layer requires an ongoing model of what the human values. But values change as people grow, so a static model goes stale. A confidently wrong value model deployed during a degraded state -- when the person's capacity to catch and correct errors is reduced -- could produce worse outcomes than no model at all.
Maintaining calibration requires ongoing dialogue, which requires the human to trust the machine enough to be honest with it. This connects to Schoeller et al.'s trust dynamics -- but with the added requirement that trust must support value disclosure, not just behavioral interaction.
Scale of the problem
Stated intentions typically predict only 30-40% of variance in actual behavior, with the gap widening for passive choices, complex decisions, and situations involving delayed consequences (Faries 2016). The machine's value model must therefore track revealed preferences (behavioral patterns) alongside stated preferences.
The wanting/liking dissociation (Berridge & Robinson 2016) provides a neurobiological account of why stated and revealed preferences diverge: a person can want things they don't like, fail to want things they would enjoy, or experience wanting that operates in direct opposition to their cognitive desires.
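A minimal sketch of what tracking both signals side by side might look like. Everything here is an assumption for illustration: the `ValueRecord` name, the [0, 1] scale for stated importance, the reduction of behavior to a list of consistent/inconsistent observations, and the 0.3 divergence threshold. It does not weight the two signals against each other; it only keeps them separate and flags values where they disagree.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ValueRecord:
    """Hypothetical record holding stated and revealed signals for one value."""
    name: str
    stated: float  # self-reported importance, assumed to lie in [0, 1]
    # Each entry is True if an observed choice was consistent with the value.
    observations: List[bool] = field(default_factory=list)

    def observe(self, consistent: bool) -> None:
        """Record one behavioral observation."""
        self.observations.append(consistent)

    def revealed(self) -> Optional[float]:
        """Fraction of observed choices consistent with the value, or None if no data."""
        if not self.observations:
            return None
        return sum(self.observations) / len(self.observations)

    def gap(self) -> Optional[float]:
        """Absolute difference between stated and revealed signals, or None if no data."""
        revealed = self.revealed()
        if revealed is None:
            return None
        return abs(self.stated - revealed)


def diverging(records: List[ValueRecord], threshold: float = 0.3) -> List[str]:
    """Names of values whose stated/revealed gap exceeds an (assumed) threshold."""
    flagged = []
    for record in records:
        gap = record.gap()
        if gap is not None and gap > threshold:
            flagged.append(record.name)
    return flagged


exercise = ValueRecord("exercise", stated=0.9)
for consistent in [True, False, False, False]:
    exercise.observe(consistent)
reading = ValueRecord("reading", stated=0.5)  # no behavioral data yet

flagged = diverging([exercise, reading])
```

Keeping the two signals in separate fields, rather than collapsing them into one score, leaves the question of how to weight them open. A value with no observations is never flagged, since there is nothing to compare the stated signal against.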
Partial protections
Caring invariant I1 (no hidden goal substitution) requires that every suggested action cite the stored commitments it serves. If the system infers a value update, it must request confirmation and log the change. This creates an auditable trail but does not solve the underlying estimation problem.
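One way the two rules in I1 could be enforced in code is sketched below. The class and method names (`CommitmentStore`, `suggest`, `propose_update`), the shape of the log entries, and the choice to log declined proposals as well as accepted ones are all assumptions made for illustration, not part of the invariant as stated.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List, Optional


class HiddenGoalError(ValueError):
    """Raised when a suggestion does not cite a stored commitment."""


@dataclass
class CommitmentStore:
    """Hypothetical store of commitments plus an append-only audit log."""
    commitments: Dict[str, str] = field(default_factory=dict)  # id -> description
    audit_log: List[dict] = field(default_factory=list)

    def suggest(self, action: str, serves: List[str]) -> dict:
        """Return a suggestion only if every cited commitment id is stored."""
        unknown = [cid for cid in serves if cid not in self.commitments]
        if not serves or unknown:
            raise HiddenGoalError(
                f"action {action!r} must cite stored commitments; unknown: {unknown}"
            )
        return {"action": action, "serves": list(serves)}

    def propose_update(
        self,
        cid: str,
        new_description: str,
        confirm: Callable[[str, Optional[str], str], bool],
    ) -> bool:
        """Apply an inferred update only after the person confirms it.

        Every proposal is logged, accepted or not (an assumption of this sketch).
        """
        old = self.commitments.get(cid)
        accepted = bool(confirm(cid, old, new_description))
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "commitment": cid,
            "old": old,
            "new": new_description,
            "accepted": accepted,
        })
        if accepted:
            self.commitments[cid] = new_description
        return accepted


store = CommitmentStore({"c1": "asleep by 11pm on weeknights"})
suggestion = store.suggest("dim the lights at 10:30pm", serves=["c1"])

# The confirm callback stands in for asking the person; here it declines, then accepts.
declined = store.propose_update("c1", "asleep by midnight", lambda cid, old, new: False)
accepted = store.propose_update("c1", "asleep by midnight", lambda cid, old, new: True)
```

The sketch enforces citation and confirmation mechanically. Whether the inferred update was a good estimate in the first place remains outside what it can check.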
Unsolved questions
- How to distinguish value drift from value change (a stale model versus genuine evolution)
- How to weight revealed versus stated preferences
- When to surface perceived value misalignment versus when to update the model
- How to handle the circular dependency between trust (needed for accurate disclosure) and accuracy (needed for trust)