version 1.0.0 · created 2026-04-08 · updated 2026-04-08

Value Calibration Over Time

open-problem

Claim: How does the machine maintain calibration against the human's actual values as the human grows and changes? A static value model becomes stale and potentially harmful.

The problem

The understanding layer requires an ongoing model of what the human values. But values change as people grow. A static model becomes stale. A confidently wrong value model deployed during a degraded state -- when the person's capacity to catch and correct errors is reduced -- could produce worse outcomes than no model at all.

Maintaining calibration requires ongoing dialogue, which requires the human to trust the machine enough to be honest with it. This connects to Schoeller et al.'s trust dynamics -- but with the added requirement that trust must support value disclosure, not just behavioral interaction.

Scale of the problem

Stated intentions typically predict only 30-40% of variance in actual behavior, with the gap widening for passive choices, complex decisions, and situations involving delayed consequences (Faries 2016). The machine's value model must therefore track revealed preferences (behavioral patterns) alongside stated preferences.

The wanting/liking dissociation (Berridge & Robinson 2016) provides a neurobiological account of why stated and revealed preferences diverge: a person can want things they don't like, fail to want things they would enjoy, or experience wanting that operates in direct opposition to their cognitive desires.
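As a rough illustration of tracking stated and revealed preferences side by side, the sketch below compares each value's share of stated importance with its share of observed choices and flags large gaps. The function name `preference_gaps`, the share-based comparison, and the 0.2 threshold are assumptions made for this example, not part of the source framework. It surfaces a divergence without deciding which signal to trust.

```python
from collections import Counter


def preference_gaps(stated, observed_choices, threshold=0.2):
    """Flag values whose stated share and revealed share differ by more than threshold.

    stated: dict mapping a value name to its stated importance (hypothetical scale).
    observed_choices: list of value names, one per observed behavioral choice.
    Returns {value: (stated_share, revealed_share)} for the flagged values.
    """
    total_stated = sum(stated.values())
    counts = Counter(observed_choices)
    total_observed = sum(counts.values())
    if total_stated == 0 or total_observed == 0:
        return {}  # nothing to compare yet

    gaps = {}
    for value in set(stated) | set(counts):
        stated_share = stated.get(value, 0) / total_stated
        revealed_share = counts.get(value, 0) / total_observed
        if abs(stated_share - revealed_share) > threshold:
            gaps[value] = (round(stated_share, 2), round(revealed_share, 2))
    return gaps


# Illustrative data only: the person says health matters most,
# but most observed choices serve career.
stated = {"health": 5, "career": 3, "leisure": 2}
observed = ["career"] * 6 + ["leisure"] * 3 + ["health"] * 1
gaps = preference_gaps(stated, observed)
```

A flagged gap is only a prompt for dialogue. Whether it reflects a stale model, a wanting/liking dissociation, or a genuine change in values is exactly what remains unsolved.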

Partial protections

Caring invariant I1 (no hidden goal substitution) requires that every suggested action cite the stored commitments it serves. If the system infers a value update, it must request confirmation and log the change. This creates an auditable trail but does not solve the underlying estimation problem.
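A minimal sketch of how these two protections might look in code, assuming a simple in-memory store. The class `ValueModel`, its method names, and the log fields are hypothetical. Logging declined proposals as well as accepted ones is an extra assumption beyond what the invariant states.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ValueModel:
    commitments: dict = field(default_factory=dict)  # commitment id -> description
    audit_log: list = field(default_factory=list)

    def check_suggestion(self, cited_ids):
        """I1-style check: a suggestion must cite at least one stored commitment,
        and every cited id must actually exist in the store."""
        if not cited_ids:
            return False
        return all(cid in self.commitments for cid in cited_ids)

    def propose_update(self, commitment_id, new_description, confirm):
        """Apply an inferred update only if the human confirms it.

        confirm is a callable standing in for the confirmation dialogue
        (assumption). The proposal is logged whether or not it is approved.
        """
        approved = bool(confirm(commitment_id, new_description))
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "commitment": commitment_id,
            "old": self.commitments.get(commitment_id),
            "proposed": new_description,
            "approved": approved,
        })
        if approved:
            self.commitments[commitment_id] = new_description
        return approved


model = ValueModel(commitments={"c1": "exercise three times a week"})
ok = model.check_suggestion(["c1"])   # cites a stored commitment
bad = model.check_suggestion([])      # cites nothing, so it is rejected
declined = model.propose_update(
    "c1", "exercise daily", confirm=lambda cid, new: False
)
```

The sketch shows only the gating and the trail. It says nothing about how the system arrives at the proposed update, which is the estimation problem left open.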

Unsolved questions

How to distinguish value drift from value change (a stale model versus genuine evolution)
How to weight revealed versus stated preferences
When to surface perceived value misalignment versus when to update the model
How to handle the circular dependency between trust (needed for accurate disclosure) and accuracy (needed for trust)

Depends on: Agency D2: Capacity as Terminal Objective (Meta-Capability Defense); I1: No Hidden Goal Substitution