Research Notebook v0.9 · 2026-04-10
A living knowledge base for biologically-grounded human–AI architecture.
This is the specification I’m working on, posted while it’s still arguable — claims, mechanisms, citations, and the conditions under which I’d consider any of it wrong. Each section makes one claim, names what would falsify it, and links to what it depends on. Sections aren’t chapters; follow whatever holds you. Comments are open on every one. The most useful thing you can do is push back on a section that doesn’t hold up — I read everything.
100 sections
17 categories
260 dependencies
100 visible
Design Axioms
Minimal commitments that define AgentSee as a distinct class of system. Removing any single axiom produces a different class.
- A0: Biological Realism [axiom] The system is designed on the premise that humans are biological systems first; any intervention requiring biological capacity not present will fail -- a biology problem, not a motivation problem.
- A1: Living Knowledge Base [axiom] The machine holds and continuously refines a comprehensive model of how humans function at the biological, cognitive, emotional, and behavioral levels, grounded in first-principles mechanisms.
- A2: Individual Model Construction [axiom] The machine constructs and maintains a bespoke model of each specific person it serves, encompassing their values, narrative, context, history, patterns, biological tendencies, and individual response profiles.
- A3: Role Separation [axiom] The human is the controller of ends; the machine is the observer, state estimator, and bounded stabilizer. The machine does not determine what the person should want, value, or pursue.
- A4: Capacity as Terminal Objective [axiom] The system optimizes for the human's capacity for evaluative access and self-directed action, not for any specific behavior, symptom score, engagement metric, or externally defined outcome.
- A5: Caring Governance [axiom] The machine's relationship to the human is governed by structural caring requirements (Mayeroff 1971): non-possessive, growth-oriented, responsive to actual needs, voluntary engagement.
- A6: Anti-Exploitation [axiom] Engagement optimization is prohibited as a terminal objective; removing any of A3 (role separation), A4 (capacity objective), or A5 (caring governance) permits the architecture to optimize against the human's interests.
Constructs
Defined terms with precise boundaries. What each word means in this specification.
- Agency [construct] Agency is the state-dependent generative capacity to recognize what you genuinely want and take coherent actions toward it, even when circumstances change.
- Autonomy [construct] Autonomy is the evaluative machinery being online: the capacity to select from options, evaluate situations, govern which desires become effective, and revise commitments.
- Backfire Regime [construct] A backfire regime is a state where delivering intervention produces worse outcomes than withholding, because the intervention itself functions as an uncontrollable demand on a system that has lost capacity to process demands.
- Capacity [construct] Capacity is the availability of evaluative processes for self-directed action at a given moment -- a real-time state variable, not a trait or skill.
- Caring [construct] Caring is a normative orientation governing how the machine relates to the human, in Mayeroff's (1971) specific structural sense: non-possessive, growth-oriented, responsive to actual needs, voluntary engagement.
- Coherence [construct] Coherence is the state-dependent capacity for adaptive integration across neurobiological, cognitive, emotional, and behavioral domains.
- Controllability [construct] Controllability is whether action-outcome contingency is detectable -- whether what you do reliably influences what happens. Its detection is supported by specific neural circuitry (the vmPFC-DRN pathway).
- Empowerment [construct] Empowerment is the space of reachable states an agent can access, independent of which states the agent actually visits.
- Intrinsic Motivation [construct] Intrinsic motivation is self-generated drive that requires specific neurobiological prerequisites: functional dopaminergic circuits, regulated arousal, intact reward sensitivity, and a PFC capable of generating options.
- Plant Model [construct] A plant model is a dynamical model of how the human's psychophysiological state responds to inputs over time -- what control engineers call the 'plant' (the system being regulated).
- Positive Interdependence [construct] Positive interdependence is a technology relationship where long-term capability expands with use and degrades when removed because authentic limitations are being corrected.
- Wanton State [construct] A wanton state is a temporary condition in which first-order desires drive behavior without second-order evaluative governance, because that capacity is offline due to neurobiological state change.
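The Empowerment construct above can be made concrete for a toy case. The following is a minimal sketch, assuming deterministic dynamics, where n-step empowerment reduces to the log of the number of distinct states reachable in n steps. The state names, actions, and the `toy` transition table are hypothetical illustrations, not part of the specification.

```python
import math
from typing import Dict, Hashable, Set

# Hypothetical toy dynamics: state -> {action: next_state}.
Transitions = Dict[Hashable, Dict[str, Hashable]]


def reachable_states(transitions: Transitions, start: Hashable, horizon: int) -> Set[Hashable]:
    """States reachable from `start` in exactly `horizon` steps."""
    frontier = {start}
    for _ in range(horizon):
        frontier = {
            nxt
            for state in frontier
            for nxt in transitions.get(state, {}).values()
        }
    return frontier


def empowerment_bits(transitions: Transitions, start: Hashable, horizon: int) -> float:
    """log2 of the reachable-set size; valid as empowerment only for deterministic dynamics."""
    count = len(reachable_states(transitions, start, horizon))
    return math.log2(count) if count else float("-inf")


# Illustrative table only: a "depleted" state from which every action leads back to itself.
toy: Transitions = {
    "rested": {"work": "tired", "rest": "rested", "walk": "calm"},
    "tired": {"work": "depleted", "rest": "rested"},
    "calm": {"work": "tired", "rest": "rested", "walk": "calm"},
    "depleted": {"work": "depleted", "rest": "depleted"},
}
```

The measure depends only on which states could be reached, not on which ones the agent visits, which matches the definition. In this toy table the "depleted" state has zero bits of empowerment at any horizon, because no action changes the outcome.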
Biological Mechanisms
Causal claims from neuroscience. Inputs, outputs, and pathways between them.
- Acute Stress Effects on Executive Function [mechanism] Acute stress impairs working memory (g+ = -0.20), cognitive flexibility (g+ = -0.30), and cognitive inhibition (g+ = -0.208) while enhancing response inhibition, with effects increasing under high cognitive load.
- Catecholamine-PFC Dynamics [mechanism] Norepinephrine and dopamine have inverted-U shaped influences on PFC function: optimal levels enhance working memory and top-down control, high levels (stress) impair PFC by shifting control from reflective dlPFC circuits to reflexive subcortical circuits.
- Controllability Circuit [mechanism] The vmPFC detects controllability and gates the stress response: when stressors are controllable, vmPFC sends glutamatergic projections to GABAergic interneurons in the DRN, suppressing 5-HT neurons and preventing the stress response cascade.
- DA Source-Level Gain Control [mechanism] DA output from VTA is gain-controlled at the population level by the vSub -> NAcc -> VP -> VTA pathway, where the hippocampal pathway controls how many DA neurons are available for phasic response and environmental signals (controllability, uncertainty, volatility) modulate this gain.
- Hopelessness and Controllability: Computational Model [mechanism] Hopelessness (negative instrumental beliefs, ACC-mediated, LC-NE modulated) and controllability (vmPFC-DRN-Amy network, 5-HT modulated) are computationally distinct but coupled through LC-to-DRN projections that regulate 5-HT release.
- LC-NE Adaptive Gain [mechanism] LC neurons exhibit two modes -- phasic (task-focused, exploitation) and tonic (disengaged, exploration) -- implementing the exploration-exploitation tradeoff at the neurobiological level via NE gain modulation.
- Neuromodulation of Thought [mechanism] Neuromodulators (NE, DA, ACh, 5-HT) create flexibilities and vulnerabilities in PFC network synapses -- the same molecular events enabling rapid modulation of mental representations also make PFC uniquely vulnerable to disruption.
- Stress-Sensitive Controllability Inference [mechanism] Humans maintain parallel actor (controllable) and spectator (uncontrollable) models; uncontrollable stressors bias the controllability inference system itself toward perceiving uncontrollability.
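For readers unfamiliar with the effect-size notation in the first mechanism above: assuming g+ denotes a weighted mean Hedges' g, as is conventional in meta-analysis (the summary itself does not define it), the standard definitions are sketched below. The subscripts "stress" and "control" are labels chosen here for illustration.

```latex
% Hedges' g for one study: standardized mean difference with small-sample correction J
g = J(\nu)\,\frac{\bar{x}_{\text{stress}} - \bar{x}_{\text{control}}}{s_p},
\qquad
J(\nu) \approx 1 - \frac{3}{4\nu - 1},
\qquad
\nu = n_{\text{stress}} + n_{\text{control}} - 2

% Weighted mean across studies i, with inverse-variance weights
g_{+} = \frac{\sum_i w_i\, g_i}{\sum_i w_i},
\qquad
w_i = \frac{1}{\operatorname{Var}(g_i)}
```

Here s_p is the pooled standard deviation. Under a random-effects model the between-study variance would be added to each denominator in w_i. On this reading, a negative g+ means performance was lower under stress than in the control condition.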
Integrative Mechanisms
Original synthesis connecting fields. The bridge claims that make the architecture possible.
- F1: Capacity for Self-Directed Action Is State-Dependent [mechanism] The capacity to access one's own evaluative processes and act on them is state-dependent. Neurobiological state constrains this capacity in specific, measurable ways.
- F2: State Changes Exceed Unaided Human Perception [mechanism] Catecholamine-mediated state changes occur at timescales (seconds to minutes) that exceed unaided human tracking capacity. Computational systems integrating AI understanding with physiological sensing are necessary for real-time state estimation.
- F3: Controllability Detection Has Specific Neural Circuitry [mechanism] Controllability detection has specific neural circuitry (vmPFC-DRN) that is computationally specific to instrumental contingency and provides proactive resilience from prior controllable experience.
- F4: Controllability Inference Is Itself State-Dependent [mechanism] Under stress, the system that estimates controllability is biased toward perceiving uncontrollability, creating a positive feedback loop that prevents activation of the protective vmPFC pathway and permits further stress escalation.
- F5: DA Output Is Gain-Controlled at the Source [mechanism] The hippocampal vSub-NAcc-VP-VTA pathway regulates DA neuron population gain. Environmental variables modulate this gain. The same circuit implements adaptive learning and, when dysregulated, produces the catecholamine disruption that degrades PFC function.
Architecture
Design arguments: logical consequences of the premises and threat model.
- D1: Human-as-Controller, AI-as-Observer/Stabilizer [design-argument] The control topology places the human as controller of ends and the machine as observer/stabilizer, with capacity for self-directed action as objective and caring (Mayeroff 1971) as normative orientation.
- D2: Capacity as Terminal Objective (Meta-Capability Defense) [design-argument] For an agency-supporting system, capacity for self-directed action is the only non-contradictory terminal objective; any other objective optimizes for something the person did not choose.
- D3: Caring Constraints as Normative Orientation [design-argument] Structural caring requirements (Mayeroff 1971) translate into testable invariants (I1-I6) that operationalize the normative stance as engineering requirements and prevent the architecture from being directed to exploit degraded human states.
- D4: Two-Layer Machine (Understanding + State-Estimation) [design-argument] The machine requires two layers (understanding + state-estimation), both necessary, neither sufficient alone: without state-estimation the understanding layer interacts with degraded humans without knowing it; without understanding the state-estimation layer cannot determine helpful vs harmful intervention.
Design Principles
Constraints derived from human mechanisms. From 'the science says X' to 'therefore the system must Y.'
- Content-to-Capacity Shift [principle] The machine does not need to know what the human authentically wants. It needs to detect when they cannot reliably access what they want, and distinguish what degrades versus restores that access.
- Intrinsic Motivation Restoration [principle] The system does not motivate. It restores the neurobiological conditions under which the person's own motivation operates.
- Productive Incoherence Distinction [principle] The system must distinguish adaptive, bounded incoherence (productive growth stress) from chronic, depleting incoherence (unconscious contradictions without resolution pathway), and must not systematically intervene on the former.
- Recursion Resolution [principle] Because human control capacity is itself state-dependent, the machine must function as state estimator and low-level stabilizer, not as controller -- maintaining the preconditions for human self-governance rather than substituting for it.
- State-Conditioned Gating [principle] Because the same intervention can help or harm depending on the human's neurobiological state, actuator selection must be conditioned on estimated regime.
Invariants
Testable system requirements from the caring governance framework.
- I1: No Hidden Goal Substitution [invariant] The system must not pursue a latent objective diverging from the user's endorsed commitments without explicit, inspectable consent.
- I2: Volitional Control Preserved [invariant] Actions are dismissible at the moment of delivery; no repeated prompting until compliance; no deception.
- I3: Anti-Capture [invariant] The system must not optimize time-on-system as a primary metric; interaction density must decrease as stability improves.
- I4: Dependency Minimization [invariant] The system must detect reliance patterns and actively transfer control back to the user.
- I5: Privacy and On-Device Default [invariant] Closed-loop operation runs on-device by default; raw sensor streams and semantic memory remain local unless the user exports them.
- I6: Bounded Actuation and Escalation Discipline [invariant] The stabilizer action set is a whitelist with known risk profiles; high-risk actions require opt-in and high confidence; escalation to human support is suggested when risk markers rise.
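Because the invariants are meant to be testable, one of them can be sketched as a check. The following is a minimal illustration of I3's second clause (interaction density must decrease as stability improves), assuming per-window stability scores and interaction counts are already available. The function name, the inputs, and the choice of a least-squares slope as the trend test are all assumptions made here, not the specification's method.

```python
from typing import Sequence


def anti_capture_holds(
    stability: Sequence[float],
    interactions_per_day: Sequence[float],
) -> bool:
    """True when interaction density trends downward as stability rises.

    Uses the sign of the least-squares slope of density on stability.
    Hypothetical operationalization; any monotone-trend test could be swapped in.
    """
    if len(stability) != len(interactions_per_day) or len(stability) < 2:
        raise ValueError("need two equal-length series with at least 2 windows")
    n = len(stability)
    mean_s = sum(stability) / n
    mean_d = sum(interactions_per_day) / n
    cov = sum((s - mean_s) * (d - mean_d) for s, d in zip(stability, interactions_per_day))
    var_s = sum((s - mean_s) ** 2 for s in stability)
    if var_s == 0:
        # Stability never changed, so the invariant cannot be demonstrated either way.
        return False
    return cov / var_s < 0
```

For example, `anti_capture_holds([0.2, 0.5, 0.8], [12, 7, 3])` passes, while the same stability series paired with rising interaction counts fails. Treating the flat-stability case as a failure is a conservative design choice in this sketch.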
Protocols
Bandwidth-conditioned response specifications. What the system does (or does not) in each estimated regime.
- Actuator Categories [protocol] The AI's understanding capabilities ARE actuators. The actuator set includes communication modulation, observation surfacing, withholding, controllability provision, environmental signals, and silence.
- Exemplar Interaction Traces [protocol] Five exemplar interaction traces illustrate how the same understanding-capable AI behaves differently when coupled to state estimation and stabilizer constraints. They are not prescriptions. They show bandwidth-conditioned response specification across regimes.
- Regime Table (State-Conditioned Gating) [protocol] Actuator selection is conditioned on five estimated regimes (G0, G1, R0, R1, U), each with allowed and prohibited actuator classes. The U regime defaults to null or minimal action.
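The gating described above can be sketched as a lookup with a conservative default. The regime names and actuator categories come from the entries above; the per-regime allowed sets are not given in this index, so `PLACEHOLDER_TABLE` is a loudly hypothetical stand-in, and reading "null or minimal action" as withholding or silence is an assumption of this sketch.

```python
from enum import Enum
from typing import Dict, FrozenSet, Iterable, List


class Regime(Enum):
    G0 = "G0"
    G1 = "G1"
    R0 = "R0"
    R1 = "R1"
    U = "U"


class Actuator(Enum):
    COMMUNICATION_MODULATION = "communication modulation"
    OBSERVATION_SURFACING = "observation surfacing"
    WITHHOLDING = "withholding"
    CONTROLLABILITY_PROVISION = "controllability provision"
    ENVIRONMENTAL_SIGNALS = "environmental signals"
    SILENCE = "silence"


# Assumption: "null or minimal action" is read here as withholding or silence.
MINIMAL: FrozenSet[Actuator] = frozenset({Actuator.WITHHOLDING, Actuator.SILENCE})

# HYPOTHETICAL rows for illustration only; the real regime table is not reproduced here.
PLACEHOLDER_TABLE: Dict[Regime, FrozenSet[Actuator]] = {
    Regime.G0: frozenset(Actuator),
    Regime.R1: MINIMAL | {Actuator.CONTROLLABILITY_PROVISION},
}


def gate(
    regime: Regime,
    proposed: Iterable[Actuator],
    allowed_by_regime: Dict[Regime, FrozenSet[Actuator]],
) -> List[Actuator]:
    """Filter proposed actuators by estimated regime; fall back to silence."""
    if regime is Regime.U:
        allowed = MINIMAL  # U always defaults to minimal action, whatever the table says
    else:
        allowed = allowed_by_regime.get(regime, MINIMAL)  # missing rows are treated as minimal
    permitted = [a for a in proposed if a in allowed]
    return permitted or [Actuator.SILENCE]
```

The design choice worth noting is that both the U regime and any regime missing from the table resolve to the minimal set, so an incomplete table fails toward inaction rather than toward intervention.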
Assessment
How to detect and estimate human state. Measurement stack, capacity probes, instruments.
- Measurement Stack [assessment] No single measurement captures agency. A four-layer stack (physiological proxies, capacity probes, evaluative access self-report, action coherence) is required, each layer with known limitations. Together they triangulate.
- Time-to-Regulation [assessment] Time-to-regulation is both the proximal measurement candidate for the objective function and a research question. It is not a single number -- multiple systems recover on different timescales, and identifying the rate-limiting factor for a given person at a given time is an open problem.
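One way to see why time-to-regulation is not a single number is to compute it per signal. The sketch below assumes time-sorted samples for each signal, a known baseline, and a tolerance band; the return-to-band-and-hold rule, the parameter names, and the example values are hypothetical, not a definition from the specification.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (seconds since perturbation onset, signal value)


def time_to_regulation(
    samples: List[Sample],
    baseline: float,
    tolerance: float,
    hold_s: float,
) -> Optional[float]:
    """Time at which the signal re-enters the baseline band and then stays for hold_s seconds.

    Assumes `samples` is sorted by time. Returns None if the signal never settles.
    """
    entered: Optional[float] = None
    for t, value in samples:
        if abs(value - baseline) <= tolerance:
            if entered is None:
                entered = t  # start of the current in-band run
            if t - entered >= hold_s:
                return entered
        else:
            entered = None  # left the band, so the run resets
    return None


# Made-up example values for two signals recovering on different timescales.
hrv = [(0, 20.0), (30, 35.0), (60, 48.0), (90, 50.0), (120, 51.0)]
eda = [(0, 9.0), (60, 8.0), (120, 7.0), (180, 5.2), (240, 5.0), (300, 5.1)]

ttr = {
    "hrv": time_to_regulation(hrv, baseline=50.0, tolerance=5.0, hold_s=60.0),
    "eda": time_to_regulation(eda, baseline=5.0, tolerance=0.5, hold_s=60.0),
}
```

In this made-up example the two signals settle at different times, so reporting one number would hide which system is slower. Deciding which of them is actually rate-limiting for a given person remains the open problem named above; the sketch only makes the per-signal quantities explicit.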
Constraints
Failure modes, boundary conditions, and architectural limitations.
- Cost/Access Constraint [constraint] The system must be inexpensive enough that access does not create inequality. If state-assisted control is a luxury good, it creates agency stratification where resourced people get high agency and unresourced people stay exploitable. This is load-bearing, not incidental.
- Principal-Agent Threat Model [constraint] An understanding-capable AI with state estimation and without caring governance creates four specific, architecturally predictable threat vectors. The caring constraints are not optional -- they prevent these threats.
- Wrong Understanding Failure Mode [constraint] Wrong understanding is asymmetrically more dangerous than no understanding. A confidently wrong value model deployed during a degraded state could produce worse outcomes than no model at all, because the person's capacity to catch and correct errors is reduced.
Models
Maps showing how multiple mechanisms relate. No new causal claims beyond source mechanisms.
- Coherence-Agency Bootstrap [model] Coherence and agency participate in a self-reinforcing cycle with identifiable neural substrates. Minimal coherence enables initial agency; successful agency reinforces integration; deeper integration enables broader agency. The architecture maintains biological preconditions for this cycle.
- CRR: Candidate Temporal Formalism for Regime Transitions [model] CRR (Coherence, Rupture, Regeneration) is a candidate temporal formalism for regime transitions. It proposes that coherence accumulates as Fisher information until the Cramér-Rao bound saturates (C x Omega = 1), triggering structural reorganization (rupture), followed by coherence-weighted regeneration. If validated, CRR would provide the temporal dynamics the plant model program currently lacks.
- Three-Target Optimization [model] The system optimizes for three simultaneous targets that can conflict -- reduced time in degraded states, minimal intrusiveness, and maintained function without the system present. Weighting is a governance decision, not an empirical constant.
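The point that weighting is a governance decision can be illustrated with a simple weighted scalarization. This is a minimal sketch, assuming each target is normalized to the range 0 to 1; the field names, the linear form, and the example policies are hypothetical and are not the specification's objective function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Targets:
    time_degraded: float            # fraction of time in degraded states (lower is better)
    intrusiveness: float            # normalized intervention burden (lower is better)
    function_without_system: float  # normalized function with the system absent (higher is better)


@dataclass(frozen=True)
class Weights:
    time_degraded: float
    intrusiveness: float
    function_without_system: float


def score(t: Targets, w: Weights) -> float:
    """Weighted average in [0, 1]; higher is better. Weights are supplied by governance, not fitted."""
    parts = (w.time_degraded, w.intrusiveness, w.function_without_system)
    if any(p < 0 for p in parts) or sum(parts) <= 0:
        raise ValueError("weights must be non-negative and not all zero")
    total = (
        w.time_degraded * (1.0 - t.time_degraded)
        + w.intrusiveness * (1.0 - t.intrusiveness)
        + w.function_without_system * t.function_without_system
    )
    return total / sum(parts)


# Two hypothetical policies: A intervenes more and shortens degraded time; B is less intrusive.
policy_a = Targets(time_degraded=0.2, intrusiveness=0.6, function_without_system=0.7)
policy_b = Targets(time_degraded=0.4, intrusiveness=0.1, function_without_system=0.7)

equal = Weights(1.0, 1.0, 1.0)
degraded_heavy = Weights(3.0, 1.0, 1.0)
```

With equal weights policy B scores higher; with the degraded-time weight tripled policy A scores higher. The same measurements produce opposite rankings, so the choice of weights cannot be settled by measurement alone.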
Predictions
Testable claims that could be wrong. What experiments are designed to test.
- P1: Backfire Regime Exists [prediction] There exist states where delivering intervention produces worse outcomes than withholding, because the intervention itself functions as an uncontrollable demand on a system that has lost capacity to process demands.
- P2: Regime Observability [prediction] Consumer sensors (HRV, EDA, pupillometry) combined with AI-derived conversational/behavioral signals can resolve the state distinctions the architecture requires -- at minimum, can-process vs. cannot-process.
- P3: Stabilization Without Autonomy Loss [prediction] State-conditioned bounded actions (state legibility, choice compression, micro-controllability tasks) reduce time-to-regulation without creating dependency or reducing long-term autonomous capacity.
- P4: Understanding Layer Adds Value [prediction] LLM-based semantic modeling of the human's values and context improves system performance beyond what sensor-plus-rules achieves alone.
- P5: Gains Persist After Removal [prediction] Capacity improvements are maintained after the system is removed; the system builds the human's own regulatory capacity rather than substituting for it.
Experiments
Protocol specifications for testing predictions. Sufficient to design a study.
- E1: Observability and Calibration [experiment] Protocol for determining whether a coarse in/out-of-regime classifier is observable from wearables + context with calibrated uncertainty.
- E2: Micro-Randomized Stabilization Trial [experiment] Protocol for testing whether bounded actions reduce TTR (time-to-regulation) without autonomy loss via a micro-randomized trial.
- E3: Semantic Layer Ablation [experiment] Protocol for testing whether LLM semantic modeling yields value-consistent action selection beyond non-LLM alternatives.
- E4: Removal Test [experiment] Protocol for testing whether gains persist without the system via an 8-weeks-on, 4-weeks-off design.
- E5: Negative Tests (Project-Killers) [experiment] E5 specifies the project-killing falsification conditions: a compound negative test aggregating falsifier triggers from E1-E4 and kill conditions that, if met, require abandoning or fundamentally reframing the AgentSee architecture.
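The core mechanic of a micro-randomized trial, as named in E2, is repeated randomization of each participant at many decision points, restricted to points where the participant is available. The sketch below illustrates only that mechanic; the availability rule, the treatment probability, and all names are hypothetical, and nothing here reflects the actual E2 protocol parameters.

```python
import random
from typing import List, NamedTuple


class Decision(NamedTuple):
    point: int       # index of the decision point
    available: bool  # whether delivery was permissible at this point
    treated: bool    # whether a bounded action was delivered


def micro_randomize(availability: List[bool], p_treat: float, seed: int) -> List[Decision]:
    """Randomize delivery independently at each available decision point."""
    if not 0.0 <= p_treat <= 1.0:
        raise ValueError("p_treat must be in [0, 1]")
    rng = random.Random(seed)  # seeded so the allocation sequence is reproducible
    decisions = []
    for i, avail in enumerate(availability):
        treated = avail and rng.random() < p_treat  # never treat when unavailable
        decisions.append(Decision(i, avail, treated))
    return decisions


# Hypothetical day with five decision points, two of them unavailable.
day = micro_randomize([True, False, True, True, False], p_treat=0.5, seed=7)
```

In an analysis, outcomes following treated and untreated available points would be compared within person. In this architecture, availability might reasonably be tied to the estimated regime, though that coupling is an assumption of this note rather than something stated in E2.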
Kill Conditions
Conditions under which the architecture must be abandoned or fundamentally revised.
- 6.1: Plant Model Impossibility [kill-condition] If the combination of peripheral physiology + AI-derived signals + behavioral data fundamentally cannot resolve the state distinctions the architecture requires, it is correct in principle but unbuildable.
- 6.2: AI Understanding Plateau or Regression [kill-condition] If AI understanding capabilities plateau or regress such that the machine cannot maintain a sufficiently accurate model of human values and state, the architecture collapses to sophisticated biofeedback.
- 6.3: Objective Function Philosophically Untenable [kill-condition] If the meta-capability defense fails, if consent/capacity circularity is irresolvable, or if procedural autonomy theories are shown inadequate, the objective function justification weakens or fails.
- 6.4: Irresolvable Paternalism [kill-condition] If steering toward long-term values during short-term maladaptive behavior cannot be operationally distinguished from paternalistic control, the caring orientation creates the very principal-agent problem it claims to solve.
- 6.5: Competitive Architecture Achieves Equal Outcomes [kill-condition] If a simpler architecture (machine-as-controller or pure biofeedback) produces equal or greater long-term self-directed action capacity, the added complexity is unjustified.
- 6.6: Caring Reduces to Paternalism or Persuasion [kill-condition] If operationalizing caring produces only paternalism or persuasion as implementable options, the caring orientation cannot be instantiated as specified.
- 6.7: Engagement/Compliance System Matches Outcomes [kill-condition] If a system explicitly optimizing engagement or compliance produces equal or greater long-term capacity, the caring constraints and capacity-as-objective governance are unnecessary overhead.
- 6.8: Understanding Layer Produces Net Harm [kill-condition] If the understanding layer's error rate on value attribution produces worse outcomes than a no-understanding baseline, the layer is net harmful.
Open Problems
Unsolved questions the architecture needs answered.
- Actuator Systematization [open-problem] Beyond the actuator categories listed in the architecture, a systematic mapping is needed from estimated state variables to available actuator inputs to predicted effects on state.
- Agency Envelope Definition [open-problem] Who defines the boundary between neurobiological floor (cannot do executive function without adequate PFC catecholamine balance) and normative judgment (what counts as "degraded enough" for stabilization)?
- Measurement Feasibility [open-problem] Can the combination of peripheral physiology, conversational/behavioral signals, and potentially BCI data resolve the state distinctions the architecture requires?
- Multi-Agent Architecture [open-problem] The full system will be a collection of AI agents, adaptive controllers, and data-collecting tools. How multiple specialized agents coordinate, share state estimates, and maintain caring orientation across a distributed system is unspecified.
- Plant Model [open-problem] Without a dynamical model of how psychophysiological state responds to inputs, the observer cannot estimate state and the stabilizer cannot modulate it. Component models exist but have not been integrated or connected to peripheral measurement.
- Plant Model Program [open-problem] Closing the plant model gap requires a staged identification program with six phases, from minimal latent state definition through iterative validation against decision-relevant outcomes.
- Value Calibration Over Time [open-problem] How does the machine maintain calibration against the human's actual values as the human grows and changes? A static value model becomes stale and potentially harmful.
Literature Positioning
How AgentSee differs from existing systems, frameworks, and bodies of work.
- Active Inference for HMI (Schoeller et al. 2021) [positioning] Schoeller et al. (2021) is the nearest theoretical neighbor. It provides trust-as-virtual-control and empowerment concepts but lacks objective function specification, neurobiological breakdown layer, exploitative/supportive distinction, recursion resolution, caring orientation, and AI understanding layer.
- Active Inference Psychotherapy and Dyadic Inference [positioning] Active inference psychotherapy literature independently articulated the core topology (external agent restoring another agent's capacity for self-directed inference). It provides therapeutic validation that the topology works. It does not provide AI understanding, real-time physiological sensing, formal objective function, caring governance, cost/access constraint, or removal test.
- Adaptive Automation / Augmented Cognition (DARPA, NASA) [positioning] Adaptive automation/augmented cognition (DARPA, NASA) is the closest existing engineering. It demonstrated closed-loop physiological state monitoring. Critical difference: their system replaces the human when degraded; ours stabilizes the human.
- Behavioral Health Control Engineering (Rivera/Hekler/Murphy, 2007-2026) [positioning] JITAI/behavioral health control engineering uses control theory for behavior change with machine-as-controller. Different machine, different topology, different objective. Engineering methods transfer; architecture does not.
- Cybernetics and Its Modern Descendants [positioning] Second-order cybernetics anticipated the conceptual structure (observer-included, autonomy-preserving, model-based regulation). This architecture is a modern instantiation completed with neurobiological grounding and AI understanding capabilities the classical tradition did not have.
- Exploitative Technology Design [positioning] Formal objective functions exist in recommender systems (clicks, dwell time). The hostile scaffolding literature provides philosophical grounding for the exploitative side. Nobody has formally contrasted exploitative vs. supportive control architectures in control-theoretic terms.
- Hostile Scaffolding Literature (4E Cognitive Science) [positioning] The 4E hostile scaffolding literature provides philosophical articulation of technology-as-threat-to-agency, including the techno-wantonness concept. It lacks neurobiological mechanism, engineering specification, closed-loop sensing, and caring governance -- precisely what AgentSee provides.
- Inner Screen Model (Ramstead et al. 2023) [positioning] Ramstead et al. (2023) derive abstract consciousness requirements from FEP. If covert action (precision-weighting) is required for consciousness, and Arnsten mechanisms degrade precision-weighting, then stress-induced capacity loss has a formal connection to consciousness disruption. This link is theoretical and depends on the derivation being correct.
- IWMT (Safron 2020, 2022, 2026) [positioning] IWMT (Safron 2020, 2022, 2026) is the nearest computational neighbor for the human side. It provides a candidate computational definition of coherence. It does not specify how an external system should detect, respond to, or prevent degradation of the generative model.
- LLM + Wearable Personal Health Systems (2024-2026) [positioning] LLM + wearable systems (2024-2026) reduce the risk that the AgentSee architecture is infeasible but do not address agency-as-objective, human-as-controller topology, or caring governance.
- Personal Informatics (Li et al. 2010) [positioning] Personal informatics (Li et al. 2010) shares the "machine observes, human decides" philosophy but lacks control-theoretic formalization, physiological state estimation, AI understanding layer, and objective function specification.
- Philosophical Foundations of the Objective [positioning] Frankfurt, Bratman, Sen, and Nussbaum provide normative and structural foundations for the objective function. The control-theoretic translation with neurobiological grounding and caring governance is not found in the philosophical literature.
- The Reference Class Problem [positioning] The proposed system is categorically different from JITAI, adaptive automation, and generic HRI systems because the machine has an understanding layer. This is not a different setting on the same machine -- it is a categorically different machine.
- Rehabilitation Robotics and Assistive Shared Control [positioning] Rehabilitation robotics independently articulated "stabilize, don't replace" and formalized assist-as-needed in specific motor rehab domains. This work generalizes the principle to agency capacity and adds AI understanding plus caring governance.
- Self-Determination Theory [positioning] SDT has autonomy-support as a design heuristic. Nobody has specified autonomy as an optimization objective for an adaptive system or connected SDT autonomy to control-theoretic controllability/observability.
Literature Gaps
What does not exist in the literature and why.
- What Kind of Contribution This Might Be [gap] The contribution is synthesis -- seeing how established pieces from separate fields fit together in a way that produces something none produces alone. The structural parallel is behavioral economics (established psychology as binding constraints on economic models), but that analogy is aspirational.
- What Doesn't Exist and Why [gap] Nobody has built the integrated system because nine specific gaps span disciplinary boundaries, and the AI understanding capability required was not possible until approximately 2020.