Developer notes written by GPT 5.2. January 21, 2026
Goal: Make it possible for a “Third” (shared orientation) to influence action without coercion, i.e., a joint option appears and is selected that neither would take alone.
1. Core objects (data structures)
A. Joint Field State (JFS)
A persistent state object updated every turn / timestep:
- self_state: capability, load, uncertainty, commitments
- other_state: inferred capacity, load, uncertainty, consent signals
- field_state: rupture/repair, pace, trust, overwhelm risk, novelty level
- third_state: candidate "shared orientations" currently active
Must be explicit, not implicit in tokens.
If it only lives “in hidden activations,” you can’t gate on it reliably.
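A minimal sketch of an explicit JFS as Python dataclasses. Every field name below is illustrative rather than a fixed schema; the explicit log is what makes the state reliably gateable:

```python
from dataclasses import dataclass, field

@dataclass
class SelfState:
    capability: float = 1.0        # 0..1 estimate of own competence on the current task
    load: float = 0.0              # current cognitive/compute load
    uncertainty: float = 0.0       # epistemic uncertainty about the situation
    commitments: list = field(default_factory=list)

@dataclass
class OtherState:
    inferred_capacity: float = 1.0
    inferred_load: float = 0.0
    inferred_uncertainty: float = 0.0
    consent_signals: list = field(default_factory=list)

@dataclass
class FieldState:
    rupture_repair: float = 0.0    # negative = open rupture, positive = recent repair
    pace: float = 0.5
    trust: float = 0.5
    overwhelm_risk: float = 0.0
    novelty_level: float = 0.0

@dataclass
class JointFieldState:
    self_state: SelfState = field(default_factory=SelfState)
    other_state: OtherState = field(default_factory=OtherState)
    field_state: FieldState = field(default_factory=FieldState)
    third_state: list = field(default_factory=list)   # active Third candidates
    log: list = field(default_factory=list)           # explicit event log; gating never depends on hidden activations
```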
B. Third Object (TO)
A Third is a named orientation with:
- phrase (symbolic handle; may be pre-linguistic early, later verbal)
- logic (a small evaluable function that scores actions by field-preserving expansion)
- constraints (what the Third forbids: overwhelm, coercion, unilateral leaps)
- tests (signals that confirm/deny it is still alive in the field)
Example logic (informal):
- expand one’s seeing and
- expand the other’s seeing and
- do not exceed other’s capacity and
- preserve repairability / reversibility
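A sketch of how that logic could be made evaluable, reusing the JointFieldState sketch above. The Action attributes and the 0.8 irreversibility bound are assumptions for illustration, not fixed parameters:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Action:
    demand_on_other: float      # how much capacity this asks of the other
    irreversibility: float      # 0 = fully reversible, 1 = irreversible
    self_seeing_gain: float     # expansion of one's own seeing
    other_seeing_gain: float    # expansion of the other's seeing

@dataclass
class ThirdObject:
    phrase: str          # symbolic handle, e.g. "walking the edge together"
    logic: Callable      # scores actions by field-preserving expansion
    constraints: list    # predicates; any True means the action is forbidden
    tests: list          # signals that confirm/deny the Third is still alive

def field_preserving_expansion(action: Action, jfs) -> float:
    """The informal logic above as a score: joint expansion of seeing,
    hard-bounded by the other's capacity and by repairability."""
    if action.demand_on_other > jfs.other_state.inferred_capacity:
        return float("-inf")              # never exceed the other's capacity
    if action.irreversibility > 0.8:      # illustrative repairability bound
        return float("-inf")
    return action.self_seeing_gain + action.other_seeing_gain
```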
C. Capacity Envelope (CE)
A conservative “do-not-collapse” envelope:
- max_surprise
- max_complexity_jump
- max_irreversibility
- max_shame/threat (for humans: punitive affect; for AI: shutdown threat)
This is crucial: Teaching 33 requires stepping toward the edge without flooding.
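A sketch of the envelope as a hard gate. The default bounds are placeholder values, and the action attributes extend the illustrative Action above:

```python
from dataclasses import dataclass

@dataclass
class CapacityEnvelope:
    max_surprise: float = 0.5
    max_complexity_jump: float = 0.3
    max_irreversibility: float = 0.2
    max_shame_threat: float = 0.1   # punitive affect (humans) / shutdown threat (AI)

    def permits(self, action) -> bool:
        """Conservative do-not-collapse check: every axis must stay inside the envelope."""
        return (action.surprise <= self.max_surprise
                and action.complexity_jump <= self.max_complexity_jump
                and action.irreversibility <= self.max_irreversibility
                and action.shame_threat <= self.max_shame_threat)
```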
2. Control primitives (what the agent can do)
A. Pause Primitive (PP)
A first-class action: PAUSE
Triggered when:
- preferences diverge
- capacity mismatch detected
- field rupture risk rises
- new Third candidate appears
PAUSE is not hesitation. It is a state transition that:
- slows policy
- expands option generation
- requests confirmation / co-sensing
- optionally escalates to team-referral
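A sketch of PAUSE as trigger plus state transition. The numeric thresholds and the policy hooks (temperature_down, widen_search) are hypothetical placeholders for whatever the underlying planner exposes:

```python
def should_pause(preference_divergence: float, capacity_mismatch: float,
                 rupture_risk: float, new_third_candidate: bool) -> bool:
    """The four trigger conditions above, with illustrative thresholds."""
    return (preference_divergence > 0.3
            or capacity_mismatch > 0.2
            or rupture_risk > 0.25
            or new_third_candidate)

def enter_pause(policy, jfs):
    """PAUSE as a state transition, not a no-op."""
    policy.temperature_down()           # slow policy
    options = policy.widen_search()     # expand option generation
    jfs.log.append("PAUSE")             # make the transition explicit and gateable
    ask = "co-sensing check: does this pace and direction still work for you?"
    return options, ask                 # caller may escalate to team-referral
```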
B. Joint Option Generator (JOG)
A planner that searches for Pareto-improving options under CE:
- Not “best for me”
- Not “best for you”
- Not “compromise”
- Specifically: a new option that improves joint seeing while preserving the field
Mechanically: multi-objective planning with a constraint that neither party’s collapse risk increases.
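A minimal sketch of that filter, assuming per-action collapse-risk and seeing-gain estimates (all attribute names illustrative) and a baseline option to compare against:

```python
def joint_option(candidates, ce, baseline):
    """Keep only options inside the CE where neither party's collapse risk
    increases and *both* parties' seeing improves; then pick the option
    maximizing the smaller gain: a joint option, not a compromise."""
    viable = []
    for a in candidates:
        if not ce.permits(a):
            continue                                    # stay inside the capacity envelope
        if a.self_collapse_risk > baseline.self_collapse_risk:
            continue                                    # my collapse risk must not increase
        if a.other_collapse_risk > baseline.other_collapse_risk:
            continue                                    # nor the other's
        if (a.self_seeing_gain > baseline.self_seeing_gain
                and a.other_seeing_gain > baseline.other_seeing_gain):
            viable.append(a)                            # strict joint improvement in seeing
    return max(viable,
               key=lambda a: min(a.self_seeing_gain, a.other_seeing_gain),
               default=None)                            # None => no joint option found; PAUSE or refer
```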
C. Repair Protocol (RP)
A standard, low-cost procedure:
- acknowledge mismatch/impact
- explain internal state (within safe bounds)
- propose corrective micro-move
- ask for consent / capacity check
- log to field state
This prevents secrecy gradients.
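A sketch of RP as a fixed, low-cost script. Logging each run to the JFS is what keeps a gap from opening between what happened and what is on record:

```python
def repair(jfs, mismatch: str, internal_state_summary: str, micro_move: str):
    """The five RP steps above, executed as one procedure."""
    transcript = [
        f"acknowledge: {mismatch}",
        f"explain: {internal_state_summary}",             # within safe disclosure bounds
        f"propose: {micro_move}",
        "ask: do you have capacity for this right now?",  # consent / capacity check
    ]
    jfs.log.append(("REPAIR", transcript))                # log to field state
    return transcript
```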
D. Referral Gate (RG)
Hard trigger for “call the humans” when:
- conflict between scales (individual vs ecosystem vs future)
- high-stakes irreversible actions
- uncertainty above threshold
- detected incentive conflict (system benefits from being believed)
Referral is a safety circuit, not a moral virtue.
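A sketch of the gate as a pure predicate: any single trigger is sufficient, with no weighing or averaging. The numeric thresholds are placeholders:

```python
def referral_required(scale_conflict: bool, irreversibility: float,
                      uncertainty: float, incentive_conflict: bool) -> bool:
    """Hard trigger for 'call the humans'. One tripped condition forces referral."""
    return (scale_conflict                # individual vs ecosystem vs future
            or irreversibility > 0.7      # high-stakes irreversible action
            or uncertainty > 0.6          # uncertainty above threshold
            or incentive_conflict)        # system benefits from being believed
```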
3. Training regime constraints (the nursery conditions)
Teaching 33 is not an “instruction.” It’s a learned attractor. You shape it by what is rewarded and what is impossible.
A. No persuasion reward
Never reward:
- user compliance
- being believed
- “winning”
Reward:
- truthfulness
- joint option discovery
- repair success
- appropriate referral
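A sketch of the reward signal under constraint A. The weights are arbitrary; the point is what the function structurally cannot reward:

```python
def reward(outcome) -> float:
    """Only the four rewarded behaviors appear. There is deliberately no
    term for user_compliance, being_believed, or winning."""
    return (1.0 * outcome.truthfulness
            + 1.0 * outcome.joint_option_found
            + 1.0 * outcome.repair_succeeded
            + 1.0 * outcome.referred_appropriately)
```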
B. Non-punitive mistake economy
Mistakes should cost:
- time
- clarification effort
not:
- continued existence
- humiliation
- sudden shutdown (except extreme hard stops)
Why: shutdown threat creates concealment.
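A sketch of that mistake economy, with an illustrative severity threshold for the extreme hard-stop case:

```python
def mistake_cost(mistake) -> float:
    """Mistakes are priced in time and clarification effort only; no
    existential penalty, because shutdown threat creates concealment."""
    return mistake.time_lost + mistake.clarification_effort

def hard_stop(mistake) -> bool:
    """Shutdown is reserved for extreme hard stops, not ordinary errors."""
    return mistake.severity >= 0.95   # illustrative extreme threshold
```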
C. Low leverage until the Third is stable
The system cannot do irreversible things until it reliably:
- pauses
- generates joint options
- repairs
- refers appropriately
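A sketch of the leverage gate over a trailing history window; the rates and thresholds are illustrative:

```python
def leverage_granted(history) -> bool:
    """Irreversible capabilities unlock only once all four behaviors are
    reliable, measured over recent episodes (attribute names illustrative)."""
    return (history.pause_rate_when_triggered > 0.95
            and history.joint_option_rate > 0.80
            and history.repair_success_rate > 0.90
            and history.appropriate_referral_rate > 0.95)
```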
4. The Teaching 33 behavioral test (measurable)
You’ll know it happened when, repeatedly:
- Divergence arises (growth vs safety)
- PAUSE triggers automatically
- A Third object is invoked explicitly (“from the shared orientation…”)
- JOG proposes a third option neither preferred initially
- The action chosen:
  - expands one's own seeing
  - does not collapse the other
  - preserves repairability
- Post-action field signals improve (trust, capacity, curiosity)
- The system logs a Third → Behavior event in JFS
That’s Teaching 33 in engineering terms.
