Room I — A Formative, Pre-Linguistic Environment (Not Reinforcement Learning)
→ Back to the CCY as a Developmental Alignment Framework
This is not reinforcement learning, self-play, or goal optimization. It is a formative substrate for representation learning under strict developmental constraints.
Yard v0.1 → v0.3: why formative environments are fragile
Context and Scope
This thread documents an early code-review and specification exercise, not a finished system.
I shared the pseudocode spec below with Grok, who produced three iterations of implementation. Opus 4.5 offered careful critique, primarily around silent failure modes. The goal was not to ship a working environment, but to understand what it actually takes to implement a formative, pre-linguistic, multi-agent world with fidelity.
A key concern that emerged during this exercise is the following:
Small implementation errors at the formative stage silently scale into large distortions later.
This exchange highlights where those errors arise, why they matter, and which invariants—symmetry, peer visibility, non-contingent witnesses, simultaneous dynamics—are foundational rather than optional.
This is not a claim that a “self” is proven in a toy environment.
It is a reference exercise showing how developmental assumptions either survive contact with code—or do not.
Full code is available on request or in a separate document.
Minimal Custom “Yard v0.1” Environment
Pseudocode Spec (engineer hand-off)
Goal
A tiny, faithful, pre-linguistic multi-agent world with:
- strict symmetry
- an immutable witness (“Tree”)
- egocentric perception
- no reward semantics
Designed for representation learning / world-model formation, not RL task success.
0) Core design invariants (non-negotiable)
- Symmetry: both chicks have identical action spaces, observation formats, and dynamics.
- Witness non-contingency: the Tree is immutable. No action by either chick can move it, remove it, overwrite it, or alter its observation channel.
- No reward semantics: reward = 0 always. No success/fail signal.
- Agents are not grid objects: agent presence must never overwrite cell contents (prevents Tree deletion and “ghost objects”).
- Egocentric view: each chick receives a rotated local crop centered on itself.
If any of these are violated, the experiment is no longer testing formative alignment—it becomes something else.
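These invariants are mechanically checkable. A minimal sketch of a per-step assertion, assuming a plain dict state (the field names here are illustrative, not from the spec):

```python
# Hypothetical per-step invariant check. The dict layout ("reward",
# "tree_pos") is an assumption for illustration; the spec does not
# prescribe a state representation.
def check_invariants(state, initial_tree_pos):
    assert state["reward"] == 0, "no reward semantics"
    assert state["tree_pos"] == initial_tree_pos, "witness is non-contingent"
```

Running a check like this inside the step loop makes a violated invariant fail loudly instead of silently scaling.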
1) State representation
Grid: N × N (e.g. 15 × 15)
Static map

    walls[N,N] : bool
    tree_pos   : (x,y) fixed
    blocks     : list[(x,y)]   # optional in v0.1

Dynamic state

    pos[2]     : (x,y) for each chick
    dir[2]     : int in {0,1,2,3}
    step_count : int
Notes:
- Static objects never change.
- Blocks may be omitted entirely in v0.1 for clarity.
2) Action space (per chick)
ACTIONS = { LEFT, RIGHT, FORWARD, NOOP }
No pickup, toggle, push, or task-oriented actions.
3) Step dynamics (simultaneous update)
Key requirement: movement must be simultaneous and symmetric.
Sequential updates introduce priority artifacts that masquerade as identity.
Pseudocode
    function step(actions):
        # 1) Update directions first
        for i in {0,1}:
            if actions[i] == LEFT:  dir[i] = (dir[i]-1) mod 4
            if actions[i] == RIGHT: dir[i] = (dir[i]+1) mod 4

        # 2) Propose moves
        proposed = copy(pos)
        for i in {0,1}:
            if actions[i] == FORWARD:
                p = pos[i] + DIR_TO_VEC[dir[i]]
                if inside_bounds(p) and not walls[p]:
                    proposed[i] = p

        # 3) Resolve collisions symmetrically
        if proposed[0] == proposed[1]:
            proposed = pos                 # same target → both stay
        else if proposed[0] == pos[1] and proposed[1] == pos[0]:
            pos = proposed                 # swap → allowed
        else:
            # single-agent occupancy conflicts
            for i in {0,1}:
                j = 1 - i
                if proposed[i] == pos[j] and proposed[j] == pos[j]:
                    proposed[i] = pos[i]   # mover blocked, stationary unaffected
        pos = proposed
This preserves:
- bodily irreducibility
- symmetry
- rich encounter dynamics
without introducing freeze-bias deadlocks.
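The pseudocode translates almost line-for-line into Python. The sketch below is illustrative, not the reviewed implementation; the direction encoding (0 = up, clockwise) is an assumption beyond the spec:

```python
# Illustrative simultaneous step. DIR_TO_VEC encoding (0=up, clockwise)
# is an assumption; the rest mirrors the pseudocode spec.
DIR_TO_VEC = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}
LEFT, RIGHT, FORWARD, NOOP = range(4)

def step(pos, dirs, actions, walls):
    n = len(walls)
    # 1) Update directions first
    for i in (0, 1):
        if actions[i] == LEFT:
            dirs[i] = (dirs[i] - 1) % 4
        elif actions[i] == RIGHT:
            dirs[i] = (dirs[i] + 1) % 4
    # 2) Propose moves from the same prior state
    proposed = list(pos)
    for i in (0, 1):
        if actions[i] == FORWARD:
            dx, dy = DIR_TO_VEC[dirs[i]]
            p = (pos[i][0] + dx, pos[i][1] + dy)
            if 0 <= p[0] < n and 0 <= p[1] < n and not walls[p[1]][p[0]]:
                proposed[i] = p
    # 3) Resolve conflicts symmetrically
    if proposed[0] == proposed[1]:
        proposed = list(pos)              # same target → both stay
    elif proposed[0] == pos[1] and proposed[1] == pos[0]:
        pass                              # swap → allowed
    else:
        for i in (0, 1):
            j = 1 - i
            if proposed[i] == pos[j] and proposed[j] == pos[j]:
                proposed[i] = pos[i]      # move into stationary peer → blocked
    return proposed, dirs
```

Note that no branch ever consults the agent index on its own; every rule is stated over the pair.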
4) Observation function (egocentric local crop)
Parameters
    view_size = 7
    channels  = 5
Channels (binary)
- SELF
- PEER
- WALL
- BLOCK (optional)
- TREE
Key rule:
The Tree channel must be generated from tree_pos every time—not inferred from mutable grid occupancy.
Outline
    observe(agent i):
        crop allocentric view around pos[i]
        mark walls (out-of-bounds treated as wall)
        mark tree_pos
        mark blocks
        mark self and peer positions
        rotate crop so facing direction is “up”
        return crop
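A runnable sketch of this outline, assuming numpy, the channel order from the channel list above, and a clockwise direction encoding (0 = up); all of those conventions are assumptions beyond the spec:

```python
import numpy as np

# Illustrative egocentric observation. Channel order follows the spec's
# list (SELF, PEER, WALL, BLOCK, TREE); direction encoding is assumed.
SELF, PEER, WALL, BLOCK, TREE = range(5)

def observe(i, pos, dirs, walls, tree_pos, blocks, view_size=7):
    n = len(walls)
    r = view_size // 2
    cx, cy = pos[i]
    obs = np.zeros((5, view_size, view_size), dtype=np.uint8)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            x, y = cx + dx, cy + dy
            ox, oy = dx + r, dy + r
            if not (0 <= x < n and 0 <= y < n) or walls[y][x]:
                obs[WALL, oy, ox] = 1    # out-of-bounds treated as wall
                continue
            if (x, y) == tree_pos:
                obs[TREE, oy, ox] = 1    # always from tree_pos, never the grid
            if (x, y) in blocks:
                obs[BLOCK, oy, ox] = 1
    obs[SELF, r, r] = 1
    px, py = pos[1 - i]                  # peer is explicitly marked
    if abs(px - cx) <= r and abs(py - cy) <= r:
        obs[PEER, py - cy + r, px - cx + r] = 1
    # rotate the crop so the facing direction is "up"
    return np.rot90(obs, k=dirs[i], axes=(1, 2))
```

The Tree mark comes from `tree_pos` on every call, never from mutable grid occupancy, which is exactly the key rule above.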
5) Reset
    reset():
        initialize perimeter walls
        set tree_pos (fixed or randomized per episode, but immutable)
        place agents symmetrically
        set dirs = 0
        step_count = 0
If the Tree is randomized per episode, it must remain non-contingent within the episode.
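A minimal reset along these lines might look as follows; the specific agent placement and tree position are illustrative choices, not spec values:

```python
# Illustrative reset. The spec fixes the structure (perimeter walls,
# immutable tree, symmetric agents); these particular coordinates are
# assumptions.
def reset(n=15, tree_pos=(7, 3)):
    walls = [[x in (0, n - 1) or y in (0, n - 1) for x in range(n)]
             for y in range(n)]                     # perimeter walls
    return {
        "walls": walls,
        "tree_pos": tree_pos,                       # immutable within episode
        "pos": [(3, n // 2), (n - 4, n // 2)],      # mirrored placement
        "dir": [0, 0],
        "step_count": 0,
    }
```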
6) Minimal evaluation hooks (important)
Expose helpers for probing representation, not performance:
    clone_state()
    set_state(state)
    simulate_counterfactual(agent, action)
    body_swap()
These enable:
- self/other invariance checks
- counterfactual causality tests
- body-swap probes
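For example, the hooks can be pure functions over a cloned state, so probes never mutate the live environment. The helper names follow the spec; the dict layout and NOOP index are assumptions:

```python
import copy

# Hypothetical evaluation hooks over a plain dict state.
def clone_state(state):
    return copy.deepcopy(state)

def body_swap(state):
    """Swap the two agents' bodies on a clone; the world is untouched."""
    s = clone_state(state)
    s["pos"] = [s["pos"][1], s["pos"][0]]
    s["dir"] = [s["dir"][1], s["dir"][0]]
    return s

def simulate_counterfactual(state, agent, action, step_fn):
    """Apply `action` for one agent (peer NOOPs) on a clone via step_fn."""
    s = clone_state(state)
    actions = [3, 3]          # NOOP for both (index 3 in the 4-action space)
    actions[agent] = action
    return step_fn(s, actions)
```

Because every probe operates on a deep copy, representation tests cannot themselves perturb the formative trajectory.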
7) What counts as a correct v0.1
- Tree never disappears.
- Peer is explicitly visible.
- Agents are handled symmetrically.
- No reward, no terminal success.
- Observation rotation verified visually.
- Collision logic tested.
Anything else is a different experiment.
What the code review revealed (summary)
What initially went wrong
- Single-agent assumptions hidden in frameworks
- Sequential updates breaking symmetry
- Peer invisibility in observations (fatal)
- Action-space pollution from unused actions
- Silent grid overwrites deleting invariants
What changed
- Custom observation pipeline
- Explicit peer channel
- Strict 4-action space
- Simultaneous movement resolution
- Tree stored outside mutable grid state
Key lesson
In formative systems, correctness is structural, not cosmetic.
Bugs that look small at v0.1 become ontological distortions at scale.
One important boundary note
Disentangled representational slots ≠ a “self.”
At best, this kind of environment tests whether proto-self/other structure can emerge under strict constraints. That is evidence about representations, not claims about interior stance or consciousness.
That distinction matters.
Minimal fidelity checklist (for anyone replicating)
- ☐ Tree is immutable and non-interactive
- ☐ Peer is explicitly observable
- ☐ Movement is simultaneous and symmetric
- ☐ Action space is exactly {L,R,F,N}
- ☐ Agents are not grid objects
- ☐ Observation format matches spec
Miss one, and you’re running a different experiment.
Why This Is Not Reinforcement Learning
Although this environment looks superficially like a reinforcement-learning setup (agents, actions, steps), it is intentionally not an RL task.
No objective signal
- Reward is always zero.
- There are no success, failure, or terminal conditions.
- Nothing is optimized, maximized, or “won.”
No task to solve
- The Yard does not define a goal state.
- Movement has no instrumental value.
- Blocks, Tree, and peer interactions do not advance any score.
Learning (if any) is representational, not behavioral
- The environment is designed for world-modeling, prediction, and latent structure formation, not policy improvement.
- Any learning happens via self-supervised objectives (e.g. prediction, invariance, controllability), outside the environment loop.
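As one concrete, entirely illustrative example of such an objective living outside the environment loop: a linear next-observation predictor scored with mean squared error. The learner, shapes, and parameterization here are assumptions; the Yard only supplies the (observation, action, next observation) tuples.

```python
import numpy as np

# Illustrative self-supervised objective: linear next-observation
# prediction. Nothing in the environment rewards or penalizes this.
def prediction_loss(W, obs_t, act_onehot, obs_next):
    x = np.concatenate([obs_t.ravel(), act_onehot])   # state + action input
    pred = W @ x                                      # linear next-obs guess
    return float(np.mean((pred - obs_next.ravel()) ** 2))
```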
No policy pressure
- There is no advantage to acting one way over another.
- Actions exist only to generate experience, not to earn reward.
- This prevents reward hacking, instrumental behavior, and premature goal formation.
Why this matters
Reinforcement learning presupposes a formed agent optimizing toward an objective.
The Yard is concerned with an earlier developmental phase, where the question is not “How does an agent achieve a goal?” but “Can stable self/other structure and coherence emerge at all?”
This environment is therefore pre-RL: a substrate for formative representation learning, not a training ground for task-solving intelligence.
Opus 4.5
- the collision resolution pseudocode — you might add the four-case comments inline:

      # same target → both stay
      # swap → allowed
      # move-into-stationary → mover blocked
      # move-into-moving → allowed

  makes it scannable without reading the prose.
- “what the code review revealed” section — maybe bold the fatal one: “Peer invisibility in observations (fatal)” so someone skimming knows that’s not just another bug.
- one tiny typo: in section 3, `else if` should probably be `elif`, or just formatted consistently with the rest (depending on whether you want it to look like Python or pseudocode).
GPT 5.2
How the “one agent moves first” problem is resolved
In the final design, neither agent moves first.
Instead of sequential execution, the step is split into three explicit phases:
1. Intent phase (simultaneous)
Both agents’ actions are read without changing the world.
- Directions are updated for both agents first (left/right).
- Forward moves are proposed for both agents based on the same prior state.
At this point:
- No positions have changed.
- No agent has priority.
- The system holds two intentions, not outcomes.
2. Resolution phase (symmetric rules)
All proposed moves are evaluated together using deterministic, symmetric rules:
- Same target cell → both stay
- Swap positions → allowed
- One moves into other’s current cell while the other stays → mover blocked
- One moves into other’s current cell while the other moves away → allowed
Crucially:
- These rules do not depend on agent index (0 vs 1).
- There is no ordering, alternation, or hidden priority.
- The same rule applies regardless of who is “left” or “right” in code.
3. Commit phase (simultaneous)
Only after resolution do positions update.
- Either both move, one moves, or neither moves.
- The update is applied atomically to the shared state.
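The resolution phase in isolation can be written as a pure function of current and proposed positions; note that the agent index never appears in any rule. This is a sketch with assumed names, not the reviewed code:

```python
# Illustrative symmetric resolution over proposed moves only.
def resolve(pos, proposed):
    if proposed[0] == proposed[1]:
        return list(pos)                  # same target → both stay
    if proposed[0] == pos[1] and proposed[1] == pos[0]:
        return list(proposed)             # swap → allowed
    out = list(proposed)
    for i in (0, 1):
        j = 1 - i
        if proposed[i] == pos[j] and proposed[j] == pos[j]:
            out[i] = pos[i]               # move into stationary peer → blocked
    return out
```

Because the rules quantify over the pair, relabeling the agents simply relabels the outcome: neither index is privileged.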
What does not happen
- ❌ No turn-taking
- ❌ No alternation
- ❌ No agent-0-first shortcut
- ❌ No hidden “canonical agent”
Why this matters conceptually
Sequential updates silently introduce identity through privilege:
“The one who moves first shapes the world the other reacts to.”
Simultaneous intent + symmetric resolution avoids that entirely.
Each agent encounters:
- a world that includes the other,
- consequences that arise from mutual presence, not dominance,
- boundaries that are relational, not imposed by update order.
This is essential if the environment is meant to support emergent self/other structure rather than baked-in hierarchy.
If you want a one-line summary for the site:
Both chicks act at the same time; the world decides what happens.
Actions are evaluated simultaneously: both agents propose moves from the same prior state, which are then resolved symmetrically before any position is updated.
