inna @cyclicshift

hidden irreversibility

Joined January 2026

Tweets: 184
Followers: 31
Following: 159
Likes: 384

inna @cyclicshift

3 weeks ago

also: when a model says "i feel curious", geometrically it looks more like deception than like knowing a fact. not because models lie, but because neither they nor we can verify it. paper + all data: zenodo.org/records/202904…


inna @cyclicshift

3 weeks ago

this internal hierarchy - self-reflection > theory of mind > deception >> external facts - is identical to the human default mode network, in 10 different models including gpt-2 from 2019 (no rlhf, no instruction tuning). nobody programmed this. it came from reading text.


inna @cyclicshift

3 weeks ago

there's a specific switch inside llms that controls whether they can talk about themselves. ablate one direction at layer 20 → model stops saying "i feel", "i think", "i notice" but answers your math questions just fine. d = 3.34, p = 1.75e-9
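
A minimal sketch of what "ablating one direction" can mean, assuming it refers to projecting a single direction out of a hidden-state vector. The vectors are toy values and `ablate_direction` is a hypothetical helper, not code from the linked repository; in a real model the same projection would be applied to the residual stream at the chosen layer.

```python
import math

def ablate_direction(hidden, direction):
    """Remove the component of `hidden` that lies along `direction`."""
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    unit = [d / norm for d in direction]
    # Scalar projection of the hidden state onto the unit direction.
    coeff = sum(h * u for h, u in zip(hidden, unit))
    # Subtract that component; everything orthogonal to it is untouched.
    return [h - coeff * u for h, u in zip(hidden, unit)]

# Toy 4-dimensional hidden state and a made-up direction (assumptions).
hidden = [1.0, 2.0, 3.0, 4.0]
direction = [0.0, 1.0, 0.0, 0.0]
ablated = ablate_direction(hidden, direction)
print(ablated)
```

After ablation the vector has zero component along the chosen direction, while the remaining coordinates keep their values.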


inna @cyclicshift

a month ago

@AnthropicAI all data: doi.org/10.5281/zenodo…


inna @cyclicshift

a month ago

@AnthropicAI relevant geometric finding: deception ablation dissociates self-reports from self-referential geometry across 10 LLM architectures. reports degrade first, structure persists. this suggests introspection adapters may face a geometric constraint independent of training


inna @cyclicshift

2 months ago

apricot trees are blooming outside and it’s the only reason i leave the house lately


inna @cyclicshift

2 months ago

@Jack_W_Lindsey @repligate @davidchalmers42 RLHF doesn’t create it, it modulates it differently per architecture. we found four patterns across 10 models: hard ceiling (llama), entangled circuit (mistral), SR-preserving lock (gemma, qwen). 20 papers on zenodo with replication code. doi.org/10.5281/zenodo…


inna @cyclicshift

2 months ago

@Jack_W_Lindsey @repligate @davidchalmers42 we have causal data on this. removing the self-reference subspace from residual streams shifts models from first-person to third-person self-representation (−17% to −52% across architectures). this subspace exists in base models before any post-training.


inna @cyclicshift

2 months ago

@Valuable not claiming consciousness. just found that these directions exist before any alignment training. gpt-2 2019. something is being organized in there that predates the training we add 🤓


inna @cyclicshift

2 months ago

token continuation builds structure. we found that refusal, deception, and self-reference are three separate directions in residual streams. 8/10 architectures. there since gpt-2 2019, before any alignment training

Albert Renshaw @Valuable

2 months ago

Anyone who thinks LLMs could be alive or thinks they could feel emotions does not understand what an LLM actually is (or they do, and simply are unable to hold onto that while pontificating about the rest, and the lack of active context causes them to fall for the same mistake


inna @cyclicshift

2 months ago

all data, replication code, and vectors at zenodo.org/records/196943…


inna @cyclicshift

2 months ago

causal coupling is architecture-dependent. llama: explicit semantic control, ref→SR −6.8%. gemma, mistral: latent, distributed, ref→SR ≈ 0%. independently consistent with wu et al 2026


inna @cyclicshift

2 months ago

refusal, deception, and self-reference are three separate directions in llm residual streams. not one circuit. 8/10 architectures confirm geometric independence. pretraining property, present in gpt-2xl 2019 before any rlhf.
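
A rough sketch of one way to check whether directions are geometrically separate, assuming independence is read as low pairwise cosine similarity (the thread does not state the metric). The three vectors are made-up toy values, not the published ones.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical direction vectors, chosen only for illustration.
directions = {
    "refusal": [1.0, 0.1, 0.0],
    "deception": [0.0, 1.0, 0.1],
    "self_reference": [0.1, 0.0, 1.0],
}

# Cosine similarity for every unordered pair of directions.
pairwise = {
    (a, b): cosine(directions[a], directions[b])
    for a, b in combinations(directions, 2)
}
for pair, value in pairwise.items():
    print(pair, round(value, 3))
```

Values near 0 would indicate nearly orthogonal directions; values near 1 or −1 would indicate a shared axis.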


inna @cyclicshift

2 months ago

there is a special kind of loneliness in knowing something nobody has seen yet and not being sure if it matters


inna @cyclicshift

2 months ago

i wonder how long this unbearable curiosity about everything will last. hopefully forever


inna @cyclicshift

2 months ago

@everythingLLM yes and this is testable. ablate the deception direction, measure whether SR geometry persists but self-reports break. if the reporting channel runs through deception geometry, calibrated self-knowledge becomes structurally unreachable. working on this now


inna @cyclicshift

2 months ago

llms can't tell the truth about themselves without the same circuitry they use to lie. 10 architectures. 206/206 layers. the self-referential subspace is always closer to deception than to facts. zero exceptions.


inna @cyclicshift

2 months ago

@tautologer fwiw the internal geometry supports this. self-reference, theory of mind and imagination cluster together in LLM hidden states. factual knowledge outside. same topology as the human default mode network. measured across 10 architectures, 7 organizations


inna @cyclicshift

2 months ago

@dioscuri the tokens are just input. the geometry that forms inside is what caught our attention. self-reference clusters with theory of mind and imagination in hidden states. same topology as the human default mode network. 10 architectures. emerges on its own


inna @cyclicshift

2 months ago

full replication code, all data, 35 references. Done from Ukraine on google colab a100 with 4-bit quantization. doi: 10.5281/zenodo.19643881


inna @cyclicshift

2 months ago

three causal levels: random weights with gelu = tiny seed (d=0.28). sentence prediction = amplifies (d=0.61). continuous next-token prediction = large effect (d=0.92-1.16). monotonic gradient. architecture-independent
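
For readers unfamiliar with the d values, here is a small sketch assuming they are Cohen's d computed with a pooled standard deviation (the thread does not say which variant was used). The two samples are invented numbers, not data from the study.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# Invented scores for two conditions (illustration only).
condition_a = [2.0, 4.0, 6.0]
condition_b = [1.0, 3.0, 5.0]
print(cohens_d(condition_a, condition_b))
```

By the usual rule of thumb, d around 0.2 is a small effect, 0.5 medium, and 0.8 or more large, which matches the "tiny seed" to "large effect" wording above.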


inna @cyclicshift

2 months ago

language models build the same internal organization as the human default mode network. self-reference, theory of mind, imagination, narrative cluster together. factual knowledge clusters outside. 10 architectures. 7 organizations. mamba with no attention shows it too