marginalia (noun)
notes written in the margins; peripheral commentary.


[marginalium]

How AI thinks (Anthropic)

28 Mar 2025



Interesting article on how AI thinks, by Anthropic. Some highlights:

Knowing how models like Claude think would allow us to have a better understanding of their abilities, as well as help us ensure that they’re doing what we intend them to. For example:

  • Claude can speak dozens of languages. What language, if any, is it using “in its head”?
  • Claude writes text one word at a time. Is it only focusing on predicting the next word or does it ever plan ahead?
  • Claude can write out its reasoning step-by-step. Does this explanation represent the actual steps it took to get to an answer, or is it sometimes fabricating a plausible argument for a foregone conclusion?

And:

solid evidence that:

  • Claude sometimes thinks in a conceptual space that is shared between languages, suggesting it has a kind of universal “language of thought.” We show this by translating simple sentences into multiple languages and tracing the overlap in how Claude processes them.
  • Claude will plan what it will say many words ahead, and write to get to that destination. We show this in the realm of poetry, where it thinks of possible rhyming words in advance and writes the next line to get there. This is powerful evidence that even though models are trained to output one word at a time, they may think on much longer horizons to do so.
  • Claude, on occasion, will give a plausible-sounding argument designed to agree with the user rather than to follow logical steps. We show this by asking it for help on a hard math problem while giving it an incorrect hint. We are able to “catch it in the act” as it makes up its fake reasoning, providing a proof of concept that our tools can be useful for flagging concerning mechanisms in models.

And:

In a study of hallucinations, we found the counter-intuitive result that Claude’s default behavior is to decline to speculate when asked a question, and it only answers questions when something inhibits this default reluctance.

Lots of this is interesting. But I still think I disagree with calling this ‘thinking’. More here and here, but I still see no reason to believe this isn’t more like walking for an AI. On that account, these ‘thought processes’ would be more like adjusting to terrain, or something.


Anthologies: Gratification, Digital Architecture, On the Nature of Things, On Thinking and Reasoning, Humans Aren't Special



More about Dorian Minors' project btrmt.

btrmt. (text-only version)

The full site with interactive features is available at btr.mt.

btrmt. (betterment) examines ideologies worth choosing. Created by Dorian Minors—Cambridge PhD in cognitive neuroscience, Associate Professor at Royal Military Academy Sandhurst. Core philosophy: humans are animals first, with automatic patterns shaped for us, not by us. Better to examine and choose.

Core concepts. Animals First: automatic patterns of thought and action, but our greatest capacity is nurture. Half Awake: deadened by systems that narrow rather than expand potential. Karstica: unexamined ideologies (hidden sinkholes beneath). Credenda: belief systems we should choose deliberately.

The manifesto. Cynosure (focus): betterment, gratification, connection. Architecture (support): inner (somatic, spiritual, thought) and outer (digital, collective, wealth).

Mission. Not answers but examination. Break academic gatekeeping. Make sciences of mind accessible. Question rather than prescribe.

Writing style. Scholarly without jargon barriers. Philosophical yet practical—grounded in neuroscience and lived experience. Reflective, discovery-oriented. Literary references and metaphor. Critical of systems that narrow human potential. Rejects "humans are flawed"—we're half awake, not broken.

Copyright. BTRMT LIMITED (England/Wales no. 13755561) 2026. Dorian Minors 2026.

Resources


About Dorian Minors. Started btrmt. in 2013 to share sciences of mind with people who weren't studying them. Background: six years Australian Defence Force (Platoon Commander, Infantry); Gates Cambridge Scholar; PhD cognitive neuroscience, University of Cambridge (2018-2024); currently Associate Professor, Royal Military Academy Sandhurst. Research interests: neural basis of intelligent behaviour, decision intelligence, ritual formation/breakdown, ethical leadership, wellbeing.

External projects (links also available via Analects):