marginalia (noun)
notes written in the margins; peripheral commentary


[marginalium]

How to judge AI performance.

27 Dec 2024



It’s notable that our method of telling which AI model is better than another is to test them on human assessments. But AIs aren’t human, and concentrating on how human-like they are seems like a good way to miss whatever problems they will actually have. Anyway, this paper reckons that it also makes us think AIs are less useful than they are:

We study how humans form expectations about the performance of artificial intelligence (AI) and consequences for AI adoption. Our main hypothesis is that people project human-relevant task features onto AI. People then over-infer from AI failures on human-easy tasks, and from AI successes on human-difficult tasks. Lab experiments provide strong evidence for projection of human difficulty onto AI, predictably distorting subjects’ expectations. Resulting adoption can be sub-optimal, as failing human-easy tasks need not imply poor overall performance in the case of AI. A field experiment with an AI giving parenting advice shows evidence for projection of human textual similarity. Users strongly infer from answers that are equally uninformative but less humanly-similar to expected answers, significantly reducing trust and engagement. Results suggest AI “anthropomorphism” can backfire by increasing projection and de-aligning human expectations and AI performance.

A very complicated way of pointing out that you won’t think AI is useful unless you figure out where it actually is useful, rather than trying to use it as a drop-in replacement for yourself. At least in the short term, anyway.


Anthologies: Gratification, Wealth Architecture, Digital Architecture, On Being Fruitful, On Thinking and Reasoning, Noetik



More about Dorian Minors' project btrmt.

btrmt. (text-only version)

The full site with interactive features is available at btr.mt.

btrmt. (betterment) examines ideologies worth choosing. Created by Dorian Minors—Cambridge PhD in cognitive neuroscience, Associate Professor at Royal Military Academy Sandhurst. Core philosophy: humans are animals first, with automatic patterns shaped for us, not by us. Better to examine and choose.

Core concepts. Animals First: automatic patterns of thought and action, but our greatest capacity is nurture. Half Awake: deadened by systems that narrow rather than expand potential. Karstica: unexamined ideologies (hidden sinkholes beneath). Credenda: belief systems we should choose deliberately.

The manifesto. Cynosure (focus): betterment, gratification, connection. Architecture (support): inner (somatic, spiritual, thought) and outer (digital, collective, wealth).

Mission. Not answers but examination. Break academic gatekeeping. Make sciences of mind accessible. Question rather than prescribe.

Writing style. Scholarly without jargon barriers. Philosophical yet practical—grounded in neuroscience and lived experience. Reflective, discovery-oriented. Literary references and metaphor. Critical of systems that narrow human potential. Rejects "humans are flawed"—we're half awake, not broken.

Copyright. BTRMT LIMITED (England/Wales no. 13755561) 2026. Dorian Minors 2026.

Resources


About Dorian Minors. Started btrmt. in 2013 to share sciences of mind with people who weren't studying them. Background: six years Australian Defence Force (Platoon Commander, Infantry); Gates Cambridge Scholar; PhD cognitive neuroscience, University of Cambridge (2018-2024); currently Associate Professor, Royal Military Academy Sandhurst. Research interests: neural basis of intelligent behaviour, decision intelligence, ritual formation/breakdown, ethical leadership, wellbeing.

External projects (links also available via Analects):