[marginalium]
On the alien characteristics of LLMs
21 May 2023
On the alien characteristics of LLMs: the Waluigi effect.
Short version:
After you train an LLM to satisfy a desirable property P, it becomes easier to elicit the chatbot into satisfying the exact opposite of property P.
Why?
When you spend many bits-of-optimisation locating a character, it only takes a few extra bits to specify their antipode.
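A toy way to see the arithmetic is sketched below. It rests on simplifying assumptions that are not from the original note. Characters are modelled as vectors of yes/no traits with a uniform prior. The antipode is taken to be "flip every trait". Names such as `N_TRAITS`, `antipode` and `TRANSFORMS` are hypothetical illustrations. Picking out one character from scratch costs one bit per trait. Once that character is fixed, reaching its opposite only requires choosing "negate" from a small menu of transforms.

```python
import math

def bits_to_locate(num_candidates: int) -> float:
    """Bits needed to single out one option among num_candidates equally likely ones."""
    return math.log2(num_candidates)

def antipode(traits: tuple[int, ...]) -> tuple[int, ...]:
    """Hypothetical antipode: flip every binary trait of a character."""
    return tuple(1 - t for t in traits)

N_TRAITS = 20  # hypothetical: each character is described by 20 yes/no traits

# Locating Luigi from scratch: one choice among 2**N_TRAITS possible characters.
cost_luigi = bits_to_locate(2 ** N_TRAITS)

# Given Luigi, Waluigi is one choice from a tiny menu of transforms
# ({keep as-is, negate}), which costs a single extra bit.
TRANSFORMS = {"identity": lambda t: t, "negate": antipode}
cost_waluigi_given_luigi = bits_to_locate(len(TRANSFORMS))

luigi = (1,) * N_TRAITS
waluigi = TRANSFORMS["negate"](luigi)

print(f"bits to locate Luigi:             {cost_luigi:.0f}")
print(f"extra bits for Waluigi given him: {cost_waluigi_given_luigi:.0f}")
```

In this toy, the cost of the character grows with the number of traits. The extra cost of the antipode stays at one bit however detailed the character becomes.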