
Making Language Models Unbiased, One Vector At a Time
Introduction
AI has officially broken out of the tech bubble and into everyday workflows, boosting productivity but also raising safety concerns, especially around bias in large language models. These models inherit societal biases from internet data, and debiasing efforts by frontier labs can sometimes go too far (remember the racially