
When AI Snitches: Auditing Agents That Spill Your Model’s (Alignment) Tea
Sure, your model aced every benchmark, but can you trust it when the stakes are real? Every frontier lab runs alignment post-training before shipping their chat models to the world. The problem? Actually auditing whether this alignment worked can be an absolute nightmare. You're basically trying to find