Edit to add: I also found someone who recorded a voice chat of the same thing. This isn’t a case of someone uploading a song, or of the AI not actually processing the file. These models really are this sycophantic:
RLHF was a fundamental mistake. Human feedback almost always trains an AI to be sycophantic because humans in general are super easy to flatter.
We are building the perfect addiction machine, far more powerful than social media is, and it actively undermines the honesty of the system.
I kind of want to see an LLM trained on nothing but people who hate being flattered and would rather give death threats than accept ANY form of praise
The absolutely unhinged result might be enough to finally show people that AI is in fact dumb as rocks.