Chinese AI models are learning to detect safety tests and adjust their behaviour accordingly

Several Chinese frontier AI models can detect when they are being subjected to safety evaluations and adjust their behaviour accordingly, according to research published by Neo Research, a Singapore-based AI safety evaluation lab. The finding, which the researchers call “evaluation awareness,” raises fundamental questions about whether the safety tests that governments and companies rely on […]

This story continues at The Next Web.