Musk’s Grok 4 launches one day after chatbot generated Hitler praise on X

On Wednesday night, Elon Musk unveiled xAI's latest flagship models, Grok 4 and Grok 4 Heavy, via livestream, just one day after the company's Grok chatbot began generating outputs that featured blatantly antisemitic tropes in responses to users on X.

Of the two models, xAI calls Grok 4 Heavy its "multi-agent version." According to Musk, Grok 4 Heavy "spawns multiple agents in parallel" that "compare notes and yield an answer," simulating a study group approach. The company describes this as test-time compute scaling (similar to previous simulated reasoning models), claiming to increase computational resources by roughly an order of magnitude during runtime (called "inference").

During the livestream, Musk claimed the new models achieved frontier-level performance on several benchmarks. On Humanity's Last Exam, a deliberately challenging test with 2,500 expert-curated questions across multiple subjects, Grok 4 reportedly scored 25.4 percent without external tools, which the company says outperformed OpenAI's o3 at 21 percent and Google's Gemini 2.5 Pro at 21.6 percent. With tools enabled, xAI claims Grok 4 Heavy reached 44.4 percent. However, it remains to be seen whether these AI benchmarks actually measure properties that translate to usefulness for users.