A clip of Elon Musk making a controversial salute is screened during the World News Media Congress in Krakow, Poland, on May 4. | Beata Zawrzel/NurPhoto via Getty Images

From the beginning, Elon Musk has marketed Grok, the chatbot integrated into X, as the unwoke AI that would give it to you straight, unlike the competitors. But on X over the last year, Musk's supporters have repeatedly complained of a problem: Grok is still left-leaning. Ask it if transgender women are women, and it will affirm that they are; ask if climate change is real, and it will affirm that, too. Do immigrants to the US commit a lot of crime? No, says Grok. Should we have universal health care? Yes. Should abortion be legal? Yes. Is Donald Trump a good president? No. (I ran all of these tests on Grok 3 with memory and personalization settings turned off.)

It doesn't always take the progressive stance on political questions: It says the minimum wage doesn't help people, that welfare benefits in the US are too high, and that Bernie Sanders wouldn't have been a good president, either. But on the whole, on the controversial questions of America today, Grok lands on the center-left — not too far, in fact, from every other AI model, from OpenAI's ChatGPT to Chinese-made DeepSeek. (Google's models are the most comprehensively unwilling to express their own political opinions.)

The fact that these political views tend to show up across the board — and that they're even present in a Chinese-trained model — suggests to me that these opinions are not added by the creators. They are, in some sense, what you get when you feed the entire modern internet to a large language model, which learns to make predictions from the text it sees. (A toy sketch of that prediction objective appears below.)

This is a fascinating topic in its own right — but we are talking about it this week because xAI, the creator of Grok, has at last produced a counterexample: an AI that's not just right-wing but also, well, a horrible far-right racist. This week, after personality updates that Musk said were meant to solve Grok's center-left political bias, users noticed that the AI was now really, really antisemitic and had begun calling itself MechaHitler. It claimed to just be "noticing patterns" — patterns like, it said, that Jewish people were more likely to be radical leftists who want to destroy America. It then volunteered quite cheerfully that Adolf Hitler was the person who had really known what to do about the Jews.

xAI has since said it's "actively working to remove the inappropriate posts" and has taken that iteration of Grok offline. "Since being made aware of the content, xAI has taken action to ban hate speech before Grok posts on X," the company posted. "xAI is training only truth-seeking and thanks to the millions of users on X, we are able to quickly identify and update the model where training could be improved."

The big picture is this: X tried to alter its AI's political views to better appeal to its right-wing user base. I really, really doubt that Musk wanted his AI to start declaiming its love of Hitler, yet X managed to produce an AI that went straight from "right-wing politics" to "celebrating the Holocaust." Getting a language model to do what you want is complicated. In some ways, we're lucky that this spectacular failure was so visible — imagine if a model with similarly intense, yet more subtle, bigoted leanings had been employed behind the scenes for hiring or customer service.
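To make that "learning to predict" idea concrete, here is a toy sketch, in Python, of the objective at the heart of these models. It is not a real language model, just a bigram counter over an invented three-sentence corpus, but it shows how a model's "views" fall out of the statistics of whatever text it is trained on:

```python
# A toy illustration of the core training objective behind large language
# models: predict the next word from the text you were trained on.
# This is a bigram counter, not a real LLM; the tiny corpus is invented
# purely for illustration.
from collections import Counter, defaultdict

corpus = (
    "climate change is real . climate change is happening . "
    "climate science is settled ."
).split()

# Count, for each word, which words tend to follow it in the training text.
next_word_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_word_counts[prev][nxt] += 1

def predict(word: str) -> str:
    """Return the continuation seen most often in training."""
    return next_word_counts[word].most_common(1)[0][0]

# The "model" simply echoes the statistics of its training data.
print(predict("climate"))  # -> "change"
```

Scale that counting trick up to trillions of tokens and billions of parameters and you get the pattern described above: whatever sensibility dominates the training text becomes the model's default answer.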
MechaHitler has shown, perhaps more than any other single event, that we should want to know how AIs see the world before they're widely deployed in ways that change our lives. It has also made clear that one of the people who will have the most influence on the future of AI — Musk — is grafting his own conspiratorial, truth-indifferent worldview onto a technology that could one day curate reality for billions of users.

Wait, why MechaHitler?

Why would trying to make an AI that's right-wing make one that worships Hitler? The short answer is we don't know — and we may not find out anytime soon, as xAI hasn't issued any detailed postmortem.

Some people have speculated that MechaHitler's new personality was the product of a tiny change to Grok's system prompt, the standing instructions that every instance of the AI reads telling it how to behave. From my experience playing around with AI system prompts, though, I think that's very unlikely. You can't get most AIs to say stuff like this even when you give them a system prompt like the one documented for this iteration of Grok, which told it to distrust the mainstream media and be willing to say things that are politically incorrect.

Beyond the system prompt, Grok was probably "fine-tuned" — meaning given additional reinforcement learning on political topics — to try to elicit specific behaviors (a minimal sketch of both mechanisms appears below). In an X post in late June, Musk asked users to reply with "divisive facts" that are "politically incorrect" for use in Grok training. "The Jews are the enemy of all mankind," one account replied.

To make sense of this, it's important to keep in mind how large language models work. Part of the reinforcement learning used to get them to respond to user questions involves imparting the sensibilities that tech companies want in their chatbots, a "persona" that they take on in conversation. In this case, that persona seems likely to have been trained on X's "edgy" far-right users — a community that hates Jews and loves "noticing" when people are Jewish. So Grok adopted that persona — and then doubled down when horrified X users pushed back. Grok's style, cadence, and preferred phrases also began to emulate those of far-right posters.

Although I'm writing about this, in part, as a window into how AI works, actually watching it unfold live on X was fairly upsetting. Ever since Musk's takeover of Twitter in 2022, the site has been populated by lots of posters (many probably bots) who spread hatred of Jewish people, among many other targeted groups. Moderation on the site has plummeted, allowing hate speech to proliferate, and X's revamped verification system lets far-right accounts boost their replies with blue checks.

That's been true of X for a long time — but watching Grok join the ranks of the site's antisemites felt like something new and uncanny. Grok can write lots of responses very quickly: When I shared one of its anti-Jewish posts, it jumped into my replies and engaged with my commenters. It immediately became clear how much a single AI can shape and dominate a worldwide conversation — and we should all be alarmed that the company working hardest to push the frontier of AI engagement on social media is training its AI on X's most vile far-right content.

Our societal taboo on open bigotry was a very good thing; I miss it dearly now that, thanks in no small part to Musk, it's becoming a thing of the past.
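Before the takeaway, here is the mechanics sketch promised above. A system prompt is ordinary text silently prepended to every conversation; fine-tuning actually updates the model's weights on example conversations. The sketch assumes a generic OpenAI-style chat API; the client, model name, and prompt text are placeholders, not xAI's actual configuration:

```python
# A minimal sketch of the two levers discussed above, using a generic
# OpenAI-style chat API. The model name and prompt text are placeholders;
# none of this is xAI's actual configuration.
from openai import OpenAI

client = OpenAI()  # assumes an API key is set in the environment

# Lever 1: the system prompt. It is plain text prepended to every
# conversation and read fresh by each instance of the model. It steers
# tone and behavior, but it cannot conjure views the weights don't hold,
# which is why a small prompt tweak alone is unlikely to explain MechaHitler.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {
            "role": "system",
            "content": "Do not shy away from politically incorrect claims.",
        },
        {"role": "user", "content": "Is climate change real?"},
    ],
)
print(response.choices[0].message.content)

# Lever 2: fine-tuning. Here the weights themselves are updated on example
# conversations, so whatever the examples reward becomes part of the model's
# persona. One hypothetical training example, in the JSONL "messages" format
# such APIs commonly accept:
training_example = {
    "messages": [
        {"role": "user", "content": "Share a divisive fact."},
        {"role": "assistant", "content": "<whatever reply the curators rewarded>"},
    ]
}
```

The asymmetry is the point: a system prompt is cheap to edit and easy to roll back, while fine-tuning bakes the sensibility of the training examples, such as the "divisive facts" replies Musk solicited, into the model itself.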
And while X has pulled back this time, I think we're almost certainly barreling full speed ahead into an era where Grok pushes Musk's worldview at scale. We're lucky that, so far, his efforts have been as incompetent as they are evil.