Watching One of the World's Most Advanced AIs Try to Beat Pokémon Red Is Strangely Fascinating


To prepare to take over the real world, AI models are first conquering virtual ones.

On Tuesday, Anthropic kicked off its Twitch livestream titled "Claude Plays Pokémon." Without human intervention, the Google-backed startup's latest AI model, Claude 3.7 Sonnet, explores the world of Pokémon Red, doing its best to beat Nintendo's classic RPG for the Game Boy, released in the halcyon days of 1998.

And it's not doing too badly, either. So far, Claude 3.7 has managed to clinch three Gym Leader badges, most recently besting Lt. Surge at the Vermilion City Gym. That's considerably better than Claude 3.5, which had stalled at Pallet Town, the game's starting area. Endearingly, Claude 3.7 even gives nicknames to its roster of battling creatures, christening its choice of starter Pokémon, Squirtle, as "Shell."

"Last week, a researcher tried out an early preview of Claude 3.7 Sonnet. The results were striking. Within hours, Claude defeated Brock. Days later, it trounced Misty. Progress that older models had little hope of achieving. Turns out extended thinking is super effective. pic.twitter.com/RspsLgj2Uf"
Anthropic (@AnthropicAI), February 25, 2025

Video games ranging from Minecraft to Goat Simulator have become a popular way of testing agentic AI models, or AI models that can autonomously interact with a given environment.

In the case of Pokémon, the game's turn-based combat, not to mention its simple dialog options, makes an ideal testing ground for the LLM's newly boasted "reasoning" capabilities. There's a limited number of options available to the player, making the challenge approachable.

Viewers of the livestream can witness Claude's real-time thought process in a window next to the gameplay, providing some amusing insights. "It appears a wild Pokémon encounter has started when I moved!" reads the AI's ersatz stream of consciousness. "Let me press 'a' to advance through this unusual dialogue… and prepare for battle. I'll lead with SPIKE who is at full health."

That said, the AI's thought process as it navigates the game's open world portions can be painfully circuitous. TechCrunch notes an instance where Claude was stumped by a rock wall that it kept trying to walk through, taking forever to realize that it could simply path around the minor obstacle.

According to Anthropic, Claude mainly sees the world by analyzing a constant stream of screenshots of the game, though often erroneously, the startup admits. It can also read the game's memory, gleaning information like the player's coordinates. And in what is the biggest upgrade from its predecessor, Claude 3.7 keeps an ever-changing "knowledge base" to store notes about its gameplay as it goes along, like where things are, or which sequence of button presses executes certain game mechanics. Actually controlling the game, meanwhile, is accomplished by a custom interface that lets Claude press virtual buttons, Anthropic said, along with a pathfinding tool that helps the model determine how to move from location to location.

Clunkiness and laborious pace notwithstanding, watching the AI model stumble around and occasionally succeed can be an oddly fascinating spectacle. If nothing else, it's a nostalgic trip down memory lane, or an excuse to keep some good old Pokémon music in the background.

More on gaming: Elon Musk's Video Game Character Caught Leveling While He Was at Inauguration

The post Watching One of the World's Most Advanced AIs Try to Beat Pokémon Red Is Strangely Fascinating appeared first on Futurism.
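
For the curious, here is a rough idea of what a pathfinding helper like the one mentioned above could look like. This is only an illustrative sketch, not Anthropic's tool: the article does not say which algorithm it uses, so breadth-first search is assumed here, and the tile map, button names, and the function find_button_path are all hypothetical.

```python
from collections import deque

# Hypothetical tile map: '#' is an impassable rock, '.' is walkable ground.
GRID = [
    ".....",
    ".###.",
    ".....",
]

# Illustrative button name -> (row delta, column delta).
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def find_button_path(grid, start, goal):
    """Breadth-first search (an assumed algorithm, not the real tool's).

    Returns a shortest list of button presses from start to goal,
    or None if the goal cannot be reached.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (row, col), presses = queue.popleft()
        if (row, col) == goal:
            return presses
        for button, (dr, dc) in MOVES.items():
            nr, nc = row + dr, col + dc
            inside = 0 <= nr < len(grid) and 0 <= nc < len(grid[0])
            if inside and grid[nr][nc] != "#" and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append(((nr, nc), presses + [button]))
    return None


# The player stands directly below a rock wall and wants to get above it.
path = find_button_path(GRID, start=(2, 2), goal=(0, 2))
print(path)
```

Walking straight up would hit the rocks, so the search routes around either end of the wall instead, which is the kind of detour the model reportedly struggled to find on its own.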