Something remarkable is happening on Twitch right now: an artificial intelligence is playing Pokémon Red, and thousands of viewers are tuning in to watch every move. The experiment, called ClaudePlaysPokemon, is an Anthropic project that tests the capabilities of its language model, Claude 3.7 Sonnet. Instead of analysing text or answering questions, Claude is navigating the pixelated world of Kanto, making strategic decisions in battles, and attempting to master the game on its own. What might seem like a fun experiment is, in fact, a fascinating test of AI-driven decision-making in a complex environment.
The project relies on several key technologies. Claude receives regular screenshots of the game, which it analyses with computer vision to understand its surroundings. A specially developed interface lets the AI issue virtual button presses, which is how it moves through the game. The setup also includes an integrated knowledge base that stores information about the game world, Pokémon types, and opponents' line-ups, which Claude draws on to develop long-term strategies. Should the image recognition fail, the system can read critical information directly from the game's memory.
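To make the moving parts concrete, here is a minimal Python sketch of one turn of such a loop: capture a screenshot, try to describe it, fall back to reading memory if that fails, choose a button, and record notes in a knowledge base. It is an illustration under stated assumptions, not Anthropic's actual harness. Every name in it (`StubEmulator`, `KnowledgeBase`, `agent_step`, and the `describe` and `decide` callbacks that stand in for the model) is hypothetical, and the emulator is a stub that tracks only a player position.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# The eight Game Boy inputs the agent may choose from.
BUTTONS = ("up", "down", "left", "right", "a", "b", "start", "select")


@dataclass
class KnowledgeBase:
    """Hypothetical long-term notes the agent keeps between steps."""
    notes: Dict[str, str] = field(default_factory=dict)

    def remember(self, key: str, value: str) -> None:
        self.notes[key] = value

    def recall(self, key: str) -> Optional[str]:
        return self.notes.get(key)


class StubEmulator:
    """Stand-in for a real emulator (an assumption); tracks only a position."""

    def __init__(self) -> None:
        self.x, self.y = 0, 0
        self.pressed: List[str] = []

    def screenshot(self) -> bytes:
        # A real harness would return pixel data; this returns a text stand-in.
        return f"frame@{self.x},{self.y}".encode()

    def read_memory(self) -> Dict[str, int]:
        # A real harness would read known RAM addresses; this exposes x and y.
        return {"x": self.x, "y": self.y}

    def press(self, button: str) -> None:
        if button not in BUTTONS:
            raise ValueError(f"unknown button: {button}")
        self.pressed.append(button)
        moves = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}
        dx, dy = moves.get(button, (0, 0))
        self.x += dx
        self.y += dy


def agent_step(
    emu: StubEmulator,
    kb: KnowledgeBase,
    describe: Callable[[bytes], Optional[str]],
    decide: Callable[[str, KnowledgeBase], str],
) -> str:
    """Run one observe, decide, act cycle and return the button pressed."""
    frame = emu.screenshot()
    description = describe(frame)  # vision step; None signals failure
    if description is None:
        # Fallback: read the game state from memory instead of the image.
        state = emu.read_memory()
        description = f"player at ({state['x']}, {state['y']})"
        kb.remember("last_fallback", description)
    button = decide(description, kb)  # the model's choice of input
    emu.press(button)
    kb.remember("last_action", button)
    return button
```

In a real setup, `describe` and `decide` would be calls to the language model, and the loop would repeat `agent_step` indefinitely. Passing them in as callbacks keeps the sketch runnable without any model or emulator installed.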
Despite this sophisticated setup, Claude’s journey is far from flawless. The model has already earned three badges, an impressive leap compared to earlier versions, which couldn’t even leave the starting town. Yet it still struggles with basic challenges. Claude often gets stuck in a city because it fails to recognise that a door needs to be opened, or it circles endlessly through a maze because of imprecise navigation. The infamous Mt. Moon became a test of patience: it took the AI a full 72 hours to navigate the dark cave. In battles, however, Claude’s learning ability is evident. The AI has shown it can analyse opponents and exploit their weaknesses, and it prepares deliberately, for example by levelling up its Pokémon before a gym battle or waiting for the right moment to launch a decisive attack.
The Twitch community is watching the spectacle with a mix of enthusiasm and amusement. Many viewers draw comparisons to the legendary Twitch Plays Pokémon of 2014, where thousands of players tried to control the game through chaotic inputs. But this time, the focus is on a single AI trying to act logically and consistently. Some viewers are betting on whether Claude will ever reach the Elite Four, while others create memes about the AI’s slow, often frustrating gameplay. Despite occasional setbacks, the fanbase continues to grow, and Claude’s progress is being followed with increasing excitement each day.
But this project is much more than just an entertaining stream. Anthropic deliberately uses Pokémon as a benchmark for the development of its AI. The game requires a blend of strategic thinking, long-term planning, and situational responsiveness, all key capabilities for a versatile AI. While traditional benchmarks often test isolated skills, Pokémon demands that these skills be combined, because it involves open-ended decisions and unexpected twists. The live-streamed thought processes also offer unusual transparency, showing how Claude makes its decisions and where it still makes mistakes.
The experiment highlights both how far AI systems have come and where their limitations remain. Claude can already analyse tactical battles impressively well, but it lacks the intuition a human player brings to unstructured problems. A human would immediately recognise that a door needs to be opened, or that repeated failure in a maze means a different route must be chosen. Such simple tasks remain significant hurdles for Claude. With each additional badge, however, it becomes clearer that the AI is improving steadily. The early days were filled with countless missteps and illogical actions, but more consistent patterns are now emerging.
Whether Claude will ever conquer the Pokémon League remains to be seen. But regardless of the outcome, this experiment demonstrates how AI models are increasingly able to navigate open environments, a skill that could extend far beyond the world of video games. What is today an entertaining Twitch project could, in the future, serve as a blueprint for adaptive, learning systems that make complex decisions in real-world scenarios. Until then, Claude remains a fascinating experiment that has captured the attention of both the gaming and AI communities.