With the launch of ChatGPT Images 2.0, OpenAI now “thinks” before it draws

Wait 5 sec.

OpenAI has launched ChatGPT Images 2.0, positioning its new image model as a shift from a rendering tool to what the company calls a “visual thought partner.”This “partner,” which debuted on Tuesday, is a system capable of reasoning through complex visual tasks, verifying its own outputs, and generating up to eight coherent images from a single prompt.The model, available via the API as gpt-image-2, is rolling out to all ChatGPT and Codex users. Advanced features requiring the model’s thinking capabilities are restricted to Plus, Pro, and Business subscribers.What Images 2.0 doesImages 2.0 is OpenAI’s first image model with native thinking capabilities, the company claims in a blog post. When a reasoning or Pro model is selected in ChatGPT, the system can search the web for real-time information, produce multiple distinct images from a single prompt, and cross-check its own outputs before delivering results. That makes it different from conventional image generators, which produce a single output per prompt and lack a self-correction loop.The model operates in two distinct modes: Instant, for fast output, and Thinking, which takes a slower, more deliberate approach — reasoning through the structure of an image before generating it. Thinking mode is specifically designed to maintain character and object consistency across multiple frames, opening up workflows for Manga, storyboarding, and multi-scene design that previous models struggled with, the company says.“When a thinking or pro model is selected in ChatGPT, Images 2.0 can search the web for real-time information, create multiple distinct images from one prompt, and double-check its own outputs,” OpenAI wrote in a blog post. “With thinking, the model can take on even more of the heavy lifting between idea and image, especially when accuracy, up-to-date information, consistency, and visual cohesion matter most.”The company says the model can handle fine-grained elements that routinely break image generators: small text, iconography, UI elements, and tight compositions.What Images 2.0 does betterOpenAI describes the release as a step change in instruction-following, object placement, and dense text rendering. The company says the model can handle fine-grained elements that routinely break image generators, such as small text, iconography, UI elements, and tight compositions — at up to 2K resolution via the API.Multilingual support has also been substantially extended. The model shows significant gains in rendering non-Latin text, particularly in Japanese, Korean, Chinese, Hindi, and Bengali. Earlier image models could approximate non-Latin scripts but frequently produced garbled or incoherent results in dense text, the company said.Flexible aspect ratios — from 3:1 wide to 1:3 tall — mean outputs can be generated ready-to-use for banners, mobile screens, posters, and social graphics without post-processing.The competition heats upThe launch comes as competitive pressure in image generation has intensified. On the LM Arena text-to-image leaderboard as of early April, Google’s Gemini model held first place, with OpenAI’s gpt-image-1.5 in second place. DALL-E 2 and DALL-E 3 are being retired on May 12, making a next-generation replacement both commercially and strategically necessary.The model’s knowledge cutoff is December 2025, which OpenAI says enables more accurate and contextually relevant outputs for explainers, educational graphics, and visual summaries where correctness matters as much as aesthetics.Codex integrationImages 2.0 is also available inside Codex, OpenAI’s coding environment, enabling visual creation within the same workspace used for app development, slide decks, and other deliverables. Users can generate UI directions and prototypes, compare options, and push the strongest results to live products without switching tools. Codex users can access image generation using their existing ChatGPT subscription without a separate API key.Developer accessDevelopers can access gpt-image-2 through the standard API. Pricing varies depending on output quality and resolution. Outputs above 2K are available in API beta but may produce inconsistent results in some cases.Early enterprise users have noted that the model goes beyond executing prompts. “The model wasn’t just rendering images. It was interpreting briefs, understanding audiences, and making creative decisions behind the scenes,” said Dwayne Koh, creative strategist at Canva, in a statement provided by OpenAI.LimitationsOpenAI acknowledges that the model still struggles with tasks requiring a coherent physical-world model — origami guides, Rubik’s Cubes, objects on reversed or angled surfaces. Very fine or repetitive visual detail, such as grains of sand, can still exceed the model’s fidelity limits. Labels and part diagrams may need manual review. The company describes these as “important frontiers for future work.”Early users have flagged a separate practical issue: iterative editing runs into diminishing returns. Wharton professor and AI researcher Ethan Mollick noted that while the images are strong, the model exhibits what he calls the “typical imagegen problem” — edits work well for the first round or two, then progress stalls. His workaround: Drop the image into a fresh chat to reset the context.What’s nextOpenAI is treating image generation as a core interface layer rather than a standalone feature. The company appears to be betting on images as its next competitive frontier, with signals that image generation is becoming a primary mode for interacting with AI systems, not merely a supplementary capability.The post With the launch of ChatGPT Images 2.0, OpenAI now “thinks” before it draws appeared first on The New Stack.