The new model reasons about composition, searches the web for context, generates up to eight coherent images from a single prompt, and renders text in non-Latin scripts with near-perfect accuracy. It also took the number one spot on the Image Arena leaderboard within 12 hours of release, by the largest margin ever recorded.
Two years ago, asking ChatGPT to generate an image was like commissioning a poster from a sleep-deprived intern with a glue stick and a head injury. You’d ask for a clean design and get creative leftovers scattered across the image, plus three new words that looked like they were invented during a minor software malfunction.
The images appeared AI-generated in the way that has become cultural shorthand for weird: almost right, glaringly wrong, and instantly recognizable as synthetic.
The jump matters. Text rendering has been the persistent, embarrassing weakness of AI image generators since DALL-E first gained attention in January 2021, a model we covered at the time as a fascinating curiosity.
OpenAI claims Images 2.0 renders text with approximately 99% accuracy in any language and script, including Japanese, Korean, Chinese, Hindi, and Bengali. If that figure holds up in independent testing, it closes the gap between an “awesome AI demo” and a “tool a graphic designer would actually use for production work.”
The architectural change that makes the model not just better but different is what OpenAI calls “thinking capabilities.” Images 2.0 is the company’s first image model to integrate its O-series reasoning architecture.
Before generating a single pixel, the model analyzes the prompt, plans the composition, reasons about the spatial relationships between elements, and can search the web for real-time context.
In OpenAI’s framing, it is not a rendering tool but a “visual thinking companion.”

This is my cat transformed into a comic strip with ChatGPT.
In practice, this manifests as two access modes. Instant mode ships to all ChatGPT users, including free-tier accounts, and carries the core quality improvements: better text, sharper editing, and richer layouts.
Thinking mode, which allows web searching, batch processing of multiple images, and output verification, is restricted to Plus ($20/mo), Pro ($200/mo), Business, and Enterprise subscribers.
The distinction is commercially significant. The reasoning capabilities, where most of the quality premium lies, sit behind the paywall. Free users get better images; paying users get images the model has thought about.
Multi-image generation is the feature most likely to change professional workflows. A single prompt can now produce up to eight images that maintain continuity of characters and objects across the entire set.
That means a designer can generate a family of social media assets, a sequence of children’s books, or a series of storyboard frames from one instruction, with a consistent visual identity throughout.
Previously, each image had to be generated individually and stitched together manually. For marketing teams and content creators, that is a significant reduction in production friction.
The integration with Codex, OpenAI’s coding environment, is the strategically loaded move. Developers and designers can now generate mockups, prototypes, and UI assets within the same agent workspace they use for code, slides, and browser automation, all under a single ChatGPT subscription.
The image model is no longer a stand-alone product; it is a capability built into the broader OpenAI platform, competing not only with Midjourney and Google’s Nano Banana 2 on quality but with Canva and Figma on workflow integration.
The benchmark performance is striking. Within 12 hours of release, Images 2.0 took the number one spot on the Image Arena leaderboard in every category, with a score of 1,512, a lead of +242 points over the second-place model, Google’s Nano Banana 2. That is the largest margin ever recorded on the leaderboard.
For most of 2026, OpenAI and Google had traded the top position by narrow margins; Images 2.0 has pulled decisively away.
DALL-E 2 and DALL-E 3 will be deprecated and retired on May 12, 2026. GPT-Image-1.5, released in December 2025 as an interim update, is still accessible via the API for legacy integrations, but is no longer the default model.
OpenAI did not reveal the architecture of Images 2.0, describing it only as a “generalist model” or “GPT for images” and declining to specify whether it uses a diffusion, autoregressive, or hybrid approach. The API model identifier is gpt-image-2; the API is expected to open to developers in early May 2026.
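Because the API is not yet open, integration details remain speculative. As a minimal sketch, assuming the endpoint follows the general shape of OpenAI’s existing image APIs, a request would carry the model identifier plus a prompt and output options; only the identifier gpt-image-2 comes from the announcement, and every parameter name below (`n_images`, `resolution`) is an illustrative assumption, not a documented field:

```python
import json

def build_image_request(prompt: str, n_images: int = 1,
                        resolution: str = "2048x2048") -> dict:
    """Assemble a hypothetical request body for the gpt-image-2 API.

    Only the model identifier is confirmed by OpenAI's announcement;
    the parameter names here are guesses, since the API is not yet
    open to developers.
    """
    return {
        "model": "gpt-image-2",
        "prompt": prompt,
        "n_images": n_images,      # the model supports up to eight images per prompt
        "resolution": resolution,  # output resolution reaches up to 2K
    }

payload = build_image_request("A four-panel comic strip of a cat", n_images=4)
print(json.dumps(payload, indent=2))
```

Until the real reference documentation ships in May, treat this as a placeholder for whatever shape the official SDK ultimately exposes.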
Token-based pricing is $8 per million tokens for image input, $2 per million for cached input, and $30 per million for image output, with per-image costs typically ranging between $0.04 and $0.35 depending on prompt complexity and resolution. Output resolution reaches up to 2K.
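The per-image cost follows directly from those three rates. The sketch below applies them; note that OpenAI has not published how many tokens a given image consumes, so the token counts in the example are placeholders chosen only to land inside the stated $0.04–$0.35 range:

```python
# Cost estimator built from the published gpt-image-2 per-token rates.
RATE_INPUT = 8.00 / 1_000_000    # $ per image-input token
RATE_CACHED = 2.00 / 1_000_000   # $ per cached-input token
RATE_OUTPUT = 30.00 / 1_000_000  # $ per image-output token

def estimate_cost(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Return the estimated dollar cost of one image-generation request."""
    return (input_tokens * RATE_INPUT
            + cached_tokens * RATE_CACHED
            + output_tokens * RATE_OUTPUT)

# Placeholder token counts (actual per-image counts are unpublished):
# 1,000 input tokens and 5,000 output tokens.
cost = estimate_cost(input_tokens=1_000, cached_tokens=0, output_tokens=5_000)
print(f"${cost:.3f}")  # → $0.158
```

Output tokens dominate the bill at 30:8 over input, so batch requests that reuse cached input should see costs scale almost linearly with the number of images generated.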
The knowledge cutoff is December 2025, which imposes a practical limit: the model cannot accurately depict events, people, or products that emerged after that date without supplementing its internal knowledge with a live web search.
The model’s safety architecture includes content filtering, C2PA provenance metadata, and what OpenAI described at the press conference as continuous monitoring, a point the company was notably emphatic about given the growing regulatory scrutiny of synthetic media and the use of AI image generators in deepfakes, scams, and non-consensual imagery.
The most important question that Images 2.0 raises is not quality. The technical gap between AI-generated and human-created images has been narrowing for years; this model reduces it even further.
The question is what happens when the tool is no longer a novelty but infrastructure, when image generation is a default capability of every coding environment, every chat interface, and every enterprise productivity suite, and when the distinction between “designed by a person” and “generated from a prompt” becomes something only metadata can verify.
OpenAI, for its part, seems to be betting that the answer is scale: more images, faster, better, cheaper, everywhere. When we first covered DALL-E five years ago, its results were fascinating oddities. Now they are production assets.
The era when AI-generated images were obviously AI-generated is over. What comes next depends on whether the safety guardrails can keep pace with the capability.