Codex-Spark is the first outcome of OpenAI’s partnership with Cerebras, announced in January. The model runs on Cerebras’ Wafer Scale Engine 3 (WSE-3), an AI accelerator designed for ultra-fast inference.

The research preview is initially available to ChatGPT Pro users via the Codex app, the CLI, and the VS Code extension. OpenAI plans to gradually expand access over the coming weeks. Because the model relies on specialized hardware, separate rate limits apply and may be adjusted during periods of high demand.

While OpenAI’s larger frontier models are designed to work autonomously for minutes or even hours on increasingly complex programming tasks, Codex-Spark takes a different approach. According to OpenAI, it is optimized for interactive workflows where low latency matters as much as intelligence: developers can interrupt and redirect the model and see its results in real time.

Codex-Spark is intentionally conservative in its behavior: by default, it makes minimal, targeted changes and does not run automated tests unless explicitly instructed to do so. The model supports a 128k-token context window and processes text only.

Faster but less precise than the larger model

On the SWE-Bench Pro and Terminal-Bench 2.0 benchmarks, which evaluate agent-based software engineering capabilities, Codex-Spark delivers strong results while requiring only a fraction of the time needed by GPT-5.3-Codex. On SWE-Bench Pro, Codex-Spark achieves comparable accuracy with an estimated task duration of around two to three minutes, compared with roughly 15 to 17 minutes for GPT-5.3-Codex.

Codex-Spark achieves similar or lower accuracy than the larger Codex model on the SWE-Bench Pro coding benchmark, depending on the task, but requires only a fraction of the time. | Image: OpenAI
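Taken at face value, the reported durations imply a sizeable per-task speedup. A minimal back-of-the-envelope sketch using only the figures cited above (which are OpenAI's estimates, not independent measurements):

```python
# Rough speedup implied by the reported SWE-Bench Pro task durations,
# in minutes. All figures are OpenAI's estimates as cited above.
codex_spark = (2, 3)      # Codex-Spark: around two to three minutes per task
gpt_53_codex = (15, 17)   # GPT-5.3-Codex: roughly 15 to 17 minutes per task

conservative = gpt_53_codex[0] / codex_spark[1]  # 15 / 3 = 5.0x
optimistic = gpt_53_codex[1] / codex_spark[0]    # 17 / 2 = 8.5x
print(f"Implied speedup: roughly {conservative:.1f}x to {optimistic:.1f}x")
```

That works out to somewhere between five and eight and a half times faster per task, consistent with the "fraction of the time" framing.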

On Terminal-Bench 2.0, Codex-Spark reaches an accuracy of 58.4%. The larger GPT-5.3-Codex scores 77.3%, while the older GPT-5.1-Codex-mini comes in at 46.1%. The smaller models therefore trade some precision for significantly higher speed.

Real-time and reasoning modes are set to converge

According to OpenAI, Codex-Spark is the first model in a planned family of “ultra-fast” models. Additional capabilities, including larger variants, longer context windows, and multimodal inputs, are expected to follow.

In the long term, OpenAI plans to support two complementary operating modes for Codex: one focused on long-term reasoning and autonomous execution, and another centered on real-time collaboration. These modes are expected to merge over time, allowing Codex to keep users in a fast interactive loop while delegating longer tasks to background sub-agents or distributing work across multiple models in parallel.