Daniel Mercer

OpenAI, together with the biotech company Ginkgo Bioworks, has connected its language model GPT-5 to a fully automated cloud laboratory to optimize cell-free protein synthesis (CFPS). The results are measurable, but the limitations remain significant.

In fields such as mathematics or physics, ideas can often be validated purely through computation. Biology works differently: progress depends on laboratory experiments that consume both time and money. This is where a new joint project by OpenAI and Ginkgo Bioworks comes in. The two companies connected OpenAI’s GPT-5 model to a fully automated cloud lab with the goal of optimizing CFPS.

In cell-free protein synthesis, proteins are produced without cultivating living cells. Instead, the protein-production machinery from inside cells is transferred into a controlled mixture and operated there. The method plays an important role in drug development, diagnostics, and industrial enzyme production, and is also used in the commercial manufacture of an antibody–drug conjugate, according to the accompanying paper.

Optimizing CFPS is considered difficult. The reaction mixtures consist of many interacting components, including DNA templates, cell extracts (lysates containing cellular machinery), energy sources, salts, and cofactors. The space of possible compositions is enormous, and the effects of small changes are hard to predict intuitively. Earlier machine-learning approaches delivered only incremental improvements.
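
To get a feel for why, a back-of-the-envelope calculation helps; the component and level counts in this sketch are illustrative assumptions, not figures from the paper.

```python
# Back-of-the-envelope size of the composition space. The counts below are
# illustrative assumptions, not numbers reported in the paper.
n_components = 20   # DNA template, lysate, energy source, salts, cofactors, additives...
n_levels = 8        # discrete concentration levels considered per component
print(f"{n_levels ** n_components:.2e} possible compositions")  # ~1.15e+18
```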

Six rounds, 36,000 reactions, 40% lower costs

Over six iterative rounds, the system tested more than 36,000 different reaction compositions across 580 automated microtiter plates, according to the paper. These are small plastic plates with hundreds of tiny wells in which individual reactions can run in parallel.

The per-gram production cost for the test protein sfGFP (a fluorescent protein commonly used as a benchmark) fell from $698 to $422, a roughly 40% reduction compared with the previous state of the art published by researchers at Northwestern University in August 2025. At the same time, protein yield increased by 27%, from 2.39 to 3.04 grams per liter of reaction solution, and pure reagent costs dropped by 57%, from $60 to $26 per gram.
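
The percentages follow directly from those figures, as a quick Python check shows.

```python
# Percent changes implied by the reported figures (USD per gram, grams per liter).
cost_before, cost_after = 698.0, 422.0        # production cost per gram of sfGFP
yield_before, yield_after = 2.39, 3.04        # grams of sfGFP per liter of reaction
reagent_before, reagent_after = 60.0, 26.0    # reagent cost per gram of sfGFP

def pct_change(old, new):
    return (new - old) / old * 100

print(f"production cost: {pct_change(cost_before, cost_after):+.0f}%")      # -40%
print(f"yield:           {pct_change(yield_before, yield_after):+.0f}%")    # +27%
print(f"reagent cost:    {pct_change(reagent_before, reagent_after):+.0f}%") # -57%
```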

For comparison, the authors cite the list price of a commercially available CFPS kit from NEB at around $800,000 per gram, while noting that the figures are not directly comparable.

GPT-5 designs, robots pipette

The workflow follows a closed loop. GPT-5 designs experimental setups as digital files. A validation system based on the Python library Pydantic automatically checks whether the designs are scientifically meaningful and physically executable on the lab automation platform. Only then are the designs sent to Ginkgo Bioworks’ cloud laboratory in Boston.
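
The paper does not publish the validation schema itself, but a Pydantic-based check of this kind could look roughly like the following sketch; the field names and volume limits are invented for illustration.

```python
# Hypothetical sketch of a Pydantic-style design check; the field names and
# volume limits are invented, not the schema used by OpenAI and Ginkgo.
from pydantic import BaseModel, Field, model_validator

class ReagentAddition(BaseModel):
    name: str
    volume_ul: float = Field(gt=0)                # microliters pipetted into the well

class WellDesign(BaseModel):
    well: str                                     # e.g. "A1"
    total_volume_ul: float = Field(gt=0, le=200)  # assumed hardware limit per well
    additions: list[ReagentAddition]

    @model_validator(mode="after")
    def additions_must_fill_the_well(self):
        total = sum(a.volume_ul for a in self.additions)
        if abs(total - self.total_volume_ul) > 0.01:
            raise ValueError("reagent volumes do not add up to the declared well volume")
        return self

# A design whose per-well volumes do not add up would be rejected here,
# before anything is sent to the robots.
```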

There, modular robotic units known as Reconfigurable Automation Carts (RACs) execute the experiments automatically. Each cart contains a single lab device, such as a liquid handler, incubator, or measurement instrument. Robotic arms and transport rails move the sample plates between stations, while Ginkgo’s Catalyst software orchestrates the process.

Once experiments are completed, the measurement data flows back to GPT-5, which analyzes the results, formulates hypotheses, and designs the next experimental round. According to the authors, human intervention was limited to preparing and loading reagents and consumables. Over six months, the system generated around 150,000 data points.
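
Conceptually, the loop can be sketched as below; the stub functions merely stand in for GPT-5, the validator, and the Catalyst-orchestrated lab and are not the project's actual interfaces.

```python
# Schematic of the closed loop. All three stubs are placeholders invented for
# illustration; they are not the APIs used in the actual OpenAI/Ginkgo setup.
import random

def propose_designs(history):
    # Stands in for GPT-5 proposing a batch of reaction compositions.
    return [{"id": i} for i in range(96)]

def passes_validation(design):
    # Stands in for the Pydantic-based checks described above.
    return True

def execute_and_measure(designs):
    # Stands in for the robotic cloud lab returning measured yields.
    return [{"id": d["id"], "yield_g_per_l": random.uniform(1.0, 3.0)} for d in designs]

def run_campaign(n_rounds=6):
    history = []
    for _ in range(n_rounds):
        designs = propose_designs(history)
        valid = [d for d in designs if passes_validation(d)]
        history.append(execute_and_measure(valid))  # results feed the next round
    return history

results = run_campaign()
```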

In the first round, GPT-5 designed reaction compositions without any prior example data or experimental results, relying solely on knowledge encoded in its model weights. Even in this “zero-shot” mode, it produced usable—though not yet optimal—designs.

Tool access unlocked progress

Initially, the system struggled with high measurement variability, with deviations between replicates on the same plate sometimes exceeding 40%. To improve precision, Ginkgo staff manually adjusted reagent concentrations and stock solutions, reducing variability to a median of 17%. The process was therefore not fully autonomous.
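
One common way to quantify such replicate scatter is the coefficient of variation; the numbers in this sketch are made up, and the paper may use a different metric.

```python
# Coefficient of variation (CV) across replicates of one composition.
# The yield values are hypothetical, not data from the paper.
import statistics

replicate_yields = [2.1, 2.9, 2.4]  # hypothetical sfGFP yields in g/L
cv = statistics.stdev(replicate_yields) / statistics.mean(replicate_yields) * 100
print(f"replicate CV: {cv:.0f}%")   # ~16% here; early rounds reportedly exceeded 40%
```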

The largest performance jump occurred from the third round onward. At that point, GPT-5 gained access to a computer, the internet, data-analysis packages, and a recent preprint describing the previous best performance achieved by Northwestern University researchers. The model also received richer metadata, including raw data, liquid-handling error logs, and actual incubation times.

From round three onward, GPT-5 was able to combine its own experimental results with insights from the scientific literature. Within just two months (rounds three to five), the system surpassed the prior state of the art. However, improvements to the DNA template and cell lysate occurred at the same time, making it harder to attribute gains solely to the model.

GPT-5 anticipated insights from the field

GPT-5 proposed reagents such as nucleoside monophosphates (NMPs), potassium phosphate, and ribose even before gaining access to the Northwestern preprint. The authors of that publication independently identified the same substances as critical. In other words, GPT-5 arrived at similar conclusions based on its training data and ongoing experimental feedback.

The model also produced human-readable lab notebook entries documenting its data analyses and hypotheses. Among other observations, it identified that an inexpensive buffer (HEPES) had a disproportionately large impact on protein yield, that phosphate must be buffered within a narrow concentration and pH range, and that adding spermidine—a naturally occurring compound that stabilizes nucleic acids—boosts yields.

GPT-5 also made an economic observation: more than 90% of CFPS reaction costs are now driven by cell lysate and DNA. Increasing protein yield per unit of these expensive ingredients is therefore the most effective lever for cost reduction, rather than cutting cheaper auxiliary components.
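
A toy cost model makes the logic concrete; every dollar figure below is invented for illustration.

```python
# Toy cost model: cost per gram = reagent cost per liter / yield per liter.
# The dollar figures are invented for illustration, not taken from the paper.
LYSATE_AND_DNA_PER_L = 90.0   # dominant cost, USD per liter of reaction (assumed)
OTHER_REAGENTS_PER_L = 10.0   # cheap auxiliary components, USD per liter (assumed)

def cost_per_gram(yield_g_per_l, cheap_components_cut=0.0):
    reagents = LYSATE_AND_DNA_PER_L + OTHER_REAGENTS_PER_L * (1 - cheap_components_cut)
    return reagents / yield_g_per_l

print(f"{cost_per_gram(2.4):.2f}")                            # baseline: 41.67
print(f"{cost_per_gram(2.4, cheap_components_cut=0.5):.2f}")  # halve cheap parts: 39.58
print(f"{cost_per_gram(3.0):.2f}")                            # raise yield ~25%: 33.33
```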

Of more than twenty proposed additive reagents, several appeared in the best-performing reaction compositions, including NMPs, glucose, potassium phosphate, and catalase.

Few errors, many open questions

The error rate was low. Of 480 designed microtiter plates, only two contained fundamental design errors—less than one percent. In one case, the model overwrote a required volume specification to make room for additional reagents. In another, a unit-conversion bug produced a plate containing only glucose and ribose, which unsurprisingly yielded no protein.

Nevertheless, the limitations are substantial. All results are based on a single protein (sfGFP) and a single CFPS system. Whether the optimized compositions generalize to other proteins remains unclear. In a test with twelve additional proteins, only six were detectable via gel electrophoresis, indicating that further optimization would be required.

Human oversight also remained necessary throughout, both for protocol improvements and for reagent handling. OpenAI and Ginkgo Bioworks say they plan to extend the approach to other biological processes.

The ability of AI models to independently improve wet-lab protocols also raises safety questions. OpenAI points to its internal Preparedness Framework for evaluating such capabilities with respect to biosecurity risks, but the paper does not specify concrete measures beyond this stated intention.
