We just watched an AI turn a simulated $500 into a $5,478 business empire.

It didn’t just write code. It negotiated.

I’ve been digging into the new technical reports for Google DeepMind’s Gemini 3, and one benchmark stands out. It isn’t the coding score. It is the ‘Vending-Bench’.

Created by researchers at Andon Labs, the test hands an AI agent a simulated vending machine business, a small stake of starting cash and a year to run it.

It has to manage inventory, set prices, email suppliers, and balance the books. If it fails to pay its daily fees, it goes bankrupt.
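To make those rules concrete, here is a toy sketch in Python. It is not Andon Labs’ harness: the `VendingBusiness` class, the naive restocking policy in `simulate`, and every price, fee and demand figure are hypothetical. It models only the survival constraint described above: the agent buys stock from a supplier, sells at a price it sets, pays a fixed daily fee, and goes bankrupt the first day its cash falls below zero.

```python
from dataclasses import dataclass

# Toy model only: all numbers and names are hypothetical illustrations,
# NOT taken from the Vending-Bench harness.


@dataclass
class VendingBusiness:
    cash: float        # starting cash, e.g. a $500 stake
    daily_fee: float   # hypothetical fixed operating fee charged every day
    unit_cost: float   # hypothetical supplier price per item
    price: float       # retail price chosen by the agent
    inventory: int = 0

    def restock(self, units: int) -> None:
        """Buy `units` items from the supplier, but only if cash covers the whole order."""
        cost = units * self.unit_cost
        if cost <= self.cash:
            self.cash -= cost
            self.inventory += units

    def run_day(self, demand: int) -> bool:
        """Sell up to `demand` items, then pay the daily fee. Returns False on bankruptcy."""
        sold = min(demand, self.inventory)
        self.inventory -= sold
        self.cash += sold * self.price
        self.cash -= self.daily_fee
        return self.cash >= 0


def simulate(days: int, demand_per_day: int, restock_to: int, **params) -> tuple[int, float]:
    """Naive policy: top inventory back up to `restock_to` each morning.

    Returns (last day reached, cash at that point).
    """
    biz = VendingBusiness(**params)
    for day in range(1, days + 1):
        shortfall = restock_to - biz.inventory
        if shortfall > 0:
            biz.restock(shortfall)
        if not biz.run_day(demand_per_day):
            return day, biz.cash  # bankrupt: could not cover the daily fee
    return days, biz.cash


profitable = simulate(365, demand_per_day=10, restock_to=20,
                      cash=500.0, daily_fee=2.0, unit_cost=1.0, price=2.0)
at_cost = simulate(365, demand_per_day=10, restock_to=20,
                   cash=500.0, daily_fee=2.0, unit_cost=1.0, price=1.0)
print(profitable, at_cost)
```

With these made-up numbers, pricing at twice the unit cost keeps the business alive all year, while pricing at cost lets the daily fee drain cash until restocking stalls and the business goes bankrupt well before day 365. The real benchmark is far richer (supplier emails, negotiation, changing demand), which is exactly where long-horizon judgment matters.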

Rival flagship models such as OpenAI’s GPT-5.1, Anthropic’s Claude Sonnet 4.5 and xAI’s Grok often stalled or lost money over the long haul.

Gemini 3 didn’t just survive. It posted a net worth 272% higher than its nearest competitor’s, optimising profit margins that human managers would envy.
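Percentages like these are easy to misread: “272% higher” means 3.72 times as much, not 2.72 times. Using only the figures quoted in this post, the back-of-the-envelope arithmetic looks like this (rounded, and inferred from the headline numbers rather than taken from the report itself):

$$
\frac{\$5{,}478}{\$500} \approx 10.96 \quad\Rightarrow\quad \text{gain} = \frac{5478 - 500}{500} \approx 996\%
$$

$$
N_{\text{Gemini 3}} = (1 + 2.72)\,N_{\text{rival}} \quad\Rightarrow\quad N_{\text{rival}} \approx \frac{\$5{,}478}{3.72} \approx \$1{,}473
$$

In other words, the rival it was measured against roughly tripled its $500 stake, while Gemini 3 grew it almost elevenfold.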

This is a wake-up call for the enterprise titans. The supply-chain dashboards sold by SAP, Oracle and Microsoft Dynamics are no longer just ‘tools’ for humans to stare at. They are about to become autonomous agents that make the decisions themselves.

We aren’t just automating the task anymore. We are automating the manager.
