According to benchmarks shared by Stagehand, a production-ready browser automation framework, Gemini 2.0 Flash has the lowest error rate (6.67%) and the highest exact-match score (90%), and it is also cheap and fast. GPT-4.1, by contrast, has a higher error rate (16.67%) and costs more than 10 times as much as Gemini 2.0 Flash. In other data shared by Pierre Bongrand, a scientist working on RNA at Harvard, GPT-4.1 offers poorer cost-effectiveness than competing models. According to the benchmarks, the new GPT-4.1 models are far better than the existing GPT-4o and GPT-4o mini, particularly in coding, but models like Gemini 2.0 Flash, Gemini 2.5 Pro, and even DeepSeek or o3 mini sit on or near the cost-performance frontier, which suggests they deliver higher performance at a lower or comparable cost. Coding benchmarks tell a similar story: Aider Polyglot lists GPT-4.1 at a 52% score, while Gemini 2.5 is well ahead at 73%. Yesterday, OpenAI confirmed that developers with API access can try three new models: GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano.
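For readers who want to compare the quoted figures side by side, they can be collected in a short Python sketch. The numbers are exactly those reported above by Stagehand and Aider Polyglot; the dictionary layout and helper function are illustrative, not part of either benchmark's tooling:

```python
# Stagehand browser-automation benchmark figures quoted in the article
# (error rate in %, exact-match score in % where reported).
stagehand = {
    "Gemini 2.0 Flash": {"error_rate": 6.67, "exact_match": 90},
    "GPT-4.1": {"error_rate": 16.67},
}

# Aider Polyglot coding benchmark scores quoted in the article (%).
aider_polyglot = {
    "GPT-4.1": 52,
    "Gemini 2.5": 73,
}

def lowest_error(results):
    """Return the model name with the lowest reported error rate."""
    return min(results, key=lambda model: results[model]["error_rate"])

print(lowest_error(stagehand))                                 # Gemini 2.0 Flash
print(aider_polyglot["Gemini 2.5"] - aider_polyglot["GPT-4.1"])  # 21
```

On these figures, Gemini 2.0 Flash leads Stagehand's error-rate comparison, and Gemini 2.5 outscores GPT-4.1 on Aider Polyglot by 21 percentage points.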
Published on www.bleepingcomputer.com, Tue, 15 Apr 2025 21:25:10 +0000.