Gemini 2.5 Pro and Claude remain the best models for coding, but that could change when xAI ships Grok 4 Code in August. Grok 4 is a huge leap from Grok 3, but how does it compare with other models on the market, such as Gemini 2.5 Pro? We now have answers, thanks to new independent benchmarks from LMArena.ai, an open platform for crowdsourced AI benchmarking.

According to LMArena's results, the Grok 4 API model (grok-4-0709) has received more than 4,000 community votes and ranks #3 overall in the Text Arena, a big jump from Grok 3, which ranked 8th. Grok 4 also scores in the top three across all categories: #1 in Math, #2 in Coding, and #3 in Hard Prompts.

It is worth noting that the tested model is Grok 4, not Grok 4 Heavy, which uses multiple agents to think through a problem and compare results. While both are reasoning models, Grok 4 Heavy is significantly better, but it is not yet available on the API platform, so the numbers could look different once it can be benchmarked. Meanwhile, Grok 4 Code is optimised for coding, and we're also expecting a CLI, similar to Gemini CLI and Claude Code.
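For readers who want to try the same model LMArena evaluated, here is a minimal sketch of how one might query grok-4-0709. It assumes xAI exposes an OpenAI-compatible chat completions endpoint at https://api.x.ai/v1 and that an API key is stored in an XAI_API_KEY environment variable; adjust for your own setup.

```python
# Minimal sketch: querying the Grok 4 API model (grok-4-0709) mentioned above.
# Assumes an OpenAI-compatible endpoint at https://api.x.ai/v1 and an API key
# in the XAI_API_KEY environment variable.
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],  # assumed environment variable
    base_url="https://api.x.ai/v1",     # assumed OpenAI-compatible base URL
)

response = client.chat.completions.create(
    model="grok-4-0709",                # model identifier cited by LMArena
    messages=[
        {"role": "user", "content": "Write a Python function that reverses a string."},
    ],
)

print(response.choices[0].message.content)
```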