Complete List of Leading LLMs in 2025: Intelligence, Speed, and Cost

February 26, 2026 · Alibinsalman786

Which large language models lead in 2025? We break down the top options by intelligence, speed, and price using independent benchmarks from Artificial Analysis—so you can choose the right model for your use case.

Choosing a large language model in 2025 can feel overwhelming. New models ship constantly, and marketing claims don’t always match real-world performance. That’s where independent benchmarks help. At LilacPearls, we’ve pulled data from Artificial Analysis—their Intelligence Index and their speed and cost measurements—to give you a clear, human-readable snapshot of the LLM landscape.

Why Benchmarks Matter

Intelligence, speed, and cost all affect which model you should use. A model that tops “smartest” lists might be too slow or expensive for high-volume work; a cheaper, faster model might be enough for many tasks. We focus on intelligence (reasoning, knowledge, coding), output speed (tokens per second), and price (USD per million tokens) so you can weigh trade-offs yourself.

Top of the Pack: Intelligence (2025)

According to Artificial Analysis and their v4.0 Intelligence Index (which includes evaluations like GDPval-AA, SciCode, AA-Omniscience, and others):

  • Gemini 3.1 Pro Preview leads the overall index (57.05), with strong performance across reasoning and knowledge.
  • Claude Opus 4.6 (max) and Claude Sonnet 4.6 (max) sit just behind (53.03 and 51.27), with Claude models often excelling on agentic and real-world tasks.
  • GPT-5.2 (xhigh) and GLM-5 are in the same tier (around 49–51), with GPT-5.2 Codex (xhigh) a top choice for coding-heavy workloads.
  • Kimi K2.5, Gemini 3 Flash, DeepSeek V3.2, and Grok 4 round out the frontier—all in the 41–47 range, with different strengths in speed vs cost.

So when you’re comparing LLMs, “best” depends on whether you care most about raw capability, coding, long context, or cost.

Speed and Cost: The Practical Side

Speed (output tokens per second) matters for interactive or high-throughput apps. Leaders here include very fast models like gpt-oss-120B (high) and Gemini 3 Flash, along with Nova 2.0 Pro and Gemini 3.1 Pro—so there are strong options at the frontier if you need snappy responses.

Cost (USD per 1M tokens) varies a lot. Open and cost-optimized options like gpt-oss-120B (high) and DeepSeek V3.2 sit at the cheaper end; premium proprietary models cost more. For many teams, the “sweet spot” is a model that’s smart enough for the task but cheap and fast enough to scale.
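To make the trade-off concrete, here is a minimal back-of-envelope estimator for per-request cost and generation time. The prices and speeds below are illustrative placeholders only, not actual figures for any model—check Artificial Analysis for current numbers:

```python
def estimate(tokens_in: int, tokens_out: int,
             price_in_per_m: float, price_out_per_m: float,
             tokens_per_sec: float) -> tuple[float, float]:
    """Return (cost_usd, generation_seconds) for one request.

    Prices are USD per 1M tokens; speed is output tokens per second.
    """
    # Cost scales linearly with token counts at the per-million rates.
    cost = (tokens_in * price_in_per_m + tokens_out * price_out_per_m) / 1_000_000
    # Time to stream the output at the model's measured throughput.
    seconds = tokens_out / tokens_per_sec
    return cost, seconds

# Hypothetical cheap-and-fast model: $0.25/M input, $1.00/M output, 200 tok/s.
cost, secs = estimate(2_000, 500, price_in_per_m=0.25,
                      price_out_per_m=1.00, tokens_per_sec=200)
print(f"${cost:.4f} per request, {secs:.1f}s to generate")
# → $0.0010 per request, 2.5s to generate
```

Multiply the per-request cost by your expected daily volume and the "sweet spot" often becomes obvious: a model a few index points lower on intelligence can be an order of magnitude cheaper at scale.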

How to Use This List

Use this as a starting point, not a final answer. Check Artificial Analysis for the latest rankings, run your own tests on your data, and read our LLM comparison posts (e.g. GPT-5.2 vs Claude, Gemini vs Claude) for head-to-head takes. For more on technology and artificial intelligence in practice, browse the rest of LilacPearls.
