The best open LLMs for your use case:

1. DeepSeek-V4-Pro (DeepSeek)

DeepSeek's frontier 1.6T-parameter Mixture-of-Experts model (49B active per token) with hybrid attention built for long-context, low-cost reasoning. Runs in FP4 with a 512K-token context window.

Price (per 1M tokens): $1.74 / $3.48

Cached input (per 1M tokens): $0.20

Context (tokens): 512,000

Inputs: Image, Text

Benchmarks:

MRCR 1M (Summarization): 83.5
SimpleQA (General Knowledge): 57.9
LiveCodeBench (Coding Agents): 93.5
SWE-Bench Verified (Coding Agents): 80.6
Terminal-Bench 2.0 (Coding Agents): 67.9
GDPval-AA (Agents and Function Calling): 1554
HLE (General Knowledge): 37.7
MMLU-Pro (General Knowledge): 87.5


2. MiniMax-M3 (MiniMax)

Next-generation reasoning model from MiniMax with frontier agentic, coding, and multimodal performance. Strong scores on SWE-Bench, BrowseComp, OmniDocBench, and IMO/USAMO competition reasoning.

Price (per 1M tokens): $0.30 / $1.20

Cached input (per 1M tokens): $0.06

Context (tokens): 524,288

Inputs: Image, Text

Benchmarks:

GPQA-Diamond (General Knowledge): 92.9
Video-MME v2 (Multimodal - Vision): 85.4
Claw-Eval (Agents and Function Calling): 74.5
SWE-Bench Verified (Coding Agents): 80.5
SWE-Bench Pro (Coding Agents): (no score listed)
Apex Agents (Agents and Function Calling): 27.7
MMMU-Pro (Multimodal - Vision): 78.1


Use case: Summarization

Features: Long Context Handling
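
To compare the two listings on cost, the per-1M-token prices can be turned into a per-request estimate. The sketch below is illustrative only and rests on two assumptions not stated in the listing: that each price pair (for example "$1.74 / 3.48") is input price followed by output price, and that cached input tokens are billed at the cached rate in place of the regular input rate. The names `Pricing` and `estimate_cost` are hypothetical helpers, not part of any provider's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Prices in USD per 1M tokens (assumed order: input, output, cached input)."""
    input_per_m: float
    output_per_m: float
    cached_input_per_m: float


# Values copied from the listing; the input/output reading of each pair is an assumption.
DEEPSEEK_V4_PRO = Pricing(input_per_m=1.74, output_per_m=3.48, cached_input_per_m=0.20)
MINIMAX_M3 = Pricing(input_per_m=0.30, output_per_m=1.20, cached_input_per_m=0.06)


def estimate_cost(pricing: Pricing, input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    cached_tokens is the part of input_tokens served from cache; it is
    billed at the cached rate instead of the regular input rate (assumption).
    """
    if min(input_tokens, output_tokens, cached_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    total = (fresh_tokens * pricing.input_per_m
             + cached_tokens * pricing.cached_input_per_m
             + output_tokens * pricing.output_per_m)
    return total / 1_000_000


if __name__ == "__main__":
    # A long-document summarization request: large input, short output.
    for name, pricing in [("DeepSeek-V4-Pro", DEEPSEEK_V4_PRO), ("MiniMax-M3", MINIMAX_M3)]:
        cost = estimate_cost(pricing, input_tokens=200_000, output_tokens=4_000)
        print(f"{name}: ${cost:.4f}")
```

For summarization workloads the input side dominates, so the input and cached-input prices matter more than the output price when choosing between the two models.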