The best LLMs for your use case:

1. Qwen2.5 Vision Language 72B (Qwen)

Vision-language model with advanced visual reasoning, video understanding, structured outputs, and agentic capabilities.

Speed:

Intelligence:

Price (per 1M tokens): $1.95

Inputs: Image, Text

JSON Mode:

Function Calling:

Benchmarks:

#2 MMMU (Multimodal - Vision): 70.2
#1 ChartQA (Multimodal - Vision): 88.96
#1 DocVQA (Multimodal - Vision): 96.4
#10 MMLU-Pro (General Knowledge): 54.7

2. Llama 4 Maverick (17Bx128E) (Meta)

State-of-the-art 128-expert mixture-of-experts (MoE) model for multilingual image and text understanding, creative writing, and enterprise-scale applications.

Speed:

Intelligence:

Price (per 1M tokens): $0.27

Inputs: Image, Text

JSON Mode:

Function Calling:

Benchmarks:

#2 ChartQA (Multimodal - Vision): 85.3
#2 DocVQA (Multimodal - Vision): 94.4
#1 MMMU (Multimodal - Vision): 73.4
#1 Multilingual MMLU (Multilingual): 84.6
#2 MGSM (Multilingual): 92.5
#3 GPQA-Diamond (General Knowledge): 69.8
#4 LMArena (Chat): 1269
#4 WebDevArena (Code): 1015
#5 LiveBench (General Knowledge): 55.19
#5 Aider Polyglot (Code): 15.6
#6 EQBench (Creative Writing): 628.6
#6 LiveCodeBench (Code): 43.4
#7 BFCL (Agents and Function Calling): 53.32
#8 MMLU-Pro (General Knowledge): 62.9

Use case:

Multimodal - Vision
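
To compare the two listed prices for a given workload, a rough estimate can be sketched as follows. This assumes the listed figure is a single flat rate per 1M tokens; the listing does not say whether it covers input tokens, output tokens, or a blend, so treat the result as an approximation. The function name `estimate_cost_usd` and the 5M-token workload are illustrative only.

```python
# Listed prices in USD per 1M tokens, copied from the two entries above.
PRICE_PER_MILLION_USD = {
    "Qwen2.5 Vision Language 72B": 1.95,
    "Llama 4 Maverick (17Bx128E)": 0.27,
}


def estimate_cost_usd(model: str, tokens: int) -> float:
    """Estimate cost in USD, assuming one flat per-token rate (an assumption)."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    return PRICE_PER_MILLION_USD[model] * tokens / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload of 5 million tokens.
    for name in PRICE_PER_MILLION_USD:
        print(f"{name}: ${estimate_cost_usd(name, 5_000_000):.2f}")
```

Under that flat-rate assumption, the cost scales linearly with token count, so the ratio between the two models stays the same at any volume.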