The best LLMs for your use case:
Vision-language model with advanced visual reasoning, video understanding, structured outputs, and agentic capabilities.
Speed:
Intelligence:
Price (per 1M tokens): $1.95
Inputs:
JSON Mode:
Function Calling:
Benchmarks:
MMMU (Multimodal - Vision)
ChartQA (Multimodal - Vision)
DocVQA (Multimodal - Vision)
MMLU-Pro (General Knowledge)
SOTA 128-expert MoE powerhouse for multilingual image/text understanding, creative writing, and enterprise-scale applications.
Speed:
Intelligence:
Price (per 1M tokens): $0.27
Inputs:
JSON Mode:
Function Calling:
Benchmarks:
ChartQA (Multimodal - Vision)
DocVQA (Multimodal - Vision)
MMMU (Multimodal - Vision)
Multilingual MMLU (Multilingual)
MGSM (Multilingual)
GPQA-Diamond (General Knowledge)
LMArena (Chat)
WebDevArena (Code)
LiveBench (General Knowledge)
Aider Polyglot (Code)
EQBench (Creative Writing)
LiveCodeBench (Code)
BFCL (Agents and Function Calling)
MMLU-Pro (General Knowledge)
Use case:
Multimodal - Vision