Anthropic Claude 3.5 Sonnet Launched; Beats GPT-4o

Just three months after releasing the Claude 3 models, Anthropic has introduced a much-improved Claude 3.5 Sonnet model. It's not the biggest model from Anthropic's lab, but it beats GPT-4o and Gemini 1.5 Pro in several benchmarks. Claude 3.5 Sonnet is a mid-range model, and it runs twice as fast as Claude 3 Opus, the largest Claude 3 model.

The new Claude 3.5 Sonnet is better than GPT-4o

Anthropic has kept the API price the same as the previous Sonnet model for Claude 3.5 Sonnet, which has a context window of 200K tokens. For regular users, it is available for free at claude.ai and supports both image and document uploads. Keep in mind that free users are subject to a usage limit.

When it comes to benchmarks, Claude 3.5 Sonnet beats GPT-4o in almost every test except MMLU and MATH, and even there the gap is marginal. In HumanEval, which tests coding abilities, Claude 3.5 Sonnet scores 92% while GPT-4o scores 90.2%. In GPQA Diamond, which evaluates graduate-level reasoning, the new Sonnet model achieves a score of 59.4% while GPT-4o stands at 53.6%.

With 0-shot prompting in the MMLU test, Claude 3.5 Sonnet scores 88.3% and OpenAI's GPT-4o model scores 88.7%. From the table, you can conclude that Anthropic has developed a very capable model that outperforms both GPT-4o and Gemini 1.5 Pro on most benchmarks.
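
To put the margins in perspective, here is a minimal Python sketch that works out the percentage-point gap between the two models, using only the three pairs of scores quoted in this article. The `scores` dictionary and the `point_gaps` helper are illustrative names made up for this example, not part of any official tool.

```python
# Scores (in percent) as quoted in this article: (Claude 3.5 Sonnet, GPT-4o).
# The dictionary layout and helper name are assumptions for illustration.
scores = {
    "HumanEval": (92.0, 90.2),
    "GPQA Diamond": (59.4, 53.6),
    "MMLU (0-shot)": (88.3, 88.7),
}


def point_gaps(table):
    """Return Claude-minus-GPT-4o gaps in percentage points, rounded to 1 decimal."""
    return {name: round(claude - gpt, 1) for name, (claude, gpt) in table.items()}


gaps = point_gaps(scores)
for name, gap in gaps.items():
    # A positive gap means Claude 3.5 Sonnet is ahead on that benchmark.
    leader = "Claude 3.5 Sonnet" if gap > 0 else "GPT-4o"
    print(f"{name}: {leader} leads by {abs(gap):.1f} points")
```

On these figures, the Sonnet model leads by 1.8 points on HumanEval and 5.8 points on GPQA Diamond, while GPT-4o is ahead by just 0.4 points on 0-shot MMLU.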