Anthropic claims its new Claude 3 outperforms GPT-4 and Google’s Gemini

By Emil Bjerg, journalist and editor

Anthropic, backed by Amazon and Google, has released its Claude 3 model family. Its most intelligent model, the company claims, “outperforms its peers on most of the common evaluation benchmarks.” But does that claim hold up to third-party testing?

Anthropic’s new Claude 3 model family introduces three versions tailored to different user needs: Haiku, Sonnet, and Opus.

Opus is the most advanced of the three. According to Anthropic, it can handle highly complex tasks and exhibit human-like understanding and fluency in its text-based responses.

Unlike previous Claude models, the Claude 3 family catches up with the competition by offering multimodal input: users can include both text and images in their prompts.
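To give a sense of what that looks like in practice, here is a minimal sketch using Anthropic’s Python SDK. The model identifier and field names reflect Anthropic’s documentation around the Claude 3 launch; the image file name and question are purely illustrative, and the current API reference should be checked before relying on any of this.

```python
import base64

import anthropic  # Anthropic's official Python SDK

client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment

# Read and base64-encode a local image (hypothetical file) for the prompt.
with open("chart.jpeg", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-3-opus-20240229",  # Opus model ID announced at launch
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/jpeg",
                        "data": image_data,
                    },
                },
                {"type": "text", "text": "What trend does this chart show?"},
            ],
        }
    ],
)
print(message.content[0].text)
```

The same request structure works for the other two Claude 3 models by swapping the model identifier.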

Opus: A new industry standard?

Anthropic writes that their Opus model “outperforms its peers on most of the common evaluation benchmarks for AI systems, including undergraduate level expert knowledge, graduate-level expert reasoning, basic mathematics, and more. It exhibits near-human levels of comprehension and fluency on complex tasks, leading the frontier of general intelligence.”

Like Google did when it released Gemini, Anthropic launched the Claude 3 model family alongside its own benchmark results. According to these benchmarks, Opus does indeed outperform competitors on a number of parameters. But the benchmarks Anthropic publicizes are one thing; how the models fare in third-party tests is another.

Beepop writes that Opus seems like a capable model “but falters on tasks where you expect it to excel. In our commonsense reasoning tests, the Opus model doesn’t perform well, and it’s behind GPT-4 and Gemini 1.5 Pro.”

On the other hand, according to Beepop, there are specific areas, such as niche translation and physics, where Opus pulls ahead of GPT-4 and Gemini.

So, as things stand, the best generative AI depends mainly on the use case.

Who’s behind Anthropic?

Anthropic was founded by siblings and former OpenAI employees Daniela and Dario Amodei. Dario led OpenAI’s research team, and Daniela was in charge of its safety and policy teams.

They’re backed by both Google and Amazon, with Amazon committing to invest $4 billion. As a result, Amazon Web Services is Anthropic’s primary cloud provider, allowing it to compete with Google and OpenAI (backed by Microsoft) in their compute-heavy pursuits.

The Amodei siblings founded Anthropic out of concern about OpenAI’s commercial direction and a commitment to developing AI responsibly. The company therefore emphasizes empirical research in AI safety, believing that real-world experiments and data, rather than theory alone, should guide safety measures.

To put that commitment into practice, the company has developed a framework known as “Constitutional AI,” which embeds high-level principles directly into the training process to shape the behavior of its generative AI.

This more careful approach is likely why past versions of Claude refused to answer harmless prompts. Anthropic attributes that to the previous models’ “lack of contextual understanding” and says the Claude 3 family will better understand user queries and refuse fewer harmless prompts.

What’s the difference between the three models in the new release?

As mentioned, Anthropic’s new Claude 3 model family features three main versions: Haiku, Sonnet, and Opus, each aimed at a different balance of speed, cost, and performance.

Opus, the most advanced model, is particularly suited for task automation, research and development, and strategic analysis. It is also the most expensive, priced at $15 per million input tokens and $75 per million output tokens.

When a pricing model is quoted “per million tokens,” you are charged based on the amount of text the AI processes. This includes both the input (what you ask the AI, or the data you feed into it) and the output (the AI’s responses), with input and output billed at different rates.
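As a back-of-the-envelope illustration, the following Python sketch estimates the cost of a single request at Opus’s listed rates; the token counts in the example are made up, and real billing may meter or round differently.

```python
# Rough cost estimate for per-million-token pricing.
# Rates below are Claude 3 Opus's listed launch prices; the token
# counts in the example are hypothetical.

OPUS_INPUT_USD_PER_MTOK = 15.00   # price per million input tokens
OPUS_OUTPUT_USD_PER_MTOK = 75.00  # price per million output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    return (input_tokens / 1_000_000) * OPUS_INPUT_USD_PER_MTOK + \
           (output_tokens / 1_000_000) * OPUS_OUTPUT_USD_PER_MTOK

# Example: a 2,000-token prompt that yields a 500-token answer.
print(f"${estimate_cost(2_000, 500):.4f}")  # -> $0.0675
```

Swapping in Sonnet’s or Haiku’s rates shows how quickly the choice of model changes the bill for high-volume workloads.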

Claude 3 Sonnet balances performance and speed, making it ideal for enterprise workloads like data processing, sales, and time-saving tasks such as code generation. It costs $3 per million input tokens and $15 per million output tokens.

Claude 3 Haiku is the fastest and most cost-efficient of the three. According to Anthropic, it’s particularly well suited for customer interactions and content moderation, priced at $0.25 per million input tokens and $1.25 per million output tokens.
