By Emil Bjerg, journalist and editor of The European Business Review
This Wednesday, Google released Gemini 1.0 – a long-awaited large language model intended to bring the search giant up to speed with OpenAI’s ChatGPT. How does the launch change the race for generative AI dominance?
In a video released Wednesday, Google’s CEO Sundar Pichai and Google DeepMind’s Demis Hassabis announced “the Gemini Era.” Gemini is Google’s most advanced AI model by far and, according to Pichai, one of the biggest science and engineering efforts that Google has undertaken.
As The Verge writes, Google has been an ‘AI-first company’ for nearly a decade. After dominating search for decades, Google nevertheless faced a significant challenge when it was outmaneuvered by the newcomers from OpenAI with the release of ChatGPT. For that reason, the main discussion point after the release is how well Gemini does at catching up with GPT-4.
We’ll return to that. Let’s start by looking at the features and capabilities of Gemini 1.0.
What do we know about Gemini?
A central aspect of Gemini is its multimodal design, enabling it to process and understand various information types, including text, code, audio, images, and video. This capability makes it a versatile tool and sets it apart from ChatGPT in being built from the ground up as a multimodal tool. GPT-4 only became multimodal with the recent update announced at DevDay last month. This could mean that Gemini will prove better at switching between formats.
Gemini is released in three versions: Nano, Pro, and Ultra.
According to The Verge, Gemini Nano is meant to be run natively and offline on Android devices.
Gemini Pro has already been integrated into Bard, Google’s AI-based chatbot, making it available in English in over 170 regions and territories. This integration will likely significantly upgrade Bard’s capabilities, offering more advanced reasoning and understanding in English.
Gemini Ultra, which won’t be launched until next year, is expected to be used mainly in data centers and enterprise settings.
As part of the launch, Google displayed in a pre-recorded video how Gemini can be used to correct a physics paper. In the demo, Gemini analyzes and interprets formulas, equations, and scientific concepts on paper, including handwritten ones. It can even explain the mathematical or scientific principles behind the formulas before performing the correct calculations, which could fundamentally change how students do homework and learn.
And from the Gemini basics to the million-dollar question.
But does it outperform ChatGPT?
At the launch, Google claimed that Gemini scores 90 percent on the MMLU benchmark. The MMLU (Massive Multitask Language Understanding) benchmark measures a language model’s understanding and reasoning ability across various topics and question formats. To compare, an “expert level” human can expect to score 89.8 percent.
It also surpassed GPT-4 on the same benchmark, where GPT-4 scores 86.4 percent.
Further, according to Google DeepMind, Gemini outperforms GPT-4 on 30 of 32 widely used benchmarks – even if often by thin margins.
So why are experts arguing about whether Gemini actually outperforms GPT-4?
Criticism
Critics highlight that the benchmark scores Google presents are for its largest model, Gemini Ultra, which is not yet available to the public. Google cites legal issues as the reason it has not been launched already.
Gemini also won’t be available to users in Europe for now. That is likely the result of a risk assessment by Google: the EU’s proactive approach to regulation puts the company at too high a risk of lawsuits and substantial fines. With the recent agreement on the AI Act, Google may be able to launch within the EU as well.
The revelation that a demonstration video for Gemini, which garnered over two million views, was extensively edited has sparked considerable controversy. After the launch, it emerged that the video, meant to showcase Gemini’s flexible, real-time multimodal understanding, was created from a series of carefully tuned text prompts applied to still images. While it was presented as a live interaction, Gemini was actually given different prompts than the video shows. As TechCrunch writes: “So although it might kind of do the things Google shows in the video, it didn’t, and maybe couldn’t, do them live and in the way they implied.”
The criticism blurs the picture. Gemini represents an impressive leap forward in Google’s AI output. But the slightly rushed end-of-year release suggests that Google may still be operating under the ‘Code Red’ it declared after the public release of ChatGPT.
What’s next?
Google is already working on applying Gemini to search, its main product and source of income.
In an innovator’s paradox, Google’s Gemini might help kill Google’s ‘10 blue links’, the list of suggested pages that appears after any Google search. Instead, we might soon get used to the idea of a ‘generative search experience,’ a term Sundar Pichai recently used in an interview with Platformer.
When it comes to the competition between Google and OpenAI – essentially Google and Microsoft – to dominate generative AI, it is difficult to know who will draw the longest straw. While Google’s benchmark numbers are impressive, OpenAI’s strategy of releasing polished generative AI products for users to experience themselves – and thereby build affinity toward – has given it an enormous user base.
Meanwhile, both Microsoft and Google are still struggling to figure out how to monetize generative AI, which is likely to be a main concern for both companies in the coming year.