Google unveiled PaLM 2, the next generation of its Pathways Language Model, on May 10, 2023, at Google I/O 2023. The new large language model (LLM) boasts many improvements over its predecessor, PaLM, and might finally be ready to take on its biggest rival, OpenAI’s GPT-4.
But just how much improvement has Google made? Is PaLM 2 the difference maker Google hopes it will be, and more importantly, with so many similar capabilities, how is PaLM 2 different from OpenAI’s GPT-4?
PaLM 2 vs. GPT-4: Performance Overview
PaLM 2 is packed with new and improved capabilities over its predecessor. One advantage it has over GPT-4 is that it comes in several sizes, including smaller ones suited to applications and devices with limited onboard processing power.
The four sizes are called Gecko, Otter, Bison, and Unicorn, in ascending order: Gecko is the smallest and Unicorn the largest.
Google also claims better reasoning than GPT-4 on the WinoGrande and DROP benchmarks, while GPT-4 keeps a narrow lead on ARC-C. Compared with the original PaLM and the previous state-of-the-art (SOTA) results, however, PaLM 2 shows significant improvement across the board.
PaLM 2 is also better at math, according to Google’s 91-page PaLM 2 research paper. However, the way Google and OpenAI have structured their test results makes it difficult to compare the two models directly. Google also omitted some comparisons, likely because PaLM 2 didn’t perform nearly as well as GPT-4.
In MMLU, GPT-4 scored 86.4, while PaLM 2 scored 81.2. The same goes for HellaSwag, where GPT-4 scored 95.3, but PaLM 2 could only muster 86.8, and ARC-E, where GPT-4 and PaLM 2 got 96.3 and 89.7, respectively.
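To put those gaps side by side, here is a minimal Python sketch that subtracts the scores quoted above. The numbers are copied from this article as reported, and the helper name score_gaps is purely illustrative. Because the two companies may have used different evaluation settings, treat the differences as rough indicators rather than a like-for-like comparison.

```python
# Benchmark scores as quoted in this article (not independently re-measured).
# Evaluation settings may differ between the two companies' reports.
scores = {
    "MMLU": {"GPT-4": 86.4, "PaLM 2": 81.2},
    "HellaSwag": {"GPT-4": 95.3, "PaLM 2": 86.8},
    "ARC-E": {"GPT-4": 96.3, "PaLM 2": 89.7},
}


def score_gaps(table):
    """Return GPT-4's score minus PaLM 2's score for each benchmark."""
    return {
        name: round(row["GPT-4"] - row["PaLM 2"], 1)
        for name, row in table.items()
    }


gaps = score_gaps(scores)
for name, gap in gaps.items():
    print(f"{name}: GPT-4 leads by {gap} points")
```

On these figures, the gap is widest on HellaSwag and narrowest on MMLU.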
The largest model in the PaLM 2 family is PaLM 2-L. While we don’t know its exact size, we do know that it’s significantly smaller than the largest PaLM model but uses more training compute. According to Google, PaLM has 540 billion parameters, so “significantly smaller” should put PaLM 2 anywhere between 10 and 300 billion parameters. Do keep in mind that these numbers are just assumptions based on what Google has said in the PaLM 2 paper.
If this number is anywhere close to 100 billion or under, PaLM 2 is most likely smaller in terms of parameters than GPT-3.5. That a model potentially under 100 billion parameters can go toe to toe with GPT-4, and even beat it at some tasks, is impressive. GPT-3.5 initially blew everything out of the water, including PaLM, but PaLM 2 has made quite the recovery.
Differences in GPT-4 and PaLM 2 Training Data
While Google hasn’t unveiled the size of PaLM 2’s training dataset, the company reports in its research paper that the new LLM’s training dataset is significantly larger than the one used for PaLM. OpenAI took the same approach when unveiling GPT-4, making no claims about the size of its training dataset.
However, Google wanted to focus on a deeper understanding of mathematics, logic, reasoning, and science, meaning a large part of PaLM 2’s training data is focused on the aforementioned topics. Google says in its paper that PaLM 2’s pre-training corpus is composed of multiple sources, including web documents, books, code, mathematics, and conversational data, giving it improvements across the board, at least when compared to PaLM.
PaLM 2’s conversational skills should also be on another level, considering the model has been trained on text in over 100 languages to give it better contextual understanding and better translation capabilities.
As far as GPT-4’s training data is concerned, OpenAI has told us that it trained the model using publicly available data and data it licensed. GPT-4’s research page states, “The data is a web-scale corpus of data including correct and incorrect solutions to math problems, weak and strong reasoning, self-contradictory and consistent statements, and representing a great variety of ideologies and ideas.”
When GPT-4 is asked a question, it can produce a wide variety of responses, not all of which might be relevant to your query. To align it with the user’s intent, OpenAI fine-tuned the model’s behavior using reinforcement learning from human feedback (RLHF).
While we may not know the exact data either of these models was trained on, we know that the training intent was very different. We’ll have to wait and see how this difference in training intent sets the two models apart in real-world deployment.
PaLM 2 and GPT-4 Chatbots and Services
The most direct way to access both LLMs is through their respective chatbots: Bard, which runs on PaLM 2, and ChatGPT, which runs on GPT-4. That said, GPT-4 is behind a paywall with ChatGPT Plus, and free users only get access to GPT-3.5. Bard, on the other hand, is free for all and available in 180 countries.
That’s not to say you can’t access GPT-4 for free, either. Microsoft’s Bing AI Chat uses GPT-4 and is completely free, open to all, and available right next to Bing Search, Google’s biggest rival in the space.
Google I/O 2023 was filled with announcements about how PaLM 2 and generative AI integration will improve the Google Workspace experience with AI features coming to Google Docs, Sheets, Slides, Gmail, and just about every service the search giant offers. In addition, Google has confirmed that PaLM 2 has already been integrated into over 25 Google products, including Android and YouTube.
In comparison, Microsoft has already brought AI features to the Microsoft Office suite of programs and many of its services. At the moment, you can try both LLMs through similar offerings from two rival companies going head to head in the AI battle.
However, since GPT-4 came out earlier and OpenAI has been careful to avoid many of the blunders Google made with the original Bard, it has so far been the de facto LLM for third-party developers, startups, and just about anyone else looking to incorporate a capable AI model in their service.
That’s not to say that developers won’t be switching to or at least trying out PaLM 2, but Google still has to play catch-up with OpenAI on that front. Note that PaLM 2 is not open-source: like GPT-4, it is a proprietary model that developers reach through an API, so how widely it is adopted will depend on access and pricing rather than on openly available weights.
Can PaLM 2 Take on GPT-4?
PaLM 2 is still very new, so whether or not it can take on GPT-4 remains to be seen. However, with everything that Google is promising and how aggressively it is rolling the model out, it does look like PaLM 2 can give GPT-4 a run for its money.
GPT-4 is still quite a capable model and, as mentioned before, beats PaLM 2 in quite a few comparisons. That said, PaLM 2’s multiple smaller models give it a clear edge. Gecko itself is so lightweight that it can work on mobile devices, even when offline. This means that PaLM 2 can support an entirely different class of products and devices that might struggle to use GPT-4.
The AI Race Is Heating Up
With the launch of PaLM 2, the race for AI dominance has heated up, as this might just be the first worthy opponent for GPT-4. With a newer multimodal AI model called “Gemini” also in training, Google isn’t showing any signs of slowing down here.