Will ChatGPT solve all of your organisation’s problems? Here are points to consider before you implement the seemingly magical AI tool.
Since its official launch at the end of 2022, ChatGPT has demonstrated how drastically artificial intelligence (AI) systems have improved. There is much excitement about the technology, which we definitely share, but it remains important to be clear about what it can and cannot do. Here is a list of important considerations an executive should think about before implementing such a tool.
1. How strong is the performance curve?
The combination of big data with AI has led to major advances in deep learning: it took only about a decade for AI to reach roughly human-level performance in image, handwriting and speech recognition. What ChatGPT further demonstrates is that the next step, reading and language understanding, could match human capabilities within only a few years.1 In fact, beyond the anecdotes, recent academic studies point the same way. In the one led by Choi and colleagues in late January 2023,2 ChatGPT’s answers to real exam questions from the University of Minnesota Law School were graded blindly and achieved a low but passing grade, in the C range, in all four courses. And this was GPT-3.5, not the newer GPT-4.
That level of conversational quality in Large Language Models (LLMs) such as ChatGPT does not come for free. ChatGPT was trained on billions of data points, implying very large training costs. But here as well, things are changing quickly: the cost of training a GPT-3-equivalent model has gone down by more than 80% in 2.5 years.
Furthermore, shortcuts are being tested very successfully to democratise the cost of training a more limited LLM.3 For example, a colleague of mine pointed out to me that Stanford researchers have built a conversational model with far fewer parameters, refined using a series of prompts put in parallel to OpenAI’s GPT, with surprisingly good results and at a cost of less than one thousand dollars. While this needs further checking, it implies a cost about 1,000 times lower than that of a typical enterprise model that uses ChatGPT directly.
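To put the 80% figure above in perspective, it can be converted into a yearly rate. The sketch below assumes, purely for illustration, that the cost fell at a constant rate each year; the function name `annualised_decline` is ours and the inputs are the rounded figures quoted in the text, not measured data.

```python
def annualised_decline(total_decline: float, years: float) -> float:
    """Constant yearly rate of decline that compounds to `total_decline` over `years`.

    Assumption: the cost shrinks by the same percentage every year.
    """
    remaining = 1.0 - total_decline          # share of the original cost still left
    return 1.0 - remaining ** (1.0 / years)  # yearly shrink rate


# Figures quoted in the text: a drop of (at least) 80% over 2.5 years.
rate = annualised_decline(0.80, 2.5)
print(f"Implied yearly decline: {rate:.1%}")  # prints "Implied yearly decline: 47.5%"
```

In other words, under this assumption the training cost roughly halves every year, and since the text says "more than 80%", the true yearly rate would be somewhat higher.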
2. Are all use cases/domains possible with ChatGPT?
One of the first applications of ChatGPT has been as a rival to search queries. The battle is on between Microsoft and Google.
This is not to say that Google is not ready with LLMs. The danger for Google is disruption: its dominance in search obliges the company to have a new, perfect LLM to blend with search queries. But to date, chat queries cost much more than search queries and could eat into Google’s comfortable margins. Microsoft, on the other hand, can afford to integrate an inferior (but already fascinating) product like ChatGPT into its search engine, Bing. ChatGPT, Microsoft hopes, is a clear way to rebalance the flow of queries to its advantage.
Beyond this evident case affecting the tech superstars, there may be many other uses for ChatGPT and other LLMs in enterprises. One is education and information intelligence: information aggregated from digital sources such as the web is typically not yet structured to yield valuable insights directly, which is what ChatGPT can then deliver. Another is virtual assistance for managerial and organisational tasks, or even creative tasks such as developing a marketing tagline or writing software code.
Still, one thing must remain clear: ChatGPT is a predictive model. Its accuracy is not perfect and may fall quickly on topics for which it has not seen enough training data. As a statistical model, it also may not deliver the same answer to the same prompt. The model is only as good as the data it was trained on, so it must be retrained constantly to stay accurate in real time. Finally, even if it is trained on billions of data points, a large part of the world’s data remains strictly private, so ChatGPT is blind to what happens behind enterprises’ closed doors.
Those are rather critical limitations that should be clearly taken into account when using GPT. For example, in a sector like private equity, where I advise (Antler and Fortino Capital), ChatGPT may have a hard time generating a proper deal flow of newly founded companies if it is not trained on real-time data. Reliance on private sources may also limit its capacity to find interesting bootstrapped companies, for instance. Likewise, the answers provided may not be fully accurate (so-called hallucination).4
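The remark above that a statistical model may not deliver the same answer to the same prompt can be illustrated with a toy example. The sketch below is not ChatGPT’s actual implementation: the candidate words and their scores are hypothetical, and `sample_next_word` is a name of our own. It only shows the general idea of drawing an answer at random, weighted by the model’s scores, with a "temperature" setting that controls how much randomness is allowed.

```python
import math
import random


def sample_next_word(scores: dict, temperature: float, rng: random.Random) -> str:
    """Pick one word at random, weighted by a softmax over `scores`."""
    if temperature <= 0:
        # No randomness: always return the highest-scoring word.
        return max(scores, key=scores.get)
    words = list(scores)
    top = max(scores.values())
    # Subtracting the top score keeps the exponentials numerically stable.
    weights = [math.exp((scores[w] - top) / temperature) for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


# Hypothetical scores a model might assign to three continuations of one prompt.
scores = {"growth": 2.0, "risk": 1.6, "cost": 1.1}

rng = random.Random(0)  # fixed seed so the run is repeatable
answers = [sample_next_word(scores, temperature=1.0, rng=rng) for _ in range(1000)]
counts = {word: answers.count(word) for word in scores}
print(counts)  # the same "prompt" yields different answers across runs

# With temperature 0 the answer is always the top-scoring word.
print(sample_next_word(scores, temperature=0.0, rng=rng))  # prints "growth"
```

For an executive, the practical point is that identical questions can produce different answers unless the randomness is deliberately reduced, which matters for any process that needs repeatable outputs.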
3. Is artificial intelligence really human intelligence?
Finally, the term artificial intelligence does not mean that AI, in its current guise as a language model, matches human intelligence on all tasks, especially reasoning. The shortcut made by some5 is the false logic that ChatGPT may have acquired simple reasoning merely by learning from a massive amount of real-world data. OpenAI itself is aware of ChatGPT’s many limitations, as posted on its website and as publicly acknowledged by its CEO.
In fact, in line with OpenAI’s cautions, and despite those drumbeat claims, most recent work testing ChatGPT’s reasoning performance shows that it remains rather dumb. A recent study by Bang and colleagues6 shows that ChatGPT is 63.41% accurate on average across 10 different reasoning categories spanning logical reasoning, non-textual reasoning, and commonsense reasoning. While reinforcement learning techniques should make LLMs better at reasoning, they are not there yet for a large number of reasoning tasks.
Finally, and not least, the question is not only that AI has yet to prove strong reasoning capabilities. What is often overlooked is the prevalence of data bias, unethical use, and more. The genius is there, but this is not yet Artificial General Intelligence. And however powerful LLMs may be, we must also understand the conditions, such as jailbreaking, under which they can be harmful.
About the Author
Jacques Bughin is CEO of MachaonAdvisory and a former professor of management. He retired from McKinsey as senior partner and director of the McKinsey Global Institute. He advises Antler and Fortino Capital, two major VC/PE firms, and serves on the boards of multiple companies.
References
- Bang, Y., Cahyawijaya, S., Lee, N., Dai, W., Su, D., Wilie, B., … & Fung, P. (2023). A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity. arXiv preprint arXiv:2302.04023.
- Choi, Jonathan H., Hickman, Kristin E., Monahan, Amy and Schwarcz, Daniel B., ChatGPT Goes to Law School (January 23, 2023). Minnesota Legal Studies Research Paper No. 23-03, http://dx.doi.org/10.2139/ssrn.4335905
- Kiela, Douwe et al., 2021, Dynabench: Rethinking Benchmarking in NLP. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
- Li, Belinda, Implicit Representations of Meaning in Neural Language Models, arXiv:2106.00737, https://doi.org/10.48550/arXiv.2106.00737
- Smith, Craig, 2023, Hallucinations Could Blunt ChatGPT’s Success, IEEE Spectrum, March 13
- Wang, James, 2020, Improving at 50x the Speed of Moore’s Law: Why It’s Still Early Days for AI, Ark Investments.