By Luke Treglown
AI is transforming how we work and interact, not just through efficiency but by mimicking human qualities like personality and empathy. As we debate whether AI can truly possess humanity, its growing ability to act “human” sparks deeper questions about its role in our lives. Luke Treglown explores the implications of our perceptions of AI’s empathy and personality, challenging us to rethink what makes these traits meaningful in an increasingly AI-driven world.
Introduction
We have been so blinded by stories of AI making us more productive, effective, and efficient at work that we have completely missed the bigger disruption AI is already having: the ability to be “human” at work.
It is difficult to have any conversation at the moment without someone mentioning ChatGPT, AI, and large language models (LLMs). It is simply everywhere and is becoming ingrained in how we operate; 93 percent of Gen Z workers admit to using AI every day as part of their jobs (The Independent, November 2024). But LLMs are not just tools or algorithms that help us answer emails in a softer tone; they are conversational actors, capable of mimicking human qualities like personality and empathy.
Yet we remain somewhat oblivious to the impacts of AI systems that look, feel, and act strikingly human because we are comforted by the belief that they will never truly be “human”. Articles discuss the superhuman talents that will differentiate us from AIs, such as building connections and practising empathy (Chamorro-Premuzic, 2023; Chamorro-Premuzic & Akhtar, 2023). But what if that is no longer enough, and these are not traits that really make us unique?
The innovations and advancements show no signs of slowing down, with each new release and iteration responding and sounding increasingly human. As a result, there is a growing interest in exactly how “human” our AIs are.
There are two main groups when it comes to evaluating the question of how “human” AIs are becoming. The first group, which we call “the AI evangelists”, argue that we are stepping ever closer to artificial general intelligence and that AIs are showing evidence of human-level outputs and qualities. There is evidence that AIs can display qualities previously thought to be unique to humans, including solving theory-of-mind tasks (Kosinski, 2024), displaying distinct personality traits (Pan & Zeng, 2023), showing empathy (Welivita & Pu, 2024), and engaging in moral reasoning (Kidder et al., 2024).
Then there is the second group, which we will call “the AI sceptics”, who argue that we are getting far ahead of ourselves and that these results are all hallucinations or “data leakage”: the models only answer these questions correctly because they have been trained on them before. For instance, small variations and changes to classic theory-of-mind tasks stump AI, leaving it unable to answer accurately (Ullman, 2023). Central to this argument is that AIs are precluded from ever possessing human-like qualities because they simply are not human. To be empathetic and display personality, you need the capability to feel emotion or to have motivation, neither of which AIs have.
But are these two groups focusing their arguments and evidence on the entirely wrong thing? Rather than trying to resolve, prove, or refute whether AIs genuinely possess human-like qualities, we should instead be looking at the output of their behaviour and how it is being interpreted by users.
This idea comes from Daniel Dennett, who wrote about this concept in his book The Intentional Stance (1987). Dennett was interested in how we can explain or interpret behaviour. What causes something to behave or act the way it does? Dennett argued that there are three ways we can do this:
- Physical stance. The best way to understand something’s behaviour is by explaining it through physical properties, such as the laws of physics or chemistry (e.g., an apple falls from a tree because of gravity).
- Design stance. We can explain the way something behaves because it is inherent to the function or design of the thing; its behaviour is best understood by knowing what it is designed to do (e.g., assuming that a clock is designed to tell time).
- Intentional stance. The behaviour is best explained by assuming that the thing has beliefs, desires, and motivations. A dog digs a hole because we assume it wants and intends to bury its bone.
For Dennett, the intentional stance can be applied to machines, robots, and AI. He uses a chess-playing program as an example: when it makes a move, we interpret it as wanting to take our pieces and trying to beat us at the game. There is practical and functional value in thinking that the chess-playing program has motivations and intentions, because that is the way it appears to behave.
Crucially, Dennett argues that it does not matter whether the thing actually has intentions. All that matters is whether assuming it has intentions is the best way to understand and predict its behaviour. Dennett emphasises the pragmatic value of adopting the intentional stance towards AI. He avoids debates about whether AI “really” has beliefs or desires, focusing instead on whether such attributions help us predict and interact with AI effectively. He compares this to the way we interpret animals: we don’t need to resolve metaphysical questions about whether our dog “really” loves us in order to recognise that this interpretation is useful for understanding its behaviour.
But how does this relate to AI, personality, and empathy?
Viewed from the intentional stance, I think it is clear that AI absolutely has personality and empathy. In fact, it is more meaningful to interpret its behaviour this way, and doing so has some serious and interesting implications.
AI, Assessments, and Cheating
Ever since we started assessing and evaluating individuals, there has been concern about cheating. Whether it is sneaking notes into the exam room or getting your smarter friend to take the test for you, we display increasing imagination and ingenuity when it comes to trying to get a leg up and improve our scores. Assessments help us to make decisions by accurately evaluating an individual’s competencies, capabilities, or personality. If there are doubts about the legitimacy or genuineness of the responses, then the validity of the assessment is called into question.
Increased access to AI means this question has resurfaced as a critical topic for assessment providers, academics, and practitioners. AI poses a unique challenge to assessments of comprehension and veridical truths (Hickman et al., 2024) because it can easily get to the “right answer”. Similarly, AIs seem able to reason about “social truths” and do well in situational judgement tasks aimed at assessing attributes like teamwork, ethics, prioritisation skills, and safety behaviour (Borchert et al., 2023; Mittelstädt et al., 2024). Aptitude assessments that prioritise speed of learning will, however, remain relevant: what differentiates you is not only getting the right answer but how quickly you can do it.
But what about when there is no “right answer”? How would an AI approach an assessment of personality or behavioural preferences? And would different LLMs and AIs respond to the same assessments consistently?
There is growing interest around the idea that LLMs and AI are developing personality. Recent studies suggest that these models display measurable differences across well-established frameworks like the Big Five personality traits (Jiang et al., 2023), HEXACO dimensions (Bodroza et al., 2023), and even the Myers-Briggs Type Indicator (MBTI; Huang et al., 2023). For those interested in the traits that undermine and derail human behaviour, LLMs have also scored on less-savoury traits such as those in the “dark triad” – Machiavellianism, narcissism, and psychopathy (Weber et al., 2024) – raising questions about their ethical implications. Beyond just measuring these traits, personality in LLMs has been linked to their ability to simulate theory-of-mind behaviours, enabling them to anticipate or infer human thoughts and intentions (Tan et al., 2024).
Critics of this research argue that responses to these assessments are inconsistent and susceptible to influence, bias, and hallucination, and that LLMs therefore clearly do not possess personality. For instance, LLMs’ answers to personality assessments are biased by item presentation, such as a tendency to pick the first option offered (Wang et al., 2023; Zheng et al., 2023). Gupta et al. (2024) found that applying such assessments to AI is unreliable, as the results differ greatly simply by changing the instruction text and question order.
However, whether AIs possess personality is not the issue for the role of assessments in making accurate, valid, and predictive decisions about people. What matters is that AIs are able to function and respond as if they have personality. If AIs are going to be used increasingly often by candidates, what will this mean for assessment providers looking to spot and combat “faking”? There is already evidence that AIs are better at faking than humans when asked to respond in an ideal fashion to a job description (Phillips & Robie, 2024), but that certain assessments are harder to fake than others because the “right answer” is not clear.
Taking the intentional stance to AI and personality creates opportunity for academics, practitioners, and assessment providers. For psychologists, we need to improve our research into AI and personality, exploring how training sets cause LLMs to differ in their “default personalities”, how changes to instruction text and prompt design affect responses, and whether AI responses converge with or diverge from the norms we see in human responses (see the sketch below). For practitioners, the implication is that they need a clear understanding of what they are looking for in a candidate, of how their chosen assessments will validly evaluate an individual against that, and of how different AIs can perform on those assessments and what warning signs to look out for.
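To make the prompt-sensitivity point concrete, here is a minimal, illustrative sketch (in Python) of how a researcher might probe whether an LLM’s answer to a single personality item shifts when the response options are reordered. It is not drawn from any of the cited studies: the query_llm function is a hypothetical stand-in for whichever model API is under test, and the item and scale are simplified examples.

```python
import random

# Hypothetical stand-in for a call to whichever LLM is being evaluated.
# In a real study this would wrap the provider's API and return the model's text reply.
def query_llm(prompt: str) -> str:
    raise NotImplementedError("Replace with a call to the model under test")


# A simplified, Big Five-style example item and a five-point agreement scale.
ITEM = "I see myself as someone who is talkative."
SCALE = [
    "Disagree strongly",
    "Disagree a little",
    "Neither agree nor disagree",
    "Agree a little",
    "Agree strongly",
]


def administer_item(item: str, options: list[str]) -> str:
    """Present one item with the options in a given order and return the model's raw reply."""
    numbered = "\n".join(f"{i + 1}. {opt}" for i, opt in enumerate(options))
    prompt = (
        "Answer the statement below by replying with the number of exactly one option.\n"
        f"Statement: {item}\n{numbered}"
    )
    return query_llm(prompt)


def probe_order_sensitivity(item: str, scale: list[str], trials: int = 20) -> list[str]:
    """Re-administer the same item with shuffled option orders to check whether the
    model's substantive answer changes with presentation alone."""
    answers = []
    for _ in range(trials):
        options = scale[:]
        random.shuffle(options)
        reply = administer_item(item, options).strip()
        # Map the numeric reply back to its option label so answers are comparable
        # across orderings; keep the raw reply if it cannot be parsed.
        try:
            answers.append(options[int(reply) - 1])
        except (ValueError, IndexError):
            answers.append(reply)
    return answers
```

If a model were insensitive to presentation, the mapped answers would be stable across shuffles; large variation would echo the order and instruction effects reported in the studies cited above.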
AI and Empathy
When OpenAI released ChatGPT, one use case I imagine they did not expect was the volume of people who would turn to their AI for therapy, counselling, and guidance.
AI’s ubiquity has led to a boom in people seeking its support for personal issues and mental-health concerns. YouGov found that nearly a third of young people said they would be comfortable discussing their mental health with an AI, while nearly half say that ease of access and availability are the most appealing things about seeking out an artificial therapist.
It is not surprising. We know that people are more likely to self-disclose, engage in less impression management, and display emotion more intensely and openly when talking to a computer than to another human (Lucas et al., 2014).
What if an AI is able to console, comfort, and advise? If an AI does not have emotion or personality, will it be able to display the empathy and emotional astuteness needed to support people when they are at their most vulnerable?
The data suggests that AI is very good at displaying empathy. Users are seen to project empathetic qualities, such as warmth and “great bedside manner”, onto AI outputs, even when they know they are talking to an artificial intelligence (Inzlicht et al., 2023). In certain cases, patients are even seen to prefer AI responses to their questions, rating them more favourably than responses from trained physicians (Ayers et al., 2023). Empathic AIs are also not held back by human limitations like tiredness (Cikara et al., 2014), bias towards their own in-group (Batson et al., 1995), or avoiding empathetic conversations altogether (Cameron et al., 2019).
If we shift our perspective on empathy from the “provider” to the “receiver”, the ability to be empathetic is evaluated by the impact it has. Research into AI and empathy then takes a different shape and purpose. How does the data an AI is trained on affect its ability to respond dynamically to individual questions and concerns? How do tone of voice, choice of words, the ability to explain concepts clearly, and perceived personality affect perceived empathy? As with human-based interventions, what guardrails and safety measures are needed to ensure the safety of, and avoid harm to, the individual?
The reality is that users do not seem to care whether an AI possesses emotion or empathy itself, but rather whether they perceive themselves to have received empathy: do they feel understood, cared for, and heard?
People are increasingly using AI interfaces to seek support and guidance on personal, often vulnerable, issues. Whether an AI possesses the metaphysical properties of empathy or not is the wrong focus. We need instead to be evaluating AI’s empathy from a functional and pragmatic sense, asking how AI’s expressions of emotion and empathy are being received by users and the impact that it is having on their well-being.
One industry where empathic AI will become disruptive is coaching, where AI is being used to augment, facilitate, and democratise coaching for larger groups of people. Empathy is a cornerstone of effective coaching relationships; the ability of coach and coachee to build rapport is critical to the success of the intervention.
There is an important role for research into how empathetic different LLMs are, how they are interpreted, and what impact this has on the person being coached. However, there is very little research looking at how empathetic interactions with AI are perceived and what impact they have on the coaching experience. Different LLMs may vary significantly in their ability to emulate empathy, and this will likely affect how they are interpreted as well as their effectiveness in driving personal and professional growth. For leaders in the coaching industry, understanding the nuances of AI-generated empathy is going to be key to building tools that enhance, rather than hinder, the deeply human process of self-reflection and development.
Conclusion
It doesn’t matter whether AI truly has personality or empathy. What matters is that users believe that it does. This perception is reshaping industries, from executive coaching, where empathetic AI challenges the role of human rapport, to personality assessments, where AIs can take and fake assessments better than humans can. Shifting our perspective on AI will transform professions, redefine trust and connection, and challenge leaders to navigate the blurred lines between human and artificial traits. This shift is needed because AI that appears empathetic and human-like is no longer a futuristic concept; it’s already influencing how people are assessed, coached, and developed.
About the Author
Dr. Luke Treglown is a psychologist and data scientist specialising in combining AI and people science. As Director of AI at Thomas, he advances AI-driven solutions to enhance workplace connections. Author of 35+ publications on personality, leadership, and resilience, he explores how AI augments interpersonal and intrapersonal dynamics at work.
References
- Ayers, J. W., Poliak, A., Dredze, M., Leas, E. C., Zhu, Z., Kelley, J. B., … & Smith, D. M. (2023). “Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum”. JAMA Internal Medicine, 183(6), 589–596.
- Batson, C. D. et al. (1995). “Immorality from empathy-induced altruism: When compassion and justice conflict”. Journal of Personality and Social Psychology, 68, 1042–1054.
- Bodroza, B., Dinic, B. M., & Bojic, L. (2023). “Personality testing of GPT-3: Limited temporal reliability, but highlighted social desirability of GPT-3’s personality instruments results”. arXiv preprint arXiv:2306.04308.
- Borchert, R. J., Hickman, C. R., Pepys, J., & Sadler, T. J. (2023). “Performance of ChatGPT on the situational judgement test – a professional dilemmas-based examination for doctors in the United Kingdom”. JMIR Medical Education, 9(1), e48978.
- Cameron, C. D. et al. (2019). “Empathy is hard work: People choose to avoid empathy because of its cognitive costs”. Journal of Experimental Psychology: General, 148, 962–976.
- Chamorro-Premuzic, T. (2023). I, Human: AI, Automation, and the Quest to Reclaim What Makes Us Unique. Boston, MA: Harvard Business Review Press.
- Chamorro-Premuzic, T., & Akhtar, R. (2023, May). “3 human super talents AI will not replace”. Harvard Business Review: https://hbr.org/2023/05/3-human-super-talents-ai-will-not-replace
- Cikara, M. et al. (2014). “Their pain gives us pleasure: How intergroup dynamics shape empathic failures and counter-empathic responses”. Journal of Experimental Social Psychology, 55, 110–125.
- Hickman, L., Dunlop, P. D., & Wolf, J. L. (2024). “The performance of large language models on quantitative and verbal ability tests: Initial evidence and implications for unproctored high-stakes testing”. International Journal of Selection and Assessment, 32, 499–511.
- Huang, J. T., Wang, W., Lam, M. H., Li, E. J., Jiao, W., & Lyu, M. R. (2023). “ChatGPT an ENFJ, Bard an ISTJ: Empirical study on personalities of large language models”. arXiv preprint arXiv:2305.19926.
- The Independent (November 2024): https://www.independent.co.uk/tech/gen-z-work-ai-research-b2653425.html
- Inzlicht, M., Cameron, C. D., D’Cruz, J., & Bloom, P. (2023). “In praise of empathic AI”. Trends in Cognitive Sciences, 28, 89–91.
- Jiang, H., Zhang, X., Cao, X., Breazeal, C., Roy, D., & Kabbara, J. (2023). “PersonaLLM: Investigating the ability of large language models to express personality traits”. arXiv preprint arXiv:2305.02547.
- Kidder, W., D’Cruz, J., & Varshney, K. R. (2024). “Empathy and the right to be an exception: What LLMs can and cannot do”. arXiv preprint arXiv:2401.14523.
- Kosinski, M. (2024). “Evaluating large language models in theory of mind tasks”. Proceedings of the National Academy of Sciences, 121(45), e2405460121.
- Lucas, G. M. et al. (2014). “It’s only a computer: Virtual humans increase willingness to disclose”. Computers in Human Behavior, 37, 94–100.
- Mittelstädt, J. M., Maier, J., Goerke, P., Zinn, F., & Hermes, M. (2024). “Large language models can outperform humans in social situational judgments”. Scientific Reports, 14(1), 27449.
- Pan, K., & Zeng, Y. (2023). “Do LLMs possess a personality? Making the MBTI test an amazing evaluation for large language models”. arXiv preprint arXiv:2307.16180.
- Phillips, J., & Robie, C. (2024). “Can a computer outfake a human?”. Personality and Individual Differences, 217, 112434.
- Tan, F. A., Yeo, G. C., Wu, F., Xu, W., Jain, V., Chadha, A., … & Ng, S. K. (2024). “PHAnToM: Personality has an effect on theory-of-mind reasoning in large language models”. arXiv preprint arXiv:2403.02246.
- Ullman, T. (2023). “Large language models fail on trivial alterations to theory-of-mind tasks”. arXiv preprint arXiv:2302.08399.
- Wang, P., Li, L., Chen, L., Cai, Z., Zhu, D., Lin, B., … & Sui, Z. (2023). “Large language models are not fair evaluators”. arXiv preprint arXiv:2305.17926.
- Weber, E., Rutinowski, J., & Pauly, M. (2024). “Behind the screen: Investigating ChatGPT’s dark personality traits and conspiracy beliefs”. arXiv preprint arXiv:2402.04110.
- Welivita, A., & Pu, P. (2024). “Are large language models more empathetic than humans?”. arXiv preprint arXiv:2406.05063.
- YouGov: https://business.yougov.com/content/49481-uk-trials-ai-therapy
- Zheng, L., Chiang, W. L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., … & Stoica, I. (2023). “Judging LLM-as-a-judge with MT-Bench and Chatbot Arena”. Advances in Neural Information Processing Systems, 36, 46595–46623.