Here’s something that should make you think twice before asking ChatGPT about that headache.
Oxford University ran a study earlier this year. They gave the same medical scenarios to AI chatbots, twice. First with 100 doctors at the keyboard. Then with 1,300 ordinary people.
The doctors got 95% accuracy. The ordinary people got 34%.
Same AI. Same scenarios. Different humans.
Why the gap? And what does it tell us about how the rest of us should be using AI at work?
This week’s ThrowForward Thursday gets into the Dunning-Kruger problem with large language models, why AI in the hands of a non-expert can be a liability, and how our 5T AI Impact Model helps organisations get measurable value out of AI rather than chasing the hype.
Resources Mentioned:
- Oxford study: New study warns of risks in AI chatbots giving medical advice.
- BBC Coverage: Should you really trust health advice from an AI chatbot?
- The 5T AI Impact Model: Masterclass for Leaders who’ve heard enough about AI and now want to do something useful with it.
TRANSCRIPT
Should you use generative AI for medical advice?
The short answer is no, although the medium answer is yes if you are already a doctor. The longer answer is this…
My name is Graeme Codrington. This is ThrowForward Thursday. AI as a medical doctor, yes or no?
The Reasoning with Machines Laboratory, located at the University of Oxford in the UK, did a remarkable piece of research earlier this year. There are links in the notes if you’re interested in reading more about it. They put together a number of scenarios of people who might be having medical problems, some of them minor, some of them incredibly serious, needing an ambulance immediately.
And they took these scenarios firstly to a group of about 100 medical professionals and asked the medical professionals to put the symptoms in. It’s things like, “I’ve got a striking headache, I’m losing vision in my left eye, my arm is tingling,” that sort of thing. They were then to ask the system, any one of the various large language models and generative AI platforms, to give a diagnosis and say what to do.
When those 100 medical professionals put that information into these systems, they got about a 95% accuracy rate. And 95% would be good enough to say we can use these systems for medical assistance. But that’s not the end of the story.
Because they then took the exact same scenarios to 1,300 randomly chosen ordinary people without medical training and asked them to do the same thing. And there, the accuracy and validity of the responses of these AI platforms plummeted to only about a third, 34% accuracy, which is a disaster.
It was not just medically problematic; it was actually dangerous. And here I think you can work out the answer to whether we should use AI for medical advice.
The answer is, if you are already a medical professional with medical expertise, you will have the language of medicine. Instead of saying, “My vision is going a bit blurry, I have a headache,” you might be able to say,
“The vision in my left eye has begun to blur at the edges. I have a headache that is pounding in the back of my head towards the top of my neck at vertebra number this or that,” and you would describe the tingling in your arm in a very particular way.
You already have a base level of knowledge and an understanding of the language that you might use; “tingling”, for example, might not be the right word.
What happens is that the large language model probably does have the right answer, and the way you prompt it enables it to get to a better answer for you. You will also be able to push back and apply some level of filter to the first response: ask an additional question, or realise it didn’t quite understand what you had said and give it a bit more information, and so get to 95%. In the hands of somebody who doesn’t already have that expert-level knowledge, it becomes a disaster.
This is a wonderful example of the Dunning-Kruger effect. I’m sure you know it: when you don’t know what you don’t know, it’s very easy to think that you are an expert. As soon as you begin to learn something about a topic, you realise how little you know, and you underestimate your ability until you develop a little bit more expert knowledge. Then you’ve got a little bit more confidence, but also a healthy dose of humility at the same time.
The problem with large language models is that they are designed to give very confident, very specific, very authoritative-sounding responses, even when those responses are not actually as clear, authoritative and factual as they should be. The bottom line is that large language models have no way of checking themselves. They have no way of applying the Dunning-Kruger effect filter to themselves.
That’s a human thing that we can do, and large language models literally can’t even be programmed to do that yet. Maybe in the future. But for now, if you have expertise in a field, large language models become a fantastic tool to improve and increase your expertise. If you are not an expert, they can be dangerous.
Our team at TomorrowToday has built a model that we call the 5T AI Impact Model, which helps organisations to think about where and when AI can be best used and how to get the best out of AI. The 5 Ts refer to 5 different levels at which AI can be implemented.
Very, very simply, the first level is about personal productivity gains from automating and improving your tasks. The second is when AI becomes a team member; this is when you, who already have expertise, allow AI to be a tool as part of your team to improve and increase your expertise.
The third level, by the way, is where most businesses will see real bottom-line profit improvements. This is why most companies are not seeing them: this “together” level, sometimes called the transversal level, is where you look for workflow redesign. We believe that’s where you start to see the real value, and we also know that about 95% of our clients haven’t even got there yet. They’re still at these first two levels, maybe even only the first level.
The fourth level is what everybody wants, which is transformative innovation, but unless you’ve built the foundation of these other three things, you’re not going to get that fourth level.
And the fifth level is more of a pillar that runs throughout this, and that is, we have to trust the systems. And that comes back to that Oxford University research, that we can only trust the system if we know the sorts of things the system should be producing.
AI is never going to replace us as humans. AI is going to augment existing human expertise, supercharge us, and make us bionic in what we do. That is how you really get impact, measurable bottom-line impact, from AI.
Slightly different in the ThrowForward Thursday studio this week, maybe more of a PSA this week than a future focus, but if you’d like to know more about our 5T model, and if you’d like to know more about how to unlock real AI value, getting beyond the hype, make sure you contact our team for a discovery call. We’d love to chat to you and take you into tomorrow’s world today.
I’ll see you next week in our studio.
At TomorrowToday Global, we help clients around the world analyse major global trends, developing strategies and frameworks to help businesses anticipate and adapt to market disruption in an ever-changing world.
Graeme speaks to 100,000+ people a year in 150+ countries and is a 2× TEDx speaker and best-selling author. He’s also ranked #17 in the Global Gurus “Top 30 Futurist Professionals” for 2026.
Chat to us about booking Graeme to help you unlearn, re-think and re-imagine your strategy and upgrade your thinking to identify the emerging opportunities in your industry.

