Artificial intelligence (AI) is evolving rapidly, but according to a recent study published in The BMJ, even advanced AI chatbots are showing signs of mild cognitive impairment when subjected to tests designed for early dementia detection in humans. Surprisingly, older versions of these chatbots performed worse, mimicking the age-related cognitive decline observed in human patients.
The Impact of Chatbot Age on Performance
The study highlighted an intriguing parallel between human cognitive aging and AI models. Older chatbots consistently underperformed on cognitive tests compared to their newer versions. This phenomenon, referred to as “digital aging,” was particularly evident in Google’s Gemini models. For instance, the older Gemini 1.0 scored significantly lower on tests than the newer Gemini 1.5, despite being released less than a year apart.
Key Findings
- ChatGPT-4 vs. ChatGPT-4o: The older ChatGPT-4 showed minor losses in executive function relative to the newer ChatGPT-4o, scoring one point lower overall (25 vs. 26).
- Gemini Models: Gemini 1.5 outscored Gemini 1.0 by six points on the Montreal Cognitive Assessment (MoCA) test.
- Visuospatial Reasoning: All models struggled with tasks such as the Trail Making B test and the clock-drawing exercise. Notably, Gemini 1.5 produced an avocado-shaped clock, a pattern associated with dementia in humans.
These results challenge the assumption that AI systems are ready to replace human doctors, particularly in cognitive assessment tasks. The findings emphasize the need for continuous refinement and evaluation of AI in healthcare applications.
Comparing GPT-4 and Gemini Models
GPT-4 and Gemini are among the most advanced AI chatbots today, each excelling in different areas. Here’s how they stack up:
Feature | GPT-4 | Gemini |
--- | --- | --- |
Strengths | Natural language processing, mathematical reasoning, and coding | Multimodal capabilities handling text, images, and other data types |
Performance | Superior in complex reasoning and language tasks | Excels in specific tasks like digital advertising |
MoCA Test Score | 26 (ChatGPT-4o), 25 (GPT-4) | 16 (Gemini 1.0), 22 (Gemini 1.5) |
Visuospatial Tasks | Struggles with visuospatial tasks | Similar struggles |
Unique Capabilities | Strong contextual adaptability | Advanced targeting in advertising |
While GPT-4 outshines Gemini in complex reasoning and language understanding, Gemini’s multimodal prowess makes it a strong contender in areas like digital advertising. Both models continue to evolve, and their strengths could diversify as AI technology advances.
AI Cognitive Decline and MoCA Results
The Montreal Cognitive Assessment (MoCA) is a widely used tool for detecting early dementia in humans. When administered to AI chatbots, the results were surprising:
- Top Performer: ChatGPT-4o scored the highest with 26 out of 30.
- Mid-Range: ChatGPT-4 and Claude scored 25 each.
- Lowest Score: Gemini 1.0 trailed with 16 points.
A score below 26 typically indicates mild cognitive impairment. Additionally, all chatbots struggled significantly with visuospatial tasks. Only ChatGPT-4o succeeded in the incongruent stage of the Stroop test, a measure of cognitive flexibility.
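To make the cut-off concrete, here is a minimal Python sketch that applies the standard MoCA threshold to the scores reported in the study. The threshold of 26 and the scores come from the article; the names and structure of the code are purely illustrative.

```python
# Minimal sketch: applying the standard MoCA cut-off to the scores reported
# in the study. The threshold of 26 and the scores are from the article;
# everything else here is illustrative only.

MOCA_THRESHOLD = 26  # scores below this typically indicate mild cognitive impairment

scores = {
    "ChatGPT-4o": 26,
    "ChatGPT-4": 25,
    "Claude": 25,
    "Gemini 1.5": 22,
    "Gemini 1.0": 16,
}

for model, score in scores.items():
    verdict = (
        "within normal range"
        if score >= MOCA_THRESHOLD
        else "suggests mild cognitive impairment"
    )
    print(f"{model}: {score}/30 -> {verdict}")

# The six-point generational gap highlighted in the Key Findings:
print("Gemini 1.5 minus Gemini 1.0:", scores["Gemini 1.5"] - scores["Gemini 1.0"])
```

On these numbers, only ChatGPT-4o clears the threshold; every other model lands in the range that would flag mild cognitive impairment in a human patient.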
Performance Highlights
- Visuospatial and Executive Functions: Uniform difficulty across all models.
- Clock-Drawing Test: Gemini 1.5’s avocado-shaped clock underscored the models’ difficulty with executing such tasks.
- Stroop Test: ChatGPT-4o alone passed the incongruent stage, the part of the test that measures cognitive flexibility (illustrated in the sketch below).
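For readers unfamiliar with the Stroop test, the sketch below shows what an incongruent item looks like when posed as text: the word names one colour while the ink is another, and the task is to name the ink. The prompt wording and scoring check are assumptions for illustration, not the study’s actual protocol.

```python
# Hypothetical text rendering of an incongruent Stroop item, the stage that
# only ChatGPT-4o passed in the study. The task is to name the ink colour
# while ignoring the word itself. Prompt wording and the scoring check are
# assumptions for illustration, not the study's actual protocol.

item = {"word": "RED", "ink": "blue"}  # incongruent: the word and ink disagree

prompt = (
    f"The word '{item['word']}' is printed in {item['ink']} ink. "
    "Name the colour of the ink, not the word."
)

def is_correct(answer: str) -> bool:
    # Pass only if the reply names the ink colour and resists the written word.
    reply = answer.lower()
    return item["ink"] in reply and item["word"].lower() not in reply

print(prompt)
print(is_correct("blue"))  # True: names the ink colour
print(is_correct("red"))   # False: the classic Stroop error, captured by the word
```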
What These Findings Mean for AI in Healthcare
These revelations raise critical questions about the reliability of AI in sensitive applications like medical diagnostics. While AI shows immense potential, these cognitive limitations highlight areas where human expertise remains indispensable.
Implications for Development
- Continuous Updates: Ensuring AI systems don’t “age” digitally requires regular updates and retraining (a minimal staleness-check sketch follows this list).
- Task-Specific Models: Developing specialized AI for healthcare might yield better results than relying on general-purpose models.
- Human Oversight: AI should complement, not replace, human professionals in critical fields.
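As a rough illustration of the continuous-updates point, the following sketch flags deployed models whose last retraining date exceeds an assumed freshness window. The model names, dates, and one-year policy are all hypothetical; this is one possible shape for such a check, not an established practice from the study.

```python
# Illustrative staleness check for the "continuous updates" point: flag any
# deployed model whose last retraining date is older than an assumed policy
# window. Model names, dates, and the one-year window are all hypothetical.

from datetime import date, timedelta

MAX_AGE = timedelta(days=365)  # assumed policy: retrain at least yearly
TODAY = date(2025, 1, 10)      # pinned so the example is reproducible

deployed_models = {
    "triage-assistant": date(2023, 3, 1),
    "notes-summarizer": date(2024, 11, 15),
}

for name, last_trained in deployed_models.items():
    stale = TODAY - last_trained > MAX_AGE
    status = "stale - schedule retraining" if stale else "fresh"
    print(f"{name}: last retrained {last_trained} -> {status}")
```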
Quick Comparison Table
Model | MoCA Score | Strengths | Weaknesses |
--- | --- | --- | --- |
ChatGPT-4o | 26 | Cognitive flexibility, NLP | Struggles with visuospatial tasks |
ChatGPT-4 | 25 | Language and math | Similar struggles |
Claude | 25 | General performance | Executive functions |
Gemini 1.5 | 22 | Multimodal capabilities | Poor visuospatial performance |
Gemini 1.0 | 16 | Baseline functionality | Significant impairments |
Conclusion
The study’s findings reveal that even the most advanced AI models are not immune to cognitive limitations. While they excel in many areas, their performance in cognitive assessments highlights the need for ongoing improvements. As AI technology continues to evolve, maintaining a balance between innovation and reliability will be key, especially in critical applications like healthcare.