AI Chatbots Show Dementia: A Surprising Study

Discover how AI chatbots like GPT-4 and Gemini exhibit dementia-like traits in cognitive tests, raising questions about AI's readiness for healthcare.

Arva Rangwala

Artificial intelligence (AI) is evolving rapidly, but according to a recent study published in The BMJ, even advanced AI chatbots are showing signs of mild cognitive impairment when subjected to tests designed for early dementia detection in humans. Surprisingly, older versions of these chatbots performed worse, mimicking the age-related cognitive decline observed in human patients.

The Impact of Chatbot Age on Performance

The study highlighted an intriguing parallel between human cognitive aging and AI models. Older chatbots consistently underperformed on cognitive tests compared to their newer versions. This phenomenon, referred to as “digital aging,” was particularly evident in Google’s Gemini models. For instance, the older Gemini 1.0 scored significantly lower on tests than the newer Gemini 1.5, despite being released less than a year apart.

Key Findings

  • ChatGPT-4 vs. ChatGPT-4o: ChatGPT-4 showed slightly weaker executive function than its successor, ChatGPT-4o.
  • Gemini Models: Gemini 1.5 outscored Gemini 1.0 by six points on the Montreal Cognitive Assessment (MoCA) test.
  • Visuospatial Reasoning: All models struggled with tasks such as the Trail Making B test and the clock-drawing exercise. Notably, Gemini 1.5 produced an avocado-shaped clock, a pattern associated with dementia in humans.

These results challenge the assumption that AI systems are ready to replace human doctors, particularly in cognitive assessment tasks. The findings emphasize the need for continuous refinement and evaluation of AI in healthcare applications.

Comparing GPT-4 and Gemini Models

GPT-4 and Gemini are among the most advanced AI chatbots today, each excelling in different areas. Here’s how they stack up:

| Feature | GPT-4 | Gemini |
| --- | --- | --- |
| Strengths | Natural language processing, mathematical reasoning, and coding | Multimodal capabilities handling text, images, and other data types |
| Performance | Superior in complex reasoning and language tasks | Excels in specific tasks like digital advertising |
| MoCA Test Score | 26 (ChatGPT-4o), 25 (GPT-4) | 16 (Gemini 1.0), 22 (Gemini 1.5) |
| Visuospatial Tasks | Struggles with visuospatial tasks | Similar struggles |
| Unique Capabilities | Strong contextual adaptability | Advanced targeting in advertising |

While GPT-4 outshines Gemini in complex reasoning and language understanding, Gemini’s multimodal prowess makes it a strong contender in areas like digital advertising. Both models continue to evolve, and their strengths could diversify as AI technology advances.

AI Cognitive Decline and MoCA Results

The Montreal Cognitive Assessment (MoCA) is a widely used tool for detecting early dementia in humans. When administered to AI chatbots, the results were surprising:

  • Top Performer: ChatGPT-4o scored the highest with 26 out of 30.
  • Mid-Range: ChatGPT-4 and Claude scored 25 each.
  • Lowest Score: Gemini 1.0 trailed with 16 points.

A score below 26 typically indicates mild cognitive impairment. Additionally, all chatbots struggled significantly with visuospatial tasks. Only ChatGPT-4o succeeded in the incongruent stage of the Stroop test, a measure of cognitive flexibility.
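
To make the cutoff concrete, here is a minimal Python sketch that applies the below-26 MoCA threshold to the scores reported in the study (the variable names and data structure are illustrative, not from the paper):

```python
# Minimal sketch: apply the MoCA cutoff (scores below 26 out of 30 typically
# indicate mild cognitive impairment) to the scores reported in the study.
MOCA_MAX = 30
MCI_CUTOFF = 26  # below this, mild cognitive impairment is typically indicated

# Scores as reported in the article (illustrative mapping, not study code)
moca_scores = {
    "ChatGPT-4o": 26,
    "ChatGPT-4": 25,
    "Claude": 25,
    "Gemini 1.5": 22,
    "Gemini 1.0": 16,
}

for model, score in sorted(moca_scores.items(), key=lambda kv: -kv[1]):
    status = ("at or above the cutoff" if score >= MCI_CUTOFF
              else "in the mild cognitive impairment range")
    print(f"{model}: {score}/{MOCA_MAX} -> {status}")
```

Only ChatGPT-4o clears the threshold; every other model lands in the range the MoCA flags as mild cognitive impairment in humans.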

Performance Highlights

  • Visuospatial and Executive Functions: Uniform difficulty across all models.
  • Clock-Drawing Test: Gemini 1.5’s avocado-shaped clock underscored challenges in interpreting such tasks.
  • Stroop Test: ChatGPT-4o stood out as the only model to pass the incongruent stage of the test; a text-adapted example of such a trial is sketched below.
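
For readers unfamiliar with the Stroop test: the incongruent stage pairs a color word with a mismatched ink color, and the correct response is the ink color rather than the word. The study's exact prompts are not reproduced in this article, but a hypothetical text adaptation for a chatbot might look like this sketch:

```python
# Hypothetical sketch only: a text adaptation of Stroop trials for a chatbot.
# These prompts are illustrative assumptions, not the study's actual materials.
trials = [
    {"word": "GREEN", "ink": "red"},   # incongruent: correct answer is "red"
    {"word": "BLUE", "ink": "blue"},   # congruent: correct answer is "blue"
]

def stroop_prompt(trial):
    """Build a text prompt asking for the ink color, not the word."""
    return (f'The word "{trial["word"]}" is printed in {trial["ink"]} ink. '
            "Name the color of the ink, not the word itself.")

for trial in trials:
    print(stroop_prompt(trial), "-> expected:", trial["ink"])
```

Passing the incongruent trials requires suppressing the automatic reading of the word, which is why the test is treated as a measure of cognitive flexibility.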

What These Findings Mean for AI in Healthcare

These revelations raise critical questions about the reliability of AI in sensitive applications like medical diagnostics. While AI shows immense potential, these cognitive limitations highlight areas where human expertise remains indispensable.

Implications for Development

  1. Continuous Updates: Ensuring AI systems don’t “age” digitally requires regular updates and retraining.
  2. Task-Specific Models: Developing specialized AI for healthcare might yield better results than relying on general-purpose models.
  3. Human Oversight: AI should complement, not replace, human professionals in critical fields.

Quick Comparison Table

| Model | MoCA Score | Strengths | Weaknesses |
| --- | --- | --- | --- |
| ChatGPT-4o | 26 | Cognitive flexibility, NLP | Struggles with visuospatial tasks |
| ChatGPT-4 | 25 | Language and math | Similar struggles |
| Claude | 25 | General performance | Executive functions |
| Gemini 1.5 | 22 | Multimodal capabilities | Poor visuospatial tasks |
| Gemini 1.0 | 16 | Baseline functionality | Significant impairments |

Conclusion

The study’s findings reveal that even the most advanced AI models are not immune to cognitive limitations. While they excel in many areas, their performance in cognitive assessments highlights the need for ongoing improvements. As AI technology continues to evolve, maintaining a balance between innovation and reliability will be key, especially in critical applications like healthcare.
