Have you ever been asked a question you didn’t know the answer to? Some of us readily admit it. But often, when faced with uncertainty, we try to make something up, hoping it sounds plausible enough to carry us through the situation.
This tendency becomes especially evident during an exam, particularly when confronted with a question you haven’t studied for. If the question carries significant weight toward your overall score, you might find yourself writing down as much relevant information as possible, hoping to stumble upon the correct answer. Later, when reviewing the exam and learning the correct response, you may feel embarrassed, realizing your answer didn’t make sense at all. Why did this happen? You made something up because you simply didn’t know.
But why are we discussing this human tendency to invent answers when unsure? As mentioned in the previous article series, humans and machine learning models share notable similarities. While we may not fully understand the intricacies of machine learning models yet (hopefully, this newsletter helps as you learn alongside me), we do understand ourselves. By comparing and contrasting humans with machines, we can gain better insights into how these models function.
Now, let’s focus on large language models (LLMs). They “make things up,” too. Unlike humans, however, machines lack consciousness or intent. So, why does this happen? Is there a “ghost in the machine” causing this anomaly?
When an LLM generates creative content, its ability to “make things up” can be highly desirable. But when the goal is to provide factual and accurate responses, this behavior becomes problematic and even misleading. This phenomenon—where an AI generates non-factual or illogical responses—is commonly referred to as AI or LLM hallucinations, or simply hallucinations.
This issue was particularly evident in the early days of LLMs. For example, when Google Bard was asked, “What new discoveries from the James Webb Space Telescope can I tell my 9-year-old about?” it provided three bullet points, one of which falsely claimed that the telescope “took the very first pictures of a planet outside of our own solar system.” This statement was later proven to be factually incorrect.
In another case, a lawyer relied on ChatGPT to prepare legal documents, mistakenly treating it as a search engine. The tool produced case citations that turned out to be entirely fabricated. On a lighter note, a humorous example shared on Reddit described an early version of Google Gemini. When congratulated for quitting smoking crack, the AI amusingly responded by thanking the user and expressing pride in taking that big step.
So, what’s the point? Models make things up too—just like humans. The reasons behind it are often surprisingly similar to what you might imagine. Let’s explore these reasons in detail.
1. Incorrect or Irrelevant Learning
Imagine you’re learning the capitals of different countries. For some reason, you believe that the capital of Canada is Toronto, and you write that in an exam. However, it turns out to be incorrect because you learned it wrong. This analogy applies to models as well.
In earlier discussions, we explored embeddings and output projection weights, and we saw that training teaches models to place semantically similar words close together. If the training data is low quality or simply wrong, a similar problem arises. The model learns statistical relationships within the data: if the data predominantly associates 'Toronto' with 'Canada' in contexts that imply capital cities, the model may confidently make the same incorrect prediction.
Incorrect learning in models doesn’t happen in the human sense but arises from statistical associations influenced by the training data. Factors such as poor data quality, data bias, or even a lack of relevant data contribute to this. Additionally, data learned by the model might become outdated or irrelevant because real-world information is constantly evolving.
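To make this concrete, here is a toy sketch (not a real LLM, and the co-occurrence counts are invented) of how skewed training statistics produce a confidently wrong answer:

```python
# A toy illustration of how skewed training data yields a confidently wrong answer.
# The counts below are invented purely for demonstration.
from collections import Counter

# Imagine a corpus where "capital of Canada" contexts mention Toronto far more
# often than Ottawa, e.g. in travel articles comparing the two cities.
next_word_counts = Counter({"Toronto": 90, "Ottawa": 8, "Vancouver": 2})

def predict_next(counts: Counter) -> str:
    """Pick the statistically most likely continuation, right or wrong."""
    return counts.most_common(1)[0][0]

print(predict_next(next_word_counts))  # -> "Toronto", even though the capital is Ottawa
```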
2. Contextual Knowledge
Consider this hypothetical scenario:
Biologist: "The kernel of the problem lies in understanding the genetic code."
Computer Scientist: "Kernel? You mean the core of the operating system?"
Biologist: "No, not quite. In biology, a kernel is the central part of a seed, containing the embryo."
Computer Scientist: "Ah, I see. In computer science, a kernel is the core component of an operating system, responsible for managing hardware and software resources."
Here, the word "kernel" is interpreted differently based on each person's contextual background. Similarly, models rely heavily on the context provided in the input as well as their training data. When the context is ambiguous or incomplete, they may produce irrelevant or nonsensical responses.
For example, a general-purpose model like ChatGPT might not offer as precise a response as a custom model trained for a specific domain, such as legal analysis. Ambiguity or lack of context in input can lead to illogical outputs.
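As a rough illustration, the snippet below contrasts an ambiguous prompt with one that supplies the missing context; `ask_llm` is a hypothetical placeholder for whatever model API you use:

```python
# A rough sketch of how extra context disambiguates a prompt. ask_llm is a
# hypothetical placeholder, not a real API.

def ask_llm(prompt: str) -> str:
    """Placeholder for an actual LLM call."""
    return f"[model answer to: {prompt!r}]"

ambiguous = "Explain what a kernel is."
contextualized = ("I am a biologist studying seed development. "
                  "Explain what a kernel is in that context.")

print(ask_llm(ambiguous))       # the model has to guess which domain you mean
print(ask_llm(contextualized))  # the added context pins the meaning down
```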
3. Session Contextual Drift
Have you ever had a long conversation with someone and, over time, found yourself completely off-topic or providing irrelevant answers? Perhaps you were trying to solve a problem but got distracted and overlooked key facts. Conversely, writing things down step by step often helps you stay focused and solve problems more effectively.
Something very similar happens with LLMs.
Session contextual drift refers to a model’s diminishing ability to maintain coherence and relevance during extended interactions. As conversations grow longer, the model might lose track of earlier details, resulting in less aligned responses.
For example, imagine you’re planning a trip to Paris with a virtual assistant. You initially mentioned your interest in art museums. As the conversation shifts to other details like cafes, hotels, and transportation, the assistant loses track of your earlier mention of Paris. When you later revisit the topic of museums, it might suggest museums in London or another random location because it has "forgotten" your initial request.
The model also struggles to understand when you say, "Let’s circle back to our first topic," because it no longer has access to earlier parts of the conversation. This loss of context, or session contextual drift, can lead to irrelevant or incorrect outputs, particularly in tasks requiring cross-referencing or reasoning across multiple subtopics.
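A minimal sketch of the mechanism, assuming a crude word-count token budget and a simple keep-the-most-recent truncation rule, shows how the earliest messages silently fall out of the window:

```python
# A simplified sketch of why long conversations drift: models see only a bounded
# context window. The tiny budget and truncation rule here are assumptions made
# to exaggerate the effect.

MAX_TOKENS = 50  # artificially small budget

def tokens(text: str) -> int:
    return len(text.split())  # crude word count as a token proxy

def build_context(history: list[str]) -> list[str]:
    """Keep only the most recent messages that fit the budget, dropping the oldest."""
    kept, used = [], 0
    for message in reversed(history):
        if used + tokens(message) > MAX_TOKENS:
            break
        kept.append(message)
        used += tokens(message)
    return list(reversed(kept))

long_detour = "User: " + (
    "Now tell me about cafes, hotels, metro passes, day trips, "
    "food markets, shopping districts, and anything else worth knowing. " * 3
)
history = [
    "User: I'm planning a trip to Paris and I love art museums.",
    "Assistant: Great, the Louvre and the Musee d'Orsay are must-sees.",
    long_detour,
    "User: Okay, back to museums, which ones should I book in advance?",
]

# The original Paris/art-museum message no longer fits and has been dropped.
print(build_context(history))
```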
We’ve identified some of the key causes of AI hallucinations. Simply put, the more diverse, up-to-date, specific, and high-quality the training data, and the clearer and less ambiguous the prompt, the less likely AI hallucinations will occur. On the other hand, any issues with data quality or prompt clarity can significantly contribute to hallucinations. So, what techniques are available to address the problems we’ve discussed?
1. Model Fine-Tuning
To explain this concept simply, let’s revisit the exam example mentioned earlier. Imagine you faced a question in an exam that you didn’t know the answer to, so you made something up on the spot. Later, realizing you lacked knowledge, you went back, learned the correct answer, and updated your understanding. The next time this question comes up in an exam or a casual conversation, you can confidently explain the concept based on your newly acquired knowledge. Essentially, you’ve “fine-tuned” yourself.
This is similar to how education works. In school, you’re exposed to a broad range of subjects, giving you general knowledge across multiple domains—essentially “pretraining” for life. When you go to college, you choose a specific field of interest, such as engineering or medicine, and spend years acquiring specialized skills. By the time you graduate, your understanding of your chosen subject is much deeper, enabling you to deliver better results compared to someone without that specialization.
Now, let’s connect this analogy to model fine-tuning. Large Language Models (LLMs) are like students going through this process. During pretraining, the model is trained on massive amounts of general-purpose data, giving it broad but non-specific knowledge, much like school provides foundational life skills. Through fine-tuning, the model is updated with specialized, domain-specific, or real-world data (e.g., medical records, legal documents, or industry-specific datasets). This enables the fine-tuned model to deliver precise, contextually relevant, and accurate results in its focused domain—just as a college graduate applies their specialized knowledge.
Fine-tuning adjusts a model’s "weights," the underlying parameters that determine how it processes information. This is akin to how specialized learning refines your thought process and reasoning. For example, a model fine-tuned for software engineers will interpret the word “kernel” as the core of an operating system unless the input specifies otherwise.
However, the quality of fine-tuning depends heavily on the quality of the training data. Poor-quality data can lead to inaccurate outputs or increase the likelihood of hallucinations (fabricated information). Similarly, flawed education or misinformation in humans can result in faulty reasoning.
Though fine-tuning is time-intensive and resource-demanding, it significantly enhances the model’s relevance and reliability in its designated domain—just as specialized education improves a person’s capabilities.
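For a sense of what this looks like in practice, here is a minimal fine-tuning sketch using the Hugging Face transformers and datasets libraries; the base model, the "domain_corpus.txt" path, and the hyperparameters are illustrative assumptions rather than recommendations:

```python
# A minimal causal language model fine-tuning sketch with Hugging Face libraries.
# The base model, corpus path, and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_model = "gpt2"  # small general-purpose model used purely as an example
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no padding token by default
model = AutoModelForCausalLM.from_pretrained(base_model)

# Hypothetical domain-specific corpus (e.g., legal or medical text).
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-domain-model",
                           num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized["train"],
    # For causal language modeling, the labels are the inputs themselves.
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()  # nudges the pretrained weights toward the domain data
```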
2. RAG (Retrieval-Augmented Generation)
RAG may sound technical at first, but it consists of three simple concepts: retrieval, augmentation, and generation. At its core, RAG retrieves relevant external information to enhance (or augment) the model’s ability to generate accurate and contextually relevant responses.
Issues such as data bias, lack of diversity, or irrelevant training data can lead to hallucinations in language models. RAG mitigates these issues by supplementing the model’s reasoning with external data. Let’s break this down using an analogy.
Analogy: The Regular Exam vs. Open Book Exam
Imagine you’re taking an exam. In a regular exam, you rely solely on your memory and reasoning. If you haven’t studied enough, you might make things up to answer questions, potentially providing incorrect responses.
Now, consider an open-book exam, where you can refer to your textbooks and notes. Here, when faced with a question, you can look up the specific section related to the topic, understand the concept, and craft a coherent answer. You’re no longer guessing; instead, you’re augmenting your knowledge with external resources to provide accurate responses.
This is the essence of RAG. Just as referencing textbooks improves performance in an open-book exam, RAG retrieves external information, combines it with the model’s reasoning, and generates a response enriched with relevant context.
How RAG Works: The Technical Breakdown
Retrieval Step
Based on the query (or "prompt"), the system retrieves relevant information from an external source, such as a database, document repository, or search engine. This retrieved information serves as the "open book" material.

Augmentation Step
The retrieved data is combined with the input prompt and injected as additional context. This augmented input provides the model with the necessary background to process the query accurately.

Generation Step
Using its pretrained reasoning abilities, the model processes the augmented input to craft a contextually accurate response.
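Putting the three steps together, the flow can be sketched in a few lines of Python. The tiny document list, the naive keyword-overlap retriever, and the placeholder generate() function below are all stand-ins for a real vector store and LLM API:

```python
# A self-contained sketch of the retrieve -> augment -> generate flow.
# The documents, the keyword-overlap retriever, and generate() are toy stand-ins.

DOCUMENTS = [
    "The Louvre in Paris is the world's largest art museum and houses the Mona Lisa.",
    "The Musee d'Orsay in Paris is known for its Impressionist collection.",
    "The British Museum in London holds the Rosetta Stone.",
]

def retrieve(query: str, top_k: int = 2) -> list[str]:
    """Retrieval: rank documents by naive keyword overlap with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(DOCUMENTS,
                    key=lambda doc: len(query_words & set(doc.lower().split())),
                    reverse=True)
    return ranked[:top_k]

def generate(prompt: str) -> str:
    """Generation: placeholder for a call to whatever LLM you actually use."""
    return f"[LLM response grounded in a {len(prompt)}-character prompt]"

def answer_with_rag(question: str) -> str:
    passages = retrieve(question)                       # 1. Retrieval
    prompt = ("Answer using only the context below.\n"
              "Context:\n" + "\n".join(passages) +      # 2. Augmentation
              f"\n\nQuestion: {question}\nAnswer:")
    return generate(prompt)                             # 3. Generation

print(answer_with_rag("Which art museums in Paris should I visit?"))
```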
The Challenge of Large Data Retrieval
Fetching relevant information from massive datasets can be time-consuming. Imagine you’re taking an open-book exam but need to search multiple books in a document library for answers. Without a proper index, you might waste valuable time searching and still miss important details.
Vector Search: A Solution for Efficient Retrieval
Now imagine you’ve prepared an index before the exam, listing the location of specific topics in your books. This index allows you to quickly pinpoint the required chapters or pages, saving time and ensuring accuracy.
In RAG, vector search acts as this index. The system creates embeddings, numerical vector representations that capture the meaning of words, sentences, or documents, for all relevant documents ahead of time. When a query arrives, its embedding is compared against the stored ones to retrieve the most relevant information quickly and efficiently.
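As a small illustration, the sketch below uses the sentence-transformers library to embed a handful of toy documents and retrieve the closest match to a query; the model name and documents are assumptions chosen for demonstration only:

```python
# A minimal vector-search sketch using the sentence-transformers library.
# The model name and toy documents are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "The Louvre houses the Mona Lisa.",
    "The Paris metro is an efficient way to get around.",
    "The Musee d'Orsay is famous for Impressionist paintings.",
]

# "Index" step: embed every document once, ahead of time.
doc_embeddings = model.encode(documents)

# Query step: embed the question and find the closest document.
query_embedding = model.encode("Which art museums should I visit?")
scores = util.cos_sim(query_embedding, doc_embeddings)[0]

best = scores.argmax().item()
print(documents[best])  # expected to surface one of the museum documents
```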
Real-World Applications of RAG
For instance, consider a copilot tool integrated with a knowledge repository (e.g., SharePoint). Embeddings are created for all repository documents in advance. When a user queries the copilot, the system uses the embeddings to locate relevant context and provide a domain-specific response.
Similarly, modern chatbots like ChatGPT, Claude, and Perplexity integrate with search engines to retrieve real-time information, supplementing their pretrained knowledge with up-to-date data.
3. Prompt Engineering Techniques
We’ve explored methods like model fine-tuning and Retrieval-Augmented Generation (RAG) to address issues like hallucinations and contextual drift. These techniques improve the model’s accuracy, relevance, and context awareness. Notably, RAG also helps mitigate session contextual drift by referencing external sources to maintain coherence over long conversations.
But how do we make prompts less ambiguous to further improve AI performance?
Chain-of-Thought Prompting (CoT)
CoT prompting encourages models to break down complex tasks step-by-step, maintaining context and coherence throughout the process. For example, when solving a mathematical equation, the model approaches it incrementally, much like a human would. This promotes System 2 thinking—logical and deliberate reasoning—over the more error-prone System 1 thinking, which is fast and intuitive.
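A minimal sketch of the idea, with `ask_llm` standing in for whatever model API you use, is simply to append an explicit instruction to reason step by step:

```python
# A sketch of chain-of-thought prompting. ask_llm is a hypothetical placeholder;
# the important part is the added instruction to reason step by step.

def ask_llm(prompt: str) -> str:
    """Placeholder for an actual LLM call."""
    return f"[model response to: {prompt!r}]"

question = "A train travels 60 km in 45 minutes. What is its average speed in km/h?"

direct_prompt = question
cot_prompt = (question +
              "\nThink step by step: convert the time to hours, then divide the "
              "distance by the time, and only then state the final answer.")

print(ask_llm(direct_prompt))  # the model may jump straight to a (possibly wrong) number
print(ask_llm(cot_prompt))     # the model is nudged to show intermediate reasoning
```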
Other Prompting Techniques
Zero-Shot Prompting: Instructs the model to perform a task without providing examples, relying on clear, detailed instructions.
Few-Shot Prompting: Provides the model with a few examples of input-output pairs to help it identify patterns and generate accurate responses. Both styles are sketched below.
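The two styles can be sketched side by side; the sentiment-classification task and the example pairs below are invented purely for illustration:

```python
# Zero-shot vs. few-shot prompts for a made-up sentiment-classification task.
# Both strings would be sent to the model; the few-shot version gives it
# concrete input-output pairs to imitate.

zero_shot = (
    "Classify the sentiment of the following review as positive or negative.\n"
    "Review: The battery dies within two hours.\nSentiment:"
)

few_shot = (
    "Classify the sentiment of each review as positive or negative.\n"
    "Review: Absolutely love this phone, the camera is stunning.\nSentiment: positive\n"
    "Review: Stopped working after a week.\nSentiment: negative\n"
    "Review: The battery dies within two hours.\nSentiment:"
)

print(zero_shot)
print(few_shot)
```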
Setting a Persona for Clarity
Another effective strategy is defining the model’s role and context. For example, instructing the model, “You are a travel assistant helping me plan a trip to Paris,” narrows its focus and avoids irrelevant suggestions.
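In chat-style APIs this is typically done with a system message; the structure below is a generic sketch rather than any specific provider's API:

```python
# A sketch of persona-setting via a system message. The message structure mirrors
# common chat APIs but no specific provider is assumed.

messages = [
    {"role": "system",
     "content": "You are a travel assistant helping me plan a trip to Paris. "
                "Keep every suggestion specific to Paris."},
    {"role": "user",
     "content": "Which museums should I visit if I like Impressionist art?"},
]

# Sending `messages` to a chat model scopes its answers to the Paris-trip persona,
# which makes generic or off-topic suggestions less likely.
print(messages[0]["content"])
```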
These are just a few of the many effective prompting techniques that help make prompts less ambiguous.
Final Thoughts
To answer the question we began with: there is no ghost in the machine. Just like in Interstellar, where the mysterious "ghost" turned out to be Cooper himself sending information through the Tesseract, hallucinations in AI are not supernatural anomalies. They are entirely explainable phenomena arising from human input, whether through training data or ambiguous prompts.
Hallucinations occur due to imperfections in the data or unclear instructions. However, by grounding models in relevant, diverse, and unbiased data, along with thoughtful prompt design and advanced techniques like fine-tuning, RAG, and Chain-of-Thought prompting, we can significantly reduce these issues. By exploring new approaches, we can pave the way for more reliable and trustworthy AI systems that can effectively augment human capabilities in a wide range of applications.