Large language models can sound confident even when they are wrong, because by default they answer from patterns learned during training, not from your facts. Retrieval-augmented generation, or RAG, changes that by fetching relevant passages from a knowledge base you control and handing them to the model as context before it writes a word. When retrieval works, the result is an answer grounded in real, citable sources rather than the model's best guess. RAG does not make a model smarter; it makes it better informed at the moment of answering.
What RAG actually is
RAG is a two-part pattern that pairs a retriever with a generator. When a question comes in, the retriever searches a knowledge base for the most relevant passages, those passages are added to the prompt, and the language model then writes its answer using that supplied context. The term and the formal approach come from a 2020 paper by Patrick Lewis and colleagues at Facebook AI Research, which combined a pretrained model's built-in knowledge with an external, searchable corpus.
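The retrieve-then-generate flow described above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the word-overlap `retrieve` function stands in for a real search index, and `generate` is a hypothetical placeholder for whatever language model call you use.

```python
from typing import Callable


def tokenize(text: str) -> set[str]:
    """Lowercase the text and strip basic punctuation from each word."""
    return {w.strip(".,?!").lower() for w in text.split()} - {""}


def retrieve(question: str, knowledge_base: list[str], k: int = 2) -> list[str]:
    """Toy retriever (assumption): rank passages by words shared with the question."""
    q = tokenize(question)
    ranked = sorted(knowledge_base, key=lambda p: len(q & tokenize(p)), reverse=True)
    return ranked[:k]


def answer(question: str, knowledge_base: list[str],
           generate: Callable[[str], str], k: int = 2) -> str:
    """Retrieve passages, add them to the prompt, then let the model write."""
    passages = retrieve(question, knowledge_base, k)
    context = "\n".join(f"- {p}" for p in passages)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)


# Usage with a stand-in "model" that simply echoes the prompt it receives.
kb = [
    "The warranty lasts two years.",
    "Shipping takes five days.",
    "Returns are free within 30 days.",
]
print(answer("How long does the warranty last?", kb, generate=lambda p: p, k=1))
```

Passing `generate` in as a parameter keeps the retriever and the generator separate, which mirrors the two-part structure of the pattern and lets either half be swapped out independently.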
How grounding reduces errors
A model without retrieval fills gaps with plausible-sounding text, which is where fabricated answers come from. By placing the actual source text in front of the model at answer time, RAG narrows the model's job from recalling facts to summarizing the documents it was just given. Published evaluations and practitioner reports generally find fewer fabrications when answers are constrained to retrieved, verifiable content.
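One way to picture "constraining answers to retrieved content" is a prompt that tells the model to use only the supplied sources, paired with a check that every citation in the answer points at a source that was actually provided. The sketch below is an assumption about how this might look; the prompt wording, the `[id]` citation format, and the function names are all illustrative rather than a standard.

```python
import re


def grounded_prompt(question: str, passages: dict[str, str]) -> str:
    """Build a prompt that restricts the model to the given sources."""
    sources = "\n".join(f"[{sid}] {text}" for sid, text in passages.items())
    return (
        "Answer using only the sources below. Cite the source id in square "
        "brackets after each claim. If the sources do not contain the answer, "
        "reply exactly: I don't know.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}\nAnswer:"
    )


def unknown_citations(answer: str, passages: dict[str, str]) -> set[str]:
    """Return cited ids that do not match any supplied source."""
    cited = set(re.findall(r"\[([^\[\]]+)\]", answer))
    return cited - set(passages)


# Usage: the second answer cites a source that was never provided.
passages = {"doc1": "The warranty lasts two years."}
print(grounded_prompt("How long is the warranty?", passages))
print(unknown_citations("It lasts two years [doc1].", passages))
print(unknown_citations("It lasts five years [doc9].", passages))
```

A citation check like this only confirms that a cited source exists, not that the source supports the claim, so it catches one class of fabrication rather than all of them.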
How retrieval finds the right passage
Most production RAG systems search by meaning, not just keywords. Documents are split into chunks, each chunk is converted into a numeric vector called an embedding, and the vectors are stored in a vector database. A query is embedded the same way, so it can match passages that share its meaning even when they use different words. Many systems also blend in traditional keyword search to catch exact terms like product codes or names.
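The chunk, embed, and search steps can be sketched as follows. An important assumption: `embed` here is a bag-of-words stand-in so the example runs without any model, which means it matches shared words rather than shared meaning. A real system would call an embedding model at that point and keep the vectors in a vector database. The `alpha` weighting and the `exact_terms` parameter are hypothetical choices used to illustrate blending vector and keyword scores.

```python
import math
from collections import Counter


def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words that overlap by `overlap` words."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def embed(text: str) -> Counter:
    """Stand-in embedding (assumption): word counts instead of a learned vector."""
    return Counter(w.strip(".,?!").lower() for w in text.split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hybrid_search(query: str, chunks: list[str], exact_terms: tuple[str, ...] = (),
                  alpha: float = 0.7, k: int = 3) -> list[tuple[float, str]]:
    """Blend vector similarity with a bonus for chunks containing an exact term."""
    q_vec = embed(query)
    scored = []
    for c in chunks:
        vector_score = cosine(q_vec, embed(c))
        keyword_score = 1.0 if any(t.lower() in c.lower() for t in exact_terms) else 0.0
        scored.append((alpha * vector_score + (1 - alpha) * keyword_score, c))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Usage: the product code "X-200" is passed as an exact term to match.
docs = [
    "Model X-200 ships with a two year warranty.",
    "Our office is closed on public holidays.",
]
for score, text in hybrid_search("warranty for X-200", docs, exact_terms=("X-200",), k=1):
    print(round(score, 3), text)
```

Overlapping the chunks is a common way to avoid cutting a relevant sentence in half at a chunk boundary, at the cost of storing some text twice.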
RAG versus retraining the model
RAG adds knowledge without changing the model's weights, so you update what the system knows by editing the knowledge base, not by running an expensive training job. That makes it well suited to information that changes often or is private to your business. Fine-tuning, by contrast, bakes patterns and style into the model itself and is better for changing behavior than for keeping facts current.
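To make the contrast concrete, here is a small sketch in which the system's knowledge changes through a single `upsert` call while nothing model-related is touched. The `KnowledgeBase` class and its word-overlap `search` are hypothetical stand-ins for a real document store and retriever.

```python
def _words(text: str) -> set[str]:
    """Lowercase the text and strip basic punctuation from each word."""
    return {w.strip(".,?!").lower() for w in text.split()} - {""}


class KnowledgeBase:
    """Toy in-memory store (assumption): documents keyed by id."""

    def __init__(self) -> None:
        self._docs: dict[str, str] = {}

    def upsert(self, doc_id: str, text: str) -> None:
        """Add a document, or replace the one already stored under this id."""
        self._docs[doc_id] = text

    def delete(self, doc_id: str) -> None:
        """Remove a document if it exists."""
        self._docs.pop(doc_id, None)

    def search(self, query: str, k: int = 1) -> list[str]:
        """Return up to k documents sharing at least one word with the query."""
        q = _words(query)
        scored = [(len(q & _words(text)), text) for text in self._docs.values()]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


# Usage: the price changes, so the document is edited. No training job runs.
kb = KnowledgeBase()
kb.upsert("pricing", "The Pro plan costs 20 dollars per month.")
kb.upsert("support", "Support is available by email.")

question = "How much does the Pro plan cost?"
before = kb.search(question)
kb.upsert("pricing", "The Pro plan costs 25 dollars per month.")
after = kb.search(question)
print(before)
print(after)
```

The same question retrieves the new price immediately after the edit, which is the practical reason RAG suits facts that change often.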
Where RAG still fails
RAG reduces hallucination but does not eliminate it, and its accuracy is only as good as what the retriever returns. If the retrieved document is outdated or wrong, the model can confidently repeat that bad information, and if the right document is missing or retrieval surfaces nothing relevant, the model may fall back on guessing. RAG also shifts some failures from the model to the retrieval step, where a quietly missed document can be harder to spot than an obvious invented fact.
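One common mitigation for the "nothing relevant" and "outdated document" cases is to filter retrieval results before the model sees them, and to decline to answer when nothing usable remains. The sketch below assumes each hit carries a relevance score and a last-updated date; the `Hit` type, the thresholds, and the `generate` callable are all hypothetical, and sensible threshold values depend on your retriever.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Optional


@dataclass
class Hit:
    text: str
    score: float          # retriever relevance score, higher is better
    last_updated: date    # when the source document was last revised


def usable_hits(hits: list[Hit], min_score: float = 0.5,
                max_age_days: int = 365, today: Optional[date] = None) -> list[Hit]:
    """Keep only hits that are relevant enough and recent enough."""
    today = today or date.today()
    return [
        h for h in hits
        if h.score >= min_score and (today - h.last_updated).days <= max_age_days
    ]


def answer_or_abstain(question: str, hits: list[Hit], generate: Callable[[str], str],
                      min_score: float = 0.5, max_age_days: int = 365,
                      today: Optional[date] = None) -> str:
    """Answer from usable hits, or say so plainly when there are none."""
    good = usable_hits(hits, min_score, max_age_days, today)
    if not good:
        return "I could not find a reliable source for that."
    context = "\n".join(h.text for h in good)
    return generate(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")


# Usage with a fixed "today" so the age check is reproducible.
hits = [
    Hit("Current refund policy: 30 days.", 0.9, date(2024, 1, 1)),
    Hit("Old refund policy: 14 days.", 0.9, date(2020, 1, 1)),
    Hit("Office opening hours.", 0.1, date(2024, 5, 1)),
]
print(answer_or_abstain("What is the refund policy?", hits,
                        generate=lambda p: p, today=date(2024, 6, 1)))
```

Filtering does not help with the quietly missed document, since a hit that was never retrieved cannot be scored. Catching that case usually means evaluating the retriever itself against questions with known answers.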
- RAG pairs a retriever with a language model so answers are written from documents you control, not just from training data.
- Grounding works by putting real source text in the prompt, turning the task from recall into summarization and cutting fabricated answers.
- Update knowledge by editing the knowledge base, not by retraining, which is ideal for private or fast-changing facts.
- RAG reduces hallucination but does not remove it, and answer quality depends directly on the quality and freshness of your sources.
- A well-curated, accurate knowledge base is the single biggest lever on whether a RAG system tells the truth.
