Retrieval-Augmented Generation (RAG)
An architecture pattern that grounds language model outputs in retrieved documents from a private corpus, reducing hallucination and enabling answers based on the firm's own data rather than the model's training set.
How it works
RAG is how enterprises actually deploy language models on internal knowledge. The pattern is: index the firm's documents into a vector database, retrieve the top-k most relevant chunks for a given user question, and pass them to the language model as context for answer generation. The output is grounded in the firm's own material, with citations back to source documents. RAG works because the language model does not need to have been trained on the firm's data; it only needs to read and reason over it at inference time. For UK professional services firms, NHS trusts, FCA-regulated firms, and engineering organisations, RAG on a private corpus is the standard architecture for knowledge retrieval, contract review, clinical-letter triage, and engineering-knowledge access.
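A minimal sketch of the three steps in Python. The `embed` and `generate` functions are placeholders standing in for whichever embedding model and language model a given deployment uses, and the document chunks, filenames, and question are invented for illustration:

```python
import hashlib
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: a real deployment calls an embedding model here.
    Deterministic pseudo-random vectors stand in for learned embeddings."""
    seed = int(hashlib.sha256(text.encode()).hexdigest()[:8], 16)
    v = np.random.default_rng(seed).standard_normal(384)
    return v / np.linalg.norm(v)

def generate(prompt: str) -> str:
    """Placeholder: a real deployment calls the language model here."""
    return "[model answer, grounded in the supplied context]"

# 1. Index: embed each document chunk, storing the vector with its source.
corpus = [
    {"source": "contract_042.pdf", "text": "Either party may terminate on 30 days' notice."},
    {"source": "policy_manual.docx", "text": "Clinical letters are triaged within 24 hours."},
]
index = [(embed(chunk["text"]), chunk) for chunk in corpus]

# 2. Retrieve: score every chunk against the question, keep the top-k.
def retrieve(question: str, k: int = 2) -> list[dict]:
    q = embed(question)
    ranked = sorted(index, key=lambda pair: -float(q @ pair[0]))
    return [chunk for _, chunk in ranked[:k]]

# 3. Generate: pass the retrieved chunks to the model as cited context.
def answer(question: str) -> str:
    context = "\n".join(f"[{c['source']}] {c['text']}" for c in retrieve(question))
    prompt = (
        "Answer using only the context below, citing sources.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return generate(prompt)

print(answer("What is the termination notice period?"))
```

The retrieval step here is a brute-force scan for clarity; at corpus scale, a vector database replaces it with approximate nearest-neighbour search, as described below.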
Related terms
Vector Database
A database optimised for storing and querying high-dimensional vector embeddings using approximate nearest-neighbour algorithms, used as the retrieval layer in RAG systems and semantic search.
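A toy sketch of the approximate nearest-neighbour idea, using random-hyperplane locality-sensitive hashing. Production vector databases use more sophisticated structures (HNSW graphs, inverted file indexes), but the principle is the same: hash each vector into a bucket so a query scans a small candidate subset rather than every row. All data here is randomly generated for illustration:

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
dim, n_planes = 64, 8
planes = rng.standard_normal((n_planes, dim))  # 8 random hyperplanes

def bucket_key(v: np.ndarray) -> int:
    """The sign pattern of v against the hyperplanes gives an 8-bit bucket id."""
    return int("".join("1" if s > 0 else "0" for s in planes @ v), 2)

# Index: hash each stored vector into its bucket.
vectors = rng.standard_normal((10_000, dim))
buckets = defaultdict(list)
for i, v in enumerate(vectors):
    buckets[bucket_key(v)].append(i)

# Query: scan only the matching bucket (falling back to a full scan if it is
# empty), instead of computing 10,000 exact distances.
q = rng.standard_normal(dim)
candidates = buckets.get(bucket_key(q)) or range(len(vectors))
best = max(candidates, key=lambda i: float(vectors[i] @ q) / np.linalg.norm(vectors[i]))
print(f"scanned {len(candidates)} of {len(vectors)} vectors; best candidate: row {best}")
```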
Vector Embedding
A high-dimensional numerical representation of text, image, or other content that places semantically similar items close together in vector space, enabling similarity search and clustering.
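A hand-picked illustration of the geometry. Real embeddings are produced by a trained model and have hundreds of dimensions; these 3-D vectors and phrases are invented, but the property they show is the same, with related texts mapped to nearby points:

```python
import numpy as np

# Hand-picked 3-D vectors standing in for learned embeddings.
toy_embeddings = {
    "invoice payment terms":     np.array([0.9, 0.1, 0.0]),
    "supplier billing schedule": np.array([0.8, 0.2, 0.1]),
    "MRI scanner maintenance":   np.array([0.1, 0.2, 0.95]),
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 for identical directions, near 0 for unrelated."""
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

query = toy_embeddings["invoice payment terms"]
for text, vec in toy_embeddings.items():
    print(f"{cosine(query, vec):.2f}  {text}")
# The two finance-related phrases score near 1.0; the unrelated one scores low.
```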
Large Language Model (LLM)
A neural network trained on large text corpora to predict the next token given context, used for text generation, summarisation, classification, and reasoning tasks across enterprise software.
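The shape of the training objective can be illustrated without a neural network at all. The sketch below uses a bigram frequency table, which is emphatically not how an LLM works internally, but it makes the objective concrete: given the context, emit the most probable next token:

```python
from collections import Counter, defaultdict

text = "the model reads the context and the model predicts the next token"
tokens = text.split()

# Count which token follows which across the training text.
successors = defaultdict(Counter)
for prev, nxt in zip(tokens, tokens[1:]):
    successors[prev][nxt] += 1

def predict_next(context_token: str) -> str:
    """Return the most frequent successor of the context token."""
    return successors[context_token].most_common(1)[0][0]

print(predict_next("the"))  # -> "model", the most frequent word after "the"
```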
Private AI
AI deployed on infrastructure the client controls (on-premise, in the client's cloud tenancy, or air-gapped), with no third-party LLM provider in the data path and no inference-time data export.
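In code, the distinguishing feature is simply where the request goes. A minimal sketch, assuming a hypothetical model server on an internal address; the URL, route, and JSON shape are illustrative, not any specific product's API:

```python
import json
import urllib.request

# Hypothetical internal endpoint: the model server runs inside the client's
# own network, so prompts and documents never leave infrastructure the
# client controls.
PRIVATE_ENDPOINT = "http://10.0.0.5:8000/v1/generate"  # illustrative address

def private_completion(prompt: str) -> str:
    payload = json.dumps({"prompt": prompt, "max_tokens": 256}).encode()
    req = urllib.request.Request(
        PRIVATE_ENDPOINT,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    # Traffic stays on the private network; no third-party provider in the path.
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text"]
```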
Want to see this technology in action?
Book a Discovery Call