Context Window
The maximum number of tokens a language model can attend to at once, encompassing both the input prompt and the generated output; current frontier models offer windows ranging from 128k to over 1M tokens.
How it works
The context window is the working memory of a language model. Everything the model needs to consider for a given output (system prompt, retrieved documents, conversation history, the user question) has to fit inside it. As context windows have grown from 4k to 1M+ tokens, the architectural choices for enterprise AI have shifted: larger windows allow more retrieved documents per RAG query, whole-document analysis in a single pass, and richer multi-turn conversations. The caveats matter, though: models recall information buried in the middle of a long context less reliably than the same information placed in a short prompt (the "lost in the middle" effect), costs scale with the number of tokens processed, and "needle-in-a-haystack" retrieval inside the window is no substitute for a properly indexed RAG system. Ayoob AI sizes context windows per workload rather than defaulting to the maximum.
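As a rough illustration of per-workload sizing, the sketch below trims a list of retrieved documents to fit a fixed token budget. It assumes the open-source tiktoken library and a hypothetical 128k window with 4k tokens reserved for output; the figures and the fit_documents helper are illustrative, not a prescribed implementation.

```python
import tiktoken

# Illustrative figures; real limits depend on the model in use.
CONTEXT_WINDOW = 128_000    # total tokens the model can attend to
RESERVED_OUTPUT = 4_000     # tokens held back for the generated answer

enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    return len(enc.encode(text))

def fit_documents(system_prompt: str, question: str, documents: list[str]) -> list[str]:
    """Greedily keep retrieved documents until the input budget is spent."""
    budget = CONTEXT_WINDOW - RESERVED_OUTPUT
    budget -= count_tokens(system_prompt) + count_tokens(question)
    kept: list[str] = []
    for doc in documents:  # assumes documents arrive ranked by relevance
        cost = count_tokens(doc)
        if cost > budget:
            break
        kept.append(doc)
        budget -= cost
    return kept
```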
Related terms
Large Language Model (LLM)
A neural network trained on large text corpora to predict the next token given context, used for text generation, summarisation, classification, and reasoning tasks across enterprise software.
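To make "predict the next token given context" concrete, here is a minimal greedy decoding loop. It assumes the Hugging Face transformers and torch libraries, with the small gpt2 checkpoint standing in for any causal language model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# gpt2 is used purely as a small, freely downloadable stand-in model.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("The context window of a language model is",
                      return_tensors="pt").input_ids

with torch.no_grad():
    # Greedy decoding: repeatedly append the single most likely next token.
    for _ in range(20):
        logits = model(input_ids).logits    # shape (1, seq_len, vocab_size)
        next_id = logits[0, -1].argmax()    # highest-probability next token
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))
```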
Tokenisation
The process of splitting text into smaller units (tokens) that a language model treats as atomic, typically using subword algorithms like Byte-Pair Encoding (BPE) or SentencePiece.
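A quick way to see subword tokenisation in practice, assuming the tiktoken library; cl100k_base is one widely used BPE vocabulary, and other models ship different tokenisers.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # one widely used BPE encoding

tokens = enc.encode("Tokenisation splits text into subword units.")
print(tokens)                              # integer token ids
print([enc.decode([t]) for t in tokens])   # the subword pieces themselves
```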
Retrieval-Augmented Generation (RAG)
An architecture pattern that grounds language model outputs in retrieved documents from a private corpus, reducing hallucination and enabling answers based on the firm's own data rather than the model's training set.
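A minimal sketch of the pattern: retrieve the most relevant documents, then ground the prompt in them. The word-overlap scorer below is a deliberately crude stand-in for the embedding-based vector search a real system would use, and all names here are illustrative.

```python
def score(query: str, doc: str) -> float:
    """Toy relevance score via word overlap; real systems use embeddings."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def build_rag_prompt(query: str, corpus: list[str], k: int = 2) -> str:
    """Assemble a prompt grounded in the k most relevant documents."""
    top = sorted(corpus, key=lambda doc: score(query, doc), reverse=True)[:k]
    context = "\n\n".join(top)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
```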
Want to see this technology in action?
Book a Discovery Call