Tokenisation
The process of splitting text into smaller units (tokens) that a language model treats as atomic, typically using subword algorithms such as Byte-Pair Encoding (BPE) or the unigram model popularised by SentencePiece.
How it works
Language models do not see characters or words directly. They see tokens: numeric IDs standing for chunks of text, typically one to four characters each. Tokenisation matters because inference cost, context window limits, and rate limits for commercial models are all measured in tokens. A 1,000-word English document is roughly 1,300 tokens, but the same content as code, in a non-Latin script, or in an unusual format can produce significantly more. For enterprise deployment this drives concrete decisions: how to chunk documents for RAG (each chunk must fit in the context window with room left for the question and the answer), how to estimate inference cost, and where to invest in prompt compression. Ayoob AI accounts for tokenisation explicitly in production systems, and the engineering blog covers the substrate in depth.
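To make the subword idea concrete, here is a minimal sketch of how BPE encoding works. The merge table below is illustrative (real models learn tens of thousands of merges from data); the `bpe_encode` function and its rule set are assumptions for this example, not a real tokeniser's vocabulary.

```python
# Minimal sketch of Byte-Pair Encoding (BPE) at inference time.
# The merge table is illustrative, not taken from any real model.

def bpe_encode(word, merges):
    """Split a word into subword tokens by repeatedly applying the
    highest-priority (lowest-rank) merge rule found in `merges`."""
    tokens = list(word)  # start from individual characters
    while True:
        # find the adjacent pair with the best (lowest) merge rank
        best, best_rank = None, None
        for i in range(len(tokens) - 1):
            rank = merges.get((tokens[i], tokens[i + 1]))
            if rank is not None and (best_rank is None or rank < best_rank):
                best, best_rank = i, rank
        if best is None:
            return tokens  # no applicable merges left
        # apply the winning merge, fusing the pair into one token
        tokens = tokens[:best] + [tokens[best] + tokens[best + 1]] + tokens[best + 2:]

# Illustrative merge table: rank = the order the merge was "learned"
merges = {("t", "o"): 0, ("to", "k"): 1, ("e", "n"): 2, ("tok", "en"): 3}

print(bpe_encode("tokens", merges))  # → ['token', 's'] (frequent word, few tokens)
print(bpe_encode("tkoens", merges))  # → ['t', 'k', 'o', 'en', 's'] (rare string, many tokens)
```

The second call shows why unusual text costs more: strings the training data never merged fall back to many small tokens, which is exactly why code, non-Latin scripts, and odd formats inflate token counts.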
Related terms
Large Language Model (LLM)
A neural network trained on large text corpora to predict the next token given context, used for text generation, summarisation, classification, and reasoning tasks across enterprise software.
Context Window
The maximum number of tokens a language model can attend to at once, encompassing both the input prompt and the generated output, with current frontier models offering windows from 128k up to 1M+ tokens.
Want to see this technology in action?
Book a Discovery Call