Edge AI
AI inference that runs on local devices (browsers, phones, embedded hardware) rather than in a server-side data centre, used for low-latency, offline-capable, or privacy-preserving deployments.
How it works
Edge AI is increasingly viable as model architectures get smaller and runtimes and APIs (WebGPU, ONNX Runtime, Core ML, TensorFlow Lite, llama.cpp) get better at running on consumer hardware. The use cases are specific: real-time interaction where a 100 ms server round-trip is too slow, offline-capable workflows where connectivity is intermittent, and strict-privacy workflows where data should never leave the device. Ayoob AI uses edge AI specifically for browser-side inference on WebGPU, where GPU acceleration in the browser enables sub-second analytics, search, and processing without server round-trips. The patent portfolio (GB2606693.6 and others) covers the GPU compute infrastructure underlying this pattern.
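As a concrete illustration of the browser-side pattern, here is a minimal sketch using ONNX Runtime Web with its WebGPU execution provider, falling back to WASM where WebGPU is unavailable. The model file name, input name, and tensor shape are illustrative assumptions, not Ayoob AI's actual pipeline.

```typescript
import * as ort from "onnxruntime-web";

// Run a model entirely in the browser, accelerated by the GPU.
// "model.onnx", the input name "input", and the [1, N] shape are
// placeholder assumptions for illustration.
async function classifyOnDevice(features: Float32Array): Promise<Float32Array> {
  // Prefer the WebGPU execution provider; ONNX Runtime Web falls
  // back to WASM on browsers without WebGPU support.
  const session = await ort.InferenceSession.create("model.onnx", {
    executionProviders: ["webgpu", "wasm"],
  });

  const input = new ort.Tensor("float32", features, [1, features.length]);
  const results = await session.run({ input });

  // Read back the first output tensor; the data never leaves the device.
  const output = results[session.outputNames[0]];
  return output.data as Float32Array;
}
```

Because both the model and the data stay in the page, serving a prediction costs one local GPU dispatch instead of a network call.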
Related terms
WebGPU Compute Shaders
Massively parallel data processing pipelines that execute within the browser security sandbox, enabling GPU-accelerated computation without native application installation or server round-trips.
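To make the term concrete, the sketch below dispatches a trivial compute shader, doubling an array of floats, through the raw WebGPU API. It is a generic illustration of the compute-shader pattern, not the pipelines covered by the patent portfolio.

```typescript
// WGSL compute shader: each invocation doubles one array element.
const shader = /* wgsl */ `
  @group(0) @binding(0) var<storage, read_write> data: array<f32>;

  @compute @workgroup_size(64)
  fn main(@builtin(global_invocation_id) id: vec3<u32>) {
    if (id.x < arrayLength(&data)) {
      data[id.x] = data[id.x] * 2.0;
    }
  }
`;

async function doubleOnGpu(input: Float32Array): Promise<Float32Array> {
  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) throw new Error("WebGPU not supported");
  const device = await adapter.requestDevice();

  // Upload the input into a storage buffer the shader can read and write.
  const buffer = device.createBuffer({
    size: input.byteLength,
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC,
    mappedAtCreation: true,
  });
  new Float32Array(buffer.getMappedRange()).set(input);
  buffer.unmap();

  const pipeline = device.createComputePipeline({
    layout: "auto",
    compute: {
      module: device.createShaderModule({ code: shader }),
      entryPoint: "main",
    },
  });
  const bindGroup = device.createBindGroup({
    layout: pipeline.getBindGroupLayout(0),
    entries: [{ binding: 0, resource: { buffer } }],
  });

  // Encode one dispatch covering the whole array, then copy the
  // result into a mappable staging buffer for readback.
  const staging = device.createBuffer({
    size: input.byteLength,
    usage: GPUBufferUsage.MAP_READ | GPUBufferUsage.COPY_DST,
  });
  const encoder = device.createCommandEncoder();
  const pass = encoder.beginComputePass();
  pass.setPipeline(pipeline);
  pass.setBindGroup(0, bindGroup);
  pass.dispatchWorkgroups(Math.ceil(input.length / 64));
  pass.end();
  encoder.copyBufferToBuffer(buffer, 0, staging, 0, input.byteLength);
  device.queue.submit([encoder.finish()]);

  await staging.mapAsync(GPUMapMode.READ);
  const result = new Float32Array(staging.getMappedRange().slice(0));
  staging.unmap();
  return result;
}
```

The same dispatch-and-readback structure scales from this toy kernel to large analytics workloads; only the shader and buffer layouts change.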
AI Inference
The process of running a trained AI model on input data to produce an output, distinguished from training (which produces the model) and fine-tuning (which adapts it).
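A toy forward pass makes the distinction visible: the hard-coded weights below stand in for the artefact that training produces, and inference is simply applying them to new input.

```typescript
// Inference is a forward pass with fixed parameters: the "model" was
// already produced by training, so nothing is adjusted here. The
// weights below are hypothetical, standing in for a trained model.
const weights = [0.8, -0.3, 1.2];
const bias = 0.5;

function infer(features: number[]): number {
  // One forward pass: weighted sum plus bias, squashed by a sigmoid.
  const logit = features.reduce((sum, x, i) => sum + x * weights[i], bias);
  return 1 / (1 + Math.exp(-logit));
}

// Training, by contrast, would adjust `weights` and `bias` from
// labelled data; fine-tuning would start from these values and
// continue adjusting on a new dataset.
console.log(infer([1.0, 0.5, -0.2])); // ≈ 0.71
```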
Private AI
AI deployed on infrastructure the client controls (on-premise, in the client's cloud tenancy, or air-gapped), with no third-party LLM provider in the data path and no inference-time data export.
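A minimal sketch of what "no third-party in the data path" looks like in code, assuming a self-hosted llama.cpp server exposing its OpenAI-compatible route on localhost; the URL, port, and model name are deployment-specific assumptions.

```typescript
// Private-AI inference: the client talks only to a model server it
// hosts itself. Here we assume llama.cpp's llama-server running on
// localhost with its OpenAI-compatible /v1/chat/completions route.
async function privateCompletion(prompt: string): Promise<string> {
  const response = await fetch("http://localhost:8080/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "local-model", // resolved by the self-hosted server
      messages: [{ role: "user", content: prompt }],
    }),
  });
  const data = await response.json();
  // No third-party provider ever sees the prompt or the reply.
  return data.choices[0].message.content;
}
```

Swapping the localhost URL for an on-premise or client-tenancy endpoint changes nothing in the client code, which is the point: the data path is entirely under the client's control.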