WebGPU
27 articles on WebGPU from Ayoob AI, the full-code AI automation agency based in Newcastle upon Tyne.
The Ayoob AI Architecture: Merging CPU, Workers, and WebGPU
A complete architectural overview of our heterogeneous dispatch engine. Every operation flows through workload characterization, precision analysis, and dispatch scoring before routing to the optimal tier: CPU main thread, SharedArrayBuffer Web Workers, or WebGPU compute. Cascading fallback guarantees execution continuity.
Trust but Verify: Validating GPU Float32 Math on the CPU
Our post-dispatch spot-check verification selects 16 elements from the GPU's Float32 output, re-computes them in Float64 on the CPU, and compares. If relative error exceeds the tier-specific tolerance (10^-4 for medium sensitivity, 10^-6 for high sensitivity), the engine discards the GPU result and re-executes on CPU. Speed first, correctness guaranteed.
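The spot-check idea can be sketched in a few lines of plain JavaScript. The function name, sampling strategy, and callback shape below are illustrative assumptions, not the engine's actual implementation:

```javascript
// Sketch of post-dispatch spot-check verification: re-compute a small
// random sample of the GPU's Float32 output in Float64 on the CPU and
// compare relative error against a tier-specific tolerance.
// `recompute` is a caller-supplied Float64 reference for one element.
function spotCheck(gpuOutput, recompute, { samples = 16, tol = 1e-4 } = {}) {
  for (let s = 0; s < samples; s++) {
    const i = Math.floor(Math.random() * gpuOutput.length);
    const expected = recompute(i);                    // Float64 on the CPU
    const denom = Math.max(Math.abs(expected), Number.MIN_VALUE);
    const relErr = Math.abs(gpuOutput[i] - expected) / denom;
    if (relErr > tol) return false;   // caller discards GPU result, re-runs on CPU
  }
  return true;
}
```

A failed check would trigger the CPU re-execution path; for high-sensitivity workloads the tolerance would tighten to 1e-6 as described above.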
Arithmetic Intensity: Why Matrix Multiplication Loves WebGPU
Why matrix multiplication is the one operation your browser's GPU was built for, and how Newcastle AI teams use it to replace six-figure cloud bills.
Why Hardcoded GPU Dispatch Thresholds Fail in the Browser
Hardcoded GPU thresholds break across devices. Self-calibrating dispatch makes AI software fast on every laptop, engineered for UK SMB workloads.
Managing WebGPU Memory Limits for Enterprise Datasets
Browser GPUs share memory with rendering and enforce strict allocation limits via maxStorageBufferBindingSize. Our engine queries these limits at runtime, routes oversized datasets to CPU unconditionally, and uses a size-bucketed buffer pool to eliminate repeated allocation overhead and prevent memory leaks.
Predicting GPU Hash Map Collisions with the Chao1 Estimator
GPU databases crash when GROUP BY cardinality is guessed wrong. The Chao1 estimator predicts it, used in our Newcastle-built analytics engine.
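For illustration, here is what a Chao1 estimate over a sample of grouping keys might look like. This sketch uses the bias-corrected form of the estimator and hypothetical names; it is not the engine's code:

```javascript
// Estimate GROUP BY cardinality from a row sample using the
// bias-corrected Chao1 estimator: S = S_obs + f1(f1-1) / (2(f2+1)),
// where f1 = keys seen exactly once in the sample, f2 = keys seen twice.
function chao1Estimate(sampleKeys) {
  const counts = new Map();
  for (const k of sampleKeys) counts.set(k, (counts.get(k) ?? 0) + 1);
  let f1 = 0, f2 = 0;
  for (const c of counts.values()) {
    if (c === 1) f1++;
    else if (c === 2) f2++;
  }
  // The bias-corrected form avoids division by zero when f2 = 0.
  return counts.size + (f1 * (f1 - 1)) / (2 * (f2 + 1));
}
```

An engine could then size its GPU hash table to a multiple of the estimate, so a sample full of singletons (many unseen groups likely remain) yields a much larger table than a sample where every key repeats.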
Executing SQL WHERE Clauses on the GPU with Dictionary Encoding
Filtering 10M customer records on a GPU in a browser, under 200ms. The technique powering our Newcastle AI data-query engines.
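A minimal CPU-side sketch of the dictionary-encoding idea (names and API are illustrative assumptions): the string column is replaced with integer codes, so a WHERE equality predicate is resolved once against the dictionary and becomes a pure integer scan, which is exactly the shape a GPU filter kernel wants.

```javascript
// Dictionary-encode a string column into a flat Uint32Array of codes.
function dictionaryEncode(column) {
  const dict = [];                 // code -> string
  const codeOf = new Map();        // string -> code
  const codes = new Uint32Array(column.length);
  column.forEach((value, row) => {
    let code = codeOf.get(value);
    if (code === undefined) {
      code = dict.length;
      dict.push(value);
      codeOf.set(value, code);
    }
    codes[row] = code;
  });
  return { dict, codeOf, codes };
}

// WHERE column = value: resolve the string once on the CPU, then scan
// integers. On the GPU, the scan over `codes` is the parallel part.
function whereEquals({ codeOf, codes }, value) {
  const target = codeOf.get(value);
  if (target === undefined) return [];
  const rows = [];
  for (let row = 0; row < codes.length; row++) {
    if (codes[row] === target) rows.push(row);
  }
  return rows;  // matching row indices
}
```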
The Variable-Width Problem: Why UTF-8 Breaks WebGPU Text Search
GPUs break on variable-width text (apostrophes, emojis, names). Our UTF-8-safe engine is why Newcastle law and finance firms trust our AI search.
Bypassing Array.prototype.sort() with IEEE 754 Bit-Transforms
V8's TimSort coerces numbers to strings and cannot use parallel hardware. Our Adaptive Multi-Tier Sorting System transforms IEEE 754 floats to sort-order-preserving unsigned integers using two bitwise operations, enabling radix-256 sort on CPU workers and a two-phase GPU bitonic-merge sort with a 1.45x speedup over Web Workers at 5M+ elements on discrete GPUs.
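The two bitwise operations are the classic Herf (2001) trick; a sketch (variable names are ours):

```javascript
// Map a Float32 bit pattern to an unsigned integer whose unsigned
// ordering matches the floats' numeric ordering (Herf 2001):
// negatives get every bit flipped, non-negatives get the sign bit flipped.
function floatToSortableUint(u32) {
  const mask = (-(u32 >>> 31)) | 0x80000000;  // 0xFFFFFFFF if negative, else 0x80000000
  return (u32 ^ mask) >>> 0;
}

// Inverse transform, applied after the radix/bitonic sort completes.
function sortableUintToFloat(u32) {
  const mask = ((u32 >>> 31) - 1) | 0x80000000;
  return (u32 ^ mask) >>> 0;
}

// Reinterpret the floats' raw bits via typed-array views.
const f32 = new Float32Array([3.5, -1.0, 0.0, -2.25, -0.5]);
const u32 = new Uint32Array(f32.buffer);
const keys = Array.from(u32, floatToSortableUint);
// Sorting `keys` as plain unsigned integers sorts the floats numerically.
```

Because the keys are unsigned integers, a non-comparison radix-256 sort (or a GPU bitonic network) can order them without ever invoking a JS comparator.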
Why We Built the First Non-Comparison Float Sort in JavaScript (And Open Sourced It)
Array.prototype.sort() is broken for numerical data. We built a three-tier adaptive sorting engine that dispatches between CPU, Web Workers, and WebGPU compute shaders based on dataset characteristics. Here is why, and how.
Building Fault-Tolerant AI Workflows: Handling WebGPU Device Loss
Browser GPUs crash, drivers reset, and hardware context vanishes without warning. Our cascading fallback architecture registers on the GPUDevice.lost promise, invalidates all cached state, re-dispatches to CPU workers within the same microtask, and re-probes hardware on the next invocation.
WebGPU Atomic Contention: When to Stop Using the GPU
Sometimes the GPU is slower than the CPU. Knowing when is the real engineering: the decision logic behind our Newcastle AI builds.
Why On-Device WebGPU Architecture is Cheaper Than Cloud LLM APIs
Routing every sort, filter, and aggregation to a cloud server costs $0.12 to $0.85 per 1,000 queries at scale. Our adaptive dispatch engine profiles local hardware via navigator.gpu.requestAdapter() and routes computation to the client GPU, eliminating server compute costs for data transformation entirely.
Preventing Silent Numerical Degradation in GPU-Accelerated Finance AI
GPU-accelerated finance AI silently loses precision below the pound. Our Float32 safety guard catches it before it hits your ledger, engineered in Newcastle.
Eliminating PCIe Bus Bottlenecks in Enterprise AI Compliance Tools
Most compliance AI wastes 80% of its time shuffling data between CPU and GPU. We eliminated that. Built for UK regulated industries.
Real-Time Threat Detection with GPU-Accelerated Streaming Corpora
Live log streams grow continuously. Our searched-frontier tracking mechanism extends the corpus buffer without re-encoding existing documents and dispatches the GPU only against unsearched data beyond the frontier offset. Atomic contention detection prevents non-linear slowdowns when match density spikes.
Eliminating Bot Networks: Two-Phase GPU Pattern Matching for Gaming Anti-Cheat
Standard regex cannot run on GPUs due to SIMD branch divergence. Our two-phase pattern matching engine uses character frequency histograms in 16 KB shared memory to eliminate 97% of candidates before byte-level matching, enabling sub-second fraud detection across millions of chat messages.
Sub-200ms Hospitality CRMs: Moving SQL Relational Operators to WebGPU
Server-side CRM queries add 150 to 400 ms per interaction. Our Adaptive WebGPU Data Query Engine runs relational operators on in-memory columnar data at the client, using dictionary encoding for GPU string processing and a 6-factor scoring function for per-operator dispatch.
Mitigating Atomic Contention in Parallel Browser Environments
When thousands of GPU threads compete for the same atomic memory address, throughput collapses non-linearly. Our engine profiles expected output density and assigns a categorical penalty of negative infinity when contention exceeds safe thresholds, routing to CPU before the GPU stalls.
The Hidden Compute Costs of Array.prototype.sort() in Enterprise SaaS
V8's TimSort performs 20 million comparator callbacks per million elements, each crossing the native-to-JS boundary. Our adaptive sorting system bypasses this entirely with IEEE 754 bit-transforms and a two-phase GPU sort: local bitonic in shared memory, global rank merge via parallel binary search.
Engineering Resilient Compute Pipelines: Handling WebGPU Device Loss
Browser GPUs crash, drivers update, and hardware context vanishes without warning. Our engine detects device loss via the GPUDevice.lost promise, invalidates all cached state, and transparently re-dispatches to CPU within the same operation.
Why Reduced-Precision GPU Arithmetic is Dangerous for Enterprise Finance
WebGPU forces Float64 financial data into Float32, silently corrupting values above 16,777,216. Our Precision Sufficiency Analyser estimates condition numbers and accumulation bounds to prevent GPU dispatch when precision loss exceeds tolerance.
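The 16,777,216 threshold is 2^24: Float32 has a 24-bit significand, so above that point odd integers are no longer representable. In pence-denominated integers, that means balances above £167,772.16 can silently lose their last digit. A hypothetical pre-dispatch guard (not the Precision Sufficiency Analyser itself, which also reasons about condition numbers and accumulation) is just a `Math.fround` round-trip check:

```javascript
// Integers above 2^24 = 16,777,216 start losing exactness in Float32:
// 16,777,217 rounds to 16,777,216 with no error or warning.
console.log(Math.fround(16777217)); // 16777216

// Hypothetical guard: permit GPU (Float32) dispatch only when every
// value survives the Float64 -> Float32 round-trip unchanged.
function float32Safe(values) {
  for (const v of values) {
    if (Math.fround(v) !== v) return false;
  }
  return true;
}
```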
The Two-Phase GPU Text Search Algorithm for Massive Log Files
Brute-force pattern matching on 1 million log entries takes 800 ms on CPU. Our two-phase GPU algorithm uses a character frequency histogram pre-filter in 16 KB shared memory to eliminate up to 97% of candidates before byte-level matching begins.
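The pre-filter rests on a simple necessary condition: an entry can only contain the pattern if, for every byte value, the entry's byte count is at least the pattern's. A CPU sketch of that phase (the 16 KB shared-memory layout and GPU batching are not shown; names are illustrative):

```javascript
// Build a 256-bin byte-frequency histogram.
function byteHistogram(bytes) {
  const h = new Uint32Array(256);
  for (const b of bytes) h[b]++;
  return h;
}

// Phase 1: cheap histogram dominance test. Entries that fail cannot
// contain the pattern and are skipped; only survivors proceed to exact
// byte-level matching (phase 2).
function mayMatch(entryHist, patternHist) {
  for (let i = 0; i < 256; i++) {
    if (entryHist[i] < patternHist[i]) return false;
  }
  return true;
}

const enc = new TextEncoder();
const pattern = byteHistogram(enc.encode('timeout'));
const entries = ['connection timeout on node 7', 'login ok', 'disk full'];
const survivors = entries.filter(
  e => mayMatch(byteHistogram(enc.encode(e)), pattern)
);
```

The dominance test admits false positives (an entry can dominate the histogram without containing the pattern as a contiguous substring), which is why phase 2's exact matching still runs on the surviving candidates.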
IEEE 754 Bit-Transforms for High-Speed Float Processing in JavaScript
JavaScript uses Float64. WebGPU requires Float32. The IEEE 754 bit-transform (Herf 2001) converts floats to sort-order-preserving unsigned integers. Our contribution is the Float32 safety guard that inhibits GPU dispatch when Float64-to-Float32 truncation would alter sort order, plus the adaptive multi-tier dispatch system.
GPU-Accelerated Relational Queries: Moving the Database to the Browser
Server round-trips add 50 to 300 ms per dashboard interaction. Our Adaptive WebGPU Data Query Engine compiles structured queries into execution plans where each operator is routed to one of three execution tiers (CPU main thread, Web Worker thread pool, or WebGPU compute pipeline) based on a 6-factor dispatch scoring function.
Handling SIMD Branch Divergence in Browser-Based Compute Shaders
GPU wavefronts serialize when threads diverge. We built a categorical inhibition system that detects divergence-prone workloads at dispatch time and unconditionally routes them to the CPU tier.
Why WebGPU is Replacing Web Workers for Enterprise Data Processing
When to replace Web Workers with WebGPU for enterprise data processing. The calibration tells you which tier wins. Built by a Newcastle AI team.
Want to discuss WebGPU for your business?
Book a Discovery Call