The latency budget for face-scan recognition
A guest walks into a hotel lobby. A camera captures their face. A vision model extracts an embedding vector. The CRM must resolve that vector against the guest database, retrieve the profile, determine VIP status, pull stay history, check loyalty tier, and surface personalized preferences to the front desk screen.
The guest is walking. They reach the desk in 3 to 5 seconds. The recognition result must appear before they arrive. The total latency budget from camera frame to on-screen profile: under 2 seconds.
The vision model (embedding extraction, nearest-neighbour lookup) consumes 800 ms to 1.2 seconds depending on hardware. That leaves 800 ms to 1.2 seconds for everything else: CRM query, profile assembly, UI render.
A standard SaaS CRM handles this with a server round-trip. The client sends the matched guest ID to the API. The server queries PostgreSQL or a similar RDBMS. The result traverses the network back to the client.
Best case: 150 ms. Enterprise environment with VPN, multi-region database, connection pooling overhead, and JSON serialization: 250 to 400 ms. Add a second query for stay history and a third for loyalty details, and you are at 450 ms to 1.2 seconds of network-bound latency. The budget is consumed. There is no room for UI animation, graceful loading states, or fallback retries.
The query itself is fast. PostgreSQL resolves a primary key lookup in under 1 ms. The server-side application logic takes 2 to 5 ms. The remaining 145 to 395 ms is network and serialization overhead. You are not waiting for compute. You are waiting for packets.
Moving the database to the browser
The guest database for a single hotel property is not large. A 500-room hotel with 10 years of guest history holds 200,000 to 500,000 unique guest profiles. Each profile has 15 to 25 fields: name, email, phone, loyalty tier, VIP status, preferences (room type, pillow type, dietary requirements, minibar preferences), lifetime spend, visit count, last stay date, notes.
In columnar format with dictionary encoding, this dataset occupies 20 to 40 MB. That loads in 1 to 3 seconds on a standard enterprise connection. After the initial load, every query runs locally. No network. No server. No serialization.
The question is whether the browser can query 500,000 profiles fast enough to fit within the latency budget.
With our Adaptive WebGPU Data Query Engine: yes. By a wide margin.
Why columnar storage matters
Standard CRM data models are row-oriented. Each guest record is a JavaScript object with named properties. Querying means iterating over an array of objects, accessing a property on each one.
// Row-oriented: 500,000 objects
const vips = guests.filter(g => g.vipStatus === 'VIP' && g.lastStay > cutoffDate);
This is slow for three reasons. First, each property access goes through V8's hidden-class lookup and a pointer dereference. Second, the objects are scattered across heap memory with poor cache locality. Third, the filter callback is invoked 500,000 times through the engine's function call machinery.
Columnar storage inverts the layout. Each field becomes a contiguous typed array:
// Columnar: one array per field
const vipStatusCol = new Uint32Array(500_000); // dictionary-encoded
const lastStayCol = new Float64Array(500_000); // epoch timestamps
const nameCol = new Uint32Array(500_000); // dictionary-encoded
const lifetimeSpendCol = new Float64Array(500_000);
A filter scan on vipStatusCol reads a contiguous block of memory. The CPU's prefetcher detects the sequential access pattern and loads cache lines ahead of the read pointer. For a 500,000-element Uint32Array (2 MB), the entire column fits in L3 cache after a single scan. Subsequent queries on the same column hit cache.
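The CPU-side scan above can be sketched as follows. This is a minimal, hypothetical helper (not the engine's actual API); the column names and dictionary codes are illustrative:

```javascript
// Hypothetical sketch of a columnar filter scan. Column layouts follow the
// typed arrays above; names and dictionary codes are illustrative.
function filterScan(vipStatusCol, lastStayCol, vipCode, cutoff) {
  const matches = [];
  // Sequential reads over two contiguous typed arrays: the CPU prefetcher
  // streams cache lines ahead of the loop.
  for (let i = 0; i < vipStatusCol.length; i++) {
    if (vipStatusCol[i] === vipCode && lastStayCol[i] > cutoff) {
      matches.push(i);
    }
  }
  return matches;
}

// Tiny illustrative columns (3 = 'VIP' in the sorted dictionary):
const vipCol = Uint32Array.from([3, 2, 0, 3]);
const stayCol = Float64Array.from([1.7e12, 1.6e12, 1.7e12, 1.5e12]); // epoch ms
filterScan(vipCol, stayCol, 3, 1.65e12); // → [0]
```

The same loop shape maps directly onto a GPU thread per row: each thread evaluates the predicate for one index.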
On the GPU, columnar storage enables coalesced reads. Adjacent threads in a workgroup read adjacent memory addresses, which the memory controller batches into a single bus transaction. Row-oriented data forces each thread to chase pointers to different heap locations. The reads are uncoalesced, wasting 75% to 90% of memory bandwidth.
Dictionary encoding for GPU string processing
Half the fields in a guest profile are strings: name, email, loyalty tier, VIP status, room preferences, dietary requirements. WebGPU compute shaders cannot process variable-length strings. WGSL has no string type. No strcmp. No Unicode handling.
Dictionary encoding solves this. During data ingestion, the engine builds a sorted dictionary of unique values for each string column and replaces every string with its integer index:
// VIP status column: 4 unique values
const vipDict = ["Bronze", "Gold", "Standard", "VIP"]; // sorted
// 500,000 rows encoded as integer indices
const vipStatusCol = new Uint32Array([
3, 2, 0, 3, 1, 2, 3, 0, ... // 3=VIP, 2=Standard, 0=Bronze, 1=Gold
]);
A filter WHERE vipStatus = 'VIP' becomes WHERE vipStatusCol[i] == 3. A single u32 comparison per row. The GPU evaluates this with one instruction per thread. No string allocation. No byte-by-byte matching. No variable-length handling.
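The ingestion-time encoding step can be sketched like this, assuming hypothetical helper names rather than the engine's actual API:

```javascript
// Hypothetical sketch of dictionary encoding at ingestion: build a sorted
// dictionary of unique values, then replace each string with its index.
function dictionaryEncode(strings) {
  const dict = [...new Set(strings)].sort();
  const index = new Map(dict.map((v, i) => [v, i]));
  const codes = Uint32Array.from(strings, s => index.get(s));
  return { dict, codes };
}

const { dict, codes } = dictionaryEncode(["VIP", "Standard", "Bronze", "VIP", "Gold"]);
// dict  → ["Bronze", "Gold", "Standard", "VIP"]
// codes → Uint32Array [3, 2, 0, 3, 1]

// WHERE vipStatus = 'VIP' compiles to a single integer comparison per row:
const target = dict.indexOf("VIP"); // 3
```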
Dictionary encoding statistics
For typical hospitality CRM data:
| Column | Unique values | Dictionary size | Encoded column (500K rows) |
|---|---|---|---|
| VIP status | 4 | 64 bytes | 2 MB (Uint32Array) |
| Loyalty tier | 6 | 96 bytes | 2 MB |
| Room preference | 12 | 240 bytes | 2 MB |
| Dietary requirements | 18 | 360 bytes | 2 MB |
| Country | 195 | ~4 KB | 2 MB |
| City | ~8,000 | ~160 KB | 2 MB |
Every string column, regardless of the original string lengths, encodes to a 2 MB Uint32Array for 500,000 rows. The total dictionary overhead for all categorical columns is under 200 KB. The GPU receives only the integer arrays. The dictionaries stay on the CPU for result display (mapping indices back to human-readable strings after the query completes).
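The CPU-side decode step after a query completes is trivial. A hypothetical sketch:

```javascript
// Hypothetical sketch of result decoding: query results come back as integer
// codes; the CPU-resident dictionary maps them to display strings.
function decodeColumn(codes, dict) {
  return Array.from(codes, c => dict[c]);
}

const vipDict = ["Bronze", "Gold", "Standard", "VIP"];
decodeColumn(Uint32Array.from([3, 0, 1]), vipDict); // → ["VIP", "Bronze", "Gold"]
```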
Compound string predicates
Complex filters combine multiple dictionary-encoded columns:
WHERE vipStatus = 'VIP' AND loyaltyTier IN ('Gold', 'Platinum') AND country = 'UAE'
Each predicate resolves to integer comparisons at query compilation time. The compiler looks up 'VIP' in the vipStatus dictionary (index 3), 'Gold' and 'Platinum' in the loyalty dictionary (indices 1 and 4), and 'UAE' in the country dictionary (index 178). The GPU shader evaluates:
let match = (vip_col[idx] == 3u)
&& (loyalty_col[idx] == 1u || loyalty_col[idx] == 4u)
&& (country_col[idx] == 178u);
Three integer comparisons and two logical ORs. No string operations anywhere in the hot path.
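The compile-time resolution step can be sketched as follows. The helper and the 6-value loyalty dictionary are hypothetical, chosen so that 'Gold' and 'Platinum' land at indices 1 and 4 as in the example above:

```javascript
// Hypothetical sketch of predicate compilation: string literals resolve to
// dictionary indices once, at query compile time, before the scan begins.
function compileInPredicate(dict, literals) {
  const codes = new Set(literals.map(l => dict.indexOf(l)));
  return code => codes.has(code);
}

// Illustrative 6-value loyalty dictionary ('Gold' = 1, 'Platinum' = 4):
const loyaltyDict = ["Bronze", "Gold", "Member", "None", "Platinum", "Silver"];
const matchLoyalty = compileInPredicate(loyaltyDict, ["Gold", "Platinum"]);
matchLoyalty(1); // → true  ('Gold')
matchLoyalty(5); // → false ('Silver')
```

In the GPU path the resolved indices are baked into the shader (or passed as uniforms), so the hot loop never touches the dictionary at all.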
The 6-factor scoring function
Not every operator in a CRM query belongs on the GPU. Our engine evaluates each operator independently using a 6-factor scoring function and routes it to the optimal tier.
Factor 1 (F1): Row count vs threshold
The number of rows entering the operator, evaluated against hardware-specific thresholds. F1 carries a weight of 4.0. For a discrete GPU, the threshold is 50,000 rows; for an integrated GPU, 100,000 rows. For the first operator in the pipeline, this is the full guest count (500,000). For downstream operators after a selective filter, it may be 5,000 or fewer.
A filter on 500,000 rows scores high for GPU dispatch. A sort on 50 filtered results scores low. The GPU's overhead (buffer allocation, shader dispatch) is not justified for tiny result sets.
Factor 2 (F2): Operator-specific SQL metric
F2 captures the workload characteristic specific to the SQL operator type. For filter operators, this is predicate selectivity: the fraction of rows that pass the filter. Estimated from per-column statistics maintained at ingestion: min, max, null count, and a 64-bucket histogram. For vipStatus = 'VIP' on a hotel with 8% VIP guests, estimated selectivity is 8%.
For GROUP BY operators, F2 is the Chao1 group cardinality estimate. Low group count (GROUP BY vipStatus: 4 groups) is GPU-friendly. High group count (GROUP BY guestId: 500,000 groups) is GPU-hostile. For dictionary-encoded columns, exact group cardinality is the dictionary size. For composite keys (GROUP BY country, loyaltyTier), the engine uses the Chao1 species richness estimator on a sampled cross-product.
For join operators, F2 is the join key overlap ratio: the fraction of keys in the smaller relation that have matches in the larger relation.
Factor 3 (F3): GPU class adjustment
Adjusts the score based on the detected GPU class. A front desk terminal with a discrete GPU receives a favourable adjustment. A tablet with an integrated GPU receives a penalty reflecting its lower memory bandwidth. The hardware capability detector determines the GPU class at initialisation.
Factor 4 (F4): Vendor tuning
Hardware vendor-specific tuning coefficients that account for differences in atomic throughput, shared memory size, and dispatch overhead across GPU vendors and generations.
Factor 5 (F5): GPU buffer retention bonus
When a preceding operator has already produced its output in a GPU buffer (via the GPUResidentDataset class), the next operator receives a bonus for keeping execution on the GPU. The pipeline executor performs multi-pass re-scoring (up to 3 iterations) to propagate buffer retention bonuses through the operator pipeline. The GPU buffer retention bonus feeds back into the dispatch scoring model to create cascading GPU segment formation across multi-operator plans.
Factor 6 (F6): Hardware-adaptive buffer threshold
The hardware-specific buffer size threshold derived from the hardware capability detector's runtime probing. This accounts for the device's actual GPU buffer limits and memory bandwidth, normalising the scoring function across hardware classes. The same query produces different routing decisions on a front desk workstation versus a concierge tablet.
Score computation and tier routing
The six factors combine into a dispatch score that routes each operator to one of three execution tiers. If the score is positive, the operator dispatches to the WebGPU compute pipeline. If the score is non-positive and the row count falls within a defined medium range (between 10,000 and 500,000 rows in our preferred configuration), the operator dispatches to the Web Worker thread pool. Otherwise, the operator executes on the CPU main thread. If branch divergence or Float32 ordering-preservation safety checks trigger categorical penalties, the score is overridden to negative infinity regardless of the other factors (using the same categorical inhibition principle covered by our GPU Inhibition patent). The pipeline executor provides transparent CPU fallback on GPU failure.
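The routing logic above can be sketched as follows. Only the F1 weight (4.0) and the worker-pool row range come from the text; the structure, factor scaling, and tier names are illustrative stand-ins:

```javascript
// Hypothetical sketch of score combination and three-tier routing. Factors
// f2..f6 are assumed to arrive pre-scaled by their own weights.
function routeOperator({ f1, f2, f3, f4, f5, f6, rowCount, categoricalPenalty }) {
  // Categorical inhibition: safety checks (branch divergence, Float32
  // ordering preservation) override every other factor.
  if (categoricalPenalty) return { score: -Infinity, tier: "cpu-main" };
  const score = 4.0 * f1 + f2 + f3 + f4 + f5 + f6;
  if (score > 0) return { score, tier: "gpu" };
  if (rowCount >= 10_000 && rowCount <= 500_000) {
    return { score, tier: "worker-pool" };
  }
  return { score, tier: "cpu-main" };
}

routeOperator({ f1: 0.5, f2: 0.25, f3: 0.25, f4: 0, f5: 0, f6: 0.5,
                rowCount: 500_000, categoricalPenalty: false });
// → { score: 3, tier: "gpu" }
```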
A real CRM query pipeline
A guest is identified by the face-scan system. The CRM receives the matched guest ID and must assemble a full profile. The query:
SELECT g.name, g.vipStatus, g.loyaltyTier, g.lifetimeSpend,
g.roomPreference, g.dietaryRequirements,
COUNT(s.stayId) as totalStays,
MAX(s.checkoutDate) as lastStay,
SUM(s.totalSpend) as recentSpend
FROM guests g
LEFT JOIN stays s ON g.guestId = s.guestId AND s.checkoutDate > '2024-01-01'
WHERE g.guestId = 12847
GROUP BY g.guestId
In a traditional CRM, this is a server round-trip. In our engine, the data is already in the browser. The query compiles to four operators:
| Operator | Input rows | Score | Routed to | Time |
|---|---|---|---|---|
| Filter (guestId = 12847) | 500,000 | 1.6 | GPU | 0.4 ms |
| Join (stays on guestId, date filter) | 1 guest x ~45 stays | 0.02 | CPU main thread | 0.01 ms |
| Aggregate (COUNT, MAX, SUM) | 45 rows | 0.001 | CPU main thread | < 0.01 ms |
| Projection (select columns) | 1 row | 0.001 | CPU main thread | < 0.01 ms |
Total query time: 0.4 ms. The filter runs on the GPU because it scans all 500,000 rows (high input row count). Every downstream operator runs on the main thread because the result set is tiny.
But single-guest lookup is the simple case. The powerful scenario is aggregate analytics.
Aggregate dashboard queries
The hotel operations manager opens a dashboard. They want to see VIP distribution by country for guests who stayed in the last 12 months. The query:
SELECT country, vipStatus, COUNT(*) as guestCount, AVG(lifetimeSpend) as avgSpend
FROM guests
WHERE lastStay > '2024-12-29'
GROUP BY country, vipStatus
ORDER BY guestCount DESC
| Operator | Input rows | Selectivity / Groups | Score | Routed to | Time |
|---|---|---|---|---|---|
| Filter (lastStay > cutoff) | 500,000 | ~40% selectivity | 1.8 | GPU | 1.1 ms |
| GroupBy (country x vipStatus) | ~200,000 | Chao1: ~780 groups | 1.4 | GPU | 1.9 ms |
| Sort (guestCount DESC) | 780 | n/a | 0.01 | CPU main thread | 0.1 ms |
Total: 3.1 ms. The filter and group-by both run on the GPU. The GPUResidentDataset keeps the intermediate buffer in GPU memory between them (no CPU round-trip), with the GPU buffer retention bonus (F5) ensuring the group-by operator's score reflects the data already being GPU-resident. The 780-row grouped result is read back to the CPU for a trivial sort.
The operations manager adjusts the date range. The query re-executes. 3.1 ms later, the chart updates. No loading spinner. No skeleton screen. No "Refreshing data..." toast.
On a server-round-trip architecture, the same interaction takes 150 to 400 ms. The manager notices. They wait. They click less. They explore less. The dashboard that was built to surface insights becomes a tool that punishes curiosity with latency.
The face-scan latency budget revisited
With the query engine running locally, here is the full pipeline from camera frame to on-screen profile:
| Stage | Time |
|---|---|
| Frame capture and preprocessing | 30 ms |
| Face embedding extraction (vision model) | 400 ms |
| Nearest-neighbour lookup (embedding index) | 50 ms |
| CRM profile query (our engine, local) | 0.4 ms |
| UI render (React, single component update) | 8 ms |
| Total | ~489 ms |
Under 500 ms total, and under 60 ms for everything after the embedding extraction. The guest is still 4 steps from the desk.
Replace the local CRM query with a server round-trip:
| Stage | Time |
|---|---|
| Frame capture and preprocessing | 30 ms |
| Face embedding extraction | 400 ms |
| Nearest-neighbour lookup | 50 ms |
| CRM API call (network + query + response) | 250 ms |
| UI render | 8 ms |
| Total | ~738 ms |
Still under 1 second in the best case. But add VPN overhead (common in hotel chains), database connection pool exhaustion during peak check-in hours, and a second query for loyalty details, and you are at 1.2 to 1.8 seconds. The budget is tight. There is no margin for retry on network failure.
With the local engine, the CRM query is 0.4 ms. There is room for three redundant queries, a full stay history lookup, and a loyalty calculation before you reach 10 ms. The network was the bottleneck. We removed the network.
Multi-property data architecture
Hotel chains operate across dozens or hundreds of properties. A single-property dataset (500,000 profiles) fits in the browser comfortably. A chain-wide dataset (5 million to 50 million profiles) does not.
Our architecture handles this with a tiered data strategy:
Local tier (browser). The current property's guest database. Full columnar dataset, dictionary-encoded, cached in browser memory. All queries run locally. This covers 95% of front desk interactions (guests who have stayed at this property before).
On-demand tier (server). For guests not found in the local dataset (first-time visitors to this property who have stayed at other properties in the chain), the engine falls back to a server query. The result is cached locally for the duration of the session.
Sync tier (background). Overnight, the local dataset is refreshed with updated chain-wide data for guests likely to visit (based on reservations, loyalty programme activity, and seasonal patterns). This pre-populates the local cache with profiles that will be needed during the next day's operations.
The engine abstracts the tier boundary. The application submits a query to the engine. If the data is local, the query runs in 0.4 ms. If the data requires a server fetch, the engine handles the round-trip transparently. The application code does not branch on data location.
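The tier-transparent query path can be sketched like this. The `lookup` method, the session cache, and the `/api/guests` endpoint are illustrative assumptions, not the engine's actual interface:

```javascript
// Hypothetical sketch of the tiered lookup: local columnar query first,
// then an on-demand server fetch whose result is cached for the session.
const sessionCache = new Map();

async function getGuestProfile(guestId, localDataset) {
  const local = localDataset.lookup(guestId); // sub-millisecond local query
  if (local) return local;
  if (sessionCache.has(guestId)) return sessionCache.get(guestId);
  // On-demand tier: a first-time visitor known elsewhere in the chain.
  const res = await fetch(`/api/guests/${guestId}`);
  const profile = await res.json();
  sessionCache.set(guestId, profile); // cached for the session
  return profile;
}
```

The calling code is identical whether the profile was resolved in 0.4 ms locally or 250 ms over the network; only the latency differs.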
Why this is faster than any SaaS CRM
SaaS CRMs are architecturally constrained by their deployment model. The data lives in a multi-tenant database in a data centre. Every query crosses the network. Every interaction pays the latency tax.
Even "fast" SaaS CRMs with edge caching and CDN-proxied APIs cannot eliminate the fundamental round-trip. A cached API response is still a network request. HTTP/2 multiplexing reduces connection overhead but not propagation delay. GraphQL reduces payload size but not latency.
Our engine eliminates the round-trip for the 95% of queries that can be served from local data. The remaining 5% fall back to a server call. The average query latency across all interactions: under 5 ms. The p99 (server fallback for unknown guests): 250 ms.
No SaaS CRM built on a server-query architecture can match sub-5 ms average query latency. The physics of network propagation prevents it. Moving the data to the client and querying it on the GPU is not an optimization of the existing model. It is a different model.
This is the architecture behind our enterprise AI automation infrastructure applied to hospitality. Probe the hardware. Load the data locally. Query it at hardware speed. Reserve the network for what the network is actually needed for: synchronization and data that does not fit locally. The result is a CRM that responds before the guest reaches the desk.