Between Scene Contract and Validation sits cognition: a plugin orchestrator that runs rules, models, and policies against the structured scene, memory, and site context, then emits candidate actions with scored confidence.
Reasoning is a layer, not a model. SceneLM is one engine that runs inside it. So are rules, local LLMs, hybrid policies, and cloud adapters. The runtime orchestrates them — and hands the result to the validation envelope before any action leaves the system.
Structured perception output — objects, regions, events, sensor reliability.
Persistent local history — prior scenes, prior decisions, prior outcomes.
Configured rules — restricted zones, allowed actions, escalation paths.
Trends, persistence, cooldowns, expectation models, time-of-day priors.
Whatever upstream extensions added — fused scores, custom detectors, derived intents.
Per-stream confidence, surfaced into reasoning as evidence weights.
Plugin orchestration runs each cognition engine, scores its candidates, then hands the top-N to validation.
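The orchestration step described above can be sketched as follows. This is a minimal illustration, not the VAOS API: the `Candidate` type, the engine function signature, the stand-in engines, and the confidence values are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Candidate:
    """A proposed action with the confidence its engine assigned (hypothetical type)."""
    engine: str
    action: str
    confidence: float


# Hypothetical engine signature: takes a scene dict, returns candidate actions.
Engine = Callable[[dict], List[Candidate]]


def rules_engine(scene: dict) -> List[Candidate]:
    # Deterministic rule: a restricted-zone entry always proposes a notification.
    if scene.get("event") == "restricted_zone_entry":
        return [Candidate("rules", "notify_security", 0.95)]
    return []


def model_engine(scene: dict) -> List[Candidate]:
    # Stand-in for a model-backed engine; scales its score by sensor reliability.
    reliability = scene.get("reliability", 1.0)
    return [
        Candidate("scenelm", "log_incident", 0.87 * reliability),
        Candidate("scenelm", "ignore", 0.10),
    ]


def orchestrate(scene: dict, engines: Dict[str, Engine], top_n: int = 2) -> List[Candidate]:
    """Run every engine, pool the candidates, and keep the top-N by confidence."""
    candidates: List[Candidate] = []
    for engine in engines.values():
        candidates.extend(engine(scene))
    candidates.sort(key=lambda c: c.confidence, reverse=True)
    return candidates[:top_n]


scene = {"event": "restricted_zone_entry", "reliability": 0.92}
top = orchestrate(scene, {"rules": rules_engine, "scenelm": model_engine}, top_n=2)
for c in top:
    print(f"{c.engine}: {c.action} ({c.confidence:.2f})")
```

In this sketch the pooled list is what would be handed to validation; nothing is executed at this stage.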
Mix and match. The orchestrator routes by capability, latency budget, and policy.
| Engine | Role | Typical latency |
|---|---|---|
| rules | Deterministic safety, escalation policies | < 1 ms |
| scenelm | Edge-distilled contextual reasoning | 20–80 ms |
| ollama | Local LLM with structured output | 200 ms – 2 s |
| vllm | High-throughput batched inference | 50–300 ms |
| cloud | Optional cloud adapter (OpenAI, Claude, custom) | 500 ms – 3 s |
| memory | Temporal influence: priors, expectations | < 5 ms |
| hybrid | Composed orchestration (rules + model + memory) | varies |
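
Routing by capability, latency budget, and policy might look like the sketch below. The worst-case latencies are the upper bounds from the table; the capability tags, the `local` flag, and the `allow_cloud` policy switch are assumptions introduced for illustration, and `hybrid` is omitted because its latency varies.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class EngineSpec:
    name: str
    capabilities: FrozenSet[str]  # hypothetical capability tags
    worst_case_ms: float          # upper bound of the "Typical latency" column
    local: bool                   # assumed: whether the engine runs on-site


REGISTRY = [
    EngineSpec("rules", frozenset({"safety"}), 1, True),
    EngineSpec("memory", frozenset({"temporal"}), 5, True),
    EngineSpec("scenelm", frozenset({"context"}), 80, True),
    EngineSpec("vllm", frozenset({"context", "language"}), 300, True),
    EngineSpec("ollama", frozenset({"context", "language"}), 2000, True),
    EngineSpec("cloud", frozenset({"context", "language"}), 3000, False),
]


def route(capability: str, budget_ms: float, allow_cloud: bool = False) -> List[str]:
    """Return engines that offer the capability, fit the latency budget,
    and are permitted by the (hypothetical) cloud policy flag."""
    return [
        e.name
        for e in REGISTRY
        if capability in e.capabilities
        and e.worst_case_ms <= budget_ms
        and (e.local or allow_cloud)
    ]


print(route("context", budget_ms=100))
print(route("language", budget_ms=5000, allow_cloud=True))
```

With a 100 ms budget only `scenelm` qualifies for contextual reasoning; relaxing the budget and permitting cloud use widens the set to the slower engines.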

    [scene]
      event: restricted_zone_entry
      region: loading_bay_03
      objects: person × 1
      reliability: 0.92
    [reasoning]
      plugin: scenelm
      confidence: 0.87
      memory: prior_entries=2 in last_60s
      intent: "notify security; log incident"
    [validation]
      C_v: 0.84
      threshold: 0.70
      status: PASSED
    [action]
      emit: notify(target="safety-team", level="medium")
      log: scene + reasoning + envelope archived

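The validation step in the trace above reduces to a threshold comparison before anything is emitted. A minimal sketch follows, using the trace's values. How `C_v` is computed is not described here, so it is taken as an input; the `Envelope` type, the `>=` comparison, and the `BLOCKED` status name are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Envelope:
    c_v: float        # validated confidence, taken as given
    threshold: float  # minimum confidence required to act

    @property
    def status(self) -> str:
        # Assumed semantics: pass when C_v meets or exceeds the threshold.
        return "PASSED" if self.c_v >= self.threshold else "BLOCKED"


def emit_if_valid(envelope: Envelope, action: str) -> Optional[str]:
    """Release the action only when the envelope passes; otherwise emit nothing."""
    return action if envelope.status == "PASSED" else None


envelope = Envelope(c_v=0.84, threshold=0.70)
emitted = emit_if_valid(envelope, 'notify(target="safety-team", level="medium")')
print(envelope.status, emitted)
```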
Detect direction of change — crowd density rising, queue stalling, anomaly accumulating.
Don't fire on a single frame. Require the same evidence over a window before escalating.
Compare observed reality to historical baseline — when the world deviates, mark it.
Stepped responses — log → notify → alert → page. Each step gated by a separate threshold.
Suppress repeated alerts. The runtime remembers the last action and waits a tunable window.
Future SceneLM variants will inject an expected next state, so the runtime catches surprises faster.
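Three of the temporal behaviors above (persistence over a window, stepped escalation, and cooldown) can be combined in one small gate. The sketch below is illustrative only: the `TemporalGate` class, the window size, the cooldown length, and the per-step thresholds are all hypothetical values, not VAOS defaults.

```python
from collections import deque
from typing import Deque, Optional

# Hypothetical thresholds for the log → notify → alert → page ladder.
STEPS = [("log", 0.50), ("notify", 0.70), ("alert", 0.85), ("page", 0.95)]


def escalation_step(confidence: float) -> Optional[str]:
    """Return the highest step whose threshold the confidence clears, if any."""
    chosen = None
    for name, threshold in STEPS:
        if confidence >= threshold:
            chosen = name
    return chosen


class TemporalGate:
    def __init__(self, window: int = 3, cooldown_s: float = 60.0) -> None:
        self.recent: Deque[bool] = deque(maxlen=window)  # last `window` detections
        self.cooldown_s = cooldown_s
        self.last_fired: Optional[float] = None

    def observe(self, detected: bool, confidence: float, now: float) -> Optional[str]:
        self.recent.append(detected)
        # Persistence: require the same evidence across the whole window.
        persistent = len(self.recent) == self.recent.maxlen and all(self.recent)
        if not persistent:
            return None
        # Cooldown: suppress repeats until the window since the last action elapses.
        if self.last_fired is not None and now - self.last_fired < self.cooldown_s:
            return None
        step = escalation_step(confidence)
        if step is not None:
            self.last_fired = now
        return step


gate = TemporalGate(window=3, cooldown_s=60.0)
results = [gate.observe(True, 0.87, t) for t in (0, 1, 2, 3, 70)]
print(results)  # [None, None, 'alert', None, 'alert']
```

The first two frames are held back by the persistence window, the third fires at the `alert` step, the fourth is suppressed by the cooldown, and the last fires again once the cooldown has elapsed.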
Most AI vision systems collapse reasoning and execution. VAOS keeps them separate so the validation envelope can sit between them.
Swap engines without rewriting the runtime. Compose hybrids. Trace every decision.