Six layers, one runtime. Local-first. Sensor-aware. Patent-pending innovations at the perception, contract, and action layers.
Each layer is independently extensible. Each layer publishes typed contracts to the next.
Camera streams · ISP · raw inference frontend — vaos.sensors
Sensor reliability + Scene Contract — pixels become structured scene — vaos.scene
Reasoning over scene + memory + site context — vaos.cognition
Action gating against scene confidence + sensor reliability — vaos.validate
Emit to robot, MQTT, REST, dashboard, log — vaos.actions
Outcomes captured into persistent local memory · federated — vaos.memory
# Per frame, ~30 FPS, <50ms total latency [observe] raw frame + ISP metadata + sensor reliability ↓ [structure] detection + Scene Contract { objects, event, metrics } ↓ [think] reason(scene, memory, site_context) → candidate_action ↓ [validate] gate(action, scene.confidence, sensor.reliability) ↓ [act] emit → robot / API / MQTT / dashboard ↓ [learn] memory.append(outcome) → improves future thinking
Continuous evaluation of sensor reliability — calibration, lighting, occlusion, thermal — fed directly into inference confidence.
Validation envelope around structured perception output. Open schema, defended boundary against malformed or untrusted scene data.
Closed-loop runtime that gates every action against current scene confidence and sensor reliability before execution.
Vision-language models alone don't move robots. Wiring perception to validated action requires explicit boundaries, typed contracts, and a memory of state.
VAOS makes those boundaries first-class: every layer publishes a typed contract, every contract can be unit-tested, every action is auditable.
| Concern | Step |
|---|---|
| Bad sensor input | Observe |
| Untrusted perception | Structure |
| Reasoning & context | Think |
| Confidence gating | Validate |
| Outbound action | Act |
| State / outcome capture | Learn |
Run the quickstart, fork a plugin, or read the Scene Contract spec.