The question owners are asking in 2026
Every AI-HVAC procurement conversation in 2026 reduces to one question: where does the intelligence run? Cloud overlay, hardware-embedded, and edge-first are three deployment topologies that look interchangeable in a sales deck and completely different on an enterprise procurement checklist.
The answer matters more than it used to. Article 9 of the EU AI Act, which requires a documented risk-management system for high-risk AI systems, starts to apply on 2 August 2026, and EU-exposed tenders now pair that documentation burden with hard data-residency requirements. APAC tenants increasingly require PDPA-aligned in-country processing. And tight building control loops want sub-50ms latency, which a cloud round-trip with 200–800ms end-to-end inference latency cannot deliver. Where the intelligence runs is no longer a preference; it is a constraint that has to be defensible in procurement.
This piece is the architectural decomposition we use to scope every AI-HVAC pilot. We argue that one of the three topologies will become the default for most commercial buildings by 2027, and it is not the one most owners are currently being sold.
Architecture 1 — Cloud overlay
The dominant topology today. A cloud platform ingests BMS time-series, runs ML models in a hyperscaler region, and pushes recommendations back into the BMS via REST or BACnet/IP. BrainBox AI (now owned by Trane), JCI-Nantum, and most early-stage AI-HVAC overlays sit here.
The architectural advantages are real: rapid model iteration, no per-site compute spend, easy multi-building rollup, and shared inference economies. The architectural costs are also real and rising: 200–800ms end-to-end inference latency that breaks tight control loops, data-residency friction under EU AI Act and APAC PDPA regimes, per-site SaaS spend that accumulates over a 10-year asset hold, and a dependency on hyperscaler uptime for a control function inside the building.
| Dimension | Cloud overlay |
|---|---|
| Inference latency | 200–800ms (round-trip) |
| Per-site compute spend | $0 capex, $400–1,800/mo opex |
| Data residency | Hard problem; vendor-side fix or denied tender |
| Control-loop fit | Recommend-only; not safe for closed-loop |
| Procurement velocity | Fast — no on-site install |
| Default failure mode | Hyperscaler outage = building reverts to BMS baseline |
Cloud overlay will not disappear. It is the right answer for portfolio-level analytics, benchmarking, and any recommendation that does not need to close a control loop in real time. But it is the wrong answer for the actuation layer, and the field is starting to notice.
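The "reverts to BMS baseline" failure mode can be made explicit in whatever layer applies cloud recommendations. The following is a minimal Python sketch, not any vendor's implementation: the `Recommendation` record, the `choose_setpoint` helper, and the 15-minute staleness limit are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Recommendation:
    """A setpoint suggested by the cloud model (hypothetical record)."""
    setpoint_c: float   # suggested supply-air setpoint, in degrees Celsius
    issued_at_s: float  # time the recommendation was produced, in seconds


def choose_setpoint(
    rec: Optional[Recommendation],
    now_s: float,
    bms_baseline_c: float,
    max_age_s: float = 900.0,  # assumed staleness limit: 15 minutes
) -> float:
    """Use the cloud recommendation only while it is fresh.

    If no recommendation has arrived, or the latest one is older than
    max_age_s (for example during a hyperscaler outage), fall back to
    the setpoint the BMS would have used on its own.
    """
    if rec is None:
        return bms_baseline_c
    age_s = now_s - rec.issued_at_s
    if age_s < 0 or age_s > max_age_s:
        # Clock skew or a stale recommendation: do not trust it.
        return bms_baseline_c
    return rec.setpoint_c
```

The point of the design is that the cloud never holds the only copy of a safe setpoint: the building keeps running on its own baseline whenever the recommendation stream goes quiet.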
Architecture 2 — Hardware-embedded
The "intelligence ships with the box" topology. PassiveLogic is the canonical example, with its Quantum digital twin running on a dedicated controller inside the mechanical room. Honeywell Forge with Tridium Niagara sits in a related category: embedded compute, a vendor-defined data model, and deep integration with the OEM's actuator stack.
This topology solves the latency problem (sub-10ms control loops are feasible) and the data-residency problem (data never leaves the building unless the owner opts in). It also solves a subtler problem: the model knows the equipment, because the model was built by the people who built the equipment. The architectural cost is the trade you would expect: single-vendor lock-in, slow model iteration tied to firmware releases, and capex of $8,000–35,000 per controller depending on building scope.
| Dimension | Hardware-embedded |
|---|---|
| Inference latency | 2–10ms (local) |
| Per-site compute spend | $8K–35K capex, minimal opex |
| Data residency | Solved — data stays on-site |
| Control-loop fit | Closed-loop capable |
| Procurement velocity | Slow — firmware + commissioning |
| Default failure mode | Vendor lock-in; OEM dependency for upgrades |
The hardware-embedded answer dominates the high-end vertical market: pharmaceutical plants, semiconductor fabs, and mission-critical data center cooling. It is over-engineered for the typical commercial office, where the owner does not want to refit hardware every five years or be locked into a single OEM's model-improvement cadence.
Architecture 3 — Edge-first
The newest topology and the one most owners are not being sold yet. A small inference appliance (Raspberry Pi 5-class, $80–150 per site) runs a quantized open-weight model inside the building (Gemma 3 4B fits in roughly 2.5GB at 4-bit quantization; INT8 weights alone need about 4GB), talks to the BMS via local BACnet/IP, and escalates to cloud-class reasoning only when local confidence falls below a threshold. Microsoft Foundry Local and Google's edge Gemma release are the substrate that made this topology economical in late 2025; it was not feasible at this price point 18 months ago.
Edge-first inherits the latency win from hardware-embedded (sub-20ms is comfortable, sub-50ms is the worst case) and the data-residency win (raw data stays in-building, only escalated reasoning queries cross the boundary, and even those can be jurisdiction-tagged). It avoids the vendor lock-in of hardware-embedded, because the open-weight model can be swapped, retrained, or replaced without an OEM firmware release. And it avoids the round-trip latency and ongoing SaaS spend of cloud overlay.
| Dimension | Edge-first |
|---|---|
| Inference latency | 15–50ms (local) / 200–800ms (escalation, <30% of calls) |
| Per-site compute spend | $80–150 capex, $20–60/mo escalation cost |
| Data residency | Default-private; jurisdiction-tagged escalation |
| Control-loop fit | Closed-loop capable for local decisions; advisory on escalation |
| Procurement velocity | Fast — Pi-class appliance ships in days |
| Default failure mode | Appliance failure = degraded mode, BMS continues |
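The two latency figures in the table combine into a blended average that is worth computing before promising a latency budget. This is a small arithmetic sketch in Python; it treats the table's "<30% of calls" as a 30% ceiling and uses the ends of each range, and the function name is ours.

```python
def blended_latency_ms(
    local_ms: float, escalation_ms: float, escalation_share: float
) -> float:
    """Mean latency when a share of calls takes the escalation path."""
    if not 0.0 <= escalation_share <= 1.0:
        raise ValueError("escalation_share must be between 0 and 1")
    return (1.0 - escalation_share) * local_ms + escalation_share * escalation_ms


# Ends of the ranges from the table, at the 30% escalation ceiling.
best_case = blended_latency_ms(15, 200, 0.30)   # 0.7*15 + 0.3*200, about 70.5 ms
worst_case = blended_latency_ms(50, 800, 0.30)  # 0.7*50 + 0.3*800, about 275 ms
```

Even the best-case blended mean sits above a 50ms loop target. This is why the local path alone should close the control loop and the escalation path should stay advisory.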
The edge-first topology is what BEAST OS deploys when client constraints favor data residency, latency, or cost discipline — which is most of the time outside of a portfolio rollup. It is also what makes the CRE-EN privacy broker design defensible: raw data never leaves the building unless the owner opts in, and the opt-in is per-query rather than per-system.
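The escalation rule described above (act locally when confident, otherwise escalate only with a per-query opt-in and a jurisdiction tag) can be sketched in a few lines of Python. This is an illustration under assumptions, not the BEAST OS or CRE-EN implementation: the `LocalResult` and `Route` types, the `route` function, and the 0.8 threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalResult:
    action: str        # what the on-site model proposes
    confidence: float  # the model's own confidence, 0.0 to 1.0


@dataclass(frozen=True)
class Route:
    target: str        # "local", "escalate", or "hold"
    closed_loop: bool  # True only when the action may be actuated directly
    payload: dict


def route(
    result: LocalResult,
    summary: dict,
    jurisdiction: str,
    owner_opt_in: bool,
    threshold: float = 0.8,  # assumed confidence cut-off
) -> Route:
    """Decide where a single query is handled."""
    if result.confidence >= threshold:
        # Confident local decision: safe to close the loop on site.
        return Route("local", True, {"action": result.action})
    if not owner_opt_in:
        # No per-query opt-in: nothing crosses the building boundary,
        # and the BMS keeps running its existing sequence.
        return Route("hold", False, {})
    # Escalate a summary (not raw data), tagged with the jurisdiction it
    # must be processed in. The answer comes back as advice only.
    return Route(
        "escalate",
        False,
        {"jurisdiction": jurisdiction, "query": dict(summary)},
    )
```

For example, `route(LocalResult("hold_setpoint", 0.41), {"zone": "L3-east"}, "SG", owner_opt_in=True)` yields an `"escalate"` route whose payload carries the `"SG"` tag, while the same call with `owner_opt_in=False` yields `"hold"` with an empty payload.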
The architecture comparison table
| Dimension | Cloud overlay | Hardware-embedded | Edge-first |
|---|---|---|---|
| Latency | 200–800ms | 2–10ms | 15–50ms (escalation 200–800ms) |
| Per-site capex | $0 | $8K–35K | $80–150 |
| Ongoing opex | $400–1,800/mo | Minimal | $20–60/mo |
| EU AI Act Art.9 | Hard problem | Solved | Solved (default-private) |
| APAC PDPA | Hard problem | Solved | Solved |
| Closed-loop control | No (recommend-only) | Yes | Yes (local), No (escalation) |
| Vendor lock-in risk | Medium (SaaS contract) | High (OEM stack) | Low (open-weight swap) |
| Best for | Portfolio analytics | Vertical mission-critical | Default commercial AI-HVAC |
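The capex and opex rows are easier to compare over a 10-year hold than month by month. The Python sketch below does that arithmetic from the table's figures. It makes three simplifying assumptions that are ours, not the table's: "minimal" opex for hardware-embedded is treated as $0, costs are undiscounted, and installation labor and hardware refresh are left out.

```python
# (low, high) ranges taken from the comparison table, in US dollars.
TOPOLOGIES = {
    "cloud overlay": {"capex": (0, 0), "opex_month": (400, 1_800)},
    # Assumption: "minimal" opex is modelled as zero.
    "hardware-embedded": {"capex": (8_000, 35_000), "opex_month": (0, 0)},
    "edge-first": {"capex": (80, 150), "opex_month": (20, 60)},
}


def ten_year_cost(capex: int, opex_per_month: int, years: int = 10) -> int:
    """Undiscounted per-site cost: one-off capex plus monthly opex."""
    return capex + opex_per_month * 12 * years


def cost_range(name: str, years: int = 10) -> tuple:
    """Return the (low, high) per-site cost for one topology."""
    t = TOPOLOGIES[name]
    low = ten_year_cost(t["capex"][0], t["opex_month"][0], years)
    high = ten_year_cost(t["capex"][1], t["opex_month"][1], years)
    return low, high


if __name__ == "__main__":
    for name in TOPOLOGIES:
        low, high = cost_range(name)
        print(f"{name}: ${low:,} to ${high:,}")
```

Under these assumptions the 10-year per-site ranges are $48,000 to $216,000 for cloud overlay, $8,000 to $35,000 for hardware-embedded, and $2,480 to $7,350 for edge-first. Adding refresh cycles or commissioning labor would shift the numbers, so treat this as a starting template rather than a quote.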
The data-collection vs decision-loop split
The deepest design point in the architecture choice is rarely surfaced in vendor pitches: data collection and decision execution are different problems and can live in different topologies. The strongest 2026 architectures collect data in a cloud-friendly schema (for portfolio rollup, benchmarking, and offline model training) while executing real-time decisions at the edge or on hardware. The weakest 2026 architectures collapse both into the cloud topology and inherit all of its weaknesses at once: slow control loops, hard data-residency problems, and a single point of failure.
Owners should ask vendors a single forensic question: where does each decision actually execute, and on what hardware? A vendor that cannot answer on the spot is most likely selling a cloud-overlay product wrapped in edge-friendly marketing.
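One way to make that question concrete is to ask the vendor for a decision inventory and check it mechanically. The Python sketch below shows the idea. The `Decision` record, the `Location` enum, and the example entries are hypothetical and not drawn from any vendor's documentation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class Location(Enum):
    CLOUD = "cloud"
    EDGE = "edge"
    EMBEDDED = "embedded"


@dataclass(frozen=True)
class Decision:
    name: str              # e.g. "chilled-water reset"
    executes_on: Location  # where the decision is actually computed
    hardware: str          # what it runs on, as stated by the vendor
    closed_loop: bool      # True if it actuates without human review


def flag_cloud_actuation(decisions: Iterable[Decision]) -> List[str]:
    """Names of closed-loop decisions that depend on a cloud round-trip."""
    return [
        d.name
        for d in decisions
        if d.closed_loop and d.executes_on is Location.CLOUD
    ]


# Illustrative inventory; every entry here is made up.
inventory = [
    Decision("portfolio benchmarking", Location.CLOUD, "hyperscaler region", False),
    Decision("supply-air reset", Location.EDGE, "on-site appliance", True),
    Decision("chilled-water reset", Location.CLOUD, "hyperscaler region", True),
]
flagged = flag_cloud_actuation(inventory)  # ["chilled-water reset"]
```

Any name that comes back from `flag_cloud_actuation` is a control function that stops working when the network or the cloud region does, which is the case the split between data collection and decision execution is meant to avoid.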
Which one survives 2027
We expect a segmented market rather than a winner-take-all outcome. Cloud overlay survives for portfolio analytics and the largest REITs with sophisticated FP&A teams. Hardware-embedded survives for mission-critical verticals (pharma, fabs, data centers, hospitals). Edge-first becomes the default for the long-tail commercial market (office, retail, light industrial, multifamily, education), where data residency, capex discipline, and vendor optionality matter more than absolute peak performance.
Three forces drive that split: the EU AI Act's obligations phasing in through August 2026, which make cloud overlay a procurement liability for any EU-exposed portfolio; the maturing of open-weight models, which makes edge-first economically dominant for the long tail; and the cost-discipline reset every CFO is running on recurring SaaS spend, which compresses cloud-overlay margins.
What this connects to
- Open Protocol Moat — why edge-first only works on top of multi-vendor BMS/IoT infrastructure, and what it costs to retrofit a single-OEM building toward optionality.
- IPMVP Verification — the measurement contract that proves any of the three architectures actually delivers verified savings rather than recommended savings.
- EU AI Act Readiness — the procurement-side checklist that turns "where does the intelligence run" into a tender-defensible answer.
If you are about to commit to a cloud-overlay AI-HVAC contract in the next 90 days, the conversation worth having first is whether the same outcomes are reachable with an edge-first deployment at one-tenth the per-site spend and with a cleaner procurement posture. Bring the tender to /ask/ and we will walk through it.