Cloud, Hardware, or Edge — Three AI-HVAC Architectures and the One That Will Survive 2027

The question owners are asking in 2026

Every AI-HVAC procurement conversation in 2026 reduces to a single question: where does the intelligence run? The candidates are cloud overlay, hardware-embedded, and edge-first, three deployment topologies that look interchangeable in a sales deck and completely different on an enterprise procurement checklist.

The answer matters more than it used to. Article 9 of the EU AI Act takes effect on 2 August 2026 with hard data-residency and risk-management documentation obligations. APAC tenants increasingly require PDPA-aligned in-country processing. And tight building control loops want sub-50ms latency, which a 200–800ms cloud round-trip cannot deliver. The "where does the intelligence run" question is no longer a preference; it is a procurement-defensible constraint.

This piece is the architectural decomposition we use to scope every AI-HVAC pilot. We argue that one of the three topologies will dominate by 2027 — and it is not the one most owners are currently being sold.

Architecture 1 — Cloud overlay

The dominant topology today. A cloud platform ingests BMS time-series, runs ML models in a hyperscaler region, and pushes recommendations back into the BMS via REST or BACnet/IP. BrainBox AI (now owned by Trane), JCI-Nantum, and most early-stage AI-HVAC overlays sit here.

The architectural advantages are real: rapid model iteration, no per-site compute spend, easy multi-building rollup, and shared inference economies. The architectural costs are also real and getting more expensive: 200-800ms end-to-end inference latency that breaks tight control loops, hard data-residency frictions under EU AI Act and APAC PDPA regimes, ongoing per-site SaaS spend that compounds over a 10-year asset hold, and a fundamental dependency on hyperscaler uptime for a control function inside the building.

Dimension	Cloud overlay
Inference latency	200–800ms (round-trip)
Per-site compute spend	$0 capex, $400–1,800/mo opex
Data residency	Hard problem; vendor-side fix or denied tender
Control-loop fit	Recommend-only; not safe for closed-loop
Procurement velocity	Fast — no on-site install
Default failure mode	Hyperscaler outage = building reverts to BMS baseline

Cloud overlay will not disappear. It is the right answer for portfolio-level analytics, benchmarking, and any recommendation that does not need to close a control loop in real time. But it is the wrong answer for the actuation layer, and the field is starting to notice.
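
The failure mode in the table (outage means the building reverts to the BMS baseline) can be sketched as a timeout wrapper around the recommendation call. This is a minimal illustration, not any vendor's implementation: `setpoint_with_fallback`, `slow_cloud_call`, the setpoint values, and the timeouts are all hypothetical names and numbers chosen for the example.

```python
import concurrent.futures as cf
import time


def setpoint_with_fallback(fetch_recommendation, baseline_c, timeout_s):
    """Return (setpoint, source); fall back to the BMS baseline on timeout or error."""
    pool = cf.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fetch_recommendation)
    try:
        # Blocks for at most timeout_s waiting on the cloud call.
        return future.result(timeout=timeout_s), "cloud"
    except Exception:
        # Timeout, network error, or bad response: keep the baseline.
        return baseline_c, "bms-baseline"
    finally:
        # Do not wait for a hung call; the worker thread finishes on its own.
        pool.shutdown(wait=False)


def slow_cloud_call():
    time.sleep(0.3)  # simulates a stalled round-trip (hypothetical)
    return 19.5


print(setpoint_with_fallback(lambda: 21.0, 22.0, timeout_s=1.0))
print(setpoint_with_fallback(slow_cloud_call, 22.0, timeout_s=0.05))
```

The design point is that the baseline is always available locally, so the cloud call can only ever improve on it, never block it.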

Architecture 2 — Hardware-embedded

The "intelligence ships with the box" topology. PassiveLogic is the canonical example, with its Quantum digital twin (a product name, not quantum computing) running on a dedicated controller inside the mechanical room. Honeywell Forge with Tridium Niagara sits in a related category: embedded compute, a vendor-defined data model, and deep integration with the OEM's actuator stack.

This topology solves the latency problem (sub-10ms control loops are feasible) and the data-residency problem (data never leaves the building unless the owner opts in). It also solves a subtler problem: the model knows the equipment, because the model was built by the people who built the equipment. The architectural cost is the trade you would expect: single-vendor lock-in, slow model iteration cycles tied to firmware releases, and capex per controller that ranges from $8,000 to $35,000 depending on the building scope.

Dimension	Hardware-embedded
Inference latency	2–10ms (local)
Per-site compute spend	$8K–35K capex, minimal opex
Data residency	Solved — data stays on-site
Control-loop fit	Closed-loop capable
Procurement velocity	Slow — firmware + commissioning
Default failure mode	Vendor lock-in; OEM dependency for upgrades

The hardware-embedded answer dominates the high-end vertical market — pharmaceutical, semiconductor fabs, mission-critical data center cooling. It is over-engineered for the typical commercial office, where the building owner does not want to refit hardware every five years and does not want to be locked into a single OEM's model-improvement cadence.

Architecture 3 — Edge-first

The newest topology and the one most owners are not being sold yet. A small inference appliance (Raspberry Pi 5-class, $80–150 per site) runs a quantized open-weight model inside the building, talks to the BMS via local BACnet/IP, and only escalates to cloud-class reasoning when local confidence falls below a threshold. Footprint depends on the quantization: Gemma 3 4B needs roughly 4GB for weights at INT8 and about 2GB at 4-bit, so a 2.5GB budget implies 4-bit rather than INT8. Microsoft Foundry Local and Google's edge Gemma release are the substrate that made this topology economical in late 2025; it was not feasible at this price point 18 months ago.
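
A back-of-envelope check on the quantized footprint: weight memory is roughly parameter count times bits per weight. The sketch below assumes a round 4 billion parameters (the published count for a "4B" model differs slightly) and ignores KV cache, activations, and runtime overhead, so real usage will be higher.

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 4e9  # assumption: round figure for a 4B-class model

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_memory_gb(N_PARAMS, bits):.1f} GB")
```

On these assumptions, 8-bit weights alone need about 4 GB and 4-bit weights about 2 GB, which is the number to compare against the appliance's RAM.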

Edge-first inherits the latency win from hardware-embedded (sub-20ms is comfortable, sub-50ms is the worst case) and the data-residency win (raw data stays in-building, only escalated reasoning queries cross the boundary, and even those can be jurisdiction-tagged). It avoids the vendor lock-in of hardware-embedded, because the open-weight model can be swapped, retrained, or replaced without an OEM firmware release. And it avoids the round-trip latency and ongoing SaaS spend of cloud overlay.
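
The confidence-threshold escalation can be sketched as a small router. Everything here is illustrative: `decide`, `Decision`, the stub models, and the 0.8 threshold are hypothetical, and a real deployment would tune the threshold per loop. The one rule the sketch encodes from the text is that local decisions may actuate while escalated answers stay advisory.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Decision:
    setpoint_c: float
    source: str    # "local" or "escalated"
    actuate: bool  # True only for local, closed-loop decisions


def decide(
    local_model: Callable[[dict], Tuple[float, float]],
    escalate: Callable[[dict], float],
    reading: dict,
    threshold: float = 0.8,  # hypothetical confidence cut-off
) -> Decision:
    """Use the local answer when confident; otherwise ask the cloud for advice."""
    setpoint, confidence = local_model(reading)
    if confidence >= threshold:
        return Decision(setpoint, "local", actuate=True)
    # Low confidence: the slower cloud answer is advisory, not actuated.
    return Decision(escalate(reading), "escalated", actuate=False)


# Stub models standing in for the on-site and cloud-class reasoners.
def confident_local(reading):
    return 21.5, 0.93


def unsure_local(reading):
    return 21.5, 0.41


def cloud_advice(reading):
    return 20.0


print(decide(confident_local, cloud_advice, {"zone_temp_c": 23.1}))
print(decide(unsure_local, cloud_advice, {"zone_temp_c": 23.1}))
```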

Dimension	Edge-first
Inference latency	15–50ms (local) / 200–800ms (escalation, <30% of calls)
Per-site compute spend	$80–150 capex, $20–60/mo escalation cost
Data residency	Default-private; jurisdiction-tagged escalation
Control-loop fit	Closed-loop capable for local decisions; advisory on escalation
Procurement velocity	Fast — Pi-class appliance ships in days
Default failure mode	Appliance failure = degraded mode, BMS continues

The edge-first topology is what BEAST OS deploys when client constraints favor data residency, latency, or cost discipline — which is most of the time outside of a portfolio rollup. It is also what makes the CRE-EN privacy broker design defensible: raw data never leaves the building unless the owner opts in, and the opt-in is per-query rather than per-system.
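
Per-query opt-in with jurisdiction tagging can be illustrated with a small gate in front of the escalation path. This is a hypothetical sketch, not the CRE-EN broker itself: `EscalationQuery`, `release_region`, and the region names are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class EscalationQuery:
    summary: str        # derived features only, never raw trend data
    jurisdiction: str   # tag for where the building sits, e.g. "EU"
    owner_opt_in: bool  # consent recorded for this query, not the whole system


def release_region(query: EscalationQuery, regions: Dict[str, str]) -> Optional[str]:
    """Return the processing region the query may go to, or None to keep it on-site."""
    if not query.owner_opt_in:
        return None  # no consent for this query: nothing leaves the building
    # Unknown jurisdictions also return None, so the default is private.
    return regions.get(query.jurisdiction)


REGIONS = {"EU": "eu-region", "SG": "sg-region"}  # hypothetical mapping

print(release_region(EscalationQuery("chiller COP drift", "EU", True), REGIONS))
print(release_region(EscalationQuery("chiller COP drift", "EU", False), REGIONS))
```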

The architecture comparison table

Dimension	Cloud overlay	Hardware-embedded	Edge-first
Latency	200–800ms	2–10ms	15–50ms (escalation 200–800ms)
Per-site capex	$0	$8K–35K	$80–150
Ongoing opex	$400–1,800/mo	Minimal	$20–60/mo
EU AI Act Art.9	Hard problem	Solved	Solved (default-private)
APAC PDPA	Hard problem	Solved	Solved
Closed-loop control	No (recommend-only)	Yes	Yes (local), No (escalation)
Vendor lock-in risk	Medium (SaaS contract)	High (OEM stack)	Low (open-weight swap)
Best for	Portfolio analytics	Vertical mission-critical	Default commercial AI-HVAC
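
The capex and opex rows can be rolled into a per-site cost over a 10-year hold. The sketch below uses the table's ranges directly and makes two simplifying assumptions: "Minimal" opex for hardware-embedded is taken as $0, and there is no discounting, hardware refresh, or price escalation.

```python
HOLD_MONTHS = 120  # 10-year asset hold

# (low, high) per-site figures in USD, taken from the comparison table.
COSTS = {
    "cloud overlay": {"capex": (0, 0), "opex_mo": (400, 1800)},
    "hardware-embedded": {"capex": (8000, 35000), "opex_mo": (0, 0)},  # assumption: "minimal" = 0
    "edge-first": {"capex": (80, 150), "opex_mo": (20, 60)},
}


def tco_range(capex, opex_mo, months=HOLD_MONTHS):
    """Undiscounted (low, high) total cost over the hold period."""
    low = capex[0] + opex_mo[0] * months
    high = capex[1] + opex_mo[1] * months
    return low, high


for name, c in COSTS.items():
    low, high = tco_range(c["capex"], c["opex_mo"])
    print(f"{name}: ${low:,} to ${high:,}")
```

On these inputs, cloud overlay comes to $48,000–216,000 per site, hardware-embedded to $8,000–35,000, and edge-first to $2,480–7,350.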

The data-collection vs decision-loop split

The deepest design point in the architecture choice is rarely surfaced in vendor pitches: data collection and decision execution are different problems and can live in different topologies. The strongest 2026 architectures collect data in a cloud-friendly schema (for portfolio rollup, benchmarking, and offline model training) while executing real-time decisions at the edge or on hardware. The weakest 2026 architectures collapse both into the cloud topology and inherit the worst of both worlds — slow control loops, hard data-residency problems, and a single point of failure.

Owners should ask vendors a single forensic question: where does each decision actually execute, and on what hardware? A vendor that cannot answer that question on the spot is selling a cloud-overlay product wrapped in edge-friendly marketing.
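
One way to make that question concrete is a decision inventory: list each decision, where it executes, and how quickly its loop needs an answer, then flag any decision whose topology cannot meet the deadline. The worst-case latencies below come from the comparison table; the decision names and deadlines are hypothetical examples.

```python
from dataclasses import dataclass
from typing import List

# Worst-case latency per topology in milliseconds, from the comparison table.
WORST_CASE_MS = {"cloud": 800, "hardware": 10, "edge": 50}


@dataclass
class DecisionPath:
    name: str
    executes_on: str    # "cloud", "hardware", or "edge"
    deadline_ms: float  # how quickly the loop needs an answer


def misplaced(paths: List[DecisionPath]) -> List[str]:
    """Names of decisions whose worst-case latency exceeds their deadline."""
    return [p.name for p in paths if WORST_CASE_MS[p.executes_on] > p.deadline_ms]


inventory = [
    DecisionPath("damper position loop", "cloud", 50),              # hypothetical
    DecisionPath("chiller staging", "edge", 1_000),                 # hypothetical
    DecisionPath("portfolio benchmarking", "cloud", 3_600_000),     # hypothetical
]

print(misplaced(inventory))
```

A vendor who can fill in this inventory on the spot has answered the question; a vendor who cannot is unlikely to know where the latency risk sits.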

Which one survives 2027

We expect a segmented market rather than a winner-take-all outcome. Cloud overlay survives for portfolio analytics and the largest REITs with sophisticated FP&A teams. Hardware-embedded survives for vertical mission-critical (pharma, fabs, data centers, hospitals). Edge-first becomes the default for the long-tail commercial market (office, retail, light industrial, multifamily, education), where data residency, capex discipline, and vendor optionality matter more than absolute peak performance.

Three forces drive that split: the EU AI Act enforcement window opening in August 2026, which makes cloud overlay a procurement liability for any EU-exposed portfolio; the open-weight model maturity curve, which makes edge-first economically dominant for the long tail; and the cost-discipline reset every CFO is running on per-site SaaS spend, which compresses cloud-overlay margins.

What this connects to

Open Protocol Moat — why edge-first only works on top of multi-vendor BMS/IoT infrastructure, and what it costs to retrofit a single-OEM building toward optionality.
IPMVP Verification — the measurement contract that proves any of the three architectures actually delivers verified savings rather than recommended savings.
EU AI Act Readiness — the procurement-side checklist that turns "where does the intelligence run" into a tender-defensible answer.

If you are about to commit to a cloud-overlay AI-HVAC contract in the next 90 days, the conversation worth having first is whether the same outcomes are reachable with an edge-first deployment at one-tenth the per-site spend and a cleaner procurement posture. Bring the tender to /ask/ and we will walk it.