The vendor deck says "20–30% HVAC energy savings." The FM director asks one follow-up question: "How did you measure that?" The conversation usually ends there.

AI-HVAC is the most-pitched category in commercial real estate right now. It's also the category with the weakest verification rigor. Most savings claims are benchmark ranges — "similar buildings saved X%" — not engineering-grade measurements against a baseline of what this specific building would have consumed without the optimization.

That gap matters more in 2026 than it did even last year. Enterprise AI procurement has tightened. Local Law 97 (LL97) fines start biting in NYC. Boards are asking CFOs to substantiate energy ROI the same way they substantiate revenue. Benchmark ranges don't survive that scrutiny.

This piece walks through the verification protocol we actually run on AI-HVAC deployments, how IPMVP Options C and D distinguish claims that will hold up in an audit from claims that won't, and what a simulation-graded savings number looks like compared to the range estimates most vendors publish.

Want the short version?

If a vendor can't tell you which IPMVP Option their savings number uses, what the baseline period was, and how they normalize for weather, the number isn't verified — it's marketing.

Ask our agent for an IPMVP baseline eligibility check on your specific building.

The Gap: Benchmark Ranges vs. Engineering-Grade Proof

Here's the distinction most operators miss when evaluating AI-HVAC pitches:

Benchmark range: "Buildings like yours typically see 18–25% HVAC energy savings with our platform." This is a statistical claim about a vendor's customer portfolio. It is not a claim about your building.

Engineering-grade proof: "This building consumed 1,842,000 kWh for HVAC over 12 months pre-intervention, weather-normalized to TMY3 data. Post-intervention, normalized consumption was 1,446,000 kWh. Net savings: 396,000 kWh (21.5%), verified under IPMVP Option C with 90% statistical confidence." This is a claim about your building, produced by an M&V protocol, and defensible in front of a utility auditor, a LL97 assessor, or a lender.

The difference isn't rhetorical. It's the difference between a number that can support a guaranteed savings contract and a number that can't.
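The arithmetic behind the engineering-grade claim above is simple enough to sketch. The function name here is illustrative, not part of any M&V tool; the inputs are the weather-normalized figures quoted in the example.

```python
def verified_savings(baseline_kwh: float, post_kwh: float) -> tuple[float, float]:
    """Return (absolute kWh saved, percent saved vs. the weather-normalized baseline)."""
    saved = baseline_kwh - post_kwh
    return saved, 100.0 * saved / baseline_kwh

# The figures from the engineering-grade claim above:
saved_kwh, saved_pct = verified_savings(1_842_000, 1_446_000)
print(f"Net savings: {saved_kwh:,.0f} kWh ({saved_pct:.1f}%)")
# prints: Net savings: 396,000 kWh (21.5%)
```

What the code can't supply is exactly what the protocol exists to supply: the defensibility of the two input numbers.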

How to Run an IPMVP Verification Protocol

The International Performance Measurement and Verification Protocol (IPMVP) is the standard reference for this work — used by ESPCs, DOE building programs, and virtually every serious energy services contract. It's not proprietary. Any vendor serious about verified savings can speak it fluently. If they can't, that's the first signal.

A full IPMVP-aligned AI-HVAC verification runs in four phases:

Phase 1 — Establish the Baseline (Months 0–12)

You need 12 months of weather-normalized HVAC consumption data before the AI intervention is deployed. Twelve months captures full seasonal variation and accounts for occupancy, schedule, and weather edge cases. The baseline is normalized against Typical Meteorological Year (TMY3) data — publicly available from NREL — so future comparisons remove weather as a confounding variable.
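One common way to implement the normalization step above is a degree-day regression: fit baseline consumption against heating and cooling degree days, then evaluate the fitted model at TMY3 degree-day values. This is a hedged sketch of that approach, not the only IPMVP-acceptable model; function names are illustrative.

```python
import numpy as np

def fit_degree_day_model(kwh, hdd, cdd):
    """Least-squares fit of the baseline: kWh ≈ a + b*HDD + c*CDD."""
    X = np.column_stack([np.ones_like(hdd), hdd, cdd])
    coeffs, *_ = np.linalg.lstsq(X, kwh, rcond=None)
    return coeffs  # [intercept, HDD slope, CDD slope]

def normalized_baseline(coeffs, tmy_hdd, tmy_cdd):
    """Baseline consumption evaluated under typical (TMY3) weather."""
    a, b, c = coeffs
    return a + b * tmy_hdd + c * tmy_cdd
```

Fitting on the 12 baseline months and evaluating at TMY3 degree days removes weather as a confounder, which is exactly what makes the pre/post comparison in later phases meaningful.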

If you don't have 12 months of sub-metered HVAC data, you have two options: wait and instrument, or model. Both are legitimate. Modeling is what IPMVP Option D is for.

Phase 2 — Select Option C or Option D

IPMVP offers four measurement options. For AI-HVAC retrofits, Options C and D are the relevant ones.

Option C — Whole-Facility Measurement. Use utility meters. Works when the AI intervention affects a large enough share of total consumption that savings are distinguishable from normal variation. Typical fit: buildings where HVAC is 40%+ of total electrical load. Statistical confidence threshold: 90%, per IPMVP.

Option D — Calibrated Simulation. Use a building energy model (typically EnergyPlus) calibrated to actual pre-intervention consumption. The model simulates what the building would have consumed without the AI. Savings = simulated consumption − measured consumption. Fit: new construction, retrofits where baseline data is incomplete, portfolios where sub-metering is inconsistent, or cases where savings are small relative to total load.

Option D is where the category is moving, because it delivers what Option C can't: the counterfactual. Under Option C, if weather or occupancy changes, you spend months arguing about normalization factors. Under Option D, the model replays the post-period with pre-period control logic and you have the counterfactual directly.

Phase 3 — Measurement and Reporting Cadence

Monthly reporting at minimum. Quarterly for enterprise stakeholder reporting. Every report includes: measured consumption, weather-normalized consumption, baseline comparison, savings, confidence interval, and any adjustments.

The cadence matters because AI-HVAC optimization isn't a set-and-forget intervention. Setpoints drift. Occupancy patterns shift. The optimization logic learns and re-tunes. Monthly M&V tells you whether the savings curve is holding, flattening, or degrading.
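The report contents listed above can be sketched as a record type, which also makes a useful checklist when reviewing a vendor's sample report. Field names here are illustrative, not a standard schema.

```python
from dataclasses import dataclass

@dataclass
class MonthlyMVReport:
    """One month of IPMVP-aligned M&V reporting, per the cadence above."""
    measured_kwh: float             # raw metered consumption
    weather_normalized_kwh: float   # consumption adjusted to TMY3 conditions
    baseline_kwh: float             # normalized baseline for the same month
    savings_kwh: float              # baseline minus normalized consumption
    ci_90_low_kwh: float            # 90% confidence bounds on savings
    ci_90_high_kwh: float
    adjustments: str                # any non-routine adjustments, documented
```

A report missing any of these fields is a signal that the vendor's cadence is marketing-driven rather than M&V-driven.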

Phase 4 — Weather and Occupancy Normalization

This is where most vendor "savings" claims fall apart under scrutiny. If you don't normalize for weather, a mild summer makes every AI-HVAC platform look like a genius. A cold winter makes every platform look like it broke the building.

Proper normalization uses: heating degree days (HDD), cooling degree days (CDD), humidity index, and occupancy hours against the baseline period's values. TMY3 data provides the 30-year typical reference. Some protocols also normalize against plug load and process load changes if the building's use mix has shifted materially.
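The degree-day inputs named above come straight from daily mean outdoor temperatures. This sketch uses the conventional 65 °F base; the base temperature is an assumption, and some protocols tune it per building.

```python
BASE_F = 65.0  # conventional balance-point assumption; often tuned per building

def degree_days(daily_mean_temps_f):
    """Return (HDD, CDD) summed over a list of daily mean temperatures in °F."""
    hdd = sum(max(0.0, BASE_F - t) for t in daily_mean_temps_f)
    cdd = sum(max(0.0, t - BASE_F) for t in daily_mean_temps_f)
    return hdd, cdd

# A cold day (30 °F) and a hot day (85 °F):
# degree_days([30, 85]) → (35.0, 20.0)
```

Computing the same quantities from TMY3's 30-year typical weather gives the reference values the baseline model is evaluated against.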

Vendor Savings Claim Methodology — 5-Row Comparison

We ran the same methodology question against four leading AI-HVAC platform categories. The differences are material and should change how you read pilot proposals.

Platform / Category   | Savings Claim Basis                  | IPMVP Option                           | Weather Normalized      | Audit-Defensible
BrainBox AI           | Portfolio benchmark range (15–25%)   | Not specified publicly                 | Partial (by case study) | Case-by-case
Trane ARIA            | Trane-verified case studies          | Option C typical                       | Yes, case studies       | Usually — tied to Trane service contract
Generic AI-HVAC SaaS  | Estimation from BMS data             | None                                   | No                      | No
75F                   | Pre/post utility comparison          | Option C (informal)                    | Typically yes           | Mid-tier — depends on contract
AISB (EnergyPlus-MCP) | Simulation-calibrated counterfactual | Option D primary + Option C validation | Yes, TMY3               | Yes — engineering-grade

A note on the AISB row: EnergyPlus-MCP is our integration layer wrapping the U.S. DOE's EnergyPlus simulation engine, the successor to DOE-2 and the reference building energy model used in ASHRAE 90.1 performance-path compliance and the majority of serious M&V work. The MCP wrapper makes the simulation agent-queryable in real time. An agent can ask "what would this building have consumed last month under the old control logic?" and get a calibrated answer.

What a Simulation-Graded Claim Actually Looks Like

Here's the difference between a benchmark range and a simulation-graded claim, using rounded numbers from a mid-market Class B office building (120,000 sq ft, Chicago, mixed-use HVAC retrofit):

Benchmark range (vendor deck):
"Expected HVAC energy savings: 18–25%."

Simulation-graded claim (Option D, 12-month post-period):
"EnergyPlus model calibrated to the 12-month baseline, with CVRMSE and NMBE reported against ASHRAE Guideline 14 thresholds, replayed over the post-period under pre-intervention control logic. Counterfactual consumption, measured consumption, and net savings itemized month by month, with a stated confidence interval and TMY3 weather adjustments."

Notice what's different. The simulation-graded claim tells you the engine, the validation statistics, the counterfactual, and the confidence interval. None of that is optional for an enterprise buyer or a utility rebate program. All of it is standard for IPMVP-aligned work.

The vendor who can produce this report doesn't need to defend a benchmark range. They're showing their work.

What to Ask Your Vendor (Before Signing the Pilot)

If you're evaluating AI-HVAC vendors right now, these five questions separate the serious from the ornamental:

  1. Which IPMVP Option will you use for savings verification? If the answer is "we don't use IPMVP," the savings number is not audit-grade.
  2. What is the baseline period and how will you weather-normalize? Anything short of 12 months with TMY3 normalization is going to leave disputes on the table.
  3. Can you deliver an Option D simulation counterfactual, or only Option C utility-meter comparison? Option D is what enables portfolio-scale M&V where utility data is inconsistent.
  4. What CVRMSE and NMBE does your calibrated model hit? ASHRAE Guideline 14 thresholds are CVRMSE ≤ 15% and NMBE ≤ ±5% for monthly data. If they can't quote these, they're not running calibrated simulation.
  5. Who signs the M&V report? A CMVP (Certified Measurement & Verification Professional) signature carries weight in audits. A vendor-produced marketing PDF doesn't.
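The calibration statistics in question 4 have standard formulations. The sketch below follows the common ASHRAE Guideline 14 form with a p = 1 degrees-of-freedom adjustment for monthly data; conventions vary slightly between references, so treat the exact denominators as an assumption to confirm against the vendor's method.

```python
import math

def cvrmse(measured, simulated, p=1):
    """Coefficient of variation of RMSE, in percent (lower is better)."""
    n = len(measured)
    ybar = sum(measured) / n
    sse = sum((m - s) ** 2 for m, s in zip(measured, simulated))
    return 100.0 * math.sqrt(sse / (n - p)) / ybar

def nmbe(measured, simulated, p=1):
    """Normalized mean bias error, in percent. Sign convention here:
    positive means the model under-predicts measured consumption."""
    n = len(measured)
    ybar = sum(measured) / n
    bias = sum(m - s for m, s in zip(measured, simulated))
    return 100.0 * bias / ((n - p) * ybar)
```

A model passing question 4 would show cvrmse ≤ 15 and abs(nmbe) ≤ 5 on the 12 baseline months. If a vendor can't produce these two numbers, they aren't running calibrated simulation.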

Why This Matters More in 2026

Three forces are turning verification rigor from a nice-to-have into a hard requirement, quarter by quarter:

LL97 enforcement. NYC's penalty schedule starts compounding. Building owners need verified energy reductions in their compliance filings. "Vendor-estimated savings" doesn't hold up in that paperwork.

Tightening enterprise AI procurement. The pattern we're seeing across REITs and enterprise owners: CFOs and chief sustainability officers now require third-party verified ROI for AI pilots over $250K. That closes the door on benchmark-range claims.

The bifurcation of the building market. Class A trophy assets command 18–22% rent premiums in NYC (CoStar Q1 2026). That premium is earned with demonstrable operational performance. A building with a simulation-graded M&V report tells a different story to a lender than a building with "vendor-reported estimates."

The AI-HVAC category is maturing out of its pilot era. Verification rigor is becoming the default expectation. The vendors and operators who can produce engineering-grade proof will pick up the premium work. The ones who can't will stay stuck in pilot purgatory.

Run an IPMVP baseline check on your building.

Before you commit to a vendor, find out which IPMVP Option fits your data availability, whether your building qualifies for Option D simulation, and what the verification report would look like.

Ask the Agent →