92% of enterprise CRE teams have piloted AI in the last 18 months. About 5% have shipped it into production. The gap is not a model problem. It is a methodology problem — and it has a name.

That data point comes from Patrick Slumbers' April 2026 deck on agentic AI in commercial real estate, anchored to NAIOP and Visitt enterprise survey work. It lines up with Goldman Sachs reporting agentic AI driving 20–35% productivity gains in pilots that actually deploy. Two numbers. Same dataset. Different sides of one decision.

Most CRE teams sit on the wrong side of it.

What "pilot to production" actually means

"Pilot" is cheap. You spin up a vendor's demo on three buildings, watch a dashboard for a quarter, and write a memo. "Production" is expensive — not in dollars, but in operational discipline. Production means the AI's recommendations are wired into work orders, into maintenance schedules, into capital plans. It means an FM director will defend the savings number to a board, and the M&V protocol behind it has to hold up.

Most CRE pilots fail the second test. They generate signal. Nobody owns the response loop.

The teams that ship are the ones who decide, before the pilot starts, what an "in-production" success state looks like — and budget the methodology to verify it.

The five things that separate the 5% from the 92%

| Decision dimension | Pilot-only teams (the 92%) | Deployed teams (the 5%) |
|---|---|---|
| Baseline rigor | Vendor-supplied baseline; "trust me" | IPMVP-anchored baseline (Option B, C, or D); CV(RMSE) disclosed before the pilot starts |
| Decision authority | AI surfaces signal; humans review every action | AI is bounded to recommend or auto-execute within a defined safe envelope (e.g., setpoint adjustments inside the ASHRAE 55 comfort band) |
| Integration depth | Read-only overlay on BMS / dashboard tab | Wired into CMMS work-order flow, capex planning, and the tenant ticket lifecycle |
| Verification cadence | Annual report from vendor | Monthly IPMVP M&V refresh; quarterly board-grade savings statement |
| Exit criteria | "It seems to be working" | Pre-defined thresholds for kill, scale, or recompete, documented at pilot start |

Read that table once and the pattern is obvious. The 5% are not running smarter AI. They are running the same AI inside a tighter operational frame.
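The baseline-rigor row turns on one statistic: CV(RMSE), the coefficient of variation of the root-mean-square error between the baseline model and metered data. As a rough sketch (the readings below are hypothetical, and the degrees-of-freedom correction is the form common in M&V practice, not a quote from any one standard), disclosing it before the pilot starts looks like this:

```python
import math

def cv_rmse(measured, predicted, n_params=0):
    """Coefficient of variation of RMSE, in percent.

    measured / predicted: baseline-period energy readings vs. the
    baseline model's predictions for the same periods.
    n_params: baseline-model parameter count, used in the
    degrees-of-freedom correction common in M&V practice.
    """
    n = len(measured)
    sse = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    rmse = math.sqrt(sse / (n - n_params))
    mean = sum(measured) / n
    return 100.0 * rmse / mean

# Hypothetical monthly kWh totals vs. a weather-normalized baseline model.
measured = [100.0, 110.0, 95.0, 105.0, 120.0, 90.0]
predicted = [102.0, 108.0, 97.0, 103.0, 118.0, 92.0]
print(round(cv_rmse(measured, predicted), 2))  # → 1.94
```

The point of the table row is not the arithmetic; it is that this number, and the threshold it must stay under, are written down before the AI ever runs.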

Why the methodology gap is widening — not closing

Three forces are working against the average CRE team right now.

First, vendor proliferation. AI-focused PropTech VC funding hit $16.7B in 2025, up 67.9% year over year. Every vendor is spinning up an "AI building" pitch. Most of them ship a dashboard plus an LLM, not a closed-loop methodology. The pilot market is loud. The verification market is quiet.

Second, the "expert in the loop" gap. ICSC Exchange 2026 reporting puts the industry-standard OpEx win at 20–30% via AI-driven optimization. PassiveLogic claims ≥30% via physics-anchored digital twins. BrainBox / BuildingIQ / BGRID benchmarks land at 20–40%. None of these vendors is actively marketing a verification methodology. The 6–18 month window before the language consolidates is wide open.

Third, nobody owns the deployment loop. CIOs see software. CFOs see capex. FMs see operations. An AI-building deployment cuts across all three, and most enterprises lack a playbook to assign ownership cleanly. Pilots stall in committee. Production never starts.

What "pilot to production" looks like in practice

The deployment-grade methodology has four invariants. None of them are about the model. All of them are about the operational frame the model lives inside.

  1. Pre-pilot baseline contract. Before the AI runs, the M&V protocol is signed: which IPMVP option, which CV(RMSE) threshold, which sub-meter coverage, which weather normalization. If the vendor will not commit to it on paper, do not start. (See our IPMVP verification framework for the full template.)
  2. Decision-authority envelope. Define exactly what the AI is allowed to do unsupervised. "Adjust supply-air setpoints inside ASHRAE 55 comfort band" is a clean envelope. "Recommend HVAC changes" is not — it shifts the cost of action onto a human who will not have time. The envelope is the difference between a tool and a teammate.
  3. Open-protocol substrate. The pilot lives on a BMS / sensor stack the team controls. If the AI vendor's "differentiation" is locked into their proprietary protocol, the moat is on the wrong side of the table. (We covered this last week in why open-protocol BMS is now table stakes.)
  4. Production exit criteria. Three numbers, set before the pilot starts: scale threshold (e.g., "≥15% verified savings sustained for 6 months → expand to 12 buildings"), kill threshold ("if CV(RMSE) drifts above 15% for two months → suspend"), recompete threshold ("if a competing vendor lands ≥5pp better verified savings on a comparable site → rebid"). Without the three numbers, the pilot will live forever in “promising” status.

What this means for buyers right now

If you are evaluating an AI building vendor in Q2 2026, the credibility filter is no longer the model. It is whether the vendor can show you a deployment-grade methodology before they show you the AI. The 92% / 5% gap is the signal. Vendors who cannot produce a written M&V contract, an envelope of decision authority, and three exit numbers are selling a pilot, not a deployment.

That is most of the market.

The teams that ship are picking the methodology first and the model second. They are also paying for the verification work — because the alternative is paying for it later, in a board meeting, with a number they cannot defend.

How to use the agent

If you are mid-pilot and unsure whether your protocol is deployment-grade, describe it to the agent and ask: "Does this M&V plan satisfy IPMVP Option C requirements for an AI-HVAC pilot? What is missing?" The answer will reference the IPMVP Core Concepts, the gaps in your sub-meter coverage, and what to fix before the pilot generates a savings claim you have to defend.

Three places to start — depending on where you are in the cycle:

Try the agent

Describe a building, a pilot, or a vendor. Get a verified, IPMVP-anchored answer in under 60 seconds.

Ask the agent →

Or see our Enterprise tier for portfolio-scale deployment with embedded M&V verification.


This is intelligence, not a sales pitch. If your vendor cannot produce a written M&V contract, an envelope of decision authority, and three exit numbers — you have a pilot, not a deployment. The 5% know this. The 92% are still finding out.