92% of enterprise CRE teams have piloted AI in the last 18 months. About 5% have shipped it into production. The gap is not a model problem. It is a methodology problem — and it has a name.

That data point comes from Patrick Slumbers' April 2026 deck on agentic AI in commercial real estate, anchored to NAIOP and Visitt enterprise survey work. It lines up with Goldman Sachs reporting agentic AI driving 20–35% productivity gains in pilots that actually deploy. Two numbers. Same dataset. Different sides of one decision.

Most CRE teams sit on the wrong side of it.

What "pilot to production" actually means

"Pilot" is cheap. You spin up a vendor's demo on three buildings, watch a dashboard for a quarter, and write a memo. "Production" is expensive — not in dollars, but in operational discipline. Production means the AI's recommendations are wired into work orders, into maintenance schedules, into capital plans. It means an FM director will defend the savings number to a board, and the M&V protocol behind it has to hold up.

Most CRE pilots fail the second test. They generate signal. Nobody owns the response loop.

The teams that ship are the ones who decide, before the pilot starts, what an "in-production" success state looks like — and budget the methodology to verify it.

The five things that separate the 5% from the 92%

| Decision dimension | Pilot-only teams (the 92%) | Deployed teams (the 5%) |
|---|---|---|
| Baseline rigor | Vendor-supplied baseline; "trust me" | IPMVP-anchored baseline (Option B, C, or D); CV(RMSE) disclosed before the pilot starts |
| Decision authority | AI surfaces signal; humans review every action | AI is bounded to recommend or auto-execute within a defined safe envelope (e.g., setpoint adjustments inside the ASHRAE 55 comfort band) |
| Integration depth | Read-only overlay on BMS / dashboard tab | Wired into CMMS work-order flow, capex planning, and the tenant ticket lifecycle |
| Verification cadence | Annual report from vendor | Monthly IPMVP M&V refresh; quarterly board-grade savings statement |
| Exit criteria | "It seems to be working" | Pre-defined thresholds for kill, scale, or recompete, documented at pilot start |

Read that table once and the pattern is obvious. The 5% are not running smarter AI. They are running the same AI inside a tighter operational frame.
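The baseline-rigor row turns on one statistic: CV(RMSE), the coefficient of variation of the root-mean-square error between the baseline model and metered data. As a rough sketch (the readings below are hypothetical, and the degrees-of-freedom correction is the form common in M&V practice, not a quote from any one standard), disclosing it before the pilot starts looks like this:

```python
import math

def cv_rmse(measured, predicted, n_params=0):
    """Coefficient of variation of RMSE, in percent.

    measured / predicted: baseline-period energy readings vs. the
    baseline model's predictions for the same periods.
    n_params: baseline-model parameter count, used in the
    degrees-of-freedom correction common in M&V practice.
    """
    n = len(measured)
    sse = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    rmse = math.sqrt(sse / (n - n_params))
    mean = sum(measured) / n
    return 100.0 * rmse / mean

# Hypothetical monthly kWh totals vs. a weather-normalized baseline model.
measured = [100.0, 110.0, 95.0, 105.0, 120.0, 90.0]
predicted = [102.0, 108.0, 97.0, 103.0, 118.0, 92.0]
print(round(cv_rmse(measured, predicted), 2))  # → 1.94
```

The point of the table row is not the arithmetic; it is that this number, and the threshold it must stay under, are written down before the AI ever runs.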

Why the methodology gap is widening — not closing

Three forces are working against the average CRE team right now.

First, vendor proliferation. AI-focused PropTech VC funding hit $16.7B in 2025, up 67.9% year over year. Every vendor is spinning up an "AI building" pitch. Most of them ship a dashboard plus an LLM, not a closed-loop methodology. The pilot market is loud. The verification market is quiet.

Second, the "expert in the loop" gap. ICSC Exchange 2026 reporting puts the industry-standard OpEx win at 20–30% via AI-driven optimization. PassiveLogic claims ≥30% via physics-anchored digital twins. BrainBox / BuildingIQ / BGRID benchmarks land at 20–40%. None of these vendors is actively marketing a verification methodology. The 6–18 month window before the language consolidates is wide open.

Third, nobody owns the deployment loop. CIOs see software. CFOs see capex. FMs see operations. An AI-building deployment cuts across all three, and most enterprises lack a playbook to assign ownership cleanly. Pilots stall in committee. Production never starts.

What "pilot to production" looks like in practice

The deployment-grade methodology has four invariants. None of them are about the model. All of them are about the operational frame the model lives inside.

  1. Pre-pilot baseline contract. Before the AI runs, the M&V protocol is signed: which IPMVP option, which CV(RMSE) threshold, which sub-meter coverage, which weather normalization. If the vendor will not commit to it on paper, do not start. (See our IPMVP verification framework for the full template.)
  2. Decision-authority envelope. Define exactly what the AI is allowed to do unsupervised. "Adjust supply-air setpoints inside ASHRAE 55 comfort band" is a clean envelope. "Recommend HVAC changes" is not — it shifts the cost of action onto a human who will not have time. The envelope is the difference between a tool and a teammate.
  3. Open-protocol substrate. The pilot lives on a BMS / sensor stack the team controls. If the AI vendor's "differentiation" is locked into their proprietary protocol, the moat is on the wrong side of the table. (We covered this last week in why open-protocol BMS is now table stakes.)
  4. Production exit criteria. Three numbers, set before the pilot starts: scale threshold (e.g., "≥15% verified savings sustained for 6 months → expand to 12 buildings"), kill threshold ("if CV(RMSE) drifts above 15% for two months → suspend"), recompete threshold ("if a competing vendor lands ≥5pp better verified savings on a comparable site → rebid"). Without the three numbers, the pilot will live forever in “promising” status.

What this means for buyers right now

If you are evaluating an AI building vendor in Q2 2026, the credibility filter is no longer the model. It is whether the vendor can show you a deployment-grade methodology before they show you the AI. The 92% / 5% gap is the signal. Vendors who cannot produce a written M&V contract, an envelope of decision authority, and three exit numbers are selling a pilot, not a deployment.

That is most of the market.

The teams that ship are picking the methodology first and the model second. They are also paying for the verification work — because the alternative is paying for it later, in a board meeting, with a number they cannot defend.

How to use the agent

If you are mid-pilot and unsure whether your protocol is deployment-grade, describe it to the agent and ask: "Does this M&V plan satisfy IPMVP Option C requirements for an AI-HVAC pilot? What is missing?" The answer will reference the IPMVP Core Concepts, the gaps in your sub-meter coverage, and what to fix before the pilot generates a savings claim you have to defend.

Three places to start — depending on where you are in the cycle:

Try the agent

Describe a building, a pilot, or a vendor. Get a verified, IPMVP-anchored answer in under 60 seconds.

Ask the agent →

Or see our Enterprise tier for portfolio-scale deployment with embedded M&V verification.


This is intelligence, not a sales pitch. If your vendor cannot produce a written M&V contract, an envelope of decision authority, and three exit numbers — you have a pilot, not a deployment. The 5% know this. The 92% are still finding out.