For two decades, the cost comparison between on-premises infrastructure, colocation, and public cloud followed a tidy narrative arc. CapEx-heavy on-prem deployments lost ground to OpEx-friendly cloud subscriptions, colocation occupied a pragmatic middle, and the trajectory of enterprise IT pointed in one direction. The arrival of sustained, high-throughput AI workloads has scrambled that story badly enough that the same companies that spent the past decade migrating out of their own data halls are now signing colo contracts and writing eight-figure checks for GPU clusters.
The fundamentals haven’t been repealed. They’ve been reweighted.
The Steady-State Workload Still Belongs Off Cloud
For predictable, always-on enterprise workloads, on-prem and colocation continue to win on five-year TCO, and the margin isn’t subtle. A widely cited model from Italian provider Criticalcase pegged cloud as roughly 20% cheaper than on-prem over a three-year horizon for variable workloads, but more recent analyses for steady-state mid-market profiles point the other way. TerraZone’s five-year scenario for a representative workload (200 vCPUs, 200TB of storage, 20TB of monthly egress) put cloud at nearly $171,000 per year against an on-prem figure roughly half that, with hardware depreciation modeled at $28,000 annually on a $140,000 capital base.
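Run as arithmetic, the gap is easy to reproduce. A minimal sketch of that five-year comparison: the cloud figure and the depreciation schedule are TerraZone’s; the on-prem operating line is an assumed placeholder for power, space, support, and licenses, chosen to illustrate the shape of the model rather than taken from their analysis.

```python
# Five-year TCO sketch using the TerraZone figures cited above.
# The on-prem opex line is an ASSUMED placeholder, not from their model.

YEARS = 5

cloud_annual = 171_000          # 200 vCPUs, 200TB storage, 20TB/mo egress

on_prem_capital = 140_000       # capital base, straight-line depreciation
depreciation_annual = on_prem_capital / YEARS   # = $28,000/year
on_prem_opex_annual = 55_000    # ASSUMED: power, space, support, licenses

cloud_tco = cloud_annual * YEARS
on_prem_tco = (depreciation_annual + on_prem_opex_annual) * YEARS

print(f"Cloud 5-yr TCO:   ${cloud_tco:,.0f}")       # $855,000
print(f"On-prem 5-yr TCO: ${on_prem_tco:,.0f}")     # $415,000
print(f"On-prem as share of cloud: {on_prem_tco / cloud_tco:.0%}")  # 49%
```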
Egress remains the silent killer in those models. Cross-zone traffic, inter-region replication, and outbound data fees compound in ways that rarely surface in initial migration business cases. Singapore-based Accrets noted in a March analysis that cloud wins TCO for most mid-market organizations running 50 to 150 users, “but not by the margin the marketing materials suggest,” and that the staffing line is the variable most often quietly omitted from on-prem cost models.
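On the egress side specifically, the compounding is easy to underestimate because the rates are tiered and the traffic grows. A toy calculation for the 20TB-per-month profile above, using illustrative per-GB rates in the neighborhood of published hyperscaler internet-egress pricing; the exact tiers and rates here are assumptions, and cross-zone and inter-region fees would stack on top.

```python
# Illustrative tiered egress cost for 20TB/month of outbound traffic.
# Tier breaks and rates are ASSUMED for illustration, not a price quote.

TIERS = [             # (tier ceiling in GB, $/GB)
    (10_240, 0.09),   # first 10TB
    (51_200, 0.085),  # next 40TB
]

def monthly_egress_cost(gb: float) -> float:
    cost, floor = 0.0, 0.0
    for ceiling, rate in TIERS:
        if gb <= floor:
            break
        cost += (min(gb, ceiling) - floor) * rate
        floor = ceiling
    return cost

monthly = monthly_egress_cost(20 * 1024)  # 20TB/month
print(f"Egress: ${monthly:,.0f}/month, ${monthly * 12:,.0f}/year")
```

Call it roughly $1,800 a month, or over $21,000 a year, for outbound traffic alone, before a single cross-zone packet is billed.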
Accrets’ staffing caveat cuts both ways. The OpEx of running infrastructure includes engineers who know how to run infrastructure, and a generation of cloud-first teams hired into AWS-native environments doesn’t convert to bare-metal operations for free. Repatriation projects routinely run dual environments for months, and tooling, monitoring, and observability stacks need to span both. The Barclays Q4 2024 CIO survey found 86% of CIOs planned to move some workloads back from public cloud, the highest figure ever recorded, but execution lags intent: Flexera’s 2025 State of the Cloud Report puts workloads actually repatriated at around 21%. Intent and execution are not the same line item.
AI Has Broken the Curve
The serious shift in 2026 is at the GPU end of the stack, and the math is brutal enough that it deserves the specifics.
An AWS p5.48xlarge instance (8x H100) runs on the order of $22,600 per month on-demand, or roughly $270,000 annually for sustained use. Lenovo’s 2026 TCO whitepaper, which compared ThinkSystem configurations with NVIDIA Hopper and Blackwell GPUs against equivalent hyperscale instances, found a breakeven point under four months for high-utilization workloads and argued for an 18x cost advantage per million tokens versus model-as-a-service APIs over a five-year lifecycle. The Lenovo numbers come from a vendor with an obvious axe to grind, but the directional finding lines up with independent analyses: Deloitte research cited in a Swfte report concluded organizations can hit 60% to 70% of cloud costs running on-prem at scale, with H100 prices settling in the $25,000 to $30,000 range and next-generation B100 and H200 silicon coming in around $40,000 to $50,000 per unit.
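The breakeven arithmetic is simple enough to sanity-check, and worth doing because it is acutely sensitive to the inputs. A sketch using the cloud rate cited above and the $25,000-to-$30,000 H100 range: the non-GPU node capital and the owned-node operating cost are assumptions, and Lenovo’s sub-four-month figure implies materially more favorable inputs than the midpoint case here.

```python
# Breakeven sensitivity for an owned 8x H100 node vs. the cloud rate above.
# Non-GPU capital and monthly opex are ASSUMED; the cited sub-four-month
# breakeven implies more favorable inputs than these.

CLOUD_MONTHLY = 22_600    # 8x H100 instance at sustained use, per above

for gpu_price in (25_000, 27_500, 30_000):
    node_capital = 8 * gpu_price + 80_000  # ASSUMED $80K chassis/CPU/network
    owned_opex = 4_000                     # ASSUMED monthly power/colo/support
    months = node_capital / (CLOUD_MONTHLY - owned_opex)
    print(f"GPU @ ${gpu_price:,}: capital ${node_capital:,}, "
          f"breakeven {months:.0f} months")
```

Even the pessimistic end of that range lands well inside a GPU refresh cycle, which is why the directional finding survives the vendor discount.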
The corollary, less convenient for the on-prem case, is that cloud AI API pricing has fallen faster than hardware. Competitive pressure, distillation, speculative decoding, and quantization improvements are dropping inference costs 20% to 30% annually. That raises the utilization threshold required to justify owning silicon, particularly for workloads that aren’t pinned at 80%+ duty cycles. The healthcare case Swfte profiled, a $2.1 million on-prem deployment sitting at 30% utilization with two FTEs babysitting it, is a recognizable cautionary tale rather than an outlier. The company eventually pivoted to hybrid and cut total AI infrastructure costs by 45%.
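The effect of falling API prices on that calculus can be made explicit. In the toy model below, every input is an illustrative assumption: an amortized monthly cost for owned hardware, a token throughput at full utilization, and an API price declining 25% per year (the midpoint of the range above). The output is the utilization an owned node must sustain to beat the API.

```python
# How falling API prices raise the utilization bar for owned hardware.
# All inputs are ASSUMED for illustration.

owned_monthly = 12_000    # amortized capital + opex, $/month (ASSUMED)
tokens_full_util = 4_000  # millions of tokens/month at 100% duty (ASSUMED)
api_price = 5.00          # $/million tokens today (ASSUMED)

for year in range(5):
    p = api_price * (0.75 ** year)  # 25% annual price decline
    required_util = owned_monthly / (tokens_full_util * p)
    note = " -- owning never pays at these inputs" if required_util > 1 else ""
    print(f"Year {year}: API ${p:.2f}/Mtok -> "
          f"breakeven utilization {min(required_util, 1.0):.0%}{note}")
```

A node that cleared the bar at 60% utilization in year one needs 80% a year later, which is exactly the trap the 30%-utilization healthcare deployment fell into.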
Colocation Is Winning the Middle
Colocation is the placement that 2026 keeps surfacing for AI workloads, and the reasons are operational as much as financial. High-density power, liquid cooling, and the interconnect proximity required for multi-rack GPU clusters are not realistic line items for most enterprise facilities, which were never designed around 50kW to 100kW cabinets. Colo providers, particularly those that retrofitted to support direct liquid cooling, are absorbing capacity that would otherwise have flowed to hyperscalers.
The price points, per the Lenovo modeling, run roughly $1,500 per month for a high-density rack and $600 per month for standard-density, with cooling power around $0.18 per kWh for air-cooled and $0.09 per kWh for liquid-cooled environments at $0.12 commercial electricity rates. Those numbers vary widely by market (Northern Virginia, Phoenix, and Atlanta all price differently), but the pattern holds: colocation lets enterprises capture most of the unit-economics advantage of owning hardware without the capital project of building a facility, and without the lead times currently quoted for hyperscale GPU capacity, which have stretched into multi-year territory in tight markets.
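As a worked example of those unit economics, here is a monthly cost sketch for a 50kW cabinet. One interpretive assumption is worth flagging loudly: the cited cooling figures are read here as a cooling adder per kWh of IT load, layered on top of the $0.12 commercial power rate, which is one plausible reading of the Lenovo numbers rather than the only one.

```python
# Monthly cost sketch for a 50kW high-density colo cabinet.
# Cooling figures are read as a per-IT-kWh adder on top of the
# commercial power rate -- an interpretive ASSUMPTION.

RACK_RENT = 1_500   # $/month, high-density rack (per the figures above)
POWER_RATE = 0.12   # $/kWh commercial electricity
IT_LOAD_KW = 50     # sustained cabinet draw
HOURS = 730         # hours per month

for label, cooling_rate in (("air-cooled", 0.18), ("liquid-cooled", 0.09)):
    kwh = IT_LOAD_KW * HOURS
    total = RACK_RENT + kwh * (POWER_RATE + cooling_rate)
    print(f"{label}: {kwh:,.0f} kWh -> ${total:,.0f}/month")
```

On that reading, the liquid-cooled cabinet saves on the order of $3,000 a month at identical load, which is one reason retrofitted facilities are the ones absorbing the GPU demand.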
Capacity, in fact, is now a TCO input in its own right. When a high-power cage takes 18 to 36 months to provision and a GPU shipment slips two quarters, the cost of waiting becomes a real number on the spreadsheet.
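One way to put waiting on the spreadsheet is to price the bridge: renting cloud capacity at the rates above while the cage is built out. The owned-equivalent monthly cost here is an assumed placeholder.

```python
# Cost of waiting: the premium paid to rent cloud capacity while a colo
# cage or GPU shipment is provisioned. Owned-equivalent cost is ASSUMED.

cloud_monthly = 22_600   # rented 8x H100 equivalent during the wait
owned_monthly = 10_000   # ASSUMED: amortized owned/colo cost once live

for delay_months in (6, 12, 18, 24):
    premium = delay_months * (cloud_monthly - owned_monthly)
    print(f"{delay_months}-month delay: ${premium:,.0f} bridge premium")
```

An 18-month provisioning delay at these inputs is a roughly $227,000 line item, which is enough to change which colo bid wins.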
What to Actually Compare
The honest TCO comparison in 2026 isn’t on-prem versus colo versus cloud as a categorical question. It’s a workload-by-workload allocation across a hybrid estate, modeled over a 12-to-36-month horizon with the line items that first-pass spreadsheets reliably miss: egress and cross-zone traffic, reserved-instance commitments versus on-demand exposure, parallel-run costs during migration, staffing for both the source and destination environments, power and cooling escalators, refresh cycles (sooner for GPUs than for general-purpose compute), data sovereignty and compliance overhead, and the option value of being able to move again later. A skeletal version of that model appears below.
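The skeleton treats those line items as explicit fields rather than footnotes. Every numeric default is a placeholder; the point is the shape of the model, not the outputs.

```python
# Workload-by-workload placement model over a 36-month horizon.
# All numeric inputs are ASSUMED placeholders.

from dataclasses import dataclass

@dataclass
class PlacementCosts:
    compute_monthly: float         # reserved/on-demand or amortized hardware
    egress_monthly: float          # outbound + cross-zone traffic
    staffing_monthly: float        # engineers for this environment
    power_cooling_monthly: float   # zero for cloud; escalates for owned
    parallel_run_months: int = 0   # dual-running during migration
    parallel_run_monthly: float = 0.0

    def tco(self, months: int = 36) -> float:
        recurring = (self.compute_monthly + self.egress_monthly +
                     self.staffing_monthly + self.power_cooling_monthly)
        return (recurring * months +
                self.parallel_run_months * self.parallel_run_monthly)

cloud = PlacementCosts(14_000, 1_800, 3_000, 0)
colo = PlacementCosts(8_000, 200, 6_000, 2_500,
                      parallel_run_months=4, parallel_run_monthly=14_000)

for name, placement in (("cloud", cloud), ("colo", colo)):
    print(f"{name}: ${placement.tco():,.0f} over 36 months")
```

Refresh cycles, compliance overhead, and the option value of mobility don’t fit in four fields, but even this skeleton makes the dual-run and staffing lines, the items most often quietly omitted, impossible to ignore.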
The workloads that belong in public cloud haven’t changed much: elastic, bursty, geographically distributed, and consumption-shaped. The workloads that belong on-prem or in colo have grown, particularly anything involving sustained AI inference, regulated data under DORA or sector-specific U.S. frameworks, and steady-state enterprise applications whose growth curves are predictable enough to capitalize. Gartner’s projection that 40% of enterprises will run hybrid compute architectures for mission-critical workflows by year-end 2026, up from 8% in prior years, is less a prediction than a description of what’s already underway in the buying patterns colos are seeing.
The watch items for the back half of 2026: how fast cloud AI inference pricing keeps falling against Blackwell-era hardware costs, whether colo GPU capacity in Tier 1 markets loosens or tightens further, and how aggressively hyperscalers respond with bring-your-own-license, sovereign-region, and hybrid pricing constructs designed to keep workloads from leaving. The repatriation trend is real. It is also slower and more selective than the survey numbers suggest, and the companies that get the financial advantage are the ones doing the placement math workload by workload, not at the portfolio level.