July 4, 2026

AI Costs Are Crashing. Bills Are Exploding.

Here is something the market is not pricing correctly.

AI inference is getting cheaper at a speed that has almost no precedent in the history of technology. GPT-4 launched in March 2023 at $30 per million input tokens. Today, GPT-5.4 sits at $2.50 per million input tokens. That is a 12x reduction in a little over three years. DeepSeek V3.2 runs at $0.14 per million input tokens on its own API. The cost curve is deflationary by any definition. And if you read those numbers in isolation, you might conclude that the companies selling AI compute are about to get crushed.

Wrong direction. Keep reading.

The paradox is this: unit costs are collapsing while total bills explode, and it is the most confusing thing happening in enterprise finance right now. Stanford HAI found that the inference cost for GPT-3.5-level performance fell roughly 280x in two years. Yet the same enterprises watching token prices collapse are seeing their monthly AI bills multiply. Gartner estimates that enterprise generative AI spending alone accounts for $127 billion of total AI spend in 2026, growing 59% year-over-year. The average enterprise is spending 1.7% of revenue on AI in 2026, more than double the 0.8% of 2025.

How does this happen? The answer is agentic AI. A simple chatbot query triggers one LLM inference call. An agentic workflow, where an autonomous AI agent reasons iteratively, breaks down a task, calls tools, verifies outputs, and self-corrects, may trigger 10 to 20 LLM calls to complete a single user-initiated task. Cheaper per token. Far more tokens per task. Total cost goes up.
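The arithmetic behind that last point can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not a measured result: the 2,000 tokens per call and the choice of 15 calls (a midpoint of the 10-to-20 range) are hypothetical, and the function name is made up for this sketch. The $2.50 price is the GPT-5.4 input price quoted above.

```python
def cost_per_task(calls: int, tokens_per_call: int, price_per_million: float) -> float:
    """Dollar cost of one user-initiated task, counting input tokens only."""
    return calls * tokens_per_call * price_per_million / 1_000_000


TOKENS_PER_CALL = 2_000   # hypothetical average, for illustration only
PRICE = 2.50              # $ per million input tokens (GPT-5.4, as quoted)

chatbot = cost_per_task(1, TOKENS_PER_CALL, PRICE)    # one inference call
agent = cost_per_task(15, TOKENS_PER_CALL, PRICE)     # assumed midpoint of 10 to 20 calls

print(f"chatbot query: ${chatbot:.4f}")
print(f"agentic task:  ${agent:.4f}")
print(f"multiplier:    {agent / chatbot:.0f}x")
```

The sketch holds tokens per call constant. In practice an agent tends to carry more context forward with each step, so the real multiplier may be higher than the call count alone suggests.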

Gartner has flagged this dynamic: cheaper tokens will not automatically translate to cheaper enterprise AI, because agentic models require far more tokens per task than standard models, and total consumption can grow faster than unit prices fall. It is not a pricing problem. It is an architecture problem. And most enterprise finance teams have not yet built the cost models to account for it.
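One way to frame the cost model finance teams are missing: the bill is price times volume, so it rises whenever token consumption grows by a larger factor than the price falls. The sketch below uses the 12x GPT-4 to GPT-5.4 price drop quoted earlier. The three consumption-growth scenarios and the function name are hypothetical, chosen only to show both sides of the break-even.

```python
def bill_ratio(price_drop: float, tokens_growth: float) -> float:
    """New bill divided by old bill.

    price_drop:    factor by which the per-token price fell (12 means 12x cheaper)
    tokens_growth: factor by which total tokens consumed grew
    """
    return tokens_growth / price_drop


PRICE_DROP = 30 / 2.50  # GPT-4 to GPT-5.4 input price, as quoted: 12x

# Hypothetical consumption scenarios, for illustration only.
for tokens_growth in (5, 12, 40):
    ratio = bill_ratio(PRICE_DROP, tokens_growth)
    print(f"tokens x{tokens_growth:>2}: bill x{ratio:.2f}")
```

Anything above the break-even (12x more tokens in this case) means a bigger bill despite the cheaper unit price.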

The Uber story makes this concrete. Uber CTO Praveen Neppalli Naga confirmed to The Information that the company exhausted its entire planned 2026 AI coding tools budget by April, just four months into the year. Claude Code adoption across its roughly 5,000-engineer organization jumped from 32% of engineers in February to 84% classified as agentic coding users by March. By spring, 70% of committed code at Uber was originating from AI tools, and roughly 11% of live backend updates were being written by AI agents with no human in the loop. From a productivity standpoint, the rollout worked. From a finance standpoint, the budget was gone. Uber has since implemented a $1,500 monthly per-employee cap on agentic coding tools including Claude Code and Cursor. COO Andrew Macdonald said publicly that he still cannot draw a direct line between the company’s rising AI spend and concrete new consumer features. Whether that line exists is the right question. And it is one that CFOs across the industry are now starting to ask.
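A back-of-the-envelope check on those Uber figures, as a sketch only. It assumes the cap applies to all of the roughly 5,000 engineers and that every one of them hits it every month, which the reporting does not claim, so the first number is a ceiling rather than an estimate of actual spend. It also assumes the budget was burned at a constant rate.

```python
MONTHLY_CAP_USD = 1_500    # per-employee cap, from the article
ENGINEERS = 5_000          # "roughly 5,000", from the article
MONTHS_TO_EXHAUST = 4      # annual budget gone by April

# Upper bound only: assumes every engineer hits the cap in every month.
annual_ceiling = MONTHLY_CAP_USD * ENGINEERS * 12

# A 12-month budget spent in 4 months at a constant rate implies 3x the planned pace.
run_rate_vs_plan = 12 / MONTHS_TO_EXHAUST

print(f"annual ceiling under the cap: ${annual_ceiling:,}")
print(f"implied run rate vs. plan:    {run_rate_vs_plan:.1f}x")
```

Because adoption ramped sharply between February and March, the late-period burn rate was likely above the simple average this sketch computes.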

The part that gets skipped: this is actually good news for certain infrastructure plays, and a real problem for others. Let’s separate them.

The winners from surging total consumption are not hard to identify. Microsoft, Amazon, Google, and Meta are collectively projected to spend roughly $725 billion on AI infrastructure in 2026 alone, up 77% from approximately $410 billion in 2025. To put individual numbers on it: Amazon is guiding to roughly $200 billion, Microsoft near $190 billion, Google at $175 to $185 billion, and Meta at $115 to $135 billion. The hyperscaler capex cycle is being driven by aggregate demand growth, not unit economics. More usage at lower prices per token still means more revenue for the pick-and-shovel layer.

Slight tangent, but it matters: Broadcom’s Q1 FY2026 AI semiconductor revenue came in at $8.4 billion, up 106% year-over-year. The company guided Q2 AI revenue to $10.7 billion, a 140% year-over-year increase. CEO Hock Tan has stated Broadcom has line of sight to achieve AI chip revenue in excess of $100 billion in fiscal 2027, backed by a $73 billion backlog of committed orders. That is not a company in danger from falling token prices. That is a company printing revenue from volume growth.

The risk that nobody is modeling correctly is on the application side. For two years, the industry operated on an unspoken assumption: frontier AI capability is expensive, and you pay a premium for the best models because the best models are scarce. That assumption is eroding fast.

Here is where the benchmark data gets uncomfortable for the closed-model incumbents. As of late June 2026, DeepSeek V4-Pro-Max scores 80.6% on SWE-bench Verified, the standard benchmark for real-world software engineering tasks. GPT-5.5 from OpenAI scores 88.7% and Claude Opus 4.8 scores 88.6% on the same benchmark. The frontier closed models still lead on raw capability. But DeepSeek V4-Flash prices output at $0.28 per million tokens, while GPT-5.5 runs $30 per million output tokens and Claude Opus 4.8 runs $25. That is roughly a 90x to 107x cost difference for a model family that covers the vast majority of real-world coding tasks. For teams running agentic loops at volume, that gap is not a marginal consideration. It is a completely different product economics conversation.
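The size of that gap can be checked directly from the quoted output prices. The monthly workload of 500 million output tokens below is a hypothetical figure, included only to turn the ratio into dollar terms. Note that this compares price alone and ignores the capability difference shown in the benchmark scores.

```python
OUTPUT_PRICE_PER_MILLION = {   # $ per million output tokens, as quoted
    "DeepSeek V4-Flash": 0.28,
    "GPT-5.5": 30.00,
    "Claude Opus 4.8": 25.00,
}

baseline = OUTPUT_PRICE_PER_MILLION["DeepSeek V4-Flash"]

# Price of each closed model as a multiple of the DeepSeek V4-Flash price.
ratios = {
    name: price / baseline
    for name, price in OUTPUT_PRICE_PER_MILLION.items()
    if name != "DeepSeek V4-Flash"
}

MONTHLY_OUTPUT_TOKENS = 500_000_000  # hypothetical workload, for illustration only

for name, price in OUTPUT_PRICE_PER_MILLION.items():
    monthly = MONTHLY_OUTPUT_TOKENS / 1_000_000 * price
    multiple = ratios.get(name, 1.0)
    print(f"{name:<18} ${monthly:>9,.0f}/month  ({multiple:.0f}x)")
```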

That pricing gap is a direct challenge to the revenue models of closed-model AI providers, and an indirect challenge to the assumption that massive GPU clusters are always the optimal answer.

Here’s where I’m at on the investment framing. The deflationary pressure on per-token pricing does not kill the AI infrastructure trade. Inference demand is growing fast enough to absorb it. The volume growth is outrunning the price decline, at least for now. What it does do is create real pressure on the middle of the stack: the application-layer AI companies that built their business models on the assumption that frontier model access would remain expensive and scarce. That moat is narrowing.
