The Cost Challenge of AI in the Cloud

As artificial intelligence becomes a key driver of innovation, cloud platforms are increasingly used to deploy large language models, fine-tune machine learning models, and serve real-time inference. These workloads, while transformative, often come with highly variable and opaque costs.

Unlike traditional cloud resources (such as virtual machines or storage), AI services introduce complex, multidimensional billing models. These include token-based pricing for model usage, training-hour charges, GPU allocation, managed endpoint runtime, and even request-latency premiums. These factors render traditional FinOps practices insufficient without critical adaptation. One analysis notes that for LLMs, “the cost of submitting an identical query to two different models can vary by a factor of up to 100.” A hundredfold price difference for the same query shows just how volatile AI pricing has become.
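
To make that spread concrete, here is a minimal Python sketch of token-based cost estimation. The model names and per-million-token rates are illustrative placeholders, not actual provider prices.

```python
# Hypothetical per-1M-token rates in USD. Real prices vary by provider,
# model, and region: always check the provider's pricing page.
RATES = {
    "small-model":    {"input": 0.15,  "output": 0.60},
    "frontier-model": {"input": 15.00, "output": 60.00},
}

def estimate_query_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a single request under token-based pricing."""
    rate = RATES[model]
    return (input_tokens / 1_000_000) * rate["input"] + \
           (output_tokens / 1_000_000) * rate["output"]

# The same 2,000-token-in / 500-token-out query, priced against two models:
for model in RATES:
    print(f"{model}: ${estimate_query_cost(model, 2_000, 500):.6f}")
# With these illustrative rates, the identical query costs 100x more on
# the frontier model, which is exactly the spread the analysis describes.
```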

The Problem: Complex, Multi-Dimensional Billing Models

AI services don’t just run; they consume, scale, and bill differently.

Where a FinOps team may be used to tagging compute resources or grouping usage by workload or application, AI billing tends to evade such clarity:

  • Token-Based Billing: Services like Azure OpenAI or OpenAI’s API charge based on input and output tokens, a nonstandard unit compared with CPU-hours or gigabytes stored.
  • Training vs. Inference: Platforms like AWS SageMaker and Google Vertex AI charge separately for training jobs and inference endpoints, often on different SKUs.
  • GPU Consumption in Shared Environments: In Kubernetes clusters with GPUs, it is difficult to attribute exact usage and cost back to teams or services unless tools like Kubecost are finely tuned (a simple attribution sketch follows this list).
  • Model Lifecycle Costs: Hosting a model, tuning it, storing artifacts, and updating weights each carries distinct billing implications.
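
As referenced above, the core of GPU cost attribution is prorating a shared bill by measured usage. The sketch below uses hypothetical team names, GPU-hours, and spend; in practice a tool like Kubecost derives the usage figures from cluster metrics.

```python
# Hypothetical monthly GPU spend for a shared cluster and the GPU-hours
# each team consumed (in practice, derived from cluster metrics).
MONTHLY_GPU_SPEND = 42_000.00  # USD, illustrative
GPU_HOURS_BY_TEAM = {"search": 1_200, "recsys": 800, "ml-platform": 400}

def prorate_gpu_cost(total_cost: float, usage: dict[str, float]) -> dict[str, float]:
    """Split a shared GPU bill proportionally to each team's GPU-hours."""
    total_hours = sum(usage.values())
    return {team: total_cost * hours / total_hours for team, hours in usage.items()}

for team, cost in prorate_gpu_cost(MONTHLY_GPU_SPEND, GPU_HOURS_BY_TEAM).items():
    print(f"{team}: ${cost:,.2f}")
```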

Traditional FinOps Blind Spots

These gaps are not only technical but organizational. Teams using AI often operate outside standard cloud engineering practices, relying on notebooks, APIs, and managed services that don’t align with existing tagging schemes or FinOps policy enforcement.

| FinOps Concern | Traditional Cloud | AI/ML Workloads |
| --- | --- | --- |
| Cost Allocation | Tags, resource groups | Tokens, models, pipelines, endpoints |
| Forecasting | Predictable growth trends | Usage bursts, training spikes, scaling inference |
| Optimization | RI/SP purchases, right-sizing | Spot GPU usage, batching inference requests |
| Reporting | SKU-based dashboards | Requires model- and token-specific cost telemetry |
| Accountability | App/team owners via tagging | Shared models, composite pipelines |
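
The Optimization row above mentions batching inference requests. The sketch below illustrates the basic pattern of grouping prompts so each API call amortizes per-request overhead; batch_infer is a placeholder for whatever batch endpoint your platform actually provides, and the batch size is arbitrary.

```python
from typing import Iterator

def chunked(items: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield fixed-size batches of prompts."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def batch_infer(prompts: list[str]) -> list[str]:
    """Placeholder for a real batch-inference call (provider-specific)."""
    return [f"completion for: {p}" for p in prompts]

prompts = [f"prompt {n}" for n in range(100)]
results: list[str] = []
for batch in chunked(prompts, batch_size=16):
    # One call per 16 prompts instead of 100 single calls: fewer
    # round-trips and, on many platforms, cheaper batch SKUs.
    results.extend(batch_infer(batch))
```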

Cloud+: A Broader FinOps Frontier

Recognizing the need to evolve, the FinOps Foundation introduced the Cloud+ concept: an expansion of FinOps into adjacent domains such as AI, containers, and SaaS. Cloud+ emphasizes cross-functional collaboration to tackle usage domains where traditional cloud FinOps models don’t apply cleanly.

Tools and Evolving Standards

Meanwhile, tools and initiatives from cloud providers and the open source community are stepping in:

  • Microsoft FinOps Toolkit: Offers support for token-level reporting in Azure OpenAI.
  • Kubecost: Enables GPU usage visibility for ML workloads in Kubernetes (see the Allocation API sketch after this list).
  • AWS, Google, Azure: Each is enhancing their AI platforms to expose more granular billing metrics for training, inference, and model lifecycle.
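
For example, Kubecost’s Allocation API can break GPU cost out by namespace. Below is a minimal sketch, assuming Kubecost has been port-forwarded to localhost:9090; the endpoint, parameters, and response fields (such as gpuCost) follow Kubecost’s documented Allocation API, but verify them against your version.

```python
import requests

# Assumes `kubectl port-forward` has exposed the Kubecost service locally.
KUBECOST = "http://localhost:9090"

resp = requests.get(
    f"{KUBECOST}/model/allocation",
    params={"window": "7d", "aggregate": "namespace"},
    timeout=30,
)
resp.raise_for_status()

# The response holds one allocation set per time window; each entry maps
# an aggregate name (here, a namespace) to its cost breakdown.
for allocation_set in resp.json().get("data", []):
    for namespace, alloc in allocation_set.items():
        print(f"{namespace}: gpuCost=${alloc.get('gpuCost', 0):.2f}")
```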

However, tracking AI costs effectively isn’t just about tooling. It’s about shifting how organizations design, deploy, and observe AI workloads in ways that align with financial accountability and business value. As FinOps practices mature, they must extend visibility and accountability into AI domains.
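
One concrete way to build that visibility is to emit cost telemetry on every model call, tagged the same way cloud resources are tagged. A minimal sketch follows; the rate table, team label, and logging destination are all hypothetical stand-ins for your own billing data and observability stack.

```python
import json
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("ai-cost-telemetry")

# Illustrative per-1M-token rates; substitute your provider's real prices.
RATE_PER_M = {"input": 2.50, "output": 10.00}

def record_llm_call(team: str, model: str, input_tokens: int, output_tokens: int) -> None:
    """Emit a structured cost event so AI spend can be attributed like tagged infra."""
    cost = (input_tokens * RATE_PER_M["input"]
            + output_tokens * RATE_PER_M["output"]) / 1_000_000
    log.info(json.dumps({
        "ts": time.time(),
        "team": team,  # the same allocation key used for cloud tagging
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "estimated_cost_usd": round(cost, 6),
    }))

record_llm_call("checkout", "example-model", input_tokens=1_800, output_tokens=420)
```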


References

  • FinOps Foundation: Cloud+ Scopes
  • Azure OpenAI Pricing
  • Google Cloud Vertex AI Pricing
  • AWS SageMaker Pricing
  • Kubecost: GPU Optimization