A cloud architect reviews an AWS bill with an engineering manager: "We're paying the on-demand rate for EC2 instances that have run continuously for eight months. If we switched to 1-year Reserved Instances, we'd save 40%. For workloads whose instance types may change, we should use Savings Plans — they're more flexible than RIs." What is the difference between Reserved Instances and Savings Plans in cloud billing?
Reserved Instances (RIs): a billing commitment to run a specific instance type in a specific region for 1 or 3 years, with discounts of 30-72% vs on-demand. Types: Standard RI — instance family, region, and term are locked in; deepest discount. Convertible RI — can be exchanged for a different instance family during the term; smaller discount.

Savings Plans: a commitment to a fixed $/hour spend (e.g., $10/hr), applied flexibly across instance types and regions. Types: Compute Savings Plans — most flexible; apply to EC2, Fargate, and Lambda. EC2 Instance Savings Plans — less flexible (fixed instance family per region) but a higher discount.

Cloud cost terminology: On-Demand pricing — pay per hour/second with no commitment; most expensive per unit. Spot Instances (AWS) / Preemptible VMs (GCP) — unused capacity sold at a 60-90% discount; can be reclaimed with only 2 minutes' notice on AWS (30 seconds on GCP). PAYG (Pay-As-You-Go) — the Azure equivalent of on-demand. Unit economics — cost per transaction, per user, or per GB processed. Cloud spend — total cloud infrastructure cost. TCO (Total Cost of Ownership) — comparing cloud vs on-premises, including people, hardware, and facilities.

In conversation: "We converted our always-on dev environments to Savings Plans and cut the bill by $8,000/month without changing a line of code."
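The arithmetic behind the architect's 40% claim can be sketched in a few lines. This is a minimal illustration; the hourly rate below is a made-up example, not a real AWS price:

```python
# Illustrative comparison of on-demand vs. 1-year Reserved Instance cost.
# The hourly rate and the 40% discount are example figures, not real AWS prices.

HOURS_PER_MONTH = 730  # average hours in a month (8,760 hours / 12)

def monthly_cost(hourly_rate: float, hours: int = HOURS_PER_MONTH) -> float:
    """Cost of one always-on instance for a month at a given hourly rate."""
    return hourly_rate * hours

on_demand_rate = 0.17                     # $/hr, hypothetical instance
ri_discount = 0.40                        # the 40% saving from the scenario
ri_rate = on_demand_rate * (1 - ri_discount)

od = monthly_cost(on_demand_rate)         # ~$124.10/month
ri = monthly_cost(ri_rate)                # ~$74.46/month
print(f"On-demand: ${od:.2f}/mo, 1-yr RI: ${ri:.2f}/mo, saving ${od - ri:.2f}/mo")
```

The point of the exercise: for an instance that has already run continuously for eight months, the commitment carries little risk, so the discount is close to pure saving.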
A FinOps engineer presents to leadership: "Right now we can see our total cloud bill, but we can't break it down by team, product, or environment. I'm proposing a tagging strategy: every resource gets tagged with team, product, environment, and cost-centre. Then we can allocate costs and do showback." What is showback and how does it differ from chargeback?
Showback: teams see their allocated cloud costs in reports and dashboards for awareness and accountability, but no money actually changes hands. Builds a cost-aware culture without org-chart complexity. Chargeback: teams' cloud costs are transferred to their budget/P&L. Creates hard financial accountability — teams feel the direct pain of waste.

Cloud cost allocation vocabulary: Cost allocation tags — metadata key-value pairs attached to cloud resources (e.g., Team=backend, Env=production, CostCentre=CC-001); they enable filtering in billing dashboards. Tag governance — enforcing tagging policies; untagged resources = unallocatable spend. Cost centre — an internal accounting unit that accrues spend. Shared cost allocation — shared infrastructure (networking, monitoring) split across teams by a formula (e.g., proportional to compute spend).

FinOps vocabulary: Unit economics — a metric like cost-per-API-call or cost-per-active-user. Cloud waste — idle resources that still cost money (stopped EC2 instances with EBS volumes attached, idle load balancers, unused Elastic IPs). Rightsizing — shrinking oversized resources to match actual usage. FinOps Foundation — the industry body defining FinOps practices; FinOps phases: Inform → Optimise → Operate.

In conversation: "We did showback for 6 months to educate teams, then switched to chargeback — teams immediately started rightsizing their oversized instances."
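The core of a showback report is just grouping billing line items by tag and summing. A minimal sketch, where the resource IDs, tags, and costs are all hypothetical:

```python
# Minimal showback sketch: group billing line items by their Team tag
# and sum cost per team. All resource names, tags, and costs are made up.
from collections import defaultdict

line_items = [
    {"resource": "i-0a1", "cost": 412.50, "tags": {"Team": "backend", "Env": "production"}},
    {"resource": "i-0b2", "cost": 96.20,  "tags": {"Team": "data",    "Env": "staging"}},
    {"resource": "vol-9c", "cost": 18.00, "tags": {}},  # untagged -> unallocatable spend
]

def showback_by_team(items):
    """Sum cost per Team tag; untagged resources land in an UNALLOCATED bucket."""
    totals = defaultdict(float)
    for item in items:
        team = item["tags"].get("Team", "UNALLOCATED")  # tag-governance gaps surface here
        totals[team] += item["cost"]
    return dict(totals)

print(showback_by_team(line_items))
# {'backend': 412.5, 'data': 96.2, 'UNALLOCATED': 18.0}
```

The UNALLOCATED bucket is the practical argument for tag governance: every dollar in it is spend nobody owns.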
A cloud engineer explains a cost anomaly: "The data processing team had instances running around the clock at r6g.4xlarge — 16 vCPUs, 128 GB RAM — but CPU was only at 8% average and memory at 15%. They were massively overprovisioned. We rightsized to r6g.large and saved $4,200 per month on that cluster alone." What is rightsizing in cloud cost management?
Rightsizing: matching the compute, memory, and storage of cloud resources to their actual usage. Over-provisioning is the most common form of cloud waste: instances are sized for a peak load that never arrives.

Rightsizing process: 1) Collect utilisation data (CPU, memory, network, disk I/O) over 2-4 weeks. 2) Identify instances where peak usage is under 30% of provisioned capacity. 3) Test on a smaller instance type. 4) Monitor for performance degradation. 5) Commit the change across the fleet.

Cloud resource optimisation vocabulary: Oversized instance — paying for compute you don't use; the opposite, undersized, causes performance degradation. CPU credits — burstable T-series AWS instances earn and spend CPU credits; important to monitor for applications that burst occasionally. IOPS (Input/Output Operations Per Second) — a storage throughput metric; often a hidden cost when running high-I/O databases. Auto Scaling — automatically adding/removing instances based on demand; reduces waste during off-peak hours. Hibernation / stop-start schedules — turn off development/staging instances outside business hours; saves 60-70% on non-production environments. Graviton instances (AWS) — ARM-based instances offering up to 40% better price-performance than comparable x86 instances for compatible workloads.

In conversation: "We saved $60K/year just by scheduling dev environments to shut down at 8pm and start at 7am — no code changes needed."
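Step 2 of the process above — flagging instances whose peak usage stays under 30% of capacity — is easy to automate. A sketch with hypothetical instance names and utilisation samples (fractions of provisioned capacity):

```python
# Flag rightsizing candidates: instances whose peak CPU *and* peak memory
# utilisation over the observation window stay below a threshold.
# Instance names and utilisation samples are hypothetical.

def is_rightsizing_candidate(cpu_samples, mem_samples, threshold=0.30):
    """True if both peak CPU and peak memory stay below the threshold."""
    return max(cpu_samples) < threshold and max(mem_samples) < threshold

fleet = {
    "r6g.4xlarge / data-proc": {"cpu": [0.05, 0.08, 0.12], "mem": [0.10, 0.15, 0.14]},
    "m6i.large / api":         {"cpu": [0.40, 0.65, 0.80], "mem": [0.55, 0.60, 0.70]},
}

for name, util in fleet.items():
    if is_rightsizing_candidate(util["cpu"], util["mem"]):
        print(f"{name}: candidate for a smaller instance type")
```

Using peak rather than average matters: the r6g.4xlarge from the scenario averaged 8% CPU, but it is the low *peak* that makes the downsize safe.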
A FinOps practitioner introduces a new team framework: "FinOps isn't just about cutting costs — it's about maximising value from cloud spend. The three pillars are: Inform (visibility into who's spending what), Optimise (reduce waste and increase efficiency), and Operate (ongoing governance through policies and automation). We're starting with the Inform phase." What is FinOps?
FinOps (Cloud Financial Operations): a practice and cultural shift that makes cloud spending a shared engineering-finance-business responsibility — not just a finance problem. The FinOps Foundation defines it as "a financial operating model for cloud."

Three FinOps phases: Inform — visibility and allocation: tagging, dashboards, cost reports, anomaly detection; answers "who spends what on what?" Optimise — waste reduction and efficiency: rightsizing, scheduling, commitment discounts, architectural changes. Operate — governance and automation: budget alerts, policies, automated enforcement, cost benchmarking, unit-economics tracking.

Key FinOps vocabulary: Cloud cost forecasting — predicting future cloud spend from growth models and committed capacity. Budget alert — a notification when spending exceeds a threshold. Cost anomaly detection — automated detection of unexpected spend spikes (AWS Cost Anomaly Detection, Azure Cost Alerts). Idle resource (cloud waste) — resources incurring cost without being actively used; common sources: unattached EBS volumes, idle load balancers, unused static IPs, over-provisioned RDS. Cloud cost per feature — unit economics for product teams. Shared services cost model — how to allocate the cost of Kubernetes clusters, monitoring, and NAT gateways shared by multiple teams.

In conversation: "Before FinOps, engineering optimised for performance and finance optimised for budget — they rarely talked. FinOps made cloud cost a shared OKR."
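Two Operate-phase mechanisms mentioned above, budget alerts and anomaly detection, reduce to simple checks. A naive sketch (real services like AWS Cost Anomaly Detection use more sophisticated models; the daily spend figures here are hypothetical):

```python
# Sketch of two Operate-phase checks: a budget alert and a naive
# cost-anomaly flag. Daily spend figures are hypothetical.
from statistics import mean, stdev

def budget_alert(month_to_date: float, budget: float, threshold: float = 0.8) -> bool:
    """Fire when month-to-date spend crosses a fraction of the monthly budget."""
    return month_to_date >= budget * threshold

def is_anomaly(history, today: float, sigmas: float = 3.0) -> bool:
    """Flag today's spend if it sits more than `sigmas` std devs above the mean."""
    return today > mean(history) + sigmas * stdev(history)

daily_spend = [410, 395, 402, 388, 415, 405, 398]   # last week's daily spend, $
print(budget_alert(month_to_date=8600, budget=10000))  # True: 86% of budget used
print(is_anomaly(daily_spend, today=1250))             # True: clear spend spike
```

The standard-deviation check is deliberately crude, but it captures the idea: an anomaly is spend that is surprising relative to recent history, not merely spend that is large.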
A cloud architect makes the case for Spot Instances on a batch processing workload: "Our nightly data transformation job runs for 90 minutes. It's fault-tolerant — if a Spot Instance is reclaimed, the job checkpoints and resumes. We can save 70% versus on-demand by using Spot. For anything that can't tolerate interruption — databases, user-facing APIs — we stay on On-Demand or RIs." What are Spot Instances and what workloads are they suitable for?
Spot Instances (AWS) = Preemptible VMs (GCP) = Azure Spot VMs: the cloud provider's unused capacity offered at steep discounts (60-90%). The provider can reclaim the capacity when demand increases — with two minutes' warning on AWS.

Suitable workloads for Spot: Batch processing — nightly ETL, data transformation, log processing. CI/CD runners — build and test pipelines; a killed job is simply retried. Machine learning training — checkpointing allows resuming after interruption. Stateless web tier — behind a load balancer; losing one instance doesn't affect users. Rendering farms — video/image rendering is naturally parallelisable and resumable. NOT suitable: databases (data loss risk), applications holding session state in instance memory, anything that can't tolerate an interruption at two minutes' notice.

Advanced Spot patterns: Spot fleet — a collection of Spot Instances spread across instance types and AZs; if one pool is reclaimed, others fill in. Spot + On-Demand mix — base capacity on On-Demand, burst capacity on Spot. Interruption handling — applications receive a termination notice and must checkpoint state within the two-minute window. Spot Instance Advisor — an AWS tool showing interruption frequency and savings by instance type.

In conversation: "We moved 80% of our CI pipeline to Spot Instances and saved $15,000/month — build times increased by 2 minutes on average but nobody noticed."
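The checkpoint-and-resume pattern the architect describes can be sketched as a batch job that records progress after each item. The work items and checkpoint file name are hypothetical, and the interruption is simulated; on real AWS Spot you would trigger the checkpoint from the two-minute termination notice exposed via instance metadata:

```python
# Fault-tolerant batch job sketch: checkpoint progress after each item so the
# job can resume where it left off after a Spot reclaim. The items and the
# checkpoint file are hypothetical; the "interruption" is simulated here.
import json
import os

CHECKPOINT = "job.checkpoint.json"

def load_checkpoint() -> int:
    """Index of the next unprocessed item, or 0 on a fresh start."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)["next_item"]
    return 0

done = []
def process(item):
    done.append(item)          # stand-in for the real transformation work

def run_batch(items, interrupt_at=None):
    """Process items from the last checkpoint; optionally simulate a reclaim."""
    start = load_checkpoint()
    for i in range(start, len(items)):
        if interrupt_at is not None and i == interrupt_at:
            return "interrupted"                # instance reclaimed mid-run
        process(items[i])
        with open(CHECKPOINT, "w") as f:        # checkpoint after each item
            json.dump({"next_item": i + 1}, f)
    return "done"

items = list("abcdef")
print(run_batch(items, interrupt_at=3))  # interrupted after items a-c
print(run_batch(items))                  # done: second run resumes at item d
```

The design choice that makes Spot safe here is that all durable state lives in the checkpoint, not on the instance — a reclaimed node costs at most one item's worth of rework.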