Cloud FinOps Vocabulary
5 exercises — master the advanced vocabulary of cloud cost management: rightsizing decisions, showback vs. chargeback models, Spot vs. Reserved Instance trade-offs, unit economics analysis, and cost anomaly response.
Cloud FinOps vocabulary quick reference
- Rightsizing — matching instance size to actual demand; evaluate P99 peaks, not just averages
- Showback — cost visibility for teams; no budget transfer, no financial consequence
- Chargeback — actual cost allocation to the consuming team's budget; enforces financial accountability
- Spot Instance — spare cloud capacity at 50–90% discount; can be reclaimed with 2-minute notice
- Reserved Instance (RI) — 1- or 3-year commitment to specific instance type; 40–72% discount
- Savings Plans — flexible commitment to $/hr spend level; applied across eligible instance types
- Unit economics — cloud spend normalised to a business metric (cost per DAU, per transaction)
- Cost anomaly — an unusual spending deviation; investigate root cause before taking action
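The unit economics entry above is easiest to see with numbers. The following is a minimal sketch with hypothetical figures (the monthly spend and DAU values are invented for illustration, and `cost_per_unit` is not part of any FinOps tool): total spend rises month over month, yet cost per DAU falls, which is the signal unit economics is meant to expose.

```python
def cost_per_unit(cloud_spend: float, units: float) -> float:
    """Normalise cloud spend to a business metric (here: daily active users)."""
    if units <= 0:
        raise ValueError("units must be positive")
    return cloud_spend / units


# Hypothetical figures: (month, monthly cloud spend in USD, average DAU).
months = [
    ("Jan", 120_000.0, 1_500_000),
    ("Feb", 138_000.0, 2_000_000),
]

for month, spend, dau in months:
    print(f"{month}: spend=${spend:,.0f}  cost per DAU=${cost_per_unit(spend, dau):.4f}")
```

In this made-up example spend grows by 15% while DAU grows by a third, so the cost of serving each user goes down even though the bill goes up.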
1 / 5
A FinOps analyst presents findings: "This EC2 instance has averaged 8% CPU utilization over the past 30 days. It is provisioned as an m5.4xlarge (16 vCPUs, 64GB RAM)."
What is rightsizing, and what specific risk must be carefully evaluated before downsizing a production instance?
Rightsizing is data-driven engineering — but averages lie. An 8% average can hide a 90% peak at month-end batch time.
The right analysis before rightsizing:
| Metric to examine | Why averages are not enough |
|---|---|
| P99 CPU utilisation | Average is 8%, but P99 reaches 85% during the end-of-month batch run. That peak needs about 13.6 of the 16 vCPUs, so a recommended m5.2xlarge (8 vCPUs) would saturate |
| Memory high-water mark | Average 12 GB RAM used; peak 58 GB during full dataset load. Cannot drop below 64 GB RAM |
| Network throughput bursts | Average 50 Mbps; peak 8 Gbps during backup window. Smaller instances have lower network bandwidth, so check the candidate's baseline |
| Burst credits (T-series) | T-series instances earn CPU credits when idle; low average with occasional bursts may suit t3.large |
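The gap between average and P99 in the table can be reproduced with a few lines. This is a sketch on synthetic data, not real CloudWatch output: the sample values are assumptions chosen to match the 8% average and 85% peak in the scenario, and `percentile` is a plain nearest-rank implementation written for this example.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical data: 30 days of hourly CPU readings (720 samples),
# 705 quiet hours at 6.4% plus a 15-hour month-end batch run at 85%.
cpu_samples = [6.4] * 705 + [85.0] * 15

average = sum(cpu_samples) / len(cpu_samples)
p99 = percentile(cpu_samples, 99)
peak = max(cpu_samples)

print(f"average={average:.1f}%  p99={p99:.1f}%  max={peak:.1f}%")
```

Note that P99 only catches the batch run here because it covers more than 1% of the samples. A spike lasting just a few hours per month would sit above P99, which is why the maximum and higher percentiles are worth checking as well.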
Safe rightsizing process:
1. Collect 30-day CloudWatch metrics: CPU, memory (CloudWatch agent), network, disk I/O
2. Check P99 and P999, not just average; look for periodic spikes (backups, batch jobs, month-end)
3. Use AWS Compute Optimizer recommendation + apply 20–30% safety buffer above measured peak
4. Test in staging under representative load
5. Resize in production maintenance window with automated rollback alarm (CPUUtilization > 85%)
Tooling for rightsizing:
• AWS Compute Optimizer — analyses 14 days of CloudWatch metrics by default using ML (a longer lookback is available as an option); provides risk-rated resize recommendations
• AWS Cost Explorer Rightsizing Recommendations — simpler; based on CloudWatch data, no ML
• Azure Advisor — equivalent Azure service; analyses VM utilisation and recommends resize or shutdown
• Infracost / CloudHealth — third-party FinOps platforms with team-level chargeback reporting
Key vocabulary:
• Rightsizing — matching cloud instance type and size to actual workload resource demand, with an appropriate safety margin above measured peak
• P99 (99th percentile) — the resource utilisation level that 99% of measurements fall below; reveals worst-case conditions that averages hide
• Safety margin / headroom — the buffer above measured peak utilisation provisioned to absorb unexpected traffic spikes without degradation
• Compute Optimizer — AWS ML-based service that analyses resource utilisation and recommends right-sized configurations
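The "safety margin above measured peak" idea from the rightsizing process can be sketched as a simple fit check. This is an illustrative helper, not how Compute Optimizer works internally: `smallest_fit` and the 25% default buffer are assumptions made for the example, the catalogue lists only a few m5 sizes, and GB and GiB are treated as interchangeable for simplicity.

```python
from typing import NamedTuple, Optional


class InstanceType(NamedTuple):
    name: str
    vcpus: int
    memory_gib: int


# A few m5 sizes, ordered smallest to largest.
M5_FAMILY = [
    InstanceType("m5.large", 2, 8),
    InstanceType("m5.xlarge", 4, 16),
    InstanceType("m5.2xlarge", 8, 32),
    InstanceType("m5.4xlarge", 16, 64),
    InstanceType("m5.8xlarge", 32, 128),
]


def smallest_fit(peak_vcpus: float, peak_memory_gib: float,
                 buffer: float = 0.25,
                 family=M5_FAMILY) -> Optional[InstanceType]:
    """Return the smallest instance covering measured peak plus a safety buffer."""
    need_cpu = peak_vcpus * (1 + buffer)
    need_mem = peak_memory_gib * (1 + buffer)
    for inst in family:
        if inst.vcpus >= need_cpu and inst.memory_gib >= need_mem:
            return inst
    return None  # nothing in the catalogue is large enough


# Scenario peaks: 85% of 16 vCPUs (13.6 vCPUs) and 58 GB of memory.
print(smallest_fit(13.6, 58, buffer=0.0))

# Hypothetical quieter workload: peak 4.8 vCPUs and 20 GB of memory.
print(smallest_fit(4.8, 20))
```

With the scenario's peaks, nothing below the current m5.4xlarge fits even with the buffer set to zero, so the 8% average alone would have led to an unsafe downsize. The quieter hypothetical workload fits an m5.2xlarge with the 25% buffer applied.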