controlling-costs

Analyzes Axiom query patterns to find unused data, then builds dashboards and monitors for cost optimization. Use when asked to reduce Axiom costs, find unused columns or field values, identify data waste, or track ingest spend.

2estrelas

0forks

Atualizado 2/5/2026

Obter Código Fonte

SKILL.md

readonlyread-only

name

controlling-costs

description

Axiom Cost Control

Dashboards, monitors, and waste identification for Axiom usage optimization.

Before You Start

Load required skills:
```
skill: axiom-sre
skill: building-dashboards
```
Building-dashboards provides: dashboard-list, dashboard-get, dashboard-create, dashboard-update, dashboard-delete
Find the audit dataset. Try axiom-audit first:
```
['axiom-audit']
| where _time > ago(1h)
| summarize count() by action
| where action in ('usageCalculated', 'runAPLQueryCost')
```
- If not found → ask user. Common names: axiom-audit-logs-view, audit-logs
- If found but no usageCalculated events → wrong dataset, ask user
Verify axiom-history access (required for Phase 4):
```
['axiom-history'] | where _time > ago(1h) | take 1
```
If not found, Phase 4 optimization will not work.
Confirm with user:
- Deployment name?
- Audit dataset name?
- Contract limit in TB/day? (required for Phase 3 monitors)
Replace <deployment> and <audit-dataset> in all commands below.

Tips:

Run any script with -h for full usage
Do NOT pipe script output to head or tail — causes SIGPIPE errors
Requires jq for JSON parsing
Use axiom-sre's axiom-query for ad-hoc APL, not direct CLI

Which Phases to Run

User request	Run these phases
"reduce costs" / "find waste"	0 → 1 → 4
"set up cost control"	0 → 1 → 2 → 3
"deploy dashboard"	0 → 2
"create monitors"	0 → 3
"check for drift"	0 only

Phase 0: Check Existing Setup

# Existing dashboard?
dashboard-list <deployment> | grep -i cost

# Existing monitors?
axiom-api <deployment> GET "/v2/monitors" | jq -r '.[] | select(.name | startswith("Cost Control:")) | "\(.id)\t\(.name)"'

If found, fetch with dashboard-get and compare to templates/dashboard.json for drift.

Phase 1: Discovery

scripts/baseline-stats -d <deployment> -a <audit-dataset>

Captures daily ingest stats and produces the Analysis Queue (needed for Phase 4).

Phase 2: Dashboard

scripts/deploy-dashboard -d <deployment> -a <audit-dataset>

Creates dashboard with: ingest trends, burn rate, projections, waste candidates, top users. See reference/dashboard-panels.md for details.

Phase 3: Monitors

Contract is required. You must have the contract limit from preflight step 4.

Step 1: List available notifiers

scripts/list-notifiers -d <deployment>

Present the list to the user and ask which notifier they want for cost alerts.
If they don't want notifications, proceed without -n.

Step 2: Create monitors

scripts/create-monitors -d <deployment> -a <audit-dataset> -c <contract_tb> [-n <notifier_id>]

Creates 3 monitors:

Total Ingest Guard — alerts when daily ingest >1.2x contract OR 7-day avg grows >15% vs baseline
Per-Dataset Spike — robust z-score detection, alerts per dataset with attribution
Query Cost Spike — same z-score approach for query costs (GB·ms)

The spike monitors use notifyByGroup: true so each dataset triggers a separate alert.

See reference/monitor-strategy.md for threshold derivation.

Phase 4: Optimization

Get the Analysis Queue

Run scripts/baseline-stats if not already done. It outputs a prioritized list:

Priority	Meaning
P0⛔	Top 3 by ingest OR >10% of total — MANDATORY
P1	Never queried — strong drop candidate
P2	Rarely queried (Work/GB < 100) — likely waste

Work/GB = query cost (GB·ms) / ingest (GB). Lower = less value from data.

Analyze datasets in order

Work top-to-bottom. For each dataset:

Step 1: Column analysis

scripts/analyze-query-coverage -d <deployment> -D <dataset> -a <audit-dataset>

If 0 queries → recommend DROP, move to next.

Step 2: Field value analysis

Pick a field from suggested list (usually app, service, or kubernetes.labels.app):

scripts/analyze-query-coverage -d <deployment> -D <dataset> -a <audit-dataset> -f <field>

Note values with high volume but never queried (⚠️ markers).

Step 3: Handle empty values

If (empty) has >5% volume, you MUST drill down with alternative field (e.g., kubernetes.namespace_name).

Step 4: Record recommendation

For each dataset, note: name, ingest volume, Work/GB, top unqueried values, action (DROP/SAMPLE/KEEP), estimated savings.

Done when

All P0⛔ and P1 datasets analyzed. Then compile report using reference/analysis-report-template.md.

Cleanup

# Delete monitors
axiom-api <deployment> GET "/v2/monitors" | jq -r '.[] | select(.name | startswith("Cost Control:")) | "\(.id)\t\(.name)"'
axiom-api <deployment> DELETE "/v2/monitors/<id>"

# Delete dashboard
dashboard-list <deployment> | grep -i cost
dashboard-delete <deployment> <id>

Note: Running create-monitors twice creates duplicates. Delete existing monitors first if re-deploying.

Reference

Audit Dataset Fields

Field	Description
`action`	`usageCalculated` or `runAPLQueryCost`
`properties.hourly_ingest_bytes`	Hourly ingest in bytes
`properties.hourly_billable_query_gbms`	Hourly query cost
`properties.dataset`	Dataset name
`resource.id`	Org ID
`actor.email`	User email

Common Fields for Value Analysis

Dataset type	Primary field	Alternatives
Kubernetes logs	`kubernetes.labels.app`	`kubernetes.namespace_name`, `kubernetes.container_name`
Application logs	`app` or `service`	`level`, `logger`, `component`
Infrastructure	`host`	`region`, `instance`
Traces	`service.name`	`span.kind`, `http.route`

Units & Conversions

Scripts use TB/day
Dashboard filter uses GB/month

Contract	TB/day	GB/month
5 PB/month	167	5,000,000
10 PB/month	333	10,000,000
15 PB/month	500	15,000,000

Optimization Actions

Signal	Action
Work/GB = 0	Drop or stop ingesting
High-volume unqueried values	Sample or reduce log level
Empty values from system namespaces	Filter at ingest or accept
WoW spike	Check recent deploys

Related Skills

verify

243K

Use when you want to validate changes before committing, or when you need to check all React contribution requirements.

facebook

Obter

test

243K

Use when you need to run tests for React core. Supports source, www, stable, and experimental channels.

facebook

Obter

feature-flags

243K

Use when feature flag tests fail, flags need updating, understanding @gate pragmas, debugging channel-specific test failures, or adding new flags to React.

facebook

Obter

extract-errors

243K

Use when adding new error messages to React, or seeing "unknown error code" warnings.

facebook

Obter

flow

243K

Use when you need to run Flow type checking, or when seeing Flow type errors in React code.

facebook

Obter

flags

243K

Use when you need to check feature flag states, compare channels, or debug why a feature behaves differently across release channels.

facebook

Obter

controlling-costs

Axiom Cost Control

Before You Start

Which Phases to Run

Phase 0: Check Existing Setup

Phase 1: Discovery

Phase 2: Dashboard

Phase 3: Monitors

Step 1: List available notifiers

Step 2: Create monitors

Phase 4: Optimization

Get the Analysis Queue

Analyze datasets in order

Done when

Cleanup

Reference

Audit Dataset Fields

Common Fields for Value Analysis

Units & Conversions

Optimization Actions

You Might Also Like

Related Skills

verify

test

feature-flags

extract-errors

flow

flags