FinOps: Budget Controls and Usage Tracking

Agent Manager’s FinOps tooling gives you real-time visibility into LLM token usage and associated costs. You can monitor spending per agent and session, set expected burn-rate baselines, and receive alerts when agent costs deviate from normal behavior.

What FinOps tracks

Valuation rates

Token-to-USD conversion rates per model. Agent Manager uses these rates to compute cost estimates from raw token counts reported by the LLM provider.

Burn rates

Real-time USD spend velocity per active session. Sliding-window accumulators track cost per hour so you can detect runaway sessions before they become costly.

Historical trends

Daily cost aggregations over a configurable trailing window (for example 7 or 30 days, up to a maximum of 90), broken down by agent and organization.

Anomaly detection

Sessions whose burn rate exceeds a registered agent baseline by a configurable multiplier are flagged as anomalies, visible in the dashboard and via API.
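
As a rough illustration of the sliding-window idea behind burn rates, the sketch below keeps per-session cost events in a trailing window and scales their sum to USD/hour. The `BurnRateWindow` class, its method names, and the one-hour default window are hypothetical; this is not Agent Manager's actual implementation.

```python
from collections import deque


class BurnRateWindow:
    """Tracks USD spend inside a trailing time window (hypothetical helper)."""

    def __init__(self, window_seconds=3600.0):
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp_seconds, usd) pairs, oldest first
        self._total_usd = 0.0

    def record(self, timestamp, usd):
        """Add one cost event, e.g. the estimated cost of a single LLM call."""
        self._events.append((timestamp, usd))
        self._total_usd += usd

    def usd_per_hour(self, now):
        """Evict events that fell out of the window, then scale to USD/hour."""
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, usd = self._events.popleft()
            self._total_usd -= usd
        return self._total_usd * 3600.0 / self.window_seconds
```

A session that records 0.25 USD at t=0 and 0.50 USD at t=1800 reports 0.75 USD/hour at t=3000; once the first event ages out of the window, the rate drops to 0.50 USD/hour.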

Viewing cost data

Historical cost trends (trailing N days, default 7):

GET /api/v1/finops/trends?days=30

Cost allocation by agent and org:

GET /api/v1/finops/allocations?days=7

Cost allocation broken down by LLM model:

GET /api/v1/finops/allocations/by-model?days=7

Active session burn rates:

GET /api/v1/finops/burn-rates/active

Returns one entry per active session with its cumulative USD spend within the current observation window.

Cache ROI statistics:

GET /api/v1/finops/roi-stats

Returns accumulated cache savings (USD), embedding costs (USD), and net ROI since the last application restart.
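
If you prefer to script these read endpoints, a minimal Python sketch might look like the following. The `finops_request` helper is hypothetical, and it assumes the GET endpoints accept the same bearer token used by the PUT examples in this document, which the source does not state explicitly.

```python
from urllib.parse import urlencode
from urllib.request import Request


def finops_request(base_url, path, token, **params):
    """Build (but do not send) a GET request for a FinOps endpoint."""
    query = f"?{urlencode(params)}" if params else ""
    url = f"{base_url.rstrip('/')}/api/v1/finops/{path.lstrip('/')}{query}"
    # Assumption: read endpoints use bearer-token auth like the PUT examples.
    headers = {"Authorization": f"Bearer {token}"}
    return Request(url, headers=headers, method="GET")
```

For example, `finops_request("http://your-host", "trends", token, days=30)` targets the 30-day trends endpoint. Pass the result to `urllib.request.urlopen` to send it.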

Configuring valuation rates

Agent Manager computes cost estimates using per-model token-to-USD rates. Retrieve the current rate table:

GET /api/v1/finops/valuation-rates

Update the rates for a model:

curl -X PUT http://your-host/api/v1/finops/valuation-rates \
  -H "Authorization: Bearer {token}" \
  -H "Content-Type: application/json" \
  -d '{
    "modelId": "gpt-4o",
    "inputRatePerKTokens": 0.0025,
    "outputRatePerKTokens": 0.01,
    "cachedInputRatePerKTokens": 0.00125,
    "reasoningRatePerKTokens": 0.01
  }'

Rate updates take effect immediately. The new values are stored in an in-memory concurrent cache and applied to all subsequent runs.
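
To make the rate arithmetic concrete, here is a small sketch of how a cost estimate could be derived from token counts and per-1,000-token rates. The field names mirror the PUT payload above; the `ValuationRate` class, the `estimate_cost_usd` function, and the assumption that the four token categories are disjoint are illustrative and may not match how Agent Manager or your provider accounts for tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValuationRate:
    # All rates are USD per 1,000 tokens, as in the PUT payload.
    input_rate_per_k_tokens: float
    output_rate_per_k_tokens: float
    cached_input_rate_per_k_tokens: float
    reasoning_rate_per_k_tokens: float


def estimate_cost_usd(rate, input_tokens, output_tokens,
                      cached_input_tokens=0, reasoning_tokens=0):
    """Estimate the USD cost of one run.

    Assumption: the four token counts are disjoint, so each token is
    billed exactly once at its own category's rate.
    """
    weighted = (
        input_tokens * rate.input_rate_per_k_tokens
        + output_tokens * rate.output_rate_per_k_tokens
        + cached_input_tokens * rate.cached_input_rate_per_k_tokens
        + reasoning_tokens * rate.reasoning_rate_per_k_tokens
    )
    return weighted / 1000.0
```

With the example rates above, a run with 10,000 input tokens, 2,000 output tokens, and 4,000 cached input tokens comes to roughly 0.05 USD.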

Setting agent burn-rate baselines

Baselines define the expected normal USD/hour spend for an agent. Agent Manager uses baselines to identify anomalous sessions. Set the baseline for an agent:

curl -X PUT http://your-host/api/v1/finops/baselines/{agentId} \
  -H "Authorization: Bearer {token}" \
  -H "Content-Type: application/json" \
  -d '{
    "baselineUsdPerHour": 0.50
  }'

Set baselines after running your agents in normal conditions for a few days. Use the historical trends endpoint to determine a representative USD/hour figure for each agent.
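
One way to turn a few days of observations into a single figure is to take the median USD/hour across active days, which keeps one unusually expensive day from inflating the baseline. The sketch below assumes you have already assembled `(cost_usd, active_hours)` pairs per day yourself; that input shape and the `suggest_baseline_usd_per_hour` function are hypothetical, since the trends response format is not shown here.

```python
from statistics import median


def suggest_baseline_usd_per_hour(daily_samples):
    """Return the median USD/hour across days with non-zero activity.

    daily_samples: iterable of (cost_usd, active_hours) pairs, a
    hypothetical shape built by the caller from historical cost data.
    """
    rates = [cost / hours for cost, hours in daily_samples if hours > 0]
    if not rates:
        raise ValueError("need at least one day with active hours")
    return median(rates)
```

For three eight-hour days costing 4.00, 6.00, and 40.00 USD, the hourly rates are 0.50, 0.75, and 5.00, so the suggested baseline is 0.75 USD/hour.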

Anomaly detection

When a session’s burn rate exceeds its agent’s baseline by a configured multiplier, it appears as an active anomaly:

GET /api/v1/finops/anomalies/active

Example response:

[
  {
    "sessionId": "session-uuid",
    "agentId": "finance_agent",
    "burnRateUsdPerHour": 4.20,
    "baselineUsdPerHour": 0.50,
    "anomalyRatio": 8.4
  }
]

An empty array means no sessions are currently anomalous.
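
The `anomalyRatio` in the sample response is the burn rate divided by the baseline (4.20 / 0.50 = 8.4). A minimal sketch of that check follows; the function names and the multiplier value of 3.0 used in the usage note are assumptions, because the configured multiplier is not specified in this document.

```python
def anomaly_ratio(burn_rate_usd_per_hour, baseline_usd_per_hour):
    """Ratio of observed burn rate to the agent's registered baseline."""
    if baseline_usd_per_hour <= 0:
        raise ValueError("baseline must be positive")
    return burn_rate_usd_per_hour / baseline_usd_per_hour


def is_anomalous(burn_rate_usd_per_hour, baseline_usd_per_hour, multiplier):
    """True when the burn rate exceeds baseline * multiplier."""
    ratio = anomaly_ratio(burn_rate_usd_per_hour, baseline_usd_per_hour)
    return ratio > multiplier
```

With a hypothetical multiplier of 3.0, the session above (ratio 8.4) would be flagged, while a session burning 1.00 USD/hour against the same 0.50 baseline (ratio 2.0) would not.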

Prometheus metrics

Agent Manager exposes FinOps data in Prometheus format at the standard actuator endpoint:

GET /actuator/prometheus

Key metrics:

Metric	Type	Description
`agent.runs`	Counter	Total agent run count
`agent.tool.calls`	Counter	Total tool invocations
`finops.cache.savings.usd`	Counter	Cumulative USD saved via semantic cache
`finops.embedding.cost.usd`	Summary	Cumulative USD spent on embeddings
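
For quick checks without a Prometheus server, the scrape output can be parsed directly. The sketch below handles the plain text exposition format under simplifying assumptions: samples carry no timestamps, and samples that share a metric name are summed across label sets. The metric names in the usage note are guesses, since exporters often rewrite dots as underscores and append suffixes such as `_total`; check the actual output of your endpoint.

```python
def parse_prometheus_samples(text):
    """Sum sample values per metric name from Prometheus text format.

    Simplifications: ignores comment lines (# HELP, # TYPE), assumes no
    trailing timestamps, and merges all label sets of a metric.
    """
    totals = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field on the line.
        series, _, value = line.rpartition(" ")
        name = series.split("{", 1)[0]
        totals[name] = totals.get(name, 0.0) + float(value)
    return totals
```

Given a scrape containing `agent_runs_total{agent="finance_agent"} 3.0` and `agent_runs_total{agent="other"} 2.0`, the result maps `agent_runs_total` to 5.0.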

Health check

GET /actuator/health

Returns system status, including database connectivity, Docker availability (for the code sandbox), and the health of any configured API providers.

Use the cache impact time-series endpoint (GET /api/v1/finops/cache-impact) to measure how effectively your agents use semantic caching. Higher cache hit rates directly reduce LLM spend.
