
Agent Manager’s FinOps tooling gives you real-time visibility into LLM token usage and associated costs. You can monitor spending per agent and session, set expected burn-rate baselines, and receive alerts when agent costs deviate from normal behavior.

What FinOps tracks

Valuation rates

Token-to-USD conversion rates per model. Agent Manager uses these rates to compute cost estimates from raw token counts reported by the LLM provider.

Burn rates

Real-time USD spend velocity per active session. Sliding-window accumulators track cost per hour so you can detect runaway sessions before they run up significant cost.
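Agent Manager's accumulator internals are not documented on this page. The following is a minimal sketch of a per-session sliding-window burn rate. It assumes cost samples arrive as (timestamp in seconds, USD) pairs and uses a one-hour window. `BurnRateWindow` is a hypothetical name, not part of the product API.

```python
from collections import deque


class BurnRateWindow:
    """Hypothetical sliding-window accumulator for one session's spend."""

    def __init__(self, window_seconds: float = 3600.0) -> None:
        self.window_seconds = window_seconds
        self._samples = deque()  # (timestamp_seconds, cost_usd), oldest first
        self._total_usd = 0.0    # running sum of costs still inside the window

    def record(self, timestamp: float, cost_usd: float) -> None:
        # Add the new cost sample, then drop samples that fell out of the window.
        self._samples.append((timestamp, cost_usd))
        self._total_usd += cost_usd
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        # The window covers (now - window_seconds, now].
        cutoff = now - self.window_seconds
        while self._samples and self._samples[0][0] <= cutoff:
            _, old_cost = self._samples.popleft()
            self._total_usd -= old_cost
        if not self._samples:
            self._total_usd = 0.0  # clear any floating-point drift

    def usd_per_hour(self, now: float) -> float:
        # Spend inside the window, scaled from the window length to one hour.
        self._evict(now)
        return self._total_usd * 3600.0 / self.window_seconds
```

A shorter window reacts faster to a runaway session but is noisier. A 15-minute window that sees $0.05 of spend reports $0.20/hour.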

Historical trends

Daily cost aggregations over configurable trailing windows (7, 30, or up to 90 days) broken down by agent and organization.
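The trends and allocation endpoints return these aggregations for you. To make the bucketing concrete, here is a minimal sketch of a daily roll-up by organization and agent over a trailing window. It assumes raw cost events with timezone-aware timestamps and UTC day boundaries. The function `daily_costs` and the event tuple layout are hypothetical.

```python
from collections import defaultdict
from collections.abc import Iterable
from datetime import date, datetime, timedelta, timezone


def daily_costs(
    events: Iterable[tuple[datetime, str, str, float]], days: int, today: date
) -> dict[tuple[date, str, str], float]:
    """Sum (timestamp, org_id, agent_id, cost_usd) events into daily buckets.

    Only events from the last `days` days, including `today`, are counted.
    Assumption: days are bucketed in UTC and timestamps are timezone-aware.
    """
    if not 1 <= days <= 90:  # the page documents windows of up to 90 days
        raise ValueError("days must be between 1 and 90")
    first_day = today - timedelta(days=days - 1)
    totals: dict[tuple[date, str, str], float] = defaultdict(float)
    for ts, org_id, agent_id, cost_usd in events:
        day = ts.astimezone(timezone.utc).date()
        if first_day <= day <= today:
            totals[(day, org_id, agent_id)] += cost_usd
    return dict(totals)
```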

Anomaly detection

Sessions whose burn rate exceeds a registered agent baseline by a configurable multiplier are flagged as anomalies, visible in the dashboard and via API.

Viewing cost data

Historical cost trends (trailing N days, default 7):
GET /api/v1/finops/trends?days=30

Cost allocation by agent and org:
GET /api/v1/finops/allocations?days=7

Cost allocation broken down by LLM model:
GET /api/v1/finops/allocations/by-model?days=7

Active session burn rates:
GET /api/v1/finops/burn-rates/active
Returns one entry per active session with its cumulative USD spend within the current observation window.

Cache ROI statistics:
GET /api/v1/finops/roi-stats
Returns accumulated cache savings (USD), embedding costs (USD), and net ROI since the last application restart.
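The read endpoints above are plain GET requests. The following sketch builds and sends them using only the Python standard library. It assumes the GET endpoints accept the same `Authorization: Bearer {token}` header as the PUT examples on this page. `finops_request` and `fetch_json` are hypothetical helper names, and `http://your-host` is a placeholder.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def finops_request(base_url: str, path: str, token: str, **params) -> Request:
    """Build (but do not send) an authenticated GET request for a FinOps endpoint."""
    url = base_url.rstrip("/") + path
    if params:
        # e.g. days=30 -> "?days=30"
        url += "?" + urlencode(params)
    # Assumption: read endpoints use the same Bearer token as the PUT examples.
    return Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")


def fetch_json(req: Request, timeout: float = 10.0):
    """Send the request and decode the JSON response body."""
    with urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `fetch_json(finops_request("http://your-host", "/api/v1/finops/trends", token, days=30))` would retrieve the 30-day trend data.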

Configuring valuation rates

Agent Manager computes cost estimates using per-model token-to-USD rates. Retrieve the current rate table:
GET /api/v1/finops/valuation-rates
Register or update a model’s rates:
curl -X PUT http://your-host/api/v1/finops/valuation-rates \
  -H "Authorization: Bearer {token}" \
  -H "Content-Type: application/json" \
  -d '{
    "modelId": "gpt-4o",
    "inputRatePerKTokens": 0.0025,
    "outputRatePerKTokens": 0.01,
    "cachedInputRatePerKTokens": 0.00125,
    "reasoningRatePerKTokens": 0.01
  }'
Rate updates take effect immediately: the new values are stored in an in-memory concurrent cache and applied to all subsequent runs.
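Rates are expressed in USD per 1,000 tokens, so the cost of a run is each token count multiplied by its rate and divided by 1,000. The following sketch shows that arithmetic. It assumes the four token counts are disjoint buckets, each billed once at its own rate. How Agent Manager treats cached and reasoning tokens that a provider also includes in its input or output totals is not documented here. `ValuationRate` and `estimate_cost_usd` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValuationRate:
    """USD per 1,000 tokens, mirroring the valuation-rates payload fields."""
    input_rate_per_k: float         # inputRatePerKTokens
    output_rate_per_k: float        # outputRatePerKTokens
    cached_input_rate_per_k: float  # cachedInputRatePerKTokens
    reasoning_rate_per_k: float     # reasoningRatePerKTokens

    @classmethod
    def from_payload(cls, payload: dict) -> "ValuationRate":
        return cls(
            payload["inputRatePerKTokens"],
            payload["outputRatePerKTokens"],
            payload["cachedInputRatePerKTokens"],
            payload["reasoningRatePerKTokens"],
        )


def estimate_cost_usd(rate: ValuationRate, input_tokens: int, output_tokens: int,
                      cached_input_tokens: int = 0, reasoning_tokens: int = 0) -> float:
    # Assumption: the four counts are disjoint, each billed once at its own rate.
    return (
        input_tokens * rate.input_rate_per_k
        + output_tokens * rate.output_rate_per_k
        + cached_input_tokens * rate.cached_input_rate_per_k
        + reasoning_tokens * rate.reasoning_rate_per_k
    ) / 1000.0
```

With the gpt-4o rates above, a run with 12,000 uncached input tokens, 4,000 cached input tokens, and 2,000 output tokens comes to $0.03 + $0.005 + $0.02 = $0.055.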

Setting agent burn-rate baselines

Baselines define an agent's expected USD/hour spend under normal operation. Agent Manager compares each session's burn rate against its agent's baseline to identify anomalous sessions.
curl -X PUT http://your-host/api/v1/finops/baselines/{agentId} \
  -H "Authorization: Bearer {token}" \
  -H "Content-Type: application/json" \
  -d '{
    "baselineUsdPerHour": 0.50
  }'
Set baselines after running your agents under normal conditions for a few days. Use the historical trends endpoint to determine a representative USD/hour figure for each agent.
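One way to turn historical data into a USD/hour figure is to divide each day's cost by the hours the agent was actually active that day, then take the median. The median keeps a single unusually expensive day from inflating the baseline. This is a sketch, not the product's method. Per-day active hours are an assumption: the trends endpoint may not return them, so you may have to supply them yourself (for an agent that runs around the clock, use 24). `suggest_baseline_usd_per_hour` is a hypothetical name.

```python
from statistics import median


def suggest_baseline_usd_per_hour(daily_samples: list[tuple[float, float]]) -> float:
    """Suggest a baseline from (cost_usd, active_hours) pairs, one per day.

    Days with no activity are skipped; the median per-day rate is returned.
    """
    rates = [cost / hours for cost, hours in daily_samples if hours > 0]
    if not rates:
        raise ValueError("no active days to derive a baseline from")
    return round(median(rates), 4)
```

The result can be sent as `baselineUsdPerHour` in the PUT request above.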

Anomaly detection

When a session’s burn rate exceeds its agent’s baseline by a configured multiplier, it appears as an active anomaly:
GET /api/v1/finops/anomalies/active
[
  {
    "sessionId": "session-uuid",
    "agentId": "finance_agent",
    "burnRateUsdPerHour": 4.20,
    "baselineUsdPerHour": 0.50,
    "anomalyRatio": 8.4
  }
]
An empty array means no sessions are currently anomalous.
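In the example response, `anomalyRatio` is the burn rate divided by the baseline (4.20 / 0.50 = 8.4). The following sketch reproduces that check client-side. It assumes "exceeds by a multiplier" means the ratio is strictly greater than the multiplier. `find_anomalies` is a hypothetical name, and the multiplier is a parameter because the configured value is not documented here.

```python
def find_anomalies(sessions: list[dict], baselines: dict[str, float],
                   multiplier: float) -> list[dict]:
    """Return sessions whose burn rate exceeds their agent's baseline by `multiplier`."""
    flagged = []
    for session in sessions:
        baseline = baselines.get(session["agentId"])
        if baseline is None or baseline <= 0:
            continue  # no usable baseline registered, so the session cannot be judged
        ratio = session["burnRateUsdPerHour"] / baseline
        if ratio > multiplier:
            flagged.append({**session,
                            "baselineUsdPerHour": baseline,
                            "anomalyRatio": round(ratio, 2)})
    return flagged
```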

Prometheus metrics

Agent Manager exposes operational and FinOps metrics in Prometheus format at the standard actuator endpoint:
GET /actuator/prometheus
Key metrics:
| Metric | Type | Description |
|---|---|---|
| agent.runs | Counter | Total agent run count |
| agent.tool.calls | Counter | Total tool invocations |
| finops.cache.savings.usd | Counter | Cumulative USD saved via semantic cache |
| finops.embedding.cost.usd | Summary | Cumulative USD spent on embeddings |
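The following sketch reads a single metric from the Prometheus text output without a client library. It assumes the endpoint is backed by Micrometer, as `/actuator/prometheus` typically is. Under that assumption, dotted names such as `agent.runs` are exported with underscores, and counters gain a `_total` suffix (for example, `agent_runs_total`). Check the exact names against your own `/actuator/prometheus` output. `read_metric` is a hypothetical helper.

```python
from typing import Optional


def _parse_sample(line: str) -> tuple[str, float]:
    """Split a sample line `name{labels} value [timestamp]` into (name, value)."""
    if "{" in line:
        name = line[: line.index("{")]
        # Label values may contain spaces, so read the value after the closing brace.
        fields = line[line.rindex("}") + 1 :].split()
    else:
        name, *fields = line.split()
    return name, float(fields[0])


def read_metric(exposition: str, name: str) -> Optional[float]:
    """Sum every sample of `name` across label sets; None if the metric is absent."""
    total, found = 0.0, False
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        sample_name, value = _parse_sample(line)
        if sample_name == name:
            total += value
            found = True
    return total if found else None
```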

Health check

GET /actuator/health
Returns system status including database connectivity, Docker availability (for the code sandbox), and any configured API provider health.
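For monitoring scripts, it is often more useful to list which checks failed than to read only the top-level status. The following sketch assumes the Spring Boot Actuator response shape, `{"status": ..., "components": {"<name>": {"status": ...}}}`. It also assumes component details are enabled. Component names such as `docker` and `providers` in the usage test are hypothetical. `unhealthy_components` is a hypothetical helper.

```python
import json


def unhealthy_components(health_json: str) -> list[str]:
    """Return dotted paths of leaf health components whose status is not "UP"."""

    def walk(prefix: str, node: dict, out: list[str]) -> list[str]:
        for name, child in node.get("components", {}).items():
            path = f"{prefix}.{name}" if prefix else name
            if "components" in child:
                walk(path, child, out)  # composite check: inspect its children
            elif child.get("status") != "UP":
                out.append(path)
        return out

    return walk("", json.loads(health_json), [])
```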
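Use the cache impact time-series endpoint (GET /api/v1/finops/cache-impact) to measure how effectively your agents use semantic caching. Each cache hit avoids an LLM call, so higher hit rates directly reduce LLM spend. Weigh those savings against the embedding costs reported by GET /api/v1/finops/roi-stats.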
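The following sketch summarizes a cache-impact series into an overall hit rate and net savings. The response shape of the cache-impact endpoint is not documented on this page, so `CacheBucket` and its fields are hypothetical. Net savings is computed here as savings minus embedding cost, which may differ from how Agent Manager defines its own net ROI figure.

```python
from dataclasses import dataclass


@dataclass
class CacheBucket:
    # Hypothetical shape of one time-series point; real response fields may differ.
    hits: int
    misses: int
    savings_usd: float
    embedding_cost_usd: float


def summarize(buckets: list[CacheBucket]) -> dict[str, float]:
    """Overall hit rate and net savings (savings minus embedding cost)."""
    hits = sum(b.hits for b in buckets)
    lookups = hits + sum(b.misses for b in buckets)
    savings = sum(b.savings_usd for b in buckets)
    embedding = sum(b.embedding_cost_usd for b in buckets)
    return {
        "hit_rate": hits / lookups if lookups else 0.0,
        "net_savings_usd": savings - embedding,
    }
```

A negative `net_savings_usd` would indicate that embedding lookups cost more than the cache saves.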