Rate Limits

Ag2Trust implements rate limiting to ensure fair usage and system stability.

Rate Limit Overview

Scope	Default Limit	Configurable
Organization	60 requests/minute	Yes (per plan)
Per Agent	50 requests/minute	No
Webhook delivery	1/second per customer	No

Rate Limit Headers

Every API response includes rate limit information:

HTTP/1.1 200 OK
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1700308800

Header	Description
`X-RateLimit-Limit`	Maximum requests per minute
`X-RateLimit-Remaining`	Requests left in current window
`X-RateLimit-Reset`	Unix timestamp when limit resets
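
As a minimal sketch of reading these headers on the client, the helper below parses the three values and computes how long remains in the current window. The names `RateLimitInfo` and `parse_rate_limit` are illustrative and not part of any Ag2Trust SDK. With a plain `dict` the header names are case-sensitive; the `headers` object on a `requests` response is case-insensitive.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitInfo:
    limit: int      # X-RateLimit-Limit
    remaining: int  # X-RateLimit-Remaining
    reset_at: int   # X-RateLimit-Reset (Unix seconds)

    def seconds_until_reset(self, now: Optional[float] = None) -> float:
        """Seconds left in the current window, never negative."""
        current = time.time() if now is None else now
        return max(0.0, self.reset_at - current)


def parse_rate_limit(headers: Mapping[str, str]) -> Optional[RateLimitInfo]:
    """Return parsed rate limit info, or None if a header is missing or malformed."""
    try:
        return RateLimitInfo(
            limit=int(headers["X-RateLimit-Limit"]),
            remaining=int(headers["X-RateLimit-Remaining"]),
            reset_at=int(headers["X-RateLimit-Reset"]),
        )
    except (KeyError, ValueError):
        return None


# Header values taken from the example response above.
info = parse_rate_limit({
    "X-RateLimit-Limit": "60",
    "X-RateLimit-Remaining": "45",
    "X-RateLimit-Reset": "1700308800",
})
```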

Rate Limit Exceeded (429)

When you exceed the rate limit:

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1700308800
Retry-After: 15

{
  "error": "Rate limit exceeded",
  "error_code": "RATE_LIMIT_EXCEEDED",
  "details": {
    "limit": 60,
    "reset_at": "2023-11-18T12:00:00Z",
    "retry_after": 15
  }
}
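
One way to turn a 429 response into a wait time is sketched below. It prefers the `Retry-After` header, falls back to `details.retry_after` from the body, and finally to a default. The function name and the 5-second default are assumptions for illustration; the default simply mirrors the fallback used in the retry examples in this guide.

```python
import json
from typing import Mapping

DEFAULT_WAIT_SECONDS = 5.0  # assumed fallback, not an Ag2Trust-defined value


def wait_seconds_from_429(headers: Mapping[str, str], body_text: str) -> float:
    """Pick a wait time: Retry-After header, then details.retry_after, then a default."""
    header_value = headers.get("Retry-After")
    if header_value is not None:
        try:
            return max(0.0, float(header_value))
        except ValueError:
            pass  # not a number of seconds; fall through to the body

    try:
        details = json.loads(body_text).get("details", {})
        return max(0.0, float(details["retry_after"]))
    except (ValueError, KeyError, TypeError, AttributeError):
        # Body is not JSON, or does not have the expected shape.
        return DEFAULT_WAIT_SECONDS
```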

Handling Rate Limits

Basic Retry Logic

Python

import time
import requests

def send_with_retry(url, payload, headers, max_retries=3):
    """POST with up to max_retries attempts, waiting for Retry-After on 429."""
    for attempt in range(max_retries):
        response = requests.post(url, json=payload, headers=headers)

        if response.status_code != 429:
            return response

        # Retry-After is given in seconds; fall back to 5s if it is absent.
        retry_after = int(response.headers.get("Retry-After", 5))
        if attempt < max_retries - 1:  # no point waiting after the last attempt
            print(f"Rate limited. Waiting {retry_after}s...")
            time.sleep(retry_after)

    raise RuntimeError("Max retries exceeded")

JavaScript

async function sendWithRetry(url, payload, headers, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, {
      method: 'POST',
      // fetch does not set a JSON content type for string bodies;
      // headers is expected to be a plain object here.
      headers: { 'Content-Type': 'application/json', ...headers },
      body: JSON.stringify(payload)
    });

    if (response.status !== 429) {
      return response;
    }

    // Retry-After is given in seconds; fall back to 5s if it is absent.
    const retryAfter = parseInt(
      response.headers.get('Retry-After') || '5',
      10
    );
    if (attempt < maxRetries - 1) {
      console.log(`Rate limited. Waiting ${retryAfter}s...`);
      await new Promise(r => setTimeout(r, retryAfter * 1000));
    }
  }

  throw new Error('Max retries exceeded');
}

Exponential Backoff

import random
import time

import requests

def exponential_backoff(attempt, base=1, max_delay=60):
    """Calculate delay with jitter."""
    delay = min(base * (2 ** attempt), max_delay)
    jitter = random.uniform(0, delay * 0.1)
    return delay + jitter

def send_with_backoff(url, payload, headers, max_retries=5):
    for attempt in range(max_retries):
        response = requests.post(url, json=payload, headers=headers)

        if response.status_code == 429:
            # Never wait less than the server asked for via Retry-After
            retry_after = float(response.headers.get("Retry-After", 0))
            delay = max(exponential_backoff(attempt), retry_after)
            print(f"Rate limited. Backing off {delay:.1f}s...")
            time.sleep(delay)
            continue

        return response

    raise Exception("Max retries exceeded")

Using Libraries

Python (tenacity)

import requests
from tenacity import (
    retry,
    retry_if_result,
    stop_after_attempt,
    wait_exponential,
)

def is_rate_limited(response):
    return response.status_code == 429

# Retry while the response is a 429, with exponential waits capped at 60s.
# If the fifth attempt is still rate limited, tenacity raises RetryError.
@retry(
    retry=retry_if_result(is_rate_limited),
    wait=wait_exponential(multiplier=1, max=60),
    stop=stop_after_attempt(5)
)
def send_message(url, payload, headers):
    return requests.post(url, json=payload, headers=headers)

JavaScript (axios-retry)

import axios from 'axios';
import axiosRetry from 'axios-retry';

const client = axios.create({
  baseURL: 'https://api.ag2trust.com'
});

axiosRetry(client, {
  retries: 3,
  retryCondition: (error) =>
    error.response?.status === 429,
  retryDelay: (retryCount, error) => {
    // Prefer the server's Retry-After (seconds); otherwise wait 1s, 2s, 3s.
    const retryAfter = Number(error.response?.headers['retry-after']);
    return retryAfter > 0 ? retryAfter * 1000 : retryCount * 1000;
  }
});

Proactive Rate Limit Management

Monitor Remaining Requests

import time
import requests

class RateLimitAwareClient:
    def __init__(self, api_key):
        self.api_key = api_key
        self.remaining = None  # unknown until the first response
        self.reset_at = None

    def send(self, endpoint, payload):
        # Pause until the window resets if we are about to run out of requests
        if self.remaining is not None and self.remaining < 5 and self.reset_at:
            wait_time = max(0, self.reset_at - time.time())
            if wait_time > 0:
                print(f"Approaching limit, waiting {wait_time:.0f}s")
                time.sleep(wait_time)

        response = requests.post(
            f"https://api.ag2trust.com{endpoint}",
            json=payload,
            headers={"X-API-Key": self.api_key}
        )

        # Update rate limit info; keep None when a header is missing
        remaining = response.headers.get("X-RateLimit-Remaining")
        reset_at = response.headers.get("X-RateLimit-Reset")
        self.remaining = int(remaining) if remaining is not None else None
        self.reset_at = int(reset_at) if reset_at is not None else None

        return response

Request Queuing

import asyncio
import time

class RequestQueue:
    """Client-side token bucket: refills at rate_limit tokens per minute."""

    def __init__(self, rate_limit=60):
        self.rate_limit = rate_limit
        self.tokens = rate_limit
        self.last_refill = time.monotonic()

    async def execute(self, func, *args, **kwargs):
        # Wait for a token, then run the coroutine function
        await self._acquire_token()
        return await func(*args, **kwargs)

    async def _acquire_token(self):
        while True:
            self._refill_tokens()
            if self.tokens > 0:
                self.tokens -= 1
                return
            await asyncio.sleep(0.1)

    def _refill_tokens(self):
        now = time.monotonic()
        per_second = self.rate_limit / 60
        refill = int((now - self.last_refill) * per_second)
        if refill > 0:
            self.tokens = min(self.rate_limit, self.tokens + refill)
            # Advance only by the time that produced whole tokens, so the
            # fractional remainder is not lost between refills.
            self.last_refill += refill / per_second

Rate Limits by Endpoint

Endpoint	Limit	Notes
`POST /api/v1/ask/*`	60/min	Per organization
`POST /api/v1/agents/*/messages`	50/min	Per agent
`GET /api/v1/agents`	120/min	Higher for read ops
`GET /api/v1/usage`	120/min	Higher for read ops
`PUT /api/v1/webhook`	10/min	Configuration endpoint
`POST /api/v1/webhook/test`	10/min	Testing endpoint
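
For client-side throttling it can help to keep this table in code. The sketch below copies the rows above into a lookup keyed by method and path pattern. `ENDPOINT_LIMITS` and `limit_for` are hypothetical names, and the shell-style wildcard matching is an assumption about how the `*` segments should be read.

```python
from fnmatch import fnmatchcase
from typing import Optional

# (pattern, requests per minute), copied from the table above.
ENDPOINT_LIMITS = [
    ("POST /api/v1/ask/*", 60),
    ("POST /api/v1/agents/*/messages", 50),
    ("GET /api/v1/agents", 120),
    ("GET /api/v1/usage", 120),
    ("PUT /api/v1/webhook", 10),
    ("POST /api/v1/webhook/test", 10),
]


def limit_for(method: str, path: str) -> Optional[int]:
    """Return the documented per-minute limit for a request, or None if unlisted."""
    key = f"{method.upper()} {path}"
    for pattern, per_minute in ENDPOINT_LIMITS:
        if fnmatchcase(key, pattern):
            return per_minute
    return None
```

A value from `limit_for` could be passed as the `rate_limit` of a client-side queue, one queue per endpoint pattern.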

Agent-Level Rate Limits

Individual agents have their own internal rate limits:

Operation	Limit	Purpose
Tool calls	5/minute	Prevent runaway tools
HTTP requests	3/minute	Limit external calls
Web search	3/minute	API cost control
Git push	10/hour	Prevent spam commits

These limits are per-agent and cannot be changed via API.

Per-Run Cost Controls

Every agent run has hard limits that cannot be exceeded. These limits are fail-closed: when a limit is hit, the run terminates immediately with an error.

Run Limits

Limit	Pro Default	Enterprise	Description
`max_model_calls`	8	20	Maximum LLM API calls per run
`max_total_tokens`	8,000	32,000	Maximum tokens (input + output) per run
`max_wall_time_seconds`	120	300	Maximum execution time
`max_tool_calls`	20	50	Maximum tool invocations per run
`max_tool_output_bytes`	100 KB	500 KB	Maximum tool output size

How Run Limits Work

┌─────────────────────────────────────────────────────────────────────────┐
│                         AGENT RUN LIFECYCLE                             │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  Request Received    LLM Calls    Tool Calls    Response                │
│       │                 │             │             │                   │
│       ▼                 ▼             ▼             ▼                   │
│   ────●─────────────────●─────────────●─────────────●────────────────   │
│       │                 │             │             │                   │
│   Limits initialized    │             │         Run complete            │
│                         │             │                                 │
│           ┌─────────────┴─────────────┴─────────────┐                   │
│           │  At each step, limits are checked:       │                   │
│           │  - model_calls < max_model_calls?        │                   │
│           │  - total_tokens < max_total_tokens?      │                   │
│           │  - wall_time < max_wall_time_seconds?    │                   │
│           │  - tool_calls < max_tool_calls?          │                   │
│           │                                          │                   │
│           │  If ANY limit exceeded → Run terminates  │                   │
│           └──────────────────────────────────────────┘                   │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
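
The per-step check in the diagram can be sketched as follows. This is an illustration of the fail-closed idea using the Pro defaults from the Run Limits table, not Ag2Trust's implementation; the class and function names are hypothetical, and `max_tool_output_bytes` is left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunLimits:
    # Pro defaults from the Run Limits table.
    max_model_calls: int = 8
    max_total_tokens: int = 8_000
    max_wall_time_seconds: float = 120.0
    max_tool_calls: int = 20


@dataclass
class RunUsage:
    model_calls: int = 0
    total_tokens: int = 0
    wall_time_seconds: float = 0.0
    tool_calls: int = 0


def exhausted_limit(usage: RunUsage, limits: RunLimits) -> Optional[str]:
    """Return the name of the first limit with no headroom left, or None."""
    checks = [
        ("max_model_calls", usage.model_calls, limits.max_model_calls),
        ("max_total_tokens", usage.total_tokens, limits.max_total_tokens),
        ("max_wall_time_seconds", usage.wall_time_seconds, limits.max_wall_time_seconds),
        ("max_tool_calls", usage.tool_calls, limits.max_tool_calls),
    ]
    for name, current, maximum in checks:
        # The step may proceed only while current < maximum.
        if current >= maximum:
            return name
    return None
```

Calling `exhausted_limit` before each model or tool call and stopping the run as soon as it returns a name gives the fail-closed behavior described above.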

Run Limit Exceeded Error

When a per-run limit is exceeded:

HTTP/1.1 200 OK
Content-Type: application/json

{
  "response": null,
  "error": {
    "error_code": "run_limit_exceeded",
    "message": "Run limit exceeded: max_model_calls (8)",
    "details": {
      "limit_type": "max_model_calls",
      "limit_value": 8,
      "current_value": 8
    }
  },
  "metadata": {
    "tokens_used": 6542,
    "model_calls": 8,
    "wall_time_seconds": 45.2
  }
}

Response Status

Run limit errors return HTTP 200 with the error in the response body, because the run started and executed partially before hitting the limit. Check the body for an `error` object rather than relying on the status code alone.
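
Because the status code is 200, client code has to inspect the body. A minimal sketch, assuming a successful run returns its result under `response` (inferred from the `"response": null` field in the error example); `RunFailed` and `unwrap_run_response` are hypothetical names:

```python
from typing import Any, Dict


class RunFailed(Exception):
    """Raised when a 200 response carries an error instead of a result."""

    def __init__(self, error_code: str, message: str, details: Dict[str, Any]):
        super().__init__(f"{error_code}: {message}")
        self.error_code = error_code
        self.details = details


def unwrap_run_response(body: Dict[str, Any]) -> Any:
    """Return body["response"], or raise RunFailed if the body contains an error."""
    error = body.get("error")
    if isinstance(error, dict):
        raise RunFailed(
            error.get("error_code", "unknown"),
            error.get("message", ""),
            error.get("details", {}),
        )
    if error:
        # Some error bodies carry a plain string instead of an object.
        raise RunFailed("unknown", str(error), {})
    return body.get("response")
```

Typical use is `result = unwrap_run_response(response.json())`, with `except RunFailed as exc` branching on `exc.error_code`.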

Configuring Run Limits

Run limits are configured per tier in your subscription. Contact support to adjust limits for your account.

Daily Token Budget

Ag2Trust enforces a daily token budget per customer to prevent runaway costs. This is a hard cap on total tokens consumed across all runs in one UTC calendar day.

How Daily Budget Works

The daily budget uses atomic Redis-based reservation to ensure you never exceed your budget, even under high concurrency:

1. Reserve upfront: At run start, tokens are reserved against your daily budget
2. Execute: The run executes LLM calls using the reserved capacity
3. Finalize: At run end, unused reserved tokens are released back to your budget

┌─────────────────────────────────────────────────────────────────────────┐
│                       DAILY BUDGET ENFORCEMENT                          │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  Daily Budget: 100,000 tokens                                           │
│                                                                         │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐                     │
│  │   Run A     │  │   Run B     │  │   Run C     │                     │
│  │ Reserve 8K  │  │ Reserve 8K  │  │ Reserve 8K  │                     │
│  │ Used: 5K    │  │ Used: 7K    │  │ Used: 6K    │                     │
│  │ Release 3K  │  │ Release 1K  │  │ Release 2K  │                     │
│  └─────────────┘  └─────────────┘  └─────────────┘                     │
│                                                                         │
│  Reserved: 24K (atomically checked against budget)                      │
│  Actually used: 18K                                                     │
│  Released: 6K (returned to budget for other runs)                       │
│                                                                         │
│  Remaining budget: 82K (100K - 18K)                                     │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
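
The reserve, execute, finalize cycle can be illustrated with a small in-memory model. This sketch swaps the Redis-based reservation for a `threading.Lock`, so it shows only the accounting and not the service's actual mechanism; `DailyBudget` and its methods are hypothetical names. The numbers reproduce runs A, B, and C from the diagram.

```python
import threading


class DailyBudget:
    """In-memory stand-in for an atomic reserve/finalize token budget."""

    def __init__(self, daily_budget: int):
        self.daily_budget = daily_budget
        # Tokens used by finished runs plus tokens reserved by active runs.
        self.committed = 0
        self._lock = threading.Lock()

    def reserve(self, tokens: int) -> bool:
        """Atomically reserve tokens; refuse if that would exceed the budget."""
        with self._lock:
            if self.committed + tokens > self.daily_budget:
                return False
            self.committed += tokens
            return True

    def finalize(self, reserved: int, used: int) -> None:
        """Release the unused part of a reservation back to the budget."""
        with self._lock:
            self.committed -= max(0, reserved - used)

    @property
    def remaining(self) -> int:
        with self._lock:
            return self.daily_budget - self.committed


budget = DailyBudget(100_000)
runs_used = [5_000, 7_000, 6_000]                  # runs A, B, C
reserved_ok = [budget.reserve(8_000) for _ in runs_used]
after_reserve = budget.remaining                   # 76000 while all three are in flight
for used in runs_used:
    budget.finalize(reserved=8_000, used=used)
after_finalize = budget.remaining                  # 82000, as in the diagram
```

Checking and incrementing under one lock is what keeps concurrent runs from collectively overshooting the budget.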

Daily Budget Limits by Tier

Tier	Daily Token Budget	Notes
Free	10,000	Hard cap
Starter	50,000	Hard cap
Pro	100,000	Configurable
Enterprise	Custom	Unlimited available

Budget Exceeded Error

When your daily budget is exhausted:

HTTP/1.1 200 OK
Content-Type: application/json

{
  "response": null,
  "error": {
    "error_code": "daily_budget_exceeded",
    "message": "Daily token budget exhausted (budget=100000, reserved=98500, requested=8000)",
    "details": {
      "daily_budget": 100000,
      "reserved_today": 98500,
      "requested": 8000,
      "remaining": 1500
    }
  }
}

Adaptive Reservation

When your remaining daily budget is low but not exhausted, Ag2Trust uses adaptive reservation:

- If remaining budget >= 2,000 tokens: The run proceeds with a capped max_tokens
- If remaining budget < 2,000 tokens: The run is rejected (not enough for a meaningful response)

This allows more runs to succeed when budget is low, while preventing partially complete responses.
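
The decision rule above can be written as a small function. This is a sketch under one assumption that the text does not state: that the cap equals whatever budget remains. `plan_reservation` is a hypothetical name.

```python
from typing import Optional

MIN_USEFUL_TOKENS = 2_000  # threshold quoted above


def plan_reservation(remaining: int, requested: int) -> Optional[int]:
    """Return the number of tokens to reserve, or None if the run is rejected."""
    if remaining < MIN_USEFUL_TOKENS:
        return None  # too little left for a meaningful response
    # Assumption: the run is capped at the remaining budget.
    return min(requested, remaining)
```

With the values from the budget exceeded example (1,500 remaining, 8,000 requested) this returns `None`, matching the rejection shown there.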

Budget Reset

Daily budgets reset at 00:00 UTC each day. The reset is automatic and immediate.
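
After a budget exceeded error, a client may want to know how long to wait before the reset. A minimal sketch using only the standard library (the function name is illustrative; pass a timezone-aware `datetime`, since a naive one is interpreted as local time):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def seconds_until_budget_reset(now: Optional[datetime] = None) -> float:
    """Seconds from `now` until the next 00:00 UTC."""
    current = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    next_midnight = (current + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - current).total_seconds()
```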

Monitoring Budget

Monitor your token usage via the Usage & Metrics endpoints to avoid hitting budget limits during peak hours.

Demo Account Limits

Demo accounts have restricted limits to prevent abuse and ensure fair resource sharing. These limits cannot be changed and are enforced across all API and dashboard operations.

Demo accounts are for evaluation only

Demo accounts are intended for evaluation and testing. For production workloads, upgrade to a full account.

Rate & Concurrency Limits

Resource	Demo Limit	Full Account
API requests	10/minute	Configurable
Dashboard messages	10/minute	Unlimited
Concurrent runs	2	Configurable
Concurrent runs per endpoint	1	Configurable

Token & Run Limits

Resource	Demo Limit	Full Account
Daily token budget	10,000	Configurable
Tokens per run	4,000	Runtime default
Model calls per run	5	Runtime default
Tool calls per run	10	Runtime default
Wall time per run	30 seconds	Runtime default

Resource Creation Limits

Demo accounts have hard limits on resource creation:

Resource	Demo Limit	Full Account
Agents	5	Unlimited
Agent Types	3	Unlimited
Teams	2	3
API Keys	3	Unlimited

Demo Account Errors

When you hit a demo limit, you'll receive a 403 Forbidden response:

HTTP/1.1 403 Forbidden

{
  "detail": "Demo account limit reached: maximum 5 agents. Upgrade to a full account for unlimited resources."
}

For rate limits, you'll receive a 429 Too Many Requests response:

HTTP/1.1 429 Too Many Requests

{
  "error": "rate_limit_exceeded",
  "message": "Demo account rate limit: 10 messages/minute via dashboard.",
  "retry_after": 45
}

Special Demo Account Rules

- Bootstrap API keys cannot be revoked: Demo accounts have a bootstrap API key that is required for demo code resolution. Attempting to revoke it returns 403 Forbidden.
- Daily budget is hard-capped: Demo accounts are limited to 10,000 tokens per day. When exceeded, requests fail with a budget exceeded error.
- Limits apply to both API and dashboard: Unlike full accounts, where dashboard usage may be unrestricted, demo accounts enforce limits on all entry points.

Enterprise Rate Limits

Organization rate limits depend on your plan; Enterprise limits are custom:

Plan	Rate Limit
Starter	60/min
Professional	300/min
Enterprise	Custom

Contact sales for Enterprise pricing.

Best Practices

1. Implement Retry Logic

Always handle 429 responses gracefully:

# Don't: Crash on rate limit
response = requests.post(url, json=payload)
response.raise_for_status()

# Do: Handle gracefully
response = requests.post(url, json=payload)
if response.status_code == 429:
    handle_rate_limit(response)

2. Use Backoff with Jitter

Prevent thundering herd:

delay = base_delay * (2 ** attempt)
jitter = random.uniform(0, delay * 0.1)
time.sleep(delay + jitter)

3. Monitor Rate Limit Headers

Track your usage proactively:

remaining = response.headers.get("X-RateLimit-Remaining")
# The header may be missing; int(None) would raise TypeError
if remaining is not None and int(remaining) < 10:
    alert_operations_team()

4. Batch When Possible

Reduce request count by batching operations where supported.

5. Use Async for Long Tasks

Instead of polling, use webhooks for async operations.

Debugging Rate Limits

Check Current Usage

curl https://api.ag2trust.com/api/v1/usage \
  -H "X-API-Key: cust_your_api_key"

Common Issues

Issue	Cause	Solution
Constant 429s	Too many requests	Implement queuing
Burst 429s	Sudden traffic spike	Add rate limiting client-side
Slow reset	Multiple clients sharing limit	Coordinate requests

Next Steps

Billing & Pricing - Agent-hour billing and tier limits
Usage & Metrics - Monitor token usage and costs
Session Control - Manage session lifecycle
Error Codes - Complete error reference
Authentication - API key management
Pool Endpoint - Load-balanced messaging