Rate Limits

Ag2Trust implements rate limiting to ensure fair usage and system stability.

Rate Limit Overview

Scope            | Default Limit         | Configurable
-----------------|-----------------------|---------------
Organization     | 60 requests/minute    | Yes (per plan)
Per agent        | 50 requests/minute    | No
Webhook delivery | 1/second per customer | No

Rate Limit Headers

Every API response includes rate limit information:

HTTP/1.1 200 OK
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1700308800

Header                | Description
----------------------|------------------------------------------
X-RateLimit-Limit     | Maximum requests per minute
X-RateLimit-Remaining | Requests left in the current window
X-RateLimit-Reset     | Unix timestamp when the current window resets
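A minimal sketch of reading these three headers into one structured value. It assumes the values are plain integers, as in the example above. `RateLimitInfo` and `parse_rate_limit_headers` are hypothetical helper names and are not part of any Ag2Trust SDK.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitInfo:
    limit: int      # X-RateLimit-Limit: maximum requests per minute
    remaining: int  # X-RateLimit-Remaining: requests left in this window
    reset_at: int   # X-RateLimit-Reset: Unix timestamp of the window reset

    def seconds_until_reset(self, now: Optional[float] = None) -> float:
        """Seconds until the window resets, never negative."""
        now = time.time() if now is None else now
        return max(0.0, self.reset_at - now)


def parse_rate_limit_headers(headers: Mapping[str, str]) -> Optional[RateLimitInfo]:
    """Return RateLimitInfo, or None if any header is missing or not an integer."""
    try:
        return RateLimitInfo(
            limit=int(headers["X-RateLimit-Limit"]),
            remaining=int(headers["X-RateLimit-Remaining"]),
            reset_at=int(headers["X-RateLimit-Reset"]),
        )
    except (KeyError, ValueError):
        return None
```

With `requests`, you can pass `response.headers` directly because it is a case-insensitive mapping. A plain dict must use the exact header casing.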

Rate Limit Exceeded (429)

When you exceed the rate limit:

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1700308800
Retry-After: 15

{
  "error": "Rate limit exceeded",
  "error_code": "RATE_LIMIT_EXCEEDED",
  "details": {
    "limit": 60,
    "reset_at": "2025-01-15T10:31:00Z",
    "retry_after": 15
  }
}

Handling Rate Limits

Basic Retry Logic

import time
import requests

def send_with_retry(url, payload, headers, max_retries=3):
    for attempt in range(max_retries):
        response = requests.post(url, json=payload, headers=headers)

        if response.status_code == 429:
            retry_after = int(response.headers.get("Retry-After", 5))
            print(f"Rate limited. Waiting {retry_after}s...")
            time.sleep(retry_after)
            continue

        return response

    raise Exception("Max retries exceeded")

async function sendWithRetry(url, payload, headers, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, {
      method: 'POST',
      headers,
      body: JSON.stringify(payload)
    });

    if (response.status === 429) {
      const retryAfter = parseInt(
        response.headers.get('Retry-After') || '5',
        10
      );
      console.log(`Rate limited. Waiting ${retryAfter}s...`);
      await new Promise(r => setTimeout(r, retryAfter * 1000));
      continue;
    }

    return response;
  }

  throw new Error('Max retries exceeded');
}

Exponential Backoff

import random
import time

import requests

def exponential_backoff(attempt, base=1, max_delay=60):
    """Calculate delay with jitter."""
    delay = min(base * (2 ** attempt), max_delay)
    jitter = random.uniform(0, delay * 0.1)
    return delay + jitter

def send_with_backoff(url, payload, headers, max_retries=5):
    for attempt in range(max_retries):
        response = requests.post(url, json=payload, headers=headers)

        if response.status_code == 429:
            delay = exponential_backoff(attempt)
            print(f"Rate limited. Backing off {delay:.1f}s...")
            time.sleep(delay)
            continue

        return response

    raise Exception("Max retries exceeded")
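The loop above ignores the server's `Retry-After` value. A common variation treats `Retry-After` as a lower bound on the exponential delay. This is a sketch only and does not describe documented Ag2Trust behavior. `backoff_delay` is a hypothetical helper name.

```python
import random
from typing import Optional


def backoff_delay(
    attempt: int,
    retry_after: Optional[float] = None,
    base: float = 1.0,
    max_delay: float = 60.0,
) -> float:
    """Exponential backoff with up to 10% jitter, never shorter than Retry-After."""
    delay = min(base * (2 ** attempt), max_delay)
    # Jitter spreads out clients that were rate limited at the same moment
    delay += random.uniform(0, delay * 0.1)
    if retry_after is not None:
        # Treat the server's hint as a minimum wait
        delay = max(delay, retry_after)
    return delay
```

Inside `send_with_backoff`, you could replace `exponential_backoff(attempt)` with `backoff_delay(attempt, float(response.headers.get("Retry-After", 0)))`.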

Using Libraries

import requests
from tenacity import (
    retry,
    retry_if_result,
    wait_exponential,
    stop_after_attempt
)

def is_rate_limited(response):
    return response.status_code == 429

@retry(
    retry=retry_if_result(is_rate_limited),
    wait=wait_exponential(multiplier=1, max=60),
    stop=stop_after_attempt(5)
)
def send_message(url, payload, headers):
    return requests.post(url, json=payload, headers=headers)

import axios from 'axios';
import axiosRetry from 'axios-retry';

const client = axios.create({
  baseURL: 'https://api.ag2trust.com'
});

axiosRetry(client, {
  retries: 3,
  retryCondition: (error) =>
    error.response?.status === 429,
  retryDelay: (retryCount, error) => {
    const retryAfter = error.response?.headers['retry-after'];
    return retryAfter ? retryAfter * 1000 : retryCount * 1000;
  }
});

Proactive Rate Limit Management

Monitor Remaining Requests

import time

import requests


class RateLimitAwareClient:
    def __init__(self, api_key):
        self.api_key = api_key
        self.remaining = None
        self.reset_at = None

    def send(self, endpoint, payload):
        # Check if we should wait
        if self.remaining is not None and self.remaining < 5:
            wait_time = max(0, self.reset_at - time.time())
            if wait_time > 0:
                print(f"Approaching limit, waiting {wait_time:.0f}s")
                time.sleep(wait_time)

        response = requests.post(
            f"https://api.ag2trust.com{endpoint}",
            json=payload,
            headers={"X-API-Key": self.api_key}
        )

        # Update rate limit info
        self.remaining = int(
            response.headers.get("X-RateLimit-Remaining", 60)
        )
        self.reset_at = int(
            response.headers.get("X-RateLimit-Reset", 0)
        )

        return response

Request Queuing

import asyncio
import time


class RequestQueue:
    """Token bucket: bursts up to rate_limit, refilled at rate_limit per minute."""

    def __init__(self, rate_limit=60):
        self.rate_limit = rate_limit        # requests per minute
        self.tokens = rate_limit            # start with a full bucket
        self.last_refill = time.monotonic()

    async def execute(self, func, *args, **kwargs):
        await self._acquire_token()
        return await func(*args, **kwargs)

    async def _acquire_token(self):
        while True:
            self._refill_tokens()
            if self.tokens > 0:
                self.tokens -= 1
                return
            await asyncio.sleep(0.1)        # poll until a token is available

    def _refill_tokens(self):
        now = time.monotonic()
        per_second = self.rate_limit / 60
        refill = int((now - self.last_refill) * per_second)
        if refill > 0:
            self.tokens = min(self.rate_limit, self.tokens + refill)
            # Advance only by the time spent on whole tokens, so the
            # fractional remainder counts toward the next refill
            self.last_refill += refill / per_second

Rate Limits by Endpoint

Endpoint                       | Limit   | Notes
-------------------------------|---------|---------------------------
POST /api/v1/ask/*             | 60/min  | Per organization
POST /api/v1/agents/*/messages | 50/min  | Per agent
GET /api/v1/agents             | 120/min | Higher for read operations
GET /api/v1/usage              | 120/min | Higher for read operations
PUT /api/v1/webhook            | 10/min  | Configuration endpoint
POST /api/v1/webhook/test      | 10/min  | Testing endpoint
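For client-side throttling, it can help to keep this table in code. The sketch below uses the standard-library `fnmatch` module to match paths against the table's `*` patterns. `ENDPOINT_LIMITS` and `limit_for` are hypothetical names, and the values are copied from the table above.

```python
from fnmatch import fnmatchcase
from typing import Optional

# (method, path pattern, requests per minute), copied from the table above
ENDPOINT_LIMITS = [
    ("POST", "/api/v1/ask/*", 60),
    ("POST", "/api/v1/agents/*/messages", 50),
    ("GET", "/api/v1/agents", 120),
    ("GET", "/api/v1/usage", 120),
    ("PUT", "/api/v1/webhook", 10),
    ("POST", "/api/v1/webhook/test", 10),
]


def limit_for(method: str, path: str) -> Optional[int]:
    """Return the documented per-minute limit for an endpoint, or None if unlisted."""
    for m, pattern, limit in ENDPOINT_LIMITS:
        # fnmatchcase's "*" also matches "/", so the first matching row wins
        if m == method.upper() and fnmatchcase(path, pattern):
            return limit
    return None
```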

Agent-Level Rate Limits

Individual agents have their own internal rate limits:

Operation     | Limit    | Purpose
--------------|----------|----------------------
Tool calls    | 5/minute | Prevent runaway tools
HTTP requests | 3/minute | Limit external calls
Web search    | 3/minute | API cost control
Git push      | 10/hour  | Prevent spam commits

These limits are per-agent and cannot be changed via API.

Per-Run Cost Controls

Every agent run has hard limits that cannot be exceeded. These limits are fail-closed: when a limit is hit, the run terminates immediately with an error.

Run Limits

Limit                 | Pro (default) | Enterprise | Description
----------------------|---------------|------------|----------------------------------------
max_model_calls       | 8             | 20         | Maximum LLM API calls per run
max_total_tokens      | 8,000         | 32,000     | Maximum tokens (input + output) per run
max_wall_time_seconds | 120           | 300        | Maximum execution time, in seconds
max_tool_calls        | 20            | 50         | Maximum tool invocations per run
max_tool_output_bytes | 100 KB        | 500 KB     | Maximum tool output size

How Run Limits Work

┌─────────────────────────────────────────────────────────────────────────┐
│                         AGENT RUN LIFECYCLE                             │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  Request Received    LLM Calls    Tool Calls    Response                │
│       │                 │             │             │                   │
│       ▼                 ▼             ▼             ▼                   │
│   ────●─────────────────●─────────────●─────────────●────────────────   │
│       │                 │             │             │                   │
│   Limits initialized    │             │         Run complete            │
│                         │             │                                 │
│           ┌─────────────┴─────────────┴──────────────┐                  │
│           │  At each step, limits are checked:       │                  │
│           │  - model_calls < max_model_calls?        │                  │
│           │  - total_tokens < max_total_tokens?      │                  │
│           │  - wall_time < max_wall_time_seconds?    │                  │
│           │  - tool_calls < max_tool_calls?          │                  │
│           │                                          │                  │
│           │  If ANY limit exceeded → Run terminates  │                  │
│           └──────────────────────────────────────────┘                  │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
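The per-step check in the diagram can be sketched as a counter object that fails closed. This is an illustration only, not Ag2Trust's server code. `RunBudget` and `RunLimitExceeded` are hypothetical names, the defaults are the Pro values from the table above, and max_tool_output_bytes is omitted for brevity.

```python
import time
from dataclasses import dataclass, field


class RunLimitExceeded(Exception):
    def __init__(self, limit_type: str, limit_value: float):
        super().__init__(f"Run limit exceeded: {limit_type} ({limit_value})")
        self.limit_type = limit_type
        self.limit_value = limit_value


@dataclass
class RunBudget:
    # Pro defaults from the Run Limits table
    max_model_calls: int = 8
    max_total_tokens: int = 8_000
    max_wall_time_seconds: float = 120
    max_tool_calls: int = 20
    model_calls: int = 0
    total_tokens: int = 0
    tool_calls: int = 0
    started: float = field(default_factory=time.monotonic)

    def _check_wall_time(self) -> None:
        if time.monotonic() - self.started >= self.max_wall_time_seconds:
            raise RunLimitExceeded("max_wall_time_seconds", self.max_wall_time_seconds)

    def before_model_call(self) -> None:
        """Refuse the next LLM call if any relevant limit is already reached."""
        self._check_wall_time()
        if self.model_calls >= self.max_model_calls:
            raise RunLimitExceeded("max_model_calls", self.max_model_calls)
        if self.total_tokens >= self.max_total_tokens:
            raise RunLimitExceeded("max_total_tokens", self.max_total_tokens)
        self.model_calls += 1

    def record_tokens(self, tokens: int) -> None:
        """Add the input + output tokens reported for a completed LLM call."""
        self.total_tokens += tokens

    def before_tool_call(self) -> None:
        """Refuse the next tool invocation if a limit is already reached."""
        self._check_wall_time()
        if self.tool_calls >= self.max_tool_calls:
            raise RunLimitExceeded("max_tool_calls", self.max_tool_calls)
        self.tool_calls += 1
```

With the defaults, the ninth `before_model_call()` raises with the message `Run limit exceeded: max_model_calls (8)`. This mirrors the error example below.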

Run Limit Exceeded Error

When a per-run limit is exceeded:

HTTP/1.1 200 OK
Content-Type: application/json

{
  "response": null,
  "error": {
    "error_code": "run_limit_exceeded",
    "message": "Run limit exceeded: max_model_calls (8)",
    "details": {
      "limit_type": "max_model_calls",
      "limit_value": 8,
      "current_value": 8
    }
  },
  "metadata": {
    "tokens_used": 6542,
    "model_calls": 8,
    "wall_time_seconds": 45.2
  }
}

Response Status

Run limit errors return HTTP 200 with the error in the response body, because the run did execute (partially) before it hit the limit. Check the error field of the body rather than relying on the status code alone.
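Because these errors arrive with HTTP 200, a client has to inspect the body. The following is a minimal sketch that assumes the response shape shown above. `AgentRunError` and `raise_for_run_error` are hypothetical names. The check also covers the daily_budget_exceeded error described below, which uses the same `error` object.

```python
from collections.abc import Mapping
from typing import Any


class AgentRunError(Exception):
    """Raised for errors reported in the body of an HTTP 200 response."""

    def __init__(self, error_code: str, message: str, details: Mapping[str, Any]):
        super().__init__(message)
        self.error_code = error_code
        self.details = dict(details)


def raise_for_run_error(body: Mapping[str, Any]) -> Mapping[str, Any]:
    """Return the body unchanged, or raise AgentRunError if it carries an error object."""
    error = body.get("error")
    # In these responses, "error" is an object on failure (assumed null on success)
    if isinstance(error, Mapping):
        raise AgentRunError(
            error.get("error_code", "unknown"),
            error.get("message", ""),
            error.get("details") or {},
        )
    return body
```

For example, call `raise_for_run_error(response.json())`, then catch `AgentRunError` and branch on `error_code`.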

Configuring Run Limits

Run limits are configured per tier in your subscription. Contact support to adjust limits for your account.

Daily Token Budget

Ag2Trust enforces a daily token budget per customer to prevent runaway costs. This is a hard cap on the total tokens consumed across all runs in one UTC day, from 00:00 UTC to the next 00:00 UTC.

How Daily Budget Works

The daily budget uses atomic Redis-based reservation to ensure you never exceed your budget, even under high concurrency:

  1. Reserve upfront: At run start, tokens are reserved against your daily budget
  2. Execute: The run executes LLM calls using the reserved capacity
  3. Finalize: At run end, unused reserved tokens are released back to your budget

┌─────────────────────────────────────────────────────────────────────────┐
│                       DAILY BUDGET ENFORCEMENT                          │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  Daily Budget: 100,000 tokens                                           │
│                                                                         │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐                      │
│  │   Run A     │  │   Run B     │  │   Run C     │                      │
│  │ Reserve 8K  │  │ Reserve 8K  │  │ Reserve 8K  │                      │
│  │ Used: 5K    │  │ Used: 7K    │  │ Used: 6K    │                      │
│  │ Release 3K  │  │ Release 1K  │  │ Release 2K  │                      │
│  └─────────────┘  └─────────────┘  └─────────────┘                      │
│                                                                         │
│  Reserved: 24K (atomically checked against budget)                      │
│  Actually used: 18K                                                     │
│  Released: 6K (returned to budget for other runs)                       │
│                                                                         │
│  Remaining budget: 82K (100K - 18K)                                     │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
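The reserve, execute, and finalize cycle can be sketched in-process, with a lock standing in for the atomic Redis operation. This illustrates the accounting only and is not Ag2Trust's implementation. `DailyBudget`, `reserve`, `finalize`, and `BudgetExceeded` are hypothetical names. The usage at the end reproduces the diagram: three 8K reservations that use 5K, 7K, and 6K.

```python
import threading


class BudgetExceeded(Exception):
    pass


class DailyBudget:
    """In-memory stand-in for an atomic reserve/release token budget."""

    def __init__(self, daily_budget: int):
        self.daily_budget = daily_budget
        self.committed = 0  # tokens actually used by finished runs
        self.reserved = 0   # tokens held by runs still in progress
        self._lock = threading.Lock()

    def remaining(self) -> int:
        with self._lock:
            return self.daily_budget - self.committed - self.reserved

    def reserve(self, tokens: int) -> int:
        """Step 1: hold tokens before the run starts (check and update under one lock)."""
        with self._lock:
            available = self.daily_budget - self.committed - self.reserved
            if tokens > available:
                raise BudgetExceeded(f"requested={tokens}, remaining={available}")
            self.reserved += tokens
            return tokens

    def finalize(self, reserved: int, used: int) -> None:
        """Step 3: commit what the run used and release the rest."""
        with self._lock:
            self.reserved -= reserved
            self.committed += min(used, reserved)  # a run cannot use more than it reserved


budget = DailyBudget(100_000)
runs = [budget.reserve(8_000) for _ in range(3)]      # 24K reserved
for reserved, used in zip(runs, [5_000, 7_000, 6_000]):
    budget.finalize(reserved, used)                    # 6K released in total
print(budget.remaining())  # 82000
```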

Daily Budget Limits by Tier

Tier       | Daily Token Budget | Notes
-----------|--------------------|--------------------
Free       | 10,000             | Hard cap
Starter    | 50,000             | Hard cap
Pro        | 100,000            | Configurable
Enterprise | Custom             | Unlimited available

Budget Exceeded Error

When your daily budget is exhausted:

HTTP/1.1 200 OK
Content-Type: application/json

{
  "response": null,
  "error": {
    "error_code": "daily_budget_exceeded",
    "message": "Daily token budget exhausted (budget=100000, reserved=98500, requested=8000)",
    "details": {
      "daily_budget": 100000,
      "reserved_today": 98500,
      "requested": 8000,
      "remaining": 1500
    }
  }
}

Adaptive Reservation

When your remaining daily budget is low but not exhausted, Ag2Trust uses adaptive reservation:

  • If remaining budget >= 2,000 tokens: the run proceeds, with max_tokens capped to fit the remaining budget
  • If remaining budget < 2,000 tokens: the run is rejected (not enough budget for a meaningful response)

This lets more runs succeed when the budget is low, while rejecting runs whose budget would be too small to produce a complete response.
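A sketch of the adaptive decision follows. It assumes the cap is simply the remaining budget, because the exact cap Ag2Trust applies is not specified here. `plan_reservation` and `MIN_RESERVATION` are hypothetical names. The 2,000-token threshold comes from the text.

```python
from typing import Optional

MIN_RESERVATION = 2_000  # below this, the run is rejected


def plan_reservation(requested: int, remaining: int) -> Optional[int]:
    """Return the tokens to reserve (used as the run's max_tokens), or None to reject.

    Assumption: when the budget is low, max_tokens is capped at the remaining budget.
    """
    if remaining < MIN_RESERVATION:
        return None
    return min(requested, remaining)
```

With the numbers from the error example above (1,500 remaining, 8,000 requested), this returns None. That matches the daily_budget_exceeded response.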

Budget Reset

Daily budgets reset at 00:00 UTC each day. The reset is automatic and immediate.
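To schedule deferred work around the reset, you can compute the time remaining until the next 00:00 UTC. This is a standard-library sketch. `seconds_until_budget_reset` is a hypothetical helper name. Pass a timezone-aware datetime, because `astimezone()` treats a naive datetime as local time.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def seconds_until_budget_reset(now: Optional[datetime] = None) -> float:
    """Seconds from `now` (default: current time) until the next 00:00 UTC."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now).total_seconds()
```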

Monitoring Budget

Monitor your token usage via the Usage & Metrics endpoints (for example, GET /api/v1/usage) to avoid hitting budget limits during peak hours.

Demo Account Limits

Demo accounts have restricted limits to prevent abuse and ensure fair resource sharing. These limits cannot be changed and are enforced across all API and dashboard operations.

Demo accounts are for evaluation only

Demo accounts are intended for evaluation and testing. For production workloads, upgrade to a full account.

Rate & Concurrency Limits

Resource                     | Demo Limit | Full Account
-----------------------------|------------|-------------
API requests                 | 10/minute  | Configurable
Dashboard messages           | 10/minute  | Unlimited
Concurrent runs              | 2          | Configurable
Concurrent runs per endpoint | 1          | Configurable

Token & Run Limits

Resource            | Demo Limit | Full Account
--------------------|------------|----------------
Daily token budget  | 10,000     | Configurable
Tokens per run      | 4,000      | Runtime default
Model calls per run | 5          | Runtime default
Tool calls per run  | 10         | Runtime default
Wall time per run   | 30 seconds | Runtime default

Resource Creation Limits

Demo accounts have hard limits on resource creation:

Resource    | Demo Limit | Full Account
------------|------------|-------------
Agents      | 5          | Unlimited
Agent Types | 3          | Unlimited
Teams       | 2          | 3
API Keys    | 3          | Unlimited

Demo Account Errors

When you hit a demo limit, you'll receive a 403 Forbidden response:

HTTP/1.1 403 Forbidden

{
  "detail": "Demo account limit reached: maximum 5 agents. Upgrade to a full account for unlimited resources."
}

For rate limits, you'll receive a 429 Too Many Requests response:

HTTP/1.1 429 Too Many Requests

{
  "error": "rate_limit_exceeded",
  "message": "Demo account rate limit: 10 messages/minute via dashboard.",
  "retry_after": 45
}
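Demo-account errors use different body shapes from the standard 429 response shown earlier. A 403 carries a `detail` string, and a demo 429 carries a top-level `retry_after`. The sketch below handles the error shapes shown on this page and assumes no others. `classify_error` is a hypothetical helper name.

```python
from typing import Any, Mapping, Optional, Tuple


def classify_error(status: int, body: Mapping[str, Any]) -> Tuple[str, Optional[float]]:
    """Return (kind, retry_after_seconds) for the error shapes shown on this page."""
    if status == 403:
        # e.g. a demo resource cap: waiting will not help, so no retry delay
        return "forbidden", None
    if status == 429:
        if "retry_after" in body:  # demo rate-limit shape
            return "rate_limited", float(body["retry_after"])
        details = body.get("details") or {}  # standard RATE_LIMIT_EXCEEDED shape
        retry = details.get("retry_after")
        return "rate_limited", float(retry) if retry is not None else None
    return "other", None
```

A 403 from a demo cap should be surfaced to the user, for example as a prompt to upgrade, rather than retried.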

Special Demo Account Rules

  1. Bootstrap API keys cannot be revoked: demo accounts have a bootstrap API key that is required for demo code resolution. Attempting to revoke it returns 403 Forbidden.

  2. Daily budget is hard-capped: demo accounts are limited to 10,000 tokens per day. When the budget is exhausted, requests fail with a budget exceeded error.

  3. Limits apply to both API and dashboard: unlike full accounts, where dashboard usage may be unrestricted, demo accounts enforce limits on all entry points.

Enterprise Rate Limits

Higher organization-level rate limits are available on higher-tier plans:

Plan         | Rate Limit
-------------|-----------
Starter      | 60/min
Professional | 300/min
Enterprise   | Custom

Contact sales for Enterprise pricing.

Best Practices

1. Implement Retry Logic

Always handle 429 responses gracefully:

# Don't: Crash on rate limit
response = requests.post(url, json=payload)
response.raise_for_status()  # raises requests.HTTPError on a 429

# Do: Handle gracefully
response = requests.post(url, json=payload)
if response.status_code == 429:
    handle_rate_limit(response)  # your own handler, e.g. wait for Retry-After, then retry

2. Use Backoff with Jitter

Prevent thundering herd:

import random
import time

# base_delay and attempt come from your retry loop
delay = base_delay * (2 ** attempt)
jitter = random.uniform(0, delay * 0.1)
time.sleep(delay + jitter)

3. Monitor Rate Limit Headers

Track your usage proactively:

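remaining = response.headers.get("X-RateLimit-Remaining")
# The header may be absent; only compare when it is present
if remaining is not None and int(remaining) < 10:
    alert_operations_team()  # your own alerting hook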

4. Batch When Possible

Reduce request count by batching operations where supported.

5. Use Async for Long Tasks

Instead of polling for the results of long-running tasks, use webhooks to receive them asynchronously. Polling requests count against your rate limit.

Debugging Rate Limits

Check Current Usage

curl https://api.ag2trust.com/api/v1/usage \
  -H "X-API-Key: cust_your_api_key"

Common Issues

Issue         | Cause                          | Solution
--------------|--------------------------------|------------------------------
Constant 429s | Too many requests              | Implement queuing
Burst 429s    | Sudden traffic spike           | Add client-side rate limiting
Slow reset    | Multiple clients sharing limit | Coordinate requests

Next Steps