Rate Limits
Ag2Trust implements rate limiting to ensure fair usage and system stability.
Rate Limit Overview
| Scope | Default Limit | Configurable |
|---|---|---|
| Organization | 60 requests/minute | Yes (per plan) |
| Per Agent | 50 requests/minute | No |
| Webhook delivery | 1/second per customer | No |
Rate Limit Headers
Every API response includes rate limit information:
| Header | Description |
|---|---|
| `X-RateLimit-Limit` | Maximum requests per minute |
| `X-RateLimit-Remaining` | Requests left in the current window |
| `X-RateLimit-Reset` | Unix timestamp when the limit resets |
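To act on these headers, a client can parse them into a small structure. The sketch below is illustrative only: `RateLimitStatus` and `parse_rate_limit_headers` are hypothetical names, and it assumes the header values are plain integers as described in the table. With `requests`, `response.headers` is case-insensitive; a plain `dict` needs the exact header casing.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitStatus:
    limit: int      # X-RateLimit-Limit: maximum requests per minute
    remaining: int  # X-RateLimit-Remaining: requests left in this window
    reset_at: int   # X-RateLimit-Reset: Unix timestamp of the reset

    def seconds_until_reset(self, now: Optional[float] = None) -> float:
        """Seconds until the window resets (never negative)."""
        now = time.time() if now is None else now
        return max(0.0, self.reset_at - now)


def parse_rate_limit_headers(headers: Mapping[str, str]) -> Optional[RateLimitStatus]:
    """Return the current rate limit state, or None if a header is missing or malformed."""
    try:
        return RateLimitStatus(
            limit=int(headers["X-RateLimit-Limit"]),
            remaining=int(headers["X-RateLimit-Remaining"]),
            reset_at=int(headers["X-RateLimit-Reset"]),
        )
    except (KeyError, ValueError):
        return None
```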
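Rate Limit Exceeded (429)
When you exceed the rate limit, the API responds with 429 Too Many Requests. In addition to the rate limit headers, the response carries a Retry-After header giving the number of seconds to wait before retrying:
```http
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1736937060
Retry-After: 15

{
  "error": "Rate limit exceeded",
  "error_code": "RATE_LIMIT_EXCEEDED",
  "details": {
    "limit": 60,
    "reset_at": "2025-01-15T10:31:00Z",
    "retry_after": 15
  }
}
```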
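The retry examples in this guide read Retry-After as a whole number of seconds, which is the form shown above. HTTP also allows Retry-After to carry an HTTP-date; handling that form here is a defensive assumption, not documented Ag2Trust behavior. A minimal sketch (`parse_retry_after` is a hypothetical helper) that accepts either form and falls back to a default:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: Optional[str], default: float = 5.0,
                      now: Optional[datetime] = None) -> float:
    """Convert a Retry-After header value into seconds to wait.

    Accepts delay-seconds ("15") or an HTTP-date
    ("Wed, 15 Jan 2025 10:31:00 GMT"); returns `default` otherwise.
    """
    if not value:
        return default
    try:
        return float(max(0, int(value.strip())))  # delay-seconds form
    except ValueError:
        pass
    try:
        retry_at = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default
    if retry_at.tzinfo is None:
        retry_at = retry_at.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
    if now is None:
        now = datetime.now(timezone.utc)
    return max(0.0, (retry_at - now).total_seconds())
```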
Handling Rate Limits
Basic Retry Logic
```python
import time

import requests


def send_with_retry(url, payload, headers, max_retries=3):
    """POST with retries, honoring the server's Retry-After on 429."""
    for attempt in range(max_retries):
        response = requests.post(url, json=payload, headers=headers)
        if response.status_code == 429:
            # Retry-After is in seconds; fall back to 5s if it is absent
            retry_after = int(response.headers.get("Retry-After", 5))
            print(f"Rate limited. Waiting {retry_after}s...")
            time.sleep(retry_after)
            continue
        return response
    raise RuntimeError("Max retries exceeded")
```
```javascript
async function sendWithRetry(url, payload, headers, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, {
      method: 'POST',
      headers,
      body: JSON.stringify(payload)
    });
    if (response.status === 429) {
      // Retry-After is in seconds; fall back to 5s if it is absent
      const retryAfter = parseInt(response.headers.get('Retry-After') || '5', 10);
      console.log(`Rate limited. Waiting ${retryAfter}s...`);
      await new Promise((resolve) => setTimeout(resolve, retryAfter * 1000));
      continue;
    }
    return response;
  }
  throw new Error('Max retries exceeded');
}
```
Exponential Backoff
```python
import random
import time

import requests


def exponential_backoff(attempt, base=1, max_delay=60):
    """Delay of base * 2**attempt seconds (capped), plus up to 10% jitter."""
    delay = min(base * (2 ** attempt), max_delay)
    jitter = random.uniform(0, delay * 0.1)
    return delay + jitter


def send_with_backoff(url, payload, headers, max_retries=5):
    for attempt in range(max_retries):
        response = requests.post(url, json=payload, headers=headers)
        if response.status_code == 429:
            delay = exponential_backoff(attempt)
            print(f"Rate limited. Backing off {delay:.1f}s...")
            time.sleep(delay)
            continue
        return response
    raise RuntimeError("Max retries exceeded")
```
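The backoff above ignores the server's Retry-After hint. One common refinement, sketched here under the assumption that Retry-After carries whole seconds as in the 429 example, is to prefer that hint and fall back to exponential backoff only when it is missing or unparseable. `next_delay` is a hypothetical helper, not part of any Ag2Trust SDK.

```python
import random
from typing import Optional


def next_delay(attempt: int, retry_after: Optional[str],
               base: float = 1.0, max_delay: float = 60.0) -> float:
    """Seconds to sleep before retry number `attempt` (0-based).

    Uses the server's Retry-After (seconds form) when available,
    otherwise exponential backoff with up to 10% jitter.
    """
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # e.g. an HTTP-date; fall through to backoff
    delay = min(base * (2 ** attempt), max_delay)
    return delay + random.uniform(0, delay * 0.1)
```

Inside `send_with_backoff`, this would replace `exponential_backoff(attempt)` with `next_delay(attempt, response.headers.get("Retry-After"))`.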
Using Libraries
```python
import requests
from tenacity import (
    retry,
    retry_if_result,
    stop_after_attempt,
    wait_exponential,
)


def is_rate_limited(response):
    return response.status_code == 429


# Retry while the response is a 429, waiting 1s, 2s, 4s, ... (capped at 60s).
# After 5 attempts tenacity gives up and raises tenacity.RetryError.
@retry(
    retry=retry_if_result(is_rate_limited),
    wait=wait_exponential(multiplier=1, max=60),
    stop=stop_after_attempt(5),
)
def send_message(url, payload, headers):
    return requests.post(url, json=payload, headers=headers)
```
```javascript
import axios from 'axios';
import axiosRetry from 'axios-retry';

const client = axios.create({
  baseURL: 'https://api.ag2trust.com'
});

axiosRetry(client, {
  retries: 3,
  // Only retry rate-limited responses
  retryCondition: (error) => error.response?.status === 429,
  // Prefer the server's Retry-After (seconds); otherwise wait 1s, 2s, 3s
  retryDelay: (retryCount, error) => {
    const retryAfter = Number(error.response?.headers['retry-after']);
    return retryAfter > 0 ? retryAfter * 1000 : retryCount * 1000;
  }
});
```
Proactive Rate Limit Management
Monitor Remaining Requests
```python
import time

import requests


class RateLimitAwareClient:
    def __init__(self, api_key):
        self.api_key = api_key
        self.remaining = None  # last X-RateLimit-Remaining value
        self.reset_at = None   # last X-RateLimit-Reset value (Unix timestamp)

    def send(self, endpoint, payload):
        # Pause until the window resets when only a few requests remain
        if self.remaining is not None and self.remaining < 5:
            wait_time = max(0, self.reset_at - time.time())
            if wait_time > 0:
                print(f"Approaching limit, waiting {wait_time:.0f}s")
                time.sleep(wait_time)

        response = requests.post(
            f"https://api.ag2trust.com{endpoint}",
            json=payload,
            headers={"X-API-Key": self.api_key},
        )

        # Update rate limit info from the response headers
        self.remaining = int(response.headers.get("X-RateLimit-Remaining", 60))
        self.reset_at = int(response.headers.get("X-RateLimit-Reset", 0))
        return response
```
Request Queuing
```python
import asyncio
import time


class RequestQueue:
    """Token bucket allowing `rate_limit` requests per minute on average."""

    def __init__(self, rate_limit=60):
        self.rate_limit = rate_limit
        self.tokens = rate_limit
        self.last_refill = time.monotonic()

    async def execute(self, func, *args, **kwargs):
        await self._acquire_token()
        return await func(*args, **kwargs)

    async def _acquire_token(self):
        while True:
            self._refill_tokens()
            if self.tokens > 0:
                self.tokens -= 1
                return
            await asyncio.sleep(0.1)

    def _refill_tokens(self):
        rate_per_second = self.rate_limit / 60
        elapsed = time.monotonic() - self.last_refill
        refill = int(elapsed * rate_per_second)
        if refill > 0:
            self.tokens = min(self.rate_limit, self.tokens + refill)
            # Advance only by the time spent on whole tokens so fractions carry over
            self.last_refill += refill / rate_per_second
```
Rate Limits by Endpoint
| Endpoint | Limit | Notes |
|---|---|---|
| `POST /api/v1/ask/*` | 60/min | Per organization |
| `POST /api/v1/agents/*/messages` | 50/min | Per agent |
| `GET /api/v1/agents` | 120/min | Higher for read ops |
| `GET /api/v1/usage` | 120/min | Higher for read ops |
| `PUT /api/v1/webhook` | 10/min | Configuration endpoint |
| `POST /api/v1/webhook/test` | 10/min | Testing endpoint |
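If you run a separate client-side limiter per endpoint, a lookup like the following maps a request to the limit in the table. This is a sketch: `ENDPOINT_LIMITS` and `limit_for` are hypothetical names, and falling back to the 60/min organization default for unlisted endpoints is an assumption, not documented behavior.

```python
from fnmatch import fnmatchcase

# (method, path pattern, requests per minute), copied from the table above
ENDPOINT_LIMITS = [
    ("POST", "/api/v1/ask/*", 60),
    ("POST", "/api/v1/agents/*/messages", 50),
    ("GET", "/api/v1/agents", 120),
    ("GET", "/api/v1/usage", 120),
    ("PUT", "/api/v1/webhook", 10),
    ("POST", "/api/v1/webhook/test", 10),
]

DEFAULT_LIMIT = 60  # assumption: unlisted endpoints use the organization default


def limit_for(method: str, path: str) -> int:
    """Return the per-minute limit for a request, matching `*` wildcards."""
    for allowed_method, pattern, limit in ENDPOINT_LIMITS:
        if method.upper() == allowed_method and fnmatchcase(path, pattern):
            return limit
    return DEFAULT_LIMIT
```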
Agent-Level Rate Limits
Individual agents have their own internal rate limits:
| Operation | Limit | Purpose |
|---|---|---|
| Tool calls | 5/minute | Prevent runaway tools |
| HTTP requests | 3/minute | Limit external calls |
| Web search | 3/minute | API cost control |
| Git push | 10/hour | Prevent spam commits |
These limits are per-agent and cannot be changed via API.
Per-Run Cost Controls
Every agent run has hard limits that cannot be exceeded. These limits are fail-closed: when a limit is hit, the run terminates immediately with an error.
Run Limits
| Limit | Pro Default | Enterprise | Description |
|---|---|---|---|
| `max_model_calls` | 8 | 20 | Maximum LLM API calls per run |
| `max_total_tokens` | 8,000 | 32,000 | Maximum tokens (input + output) per run |
| `max_wall_time_seconds` | 120 | 300 | Maximum execution time |
| `max_tool_calls` | 20 | 50 | Maximum tool invocations per run |
| `max_tool_output_bytes` | 100 KB | 500 KB | Maximum tool output size |
How Run Limits Work
```text
┌─────────────────────────────────────────────────────────────────────────┐
│ AGENT RUN LIFECYCLE │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Request Received LLM Calls Tool Calls Response │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ ────●─────────────────●─────────────●─────────────●──────────────── │
│ │ │ │ │ │
│ Limits initialized │ │ Run complete │
│ │ │ │
│ ┌─────────────┴─────────────┴─────────────┐ │
│ │ At each step, limits are checked: │ │
│ │ - model_calls < max_model_calls? │ │
│ │ - total_tokens < max_total_tokens? │ │
│ │ - wall_time < max_wall_time_seconds? │ │
│ │ - tool_calls < max_tool_calls? │ │
│ │ │ │
│ │ If ANY limit exceeded → Run terminates │ │
│ └──────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
```
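The checks in the diagram can be modeled client-side, for example to estimate whether a workflow fits a tier's limits before running it. The sketch below uses the Pro defaults from the Run Limits table and mirrors the error's `limit_type`, `limit_value`, and `current_value` fields. The class and function names are hypothetical, `max_tool_output_bytes` is omitted as in the diagram, and this is not Ag2Trust's server-side code.

```python
from dataclasses import dataclass


class RunLimitExceeded(Exception):
    def __init__(self, limit_type: str, limit_value: float, current_value: float):
        super().__init__(f"Run limit exceeded: {limit_type} ({limit_value})")
        self.limit_type = limit_type
        self.limit_value = limit_value
        self.current_value = current_value


@dataclass
class RunLimits:
    # Pro defaults from the Run Limits table
    max_model_calls: int = 8
    max_total_tokens: int = 8_000
    max_wall_time_seconds: float = 120
    max_tool_calls: int = 20


@dataclass
class RunState:
    model_calls: int = 0
    total_tokens: int = 0
    wall_time: float = 0.0
    tool_calls: int = 0


def check_limits(state: RunState, limits: RunLimits) -> None:
    """Raise on the first limit already reached (fail-closed), as in the diagram."""
    checks = [
        ("max_model_calls", state.model_calls, limits.max_model_calls),
        ("max_total_tokens", state.total_tokens, limits.max_total_tokens),
        ("max_wall_time_seconds", state.wall_time, limits.max_wall_time_seconds),
        ("max_tool_calls", state.tool_calls, limits.max_tool_calls),
    ]
    for name, current, maximum in checks:
        if not current < maximum:
            raise RunLimitExceeded(name, maximum, current)
```

Calling `check_limits` before each model or tool call means the 9th model call is refused, which matches the example error below (`current_value` 8, `limit_value` 8).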
Run Limit Exceeded Error
When a per-run limit is exceeded:
```http
HTTP/1.1 200 OK
Content-Type: application/json

{
  "response": null,
  "error": {
    "error_code": "run_limit_exceeded",
    "message": "Run limit exceeded: max_model_calls (8)",
    "details": {
      "limit_type": "max_model_calls",
      "limit_value": 8,
      "current_value": 8
    }
  },
  "metadata": {
    "tokens_used": 6542,
    "model_calls": 8,
    "wall_time_seconds": 45.2
  }
}
```
Response Status
Run limit errors return HTTP 200 with the error in the response body, because the run executed partially before hitting the limit. Check the error field of every 200 response instead of relying on the status code alone.
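Because these errors arrive with status 200, `requests`' `raise_for_status()` will not catch them. A minimal sketch of unwrapping a run response, assuming a successful run returns its result in the `response` field and a failed one fills `error` as shown above; `AgentRunError` and `unwrap_run_response` are hypothetical names.

```python
from typing import Any, Mapping


class AgentRunError(Exception):
    """A run-level error delivered inside an HTTP 200 response body."""

    def __init__(self, error_code: str, message: str, details: Mapping[str, Any]):
        super().__init__(f"{error_code}: {message}")
        self.error_code = error_code
        self.details = dict(details)


def unwrap_run_response(body: Mapping[str, Any]) -> Any:
    """Return body["response"], raising AgentRunError if `error` is set."""
    error = body.get("error")
    if not error:
        return body.get("response")
    if isinstance(error, Mapping):  # {"error_code", "message", "details"} shape
        raise AgentRunError(
            error.get("error_code", "unknown"),
            error.get("message", ""),
            error.get("details") or {},
        )
    # Fallback for a plain-string error field
    raise AgentRunError(body.get("error_code", "unknown"), str(error),
                        body.get("details") or {})
```

With `requests`, call `unwrap_run_response(resp.json())` after checking the HTTP status. The daily budget error described under Daily Token Budget uses the same body shape, so this check surfaces it too.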
Configuring Run Limits
Run limits are configured per tier in your subscription. Contact support to adjust limits for your account.
Daily Token Budget
Ag2Trust enforces a daily token budget per customer to prevent runaway costs. This is a hard cap on total tokens consumed across all runs in each UTC day (00:00 to 24:00 UTC).
How Daily Budget Works
The daily budget uses atomic Redis-based reservation to ensure you never exceed your budget, even under high concurrency:
- Reserve upfront: At run start, tokens are reserved against your daily budget
- Execute: The run executes LLM calls using the reserved capacity
- Finalize: At run end, unused reserved tokens are released back to your budget
```text
┌─────────────────────────────────────────────────────────────────────────┐
│ DAILY BUDGET ENFORCEMENT │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Daily Budget: 100,000 tokens │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Run A │ │ Run B │ │ Run C │ │
│ │ Reserve 8K │ │ Reserve 8K │ │ Reserve 8K │ │
│ │ Used: 5K │ │ Used: 7K │ │ Used: 6K │ │
│ │ Release 3K │ │ Release 1K │ │ Release 2K │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
│ Reserved: 24K (atomically checked against budget) │
│ Actually used: 18K │
│ Released: 6K (returned to budget for other runs) │
│ │
│ Remaining budget: 82K (100K - 18K) │
│ │
└─────────────────────────────────────────────────────────────────────────┘
```
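The reserve, execute, and finalize steps can be modeled in memory to see how the accounting in the diagram works out. In this sketch a `threading.Lock` stands in for the atomic Redis operations, and `DailyTokenBudget` is a hypothetical class, not Ag2Trust's implementation. Replaying the diagram's three runs leaves 82,000 tokens, matching the figure.

```python
import threading


class DailyTokenBudget:
    """In-memory model of reserve -> execute -> finalize accounting.

    A lock stands in for atomic Redis operations; this illustrates
    the bookkeeping only.
    """

    def __init__(self, daily_budget: int):
        self.daily_budget = daily_budget
        self.committed = 0  # tokens held by active runs or used by finished ones
        self._lock = threading.Lock()

    def reserve(self, tokens: int) -> bool:
        """Atomically reserve `tokens`; refuse if the budget would be exceeded."""
        with self._lock:
            if self.committed + tokens > self.daily_budget:
                return False
            self.committed += tokens
            return True

    def finalize(self, reserved: int, used: int) -> None:
        """Return the unused part of a reservation to the budget."""
        with self._lock:
            self.committed -= reserved - min(used, reserved)

    @property
    def remaining(self) -> int:
        with self._lock:
            return self.daily_budget - self.committed


# Replay the diagram: three runs each reserve 8K, then use 5K, 7K, and 6K.
budget = DailyTokenBudget(100_000)
usage = [5_000, 7_000, 6_000]
ok = [budget.reserve(8_000) for _ in usage]  # [True, True, True]: 24K reserved
for used in usage:
    budget.finalize(reserved=8_000, used=used)
print(budget.remaining)  # 82000
```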
Daily Budget Limits by Tier
| Tier | Daily Token Budget | Notes |
|---|---|---|
| Free | 10,000 | Hard cap |
| Starter | 50,000 | Hard cap |
| Pro | 100,000 | Configurable |
| Enterprise | Custom | Unlimited available |
Budget Exceeded Error
When your daily budget is exhausted:
```http
HTTP/1.1 200 OK
Content-Type: application/json

{
  "response": null,
  "error": {
    "error_code": "daily_budget_exceeded",
    "message": "Daily token budget exhausted (budget=100000, reserved=98500, requested=8000)",
    "details": {
      "daily_budget": 100000,
      "reserved_today": 98500,
      "requested": 8000,
      "remaining": 1500
    }
  }
}
```
Adaptive Reservation
When your remaining daily budget is low but not exhausted, Ag2Trust uses adaptive reservation:
- If the remaining budget is at least 2,000 tokens, the run proceeds with a capped max_tokens.
- If the remaining budget is below 2,000 tokens, the run is rejected, since there is not enough left for a meaningful response.
This lets more runs succeed when the budget is low, while avoiding runs that would be cut off before producing a useful response.
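A sketch of the decision above, assuming the cap on `max_tokens` equals whatever budget remains (the exact cap is not specified) and that a run normally requests its full per-run allowance. `adaptive_reservation` and `MIN_RESERVATION` are hypothetical names. The budget-exceeded example (1,500 remaining, 8,000 requested) falls into the reject branch.

```python
from typing import Optional

MIN_RESERVATION = 2_000  # below this, the run is rejected


def adaptive_reservation(remaining: int, requested: int) -> Optional[int]:
    """Return how many tokens to reserve for a run, or None to reject it.

    Assumption: the capped max_tokens is simply the remaining budget.
    """
    if remaining >= requested:
        return requested   # enough budget: normal reservation
    if remaining >= MIN_RESERVATION:
        return remaining   # low budget: proceed with a capped max_tokens
    return None            # too little left for a meaningful response
```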
Budget Reset
Daily budgets reset at 00:00 UTC each day. The reset is automatic and immediate.
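To defer non-urgent work until after the reset, a client can compute the time left until the next 00:00 UTC. A small standard-library sketch; `seconds_until_budget_reset` is a hypothetical helper and expects a timezone-aware `now` when one is passed.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def seconds_until_budget_reset(now: Optional[datetime] = None) -> float:
    """Seconds from `now` (timezone-aware) until the next 00:00 UTC."""
    now = datetime.now(timezone.utc) if now is None else now.astimezone(timezone.utc)
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now).total_seconds()
```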
Monitoring Budget
Monitor your token usage via the Usage & Metrics endpoints to avoid hitting budget limits during peak hours.
Demo Account Limits
Demo accounts have restricted limits to prevent abuse and ensure fair resource sharing. These limits cannot be changed and are enforced across all API and dashboard operations.
Demo accounts are for evaluation only
Demo accounts are intended for evaluation and testing. For production workloads, upgrade to a full account.
Rate & Concurrency Limits
| Resource | Demo Limit | Full Account |
|---|---|---|
| API requests | 10/minute | Configurable |
| Dashboard messages | 10/minute | Unlimited |
| Concurrent runs | 2 | Configurable |
| Concurrent runs per endpoint | 1 | Configurable |
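Enforcing the concurrent-run limit on the client side keeps your own code from exceeding it. A minimal asyncio sketch, assuming each run is started by a coroutine function you supply; `run_all` is a hypothetical helper, and the limit of 2 comes from the table above. For the per-endpoint limit of 1, you could keep one `asyncio.Semaphore(1)` per endpoint in the same way.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")

DEMO_MAX_CONCURRENT_RUNS = 2  # from the demo limits table


async def run_all(
    factories: Iterable[Callable[[], Awaitable[T]]],
    max_concurrent: int = DEMO_MAX_CONCURRENT_RUNS,
) -> List[T]:
    """Run coroutine factories with at most `max_concurrent` in flight.

    Results come back in the same order as `factories`.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(factory: Callable[[], Awaitable[T]]) -> T:
        async with semaphore:  # waits while max_concurrent runs are active
            return await factory()

    return list(await asyncio.gather(*(guarded(f) for f in factories)))
```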
Token & Run Limits
| Resource | Demo Limit | Full Account |
|---|---|---|
| Daily token budget | 10,000 | Configurable |
| Tokens per run | 4,000 | Runtime default |
| Model calls per run | 5 | Runtime default |
| Tool calls per run | 10 | Runtime default |
| Wall time per run | 30 seconds | Runtime default |
Resource Creation Limits
Demo accounts have hard limits on resource creation:
| Resource | Demo Limit | Full Account |
|---|---|---|
| Agents | 5 | Unlimited |
| Agent Types | 3 | Unlimited |
| Teams | 2 | 3 |
| API Keys | 3 | Unlimited |
Demo Account Errors
When you hit a demo resource limit, you'll receive a 403 Forbidden response:
```http
HTTP/1.1 403 Forbidden

{
  "detail": "Demo account limit reached: maximum 5 agents. Upgrade to a full account for unlimited resources."
}
```
For rate limits, you'll receive a 429 Too Many Requests response:
```http
HTTP/1.1 429 Too Many Requests

{
  "error": "rate_limit_exceeded",
  "message": "Demo account rate limit: 10 messages/minute via dashboard.",
  "retry_after": 45
}
```
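The two 429 bodies in this guide differ: the standard one nests `retry_after` under `details`, while the demo one puts it at the top level. A hedged sketch of a helper that checks the Retry-After header first and then both body shapes; `retry_after_seconds` is a hypothetical name, and a plain `dict` of headers needs the exact `Retry-After` casing.

```python
from typing import Any, Mapping


def retry_after_seconds(headers: Mapping[str, str],
                        body: Mapping[str, Any],
                        default: float = 5.0) -> float:
    """Find the server-suggested wait, in seconds, in a 429 response.

    Checks the Retry-After header, then details.retry_after (standard
    429 body), then a top-level retry_after (demo 429 body).
    """
    details = body.get("details")
    candidates = [
        headers.get("Retry-After"),
        details.get("retry_after") if isinstance(details, Mapping) else None,
        body.get("retry_after"),
    ]
    for value in candidates:
        if value is None:
            continue
        try:
            return max(0.0, float(value))
        except (TypeError, ValueError):
            continue  # unparseable value; try the next source
    return default
```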
Special Demo Account Rules
- Bootstrap API keys cannot be revoked: Demo accounts have a bootstrap API key that is required for demo code resolution. Attempting to revoke it returns 403 Forbidden.
- Daily budget is hard-capped: Demo accounts are limited to 10,000 tokens per day. When the budget is exhausted, requests fail with a budget exceeded error.
- Limits apply to both API and dashboard: Unlike full accounts, where dashboard usage may be unrestricted, demo accounts enforce limits on all entry points.
Enterprise Rate Limits
Higher limits are available on Enterprise plans:
| Plan | Rate Limit |
|---|---|
| Starter | 60/min |
| Professional | 300/min |
| Enterprise | Custom |
Contact sales for Enterprise pricing.
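Best Practices
1. Implement Retry Logic
Always handle 429 responses gracefully:
```python
import requests

# Don't: crash on rate limit
response = requests.post(url, json=payload)
response.raise_for_status()  # raises HTTPError on a 429

# Do: handle 429 explicitly, then treat other errors as failures
response = requests.post(url, json=payload)
if response.status_code == 429:
    handle_rate_limit(response)  # your own retry/backoff logic
else:
    response.raise_for_status()
```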
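2. Use Backoff with Jitter
Random jitter keeps clients that were rate-limited at the same moment from all retrying at once (the thundering herd problem):
```python
import random
import time

# base_delay (seconds) and attempt (0-based retry number) come from your retry loop
delay = base_delay * (2 ** attempt)
jitter = random.uniform(0, delay * 0.1)  # up to 10% extra, different for each client
time.sleep(delay + jitter)
```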
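3. Monitor Rate Limit Headers
Track your usage proactively:
```python
remaining = response.headers.get("X-RateLimit-Remaining")
# Guard against a missing header before converting to int
if remaining is not None and int(remaining) < 10:
    alert_operations_team()  # placeholder for your own alerting hook
```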
4. Batch When Possible
Reduce request count by batching operations where supported.
5. Use Async for Long Tasks
Instead of polling for results, use webhooks for async operations; every poll is an API request that counts against your rate limit.
Debugging Rate Limits
Check Current Usage
Common Issues
| Issue | Cause | Solution |
|---|---|---|
| Constant 429s | Too many requests | Implement queuing |
| Burst 429s | Sudden traffic spike | Add rate limiting client-side |
| Slow reset | Multiple clients sharing limit | Coordinate requests |
Next Steps
- Billing & Pricing - Agent-hour billing and tier limits
- Usage & Metrics - Monitor token usage and costs
- Session Control - Manage session lifecycle
- Error Codes - Complete error reference
- Authentication - API key management
- Pool Endpoint - Load-balanced messaging