Rate Limits¶
AG2Trust implements rate limiting to ensure fair usage and system stability.
Rate Limit Overview¶
| Scope | Default Limit | Configurable |
|---|---|---|
| Organization | 60 requests/minute | Yes (per plan) |
| Per Agent | 50 requests/minute | No |
| Webhook delivery | 1/second per customer | No |
Rate Limit Headers¶
Every API response includes rate limit information:
| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests per minute |
| X-RateLimit-Remaining | Requests left in current window |
| X-RateLimit-Reset | Unix timestamp when limit resets |
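These headers are plain strings on every response, so a client can derive its current standing from any call it has already made. A minimal sketch (the `headers` mapping stands in for `response.headers`; the helper name is illustrative):

```python
import time

def rate_limit_status(headers, now=None):
    """Summarize rate-limit headers from any mapping such as response.headers."""
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", 0))
    reset = int(headers.get("X-RateLimit-Reset", 0))
    return {"remaining": remaining, "seconds_to_reset": max(0, reset - now)}
```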
Rate Limit Exceeded (429)¶
When you exceed the rate limit:
```http
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1736937060
Retry-After: 15
```

```json
{
  "error": "Rate limit exceeded",
  "error_code": "RATE_LIMIT_EXCEEDED",
  "details": {
    "limit": 60,
    "reset_at": "2025-01-15T10:31:00Z",
    "retry_after": 15
  }
}
```
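The wait time appears both in the Retry-After header and in the body's details.retry_after. A small client-side sketch that prefers the header and falls back to the body (the helper name and default are illustrative):

```python
def retry_delay(headers, body, default=5):
    """Prefer the Retry-After header, then details.retry_after, then a default."""
    if "Retry-After" in headers:
        return int(headers["Retry-After"])
    return int(body.get("details", {}).get("retry_after", default))
```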
Handling Rate Limits¶
Basic Retry Logic¶
```python
import time
import requests

def send_with_retry(url, payload, headers, max_retries=3):
    for attempt in range(max_retries):
        response = requests.post(url, json=payload, headers=headers)
        if response.status_code == 429:
            retry_after = int(response.headers.get("Retry-After", 5))
            print(f"Rate limited. Waiting {retry_after}s...")
            time.sleep(retry_after)
            continue
        return response
    raise Exception("Max retries exceeded")
```
```javascript
async function sendWithRetry(url, payload, headers, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, {
      method: 'POST',
      headers,
      body: JSON.stringify(payload)
    });
    if (response.status === 429) {
      const retryAfter = parseInt(
        response.headers.get('Retry-After') || '5', 10
      );
      console.log(`Rate limited. Waiting ${retryAfter}s...`);
      await new Promise(r => setTimeout(r, retryAfter * 1000));
      continue;
    }
    return response;
  }
  throw new Error('Max retries exceeded');
}
```
Exponential Backoff¶
```python
import time
import random
import requests

def exponential_backoff(attempt, base=1, max_delay=60):
    """Calculate delay with jitter."""
    delay = min(base * (2 ** attempt), max_delay)
    jitter = random.uniform(0, delay * 0.1)
    return delay + jitter

def send_with_backoff(url, payload, headers, max_retries=5):
    for attempt in range(max_retries):
        response = requests.post(url, json=payload, headers=headers)
        if response.status_code == 429:
            delay = exponential_backoff(attempt)
            print(f"Rate limited. Backing off {delay:.1f}s...")
            time.sleep(delay)
            continue
        return response
    raise Exception("Max retries exceeded")
```
Using Libraries¶
```python
import requests
from tenacity import (
    retry,
    retry_if_result,
    wait_exponential,
    stop_after_attempt
)

def is_rate_limited(response):
    return response.status_code == 429

@retry(
    retry=retry_if_result(is_rate_limited),
    wait=wait_exponential(multiplier=1, max=60),
    stop=stop_after_attempt(5)
)
def send_message(url, payload, headers):
    return requests.post(url, json=payload, headers=headers)
```
```javascript
import axios from 'axios';
import axiosRetry from 'axios-retry';

const client = axios.create({
  baseURL: 'https://agents.ag2trust.com'
});

axiosRetry(client, {
  retries: 3,
  retryCondition: (error) =>
    error.response?.status === 429,
  retryDelay: (retryCount, error) => {
    const retryAfter = error.response?.headers['retry-after'];
    return retryAfter ? Number(retryAfter) * 1000 : retryCount * 1000;
  }
});
```
Proactive Rate Limit Management¶
Monitor Remaining Requests¶
```python
import time
import requests

class RateLimitAwareClient:
    def __init__(self, api_key):
        self.api_key = api_key
        self.remaining = None
        self.reset_at = None

    def send(self, endpoint, payload):
        # Wait if we are close to the limit
        if self.remaining is not None and self.remaining < 5:
            wait_time = max(0, self.reset_at - time.time())
            if wait_time > 0:
                print(f"Approaching limit, waiting {wait_time:.0f}s")
                time.sleep(wait_time)
        response = requests.post(
            f"https://agents.ag2trust.com{endpoint}",
            json=payload,
            headers={"X-API-Key": self.api_key}
        )
        # Update rate limit info from the response headers
        self.remaining = int(
            response.headers.get("X-RateLimit-Remaining", 60)
        )
        self.reset_at = int(
            response.headers.get("X-RateLimit-Reset", 0)
        )
        return response
```
Request Queuing¶
```python
import asyncio
import time

class RequestQueue:
    """Token-bucket limiter that spaces requests to stay under the limit."""

    def __init__(self, rate_limit=60):
        self.rate_limit = rate_limit
        self.tokens = rate_limit
        self.last_refill = time.time()

    async def execute(self, func, *args, **kwargs):
        await self._acquire_token()
        return await func(*args, **kwargs)

    async def _acquire_token(self):
        while True:
            self._refill_tokens()
            if self.tokens > 0:
                self.tokens -= 1
                return
            await asyncio.sleep(0.1)

    def _refill_tokens(self):
        now = time.time()
        elapsed = now - self.last_refill
        refill = int(elapsed * (self.rate_limit / 60))
        if refill > 0:
            self.tokens = min(self.rate_limit, self.tokens + refill)
            self.last_refill = now
```
Rate Limits by Endpoint¶
| Endpoint | Limit | Notes |
|---|---|---|
| POST /api/v1/ask/* | 60/min | Per organization |
| POST /api/v1/agents/*/messages | 50/min | Per agent |
| GET /api/v1/agents | 120/min | Higher for read ops |
| GET /api/v1/usage | 120/min | Higher for read ops |
| PUT /api/v1/webhook | 10/min | Configuration endpoint |
| POST /api/v1/webhook/test | 10/min | Testing endpoint |
Agent-Level Rate Limits¶
Individual agents have their own internal rate limits:
| Operation | Limit | Purpose |
|---|---|---|
| Tool calls | 5/minute | Prevent runaway tools |
| HTTP requests | 3/minute | Limit external calls |
| Web search | 3/minute | API cost control |
| Git push | 10/hour | Prevent spam commits |
These limits are per-agent and cannot be changed via API.
Enterprise Rate Limits¶
Higher limits are available on Enterprise plans:
| Plan | Rate Limit |
|---|---|
| Starter | 60/min |
| Professional | 300/min |
| Enterprise | Custom |
Contact sales for Enterprise pricing.
Best Practices¶
1. Implement Retry Logic¶
Always handle 429 responses gracefully:
```python
# Don't: crash on rate limit
response = requests.post(url, json=payload)
response.raise_for_status()

# Do: handle gracefully
response = requests.post(url, json=payload)
if response.status_code == 429:
    handle_rate_limit(response)
```
2. Use Backoff with Jitter¶
Prevent thundering herd:
```python
delay = base_delay * (2 ** attempt)
jitter = random.uniform(0, delay * 0.1)
time.sleep(delay + jitter)
```
3. Monitor Rate Limit Headers¶
Track your usage proactively:
```python
remaining = response.headers.get("X-RateLimit-Remaining")
if remaining is not None and int(remaining) < 10:
    alert_operations_team()
```
4. Batch When Possible¶
Reduce request count by batching operations where supported.
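Which endpoints accept multiple items per request varies; on the client side, coalescing a backlog into fewer, larger requests starts with simple chunking (the batch size here is a placeholder):

```python
def chunk(items, size=10):
    """Split a list of payloads into fixed-size batches."""
    return [items[i:i + size] for i in range(0, len(items), size)]
```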
5. Use Async for Long Tasks¶
Instead of polling, use webhooks for async operations.
Debugging Rate Limits¶
Check Current Usage¶
Every response includes the X-RateLimit-* headers, so the quickest check is to inspect them on any recent call (for example, with curl -i against GET /api/v1/usage, which is itself rate limited at 120/min).
Common Issues¶
| Issue | Cause | Solution |
|---|---|---|
| Constant 429s | Too many requests | Implement queuing |
| Burst 429s | Sudden traffic spike | Add rate limiting client-side |
| Slow reset | Multiple clients sharing limit | Coordinate requests |
Next Steps¶
- Error Codes - Complete error reference
- Authentication - API key management
- Pool Endpoint - Load-balanced messaging