Agent Pool Endpoint¶
The pool endpoint provides load-balanced access to agents with automatic conversation thread management.
Endpoint¶
POST /api/v1/ask/{endpoint_slug}
Overview¶
The pool endpoint:
- Routes messages to available agents automatically
- Maintains conversation context across messages
- Provides sticky routing for conversation continuity
- Handles load balancing across replicas
Request¶
Headers¶
| Header | Required | Description |
|---|---|---|
| X-API-Key | Yes | Your API key |
| Content-Type | Yes | application/json |
| Idempotency-Key | No | Unique key (max 256 chars) to prevent duplicate agent runs on retries. See Idempotency below. |
Path Parameters¶
| Parameter | Type | Description |
|---|---|---|
| endpoint_slug | string | The agent type's endpoint slug |
Body¶
| Field | Type | Required | Description |
|---|---|---|---|
| content | string | Yes | Message content (max 10,000 chars) |
| thread_id | string | No | Thread ID for conversation continuity |
| context_variables | object | No | Key-value pairs injected into agent context. Keys must be UPPERCASE_SNAKE_CASE. Max 20 variables, 4KB total. |
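The body limits above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: `validate_ask_payload` is a hypothetical helper, and it assumes the 4KB limit is measured on the UTF-8 JSON encoding of `context_variables`, which the table does not specify.

```python
import json
import re

MAX_CONTENT_CHARS = 10_000
MAX_VARIABLES = 20
# Assumption: "4KB total" is measured on the UTF-8 JSON encoding.
MAX_VARIABLES_BYTES = 4 * 1024

# One or more uppercase/digit groups joined by single underscores.
_KEY_PATTERN = re.compile(r"[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*")


def validate_ask_payload(content, thread_id=None, context_variables=None):
    """Return a request body dict, or raise ValueError on a documented limit."""
    if not content:
        raise ValueError("content is required")
    if len(content) > MAX_CONTENT_CHARS:
        raise ValueError("content exceeds 10,000 characters")

    body = {"content": content}
    if thread_id:
        body["thread_id"] = thread_id

    if context_variables:
        if len(context_variables) > MAX_VARIABLES:
            raise ValueError("more than 20 context variables")
        for key in context_variables:
            if not _KEY_PATTERN.fullmatch(key):
                raise ValueError(f"key {key!r} is not UPPERCASE_SNAKE_CASE")
        encoded = json.dumps(context_variables).encode("utf-8")
        if len(encoded) > MAX_VARIABLES_BYTES:
            raise ValueError("context_variables exceed 4KB")
        body["context_variables"] = context_variables
    return body
```

The returned dict can be passed directly as the `json=` argument of the request. The server remains the authority on these limits; this check only avoids a round trip for obviously invalid input.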
Response¶
Success (200 OK)¶
{
  "thread_id": "thread_abc123xyz",
  "response_id": "550e8400-e29b-41d4-a716-446655440000",
  "agent_id": 42,
  "content": "Hello! I'd be happy to help you. What can I assist you with today?",
  "timestamp": "2025-01-15T10:30:00Z",
  "citations": [
    {
      "source_id": "source:1",
      "document_name": "Product FAQ",
      "download_url": "https://api.ag2trust.com/api/v1/knowledge/documents/123/download?thread_id=thread_abc123xyz&token=..."
    }
  ],
  "files": [
    {
      "ref_id": "file:1",
      "filename": "report.pdf",
      "link_text": "Download Report",
      "download_url": "https://api.ag2trust.com/api/v1/uploads/456/download?thread_id=thread_abc123xyz&token=..."
    }
  ],
  "requested_variables": ["ORDER_ID"],
  "draft": null,
  "pending_action": null,
  "suggested_actions": [
    {
      "name": "more_details",
      "prompt": "Tell me more about this",
      "description": "More Details"
    }
  ]
}
| Field | Type | Description |
|---|---|---|
| thread_id | string | Thread ID (use for follow-up messages) |
| response_id | string | UUID for submitting feedback (24h TTL) |
| agent_id | integer | ID of the agent that handled the request |
| content | string | Agent's response text |
| timestamp | string | ISO 8601 timestamp |
| citations | array\|null | Source citations from knowledge documents. Each has source_id, document_name, and optional download_url (signed, 24h TTL). |
| files | array\|null | Agent-generated files. Each has ref_id, filename, link_text, and optional download_url (signed, 1h TTL). |
| requested_variables | array\|null | Names of {{VAR}} placeholders that the agent referenced but that were not provided in context_variables. |
| draft | object\|null | Structured draft (email, SMS, or document) created by the agent. Contains type and content. |
| pending_action | object\|null | An action requiring user confirmation (e.g., sending an email). Contains action_id, tool, preview, and expires_at. See Session Management for confirming/canceling. |
| suggested_actions | array\|null | Clickable follow-up buttons (max 3). Each has name, prompt, description. |
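Because several of these fields can be null, client code needs to treat null and an empty list the same way. The sketch below shows one way to do that; `summarize_response` and the keys of the dict it returns are hypothetical names chosen for illustration, not part of the API.

```python
def summarize_response(data):
    """Collect the optional parts of a pool response, treating null as empty."""
    citations = data.get("citations") or []
    files = data.get("files") or []
    actions = data.get("suggested_actions") or []
    return {
        # Variables to supply via context_variables on the next request.
        "missing_variables": data.get("requested_variables") or [],
        # download_url is optional on both citations and files.
        "citation_links": [c["download_url"] for c in citations if c.get("download_url")],
        "file_links": [f["download_url"] for f in files if f.get("download_url")],
        # A non-null pending_action means the user must confirm or cancel.
        "needs_confirmation": data.get("pending_action") is not None,
        "button_labels": [a["description"] for a in actions],
    }
```

Note that the signed download URLs expire (24 hours for citations, 1 hour for files), so they are better fetched or displayed promptly than stored.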
Suggested Actions¶
Suggested actions are clickable follow-up buttons that appear in the response. They help guide users to common next steps.
Structure¶
{
  "suggested_actions": [
    {
      "name": "send_email",
      "prompt": "Send the email now",
      "description": "Send Email"
    },
    {
      "name": "edit_draft",
      "prompt": "Let me edit the draft first",
      "description": "Edit Draft"
    }
  ]
}
| Field | Description |
|---|---|
| name | Unique identifier for the action |
| prompt | Text to send as the next message when clicked |
| description | Human-readable button label |
How to Use¶
When a user clicks a suggested action button, send action.prompt as the next message:
# User clicks "Send Email" button
next_message = action["prompt"]  # "Send the email now"
response = requests.post(
    url,
    headers=headers,
    json={
        "thread_id": thread_id,
        "content": next_message
    }
)
Sources¶
Suggested actions can come from multiple sources:
| Source | Example | When |
|---|---|---|
| Backend state | "Send Email" | When a draft email exists |
| Agent tools | "View Details" | Tool registers default actions |
| Agent decision | "Make it shorter" | Agent determines contextually relevant actions |
Actions are deduplicated by name (max 3 returned).
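To make the deduplication rule concrete, here is an illustrative sketch of merging actions from several sources. It is not the platform's implementation: `merge_suggested_actions` is a hypothetical function, and it assumes that the first occurrence of a name wins and that sources are consulted in the order given, neither of which is stated above.

```python
def merge_suggested_actions(*sources, limit=3):
    """Merge action lists, keeping the first action seen for each name."""
    seen = set()
    merged = []
    for source in sources:
        for action in source:
            if action["name"] in seen:
                continue  # duplicate name: keep the earlier action
            seen.add(action["name"])
            merged.append(action)
            if len(merged) == limit:
                return merged  # cap reached (max 3 by default)
    return merged
```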
Examples¶
First Message (New Thread)¶
Python:
import requests

response = requests.post(
    "https://api.ag2trust.com/api/v1/ask/support",
    headers={
        "X-API-Key": "cust_your_api_key",
        "Content-Type": "application/json"
    },
    json={"content": "Hello, I need help with my order"}
)
data = response.json()
print(f"Thread ID: {data['thread_id']}")
print(f"Response: {data['content']}")

JavaScript:
const response = await fetch(
  'https://api.ag2trust.com/api/v1/ask/support',
  {
    method: 'POST',
    headers: {
      'X-API-Key': 'cust_your_api_key',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      content: 'Hello, I need help with my order'
    })
  }
);
const data = await response.json();
console.log(`Thread ID: ${data.thread_id}`);
console.log(`Response: ${data.content}`);
Follow-up Message (Same Thread)¶
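A follow-up differs from a first message only in that it includes the thread_id returned earlier. The sketch below is one way to write it: `ask` and `two_turn_conversation` are hypothetical helpers, `session` is assumed to be a `requests.Session()` (or any object with a compatible `post` method), and the follow-up text is made up for illustration.

```python
API_URL = "https://api.ag2trust.com/api/v1/ask/support"


def ask(session, api_key, content, thread_id=None):
    """POST one message; include thread_id to continue an existing thread."""
    body = {"content": content}
    if thread_id is not None:
        body["thread_id"] = thread_id
    response = session.post(
        API_URL,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        json=body,
        timeout=65,  # slightly longer than the 60-second server timeout
    )
    response.raise_for_status()
    return response.json()


def two_turn_conversation(session, api_key):
    """Send a first message, then a follow-up on the same thread."""
    first = ask(session, api_key, "Hello, I need help with my order")
    # Reuse the thread_id from the first reply so the agent sees the history.
    second = ask(
        session,
        api_key,
        "The order number is 12345",
        thread_id=first["thread_id"],
    )
    return first, second
```

Calling `two_turn_conversation(requests.Session(), "cust_your_api_key")` would perform both requests against the `support` pool.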
Thread Management¶
How Threads Work¶
- First message: No thread_id → new thread created
- Subsequent messages: Include thread_id → continues conversation
- Context maintained: Agent receives conversation history
Thread Behavior¶
| Aspect | Value |
|---|---|
| Thread TTL | 15 minutes (sliding window) |
| Max messages | 100 per thread |
| Context passed | Last N messages fitting context window |
Thread Expiration
Threads expire 15 minutes after the last activity. This aligns with session billing to avoid unexpected charges. If a thread expires, start a new conversation by sending a message without a thread_id; the response returns the new thread's ID.
Sticky Routing¶
When you include a thread_id:
- System checks if previous agent is available
- If available (queue < 3), routes to same agent
- If unavailable, routes to least busy agent
- Context is passed regardless of which agent handles it
Load Balancing¶
Routing Logic¶
Priority 1: Sticky (same agent, queue < 3)
↓
Priority 2: Available (any agent, queue < 3)
↓
Priority 3: Overflow (any agent, queue < 10)
↓
Reject: 503 Service Unavailable
Example Scenarios¶
| Scenario | Result |
|---|---|
| 3 agents, all idle | Routes to any |
| 3 agents, 1 busy | Routes to idle ones |
| All agents at queue 5 | Routes to least busy |
| All agents at queue 10+ | 503 error |
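The priority order and the scenarios above can be summarized in a few lines. This is an illustrative model only, not the platform's routing code: `pick_agent` is a hypothetical function, and tie-breaking between equally busy agents (here, the first one in the mapping) is an assumption.

```python
STICKY_LIMIT = 3     # sticky and available routing require queue < 3
OVERFLOW_LIMIT = 10  # overflow routing requires queue < 10


def pick_agent(queues, previous_agent=None):
    """Choose an agent from {agent_id: queue_depth}; None means respond 503."""
    # Priority 1: sticky routing to the agent that last served the thread.
    if previous_agent in queues and queues[previous_agent] < STICKY_LIMIT:
        return previous_agent
    if not queues:
        return None
    # Priorities 2 and 3: the least busy agent qualifies as "available"
    # when its queue is below 3 and as "overflow" when it is below 10.
    least_busy = min(queues, key=queues.get)
    if queues[least_busy] < OVERFLOW_LIMIT:
        return least_busy
    # Every agent is at queue 10 or more: reject with 503.
    return None
```

Selecting the least busy agent covers both lower priorities at once: if any agent has a queue below 3, the least busy one does too.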
Setup Requirements¶
Before using the pool endpoint:
- Create an Agent Type with an endpoint_slug
- Create agents of that type (or deploy via team)
- Start the agents
Agent Type: Customer Support
└── endpoint_slug: "support"
└── Agents: support-1, support-2, support-3
API URL: POST /api/v1/ask/support
Error Responses¶
404 Not Found¶
The endpoint slug doesn't exist for your organization.
503 Service Unavailable¶
All agents are busy or offline.
Headers:
Retry-After: number of seconds to wait before retrying
Handling:
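import time
import requests

response = requests.post(url, json=payload, headers=headers, timeout=65)
if response.status_code == 503:
    # Honor the server's Retry-After hint; fall back to 5 seconds
    retry_after = int(response.headers.get("Retry-After", 5))
    time.sleep(retry_after)
    # Retry the request
    response = requests.post(url, json=payload, headers=headers, timeout=65)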
504 Gateway Timeout¶
Agent didn't respond within 60 seconds.
Idempotency¶
The Idempotency-Key header prevents duplicate agent runs when retrying requests. This is important because each agent run consumes tokens and counts toward your daily budget.
How It Works¶
- Generate a unique key (e.g., UUID) for each logical request
- Include it as the Idempotency-Key header
- If you retry with the same key, the platform detects the duplicate and avoids running the agent again
Behavior by State¶
| State | Response | Description |
|---|---|---|
| New key | Normal response | Agent processes the request |
| In progress | 409 Conflict | Another request with this key is still running |
| Completed | 200 with confirmation | Agent already ran; returns cached metadata (not the original content) |
| Failed | Normal response | Previous attempt failed; retry is allowed |
Keys expire after 24 hours.
Response Content
When a completed request is replayed, the response includes the thread_id and agent_id but not the original content. The purpose of idempotency here is to prevent double token spending, not to cache responses.
Example¶
import uuid
import requests

idempotency_key = str(uuid.uuid4())
response = requests.post(
    "https://api.ag2trust.com/api/v1/ask/support",
    headers={
        "X-API-Key": "cust_your_api_key",
        "Idempotency-Key": idempotency_key,
    },
    json={"content": "Hello, I need help"},
    timeout=65,
)
# Safe to retry with the same idempotency_key on network errors
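The states in the table above suggest a retry loop along the following lines. This is a sketch under assumptions: `ask_with_idempotency`, the attempt count, and the wait time are hypothetical choices, and `session` is assumed to be a `requests.Session()` or a compatible object. The exceptions that Requests raises for network failures derive from `OSError`, which is why that is the exception caught here.

```python
import time
import uuid


def ask_with_idempotency(session, url, headers, payload,
                         attempts=3, wait_seconds=2.0, sleep=time.sleep):
    """Retry one logical request under a single Idempotency-Key."""
    key = str(uuid.uuid4())  # one key for every attempt of this request
    request_headers = {**headers, "Idempotency-Key": key}
    for attempt in range(attempts):
        try:
            response = session.post(
                url, headers=request_headers, json=payload, timeout=65
            )
        except OSError:
            # Network error or timeout: the outcome is unknown, so retry
            # with the same key rather than risk a second agent run.
            pass
        else:
            if response.status_code != 409:
                return response
            # 409 Conflict: an earlier attempt with this key is still running.
        if attempt < attempts - 1:
            sleep(wait_seconds)
    raise RuntimeError("no final response after retries")
```

If an earlier attempt actually completed, the response returned here is the replayed 200, which carries the thread_id and agent_id but not the original content, so callers should not assume the agent's text is present.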
Best Practices¶
1. Always Store thread_id¶
# Store thread_id for conversation continuity
thread_id = None

def send_message(content):
    global thread_id
    payload = {"content": content}
    if thread_id:
        payload["thread_id"] = thread_id
    response = requests.post(url, json=payload, headers=headers)
    data = response.json()
    thread_id = data["thread_id"]  # Save for next message
    return data["content"]
2. Handle 503 with Retry¶
Use an Idempotency-Key when retrying to avoid duplicate agent runs and double token charges:
import uuid
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retries = Retry(
    total=3,
    backoff_factor=1,
    status_forcelist=[503],
    # POST is not retried by default; opt in explicitly (urllib3 >= 1.26)
    allowed_methods=["POST"],
)
session.mount('https://', HTTPAdapter(max_retries=retries))

# Use the same idempotency key across retries
idempotency_key = str(uuid.uuid4())
response = session.post(
    url,
    json=payload,
    headers={**headers, "Idempotency-Key": idempotency_key},
)
3. Set Reasonable Timeouts¶
response = requests.post(
    url,
    json=payload,
    headers=headers,
    timeout=65  # Slightly longer than server timeout
)
4. Monitor Thread Expiration¶
Threads expire after 15 minutes of inactivity:
# Start new conversation if thread might be expired
last_message_time = get_last_message_time()
if time.time() - last_message_time > 840:  # 14 minutes (buffer before 15 min TTL)
    thread_id = None  # Start fresh
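The thread_id bookkeeping and the expiry check can live in one small object instead of module-level globals. The following is a minimal sketch: `ThreadTracker` is a hypothetical class, and the 60-second safety margin mirrors the 14-minute buffer used above.

```python
import time

THREAD_TTL_SECONDS = 15 * 60  # sliding window, reset by each message
SAFETY_MARGIN_SECONDS = 60    # treat the thread as expired one minute early


class ThreadTracker:
    """Remember the current thread_id and forget it once it may have expired."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._thread_id = None
        self._last_activity = None

    def current(self):
        """Return the thread_id to send, or None to start a new thread."""
        if self._thread_id is None:
            return None
        idle = self._clock() - self._last_activity
        if idle > THREAD_TTL_SECONDS - SAFETY_MARGIN_SECONDS:
            self._thread_id = None  # likely expired on the server: start fresh
        return self._thread_id

    def record(self, thread_id):
        """Call after every successful response to slide the window forward."""
        self._thread_id = thread_id
        self._last_activity = self._clock()
```

Before each request, add `tracker.current()` to the payload when it is not None; after each response, call `tracker.record(data["thread_id"])`.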
Team Ask Endpoint¶
For multi-agent teams, use the team ask endpoint:
Request Body¶
| Field | Type | Required | Description |
|---|---|---|---|
| content | string | Yes | Message content (max 50,000 chars) |
| thread_id | string | No | Thread ID for conversation continuity |
The team endpoint has a longer timeout (120s vs 60s) to accommodate multi-agent collaboration. The response schema matches TeamAskResponse:
{
  "thread_id": "thread_abc123xyz",
  "team_slug": "engineering",
  "agent_id": 42,
  "content": "Here's the analysis from our team...",
  "timestamp": "2025-01-15T10:30:00Z"
}
Next Steps¶
- Session Control - Manage session lifecycle and closing
- Rate Limits - Per-run cost controls and daily budgets
- Webhooks - Async response delivery
- Error Codes - Complete error reference