
Agent Pool Endpoint

The pool endpoint provides load-balanced access to agents with automatic conversation thread management.

Endpoint

POST /api/v1/ask/{endpoint_slug}

Overview

The pool endpoint:

  • Routes messages to available agents automatically
  • Maintains conversation context across messages
  • Provides sticky routing for conversation continuity
  • Handles load balancing across replicas

Request

Headers

Header           Required  Description
X-API-Key        Yes       Your API key
Content-Type     Yes       application/json
Idempotency-Key  No        Unique key (max 256 chars) to prevent duplicate agent runs on retries. See Idempotency below.

Path Parameters

Parameter      Type    Description
endpoint_slug  string  The agent type's endpoint slug

Body

{
  "content": "Your message here",
  "thread_id": "optional-thread-id"
}

Field              Type    Required  Description
content            string  Yes       Message content (max 10,000 chars)
thread_id          string  No        Thread ID for conversation continuity
context_variables  object  No        Key-value pairs injected into agent context. Keys must be UPPERCASE_SNAKE_CASE. Max 20 variables, 4KB total.
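
The context_variables limits in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the function name validate_context_variables is hypothetical, and the exact key pattern and the way the 4KB total is measured (here, the UTF-8 size of the JSON-encoded object) are assumptions that the server may interpret differently.

```python
import json
import re

# Limits taken from the field table above.
MAX_VARIABLES = 20
MAX_TOTAL_BYTES = 4 * 1024

# Assumption: UPPERCASE_SNAKE_CASE means an uppercase letter followed by
# uppercase letters, digits, or underscores. The server may be stricter.
KEY_PATTERN = re.compile(r"[A-Z][A-Z0-9_]*")


def validate_context_variables(variables: dict) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if len(variables) > MAX_VARIABLES:
        problems.append(f"too many variables: {len(variables)} > {MAX_VARIABLES}")
    for key in variables:
        if not KEY_PATTERN.fullmatch(key):
            problems.append(f"key is not UPPERCASE_SNAKE_CASE: {key!r}")
    # Assumption: the 4KB limit applies to the JSON-encoded object in UTF-8.
    size = len(json.dumps(variables).encode("utf-8"))
    if size > MAX_TOTAL_BYTES:
        problems.append(f"payload too large: {size} bytes > {MAX_TOTAL_BYTES}")
    return problems


payload = {
    "content": "Where is my order?",
    "context_variables": {"ORDER_ID": "12345", "CUSTOMER_TIER": "gold"},
}
print(validate_context_variables(payload["context_variables"]))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue at once.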

Response

Success (200 OK)

{
  "thread_id": "thread_abc123xyz",
  "response_id": "550e8400-e29b-41d4-a716-446655440000",
  "agent_id": 42,
  "content": "Hello! I'd be happy to help you. What can I assist you with today?",
  "timestamp": "2025-01-15T10:30:00Z",
  "citations": [
    {
      "source_id": "source:1",
      "document_name": "Product FAQ",
      "download_url": "https://api.ag2trust.com/api/v1/knowledge/documents/123/download?thread_id=thread_abc123xyz&token=..."
    }
  ],
  "files": [
    {
      "ref_id": "file:1",
      "filename": "report.pdf",
      "link_text": "Download Report",
      "download_url": "https://api.ag2trust.com/api/v1/uploads/456/download?thread_id=thread_abc123xyz&token=..."
    }
  ],
  "requested_variables": ["ORDER_ID"],
  "draft": null,
  "pending_action": null,
  "suggested_actions": [
    {
      "name": "more_details",
      "prompt": "Tell me more about this",
      "description": "More Details"
    }
  ]
}

Field                Type         Description
thread_id            string       Thread ID (use for follow-up messages)
response_id          string       UUID for submitting feedback (24h TTL)
agent_id             integer      ID of the agent that handled the request
content              string       Agent's response text
timestamp            string       ISO 8601 timestamp
citations            array|null   Source citations from knowledge documents. Each has source_id, document_name, and optional download_url (signed, 24h TTL).
files                array|null   Agent-generated files. Each has ref_id, filename, link_text, and optional download_url (signed, 1h TTL).
requested_variables  array|null   Unresolved {{VAR}} names the agent referenced but were not provided in context_variables.
draft                object|null  Structured draft (email, SMS, or document) created by the agent. Contains type and content.
pending_action       object|null  An action requiring user confirmation (e.g., sending an email). Contains action_id, tool, preview, and expires_at. See Session Management for confirming/canceling.
suggested_actions    array|null   Clickable follow-up buttons (max 3). Each has name, prompt, description.
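
Several response fields are nullable, and download_url is optional within each citation and file, so client code should not assume they are present. The sketch below shows one way to read them defensively. The helper names collect_downloads and missing_variables are hypothetical, and the sample dictionary is made-up illustration data shaped like the response above.

```python
def collect_downloads(data: dict) -> list[tuple[str, str]]:
    """Return (label, url) pairs for citations and files that carry a download_url."""
    links = []
    # citations and files may be null, so fall back to an empty list.
    for citation in data.get("citations") or []:
        url = citation.get("download_url")
        if url:
            links.append((citation["document_name"], url))
    for file in data.get("files") or []:
        url = file.get("download_url")
        if url:
            links.append((file["link_text"], url))
    return links


def missing_variables(data: dict) -> list[str]:
    """Names the agent referenced as {{VAR}} that the request did not supply."""
    return list(data.get("requested_variables") or [])


# Made-up sample shaped like the documented response (URL is a placeholder).
sample = {
    "citations": [
        {
            "source_id": "source:1",
            "document_name": "Product FAQ",
            "download_url": "https://example.invalid/faq",
        }
    ],
    "files": None,
    "requested_variables": ["ORDER_ID"],
}
print(collect_downloads(sample))
print(missing_variables(sample))
```

Because the signed URLs expire (24h for citations, 1h for files), it is probably safer to present them to the user promptly than to store them.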

Suggested Actions

Suggested actions are clickable follow-up buttons that appear in the response. They help guide users to common next steps.

Structure

{
  "suggested_actions": [
    {
      "name": "send_email",
      "prompt": "Send the email now",
      "description": "Send Email"
    },
    {
      "name": "edit_draft",
      "prompt": "Let me edit the draft first",
      "description": "Edit Draft"
    }
  ]
}

Field        Description
name         Unique identifier for the action
prompt       Text to send as the next message when clicked
description  Human-readable button label

How to Use

When a user clicks a suggested action button, send action.prompt as the next message:

# User clicks "Send Email" button
next_message = action["prompt"]  # "Send the email now"

response = requests.post(
    url,
    headers=headers,
    json={
        "thread_id": thread_id,
        "content": next_message
    }
)

Sources

Suggested actions can come from multiple sources:

Source          Example            When
Backend state   "Send Email"       When a draft email exists
Agent tools     "View Details"     Tool registers default actions
Agent decision  "Make it shorter"  Agent determines contextually relevant actions

Actions are deduplicated by name (max 3 returned).
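
To make the deduplication rule concrete, here is a small sketch of merging actions from several sources, keeping one entry per name and at most three overall. It illustrates the documented behavior only and is not the platform's code: the function name merge_suggested_actions is hypothetical, and the choice that the first occurrence of a name wins (with sources listed in priority order) is an assumption.

```python
def merge_suggested_actions(*sources: list[dict], limit: int = 3) -> list[dict]:
    """Merge action lists, dropping repeated names and capping the result."""
    seen = set()
    merged = []
    for actions in sources:
        for action in actions:
            # Assumption: the first action seen with a given name is kept.
            if action["name"] in seen:
                continue
            seen.add(action["name"])
            merged.append(action)
    return merged[:limit]


backend = [{"name": "send_email", "prompt": "Send the email now", "description": "Send Email"}]
tools = [
    {"name": "send_email", "prompt": "Send it", "description": "Send"},
    {"name": "view_details", "prompt": "Show me the details", "description": "View Details"},
]
agent = [
    {"name": "make_shorter", "prompt": "Make it shorter", "description": "Make it shorter"},
    {"name": "more_details", "prompt": "Tell me more about this", "description": "More Details"},
]

for action in merge_suggested_actions(backend, tools, agent):
    print(action["name"])
```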

Examples

First Message (New Thread)

curl -X POST https://api.ag2trust.com/api/v1/ask/support \
  -H "X-API-Key: cust_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"content": "Hello, I need help with my order"}'

import requests

response = requests.post(
    "https://api.ag2trust.com/api/v1/ask/support",
    headers={
        "X-API-Key": "cust_your_api_key",
        "Content-Type": "application/json"
    },
    json={"content": "Hello, I need help with my order"}
)

data = response.json()
print(f"Thread ID: {data['thread_id']}")
print(f"Response: {data['content']}")

const response = await fetch(
  'https://api.ag2trust.com/api/v1/ask/support',
  {
    method: 'POST',
    headers: {
      'X-API-Key': 'cust_your_api_key',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      content: 'Hello, I need help with my order'
    })
  }
);

const data = await response.json();
console.log(`Thread ID: ${data.thread_id}`);
console.log(`Response: ${data.content}`);

Follow-up Message (Same Thread)

curl -X POST https://api.ag2trust.com/api/v1/ask/support \
  -H "X-API-Key: cust_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "thread_id": "thread_abc123xyz",
    "content": "Order number is 12345"
  }'

# Continue the conversation using thread_id
response = requests.post(
    "https://api.ag2trust.com/api/v1/ask/support",
    headers={
        "X-API-Key": "cust_your_api_key",
        "Content-Type": "application/json"
    },
    json={
        "thread_id": "thread_abc123xyz",
        "content": "Order number is 12345"
    }
)

const response = await fetch(
  'https://api.ag2trust.com/api/v1/ask/support',
  {
    method: 'POST',
    headers: {
      'X-API-Key': 'cust_your_api_key',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      thread_id: 'thread_abc123xyz',
      content: 'Order number is 12345'
    })
  }
);

Thread Management

How Threads Work

  1. First message: No thread_id → new thread created
  2. Subsequent messages: Include thread_id → continues conversation
  3. Context maintained: Agent receives conversation history

Thread Behavior

Aspect          Value
Thread TTL      15 minutes (sliding window)
Max messages    100 per thread
Context passed  Last N messages fitting context window

Thread Expiration

Threads expire 15 minutes after the last activity. This aligns with session billing to avoid unexpected charges. If a thread expires, start a new conversation by omitting thread_id; the response returns a new thread_id to use for follow-up messages.

Sticky Routing

When you include a thread_id:

  1. System checks if previous agent is available
  2. If available (queue < 3), routes to same agent
  3. If unavailable, routes to least busy agent
  4. Context is passed regardless of which agent handles it

Load Balancing

Routing Logic

Priority 1: Sticky (same agent, queue < 3)
Priority 2: Available (any agent, queue < 3)
Priority 3: Overflow (any agent, queue < 10)
Reject: 503 Service Unavailable
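
The priority order above can be modeled in a few lines. This is an illustrative sketch of the documented rules, not the platform's implementation: the function name pick_agent and the queue-depth dictionary are hypothetical, and choosing the least busy agent within the available and overflow tiers is an assumption based on the sticky routing description.

```python
from typing import Optional

STICKY_LIMIT = 3     # "queue < 3" for the sticky and available tiers
OVERFLOW_LIMIT = 10  # "queue < 10" for the overflow tier


def pick_agent(queues: dict[str, int], previous: Optional[str] = None) -> Optional[str]:
    """Return the chosen agent name, or None where the API would answer 503."""
    # Priority 1: sticky routing to the agent that last handled the thread.
    if previous in queues and queues[previous] < STICKY_LIMIT:
        return previous
    if not queues:
        return None
    # Priorities 2 and 3: the least busy agent qualifies if its queue is
    # under 3 (available) or, failing that, under 10 (overflow).
    least_busy = min(queues, key=queues.get)
    if queues[least_busy] < OVERFLOW_LIMIT:
        return least_busy
    # Every agent is at queue 10 or more: reject.
    return None


print(pick_agent({"support-1": 2, "support-2": 0}, previous="support-1"))
print(pick_agent({"support-1": 3, "support-2": 1}, previous="support-1"))
print(pick_agent({"support-1": 10, "support-2": 12}))
```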

Example Scenarios

Scenario                 Result
3 agents, all idle       Routes to any
3 agents, 1 busy         Routes to idle ones
All agents at queue 5    Routes to least busy
All agents at queue 10+  503 error

Setup Requirements

Before using the pool endpoint:

  1. Create an Agent Type with an endpoint_slug
  2. Create agents of that type (or deploy via team)
  3. Start the agents

Agent Type: Customer Support
  ├── endpoint_slug: "support"
  └── Agents: support-1, support-2, support-3

API URL: POST /api/v1/ask/support

Error Responses

404 Not Found

{
  "error": "Endpoint not found",
  "error_code": "ENDPOINT_NOT_FOUND"
}

The endpoint slug doesn't exist for your organization.

503 Service Unavailable

{
  "error": "No agents available",
  "error_code": "NO_AGENTS_AVAILABLE"
}

All agents are busy or offline.

Headers:

Retry-After: 5

Handling:

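import time
import requests

# url, headers, and payload are defined as in the examples above
for attempt in range(3):
    response = requests.post(url, json=payload, headers=headers, timeout=65)
    if response.status_code != 503:
        break
    # Wait for the interval the server suggests before retrying
    time.sleep(int(response.headers.get("Retry-After", 5)))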

504 Gateway Timeout

{
  "error": "Agent did not respond in time",
  "error_code": "AGENT_TIMEOUT"
}

Agent didn't respond within 60 seconds.

Idempotency

The Idempotency-Key header prevents duplicate agent runs when retrying requests. This is important because each agent run consumes tokens and counts toward your daily budget.

How It Works

  1. Generate a unique key (e.g., UUID) for each logical request
  2. Include it as the Idempotency-Key header
  3. If you retry with the same key, the platform detects the duplicate and avoids running the agent again

Behavior by State

State        Response               Description
New key      Normal response        Agent processes the request
In progress  409 Conflict           Another request with this key is still running
Completed    200 with confirmation  Agent already ran; returns cached metadata (not the original content)
Failed       Normal response        Previous attempt failed; retry is allowed

Keys expire after 24 hours.

Response Content

When a completed request is replayed, the response includes the thread_id and agent_id but not the original content. The purpose of idempotency here is to prevent double token spending, not to cache responses.
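
A client that retries with the same key needs to tell these states apart. The sketch below is one way to do that under stated assumptions: the names IdempotencyOutcome and classify_response are hypothetical, and treating a 200 response with a missing or empty content field as a replay is a guess at how the metadata-only response looks on the wire, since the exact shape is not specified here.

```python
from enum import Enum


class IdempotencyOutcome(Enum):
    PROCESSED = "processed"      # the agent ran and returned content
    IN_PROGRESS = "in_progress"  # 409: a request with this key is still running
    REPLAYED = "replayed"        # 200 with metadata only; the agent did not run again
    OTHER = "other"              # any other status; handle as a normal error


def classify_response(status_code: int, body: dict) -> IdempotencyOutcome:
    """Map a response to one of the documented idempotency states."""
    if status_code == 409:
        return IdempotencyOutcome.IN_PROGRESS
    if status_code == 200:
        # Assumption: a replayed response omits content or leaves it empty.
        if not body.get("content"):
            return IdempotencyOutcome.REPLAYED
        return IdempotencyOutcome.PROCESSED
    return IdempotencyOutcome.OTHER


print(classify_response(200, {"thread_id": "thread_abc123xyz", "agent_id": 42}))
```

Since a replay does not return the original content, a client that needs the agent's answer after a retry should keep its own copy of the first successful response.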

Example

curl -X POST https://api.ag2trust.com/api/v1/ask/support \
  -H "X-API-Key: cust_your_api_key" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: 550e8400-e29b-41d4-a716-446655440000" \
  -d '{"content": "Hello, I need help"}'

import uuid
import requests

idempotency_key = str(uuid.uuid4())

response = requests.post(
    "https://api.ag2trust.com/api/v1/ask/support",
    headers={
        "X-API-Key": "cust_your_api_key",
        "Idempotency-Key": idempotency_key,
    },
    json={"content": "Hello, I need help"},
    timeout=65,
)

# Safe to retry with the same idempotency_key on network errors

Best Practices

1. Always Store thread_id

# Store thread_id for conversation continuity
thread_id = None

def send_message(content):
    global thread_id
    payload = {"content": content}
    if thread_id:
        payload["thread_id"] = thread_id

    response = requests.post(url, json=payload, headers=headers)
    response.raise_for_status()  # Don't parse error bodies as successful replies
    data = response.json()

    thread_id = data["thread_id"]  # Save for next message
    return data["content"]

2. Handle 503 with Retry

Use an Idempotency-Key when retrying to avoid duplicate agent runs and double token charges:

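import uuid
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retries = Retry(
    total=3,
    backoff_factor=1,
    status_forcelist=[503],
    # urllib3 does not retry POST by default; opt in explicitly (urllib3 >= 1.26)
    allowed_methods=["POST"],
)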
session.mount('https://', HTTPAdapter(max_retries=retries))

# Use the same idempotency key across retries
idempotency_key = str(uuid.uuid4())
response = session.post(
    url,
    json=payload,
    headers={**headers, "Idempotency-Key": idempotency_key},
)

3. Set Reasonable Timeouts

response = requests.post(
    url,
    json=payload,
    headers=headers,
    timeout=65  # Slightly longer than server timeout
)

4. Monitor Thread Expiration

Threads expire after 15 minutes of inactivity:

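import time

# Start a new conversation if the thread might have expired.
# get_last_message_time() stands in for however your application records the last send time.
last_message_time = get_last_message_time()
if time.time() - last_message_time > 840:  # 14 minutes (buffer before 15 min TTL)
    thread_id = None  # Start fresh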
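
The same check can be wrapped in a small helper that also models the sliding window, where every successful message resets the 15-minute timer. This is a sketch under assumptions: the class name ThreadTracker, its methods, and the one-minute safety margin are hypothetical, and the clock is injectable only to make the behavior easy to test.

```python
import time
from typing import Callable, Optional

THREAD_TTL_SECONDS = 15 * 60   # documented thread TTL
SAFETY_MARGIN_SECONDS = 60     # assumed buffer, matching the 14-minute check above


class ThreadTracker:
    """Remembers the current thread_id and forgets it once it is likely expired."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._thread_id: Optional[str] = None
        self._last_activity: Optional[float] = None

    def current(self) -> Optional[str]:
        """Return the thread_id to send, or None to start a new thread."""
        if self._thread_id is None or self._last_activity is None:
            return None
        idle = self._clock() - self._last_activity
        if idle > THREAD_TTL_SECONDS - SAFETY_MARGIN_SECONDS:
            self._thread_id = None  # Treat the thread as expired
            return None
        return self._thread_id

    def record(self, thread_id: str) -> None:
        """Call after each successful response; this resets the sliding window."""
        self._thread_id = thread_id
        self._last_activity = self._clock()


tracker = ThreadTracker()
tracker.record("thread_abc123xyz")
print(tracker.current())
```

In use, a client would add thread_id to the payload only when current() returns a value, then call record() with the thread_id from each response.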

Team Ask Endpoint

For multi-agent teams, use the team ask endpoint:

POST /api/v1/teams/{team_slug}/ask

Request Body

Field      Type    Required  Description
content    string  Yes       Message content (max 50,000 chars)
thread_id  string  No        Thread ID for conversation continuity

The team endpoint has a longer timeout (120s vs 60s) to accommodate multi-agent collaboration. The response schema matches TeamAskResponse:

{
  "thread_id": "thread_abc123xyz",
  "team_slug": "engineering",
  "agent_id": 42,
  "content": "Here's the analysis from our team...",
  "timestamp": "2025-01-15T10:30:00Z"
}
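
A request to the team endpoint can be assembled much like a pool request, with the larger content limit and a longer client timeout. The sketch below only builds the request arguments and sends nothing. The helper name build_team_ask is hypothetical, the base URL is taken from the pool examples, and the five-second margin over the 120s server timeout mirrors the 65-versus-60 pattern used for the pool endpoint. It is also an assumption that the team endpoint accepts the same X-API-Key header.

```python
from typing import Optional

BASE_URL = "https://api.ag2trust.com"
TEAM_MAX_CONTENT_CHARS = 50_000    # documented limit for the team endpoint
TEAM_SERVER_TIMEOUT_SECONDS = 120  # documented team timeout


def build_team_ask(team_slug: str, content: str, thread_id: Optional[str] = None) -> dict:
    """Return keyword arguments suitable for requests.post()."""
    if len(content) > TEAM_MAX_CONTENT_CHARS:
        raise ValueError(f"content exceeds {TEAM_MAX_CONTENT_CHARS} characters")
    payload = {"content": content}
    if thread_id:
        payload["thread_id"] = thread_id
    return {
        "url": f"{BASE_URL}/api/v1/teams/{team_slug}/ask",
        "json": payload,
        # Client timeout slightly longer than the server's, as for the pool endpoint.
        "timeout": TEAM_SERVER_TIMEOUT_SECONDS + 5,
    }


request_kwargs = build_team_ask("engineering", "Summarize the incident")
print(request_kwargs["url"])
```

The result can be passed along as requests.post(headers=headers, **request_kwargs), where headers carries the API key.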
