Creating Agent Pools¶

This guide walks through setting up load-balanced agent pools for production use.

Overview¶

Agent pools distribute traffic across multiple agent instances:

                    ┌─────────────┐
   Request ────────►│ Load        │
                    │ Balancer    │
                    └──────┬──────┘
              ┌────────────┼────────────┐
              ▼            ▼            ▼
         ┌─────────┐ ┌─────────┐ ┌─────────┐
         │ Agent 1 │ │ Agent 2 │ │ Agent 3 │
         └─────────┘ └─────────┘ └─────────┘

Prerequisites¶

Ag2Trust account with agents enabled
At least one LLM provider configured
Basic understanding of Agent Types

Step 1: Create an Agent Type¶

First, create an agent type that defines the pool configuration:

Go to Agent Types in the Dashboard
Click Create Agent Type
Configure the template:

Field	Example Value
Name	Customer Support
Slug	customer-support
Endpoint Slug	support
System Prompt	(see below)
Provider	OpenAI Production
Model	gpt-4o

Example System Prompt¶

You are a customer support agent for Acme Corp.

## Responsibilities
- Answer product questions
- Help with account issues
- Troubleshoot common problems

## Guidelines
- Be friendly and professional
- Keep responses concise (2-3 paragraphs max)
- If you can't help, say so and offer to escalate

## Information Available
- Product catalog
- Account status
- Common troubleshooting steps

Endpoint Slug

The endpoint slug (support in this example) becomes the last segment of the API path:

POST /api/v1/ask/support

Step 2: Create a Team¶

Teams manage groups of agents and their deployments:

Go to Teams in the Dashboard
Click Create Team
Enter team details:
Name: Customer Support Team
Slug: support-team

Step 3: Add a Deployment¶

Deployments define how many agents of each type run in a team:

Open the team you just created
Click Add Deployment
Configure:
Agent Type: Select "Customer Support"
Desired Replicas: 3
Click Deploy

Ag2Trust will automatically:

- Create 3 agent instances
- Name them support-team-customer-support-1, -2, -3
- Start all agents
- Register them for pool routing
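The instance names in this example appear to follow the pattern team slug, agent type slug, then a 1-based index. The sketch below reproduces that pattern so you can predict names when scripting against a pool. The function name `instance_names` is hypothetical, and the pattern is inferred from this one example rather than from a documented rule.

```python
def instance_names(team_slug: str, agent_type_slug: str, replicas: int) -> list:
    """Build names of the form <team-slug>-<agent-type-slug>-<n>, numbered from 1.

    Assumption: the pattern is inferred from the example in this guide.
    """
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    return [f"{team_slug}-{agent_type_slug}-{n}" for n in range(1, replicas + 1)]


if __name__ == "__main__":
    for name in instance_names("support-team", "customer-support", 3):
        print(name)
```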

Step 4: Verify Pool Status¶

Check that your pool is ready:

View the team details page
Verify all agents show "Running" status
Check that "Actual Replicas" matches "Desired Replicas"

Deployment: Customer Support
├── Desired: 3
├── Actual: 3
└── Status: Healthy
    ├── support-team-customer-support-1 (running)
    ├── support-team-customer-support-2 (running)
    └── support-team-customer-support-3 (running)
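The readiness check described above can be expressed as a small predicate. This is a minimal sketch over a hand-built data structure: the `Deployment` class and its fields are hypothetical and are not part of any Ag2Trust API, and it assumes "Actual Replicas" means the number of instances whose status is "running".

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Deployment:
    """Hypothetical snapshot of the values shown on the team details page."""
    name: str
    desired: int
    agent_statuses: Dict[str, str] = field(default_factory=dict)  # instance name -> status


def actual_replicas(dep: Deployment) -> int:
    """Count instances currently reporting the 'running' status."""
    return sum(1 for status in dep.agent_statuses.values() if status == "running")


def pool_ready(dep: Deployment) -> bool:
    """A pool is treated as ready when the running count equals the desired count."""
    return actual_replicas(dep) == dep.desired


if __name__ == "__main__":
    dep = Deployment(
        name="Customer Support",
        desired=3,
        agent_statuses={
            "support-team-customer-support-1": "running",
            "support-team-customer-support-2": "running",
            "support-team-customer-support-3": "running",
        },
    )
    print(pool_ready(dep))
```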

Step 5: Test the Pool¶

Via Dashboard¶

Navigate to Agents
Click any pool agent
Use the chat interface to test

Via API¶

curl -X POST https://api.ag2trust.com/api/v1/ask/support \
  -H "X-API-Key: cust_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"content": "Hello, can you help me?"}'

Response:

{
  "thread_id": "thread_abc123",
  "agent_id": "uuid-of-agent-that-handled-request",
  "content": "Hello! I'd be happy to help you...",
  "timestamp": "2025-01-15T10:30:00Z"
}
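If you consume this response from code, it can help to parse it into a typed object. The sketch below uses only the four fields shown in the sample response. The `AskResponse` class and `parse_ask_response` function are hypothetical names, and the real response may contain fields not shown here.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AskResponse:
    thread_id: str
    agent_id: str
    content: str
    timestamp: datetime


def parse_ask_response(raw: str) -> AskResponse:
    """Parse the JSON body shown above into an AskResponse."""
    data = json.loads(raw)
    # Swap the trailing "Z" for an explicit UTC offset so that
    # datetime.fromisoformat also accepts it on Python versions before 3.11.
    timestamp = datetime.fromisoformat(data["timestamp"].replace("Z", "+00:00"))
    return AskResponse(
        thread_id=data["thread_id"],
        agent_id=data["agent_id"],
        content=data["content"],
        timestamp=timestamp,
    )
```

Keep the `thread_id` from the parsed response if you plan to send follow-up messages in the same conversation.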

Scaling the Pool¶

Scale Up¶

To handle more traffic:

Go to team details
Find the deployment
Click Scale
Increase "Desired Replicas"
Click Apply

New agents start automatically within 60 seconds.

Scale Down¶

To reduce costs during low traffic:

Go to team details
Find the deployment
Click Scale
Decrease "Desired Replicas"
Click Apply

Excess agents stop gracefully after completing their current requests.

Traffic Routing¶

How Routing Works¶

Sticky routing: A request with an existing thread_id is tried on the same agent first
Availability: Routes to agents with queue < 3
Overflow: Falls back to agents with queue < 10
Rejection: Returns 503 if all agents are overloaded
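The four rules above can be read as a decision procedure. The following is an illustrative model only, not Ag2Trust's implementation. It assumes that the sticky agent is reused while its queue is below the overflow limit, and that within each tier the first matching agent in listing order is chosen. Neither detail is stated in this guide. The function name `pick_agent` is hypothetical.

```python
from typing import Dict, Optional

AVAILABLE_LIMIT = 3   # from "queue < 3"
OVERFLOW_LIMIT = 10   # from "queue < 10"


def pick_agent(queues: Dict[str, int], sticky_agent: Optional[str] = None) -> Optional[str]:
    """Return the chosen agent name, or None where the pool would answer 503.

    queues maps agent name -> number of queued messages.
    """
    # 1. Sticky routing. Assumption: the previous agent is reused as long as
    #    its queue is under the overflow limit.
    if sticky_agent in queues and queues[sticky_agent] < OVERFLOW_LIMIT:
        return sticky_agent

    # 2. Availability tier, then 3. overflow tier.
    #    Assumption: the first agent under the limit (in listing order) wins.
    for limit in (AVAILABLE_LIMIT, OVERFLOW_LIMIT):
        for name, depth in queues.items():
            if depth < limit:
                return name

    # 4. Rejection: every agent is at or above the overflow limit.
    return None


if __name__ == "__main__":
    print(pick_agent({"agent-1": 5, "agent-2": 1, "agent-3": 0}))
    print(pick_agent({"agent-1": 10, "agent-2": 12}))
```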

Thread Continuity¶

Pass the thread_id from a previous response to continue the same conversation:

# First message
response1 = send_message("Hello")
thread_id = response1["thread_id"]

# Follow-up (same thread, same context)
response2 = send_message("Tell me more", thread_id=thread_id)

The pool maintains conversation history regardless of which agent handles each message.
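The snippet above calls a send_message helper that this guide does not define. One possible shape for it is sketched below, using only the Python standard library. The URL, header names, and "content" field come from the curl example in Step 5. Sending thread_id as a field in the request body is an assumption, since the guide only shows it in the response. The `post` parameter is a hypothetical hook that lets you swap out the network call in tests.

```python
import json
import urllib.request
from typing import Callable, Optional

API_URL = "https://api.ag2trust.com/api/v1/ask/support"
API_KEY = "cust_your_api_key"  # placeholder, replace with your own key


def build_payload(content: str, thread_id: Optional[str] = None) -> dict:
    """Build the JSON body for a message."""
    payload = {"content": content}
    if thread_id is not None:
        # Assumption: the thread is selected with a "thread_id" body field.
        payload["thread_id"] = thread_id
    return payload


def http_post(payload: dict) -> dict:
    """POST the payload to the pool endpoint and decode the JSON reply."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": API_KEY, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read())


def send_message(
    content: str,
    thread_id: Optional[str] = None,
    post: Callable[[dict], dict] = http_post,
) -> dict:
    """Send one message, optionally continuing an existing thread."""
    return post(build_payload(content, thread_id))
```

Keeping the payload builder separate from the transport makes the thread handling easy to unit test without calling the live API.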

Monitoring¶

Pool Health Dashboard¶

Monitor your pool in real time:

Agents online: Running vs desired count
Queue depth: Messages waiting per agent
Response times: Average latency
Error rates: Failed requests

Alerts¶

Set up alerts for:

Agents going offline
Queue depth exceeding threshold
Error rate spikes
Response time degradation

Best Practices¶

1. Right-Size Your Pool¶

Traffic Level	Recommended Replicas
Development	1
Low (< 10 req/min)	1-2
Medium (10-50 req/min)	2-3
High (50-200 req/min)	3-5
Very High (> 200 req/min)	5+
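For capacity scripts, the traffic tiers in the table can be encoded as a lookup. This is a sketch of the table only, not a sizing formula. The table's ranges overlap at exactly 50 and 200 req/min, so the code assumes those boundary values fall into the lower tier. The function name `recommended_replicas` is hypothetical, and the Development row is left out because it is not tied to a request rate.

```python
from typing import Optional, Tuple


def recommended_replicas(requests_per_minute: float) -> Tuple[int, Optional[int]]:
    """Return (minimum, maximum) replicas for a request rate.

    A maximum of None means the table gives no upper bound ("5+").
    """
    if requests_per_minute < 0:
        raise ValueError("requests_per_minute must be non-negative")
    if requests_per_minute < 10:
        return (1, 2)      # Low
    if requests_per_minute <= 50:
        return (2, 3)      # Medium (assumption: 50 counts as Medium)
    if requests_per_minute <= 200:
        return (3, 5)      # High (assumption: 200 counts as High)
    return (5, None)       # Very High


if __name__ == "__main__":
    for rate in (5, 30, 120, 500):
        print(rate, recommended_replicas(rate))
```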

2. Test Before Production¶

Create a staging pool
Run load tests
Verify routing works correctly
Test failover scenarios

3. Monitor Response Quality¶

Regularly review:

Agent responses for accuracy
User satisfaction metrics
Edge case handling

4. Plan for Failures¶

Always run more than one replica in production
Set up health monitoring
Configure alerting
Document escalation procedures

Advanced Configuration¶

Custom Load Balancing¶

For custom routing logic, use the Direct Endpoint with your own load balancer.

Geographic Distribution¶

For global traffic, create a separate pool per region, each in its own team.

A/B Testing¶

Run multiple agent types in the same team to test different prompts:

Team: Support
├── Deployment: support-v1 (2 replicas) - Original prompt
└── Deployment: support-v2 (1 replica) - New prompt

Route a percentage of traffic to each deployment using custom logic in your own application.
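One way to implement such a split on the client side is deterministic hashing, so that the same user always lands on the same variant. The sketch below is an assumption about how you might do this in your own application, not an Ag2Trust feature. The variant names reuse the deployment names from the example, the weights mirror the 2:1 replica counts, and the key (for example a user or session ID from your own system) is hypothetical.

```python
import hashlib
from typing import Dict


def choose_variant(key: str, weights: Dict[str, int]) -> str:
    """Map a stable key to a variant, in proportion to integer weights."""
    total = sum(weights.values())
    if total <= 0 or any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative and sum to a positive number")

    # Hash the key to a stable integer, then reduce it to a bucket in [0, total).
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % total

    # Walk the variants, consuming each one's share of the bucket range.
    for name, weight in weights.items():
        if bucket < weight:
            return name
        bucket -= weight
    raise AssertionError("unreachable: bucket is always below the total weight")


if __name__ == "__main__":
    weights = {"support-v1": 2, "support-v2": 1}
    print(choose_variant("user-42", weights))
```

Hashing a stable key rather than drawing a random number per request keeps each conversation on one prompt version, which makes the comparison between variants easier to interpret.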

Troubleshooting¶

Pool returns 503¶

Check agent status in Dashboard
Verify agents are running
Check queue depths
Scale up if needed
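While you investigate the steps above, callers may also want to retry briefly, since a 503 from the pool signals temporary overload. The following is a generic client-side sketch of retry with exponential backoff. It is not documented Ag2Trust behavior. The `PoolOverloaded` exception, the `with_retry` function, and the default delays are all hypothetical, and your own request function is expected to raise `PoolOverloaded` when it sees a 503.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class PoolOverloaded(Exception):
    """Raised by the caller's request function when the pool answers 503."""


def with_retry(
    request: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(), retrying on PoolOverloaded with doubling delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return request()
        except PoolOverloaded:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the overload to the caller
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    raise AssertionError("unreachable")
```

Keep the number of attempts small: if every agent stays overloaded, retries add load without fixing the cause, and scaling up is the better response.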

Inconsistent responses¶

Verify all agents use the same agent type
Check system prompts are consistent
Review temperature settings

Slow response times¶

Check LLM provider status
Review agent logs for bottlenecks
Consider faster models (e.g., gpt-4o-mini)
Scale up pool size

Next Steps¶

Webhook Integration - Async response handling
MCP Integration - Add external tools
Rate Limits - Understand API limits