Creating Agent Pools¶
This guide walks through setting up load-balanced agent pools for production use.
Overview¶
Agent pools distribute traffic across multiple agent instances:
                 ┌─────────────┐
Request ────────►│    Load     │
                 │  Balancer   │
                 └──────┬──────┘
           ┌────────────┼────────────┐
           ▼            ▼            ▼
      ┌─────────┐  ┌─────────┐  ┌─────────┐
      │ Agent 1 │  │ Agent 2 │  │ Agent 3 │
      └─────────┘  └─────────┘  └─────────┘
Prerequisites¶
- AG2Trust account with agents enabled
- At least one LLM provider configured
- Basic understanding of Agent Types
Step 1: Create an Agent Type¶
First, create an agent type that defines the pool configuration:
- Go to Agent Types in the Dashboard
- Click Create Agent Type
- Configure the template:
| Field | Example Value |
|---|---|
| Name | Customer Support |
| Slug | customer-support |
| Endpoint Slug | support |
| System Prompt | (see below) |
| Provider | OpenAI Production |
| Model | gpt-4o |
Example System Prompt¶
You are a customer support agent for Acme Corp.
## Responsibilities
- Answer product questions
- Help with account issues
- Troubleshoot common problems
## Guidelines
- Be friendly and professional
- Keep responses concise (2-3 paragraphs max)
- If you can't help, say so and offer to escalate
## Information Available
- Product catalog
- Account status
- Common troubleshooting steps
Endpoint Slug
The endpoint slug (support in this example) becomes the last segment of the API path, as in the /api/v1/ask/support URL used in Step 5.
Step 2: Create a Team¶
Teams manage groups of agents and their deployments:
- Go to Teams in the Dashboard
- Click Create Team
- Enter team details:
- Name: Customer Support Team
- Slug: support-team
Step 3: Add a Deployment¶
Deployments define how many agents of each type run in a team:
- Open the team you just created
- Click Add Deployment
- Configure:
- Agent Type: Select "Customer Support"
- Desired Replicas: 3
- Click Deploy
AG2Trust will automatically:
- Create 3 agent instances
- Name them support-team-customer-support-1, -2, -3
- Start all agents
- Register them for pool routing
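Scripts that need to predict instance names before a deployment finishes can reproduce the naming pattern. This is a minimal sketch, assuming names follow `<team-slug>-<agent-type-slug>-<n>` with `n` starting at 1, as the example above suggests. `pool_agent_names` is a hypothetical helper, not part of AG2Trust.

```python
def pool_agent_names(team_slug: str, agent_type_slug: str, replicas: int) -> list[str]:
    """Return the expected instance names for one deployment.

    Assumes the <team-slug>-<agent-type-slug>-<n> pattern shown in this guide,
    with n counting up from 1.
    """
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    return [f"{team_slug}-{agent_type_slug}-{n}" for n in range(1, replicas + 1)]
```

Calling `pool_agent_names("support-team", "customer-support", 3)` yields the three names listed above.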
Step 4: Verify Pool Status¶
Check that your pool is ready:
- View the team details page
- Verify all agents show "Running" status
- Check that "Actual Replicas" matches "Desired Replicas"
Deployment: Customer Support
├── Desired: 3
├── Actual: 3
└── Status: Healthy
    ├── support-team-customer-support-1 (running)
    ├── support-team-customer-support-2 (running)
    └── support-team-customer-support-3 (running)
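The same three checks can be expressed as a small predicate for use in a deployment script. This is only a sketch: the dictionary layout (`desired`, `actual`, `agents`, `status`) is a hypothetical structure that mirrors the tree above, not a documented AG2Trust response format.

```python
def deployment_is_ready(deployment: dict) -> bool:
    """True when actual replicas match desired and every agent is running."""
    agents = deployment["agents"]
    return (
        deployment["actual"] == deployment["desired"]
        and len(agents) == deployment["desired"]
        and all(agent["status"] == "running" for agent in agents)
    )


# Hypothetical snapshot mirroring the status tree shown above.
example_deployment = {
    "desired": 3,
    "actual": 3,
    "agents": [
        {"name": "support-team-customer-support-1", "status": "running"},
        {"name": "support-team-customer-support-2", "status": "running"},
        {"name": "support-team-customer-support-3", "status": "running"},
    ],
}
```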
Step 5: Test the Pool¶
Via Dashboard¶
- Navigate to Agents
- Click any pool agent
- Use the chat interface to test
Via API¶
curl -X POST https://agents.ag2trust.com/api/v1/ask/support \
  -H "X-API-Key: cust_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"content": "Hello, can you help me?"}'
Response:
{
  "thread_id": "thread_abc123",
  "agent_id": "uuid-of-agent-that-handled-request",
  "content": "Hello! I'd be happy to help you...",
  "timestamp": "2025-01-15T10:30:00Z"
}
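For callers working in Python rather than the shell, the curl request above translates roughly as follows, using only the standard library. The URL, headers, and body come from the curl example. The function names and the 30-second timeout are assumptions made for this sketch.

```python
import json
import urllib.request

BASE_URL = "https://agents.ag2trust.com/api/v1/ask"


def build_ask_request(endpoint_slug: str, api_key: str, content: str) -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    body = json.dumps({"content": content}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint_slug}",
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def ask(endpoint_slug: str, api_key: str, content: str, timeout: float = 30.0) -> dict:
    """Send the request and return the parsed JSON response."""
    request = build_ask_request(endpoint_slug, api_key, content)
    # urlopen raises urllib.error.HTTPError for non-2xx statuses such as 503.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A call such as `ask("support", "cust_your_api_key", "Hello, can you help me?")` should return a dictionary with the fields shown in the response above.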
Scaling the Pool¶
Scale Up¶
To handle more traffic:
- Go to team details
- Find the deployment
- Click Scale
- Increase "Desired Replicas"
- Click Apply
New agents start automatically within 60 seconds.
Scale Down¶
To reduce costs during low traffic:
- Go to team details
- Find the deployment
- Click Scale
- Decrease "Desired Replicas"
- Click Apply
Excess agents stop gracefully (completing current requests).
Traffic Routing¶
How Routing Works¶
- Sticky routing: Same thread_id → tries same agent
- Availability: Routes to agents with queue < 3
- Overflow: Falls back to agents with queue < 10
- Rejection: Returns 503 if all agents are overloaded
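One way to read these four rules is as a tiered selection function. The sketch below is an illustration, not AG2Trust's actual router. The thresholds 3 and 10 come from the list above. Two parts are assumptions: that stickiness applies only while the previous agent is under the first threshold, and that the least-loaded agent wins within a tier.

```python
from typing import Optional

PREFERRED_LIMIT = 3   # "queue < 3" tier from the routing rules
OVERFLOW_LIMIT = 10   # "queue < 10" tier from the routing rules


def pick_agent(queues: dict[str, int], sticky_agent: Optional[str] = None) -> Optional[str]:
    """Return the agent to route to, or None when the pool would answer 503.

    queues maps agent name to current queue depth.
    """
    # Sticky routing: try the agent that served this thread before.
    # Assumption: it is reused only while under the preferred limit.
    if sticky_agent is not None and queues.get(sticky_agent, OVERFLOW_LIMIT) < PREFERRED_LIMIT:
        return sticky_agent

    # Availability tier first, then the overflow tier.
    for limit in (PREFERRED_LIMIT, OVERFLOW_LIMIT):
        candidates = [name for name, depth in queues.items() if depth < limit]
        if candidates:
            # Assumption: pick the shortest queue, breaking ties by name.
            return min(candidates, key=lambda name: (queues[name], name))

    # Rejection: every agent is at or above the overflow limit.
    return None
```

For example, `pick_agent({"a": 4, "b": 9})` falls through to the overflow tier and selects `"a"`, while `pick_agent({"a": 10, "b": 12})` returns `None`.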
Thread Continuity¶
Use thread_id for conversation context:
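# send_message is a placeholder for your own helper that POSTs to the pool
# endpoint and returns the parsed JSON response.
# First message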
response1 = send_message("Hello")
thread_id = response1["thread_id"]
# Follow-up (same thread, same context)
response2 = send_message("Tell me more", thread_id=thread_id)
The pool maintains conversation history regardless of which agent handles each message.
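To avoid passing `thread_id` by hand on every call, a small wrapper can remember it. This is a sketch under stated assumptions: `Conversation` is a hypothetical class, and `send` is any function with the same signature as the `send_message` helper in the snippet above (content first, optional `thread_id` keyword, dictionary response with `thread_id` and `content` fields).

```python
from typing import Callable, Optional


class Conversation:
    """Remembers the thread_id so follow-ups stay in the same thread."""

    def __init__(self, send: Callable[..., dict]):
        self._send = send
        self.thread_id: Optional[str] = None

    def say(self, content: str) -> str:
        """Send one message and return the agent's reply text."""
        if self.thread_id is None:
            # First message: let the service assign a thread.
            response = self._send(content)
        else:
            # Follow-up: reuse the thread from the previous response.
            response = self._send(content, thread_id=self.thread_id)
        self.thread_id = response["thread_id"]
        return response["content"]
```

With a real helper in place, `chat = Conversation(send_message)` followed by `chat.say("Hello")` and `chat.say("Tell me more")` keeps both messages in one thread.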
Monitoring¶
Pool Health Dashboard¶
Monitor your pool in real-time:
- Agents online: Running vs desired count
- Queue depth: Messages waiting per agent
- Response times: Average latency
- Error rates: Failed requests
Alerts¶
Set up alerts for:
- Agents going offline
- Queue depth exceeding threshold
- Error rate spikes
- Response time degradation
Best Practices¶
1. Right-Size Your Pool¶
| Traffic Level | Recommended Replicas |
|---|---|
| Development | 1 |
| Low (< 10 req/min) | 1-2 |
| Medium (10-50 req/min) | 2-3 |
| High (50-200 req/min) | 3-5 |
| Very High (> 200 req/min) | 5+ |
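For capacity-planning scripts, the table can be encoded as a lookup. The sketch below assumes that the shared boundary at 50 req/min belongs to the lower tier, since the table lists it in both rows, and it leaves out the Development row, which is not traffic-based. `recommended_replicas` is a hypothetical helper.

```python
def recommended_replicas(requests_per_minute: float) -> str:
    """Map a traffic level to the replica range from the sizing table."""
    if requests_per_minute < 0:
        raise ValueError("requests_per_minute must be non-negative")
    if requests_per_minute < 10:
        return "1-2"   # Low
    if requests_per_minute <= 50:
        return "2-3"   # Medium (assumption: 50 counts as Medium)
    if requests_per_minute <= 200:
        return "3-5"   # High
    return "5+"        # Very High
```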
2. Test Before Production¶
- Create a staging pool
- Run load tests
- Verify routing works correctly
- Test failover scenarios
3. Monitor Response Quality¶
Regularly review:
- Agent responses for accuracy
- User satisfaction metrics
- Edge case handling
4. Plan for Failures¶
- Always run at least two replicas in production
- Set up health monitoring
- Configure alerting
- Document escalation procedures
Advanced Configuration¶
Custom Load Balancing¶
For custom routing logic, use the Direct Endpoint with your own load balancer.
Geographic Distribution¶
For global traffic, create a separate pool per region, each managed by its own team.
A/B Testing¶
Run multiple agent types in the same team to test different prompts:
Team: Support
├── Deployment: support-v1 (2 replicas) - Original prompt
└── Deployment: support-v2 (1 replica) - New prompt
Route a percentage of traffic to each using custom logic.
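The custom logic can be as simple as a deterministic weighted split on the caller's side. The sketch below hashes a stable key, such as a user ID, so the same caller always reaches the same variant. The variant names reuse the deployment names above, and treating them as separately addressable endpoints is an assumption. `choose_variant` is a hypothetical helper.

```python
import hashlib


def choose_variant(key: str, weights: dict[str, int]) -> str:
    """Deterministically map key to a variant in proportion to its weight."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Hash the key into a stable bucket in the range [0, total).
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % total
    # Walk the variants until the bucket falls inside one variant's share.
    for variant, weight in weights.items():
        if bucket < weight:
            return variant
        bucket -= weight
    raise AssertionError("unreachable: bucket is always below total")
```

With `weights = {"support-v1": 2, "support-v2": 1}`, roughly one third of keys map to `support-v2`, matching the replica split in the example.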
Troubleshooting¶
Pool returns 503¶
- Check agent status in Dashboard
- Verify agents are running
- Check queue depths
- Scale up if needed
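While the pool is being scaled, clients can absorb short overload periods by retrying 503 responses with exponential backoff. This is a generic HTTP client pattern, not an AG2Trust feature. In this sketch, `send` is a hypothetical callable that returns a `(status_code, body)` pair, and the attempt count and delays are arbitrary starting values.

```python
import time
from typing import Any, Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, Any]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, Any]:
    """Call send() until it returns a non-503 status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 503:
            break
        if attempt < max_attempts - 1:
            # Wait 0.5 s, 1 s, 2 s, ... between attempts by default.
            sleep(base_delay * 2 ** attempt)
    return status, body
```

If every attempt returns 503, the function returns the last 503 response so the caller can surface the failure.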
Inconsistent responses¶
- Verify all agents use the same agent type
- Check system prompts are consistent
- Review temperature settings
Slow response times¶
- Check LLM provider status
- Review agent logs for bottlenecks
- Consider faster models (e.g., gpt-4o-mini)
- Scale up pool size
Next Steps¶
- Webhook Integration - Async response handling
- MCP Integration - Add external tools
- Rate Limits - Understand API limits