
Creating Agent Pools

This guide walks through setting up load-balanced agent pools for production use.

Overview

Agent pools distribute traffic across multiple agent instances:

                    ┌─────────────┐
   Request ────────►│ Load        │
                    │ Balancer    │
                    └──────┬──────┘
              ┌────────────┼────────────┐
              ▼            ▼            ▼
         ┌─────────┐ ┌─────────┐ ┌─────────┐
         │ Agent 1 │ │ Agent 2 │ │ Agent 3 │
         └─────────┘ └─────────┘ └─────────┘

Prerequisites

  • AG2Trust account with agents enabled
  • At least one LLM provider configured
  • Basic understanding of Agent Types

Step 1: Create an Agent Type

First, create an agent type that defines the pool configuration:

  1. Go to Agent Types in the Dashboard
  2. Click Create Agent Type
  3. Configure the template:

     Field           Example Value
     Name            Customer Support
     Slug            customer-support
     Endpoint Slug   support
     System Prompt   (see below)
     Provider        OpenAI Production
     Model           gpt-4o

Example System Prompt

You are a customer support agent for Acme Corp.

## Responsibilities
- Answer product questions
- Help with account issues
- Troubleshoot common problems

## Guidelines
- Be friendly and professional
- Keep responses concise (2-3 paragraphs max)
- If you can't help, say so and offer to escalate

## Information Available
- Product catalog
- Account status
- Common troubleshooting steps

Endpoint Slug

The endpoint slug (support in this example) becomes the API path:

POST /api/v1/ask/support

Step 2: Create a Team

Teams manage groups of agents and their deployments:

  1. Go to Teams in the Dashboard
  2. Click Create Team
  3. Enter team details:
       • Name: Customer Support Team
       • Slug: support-team

Step 3: Add a Deployment

Deployments define how many agents of each type run in a team:

  1. Open the team you just created
  2. Click Add Deployment
  3. Configure:
       • Agent Type: Select "Customer Support"
       • Desired Replicas: 3
  4. Click Deploy

AG2Trust will automatically:

  • Create 3 agent instances
  • Name them support-team-customer-support-1, -2, and -3
  • Start all agents
  • Register them for pool routing
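The instance names above appear to follow a `<team-slug>-<agent-type-slug>-<n>` pattern. The sketch below reproduces that pattern so you can predict names in scripts or log filters. It is inferred from this one example only: the function name `instance_names` is hypothetical, and how numbering behaves after scaling down and back up is not covered here.

```python
def instance_names(team_slug: str, agent_type_slug: str, replicas: int) -> list[str]:
    """Build names as <team-slug>-<agent-type-slug>-<n>, numbered from 1.

    Assumption: the pattern is inferred from the single example in this guide.
    """
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    return [f"{team_slug}-{agent_type_slug}-{n}" for n in range(1, replicas + 1)]


for name in instance_names("support-team", "customer-support", 3):
    print(name)
```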

Step 4: Verify Pool Status

Check that your pool is ready:

  1. View the team details page
  2. Verify all agents show "Running" status
  3. Check that "Actual Replicas" matches "Desired Replicas"

Deployment: Customer Support
├── Desired: 3
├── Actual: 3
└── Status: Healthy
    ├── support-team-customer-support-1 (running)
    ├── support-team-customer-support-2 (running)
    └── support-team-customer-support-3 (running)
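If you want to automate this check, the logic amounts to two conditions: every agent reports "running", and the running count equals the desired count. The sketch below models that with a hypothetical `Deployment` snapshot. How you obtain these values (for example by reading them off the Dashboard) is left open, and none of these names come from an AG2Trust API.

```python
from dataclasses import dataclass, field


@dataclass
class Deployment:
    """Hypothetical snapshot of the values shown on the team details page."""
    name: str
    desired: int
    agents: dict[str, str] = field(default_factory=dict)  # instance name -> status

    @property
    def actual(self) -> int:
        # Count only agents that report "running".
        return sum(1 for status in self.agents.values() if status == "running")


def pool_problems(dep: Deployment) -> list[str]:
    """Return readable problems; an empty list means the pool looks ready."""
    problems = [
        f"{name} is {status}"
        for name, status in dep.agents.items()
        if status != "running"
    ]
    if dep.actual != dep.desired:
        problems.append(f"actual replicas {dep.actual} != desired {dep.desired}")
    return problems
```

An empty result corresponds to the healthy status tree shown above; anything else tells you which agent to inspect first.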

Step 5: Test the Pool

Via Dashboard

  1. Navigate to Agents
  2. Click any pool agent
  3. Use the chat interface to test

Via API

curl -X POST https://agents.ag2trust.com/api/v1/ask/support \
  -H "X-API-Key: cust_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"content": "Hello, can you help me?"}'

Response:

{
  "thread_id": "thread_abc123",
  "agent_id": "uuid-of-agent-that-handled-request",
  "content": "Hello! I'd be happy to help you...",
  "timestamp": "2025-01-15T10:30:00Z"
}
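The same call can be made from Python with only the standard library. This is a minimal sketch that mirrors the curl example: the URL, headers, and `content` field are taken from it, while the function names `build_ask_request` and `ask` and the 30-second timeout are illustrative choices.

```python
import json
import urllib.request

BASE_URL = "https://agents.ag2trust.com"  # host from the curl example


def build_ask_request(endpoint_slug: str, api_key: str, content: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"content": content}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/ask/{endpoint_slug}",
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def ask(endpoint_slug: str, api_key: str, content: str, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_ask_request(endpoint_slug, api_key, content)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `ask("support", "cust_your_api_key", "Hello, can you help me?")` should return a dictionary with the fields shown in the response above. Keeping request construction separate from sending makes the request easy to inspect in tests without network access.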

Scaling the Pool

Scale Up

To handle more traffic:

  1. Go to team details
  2. Find the deployment
  3. Click Scale
  4. Increase "Desired Replicas"
  5. Click Apply

New agents start automatically within 60 seconds.

Scale Down

To reduce costs during low traffic:

  1. Go to team details
  2. Find the deployment
  3. Click Scale
  4. Decrease "Desired Replicas"
  5. Click Apply

Excess agents stop gracefully: each one completes its current requests before shutting down.

Traffic Routing

How Routing Works

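  1. Sticky routing: A request with an existing thread_id is first offered to the agent that handled that thread
  2. Availability: Otherwise, the request goes to an agent with fewer than 3 queued messages
  3. Overflow: If no agent qualifies, the request falls back to an agent with fewer than 10 queued messages
  4. Rejection: If all agents are overloaded, the pool returns 503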
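The four rules can be read as a single decision procedure. The sketch below is an illustrative model, not AG2Trust's actual implementation. It assumes that the sticky agent is reused only while its queue is below the first threshold, and that ties within a tier go to the shortest queue; neither detail is stated in this guide. The names `Agent`, `PoolOverloaded`, and `pick_agent` are hypothetical.

```python
from dataclasses import dataclass

PREFERRED_QUEUE_LIMIT = 3   # "Availability" tier
OVERFLOW_QUEUE_LIMIT = 10   # "Overflow" tier


@dataclass
class Agent:
    name: str
    queue_depth: int = 0


class PoolOverloaded(Exception):
    """Raised where the pool would answer with HTTP 503."""


def pick_agent(agents, sticky, thread_id=None):
    """Choose an agent for one request; `sticky` maps thread_id -> agent name."""
    by_name = {a.name: a for a in agents}
    # 1. Sticky routing: prefer the agent that last served this thread.
    previous = by_name.get(sticky.get(thread_id)) if thread_id else None
    if previous is not None and previous.queue_depth < PREFERRED_QUEUE_LIMIT:
        chosen = previous
    else:
        # 2 and 3. Try the preferred tier first, then the overflow tier.
        for limit in (PREFERRED_QUEUE_LIMIT, OVERFLOW_QUEUE_LIMIT):
            candidates = [a for a in agents if a.queue_depth < limit]
            if candidates:
                # Assumption: break ties by picking the shortest queue.
                chosen = min(candidates, key=lambda a: a.queue_depth)
                break
        else:
            # 4. Every agent is at or above the overflow limit.
            raise PoolOverloaded("503: all agents overloaded")
    if thread_id:
        sticky[thread_id] = chosen.name
    return chosen
```

Note that sticky routing is a preference rather than a guarantee in this model: a busy agent loses its thread to a less loaded one.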

Thread Continuity

Use thread_id for conversation context:

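# send_message() stands in for your own helper that POSTs to the pool
# endpoint and returns the parsed JSON response.

# First message
response1 = send_message("Hello")
thread_id = response1["thread_id"]

# Follow-up (same thread, same context)
response2 = send_message("Tell me more", thread_id=thread_id)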

The pool maintains conversation history regardless of which agent handles each message.
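A small wrapper can track the thread for you so that callers never handle `thread_id` directly. The sketch below is one way to do that. It assumes that follow-up requests carry the ID in a `thread_id` field of the JSON body, which this guide does not spell out, and it takes the transport as a parameter (`post`) so it can be exercised without network access. The class name `Conversation` is hypothetical.

```python
from typing import Callable, Optional


class Conversation:
    """Remember the thread_id from the first reply and reuse it on follow-ups."""

    def __init__(self, post: Callable[[dict], dict]):
        # `post` is any callable that sends a JSON payload to the pool
        # endpoint and returns the decoded reply.
        self._post = post
        self.thread_id: Optional[str] = None

    def send(self, content: str) -> str:
        payload = {"content": content}
        if self.thread_id is not None:
            # Assumption: follow-ups pass the ID in a "thread_id" request field.
            payload["thread_id"] = self.thread_id
        reply = self._post(payload)
        self.thread_id = reply["thread_id"]
        return reply["content"]
```

Create one `Conversation` per user session; starting a new object starts a new thread.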

Monitoring

Pool Health Dashboard

Monitor your pool in real time:

  • Agents online: Running vs desired count
  • Queue depth: Messages waiting per agent
  • Response times: Average latency
  • Error rates: Failed requests

Alerts

Set up alerts for:

  • Agents going offline
  • Queue depth exceeding threshold
  • Error rate spikes
  • Response time degradation
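If you evaluate alerts in your own tooling, each condition above reduces to comparing a metric with a limit. The sketch below shows that shape with fixed thresholds. The threshold values, the `PoolMetrics` fields, and the function name are all hypothetical, and fixed limits are a simplification: spikes and degradation are usually judged against a recent baseline.

```python
from dataclasses import dataclass


@dataclass
class PoolMetrics:
    running: int
    desired: int
    max_queue_depth: int
    error_rate: float      # fraction of failed requests, 0.0 to 1.0
    avg_latency_s: float


# Hypothetical limits; tune them for your own workload.
DEFAULT_THRESHOLDS = {"max_queue_depth": 3, "error_rate": 0.05, "avg_latency_s": 10.0}


def alerts_for(m: PoolMetrics, thresholds=None) -> list[str]:
    """Return one message per alert condition that currently holds."""
    t = thresholds or DEFAULT_THRESHOLDS
    alerts = []
    if m.running < m.desired:
        alerts.append(f"{m.desired - m.running} agent(s) offline")
    if m.max_queue_depth > t["max_queue_depth"]:
        alerts.append(f"queue depth {m.max_queue_depth} exceeds {t['max_queue_depth']}")
    if m.error_rate > t["error_rate"]:
        alerts.append(f"error rate {m.error_rate:.0%} exceeds {t['error_rate']:.0%}")
    if m.avg_latency_s > t["avg_latency_s"]:
        alerts.append(f"average latency {m.avg_latency_s:.1f}s exceeds {t['avg_latency_s']:.1f}s")
    return alerts
```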

Best Practices

1. Right-Size Your Pool

Traffic Level               Recommended Replicas
Development                 1
Low (< 10 req/min)          1-2
Medium (10-50 req/min)      2-3
High (50-200 req/min)       3-5
Very High (> 200 req/min)   5+
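The table can be encoded as a lookup if you size pools from a script. In this sketch the function name is hypothetical. The table's ranges overlap at exactly 50 req/min, so the choice to count 50 as Medium is an assumption; 200 is counted as High because Very High starts above 200.

```python
def recommended_replicas(requests_per_minute: float, development: bool = False) -> str:
    """Look up the recommended replica range from the sizing table."""
    if development:
        return "1"
    if requests_per_minute < 10:
        return "1-2"    # Low
    if requests_per_minute <= 50:
        return "2-3"    # Medium (assumption: exactly 50 counts as Medium)
    if requests_per_minute <= 200:
        return "3-5"    # High
    return "5+"         # Very High


for rate in (5, 30, 120, 500):
    print(rate, "req/min ->", recommended_replicas(rate), "replicas")
```

Treat the result as a starting point and confirm it with load tests.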

2. Test Before Production

  1. Create a staging pool
  2. Run load tests
  3. Verify routing works correctly
  4. Test failover scenarios

3. Monitor Response Quality

Regularly review:

  • Agent responses for accuracy
  • User satisfaction metrics
  • Edge case handling

4. Plan for Failures

  • Always run at least 2 replicas in production
  • Set up health monitoring
  • Configure alerting
  • Document escalation procedures

Advanced Configuration

Custom Load Balancing

For custom routing logic, use the Direct Endpoint with your own load balancer.

Geographic Distribution

For global traffic, create separate pools per region with different teams.

A/B Testing

Run multiple agent types in the same team to test different prompts:

Team: Support
├── Deployment: support-v1 (2 replicas) - Original prompt
└── Deployment: support-v2 (1 replica) - New prompt

Route a percentage of traffic to each using custom logic.
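One possible form of that custom logic is a deterministic, weighted split on a stable key such as a user ID, so that each user always sees the same variant. The sketch below assumes each agent type is reachable through its own endpoint slug. The slugs `support-v1` and `support-v2` and the 2:1 weights (mirroring the replica counts above) are illustrative, as is the function name.

```python
import hashlib

# Hypothetical endpoint slugs with traffic weights (here matching replica counts).
VARIANTS = [("support-v1", 2), ("support-v2", 1)]


def choose_variant(key: str, variants=VARIANTS) -> str:
    """Map a stable key (e.g. a user ID) to a variant in proportion to its weight."""
    total = sum(weight for _, weight in variants)
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as a uniformly distributed integer.
    bucket = int.from_bytes(digest[:8], "big") % total
    for slug, weight in variants:
        if bucket < weight:
            return slug
        bucket -= weight
    raise AssertionError("unreachable: bucket is always below the total weight")
```

Hashing instead of random selection keeps assignments stable across requests and processes, which makes per-variant quality comparisons easier to interpret.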

Troubleshooting

Pool returns 503

  1. Check agent status in Dashboard
  2. Verify agents are running
  3. Check queue depths
  4. Scale up if needed
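While you investigate, clients can absorb brief overload by retrying a 503 with exponential backoff. The sketch below is a generic client-side pattern, not an AG2Trust feature. `PoolUnavailable` is a hypothetical exception that your own `send` callable would raise on a 503, and the attempt count and delays are illustrative.

```python
import random
import time


class PoolUnavailable(Exception):
    """Stand-in for an HTTP 503 response from the pool."""


def with_retries(send, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send(); on PoolUnavailable, back off exponentially and retry."""
    for attempt in range(attempts):
        try:
            return send()
        except PoolUnavailable:
            if attempt == attempts - 1:
                raise  # out of attempts; let the caller handle the 503
            delay = base_delay * (2 ** attempt)
            # Add jitter so many clients do not retry at the same instant.
            sleep(delay + random.uniform(0, delay / 2))
```

Retries only mask short spikes. If 503s persist, the pool is undersized and the steps above still apply.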

Inconsistent responses

  1. Verify all agents use the same agent type
  2. Check system prompts are consistent
  3. Review temperature settings

Slow response times

  1. Check LLM provider status
  2. Review agent logs for bottlenecks
  3. Consider faster models (e.g., gpt-4o-mini)
  4. Scale up pool size

Next Steps