Rate Limiting

Rate limiting and concurrency control protect your tools from runaway invocation loops, excessive calls to costly APIs, and resource exhaustion. ai-tool-guard provides a sliding-window rate limiter and a concurrency cap; both can be set globally and overridden per tool.

Overview

Rate limiting is handled by the RateLimiter class, which the guard engine instantiates internally. You configure limits declaratively through GuardOptions and ToolGuardConfig. The limiter tracks call timestamps and active-call counts per tool and enforces the configured limits on every acquire call.

When a limit is exceeded, the behaviour depends on the configured strategy: either reject immediately or queue the call until a slot becomes available.

Basic Usage

Set global defaults on GuardOptions and override per tool as needed:

import { createToolGuard } from "ai-tool-guard";

const guard = createToolGuard({
  rules: [{ id: "allow-all", toolPatterns: ["*"], verdict: "allow" }],

  // Global defaults applied to every tool.
  defaultRateLimit: {
    maxCalls: 60,
    windowMs: 60_000,   // 60 calls per minute.
    strategy: "reject",
  },
  defaultMaxConcurrency: 5,
});

// This tool gets its own tighter limits.
const wrappedExpensiveTool = guard.guardTool("llmSummarize", llmSummarizeTool, {
  riskLevel: "medium",
  rateLimit: {
    maxCalls: 5,
    windowMs: 60_000,   // 5 calls per minute.
    strategy: "queue",  // Queue excess calls instead of rejecting.
  },
  maxConcurrency: 2,
});

Configuration Options

`RateLimitConfig`

Field	Type	Default	Description
`maxCalls`	`number`	required	Maximum number of calls allowed within the window.
`windowMs`	`number`	required	Window size in milliseconds.
`strategy`	`"reject" | "queue"`	`"reject"`	What to do when the limit is exceeded.

Global Defaults via `GuardOptions`

Field	Type	Description
`defaultRateLimit`	`RateLimitConfig`	Applied to every tool that does not specify its own `rateLimit`.
`defaultMaxConcurrency`	`number`	Maximum concurrent executions for any tool without an explicit `maxConcurrency`.

Per-Tool Overrides via `ToolGuardConfig`

Field	Type	Description
`rateLimit`	`RateLimitConfig`	Overrides `defaultRateLimit` for this specific tool.
`maxConcurrency`	`number`	Overrides `defaultMaxConcurrency` for this specific tool.

Per-tool configuration always takes precedence over global defaults. A tool with no rate limit configuration and no global defaults has no rate limiting applied.
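The precedence rule can be sketched as a small resolution function. This is an illustration only, not the library's internal code: the interfaces are local stand-ins limited to the fields documented above, `resolveLimits` is a hypothetical helper, and it assumes a per-tool `rateLimit` replaces the global object as a whole rather than being merged field by field.

```typescript
// Local stand-ins for the documented config shapes (not imported from the library).
interface RateLimitConfig {
  maxCalls: number;
  windowMs: number;
  strategy?: "reject" | "queue";
}

interface LimitDefaults {
  defaultRateLimit?: RateLimitConfig;
  defaultMaxConcurrency?: number;
}

interface ToolLimits {
  rateLimit?: RateLimitConfig;
  maxConcurrency?: number;
}

// Hypothetical helper: a per-tool value wins; otherwise the global default applies.
// If neither is set, the field stays undefined, meaning "no limit".
function resolveLimits(defaults: LimitDefaults, tool: ToolLimits): ToolLimits {
  return {
    rateLimit: tool.rateLimit ?? defaults.defaultRateLimit,
    maxConcurrency: tool.maxConcurrency ?? defaults.defaultMaxConcurrency,
  };
}

const defaults: LimitDefaults = {
  defaultRateLimit: { maxCalls: 60, windowMs: 60_000, strategy: "reject" },
  defaultMaxConcurrency: 5,
};

// Tool with its own limits: the overrides are used.
const summarize = resolveLimits(defaults, {
  rateLimit: { maxCalls: 5, windowMs: 60_000, strategy: "queue" },
  maxConcurrency: 2,
});

// Tool with no limits of its own: the global defaults are used.
const plain = resolveLimits(defaults, {});

// No per-tool config and no global defaults: nothing is limited.
const unlimited = resolveLimits({}, {});
```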

Strategies

"reject" — When the rate limit or concurrency cap is exceeded, acquire returns immediately with allowed: false. The guard engine throws a ToolGuardError with code: "rate-limited" and includes retryAfterMs when available (rate limit case only — not for concurrency rejections). The tool is never executed.

"queue" — When the rate limit or concurrency cap is exceeded, acquire suspends the current call until a slot opens. Calls are released in FIFO order via a per-tool queue. This provides backpressure rather than hard rejection. Use it when occasional latency is preferable to dropped calls.

Queue strategy and timeouts

Queued calls wait indefinitely for a slot. If you use the "queue" strategy, ensure your caller has an appropriate timeout so that a stalled queue does not block your application indefinitely.
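One way to bound the wait on the caller's side is to race the guarded call against a timer. The sketch below is generic TypeScript rather than part of ai-tool-guard; `withTimeout` is a hypothetical helper name.

```typescript
// Hypothetical helper: rejects if `work` has not settled within `ms` milliseconds.
async function withTimeout<T>(work: Promise<T>, ms: number, label = "operation"): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  try {
    // Whichever promise settles first decides the outcome.
    return await Promise.race([work, timeout]);
  } finally {
    // Always clear the timer so it cannot keep the process alive.
    clearTimeout(timer);
  }
}

// Usage (names taken from the Basic Usage example):
//   await withTimeout(wrappedExpensiveTool.execute(args), 30_000, "llmSummarize");
```

Note that a timeout like this only stops the caller from waiting. It does not cancel the underlying promise, so the queued call may still be admitted and executed later.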

`RateLimiter` Class

The RateLimiter class is used internally by the guard engine. It is also exported for testing and custom integration scenarios.

`acquire(toolName, config, maxConcurrency?)`

Attempt to claim a slot for the given tool:

import { RateLimiter } from "ai-tool-guard/guards";

const limiter = new RateLimiter();

const result = await limiter.acquire("my-tool", {
  maxCalls: 10,
  windowMs: 1000,
  strategy: "reject",
}, /* maxConcurrency */ 3);

if (!result.allowed) {
  console.error(result.reason);
  // result.retryAfterMs is set for rate limit violations (not concurrency).
}

acquire resolves to a RateLimitAcquireResult:

interface RateLimitAcquireResult {
  allowed: boolean;
  reason?: string;        // Human-readable explanation when not allowed.
  retryAfterMs?: number;  // Milliseconds until the oldest call leaves the window.
}

For the "queue" strategy, acquire does not return until a slot is available. The resolved result.allowed is always true in that case.
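With the "reject" strategy, a caller that uses the limiter directly can use retryAfterMs to decide how long to wait before trying again. The following is a minimal sketch under stated assumptions: `acquireWithRetry` is a hypothetical helper, the result interface is re-declared locally so the block is self-contained, and the fallback delay covers denials that carry no retryAfterMs (such as concurrency rejections).

```typescript
// Local copy of the documented result shape.
interface RateLimitAcquireResult {
  allowed: boolean;
  reason?: string;
  retryAfterMs?: number;
}

type Acquire = () => Promise<RateLimitAcquireResult>;

// Hypothetical helper: retries a denied acquire up to `maxAttempts` times.
async function acquireWithRetry(
  acquire: Acquire,
  maxAttempts = 3,
  fallbackDelayMs = 100,
): Promise<RateLimitAcquireResult> {
  let last: RateLimitAcquireResult = { allowed: false, reason: "not attempted" };
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    last = await acquire();
    if (last.allowed) return last;
    if (attempt < maxAttempts) {
      // Prefer the limiter's hint; fall back to a fixed delay when none is given.
      const delay = last.retryAfterMs ?? fallbackDelayMs;
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
  return last; // Still denied after the final attempt.
}

// Usage with the limiter from the example above:
//   const result = await acquireWithRetry(() => limiter.acquire("my-tool", config, 3));
```

Remember to call release once the work is done if the final result is allowed.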

`release(toolName)`

Release a concurrency slot after tool execution completes. The guard engine calls this in a finally block, guaranteeing cleanup even when the tool throws:

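// Internal pattern: the guard engine does this automatically.
const slot = await limiter.acquire(toolName, config, maxConcurrency);
if (!slot.allowed) {
  // A denied acquire claims no slot, so there is nothing to release.
  throw new Error(slot.reason);  // The engine throws a ToolGuardError here.
}
let result;
try {
  result = await tool.execute(args);
} finally {
  limiter.release(toolName);
}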

Calling release also wakes the next queued caller (if any) for the "queue" strategy.

`getState(toolName)`

Returns the current RateLimitState for a tool, useful for observability and debugging:

const state = limiter.getState("my-tool");
// state.timestamps  — array of call timestamps within the current window
// state.activeCalls — number of currently executing calls

`reset()`

Clears all state and rejects all queued callers with an error. Intended for use in tests between test cases:

limiter.reset();

Advanced Examples

Protecting an Expensive External API

Cap calls to a third-party API that bills per request, and queue excess calls rather than dropping them:

import { createToolGuard } from "ai-tool-guard";

const guard = createToolGuard();

const wrappedOcrTool = guard.guardTool("ocrApi", ocrApiTool, {
  riskLevel: "medium",
  rateLimit: {
    maxCalls: 100,
    windowMs: 60_000,   // 100 calls per minute matches API plan limit.
    strategy: "queue",  // Back-pressure excess calls.
  },
  maxConcurrency: 10,   // No more than 10 in-flight requests at once.
});

With this configuration, calls beyond the 100/min window wait in the queue. As calls complete and their timestamps age out of the window, queued callers are admitted in order.

Preventing Runaway Tool Loops

AI agents can enter feedback loops where a tool result causes the model to call the same tool repeatedly. A tight rate limit on high-risk tools breaks these loops before they cause damage:

import { createToolGuard } from "ai-tool-guard";

const guard = createToolGuard({
  rules: [
    {
      id: "require-approval-high",
      toolPatterns: ["db.*"],
      riskLevels: ["high", "critical"],
      verdict: "require-approval",
    },
  ],
  defaultRateLimit: {
    maxCalls: 20,
    windowMs: 60_000,
    strategy: "reject",
  },
});

const wrappedDelete = guard.guardTool("deleteRecord", deleteRecordTool, {
  riskLevel: "critical",
  riskCategories: ["data-delete"],
  rateLimit: {
    maxCalls: 3,
    windowMs: 60_000,   // Maximum 3 delete operations per minute.
    strategy: "reject",
  },
  maxConcurrency: 1,    // Never run more than one delete at a time.
});

When a limit is exceeded under strategy: "reject", the caller receives a ToolGuardError:

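// Assumes ToolGuardError is exported from the package root.
import { ToolGuardError } from "ai-tool-guard";

try {
  await wrappedDelete.execute(args);
} catch (err) {
  if (err instanceof ToolGuardError && err.code === "rate-limited") {
    console.warn(`Rate limited: ${err.message}`);
  } else {
    throw err;  // Do not swallow unrelated failures.
  }
}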

Observing Limiter State

Use getState to expose rate limit metrics to your monitoring system:

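import { RateLimiter } from "ai-tool-guard/guards";

// `limiter` must be the RateLimiter instance that actually gates your tools;
// a freshly constructed one has no state to report. `metrics` stands in for
// your monitoring client.
declare const limiter: RateLimiter;
declare const metrics: { gauge(name: string, value: number): void };

const tools = ["db.query", "email.send", "payment.charge"];

// Publish per-tool gauges every 5 seconds.
setInterval(() => {
  for (const tool of tools) {
    const state = limiter.getState(tool);
    if (state) {
      metrics.gauge(`tools.${tool}.active_calls`, state.activeCalls);
      metrics.gauge(`tools.${tool}.window_calls`, state.timestamps.length);
    }
  }
}, 5_000);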

How It Works

Sliding Window Algorithm

The limiter uses a sliding window rather than a fixed window. On each acquire call:

1. Date.now() is sampled as now.
2. The timestamps array for the tool is pruned: any timestamp where now - timestamp >= windowMs is removed.
3. If timestamps.length >= maxCalls, the rate limit has been hit.
4. Otherwise, now is appended to timestamps and the call is admitted.

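The sliding window avoids the burst-at-boundary problem of fixed windows, where a caller can spend a full quota just before the boundary and another full quota just after it. With a 60-second window, a call made at t=59s is not forgotten at t=60s; its timestamp ages out of the window at t=119s.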
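The pruning and admission logic described above can be sketched in a few lines. This is a standalone illustration, not the library's source: `tryAdmit` and its types are hypothetical names, the clock is passed in as `now` so the behaviour is reproducible, and retryAfterMs is computed as the time until the oldest timestamp leaves the window, following the field's description.

```typescript
interface WindowState {
  timestamps: number[];
}

interface WindowResult {
  allowed: boolean;
  retryAfterMs?: number;
}

// Hypothetical sketch of one sliding-window check.
function tryAdmit(state: WindowState, maxCalls: number, windowMs: number, now: number): WindowResult {
  // Prune every timestamp that has aged out of the window.
  state.timestamps = state.timestamps.filter((t) => now - t < windowMs);

  // Window is full: deny and report when the oldest call will leave it.
  if (state.timestamps.length >= maxCalls) {
    const oldest = state.timestamps[0] ?? now;
    return { allowed: false, retryAfterMs: oldest + windowMs - now };
  }

  // Room left: record this call and admit it.
  state.timestamps.push(now);
  return { allowed: true };
}

// Two calls per 60-second window, times in milliseconds.
const windowState: WindowState = { timestamps: [] };
const first = tryAdmit(windowState, 2, 60_000, 0);          // admitted
const second = tryAdmit(windowState, 2, 60_000, 59_000);    // admitted
const third = tryAdmit(windowState, 2, 60_000, 59_500);     // denied, retryAfterMs = 500
const fourth = tryAdmit(windowState, 2, 60_000, 60_000);    // admitted: the call at t=0 aged out
const fifth = tryAdmit(windowState, 2, 60_000, 118_999);    // denied: the call at t=59s still counts
const sixth = tryAdmit(windowState, 2, 60_000, 119_000);    // admitted: the call at t=59s aged out
```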

Concurrency Checks

Concurrency is tracked separately via state.activeCalls:

1. After the rate limit check passes, state.activeCalls is compared to maxConcurrency.
2. If activeCalls >= maxConcurrency, the concurrency cap has been hit.
3. Otherwise, activeCalls is incremented and the call is admitted.
4. release(toolName) decrements activeCalls in the finally block of tool execution, guaranteeing the slot is always returned.

Both checks happen within the same acquire loop, so a queued call re-evaluates both conditions when it wakes up.
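The concurrency side can be sketched in the same spirit. The names below are hypothetical and the block is a simplified model: it treats an undefined cap as "no cap" and clamps the counter at zero on release, both of which are assumptions of this sketch rather than documented behaviour.

```typescript
interface ConcurrencyState {
  activeCalls: number;
}

// Claims a slot if the cap allows it. An undefined cap means no cap (assumption).
function tryClaimSlot(state: ConcurrencyState, maxConcurrency?: number): boolean {
  if (maxConcurrency !== undefined && state.activeCalls >= maxConcurrency) {
    return false; // Cap hit: the counter is left untouched.
  }
  state.activeCalls += 1;
  return true;
}

// Returns a slot. Clamping at zero is a defensive choice of this sketch.
function releaseSlot(state: ConcurrencyState): void {
  state.activeCalls = Math.max(0, state.activeCalls - 1);
}

// Runs `work` only if a slot is free, and always returns the slot afterwards,
// even when `work` throws. Resolves to undefined when the cap was hit.
async function runWithSlot<T>(
  state: ConcurrencyState,
  maxConcurrency: number | undefined,
  work: () => Promise<T>,
): Promise<T | undefined> {
  if (!tryClaimSlot(state, maxConcurrency)) return undefined;
  try {
    return await work();
  } finally {
    releaseSlot(state);
  }
}
```

Pairing the claim with a finally block is what keeps a throwing tool from leaking its slot.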

Queue Mechanics

When the strategy is "queue" and a limit is hit, acquire calls enqueue(toolName), which pushes a { resolve, reject } pair onto a per-tool queue and returns a Promise. acquire then awaits that Promise, suspending the caller. When release(toolName) is called, it shifts the first waiter off the queue and calls resolve(), waking the oldest queued caller. That caller re-enters the acquire loop and re-checks both limits before being admitted. If reset() is called while callers are queued, all pending Promises are rejected.
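A per-tool FIFO of { resolve, reject } pairs can be modelled as follows. This is a rough sketch of the mechanism described above, not the library's implementation: `WaitQueue`, `wakeNext`, and `rejectAll` are hypothetical names standing in for the roles played by enqueue, release, and reset.

```typescript
interface Waiter {
  resolve: () => void;
  reject: (err: Error) => void;
}

class WaitQueue {
  private queues = new Map<string, Waiter[]>();

  // Parks the caller: the returned promise settles only when woken or rejected.
  enqueue(toolName: string): Promise<void> {
    return new Promise<void>((resolve, reject) => {
      const queue = this.queues.get(toolName) ?? [];
      queue.push({ resolve, reject });
      this.queues.set(toolName, queue);
    });
  }

  // Wakes the oldest waiter for the tool (FIFO). Returns false if none is waiting.
  wakeNext(toolName: string): boolean {
    const waiter = this.queues.get(toolName)?.shift();
    if (!waiter) return false;
    waiter.resolve();
    return true;
  }

  // Rejects every pending waiter and clears all queues.
  rejectAll(reason = "limiter reset"): void {
    this.queues.forEach((queue) => {
      queue.forEach((waiter) => waiter.reject(new Error(reason)));
    });
    this.queues.clear();
  }
}
```

In a full limiter, a woken caller would loop back and repeat the rate and concurrency checks rather than assume it has been admitted, since another caller may have taken the slot in the meantime.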