Rate Limiter

Pre-requisite: Convex#

You'll need an existing Convex project to use the component. Convex is a hosted backend platform, including a database, serverless functions, and a ton more you can learn about here.

Run npm create convex or follow any of the quickstarts to set one up.

Installation#

Install the component package:

npm install @convex-dev/rate-limiter

Create a convex.config.ts file in your app's convex/ folder and install the component by calling use:

// convex/convex.config.ts
import { defineApp } from "convex/server";
import rateLimiter from "@convex-dev/rate-limiter/convex.config";

const app = defineApp();
app.use(rateLimiter);

export default app;

Define your rate limits:#

import { RateLimiter, MINUTE, HOUR } from "@convex-dev/rate-limiter";
import { components } from "./_generated/api";

const rateLimiter = new RateLimiter(components.rateLimiter, {
  // One global / singleton rate limit, using a "fixed window" algorithm.
  freeTrialSignUp: { kind: "fixed window", rate: 100, period: HOUR },
  // A per-user limit, allowing one every ~6 seconds.
  // Allows up to 3 in quick succession if they haven't sent many recently.
  sendMessage: { kind: "token bucket", rate: 10, period: MINUTE, capacity: 3 },
  failedLogins: { kind: "token bucket", rate: 10, period: HOUR },
  // Use sharding to increase throughput without compromising on correctness.
  llmTokens: { kind: "token bucket", rate: 40000, period: MINUTE, shards: 10 },
  llmRequests: { kind: "fixed window", rate: 1000, period: MINUTE, shards: 10 },
});

You can safely generate multiple instances if you want to define different rates in separate places, provided the keys don't overlap.
The units for period are milliseconds. MINUTE above is 60000.

Strategies:#

The token bucket approach provides guarantees for overall consumption via the rate per period at which tokens are added, while also allowing unused tokens to accumulate (like "rollover" minutes) up to some capacity value. So if you could normally send 10 per minute, with a capacity of 20, then every two minutes you could send 20, or if in the last two minutes you only sent 5, you can send 15 now.

The fixed window approach differs in that the tokens are granted all at once, every period milliseconds. It similarly allows accumulating "rollover" tokens up to a capacity (defaults to the rate for both rate limit strategies). You can specify a custom start time if e.g. you want the period to reset at a specific time of day. By default it will be random to help space out requests that are retrying.

Usage#

Using a simple global rate limit:#

const { ok, retryAfter } = await rateLimiter.limit(ctx, "freeTrialSignUp");

ok is whether it successfully consumed the resource
retryAfter is when it would have succeeded in the future.

Note: If you have many clients using the retryAfter to decide when to retry, defend against a thundering herd by adding some jitter. Or use the reserve functionality discussed below.

Per-user rate limit:#

Use key to use a rate limit specific to some user / team / session ID / etc.

const status = await rateLimiter.limit(ctx, "sendMessage", { key: userId });

Consume a custom count#

By default, each call to limit counts as one unit. Pass count to customize.

// Consume multiple in one request to prevent rate limits on an LLM API.
const status = await rateLimiter.limit(ctx, "llmTokens", { count: tokens });

Throw automatically#

By default it will return { ok, retryAfter }. To have it throw automatically when the limit is exceeded, use throws. It throws a ConvexError with RateLimitError data (data: {kind, name, retryAfter}) instead of returning when ok is false.

// Automatically throw an error if the rate limit is hit
await rateLimiter.limit(ctx, "failedLogins", { key: userId, throws: true });

Check a rate limit without consuming it#

const status = await rateLimiter.check(ctx, "failedLogins", { key: userId });

Reset a rate limit#

// Reset a rate limit on successful login
await rateLimiter.reset(ctx, "failedLogins", { key: userId });

Define a rate limit inline / dynamically#

// Use a one-off rate limit config (when not named on initialization)
const config = { kind: "fixed window", rate: 1, period: SECOND };
const status = await rateLimiter.limit(ctx, "oneOffName", { config });

Using the React hook#

You can use the React hook to check the rate limit in your browser code.

First, define the server API to get the rate limit value:

// In convex/example.ts
export const { getRateLimit, getServerTime } = rateLimiter.hookAPI(
  "sendMessage",
  // Optionally provide a key function to get the key for the rate limit
  { key: async (ctx) => getUserId(ctx) }
);

Then, use the React hook to check the rate limit:

function App() {
  const { status: { ok, retryAt }, check } = useRateLimit(api.example.getRateLimit, {
    // [recommended] Allows the hook to sync the browser and server clocks
    getServerTimeMutation: getServerTime,
    // [optional] The number of tokens to wait on
    count: 1,
  });

  // If you want to check at specific times and get the concrete value:
  const { value, ts, config, ok, retryAt } = check(Date.now(), count);

Fetching the current value directly#

You can fetch the current value of a rate limit directly, if you want to know the concrete value and timestamp it was last updated.

const { config, value, ts } = await rateLimiter.getValue(ctx, "sendMessage", { key: userId });

And you can use calculateRateLimit to calculate the value at a given timestamp:

import { calculateRateLimit } from "@convex-dev/rate-limiter";

const { config, value, ts } = calculateRateLimit({ value, ts }, config, Date.now(), count || 0);

Scaling rate limiting with shards#

When many requests are happening at once, they can all be trying to modify the same values in the database. Because Convex provides strong transactions, they will never overwrite each other, so you don't have to worry about the rate limiter succeeding more often than it should. However, when there is high contention for these values, it causes optimistic concurrency control conflicts. Convex automatically retries these a number of times with backoff, but it's still best to avoid them.

Not to worry! To provide high throughput, we can use a technique called "sharding" where we break up the total capacity into individual buckets, or "shards". When we go to use some of that capacity, we check a random shard.¹While sometimes we'll get unlucky and get rate limited when there was capacity elsewhere, we'll never voilate the rate limit's upper bound.

const rateLimiter = new RateLimiter(components.rateLimiter, {
  // Use sharding to increase throughput without compromising on correctness.
  llmTokens: { kind: "token bucket", rate: 40000, period: MINUTE, shards: 10 },
  llmRequests: { kind: "fixed window", rate: 1000, period: MINUTE, shards: 10 },
});

Here we're using 10 shards to handle 1,000 QPM. If you want some rough math to guess at how many shards to add, take the max queries per second you expect and divide by two. It's also useful for each shard to have five (ideally ten) or more capacity. In this case, we have ten (rate / shards) and don't expect normal traffic to exceed ~20 QPS.

Tip: If you want a rate like { rate: 100, period: SECOND } and you are flexible in the overall period, then you can shard this by increasing the rate and period proportionally to get enough shards and capacity per shard: { shards: 50, rate: 250, period: 2.5 * SECOND } or even better: { shards: 50, rate: 1000, period: 10 * SECOND }.

Power of two#

We're actually going one step further and checking two shards and using the one with more capacity, to keep them relatively balanced, based on the power of two technique. We will also combine the capacity of the two shards if neither has enough on their own.

Reserving capacity:#

You can also allow it to reserve capacity to avoid starvation on larger requests.

When you reserve capacity ahead of time, the contract is that you can run your operation at the specified time (via the retryAfter field), at which point you don't have to re-check the rate limit. Your capacity has been "ear-marked".

With this, you can queue up many operations and they will be run at spaced-out intervals, maximizing the utilization of the rate limit.

Details in the Stack post.

const myAction = internalAction({
  args: {
    //...
    skipCheck: v.optional(v.boolean()),
  },
  handler: async (ctx, args) => {
    if (!args.skipCheck) {
      // Reserve future capacity instead of just failing now
      const status = await rateLimiter.limit(ctx, "llmRequests", {
        reserve: true,
      });
      if (status.retryAfter) {
        return ctx.scheduler.runAfter(
          status.retryAfter,
          internal.foo.myAction,
          {
            // When we run in the future, we can skip the rate limit check,
            // since we've just reserved that capacity.
            skipCheck: true,
          }
        );
      }
    }
    // do the operation
  },
});

Adding jitter#

When too many users show up at once, it can cause network congestion, database contention, and consume other shared resources at an unnecessarily high rate. Instead we can return a random time within the next period to retry. Hopefully this is infrequent. This technique is referred to as adding "jitter."

A simple implementation could look like:

const retryAfter = status.retryAfter + Math.random() * period;

For the fixed window, we also introduce randomness by picking the start time of the window (from which all subsequent windows are based) randomly if config.start wasn't provided. This helps from all clients flooding requests at midnight and paging you.

More resources#

Check out a full example here.

See this article for more information on usage and advanced patterns, for example:

How the different rate limiting strategies work under the hood.
Using multiple rate limits in a single transaction.
Rate limiting anonymous users.

Back to Components

Rate Limiter

get-convex/rate-limiter

View repo

View package

Pre-requisite: Convex#

You'll need an existing Convex project to use the component. Convex is a hosted backend platform, including a database, serverless functions, and a ton more you can learn about here.

Run npm create convex or follow any of the quickstarts to set one up.

Installation#

Install the component package:

npm install @convex-dev/rate-limiter

Create a convex.config.ts file in your app's convex/ folder and install the component by calling use:

// convex/convex.config.ts
import { defineApp } from "convex/server";
import rateLimiter from "@convex-dev/rate-limiter/convex.config";

const app = defineApp();
app.use(rateLimiter);

export default app;

Define your rate limits:#

import { RateLimiter, MINUTE, HOUR } from "@convex-dev/rate-limiter";
import { components } from "./_generated/api";

const rateLimiter = new RateLimiter(components.rateLimiter, {
  // One global / singleton rate limit, using a "fixed window" algorithm.
  freeTrialSignUp: { kind: "fixed window", rate: 100, period: HOUR },
  // A per-user limit, allowing one every ~6 seconds.
  // Allows up to 3 in quick succession if they haven't sent many recently.
  sendMessage: { kind: "token bucket", rate: 10, period: MINUTE, capacity: 3 },
  failedLogins: { kind: "token bucket", rate: 10, period: HOUR },
  // Use sharding to increase throughput without compromising on correctness.
  llmTokens: { kind: "token bucket", rate: 40000, period: MINUTE, shards: 10 },
  llmRequests: { kind: "fixed window", rate: 1000, period: MINUTE, shards: 10 },
});

You can safely generate multiple instances if you want to define different rates in separate places, provided the keys don't overlap.
The units for period are milliseconds. MINUTE above is 60000.

Strategies:#

Usage#

Using a simple global rate limit:#

const { ok, retryAfter } = await rateLimiter.limit(ctx, "freeTrialSignUp");

ok is whether it successfully consumed the resource
retryAfter is when it would have succeeded in the future.

Note: If you have many clients using the retryAfter to decide when to retry, defend against a thundering herd by adding some jitter. Or use the reserve functionality discussed below.

Per-user rate limit:#

Use key to use a rate limit specific to some user / team / session ID / etc.

const status = await rateLimiter.limit(ctx, "sendMessage", { key: userId });

Consume a custom count#

By default, each call to limit counts as one unit. Pass count to customize.

// Consume multiple in one request to prevent rate limits on an LLM API.
const status = await rateLimiter.limit(ctx, "llmTokens", { count: tokens });

Throw automatically#

// Automatically throw an error if the rate limit is hit
await rateLimiter.limit(ctx, "failedLogins", { key: userId, throws: true });

Check a rate limit without consuming it#

const status = await rateLimiter.check(ctx, "failedLogins", { key: userId });

Reset a rate limit#

// Reset a rate limit on successful login
await rateLimiter.reset(ctx, "failedLogins", { key: userId });

Define a rate limit inline / dynamically#

// Use a one-off rate limit config (when not named on initialization)
const config = { kind: "fixed window", rate: 1, period: SECOND };
const status = await rateLimiter.limit(ctx, "oneOffName", { config });

Using the React hook#

You can use the React hook to check the rate limit in your browser code.

First, define the server API to get the rate limit value:

// In convex/example.ts
export const { getRateLimit, getServerTime } = rateLimiter.hookAPI(
  "sendMessage",
  // Optionally provide a key function to get the key for the rate limit
  { key: async (ctx) => getUserId(ctx) }
);

Then, use the React hook to check the rate limit:

function App() {
  const { status: { ok, retryAt }, check } = useRateLimit(api.example.getRateLimit, {
    // [recommended] Allows the hook to sync the browser and server clocks
    getServerTimeMutation: getServerTime,
    // [optional] The number of tokens to wait on
    count: 1,
  });

  // If you want to check at specific times and get the concrete value:
  const { value, ts, config, ok, retryAt } = check(Date.now(), count);

Fetching the current value directly#

You can fetch the current value of a rate limit directly, if you want to know the concrete value and timestamp it was last updated.

const { config, value, ts } = await rateLimiter.getValue(ctx, "sendMessage", { key: userId });

And you can use calculateRateLimit to calculate the value at a given timestamp:

import { calculateRateLimit } from "@convex-dev/rate-limiter";

const { config, value, ts } = calculateRateLimit({ value, ts }, config, Date.now(), count || 0);

Scaling rate limiting with shards#

const rateLimiter = new RateLimiter(components.rateLimiter, {
  // Use sharding to increase throughput without compromising on correctness.
  llmTokens: { kind: "token bucket", rate: 40000, period: MINUTE, shards: 10 },
  llmRequests: { kind: "fixed window", rate: 1000, period: MINUTE, shards: 10 },
});

Power of two#

Reserving capacity:#

You can also allow it to reserve capacity to avoid starvation on larger requests.

With this, you can queue up many operations and they will be run at spaced-out intervals, maximizing the utilization of the rate limit.

Details in the Stack post.

const myAction = internalAction({
  args: {
    //...
    skipCheck: v.optional(v.boolean()),
  },
  handler: async (ctx, args) => {
    if (!args.skipCheck) {
      // Reserve future capacity instead of just failing now
      const status = await rateLimiter.limit(ctx, "llmRequests", {
        reserve: true,
      });
      if (status.retryAfter) {
        return ctx.scheduler.runAfter(
          status.retryAfter,
          internal.foo.myAction,
          {
            // When we run in the future, we can skip the rate limit check,
            // since we've just reserved that capacity.
            skipCheck: true,
          }
        );
      }
    }
    // do the operation
  },
});

Adding jitter#

A simple implementation could look like:

const retryAfter = status.retryAfter + Math.random() * period;

More resources#

Check out a full example here.

See this article for more information on usage and advanced patterns, for example:

How the different rate limiting strategies work under the hood.
Using multiple rate limits in a single transaction.
Rate limiting anonymous users.

Get your app up and running in minutes

Start building

ProductSync Realtime Auth Open source AI coding Chef FAQ Pricing

DevelopersDocs Blog Components Templates Startups Champions Changelog Podcast LLMs.txt

CompanyAbout us Brand Investors Become a partner Jobs News Events Terms of service Privacy policy Security

SocialTwitter Discord YouTube Luma LinkedIn GitHub

A Trusted Solution

SOC 2 Type II Compliant
HIPAA Compliant
GDPR Verified

Rate Limiter

Category

Pre-requisite: Convex#

Installation#

Define your rate limits:#

Strategies:#

Usage#

Using a simple global rate limit:#

Per-user rate limit:#

Consume a custom count#

Throw automatically#

Check a rate limit without consuming it#

Reset a rate limit#

Define a rate limit inline / dynamically#

Using the React hook#

Fetching the current value directly#

Scaling rate limiting with shards#

Power of two#

Reserving capacity:#

Adding jitter#

More resources#

Rate Limiter

Category

Pre-requisite: Convex#

Installation#

Define your rate limits:#

Strategies:#

Usage#

Using a simple global rate limit:#

Per-user rate limit:#

Consume a custom count#

Throw automatically#

Check a rate limit without consuming it#

Reset a rate limit#

Define a rate limit inline / dynamically#

Using the React hook#

Fetching the current value directly#

Scaling rate limiting with shards#

Power of two#

Reserving capacity:#

Adding jitter#

More resources#