Adjusting your code for rate limiting

Whether it’s a script, integration, or app you’re using — if it’s making external REST API requests, it will be affected by rate limiting. Until now, you could send an unlimited number of REST API requests to retrieve data from Confluence, so we’re guessing you haven’t put any restrictions on your code. When admins enable rate limiting in Confluence, there’s a chance your requests will get limited eventually, so we want to help you prepare for that.

Before you begin

To better understand the strategies we’ve described here, it’s good to have some basic knowledge about rate limiting in Confluence. When in doubt, head to Improving instance stability with rate limiting and have a look at the first paragraph.

Quick reference


Request codes...

Success: When your request is successful, you’ll get a 2xx code.

Error: When your request fails, you’ll get a 4xx code. If you’re rate limited, it will be 429 (too many requests).

HTTP headers...

The following HTTP headers are added to every authenticated request affected by rate limiting:

X-RateLimit-Limit: The max number of requests (tokens) you can have. New tokens won’t be added to your bucket after reaching this limit. Your admin configures this as Max requests.

X-RateLimit-Remaining: The remaining number of tokens. This value is as accurate as it can be at the time of making a request, but it might not always be correct.

X-RateLimit-Interval-Seconds: The time interval in seconds. You get a batch of new tokens every time interval.

X-RateLimit-FillRate: The number of tokens you get every time interval. Your admin configures this as Requests allowed.

retry-after: How long you need to wait until you get new tokens. If you still have tokens left, it shows 0; this means you can make more requests right away.
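To make the headers easier to work with, you can collect them into one structure before deciding what to do next. The following is a minimal sketch in Python, not an official client: the names `RateLimitInfo` and `parse_rate_limit_headers` are made up for illustration, and the code assumes every header value is a whole number, with missing or malformed values returned as `None`.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitInfo:
    """Hypothetical container for the rate limiting headers listed above."""
    limit: Optional[int]             # X-RateLimit-Limit
    remaining: Optional[int]         # X-RateLimit-Remaining
    interval_seconds: Optional[int]  # X-RateLimit-Interval-Seconds
    fill_rate: Optional[int]         # X-RateLimit-FillRate
    retry_after: Optional[int]       # retry-after


def _to_int(value: Optional[str]) -> Optional[int]:
    """Convert a header value to int; return None if it is missing or malformed."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitInfo:
    # HTTP header names are case-insensitive, so normalize them before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return RateLimitInfo(
        limit=_to_int(lowered.get("x-ratelimit-limit")),
        remaining=_to_int(lowered.get("x-ratelimit-remaining")),
        interval_seconds=_to_int(lowered.get("x-ratelimit-interval-seconds")),
        fill_rate=_to_int(lowered.get("x-ratelimit-fillrate")),
        retry_after=_to_int(lowered.get("retry-after")),
    )
```

Pass in the header mapping from whichever HTTP library you use. Because X-RateLimit-Remaining might not always be correct, treat the parsed values as hints rather than guarantees.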


Strategies

We’ve created a set of strategies you can apply in your code so it works with rate limits. From very specific to more universal, these reference strategies will give you a base, which you can further refine to make an implementation that works best for you.

1. Exponential backoff

This strategy is the most universal and the least complex to implement. It’s not expecting HTTP headers or any information specific to a rate limiting system, so the same code will work for the whole Atlassian suite, and most likely non-Atlassian products, too. The essence of using it is observing whether you’re already limited (wait and retry, until requests go through again) or not (just keep sending requests until you’re limited).

(tick) Universal, works with any rate limiting system.

(tick) Doesn’t require too much knowledge about limits or a rate limiting system.

(error) High impact on a Confluence instance because of concurrency. We’re assuming most active users will send requests whenever tokens are available. This window will be similar for all users, which causes spikes in load on Confluence. The same applies to threads: most will either be busy at the same time or idle.

(error) Unpredictable. If you need to make a few critical requests, you can’t be sure all of them will be successful.

Summary of this strategy

Here’s the high-level overview of how to adjust your code:

  1. Active: Make requests until you encounter a 429. Keep concurrency to a minimum to know exactly when you reached your rate limit.
  2. Timeout: After you receive a 429, start the timeout. Set it to 1 second for starters. It’s a good idea to wait up to 50% longer than your chosen timeout.
  3. Retry: After the timeout has passed, make requests again:
    1. Success: If you get a 2xx message, go back to step 1 and make more requests.
    2. Limited: If you get a 429 message, go back to step 2 and double the previous timeout. You can stop once you reach a certain threshold, like 20 minutes, if that’s enough to make your requests work.

With this strategy, you’ll deplete tokens as quickly as possible, and then make subsequent requests to actively monitor the rate limiting status on the server side. It guarantees you’ll get a 429 if your rate is above the limits.
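The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a reference implementation: `send` is a hypothetical callable that performs one request and returns its HTTP status code, `sleep` is injectable so the loop can be tested without actually waiting, and the threshold is read here as the point where the code gives up. Capping the timeout at the threshold and continuing to retry is another reasonable reading.

```python
import random
import time
from typing import Callable


def request_with_backoff(
    send: Callable[[], int],
    initial_timeout: float = 1.0,
    max_timeout: float = 20 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a status other than 429.

    `send` is a placeholder for your own request function (an assumption of
    this sketch); it must return the HTTP status code of the response.
    """
    timeout = initial_timeout
    while True:
        status = send()
        if status != 429:
            # Not rate limited: hand the status back to the caller.
            return status
        if timeout > max_timeout:
            # Assumption: give up once the timeout has grown past the threshold.
            raise RuntimeError("still rate limited after reaching the maximum timeout")
        # Wait up to 50% longer than the chosen timeout.
        sleep(timeout * (1 + random.uniform(0, 0.5)))
        # Double the timeout for the next 429.
        timeout *= 2
```

The random extra wait keeps several clients that were limited at the same moment from retrying at the same moment.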

2. Specific timed backoff

This strategy is a bit more specific, as it’s using the retry-after header. We’re considering this header an industry standard and plan to use it across the Atlassian suite, so you can still be sure the same code will work for Bitbucket and Confluence, Server and Cloud, etc. This strategy makes sure that you will not be limited, because you’ll know exactly how long you need to wait before you’re allowed to make new requests.

(tick) Universal, works with any rate limiting system within the Atlassian suite (and other products using retry-after) — Bitbucket and Confluence, Server and Cloud, etc.

(tick) Doesn’t require too much knowledge about limits or a rate limiting system.

(error) High impact on a Confluence instance because of concurrency. We’re assuming most active users will send requests whenever tokens are available. This window will be similar for all users, which causes spikes in load on Confluence. The same applies to threads: most will either be busy at the same time or idle.

Summary of this strategy

Here’s the high-level overview of how to adjust your code:

  1. Active: Make requests and observe the retry-after response header, which shows the number of seconds you need to wait to get new tokens. Keep concurrency level to a minimum to know exactly when the rate limit kicks in.
    1. Success: If the header says 0, you can make more requests right away.
    2. Limited: If the header has a number greater than 0, for example 5, you need to wait that number of seconds.
  2. Timeout: If the header is anything above 0, start the timeout with the number of seconds specified in the header. Consider increasing the timeout by a random fraction, up to 20%.
  3. Retry: After the timeout specified in the header has passed, go back to step 1 and make more requests.

With this strategy, you’ll deplete tokens as quickly as possible, and then pause until you get new tokens. You should never hit a 429 if your code is the only agent depleting tokens and sends requests synchronously.
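The pause in steps 2 and 3 can be sketched like this in Python. It’s a minimal sketch: `wait_for_tokens` is a made-up name, `sleep` is injectable for testing, and the code assumes retry-after always carries a number of seconds, as described on this page. It doesn’t handle other formats.

```python
import random
import time
from typing import Callable, Mapping


def wait_for_tokens(
    headers: Mapping[str, str],
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Pause according to retry-after and return the number of seconds waited."""
    # HTTP header names are case-insensitive, so normalize them before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    # Assumption: the value is a number of seconds; a missing header counts as 0.
    retry_after = float(lowered.get("retry-after", 0))
    if retry_after <= 0:
        # Tokens are still available, so there is no need to wait.
        return 0.0
    # Increase the timeout by a random fraction, up to 20%.
    delay = retry_after * (1 + random.uniform(0, 0.2))
    sleep(delay)
    return delay
```

Call it with the headers of every response before sending the next request. As long as your code is the only agent depleting tokens and sends requests one at a time, this should keep it from running into a 429.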

3. Rate adjustment

This strategy is very specific and expects particular response headers, so it’s most likely to work for Confluence Data Center only. When making requests, you’ll observe headers returned by the server (number of tokens, fill rate, time interval) and adjust your code specifically to the number of tokens you have and can use.

(tick) It can have the least performance impact on a Confluence instance, if used optimally.

(tick) Highly recommended, especially for integrations that require high-volume traffic.

(tick) Safe, as you can easily predict that all requests that must go through will in fact go through. It also allows for a great deal of customization.

(error) Very specific, depends on specific headers and rate limiting system.

Summary of this strategy

Here’s the high-level overview of how to adjust your code:

  1. Active: Make requests and observe all response headers.
  2. Adjust: With every request, recalculate the rate based on the following headers:
    1. x-ratelimit-interval-seconds: The time interval in seconds. You get a batch of new tokens every time interval.
    2. x-ratelimit-fillrate: The number of tokens you get every time interval.
    3. retry-after: The number of seconds you need to wait for new tokens. Make sure that your rate assumes waiting longer than this value.
  3. Retry: If you encounter a 429, which shouldn’t happen if you used the headers correctly, you need to further adjust your code so it doesn’t happen again. You can use the retry-after header to make sure that you only make requests when the tokens are available.
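The Adjust step could look something like the following in Python. Treat it as one possible sketch rather than the recommended calculation: `delay_before_next_request` and its `safety_margin` parameter are illustrative names, and pacing requests at the refill rate is a deliberately conservative choice that ignores any tokens you’ve already accumulated.

```python
from typing import Mapping


def delay_before_next_request(
    headers: Mapping[str, str],
    safety_margin: float = 0.1,
) -> float:
    """Return how many seconds to wait before sending the next request."""
    # HTTP header names are case-insensitive, so normalize them before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    interval = float(lowered["x-ratelimit-interval-seconds"])
    fill_rate = float(lowered["x-ratelimit-fillrate"])
    retry_after = float(lowered.get("retry-after", 0))

    if fill_rate <= 0 or interval <= 0:
        raise ValueError("interval and fill rate must be positive")
    if retry_after > 0:
        # No tokens left: wait somewhat longer than the server asks for.
        # The 10% default margin is an assumption of this sketch.
        return retry_after * (1 + safety_margin)
    # Tokens are available: spend them no faster than they are refilled.
    return interval / fill_rate
```

For example, with an interval of 60 seconds and a fill rate of 10, the function spaces requests 6 seconds apart while tokens are available.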

Customizing your code

Depending on your needs, this strategy helps you to:

Spread a huge number of requests over time...

By following the headers, you should know how many tokens you have, when you will get the new ones, and in what number. The most useful headers here are x-ratelimit-interval-seconds and x-ratelimit-fillrate, which show the number of tokens available every time interval. They help you choose the perfect frequency of making your requests.
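To plan a large job, you can estimate how long it will take with the current limits. The sketch below is a rough, worst-case estimate in Python: `estimate_job_duration` is a made-up helper, and it assumes you spend tokens as soon as they arrive, nobody else uses your tokens, and each new batch takes a full interval to arrive.

```python
import math


def estimate_job_duration(
    total_requests: int,
    remaining: int,
    fill_rate: int,
    interval_seconds: int,
) -> int:
    """Estimate the seconds needed to send total_requests.

    remaining, fill_rate and interval_seconds come from the
    x-ratelimit-remaining, x-ratelimit-fillrate and
    x-ratelimit-interval-seconds headers.
    """
    if fill_rate <= 0:
        raise ValueError("fill rate must be positive")
    # Requests that cannot be covered by the tokens you already have.
    shortfall = max(0, total_requests - remaining)
    # Each interval delivers one batch of fill_rate tokens.
    intervals = math.ceil(shortfall / fill_rate)
    return intervals * interval_seconds
```

With 100 tokens remaining, a fill rate of 10, and a 60-second interval, sending 1000 requests needs 90 more batches, or about 5400 seconds.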

Perfectly time complex operations...

You can wait to perform complex operations until you’re sure you have enough tokens to make all the consecutive requests you need to make. This allows you to reduce the risk of leaving the system in an inconsistent state, for example when your task requires 4 requests, but it turns out you can only make 2. The most useful headers are x-ratelimit-remaining and x-ratelimit-interval-seconds, which show how many tokens you have right now and how long you need to wait for the new ones. 
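A check like this can run before each multi-request operation. The following Python sketch uses a hypothetical `check_operation` function and assumes both headers are present with whole-number values. Because x-ratelimit-remaining might not always be correct, it suggests waiting one interval and checking again instead of predicting the exact moment the tokens arrive.

```python
from typing import Mapping, Tuple


def check_operation(headers: Mapping[str, str], needed_requests: int) -> Tuple[bool, int]:
    """Return (ready, wait_seconds) for an operation that needs several requests."""
    # HTTP header names are case-insensitive, so normalize them before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    remaining = int(lowered["x-ratelimit-remaining"])
    interval = int(lowered["x-ratelimit-interval-seconds"])
    if remaining >= needed_requests:
        # Enough tokens for every request in the operation: start now.
        return True, 0
    # Not enough tokens: wait for the next batch, then check again.
    return False, interval
```

For the example above, an operation that needs 4 requests with only 2 tokens remaining and a 60-second interval returns `(False, 60)`, so none of the 4 requests are sent yet.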

Create more advanced traffic shaping strategies...

With all the information returned by the headers, you can create more strategies that work best for you, or mix the ones we’ve described here. For example:

If you’re making requests once a day, you can focus on the max requests you can accumulate (x-ratelimit-limit), or lean towards the remaining number of tokens if a particular action in Confluence triggers your app to make requests (x-ratelimit-remaining).

If your script needs to work both for Confluence Data Center and some other application, use all headers for Confluence and focus on the universal retry-after or request codes if the app detects different software.
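One way to sketch that last example in Python is to pick a strategy based on which headers a response contains. The function name `pick_strategy` and the returned labels are illustrative only. How you detect the software on the other end is up to you; checking for headers is just one assumption.

```python
from typing import Mapping

# Headers that the rate adjustment strategy relies on.
SPECIFIC_HEADERS = {
    "x-ratelimit-remaining",
    "x-ratelimit-interval-seconds",
    "x-ratelimit-fillrate",
}


def pick_strategy(headers: Mapping[str, str]) -> str:
    """Choose one of the three strategies based on the response headers."""
    # HTTP header names are case-insensitive, so compare them in lowercase.
    names = {name.lower() for name in headers}
    if SPECIFIC_HEADERS <= names:
        # All Confluence-specific headers are present.
        return "rate adjustment"
    if "retry-after" in names:
        # Only the universal header is available.
        return "specific timed backoff"
    # No rate limiting headers at all: rely on request codes only.
    return "exponential backoff"
```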

Last modified on Feb 21, 2023
