API requests at scale: Rate-limits and Quotas - Kloudless 科迪股份有限公司, a Netskope company

Most multi-tenant cloud services have a strategy to ensure fair quality of service for all users on their platform. This includes keeping the service available and responsive both to human users and to the bots and applications that access it via its API. A rate limit achieves this by putting usage thresholds in place that protect the API from malicious or unintentional overuse.

There are several algorithms that could be used to implement rate limits or quotas. As an API abstraction layer, Kloudless maps the various error responses that could be received when rate limits are exceeded to a unified format. This lets applications handle limits easily with a single implementation (see the Kloudless docs). Below, we’ll cover some of the popular approaches we’ve seen, along with examples of SaaS applications that use them.

Sliding window rate-limit

Typically, rate-limit algorithms track the number of requests over a short period of time, such as a second or a minute. If requests begin to exceed the threshold, we commonly see error responses with the 429 (Too Many Requests) status code defined in RFC 6585. The response often includes the Retry-After header, which indicates how long the client should wait (in seconds) before retrying the request.
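To make the idea concrete, here is a minimal sketch of one way a sliding window check could work on the server side. It is not taken from any of the APIs discussed here: the SlidingWindowLimiter class, its in-memory deque, and the limit of 2 requests per second are all assumptions chosen for illustration.

```python
import math
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests in any trailing `window` seconds.

    Hypothetical, single-process sketch; a real service would need shared state.
    """

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.timestamps = deque()  # arrival times of accepted requests

    def check(self, now):
        """Return (status_code, retry_after_seconds) for a request at time `now`."""
        # Drop accepted requests that have slid out of the trailing window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return 200, None
        # The oldest accepted request leaves the window at timestamps[0] + window.
        retry_after = math.ceil(self.timestamps[0] + self.window - now)
        return 429, max(retry_after, 1)


limiter = SlidingWindowLimiter(limit=2, window=1.0)
results = [limiter.check(t) for t in (0.0, 0.1, 0.2, 1.05)]
```

In this run the third request arrives while two accepted requests are still inside the one-second window, so it is rejected with a 429 and a Retry-After of one second. The fourth is accepted because the first request has aged out by then.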

This type of rate-limit is the most common one we see. Examples of APIs that implement it include Kloudless’ own multi-tenant API, Dropbox, Google Drive, SharePoint Online, Egnyte, and several others. Kloudless unifies the Retry-After header returned by each API. Kloudless also backs off exponentially and retries requests for up to thirty seconds, based on each specific API’s recommendations and best practices.
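On the client side, a retry loop along these lines is one way to combine the Retry-After hint with exponential back-off. This is only a sketch and not Kloudless’ actual implementation: call_with_backoff, the send callable (assumed to return a status code and a dict of headers), and the base delay of half a second are hypothetical. The thirty-second budget mirrors the figure mentioned above.

```python
import random
import time


def call_with_backoff(send, max_total_wait=30.0, base_delay=0.5,
                      sleep=time.sleep, jitter=random.random):
    """Call `send()` until it stops returning 429 or the wait budget runs out.

    `send` is a hypothetical callable returning (status_code, headers_dict).
    """
    waited = 0.0
    attempt = 0
    while True:
        status, headers = send()
        if status != 429:
            return status, headers
        # Prefer the server's Retry-After hint when it is given in seconds.
        # (The HTTP-date form of the header is not handled in this sketch.)
        retry_after = headers.get("Retry-After")
        if retry_after is not None and retry_after.isdigit():
            delay = float(retry_after)
        else:
            # Otherwise double the delay on each attempt and add some jitter.
            delay = base_delay * (2 ** attempt) + jitter() * base_delay
        if waited + delay > max_total_wait:
            return status, headers  # give up and surface the 429 to the caller
        sleep(delay)
        waited += delay
        attempt += 1
```

Passing sleep and jitter as parameters keeps the function easy to test without real waiting. In production code the defaults (time.sleep and random.random) would be used.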

For example, SharePoint Online recommends a specific User-Agent format and certain headers in API requests to ensure more favorable rate limit thresholds. SharePoint also varies the exact rate limit based on the tenant being accessed to ensure tenants with a larger number of users have higher rate limits overall. Others such as Google Drive (pictured below) include per-user as well as per-developer rate limits.

Quotas

Rate-limits generally handle spikes in traffic well over short time intervals. However, an API sometimes also needs to regulate the total number of requests over much longer time intervals, such as an hour, a day, or even a month. In these scenarios, the API is effectively providing a quota of usage over that time period.

Quotas complement rate-limits by allowing rate-limits to be set higher. Otherwise, an API service might not be able to sustain a constant level of requests near the rate-limit threshold from an ever-increasing number of applications. Providing a quota ensures that applications are permitted to occasionally burst to high levels of usage, but not to maintain that level.
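The interplay between the two limits can be sketched as follows. This is an illustrative example only: the RateLimitWithQuota class, its fixed one-second and one-day windows, and the tiny limits used in the demo (5 per second, 12 per day) are assumptions, not the behavior of any specific API.

```python
class RateLimitWithQuota:
    """Per-second rate limit plus a per-day quota, using simple fixed windows."""

    def __init__(self, per_second, per_day):
        self.per_second = per_second
        self.per_day = per_day
        self.second_bucket = None  # the second the short counter belongs to
        self.second_count = 0
        self.day_bucket = None     # the day the quota counter belongs to
        self.day_count = 0

    def allow(self, now):
        """Return 'ok', 'rate_limited', or 'quota_exceeded' at time `now` (seconds)."""
        second, day = int(now), int(now // 86400)
        # Reset each counter when its window rolls over.
        if second != self.second_bucket:
            self.second_bucket, self.second_count = second, 0
        if day != self.day_bucket:
            self.day_bucket, self.day_count = day, 0
        if self.day_count >= self.per_day:
            return "quota_exceeded"
        if self.second_count >= self.per_second:
            return "rate_limited"
        self.second_count += 1
        self.day_count += 1
        return "ok"


limiter = RateLimitWithQuota(per_second=5, per_day=12)
# Six requests in second 0, five in second 1, three in second 2.
outcomes = [limiter.allow(t) for t in [0.0] * 6 + [1.0] * 5 + [2.0] * 3]
```

The client can burst to five requests in a second, and only the sixth in that second is rate-limited. Sustaining that pace uses up the daily quota within three seconds, after which requests are refused until the next day.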

By tenant

Salesforce caps the number of requests per customer based on the customer’s Salesforce edition and number of licenses. Note that this cap is placed on the tenant’s total use of API requests, rather than on a specific developer application. This means that a misbehaving application that exhausts a tenant’s daily quota could cause all of the tenant’s other integrations to temporarily fail as well. The impact therefore extends beyond a single developer application to the customer’s overall use of the API provider. Kloudless adopts stricter internal rate-limits when accessing Salesforce for this reason, especially when performing repetitive actions such as polling for changes in Salesforce.
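One simple way an application could keep its polling within a shared tenant quota is to derive a minimum polling interval from a self-imposed budget. The sketch below is an assumption for illustration and not how Kloudless or Salesforce computes anything: min_poll_interval, the 10,000-request daily quota, and the 25% budget are all hypothetical values.

```python
SECONDS_PER_DAY = 86400


def min_poll_interval(daily_quota, budget_fraction, requests_per_poll=1):
    """Seconds to wait between polls so that polling consumes at most
    `budget_fraction` of a tenant's daily request quota."""
    if not 0 < budget_fraction <= 1:
        raise ValueError("budget_fraction must be in (0, 1]")
    polls_per_day = (daily_quota * budget_fraction) / requests_per_poll
    return SECONDS_PER_DAY / polls_per_day


# Hypothetical tenant: 10,000 requests per day, of which we allow
# ourselves 25% for polling, with each poll costing one request.
interval = min_poll_interval(daily_quota=10_000, budget_fraction=0.25)
```

With these made-up numbers, the application would poll no more often than roughly once every 35 seconds, leaving the rest of the tenant’s quota for its other integrations.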

By developer application

As pictured earlier, some APIs like Google Drive also limit the overall number of API requests a developer application can perform across all users who have authorized access to the application. This can become a concern as an application gains wider adoption, with more users authorizing access to their data. APIs with these quotas usually allow developers to request an increase to the limit if justified.

By authorized user

This is the most flexible of the three quota types. It can be viewed as similar to a rate limit, but over a longer time interval. For example, the Egnyte API defaults to a limit of 1000 requests per authorized user per day. This is in addition to its rate-limit of 2 requests per second. Both limits can be increased by contacting Egnyte, but the defaults provide sufficient room to develop and test an application accessing the Egnyte API.

Since these limits apply per individual user, running into the limit for one user does not affect your app’s ability to make requests to other users’ accounts, nor does it impact a tenant’s other integrations as described in the scenarios above.
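As a rough illustration, the Egnyte defaults quoted above imply that a single user’s daily quota would last only a few minutes at the maximum sustained rate. The sketch below works through that arithmetic and shows per-user isolation. The PerUserQuota class and the user names are hypothetical and are not part of the Egnyte API.

```python
from collections import defaultdict

DAILY_QUOTA_PER_USER = 1000  # Egnyte default quoted above
RATE_LIMIT_PER_SECOND = 2    # Egnyte default quoted above

# Time to exhaust one user's daily quota at the maximum sustained rate.
seconds_to_exhaust = DAILY_QUOTA_PER_USER / RATE_LIMIT_PER_SECOND  # 500 seconds


class PerUserQuota:
    """Hypothetical client-side tracker of requests made per user per day."""

    def __init__(self, daily_quota):
        self.daily_quota = daily_quota
        self.used = defaultdict(int)  # (user_id, day) -> requests made

    def try_consume(self, user_id, day):
        """Record one request for `user_id` on `day`; False if quota is spent."""
        key = (user_id, day)
        if self.used[key] >= self.daily_quota:
            return False
        self.used[key] += 1
        return True


quota = PerUserQuota(DAILY_QUOTA_PER_USER)
alice_results = [quota.try_consume("alice", day=0) for _ in range(1001)]
bob_result = quota.try_consume("bob", day=0)
```

Here the 1001st request for "alice" is refused, while "bob" is unaffected because each user has a separate counter.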

Summary

We see a wide variety in how APIs ensure fair access to resources, even within a specific type of algorithm. Here are some helpful tips to keep in mind for applications working with APIs:

Implement exponential back-off to retry requests that are rate-limited.
Work with API providers to ensure your application has a high enough quota and is well-designed.

To learn more about how Kloudless can help your application when it comes to handling rate-limits, go check out our new official guide to API integration now!
