How to Handle Timeout Errors from Third Party APIs

Daniel Pericich
10 min read · Jan 20, 2023

Working with third party APIs can be tricky. Dealing with any system that is not under your direct control can lead to issues. Whether these issues are caused by a lack of your understanding of the system due to poor or missing documentation, or from flaky endpoints that return inconsistent responses, there’s a lot that can go wrong.

One issue that can be especially frustrating is dealing with timeout errors. These errors occur for a number of reasons, but can be flaky (inconsistently occurring), and depend as much on network conditions to and from the API as how the third party API was constructed.

While the root causes of Timeout errors aren’t complicated, how to handle HTTP requests resulting in them can be. Depending on whether your action modifies data on your server, or the API’s server, it is important that you account for Timeout errors very carefully.

In this article I want to walk through what causes Timeout errors as well as different HTTP method scenarios. With these scenarios we will explore how to protect your servers’ data integrity as well as the data integrity of your target API service’s database.

What is a Timeout Error?

Have you ever clicked on a button to download a file, or submit a form, and watched as the progress indicator endlessly spins or the bar freezes? Maybe you’ve had the misfortune of having Timeout errors cause your page to crash and be replaced with an ominous 500 server error.

Figure 1. Image of 500 timeout error (courtesy of Kinsta cloud hosting).

If you’ve seen this behavior then you’ve probably experienced a Timeout error. What is a Timeout error? This type of error is caused by services taking too long to finish their requested action.

Timeout errors are caused by actions exceeding a set timeout threshold. The range of a timeout threshold could be seconds, or even minutes, and varies by application and requested action. Whatever the timeout threshold is, once the server has attempted to handle a request for a given period it will terminate the action to prevent further blocking of the server, thus producing a Timeout error.
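
To make the idea of a threshold concrete, here is a minimal sketch of a caller enforcing its own timeout with Python's standard library. The local `SlowHandler` server is a hypothetical stand-in for a slow third party endpoint, and the half second delay and the `fetch` helper are assumptions made for illustration.

```python
import socket
import threading
import time
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class SlowHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a slow third party endpoint."""

    def do_GET(self):
        time.sleep(0.5)  # pretend the work takes half a second
        try:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"done")
        except OSError:
            pass  # the caller may have given up already

    def log_message(self, format, *args):
        pass  # keep the example quiet


def start_slow_server():
    """Start the fake service on a free local port and return (server, url)."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_port}/"


# Ignore proxy settings so the request goes straight to the local server.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch(url, timeout_seconds):
    """Return the response body, or None if the threshold was exceeded."""
    try:
        with _opener.open(url, timeout=timeout_seconds) as response:
            return response.read()
    except socket.timeout:
        return None  # timed out while waiting for the response
    except urllib.error.URLError as error:
        if isinstance(error.reason, socket.timeout):
            return None  # timed out while connecting or sending
        raise
```

With a threshold shorter than the simulated delay, `fetch` gives up and returns `None`. With a longer one it returns the body. Note that the slow server may still finish its work after the caller has stopped waiting.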

What causes these errors to occur? The two most common causes are a call that involves too much data, and a network or service issue that cuts off communication with the server. We will explore these causes more in the next two sections.

How to Handle too Much Data

The first of the two issues is straightforward to fix. If you know the limit of how much data an endpoint can handle, whether accepting data to create records or retrieving records or data, then you can break up your large query into smaller queries to stay within the bounds of the timeout limit.

A common way to do this is to batch GET or POST requests. Batching usually means processing many records at once, but it can also mean segmenting a large set of records into smaller subsets. To decrease the likelihood of a timeout for your app’s request, you can have your action accept a large number of records, then iterate over the collection to segment it and send acceptably sized batches to the service:

Figure 2. Large POST call being batched into smaller POST calls.
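
The batching idea can be sketched in a few lines of Python. This is only an illustration: `send_batch` is a hypothetical callable standing in for whatever function performs the real POST, and the batch size would come from the API's documented limit or your own testing.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(records: Sequence[dict], batch_size: int) -> Iterator[Sequence[dict]]:
    """Yield consecutive slices of at most batch_size records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def post_in_batches(
    records: Sequence[dict],
    batch_size: int,
    send_batch: Callable[[List[dict]], object],
) -> List[object]:
    """Send the records one batch at a time and collect the responses.

    send_batch is a placeholder for the real POST call (an assumption).
    """
    return [send_batch(list(batch)) for batch in chunked(records, batch_size)]
```

Seven records with a batch size of three would go out as three calls carrying three, three and one records.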

Good API documentation should include request limits, but sometimes you will not be given an explicit bound. If this happens you may have to do trial and error in development to determine an expected threshold. From this expected threshold you may want to slightly decrease your request size to ensure the user doesn’t get server errors when using your app to request records.

What about Network Issues?

This is where dealing with timeouts gets interesting. While data overloads are quantifiable and permanently solvable, network issues are unpredictable. Maybe the server you are talking to loses power. Or the server is taken down mid request for scheduled maintenance. These are a few of the many things that may cause a blip in service or a full outage of the server you want to work with.

If the server you are talking to loses connection, your server or client may be left waiting for a response until it times out and reports a failed request. Though the causes can be many, the result will usually be an incomplete request. The impact of this incomplete request is determined by whether the request is idempotent or non-idempotent.

Idempotent vs. Non-Idempotent Requests

What is an idempotent request? An idempotent request is one that can be sent any number of times and leave the server in the same state as sending it once. For HTTP methods, GET, PUT and DELETE are defined as idempotent, and PATCH can be idempotent depending on how the API implements it. GET requests are the most straightforward as they just ask for and receive some information. Unless the resource changes between GET requests, the response should be identical.

PUT, PATCH and DELETE can also be idempotent as they require a record id to act on and once they’ve been called, subsequent calls should have identical responses. For these three methods, the first call will modify the record and any further repeat calls should return some form of success response or no action taken response (or in the case of DELETE a no existing record response).

Figure 3. Idempotent GET request.
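
As a rough illustration of idempotent writes, the sketch below uses a hypothetical in-memory `RecordStore` in place of a real API. The class, its methods and the status codes it returns are assumptions chosen for the example.

```python
class RecordStore:
    """Hypothetical in-memory stand-in for an API's records, keyed by id."""

    def __init__(self):
        self.records = {}

    def get(self, record_id):
        """Read a record without changing anything."""
        return self.records.get(record_id)

    def put(self, record_id, attributes):
        """Replace the record with this id; repeating the call changes nothing more."""
        self.records[record_id] = dict(attributes)
        return 200

    def delete(self, record_id):
        """Remove the record; a repeat call finds nothing left to remove."""
        if record_id in self.records:
            del self.records[record_id]
            return 204
        return 404
```

Calling `put` twice with the same id and attributes leaves one identical record. Calling `delete` twice returns a different status the second time, but the stored data ends up the same, which is what makes a retry after a timeout safe.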

Now that we have talked about idempotent requests, let’s look at a non-idempotent request. These are calls that have a different result on every call. POST calls are the most common non-idempotent requests as POST requests usually trigger the creation of a brand new record. Even if the body of the request is identical, every POST call can succeed and create another new record.

This is true most of the time, though some systems will have create action validations that look at specific record attributes to prevent duplicate records from being created. When hitting these types of systems you may get a 2xx status code (such as 200 or 201) for a successful record creation, or a 4xx status code (such as 400 or 409) for a client side error on an incorrect record entry (duplicate record).

Figure 4. Non-idempotent POST call creating duplicate user records.
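
The difference can be sketched with a hypothetical in-memory `UserApi`. Nothing here is a real service: the class, the `unique_emails` switch and the choice of email as the attribute being validated are assumptions, and the 200 and 400 codes simply mirror the ones mentioned above.

```python
import itertools


class UserApi:
    """Hypothetical in-memory stand-in for a user creation endpoint."""

    def __init__(self, unique_emails=False):
        self.users = {}
        self.unique_emails = unique_emails  # turn the create validation on or off
        self._next_id = itertools.count(1)

    def post(self, attributes):
        """Create a user and return (status_code, new_id_or_None)."""
        if self.unique_emails and any(
            user["email"] == attributes["email"] for user in self.users.values()
        ):
            return 400, None  # duplicate rejected by the create validation
        user_id = next(self._next_id)
        self.users[user_id] = dict(attributes)
        return 200, user_id
```

Without the validation, three identical `post` calls create three users. With it, only the first call creates a record and the repeats are rejected.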

Making 3rd Party Non-Idempotent Calls

Non-Idempotent calls can ruin the integrity of our database by causing duplicate records or duplicate actions on records’ attributes. Imagine making a call to charge a bank account, but doing it three times. Your customer would be pretty angry if their coffee ended up costing $30. There are a number of tools that we can use to prevent non-idempotent calls from causing issues with our 3rd party servers:

Figure 5. Timeout error causing 3 duplicate charges for a coffee enthusiast.

Most SQL databases include a database operation wrapper called a transaction. This wrapper goes around a set of database reads or writes and allows us to roll back all the actions if a set condition is not met. This is a safety feature that allows us to prevent partial operations from being persisted when only a full operation can be accepted (think about a transfer between 2 bank accounts where we only deduct money from the first account, but never credit the receiver).
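
Here is a minimal sketch of that bank transfer using Python's built in `sqlite3` module, where a connection used as a context manager commits on success and rolls back if an exception escapes. The `accounts` table, the starting balances and the helper names are assumptions made for the example.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two example accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    with conn:
        conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
    return conn


def balances(conn):
    """Return {account_id: balance} for every account."""
    return dict(conn.execute("SELECT id, balance FROM accounts"))


def transfer(conn, from_id, to_id, amount):
    """Move money between two accounts, all or nothing."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?",
            (amount, from_id),
        )
        credited = conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (amount, to_id),
        )
        if credited.rowcount != 1:
            # The receiver does not exist, so the deduction must not persist.
            raise LookupError(f"no account with id {to_id}")
```

A transfer to a missing account raises an error after the deduction has run, and the rollback restores the sender's balance.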

While our 3rd party API call is not part of our SQL database, its operations can be linked to our database and should therefore be grouped with our database updates. Different back end frameworks have different ways to handle this, but in Rails the way to do this is to wrap all related code in a transaction block. This block will check for any errors thrown within its execution and roll back changes to our database if an error occurs.

We have a tool to roll back our changes if any code in our transaction block throws an error, but how do we account for the 3rd party API updates? There are a few things we can do if we are worried about partial operations occurring when we update 3rd party APIs but not our records.

In theory we could have a call to the 3rd party API to check for the existence of any record we created or a certain record state before every update we push. This sounds nice but comes with a few issues. First, the checks would slow down all of our operations. This would at least double the time for every call we make, weighing down our servers and the 3rd party servers with extra calls. This extra operation time would also make the user experience poor.

Second, only knowing the current API state would make it impossible to check whether that state includes our operation. If we are checking that a bank account is at $8 after we credit it $3, it is going to be difficult to prove this with certainty. Maybe in the time after we experienced a timeout error, the user spent $5, got credited $10 and spent another $2. We’re still at $8 from our starting point of $5, but the way in which we got there is entirely different. Again, we can’t trust current state as an indicator of prior actions.
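
The arithmetic in this example can be checked with a short sketch. The helper name and the two lists are illustrative assumptions that use the dollar amounts from the paragraph above.

```python
def final_balance(starting_balance, changes):
    """Apply a list of credits (positive) and debits (negative)."""
    return starting_balance + sum(changes)


STARTING_BALANCE = 5
OUR_CREDIT = 3

# History A: our $3 credit went through and nothing else happened.
credit_applied = [OUR_CREDIT]

# History B: our credit was lost to the timeout, but the user spent $5,
# was credited $10 and spent another $2 in the meantime.
credit_lost = [-5, 10, -2]

print(final_balance(STARTING_BALANCE, credit_applied))  # 8
print(final_balance(STARTING_BALANCE, credit_lost))     # 8
```

Both histories end at $8, so the balance alone cannot tell us whether our credit was applied.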

If checking is only as good as guessing for verifying prior actions then what we really need to do is log all of our actions. Logs are very useful for getting a full picture of what is happening with transactions that aren’t transparent to users. In this case we can log both successes and failures and include the error messages with the failures to make follow up easier.

Again, the timeout failures we are checking for are flaky so we aren’t going to know what caused them or when to expect them. If a call goes through, great. We can log a success message with the ids for any new records. Failures will be a little more complicated.

Order Matters with Transactions

When accounting for timeout errors with our failures there are a few things we’ll want to do. First we want to separate our database manipulation code from the third party API code as much as possible. All of our own database changes can be rolled back, and as long as the 3rd party API call does not depend directly on our database manipulation code, we can keep the operations on it separate, which makes them easier to find and reason about.

Now that we have our code and the 3rd party API code separated in our transaction block, we need to determine which should come first. There are two options: our code runs first and then the 3rd party API code, or the 3rd party API modifying code runs first and then our own code.

At first I had strong feelings on how to order this, but both orderings have their pros and cons. If you put your code first and it fails, you can roll back your own changes and never touch the 3rd party API. If your code runs and the 3rd party API fails, you can still roll back your code, but have to worry about 3rd party API changes that persisted.

If you run the 3rd party API operations before your code then you don’t even need to roll back your code if the 3rd party API operations fail. If the 3rd party operations succeed then you can run your code, but if it fails then all of your code will roll back, but the 3rd party API operations will persist.

Both orderings leave the chance that you will persist a partial change to the 3rd party API, but I would argue that the first method is better. I would rather have my code fail and prevent calls to the 3rd party API than have a failed transaction from my code after successfully working on the 3rd party API. It is easier for me to react to failures in my system than to track down partial or full modifications on someone else’s system.
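
The first ordering, our own writes followed by the 3rd party call, might be sketched like this. It uses Python's `sqlite3` rather than Rails, and the `orders` table, the `charge_customer` callable and the `ThirdPartyTimeout` error are hypothetical stand-ins.

```python
import sqlite3


class ThirdPartyTimeout(Exception):
    """Hypothetical error raised when the 3rd party call times out."""


def make_db():
    """Create an in-memory database with an empty orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT NOT NULL)"
    )
    return conn


def order_count(conn):
    """Return how many orders have been persisted."""
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]


def place_order(conn, item, charge_customer):
    """Run our own writes first, then the 3rd party call, in one transaction."""
    with conn:  # rolls back our writes if anything below raises
        # Our code first: if this fails, the 3rd party API is never called.
        conn.execute("INSERT INTO orders (item) VALUES (?)", (item,))
        # 3rd party call last: if it raises, our insert is rolled back.
        # A timeout here does not tell us whether the charge went through,
        # so the failure still needs to be logged and followed up.
        charge_customer(item)
```

If the insert fails, `charge_customer` never runs. If `charge_customer` raises, the order row disappears with the rollback, though the rollback cannot undo anything the 3rd party already persisted.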

Whichever way you decide to go, you will want to have descriptive logs for both successful and failing calls. If the calls fail on your end your logs should include the endpoint you were calling, records touched and the error thrown. Depending on what the 3rd party API exposes, you will want to include the method you are calling on the API, the parameters passed, any records that were created or updated and the error returned by the API. This sounds like a lot, but the more information the easier it will be for you and your business team to troubleshoot issues with data syncs and partial records.
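
One possible shape for such a log entry is sketched below with Python's standard `logging` and `json` modules. The logger name, the field names and the `log_api_call` helper are assumptions for illustration, not a prescribed format.

```python
import json
import logging

logger = logging.getLogger("third_party_sync")  # hypothetical logger name


def log_api_call(method, endpoint, params, record_ids, error=None):
    """Log one 3rd party call as a JSON line and return the logged entry."""
    entry = {
        "method": method,          # method called on the API
        "endpoint": endpoint,      # endpoint we were calling
        "params": params,          # parameters passed
        "record_ids": record_ids,  # records created or updated, if known
        "status": "success" if error is None else "failure",
        "error": None if error is None else f"{type(error).__name__}: {error}",
    }
    message = json.dumps(entry, sort_keys=True)
    if error is None:
        logger.info(message)
    else:
        logger.error(message)
    return entry
```

A failed call would be logged with something like `log_api_call("POST", "/charges", {"amount": 3}, [], error=TimeoutError("timed out"))`, giving whoever follows up the call, its parameters and the error in one place.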

Conclusion

Data integrity is the most important part of today’s apps. If the data gets out of sync or corrupted by bad transactions, the fallout could be anything from expensive coffee to failing medical equipment. Timeout errors can wreak havoc on our systems, but there are steps you can take to reduce the impact they make and the time it takes to resync your data. I hope this article helped you better understand how to fix these problems.

Notes

https://www.replicon.com/help/avoiding-api-timeouts/#:~:text=Timeouts%20are%20typically%20caused%20by,is%20a%20network%2Fservice%20issue

https://softwareengineering.stackexchange.com/questions/396216/how-to-handle-database-errors-after-a-successful-3rd-party-payments-api-response

https://stackoverflow.com/questions/48824333/how-to-deal-with-internet-drops-and-timeout-errors-with-third-party-apis
