ROBLOXCRITICAL Data Loss from 502: HTTP 403 Error

We are still receiving reports of players losing data. We are seeing the following error occur nearly every day now, sometimes for up to 45 minutes at a time, and we strongly believe it is related:



(Screenshots from the last two days; none so far today. We have been seeing massive spikes of these errors since June 3rd.)

We have our own backup server to handle data; however, it seems that there are times when both HttpService and the DataStore service are unavailable, preventing requests from going out and ultimately causing players to lose their data.
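
For reference, the pattern is roughly this: try the DataStore write first, and if it fails, send the data to our backup server over HttpService. The sketch below is simplified, and the endpoint URL and payload layout are placeholders rather than our real setup:

-- Simplified sketch of the DataStore-first, backup-server-second pattern described
-- above; the URL and payload layout are placeholders, not our real configuration.
local DataStoreService = game:GetService("DataStoreService")
local HttpService = game:GetService("HttpService")

local playerStore = DataStoreService:GetDataStore("PlayerData")
local BACKUP_URL = "https://example.com/backup" -- hypothetical backup endpoint

local function saveWithFallback(key, data)
    local ok, err = pcall(function()
        playerStore:SetAsync(key, data)
    end)
    if ok then
        return true
    end
    -- DataStore failed; send the data to the backup server over HttpService instead.
    local backupOk = pcall(function()
        HttpService:PostAsync(BACKUP_URL, HttpService:JSONEncode({
            key = key,
            data = data,
            reason = tostring(err),
        }))
    end)
    return backupOk
end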

We track every DataStore failure and the reason the request failed (including this particular 502 error). We also track separately, via Google Analytics, whether we ever exceed the maximum number of DataStore requests that can go out per game server. We make sure we never hit that cap and never save the same key more than once within 5 seconds. We retry all requests 5 times (both HTTP requests to our backend and DataStore requests to Roblox’s servers), and we are still receiving reports.
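
For context, here is a simplified sketch of the retry-and-report loop described above; reportFailure stands in for our actual analytics call, and the 6-second delay keeps the same key from being saved twice within 5 seconds:

-- Sketch of the retry wrapper described above; reportFailure is a placeholder
-- for the analytics call that records why each request failed.
local MAX_RETRIES = 5
local RETRY_DELAY = 6 -- seconds, so the same key is never saved twice within 5 seconds

local function reportFailure(key, reason)
    -- placeholder: forward the failure reason to analytics here
    warn("DataStore save failed for " .. tostring(key) .. ": " .. reason)
end

local function saveWithRetries(store, key, data)
    for attempt = 1, MAX_RETRIES do
        local ok, err = pcall(function()
            store:SetAsync(key, data)
        end)
        if ok then
            return true
        end
        reportFailure(key, tostring(err))
        if attempt < MAX_RETRIES then
            wait(RETRY_DELAY)
        end
    end
    return false
end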

Something really needs to be done about this; these spikes shouldn’t be happening at all. Both HttpService and the DataStore service need to be more reliable.

We are also open to the idea of an engineer looking at our code if needed.

Thank you very much for your time.
The Royale High Team,
ice7, Ironclaw33, & callmehbob

16 Likes

We will investigate this and follow up with you. In the meantime, could you add 3 retries to this call with a pause in between? This will increase the odds of success.

5 Likes

I checked my DataStore analytics for Phantom Forces, and only had 35 errors in total over the last 48 hours. Nothing concerning from our end.

4 Likes

Thanks Unholy, much appreciated. We do retry 5 times, with 6-second pauses in between (so that the same key does not get called twice within a period of 5 seconds).

Thanks @Semaphorism for the info!

Ice

2 Likes

Could it be the amount and/or type of data? I doubt it but that could be a factor.

1 Like

I recommend an exponential backoff, so the waits between attempts are 2, 4, 8, 16 seconds, and so on. This can also help when too many requests are trying at once. This was recommended to me previously.

If you’re unsure, something along the lines of:

local attempts = 0
local successful = false

repeat
    -- attempt the datastore request; pcall captures whether it succeeded
    successful = pcall(function()
        -- do datastore request here
    end)

    if not successful then
        attempts = attempts + 1
        wait(2 ^ attempts) -- exponential backoff: wait 2, 4, 8, 16, ... seconds
    end
until successful or attempts >= 8
8 Likes

Hey that’s brilliant! We will implement that. Thank you =)