UpdateAsync very rarely calls the transformFunction twice on different versions of stored data

ZolarKeth · March 25, 2021, 8:45pm

Issue Type: Other
Impact: High
Frequency: Very Rare
Date First Experienced: 2021-03-16 00:03:00 (-04:00)
Date Last Experienced: 2021-03-25 00:03:00 (-04:00)

Reproduction Steps:
This is an incredibly rare but potentially severe bug which has no reliable reproduction that I have found.

Expected Behavior:
UpdateAsync should only call the transformFunction multiple times if data is being written to a key simultaneously from multiple sources.

Actual Behavior:
I am aware that UpdateAsync can call the transformFunction multiple times if the data is being written to from multiple sources simultaneously; that is NOT what I am attempting to describe here.

When saving player data, I detect if it is “stale” by comparing against a save ID which increments by one each time the player’s data is saved. In the very rare situation that the save ID does not match what is expected, the player is kicked from the game. Over the past few days, my players have been reporting an issue where they will go into a “stale data loop”; about one player per day will have an issue where they continually get kicked for stale data every time they join the game.
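The reporter's actual save code was not posted, so the following is only a minimal sketch of this kind of save-ID check. The names makeSaveTransform, expectedSaveId, and onStale are hypothetical, and the stored data is assumed to be a table with a numeric SaveId field. The sketch relies on one documented behaviour: returning nil from a transformFunction cancels the UpdateAsync write.

```lua
-- Hypothetical sketch (not the reporter's code): builds a transformFunction
-- that refuses to save when the stored SaveId is not the one this server expects.
local function makeSaveTransform(expectedSaveId, newData, onStale)
	return function(oldData)
		-- A brand-new key has no stored data yet; treat that as SaveId 0.
		local storedSaveId = oldData and oldData.SaveId or 0
		if storedSaveId ~= expectedSaveId then
			-- Someone else wrote in between: report it and return nil,
			-- which makes UpdateAsync skip the write entirely.
			onStale(storedSaveId)
			return nil
		end
		-- Bump the save ID so the next save can detect stale data.
		newData.SaveId = expectedSaveId + 1
		return newData
	end
end

-- In a Roblox server script this would be used roughly like:
--   dataStore:UpdateAsync(key, makeSaveTransform(lastSaveId, data, function()
--       player:Kick("Stale data")
--   end))
```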

I managed to catch it while it was happening this time, and figuring out what was going on was a confusing mess. However, I came to the conclusion that every time the game would call UpdateAsync on this player’s key, it would call the transformFunction twice. I believe it was calling it on two different versions of the data; first it would call it on the version that GetAsync returned, and then it would call it using a “hidden” version that I had no way of reliably accessing. This “hidden” version of the data had an unexpected save ID which is what caused the stale data loop to occur.

While trying to debug in my live game, I decided to call UpdateAsync on the user’s data to see if I could get it to show both versions of the data. I first tried returning nil from the transformFunction, so that no data would be overwritten:

```lua
-- Returning nil from the transformFunction cancels the write
dataStore:UpdateAsync(key, function(oldData)
	print(oldData.SaveId)
	return nil
end)
```

but this only printed the expected SaveId. Then I tried returning oldData itself from the function:

```lua
-- Returning oldData unchanged writes the same value back to the key
dataStore:UpdateAsync(key, function(oldData)
	print("hello")
	return oldData
end)
```

I do not know what possessed me to only print hello, but that led to the following result in the server output:

[Screenshot of server output: “hello” printed twice (please ignore the timestamps)]

which, as you can see, called the transformFunction twice, though there is no way to tell which data each call received. Unfortunately, this call to UpdateAsync fixed the issue, and all subsequent calls on the key have behaved as expected, so I was unable to find out which stored data was used in each call.
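If the loop comes back, a variant of the second snippet that records the SaveId each call receives, rather than only printing “hello”, would show which stored versions the transformFunction is being given. This is a minimal sketch, assuming the stored table has a SaveId field; makeProbe is a hypothetical helper name. Like the second snippet, it writes the stored value back unchanged.

```lua
-- Hypothetical diagnostic: returns a transformFunction that records every
-- invocation, plus the table holding those records for later inspection.
local function makeProbe()
	local calls = {}
	local function transform(oldData)
		local saveId = oldData and oldData.SaveId
		-- Store a record table so a nil SaveId still counts as a call.
		calls[#calls + 1] = { saveId = saveId }
		print(("transformFunction call #%d, SaveId = %s"):format(#calls, tostring(saveId)))
		-- Write the stored value back unchanged, as in the second snippet.
		return oldData
	end
	return transform, calls
end

-- Usage in a server script (dataStore and key as above):
--   local transform, calls = makeProbe()
--   dataStore:UpdateAsync(key, transform)
--   -- More than one entry in calls, with different SaveIds, would confirm
--   -- that two stored versions were being passed in.
```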

A very confusing issue with not too much to go off of, I know, but if there were any changes made to DataStoreService in the past ~week, they could be responsible for this issue. I tried as hard as I possibly could to rule out any fault on my end, though there always remains the possibility that something I am doing is the culprit.

In addition, this is probably somewhat ephemeral. All other instances of stale data loops in my game have cleared up after a short period of time (less than an hour) without my intervention.

Workaround:

OuterspaceNemo · March 31, 2021, 6:31pm

Thanks for the report! We’ve filed a ticket to our internal database and we’ll follow up when we have an update for you.

buffyreal123 · August 19, 2022, 8:10pm

Is this issue still happening? We have made various fixes, but since it’s hard to reproduce, please let us know if it happens again.

CarefreeCarrot · May 7, 2024, 9:40pm

From my understanding, UpdateAsync() is just a get, set, get in a trenchcoat.
The first get grabs the value, the callback runs, and the set writes the new value and remembers it. If the final get returns a value that differs from the remembered one, some change has occurred in between, so the callback is run again.
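This mental model can be sketched in plain Lua. It is the poster's model, not Roblox's documented implementation: store is an in-memory stand-in for the DataStore, and modelUpdate and the interfere hook are hypothetical names used only to simulate another server writing at the critical moment.

```lua
-- Sketch of the get-set-get model of UpdateAsync() described above.
-- NOT Roblox's actual implementation; store stands in for the DataStore.
local store = { value = 1 }

-- interfere is an optional hook, run between the set and the final get,
-- that simulates another server writing to the same key at that moment.
local function modelUpdate(transform, interfere)
	local attempts = 0
	while true do
		attempts = attempts + 1
		local current = store.value          -- first get
		local written = transform(current)   -- run the callback
		store.value = written                -- set
		if interfere then interfere(attempts) end
		if store.value == written then       -- final get matches what we wrote
			return written, attempts
		end
		-- Another write slipped in after our set: run the callback again.
	end
end

local value, attempts = modelUpdate(function(v) return v + 1 end)
print(value, attempts)
```

Because the set happens before the check in this model, a mismatch cannot undo the earlier write; it can only trigger another pass of the callback.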

However, a rare edge case occurs when the get, set, get are processed in a particular staggered way between two update calls:

Say the value associated with a key is 1, and two servers call UpdateAsync() on it at almost the same instant:
Server 1 get_initial grabs the value 1 and remembers it
Server 1 set changes the value to 2 and remembers it
Server 2 get_initial grabs the value 2 and remembers it
Server 2 set changes the value to 3 and remembers it
Server 1 get_final grabs the value 3 and sees that 3 does not equal what it remembered, 2. This forces it to run the callback again.
Server 1 get_initial grabs the value 3 and remembers it
Server 1 set changes the value to 4 and remembers it
Server 2 get_final grabs the value 4 and sees that 4 does not equal what it remembered, 3. This forces it to run the callback again.

As you can see, if the second call happens to run right after the first call has set a new value but before its final get, the second call makes a change that forces the first call to repeat. This can cause an arbitrary number of repeats, until latency pushes the two servers slightly out of this exact rhythm.
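The trace above can be replayed with two coroutines sharing one in-memory key. This is a sketch of the poster's get-set-get model, not of DataStoreService itself; each coroutine yields right after its set, and the hand-written resume order (a hypothetical schedule) keeps the servers in lock step until the doubled "S1" entry stands in for latency breaking the rhythm.

```lua
-- Replays the lock-step interleaving from the trace using coroutines.
local store = { value = 1 }
local log = {}

-- Each server runs its own update loop and yields right after its set,
-- which is the point where the other server's steps can slip in.
local function server(name)
	return coroutine.create(function()
		local runs = 0
		while true do
			runs = runs + 1
			local initial = store.value      -- get_initial
			local written = initial + 1      -- the callback: increment the value
			store.value = written            -- set
			log[#log + 1] = ("%s read %d, wrote %d"):format(name, initial, written)
			coroutine.yield()                -- the other server may run here
			local final = store.value        -- get_final
			if final == written then
				return runs                  -- nothing changed: done
			end
			log[#log + 1] = ("%s saw %d instead of %d, running callback again"):format(name, final, written)
		end
	end)
end

local servers = { S1 = server("Server 1"), S2 = server("Server 2") }
local runsBy = {}

-- Strict alternation keeps both servers in lock step; the repeated "S1"
-- lets Server 1 finish, which finally breaks the cycle.
local order = { "S1", "S2", "S1", "S2", "S1", "S2", "S1", "S1", "S2", "S2" }
for _, id in ipairs(order) do
	local co = servers[id]
	if coroutine.status(co) ~= "dead" then
		local ok, runs = coroutine.resume(co)
		assert(ok, runs)
		if coroutine.status(co) == "dead" then runsBy[id] = runs end
	end
end

for _, line in ipairs(log) do print(line) end
print(string.format("Server 1 ran its callback %d times, Server 2 %d times; final value %d",
	runsBy.S1, runsBy.S2, store.value))
```

In this model every pass performs a set, so each repeat also changes the stored value, not just the number of callback runs.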

None of this would happen if multiple UpdateAsync() requests for the same key were processed in sequence (queued), so that the get-set-get steps of simultaneous calls could not interleave, while regular get and set calls could still run in parallel (since they aren’t supposed to guarantee any correctness anyway).
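The queuing idea can be illustrated inside a single Lua state with a per-key lock: an update on a key may only start its get-set-get once the previous update on that key has finished. This is only a sketch of the ordering idea. Serializing calls that come from different game servers would have to happen inside DataStoreService itself, which this does not model, and locks, queuedUpdate, and callbackRuns are hypothetical names.

```lua
-- Sketch of per-key queuing: updates on the same key run one at a time.
local store = { Player_1 = 1 }
local locks = {}        -- key -> true while an update on that key is running
local callbackRuns = 0

local function queuedUpdate(key, transform)
	return coroutine.create(function()
		-- Wait our turn: yield until no other update holds this key.
		while locks[key] do coroutine.yield() end
		locks[key] = true
		local runs = 0
		while true do
			runs = runs + 1
			callbackRuns = callbackRuns + 1
			local written = transform(store[key])    -- get_initial + callback
			store[key] = written                     -- set
			coroutine.yield()                        -- same yield point as the trace
			if store[key] == written then break end  -- get_final
		end
		locks[key] = nil
		return runs
	end)
end

local updates = {
	queuedUpdate("Player_1", function(v) return v + 1 end),
	queuedUpdate("Player_1", function(v) return v + 1 end),
}
local results = {}

-- Alternate between the two updates, as in the lock-step trace above.
local pending = #updates
while pending > 0 do
	for i, co in ipairs(updates) do
		if coroutine.status(co) ~= "dead" then
			local ok, runs = coroutine.resume(co)
			assert(ok, runs)
			if coroutine.status(co) == "dead" then
				results[i] = runs
				pending = pending - 1
			end
		end
	end
end
print(("callback runs: %d, final value: %d"):format(callbackRuns, store.Player_1))
```

With the same alternating schedule, the second update simply waits, so each callback runs exactly once.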