Player object can still be retrieved through GetPlayerByUserId for a short interval after PlayerRemoving bound threads are run

In my game I track players like this

PlayerAdded → Get data, place client in a table
PlayerRemoving → remove client from table

Now there’s a race condition in the retrieval of data since datastore calls are yielding. This means that if the player leaves during the process of getting data, the PlayerRemoving event would fire before the PlayerAdded callback had finished, and before the client was added to the table.

Normally this isn’t a problem, but because of my datastore implementation with session locking, I need to account for this edge case and close the datastore session for the client if they leave in the process of retrieving data.

I do this by checking to see if the player still exists inside of Players after retrieving data:

local data = GetData(UserId)
if not Players:GetPlayerByUserId(UserId) then --player left while getting data
  --close session
end

However, for whatever reason, EVEN ONCE the player leaves, it still remains inside of Players for some interval of time, maybe less than a frame. This means that the detection line is saying the player is still in the game, and the function goes through, adding the client to the table at the end. However because the player isn’t actually in the game anymore the PlayerRemoving event had already fired before the PlayerAdded callback had finished, so there is a stale client inside of the table, and the datastore session for this player is frozen, preventing them from joining again.

Immediately after retrieving the data and doing the player existence check, I manually spawn in the player’s character and set the Player.Character property to it. (Everything after the retrieval of data is NON YIELDING so the race condition can only be coming from getting the data).

The error report logged:
Character cannot be changed as Player (playername) is being removed.
This means that

  1. Player Removing had already fired
  2. The server recognizes that the player had left and therefore locked the Character property
  3. The player still exists inside of the Players service and can be retrieved by GetPlayerByUserId

If the player existence check had detected that the player was no longer in the game, then it would trip a return nil statement and the function wouldn’t attempt to spawn the character


I suspect the root cause of this change in behavior is deferred events since this wasn’t always the case. I’m guessing that the removal of the player from the Players service registry is scheduled to occur at deferred point from Player Removing, and that this deferred point is MULTIPLE resumption points after Player Removing, such that it ends up after the Network Step (which is when threads calling async network functions like GetAsync, UpdateAsync, etc are resumed)

info from testing and microprofiler

task scheduler diagram

-The player removing event fires at the earliest point in the frame, before input resumption point, prerender, and replication receive(network step) → for some reason schedules the removal of a player from Players to occur 3 or more resumption points later, probably due to a chain of deferrals
(Player remove event ->defer to pre render, run some callback → defer another thread, run code → defer again, this callback finally removes the player reference from Players)

Because threads yielding to web calls like GetAsync are resumed at the replication receive (I don’t have a picture for this but you can recreate a simple test and open microprofiler to see this behavior), the code in my function is doing the Players:GetPlayerByUserId check at this point, and seeing that the player still exists, BEFORE the very, very deferred backend callback from Player Removing manages to clean up the reference in Players service

Expected behavior

When PlayerRemoving fires, the player object for that specific player should no longer be retrievable from Players:GetPlayerByUserId

3 Likes

Why not check if not Player.Parent to see instead of… this?

Player.Parent is still Players at the point at which Players:GetPlayerByUserId exists
edit: I misread what you said, my datastore module is separate from the player handling module and doesn’t hold references to player instances

I always understood the event as “Player is being removed”, which is also what the error says. Not that the player has been removed.

If the Player was already removed when PlayerRemoving fires, then how would there be a Player object that the event can pass to the bound function? Additionally, the Documentation says it fires “right before a Player leaves”.

1 Like

The player object does not die as long as you retain a strong reference to it. This means you can reference any properties, even if it is not parented to the datamodel. I expected Player removing to remove the player object from Players, and that has nothing to do with the object being “unable to be used”. Its simply a state update of one property of the object - you can still invoke methods like Destroy on the object for final cleanup, SetAttribute (no reason to do this tho) and all other methods without error, since a reference to the player object is passed by the PlayerRemoving event

--psuedocode, actual implementation probably in c++, and most likely NOT using events
--actual impl probably iterates through some structure at the beginning of every frame
RakNet.ClientDisconnected:Connect(function(LeavingUserId, IP, ...)
  local LeavingUser = Players.SomeInternalRegistry[LeavingUserId]
  Players.SomeInternalRegistry[LeavingUserId] = nil
  LeavingUser.Parent = nil
  -- removed from PlayersService but object does not die because reference exists in the LeavingUser

  --Expecting this to fire once the player is already parented to nil and removed from PlayersService
  for _, fn in PlayerRemovingHooks do task.defer(fn, LeavingUser) end

end)

What I’m saying is at that point in time, it should no longer be retrievable from the Players service. It’s a matter of the ordering of events - say you have some custom class and you create an OnRemoving callback hook for it whenever you delete class objects from a table so that it can do additional cleanup (i.e. maybe a NPC died). It would be pertinent to have the hard reference in the registry table be removed (i.e. registry[ObjectKey] = nil) prior to firing the hook, so that anything that occurs from that point forward, the callback, as well as any subsequent logic that relies on the removal can expect a clean separation from any dependencies that operate on the live objects inside the registry.

In the post I linked, what I’m describing is behavior that used to be the case (the Parent property of a player would already be nil by the time PlayerRemoving is fired). The bug report is just saying that this is not always the case.