Table.sort() lags when a table with 400+ items is inserted

Master_Melodic · August 24, 2021, 7:23pm

Hi, I have an equip best system which works fine until 400 pets are added and it causes lag. How would I rewrite my system so it won’t lag?

function Pet:EquipBest(plr)
	if not isServer then return end
	self:UnequipAll(plr)
	local pets = plr.Pets:GetChildren()

	table.sort(pets,
		function(a,b)
			return bNum.me(self:GetMultiplier(plr, a.PetID.Value, "Coins"), (self:GetMultiplier(plr, b.PetID.Value, "Coins")))
		end
	)
	
	for i = 1, plr.PetStats.MaxPetsEquipped.Value do
		if pets[i] then
			self:Equip(plr, pets[i].PetID.Value, pets[i].Name)
		end
	end
end

Master_Melodic · August 24, 2021, 7:25pm

I am using big nums btw.

return bNum.me(self:GetMultiplier(plr, a.PetID.Value, "Coins"), (self:GetMultiplier(plr, b.PetID.Value, "Coins")))

is basically the same as

return self:GetMultiplier(plr, a.PetID.Value, "Coins") > self:GetMultiplier(plr, b.PetID.Value, "Coins")

Xysteric · August 24, 2021, 7:29pm

Add a wait(), that should solve your issue with lag.

Master_Melodic · August 24, 2021, 7:31pm

Already tried that, just results in errors:

cannot resume non-suspended coroutine
attempt to yield across metamethod/C-call boundary

Xysteric · August 24, 2021, 7:36pm

Then you might want to implement your own system if the one you are using is lagging.

MettaurSp · August 24, 2021, 8:05pm

It depends on what you want to do with it. table.sort lags because the complexity of the problem doesn’t scale one to one with the size of the input. It needs to compare each individual items against multiple items to find their place in the sorted version of the table.

To combat this you would usually want to off load some of the work to other parts of the code so you don’t have to do it all at once. Here are a few options, but the list isn’t limited to this:

Keep a sorted version of the table and sort again when you insert a new item. Sorts on mostly sorted tables are a lot faster to sort than unsorted tables.

Use a more specialized data structure. Different types of data structures are built to work well in different use cases, and there are some that are built to maintain sorted data. If you don’t care about exact order and you just want to fetch each item from the container once in either ascending or descending order you can use a min/max heap. Heaps are fast to insert into and remove from, but if you need to access the same data multiple times then a binary search tree would work better.

You could also write a custom sorted inserter instead of using table.sort each time new data is added. Custom inserters have the potential to be faster if you pick the right one for your data set.

If you don’t need every single item in the input to appear in the output, you could make a special case for your structure to suit your needs. If you just need 1 item, scan through the list one at a time and only remember the best result.

If you need a few from the list, you could make a second table that holds amountYouNeed + 1 where you replace the last entry with the new item you want to insert and sort that. The last item will always be the worst so whatever is stored in there is free to be replaced/thrown away after sorting is done. This would drastically reduce the number of items that need to be checked per sorted item, and limit the growth rate of the complexity of the sort.

Making a custom sort function where waiting is allowed would technically work, but it wouldn’t make the sorting go any faster, it just surrenders control back to other tasks after a certain amount of time. It is a viable solution, but only if your data set remains small enough that it doesn’t get in the way of the operation the sorting is being done for. If a player clicks a button and the button takes too long to respond it could get in the way of user experience.

Master_Melodic · August 25, 2021, 6:30am

I think it is the sheer amount of items there are, even looping through all of the pets without any breaks is quite laggy. I tried sorting them as new pets are added but it’s still laggy, this time worst. Also , can you elaborate on what you mean by min/max heap and specialised data structures please? Sorry for replying late I had to go sleep so ye.

MettaurSp · August 25, 2021, 7:31am

By the sounds of it, if looping through the items is slow then it might be the operation that you are doing each loop iteration that is the slow part. I would start by looking there to see how much time it takes to perform that operation if looping through 400 items is slow.

If it isn’t the iteration that is slow, and it is the sorting part then it is safe to look for another sorting option like the ones outlined in my previous post, and outlined below. This response is big so I split it into 4 different sections: deciding which approach to use, using the smaller secondary table, using a heap, and a partial explanation of a heap.

I find it unlikely that you would need a heap for this specific use case, but I’ll include the explanations below for academic purposes in case you want to learn about them. These kinds of things are usually taught in data structures + algorithms courses, so you could search by that course name if you want to learn more.

If you have a smaller number of items that you want to use self:Equip() on, like around or under 20, then you should use this approach

If you need a lot of items then you should use something like a min/max heap.

If possible, you should keep a copy of the table at all times instead of recreating it using GetChildren. When you add or remove an item you should update that copy of the table using table.insert + table.sort, or table.remove. That would spread out the work across all the times that the items are updated instead of updating it all at once.

The quoted approach would be one example of a specialized data structure. You loop over all the items from GetChildren and add them to the back of a secondary table that starts out empty. Every time you add a new one, either use a sorted insert, like inserting directly into the right spot, or put the new item in the last table index and use table.sort on it.

With a table that has 400 items in it, table.sort might need to do anywhere from 400 to 400^2 comparisons, with an average of 400 * log2(400) comparisons. That’s 400 - 160,000, with an average somewhere around 3,500 comparisons at the very least.

With a secondary table that only has as many sorted items as you need, assuming n is the number of output items, that would be 400 - 400n comparisons. If you only need 25 of the sorted items, that would be 400 - 10,000 potential comparisons with a significantly smaller average than the 3,500+ that the other method needs.

For the min/max heap, those are binary trees that are designed to have very fast sorted insertion operations, and very fast removal operations, with an instantaneous “get min” or “get max” operation. You would convert the input table from GetChildren() into a heap by inserting each item into a new heap one at a time. You would then extract the items in a sorted order by reading the top item on the heap, then removing it.

Heaps are semi-sorted so only the top item on the heap is guaranteed to be optimal (minimum item for a min-heap, maximum item for a max-heap), but they are very fast to insert and remove from. They also have the advantage of being able to be converted to an array/vector, so just a table with number indices like { 1, 2, 3, 4, 5 }, so you don’t have to mess around with tree nodes. Only issue is that you need to remove the top item to read the second best item.

The algorithm for traversal, looking through the heap from best to worst, would be:

Read the item at the front of the heap (index 0 for heaps)
Remove item at the front of the heap
Repeat until the heap is empty

If you want to reuse the heap multiple times you could make a copy and traverse that copy.

For inserting a new item its:

Insert item at the back of the table (index 0 for empty tables, not index 1)
Find the parent node, formula is parent = (i - 1) / 2
Compare with the parent node. If the parent is less optimal than the newly inserted node then swap the 2 in the table
Less optimal for a min heap (smaller items first) means swap if the parent is larger, for max heaps swap if the parent is smaller
If you swapped the node you are inserting with the parent, set the index you are looking at to the parent index ( i = parent ), because that will be the new location of the item you are inserting
Repeat from step 2 until the parent isn’t less optimal than the new item, or until the index the new item is at is 0

For removing an item:

Swap the first node (i = 0) and the last node in the table
Remove the last node in the table
Repeat starting here:
Find the left child node: left = 2 * i + 1
Find the right child node: right = 2 * i + 2

If the left node is more optimal than the right node, and the left node is more optimal than the current node:

Swap the current node with the left node
Update the current index to the left node’s index: i = left

If the right node is more optimal than the left node, and the right node is more optimal than the current node:

Swap the current node with the right node
Update the current index to the right node’s index: i = right

If neither of the above happens, stop repeating. Otherwise, repeat again from step 3

Heaps are a kind of binary tree. If you are new to binary trees, a binary tree is an item that can have 2 child items: the left node and the right node. Each child item can have its own children, and those can have their own children, and so on and so on.

A heap is a special case of a binary tree where the parent item is “more optimal” than the child items. If you want smaller values to be more important, then the parent is always smaller than the 2 children. This is a min heap. A max heap is the opposite, where larger items are more important, and the parent is larger than the children.

A binary heap can be represented as a table because it has a special property where all of the nodes in the tree must be filled in except for on the very bottom level. On the bottom level they must be filled in from left to right:

The picture above is a max heap, where the most important items are the largest, so they bubble up to the top. The array representation would be a table that starts at index 0.

This makes it very easy to map a node in the tree to an index in the table mathematically. It isn’t quite straightforward how to derive what that mathematical relationship is, but the relationship itself is very simple and luckily the work to derive it has been done for us. The relationship is the parent node, left node, right node stuff that came up before:

parent = (i - 1) / 2
left = 2 * i + 1
right = 2 * i + 2

If you inserted a 1 into the heap in the above picture, you would put it next to the 7 in the bottom row, connected to the 3 in the row above as the 3’s left node.

Decker004 · August 25, 2021, 10:53am

If I should take a guess, then that self:GetMultiplier() you call in the compare-function, may be one reason for the slowness. - How complex is it, with regards to calculating and returning a value?

Couldn’t you cache the result the self:GetMultiplier() method returns for each specific pet of the player, before calling the table.sort() and then use the cached value (per pet) in the compare-function?

Another suggestion; try to examine out how many times your original code is actually calling self:GetMultiplier() (in the compare-function), to learn/understand - and then compare it to a revised and optimized one using cached values. - I.e. ‘profile the code’, to get actual proof of how your changes to the code makes it better or worse regarding performance.

Master_Melodic · August 25, 2021, 1:36pm

Hi, what do you mean by cache the result?

Decker004 · August 25, 2021, 2:07pm

That instead of calling self:GetMultiplier() for the exact same pet several times (in the compare-function), it should only be done ‘one time’ and be put into a cache - thereby avoiding the computations which will just result in the exact same value again and again and again …

Something like:

-- Cache the result from `self:GetMultiplier()` per pet, as it (most likely) won't change before this EquipBest function returns
local playerPetsAndValue = {}
for _, pet in ipairs( plr.Pets:GetChildren() ) do
  local petValue = self:GetMultiplier(plr, pet.PetID.Value, "Coins")
  table.insert( playerPetsAndValue, { pet=pet, petValue=petValue } )
end

-- Sort table, where compare-function use the cached petValue (thereby avoiding several multiple identical calls to `self:GetMultiplier()`)
table.sort(
  playerPetsAndValue,
  function(a,b)
    return a.petValue > b.petValue
  end
)

MettaurSp · August 25, 2021, 4:39pm

Caching would be keeping a copy of the last result wherever possible instead of recomputing it. It’s a pretty useful technique when working with expensive operations.