Performance concerns over DictionarySelectRandom function

colbert2677 · April 3, 2020, 5:59am

I decided to, after a long time of inactivity on it, revisit my GitHub account and the content there. While reorganising one of my repositories full of free code samples I’ve dumped for public use, I had the urge to add a function to my TableUtility library (formerly Array), “DictionarySelectRandom”.

As you can probably tell by the function’s name, the intention of the function is to allow the developer to select a random item from a dictionary. A simple random statement won’t do the trick so needing to create a function to do so is necessary. Now who would ever need it, I don’t know, you never usually apply randomised selections for a dictionary.

I wrote this on the fly just now and don’t have any ideal testing environments to put this through other than spam loops in blank baseplates. The function is available as follows:

function TableUtility.DictionarySelectRandom(dictionary)
	local keys = {}
	
	for key, _ in pairs(dictionary) do
		table.insert(keys, key)
	end
	
	if #keys == 1 then
		local onlyKey = keys[1]
		return dictionary[onlyKey]
	end
	
	local position = Random.new():NextInteger(1, #keys)
	local key = keys[position]
	
	return dictionary[key]
end

Now the reason I put “performance concerns” in the thread title is because obviously I’m concerned about the performance of this function, but also because the repository is open source. I don’t know who looks through my GitHub, if anyone at all, and if I’m sharing them code that isn’t performant, that wouldn’t bode well with me.

The only thing I’ve considered so far to improve performance is to use table.create so that the keys array isn’t getting resized every time a new entry is inserted. I don’t know what else I can do to make this more performant; and if it’s worth even fretting about at all. Any insight?

PS: I will not sacrifice readability for performance unless the latter reduces the time the function takes to run by a significant amount.

Complete library:

github.com

colbert2677/Roblox-Code-Samples/blob/master/Libraries/TableUtility.lua

--[[
	@ TableUtility
	@ colbert2677
	@ April 3, 2020
	
	Utilities for working with tables.
--]]

local TableUtility = {}

function TableUtility.ArraySelectRandom(array)
	if #array == 1 then
		return array[1]
	end
	
	local position = Random.new():NextInteger(1, #array)
	
	return array[position]
end

This file has been truncated. show original

Thank you in advance.

colbert2677 · April 4, 2020, 6:02am

I’ve decided, after my own re-review, that what I have in this function is good enough. Most of the optimisations that I can perform here are either negligible, may hurt readability or not function the way that they’re intended to. For reference, here are the modifications I’ve considered and my notes:

table.create over empty table construction: This would remove the need for the table to resize n amount of times whenever a key is tracked in the dictionary and instead only need to resize once. Adding a size parameter would be counterintuitive because it’s not consistent with ArraySelectRandom and wouldn’t be arbitrary. Why I’m not choosing to use table.create is specifically because of that arbitrary bit. Randomly creating a table of a large size (e.g. 1000) without filling many slots is wasteful. You also can’t count the number of items in a dictionary: adding an elements counter function would slow this down.
Removing local variable declarations: I thought about just returning items directly rather than assigning any variables. This kills readability, makes it harder for other cases (backtracking, adding, path modifying, debugging, etc) and I’m pretty sure would not speed things up. As such, I am keeping the declarations as is.
External Random object: Every time DictionarySelectRandom or even ArraySelectRandom is called, a new Random object is created and used only once. I don’t know how expensive the creation of a Random object is or what it does internally to fill all its fields, but I decided to keep this the way it is. An external Random object would just be equivalent to a global random so completely unrelated functions will be working from the same seed and pseudorandom results cycle. This sounds horrible.

I’m still concerned about performance and I don’t have a testing environment handy, but this review will do to quell some of my other concerns. Thanks for stopping by if you did, or if you considered responding or investigating performance for this function.