How to performantly check if two dictionaries are the same?

Every frame I’m checking if something is under a Gui Frame:

local function findPartsInSelectionBox()

	local a1 = GuiElement.AbsolutePosition
	local b1 = a1 + GuiElement.AbsoluteSize

	--- make sure a is the top left and b the bottom right
	local a = Vector2.new(math.min(a1.X, b1.X), math.min(a1.Y, b1.Y)) -- both are math.min
	local b = Vector2.new(math.max(a1.X, b1.X), math.max(a1.Y, b1.Y)) --both are math.max
	
	local instances = {}

	for part, position in inputSelectablePartReturnExtentsCFramePosition do

		local point, onScreen = camera:WorldToScreenPoint(position)

		if not onScreen then
			continue
		end

		if a.X < point.X and point.X < b.X and a.Y < point.Y and point.Y < b.Y then
			instances[part] = part
		end

	end
	
	return instances
	
end
- ContextActionService: Unexpected error while invoking callback: invalid key to 'next'  -  Studio
- invalid key to 'next'

I got this error…

Also my keys and values are the same, the table is structured like this:

table[part] = part

So I’m not sure if I need both keys and values, I just need to check if the same keys exist in both tables… I think?

Which happens more often: the tables are equal, or the tables are different? I ask because I’d optimize the two cases in different ways.

If the tables are expected to be the same most of the time, then constant naïve comparison is probably not the best option. If you can narrow down a subset of keys or values that are expected to change, and check only those, that would be better. If there is no such set, probably the best you can do is start your comparison loops and early out as soon as you find any difference in the tables.

If your tables are expected to change every frame, and that you’re trying to catch the occasional time when they are equal, then the best option is to compute a hash for the table when you populate it, and compare hashes.

Instead of

local table = someFunction()
 if table == previousTable then return end

you would have

local table, hash = someFunction()
 if (hash == previousTableHash) and fullCompare(table,previousTable)  then return end

What’s happening here is that a full key-by-key comparison is only done if the tables have the same hash. This is because different hashes means the tables are different, but equal hashes only means that the tables are probably the same. This is because in order for the hash provide value as an optimization, it has to be much cheaper to create than doing the full table comparison, so it’s probably not perfect.

Creating a hash as you build the table can be done many ways. It could be something as simple as the number of keys (obviously not ideal), or a checksum of the values (better it makes sense for the data types). Hash functions are a whole topic on their own, so best to check Wikipedia for ideas.

okay so you’re saying to also return the length of the table? I can do that.

However, actually, there might be a case where a part leaves the viewing window and another one enters at the same time, so therefore we still need to check if both tables contain the same data…

Just iterate manually over both dictionaries using next.

local function AreDictionariesSame(Left: {[unknown]: unknown}, Right: {[unknown]: unknown}): boolean
	local LeftKey, LeftValue, RightKey, RightValue
	
	repeat
		LeftKey, LeftValue = next(Left, LeftKey)
		RightKey, RightValue = next(Right, RightKey)
		
		if LeftKey ~= RightKey or LeftValue ~= RightValue then
			return false
		end
	until not (LeftKey or RightKey)
	
	return true
end

A very similar solution was left by another user:

It resulted in an error, and they haven’t responded…

I was giving key count as the simplest valid example of a hash function I could think of, but it’s a terrible choice of hash function, for the exact reason you point out. A perfect hash function would return a different value for tables that didn’t exactly match. But in practice, evaluating such a function is often too expensive, not saving you time over just doing the brute force comparison (or even making things worse).

The best hash function is one that’s super fast to generate, and has a very, very high probability of dissimilar tables having different hashes, but not a 100% guarantee. That way, the only time you end up fully comparing tables is when they are actually identical, or once in a blue moon when you happen to get two different tables that compute the same hash by coincidence (this is called a hash collision).

The challenging part of this type of optimization is designing the great hash function that maximizes speed and minimizes collisions.

I can make the table into an array. However, I thought it’s faster to do table[key] than table.find(table, key)

If you make an array then:

local tab1={}
local tab2={}
local db = true
if #tab1==#tab2 then
for i,v in tab1 do
if tab2[i]~=v then db=false break end
end
else
db = false
end
print(`Tables are equal: {db}`)


table.find() and table[key] have absolute 2 different uses
table.find returns a key that contains a value while table[key] returns a value from a key

This is something I wrote up in about 10 minutes. It’ll compare the indices and values and return whether they’re exactly the same, and that there’s no extras.

local dict1 = {Hello = "World", Epic = "Gamer"}
local dict2 = {Hello = "World", Epic = "Gamer"}

local function CheckDictionaries(d1, d2)
	local temp = {}
	for i, v in pairs(d1) do
		if d2[i] ~= nil and d2[i] == v then -- If you want to only check for indices specifically, remove d2[i] == v
			temp[i] = true
		end
	end
	local chk = true
	for i in next, d1 do
		if temp[i] == nil then
			chk = false
			break
		end
	end
	if chk == false then
		return false
	end
	for i in next, d2 do
		if temp[i] == nil then
			chk = false
			break
		end
	end
	return chk
end

print(CheckDictionaries(dict1, dict2)) -- true

dict2 = {Hello = "World", Epic = "Gamer", Apple = "Sauce"}

print(CheckDictionaries(dict1, dict2)) -- false

dict2 = {Hello = "World", Epic = "Face"}

print(CheckDictionaries(dict1, dict2)) -- false

Yes, but in this situation, to get O(N) comparison of the arrays, they would need to be built in a way such that they are sorted the same (deterministically) by key. Thus, given a key, finding the matching data requires a search. It’s at very least a binary search in O(log N) time. One way to implement the search is to build parallel “indexing” arrays that are just arrays of the keys. With this, you’d use table.find( indexingArray, key ) to get the index into the main data array for that key.

Switching to arrays guarantees worst case comparison to be O(N), but the only real performance gain you get comes from being able to early out by knowing the array sizes without iterating them. And you could solve that better just by adding key count to the dictionaries when building them. And you’ve slowed down access to the data from O(1) to at best O(log N), but in practice probably O(N).

TLDR; Keeping these dictionaries is the right move.

No it will not help.
At this point just check last value if they are equal and will return the same “not accurate result”
You are overthinking and it leads to nowhere more so this will not be 100% accurate and as i said checking last value in array being equal would do the same exact thing.

Also table.find is for arrays anyway so you are making it even worse.

I don’t think there’s any solution that’s performant. Looping takes a lot of power.

I would rather stop the search if a falsehood is found. Do an early return instead and check if any of the below are false. If so, they are not equal.

Compare the table variables.
Then compare the array length.
Then do a k,v loop in Dictionary 1, and compare the value with Dictionary 2’s key’s values.

If you need both tables to be thoroughly the same, then I’m uncertain. Perhaps reduce the tick rate in which the check is being done.

I wasn’t suggesting using table.find. I was explaining why the OP mentioned it, and how it can actually be used with parallel arrays to do key-value lookups, even though what it does directly is find the index, given a value. And we seem to agree that there is no benefit to doing this, so there’s no need to argue that point at all.

1 Like

Literally every method people has sent here (including me) does use early return :skull:
What are you talking about?

I do agree with you that it is the best to not have that setuation in the first place and even if it will require restructuring the game.

I was replying to the thread itself, not your post.

I’ve done exactly that. If you look at this code here:

I’m already looping through one of the tables, so I used that to my advantage and to check off all the repeating parts, and to only leave the deleted/new parts.