Remove every element that has any duplicates from table

DudusJestem · May 7, 2021, 1:18pm

Hello my question is simple, how could I remove every element that has any duplicates even with the “first” element of that type?

Code:

local function removeDupes(tab)
     --Code goes here
end

local t = {"apple", "apple", "apple", "orange", "banana", "lemon", "tomato", "tomato"}

print(removeDupes(t))

Output i want to achieve:

{
    "orange",
    "banana",
    "lemon",
}

As you can see there are no elements named “apple” or “tomato” in the output, because they had duplicates

TDtheTV · May 7, 2021, 1:23pm

You could make the function create a new local table and then begind looping and cloning everything inside the asigned table with a in pairs() loop. You gotta check using the table.find() inside the loop to see if the local table has that string already aka table.find its not equal to nil. Then you remove it from the original table. After that, the original table will be empty, and the one inside the function ill be without any duplicates. You could loop around the new created table and clone everything to the original table.

123marble · May 7, 2021, 1:25pm

Are all the duplicates grouped together, as shown in your example? There is a significant impact on the efficiency of the implementation depending on the initial structure of the table…

DudusJestem · May 7, 2021, 1:29pm

Elements are just added in script to a table, they don’t have any specific order, this is just an example script i made

RobloxGamerPro200007 · May 7, 2021, 1:30pm

Use this:

local function removeDupes(tab)
	local rep = {}

	for index, element in ipairs(tab) do
		if table.find(rep, element) ~= nil and tonumber(index) ~= nil then
			table.remove(tab, index)
		elseif table.find(rep, element) ~= nil then
			tab[index] = nil
		else
			rep[#rep + 1] = tab[index]
		end
	end

	return tab
end

DudusJestem · May 7, 2021, 1:33pm

It removes duplicates but without the “first” value

I need to remove the first value too (in this example it is “Fairout”)

EDIT:
The first table in the output is “unduplicated” table

123marble · May 7, 2021, 1:37pm

An alternative approach I would opt for is using a hashset since element lookup will be O(1), and therefore overall time complexity is O(n). table.find implementation will be O(n^2) since find performs a linear search each time.

local function removeDupes(tab)
	local set = {}
	local finalTab = {}
	for _, val in pairs (tab) do
		if not set[val] then
			set[val] = true
			finalTab[#finalTab+1] = val
		end
	end
	return finalTab
end

DudusJestem · May 7, 2021, 1:41pm

It doesn’t work, it left one of the element that duplicates and removed other that don’t have duplicates

DudusJestem · May 7, 2021, 1:50pm

It’s working, I’m very thankfull but I think it might be kinda “unoptimized” idk how to say it

123marble · May 7, 2021, 1:51pm

Oops, my answer was wrong. This would work better.

local function removeDupes(tab)
	local set = {}
	local finalTab = {}
	for _, val in pairs (tab) do
		if not set[val] then
			set[val] =  1
		else
			set[val] = set[val] + 1
		end
	end
	
	for k, v in pairs(set) do
		if v == 1 then
			finalTab[#finalTab+1] = k
		end
		
	end
	
	return finalTab
end

RobloxGamerPro200007 · May 7, 2021, 1:55pm

Ok, is this more “optimized”?

local function removeDupes(tab)
	local repElement = {}
	local repIndex = {}

	for index, element in ipairs(tab) do
		if table.find(repElement, element) == nil then
			repElement[#repElement + 1] = tab[index]
			repIndex[#repIndex + 1] = index
		else
			if tonumber(index) == nil then
				tab[index] = nil
			else
				table.remove(tab, index)
			end
			if tonumber(repIndex[table.find(repElement, element)]) == nil then
				tab[repIndex[table.find(repElement, element)]] = nil
			else
				table.remove(tab, repIndex[table.find(repElement, element)])
			end
			table.remove(repIndex, table.find(repElement, element))
			table.remove(repElement, table.find(repElement, element))
		end
	end

	return tab
end

BuilderBob25620 · May 7, 2021, 1:58pm

Here’s a shorter way to do it if you want. I believe it’s more “optimized” but I may be wrong.

local find = table.find
local insert = table.insert

local function RemoveDuplicates(tbl : {[number] : any})
	local hash = {}
	local res = {}
	local rep = 0/0 -- NaN | Note: This value has to be something other than nil. Otherwise table.find won't work properly.

	for i,v in ipairs(tbl) do
		tbl[i] = rep
		
		if not (find(tbl, v) or hash[v]) then
			insert(res, v)
		end
		hash[v] = true
	end

	return res
end

BuilderBob25620 · May 7, 2021, 6:17pm

If I just checked the hash then it would remove all the duplicates. However, it would leave 1 remaining. What @DudusJestem wants however is to remove all values that have duplicate of themselves. So because I removed the value from the table already in this line

then it checks if there are duplicates. If there are then it doesn’t add it to the new table. It is then added to the hash table. Then if a value that is duplicate is processed, because the first value was added to the hash table it won’t be added to the new table either.

posatta · May 7, 2021, 7:02pm

My bad, I misinterpreted as “remove duplicates”, not “remove elements that have duplicates.” I figured one of each was what OP wanted.

BuilderBob25620 · May 7, 2021, 7:05pm

No problemo. If you did just want one of each you could just use a hash table like you said.