Vector3 & Other Positional Data Type Caching

As a Roblox developer, it is currently too hard to performantly utilize certain data types as hash keys in tables due to them being uniquely created userdatas.

If Roblox is able to address this issue, it would improve my development experience because it would allow me to map points in space to other values without additional performance cost, and, would generally improve vector performance when used in bulk. The same could be applied to CFrames, Vector2s, UDims, and UDim2s for the same benefits all around, and, likely other values like Color3s.

I feel that Vector3.new(0, 0, 0) is called twice, the same Vector3 userdata should be returned. The same position should result in the same userdata. This would eliminate the need for an __eq metamethod for Vector3s since they could just be directly compared which would result in better performance when comparing two vectors, and, it would eliminate the need to index each component of the position when used as a table key. This would likely also save on memory usage, because creating a bunch of the same Vector3s would have a memory complexity of O(1) whereas currently the memory complexity is O(n).

Single dimensional tables are somewhat faster than multi dimensional tables due to the cost of the additional two index operations, so, currently I create a numerical value based on the X Y and Z component of the vector. The problem with this is the cost of indexing each component still stacks up quite a bit, the cost of accessing vector components is significantly more than the cost of a regular table index.

The cost of indexing the components of vectors is also enough that CFrame:GetComponents() performs nearly 2x-3x faster assuming you have the data type as a CFrame, but, that’s not really related to this feature request necessarily.

I would think that this would also make vectors more intuitive to newer and intermediate programmers since they could be used as table keys. There are a lot of old scripts I wrote that tried to use this, and, I’ve seen it happen quite a bit, and, I think that the fact that two vectors will == eachother but will appear to only sometimes work in a table can be super confusing, especially because printing the two vectors will result in the exact same output.

This has the potential to break scripts, but, only if those scripts rely on the references being different for different vectors, and, I’m not sure where that would generally be useful.

6 Likes

Wouldn’t this add overhead for everything that would return a positional data type (when it doesn’t already exist)? There would be some overhead of accessing a global table of all objects that are cached to see if one is already present.

How would this work when one or more components are NaN? NaN returns false when compared with anything, would the objects be cached with other objects that have the same NaN values?

NaN compares unequal to itself, so __eq would still be needed.

For Vector3 specifically, it may become a value type in the future and not need caching (which would mean no difference between == and rawequal).
https://devforum.roblox.com/t/a-lesson-in-over-optimisation-magnitude-calculation/533956/21?u=halalaluyafail3

1 Like

This is a really good point, and, you’re right there. I even recall it being mentioned in one of the recent updates that Vector3s were comparing against themselves meaning v3 ~= v3 would tell you if the components were nil which was technically a bug but ended up becoming a feature.

Also, side note, but, that faster Magnitude check is actually a lot slower in luau as I found out not too long ago, I ended up posting about it (I can find the post if you want).

It’s very interesting that they were planning to make it its own data type, that’d be very cool to see and would maybe resolve my issue here.

It wouldn’t add any real overhead as far as I am aware, it’d just be a hashmap where the key is the float value for each component, so it should cost O(1) to do a lookup as any hash table should. In lua code it might have a cost but when it comes to machine code and direct memory access that’s when that becomes much less of a problem. There are already types in some of the included C libraries for doing hashmaps performantly I believe.

1 Like

Adding it to a large hash map of all objects isn’t free, especially when it needs to rehash. The GC would also have to be aware of these hash maps too, the pointers to the objects would need to be removed from the hash map when the object gets collected. When doing lots of operations on temporary vectors (that are unlikely to already exist) the cost would probably be a reasonable amount, with the benefit of vector caching being very minor (using Vector3 as keys in a table is quite uncommon).

2 Likes

This is definitely fair. I don’t really know, I would expect that this wouldn’t cost that much but I don’t have a perfect understanding of the way things are done and the way things work at the C level. I would expect that you would save more memory than you would lose because you’ll save memory for using lots of similar vectors which is usually the case when using the same code structure. If you think about the way caching could be done, it’d just be as expensive as a weak table. If they aren’t cached, Vectors and other data types should at least be able to be indices in tables.

I think that this maybe is enough to justify just implementing a hash system in lua, but, that’s super costly compared to being done in C due to how much it costs to index vector components.