Roblox uses rbxcdn.com to store user-generated static assets. From what I can tell, it is an S3 bucket. It organizes files by a checksum hash of the file itself (for example, https://tr.rbxcdn.com/db37c39c98d8dca0e6ec99c6264e68f9/150/150/AvatarHeadshot/Png). This way, if two identical files are generated, they can be served from a single location. This approach makes moderation more manageable and also conserves disk space. It also means that the assets are completely static and not linked to an ID (if I update my avatar, the file at my old profile picture URI does not change; a new picture is generated at a new URI instead).
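To illustrate the storage scheme described above, here is a minimal sketch of a content-addressed store. Everything in it is an assumption for illustration: `ContentStore` is a hypothetical in-memory class, and SHA-256 is only a placeholder, since the algorithm Roblox actually uses is not known.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the key is a hash of the bytes themselves."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # SHA-256 is a placeholder; the algorithm Roblox uses is unknown.
        key = hashlib.sha256(data).hexdigest()
        # Identical bytes map to the same key, so duplicates are stored once.
        self._blobs.setdefault(key, data)
        return key

    def __len__(self) -> int:
        return len(self._blobs)


store = ContentStore()
old_key = store.put(b"avatar v1")
same_key = store.put(b"avatar v1")  # identical upload, no new entry
new_key = store.put(b"avatar v2")   # changed avatar, new entry; old one untouched
```

Uploading the same bytes twice returns the same key, while a changed file gets a new key and leaves the old entry in place, which matches the behaviour seen on the CDN.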
I’m working on a project and I’m wondering how the hashes are calculated. I know I can use an API to get the URI, but that isn’t what I’m looking for. I want to know how the hashes are actually calculated so that I can compute the hash from a file myself.
At first glance, I assumed it was hashed using MD5 or MD4 because the hash is 32 hex characters (128 bits); SHA1 produces 160 bits, so it would only fit if it were truncated. However, when I generated a checksum for mariofan9839’s profile picture with each of these algorithms, the hashes did not line up. This probably means it’s using a different or modified hash algorithm. It wouldn’t surprise me, as both MD5 and SHA1 have documented collision vulnerabilities (unrelated: it would be interesting to see how Roblox reacts if you upload two images crafted to collide, if they do use an insecure hashing method).
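As a quick sanity check on the sizes involved, this sketch pulls the hash out of the example URL and compares its length with the output sizes of a few standard algorithms. The helper names `hash_segment` and `digest_bits` are made up for illustration, and it assumes the first path segment of the URL is the hash.

```python
import hashlib
from urllib.parse import urlparse

URL = "https://tr.rbxcdn.com/db37c39c98d8dca0e6ec99c6264e68f9/150/150/AvatarHeadshot/Png"


def hash_segment(url: str) -> str:
    """Return the first path segment, which is where the hash appears in this URL."""
    return urlparse(url).path.strip("/").split("/")[0]


def digest_bits(name: str) -> int:
    """Output size in bits of a hashlib algorithm."""
    return hashlib.new(name).digest_size * 8


segment = hash_segment(URL)
segment_bits = len(segment) * 4  # each hex character encodes 4 bits

for name in ("md5", "sha1", "sha256"):
    # Prints the algorithm, its output size, and whether it matches the URL hash size.
    print(name, digest_bits(name), digest_bits(name) == segment_bits)
```

Only MD5 matches 128 bits out of the box here; SHA1 and SHA-256 would only fit if their output were truncated.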
Does anyone know how these hashes are calculated? Is this documented anywhere? Sorry for the unusual question or if I put this in the wrong category.
I know very little about the question you are asking, but I do know quite a bit about how Roblox likes its information and algorithms: private.
There might be some way to reverse engineer the algorithm from the user IDs provided to the API, but you will have a very hard time. There’s also a pretty good chance they’re using some kind of randomness so that it isn’t a set-in-stone algorithm.
Again, just guessing here, but I would unfortunately assume the worst and say you have to use the API.
Thanks for the reply. I realize Roblox most likely keeps its secrets, especially for technical stuff like this, but I don’t think that applies here. It would make no difference if this hashing algorithm were public, because all you can do with it is generate more hashes. I wouldn’t be surprised if it’s a default S3 “option” for managing unstructured files on a CDN (I don’t usually use AWS but I know how to use it) because I’ve seen multiple other websites use a similar system.
You can’t reverse engineer a hash. The point of them is to be one-way. Once you’ve hashed something, you can’t get the original value back. That’s why they’re commonly used to store passwords securely - no one will know the password but you’ll still be able to check against it by hashing inputs.
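For anyone unfamiliar with the password case mentioned above, here is a rough sketch of checking an input against a stored hash without ever storing the password. It uses PBKDF2 from Python’s standard library; the function names, the example password, and the iteration count are illustrative assumptions, not a recommendation.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative value, not a recommendation


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a one-way hash from the password and a per-user salt."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def verify_password(attempt: str, salt: bytes, stored: bytes) -> bool:
    """Hash the attempt the same way and compare in constant time."""
    return hmac.compare_digest(hash_password(attempt, salt), stored)


salt = os.urandom(16)
stored = hash_password("hunter2", salt)  # only the salt and this hash are kept
```

The stored value cannot be turned back into the password; the only way to check a guess is to hash it again and compare.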
As I mentioned earlier, the hash does not correlate to any ID. There is no way to tell where an asset on the CDN originates from; it is uploaded and essentially left there forever. If you change an asset, the existing file on the CDN does not change, because nothing ties the asset to that file. Instead, a new file is uploaded to the CDN and its hash is stored in the database (which acts as the bridge between the CDN and Roblox). That’s because the hash in the URL is literally a hash of the file itself (known as a “checksum”). You cannot get that exact hash any other way than by uploading the exact same file again (aside from deliberately colliding files, which MD5 and SHA1 are vulnerable to). This is why the two profiles I mentioned have the same profile picture URL on the CDN: the avatars are the same, and thus the profile pictures have the same hash.
It can’t be random because it directly correlates to the file. If they produced a hash randomly, then two files that are the same would not end up in the same location.
It is possible but very unlikely that they have made their own algorithm or modified a preexisting one. There would be no reason to do that, however, so I doubt they did. I imagine it is just a standard algorithm, but it’s hard to figure out which without information from Roblox. Essentially all I know is that it is 32 hex characters (128 bits), which narrows it down to MD2/MD4/MD5, but it could also be a longer hash such as SHA-256 or SHA3-256 with the output truncated to 128 bits. My issue is that none of these seem to be the answer, which is why I’m confused.
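The guessing I have in mind can be sketched roughly like this: run the file bytes through every algorithm Python’s `hashlib` guarantees and see whether the full digest, or a truncated prefix of it, equals the hash from the URL. `find_matching_algorithms` is a hypothetical helper, and it assumes the hash is taken over the exact bytes I have, which is not confirmed.

```python
import hashlib


def find_matching_algorithms(data: bytes, target_hex: str) -> list[str]:
    """Return the hashlib algorithms whose (possibly truncated) digest equals target_hex."""
    target = target_hex.lower()
    n_bytes = len(target) // 2
    matches = []
    for name in sorted(hashlib.algorithms_guaranteed):
        h = hashlib.new(name, data)
        if name.startswith("shake_"):
            # SHAKE functions have no fixed size; request exactly the target length.
            if h.hexdigest(n_bytes) == target:
                matches.append(f"{name} ({n_bytes}-byte output)")
            continue
        digest = h.hexdigest()
        if digest == target:
            matches.append(name)
        elif digest.startswith(target):
            # Assumes truncation keeps the leading bytes; other schemes are possible.
            matches.append(f"{name} (truncated)")
    return matches


sample = b"example file contents"
full_md5 = hashlib.md5(sample).hexdigest()
short_sha256 = hashlib.sha256(sample).hexdigest()[:32]
```

An empty result for the real image would not rule these algorithms out completely, since the hash could be computed over different bytes (for example, before re-encoding) or with extra input mixed in.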
The API wouldn’t work for what I’m doing anyways as I’m essentially trying to get the hash from the file.
I’m hoping someone with experience working with the raw backend side of Roblox knows something about this, but I’ll most likely have to brute-force guess it on my own.