Awhile back I was shown a way of compressing and decompressing strings was by converting the string to 0’s and 1’s and then from there I could compress it into a smaller string.
For example 111100011 would become 0,4,3,2 making the string much smaller (in bytes).
How could I go about implementing this so I can convert a string like hello! into the 1’s and 0’s then into the compressed format? (and same with reversing it)
If you want to know why I need this, it is to compress the strings I am sending via MessagingService.
You’ll probably end up making the size of the string longer if you store/send it as a string of 1 and 0 characters. You’ll also need encode it as a base64 string (or some other encoding scheme) when you want to send/store it.
Doesn’t seem like the correct algorithm, I think it converted into binary, then it ran through a function to turn the binary into the compressed string.
The function worked like this:
It takes in a string of 1’s and 0’s and loops through getting the length of the batch of 1’s or 0’s and then adds it to the list:
1110011110 would become 0,3,2,4,1 since it starts at 0 and grabs the first set of 0’s and determines length. Once it gets the length of the 0’s, it stores it, then grabs the next set of 1’s which the length is 3, then it gets the 0’s which is 2 for the length and so on.
Some examples:
1110000110000001 → 0,3,4,2,6,1
000110111101 → 3,2,1,4,1,1
Still, you’ll be making a lot of strings longer. The character A (10000010) will get encoded as 0,1,5,1,1, which is 8 characters longer than the string itself.
If you are trying to [actually] compress strings (not serialize), I think you should think about how they would be stored in binary
each character can store 256 (2^8) options, and is the equivalent of 8 bits
only dedicating each character to your 10 numbers and a comma (11 total options) wastes 245 possibilities per 256 (so your algorithm will actually make strings 23.27272727272727 times their initial size (256/11)-assuming previously you made use of all 256 characters)
The optimal compression algorithm thus MUST work in binary
If you were to make your algorithm in binary, you would need to basically store a bitbuffer (more coming on this soon ; ) ) of integers
but the question arises of how large each integer should be…if they are too small (use too little bits) then they will overflow and you do not want that
if they are too large then you are being inefficient
so you want to tune the size if you were to use your algorithm (BUT STILL DON’T USE IT)
so what you want to do is convert it to binary (if you use all 256 characters then it is already in binary) and then apply a lot of standard compression algorithms (@XAXA’s huffman is a great one you could use)
If you don’t use all 256 characters, you should convert it to binary using a bitbuffer such as (Near Optimal DataStore) BitBuffer and then apply the compression algorithms (again, you want to spam standard lossless compression algorithms and find the optimal permutation to use-sometimes you may want to use one then another and then the original again)
Based on what you and @XAXA said, if I encode into hexadecimal (to allow me to use more than 256 characters in the Huffman algorithm) (I’ll need to figure this out) and then I compress using the Huffman algorithm, should that work in compressing the strings?
hexadecimal is base 16 (2^4) so you are still missing out on 256-14 possible characters if you are storing as a string
can you tell us a little about what you are specifically trying to compress? if its an array of numbers then that can be easily stored in a bitbuffer for something on the order of 10,000% savings and you won’t even need to compress with the standard algorithms (they probably won’t even be able to get a bitbuffer down to less than 30%)
Since all the other players in the lobby need to know it. (This is part of a huge cross server matchmaking system,) The username is so it can be shown in the gui and the userId is used for backend code
can you not use Players:GetNameFromUserIdAsync(userId)?
edits:
local function serialize(plrs)
local buff=bitbuffer()
buff:ushort(#plrs)
for _,t in next,plrs do
buff:ulong(t.userId)
end
return buff:ds()
end
local function deserialize(s)
local buff=bitbuffer.ds(s)
local plrs={}
for i=1,buff:ushort()do
local uid=buff:ulong()
plrs[i]={
userId=uid,
username=game:GetService'Players':GetNameFromUserIdAsync(uid)
}
end
end
this code will serialize and deserialize your players array into a bitbuffer (use the bitbuffer module I linked) and you shouldn’t run into issues with it getting too big at all
wiki doesn’t mention anything about a rate limit but you should test it out just to be safe
players dont need to be online/in same server for this function
That could work, long as the player doesn’t need to be online the same server for that. (and the function isn’t ratelimited)
I haven’t written the GUI yet, so I already changed my code to not have the username inside of it anymore. I can manually implement the GetNameFromUserIdAsync(userId)
It still doesn’t solve the problem with the JSON gets too big from tons of players joining the same lobby though.
this solves that problem
try running serialize and checking the length of the output string and compare it with json serializing when the player array is like 1k or 10k
local bitbuffer=_G.bitbuffer
local function serialize(plrs)
local buff=bitbuffer()
buff:ushort(#plrs)
for _,t in next,plrs do
buff:ulong(t.userId)
end
return buff:ds()
end
local function deserialize(s)
local buff=bitbuffer.ds(s)
local plrs={}
for i=1,buff:ushort()do
local uid=buff:ulong()
plrs[i]={
userId=uid,
username=game:GetService'Players':GetNameFromUserIdAsync(uid)
}
end
end
local plrs={}
for i=1,10000 do
plrs[i]={userId=113926330,username='CoreDev'}
end
print(#serialize(plrs))
print(#_G.http:JSONEncode(plrs))
(_G is conecting to my framework so ignore it)
output is:
81544
420001
so its 5x improvment (this uses base 96 or something not base 256 because I’m not sure if whatever you are doing supports base 256 but you can swap out the ds calls with the 256 ones)
Just discovered a mistake in my sample JSON, I also supply the jobId of the server for each of the players. (There’s a valid reason) So this brings me back to the compression, how can I do that?
i thought there was some api where you could get the server the player is in, but if not you should take advantage of the fact that the string is alphanumerical when serializing it (no method currently exists for that in my bitbuffer module yet, i will actually right one right now so it will take like 3 mins-in the meantime maybe look for an api? or will that not work for you?)