Storing Massive Amounts of Data in a DataStore

Hi, I’m working on a project and I need to store two sorted tables (arrays) of 15 million numbers each (values from 1 to 100,000,000).
I assume that putting them into a script is impossible, so I want to use DataStoreService.
I am aware of the 4194303-byte per-key limit and the “total keys” limit, but I can still store all the data by partitioning it. I’m also not worried about read/write speed.
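For anyone wondering what the partitioning could look like, here is a minimal Python sketch of the splitting step. It assumes the limit applies to the JSON-encoded size of each value, and it uses the 4194303 figure quoted above; the `partition` function and its parameters are made up for illustration and are not part of any Roblox API.

```python
import json

MAX_BYTES_PER_KEY = 4_194_303  # per-key limit quoted in the post above


def partition(values, max_bytes=MAX_BYTES_PER_KEY):
    """Split values into consecutive chunks whose compact JSON encoding fits in max_bytes."""
    chunks, current, size = [], [], 2  # 2 bytes for the enclosing "[" and "]"
    for v in values:
        # Each number costs its digits, plus one comma if it is not first in the chunk.
        cost = len(str(v)) + (1 if current else 0)
        if current and size + cost > max_bytes:
            chunks.append(current)
            current, size = [], 2
            cost = len(str(v))
        current.append(v)
        size += cost
    if current:
        chunks.append(current)
    return chunks


# Tiny demonstration with an artificially small limit.
demo_chunks = partition(list(range(1, 101)), max_bytes=50)
demo_sizes = [len(json.dumps(c, separators=(",", ":"))) for c in demo_chunks]
```

Each chunk would then be written under its own key (for example a hypothetical `table1_1`, `table1_2`, and so on), and reading the table back means loading the chunks in order.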

My question is: will the data store be able to hold all these numbers? (Probably up to 200 MB.)
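As a rough sanity check on the size, here is a small Python estimate. It assumes the numbers are stored as plain decimal text separated by commas (as in JSON) and are spread roughly evenly over 1 to 100,000,000; both are assumptions, and the real figure depends on how the values are actually encoded.

```python
import math


def digits_total(max_value):
    """Total number of decimal digits needed to write every integer in 1..max_value."""
    total, low, digits = 0, 1, 1
    while low <= max_value:
        high = min(max_value, low * 10 - 1)
        total += (high - low + 1) * digits
        low *= 10
        digits += 1
    return total


def estimate_bytes(count, max_value):
    """Estimate encoded size: average digits per number plus one comma each."""
    avg_digits = digits_total(max_value) / max_value
    return math.ceil(count * (avg_digits + 1))


per_table = estimate_bytes(15_000_000, 100_000_000)
keys_per_table = math.ceil(per_table / 4_194_303)  # per-key limit quoted above
```

Under these assumptions each table comes to roughly 133 MB of text, or about 32 keys per table at the quoted per-key limit.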

Are you sure this is the right way to go? 15 million is an insane amount of data to store; 15m key/value pairs in an array of type {[number]: number} would add up to well over 200 MB, and I can’t think of any real use case for storing this much info.

Edit: I tried testing whether a Lua table could handle all this data here, but the site crashed with both table.create and a numeric for loop. I’ll try it on Roblox later since I’m out right now, but if the site crashes I doubt it’ll work.


I probably won’t be able to answer your question, but could you explain why you need to store this many numbers?

There might be alternatives. If you tell us what you need it for, others could propose a different solution, or at least understand what your end goal is.


I’m a dev of “Every Roblox Game Ever” and I’m making a new project called “Every Group Ever”. I won’t go too much into the details, but I wrote a Python program that retrieves all groups from groups.roblox.com and sorts them by popularity (member count).
I’m pretty sure this is the only way to sort all groups by popularity, since there is no endpoint for that (I hope someone proves me wrong).

The reason I want to store two tables is to be able to quickly look up a group’s position in the ranking by member count.
For example:
There are three groups with IDs: 1, 2, 3
The member counts are: 1: 500, 2: 700, 3: 600
Table 1 (IDs sorted by popularity): {2, 3, 1}
Table 2 (For looking up on which position an ID is in the Table 1): {3, 1, 2}

Table 2 isn’t necessary; it just reduces the time complexity of a position lookup from O(n) to O(1).
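The two tables from the example can be built like this in Python. This is only a sketch: `build_tables` is a made-up name, and Table 2 is a dict here rather than an array so that it also works when the IDs are not a contiguous 1..n range.

```python
def build_tables(member_counts):
    """member_counts maps group ID -> member count."""
    # Table 1: IDs sorted by member count, most popular first.
    by_popularity = sorted(member_counts, key=lambda gid: member_counts[gid], reverse=True)
    # Table 2: 1-based position of each ID in Table 1 (matching Lua's indexing).
    position_of = {gid: pos for pos, gid in enumerate(by_popularity, start=1)}
    return by_popularity, position_of


table1, table2 = build_tables({1: 500, 2: 700, 3: 600})

# Without Table 2, finding a position means scanning Table 1: O(n).
slow_position = table1.index(3) + 1
# With Table 2 it is a single lookup: O(1) on average.
fast_position = table2[3]
```

For the three groups above this gives the same Table 1 and the same positions as in the example, and both lookups agree that group 3 is in second place.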

I also considered creating my own API endpoint for this data, which would be a good solution, but I need someone to tell me the easiest way to do that. (I don’t have my own server and can’t keep my PC running all the time.)

While I’m not an expert on hosting API endpoints, it’d help you get past the limitations of DataStores. I tested the massive table and, while it did work in Luau, it’s still dangerously large and made the command bar extremely sluggish afterwards.


Here’s what I found:
You can’t paste a 15,000,000-element table into a script: Studio became unusable when I pasted one with just 100,000 elements.
But if a table is generated at runtime, it can have up to ~60,000,000 elements before a “table overflow” error (regardless of the size of each element).
