Community made content discoverability?

nicemike40 · March 2, 2021, 10:48pm

Hmmmm just thinking out loud here… the data limit for a single entry in a data store is 4,000,000 characters after JSONEncode.

Could you just store all the level metadata in a few giant entries in a metadata datastore and, on server load and when request limits allow, just refresh your in-memory copy of all of those giant entries?

So you just have two datastores, neither of which is ordered:

Your metadata store, which is really just a holder for huge arrays of level metadata.
Your level data store, which is where you store anything that you’d need to actually load a single level (like block layout, etc.). This one would be a traditional map from level GUIDs → data.

Something like:

Example metadata datastore code

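local DataStoreService = game:GetService("DataStoreService")
-- "LevelMetadata" is a placeholder name for the metadata store
local metadatastore = DataStoreService:GetDataStore("LevelMetadata")

local function RefreshMetadata()
	-- GetAsync returns nil if the key has never been written
	local numPages = metadatastore:GetAsync("NUM_PAGES") or 0
	local metadatas = {}

	-- collect all pages from meta data store into metadatas array
	-- (assumes each page is stored as { NumLevels = n, Levels = {...} })
	local totalSize = 0
	for i = 1, numPages do
		local metadataPage = metadatastore:GetAsync("page" .. tostring(i)) -- dynamic paging
		if metadataPage then
			totalSize += metadataPage.NumLevels -- number of levels in this page
			table.insert(metadatas, metadataPage)
		end
	end

	-- merge all pages into one massive array (or don't do this and keep them separate?)
	local merged = table.create(totalSize)
	local lookup = {} -- keyed by level ID, so there is no array part to preallocate
	local mergedIdx = 1
	for i = 1, #metadatas do
		local levels = metadatas[i].Levels
		for j = 1, #levels do
			merged[mergedIdx] = levels[j]

			-- for O(1) lookup of metadata from level ID:
			lookup[levels[j].LevelId] = mergedIdx

			mergedIdx += 1
		end
	end

	return merged, lookup
end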
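
For the write side, here is a rough sketch of how a flat list of level metadata might be chunked into pages before saving. It assumes a page looks like { NumLevels = n, Levels = {...} }; BuildPages and MAX_LEVELS_PER_PAGE are made-up names, and a real page size would have to be chosen so the encoded page stays under the per-entry limit. It is plain Lua with no data store calls, so it only shows the chunking.

```lua
-- Hypothetical page size; a real value would be picked so that the
-- encoded page stays well under the per-entry size limit.
local MAX_LEVELS_PER_PAGE = 3

-- Splits a flat array of level metadata into pages shaped like
-- { NumLevels = n, Levels = { ... } } (an assumed layout).
local function BuildPages(allLevels, maxPerPage)
	local pages = {}
	for i = 1, #allLevels do
		local pageIndex = math.floor((i - 1) / maxPerPage) + 1
		local page = pages[pageIndex]
		if page == nil then
			page = { NumLevels = 0, Levels = {} }
			pages[pageIndex] = page
		end
		page.NumLevels = page.NumLevels + 1
		page.Levels[page.NumLevels] = allLevels[i]
	end
	return pages
end

-- Sample data: seven levels end up in pages of 3, 3 and 1.
local levels = {}
for i = 1, 7 do
	levels[i] = { LevelId = "level-" .. i, Name = "Level " .. i }
end
local pages = BuildPages(levels, MAX_LEVELS_PER_PAGE)
```

Each entry of pages would then be written under its own key ("page1", "page2", ...) along with the page count.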

Pros:

Can easily process/sort local table for things like rank and timestamp
Fast lookup of metadata by ID
Can keep actual level data in another, standard datastore.
Can keep (UserId → List) info in another standard datastore to find levels by specific person
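
To make the first point concrete, a minimal sketch of pulling the newest levels out of the in-memory array. The CreatedAt field and the NewestLevels helper are assumptions for illustration; the metadata could just as well carry a rank or play count.

```lua
-- Returns up to `count` entries from `merged`, newest first.
-- Assumes every entry has a numeric CreatedAt timestamp (hypothetical field).
local function NewestLevels(merged, count)
	-- sort a copy so the caller's array (and any index lookup into it) stays valid
	local sorted = {}
	for i = 1, #merged do
		sorted[i] = merged[i]
	end
	table.sort(sorted, function(a, b)
		return a.CreatedAt > b.CreatedAt
	end)

	local result = {}
	for i = 1, math.min(count, #sorted) do
		result[i] = sorted[i]
	end
	return result
end

local merged = {
	{ LevelId = "a", CreatedAt = 100 },
	{ LevelId = "b", CreatedAt = 300 },
	{ LevelId = "c", CreatedAt = 200 },
}
local newest = NewestLevels(merged, 2)
```

Sorting a copy costs extra memory, but it keeps any ID → index lookup built against the original array from going stale.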

Cons:

Metadata datastore is going to be a complicated pain to maintain
A whole lot of memory usage (like probably enough that it kills this idea)
Sorting will probably be pretty slow, but you only have to do it once on refresh

Upon reflection, this is probably not workable. I think dealing with request limits would be easier.

Ukendio · March 2, 2021, 10:59pm

The game I previously mentioned could handle approximately 200-400 unique parts (before the 4MB limit update). Users could instance a normal BasePart, colour it, resize it, etc. This was important because it didn’t leave much budget for the system. If your users could only move/colour pre-made assets, there’s a lot of implicit optimization that can be done that isn’t possible in the former case.

But I will assume worst case scenario!

On an abstract level, this is what you want:

However, this is not ideal, and I will explain why a bit later. Cloud services mostly use relational databases, and we can gain some of those benefits with an entity-relationship model. So we will implement a pseudo version of that! It is essentially the same as your idea, one of the differences being that the levels are held by a separate host (globalDataStore).

Consider the use case where the player wants to read the 10 newest levels: you sort by the created_at_date value and you get instant access to the levels. In your model, by contrast, you have to sort by timestamp and then double back with the level key to get the level.

You could use a technique called sharding in combination with a CDN (Content Delivery Network), where we divide the levelService’s database into multiple hosts by their geographic location.

That way you can have a larger total number of levels, just fewer per region. Also consider sharding the metadata and loading data in chunks (pages), instead of making one request to read all keys at the same time!
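
A small sketch of the "load in chunks" part, assuming the metadata lives under keys like "page1", "page2", and so on. FetchPage and fakeStore are in-memory stand-ins invented for this example (a real version would call GetAsync on the data store), and PageCursor is a hypothetical helper name.

```lua
-- In-memory stand-in for the metadata store (hypothetical data).
local fakeStore = {
	page1 = { { LevelId = "a" }, { LevelId = "b" } },
	page2 = { { LevelId = "c" } },
}

-- Counts how many "requests" were made, to show that pages load lazily.
local requestCount = 0
local function FetchPage(index)
	requestCount = requestCount + 1
	return fakeStore["page" .. tostring(index)]
end

-- Returns a function that fetches the next page on each call,
-- or nil once no further page exists.
local function PageCursor(fetchPage)
	local nextIndex = 1
	return function()
		local page = fetchPage(nextIndex)
		if page ~= nil then
			nextIndex = nextIndex + 1
		end
		return page
	end
end

local nextPage = PageCursor(FetchPage)
local first = nextPage() -- only one request has been made at this point
```

The client only pays for the pages it actually scrolls through, which spreads the request budget out instead of spending it all up front.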

Note (not relevant to database design): Depending on the User Experience you seek, you will have a bit of fun with different data structures. There’s the sorted array, which is O(1) for access in both the best and worst case (relevant for the client), but insertion and deletion are O(n), and this would be naive to implement. A stack would be pretty optimal for the server, but that’s not really important since clients won’t query the server 24/7. I suggest looking at skip-lists or balanced search trees if you expect the game to scale a lot.
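
To illustrate the sorted-array trade-off, here is a minimal sketch that keeps an array ordered newest-first. The binary search finds the slot in O(log n), but table.insert still shifts every later element, which is where the O(n) comes from. InsertSorted and the CreatedAt field are assumptions for the example.

```lua
-- Inserts `entry` into `sorted`, which is kept in descending CreatedAt order.
-- Returns the index the entry was placed at.
local function InsertSorted(sorted, entry)
	local low, high = 1, #sorted
	-- binary search for the first slot whose CreatedAt is <= the new entry's
	while low <= high do
		local mid = math.floor((low + high) / 2)
		if sorted[mid].CreatedAt > entry.CreatedAt then
			low = mid + 1
		else
			high = mid - 1
		end
	end
	-- shifting the tail of the array is the O(n) part
	table.insert(sorted, low, entry)
	return low
end

local feed = {}
InsertSorted(feed, { LevelId = "a", CreatedAt = 50 })
InsertSorted(feed, { LevelId = "b", CreatedAt = 10 })
InsertSorted(feed, { LevelId = "c", CreatedAt = 30 })
local position = InsertSorted(feed, { LevelId = "d", CreatedAt = 40 })
```

Reading the newest N entries afterwards is just indexing feed[1] through feed[N].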

EgoMoose · March 2, 2021, 11:28pm

Hmm so it seems the general consensus is that for this to be feasible I’d have to host my own DB. Not super keen on that b/c I don’t really want to incur and commit to those upkeep costs (it’s one of the main pros of using Roblox as a platform!)

I do appreciate the information that everyone has contributed to this thread though! I have learned a lot! Thanks!

Ukendio · March 2, 2021, 11:35pm

No!! I wasn’t one of those who advocated for a 3rd party DB. It takes a lot of work to maintain a db and it is especially frustrating when you’re directly accountable for its downtime.

You can make a pretty good data library with Roblox’s DataStore!
All methods and features I wrote about can be implemented in Roblox!

ArgonSuits · March 3, 2021, 4:04am

So, if I understand correctly, you are trying to create a content feed system. In other words, the Roblox website. Why not have a tag system, where people can assign tags to their level, plus some internal tags like “new”, “popular”, “growing”, etc.?
You could have the dataKey for the map, and a parallel part for the metadata, and give them a “summary” list based on an algorithm, or give them the “full” list (loaded and unloaded in segments) ordered by selectable metrics.
If you mean what method is best for discoverability, most places I see use a mix of general popularity, how tags relate to the user’s “preferences”, and an algorithm that gives something new a chance to explode.
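
A tag system like that could be backed by a simple inverted index from tag to level IDs. This is only a sketch: the Tags field, the sample tags, and the BuildTagIndex helper are assumptions for illustration, not anything from an existing system.

```lua
-- Builds a map of tag -> array of level IDs carrying that tag.
-- Assumes each level has a LevelId and a Tags array (hypothetical fields).
local function BuildTagIndex(levels)
	local index = {}
	for _, level in ipairs(levels) do
		for _, tag in ipairs(level.Tags) do
			local bucket = index[tag]
			if bucket == nil then
				bucket = {}
				index[tag] = bucket
			end
			bucket[#bucket + 1] = level.LevelId
		end
	end
	return index
end

local levels = {
	{ LevelId = "a", Tags = { "new", "puzzle" } },
	{ LevelId = "b", Tags = { "popular" } },
	{ LevelId = "c", Tags = { "new" } },
}
local tagIndex = BuildTagIndex(levels)
```

Internal tags like “new” or “popular” would be assigned by the server rather than the level's author, but they can sit in the same index.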