Increase server memory or provide better memory tools

Preface: Empowering innovation.

Roblox has gone truly above and beyond what it used to be 6 years ago, and definitely for the better. The team at Roblox has provided us with incredible tools to enhance the visual effects of our games at arguably low performance cost.
The introduction of PBR textures, material variants, better atmosphere, Future is Bright, dynamic meshes, skinned meshes, rigging, wind, and so on has made possible, in a relatively short time, what we could only have dreamed of, especially considering the maintenance cost and investment that those features required and still require. Truly, Roblox is in my opinion headed in a great direction for the players, the creators, and the company itself, but there is one area that has been lacking and that I feel has been overlooked for a long time: servers.

Right now, our studio is hitting a brick wall which so far shows no cracks. The server’s memory limit of 6.25 gigabytes was great years ago, but with the new enhancements it has become insufficient.

I am usually not one to complain when something doesn’t hold my hand; in fact, I think most programmers like a certain amount of challenge from time to time. But with the amazing new tools that Roblox has provided us, it has become increasingly difficult to use those features while also respecting the server’s limited memory. In fact, there are many times when I’m completely puzzled as to what is left for me to do to save even more server memory.

Context:

Frontline: Karelia has been under development for 2 years now, with the initial release dating back to February 2022.

Initially, the game had only 3 maps, stored in ServerStorage, some of which were terrain regions with parts, some others were exclusively parts.
As time went on, we started adding more and more to our maps, in an effort to break records and go truly beyond anything made on Roblox. We started putting an emphasis on map quality first and foremost, with great texture quality, the full PBR capabilities, material variants, lighting, meshes, intricate level design, and careful tuning of collision fidelities and hitboxes.

As of today, the studio is working on a truly huge update: update 1.5 brings in two whole new maps, new classes, additional material variants, more meshes for more details, etc.
But during our testing session, the server crashed after the maps were loaded 3 times, precisely 3. After we fixed memory leaks, the issue persisted.
So what’s the issue?

Server caching, the physics octree, the lack of memory management tools, and the 6.25 GB memory limit.

  • The first glaring issue, which already eats a huge chunk of usable memory, is the physics octree. The octree by itself is a necessity, but maps that are located in ServerStorage, where no physics calculation is done whatsoever, are fully loaded into the physics octree, or at least are loaded into the “PhysicsPart” category of the server memory. This should not be the case at all. All maps combined take up to 800 MB of memory, which is never going to be used for any physics calculation whatsoever.

Here’s a screenshot of the server memory of an empty baseplate:

Here’s a screenshot of the server memory of the same empty baseplate with all maps added into the server storage:

This is Studio, so we can easily halve that number for an accurate representation of what the actual server has in storage.
800 MB of data that is, apparently, never going to be used.

On top of that, I tried setting the collision, touch, and query properties of all the parts in said maps to false, in an effort to at least remove them from the PhysicsPart tree, with no success.
Maybe I am missing something?
When the maps are cloned to be loaded into the workspace, that memory category can rise to over 1.5 gigabytes, and thankfully when the map unloads, it decreases back to 800 MB. But there’s a catch, or rather, there’s a cache:

  • Workspace loaded maps (Cloned), which are then unloaded (Destroyed), are immediately cached by the server in the “UntrackedMemory” category. This is bad, really bad. All maps have their own quirks, with some of them having unique meshes that may not be reused in other maps, all maps have unions which are definitely not being reused elsewhere, and surprisingly despite the maps already existing in the server storage, the map gets fully cached when unloaded.

In the following screenshot you can see:
The initial untracked memory of the server during the map voting phase, which stagnates at 350 MBs:

Once our worst-case-scenario map loads in, the untracked memory rises to 480 MB. So far so good:

When the map unloads, and if we load the same map again, you can see a spike as the server caches the map as it unloads (unloading is deferred on this particular map, because of the huge data size it takes, despite our effort to minimize that impact). Once the server loads the same map again (using :Clone()), the server will reuse that cached data and the memory drops back to 480 MBs. Note the spike to 1.5 GBs.

But we have already hit the wall. As we previously saw, the caching done by the server is detrimental to our project, because once another map is voted in, the unloading phase caches all of that data into untracked memory (despite us explicitly using :Destroy()), and then the server loads the new map, which doesn’t use the same assets as the previous one. As such, we have a huge spike in UntrackedMemory that never fully goes down, ever:

We have 8 maps, and this applies to all of them. This can lead to a huge inflation of the UntrackedMemory. Despite this, we intend to have more maps. Why? Because this is what Roblox is all about after all. We want to break new ground; we want to truly bring to the platform what has never been seen before, using all of the tools Roblox provides to us. The platform is going in a direction where experiences are now available for a more mature audience and where those amazing tools are at our disposal, and we can clearly see it in the official funding of Frontline (Game Fund project), where the guns, maps, and characters are all fully PBR, with absolutely mind-boggling material variants, effects, sounds, etc.

But if we apply that logic to the full map roster, UntrackedMemory quickly climbs past 2.5 to 3 gigabytes. Add some memory leaks, Roblox’s own systems, and 44-player servers (and we intend to go higher), and the server shuts down after 3 maps have been loaded in. Of course, we could do things differently, and there are obviously things that we, as a studio, need to fix. Fixing memory leaks, offloading some work to the clients, and so on are all necessary improvements at this point, but their impact is tiny compared to the immense amount of data the server caches that we do not want cached.

We want the server to destroy the map, not cache it. The map is already there anyway, in server storage, so is there any need for caching? This is hugely limiting our ability to improve the game, and is detrimental to us. There is a loading screen when the server loads the map, so it doesn’t really matter if it takes 10 more seconds to load a map that doesn’t exist in that black box of a cache, over which we have no control.

Speaking of having control over memory, here are the solutions that we discussed as a team:

  • Provide us with better memory management tools. Yes, this is blunt, but as I have said plenty of times, Roblox is headed in a direction to empower creators and innovate; this is simply not possible if we do not have proper memory management tools that let us clean the cache, clean certain areas of the cache, or outright prevent caching for certain instances. A new destroy method could be implemented to prevent the aforementioned issue.

  • Increase the max server memory. As mentioned in the preface, 6.25 GB was very good a few years ago, but it is simply not enough now for games that truly utilize all of the capabilities that the engine provides to us. And I am willing to bet that funded projects may face the same issue rather quickly at the pace they are going.

  • Do not populate the PhysicsPart memory category (which is most likely the physics octree part of the server, though we cannot be certain since there is no documentation on it) when anything is in ServerStorage.

I feel like this feature request is long enough, and highlights the major drawbacks of how servers currently behave.
I would like to conclude by saying again that this platform is going in a direction that is beneficial for everybody: better experiences, a huge number of varied and diverse communities, amazing tools. All of that is being limited by the servers, which have been overlooked for too long. It has become simply impossible to use the new tools to their full extent. This forces creators to make harmful concessions like cutting down on content, which means a loss of revenue for both the platform and the creators themselves, and a source of immeasurable frustration. We really want to prove that there is much more to Roblox than wacky blocky characters, that full immersion is possible, that triple A games are not just a dream but a real, solid possibility. But for that we need improvements on the servers, or there will be no way for us to use everything at hand in anything other than barebones games or showcases.

413 Likes

I think I speak for all of us 195 users at the moment who have liked this feature request. I am absolutely speechless, and nothing needs to be said.

Except for one thing. Have you ever tried caching the map yourself so that it doesn’t go untracked? Parent it from server storage to workspace and then back to server storage.

Now, let me further discuss some points that may need clarity for other users:

We should really have more server options in Workspace, such as memory settings, so we can effectively fine-tune all aspects of the game. The same goes for any other area where a feature is lacking.

I’ve heard some pretty bad things about unions recently.

This is just one example. I heard somewhere that unions cause performance problems, but I cannot confirm this is true. I’m sure someone among those 195 people who liked the post understands and would probably agree with me on this.

18 Likes

Hello!

Yes these issues are truly overwhelming. I don’t understand some of these design choices, or perhaps these are bugs that I should report using the bug report category. Nevertheless, we still need more memory or better memory management tools.
I mean, even just having access to a forced collectgarbage() call would help tremendously, not because it would magically fix all of the issues, but because we would be able to accurately monitor when garbage collection happens (since we would call it ourselves) and track the difference in memory, pinpointing potential memory leak sources.

I’d like to go over your suggestion of manually caching the map. I thought about that. I even thought about letting the clients instantiate the map themselves, using the server purely as a data relay, but one of our recent additions is map destruction, where certain parts can be cut down into smaller parts by certain types of explosives. This makes such a system far more difficult to build now; it should have been done at the start of the project. I wasn’t really prepared to hit this kind of roadblock, and now the potential workarounds require a lot of engineering and rewriting of vital systems, which would introduce many potential A-tier bugs. It would need a lot of time that I honestly don’t really have right now either.

Modern triple A games have full control over their servers, which isn’t the case here. It’s possible to make great quality games on Roblox, and networking, sockets, protocols, routing, APIs, etc. are all handled by Roblox, which makes things easier and cuts down on expertise and dev time. But the servers’ hard caps are now very limiting, especially considering that the higher-ups want devs to make higher quality games with more mature, serious content and impressive visuals.

As for the union issues, I’ve heard a lot of stuff too, and I have experienced some issues with unions myself. Although rare, they are annoying. The main problem with unions is that, as with any boolean operation, there are a lot of garbage vertices, extra edges, and hidden faces that in a 3D modeling program you’d at least be able to remove manually after the operation. The topology isn’t great because it’s obviously generated by code, and as good as the code may be, there are always edge cases that (for now) only a human brain would be able to process and fix accordingly.

I already did a good optimization pass on certain maps, turning background objects into heavily pared-down meshes with a simple 256x256 texture. It seems to have helped, but it still doesn’t fix the issue of the physics parts being loaded into memory while still being in server storage. It makes no sense to me, especially when they are set to Collide/Touch/Query false with box collision.

14 Likes

(Hint: You can bypass the server memory limit, all the way up to 12,500 MB, by setting the max players to 700 (provided you have access to beta features) and then using a lobby system to manually limit players per server.)

But yes, better memory tools would be vastly helpful.

21 Likes

The problem with that is, as far as I know, the website’s server browser doesn’t track those custom servers / reserved servers with special places, and players will think the game is empty. This may reduce the number of players joining the game and completely kill the playerbase. I never really liked using the teleport service with reserved servers for that kind of reason.

17 Likes

That hasn’t stopped any of my various games, Rise of Nations included, which numbers in the thousands.

I wager it’s better than servers crashing very soon.

8 Likes

Your playerbase being established helped with that. My playerbase is only in the hundreds, and it can be pretty demotivating for players to join a game when there are no servers listed in the web browser, especially for a desktop-only game.

16 Likes

It really isn’t.

I too had to start from somewhere.

A playerbase of hundreds is plenty for a lobby system to work without fail. What really matters is the player count people see when clicking on your game. Hell, even Naval 1945, one of my other games with a concurrent playerbase of dozens, still gets sizable numbers of players, with or without the lobby.

12 Likes

I will keep that in mind. However, as I previously mentioned, this is the kind of effort that I don’t really have enough time to put in. On top of that, we shouldn’t really have to rely on such hackiness in the first place. I’m not really sure I want to give up the simplicity of just clicking a button to play the game in favor of a lobby for more server memory, with the potential of losing a sizeable portion of the playerbase, even if you guarantee me that this won’t happen.

All of the issues I originally brought up in the feature request just should not happen. The octree being loaded with maps that sit in server storage really makes no sense whatsoever to me. Sure, you have to keep the instances somewhere, but their collision data and partitioning do not have to be loaded into memory.

This feature request would literally benefit the entire platform, both games that are already live and future games, especially the upcoming PBR projects that are being funded by Roblox themselves.

11 Likes

I am surprised the memory isn’t at least 12 GB. RAM isn’t that expensive these days either.

16 Likes

Considering the average server has at least 32-64 GB of RAM, yes, it is not expensive.

18 Likes

I believe I am experiencing the same thing in my game, which cycles between terrain maps. Check the topic I made here and reply with your case if you can: Terrain.PasteRegion memory leaks
I am convinced there is some kind of memory leak when loading/unloading terrain regions. Clearing the terrain doesn’t clear the UntrackedMemory (why is it even untracked??). I had to implement an auto restart after only 8 hours because servers would otherwise reach their memory limit due to the map cycling. It’s really nonsensical that so much stuff gets thrown into the ‘UntrackedMemory’ category with no way to even determine what it could be.

9 Likes

Interesting!! I think this feature request should turn into a bug report of its own. I’m pretty sure it will get even more traction there than here. For some reason it feels like Roblox staff aren’t reading a whole lot in this devforum category…

6 Likes

It would be nice if Roblox would do a hard GC sweep before just shutting the server down too as it nears the limit. The incremental sweep model is nice for performance but I’d take a 50ms freeze over a server crashing any day.

One thing I’ve noticed is that if your game actually just needs a lot of memory, it can take a very long time for GC to crawl through the whole heap, which is problematic since it means it might take 2+ minutes for unreferenced data to be collected and cleaned up. A lot of waste occurs in that time. This can also make you think you have a memory leak when you actually don’t; you just need to wait around for 2 minutes before the memory is deallocated. So perhaps some way to signal to the GC that we’re done with some data and have it force a cleanup on the next step might make sense. I think Roblox is really trying to go for an “it just works” approach, but I think there are always going to be edge cases where it doesn’t.

One case I’ve observed, and this could 100% be optimized, is where we have this really old script with a Heartbeat connection that calls Players:GetPlayers() every step, so ~60 times per second. We have 55-player servers, so it holds on to 55 player references 60 times per second for 2 minutes, which is about 396k references to players before GC happens. This only consumes a few MB of RAM, but it looks as though it might be a memory leak when, in fact, it is not. When you think about all of the different loops allocating data that don’t actually keep any references past that single frame, it can really add up.
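For anyone who wants to sanity-check that figure, here is a rough back-of-the-envelope sketch in Python. The player count, step rate, and two-minute GC delay come from the paragraph above; the 16 bytes per array slot is purely an assumption for illustration, not a documented engine number.

```python
# Back-of-the-envelope estimate of references waiting for garbage collection.
PLAYERS_PER_SERVER = 55      # from the post: 55-player servers
STEPS_PER_SECOND = 60        # from the post: Heartbeat at ~60 steps per second
GC_DELAY_SECONDS = 2 * 60    # from the post: ~2 minutes before collection
BYTES_PER_SLOT = 16          # ASSUMPTION: hypothetical size of one array slot

# Each step allocates one fresh array holding one reference per player.
arrays_pending = STEPS_PER_SECOND * GC_DELAY_SECONDS
refs_pending = arrays_pending * PLAYERS_PER_SERVER
approx_megabytes = refs_pending * BYTES_PER_SLOT / 1_000_000

print(f"{arrays_pending} arrays, {refs_pending} references, ~{approx_megabytes:.1f} MB")
```

Under those assumptions the result is 396,000 pending references spread over 7,200 short-lived arrays, which lines up with the "few MB" estimate: small in absolute terms, but easy to mistake for a leak while it waits to be collected.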

Obviously you don’t want to just clean up all references every frame, but some sort of way to prioritize the GC might be helpful here, combined with a full heap sweep before killing a server and I’d imagine most games that don’t have actual memory leaks might be able to survive on 6.25GB of RAM.

14 Likes

Interesting. Does setting it to nil in the scope of the loop change anything?

4 Likes

No, setting something to nil doesn’t deallocate the memory; it just removes a reference to the allocated memory. The garbage collector will eventually crawl over the portion of the heap that contains that allocation and then deallocate it, provided there are no strong references to it remaining.

5 Likes

Try keeping a lookup table to which you add and remove players as they enter and exit the game, or during the relevant events in your game. This will help.
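A minimal sketch of that idea, written in Python only to show the pattern (in an actual game this would be Luau hooked up to the player join and leave events). The `PlayerRoster` class and its method names are hypothetical, made up for this illustration.

```python
class PlayerRoster:
    """Keeps one long-lived table of players instead of rebuilding a list every step."""

    def __init__(self):
        self._players = {}  # user_id -> display name

    def on_player_added(self, user_id, name):
        # Called once when a player joins.
        self._players[user_id] = name

    def on_player_removing(self, user_id):
        # Called once when a player leaves; pop() with a default ignores unknown ids.
        self._players.pop(user_id, None)

    def __len__(self):
        return len(self._players)

    def __iter__(self):
        # Iterates the existing table; no new list is built per step.
        return iter(self._players.values())


roster = PlayerRoster()
roster.on_player_added(1, "Alice")
roster.on_player_added(2, "Bob")
roster.on_player_removing(1)
print(len(roster), list(roster))
```

The per-step loop then iterates over the roster directly, so the only allocations happen when somebody joins or leaves rather than 60 times per second.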

As I said in the feature request, for me the highest impact comes from maps in server storage being fully loaded into the physics octree, and from them being cached into untracked memory when removed.

The kind of semi-micro-optimization that you’re talking about is something I have mostly ruled out across the whole codebase. Sure, there are some memory leaks as the servers age, but it’s nowhere near as bad as it used to be before the aforementioned problem arose, since I had to tackle almost every memory leak before releasing the update.

Either 6.25 gigs is not enough, or we need better tools to make sure GC happens when a map unloads, or Roblox needs to fix the caching/octree issue.

4 Likes

I wasn’t implying that my experience was the chief issue, or that it was even an issue at all. My whole purpose with my statement was to point to the fact that having tooling to prioritize deallocation of memory would be ideal, and such a case was one of many possible scenarios where it presented itself. What I’m essentially saying is that as the heap grows, the problem that I’m talking about also grows, because it takes GC longer to crawl the heap. Why is that a problem? Because these sorts of “memory growths” are extremely common, and they add up to several hundred MB of data, and unless you’re aware of what I just explained then you’re most likely going to think this is a memory leak, as I did. Granted, it’s all more about being educated in how GC works than it is about the “issue” itself.

This is a feature request, not a request for help. I’m simply sharing my experience with memory use. I’m quite aware of how simple the problem I explained is to fix, I just thought it was a noteworthy experience that other developers might also benefit from hearing about.

If you read what I said, I literally agreed with you that we need better tools to more aggressively deallocate memory or we need more memory. :wink:

3 Likes

Sorry, I completely misunderstood what you said.

You’re right, what you shared as a personal experience is a good example of how easy it is to grow the heap, how long it might take for the GC to collect and free memory that can quickly grow to hundreds of MB, and how easy it is to mistake that for a memory leak when it’s actually intended behavior.

In the long run you start figuring out that what you’re seeing is not a memory leak, but you need some experience with GC and caching to realize that.

Apologies for the misunderstanding!

4 Likes

Have you considered using multiple places within one experience? Make one place per map plus one lobby place, and use that to teleport people around?