GPU Programming

With the recent projects for compiling Luau to native code, editable images, and editable meshes, Roblox engineers should be aware of the long-standing and growing need for GPU access.

To many, the thought of using a scripting language for GPU programming seems impossible, if not an oxymoron. Yet libraries that utilize GPUs from scripting languages have long been available. Consider for example Torch (Lua), PyTorch (Python), TensorFlow (Python), CuPy (Python), and Numba (Python). Even direct access to CUDA wrappers and compilation to kernels is available in PyCUDA (Python) and especially TerraCuda (Lua), which uses Terra for types, similar to Luau.

Access will allow developers to efficiently run kernel algorithms which are difficult for CPUs, and especially for scripting languages, largely due to their data-access patterns. These algorithms are often easily parallelizable en masse and so are perfectly suited to the GPU.

Examples of such kernel algorithms include image processing (filters, convolution, and editing), terrain generation, voxel computations, pathfinding, compression, and many graph algorithms, including neural networks and support vector machines.
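To make the “kernel” idea concrete, here is a minimal CUDA sketch of the simplest case above, a per-pixel brightness filter. Every pixel is independent, so one GPU thread can handle each one; the kernel name and sizes here are made up purely for illustration.

```cpp
// brighten.cu -- minimal sketch of an embarrassingly parallel image
// filter: one GPU thread per pixel, with no inter-thread dependencies.
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

__global__ void brighten(unsigned char* pixels, int n, int amount) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int v = pixels[i] + amount;
        pixels[i] = v > 255 ? 255 : v;  // clamp to the valid byte range
    }
}

int main() {
    const int n = 1024 * 1024;                 // a 1 MP grayscale image
    std::vector<unsigned char> host(n, 100);   // mid-gray everywhere

    unsigned char* dev = nullptr;
    cudaMalloc((void**)&dev, n);
    cudaMemcpy(dev, host.data(), n, cudaMemcpyHostToDevice);

    // 256 threads per block; enough blocks to cover every pixel.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    brighten<<<blocks, threads>>>(dev, n, 50);

    cudaMemcpy(host.data(), dev, n, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    printf("first pixel after filter: %d\n", host[0]);  // expect 150
    return 0;
}
```

The same shape, one thread per data element with a launch sized to cover the whole input, fits most of the algorithms listed above.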

The most obvious challenge to this, besides designing the API, is hardware support. I don’t believe Roblox servers currently have GPUs, and not all player GPUs support CUDA.

If you have a use case that I missed or see a challenge that I overlooked, please share it with us.

104 Likes

This definitely might be an issue on mobile devices. Some of them use APIs as old as OpenGL, but most of them use Vulkan. It seems like it would be a very big thing to maintain.

3 Likes

I was thinking of only supporting it for CUDA-enabled cards. In that case, just the CUDA library would be required.

But I agree, if access to graphics libraries were considered, then OpenGL ES (the subset for embedded systems) would be supported nearly everywhere. Vulkan has pretty universal support as well, with MoltenVK to translate Vulkan to Apple hardware.

2 Likes

That might create issues if game-essential logic (pathfinding, game graphics) is on the GPU. Maybe GPU code would be allowed to run server-side, as Roblox’s data centers probably have GPUs in them.

2 Likes

Given that more than half the platform is mobile, and more consoles are being used, I doubt the CUDA PC market share would be big enough for them to justify it.

Again, just a guess, but I wouldn’t be surprised if barely 1/10th of all Roblox devices had a CUDA GPU.

Cross-platform is possible (Roblox proves it), but probably at the cost of rewriting code to be optimized for separate platforms.

Access to GPU would definitely be useful though.

4 Likes

This wouldn’t be required, as you can simply use GLSL with compute shaders.

2 Likes

This would do wonders for creating shaders.
A world where Roblox devs aren’t limited by the engine but rather by their brain cells.

9 Likes

For GPU compute, there is OpenCL. I believe the current version is somewhere in the 3.x range, and it works with just about any graphics chip that qualifies as a GPU. If Roblox allows GPU access, this is what they would use. I haven’t really messed with GPUs too much beyond the hardwired logic of video accelerators from the 1990s, but there are some very specific requirements, like each memory frame having to be aligned on a page boundary, and things like that.

GPU code is not like regular assembly code. It’s a form of VLIW (Very Long Instruction Word, an approach dating back to the 1980s and later used by Transmeta in the early 2000s), which is a type of microcode. So GPU instructions resemble microcode instructions. If you understand how CPUs work in general, then this will make sense to you.

In a nutshell, microcode represents a series of signal states that operate the hardware of the CPU. Each CISC instruction breaks down into a series of microcode instructions, while a RISC instruction breaks down into one microcode instruction (the execution units of RISC processors are wired logic). The CPU’s execution unit decodes instructions and operates control lines to route data through the different stages of execution and to select specific results. Remember, the CPU’s units, like the ALU, perform all computations at the same time (add, subtract, and, or, not, xor, etc.), and the control lines select which result to use based on what the instruction was. Pipelined CPUs (all modern processors are pipelined) have at the very least five pipeline stages: Fetch, Decode, Execute, Memory, and Writeback. The decode stage is where instructions are decoded into microcode to operate the various units of the execute, memory, and writeback stages.

In reality, it’s much more complicated than that, but that’s the gist. Programming at this level is difficult because you have to understand what each bit does and code it, for each instruction. If the microcode word is 72 bits long, then you have to set all 72 bits for each instruction. Compilers help with some of this, but this level of programming is below bare metal because the software controls the hardware directly and there is no room for error. One mistake can crash the entire state of the GPU, and a physical reset will be needed to recover it.

On top of that, different GPU manufacturers have different instruction sets: NVIDIA’s is different from AMD’s, which is different from Intel’s. Due to these complexities, I seriously doubt that Roblox will allow developers access to this computing resource. With that being said, there is nothing preventing Roblox themselves from using it.

10 Likes

So, what you’re basically asking for is compute shaders? The way you present your request makes no sense in a game-dev environment, let alone Roblox. Just because non-CUDA cards are a rarity in certain workloads doesn’t mean the same holds for games. You’re essentially asking for 100% of non-PC users and 20% of PC users to be locked out of this. All cards support OpenCL, but AMD also has HIP, and I heard that Intel apparently has DPC++.

You don’t need to know CUDA for this. Any reputable engine that supports these shaders will offer some kind of a universal high-level language. Unity uses HLSL, which it can then compile to a language your target API uses (like GLSL). The same goes for Unreal.

I’m not against your request as I’d love to use the GPU to speed up some dumb compute tasks, but you didn’t really explain what you need this for. HLSL and GLSL already allow compute shaders. It might not be as fast as using CUDA directly, but it allows at least some level of cross-platform compatibility. And from the way you worded this, it sounds more like you want to train models on Roblox’s servers.

The biggest problem here is the aforementioned compatibility. Most engines will ask you which platforms you want to support, because some devices simply can’t do certain things in a performant way. In short, phones are holding everyone back once again.

5 Likes

No, it’s the opposite of probable. Game servers have nothing to render, and no work particularly well suited to a GPU today, so it’s not likely the hardware SKUs currently used to run game servers have a GPU at all.

6 Likes

My background in GPU programming is mainly HPC, although I have built my share of triangle renderers in various languages, so I’ve worked more often with kernels than with compute shaders. I don’t envision supporting developers creating their own rasterization pipeline in Roblox (although doubtless some will attempt it, as they already have); they don’t need access to a whole graphics library or pipeline. However, I agree that either a compute shader or a kernel would get the most bang for Roblox’s buck.

Since you wanted to know more about my use cases, I created this topic because I’m currently working on a custom voxel implementation with buffers, editable meshes, and JIT to native. It would greatly help if I could upload chunks to a GPU and let the GPU do the heavy lifting of computing the mesh. The chunk size is currently 32x32x32, so it’s just begging for warps to run over it. It’ll definitely be better than having a client CPU do it.
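To sketch what that could look like (a hedged illustration, not a real Roblox API; the dense occupancy layout and the face counting that stands in for meshing are my own assumptions), a 32x32x32 chunk maps onto CUDA with one thread per voxel, so each 32-thread warp walks one x-row:

```cpp
// chunk_faces.cu -- hedged sketch of a 32x32x32 voxel pass: one thread
// per voxel, so each 32-thread warp covers one row of the chunk along x.
// Counting exposed faces stands in for real mesh generation.
#include <cuda_runtime.h>
#include <cstdio>

const int N = 32;  // chunk edge length

__device__ bool solid(const unsigned char* v, int x, int y, int z) {
    // Voxels outside the chunk are treated as empty (air).
    if (x < 0 || x >= N || y < 0 || y >= N || z < 0 || z >= N) return false;
    return v[(z * N + y) * N + x] != 0;
}

__global__ void countFaces(const unsigned char* voxels, int* faceCount) {
    int x = threadIdx.x;  // 32 threads per block: one warp per x-row
    int y = blockIdx.x;
    int z = blockIdx.y;
    if (!solid(voxels, x, y, z)) return;

    int faces = 0;  // a face is exposed when the neighboring voxel is air
    if (!solid(voxels, x - 1, y, z)) faces++;
    if (!solid(voxels, x + 1, y, z)) faces++;
    if (!solid(voxels, x, y - 1, z)) faces++;
    if (!solid(voxels, x, y + 1, z)) faces++;
    if (!solid(voxels, x, y, z - 1)) faces++;
    if (!solid(voxels, x, y, z + 1)) faces++;
    atomicAdd(faceCount, faces);
}

int main() {
    unsigned char host[N * N * N];
    for (int i = 0; i < N * N * N; i++) host[i] = 1;  // fully solid chunk

    unsigned char* dVox; int* dCount; int count = 0;
    cudaMalloc((void**)&dVox, sizeof host);
    cudaMalloc((void**)&dCount, sizeof(int));
    cudaMemcpy(dVox, host, sizeof host, cudaMemcpyHostToDevice);
    cudaMemcpy(dCount, &count, sizeof(int), cudaMemcpyHostToDevice);

    countFaces<<<dim3(N, N), N>>>(dVox, dCount);  // 32x32 blocks, 32 threads

    cudaMemcpy(&count, dCount, sizeof(int), cudaMemcpyDeviceToHost);
    printf("exposed faces: %d\n", count);  // 6*32*32 = 6144 for a solid cube
    cudaFree(dVox); cudaFree(dCount);
    return 0;
}
```

A real meshing kernel would emit vertices into a buffer instead of counting, but the thread-to-voxel mapping is the part that matters.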

The question now is: What is a common subset of features for either compute shaders or kernels that would bring the most benefit to Roblox developers with the most uniform implementation details for various platforms and minimum development time?

I wonder how the RCC thumbnail rendering works. Maybe it’s done on dedicated servers, or with CPU-based rendering?

I guess they use a GPU, though that isn’t really the point. Graphics-wise, a single 1050 per thumbnail server should be able to handle the entire playerbase, since we’re talking about rendering something like 3 images every week or two per active player. Developer GPU access, on the other hand, would mean equipping every game server with graphics cards capable of providing client-grade performance for each instance running on them.

They would just have a separate bunch of machines specifically for that, instead of needing to equip each machine running game servers with a full-on GPU.

1 Like

This would open some interesting possibilities, such as the ability to run local LLMs. I definitely think it’s something Roblox should consider.

1 Like

Continuing the discussion from Summary of the Creator Roadmap AMA (Jan 24, 2024):

4 Likes

An easy way for Roblox to implement this would be to have a user-facing “GPU Compute” library with either OpenCL or CUDA (maybe Close to Metal) as the backend.
If neither of those is available, the GPU library would just do the computations on the CPU like it does currently, maybe with a few more optimizations.
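The dispatch half of that idea is easy to sketch. Below is a hedged CUDA example: cudaGetDeviceCount is a real runtime call, but the scale wrapper around it is hypothetical, and a production backend would presumably probe OpenCL the same way.

```cpp
// fallback.cu -- sketch of "use the GPU if present, else the CPU".
// cudaGetDeviceCount is a real CUDA runtime call; the scale() wrapper
// around it is hypothetical.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scaleKernel(float* data, int n, float k) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= k;
}

void scale(float* data, int n, float k) {
    int devices = 0;
    // On machines without a CUDA GPU this returns an error / zero devices.
    bool haveGpu = cudaGetDeviceCount(&devices) == cudaSuccess && devices > 0;

    if (haveGpu) {
        float* d = nullptr;
        cudaMalloc((void**)&d, n * sizeof(float));
        cudaMemcpy(d, data, n * sizeof(float), cudaMemcpyHostToDevice);
        scaleKernel<<<(n + 255) / 256, 256>>>(d, n, k);
        cudaMemcpy(data, d, n * sizeof(float), cudaMemcpyDeviceToHost);
        cudaFree(d);
    } else {
        for (int i = 0; i < n; i++) data[i] *= k;  // CPU fallback path
    }
}

int main() {
    float v[4] = {1, 2, 3, 4};
    scale(v, 4, 2.0f);
    printf("%g %g %g %g\n", v[0], v[1], v[2], v[3]);  // 2 4 6 8 either way
    return 0;
}
```

Callers never know which path ran, which is exactly the property a user-facing library like this would want.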

2 Likes

Couple of things: great in theory, horrible in practice.

As scalable as Roblox is, I don’t think the time, effort, etc., would be worth the absolute pain it would be to bring this to life. Roblox would never give us direct GPU access, so what we’d get would be extremely limited.

On the server side, having GPUs with CUDA support capable of handling all of that would be extremely expensive. Roblox would have to overhaul its data centers at some point because, typically, a server would have nothing to render or compute extensively enough to justify next-gen CUDA cards. For that investment to be worth it, we’d have to lose somewhere else to the point of a net loss (usage caps, heavy rate limits, etc.).

On the client side, not all devices have that type of support, and not all graphics devices have the same level of support internally. As a developer, I’m not going to target niche features that very few users can truly appreciate, and am therefore less likely to spend or invest time in using them.

Fundamentally, the reason Roblox exists and thrives is that people with okay-to-horrible devices can play the same games at the same quality on any device. In gaming or virtual-experience building, that’s extremely rare to see. Most games that people play these days run on semi-beefy machines loaded with all these cool features; those studios can get away with it because they market to a niche user base that can actually afford such hardware. I think if you compared the number of people with that level of hardware to the games they play, a minuscule fraction would be actively playing Roblox.

I’d argue that hardware support and API design go hand in hand in this context. If you don’t have hardware support, you have no reason to build an API; if the API doesn’t adequately utilize supported hardware, you lose appeal, time, and money.

Not to mention, I’d generally be scared to play a Roblox game knowing that the developer has access to my GPU. You can run some crazy, malicious things. Even with the obvious safeguards that Roblox would put before it reached us, I still wouldn’t trust it. But that’s just me.

So it boils down to:

  • Cost vs. Benefit
  • Accessibility
  • Inclusivity

Seeing as Roblox thrives and survives off scalability and accessibility, unless something changes drastically I don’t see it happening. And honestly, if you want to do that sort of stuff… that’s why other engines exist.

–
FWIW – A lot of studios, at least outside of Roblox, rent GPU compute instances. Since Roblox only supports REST, using them would be cumbersome but not impossible.

  • Nvidia has their NGC (Nvidia GPU Cloud) containers which can be deployed on supported hardware.
  • AWS EC2 has Nvidia P and G instances. Varies from Nvidia A100 to V100 and beyond.
  • IBM has the IBM Cloud.
  • GCP provides access to Nvidia Tesla(s).
  • MS Azure allows you to use Nvidia GPUs via VMs: NV, NC, ND series.

YMMV in terms of expense.

5 Likes

Lately I’ve been working a lot on a 2D game, and just now I’ve been trying to add lighting using EditableImages. Calculating lighting on the CPU is very resource-intensive, so having some kind of access to the GPU would make my life way easier.


(video: notice the FPS drops and CPU usage while around multiple light sources)
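For what it’s worth, a per-pixel lighting pass like this is exactly the embarrassingly parallel shape GPUs are built for. Here is a hedged CUDA sketch, with the Light struct and inverse-square falloff chosen arbitrarily for illustration (not how this poster’s game actually works):

```cpp
// lights2d.cu -- sketch of a per-pixel 2D lighting pass: each thread
// sums the contribution of every point light at its own pixel.
#include <cuda_runtime.h>
#include <cstdio>

struct Light { float x, y, intensity; };

__global__ void lightPass(float* out, int w, int h,
                          const Light* lights, int numLights) {
    int px = blockIdx.x * blockDim.x + threadIdx.x;
    int py = blockIdx.y * blockDim.y + threadIdx.y;
    if (px >= w || py >= h) return;

    float sum = 0.0f;
    for (int i = 0; i < numLights; i++) {
        float dx = px - lights[i].x, dy = py - lights[i].y;
        sum += lights[i].intensity / (1.0f + dx * dx + dy * dy);
    }
    out[py * w + px] = sum > 1.0f ? 1.0f : sum;  // clamp brightness
}

int main() {
    const int w = 640, h = 360;
    Light hostLights[2] = {{100, 100, 5000}, {400, 200, 8000}};

    float* dOut; Light* dLights;
    cudaMalloc((void**)&dOut, w * h * sizeof(float));
    cudaMalloc((void**)&dLights, sizeof hostLights);
    cudaMemcpy(dLights, hostLights, sizeof hostLights,
               cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((w + 15) / 16, (h + 15) / 16);
    lightPass<<<grid, block>>>(dOut, w, h, dLights, 2);

    float sample;
    cudaMemcpy(&sample, dOut + (100 * w + 100), sizeof(float),
               cudaMemcpyDeviceToHost);
    printf("brightness under the first light: %g\n", sample);  // ~1 (clamped)
    cudaFree(dOut); cudaFree(dLights);
    return 0;
}
```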

1 Like