@Mr_Root, @nomer888 and @ZarsBranchkin asked about the user-created shaders. They are referring to the hack week project by @darthskrill: http://blog.roblox.com/2016/01/hack-week-2015-shaders/
Here is why you can't have nice things (warning: this is long and technical; I'm putting it here since some people asked me to share this publicly, and I don't want to create another thread in Public Updates just for that).
The way the hack week project worked is by creating a new type of script that contained shader code written in HLSL and had some parameters that Lua scripts could tweak and pass in; the engine then rendered a full-screen pass with each shader, transforming the source picture into what you see on screen. As input the shader got textures with color and depth information for every pixel - which is how you can do blur, depth of field, sun shafts, etc. - that is, a formula that just transforms the current pixel's color is not enough; you have to have access to any pixel on screen.
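To make this concrete, here's a minimal sketch of what such a full-screen shader could look like - the texture and parameter names below are invented for illustration, not the actual hack week API:

```hlsl
sampler2D SceneColor : register(s0); // color of every pixel on screen
sampler2D SceneDepth : register(s1); // depth of every pixel on screen
float Strength;                      // example of a parameter a Lua script could tweak

float4 main(float2 uv : TEXCOORD0) : COLOR0
{
    // Because the shader can read *any* pixel, effects like blur can average
    // neighboring samples instead of just transforming the current pixel.
    float4 color = tex2D(SceneColor, uv);
    float4 blurred = 0.5 * (tex2D(SceneColor, uv + float2(0.002, 0)) +
                            tex2D(SceneColor, uv - float2(0.002, 0)));
    float depth = tex2D(SceneDepth, uv).r;

    // Toy "depth of field": blur more as depth increases.
    return lerp(color, blurred, saturate(depth * Strength));
}
```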
When we ship a feature, we generally can’t unship it. So how do we ship this one?
Picking a shader language
We need to pick the language you'd write shaders in - one shader language that works the same across all current platforms and will keep working the same across all future platforms. Such a language does not exist.
We currently use a complex pipeline that transforms HLSL with some C-like macros into either HLSL for Direct3D 9, HLSL for Direct3D 11, GLSL for desktop OpenGL or GLSL for mobile OpenGL (actually there are 4 GLSL variants now - for OpenGL 2 and 3). This pipeline is full of complicated code we did not write. It does not always work perfectly - we sometimes have to adjust our shader code to work around bugs in this translation layer to make shaders work on Mac/mobile (OpenGL).
Also, when we were shipping Direct3D 11, we had to introduce a bunch of complicated macros to work around bugs in the HLSL compiler that translates D3D9-style HLSL into D3D11 bytecode. This compiler is not open-source, so we weren't able to fix it ourselves. I don't like closed-source libraries :-/
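To give a flavor of what this looks like in practice, here's an invented illustration (these are not our actual headers) of the kind of macro soup that accumulates when the same source has to survive D3D9 HLSL, D3D11 HLSL and several GLSL dialects:

```hlsl
// Illustrative only - these macro names are made up, not ROBLOX's real ones.
#if defined(TARGET_GLSL)
    // the GLSL path has no register binding syntax of this form
    #define DECLARE_TEXTURE(name, slot) uniform sampler2D name
#else
    #define DECLARE_TEXTURE(name, slot) sampler2D name : register(s##slot)
#endif

DECLARE_TEXTURE(SceneColor, 0);

#if defined(WORKAROUND_D3D11_LOOP_BUG)
    // e.g. force-unrolling a loop because the bytecode translator mishandles it
    #define POSTFX_LOOP [unroll]
#else
    #define POSTFX_LOOP [loop]
#endif
```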
So as you can see, we don’t even have one language to start with. Sure, we could just give you access to the exact same thing that we use - which is what hack week did - and you’d have to deal with the consequences of shaders potentially miscompiling or being weird on different platforms. When we introduce support for Direct3D 12 or Vulkan or Metal this would mean more macros or even more translation code with new bugs. We currently use a pragmatic approach of dealing with bugs on a case-by-case basis - sometimes we fix the shader code, sometimes we fix the translation layer to work around the bugs.
None of these are practical if you imagine that every single user can create a shader. We don’t want people to have to be experienced graphics programmers to use any part of ROBLOX.
Compiling shaders
Now it gets worse. We have to compile this shader code to something our rendering API of choice recognizes. Most APIs do not work based on source - they work based on custom binary formats that vary per API or platform (Xbox One uses Direct3D 11 but has a custom bytecode format).
This compilation process is slow and - in the case of OpenGL - uses third-party software that was never security-tested, so I'm sure there are LOTS of ways to exploit it to do bad things with the client. The compilation is also not guaranteed to be available on all platforms (some console platforms prohibit runtime shader compilation). Finally, on some platforms the compiler is proprietary - we can't ship it with Studio, so we can't precompile shaders for all target platforms when you publish (plus this would mean you'd have to republish the place to make it work with new platforms or render APIs - Unity can work like this but we can't).
So if we compile at runtime we’re exposing our users to exploits and slowing down the game launch by potentially tens of seconds.
If we compile during publish, we'd have to ship compilers with Studio that we aren't allowed to ship, and we'd lose compatibility with future platforms.
If we compile on the server, we're exposing our server infrastructure to exploits (which is super scary). If somebody discovers an exploit in one of the target platform compilers, we'd have to report it to the platform vendor (we don't have the source) - and until they fixed it, we'd have to disable user shaders for that platform.
Also - as mentioned - some of the compilers are proprietary and closed-source. This may restrict our choice of server platform if we compile on the server - what if there is no Linux version of the shader compiler?
Maybe make it visual?
Some engines (like Unreal Engine) have a shader node system where instead of writing shader code you build a graph of nodes. It’s very visual and generally people who aren’t programmers love it. We could build something like this.
This solution removes some of the technical problems from above - we would not have to deal with complex text-to-text translation software we did not write that has bugs and exploits.
It still has the issue that we need to generate target platform code somehow, and it's not clear how to deal with the issues highlighted in the "compiling shaders" part above.
We'd have to design the node system, which is a pretty big and involved process. How much functionality do we expose? Are there conditions? Are there loops? How do you write a radial blur with 17 samples without using 17 nodes?
We’d have to implement the editing flow for the node system in Studio - it’s a brand new editor where you can place nodes, connect node inputs to other nodes’ outputs, etc. This is a lot of engineering.
So this is also a pretty significant effort. I feel like overall this is closer to what we could have shipped, but note that it only really removes one layer of problems - dealing with another text-based language - while introducing new ones (complicated design & implementation, potential limitations as to what kinds of shaders you could create and how efficient they can be).
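For a sense of what the code generation problem looks like: even a trivial graph such as [SceneColor] → [Multiply by Tint] → [Output] still has to be turned into target shader code behind the scenes, something along these lines (names invented, purely illustrative):

```hlsl
// Hypothetical code generated from a two-node graph; a node editor only hides
// the language, it doesn't remove the compilation problems described above.
sampler2D SceneColor : register(s0);
float3 Tint; // exposed as a node parameter

float4 main(float2 uv : TEXCOORD0) : COLOR0
{
    float4 node_sceneColor = tex2D(SceneColor, uv);
    float4 node_multiply = float4(node_sceneColor.rgb * Tint, node_sceneColor.a);
    return node_multiply;
}
```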
Inputs - values
This is pretty straightforward - we'd just map children of type Value to parameters in the shader. This would work regardless of whether we picked a text-based or a node-based approach.
Shader math is pretty expensive. If we had a visual node system one of the important components would have been an automatic “constant folding” process - we’d find subgraphs that are completely driven by shader inputs and precompute them once per frame.
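As a sketch (names invented): a NumberValue child called Strength and a Vector3Value called Tint could simply surface as shader constants, and anything computed purely from such inputs could be folded on the CPU once per frame instead of once per pixel:

```hlsl
// Hypothetical parameter mapping - the renderer would update these once per frame
// from Value children of the shader script.
float Strength;
float3 Tint;

float4 main(float2 uv : TEXCOORD0) : COLOR0
{
    // A node system could notice that (Strength * 0.25) depends only on inputs
    // and precompute it once per frame, rather than evaluating it for every pixel here.
    float folded = Strength * 0.25;
    return float4(Tint * folded, 1);
}
```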
Inputs - textures
There are questions about the possible inputs we could provide to the shader.
There’s a variety of data available that we use in different passes:
- Scene color (with transparent objects, GUI, etc.)
- Scene color only for opaque objects (available on 7+ quality only)
- Scene depth only for opaque objects (available on 7+ quality only)
- Accumulated glow factor from neon
The encoding of this data changes from release to release. For example, before post-effects, we had scene color and glow factor packed into one texture; scene depth was packed into two channels of another texture and reconstructed using some math. After post-effects we have scene color and glow factor packed into one texture, but scene depth is now just in one channel of another texture - no math needed for reconstruction. In the future even the scene color may become encoded in some way.
If we blindly expose these kinds of details to the shaders we’d sacrifice our ability to develop rendering changes. So we’d have to come up with a set of APIs that we guarantee to work within a shader, and extend the shading language with them.
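To illustrate (formats simplified and names invented): a user shader hard-coded against yesterday's packing would silently break after a renderer update, which is why any exposed inputs would have to go through helpers whose bodies we control:

```hlsl
// Simplified illustration of an encoding change, not our exact formats.
// Old packing: depth split across two channels, reconstructed with math:
//   float depth = gbuffer.z + gbuffer.w * (1.0 / 255.0);
// New packing: depth stored directly in one channel:
//   float depth = depthTexture.r;
//
// A stable, guaranteed API could hide this behind helpers we can rewrite freely:
sampler2D InternalColorGlow : register(s0);
sampler2D InternalDepth : register(s1);

float3 GetSceneColor(float2 uv) { return tex2D(InternalColorGlow, uv).rgb; }
float  GetSceneDepth(float2 uv) { return tex2D(InternalDepth, uv).r; }
```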
Finally, it’s very important for shaders to have access to other textures. We only shipped 4 post-processing effects but one of them already has a custom noise texture that’s being fed in.
This is also relatively straightforward - we'd have to support Image assets as inputs to the shaders - but there are also some questions about the specific setup (for example, being able to choose between linear and point filtering is pretty important for some use cases).
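A sketch of what binding an Image asset might look like in .fx-style syntax, where the filter mode can be stated explicitly (names invented):

```hlsl
// Hypothetical extra texture input; point vs. linear filtering matters a lot
// for things like per-pixel dither/noise patterns vs. smooth gradients.
texture NoiseMap; // would come from an Image asset
sampler2D NoiseSampler = sampler_state
{
    Texture   = <NoiseMap>;
    MinFilter = POINT;
    MagFilter = POINT;
};
```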
Notice that we were ONLY talking about how to make it WORK so far. Not run fast - just work. Let’s now talk about performance.
Raw shader performance
Writing fast shaders is hard. Writing shaders that are fast on many platforms is harder.
When we write our shaders, we generally balance engineering effort against the performance we want to achieve. In some cases we can spend a week optimizing a single shader, if it's really important. The optimization process frequently involves using complicated proprietary tools that give you precise information about how the shader executes on a given architecture and trying to optimize the shader for that - we generally pick the lowest-performing target for this.
An additional important component of our shader optimization is shader code review. We review all C++ and shader changes in ROBLOX (this involves at least one other engineer reading your changes to the code and suggesting improvements), and while some review comments are about performance for any code, for shaders this matters even more - it's very common to spot tiny inefficiencies in code review and correct them. This process requires a lot of expertise and effort. Something as simple as "a * b * c" → "a * (b * c)" can make a difference.
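To show why something that small matters: if color is a float3 and brightness and exposure are scalars, the order of operations changes the number of multiplies the GPU has to do.

```hlsl
// "a * b * c" vs "a * (b * c)" - same result, different cost.
float3 slow = color * brightness * exposure;   // float3 * scalar twice: 6 multiplies
float3 fast = color * (brightness * exposure); // scalar * scalar, then float3 * scalar: 4 multiplies
```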
So with all that being said, what do we do when users create shaders? Remember - you will be writing a program that executes for EVERY SINGLE PIXEL. On Xbox One there are almost 2 million of them. Making it fast is hard.
It seems like we'd have to have performance guidance. It'd be really common for people to implement or copy & paste a really complicated shader that runs fine on their own GPU, and then most other players can't play their game for performance reasons.
We’d have to provide some tagging so that you can mark your shaders as gameplay-critical or not - so that we can disable them based on quality levels.
We'd have to provide you with performance guidance in Studio - as in, how fast or slow do we think your shader set will run on less powerful graphics cards?
Multi-resolution effects
We've done a lot of optimization on the current effects, which use some tricks to make it possible to compute them at reduced resolution without sacrificing visual quality.
This is a pretty tricky problem. Reducing resolution during effect computation frequently introduces extra artifacts that you have to counter. Some effects can run fine at half resolution and some at quarter resolution. Running a shader at quarter resolution (a quarter of the width and a quarter of the height, i.e. 1/16th of the pixels) is 16x faster than running it at full resolution - the shader effects that we shipped are much more accessible because we put a lot of effort into this.
Some effects have to be split into multiple passes to make this possible - maybe one pass is 1/4 res and another pass is full-resolution.
Some effects have to be split into multiple passes with same resolution to make them faster - separable blur being the classical example.
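As an example of why splitting passes pays off: a 9×9 blur done in a single pass needs 81 texture reads per pixel, but a horizontal pass followed by a vertical pass needs only 9 + 9 = 18. A sketch with invented names, where the same shader runs twice with a different TexelStep:

```hlsl
// One axis of a separable box blur; run once with TexelStep = (1/width, 0)
// and once with TexelStep = (0, 1/height).
sampler2D Source : register(s0);
float2 TexelStep;

float4 main(float2 uv : TEXCOORD0) : COLOR0
{
    float4 sum = 0;
    [unroll]
    for (int i = -4; i <= 4; i++)
        sum += tex2D(Source, uv + TexelStep * i);
    return sum / 9.0;
}
```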
You may think of this as just an optimization that we don't need to support initially - a potential future improvement to the system. I'd argue it's critical. The spread in graphics hardware across ROBLOX players is huge. You may be able to run your full-screen, full-resolution radial blur shader with 10 taps on your NVidia GTX 970, it'd take 0.5 ms, and you'd happily release an update to your game. However, on many, many graphics cards this shader would take 10-20 ms, and for those players either the game will be unplayable or your effect will have to be turned off.
This means we need to share this responsibility with developers - we need to design a system that allows multiple shaders to work at different resolutions and feed their output to each other, and you have to learn to use it. Needless to say, this is a lot of additional complexity.
Folding effects and final compositing
Our system is currently organized around having one final compositing pass (which is where we do color correction). Since we pretty much always run it, we've optimized it carefully.
It's very, very important for performance to fold multiple effects into one shader in certain cases - that is, to have a shader capable of computing several of them at once. For the compositing shader this means we try to fold the application of as many effects into it as possible (and use a simpler shader when the effects are disabled) - for example, we compute glow at low resolution but we apply it in the final compositing shader.
If we did not do this the cost of some effects would double. This is not trivial.
We also fold some effects together in interesting ways. For example, neon and bloom are really one shader. This also significantly reduces the cost.
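A rough sketch of the idea (heavily simplified, invented names): glow is computed at low resolution in its own pass, but applied inside the final compositing shader together with color correction, so it doesn't need an extra full-resolution pass of its own:

```hlsl
// Folded compositing sketch: one full-screen pass applies several effects at once.
sampler2D SceneColor : register(s0);
sampler2D GlowLowRes : register(s1); // result of the low-resolution glow pass
float3 TintGain;                     // stand-in for the color-correction parameters

float4 main(float2 uv : TEXCOORD0) : COLOR0
{
    float3 color = tex2D(SceneColor, uv).rgb;
    color += tex2D(GlowLowRes, uv).rgb; // apply glow computed earlier at reduced resolution
    color *= TintGain;                  // color correction folded into the same pass
    return float4(color, 1);
}
```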
Finally, this folding, and the fact that some effects work at low resolution, introduces some challenges we have to work around. For example, imagine you have sun rays that you compute and then add to the image in the final shader. You also have a blur that you compute and then add to the final image. How do you make sure the blur also blurs the sun rays if they are computed separately? This needs custom adjustment logic and shaders.
---
So, TL;DR - this is a very hard project with many open design questions and many hard engineering problems.
I’d estimate that in the time it takes us to resolve these issues and implement the system we’re happy with we could ship 30 new polished post-processing effects with different ranges of complexity.
That is, if these issues are even resolvable.
And that is why you can’t have nice things.