C, C++, Rust in Roblox (Compile to Luau)

Quick Intro

Hey everyone, for a long while I have made multiple attempts at compiling C, or any other low-level language, to Luau with:

But for the past week I have been working on something different: reasm (GitHub: AsynchronousAI/reasm). This tool takes RISC-V assembly compiled from C, C++, or Rust and turns it into pure Luau.

Please let me know about any compilation issues, since this is an experimental tool I made for fun.

Performance:

Measured using my incredible plugin, Scriptbench.

Code

Fibonacci Sequence:

int printf(const char *, ...);

long fibonacci(int n) {
    if (n <= 1)
        return n;
    else
        return fibonacci(n - 1) + fibonacci(n - 2);
}

int main() {
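    // note: naive recursion is exponential, so large n takes a very long time;
    // long is 32-bit on rv32, so results from fibonacci(47) onward overflow
    int n = 100;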

    printf("Fibonacci Sequence: ");

    for (int i = 0; i < n; i++) {
        printf("%d: %d ", i, (int)fibonacci(i));
    }

    return 0;
}

Code to control Roblox.

This uses external Luau code to provide the ‘setCameraFOV’ and ‘task_wait’ functions.

int printf(const char *, ...);
int setCameraFOV(int fov);
int task_wait(int time);

int main(){
    int state = 0;
    while (task_wait(1)){
        state = !state;
        setCameraFOV(state == 0 ? 70 : 90);
        printf("State: %d\n", state);
    }
    return 0;
}

C++

extern "C" int printf(const char *, ...);

class BankAccount {
private:
    const char* owner;
    int balance;  // integer balance only

public:
    BankAccount(const char* name, int initialBalance)
        : owner(name), balance(initialBalance) {}

    void deposit(int amount) {
        if (amount > 0) {
            balance += amount;
            printf("Deposited: $%d", amount);
        } else {
            printf("Invalid deposit amount.");
        }
    }

    void withdraw(int amount) {
        if (amount > 0 && amount <= balance) {
            balance -= amount;
            printf("Withdrew: $%d", amount);
        } else {
            printf("Insufficient funds or invalid amount.");
        }
    }

    void showBalance() const {
        printf("%s's balance: $%d", owner, balance);
    }
};

int main() {
    BankAccount account("Alice", 1000);  // initial balance is integer

    account.showBalance();
    account.deposit(250);
    account.withdraw(500);
    account.showBalance();


    BankAccount account2("Bob", 1250);  // initial balance is integer

    account2.showBalance();
    account2.deposit(250);
    account2.withdraw(500);
    account2.showBalance();

    return 0;
}

Rust:

#![no_std]
#![no_main]

use core::panic::PanicInfo;

extern "C" {
    fn printf(fmt: *const u8, ...) -> i32;
}

// no_std binaries must provide their own panic handler
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}

fn square(num: i32) -> i32 {
    num * num
}

#[no_mangle]
pub extern "C" fn main() -> i32 {
    // C expects a pointer to a null-terminated string, hence the trailing \0
    let fmt: *const u8 = b"The square of %d is: %d\0".as_ptr();
    let n: i32 = 10;

    unsafe {
        printf(fmt, n, square(n));
    }

    0
}

Quick Start:

Read the docs for more detailed instructions and a look at the internals (memory model, how floating-point numbers are stored, etc.).

Usage:

Usage:
  reasm [input] [output] [flags]

Flags:
      --comments             Include debug comments in the output
  -h, --help                 help for reasm
  -I, --import stringArray   Import symbol(s), can be repeated (example: -Imath -Ios)
      --mode string          Mode to compile as: module, main, or bench (default "main")
  -o, --output string        The output luau file.
  -e, --symbol string        The main symbol to start automatically. (default "main")
      --trace                Prints out a trace of the PC

Compilation:

Step 1:

Compile your program with the compiler of your choice.

If you do not have Clang or GCC installed, I highly recommend using Compiler Explorer to compile your code to RISC-V assembly (32-bit, IMFD).

Step 2:

reasm main.s -o main.luau

Now if you run main.luau, your original program should execute!

Open-source + Docs


This is honestly insane. Good luck with reasm!


This is awesome! I hope that you can get it compiling to a speed comparable to Luau soon, because I’d love to tinker with this.


The ability to export individual functions and call them from Luau was released recently. This is a big improvement over only being able to call main (or _start) without any arguments!

local module = require("./module")

local function C_fibonacci(input: number)
    module.util.return_args(input) -- loads input as an integer into the argument register (return_args can take up to 7 ints)
    module.exports.fibonacci()
    return module.util.extract_args()[1] -- only use the first value.
end

print(C_fibonacci(10)) -- 55 (correct!)

#include <stdio.h>

int fibonacci(int n) {
    if (n == 0) return 0;
    if (n == 1) return 1;

    int a = 0, b = 1, next;
    for (int i = 2; i <= n; i++) {
        next = a + b;
        a = b;
        b = next;
    }
    return b;
}

int main() {
    int n = 5;

    printf("Fibonacci sequence up to %d terms:\n", n);
    for (int i = 0; i < n; i++) {
        printf("%d ", fibonacci(i)); // space-separate the terms
    }
    return 0;
}

Getting closer and closer!

(almost 2x faster)!


New optimizations:

Remove unused labels

In assembly, some directives sit under labels:

label:
  .word 5 # allocate number 5 and name it label.

In the old version, these labels were left in the output, and each one cost a call at runtime. Now the compiler detects them and removes them ahead of time! This optimization is most noticeable when decompiling ELF files or large programs.
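A minimal sketch of that check, assuming the compiler sees each label's body as a list of source lines: a label whose body contains only directives (lines starting with '.'), comments, or blank lines only names static data and needs no runtime block. The function name label_is_data_only is hypothetical, and reasm's actual pass may work differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: returns true when every line under a label is a
   directive (e.g. ".word 5"), a comment, or blank, meaning the label
   only names static data and needs no runtime code block. */
static bool label_is_data_only(const char *const body[], size_t n) {
    assert(body != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const char *s = body[i];
        while (*s == ' ' || *s == '\t') s++;   /* skip indentation */
        if (*s == '\0' || *s == '#') continue; /* blank or comment line */
        if (*s != '.') return false;           /* a real instruction */
    }
    return true;
}
```

Labels such as .L6 in the Pi example below contain instructions, so they would still get a block.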

Numerical Registers

Previously, registers were stored in a dictionary (registers['x1']), but a pass has been added to the pipeline that converts them into array slots (registers[2]). This allows table.create() to be used during initialization, and array indexing is much faster than dictionary indexing.
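As a rough illustration, here is one way a translator could turn register names into numeric slots. The numbering (xN to slot N + 1, fN to slot N + 33, 1-based for Luau) is inferred from the generated Pi code later in this thread, where sp is r3, a0 is r11 and fa5 is r48; the function name reg_slot is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* ABI names for x0..x31 and f0..f31, indexed by register number. */
static const char *const X_ABI[32] = {
    "zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2", "s0", "s1",
    "a0", "a1", "a2", "a3", "a4", "a5", "a6", "a7",
    "s2", "s3", "s4", "s5", "s6", "s7", "s8", "s9", "s10", "s11",
    "t3", "t4", "t5", "t6"};
static const char *const F_ABI[32] = {
    "ft0", "ft1", "ft2", "ft3", "ft4", "ft5", "ft6", "ft7", "fs0", "fs1",
    "fa0", "fa1", "fa2", "fa3", "fa4", "fa5", "fa6", "fa7",
    "fs2", "fs3", "fs4", "fs5", "fs6", "fs7", "fs8", "fs9", "fs10", "fs11",
    "ft8", "ft9", "ft10", "ft11"};

/* Hypothetical: map a register name to a 1-based slot
   (x0..x31 -> 1..32, f0..f31 -> 33..64). Returns 0 if unknown. */
static int reg_slot(const char *name) {
    assert(name != NULL);
    if (strcmp(name, "fp") == 0) return 8 + 1; /* fp is an alias of s0/x8 */
    for (int i = 0; i < 32; i++) {
        if (strcmp(name, X_ABI[i]) == 0) return i + 1;
        if (strcmp(name, F_ABI[i]) == 0) return i + 33;
    }
    /* raw numeric names such as x15 or f10 */
    if ((name[0] == 'x' || name[0] == 'f') && name[1] >= '0' && name[1] <= '9') {
        char *end;
        long n = strtol(name + 1, &end, 10);
        if (*end == '\0' && n >= 0 && n < 32)
            return (int)n + (name[0] == 'x' ? 1 : 33);
    }
    return 0;
}
```

Doing this lookup once at compile time means the emitted Luau only ever touches plain numbered slots (and, after the later update, plain local variables).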

Compile-time data stored numerically

Previously, the buffer pointer for each piece of compile-time data (strings, arrays, etc.) was looked up at runtime with data["name"]; now the pointer is emitted directly as a number.
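A minimal sketch of resolving data labels to numeric offsets at compile time, assuming symbols are laid out back to back in source order with no alignment padding. That assumption matches the offsets in the Pi example's generated code (the 25-byte format string at 0, .LC0 at 25, .LC1 at 33, static data ending at 41). The DataSym type and both functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical static-data symbol: a name and its size in bytes. */
typedef struct {
    const char *name;
    size_t size;
    size_t offset; /* filled in by layout_data() */
} DataSym;

/* Assign each symbol the next free offset in the buffer, in source order.
   Returns the total number of bytes used by static data. */
static size_t layout_data(DataSym *syms, size_t count) {
    assert(syms != NULL || count == 0);
    size_t next = 0;
    for (size_t i = 0; i < count; i++) {
        syms[i].offset = next;
        next += syms[i].size;
    }
    return next;
}

/* Look up a symbol's offset; the code generator can emit this number
   directly instead of a data["name"] lookup. */
static size_t data_offset(const DataSym *syms, size_t count, const char *name) {
    for (size_t i = 0; i < count; i++)
        if (strcmp(syms[i].name, name) == 0)
            return syms[i].offset;
    return SIZE_MAX; /* unknown symbol */
}
```

In the generated Luau this is why .LC0 is read as readf64(memory, 25 + r14) rather than through a name lookup.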


On paper this does not look like much, but applying it everywhere took a lot of manual work.

Numeric registers and static data are much less readable than named ones, which is why the --comments flag adds comments with the original names.


Just so you know, Wasynth creator @Rerumu recently released Spider (GitHub: SovereignSatellite/Spider), a WebAssembly-to-Luau translator.


First roblox-ts and now reasm. Honestly, how many languages can be translated to Luau?

AMAZING work brocacho. Can’t imagine how awesome it felt to get this working properly for the first time. Hope to see this become a standard in higher-level work. Keep it up!


More is coming soon. What should the priority be?

  • Add an IR optimizer to the compiler (can shrink assembly instructions into faster Luau tricks)
  • Compile from ELF (adds support for C standard libraries and system calls, eventually even emulating DOOM :eyes:)
  • More backends (compile to Python or C as well)
  • Vector operations (e.g. adding vectors/tables; niche use cases but very beneficial)
  • 64-bit integer support (using dedicated lo & hi components)
  • Bit operations (e.g. popcount, clz, ctz; niche use cases but very beneficial)
  • Online compiler (write C++ or Rust code online and get back Luau)

Small Update:

  • OPTIMIZATION: Represent registers as variables instead of tables, a little bit more messy but leads to ~23% faster execution.
  • ADD: IO no longer prints a \n for every printf or putchar call; instead, output is buffered and flushed whenever a \n is written.
  • ADD: Support for the C standard library putchar function.
  • ADD: The nop instruction, often emitted by GCC, which does nothing.
  • FIX: The internal two_words_to_double function would sometimes reverse the endianness of floats.
  • FIX: the AOT memory calculator forgetting symbols
  • FIX: Strings saved locally as "%d " by the compiler would become "%d"
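For reference, the reinterpretation that two_words_to_double performs can be sketched in C (an illustration of the idea, not the Luau helper itself). A double passed as two 32-bit words is rebuilt by putting the high word in the upper 32 bits and the low word in the lower 32 bits, then reinterpreting those bits; swapping the two words is exactly the endianness mix-up fixed above. The test values are .LC0 and .LC1 from the Pi example later in this thread, which store 1.0 and 4.0.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rebuild an IEEE-754 double from its two 32-bit halves.
   On little-endian rv32 the low word sits at the lower address,
   so swapping the arguments is an easy endianness mistake. */
static double two_words_to_double(uint32_t high_word, uint32_t low_word) {
    uint64_t bits = ((uint64_t)high_word << 32) | low_word;
    double d;
    static_assert(sizeof(double) == sizeof(uint64_t), "expects a 64-bit double");
    memcpy(&d, &bits, sizeof d); /* reinterpret the bits; no numeric conversion */
    return d;
}
```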

On a side note, some instructions for 64-bit doubles are still missing. This will be fixed eventually, but if you want it prioritized, vote for 64-bit integer support above.

ZBB & ZBS Extensions!

Instruction Set Reference: RISC-V Ratified Specifications Library
GitHub Commits: AsynchronousAI/reasm

The ZBB and ZBS extensions provide bit-manipulation instructions, such as setting an individual bit of a number, counting the leading zeros of a number, or counting how many 1 bits a number has.

Previously, C/C++ code that used __builtin_clz, __builtin_ctz, or __builtin_popcount would not work without a manual polyfill in Luau.

The latest version of the compiler adds not only the above instructions but also shorthand instructions that make operations such as min(x, y) and a & ~b run as fast as possible.

This update adds ~50 instructions that take advantage of bit32 functions. __builtin_popcount internally uses the pop_lsb algorithm for fast computation.
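Portable C versions of these operations, roughly what a polyfill has to do when the hardware instructions are unavailable. The popcount uses the clear-lowest-set-bit loop (x &= x - 1); I am assuming that is what "pop_lsb" refers to, so treat it as an illustration rather than the compiler's exact code. The function names are hypothetical, and the expected values match the CPOP/CLZ/CTZ checks in the test program below.

```c
#include <assert.h>
#include <stdint.h>

/* Count set bits: each iteration clears the lowest set bit. */
static uint32_t popcount32(uint32_t x) {
    uint32_t count = 0;
    while (x != 0) {
        x &= x - 1;
        count++;
    }
    return count;
}

/* Count leading zeros; returns 32 for x == 0 (as RISC-V clz does). */
static uint32_t clz32(uint32_t x) {
    uint32_t n = 0;
    if (x == 0) return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Count trailing zeros; returns 32 for x == 0 (as RISC-V ctz does). */
static uint32_t ctz32(uint32_t x) {
    uint32_t n = 0;
    if (x == 0) return 32;
    while ((x & 1u) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}
```

The popcount loop runs once per set bit rather than once per bit position, which is why it is a common choice for sparse values.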

Test C++ Program

Compile with -std=c++26 -Oz -march=rv32imfd

With -std=c++26 -Oz -march=rv32imfd_zbb_zbs, the C compiler can also emit the RISC-V bit instructions inline on its own.

#include <stdio.h>
#include <stdint.h>

// ─── Zbb ──────────────────────────────────────────────────────────────────────

static inline uint32_t zbb_andn(uint32_t a, uint32_t b) {
    uint32_t r;
    __asm__ ("andn %0, %1, %2" : "=r"(r) : "r"(a), "r"(b));
    return r;
}
static inline uint32_t zbb_orn(uint32_t a, uint32_t b) {
    uint32_t r;
    __asm__ ("orn  %0, %1, %2" : "=r"(r) : "r"(a), "r"(b));
    return r;
}
static inline uint32_t zbb_xnor(uint32_t a, uint32_t b) {
    uint32_t r;
    __asm__ ("xnor %0, %1, %2" : "=r"(r) : "r"(a), "r"(b));
    return r;
}

static inline uint32_t zbb_clz(uint32_t a) {
    uint32_t r;
    __asm__ ("clz  %0, %1" : "=r"(r) : "r"(a));
    return r;
}
static inline uint32_t zbb_ctz(uint32_t a) {
    uint32_t r;
    __asm__ ("ctz  %0, %1" : "=r"(r) : "r"(a));
    return r;
}
static inline uint32_t zbb_cpop(uint32_t a) {
    uint32_t r;
    __asm__ ("cpop %0, %1" : "=r"(r) : "r"(a));
    return r;
}

static inline uint32_t zbb_rol(uint32_t a, uint32_t n) {
    uint32_t r;
    __asm__ ("rol  %0, %1, %2" : "=r"(r) : "r"(a), "r"(n));
    return r;
}
static inline uint32_t zbb_ror(uint32_t a, uint32_t n) {
    uint32_t r;
    __asm__ ("ror  %0, %1, %2" : "=r"(r) : "r"(a), "r"(n));
    return r;
}
static inline uint32_t zbb_rori(uint32_t a, int shamt) {
    uint32_t r;
    __asm__ ("rori %0, %1, %2" : "=r"(r) : "r"(a), "i"(shamt));
    return r;
}

static inline int32_t zbb_min(int32_t a, int32_t b) {
    int32_t r;
    __asm__ ("min  %0, %1, %2" : "=r"(r) : "r"(a), "r"(b));
    return r;
}
static inline uint32_t zbb_minu(uint32_t a, uint32_t b) {
    uint32_t r;
    __asm__ ("minu %0, %1, %2" : "=r"(r) : "r"(a), "r"(b));
    return r;
}
static inline int32_t zbb_max(int32_t a, int32_t b) {
    int32_t r;
    __asm__ ("max  %0, %1, %2" : "=r"(r) : "r"(a), "r"(b));
    return r;
}
static inline uint32_t zbb_maxu(uint32_t a, uint32_t b) {
    uint32_t r;
    __asm__ ("maxu %0, %1, %2" : "=r"(r) : "r"(a), "r"(b));
    return r;
}

static inline int32_t zbb_sext_b(uint32_t a) {
    int32_t r;
    __asm__ ("sext.b %0, %1" : "=r"(r) : "r"(a));
    return r;
}
static inline int32_t zbb_sext_h(uint32_t a) {
    int32_t r;
    __asm__ ("sext.h %0, %1" : "=r"(r) : "r"(a));
    return r;
}
static inline uint32_t zbb_zext_h(uint32_t a) {
    uint32_t r;
    __asm__ ("zext.h %0, %1" : "=r"(r) : "r"(a));
    return r;
}

// ─── Zbs ──────────────────────────────────────────────────────────────────────

static inline uint32_t zbs_bset(uint32_t a, uint32_t bit) {
    uint32_t r;
    __asm__ ("bset %0, %1, %2" : "=r"(r) : "r"(a), "r"(bit));
    return r;
}
static inline uint32_t zbs_bclr(uint32_t a, uint32_t bit) {
    uint32_t r;
    __asm__ ("bclr %0, %1, %2" : "=r"(r) : "r"(a), "r"(bit));
    return r;
}
static inline uint32_t zbs_binv(uint32_t a, uint32_t bit) {
    uint32_t r;
    __asm__ ("binv %0, %1, %2" : "=r"(r) : "r"(a), "r"(bit));
    return r;
}
static inline uint32_t zbs_bext(uint32_t a, uint32_t bit) {
    uint32_t r;
    __asm__ ("bext %0, %1, %2" : "=r"(r) : "r"(a), "r"(bit));
    return r;
}

// ─── Test helpers ─────────────────────────────────────────────────────────────

static int pass = 0, fail = 0;

#define CHECK(name, got, expected)                                          \
    do {                                                                    \
        uint32_t g = (uint32_t)(got), e = (uint32_t)(expected);            \
        if (g == e) {                                                       \
            printf("  PASS  %s  0x%08X\n", name, g);                    \
            pass++;                                                         \
        } else {                                                            \
            printf("  FAIL  %s  got 0x%08X, expected 0x%08X\n",         \
                   name, g, e);                                             \
            fail++;                                                         \
        }                                                                   \
    } while (0)

// ─── Tests ────────────────────────────────────────────────────────────────────

void test_zbb(void) {
    puts("\n── Zbb ─────────────────────────────────────────────");

    volatile uint32_t A = 0b10110100; // 0xB4
    volatile uint32_t B = 0b11001010; // 0xCA
    uint32_t a = A, b = B;

    CHECK("ANDN",        zbb_andn(a, b),    0x00000034u);
    CHECK("ORN",         zbb_orn (a, b),    0xFFFFFFB5u);
    CHECK("XNOR",        zbb_xnor(a, b),    0xFFFFFF81u);

    CHECK("CLZ(0xB4)",   zbb_clz(a),        24u);
    CHECK("CTZ(0xB4)",   zbb_ctz(a),         2u);
    CHECK("CPOP(0xB4)",  zbb_cpop(a),        4u);

    CHECK("ROL(A,4)",    zbb_rol(a, 4),     (a<<4)|(a>>28));
    CHECK("ROR(A,4)",    zbb_ror(a, 4),     (a>>4)|(a<<28));
    CHECK("RORI(A,4)",   zbb_rori(a, 4),    (a>>4)|(a<<28));

    volatile int32_t  M1 = -1; volatile int32_t  M0 = 0;
    int32_t m1 = M1, m0 = M0;
    CHECK("MIN(-1,0)",   (uint32_t)zbb_min (m1, m0),  (uint32_t)-1);
    CHECK("MINU(-1,0)",  zbb_minu((uint32_t)m1, 0u),   0u);
    CHECK("MAX(-1,0)",   (uint32_t)zbb_max (m1, m0),   0u);
    CHECK("MAXU(-1,0)",  zbb_maxu((uint32_t)m1, 0u),  (uint32_t)-1);

    CHECK("SEXT.B(0x80)", (uint32_t)zbb_sext_b(0x80), 0xFFFFFF80u);
    CHECK("SEXT.H(0x8000)",(uint32_t)zbb_sext_h(0x8000), 0xFFFF8000u);
    CHECK("ZEXT.H",       zbb_zext_h(0xABCDEF12u),    0x0000EF12u);
}

void test_zbs(void) {
    puts("\n── Zbs ─────────────────────────────────────────────");

    volatile uint32_t V = 0b10110100; // 0xB4
    uint32_t v = V;

    CHECK("BSET(V,0)",   zbs_bset(v, 0),   v | 0x01u);
    CHECK("BSET(V,3)",   zbs_bset(v, 3),   v | 0x08u);
    CHECK("BCLR(V,2)",   zbs_bclr(v, 2),   v & ~0x04u);
    CHECK("BCLR(V,4)",   zbs_bclr(v, 4),   v & ~0x10u);
    CHECK("BINV(V,0)",   zbs_binv(v, 0),   v ^ 0x01u);
    CHECK("BINV(V,2)",   zbs_binv(v, 2),   v ^ 0x04u);
    CHECK("BEXT(V,2)",   zbs_bext(v, 2),   1u);
    CHECK("BEXT(V,3)",   zbs_bext(v, 3),   0u);
    CHECK("BEXT(V,7)",   zbs_bext(v, 7),   1u);
    CHECK("BEXT(V,31)",  zbs_bext(v, 31),  0u);

    CHECK("BCLR(BSET(V,3),3)", zbs_bclr(zbs_bset(v,3), 3), v);
    CHECK("BINV(BINV(V,5),5)", zbs_binv(zbs_binv(v,5), 5), v);
}

int main(void) {
    puts("RISC-V ZBB + ZBS");

    test_zbb();
    test_zbs();

    printf("\n%d passed, %d failed\n", pass, fail);
    return fail ? 1 : 0;
}

Misc. Updates

  • bit32 and math functions are no longer preloaded at the top of the file, since the Luau compiler should automatically turn them into fastcalls.
  • --!native and --!optimize 2 are now default
  • Generated code will include type hints for Native Luau.

I'm going to start on 64-bit integer support and more floating-point instructions later today. After that, I'll get a system call handler going.

Calculating Pi!

A little late for Pi Day, but with some floating-point fixes, Pi can now be computed!

> luau main.luau
Approximated pi = 3.1415916535897743244731828

This uses the Leibniz series pi/4 = arctan(1) = 1/1 - 1/3 + 1/5 - ... (the alternating sum of reciprocals of odd numbers).
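A short derivation explains why the printed value is off from π in the sixth decimal place: the series converges slowly, and for an alternating series like this one the remainder after N terms is roughly half the first omitted term.

```latex
\begin{aligned}
\frac{\pi}{4} &= \sum_{k=0}^{\infty} \frac{(-1)^k}{2k+1}, \qquad
S_N = \sum_{k=0}^{N-1} \frac{(-1)^k}{2k+1} \\
\frac{\pi}{4} - S_N &\approx \frac{(-1)^N}{2(2N+1)} \approx \frac{(-1)^N}{4N} \\
\pi - 4S_N &\approx \frac{1}{N} = 10^{-6} \quad \text{for } N = 10^6 \text{ (even)}
\end{aligned}
```

This matches the output: 3.14159265358979... minus 3.14159165358977... is about 1.0 × 10⁻⁶. The extra digits printed by %.25f come from the exact binary value of the double and carry no additional precision about π.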

Original C Code:

NOTE: Compiled with -Oz -march=rv32imfd_zbb_zbs
I would always recommend -Oz, even over -O3, because it produces the fewest instructions.

#include <stdio.h>

int main() {
    int terms = 1000000;
    double pi = 0.0;
    
    for (long long k = 0; k < terms; k++) {
        // pi/4 = arctan(1) = (1 - 1/3 + 1/5 - 1/7 + ...)
        double term = 1.0 / (2 * k + 1);
        if (k % 2 == 0)
            pi += term;
        else
            pi -= term;
    }

    pi *= 4.0;

    printf("Approximated pi = %.25f\n", pi);
    return 0;
}

Assembly Code

.LC2:
        .string "Approximated pi = %.25f\n"
main:
        lui     a3,%hi(.LC0)
        fcvt.d.w        fa5,x0
        fld     fa3,%lo(.LC0)(a3)
        addi    sp,sp,-32
        li      a2,999424
        sw      ra,28(sp)
        li      a5,0
        li      a4,0
        addi    a2,a2,576
.L6:
        slli    a3,a5,1
        addi    a3,a3,1
        fcvt.d.w        fa4,a3
        andi    a1,a5,1
        addi    a3,a5,1
        fdiv.d  fa4,fa3,fa4
        bne     a1,zero,.L2
        sltu    a1,a3,a5
        fadd.d  fa5,fa5,fa4
        mv      a5,a3
        add     a4,a1,a4
        j       .L6
.L2:
        sltu    a1,a3,a5
        fsub.d  fa5,fa5,fa4
        mv      a5,a3
        add     a4,a1,a4
        bne     a3,a2,.L6
        bne     a4,zero,.L6
        lui     a5,%hi(.LC1)
        fld     fa4,%lo(.LC1)(a5)
        lui     a0,%hi(.LC2)
        addi    a0,a0,%lo(.LC2)
        fmul.d  fa5,fa5,fa4
        fsd     fa5,8(sp)
        lw      a2,8(sp)
        lw      a3,12(sp)
        call    printf
        lw      ra,28(sp)
        li      a0,0
        addi    sp,sp,32
        jr      ra
.LC0:
        .word   0
        .word   1072693248
.LC1:
        .word   0
        .word   1074790400
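Because the target is 32-bit, the long long loop counter k lives in two registers: a5 holds the low word and a4 the high word. After incrementing the low word (addi a3,a5,1), sltu a1,a3,a5 sets a1 to 1 exactly when that addition wrapped around, and add a4,a1,a4 carries it into the high word. The same lo/hi technique in C, with a hypothetical U64Parts type and add64 helper:

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value split into two 32-bit halves, as on rv32. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} U64Parts;

/* Add two split 64-bit values the way the compiler does on rv32:
   add the low words, detect wraparound with an unsigned compare
   (the sltu trick), then add that carry into the high word. */
static U64Parts add64(U64Parts a, U64Parts b) {
    U64Parts r;
    r.lo = a.lo + b.lo;           /* wraps modulo 2^32 */
    uint32_t carry = r.lo < a.lo; /* 1 iff the low add overflowed */
    r.hi = a.hi + b.hi + carry;
    return r;
}
```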

Luau Code

--!strict
--!native
--!optimize 2

-- Compiled from RISC-V assembly.
-- API
local mem: number = 2048 -- 2KB of RAM
local memory: buffer = buffer.create(mem) -- our memory!
local r1: number, r2: number, r3: number, r4: number, r5: number, r6: number, r7: number, r8: number, r9: number, r10: number, r11: number, r12: number, r13: number, r14: number, r15: number, r16: number = 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
local r17: number, r18: number, r19: number, r20: number, r21: number, r22: number, r23: number, r24: number, r25: number, r26: number, r27: number, r28: number, r29: number, r30: number, r31: number, r32: number = 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
local r33: number, r34: number, r35: number, r36: number, r37: number, r38: number, r39: number, r40: number, r41: number, r42: number, r43: number, r44: number, r45: number, r46: number, r47: number, r48: number = 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
local r49: number, r50: number, r51: number, r52: number, r53: number, r54: number, r55: number, r56: number, r57: number, r58: number, r59: number, r60: number, r61: number, r62: number, r63: number, r64: number = 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

-- Variables
local PC: number = 1 -- current position
local mallocDepth: number = 0
local stdout_cache: string = ""

-- Utility
--- 32-bit helpers
local function u32(v: number): number
    local n = bit32.band(v, 0xFFFFFFFF)
    if n < 0 then
        return n + 0x100000000
    end
    return n
end
local function i32(v)
    local n = u32(v)
    if n >= 0x80000000 then
        return n - 0x100000000
    end
    return n
end

--- Base (can be found in generated code)
local function idiv_trunc(a: number, b: number): number
    if b == 0 then error("division by zero") end
    -- truncate toward zero (C/RISC-V semantics) for any combination of signs
    local q = a / b
    if q < 0 then
        return math.ceil(q)
    end
    return math.floor(q)
end
local function float_to_int(f: number): number
    return string.unpack("i", string.pack("f", f))
end
local function int_to_float(i: number): number
    local packed = string.pack("I4", i)
    return string.unpack("f", packed)
end
local function float_to_double(f: number): number
    local packed_f = string.pack("f", f)
    local padded = packed_f .. ("\0\0\0\0")
    return string.unpack("d", padded)
end
local function two_words_to_double(highWord: number, lowWord: number): number
    local packed = string.pack("<I4I4", u32(lowWord), u32(highWord))
    return string.unpack("d", packed)
end
local function hi(addr: number): number
    return bit32.lshift(bit32.rshift(addr, 12), 12)
end
local function lo(addr: number): number
    return bit32.band(addr, 0xFFF)
end
function fclass(x: number): number
    -- Bit layout follows the RISC-V fclass instruction:
    -- 0 -inf, 1 -normal, 2 -subnormal, 3 -0, 4 +0,
    -- 5 +subnormal, 6 +normal, 7 +inf, 8 signaling NaN, 9 quiet NaN
    local result = 0

    if x ~= x then
        -- NaN: this helper does not distinguish signaling NaNs, so report quiet NaN
        result = bit32.lshift(1, 9)
    elseif x == math.huge then
        result = bit32.lshift(1, 7) -- +Inf
    elseif x == -math.huge then
        result = bit32.lshift(1, 0) -- -Inf
    elseif x == 0 then
        if 1/x == math.huge then
            result = bit32.lshift(1, 4) -- +Zero
        else
            result = bit32.lshift(1, 3) -- -Zero
        end
    else
        local absx = math.abs(x)
        local min_normal = 2.2250738585072014e-308 -- 2^-1022
        if absx < min_normal then
            if x > 0 then
                result = bit32.lshift(1, 5) -- +Subnormal
            else
                result = bit32.lshift(1, 2) -- -Subnormal
            end
        else
            if x > 0 then
                result = bit32.lshift(1, 6) -- +Normal
            else
                result = bit32.lshift(1, 1) -- -Normal
            end
        end
    end

    return result
end
function reset_registers(): ()
    r1,r2,r3,r4,r5,r6,r7,r8,r9,r10,r11,r12,r13,r14,r15,r16 = 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    r17,r18,r19,r20,r21,r22,r23,r24,r25,r26,r27,r28,r29,r30,r31,r32 = 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    r33,r34,r35,r36,r37,r38,r39,r40,r41,r42,r43,r44,r45,r46,r47,r48 = 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    r49,r50,r51,r52,r53,r54,r55,r56,r57,r58,r59,r60,r61,r62,r63,r64 = 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    r3 = mem -- x2/sp starts at top of stack
end

--- Strings
local function read_string(startPointer: number): string
    -- read null terminated strings from memory
    local pointer: number = startPointer
    local str: string = ""
    local byte: number = 0
    repeat
        byte = buffer.readbits(memory, pointer * 8, 8)
        if byte == 0 then break end
        str = str .. string.char(byte)
        pointer = pointer + 1
        if pointer >= mem then error("Exceeded buffer size when reading string.") end
    until false
    return str
end
function format_string(fmt: number, args: {number}): string
    local fmtString: string = read_string(fmt)
    local arg_index: number = 1

    return fmtString:gsub("%%([%d%.]*[dfseX])", function(spec)
        if spec:sub(-1) == "d" then
            local val = args[arg_index]
            arg_index += 1
            return tostring(i32(val))
        elseif spec:sub(-1) == "X" then
            local val = args[arg_index]
            arg_index += 1
            return string.format("%X", u32(val))
        elseif spec:sub(-1) == "f" then
            if arg_index % 2 == 1 then
                arg_index += 1 -- 64-bit varargs are aligned to even a-registers (skip a1/a3/...)
            end
            local low = args[arg_index]; arg_index += 1
            local high = args[arg_index]; arg_index += 1
            local float = two_words_to_double(high, low)
            return string.format("%"..spec, float)
        elseif spec:sub(-1) == "s" then
            local val = args[arg_index]
            arg_index += 1
            return read_string(val)
        elseif spec:sub(-1) == "e" then
            if arg_index % 2 == 1 then
                arg_index += 1 -- 64-bit varargs are aligned to even a-registers (skip a1/a3/...)
            end
            local low = args[arg_index]; arg_index += 1
            local high = args[arg_index]; arg_index += 1
            local float = two_words_to_double(high, low)
            return string.format("%"..spec, float)
        else
            return spec
        end
    end)
end


--- Memory
local function malloc(size: number): number
    mallocDepth+=size
    return buffer.len(memory)-mallocDepth
end

--- Args
local function get_args(): (number, number, number, number, number, number, number, number)
    return r11, r12, r13, r14, r15, r16, r17, r18
end
local function push_args(a1: number?, a2: number?, a3: number?, a4: number?, a5: number?, a6: number?, a7: number?, a8: number?)
    r11 = a1 or 0
    r12 = a2 or 0
    r13 = a3 or 0
    r14 = a4 or 0
    r15 = a5 or 0
    r16 = a6 or 0
    r17 = a7 or 0
    r18 = a8 or 0
end
local function get_f_args(): (number, number, number, number, number, number, number, number)
    return r43, r44, r45, r46, r47, r48, r49, r50
end
local function push_f_args(a1: number?, a2: number?, a3: number?, a4: number?, a5: number?, a6: number?, a7: number?, a8: number?)
    r43 = a1 or 0
    r44 = a2 or 0
    r45 = a3 or 0
    r46 = a4 or 0
    r47 = a5 or 0
    r48 = a6 or 0
    r49 = a7 or 0
    r50 = a8 or 0
end

--- IO
local function flush_stdout()
    if #stdout_cache > 0 then
        print(stdout_cache)
        stdout_cache = ""
    end
end


-- Functions
local functions = {
    ["memcpy"] = function()
        local dest,src,count = get_args()

        buffer.copy(memory, dest, memory, src, count)
    end,
    ["memset"] = function()
        local dest, value, count = get_args()
        buffer.fill(memory, dest, bit32.band(value, 0xFF), count)
    end,
    ["malloc"] = function()
        local size = get_args()
        local dest = malloc(size)

        push_args(dest)
    end,
    ["putchar"] = function()
        local c = get_args()
        local char = string.char(c)
        if char == "\n" then
            flush_stdout()
        else
            stdout_cache = stdout_cache .. char
        end
    end,
    ["puts"] = function()
        local fmt = get_args()
        local str = read_string(fmt)
        stdout_cache = stdout_cache .. str
        flush_stdout()
    end,
    ["printf"] = function()
        local args = {get_args()}
        local fmt_ptr = args[1]
        table.remove(args, 1)

        local formatted = format_string(fmt_ptr, args)
        for i = 1, #formatted do
            local ch = formatted:sub(i,i)
            if ch == "\n" then
                flush_stdout()
            else
                stdout_cache = stdout_cache .. ch
            end
        end
    end,
}

-- Extensions


-- Localized Functions
--- buffer
local writei8, writei16, writei32 = buffer.writei8, buffer.writei16, buffer.writei32
local readi8, readi16, readi32 = buffer.readi8, buffer.readi16, buffer.readi32
local writeu8, writeu16, writeu32, writestring, fill = buffer.writeu8, buffer.writeu16, buffer.writeu32, buffer.writestring, buffer.fill
local readu8, readu16, readu32, readf32, readf64, writef32, writef64 = buffer.readu8, buffer.readu16, buffer.readu32, buffer.readf32, buffer.readf64, buffer.writef32, buffer.writef64

local FUNCS: {[number]: () -> boolean} = {}
---- Auto generated code starts here
function init(): ()
	reset_registers()
	writestring(memory, 0, [[Approximated pi = %.25f
]] .. "\0")
	writei32(memory, 25, 0)
	writei32(memory, 29, 1072693248)
	writei32(memory, 33, 0)
	writei32(memory, 37, 1074790400)
	PC = 1
	r3 = (buffer.len(memory) + 41) / 2 -- start at the center after static data
	if r3 >= buffer.len(memory) then error("Not enough memory") end
end
FUNCS[1] = function(): boolean -- main
	r14 = bit32.lshift(0, 12)
	r48 = i32(r1)
	r46 = readf64(memory, 25 + r14)
	r3 = i32(r3 + -32)
	r13 = 999424
	writei32(memory, r3+28, r2)
	r16 = 0
	r15 = 0
	r13 = i32(r13 + 576)
	return false
end
FUNCS[2] = function(): boolean -- .L6
	r14 = bit32.band(bit32.lshift(r16, 1), 0xFFFFFFFF)
	r14 = i32(r14 + 1)
	r47 = i32(r14)
	r12 = bit32.band(r16, 1)
	r14 = i32(r16 + 1)
	r47 = r46 / r47
	if r12 ~= r1 then
		do
			PC = 3
			return true
		end
	end
	r12 = if (u32(r14) < u32(r16)) then 1 else 0
	r48 = r48 + r47
	r16 = r14
	r15 = i32(r12 + r15)
	do
		PC = 2
		return true
	end
	return false
end
FUNCS[3] = function(): boolean -- .L2
	r12 = if (u32(r14) < u32(r16)) then 1 else 0
	r48 = r48 - r47
	r16 = r14
	r15 = i32(r12 + r15)
	if r14 ~= r13 then
		do
			PC = 2
			return true
		end
	end
	if r15 ~= r1 then
		do
			PC = 2
			return true
		end
	end
	r16 = bit32.lshift(0, 12)
	r47 = readf64(memory, 33 + r16)
	r11 = bit32.lshift(0, 12)
	r11 = i32(r11 + 0)
	r48 = r48 * r47
	writef64(memory, r3+8, r48)
	r13 = readi32(memory, r3+8)
	r14 = readi32(memory, r3+12)
	if functions["printf"] then
		functions["printf"]()
		PC = 4
		return true
	else
		error("No bindings for functions 'printf'")
	end
	return false
end
FUNCS[4] = function(): boolean -- .L2 (extended) 
	r2 = readi32(memory, r3+28)
	r11 = 0
	r3 = i32(r3 + 32)
	do
		PC = r2
		return true
	end
	return false
end
FUNCS[5] = function(): boolean -- .LC1
	return false
end
function start(startPosition: number): ()
	PC = startPosition
	while FUNCS[PC] do
		if not FUNCS[PC]() then
			PC += 1
		end
	end
	flush_stdout()
end
init()
start(1)


New Features:

The compiler now has:

  • a --memory {bytes} argument, which sets how much memory is allocated for the buffer at compile time.
  • an --accurate option, which makes execution much slower but correctly simulates integer and floating-point overflows.

Example:

I have a small program that generates a string. Here is a comparison between C and Luau, with and without --accurate:

  • C Code (0.074s total): EBAX...BQR
  • Luau (0.536s total): CBMH...NGD
  • Luau with --accurate (2.295s total): EBAX...BQR
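One plausible source of that divergence (an assumption on my part, not a description of reasm's internals): Luau numbers are 64-bit doubles, which represent integers exactly only up to 2^53. A 32-bit multiply whose full product exceeds that loses its low bits before any wrap to 32 bits is applied, so hash- or RNG-style code drifts. The C sketch below contrasts a true wrapping 32-bit multiply with the same multiply done in a double and then reduced to its low 32 bits; the function names and constants are just example inputs.

```c
#include <assert.h>
#include <stdint.h>

/* Exact 32-bit multiply: keep the low 32 bits of the full product. */
static uint32_t mul_wrap(uint32_t a, uint32_t b) {
    return (uint32_t)((uint64_t)a * b);
}

/* The same multiply done in a double, then reduced modulo 2^32.
   The product is first rounded to 53 significant bits, so once it
   exceeds 2^53 its low bits are already wrong. */
static uint32_t mul_via_double(uint32_t a, uint32_t b) {
    double product = (double)a * (double)b; /* always < 2^64 here */
    return (uint32_t)(uint64_t)product;     /* keep the low 32 bits */
}
```

For small products both functions agree; for a pair like 1103515245 × 123456789 (about 1.4 × 10^17) the exact result is odd while the double-based one is even, so every later step of the generator diverges.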

This update also adds a few more floating-point instructions and the tail pseudo-instruction for jumps. Strings written to memory now also support special characters.

Reinterpretation (viewing the same bits as a different type) now uses buffer instead of string.pack and string.unpack, which brought the execution time with --accurate down from ~2 s to 0.88 s!

I'm going to hold off on 64-bit integer support until this RFC is added to Luau.

Starting ELF file support + Linux system calls.

DOOM port coming soon! I am currently porting libc to work with Luau.


Can you support EXE as well?

At the moment, no executable formats are stable; the best output comes from compiling directly to RISC-V assembly.

You could manually disassemble EXE files with objdump and compile from there, but once I do add executable support, I only plan to support ELF files.

Doom Generic 0.1
Z_Init: Init zone memory allocation daemon.
zone memory: 903bf4, 600000 allocated for zone
Using . for configuration and saves
V_Init: allocate screens.
M_LoadDefaults: Load system defaults.
./main.luau:86429: attempt to perform arithmetic (add) on nil and number
stacktrace:
./main.luau:86429
./main.luau:227505 function start
./main.luau:232023
./test.luau:9

Getting close; I just need to fix a couple of bugs until barebones DOOM can run. This also requires a 4 MB memory buffer!

Doom Generic 0.1

Z_Init: Init zone memory allocation daemon.

zone memory: 34ae70, 600000 allocated for zone
Using . for configuration and saves
V_Init: allocate screens.

M_LoadDefaults: Load system defaults.

saving config in .default.cfg
-iwad not specified, trying a few iwad names

Trying IWAD file:doom2.wad
W_Init: Init WADfiles.

 adding doom2.wad

:fire:

Got this working in the CLI with initialization and the game loop. Next I'm going to work on actually rendering to an EditableImage!

This is with 10 MB of RAM inside Luau, btw.


Found an edge case involving jump tables, which the C compiler generates automatically to shrink switch statements or to call functions by reference.

I'm going to update the memory-alignment and function-alignment calculators to account for this. DOOM is a crazy good program for catching bugs.
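For anyone curious what triggers this: a dense switch like the one below is often lowered by GCC and Clang to a table of code addresses in read-only data plus an indirect jump (whether it happens depends on the compiler, optimization level, and number of cases), and an array of function pointers likewise stores function addresses as data. Either way, the translator sees label addresses stored as data words and has to map them to its own blocks. This is only an example input for testing, not part of reasm, and the names are hypothetical.

```c
#include <assert.h>

/* Dense case values with distinct bodies: a typical jump-table candidate. */
static int apply(int op, int x) {
    switch (op) {
    case 0: return x + 1;
    case 1: return x * 3;
    case 2: return x - 7;
    case 3: return x ^ 5;
    case 4: return x << 2;
    case 5: return x / 2;
    default: return 0;
    }
}

/* Calling through a table of function pointers also stores
   code addresses as data. */
static int twice(int x) { return 2 * x; }
static int negate(int x) { return -x; }
static int (*const handlers[])(int) = { twice, negate };

static int call_handler(unsigned i, int x) {
    assert(i < sizeof handlers / sizeof handlers[0]);
    return handlers[i](x);
}
```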