Will you add functionality that lets us use and control hardware-level instructions such as SIMD (Single Instruction, Multiple Data)? I would like faster numerical computation for tensor and matrix operations. I think it would be extremely useful for running deep learning libraries written in pure Lua, like mine, DataPredict Neural.
Imagine in-game recommendation systems, in-game generative AIs, and highly performant learning AIs that improve the user experience!