How to perform modular multiplication efficiently between large integers?

Let’s say that we have three big integers a, b and p where p is a prime and also a safe prime and all numbers have 2048 bits of information(or more) assigned to them. If we represent these integers in a form of a list(it can be any list, for example an array, a buffer, etc) of integers that can take values between 0 and 2^24-1(meaning that each of them represents three bytes of information, or 24 bits) what’s the most effective way to calculate a*b mod p and get the result as a similar list double the size of the original inputs?

Keep in mind we can’t make any calculations directly with numbers surpassing 2^53 before or after the calculation of the result(due to lua limited precision issues).

You can think of the list representation as a circuit-like data structure representing an unsigned integer.