bytes
Please take everything I say with a grain of salt. Different VMs have different implementations; some may parse directly into bytecode, but who knows. That sort of info isn’t very easy to find, afaik.
I agree with a lot of what you said, but you skipped a ton of steps.
First we have lexical analysis (which is what the lexer does).
Lexical analysis converts the code into tokens, like:
5+2
-->
[
  {
    type: 'num',
    value: '5',
  },
  {
    type: 'op',
    value: '+',
  },
  {
    type: 'num',
    value: '2',
  }
]
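If it helps, here's a rough sketch of a lexer that produces tokens in that shape. It's only an illustration: the tokenize name is made up, and it assumes the input is nothing but non-negative integers, the operators + - * /, and whitespace.

```javascript
// Hypothetical minimal lexer: integers and + - * / only (an assumption,
// real lexers handle identifiers, strings, comments and much more).
function tokenize(source) {
  const tokens = [];
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    if (/\s/.test(ch)) {
      i++; // skip whitespace
    } else if (/[0-9]/.test(ch)) {
      // consume a whole run of digits as a single number token
      const start = i;
      while (i < source.length && /[0-9]/.test(source[i])) i++;
      tokens.push({ type: 'num', value: source.slice(start, i) });
    } else if ('+-*/'.includes(ch)) {
      tokens.push({ type: 'op', value: ch });
      i++;
    } else {
      throw new SyntaxError(`Unexpected character '${ch}' at position ${i}`);
    }
  }
  return tokens;
}

console.log(tokenize('5+2'));
```

Note that the values stay strings at this point, just like in the token list above; turning '5' into the number 5 is left for a later stage.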
Next is the parser, which does syntactic analysis.
Syntactic analysis converts the token ‘stream’ into a tree-shaped object, also known as an ‘Abstract Syntax Tree’, or AST for short.
Pretty much, we take what we have above and convert it to:
{
  type: 'program',
  body: [
    {
      type: 'bin_op',
      operator: '+',
      left: {
        type: 'literal',
        value: 5,
      },
      right: {
        type: 'literal',
        value: 2,
      }
    }
  ]
}
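A parser that gets from the token list to that tree could look roughly like the sketch below. The parse function and its helpers are hypothetical names, and to keep it short it assumes a single expression with left-to-right operators and no precedence or parentheses, which a real parser would need.

```javascript
// Hypothetical minimal parser: number (op number)*, left-associative,
// no operator precedence and no parentheses (both are simplifications).
function parse(tokens) {
  let pos = 0;

  function parseLiteral() {
    const token = tokens[pos++];
    if (!token || token.type !== 'num') {
      throw new SyntaxError('Expected a number');
    }
    // the lexer kept the value as a string; convert it here
    return { type: 'literal', value: Number(token.value) };
  }

  function parseExpression() {
    let left = parseLiteral();
    while (pos < tokens.length && tokens[pos].type === 'op') {
      const operator = tokens[pos++].value;
      const right = parseLiteral();
      // fold what we have so far into the left side of a new node
      left = { type: 'bin_op', operator, left, right };
    }
    return left;
  }

  const body = [parseExpression()];
  if (pos < tokens.length) {
    throw new SyntaxError('Unexpected token after expression');
  }
  return { type: 'program', body };
}

const tokens = [
  { type: 'num', value: '5' },
  { type: 'op', value: '+' },
  { type: 'num', value: '2' },
];
console.log(JSON.stringify(parse(tokens), null, 2));
```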
As you can see, we still have no bytecode.
That’s the next stop: we have to compile the AST to bytecode.
Our output should be:
0 0 # GET_CONST 0 (5)
0 1 # GET_CONST 1 (2)
1 # BINARY_ADD
You may have noticed that we’re using some sort of GET_CONST. A lot of (if not all) VMs use something called a ‘constant pool’. It’s a list of constants which are stored beforehand, and instructions refer to them by index. There’s not much to it, so I won’t go into detail. Feel free to search it up though.
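To make the compile step and the constant pool a bit more concrete, here's a rough sketch. The opcode numbers (0 for GET_CONST, 1 for BINARY_ADD) come from the listing above; everything else, including the compile and addConstant names and the { constants, code } return shape, is made up for illustration.

```javascript
// Opcode numbering taken from the listing above.
const OP = { GET_CONST: 0, BINARY_ADD: 1 };

// Hypothetical compiler: walks the AST and emits a flat array of bytes
// plus a constant pool. Only literals and '+' are supported here.
function compile(ast) {
  const constants = [];
  const code = [];

  // Return the pool index of a constant, adding it if it's new.
  function addConstant(value) {
    let index = constants.indexOf(value);
    if (index === -1) {
      index = constants.length;
      constants.push(value);
    }
    return index;
  }

  function emit(node) {
    switch (node.type) {
      case 'literal':
        code.push(OP.GET_CONST, addConstant(node.value));
        break;
      case 'bin_op':
        if (node.operator !== '+') {
          throw new Error(`Unsupported operator: ${node.operator}`);
        }
        // operands first, then the operation (post-order walk)
        emit(node.left);
        emit(node.right);
        code.push(OP.BINARY_ADD);
        break;
      default:
        throw new Error(`Unknown node type: ${node.type}`);
    }
  }

  for (const statement of ast.body) emit(statement);
  return { constants, code };
}

const ast = {
  type: 'program',
  body: [{
    type: 'bin_op',
    operator: '+',
    left: { type: 'literal', value: 5 },
    right: { type: 'literal', value: 2 },
  }],
};
console.log(compile(ast)); // constants [5, 2], code [0, 0, 0, 1, 1]
```

The operands are emitted before the operator so that they are already available by the time the add instruction runs.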
Now, after all of those steps, we can finally interpret the bytecode.
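We go step-by-step, reading each byte and executing something based on the current byte. Many bytecode VMs are stack-based (the JVM and CPython, for example), though not all of them: Lua’s VM, for instance, is register-based.
In a stack-based VM, once the bytecode above has run, 7 will be on top of the stack.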
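Here's a minimal sketch of what such an interpreter loop could look like for the three instructions above. The run function is a hypothetical name, and the bytes and constant pool are hard-coded rather than coming from a real compiler.

```javascript
// Opcode numbering taken from the bytecode listing above.
const OP = { GET_CONST: 0, BINARY_ADD: 1 };

// Hypothetical stack-based interpreter loop.
function run(code, constants) {
  const stack = [];
  let ip = 0; // instruction pointer: index of the next byte to read
  while (ip < code.length) {
    const opcode = code[ip++];
    switch (opcode) {
      case OP.GET_CONST: {
        const index = code[ip++]; // the operand byte follows the opcode
        stack.push(constants[index]);
        break;
      }
      case OP.BINARY_ADD: {
        const right = stack.pop();
        const left = stack.pop();
        stack.push(left + right);
        break;
      }
      default:
        throw new Error(`Unknown opcode ${opcode}`);
    }
  }
  return stack;
}

const finalStack = run([0, 0, 0, 1, 1], [5, 2]);
console.log(finalStack[finalStack.length - 1]); // 7
```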
Also, the bytecode I showed is not completely accurate. Some compilers do optimizations such as constant folding: they do the addition at compile time and just emit the result.
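Roughly, that kind of pre-computation could be done as a pass over the AST before any bytecode is emitted. This is only a sketch: the fold function is a made-up name and it handles '+' on literals and nothing else.

```javascript
// Hypothetical optimization pass: replaces literal + literal with a single
// literal node. Other node types and operators are left untouched.
function fold(node) {
  if (node.type !== 'bin_op') return node;
  // fold the children first so nested additions collapse too
  const left = fold(node.left);
  const right = fold(node.right);
  if (
    node.operator === '+' &&
    left.type === 'literal' &&
    right.type === 'literal'
  ) {
    return { type: 'literal', value: left.value + right.value };
  }
  return { ...node, left, right };
}

const expression = {
  type: 'bin_op',
  operator: '+',
  left: { type: 'literal', value: 5 },
  right: { type: 'literal', value: 2 },
};
console.log(fold(expression)); // { type: 'literal', value: 7 }
```

With a pass like this, the compiler would only need to emit a single GET_CONST for 7 instead of three instructions.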
This topic is very big. I would recommend looking for some tutorials if you’re interested in it. There is also a pretty cool tool which lets you view what a compiler actually outputs for your code (mostly assembly, and bytecode for some languages): https://godbolt.org/