VM for Octave interpreter

Hello Everybody!

As far as I know GNU Octave interprets the code by evaluating the AST (which
is a slow method).

A better idea is to use a virtual machine (VM). By VM I mean something like
Java Virtual Machine (JVM). VMs are very popular among popular programming
languages like Python, Lua, JavaScript, Ruby, Java, etc.

I think the first step toward faster Octave is to have a VM.

My suggestion is to have a VM generated by GNU Jitter (Jitter - GNU Project - Free Software Foundation).

GNU Jitter is a software automatically generating a portable, very efficient
language virtual machine with performance close to native code, starting from
a relatively high-level specification provided by the user.

It can generates 4 kinds of VM implementation. From a portable switch dispatch,
to more sophisticated dispatches which behaves more like a JIT.

GNU poke (GNU poke) project uses jitter for its virtual
machine (Poke Virtual Machine (PVM)).

I’d like to implement the VM for Octave if there’s an interest in the community.

If you are interested I can prepare a presentation with more info for the next
Octave developers meeting to discuss this option.

More info:

1 Like

Since VMs are a separate topic from JIT using libgccjit, could the VM comments (including this one) be moved to a separate topic pls? (@siko1056 I think you can do this? I don’t have that functionality.)

Re VMs, yes @mnabipoor please add your topic as an item here: Online Developer Meeting (2022-05-24) - Octave

1 Like

The reason I talked about a Jitter-generated VM in this topic is that if Octave use Jitter to generate the VM, it no longer needs JIT support through libgccjit. Jitter-generated VM (with advanced dispaches) can gives a very good performance.

Jitter achieves high performance by:

  • machine-generating C code;
  • compiling the generated code ahead-of-time with GCC;
  • copying, patching and recombining pieces of native code at run time into executable memory.
2 Likes

Thank you for clarifying that Jitter and libgccjit are two competing but mutually exclusive or incompatible approaches to the same outcome of faster execution. Since it would be confusing to discuss both incompatible technologies in the same thread, let’s move the Jitter discussion to a new topic and track both threads, in case the libgccjit devs comment on this thread as I’ve invited them to do so. Octave can work with either / both. Competing ones are not a problem.

Happy to hear your thoughts on how Jitter would be implemented in Octave.

@siko1056 or anyone else who can split threads: pls move the Jitter comments, including this one, to a new thread.

1 Like

I’d be glad to discuss ideas for improving the interpreter, whether by using a custom byte-code interpreter that is more efficient than the current parse tree evaluator, something like jitter to generate the code for the virtual machine, or a full JIT compiler.

3 Likes

@mnabipoor I have created an entry for you to discuss Jitter and VMs for the next Octave devs meeting. You can edit it. It will be on May 24 at 1800 UTC. Online Developer Meeting (2022-05-24) - Octave

One layperson question I have is that Octave when configured with --enable-java loads the Java VM for certain operations. Can multiple VMs coexist or will they interfere?

@arungiridhar Thank you for this entry and putting the notes in the wiki (I have problem with creating an account there; I get “Incorrect or missing CAPTCHA.”).

It’s OK to have multiple VMs.

Jitter generates three files (1 .h and 2 .c files) which are the whole implementation of the VM (they use libjitter.a (a static library) to provide ).

I’ll try to prepare a presentation for about 30 minutes to give an overview of Jitter and a small demo VM. And I also try to talk about libgccjit and how can these two approaches can co-exist (which I think a good thing, at least as a performance baseline).

30 minutes might be too much, depending on how many other items for that meeting that will be added in the next two weeks. But do keep monitoring that wiki page and you can adjust your discussion points.

OK, I’ll try it to keep it short and provide more references for further reading/watching.

1 Like

Do you still have problems with the wiki account creation? The mentioned error occurs when the Octave questions for spam protection is not correctly solved. I am sorry for the inconvenience and ask you to try this again.

1 Like

I tried again, and it worked. Thanks.

2 Likes

It would also be helpful if you could provide something prior to the meeting for others to read so that everyone can have some background on what you are proposing before the meeting. I also encourage you to post those ideas here so we can start a discussion. That doesn’t have to wait for the meeting and is probably best to begin sooner rather than later. The meetings are typically pretty high level and we try to not get bogged down too much in technical details.

4 Likes

(Sorry for the late response)

Here I try to give an overview of the whole process. Then we can discuss the
details based on your feedbacks and thoughts and questions.

Current pipeline for Octave is like this: lexer -> parser -> tree_evaluator.
The tree_evaluator is the AST interpreter.

The plan is to change the pipeline to somehting like this:
lexer -> parser -> code_generator -> vm.

It’s like replacing a component with two components to make execution faster :slight_smile:

Some reasons why AST interpreters are slow (ref page “27 of 245”):

  • pointer chasing
  • many conditionals (mispredict penalty)
  • slow variable lookup
  • recursion

Modern CPUs love linear programs, an array of simple instructions to evaluate.
More assembly-like instructions.

The code_generator traverses the AST tree and instead of evaluating it,
generates the array of VM instructions. And once the array is ready, we can
start the execution.

So the expectation is that this (maybe counter-intuitive) approach increases the
execution speed (all of these will be evaluated carefully by measuring the
compilation+execution time for different scripts and commands).

In the process of transition to the VM, tree_evaluator will serve as the
base line for the performance. I expect a “significant” improvement over
tree_evaluator (I don’t have any definition for “significant” for now :D, we will
see!)

There are different techniques on how to execute the VM instructions fast, which
is the magic happens under the Jitter’s hood.
(I can talk about this, or you can watch the talks by Luca Saiu I sent earlier).

IMHO the tree_evaluator is not a good base line for execution, we need a better
one. So, I suggest to have another codegen that “generates” code for libgccjit.
According to the author of libgccjit, we can expect a very slow compilation
and very fast execution (because it uses the real gcc under the hood).

Another expectation is that Jitter should not be significantly slower than
the libgccjit. (I think it’s a reasonable expectation).

I think these two baselines (the tree_evaluator and libgccjit) are good
for guiding the whole process of developing a good VM for Octave.

I also think, MIR is a promising project. But I’m not sure about supporting it
at this stage.

So, my suggested plan in summary (which I confirm that are a little vague):

  • Design of a instruction set for the VM
  • Implementation of the the VM in Jitter.
  • A codegen_jitter pass, which generates code for the VM generated by Jitter
  • A codegen_gccjit pass, which generates code for libgccjit as a good
    baseline.

And to make sure that this plan is actually doable, and make things more clear,
I’m developing a demo for a tiny Octave-like language that implements these ideas
to explore the problem space.

2 Likes

Thanks for the summary.

I agree with the idea of defining a set of instructions and a “linear” evaluator that could be used instead of the current parse tree evaluator. As a first step, I’d like to see this new evaluator implemented in a very simple way without any external dependencies. Would you be interested in helping with that?

Once the new evaluator is working, we can consider using Jitter to generate an evaluator or inferring types and hooking Octave up to libgccjit or LLVM.

2 Likes

Another layperson question from me. Is this an either-X-or-Y-but-not-both choice between the interpreter and the code generator or will both coexist and be selected for a given chunk of user code based on context?

I imagine that a “linear” evaluator could replace the current parse tree evaluator. If we have a JIT compiler and there is code that it can’t compile (eval or functions like load that can currently insert variables into the current workspace) then I expect that the evaluator would have to remain to handle those cases.

1 Like

As a first step, I’d like to see this new evaluator implemented in a very simple way without any external dependencies.

It means we are going to implement a VM from scratch. It’s doable and it’s not that hard. But I think it’s not that helpful, because optimizing the VM is a non-trival task. We’re basically re-doing what Jitter’s author did over the years.

I think I have to finish the demo sooner, before the meeting to give everybody the opportunity to see some code to get a better sense of whole thing.

1 Like

I’m not proposing that we try to beat the performance of Jitter or other well-established VMs. My intent is to do something that is relatively easy to implement and, more importantly, easy for me and other Octave developers to understand. Once we have a simple instruction set and interpreter working, we can look at using Jitter or some other higher performance solution.

2 Likes

@mnabipoor You were missed in today’s monthly dev meeting as you didn’t show. If you put together any material for the rest of us to follow along, please share it here in this Discourse thread.

People may want to listen to this FOSDEM 2019 talk Guile 3: Faster programs via just-in-time compilation by Andy Wingo.

Guile in the past used an evaluator, which was slow. It then moved to using stack based VM, later register based VM. Finally, in Guile 3, it generates native code from VM instructions using a template JIT. This may give us clues on how to do things incrementally.

1 Like