VM for Octave interpreter

This is a modified version of my Hilbert benchmark to allow for direct comparison with C++.

Octave code: hilbmoore.m (2.1 KB)

C++ code to do the bottleneck part of the Octave code: hilbmoore.cpp (1.3 KB)

Octave interpreter time: 153 seconds.

The C++ execution time to handle stdin and stdout is roughly 4.73 seconds, plus the cost of output redirection:

$ time ./hilbmoore <input >/dev/null      ## 4.51 seconds user,  1.61 seconds system,  6.12 seconds wall
$ time ./hilbmoore <input >/tmp/out       ## 4.71 seconds user,  4.71 seconds system,  9.43 seconds wall
$ time ./hilbmoore <input >./out          ## 4.96 seconds user, 12.28 seconds system, 17.27 seconds wall

The C++ version is roughly 33 times faster than the Octave interpreter.

I ran this basic profiler on the running process to see where the delays were: poormanprofiler.sh (398 Bytes)

This was the full result: profile (654.8 KB)

As you can see, there are several threads waiting or polling for new events, while the interpreter thread itself accounts for less than 5% of the time. I'm not sure what that means: is the interpreter really waiting, or are the actions hidden from the profiler?

For any VM work, we could run it on the above Octave code and see how much it improves relative to C++. It could serve as an additional benchmark.

1 Like

@alexvong1995 That is an encouraging talk, thanks for posting it. Given that they started with Scheme and tried to build a compiled version of that, and we don’t use any Lisp-like languages in Octave, what can we reuse from their work on the technical side?

1 Like

See also the entry for Bytecode on Wikipedia (Bytecode - Wikipedia).

It describes a similar progression in performance from (1) AST + evaluator, (2) bytecode + stack-based VM, (3) bytecode + register-based VM, (4) bytecode to native machine code.

3 Likes

Sorry guys!

I made a mistake; I thought it was on May 25 :man_facepalming: :man_facepalming: :man_facepalming:

Actually, when I saw this link I realized I was mistaken. Really sorry. I’ll put my stuff here.

What I wrote is more for brainstorming purposes than for making concrete suggestions, as there are many choices, such as:

I assume we’ve ruled out LLVM due to its unstable API.


Theoretically, one could write a program to convert the Octave AST to Tree Intermediate Language (Tree-IL) and let the Guile compiler tower do the rest (Octave AST → Tree-IL → CPS → Bytecode). In fact, this is how Guile manages to support other languages such as ECMAScript 3.1 and Emacs Lisp.

Similarly, for the JVM, we could convert the Octave AST to JVM bytecode (Octave AST → JVM Bytecode) and let the JVM handle the rest.

For the RPython toolchain, one writes an interpreter in RPython and the toolchain automatically generates a JIT compiler for you. This is how PyPy is implemented.

We also need to think about how to call native code. AFAIK Octave relies on lots of C / Fortran libraries for linear algebra operations.


Finally, as a first step, we need a function to print (serialize) the Octave AST, as all of the above discussion requires us to work on the AST. Documentation on the AST would be highly appreciated as well. Personally, I would prefer the printed AST to be in S-expression format, as S-expressions can be easily read (deserialized) by any Lisp / Scheme / Clojure system. For example, Julia uses FemtoLisp for parsing and code lowering.
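
To make that concrete, here is a minimal sketch (in C++) of printing a tree as an S-expression. The node type below is a hypothetical, simplified stand-in, not Octave’s real parse-tree classes; a real implementation would presumably be a visitor over the actual AST.

// Hypothetical, simplified AST node; not Octave's real parse-tree classes.
#include <iostream>
#include <memory>
#include <string>
#include <vector>

struct node
{
  std::string tag;                                // e.g. "binary-op", "id", "const"
  std::string text;                               // operator symbol, identifier, or literal
  std::vector<std::unique_ptr<node>> children;
};

// Print a node as an S-expression: (tag text child ...)
void print_sexpr (const node& n, std::ostream& os)
{
  os << '(' << n.tag;
  if (! n.text.empty ())
    os << ' ' << n.text;
  for (const auto& c : n.children)
    {
      os << ' ';
      print_sexpr (*c, os);
    }
  os << ')';
}

For y = x^2 + 1 the output could look something like (assign (id y) (binary-op + (binary-op ^ (id x) (const 2)) (const 1))), which any Lisp reader can parse directly.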

It would be fun to create proof-of-concepts, but I need to focus on GSoC in the near future.

1 Like

Following the meeting on Tuesday, I took some time to create a very simple proof-of-concept bytecode generator. I have attached a changeset below if you’d like to take a look.

PLEASE NOTE: my example code only handles a few types of expressions and switch statements, but it might be useful to give you an idea of how this kind of thing can work. It DOES NOT actually generate bytecode instructions; it just prints a list. But it allows you to see what the instruction list might look like. There are two examples below with some explanation.

After applying the patch, you can use __dump_bytecode__ (fcn_name) to display a list of instructions for the named function (but remember, it only handles constants, symbols, binary ops, and switch statements).

For the following function

function simple_expr (a, b, c)
  (a + b) * c + 2
endfunction

the corresponding list of instructions (for the body of the function only, just the single expression) is

     0 IDENTIFIER
     1   a
     2 IDENTIFIER
     3   b
     4 +
     5 IDENTIFIER
     6   c
     7 *
     8 CONSTANT
     9   2
    10 +
    11 DISPLAY
    12   ANS or variable name?

The first column is just the instruction number. It’s not important here, but will be in the next example.

Executing this code would cause the following actions:

  • evaluate identifier a by performing the following actions: look up identifier a, evaluate it (variable value or function call), and push the resulting value on the stack
  • evaluate identifier b
  • perform binary addition operation by popping the top two values on the stack, adding them, and pushing the result
  • evaluate identifier c
  • perform binary multiplication operation
  • push constant value 2 on the stack
  • perform binary addition operation
  • display the result (we need to be able to determine whether to print ans or a variable value; not sure how to do that yet using this kind of evaluator)
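
For illustration only, here is a rough sketch of what an execution loop for such an instruction list could look like. This is not the code from the attached changeset; it uses plain doubles and a std::map as stand-ins for octave_value and real symbol-table lookup, and the opcode and instruction types are invented for the example.

#include <cstddef>
#include <map>
#include <stack>
#include <string>
#include <vector>

enum class opcode { IDENTIFIER, CONSTANT, ADD, MUL, DISPLAY };

struct instruction
{
  opcode op;
  std::string name;    // operand of IDENTIFIER
  double value;        // operand of CONSTANT
};

double run (const std::vector<instruction>& code,
            const std::map<std::string, double>& symtab)
{
  std::stack<double> st;

  std::size_t ip = 0;                  // explicit instruction pointer, no recursion
  while (ip < code.size ())
    {
      const instruction& insn = code[ip++];

      switch (insn.op)
        {
        case opcode::IDENTIFIER:       // look up a variable and push its value
          st.push (symtab.at (insn.name));
          break;

        case opcode::CONSTANT:         // push a literal value
          st.push (insn.value);
          break;

        case opcode::ADD:              // pop two operands, push the result
        case opcode::MUL:
          {
            double rhs = st.top (); st.pop ();
            double lhs = st.top (); st.pop ();
            st.push (insn.op == opcode::ADD ? lhs + rhs : lhs * rhs);
          }
          break;

        case opcode::DISPLAY:          // a real VM would print "ans = ..." here
          break;
        }
    }

  return st.top ();
}

With a = 1, b = 2, c = 3, running the instruction list for (a + b) * c + 2 through this loop leaves 11 on the stack.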

Here’s an example with a switch statement that shows jumping (so instruction numbers are needed). Here is the Octave code:

function switch_example (a)
  switch (a)
    case 1
      case_1_code;
    case 3
      case_3_code;
    otherwise
      otherwise_code;
  endswitch
endfunction

and the corresponding instruction list:

     0 IDENTIFIER
     1   a
     2 CONSTANT
     3   1
     4 CASE
     5   10
     6 IDENTIFIER
     7   case_1_code
     8 JUMP
     9   20
    10 CONSTANT
    11   3
    12 CASE
    13   18
    14 IDENTIFIER
    15   case_3_code
    16 JUMP
    17   20
    18 IDENTIFIER
    19   otherwise_code
    20 POP

For this example the actions to execute are:

  • Evaluate switch expression a.
  • Push switch case selection value.
  • Compare switch expression with case selection value (later, this could be a cell array with multiple selection values). The number following the CASE instruction is the number of the first instruction of the next case in the switch expression. If the switch expression matches a selection value, skip that number and continue execution. Otherwise, jump to that instruction.
  • At the end of the code for each case, there is a JUMP instruction that jumps to the end of the switch block (the POP statement) because only one case can match.
  • If the switch expression doesn’t match any of the case selection values, we end up at the otherwise clause and fall through to the final POP instruction after evaluating that code.
  • The final POP discards the switch expression value that was initially pushed on the stack.
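
Continuing the same illustrative sketch from above (and assuming CASE, JUMP, and POP opcodes plus a target field are added to the hypothetical instruction struct, with jump targets expressed as instruction indices), these cases would slot into the switch of that loop. The key point is that control flow is just an assignment to the instruction pointer, with no recursion:

        case opcode::CASE:
          {
            double sel = st.top (); st.pop ();   // case selection value
            double sw  = st.top ();              // switch value stays on the stack
            if (sw != sel)
              ip = insn.target;                  // no match: jump to the next case
            // on a match, fall through to this case's code
          }
          break;

        case opcode::JUMP:
          ip = insn.target;                      // skip past the remaining cases
          break;

        case opcode::POP:
          st.pop ();                             // discard the switch value
          break;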

Again, my simple prototype is just an example for illustration. To make this work, we have to generate an actual list of instruction codes instead of the text representation that __dump_bytecode__ displays now. This job must be done for all parse tree elements and we need to create the functions to perform the evaluator actions like those described above.
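
As one possibility (purely hypothetical, with invented field names), the actual list of instruction codes could be a flat vector of opcode and operand-index bytes, with the operands themselves stored in per-function tables; in a real implementation the constants would be octave_value objects rather than doubles.

#include <cstdint>
#include <string>
#include <vector>

struct compiled_function
{
  std::vector<std::uint8_t> code;     // opcode, operand index, opcode, ...
  std::vector<double> constants;      // CONSTANT operands (octave_value in practice)
  std::vector<std::string> names;     // IDENTIFIER operands (symbol names)
};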

Some existing code could be reused or adapted but these operations are generally new or work differently. The current code in pt-eval.cc and the expression evaluation functions are not directly usable because they perform evaluations recursively and that’s something we want to avoid.

To make things easier and allow some incremental development, it should be possible to disable evaluation of any interpreted code at Octave startup so that individual parse tree elements could be added to the bytecode interpreter and tested one at a time.

proto-bytecode-generator-diff.txt (25.4 KB)

2 Likes

Walking the parse tree and generating another representation is not too hard. But it’s not clear to me how to tell Tree-IL or the JVM how to handle Octave value objects and how to call internal Octave functions to perform the actions required by Octave.

@jwe: Rapid start! I need some help on how to invoke it. I typed this function at the command line

function y = fcn (x)
  y = x^2 + 1;
endfunction 

Then I tried to invoke the bytecode dump but got either an error or a blank output:

octave:3> __dump_bytecode__ (fcn)
error: 'x' undefined near line 2, column 5
error: called from
    fcn at line 2 column 3
octave:4> __dump_bytecode__ ("fcn")
octave:5> 

EDIT: Oh I see now! It works if I type function fcn (x) instead of function y = fcn(x).

__dump_bytecode__ (fcn) fails because it is trying to call fcn to evaluate the argument to __dump_bytecode__. You just need to pass the name of the function to __dump_bytecode__, as on line 4 of the command-line transcript.

This is a very simple implementation of a very simple language using a Jittery VM.

There is no lexer or parser.

I’d recommend starting by reading ast.test.cpp and cgen.test.cpp (of course, after the README :wink:).

This is the “Structure of the code” section of README.md:

  • ast.hpp, ast.cpp: Types and functions related to creating AST (Abstract Syntax Tree). This implementation uses stlab::forest to encode the AST. For an introduction to them, read this. And also, this.
  • ast.test.cpp: Test cases for AST manipulation.
  • cgen.hpp, cgen.cpp: Types and functions related to code generation. The cgen::sexpr function generates S-expressions. cgen::jitter generates code for the VM generated by Jitter (specified by the toctave.jitter file).
  • cgen.test.cpp: Test cases for code generation verification.
  • env.hpp, env.cpp: Run-time environment.
  • toctave.jitter: Specification of Jittery VM.
  • stlab/: stlab libraries.

I still have to implement eval, which is a little tricky but straightforward.
I also have a half-written “Introduction to Jittery VMs” which I’ll publish soon.

And feel free to ask any question regarding this implementation and Jitter.

1 Like

I can build ast.test and see its output but I cannot build cgen.test because there’s a problem with Jitter itself. Jitter gives the build error:

../example-vms/jitterlisp/jitterlisp-eval-vm.h:27:10: fatal error: jitterlispvm-vm.h: No such file or directory
   27 | #include "jitterlispvm-vm.h"

How did you work around that problem?

You have to build and install Jitter from the git repo; please follow these steps:

  • git clone https://git.ageinghacker.net/jitter
  • cd jitter && ./bootstrap

At this point you have to apply the jitter.patch (which is distributed in the toctave repo):

  • patch -p1 </path/to/toctave/jitter.patch

Then,

  • mkdir build && cd build
  • ../configure --prefix=$HOME/usr
  • make
  • make install

Now you have Jitter installed under the ~/usr directory, and make should work in the toctave directory.

Thanks, it works now. The mistake I made earlier was trying to patch Jitter with your patch using git apply .../jitter.patch instead of patch -p1 < .../jitter.patch. For some reason that didn’t work as expected, though it gave no warnings or errors. But it builds now.

What are the next activities? I see you listed eval.

1 Like

Yes, first I have to implement eval to show the trick. Then, I think, function calls. Then conditionals. Just to show how it looks and feels. And I want to keep it simple to make it easier for everybody to get the general idea.

Then I want to add a code generator for libgccjit.

This VM is a stack-based VM, because generating code for stack-based VMs is easier.
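
As a small illustration of that point (hypothetical types, not from the toctave code): generating stack-machine code is just a post-order walk of the expression tree, with no register allocation, because the operands simply meet on the stack.

#include <iostream>
#include <memory>
#include <string>

struct expr
{
  std::string op;                         // "+", "*", or empty for a leaf
  std::string leaf;                       // identifier or literal text (leaves only)
  std::unique_ptr<expr> lhs, rhs;
};

void compile (const expr& e)
{
  if (e.op.empty ())
    std::cout << "PUSH " << e.leaf << '\n';              // leaf: push its value
  else
    {
      compile (*e.lhs);                                   // code for the left operand
      compile (*e.rhs);                                   // code for the right operand
      std::cout << (e.op == "+" ? "ADD" : "MUL") << '\n'; // operator pops both, pushes result
    }
}

For (a + b) * c this prints PUSH a, PUSH b, ADD, PUSH c, MUL; a register-based VM would additionally have to decide where each intermediate result lives.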

And what do you think about the code? Is it readable? Do you have any questions? I tried to keep the test cases simple and clear. (And sorry for short variable names, I love them!)

1 Like

The code style for the header and .cpp files is clear to me. I have not yet understood the purpose of the anonymous functions like post = [](const auto& f) as opposed to defining them as regular functions. The same goes for cgen.cpp in various constructor and function definitions. Is it something to do with connecting it to the VM in later steps?

I just wanted to limit the scope of functions; instead of a regular function, I defined a lambda inside the corresponding function. Nothing special.

And I also used lambdas as the deleters of std::unique_ptrs, because to free the memory we need to call a specific function.

E.g., in the following code I’m allocating a VM routine by calling toctave_make_mutable_routine, and to free the memory we have to call toctave_destroy_mutable_routine.

JRoutine::JRoutine()
  : r(toctave_make_mutable_routine(), [](struct toctave_mutable_routine* p) {
    toctave_destroy_mutable_routine(p);
  })
{}

(Type of r is std::unique_ptr<struct toctave_mutable_routine, void (*)(struct toctave_mutable_routine*)>).

I could use regular functions, but I decided to use lambdas.

1 Like

@arungiridhar Thanks to your comments, I simplified the code by removing some unnecessary lambdas, here. Thanks!

1 Like

@mnabipoor : The Octave website and associated services were migrated over the past few weeks to a new server. This post might be of interest to you as you develop the VM: Mercurial hosting for Octave developers

I fiddled with making a bytecode VM for GNU Octave. I got to the annoying part of implementing a debugger and “upvar” functionality and put it on ice for a while.

I might clean the code up and post it somewhere for reference. The speedup is quite noticeable and it compiles most functions I throw at it.

Compiling to machine code is very hard due to deep C++ dependencies and is probably not going to happen without a deep rewrite of Octave. But a VM seems to work just fine.

@mnabipoor and @Petter : We have an upcoming developers meeting at 2022-06-28T18:00:00Z and the current agenda has an entry for VM discussion if either/both of you want to join.

Please edit the wiki agenda here under Today’s Items: Online Developer Meeting (2022-06-28) - Octave