Using OpenMP in an oct-file

Hello,

I’m trying to speed up a C++ for loop and was hoping I could make use of OpenMP directives.
Parallelization seems to work, but when I try to use local variables in the threads I get random segfaults that crash Octave, or the resulting data is inconsistent.

#include <octave/oct.h>
#include <omp.h>

DEFUN_DLD (test_omp_parallel_for_, args, nargout, "Help text")
{

  NDArray A = args(0).array_value ();
  NDArray B (A.dims());
  
  #pragma omp parallel for
  for (octave_idx_type i=0;i<A.numel();i++) {
    
    double r = 2*A.elem(i);  
    
    #pragma omp critical
    A.elem(i) =  r;
    #pragma omp critical
    B.elem(i) = omp_get_thread_num(); 
    
  }
   
  octave_value_list retval (nargout);
  retval(0) = octave_value(A);
  retval(1) = octave_value(B);
  return retval;

}

If I write to A directly (without the local variable) the segfaults do not appear.

A.elem(i) = 2*A.elem(i);

So my questions are:

  • is there a tutorial somewhere on using OpenMP with Octave?
  • how am I supposed to declare local variables?
  • are Octave classes like NDArray thread-safe?

Thanks a lot for any suggestions!
Ingo

My system

  • OS: Ubuntu 21.10
  • Octave version: 6.2.0
  • Installation method: Ubuntu repo

Yes, I think the problem is that Octave’s C++ interface is not fully thread-safe yet. The given example crashes for me reliably as well :sweat:

The use of NDArray::elem seems to cause the race condition. In particular, its internal call to make_unique touches and modifies a lot of internal state concurrently. Because of this, the first reading access that computes r has to be guarded by an “omp critical” section as well (with that in place I did not experience any crashes either). On the other hand, this makes the execution effectively serial again, with lots of busy-waiting threads :sweat_smile:
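
For illustration, this is roughly what that fully guarded variant of the loop looks like (just a sketch of the idea, not code from the original post):

  #pragma omp parallel for
  for (octave_idx_type i = 0; i < A.numel (); i++) {
    double r;

    // elem() may call make_unique() internally, so even the read needs to be
    // protected by a critical section ...
    #pragma omp critical
    r = 2*A.elem (i);

    // ... as do both writes, which effectively serializes the loop again.
    #pragma omp critical
    A.elem (i) = r;
    #pragma omp critical
    B.elem (i) = omp_get_thread_num ();
  }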

Luckily, there is a function NDArray::xelem, which gives almost bare access to the underlying data structure and can safely be used for concurrent element access from an OpenMP for-loop.

#include <octave/oct.h>
#include <omp.h>

DEFUN_DLD (test_omp_parallel_for_, args, nargout, "Help text")
{

  NDArray A = args(0).array_value ();
  NDArray B (A.dims());

  #pragma omp parallel for
  for (octave_idx_type i=0;i<A.numel();i++) {
    double r = 2*A.xelem(i);             // xelem: unchecked access, no make_unique
    A.xelem(i) = r;
    B.xelem(i) = omp_get_thread_num();   // record which thread handled element i
  }

  octave_value_list retval (nargout);
  retval(0) = octave_value(A);
  retval(1) = octave_value(B);
  return retval;

}
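
In case it is useful, here is roughly how such a file can be built and called for a quick test. This is only a sketch, not part of the original post: it assumes the code is saved as test_omp_parallel_for_.cc and that mkoctfile picks up the CXXFLAGS/LDFLAGS environment variables (which then replace Octave’s default values):

setenv ("CXXFLAGS", "-O2 -fopenmp");
setenv ("LDFLAGS", "-fopenmp");
mkoctfile ("test_omp_parallel_for_.cc");

x = rand (1, 1e6);
[y, tid] = test_omp_parallel_for_ (x);   % tid(i) is the thread that handled element i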

To learn about the Octave C++ API, I found studying the Doxygen documentation quite helpful, together with reading the source code.

For background, the elem function is designed to be “safe” for users. It checks whether the index lies outside the NDArray and, if it does, automatically grows the NDArray to the required size. This makes it behave like the interpreted Octave language, where you can index an array arbitrarily:

x = [1:5];
x(10) = 10
x =

    1    2    3    4    5    0    0    0    0   10

However, if the programmer can guarantee never to exceed the current array size, then this checking is unnecessary and detrimental to performance. Within liboctave many of the routines use xelem for performance, together with a loop structure like the one you chose

for (octave_idx_type i=0;i<A.numel();i++) {

since the index i is then guaranteed never to exceed the number of elements in A.


Dear siko1056 and rik,

thanks a lot for looking into my problem and explaining the cause!
Testing on my machine, using only xelem instead of elem solves the problem…

Please keep up your great work on Octave!

Regards, Ingo
