Improving performance of data transfer to and from MEX functions

The following text began with the idea of asking for feedback and ideas about how we can improve the performance of Octave’s MEX file interface. But now seems more like a description of what I plan to do along with a lot of background about why things are the way they are currently. I’d still appreciate comments about this or requests for clarification.

Octave’s MEX file interface accepts and returns data using pointers to mxArray objects. The pointer itself is opaque, so we don’t really know what the layout is. Though we might guess that the actual internal data layout is fairly similar to the data available from the functions that work with mxArray objects. For example, for numeric arrays, you can get (or set) dimensions, an array of values for a real-valued object or the real part of a complex-valued object and a separate array of values for the imaginary part of a complex-valued object. So we might guess that a reasonable set of structure fields would be something like

mxClassID id;
mwSize ndims;
mwSize *dims;
void *real_data;
void *imag_data;

Similarly, for a sparse array, we might have

mxClassID id;
mwSize ndims;
mwIndex *dims;
mwSize nzmax;
void *pr;
void *pi;
mwIndex *ir;
mwIndex *jc;

All mxArray objects might share the same base layout for id, ndims, and dims. These might be passed directly around to computational routines, combining the container type and the array/cell array/struct array/etc. objects.

When implementing the MEX file interface for Octave, it seemed like a good idea to use structures like these for Octave’s mxArray objects. But in Octave, we had already taken a different approach with objects for array operations Array<T>, Matrix, ComplexMatrix, NDArray, Sparse<T>, SparseMatrix, etc. that are not all derived from a common base, and a separate wrapper object, octave_value, that may contain one of those objects and that is passed to functions at the interpreter level. Also, Octave’s internal array objects used interleaved complex data - one array of alternating (real, imaginary) pairs of numbers instead of separate arrays for real and imaginary data. So when passing an octave_value to a MEX function or returning an internally created mxArray object back to Octave, some kind of translation would be needed, at least for complex data.

Octave’s objects also attempt to help programmers to avoid memory errors by not providing constructors like

Array<T>::Array (octave_idx_type ndims, octave_idx_type *dims, const T *data)
  // accept ownership of DATA and assume
  // that the values and sizes of NDIMS, DIMS,
  // and DATA are all consistent.

I have long resisted the temptation to allow these kinds of constructors because they could easily lead to data corruption. But we do provide direct access to the underlying data through the Array<T>::data and (infamously named) Array<T>::fortran_vec methods. There is no guarantee against data corruption when passing the pointers that those functions return on to computational routines - you still have to ensure that the dimensions and data arrays are consistent. So my concerns about these kinds of constructors might not have been well founded.

So Octave’s MEX file interface currently stores data in mxArray objects that have fields similar to what I showed above and copies are generated in many cases. We can sometimes avoid copies when accessing data for read-only purposes, but internally created mxArray objects are almost certainly copied when they are converted to octave_value objects and returned from a MEX function back to the Octave interpreter.

However, it eventually became obvious that we would need to provide a way to access the internal data in an Octave array without regard for the reference count in order to make accessing and modifying arrays work in a compatible way. So we ended up with things like this (in the Array class):

  //! Give a pointer to the data in mex format.  Unsafe.  This function
  //! exists to support the MEX interface.  You should not use it
  //! anywhere else.
  void * mex_get_data (void) const { return const_cast<T *> (data ()); }

See also Need help understanding Matlab behavior - #16 by jwe.

Recently, Matlab introduced an option to use interleaved complex data in MEX functions. So now, if using that option, or if working with real-valued arrays, we might be able to avoid copies in more cases. For example, instead of having the sparse mxArray object store individual arrays as it does now, maybe we could just store a SparseMatrix object (or the lower-level Sparse_rep object) inside the mxArray object. Then we would not have to copy data when returning from a MEX function.

We might still need to provide more functions to set dimension and data arrays independently to properly support mxSetData and similar functions. I find that I’m less opposed to these kinds of interfaces now, but they should probably still come with strong caution flags in the header files and documentation.

Originally, I thought the best course was to store a pointer to the “rep” object (Sparse_rep, Array_rep, etc.) and that got me to thinking that it would be more convenient if those were managed by std::shared_ptr<T> objects which led me to think about how to properly use std::shared_ptr<T> to implement copy-on-write semantics (see also Using std::shared_ptr<T> to implement copy-on-write objects in Octave). But now I’m leaning toward just using an Array<T> or SparseMatrix object directly in the mxArray object.

I hope to have this change ready relatively soon and in time for Octave 7.

Comments or suggestions?

Are there any user-visible changes, or is this strictly a change developers would notice? If the former then I see little downside to improving things. At a major version change users would have to re-compile their .mex files anyways so there’s no extra effort for them.

These proposed changes won’t alter the MEX interface so they shouldn’t affect users. But I think the changes will require providing some “unsafe” methods for creating and modifying the data managed by Octave array objects that will break the guarantees we have made about copy-on-write semantics in the past. Though to be fair, we have already broken some of those promises by providing the mex_get_data methods in various octave_value and array classes. Maybe with visibility attributes we can limit the damage, possibly just to just the MEX interface.

Then I don’t see much of a problem just going forward. Even within the developer community, the number that work directly with the internals of the MEX interface is really small. “Unsafe” routines aren’t going to bother most of us because we never would code with them to begin with.

I pushed a series of changes that should avoid copying data when returning full and sparse mxArray objects from MEX files:

Ultimately, this change was fairly simple. Most of the changes are refactoring existing code. The big new addition is to provide the following constructors in the Array<T> and Sparse<T> classes:

  // Construct an Array from a pointer to an externally allocated array
  // of values.  PTR must be allocated with operator new.  The Array
  // object takes ownership of PTR and will delete it when the Array
  // object is deleted.  The dimension vector DV must be consistent with
  // the size of the allocated PTR array.

  Array (T *ptr, const dim_vector& dv);
  // Construct a Sparse array from pointers to externally allocated
  // arrays of values and indices.  PTR, RIDX, and CIDX must be
  // allocated with operator new.  The Sparse object takes ownership of
  // these arrays and will delete them when the Sparse object is
  // deleted.  The dimension vector DV must be consistent with the sizes
  // of the allocated PTR, CIDX, and RIDX arrays.

  Sparse (const dim_vector& dv, octave_idx_type nz,
          T *ptr, octave_idx_type *ridx, octave_idx_type *cidx);

The ordering of the arguments was selected to avoid conflicts with existing constructors. If there is a better way, I’d be glad to change them before version 7 is released.

These new constructors could also allow us to provide more efficient construction of arrays when using library functions that return pointers to allocated arrays, at least when those allocated arrays may be released by calling delete. Because some library code might allocate with malloc instead of new, should we provide that as an option to these constructors so that the objects can call free to release the externally allocated memory?

1 Like

Compilation of the sparsersb Octave package fails with the following error in MXE Octave after these changes:

x86_64-w64-mingw32-g++ -c -I/home/osboxes/Documents/Repositories/Octave/mxe-octave/usr/x86_64-w64-mingw32/include  -I/home/osboxes/Documents/Repositories/Octave/mxe-octave/usr/x86_64-w64-mingw32/include/octave-7.0.0/octave/.. -I/home/osboxes/Documents/Repositories/Octave/mxe-octave/usr/x86_64-w64-mingw32/include/octave-7.0.0/octave -I/home/osboxes/Documents/Repositories/Octave/mxe-octave/usr/x86_64-w64-mingw32/include   -fopenmp -I/home/osboxes/Documents/Repositories/Octave/mxe-octave/usr/x86_64-w64-mingw32/include/ -DHAVE_OCTAVE_VALUE_ISCOMPLEX -std=gnu++11 -g    -DRSB_SPARSERSB_LABEL=sparsersb -o /tmp/oct-n77NQ5.o
In file included from /home/osboxes/Documents/Repositories/Octave/mxe-octave/usr/x86_64-w64-mingw32/include/octave-7.0.0/octave/../octave/interpreter.h:59,
/home/osboxes/Documents/Repositories/Octave/mxe-octave/usr/x86_64-w64-mingw32/include/octave-7.0.0/octave/../octave/symtab.h: In member function 'std::__cxx11::list<std::__cxx11::basic_string<char> > octave::symbol_table::top_level_variable_names()':
/home/osboxes/Documents/Repositories/Octave/mxe-octave/usr/x86_64-w64-mingw32/include/octave-7.0.0/octave/../octave/symtab.h:283:39: warning: 'std::__cxx11::list<std::__cxx11::basic_string<char> > octave::symbol_table::top_level_variable_names()' is deprecated: [6]: use 'interpreter::top_level_variable_names' instead [-Wdeprecated-declarations]
  283 |       return top_level_variable_names ();
      |              ~~~~~~~~~~~~~~~~~~~~~~~~~^~
/home/osboxes/Documents/Repositories/Octave/mxe-octave/usr/x86_64-w64-mingw32/include/octave-7.0.0/octave/../octave/symtab.h:281:28: note: declared here
  281 |     std::list<std::string> top_level_variable_names (void)
      |                            ^~~~~~~~~~~~~~~~~~~~~~~~ In member function 'virtual octave_value octave_sparsersb_mtx::subsasgn(const string&, const std::__cxx11::list<octave_value_list>&, const octave_value&)': error: 'const class SparseBoolMatrix' has no member named 'mex_get_ir'
 1245 |                                                         const octave_idx_type * ir = sm.mex_get_ir ();
      |                                                                                         ^~~~~~~~~~ error: 'const class SparseBoolMatrix' has no member named 'mex_get_jc'
 1246 |                                                         const octave_idx_type * jc = sm.mex_get_jc ();
      |                                                                                         ^~~~~~~~~~
make[2]: *** [Makefile:38: sparsersb.oct] Error 1

Fixing this is probably easy (replace mex_get_ir and mex_get_jc by ridx and cidx, respectively). Additionally, there was a note in the header that said that developers shouldn’t use these functions.
But given that it looks like they have been used anyway, should we deprecate these (and the other) removed functions for two major versions instead of removing them directly?