Goals for the next release

Thanks @apjanke, very impressive! The issue might be that you have done so much work already that it might be difficult for anyone apart from you to know where to begin or what to do next.

Is Arrow something we should consider at this stage?

https://news.ycombinator.com/item?id=26451894

Octave might want to add Apache Arrow support at some point, but I don’t think it’s necessary or useful for table and the other stuff Tablicious provides, so it could be a separate project at any time.

Octave already has approaches for representing in-memory columnar data: planar-organized classdef classes are one approach, and table is another. These are nice fits for Octave because they compose over all the functions and types that are already defined in Octave, and they allow extension at the native M-code level by definition of classdef classes. Arrow basically provides the same functionality, but at a lower level; you’d have to write a complete wrapper layer for it, and it would be very hard to get Arrow to integrate with classdef/MCOS classes. I think Octave would probably just want to consider Arrow to be an external data format, and provide support for converting between it and native Octave data structures, and maybe an MCOS classdef bridge for people who want to define Octave record types over Arrow storage.

Tablicious is kinda large. I’m up to 33,000 lines of M-files by now. :slight_smile:

As for where to start, we discussed this a bit on the mailing list: I think the first thing someone (like a GSoC student) would want to do is get familiar with relational theory and in-memory columnar data stores (e.g. by reading C. J. Date books like “Database in Depth”) and then decide if the basic data structure and algorithmic approach used by Tablicious’ table is the way that Octave wants to do it. Tablicious’ approach is:

  • Every “variable” in a table is just an Octave column vector or 2-D array, of any type.
  • Those types just need to implement ()-indexing, unique(), and a couple other functions/methods.
  • All the relational operations are computed using the “proxy keys” trick.

From there, pretty much everything else falls out naturally. And if you don’t consider this fundamental issue up front, and instead start tackling table functions one at a time, I think you’re going to have trouble.
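To make the “proxy keys” idea from the list above concrete, here is a minimal sketch of how it might work. This is not Tablicious’ actual code: the function names `proxy_keys` and `inner_join`, the dict-of-lists table layout, and the sample data are all assumptions made for illustration. The idea it shows is that key values of any type are first mapped to small integers (the role unique() plays), and the relational operation is then done purely on those integers.

```python
def proxy_keys(left_keys, right_keys):
    """Map every distinct key tuple from either table to a small integer."""
    # Analogous to calling unique() over the combined key rows.
    distinct = sorted(set(left_keys) | set(right_keys))
    code_of = {key: code for code, key in enumerate(distinct)}
    return [code_of[k] for k in left_keys], [code_of[k] for k in right_keys]


def inner_join(left, right, on):
    """Inner-join two column-oriented tables (dicts of equal-length lists).

    Assumes the non-key column names of the two tables do not collide.
    """
    left_keys = list(zip(*(left[c] for c in on)))
    right_keys = list(zip(*(right[c] for c in on)))
    lcodes, rcodes = proxy_keys(left_keys, right_keys)

    # Index the right table's row numbers by proxy key.
    rows_by_code = {}
    for j, code in enumerate(rcodes):
        rows_by_code.setdefault(code, []).append(j)

    # Matching is now plain integer lookup, whatever the key column types are.
    pairs = [(i, j) for i, code in enumerate(lcodes)
             for j in rows_by_code.get(code, [])]

    out = {c: [left[c][i] for i, _ in pairs] for c in left}
    for c in right:
        if c not in on:
            out[c] = [right[c][j] for _, j in pairs]
    return out


# Hypothetical sample tables, one list per "variable".
people = {"name": ["ann", "bob", "cy"], "dept": ["ops", "dev", "dev"]}
depts = {"dept": ["dev", "ops", "hr"], "floor": [3, 1, 2]}
joined = inner_join(people, depts, on=["dept"])
```

Because the join logic only ever sees integers, the column types themselves need little more than indexing and a way to find distinct values, which matches the short list of requirements above.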

I’m happy to write more developer doco if it would be useful.

@rik - about splitting up Documentation: in addition to Language Reference and Applications Manual (which I would expect to be targeted at Octave users/clients), how about a Developer Guide for people hacking on Octave itself, that describes Octave internals, tools, source code layout and guidelines, design decisions, etc? Could start out small and grow from there.

Or maybe the Wiki is a better place for that stuff anyway.

Very interesting to see other people’s interests or priorities. Most of the following items have been mentioned before but I list below some Matlab functionalities that I see being used more and more in source code in the wild, making the Octave-Matlab compatibility slowly diverge.

Also, more specific or of lesser importance for now:

The Matlab linter is keen to suggest using new features so they easily creep in new code…

@Guillaume - FYI, the “HDF5 files” support is pretty much a prerequisite for “MAT-files version 7.3”, since they are a dialect of HDF5.
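As a rough illustration of the “dialect of HDF5” point: the HDF5 format permits a user block before its superblock, so the 8-byte format signature may sit at offset 0, 512, 1024, and so on (doubling each time). MAT-files v7.3 are generally understood to put their MATLAB text header in a 512-byte user block, with the HDF5 content following. The sketch below only looks for that signature; the function name `find_hdf5_signature`, the `max_offset` limit, and the fake header bytes are assumptions for illustration, not part of any library.

```python
import io

# The 8-byte HDF5 format signature.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def find_hdf5_signature(stream, max_offset=1 << 20):
    """Return the byte offset of the HDF5 signature, or None if not found.

    Checks offsets 0, 512, 1024, 2048, ... up to max_offset (an arbitrary
    limit chosen for this sketch).
    """
    offset = 0
    while offset <= max_offset:
        stream.seek(offset)
        if stream.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
            return offset
        offset = 512 if offset == 0 else offset * 2
    return None


# Illustrative stand-in for a v7.3 file: 512 bytes of text header, then HDF5.
fake_mat73 = io.BytesIO(
    b"MATLAB 7.3 MAT-file".ljust(512, b" ") + HDF5_SIGNATURE + b"\x00" * 16
)
fake_plain_hdf5 = io.BytesIO(HDF5_SIGNATURE + b"\x00" * 16)
not_hdf5 = io.BytesIO(b"x" * 4096)

mat_offset = find_hdf5_signature(fake_mat73)
plain_offset = find_hdf5_signature(fake_plain_hdf5)
missing = find_hdf5_signature(not_hdf5)
```

A check like this only confirms the container format; reading the variables still needs real HDF5 support, which is why that work comes first.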

Thanks @apjanke. I’m aware of this and @jwe makes it clear here: JWE Project Ideas - Octave
I mentioned them separately as an alternative/parallel route would be to use matio:
GitHub - tbeu/matio: MATLAB MAT File I/O Library

Adding a link to another MAT-file v7.3 library in Julia: MAT.jl/MAT_HDF5.jl at master · JuliaIO/MAT.jl · GitHub

Thanks, that could certainly be helpful.

I hope to get back to working on the HDF5 library functions soon.

In case it may be useful, I have been working on a collection of +packages, similar to Matlab’s HDF5 “low-level” functions. This is at a very early development stage, but I was able to use them to implement a preliminary version of h5info.m (one of the “high-level” functions).

@Pantxo: Are your functions published somewhere? I’ll try to update mine and put them in a public repo so we can compare.

No, I didn’t put them in a public repo (I just wanted to wait until I have something cleaner). I attached an archive with the current state of the packages:
h5_package.zip (314.6 KB)

Just unzip, edit h5_package/Makefile to make OCTAVE_BIN point to your octave binary, and then

cd  h5_package
make

I only tested with hdf5-1.10, and I expect that some functions won’t compile out of the box with other versions.

@Pantxo: Thanks. You have definitely made more progress than I have.

My attempt so far has one large C++ file that contains the interfaces for all the HDF5 functions we might be interested in. I imagined this file would be part of core Octave, though it could also be dynamically linked. Either way, I had planned to write all the user-visible functions as .m files. I thought maybe it would be easier to do all the argument checking in Octave code instead of C++, but I wasn’t sure whether that was the right choice.

Rather than duplicating any further effort, would you be OK with adding your code to core Octave now? I think that might be the best way to share the development effort and make better progress toward getting these functions in version 7.

In my experience, the most complicated part is to translate the input/output pointers to return values, and this can only happen at the C++ level, so I chose to do the input checking in C++ as well. I expect that this should also improve performance (but I may very well be wrong), since some functions are called very often (H5x.close…) and would have to be parsed each time.

I used DEFUN_DLD just because it was easier to get started, but this has at least one disadvantage: each function, and we will need many (100+) of them, needs its own ~3 MB .oct file…

Yes sure. I’ll try to polish what I have in the coming days but then I’ll need help in integrating the package in the build system.

I think it would be fine to merge early and polish later.

I also see .oct files that are about 3MB if they contain debugging symbols but they are much smaller if stripped. Even so, maybe we should avoid individual .oct files since they will all depend on the same external libraries.

Ok, but I really don’t know how to integrate this in the build system.

We could group functions in a single C++ source file per package (e.g. scripts/hdf5/+H5A/__H5A_wrappers__.cc, …) and autoload them from a PKGADD file in the package. Would that work?

Autoloading should be possible.

I can help with the integration with the build system.

Dear Octave Developers,
Will there be any Machine Learning feature? :upside_down_face:

May I suggest that the Octave foundation apply for funding from the EU:

If you are going to develop some packages about Artificial Intelligence, EU may be able to support your projects. With the extra funds, Octave can hire AI developers to help Octave to grow rapidly. :upside_down_face:

@HKPhysicist: This thread was about potential new features in core Octave. There is very little chance that an AI toolbox will be developed as part of core Octave.
You seem to be trying to appeal to potential Octave users/developers that might be interested in creating an AI package. I’m not sure if you will reach the correct audience if you bury your proposal deep inside this thread.
You might be more successful when reaching out in a dedicated thread.

There is already the neural network package (on Octave Forge). But it hasn’t been updated in quite a while, and I don’t know if it still works.

Thank you.
I will try to contact EU Horizon and see if they can provide funding to maintain the Octave neural network package you mentioned. :slightly_smiling_face: