Make use of multiple CPUs/cores to load and parse many files

I have many data files that I need to load into Octave, and parse/process the data in each data file. The whole process is rather slow, since the data files are in a specific ASCII format, and the data processing needs a bunch of string manipulations, which tend to be slow.

I am currently reading and processing one file after another, which uses only a single CPU/core. The whole process could be much faster if I could make use of multiple CPUs/cores to load and process the files in parallel. However, I have no clue how to achieve this in Octave. It would be great if someone could point me in the right direction or provide some hints or ideas.

Thanks!

Probably not exactly the answer you’re looking for, but the low-tech way I accomplished this for a similar ASCII file processing task over the past few years was simply to open four instances of Octave without the GUI and partition the files into four groups in separate folders. The operating system did a good enough job of load sharing, and each instance generally kept a full core of my machine busy.
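Just to illustrate, here is a minimal sketch of how that split could be automated from within Octave itself; the "data/*.dat" pattern, the group folder names, and the process_folder function are placeholders I made up, and system’s "async" mode is used to launch the headless instances:

% Hypothetical sketch: partition the data files into N groups and launch one
% headless Octave per group. process_folder() stands in for whatever routine
% parses all files in a folder.
files = glob ("data/*.dat");
N = 4;                                    % number of Octave instances
for k = 1:N
  grp = sprintf ("group%d", k);
  mkdir (grp);
  for f = files(k:N:end)'                 % every N-th file goes to group k
    copyfile (f{1}, grp);
  endfor
  cmd = sprintf ("octave-cli --eval 'process_folder (\"%s\")'", grp);
  system (cmd, false, "async");           % returns immediately with the PID
endfor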

For something more complicated, where the parallelization necessarily needs to occur organically within your program, there is a parallel package meant to assist with this.

see:
https://wiki.octave.org/Parallel_package
https://octave.sourceforge.io/parallel/

Another choice would be to go outside Octave; it is really meant for numerical calculations.

If you are doing a lot of reading, text parsing, and writing then something like Perl or Python is a better choice.

@rik has a good point. In my case I was reading the data from all of the files into Octave for compilation and further processing. While Octave can certainly do file I/O, what sort of processing are you doing with the data? That might help determine what approach makes sense.

Thanks for your insights so far.

The data import is part of a larger toolbox for manipulating and processing data from measurement instruments. If it was a one-off thing, I’d probably do the splitting of the work “manually”, using multiple instances of Octave. However, that’s not possible here.

I had a look at the Parallel package and played around with it for an hour or so. It’s an interesting approach, but I found several problems with this:

  • The Parallel tools make Octave crash if/when RAM gets filled up. While I don’t expect this to happen on a regular basis with my toolbox, it’s not impossible (for example when running on a small single-board computer like a Raspberry Pi). The Parallel toolbox is obviously not quite as mature as I’d like it to be. I don’t want to compromise the utility of my code by using external tools that like to crash. Is this a known issue, and are there ways to deal with this in a good way?

  • The reading and parsing of the data files shows a bunch of log messages to let the user know how it’s going. Running this in parallel gives garbled log messages. I could possibly revise my code to be less verbose, but that would make it more difficult for users to understand what’s going on, and how to fix their data files if something is not right (which may happen if the measurement instrument did not play nice). Or is there a better way to show log messages in a tidy way with the Parallel tools?

What is the error signal for the crash?
It might be the kernel’s OOM killer doing its job. In that case, there isn’t much that Octave or the parallel package can do to avoid it. Instead, you’d have to limit the number of parallel processes to a reasonable amount.
See e.g.: Linux’s OOM Process Killer (memset.com)

I don’t remember the exact details of the crash result, so I tried to reproduce my tests from yesterday. I could not reproduce the Octave crash… !?

However, I still run into problems when RAM gets filled up. To be more specific, here’s my test code (just a demo of the issue, otherwise not very meaningful):

pkg load parallel   % needed for pararrayfun below

function y = paral_func(N,x);
  y = x + sum(sum(abs(fft(rand(N)))))/N^2;
endfunction

Nproc = nproc;
y = pararrayfun(Nproc, @(x) paral_func(10000,x), 1:100)

Running this on my machine, I get the following from Octave when RAM is full:

could not save variable.
could not save variable.
could not save variable.
could not save variable.
...
could not load variable
could not receive result
error: __parcellfun_get_next_result__: could not receive result
error: called from
    parcellfun at line 199 column 16
    pararrayfun at line 85 column 28
    paral0 at line 2 column 3

I guess parcellfun could just stop hammering out new processes once it runs into this problem, and try again when one of the previously started processes returns and has released its memory.

Am I doing something wrong? Is there something that can be done to avoid this problem in a clean way, without assuming or knowing the specifics of the hardware? (My code needs to work on “all” machines and therefore must not make any such assumptions.)

In general, Octave is not made to monitor your memory usage and curb itself accordingly. Even without parallel code, Octave will happily run your instructions to the point of memory exhaustion. Even with my manual parallel splitting, I would periodically hit memory limits for certain batches and have to rerun them separately.

For your case, you seem to be asking whether there’s a way for your code to guess how many processes it should create based on the amount of available RAM. That would also require knowing roughly how much memory each process might consume, and comparing that to the system’s free RAM.

I do not know that Octave includes memory-aware functions like you are asking for. But perhaps with a bit more insight into the memory-intensive parts of your code, there are inefficiencies we could help you reduce so that it’s less of a concern?
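Just to show what such a guess might look like, here is a rough sketch; it is Linux-only (it reads /proc/meminfo), and mem_per_job_gb is a per-job estimate you would have to supply yourself, since Octave cannot determine it for you:

% Hypothetical sketch: cap the number of workers by available RAM.
% Linux-only (uses /proc/meminfo); mem_per_job_gb is a rough estimate of the
% peak memory one job needs.
mem_per_job_gb = 2;
[st, out] = system ("awk '/MemAvailable/ {print $2}' /proc/meminfo");
avail_gb = str2double (out) / 2^20;               % value is reported in kB
Nproc = max (1, min (nproc, floor (avail_gb / mem_per_job_gb)));
y = pararrayfun (Nproc, @(x) paral_func (10000, x), 1:100);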

If you really want to use Octave for the processing of those files, a simple alternative to the parallel package is to use system to run the processing asynchronously in another process. Communication between the two processes would need to be handled manually. The simplest solution there is to write the parsed data to a temp file and then have the parent process read that file before removing it.

If your package has a single function that does it, you could:

[status, output] = system ([EXEC_PATH " --eval 'some_function(INPATH, OUTPATH)'"], true, "async")

and have a pool of those to process multiple files in parallel. This will not garble the stdout, since they will appear in output and not be printed to stdout.
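A rough sketch of such a pool is below; parse_one_file is an assumed function in your toolbox that reads one input file and saves its parsed result to the given output file, and waitpid/WNOHANG are used to poll the workers, which assumes a Unix-like system:

% Hypothetical sketch of a process pool built on system (..., "async").
files   = glob ("data/*.dat");
nworker = nproc ();
pending = {};                             % each entry: {outfile, pid}
results = {};
k = 1;
while k <= numel (files) || ! isempty (pending)
  % top up the pool with new workers
  while numel (pending) < nworker && k <= numel (files)
    outfile = [tempname() ".mat"];
    cmd = sprintf ("octave-cli --eval 'parse_one_file (\"%s\", \"%s\")'", ...
                   files{k}, outfile);
    pid = system (cmd, false, "async");
    pending{end+1} = {outfile, pid};
    k++;
  endwhile
  % collect results from workers that have finished
  for i = numel (pending):-1:1
    if waitpid (pending{i}{2}, WNOHANG) > 0   % child has exited
      results{end+1} = load (pending{i}{1});
      delete (pending{i}{1});
      pending(i) = [];
    endif
  endfor
  pause (0.1);                            % avoid busy-waiting
endwhile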


Hello,

I do have the same problem with the parallel package, error is:

execution error
error: __parcellfun_get_next_result__: could not receive result
error: called from
    parcellfun at line 201 column 16
    pararrayfun at line 85 column 28
    test_parrayfun_3 at line 8 column 3

Test code is the same:

pkg load parallel

function y = paral_func(N,x);
  y = x + sum(sum(abs(fft(rand(N)))))/N^2;
endfunction

Nproc = nproc;
y = pararrayfun(Nproc, @(x) paral_func(10000,x), 1:100)

List of packages:

>> pkg list
Package Name | Version | Installation directory
-------------+---------+-----------------------
    parallel |   4.0.1 | /Users/karlheinz/Library/Application Support/Octave.app/6.2.0/pkg/parallel-4.0.1
      struct |  1.0.17 | /Users/karlheinz/Library/Application Support/Octave.app/6.2.0/pkg/struct-1.0.17

>> test_parrayfun_3

As my MacPro 5.1 has 128 GB of RAM and 12C/24T, this should not be a hardware limit; in particular there is no memory shortage, as memory did not run out on my machine …
Octave is 6.2.0 for Mac (on Big Sur 11.2.2).
Does anyone have a solution for it?

You might have some luck with the parallel package by just bumping up the size of your system’s “swap” or “paging” file: things would still slow down some when you hit your physical memory limit, but it would keep the program from crashing. And the slowdown might not be that bad because much of what would be paged out would be “cool” infrequently-accessed areas of memory.

I had the same issue with “error: __parcellfun_get_next_result__: could not receive result”. The solution turned out to be defining the passed function in an external file, rather than locally in the script file. See details at the following documentation page.

“Named functions defined in the command line interface or by scripts are not exported to the working sessions used by parcellfun or pararrayfun, and not automatically to the server for remote parallel execution.”
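So for the test code above, the fix looks roughly like this: put paral_func into its own file on the load path, e.g. paral_func.m,

% paral_func.m -- must be its own file on the path, not defined in the script
function y = paral_func(N,x)
  y = x + sum(sum(abs(fft(rand(N)))))/N^2;
endfunction

and keep only the parallel call in the script:

pkg load parallel
Nproc = nproc;
y = pararrayfun(Nproc, @(x) paral_func(10000,x), 1:100)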
