Fscanf terribly slow compared with Matlab

I’m using fscanf in order to read big ASCII files containing a matrix (for me “big” is a matrix of 5000x5000 for example). Here is an example:

matrix = rand(5000);
save -ascii -double test_file.txt matrix
fid = fopen('test_file.txt','rb');
tic;
[data,count] = fscanf(fid,'%f');
toc
count
fclose(fid);

In Octave 7.3 (compiled by myself in Debian with common options plus -O3) the execution of fscanf takes about 73 seconds while in Matlab (9.2.0.538062 (R2017a), in the same computer) takes only 9 seconds, i.e., octave is in this case 8 times slower.

But if I call help fscanf I read that it is a built-in function from the file libinterp/corefcn/file-io.cc, i.e., it is written in C++. So I can’t understand why fscanf is slower than the Matlab’s counterpart

fprintf is also slower in Octave, but the difference is small. For example, using the code

a = randn(5000);
fid = fopen('test_fprintf.txt','wb');
tic;
fprintf(fid,[repmat(' %20.12E',1,size(a,2)),'\n'],a');
toc
fclose(fid);

the execution of fprintf takes 19 seconds in Octave and 15 seconds in Matlab

Yes, text mode file I/O is slow in Octave. But binary mode is very fast. If you can save -binary instead of -ascii it will speed up both saving and loading. You can also load a file with load and it’ll usually be faster than fscanf. Could you test that and post your results?

As for why it is slower, I’m going to guess that the person who implemented it was more concerned with trying to get it to do approximately the right thing in a reasonable amount of development time thank making it fast. Can you help with improving the performance? Have you looked at the code? Maybe there is something obviously stupid about the implementation.

The code I posted were only examples. Actually I’m reading Esri ASCII raster format files (Esri ASCII raster format—ArcMap | Documentation), so I can’t use load nor binary format.

Replacing fscanf(fid,'%f'); with sscanf(char(fread(fid)), '%f'); speeds it up for me 3.5 times.

p.s.: [data,count] = sscanf(fread(fid, [1,inf], '*char'), '%f');
Even faster

3 Likes

Are they related to mapping and GIS? There may be better functions to read that sort of data in the mapping package and the geographiclib package. It’s not my domain so I’ll copy @cffk and @PhilipN the maintainers of those packages. You can look through https://packages.octave.org for more packages that can read your data faster.

That suggests that we could possibly speed things up with some kind of input buffering but I think we have to be careful to preserve correct behavior when the format fails to match or there is some other error on the input stream. Buffering may also only work when it is possible to seek on the input stream.

Thank you very much for the suggestions, that works very fast and have reduced the differences Octave/Matlab to about 2.5x/3x

1 Like

I would like to ask also about the memory consumption of both alternatives. Using the code

sscanf(fread(fid, [1,inf], '*char'), '%f');

we can read the whole file. The fread() function returns a string containing the whole, file, i.e., if the text file has a size of 250 MB, the string returned by fread() in this case has also a size of 250 MB.

But what about the code

fscanf(fid,'%f');

In this case the whole file is also read. But how it works internally about the memory consumption?

Octave’s fscanf function doesn’t read the entire file into a buffer before performing conversions. The details of that are probably part of why it is slow. Another part of the performance issue could be that the array for the output values may need to be resized as the file is read.

You can examine how it is done internally in Octave by looking at the base_stream::do_scanf function in the libinterp/corefcn/oct-stream.cc source file. Ultimately, the %f conversion is handled by the do_scanf_conv template function.

http://hg.savannah.gnu.org/hgweb/octave/file/870036573716/libinterp/corefcn/oct-stream.cc#l4712

2 Likes

This is a classic time/space tradeoff. Generally, languages like Perl which are designed to work on arbitrarily large inputs do not load everything into memory at once precisely because it is easy to run into artificial limits. Imagine that you want to read in a matrix X that is roughly the size of your computer’s free memory and then increment it by one. The naive approach is to read the matrix into memory and form the result X+1 in a second matrix Y. However, this approach will use twice your available free memory which will cause swapping and will be very slow. Instead, it is better to read in one number from the matrix file at a time, increment it, and put the result in the destination Y.

As long as you have plenty of physical RAM you can use the trick Dmitri proposed. Otherwise, it would be a good idea for someone to review the fscanf code. My guess is the same as jwe’s which is that we are doing character-based I/O like getc, ungetc on a stream rather than on a reasonable-sized buffer.

I ran AMD’s profiler ( AMD μProf - AMD ) on fscanf and fread codes.
In fscanf the bottleneck is obvious, but may be somebody can see if we can improve fread
even further.
Fscanf:

Fread:

Thanks to @dasergatskov for confirming what was only a suspicion. The likely code is in liboctave/utils/lo-utils.cc in the routine read_fp_value. I’ll quote it just to give you an idea of how character-intense the I/O is.

  template <typename T>
  double
  read_fp_value (std::istream& is)
  {
    T val = 0.0;

    // FIXME: resetting stream position is likely to fail unless we are
    // reading from a file.
    std::streampos pos = is.tellg ();

    char c1 = ' ';

    while (isspace (c1))
      c1 = is.get ();

    bool neg = false;

    switch (c1)
      {
      case '-':
        neg = true;
        OCTAVE_FALLTHROUGH;

      case '+':
        {
          char c2 = 0;
          c2 = is.get ();
          if (c2 == 'i' || c2 == 'I' || c2 == 'n' || c2 == 'N')
            val = read_inf_nan_na<T> (is, c2);
          else
            {
              is.putback (c2);
              is >> val;
            }

          if (neg && ! is.fail ())
            val = -val;
        }
        break;

      case 'i': case 'I':
      case 'n': case 'N':
        val = read_inf_nan_na<T> (is, c1);
        break;

      default:
        is.putback (c1);
        is >> val;
        break;
      }

    std::ios::iostate status = is.rdstate ();
    if (status & std::ios::failbit)
      {
        // Convert MAX_VAL returned by C++ streams for very large numbers to Inf
        if (val == std::numeric_limits<T>::max ())
          {
            if (neg)
              val = -std::numeric_limits<T>::infinity ();
            else
              val = std::numeric_limits<T>::infinity ();
            is.clear (status & ~std::ios::failbit);
          }
        else
          {
            // True error.  Reset stream to original position and pass status on.
            is.clear ();
            is.seekg (pos);
            is.setstate (status);
          }
      }

    return val;
  }