Using directed rounding in Octave/Matlab

The current IEEE Standard for Floating-Point Arithmetic specifies several rounding modes, which have become accessible to the C/C++ programming languages as part of the C99, C++11, and following standards. GNU Octave and Matlab use in general multi-threaded BLAS/ LAPACK library functions for fast matrix-vector-computations. Depending on the used library, improper thread synchronization might lead to unreliable results when switching the rounding mode using C/C++ functions. This article investigates how to reliably switch the rounding mode in GNU Octave using the performant OpenBLAS library and last resort problem mitigations for Matlab.


This is a companion discussion topic for the original entry at https://siko1056.github.io/blog/2021/12/23/octave-matlab-directed-rounding.html
1 Like

Do you know why this isn’t turned on by default? Is there a performance impact?

Here is some benchamrks:


octave:7> setround(-1)
octave:8> tic; test integral2; toc
PASSES 50 out of 50 tests
Elapsed time is 2.02877 seconds.
octave:9> setround(0)
octave:10> tic; test integral2; toc
PASSES 50 out of 50 tests
Elapsed time is 2.73241 seconds.
octave:11> setround(1)
octave:12> tic; test integral2; toc
PASSES 50 out of 50 tests
Elapsed time is 10.4941 seconds.
octave:13> setround(2)
octave:14> tic; test integral2; toc
PASSES 50 out of 50 tests
Elapsed time is 1.99064 seconds.
octave:15> setround(0)
octave:16> tic; test integral2; toc
PASSES 50 out of 50 tests
Elapsed time is 2.66937 seconds.
octave:17> setround(1)
octave:18> tic; test integral2; toc
PASSES 50 out of 50 tests
Elapsed time is 10.1466 seconds.

My time tests are similar to @dasergatskov with setround(-1) and setround(+2) both being the fastest. There was about a 28% performance boost going from not specifying any rounding mode to either of the two options above.

The weird thing is that I did not need to rebuild OpenBlas. Adding the CONSISTENT_FPCSR flag did not add new features and rebuilding it without it did not take it away either. The setround function could be compiled and used for either. When I looked through the build logs for OpenBlas under default settings there was no mention of that flag but it still works as though it were specified. My OpenBlas is from the distro default here AUR (en) - openblas-lapack. Hopefully this means that more people can use this functionality without having to add new flags to OpenBlas. It would be even nicer if it can be made the default for Octave when running OpenBlas.

This has nothing to do with openblas, this is due to some iterative solvers in the integral2 which
seem to be very sensitive to rounding mode.

I did:

setround (foo); __run_test_suite__;

for various values of foo. I’m definitely seeing regressions and failures for anything other than setround(0). Do these happen for you as well?

Yes I do see a lot of fail/regressions

For -1:

Summary:

  PASS                            16712
  FAIL                              287
  REGRESSION                         20
  XFAIL (reported bug)               40
  SKIP (missing feature)             98
  SKIP (run-time condition)          26

and 2

Summary:

  PASS                            16735
  FAIL                              269
  REGRESSION                         19
  XFAIL (reported bug)               36
  SKIP (missing feature)             98
  SKIP (run-time condition)          26

and 1

Summary:

  PASS                            16721
  FAIL                              278
  REGRESSION                         21
  XFAIL (reported bug)               39
  SKIP (missing feature)             98
  SKIP (run-time condition)          26

And (most interesting) with 0:

Summary:

  PASS                            17023
  FAIL                                7
  XFAIL (reported bug)               29
  SKIP (missing feature)             98
  SKIP (run-time condition)          26

(
libinterp/corefcn/rand.cc-tst .................................. pass   61/68  
                                                                   FAIL    7
  )

while running the test in the fresh octave session passes all tests


  PASS                            17031
  FAIL                                0
  XFAIL (reported bug)               28
  SKIP (missing feature)             98
  SKIP (run-time condition)          26

Those with openblas compiled with CONSISTENT_FPCSR=1
though I am using an OMP interface (rather than pthread) in case it matters.

version -blas
ans = OpenBLAS (config: OpenBLAS 0.3.19 DYNAMIC_ARCH NO_AFFINITY USE_OPENMP Zen MAX_THREADS=16)

Well, the test for sparse/gmres.m is now causing a crash for me with a “double free or corruption” messag for all rounding modes. Let me reset OpenBlas all to default settings and try again.

Yes, building OpenBlas with CONSISTENT_FPCSR=1 causes random failures in the Octave test suite even for rounding mode 0, but only after the mode has been changed to other values and then set back to 0. This indicates some part of the FPU is rounding differently from the rest after reset, so the changes are not clean for me. This can also cause a crash in the test suite for gmres.m.

Removing the CONSISTENT_FPCSR flag from OpenBlas build settings restores normal operation for me.

I am running OpenBlas with OMP_NUM_THREADS = number of physical cores on the processor, not threads or virtual cores.

I also see the mode switching inconsistencies (number of failures depends on the previous mode).

@siko1056 : did the test suite run properly for you?

Thank you for all the response @mmuetzel , @dasergatskov , and @arungiridhar . I did not expect so much interest in this topic :slightly_smiling_face: Using directed rounding is to my perception a tool of verified compuation, that never really came to its full potential, due to the stated limitations (or ignorance) with hardware and software. The aim of the blog post is to document my humble knowledge about it for interested users. I do not assume it to be of utmost relevance for most verified compuations :innocent:

Regarding the benchmarks: as described in the blog post, all this flag causes is the synchronization of the MXCSR register at the start of a new compuation thread. It is not like absolutely different code paths in OpenBLAS are taken. Huge timing differences in pure BLAS compuations would surprise me, and crashes should not occur either (this might be related to unproperly exchanging the library?).

To my own test not really. In the realm of BLAS development, however, every instrucution cycle counts, thus additional synchronization is an unwelcome (and for most applications unnecessary) overhead. At least to my knowledge, it has never been enabled in OpenBLAS for that reason.

The differences in the integral test are probably due to different iterates (because of directed rounding) and results cannot be beaten down to zero fast enough for some rounding modes. When designing verification methods for integration (for example), the idea is usually not taking the original (complex) method: round up, round down, and finished. This works out of the box mostly for simple linear algebara calculations.

Finally, the Octave test suite is calling a plethora of libraries (ARPACK, SuiteSparse, FFTW, …), never designed to work with another rounding mode than the default to nearest (at least I do not know of many high-level math libraries that explicitly say they care about different rounding modes) :innocent: Same for Octave, all control values are tuned (for good reason) for rounding to nearest. Caring for all four rounding modes might be unneccesary overhead, at least I would not like to take care of it :sweat_smile:

In the Docker images (using CONSISTENT_FPCSR for half a year now):

>> __run_test_suite__

Summary:

  PASS                            16141
  FAIL                                0
  XFAIL (reported bug)               27
  SKIP (missing feature)             37
  SKIP (run-time condition)          25

>> setround (+1)
>> __run_test_suite__

Summary:

  PASS                            15845
  FAIL                              272
  REGRESSION                         23
  XFAIL (reported bug)               28
  SKIP (missing feature)             37
  SKIP (run-time condition)          25

>> setround (0)
>> __run_test_suite__

Summary:

  PASS                            16141
  FAIL                                0
  XFAIL (reported bug)               27
  SKIP (missing feature)             37
  SKIP (run-time condition)          25

Most erros are related to “exceeding tol by x*e-15” fntests.zip - Google Drive.

AUR updated OpenBlas from 0.3.18 to 0.3.19 yesterday, and now I don’t see any crashes, but I get 0 failures only for a clean Octave start. If I change modes and come back to mode 0, there are 7 failures like @dasergatskov mentioned above. Unable to get 0 failures like @siko1056 could in the Docker image. One thing that occurs is that @dasergatskov and I are both running on Zen. Not sure if that changes the behavior for mode changing.

I read that the typical use case is something like this:

old = getroundingmode();
setroundingmode (whatever);
// do the actual computation here
setroundingmode (old);

Are there places within the Octave C++ codebase that could benefit from changing modes like that?

Thanks for the answer. I am not sure what causes those 7 errors, is the nature of the errors like “exceeding tol by x*e-15”? Are two consecutive runs of the Octave test suite (without any rounding switching) without errors?

The described pattern is very typical for algorithms in interval arithmetic for example. For numerical calculations of approximate floating-point values (like all Octave), switching the roudning mode is usually of minor interest.