Clang warns for conversions from large integers to floating point

During compilation, clang emits several warnings like these ones:
Buildbot (

 ../src/libinterp/corefcn/ warning: implicit conversion from 'long' to 'double' changes value from 9223372036854775807 to 9223372036854775808 [-Wimplicit-const-int-float-conversion]
        if (d > std::numeric_limits<octave_idx_type>::max ())
              ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.
../src/libinterp/corefcn/ warning: implicit conversion from 'int' to 'float' changes value from 2147483647 to 2147483648 [-Wimplicit-const-int-float-conversion]
          && ((x >= 0 && x < std::numeric_limits<int>::max ())
                           ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../src/libinterp/corefcn/ note: in instantiation of function template specialization 'xisint<float>' requested here
  if (a < 0.0 && ! xisint (b))
1 warning generated.

Should we do something about that?
Should we also error/warn if the respective double value is larger than flintmax?

Strange, my gcc 7.5.0 so far did not complain here, even in a minimal example and even though it can be problematic when you have to rely on the exact values. Which gcc version do the buildbots use?

The problem boils down to that double (binary64) cannot hold the maximal value of long (int64), same for float (binary32) and int (int32).

What happens in the 32-bit case, analogous for the 64-bit case:

int   i = std::numeric_limits<int>::max ();
float f = i;
//    i = 2^31 - 1 = 2147483647
//      = 0111 1111  1111 1111  1111 1111  1111 1111
//         |---- fraction part of f (*) ---|
//    (*) 23 bit and implicit one

Thus the value stored exceeds the precision of f and is rounded up to the nearest floating-point number that is 2^31 = 2147483648.

Both cases are a little different

  1. For the first error (line 141) the question is, does the value stored in double d exceed the maximum possible index size?

    if (d > std::numeric_limits<octave_idx_type>::max ())

    The question is correctly translated to C++. flintmax() would put a too strict limit 2^53 on the permitted range 2^63-1, even though d can never address the full range of possible indices. A general problem of using double for indexing :sweat:

    This warning can be maybe fixed with

    // Note: +1.0 to avoid implicit rounding warning!
    if (d >= (std::numeric_limits<octave_idx_type>::max () + 1.0))

    that is making the implicit rounding explicit.

  2. In the case of xisint(T x) the question is, whether a conversion from “double” or “float” to “int” (int32) will succeed (not overflow). Only in case of “float” the warning is thrown, the “double” usages of the code are fine. Maybe a similar fix is fine too:

    template <typename T>
    static inline bool
    xisint (T x)
      // Note: max () + 1.0 to avoid implicit rounding warning!
      return (octave::math::x_nint (x) == x
              && ((x >= 0 && x < (std::numeric_limits<int>::max () + 1.0))
                  || (x <= 0 && x >= std::numeric_limits<int>::min ())));

    I also relaxed the bounds by one. Previously, it was INT_MIN < x < INT_MAX, now I use the “<=” comparison (in the first case implicitly). To me it is not obvious why we have to go one stop inside the bounds to be correct here and the Octave BIST does not complain.

    I would apply those changes only if it really helps omitting the correct compiler warnings.

This is about pleasing the compiler, rather than something that is going to cause much trouble to Octave users. If we do go down this route then we don’t want there to be any performance penalty. In particular, I don’t want Octave to have to compute at runtime the addition + 1.0. One way to enforce this would be to use constexpr to require that the compiler compute the value during compilation and to place the resolved value as a constant in to the resulting binary. For example,

constexpr MAX_INT = std::numeric_limits<octave_idx_type>::max () + 1.0;
1 Like

I can’t recall if there was a specific reason to use the range (INT_MIN, INT_MAX) versus [INT_MIN, INT_MAX] where the notation ‘(’ does not include the endpoint and ‘[’ does include the endpoint. I’m going to mention @jwe here so that Discourse will send him this thread, because it may be that there was a reason for this.

@Kai: Like I wrote (and in the title of the thread) these are (two of many) warnings that clang emits. I don’t see these warnings with gcc either.

In the cases where clang emits this warning, the + 1.0 you propose is essentially a no-op (in that the result is not different from the other number in the addition).
That means the upper limit in xisint is excluding the largest int for xisint (double), but it is included for xisint (float).
These subtle differences are why I was asking whether it would be worth checking these warnings.
In some cases, it might just be silencing the compiler. But in other cases, the result might be unintuitive.

@rik yes, constexpr should avoid any runtime overhead here.

@mmuetzel sorry for ignoring the title :sweat: Both g++ 7.5.0 and clang 9.0 do not complain about this circumstance. Thus I cannot judge if clang is “pleased”.

But, I do not think it is only about pleasing the compiler. Octave does an assignment, which is not possible to be performed lossless and I want Octave to be a reliable numeric software on any platform using any compiler :slightly_smiling_face: Octave is relying on the compiler to follow the C99 Annex F (floating-point rounding to nearest). One possibility to trigger these warnings is to compile with -Wconversion, but there might pop up more cases that should be in time inspected with care. As there haven’t been serious bug reports about those issues, I do not give them a super huge priority, more “nice to be fixed”.

Looking back at your analysis in case 1, I’m no longer sure I follow your conclusion.
E.g., let d be 2^63 (which is representable as a double precision floating point number). The compiler takes the result of std::numeric_limits<octave_idx_type>::max () which is the integer 2^63-1 (if Octave’s indexing type is 64bit). That number is cast to double which results in 2^63 (because 2^63-1 is not representable as a double precision floating point number). The code then compares that to 2^63 (>). The result of this comparison is false. But it should probably be true.
I’m assuming that this check is there to prevent an integer overflow (in pseudo-code):

octave_idx_type ret = (double) 2^63;

IIUC, that results in undefined behavior.

Which comparison do you refer to? In my first example 2^63 would not be accepted.

It would result in:

if (2^63 >= 2^63)

I was referring to the current code for which you wrote “The question is correctly translated to C++”. I’m not sure I understand correctly what you mean by that. But I don’t think that is true (see my previous comment).

As written before, the + 1.0 wouldn’t make a difference. It is the >= that would lead to a different result.

I haven’t tried. But I would be surprised if the compiler warning would be silenced with the changes you proposed. It is probably the cast from an integer that cannot exactly be represented in floating point precision that gives rise to the warning. That cast would still happen…

This answer on stackoverflow maybe describes the issue in better words than I can:

That answer references C99. But I believe the same rules still apply to C++11.

Small update, C++ can be nasty sometimes :sweat:

I think I get the clue about your concern. In the resulting code there is no comparison between double and integer types

But it still does not please the compiler using -Wconversion :sob:

Probably we have to define the double constant as calculated in your stackoverflow post 9223372036854775808.0 // 2^63 so that future Octave developers don’t have to discuss again for days about this issue, e.g.

I agree that it would be best if we could come up with a general solution that wouldn’t require to change the code in many places if Octave’s indexing type should ever change.

It might make sense to define macros for floating point values that are just at the borders of the indexing range. E.g.:

#if defined (OCTAVE_ENABLE_64)
#  define OCTAVE_IDX_TYPE_MIN_FLOAT -9223372036854775808.0
#  define OCTAVE_IDX_TYPE_MAX_FLOAT 9223372036854775808.0
#  define OCTAVE_IDX_TYPE_MIN_FLOAT -2147483648.0
#  define OCTAVE_IDX_TYPE_MAX_FLOAT 2147483648.0

Then check whether the system uses 2’s complement for negative numbers and use the correct comparison operators in each case (x < OCTAVE_IDX_TYPE_MIN_FLOAT or x <= OCTAVE_IDX_TYPE_MIN_FLOAT for the lower limit, and x >= OCTAVE_IDX_TYPE_MAX_FLOAT for the upper limit).

We’d need similar macros for each used integer type (including something like suitesparse_int) which doesn’t look very elegant to me.
Is there a better solution?

1 Like

Just to not forget about that: We probably need a better solution only if the other variable is a floating point type. The solution that we have is good if the other parameter is an integer type (and we should not do tricks like adding double 1.0 in that case).

SafeInt seems to use a different approach that doesn’t seem to require hardcoding values for each combination of floating point type as source and (signed or unsigned) integer type as a target:
SafeInt/SafeInt.hpp at fdba0c20be0810ab0cb8da0bac3e3e1e9e524539 · dcleblanc/SafeInt (
Might be worth checking…

It might even make sense to use SafeInt for that instead of trying to mimic what it does. Any objections to adding that as a build dependency?

When octave_idx_type was expected to be a 32-bit integer, then std::numeric_limits<octave_idx_type>::max() would always fit in a double with no loss of precision, so the comparison made sense. I suppose we need to revise these functions to do the right thing when octave_idx_type can be (and usually is) int64_t.

What does Matlab do when specifying fread sizes? Can they be int64 values or are they always expected to be double (because they can include Inf to indicate "read to the end and compute the size as needed)? If they can be int values or double, then we need to go back to the Ffread function and fix the way it decodes arguments. Then we can do the right thing for integer sizes. And if the size is a double value, then yes, it seems that it should be limited to the min of flintmax and std::numeric_limits<octave_idx_type>::max().