Bitwise complement on octave_int

I’m having some issues computing the one’s complement (bitwise complement) with liboctave. I think the problem is that octave_uint8 (and similar) are their own types and not just a typedef to uint8_t (and similar). Minimal example:

uint8_t x = 1;  // 0000 0001
uint8_t y = ~x;  // returns 254 (1111 1110)

octave_uint8 x = 1;
octave_uint8 y = ~x;  // this returns zero and not 254

Why does ~ not work as expected for octave_uint8? I see that this works, but it’s a workaround:

octave_uint8 x = 1;
octave_uint8 m = 255; // 1111 1111
octave_uint8 y = x ^ m;  // returns 254 as expected

Correct, the ~ and ! operators in Octave are synonymous in all contexts that I’ve used them in. The Octave style manual prefers ! for Octave internals but users can use either. If there’s a difference between them in Octave I’ve never encountered it.

Octave integer types are different from C integer types because of saturation semantics.

Bitwise operations are provided by bitxor, bitand etc.

That’s all correct when one is writing in the Octave language. The question is about using liboctave, i.e., C++ code (where ~ is bitwise NOT, which is different from !). The bitX functions in Octave are not available in liboctave because they’re implemented in libinterp (and bitcmp specifically is just an m-file).

Argh. Had forgotten that some are m-files and that the remaining are in libinterp not liboctave.

Re the ~ operator, isn’t it overloaded for octave_uint8 to be different than regular integer types?

I’m looking at oct-inttypes.h and while we overload most arithmetic operators, it seems that we forgot the ~ operator. Not sure if there’s a reason for it. I’ve tried to just add this

template <typename T>
octave_int<T>
operator ~ (const octave_int<T>& x)
{    
  return ~ x.value ();
}

but that made no change to behaviour.

Afaict, the octave_(u)int* types follow the same saturation logic on overflow that integers in Octave do. That is done for compatibility. (Unsigned) C++ integers exhibit wrap around on overflow.
If you need that wrap around on overflow, you’ll probably need to extract the C++ value, do your operations, and pack the result into an Octave type again.

Edit: Oops. I read the original post on a small phone screen and mis-read the ~ for a -.
Maybe it is a bug that the bit-wise NOT operator does the “wrong thing”(?). It should probably do what bitxor(uint8(1), uint8(255)) or bitcmp(uint8(1), 8) does in an Octave script.

Edit 2: Does the following change (untested!) make a difference?

diff -r 7d4cf04665e6 liboctave/util/oct-inttypes.h
--- a/liboctave/util/oct-inttypes.h	Wed Jul 20 16:37:58 2022 +0200
+++ b/liboctave/util/oct-inttypes.h	Tue Jul 26 08:53:44 2022 +0200
@@ -449,6 +449,8 @@
 
   static T minus (T) { return static_cast<T> (0); }
 
+  static T bitnot (T x) { return ~x; }
+
   // The overflow behavior for unsigned integers is guaranteed by
   // C and C++, so the following should always work.
 
@@ -615,6 +617,8 @@
             : -x);
   }
 
+  static T bitnot (T x) { return ~x; }
+
   static T add (T x, T y)
   {
     // Avoid anything that may overflow.
@@ -859,6 +863,7 @@
   OCTAVE_INT_UN_OP (operator -, minus)
   OCTAVE_INT_UN_OP (abs, abs)
   OCTAVE_INT_UN_OP (signum, signum)
+  OCTAVE_INT_UN_OP (operator ~, bitnot)
 
 #undef OCTAVE_INT_UN_OP
 
Shouldn’t ~ x behave the same in the Octave interpreter as in C++ liboctave if you are using an Octave class like octave_uint8? If I am in C++ and I write

octave_uint8 x = 1;
x = x +300;
std::cout << static_cast<double> (x) << std::endl;

should the answer be 45 (which is what it is with uint8_t, which wraps around) or 255 (as in Octave, which uses saturation conventions)? I would argue that it should be 255 since that is what Octave does in the interpreter.
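To make the two candidate answers concrete, here is a minimal sketch in plain C++ that does not use liboctave at all. The helper names wrap_add_u8 and sat_add_u8 are hypothetical; they only illustrate the two conventions and are not how liboctave implements them.

```cpp
#include <cstdint>

// Plain C++ conversion to an unsigned type: the sum is reduced modulo 256.
constexpr std::uint8_t wrap_add_u8 (std::uint8_t a, int b)
{
  return static_cast<std::uint8_t> (a + b);
}

// Saturating convention (as in the Octave interpreter): clamp the sum
// to the range [0, 255] before converting back to 8 bits.
constexpr std::uint8_t sat_add_u8 (std::uint8_t a, int b)
{
  return static_cast<std::uint8_t> (a + b > 255 ? 255
                                    : (a + b < 0 ? 0 : a + b));
}
```

With these helpers, wrap_add_u8 (1, 300) gives 45 (301 modulo 256), while sat_add_u8 (1, 300) gives 255.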

EDIT (7/27/22)

Here is real code that demonstrates the saturation mechanics

#include <iostream>
#include <octave/oct.h>

using namespace std;

int main ()
{
   octave_uint8 x (200);
   
   x = 2.0 * x;

   cout << "x = " << static_cast<double> (x) << endl;

   return EXIT_SUCCESS;
}

Assuming it is in the file tst_uint8.cc then at the shell

mkoctfile --link-stand-alone tst_uint8.cc -o tst_uint8

will produce an executable.

Imho, the wrap around / saturation behavior should be the same as for integers in Octave’s interpreter. But the operator symbols don’t necessarily need to coincide with Octave-syntax imho.
There is already support for operator ^ (the bit-wise XOR operator) in these integer classes. That is a good thing imho.
Adding support for the unary operator ~ would fall into the same category of C++-style operators for our integer classes in liboctave.

Implementing ~ outside the class definition, i.e., as T operator~(const T &a), does not work (~ continues to return zero) but implementing inside the class, i.e., as T T::operator~() const works as expected.

I’m guessing there’s some operator ~ defined somewhere that octave_uint8 is falling back to (otherwise I should be getting an error about an undefined operation), but I don’t know what and where that is. Any hints?

I could fix(?) this by just implementing it inside the class but the other bitwise operators (^, |, and &) are all implemented outside (why?) and I’m trying to be consistent.

I don’t know what you mean by “inside the class” or “outside the class”. Could you please show some diffs as to which changes you tested?

With this change it seems to be working for me:

diff -r 929c05cf2afa liboctave/util/oct-inttypes.h
--- a/liboctave/util/oct-inttypes.h	Wed Jul 27 13:27:00 2022 +0200
+++ b/liboctave/util/oct-inttypes.h	Wed Jul 27 22:23:58 2022 +0200
@@ -836,6 +836,12 @@
 
   bool operator ! (void) const { return ! m_ival; }
 
+  octave_int<T> operator ~ (void) const
+  {
+    T bitinv = ~ m_ival;
+    return bitinv;
+  }
+
   bool bool_value (void) const { return static_cast<bool> (value ()); }
 
   char char_value (void) const { return static_cast<char> (value ()); }

For some reason, it doesn’t work without the intermediate variable. Maybe some type deferral that works different than I expected…

I tried with this test program:

#include <iostream>
#include <bitset>
#include <octave/oct.h>

int main ()
{
  octave_uint8 x (1);
  std::bitset<8> b (static_cast<uint8_t> (x));
  std::cout << "b = " << b << "b" << std::endl;
  x = (~ x);
  b = static_cast<uint8_t> (x);
  std::cout << "b = " << b << "b" << std::endl;

  uint8_t y = 1;
  y = (~ y);
  std::bitset<8> c (y);
  std::cout << "c = " << c << "b" << std::endl;

  return EXIT_SUCCESS;
}

Which outputs the following for me with the above diff applied:

b = 00000001b
b = 11111110b
c = 11111110b

diff -r 7d4cf04665e6 liboctave/util/oct-inttypes.h
--- a/liboctave/util/oct-inttypes.h     Wed Jul 20 16:37:58 2022 +0200
+++ b/liboctave/util/oct-inttypes.h     Wed Jul 27 22:48:56 2022 +0100
@@ -887,6 +887,14 @@ public:
 
 #undef OCTAVE_INT_BIN_OP
 
+  // This is defined inside the class
+  inline octave_int<T>
+  operator ~ () const
+  {
+    T x = ~ m_ival;
+    return x;
+  }
+
   static octave_int<T> min (void) { return std::numeric_limits<T>::min (); }
   static octave_int<T> max (void) { return std::numeric_limits<T>::max (); }
 
@@ -1066,6 +1074,15 @@ OCTAVE_INT_BITCMP_OP (^)
 
 #undef OCTAVE_INT_BITCMP_OP
 
+// This is defined outside the class
+template <typename T>
+octave_int<T>
+operator ~ (const octave_int<T>& x)
+{
+  return ~ x.value ();
+}
+
+
 // General bit shift.
 template <typename T>
 octave_int<T>

I tried both (one at a time) and found that only when I defined the operator inside the class did it work as expected. However, as you have since found:

The reason why it was working inside the class was because of the intermediate value not because where the operator was being defined. I’m curious to know what’s happening before I push the fix.

The implicit conversion rules in C++ are pretty complex. It’s probably related to this in the “Integral promotion” section of Implicit conversions - cppreference.com:

  • unsigned char or unsigned short can be converted to int if it can hold its entire value range, and unsigned int otherwise;

It might be that this converted int is bit-negated. After that, the “wrong” constructor is selected and the saturation rules cap the (negative) value to 0.
With the intermediate value, the compiler is “forced” to an output type, and that implicit conversion path is no longer an option.
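The constructor selection can be reproduced without liboctave. The following is only a sketch under stated assumptions: sat_u8 is a hypothetical stand-in with one exact-type constructor and one saturating constructor for int, loosely modeled on the behavior discussed here, and it is not the real octave_int class. The static_cast plays the same role as the intermediate variable.

```cpp
#include <cstdint>

// Hypothetical stand-in for octave_int<uint8_t>; NOT the liboctave class.
struct sat_u8
{
  // Exact-type constructor: stores the value unchanged.
  constexpr sat_u8 (std::uint8_t v) : m_ival (v) { }

  // Constructor for wider values: saturates to [0, 255].
  constexpr sat_u8 (int v)
    : m_ival (static_cast<std::uint8_t> (v < 0 ? 0 : (v > 255 ? 255 : v)))
  { }

  constexpr std::uint8_t value () const { return m_ival; }

  std::uint8_t m_ival;
};

// ~ promotes the uint8_t operand to int, so the result is a negative int
// and the saturating int constructor is selected.
constexpr sat_u8 complement_direct (sat_u8 x)
{
  return ~ x.value ();
}

// Converting back to uint8_t first selects the exact-type constructor.
constexpr sat_u8 complement_via_cast (sat_u8 x)
{
  return static_cast<std::uint8_t> (~ x.value ());
}
```

For an input of 1, complement_direct yields 0 (the int value -2 is clamped), while complement_via_cast yields 254.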

I pushed a change to the default branch that overloads the operator ~ for octave_int<T>:
octave: 4efd735d034c (gnu.org)

I want to shed some light on this as the original issue got me very curious, and I became even more curious when I couldn’t find any operator ~ that octave_uint8 could fall back to. Actually, as far as I can tell, there aren’t any overloads for this particular operator in the entire code base.

What I have concluded through digging around and testing is that since the octave_int class has an implicit conversion defined for its underlying C++ datatype, octave_uint8 is implicitly converted to uint8_t in the expression ~x in the original example.

What happens next is exactly as @mmuetzel described:

So, if I got this right, what happens under the hood is roughly this:

octave_uint8 x = 1;
octave_uint8 y = int(~((uint8_t)x));

With the assignment forcing the int back to an octave_uint8 which is capped to 0.

So, the original example could have been done without the fix like this:

octave_uint8 x = 1;
octave_uint8 y = static_cast<uint8_t> (~x);

Which avoids the implicit conversion to int (the same way the intermediate variable does) and yields the correct result.

That’s roughly what I thought was happening. But I think the implicit conversion sequence was more similar to this:

octave_uint8 x = 1;
octave_uint8 y = ~ ( (int) (uint8_t) x);

Where uint8_t is a typedef for unsigned char.

A work-around might have been: Scratch that, your work-around looks fine

octave_uint8 x = 1;
octave_uint8 y = ~ static_cast<uint8_t> (x);

You are right. I wasn’t sure why the implicit conversion to int is needed either before or after the ~ and the rules in the standard are very unclear at least to me.

But I found this SO post which says that the smallest data type for all operations is int so the implicit conversion does happen before applying the ~.

And then it can’t go back after the operation as it goes through the wrong constructor which applies saturation semantics.
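The promotion can be checked at compile time in plain C++. This is a small sketch; the function names complement_promoted and complement_truncated are made up for illustration, and the value -2 assumes a two's-complement representation (which is what common platforms use).

```cpp
#include <cstdint>
#include <type_traits>

// The operand of ~ undergoes integral promotion, so for a uint8_t operand
// the result has type int, not uint8_t.
static_assert (std::is_same<decltype (~ std::uint8_t {1}), int>::value,
               "~ on uint8_t yields int");

// unsigned int already has the rank of int, so it is not promoted.
static_assert (std::is_same<decltype (~ 1u), unsigned int>::value,
               "~ on unsigned int stays unsigned int");

// The complement as the compiler computes it: a (negative) int.
constexpr int complement_promoted (std::uint8_t v)
{
  return ~ v;
}

// Converting back to uint8_t keeps only the low 8 bits.
constexpr std::uint8_t complement_truncated (std::uint8_t v)
{
  return static_cast<std::uint8_t> (~ v);
}
```

Here complement_promoted (1) is -2, which a saturating conversion clamps to 0, whereas complement_truncated (1) is 254.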

That’s all great work, thank you for the investigation.

I’ve also been reading some about it and found the following (in Working Draft, Standard for Programming Language C ++):

5.3.1 Unary operators [expr.unary.op]

10. The operand of ~ shall have integral or unscoped enumeration type; the result is the one’s complement of its operand. Integral promotions are performed. The type of the result is the type of the promoted operand. […]

Following on to the section on integral promotions, I think this is the paragraph that applies to uint8_t:

4.5 Integral promotions [conv.prom]
[…]

  1. A prvalue of an integer type other than bool, char16_t, char32_t, or wchar_t whose integer conversion rank (4.13) is less than the rank of int can be converted to a prvalue of type int if int can represent all the values of the source type; otherwise, the source prvalue can be converted to a prvalue of type unsigned int.

There’s a bunch of other paragraphs for other types. The reason why I was looking into this was to implement bit complement on an Octave array, effectively this (bitcmp is implemented in a .m file but I want to use it on a C++ project):

template<typename T>
intNDArray<T>
octave::bitcmp (const intNDArray<T>& a)
{
  intNDArray<T> c (a.dims ());

  T* cp = c.fortran_vec ();
  const T* ap = a.data ();
  const octave_idx_type n = c.numel ();
  for (octave_idx_type i = 0; i < n; i++)
    cp[i] = ~ap[i];

  return c;
}

All those conversions back and forth seem a bit too much, and since we’re likely to already have an array, would it be possible to just get the ideal size of data and cast to it to do the complement? There are also a lot of “can”s in the standard, which I guess means that this does not always happen.

That mostly looks ok to me. Maybe you’d need to change cp[i] = ~ap[i]; to cp[i] = static_cast<typename T::val_type> (~ap[i]); (like @magedrifaat suggested) to have this assignment use the correct constructor in versions of Octave before the recent changes. (The default wrap-around behavior of (unsigned) integers in C/C++ should do the correct thing IIUC.)
Does that make a difference?

Sorry, I was not saying the code was not working. I was just wondering that if all those uint8_t are being converted to int for the operation, whether it would make sense to pack them into int and do the complement then. I will experiment.
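As a rough sketch of the packing idea (untested for performance, and not tied to liboctave): complementing a 32-bit word that holds four 8-bit lanes gives the same bits as complementing each lane separately. The helpers pack4, lane, complement_one and complement_packed are hypothetical names, and whether an Octave integer array can safely be viewed as raw bytes like this is an assumption that would need checking.

```cpp
#include <cstdint>

// Pack four 8-bit lanes into one 32-bit word (lane 0 in the low byte).
constexpr std::uint32_t pack4 (std::uint8_t a, std::uint8_t b,
                               std::uint8_t c, std::uint8_t d)
{
  return static_cast<std::uint32_t> (a)
         | (static_cast<std::uint32_t> (b) << 8)
         | (static_cast<std::uint32_t> (c) << 16)
         | (static_cast<std::uint32_t> (d) << 24);
}

// Extract lane i (0..3) from a packed word.
constexpr std::uint8_t lane (std::uint32_t w, unsigned i)
{
  return static_cast<std::uint8_t> (w >> (8 * i));
}

// Per-element complement, truncated back to 8 bits.
constexpr std::uint8_t complement_one (std::uint8_t v)
{
  return static_cast<std::uint8_t> (~ v);
}

// Complement of all four lanes in a single operation.
constexpr std::uint32_t complement_packed (std::uint32_t w)
{
  return ~ w;
}
```

An optimizing compiler may already vectorize the simple per-element loop, so this would be worth measuring before adopting.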