Performance of a switch command in a while loop

Dear All,
this is not a big issue; it is only for my understanding. My parser uses a while loop that contains a switch statement. The script takes about 10x more time in Octave than in Matlab. The code, reduced to the relevant lines, is:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
load BenchData
tic
i = 1; indT = 1; idbDate = 1;
while(i <= (EndOfData-3)) % The last 6 bytes are not processed!
%while(i <= (EndOfData-5)) % The last 10 bytes are not processed!
  Marker = strData(i:(i+3)); % Check whether these 4 characters are a marker
     switch Marker
      case {"EEDD"} % Timemarker
          Marker2 = strData((i+4):(i+5));
          switch Marker2
              case{'00'}
              TimeStrLen = (i):((i)+23);
              TimeString = strData(TimeStrLen);
              dateVec = ['20' sprintf('%02d',hex2dec(TimeString(7:8)))...
              sprintf('%02d',hex2dec(TimeString(9:10)))...
              sprintf('%02d',hex2dec(TimeString(11:12)))...
              sprintf('%02d',hex2dec(TimeString(13:14)))...
              sprintf('%02d',hex2dec(TimeString(15:16)))...
              sprintf('%02d',hex2dec(TimeString(17:18)))];
              SDate = datenum(dateVec(rows(dateVec),:),"yyyymmddHHMMSS");
              d55AA00num(idbDate)=SDate;
              tNew(end) = d55AA00num(idbDate); idbDate = idbDate + 1; %
              TimeType = [TimeString(23:24)];
              switch TimeType
                  case {"03"}
                    TimeInc = OneHour;
                  case {"06"}
                    TimeInc = OneMinute;
                  case {"05"}
                    TimeInc = OneSecond;
              end
           end
           i = i + 22;
    case {"55AA"}  % Double Byte! Hopefully I never have to test this part of the switch loop
      dbData(indT) = hex2dec(strData((i+4):(i+7))); % 4 bytes for 1 value
      fprintf('DoubleByte! Value: %i at %i \n',dbData(indT), i);
      t(indT+1)      = t(indT) + TimeInc;
      tNew(indT+1) = tNew(indT) + TimeInc;
      i = i + 6; indT=indT+1;
    otherwise % Just count processing
      dbData(indT) = hex2dec(strData(i:(i+1)));
      t(indT+1)    = t(indT) + TimeInc;
      tNew(indT+1) = tNew(indT) + TimeInc;
      indT=indT+1;
     end
    i = i+2; % Step 2 because byte wise
end
toc;fflush(1);

function n = rows(x)
  n = size(x, 1);
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

I uploaded the file BenchData so that you can verify this. It contains a lot of obsolete variables, but I hesitated to clean it up completely.

BenchData.mat (91.1 KB)

System:
GNU Octave Version: 7.1.0 (hg id: 04120d65778a)
GNU Octave License: GNU General Public License
Operating System: MINGW32_NT-6.1 Windows 7 Service Pack 1 x86_64

Matlab has incorporated a number of compiler optimizations that make a particular difference for looped code, so it’s not overly surprising to see such a drastic difference between the two programs. Before Matlab introduced those improvements, it was general practice to vectorize code in order to get reasonable performance. That is now far less of a concern in Matlab, but it still is in Octave. I haven’t looked at your code in detail, but in Octave, if timing is important, it is fairly necessary to move as much work as possible out of the loop and into vectorized operations.

Thanks nrjank,
as I mentioned above, this issue is nothing essential. Everything works fine in Matlab and Octave. Where possible I vectorize, but in this case I’m parsing data from the serial stream of a Geiger counter. The stream contains a lot of information (markers) that is not measurement data but describes what follows (time, comment, counter status, …). Hence, vectorization was not possible.
Thanks again

Erwin

TL;DR: it was running slowly because of hex2dec, not because you were using switch inside a loop. Details follow.

Baseline performance:

octave:1> clear all; format short g compact; eegtest
Elapsed time is 15.8392 seconds.
octave:2> clear all; format short g compact; eegtest
Elapsed time is 15.675 seconds.

I first deleted the superfluous function rows from the bottom. There is already a compiled builtin rows, and replacing it with an interpreted version is unnecessary and can slow down the code, though in your case the new rows function is not actually called, since the function is defined after the main code. But on examining your code further, you were calling foo(rows(foo),:) which can be rewritten as the clearer foo(end,:). (After running it many times it became clear that foo is always a single row, so it can be replaced by just foo with no subscripts.)

Second, I made a few changes like replacing {"string"} with "string" and [vector(10:15)] with vector(10:15). This avoids the unnecessary construction of new intermediate objects of type cell or matrix.

Third, I changed i = i + 10 to i += 10, which uses the faster in-place increment operator.

Fourth I changed the long line with multiple sprintf calls into a single sprintf call like this:

dateVec = sprintf ("20%02d%02d%02d%02d%02d%02d", hex2dec(TimeString(7:8)),   hex2dec(TimeString(9:10)),...
                                                 hex2dec(TimeString(11:12)), hex2dec(TimeString(13:14)),...
                                                 hex2dec(TimeString(15:16)), hex2dec(TimeString(17:18)));

All of the above was done more for clarity than for speed. The speed did not change significantly. Now it is time to measure individual sections.

First I commented out all the lines inside the individual cases except for the i += N incrementing lines, so it would run through the string and match various substrings without taking action. That took only 0.28 seconds, so the time was not being occupied by iterating and string matching. (If it were, then a call to strfind or regexp would be faster than manual string matching.)
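For illustration only, here is a minimal sketch of what locating markers with strfind could look like. The string is made up (the real strData comes from BenchData.mat), and the odd-position filter is an assumption based on the loop above, where i starts at 1 and only ever advances by even amounts. A hit can also lie inside a payload that the stateful parser would have skipped, so these are candidate positions, not a replacement for the loop.

```octave
% Made-up hex string standing in for the real strData.
strData = ['0A1B' 'EEDD00' '0EEDD0' 'FF'];

% Start index of every occurrence of the time marker.
pos = strfind(strData, 'EEDD');

% The parser only inspects odd positions, so drop hits that start
% in the middle of a byte (even start index).
aligned = pos(mod(pos, 2) == 1);
```

With this input, strfind reports two hits, and only the first starts on a byte boundary.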

Now to profile the code:

octave:10> clear all; format short g compact; profile clear; profile on; eegtest; profile off; T = profile ("info"); profshow(T)
Elapsed time is 18.3956 seconds.
   #                   Function Attr     Time (s)   Time (%)        Calls
-------------------------------------------------------------------------
  23                     repmat             6.490      38.33       118894
  10                   base2dec             5.280      31.18        59447
   1                    eegtest             1.347       7.95            1
   7                    hex2dec             0.534       3.16        59447
  33                    reshape             0.231       1.36       298137
  18                       size             0.225       1.33       357584
  27                        all             0.223       1.32       356682
  22                        max             0.161       0.95       178341
  65                      index             0.158       0.93         7216
  17                    toupper             0.142       0.84        59447
  56 datevec>__date_vfmt2sfmt__             0.136       0.80          902
  26                    isempty             0.110       0.65       490008
  19                   binary >             0.106       0.63       305353
  41                        NaN             0.104       0.61        59447
  20                 postfix .'             0.096       0.56       356682
  34                       ones             0.087       0.51       118894
  24                   binary <             0.081       0.48       308060
  38                   repelems             0.078       0.46        59447
   8                     nargin             0.074       0.44       370213
  42                     double             0.070       0.42       178341

Each of the 59447 calls to hex2dec calls base2dec once, and each of those calls repmat twice, for 118894 (== 59447*2) repmat calls in total. (Look for multiples of 59447 in the function call count.) Between those two, plus the 178341 (== 59447*3) calls to max and the 356682 (== 59447*6) calls to all that they also initiate, they together take at least 75% of the runtime, so let’s try to replace them with a lookup table. First I created a lookup table before the loop:

tbl("0123456789ABCDEFabcdef") = [0:15 10:15];
mult4 = [4096 256 16 1]';
mult2 = [16 1]';

Then I replaced the calls to hex2dec with the following lines.

Case EEDD:

  tmp = tbl(TimeString(7:2:17)) * 16 + tbl(TimeString(8:2:18));
  dateVec = sprintf ("20%02d%02d%02d%02d%02d%02d", tmp(1), tmp(2), tmp(3), tmp(4), tmp(5), tmp(6));

Case 55AA:

  dbData(indT) = tbl(strData((i+4):(i+7))) * mult4; % 4 bytes for 1 value

Otherwise:

  dbData(indT) = tbl(strData(i:(i+1))) * mult2; # hex2dec replaced by this call
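A quick way to gain confidence in such a replacement is to compare it against hex2dec on a few inputs. The sketch below does that; the sample strings are made up, and the explicit double() and single quotes are only there so that the same lines should also run in Matlab.

```octave
% Lookup table from character code to hex digit value, as in the post.
tbl = [];
tbl(double('0123456789ABCDEFabcdef')) = [0:15 10:15];
mult4 = [4096 256 16 1]';   % place values for 4 hex digits
mult2 = [16 1]';            % place values for 2 hex digits

% Hypothetical sample inputs covering digits, upper and lower case.
samples2 = {'00', '7f', 'A5', 'FF'};
samples4 = {'0000', '12ab', 'FFFF'};

for k = 1:numel(samples2)
  s = samples2{k};
  assert(tbl(double(s)) * mult2 == hex2dec(s));
end
for k = 1:numel(samples4)
  s = samples4{k};
  assert(tbl(double(s)) * mult4 == hex2dec(s));
end
```

The row vector of digit values times the column vector of place values is an ordinary dot product, which is why a single multiplication yields the decoded number.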

New performance:

octave:32> clear all; format short g compact; eegtest
Elapsed time is 1.68317 seconds.
octave:33> clear all; format short g compact; eegtest
Elapsed time is 1.67631 seconds.

So that’s about 9.4 times faster than the original.

I also noticed that you are growing the dbData, t, and tNew variables incrementally, so I preallocated them with

dbData = t = tNew = zeros (1, 6e4);

before the main loop and trimmed them afterwards with

dbData = dbData(1:indT-1);
t = t(1:indT-1);
tNew = tNew(1:indT-1);

but the speed was unchanged in this case.
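The 6e4 above is a fixed size chosen for this data set. One possible way to derive a bound instead, assuming (as in the loop above) that every stored value consumes at least two characters of strData, is sketched here with a made-up input and a simplified loop that has no marker handling:

```octave
strData = '0A1B2C3D';                  % made-up stand-in for the real stream
maxVals = floor(numel(strData) / 2);   % at most one value per 2 characters
dbData  = zeros(1, maxVals);           % allocate once, before the loop

indT = 1;
i = 1;
while i <= numel(strData) - 1
  dbData(indT) = hex2dec(strData(i:i+1));
  indT = indT + 1;
  i = i + 2;
end

dbData = dbData(1:indT-1);             % trim the unused tail
```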

There’s probably more performance to extract, but you can use the testing and profiling techniques above to speed things up from there on.

Modified code attached.
eegtest.m (2.0 KB)

Thank you for your helpful hints. I’ll rework my script accordingly.

Be careful if you want this script to continue to work in Matlab as well as Octave. Matlab does not support the += operation.

You are right, and I noticed this immediately. But += and # are the smallest problems in getting it to run in Matlab. For example:
tbl("0123456789ABCDEFabcdef") = [0:15 10:15];
produces an error in Matlab that I don’t know how to work around, which means I can’t compare compatible code between Matlab and Octave. I don’t think I will pursue this issue any further. nrjank’s answer explains the difference very well.

This would probably work in Octave and in Matlab:
tbl(double('0123456789ABCDEFabcdef')) = [0:15, 10:15];

Yes, this works. In Matlab there is no difference between the original code and the modified code from Arun (about 2.5 s). In Octave there is a big difference between the two (5 s vs. 45 s). I think this confirms the information from nrjank.
But now I have another question, even if it’s a stupid one: the construct
tbl(double('0123456789ABCDEFabcdef')) = [0:15, 10:15];
looks absolutely strange to me, and I have no clue what to search for in the documentation. Can anyone post a link to the documentation where I can read up on how this construct works?
If not, don’t worry.
Thank you to all who spent some time with answering my question.

That construct is just a vector like any other vector. The difference to Matlab is that Octave can index with more types (including char) into a vector.

@arungiridhar’s “trick” is to use that vector as a “map” from the ASCII characters to their respective hexadecimal values and use that map in the calculation.
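To make the mapping concrete, here is a small sketch of what the assignment produces and how a lookup works. Nothing in it is new: it only spells out the indexed assignment from the thread, with double() written explicitly and the expected values given as comments.

```octave
tbl = [];   % start empty so the example is repeatable
tbl(double('0123456789ABCDEFabcdef')) = [0:15, 10:15];

n    = numel(tbl);        % 102, the character code of 'f' (largest index used)
d0   = tbl(double('0'));  % 0,  stored at index 48
dA   = tbl(double('A'));  % 10, stored at index 65
da   = tbl(double('a'));  % 10, stored at index 97
dBad = tbl(double('G'));  % 0:  slots never assigned are zero-filled

% Decoding a two-digit hex string: look up both digits, then weight them.
digits = tbl(double('1F'));   % [1 15]
value  = digits * [16; 1];    % 1*16 + 15 = 31
```

One consequence of the zero fill is that an invalid character such as 'G' silently decodes as 0 instead of raising an error, so this approach assumes the input contains only valid hex digits.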

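It took some hours to understand the “trick”. Just a German saying, which I think cannot really be translated:
“Von hinten durch die Brust ins Auge” (literally “from behind, through the chest, into the eye”) or, maybe better, “Dafür, daß die Abkürzung so kompliziert war, war sie kaum länger” (roughly “considering how complicated the shortcut was, it was hardly any longer”) :smile: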
Anyway, I learned a lot and a big thank you to mmuetzel and arungiridhar who took the time to investigate my code. Alex (aka Erwin55)

I don’t think that Arun was overcomplicating things or even outsmarting himself (that is how I’d understand the German sayings). On the contrary, I think his “trick” is pretty clever.
I’m happy you found his hints helpful.

That wasn’t meant as criticism; it was meant to express my respect, perhaps in a joking way. And to be honest, I would never have thought of this solution. I implemented Arun’s code. OK, there is no change in the Matlab timing, but Octave runs with significantly better performance.
By the way: I had never assigned to a vector in this way:
tbl([1 5 6 10]) = [1 2 3 4];, and I had never seen it before. My first contact with Matlab was in 1988 (PC-Matlab). Hence, I started up my old XP notebook and ran Mat386 in the cmd window (3.5m, no older version available at short notice) to see whether this had been possible from the beginning. And: tbl was allocated correctly. I was astonished that I had never noticed this feature. Arun’s hint and its implementation in my code were “allererste Sahne” (top-notch).

Impressive that you have a version from that long ago that you are still able to run.