How to Optimise a while loop that contains a vector that changes size within the loop?

As part of a code project (downloaded from the web) there is a scoring function containing the while loop below, which is drastically slowing down the whole routine (profiling shows that >70% of the running time is spent in this scoring function alone).

The problem, as I see it, is two-fold: it is a non-vectorised while loop, and the vector anno_st constantly changes size within the loop, not having been initialised before the while loop begins. Is there any way to speed up this section of code without having to rewrite this part as a .oct file?

## This first part of code added for reproducibility, re: dastew request
anno_st  = sort( randperm( 1000 , 100 ) )' ;
sub_len = 12 ;
## This recreates a "typical looking" anno_st vector

i = 1;
while true
    if i >= length(anno_st)
        break;
    end
    first_part = anno_st(1:i);
    second_part = anno_st(i+1:end);
    bad_st = abs(second_part - anno_st(i)) < sub_len;
    second_part = second_part(~bad_st);
    anno_st = [first_part; second_part;];
    i = i + 1;
end

You say that anno_st "has not been initialised before the while loop begins",
but I get an error precisely because it is not initialised!
So how can I run your code?


This code appears to remove points in the array which are too close to each other.
It seems to do this by selecting all the close-by points in the remaining (second) part of the array and deleting them. The surviving far-apart elements are then used to reconstitute anno_st, and the process repeats with the partition between the first and second parts shifted one step towards the end.

Rather than reconstituting anno_st each iteration, create a logical array the same size as the original anno_st, filled with true. Then, during each iteration, blacklist elements of anno_st by flipping the corresponding logical element to false. This way we are not constantly creating, deleting and concatenating arrays of varying sizes (which is likely the time-consuming part; profile before proceeding).


Yes, my mistake. I should have said that an output vector, into which elements of anno_st are written, is not initialised. The vector anno_st itself is created before the while loop runs.

Can you make a minimal set that I can run, to see the error?

I have added some code to my original question for reproducibility. Please note there is no error in the code; I just want to refactor the while loop to make it run much faster.

The example anno_st = sort(...) implies that the variable is sorted. In your real application, is anno_st guaranteed to be sorted before being passed to this scoring routine?

I made a version with the logical array mentioned above. The profiling results were quite discouraging: the logical version outperforms the original only when there are few points to remove from the vector. When there are a lot of points that can potentially be removed, the original algorithm seems to be faster. I guess the cost of abs(), - and < (in the original algorithm) drops drastically as the anno_st vector rapidly shrinks in size with each iteration.

Guess Octave is quite good at concatenating vectors.

I tried to optimise my version of the code, but most of the time was spent in self, so I couldn't pinpoint any place to optimise. The code is pasted below if you want to try it out. YMMV.

Profiling results.

   #                     Function Attr     Time (s)   Time (%)        Calls
---------------------------------------------------------------------------
   1    reovepointsclose>orig_alg             9.281      27.71            1
   7                     binary -             5.234      15.63       155799
   9                     binary <             5.202      15.53       155798
   8                          abs             4.623      13.80       155798
  10                     prefix !             3.308       9.88       155798
  11 reovepointsclose>revised_alg             3.255       9.72            1

L(anno_st) : 77900
L(anno_copy) : 77900
number of mis matches : 0

   #                     Function Attr     Time (s)   Time (%)        Calls
---------------------------------------------------------------------------
  11 reovepointsclose>revised_alg             0.245      51.94            1
   1    reovepointsclose>orig_alg             0.047       9.93            1
L(anno_st) : 544
L(anno_copy) : 544
number of mis matches : 0

The code

function anno_out = revised_alg(anno_copy, sub_len)
%% Alternate implementation
is_retained = true(length(anno_copy), 1);

for iii = 1 : length(anno_copy)-1
  if(is_retained(iii))
    bad_st = abs(anno_copy(iii+1:end) - anno_copy(iii)) < sub_len;
  
    % mark for removal. don't overwrite earlier marks for removal
    is_retained(iii+1:end) = is_retained(iii+1:end) & ~bad_st;
  end
end

anno_out = anno_copy(is_retained);
end

I have rewritten the slow while loop as a for loop thus:

## This first part of code added for reproducibility, re: dastew request
anno_st  = sort( randperm( 400000 , 50000 ) )' ;
anno_st_copy = anno_st ;
sub_len = 12 ;
## This recreates a "typical looking" anno_st vector and a copy anno_st_copy

tic ;
i = 1;
while true
    if i >= length(anno_st)
        break;
    end
    first_part = anno_st(1:i);
    second_part = anno_st(i+1:end);
    bad_st = abs(second_part - anno_st(i)) < sub_len;
    second_part = second_part(~bad_st);
    anno_st = [first_part; second_part;];
    i = i + 1;
end
old_while_loop_tic = toc

## My faster version
tic ;
last_iter = anno_st_copy( 1 ) ;
for ii = 2 : length( anno_st_copy )

  if ( anno_st_copy( ii ) - last_iter >= sub_len )
     last_iter = anno_st_copy( ii ) ;
  else
     anno_st_copy( ii ) = 0 ;
  endif

endfor
anno_st_copy( anno_st_copy == 0 ) = [] ;
my_faster_loop_tic = toc
speedup = old_while_loop_tic / my_faster_loop_tic
assert( anno_st_copy , anno_st )

This seems to give speed-ups of roughly 9.5x to 19x+, dependent on the number of required deletions, with the speed-up increasing with greater numbers of deletions. Since this is actually a rather simple for loop, if it were (easily) compiled into a .oct function, the final speed-up would be orders of magnitude greater.

I would welcome constructive comments on this.

Additional info: I did, in fact, compile a .oct function just to run my faster for loop version, with speed-ups of between 1000 and 2000 times over the original, slow while loop. When I now run the whole routine in which this scoring function is called, the run time has dropped from 4+ hours to less than 15 minutes.