Finding the durations of states within a matrix

Octave newbie here, one-time Matlab user in the distant past… I’m working on a problem where I have a state matrix of 1s and 0s. From this I want to create a new matrix of the durations of each “on” (i.e. 1) state. For example, if my state matrix is [0 1 1 1 0 1 0 0 1 1 0], the output duration matrix would be [3 1 2]. The duration of the 0s in between isn’t terribly relevant to me at the moment. I’m specifically interested in the most resource-efficient method of tackling this, as once scaled up I’ll be applying it to a matrix tens to hundreds of millions of rows long.

Win 10 / Octave 6.4.0

Try this:

>> [count, value] = runlength([0 1 1 1 0 1 0 0 1 1 0])
count =

   1   3   1   1   2   2   1

value =

   0   1   0   1   0   1   0

>> ones_count = count(value==1)
ones_count =

   3   1   2
5 Likes

as hinted at by @lesb, what you’re doing is a well known compression technique called run length encoding. You are welcome to look at the underlying code in the runlength function (edit runlength) to see how it works. Essentially it’s looking for changes in value, then determining the spacing between those changes. It does this in a vectorized fashion which leverages some efficiencies within octave. As the wikipedia article above suggests, there are numerous algorithms to accomplish this encoding, and whether this is the most resource-efficient method for you will depend a lot on the type and size of data you’re processing. There may be generally more efficient methods for your specific data.

2 Likes

@lesb, @nrjank thanks both not just for the solutions but the useful background info - down the rabbit hole I go. As ever half the challenge is knowing the right name for what you’re looking for!