`mean` function in Octave core

While adding new implementations of the geomean and harmmean functions in the statistics package, I realized that mean is part of octave core. The new functions are standalone and don’t rely on mean. However, I would also like to implement mean so that it becomes fully compatible with Matlab.
If I open a patch in savanna to update the mean in core function, it will go in the forthcoming release of Octave 7.2 or later on. More important it won’t be available to previous releases. If I add the new mean function in statistics package, will it cause problems/conflicts with the one in octave core or it will just shadow the mean function from Octave core without any trouble so that users that will install the forthcoming release of statistics package will use the Matlab compatible version? Are there any other issues that I cannot think of?
I would really appreciate some feedback on this :slight_smile:

1 Like

Patches for Matlab compatibility are always welcome for core Octave (if it doesn’t break Octave specific syntax).
But you are right: New functions and feature are generally added for the next major version (i.e., currently Octave 8).
In principle, a package can overload any function from Octave core if it places the path to that function at the front of the load path. It depends on each case whether that is a good idea.
I don’t know which changes you are planning to make for mean. If it breaks backward-compatibility, there might be functions in Octave core that might break as well…

The 2 functions I know of that rely on mean are the existing geomean and harmmean as in statistics-1.4.3 release. Their new implementations though are independent. Based on the current implementation of mean, which has extra functionality to facilitate the existing geomean and harmmean with the ‘g’ and ‘h’ options, it is quite messy to make it fully compatible with Matlab and it will definitely result in very complicated code for parsing input arguments. Is there any easy way to identify whether backward compatibility is affected? How can I search whether other core octave functions rely on mean 's idiosyncratic implementation?

Iff there are BISTs for those functions, you could try running make check in the build tree of Octave. Or try running __run_test_suite__ in an Octave installed on your computer.
If that doesn’t fail with the updated mean function, it doesn’t necessarily mean that there are no functions that could be affected by the change. It could be that there are just no BISTs for those functions.
It could also be that the updated version works without issue with, e.g., Octave 6. But it might cause issues with, e.g., Octave 4.

In general, core functions should only be overloaded very cautiously.

1 Like

If the statistics package gets a mean function, that will definitely shadow the function in core. Please don’t do that. Even if you think that your version is backwards compatible, don’t do it. Here’s reasons why:

  1. Even if you’ve tested it and you’re certain that your version is not backwards incompatible, you may be wrong. Also, users have unpredictable workflows. That’s the reason why Octave core only fixes regressions in patch releases and why if submit your new mean version it will only be added to Octave 8.

  2. Suppose you’re right and there really is no backwards compatibility issue. Suppose then, that some other package also shadows mean because it needs some other missing feature. Then, the version of mean you get is dependent on the order packages are loaded and one the packages will always be broken.

  3. Suppose there’s no backwards compatibility and no other package does it. In a year or so Octave 8 is released, not only with the extra feature you need but also some other feature. If someone installs the statistics package you release now, it will shadow the core function and other functions in core that may use other features will no longer work.

  4. Octave Forge has, or at least had, a requirement that packages do not shadow functions from core or from other packages. A long time ago this used to not be the case and there were constantly bugs because an old package broke something somewhere else.

  5. If you really really need you own implementation of mean, then make it a private function to the package (add it to inst/private).

1 Like

In general I support @mmuetzel 's and @carandraug 's claim: use shadowing as a last resort solution.

If I correctly understand the topic, the mean function should become more Matlab compatible (no fancy new features) and an upstream core contribution is planned as well.

In this case shadowing would improve the user experience even for old Octave versions and I would like to see this change.

If you can share a draft of your intended changes we could probably reason less speculative about this case :innocent:

Thank you all for the feedback. I am really cautious about introducing a change that might brake something. That’s why I opened this topic. I will test over the weekend my new implementation of mean according @mmuetzel 's recommendation and manually check whether the remaining functions without BISTs rely in on mean and if so check their compatibility. If I get the time, I might add some BISTs on these functions and add them as patches in savanna.

If I don’t find any conflict or breakage, I will open an issue at GiHub with the new candidate implementation of mean so that everyone can test it in different environments and Octave versions. I only have Octave 6.2 and 6.4 installed on linux.

@carandraug I don’t really need my own implementation (I doubt if I ever used mean in my workflows). I just came across this compatibility issue while addressing an old open issue from savanna about geomean, then I went forward making harmmean also Matlab compatible. So I figured that I can also make a proper Matlab compatible implementation for mean, but I seek for advice first so that my changes in statistics package don’t introduce any problems. I will definitely seek a general consensus before pushing mean into statistics.

@mmuetzel if I install and load in my Octave every other package there is at -forge, will __run_test_suite__ check the BISTs in the packaged functions or it just works on octave core?

This is probably about the NaN-flags that were introduced for some functions in Matlab some time ago.
There is already bug #50571 about that.
It would be best to discuss a possible implementation for Octave core there.

I have pushed a Matlab compatible version of mean into statistics with this commit 5bbb7e1. Unlike the geomean and harmmean functions I previously updated, mean does not rely on the nansum function from the statistics package. It has been tested in Octave 6.2.0 with __run_test_suite__ as @mmuetzel suggested and it does not produce any errors. I scanned all remaining m files in octave core that have no tests and found only 2 function relying on mean. These are the diffpara and hurst functions from ../octave/6.2.0/m/signal/ directory. Under the assumption that the functions’ implementation is correct, I tested them with random input arguments and they did not produce any error. They both call mean internally with a single input argument e.g. mean(X), so I assume there is no conflict with the new implementation of mean.
Please, try it out and make a comment about whether it should be kept in the forthcoming statistics release. As @siko1056 mentioned, the only reason I am considering shadowing of the octave core’s mean function is to make it available in previous versions of Octave as well. I will remove mean unless there is a consensus in keeping it.
@mmuetzel following bugs #50571 and #50007 and @rik 's reply on this thread I consider the new implementation and shadowing as the best way forward to address the compatibility with the nanflag option. As long as nothing brakes of course!

For the reasons @carandraug already mentioned, I don’t know if it is a good idea to overload a core function in a package.
Currently, it might be true that your new version is more compatible than the corresponding version in Octave core. But sometimes packages tend to stay around for a very long while. In a few years from now, it might be the reverse. And loading the statistics package could lead to unexpected side effects.
As kind of a compromise could you consider naming your new function something different? E.g., mean_nan? In that case users could decide to actively add a wrapper function to their load path if they know what that means. That wrapper could maybe look something like this (untested!):

function varargout = mean (varargin)
[varargout{1:nargout}] = mean_nan (varargin{:});

Could you explain this a bit because I am not sure what it means.

Sorry, I wasn’t very clear. What I meant is the following:
I hope that the mean in core Octave will at some point have support for the NaN flag. At that point in time, there won’t be any more reason to overload that function in the statistics package. But in a best case scenario, it would not do much harm either. There is no way in Octave to force a user to stop using any given version of any given package (at least as of now). So, some users will continue to use versions of the statistics package that overload mean. No big deal so far.
But let’s assume that Matlab adds yet another “cool new feature” to the mean function, and Octave supports that new feature at some point as well.
In that case, a user could write code using that new feature that would be working for them “most of the time”. But as soon as they load the statistics package that code would start failing (if it overloaded with your new mean function that doesn’t know about that hypothetical new feature yet).
This could be quite difficult to track down…

Edit: Or it might not be user code but future core functions that could be starting to fail when the statistics package overloads with an older mean function…

I assume keeping both functions identical would solve this problem, right? I am not sure if this could be automated somehow (i.e. sync a specific file between two repositories). But making sure that these two are identical before each new release of statistics doesn’t seem quite a hassle to me. If a new feature comes along in mean or as a matter of fact in any function, it seems to me easier and faster to update it in a package rather than in core. So the chances are that new patches would appear in statistics first.

If the new implementation is rigorously tested for not breaking existing functions in core octave, then it could move into some future release of octave, but then it would be unavailable in previous versions. As far as previous versions of statistics package are concerned, there is no implementation of mean in those, hence loading an earlier version of statistics will not matter at all, independently of which version of Octave you install it in.

Don’t take my responses as a kind of pressure to include the new implementation of mean, they are more out of my interest to fully understand the workflow you are using into maintaining and developing Octave, since all this is quite new to me. Thanks

I’m sorry, I’m not sure I got your point.
If you always synchronize with the core function (e.g., at package installation time), what is the point in overloading at all? If you plan to synchronize when releasing new versions of the package, keep in mind that not all users always use the latest version of packages.
Also, I very much appreciate your motivation to keep maintaining this package as long as possible. But circumstances might change that might influence that (e.g., new job, family,…).

what I meant with synchronizing is to make sure that iff mean gets a future patch at Octave core with further characteristics to maintain Matlab compatibility then I can import those changes in statistics package prior to a new release at that time. Alternatively, I could add a rule to overload the newly implemented mean only up to a certain Octave release (i.e. only these that use the previous version) and then ignore it during installation from future Octave releases. I am not sure yet how this could be implemented, but if it helps I can look into it.

Please don’t do that. I know it sounds like a good idea but it’s not. We (through Octave Forge) have been there 15 years ago. It didn’t work well then and I don’t see why it would work well now. We and the users paid the price for years. I already explained above why, and so did @mmuetzel .

This is why there’s symbol uniqueness is a requirement for Octave Forge:

Symbol uniqueness

Except if overwriting existing functions is intended, symbol duplications over function names, class basenames, and namespace (base)names should be avoided. Note that community packages may violate this rule by using symbols which are already defined in external packages; this can be necessary e.g. for Matlab compatibility.

The exception for “if overwriting existing functions is intended …” is not about just being aware and doing it in purpose. That exception is aimed at packages such as nan that shadow a series of Octave core functions with the goal of having the whole Octave itself behave in a different manner.

The statistics package is a widely used Octave package. Please do not shadow functions from core.

1 Like

And most of “we” are gone for more than two years…

Please stop enforcing an inflexible policy that burned out many package developers on each newcomer. In case of @pr0m1th3as an experienced statistics package user, who is willing to devote his free time to maintain a stale code with 40 unheard request for improvement with significant user friendly changes to give feedback on development outside Savannah.

In case of stability fears (again, 40 open issues is stable :thinking:), stay with the latest stale release on mxe, branch the package, or work on it as new maintainer. We (I) can also skip publishing a few releases on Forge just to give a newcomer a chance to find a home in Octave package development. So far it has the charme of stupid unpaid work fixing other developers bugs, who are for long history, for the sake of “stability” which is a pseudonym for no development at all. Where was the quality control when those bugs and bad design decisions were introduced in the statistics package?

Me and @pr0m1th3as heard your concerns, they are valid, need certainly justification in the nearer future, will cause some users to complain on GitHub, explaining their problems. Solutions will be found. And if the solution was “don’t overload”, be happy to know “I told you so”. The big difference is that the package developer does it of own conviction. Open source is a process which must accept (repeated) errors. Onboarding developers comes with friction, if we cannot accept this we will never grow.

But if I read your comments dealing only with “don’t do what you think is right”, they already imply that @pr0m1th3as is burning out too soon, all his design decisions are already legacy code. And I am pretty sure with this constant encouragement we are almost there.

As mentioned in the same post, it is temporary for testing purposes and it can be removed prior to next release.

So what is the alternative for making mean Matlab compatible? I am currently implementing anova2 which is very useful but missing from statistics. In its implementation, I need to code mean(x,"all"); for example. Should I just wait for next major release of Octave, until a proper mean function is supported and then add as package depencency Octave >=8.0.0 or something similar? Obviously, I can circumvent this with mean (x(:)); but you get the idea. It’s a mere example. The way I usually code is by keeping dependencies to the minimum possible, even if this means writing a bit longer code. After all we are way past the '90s where storage was an issue.

I had published code for my science work written in Octave 4.4 and they don’t work any longer because of differences in array indexing with (:) introduced in 6 (if I recall correctly), so I understand the hassle of compatibility. The beauty about Octave (and open source in general) is that people can always install the latest version (freely and for free). So even if I make a dependency for statistics such as Octave >= 6.4. or even 7.1 , there shouldn’t be much trouble for people still on 5.2 installing a later version if they want access to new functionality. And in case there is some breaking with legacy code, either their own or from other sources, they can always install an earlier version of the statistics package.

My aim for applying this particular shadowing is to enhance Matlab compatibility but also making the functions in statistics more consistent with one another. That’s why I decided to re-implement mean as well. I take @mmuetzel and @carandraug comments with serious consideration and I reassure you I am not planning into shadowing octave core functions one after another as a general trend in maintaining the statistics package.

This is the main reason I stepped up and made the commitment to maintain it. Except for the price of breaking code for maintainers and users alike, there is also the need to make the statistics package as complete and compatible with Matlab as possible, because it is our best chance into making Octave a valid alternative for teaching statistics in academia (and schools) and this certainly affects a lot of users.

I am planing to make a few virtual machines for testing various releases (probably from 5.1.0 or perhaps 6.1.0 onward) and see how the shadowing behaves and if there are any issues. And then perhaps change the statistics dependency on Octave accordingly. We 'll see how it goes. For the time being, I really appreciate your comments. Please, give the updated functions a try to check if there are any issues with these. I don’t own a Matlab license or have any MS Windows machine, so testing would be very helpful.

1 Like

I understand your motivation. But I can’t re-iterate enough that shadowing a central function like mean in a package is really a very bad idea. Especially if the package is as widely used as the statistics package…

Please, do not do that!

I’d be happy to review any patches for core Octave concerning that function though.

1 Like