Octave.org search engine settings for doc

I occasionally do a web search for a specific mathematical term when I am not sure whether it is in Octave or a package or not at all. I notice that when search engines pick up octave.org doc pages, they pick up v4.2.1 like this example:

The more current v6.4.0 page exists but simply does not show up in search results unless a term like “6.4.0” is added. I could edit the URL to change 4.2.1 to 6.4.0 and get this

Does anyone know why 4.2.1 simply blocks other more current versions from showing up?

This was discussed before:

https://lists.gnu.org/archive/html/octave-maintainers/2019-09/msg00036.html

https://lists.gnu.org/archive/html/octave-maintainers/2020-07/msg00006.html

In the second instance, the ‘latest’ redirect was created, but it was pointed out that we don’t have a good way of redirecting outdated links from old versions of the docs to new versions without breaking some links. How to actually convince Google to point to the current docs is a separate issue altogether.

Thank you @nrjank for the links to older discussions. I had not been aware of them.

Is it acceptable to delete all past versions of HTML docs from the website and retain only the current version like a rolling release for documentation?

I don’t think that is a good idea. There are many (more or less reasonable) reasons for users to work with any given version of Octave. We should retain the documentation for all of them (if possible).

Something like that was mentioned in the second discussion linked above, but the issue is really how many broken links you want floating around the internet. The “latest” and “interpreter” redirects will get you to the most recent version, but almost all links created before those redirects existed point to the version-specific docs. And as Markus mentioned, sometimes that’s needed. Take Kai’s example, where betacdf was moved out of core Octave. The URL:

https://octave.org/doc/v4.2.0/XREFbetacdf.html

still works, but it does not exist in 6.4.0, and
https://octave.org/doc/latest/XREFbetacdf.html

is a broken link.

We could move everything to a new ‘archive’ subfolder, and maybe set it up so that any version-specific links redirect to the new archive location whenever a new version is released. Then only the links using ‘latest’ might have the issue above. I don’t know whether that location change would affect the Google results, or what it is that keeps the v4 results at the top of the Google list compared to the others.

This also reminds me that we had a mailing list conversation about how sending broken links to the ‘missing function’ page is probably not the best choice.

This is a tricky problem and other projects have the same problem. For example, when I search for Qt documentation, I rarely get a link for the latest version. And the Qt documentation pages do not offer any indication that I’m not reading the docs for the latest version.

I guess that having a link that redirects to the latest version of that specific part of the manual can be really tricky in some cases, for example when a documentation page is later split into multiple parts.

I think the goal is to avoid the problem of someone searching for a term and ending up reading the documentation for an old version without realising that it may be out of date.

One thing that I found helps a lot, and which the Octave documentation does not do, is including the version number in the page title. With the version number in the page title, it is displayed in the search results. Just compare searching for “Octave betacdf” and “Qt QDateEdit”.

Another thing is having a banner at the top warning that the page currently being viewed is not for the latest version of Octave. See, for example, the red banner shown by Django for versions that are insecure and no longer supported: Applications | Django documentation | Django

I agree with leaving all the documentation pages as is without redirection but adding a banner with a link to latest version (same topic whenever possible or starting page). Another example of this: Libbson — libbson 1.10.1

In addition to the discussion above, I draw your attention to the wording. 4.2.1 uses page titles like “GNU Octave: TOPIC” but subsequent versions use “TOPIC (GNU Octave (version 6.4.0))”. I think the fact that Octave is inside parentheses hides it partially from search engines. Would it be OK to change it to the style of 4.2.1?
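To make the difference between the two title styles concrete, here is a minimal JavaScript sketch that rewrites a newer-style title into something close to the 4.2.1 style. The function name `toOldStyleTitle` is hypothetical, and keeping the version number at the end of the title is my own assumption (it combines this proposal with the earlier suggestion to show the version in search results). The real titles are generated by Texinfo at build time, so this only illustrates the mapping and is not how the manual is produced.

```javascript
// Hypothetical helper: convert "TOPIC (GNU Octave (version X.Y.Z))"
// into "GNU Octave: TOPIC (version X.Y.Z)".
// Titles that do not match the newer pattern are returned unchanged.
function toOldStyleTitle(title) {
  const match = /^(.*) \(GNU Octave \(version ([0-9.]+)\)\)$/.exec(title);
  if (match === null) {
    return title;
  }
  const [, topic, version] = match;
  return `GNU Octave: ${topic} (version ${version})`;
}

console.log(toOldStyleTitle("Distributions (GNU Octave (version 6.4.0))"));
// → GNU Octave: Distributions (version 6.4.0)
```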

As this topic has come up again, and Texinfo is too inflexible for defining a custom HTML header, I worked on a JavaScript solution https://octave.org/doc/version_check.js.

This solution avoids a rebuild of all existing documentation and only requires a minimal modification of the existing HTML pages.

Injecting the following <script>-tag before the closing </head>-tag:

  <script type="text/javascript" src="../version_check.js"></script>
</head>

via a sed-one-liner:

sed -i 's|</head>|<script type="text/javascript" src="../version_check.js"></script>\n</head>|' v4.0.0/[^X]*.html
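One caveat with the one-liner is that running it twice on the same folder would inject the tag twice. As a rough sketch of the same substitution with a guard against that, here is a JavaScript version that works on the page text. The names `SCRIPT_TAG` and `injectVersionCheck` are hypothetical and are not part of the deployed solution; reading and writing the files is left out.

```javascript
// Hypothetical helper mirroring the sed one-liner: insert the script tag
// before the first closing </head> tag, unless it is already present.
const SCRIPT_TAG =
  '<script type="text/javascript" src="../version_check.js"></script>';

function injectVersionCheck(html) {
  if (html.includes(SCRIPT_TAG)) {
    return html; // already injected, leave the page untouched
  }
  // A string pattern replaces only the first occurrence of "</head>".
  return html.replace("</head>", `${SCRIPT_TAG}\n</head>`);
}

const page = "<html><head><title>t</title>\n</head><body></body></html>";
const once = injectVersionCheck(page);
const twice = injectVersionCheck(once);
console.log(once === twice); // true: the second run changes nothing
```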

One example I created is:

The maintenance of this solution is reasonable:

  • Update the latest release number in version_check.js manually.
  • Run the sed-one-liner in the new documentation folder after deployment.
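For readers wondering what the manually updated release number is used for, the following is a minimal sketch of the kind of logic such a script needs. It is not the contents of version_check.js. The constant `LATEST_VERSION` and the functions `parseDocVersion`, `compareVersions`, and `latestPathFor` are hypothetical names, and the `/doc/vX.Y.Z/` path layout is taken from the URLs quoted in this thread.

```javascript
// Hypothetical constant: the release number that has to be updated by hand.
const LATEST_VERSION = "6.4.0";

// Extract "4.2.1" from a path such as "/doc/v4.2.1/Distributions.html".
// Returns null for paths without a version directory (e.g. "/doc/latest/").
function parseDocVersion(pathname) {
  const match = /\/doc\/v(\d+(?:\.\d+)*)\//.exec(pathname);
  return match === null ? null : match[1];
}

// Compare dotted version strings numerically: negative if a < b,
// zero if equal, positive if a > b. Missing components count as 0.
function compareVersions(a, b) {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  const length = Math.max(pa.length, pb.length);
  for (let i = 0; i < length; i++) {
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) {
      return diff;
    }
  }
  return 0;
}

// Build the path of the same page in the latest release, or return null
// when the current page already belongs to the latest (or no) version.
function latestPathFor(pathname) {
  const version = parseDocVersion(pathname);
  if (version === null || compareVersions(version, LATEST_VERSION) >= 0) {
    return null;
  }
  return pathname.replace(`/v${version}/`, `/v${LATEST_VERSION}/`);
}

console.log(latestPathFor("/doc/v4.2.1/Distributions.html"));
// → /doc/v6.4.0/Distributions.html
```

The comparison is done component by component on numbers because a plain string comparison would rank "4.10.0" below "4.2.0".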

If this is a satisfactory solution, I will back up the existing documentation and inject this tag in all HTML files.

Apply this solution?

  • Yes
  • No

Curious: what will the “latest version of this page” link look like for a page that has no current version, like the betacdf page mentioned above? Assuming it will look the same and just point to a nonexistent page, we should probably change the default not-found landing page from the current “missing function” one.

Could this be done using robots.txt to instruct search engines which pages to index?

The JavaScript is now applied to all existing documentation. Please report any problems.

Regarding your point, @nrjank: the “Latest version of this page:” link is now only shown if that page really exists, to avoid confusion.

In your example, @nrjank, the XREF target page Distributions.html still exists, but the XREF to the betacdf function is broken. The solution is certainly not perfect, but in most cases this change helps to get to the latest version of the documentation (basically what I have done manually in the browser address bar for years). Nobody is forced to click the link, and luckily there is a history back function in many browsers :innocent:
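One way such an existence check could work is sketched below. This is an illustration under assumptions and not the code in version_check.js: the helper names `pageExists` and `bannerLinks` and the label of the first link are hypothetical, and the check assumes the server answers a missing page with a non-2xx status. If missing pages are instead redirected to a landing page that returns 200, the test would have to look at something else, for example the final URL of the response.

```javascript
// Hypothetical helper: resolve to true when a HEAD request for `url`
// succeeds with a 2xx status. `fetchFn` defaults to the global fetch and
// can be replaced by a stub for testing.
async function pageExists(url, fetchFn = fetch) {
  try {
    const response = await fetchFn(url, { method: "HEAD" });
    return response.ok;
  } catch (error) {
    return false; // network errors are treated as "page not available"
  }
}

// Return the list of links to show in the banner. The link to the start
// of the latest manual is always included; the page-specific link is
// added only when the target exists.
async function bannerLinks(latestRootUrl, latestPageUrl, fetchFn = fetch) {
  const links = [{ label: "Latest version of the manual", url: latestRootUrl }];
  if (await pageExists(latestPageUrl, fetchFn)) {
    links.push({ label: "Latest version of this page", url: latestPageUrl });
  }
  return links;
}
```

Passing the fetch function as a parameter keeps the sketch testable without network access; in a browser the default global `fetch` would be used.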

@phopfgartner thanks for the hint. If you would like to work on this, I can give you details about the current homepage.

Looks pretty good! Thanks @siko1056!

Are there pages corresponding to deprecated functions and operators that no longer exist in newer versions?

For the example above for betacdf, where the page exists but the function does not, you get:

The link https://octave.org/doc/v6.4.0/Distributions.html#XREFbetacdf still works, but it just takes you to the top of the Distributions page because the #XREFbetacdf anchor doesn’t exist on the new page.

For whole pages that don’t exist in the new manual, e.g., the contributor guidelines section at https://octave.org/doc/v4.0.1/How-to-Contribute.html#How-to-Contribute, you get:

It only gives the old version warning and the first link to the root of the current manual. (Since this material mainly moved from the manual to the wiki, this makes sense. There is a ‘how you can contribute’ page at the front of the manual with a link to the octave.org ‘get involved’ page.) Google of course keeps pointing to the old page first, but at least the wiki’s there (see below). It’s a bit difficult to fix that unless @phopfgartner’s robots.txt suggestion helps.

image

Are there other functions you had in mind that this isn’t working for?

Yes, I can try to have a look at it.