CI with GitHub hosted runners

Kai set up a repository mirror of Octave on GitHub:
gnu-octave/octave: GNU Octave Mirror (https://www.octave.org/hg/octave). Report bugs at https://bugs.octave.org (github.com)

GitHub offers to run CI tests (via GitHub Actions) on runners that they host:
About GitHub-hosted runners - GitHub Docs

I played around a little bit and managed to get a simple bootstrap-configure-make-make check cycle with a “default” configuration running:
octave/make.yaml at mmuetzel-CI-github-1 · mmuetzel/octave

Would we consider accepting such GitHub-specific files in the hg repository? That way, the GitHub mirror would automatically run the CI tests on (roughly) every push.

They also have hosted runners with macOS Catalina 10.15 and macOS Big Sur 11.0.
If somebody knows how to set them up, maybe we could use those to have CI tests run on macOS (which we are currently lacking).

Cheers, I’ve been using GitHub Actions to build and test Octave snap and flatpak packages for a few weeks now.

I don’t have a strong feeling whether or not this file should be included in the Octave repository. But if we do include it, should we also include an equivalent .gitlab-ci.yml file to automatically build on GitLab CI for mirrors there as well? Seems fair to include both.

FWIW, if the decision is to not include this in the official repo, you can pretty easily create another empty repo on GitHub that has a workflow that simply checks out your mirror and builds it on a schedule.

Nice coincidence.
Maybe we can share some hints.
I think I finally managed to get the rules to take advantage of ccache to speed up build time.
I noticed that the .eps images for the manual seem to be created with gnuplot. Maybe that is because the build is running on a headless system. It is not a high priority, but do you know how to have it use a “more modern” graphics toolkit for that build step?

I agree. If we accept this, we should probably also accept a corresponding GitLab file.
I suspect the two files could be quite similar.

You can try xvfb-run.
E.g.: xvfb-run -a -s '-screen 0 640x480x24' make

An alternative (I have not tried this, but maybe Kai has):
https://lists.gnu.org/archive/html/octave-maintainers/2020-02/msg00048.html

xvfb-run seems to work.
Thank you.

The CI rule runs “make check”. But it doesn’t actually take the results of the test suite into account.
Is there a way to have it analyze, e.g., the summary that the test suite prints at the end?
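One possible way to do that, as a minimal sketch: save the output of make check to a log file and let a small shell function look at the summary. This assumes that the summary contains a line of the form "FAIL <count>"; the function name check_test_summary and the log file name test-suite.log are made up for illustration.

```shell
# Succeed only if the log reports zero failed tests.
# Assumption: the summary printed by the test suite contains a line
# whose first field is "FAIL" and whose second field is the count.
check_test_summary () {
  local log_file="$1"
  local failures
  # Take the count from the first line that starts with the word FAIL
  # (lines starting with XFAIL do not match).
  failures=$(awk '$1 == "FAIL" && $2 ~ /^[0-9]+$/ { print $2; exit }' "$log_file")
  if [ -z "$failures" ]; then
    echo "no FAIL line found in $log_file" >&2
    return 2
  fi
  if [ "$failures" -ne 0 ]; then
    echo "$failures test(s) failed" >&2
    return 1
  fi
  return 0
}
```

In a CI step, this could be used as: make check 2>&1 | tee test-suite.log, followed by check_test_summary test-suite.log, so that a non-zero failure count makes the step fail.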

While xvfb-run seems to work most of the time, it sometimes fails with errors like this one:
CI: Fix typo · mmuetzel/octave@64b34e2 (github.com)

QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-runner'
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-runner'
  GEN      doc/interpreter/splinefit3.pdf
  GEN      doc/interpreter/splinefit4.pdf
qt.qpa.xcb: could not connect to display :99
qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "" even though it was found.
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.

Available platform plugins are: eglfs, linuxfb, minimal, minimalegl, offscreen, vnc, xcb.

QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-runner'
/bin/bash: line 1: 168697 Aborted                 (core dumped) /bin/bash run-octave --norc --silent --no-history --path /home/runner/work/octave/octave/.build/../doc/interpreter/ --eval "splineimages ('doc/interpreter/', 'splinefit3', 'pdf');"
make[2]: *** [Makefile:31979: doc/interpreter/splinefit3.pdf] Error 134
make[2]: *** Waiting for unfinished jobs....
make[2]: Leaving directory '/home/runner/work/octave/octave/.build'
make[1]: *** [Makefile:27996: all-recursive] Error 1
make[1]: Leaving directory '/home/runner/work/octave/octave/.build'
make: *** [Makefile:11422: all] Error 2
make: Leaving directory '/home/runner/work/octave/octave/.build'
Error: Process completed with exit code 2.

Is there some trick to avoid that error more reliably?

I do not know what it is. It looks to me like Qt cannot connect to the xvfb server.
Perhaps it is some kind of hardware resource limit (storage/memory)?
Do you run many parallel builds?

Dmitri.

According to their documentation, their Linux VMs have 2 cores. So, I thought it would be ok to run make -j2.
Do you think it would be better to not use parallel make?

I am more concerned about the 14 GB of disk space. With ccache in use, you may simply run out of disk space, so that Qt cannot create some temp files…

Dmitri.

Good point. I underestimated the size of a build tree.
I tried compiling with the same configuration on a fresh checkout, and the repository including the build tree amounts to 6.5 GB here.
The dependencies that are installed manually in the first step alone account for 1.6 GB according to the logs.

I’m not sure what counts towards the 14 GB limit. Does it include everything (including the base image)?
Ubuntu claims that it requires 5 GB for a minimal installation: Install Ubuntu desktop | Ubuntu
I’m not sure if the VMs are a minimal installation. They are probably not because compilers, build tools and the like are already installed. On the other hand, they probably don’t include a graphical shell.
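To find out how much of the disk is actually left on a runner, one could print the free space and the size of the build tree in a CI step. A minimal sketch using df and du; the function names are made up for illustration, and "." stands in for whatever directory should be measured.

```shell
# Print the free space (in KiB) on the file system that holds the given directory.
free_kib () {
  # -P gives the portable one-line-per-filesystem format, -k reports KiB;
  # the fourth column of the second line is the available space.
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Print the size (in KiB) of a directory tree, e.g. the build tree.
tree_kib () {
  du -sk "$1" | awk '{ print $1 }'
}

echo "free: $(free_kib .) KiB, tree: $(tree_kib .) KiB"
```

Running this once before and once after the build step would show how much of the limit the build itself consumes.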

In any case, it is probably a good idea to limit the size of ccache. I thought a limit of 2 GB might be appropriate because after the first time building with ccache, the used cache was about 1.2 GB.
With a 2 GB cache size, there’ll probably still be a decent number of hits in the long run.
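Setting such a limit could look like the following sketch. It uses the --max-size and --show-stats options of ccache; the variable name CCACHE_LIMIT is made up for illustration, and the 2G value is simply the number discussed above.

```shell
# Assumed limit taken from the discussion above; adjust as needed.
CCACHE_LIMIT="2G"

if command -v ccache >/dev/null 2>&1; then
  # Persistently set the maximum cache size.
  ccache --max-size="$CCACHE_LIMIT"
  # Print hit/miss counters and the current cache size.
  ccache --show-stats
else
  echo "ccache not found; skipping cache configuration" >&2
fi
```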

I accidentally built with the wrong compiler for the gcc jobs (for quite a few repetitions). So, the cache is polluted and the current timings are probably not representative.

FYI, here are the ccache stats for the Fedora buildbots (5 builders):

stats updated                       Sun Apr 25 13:24:09 2021
cache hit (direct)                  15799489
cache hit (preprocessed)              387858
cache miss                           1172171
cache hit rate                         93.25 %
called for link                       438612
called for preprocessing              988137
compile failed                        486587
preprocessor error                    209203
cache file missing                        11
bad compiler arguments               1466816
unsupported source language             5172
autoconf compile/link                3850121
no input file                         169780
cleanups performed                      2068
files in cache                        141789
cache size                              53.3 GB
max cache size                          59.0 GB

In my experience, ccache most of the time uses about 90% of the maximum cache size, independent of how high that maximum is set. It is hard to predict from only a single statistic how a smaller or larger cache size would affect the hit rate exactly. (Apart from the general trend that the hit rate will probably be higher for larger caches up to a certain point.)
IIUC, a large increase in the number of cleanups between subsequent runs could indicate that ccache might benefit from a larger cache.
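For reference, the hit rate and the fill level in the Fedora statistics above can be reproduced from the raw counters. A small sketch, assuming the hit rate is computed as hits divided by hits plus misses; the function names are made up for illustration.

```shell
# Hit rate in percent: (direct hits + preprocessed hits) / (all hits + misses).
hit_rate () {
  awk -v d="$1" -v p="$2" -v m="$3" \
    'BEGIN { printf "%.2f\n", 100 * (d + p) / (d + p + m) }'
}

# Fill level in percent: current cache size / maximum cache size.
fill_pct () {
  awk -v used="$1" -v max="$2" 'BEGIN { printf "%.1f\n", 100 * used / max }'
}

hit_rate 15799489 387858 1172171   # prints 93.25
fill_pct 53.3 59.0                 # prints 90.3
```

The second number matches the observation that the cache tends to sit at roughly 90% of its maximum size.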

Even with the smaller ccache, it still fails once in a while:

  GEN      doc/interpreter/splinefit3.eps
qt.qpa.xcb: could not connect to display :99
qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "" even though it was found.
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.

Available platform plugins are: eglfs, linuxfb, minimal, minimalegl, offscreen, vnc, xcb.

  GEN      doc/interpreter/splinefit4.eps
/bin/bash: line 1: 163995 Aborted                 (core dumped) /bin/bash run-octave --norc --silent --no-history --path /home/runner/work/octave/octave/.build/../doc/interpreter/ --eval "splineimages ('doc/interpreter/', 'splinefit3', 'eps');"
make[2]: *** [Makefile:31969: doc/interpreter/splinefit3.eps] Error 134
make[2]: *** Waiting for unfinished jobs....

The buildbots seem to run without the -a option. Could that make a difference?

I think @dasergatskov uses Fedora. On my Ubuntu 18.04 buildbots, I use xvfb-run make -j4

I removed the arguments of xvfb-run.
The error message is different now. But it still sounds like it had issues with the display:
CI: Use xvfb-run without options. · mmuetzel/octave@bd5f176 (github.com)

  GEN      doc/interpreter/extended.png
  GEN      doc/interpreter/precisiondate.png
warning: using the gnuplot graphics toolkit is discouraged

The gnuplot graphics toolkit is not actively maintained and has a number
of limitations that are ulikely to be fixed.  Communication with gnuplot
uses a one-directional pipe and limited information is passed back to the
Octave interpreter so most changes made interactively in the plot window
will not be reflected in the graphics properties managed by Octave.  For
example, if the plot window is closed with a mouse click, Octave will not
be notified and will not update it's internal list of open figure windows.
We recommend using the qt toolkit instead.
Makefile:28505: recipe for target 'doc/interpreter/extended.png' failed
make[2]: *** [doc/interpreter/extended.png] Error 1
make[2]: *** Waiting for unfinished jobs....

Why would it suddenly choose the gnuplot graphics toolkit when it appears to have used the qt toolkit successfully before?

Unfortunately, I cannot see the logs; viewing them requires an account login (which I do not have).
Can you inspect the generated EPS files to see whether some of them were indeed generated by qt and some by gnuplot?
The parameters were needed back when we set up the buildbots. Apparently, “-a” is deprecated and you should now use “-d”. xvfb-run --help will give you the list of options:

xvfb-run --help
Usage: xvfb-run [OPTION ...] COMMAND
Run COMMAND (usually an X client) in a virtual X server environment.
Options:
-a        --auto-servernum          try to get a free server number, starting at
                                    --server-num (deprecated, use --auto-display
                                    instead)
-d        --auto-display            use the X server to find a display number
                                    automatically
-e FILE   --error-file=FILE         file used to store xauth errors and Xvfb
                                    output (default: /dev/null)
-f FILE   --auth-file=FILE          file used to store auth cookie
                                    (default: ./.Xauthority)
-h        --help                    display this usage message and exit
-n NUM    --server-num=NUM          server number to use (default: 99)
-l        --listen-tcp              enable TCP port listening in the X server
-p PROTO  --xauth-protocol=PROTO    X authority protocol name to use
                                    (default: xauth command's default)
-s ARGS   --server-args=ARGS        arguments (other than server number and
                                    "-nolisten tcp") to pass to the Xvfb server
                                    (default: "-screen 0 640x480x24")
-w DELAY  --wait=DELAY              delay in seconds to wait for Xvfb to start
                                    before running COMMAND (default: 3)

Afaict, when one of the steps fails, the job aborts and the runner is purged.
I don’t know if it is possible to access some of the files after the fact.

I’ll try the -d option.

I clicked around a little bit in the settings. Can you access the actions panel now?

Can you run make with V=1?
(I still get “Sign in to view logs”.)

On the Ubuntu 18.04 runner:

  xvfb-run -d make -C ./.build -j2 all
  shell: /bin/bash -e {0}
  env:
    XDG_RUNTIME_DIR: /home/runner/tmp
xvfb-run: invalid option -- 'd'
Error: Process completed with exit code 1.
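Since the xvfb-run on this runner does not know -d while other versions list -a as deprecated, a CI rule could pick the option based on the help text. A minimal sketch, assuming that versions supporting -d mention --auto-display in their help output (as in the Fedora help text quoted earlier); the function name pick_display_flag is made up for illustration.

```shell
# Read the xvfb-run help text on stdin and print the option to use
# for selecting a free display automatically.
pick_display_flag () {
  if grep -q -- '--auto-display'; then
    echo "-d"   # newer variant: let the X server choose the display
  else
    echo "-a"   # older variant: probe for a free server number
  fi
}

# Only query xvfb-run if it is installed; this just prints the command.
if command -v xvfb-run >/dev/null 2>&1; then
  flag=$(xvfb-run --help 2>&1 | pick_display_flag)
  echo "would run: xvfb-run $flag make -C ./.build -j2 all"
fi
```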

With respect to determining whether gnuplot or qt was used: Octave has recently started emitting a warning when it uses the gnuplot toolkit. IIUC, make opens a new octave process for each figure it prints. I would see a lot of those warnings in the logs if gnuplot were used consistently.
Afaict, that doesn’t happen.
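If the build output is saved to a file, the number of those warnings can be counted instead of eyeballing the log. A minimal sketch that searches for the first line of the warning text shown in the log above; the function name and the log file name build.log are made up for illustration.

```shell
# Print how many times the gnuplot toolkit warning appears in a build log.
count_gnuplot_warnings () {
  # grep -c prints 0 but exits with status 1 when nothing matches,
  # so "|| true" keeps the function from reporting failure in that case.
  grep -c 'using the gnuplot graphics toolkit is discouraged' "$1" || true
}
```

For example, after make -C ./.build -j2 all 2>&1 | tee build.log, calling count_gnuplot_warnings build.log gives roughly the number of figures that were printed with gnuplot, if each figure is indeed printed by its own octave process.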