Octave.space fails to build `gnutls`

The buildbots on octave.space fail to build gnutls:

Failed to build package gnutls!
------------------------------------------------------------
make[2]: Entering directory '/buildbot/octave-mxe-default-w64/build'
rm -rf   '/buildbot/octave-mxe-default-w64/build/tmp-gnutls' '/buildbot/octave-mxe-default-w64/build/tmp-gnutls-install'
mkdir -p '/buildbot/octave-mxe-default-w64/build/tmp-gnutls'
( cd '/buildbot/octave-mxe-default-w64/build/tmp-gnutls' &&      xz -dc '/buildbot/mxe-octave-pkg/gnutls-3.6.16.tar.xz' | tar xf - ) ||  false 
test ! -d '/buildbot/octave-mxe-default-w64/build/src/gnutls' || cp -a '/buildbot/octave-mxe-default-w64/build/src/gnutls' '/buildbot/octave-mxe-default-w64/build/tmp-gnutls'
cd '/buildbot/octave-mxe-default-w64/build/tmp-gnutls/gnutls-3.6.16'
(cd '/buildbot/octave-mxe-default-w64/build/tmp-gnutls/gnutls-3.6.16' && patch -p1 -u) < /buildbot/octave-mxe-default-w64/build/src/gnutls-1-fixes.patch
patching file configure.ac
mkdir '/buildbot/octave-mxe-default-w64/build/tmp-gnutls/gnutls-3.6.16/.build'
cd '/buildbot/octave-mxe-default-w64/build/tmp-gnutls/gnutls-3.6.16' && autoreconf -fi 
Can't exec "autopoint": No such file or directory at /buildbot/octave-mxe-default-w64/build/usr/share/autoconf/Autom4te/FileUtils.pm line 345.
autoreconf: failed to run autopoint: No such file or directory
autoreconf: autopoint is needed because this package uses Gettext
make[2]: *** [/buildbot/octave-mxe-default-w64/build/Makefile:978: build-only-gnutls] Error 1
make[2]: Leaving directory '/buildbot/octave-mxe-default-w64/build'
real	0m8.472s
user	0m1.649s
sys	0m0.951s
------------------------------------------------------------
[log]      /buildbot/octave-mxe-default-w64/build/log/gnutls
make[1]: *** [Makefile:980: /buildbot/octave-mxe-default-w64/build/installed-packages/gnutls] Error 1
make[1]: Leaving directory '/buildbot/octave-mxe-default-w64/build'
make: *** [Makefile:650: all] Error 2
program finished with exit code 2

Since the error message mentions gettext, I wonder if this is related to this recent change:
mxe-octave: c30da1cd5e3b

Is autopoint (part of GNU gettext) now a build dependency for MXE? If so, I can upgrade the workers next week.
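Whether autopoint is now a hard requirement is still the open question. Until that is settled, a worker could at least check up front whether the tools autoreconf will try to run are reachable. This is a minimal shell sketch; the function name require_tools is hypothetical and not part of MXE Octave.

```shell
# Sketch: report build tools that cannot be found on PATH.
# The tool list is up to the caller; "autopoint" is just the tool from the
# failure above. Returns 0 if every tool was found, 1 otherwise.
require_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf 'missing build tool: %s\n' "$tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `(PATH=/buildbot/octave-mxe-default-w64/build/usr/bin:$PATH; require_tools autoreconf autopoint)` would look in the MXE tree's own usr/bin first, which is where the older tree had autopoint installed.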

I don’t know why @jwe changed the build rule for gettext. IIUC, it had something to do with parallel builds?
I still have an old build tree from before these changes. In that tree, ./usr/bin/autopoint was still installed (as a shell script).
I haven’t built since that change, but it might be that the script is no longer installed. That would be a bug in MXE Octave.

I started seeing this error while building gettext:

http://buildbot.octave.org:8010/#/builders/21/builds/1122/steps/7/logs/stdio

At first, I thought this might be related to parallel builds, but forcing make -j 1 did not fix the problem for me. So I tried using the same rules in build-gettext.mk as we use in gettext.mk. If my changes omit some targets that other build tools need in order to build things like the cross build of gnutls, then I guess we need to add them to build-gettext.mk. But just running a single “make” there was not working.

I reverted my recent changes here:

https://hg.octave.org/mxe-octave/rev/f08c7cbb8df1

I also removed the special actions for msvc because they shouldn’t be needed for a build-PKG package like this. With or without them (they wouldn’t be executed anyway), I can’t reproduce the error when building it myself, even though I am running as the buildbot user on the same system where I saw the failure. Let’s see what the actual buildbot jobs do now.

Judging by the content of built-packages/build-gettext.tar.xz, the only file installed by that package was lib/preloadable_libintl.so. So I’d guess the major parts of gettext weren’t installed at all.
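To confirm which parts of gettext actually made it into the package, the tarball listing can be checked for the files that later builds rely on. This is a minimal shell sketch; the function name missing_from_tarball is hypothetical, and it assumes member names are stored relative to the install prefix, like the lib/preloadable_libintl.so entry mentioned above (bin/autopoint is likewise an assumed path).

```shell
# Sketch: list which expected paths are absent from a package tarball.
# Accepts member names with or without a leading "./".
# Returns 0 if all are present, 1 if any is missing, 2 if tar fails.
missing_from_tarball() {
  local tarball=$1 listing path rc=0
  shift
  listing=$(tar -tf "$tarball") || return 2
  for path in "$@"; do
    if ! printf '%s\n' "$listing" | grep -qxF -e "$path" -e "./$path"; then
      printf 'missing: %s\n' "$path"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `missing_from_tarball built-packages/build-gettext.tar.xz bin/autopoint` would print `missing: bin/autopoint` for a package like the one described. Listing the .xz tarball this way relies on tar detecting the compression itself, as GNU tar does when xz is installed.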

Here is what I think is the relevant part of the log with the failure:

Making all in adhoc-tests
make[6]: Entering directory '/scratch/buildbot/workers/jwe-debian-x86_64-5/w64-32-on-debian/src/tmp-build-gettext/gettext-0.21.build/libtextstyle/adhoc-tests'
gcc -DHAVE_CONFIG_H -I. -I/scratch/buildbot/workers/jwe-debian-x86_64-5/w64-32-on-debian/src/tmp-build-gettext/gettext-0.21/libtextstyle/adhoc-tests -I..  -I. -I/scratch/buildbot/workers/jwe-debian-x86_64-5/w64-32-on-debian/src/tmp-build-gettext/gettext-0.21/libtextstyle/adhoc-tests -I.. -I../lib -I/scratch/buildbot/workers/jwe-debian-x86_64-5/w64-32-on-debian/src/tmp-build-gettext/gettext-0.21/libtextstyle/adhoc-tests/../lib -DSRCDIR=\"/scratch/buildbot/workers/jwe-debian-x86_64-5/w64-32-on-debian/src/tmp-build-gettext/gettext-0.21/libtextstyle/adhoc-tests/\"   -g -O2 -MT hello.o -MD -MP -MF .deps/hello.Tpo -c -o hello.o /scratch/buildbot/workers/jwe-debian-x86_64-5/w64-32-on-debian/src/tmp-build-gettext/gettext-0.21/libtextstyle/adhoc-tests/hello.c
mv -f .deps/hello.Tpo .deps/hello.Po
/bin/bash ../libtool  --tag=CC   --mode=link gcc  -g -O2   -o hello hello.o ../lib/libtextstyle.la 
libtool: link: gcc -g -O2 -o .libs/hello hello.o  ../lib/.libs/libtextstyle.so -lm -lncurses -Wl,-rpath -Wl,/scratch/buildbot/workers/jwe-debian-x86_64-5/w64-32-on-debian/src/usr/lib
/usr/bin/ld: ../lib/.libs/libtextstyle.so: undefined reference to `full_write'
collect2: error: ld returned 1 exit status

IIUC, full_write is from the gnulib module full-write.

I can’t reproduce here either.

I started a normal buildbot job for the w64-32 target with the latest change:

http://buildbot.octave.org:8010/#/builders/21/builds/1136

Looks like it still fails.
Which version of gcc is installed on the buildbots currently?
I have:

$ gcc --version
gcc (Ubuntu 10.3.0-1ubuntu1) 10.3.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Edit: Do you have access to the full build log? Maybe there are earlier hints that something is going wrong…

Yes, I have the full build logs and I’ll try to compare them.

The weird thing is that the build that just failed ran on the same system where my build succeeded just a few minutes earlier, and in both cases it ran under the buildbot user ID. That system has gcc version 10.2.1 20210110 (Debian 10.2.1-6).

One difference is that for the successful build I did

make clean
./configure ...
make JOBS= ... build-gettext

and build-gettext was the only package built, but buildbot built the following packages before build-gettext:

build-m4
build-xz
build-autoconf
build-automake
build-binutils
build-bison
gcc-gmp
gcc-isl
gcc-mpfr
gcc-mpc
mingw-w64
build-gcc
build-cmake
build-lzip
build-flex
build-gawk

Could that matter? I just tried logging in, becoming the buildbot user and running

make clean
./configure ...
make JOBS= ...

to build all those other packages before build-gettext, and it successfully built all of them, including build-gettext, in the same order as above!?!

@mmuetzel: Maybe this change will make it work properly?

https://hg.octave.org/mxe-octave/rev/62eada485ce7

New buildbot test build here:

http://buildbot.octave.org:8010/#/builders/21/builds/1138

I guess not. WTF?!?

IIUC, full_write is defined in gettext-0.21/libtextstyle/lib/full-write.c (copied from gnulib?). That should have been built and linked into libtextstyle.so.
Do you still have the leftovers of the failed build?
If so, can you check whether that symbol is exported correctly? Maybe something along the lines of

nm libtextstyle/lib/.libs/full-write.o | grep full_write
nm libtextstyle/lib/.libs/libtextstyle.so | grep full_write

In the directory tree where the latest buildbot build failed:

$ files=$(find . -name '*.o') ; for f in $files ; do if test -n "$(nm $f | grep full_write)" ; then echo $f ; nm $f | grep full_write ; fi ; done
./gettext-tools/gnulib-lib/full-write.o
0000000000000000 T full_write
./gettext-tools/gnulib-lib/copy-file.o
                 U full_write
./gettext-tools/gnulib-lib/.libs/full-write.o
0000000000000000 T full_write
./gettext-tools/gnulib-lib/.libs/copy-file.o
                 U full_write
./libtextstyle/lib/full-write.o
0000000000000000 T libtextstyle_full_write
./libtextstyle/lib/.libs/full-write.o
0000000000000000 T libtextstyle_full_write
./libtextstyle/lib/.libs/term-ostream.o
                 U libtextstyle_full_write
./libtextstyle/lib/.libs/term-style-control.o
                 U full_write
./libtextstyle/lib/.libs/fd-ostream.o
                 U libtextstyle_full_write
./libtextstyle/lib/term-ostream.o
                 U full_write
./libtextstyle/lib/term-style-control.o
                 U full_write
./libtextstyle/lib/fd-ostream.o
                 U full_write

In the directory tree of a successful build:

$ files=$(find . -name '*.o') ; for f in $files ; do if test -n "$(nm $f | grep full_write)" ; then echo $f ; nm $f | grep full_write ; fi ; done
./gettext-tools/gnulib-lib/full-write.o
0000000000000000 T full_write
./gettext-tools/gnulib-lib/copy-file.o
                 U full_write
./gettext-tools/gnulib-lib/.libs/full-write.o
0000000000000000 T full_write
./gettext-tools/gnulib-lib/.libs/copy-file.o
                 U full_write
./gettext-tools/src/msgexec-msgexec.o
                 U full_write
./gettext-tools/src/urlget-urlget.o
                 U full_write
./libtextstyle/lib/full-write.o
0000000000000000 T libtextstyle_full_write
./libtextstyle/lib/.libs/full-write.o
0000000000000000 T libtextstyle_full_write
./libtextstyle/lib/.libs/term-ostream.o
                 U libtextstyle_full_write
./libtextstyle/lib/.libs/term-style-control.o
                 U libtextstyle_full_write
./libtextstyle/lib/.libs/fd-ostream.o
                 U libtextstyle_full_write
./libtextstyle/lib/term-ostream.o
                 U libtextstyle_full_write
./libtextstyle/lib/term-style-control.o
                 U libtextstyle_full_write
./libtextstyle/lib/fd-ostream.o
                 U libtextstyle_full_write
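Comparing these two listings by eye works, but the pattern that distinguishes them (an undefined plain name whose libtextstyle_-prefixed counterpart is defined) can also be flagged automatically from `nm -A` output. This is a shell/awk sketch; find_unprefixed_refs is a hypothetical helper, and it treats every non-U symbol type as a definition, which is a simplification.

```shell
# Sketch: read "nm -A" output on stdin and report undefined references to a
# name whose prefixed form (PREFIX + name) is defined somewhere in the input.
# Usage: ... | find_unprefixed_refs PREFIX
find_unprefixed_refs() {
  awk -v p="$1" '
    {
      # "nm -A" lines look like "file.o:<value> <type> <name>";
      # the value field is blank for undefined (U) symbols.
      i = index($0, ":")
      if (i == 0) next
      file = substr($0, 1, i - 1)
      n = split(substr($0, i + 1), f, " ")
      if (n < 2) next
      type = f[n - 1]; name = f[n]
      if (type == "U") { nref++; rfile[nref] = file; rname[nref] = name }
      else defined[name] = 1
    }
    END {
      # Only report plain names whose prefixed variant is actually defined,
      # so ordinary libc references such as "write" are ignored.
      for (k = 1; k <= nref; k++)
        if ((p rname[k]) in defined)
          print rfile[k] ": " rname[k] " (expected " p rname[k] ")"
    }'
}
```

Run from the top of the gettext build tree, e.g. `find ./libtextstyle -name '*.o' -exec nm -A {} + 2>/dev/null | find_unprefixed_refs libtextstyle_`. On the failed tree it would be expected to flag (possibly among other symbols) the full_write references in term-ostream.o, term-style-control.o and fd-ostream.o, and nothing for full_write on the successful tree.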

I’m also uploading log files for the latest failed build and for a successful build:

build-gettext-failure-log.txt (1.1 MB)

build-gettext-success-log.txt (2.9 MB)

The successful build was done in the same directory as the failed build after executing

make clean
./configure ... using options from the previous config.log file ...
make JOBS=16 KEEP_BUILD=1 build-gettext

I’m out of ideas.

For some reason, the libtextstyle_ prefix seems to be missing for some undefined symbols in the objects for libtextstyle.so. In the successful build, all symbols seem to have the correct prefix.
The renaming seems to be done with sed rules in gettext-0.21/libtextstyle/lib/Makefile.in at around line 5000.
Maybe a disk buffering issue on a fast system? Is it possible that some objects are somehow compiled from the unmodified files before those files have been correctly updated?

Or ccache?

I had one more idea. When buildbot performs the build, the log files available on the web server show that PATH is

PATH=/var/lib/buildbot/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games

but when I execute the commands myself under the buildbot user ID, the PATH is

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

The only things in /var/lib/buildbot/bin are links to /usr/lib/ccache for gcc, g++, gfortran, clang, etc.

I tested again, separately from a buildbot-triggered build, and it seems that if I add that directory to PATH, the build fails; if I remove it, the build succeeds. So I’m guessing something is tricking ccache into returning the wrong object file in some cases.
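Since the difference seems to come down to whether gcc resolves through ccache, it can help to list which PATH entries wrap gcc with ccache. This is a minimal shell sketch with a hypothetical function name; it assumes GNU `readlink -f` and that the masquerade links eventually resolve to a binary named ccache, as the /usr/lib/ccache links on Debian and Ubuntu do.

```shell
# Sketch: print each PATH entry (in search order) whose gcc resolves to a
# binary named "ccache". Takes a colon-separated path list, default $PATH.
ccache_wrapped_gcc_dirs() {
  local dir target
  local IFS=:
  for dir in ${1-$PATH}; do
    # Skip empty entries and directories without an executable gcc.
    [ -n "$dir" ] && [ -x "$dir/gcc" ] || continue
    # Follow the whole symlink chain, e.g. gcc -> /usr/lib/ccache/gcc -> ccache.
    target=$(readlink -f -- "$dir/gcc") || continue
    if [ "${target##*/}" = ccache ]; then
      printf '%s\n' "$dir"
    fi
  done
}
```

Passing the buildbot PATH quoted above, `ccache_wrapped_gcc_dirs /var/lib/buildbot/bin:/usr/local/bin:/usr/bin:/bin` would be expected to print /var/lib/buildbot/bin on that worker, while the interactive PATH would print nothing.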

The ~/.ccache/ccache.conf file is

umask = 002
max_size = 100G
cache_dir_levels = 3

I’m also building with ccache early in the path.

$ echo $PATH
/home/osboxes/usr/bin:/usr/lib/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin
$ ccache --version
ccache version 4.2
$ cat ~/.ccache/ccache.conf 
max_size = 80G
hash_dir = false
compiler_check = %compiler% -v

All of the following are symlinks to ccache:

$ ls /usr/lib/ccache
c++      clang       clang++-11  g++     gcc-7                 i686-w64-mingw32-gcc-10-posix  x86_64-linux-gnu-gcc     x86_64-w64-mingw32-g++
c89-gcc  clang++     clang-11    g++-10  gcc-9                 i686-w64-mingw32-gcc-10-win32  x86_64-linux-gnu-gcc-10  x86_64-w64-mingw32-gcc
c99-gcc  clang++-10  clang++-12  gcc     i686-w64-mingw32-g++  x86_64-linux-gnu-g++           x86_64-linux-gnu-gcc-7   x86_64-w64-mingw32-gcc-10-posix
cc       clang-10    clang-12    gcc-10  i686-w64-mingw32-gcc  x86_64-linux-gnu-g++-10        x86_64-linux-gnu-gcc-9   x86_64-w64-mingw32-gcc-10-win32

Could it be a bug in ccache? I have version 4.4.

My hunch is that ccache could have a slight effect on disk timing, exposing a caching issue.
Maybe a sleep 1 at the end of the command that adds the prefixed #defines to config.h would help…
But it could also be something else entirely…

The following change works for me:

diff -r 62eada485ce7 src/build-gettext.mk
--- a/src/build-gettext.mk      Fri Sep 10 13:23:24 2021 -0400
+++ b/src/build-gettext.mk      Fri Sep 10 14:49:42 2021 -0400
@@ -24,9 +24,6 @@
         --without-libexpat-prefix \
         --without-libxml2-prefix \
        $($(PKG)_CONFIGURE_OPTIONS)
-    $(MAKE) -C '$(1).build/gnulib-local' -j $(JOBS)
-    $(MAKE) -C '$(1).build/gettext-runtime/gnulib-lib' -j $(JOBS)
-    $(MAKE) -C '$(1).build/gettext-tools/gnulib-lib' -j $(JOBS)
-    $(MAKE) -C '$(1).build' -j $(JOBS)
+    CCACHE_DISABLE=1 $(MAKE) -C '$(1).build' -j $(JOBS)
     $(MAKE) -C '$(1).build' -j 1 $(MXE_DISABLE_DOCS) install DESTDIR='$(3)'
 endef

If I remove CCACHE_DISABLE=1 from the make command line, the build fails.

Great!
Should we just take that as the solution? Or do we need to dig deeper to find the “actual” reason for the failure?
It could still be ccache itself or disk buffering…

I doubt it is something like OS disk caching, because that would seem to cause all kinds of trouble for parallel builds, with or without ccache.

There could definitely be a race condition of some kind though. For example, it also works for me if I use

CCACHE_NODIRECT=1 $(MAKE) -C '$(1).build' -j $(JOBS)

If I understand correctly, that tells ccache to use preprocessor mode instead of direct mode, which hashes the contents of the source and header files separately. So if ccache hashes the config.h it finds in the search path before libtextstyle/lib/config.h is created, then it will return the wrong object file?
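The hypothesis hinges on which config.h the compiler picks up at the moment an object is compiled. For `#include <config.h>`, GCC searches the -I directories in command-line order and takes the first match, so a config.h in the parent directory is used until one appears in lib/. The shell sketch below only models that lookup, not ccache itself; resolve_include is a hypothetical name, and the directory order (lib/ before its parent, as in the `-I. ... -I..` flags of the adhoc-tests compile line quoted earlier) is an assumption for the lib/ objects.

```shell
# Sketch: mimic the lookup for "#include <HEADER>" by walking the -I
# directories in the order given and printing the first match.
# Returns 1 if no directory contains the header.
resolve_include() {
  local header=$1 dir
  shift
  for dir in "$@"; do
    if [ -f "$dir/$header" ]; then
      printf '%s\n' "$dir/$header"
      return 0
    fi
  done
  return 1
}
```

Running `resolve_include config.h . ..` in libtextstyle/lib before and after lib/config.h exists would return different paths; that window is where a cached result keyed on the earlier header could be reused.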

I tried the following change but it didn’t work.

diff -r 62eada485ce7 src/build-gettext.mk
--- a/src/build-gettext.mk      Fri Sep 10 13:23:24 2021 -0400
+++ b/src/build-gettext.mk      Fri Sep 10 15:30:43 2021 -0400
@@ -24,9 +24,7 @@
         --without-libexpat-prefix \
         --without-libxml2-prefix \
        $($(PKG)_CONFIGURE_OPTIONS)
-    $(MAKE) -C '$(1).build/gnulib-local' -j $(JOBS)
-    $(MAKE) -C '$(1).build/gettext-runtime/gnulib-lib' -j $(JOBS)
-    $(MAKE) -C '$(1).build/gettext-tools/gnulib-lib' -j $(JOBS)
-    $(MAKE) -C '$(1).build' -j $(JOBS)
+    $(MAKE) -C '$(1).build/libtextstyle/lib' config.h
+    $(MAKE) -C '$(1).build' -j '$(JOBS)'
     $(MAKE) -C '$(1).build' -j 1 $(MXE_DISABLE_DOCS) install DESTDIR='$(3)'
 endef

With that change, the final version of libtextstyle/lib/config.h has lines like

#define libtextstyle_xalloc_die libtextstyle_libtextstyle_xalloc_die

instead of

#define xalloc_die libtextstyle_xalloc_die
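The doubled name above is what a straightforward rename produces when it is fed a name that already carries the prefix. The shell sketch below is a hypothetical illustration only, not the sed rules from gettext's Makefile.in: prefix_defines shows the intended mapping, and find_double_prefix is a quick check for the doubled form in a generated config.h.

```shell
# Hypothetical illustration, NOT gettext's actual rule: map each symbol name
# read on stdin to "#define name <prefix>name".
prefix_defines() {
  local prefix=$1 sym
  while IFS= read -r sym || [ -n "$sym" ]; do
    if [ -n "$sym" ]; then
      printf '#define %s %s%s\n' "$sym" "$prefix" "$sym"
    fi
  done
}

# Print config.h lines (read on stdin) where the prefix was applied twice.
# The prefix is used as a regex, so it should contain only [A-Za-z0-9_].
find_double_prefix() {
  grep -E "^#define ${1}[A-Za-z0-9_]+ ${1}${1}"
}
```

For instance, `find_double_prefix libtextstyle_ < libtextstyle/lib/config.h` should print nothing for a correctly generated config.h. Feeding libtextstyle_xalloc_die to prefix_defines yields exactly the line quoted above, which is consistent with (though not proof of) the rename step running over names that were already renamed.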

I don’t see the harm in the following change, so I checked it in.

https://hg.octave.org/mxe-octave/rev/fbad4bcfdf3c

Test build started here:

http://buildbot.octave.org:8010/#/builders/21/builds/1139
