There seems to be an issue with locale files in recent CentOS versions,
and we're hitting it on slaves that have been updated recently.
The symptoms are that Jenkins disconnects from the slave and then
refuses to reconnect to it. The agent log in Jenkins shows:
[09/19/17 15:42:56] [SSH] Connection closed.
[09/19/17 15:58:48] [SSH] Opening SSH connection to
vm0002.workers-phx.ovirt.org:22 .
[09/19/17 15:58:48] [SSH] WARNING: SSH Host Keys are not being
verified. Man-in-the-middle attacks may be possible against this
connection.
[09/19/17 15:58:48] [SSH] Authentication successful.
SSH connection reports a garbage before a command execution.
Check your .bashrc, .profile, and so on to make sure it is quiet.
The received junk text is as follows:
/etc/profile.d/lang.sh: line 19: warning: setlocale: LC_CTYPE: cannot
change locale (en_US.utf8): No such file or directory
/etc/profile.d/lang.sh: line 20: warning: setlocale: LC_COLLATE:
cannot change locale (en_US.utf8): No such file or directory
/etc/profile.d/lang.sh: line 23: warning: setlocale: LC_MESSAGES:
cannot change locale (en_US.utf8): No such file or directory
/etc/profile.d/lang.sh: line 26: warning: setlocale: LC_NUMERIC:
cannot change locale (en_US.utf8): No such file or directory
/etc/profile.d/lang.sh: line 29: warning: setlocale: LC_TIME: cannot
change locale (en_US.utf8): No such file or directory
null
[09/19/17 15:58:48] Launch failed - cleaning up connection
[09/19/17 15:58:48] [SSH] Connection closed.
The same locale error messages can also be reproduced on the slave by
starting an interactive login from the console or running 'su -'.
Running 'locale -a' also shows that the en_US.UTF-8 locale is missing
from the list of available locales.
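To make that check scriptable, here is a rough sketch of a helper that
reads a list of locale names (such as the output of 'locale -a') and
reports whether a given locale is in it. The function names
'normalize_locale' and 'has_locale' are made up for illustration. The
normalization is there because glibc lists the locale as 'en_US.utf8'
while it is usually requested as 'en_US.UTF-8'.

```shell
# Hypothetical helpers, not part of any existing tooling.

# Lowercase a locale name and drop hyphens so that "en_US.UTF-8" and
# "en_US.utf8" compare as equal.
normalize_locale() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -d '-'
}

# Read locale names from stdin, one per line, and return 0 if the
# locale given as $1 is among them, 1 otherwise.
has_locale() {
    wanted=$(normalize_locale "$1")
    while IFS= read -r name; do
        if [ "$(normalize_locale "$name")" = "$wanted" ]; then
            return 0
        fi
    done
    return 1
}
```

On a slave this could be used as:
locale -a | has_locale en_US.UTF-8 || echo "en_US.UTF-8 is missing"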
Looking around for this I found the following:
https://github.com/CentOS/sig-cloud-instance-images/issues/71
I tried downgrading glibc back to the version we had before, but that
did not seem to resolve the issue. Eventually I managed to resolve it
by running 'localedef -i en_US -f UTF-8 en_US.UTF-8' on the slave.
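If we end up applying this to more slaves, the workaround could be
wrapped so that it only runs when the locale is actually missing. The
following is a minimal sketch, not something I have deployed: the
function name 'ensure_utf8_locale' is hypothetical, and it assumes it
is run as root, since localedef writes to the system locale data.

```shell
# Hypothetical wrapper around the localedef workaround.
# Regenerates <lang>.UTF-8 only when `locale -a` does not list it.
ensure_utf8_locale() {
    lang=$1    # locale name without the charset, e.g. "en_US"
    available=$(locale -a 2>/dev/null)
    # glibc lists the locale as "<lang>.utf8"; match the whole line,
    # ignore case, and allow the optional hyphen in "UTF-8".
    if printf '%s\n' "$available" \
        | grep -ix "${lang}\.utf-\{0,1\}8" >/dev/null; then
        echo "${lang}.UTF-8 already present"
    else
        # Same command as the manual fix, with the name parameterized.
        localedef -i "$lang" -f UTF-8 "${lang}.UTF-8"
    fi
}
```

Calling 'ensure_utf8_locale en_US' on a healthy slave should leave it
untouched, and on an affected one it runs the same localedef command
as above.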
I've seen this happen on 'vm0002.workers-phx.ovirt.org' which is
attached to the staging Jenkins, but I've no reason to believe this
won't start impacting production slaves.
We need to research this further and find out if we need to do
something to prevent this issue from surfacing on production slaves.
--
Barak Korren
RHV DevOps team, RHCE, RHCi
Red Hat EMEA
redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
_______________________________________________
Infra mailing list
Infra@ovirt.org
http://lists.ovirt.org/mailman/listinfo/infra