Barak Korren created OVIRT-1658:

   Summary: Potential issue coming to CentOS7 slaves (glibc, missing UTF-8 locale)
       Key: OVIRT-1658
       URL: https://ovirt-jira.atlassian.net/browse/OVIRT-1658
   Project: oVirt - virtualization made easy
Issue Type: By-EMAIL
  Reporter: Barak Korren
  Assignee: infra

There seems to be an issue with the locale files in recent CentOS versions, and we are hitting it on slaves that were updated recently.

The symptoms are that Jenkins disconnects from the slave and then refuses to reconnect to it. The agent log in Jenkins shows:

[09/19/17 15:42:56] [SSH] Connection closed.

[09/19/17 15:58:48] [SSH] Opening SSH connection to vm0002.workers-phx.ovirt.org:22.
[09/19/17 15:58:48] [SSH] WARNING: SSH Host Keys are not being verified. Man-in-the-middle attacks may be possible against this connection.
[09/19/17 15:58:48] [SSH] Authentication successful.
SSH connection reports a garbage before a command execution.
Check your .bashrc, .profile, and so on to make sure it is quiet.
The received junk text is as follows:
/etc/profile.d/lang.sh: line 19: warning: setlocale: LC_CTYPE: cannot change locale (en_US.utf8): No such file or directory
/etc/profile.d/lang.sh: line 20: warning: setlocale: LC_COLLATE: cannot change locale (en_US.utf8): No such file or directory
/etc/profile.d/lang.sh: line 23: warning: setlocale: LC_MESSAGES: cannot change locale (en_US.utf8): No such file or directory
/etc/profile.d/lang.sh: line 26: warning: setlocale: LC_NUMERIC: cannot change locale (en_US.utf8): No such file or directory
/etc/profile.d/lang.sh: line 29: warning: setlocale: LC_TIME: cannot change locale (en_US.utf8): No such file or directory
null
[09/19/17 15:58:48] Launch failed – cleaning up connection
[09/19/17 15:58:48] [SSH] Connection closed.
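Since Jenkins refuses to launch the agent when the login shell prints anything before the command runs, a quick way to check a slave is to confirm that a login shell stays silent. The following is only a rough sketch: the helper name is_quiet is made up for illustration, and it assumes bash is the login shell of the Jenkins user.

```shell
#!/bin/sh
# is_quiet (hypothetical helper): succeed only if the given command
# prints nothing at all on stdout or stderr.
is_quiet() {
    local output
    # Capture both streams; the command's own exit status is ignored
    # because only the presence of output matters here.
    output="$("$@" 2>&1)" || true
    if [ -n "$output" ]; then
        printf 'unexpected output:\n%s\n' "$output" >&2
        return 1
    fi
}

# Usage on a slave (assumption: bash is the login shell):
#   is_quiet bash -l -c true || echo "login shell is noisy, Jenkins may refuse to connect"
```

On an affected slave the setlocale warnings from /etc/profile.d/lang.sh would be expected to show up as the unexpected output.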

The same locale warnings can also be reproduced on the slave by logging in interactively from the console or by running ‘su -’. Running ‘locale -a’ also shows that the en_US.UTF-8 locale is missing from the list.
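The ‘locale -a’ check can be scripted so it is easy to run across several slaves. This is a minimal sketch, not a tested tool: the function names are hypothetical, and it assumes glibc's usual listing format, where en_US.UTF-8 is printed as en_US.utf8 (the same spelling that appears in the warnings above).

```shell
#!/bin/sh
# normalize_locale (hypothetical): rewrite a locale name the way glibc's
# `locale -a` is assumed to print it, i.e. codeset lowercased with
# dashes removed: en_US.UTF-8 -> en_US.utf8.
normalize_locale() {
    local name="$1"
    case "$name" in
        *.*)
            local lang="${name%%.*}" codeset="${name#*.}"
            codeset="$(printf '%s' "$codeset" | tr '[:upper:]' '[:lower:]' | tr -d '-')"
            printf '%s.%s\n' "$lang" "$codeset"
            ;;
        *)
            # No codeset part (e.g. C, POSIX): leave unchanged.
            printf '%s\n' "$name"
            ;;
    esac
}

# locale_in_list (hypothetical): succeed if the wanted locale appears in
# a `locale -a`-style list read from stdin.
locale_in_list() {
    local wanted line
    wanted="$(normalize_locale "$1")"
    while IFS= read -r line; do
        [ "$(normalize_locale "$line")" = "$wanted" ] && return 0
    done
    return 1
}

# Usage on a slave:
#   locale -a | locale_in_list en_US.UTF-8 || echo "en_US.UTF-8 is missing"
```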

Looking around for this I found the following: https://github.com/CentOS/sig-cloud-instance-images/issues/71

I tried downgrading glibc back to the version we had before, but that did not seem to resolve the issue. Eventually I managed to resolve it by running ‘localedef -i en_US -f UTF-8 en_US.UTF-8’ on the slave.
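If the workaround has to be applied to more slaves, the localedef call could be wrapped so that it only runs when the locale is actually missing. What follows is a sketch under assumptions rather than a verified fix: regen_locale, ensure_locale and the DRY_RUN variable are names invented here, and the script would need to run as root to write the locale archive.

```shell
#!/bin/sh
# regen_locale (hypothetical): build the localedef call from the ticket
# for a locale such as en_US.UTF-8. With DRY_RUN=1 the command is only
# printed, not executed.
regen_locale() {
    local name="$1"
    # en_US.UTF-8 -> input file "en_US", charmap "UTF-8"
    set -- localedef -i "${name%%.*}" -f "${name#*.}" "$name"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# ensure_locale (hypothetical): regenerate the locale only when
# `locale -a` does not list it. Dashes are stripped and case is ignored
# so that en_US.UTF-8 matches the en_US.utf8 spelling.
ensure_locale() {
    local name="$1"
    if locale -a 2>/dev/null | tr -d '-' |
            grep -qixF "$(printf '%s' "$name" | tr -d '-')"; then
        echo "$name already present, nothing to do"
    else
        regen_locale "$name"
    fi
}

# Usage (as root on the slave):
#   DRY_RUN=1 ensure_locale en_US.UTF-8   # show what would be run
#   ensure_locale en_US.UTF-8             # actually regenerate if missing
```

This only automates the workaround; it does not address why the locale went missing after the update.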

I've seen this happen on ‘vm0002.workers-phx.ovirt.org’ which is attached to the staging Jenkins, but I've no reason to believe this won't start impacting production slaves.

We need to research this further and find out if we need to do something to prevent this issue from surfacing on production slaves.

-- Barak Korren, RHV DevOps team, RHCE, RHCi, Red Hat EMEA, redhat.com
