
On Sun, Aug 20, 2017 at 11:08 AM Dan Kenigsberg <danken@redhat.com> wrote:
On Sun, Aug 20, 2017 at 10:39 AM, Yaniv Kaul <ykaul@redhat.com> wrote:
On Sun, Aug 20, 2017 at 8:48 AM, Daniel Belenky <dbelenky@redhat.com> wrote:
Failed test: basic_suite_master/002_bootstrap
Version: oVirt Master
Link to failed job: ovirt-master_change-queue-tester/1860/
Link to logs (Jenkins): test logs
Suspected patch: https://gerrit.ovirt.org/#/c/80749/3
From what I was able to find, it seems that for some reason VDSM failed to start on host 1. The VDSM log is empty, and the only error I could find in supervdsm.log is that the start of LLDP failed (not sure if it's related).
Can you check the networking on the hosts? Something's very strange there.
For example:
Aug 19 16:38:42 lago-basic-suite-master-host0 NetworkManager[685]: <info> [1503175122.2682] manager: (e7NZWeNDXwIjQia): new Bond device (/org/freedesktop/NetworkManager/Devices/17)
Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: e7NZWeNDXwIjQia: Setting xmit hash policy to layer2+3 (2)
Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: e7NZWeNDXwIjQia: Setting xmit hash policy to encap2+3 (3)
Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: e7NZWeNDXwIjQia: Setting xmit hash policy to encap3+4 (4)
Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: e7NZWeNDXwIjQia: option xmit_hash_policy: invalid value (5)
Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: e7NZWeNDXwIjQia: Setting primary_reselect to always (0)
Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: e7NZWeNDXwIjQia: Setting primary_reselect to better (1)
Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: e7NZWeNDXwIjQia: Setting primary_reselect to failure (2)
Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: e7NZWeNDXwIjQia: option primary_reselect: invalid value (3)
Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: e7NZWeNDXwIjQia: Setting arp_all_targets to any (0)
Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: e7NZWeNDXwIjQia: Setting arp_all_targets to all (1)
Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: e7NZWeNDXwIjQia: option arp_all_targets: invalid value (2)
Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: bonding: e7NZWeNDXwIjQia is being deleted...
Aug 19 16:38:42 lago-basic-suite-master-host0 lldpad: recvfrom(Event interface): No buffer space available
Y.
The post-boot noise with funny-looking bonds is due to our calling `vdsm-tool dump-bonding-options` on every boot, in order to find the bonding defaults for the current kernel.
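For anyone reading these kernel messages: the pattern is a temporary bond on which each option is set to successive values until the kernel answers "invalid value". A minimal Python sketch of how one could summarize that noise from the log is below. It only parses log lines of the form quoted above; it is not how vdsm-tool itself collects the options, and the function name `bonding_values` is made up for illustration.

```python
import re
from collections import defaultdict

# Matches kernel lines such as:
#   "... kernel: <bond>: Setting arp_all_targets to any (0)"
SET_RE = re.compile(r"kernel: (\S+): Setting (\w+) to (\S+) \((\d+)\)")


def bonding_values(lines):
    """Map each bonding option to the {number: name} values the kernel accepted.

    Hypothetical helper: it only reads log text; "invalid value" lines and
    unrelated messages are ignored.
    """
    accepted = defaultdict(dict)
    for line in lines:
        match = SET_RE.search(line)
        if match:
            _bond, option, name, number = match.groups()
            accepted[option][int(number)] = name
    return dict(accepted)


sample = [
    "Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: e7NZWeNDXwIjQia: Setting arp_all_targets to any (0)",
    "Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: e7NZWeNDXwIjQia: Setting arp_all_targets to all (1)",
    "Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: e7NZWeNDXwIjQia: option arp_all_targets: invalid value (2)",
]
print(bonding_values(sample))
```

Seen this way, the "invalid value" lines just mark the end of each option's value range rather than a failure.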
From host-deploy log:
2017-08-19 16:38:41,476-0400 DEBUG otopi.plugins.otopi.services.systemd systemd.state:130 starting service vdsmd
2017-08-19 16:38:41,476-0400 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:813 execute: ('/bin/systemctl', 'start', 'vdsmd.service'), executable='None', cwd='None', env=None
2017-08-19 16:38:44,628-0400 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:863 execute-result: ('/bin/systemctl', 'start', 'vdsmd.service'), rc=1
2017-08-19 16:38:44,630-0400 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:921 execute-output: ('/bin/systemctl', 'start', 'vdsmd.service') stdout:
2017-08-19 16:38:44,630-0400 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:926 execute-output: ('/bin/systemctl', 'start', 'vdsmd.service') stderr:
Job for vdsmd.service failed because the control process exited with error code. See "systemctl status vdsmd.service" and "journalctl -xe" for details.
2017-08-19 16:38:44,631-0400 DEBUG otopi.context context._executeMethod:142 method exception
Traceback (most recent call last):
  File "/tmp/ovirt-dunwHj8Njn/pythonlib/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/tmp/ovirt-dunwHj8Njn/otopi-plugins/ovirt-host-deploy/vdsm/packages.py", line 224, in _start
    self.services.state('vdsmd', True)
  File "/tmp/ovirt-dunwHj8Njn/otopi-plugins/otopi/services/systemd.py", line 141, in state
    service=name,
RuntimeError: Failed to start service 'vdsmd'
From /var/log/messages:
Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh: Error:
Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh: One of the modules is not configured to work with VDSM.
Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh: To configure the module use the following:
Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh: 'vdsm-tool configure [--module module-name]'.
Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh: If all modules are not configured try to use:
Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh: 'vdsm-tool configure --force'
Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh: (The force flag will stop the module's service and start it
Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh: afterwards automatically to load the new configuration.)
Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh: abrt is already configured for vdsm
Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh: lvm is configured for vdsm
Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh: libvirt is already configured for vdsm
Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh: multipath requires configuration
Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh: Modules sanlock, multipath are not configured
This means the host was not deployed correctly. When deploying VDSM, host-deploy must run "vdsm-tool configure --force", which configures multipath and sanlock. We have not changed anything in the multipath and sanlock configurators lately. Didi, can you check this?
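To spot this failure quickly in future runs, the summary line from vdsmd_init_common.sh can be pulled out of /var/log/messages. The following is a rough sketch only: it assumes the message wording stays exactly as quoted above ("Modules ... are not configured"), and the helper name `unconfigured_modules` is hypothetical.

```python
import re

# Matches the summary line printed by vdsmd_init_common.sh, e.g.
#   "... Modules sanlock, multipath are not configured"
NOT_CONFIGURED_RE = re.compile(r"Modules (.+?) are not configured")


def unconfigured_modules(lines):
    """Return the module names reported as not configured, in first-seen order."""
    modules = []
    for line in lines:
        match = NOT_CONFIGURED_RE.search(line)
        if not match:
            continue
        for name in match.group(1).split(","):
            name = name.strip()
            if name and name not in modules:
                modules.append(name)
    return modules


messages = [
    "Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh: multipath requires configuration",
    "Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh: Modules sanlock, multipath are not configured",
]
print(unconfigured_modules(messages))
```

A non-empty result on a freshly deployed host would point at host-deploy not having run the configure step, rather than at VDSM itself.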