<div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Sun, Aug 20, 2017 at 11:08 AM Dan Kenigsberg <<a href="mailto:danken@redhat.com">danken@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Sun, Aug 20, 2017 at 10:39 AM, Yaniv Kaul <<a href="mailto:ykaul@redhat.com" target="_blank">ykaul@redhat.com</a>> wrote:<br>
>
>
> On Sun, Aug 20, 2017 at 8:48 AM, Daniel Belenky <dbelenky@redhat.com> wrote:
>>
>> Failed test: basic_suite_master/002_bootstrap
>> Version: oVirt Master
>> Link to failed job: ovirt-master_change-queue-tester/1860/
>> Link to logs (Jenkins): test logs
>> Suspected patch: https://gerrit.ovirt.org/#/c/80749/3
>>
>> From what I was able to find, it seems that for some reason VDSM failed to
>> start on host 1. The VDSM log is empty, and the only error I could find in
>> supervdsm.log is that the start of LLDP failed (not sure if it's related).
>
>
> Can you check the networking on the hosts? Something's very strange there.
> For example:
> Aug 19 16:38:42 lago-basic-suite-master-host0 NetworkManager[685]: <info>  [1503175122.2682] manager: (e7NZWeNDXwIjQia): new Bond device (/org/freedesktop/NetworkManager/Devices/17)
> Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: e7NZWeNDXwIjQia: Setting xmit hash policy to layer2+3 (2)
> Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: e7NZWeNDXwIjQia: Setting xmit hash policy to encap2+3 (3)
> Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: e7NZWeNDXwIjQia: Setting xmit hash policy to encap3+4 (4)
> Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: e7NZWeNDXwIjQia: option xmit_hash_policy: invalid value (5)
> Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: e7NZWeNDXwIjQia: Setting primary_reselect to always (0)
> Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: e7NZWeNDXwIjQia: Setting primary_reselect to better (1)
> Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: e7NZWeNDXwIjQia: Setting primary_reselect to failure (2)
> Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: e7NZWeNDXwIjQia: option primary_reselect: invalid value (3)
> Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: e7NZWeNDXwIjQia: Setting arp_all_targets to any (0)
> Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: e7NZWeNDXwIjQia: Setting arp_all_targets to all (1)
> Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: e7NZWeNDXwIjQia: option arp_all_targets: invalid value (2)
> Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: bonding: e7NZWeNDXwIjQia is being deleted...
> Aug 19 16:38:42 lago-basic-suite-master-host0 lldpad: recvfrom(Event interface): No buffer space available
>
> Y.

The post-boot noise with the funny-looking bonds is due to our calling
`vdsm-tool dump-bonding-options` on every boot, in order to find the
bonding defaults for the current kernel.

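As a rough illustration (a hypothetical sketch, not vdsm's actual
code), probing the defaults amounts to creating a throwaway bond with
a random name via sysfs, reading its options, and deleting it. That
matches the random e7NZWeNDXwIjQia device and the "is being
deleted..." line above; probing candidate option values would likewise
explain the "invalid value" messages.

    # Hypothetical sketch, not vdsm's actual implementation: probe the
    # kernel's bonding defaults by creating a temporary bond, reading
    # its sysfs options, and removing it. Requires root and the
    # bonding module.
    import os
    import random
    import string

    BONDING_MASTERS = '/sys/class/net/bonding_masters'

    def dump_bonding_defaults():
        # Random name, like the e7NZWeNDXwIjQia seen in the journal.
        name = ''.join(random.choice(string.ascii_letters)
                       for _ in range(15))
        with open(BONDING_MASTERS, 'w') as f:
            f.write('+' + name)  # create the temporary bond
        try:
            opts_dir = '/sys/class/net/%s/bonding' % name
            opts = {}
            for opt in os.listdir(opts_dir):
                with open(os.path.join(opts_dir, opt)) as f:
                    opts[opt] = f.read().strip()
            return opts
        finally:
            with open(BONDING_MASTERS, 'w') as f:
                f.write('-' + name)  # "... is being deleted"
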
>
>>
>> From host-deploy log:
>>
>> 2017-08-19 16:38:41,476-0400 DEBUG otopi.plugins.otopi.services.systemd systemd.state:130 starting service vdsmd
>> 2017-08-19 16:38:41,476-0400 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:813 execute: ('/bin/systemctl', 'start', 'vdsmd.service'), executable='None', cwd='None', env=None
>> 2017-08-19 16:38:44,628-0400 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:863 execute-result: ('/bin/systemctl', 'start', 'vdsmd.service'), rc=1
>> 2017-08-19 16:38:44,630-0400 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:921 execute-output: ('/bin/systemctl', 'start', 'vdsmd.service') stdout:
>>
>>
>> 2017-08-19 16:38:44,630-0400 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:926 execute-output: ('/bin/systemctl', 'start', 'vdsmd.service') stderr:
>> Job for vdsmd.service failed because the control process exited with error code. See "systemctl status vdsmd.service" and "journalctl -xe" for details.
>>
>> 2017-08-19 16:38:44,631-0400 DEBUG otopi.context context._executeMethod:142 method exception
>> Traceback (most recent call last):
>>   File "/tmp/ovirt-dunwHj8Njn/pythonlib/otopi/context.py", line 132, in _executeMethod
>>     method['method']()
>>   File "/tmp/ovirt-dunwHj8Njn/otopi-plugins/ovirt-host-deploy/vdsm/packages.py", line 224, in _start
>>     self.services.state('vdsmd', True)
>>   File "/tmp/ovirt-dunwHj8Njn/otopi-plugins/otopi/services/systemd.py", line 141, in state
>>     service=name,
>> RuntimeError: Failed to start service 'vdsmd'
>>
>>
>> From /var/log/messages:
>>
>> Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh: Error:
>> Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh: One of the modules is not configured to work with VDSM.
>> Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh: To configure the module use the following:
>> Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh: 'vdsm-tool configure [--module module-name]'.
>> Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh: If all modules are not configured try to use:
>> Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh: 'vdsm-tool configure --force'
>> Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh: (The force flag will stop the module's service and start it
>> Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh: afterwards automatically to load the new configuration.)
>> Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh: abrt is already configured for vdsm
>> Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh: lvm is configured for vdsm
>> Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh: libvirt is already configured for vdsm
>> Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh: multipath requires configuration
>> Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh: Modules sanlock, multipath are not configured

This means the host was not deployed correctly. When deploying vdsm,
host deploy must run "vdsm-tool configure --force", which configures
multipath and sanlock.

We did not change anything in the multipath and sanlock configurators
lately.

Didi, can you check this?
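
To confirm quickly on the host, something like this should list the
unconfigured modules (a sketch; it assumes vdsm-tool's is-configured
verb with --module, and the module list is only illustrative):

    # Sketch: reproduce the init check by asking vdsm-tool which
    # modules still need configuration. Assumes vdsm-tool is on PATH.
    import subprocess

    for module in ('multipath', 'sanlock', 'libvirt', 'lvm'):
        # is-configured exits non-zero when the module is not
        # configured for vdsm.
        rc = subprocess.call(['vdsm-tool', 'is-configured',
                              '--module', module])
        if rc != 0:
            print('%s requires configuration' % module)
            # What host deploy should have done for the module:
            # subprocess.check_call(['vdsm-tool', 'configure',
            #                        '--module', module, '--force'])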