+ Yaniv B.

On Tue, Aug 22, 2017 at 12:14 PM, Nir Soffer <nsoffer@redhat.com> wrote:


On Tue, Aug 22, 2017, 12:57 Yedidyah Bar David <didi@redhat.com> wrote:
On Tue, Aug 22, 2017 at 12:52 PM, Anton Marchukov <amarchuk@redhat.com> wrote:
> Hello All.
>
> Any news on this? I see the latest failure for vdsm is the same [1] and
> the job is still not working for it.
>
> [1]
> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/1901/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/host-deploy/ovirt-host-deploy-20170822035135-lago-basic-suite-master-host0-1f46d892.log

This log has:

2017-08-22 03:51:28,272-0400 DEBUG otopi.context
context._executeMethod:128 Stage closeup METHOD
otopi.plugins.ovirt_host_deploy.vdsm.packages.Plugin._reconfigure
2017-08-22 03:51:28,272-0400 DEBUG
otopi.plugins.ovirt_host_deploy.vdsm.packages plugin.executeRaw:813
execute: ('/bin/vdsm-tool', 'configure', '--force'),
executable='None', cwd='None', env=None
2017-08-22 03:51:30,687-0400 DEBUG
otopi.plugins.ovirt_host_deploy.vdsm.packages plugin.executeRaw:863
execute-result: ('/bin/vdsm-tool', 'configure', '--force'), rc=1
2017-08-22 03:51:30,688-0400 DEBUG
otopi.plugins.ovirt_host_deploy.vdsm.packages plugin.execute:921
execute-output: ('/bin/vdsm-tool', 'configure', '--force') stdout:

Checking configuration status...

abrt is not configured for vdsm
WARNING: LVM local configuration: /etc/lvm/lvmlocal.conf is not based
on vdsm configuration
lvm requires configuration
libvirt is not configured for vdsm yet
FAILED: conflicting vdsm and libvirt-qemu tls configuration.
vdsm.conf with ssl=True requires the following changes:
libvirtd.conf: listen_tcp=0, auth_tcp="sasl", listen_tls=1
qemu.conf: spice_tls=1.
multipath requires configuration

Running configure...
Reconfiguration of abrt is done.
Reconfiguration of passwd is done.
WARNING: LVM local configuration: /etc/lvm/lvmlocal.conf is not based
on vdsm configuration
Backing up /etc/lvm/lvmlocal.conf to /etc/lvm/lvmlocal.conf.201708220351
Installing /usr/share/vdsm/lvmlocal.conf at /etc/lvm/lvmlocal.conf
Units need configuration: {'lvm2-lvmetad.service': {'LoadState':
'loaded', 'ActiveState': 'active'}, 'lvm2-lvmetad.socket':
{'LoadState': 'loaded', 'ActiveState': 'active'}}
Reconfiguration of lvm is done.
Reconfiguration of sebool is done.

2017-08-22 03:51:30,688-0400 DEBUG
otopi.plugins.ovirt_host_deploy.vdsm.packages plugin.execute:926
execute-output: ('/bin/vdsm-tool', 'configure', '--force') stderr:
Error:  ServiceNotExistError: Tried all alternatives but failed:
ServiceNotExistError: dev-hugepages1G.mount is not native systemctl service
ServiceNotExistError: dev-hugepages1G.mount is not a SysV service


2017-08-22 03:51:30,689-0400 WARNING
otopi.plugins.ovirt_host_deploy.vdsm.packages
packages._reconfigure:155 Cannot configure vdsm
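
The stderr above suggests vdsm-tool tried dev-hugepages1G.mount first as a
native systemd unit and then as a SysV service, and found neither. If the
host is still reachable, a quick sanity check (just a suggestion, these are
plain systemd commands, nothing vdsm-specific) would be:

  systemctl status dev-hugepages1G.mount
  systemctl list-unit-files | grep -i hugepages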

Nir, any idea?

Looks like some configurator failed after sebool, but we don't have a proper error message naming the failed configurator.
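
One way to narrow it down (just a sketch, reusing the module names and the
--module flag from the output above) would be to run the remaining
configurators one at a time and see which one raises the
ServiceNotExistError, e.g.:

  vdsm-tool configure --module multipath --force
  vdsm-tool configure --module sanlock --force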

Piotr, can you take a look?


I saw it yesterday and already talked to Yaniv about it.

@Yaniv please take a look.
 


>
>
>
> On Sun, Aug 20, 2017 at 12:39 PM, Nir Soffer <nsoffer@redhat.com> wrote:
>>
>> On Sun, Aug 20, 2017 at 11:08 AM Dan Kenigsberg <danken@redhat.com> wrote:
>>>
>>> On Sun, Aug 20, 2017 at 10:39 AM, Yaniv Kaul <ykaul@redhat.com> wrote:
>>> >
>>> >
>>> > On Sun, Aug 20, 2017 at 8:48 AM, Daniel Belenky <dbelenky@redhat.com>
>>> > wrote:
>>> >>
>>> >> Failed test: basic_suite_master/002_bootstrap
>>> >> Version: oVirt Master
>>> >> Link to failed job: ovirt-master_change-queue-tester/1860/
>>> >> Link to logs (Jenkins): test logs
>>> >> Suspected patch: https://gerrit.ovirt.org/#/c/80749/3
>>> >>
>>> >> From what I was able to find, it seems that for some reason VDSM
>>> >> failed to start on host 1. The VDSM log is empty, and the only error
>>> >> I could find in supervdsm.log is that starting LLDP failed (not sure
>>> >> if it's related).
>>> >
>>> >
>>> > Can you check the networking on the hosts? Something's very strange
>>> > there.
>>> > For example:
>>> > Aug 19 16:38:42 lago-basic-suite-master-host0 NetworkManager[685]:
>>> > <info>
>>> > [1503175122.2682] manager: (e7NZWeNDXwIjQia): new Bond device
>>> > (/org/freedesktop/NetworkManager/Devices/17)
>>> > Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: e7NZWeNDXwIjQia:
>>> > Setting xmit hash policy to layer2+3 (2)
>>> > Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: e7NZWeNDXwIjQia:
>>> > Setting xmit hash policy to encap2+3 (3)
>>> > Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: e7NZWeNDXwIjQia:
>>> > Setting xmit hash policy to encap3+4 (4)
>>> > Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: e7NZWeNDXwIjQia:
>>> > option xmit_hash_policy: invalid value (5)
>>> > Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: e7NZWeNDXwIjQia:
>>> > Setting primary_reselect to always (0)
>>> > Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: e7NZWeNDXwIjQia:
>>> > Setting primary_reselect to better (1)
>>> > Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: e7NZWeNDXwIjQia:
>>> > Setting primary_reselect to failure (2)
>>> > Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: e7NZWeNDXwIjQia:
>>> > option primary_reselect: invalid value (3)
>>> > Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: e7NZWeNDXwIjQia:
>>> > Setting arp_all_targets to any (0)
>>> > Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: e7NZWeNDXwIjQia:
>>> > Setting arp_all_targets to all (1)
>>> > Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: e7NZWeNDXwIjQia:
>>> > option arp_all_targets: invalid value (2)
>>> > Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: bonding:
>>> > e7NZWeNDXwIjQia is being deleted...
>>> > Aug 19 16:38:42 lago-basic-suite-master-host0 lldpad: recvfrom(Event
>>> > interface): No buffer space available
>>> >
>>> > Y.
>>>
>>>
>>>
>>> The post-boot noise with the funny-looking bonds is due to our calling
>>> `vdsm-tool dump-bonding-options` on every boot, in order to find the
>>> bonding defaults for the current kernel.
>>>
>>> >
>>> >>
>>> >> From host-deploy log:
>>> >>
>>> >> 2017-08-19 16:38:41,476-0400 DEBUG
>>> >> otopi.plugins.otopi.services.systemd
>>> >> systemd.state:130 starting service vdsmd
>>> >> 2017-08-19 16:38:41,476-0400 DEBUG
>>> >> otopi.plugins.otopi.services.systemd
>>> >> plugin.executeRaw:813 execute: ('/bin/systemctl', 'start',
>>> >> 'vdsmd.service'),
>>> >> executable='None', cwd='None', env=None
>>> >> 2017-08-19 16:38:44,628-0400 DEBUG
>>> >> otopi.plugins.otopi.services.systemd
>>> >> plugin.executeRaw:863 execute-result: ('/bin/systemctl', 'start',
>>> >> 'vdsmd.service'), rc=1
>>> >> 2017-08-19 16:38:44,630-0400 DEBUG
>>> >> otopi.plugins.otopi.services.systemd
>>> >> plugin.execute:921 execute-output: ('/bin/systemctl', 'start',
>>> >> 'vdsmd.service') stdout:
>>> >>
>>> >>
>>> >> 2017-08-19 16:38:44,630-0400 DEBUG
>>> >> otopi.plugins.otopi.services.systemd
>>> >> plugin.execute:926 execute-output: ('/bin/systemctl', 'start',
>>> >> 'vdsmd.service') stderr:
>>> >> Job for vdsmd.service failed because the control process exited with
>>> >> error
>>> >> code. See "systemctl status vdsmd.service" and "journalctl -xe" for
>>> >> details.
>>> >>
>>> >> 2017-08-19 16:38:44,631-0400 DEBUG otopi.context
>>> >> context._executeMethod:142 method exception
>>> >> Traceback (most recent call last):
>>> >>   File "/tmp/ovirt-dunwHj8Njn/pythonlib/otopi/context.py", line 132,
>>> >> in
>>> >> _executeMethod
>>> >>     method['method']()
>>> >>   File
>>> >>
>>> >> "/tmp/ovirt-dunwHj8Njn/otopi-plugins/ovirt-host-deploy/vdsm/packages.py",
>>> >> line 224, in _start
>>> >>     self.services.state('vdsmd', True)
>>> >>   File
>>> >> "/tmp/ovirt-dunwHj8Njn/otopi-plugins/otopi/services/systemd.py",
>>> >> line 141, in state
>>> >>     service=name,
>>> >> RuntimeError: Failed to start service 'vdsmd'
>>> >>
>>> >>
>>> >> From /var/log/messages:
>>> >>
>>> >> Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh:
>>> >> Error:
>>> >> Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh:
>>> >> One of
>>> >> the modules is not configured to work with VDSM.
>>> >> Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh: To
>>> >> configure the module use the following:
>>> >> Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh:
>>> >> 'vdsm-tool configure [--module module-name]'.
>>> >> Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh: If
>>> >> all
>>> >> modules are not configured try to use:
>>> >> Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh:
>>> >> 'vdsm-tool configure --force'
>>> >> Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh:
>>> >> (The
>>> >> force flag will stop the module's service and start it
>>> >> Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh:
>>> >> afterwards automatically to load the new configuration.)
>>> >> Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh:
>>> >> abrt
>>> >> is already configured for vdsm
>>> >> Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh:
>>> >> lvm is
>>> >> configured for vdsm
>>> >> Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh:
>>> >> libvirt is already configured for vdsm
>>> >> Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh:
>>> >> multipath requires configuration
>>> >> Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh:
>>> >> Modules sanlock, multipath are not configured
>>
>>
>> This means the host was not deployed correctly. When deploying vdsm,
>> host deploy must run "vdsm-tool configure --force", which configures
>> multipath and sanlock.
>>
>> We did not change anything in the multipath and sanlock configurators lately.
>>
>> Didi, can you check this?
>>
>> _______________________________________________
>> Devel mailing list
>> Devel@ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/devel
>
>
>
>
> --
> Anton Marchukov
> Team Lead - Release Management - RHV DevOps - Red Hat
>



--
Didi