On Tue, Aug 22, 2017 at 1:48 PM Dan Kenigsberg <danken@redhat.com> wrote:
This seems to be my fault; https://gerrit.ovirt.org/80908 should fix it.

This fixes the actual error, but we still have bad logging.

Piotr, can you fix error handling so we get something like:

    Error configuring "foobar": actual error...
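
A minimal sketch of what I mean, assuming the configure flow iterates
over configurator modules that each expose a name attribute and a
configure() entry point (illustrative names, not the exact vdsm API):

    def _configure_all(modules):
        for module in modules:
            try:
                module.configure()
            except Exception as e:
                # Re-raise with the failing module's name, so the log
                # tells us which configurator failed, not just the raw
                # error (in real code, keep the original traceback too).
                raise RuntimeError(
                    'Error configuring "%s": %s' % (module.name, e))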
 

On Tue, Aug 22, 2017 at 1:14 PM, Nir Soffer <nsoffer@redhat.com> wrote:
>
>
> On Tue, Aug 22, 2017 at 12:57 Yedidyah Bar David <didi@redhat.com> wrote:
>>
>> On Tue, Aug 22, 2017 at 12:52 PM, Anton Marchukov <amarchuk@redhat.com>
>> wrote:
>> > Hello All.
>> >
>> > Any news on this? I see the latest failure for vdsm is the same [1]
>> > and the job is still not working for it.
>> >
>> > [1]
>> >
>> > http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/1901/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/host-deploy/ovirt-host-deploy-20170822035135-lago-basic-suite-master-host0-1f46d892.log
>>
>> This log has:
>>
>> 2017-08-22 03:51:28,272-0400 DEBUG otopi.context
>> context._executeMethod:128 Stage closeup METHOD
>> otopi.plugins.ovirt_host_deploy.vdsm.packages.Plugin._reconfigure
>> 2017-08-22 03:51:28,272-0400 DEBUG
>> otopi.plugins.ovirt_host_deploy.vdsm.packages plugin.executeRaw:813
>> execute: ('/bin/vdsm-tool', 'configure', '--force'),
>> executable='None', cwd='None', env=None
>> 2017-08-22 03:51:30,687-0400 DEBUG
>> otopi.plugins.ovirt_host_deploy.vdsm.packages plugin.executeRaw:863
>> execute-result: ('/bin/vdsm-tool', 'configure', '--force'), rc=1
>> 2017-08-22 03:51:30,688-0400 DEBUG
>> otopi.plugins.ovirt_host_deploy.vdsm.packages plugin.execute:921
>> execute-output: ('/bin/vdsm-tool', 'configure', '--force') stdout:
>>
>> Checking configuration status...
>>
>> abrt is not configured for vdsm
>> WARNING: LVM local configuration: /etc/lvm/lvmlocal.conf is not based
>> on vdsm configuration
>> lvm requires configuration
>> libvirt is not configured for vdsm yet
>> FAILED: conflicting vdsm and libvirt-qemu tls configuration.
>> vdsm.conf with ssl=True requires the following changes:
>> libvirtd.conf: listen_tcp=0, auth_tcp="sasl", listen_tls=1
>> qemu.conf: spice_tls=1.
>> multipath requires configuration
>>
>> Running configure...
>> Reconfiguration of abrt is done.
>> Reconfiguration of passwd is done.
>> WARNING: LVM local configuration: /etc/lvm/lvmlocal.conf is not based
>> on vdsm configuration
>> Backing up /etc/lvm/lvmlocal.conf to /etc/lvm/lvmlocal.conf.201708220351
>> Installing /usr/share/vdsm/lvmlocal.conf at /etc/lvm/lvmlocal.conf
>> Units need configuration: {'lvm2-lvmetad.service': {'LoadState':
>> 'loaded', 'ActiveState': 'active'}, 'lvm2-lvmetad.socket':
>> {'LoadState': 'loaded', 'ActiveState': 'active'}}
>> Reconfiguration of lvm is done.
>> Reconfiguration of sebool is done.
>>
>> 2017-08-22 03:51:30,688-0400 DEBUG
>> otopi.plugins.ovirt_host_deploy.vdsm.packages plugin.execute:926
>> execute-output: ('/bin/vdsm-tool', 'configure', '--force') stderr:
>> Error:  ServiceNotExistError: Tried all alternatives but failed:
>> ServiceNotExistError: dev-hugepages1G.mount is not native systemctl
>> service
>> ServiceNotExistError: dev-hugepages1G.mount is not a SysV service
>>
>>
>> 2017-08-22 03:51:30,689-0400 WARNING
>> otopi.plugins.ovirt_host_deploy.vdsm.packages
>> packages._reconfigure:155 Cannot configure vdsm
>>
>> Nir, any idea?
>
>
> Looks like some configurator failed after sebool, but we don't have a
> proper error message with the name of the failing configurator.
>
> Piotr, can you take a look?
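>
> For context, the "Tried all alternatives" wording in the stderr above
> comes from vdsm's service-handling code probing each init backend
> (native systemctl, then SysV) in turn. A rough sketch of the pattern,
> with illustrative names rather than the exact vdsm API:
>
>     class ServiceNotExistError(Exception):
>         pass
>
>     def _run_alternatives(name, alternatives):
>         # Try each backend probe until one accepts the unit.
>         errors = []
>         for probe in alternatives:
>             try:
>                 return probe(name)
>             except ServiceNotExistError as e:
>                 errors.append(str(e))
>         # Every backend rejected the unit; dev-hugepages1G.mount fails
>         # both probes because it is a mount unit, not a service.
>         raise ServiceNotExistError(
>             'Tried all alternatives but failed:\n' + '\n'.join(errors))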
>
>
>>
>> >
>> >
>> >
>> > On Sun, Aug 20, 2017 at 12:39 PM, Nir Soffer <nsoffer@redhat.com> wrote:
>> >>
>> >> On Sun, Aug 20, 2017 at 11:08 AM Dan Kenigsberg <danken@redhat.com>
>> >> wrote:
>> >>>
>> >>> On Sun, Aug 20, 2017 at 10:39 AM, Yaniv Kaul <ykaul@redhat.com> wrote:
>> >>> >
>> >>> >
>> >>> > On Sun, Aug 20, 2017 at 8:48 AM, Daniel Belenky
>> >>> > <dbelenky@redhat.com>
>> >>> > wrote:
>> >>> >>
>> >>> >> Failed test: basic_suite_master/002_bootstrap
>> >>> >> Version: oVirt Master
>> >>> >> Link to failed job: ovirt-master_change-queue-tester/1860/
>> >>> >> Link to logs (Jenkins): test logs
>> >>> >> Suspected patch: https://gerrit.ovirt.org/#/c/80749/3
>> >>> >>
>> >>> >> From what I was able to find, it seems that for some reason VDSM
>> >>> >> failed to start on host 1. The VDSM log is empty, and the only
>> >>> >> error I could find in supervdsm.log is that the start of LLDP
>> >>> >> failed (not sure if it's related).
>> >>> >
>> >>> >
>> >>> > Can you check the networking on the hosts? Something's very strange
>> >>> > there.
>> >>> > For example:
>> >>> > Aug 19 16:38:42 lago-basic-suite-master-host0 NetworkManager[685]:
>> >>> > <info>
>> >>> > [1503175122.2682] manager: (e7NZWeNDXwIjQia): new Bond device
>> >>> > (/org/freedesktop/NetworkManager/Devices/17)
>> >>> > Aug 19 16:38:42 lago-basic-suite-master-host0 kernel:
>> >>> > e7NZWeNDXwIjQia:
>> >>> > Setting xmit hash policy to layer2+3 (2)
>> >>> > Aug 19 16:38:42 lago-basic-suite-master-host0 kernel:
>> >>> > e7NZWeNDXwIjQia:
>> >>> > Setting xmit hash policy to encap2+3 (3)
>> >>> > Aug 19 16:38:42 lago-basic-suite-master-host0 kernel:
>> >>> > e7NZWeNDXwIjQia:
>> >>> > Setting xmit hash policy to encap3+4 (4)
>> >>> > Aug 19 16:38:42 lago-basic-suite-master-host0 kernel:
>> >>> > e7NZWeNDXwIjQia:
>> >>> > option xmit_hash_policy: invalid value (5)
>> >>> > Aug 19 16:38:42 lago-basic-suite-master-host0 kernel:
>> >>> > e7NZWeNDXwIjQia:
>> >>> > Setting primary_reselect to always (0)
>> >>> > Aug 19 16:38:42 lago-basic-suite-master-host0 kernel:
>> >>> > e7NZWeNDXwIjQia:
>> >>> > Setting primary_reselect to better (1)
>> >>> > Aug 19 16:38:42 lago-basic-suite-master-host0 kernel:
>> >>> > e7NZWeNDXwIjQia:
>> >>> > Setting primary_reselect to failure (2)
>> >>> > Aug 19 16:38:42 lago-basic-suite-master-host0 kernel:
>> >>> > e7NZWeNDXwIjQia:
>> >>> > option primary_reselect: invalid value (3)
>> >>> > Aug 19 16:38:42 lago-basic-suite-master-host0 kernel:
>> >>> > e7NZWeNDXwIjQia:
>> >>> > Setting arp_all_targets to any (0)
>> >>> > Aug 19 16:38:42 lago-basic-suite-master-host0 kernel:
>> >>> > e7NZWeNDXwIjQia:
>> >>> > Setting arp_all_targets to all (1)
>> >>> > Aug 19 16:38:42 lago-basic-suite-master-host0 kernel:
>> >>> > e7NZWeNDXwIjQia:
>> >>> > option arp_all_targets: invalid value (2)
>> >>> > Aug 19 16:38:42 lago-basic-suite-master-host0 kernel: bonding:
>> >>> > e7NZWeNDXwIjQia is being deleted...
>> >>> > Aug 19 16:38:42 lago-basic-suite-master-host0 lldpad: recvfrom(Event
>> >>> > interface): No buffer space available
>> >>> >
>> >>> > Y.
>> >>>
>> >>>
>> >>>
>> >>> The post-boot noise with the funny-looking bonds is due to our
>> >>> calling `vdsm-tool dump-bonding-options` on every boot, in order to
>> >>> find the bonding defaults for the current kernel.
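>> >>>
>> >>> Roughly, the dump works by creating a throwaway bond with a random
>> >>> name, reading its bonding options from sysfs, and deleting it again;
>> >>> the real code also probes option values, which is what produces the
>> >>> "Setting ..." kernel messages above. A simplified sketch (not the
>> >>> exact vdsm code; needs root and the bonding module loaded):
>> >>>
>> >>>     import os
>> >>>     import random
>> >>>     import string
>> >>>
>> >>>     BONDING_MASTERS = '/sys/class/net/bonding_masters'
>> >>>
>> >>>     def dump_bond_defaults():
>> >>>         # Random name, like e7NZWeNDXwIjQia in the journal above.
>> >>>         name = ''.join(random.choice(string.ascii_letters)
>> >>>                        for _ in range(15))
>> >>>         with open(BONDING_MASTERS, 'w') as f:
>> >>>             f.write('+' + name)  # create the temporary bond
>> >>>         try:
>> >>>             opts_dir = '/sys/class/net/%s/bonding' % name
>> >>>             return dict(
>> >>>                 (opt, open(os.path.join(opts_dir, opt)).read().strip())
>> >>>                 for opt in os.listdir(opts_dir))
>> >>>         finally:
>> >>>             with open(BONDING_MASTERS, 'w') as f:
>> >>>                 f.write('-' + name)  # delete it again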
>> >>>
>> >>> >
>> >>> >>
>> >>> >> From host-deploy log:
>> >>> >>
>> >>> >> 2017-08-19 16:38:41,476-0400 DEBUG
>> >>> >> otopi.plugins.otopi.services.systemd
>> >>> >> systemd.state:130 starting service vdsmd
>> >>> >> 2017-08-19 16:38:41,476-0400 DEBUG
>> >>> >> otopi.plugins.otopi.services.systemd
>> >>> >> plugin.executeRaw:813 execute: ('/bin/systemctl', 'start',
>> >>> >> 'vdsmd.service'),
>> >>> >> executable='None', cwd='None', env=None
>> >>> >> 2017-08-19 16:38:44,628-0400 DEBUG
>> >>> >> otopi.plugins.otopi.services.systemd
>> >>> >> plugin.executeRaw:863 execute-result: ('/bin/systemctl', 'start',
>> >>> >> 'vdsmd.service'), rc=1
>> >>> >> 2017-08-19 16:38:44,630-0400 DEBUG
>> >>> >> otopi.plugins.otopi.services.systemd
>> >>> >> plugin.execute:921 execute-output: ('/bin/systemctl', 'start',
>> >>> >> 'vdsmd.service') stdout:
>> >>> >>
>> >>> >>
>> >>> >> 2017-08-19 16:38:44,630-0400 DEBUG
>> >>> >> otopi.plugins.otopi.services.systemd
>> >>> >> plugin.execute:926 execute-output: ('/bin/systemctl', 'start',
>> >>> >> 'vdsmd.service') stderr:
>> >>> >> Job for vdsmd.service failed because the control process exited
>> >>> >> with
>> >>> >> error
>> >>> >> code. See "systemctl status vdsmd.service" and "journalctl -xe" for
>> >>> >> details.
>> >>> >>
>> >>> >> 2017-08-19 16:38:44,631-0400 DEBUG otopi.context
>> >>> >> context._executeMethod:142 method exception
>> >>> >> Traceback (most recent call last):
>> >>> >>   File "/tmp/ovirt-dunwHj8Njn/pythonlib/otopi/context.py", line
>> >>> >> 132,
>> >>> >> in
>> >>> >> _executeMethod
>> >>> >>     method['method']()
>> >>> >>   File
>> >>> >>
>> >>> >>
>> >>> >> "/tmp/ovirt-dunwHj8Njn/otopi-plugins/ovirt-host-deploy/vdsm/packages.py",
>> >>> >> line 224, in _start
>> >>> >>     self.services.state('vdsmd', True)
>> >>> >>   File
>> >>> >> "/tmp/ovirt-dunwHj8Njn/otopi-plugins/otopi/services/systemd.py",
>> >>> >> line 141, in state
>> >>> >>     service=name,
>> >>> >> RuntimeError: Failed to start service 'vdsmd'
>> >>> >>
>> >>> >>
>> >>> >> From /var/log/messages:
>> >>> >>
>> >>> >> Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh:
>> >>> >> Error:
>> >>> >> Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh:
>> >>> >> One of
>> >>> >> the modules is not configured to work with VDSM.
>> >>> >> Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh:
>> >>> >> To
>> >>> >> configure the module use the following:
>> >>> >> Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh:
>> >>> >> 'vdsm-tool configure [--module module-name]'.
>> >>> >> Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh:
>> >>> >> If
>> >>> >> all
>> >>> >> modules are not configured try to use:
>> >>> >> Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh:
>> >>> >> 'vdsm-tool configure --force'
>> >>> >> Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh:
>> >>> >> (The
>> >>> >> force flag will stop the module's service and start it
>> >>> >> Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh:
>> >>> >> afterwards automatically to load the new configuration.)
>> >>> >> Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh:
>> >>> >> abrt
>> >>> >> is already configured for vdsm
>> >>> >> Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh:
>> >>> >> lvm is
>> >>> >> configured for vdsm
>> >>> >> Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh:
>> >>> >> libvirt is already configured for vdsm
>> >>> >> Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh:
>> >>> >> multipath requires configuration
>> >>> >> Aug 19 16:38:44 lago-basic-suite-master-host0 vdsmd_init_common.sh:
>> >>> >> Modules sanlock, multipath are not configured
>> >>
>> >>
>> >> This means the host was not deployed correctly. When deploying vdsm,
>> >> host deploy must run "vdsm-tool configure --force", which configures
>> >> multipath and sanlock.
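>> >>
>> >> A minimal standalone sketch of that step (the real flow goes through
>> >> otopi's plugin.execute, as the executeRaw lines in the host-deploy
>> >> log show; the function name here is illustrative):
>> >>
>> >>     import subprocess
>> >>
>> >>     def reconfigure_vdsm():
>> >>         # Run all configurators and fail loudly on a non-zero exit,
>> >>         # so deploy does not continue with a half-configured host.
>> >>         p = subprocess.Popen(
>> >>             ['/bin/vdsm-tool', 'configure', '--force'],
>> >>             stdout=subprocess.PIPE, stderr=subprocess.PIPE)
>> >>         out, err = p.communicate()
>> >>         if p.returncode != 0:
>> >>             raise RuntimeError('vdsm-tool configure failed: %s' % err)
>> >>         return out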
>> >>
>> >> We did not change anything in the multipath and sanlock configurators
>> >> lately.
>> >>
>> >> Didi, can you check this?
>> >>
>> >
>> >
>> >
>> >
>> > --
>> > Anton Marchukov
>> > Team Lead - Release Management - RHV DevOps - Red Hat
>> >
>>
>>
>>
>> --
>> Didi
>
>
> _______________________________________________
> Devel mailing list
> Devel@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/devel