On Tue, Apr 3, 2018 at 3:57 PM, Piotr Kliczewski
<pkliczew@redhat.com> wrote:
> Dan,
>
> It looks like it was one of the calls triggered when vdsm was down:
>
> 2018-04-03 05:30:16,065-0400 INFO (mailbox-hsm)
> [storage.MailBox.HsmMailMonitor] HSM_MailMonitor sending mail to SPM -
> ['/usr/bin/dd',
> 'of=/rhev/data-center/ddb765d2-2137-437d-95f8-c46dbdbc7711/mastersd/dom_md/inbox',
> 'iflag=fullblock', 'oflag=direct', 'conv=notrunc', 'bs=4096', 'count=1',
> 'seek=1'] (mailbox:387)
> 2018-04-03 05:31:22,441-0400 INFO (MainThread) [vds] (PID: 20548) I am the
> actual vdsm 4.20.23-28.gitd11ed44.el7.centos lago-basic-suite-4-2-host-0
> (3.10.0-693.21.1.el7.x86_64) (vdsmd:149)
>
>
> which failed and caused a timeout.
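
(For context, the mail send above boils down to piping one 4 KiB
mailbox block into dd, which writes it with O_DIRECT into the master
domain's inbox. A minimal sketch of that step; the dd arguments come
from the log, while the payload and error handling are purely
illustrative:

import subprocess

# Minimal sketch of the mail send recorded above: one 4 KiB block
# written with O_DIRECT into the master storage domain's inbox.
# Only the dd arguments are taken from the log; the payload is a
# placeholder.
INBOX = ('/rhev/data-center/ddb765d2-2137-437d-95f8-c46dbdbc7711'
         '/mastersd/dom_md/inbox')
mail = b'\0' * 4096  # placeholder for a real mailbox block

cmd = ['/usr/bin/dd', 'of=' + INBOX, 'iflag=fullblock', 'oflag=direct',
       'conv=notrunc', 'bs=4096', 'count=1', 'seek=1']
proc = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
_, err = proc.communicate(mail)
if proc.returncode != 0:
    # with vdsm (and its storage access) going down mid-operation,
    # this is the failure that propagates up as the timeout
    raise RuntimeError('dd to SPM inbox failed: %s' % err)

With vdsm being stopped in the middle of the operation, the dd never
completes cleanly, which matches the timeout described above.)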
>
> Thanks,
> Piotr
>
> On Tue, Apr 3, 2018 at 1:57 PM, Dan Kenigsberg <danken@redhat.com> wrote:
>>
>> On Tue, Apr 3, 2018 at 2:07 PM, Barak Korren <bkorren@redhat.com> wrote:
>> > Test failed: [ 006_migrations.prepare_migration_attachments_ipv6 ]
>> >
>> > Link to suspected patches:
>> >
>> > (Patch seems unrelated - do we have sporadic communication issues
>> > arising in PST?)
>> >
>> > https://gerrit.ovirt.org/c/89737/1 - vdsm - automation: check-patch:
>> > attempt to install vdsm-gluster
>> >
>> > Link to Job:
>> >
>> > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1521/
>> >
>> > Link to all logs:
>> >
>> > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1521/artifact/...
>> >
>> > Error snippet from log:
>> >
>> > <error>
>> >
>> > Traceback (most recent call last):
>> >   File "/usr/lib64/python2.7/unittest/case.py", line 369, in run
>> >     testMethod()
>> >   File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
>> >     self.test(*self.arg)
>> >   File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 129, in wrapped_test
>> >     test()
>> >   File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 59, in wrapper
>> >     return func(get_test_prefix(), *args, **kwargs)
>> >   File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 78, in wrapper
>> >     prefix.virt_env.engine_vm().get_api(api_ver=4), *args, **kwargs
>> >   File "/home/jenkins/workspace/ovirt-4.2_change-queue-tester/ovirt-system-tests/basic-suite-4.2/test-scenarios/006_migrations.py", line 139, in prepare_migration_attachments_ipv6
>> >     engine, host_service, MIGRATION_NETWORK, ip_configuration)
>> >   File "/home/jenkins/workspace/ovirt-4.2_change-queue-tester/ovirt-system-tests/basic-suite-4.2/test_utils/network_utils_v4.py", line 71, in modify_ip_config
>> >     check_connectivity=True)
>> >   File "/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py", line 36729, in setup_networks
>> >     return self._internal_action(action, 'setupnetworks', None, headers, query, wait)
>> >   File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 299, in _internal_action
>> >     return future.wait() if wait else future
>> >   File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 55, in wait
>> >     return self._code(response)
>> >   File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 296, in callback
>> >     self._check_fault(response)
>> >   File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 132, in _check_fault
>> >     self._raise_error(response, body)
>> >   File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 118, in _raise_error
>> >     raise error
>> > Error: Fault reason is "Operation Failed". Fault detail is "[Network
>> > error during communication with the Host.]". HTTP response code is 400.
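
(For reference, the failing step reduces to the setup_networks call
sketched below, mirroring modify_ip_config() in
test_utils/network_utils_v4.py as quoted in the traceback. Engine URL,
credentials, network name, and the IPv6 address are illustrative; only
check_connectivity=True is taken from the traceback.

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Minimal sketch of the call that raised the Fault above; all
# identifiers except check_connectivity=True are illustrative.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    insecure=True,
)
hosts_service = connection.system_service().hosts_service()
host = hosts_service.list(search='name=lago-basic-suite-4-2-host-0')[0]
host_service = hosts_service.host_service(host.id)

host_service.setup_networks(
    modified_network_attachments=[
        types.NetworkAttachment(
            network=types.Network(name='migration'),  # illustrative name
            ip_address_assignments=[
                types.IpAddressAssignment(
                    assignment_method=types.BootProtocol.STATIC,
                    ip=types.Ip(
                        address='fd8f:1391:3a82:200::1',  # illustrative
                        netmask='64',
                        version=types.IpVersion.V6,
                    ),
                ),
            ],
        ),
    ],
    # the engine verifies it can still reach the host after the change;
    # a vdsm that goes down mid-call surfaces as the Fault quoted above
    check_connectivity=True,
)
connection.close()

With check_connectivity=True the engine has to hear back from the host
after applying the change, which is presumably why a vdsm restart in
the middle of the call shows up as "Network error during communication
with the Host".)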
>>
>> The error occurred sometime in the interval
>>
>> 09:32:58 [basic-suit] @ Run test: 006_migrations.py:
>> 09:33:55 [basic-suit] Error occured, aborting
>>
>> and indeed
>>
>>
>> http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1521/artifact/...
>>
>> shows Engine disconnecting from the host at
>>
>> 2018-04-03 05:33:32,307-04 ERROR
>> [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring]
>> (EE-ManagedThreadFactory-engineScheduled-Thread-39) [] Unable to
>> RefreshCapabilities: VDSNetworkException: VDSGenericException:
>> VDSNetworkException: Vds timeout occured
>>
>> Maybe Piotr can read more into it.
I should have thought of a down vdsm; but it was down because of what
seems to be soft fencing:
http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1521/artifact/...
Apr 3 05:30:01 lago-basic-suite-4-2-host-0 systemd: Started Session 46 of user root.
Apr 3 05:30:01 lago-basic-suite-4-2-host-0 systemd: Starting Session 46 of user root.
Apr 3 05:30:07 lago-basic-suite-4-2-host-0 systemd: Stopped MOM instance configured for VDSM purposes.
Apr 3 05:30:07 lago-basic-suite-4-2-host-0 systemd: Stopping Virtual Desktop Server Manager...
Apr 3 05:30:16 lago-basic-suite-4-2-host-0 kernel: scsi_verify_blk_ioctl: 33 callbacks suppressed
Apr 3 05:30:16 lago-basic-suite-4-2-host-0 kernel: dd: sending ioctl 80306d02 to a partition!
Apr 3 05:30:17 lago-basic-suite-4-2-host-0 systemd: vdsmd.service stop-sigterm timed out. Killing.
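
(If anyone wants to reproduce the sequence by hand: soft fencing, as I
understand it, reduces to the engine restarting vdsmd on the
unresponsive host over SSH, which is what the "Stopping Virtual
Desktop Server Manager..." line records. A minimal sketch, with the
host name taken from the log and everything else illustrative:

import subprocess

# Minimal sketch of the soft-fencing step, assuming it reduces to an
# SSH-driven restart of vdsmd on the non-responsive host. Only the
# host name comes from the log above.
HOST = 'lago-basic-suite-4-2-host-0'

# systemd stops vdsmd first; if SIGTERM handling exceeds the unit's
# stop timeout, systemd escalates to SIGKILL, which is the
# "stop-sigterm timed out. Killing." line in the log above.
subprocess.check_call(
    ['ssh', 'root@' + HOST, 'systemctl', 'restart', 'vdsmd'])
)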