On Fri, Jul 6, 2018 at 1:01 PM, Sandro Bonazzola <sbonazzo(a)redhat.com>
wrote:
> https://jenkins.ovirt.org/job/ovirt-system-tests_hc-basic-suite-4.2/326
>
> fails on add host test with:
>
> Error: The response content type 'text/html; charset=iso-8859-1' isn't the expected XML
>
>
> Something bad happened during the deployment because the engine complains
> about a host not included in the cluster:
>
> 2018-07-05 21:34:47,768-04 WARN [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler6) [3009952a] Could not add brick 'lago-hc-basic-suite-4-2-host1:/rhs/brick1/engine' to volume 'c1146520-3bf7-4b81-b31a-7cc5475b6438' - server uuid '50e37ed8-86f3-4b50-9258-f516169025ea' not found in cluster '3125aa60-80bb-11e8-a143-00163e24d363'
>
>
In [2] we can see:
2018-07-05 22:03:42,975-0400 ERROR (monitor/f6c4ab4) [storage.Monitor] Error checking domain f6c4ab4a-005d-4ab7-acda-03810014c841 (monitor:424)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 405, in _checkDomainStatus
    self.domain.selftest()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 48, in __getattr__
    return getattr(self.getRealDomain(), attrName)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 51, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 134, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 151, in _findDomain
    return findMethod(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/glusterSD.py", line 55, in findDomain
    return GlusterStorageDomain(GlusterStorageDomain.findDomainPath(sdUUID))
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileSD.py", line 391, in __init__
    validateFileSystemFeatures(manifest.sdUUID, manifest.mountpoint)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileSD.py", line 104, in validateFileSystemFeatures
    oop.getProcessPool(sdUUID).directTouch(testFilePath)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/outOfProcess.py", line 320, in directTouch
    ioproc.touch(path, flags, mode)
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 567, in touch
    self.timeout)
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 451, in _sendCommand
    raise OSError(errcode, errstr)
OSError: [Errno 30] Read-only file system
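For anyone who wants to reproduce the check by hand on a host: the traceback shows validateFileSystemFeatures() touching a test file under the domain mountpoint and the touch failing with errno 30 (EROFS). Below is a minimal sketch of the same kind of probe in plain Python 3. It is not the vdsm code: is_read_only() and check_tmpdir() are hypothetical helpers, and it uses os.open() directly instead of going through ioprocess.

```python
import errno
import os
import tempfile


def is_read_only(directory):
    """Return True if creating a file in `directory` fails with EROFS.

    Hypothetical helper for illustration only; this is not vdsm code.
    """
    probe = os.path.join(directory, ".rw-probe-%d" % os.getpid())
    try:
        fd = os.open(probe, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except OSError as e:
        if e.errno == errno.EROFS:  # errno 30, as in the traceback
            return True
        raise  # other failures (EACCES, ENOENT, ...) are a different problem
    # The file was created, so the filesystem is writable; clean up.
    os.close(fd)
    os.unlink(probe)
    return False


def check_tmpdir():
    """Probe a fresh temporary directory, which is expected to be writable."""
    with tempfile.TemporaryDirectory() as d:
        return is_read_only(d)
```

Calling is_read_only() with the gluster mountpoint of the storage domain (path not shown in this thread) would tell whether the mount is still read-only. In the log above the failure surfaced at 22:03:42 from the domain monitor.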
And just before that:
2018-07-05 22:03:33,214-0400 INFO (libvirt/events) [virt.vm] (vmId='a2f514e6-81ca-4d41-acf9-77cc910f6eaf') abnormal vm stop device ua-c0592bd6-20e6-4dbf-9610-9a35e3f566ab error eother (vm:5116)
2018-07-05 22:03:33,214-0400 INFO (libvirt/events) [virt.vm] (vmId='a2f514e6-81ca-4d41-acf9-77cc910f6eaf') CPU stopped: onIOError (vm:6157)
2018-07-05 22:03:33,222-0400 INFO (libvirt/events) [virt.vm] (vmId='a2f514e6-81ca-4d41-acf9-77cc910f6eaf') CPU stopped: onSuspend (vm:6157)
2018-07-05 22:03:33,225-0400 WARN (libvirt/events) [virt.vm] (vmId='a2f514e6-81ca-4d41-acf9-77cc910f6eaf') device vda reported I/O error (vm:4065)
And indeed, in [3]:
[2018-07-05 22:04:38,936] WARNING [utils - 298:publish_to_webhook] - Event push failed to URL: http://hc-engine:80/ovirt-engine/services/glusterevents, Event: {"event": "QUORUM_LOST", "message": {"volume": "vmstore"}, "nodeid": "59bf7956-60a4-4152-9cf9-99fcdccb211f", "ts": 1530842614}, Status: ('Connection aborted.', error(113, 'No route to host'))
And we can also see https://bugzilla.redhat.com/show_bug.cgi?id=1595436 there as well.
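To line up the QUORUM_LOST event with the vdsm log, it helps to pull the JSON payload out of the events.log line and convert its "ts" field to a readable time. A rough sketch follows. The regular expression is an assumption based on the single line quoted above, not a documented log format, and parse_event() is a hypothetical helper.

```python
import json
import re
from datetime import datetime, timezone

# The events.log line quoted in this thread, joined onto one line.
LINE = (
    "[2018-07-05 22:04:38,936] WARNING [utils - 298:publish_to_webhook] - "
    "Event push failed to URL: "
    "http://hc-engine:80/ovirt-engine/services/glusterevents, "
    'Event: {"event": "QUORUM_LOST", "message": {"volume": "vmstore"}, '
    '"nodeid": "59bf7956-60a4-4152-9cf9-99fcdccb211f", "ts": 1530842614}, '
    "Status: ('Connection aborted.', error(113, 'No route to host'))"
)

# Assumed layout: the JSON payload sits between "Event: " and ", Status:".
EVENT_RE = re.compile(r"Event: (\{.*\}), Status:")


def parse_event(line):
    """Return the event dict with an added UTC 'time' key, or None."""
    m = EVENT_RE.search(line)
    if m is None:
        return None
    event = json.loads(m.group(1))
    # 'ts' is treated here as seconds since the epoch.
    event["time"] = datetime.fromtimestamp(event["ts"], tz=timezone.utc)
    return event
```

For the line above, parse_event(LINE)["time"] is 2018-07-06 02:03:34 UTC, i.e. 22:03:34 at -0400, assuming "ts" is a Unix timestamp. That is about one second after the VM I/O errors logged by vdsm at 22:03:33, while the failed webhook push itself was logged later, at 22:04:38.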
Sahina, Gobinda, can you please investigate?
>
> Ondra, no idea why the engine is returning text/html instead of xml here,
> can you please check?
>
Because of the exception[1].
Y.
Thanks Yaniv!
The failure to add hosts is because the engine was down due to quorum loss.
I see that the HC suite has failed in the past due to similar errors, and even
in the runs that pass there are quorum loss messages (as glusterd is
restarted whenever a host is added). I need to dig into the reason for the
quorum loss: whether it's the parallel addition of hosts causing it, or
something else. Will update this thread.
[1]
https://jenkins.ovirt.org/job/ovirt-system-tests_hc-basic-suite-4.2/326/artifact/exported-artifacts/test_logs/hc-basic-suite-4.2/post-002_bootstrap.py/lago-hc-basic-suite-4-2-engine/_var_log/ovirt-engine/server.log
[2]
https://jenkins.ovirt.org/job/ovirt-system-tests_hc-basic-suite-4.2/326/artifact/exported-artifacts/test_logs/hc-basic-suite-4.2/post-002_bootstrap.py/lago-hc-basic-suite-4-2-host0/_var_log/vdsm/vdsm.log
[3]
https://jenkins.ovirt.org/job/ovirt-system-tests_hc-basic-suite-4.2/326/artifact/exported-artifacts/test_logs/hc-basic-suite-4.2/post-002_bootstrap.py/lago-hc-basic-suite-4-2-host0/_var_log/glusterfs/events.log