ovirt-system-tests_hc-basic-suite failing due to host not in cluster and incorrect content served by the engine to SDK.

https://jenkins.ovirt.org/job/ovirt-system-tests_hc-basic-suite-4.2/326
fails on add host test with:
Error: The response content type 'text/html; charset=iso-8859-1' isn't the expected XML
Something bad happened during the deployment because the engine complains about a host not included in the cluster:
2018-07-05 21:34:47,768-04 WARN [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler6) [3009952a] Could not add brick 'lago-hc-basic-suite-4-2-host1:/rhs/brick1/engine' to volume 'c1146520-3bf7-4b81-b31a-7cc5475b6438' - server uuid '50e37ed8-86f3-4b50-9258-f516169025ea' not found in cluster '3125aa60-80bb-11e8-a143-00163e24d363'
Sahina, Gobinda, can you please investigate?
Ondra, no idea why the engine is returning text/html instead of XML here, can you please check?
--
SANDRO BONAZZOLA
MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
Red Hat EMEA <https://www.redhat.com/>
sbonazzo@redhat.com <https://red.ht/sig>

On Fri, Jul 6, 2018 at 1:01 PM, Sandro Bonazzola <sbonazzo@redhat.com> wrote:
https://jenkins.ovirt.org/job/ovirt-system-tests_hc-basic-suite-4.2/326
fails on add host test with:
Error: The response content type 'text/html; charset=iso-8859-1' isn't the expected XML
Something bad happened during the deployment because the engine complains about a host not included in the cluster:
2018-07-05 21:34:47,768-04 WARN [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler6) [3009952a] Could not add brick 'lago-hc-basic-suite-4-2-host1:/rhs/brick1/engine' to volume 'c1146520-3bf7-4b81-b31a-7cc5475b6438' - server uuid '50e37ed8-86f3-4b50-9258-f516169025ea' not found in cluster '3125aa60-80bb-11e8-a143-00163e24d363'
In [2] we can see:
2018-07-05 22:03:42,975-0400 ERROR (monitor/f6c4ab4) [storage.Monitor] Error checking domain f6c4ab4a-005d-4ab7-acda-03810014c841 (monitor:424)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 405, in _checkDomainStatus
    self.domain.selftest()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 48, in __getattr__
    return getattr(self.getRealDomain(), attrName)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 51, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 134, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 151, in _findDomain
    return findMethod(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/glusterSD.py", line 55, in findDomain
    return GlusterStorageDomain(GlusterStorageDomain.findDomainPath(sdUUID))
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileSD.py", line 391, in __init__
    validateFileSystemFeatures(manifest.sdUUID, manifest.mountpoint)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileSD.py", line 104, in validateFileSystemFeatures
    oop.getProcessPool(sdUUID).directTouch(testFilePath)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/outOfProcess.py", line 320, in directTouch
    ioproc.touch(path, flags, mode)
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 567, in touch
    self.timeout)
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 451, in _sendCommand
    raise OSError(errcode, errstr)
OSError: [Errno 30] Read-only file system
And just before that:
2018-07-05 22:03:33,214-0400 INFO (libvirt/events) [virt.vm] (vmId='a2f514e6-81ca-4d41-acf9-77cc910f6eaf') abnormal vm stop device ua-c0592bd6-20e6-4dbf-9610-9a35e3f566ab error eother (vm:5116)
2018-07-05 22:03:33,214-0400 INFO (libvirt/events) [virt.vm] (vmId='a2f514e6-81ca-4d41-acf9-77cc910f6eaf') CPU stopped: onIOError (vm:6157)
2018-07-05 22:03:33,222-0400 INFO (libvirt/events) [virt.vm] (vmId='a2f514e6-81ca-4d41-acf9-77cc910f6eaf') CPU stopped: onSuspend (vm:6157)
2018-07-05 22:03:33,225-0400 WARN (libvirt/events) [virt.vm] (vmId='a2f514e6-81ca-4d41-acf9-77cc910f6eaf') device vda reported I/O error (vm:4065)
And indeed, @[3]:
[2018-07-05 22:04:38,936] WARNING [utils - 298:publish_to_webhook] - Event push failed to URL: http://hc-engine:80/ovirt-engine/services/glusterevents, Event: {"event": "QUORUM_LOST", "message": {"volume": "vmstore"}, "nodeid": "59bf7956-60a4-4152-9cf9-99fcdccb211f", "ts": 1530842614}, Status: ('Connection aborted.', error(113, 'No route to host'))
And we can also see https://bugzilla.redhat.com/show_bug.cgi?id=1595436 there as well.
Sahina, Gobinda, can you please investigate?
Ondra, no idea why the engine is returning text/html instead of xml here, can you please check?
Because of the exception[1].
Y.
[1] https://jenkins.ovirt.org/job/ovirt-system-tests_hc-basic-suite-4.2/326/arti...
[2] https://jenkins.ovirt.org/job/ovirt-system-tests_hc-basic-suite-4.2/326/arti...
[3] https://jenkins.ovirt.org/job/ovirt-system-tests_hc-basic-suite-4.2/326/arti...
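For readers unfamiliar with the SDK error quoted above, here is a minimal sketch of the kind of content-type check that leads to it. This is not the SDK's actual code: `is_xml_response` is a hypothetical helper, and the set of accepted media types is an assumption made for illustration.

```python
def is_xml_response(content_type_header):
    """Return True if a Content-Type header value names an XML media type.

    Hypothetical helper for illustration; the accepted media types below
    are an assumption, not taken from the oVirt SDK.
    """
    # Drop parameters such as "; charset=iso-8859-1" and normalise case.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type in ("application/xml", "text/xml") or media_type.endswith("+xml")


# The header from the failing job is HTML, so a client expecting XML rejects it.
print(is_xml_response("text/html; charset=iso-8859-1"))
print(is_xml_response("application/xml; charset=UTF-8"))
```

A client that receives an HTML page where it expects an XML API reply has nothing it can parse, which is why the symptom shows up in the test rather than pointing at the underlying engine failure.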
_______________________________________________
Devel mailing list -- devel@ovirt.org
To unsubscribe send an email to devel-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/devel@ovirt.org/message/3FSX6M23CN2ZKIBMGUOLKOQ36LNGL4MH/

On Sun, Jul 8, 2018 at 12:23 PM, Yaniv Kaul <ykaul@redhat.com> wrote:
Because of the exception[1]. Y.
Thanks Yaniv! The failure to add hosts is because the engine was down due to quorum loss. I see that the HC suite has failed in the past due to similar errors, and even in the runs that pass there are quorum loss messages (as glusterd is restarted whenever a host is added). I need to dig into the reason for the quorum loss, whether it's the parallel addition of hosts causing it or something else. Will update this thread.
[1] https://jenkins.ovirt.org/job/ovirt-system-tests_hc-basic-suite-4.2/326/artifact/exported-artifacts/test_logs/hc-basic-suite-4.2/post-002_bootstrap.py/lago-hc-basic-suite-4-2-engine/_var_log/ovirt-engine/server.log
[2] https://jenkins.ovirt.org/job/ovirt-system-tests_hc-basic-suite-4.2/326/artifact/exported-artifacts/test_logs/hc-basic-suite-4.2/post-002_bootstrap.py/lago-hc-basic-suite-4-2-host0/_var_log/vdsm/vdsm.log
[3] https://jenkins.ovirt.org/job/ovirt-system-tests_hc-basic-suite-4.2/326/artifact/exported-artifacts/test_logs/hc-basic-suite-4.2/post-002_bootstrap.py/lago-hc-basic-suite-4-2-host0/_var_log/glusterfs/events.log
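To check whether passing runs also show quorum loss, the events.log in [3] could be scanned along these lines. This is only a sketch: it assumes every relevant line has the same shape as the one quoted in this thread ("Event: {json}, Status: ..."), and `quorum_lost_events` is a hypothetical helper name.

```python
import json
import re

# Assumes the JSON payload sits between "Event: " and ", Status:",
# as in the events.log line quoted in this thread.
EVENT_RE = re.compile(r"Event: (\{.*\}), Status:")


def quorum_lost_events(lines):
    """Return the decoded QUORUM_LOST events found in events.log lines."""
    events = []
    for line in lines:
        match = EVENT_RE.search(line)
        if not match:
            continue
        try:
            event = json.loads(match.group(1))
        except ValueError:
            # Payload was not valid JSON; skip the line rather than fail.
            continue
        if event.get("event") == "QUORUM_LOST":
            events.append(event)
    return events


# The line quoted from [3], split only for readability.
sample = (
    '[2018-07-05 22:04:38,936] WARNING [utils - 298:publish_to_webhook] - '
    'Event push failed to URL: http://hc-engine:80/ovirt-engine/services/glusterevents, '
    'Event: {"event": "QUORUM_LOST", "message": {"volume": "vmstore"}, '
    '"nodeid": "59bf7956-60a4-4152-9cf9-99fcdccb211f", "ts": 1530842614}, '
    "Status: ('Connection aborted.', error(113, 'No route to host'))"
)

for event in quorum_lost_events([sample]):
    print(event["ts"], event["message"]["volume"])
```

Running it over the events.log of a passing and a failing job would show which volumes lose quorum in each and when, which may help separate the glusterd restarts from a real outage.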
participants (3)
- Sahina Bose
- Sandro Bonazzola
- Yaniv Kaul