[ovirt-users] hosted-engine engine crash
Mark Gagnon
rhubarbe at gmail.com
Thu Jul 14 03:20:13 UTC 2016
Sorry, hit send by accident.
More details:
When I notice that the engine is down and run hosted-engine --vm-status on
any host, the command hangs and then prints a lot of output saying the
engine is down. If I run hosted-engine --vm-start on any one of the hosts,
the engine just starts and everything goes back to normal.
hosted-engine --vm-status output:
ovirt_hosted_engine_ha.lib.exceptions.RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'nfs3', 'sd_uuid': '3d67cf89-92de-428d-9714-e02aceae281e'}: Connection timed out
Here are some entries from vdsm.log:
Thread-98649::WARNING::2016-07-13 22:54:04,418::fileSD::749::Storage.scanDomains::(collectMetaFiles) Could not collect metadata file for domain path /rhev/data-center/mnt/engine.domain.com:_var_lib_exports_iso
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/fileSD.py", line 735, in collectMetaFiles
    sd.DOMAIN_META_DATA))
  File "/usr/share/vdsm/storage/outOfProcess.py", line 121, in glob
    return self._iop.glob(pattern)
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 534, in glob
    return self._sendCommand("glob", {"pattern": pattern}, self.timeout)
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 419, in _sendCommand
    raise Timeout(os.strerror(errno.ETIMEDOUT))
Timeout: Connection timed out
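The "Connection timed out" text in this traceback does not necessarily mean a network connection failed: the last frame raises Timeout(os.strerror(errno.ETIMEDOUT)), so a filesystem command (here a glob on the NFS mount) that exceeds its timeout is reported with the operating system's ETIMEDOUT message. The Python 3 sketch below mimics that pattern under stated assumptions: the Timeout class and the run_with_timeout helper are hypothetical stand-ins, not the ioprocess implementation. It runs a call in a worker thread and raises Timeout with the ETIMEDOUT message if the call does not finish in time, which is roughly how a glob against an unresponsive NFS mount gets reported.

```python
import concurrent.futures
import errno
import glob
import os


class Timeout(Exception):
    """Hypothetical stand-in for the Timeout exception seen in the traceback."""


def run_with_timeout(func, args, timeout):
    """Run func(*args) in a worker thread; raise Timeout if it takes too long.

    Hypothetical helper: it only mimics how a stalled call is reported,
    using the same message construction as the traceback above.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        # os.strerror(errno.ETIMEDOUT) is "Connection timed out" on Linux,
        # even though no network connection is involved at this level.
        raise Timeout(os.strerror(errno.ETIMEDOUT))
    finally:
        # Do not block here waiting for a hung worker; a stalled NFS call
        # may take a long time to return.
        executor.shutdown(wait=False)


if __name__ == "__main__":
    # Placeholder pattern; a real domain scan globs under /rhev/data-center/mnt.
    print(run_with_timeout(glob.glob, ("/tmp/*",), timeout=5.0))
```

The point of the sketch is the error path: whatever the stalled operation was, the caller only sees the generic ETIMEDOUT string.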
Thread-63::ERROR::2016-07-13 22:54:04,418::sdc::145::Storage.StorageDomainCache::(_findDomain) domain bd73cb0f-bb9c-432a-90ee-a32757a8bc10 not found
Thread-98498::ERROR::2016-07-13 22:50:33,895::brokerlink::279::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(_communicate) Connection closed: Connection timed out
Thread-98498::ERROR::2016-07-13 22:50:33,895::API::1871::vds::(_getHaInfo) failed to retrieve Hosted Engine HA info
Traceback (most recent call last):
  File "/usr/share/vdsm/API.py", line 1851, in _getHaInfo
    stats = instance.get_all_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 103, in get_all_stats
    self._configure_broker_conn(broker)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 180, in _configure_broker_conn
    dom_type=dom_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 176, in set_storage_domain
    .format(sd_type, options, e))
RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'nfs3', 'sd_uuid': '3d67cf89-92de-428d-9714-e02aceae281e'}: Connection timed out
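The RequestError at the end of this traceback is a wrapper: its last frame formats sd_type, options and the underlying exception e into one message, so the root cause reported by --vm-status is the broker connection timing out, not a separate failure. A minimal Python 3 sketch of that wrapping pattern follows; RequestError, set_storage_domain and broker_times_out here are hypothetical stand-ins, and the format string is inferred from the logged message rather than copied from ovirt-hosted-engine-ha.

```python
class RequestError(Exception):
    """Hypothetical stand-in for ovirt_hosted_engine_ha's RequestError."""


def set_storage_domain(sd_type, send_request, **options):
    """Hypothetical sketch: send a request and wrap any failure in RequestError.

    send_request is a caller-supplied callable standing in for the broker call.
    """
    try:
        return send_request(sd_type, options)
    except Exception as e:
        # Message layout inferred from the logged RequestError above.
        raise RequestError(
            "Failed to set storage domain {0}, options {1}: {2}"
            .format(sd_type, options, e))


def broker_times_out(sd_type, options):
    # Simulates the broker connection timing out.
    raise OSError("Connection timed out")


if __name__ == "__main__":
    try:
        set_storage_domain("FilesystemBackend", broker_times_out,
                           dom_type="nfs3",
                           sd_uuid="3d67cf89-92de-428d-9714-e02aceae281e")
    except RequestError as err:
        print(err)
```

Run as a script, this prints the same one-line message that hosted-engine --vm-status showed, which is why that output says little beyond "the storage/broker timed out".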
Thanks for your input. Even if it is a storage problem, if it is going to
happen anyway, how can I force the engine to be restarted?
At first I thought it was a split-brain issue, so I added a third host, but
I still have the same problem.
On Wed, Jul 13, 2016 at 11:13 PM, Mark Gagnon <rhubarbe at gmail.com> wrote:
> Hi,
> We have a three-node hosted-engine setup using two NFS3 shares, and the
> engine keeps crashing every few days.
>
> Looking at the VDSM logs, it looks like a storage problem, but I'm
> wondering why the hosts don't restart the engine.