So, after a week of bashing my head against a wall, we finally tracked it down.

One of the developers was using the hosts for extra processing power, and in the process he periodically brought up an sshfs mount. When this appeared in /etc/mtab it broke vdsm and caused the error below. So it was unrelated to the power outage.
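For anyone hitting the same thing, the failure is easy to reproduce in isolation. Below is a minimal sketch (not vdsm's actual implementation) of the host:tail split that the traceback further down shows raising; a mount record whose fs_spec has an empty host part, as a stale sshfs mount can leave behind, trips the validation:

    def hosttail_split(hosttail):
        # Split an fs_spec like "host:/path" into (host, tail).
        # Mimics the check in vdsm's address.hosttail_split: an
        # empty host (or no colon at all) is rejected.
        host, sep, tail = hosttail.partition(':')
        if not sep or not host:
            raise ValueError('%s is not a valid hosttail address:' % hosttail)
        return host, tail

    print(hosttail_split('server:/export'))  # ('server', '/export')
    try:
        hosttail_split(':/')
    except ValueError as exc:
        print(exc)  # same message as in the traceback below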

This is in 4.2 - if I can re-create it in 4.3 I'll file a bug; I think the parser really should be a bit more robust than this.
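If it helps whoever picks up the bug: the obvious hardening is to skip mount records the parser can't interpret instead of letting the exception escape and take the domain monitor down with it. A rough sketch of the idea only, not vdsm code (parse_record stands in for the real per-line parser):

    import logging

    log = logging.getLogger(__name__)

    def iter_mount_records(lines, parse_record):
        # Yield parsed mount records, logging and skipping any line
        # that fails to parse rather than aborting the whole scan.
        for line in lines:
            try:
                yield parse_record(line)
            except ValueError as exc:
                log.warning("skipping unparseable mount record %r: %s",
                            line, exc)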


---- On Tue, 07 May 2019 15:47:45 +0100 Darrell Budic <budic@onholyground.com> wrote ----

Was your setup hyperconverged, and is this storage gluster based?

Your error is DNS related, if a bit odd. Have you checked the resolv.conf configs and confirmed the servers listed there are reachable and responsive? When your hosts are active, are they able to mount all the storage domains they need? You should also make sure each HA node can reliably ping your gateway IP; failures there will cause nodes to bounce.
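If it's quicker than checking by hand, here's a throwaway sketch of those two checks; the hostname and gateway below are placeholders, substitute your own:

    import socket
    import subprocess

    def check_dns(names):
        # Confirm the names your hosts depend on actually resolve.
        for name in names:
            try:
                print(name, '->', socket.gethostbyname(name))
            except socket.gaierror as exc:
                print(name, 'FAILED:', exc)

    def check_gateway(gateway_ip):
        # HA agents score the gateway by ping; failures bounce nodes.
        rc = subprocess.call(['ping', '-c', '3', '-W', '2', gateway_ip])
        print('gateway', gateway_ip, 'OK' if rc == 0 else 'UNREACHABLE')

    check_dns(['engine.example.com'])  # placeholder name
    check_gateway('192.0.2.1')         # placeholder gateway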

A starting place rather than a solution, but those are the first places to look. Good luck!

  -Darrell



On May 7, 2019, at 5:14 AM, Alan G <alan+ovirt@griff.me.uk> wrote:

Hi,

We have a dev cluster running 4.2. It had to be powered down as the building was going to lose power. Since we've brought it back up it has been massively unstable (hosts constantly switching state, VMs migrating all the time).

I now have one host running (with HE) and all others in maintenance mode. When I try to activate another host I see storage errors in vdsm.log:

2019-05-07 09:41:00,114+0000 ERROR (monitor/a98c0b4) [storage.Monitor] Error checking domain a98c0b42-47b9-4632-8b54-0ff3bd80d4c2 (monitor:424)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 416, in _checkDomainStatus
    masterStats = self.domain.validateMaster()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 941, in validateMaster
    if not self.validateMasterMount():
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockSD.py", line 1377, in validateMasterMount
    return mount.isMounted(self.getMasterDir())
  File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 161, in isMounted
    getMountFromTarget(target)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 173, in getMountFromTarget
    for rec in _iterMountRecords():
  File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 143, in _iterMountRecords
    for rec in _iterKnownMounts():
  File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 139, in _iterKnownMounts
    yield _parseFstabLine(line)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 81, in _parseFstabLine
    fs_spec = fileUtils.normalize_path(_unescape_spaces(fs_spec))
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileUtils.py", line 94, in normalize_path
    host, tail = address.hosttail_split(path)
  File "/usr/lib/python2.7/site-packages/vdsm/common/network/address.py", line 43, in hosttail_split
    raise HosttailError('%s is not a valid hosttail address:' % hosttail)
HosttailError: :/ is not a valid hosttail address:

Not sure if it's related, but since the restart the hosted_storage domain has been elected the master domain.

I'm a bit stuck at the moment. My only idea is to remove HE and switch to a standalone Engine VM running outside the cluster.

Thanks,

Alan

_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/I6YJQFP43R5NTQN3HG2VWBJW2WFFBGNB/