Hi,

We have a dev cluster running 4.2. It had to be powered down because the building was going to lose power. Since we brought it back up, it has been massively unstable (hosts constantly switching state, VMs migrating all the time).

I now have one host running (with HE) and all others in maintenance mode. When I try to activate another host, I see storage errors in vdsm.log:

2019-05-07 09:41:00,114+0000 ERROR (monitor/a98c0b4) [storage.Monitor] Error checking domain a98c0b42-47b9-4632-8b54-0ff3bd80d4c2 (monitor:424)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 416, in _checkDomainStatus
    masterStats = self.domain.validateMaster()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 941, in validateMaster
    if not self.validateMasterMount():
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockSD.py", line 1377, in validateMasterMount
    return mount.isMounted(self.getMasterDir())
  File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 161, in isMounted
    getMountFromTarget(target)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 173, in getMountFromTarget
    for rec in _iterMountRecords():
  File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 143, in _iterMountRecords
    for rec in _iterKnownMounts():
  File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 139, in _iterKnownMounts
    yield _parseFstabLine(line)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 81, in _parseFstabLine
    fs_spec = fileUtils.normalize_path(_unescape_spaces(fs_spec))
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileUtils.py", line 94, in normalize_path
    host, tail = address.hosttail_split(path)
  File "/usr/lib/python2.7/site-packages/vdsm/common/network/address.py", line 43, in hosttail_split
    raise HosttailError('%s is not a valid hosttail address:' % hosttail)
HosttailError: :/ is not a valid hosttail address:
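The traceback suggests vdsm hit a mount record whose device field is just ":/", i.e. a remote-style spec with an empty host part. A minimal sketch for spotting such a record, assuming it comes from a mount table in the usual whitespace-separated format (for example /proc/mounts); the function names are hypothetical and this is not vdsm's own parsing logic:

```python
def find_empty_host_specs(lines):
    """Return (line_number, fs_spec, mount_point) for mount records
    whose fs_spec starts with ':' (a host:/path spec with no host)."""
    suspects = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        # Skip blank lines, comments, and records too short to parse.
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        fs_spec, mount_point = fields[0], fields[1]
        # A remote spec normally looks like "host:/path"; ":/" has no host.
        if fs_spec.startswith(":"):
            suspects.append((number, fs_spec, mount_point))
    return suspects


def scan_mount_table(path="/proc/mounts"):
    """Scan a mount table file (path is an assumption) for suspect records."""
    with open(path) as f:
        return find_empty_host_specs(f)
```

Running `scan_mount_table()` on the affected host should list any record of this kind together with its mount point, which may help narrow down which storage connection is misdefined.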

Not sure if it's related, but since the restart the hosted_storage domain has been elected as the master domain.

I'm a bit stuck at the moment. My only idea is to remove HE and switch to a standalone Engine VM running outside the cluster.

Thanks,

Alan