Is your cluster hyperconverged, and is this storage Gluster-based?
Your error is DNS-related, if a bit odd. Have you checked the resolv.conf configs and
confirmed the servers listed there are reachable and responsive? When your hosts are
active, are they able to mount all the storage domains they need? You should also make
sure each HA node can reliably ping your gateway IP; failures there will cause nodes to
bounce.
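If it helps, here is a minimal Python 3 sketch of the resolv.conf check. It lists the nameserver entries and tries a TCP connection to port 53 on each. The function names are made up for illustration, and it assumes your DNS servers accept TCP on port 53; a server that only answers UDP would show up as unreachable here even though it works.

```python
import socket


def parse_nameservers(resolv_conf_text):
    """Return the addresses found on 'nameserver' lines, skipping comments."""
    servers = []
    for line in resolv_conf_text.splitlines():
        # Drop anything after a comment marker ('#' or ';').
        for marker in ("#", ";"):
            line = line.split(marker, 1)[0]
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def tcp_port_open(host, port=53, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Assumption: the DNS server listens on TCP as well as UDP.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_resolv_conf(path="/etc/resolv.conf"):
    """Map each configured nameserver to True (reachable) or False."""
    with open(path) as f:
        servers = parse_nameservers(f.read())
    return {server: tcp_port_open(server) for server in servers}
```

Calling check_resolv_conf() on each host and printing the result should show quickly whether any host has a missing or dead nameserver. It does not prove that names actually resolve, so a follow-up lookup of your storage server names is still worth doing.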
A starting place rather than a solution, but these are the first places to look. Good luck!
-Darrell
On May 7, 2019, at 5:14 AM, Alan G <alan+ovirt(a)griff.me.uk> wrote:
Hi,
We have a dev cluster running 4.2. It had to be powered down as the building was going to
lose power. Since we've brought it back up it has been massively unstable (hosts
constantly switching state, VMs migrating all the time).
I now have one host running (with HE) and all others in maintenance mode. When I try
to activate another host I see storage errors in vdsm.log:
2019-05-07 09:41:00,114+0000 ERROR (monitor/a98c0b4) [storage.Monitor] Error checking domain a98c0b42-47b9-4632-8b54-0ff3bd80d4c2 (monitor:424)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 416, in _checkDomainStatus
    masterStats = self.domain.validateMaster()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 941, in validateMaster
    if not self.validateMasterMount():
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockSD.py", line 1377, in validateMasterMount
    return mount.isMounted(self.getMasterDir())
  File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 161, in isMounted
    getMountFromTarget(target)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 173, in getMountFromTarget
    for rec in _iterMountRecords():
  File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 143, in _iterMountRecords
    for rec in _iterKnownMounts():
  File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 139, in _iterKnownMounts
    yield _parseFstabLine(line)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 81, in _parseFstabLine
    fs_spec = fileUtils.normalize_path(_unescape_spaces(fs_spec))
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileUtils.py", line 94, in normalize_path
    host, tail = address.hosttail_split(path)
  File "/usr/lib/python2.7/site-packages/vdsm/common/network/address.py", line 43, in hosttail_split
    raise HosttailError('%s is not a valid hosttail address:' % hosttail)
HosttailError: :/ is not a valid hosttail address:
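The traceback ends while a mount record is being parsed, and the rejected value is ":/", which looks like a host:path spec with an empty host. A minimal Python 3 sketch for spotting such a record follows. It is a heuristic based only on the value in the error, not vdsm's own logic; the function name is hypothetical, and it assumes the records come from a mount table in the usual fstab-style layout such as /proc/mounts.

```python
def find_empty_host_specs(mount_table_text):
    """Return (line_number, fs_spec, mount_point) for every record whose
    fs_spec starts with ':', i.e. a host:path spec with no host part."""
    suspects = []
    for number, line in enumerate(mount_table_text.splitlines(), start=1):
        fields = line.split()
        # Skip blank, truncated, and commented-out lines.
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        fs_spec, mount_point = fields[0], fields[1]
        if fs_spec.startswith(":"):
            suspects.append((number, fs_spec, mount_point))
    return suspects
```

Feeding it the contents of /proc/mounts (and /etc/fstab) on the failing host, for example find_empty_host_specs(open("/proc/mounts").read()), should point at the mount whose server name is missing, if one exists.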
Not sure if it's related but since the restart the hosted_storage domain has been
elected the master domain.
I'm a bit stuck at the moment. My only idea is to remove HE and switch to a
standalone Engine VM running outside the cluster.
Thanks,
Alan