Anyone? Help please!

On 2018/09/11 2:31 PM, Roderick Mooi wrote:
Greetings!

I'm running a 3-node oVirt (3.6) hosted engine (3.6.5.3-1.el7.centos) cluster with GlusterFS (3.7.11) storage. I keep getting this error for my hosted engine storage:

ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Failed to stop monitoring domain (sd_uuid=ff8ce693-5a52-47df-8e06-3443b4dc98a4): Error 900 from stopMonitoringDomain: Storage domain is member of pool: 'domain=ff8ce693-5a52-47df-8e06-3443b4dc98a4'
ovirt_hosted_engine_ha.lib.image.Image:Teardown images
ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Disconnecting the storage
ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Disconnecting storage server

This seems to be related to these ERROR messages in vdsm.log:
Storage.TaskManager.Task::(_setError) Task=`xyz`::Unexpected error
> Storage domain is member of pool
or
> Domain is either partially accessible or entirely inaccessible
or
Storage.HSM::(disconnectStorageServer) Could not disconnect from storageServer

It then updates the config, mounts the storage again, and repeats the whole cycle roughly every minute. This also seems to introduce side effects such as engine state changes, inability to migrate the engine VM, and the hosted engine HA status changing.

Extracts from the logs are attached. The main issues seem to stem from the ERROR entries at lines 214, 778, 1037, etc. of vdsm.log.
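
For anyone going through the extracts, here is a rough Python sketch of one way to locate ERROR lines and count the three messages quoted above. It is only an illustration: the function name `summarize` is made up for this example, and it assumes that error entries contain the literal text "ERROR" and that the quoted messages appear verbatim in the log.

```python
from collections import Counter

# Phrases quoted earlier in this message; matched as plain substrings.
PHRASES = (
    "Storage domain is member of pool",
    "Domain is either partially accessible or entirely inaccessible",
    "Could not disconnect from storageServer",
)


def summarize(lines):
    """Return (1-based numbers of lines containing 'ERROR', phrase counts).

    Assumption: the log level appears as the literal text 'ERROR' on the
    same line as the entry. Phrases are counted on every line, because
    they may sit on a follow-up line rather than on the ERROR line itself.
    """
    error_lines = []
    hits = Counter()
    for number, line in enumerate(lines, start=1):
        if "ERROR" in line:
            error_lines.append(number)
        for phrase in PHRASES:
            if phrase in line:
                hits[phrase] += 1
    return error_lines, hits
```

It can be fed an open file, for example `summarize(open("vdsm.log", errors="replace"))`, where the path is whatever extract or log file is being inspected.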

Everything else seems to be working fine.

Please advise.

Thanks,

Roderick