Hi, I understand we're on an old version, so there might be some reluctance to assist. (We are building a new cluster on the latest oVirt 4, but I need to maintain this one in the meantime and plan to upgrade once I can host the VMs on the new cluster.) I found some related bugs and fixes [e.g. https://bugzilla.redhat.com/show_bug.cgi?id=1317699 and duplicates] but haven't managed to completely resolve this [e.g. by tweaking the engine DB as per https://bugzilla.redhat.com/show_bug.cgi?id=1351203]. Even a pointer to the relevant Red Hat bugs or to a previous list conversation would be appreciated. I can handle minor patches to source files if necessary.

Perhaps someone can help me with a copy of https://access.redhat.com/solutions/2423321? I'm not a Red Hat subscriber, so I cannot access it, and it follows from the bug reports above.

Appreciated,

Roderick

On 2018/09/20 2:55 PM, Roderick Mooi wrote:
Anyone? Help please!

On 2018/09/11 2:31 PM, Roderick Mooi wrote:
Greetings!

I'm running a 3-node oVirt (3.6) hosted engine (3.6.5.3-1.el7.centos) cluster with GlusterFS (3.7.11) storage. I keep getting this error for my hosted engine storage:

ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Failed to stop monitoring domain (sd_uuid=ff8ce693-5a52-47df-8e06-3443b4dc98a4): Error 900 from stopMonitoringDomain: Storage domain is member of pool: 'domain=ff8ce693-5a52-47df-8e06-3443b4dc98a4'
ovirt_hosted_engine_ha.lib.image.Image:Teardown images
ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Disconnecting the storage
ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Disconnecting storage server

These seem to be related to the following ERROR messages in vdsm.log:
Storage.TaskManager.Task::(_setError) Task=`xyz`::Unexpected error
> Storage domain is member of pool
or
> Domain is either partially accessible or entirely inaccessible
or
Storage.HSM::(disconnectStorageServer) Could not disconnect from storageServer

It then updates the config, mounts the storage again, and repeats the whole cycle roughly every minute. This also seems to introduce side effects such as engine state changes, inability to migrate the engine VM, and the hosted engine HA status changing.

Extracts from logs attached. The main issues seem to stem from the ERROR lines 214, 778, 1037, etc. in the vdsm.log...
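
For anyone wanting to check their own logs for the same pattern, here is a minimal Python sketch that lists the line numbers on which the messages quoted above appear and counts them. The function names (find_matches, summarize, scan_file) are hypothetical helpers made up for illustration, not part of vdsm or ovirt-hosted-engine-ha, and the substrings are simply copied from the messages in this mail, so they may need adjusting for other versions.

```python
from collections import Counter

# Substrings copied from the messages quoted in this thread (assumption:
# other oVirt/vdsm versions may word them differently).
PATTERNS = (
    "Failed to stop monitoring domain",
    "Storage domain is member of pool",
    "Domain is either partially accessible or entirely inaccessible",
    "Could not disconnect from storageServer",
)


def find_matches(lines):
    """Return (line_number, pattern) for each line containing a known message.

    Line numbers start at 1. A line is reported once, under the first
    pattern in PATTERNS that it contains.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        for pattern in PATTERNS:
            if pattern in line:
                hits.append((number, pattern))
                break
    return hits


def summarize(hits):
    """Count how often each pattern was seen."""
    return Counter(pattern for _, pattern in hits)


def scan_file(path):
    """Scan a log file at the given path (hypothetical convenience wrapper)."""
    with open(path, errors="replace") as handle:
        return find_matches(handle)
```

Calling scan_file() on a copy of vdsm.log or agent.log and printing the result of summarize() should show whether the errors recur at the roughly one-minute interval described above; the log path has to be supplied by the caller.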

Everything else seems to be working fine.

Please advise.

Thanks,

Roderick