Hi Folks,

I have a setup running with the following specs:

4 VM hosts - CentOS 6.4 - latest oVirt 3.2 from dreyou

vdsm-xmlrpc-4.10.3-0.36.23.el6.noarch
vdsm-cli-4.10.3-0.36.23.el6.noarch
vdsm-python-4.10.3-0.36.23.el6.x86_64
vdsm-4.10.3-0.36.23.el6.x86_64
qemu-kvm-rhev-tools-0.12.1.2-2.355.el6.5.x86_64
qemu-kvm-rhev-0.12.1.2-2.355.el6.5.x86_64
qemu-img-rhev-0.12.1.2-2.355.el6.5.x86_64
gpxe-roms-qemu-0.9.7-6.9.el6.noarch

The management node is also running the latest 3.2 from dreyou:

ovirt-engine-cli-3.2.0.10-1.el6.noarch
ovirt-engine-jbossas711-1-0.x86_64
ovirt-engine-tools-3.2.1-1.41.el6.noarch
ovirt-engine-backend-3.2.1-1.41.el6.noarch
ovirt-engine-sdk-3.2.0.9-1.el6.noarch
ovirt-engine-userportal-3.2.1-1.41.el6.noarch
ovirt-engine-setup-3.2.1-1.41.el6.noarch
ovirt-engine-webadmin-portal-3.2.1-1.41.el6.noarch
ovirt-engine-dbscripts-3.2.1-1.41.el6.noarch
ovirt-engine-3.2.1-1.41.el6.noarch
ovirt-engine-genericapi-3.2.1-1.41.el6.noarch
ovirt-engine-restapi-3.2.1-1.41.el6.noarch


The VMs run from a FreeBSD 9.1 NFS server, which works absolutely flawlessly until I need to reload the /etc/exports file on the NFS server. For this, the NFS server itself doesn't need to be restarted; only the mountd daemon gets HUP'ed, as shown below.
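
For reference, this is how I do the reload on the FreeBSD side (using mountd's standard pid file; only mountd gets the signal, nfsd itself keeps running):

    kill -HUP $(cat /var/run/mountd.pid)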

But after sending a HUP to mountd, oVirt immediately thinks there was a problem with the storage backend and suspends some VMs at random. Luckily, these VMs can be resumed instantly without further issues.

The VM hosts don't show any NFS-related errors, so I suspect that vdsm or the engine checks the NFS server continuously.

The only thing I can find in the vdsm.log of an affected host is:

-- snip --

Thread-539::DEBUG::2013-07-24 22:29:46,935::resourceManager::830::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-539::DEBUG::2013-07-24 22:29:46,935::resourceManager::864::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-539::DEBUG::2013-07-24 22:29:46,935::task::957::TaskManager.Task::(_decref) Task=`9332cd24-d899-4226-b0a2-93544ee737b4`::ref 0 aborting False
libvirtEventLoop::INFO::2013-07-24 22:29:55,142::libvirtvm::2509::vm.Vm::(_onAbnormalStop) vmId=`244f6c8d-bc2b-4669-8f6d-bd957222b946`::abnormal vm stop device virtio-disk0 error eother
libvirtEventLoop::DEBUG::2013-07-24 22:29:55,143::libvirtvm::3079::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`244f6c8d-bc2b-4669-8f6d-bd957222b946`::event Suspended detail 2 opaque None
libvirtEventLoop::INFO::2013-07-24 22:29:55,143::libvirtvm::2509::vm.Vm::(_onAbnormalStop) vmId=`244f6c8d-bc2b-4669-8f6d-bd957222b946`::abnormal vm stop device virtio-disk0 error eother


-- snip --
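
If I read the log right, the pause is not triggered by a vdsm health check at all: the "abnormal vm stop device virtio-disk0 error eother" line is libvirt reporting a guest I/O error, and "event Suspended detail 2" is, as far as I can tell, the IOERROR suspend reason. I assume oVirt configures the virtual disks so that qemu pauses the guest on any I/O error, i.e. something like this in the domain XML (the error_policy='stop' default is my assumption about how oVirt 3.2 sets it up):

    <disk type='file' device='disk'>
      <!-- error_policy='stop' makes qemu pause the guest on any I/O error
           instead of passing it to the guest; assumed oVirt default -->
      <driver name='qemu' type='raw' cache='none' error_policy='stop'/>
      <!-- source/target elements omitted -->
    </disk>

So even a momentary NFS hiccup during the mountd reload would be enough to pause a VM that happens to have I/O in flight at that moment.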

I'm a bit at a dead end currently, since reloading an NFS server's export table isn't an unusual task, and everything on the NFS side works as expected; oVirt just seems way too picky.

Is there any way to make this check a little more tolerant?

I tried setting "sd_health_check_delay = 30" in vdsm.conf (see below), but this didn't change anything.
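
For completeness, this is what I added (I put it in the [irs] section of /etc/vdsm/vdsm.conf, which is where I assume this option belongs, and restarted vdsmd on the host afterwards):

    # /etc/vdsm/vdsm.conf
    [irs]
    # delay between storage domain health checks, in seconds (assumed meaning)
    sd_health_check_delay = 30

followed by:

    service vdsmd restart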

Does anyone have an idea how I can get rid of this annoying problem?

Cheers,

Juergen


--
Sent from the Delta quadrant using Borg technology!