[Users] CentOS 6.4 + Ovirt 3.2 + NFS Backend Problems

Wed Jul 24 17:04:35 EDT 2013

Hi Folks,

I have a setup running with the following specs:

4 VM hosts - CentOS 6.4 - latest oVirt 3.2 from dreyou

vdsm-xmlrpc-4.10.3-0.36.23.el6.noarch
vdsm-cli-4.10.3-0.36.23.el6.noarch
vdsm-python-4.10.3-0.36.23.el6.x86_64
vdsm-4.10.3-0.36.23.el6.x86_64
qemu-kvm-rhev-tools-0.12.1.2-2.355.el6.5.x86_64
qemu-kvm-rhev-0.12.1.2-2.355.el6.5.x86_64
qemu-img-rhev-0.12.1.2-2.355.el6.5.x86_64
gpxe-roms-qemu-0.9.7-6.9.el6.noarch

The management node is also running the latest 3.2 from dreyou

ovirt-engine-cli-3.2.0.10-1.el6.noarch
ovirt-engine-jbossas711-1-0.x86_64
ovirt-engine-tools-3.2.1-1.41.el6.noarch
ovirt-engine-backend-3.2.1-1.41.el6.noarch
ovirt-engine-sdk-3.2.0.9-1.el6.noarch
ovirt-engine-userportal-3.2.1-1.41.el6.noarch
ovirt-engine-setup-3.2.1-1.41.el6.noarch
ovirt-engine-webadmin-portal-3.2.1-1.41.el6.noarch
ovirt-engine-dbscripts-3.2.1-1.41.el6.noarch
ovirt-engine-3.2.1-1.41.el6.noarch
ovirt-engine-genericapi-3.2.1-1.41.el6.noarch
ovirt-engine-restapi-3.2.1-1.41.el6.noarch

The VMs run from a FreeBSD 9.1 NFS server, which works absolutely
flawlessly until I need to reload the /etc/exports file on the NFS server.
For this, the NFS server itself doesn't need to be restarted; the
mountd daemon just gets a HUP signal.

But after sending a HUP to mountd, oVirt immediately thinks that there
was a problem with the storage backend and suspends some VMs at random.
Luckily these VMs can be resumed instantly without further issues.

The VM hosts don't show any NFS-related errors, so I assume that vdsm or
the engine checks the NFS server continuously.

The only thing I can find in the vdsm.log of an affected host is

-- snip --

Thread-539::DEBUG::2013-07-24 22:29:46,935::resourceManager::830::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-539::DEBUG::2013-07-24 22:29:46,935::resourceManager::864::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-539::DEBUG::2013-07-24 22:29:46,935::task::957::TaskManager.Task::(_decref) Task=`9332cd24-d899-4226-b0a2-93544ee737b4`::ref 0 aborting False
libvirtEventLoop::INFO::2013-07-24 22:29:55,142::libvirtvm::2509::vm.Vm::(_onAbnormalStop) vmId=`244f6c8d-bc2b-4669-8f6d-bd957222b946`::abnormal vm stop device virtio-disk0 error eother
libvirtEventLoop::DEBUG::2013-07-24 22:29:55,143::libvirtvm::3079::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`244f6c8d-bc2b-4669-8f6d-bd957222b946`::event Suspended detail 2 opaque None
libvirtEventLoop::INFO::2013-07-24 22:29:55,143::libvirtvm::2509::vm.Vm::(_onAbnormalStop) vmId=`244f6c8d-bc2b-4669-8f6d-bd957222b946`::abnormal vm stop device virtio-disk0 error eother

-- snip --
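
To line these events up with the moment mountd received the HUP, the abnormal-stop entries can be pulled out of vdsm.log with a small script. This is only a rough sketch: it assumes one log entry per line (as in the file itself, not as wrapped by a mail client) and the field layout seen in the excerpt above. The names abnormal_stops and count_by_vm are made up for this example and are not part of vdsm.

```python
import re
from collections import Counter

# Pattern derived from the excerpt above (an assumption, not a vdsm spec):
# captures the timestamp, the VM id, the device and the error reason.
ABNORMAL_STOP = re.compile(
    r"::(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+::.*"
    r"\(_onAbnormalStop\) vmId=`(?P<vm>[0-9a-f-]+)`::"
    r"abnormal vm stop device (?P<dev>\S+) error (?P<err>\S+)"
)


def abnormal_stops(lines):
    """Yield (timestamp, vm_id, device, error) for each abnormal-stop entry."""
    for line in lines:
        match = ABNORMAL_STOP.search(line)
        if match:
            yield (match.group("ts"), match.group("vm"),
                   match.group("dev"), match.group("err"))


def count_by_vm(lines):
    """Count abnormal-stop entries per VM id."""
    return Counter(vm for _, vm, _, _ in abnormal_stops(lines))
```

Feeding it an open log file, for example count_by_vm(open("/var/log/vdsm/vdsm.log")), gives a per-VM count; the timestamps from abnormal_stops can then be compared with the time of the HUP on the NFS server.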

I am at a bit of a dead end currently, since reloading an NFS server's
export table isn't an unusual task and everything else is working as
expected. oVirt just seems way too picky.

Is there any possibility to make this check a little more tolerant?

I tried setting "sd_health_check_delay = 30" in vdsm.conf, but this didn't
change anything.
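
One thing worth ruling out is that the option sits under the wrong section header, or under none at all, in which case it might simply not be picked up. As far as I know vdsm reads this option from the [irs] section, but treat that as an assumption to check against your vdsm version. The sketch below uses only Python's standard configparser; find_option is a made-up helper name, not something vdsm provides.

```python
import configparser


def find_option(conf_text, option):
    """Return [(section, value), ...] for every section that sets `option`.

    Returns None if the text has option lines before any [section] header,
    which configparser cannot parse.
    """
    parser = configparser.RawConfigParser(strict=False)
    try:
        parser.read_string(conf_text)
    except configparser.MissingSectionHeaderError:
        return None
    return [
        (section, parser.get(section, option))
        for section in parser.sections()
        if parser.has_option(section, option)
    ]
```

Called as find_option(open("/etc/vdsm/vdsm.conf").read(), "sd_health_check_delay"), it shows under which section (if any) the value was actually set. Keep in mind that vdsmd normally has to be restarted before a changed vdsm.conf takes effect.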

Does anyone have an idea how I can get rid of this annoying problem?

Cheers,

Juergen

-- 

Sent from the Delta quadrant using Borg technology!