[Users] CentOS 6.4 + Ovirt 3.2 + NFS Backend Problems

squadra squadra at gmail.com
Wed Jul 24 21:35:49 UTC 2013


I may have found a workaround on the NFS server side: an option for the
mountd service:


     -S      Tell mountd to suspend/resume execution of the nfsd threads
             whenever the exports list is being reloaded.  This avoids
             intermittent access errors for clients that do NFS RPCs while
             the exports are being reloaded, but introduces a delay in RPC
             response while the reload is in progress.  If mountd crashes
             while an exports load is in progress, mountd must be restarted
             to get the nfsd threads running again, if this option is used.

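To make this persistent on the FreeBSD side, the flag can be added to
mountd's startup flags in /etc/rc.conf (a sketch from my setup; I am
assuming the stock default of mountd_flags="-r", so keep whatever flags you
already use and just append -S):

    # /etc/rc.conf
    # append -S so mountd pauses the nfsd threads during an exports reload
    mountd_flags="-r -S"

followed by a restart of mountd (e.g. "service mountd restart") so the new
flags take effect.
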
So far, I have been able to reload the exports list twice without any
randomly suspended VM. Let's see whether this is a real solution or whether
I just got lucky twice.

But I am still interested in parameters that make vdsm more tolerant of
short interruptions; instantly suspending a VM after such a short "outage"
is not very nice.
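
For completeness, this is roughly what the vdsm.conf experiment mentioned in
my first mail below looked like (a sketch; I am assuming the option belongs
in the [irs] section and that vdsmd must be restarted on each host for it to
apply):

    # /etc/vdsm/vdsm.conf
    [irs]
    # seconds between storage domain health checks; raising it from the
    # default did not change the suspend behaviour in my case
    sd_health_check_delay = 30

followed by a "service vdsmd restart" on each host.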




On Wed, Jul 24, 2013 at 11:04 PM, squadra <squadra at gmail.com> wrote:

> Hi Folks,
>
> I have a setup running with the following specs:
>
> 4 VM hosts running CentOS 6.4 with the latest oVirt 3.2 from dreyou:
>
> vdsm-xmlrpc-4.10.3-0.36.23.el6.noarch
> vdsm-cli-4.10.3-0.36.23.el6.noarch
> vdsm-python-4.10.3-0.36.23.el6.x86_64
> vdsm-4.10.3-0.36.23.el6.x86_64
> qemu-kvm-rhev-tools-0.12.1.2-2.355.el6.5.x86_64
> qemu-kvm-rhev-0.12.1.2-2.355.el6.5.x86_64
> qemu-img-rhev-0.12.1.2-2.355.el6.5.x86_64
> gpxe-roms-qemu-0.9.7-6.9.el6.noarch
>
> The management node is also running the latest 3.2 from dreyou:
>
> ovirt-engine-cli-3.2.0.10-1.el6.noarch
> ovirt-engine-jbossas711-1-0.x86_64
> ovirt-engine-tools-3.2.1-1.41.el6.noarch
> ovirt-engine-backend-3.2.1-1.41.el6.noarch
> ovirt-engine-sdk-3.2.0.9-1.el6.noarch
> ovirt-engine-userportal-3.2.1-1.41.el6.noarch
> ovirt-engine-setup-3.2.1-1.41.el6.noarch
> ovirt-engine-webadmin-portal-3.2.1-1.41.el6.noarch
> ovirt-engine-dbscripts-3.2.1-1.41.el6.noarch
> ovirt-engine-3.2.1-1.41.el6.noarch
> ovirt-engine-genericapi-3.2.1-1.41.el6.noarch
> ovirt-engine-restapi-3.2.1-1.41.el6.noarch
>
>
> The VMs run from a FreeBSD 9.1 NFS server, which works absolutely
> flawlessly until I need to reload the /etc/exports file on the NFS server.
> For this, the NFS server itself does not need to be restarted; only the
> mountd daemon is sent a HUP.
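>
> For reference, the reload is done roughly like this on the FreeBSD box (a
> sketch; I am assuming the stock pid file location):
>
>     # reload /etc/exports without restarting the NFS server
>     kill -HUP `cat /var/run/mountd.pid`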
>
> But after the HUP is sent to mountd, oVirt immediately decides there was a
> problem with the storage backend and suspends some VMs at random. Luckily,
> these VMs can be resumed instantly without further issues.
>
> The VM hosts do not show any NFS-related errors, so I suspect that vdsm or
> the engine checks the NFS server continuously.
>
> The only thing I can find in the vdsm.log of an affected host is:
>
> -- snip --
>
> Thread-539::DEBUG::2013-07-24 22:29:46,935::resourceManager::830::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
> Thread-539::DEBUG::2013-07-24 22:29:46,935::resourceManager::864::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
> Thread-539::DEBUG::2013-07-24 22:29:46,935::task::957::TaskManager.Task::(_decref) Task=`9332cd24-d899-4226-b0a2-93544ee737b4`::ref 0 aborting False
> libvirtEventLoop::INFO::2013-07-24 22:29:55,142::libvirtvm::2509::vm.Vm::(_onAbnormalStop) vmId=`244f6c8d-bc2b-4669-8f6d-bd957222b946`::abnormal vm stop device virtio-disk0 error eother
> libvirtEventLoop::DEBUG::2013-07-24 22:29:55,143::libvirtvm::3079::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`244f6c8d-bc2b-4669-8f6d-bd957222b946`::event Suspended detail 2 opaque None
> libvirtEventLoop::INFO::2013-07-24 22:29:55,143::libvirtvm::2509::vm.Vm::(_onAbnormalStop) vmId=`244f6c8d-bc2b-4669-8f6d-bd957222b946`::abnormal vm stop device virtio-disk0 error eother
>
>
> -- snip --
>
> I am a bit at a dead end currently, since reloading an NFS server's export
> table is not an unusual task, and everything else works as expected; oVirt
> just seems way too picky.
>
> Is there any way to make this check a little more tolerant?
>
> I tried setting "sd_health_check_delay = 30" in vdsm.conf, but this did
> not change anything.
>
> Does anyone have an idea how I can get rid of this annoying problem?
>
> Cheers,
>
> Juergen
>
>
> --
>
> Sent from the Delta quadrant using Borg technology!
>
>


-- 

Sent from the Delta quadrant using Borg technology!