[Users] CentOS 6.4 + Ovirt 3.2 + NFS Backend Problems

Karli Sjöberg Karli.Sjoberg at slu.se
Mon Jul 29 11:11:59 UTC 2013


On Wed 2013-07-24 at 23:35 +0200, squadra wrote:
I may have found a workaround on the NFS server side: an option for the mountd service:

     -S      Tell mountd to suspend/resume execution of the nfsd threads
             whenever the exports list is being reloaded.  This avoids
             intermittent access errors for clients that do NFS RPCs while
             the exports are being reloaded, but introduces a delay in RPC
             response while the reload is in progress.  If mountd crashes
             while an exports load is in progress, mountd must be restarted
             to get the nfsd threads running again, if this option is used.


So far, I have been able to reload the exports list twice without any VM being randomly suspended. Let's see whether this is a real solution or whether I just got lucky twice.
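To make the man-page description concrete, here is a toy Python model of what -S changes. It is a hypothetical sketch, not FreeBSD mountd code: it assumes a reload briefly empties the export list before loading the new one, and it models "suspending the nfsd threads" as a lock that client RPCs must take. The names ToyExportTable and rpc_during_reload are made up for illustration. An RPC that lands mid-reload fails without the lock and is merely delayed with it, which is the trade-off the man page describes.

```python
import threading


class ToyExportTable:
    """Hypothetical model of an NFS export list; not FreeBSD mountd code.

    Assumption: a reload empties the old list before loading the new one,
    so an RPC arriving in between finds no matching export.
    """

    def __init__(self, exports, suspend_during_reload):
        self._exports = set(exports)
        self._suspend = suspend_during_reload
        # Stands in for "nfsd threads suspended": held for the whole reload.
        self._gate = threading.Lock()

    def reload(self, new_exports, during_reload=lambda: None):
        if self._suspend:
            with self._gate:
                self._swap(new_exports, during_reload)
        else:
            self._swap(new_exports, during_reload)

    def _swap(self, new_exports, during_reload):
        self._exports = set()             # old list flushed
        during_reload()                   # a client RPC may arrive right now
        self._exports = set(new_exports)  # new list in place

    def rpc_allowed(self, path):
        if self._suspend:
            with self._gate:  # blocks until any running reload has finished
                return path in self._exports
        return path in self._exports


def rpc_during_reload(suspend, wait=1.0):
    """Issue one client RPC in the middle of a reload; return whether it succeeded."""
    table = ToyExportTable({"/vmstore"}, suspend_during_reload=suspend)
    result = {}
    client = threading.Thread(
        target=lambda: result.update(ok=table.rpc_allowed("/vmstore")))

    def during_reload():
        client.start()
        client.join(timeout=wait)  # an unsuspended RPC finishes here

    table.reload({"/vmstore", "/iso"}, during_reload)
    client.join()  # a suspended RPC can only finish after the reload
    return result["ok"]


if __name__ == "__main__":
    print("without -S:", rpc_during_reload(suspend=False))  # expected: False
    print("with -S:   ", rpc_during_reload(suspend=True))   # expected: True
```

In the toy model the lock is released even if the reload raises; in the real daemon, a crash mid-reload leaves the nfsd threads suspended, which is the caveat at the end of the man-page entry.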

It would seem we're in the same boat :) I hadn't actually thought about it before, but you're right: issuing a "service mountd reload" does pause a large number of VMs. Frickin' annoying, really. The NFS server doesn't care what or whom it's serving; you could be creating a new export for a completely different system, without even having oVirt in mind, until customers start to call, wondering why their VMs have stopped responding!?

I actually tried that "-S", but it didn't work for me at all, and the mountd man page doesn't mention it either, even though we are presumably running the same version:
# uname -r
9.1-RELEASE

Or are you perhaps tracking "-STABLE", where there's a minor difference?



But I am still interested in parameters that make vdsm more tolerant of short interruptions. Instantly suspending a VM after such a short "outage" is not very nice.

+1!

/Karli


On Wed, Jul 24, 2013 at 11:04 PM, squadra <squadra at gmail.com> wrote:
Hi Folks,


I have a setup running with the following specs:


4 VM hosts: CentOS 6.4, latest oVirt 3.2 from dreyou


vdsm-xmlrpc-4.10.3-0.36.23.el6.noarch
vdsm-cli-4.10.3-0.36.23.el6.noarch
vdsm-python-4.10.3-0.36.23.el6.x86_64
vdsm-4.10.3-0.36.23.el6.x86_64
qemu-kvm-rhev-tools-0.12.1.2-2.355.el6.5.x86_64
qemu-kvm-rhev-0.12.1.2-2.355.el6.5.x86_64
qemu-img-rhev-0.12.1.2-2.355.el6.5.x86_64
gpxe-roms-qemu-0.9.7-6.9.el6.noarch


The management node is also running the latest oVirt 3.2 from dreyou:


ovirt-engine-cli-3.2.0.10-1.el6.noarch
ovirt-engine-jbossas711-1-0.x86_64
ovirt-engine-tools-3.2.1-1.41.el6.noarch
ovirt-engine-backend-3.2.1-1.41.el6.noarch
ovirt-engine-sdk-3.2.0.9-1.el6.noarch
ovirt-engine-userportal-3.2.1-1.41.el6.noarch
ovirt-engine-setup-3.2.1-1.41.el6.noarch
ovirt-engine-webadmin-portal-3.2.1-1.41.el6.noarch
ovirt-engine-dbscripts-3.2.1-1.41.el6.noarch
ovirt-engine-3.2.1-1.41.el6.noarch
ovirt-engine-genericapi-3.2.1-1.41.el6.noarch
ovirt-engine-restapi-3.2.1-1.41.el6.noarch




The VMs are running from a FreeBSD 9.1 NFS server, which works absolutely flawlessly until I need to reload the /etc/exports file on the NFS server. For this, the NFS server itself doesn't need to be restarted; the mountd daemon is just sent a HUP ("HUP'ed").
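For reference, "HUP'ing" mountd means sending SIGHUP to the daemon, whose PID FreeBSD records by default in /var/run/mountd.pid; the traditional shell form is kill -HUP `cat /var/run/mountd.pid`, and the rc.d equivalent is "service mountd reload". A minimal Python equivalent follows, assuming that default pid-file path. The function names are hypothetical, and the signal sender is a parameter so the logic can be exercised without root or a running mountd.

```python
import os
import signal

# FreeBSD's default pid file for mountd (assumed; adjust if yours differs).
MOUNTD_PID_FILE = "/var/run/mountd.pid"


def read_pid(pid_file=MOUNTD_PID_FILE):
    """Return the process ID recorded in a pid file."""
    with open(pid_file) as f:
        return int(f.read().split()[0])


def reload_exports(pid_file=MOUNTD_PID_FILE, send_signal=os.kill):
    """Make mountd re-read /etc/exports by sending it SIGHUP.

    Only mountd is signalled; nfsd itself keeps running. The sender is
    injectable so the logic can be tried without root or a live mountd.
    """
    pid = read_pid(pid_file)
    send_signal(pid, signal.SIGHUP)
    return pid
```

On the NFS server, run as root, reload_exports() performs the same reload that, in this setup, coincides with the VM pauses.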


But after sending a HUP to mountd, oVirt immediately concludes that there was a problem with the storage backend and suspends some random VMs. Luckily, these VMs can be resumed instantly without further issues.


The VM hosts don't show any NFS-related errors, so I assume vdsm or the engine is checking the NFS server continuously.


The only thing I can find in the vdsm.log of an affected host is:


-- snip --


Thread-539::DEBUG::2013-07-24 22:29:46,935::resourceManager::830::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-539::DEBUG::2013-07-24 22:29:46,935::resourceManager::864::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-539::DEBUG::2013-07-24 22:29:46,935::task::957::TaskManager.Task::(_decref) Task=`9332cd24-d899-4226-b0a2-93544ee737b4`::ref 0 aborting False
libvirtEventLoop::INFO::2013-07-24 22:29:55,142::libvirtvm::2509::vm.Vm::(_onAbnormalStop) vmId=`244f6c8d-bc2b-4669-8f6d-bd957222b946`::abnormal vm stop device virtio-disk0 error eother
libvirtEventLoop::DEBUG::2013-07-24 22:29:55,143::libvirtvm::3079::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`244f6c8d-bc2b-4669-8f6d-bd957222b946`::event Suspended detail 2 opaque None
libvirtEventLoop::INFO::2013-07-24 22:29:55,143::libvirtvm::2509::vm.Vm::(_onAbnormalStop) vmId=`244f6c8d-bc2b-4669-8f6d-bd957222b946`::abnormal vm stop device virtio-disk0 error eother

-- snip --
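To see which VMs a given reload actually paused, the abnormal-stop lines can be pulled out of vdsm.log mechanically. This is a minimal sketch, assuming each record sits on one line as in the raw log and that the format matches the excerpt in this thread; the regexes and the summarize() name are our own. The one external fact it relies on: in libvirt's virDomainEventSuspendedDetailType, detail 2 is VIR_DOMAIN_EVENT_SUSPENDED_IOERROR, so "event Suspended detail 2" means libvirt reported the pause as caused by an I/O error.

```python
import re

# One vdsm.log record per line, format as in the excerpt in this thread.
ABNORMAL_STOP = re.compile(
    r"^[^:]+::INFO::"
    r"(?P<ts>\d{4}-\d\d-\d\d \d\d:\d\d:\d\d,\d+)::"
    r".*::vm\.Vm::\(_onAbnormalStop\) vmId=`(?P<vm>[0-9a-f-]+)`::"
    r"abnormal vm stop device (?P<dev>\S+) error (?P<err>\S+)"
)
SUSPENDED_EVENT = re.compile(
    r"vmId=`(?P<vm>[0-9a-f-]+)`::event Suspended detail (?P<detail>\d+)"
)
# libvirt's virDomainEventSuspendedDetailType value for an I/O-error pause.
VIR_DOMAIN_EVENT_SUSPENDED_IOERROR = 2


def summarize(lines):
    """Return (stops, io_paused) for an iterable of vdsm.log lines.

    stops maps vmId -> (first timestamp, device, error) of its abnormal stop;
    io_paused is the set of vmIds libvirt reported as paused by an I/O error.
    """
    stops, io_paused = {}, set()
    for line in lines:
        stop = ABNORMAL_STOP.match(line)
        if stop:
            stops.setdefault(stop["vm"], (stop["ts"], stop["dev"], stop["err"]))
        event = SUSPENDED_EVENT.search(line)
        if event and int(event["detail"]) == VIR_DOMAIN_EVENT_SUSPENDED_IOERROR:
            io_paused.add(event["vm"])
    return stops, io_paused
```

On a host, the log (assumed at its usual path, /var/log/vdsm/vdsm.log) can be passed in directly as an open file, since summarize() only iterates over lines.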


I am at a bit of a dead end at the moment, since reloading an NFS server's export table isn't an unusual task and everything else works as expected. Only oVirt seems way too picky.


Is there any way to make this check a little more tolerant?


I tried setting "sd_health_check_delay = 30" in vdsm.conf, but this didn't change anything.
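One thing worth ruling out is where the option landed in vdsm.conf: an INI-style option only takes effect inside the section the program reads it from. As far as we know, vdsm of this era reads sd_health_check_delay from the [irs] section of /etc/vdsm/vdsm.conf, but treat that section name as an assumption and check it against your vdsm's built-in config defaults. A small Python sketch (the function name is hypothetical) that reports where the option is set:

```python
import configparser

SECTION = "irs"  # assumption: the vdsm.conf section holding storage options
OPTION = "sd_health_check_delay"


def where_is_option(text, option=OPTION, section=SECTION):
    """Describe where `option` is set in the INI-style text of vdsm.conf."""
    parser = configparser.ConfigParser(interpolation=None)
    try:
        parser.read_string(text)
    except configparser.MissingSectionHeaderError:
        # Some key appears before the first [section] line.
        return "keys before the first [section] header"
    found = [s for s in parser.sections() if parser.has_option(s, option)]
    if not found:
        return "not set"
    if section in found:
        return "set in [%s] = %s" % (section, parser.get(section, option))
    return "set only in " + ", ".join("[%s]" % s for s in found)
```

Pass it the contents of /etc/vdsm/vdsm.conf. Since vdsm reads its configuration at startup, a change also needs a restart of the vdsmd service before it can have any effect.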


Has anyone got an idea how I can get rid of this annoying problem?


Cheers,


Juergen




--


Sent from the Delta quadrant using Borg technology!


--

With kind regards
-------------------------------------------------------------------------------
Karli Sjöberg
Swedish University of Agricultural Sciences
Box 7079 (Visiting Address Kronåsvägen 8)
S-750 07 Uppsala, Sweden
Phone:  +46-(0)18-67 15 66
karli.sjoberg at slu.se