<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 TRANSITIONAL//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="GENERATOR" content="GtkHTML/4.4.4">
</head>
<body>
On Wed 2013-07-24 at 23:35 &#43;0200, squadra wrote:
<blockquote type="CITE">Maybe I have found a workaround on the NFS server side: an option for the mountd service
</blockquote>
<blockquote type="CITE"><br>
<br>
</blockquote>
<blockquote type="CITE">&nbsp; &nbsp; &nbsp;-S &nbsp; &nbsp; &nbsp;Tell mountd to suspend/resume execution of the nfsd threads whenever the exports list is being reloaded. &nbsp;This avoids intermittent access errors for clients that do NFS RPCs while the exports are being reloaded, but introduces a delay in RPC response while the reload is in progress. &nbsp;If mountd crashes while an exports load is in progress, mountd must be restarted to get the nfsd threads running again, if this option is used.
</blockquote>
<blockquote type="CITE"><br>
<br>
</blockquote>
<blockquote type="CITE">So far, I have been able to reload the exports list twice without any randomly suspended VM. Let's see whether this is a real solution, or whether I just got lucky twice.<br>
</blockquote>
<br>
It would seem as if we are in the same boat:) Actually I hadn't thought about it before, but you're right; issuing a &quot;service mountd reload&quot; does pause a large number of VMs, frickin' annoying really. I mean, the NFS server doesn't care what or who it's serving;
 you could be creating a new export for a completely different system, without even having oVirt in mind, before customers start to call, wondering why their VMs have stopped responding!?<br>
<br>
I actually tried that &quot;-S&quot; flag, but it didn't work for me at all, and looking at the man page for mountd, there's no mention of it either, even though we are presumably running the same version:<br>
# uname -r<br>
9.1-RELEASE<br>
<br>
Or are you perhaps tracking &quot;-STABLE&quot;, and there's a minor difference there?<br>
<br>
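For what it's worth, if your mountd does turn out to support the flag, the usual FreeBSD way of making it stick would presumably be through /etc/rc.conf, something like this (untested on my side; &quot;-r&quot; is just the stock default flag kept alongside):<br>
<pre>
# /etc/rc.conf -- assuming this build of mountd understands -S
mountd_flags="-r -S"
</pre>
and then a &quot;service mountd restart&quot; to pick it up.<br>
<br>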
<blockquote type="CITE"><br>
<br>
</blockquote>
<blockquote type="CITE">But I am still interested in parameters that make vdsm more tolerant of short interruptions. Instantly suspending a VM after such a short &quot;outage&quot; is not very nice.<br>
</blockquote>
<br>
&#43;1!<br>
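<br>
By the way, when you tried &quot;sd_health_check_delay&quot;, did it go under the right section? If I remember the vdsm.conf layout correctly, it belongs in the [irs] section, roughly like this (section name from memory, so please double-check):<br>
<pre>
# /etc/vdsm/vdsm.conf -- [irs] section name from memory, please verify
[irs]
sd_health_check_delay = 30
</pre>
followed by a restart of vdsmd on the host.<br>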
<br>
/Karli<br>
<br>
<blockquote type="CITE"><br>
<br>
</blockquote>
<blockquote type="CITE">On Wed, Jul 24, 2013 at 11:04 PM, squadra &lt;<a href="mailto:squadra@gmail.com">squadra@gmail.com</a>&gt; wrote:
</blockquote>
<blockquote type="CITE">
<blockquote>Hi Folks, </blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote><br>
<br>
</blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>i got a Setup running with the following Specs </blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote><br>
<br>
</blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>4 VM Hosts - CentOS 6.4 - latest Ovirt 3.2 from dreyou </blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote><br>
<br>
</blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>vdsm-xmlrpc-4.10.3-0.36.23.el6.noarch </blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>vdsm-cli-4.10.3-0.36.23.el6.noarch </blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>vdsm-python-4.10.3-0.36.23.el6.x86_64 </blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>vdsm-4.10.3-0.36.23.el6.x86_64 </blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>qemu-kvm-rhev-tools-0.12.1.2-2.355.el6.5.x86_64 </blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>qemu-kvm-rhev-0.12.1.2-2.355.el6.5.x86_64 </blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>qemu-img-rhev-0.12.1.2-2.355.el6.5.x86_64 </blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>gpxe-roms-qemu-0.9.7-6.9.el6.noarch </blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote><br>
<br>
</blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>Management Node is also running latest 3.2 from dreyou </blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote><br>
<br>
</blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>ovirt-engine-cli-3.2.0.10-1.el6.noarch </blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>ovirt-engine-jbossas711-1-0.x86_64 </blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>ovirt-engine-tools-3.2.1-1.41.el6.noarch </blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>ovirt-engine-backend-3.2.1-1.41.el6.noarch </blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>ovirt-engine-sdk-3.2.0.9-1.el6.noarch </blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>ovirt-engine-userportal-3.2.1-1.41.el6.noarch </blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>ovirt-engine-setup-3.2.1-1.41.el6.noarch </blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>ovirt-engine-webadmin-portal-3.2.1-1.41.el6.noarch </blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>ovirt-engine-dbscripts-3.2.1-1.41.el6.noarch </blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>ovirt-engine-3.2.1-1.41.el6.noarch </blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>ovirt-engine-genericapi-3.2.1-1.41.el6.noarch </blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>ovirt-engine-restapi-3.2.1-1.41.el6.noarch </blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote><br>
<br>
</blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote><br>
<br>
</blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>The VMs are running from a FreeBSD 9.1 NFS server, which works absolutely flawlessly until I need to reload the /etc/exports file on the NFS server. For this, the NFS server itself doesn't need to be restarted; just the mountd daemon is &quot;HUP'ed&quot;.&nbsp;
</blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote><br>
<br>
</blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>But after sending a HUP to mountd, oVirt immediately thinks that there was a problem with the storage backend and suspends some random VMs. Luckily these VMs can be resumed instantly without further issues.&nbsp;
</blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote><br>
<br>
</blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>The VM hosts don't show any NFS-related errors, so I expect that vdsm or the engine checks the NFS server continuously.&nbsp;
</blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote><br>
<br>
</blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>The only thing I can find in the vdsm.log of an affected host is </blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote><br>
<br>
</blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>-- snip -- </blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote><br>
<br>
</blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>Thread-539::DEBUG::2013-07-24 22:29:46,935::resourceManager::830::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
</blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>Thread-539::DEBUG::2013-07-24 22:29:46,935::resourceManager::864::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
</blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>Thread-539::DEBUG::2013-07-24 22:29:46,935::task::957::TaskManager.Task::(_decref) Task=`9332cd24-d899-4226-b0a2-93544ee737b4`::ref 0 aborting False
</blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>libvirtEventLoop::INFO::2013-07-24 22:29:55,142::libvirtvm::2509::vm.Vm::(_onAbnormalStop) vmId=`244f6c8d-bc2b-4669-8f6d-bd957222b946`::abnormal vm stop device virtio-disk0 error eother
</blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>libvirtEventLoop::DEBUG::2013-07-24 22:29:55,143::libvirtvm::3079::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`244f6c8d-bc2b-4669-8f6d-bd957222b946`::event Suspended detail 2 opaque None
</blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>libvirtEventLoop::INFO::2013-07-24 22:29:55,143::libvirtvm::2509::vm.Vm::(_onAbnormalStop) vmId=`244f6c8d-bc2b-4669-8f6d-bd957222b946`::abnormal vm stop device virtio-disk0 error eother
</blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote><br>
<br>
</blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote><br>
<br>
</blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>-- snip -- </blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote><br>
<br>
</blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>I am a little bit at a dead end currently, since reloading an NFS server's export table isn't an unusual task, and everything else is working as expected. Just oVirt seems way too picky.&nbsp;
</blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote><br>
<br>
</blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>Is there any possibility of making this check a little more tolerant?
</blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote><br>
<br>
</blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>I tried setting &quot;sd_health_check_delay = 30&quot; in vdsm.conf, but this didn't change anything.
</blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote><br>
<br>
</blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>Anyone got an idea how I can get rid of this annoying problem? </blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote><br>
<br>
</blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>Cheers, </blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote><br>
<br>
</blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote>Juergen </blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote><br>
<br>
</blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote><br>
<br>
</blockquote>
</blockquote>
<blockquote type="CITE">
<blockquote><font color="#888888">-- </font>
<pre>
<font color="#888888">Sent from the Delta quadrant using Borg technology!</font>
</pre>
</blockquote>
</blockquote>
<blockquote type="CITE"><br>
<br>
</blockquote>
<blockquote type="CITE"><br>
<br>
</blockquote>
<blockquote type="CITE">--
<pre>
Sent from the Delta quadrant using Borg technology!
</pre>
</blockquote>
<br>
<table cellspacing="0" cellpadding="0" width="100%">
<tbody>
<tr>
<td>-- <br>
<br>
With Kind Regards<br>
-------------------------------------------------------------------------------<br>
Karli Sjöberg<br>
Swedish University of Agricultural Sciences<br>
Box 7079 (Visiting Address Kronåsvägen 8)<br>
S-750 07 Uppsala, Sweden<br>
Phone: &nbsp;&#43;46-(0)18-67 15 66<br>
<a href="mailto:karli.sjoberg@adm.slu.se">karli.sjoberg@slu.se</a> </td>
</tr>
</tbody>
</table>
</body>
</html>