
<p dir="ltr">On Aug 22, 2016 10:57 PM, &quot;Ekin Meroğlu&quot; &lt;<a href="mailto:ekin.meroglu@linuxera.com">ekin.meroglu@linuxera.com</a>&gt; wrote:<br>

&gt;<br>

&gt; Hi Yaniv,<br>

&gt;<br>

&gt;&gt; On Sun, Aug 7, 2016 at 9:37 PM, Ekin Meroğlu &lt;<a href="mailto:ekin.meroglu@linuxera.com">ekin.meroglu@linuxera.com</a>&gt; wrote:<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; Hi,<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; Just a reminder, if you have power management configured, first turn that off for the host - when you restart vdsmd with the power management configured, engine finds it not responding and tries to fence (e.g. reboot) the host.<br>

&gt;&gt;<br>

&gt;&gt;<br>

&gt;&gt; That&#39;s not true - if it&#39;s a graceful restart, it should not happen.<br>

&gt;<br>

&gt;<br>

&gt; Can you explain this a little more? Is there a mechanism to prevent fencing in this scenario? <br>

&gt;<br>

&gt; In two of our customers&#39; production systems we&#39;ve experienced this exact behavior (i.e. the engine fencing the host while the vdsm service was being restarted manually) a number of times, and we were specifically advised by Red Hat Support to turn off PM before restarting the service. I&#39;d like to know if we have a better / easier way to restart vdsm. <br>

&gt;<br>

&gt; btw, both of the environments were RHEV-H based RHEV 3.5 clusters, and both were busy systems, so restarting the vdsm service took quite a long time. I&#39;m guessing this might be a factor.</p>
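The workaround described here (turn power management off for the host, restart vdsmd, then turn it back on) can be scripted. The sketch below only builds the HTTP request; it assumes the oVirt v4 REST API accepts a PUT on hosts/HOST_ID whose XML body sets host, power_management, enabled. The engine URL, host ID and credentials are hypothetical placeholders, so check the request against your engine&#39;s API documentation before sending it.

```python
import base64
import urllib.request
import xml.etree.ElementTree as ET


def power_management_request(engine_api, host_id, enabled, user, password):
    """Build (but do not send) a PUT that enables or disables host power management.

    Assumption: the oVirt v4 REST API accepts PUT on hosts/HOST_ID with an
    XML body of the form host -> power_management -> enabled.
    """
    host = ET.Element("host")
    pm = ET.SubElement(host, "power_management")
    ET.SubElement(pm, "enabled").text = "true" if enabled else "false"
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    return urllib.request.Request(
        url=f"{engine_api}/hosts/{host_id}",
        data=ET.tostring(host),  # bytes, without an XML declaration
        method="PUT",
        headers={
            "Content-Type": "application/xml",
            "Accept": "application/xml",
            "Authorization": f"Basic {token}",  # HTTP basic auth
        },
    )
```

For example, `req = power_management_request("https://engine.example.com/ovirt-engine/api", host_id, False, "admin@internal", password)` followed by `urllib.request.urlopen(req)` would disable PM (TLS certificate setup omitted); the same call with `True` re-enables it once vdsmd is back up.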

<p dir="ltr">That indeed might be the factor - but vdsm should not take long to restart. If it happens on a more recent version, I&#39;d be happy to know about it, as we&#39;ve done work on ensuring that it restarts and responds to the engine quickly (as far as I remember, even before it has fully completed the restart). <br>

Y. </p>
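One rough way to see how quickly vdsm comes back is to time how long its TCP port takes to accept connections again after `systemctl restart vdsmd`. This is a minimal sketch assuming vdsm listens on its usual port 54321; an open port only shows the daemon is accepting connections, not that it is already answering the engine&#39;s requests.

```python
import socket
import time


def wait_for_port(host, port, timeout=120.0, interval=1.0):
    """Poll until a TCP connect to host:port succeeds.

    Returns the seconds waited, or None if the port stayed closed for `timeout` seconds.
    """
    start = time.monotonic()
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return time.monotonic() - start
        except OSError:
            # Connection refused or timed out: give up only after the overall timeout.
            if time.monotonic() - start >= timeout:
                return None
            time.sleep(interval)
```

For example, running `wait_for_port("localhost", 54321)` on the host right after restarting vdsmd gives the number of seconds until the port reopened; `None` means it stayed closed for the whole timeout.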

<p dir="ltr">&gt;<br>

&gt; Regards,<br>

&gt; <br>

&gt;&gt;<br>

&gt;&gt;  <br>

&gt;&gt;<br>

&gt;&gt;  <br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; Other than that, restarting vdsmd has been safe in my experience...<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; Regards,  <br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; On Thu, Aug 4, 2016 at 6:10 PM, Nicolás &lt;<a href="mailto:nicolas@devels.es">nicolas@devels.es</a>&gt; wrote:<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; On 04/08/16 at 15:25, Arik Hadas wrote:<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; ----- Original Message -----<br>

&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt; On 2016-08-04 08:24, Arik Hadas wrote:<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt; ----- Original Message -----<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; On 04/08/16 at 07:18, Arik Hadas wrote:<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; ----- Original Message -----<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; Hi,<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; We&#39;re running oVirt 4.0.1 and today I found out that one of our hosts<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; has all its VMs in an unknown state. I actually don&#39;t know how (and<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; when) this happened, but I&#39;d like to restore service, possibly without<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; turning off these machines. The host is up, the VMs are up, &#39;qemu&#39;<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; processes exist, no errors; it&#39;s just that the VMs running on it have a<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; &#39;?&#39; where their status should be shown.<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; Is it safe in this case to simply modify the database and set those VMs&#39;<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; status to &#39;up&#39;? I remember having to do this some time ago when we faced<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; storage issues, it didn&#39;t break anything back then. If not, is there a<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; &quot;safe&quot; way to migrate those VMs to a different host and restart the<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; host<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; that marked them as unknown?<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; Hi Nicolás,<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; I assume that the host these VMs are running on is empty in the<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; webadmin,<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; right? If that is the case then you&#39;ve probably hit [1]. Changing their<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; status to up is not the way to go since these VMs will not be monitored.<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; Hi Arik,<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; By &quot;empty&quot; you mean the webadmin reports the host as running 0 VMs?<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; If so, that&#39;s not the case, actually the VM count seems to be correct<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; in<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; relation to &quot;qemu-*&quot; processes (about 32 VMs), I can even see the<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; machines in the &quot;Virtual machines&quot; tab of the host, it&#39;s just that they are<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; all marked with the &#39;?&#39; mark.<br>
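The comparison made here, webadmin VM count against the number of `qemu-*` processes on the host, can be done with a short script. A minimal sketch, assuming a Linux `/proc` where each process name is in `/proc/PID/comm` and that the VM processes are the ones whose name starts with `qemu` (that prefix is the only match criterion):

```python
import os


def count_qemu_processes(proc_root="/proc"):
    """Count processes whose name (from proc_root/PID/comm) starts with 'qemu'."""
    count = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                name = f.read().strip()
        except OSError:
            continue  # process exited (or has no comm file) while scanning
        if name.startswith("qemu"):
            count += 1
    return count
```

Running `print(count_qemu_processes())` on the host gives a number to compare with the VM count in the host&#39;s Virtual Machines sub-tab.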

&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt; No, I meant the &#39;Host&#39; column in the Virtual Machines tab but if you<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt; see<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt; the VMs in the &quot;Virtual machines&quot; sub-tab of the host then run_on_vds<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt; points to the right host.<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt; The host is up in the webadmin as well?<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt; Can you share the engine log?<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt; Yes, the host is up in the webadmin, there are no issues with it, just<br>

&gt;&gt;&gt;&gt;&gt;&gt; the VMs running on it have the &#39;?&#39; mark. I&#39;ve made 3 tests:<br>

&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt; 1) Restart engine: did not help<br>

&gt;&gt;&gt;&gt;&gt;&gt; 2) Check firewall, seems to be ok.<br>

&gt;&gt;&gt;&gt;&gt;&gt; 3) PostgreSQL: UPDATE vm_dynamic SET status = 1 WHERE status = 8;<br>

&gt;&gt;&gt;&gt;&gt;&gt; After a while, I see lots of entries like this:<br>

&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt;       2016-08-04 09:23:10,910 WARN<br>

&gt;&gt;&gt;&gt;&gt;&gt; [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]<br>

&gt;&gt;&gt;&gt;&gt;&gt; (DefaultQuartzScheduler4) [6ad135b8] Correlation ID: null, Call Stack:<br>

&gt;&gt;&gt;&gt;&gt;&gt; null, Custom Event ID: -1, Message: VM xxx is not responding.<br>

&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt; I&#39;m attaching the engine log, but I don&#39;t know when this happened for<br>

&gt;&gt;&gt;&gt;&gt;&gt; the first time, though. If there&#39;s a manual way/command to migrate VMs<br>

&gt;&gt;&gt;&gt;&gt;&gt; to a different host I&#39;d appreciate a hint about it.<br>
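To find when the VMs were first reported as not responding, engine.log can be scanned for the audit message quoted above. This sketch assumes each entry sits on a single line in the `2016-08-04 09:23:10,910 WARN ... Message: VM xxx is not responding.` format shown in the excerpt, that VM names contain no spaces, and that the log is read in time order:

```python
import re

# Timestamp at the start of the line, then the audit message naming the VM.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*Message: VM (\S+) is not responding"
)


def first_not_responding(lines):
    """Map each VM name to the timestamp of its first 'is not responding' entry."""
    first = {}
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            ts, vm = m.groups()
            first.setdefault(vm, ts)  # keep the first occurrence only
    return first
```

For example, `first_not_responding(open("/var/log/ovirt-engine/engine.log"))`; older rotated logs would have to be read first, oldest to newest, using `gzip.open(path, "rt")` for any that are compressed.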

&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt; Is it safe to restart vdsmd on this host?<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; The engine log looks fine - the VMs are reported as not-responding for<br>

&gt;&gt;&gt;&gt;&gt; some reason. I would restart libvirtd and vdsmd then<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; Is restarting those two daemons safe? I mean, will that stop all qemu-* processes, so the VMs marked as unknown will stop?<br>
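Whether the qemu processes survive the daemon restart can be checked directly instead of assumed. A minimal sketch, assuming `pgrep` and `systemctl` are available on the host and that the script runs as root; `restart_and_report` is a hypothetical helper name, and, following the power management reminder earlier in the thread, PM should be disabled before running it:

```python
import subprocess


def qemu_pids():
    """Return the set of PIDs whose process name matches 'qemu' (via pgrep)."""
    result = subprocess.run(["pgrep", "qemu"], capture_output=True, text=True)
    # pgrep exits with 1 when nothing matches; that is not an error here.
    if result.returncode not in (0, 1):
        raise RuntimeError(result.stderr.strip())
    return {int(pid) for pid in result.stdout.split()}


def lost_pids(before, after):
    """PIDs present before the restart but missing afterwards, sorted."""
    return sorted(before - after)


def restart_and_report(services=("libvirtd", "vdsmd")):
    """Snapshot qemu PIDs, restart the given services, and snapshot again."""
    before = qemu_pids()
    subprocess.run(["systemctl", "restart", *services], check=True)
    after = qemu_pids()
    return before, after, lost_pids(before, after)
```

Calling `before, after, lost = restart_and_report()` on the host returns both snapshots; an empty `lost` list means every qemu process seen before the restart was still running afterwards.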

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt; Thanks.<br>

&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; Thanks.<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; Yes, there is no way to resolve it other than changing the DB, but<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; the change should be to update run_on_vds field of these VMs to the host<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; you know they are running on. Their status will then be updated in 15<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; sec.<br>
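The fix suggested here is to point `run_on_vds` at the host the VMs are actually running on. The sketch shows the shape of that UPDATE against an in-memory SQLite mock, not the real PostgreSQL engine database; apart from `vm_dynamic` and `run_on_vds`, which appear in this thread, the table and column names (`vm_guid`, `vds_static`, `vds_id`, `vds_name`) are assumptions about the engine schema, and `point_vms_at_host` is a hypothetical helper:

```python
import sqlite3

# Toy stand-in for the engine database with only the columns used below.
SCHEMA = """
CREATE TABLE vds_static (vds_id TEXT PRIMARY KEY, vds_name TEXT);
CREATE TABLE vm_dynamic (vm_guid TEXT PRIMARY KEY, status INTEGER, run_on_vds TEXT);
"""

# Look up the host ID by name and assign it to one VM.
FIX = """
UPDATE vm_dynamic
   SET run_on_vds = (SELECT vds_id FROM vds_static WHERE vds_name = ?)
 WHERE vm_guid = ?
"""


def point_vms_at_host(conn, host_name, vm_guids):
    """Set run_on_vds of the given VMs to the ID of the named host."""
    with conn:  # one transaction: commit all updates together, roll back on error
        for guid in vm_guids:
            conn.execute(FIX, (host_name, guid))
```

If the host name matches no row, the subquery yields NULL and `run_on_vds` is cleared, so it is worth checking the name first.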

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; [1] <a href="https://bugzilla.redhat.com/show_bug.cgi?id=1354494">https://bugzilla.redhat.com/show_bug.cgi?id=1354494</a><br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; Arik.<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; Thanks.<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; Nicolás<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; _______________________________________________<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; Users mailing list<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; <a href="mailto:Users@ovirt.org">Users@ovirt.org</a><br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; <a href="http://lists.ovirt.org/mailman/listinfo/users">http://lists.ovirt.org/mailman/listinfo/users</a><br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;<br>


&gt;&gt;&gt;<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; -- <br>

&gt;&gt;&gt; Ekin Meroğlu Red Hat Certified Architect <br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; linuxera Özgür Yazılım Çözüm ve Hizmetleri <br>

&gt;&gt;&gt; T +90 (850) 22 LINUX | GSM +90 (532) 137 77 04<br>

&gt;&gt;&gt; <a href="http://www.linuxera.com">www.linuxera.com</a> | <a href="mailto:bilgi@linuxera.com">bilgi@linuxera.com</a><br>

&gt;&gt;&gt;<br>


&gt;&gt;&gt;<br>

&gt;&gt;<br>

&gt;<br>

&gt;<br>

&gt;<br>

</p>