<p dir="ltr">On Aug 22, 2016 10:57 PM, "Ekin Meroğlu" <<a href="mailto:ekin.meroglu@linuxera.com">ekin.meroglu@linuxera.com</a>> wrote:<br>
><br>
> Hi Yaniv,<br>
><br>
>> On Sun, Aug 7, 2016 at 9:37 PM, Ekin Meroğlu <<a href="mailto:ekin.meroglu@linuxera.com">ekin.meroglu@linuxera.com</a>> wrote:<br>
>>><br>
>>> Hi,<br>
>>><br>
>>> Just a reminder: if you have power management configured, turn it off for the host first. When you restart vdsmd with power management configured, the engine finds the host not responding and tries to fence (e.g. reboot) it.<br>
>><br>
>><br>
>> That's not true - if it's a graceful restart, it should not happen.<br>
><br>
><br>
> Can you explain this a little more? Is there a mechanism to prevent fencing in this scenario? <br>
><br>
> In two of our customers' production systems we've experienced this exact behavior (i.e. the engine fencing the host while the vdsm service was being restarted manually) a number of times, and we were specifically advised by Red Hat Support to turn off PM before restarting the service. I'd like to know if there is a better / easier way to restart vdsm. <br>
><br>
> btw, both of the environments were RHEV-H based RHEV 3.5 clusters, and both were busy systems, so restarting the vdsm service took quite a long time. I'm guessing this might be a factor.</p>
<p dir="ltr">That might indeed be the factor, but vdsm should not take long to restart. If it happens on a more recent version, I'd be happy to know about it, as we've done work to ensure that it restarts and answers the engine quickly (as far as I remember, even before it has fully completed the restart). <br>
Y. </p>
<p dir="ltr">><br>
> Regards,<br>
> <br>
>><br>
>> <br>
>><br>
>> <br>
>>><br>
>>><br>
>>> Other than that, restarting vdsmd has been safe in my experience...<br>
>>><br>
>>> Regards, <br>
>>><br>
>>> On Thu, Aug 4, 2016 at 6:10 PM, Nicolás <<a href="mailto:nicolas@devels.es">nicolas@devels.es</a>> wrote:<br>
>>>><br>
>>>><br>
>>>><br>
>>>> On 04/08/16 at 15:25, Arik Hadas wrote:<br>
>>>>><br>
>>>>><br>
>>>>> ----- Original Message -----<br>
>>>>>><br>
>>>>>> On 2016-08-04 08:24, Arik Hadas wrote:<br>
>>>>>>><br>
>>>>>>> ----- Original Message -----<br>
>>>>>>>><br>
>>>>>>>><br>
>>>>>>>> On 04/08/16 at 07:18, Arik Hadas wrote:<br>
>>>>>>>>><br>
>>>>>>>>> ----- Original Message -----<br>
>>>>>>>>>><br>
>>>>>>>>>> Hi,<br>
>>>>>>>>>><br>
>>>>>>>>>> We're running oVirt 4.0.1 and today I found out that one of our hosts<br>
>>>>>>>>>> has all its VMs in an unknown state. I actually don't know how (or<br>
>>>>>>>>>> when) this happened, but I'd like to restore service, if possible without<br>
>>>>>>>>>> turning off these machines. The host is up, the VMs are up, the 'qemu'<br>
>>>>>>>>>> processes exist and there are no errors; it's just that the VMs running on it<br>
>>>>>>>>>> have a '?' where the status is shown.<br>
>>>>>>>>>><br>
>>>>>>>>>> Is it safe in this case to simply modify the database and set those VMs'<br>
>>>>>>>>>> status to 'up'? I remember having to do this some time ago when we faced<br>
>>>>>>>>>> storage issues, and it didn't break anything back then. If not, is there a<br>
>>>>>>>>>> "safe" way to migrate those VMs to a different host and restart the host<br>
>>>>>>>>>> that marked them as unknown?<br>
>>>>>>>>><br>
>>>>>>>>> Hi Nicolás,<br>
>>>>>>>>><br>
>>>>>>>>> I assume that the host these VMs are running on is empty in the webadmin,<br>
>>>>>>>>> right? If that is the case, then you've probably hit [1]. Changing their<br>
>>>>>>>>> status to up is not the way to go, since these VMs will not be monitored.<br>
>>>>>>>><br>
>>>>>>>> Hi Arik,<br>
>>>>>>>><br>
>>>>>>>> By "empty", do you mean the webadmin reports the host as running 0 VMs?<br>
>>>>>>>> If so, that's not the case: the VM count actually seems to match the<br>
>>>>>>>> number of "qemu-*" processes (about 32 VMs), and I can even see the<br>
>>>>>>>> machines in the "Virtual machines" tab of the host; it's just that they are<br>
>>>>>>>> all marked with the '?' mark.<br>
>>>>>>><br>
>>>>>>> No, I meant the 'Host' column in the Virtual Machines tab, but if you see<br>
>>>>>>> the VMs in the "Virtual machines" sub-tab of the host, then run_on_vds<br>
>>>>>>> points to the right host.<br>
>>>>>>><br>
>>>>>>> The host is up in the webadmin as well?<br>
>>>>>>> Can you share the engine log?<br>
>>>>>>><br>
>>>>>> Yes, the host is up in the webadmin and there are no issues with it; just<br>
>>>>>> the VMs running on it have the '?' mark. I've tried 3 things:<br>
>>>>>><br>
>>>>>> 1) Restart engine: did not help.<br>
>>>>>> 2) Check firewall: seems to be OK.<br>
>>>>>> 3) PostgreSQL: UPDATE vm_dynamic SET status = 1 WHERE status = 8;<br>
>>>>>> After a while, I see lots of entries like this:<br>
>>>>>><br>
>>>>>> 2016-08-04 09:23:10,910 WARN<br>
>>>>>> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]<br>
>>>>>> (DefaultQuartzScheduler4) [6ad135b8] Correlation ID: null, Call Stack:<br>
>>>>>> null, Custom Event ID: -1, Message: VM xxx is not responding.<br>
>>>>>><br>
>>>>>> I'm attaching the engine log, though I don't know when this happened for<br>
>>>>>> the first time. If there's a manual way/command to migrate VMs<br>
>>>>>> to a different host, I'd appreciate a hint about it.<br>
>>>>>><br>
>>>>>> Is it safe to restart vdsmd on this host?<br>
>>>>><br>
>>>>> The engine log looks fine: the VMs are reported as not-responding for<br>
>>>>> some reason. I would restart libvirtd and vdsmd then.<br>
>>>><br>
>>>><br>
>>>> Is restarting those two daemons safe? I mean, will that stop all the qemu-* processes, and with them the VMs marked as unknown?<br>
>>>><br>
>>>><br>
>>>>>> Thanks.<br>
>>>>>><br>
>>>>>>>> Thanks.<br>
>>>>>>>><br>
>>>>>>>>> Yes, there is no way to resolve it other than changing the DB, but<br>
>>>>>>>>> the change should be to update the run_on_vds field of these VMs to the host<br>
>>>>>>>>> you know they are running on. Their status will then be updated within 15<br>
>>>>>>>>> seconds.<br>
>>>>>>>>><br>
>>>>>>>>> [1] <a href="https://bugzilla.redhat.com/show_bug.cgi?id=1354494">https://bugzilla.redhat.com/show_bug.cgi?id=1354494</a><br>
>>>>>>>>><br>
>>>>>>>>> Arik.<br>
>>>>>>>>><br>
>>>>>>>>>> Thanks.<br>
>>>>>>>>>><br>
>>>>>>>>>> Nicolás<br>
>>>>>>>>>><br>
>>>>>>>>>> _______________________________________________<br>
>>>>>>>>>> Users mailing list<br>
>>>>>>>>>> <a href="mailto:Users@ovirt.org">Users@ovirt.org</a><br>
>>>>>>>>>> <a href="http://lists.ovirt.org/mailman/listinfo/users">http://lists.ovirt.org/mailman/listinfo/users</a><br>
>>>>>>>>>><br>
>>>>>>>><br>
>>>><br>
>>><br>
>>><br>
>>><br>
>>><br>
>>> -- <br>
>>> Ekin Meroğlu Red Hat Certified Architect <br>
>>><br>
>>> linuxera Özgür Yazılım Çözüm ve Hizmetleri <br>
>>> T +90 (850) 22 LINUX | GSM +90 (532) 137 77 04<br>
>>> <a href="http://www.linuxera.com">www.linuxera.com</a> | <a href="mailto:bilgi@linuxera.com">bilgi@linuxera.com</a><br>
>>><br>
>>><br>
>><br>
><br>
><br>
><br>
</p>
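<p dir="ltr">Arik's suggestion in the thread (update run_on_vds of the affected VMs to the host they actually run on, rather than forcing their status to up) can be sketched as a single, explicitly scoped UPDATE. This is a minimal illustration, not a tested procedure for the engine database: it uses an in-memory SQLite table as a stand-in for the engine's PostgreSQL vm_dynamic table, and the vm_guid key column, the helper name repoint_vms, and the sample IDs are assumptions. With a PostgreSQL driver such as psycopg2 the placeholders would be %s instead of ?.</p>

```python
import sqlite3

# Minimal in-memory stand-in for the engine's vm_dynamic table.
# Only `status` and `run_on_vds` come from the thread; the `vm_guid`
# key column and the sample IDs below are assumptions for illustration.
SCHEMA = """
CREATE TABLE vm_dynamic (
    vm_guid    TEXT PRIMARY KEY,
    status     INTEGER NOT NULL,
    run_on_vds TEXT
)
"""


def repoint_vms(conn, vm_guids, host_id):
    """Hypothetical helper: point run_on_vds of the given VMs at host_id.

    `status` is deliberately left untouched; the idea in the thread is
    that the engine refreshes it once run_on_vds is correct.
    Returns the number of rows changed.
    """
    if not vm_guids:
        return 0
    placeholders = ", ".join("?" for _ in vm_guids)
    sql = f"UPDATE vm_dynamic SET run_on_vds = ? WHERE vm_guid IN ({placeholders})"
    with conn:  # commits on success, rolls back if the UPDATE raises
        cur = conn.execute(sql, [host_id, *vm_guids])
    return cur.rowcount


# Example: two VMs with a missing or stale run_on_vds, one already correct.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO vm_dynamic (vm_guid, status, run_on_vds) VALUES (?, ?, ?)",
    [("vm-a", 8, None), ("vm-b", 8, "stale-host"), ("vm-c", 1, "host-2")],
)
changed = repoint_vms(conn, ["vm-a", "vm-b"], "host-1")
print(changed)  # 2
```

<p dir="ltr">Scoping the WHERE clause by VM id, rather than by status as in the status = 8 query earlier in the thread, limits the change to the VMs you have verified are running on that host.</p>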