[ovirt-users] Migrate machines in unknown state?

Ekin Meroğlu ekin.meroglu at linuxera.com
Mon Aug 22 19:57:27 UTC 2016


Hi Yaniv,

On Sun, Aug 7, 2016 at 9:37 PM, Ekin Meroğlu <ekin.meroglu at linuxera.com>
> wrote:
>
>> Hi,
>>
>> Just a reminder, if you have power management configured, first turn that
>> off for the host - when you restart vdsmd with the power management
>> configured, engine finds it not responding and tries to fence (e.g. reboot)
>> the host.
>>
>
> That's not true - if it's a graceful restart, it should not happen.
>

Can you explain this a little more? Is there a mechanism to prevent
fencing in this scenario?

In two of our customers' production systems we've experienced this exact
behavior (i.e. the engine fencing the host while the vdsm service was being
restarted manually) a number of times, and we were specifically advised by
Red Hat Support to turn off PM before restarting the service. I'd like to
know if there is a better / easier way to restart vdsm.

By the way, both of the environments were RHEV-H based RHEV 3.5 clusters,
and both were busy systems, so restarting the vdsm service took quite a long
time. I'm guessing this might be a factor.
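
To make the timing guess above concrete, here is a minimal Python sketch of
the behaviour we think we saw. It is a toy model, not engine code: the
`Host` class, the `would_fence` function, the `grace_seconds` parameter with
its 60-second default, and the host name are all assumptions made up for
illustration. The real engine logic may differ, for example in how it treats
a graceful restart.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical stand-in for a hypervisor host as seen by the engine."""
    name: str
    power_management_enabled: bool


def would_fence(host: Host, unresponsive_seconds: float,
                grace_seconds: float = 60.0) -> bool:
    """Toy rule: fence only if PM is enabled and the host has been
    unresponsive for longer than the grace period.

    grace_seconds is an invented value, not an actual engine setting.
    """
    if not host.power_management_enabled:
        return False
    return unresponsive_seconds > grace_seconds


# "rhevh-01" is a made-up host name.
busy_host = Host("rhevh-01", power_management_enabled=True)

# Quick vdsm restart: the host answers again within the grace period.
print(would_fence(busy_host, unresponsive_seconds=20))   # False

# Slow vdsm restart on a busy host: the grace period is exceeded.
print(would_fence(busy_host, unresponsive_seconds=180))  # True

# Same slow restart, but with PM turned off first.
busy_host.power_management_enabled = False
print(would_fence(busy_host, unresponsive_seconds=180))  # False
```

Under these assumptions, only the slow restart with PM still enabled ends in
a fence, which would match what we saw on the busy hosts.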

Regards,

>
>>
>> Other than that, restarting vdsmd has been safe in my experience...
>>
>> Regards,
>>
>> On Thu, Aug 4, 2016 at 6:10 PM, Nicolás <nicolas at devels.es> wrote:
>>
>>>
>>>
>>> On 04/08/16 at 15:25, Arik Hadas wrote:
>>>
>>>>
>>>> ----- Original Message -----
>>>>
>>>>> On 2016-08-04 08:24, Arik Hadas wrote:
>>>>>
>>>>>> ----- Original Message -----
>>>>>>
>>>>>>>
>>>>>>> On 04/08/16 at 07:18, Arik Hadas wrote:
>>>>>>>
>>>>>>>> ----- Original Message -----
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> We're running oVirt 4.0.1 and today I found out that one of our
>>>>>>>>> hosts has all its VMs in an unknown state. I actually don't know
>>>>>>>>> how (or when) this happened, but I'd like to restore service,
>>>>>>>>> if possible without turning off these machines. The host is up,
>>>>>>>>> the VMs are up, the 'qemu' processes exist, there are no errors;
>>>>>>>>> it's just that the VMs running on it have a '?' where the status
>>>>>>>>> is shown.
>>>>>>>>>
>>>>>>>>> Is it safe in this case to simply modify the database and set
>>>>>>>>> those VMs' status to 'up'? I remember having to do this some time
>>>>>>>>> ago when we faced storage issues, and it didn't break anything
>>>>>>>>> back then. If not, is there a "safe" way to migrate those VMs to
>>>>>>>>> a different host and restart the host that marked them as
>>>>>>>>> unknown?
>>>>>>>>>
>>>>>>>> Hi Nicolás,
>>>>>>>>
>>>>>>>> I assume that the host these VMs are running on is empty in the
>>>>>>>> webadmin, right? If that is the case then you've probably hit [1].
>>>>>>>> Changing their status to up is not the way to go since these VMs
>>>>>>>> will not be monitored.
>>>>>>>>
>>>>>>> Hi Arik,
>>>>>>>
>>>>>>> By "empty" do you mean the webadmin reports the host as running 0
>>>>>>> VMs? If so, that's not the case. The VM count actually seems to be
>>>>>>> correct in relation to the "qemu-*" processes (about 32 VMs), and I
>>>>>>> can even see the machines in the "Virtual machines" tab of the host;
>>>>>>> it's just that they are all marked with the '?' mark.
>>>>>>>
>>>>>> No, I meant the 'Host' column in the Virtual Machines tab, but if
>>>>>> you see the VMs in the "Virtual machines" sub-tab of the host then
>>>>>> run_on_vds points to the right host.
>>>>>>
>>>>>> The host is up in the webadmin as well?
>>>>>> Can you share the engine log?
>>>>>>
>>>>> Yes, the host is up in the webadmin, there are no issues with it, just
>>>>> the VMs running on it have the '?' mark. I've made 3 tests:
>>>>>
>>>>> 1) Restart engine: did not help.
>>>>> 2) Check firewall: seems to be ok.
>>>>> 3) PostgreSQL: UPDATE vm_dynamic SET status = 1 WHERE status = 8;
>>>>> After a while, I see lots of entries like this:
>>>>>
>>>>>       2016-08-04 09:23:10,910 WARN
>>>>> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>>>>> (DefaultQuartzScheduler4) [6ad135b8] Correlation ID: null, Call Stack:
>>>>> null, Custom Event ID: -1, Message: VM xxx is not responding.
>>>>>
>>>>> I'm attaching the engine log, but I don't know when this first
>>>>> happened. If there's a manual way/command to migrate VMs to a
>>>>> different host I'd appreciate a hint about it.
>>>>>
>>>>> Is it safe to restart vdsmd on this host?
>>>>>
>>>> The engine log looks fine - the VMs are reported as not-responding for
>>>> some reason. I would restart libvirtd and vdsmd then.
>>>>
>>>
>>> Is restarting those two daemons safe? I mean, will that stop all qemu-*
>>> processes, so the VMs marked as unknown will stop?
>>>
>>>
>>> Thanks.
>>>>>
>>>>> Thanks.
>>>>>>>
>>>>>>>> Yes, there is no other way to resolve it other than changing the
>>>>>>>> DB, but the change should be to update the run_on_vds field of
>>>>>>>> these VMs to the host you know they are running on. Their status
>>>>>>>> will then be updated in 15 sec.
>>>>>>>>
>>>>>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1354494
>>>>>>>>
>>>>>>>> Arik.
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>> Nicolás
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Users mailing list
>>>>>>>>> Users at ovirt.org
>>>>>>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>
>>
>>
>> --
>> *Ekin Meroğlu** Red Hat Certified Architect*
>>
>> linuxera Özgür Yazılım Çözüm ve Hizmetleri
>> *T* +90 (850) 22 LINUX | *GSM* +90 (532) 137 77 04
>> www.linuxera.com | bilgi at linuxera.com
>>
>>
>


-- 
*Ekin Meroğlu** Red Hat Certified Architect*

linuxera Özgür Yazılım Çözüm ve Hizmetleri
*T* +90 (850) 22 LINUX | *GSM* +90 (532) 137 77 04
www.linuxera.com | bilgi at linuxera.com