[ovirt-users] Migrate machines in unknown state?

Ekin Meroğlu ekin.meroglu at linuxera.com
Fri Sep 30 16:34:06 UTC 2016


Hi Yaniv,

Just a reminder, can you give us a pointer? Red Hat Support just asked us
to disable PM before restarting vdsm again.
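
For anyone following along, the "disable PM first" step could be scripted against the engine's REST API rather than done by hand in the webadmin. The following is only a rough sketch under assumptions that are not from this thread: that the engine exposes hosts at /ovirt-engine/api/hosts/{id} and accepts a power_management/enabled element on update, and that engine_url and host_id are placeholders you fill in yourself. Authentication and TLS setup are left out, and the request is built but not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET


def power_management_body(enabled: bool) -> str:
    """Build the XML body that switches power management on or off for a host."""
    host = ET.Element("host")
    pm = ET.SubElement(host, "power_management")
    ET.SubElement(pm, "enabled").text = "true" if enabled else "false"
    return ET.tostring(host, encoding="unicode")


def build_request(engine_url: str, host_id: str, enabled: bool) -> urllib.request.Request:
    """Prepare (but do not send) the PUT request for one host.

    The endpoint path is an assumption; check it against your engine version.
    """
    return urllib.request.Request(
        url=f"{engine_url}/ovirt-engine/api/hosts/{host_id}",
        data=power_management_body(enabled).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/xml"},
    )
```

The intended order would be: send the request with enabled=False, restart vdsmd on the host, wait until the engine shows the host as up again, then send the request with enabled=True.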

Thanks & Best regards,

On Mon, Aug 22, 2016 at 10:57 PM, Ekin Meroğlu <ekin.meroglu at linuxera.com>
wrote:

> Hi Yaniv,
>
> On Sun, Aug 7, 2016 at 9:37 PM, Ekin Meroğlu <ekin.meroglu at linuxera.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Just a reminder, if you have power management configured, first turn
>>> that off for the host - when you restart vdsmd with the power management
>>> configured, engine finds it not responding and tries to fence (e.g. reboot)
>>> the host.
>>>
>>
>> That's not true - if it's a graceful restart, it should not happen.
>>
>
> Can you explain this a little more? Is there a mechanism to prevent
> fencing in this scenario?
>
> In two of our customers' production systems we've experienced this exact
> behavior (i.e. the engine fencing the host while we were restarting the
> vdsm service manually) a number of times, and we were specifically advised
> by Red Hat Support to turn off PM before restarting the service. I'd like
> to know if we have a better / easier way to restart vdsm.
>
> btw, both of the environments were RHEV-H based RHEV 3.5 clusters, and both
> were busy systems, so restarting the vdsm service took quite a long time.
> I'm guessing this might be a factor.
>
> Regards,
>>
>>
>>
>
>>
>>>
>>> Other than that, restarting vdsmd has been safe in my experience...
>>>
>>> Regards,
>>>
>>> On Thu, Aug 4, 2016 at 6:10 PM, Nicolás <nicolas at devels.es> wrote:
>>>
>>>>
>>>>
>>>> El 04/08/16 a las 15:25, Arik Hadas escribió:
>>>>
>>>>>
>>>>> ----- Original Message -----
>>>>>
>>>>>> El 2016-08-04 08:24, Arik Hadas escribió:
>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>>
>>>>>>>>
>>>>>>>> El 04/08/16 a las 07:18, Arik Hadas escribió:
>>>>>>>>
>>>>>>>>> ----- Original Message -----
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> We're running oVirt 4.0.1 and today I found out that one of our
>>>>>>>>>> hosts has all its VMs in an unknown state. I actually don't know
>>>>>>>>>> how (and when) this happened, but I'd like to restore service,
>>>>>>>>>> if possible without turning off these machines. The host is up,
>>>>>>>>>> the VMs are up, the 'qemu' processes exist, there are no errors;
>>>>>>>>>> it's just that the VMs running on it have a '?' where the status
>>>>>>>>>> is shown.
>>>>>>>>>>
>>>>>>>>>> Is it safe in this case to simply modify the database and set
>>>>>>>>>> those VMs' status to 'up'? I remember having to do this some time
>>>>>>>>>> ago when we faced storage issues, and it didn't break anything
>>>>>>>>>> back then. If not, is there a "safe" way to migrate those VMs to
>>>>>>>>>> a different host and restart the host that marked them as unknown?
>>>>>>>>>>
>>>>>>>>> Hi Nicolás,
>>>>>>>>>
>>>>>>>>> I assume that the host these VMs are running on is empty in the
>>>>>>>>> webadmin, right? If that is the case then you've probably hit [1].
>>>>>>>>> Changing their status to up is not the way to go since these VMs
>>>>>>>>> will not be monitored.
>>>>>>>>>
>>>>>>>> Hi Arik,
>>>>>>>>
>>>>>>>> By "empty" you mean the webadmin reports the host as running 0 VMs?
>>>>>>>> If so, that's not the case; actually the VM count seems to be
>>>>>>>> correct in relation to the "qemu-*" processes (about 32 VMs). I can
>>>>>>>> even see the machines in the "Virtual machines" tab of the host,
>>>>>>>> it's just that they are all marked with the '?' mark.
>>>>>>>>
>>>>>>> No, I meant the 'Host' column in the Virtual Machines tab, but if
>>>>>>> you see the VMs in the "Virtual machines" sub-tab of the host then
>>>>>>> run_on_vds points to the right host.
>>>>>>>
>>>>>>> The host is up in the webadmin as well?
>>>>>>> Can you share the engine log?
>>>>>>>
>>>>>> Yes, the host is up in the webadmin, there are no issues with it,
>>>>>> just the VMs running on it have the '?' mark. I've made 3 tests:
>>>>>>
>>>>>> 1) Restart engine: did not help.
>>>>>> 2) Check firewall: seems to be ok.
>>>>>> 3) PostgreSQL: UPDATE vm_dynamic SET status = 1 WHERE status = 8;
>>>>>> After a while, I see lots of entries like this:
>>>>>>
>>>>>>       2016-08-04 09:23:10,910 WARN
>>>>>> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>>>>>> (DefaultQuartzScheduler4) [6ad135b8] Correlation ID: null, Call Stack:
>>>>>> null, Custom Event ID: -1, Message: VM xxx is not responding.
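
To get a quick picture of which VMs the engine is complaining about, one option is to scan engine.log for these warnings. This is a minimal sketch that assumes the message format shown in the entry above ("Message: VM <name> is not responding."); the function name is made up for illustration, and each log entry is expected on a single line, as it is in the log file itself.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the audit-log message quoted in this thread.
NOT_RESPONDING = re.compile(r"Message: VM (?P<name>.+?) is not responding\.")


def count_not_responding(lines: Iterable[str]) -> Counter:
    """Count 'not responding' warnings per VM name."""
    counts = Counter()
    for line in lines:
        match = NOT_RESPONDING.search(line)
        if match:
            counts[match.group("name")] += 1
    return counts
```

It could be fed an open file object, e.g. count_not_responding(open(path_to_engine_log)), where the path depends on your installation.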
>>>>>>
>>>>>> I'm attaching the engine log, but I don't know when this happened
>>>>>> for the first time. If there's a manual way/command to migrate VMs
>>>>>> to a different host, I'd appreciate a hint about it.
>>>>>>
>>>>>> Is it safe to restart vdsmd on this host?
>>>>>>
>>>>> The engine log looks fine - the VMs are reported as not-responding for
>>>>> some reason. I would restart libvirtd and vdsmd then
>>>>>
>>>>
>>>> Is restarting those two daemons safe? I mean, will that stop all qemu-*
>>>> processes, so the VMs marked as unknown will stop?
>>>>
>>>>
>>>> Thanks.
>>>>>>
>>>>>> Thanks.
>>>>>>>>
>>>>>>>>> Yes, there is no other way to resolve it other than changing the
>>>>>>>>> DB, but the change should be to update the run_on_vds field of
>>>>>>>>> these VMs to the host you know they are running on. Their status
>>>>>>>>> will then be updated in 15 sec.
>>>>>>>>>
>>>>>>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1354494
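
The run_on_vds change described above could be prepared roughly like this. It is a sketch only: it assumes run_on_vds lives in the vm_dynamic table mentioned elsewhere in this thread, that VMs are keyed there by a vm_guid column, and that you already know the host's ID; none of those details were spelled out here, so verify them against your own engine database and take a backup first. The code only builds parameterized statements (with %s placeholders in the style of psycopg2) and executes nothing.

```python
from typing import List, Tuple

# Assumed schema: vm_dynamic(vm_guid, run_on_vds), both holding UUIDs.
UPDATE_RUN_ON_VDS = "UPDATE vm_dynamic SET run_on_vds = %s WHERE vm_guid = %s;"


def build_updates(host_id: str, vm_ids: List[str]) -> List[Tuple[str, Tuple[str, str]]]:
    """Return one (statement, parameters) pair per VM, without executing anything."""
    return [(UPDATE_RUN_ON_VDS, (host_id, vm_id)) for vm_id in vm_ids]
```

Keeping the values as parameters rather than formatting them into the SQL string avoids quoting mistakes when the statements are later run through a database driver.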
>>>>>>>>>
>>>>>>>>> Arik.
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>> Nicolás
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Users mailing list
>>>>>>>>>> Users at ovirt.org
>>>>>>>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> *Ekin Meroğlu** Red Hat Certified Architect*
>>>
>>> linuxera Özgür Yazılım Çözüm ve Hizmetleri
>>> *T* +90 (850) 22 LINUX | *GSM* +90 (532) 137 77 04
>>> www.linuxera.com | bilgi at linuxera.com
>>>
>>>
>>>
>>
>
>
>



-- 
*Ekin Meroğlu** Red Hat Certified Architect*

linuxera Özgür Yazılım Çözüm ve Hizmetleri
*T* +90 (850) 22 LINUX | *GSM* +90 (532) 137 77 04
www.linuxera.com | bilgi at linuxera.com