Hi Yaniv,
Just a reminder, can you give us a pointer? Red Hat Support just asked us
to disable PM before restarting vdsm again.
Thanks & Best regards,
On Mon, Aug 22, 2016 at 10:57 PM, Ekin Meroğlu <ekin.meroglu(a)linuxera.com>
wrote:
Hi Yaniv,
On Sun, Aug 7, 2016 at 9:37 PM, Ekin Meroğlu <ekin.meroglu(a)linuxera.com>
> wrote:
>
>> Hi,
>>
>> Just a reminder, if you have power management configured, first turn
>> that off for the host - when you restart vdsmd with the power management
>> configured, engine finds it not responding and tries to fence (e.g. reboot)
>> the host.
>>
>
> That's not true - if it's a graceful restart, it should not happen.
>
Can you explain this a little more? Is there a mechanism to prevent
fencing in this scenario?
In two of our customers' production systems we've experienced this exact
behavior (i.e. engine fencing the host while restarting vdsm service
manually) a number of times, and we were specifically advised by Red
Hat Support to turn off PM before restarting the service. I'd like to know
if we have a better / easier way to restart vdsm.
Btw, both of the environments were RHEV-H based RHEV 3.5 clusters, and both
were busy systems, so restarting the vdsm service took quite a long time. I'm
guessing this might be a factor.
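
For what it's worth, the workaround we were given can be sketched roughly as
below. This is only an illustration in Python, not an oVirt or RHEV API:
disable_pm, enable_pm and restart_vdsm are hypothetical callables that you
would have to supply yourself (for example, wrapping whatever you use to
toggle power management for the host in the engine, and a service restart on
the host). The only point is the ordering, and that PM gets turned back on
even if the restart fails.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def power_management_disabled(
    disable_pm: Callable[[], None],
    enable_pm: Callable[[], None],
) -> Iterator[None]:
    """Keep host power management off for the duration of the block.

    Both callables are hypothetical hooks supplied by the caller; nothing
    here talks to the engine by itself.
    """
    disable_pm()
    try:
        yield
    finally:
        # Runs even if the body raised, so PM is not left disabled.
        enable_pm()


def restart_with_pm_off(
    disable_pm: Callable[[], None],
    restart_vdsm: Callable[[], None],
    enable_pm: Callable[[], None],
) -> None:
    """Disable PM, restart vdsm, then re-enable PM, in that order."""
    with power_management_disabled(disable_pm, enable_pm):
        restart_vdsm()
```

On the host itself, restart_vdsm could be something like
lambda: subprocess.run(["service", "vdsmd", "restart"], check=True),
assuming the service is named vdsmd as it is in this thread.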
Regards,
>
>
>
>>
>> Other than that, restarting vdsmd has been safe in my experience...
>>
>> Regards,
>>
>> On Thu, Aug 4, 2016 at 6:10 PM, Nicolás <nicolas(a)devels.es> wrote:
>>
>>>
>>>
>>> On 04/08/16 at 15:25, Arik Hadas wrote:
>>>
>>>>
>>>> ----- Original Message -----
>>>>
>>>>> On 2016-08-04 08:24, Arik Hadas wrote:
>>>>>
>>>>>> ----- Original Message -----
>>>>>>
>>>>>>>
>>>>>>> On 04/08/16 at 07:18, Arik Hadas wrote:
>>>>>>>
>>>>>>>> ----- Original Message -----
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> We're running oVirt 4.0.1 and today I found out that one of our hosts
>>>>>>>>> has all its VMs in an unknown state. I actually don't know how (or
>>>>>>>>> when) this happened, but I'd like to restore service, if possible
>>>>>>>>> without turning off these machines. The host is up, the VMs are up,
>>>>>>>>> the 'qemu' processes exist, no errors; it's just that the VMs running
>>>>>>>>> on it have a '?' where the status is shown.
>>>>>>>>>
>>>>>>>>> Is it safe in this case to simply modify the database and set those
>>>>>>>>> VMs' status to 'up'? I remember having to do this a while ago when we
>>>>>>>>> faced storage issues, and it didn't break anything back then. If not,
>>>>>>>>> is there a "safe" way to migrate those VMs to a different host and
>>>>>>>>> restart the host that marked them as unknown?
>>>>>>>>>
>>>>>>>> Hi Nicolás,
>>>>>>>>
>>>>>>>> I assume that the host these VMs are running on is empty in the
>>>>>>>> webadmin, right? If that is the case then you've probably hit [1].
>>>>>>>> Changing their status to up is not the way to go since these VMs will
>>>>>>>> not be monitored.
>>>>>>>>
>>>>>>> Hi Arik,
>>>>>>>
>>>>>>> By "empty" do you mean the webadmin reports the host as running 0 VMs?
>>>>>>> If so, that's not the case; actually the VM count seems to be correct
>>>>>>> in relation to the "qemu-*" processes (about 32 VMs). I can even see
>>>>>>> the machines in the "Virtual machines" tab of the host; it's just that
>>>>>>> they are all marked with the '?' mark.
>>>>>>>
>>>>>> No, I meant the 'Host' column in the Virtual Machines tab, but if you
>>>>>> see the VMs in the "Virtual machines" sub-tab of the host then
>>>>>> run_on_vds points to the right host.
>>>>>>
>>>>>> The host is up in the webadmin as well?
>>>>>> Can you share the engine log?
>>>>>>
>>>>> Yes, the host is up in the webadmin, there are no issues with it,
>>>>> just the VMs running on it have the '?' mark. I've made 3 tests:
>>>>>
>>>>> 1) Restart engine: did not help.
>>>>> 2) Check firewall: seems to be ok.
>>>>> 3) PostgreSQL: UPDATE vm_dynamic SET status = 1 WHERE status = 8;
>>>>> After a while, I see lots of entries like this:
>>>>>
>>>>> 2016-08-04 09:23:10,910 WARN
>>>>> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>>>>> (DefaultQuartzScheduler4) [6ad135b8] Correlation ID: null, Call Stack:
>>>>> null, Custom Event ID: -1, Message: VM xxx is not responding.
>>>>>
>>>>> I'm attaching the engine log, but I don't know when this happened for
>>>>> the first time. If there's a manual way/command to migrate VMs to a
>>>>> different host I'd appreciate a hint about it.
>>>>>
>>>>> Is it safe to restart vdsmd on this host?
>>>>>
>>>> The engine log looks fine - the VMs are reported as not-responding for
>>>> some reason. I would restart libvirtd and vdsmd then
>>>>
>>>
>>> Is restarting those two daemons safe? I mean, will that stop all qemu-*
>>> processes, so the VMs marked as unknown will stop?
>>>
>>>
>>> Thanks.
>>>>>
>>>>> Thanks.
>>>>>>>
>>>>>>>> Yes, there is no other way to resolve it other than changing the DB,
>>>>>>>> but the change should be to update the run_on_vds field of these VMs
>>>>>>>> to the host you know they are running on. Their status will then be
>>>>>>>> updated in 15 sec.
>>>>>>>>
>>>>>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1354494
>>>>>>>>
>>>>>>>> Arik.
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>> Nicolás
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Users mailing list
>>>>>>>>> Users(a)ovirt.org
>>>>>>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>
>>
>>
>>
>> --
>> *Ekin Meroğlu** Red Hat Certified Architect*
>>
>> linuxera Özgür Yazılım Çözüm ve Hizmetleri
>> *T* +90 (850) 22 LINUX | *GSM* +90 (532) 137 77 04
>> www.linuxera.com | bilgi(a)linuxera.com
>>
>>
>>
>
--
*Ekin Meroğlu** Red Hat Certified Architect*
linuxera Özgür Yazılım Çözüm ve Hizmetleri
*T* +90 (850) 22 LINUX | *GSM* +90 (532) 137 77 04
www.linuxera.com | bilgi(a)linuxera.com