On Aug 22, 2016 10:57 PM, "Ekin Meroğlu" <ekin.meroglu(a)linuxera.com> wrote:
Hi Yaniv,
> On Sun, Aug 7, 2016 at 9:37 PM, Ekin Meroğlu <ekin.meroglu(a)linuxera.com> wrote:
>>
>> Hi,
>>
>> Just a reminder, if you have power management configured, first turn that
>> off for the host - when you restart vdsmd with power management
>> configured, engine finds it not responding and tries to fence (e.g. reboot)
>> the host.
>
>
> That's not true - if it's a graceful restart, it should not happen.
Can you explain this a little more? Is there a mechanism to prevent
fencing in this scenario?
In two of our customers' production systems we've experienced this exact
behavior (i.e. engine fencing the host while the vdsm service was being
restarted manually) a number of times, and we were specifically advised by
Red Hat Support to turn off PM before restarting the service. I'd like to
know if we have a better / easier way to restart vdsm.
Btw, both of the environments were RHEV-H based RHEV 3.5 clusters, and both
were busy systems, so restarting the vdsm service took quite a long time.
I'm guessing this might be a factor.
That indeed might be the factor - but vdsm should not take long to restart.
If it happens on a more recent version, I'd be happy to know about it, as
we've done work on ensuring that it restarts and answers quickly to the
engine (as far as I remember, even before it fully completed the restart).
Y.
Regards,
>
>
>
>
>>
>>
>> Other than that, restarting vdsmd has been safe in my experience...
>>
>> Regards,
>>
>> On Thu, Aug 4, 2016 at 6:10 PM, Nicolás <nicolas(a)devels.es> wrote:
>>>
>>>
>>>
>>> El 04/08/16 a las 15:25, Arik Hadas escribió:
>>>>
>>>>
>>>> ----- Original Message -----
>>>>>
>>>>> El 2016-08-04 08:24, Arik Hadas escribió:
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>>
>>>>>>>
>>>>>>> El 04/08/16 a las 07:18, Arik Hadas escribió:
>>>>>>>>
>>>>>>>> ----- Original Message -----
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> We're running oVirt 4.0.1 and today I found out that one of our hosts
>>>>>>>>> has all its VMs in an unknown state. I actually don't know how (and
>>>>>>>>> when) this happened, but I'd like to restore service, if possible
>>>>>>>>> without turning off these machines. The host is up, the VMs are up,
>>>>>>>>> the 'qemu' processes exist, there are no errors; it's just that the
>>>>>>>>> VMs running on it have a '?' where the status is shown.
>>>>>>>>>
>>>>>>>>> Is it safe in this case to simply modify the database and set those
>>>>>>>>> VMs' status to 'up'? I remember having to do this some time ago when
>>>>>>>>> we faced storage issues, and it didn't break anything back then. If
>>>>>>>>> not, is there a "safe" way to migrate those VMs to a different host
>>>>>>>>> and restart the host that marked them as unknown?
>>>>>>>>
>>>>>>>> Hi Nicolás,
>>>>>>>>
>>>>>>>> I assume that the host these VMs are running on is empty in the
>>>>>>>> webadmin, right? If that is the case then you've probably hit [1].
>>>>>>>> Changing their status to up is not the way to go since these VMs
>>>>>>>> will not be monitored.
>>>>>>>
>>>>>>> Hi Arik,
>>>>>>>
>>>>>>> By "empty" you mean the webadmin reports the host as running 0 VMs?
>>>>>>> If so, that's not the case. The VM count actually seems to be correct
>>>>>>> in relation to the "qemu-*" processes (about 32 VMs), and I can even
>>>>>>> see the machines in the "Virtual machines" tab of the host; it's just
>>>>>>> that they are all marked with the '?' mark.
>>>>>>
>>>>>> No, I meant the 'Host' column in the Virtual Machines tab, but if you
>>>>>> see the VMs in the "Virtual machines" sub-tab of the host then
>>>>>> run_on_vds points to the right host.
>>>>>>
>>>>>> The host is up in the webadmin as well?
>>>>>> Can you share the engine log?
>>>>>>
>>>>> Yes, the host is up in the webadmin, there are no issues with it, just
>>>>> the VMs running on it have the '?' mark. I've made 3 tests:
>>>>>
>>>>> 1) Restart engine: did not help
>>>>> 2) Check firewall, seems to be ok.
>>>>> 3) PostgreSQL: UPDATE vm_dynamic SET status = 1 WHERE status = 8; :
>>>>> After a while, I see lots of entries like this:
>>>>>
>>>>> 2016-08-04 09:23:10,910 WARN
>>>>> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>>>>> (DefaultQuartzScheduler4) [6ad135b8] Correlation ID: null, Call Stack:
>>>>> null, Custom Event ID: -1, Message: VM xxx is not responding.
>>>>>
>>>>> I'm attaching the engine log, but I don't know when this happened for
>>>>> the first time, though. If there's a manual way/command to migrate VMs
>>>>> to a different host I'd appreciate a hint about it.
>>>>>
>>>>> Is it safe to restart vdsmd on this host?
>>>>
>>>> The engine log looks fine - the VMs are reported as not-responding for
>>>> some reason. I would restart libvirtd and vdsmd then
>>>
>>>
>>> Is restarting those two daemons safe? I mean, will that stop all qemu-*
>>> processes, so the VMs marked as unknown will stop?
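
One way to check this empirically on a test host is to record the qemu PIDs before the restart and compare them afterwards. The sketch below is only an illustration under stated assumptions: it relies on the Linux `/proc/<pid>/comm` file, it assumes the VM processes have a name starting with `qemu` (matching the "qemu-*" processes mentioned in this thread), and the helper names `qemu_pids` and `vanished` are made up.

```python
import os


def qemu_pids(proc_root="/proc"):
    """Return the set of PIDs whose process name starts with 'qemu'."""
    pids = set()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "comm")) as fh:
                name = fh.read().strip()
        except OSError:
            continue  # process exited (or is unreadable) while scanning
        if name.startswith("qemu"):
            pids.add(int(entry))
    return pids


def vanished(before, after):
    """PIDs that were present before the restart but are gone afterwards."""
    return sorted(set(before) - set(after))


# Intended use (run on the host itself):
#   before = qemu_pids()
#   ... restart the daemons ...
#   after = qemu_pids()
#   print(vanished(before, after))   # an empty list means no VM process went away
```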
>>>
>>>
>>>>> Thanks.
>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>>> Yes, there is no other way to resolve it other than changing the DB,
>>>>>>>> but the change should be to update the run_on_vds field of these VMs
>>>>>>>> to the host you know they are running on. Their status will then be
>>>>>>>> updated in 15 sec.
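
To make the difference between the two database changes concrete, here is a small sketch against a mock table in an in-memory SQLite database. It is not the real engine schema: only the table name `vm_dynamic` and the columns `status` and `run_on_vds` come from this thread, while the `vm_guid` column, the sample IDs, and the helper `assign_host` are assumptions made for illustration. It only shows the shape of the change described above (set `run_on_vds`, leave `status` alone), which applies to the situation in [1] where the VMs have no host assigned.

```python
import sqlite3

# Mock stand-in for the engine database; NOT the real schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vm_dynamic ("
    "vm_guid TEXT PRIMARY KEY, status INTEGER, run_on_vds TEXT)"
)
conn.executemany(
    "INSERT INTO vm_dynamic VALUES (?, ?, ?)",
    [("vm-1", 8, None), ("vm-2", 8, None), ("vm-3", 1, "host-b")],
)


def assign_host(conn, vm_guids, host_id):
    """Point run_on_vds of the given VMs at host_id; status is left untouched."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "UPDATE vm_dynamic SET run_on_vds = ? WHERE vm_guid = ?",
            [(host_id, guid) for guid in vm_guids],
        )


assign_host(conn, ["vm-1", "vm-2"], "host-a")
rows = conn.execute(
    "SELECT vm_guid, status, run_on_vds FROM vm_dynamic ORDER BY vm_guid"
).fetchall()
print(rows)
# [('vm-1', 8, 'host-a'), ('vm-2', 8, 'host-a'), ('vm-3', 1, 'host-b')]
```

The status column is deliberately not written here. Per the advice above, the engine's monitoring is expected to refresh it once the VMs point at the right host.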
>>>>>>>>
>>>>>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1354494
>>>>>>>>
>>>>>>>> Arik.
>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>> Nicolás
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Users mailing list
>>>>>>>>> Users(a)ovirt.org
>>>>>>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>>>>>>
>>>>>>>
>>>
>>
>>
>>
>>
>> --
>> Ekin Meroğlu Red Hat Certified Architect
>>
>> linuxera Özgür Yazılım Çözüm ve Hizmetleri
>> T +90 (850) 22 LINUX | GSM +90 (532) 137 77 04
>> www.linuxera.com | bilgi(a)linuxera.com
>>
>>
>