Hi
I can create a thread dump; please send details on how to do it.
Regards
Nardus
On Thu, 6 Aug 2020 at 14:17, Artur Socha <asocha(a)redhat.com> wrote:
Hi Nardus,
You might have hit an issue I have been hunting for some time ([1] and [2]).
[1] could not be properly resolved because at the time I was not able to
recreate the issue on a dev setup.
I suspect [2] is related.
Would you be able to prepare a thread dump from your engine instance?
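One way to take the thread dump, as a minimal sketch. It assumes the engine JVM runs as the `ovirt` user with `jboss-modules.jar` on its command line (check with `ps -fu ovirt`), and the function names are made up for illustration. `kill -3` sends SIGQUIT, which makes a HotSpot JVM print all thread stacks to its standard output and keep running. Where that output lands (usually /var/log/ovirt-engine/console.log on an engine machine) is also an assumption worth verifying. `jstack -l <pid>` from a JDK is an alternative if one is installed.

```shell
# Hypothetical helpers for taking an oVirt engine thread dump.
# Assumption: the engine JVM runs as user "ovirt" and its command line
# contains "jboss-modules.jar"; confirm with: ps -fu ovirt

# Print the PID of the engine JVM (empty output if none is found).
engine_jvm_pid() {
    pgrep -u ovirt -f 'jboss-modules.jar' 2>/dev/null | head -n 1
}

# Ask the engine JVM for a thread dump. SIGQUIT (kill -3) makes a
# HotSpot JVM write all thread stacks to its stdout; it keeps running.
engine_thread_dump() {
    local pid
    pid="${1:-$(engine_jvm_pid)}"
    if [ -z "$pid" ]; then
        echo "engine JVM not found; pass its PID explicitly" >&2
        return 1
    fi
    sudo kill -3 "$pid" || return 1
    # Assumption: the engine's stdout is captured in console.log.
    echo "SIGQUIT sent to $pid; check /var/log/ovirt-engine/console.log"
}
```

Running `engine_thread_dump` two or three times, some seconds apart, shows whether the same threads stay stuck in the same place.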
Additionally, please check for potential libvirt errors/warnings.
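For the libvirt part, a minimal sketch assuming libvirtd on the host logs to the systemd journal (if it is configured to log to a file under /var/log/libvirt/ instead, grep that file). The helper name is hypothetical. `-p warning` selects entries of priority warning or more severe.

```shell
# Hypothetical helper: show libvirtd warnings and errors on a host.
# Assumption: libvirtd logs to the systemd journal on this host.
# $1 is any journalctl --since value, e.g. "2020-08-05 20:00".
libvirt_problems() {
    local since="${1:-yesterday}"
    # -p warning: priority "warning" and everything more severe
    sudo journalctl -u libvirtd -p warning --since "$since" --no-pager
}
```

For example, `libvirt_problems "2020-08-05 20:00"` on each affected host covers the time of the dip.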
Can you also paste the output of:
sudo yum list installed | grep vdsm
sudo yum list installed | grep ovirt-engine
sudo yum list installed | grep libvirt
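The three queries can also be collapsed into one pass over the RPM database. This is a small sketch with a hypothetical filter name. vdsm and libvirt are installed on the hosts and ovirt-engine on the engine machine, so it is worth running it on both.

```shell
# Hypothetical filter: keep only vdsm, ovirt-engine and libvirt packages
# from a package list on stdin, sorted for easy comparison across machines.
ovirt_pkgs() {
    grep -E 'vdsm|ovirt-engine|libvirt' | sort
}

# Usage, on a host and on the engine machine:
#   rpm -qa | ovirt_pkgs
```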
According to previous reports, restarting the engine usually restores
connectivity with the hosts, at least for some time.
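For completeness, here is the restart workaround as a sketch. The function name is hypothetical; `ovirt-engine` is the engine's systemd unit. Take the thread dump before restarting, because the restart discards the stuck state the dump is meant to capture.

```shell
# Hypothetical helper, run on the engine machine: restart the engine
# and report whether the unit came back. This is only a workaround;
# it does not address the underlying cause.
restart_engine() {
    sudo systemctl restart ovirt-engine || return 1
    systemctl is-active ovirt-engine
}
```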
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1845152
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1846338
regards,
Artur
On Thu, Aug 6, 2020 at 8:01 AM Nardus Geldenhuys <nardusg(a)gmail.com> wrote:
> I also see this in the engine:
>
> Aug 6, 2020, 7:37:17 AM
> VDSM someserver command Get Host Capabilities failed: Message timeout
> which can be caused by communication issues
>
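A possible next step for this timeout, sketched under assumptions: run on the affected host, it asks vdsm for its capabilities directly with `vdsm-client`, bypassing the engine. The helper name and the 60-second limit are arbitrary choices. A quick answer suggests vdsm is responsive and the problem sits between engine and host. A hang suggests vdsm, or libvirt beneath it, is stuck on the host.

```shell
# Hypothetical helper, run on the affected host: query vdsm directly
# for the same data the engine's "Get Host Capabilities" asks for,
# giving up after 60 seconds.
check_vdsm_caps() {
    if sudo timeout 60 vdsm-client Host getCapabilities >/dev/null; then
        echo "vdsm answered getCapabilities"
    else
        echo "getCapabilities failed or timed out after 60s" >&2
        return 1
    fi
}
```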
> On Thu, 6 Aug 2020 at 07:09, Strahil Nikolov <hunter86_bg(a)yahoo.com> wrote:
>
>> Can you check for errors on the affected host? Most probably you need
>> the vdsm logs.
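A minimal sketch for pulling the interesting lines out of the vdsm log. It assumes the default location /var/log/vdsm/vdsm.log and vdsm's usual ` ERROR ` / ` WARN ` level field; the helper name is hypothetical.

```shell
# Hypothetical helper: print ERROR and WARN lines from a vdsm log.
# Assumption: default log path /var/log/vdsm/vdsm.log on the host.
vdsm_problems() {
    grep -E ' (ERROR|WARN) ' "${1:-/var/log/vdsm/vdsm.log}"
}
```

Reading the log may need root, e.g. from a root shell: `vdsm_problems | tail -n 50`.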
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> On 6 August 2020 7:40:23 GMT+03:00, Nardus Geldenhuys <nardusg(a)gmail.com> wrote:
>> >Hi Strahil
>> >
>> >Hope you are well. I get the following error when I try to confirm the
>> >reboot:
>> >
>> >Error while executing action: Cannot confirm 'Host has been rebooted'
>> >Host.
>> >Valid Host statuses are "Non operational", "Maintenance" or
>> >"Connecting".
>> >
>> >And I can't put it into maintenance; the only options are "restart" or
>> >"stop".
>> >
>> >Regards
>> >
>> >Nar
>> >
>> >On Thu, 6 Aug 2020 at 06:16, Strahil Nikolov <hunter86_bg(a)yahoo.com> wrote:
>> >
>> >> After rebooting the node, have you "marked" it as rebooted?
>> >>
>> >> Best Regards,
>> >> Strahil Nikolov
>> >>
>> >> On 5 August 2020 21:29:04 GMT+03:00, Nardus Geldenhuys <nardusg(a)gmail.com> wrote:
>> >> >Hi oVirt land
>> >> >
>> >> >Hope you are well. We have a bit of an issue, actually a big one. We had
>> >> >some sort of dip. All the VMs are still running, but some of the hosts
>> >> >show "Unassigned" or "NonResponsive". All the hosts were showing UP and
>> >> >were fine before the dip. I increased vdsHeartbeatInSeconds to 240, with
>> >> >no luck.
>> >> >
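For reference, here is a sketch of how that setting is usually changed with `engine-config` on the engine machine. It assumes the key is spelled `vdsHeartbeatInSeconds` (`engine-config -l` lists the exact key names). `engine-config` changes only take effect after the engine is restarted, which is why the sketch restarts it. The function name is hypothetical.

```shell
# Hypothetical helper, run on the engine machine.
# Assumption: the key is vdsHeartbeatInSeconds (check: engine-config -l).
set_vds_heartbeat() {
    if [ -z "$1" ]; then
        echo "usage: set_vds_heartbeat SECONDS" >&2
        return 1
    fi
    sudo engine-config -g vdsHeartbeatInSeconds      # show current value
    sudo engine-config -s vdsHeartbeatInSeconds="$1" || return 1
    # engine-config changes are only picked up after an engine restart
    sudo systemctl restart ovirt-engine
}
```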
>> >> >I still get a timeout on the engine lock even though I can connect to
>> >> >that host from the engine with nc on port 54321. I also restarted vdsmd
>> >> >and rebooted the host, with no luck.
>> >> >
>> >> > nc -v someserver 54321
>> >> >Ncat: Version 7.50 ( https://nmap.org/ncat )
>> >> >Ncat: Connected to 172.40.2.172:54321.
>> >> >
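nc only proves that TCP port 54321 accepts connections. A slightly deeper check is sketched here under the assumption that vdsm serves TLS on that port (the default). It fetches the certificate vdsm presents and prints its subject and validity dates. A hang here, or an expired certificate, would point past plain networking. The helper name and the 30-second limit are hypothetical choices.

```shell
# Hypothetical helper, run on the engine machine.
# Assumption: vdsm has TLS enabled on port 54321 (the default).
vdsm_cert_dates() {
    if [ -z "$1" ]; then
        echo "usage: vdsm_cert_dates HOST" >&2
        return 1
    fi
    # </dev/null ends the session right after the handshake; timeout
    # stops a hang if the port accepts TCP but never completes TLS.
    timeout 30 openssl s_client -connect "$1:54321" </dev/null 2>/dev/null \
        | openssl x509 -noout -subject -dates
}
```

Usage: `vdsm_cert_dates someserver`.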
>> >> >2020-08-05 20:20:34,256+02 ERROR
>> >> >[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>> >> >(EE-ManagedThreadFactory-engineScheduled-Thread-70) [] EVENT_ID:
>> >> >VDS_BROKER_COMMAND_FAILURE(10,802), VDSM someserver command Get Host
>> >> >Capabilities failed: Message timeout which can be caused by
>> >> >communication issues
>> >> >
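To see how often this event fires for one host, here is a sketch that filters engine.log. It assumes the default path /var/log/ovirt-engine/engine.log and that each event is a single line there (the wrapping above comes from the mail client). The helper name is hypothetical.

```shell
# Hypothetical helper, run on the engine machine: list the
# VDS_BROKER_COMMAND_FAILURE events logged for one host.
# Assumption: default log path /var/log/ovirt-engine/engine.log.
broker_failures() {
    local host="$1"
    local log="${2:-/var/log/ovirt-engine/engine.log}"
    if [ -z "$host" ]; then
        echo "usage: broker_failures HOST [LOGFILE]" >&2
        return 1
    fi
    # -F: match the host name literally (dots are not regex wildcards)
    grep 'VDS_BROKER_COMMAND_FAILURE' "$log" | grep -F "VDSM $host command"
}
```

Usage: `broker_failures someserver | tail -n 20`.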
>> >> >Any troubleshooting ideas would be gladly appreciated.
>> >> >
>> >> >Regards
>> >> >
>> >> >Nar
>> >>
>>
> _______________________________________________
> Users mailing list -- users(a)ovirt.org
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/C4HB2J3MH76...
>
--
Artur Socha
Senior Software Engineer, RHV
Red Hat