Hi Artur

Please find the dumps attached, and let me know if I need to rerun anything. They were taken 5 minutes apart.

[root@engine-aa-1-01 ovirt-engine]#  ps -ef | grep jboss | grep -v grep | awk '{ print $2 }'
27390
[root@engine-aa-1-01 ovirt-engine]# jstack -F 27390 > your_engine_thread_dump_1.txt
[root@engine-aa-1-01 ovirt-engine]# jstack -F 27390 > your_engine_thread_dump_2.txt
[root@engine-aa-1-01 ovirt-engine]# jstack -F 27390 > your_engine_thread_dump_3.txt

Regards

Nar

On Thu, 6 Aug 2020 at 15:55, Artur Socha <asocha@redhat.com> wrote:
Sure thing.
On the engine host, please find the JBoss PID. You can use this command:
 ps -ef | grep jboss | grep -v grep | awk '{ print $2 }'
or the jps tool from the JDK. Sample output from my dev environment:

% jps
64853 jboss-modules.jar
196217 Jps

Then use jstack from the JDK:
jstack <pid> > your_engine_thread_dump.txt
Two or three dumps taken at approximately 5-minute intervals would be even more useful.
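
If it helps, the repeated dumps can be scripted. This is only a minimal sketch, assuming a POSIX shell and jstack on the PATH. The function name take_dumps, the JSTACK override variable and the output file names are made up for illustration and are not part of any oVirt or JDK tooling:

```shell
# Hypothetical helper: take COUNT thread dumps of PID, INTERVAL seconds apart.
# Defaults (3 dumps, 300 seconds) follow the "2 or 3 dumps, about 5 minutes
# apart" suggestion above. JSTACK is an illustrative override, for example a
# full path to the JDK's jstack binary.
take_dumps() {
    td_pid=$1
    td_count=${2:-3}
    td_interval=${3:-300}
    td_i=1
    while [ "$td_i" -le "$td_count" ]; do
        # jstack <pid> writes the thread dump to stdout; redirect it to a file
        "${JSTACK:-jstack}" "$td_pid" > "engine_thread_dump_${td_i}.txt"
        # Wait between dumps, but not after the last one
        if [ "$td_i" -lt "$td_count" ]; then
            sleep "$td_interval"
        fi
        td_i=$((td_i + 1))
    done
}
```

After defining the function, running take_dumps <pid> in an empty directory should leave engine_thread_dump_1.txt through engine_thread_dump_3.txt there, taken roughly five minutes apart.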

Here you can find even more options:
https://www.baeldung.com/java-thread-dump

Artur

On Thu, Aug 6, 2020 at 3:15 PM Nardus Geldenhuys <nardusg@gmail.com> wrote:
Hi

I can create a thread dump; please send details on how to do it.

Regards

Nardus

On Thu, 6 Aug 2020 at 14:17, Artur Socha <asocha@redhat.com> wrote:
Hi Nardus,
You might have hit an issue I have been hunting for some time ([1] and [2]).
[1] could not be properly resolved because at the time I was not able to reproduce the issue on a dev setup.
I suspect [2] is related.

Would you be able to prepare a thread dump from your engine instance?
Additionally, please check for potential libvirt errors/warnings.
Can you also paste the output of:
sudo yum list installed | grep vdsm
sudo yum list installed | grep ovirt-engine
sudo yum list installed | grep libvirt

Usually, according to previous reports, restarting the engine helps to restore connectivity with hosts ... at least for some time.




On Thu, Aug 6, 2020 at 8:01 AM Nardus Geldenhuys <nardusg@gmail.com> wrote:
I also see this in the engine:

Aug 6, 2020, 7:37:17 AM
VDSM someserver command Get Host Capabilities failed: Message timeout which can be caused by communication issues

On Thu, 6 Aug 2020 at 07:09, Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Can you check for errors on the affected host? Most probably you need the vdsm logs.

Best Regards,
Strahil Nikolov

On 6 August 2020 at 7:40:23 GMT+03:00, Nardus Geldenhuys <nardusg@gmail.com> wrote:
>Hi Strahil
>
>Hope you are well. I get the following error when I try to confirm
>the reboot:
>
>Error while executing action: Cannot confirm 'Host has been rebooted'
>Host.
>Valid Host statuses are "Non operational", "Maintenance" or
>"Connecting".
>
>And I can't put it into maintenance; the only options are "restart" or "stop".
>
>Regards
>
>Nar
>
>On Thu, 6 Aug 2020 at 06:16, Strahil Nikolov <hunter86_bg@yahoo.com>
>wrote:
>
>> After rebooting the node, have you "marked" it as rebooted?
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> On 5 August 2020 at 21:29:04 GMT+03:00, Nardus Geldenhuys <
>> nardusg@gmail.com> wrote:
>> >Hi oVirt land
>> >
>> >Hope you are well. We have a bit of an issue, actually a big issue. We had
>> >some sort of dip. All the VMs are still running, but some of the hosts
>> >show "Unassigned" or "NonResponsive". All the hosts were showing UP and
>> >were fine before our dip. I increased vdsHeartbeatInSecond to 240, with
>> >no luck.
>> >
>> >I still get a timeout on the engine lock even though I can connect to
>> >that host from the engine, using nc to test port 54321. I also restarted
>> >vdsmd and rebooted the host, with no luck.
>> >
>> > nc -v someserver 54321
>> >Ncat: Version 7.50 ( https://nmap.org/ncat )
>> >Ncat: Connected to 172.40.2.172:54321.
>> >
>> >2020-08-05 20:20:34,256+02 ERROR
>> >[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>> >(EE-ManagedThreadFactory-engineScheduled-Thread-70) [] EVENT_ID:
>> >VDS_BROKER_COMMAND_FAILURE(10,802), VDSM someserver command Get Host
>> >Capabilities failed: Message timeout which can be caused by
>> >communication issues
>> >
>> >Any troubleshooting ideas would be greatly appreciated.
>> >
>> >Regards
>> >
>> >Nar
>>
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/C4HB2J3MH76FI2325Z4AV4VCCEKH4M3S/


--
Artur Socha
Senior Software Engineer, RHV
Red Hat

