Hi Artur
Please find the thread dumps attached, and let me know if I need to rerun them. They were taken 5 minutes apart.
[root@engine-aa-1-01 ovirt-engine]# ps -ef | grep jboss | grep -v grep | awk '{ print $2 }'
27390
[root@engine-aa-1-01 ovirt-engine]# jstack -F 27390 > your_engine_thread_dump_1.txt
[root@engine-aa-1-01 ovirt-engine]# jstack -F 27390 > your_engine_thread_dump_2.txt
[root@engine-aa-1-01 ovirt-engine]# jstack -F 27390 > your_engine_thread_dump_3.txt
Regards
Nar
On Thu, 6 Aug 2020 at 15:55, Artur Socha <asocha(a)redhat.com> wrote:
Sure thing.
On the engine host, find the JBoss PID. You can use this command:
ps -ef | grep jboss | grep -v grep | awk '{ print $2 }'
or the jps tool from the JDK. Sample output from my dev environment:
± % jps
!2860
64853 jboss-modules.jar
196217 Jps
Then run jstack from the JDK:
jstack <pid> > your_engine_thread_dump.txt
Two or three dumps taken at roughly 5-minute intervals would be even more
useful.
You can find more options here:
https://www.baeldung.com/java-thread-dump
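The steps above can be combined into one small script. This is a minimal sketch, not something posted in the thread: it assumes `pgrep` is installed, that a single engine JVM runs on the host, and that `jstack` comes from the same JDK as the engine. The `take_dumps` function and the `JSTACK` override variable are hypothetical names used only for illustration. The process is matched on the `jboss-modules` jar that `jps` reports in the sample output above. It calls plain `jstack`, as suggested here; `jstack -F` forces a dump and is normally only needed when the JVM does not respond to a regular `jstack`.

```shell
#!/bin/sh
# Sketch: take several thread dumps of the oVirt engine JVM at a fixed interval.
# Assumptions: pgrep (procps) is installed, a single engine JVM runs on this
# host, and jstack comes from the same JDK that runs the engine. Run it as the
# user that owns the engine process, otherwise jstack usually cannot attach.

# take_dumps PID COUNT INTERVAL_SECONDS (hypothetical helper)
# Writes your_engine_thread_dump_1.txt .. your_engine_thread_dump_COUNT.txt to
# the current directory, sleeping INTERVAL_SECONDS between consecutive dumps.
# JSTACK (hypothetical) overrides the jstack command, e.g. for a dry run.
take_dumps() {
    pid=$1
    count=$2
    interval=$3
    i=1
    while [ "$i" -le "$count" ]; do
        out="your_engine_thread_dump_$i.txt"
        "${JSTACK:-jstack}" "$pid" > "$out"
        echo "wrote $out"
        # Only wait between dumps, not after the last one.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
}

# Match the JBoss launcher jar that jps reports. The [s] bracket keeps the
# pattern from matching a shell whose command line contains this script text.
pid=$(pgrep -f 'jboss-module[s]' | head -n 1)
if [ -n "$pid" ]; then
    take_dumps "$pid" 3 300    # three dumps, 5 minutes apart
else
    echo "engine JVM not found; call take_dumps with the PID manually" >&2
fi
```

Run as the engine's owner, the script writes three files to the current directory over roughly ten minutes, using the same file names as in this thread.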
Artur
On Thu, Aug 6, 2020 at 3:15 PM Nardus Geldenhuys <nardusg(a)gmail.com>
wrote:
> Hi
>
> I can create a thread dump; please send details on how to do it.
>
> Regards
>
> Nardus
>
> On Thu, 6 Aug 2020 at 14:17, Artur Socha <asocha(a)redhat.com> wrote:
>
>> Hi Nardus,
>> You might have hit an issue I have been hunting for some time ([1] and
>> [2]).
>> [1] could not be properly resolved because at the time I was not able to
>> reproduce the issue on a dev setup.
>> I suspect [2] is related.
>>
>> Would you be able to prepare a thread dump from your engine instance?
>> Additionally, please check for potential libvirt errors/warnings.
>> Can you also paste the output of:
>> sudo yum list installed | grep vdsm
>> sudo yum list installed | grep ovirt-engine
>> sudo yum list installed | grep libvirt
>>
>> According to previous reports, restarting the engine usually restores
>> connectivity with the hosts, at least for some time.
>>
>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1845152
>> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1846338
>>
>> regards,
>> Artur
>>
>>
>>
>> On Thu, Aug 6, 2020 at 8:01 AM Nardus Geldenhuys <nardusg(a)gmail.com>
>> wrote:
>>
>>> I also see this in the engine:
>>>
>>> Aug 6, 2020, 7:37:17 AM
>>> VDSM someserver command Get Host Capabilities failed: Message timeout
>>> which can be caused by communication issues
>>>
>>> On Thu, 6 Aug 2020 at 07:09, Strahil Nikolov <hunter86_bg(a)yahoo.com>
>>> wrote:
>>>
>>>> Can you check for errors on the affected host? Most probably you need
>>>> the vdsm logs.
>>>>
>>>> Best Regards,
>>>> Strahil Nikolov
>>>>
>>>> On 6 August 2020 7:40:23 GMT+03:00, Nardus Geldenhuys <
>>>> nardusg(a)gmail.com> wrote:
>>>> >Hi Strahil
>>>> >
>>>> >Hope you are well. I get the following error when I try to confirm the
>>>> >reboot:
>>>> >
>>>> >Error while executing action: Cannot confirm 'Host has been rebooted'
>>>> >Host.
>>>> >Valid Host statuses are "Non operational", "Maintenance" or
>>>> >"Connecting".
>>>> >
>>>> >And I can't put it into maintenance; the only options are "restart" or
>>>> >"stop".
>>>> >
>>>> >Regards
>>>> >
>>>> >Nar
>>>> >
>>>> >On Thu, 6 Aug 2020 at 06:16, Strahil Nikolov <hunter86_bg(a)yahoo.com>
>>>> >wrote:
>>>> >
>>>> >> After rebooting the node, have you "marked" it as rebooted?
>>>> >>
>>>> >> Best Regards,
>>>> >> Strahil Nikolov
>>>> >>
>>>> >> On 5 August 2020 21:29:04 GMT+03:00, Nardus Geldenhuys <
>>>> >> nardusg(a)gmail.com> wrote:
>>>> >> >Hi oVirt land
>>>> >> >
>>>> >> >Hope you are well. I've got a bit of an issue, actually a big issue.
>>>> >> >We had some sort of dip. All the VMs are still running, but some of
>>>> >> >the hosts show "Unassigned" or "NonResponsive". All the hosts were
>>>> >> >showing UP and were fine before the dip. I increased
>>>> >> >vdsHeartbeatInSeconds to 240, with no luck.
>>>> >> >
>>>> >> >I still get a timeout on the engine lock even though I can connect to
>>>> >> >that host from the engine using nc on port 54321. I also restarted
>>>> >> >vdsmd and rebooted the host, with no luck.
>>>> >> >
>>>> >> > nc -v someserver 54321
>>>> >> >Ncat: Version 7.50 ( https://nmap.org/ncat )
>>>> >> >Ncat: Connected to 172.40.2.172:54321.
>>>> >> >
>>>> >> >2020-08-05 20:20:34,256+02 ERROR
>>>> >> >[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>>>> >> >(EE-ManagedThreadFactory-engineScheduled-Thread-70) [] EVENT_ID:
>>>> >> >VDS_BROKER_COMMAND_FAILURE(10,802), VDSM someserver command Get Host
>>>> >> >Capabilities failed: Message timeout which can be caused by
>>>> >> >communication issues
>>>> >> >
>>>> >> >Any troubleshooting ideas would be greatly appreciated.
>>>> >> >
>>>> >> >Regards
>>>> >> >
>>>> >> >Nar
>>>> >>
>>>>
>>> _______________________________________________
>>> Users mailing list -- users(a)ovirt.org
>>> To unsubscribe send an email to users-leave(a)ovirt.org
>>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
>>> oVirt Code of Conduct:
>>> https://www.ovirt.org/community/about/community-guidelines/
>>> List Archives:
>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/C4HB2J3MH76...
>>>
>>
>>
>> --
>> Artur Socha
>> Senior Software Engineer, RHV
>> Red Hat
>>
>
--
Artur Socha
Senior Software Engineer, RHV
Red Hat