Artur
On Mon, Aug 9, 2021 at 5:01 PM Andrei Verovski <andreil1(a)starlett.lv> wrote:
Hi
Should I use threaddump_linux.sh.tar.gz from the following page?
https://access.redhat.com/solutions/18178
> On 9 Aug 2021, at 17:56, Artur Socha <asocha(a)redhat.com> wrote:
>
> Actually, you could even take three thread dumps at 30-second intervals.
> Artur
>
> On Mon, Aug 9, 2021 at 4:53 PM Artur Socha <asocha(a)redhat.com> wrote:
> Unfortunately I don't see anything wrong in either the engine or vdsm logs.
> There is one last thing that comes to mind that you could try: restarting the
> engine service. That is exactly the case I have been investigating.
> But before restarting I would like to ask you, if possible, for a Java
> (JVM) thread dump.
> The procedure is as follows:
> 1) find the jboss PID, e.g.
> $ ps -ef | grep jboss | grep -v grep | awk '{ print $2 }'
> 2) trigger the thread dump
> $ kill -3 <jboss-pid>
> 3) the thread dump output can be found in /var/log/ovirt-engine/console.log
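
The thread dump steps above, repeated three times at 30-second intervals as suggested in the follow-up message, could be wrapped in a small shell function. This is only a sketch: the function name take_thread_dumps and its arguments are made up for illustration, and it assumes the PID passed in is the one found in step 1.

```shell
# take_thread_dumps PID [COUNT] [INTERVAL_SECONDS]
# Hypothetical helper: asks a JVM for COUNT thread dumps, INTERVAL seconds apart.
take_thread_dumps() {
    pid="$1"
    count="${2:-3}"
    interval="${3:-30}"
    # Accept only a plain numeric PID.
    case "$pid" in
        ''|*[!0-9]*)
            echo "usage: take_thread_dumps <pid> [count] [interval-seconds]" >&2
            return 2
            ;;
    esac
    i=1
    while [ "$i" -le "$count" ]; do
        # Signal 3 is SIGQUIT; a JVM responds by printing a thread dump
        # to its console output and keeps running.
        kill -3 "$pid" || return 1
        echo "requested thread dump $i of $count from PID $pid"
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
}

# Example: take_thread_dumps <jboss-pid> 3 30
```

With the defaults this sends three signals about a minute apart in total, so the dumps in console.log can be compared to see which threads stay stuck.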
>
> Then restart the engine service to check whether that helps.
>
> Artur
>
>
> On Mon, Aug 9, 2021 at 2:19 PM Andrei Verovski <andreil1(a)starlett.lv>
wrote:
> Hi, Artur,
>
> A small update on the vdsm status, which I forgot to include in my previous post.
>
> I partially fixed the problem with VDSM startup.
>
> The bug "Failed to create session: Start job for unit user-0.slice failed
> with 'canceled'" is described here:
>
https://bugzilla.redhat.com/show_bug.cgi?id=1967962
> and a fix seems to be available here, so I have downgraded systemd to a
> build with the backported fix:
>
http://people.redhat.com/dtardon/systemd/bz1642460-backport-UserStopDelay...
>
> Now the vdsmd service starts successfully, but node14 still cannot be
> activated because of the same error. This is quite strange: before the
> restart on Friday the node just worked. There were no upgrades, nothing,
> just a restart.
>
> [root@node14 ~]# service vdsmd status
> Redirecting to /bin/systemctl status vdsmd.service
> ● vdsmd.service - Virtual Desktop Server Manager
> Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: disabled)
> Active: active (running) since Mon 2021-08-09 15:12:59 EEST; 4min 20s ago
> Process: 4066 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
> Main PID: 4130 (vdsmd)
> Tasks: 41 (limit: 615525)
> Memory: 59.5M
> CGroup: /system.slice/vdsmd.service
> └─4130 /usr/bin/python3 /usr/share/vdsm/vdsmd
>
> Aug 09 15:12:55 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running prepare_transient_repository
> Aug 09 15:12:57 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running syslog_available
> Aug 09 15:12:57 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running nwfilter
> Aug 09 15:12:58 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running dummybr
> Aug 09 15:12:58 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running tune_system
> Aug 09 15:12:58 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running test_space
> Aug 09 15:12:59 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running test_lo
> Aug 09 15:12:59 node14.***.lv systemd[1]: Started Virtual Desktop Server Manager.
> Aug 09 15:13:00 node14.***.lv vdsm[4130]: WARN MOM not available. Error: [Errno 111] Connection refused
> Aug 09 15:13:00 node14.***.lv vdsm[4130]: WARN MOM not available, KSM stats will be missing. Error:
>
>
> [root@node14]# firewall-cmd --list-all
> public (active)
> target: default
> icmp-block-inversion: no
> interfaces: DMZ_node14 eno1 eno2 ovirtmgmt
> sources:
> services: cockpit dhcpv6-client libvirt-tls mountd nfs ovirt-imageio ovirt-vmconsole rpc-bind snmp ssh vdsm
> ports: 2301/tcp 2381/tcp 22/tcp 6081/udp
> protocols:
> forward: no
> masquerade: no
> forward-ports:
> source-ports:
> icmp-blocks:
> rich rules:
> [root@node14 andrei]#
>
>
> vdsm-client Host getStats and vdsm-client Host getCapabilities attached.
>
>
>
>
>> On 9 Aug 2021, at 13:18, Artur Socha <asocha(a)redhat.com> wrote:
>>
>> Thanks for the logs. I am checking them at the moment. So far I have noticed
>> that node14 serves an NFS share that had been marked as problematic (probably
>> because of the downtime during the migration), but it has recovered.
>>
>> In the meantime, is it possible to get some meaningful results when calling:
>> $ vdsm-client Host getStats
>> and
>> $ vdsm-client Host getCapabilities
>> on node14?
>>
>> What is the state of the vdsmd service when running systemctl status vdsmd?
>> One other thing to rule out is the networking/firewall. Here is the list of
>> ports that must be open for the host (the documentation is for a hosted
>> engine, but it applies to a standalone setup as well):
>>
https://www.ovirt.org/documentation/installing_ovirt_as_a_self-hosted_eng...
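
To rule out the network or firewall from the engine side, a quick TCP reachability probe can help. The following is only a sketch: the function name check_port is made up for illustration, and 54321/tcp is assumed here to be the vdsm port, so please verify it against the ports list in the documentation linked above. It relies on bash's /dev/tcp pseudo-device and the coreutils timeout command.

```shell
# check_port HOST PORT: report whether a TCP connection can be opened.
# Hypothetical helper, not part of oVirt or vdsm.
check_port() {
    host="$1"
    port="$2"
    # bash opens a TCP socket when redirecting to /dev/tcp/HOST/PORT;
    # timeout gives up after 3 seconds if packets are silently dropped.
    if timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
        echo "open"
    else
        echo "closed or filtered"
    fi
}

# Example, run from the engine machine (54321 is an assumed vdsm port):
# check_port node14.mydomain.lv 54321
```

A result of "open" only shows that the port accepts connections; it does not prove that the engine and vdsm can complete their handshake.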
>>
>> By the way, I have been hunting a rare and hard-to-reproduce bug for
>> quite a long time (without success so far), so any reported connectivity
>> issues between the manager and hosts are super interesting to me.
>>
>> Artur
>>
>> On Mon, Aug 9, 2021 at 11:44 AM Andrei Verovski <andreil1(a)***.lv>
wrote:
>> Hi, Artur,
>>
>>
>> Thanks for the assistance. The zipped engine log, starting from the day of
>> the upgrade, is attached.
>> Restart via SSH from the oVirt Web GUI works.
>> The oVirt engine runs on a dedicated server; it is not a hosted engine.
>>
>>
>>
>>
>>> On 9 Aug 2021, at 11:24, Artur Socha <asocha(a)redhat.com> wrote:
>>>
>>> Hi Andrei,
>>> Could you also post the relevant piece of engine.log? I don't have high
>>> expectations of finding the answer there, but I just want to be sure.
>>> vdsm.log does not show any trace of an error from the vdsm point of view.
>>> For example, it looks like vdsm started correctly and subscribed to receive
>>> commands from the engine (yet that does not mean it is connected to it, only
>>> that it is in listening mode).
>>>
>>> Can you confirm that 'SSH restart' from the UI works? By 'works' I mean
>>> the host is actually restarted after a few minutes and there are no
>>> SSH-related (public key etc.) errors in engine.log.
>>>
>>> Artur
>>>
>>> On Mon, Aug 9, 2021 at 9:55 AM Andrei Verovski <andreil1(a)***.lv>
wrote:
>>> Hi,
>>>
>>> I have oVirt 4.4.7.6-1.el8 and one problematic node (HP ProLiant with
>>> CentOS 8 Stream).
>>> After replacing the server rack router switch and restarting, I got this
>>> error, which I can’t recover from:
>>>
>>> VDSM node14 command Get Host Capabilities failed: Message timeout which can be caused by communication issues
>>>
>>> vdsm-network is running fine, but vdsmd can’t start on node14 for
>>> whatever reason. All other nodes are running fine.
>>>
>>> Aug 09 10:24:12 node14.mydomain.lv vdsmd_init_common.sh[4825]: vdsm: Running dummybr
>>> Aug 09 10:24:13 node14.mydomain.lv vdsmd_init_common.sh[4825]: vdsm: Running tune_system
>>> Aug 09 10:24:13 node14.mydomain.lv vdsmd_init_common.sh[4825]: vdsm: Running test_space
>>> Aug 09 10:24:13 node14.mydomain.lv vdsmd_init_common.sh[4825]: vdsm: Running test_lo
>>> Aug 09 10:24:13 node14.mydomain.lv systemd[1]: Started Virtual Desktop Server Manager.
>>> Aug 09 10:24:16 node14.mydomain.lv sudo[7721]: pam_systemd(sudo:session): Failed to create session: Start job for unit user-0.slice failed with 'canceled'
>>> Aug 09 10:24:16 node14.mydomain.lv sudo[7721]: pam_unix(sudo:session): session opened for user root by (uid=0)
>>> Aug 09 10:24:16 node14.mydomain.lv sudo[7721]: pam_unix(sudo:session): session closed for user root
>>> Aug 09 10:24:17 node14.mydomain.lv vdsm[6754]: WARN MOM not available. Error: [Errno 2] No such file or directory
>>> Aug 09 10:24:17 node14.mydomain.lv vdsm[6754]: WARN MOM not available, KSM stats will be missing. Error:
>>>
>>>
>>> In the web GUI -> Management I can’t do anything with the host except
>>> restart it. Stop aborts with an error; all other commands are grayed out.
>>> The status is “Unassigned”. The host answers pings as usual.
>>> vdsm.log (from node14) attached.
>>>
>>> Thanks in advance for any help.
>>>
>>>
>>>
>>> --
>>> Artur Socha
>>> Senior Software Engineer, RHV
>>> Red Hat