----- Original Message -----
From: "Douglas Schilling Landgraf" <dougsland(a)redhat.com>
To: "Simone Tiraboschi" <stirabos(a)redhat.com>, devel(a)ovirt.org
Cc: "Fabian Deutsch" <fdeutsch(a)redhat.com>
Sent: Saturday, May 30, 2015 11:28:38 PM
Subject: Re: oVirt node 3.6 and CPU load indefinitely stuck on 100% while vdsmd indefinitely tries to restart
On 05/29/2015 06:44 AM, Simone Tiraboschi wrote:
> Hi,
> I tried to have hosted-engine deploy the engine appliance over oVirt
> node. I think it will be quite a common scenario.
> I tried with an oVirt node build from yesterday.
>
> Unfortunately I'm not able to complete the setup because oVirt node got the
> CPU load indefinitely stuck on 100%, so it's almost unresponsive.
>
> The issue seems to be related to the vdsmd daemon, which can't really start
> and so retries indefinitely, using all the available CPU power (it also
> runs with niceness -20...).
>
> [root@node36 admin]# grep "Unit vdsmd.service entered failed state."
> /var/log/messages | wc -l
> 368
> It tried 368 times in a row in a few minutes.
>
> With journalctl I can read:
> May 29 10:06:45 node36 systemd[1]: Unit vdsmd.service entered failed state.
> May 29 10:06:45 node36 systemd[1]: vdsmd.service holdoff time over,
> scheduling restart.
> May 29 10:06:45 node36 systemd[1]: Stopping Virtual Desktop Server
> Manager...
> May 29 10:06:45 node36 systemd[1]: Starting Virtual Desktop Server
> Manager...
> May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running mkdirs
> May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> configure_coredump
> May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> configure_vdsm_logs
> May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> wait_for_network
> May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> run_init_hooks
> May 29 10:06:46 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> upgraded_version_check
> May 29 10:06:46 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> check_is_configured
> May 29 10:06:46 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> validate_configuration
> May 29 10:06:47 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> prepare_transient_repository
> May 29 10:06:49 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> syslog_available
> May 29 10:06:49 node36 vdsmd_init_common.sh[13697]: vdsm: Running nwfilter
> May 29 10:06:50 node36 vdsmd_init_common.sh[13697]: vdsm: Running dummybr
> May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> load_needed_modules
> May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> tune_system
> May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> test_space
> May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running test_lo
> May 29 10:06:51 node36 systemd[1]: Started Virtual Desktop Server Manager.
> May 29 10:06:51 node36 systemd[1]: vdsmd.service: main process exited,
> code=exited, status=1/FAILURE
> May 29 10:06:51 node36 vdsmd_init_common.sh[13821]: vdsm: Running
> run_final_hooks
> May 29 10:06:52 node36 systemd[1]: Unit vdsmd.service entered failed state.
> May 29 10:06:52 node36 systemd[1]: vdsmd.service holdoff time over,
> scheduling restart.
> May 29 10:06:52 node36 systemd[1]: Stopping Virtual Desktop Server
> Manager...
> May 29 10:06:52 node36 systemd[1]: Starting Virtual Desktop Server
> Manager...
> (repeated many times)
>
> /var/log/vdsm/vdsm.log is empty.
>
> while
> [root@node36 admin]# /usr/share/vdsm/daemonAdapter -0 /dev/null -1
> /dev/null -2 /dev/null /usr/share/vdsm/vdsm; echo $?
> 1
>
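The restart count quoted above can also be reproduced without grep. The following is only a minimal sketch, assuming the systemd message format shown in the journal excerpt; the function name count_failed_restarts is made up for illustration.

```python
def count_failed_restarts(lines, unit="vdsmd.service"):
    """Count log lines reporting that `unit` entered the failed state."""
    # Message text copied from the journal excerpt in this thread.
    marker = "Unit %s entered failed state." % unit
    return sum(1 for line in lines if marker in line)
```

Passing it an open file object for /var/log/messages (which normally requires root) should give the same number as the grep | wc -l pipeline.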
Thanks for the report, Simone. From my tests you are facing:
non-root user cannot `from ovirtnode import ovirtfunctions`: permission
denied: '/var/log/ovirt-node.log' and '/var/log/ovirt.log'
https://bugzilla.redhat.com/show_bug.cgi?id=1224400
We should handle this bug very soon. The workaround is to run chmod o+rw on
/var/log/ovirt.log and /var/log/ovirt-node.log
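To confirm whether the workaround has actually been applied on a host, the permission bits can be inspected directly. This is a rough sketch, not part of vdsm or ovirt-node; the helper name missing_other_rw is hypothetical, and it only checks the "other" read/write bits that chmod o+rw sets.

```python
import os
import stat


def missing_other_rw(paths):
    """Return the paths that exist but lack read+write permission for 'other'."""
    needed = stat.S_IROTH | stat.S_IWOTH
    lacking = []
    for path in paths:
        try:
            mode = os.stat(path).st_mode
        except OSError:
            # A missing or unreadable file is a separate problem; skip it here.
            continue
        if (mode & needed) != needed:
            lacking.append(path)
    return lacking
```

Calling missing_other_rw(["/var/log/ovirt.log", "/var/log/ovirt-node.log"]) should return an empty list once the chmod has been done.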
OK. I tried
[root@node36 admin]# chmod o+rw /var/log/ovirt.log /var/log/ovirt-node.log
but now I'm getting:
[root@node36 admin]# systemctl status -l vdsmd
vdsmd.service - Virtual Desktop Server Manager
Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
Active: active (running) since Mon 2015-06-01 07:53:09 UTC; 17s ago
Process: 4040 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS)
Process: 4049 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
Main PID: 4164 (vdsm)
CGroup: /system.slice/vdsmd.service
└─4164 /usr/bin/python /usr/share/vdsm/vdsm
Jun 01 07:53:07 node36 vdsmd_init_common.sh[4049]: vdsm: Running nwfilter
Jun 01 07:53:08 node36 vdsmd_init_common.sh[4049]: vdsm: Running dummybr
Jun 01 07:53:09 node36 vdsmd_init_common.sh[4049]: vdsm: Running load_needed_modules
Jun 01 07:53:09 node36 vdsmd_init_common.sh[4049]: vdsm: Running tune_system
Jun 01 07:53:09 node36 vdsmd_init_common.sh[4049]: vdsm: Running test_space
Jun 01 07:53:09 node36 vdsmd_init_common.sh[4049]: vdsm: Running test_lo
Jun 01 07:53:09 node36 systemd[1]: Started Virtual Desktop Server Manager.
Jun 01 07:53:10 node36 vdsm[4164]: vdsm vds ERROR failed to init clientIF, shutting down storage dispatcher
Jun 01 07:53:10 node36 vdsm[4164]: vdsm vds ERROR Exception raised
Traceback (most recent call last):
  File "/usr/share/vdsm/vdsm", line 154, in run
    serve_clients(log)
  File "/usr/share/vdsm/vdsm", line 93, in serve_clients
    cif = clientIF.getInstance(irs, log)
  File "/usr/share/vdsm/clientIF.py", line 166, in getInstance
  File "/usr/share/vdsm/clientIF.py", line 112, in __init__
  File "/usr/share/vdsm/clientIF.py", line 170, in _createAcceptor
  File "/usr/share/vdsm/clientIF.py", line 183, in _createSSLContext
  File "/usr/lib/python2.7/site-packages/vdsm/sslutils.py", line 149, in __init__
  File "/usr/lib/python2.7/site-packages/vdsm/sslutils.py", line 174, in _initContext
  File "/usr/lib/python2.7/site-packages/vdsm/sslutils.py", line 153, in _loadCertChain
  File "/usr/lib64/python2.7/site-packages/M2Crypto/SSL/Context.py", line 100, in load_cert_chain
SSLError: No such file or directory
Jun 01 07:53:20 node36 vdsm[4164]: vdsm vds ERROR Vm's recovery failed
Traceback (most recent call last):
  File "/usr/share/vdsm/clientIF.py", line 416, in _recoverExistingVms
  File "/usr/share/vdsm/caps.py", line 177, in __init__
  File "/usr/share/vdsm/caps.py", line 209, in _getCpuTopology
  File "/usr/share/vdsm/caps.py", line 199, in _getFreshCapsXMLStr
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 162, in get
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 99, in open_connection
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 1008, in retry
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 105, in openAuth
libvirtError: authentication failed: polkit: polkit\56retains_authorization_after_challenge=1
Authorization requires authentication but no agent is available.
Was it just a partial workaround or am I facing a different issue?
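Since the SSLError comes from load_cert_chain and says "No such file or directory", one quick check is whether the key and certificate files exist at all. The sketch below is only illustrative: the paths are my assumption of the usual vdsm defaults and may differ on this node (vdsm.conf has the values actually in use), and missing_files is a made-up helper name.

```python
import os

# ASSUMPTION: typical default locations of the vdsm key, certificate and CA
# certificate. They are not taken from this thread and may differ per host.
ASSUMED_VDSM_PKI_FILES = (
    "/etc/pki/vdsm/keys/vdsmkey.pem",
    "/etc/pki/vdsm/certs/vdsmcert.pem",
    "/etc/pki/vdsm/certs/cacert.pem",
)


def missing_files(paths):
    """Return the subset of `paths` that are not existing regular files."""
    return [path for path in paths if not os.path.isfile(path)]
```

If missing_files(ASSUMED_VDSM_PKI_FILES) returns anything on the node, that would at least be consistent with the first traceback. It would not explain the polkit error in the second one.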
thanks,
Simone
--
Cheers
Douglas