[ovirt-devel] oVirt node 3.6 and CPU load indefinitely stuck on 100% while vdsmd indefinitely tries to restart
Simone Tiraboschi
stirabos at redhat.com
Mon Jun 1 07:56:01 UTC 2015
----- Original Message -----
> From: "Douglas Schilling Landgraf" <dougsland at redhat.com>
> To: "Simone Tiraboschi" <stirabos at redhat.com>, devel at ovirt.org
> Cc: "Fabian Deutsch" <fdeutsch at redhat.com>
> Sent: Saturday, May 30, 2015 11:28:38 PM
> Subject: Re: oVirt node 3.6 and CPU load indefinitely stuck on 100% while vdsmd indefinitely tries to restart
>
> On 05/29/2015 06:44 AM, Simone Tiraboschi wrote:
> > Hi,
> > I tried to have hosted-engine deploy the engine appliance on oVirt
> > node. I think it will be quite a common scenario.
> > I tried with an oVirt node build from yesterday.
> >
> > Unfortunately I'm not able to complete the setup because oVirt node gets
> > its CPU load stuck at 100% indefinitely, so it's almost unresponsive.
> >
> > The issue seems to be related to the vdsmd daemon, which can't really start
> > and so retries indefinitely, using all the available CPU power (it also
> > runs with niceness -20...).
> >
> > [root@node36 admin]# grep "Unit vdsmd.service entered failed state."
> > /var/log/messages | wc -l
> > 368
> > It tried 368 times in a row in a few minutes.
> >
> > With journalctl I can read:
> > May 29 10:06:45 node36 systemd[1]: Unit vdsmd.service entered failed state.
> > May 29 10:06:45 node36 systemd[1]: vdsmd.service holdoff time over,
> > scheduling restart.
> > May 29 10:06:45 node36 systemd[1]: Stopping Virtual Desktop Server
> > Manager...
> > May 29 10:06:45 node36 systemd[1]: Starting Virtual Desktop Server
> > Manager...
> > May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running mkdirs
> > May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> > configure_coredump
> > May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> > configure_vdsm_logs
> > May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> > wait_for_network
> > May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> > run_init_hooks
> > May 29 10:06:46 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> > upgraded_version_check
> > May 29 10:06:46 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> > check_is_configured
> > May 29 10:06:46 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> > validate_configuration
> > May 29 10:06:47 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> > prepare_transient_repository
> > May 29 10:06:49 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> > syslog_available
> > May 29 10:06:49 node36 vdsmd_init_common.sh[13697]: vdsm: Running nwfilter
> > May 29 10:06:50 node36 vdsmd_init_common.sh[13697]: vdsm: Running dummybr
> > May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> > load_needed_modules
> > May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> > tune_system
> > May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> > test_space
> > May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running test_lo
> > May 29 10:06:51 node36 systemd[1]: Started Virtual Desktop Server Manager.
> > May 29 10:06:51 node36 systemd[1]: vdsmd.service: main process exited,
> > code=exited, status=1/FAILURE
> > May 29 10:06:51 node36 vdsmd_init_common.sh[13821]: vdsm: Running
> > run_final_hooks
> > May 29 10:06:52 node36 systemd[1]: Unit vdsmd.service entered failed state.
> > May 29 10:06:52 node36 systemd[1]: vdsmd.service holdoff time over,
> > scheduling restart.
> > May 29 10:06:52 node36 systemd[1]: Stopping Virtual Desktop Server
> > Manager...
> > May 29 10:06:52 node36 systemd[1]: Starting Virtual Desktop Server
> > Manager...
> > repeated a lot of times
> >
> > /var/log/vdsm/vdsm.log is empty.
> >
> > while
> > [root@node36 admin]# /usr/share/vdsm/daemonAdapter -0 /dev/null -1
> > /dev/null -2 /dev/null /usr/share/vdsm/vdsm; echo $?
> > 1
> >
>
> Thanks for the report Simone. From my tests you are facing:
>
> non-root user cannot `from ovirtnode import ovirtfunctions`: permission
> denied: '/var/log/ovirt-node.log' and '/var/log/ovirt.log'
> https://bugzilla.redhat.com/show_bug.cgi?id=1224400
>
> We should handle this bug very soon. The workaround is to run chmod o+rw on
> /var/log/ovirt.log /var/log/ovirt-node.log
OK. I tried
[root@node36 admin]# chmod o+rw /var/log/ovirt.log /var/log/ovirt-node.log
but now I'm getting:
[root@node36 admin]# systemctl status -l vdsmd
vdsmd.service - Virtual Desktop Server Manager
Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
Active: active (running) since Mon 2015-06-01 07:53:09 UTC; 17s ago
Process: 4040 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS)
Process: 4049 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
Main PID: 4164 (vdsm)
CGroup: /system.slice/vdsmd.service
└─4164 /usr/bin/python /usr/share/vdsm/vdsm
Jun 01 07:53:07 node36 vdsmd_init_common.sh[4049]: vdsm: Running nwfilter
Jun 01 07:53:08 node36 vdsmd_init_common.sh[4049]: vdsm: Running dummybr
Jun 01 07:53:09 node36 vdsmd_init_common.sh[4049]: vdsm: Running load_needed_modules
Jun 01 07:53:09 node36 vdsmd_init_common.sh[4049]: vdsm: Running tune_system
Jun 01 07:53:09 node36 vdsmd_init_common.sh[4049]: vdsm: Running test_space
Jun 01 07:53:09 node36 vdsmd_init_common.sh[4049]: vdsm: Running test_lo
Jun 01 07:53:09 node36 systemd[1]: Started Virtual Desktop Server Manager.
Jun 01 07:53:10 node36 vdsm[4164]: vdsm vds ERROR failed to init clientIF, shutting down storage dispatcher
Jun 01 07:53:10 node36 vdsm[4164]: vdsm vds ERROR Exception raised
Traceback (most recent call last):
File "/usr/share/vdsm/vdsm", line 154, in run
serve_clients(log)
File "/usr/share/vdsm/vdsm", line 93, in serve_clients
cif = clientIF.getInstance(irs, log)
File "/usr/share/vdsm/clientIF.py", line 166, in getInstance
File "/usr/share/vdsm/clientIF.py", line 112, in __init__
File "/usr/share/vdsm/clientIF.py", line 170, in _createAcceptor
File "/usr/share/vdsm/clientIF.py", line 183, in _createSSLContext
File "/usr/lib/python2.7/site-packages/vdsm/sslutils.py", line 149, in __init__
File "/usr/lib/python2.7/site-packages/vdsm/sslutils.py", line 174, in _initContext
File "/usr/lib/python2.7/site-packages/vdsm/sslutils.py", line 153, in _loadCertChain
File "/usr/lib64/python2.7/site-packages/M2Crypto/SSL/Context.py", line 100, in load_cert_chain
SSLError: No such file or directory
Jun 01 07:53:20 node36 vdsm[4164]: vdsm vds ERROR Vm's recovery failed
Traceback (most recent call last):
File "/usr/share/vdsm/clientIF.py", line 416, in _recoverExistingVms
File "/usr/share/vdsm/caps.py", line 177, in __init__
File "/usr/share/vdsm/caps.py", line 209, in _getCpuTopology
File "/usr/share/vdsm/caps.py", line 199, in _getFreshCapsXMLStr
File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 162, in get
File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 99, in open_connection
File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 1008, in retry
File "/usr/lib64/python2.7/site-packages/libvirt.py", line 105, in openAuth
libvirtError: authentication failed: polkit: polkit\56retains_authorization_after_challenge=1
Authorization requires authentication but no agent is available.
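The first traceback ends in M2Crypto's load_cert_chain with "No such file or directory", which suggests that a certificate or key file vdsm expects is not there. A minimal sketch to check that, assuming the usual vdsm PKI locations (these paths are an assumption, not something confirmed in this thread, and the helper name missing_files is only illustrative):

```python
import os

# Assumed default vdsm PKI files; adjust to whatever the local
# vdsm configuration actually points at.
ASSUMED_PKI_FILES = [
    "/etc/pki/vdsm/keys/vdsmkey.pem",
    "/etc/pki/vdsm/certs/vdsmcert.pem",
    "/etc/pki/vdsm/certs/cacert.pem",
]


def missing_files(paths):
    """Return the paths that do not exist as regular files."""
    return [p for p in paths if not os.path.isfile(p)]


if __name__ == "__main__":
    missing = missing_files(ASSUMED_PKI_FILES)
    if missing:
        for path in missing:
            print("missing: %s" % path)
    else:
        print("all assumed PKI files are present")
```

If any of these turn out to be missing, the SSL failure would be independent of the log file permissions addressed by the chmod workaround.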
Was it just a partial workaround or am I facing a different issue?
thanks,
Simone
> --
> Cheers
> Douglas
>