[ovirt-devel] oVirt node 3.6 and CPU load indefinitely stuck on 100% while vdsmd indefinitely tries to restart

Simone Tiraboschi stirabos at redhat.com
Mon Jun 1 07:56:01 UTC 2015



----- Original Message -----
> From: "Douglas Schilling Landgraf" <dougsland at redhat.com>
> To: "Simone Tiraboschi" <stirabos at redhat.com>, devel at ovirt.org
> Cc: "Fabian Deutsch" <fdeutsch at redhat.com>
> Sent: Saturday, May 30, 2015 11:28:38 PM
> Subject: Re: oVirt node 3.6 and CPU load indefinitely stuck on 100% while vdsmd indefinitely tries to restart
> 
> On 05/29/2015 06:44 AM, Simone Tiraboschi wrote:
> > Hi,
> > I tried to have hosted-engine deploying the engine appliance over oVirt
> > node. I think it will be quite a common scenario.
> > I tried with an oVirt node build from yesterday.
> >
> > Unfortunately I'm not able to conclude the setup because oVirt node got the
> > CPU load indefinitely stuck on 100% and so it's almost unresponsive.
> >
> > The issue seems to be related to the vdsmd daemon, which couldn't really start
> > and so it retries indefinitely using all the available CPU power (it also
> > runs with niceness -20...).
> >
> > [root at node36 admin]# grep "Unit vdsmd.service entered failed state."
> > /var/log/messages  | wc -l
> > 368
> > It tried 368 times in a row in a few minutes.
> >
> > With journalctl I can read:
> > May 29 10:06:45 node36 systemd[1]: Unit vdsmd.service entered failed state.
> > May 29 10:06:45 node36 systemd[1]: vdsmd.service holdoff time over,
> > scheduling restart.
> > May 29 10:06:45 node36 systemd[1]: Stopping Virtual Desktop Server
> > Manager...
> > May 29 10:06:45 node36 systemd[1]: Starting Virtual Desktop Server
> > Manager...
> > May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running mkdirs
> > May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> > configure_coredump
> > May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> > configure_vdsm_logs
> > May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> > wait_for_network
> > May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> > run_init_hooks
> > May 29 10:06:46 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> > upgraded_version_check
> > May 29 10:06:46 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> > check_is_configured
> > May 29 10:06:46 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> > validate_configuration
> > May 29 10:06:47 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> > prepare_transient_repository
> > May 29 10:06:49 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> > syslog_available
> > May 29 10:06:49 node36 vdsmd_init_common.sh[13697]: vdsm: Running nwfilter
> > May 29 10:06:50 node36 vdsmd_init_common.sh[13697]: vdsm: Running dummybr
> > May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> > load_needed_modules
> > May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> > tune_system
> > May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> > test_space
> > May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running test_lo
> > May 29 10:06:51 node36 systemd[1]: Started Virtual Desktop Server Manager.
> > May 29 10:06:51 node36 systemd[1]: vdsmd.service: main process exited,
> > code=exited, status=1/FAILURE
> > May 29 10:06:51 node36 vdsmd_init_common.sh[13821]: vdsm: Running
> > run_final_hooks
> > May 29 10:06:52 node36 systemd[1]: Unit vdsmd.service entered failed state.
> > May 29 10:06:52 node36 systemd[1]: vdsmd.service holdoff time over,
> > scheduling restart.
> > May 29 10:06:52 node36 systemd[1]: Stopping Virtual Desktop Server
> > Manager...
> > May 29 10:06:52 node36 systemd[1]: Starting Virtual Desktop Server
> > Manager...
> > repeated a lot of times
> >
> > /var/log/vdsm/vdsm.log is empty.
> >
> > while
> > [root at node36 admin]# /usr/share/vdsm/daemonAdapter -0 /dev/null -1
> > /dev/null -2 /dev/null /usr/share/vdsm/vdsm; echo $?
> > 1
> >
> 
> Thanks for the report Simone. From my tests you are facing:
> 
> non-root user cannot `from ovirtnode import ovirtfunctions`: permission
> denied: '/var/log/ovirt-node.log' and '/var/log/ovirt.log'
> https://bugzilla.redhat.com/show_bug.cgi?id=1224400
> 
> We should handle this bug very soon. The workaround is chmod o+rw on
> /var/log/ovirt.log /var/log/ovirt-node.log

OK. I tried 
[root at node36 admin]# chmod o+rw /var/log/ovirt.log /var/log/ovirt-node.log

but now I'm getting:
[root at node36 admin]# systemctl status -l vdsmd
vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
   Active: active (running) since Mon 2015-06-01 07:53:09 UTC; 17s ago
  Process: 4040 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS)
  Process: 4049 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 4164 (vdsm)
   CGroup: /system.slice/vdsmd.service
           └─4164 /usr/bin/python /usr/share/vdsm/vdsm

Jun 01 07:53:07 node36 vdsmd_init_common.sh[4049]: vdsm: Running nwfilter
Jun 01 07:53:08 node36 vdsmd_init_common.sh[4049]: vdsm: Running dummybr
Jun 01 07:53:09 node36 vdsmd_init_common.sh[4049]: vdsm: Running load_needed_modules
Jun 01 07:53:09 node36 vdsmd_init_common.sh[4049]: vdsm: Running tune_system
Jun 01 07:53:09 node36 vdsmd_init_common.sh[4049]: vdsm: Running test_space
Jun 01 07:53:09 node36 vdsmd_init_common.sh[4049]: vdsm: Running test_lo
Jun 01 07:53:09 node36 systemd[1]: Started Virtual Desktop Server Manager.
Jun 01 07:53:10 node36 vdsm[4164]: vdsm vds ERROR failed to init clientIF, shutting down storage dispatcher
Jun 01 07:53:10 node36 vdsm[4164]: vdsm vds ERROR Exception raised
                                   Traceback (most recent call last):
                                     File "/usr/share/vdsm/vdsm", line 154, in run
                                       serve_clients(log)
                                     File "/usr/share/vdsm/vdsm", line 93, in serve_clients
                                       cif = clientIF.getInstance(irs, log)
                                     File "/usr/share/vdsm/clientIF.py", line 166, in getInstance
                                     File "/usr/share/vdsm/clientIF.py", line 112, in __init__
                                     File "/usr/share/vdsm/clientIF.py", line 170, in _createAcceptor
                                     File "/usr/share/vdsm/clientIF.py", line 183, in _createSSLContext
                                     File "/usr/lib/python2.7/site-packages/vdsm/sslutils.py", line 149, in __init__
                                     File "/usr/lib/python2.7/site-packages/vdsm/sslutils.py", line 174, in _initContext
                                     File "/usr/lib/python2.7/site-packages/vdsm/sslutils.py", line 153, in _loadCertChain
                                     File "/usr/lib64/python2.7/site-packages/M2Crypto/SSL/Context.py", line 100, in load_cert_chain
                                   SSLError: No such file or directory
Jun 01 07:53:20 node36 vdsm[4164]: vdsm vds ERROR Vm's recovery failed
                                   Traceback (most recent call last):
                                     File "/usr/share/vdsm/clientIF.py", line 416, in _recoverExistingVms
                                     File "/usr/share/vdsm/caps.py", line 177, in __init__
                                     File "/usr/share/vdsm/caps.py", line 209, in _getCpuTopology
                                     File "/usr/share/vdsm/caps.py", line 199, in _getFreshCapsXMLStr
                                     File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 162, in get
                                     File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 99, in open_connection
                                     File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 1008, in retry
                                     File "/usr/lib64/python2.7/site-packages/libvirt.py", line 105, in openAuth
                                   libvirtError: authentication failed: polkit: polkit\56retains_authorization_after_challenge=1
                                   Authorization requires authentication but no agent is available.

Was it just a partial workaround or am I facing a different issue?
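
To help tell the two cases apart, here is a rough sketch of a check that could be run on the node. It verifies that the two log files from the workaround really are readable and writable by "other", and lists which certificate or key files are absent, since the first traceback ends in load_cert_chain with "No such file or directory". The helper names (world_rw, missing_files) are made up for illustration, and the PKI paths are only my assumption of the usual VDSM locations; they are not taken from the log above.

```python
from __future__ import print_function  # keeps print() working on Python 2.7 too

import os
import stat


def world_rw(path):
    """Return True if 'other' has both read and write permission on path."""
    mode = os.stat(path).st_mode
    wanted = stat.S_IROTH | stat.S_IWOTH
    return (mode & wanted) == wanted


def missing_files(paths):
    """Return the paths that do not exist as regular files."""
    return [p for p in paths if not os.path.isfile(p)]


if __name__ == "__main__":
    # Log files named in the suggested workaround.
    for log in ("/var/log/ovirt.log", "/var/log/ovirt-node.log"):
        try:
            print(log, "o+rw" if world_rw(log) else "NOT o+rw")
        except OSError as err:
            print(log, "cannot stat:", err)

    # ASSUMPTION: typical VDSM certificate/key locations, not confirmed
    # by the traceback; adjust to whatever vdsm.conf points at.
    pki = [
        "/etc/pki/vdsm/certs/cacert.pem",
        "/etc/pki/vdsm/certs/vdsmcert.pem",
        "/etc/pki/vdsm/keys/vdsmkey.pem",
    ]
    print("missing:", missing_files(pki))
```

If the log files report o+rw and something shows up under "missing", that would point to a separate certificate problem rather than an incomplete workaround.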

thanks,
Simone


> --
> Cheers
> Douglas
>


