[ovirt-devel] oVirt node 3.6 and CPU load indefinitely stuck on 100% while vdsmd indefinitely tries to restart

Simone Tiraboschi stirabos at redhat.com
Fri May 29 16:36:40 UTC 2015



----- Original Message -----
> From: "Nir Soffer" <nsoffer at redhat.com>
> To: "Simone Tiraboschi" <stirabos at redhat.com>
> Cc: devel at ovirt.org, "Fabian Deutsch" <fdeutsch at redhat.com>
> Sent: Friday, May 29, 2015 5:45:48 PM
> Subject: Re: [ovirt-devel] oVirt node 3.6 and CPU load indefinitely stuck on 100% while vdsmd indefinitely tries to
> restart
> 
> 
> 
> ----- Original Message -----
> > From: "Simone Tiraboschi" <stirabos at redhat.com>
> > To: "Nir Soffer" <nsoffer at redhat.com>
> > Cc: devel at ovirt.org, "Fabian Deutsch" <fdeutsch at redhat.com>
> > Sent: Friday, May 29, 2015 6:42:08 PM
> > Subject: Re: [ovirt-devel] oVirt node 3.6 and CPU load indefinitely stuck
> > on 100% while vdsmd indefinitely tries to
> > restart
> > 
> > 
> > 
> > ----- Original Message -----
> > > From: "Nir Soffer" <nsoffer at redhat.com>
> > > To: "Simone Tiraboschi" <stirabos at redhat.com>
> > > Cc: devel at ovirt.org, "Fabian Deutsch" <fdeutsch at redhat.com>
> > > Sent: Friday, May 29, 2015 5:26:52 PM
> > > Subject: Re: [ovirt-devel] oVirt node 3.6 and CPU load indefinitely stuck
> > > on 100% while vdsmd indefinitely tries to
> > > restart
> > > 
> > > ----- Original Message -----
> > > > From: "Simone Tiraboschi" <stirabos at redhat.com>
> > > > To: devel at ovirt.org
> > > > Cc: "Fabian Deutsch" <fdeutsch at redhat.com>
> > > > Sent: Friday, May 29, 2015 1:44:02 PM
> > > > Subject: [ovirt-devel] oVirt node 3.6 and CPU load indefinitely stuck
> > > > on
> > > > 100% while vdsmd indefinitely tries to
> > > > restart
> > > > 
> > > > Hi,
> > > > I tried to have hosted-engine deploy the engine appliance on oVirt node;
> > > > I think it will be quite a common scenario.
> > > > I tried with an oVirt node build from yesterday.
> > > > 
> > > > Unfortunately I'm not able to complete the setup because the oVirt node's
> > > > CPU load gets stuck at 100% indefinitely, so it's almost unresponsive.
> > > > 
> > > > The issue seems to be related to the vdsmd daemon, which can't really
> > > > start, so it retries indefinitely using all the available CPU power
> > > > (it also runs with niceness -20...).
> > > > 
> > > > [root@node36 admin]# grep "Unit vdsmd.service entered failed state." /var/log/messages | wc -l
> > > > 368
> > > > It tried 368 times in a row in a few minutes.
> > > > 
> > > > With journalctl I can read:
> > > > May 29 10:06:45 node36 systemd[1]: Unit vdsmd.service entered failed state.
> > > > May 29 10:06:45 node36 systemd[1]: vdsmd.service holdoff time over, scheduling restart.
> > > > May 29 10:06:45 node36 systemd[1]: Stopping Virtual Desktop Server Manager...
> > > > May 29 10:06:45 node36 systemd[1]: Starting Virtual Desktop Server Manager...
> > > > May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running mkdirs
> > > > May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running configure_coredump
> > > > May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running configure_vdsm_logs
> > > > May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running wait_for_network
> > > > May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running run_init_hooks
> > > > May 29 10:06:46 node36 vdsmd_init_common.sh[13697]: vdsm: Running upgraded_version_check
> > > > May 29 10:06:46 node36 vdsmd_init_common.sh[13697]: vdsm: Running check_is_configured
> > > > May 29 10:06:46 node36 vdsmd_init_common.sh[13697]: vdsm: Running validate_configuration
> > > > May 29 10:06:47 node36 vdsmd_init_common.sh[13697]: vdsm: Running prepare_transient_repository
> > > > May 29 10:06:49 node36 vdsmd_init_common.sh[13697]: vdsm: Running syslog_available
> > > > May 29 10:06:49 node36 vdsmd_init_common.sh[13697]: vdsm: Running nwfilter
> > > > May 29 10:06:50 node36 vdsmd_init_common.sh[13697]: vdsm: Running dummybr
> > > > May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running load_needed_modules
> > > > May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running tune_system
> > > > May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running test_space
> > > > May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running test_lo
> > > > May 29 10:06:51 node36 systemd[1]: Started Virtual Desktop Server Manager.
> > > > May 29 10:06:51 node36 systemd[1]: vdsmd.service: main process exited, code=exited, status=1/FAILURE
> > > > May 29 10:06:51 node36 vdsmd_init_common.sh[13821]: vdsm: Running run_final_hooks
> > > > May 29 10:06:52 node36 systemd[1]: Unit vdsmd.service entered failed state.
> > > > May 29 10:06:52 node36 systemd[1]: vdsmd.service holdoff time over, scheduling restart.
> > > > May 29 10:06:52 node36 systemd[1]: Stopping Virtual Desktop Server Manager...
> > > > May 29 10:06:52 node36 systemd[1]: Starting Virtual Desktop Server Manager...
> > > > (repeated many times)
> > > > 
> > > > /var/log/vdsm/vdsm.log is empty.
> > > > 
> > > > while
> > > > [root@node36 admin]# /usr/share/vdsm/daemonAdapter -0 /dev/null -1 /dev/null -2 /dev/null /usr/share/vdsm/vdsm; echo $?
> > > > 1
> > > 
> > > Can you try to run vdsm manually from the shell?
> > > 
> > > # /usr/share/vdsm/vdsm
> > > 
> > > Typically you would see a python traceback explaining the failure.
> > 
> > I tried and it just fails.
> > Exit code is 1
> 
> Can you show an strace of the failure?
> 
> # strace /usr/share/vdsm/vdsm


It's getting a lot of failed lookups like these:
stat("/usr/share/vdsm/virt/caps", 0x7fff12cba270) = -1 ENOENT (No such file or directory)
open("/usr/share/vdsm/virt/caps.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/vdsm/virt/capsmodule.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/vdsm/virt/caps.py", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/vdsm/virt/caps.pyc", O_RDONLY) = -1 ENOENT (No such file or directory)

for almost all of the modules.
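
Many ENOENT results are expected while Python walks its search path, so the interesting question is which modules are never found anywhere. A rough sketch for summarizing the strace output follows. It assumes the lookups follow the probing order visible above (directory, .so, module.so, .py, .pyc) and groups them by module base name, so modules with the same name in different packages are lumped together. The function names are made up for illustration and are not part of vdsm.

```python
import os
import re
import sys
from collections import defaultdict

# Matches stat()/open() lines as printed by strace, for example:
# open("/usr/share/vdsm/virt/caps.py", O_RDONLY) = -1 ENOENT (No such file or directory)
SYSCALL_RE = re.compile(
    r'^(?:\[pid\s+\d+\]\s+)?(?:stat|open)\("([^"]+)"[^)]*\)\s+=\s+(-?\d+)(?:\s+(\w+))?'
)

# File suffixes probed during import, longest first so that
# "capsmodule.so" is reduced to "caps" rather than "capsmodule".
SUFFIXES = ("module.so", ".so", ".pyc", ".py")


def module_key(path):
    """Reduce a probed path to a module base name (assumption: one key per name)."""
    for suffix in SUFFIXES:
        if path.endswith(suffix):
            path = path[:-len(suffix)]
            break
    return os.path.basename(path)


def unresolved_modules(lines):
    """Return {module name: ENOENT count} for names that were never found."""
    failed = defaultdict(int)
    found = set()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if not match:
            continue
        path, ret, errno_name = match.groups()
        key = module_key(path)
        if ret == "-1" and errno_name == "ENOENT":
            failed[key] += 1
        elif ret != "-1":
            found.add(key)
    return {name: count for name, count in failed.items() if name not in found}


if __name__ == "__main__":
    # Usage: python summarize_strace.py <uncompressed strace output>
    with open(sys.argv[1]) as trace:
        for name, count in sorted(unresolved_modules(trace).items()):
            print(name, count)
```

Anything this prints was probed but never opened successfully, which narrows down where to look first.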

I'm attaching the full strace.
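
On the restart loop quoted above: the gap between consecutive "entered failed state" messages gives a rough idea of how fast systemd is cycling the service. A minimal sketch, assuming syslog-style timestamps as in the journal excerpt and English month names; failure_intervals is a hypothetical helper, not part of vdsm or systemd.

```python
from datetime import datetime


def failure_intervals(lines, unit="vdsmd.service"):
    """Return the seconds between consecutive 'entered failed state' messages."""
    marker = "Unit %s entered failed" % unit
    stamps = []
    for line in lines:
        if marker in line:
            # The first 15 characters hold the timestamp, e.g. "May 29 10:06:45".
            # The year is absent, which is fine for computing differences.
            stamps.append(datetime.strptime(line[:15], "%b %d %H:%M:%S"))
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


journal = [
    "May 29 10:06:45 node36 systemd[1]: Unit vdsmd.service entered failed state.",
    "May 29 10:06:51 node36 systemd[1]: Started Virtual Desktop Server Manager.",
    "May 29 10:06:52 node36 systemd[1]: Unit vdsmd.service entered failed state.",
]
print(failure_intervals(journal))
```

For the two failures shown in the excerpt this gives a single interval of 7 seconds.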

thanks,
Simone
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vdsm_strace.gz
Type: application/x-gzip
Size: 105957 bytes
Desc: not available
URL: <http://lists.ovirt.org/pipermail/devel/attachments/20150529/8d87d526/attachment-0001.gz>

