----- Original Message -----
From: "Simone Tiraboschi" <stirabos(a)redhat.com>
To: "Nir Soffer" <nsoffer(a)redhat.com>
Cc: devel(a)ovirt.org, "Fabian Deutsch" <fdeutsch(a)redhat.com>
Sent: Friday, May 29, 2015 6:42:08 PM
Subject: Re: [ovirt-devel] oVirt node 3.6 and CPU load indefinitely stuck on 100% while vdsmd indefinitely tries to restart
----- Original Message -----
> From: "Nir Soffer" <nsoffer(a)redhat.com>
> To: "Simone Tiraboschi" <stirabos(a)redhat.com>
> Cc: devel(a)ovirt.org, "Fabian Deutsch" <fdeutsch(a)redhat.com>
> Sent: Friday, May 29, 2015 5:26:52 PM
> Subject: Re: [ovirt-devel] oVirt node 3.6 and CPU load indefinitely stuck on 100% while vdsmd indefinitely tries to restart
>
> ----- Original Message -----
> > From: "Simone Tiraboschi" <stirabos(a)redhat.com>
> > To: devel(a)ovirt.org
> > Cc: "Fabian Deutsch" <fdeutsch(a)redhat.com>
> > Sent: Friday, May 29, 2015 1:44:02 PM
> > Subject: [ovirt-devel] oVirt node 3.6 and CPU load indefinitely stuck on 100% while vdsmd indefinitely tries to restart
> >
> > Hi,
> > I tried to have hosted-engine deploy the engine appliance on oVirt node.
> > I think it will be quite a common scenario.
> > I tried with an oVirt node build from yesterday.
> >
> > Unfortunately I'm not able to complete the setup because oVirt node got
> > the CPU load indefinitely stuck at 100%, so it's almost unresponsive.
> >
> > The issue seems to be related to the vdsmd daemon, which can't really
> > start and so retries indefinitely, using all the available CPU power (it
> > also runs with niceness -20...).
> >
> > [root@node36 admin]# grep "Unit vdsmd.service entered failed state." /var/log/messages | wc -l
> > 368
> > It tried 368 times in a row in a few minutes.
> >
> > With journalctl I can read:
> > May 29 10:06:45 node36 systemd[1]: Unit vdsmd.service entered failed state.
> > May 29 10:06:45 node36 systemd[1]: vdsmd.service holdoff time over, scheduling restart.
> > May 29 10:06:45 node36 systemd[1]: Stopping Virtual Desktop Server Manager...
> > May 29 10:06:45 node36 systemd[1]: Starting Virtual Desktop Server Manager...
> > May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running mkdirs
> > May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running configure_coredump
> > May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running configure_vdsm_logs
> > May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running wait_for_network
> > May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running run_init_hooks
> > May 29 10:06:46 node36 vdsmd_init_common.sh[13697]: vdsm: Running upgraded_version_check
> > May 29 10:06:46 node36 vdsmd_init_common.sh[13697]: vdsm: Running check_is_configured
> > May 29 10:06:46 node36 vdsmd_init_common.sh[13697]: vdsm: Running validate_configuration
> > May 29 10:06:47 node36 vdsmd_init_common.sh[13697]: vdsm: Running prepare_transient_repository
> > May 29 10:06:49 node36 vdsmd_init_common.sh[13697]: vdsm: Running syslog_available
> > May 29 10:06:49 node36 vdsmd_init_common.sh[13697]: vdsm: Running nwfilter
> > May 29 10:06:50 node36 vdsmd_init_common.sh[13697]: vdsm: Running dummybr
> > May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running load_needed_modules
> > May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running tune_system
> > May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running test_space
> > May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running test_lo
> > May 29 10:06:51 node36 systemd[1]: Started Virtual Desktop Server Manager.
> > May 29 10:06:51 node36 systemd[1]: vdsmd.service: main process exited, code=exited, status=1/FAILURE
> > May 29 10:06:51 node36 vdsmd_init_common.sh[13821]: vdsm: Running run_final_hooks
> > May 29 10:06:52 node36 systemd[1]: Unit vdsmd.service entered failed state.
> > May 29 10:06:52 node36 systemd[1]: vdsmd.service holdoff time over, scheduling restart.
> > May 29 10:06:52 node36 systemd[1]: Stopping Virtual Desktop Server Manager...
> > May 29 10:06:52 node36 systemd[1]: Starting Virtual Desktop Server Manager...
> > repeated a lot of times
> >
> > /var/log/vdsm/vdsm.log is empty.
> >
> > while
> > [root@node36 admin]# /usr/share/vdsm/daemonAdapter -0 /dev/null -1 /dev/null -2 /dev/null /usr/share/vdsm/vdsm; echo $?
> > 1
>
> Can you try to run vdsm manually from the shell?
>
> # /usr/share/vdsm/vdsm
>
> Typically you would see a python traceback explaining the failure.

I tried and it just fails.
Exit code is 1.

Can you show an strace of the failure?

# strace /usr/share/vdsm/vdsm
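
If the output scrolls by too quickly to paste, a small wrapper along these lines might help keep the combined output and the exit code together. This is only a sketch: run_and_capture and /tmp/vdsm.out are made-up names for illustration, not part of vdsm, and it assumes a POSIX shell and a writable /tmp.

```shell
# Hypothetical helper (not part of vdsm): run a command, save its
# combined stdout/stderr to a file, and report the exit code.
run_and_capture() {
    out_file=$1
    shift
    # Run the remaining arguments as the command; strace writes its
    # trace to stderr by default, so 2>&1 captures it as well.
    "$@" >"$out_file" 2>&1
    rc=$?
    echo "exit code: $rc"
    return "$rc"
}

# Possible usage with the paths from this thread
# (/tmp/vdsm.out is an assumed location; -f follows child processes):
# run_and_capture /tmp/vdsm.out strace -f /usr/share/vdsm/vdsm
```

The saved file can then be attached to the reply as-is.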
Nir