[ovirt-devel] oVirt node 3.6 and CPU load indefinitely stuck on 100% while vdsmd indefinitely tries to restart
Nir Soffer
nsoffer at redhat.com
Fri May 29 11:26:52 EDT 2015
----- Original Message -----
> From: "Simone Tiraboschi" <stirabos at redhat.com>
> To: devel at ovirt.org
> Cc: "Fabian Deutsch" <fdeutsch at redhat.com>
> Sent: Friday, May 29, 2015 1:44:02 PM
> Subject: [ovirt-devel] oVirt node 3.6 and CPU load indefinitely stuck on 100% while vdsmd indefinitely tries to
> restart
>
> Hi,
> I tried to have hosted-engine deploy the engine appliance on oVirt node.
> I think it will be quite a common scenario.
> I tried with an oVirt node build from yesterday.
>
> Unfortunately I'm not able to complete the setup because oVirt node gets the
> CPU load indefinitely stuck at 100%, so it's almost unresponsive.
>
> The issue seems to be related to the vdsmd daemon, which cannot start and
> so retries indefinitely, using all the available CPU power (it also runs
> with niceness -20...).
>
> [root@node36 admin]# grep "Unit vdsmd.service entered failed state." /var/log/messages | wc -l
> 368
> It tried 368 times in a row in a few minutes.
>
> With journalctl I can read:
> May 29 10:06:45 node36 systemd[1]: Unit vdsmd.service entered failed state.
> May 29 10:06:45 node36 systemd[1]: vdsmd.service holdoff time over,
> scheduling restart.
> May 29 10:06:45 node36 systemd[1]: Stopping Virtual Desktop Server Manager...
> May 29 10:06:45 node36 systemd[1]: Starting Virtual Desktop Server Manager...
> May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running mkdirs
> May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> configure_coredump
> May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> configure_vdsm_logs
> May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> wait_for_network
> May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> run_init_hooks
> May 29 10:06:46 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> upgraded_version_check
> May 29 10:06:46 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> check_is_configured
> May 29 10:06:46 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> validate_configuration
> May 29 10:06:47 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> prepare_transient_repository
> May 29 10:06:49 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> syslog_available
> May 29 10:06:49 node36 vdsmd_init_common.sh[13697]: vdsm: Running nwfilter
> May 29 10:06:50 node36 vdsmd_init_common.sh[13697]: vdsm: Running dummybr
> May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running
> load_needed_modules
> May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running tune_system
> May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running test_space
> May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running test_lo
> May 29 10:06:51 node36 systemd[1]: Started Virtual Desktop Server Manager.
> May 29 10:06:51 node36 systemd[1]: vdsmd.service: main process exited,
> code=exited, status=1/FAILURE
> May 29 10:06:51 node36 vdsmd_init_common.sh[13821]: vdsm: Running
> run_final_hooks
> May 29 10:06:52 node36 systemd[1]: Unit vdsmd.service entered failed state.
> May 29 10:06:52 node36 systemd[1]: vdsmd.service holdoff time over,
> scheduling restart.
> May 29 10:06:52 node36 systemd[1]: Stopping Virtual Desktop Server Manager...
> May 29 10:06:52 node36 systemd[1]: Starting Virtual Desktop Server Manager...
> (the same sequence repeats many times)
>
> /var/log/vdsm/vdsm.log is empty, and starting the daemon by hand through
> daemonAdapter exits with status 1:
> [root@node36 admin]# /usr/share/vdsm/daemonAdapter -0 /dev/null -1 /dev/null -2 /dev/null /usr/share/vdsm/vdsm; echo $?
> 1
Can you try to run vdsm manually from the shell?
# /usr/share/vdsm/vdsm
Typically you would see a python traceback explaining the failure.
Nir
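
The manual run suggested above can be wrapped in a small helper that keeps the
output for later, which helps when the traceback scrolls by or needs to be
pasted into a reply. This is only a rough sketch: the function name
run_in_foreground and the log location /tmp/vdsm-manual.log are hypothetical,
and only the /usr/share/vdsm/vdsm path comes from this thread. It assumes the
failure is printed to stdout or stderr as a plain Python traceback.

```shell
#!/bin/sh
# Run a command in the foreground, save its combined stdout/stderr to a log
# file, and report whether a Python traceback was printed.
# Hypothetical helper: the name and the default log path are illustrative.
run_in_foreground() {
    cmd=${1:-/usr/share/vdsm/vdsm}      # path taken from the thread
    log=${2:-/tmp/vdsm-manual.log}      # assumed scratch location

    # Nothing is redirected to /dev/null here, so error output is kept.
    "$cmd" >"$log" 2>&1
    status=$?

    # Python tracebacks begin with this fixed header line.
    if grep -q '^Traceback (most recent call last):' "$log"; then
        echo "traceback found, see $log"
    else
        echo "no traceback in $log"
    fi
    echo "exit status: $status"
    return "$status"
}
```

Run as root, calling run_in_foreground with no arguments starts
/usr/share/vdsm/vdsm directly and leaves whatever it printed in the log file.
If the traceback is prefixed by a logger, the grep pattern would need to be
loosened.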