[ovirt-devel] oVirt node 3.6 and CPU load indefinitely stuck on 100% while vdsmd indefinitely tries to restart

Douglas Schilling Landgraf dougsland at redhat.com
Sat May 30 21:28:38 UTC 2015


On 05/29/2015 06:44 AM, Simone Tiraboschi wrote:
> Hi,
> I tried to have hosted-engine deploy the engine appliance on oVirt node. I think it will be quite a common scenario.
> I tried with an oVirt node build from yesterday.
>
> Unfortunately I'm not able to complete the setup because oVirt node has its CPU load stuck at 100% indefinitely, so it's almost unresponsive.
>
> The issue seems to be related to the vdsmd daemon, which cannot start and so retries indefinitely, using all the available CPU power (it also runs with niceness -20...).
>
> [root@node36 admin]# grep "Unit vdsmd.service entered failed state." /var/log/messages  | wc -l
> 368
> It tried 368 times in a row in a few minutes.
>
> With journalctl I can read:
> May 29 10:06:45 node36 systemd[1]: Unit vdsmd.service entered failed state.
> May 29 10:06:45 node36 systemd[1]: vdsmd.service holdoff time over, scheduling restart.
> May 29 10:06:45 node36 systemd[1]: Stopping Virtual Desktop Server Manager...
> May 29 10:06:45 node36 systemd[1]: Starting Virtual Desktop Server Manager...
> May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running mkdirs
> May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running configure_coredump
> May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running configure_vdsm_logs
> May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running wait_for_network
> May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running run_init_hooks
> May 29 10:06:46 node36 vdsmd_init_common.sh[13697]: vdsm: Running upgraded_version_check
> May 29 10:06:46 node36 vdsmd_init_common.sh[13697]: vdsm: Running check_is_configured
> May 29 10:06:46 node36 vdsmd_init_common.sh[13697]: vdsm: Running validate_configuration
> May 29 10:06:47 node36 vdsmd_init_common.sh[13697]: vdsm: Running prepare_transient_repository
> May 29 10:06:49 node36 vdsmd_init_common.sh[13697]: vdsm: Running syslog_available
> May 29 10:06:49 node36 vdsmd_init_common.sh[13697]: vdsm: Running nwfilter
> May 29 10:06:50 node36 vdsmd_init_common.sh[13697]: vdsm: Running dummybr
> May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running load_needed_modules
> May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running tune_system
> May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running test_space
> May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running test_lo
> May 29 10:06:51 node36 systemd[1]: Started Virtual Desktop Server Manager.
> May 29 10:06:51 node36 systemd[1]: vdsmd.service: main process exited, code=exited, status=1/FAILURE
> May 29 10:06:51 node36 vdsmd_init_common.sh[13821]: vdsm: Running run_final_hooks
> May 29 10:06:52 node36 systemd[1]: Unit vdsmd.service entered failed state.
> May 29 10:06:52 node36 systemd[1]: vdsmd.service holdoff time over, scheduling restart.
> May 29 10:06:52 node36 systemd[1]: Stopping Virtual Desktop Server Manager...
> May 29 10:06:52 node36 systemd[1]: Starting Virtual Desktop Server Manager...
> (repeated many times)
>
> /var/log/vdsm/vdsm.log is empty.
>
> while running the daemon by hand exits with status 1:
> [root@node36 admin]# /usr/share/vdsm/daemonAdapter -0 /dev/null -1 /dev/null -2 /dev/null /usr/share/vdsm/vdsm; echo $?
> 1
>

Thanks for the report, Simone. From my tests, you are hitting this bug:

non-root user cannot `from ovirtnode import ovirtfunctions`: permission 
denied: '/var/log/ovirt-node.log' and '/var/log/ovirt.log'
https://bugzilla.redhat.com/show_bug.cgi?id=1224400
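
To check whether a node is affected, something like the following might help. It is only a sketch: the function name check_other_rw is made up for illustration, and it assumes GNU stat is available. It reports whether users outside the owner and group can read and write each log file.

```shell
# Sketch only: check_other_rw is a hypothetical helper, not part of vdsm
# or ovirt-node. Assumes GNU stat (coreutils).
check_other_rw() {
    for log in "$@"; do
        # %A prints the mode as e.g. "-rw-r--r--"; skip files stat cannot read
        mode=$(stat -c %A "$log") || continue
        case "$mode" in
            # 1 type char + 6 user/group chars, then "rw" for other
            ???????rw?) echo "$log: ok ($mode)" ;;
            *)          echo "$log: other users lack rw ($mode)" ;;
        esac
    done
}

# On the node:
# check_other_rw /var/log/ovirt.log /var/log/ovirt-node.log
```

If either file is reported as lacking rw for other users, the permission error from the bug title is the likely cause.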

We should handle this bug very soon. The workaround is to run chmod o+rw 
on /var/log/ovirt.log and /var/log/ovirt-node.log.
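
Spelled out, the workaround could look roughly like this. The wrapper function fix_ovirt_log_perms is just an illustrative name, and the sketch assumes both log files already exist and that you run it as root on the node.

```shell
# Sketch only: fix_ovirt_log_perms is a hypothetical wrapper around the
# chmod workaround; it is not shipped with vdsm or ovirt-node.
fix_ovirt_log_perms() {
    for log in "$@"; do
        if [ -e "$log" ]; then
            # Grant read/write to users outside the owner and group
            chmod o+rw "$log"
        else
            echo "skipping missing file: $log" >&2
        fi
    done
}

# On the node, as root:
# fix_ovirt_log_perms /var/log/ovirt.log /var/log/ovirt-node.log
```

This only loosens permissions on the two log files named above; it is a stopgap until the bug itself is fixed.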

-- 
Cheers
Douglas


