----- Original Message -----
From: "Simone Tiraboschi" <stirabos(a)redhat.com>
To: devel(a)ovirt.org
Cc: "Fabian Deutsch" <fdeutsch(a)redhat.com>
Sent: Friday, May 29, 2015 1:44:02 PM
Subject: [ovirt-devel] oVirt node 3.6 and CPU load indefinitely stuck on 100% while vdsmd indefinitely tries to restart
Hi,
I tried to have hosted-engine deploy the engine appliance on oVirt node.
I think it will be quite a common scenario.
I tried with an oVirt node build from yesterday.
Unfortunately I'm not able to complete the setup because oVirt node got its CPU
load stuck at 100% indefinitely, so it's almost unresponsive.
The issue seems to be related to the vdsmd daemon, which cannot start and
so retries indefinitely, using all the available CPU power (it also runs
with niceness -20...).
[root@node36 admin]# grep "Unit vdsmd.service entered failed state." /var/log/messages | wc -l
368
It tried 368 times in a row in a few minutes.
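To see how tightly those restarts are packed, the same grep can be grouped per minute. This is only a minimal sketch: it assumes the traditional syslog timestamp layout ("Mon DD HH:MM:SS host ...") seen in /var/log/messages, and failures_per_minute is a hypothetical helper name, not something shipped with vdsm or systemd.

```shell
#!/bin/sh
# Count "entered failed state" events per minute for one systemd unit.
# Assumption: log lines start with a "Mon DD HH:MM:SS" syslog timestamp.
failures_per_minute() {
    unit="$1"      # e.g. vdsmd.service
    logfile="$2"   # e.g. /var/log/messages

    # -F matches the message literally, so the dots are not regex wildcards.
    grep -F "Unit ${unit} entered failed state." "$logfile" |
        # Keep month, day and HH:MM (drop the seconds).
        awk '{ print $1, $2, substr($3, 1, 5) }' |
        # Group identical minutes and prefix each with its count.
        sort | uniq -c
}

# Hypothetical invocation on the affected node:
# failures_per_minute vdsmd.service /var/log/messages
```

Each output line is a count followed by the minute it belongs to, which makes it easy to tell a steady restart loop from a single burst.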
With journalctl I can read:
May 29 10:06:45 node36 systemd[1]: Unit vdsmd.service entered failed state.
May 29 10:06:45 node36 systemd[1]: vdsmd.service holdoff time over, scheduling restart.
May 29 10:06:45 node36 systemd[1]: Stopping Virtual Desktop Server Manager...
May 29 10:06:45 node36 systemd[1]: Starting Virtual Desktop Server Manager...
May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running mkdirs
May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running configure_coredump
May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running configure_vdsm_logs
May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running wait_for_network
May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running run_init_hooks
May 29 10:06:46 node36 vdsmd_init_common.sh[13697]: vdsm: Running upgraded_version_check
May 29 10:06:46 node36 vdsmd_init_common.sh[13697]: vdsm: Running check_is_configured
May 29 10:06:46 node36 vdsmd_init_common.sh[13697]: vdsm: Running validate_configuration
May 29 10:06:47 node36 vdsmd_init_common.sh[13697]: vdsm: Running prepare_transient_repository
May 29 10:06:49 node36 vdsmd_init_common.sh[13697]: vdsm: Running syslog_available
May 29 10:06:49 node36 vdsmd_init_common.sh[13697]: vdsm: Running nwfilter
May 29 10:06:50 node36 vdsmd_init_common.sh[13697]: vdsm: Running dummybr
May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running load_needed_modules
May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running tune_system
May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running test_space
May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running test_lo
May 29 10:06:51 node36 systemd[1]: Started Virtual Desktop Server Manager.
May 29 10:06:51 node36 systemd[1]: vdsmd.service: main process exited, code=exited, status=1/FAILURE
May 29 10:06:51 node36 vdsmd_init_common.sh[13821]: vdsm: Running run_final_hooks
May 29 10:06:52 node36 systemd[1]: Unit vdsmd.service entered failed state.
May 29 10:06:52 node36 systemd[1]: vdsmd.service holdoff time over, scheduling restart.
May 29 10:06:52 node36 systemd[1]: Stopping Virtual Desktop Server Manager...
May 29 10:06:52 node36 systemd[1]: Starting Virtual Desktop Server Manager...
(the same sequence repeats many times)
/var/log/vdsm/vdsm.log is empty, while running the daemon by hand exits with status 1:
[root@node36 admin]# /usr/share/vdsm/daemonAdapter -0 /dev/null -1 /dev/null -2 /dev/null /usr/share/vdsm/vdsm; echo $?
1
Can you try to run vdsm manually from the shell?
# /usr/share/vdsm/vdsm
Typically you would see a Python traceback explaining the failure.
Nir
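The daemonAdapter invocation quoted above passes "-2 /dev/null", which appears to discard stderr, so any traceback would be lost there. A minimal sketch of a manual run that keeps stderr and reports the exit status follows; run_and_capture and the /tmp/vdsm-manual.err path are hypothetical names chosen for illustration, not part of vdsm.

```shell
#!/bin/sh
# Run a command in the foreground, save its stderr, and report its exit status.
# run_and_capture is a hypothetical helper, not a vdsm tool.
run_and_capture() {
    errfile="$1"   # where stderr is saved
    shift          # the remaining arguments are the command to run

    "$@" 2> "$errfile"
    status=$?

    echo "exit status: $status"
    # Only print the file if the command actually wrote to stderr.
    if [ -s "$errfile" ]; then
        echo "stderr saved to $errfile:"
        cat "$errfile"
    fi
    return "$status"
}

# Hypothetical invocation on the affected node:
# run_and_capture /tmp/vdsm-manual.err /usr/share/vdsm/vdsm
```

If the daemon fails the way the journal suggests, the saved file should contain the traceback and can be attached to a reply on the list.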