That was the issue.  I found out yesterday that vdsm.log had somehow been changed to root:root, and only now got a chance to post this back to the mailing list.  How does the ownership of that file get changed?  When the issue occurred, I am certain no one was on the system.


On Thu, Apr 11, 2013 at 2:15 PM, Joop <jvdwege@xs4all.nl> wrote:
Dan Kenigsberg wrote:
On Wed, Apr 10, 2013 at 08:59:01AM -0500, Tony Feldmann wrote:
 
I am having a strange issue in my oVirt cluster.  I have 2 hosts: one running
the engine and also added as a host, and one other system added as a host.  Both
systems are running gluster across local disks for shared storage.
Everything was working fine until last night, when my system that is also
running the engine went unresponsive in the admin page.  All vms were still
running that were on the host.  I shut down the vms that were on the host
from within the guest os as I was not able to do anything to the vm with
the host in unresponsive state.  After getting the vms off and rebooting
the host, the vdsmd service says that it is running, but it continually
restarts the vdsm process and dumps out these messages: detected unhandled
Python exception in '/usr/share/vdsm/vdsm'.  All services say they are up
and running but the host stays in unresponsive state and the vdsm process
keeps respawning.  There is also no data in the vdsm.log.  Can anyone shed
any light on this for me?
   

vdsm-devel@fedorahosted.org may be a better place to ask vdsm-specific
questions.

Could you log into the non-operational host as root and stop the vdsm
service?

Then become the vdsm user with

    su -s /bin/bash - vdsm

and run /usr/share/vdsm/vdsm manually. Do you see anything in
particular?

 
Please have a look at the permissions/owner of /var/log/vdsm/vdsm.log. It should be vdsm:kvm, not root:root.

Joop
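
For anyone hitting the same symptom, the ownership check suggested above could be sketched roughly as follows. This assumes GNU coreutils `stat` (the `-c` option); `check_owner` is a hypothetical helper written for illustration, not something shipped with vdsm.

```shell
#!/bin/sh
# check_owner FILE EXPECTED: compare FILE's owner:group with EXPECTED.
# Hypothetical helper; relies on GNU stat's -c '%U:%G' format.
check_owner() {
    file=$1
    expected=$2
    actual=$(stat -c '%U:%G' "$file") || return 2
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file is $actual"
    else
        echo "MISMATCH: $file is $actual, expected $expected"
        return 1
    fi
}

# On the affected host, as root:
#   check_owner /var/log/vdsm/vdsm.log vdsm:kvm
# and only if it reports a mismatch:
#   chown vdsm:kvm /var/log/vdsm/vdsm.log
```

The chown is left commented out on purpose, so the file is inspected before anything is changed. This only restores the ownership; it does not explain what changed it.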