I must admit I’m getting a bit weary of fighting oVirt problems at this point… Before I
move on to deploying any VMs onto my new infra, I’d like to get the base infra working…
I’m still experiencing a “Non Operational” problem on my “ovirt-node-02” host:
http://s1096.photobucket.com/user/willdennis/media/ovirt-node-02_problem....
I have pored thru the logs (all the engine logs, plus the syslogs from the engine VM and
my three hypervisor/storage hosts) and I can’t pin down why the one node is having a
problem… Of course with how voluminous all these logs are, it’s kind of like looking for a
needle in a haystack, and I’m not even sure what the needle looks like, or if it’s even a
needle :-/
I have also rebooted this host within the past few days, but that did not fix the problem either.
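One way to shrink the haystack a bit is to keep only the warning and error lines that mention the problem host. The following is a rough sketch, not a known-good diagnostic: the function name `filter_host_problems` is made up for illustration, and it assumes the log lines carry a `WARN` or `ERROR` level token surrounded by spaces, which may not hold for every log in the set.

```shell
# Hypothetical helper: keep only WARN/ERROR lines that mention a given host.
# Reads log text on stdin; the host name to look for is the first argument.
# Assumption: each log line contains its level as " WARN " or " ERROR ".
filter_host_problems() {
    grep -F -- "$1" | grep -E ' (WARN|ERROR) '
}
```

It could be run on the engine VM as `filter_host_problems ovirt-node-02 < /var/log/ovirt-engine/engine.log`, assuming the engine log is in its default location.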
Note that in the screenshot I posted above, the webadmin hosts screen says that
-node-01 has one VM running, and the others 0… You’d think that would be the HE VM running
on there, but it’s actually on -node-02:
$ ansible istgroup-ovirt -f 1 -i prod -u root -m shell -a "hosted-engine --vm-status | grep -e '^Hostname' -e '^Engine'"
ovirt-node-01 | success | rc=0 >>
Hostname : ovirt-node-01
Engine status : {"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "down"}
Hostname : ovirt-node-02
Engine status : {"health": "good", "vm": "up", "detail": "up"}
Hostname : ovirt-node-03
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
ovirt-node-02 | success | rc=0 >>
Hostname : ovirt-node-01
Engine status : {"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "down"}
Hostname : ovirt-node-02
Engine status : {"health": "good", "vm": "up", "detail": "up"}
Hostname : ovirt-node-03
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
ovirt-node-03 | success | rc=0 >>
Hostname : ovirt-node-01
Engine status : {"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "down"}
Hostname : ovirt-node-02
Engine status : {"health": "good", "vm": "up", "detail": "up"}
Hostname : ovirt-node-03
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
So it looks like the webadmin UI is wrong as well…
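To make the comparison with the webadmin view less error-prone, the host that actually runs the engine VM can be pulled out of the status output mechanically. This is only a sketch: `engine_vm_host` is a made-up name, and it assumes each `Engine status` value arrives on a single line (as the tool prints it, not as wrapped in this mail) and directly follows the `Hostname` line for the same host.

```shell
# Hypothetical helper: print the host whose "Engine status" reports the VM as up.
# Reads `hosted-engine --vm-status` output on stdin.
engine_vm_host() {
    awk -F' *: *' '
        /^Hostname/ { host = $2 }                        # remember the most recent host
        /^Engine status/ && /"vm": "up"/ { print host }  # report it if its VM is up
    '
}
```

Run on any one of the hosts as `hosted-engine --vm-status | engine_vm_host`; with the output shown above it would be expected to name ovirt-node-02.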
It would be awesome if the UI would give a reason for the “Non Operational” status
somehow… Or if there was a troubleshooter that could be used to analyze the problem… As it
is, being so new to all of this, I am completely at the list’s mercy to figure this out.
This software has such promise, so I’ll keep working thru these issues, but it sure hasn’t
been a smooth ride so far…
On Jan 4, 2016, at 7:54 AM, Will Dennis <wdennis@nec-labs.com> wrote:
I put all of the engine logs up there now… Try
engine.log-20160103.gz http://i1096.photobucket.com/albums/g330/willdennis...